Avoiding Nested for-loops

I'm trying to avoid writing nested for-loops, because they are hard to read:

from bs4 import BeautifulSoup
import requests
import time

class bs_spider:
    def __init__(self, url):
        '''Store the base URL; the page number is appended to it later.'''
        self.url = url

    def get_title_list(self, number, tag1, tag2):
        '''Collect the tag2 attribute of every tag1 element on pages 1..number.'''
        title_list = []
        for c in range(1, number + 1):
            time.sleep(3)  # pause between requests
            print('Fetching page', c)
            url1 = self.url + str(c)
            try:
                wb = requests.get(url1)
                soup = BeautifulSoup(wb.text, 'lxml')
                for i in soup.find_all(tag1):
                    term = i.get(tag2)
                    if term is not None:  # skip elements without the attribute
                        title_list.append(term)
            except OSError:  # requests' exceptions are subclasses of OSError
                print('Sorry, the link you entered could not be reached!')
        # this count is taken before duplicates are removed
        print('Done! Fetched', len(title_list), 'titles in total!')
        return list(set(title_list))  # keep unique elements only

url = 'https://search.bilibili.com/all?keyword=冬泳怪鸽&page='
bbilititle = bs_spider(url).get_title_list(5, 'a', 'title')
#> Fetching page 1
#> Fetching page 2
#> Fetching page 3
#> Fetching page 4
#> Fetching page 5
#> Done! Fetched 200 titles in total!

This is a simple crawler for a Chinese web page. The outer for-loop walks through the result pages. For each page, the inner for-loop visits every 'a' tag and collects its 'title' attribute. The logic is simple, but I bet it is not easy to follow at first glance: a double for-loop is unpleasant to read! Creating an empty list with title_list = [] and appending to it is not elegant either. So I tried to rework my code. First I wrote a function:

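def find_tag(url_1, tag1, tag2):
    '''Return the tag2 attribute of every tag1 element on one page.'''
    try:
        wb = requests.get(url_1)
    except OSError:  # requests' exceptions are subclasses of OSError
        print('Sorry, the link you entered could not be reached!')
        return []  # without this, wb would be undefined below
    soup = BeautifulSoup(wb.text, 'lxml')
    # keep only the elements that actually have the attribute
    alist = [i.get(tag2) for i in soup.find_all(tag1) if i.get(tag2) is not None]
    return alist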
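
As a rough illustration of why the comprehension can stand in for the inner loop, here is a minimal sketch that needs no network access. The `tags` list of plain dicts is a hypothetical stand-in for what soup.find_all returns, not real BeautifulSoup output; it works as a stand-in only because dict.get, like a tag's get method, returns None when the key is missing.

```python
# Hypothetical stand-ins for parsed tags (an assumption for illustration):
# dict.get returns None for a missing key, just as a tag's get method
# returns None for a missing attribute.
tags = [
    {'href': '/video/1', 'title': 'first'},
    {'href': '/video/2'},                    # no title attribute
    {'href': '/video/3', 'title': 'third'},
]

# Loop version: start from an empty list and append the matching values.
loop_result = []
for tag in tags:
    value = tag.get('title')
    if value is not None:
        loop_result.append(value)

# Comprehension version: the same filter and the same value in one expression.
comp_result = [tag.get('title') for tag in tags if tag.get('title') is not None]

print(loop_result == comp_result)  # True
print(comp_result)                 # ['first', 'third']
```

Both versions build the same list; the comprehension just removes the empty-list setup and the explicit append.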

Here find_tag replaces the inner for-loop. I use a list comprehension to build the list, which makes my code clearer and tidier. Then I rewrite the bs_spider class:

class bs_spider:
    def __init__(self, url):
        '''Store the base URL; the page number is appended to it later.'''
        self.url = url

    def get_title_list(self, number, tag1, tag2):
        '''Collect the tag2 attribute of every tag1 element on pages 1..number.'''
        title_list = []
        for c in range(1, number + 1):
            time.sleep(3)  # pause between requests
            print('Fetching page', c)
            url1 = self.url + str(c)
            term = find_tag(url1, tag1, tag2)
            title_list.extend(term)  # merge this page's titles into the result
        # this count is taken before duplicates are removed
        print('Done! Fetched', len(title_list), 'titles in total!')
        return list(set(title_list))  # return unique elements

url = 'https://search.bilibili.com/all?keyword=冬泳怪鸽&page='
bbilititle = bs_spider(url).get_title_list(5, 'a', 'title')
#> Fetching page 1
#> Fetching page 2
#> Fetching page 3
#> Fetching page 4
#> Fetching page 5
#> Done! Fetched 200 titles in total!

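I didn't use a list comprehension for the outer loop, because each iteration also sleeps and prints progress, and squeezing that into a comprehension would make my code more complex. Note that I use the list.extend() method to merge the titles from all pages.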
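
To make the choice of list.extend() concrete, here is a small sketch comparing it with list.append(). The `pages` data is made up for illustration and simply stands in for the lists that find_tag returns. The last lines show itertools.chain.from_iterable as one possible alternative for flattening; it is not what the code above uses.

```python
from itertools import chain

# Hypothetical per-page results, standing in for what find_tag returns.
pages = [['a', 'b'], ['b', 'c'], []]

extended = []
appended = []
for page in pages:
    extended.extend(page)   # adds each title individually -> flat list
    appended.append(page)   # adds the list itself -> nested list

print(extended)   # ['a', 'b', 'b', 'c']
print(appended)   # [['a', 'b'], ['b', 'c'], []]

# Flattening without an explicit loop, as an alternative to extend().
flat = list(chain.from_iterable(pages))
print(flat == extended)   # True
```

With append() the result would be a list of lists, and set() could not remove duplicates from it because lists are not hashable.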

Use more functions and list comprehensions rather than stacking for-loops. I came to feel this way after reading other people's code; at that moment I seriously doubted whether I had really learned programming. Code readability is really important to me.