Programming course link: https://www.gitbook.com/book/mugglecoding/qa/details
Course name: Section 3 practice project: scraping rental listings
Summary:
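1. In a for loop with only one loop variable, do not wrap the iterable in zip, and do not add extra parentheses.
For example, for a, b in zip(as_, bs): is fine (as_ rather than as, because as is a reserved word in Python).
By contrast, for a in zip(as_): does not do what you want: each a is a one-element tuple rather than the element itself, so a call such as a.get_text() fails. Write for a in as_: instead. for a in (as_): runs, but the parentheses are redundant.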
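2. Full-width (Chinese) commas, colons, and parentheses are never accepted in code; Python rejects them with a syntax error.
3. As a beginner, every time you copy a CSS selector path, it is best to print the result and check it.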
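4. def defines a function. On ordering: a function must already be defined when the call to it actually runs, so for a call at the top level of the script the definition has to sit above the call. A call inside another function body is only looked up when that function runs, which is why get_links can call get_detail_info even though it is defined further down. This is a little inconvenient; when in doubt, just cut and paste the definitions above the code that uses them.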
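Points 1 and 4 are easy to check in isolation. The sketch below uses made-up lists (titles, prices) and two hypothetical functions (outer, inner) that are not part of the course project; it only illustrates what zip returns with one versus two inputs, and when a function name is looked up.

```python
# Point 1: zip() with a single iterable yields one-element tuples.
titles = ["Room A", "Room B"]   # hypothetical sample data
prices = ["100", "200"]         # hypothetical sample data

single = [t for t in titles]                        # plain loop: each item is a str
wrapped = [t for t in zip(titles)]                  # zip: each item is a 1-tuple
paired = [(t, p) for t, p in zip(titles, prices)]   # two inputs: pairs

print(single)   # ['Room A', 'Room B']
print(wrapped)  # [('Room A',), ('Room B',)]
print(paired)   # [('Room A', '100'), ('Room B', '200')]


# Point 4: a name is looked up when the call runs,
# not when the calling function is defined.
def outer():
    return inner()   # inner is defined below, which is fine


def inner():
    return "ok"


result = outer()     # works: both functions exist by the time this line runs
print(result)        # ok
```

Moving the line result = outer() above the two definitions would raise a NameError, which is the situation point 4 warns about.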
from bs4 import BeautifulSoup
import requests


def get_loder_sex(class_name):
    # Map the class list of the element under div.member_pic to the host's gender.
    if class_name == ['member_ico']:
        return '男'  # male
    else:
        return '女'  # female


def get_links(url):
    # Collect the detail-page links from one search-results page and scrape each.
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    links = soup.select('#page_list > ul > li > a')
    for link in links:
        href = link.get("href")
        get_detail_info(href)


def get_detail_info(url):
    # Scrape one listing page and print its fields as a dict.
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    titles = soup.select('h4 > em')
    addresss = soup.select(' p > span.pr5')
    prices = soup.select(' div.day_l > span')
    imgs = soup.select('#curBigImage')
    owners = soup.select('div.js_box.clearfix > div.member_pic > a > img')
    sexs = soup.select(' div.js_box.clearfix > div.member_pic > div')
    names = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')
    for title, address, price, img, owner, name, sex \
            in zip(titles, addresss, prices, imgs, owners, names, sexs):
        data = {
            'title': title.get_text(),
            "address": address.get_text(),
            "price": price.get_text(),
            "image": img.get("src"),
            "owners": owner.get("src"),
            "name": name.get_text(),
            "sex": get_loder_sex(sex.get("class"))
        }
        print(data)


# Search-result pages 1 through 9.
urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(number) for number in range(1, 10)]
for single_url in urls:
    get_links(single_url)
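One thing worth keeping in mind about the loop in get_detail_info: zip stops at its shortest input, so if any single selector matches nothing, the loop body never runs and nothing is printed. That is one practical reason for the advice to print selector results. A minimal sketch of a length check follows; check_lengths and the sample lists are hypothetical and stand in for the results of soup.select(...), they are not part of the original script.

```python
def check_lengths(**named_lists):
    """Return the names of the lists that are empty (hypothetical helper)."""
    return [name for name, items in named_lists.items() if len(items) == 0]


# Stand-ins for selector results; the data is made up for illustration.
titles = ["Cozy room"]
prices = []   # pretend this selector matched nothing

rows = list(zip(titles, prices))
empty = check_lengths(titles=titles, prices=prices)

print(rows)    # [] because zip stops at the shortest input
print(empty)   # ['prices']
```

Calling such a check before the zip loop would make an empty selector result visible instead of silently producing no output.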
'''