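I recently started learning web scraping, beginning with the simplest target: Qiushibaike (糗事百科). The goal is to scrape its 24-hour hot list. The site looks like this: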
[Screenshot: the Qiushibaike 24-hour hot list]
Now let's look at the page source:
[Screenshot: the page source]
This shows where the jokes we want sit in the markup, which we'll need when writing the regular expression.
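The real pattern leans on two regex features: the non-greedy .*? (match as little as possible, so each match stops at the nearest closing tag) and the re.S flag (so that . also matches newlines and a match can span several lines of HTML). A small Python 3 illustration follows; the markup is a hypothetical, simplified stand-in rather than the actual Qiushibaike source, and the pattern is a simplified example, not the one used in the script.

```python
import re

# Hypothetical, simplified markup; the real page structure is different and may change.
SAMPLE_HTML = """
<div class="content">
<span>First joke</span>
</div>
<div class="stats"><i class="number">120</i></div>
<div class="content">
<span>Second
joke</span>
</div>
<div class="stats"><i class="number">85</i></div>
"""

# Group 1: the joke text; group 2: the number that follows it.
# .*? is non-greedy, so each match stops at the nearest closing tag.
PATTERN = r'<span>(.*?)</span>.*?<i class="number">(\d+)</i>'

# With re.S, '.' also matches '\n', so a match may span several lines.
with_dotall = re.findall(PATTERN, SAMPLE_HTML, re.S)
# Without it nothing matches: each joke is separated from its number by newlines.
without_dotall = re.findall(PATTERN, SAMPLE_HTML)

print(with_dotall)     # [('First joke', '120'), ('Second\njoke', '85')]
print(without_dotall)  # []
```

Dropping re.S is a common reason a pattern that looks right returns an empty list on real pages.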
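OK, now on to Python. We'll need the urllib and urllib2 packages, plus re.
urllib2.Request builds a request for the page (here with a custom User-Agent header), urlopen sends it and opens the response, and read pulls down the page content.
Then a regular expression matches the content we want.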
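urllib2 exists only in Python 2; in Python 3 the same three steps live in urllib.request. A minimal Python 3 sketch, assuming the URL prefix and User-Agent from the script below (the site may no longer serve that URL); build_request and fetch_page are hypothetical helper names.

```python
import urllib.request

# URL prefix and User-Agent copied from the Python 2 script; the site may have changed.
BASE_URL = 'http://www.qiushibaike.com/hot/page/'
USER_AGENT = 'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)'


def build_request(page):
    """Build (without sending) a Request for one page of the hot list."""
    return urllib.request.Request(BASE_URL + str(page),
                                  headers={'User-Agent': USER_AGENT})


def fetch_page(page, timeout=10):
    """Send the request, read the raw bytes, and decode them as UTF-8."""
    with urllib.request.urlopen(build_request(page), timeout=timeout) as response:
        return response.read().decode('utf-8')
```

Calling fetch_page(1) makes a real HTTP request. As in the original script, failures can be caught with urllib.error.URLError; its subclass HTTPError also carries the status in .code.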
# -*- coding: utf-8 -*-
import urllib
import urllib2
import re
#import sys
#reload(sys)
#sys.setdefaultencoding('utf-8')
page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
user_agent = 'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)'
headers = {'User-Agent': user_agent}
try:
    # Build the request with a browser-like User-Agent, send it, and decode the UTF-8 page
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    content = response.read().decode('utf-8')
    # Group 1 captures the joke text, group 2 the number that follows it; re.S lets '.' match newlines
    pattern = re.compile(r'<span>([^<].*?)</span>.*?<div.*?>\s<span.*?>\s*?<span.*?>(\d*?)</i>', re.S)
    items = re.findall(pattern, content)
    # Save the matches to a text file: joke text on one line, its number on the next
    with open(r"E:\edx 6.001\qsbk.txt", "w") as a:
        for item in items:
            a.write(item[0].encode('utf-8'))
            a.write('\n')
            a.write(item[1])
            a.write('\n')
except urllib2.URLError as e:
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
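The .encode('utf-8') calls and the commented-out sys.setdefaultencoding hack are Python 2 workarounds. In Python 3 the matched strings are already Unicode, so opening the file with an explicit encoding is enough. A minimal sketch that keeps the author's layout (joke on one line, its number on the next); format_items and save_items are hypothetical names.

```python
def format_items(items):
    """Turn (joke, number) pairs into text: joke on one line, its number on the next."""
    lines = []
    for joke, number in items:
        lines.append(joke)
        lines.append(number)
    return ''.join(line + '\n' for line in lines)


def save_items(items, path):
    """Write the formatted pairs to path as UTF-8 text."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(format_items(items))
```

For example, save_items(items, 'qsbk.txt'), where items is the list of tuples returned by re.findall.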
Here are the scraped jokes:
[Screenshot: the scraped jokes]
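Once the jokes are scraped, what next? Why not email them to a friend? And why write the email by hand when Python can send it?
Python's standard library has packages for talking to mail servers: smtplib sends the mail, and the email package builds the message.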
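To show how the email package assembles a message, here is a minimal Python 3 sketch with a plain-text body and one file attachment. build_message and its parameters are hypothetical; MIMEApplication is used for the attachment because it sets the application/octet-stream type and base64 encoding in one step. The From and To headers are set explicitly, which matters for the 554 spam rejection described below.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, receiver, subject, body, attachment_bytes, filename):
    """Assemble a multipart message: a plain-text body plus one file attachment."""
    msg = MIMEMultipart('mixed')  # 'mixed' is the usual subtype for body + attachments
    msg['Subject'] = subject
    # Explicit From/To headers; without them some servers treat the mail as spam.
    msg['From'] = sender
    msg['To'] = receiver
    msg.attach(MIMEText(body, 'plain', 'utf-8'))
    part = MIMEApplication(attachment_bytes)  # application/octet-stream, base64-encoded
    part.add_header('Content-Disposition', 'attachment', filename=filename)
    msg.attach(part)
    return msg
```

The result can be handed to smtplib as msg.as_string().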
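A few things to watch out for:
First, enable the SMTP service on the mailbox you're sending from: log in to the mailbox and turn it on in the settings.
You'll then be asked to set an authorization code. The password in the code below must be this authorization code, not your mailbox password.
After that, when I ran the script the mail wouldn't send and I got a 554 error: NetEase (the provider behind 126.com) had blocked it as spam. After searching around online I finally found the fix: add two lines that set the 'From' and 'To' headers to the email addresses.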
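With SMTP enabled and the authorization code in hand, sending reduces to connect, log in, send, quit. A minimal Python 3 sketch; send_message is a hypothetical name, and the SMTP class is a parameter only so it can be swapped for a stub in tests. Like the original script, it assumes the server accepts a plain connection on the default port 25; some providers require smtplib.SMTP_SSL on a different port instead.

```python
import smtplib


def send_message(msg, host, username, auth_code, smtp_class=smtplib.SMTP):
    """Log in and send an already-built message using its From and To headers.

    auth_code is the SMTP authorization code from the mailbox settings,
    not the mailbox login password.
    """
    server = smtp_class(host)  # with a host given, smtplib.SMTP connects immediately (port 25)
    try:
        server.login(username, auth_code)
        server.sendmail(msg['From'], [msg['To']], msg.as_string())
    finally:
        server.quit()  # close the session even if login or sending fails
```

With the variables from the script below, this would be send_message(msgRoot, smtpserver, username, password).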
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication

sender = '*****@126.com'
receiver = '*******@qq.com'
subject = 'python email test'
smtpserver = 'smtp.126.com'
username = '********'
password = '1111111'  # the SMTP authorization code, not the mailbox login password
# 'mixed' is the usual multipart subtype for a message carrying attachments
msgRoot = MIMEMultipart('mixed')
msgRoot['Subject'] = subject
# Set From/To explicitly to avoid being blocked as spam
msgRoot['From'] = sender
msgRoot['To'] = receiver
# Build the attachment; qsbk.txt is read from the current working directory,
# so run this from the folder the scraper saved it to
with open('qsbk.txt', 'rb') as f:
    att = MIMEApplication(f.read())  # application/octet-stream, base64-encoded
att.add_header('Content-Disposition', 'attachment', filename="it's funny.txt")
msgRoot.attach(att)
smtp = smtplib.SMTP()
smtp.connect(smtpserver)
smtp.login(username, password)
smtp.sendmail(sender, receiver, msgRoot.as_string())
smtp.quit()
print ("ok")
What next? You could write a .bat batch file and schedule it to send the email every day. Heh.