Python crawler: scraping grades from the academic affairs website
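This post is about my first Python crawler, and its target is my school's academic affairs website. The main reason I wrote it is that checking grades at my school is too much trouble. Grades are not published all at once; they are released one course at a time, at unpredictable times. So I decided to write a crawler that runs on its own and automatically emails the grades it scrapes to my mailbox. That way I no longer have to check the site myself. Enough talk, let's get started!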
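Step 1:
Before actually writing the crawler, we first need to understand how a manual grade query works: roughly what the process is, what data is submitted, and where it is submitted to. To figure this out we can use a browser plugin, HttpFox, which can be installed in Firefox; I won't go into further detail here.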
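What the capture gives you is essentially a target URL plus a set of form fields. As a rough sketch, using the Python 3 standard library rather than the Python 2 modules in the listing under step 2, those pieces could be assembled like this without sending anything. The URL, the field names, and the helper name `build_login_request` are hypothetical placeholders, not the real values from the academic affairs site:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Hypothetical values: replace them with what the traffic capture shows.
LOGIN_URL = "http://example.invalid/login"
FORM_FIELDS = {"user_field": "student_id", "password_field": "secret"}


def build_login_request(url, fields):
    """Return a POST request carrying the form fields, without sending it."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, method="POST")


# An opener with a cookie jar keeps the session cookie across later requests.
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))

request = build_login_request(LOGIN_URL, FORM_FIELDS)
print(request.get_method(), request.full_url)
print(request.data)
# opener.open(request) would actually submit the form; it is left out here.
```

Reusing the same opener for the login and for the later query is what lets the second request carry the session cookie set by the first.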
Step 2:
With the data from the previous step in hand, we can start writing the crawler code. The main part is as follows:
#-*- coding:utf-8 -*-
import urllib
import urllib2
import cookielib
from bs4 import BeautifulSoup
from email.mime.text import MIMEText
import smtplib
from email.header import Header
# Log in and query the scores
def login(user_name, pass_word, efdf):
    # Keep the session cookie between the login and the query
    cookie = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    # Login form data (field names as captured in step 1)
    postdata = urllib.urlencode({
        "_VIEWSTATEGEN": "CAA0A5A7",
        "Sel_Type": "STU",
        "txt_dsdsdsdjkjkjc": user_name,
        "txt_dsdfdfgfouyy": pass_word,
        "txt_ysdsdsdskgf": "",
        "pcInfo": "",
        "typeName": "",
        "aerererdsdxcxdfgfg": "",
        "efdfdfuuyyuuckjg": efdf
    })
    # Log in
    loginUrl = "http://202.202.1.176:8080/_data/index_login.aspx"
    html1 = opener.open(loginUrl, postdata)
    # Query the scores
    gradeUrl2 = "http://202.202.1.176:8080/xscj/Stu_MyScore_rpt.aspx"
    postdata2 = urllib.urlencode({
        "sel_xn": "2016",
        "sel_xq": "0",
        "SJ": "0",
        "SelXNXQ": "2",
        "zfx_flag": "0",
        "zxf": "0"
    })
    html2 = opener.open(gradeUrl2, postdata2)
    # Parse the result page and collect every table cell
    soup = BeautifulSoup(html2.read(), "html.parser")
    content = soup.select('td')
    # Print the result and build the text of the email
    str_content = 'Scores\n'
    i = 0
    while i < len(content):
        if i == 0 or i == 1:
            # The first two cells are printed one per line
            print content[i].text
            str_content = str_content + content[i].text + '\n'
            i = i + 1
        else:
            # Every following group of 10 cells forms one row
            for j in range(10):
                if (i + j) < len(content):
                    print (content[i + j].text),
                    str_content = str_content + content[i + j].text + ' '
                    if j == 9:
                        print ("\n")
                        str_content = str_content + '\n'
                else:
                    break
            i = i + 10
    return str_content
# Send the email
def Send_email(receiver, send_content):
    mail_host = "smtp.qq.com"
    mail_user = "1432864950@qq.com"
    mail_pass = "************"  # the mailbox authorization code goes here
    message = MIMEText(send_content, 'plain', 'utf-8')
    message['From'] = Header(mail_user, 'utf-8')
    message['To'] = Header(receiver, 'utf-8')
    subject = 'Score'
    message['Subject'] = Header(subject, 'utf-8')
    try:
        smtpObj = smtplib.SMTP_SSL(mail_host, 465)
        smtpObj.login(mail_user, mail_pass)
        # sendmail accepts a single address string or a list of addresses
        smtpObj.sendmail(mail_user, receiver, message.as_string())
        smtpObj.quit()
        return ['Send email success']
    except smtplib.SMTPException as e:
        return ['Failed to send email: %s' % e]
result1 = login("20154302","*****","70982C84EDDBBDBF2F28546AF2A6FA")
Send_email("3344963462@qq.com",result1)
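The `while` loop in `login` treats the first two `td` cells as header lines and every following run of ten cells as one course row. Here is a minimal sketch of that grouping as standalone functions, written for Python 3. The names `group_cells` and `format_scores`, the `header_count` and `row_width` parameters, and the sample cell values are made up for illustration, and the output formatting differs slightly from the listing (no trailing spaces at the end of a row):

```python
def group_cells(cells, header_count=2, row_width=10):
    """Split a flat list of cell texts into header lines and fixed-width rows."""
    headers = list(cells[:header_count])
    body = cells[header_count:]
    rows = [body[i:i + row_width] for i in range(0, len(body), row_width)]
    return headers, rows


def format_scores(cells):
    """Build the plain-text report: one header per line, one row per line."""
    headers, rows = group_cells(cells)
    lines = ["Scores"] + headers + [" ".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Made-up sample: 2 header cells followed by 2 rows of 10 cells each.
sample = ["Student: X", "Term: Y"] + [f"c{n}" for n in range(20)]
headers, rows = group_cells(sample)
print(len(headers), len(rows))
print(format_scores(sample))
```

Separating the grouping from the printing makes it easier to check the row width against the real table before any email is sent.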
This post is not a Python crawler tutorial; it is only a record of my own experience, so many of the details are left unexplained.