背景介紹
昨天接到一個需求希停,朋友有個留言板系統(tǒng)、他希望可以有個爬蟲程序、每天可以爬取一個App上的最新資訊辰企、自動發(fā)布到留言板系統(tǒng)上。
項(xiàng)目梳理
- 了解留言板系統(tǒng)
由于朋友不懂技術(shù)况鸣、所以直接把朋友整個留言板系統(tǒng)拿過來了牢贸、哇塞、打開一看镐捧、是由ASP+ACCESS 古董級搭站方式潜索、估計(jì)是從哪個寶買的系統(tǒng)...沒有去深究他
- 思考實(shí)現(xiàn)方式
- 1、軟件運(yùn)行在服務(wù)器 直接訪問ACCESS懂酱、每天直接更新到ACCESS數(shù)據(jù)庫
- 2竹习、軟件運(yùn)行在客戶端 留言板系統(tǒng)增加一個數(shù)據(jù)接口服務(wù)、客戶端將采集到的數(shù)據(jù)POST到這個數(shù)據(jù)接口服務(wù)列牺、由這個接口服務(wù)提交數(shù)據(jù)到ACCESS整陌。
- 3、留言板系統(tǒng)增加一個使用ASP語言搭建一個采集服務(wù)、留言板系統(tǒng)增加一個數(shù)據(jù)接口服務(wù)泌辫、每天直接在瀏覽器運(yùn)行這個采集服務(wù)就可以了
項(xiàng)目開始
考慮到程序簡便性随夸、和新鮮性決定使用第三種方案、使用ASP搭建采集服務(wù)和數(shù)據(jù)接口服務(wù)
項(xiàng)目實(shí)施
采集對象APP-稅問精選
關(guān)于如何采集APP上的內(nèi)容震放、稍后會有詳細(xì)介紹宾毒、在此在做簡單介紹、不做展開
- 保證手機(jī)和電腦同一局域網(wǎng)下
- 電腦開啟Fiddler4殿遂、并設(shè)置相關(guān)htts和端口
- 將Fiddle4的端口和電腦的IP配置到手機(jī)上
這時訪問APP伍俘,F(xiàn)iddler4就可以監(jiān)測到請求的header相關(guān)信息了、
具體的采集流程不做過多闡述 直接放下代碼
<!--
autor:索索軟件工作室
date:2017-01-18
QQ:859867801
-->
<%@language=vbscript codepage=65001 %>
<% Response.Charset = "utf-8" %>
<%
Server.ScriptTimeOut = 500
postUrl = "http://localhost:81/sprider_post.asp"
'獲取列表
msg = getHTTPPage("http://app.taxwen.com/taxcloud/read/find/getAllClassify")
'Response.write(msg)
'解析列表
arru = RegExpTest("ncid"":""(.*?)""", msg)
arruText = RegExpTest("name"":""(.*?)""", msg)
for i=0 to ubound(arru)-1
itemUrl = "http://app.taxwen.com/taxcloud/read/find/getSubClassById?ncid="+ arru(i)
'response.write(itemUrl&"<br>")
msgItem = getHTTPPage(itemUrl)
'response.write(msgItem&"<br>")
arruItem = RegExpTest("cid"":""(.*?)""", msgItem)
arruItemText = RegExpTest("name"":""(.*?)""", msgItem)
for j=0 to ubound(arruItem)-1
itemUrlList = "http://app.taxwen.com/taxcloud/read/findlist/newslist?cid="+arruItem(j)+"&pageNo=1"
'response.write(itemUrlList&"<br>")
msgItemList = getHTTPPage(itemUrlList)
'response.write(msgItemList&"<br>")
arruItemDet = RegExpTest("docid"":""(.*?)""", msgItemList)
arruItemDetTime = RegExpTest("indate"":(.*?),", msgItemList)
arruItemDetText = RegExpTest("title"":""(.*?)""", msgItemList)
for k=0 to ubound(arruItemDet)-1
ctime = CDbl(arruItemDetTime(k))
nTime = CDbl(getTime())
If ctime > nTime Then
itemUrlListDet = "http://app.taxwen.com/taxcloud/read/findlist/getnewscontent?docid="+arruItemDet(k)+"&userid= "
'response.write("ctime:"&ctime&"-getTime:"&getTime()&"-"&FromUnixTime(ctime, +8)&"大余"&FromUnixTime(getTime(), +8)&"<br>")
msgItemListDet = getHTTPPage(itemUrlListDet)
title = RegExpTest("<title>(.*?)</title>", msgItemListDet)
txt = RegExpTest("<div style=""border-top: solid 1px #eee;""></div>([\s\S]*?)</div>", msgItemListDet)
If IsEmpty(title)=False And IsEmpty(txt)=False And ubound(txt)>=1 And ubound(title)>=1 Then
txtsrc = txt(0)
arruItemDetImg = RegExpTest("img data-original=""(.*?)""", txtsrc)
for n=0 to ubound(arruItemDetImg)-1
patrn = "<img data-original="""+arruItemDetImg(n)+""" src=""./media/jquery/loading.gif"" style=""max-width:100%"">"
replStr = "[img]"&arruItemDetImg(n)&"[/img]"
txtsrc = ReplaceHTML(txtsrc, patrn, replStr)
Next
response.write arruText(i)&"-"&arruItemText(j)&"-"&arruItemDetText(k)&"-"&FromUnixTime(ctime, +8)
param = "title="&title(0)&"&txt="+txtsrc&"&homepage="+itemUrlListDet
srst = PostHTTPPage(postUrl, param)
rst = CDbl(srst)
If rst > 0 Then
response.write ":上傳成功"&"<br>"
ElseIf rst < 0 Then
response.write ":已存在"&"<br>"
Else
response.write ":上傳失敗"&"<br>"
End If
End If
End If
Next
Next
Next
Function FromUnixTime(intTime, intTimeZone)
If Len(intTime) =13 Then
intTime = left(intTime, 10)
End if
If IsEmpty(intTime) or Not IsNumeric(intTime) Then
FromUnixTime = Now()
Exit Function
End If
If IsEmpty(intTime) or Not IsNumeric(intTimeZone) Then intTimeZone = 0
FromUnixTime = DateAdd("s", intTime, "1970-01-01 00:00:00")
FromUnixTime = DateAdd("h", intTimeZone, FromUnixTime)
End Function
Public Function getTime()
getTime = DateDiff("s", "1970-01-01 08:00:00", Date()) * 1000 + Int(CDbl(Timer()) * 1000)-60*60*24*3*1000
End Function
function PostHTTPPage(url,data)
dim Http
set Http=server.createobject("MSXML2.SERVERXMLHTTP.3.0")
Http.open "POST",url,false
Http.setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded"
Http.send(data)
if Http.readystate<>4 then
exit function
End if
PostHTTPPage=bytesToBSTR1(Http.responseBody,"utf-8")
set http=nothing
End Function
Function bytesToBSTR1(body,Cset)
if lenb(body)=0 then
bytesToBSTR1=""
exit function
end if
dim mystream
set mystream=server.createobject("adodb.stream")
mystream.type=2
mystream.mode=3
mystream.open
mystream.writetext body
mystream.position=0
mystream.charset=Cset
mystream.position=2
bstr=mystream.readtext()
mystream.close
set mystream=nothing
bytesToBSTR1=bstr
End Function
Function getHTTPPage(url)
dim objXML
set objXML=createobject("MSXML2.XMLHTTP")
objXML.open "get",url,false
objXML.send()
If objXML.readystate<>4 then
exit function
End If
getHTTPPage=bytesToBSTR1(objXML.responseBody,"utf-8")
set objXML=nothing
if err.number<>0 then err.Clear
End Function
Function RegExpTest(patrn, strng)
Dim regEx, Match, Matches ' 建立變量勉躺。
Set regEx = New RegExp ' 建立正則表達(dá)式癌瘾。
regEx.Pattern = patrn ' 設(shè)置模式。
regEx.IgnoreCase = True ' 設(shè)置是否區(qū)分大小寫饵溅。
regEx.Global = True ' 設(shè)置全程可用性妨退。
Set Matches = regEx.Execute(strng) ' 執(zhí)行搜索。
For Each Match in Matches ' 遍歷 Matches 集合蜕企。
RetStr = RetStr & Match.SubMatches(0) & "," '值為123和44的數(shù)組
Next
RegExpTest = Split(RetStr, ",")
End Function
'正則替換函數(shù)
Function ReplaceHTML(srcstr, patrn, replStr)
Set regEx = New RegExp
regEx.Pattern = patrn
regEx.IgnoreCase = True
regEx.Global = True
regEx.Execute(srcstr)
ReplaceHTML = regEx.Replace(srcstr, replStr)
Set regEx = Nothing
End Function
%>```
## 具體的數(shù)據(jù)接口服務(wù)
<%@language=vbscript codepage=65001 %>
<% Response.Charset = "utf-8" %>
<%
Server.ScriptTimeOut = 500
UserName = "Admin-S"
Face = ""
sex = ""
HomePage = Request.form("homepage")
Email = "admin6@qq.com"
Subject = Request.form("title")
content = Request.form("txt")
content = Replace(content,"imgsrc=", "img src=")
IPinfo = "127.0.0.1"
bookdate = now
pic = "p16.gif"
secret = "0"
qq = "25250508"
mark = "0"
fontcolor = "標(biāo)題醒目"
Set rs11 = Server.CreateObject( "ADODB.Recordset" )
rs11.open "Select * From guest where subject = '"&Subject&"' and HomePage = '"&HomePage&"'order by id desc" ,Conn,1,1
id=rs11("id")
rs11.close
If id > 0 Then
Response.write -1
set rs11=Nothing
Else
sql="Insert Into guest (username,face,sex,homepage,mail,subject,content,IP,lydate,lastdate,pic,secret,qq,lastname,mark,fontcolor) Values('"& UserName &"','"& Face &"','"& sex &"','"& HomePage &"', '"& Email &"','"& Subject &"','"&content &"','"& IPinfo &"','"& bookdate &"','"& bookdate &"','"& pic &"',"& secret &",'"&qq&"','——',"&mark&",'"&fontcolor&"')"
conn.Execute sql
Set rs = Server.CreateObject( "ADODB.Recordset" )
rs.open "Select * From guest order by id desc" ,Conn,1,1
id=rs("id")
rs.close
Response.write(id)
set rs=Nothing
End If
conn.close
%>
# 結(jié)束
> 以上有問題咬荷,歡迎留言
整個留言版系統(tǒng)
[git源碼包](https://github.com/ZhouYoung/ly_web)