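I. Configuring Web Scraper
Install Web Scraper from the Chrome extension store. The installation process is not covered here.
After installation, press F12 on a browser page to open the developer tools (the console), then click the Web Scraper tab to start working.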
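II. Basic content scraping operations
1. Scraping content from multiple pages with the same layout
You can put a regular-expression-style range in the start URL to loop over the specified pages, for example [x-y].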
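To make the effect of an [x-y] range concrete, here is a minimal Python sketch of how such a pattern can be expanded into one URL per page. It only illustrates the idea and is not the extension's actual implementation; the function name expand_range_url and the example.com address are assumptions made for this example.

```python
import re
from typing import List

# Matches a numeric range such as [1-9] inside a URL.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url: str) -> List[str]:
    """Expand the first [x-y] range in url into one URL per page number."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        return [url]  # no range present, so there is only one page
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    # The range is inclusive on both ends: [1-3] yields pages 1, 2 and 3.
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


# Hypothetical URL, used only to show the expansion.
for page_url in expand_range_url("http://example.com/list?page=[1-3]"):
    print(page_url)
```

With a start URL ending in page=[1-9], the same expansion would produce nine page URLs, which the scraper then visits one after another.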
2. Displaying table data row by row
Set "multiple" to true for the selector of the first column, and set "multiple" to false for the selectors of all other columns.
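The rule above can be sketched as a small Python helper that generates the selector entries for a table. The field names mirror the sitemap JSON format that Web Scraper exports; the helper name table_selectors is an assumption for illustration, and the output has not been verified against a particular version of the extension.

```python
import json


def table_selectors(columns):
    """Build one SelectorText entry per (id, css_selector) column pair.

    Only the first column gets "multiple": true, following the rule
    stated above; every other column gets "multiple": false.
    """
    selectors = []
    for index, (selector_id, css) in enumerate(columns):
        selectors.append({
            "parentSelectors": ["_root"],
            "type": "SelectorText",
            "multiple": index == 0,  # True only for the first column
            "id": selector_id,
            "selector": css,
            "regex": "",
            "delay": "",
        })
    return selectors


columns = [("name", "td.table-com-name a"), ("date", "td.table-time")]
print(json.dumps(table_selectors(columns), indent=2))
```

The printed list can be pasted into the "selectors" field of a sitemap before importing it.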
3. Scraping elements from sub-pages
Create a link selector, then use it as the parent node of the selectors that extract elements from the sub-page.
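The parent and child relationship can be sketched in Python as a sitemap in which a text selector names the link selector as its parent. This is a minimal sketch under assumptions: the sitemap name, the example.com URL, the "detail" selector and its CSS selector "h1" are all hypothetical, and children_of is a helper written only for this example.

```python
import json

# Hypothetical sitemap: "detail" is scraped on the page that "link" opens.
sitemap = {
    "_id": "subpage-demo",
    "startUrl": "http://example.com/list",
    "selectors": [
        {
            "parentSelectors": ["_root"],
            "type": "SelectorLink",
            "multiple": True,
            "id": "link",
            "selector": "td.table-com-name a",
            "delay": "",
        },
        {
            # Naming "link" as the parent makes this selector run on the sub-page.
            "parentSelectors": ["link"],
            "type": "SelectorText",
            "multiple": False,
            "id": "detail",
            "selector": "h1",
            "regex": "",
            "delay": "",
        },
    ],
}


def children_of(sitemap, parent_id):
    """Return the ids of the selectors whose parent list contains parent_id."""
    return [s["id"] for s in sitemap["selectors"]
            if parent_id in s["parentSelectors"]]


print(children_of(sitemap, "link"))
print(json.dumps(sitemap))
```

The JSON printed on the last line has the same shape as an exported sitemap, so it can be used as a starting point for an import.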
A simple example:
{"selectors":[{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"link","selector":"td.table-com-name a","delay":""},{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"name","selector":"td.table-com-name a","regex":"","delay":""},{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"date","selector":"td.table-time","regex":"","delay":""},{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"jieduan","selector":"td.table-stage a","regex":"","delay":""},{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"lingyu","selector":"td.table-type a","regex":"","delay":""}],"startUrl":"http://www.cyzone.cn/index.php?c=index&a=init&tpl=dbsearch&wq=%E5%86%9C%E6%9D%91&modelid=18&page=[1-9]","_id":"nongcun2"}