An Open-Source Python IP Proxy Pool -- IPProxys

The blog is back to regular updates; thanks to everyone for your attention and support. Over the past few days I have been working on an open-source IP proxy pool. As the previous posts showed, one of the most important ways to get past anti-crawler mechanisms is to use proxy IPs. A large, stable pool of proxies is very valuable for crawling, but stable commercial pools are expensive. That is where this project comes in: it crawls the free IPs published by proxy-listing sites (roughly 70% of them are unusable, but the sheer volume and the number of sites make up for that), checks which ones actually work, stores the good ones in a database, and runs an HTTP server that exposes an API your crawlers can call. (My new book 《Python爬蟲開發與項目實戰》 has been published; sample chapters are available.)

Enough preamble; let's get to today's topic, a walkthrough of my open-source project IPProxys.

下面是這個(gè)項(xiàng)目的工程結(jié)構(gòu):

api package: implements the HTTP server that exposes the API (GET requests returning JSON data)

data folder: stores the database file and qqwry.dat (used to look up an IP's geographic location)

db package: wraps the database operations

spider package: the crawler core, which scrapes proxy IPs from the proxy-listing sites

test package: test cases; not part of the project's normal run

util package: utility classes; IPAddress.py looks up an IP's geographic location

validator package: checks whether a proxy IP is usable

config.py: configuration (including how each proxy site is parsed, and the database settings)

接下來講一下關(guān)鍵代碼:

首先說一下apiServer.py:


#coding:utf-8
'''
Supported query keywords: count, types, protocol, country, area
'''
import urllib

from config import API_PORT
from db.SQLiteHelper import SqliteHelper

__author__ = 'Xaxdus'

import BaseHTTPServer
import json
import urlparse

# keylist=['count', 'types','protocol','country','area']
class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_GET(self):
        # Parse the query string into a key/value dict
        params = {}
        parsed_path = urlparse.urlparse(self.path)
        try:
            query = urllib.unquote(parsed_path.query)
            print query
            if query.find('&') != -1:
                for param in query.split('&'):
                    params[param.split('=')[0]] = param.split('=')[1]
            else:
                params[query.split('=')[0]] = query.split('=')[1]

            # Build the LIMIT clause and the WHERE conditions from the parameters
            str_count = ''
            conditions = []
            for key in params:
                if key == 'count':
                    str_count = 'LIMIT 0,%s' % params[key]
                if key == 'country' or key == 'area':
                    conditions.append(key + " LIKE '" + params[key] + "%'")
                elif key == 'types' or key == 'protocol':
                    conditions.append(key + "=" + params[key])
            if len(conditions) > 1:
                conditions = ' AND '.join(conditions)
            else:
                conditions = conditions[0]

            # Query the database and send the rows back as JSON
            sqlHelper = SqliteHelper()
            result = sqlHelper.select(sqlHelper.tableName, conditions, str_count)
            print result
            data = json.dumps(result)
            self.send_response(200)
            self.end_headers()
            self.wfile.write(data)
        except Exception, e:
            print e
            self.send_response(404)


if __name__ == '__main__':
    server = BaseHTTPServer.HTTPServer(('0.0.0.0', API_PORT), WebRequestHandler)
    server.serve_forever()

As the code shows, the handler parses the query parameters: count (how many proxies to return), types (anonymity: 0 = high anonymity, 1 = transparent), protocol (0 = http, 1 = https), country, and area (province/city). For example, requesting http://127.0.0.1:8000/?count=8&types=0 returns the matching proxies as JSON.
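To give a sense of how a crawler would consume this interface, here is a minimal sketch of my own (not part of the project). It assumes the server is running locally on the default API_PORT 8000 and that the response body is a JSON array of [ip, port] pairs, which is what json.dumps produces for the rows returned by SqliteHelper.select; the target URL is just a placeholder.

#coding:utf-8
import json
import random

import requests


def fetch_proxies(count=8, types=0):
    # Ask IPProxys for `count` high-anonymity proxies (types=0)
    url = 'http://127.0.0.1:8000/?count=%d&types=%d' % (count, types)
    return json.loads(requests.get(url).text)


proxies = fetch_proxies()
ip, port = random.choice(proxies)
# Route an ordinary crawl request through one of the returned proxies
r = requests.get('http://www.example.com',
                 proxies={'http': 'http://%s:%s' % (ip, port)},
                 timeout=5)
print r.status_code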

Next, SQLiteHelper.py, which wraps the SQLite operations:


#coding:utf-8
from config import DB_CONFIG
from db.SqlHelper import SqlHelper

__author__ = 'Xaxdus'

import sqlite3


class SqliteHelper(SqlHelper):

    tableName = 'proxys'

    def __init__(self):
        '''
        Open the database connection and make sure the table exists.
        '''
        # check_same_thread=False lets the crawler, validator and API share the connection
        self.database = sqlite3.connect(DB_CONFIG['dbPath'], check_same_thread=False)
        self.cursor = self.database.cursor()
        # Create the table structure if it is not there yet
        self.createTable()

    def createTable(self):
        self.cursor.execute("create TABLE IF NOT EXISTS %s (id INTEGER PRIMARY KEY ,ip VARCHAR(16) NOT NULL,"
               "port INTEGER NOT NULL ,types INTEGER NOT NULL ,protocol INTEGER NOT NULL DEFAULT 0,"
               "country VARCHAR (20) NOT NULL,area VARCHAR (20) NOT NULL,updatetime TimeStamp NOT NULL DEFAULT (datetime('now','localtime')) ,speed DECIMAL(3,2) NOT NULL DEFAULT 100)" % self.tableName)
        self.database.commit()

    def select(self, tableName, condition, count):
        '''
        :param tableName: table name
        :param condition: WHERE clause, already rendered as a string
        :param count: LIMIT clause, e.g. 'LIMIT 0,8'
        :return: list of (ip, port) tuples ordered by speed
        '''
        command = 'SELECT DISTINCT ip,port FROM %s WHERE %s ORDER BY speed ASC %s ' % (tableName, condition, count)
        self.cursor.execute(command)
        result = self.cursor.fetchall()
        return result

    def selectAll(self):
        self.cursor.execute('SELECT DISTINCT ip,port FROM %s ORDER BY speed ASC ' % self.tableName)
        result = self.cursor.fetchall()
        return result

    def selectCount(self):
        self.cursor.execute('SELECT COUNT( DISTINCT ip) FROM %s' % self.tableName)
        count = self.cursor.fetchone()
        return count

    def selectOne(self, tableName, condition, value):
        '''
        :param tableName: table name
        :param condition: condition containing placeholders
        :param value: values for the placeholders (mainly to prevent SQL injection)
        :return:
        '''
        self.cursor.execute('SELECT DISTINCT ip,port FROM %s WHERE %s ORDER BY speed ASC' % (tableName, condition), value)
        result = self.cursor.fetchone()
        return result

    def update(self, tableName, condition, value):
        self.cursor.execute('UPDATE %s %s' % (tableName, condition), value)
        self.database.commit()

    def delete(self, tableName, condition):
        '''
        :param tableName: table name
        :param condition: condition
        :return:
        '''
        deleCommand = 'DELETE FROM %s WHERE %s' % (tableName, condition)
        self.cursor.execute(deleCommand)
        self.commit()

    def commit(self):
        self.database.commit()

    def insert(self, tableName, value):
        proxy = [value['ip'], value['port'], value['type'], value['protocol'], value['country'], value['area'], value['speed']]
        # Placeholders are used for the values; the commit happens in batch_insert
        self.cursor.execute("INSERT INTO %s (ip,port,types,protocol,country,area,speed)VALUES (?,?,?,?,?,?,?)" % tableName
                            , proxy)

    def batch_insert(self, tableName, values):
        for value in values:
            if value != None:
                self.insert(self.tableName, value)
        self.database.commit()

    def close(self):
        self.cursor.close()
        self.database.close()


if __name__ == "__main__":
    s = SqliteHelper()
    print s.selectCount()[0]
    # print s.selectAll()
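Before moving on, here is a small usage sketch of my own, not part of the project: it assumes DB_CONFIG['dbPath'] points at a writable ./data/proxy.db, and the proxy dict layout mirrors what Html_Parser.parse (shown below) produces.

#coding:utf-8
from db.SQLiteHelper import SqliteHelper

helper = SqliteHelper()
# Insert one (made-up) proxy record; batch_insert commits at the end
helper.batch_insert(helper.tableName, [
    {'ip': '1.2.3.4', 'port': 8080, 'type': 0, 'protocol': 0,
     'country': u'中國', 'area': u'北京市', 'speed': 100},
])
# The same kind of condition string that apiServer.py builds from the query string
print helper.select(helper.tableName, "types=0 AND protocol=0", 'LIMIT 0,5')
helper.close()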

HtmlPraser.py (parses the HTML of the proxy sites):

Parsing is done with lxml's XPath support.


#coding:utf-8
import datetime
from config import QQWRY_PATH, CHINA_AREA

from util.IPAddress import IPAddresss
from util.logger import logger

__author__ = 'Xaxdus'
from lxml import etree


class Html_Parser(object):

    def __init__(self):
        self.ips = IPAddresss(QQWRY_PATH)

    def parse(self, response, parser):
        '''
        :param response: page content returned by the downloader
        :param parser: parsing rule for this site, taken from config.py
                       (note the key really is spelled 'postion' in the project)
        :return: list of proxy dicts
        '''
        if parser['type'] == 'xpath':
            proxylist = []
            root = etree.HTML(response)
            proxys = root.xpath(parser['pattern'])
            for proxy in proxys:
                ip = proxy.xpath(parser['postion']['ip'])[0].text
                port = proxy.xpath(parser['postion']['port'])[0].text
                type = proxy.xpath(parser['postion']['type'])[0].text
                # 0 = high anonymity, 1 = transparent
                if type.find(u'高匿') != -1:
                    type = 0
                else:
                    type = 1
                # 0 = http, 1 = https (some sites do not list the protocol at all)
                protocol = ''
                if len(parser['postion']['protocol']) > 0:
                    protocol = proxy.xpath(parser['postion']['protocol'])[0].text
                    if protocol.lower().find('https') != -1:
                        protocol = 1
                    else:
                        protocol = 0
                else:
                    protocol = 0
                # Look up the geographic location with qqwry.dat
                addr = self.ips.getIpAddr(self.ips.str2ip(ip))
                country = ''
                area = ''
                if addr.find(u'省') != -1 or self.AuthCountry(addr):
                    country = u'中國'
                    area = addr
                else:
                    country = addr
                    area = ''
                # ip, port, type (0 high anonymity, 1 transparent), protocol (0 http, 1 https),
                # country, area (province/city), speed (100 until the proxy has been validated)
                proxy = {'ip': ip, 'port': int(port), 'type': int(type), 'protocol': int(protocol),
                         'country': country, 'area': area, 'speed': 100}
                print proxy
                proxylist.append(proxy)

            return proxylist

    def AuthCountry(self, addr):
        '''
        Decide whether the address belongs to China.
        :param addr:
        :return:
        '''
        for area in CHINA_AREA:
            if addr.find(area) != -1:
                return True
        return False
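The parse method is driven by per-site rules defined in the project's config.py, which are not reproduced in this post. Purely as an illustration of the shape it expects, a hypothetical rule might look like the following; the XPath expressions here are made up and would have to match the real site's markup.

# A hypothetical parser rule, only to illustrate the structure Html_Parser.parse
# expects; the real entries live in the project's config.py.
example_parser = {
    'type': 'xpath',                                        # tells parse() to use the XPath branch
    'pattern': ".//table[@id='ip_list']/tr[position()>1]",  # one node per proxy row (made up)
    'postion': {                                            # note: the key really is spelled 'postion'
        'ip': './td[2]',
        'port': './td[3]',
        'type': './td[5]',
        'protocol': './td[6]'
    }
}
# proxylist = Html_Parser().parse(html_text, example_parser)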


IPAddresss.py locates an IP address by reading the 純真 database qqwry.dat. For the file format and how to read it, see: http://ju.outofmemory.cn/entry/85998 and https://linuxtoy.org/archives/python-ip.html


#! /usr/bin/env python
# -*- coding: utf-8 -*-

import socket
import struct


class IPAddresss:
    def __init__(self, ipdbFile):
        # The first 8 bytes of qqwry.dat hold the offsets of the first and last index entries
        self.ipdb = open(ipdbFile, "rb")
        str = self.ipdb.read(8)
        (self.firstIndex, self.lastIndex) = struct.unpack('II', str)
        self.indexCount = (self.lastIndex - self.firstIndex)/7+1
        # print self.getVersion(), u"total records: %d" % self.indexCount

    def getVersion(self):
        s = self.getIpAddr(0xffffff00L)
        return s

    def getAreaAddr(self, offset=0):
        if offset:
            self.ipdb.seek(offset)
        str = self.ipdb.read(1)
        (byte,) = struct.unpack('B', str)
        if byte == 0x01 or byte == 0x02:
            # redirect: the area string is stored elsewhere
            p = self.getLong3()
            if p:
                return self.getString(p)
            else:
                return ""
        else:
            self.ipdb.seek(-1, 1)
            return self.getString(offset)

    def getAddr(self, offset, ip=0):
        self.ipdb.seek(offset + 4)
        countryAddr = ""
        areaAddr = ""
        str = self.ipdb.read(1)
        (byte,) = struct.unpack('B', str)
        if byte == 0x01:
            # mode 1: both country and area are redirected
            countryOffset = self.getLong3()
            self.ipdb.seek(countryOffset)
            str = self.ipdb.read(1)
            (b,) = struct.unpack('B', str)
            if b == 0x02:
                countryAddr = self.getString(self.getLong3())
                self.ipdb.seek(countryOffset + 4)
            else:
                countryAddr = self.getString(countryOffset)
            areaAddr = self.getAreaAddr()
        elif byte == 0x02:
            # mode 2: only the country string is redirected
            countryAddr = self.getString(self.getLong3())
            areaAddr = self.getAreaAddr(offset + 8)
        else:
            # the strings are stored inline
            countryAddr = self.getString(offset + 4)
            areaAddr = self.getAreaAddr()
        return countryAddr + " " + areaAddr

    def dump(self, first, last):
        if last > self.indexCount:
            last = self.indexCount
        for index in range(first, last):
            offset = self.firstIndex + index * 7
            self.ipdb.seek(offset)
            buf = self.ipdb.read(7)
            (ip, of1, of2) = struct.unpack("IHB", buf)
            address = self.getAddr(of1 + (of2 << 16))
            # convert from GBK to UTF-8
            address = unicode(address, 'gbk').encode("utf-8")
            print "%d\t%s\t%s" % (index, self.ip2str(ip), address)

    def setIpRange(self, index):
        offset = self.firstIndex + index * 7
        self.ipdb.seek(offset)
        buf = self.ipdb.read(7)
        (self.curStartIp, of1, of2) = struct.unpack("IHB", buf)
        self.curEndIpOffset = of1 + (of2 << 16)
        self.ipdb.seek(self.curEndIpOffset)
        buf = self.ipdb.read(4)
        (self.curEndIp,) = struct.unpack("I", buf)

    def getIpAddr(self, ip):
        # binary search over the index for the range containing this IP
        L = 0
        R = self.indexCount - 1
        while L < R-1:
            M = (L + R) / 2
            self.setIpRange(M)
            if ip == self.curStartIp:
                L = M
                break
            if ip > self.curStartIp:
                L = M
            else:
                R = M
        self.setIpRange(L)
        # version information, 255.255.255.X, ugly but useful
        if ip & 0xffffff00L == 0xffffff00L:
            self.setIpRange(R)
        if self.curStartIp <= ip <= self.curEndIp:
            address = self.getAddr(self.curEndIpOffset)
            # convert from GBK to unicode
            address = unicode(address, 'gbk')
        else:
            address = u"未找到該IP的地址"  # no address found for this IP
        return address

    def getIpRange(self, ip):
        self.getIpAddr(ip)
        range = self.ip2str(self.curStartIp) + ' - ' \
            + self.ip2str(self.curEndIp)
        return range

    def getString(self, offset=0):
        # read a NUL-terminated GBK string starting at offset
        if offset:
            self.ipdb.seek(offset)
        str = ""
        ch = self.ipdb.read(1)
        (byte,) = struct.unpack('B', ch)
        while byte != 0:
            str += ch
            ch = self.ipdb.read(1)
            (byte,) = struct.unpack('B', ch)
        return str

    def ip2str(self, ip):
        return str(ip >> 24)+'.'+str((ip >> 16) & 0xffL)+'.'+str((ip >> 8) & 0xffL)+'.'+str(ip & 0xffL)

    def str2ip(self, s):
        (ip,) = struct.unpack('I', socket.inet_aton(s))
        return ((ip >> 24) & 0xffL) | ((ip & 0xffL) << 24) | ((ip >> 8) & 0xff00L) | ((ip & 0xff00L) << 8)

    def getLong3(self, offset=0):
        # read a 3-byte little-endian offset
        if offset:
            self.ipdb.seek(offset)
        str = self.ipdb.read(3)
        (a, b) = struct.unpack('HB', str)
        return (b << 16) + a
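As a quick sanity check of the lookup, here is a small sketch of my own, assuming qqwry.dat has been downloaded into ./data, which is where QQWRY_PATH in config.py points:

#coding:utf-8
from util.IPAddress import IPAddresss

ips = IPAddresss('./data/qqwry.dat')
# str2ip converts dotted notation to the integer form getIpAddr expects
print ips.getIpAddr(ips.str2ip('114.114.114.114'))
print ips.getIpRange(ips.str2ip('114.114.114.114'))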







Finally, validator.py. Since the project targets Python 2.7, coroutines are provided by gevent:


#coding:utf-8
import datetime
from gevent.pool import Pool
import requests
import time
from config import TEST_URL
import config
from db.SQLiteHelper import SqliteHelper
from gevent import monkey
# note: gevent recommends running patch_all() before importing socket-using modules such as requests
monkey.patch_all()
__author__ = 'Xaxdus'


class Validator(object):

    def __init__(self):
        self.detect_pool = Pool(config.THREADNUM)

    # Note: this second __init__ replaces the one above; the validator is
    # always constructed with an existing SqliteHelper.
    def __init__(self, sqlHelper):
        self.detect_pool = Pool(config.THREADNUM)
        self.sqlHelper = sqlHelper

    def run_db(self):
        '''
        Validate the proxies that are already in the database.
        :return: the number of proxies left after validation
        '''
        try:
            # First delete everything that has gone stale
            self.deleteOld()
            # Then check whether the remaining IPs are still usable
            results = self.sqlHelper.selectAll()
            self.detect_pool.map(self.detect_db, results)
            return self.sqlHelper.selectCount()  # final count
        except Exception, e:
            print e
            return 0

    def run_list(self, results):
        '''
        Validate freshly crawled proxies before they reach the database,
        dropping the dead ones directly from the list.
        :param results:
        :return:
        '''
        proxys = self.detect_pool.map(self.detect_list, results)
        # proxys is now a list of dicts: [{}, {}, {}, ...]
        return proxys

    def deleteOld(self):
        '''
        Delete stale records.
        :return:
        '''
        condition = "updatetime<'%s'" % ((datetime.datetime.now() - datetime.timedelta(minutes=config.MAXTIME)).strftime('%Y-%m-%d %H:%M:%S'))
        self.sqlHelper.delete(SqliteHelper.tableName, condition)

    def detect_db(self, result):
        '''
        Check one (ip, port) row from the database.
        :param result: row from the database
        :return:
        '''
        ip = result[0]
        port = str(result[1])
        proxies = {"http": "http://%s:%s" % (ip, port)}
        start = time.time()
        try:
            r = requests.get(url=TEST_URL, headers=config.HEADER, timeout=config.TIMEOUT, proxies=proxies)
            if not r.ok:
                condition = "ip='" + ip + "' AND " + 'port=' + port
                print 'fail ip =%s' % ip
                self.sqlHelper.delete(SqliteHelper.tableName, condition)
            else:
                # keep the proxy and record how fast it answered
                speed = round(time.time() - start, 2)
                self.sqlHelper.update(SqliteHelper.tableName, 'SET speed=? WHERE ip=? AND port=?', (speed, ip, port))
                print 'success ip =%s,speed=%s' % (ip, speed)
        except Exception, e:
            condition = "ip='" + ip + "' AND " + 'port=' + port
            print 'fail ip =%s' % ip
            self.sqlHelper.delete(SqliteHelper.tableName, condition)

    def detect_list(self, proxy):
        '''
        Check one freshly crawled proxy dict.
        :param proxy: proxy dict
        :return: the dict with its measured speed, or None if it failed
        '''
        ip = proxy['ip']
        port = proxy['port']
        proxies = {"http": "http://%s:%s" % (ip, port)}
        start = time.time()
        try:
            r = requests.get(url=TEST_URL, headers=config.HEADER, timeout=config.TIMEOUT, proxies=proxies)
            if not r.ok:
                print 'fail ip =%s' % ip
                proxy = None
            else:
                speed = round(time.time() - start, 2)
                print 'success ip =%s,speed=%s' % (ip, speed)
                proxy['speed'] = speed
        except Exception, e:
            print 'fail ip =%s' % ip
            proxy = None
        return proxy


if __name__ == '__main__':
    # v = Validator()
    # results = [{'ip': '192.168.1.1', 'port': 80}] * 10
    # results = v.run(results)
    # print results
    pass
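The validator is driven from IPProxys.py, which is not reproduced here; the following is only a rough illustration of that wiring, and the import path of Validator is an assumption.

#coding:utf-8
from db.SQLiteHelper import SqliteHelper
from validator.Validator import Validator  # module path assumed; adjust to the repo's actual layout

sqlHelper = SqliteHelper()
v = Validator(sqlHelper)
count = v.run_db()  # re-check every proxy already stored in proxy.db
print 'proxies left after validation:', count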


Finally, let's see it run. Switch to the project directory and run python IPProxys.py from the command line:

Then enter a request in the browser and the corresponding result is returned.

The execution flow: every half hour the IPs in the database are re-checked and invalid proxies are deleted. If the number of IPs drops below a threshold, the crawler starts a new round of crawling. Both the check interval and the thresholds can be configured in config.py; a look at part of config.py makes this clear:


'''
Database configuration
'''
import random  # used by HEADER below (random.choice over USER_AGENTS); add it if it is not already imported earlier in config.py

DB_CONFIG = {
    'dbType': 'sqlite',            # sqlite, mysql, mongodb
    'dbPath': './data/proxy.db',   # only meaningful for sqlite
    'dbUser': '',                  # user name
    'dbPass': '',                  # password
    'dbName': ''                   # database name
}



CHINA_AREA = [u'河北', u'山東', u'遼寧', u'黑龍江', u'吉林',
              u'甘肅', u'青海', u'河南', u'江蘇', u'湖北', u'湖南',
              u'江西', u'浙江', u'廣東', u'云南', u'福建',
              u'臺灣', u'海南', u'山西', u'四川', u'陜西',
              u'貴州', u'安徽', u'重慶', u'北京', u'上海', u'天津',
              u'廣西', u'內蒙', u'西藏', u'新疆', u'寧夏', u'香港', u'澳門']
QQWRY_PATH = "./data/qqwry.dat"



THREADNUM = 20
API_PORT = 8000

'''
Settings for crawling and validating IPs.
There is no need to check whether an IP already exists, because stale records are cleaned out on a schedule.
'''
UPDATE_TIME = 30*60   # check once every half hour whether any proxy IPs have gone dead
MINNUM = 500          # when the number of valid IPs drops below this value, the crawler is started
MAXTIME = 24*60       # maximum age (in minutes) of a stored record; anything older is deleted

TIMEOUT = 5           # socket timeout


'''
Settings for getting past anti-crawler measures
'''
'''
Number of retries
'''
RETRY_TIME = 3


'''
USER_AGENTS: a pool of User-Agent strings to pick from at random
'''
USER_AGENTS = [

    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",

    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",

    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",

    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",

    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",

    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",

    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",

    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",

    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",

    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",

    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",

    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",

    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",

    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",

    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",

    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",

    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",

    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",

    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",

    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",

    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",

    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",

    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",

    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",

    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",

    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",

    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",

    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",

    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",

    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",

    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",

    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",

    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",

    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10"

]



HEADER = {

    'User-Agent': random.choice(USER_AGENTS),

    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

    'Accept-Language': 'en-US,en;q=0.5',

    'Connection': 'keep-alive',

    'Accept-Encoding': 'gzip, deflate',

}



TEST_URL='http://www.ip138.com/'
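With these values, the half-hourly flow described above can be sketched roughly as follows. IPProxys.py itself is not shown in this post, so the loop structure and the startSpider() helper are assumptions of mine, not the project's actual code.

#coding:utf-8
import time

import config
from db.SQLiteHelper import SqliteHelper
from validator.Validator import Validator  # module path assumed; adjust to the repo


def startSpider(sqlHelper):
    # Placeholder for the spider package: crawl the proxy sites, validate the
    # fresh IPs with Validator.run_list and batch_insert the good ones.
    pass


def loop():
    sqlHelper = SqliteHelper()
    validator = Validator(sqlHelper)
    while True:
        count = validator.run_db()                # drop stale and dead proxies
        if count and count[0] < config.MINNUM:    # pool is running low
            startSpider(sqlHelper)
        time.sleep(config.UPDATE_TIME)            # wait half an hour


if __name__ == '__main__':
    loop()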

整個(gè)項(xiàng)目的代碼很簡(jiǎn)單学搜,大家如果想深入了解的話娃善,就詳細(xì)的看一下我的這個(gè)開源項(xiàng)目IPProxys代碼,代碼寫的有點(diǎn)粗糙,日后再繼續(xù)優(yōu)化瑞佩。





完整的代碼我已經(jīng)上傳到github上:https://github.com/qiyeboy/IPProxys
qqwry.dat下載鏈接:http://pan.baidu.com/s/1o7A6n8m 密碼:wcvs聚磺。


今天的分享就到這里,如果大家覺得還可以呀炬丸,記得贊賞呦瘫寝。
