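App packet capture

Environment setup

Installing the Android emulator

Download the Nox (夜神) Android emulator from its official website and install it.

Installing the capture tools

Installing Appium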

https://github.com/appium/appium-desktop/releases/tag/v1.11.0

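Installing mitmproxy

Download the installer and click Next through the wizard:
https://github.com/mitmproxy/mitmproxy/releases/

After installation, add the install directory to the PATH environment variable.

Alternatively, install it with pip install mitmproxy.

Installing the certificate

Run mitmdump in cmd. The output shows that mitmdump has started and is listening on port 8080.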

C:\Users\IIce>mitmdump
Proxy server listening at http://*:8080

Open the emulator and point its proxy at the PC (the PC's IPv4 address, port 8080).

First check the PC's IP address with ipconfig:

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : North-Class.com
   Link-local IPv6 Address . . . . . : fe80::68d7:38a8:2729:4d97%6
   IPv4 Address. . . . . . . . . . . : 192.168.100.243
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.100.250
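If you would rather not read the address off the screen, the IPv4 line can be pulled out of the ipconfig output with a few lines of Python. This is only a sketch: the function name extract_ipv4 is made up for illustration, and it assumes the relevant line contains the literal text "IPv4", which holds for both the English and the Chinese Windows output.

```python
import re


def extract_ipv4(ipconfig_output):
    """Return every dotted-quad address found on lines that mention 'IPv4'."""
    addresses = []
    for line in ipconfig_output.splitlines():
        if "IPv4" in line:
            match = re.search(r"(\d{1,3}(?:\.\d{1,3}){3})", line)
            if match:
                addresses.append(match.group(1))
    return addresses


sample = """
   IPv4 Address. . . . . . . . . . . : 192.168.100.243
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
"""
print(extract_ipv4(sample))
```

The subnet mask line is skipped because it does not mention IPv4, so only the address needed for the proxy setting is returned.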
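
Once the proxy is configured, open the browser in the emulator and visit baidu.com to check.
A certificate warning appears at this point; choose to continue.

Then visit mitm.it
and install the certificate for the matching platform.

After that, sites load without certificate warnings.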

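Installing Docker

Choose one of the two options below, depending on your system.

docker-toolbox

https://docs.docker.com/toolbox/toolbox_install_windows/
Download docker-toolbox and double-click the installer.

If the installer fails near the end with
IPersistFile:Save failed, code 0x80070005, access denied
check whether antivirus software is blocking it.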

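After a successful install, three icons appear.

Double-click the Docker Quickstart Terminal icon to open a terminal.
It downloads a boot2docker.iso file. If the download is slow, copy the link, download the file yourself,
and copy it into the target directory when it finishes.

If you see "Unable to start the VM: C:\Program Files\Oracle\VirtualBox\VBoxManage.exe startvm default --type headless failed", uninstall Oracle VM VirtualBox and install the latest version:
https://www.virtualbox.org/wiki/Downloads

When setup finishes, the terminal shows the Docker whale banner and a prompt.

Run docker run hello-world to verify the installation.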

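Docker for Windows

https://docs.docker.com/docker-for-windows/install/

Download it and double-click the installer.

If the installation stalls, check your antivirus software, because the installer modifies the registry and startup items.

If startup fails with
"Hardware assisted virtualization and data execution protection must be enabled in the BIOS"

enable hardware virtualization and Hyper-V.

If both are enabled and Docker still fails to start,
see https://www.e-learn.cn/content/wangluowenzhang/589447

Errors that can occur when both options are installed side by side:
https://blog.csdn.net/qq_35852248/article/details/80925154

Configuring a registry mirror (accelerator):
https://cr.console.aliyun.com/cn-hangzhou/instances/mirrors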

Fiddler settings

Connecting the phone
Check the PC's IP address:

...

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : North-Class.com
   Link-local IPv6 Address . . . . . : fe80::f44c:fb33:30bf:5c57%18
   IPv4 Address. . . . . . . . . . . : 192.168.100.248
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.100.250
...

Set the proxy on the phone; the proxy host name is the PC's IPv4 address.

After that, open the phone's browser and visit the host IP plus port.

When an app cannot reach the network while the capture tool is running

http://www.imooc.com/article/251500

What to do when Fiddler cannot capture traffic

https://testerhome.com/topics/11462?from=singlemessage

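Scraping recipes from Douguo Meishi (豆果美食)

Download and install Douguo Meishi in the emulator.
Configure the proxy so that traffic can be captured.
Open Fiddler and Douguo Meishi.

Tap the recipe categories.

Tap a tag to open its result list.

Capture analysis: the URL http://api.douguo.net/recipe/flatcatalogs returns the recipe categories.

http://api.douguo.net/recipe/v2/search/0/20 returns the result list.
"Best overall", "most bookmarked", and "most cooked" all use this same URL; only the submitted parameters differ.

# 0: best overall   2: most bookmarked   3: most cooked
"order": "0",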

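As a rough sketch of how the three sort orders map onto one endpoint, the helper below builds the URL and form data for a search. The names build_search_request and ORDER_CODES are hypothetical; the values "client": "4" and "_vs": "400" are the ones used in the request code later in this post, and the path segments are assumed to be offset and page size, as the paging code suggests.

```python
SEARCH_URL = "http://api.douguo.net/recipe/v2/search/{offset}/{size}"

# Hypothetical readable names for the order codes observed in the capture
ORDER_CODES = {"best": "0", "most_bookmarked": "2", "most_cooked": "3"}


def build_search_request(keyword, order="best", offset=0, size=20):
    """Return (url, form_data) for one page of search results."""
    url = SEARCH_URL.format(offset=offset, size=size)
    data = {
        "client": "4",
        "keyword": keyword,
        "order": ORDER_CODES[order],
        "_vs": "400",
    }
    return url, data


url, data = build_search_request("potato", order="most_cooked", offset=20)
print(url, data["order"])
```

Only the order field changes between the three tabs, which is why a single function can cover all of them.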
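Implementation

Request headers

First factor out the headers shared by every request. The commented-out entries do not need to be submitted.

import json

import requests


def handle_reques(url, data):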
    header = {
        "client": "4",
        "version": "6934.2",
        "device": "OPPO R11",
        "sdk": "22,5.1.1",
        "imei": "866174010942858",
        "channel": "baidu",
        # "mac": " 6A:07:15:F0:34:85",
        "resolution": "1280*720",
        "dpi": "1.5",
        # "android-id": "6a0715f034851883",
        # "pseudo - id": "5f0348518836a071",
        "brand": "OPPO",
        "scale": "1.5",
        "timezone": "28800",
        "language": "zh",
        "cns": "3",
        "carrier": "CHINA+MOBILE",
        # "imsi": "460071060715240",
        "user-agent": "Mozilla/5.0 (Linux; Android 5.1.1; OPPO R11 Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/39.0.0.0 Mobile Safari/537.36",
        "reach": "1",
        "newbie": "1",
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "Keep-Alive",
        # "Cookie": "duid=59159842",
        "Host": "api.douguo.net",
        # "Content-Length": "74",
    }

    response = requests.post(url=url, headers=header, data=data)
    return response

Recipe categories


from multiprocessing import Queue

queue_list = Queue()

def handle_index():
    url = "http://api.douguo.net/recipe/flatcatalogs"
    data = {
        "client": "4",
        # "_session": "1552715432169866174010942858",
        # "v": "1503650468",
        # "_vs": "0",   both 0 and 2305 work
        "_vs": "2305",

    }

    response = handle_reques(url=url, data=data)
    # print(response.text)
    response_to_dict = json.loads(response.text)

    for item in response_to_dict['result']['cs']:
        for item_1 in item['cs']:
            for item_2 in item_1['cs']:
                data_2 = {
                    "client": "4",
                    # "_session": "1552715831226866174010942858",
                    "keyword": item_2['name'],
                    # 0: best overall   2: most bookmarked   3: most cooked
                    "order": "0",
                    "_vs": "400",
                }
                queue_list.put(data_2)
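The three nested loops above walk a three-level category tree in which every level keeps its children under the key cs. A minimal sketch of the same traversal as a generator, run against a tiny made-up response (the sample data and the name iter_keywords are purely illustrative, not captured output):

```python
def iter_keywords(catalog_response):
    """Yield the leaf category names of the three-level 'cs' tree."""
    for top in catalog_response["result"]["cs"]:
        for middle in top["cs"]:
            for leaf in middle["cs"]:
                yield leaf["name"]


# Hypothetical response shaped like the structure the loops expect
sample = {
    "result": {
        "cs": [
            {"name": "top", "cs": [
                {"name": "middle", "cs": [
                    {"name": "potato"},
                    {"name": "tomato"},
                ]},
            ]},
        ]
    }
}
print(list(iter_keywords(sample)))
```

Separating the traversal from the queueing makes it easy to check which keywords will be searched before any request is sent.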

Result list

def handle_caipu_list(data):
    print("Processing:", data['keyword'])
    caipu_list_url = 'http://api.douguo.net/recipe/v2/search/0/20'
    caipu_list_response = handle_reques(url=caipu_list_url, data=data)
    response_to_dict = json.loads(caipu_list_response.text)
    handle_caipu_detail(data, response_to_dict)

    count=0
    while response_to_dict['result']['end'] == 0:
        count+=1
        caipu_list_url = 'http://api.douguo.net/recipe/v2/search/{}/20'.format(count*20)
        caipu_list_response = handle_reques(url=caipu_list_url, data=data)
        response_to_dict = json.loads(caipu_list_response.text)
        handle_caipu_detail(data, response_to_dict)
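The paging rule used above is: request offset 0, then keep advancing by 20 while result.end is 0. The sketch below expresses that rule as a generator with the network call injected, so it can be tried without hitting the API. fake_fetch is a stand-in for handle_reques plus json.loads, and its three-page behaviour is invented for the example.

```python
def iter_search_pages(fetch_page, page_size=20):
    """Yield pages until one reports result.end != 0."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield page
        if page["result"]["end"] != 0:
            break
        offset += page_size


def fake_fetch(offset, size):
    # Stand-in for the real request: pretend the third page is the last one
    return {"result": {"end": 1 if offset >= 40 else 0, "list": [offset]}}


offsets = [page["result"]["list"][0] for page in iter_search_pages(fake_fetch)]
print(offsets)
```

Passing the fetch function in as a parameter is a design choice for testability; the original code calls handle_reques directly.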

Recipe details

def handle_caipu_detail(data, response_to_dict):

    for item in response_to_dict['result']['list']:
        caipu_info = {}
        caipu_info['shicai'] = data['keyword']

        if item['type'] == 13:
            caipu_info['author'] = item['r']['an']
            caipu_info['shicai_id'] = item['r']['id']  # used later to fetch the detailed cooking steps
            caipu_info['describe'] = item['r']['cookstory']
            caipu_info['caipu_name'] = item['r']['n']
            caipu_info['zuoliao_list'] = item['r']['major']

            detail_url = 'http://api.douguo.net/recipe/detail/' + str(caipu_info['shicai_id'])
            detail_data = {
                "client": "4",
                # "_session": "1552715831226866174010942858",
                "author_id": "0",
                "_vs": "2803",
                "_ext": '{"query":{"kw":' + data["keyword"] + ',"src":"2803","type":"13","id":' + str(
                    caipu_info["shicai_id"]) + '}}',
            }

            detail_response = handle_reques(url=detail_url, data=detail_data)
            # print(detail_response.text)
            detail_response_to_dict = json.loads(detail_response.text)

            caipu_info['tips'] = detail_response_to_dict['result']['recipe']['tips']
            caipu_info['cook_step'] = detail_response_to_dict['result']['recipe']['cookstep']

            print('Saving:', caipu_info['caipu_name'])
            mongo_info.insert_item(caipu_info)

        else:
            continue

Saving to the database

import pymongo

from pymongo.collection import Collection


class Connect_Mongo:
    def __init__(self):
        self.client = pymongo.MongoClient()
        self.db_data = self.client['dou_guo_mei_shi']

    def insert_item(self, item):
        db_collection = Collection(self.db_data, 'mei_shi')
        db_collection.insert_one(item)


mongo_info = Connect_Mongo()

Multithreaded run

from concurrent.futures import ThreadPoolExecutor

if __name__ == '__main__':
    handle_index()
    # print(queue_list.qsize())
    # handle_caipu_list(queue_list.get())
    pool = ThreadPoolExecutor()

    while queue_list.qsize() > 0:
        pool.submit(handle_caipu_list, queue_list.get())
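One possible variation on the loop above: use the pool as a context manager so the program waits for all tasks, and collect the futures so that exceptions raised inside a worker are not silently lost. This sketch swaps multiprocessing.Queue for the standard queue.Queue, which is sufficient when only threads are involved; drain and the upper-casing worker are illustrative names, not part of the original code.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue


def drain(task_queue, worker, max_workers=4):
    """Submit every queued task to a thread pool and return the results in order."""
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            try:
                task = task_queue.get_nowait()
            except Empty:
                break
            futures.append(pool.submit(worker, task))
    # result() re-raises any exception that occurred inside the worker
    return [future.result() for future in futures]


tasks = Queue()
for keyword in ["potato", "tomato", "egg"]:
    tasks.put({"keyword": keyword})

results = drain(tasks, lambda task: task["keyword"].upper())
print(results)
```

In the scraper, the worker would be handle_caipu_list and the queue would be filled by handle_index.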

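Installing android-sdk

http://tools.android-studio.org/index.php/sdk
Download and install it.

Configuring environment variables

Variables
ANDROID_HOME (new) G:\Program Files (x86)\Android\android-sdk
Path (append) %ANDROID_HOME%\tools
Path (append) %ANDROID_HOME%\platform-tools

Run SDK Manager.exe

For the Android version, tick the latest one; it is compatible with older versions.

After installation, open cmd and run adb; the adb version is printed: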

Android Debug Bridge version 1.0.40
Version 28.0.2-5303910
Installed as G:\Program Files (x86)\Android\android-sdk\platform-tools\adb.exe

global options:
 -a         listen on all network interfaces, not just localhost
 -d         use USB device (error if multiple devices connected)
 -e         use TCP/IP device (error if multiple TCP/IP devices available)
 -s SERIAL  use device with given serial (overrides $ANDROID_SERIAL)
 -t ID      use device with given transport id
 -H         name of adb server host [default=localhost]
 -P         port of adb server [default=5037]
 -L SOCKET  listen on given socket for adb server [default=tcp:localhost:5037]

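Upgrading the Nox emulator's adb

Copy the three adb files from android-sdk\platform-tools (adb.exe, AdbWinApi.dll, AdbWinUsbApi.dll) into the emulator's install directory.

Make a copy of adb.exe and use it to overwrite the original nox_adb.exe.
Enable developer options in the emulator.
Restart the emulator and open cmd: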

C:\Users\lenovo>adb devices
List of devices attached
127.0.0.1:52001 device

The emulator is now connected.
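For scripts that need the device serial, the adb devices output can be parsed instead of copied by hand. A minimal sketch, assuming the usual two-column "serial state" lines; parse_adb_devices is a hypothetical helper name:

```python
def parse_adb_devices(output):
    """Return the serials of all devices whose state is 'device' (i.e. online)."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


sample = """List of devices attached
127.0.0.1:52001 device
"""
print(parse_adb_devices(sample))
```

In practice the text would come from something like subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout. Entries in other states, such as offline or unauthorized, are deliberately skipped.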

uiautomatorviewer

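File location: D:\Program Files (x86)\Android\android-sdk\tools\uiautomatorviewer.bat

Double-click to run it. Minimize the black console window, but do not close it.

Click the screenshot button; you can then inspect element information with the mouse.

appium

Launch parameter configuration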
http://www.testclass.net/appium

{
  "platformName": "Android",
  "deviceName": "127.0.0.1:52001",
  "platformVersion": "5.1.1",
  "appPackage": "com.tal.kaoyan",
  "appActivity": "com.tal.kaoyan.ui.activity.SplashActivity"
}

Getting appPackage and appActivity
Use aapt.exe dump badging to get them:

D:\Program Files (x86)\Android\android-sdk\build-tools\28.0.3>aapt.exe dump badging F:\BrowserDownload\kaoyanbang_3.3.7beta.243.apk

package: name='com.tal.kaoyan' versionCode='92' versionName='3.3.7beta' compileSdkVersion='28' compileSdkVersionCodename='9'
sdkVersion:'16'
...

launchable-activity: name='com.tal.kaoyan.ui.activity.SplashActivity'  label='' icon=''

...
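The two values Appium needs sit on the package: and launchable-activity: lines of that output. A small sketch that extracts them with regular expressions follows; parse_badging is an illustrative name, and the sample is a shortened copy of the output shown above.

```python
import re


def parse_badging(output):
    """Extract appPackage and appActivity from `aapt dump badging` output."""
    package = re.search(r"^package: name='([^']+)'", output, re.MULTILINE)
    activity = re.search(r"^launchable-activity: name='([^']+)'", output, re.MULTILINE)
    return {
        "appPackage": package.group(1) if package else None,
        "appActivity": activity.group(1) if activity else None,
    }


sample = """package: name='com.tal.kaoyan' versionCode='92' versionName='3.3.7beta'
sdkVersion:'16'
launchable-activity: name='com.tal.kaoyan.ui.activity.SplashActivity'  label='' icon=''
"""
caps = parse_badging(sample)
print(caps)
```

The returned dictionary can be merged straight into the capabilities passed to Appium.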

Testing with the Kaoyanbang (考研帮) app

pip install Appium-Python-Client

{
  "platformName": "Android",
  "deviceName": "127.0.0.1:52001",
  "platformVersion": "5.1.1",
  "appPackage": "com.tal.kaoyan",
  "appActivity": "com.tal.kaoyan.ui.activity.SplashActivity",
  "noReset": true
}

import time

from appium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

cap = {
    "platformName": "Android",
    "deviceName": "127.0.0.1:52001",
    "platformVersion": "5.1.1",
    "appPackage": "com.tal.kaoyan",
    "appActivity": "com.tal.kaoyan.ui.activity.SplashActivity",
    "noReset": True
}

name = ""
pwd = ""

driver = webdriver.Remote("http://localhost:4723/wd/hub", cap)


def get_size():
    x = driver.get_window_size()['width']
    y = driver.get_window_size()['height']
    return (x, y)


try:
    # Tap "skip" if the splash screen offers it
    if WebDriverWait(driver, 3).until(
            lambda x: x.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_skip']")):
        driver.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_skip']").click()
except Exception:
    pass

try:
    # Log in if the login form is shown
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.tal.kaoyan:id/login_email_edittext']")):
        driver.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.tal.kaoyan:id/login_email_edittext']").send_keys(name)
        driver.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.tal.kaoyan:id/login_password_edittext']").send_keys(pwd)
        driver.find_element_by_xpath(
            "//android.widget.Button[@resource-id='com.tal.kaoyan:id/login_login_btn']").click()
except Exception:
    pass

try:
    # Accept the privacy agreement if it appears
    if WebDriverWait(driver, 3).until(
            lambda x: x.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_title']")):
        driver.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_agree']").click()
        driver.find_element_by_xpath(
            "//android.support.v7.widget.RecyclerView[@resource-id='com.tal.kaoyan:id/date_fix']/android.widget.RelativeLayout[3]").click()
except Exception:
    pass

# Tap the 研讯 (exam news) entry
if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(
        "//android.support.v7.widget.RecyclerView[@resource-id='com.tal.kaoyan:id/date_fix']/android.widget.RelativeLayout[3]/android.widget.LinearLayout[1]/android.widget.ImageView[1]")):
    driver.find_element_by_xpath(
        "//android.support.v7.widget.RecyclerView[@resource-id='com.tal.kaoyan:id/date_fix']/android.widget.RelativeLayout[3]/android.widget.LinearLayout[1]/android.widget.ImageView[1]").click()

    l = get_size()

    x1 = int(l[0] * 0.5)
    y1 = int(l[1] * 0.75)
    y2 = int(l[1] * 0.25)

    # Swipe up repeatedly
    while True:
        driver.swipe(x1, y1, x1, y2)
        time.sleep(0.5)

Overall, the workflow is much like Selenium.
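The swipe in the script goes from 75% of the screen height to 25%, along the vertical centre line. The arithmetic can be isolated into a small function, sketched here with a hypothetical name (swipe_up_coords) and a 720x1280 screen as the worked example:

```python
def swipe_up_coords(width, height, start=0.75, end=0.25):
    """Return (x1, y1, x2, y2) for a vertical swipe along the screen centre."""
    x = int(width * 0.5)
    return x, int(height * start), x, int(height * end)


print(swipe_up_coords(720, 1280))
```

The tuple can be unpacked directly into the call, for example driver.swipe(*swipe_up_coords(width, height)).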

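Scraping Douyin (抖音) followers

Start with a share link:
https://www.douyin.com/share/user/96578108671

Open it in a browser and inspect the page: the digits are obfuscated with a custom font.
Font file: https://s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eb9a50.woff

Online font viewer: http://fontstore.baidu.com/static/editor/index.html
Upload the downloaded font file to that site to see how the glyph codes map to digits.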
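Once the mapping has been read off the font viewer, undoing the obfuscation is a plain substitution over the page source. A minimal sketch follows. The glyph codes in example_mapping are invented placeholders; the real codes have to be taken from the font file, and several codes can map to the same digit.

```python
def decode_digits(text, glyph_to_digit):
    """Replace every obfuscated glyph code in text with the digit it renders as."""
    for glyph, digit in glyph_to_digit.items():
        text = text.replace(glyph, str(digit))
    return text


# Hypothetical codes, for illustration only
example_mapping = {"&#xe001;": 0, "&#xe002;": 1, "&#xe003;": 2}

html = "<i>&#xe002;</i><i>&#xe003;</i><i>&#xe001;</i>"
print(decode_digits(html, example_mapping))
```

The substitution has to run on the raw response text before it is handed to an HTML parser, because the parser would otherwise turn the entities into private-use characters.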

Scraping the share page

import re
import requests
import time
from lxml import etree

from douyin.handle_mongo import get_task


def handle_decode(input_data, share_web_url, task):
    search_douyin_str = re.compile('抖音ID:')
    regex_list = [
        {'name': ['  ', '  ', '  '], 'value': 0},
        {'name': ['  ', '  ', '  '], 'value': 1},
        {'name': ['  ', '  ', '  '], 'value': 2},
        {'name': ['  ', '  ', '  '], 'value': 3},
        {'name': ['  ', '  ', '  '], 'value': 4},
        {'name': ['  ', '  ', '  '], 'value': 5},
        {'name': ['  ', '  ', '  '], 'value': 6},
        {'name': ['  ', '  ', '  '], 'value': 7},
        {'name': ['  ', '  ', '  '], 'value': 8},
        {'name': ['  ', '  ', '  '], 'value': 9},
    ]

    for i1 in regex_list:
        for i2 in i1['name']:
            input_data = re.sub(i2, str(i1['value']), input_data)
    share_web_html = etree.HTML(input_data)
    douyin_info = {}
    douyin_info['nick_name'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info1']//p[@class='nickname']/text()")[0]
    if 'douyin_id' in task:
        douyin_info['douyin_id'] = task['douyin_id']
    else:
        douyin_id = ''.join(
            share_web_html.xpath("//div[@class='personal-card']/div[@class='info1']/p[@class='shortid']/i/text()"))
        if douyin_id == '':
            try:
                douyin_info['douyin_id'] = re.sub(search_douyin_str, '', share_web_html.xpath(
                    "//div[@class='personal-card']/div[@class='info1']/p[@class='shortid']/text()")[0]).strip()
            except Exception:
                douyin_info['douyin_id'] = '无数据'  # "no data"
        else:
            douyin_info['douyin_id'] = douyin_id

    try:
        douyin_info['job'] = share_web_html.xpath(
            "//div[@class='personal-card']/div[@class='info2']/div[@class='verify-info']/span[@class='info']/text()")[
            0].strip()
    except Exception:
        pass
    douyin_info['describe'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info2']/p[@class='signature']/text()")[0].replace(
            '\n', ',')
    douyin_info['location'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info2']/p[@class='extra-info']/span[1]/text()")
    douyin_info['xingzuo'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info2']/p[@class='extra-info']/span[2]/text()")
    douyin_info['follow_count'] = share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='focus block']//i[@class='icon iconfont follow-num']/text()")[
        0].strip()
    # The page shows large counts in units of 'w' (10,000) with one decimal digit
    fans_value = ''.join(share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='follower block']//i[@class='icon iconfont follow-num']/text()"))
    unit = share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='follower block']/span[@class='num']/text()")
    if unit[-1].strip() == 'w':
        douyin_info['fans'] = str((int(fans_value) / 10)) + 'w'
    like = ''.join(share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='liked-num block']//i[@class='icon iconfont follow-num']/text()"))
    unit = share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='liked-num block']/span[@class='num']/text()")
    if unit[-1].strip() == 'w':
        douyin_info['like'] = str(int(like) / 10) + 'w'
    douyin_info['from_url'] = share_web_url

    print(douyin_info)



def handle_douyin_web_share(task):
    share_web_url = 'https://www.douyin.com/share/user/' + task
    print(share_web_url)
    share_web_header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36'
    }
    share_web_response = requests.get(url=share_web_url, headers=share_web_header)
    handle_decode(share_web_response.text, share_web_url, task)

if __name__ == '__main__':
    # task = get_task("share_id")
    handle_douyin_web_share("88445518961")

Output:

https://www.douyin.com/share/user/88445518961
{'nick_name': 'Dear-迪丽热巴', 'douyin_id': '274110380', 'job': '演员', 'describe': '先定一个能达到的小目标,比方说来句签名', 'location': [], 'xingzuo': [], 'follow_count': '0', 'fans': '5046.8w', 'like': '13527.7w', 'from_url': 'https://www.douyin.com/share/user/88445518961'}
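The fans and like fields come out as strings such as '5046.8w', where w stands for 万 (10,000). If plain integers are more convenient for storage or sorting, a conversion along these lines could be applied afterwards; parse_count is a hypothetical helper, not part of the original script.

```python
def parse_count(text):
    """Convert a count such as '5046.8w' (units of 10,000) or '123' to an int."""
    text = text.strip()
    if text.endswith("w"):
        # round() guards against floating-point error in the multiplication
        return int(round(float(text[:-1]) * 10000))
    return int(text)


print(parse_count("5046.8w"))
print(parse_count("0"))
```

Since the page only shows one decimal digit, the converted value is accurate to the nearest thousand at best.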

Scraping followers

Prerequisites: logged in, latest app version

Scraping one user's followers

import sys
import time
from selenium.webdriver.support.ui import WebDriverWait
from appium import webdriver

desired_caps = {}
desired_caps['platformName'] = 'Android'
desired_caps['deviceName'] = '127.0.0.1:62001'
desired_caps['platformVersion'] = '5.1.1'
desired_caps['appPackage'] = 'com.ss.android.ugc.aweme'
desired_caps['appActivity'] = 'com.ss.android.ugc.aweme.splash.SplashActivity'
desired_caps['noReset'] = True
desired_caps['unicodeKeyboard'] = True
desired_caps['resetKeyboard'] = True

driver = webdriver.Remote('http://localhost:4723/wd/hub', desired_caps)


def get_size(driver):
    x = driver.get_window_size()['width']
    y = driver.get_window_size()['height']
    return (x, y)


def handle_douyin(driver):
    try:
        # Tap the search icon
        while WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(
                "//android.widget.LinearLayout[@resource-id='com.ss.android.ugc.aweme:id/aps']")):
            driver.find_element_by_xpath(
                "//android.widget.LinearLayout[@resource-id='com.ss.android.ugc.aweme:id/aps']").click()
            break
    except Exception:
        print("Search button not found")

    # Locate the search box
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']")):
        # Type the douyin_id to search for, retrying until the box holds it
        driver.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']").send_keys('706942127')
        while driver.find_element_by_xpath(
                "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']").text != '706942127':
            driver.find_element_by_xpath(
                "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']").send_keys('706942127')
            time.sleep(0.1)
    # Tap the search button
    driver.find_element_by_xpath("//android.widget.TextView[@resource-id='com.ss.android.ugc.aweme:id/afr']").click()

    # Tap the 用户 (Users) tab
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath("//android.widget.TextView[@text='用户']")):
        driver.find_element_by_xpath("//android.widget.TextView[@text='用户']").click()

    # Tap the avatar
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(
            "/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout[2]/android.widget.RelativeLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.support.v4.view.ViewPager/android.widget.LinearLayout/android.widget.FrameLayout/android.view.View/android.support.v7.widget.RecyclerView/android.widget.RelativeLayout[1]/android.widget.RelativeLayout[1]/android.widget.ImageView[2]")):
        driver.find_element_by_xpath(
            "/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout[2]/android.widget.RelativeLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.support.v4.view.ViewPager/android.widget.LinearLayout/android.widget.FrameLayout/android.view.View/android.support.v7.widget.RecyclerView/android.widget.RelativeLayout[1]/android.widget.RelativeLayout[1]/android.widget.ImageView[2]").click()
    # Tap the followers button
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_id("com.ss.android.ugc.aweme:id/aj1")):
        driver.find_element_by_id("com.ss.android.ugc.aweme:id/aj1").click()

    l = get_size(driver)
    x1 = int(l[0] * 0.5)
    y1 = int(l[1] * 0.75)
    y2 = int(l[1] * 0.25)
    while True:
        # The list ends with the text 没有更多了 ("no more")
        if '没有更多了' in driver.page_source:
            break
        driver.swipe(x1, y1, x1, y2)
        time.sleep(0.5)


if __name__ == '__main__':
    handle_douyin(driver)

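Appium opens Douyin, taps the search icon, types into the search box, taps the search button, taps the Users tab, taps the avatar, taps the followers button, and then swipes until no more followers are listed.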

Saving followers to the database

Use mitmdump to store the data in the database:
mitmdump -s xxx.py

import json

from douyin.handle_mongo import save_task


def response(flow):
    if 'aweme/v1/user/follower/list/' in flow.request.url:
        for user in json.loads(flow.response.text)['followers']:
            douyin_info = {}
            douyin_info['share_id'] = user['uid']
            douyin_info['douyin_id'] = user['short_id']
            douyin_info['nickname'] = user['nickname']
            save_task(douyin_info)

With this script loaded, each follower's information is added to the database as you swipe through the follower list.
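The field mapping inside response can be pulled out into a pure function, which makes it possible to check it against a saved response body without running mitmdump. This is a sketch: extract_followers is a hypothetical name, and the sample JSON is invented to match the three fields the hook reads.

```python
import json


def extract_followers(response_text):
    """Map the 'followers' array of a follower-list response to flat records."""
    records = []
    for user in json.loads(response_text).get("followers", []):
        records.append({
            "share_id": user["uid"],
            "douyin_id": user["short_id"],
            "nickname": user["nickname"],
        })
    return records


# Hypothetical response body containing only the fields used above
sample = json.dumps({"followers": [{"uid": "1", "short_id": "2", "nickname": "example"}]})
print(extract_followers(sample))
```

Inside the mitmdump script, the hook would then reduce to looping over extract_followers(flow.response.text) and calling save_task on each record.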

Scraping with multiple devices

Appium needs some configuration:
set udid on the Appium client side
set bootstrapPort on the Appium server side

Several emulators or several real devices need to be running.

import multiprocessing
import sys
import time
from selenium.webdriver.support.ui import WebDriverWait
from appium import webdriver

# desired_caps = {}
# desired_caps['platformName'] = 'Android'
# desired_caps['deviceName'] = '127.0.0.1:62001'
# desired_caps['platformVersion'] = '5.1.1'
# desired_caps['appPackage'] = 'com.ss.android.ugc.aweme'
# desired_caps['appActivity'] = 'com.ss.android.ugc.aweme.splash.SplashActivity'
# desired_caps['noReset'] = True
# desired_caps['unicodeKeyboard'] = True
# desired_caps['resetKeyboard'] = True
#
# driver = webdriver.Remote('http://localhost:4723/wd/hub', desired_caps)


def get_size(driver):
    x = driver.get_window_size()['width']
    y = driver.get_window_size()['height']
    return (x, y)


def handle_douyin(driver):
    while True:
        # Locate the search box
        while WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(
                "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']")):
            # Type the douyin_id to search for
            driver.find_element_by_xpath(
                "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']").send_keys('706942127')
            while driver.find_element_by_xpath(
                    "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']").text != '706942127':
                driver.find_element_by_xpath(
                    "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']").send_keys('706942127')
                time.sleep(0.1)
                break
            break
        # Tap the search button
        driver.find_element_by_xpath("//android.widget.TextView[@resource-id='com.ss.android.ugc.aweme:id/afr']").click()

        # Tap the 用户 (Users) tab
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath("//android.widget.TextView[@text='用户']")):
            driver.find_element_by_xpath("//android.widget.TextView[@text='用户']").click()

        # Tap the avatar
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(
                "/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout[2]/android.widget.RelativeLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.support.v4.view.ViewPager/android.widget.LinearLayout/android.widget.FrameLayout/android.view.View/android.support.v7.widget.RecyclerView/android.widget.RelativeLayout[1]/android.widget.RelativeLayout[1]/android.widget.ImageView[2]")):
            driver.find_element_by_xpath(
                "/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout[2]/android.widget.RelativeLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.support.v4.view.ViewPager/android.widget.LinearLayout/android.widget.FrameLayout/android.view.View/android.support.v7.widget.RecyclerView/android.widget.RelativeLayout[1]/android.widget.RelativeLayout[1]/android.widget.ImageView[2]").click()
        # Tap the followers button
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_id("com.ss.android.ugc.aweme:id/aj1")):
            driver.find_element_by_id("com.ss.android.ugc.aweme:id/aj1").click()

        l = get_size(driver)
        x1 = int(l[0] * 0.5)
        y1 = int(l[1] * 0.75)
        y2 = int(l[1] * 0.25)
        while True:
            # Stop at 没有更多了 ("no more") or 还没有粉丝 ("no followers yet")
            if '没有更多了' in driver.page_source:
                break
            elif '还没有粉丝' in driver.page_source:
                break
            else:
                driver.swipe(x1, y1, x1, y2)
                time.sleep(0.5)

        # Return to the search page and clear the search box for the next round
        driver.find_element_by_id("com.ss.android.ugc.aweme:id/n7").click()
        driver.find_element_by_id("com.ss.android.ugc.aweme:id/n7").click()
        driver.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']").clear()


def handle_appium(device, port):
    caps = {}
    caps["platformName"] = "Android"
    caps["deviceName"] = device
    caps["platformVersion"] = "5.1.1"
    caps["appPackage"] = "com.ss.android.ugc.aweme"
    caps["appActivity"] = "com.ss.android.ugc.aweme.splash.SplashActivity"
    caps["noReset"] = True
    caps["unicodeKeyboard"] = True
    caps["resetKeyboard"] = True
    caps["udid"] = device

    driver = webdriver.Remote('http://localhost:'+str(port)+'/wd/hub', caps)

    try:
        # Tap the search icon
        while WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(
                "//android.widget.LinearLayout[@resource-id='com.ss.android.ugc.aweme:id/aps']")):
            driver.find_element_by_xpath(
                "//android.widget.LinearLayout[@resource-id='com.ss.android.ugc.aweme:id/aps']").click()
            break
    except Exception:
        print("Search button not found")

    handle_douyin(driver)

if __name__ == '__main__':
    m_list = []

    devices_list = ['127.0.0.1:62001', '127.0.0.1:62025']
    for device in range(len(devices_list)):
        port = 4723+2*device
        m_list.append(multiprocessing.Process(target=handle_appium, args=(devices_list[device], port)))

    for m in m_list:
        m.start()

    for m in m_list:
        m.join()

The entries in devices_list can be found with adb devices:

C:\Users\IIce>adb devices
List of devices attached
127.0.0.1:62001 device
127.0.0.1:62025 device
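The main block gives each device its own Appium server port using 4723 + 2 * index. The same rule as a standalone function, sketched under the assumption that one Appium server has already been started on each of those ports; assign_ports is a hypothetical name:

```python
def assign_ports(serials, base_port=4723, step=2):
    """Map each device serial to the Appium server port it should connect to."""
    return {serial: base_port + step * index for index, serial in enumerate(serials)}


ports = assign_ports(["127.0.0.1:62001", "127.0.0.1:62025"])
print(ports)
```

Printing the mapping before launching the processes is a quick way to confirm which server each emulator is expected to use.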

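Scraping Douyin videos

In the Douyin app, share a user's profile and copy the link to get the profile page address, for example:
https://www.iesdouyin.com/share/user/58862693224

Analyzing the video API

Capture traffic in Chrome to get the request details of the video list API.

URL parameters

https://www.iesdouyin.com/web/api/v2/aweme/post/?
user_id=58862693224 #   the id from the share link
count=21            #   number of videos
max_cursor=0        #   paging parameter: 0 on the first request, then taken from the previous response
aid=1128            #   fixed value
_signature=laPLvBAVyX-c77Gpje7Ys5Wjy6   #   signature, computed by the signing algorithm
dytk=66cb5d220e0e48ed9195a7f62ac32764   #   purpose unclear; can be extracted directly from the page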
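Putting those parameters together is a job for urllib.parse.urlencode. The sketch below only assembles the URL; it does not compute the signature, which still has to come from the signing algorithm described next. build_post_url is a hypothetical helper, and the signature and dytk values are the examples listed above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API = "https://www.iesdouyin.com/web/api/v2/aweme/post/"


def build_post_url(user_id, signature, dytk, max_cursor=0, count=21):
    """Assemble the video-list URL from the parameters seen in the capture."""
    params = {
        "user_id": user_id,
        "count": count,
        "max_cursor": max_cursor,
        "aid": 1128,  # fixed value
        "_signature": signature,
        "dytk": dytk,
    }
    return API + "?" + urlencode(params)


url = build_post_url("58862693224", "laPLvBAVyX-c77Gpje7Ys5Wjy6",
                     "66cb5d220e0e48ed9195a7f62ac32764")
print(url)
```

For later pages, only max_cursor changes; the other parameters stay the same for a given user.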

Obtaining the signing algorithm

Open the developer console and search for _signature.

Locate _bytedAcrawler

Locate douyin_falcon:node_modules/byted-acrawler/dist/runtime

Locate __M.define

How the signing algorithm executes

① Define the __M object and its define and require functions
② Execute the code block __M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime......"
③ Execute _bytedAcrawler = require("douyin_falcon:node_modules/byted-acrawler/dist/runtime")

④ Compute the signature: _signature = _bytedAcrawler.sign(user_id)

We can write our own HTML file and load it to obtain _signature.
Taobao chromedriver mirror

Source code

About watermarks

Video URLs come in two forms:

  1. https://aweme.snssdk.com/aweme/v1/play/?video_id=v0300f6d0000bj81rdqrh6f3j18kvnpg&line=0&ratio=540p&media_type=4&vr_type=0&improve_bitrate=0
  2. https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0300f6d0000bj81rdqrh6f3j18kvnpg&line=0&ratio=540p&media_type=4&vr_type=0&improve_bitrate=0

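Differences:

  1. The first requests play; the second requests playwm.
  2. The first cannot be opened in a browser; the second can.
  3. Both can be fetched with requests.
  4. The first one has no watermark.

Testing with Postman shows that keeping only video_id is enough.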
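Based on those observations, a watermarked playwm link can be rewritten into the play form with only video_id kept. A minimal sketch using urllib.parse follows; to_no_watermark is a hypothetical name, and the example URL is the second one listed above, shortened.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def to_no_watermark(url):
    """Rewrite a playwm URL to the play endpoint, keeping only video_id."""
    parts = urlsplit(url)
    video_id = parse_qs(parts.query)["video_id"][0]
    path = parts.path.replace("/playwm/", "/play/")
    query = urlencode({"video_id": video_id})
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


watermarked = ("https://aweme.snssdk.com/aweme/v1/playwm/"
               "?video_id=v0300f6d0000bj81rdqrh6f3j18kvnpg&line=0&ratio=540p")
print(to_no_watermark(watermarked))
```

As noted in the list, the resulting link may not open in a browser but can still be downloaded with requests.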


Response fields

has_more: indicates whether another page needs to be requested
max_cursor: the value to send with the next request; 0 on the first request
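These two fields are enough to drive the paging loop: start with max_cursor 0, and keep requesting with the returned max_cursor while has_more is truthy. The sketch below shows that loop with the request injected as a function. The key aweme_list for the videos in each page is an assumption not confirmed by this post, and fake_fetch with its two pages is invented so the loop can run offline.

```python
def collect_videos(fetch_page):
    """Follow max_cursor from page to page until has_more is falsy."""
    videos, cursor = [], 0
    while True:
        page = fetch_page(cursor)
        videos.extend(page.get("aweme_list", []))  # assumed key name
        if not page.get("has_more"):
            break
        cursor = page["max_cursor"]
    return videos


def fake_fetch(cursor):
    # Stand-in for the real request: two pages keyed by cursor value
    pages = {
        0: {"aweme_list": ["a", "b"], "has_more": 1, "max_cursor": 100},
        100: {"aweme_list": ["c"], "has_more": 0, "max_cursor": 0},
    }
    return pages[cursor]


print(collect_videos(fake_fetch))
```

In a real run, fetch_page would build the URL for the given cursor, send it with requests, and return the decoded JSON.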

References

Providing the Douyin signing algorithm as a service with NodeJS

Watermark-free parsing
