How to Generate an E-book from a Biquge Novel

Why I wrote this script

1. Paywalls
2. Ads
I just want to read novels cleanly, without the clutter.

Steps

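1. Use the script below to download the novel and save it as xxx.html.
2. Use calibre to generate the e-book. When converting, remember to go to Table of Contents -> Level 1 TOC and enter //h:h4, so that the generated e-book has a table of contents.
3. Done. Enjoy the clean version of the novel.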
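
Step 2 can also be run from the command line instead of the calibre GUI. The following is a minimal sketch, assuming calibre's ebook-convert tool is installed and on PATH; the function names build_convert_command and convert, and the file names in the usage note, are hypothetical and not part of the original script.

```python
import shutil
import subprocess


def build_convert_command(html_path, ebook_path, toc_xpath="//h:h4"):
    """Build the ebook-convert argument list.

    --level1-toc takes an XPath expression; every matching element
    (here each <h4> chapter title) becomes a level-1 TOC entry.
    """
    return ["ebook-convert", html_path, ebook_path, "--level1-toc", toc_xpath]


def convert(html_path, ebook_path):
    """Run the conversion, failing early if calibre's CLI is not available."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; install calibre first")
    subprocess.run(build_convert_command(html_path, ebook_path), check=True)
```

For example, convert("xiaoshuo.html", "xiaoshuo.epub") would produce an EPUB whose table of contents lists one entry per chapter heading.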

The script

#!/usr/bin/python3
#-*-coding:utf-8-*-
# Download a novel from biquge
import re
import urllib.request
import ssl
from pyquery import PyQuery as pq
import time 

def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    html = html.decode('utf-8')
    return html
# Fetch the chapter list as (url, title) pairs
def getArticleList(listurl,contenturl_prefix):
    html=getHtml(listurl)
    doc = pq(html)
    ret=[]
    for a in doc("div#list dd a").items():
        href=contenturl_prefix+a.attr("href")
        title=a.text()
        ret.append((href,title))
    return ret
# Fetch the content of a single chapter
def getArticle(contenturl):
    html=getHtml(contenturl)
    doc = pq(html)
    return doc("div#content").html()

# Download every chapter into one HTML file, with each title in an <h4>
if __name__ == "__main__":
    article_list_url,article_url_prefix=("https://www.biquge.info/1_1760/","https://www.biquge.info/1_1760/")
    article_iterms = getArticleList(article_list_url,article_url_prefix)
    save2file = "/Users/myname/Downloads/xiaoshuo.html" 
    with open(save2file,'w',encoding="utf-8") as f:
        f.write("<html><body>")
        for art in article_iterms:
            content = getArticle(art[0])
            f.write("<h4>"+art[1]+"</h4>")
            f.write(content)
            print(art[1])
            time.sleep(1)
        f.write("</body></html>")

This is really just a very simple crawler script;
with minor changes it also works for other novel sites.
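
Adapting the script to another site mostly means changing the chapter-list URL and the two CSS selectors. One way to keep those in one place is sketched below. SiteConfig and chapter_url are hypothetical helpers, not part of the original script, and using urljoin in place of the original prefix concatenation is my own substitution, so that both relative and absolute chapter links resolve correctly.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class SiteConfig:
    """Per-site settings (hypothetical helper for illustration)."""
    list_url: str          # page that lists all chapters
    link_selector: str     # CSS selector for the chapter links on that page
    content_selector: str  # CSS selector for the chapter body


# Values taken from the script above.
BIQUGE = SiteConfig(
    list_url="https://www.biquge.info/1_1760/",
    link_selector="div#list dd a",
    content_selector="div#content",
)


def chapter_url(config, href):
    """Resolve a chapter link against the chapter-list page."""
    return urljoin(config.list_url, href)
```

Supporting a different site would then be a matter of defining another SiteConfig with that site's selectors, which have to be found by inspecting its pages.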

Enhanced version, with support for splitting large books

#!/usr/bin/python3
#-*-coding:utf-8-*-
# Download a novel from biquge
import re
import urllib.request
import ssl
from pyquery import PyQuery as pq
import time 
import sys

def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    html = html.decode('utf-8')
    return html

# Fetch the chapter list as (url, title) pairs
def getArticleList(listurl,contenturl_prefix):
    html=getHtml(listurl)
    doc = pq(html)
    ret=[]
    for a in doc("div#list dd a").items():
        href=contenturl_prefix+a.attr("href")
        title=a.text()
        ret.append((href,title))
    return ret

# Fetch the content of a single chapter
def getArticle(contenturl):
    html=getHtml(contenturl)
    doc = pq(html)
    return doc("div#content").html()

# Clean-up
def clearArticle(content):
    # TODO: implement your own clean-up, e.g. with regular expressions or string replacement
    return content.replace("龗","")

# Main routine with support for splitting into several files
if __name__ == "__main__":
    article_list_url,article_url_prefix=("https://www.biquge.info/1_1760/","https://www.biquge.info/1_1760/")
    article_iterms = getArticleList(article_list_url,article_url_prefix)
    save2file = "/Users/myname/Downloads/xiaoshuo_{0:0>4d}.html" 
    single_book_size = 100 # maximum number of chapters per book; keeps a book from getting too large and failing to convert
    fo = None
    index=0
    book_no=1
    for art in article_iterms:
        print(art[1])
        if index%single_book_size==0:
            if fo!=None:
                fo.write("</body></html>")
                fo.close()
                fo=None
                book_no=book_no+1
            fo=open(save2file.format(book_no),'w',encoding="utf-8")
            fo.write("<html><body>")
        content = clearArticle(getArticle(art[0]))
        fo.write("<h4>"+art[1]+"</h4>")
        fo.write(content)
        time.sleep(1) # rest for 1 second after each chapter; be a polite crawler
        index=index+1
    # Close the last file, including when the chapter count is an exact multiple of single_book_size
    if fo is not None:
        fo.write("</body></html>")
        fo.close()
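
The splitting logic can also be expressed by grouping the chapter list up front, which avoids tracking an open file handle across loop iterations. This is a minimal sketch of that alternative, not the author's code; split_into_books is a hypothetical helper name.

```python
def split_into_books(chapters, book_size):
    """Group chapters into consecutive books of at most book_size chapters."""
    if book_size <= 0:
        raise ValueError("book_size must be positive")
    return [chapters[i:i + book_size] for i in range(0, len(chapters), book_size)]


# 250 placeholder chapters as (url, title) pairs, the shape getArticleList returns.
chapters = [("url%d" % n, "Chapter %d" % n) for n in range(1, 251)]
books = split_into_books(chapters, 100)
book_sizes = [len(book) for book in books]
```

Each group can then be written inside its own with open(...) block, so every file is closed with its closing tags regardless of how many chapters the novel has.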

Last edited: 2019.03.01 08:53:52
