Migrating Jianshu Markdown to Ghost

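Jianshu has served me well for a long time. The one thing that bothers me is that the article management page has no search, so once you have many articles it becomes hard to find something you wrote earlier. On top of that, Jianshu's images have failed to load for quite a while now. At first I assumed my proxy was to blame, but after some searching I learned that Jianshu blocks access from Firefox. I do not know the reason, but it settled my decision to leave Jianshu.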

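Jianshu is generous about this: under Settings > Account Settings you can download all of your articles as a single archive. The result is plain Markdown text, which is easy to migrate.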

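The difficulty is on the Ghost side. Ghost supports importing from several platforms outside China, but none of the Chinese ones. Ghost does offer its own import and export options, so my approach is to imitate Ghost's export format, put the Jianshu articles into it, and import the result back into Ghost.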
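As a rough sketch of the shape being imitated: judging only by how the script in this article indexes a Ghost export (`ghost_config['db'][0]['data']`), the file is a JSON object whose `db` list holds one entry with a `data` object containing tables such as `posts`, `tags`, and `posts_tags`. The helper name `make_import_skeleton` and all field values below are hypothetical placeholders, and a real export carries more tables and fields than shown here.

```python
import json


def make_import_skeleton(posts, tags, posts_tags):
    """Wrap row lists in the db -> data layout that the script reads and writes.

    Hypothetical helper for illustration; not part of Ghost or of the script.
    """
    return {
        'db': [{
            'data': {
                'posts': posts,
                'tags': tags,
                'posts_tags': posts_tags,
            }
        }]
    }


# Placeholder rows; real ids are generated by the script.
skeleton = make_import_skeleton(
    posts=[{'id': 'p1', 'title': 'Hello', 'status': 'draft'}],
    tags=[{'id': 't1', 'name': 'notes'}],
    posts_tags=[{'id': 'pt1', 'post_id': 'p1', 'tag_id': 't1', 'sort_order': 0}],
)
print(json.dumps(skeleton, indent=2, ensure_ascii=False))
```

The script itself does not build this structure from scratch. It loads a real export and overwrites a few tables, which keeps whatever else Ghost put in the file.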

Generating the Ghost content

A Python script at the end of this article generates the Ghost import files.

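Disclaimer: the script and the steps described here may cause unforeseen problems. Before using them, make sure you understand what they do and back up your data. I accept no responsibility for any loss. Please credit the source when reposting.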

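First, the script's inputs and outputs:

Input
1. The rar file exported from Jianshu
2. A JSON file exported from Ghost, used to read Ghost's configuration
Output
1. A Ghost import file in JSON format, containing the posts
2. A Ghost import file in zip format, containing the images; the two files must be imported separately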
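The two output files are named after the Jianshu archive. The script derives the names with `str.replace('.rar', ...)`; the sketch below shows the same naming done with `pathlib`, which only touches the final suffix. The function name `output_paths` and the sample path are assumptions made for illustration.

```python
from pathlib import Path


def output_paths(rar_path):
    """Return (json_path, zip_path) placed next to the given .rar file."""
    src = Path(rar_path)
    json_path = src.with_suffix('.json')                   # posts
    zip_path = src.with_name(f'{src.stem}-pictures.zip')   # images
    return json_path, zip_path


# Hypothetical archive path, used only to show the resulting names.
json_path, zip_path = output_paths('/tmp/user-123-456.rar')
print(json_path.name)
print(zip_path.name)
```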

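Script dependencies

The script calls the system's 7z command to compress and extract archives, so make sure 7z works from your command line before running it.
It uses requests to download the images from Jianshu; install it with pip install requests.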
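A quick way to confirm both dependencies before running anything is sketched below. It only uses the standard library (`shutil.which` and `importlib.util.find_spec`); the function name `missing_dependencies` is hypothetical and not part of the script.

```python
import importlib.util
import shutil


def missing_dependencies(commands=('7z',), modules=('requests',)):
    """Return the names of required commands and modules that cannot be found."""
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


missing = missing_dependencies()
if missing:
    print('Missing:', ', '.join(missing))
else:
    print('All dependencies found.')
```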

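Running the script

Find the main function, change its four parameters to your own values, and run it. The generated files are written to the same directory as the rar file exported from Jianshu. Note that Jianshu replaces special characters in article titles with "-" when exporting, presumably for portability; that is Jianshu's doing, not the script's.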

Setting the parameters

Visit my Ghost site to see the result: http://ray.twig.ink


import os
import json
from pathlib import Path
import datetime
import subprocess


def handle_img(post_info, save_path, featured_first_img):
    """Download the images and replace their links."""
    md_str = post_info['markdown']
    if 'https://upload-images' not in md_str:
        return md_str

    import re
    import requests
    # Match Markdown image links
    pattern = r'!\[(.*?)\]\((.*?)\)'  # matches image links of the form ![alt text](image_url)

    now = datetime.datetime.now()
    _rel_path = f'/content/images/{now.year}/{now.month}/'
    ghost_image_path = f'__GHOST_URL__{_rel_path}'
    image_save_path = f'{save_path}{_rel_path}'
    if not os.path.exists(image_save_path):
        os.makedirs(image_save_path)

    # Download the images
    matches = re.findall(pattern, md_str)
    for alt, url in matches:
        img_url = url.split('?')[0]
        img_file_name = img_url.split('/')[-1]
        # image_save_path already ends with '/', so no extra separator is needed
        image_save_url = f'{image_save_path}{img_file_name}'
        print(f'downloading.. {url}')
        # A timeout keeps one stalled download from hanging the whole run
        response = requests.get(url, timeout=30)
        if response.status_code == 200:
            with open(image_save_url, 'wb') as file:
                file.write(response.content)

        if featured_first_img and post_info['feature_image'] is None:
            # ghost_image_path also ends with '/'; a second one would put '//' in the URL
            post_info['feature_image'] = f'{ghost_image_path}{img_file_name}'

    # Replace the image links in the original text
    def replace_image_url(match):
        alt_text = match.group(1)
        original_url = match.group(2)
        # Extract the image file name
        image_name = os.path.basename(original_url.split('?')[0])
        # Build the new image link
        new_url = f'{ghost_image_path}{image_name}'
        return f'![{alt_text}]({new_url})'
    res = re.sub(pattern, replace_image_url, md_str)
    return res

def md_to_mobiledoc(markdown, mobiledoc_version):
    mobiledoc = json.dumps({
        'version': mobiledoc_version,
        'markups': [],
        'atoms': [],
        'cards': [['markdown', {'cardName': 'markdown', 'markdown': markdown}]],
        'sections': [[10, 0]]
    }, ensure_ascii=False)
    return mobiledoc

def generate_uuid():
    import uuid
    return str(uuid.uuid4())

def generate_id():
    """Generate an id in Ghost's format. It is not actually used on import; Ghost generates a new one."""
    custom_id = generate_uuid().replace('-', '')[-24:]
    return custom_id

def read_jianshu(zip_path: str):
    """Read all of the Markdown files exported from Jianshu."""
    _path = Path(zip_path)

    extract_to = os.path.join(_path.parent, _path.stem)
    unzip_file(zip_path, extract_to)
    posts = []
    tags = {}
    for md_file in find_md_files(extract_to):
        # print(f"Found MD file: {md_file}")
        __path = Path(md_file)
        with open(md_file, 'r', encoding='utf-8') as file:
            tag = __path.parent.name
            if tag not in tags.keys():
                tags[tag] = generate_id()
            tag_id = tags[tag]
            posts.append({
                'id': generate_id(),
                'tag': tag,
                'tag_id': tag_id,
                'title': __path.stem,
                'markdown': file.read(),
                'feature_image': None
            })
    return posts, tags

def unzip_file(zip_path, extract_to):
    """Extract the rar file into the given directory."""
    if not os.path.exists(extract_to):
        os.makedirs(extract_to)
    res = subprocess.run(['7z', 'x', zip_path, f'-o{extract_to}', '-aoa'], capture_output=True, text=True)
    print(res.stdout)

def zip_file(folder_to_compress, compress_to):
    """Compress a folder into an archive."""
    res = subprocess.run(['7z', 'a', compress_to, folder_to_compress], capture_output=True, text=True)
    print(res.stdout)

def find_md_files(directory):
    """Walk the directory recursively and yield every .md file."""
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.md'):
                yield os.path.join(root, file)

def build_ghost(post_infos: list[dict], ghost_config: dict, tags) -> dict:
    """Assemble the posts from the information gathered so far."""
    from datetime import datetime, timezone
    # Format the current UTC time as an ISO 8601 timestamp
    current_time = datetime.now(timezone.utc)
    formatted_time = current_time.strftime('%Y-%m-%dT%H:%M:%S.000Z')

    author_id = ghost_config['db'][0]['data']['users'][0]['id']

    _model = {
        'posts_authors': [{
            'id': generate_id(),
            "post_id": post['id'],
            "author_id": author_id,
            "sort_order": 0
        }for post in post_infos],
        'posts': [{
            "id": post['id'],
            "uuid": generate_uuid(),
            "title": post['title'],
            "feature_image": post['feature_image'],
            "mobiledoc": post['mobiledoc'],
            "type": 'post',
            "status": post['post_status'],
            "visibility": "public",
            "email_recipient_filter": "all",
            "created_at": formatted_time,
            "updated_at": formatted_time,
            "published_at": formatted_time,
            "show_title_and_feature_image": 1
        } for post in post_infos],
        'posts_tags': [{
            "id": generate_id(),
            "post_id": post['id'],
            "tag_id": post['tag_id'],
            "sort_order": 0
        } for post in post_infos],
        'tags': [{
            'id': tag_id,
            'name': tag,
            "visibility": "public",
            "created_at": formatted_time,
            "updated_at": formatted_time

        } for tag, tag_id in tags.items()],
    }
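    res = ghost_config

    res_post = res['db'][0]['data']
    # Ghost imports are incremental, so the previously exported posts need not be kept
    res_post['posts'] = _model['posts']
    res_post['tags'] = _model['tags']
    res_post['posts_tags'] = _model['posts_tags']
    # Also write the author links built above; otherwise they are computed but never used
    res_post['posts_authors'] = _model['posts_authors']
    return res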

def get_mobiledoc_version(ghost_config):
    _mobiledoc_str = ghost_config['db'][0]['data']['posts'][0]['mobiledoc']
    _mobiledoc = json.loads(_mobiledoc_str)
    return _mobiledoc['version']

def main():
    # Path to the Jianshu export
    zip_path = '/Users/era/Downloads/user-7914065-1730503948.rar'
    # Ghost export file; the script reads data from its posts, so the export must contain at least one post
    ghost_json_path = '/Users/era/Downloads/tui-ge.ghost.2024-11-02-00-00-48.json'
    # Status for the imported posts: 'draft' or 'published'
    post_status = 'published'
    # Use the first image of each post as its feature image
    first_img_as_feature = True

    post_infos, tags = read_jianshu(zip_path)
    # Read the export as UTF-8 explicitly rather than relying on the platform default
    with open(ghost_json_path, encoding='utf-8') as file:
        ghost_config = json.load(file)
    # mobiledoc version
    mobiledoc_version = get_mobiledoc_version(ghost_config)

    for info in post_infos:
        # Replace the image links in the Markdown first, then convert it to mobiledoc
        md_str = handle_img(info, Path(zip_path).parent, first_img_as_feature)
        info['mobiledoc'] = md_to_mobiledoc(md_str, mobiledoc_version)
        info['post_status'] = post_status
    print('download completed.')

    ghost_res = build_ghost(post_infos, ghost_config, tags)

    # Paths of the output files
    output_json_path = zip_path.replace('.rar', '.json')
    output_zip_path = zip_path.replace('.rar', '-pictures.zip')
    with open(output_json_path, 'w', encoding='utf-8') as json_file:
        json.dump(ghost_res, json_file, indent=4, ensure_ascii=False)

    zip_file(f'{Path(zip_path).parent}/content', output_zip_path)

    print(f"All done! Data saved to {output_json_path},{output_zip_path}")



if __name__ == "__main__":
    """
        pip install requests
        Make sure the 7z command is available
    """
    main()
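
Before running the full script against real data, it can help to check the link rewriting on a sample string without downloading anything. The sketch below reuses the same regular expression and file-name logic as `handle_img`; the function name `rewrite_image_links`, the sample URL, and the target prefix are placeholders chosen for illustration.

```python
import os
import re

# Same pattern as in handle_img: ![alt text](image_url)
IMAGE_PATTERN = r'!\[(.*?)\]\((.*?)\)'


def rewrite_image_links(markdown, new_prefix):
    """Point every Markdown image at new_prefix, keeping only the file name."""
    def _replace(match):
        alt_text, original_url = match.group(1), match.group(2)
        # Drop the query string, then keep the last path component
        image_name = os.path.basename(original_url.split('?')[0])
        return f'![{alt_text}]({new_prefix}{image_name})'
    return re.sub(IMAGE_PATTERN, _replace, markdown)


# Placeholder URL; real Jianshu links start with https://upload-images
sample = 'Intro ![shot](https://upload-images.example.invalid/a/b/pic-1.png?imageMogr2/auto-orient) end'
rewritten = rewrite_image_links(sample, '__GHOST_URL__/content/images/2024/11/')
print(rewritten)
# Intro ![shot](__GHOST_URL__/content/images/2024/11/pic-1.png) end
```

Because only the file name is kept, two images with the same name in different source folders would end up at the same target path, which is worth checking if your export contains such duplicates.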

References

JSON structure: https://ghost.org/docs/migration/custom/
Importing images: https://ghost.org/help/imports/#image-imports
Importing content: https://ghost.org/docs/migration/content/

Last edited: 2024.11.03 09:22:03

© Copyright belongs to the author. For reprints or content partnerships, please contact the author.
