An Analysis of Python's JSON Decoder

JSON (JavaScript Object Notation) is a subset of
JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
interchange format.

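JSON is a lightweight data-interchange format. It stores and represents data in a text format that is completely independent of any programming language. Its concise, clear hierarchical structure makes JSON an ideal data-interchange language: it is easy for humans to read and write, easy for machines to parse and generate, and it helps improve network transmission efficiency.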

Python's json library

Python ships with a json library, which consists mainly of three parts: the Encoder, the Decoder, and the Scanner.

The simplest example

import json
s = {
    'a': 'a',
    'b': 'b'
}
print(json.dumps(s))
# {"a": "a", "b": "b"}
s = '{"a": "a", "b": "b"}'
print(json.loads(s))
# {'a': 'a', 'b': 'b'}

def loads

Function definition

def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).

    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs.  The
    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
    This feature can be used to implement custom decoders that rely on the
    order that the key and value pairs are decoded (for example,
    collections.OrderedDict will remember the order of insertion). If
    ``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

    ``parse_float``, if specified, will be called with the string
    of every JSON float to be decoded. By default this is equivalent to
    float(num_str). This can be used to use another datatype or parser
    for JSON floats (e.g. decimal.Decimal).

    ``parse_int``, if specified, will be called with the string
    of every JSON int to be decoded. By default this is equivalent to
    int(num_str). This can be used to use another datatype or parser
    for JSON integers (e.g. float).

    ``parse_constant``, if specified, will be called with one of the
    following strings: -Infinity, Infinity, NaN.
    This can be used to raise an exception if invalid JSON numbers
    are encountered.

    To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
    kwarg; otherwise ``JSONDecoder`` is used.

    The ``encoding`` argument is ignored and deprecated.

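Both object_hook and object_pairs_hook can be used to customize decoding. The difference is in what they receive: object_hook is passed the already decoded dict, while object_pairs_hook is passed an ordered list of (key, value) tuples. When both are given, only object_pairs_hook is called.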

An example

import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x))
# <class 'dict'> {'a': 1, 'b': 2, 'c': 3}
json.loads(j, object_pairs_hook=lambda x: print(type(x), x))
# <class 'list'> [('a', 1), ('b', 2), ('c', 3)]
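
The lambdas above only print their argument, so loads itself returns None. A minimal sketch of hooks that return a value follows; the reject_duplicates function is a hypothetical helper written for this illustration, not part of the json library.

```python
import json
from collections import OrderedDict


def reject_duplicates(pairs):
    """Hypothetical hook: build a dict but refuse repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


doc = '{"a": 1, "b": 2, "c": 3}'

# The return value of the hook replaces the dict in the decoded result.
ordered = json.loads(doc, object_pairs_hook=OrderedDict)
print(list(ordered.items()))
# [('a', 1), ('b', 2), ('c', 3)]

# When both hooks are given, only object_pairs_hook is called.
calls = []
json.loads(
    doc,
    object_hook=lambda d: calls.append("object_hook") or d,
    object_pairs_hook=lambda p: calls.append("object_pairs_hook") or dict(p),
)
print(calls)
# ['object_pairs_hook']
```

Passing reject_duplicates as object_pairs_hook would raise ValueError for input such as '{"a": 1, "a": 2}', which the default decoder accepts by keeping the last value.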

parse_float, parse_int, and parse_constant can be used to convert floats, integers, and constants such as NaN into other values.

Another example

import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x), parse_int=str)
# <class 'dict'> {'a': '1', 'b': '2', 'c': '3'}
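
The other two parameters work the same way. The sketch below parses floats as decimal.Decimal and uses parse_constant to reject NaN and Infinity; reject_constant is a hypothetical helper name chosen for this example.

```python
import json
from decimal import Decimal


def reject_constant(name):
    """Hypothetical helper: refuse NaN, Infinity and -Infinity."""
    raise ValueError(f"non-standard JSON constant: {name}")


# parse_float receives the raw string of each float, so no precision is lost.
prices = json.loads('{"price": 0.1, "qty": 3}', parse_float=Decimal)
print(prices)
# {'price': Decimal('0.1'), 'qty': 3}

# parse_constant is called with the constant's name; raising here aborts the parse.
try:
    json.loads('[NaN]', parse_constant=reject_constant)
except ValueError as err:
    print(err)
# non-standard JSON constant: NaN
```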

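To use a custom decoder, create a subclass of JSONDecoder and pass it through the cls parameter.
In addition, encoding is deprecated and passing it has no effect; the parameter was removed entirely in Python 3.9.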
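
As a rough sketch of the cls route, the subclass below installs its own object_hook. The class name TaggedDecoder and the "__complex__" tag are assumptions made for this illustration only.

```python
import json


class TaggedDecoder(json.JSONDecoder):
    """Hypothetical decoder: turns {"__complex__": [re, im]} into a complex."""

    def __init__(self, **kwargs):
        # Install our hook unless the caller supplied one explicitly.
        kwargs.setdefault("object_hook", self._hook)
        super().__init__(**kwargs)

    @staticmethod
    def _hook(obj):
        if "__complex__" in obj:
            real, imag = obj["__complex__"]
            return complex(real, imag)
        return obj


value = json.loads('{"z": {"__complex__": [1, 2]}}', cls=TaggedDecoder)
print(value)
# {'z': (1+2j)}
```

Because the hook runs for every decoded object, innermost first, the tagged dict is replaced before the outer dict is built.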

JSONDecoder

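The loads function calls JSONDecoder(**kw).decode() to do the parsing. In its constructor, JSONDecoder sets up the parsing function for each kind of value, along with the scanner. decode calls raw_decode, which starts scanning at the first non-whitespace character.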

def __init__(self, *, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, strict=True,
        object_pairs_hook=None):
    self.object_hook = object_hook
    self.parse_float = parse_float or float
    self.parse_int = parse_int or int
    self.parse_constant = parse_constant or _CONSTANTS.__getitem__
    self.strict = strict
    self.object_pairs_hook = object_pairs_hook
    self.parse_object = JSONObject
    self.parse_array = JSONArray
    self.parse_string = scanstring
    self.memo = {}
    self.scan_once = scanner.make_scanner(self)

def decode(self, s, _w=WHITESPACE.match):
    """Return the Python representation of ``s`` (a ``str`` instance
    containing a JSON document).

    """
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
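    # _w(s, start_idx).end() returns the index of the first non-whitespace
    # character at or after start_idx; in other words, it skips whitespace.
    # WHITESPACE is a compiled regular expression that is not covered here;
    # _w has this meaning throughout the rest of the article.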
    end = _w(s, end).end()
    if end != len(s):
        raise JSONDecodeError("Extra data", s, end)
    return obj

def raw_decode(self, s, idx=0):
    """Decode a JSON document from ``s`` (a ``str`` beginning with
    a JSON document) and return a 2-tuple of the Python
    representation and the index in ``s`` where the document ended.

    This can be used to decode a JSON document from a string that may
    have extraneous data at the end.

    """
    try:
        obj, end = self.scan_once(s, idx)
    except StopIteration as err:
        raise JSONDecodeError("Expecting value", s, err.value) from None
    return obj, end
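
The difference between the two methods shows up when a string holds more than one document. A minimal sketch, assuming a hypothetical helper named iter_documents: decode rejects trailing data, while raw_decode returns the end index so the caller can continue from there.

```python
import json


def iter_documents(text):
    """Hypothetical helper: yield each JSON document found in text."""
    decoder = json.JSONDecoder()
    idx, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < end and text[idx] in " \t\n\r":
            idx += 1
        if idx == end:
            return
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


stream = '{"a": 1} [2, 3]  "x"'
print(list(iter_documents(stream)))
# [{'a': 1}, [2, 3], 'x']

# decode (and therefore loads) insists that nothing follows the document.
try:
    json.loads('{"a": 1} [2, 3]')
except json.JSONDecodeError as err:
    print(err.msg, err.pos)
# Extra data 9
```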

scanner

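make_scanner in the scanner module prefers the C implementation of the scanner when it is available. Here we look only at the pure-Python scanner, which is py_make_scanner.
py_make_scanner takes the object that calls it as its context. raw_decode calls the scan_once function that py_make_scanner returns, and scan_once in turn calls _scan_once, the function that ultimately does the scanning.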

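_scan_once first reads the character at idx, decides from that character which data type follows, and dispatches the string to the matching handler for parsing. If no type matches, or the end of the string has been reached, it raises StopIteration.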

def _scan_once(string, idx):
    try:
        nextchar = string[idx]
    except IndexError:
        raise StopIteration(idx)

    if nextchar == '"':
        return parse_string(string, idx + 1, strict)
    elif nextchar == '{':
        return parse_object((string, idx + 1), strict,
            _scan_once, object_hook, object_pairs_hook, memo)
    elif nextchar == '[':
        return parse_array((string, idx + 1), _scan_once)
    elif nextchar == 'n' and string[idx:idx + 4] == 'null':
        return None, idx + 4
    elif nextchar == 't' and string[idx:idx + 4] == 'true':
        return True, idx + 4
    elif nextchar == 'f' and string[idx:idx + 5] == 'false':
        return False, idx + 5

    m = match_number(string, idx)
    if m is not None:
        integer, frac, exp = m.groups()
        if frac or exp:
            res = parse_float(integer + (frac or '') + (exp or ''))
        else:
            res = parse_int(integer)
        return res, m.end()
    elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
        return parse_constant('NaN'), idx + 3
    elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
        return parse_constant('Infinity'), idx + 8
    elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
        return parse_constant('-Infinity'), idx + 9
    else:
        raise StopIteration(idx)
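
To watch this dispatch in isolation, the pure-Python scanner can be built directly from a JSONDecoder instance, which supplies all the attributes the scanner reads from its context. This is only an exploratory sketch; py_make_scanner is an internal name in json.scanner rather than a documented API.

```python
import json
from json.scanner import py_make_scanner

# A JSONDecoder instance carries parse_object, parse_array, strict, memo, etc.
scan_once = py_make_scanner(json.JSONDecoder())

# Returns the decoded value and the index just past it; trailing text is ignored.
print(scan_once('[1, 2.5, "x"] tail', 0))
# ([1, 2.5, 'x'], 13)

# Scanning can start at any index.
print(scan_once('xtrue', 1))
# (True, 5)

# Leading whitespace matches no branch, so StopIteration carries the index.
try:
    scan_once('  1', 0)
except StopIteration as err:
    print(err.value)
# 0
```

The last case shows why decode skips whitespace with _w before calling raw_decode.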

parse_object

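parse_object first checks whether the next character is } or the start of a "xxxx": pair, and raises an exception if it is neither. For a "xxxx": pair, it runs scan_once again on the text after the : (recursing into the value) and appends the result to the list of pairs. It then looks at the next character: if it is a comma, it loops and runs scan_once again; if it is }, parsing ends and object_hook is called for custom processing.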

def JSONObject(s_and_end, strict, scan_once, object_hook, object_pairs_hook,
               memo=None, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    pairs = []
    pairs_append = pairs.append
    # Backwards compatibility
    if memo is None:
        memo = {}
    memo_get = memo.setdefault
    # Use a slice to prevent IndexError from being raised, the following
    # check will raise a more specific ValueError if the string is empty
    nextchar = s[end:end + 1]
    # Normally we expect nextchar == '"'
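    # To avoid calling the regex too often, a plain if handles the common case
    # of little or no whitespace; the regex is used to find the end of the
    # whitespace only when that check fails. The same pattern recurs below.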
    if nextchar != '"':  
        if nextchar in _ws:
            end = _w(s, end).end()
            nextchar = s[end:end + 1]
        # Trivial empty object
        if nextchar == '}':
            if object_pairs_hook is not None:
                result = object_pairs_hook(pairs)
                return result, end + 1
            pairs = {}
            if object_hook is not None:
                pairs = object_hook(pairs)
            return pairs, end + 1
        elif nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end)
    end += 1
    while True:
        key, end = scanstring(s, end, strict)
        key = memo_get(key, key)
        # To skip some function call overhead we optimize the fast paths where
        # the JSON key separator is ": " or just ":".
        if s[end:end + 1] != ':':
            end = _w(s, end).end()
            if s[end:end + 1] != ':':
                raise JSONDecodeError("Expecting ':' delimiter", s, end)
        end += 1

        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        pairs_append((key, value))
        try:
            nextchar = s[end]
            if nextchar in _ws:
                end = _w(s, end + 1).end()
                nextchar = s[end]
        except IndexError:
            nextchar = ''
        end += 1

        if nextchar == '}':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        end = _w(s, end).end()
        nextchar = s[end:end + 1]
        end += 1
        if nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end - 1)
    if object_pairs_hook is not None:
        result = object_pairs_hook(pairs)
        return result, end
    pairs = dict(pairs)
    if object_hook is not None:
        pairs = object_hook(pairs)
    return pairs, end
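
Each raise in the function above corresponds to a distinct malformed input. The sketch below reports where decoding fails; error_position is a hypothetical helper, and the positions follow from the index arithmetic in the code above.

```python
import json


def error_position(text):
    """Hypothetical helper: index at which decoding fails, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.pos
    return None


# Key not in double quotes: fails at the character after '{'.
print(error_position('{a: 1}'))
# 1

# Missing ':' after the key: fails at the first non-whitespace character.
print(error_position('{"a" 1}'))
# 5

# Missing ',' between pairs: fails where the comma was expected.
print(error_position('{"a": 1 "b": 2}'))
# 8

# Without a hook, dict(pairs) keeps the last value of a repeated key.
print(json.loads('{"a": 1, "a": 2}'))
# {'a': 2}
```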

parse_array

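parse_array first checks whether the next character is ]; if so, it returns []. Otherwise it runs scan_once again on the text after the [ (recursing into the element) and appends the result to the list. It then looks at the next character: if it is a comma, it loops and runs scan_once again; if it is ], parsing ends.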

def JSONArray(s_and_end, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    values = []
    nextchar = s[end:end + 1]
    if nextchar in _ws:
        end = _w(s, end + 1).end()
        nextchar = s[end:end + 1]
    # Look-ahead for trivial empty array
    if nextchar == ']':
        return values, end + 1
    _append = values.append
    while True:
        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        _append(value)
        nextchar = s[end:end + 1]
        if nextchar in _ws:
            end = _w(s, end + 1).end()
            nextchar = s[end:end + 1]
        end += 1
        if nextchar == ']':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

    return values, end
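
A short sketch of the array behavior described above, using only json.loads: whitespace is tolerated around brackets and commas, nesting recurses through scan_once, and a missing comma is reported at the position where it was expected.

```python
import json

# Whitespace inside an empty array is skipped before the ']' look-ahead.
print(json.loads('[ ]'))
# []

# Nested arrays are parsed by recursive scan_once calls.
print(json.loads('[1 , [2, [3]] ,\n "x"]'))
# [1, [2, [3]], 'x']

# Two values without a separator hit the "Expecting ',' delimiter" branch.
try:
    json.loads('[1 2]')
except json.JSONDecodeError as err:
    print(err.msg, err.pos)
# Expecting ',' delimiter 3
```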