JSON (JavaScript Object Notation) is a subset of
JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
interchange format.
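JSON is a lightweight data-interchange format. It stores and represents data in a text format that is completely independent of any programming language. Its concise, clear hierarchical structure makes JSON an ideal data-interchange language: it is easy for people to read and write, easy for machines to parse and generate, and compact enough to transmit efficiently over a network.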
Python's json library
Python ships with a built-in json library, which consists of three main parts: the Encoder, the Decoder and the Scanner.
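The three parts are exposed as json.JSONEncoder, json.JSONDecoder and the json.scanner module. A small check of how they relate, assuming CPython 3: with default options, json.dumps and json.loads delegate to an encoder/decoder instance, and make_scanner is whichever scanner implementation is available. The last line typically prints True when the C accelerator module is present.

```python
import json
import json.scanner

data = {"a": [1, 2.5, None]}

# Encoder: with default options, json.dumps delegates to a JSONEncoder
text = json.JSONEncoder().encode(data)
print(text)  # {"a": [1, 2.5, null]}
assert text == json.dumps(data)

# Decoder: with default options, json.loads delegates to a JSONDecoder
assert json.JSONDecoder().decode(text) == json.loads(text) == data

# Scanner: make_scanner is the C scanner when the _json accelerator is
# available, otherwise the pure-Python py_make_scanner
print(json.scanner.make_scanner is json.scanner.c_make_scanner)
```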
The simplest example
import json
s = {
    'a': 'a',
    'b': 'b'
}
print(json.dumps(s))
# {"a": "a", "b": "b"}
s = '{"a": "a", "b": "b"}'
print(json.loads(s))
# {'a': 'a', 'b': 'b'}
def loads
Function definition
def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).

``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).

``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).

``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.

To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.

The ``encoding`` argument is ignored and deprecated.
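Both object_hook and object_pairs_hook let you customize decoding. The difference is what the hook receives: object_hook is called with the decoded dict, while object_pairs_hook is called with an ordered list of (key, value) tuples. When both are given, only object_pairs_hook is called.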
An example
import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x))
# <class 'dict'> {'a': 1, 'b': 2, 'c': 3}
json.loads(j, object_pairs_hook=lambda x: print(type(x), x))
# <class 'list'> [('a', 1), ('b', 2), ('c', 3)]
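Because object_pairs_hook sees every pair in document order, including repeated keys that a plain dict would silently overwrite, it is a natural place for checks such as rejecting duplicate keys. A minimal sketch; reject_duplicates is a hypothetical helper, not part of the json module. The last call also confirms that object_pairs_hook wins when both hooks are given.

```python
import json

def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook: refuse objects with repeated keys."""
    result = {}
    for key, value in pairs:          # pairs keeps document order
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates))
# {'a': 1, 'b': 2}

# Without a hook, dict(pairs) silently keeps the last duplicate
print(json.loads('{"a": 1, "a": 2}'))
# {'a': 2}

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
# duplicate key: 'a'

# With both hooks given, only object_pairs_hook is called
both = json.loads('{"a": 1}',
                  object_hook=lambda d: 'object_hook',
                  object_pairs_hook=lambda p: 'pairs_hook')
print(both)
# pairs_hook
```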
parse_float, parse_int and parse_constant let you convert JSON floats, JSON integers and the constants NaN, Infinity and -Infinity into other types.
Another example
import json
j = '{"a": 1,"b": 2,"c": 3}'
json.loads(j, object_hook=lambda x: print(type(x), x), parse_int=str)
# <class 'dict'> {'a': '1', 'b': '2', 'c': '3'}
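Two uses the docstring hints at, sketched below: parse_float=decimal.Decimal keeps decimal fractions exact, because the hook receives the literal text rather than an already-rounded float; and a parse_constant that raises turns Python's lenient acceptance of NaN/Infinity into an error. reject_constant is a hypothetical helper.

```python
import json
from decimal import Decimal

# parse_float receives the raw literal text, so Decimal keeps it exact
data = json.loads('{"price": 0.1, "qty": 3}', parse_float=Decimal)
print(data)
# {'price': Decimal('0.1'), 'qty': 3}

def reject_constant(name):
    """Hypothetical parse_constant: refuse NaN / Infinity / -Infinity."""
    raise ValueError(f"invalid JSON number: {name}")

try:
    json.loads('[1, NaN]', parse_constant=reject_constant)
except ValueError as err:
    print(err)
# invalid JSON number: NaN
```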
To use a custom decoder, create a subclass of JSONDecoder and pass it through the cls argument.
The encoding argument is deprecated and has no effect; it was removed from json.loads entirely in Python 3.9.
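A minimal sketch of the cls route, assuming we want objects of the form {"x": ..., "y": ...} decoded as tuples. PointDecoder and _to_point are hypothetical names. The subclass installs its own object_hook unless the caller passes one, since loads forwards any explicitly given hooks to cls(**kw).

```python
import json

class PointDecoder(json.JSONDecoder):
    """Hypothetical decoder: turns {"x": ..., "y": ...} objects into tuples."""

    def __init__(self, **kwargs):
        # Keep a caller-supplied object_hook; otherwise install ours
        kwargs.setdefault("object_hook", self._to_point)
        super().__init__(**kwargs)

    @staticmethod
    def _to_point(obj):
        if set(obj) == {"x", "y"}:
            return (obj["x"], obj["y"])
        return obj

print(json.loads('{"p": {"x": 1, "y": 2}, "q": [3]}', cls=PointDecoder))
# {'p': (1, 2), 'q': [3]}
```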
JSONDecoder
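loads builds a JSONDecoder (or the class given via cls) from the keyword arguments and calls its decode() method, i.e. JSONDecoder(**kw).decode(s); when no options are given, it reuses a module-level default JSONDecoder instance. The JSONDecoder constructor sets up the parsing function for each value type, together with the scanner. decode calls raw_decode, which starts scanning at the first non-whitespace character.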
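A quick check of the behaviour described above, using only public names: loads with options matches building the decoder yourself, and decode tolerates whitespace around the document but reports anything else after it as "Extra data" (the position is the first offending character).

```python
import json

s = '  {"n": 42}\n'

# loads(s, **kw) amounts to JSONDecoder(**kw).decode(s)
decoder = json.JSONDecoder(parse_int=float)
assert json.loads(s, parse_int=float) == decoder.decode(s)
print(decoder.decode(s))
# {'n': 42.0}

# decode skips whitespace around the document, but rejects anything else
try:
    json.JSONDecoder().decode('{"a": 1} x')
except json.JSONDecodeError as err:
    print(err.msg, err.pos)
# Extra data 9
```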
def __init__(self, *, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, strict=True,
        object_pairs_hook=None):
    self.object_hook = object_hook
    self.parse_float = parse_float or float
    self.parse_int = parse_int or int
    self.parse_constant = parse_constant or _CONSTANTS.__getitem__
    self.strict = strict
    self.object_pairs_hook = object_pairs_hook
    self.parse_object = JSONObject
    self.parse_array = JSONArray
    self.parse_string = scanstring
    self.memo = {}
    self.scan_once = scanner.make_scanner(self)
def decode(self, s, _w=WHITESPACE.match):
    """Return the Python representation of ``s`` (a ``str`` instance
    containing a JSON document).
    """
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    # _w(s, start).end() returns the index of the first non-whitespace
    # character at or after `start`, i.e. it skips whitespace.
    # WHITESPACE is a regex defined in the module (r'[ \t\n\r]*'); we won't
    # go into it here. _w has this meaning everywhere below.
    end = _w(s, end).end()
    if end != len(s):
        raise JSONDecodeError("Extra data", s, end)
    return obj
def raw_decode(self, s, idx=0):
    """Decode a JSON document from ``s`` (a ``str`` beginning with
    a JSON document) and return a 2-tuple of the Python
    representation and the index in ``s`` where the document ended.
    This can be used to decode a JSON document from a string that may
    have extraneous data at the end.
    """
    try:
        obj, end = self.scan_once(s, idx)
    except StopIteration as err:
        raise JSONDecodeError("Expecting value", s, err.value) from None
    return obj, end
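Since raw_decode returns the end index instead of insisting on the end of the string, it can be used to read several concatenated JSON documents from one string. A minimal sketch; iter_documents and JSON_WS are hypothetical names. Note that raw_decode itself does not skip leading whitespace (scanning a space raises "Expecting value"), so the loop skips it first, much as decode does.

```python
import json

JSON_WS = " \t\n\r"   # the whitespace characters JSON allows

def iter_documents(text):
    """Hypothetical helper: yield each JSON value from concatenated documents."""
    decoder = json.JSONDecoder()
    idx, n = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while idx < n and text[idx] in JSON_WS:
            idx += 1
        if idx == n:
            return
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

print(list(iter_documents('{"a": 1} [2, 3]\n"x"')))
# [{'a': 1}, [2, 3], 'x']
```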
scanner
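make_scanner in the scanner module prefers the C implementation (c_make_scanner, from the _json extension module) when it is available. Here we only look at the pure-Python scanner, which make_scanner falls back to: py_make_scanner.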
py_make_scanner takes the object that calls it (the JSONDecoder instance) as its context and returns a scan_once function. raw_decode calls this scan_once, which in turn calls _scan_once, the function that actually does the scanning.
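_scan_once first reads the character at idx, uses it to decide which type of value starts there, and dispatches the string to the corresponding parsing function. If the character matches none of the types, or the end of the input has been reached, it raises StopIteration.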
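The pure-Python scanner can be built and called directly to watch this dispatch. A sketch assuming CPython's json.scanner, where py_make_scanner is a module-level function (it is not in __all__, but it is importable). The returned scan_once yields (value, end_index), or raises StopIteration carrying the failing index, which raw_decode turns into "Expecting value".

```python
import json
from json.scanner import py_make_scanner

# The decoder instance is the "context": py_make_scanner reads its
# parse_object, parse_array, parse_string, hooks, strict flag and memo
scan_once = py_make_scanner(json.JSONDecoder())

print(scan_once('[1, 2] trailing', 0))  # ([1, 2], 6)
print(scan_once('x = true', 4))         # (True, 8)

try:
    scan_once('x', 0)                   # 'x' starts no JSON value
except StopIteration as err:
    print('StopIteration at', err.value)
# StopIteration at 0
```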
def _scan_once(string, idx):
    try:
        nextchar = string[idx]
    except IndexError:
        raise StopIteration(idx)
    if nextchar == '"':
        return parse_string(string, idx + 1, strict)
    elif nextchar == '{':
        return parse_object((string, idx + 1), strict,
            _scan_once, object_hook, object_pairs_hook, memo)
    elif nextchar == '[':
        return parse_array((string, idx + 1), _scan_once)
    elif nextchar == 'n' and string[idx:idx + 4] == 'null':
        return None, idx + 4
    elif nextchar == 't' and string[idx:idx + 4] == 'true':
        return True, idx + 4
    elif nextchar == 'f' and string[idx:idx + 5] == 'false':
        return False, idx + 5
    m = match_number(string, idx)
    if m is not None:
        integer, frac, exp = m.groups()
        if frac or exp:
            res = parse_float(integer + (frac or '') + (exp or ''))
        else:
            res = parse_int(integer)
        return res, m.end()
    elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
        return parse_constant('NaN'), idx + 3
    elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
        return parse_constant('Infinity'), idx + 8
    elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
        return parse_constant('-Infinity'), idx + 9
    else:
        raise StopIteration(idx)
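The number branch hinges on match_number, which in CPython's json.scanner is NUMBER_RE.match. Its three groups (integer part, fraction, exponent) decide between parse_int and parse_float. Because the regex requires a digit after a leading '-', the text -Infinity does not match and falls through to the last elif. A small check of both points:

```python
import json
from json.scanner import NUMBER_RE

# match_number is NUMBER_RE.match; its three groups drive the int/float choice
print(NUMBER_RE.match('-12.5e-3').groups())
# ('-12', '.5', 'e-3')

# A '-' must be followed by a digit, so -Infinity does not match here
print(NUMBER_RE.match('-Infinity'))
# None

values = json.loads('[10, -2.5, 1e3, NaN, Infinity, -Infinity]')
print(values)
# [10, -2.5, 1000.0, nan, inf, -inf]
```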
parse_object
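parse_object first checks whether the next character is } (an empty object) or the double quote that opens a "key": pair; anything else raises an exception. For a "key": pair, it calls scan_once again on the text after the colon and appends the resulting (key, value) tuple to the pairs list. It then checks the next character: a comma means the loop continues with the next pair, while } ends parsing, after which object_pairs_hook or object_hook is applied.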
def JSONObject(s_and_end, strict, scan_once, object_hook, object_pairs_hook,
               memo=None, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    pairs = []
    pairs_append = pairs.append
    # Backwards compatibility
    if memo is None:
        memo = {}
    memo_get = memo.setdefault
    # Use a slice to prevent IndexError from being raised, the following
    # check will raise a more specific ValueError if the string is empty
    nextchar = s[end:end + 1]
    # Normally we expect nextchar == '"'
    # For efficiency the code avoids calling the regex whenever it can:
    # cheap if checks handle the common cases (no whitespace, or a single
    # space), and the regex is only called to skip longer runs of
    # whitespace. The same trick is used throughout the code below.
    if nextchar != '"':
        if nextchar in _ws:
            end = _w(s, end).end()
            nextchar = s[end:end + 1]
        # Trivial empty object
        if nextchar == '}':
            if object_pairs_hook is not None:
                result = object_pairs_hook(pairs)
                return result, end + 1
            pairs = {}
            if object_hook is not None:
                pairs = object_hook(pairs)
            return pairs, end + 1
        elif nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end)
    end += 1
    while True:
        key, end = scanstring(s, end, strict)
        key = memo_get(key, key)
        # To skip some function call overhead we optimize the fast paths where
        # the JSON key separator is ": " or just ":".
        if s[end:end + 1] != ':':
            end = _w(s, end).end()
            if s[end:end + 1] != ':':
                raise JSONDecodeError("Expecting ':' delimiter", s, end)
        end += 1
        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass
        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        pairs_append((key, value))
        try:
            nextchar = s[end]
            if nextchar in _ws:
                end = _w(s, end + 1).end()
                nextchar = s[end]
        except IndexError:
            nextchar = ''
        end += 1
        if nextchar == '}':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        end = _w(s, end).end()
        nextchar = s[end:end + 1]
        end += 1
        if nextchar != '"':
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes", s, end - 1)
    if object_pairs_hook is not None:
        result = object_pairs_hook(pairs)
        return result, end
    pairs = dict(pairs)
    if object_hook is not None:
        pairs = object_hook(pairs)
    return pairs, end
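Each raise in JSONObject corresponds to a specific malformed input. The sketch below triggers three of them through json.loads; in CPython, loads normally runs the C scanner, which, to our understanding, reports the same messages as the Python code above (positions are left out here on purpose).

```python
import json

bad_objects = [
    "{'a': 1}",         # key not in double quotes
    '{"a" 1}',          # missing ':' after the key
    '{"a": 1 "b": 2}',  # missing ',' between pairs
]
messages = []
for text in bad_objects:
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        messages.append(err.msg)
        print(err.msg)
# Expecting property name enclosed in double quotes
# Expecting ':' delimiter
# Expecting ',' delimiter
```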
parse_array
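parse_array first skips any whitespace and checks whether the next character is ]; if so, it returns an empty list. Otherwise it calls scan_once on the text after [ and appends the result to the list. It then checks the next character: a comma means scan_once runs again for the next element, while ] ends parsing.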
def JSONArray(s_and_end, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    values = []
    nextchar = s[end:end + 1]
    if nextchar in _ws:
        end = _w(s, end + 1).end()
        nextchar = s[end:end + 1]
    # Look-ahead for trivial empty array
    if nextchar == ']':
        return values, end + 1
    _append = values.append
    while True:
        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise JSONDecodeError("Expecting value", s, err.value) from None
        _append(value)
        nextchar = s[end:end + 1]
        if nextchar in _ws:
            end = _w(s, end + 1).end()
            nextchar = s[end:end + 1]
        end += 1
        if nextchar == ']':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter", s, end - 1)
        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass
    return values, end
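To see the array loop in isolation, here is a simplified re-implementation built only on public API. It is a sketch, not the library's code: skip_ws and parse_array_sketch are hypothetical names, skip_ws stands in for _w(s, i).end() without the single-space fast path, and JSONDecoder.raw_decode plays the role of scan_once for each element (so nested values still go through the real scanner).

```python
import json

JSON_WS = " \t\n\r"   # whitespace set, like json.decoder.WHITESPACE_STR

def skip_ws(s, i):
    """Hypothetical stand-in for _w(s, i).end(): index of next non-whitespace."""
    while i < len(s) and s[i] in JSON_WS:
        i += 1
    return i

_decoder = json.JSONDecoder()

def parse_array_sketch(s, end):
    """Parse the array whose '[' is at s[end - 1]; return (list, index after ']')."""
    values = []
    end = skip_ws(s, end)
    if s[end:end + 1] == "]":                       # trivial empty array
        return values, end + 1
    while True:
        value, end = _decoder.raw_decode(s, end)    # plays the role of scan_once
        values.append(value)
        end = skip_ws(s, end)
        nextchar = s[end:end + 1]
        end += 1
        if nextchar == "]":
            return values, end
        if nextchar != ",":
            raise ValueError(f"Expecting ',' delimiter at {end - 1}")
        end = skip_ws(s, end)

print(parse_array_sketch('[1, "a", [true, null]]', 1))
# ([1, 'a', [True, None]], 22)
```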