Today I needed to parse a very long JSON string and ran into quite a few problems along the way, so here is a summary of everything to watch out for.
First, the string. The original was very long; a trimmed-down version looks like this:
>>> s="{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, 'monitors': [{'use': 100, 'monitorurl': u'http://oss.lagou.com','monitorweight': 10L, 'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}"
This is evidently not the output of a proper json.dumps() call but of str(), applied to a structure built from dicts, lists, strings and long integers, including Chinese text stored as Unicode strings. In other words:
>>> origin={"product": u"\\u62c9\\u52fe\\u7f51", "downtime": 3.128,
"monitors": [{"use": 100, "monitorurl": u"http://oss.lagou.com","monitorweight": 10L,
"monitorname": u"\\u804c\\u4f4d\\u641c\\u7d22"}]}
>>> json.dumps(origin)
'{"product": "\\\\u62c9\\\\u52fe\\\\u7f51",
"monitors": [{"use": 100, "monitorweight": 10,
"monitorname": "\\\\u804c\\\\u4f4d\\\\u641c\\\\u7d22",
"monitorurl": "http://oss.lagou.com"}], "downtime": 3.1280000000000001}'
>>> str(origin)
"{'product': u'\\\\u62c9\\\\u52fe\\\\u7f51',
'monitors': [{'use': 100, 'monitorweight': 10L,
'monitorname': u'\\\\u804c\\\\u4f4d\\\\u641c\\\\u7d22',
'monitorurl': u'http://oss.lagou.com'}], 'downtime': 3.128}"
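The same contrast as a minimal Python 3 sketch. This is an assumption-laden stand-in: the sessions in this post are Python 2.6, and Python 3's str() output has neither the `10L` long suffix nor the `u` prefix, so only the quoting difference shows up here. The dict is a trimmed version of `origin`.

```python
import json

# Trimmed stand-in for the origin dict above (Python 3: plain int and str only).
origin = {"product": "拉勾网", "downtime": 3.128,
          "monitors": [{"use": 100, "monitorweight": 10}]}

# json.dumps() emits real JSON: double quotes, non-ASCII as \uXXXX escapes.
as_json = json.dumps(origin)
assert json.loads(as_json) == origin  # round-trips without any cleanup

# str() emits Python literal syntax instead: keys and values in single quotes.
as_repr = str(origin)
try:
    json.loads(as_repr)
except json.JSONDecodeError as exc:
    print("str() output is not JSON:", exc)
```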
Had the string been produced by json.dumps(), json.loads() would turn it straight back into an object. A string produced by str(), however, runs into all sorts of problems. These are the ones I hit:
1. Keys and string values must be in double quotes; single quotes are not allowed. With single quotes, json.loads() fails with: Expecting property name: line 1 column 1 (char 1)
>>> s1="{'a':'a'}";s2='{"a":"a"}'
>>> json.loads(s1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
raise ValueError(errmsg("Expecting property name", s, end))
ValueError: Expecting property name: line 1 column 1 (char 1)
>>> json.loads(s2)
{u'a': u'a'}
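2. After str(), keys and values end up in single quotes no matter whether the original literals used single or double quotes (the double quotes around the result are just how the REPL displays a string that contains single quotes). So the single quotes have to be replaced with double quotes: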
>>> s={"a":"a"};str(s)
"{'a': 'a'}"
>>> s={'a':'a'};str(s)
"{'a': 'a'}"
>>> s={'a':'a'};s1=str(s)
>>> s1
"{'a': 'a'}"
>>> s2=s1.replace('\'','\"')
>>> s2
'{"a": "a"}'
>>> json.loads(s2)
{u'a': u'a'}
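3. Unicode strings keep their u prefix after str(), and it has to be removed. (The traceback below still stops at the first single quote, i.e. problem 1; but even once the quotes are fixed, a u in front of a string is not valid JSON.)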
>>> s={'a':u'拉勾网'}
>>> s
{'a': u'\u62c9\u52fe\u7f51'}
>>> s1=str(s)
>>> s1
"{'a': u'\\u62c9\\u52fe\\u7f51'}"
>>> json.loads(s1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
raise ValueError(errmsg("Expecting property name", s, end))
ValueError: Expecting property name: line 1 column 1 (char 1)
4. Long integers keep their L suffix after str(), which also needs handling:
>>> s={"a":10L}
>>> s1=str(s)
>>> s1
"{'a': 10L}"
>>> # write the double-quoted version by hand so that only the L is left to trip the parser
>>> s1='{"a":10L}'
>>> json.loads(s1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib64/python2.6/json/decoder.py", line 193, in JSONObject
raise ValueError(errmsg("Expecting , delimiter", s, end - 1))
ValueError: Expecting , delimiter: line 1 column 7 (char 7)
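Points 2 to 4 are all plain text rewrites, so they can be bundled into one function. A minimal Python 3 sketch; `repr_to_json` and the sample `raw` string are hypothetical names for illustration. The u-prefix step uses a regex with a word boundary rather than a plain `replace('u"', '"')`, so a value such as "menu" keeps its final u. It is still blind substitution: a string value containing an apostrophe, a standalone u before a quote, or something like "3L" would be corrupted.

```python
import json
import re


def repr_to_json(text):
    """Hypothetical helper: rewrite Python 2 str() output of a dict/list so
    that json.loads() accepts it.

    Blind text substitution: assumes no string value contains a single
    quote, a standalone u directly before a double quote, or digits
    directly followed by a standalone L.
    """
    # Point 2: single quotes -> double quotes.
    text = text.replace("'", '"')
    # Point 3: drop the u prefix of string literals (u"abc" -> "abc").
    # The word boundary keeps values such as "menu" intact.
    text = re.sub(r'\bu"', '"', text)
    # Point 4: drop the L suffix of long integers (10L -> 10).
    text = re.sub(r'(\d+)L\b', r'\1', text)
    return text


# Sample in the same shape as the post's data (literal backslash-u escapes).
raw = "{'product': u'\\u62c9\\u52fe\\u7f51', 'monitorweight': 10L, 'use': 100}"
data = json.loads(repr_to_json(raw))
print(data["product"] == "\u62c9\u52fe\u7f51", data["monitorweight"])  # → True 10
```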
Finally, back to the complicated string from the beginning:
>>> s="{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, 'monitors': [{'use': 100L, 'monitorurl': u'http://oss.lagou.com','monitorweight': 10L,'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}"
>>> # replace single quotes with double quotes
>>> s1=s.replace('\'','\"')
>>> s1
'{"product": u"\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": u"http://oss.lagou.com","monitorweight": 10L,"monitorname": u"\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> # remove the unicode prefix u
>>> s2=s1.replace('u\"','\"')
>>> s2
'{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": "http://oss.lagou.com","monitorweight": 10L,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> # str.replace() matches literal text, not patterns, so '..L' finds nothing and s3 equals s2:
>>> s3=s2.replace('..L','')
>>> s3
'{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": "http://oss.lagou.com","monitorweight": 10L,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> # strip the L from long integers with a regular expression instead
>>> import re
>>> s3=re.sub(r'(\d+)L\b',r'\1',s2)
>>> s3
'{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100, "monitorurl": "http://oss.lagou.com","monitorweight": 10,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> # now json.loads() finally works
>>> json.loads(s3)
{u'product': u'\u62c9\u52fe\u7f51', u'monitors': [{u'use': 100, u'monitorweight': 10, u'monitorname': u'\u804c\u4f4d\u641c\u7d22', u'monitorurl': u'http://oss.lagou.com'}], u'downtime': 3.1280000000000001}
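A different technique from the one used above: because the string is Python literal syntax rather than JSON, the standard library's `ast.literal_eval()` can parse it as Python directly, with no quote rewriting, so values containing apostrophes survive. On Python 2 it should accept the str() output as-is, long literals included. The sketch below assumes Python 3, which still accepts `u'...'` but has no `10L` literal, so the L suffix is stripped first.

```python
import ast
import json
import re

# The post's sample string: Python 2 str() output, with literal \uXXXX escapes.
s = ("{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, "
     "'monitors': [{'use': 100L, 'monitorurl': u'http://oss.lagou.com', "
     "'monitorweight': 10L, 'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}")

# Python 3 has no 10L long literal, so drop the L suffix first
# (assumes no string value contains digits followed by a standalone L).
s_py3 = re.sub(r'(\d+)L\b', r'\1', s)

# literal_eval only accepts Python literal structures (strings, numbers,
# lists, dicts and the like), so unlike eval() it does not run arbitrary code.
data = ast.literal_eval(s_py3)
print(data["monitors"][0]["use"], data["monitors"][0]["monitorweight"])  # → 100 10

# If a real JSON string is the goal, serialize the parsed object.
print(json.dumps(data))
```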