0. Prologue
1) What is data extraction?
Simply put, data extraction is the process of pulling the data we want out of a response.
2) Types of data
- Unstructured data: HTML, plain text, etc.
  Processing methods: regular expressions, XPath, Beautiful Soup
- Structured data: JSON, XML, etc.
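As a rough illustration of the two categories, the sketch below pulls the same value out of an HTML fragment with a regular expression and out of a JSON string with `json.loads`. Both sample strings and the `title` and `views` fields are invented for the example.

```python
import json
import re

# Hypothetical responses, invented for illustration only.
html_response = "<html><body><h1 class='title'>Hello JSON</h1></body></html>"
json_response = '{"title": "Hello JSON", "views": 42}'

# Unstructured data: search the text for a pattern.
match = re.search(r"<h1[^>]*>(.*?)</h1>", html_response)
html_title = match.group(1) if match else None

# Structured data: parse once, then index by key.
data = json.loads(json_response)
json_title = data["title"]

print(html_title)
print(json_title)
```

The regular expression depends on the exact markup, while the JSON version depends only on the key name, which is why structured responses are usually easier to work with.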
3) What is JSON?
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write and easy for machines to parse and generate.
It suits scenarios where data is exchanged, such as between a website's front end and back end.
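4) How do we find the URL that returns JSON?
- Analyze the traffic with the browser or a packet-capture tool such as Wireshark (Windows/Linux) or tcpdump (Linux)
- Use packet-capture software for mobile apps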
1. json.dumps()
json.dumps() converts dict data to a str. Writing a dict straight to a file raises an error, because a text file's write() accepts only strings, so the data has to be serialized with this function before it is written.
Let's start with some code:
import json
# Student information
myinfo_dict = {"name": "李易陽", "age": 23, "sex": "男"}
# Type before conversion
print(type(myinfo_dict))
# Convert to a JSON string
json_obj_str = json.dumps(myinfo_dict)
# Type after conversion
print(type(json_obj_str))
The output is:
<class 'dict'>
<class 'str'>
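The reason for serializing before writing can be checked without touching the disk. The sketch below uses `io.StringIO` as an in-memory stand-in for a text file; the `record` dictionary is made-up sample data.

```python
import io
import json

record = {"name": "example", "age": 23}  # hypothetical sample data

buffer = io.StringIO()  # in-memory stand-in for a text file
try:
    buffer.write(record)  # a text stream only accepts str
    direct_write_failed = False
except TypeError as exc:
    direct_write_failed = True
    print("direct write failed:", exc)

# Serialize to a str first, then write that string.
buffer.write(json.dumps(record))
print(buffer.getvalue())
```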
The official documentation describes the parameters as follows:
def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True,
allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw):
Serialize ``obj`` to a JSON formatted ``str``.
If ``skipkeys`` is true then ``dict`` keys that are not basic types
(``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
instead of raising a ``TypeError``.
If ``ensure_ascii`` is false, then the return value can contain non-ASCII
characters if they appear in strings contained in ``obj``. Otherwise, all
such characters are escaped in JSON strings.
If ``check_circular`` is false, then the circular reference check
for container types will be skipped and a circular reference will
result in an ``OverflowError`` (or worse).
If ``allow_nan`` is false, then it will be a ``ValueError`` to
serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
strict compliance of the JSON specification, instead of using the
JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).
If ``indent`` is a non-negative integer, then JSON array elements and
object members will be pretty-printed with that indent level. An indent
level of 0 will only insert newlines. ``None`` is the most compact
representation.
If specified, ``separators`` should be an ``(item_separator, key_separator)``
tuple. The default is ``(', ', ': ')`` if *indent* is ``None`` and
``(',', ': ')`` otherwise. To get the most compact JSON representation,
you should specify ``(',', ':')`` to eliminate whitespace.
``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.
If *sort_keys* is true (default: ``False``), then the output of
dictionaries will be sorted by key.
To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
``.default()`` method to serialize additional types), specify it with
the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.
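A few of these parameters are easier to understand side by side. The sketch below reuses the student dictionary from the earlier example and varies `ensure_ascii`, `separators`, `indent`, `sort_keys` and `default`; the `date` value is an invented example of a type the encoder cannot handle by default.

```python
import json
from datetime import date

info = {"name": "李易陽", "age": 23, "sex": "男"}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(info)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(info, ensure_ascii=False)

# separators=(",", ":") removes the whitespace after commas and colons.
compact = json.dumps(info, ensure_ascii=False, separators=(",", ":"))

# indent pretty-prints; sort_keys orders the keys alphabetically.
pretty = json.dumps(info, ensure_ascii=False, indent=2, sort_keys=True)

# default is called for objects the encoder does not know; str() is
# one simple choice here, not the only one.
with_date = json.dumps({"day": date(2020, 1, 2)}, default=str)

for label, text in [("escaped", escaped), ("readable", readable),
                    ("compact", compact), ("pretty", pretty),
                    ("with_date", with_date)]:
    print(label, text, sep=": ")
```

All of these variants parse back to the same dictionary; the options only change how the text looks.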
2. json.loads()
json.loads() converts str data to a dict.
import json
name_emb = {'a':'1111','b':'2222','c':'3333','d':'4444'}
jsDumps = json.dumps(name_emb)
# jsDumps is a JSON-formatted string
jsLoads = json.loads(jsDumps)
print(name_emb)
print(jsDumps)
print(jsLoads)
print(type(name_emb))
print(type(jsDumps))
print(type(jsLoads))
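The result of `json.loads` is a dict only when the top-level JSON value is an object. The sketch below, built on made-up sample strings, shows which Python type each kind of JSON value becomes, and two conversions that do not survive a round trip unchanged.

```python
import json

# One sample per JSON value kind (invented for illustration).
samples = ['{"a": 1}', '[1, 2, 3]', '"text"', '3.14', 'true', 'null']

parsed_types = []
for text in samples:
    value = json.loads(text)
    parsed_types.append(type(value).__name__)
    print(f"{text!r:12} -> {type(value).__name__}")

# A tuple is written as a JSON array, so it comes back as a list.
round_tripped = json.loads(json.dumps({"point": (1, 2)}))

# Non-string keys are converted to strings when serialized.
int_key_text = json.dumps({1: "a"})
```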
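The official documentation describes the parameters as follows (this signature comes from an older Python 3 release; the encoding parameter was removed in Python 3.9):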
def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
containing a JSON document) to a Python object.
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).
``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders. If ``object_hook``
is also defined, the ``object_pairs_hook`` takes priority.
``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).
``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).
``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.
To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
The ``encoding`` argument is ignored and deprecated.
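The hooks described above can be tried on a small document. This is a minimal sketch: the `text` string and the `upper_keys` helper are invented for the example, and `Decimal` and `OrderedDict` are just two possible choices for `parse_float` and `object_pairs_hook`.

```python
import json
from collections import OrderedDict
from decimal import Decimal

text = '{"price": 19.99, "count": 3, "tags": ["a", "b"]}'

# parse_float receives the raw string of every JSON float.
exact = json.loads(text, parse_float=Decimal)

# object_pairs_hook receives each object as a list of (key, value) pairs.
ordered = json.loads(text, object_pairs_hook=OrderedDict)

def upper_keys(obj):
    """Hypothetical object_hook: upper-case the keys of every object."""
    return {key.upper(): value for key, value in obj.items()}

hooked = json.loads(text, object_hook=upper_keys)

# Invalid JSON (single-quoted keys here) raises json.JSONDecodeError.
try:
    json.loads("{'a': 1}")
    bad_input_rejected = False
except json.JSONDecodeError as exc:
    bad_input_rejected = True
    print("invalid JSON:", exc)

print(exact, ordered, hooked, sep="\n")
```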
3. json.dump()
json.dump() serializes dict data and writes the result to a JSON file. Calling json.dumps() and writing the returned string yourself is the other way to do it; the example below uses json.dump().
import json
name_emb = {'a':'1111','b':'2222','c':'3333','d':'4444'}
emb_filename = '/home/cqh/faceData/emb_json.json'
# The with block closes the file, so the data is flushed to disk
with open(emb_filename, "w") as f:
    json.dump(name_emb, f)
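Writing the string returned by `json.dumps` with `f.write` produces the same file as `json.dump`. The sketch below compares the two in a temporary directory so that it runs anywhere; the file names `dump.json` and `dumps.json` are placeholders, not paths from the article.

```python
import json
import os
import tempfile

name_emb = {'a': '1111', 'b': '2222', 'c': '3333', 'd': '4444'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path_dump = os.path.join(tmp_dir, "dump.json")    # placeholder name
    path_dumps = os.path.join(tmp_dir, "dumps.json")  # placeholder name

    # Approach 1: let json.dump write to the open file.
    with open(path_dump, "w", encoding="utf-8") as f:
        json.dump(name_emb, f, indent=2)

    # Approach 2: build the string first, then write it.
    with open(path_dumps, "w", encoding="utf-8") as f:
        f.write(json.dumps(name_emb, indent=2))

    with open(path_dump, encoding="utf-8") as f:
        text_from_dump = f.read()
    with open(path_dumps, encoding="utf-8") as f:
        text_from_dumps = f.read()

print(text_from_dump)
```

Using `with` for every file ensures the handle is closed and the content is flushed before the file is read back.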
4. json.load()
json.load() reads data from a JSON file.
import json
emb_filename = '/home/cqh/faceData/emb_json.json'
# Open the JSON file and parse its contents
with open(emb_filename) as f:
    jsObj = json.load(f)
print(jsObj)
print(type(jsObj))
for key in jsObj.keys():
    print('key: %s value: %s' % (key, jsObj.get(key)))
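The output is as follows (this run was made under Python 2, hence the u'' prefixes and <type 'dict'>; Python 3 prints plain strings and <class 'dict'>):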
{u'a': u'1111', u'c': u'3333', u'b': u'2222', u'd': u'4444'}
<type 'dict'>
key: a value: 1111
key: c value: 3333
key: b value: 2222
key: d value: 4444
Summary:
json.dumps: converts a dict to a str
json.loads: converts a str to a dict
json.dump: saves Python data to a JSON file
json.load: reads JSON data from a file
Any object that has a read() or write() method is a file-like object.
In f = open("a.txt", "r"), f is a file-like object.
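Because `json.dump` only calls `write()` and `json.load` only calls `read()`, neither needs a real file. The sketch below illustrates this with a hypothetical `ListWriter` class that has nothing but a `write()` method, and with `io.StringIO` on the reading side.

```python
import io
import json

class ListWriter:
    """Hypothetical minimal file-like object: it only has write()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        # json.dump may call write() many times with small pieces.
        self.chunks.append(text)
        return len(text)

writer = ListWriter()
json.dump({"a": 1, "b": [1, 2]}, writer)
dumped = "".join(writer.chunks)

# io.StringIO offers read(), which is all json.load needs.
reader = io.StringIO('{"a": 1, "b": [1, 2]}')
loaded = json.load(reader)

print(dumped)
print(loaded)
```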
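Produced by @墨雨. Any resemblance to other work is purely coincidental.
`Without study there is no way to broaden one's talent; without resolve there is no way to complete one's studies!`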