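Serialization is the process of converting an object's state into a form that can be stored or transmitted. During serialization, an object writes its current state to temporary or persistent storage. Later, the object can be recreated by reading its state back from that storage, that is, by deserializing it.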
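In scrapy_redis, a Request object is first deduplicated by the DupeFilter and then handed to the scheduler, which stores it in Redis. This raises a problem: a Request is a Python object, and Redis cannot store such an object directly, so the request has to be serialized before it is stored.
The serialization module in scrapy_redis is as follows: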
from scrapy_redis import picklecompat

"""A pickle wrapper module with protocol=-1 by default."""

try:
    import cPickle as pickle  # PY2
except ImportError:
    import pickle


def loads(s):
    return pickle.loads(s)


def dumps(obj):
    return pickle.dumps(obj, protocol=-1)
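Python 3 uses the pickle module directly, since cPickle no longer exists there. The module's two most important functions are the ones wrapped above: dumps for serialization and loads for deserialization. Once serialized, an object can be stored in a database or a file and quickly restored later.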
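As a minimal sketch of the round trip described above, the snippet below applies the same dumps/loads wrappers to a plain dict. The dict and its field names are hypothetical stand-ins for a request's data; it is an assumption here that the request is turned into such a dict before pickling, and no Redis connection is involved.

```python
import pickle


def dumps(obj):
    # Same idea as picklecompat.dumps: protocol=-1 selects the highest protocol.
    return pickle.dumps(obj, protocol=-1)


def loads(data):
    return pickle.loads(data)


# Hypothetical stand-in for a request's data; the field names are illustrative.
request_data = {
    "url": "https://example.com/page/1",
    "method": "GET",
    "meta": {"depth": 2},
    "priority": 0,
}

payload = dumps(request_data)      # bytes, the kind of value Redis can store
restored = loads(payload)          # an equal, but distinct, dict

print(type(payload).__name__)      # bytes
print(restored == request_data)    # True
print(restored is request_data)    # False
```

The result of dumps is an opaque byte string, which is why it can be pushed into a Redis list or sorted set and turned back into Python data when it is popped.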
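The memento pattern from design patterns also works best when implemented this way (see "Python Design Patterns (19): The Memento Pattern"). The following objects and data types can be pickled: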
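- None, True, and False
- Integers, long integers, floating-point numbers, and complex numbers
- Normal and Unicode strings
- Tuples, lists, sets, and dictionaries containing only picklable objects
- Functions defined at the top level of a module
- Built-in functions defined at the top level of a module
- Classes defined at the top level of a module
- Instances of such classes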
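The list above can be checked with a short round-trip sketch. The helper name roundtrip is hypothetical, and standard-library objects (json.dumps, len, fractions.Fraction) are used as examples of a top-level function, a built-in function, and a top-level class.

```python
import fractions
import json
import pickle


def roundtrip(obj):
    """Pickle obj and immediately unpickle the result."""
    return pickle.loads(pickle.dumps(obj))


samples = [
    None, True, False,
    42, 3.14, 2 + 3j,
    "text", b"bytes",
    (1, 2), [1, 2], {1, 2}, {"k": [1, 2]},
]
values_ok = all(roundtrip(s) == s for s in samples)

# Functions and classes are pickled by reference (module + name), so
# unpickling yields the very same object rather than a copy.
same_function = roundtrip(json.dumps) is json.dumps
same_builtin = roundtrip(len) is len
same_class = roundtrip(fractions.Fraction) is fractions.Fraction

# An instance of a top-level class is rebuilt as an equal object.
third = fractions.Fraction(1, 3)
instance_ok = roundtrip(third) == third

print(values_ok, same_function, same_builtin, same_class, instance_ok)
```

Because functions and classes are stored by name only, the module that defines them must be importable in the process that unpickles the data.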
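Attempting to pickle an unpicklable object raises a PicklingError exception; when this happens, an unspecified number of bytes may already have been written to the underlying file. Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, in which case a RuntimeError is raised (since Python 3.5 this is RecursionError, a subclass of RuntimeError).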
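The failure case can be sketched as follows. The helper try_pickle is hypothetical; it catches PicklingError together with AttributeError and TypeError because, depending on the object and the Python version, an unpicklable object may surface as any of the three.

```python
import pickle


def try_pickle(obj):
    """Return None on success, or the exception raised while pickling."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return exc
    return None


square = lambda x: x * x            # no importable name, so not picklable
numbers = (n for n in range(3))     # generators cannot be pickled either

for candidate in ([1, 2, 3], square, numbers):
    error = try_pickle(candidate)
    status = "ok" if error is None else type(error).__name__
    print(type(candidate).__name__, "->", status)
```

Catching the error around the dumps call, rather than around a dump to a real file, avoids leaving a partially written file behind.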
Module API
The descriptions below follow the Python 2 documentation. In Python 3, pickled data is bytes rather than a string, and the default protocol is pickle.DEFAULT_PROTOCOL rather than 0.
- pickle.dump(obj, file[, protocol])
  Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).
  If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
  Changed in version 2.3: Introduced the protocol parameter.
  file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.
- pickle.load(file)
  Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().
  file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.
  This function automatically determines whether the data stream was written in binary mode or not.
- pickle.dumps(obj[, protocol])
  Return the pickled representation of the object as a string, instead of writing it to a file.
  If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
  Changed in version 2.3: The protocol parameter was added.
- pickle.loads(string)
  Read a pickled object hierarchy from a string. Characters in the string past the pickled object's representation are ignored.
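The four functions can be exercised together in a short Python 3 sketch. It assumes io.BytesIO as the in-memory file object (the Python 3 counterpart of the StringIO object mentioned in the descriptions), and the record dict is illustrative data.

```python
import io
import pickle

record = {"id": 7, "tags": ["a", "b"]}

# dump/load work on a file-like object; io.BytesIO plays that role in memory.
buffer = io.BytesIO()
pickle.dump(record, buffer, protocol=pickle.HIGHEST_PROTOCOL)
pickle.dump([1, 2, 3], buffer)          # a second pickle in the same stream
buffer.seek(0)
first = pickle.load(buffer)             # objects come back in write order
second = pickle.load(buffer)

# dumps/loads do the same thing without a file.
data = pickle.dumps(record, protocol=-1)   # negative means highest protocol
same_as_highest = data == pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
trailing_ignored = pickle.loads(data + b"trailing bytes") == record

print(first, second, same_as_highest, trailing_ignored)
```

Each load call consumes exactly one pickled object, so several objects can share one stream as long as they are read back in the order they were written.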
As for use cases, the most common are restoring the previous state when a program restarts, session storage, and transmitting objects over a network.
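The first use case, restoring state after a restart, might look like the sketch below. The function names save_state and load_state, the file name state.pkl, and the shape of the state dict are all hypothetical choices made for illustration.

```python
import os
import pickle
import tempfile


def save_state(state, path):
    """Write the state to disk so a later run can pick it up."""
    with open(path, "wb") as fh:          # pickle data is binary
        pickle.dump(state, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path, default):
    """Return the saved state, or default when no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical crawler state; the keys are illustrative only.
checkpoint = os.path.join(tempfile.mkdtemp(), "state.pkl")
fresh = load_state(checkpoint, default={"seen": set(), "pending": []})

fresh["seen"].add("https://example.com/")
fresh["pending"].append("https://example.com/next")
save_state(fresh, checkpoint)

# Simulates the next program start: the state is read back from disk.
resumed = load_state(checkpoint, default=None)
print(resumed == fresh)
```

Session storage and network transfer follow the same pattern, with the bytes from dumps written to a session store or a socket instead of a local file.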