File operations in Python
1. Basic workflow for file operations
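A computer system consists of three parts: hardware, the operating system, and application programs.
For an application written in Python or any other language to keep data permanently, the data must be saved to disk, which means the application has to operate hardware. Applications cannot operate hardware directly, and this is where the operating system comes in: it wraps complex hardware operations in simple interfaces for users and applications. The file is the virtual concept the operating system provides to applications for working with the disk; by operating on files, a user or application can store data permanently.
With the file abstraction we no longer need to think about the details of operating the disk. We only need to follow the workflow for operating on a file: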
# 1. Open the file, get a file handle, and assign it to a variable
f = open('a.txt', 'r', encoding='utf-8')  # 'r' is the default mode
# 2. Operate on the file through the handle
data = f.read()
# 3. Close the file
f.close()
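Notes on closing a file:
An open file ties up two kinds of resources: the file opened at the operating-system level, and the variable in the application. When you are done with a file, both must be reclaimed, as follows:
1. f.close()  # release the file opened at the operating-system level
2. del f      # release the application-level variable
del f must happen after f.close(). Otherwise the name is gone before the file was explicitly closed, and the OS-level file stays open, wasting resources, until the interpreter gets around to finalizing the object.
Python's automatic garbage collection means we do not need to worry about del f, so the one thing to remember after finishing with a file is f.close().
Even so, f.close() is easy to forget, so the foolproof approach is recommended: use the with keyword to manage the context for us.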
with open('a.txt', 'w') as f:
    pass
with open('a.txt', 'r') as read_f, open('b.txt', 'w') as write_f:
    data = read_f.read()
    write_f.write(data)
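As a rough sketch of what the with statement does for us, the block below writes a file twice, once with an explicit try/finally and once with with, and records f.closed each time. The temporary directory and the file name demo.txt are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Manual version: close() sits in finally so it runs even if write() raises.
f = open(path, 'w', encoding='utf-8')
try:
    f.write('hello\n')
finally:
    f.close()
closed_manually = f.closed

# with version: the file is closed automatically when the block exits.
with open(path, 'a', encoding='utf-8') as g:
    g.write('world\n')
closed_by_with = g.closed

# Read the result back to confirm both writes reached the file.
with open(path, encoding='utf-8') as h:
    content = h.read()
```

Both handles report closed afterwards; the with version simply needs less code and cannot be forgotten.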
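2. File encoding
f = open(...) asks the operating system to open the file. If no encoding is passed to open, the default text encoding comes from the platform: Python uses locale.getpreferredencoding(False), which is typically GBK on a Chinese-locale Windows system and UTF-8 on Linux.
# To avoid garbled text, open the file with the same encoding it was saved with.
f = open('a.txt', 'r', encoding='utf-8')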
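The following is a minimal sketch of that rule, assuming a scratch file named encoding_demo.txt in a temporary directory (both are illustrative choices, not part of the original). It saves text as UTF-8, reads it back with the matching encoding, and then reads it with GBK to show that a mismatch either raises an error or yields garbage.

```python
import locale
import os
import tempfile

text = '你好'
path = os.path.join(tempfile.mkdtemp(), 'encoding_demo.txt')  # hypothetical file

# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

# Save as UTF-8 ...
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# ... and read it back with the same encoding: the text round-trips.
with open(path, 'r', encoding='utf-8') as f:
    same = f.read()

# Reading with a different encoding either raises or returns garbled text.
try:
    with open(path, 'r', encoding='gbk') as f:
        wrong = f.read()
except UnicodeDecodeError:
    wrong = None
```

Passing encoding explicitly on every open call keeps a script's behaviour the same on Windows and Linux.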
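3. File open modes
file_handle = open('file path', 'mode')
# 1. Modes for opening a file (text mode is the default):
r   read-only [the default mode; the file must exist, otherwise an exception is raised]
w   write-only [not readable; the file is created if missing and emptied if it exists]
a   append-only [not readable; the file is created if missing; if it exists, writes are appended]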
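# 2. For non-text files only b mode can be used. "b" means operating on bytes. All files are stored as bytes anyway, so in this mode there is no need to think about a text file's character encoding, an image file's jpg format, or a video file's avi format.
rb
wb
ab
Note: in b mode, reads return bytes, writes must be given bytes, and an encoding cannot be specified.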
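# 3. '+' modes (each adds the capability the base mode lacks)
r+  read/write [readable and writable; the file must exist and is not emptied]
w+  write/read [writable and readable; the file is emptied on open]
a+  append/read [writable and readable; writes are appended]
# 4. The same read/write, write/read, and append/read modes operating on bytes
r+b  read/write [readable and writable]
w+b  write/read [writable and readable]
a+b  append/read [writable and readable]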
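The mode table above can be checked with a short experiment. This is only a sketch: the file modes.txt lives in a temporary directory chosen for the demonstration, and the variable names are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical file

# 'r' requires the file to exist.
try:
    open(path, 'r')
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# 'w' creates the file, and empties it if it already exists.
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')
with open(path, 'w', encoding='utf-8') as f:
    f.write('second\n')

# 'a' keeps the existing content and appends at the end.
with open(path, 'a', encoding='utf-8') as f:
    f.write('third\n')

# 'r+' can read and write without emptying the file first.
with open(path, 'r+', encoding='utf-8') as f:
    text_content = f.read()
    f.write('fourth\n')  # the position is at the end after read(), so this appends

# 'rb' returns bytes; no encoding argument is allowed in binary mode.
with open(path, 'rb') as f:
    raw = f.read()

# A handle opened with 'a' is write-only.
with open(path, 'a', encoding='utf-8') as f:
    can_read = f.readable()
```

Note that the first write of 'first' is gone by the time the file is read: reopening with 'w' discarded it.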
4. File methods
4.1 Common methods
read(3):
1. When the file is opened in text mode, this reads 3 characters
2. When the file is opened in b mode, this reads 3 bytes
All other cursor movement within a file is measured in bytes, e.g. seek, tell, truncate
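To make the character versus byte distinction concrete, here is a small sketch. It assumes a UTF-8 file (read_demo.txt in a temporary directory, a name picked for this example) whose first two characters are Chinese and therefore take 3 bytes each.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')  # hypothetical file
with open(path, 'w', encoding='utf-8') as f:
    f.write('你好world')  # each Chinese character takes 3 bytes in UTF-8

# Text mode: read(3) returns 3 characters.
with open(path, 'r', encoding='utf-8') as f:
    chars = f.read(3)

# Binary mode: read(3) returns 3 bytes, which is only the first character here.
with open(path, 'rb') as f:
    raw = f.read(3)
    pos_bytes = f.tell()  # byte offset after the read
```

The same call therefore consumes 7 bytes of the file in text mode but only 3 in binary mode.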
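Notes:
1. seek has three modes of movement, 0, 1, and 2 (relative to the start of the file, the current position, and the end of the file). Non-zero offsets with 1 and 2 only work in b mode. In every mode the offset is counted in bytes, and in text mode the only safe offsets are values returned by tell().
2. truncate cuts the file short, so the file must be opened in a writable mode, but not w or w+, because those empty the file on open. Try truncate with r+, a, or a+ to see its effect.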
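A short sketch of both notes, using a ten-byte scratch file (seek_demo.txt in a temporary directory, an assumption for this example): the three whence values in binary mode, the rejected relative seek in text mode, and truncate under r+.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek_demo.txt')  # hypothetical file
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(2, 0)   # whence=0: 2 bytes from the start
    a = f.read(2)  # reads b'23'; the position is now 4
    f.seek(3, 1)   # whence=1: 3 bytes forward from the current position
    b = f.read(1)  # reads b'7'
    f.seek(-2, 2)  # whence=2: 2 bytes before the end
    c = f.read()   # reads b'89'

# In text mode a non-zero offset relative to the current position is rejected.
with open(path, 'r', encoding='utf-8') as f:
    try:
        f.seek(3, 1)
        text_relative_seek_ok = True
    except io.UnsupportedOperation:
        text_relative_seek_ok = False

# truncate() needs a writable mode that does not empty the file on open.
with open(path, 'r+', encoding='utf-8') as f:
    f.truncate(4)  # keep only the first 4 bytes

with open(path, 'rb') as f:
    after_truncate = f.read()
```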
4.2 All methods (for reference)
class file(object):
    def close(self):  # real signature unknown; restored from __doc__
        # Close the file
        """
        close() -> None or (perhaps) an integer. Close the file.
        Sets data attribute .closed to True. A closed file cannot be used for
        further I/O operations. close() may be called more than once without
        error. Some kinds of file objects (for example, opened by popen())
        may return an exit status upon closing.
        """
    def fileno(self):  # real signature unknown; restored from __doc__
        # File descriptor
        """
        fileno() -> integer "file descriptor".
        This is needed for lower-level file interfaces, such os.read().
        """
        return 0
    def flush(self):  # real signature unknown; restored from __doc__
        # Flush the file's internal buffer
        """ flush() -> None. Flush the internal I/O buffer. """
        pass
    def isatty(self):  # real signature unknown; restored from __doc__
        # Check whether the file is connected to a tty device
        """ isatty() -> true or false. True if the file is connected to a tty device. """
        return False
    def next(self):  # real signature unknown; restored from __doc__
        # Get the next line; raises StopIteration if there is none
        """ x.next() -> the next value, or raise StopIteration """
        pass
    def read(self, size=None):  # real signature unknown; restored from __doc__
        # Read the specified number of bytes
        """
        read([size]) -> read at most size bytes, returned as a string.
        If the size argument is negative or omitted, read until EOF is reached.
        Notice that when in non-blocking mode, less data than what was requested
        may be returned, even if no size parameter was given.
        """
        pass
    def readinto(self):  # real signature unknown; restored from __doc__
        # Read into a buffer; do not use, it will be removed
        """ readinto() -> Undocumented. Don't use this; it may go away. """
        pass
    def readline(self, size=None):  # real signature unknown; restored from __doc__
        # Read a single line
        """
        readline([size]) -> next line from the file, as a string.
        Retain newline. A non-negative size argument limits the maximum
        number of bytes to return (an incomplete line may be returned then).
        Return an empty string at EOF.
        """
        pass
    def readlines(self, size=None):  # real signature unknown; restored from __doc__
        # Read all data and return it as a list split on line breaks
        """
        readlines([size]) -> list of strings, each a line from the file.
        Call readline() repeatedly and return a list of the lines so read.
        The optional size argument, if given, is an approximate bound on the
        total number of bytes in the lines returned.
        """
        return []
    def seek(self, offset, whence=None):  # real signature unknown; restored from __doc__
        # Set the position of the file pointer
        """
        seek(offset[, whence]) -> None. Move to new file position.
        Argument offset is a byte count. Optional argument whence defaults to
        0 (offset from start of file, offset should be >= 0); other values are 1
        (move relative to current position, positive or negative), and 2 (move
        relative to end of file, usually negative, although many platforms allow
        seeking beyond the end of a file). If the file is opened in text mode,
        only offsets returned by tell() are legal. Use of other offsets causes
        undefined behavior.
        Note that not all file objects are seekable.
        """
        pass
    def tell(self):  # real signature unknown; restored from __doc__
        # Get the current position of the file pointer
        """ tell() -> current file position, an integer (may be a long integer). """
        pass
    def truncate(self, size=None):  # real signature unknown; restored from __doc__
        # Truncate the data, keeping only what comes before the given position
        """
        truncate([size]) -> None. Truncate the file to at most size bytes.
        Size defaults to the current file position, as returned by tell().
        """
        pass
    def write(self, p_str):  # real signature unknown; restored from __doc__
        # Write content
        """
        write(str) -> None. Write string str to file.
        Note that due to buffering, flush() or close() may be needed before
        the file on disk reflects the data written.
        """
        pass
    def writelines(self, sequence_of_strings):  # real signature unknown; restored from __doc__
        # Write a list of strings to the file
        """
        writelines(sequence_of_strings) -> None. Write the strings to the file.
        Note that newlines are not added. The sequence can be any iterable object
        producing strings. This is equivalent to calling write() for each string.
        """
        pass
    def xreadlines(self):  # real signature unknown; restored from __doc__
        # Can be used to read the file line by line rather than all at once
        """
        xreadlines() -> returns self.
        For backward compatibility. File objects now include the performance
        optimizations previously implemented in the xreadlines module.
        """
        pass
(The listing above is the file type from Python 2.x.)
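Most of the line-oriented methods in the Python 2 listing above still exist on Python 3 file objects with the same meaning; next() became __next__() and xreadlines() was dropped in favour of iterating over the file directly. The sketch below runs on Python 3 and exercises writelines, readline, readlines, and iteration. The file lines.txt in a temporary directory is an assumption for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical file

with open(path, 'w', encoding='utf-8') as f:
    # writelines() does not add newlines, so each string carries its own.
    f.writelines(['one\n', 'two\n', 'three\n'])

with open(path, encoding='utf-8') as f:
    first = f.readline()   # one line, trailing newline kept
    rest = f.readlines()   # the remaining lines as a list
    at_eof = f.readline()  # empty string at end of file

# Iterating over the file object reads lazily, one line at a time.
with open(path, encoding='utf-8') as f:
    stripped = [line.rstrip('\n') for line in f]
```

Iteration is usually the better choice for large files, since readlines() builds the whole list in memory.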
class TextIOWrapper(_TextIOBase):
    """
    Character and line based layer over a BufferedIOBase object, buffer.
    encoding gives the name of the encoding that the stream will be
    decoded or encoded with. It defaults to locale.getpreferredencoding(False).
    errors determines the strictness of encoding and decoding (see
    help(codecs.Codec) or the documentation for codecs.register) and
    defaults to "strict".
    newline controls how line endings are handled. It can be None, '',
    '\n', '\r', and '\r\n'. It works as follows:
    * On input, if newline is None, universal newlines mode is
      enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
      these are translated into '\n' before being returned to the
      caller. If it is '', universal newline mode is enabled, but line
      endings are returned to the caller untranslated. If it has any of
      the other legal values, input lines are only terminated by the given
      string, and the line ending is returned to the caller untranslated.
    * On output, if newline is None, any '\n' characters written are
      translated to the system default line separator, os.linesep. If
      newline is '' or '\n', no translation takes place. If newline is any
      of the other legal values, any '\n' characters written are translated
      to the given string.
    If line_buffering is True, a call to flush is implied when a call to
    write contains a newline character.
    """
    def close(self, *args, **kwargs):  # real signature unknown
        # Close the file
        pass
    def fileno(self, *args, **kwargs):  # real signature unknown
        # File descriptor
        pass
    def flush(self, *args, **kwargs):  # real signature unknown
        # Flush the file's internal buffer
        pass
    def isatty(self, *args, **kwargs):  # real signature unknown
        # Check whether the file is connected to a tty device
        pass
    def read(self, *args, **kwargs):  # real signature unknown
        # Read up to the specified number of characters
        pass
    def readable(self, *args, **kwargs):  # real signature unknown
        # Whether the file can be read
        pass
    def readline(self, *args, **kwargs):  # real signature unknown
        # Read a single line
        pass
    def seek(self, *args, **kwargs):  # real signature unknown
        # Set the position of the file pointer
        pass
    def seekable(self, *args, **kwargs):  # real signature unknown
        # Whether the file pointer can be moved
        pass
    def tell(self, *args, **kwargs):  # real signature unknown
        # Get the position of the file pointer
        pass
    def truncate(self, *args, **kwargs):  # real signature unknown
        # Truncate the data, keeping only what comes before the given position
        pass
    def writable(self, *args, **kwargs):  # real signature unknown
        # Whether the file can be written
        pass
    def write(self, *args, **kwargs):  # real signature unknown
        # Write content
        pass
    def __getstate__(self, *args, **kwargs):  # real signature unknown
        pass
    def __init__(self, *args, **kwargs):  # real signature unknown
        pass
    @staticmethod  # known case of __new__
    def __new__(*args, **kwargs):  # real signature unknown
        """ Create and return a new object. See help(type) for accurate signature. """
        pass
    def __next__(self, *args, **kwargs):  # real signature unknown
        """ Implement next(self). """
        pass
    def __repr__(self, *args, **kwargs):  # real signature unknown
        """ Return repr(self). """
        pass
    buffer = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    closed = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    encoding = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    errors = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    line_buffering = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    name = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    newlines = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    _CHUNK_SIZE = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    _finalizing = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
(The listing above is io.TextIOWrapper from Python 3.x.)
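The newline rules in the TextIOWrapper docstring above are easier to see in action. The sketch below, which assumes a scratch file newline_demo.txt in a temporary directory, writes Windows-style line endings untranslated and then reads them back with the default newline=None and with newline=''.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')  # hypothetical file

# newline='' on output: '\r\n' is written exactly as given, with no translation.
with open(path, 'w', encoding='utf-8', newline='') as f:
    f.write('a\r\nb\r\n')

# newline=None (the default) on input: '\r\n' is translated to '\n'.
with open(path, 'r', encoding='utf-8') as f:
    translated = f.read()

# newline='' on input: line endings are returned untranslated.
with open(path, 'r', encoding='utf-8', newline='') as f:
    untranslated = f.read()
    is_readable, is_writable = f.readable(), f.writable()
```

Passing newline='' is the usual choice when the exact bytes of the line endings matter, for example when handing the file object to the csv module.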
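5. Modifying files
File data lives on disk, so strictly speaking it can only be overwritten, never modified in place. The file editing we see day to day is a simulated effect, implemented in one of two ways:
Approach 1: load the entire file from disk into memory, make the changes in memory, then write the result back over the file on disk (Word, Vim, Notepad++, and other editors work this way).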
import os  # operating-system interfaces
with open('a.txt') as read_f, open('.a.txt.swap', 'w') as write_f:
    data = read_f.read()  # load everything into memory; slow for very large files
    data = data.replace('alex', 'SB')  # make the change in memory
    write_f.write(data)  # write the new file in one go
os.remove('a.txt')  # delete the original file
os.rename('.a.txt.swap', 'a.txt')  # rename the new file to the original name
Approach 2: read the file from disk into memory one line at a time, write each modified line to a new file, and finally replace the original file with the new one.
import os
with open('a.txt') as read_f, open('.a.txt.swap', 'w') as write_f:
    for line in read_f:
        line = line.replace('alex', 'SB')
        write_f.write(line)
os.remove('a.txt')
os.rename('.a.txt.swap', 'a.txt')
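The line-by-line approach can be wrapped in a reusable function. The version below is a sketch, not the original author's code: replace_in_file is a hypothetical helper name, the encoding parameter is an added assumption, and os.replace is swapped in for the os.remove plus os.rename pair because it overwrites an existing destination in a single call on both Windows and POSIX.

```python
import os
import tempfile

def replace_in_file(path, old, new, encoding='utf-8'):
    """Rewrite path line by line, replacing old with new (hypothetical helper)."""
    directory, name = os.path.split(path)
    swap_path = os.path.join(directory, '.' + name + '.swap')
    with open(path, encoding=encoding) as read_f, \
            open(swap_path, 'w', encoding=encoding) as write_f:
        for line in read_f:
            write_f.write(line.replace(old, new))
    # Move the swap file over the original, overwriting it.
    os.replace(swap_path, path)

# Usage on a scratch copy of a.txt (the contents are made up for the demo).
demo_path = os.path.join(tempfile.mkdtemp(), 'a.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('alex says hi\nnobody here\nbye alex\n')

replace_in_file(demo_path, 'alex', 'SB')

with open(demo_path, encoding='utf-8') as f:
    result = f.read()
```

Only one line is held in memory at a time, so this works for files far larger than available RAM.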
6. Exercises
1. The file a.txt contains one product per line: name, price, and quantity.
apple 10 3
tesla 100000 1
mac 3000 2
lenovo 30000 3
chicken 10 3
Write code that builds this data structure from the file: [{'name':'apple','price':10,'amount':3},{'name':'tesla','price':100000,'amount':1}......] and computes the total price.
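One possible solution sketch for exercise 1 follows. It recreates a.txt in a temporary directory so that it runs anywhere (that location is an assumption), and it takes the total price to mean the sum of price times amount over all products.

```python
import os
import tempfile

# Recreate the exercise file in a scratch directory (the location is an assumption).
path = os.path.join(tempfile.mkdtemp(), 'a.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('apple 10 3\ntesla 100000 1\nmac 3000 2\nlenovo 30000 3\nchicken 10 3\n')

goods = []
with open(path, encoding='utf-8') as f:
    for line in f:
        fields = line.split()
        if len(fields) != 3:  # skip blank or malformed lines
            continue
        name, price, amount = fields
        goods.append({'name': name, 'price': int(price), 'amount': int(amount)})

# Total price: sum of price * amount over every product.
total = sum(item['price'] * item['amount'] for item in goods)
```

For the five lines above the total comes to 196060.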
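2. Given the following file:
-------
alex has a bump on his head.
alex is actually a drag performer.
Who says alex is sb?
You guys are hilarious: no matter how awesome alex is, he cannot hide his veteran-loser vibe.
----------
Replace every occurrence of alex in the file with uppercase SB.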
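For a file this small, the load-everything approach from section 5 is enough. The sketch below uses pathlib's read_text and write_text instead of explicit open calls; the file name exercise2.txt, the temporary directory, and the two sample lines are stand-ins chosen for the example.

```python
import tempfile
from pathlib import Path

# Scratch copy of the exercise file; the sample lines are stand-ins.
path = Path(tempfile.mkdtemp()) / 'exercise2.txt'
path.write_text('alex has a bump on his head.\nWho says alex is sb?\n', encoding='utf-8')

# Load everything, replace in memory, and write the result back.
text = path.read_text(encoding='utf-8')
path.write_text(text.replace('alex', 'SB'), encoding='utf-8')

replaced = path.read_text(encoding='utf-8')
```

str.replace is case-sensitive, so the lowercase sb already in the text is left untouched.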