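Part of the collection: Overview and Index of the I/O Streams Official Guide Series (Chinese translation)
Most of this content comes from the official The Java™ Tutorials; the rest comes from other sources such as ifeve translations, imooc, and the book Android Interview Handbook (Android面试宝典).
Author: @youyuge
Personal blog: https://youyuge.cn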
1. What Are Character Streams?
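And what are byte streams? How are the two related? Before going on, please read my article on character encoding carefully so that you have a thorough understanding of it.
Once you have read it, you should have a fairly solid grasp of the topic. Here is my own summary: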
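- First, computers store files as binary bytes, and the CPU can only process binary, that is, 0s and 1s (why? look it up). So how can 0s and 1s represent English letters and Chinese characters?
- To represent everyday characters in binary, people defined conventions called character encodings (character sets). Because these conventions are man-made, there are many of them, and the same Chinese character is encoded differently under different conventions: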
Character 尤 --------> E5 B0 A4 (UTF-8; its Unicode code point is U+5C24)
Character 尤 --------> D3 C8 (GBK)
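So when we write a .txt file in UTF-8, the computer encodes our characters into binary 0s and 1s and stores them. When we open the file as UTF-8, the text editor applies the UTF-8 decoding rules to turn that mass of 0s and 1s back into characters we can read, and displays them.
This also explains why opening a .txt file sometimes shows garbled text: if we open a file written with the GBK rules using the UTF-8 rules, we get gibberish. Put simply, two encodings are just two different ways of translating the same heap of binary 0s and 1s.
An even plainer example: I have the Chinese sentence "鱿鱼最好吃" ("squid is the most delicious"), which I convert into letters (encoding it) as "you yu zui hao chi", store it, and send it to someone. They open the file, see "you yu zui hao chi", and try to translate (decode) it as English, which makes no sense. So they try decoding it as Chinese instead, realize it is pinyin, and roughly understand the meaning. In this analogy, Chinese is one encoding (like GBK), English is another (like UTF-8), and the letters are the bytes, the underlying binary.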
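The encoding and mojibake effects described above can be checked with a few lines of Java. This is a minimal sketch; the class name EncodingDemo and the helper toHex are illustrative, not from the original article. It prints the bytes of 尤 under UTF-8 and GBK using String.getBytes, then decodes the GBK bytes with the UTF-8 rules. Note that UTF-8 produces three bytes, E5 B0 A4; 5C24 is the character's Unicode code point, which UTF-8 spreads across those three bytes. The GBK part assumes a JDK that ships the GBK charset (standard OpenJDK builds do), so it is guarded with Charset.isSupported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: show the bytes of the character U+5C24 under two encodings, then decode
// the GBK bytes with the wrong (UTF-8) rules to reproduce mojibake.
class EncodingDemo {
    // Render bytes as space-separated uppercase hex, e.g. "E5 B0 A4"
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String you = "\u5C24"; // written as an escape so the source-file encoding doesn't matter
        System.out.println("UTF-8: " + toHex(you.getBytes(StandardCharsets.UTF_8))); // UTF-8: E5 B0 A4

        // GBK ships with standard JDK builds but is not a charset every JVM must provide
        if (Charset.isSupported("GBK")) {
            byte[] gbk = you.getBytes(Charset.forName("GBK"));
            System.out.println("GBK:   " + toHex(gbk)); // GBK:   D3 C8

            // D3 C8 is not a valid UTF-8 sequence, so the decoder substitutes U+FFFD
            String garbled = new String(gbk, StandardCharsets.UTF_8);
            System.out.println("GBK bytes decoded as UTF-8: " + garbled);
        }
    }
}
```

In the last line you should see replacement characters (�) instead of 尤: the same bytes, read under the wrong convention.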
2. Official Definition: Character Streams
The Java platform stores character values using Unicode conventions. Character stream I/O automatically translates this internal format to and from the local character set. In Western locales, the local character set is usually an 8-bit superset of ASCII.
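In other words, inside the JVM characters are always Unicode (a Java char is a 16-bit UTF-16 code unit), and every read or write through a character stream converts between that internal form and the local encoding. In Western locales that local encoding is usually an 8-bit superset of ASCII; on a Simplified Chinese Windows system it is typically GBK.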
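A small sketch of what "stored as Unicode" means in practice. The helper describe is hypothetical; it lists the UTF-16 code units (Java chars) of a string. A character such as 尤 (U+5C24) is one char, while a character outside the Basic Multilingual Plane, such as 𝄞 (U+1D11E), takes two chars (a surrogate pair). The sketch also prints Charset.defaultCharset(), the local charset that FileReader and FileWriter fall back to when no charset is given; since JDK 18 (JEP 400) this defaults to UTF-8 on all platforms, while older JDKs take it from the operating system locale.

```java
import java.nio.charset.Charset;

// Sketch: a Java String is a sequence of 16-bit UTF-16 code units (chars),
// whatever encoding the text had on disk.
class UnicodeCharDemo {
    // List each char of s as "U+XXXX", separated by spaces
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("A\u5C24"));      // U+0041 U+5C24 (one char each)
        System.out.println(describe("\uD834\uDD1E")); // U+D834 U+DD1E (one character, two chars)
        // The local charset used by FileReader/FileWriter when none is specified
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```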
3. Using Character Streams
All character stream classes are descended from Reader and Writer. As with byte streams, there are character stream classes that specialize in file I/O: FileReader and FileWriter. The CopyCharacters example illustrates these classes by copying a file character by character:
```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyCharacters {
    public static void main(String[] args) throws IOException {
        FileReader inputStream = null;
        FileWriter outputStream = null;

        try {
            // FileReader/FileWriter decode and encode using the platform's default charset
            inputStream = new FileReader("xanadu.txt");
            outputStream = new FileWriter("characteroutput.txt");

            int c;
            // read() returns one char (0..65535) as an int, or -1 at end of stream
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        } finally {
            // Always release the underlying file handles
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}
```
CopyCharacters is very similar to CopyBytes. The most important difference is that CopyCharacters uses FileReader and FileWriter for input and output in place of FileInputStream and FileOutputStream. Notice that both CopyBytes and CopyCharacters use an int variable to read to and write from. However, in CopyCharacters, the int variable holds a character value in its last 16 bits; in CopyBytes, the int variable holds a byte value in its last 8 bits.
Note that both programs read into a 4-byte int. In CopyCharacters only the low 2 bytes carry data and the high 2 bytes are 0, because a Java char is a 16-bit UTF-16 code unit; in CopyBytes each read returns a single byte, so only the lowest byte carries data. The int is wider than the data on purpose: it lets read() return -1 at end of stream, a value that can never be confused with a valid char (0 to 65535) or byte (0 to 255).
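To see these int values directly, the following sketch (the class ReadIntDemo and its readAll overloads are illustrative names) collects everything read() returns from a StringReader and from a ByteArrayInputStream holding the UTF-8 bytes of the same text. The character stream yields one value per char, while the byte stream yields one value per byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: compare the int values returned by Reader.read() and InputStream.read()
class ReadIntDemo {
    // Every value a character stream returns: 0..65535, one UTF-16 code unit each
    static List<Integer> readAll(Reader in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int c;
        while ((c = in.read()) != -1) {
            values.add(c);
        }
        return values;
    }

    // Every value a byte stream returns: 0..255, one byte each
    static List<Integer> readAll(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            values.add(b);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String text = "a\u5C24"; // 'a' followed by the character U+5C24
        System.out.println(readAll(new StringReader(text))); // [97, 23588]
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(utf8))); // [97, 229, 176, 164]
    }
}
```

23588 is 0x5C24, which fits in the low 16 bits; 229, 176, 164 are the UTF-8 bytes E5 B0 A4, each in the low 8 bits.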
4. Character Streams Use Byte Streams
Character streams are often "wrappers" for byte streams. The character stream uses the byte stream to perform the physical I/O, while the character stream handles translation between characters and bytes. FileReader, for example, uses FileInputStream, while FileWriter uses FileOutputStream.
This ties back to Section 1: the byte stream moves the raw bytes (the 0s and 1s), while the character stream applies an encoding such as UTF-8 or GBK to turn those bytes into characters and back.
There are two general-purpose byte-to-character "bridge" streams: InputStreamReader and OutputStreamWriter. Use them to create character streams when there are no prepackaged character stream classes that meet your needs. The sockets lesson in the networking trail shows how to create character streams from the byte streams provided by socket classes.
Because they can wrap any InputStream or OutputStream and accept an explicit charset, these bridges are also the usual way to read or write text in a specific encoding rather than the platform default.
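A sketch of the bridge classes in action, assuming in-memory byte-array streams stand in for a file or socket. The method transcode is hypothetical: it decodes bytes with one charset through InputStreamReader and re-encodes the resulting characters with another through OutputStreamWriter. Feeding it the UTF-16BE bytes 5C 24 (the character 尤) and asking for UTF-8 yields E5 B0 A4.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: byte stream -> characters -> byte stream, via the two bridge classes
class BridgeDemo {
    // Decode `input` with charset `from`, then re-encode the characters with `to`
    static byte[] transcode(byte[] input, Charset from, Charset to) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } // closing the writer flushes any buffered output into `out`
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The character U+5C24 encoded as UTF-16BE is the two bytes 5C 24
        byte[] utf8 = transcode(new byte[] {0x5C, 0x24},
                StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);
        for (byte b : utf8) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints: E5 B0 A4
    }
}
```

The same pattern works with a socket's or file's streams in place of the byte arrays; only the charset arguments decide how the bytes are interpreted.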
5. Line-Oriented I/O
Character I/O usually occurs in bigger units than single characters. One common unit is the line: a string of characters with a line terminator at the end. A line terminator can be a carriage-return/line-feed sequence ("\r\n"), a single carriage-return ("\r"), or a single line-feed ("\n"). Supporting all possible line terminators allows programs to read text files created on any of the widely used operating systems.
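In practice we often need to read or write text line by line. A line is a string of characters followed by a line terminator, which may be a carriage return plus line feed ("\r\n", the Windows convention), a single carriage return ("\r", used by classic Mac OS before OS X), or a single line feed ("\n", used by Unix, Linux, and macOS). Accepting all three lets a program correctly split text files created on any of these systems into lines.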
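The JDK class BufferedReader already implements this rule: its readLine() method treats "\n", "\r", or "\r\n" as a line terminator and strips it from the returned string. A minimal sketch, with a StringReader standing in for a file and a hypothetical helper named lines:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch: split text into lines with BufferedReader.readLine()
class LineDemo {
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            // readLine() returns null at end of stream; the terminator is not included
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Windows, classic Mac OS, and Unix terminators mixed in one string
        System.out.println(lines("one\r\ntwo\rthree\nfour")); // [one, two, three, four]
    }
}
```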
Note: println appends the current operating system's line separator. For portable code, don't hard-code "\r\n" as the newline in Java; ask the platform for its separator instead:
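```java
// Get the platform's line separator ("\r\n" on Windows, "\n" on Unix-like systems)
String lineSeparator = System.getProperty("line.separator", "\n");
// Since Java 7, System.lineSeparator() returns the same (initial) value
```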
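A quick way to confirm what println actually writes, as a sketch with a hypothetical printlnOutput helper: PrintWriter.println into a StringWriter produces the text followed by the platform line separator, and the %n format specifier expands to the same separator. System.lineSeparator() (Java 7 and later) returns the initial value of the line.separator property, so it is a shorter alternative to System.getProperty.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch: println and %n both emit the platform line separator
class LineSeparatorDemo {
    // Write one line with println and return exactly the characters produced
    static String printlnOutput(String text) {
        StringWriter sw = new StringWriter();
        try (PrintWriter pw = new PrintWriter(sw)) {
            pw.println(text);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        System.out.println(sep.equals("\r\n") ? "CRLF" : sep.equals("\n") ? "LF" : "other");
        System.out.println(printlnOutput("hi").equals("hi" + sep)); // true
        System.out.println(String.format("hi%n").equals("hi" + sep)); // true
    }
}
```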