A Comparison of the Efficiency of File-Reading Methods in Java

Preface

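Not long ago I was preparing to write a small text-processing program that needed to read text efficiently. So I summarized the common file-reading methods, timed each of them, and read some of the relevant source code, hoping to explain the reasoning behind the test results so that the best method can be chosen when these operations come up again. To reduce irrelevant noise, code such as parameter validation has been omitted from the source excerpts, and some of the code has been simplified.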

Five common file-reading methods

Using BufferedReader

static long testBuffered(String fileName) throws IOException{
    Long startTime = System.currentTimeMillis();
    BufferedReader reader = new BufferedReader(new FileReader(fileName));
    char[] buffer=new char[8*1024];
    long sum = 0;
    int count; // number of chars returned by each read call
    while((count=reader.read(buffer))!=-1)
    {
        sum += count;
    }
    reader.close();
    Long endTime = System.currentTimeMillis();
    System.out.println("Total time of BufferedReader is "+ (endTime - startTime) + " milliseconds, Total byte is " + sum);
    return endTime - startTime;
}

BufferedReader is a very common way to read files. The buffer size is 8*1024, chosen to match the internal buffer of BufferedReader. Its constructors are as follows:

private char cb[];
private static int defaultCharBufferSize = 8192;

public BufferedReader(Reader in, int sz) {
    super(in);
    this.in = in;
    cb = new char[sz];
    nextChar = nChars = 0;
}

public BufferedReader(Reader in) {
    this(in, defaultCharBufferSize);
}

As we can see, if no size is passed to the constructor, the default defaultCharBufferSize is used, that is, $8192=8*1024$. A private array cb of this size is created; I guess the name is short for "char buffer". When BufferedReader reads a run of characters, it calls the following function.
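
To make the relationship between the two constructors concrete, here is a minimal sketch that passes the buffer size explicitly. The class name BufferSizeDemo and the helper countChars are hypothetical names used only for illustration; passing 8192 should behave the same as the one-argument constructor shown above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BufferSizeDemo {
    // Counts the chars delivered by a BufferedReader created with an explicit buffer size.
    // (Hypothetical helper, not part of the original test code.)
    static long countChars(Reader source, int bufferSize) throws IOException {
        char[] chunk = new char[8 * 1024];
        long total = 0;
        // try-with-resources closes the reader even if read throws.
        try (BufferedReader reader = new BufferedReader(source, bufferSize)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String text = new String(new char[20_000]); // 20,000 chars of test data
        // 8192 is the same size the one-argument constructor would pick.
        System.out.println(countChars(new StringReader(text), 8192));
    }
}
```

The size only controls how much is fetched from the underlying Reader per refill; the number of chars delivered does not depend on it.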

public int read(char cbuf[], int off, int len) throws IOException {
        synchronized (lock) {
            int n = read1(cbuf, off, len);
            if (n <= 0) return n;
            while ((n < len) && in.ready()) {
                int n1 = read1(cbuf, off + n, len - n);
                if (n1 <= 0) break;
                n += n1;
            }
            return n;
        }
    }

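As we can see, it calls read1 in a loop to fill the passed-in array (cbuf) up to the requested length (len), for as long as the underlying stream reports that it is ready. What follows is a long call chain, shown in the figure below.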

[Figure: call chain of the read function in the BufferedReader class]

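After all these nested calls, what is finally used is FileChannel, which is also the fourth method in this article. With so many layers stacked on top of it (including decoding bytes into chars), it is no surprise that BufferedReader performs poorly.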

Using RandomAccessFile

static long testRandomAccess(String fileName) throws IOException{
    Long startTime = System.currentTimeMillis();
    RandomAccessFile reader = new RandomAccessFile(fileName,"r");
    int count;
    byte[] buffer=new byte[8*1024];// read buffer
    long sum = 0;
    while((count=reader.read(buffer))!=-1){
        sum += count;
    }
    reader.close();
    Long endTime = System.currentTimeMillis();
    System.out.println("Total time of RandomAccess is "+ (endTime - startTime) + " milliseconds, Total byte is " + sum);
    return endTime - startTime;
}

Why is the buffer in the code above also 8K? Because the call chain is as follows

[Figure: call chain of the read function in RandomAccessFile]

As we can see, this call chain is very short, and the work is done by a native function. The relevant code at the end of the chain, in io_util.c, is as follows

#define BUF_SIZE 8192

jint
readBytes(JNIEnv *env, jobject this, jbyteArray bytes,
          jint off, jint len, jfieldID fid)
{
    jint nread;
    char stackBuf[BUF_SIZE];
    char *buf = stackBuf;
    if (len > BUF_SIZE) {
        buf = malloc(len);
    } 
    fd = GET_FD(this, fid);
    nread = IO_Read(fd, buf, len);
    (*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf);
    if (buf != stackBuf) {
        free(buf);
    }
    return nread;
}

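From the code above we can see that if the requested length is no greater than 8192, the stack-allocated local buffer is used directly. If it is greater, a block of that size has to be allocated. That is why the test code uses a length of 8192: it avoids a heap allocation on every call, since malloc and free in C are comparatively slow and can become a real drag on performance.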
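
The same idea can be applied when the destination array is larger than 8192 bytes: read it in slices of at most 8192 bytes so that, going by the io_util.c excerpt above, each native call can stay on the stack buffer. The following is a minimal sketch under that assumption; ChunkedRead and readInChunks are hypothetical names, and whether this beats one large read on a given JDK and platform would need to be measured.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkedRead {
    // Matches BUF_SIZE in the io_util.c excerpt; an assumption about the running JDK.
    static final int CHUNK = 8192;

    // Fills dest from the file in slices of at most CHUNK bytes.
    // Returns the number of bytes actually placed in dest.
    static int readInChunks(RandomAccessFile file, byte[] dest) throws IOException {
        int filled = 0;
        while (filled < dest.length) {
            int want = Math.min(CHUNK, dest.length - filled);
            int n = file.read(dest, filled, want);
            if (n == -1) {
                break; // end of file reached before dest was full
            }
            filled += n;
        }
        return filled;
    }
}
```

A caller would open the file with new RandomAccessFile(fileName, "r"), pass in an array of any size, and use the return value as the count of valid bytes.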

Using FileInputStream

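This approach is also common, and it works as the name suggests: the file is treated as an input stream. The generic read function of InputStream, shown below, fills the array by calling read() one byte at a time. Note, however, that FileInputStream overrides this method with a native bulk read (readBytes, the same io_util.c routine used by RandomAccessFile), which is consistent with the two methods producing nearly identical timings in the results later on.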

public int read(byte b[], int off, int len) throws IOException {
    int c = read();
    if (c == -1) {
        return -1;
    }
    b[off] = (byte)c;

    int i = 1;
    try {
        for (; i < len ; i++) {
            c = read();
            if (c == -1) {
                break;
            }
            b[off + i] = (byte)c;
        }
    } catch (IOException ee) {
    }
    return i;
}
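
The main method later in this article calls testFileStream, but its code is not shown. The following is a hypothetical reconstruction that mirrors the other test methods (8 KB buffer, same output format as the "FileStream" lines in the results); the helper countBytes is an addition for illustration and is not from the original.

```java
import java.io.FileInputStream;
import java.io.IOException;

class FileStreamTest {
    // Reads the whole file in 8 KB chunks and returns the number of bytes read.
    static long countBytes(String fileName) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long sum = 0;
        try (FileInputStream reader = new FileInputStream(fileName)) {
            int count;
            while ((count = reader.read(buffer)) != -1) {
                sum += count;
            }
        }
        return sum;
    }

    // Hypothetical reconstruction of the testFileStream method that main calls.
    static long testFileStream(String fileName) throws IOException {
        long startTime = System.currentTimeMillis();
        long sum = countBytes(fileName);
        long endTime = System.currentTimeMillis();
        System.out.println("Total time of FileStream is " + (endTime - startTime)
                + " milliseconds, Total byte is " + sum);
        return endTime - startTime;
    }
}
```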

Using FileChannel with ByteBuffer

This approach is much like the tail end of the call chain in the first method, so in theory its speed should be reasonable. The code is as follows:

static long testFileStreamChannel(String fileName) throws IOException{
    Long startTime = System.currentTimeMillis();
    FileInputStream reader = new FileInputStream(fileName);
    FileChannel ch = reader.getChannel();
    ByteBuffer bb = ByteBuffer.allocate(8*1024);
    long sum = 0;
    int count;
    while ((count=ch.read(bb)) != -1 )
    {
        sum += count;
        bb.clear();
    }
    reader.close();
    Long endTime = System.currentTimeMillis();
    System.out.println("Total time of FileStreamChannel is "+ (endTime - startTime) + " milliseconds, Total byte is " + sum);
    return endTime - startTime;
}

The read function of FileChannel that it calls is implemented internally by read in IOUtil. The code is as follows:

static int read(FileDescriptor fd, ByteBuffer dst, long position, NativeDispatcher nd) throws IOException
{
     if (dst instanceof DirectBuffer)
         return readIntoNativeBuffer(fd, dst, position, nd);

     ByteBuffer bb = Util.getTemporaryDirectBuffer(dst.remaining());
     try {
         int n = readIntoNativeBuffer(fd, bb, position, nd);
         bb.flip();
         if (n > 0)
             dst.put(bb);// copy into the caller's buffer
         return n;
     } finally {
         Util.offerFirstTemporaryDirectBuffer(bb);
     }
 }

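It requests a temporary off-heap DirectByteBuffer as large as the remaining space in the passed-in buffer, reads the file into it, and finally copies the data back into the passed-in buffer.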
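
Since the IOUtil excerpt above takes a shortcut when the destination is already a DirectBuffer, one option is to pass a direct buffer in the first place. Below is a minimal sketch of that variant; DirectBufferRead and countBytes are hypothetical names, and whether it is actually faster than the heap-buffer version depends on the JDK and platform, so it should be measured rather than assumed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DirectBufferRead {
    // Reads the whole file through a FileChannel into a direct (off-heap) buffer
    // and returns the number of bytes read.
    static long countBytes(Path path) throws IOException {
        // allocateDirect yields a DirectBuffer, so the temporary-buffer copy
        // in the excerpt above should not be needed.
        ByteBuffer buffer = ByteBuffer.allocateDirect(8 * 1024);
        long total = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            int n;
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // reset position and limit so the buffer can be refilled
            }
        }
        return total;
    }
}
```

Direct buffers are relatively expensive to allocate, so in this sketch the buffer is created once per file rather than once per read.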

Using FileChannel with MappedByteBuffer

This kind of approach is rarely seen. The test code is as follows:

static long testFileStreamChannelMap(String fileName) throws IOException{
    Long startTime = System.currentTimeMillis();
    FileInputStream reader = new FileInputStream(fileName);
    FileChannel ch = reader.getChannel();
    MappedByteBuffer mb =ch.map( FileChannel.MapMode.READ_ONLY,0L, ch.size() );// this is the key line
    long sum = 0;
    sum = mb.limit();
    reader.close();
    Long endTime = System.currentTimeMillis();
    System.out.println("Total time of testFileStreamChannelMap is "+ (endTime - startTime) + " milliseconds, Total byte is " + sum);
    return endTime - startTime;
}

Let us now look at what the commented line above actually does

public MappedByteBuffer map(MapMode mode, long position, long size) throws IOException
{          
    int pagePosition = (int)(position % allocationGranularity);
    long mapPosition = position - pagePosition;
    long mapSize = size + pagePosition;
    try {
        // native method; returns the address of the memory mapping
        addr = map0(imode, mapPosition, mapSize);
    } catch (OutOfMemoryError x) {
        // out of memory: trigger a GC manually, then try again
        System.gc();
        try {
            Thread.sleep(100);
        } catch (InterruptedException y) {
            Thread.currentThread().interrupt();
        }
        try {
            addr = map0(imode, mapPosition, mapSize);
        } catch (OutOfMemoryError y) {
            throw new IOException("Map failed", y);
        }
    }
    // build a Buffer around that address and return it
    return Util.newMappedByteBufferR(isize, addr + pagePosition, mfd, um);
}

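The name Util.newMappedByteBufferR in the code above is easy to misread. What it actually constructs is a DirectByteBufferR, which is a subclass of DirectByteBuffer, which in turn is a subclass of MappedByteBuffer. In other words, it obtains the address at which the file is mapped in virtual memory and builds a DirectByteBufferR around it. The benefit of this type is that it operates directly on that region of virtual memory.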
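
Note that the test method above only creates the mapping and reads limit(); it never touches the file contents. For comparison, here is a minimal sketch that actually walks every mapped byte. MappedRead and sumBytes are hypothetical names, and the sketch assumes the file is smaller than 2 GB, since a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {
    // Maps the whole file read-only and sums every byte, which forces the
    // operating system to actually load the mapped pages.
    static long sumBytes(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0L, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get(); // relative get: reads one byte and advances
            }
            return sum;
        }
    }
}
```

Timing a method like this, rather than the map call alone, gives a figure that is comparable with the other four methods.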

Testing, analysis and summary

We can now test the read speed of these five methods. Files of roughly 1KB, 128KB, 256KB, 512KB, 768KB, 1MB, 128MB, 256MB, 512MB, 768MB and 1GB are generated and then read.

static boolean generateFile(String fileName,long size){
    try {
        BufferedWriter writer = new BufferedWriter(new FileWriter(fileName),8*1024);
        for(int count = 0;count < size;count ++){
            writer.write('a');
        }
        writer.close();
    }catch (IOException e){
        e.printStackTrace();
        return false;
    }

    return true;
}

public static void main(String[] args) {
    String fileName = "data.txt";
    long m = 1024 ;
    long size[] = {m,m * 128,m * 256,m * 512,m * 768,m * 1024,m * 1024 * 128,m * 1024 * 256,m * 1024 * 512,m * 1024 * 768,m * 1024 * 1024};
    for (int i = 0;i < size.length;i ++ ) {
        generateFile(fileName, size[i]);
        try {
            testBuffered(fileName);
            testRandomAccess(fileName);
            testFileStream(fileName);
            testFileStreamChannel(fileName);
            testFileStreamChannelMap(fileName);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("--------------------------------------------------------");
    }
}

The output of the test is as follows:

Total time of BufferedReader is 1 milliseconds, Total byte is 1024
Total time of RandomAccess is 1 milliseconds, Total byte is 1024
Total time of FileStream is 0 milliseconds, Total byte is 1024
Total time of FileStreamChannel is 17 milliseconds, Total byte is 1024
Total time of testFileStreamChannelMap is 3 milliseconds, Total byte is 1024
--------------------------------------------------------
Total time of BufferedReader is 16 milliseconds, Total byte is 131072
Total time of RandomAccess is 0 milliseconds, Total byte is 131072
Total time of FileStream is 0 milliseconds, Total byte is 131072
Total time of FileStreamChannel is 0 milliseconds, Total byte is 131072
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 131072
--------------------------------------------------------
Total time of BufferedReader is 5 milliseconds, Total byte is 262144
Total time of RandomAccess is 1 milliseconds, Total byte is 262144
Total time of FileStream is 0 milliseconds, Total byte is 262144
Total time of FileStreamChannel is 1 milliseconds, Total byte is 262144
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 262144
--------------------------------------------------------
Total time of BufferedReader is 9 milliseconds, Total byte is 524288
Total time of RandomAccess is 0 milliseconds, Total byte is 524288
Total time of FileStream is 0 milliseconds, Total byte is 524288
Total time of FileStreamChannel is 1 milliseconds, Total byte is 524288
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 524288
--------------------------------------------------------
Total time of BufferedReader is 10 milliseconds, Total byte is 786432
Total time of RandomAccess is 0 milliseconds, Total byte is 786432
Total time of FileStream is 0 milliseconds, Total byte is 786432
Total time of FileStreamChannel is 5 milliseconds, Total byte is 786432
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 786432
--------------------------------------------------------
Total time of BufferedReader is 2 milliseconds, Total byte is 1048576
Total time of RandomAccess is 1 milliseconds, Total byte is 1048576
Total time of FileStream is 0 milliseconds, Total byte is 1048576
Total time of FileStreamChannel is 3 milliseconds, Total byte is 1048576
Total time of testFileStreamChannelMap is 1 milliseconds, Total byte is 1048576
--------------------------------------------------------
Total time of BufferedReader is 146 milliseconds, Total byte is 134217728
Total time of RandomAccess is 43 milliseconds, Total byte is 134217728
Total time of FileStream is 44 milliseconds, Total byte is 134217728
Total time of FileStreamChannel is 89 milliseconds, Total byte is 134217728
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 134217728
--------------------------------------------------------
Total time of BufferedReader is 230 milliseconds, Total byte is 268435456
Total time of RandomAccess is 88 milliseconds, Total byte is 268435456
Total time of FileStream is 85 milliseconds, Total byte is 268435456
Total time of FileStreamChannel is 107 milliseconds, Total byte is 268435456
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 268435456
--------------------------------------------------------
Total time of BufferedReader is 463 milliseconds, Total byte is 536870912
Total time of RandomAccess is 193 milliseconds, Total byte is 536870912
Total time of FileStream is 393 milliseconds, Total byte is 536870912
Total time of FileStreamChannel is 379 milliseconds, Total byte is 536870912
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 536870912
--------------------------------------------------------
Total time of BufferedReader is 844 milliseconds, Total byte is 805306368
Total time of RandomAccess is 282 milliseconds, Total byte is 805306368
Total time of FileStream is 273 milliseconds, Total byte is 805306368
Total time of FileStreamChannel is 255 milliseconds, Total byte is 805306368
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 805306368
--------------------------------------------------------
Total time of BufferedReader is 1097 milliseconds, Total byte is 1073741824
Total time of RandomAccess is 407 milliseconds, Total byte is 1073741824
Total time of FileStream is 348 milliseconds, Total byte is 1073741824
Total time of FileStreamChannel is 395 milliseconds, Total byte is 1073741824
Total time of testFileStreamChannelMap is 0 milliseconds, Total byte is 1073741824
--------------------------------------------------------

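We can see that the first method takes the longest, which fully matches our expectations. The last method takes negligible time, but keep in mind that the test only creates the mapping and reads limit(): the file contents are never touched, and mapped pages are only loaded when they are accessed, so this figure is the cost of setting up the mapping rather than of reading the data. Finally, the FileChannel method also spends some time on small files (17 ms on the very first 1KB read), presumably because the temporary direct buffer has to be set up.

From this we can draw a few conclusions. For bulk reading, BufferedReader is the slowest of the methods tested at every large file size, so it is best avoided when throughput is what matters. If you are only reading a small file once, do not use the FileChannel-based methods. Keep the input buffer no larger than 8K, because most of the default buffers are 8K and this makes them work together well.

Although FileChannel with MappedByteBuffer produced excellent numbers on large files in this test, it is used fairly rarely in practice. It has a number of problems, such as memory usage and unpredictable file closing: a file opened through it is only released when the buffer is garbage collected, and the moment at which that happens is not deterministic. Most programmers detest these problems, because the behavior is outside their control. Bugs that cannot be reproduced are the hardest to fix.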
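
Many of the timings above are 0 or 1 milliseconds, which is at the resolution limit of System.currentTimeMillis, and each method was run only once per file. A slightly more robust measurement can be sketched as below, using System.nanoTime, a few untimed warm-up runs, and the best of several timed runs. TimingSketch and bestNanos are hypothetical names and this is only one possible approach, not the method used to produce the numbers in this article.

```java
import java.util.concurrent.Callable;

class TimingSketch {
    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the smallest elapsed time in nanoseconds.
    static long bestNanos(Callable<?> task, int warmups, int runs) throws Exception {
        for (int i = 0; i < warmups; i++) {
            task.call(); // lets class loading and JIT compilation settle
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }
}
```

For example, bestNanos(() -> testRandomAccess("data.txt"), 2, 5) would time the RandomAccessFile test from this article. Repeated runs will mostly be served from the operating system's file cache, so the result reflects cached reads rather than disk speed.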

Please credit the source when reposting: http://djjowfy.com/2017/09/10/對(duì)java中關(guān)于文件讀取方法效率的比較/
