1、POI操作EXCEL工作簿的三種類型(都是實(shí)現(xiàn)Workbook接口):
- HSSFWorkbook
- XSSFWorkbook
- SXSSFWorkbook
2、一般使用流程
1.創(chuàng)建工作簿W(wǎng)orkbook
2.創(chuàng)建Sheet
3.創(chuàng)建行Row
4.創(chuàng)建單元格Cell
3、HSSFWorkbook
HSSFWorkbook是操作Excel2003以前(包括2003)的版本瘟裸,擴(kuò)展名為.xls庄吼,所以每個(gè)Sheet局限就是導(dǎo)出的行數(shù)至多為65535行,一般不會(huì)發(fā)生內(nèi)存不足的情況(OOM)膝但。
4、XSSFWorkbook
這種形式的出現(xiàn)是由于HSSFWorkbook的局限性而產(chǎn)生的谤草,因?yàn)槠渌鶎?dǎo)出的行數(shù)比較少跟束,并且只針對(duì)Excel2003以前(包括2003)的版本的版本莺奸,所以 XSSFWookbook應(yīng)運(yùn)而生,其對(duì)應(yīng)的是EXCEL2007以后的版本(1048576行冀宴,16384列)擴(kuò)展名.xlsx憾筏,每個(gè)Sheet最多可以導(dǎo)出104萬行,不過這樣就伴隨著一個(gè)OOM內(nèi)存溢出的問題花鹅,原因是你所創(chuàng)建的sheet row cell 等此時(shí)是存在內(nèi)存中的氧腰,隨著數(shù)據(jù)量增大 ,內(nèi)存的需求量也就增大刨肃,那么很大可能就是要OOM了古拴。
5、對(duì)于不同版本的EXCEL文檔要使用不同的工具類真友,如果使用錯(cuò)了程序會(huì)出現(xiàn)異常黄痪。
6、SXSSFWorkbook
從POI 3.8版本開始盔然,提供了一種基于XSSFWorkbook的低內(nèi)存占用的工作簿SXSSFWorkbook桅打。
SXSSF (Streaming Usermodel API)
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)
When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.
The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.
A windowSize of -1 indicates unlimited access. In this case all records that have not been flushed by a call to flushRows() are available for random access.
Note that SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.
SXSSFWorkbook defaults to using inline strings instead of a shared strings table. This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients. With shared strings enabled all unique strings in the document has to be kept in memory. Depending on your document content this could use a lot more resources than with shared strings disabled.
Please note that there are still things that still may consume a large amount of memory based on which features you are using, e.g. merged regions, hyperlinks, comments, ... are still only stored in memory and thus may require a lot of memory if used extensively.
Carefully review your memory budget and compatibility needs before deciding whether to enable shared strings or not.
引用官方的介紹,簡(jiǎn)單概括就是:
SXSSF是對(duì)XSSF的一種流式擴(kuò)展愈案,特點(diǎn)是采用了滑動(dòng)窗口的機(jī)制挺尾,低內(nèi)存占用,主要用于數(shù)據(jù)量非常大的電子表格而虛擬機(jī)堆有限的情況站绪。
原理是利用了滑動(dòng)窗口機(jī)制遭铺。
SXSSFWorkbook.DEFAULT_WINDOW_SIZE默認(rèn)值是100,表示在內(nèi)存中最多存在100個(gè)Row對(duì)象恢准,當(dāng)寫第101個(gè)Row對(duì)象的時(shí)候就會(huì)把第1個(gè)Row對(duì)象以XML格式寫入C:\Users\wange\AppData\Local\Temp路徑下的臨時(shí)文件中魂挂,后面的以此類推,始終保持內(nèi)存中最多存在100個(gè)Row對(duì)象馁筐。
SXSSFWorkbook默認(rèn)使用內(nèi)聯(lián)字符串而不是共享字符串表(SharedStringsTable)涂召。啟用共享字符串時(shí),文檔中的所有唯一字符串都必須保存在內(nèi)存中敏沉,因此會(huì)占用更多的內(nèi)存果正。
/**
* workbook - 模板工作簿
* rowAccessWindowSize - 保存在內(nèi)存中,直到刷新的行數(shù)赦抖。
* compressTmpFiles - 是否對(duì)臨時(shí)文件使用gzip壓縮舱卡,臨時(shí)文件可能占用過大的情況
* useSharedStringsTable - 是否使用共享字符串表
*/
SXSSFWorkbook(XSSFWorkbook workbook, int rowAccessWindowSize, boolean compressTmpFiles, boolean useSharedStringsTable)
與XSSF的對(duì)比辅肾,在一個(gè)時(shí)間點(diǎn)上队萤,只可以訪問一定數(shù)量的Row;不再支持Sheet.clone()矫钓;不再支持公式的求值要尔。但是除了滑動(dòng)窗口舍杜,其余的EXCLE操作仍然使用的是XSSF的API。
另外官方提示導(dǎo)出EXCEL后應(yīng)該調(diào)用wb.dispose()來刪除之前保存的臨時(shí)文件赵辕。
//The example below writes a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory, when rownum reaches 102 then the row with rownum=1 is flushed, etc.
package cn.nubia;
import org.junit.Assert;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import java.io.FileOutputStream;
import java.io.IOException;
public class App
{
public static void main( String[] args ) throws IOException {
SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
Sheet sh = wb.createSheet();
for(int rownum = 0; rownum < 1000; rownum++){
Row row = sh.createRow(rownum);
for(int cellnum = 0; cellnum < 10; cellnum++){
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
}
// Rows with rownum < 900 are flushed and not accessible
for(int rownum = 0; rownum < 900; rownum++){
Assert.assertNull(sh.getRow(rownum));
}
// ther last 100 rows are still in memory
for(int rownum = 900; rownum < 1000; rownum++){
Assert.assertNotNull(sh.getRow(rownum));
}
FileOutputStream out = new FileOutputStream("C:\\Users\\wange\\Desktop\\wtf.xlsx");
wb.write(out);
out.close();
// dispose of temporary files backing this workbook on disk
wb.dispose();
}
}
wb.write(out)通過源碼了解到過程是
1既绩、將wb的所有sheet調(diào)用flushRows()移出內(nèi)存,寫入臨時(shí).xml文件中
2还惠、生成了一個(gè)臨時(shí).xlsx文件將wb的一些模板數(shù)據(jù)寫入這個(gè)臨時(shí)文件
3饲握、將這個(gè)臨時(shí).xlsx文件轉(zhuǎn)成ZipFile,遍歷所有ZipEntry來獲取Sheet蚕键,如果沒有Sheet則直接復(fù)制流救欧。
4、如果能夠獲取到Sheet的則是那些臨時(shí).xml文件锣光,在對(duì)這些文件進(jìn)行解析并追蹤寫入導(dǎo)出文件中笆怠。
(這邊可能是涉及到了一些EXCEL文件格式的原理,就不深入研究了)
SXSSFWorkbook wb = new SXSSFWorkbook(-1)
初始化設(shè)置為-1的時(shí)候我們可以自己定義寫臨時(shí)文件規(guī)則誊爹,比如每讀1000行記錄flush到臨時(shí)一次蹬刷,可以大大減少磁盤IO次數(shù)。
7频丘、SXSSFWorkbook提供了一種低內(nèi)存占用的EXCEL導(dǎo)出方法办成,但是沒有提供讀取文件流的方法。因此讀入大數(shù)據(jù)量的時(shí)候還是只能使用XSSFWorkbook來讀取搂漠。
使用SAX模型來解析EXCEL不像DOM模型一下把所有文件內(nèi)容加載進(jìn)內(nèi)存诈火,它逐行掃描文檔,一邊掃描状答,一邊解析冷守。所以那些只需要單遍讀取內(nèi)容的應(yīng)用程序就可以從SAX解析中受益,這對(duì)大型文檔的解析是個(gè)巨大優(yōu)勢(shì)惊科。
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader xssfReader = new XSSFReader(pkg);