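Introduction to univocity-parsers
At work you often need to export or parse CSV files, and there are many open-source Java libraries for handling CSV. This article covers parsing and generating CSV with univocity-parsers. The univocity-parsers source is hosted on GitHub; at the time of writing, the latest version of univocity-parsers was 2.6.3.
Note: the source code for all examples in this article is on GitHub.
Usage in detail
Before going into detail, let's first look at a simple example of how to use univocity-parsers: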
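import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.common.processor.BeanWriterProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.junit.Test; // assumes JUnit 4; use org.junit.jupiter.api.Test for JUnit 5
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
@Slf4j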
public class HowToUse {
@Data
@AllArgsConstructor
@NoArgsConstructor
public static class Student {
@Parsed(field = "userNumber")
private String userNumber;
@Parsed(field = "userName")
private String userName;
@Parsed(field = "age")
private Integer age;
}
public static final String[] HEADERS = new String[]{"userNumber", "userName", "age"};
@Test
public void howToUse() throws IOException {
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
// Generate the CSV content
Student student = new Student("1111111111111111111111", "testUser", 20);
final CsvWriterSettings csvWriterSettings = new CsvWriterSettings();
csvWriterSettings.setHeaderWritingEnabled(Boolean.TRUE);
csvWriterSettings.setHeaders(HEADERS);
csvWriterSettings.setRowWriterProcessor(new BeanWriterProcessor<>(Student.class));
CsvWriter writer = new CsvWriter(outputStream, csvWriterSettings);
writer.processRecord(student);
writer.close();
final byte[] out = outputStream.toByteArray();
log.info("output: {}", new String(out));
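// Parse the CSV content
CsvParserSettings csvParserSettings = new CsvParserSettings();
// Typed processor, so getBeans() returns List<Student> without an unchecked warning
final BeanListProcessor<Student> beanListProcessor = new BeanListProcessor<>(Student.class);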
csvParserSettings.setProcessor(beanListProcessor);
CsvParser csvParser = new CsvParser(csvParserSettings);
csvParser.parse(new ByteArrayInputStream(out));
final List<Student> students = beanListProcessor.getBeans();
final String[] headers = beanListProcessor.getHeaders();
log.info("headers: {}", String.join(",", headers));
log.info("students: {}", students.toString());
}
}
}
- output: userNumber,userName,age
1111111111111111111111,testUser,20
- headers: userNumber,userName,age
- students: [HowToUse.Student(userNumber=1111111111111111111111, userName=testUser, age=20)]
As you can see, annotations make it quick to generate and parse CSV content. Parsed marks the correspondence between a property and a header, and the Processor handles the mapping between the two.
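To make the idea of annotation-driven mapping concrete, here is a minimal sketch using only the JDK. It is not how univocity-parsers is implemented: the `Column` annotation, the `AnnotationMappingSketch` class, and the `toRow` method are all hypothetical names invented for illustration. It only shows the general technique of looking up, for each header, the field whose annotation names that header.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical stand-in for annotation-driven mapping, not univocity's code.
class AnnotationMappingSketch {

    // Hypothetical annotation playing the role that @Parsed plays in the article.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String field();
    }

    static class Student {
        @Column(field = "userNumber")
        private String userNumber = "1111111111111111111111";
        @Column(field = "userName")
        private String userName = "testUser";
        @Column(field = "age")
        private Integer age = 20;
    }

    // For each header, find the field whose annotation names it and read its value.
    static String toRow(Object bean, String[] headers) {
        List<String> cells = new ArrayList<>();
        for (String header : headers) {
            String cell = "";
            for (Field f : bean.getClass().getDeclaredFields()) {
                Column column = f.getAnnotation(Column.class);
                if (column != null && column.field().equals(header)) {
                    try {
                        f.setAccessible(true);
                        Object value = f.get(bean);
                        cell = value == null ? "" : value.toString();
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException("Cannot read field " + f.getName(), e);
                    }
                }
            }
            cells.add(cell);
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        String[] headers = {"userNumber", "userName", "age"};
        System.out.println(String.join(",", headers));
        System.out.println(toRow(new Student(), headers));
    }
}
```

Because the lookup is driven by the header array, the column order follows the headers rather than the order in which the fields are declared.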
Generating CSV text
Settings overview
As the example above shows, CsvWriterSettings holds the configuration for the output.
// The Format interface; CsvFormat is used here and is described in detail below
private F format;
// Default null value: if an output property is null, this value is written instead
private String nullValue = null;
// Maximum number of characters per column
private int maxCharsPerColumn = 4096;
// Maximum number of columns
private int maxColumns = 512;
// Whether to skip empty lines; e.g. when writing, if the corresponding object is null and this is true, the line is skipped
private boolean skipEmptyLines = true;
// Whether to ignore trailing whitespace
private boolean ignoreTrailingWhitespaces = true;
// Whether to ignore leading whitespace
private boolean ignoreLeadingWhitespaces = true;
/**
Field filtering can be configured here
ExcludeFieldNameSelector(excludeFields): exclude some fields from the output by name
FieldNameSelector(selectFields): output only the fields selected by name
There are other FieldSelector implementations as well
**/
private FieldSelector fieldSelector = null;
private boolean autoConfigurationEnabled = true;
// Error handling
private ProcessorErrorHandler<? extends Context> errorHandler;
// Length of the content included in the error message when an exception occurs
private int errorContentLength = -1;
// Whether to treat bit values as whitespace and skip them
private boolean skipBitsAsWhitespace = true;
/**
This is the key part. For example, the BeanWriterProcessor used above takes beans as input
You can also implement this interface yourself
**/
private RowWriterProcessor<?> rowWriterProcessor;
// If set to true and headers are configured, the headers are written automatically before the first row of data
private Boolean headerWritingEnabled = null;
// Value written in place of an empty string
private String emptyValue = "";
private boolean expandIncompleteRows = false;
private boolean columnReorderingEnabled = false;
// Header configuration; you can also call the writer's writeHeaders method to write the headers
private String[] headers;
private boolean escapeUnquotedValues = false;
// Whether to wrap every field in the quote character configured in the format. If true, with the default quote character ", a column value xxx becomes "xxx"
private boolean quoteAllFields = false;
private boolean isInputEscaped = false;
private boolean normalizeLineEndingsWithinQuotes = true;
private char[] quotationTriggers = new char[0];
// If set to true, the content My "precious" becomes "My ""precious"""
private boolean quoteEscapingEnabled = false;
Format overview
// Line separator; defaults to the system line separator (\n on Unix-like systems)
private static final String systemLineSeparatorString;
private static final char[] systemLineSeparator;
// Quote character
private char quote = '"';
// Quote escape character
private char quoteEscape = '"';
// Delimiter; defaults to ,
private char delimiter = ',';
private Character charToEscapeQuoteEscaping = null;
Let's look at a simple example of what changing the format does:
@Test
public void excludeFields() throws IOException {
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
Student student = new Student("1111111111111111111111", "@testUser", 20);
CsvFormat csvFormat = new CsvFormat();
csvFormat.setQuote('@');
csvFormat.setQuoteEscape('*');
csvFormat.setDelimiter('|');
final CsvWriterSettings csvWriterSettings = new CsvWriterSettings();
csvWriterSettings.setHeaderWritingEnabled(Boolean.TRUE);
csvWriterSettings.setQuoteAllFields(true);
csvWriterSettings.setFormat(csvFormat);
csvWriterSettings.setQuoteEscapingEnabled(true);
csvWriterSettings.setHeaders(HEADERS);
csvWriterSettings.setRowWriterProcessor(new BeanWriterProcessor<>(Student.class));
CsvWriter writer = new CsvWriter(outputStream, csvWriterSettings);
writer.processRecord(student);
writer.close();
final byte[] out = outputStream.toByteArray();
log.info("output: {}", new String(out));
}
}
- output: @userNumber@|@userName@|@age@
@1111111111111111111111@|@*@testUser@|@20@
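- @1111111111111111111111@: because setQuoteAllFields is set to true, the value is wrapped in @ on both sides.
- |: set as the delimiter, replacing the default ,
- @*@testUser@: the value itself contains @, so the quoteEscape character * is used to escape it. In practice, \ is a commonly needed escape character.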
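The quoting rules above can be reproduced in a few lines of plain Java. The following is a sketch for illustration only, not univocity's implementation; `CsvQuoteSketch`, `quote`, and `row` are hypothetical names. It assumes every field is quoted (as with quoteAllFields set to true) and that only the quote character itself needs escaping (as when charToEscapeQuoteEscaping is null).

```java
// Sketch only: illustrates the quoting rules described above; not univocity's implementation.
class CsvQuoteSketch {

    // Wraps a value in the quote character, prefixing every embedded quote with the escape character.
    static String quote(String value, char quote, char quoteEscape) {
        StringBuilder sb = new StringBuilder().append(quote);
        for (char c : value.toCharArray()) {
            if (c == quote) {
                sb.append(quoteEscape);
            }
            sb.append(c);
        }
        return sb.append(quote).toString();
    }

    // Quotes every value and joins the results with the delimiter, as with quoteAllFields = true.
    static String row(char delimiter, char quote, char quoteEscape, String... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(quote(values[i], quote, quoteEscape));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same format as the example: quote '@', quoteEscape '*', delimiter '|'
        System.out.println(row('|', '@', '*', "1111111111111111111111", "@testUser", "20"));
        // Default CSV format: quote and quoteEscape are both '"'
        System.out.println(quote("My \"precious\"", '"', '"'));
    }
}
```

With the default format, the quote and escape characters are the same, which is why an embedded " ends up doubled.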
Using annotations
Sometimes the output text needs extra processing. For example, when a field holds a very long number, Excel converts it to scientific notation on opening the CSV file, so the field may need to be transformed before it is written.
@Slf4j
public class AnnotationTest {
@AllArgsConstructor
@NoArgsConstructor
public static class Student {
@Parsed(field = "userNumber")
@Convert(conversionClass = HumanReadableStringOutputConvert.class)
private String userNumber;
@Parsed(field = "userName")
private String userName;
@Parsed(field = "age")
private Integer age;
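}
// Header order for the generated CSV (same values as in HowToUse)
public static final String[] HEADERS = new String[]{"userNumber", "userName", "age"};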
public static class HumanReadableStringOutputConvert implements Conversion<String, String> {
private String prefix;
private String suffix;
public HumanReadableStringOutputConvert(String... args) {
String defaultPrefix = "=\"";
String defaultSuffix = "\"";
final int length = args.length;
if (length >= 1) {
defaultPrefix = args[0];
}
if (length >= 2) {
defaultSuffix = args[1];
}
this.prefix = defaultPrefix;
this.suffix = defaultSuffix;
}
@Override
public String execute(String input) {
// Called when parsing; left unimplemented because this example only writes
return null;
}
@Override
public String revert(String input) {
if (input == null) {
return input;
}
return prefix + input + suffix;
}
}
@Test
public void name() throws IOException {
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
// Generate the CSV content
Student student = new Student("1111111111111111111111", "testUser", 20);
final CsvWriterSettings csvWriterSettings = new CsvWriterSettings();
csvWriterSettings.setHeaderWritingEnabled(Boolean.TRUE);
csvWriterSettings.setHeaders(HEADERS);
csvWriterSettings.setRowWriterProcessor(new BeanWriterProcessor<>(Student.class));
CsvWriter writer = new CsvWriter(outputStream, csvWriterSettings);
writer.processRecord(student);
writer.close();
final byte[] out = outputStream.toByteArray();
log.info("output: {}", new String(out));
}
}
}
21:44:03.707 [main] INFO space.chaoluo.univocity.generate.AnnotationTest - output: userNumber,userName,age
="1111111111111111111111",testUser,20
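By using the Convert annotation with a custom conversion that overrides the revert method, you can process the content before it is written.
After the custom processing above, opening the text in Excel no longer converts the userNumber field to scientific notation.
Note: the execute method is the one called when parsing.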
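Since execute in the example simply returns null, parsing a file written this way would presumably leave userNumber empty. Below is a minimal sketch of a symmetric pair, written as plain Java methods so that it runs without the library. `ExcelTextConversionSketch` is a hypothetical class; its revert and execute methods only mirror the roles of the two Conversion methods and are not the library's API.

```java
// Sketch only: plain methods mirroring the roles of revert (writing) and execute (parsing).
class ExcelTextConversionSketch {
    private final String prefix;
    private final String suffix;

    ExcelTextConversionSketch(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    // Writing direction: wrap the value so Excel treats it as text, e.g. 123 -> ="123"
    String revert(String input) {
        return input == null ? null : prefix + input + suffix;
    }

    // Parsing direction: strip the wrapper if it is present, otherwise return the value unchanged
    String execute(String input) {
        if (input == null) {
            return null;
        }
        boolean wrapped = input.length() >= prefix.length() + suffix.length()
                && input.startsWith(prefix)
                && input.endsWith(suffix);
        return wrapped
                ? input.substring(prefix.length(), input.length() - suffix.length())
                : input;
    }

    public static void main(String[] args) {
        ExcelTextConversionSketch conversion = new ExcelTextConversionSketch("=\"", "\"");
        String written = conversion.revert("1111111111111111111111");
        System.out.println(written);
        System.out.println(conversion.execute(written));
    }
}
```

Keeping the two directions inverse to each other means a value survives a write followed by a parse unchanged.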
Parsing CSV text
Most of the settings described above for generating CSV apply to parsing as well; they are simply configured through CsvParserSettings and used by CsvParser.