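Lucene is an open-source full-text search engine toolkit. It provides a complete query engine and indexing engine, so developers can easily add full-text search to their own systems. At its core Lucene uses an inverted index, and the index is split into segments. Below, we first try out Lucene's create, read, update, and delete operations on an index. Lucene stores data in units of Documents, and the attribute values of an object are stored in the Document's Fields.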
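To make the inverted-index idea concrete, here is a minimal plain-Java sketch. It is a hypothetical toy model, not Lucene's implementation or API: each document is a map from field name to text, and the index maps every `field:term` pair to the set of document IDs that contain it, so a lookup never has to scan document text. Whitespace tokenization and lower-casing are simplifying assumptions of the sketch.

```java
import java.util.*;

// Runnable entry point for the sketch.
class ToyInvertedIndexDemo {
    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(Map.of("blog", "Lucene is a full-text search toolkit"));
        index.addDocument(Map.of("blog", "an inverted index maps terms to documents"));
        System.out.println(index.search("blog", "index")); // prints [1]
    }
}

// Hypothetical toy model, not Lucene's API: documents are field->text maps and the
// inverted index maps "field:term" keys to the sorted IDs of documents containing the term.
class ToyInvertedIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Stores the document, indexes every field, and returns the new document ID.
    int addDocument(Map<String, String> fields) {
        int docId = docs.size();
        docs.add(fields);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Simplifying assumption: lower-case the text and split on whitespace.
            for (String term : field.getValue().toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return docId;
    }

    // Returns the IDs of documents whose field contains the term (empty set if none).
    SortedSet<Integer> search(String field, String term) {
        SortedSet<Integer> ids = postings.get(field + ":" + term.toLowerCase());
        return ids == null ? new TreeSet<Integer>() : Collections.unmodifiableSortedSet(ids);
    }

    // Returns the stored fields of a document, similar to fetching a Document by its ID.
    Map<String, String> doc(int docId) {
        return docs.get(docId);
    }
}
```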
Step 1: Add the dependencies
<!-- Lucene core -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.7.2</version>
</dependency>
<!-- Lucene query parsing -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>4.7.2</version>
</dependency>
<!-- Lucene analyzers (tokenizers) -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>4.7.2</version>
</dependency>
Step 2: Build the index
Here we use the standard analyzer (StandardAnalyzer) to index five Documents:
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import java.io.File;
import java.io.IOException;
/**
 * created by yuyufeng on 2017/11/13.
 */
public class LuceneIndexDemo {
    public static void main(String[] args) {
        // Name of the field in the Lucene Document
        String fieldName = "blog";
        String text = "";
        // Index five documents. The sample texts are kept in Chinese because the
        // search results in Step 3 depend on how Chinese text is tokenized.
        text = "10月11日杭州云栖大会上,马云表达了对新建成的阿里巴巴全球研究院—阿里巴巴达摩院的愿景,希望达摩院二十年内成为世界第一大经济体,服务世界二十亿人,创造一亿个工作岗位。";
        doIndex(fieldName, text);
        text = "中国互联网界,阿里巴巴被认为是技术实力最弱的公司。我确实不懂技术,承认不懂技术不丢人,不懂装懂才丢人。";
        doIndex(fieldName, text);
        text = "阿里巴巴未来二十年的目标是打造世界第五大经济体,不是我们狂妄,而是世界需要这么一个经济体,也一定会有这么一个经济体。";
        doIndex(fieldName, text);
        text = "达摩院一定也必须要超越英特尔,必须超越微软,必须超越IBM,因为我们生于二十一世纪,我们是有机会后发优势的。";
        doIndex(fieldName, text);
        text = "阿里巴巴有很多争议,似乎无处不在,我还真想不出有什么东西是我们不做的。互联网是一种思想,是一种技术革命,不应该有界限。跨界乐趣无穷。我觉得阿里巴巴的跨界还不错";
        doIndex(fieldName, text);
    }

    private static void doIndex(String fieldName, String text) {
        // Instantiate the standard analyzer (StandardAnalyzer)
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        Directory directory = null;
        IndexWriter iwriter = null;
        try {
            // Index directory
            directory = new SimpleFSDirectory(new File("D://test/lucene_index"));
            // Configure the IndexWriterConfig: create the index if it does not exist, otherwise append to it
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_47, analyzer);
            iwConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
            iwriter = new IndexWriter(directory, iwConfig);
            // Write the document to the index
            Document doc = new Document();
            String id = String.valueOf(System.currentTimeMillis());
            // StringField is indexed as one untokenized term, so the ID can later be matched exactly
            doc.add(new StringField("ID", id, Field.Store.YES));
            // TextField is tokenized by the analyzer for full-text search
            doc.add(new TextField(fieldName, text, Field.Store.YES));
            iwriter.addDocument(doc);
            // close() commits the changes and releases the write lock
            iwriter.close();
            iwriter = null;
            System.out.println("Index created: " + id);
        } catch (IOException e) {
            // Also covers CorruptIndexException and LockObtainFailedException, which extend IOException
            e.printStackTrace();
        } finally {
            // Make sure the writer is closed (and its lock released) even if an exception occurred
            if (iwriter != null) {
                try {
                    iwriter.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (directory != null) {
                try {
                    directory.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Output:
Index created: 1510579712099
Index created: 1510579712355
Index created: 1510579712512
Index created: 1510579712743
Index created: 1510579712912
Inspect the index files: after running the program, open the folder where the index is stored and you will see the files Lucene has written there.
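Because doIndex opens a new IndexWriter for every document and closes (commits) it right away, the five runs typically leave several small segments in the index directory rather than a single one. The sketch below is a hypothetical plain-Java model of the segment idea, not Lucene's file format or API: documents are buffered in memory, each commit() freezes the buffer into a new immutable segment, and a search visits every committed segment and unions the results.

```java
import java.util.*;

class SegmentedIndexDemo {
    public static void main(String[] args) {
        SegmentedIndex index = new SegmentedIndex();
        for (int i = 0; i < 5; i++) {
            // Mirrors doIndex: one document per writer session, then commit.
            index.add("doc" + i, "text number " + i);
            index.commit();
        }
        System.out.println(index.segmentCount());        // 5
        System.out.println(index.search("text").size()); // 5
    }
}

// Hypothetical model of a segmented index: committed segments are never modified.
class SegmentedIndex {
    // Each segment maps term -> IDs of the documents in that segment containing it.
    private final List<Map<String, Set<String>>> segments = new ArrayList<>();
    // Documents added since the last commit (an in-memory buffer).
    private final Map<String, Set<String>> buffer = new HashMap<>();

    void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                buffer.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
            }
        }
    }

    // Freezes the buffered documents into a new read-only segment; no-op if nothing is buffered.
    void commit() {
        if (buffer.isEmpty()) {
            return;
        }
        Map<String, Set<String>> frozen = new HashMap<>();
        buffer.forEach((term, ids) -> frozen.put(term, Set.copyOf(ids)));
        segments.add(Collections.unmodifiableMap(frozen));
        buffer.clear();
    }

    // Searches committed segments only and unions the per-segment hits.
    Set<String> search(String term) {
        Set<String> hits = new TreeSet<>();
        for (Map<String, Set<String>> segment : segments) {
            hits.addAll(segment.getOrDefault(term.toLowerCase(), Set.of()));
        }
        return hits;
    }

    int segmentCount() {
        return segments.size();
    }
}
```

In this model, documents added after the last commit are invisible to search, which parallels the fact that a Lucene reader only sees committed changes. For bulk indexing, reusing one IndexWriter for all documents and closing it once is usually cheaper than opening and closing a writer per document as doIndex does.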
Step 3: Search
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import java.io.File;
import java.io.IOException;
/**
 * created by yuyufeng on 2017/11/13.
 */
public class LuceneSearchDemo {
    public static void main(String[] args) {
        // Name of the field in the Lucene Document
        String fieldName = "blog";
        // Use the same analyzer that was used at index time
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        Directory directory = null;
        IndexReader ireader = null;
        IndexSearcher isearcher;
        try {
            // Index directory
            directory = new SimpleFSDirectory(new File("D://test/lucene_index"));
            // Search **********************************
            // Open a reader on the index and create the searcher
            ireader = DirectoryReader.open(directory);
            isearcher = new IndexSearcher(ireader);
            String keyword = "达摩院";
            // Build the Query object with QueryParser
            QueryParser qp = new QueryParser(Version.LUCENE_47, fieldName, analyzer);
            // Operator placed between terms: OR_OPERATOR or AND_OPERATOR, similar to OR/AND in SQL
            qp.setDefaultOperator(QueryParser.OR_OPERATOR);
            Query query = qp.parse(keyword);
            System.out.println("Query = " + query);
            // Retrieve the 5 most relevant documents
            TopDocs topDocs = isearcher.search(query, 5);
            System.out.println("Hits: " + topDocs.totalHits);
            // Print the results. scoreDocs holds at most 5 entries while totalHits counts every
            // match, so loop over scoreDocs.length to avoid an ArrayIndexOutOfBoundsException
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (int i = 0; i < scoreDocs.length; i++) {
                Document targetDoc = isearcher.doc(scoreDocs[i].doc);
                System.out.println("Content: " + targetDoc.toString());
            }
        } catch (ParseException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (ireader != null) {
                try {
                    ireader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (directory != null) {
                try {
                    directory.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
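`isearcher.search(query, 5)` returns at most 5 entries in `scoreDocs`, while `totalHits` counts every matching document, so a loop over the results should be bounded by `scoreDocs.length` rather than `totalHits`. The hypothetical plain-Java sketch below mimics that contract with a size-bounded min-heap; `ScoredDoc`, `TopResults`, and `collect` are made-up names for illustration, not Lucene's classes or its actual collector.

```java
import java.util.*;

class TopNDemo {
    public static void main(String[] args) {
        // Seven matching documents, identified by array index, with made-up scores.
        double[] scores = {0.2, 0.9, 0.5, 0.7, 0.1, 0.8, 0.3};
        TopResults top = TopResults.collect(scores, 5);
        System.out.println(top.totalHits);        // 7
        System.out.println(top.scoreDocs.length); // 5
        // Safe loop: bounded by the returned array, not by totalHits.
        for (ScoredDoc sd : top.scoreDocs) {
            System.out.println(sd.doc + " " + sd.score);
        }
    }
}

// Hypothetical stand-in for a scored hit.
class ScoredDoc {
    final int doc;
    final double score;

    ScoredDoc(int doc, double score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical stand-in for a top-N result set.
class TopResults {
    final int totalHits;         // how many documents matched in total
    final ScoredDoc[] scoreDocs; // at most n of them, best score first

    TopResults(int totalHits, ScoredDoc[] scoreDocs) {
        this.totalHits = totalHits;
        this.scoreDocs = scoreDocs;
    }

    // Treats every entry of scores as a match and keeps the n best using a size-bounded min-heap.
    static TopResults collect(double[] scores, int n) {
        Comparator<ScoredDoc> byScore = Comparator.comparingDouble(d -> d.score);
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(byScore);
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(new ScoredDoc(doc, scores[doc]));
            if (heap.size() > n) {
                heap.poll(); // evict the lowest score seen so far
            }
        }
        // The heap yields the lowest score first, so fill the array from the back.
        ScoredDoc[] best = new ScoredDoc[heap.size()];
        for (int i = best.length - 1; i >= 0; i--) {
            best[i] = heap.poll();
        }
        return new TopResults(scores.length, best);
    }
}
```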
With keyword = "达摩院" (DAMO Academy):
Output:
Query = blog:达 blog:摩 blog:院
Hits: 2
Content: Document<stored<ID:1510579934220> stored,indexed,tokenized<blog:10月11日杭州云栖大会上,马云表达了对新建成的阿里巴巴全球研究院—阿里巴巴达摩院的愿景,希望达摩院二十年内成为世界第一大经济体,服务世界二十亿人,创造一亿个工作岗位。>>
Content: Document<stored<ID:1510579934765> stored,indexed,tokenized<blog:达摩院一定也必须要超越英特尔,必须超越微软,必须超越IBM,因为我们生于二十一世纪,我们是有机会后发优势的。>>
With keyword = "阿里巴巴达摩院" (Alibaba DAMO Academy):
Output:
Query = blog:阿 blog:里 blog:巴 blog:巴 blog:达 blog:摩 blog:院
Hits: 5
Content: Document<stored<ID:1510579934220> stored,indexed,tokenized<blog:10月11日杭州云栖大会上,马云表达了对新建成的阿里巴巴全球研究院—阿里巴巴达摩院的愿景,希望达摩院二十年内成为世界第一大经济体,服务世界二十亿人,创造一亿个工作岗位。>>
Content: Document<stored<ID:1510579934932> stored,indexed,tokenized<blog:阿里巴巴有很多争议,似乎无处不在,我还真想不出有什么东西是我们不做的。互联网是一种思想,是一种技术革命,不应该有界限。跨界乐趣无穷。我觉得阿里巴巴的跨界还不错>>
Content: Document<stored<ID:1510579934765> stored,indexed,tokenized<blog:达摩院一定也必须要超越英特尔,必须超越微软,必须超越IBM,因为我们生于二十一世纪,我们是有机会后发优势的。>>
Content: Document<stored<ID:1510579934474> stored,indexed,tokenized<blog:中国互联网界,阿里巴巴被认为是技术实力最弱的公司。我确实不懂技术,承认不懂技术不丢人,不懂装懂才丢人。>>
Content: Document<stored<ID:1510579934606> stored,indexed,tokenized<blog:阿里巴巴未来二十年的目标是打造世界第五大经济体,不是我们狂妄,而是世界需要这么一个经济体,也一定会有这么一个经济体。>>
Because StandardAnalyzer splits Chinese text into single characters and the default operator is OR, the second query matches every document that contains any one of 阿, 里, 巴, 达, 摩, or 院. All five documents mention 阿里巴巴 or 达摩院, so all five are returned.
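The printed queries (blog:达 blog:摩 blog:院) show that StandardAnalyzer turns each Chinese character into its own token. The sketch below is a simplified, hypothetical re-creation in plain Java, not Lucene code: it emits one token per Han character and lower-cased runs of other letters or digits (the real StandardTokenizer follows the Unicode word-break rules of UAX #29, which this sketch does not implement), then matches documents with OR or AND semantics, analogous to `setDefaultOperator`. By our reading of the five sample texts, OR matching in this sketch gives the same 2 and 5 hits as above, while AND would leave only the first document for 阿里巴巴达摩院.

```java
import java.util.*;

class CjkUnigramDemo {
    public static void main(String[] args) {
        // The Step 3 keyword (DAMO Academy) written with Unicode escapes, followed by Latin text.
        List<String> tokens = CjkUnigramSearch.tokenize("\u8fbe\u6469\u9662 IBM");
        System.out.println(tokens.size()); // 4: three single-character tokens plus "ibm"
    }
}

// Hypothetical, simplified analyzer + matcher; NOT Lucene's StandardAnalyzer.
class CjkUnigramSearch {
    private final List<Set<String>> docTerms = new ArrayList<>();

    // One token per Han character; other letters/digits are grouped into lower-cased words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // spaces and punctuation end the current word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // Indexes a document's distinct tokens and returns its document number.
    int add(String text) {
        docTerms.add(new HashSet<>(tokenize(text)));
        return docTerms.size() - 1;
    }

    // requireAll=false acts like OR (any token matches); true acts like AND (all tokens must match).
    List<Integer> search(String keyword, boolean requireAll) {
        Set<String> queryTerms = new LinkedHashSet<>(tokenize(keyword));
        List<Integer> hits = new ArrayList<>();
        if (queryTerms.isEmpty()) {
            return hits;
        }
        for (int doc = 0; doc < docTerms.size(); doc++) {
            Set<String> terms = docTerms.get(doc);
            boolean match = requireAll
                    ? terms.containsAll(queryTerms)
                    : queryTerms.stream().anyMatch(terms::contains);
            if (match) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```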
Step 4: Update an indexed document
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import java.io.File;
import java.io.IOException;
public class LuceneUpdateDemo {
    public static void main(String[] args) {
        // Instantiate the standard analyzer (StandardAnalyzer)
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        Directory directory = null;
        IndexWriter iwriter = null;
        try {
            // Index directory
            directory = new SimpleFSDirectory(new File("D://test/lucene_index"));
            // Configure the IndexWriterConfig
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_47, analyzer);
            iwConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
            iwriter = new IndexWriter(directory, iwConfig);
            // Build the replacement document
            Document doc = new Document();
            String id = "1510579934220";
            doc.add(new StringField("ID", id, Field.Store.YES));
            doc.add(new TextField("blog", "更新文档后->达摩院一定也必须要超越英特尔,必须超越微软,必须超越IBM,因为我们生于二十一世纪,我们是有机会后发优势的。", Field.Store.YES));
            // First delete every document containing the term ID=id, then add the new document
            iwriter.updateDocument(new Term("ID", id), doc);
            // close() commits the update
            iwriter.close();
            iwriter = null;
            System.out.println("Index updated: " + id);
        } catch (IOException e) {
            // Also covers CorruptIndexException and LockObtainFailedException, which extend IOException
            e.printStackTrace();
        } finally {
            if (iwriter != null) {
                try {
                    iwriter.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (directory != null) {
                try {
                    directory.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Then run the Step 3 search again to see the updated result.
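`updateDocument(term, doc)` deletes every document containing the exact term and then adds the new document; Lucene's Javadoc describes the delete and add as atomic from the point of view of a reader. Because the match is on an exact, un-analyzed term, it relies on ID being a StringField holding a unique value. The hypothetical single-threaded model below (plain Java, not Lucene's IndexWriter) shows those semantics, including the fact that an update whose ID matches nothing simply adds a document.

```java
import java.util.*;

class UpdateDemo {
    public static void main(String[] args) {
        ToyDocStore store = new ToyDocStore();
        store.add(Map.of("ID", "1510579934220", "blog", "old text"));
        // Replace the document whose ID term is exactly "1510579934220".
        store.updateDocument("ID", "1510579934220", Map.of("ID", "1510579934220", "blog", "updated text"));
        System.out.println(store.size()); // 1
        System.out.println(store.find("ID", "1510579934220").get(0).get("blog")); // updated text
    }
}

// Hypothetical model of update-by-term semantics; not Lucene's API.
class ToyDocStore {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    // Removes every document whose field value equals the term value exactly; returns how many.
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // Mirrors the shape of updateDocument(term, doc): delete all exact matches, then add.
    // If nothing matches, this degenerates to a plain add.
    void updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value);
        add(newDoc);
    }

    // Returns all documents whose field value equals the given value exactly.
    List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                out.add(d);
            }
        }
        return out;
    }

    int size() {
        return docs.size();
    }
}
```

One consequence worth checking in the Step 2 code: IDs there come from System.currentTimeMillis(), so two documents indexed within the same millisecond would share an ID, and a single update or delete by that ID would affect both.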
Step 5: Delete from the index
package top.yuyufeng.learn.lucene.demo1;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import java.io.File;
import java.io.IOException;
/**
 * @author yuyufeng
 * @date 2017/11/21
 */
public class LuceneDeleteDemo {
    public static void main(String[] args) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        Directory directory = null;
        IndexWriter iwriter = null;
        try {
            // Index directory
            directory = new SimpleFSDirectory(new File("D://test/lucene_index"));
            // Configure the IndexWriterConfig
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_47, analyzer);
            iwConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
            iwriter = new IndexWriter(directory, iwConfig);
            // Delete every document whose ID term equals this value
            iwriter.deleteDocuments(new Term("ID", "1511235710648"));
            // IndexWriter does not delete documents immediately: the delete is buffered and only
            // takes effect when IndexWriter.commit() or IndexWriter.close() is called
            iwriter.commit();
            iwriter.close();
            iwriter = null;
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (iwriter != null) {
                try {
                    iwriter.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (directory != null) {
                try {
                    directory.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
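The delete code relies on deletes being buffered until commit() or close(). The hypothetical model below (plain Java, not Lucene's API) illustrates that behavior: `deleteDocuments` only records the ID, `commit()` applies all buffered deletes at once, and a reader opened earlier keeps its point-in-time snapshot. In Lucene itself, committed deletes also only mark documents as deleted; the space is reclaimed later when segments are merged, a detail this sketch does not model.

```java
import java.util.*;

class BufferedDeleteDemo {
    public static void main(String[] args) {
        BufferedDeleteIndex index = new BufferedDeleteIndex(List.of("a", "b", "c"));
        index.deleteDocuments("b");
        System.out.println(index.openReader()); // [a, b, c]: the delete is only buffered
        index.commit();
        System.out.println(index.openReader()); // [a, c]
    }
}

// Hypothetical model of buffered deletes; not Lucene's API.
class BufferedDeleteIndex {
    private final Set<String> committed;                           // state visible to new readers
    private final List<String> pendingDeletes = new ArrayList<>(); // deletes recorded since last commit

    BufferedDeleteIndex(Collection<String> ids) {
        committed = new TreeSet<>(ids);
    }

    // Records the delete; nothing visible changes yet.
    void deleteDocuments(String id) {
        pendingDeletes.add(id);
    }

    // Applies all buffered deletes at once and clears the buffer.
    void commit() {
        committed.removeAll(pendingDeletes);
        pendingDeletes.clear();
    }

    // Returns a point-in-time snapshot, like opening a new reader after the last commit.
    Set<String> openReader() {
        return Collections.unmodifiableSet(new TreeSet<>(committed));
    }
}
```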
IndexWriter provides the following delete methods:

Method | Description |
---|---|
deleteDocuments(Query query) | Deletes every Document matching the query (one or more) |
deleteDocuments(Query... queries) | Deletes every Document matching any of the queries |
deleteDocuments(Term term) | Deletes every Document containing the term (one or more) |
deleteDocuments(Term... terms) | Deletes every Document containing any of the terms |
deleteAll() | Deletes all Documents in the index |
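To contrast the three kinds of delete in the table, here is a hypothetical plain-Java model, not Lucene's API: a "term" is an exact (field, value) pair, a "query" is modelled as an arbitrary predicate over the document, and the varargs overloads remove documents matching any of their arguments. The class `ToyDeletableIndex` and its nested `Term` are made-up names chosen only to echo the table.

```java
import java.util.*;
import java.util.function.Predicate;

class DeleteVariantsDemo {
    public static void main(String[] args) {
        ToyDeletableIndex index = new ToyDeletableIndex();
        index.addDocument(Map.of("ID", "1", "blog", "alpha"));
        index.addDocument(Map.of("ID", "2", "blog", "beta"));
        index.addDocument(Map.of("ID", "3", "blog", "alphabet"));
        index.addDocument(Map.of("ID", "4", "blog", "gamma"));
        // Term-style delete: exact match on one field value.
        index.deleteDocuments(new ToyDeletableIndex.Term("ID", "2"));
        System.out.println(index.size()); // 3
        // Query-style delete: any condition over the document.
        index.deleteDocuments(d -> d.get("blog").startsWith("alpha"));
        System.out.println(index.size()); // 1
        index.deleteAll();
        System.out.println(index.size()); // 0
    }
}

// Hypothetical model of term, query, and delete-all deletion; not Lucene's IndexWriter.
class ToyDeletableIndex {
    // Hypothetical exact (field, value) pair standing in for a Term.
    static final class Term {
        final String field;
        final String value;

        Term(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Like deleteDocuments(Term...): removes documents matching ANY of the terms exactly.
    void deleteDocuments(Term... terms) {
        for (Term t : terms) {
            docs.removeIf(d -> t.value.equals(d.get(t.field)));
        }
    }

    // Like deleteDocuments(Query...): each "query" is modelled as a predicate over the document.
    @SafeVarargs
    final void deleteDocuments(Predicate<Map<String, String>>... queries) {
        for (Predicate<Map<String, String>> q : queries) {
            docs.removeIf(q);
        }
    }

    // Like deleteAll(): removes every document.
    void deleteAll() {
        docs.clear();
    }

    int size() {
        return docs.size();
    }
}
```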