Preface: this post introduces Lucene fundamentals, based on Lucene 7.0.
Lucene 7.0 package overview
Lucene 7.0 official documentation
org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a Reader into a TokenStream, an enumeration of token Attributes. A TokenStream can be composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. analyzers-common provides a number of Analyzer implementations, including StopAnalyzer and the grammar-based StandardAnalyzer.
org.apache.lucene.codecs provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending upon application needs.
org.apache.lucene.document provides a simple Document class. A Document is simply a set of named Fields, whose values may be strings or instances of Reader.
org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
org.apache.lucene.search provides data structures to represent queries (ie TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the IndexSearcher which turns queries into TopDocs. A number of QueryParsers are provided for producing query structures from strings or xml.
org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, including FSDirectory, which uses a file system directory to store files, and RAMDirectory which implements files as memory-resident data structures.
org.apache.lucene.util contains a few handy data structures and util classes, ie FixedBitSet and PriorityQueue.
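The analysis description above (a Tokenizer whose output passes through TokenFilters, all tied together by an Analyzer) can be sketched in plain Java. This is only a toy model of the idea, not Lucene code: the class `ToyAnalyzer`, its method names, and the stop-word list are all assumptions made for illustration, and stemming is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analysis chain: tokenizer -> lowercase filter -> stop-word filter. Not the Lucene API. */
final class ToyAnalyzer {
    // Hypothetical stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "is"));

    /** Tokenizer step: split the text on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Filter step 1: lowercase every token. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Filter step 2: drop tokens that appear in the stop-word list. */
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    /** "Analyzer": strings the tokenizer and the filters together. */
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)));
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, lazy, dog]
        System.out.println(analyze("The Quick Brown Fox and the Lazy Dog"));
    }
}
```

In Lucene itself the same roles are played by Tokenizer, TokenFilter, and Analyzer subclasses; the sketch only mirrors how the stages are chained.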
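Explanation
analysis: defines the abstract Analyzer API and provides some commonly used analyzers. During indexing, an analyzer's job is to split text into tokens, remove stop words, reduce words to their stems, and so on. (For a deeper look at analyzers, see Chapter 4 of "Lucene in Action", which covers Lucene's analysis process.)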
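codecs: provides an abstraction over the encoding and decoding of the inverted index structure, along with different implementations that can be chosen according to application needs.
document: provides a simple Document class. A Document is just a set of named Fields whose values may be strings or Reader instances.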
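index: provides two main classes: IndexWriter, which creates indices and adds documents to them, and IndexReader, which accesses the data in the index.
search: provides data structures that represent queries (TermQuery, PhraseQuery, BooleanQuery) and the IndexSearcher, which turns queries into TopDocs. It also provides QueryParsers that build query structures from strings or XML.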
store: defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Several implementations are provided, including FSDirectory, which stores files in a file system directory, and RAMDirectory, which implements files as memory-resident data structures.
util: contains some useful data structures and utility classes, such as FixedBitSet and PriorityQueue.
geo: geospatial utility implementations in Lucene core.
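To make the index and search descriptions above more concrete, here is a minimal plain-Java sketch of an inverted index with a single-term query and a two-term AND query. It is not the Lucene API: `ToyIndex`, `addDocument`, `termQuery`, and `andQuery` are hypothetical names, the whole index lives in memory, and nothing is scored or ranked the way IndexSearcher does when it produces TopDocs.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index: maps each term to the sorted ids of the documents containing it. */
final class ToyIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Roughly the IndexWriter role: tokenize the text and record term -> document id. */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Roughly the TermQuery role: ids of all documents containing one term. */
    SortedSet<Integer> termQuery(String term) {
        SortedSet<Integer> result = new TreeSet<>();
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits != null) {
            result.addAll(hits);
        }
        return result;
    }

    /** Roughly a BooleanQuery with two required clauses: documents containing both terms. */
    SortedSet<Integer> andQuery(String first, String second) {
        SortedSet<Integer> result = termQuery(first);
        result.retainAll(termQuery(second));
        return result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("Lucene is a search library");                   // doc 0
        index.addDocument("Directory stores index files");                 // doc 1
        index.addDocument("IndexWriter adds documents to a Lucene index"); // doc 2

        System.out.println(index.termQuery("lucene"));          // prints [0, 2]
        System.out.println(index.andQuery("lucene", "search")); // prints [0]
    }
}
```

The lookup is fast because a query only touches the posting lists of its own terms instead of scanning every document, which is the point of the inverted index structure that the codecs package encodes and decodes.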
Note: my ability and experience are limited. If anything here is wrong, please point it out and I will gladly accept the correction!