Full-Text Search with Lucene (Including Spring Boot Integration with Lucene 7.6.0)

The Lucene full-text search workflow

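① Green marks the indexing process, which builds an index from the source content to be searched. It consists of:
identify the source content to be searched → acquire the documents → create documents → analyze documents → index documents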

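② Red marks the search process, which retrieves content from the index. It consists of:
user enters a query in the search UI → create the query → execute the search against the index → render the search results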

Adding the core dependencies

Lucene core and its companion modules

<!--lucene-->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>7.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>7.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>7.6.0</version>
        </dependency>

Chinese analyzer

<dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-smartcn</artifactId>
            <version>7.6.0</version>
        </dependency>

File I/O utilities

<dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>

Source documents

Source documents are the content to be indexed and searched, such as web pages on the internet, records in a database, and files on disk.

Source documents used for testing

Field analysis

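Creating the index

The terms (tokens) produced by analyzing all documents are indexed. The purpose of indexing is search: a query looks up only the indexed terms and, through them, finds the matching Document.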
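To make "search only the indexed terms" concrete, here is a minimal plain-Java sketch of an inverted index. It is not Lucene's implementation. The class name InvertedIndexSketch and its methods are hypothetical, and the tokenizer is an assumption: it simply splits on anything that is not a letter or digit, whereas a real analyzer such as SmartChineseAnalyzer performs proper word segmentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
public class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> original text (the "stored" content)
    private final List<String> documents = new ArrayList<>();

    /** Stores the text, splits it into lowercase terms, and indexes each term. Returns the new document id. */
    public int addDocument(String text) {
        int docId = documents.size();
        documents.add(text);
        // Assumed toy tokenizer: split on runs of non-letter, non-digit characters.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Looks the term up in the postings map; no document text is scanned at search time. */
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public String getDocument(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Lucene is a search library");
        index.addDocument("Spring Boot integrates with Lucene");
        index.addDocument("Plain text file");
        for (int id : index.search("lucene")) {
            System.out.println(id + ": " + index.getDocument(id));
        }
    }
}
```

The lookup touches only the term map, which is why a search stays fast however long the individual documents are.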

The index directory

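Creating a query

Before a search runs on the keywords the user enters, a query object must be built. The query object can specify the Field to search, the query keywords, and so on, and it produces a concrete query syntax.
For example:
The syntax "fileName:lucene" searches for documents whose fileName field contains the term "lucene".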
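As a rough illustration of how a "field:term" string pairs a field name with a keyword, the plain-Java sketch below splits such a query at the first colon. The FieldQuerySketch and FieldTerm names are hypothetical, and the fallback to a default field is an assumption made for the example. Lucene's own QueryParser (in the lucene-queryparser module) additionally analyzes the text and supports boolean operators, phrases, and more, none of which is modelled here.

```java
// Hypothetical illustration class; not part of Lucene.
public class FieldQuerySketch {

    /** A field name plus the text to look for in it, similar in spirit to Lucene's Term(field, text). */
    public static final class FieldTerm {
        public final String field;
        public final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Parses "field:term"; a query without a colon is assigned to defaultField (an assumption of this sketch). */
    public static FieldTerm parse(String query, String defaultField) {
        int colon = query.indexOf(':');
        if (colon < 0) {
            return new FieldTerm(defaultField, query.trim());
        }
        return new FieldTerm(query.substring(0, colon).trim(), query.substring(colon + 1).trim());
    }

    public static void main(String[] args) {
        FieldTerm t = parse("fileName:lucene", "fileContent");
        System.out.println(t.field + " -> " + t.text); // prints "fileName -> lucene"
    }
}
```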

Code example (creating the index)

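import java.io.File;
import java.nio.file.Paths;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test; // JUnit 4; with JUnit 5 use org.junit.jupiter.api.Test

    // Create the index
    @Test
    public void luceneCreateIndex() throws Exception {
        // Directory that holds the source documents
        File sourceDir = new File("E:\\Lucene_Document");
        File[] files = sourceDir.listFiles();
        if (files == null) {
            // listFiles() returns null when the path is missing or not a directory
            throw new IllegalStateException("Not a readable directory: " + sourceDir);
        }

        // Create the analyzer (StandardAnalyzer and CJKAnalyzer are alternatives)
        SmartChineseAnalyzer smartChineseAnalyzer = new SmartChineseAnalyzer();
        // Create the IndexWriterConfig (argument: the analyzer)
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(smartChineseAnalyzer);

        // Location of the index: E:\Lucene_index
        // try-with-resources closes the writer (committing the changes) and the directory,
        // even if an exception is thrown while indexing
        try (Directory directory = FSDirectory.open(Paths.get("E:\\Lucene_index"));
             IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {
            System.out.println("pathname " + Paths.get("E:\\Lucene_index"));

            for (File f : files) {
                if (!f.isFile()) {
                    continue; // skip subdirectories
                }
                // File name
                String fileName = f.getName();
                // File content (the test files are GBK-encoded)
                String fileContent = FileUtils.readFileToString(f, "GBK");
                System.out.println(fileContent);
                // File path
                String path = f.getPath();
                // File size in bytes
                long fileSize = FileUtils.sizeOf(f);

                // Create the fields: field name, field value, whether to store the value.
                // Note: TextField tokenizes its value; for path and size a StringField
                // or StoredField is usually a better fit.
                Field fileNameField = new TextField("fileName", fileName, Field.Store.YES);
                Field fileContentField = new TextField("fileContent", fileContent, Field.Store.YES);
                Field filePathField = new TextField("filePath", path, Field.Store.YES);
                Field fileSizeField = new TextField("fileSize", String.valueOf(fileSize), Field.Store.YES);

                // Create the Document and add the fields to it
                Document document = new Document();
                document.add(fileNameField);
                document.add(fileContentField);
                document.add(filePathField);
                document.add(fileSizeField);
                // Index the document and write it to the index directory
                indexWriter.addDocument(document);
            }
        }
    }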

Code example (searching the index)

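import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test; // JUnit 4; with JUnit 5 use org.junit.jupiter.api.Test

    @Test
    public void searchIndex() throws IOException {
        // Location of the index: E:\Lucene_index
        // try-with-resources closes the reader and the directory when the search is done
        try (Directory directory = FSDirectory.open(Paths.get("E:\\Lucene_index"));
             IndexReader indexReader = DirectoryReader.open(directory)) {
            // Create the IndexSearcher
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            // Create the query. A TermQuery is not analyzed, so the text must exactly
            // match a token that the analyzer produced at index time.
            Query query = new TermQuery(new Term("fileContent", "可愛"));
            // Run the query: argument 1 is the query, argument 2 the maximum number of hits to return
            TopDocs topDocs = indexSearcher.search(query, 10);
            System.out.println("Total number of hits: " + topDocs.totalHits);
            // Iterate over the hits
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                // scoreDoc.doc is the id of the matching Document
                Document doc = indexSearcher.doc(scoreDoc.doc);
                // doc.get(name) returns the stored string value of the field
                System.out.println(doc.get("fileName"));
                System.out.println(doc.get("fileContent"));
                System.out.println(doc.get("filePath"));
                System.out.println(doc.get("fileSize"));
            }
        }
    }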

Demo

可愛女人