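Let's continue this Lucene summary series. Today's post walks through one complete business workflow built on Lucene; later posts in the series will focus mainly on Lucene optimization and internals. OK, let's begin!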
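Article structure: (1) business and technical overview; (2) business implementation (using SynonymFilterFactory for high-precision tokenized search).
I. Business and technical overview:
Below is the effect we want to achieve.
(1) Business overview: text search for products
Flow:
1. We build the product index library on the server in advance. (The index is built from the product category plus the id and name from the product table.)
2. A text search for products first queries the index library for index information, e.g. product id, name, price...
3. Once a list of product index entries comes back, we use the indexed ids to query the database for the full product details.
(2) Technical overview: text search for products
1. Build the Lucene index.
2. Query against the Lucene index that was built.
3. With the index hits in hand, use the product id stored in the index to query the database for the full product details.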
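The three steps above can be sketched without Lucene or a real database. The following is a minimal illustration only: the class name, the two in-memory maps, and the sample products are all hypothetical stand-ins (a `Map` plays the role of the Lucene index, another `Map` plays the role of the product table), not the project's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: index lookup first, database lookup second.
class TwoPhaseSearchSketch {

    // Stand-in for the Lucene index: product id -> product name (lightweight fields only).
    static final Map<String, String> INDEX = new LinkedHashMap<>();
    // Stand-in for the database: product id -> full product details.
    static final Map<String, String> DATABASE = new LinkedHashMap<>();

    static {
        INDEX.put("101", "red cotton shirt");
        INDEX.put("102", "blue denim jacket");
        INDEX.put("103", "red wool scarf");
        DATABASE.put("101", "red cotton shirt, price 59.00");
        DATABASE.put("102", "blue denim jacket, price 199.00");
        // Id 103 is deliberately absent from the database to exercise the null check.
    }

    // Phase 1: query the index and collect the ids of matching entries.
    static List<String> searchIndex(String keyword) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : INDEX.entrySet()) {
            if (entry.getValue().contains(keyword)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Phase 2: load the full details for each id, skipping ids the database no longer has.
    static List<String> loadDetails(List<String> ids) {
        List<String> details = new ArrayList<>();
        for (String id : ids) {
            String row = DATABASE.get(id);
            if (row != null) {
                details.add(row);
            }
        }
        return details;
    }

    public static void main(String[] args) {
        System.out.println(loadDetails(searchIndex("red")));
    }
}
```

The point of the split is that the index only needs the fields used for matching and display, while everything else stays in the database and is fetched by id. An index entry whose row has since been deleted is simply skipped.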
II. Business implementation
(1) Building the index:
@RunWith(SpringJUnit4ClassRunner.class) // run with the Spring test framework
@ContextConfiguration("/spring/spring-*.xml") // load the Spring configuration
public class GoodIndexAdd {
private LuceneDao luceneDao = new LuceneDao();
@Autowired
private GoodClassifyDao goodClassifyDao;
@Test
public void addIndexForAll() throws IOException {
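/**
* 8-62: the range of product category IDs in Commodity_classification.
* For each category ID, look up the products in that category and index them.
* Only two categories of fake data exist (id=15 and id=16), so only those two are indexed here.
* */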
for(int i = 15; i <= 16; i++){
System.out.println("goodClassifyDao "+goodClassifyDao);
List<GoodDetails> list = goodClassifyDao.findGoodDetailsByClassifyID(i);
System.out.println("junitTest:list.size()="+list.size());
for (int index = 0; index < list.size(); index++) {
luceneDao.addIndex(list.get(index));
System.out.println(list.get(index).toString());
}
}
}
}
The join query behind it:
<!-- Query the products that belong to a category by category ID; currently used for building the index -->
<select id="findGoodDetailsByClassifyID"
parameterType="integer" resultType="com.fuzhu.entity.GoodDetails">
select
d.Good_ID ,
d.Classify_ID,
d.Good_Name
from
Commodity_classification c,
Commodity_list d
where
c.Classify_ID=#{value} and d.Classify_ID=c.Classify_ID
</select>
(2) Controller layer:
// Text search
@RequestMapping(value = "/findGoodByName",produces="text/html;charset=UTF-8", method = RequestMethod.GET)
public Object findGoodByName(String goodName, HttpServletResponse response)
throws Exception {
response.setHeader("Access-Control-Allow-Origin", "*");// allow cross-origin requests
System.out.println("product name parameter: " + goodName);
System.out.println("-------------------------------");
List<GoodDetails> goodDetailsList = goodService.findIndex(goodName, 0,
2);// 100
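// findIndex returns null when the search fails, so guard before calling size()
System.out.println("goodDetailsList=" + (goodDetailsList == null ? 0 : goodDetailsList.size()));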
String realGoodid = null;
GoodDetails goodAllDetails = new GoodDetails();
// Keep the result list local so concurrent requests do not share state
List<GoodDetails> goodList = new ArrayList<GoodDetails>();
if (goodDetailsList != null && goodDetailsList.size() > 0) {
long start = System.nanoTime();
for (int index = 0; index < goodDetailsList.size(); index++) {
realGoodid = goodDetailsList.get(index).getGoodId();
goodAllDetails = goodService.findGoodAllDetailsById(realGoodid);
if (goodAllDetails == null) {
System.out.println("realGoodid=" + realGoodid);
}
if (goodAllDetails != null) {
goodAllDetails.setGoodName(goodDetailsList.get(index)
.getGoodName() + realGoodid);
goodList.add(goodAllDetails);
}
}
long time = System.nanoTime() - start;
System.out.println("elapsed time (ns): "+time);
}
System.out.println("current time: " + new Date());
if (goodList != null) {
System.out.println("number of products found by name: " + goodList.size());
}
return JSON.toJSONString(goodList);
}
(3) Service layer: searching the index
@Autowired
private LuceneDao luceneDao;// managed by Spring
@Override
public List<GoodDetails> findIndex(String keyword, int start, int row) {
// LuceneDao luceneDao = new LuceneDao();// now managed by Spring instead
System.out.print("luceneDao "+luceneDao);
List<GoodDetails> goodDetailsList;
try {
goodDetailsList = luceneDao.findIndex(keyword, start, row);
return goodDetailsList;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
(4) Service layer: loading product details by the id found in the index:
@Override
public GoodDetails findGoodAllDetailsById(String goodId) {
GoodDetails goodDetails = goodDetailsDao.findGoodDetailsById(goodId);
return goodDetails;
}
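(5) LuceneDao: details of searching the index library:
/*
* Pagination: 10 per page in the original setup; the caller passes the offset (start) and the page size (rows)
* */
public List<GoodDetails> findIndex(String keywords, int start, int rows) throws Exception {
Directory directory = FSDirectory.open(new File(Constant.INDEXURL_ALL));// the index is stored on disk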
IndexSearcher indexSearcher = LuceneUtils.getIndexSearcherOfSP();
/** Synonym handling */
// String result = SynonymAnalyzerUtil.displayTokens(SynonymAnalyzerUtil.convertSynonym(SynonymAnalyzerUtil.analyzeChinese(keywords, true)));
// Analyzer analyzer4 = new IKAnalyzer(false);// basic tokenization only
// TokenStream tokenstream = analyzer4.tokenStream("goodname", new StringReader(keyword));
String result = keywords;// search with the raw keywords, without extra tokenization
// Fields to search against
String fields[] = {"goodName"};
// Query parser
QueryParser queryParser = new MultiFieldQueryParser(LuceneUtils.getMatchVersion(), fields, LuceneUtils.getAnalyzer());
// Different query rules produce different Query subclasses...
// e.g. title:keywords content:keywords
Query query = queryParser.parse(result);
// The search runs against the index directory, not the database
// Run the query and return the top N hits
TopDocs topDocs = indexSearcher.search(query, start+rows);
System.out.println("total hits="+topDocs.totalHits);
ScoreDoc scoreDoc[] = topDocs.scoreDocs;
/** Highlighting setup: begin */
// Formatter for highlighting on the HTML page; the default is <b></b>, i.e. bold
Formatter formatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");
Scorer scorer = new QueryScorer(query);
Highlighter highlighter = new Highlighter(formatter, scorer);
// Fragmenter for the highlighted summary; SimpleFragmenter() uses its default fragment size (100 characters)
//int fragmentSize = 10;
Fragmenter fragmenter = new SimpleFragmenter();
highlighter.setTextFragmenter(fragmenter);
/** Highlighting setup: end */
List<GoodDetails> goodDetailslist = new ArrayList<GoodDetails>();
// Guard against running past the end of the array
int endResult = Math.min(scoreDoc.length, start+rows);
GoodDetails goodDetails = null;
for(int i = start;i < endResult ;i++ ){
goodDetails = new GoodDetails();
// docID: the index holds many documents, and Lucene assigns each one an auto-incrementing number that identifies it
int docID = scoreDoc[i].doc;
System.out.println("docID="+docID);
Document document = indexSearcher.doc(docID);
/** Fetch the highlighted text: begin */
System.out.println("==========================");
TokenStream tokenStream = LuceneUtils.getAnalyzer().tokenStream("goodName", new StringReader(document.get("goodName")));
String goodName = highlighter.getBestFragment(tokenStream, document.get("goodName"));
// getBestFragment returns null when no query term matches the text; fall back to the stored name
if (goodName == null) {
goodName = document.get("goodName");
}
System.out.println("goodName="+goodName);
System.out.println("==========================");
/** Fetch the highlighted text: end */
// Note: document.get("id") returns a String
goodDetails.setGoodId((document.get("id")));
goodDetails.setGoodName(goodName);
goodDetailslist.add(goodDetails);
}
return goodDetailslist;
}
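The paging logic in `findIndex` (ask for the top `start + rows` hits, then keep only the positions from `start` up to `Math.min(length, start + rows)`) can be isolated into a small helper. This is a sketch under my own assumptions: the class and method names are hypothetical, and a plain `List` stands in for the `ScoreDoc[]` array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the offset/limit window applied to ranked hits.
class PageWindowSketch {

    // rankedHits plays the role of topDocs.scoreDocs: the best (start + rows) hits, in rank order.
    static <T> List<T> page(List<T> rankedHits, int start, int rows) {
        if (start < 0 || rows < 0) {
            throw new IllegalArgumentException("start and rows must be non-negative");
        }
        // Same guard as in findIndex: never read past the hits that actually came back.
        int end = Math.min(rankedHits.size(), start + rows);
        List<T> page = new ArrayList<>();
        for (int i = start; i < end; i++) {
            page.add(rankedHits.get(i));
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(hits, 2, 2)); // second page with a page size of 2
        System.out.println(page(hits, 4, 3)); // last, partially filled page
        System.out.println(page(hits, 10, 2)); // offset past the end gives an empty page
    }
}
```

One consequence of this approach is that deeper pages get more expensive, because the search always has to rank `start + rows` hits before the first `start` are discarded.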
(6) Improving search precision with Chinese word segmentation:
public class SynonymAnalyzerUtil {
/**
*
* Splits Chinese text into words.
*/
public static String analyzeChinese(String input, boolean useSmart) throws IOException {
StringBuilder sb = new StringBuilder();
StringReader reader = new StringReader(input.trim());
// true: smart segmentation; false: fine-grained segmentation
IKSegmenter ikSeg = new IKSegmenter(reader, useSmart);
for (Lexeme lexeme = ikSeg.next(); lexeme != null; lexeme = ikSeg.next()) {
sb.append(lexeme.getLexemeText()).append(" ");
}
return sb.toString();
}
/**
*
* Matches synonyms for the words produced by the method above and returns a TokenStream.
* synonyms.txt: the synonym table, located in the resources directory
*/
public static TokenStream convertSynonym(String input) throws IOException{
Version ver = Version.LUCENE_44;
Map<String, String> filterArgs = new HashMap<String, String>();
filterArgs.put("luceneMatchVersion", ver.toString());
filterArgs.put("synonyms", "synonyms.txt");
filterArgs.put("expand", "true");
SynonymFilterFactory factory = new SynonymFilterFactory(filterArgs);
factory.inform(new FilesystemResourceLoader());
Analyzer ikAnalyzer = new IKAnalyzer();
TokenStream ts = factory.create(ikAnalyzer.tokenStream("someField", input));
return ts;
}
/**
*
* Joins the tokens of the TokenStream into a space-separated string, which is then used as the query text for a more precise search.
*/
public static String displayTokens(TokenStream ts) throws IOException
{
StringBuilder sb = new StringBuilder();
CharTermAttribute termAttr = ts.addAttribute(CharTermAttribute.class);
ts.reset();
while (ts.incrementToken())
{
String token = termAttr.toString();
sb.append(token).append(" ");
System.out.print(token+"|");
}
System.out.println();
ts.end();
ts.close();
return sb.toString();
}
}
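To make the idea behind the three helpers concrete without pulling in Lucene or IK Analyzer, here is a dependency-free sketch of synonym expansion on an already segmented query. Everything in it is hypothetical: the class name, the in-memory synonym table (standing in for synonyms.txt), and the sample words. It is also simplified in that the table is one-directional and the synonyms are just appended to the string, whereas a real synonym filter works on token positions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: expand a space-separated token string with synonyms.
class SynonymExpansionSketch {

    // Stand-in for synonyms.txt: word -> its synonyms.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();

    static {
        SYNONYMS.put("phone", Arrays.asList("mobile", "cellphone"));
        SYNONYMS.put("laptop", Arrays.asList("notebook"));
    }

    // Input: tokens separated by spaces, like the output of a segmentation step.
    // Output: the same tokens plus their synonyms, de-duplicated, in first-seen order.
    static String expand(String tokenized) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String token : tokenized.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty input yields a single empty token
            }
            expanded.add(token);
            List<String> synonyms = SYNONYMS.get(token);
            if (synonyms != null) {
                expanded.addAll(synonyms);
            }
        }
        return String.join(" ", expanded);
    }

    public static void main(String[] args) {
        System.out.println(expand("cheap phone"));
    }
}
```

The expanded string can then be handed to the query parser in place of the raw keywords, so a search for one word also matches products named with its synonyms.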
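That wraps up Lucene Summary Series (2): the text search business of a product search system (using Lucene in a project). This series outlines roughly how the Lucene business logic in the project is implemented. Some of the algorithms can't be handed out freely, but as I work out ideas of my own I'll write them up and share the experience with everyone. Feel free to point out mistakes below so we can learn together! Your likes are the best support I can get!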