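Elasticsearch: Elasticsearch is a real-time distributed search and analytics engine that lets you explore your data at a speed and scale that were not possible before. It is used for full-text search, structured search, analytics, and any combination of the three.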
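Elasticsearch can be described as:
- A distributed real-time document store in which every field can be indexed and searched
- A distributed real-time analytics search engine
- Able to scale out to hundreds of server nodes and to handle petabytes of structured or unstructured data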
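You can download Elasticsearch from the official download page, or get the latest version from GitHub (it is completely open source).
1. Extract the archive, then start Elasticsearch:
tar -xvzf elasticsearch-<version>.tar.gz
cd elasticsearch-<version>
./bin/elasticsearch
Then enter 127.0.0.1:9200 in your browser's address bar. You should see a result similar to the following: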
{
"name" : "a9W1rx3",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "rwt9_Vd4RZy6jPdJZ1N6nw",
"version" : {
"number" : "5.1.1",
"build_hash" : "5395e21",
"build_date" : "2016-12-06T12:36:15.409Z",
"build_snapshot" : false,
"lucene_version" : "6.3.0"
},
"tagline" : "You Know, for Search"
}
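The same check can be scripted instead of done in a browser. The following is a minimal Python sketch, assuming the node listens on 127.0.0.1:9200 as above. The helper names `fetch_root_info` and `version_tuple` are hypothetical and exist only for this illustration; the sample dictionary is a trimmed copy of the response shown above, so the snippet runs without a server.

```python
import json
from urllib.request import urlopen


def fetch_root_info(base_url="http://127.0.0.1:9200"):
    """GET the root endpoint and decode the JSON body (needs a running node)."""
    with urlopen(base_url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def version_tuple(info):
    """Turn the "version.number" string, e.g. "5.1.1", into a tuple of ints.

    Assumes a purely numeric version; a suffixed one would need extra parsing.
    """
    return tuple(int(part) for part in info["version"]["number"].split("."))


# Trimmed copy of the response shown above, used so that no server is needed.
sample = {
    "cluster_name": "elasticsearch",
    "version": {"number": "5.1.1", "lucene_version": "6.3.0"},
    "tagline": "You Know, for Search",
}
print(version_tuple(sample))
```

With a node running, `version_tuple(fetch_root_info())` would perform the same check against the live response.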
- Install Kibana, a web client for visualization, from its download page
1. Extract the archive, then start Kibana:
tar -xvzf kibana.version.tar.gz
cd kibana.version
./bin/kibana
Enter 127.0.0.1:5601 in your browser's address bar to open the Kibana interface, then select Dev Tools.
- Test: enter the following requests in the Dev Tools console
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
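Each request above has the shape `PUT /index/type/id` followed by a JSON document: here the index is `megacorp`, the type is `employee`, and the ID is 1, 2, or 3. As a rough illustration, the Python sketch below generates those request lines from a dictionary of documents. The helper name `index_request` is hypothetical, and the code only prints text that could be pasted into the console; it does not contact Elasticsearch.

```python
import json


def index_request(index, doc_type, doc_id, document):
    """Return the Console-style request line and JSON body for one document."""
    line = "PUT /{}/{}/{}".format(index, doc_type, doc_id)
    return line, json.dumps(document, indent=2)


# The three employee documents from the requests above, keyed by ID.
employees = {
    1: {"first_name": "John", "last_name": "Smith", "age": 25,
        "about": "I love to go rock climbing",
        "interests": ["sports", "music"]},
    2: {"first_name": "Jane", "last_name": "Smith", "age": 32,
        "about": "I like to collect rock albums",
        "interests": ["music"]},
    3: {"first_name": "Douglas", "last_name": "Fir", "age": 35,
        "about": "I like to build cabinets",
        "interests": ["forestry"]},
}

for doc_id, doc in employees.items():
    line, body = index_request("megacorp", "employee", doc_id, doc)
    print(line)
    print(body)
```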
- Retrieve a document
GET /megacorp/employee/2
- Lightweight search: the first request returns the first 10 documents, and the second returns the documents whose last_name is Smith
GET /megacorp/employee/_search
GET /megacorp/employee/_search?q=last_name:Smith
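When these lightweight searches are issued from a program rather than from the console, the `q` parameter has to be URL-encoded. The following Python sketch builds both URLs, assuming the node address 127.0.0.1:9200 used earlier; `search_url` is a hypothetical helper, not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, field=None, value=None,
               base_url="http://127.0.0.1:9200"):
    """Build a _search URL, optionally with a field:value query string."""
    url = "{}/{}/{}/_search".format(base_url, index, doc_type)
    if field is not None:
        # urlencode percent-encodes the colon; the server decodes it again.
        url += "?" + urlencode({"q": "{}:{}".format(field, value)})
    return url


print(search_url("megacorp", "employee"))
print(search_url("megacorp", "employee", "last_name", "Smith"))
```

The second URL ends in `?q=last_name%3ASmith`, which is the encoded form of `q=last_name:Smith`.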
- Search using the query DSL
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
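Because the query DSL is plain JSON, a request body like the one above can be assembled as an ordinary dictionary. The sketch below is one possible way to do that in Python. The names `match_query` and `run_search` are hypothetical; `run_search` assumes a node at 127.0.0.1:9200 and sends the body with POST, which the `_search` endpoint accepts as an alternative to GET. Only the body is built and printed here, so no server is needed to run the snippet.

```python
import json
from urllib.request import Request, urlopen


def match_query(field, text):
    """Build the body of a match query as a plain dict."""
    return {"query": {"match": {field: text}}}


def run_search(body, url="http://127.0.0.1:9200/megacorp/employee/_search"):
    """POST the body to _search and decode the reply (needs a running node)."""
    request = Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


body = match_query("last_name", "Smith")
print(json.dumps(body, indent=2))
```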
- A more complex query: match last_name Smith and keep only employees older than 30
GET /megacorp/employee/_search
{
"query":{
"bool":{
"must":{
"match":{
"last_name": "Smith"
}
},
"filter":{
"range":{
"age":{
"gt": 30
}
}
}
}
}
}
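To make the intent of the `must` and `filter` clauses concrete, the Python sketch below applies the same two conditions to the three sample employees in memory. It only illustrates which documents satisfy both conditions; it is not how Elasticsearch evaluates the query, and the function names are hypothetical.

```python
employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def matches_last_name(doc, name):
    # Stand-in for the "must" match clause: case-insensitive comparison.
    return doc["last_name"].lower() == name.lower()


def age_greater_than(doc, bound):
    # Stand-in for the "filter" range clause with "gt".
    return doc["age"] > bound


result = [doc["first_name"] for doc in employees
          if matches_last_name(doc, "Smith") and age_greater_than(doc, 30)]
print(result)  # ['Jane']
```

John is excluded by the age condition and Douglas by the last name, so only Jane remains.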
- Full-text search
GET /megacorp/employee/_search
{
"query":{
"match":{
"about": "rock climbing"
}
}
}
This returns two results:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.53484553,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.53484553,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.26742277,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
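By default, results are sorted by relevance score, that is, by how well each document matches the query. The second employee's about field also mentions "rock", so that document is returned too, but the first employee's about field contains both words of "rock climbing", so it scores higher and ranks first.
The concept of relevance is very important in Elasticsearch, and it is what sets it apart from a traditional relational database, where a record either matches or does not.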
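The ranking idea can be shown with a deliberately simplified scorer. The Python sketch below counts how many distinct query words appear in each about field and sorts by that count. This is a toy stand-in chosen for illustration: the real scores, such as 0.53484553 above, come from Lucene's similarity model, which this code does not attempt to reproduce. All function names are hypothetical.

```python
import re

abouts = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_score(query, text):
    """Count the distinct query words that occur in the text."""
    words = set(tokenize(text))
    return sum(1 for term in set(tokenize(query)) if term in words)


def toy_search(query, documents):
    """Return (id, score) pairs with score > 0, best match first."""
    scored = [(doc_id, toy_score(query, text))
              for doc_id, text in documents.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(toy_search("rock climbing", abouts))  # [('1', 2), ('2', 1)]
```

Document 3 shares no words with the query and is dropped, while documents 1 and 2 come back in the same order as in the real response.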
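- Finding individual words in a field works well, but sometimes you want to match an exact sequence of words or a phrase. The match_phrase query does this; here it matches the phrase "rock climbing":
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"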
}
}
}
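The difference from a plain match can be sketched in a few lines of Python: a phrase matches only when its words appear next to each other and in the same order. This is a simplified in-memory illustration with hypothetical function names, not the way Elasticsearch implements `match_phrase`.

```python
import re

abouts = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(phrase, text):
    """True if the words of the phrase occur consecutively in the text."""
    words = tokenize(text)
    target = tokenize(phrase)
    size = len(target)
    return size > 0 and any(
        words[start:start + size] == target
        for start in range(len(words) - size + 1)
    )


matching_ids = [doc_id for doc_id, text in abouts.items()
                if contains_phrase("rock climbing", text)]
print(matching_ids)  # ['1']
```

Document 2 contains "rock" but not followed by "climbing", so only document 1 matches.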
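- Highlighted search: many applications highlight snippets of text in each search result so that users can see why the document matched the query. Adding a highlight parameter to the request enables this:
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {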
"fields" : {
"about" : {}
}
}
}
The response looks like this:
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
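Running this query returns the same hit as before, and the response now also contains a section called highlight. This section holds the snippet of text from the about field that matched, with the matching words wrapped in <em></em> HTML tags.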
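The shape of the highlighted snippet can be reproduced with a short Python sketch that wraps every word of the text that also occurs in the query. It is a simplified illustration and not Elasticsearch's highlighter; the function name `highlight` and its `pre` and `post` parameters are hypothetical.

```python
import re


def highlight(query, text, pre="<em>", post="</em>"):
    """Wrap each word of text that also occurs in the query (case-insensitive)."""
    terms = {term.lower() for term in re.findall(r"\w+", query)}

    def wrap(match):
        word = match.group(0)
        return pre + word + post if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)


print(highlight("rock climbing", "I love to go rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```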