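ElasticSearch (hereafter ES) is an open-source search engine built on Apache Lucene(TM). Lucene can be considered the most advanced, best-performing and most full-featured search engine library available to date, open source or proprietary. ES is written in Java and uses Lucene at its core for all indexing and searching. Its goal is to hide Lucene's complexity behind a simple RESTful API, which makes full-text search easy.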
1. Installation and Startup (Windows)
First, download the zip package from the official website (https://www.elastic.co/downloads/elasticsearch#ga-release) and unzip it. Run elasticsearch.bat in the bin directory to start Elasticsearch. Then open http://localhost:9200/?pretty in a browser. You should see a JSON document like the one below, which shows the ES version and other information.
{
"name": "x62D3ht",
"cluster_name": "elasticsearch",
"cluster_uuid": "yDPE_WTBQE6Hp5ZBydgjSw",
"version": {
"number": "5.6.2",
"build_hash": "57e20f3",
"build_date": "2017-09-23T13:16:45.703Z",
"build_snapshot": false,
"lucene_version": "6.6.1"
},
"tagline": "You Know, for Search"
}
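The same check can be scripted instead of done in a browser. Below is a minimal Python sketch using only the standard library. The helper names `fetch_root_info` and `summarize_root_info` are hypothetical, and it assumes a node listening on localhost:9200 as above. Only the parsing function is exercised in the usage lines, so they run without a live cluster.

```python
import json
import urllib.request


def fetch_root_info(base_url="http://localhost:9200"):
    """GET the root endpoint of a local ES node and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/") as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_root_info(info):
    """Pick out the cluster name and version fields shown in the sample response."""
    version = info["version"]
    return {
        "cluster": info["cluster_name"],
        "es_version": version["number"],
        "lucene_version": version["lucene_version"],
    }


# Usage with (part of) the sample response from this article; no network needed.
sample = json.loads(
    '{"name": "x62D3ht", "cluster_name": "elasticsearch", '
    '"version": {"number": "5.6.2", "lucene_version": "6.6.1"}, '
    '"tagline": "You Know, for Search"}'
)
print(summarize_root_info(sample))
# {'cluster': 'elasticsearch', 'es_version': '5.6.2', 'lucene_version': '6.6.1'}
```

Against a running node, `summarize_root_info(fetch_root_info())` would report the live values instead.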
2. Indexing and Querying
In Elasticsearch, the act of storing data is called indexing. Before we index anything, we need to decide where the data should be stored. In Elasticsearch, a document belongs to a type, and types live inside an index. ES can be compared with a traditional relational database as follows:
Relational database | ES | Meaning |
---|---|---|
Databases | Indices | Database |
Tables | Types | Table |
Rows | Documents | Record |
Columns | Fields | Field |
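An Elasticsearch cluster can contain multiple indices (databases). Each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns). This model matches ES 5.x, the version used here. Mapping types were deprecated in 6.x and removed in 7.x.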
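The word "index" has several distinct meanings in ES:
- Index (noun): As described above, an index is like a database in a traditional relational database. It is the place where related documents are stored. The plural of index is indices or indexes.
- Index (verb): "Indexing a document" means storing a document in an index (noun) so that it can be retrieved or queried. This is much like the SQL INSERT keyword, except that if the document already exists, the new document replaces the old one.
- Inverted index: A traditional database adds an index, such as a B-Tree index, to specific columns to speed up retrieval. Elasticsearch and Lucene use a data structure called an inverted index for the same purpose.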
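The following toy Python sketch illustrates the idea behind an inverted index. It maps each term to the ids of the documents that contain it, so a lookup goes from word to documents instead of scanning every row. Several parts are assumptions for illustration:

- The tokenizer is a crude stand-in for Lucene's configurable analyzers: it lowercases and splits on non-letters.
- The helper names are hypothetical.
- `search_all` uses AND semantics chosen for simplicity.

It does not reflect how Lucene actually stores its index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Simplified analyzer: lowercase, then split on runs of non-letters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# The "about" fields of the sample employees used later in this article.
docs = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}
inv = build_inverted_index(docs)
print(inv["rock"])                   # ['1', '2']
print(search_all(inv, "like rock"))  # ['2']
```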
Indexing
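Next, we will build an employee directory, then index and search it. Requests can be sent with Postman. To create the employee directory, we need to do the following:
- Index one document per employee; each document contains all the information about that employee.
- Give each document the type employee.
- Place the employee type in the index megacorp.
- Store the megacorp index in the Elasticsearch cluster.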
All of this can be done with a single command.
In Postman, send a PUT request to localhost:9200/megacorp/employee/1
with the following JSON body:
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
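The same PUT can be issued from code. Below is a minimal Python sketch with the standard library. `make_index_request` is a hypothetical helper that only builds the request, so it can be inspected without a running node. The `/{index}/{type}/{id}` path follows the ES 5.x layout used in this article. The explicit Content-Type header is included because Elasticsearch 6.0 and later require it.

```python
import json
import urllib.request


def make_index_request(index, doc_type, doc_id, document,
                       base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /{index}/{type}/{id} request."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


john = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
req = make_index_request("megacorp", "employee", 1, john)
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/megacorp/employee/1

# To actually send it to a running node:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```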
Sending this request adds one employee record to ES. Sending a GET request to localhost:9200/megacorp/employee/1 in Postman then retrieves that record. The response is:
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
}
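The response wraps the stored JSON in `_source`, next to metadata such as `_index`, `_id` and `_version`. Re-indexing the same id replaces the document and increments `_version`. When the id does not exist, ES answers with `"found": false`. The small Python sketch below (the helper name `source_or_none` is hypothetical) extracts the stored document from such a response.

```python
import json


def source_or_none(get_response):
    """Return the stored document from a GET /{index}/{type}/{id} response,
    or None when ES reports found: false."""
    if not get_response.get("found", False):
        return None
    return get_response["_source"]


response = json.loads("""{
  "_index": "megacorp", "_type": "employee", "_id": "1",
  "_version": 1, "found": true,
  "_source": {"first_name": "John", "last_name": "Smith", "age": 25,
              "about": "I love to go rock climbing",
              "interests": ["sports", "music"]}
}""")
doc = source_or_none(response)
print(doc["first_name"], doc["interests"])  # John ['sports', 'music']
print(source_or_none({"_id": "99", "found": False}))  # None
```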
Next, let's add more employees to the directory.
Send a PUT request to localhost:9200/megacorp/employee/2 with the following body to index the second employee document:
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
Send a PUT request to localhost:9200/megacorp/employee/3 with the following body to index the third employee document:
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
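Sending one request per document is fine for a few employees. For larger batches, ES also provides a `_bulk` endpoint that accepts newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The sketch below only builds that body. The helper name `build_bulk_body` is hypothetical. The `_type` key in the action line matches the typed ES 5.x setup used here.

```python
import json


def build_bulk_body(index, doc_type, docs_by_id):
    """Build an NDJSON body for POST /_bulk: an action line, then the document."""
    lines = []
    for doc_id, doc in docs_by_id.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


employees = {
    2: {"first_name": "Jane", "last_name": "Smith", "age": 32,
        "about": "I like to collect rock albums", "interests": ["music"]},
    3: {"first_name": "Douglas", "last_name": "Fir", "age": 35,
        "about": "I like to build cabinets", "interests": ["forestry"]},
}
body = build_bulk_body("megacorp", "employee", employees)
print(body)
# Send it to http://localhost:9200/_bulk with Content-Type: application/x-ndjson
```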
Searching
We have now stored three employee records. To search for all employees, use the following request.
Send a GET request to localhost:9200/megacorp/employee/_search
The response is:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 1,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "3",
"_score": 1,
"_source": {
"first_name": "Douglas",
"last_name": "Fir",
"age": 35,
"about": "I like to build cabinets",
"interests": [
"forestry"
]
}
}
]
}
}
Note that we used _search in place of a document id. The hits array in the response contains all three documents. By default, a search returns the first 10 results.
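In the response, `hits.total` is the number of matching documents (a plain number in ES 5.x, as shown), while `hits.hits` holds only the current page. The page can be changed with the standard `size` and `from` search parameters. The Python sketch below builds such a URL and pulls ids and names out of a response. The helper names and the trimmed sample response are illustrative assumptions.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, size=10, offset=0, base_url="http://localhost:9200"):
    """Build a _search URL; size/from control paging (ES defaults: 10 and 0)."""
    params = urlencode({"size": size, "from": offset})
    return f"{base_url}/{index}/{doc_type}/_search?{params}"


def summarize_hits(response):
    """Return (total, [(id, full name), ...]) from a search response."""
    hits = response["hits"]
    names = [
        (h["_id"], h["_source"]["first_name"] + " " + h["_source"]["last_name"])
        for h in hits["hits"]
    ]
    return hits["total"], names


# Trimmed version of the response shown above.
sample = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "2", "_source": {"first_name": "Jane", "last_name": "Smith"}},
            {"_id": "1", "_source": {"first_name": "John", "last_name": "Smith"}},
            {"_id": "3", "_source": {"first_name": "Douglas", "last_name": "Fir"}},
        ],
    }
}
print(search_url("megacorp", "employee", size=2))
# http://localhost:9200/megacorp/employee/_search?size=2&from=0
print(summarize_hits(sample))
```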
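Query-string search
A query-string search passes the query in the URL, like any other URL parameter. For example, to find documents whose last_name is "Smith", send a GET request to localhost:9200/megacorp/employee/_search?q=last_name:Smith
The result is: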
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.2876821,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.2876821,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.2876821,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
}
]
}
}
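Query strings that contain spaces or special characters must be URL-encoded before they are put into the URL. Python's `urlencode` handles this. The helper name `query_string_url` below is hypothetical. The second example uses the query-string syntax `AND` and `age:>30`, assumed here as an illustration of a combined condition.

```python
from urllib.parse import urlencode


def query_string_url(index, doc_type, q, base_url="http://localhost:9200"):
    """Build a URI-search URL with the q parameter percent-encoded."""
    return f"{base_url}/{index}/{doc_type}/_search?" + urlencode({"q": q})


print(query_string_url("megacorp", "employee", "last_name:Smith"))
# http://localhost:9200/megacorp/employee/_search?q=last_name%3ASmith
print(query_string_url("megacorp", "employee", "last_name:Smith AND age:>30"))
# http://localhost:9200/megacorp/employee/_search?q=last_name%3ASmith+AND+age%3A%3E30
```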
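Query DSL
Query strings are handy for ad hoc searches from the command line, but they have limitations. ES provides a more powerful query language, the query DSL, which sends the query as a JSON request body. The query above can be written as follows.
Send a POST request to localhost:9200/megacorp/employee/_search with the body: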
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
The result is the same as for the query-string search above.
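Because a DSL query is just JSON, it is easy to build in code. The sketch below builds the `match` body and a POST request for it without sending anything. The helper names are hypothetical. `match` runs the search text through the field's analyzer, so under the default dynamic mapping a lowercase "smith" would find the same documents.

```python
import json
import urllib.request


def match_query(field, text):
    """Query DSL body for a full-text match on a single field."""
    return {"query": {"match": {field: text}}}


def make_search_request(index, doc_type, body, base_url="http://localhost:9200"):
    """Build (not send) a POST _search request carrying a JSON query body."""
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


body = match_query("last_name", "Smith")
req = make_search_request("megacorp", "employee", body)
print(req.get_method(), req.full_url)
# POST http://localhost:9200/megacorp/employee/_search
print(req.data.decode("utf-8"))
# {"query": {"match": {"last_name": "Smith"}}}
```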
More complex searches
For example, to find employees whose last_name is "Smith" and who are older than 30, send a POST request to localhost:9200/megacorp/employee/_search with the following body:
{
"query": {
"bool": {
"filter": {
"range": {
"age": {"gt": 30}
}
},
"must": {
"match": {"last_name": "Smith"}
}
}
}
}
The response is:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.2876821,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
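In a bool query, the `must` clause contributes to the relevance score. The `filter` clause only includes or excludes documents and is not scored. This is consistent with Jane's `_score` (0.2876821) being the same as in the plain match query. The sketch below builds the same body and then evaluates it locally over the three sample employees to make the logic concrete. The local evaluation is a hypothetical approximation (a case-insensitive whole-word match on one field plus a numeric comparison). It is not how ES executes queries.

```python
def smith_older_than_query(min_age):
    """Bool query: must match last_name "Smith" (scored); filter age > min_age (unscored)."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": "Smith"}},
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


def evaluate_locally(docs, query):
    """Rough in-memory approximation of this particular bool query."""
    bool_q = query["query"]["bool"]
    field, text = next(iter(bool_q["must"]["match"].items()))
    range_field, bounds = next(iter(bool_q["filter"]["range"].items()))
    matches = []
    for doc_id, doc in docs.items():
        words = doc[field].lower().split()
        if text.lower() in words and doc[range_field] > bounds["gt"]:
            matches.append(doc_id)
    return matches


employees = {
    "1": {"first_name": "John", "last_name": "Smith", "age": 25},
    "2": {"first_name": "Jane", "last_name": "Smith", "age": 32},
    "3": {"first_name": "Douglas", "last_name": "Fir", "age": 35},
}
print(evaluate_locally(employees, smith_older_than_query(30)))  # ['2']
```

Only Jane matches: John is too young, and Douglas is old enough but is not a Smith, which agrees with the single hit in the response above.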