Getting Started with Elasticsearch

Elasticsearch Notes

Preface

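Elasticsearch is an open-source search engine built on top of Apache Lucene, a full-text search engine library. Lucene is arguably the most advanced, high-performance, fully featured search engine library available today, whether open source or proprietary.

But Lucene is only a library. To make full use of it, you need to work in Java and integrate Lucene directly into your application. Worse, you may need a degree in information retrieval to understand how it works. Lucene is very complex.

Elasticsearch is also written in Java and uses Lucene internally for indexing and searching, but its goal is to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API.

However, Elasticsearch is more than just Lucene, and more than just a full-text search engine. It can be accurately described as:

A distributed real-time document store in which every field can be indexed and searched
A distributed real-time analytics search engine
Capable of scaling to hundreds of server nodes and handling petabytes of structured or unstructured data

Elasticsearch packages all of this functionality into a single service that you talk to through its simple RESTful API, using a web client written in your favorite programming language or even the command line.

Getting started with Elasticsearch is easy. It ships with sensible defaults and hides complicated search theory from beginners. It works out of the box, and with minimal understanding you can become productive quickly.

This article is based on Elasticsearch 2.2 and draws on https://www.elastic.co/guide/en/elasticsearch/reference/2.2/getting-started.html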

Elasticsearch Notes

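Basics

Near RealTime
Elasticsearch is a near-real-time search engine: there is a slight delay (normally about one second) between indexing a document and that document becoming searchable.
Cluster
A cluster is a collection of nodes that together hold all of the data and provide search capabilities across it.
- The default cluster name is elasticsearch
- A cluster is identified by its name, so the name must be unique
- Every cluster has at least one node
Node
A cluster is made up of nodes, and it is the nodes that store the data and serve searches. By default a node's name is a random value, which you can of course change.
- By default a node joins the cluster named elasticsearch
- There is no limit on the number of nodes in a cluster
- If no other node is running on the network, starting a node forms a new single-node cluster named elasticsearch
Index
An index is a collection of documents; through the index you can search, update, and delete documents.
- Index names must be all lowercase
- You can define multiple indices in a cluster
Type
A type can be thought of as a class in Java, with documents being the instances created from that class.
Document
A document is the smallest unit of data that can be indexed and searched.
- Documents are expressed entirely in JSON
- A document must be assigned a type
Shards
Shards let an index grow beyond what a single node can hold. They exist because a single node may not be able to hold all the data of an index.
Replicas
Because sharding spreads data across different physical nodes, a query may hit a shard whose node has gone down. The solution is to replicate the data.
- By default, Elasticsearch gives each index 5 primary shards and 1 replica. So in a cluster with only two nodes, A and B, if node A holds the primary data, then after replication node B holds a copy of the same data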
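
The arithmetic behind these defaults can be sketched in a few lines of Python. The helper name `shard_copies` is made up for this illustration and is not part of any Elasticsearch client; the only fact it encodes is that each replica is one extra copy of every primary shard.

```python
def shard_copies(primaries, replicas):
    """Return (primary_shards, replica_shards, total_shards) for one index.

    Hypothetical helper for illustration only.
    """
    replica_shards = primaries * replicas  # each replica copies every primary
    return primaries, replica_shards, primaries + replica_shards


# The defaults described above: 5 primary shards, 1 replica.
print(shard_copies(5, 1))  # (5, 5, 10)
```

With the defaults, a single index therefore needs room for 10 shards in total, and the 5 replica shards can only be placed once a second node is available.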

Starting Elasticsearch

Run the following in the bin directory:

./elasticsearch

Override the cluster name and node name at startup:

./elasticsearch --cluster.name fanyank_cluster --node.name fanyank_node1

Quick Start

Run a health check

curl "localhost:9200/_cat/health?v"

On success you will see:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 
1541906922 11:28:42  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%

status: green (everything is fine), yellow (all data is available and the cluster is usable, but some replica shards are not allocated), red (some data is unavailable)
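
If you want to read this output from a program instead of by eye, a minimal sketch is to split the header and the data row on whitespace. The function name `parse_cat_table` is an assumption for this example, and the approach only works when no column is empty; the sample text is the output shown above.

```python
def parse_cat_table(text):
    """Parse whitespace-separated `_cat` output (header row + data rows) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1541906922 11:28:42  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%
"""

row = parse_cat_table(sample)[0]
print(row["cluster"], row["status"], row["node.total"])  # elasticsearch green 1
```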

List all nodes

curl "localhost:9200/_cat/nodes?v"

On success you will see:

host      ip        heap.percent ram.percent  load node.role master name       
127.0.0.1 127.0.0.1            4          81 -1.00 d         *      Nightshade

List all indices

curl "localhost:9200/_cat/indices?v"

On success you will see:

health status index pri rep docs.count docs.deleted store.size pri.store.size

Only the header row is printed, which means we have not created any indices yet.

Create an index
Create an index named customer
```
curl -XPUT 'localhost:9200/customer?pretty'
curl 'localhost:9200/_cat/indices?v'
```
After running it you will see:
```
health status index    pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   customer   5   1          0            0       650b           650b
```
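This shows that the customer index has 5 primary shards and 1 replica and contains 0 documents. Its health is yellow because we currently have only one node: a replica is never allocated on the same node as its primary, so replication needs at least two nodes. Until a second node joins, the replica stays unassigned and the status stays yellow.
Create a mapping
Every type has a corresponding mapping. When the first document is inserted, ES inspects the document's fields and automatically creates a mapping for the type; as documents with new fields are inserted, the mapping grows accordingly.
Here we create a mapping by hand and strictly define the type of every field. We want the following mapping on the customer index: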
```
{
    "properties": {
         "name": {
             "type": "string",
             "index": "not_analyzed"  /* disable analysis (no tokenizing) */
         },
         "age": {
             "type": "integer"
         },
         "score": {
             "type": "double"
         },
         "create_time": {
             "type": "date",
             "format": "yy-MM-dd HH:mm:ss"   /* date format */
         }
     }
}
```
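
The notes do not show how this mapping is sent to the server. In Elasticsearch 2.x the put-mapping endpoint has the form `PUT /{index}/_mapping/{type}`, so one way to apply it is sketched below. The helper name `build_put_mapping_request`, the type name external, and the host `http://localhost:9200` are assumptions for this example; the request is only built here, not sent.

```python
import json
import urllib.request

mapping = {
    "properties": {
        "name": {"type": "string", "index": "not_analyzed"},
        "age": {"type": "integer"},
        "score": {"type": "double"},
        "create_time": {"type": "date", "format": "yy-MM-dd HH:mm:ss"},
    }
}


def build_put_mapping_request(index, doc_type, body, host="http://localhost:9200"):
    """Build (but do not send) a PUT request for the 2.x put-mapping endpoint."""
    url = "{}/{}/_mapping/{}".format(host, index, doc_type)
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, method="PUT")


request = build_put_mapping_request("customer", "external", mapping)
print(request.get_method(), request.full_url)
# Against a running node you would send it with urllib.request.urlopen(request).
```

Note that the JSON body sent to the server must not contain the `/* ... */` comments used in the listing above, since JSON has no comment syntax.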
Insert a document
A type must be specified before data can be inserted. Here we insert a document of type external with ID 1 into the customer index.
The JSON document to insert: {"name":"fanyank"}
```
curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '{"name" : "fanyank"}'
```
After running it you will see:
```
{
    "_index" : "customer",
    "_type" : "external",
    "_id" : "1",
    "_version" : 1,
    "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 0
    },
    "created" : true
}
```
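Note that when inserting, Elasticsearch does not require the index to exist: if the index is missing, Elasticsearch creates it automatically and then inserts the document. So always double-check the spelling of the index name when inserting.

We can also insert without specifying an ID, in which case Elasticsearch generates a unique ID for us. Note that the HTTP method must be POST in this case.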
```
curl -XPOST 'localhost:9200/customer/external?pretty' -d '{"name" : "fanyank"}'
```
The response looks like this:
```
{
    "_index" : "customer",
    "_type" : "external",
    "_id" : "AWcBDGmpUlH0QZg74qVp",
    "_version" : 1,
    "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 0
    },
    "created" : true
}
```
Updating data
Updating works like inserting: if the ID already exists, Elasticsearch performs an update.
Under the hood, an update in Elasticsearch deletes the existing document and indexes a new one.
1. Method 1
```
curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '{"name":"Big Bang"}'
```
2. Method 2
```
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '{"doc":{"name":"Nathan James"}}'
```
3. Method 3
  Add an age field to the document
```
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '{"doc":{"name":"Nathan James","age":20}}'
```
4. Method 4
  Use a script
```
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
    "script" : "ctx._source.age += 5"
}'
```
  Note that methods 2 and 3 are POST requests to the _update endpoint, and their request body wraps the changes in a doc field
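
To see how the methods differ, the following Python sketch simulates them on a plain dict. It is a local illustration under the assumption that a doc update merges the given fields into the stored document while a PUT replaces it entirely; `apply_partial_update` is a made-up name, not an Elasticsearch API.

```python
def apply_partial_update(source, doc):
    """Merge `doc` into a copy of `source`, recursing into nested objects."""
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged


stored = {"name": "Big Bang"}  # state after method 1 (full replacement via PUT)

# Methods 2 and 3: fields in "doc" are merged into the stored document.
after_method_3 = apply_partial_update(stored, {"name": "Nathan James", "age": 20})

# Method 4: the script "ctx._source.age += 5" modifies the stored document in place.
after_method_4 = dict(after_method_3, age=after_method_3["age"] + 5)

print(after_method_3)  # {'name': 'Nathan James', 'age': 20}
print(after_method_4)  # {'name': 'Nathan James', 'age': 25}
```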

Querying data
Fetch the document with ID 1 that we just inserted

curl 'localhost:9200/customer/external/1?pretty'

The response looks like this:

{
    "_index" : "customer",
    "_type" : "external",
    "_id" : "1",
    "_version" : 4,
    "found" : true,
    "_source" : {
        "name" : "fanyank"
    }
}

Delete an index

curl -XDELETE 'localhost:9200/customer?pretty'
curl 'localhost:9200/_cat/indices?v'

Delete a document

curl -XDELETE 'localhost:9200/customer/external/2?pretty'

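Bulk operations
The bulk API performs inserts, updates, and deletes on multiple documents in a single request.
The example below indexes one document {"name":"Alice"}, updates one document with {"name":"tom","age":16}, and deletes the document with _id 1, all in one request
```
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"index":{"_id":"3"}}
{"name":"Alice"}
{"update":{"_id":"2"}}
{"doc":{"name":"tom","age":16}}
{"delete":{"_id":"1"}}
'
```
Every line of the body must be valid JSON on its own: a single missing quote (for example after name in the update line) makes the request fail. The update action also reports an error for its item if document 2 does not exist.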
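
Because hand-written bulk bodies are easy to get wrong, one option is to generate them. The sketch below serializes each action with `json.dumps`, so quoting mistakes cannot occur; `build_bulk_body` is a hypothetical helper name, not a client API.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, payload) triples into a newline-delimited bulk body."""
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))
        if payload is not None:  # delete actions carry no payload line
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


body = build_bulk_body([
    ("index", {"_id": "3"}, {"name": "Alice"}),
    ("update", {"_id": "2"}, {"doc": {"name": "tom", "age": 16}}),
    ("delete", {"_id": "1"}, None),
])
print(body)
```

The resulting string can be sent as the request body of the `_bulk` call shown above.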

Search API in Detail

Search all documents in an index

curl 'localhost:9200/bank/_search?q=*&pretty'

or

curl 'localhost:9200/bank/_search?pretty' -d '
{
    "query" : {"match_all":{}}
}
'

Limit the number of returned results to 100 (the default is 10)

curl 'localhost:9200/bank/_search?pretty' -d '
{
    "query" : {"match_all":{}},
    "size": 100
}
'

Pagination
Return 10 results starting at offset 10

curl 'localhost:9200/bank/_search?pretty' -d '
{
    "query" : {"match_all":{}},
    "from": 10,
    "size": 10
}
'
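
Since from is a zero-based offset rather than a page number, a small helper can do the conversion. This is a sketch under the assumption of 1-based page numbers; `page_query` is a made-up name.

```python
def page_query(page, page_size=10):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # offset of the first hit to return
        "size": page_size,
    }


print(page_query(2))  # {'query': {'match_all': {}}, 'from': 10, 'size': 10}
```

Page 2 with a page size of 10 reproduces the request above.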

Sorting
Sort by balance in descending order

curl 'localhost:9200/bank/_search?pretty' -d '
{
    "query" : {"match_all":{}},
    "sort": { "balance": { "order": "desc" } }
}
'

Restricting the returned fields
Return only the account_number and balance fields

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}'

Conditional queries
Find the account whose account_id is 10010

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match": {"account_id":"10010"} }
}'

AND queries
Find accounts whose address is test and whose account_id is 10010

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { 
    "bool": {
        "must": [
            {"match": {"account_id":"10010"}},
            {"match": {"address": "test"}}
        ]
    }
 }
}'

OR queries
Find accounts whose account_id is 10010 or 10011

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { 
    "bool": {
        "should": [
            {"match": {"account_id":"10010"}},
            {"match": {"account_id": "10011"}}
        ]
    }
 }
}'

NOT queries
Find accounts whose account_id is neither 10010 nor 10011

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { 
    "bool": {
        "must_not": [
            {"match": {"account_id":"10010"}},
            {"match": {"account_id": "10011"}}
        ]
    }
 }
}'

Combined queries (AND / OR / NOT)
Find accounts whose account_id is not 10011 and whose address is test

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { 
    "bool": {
        "must_not": [
            {"match": {"account_id": "10011"}}
        ],
        "must": [
            {"match": {"address": "test"}}
        ]
    }
 }
}'
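
The bool bodies above all share one shape, so they can be assembled programmatically. The helpers `bool_query` and `match` below are hypothetical names for this sketch; they only build the request body as a Python dict.

```python
def match(field, value):
    """Build a single match clause."""
    return {"match": {field: value}}


def bool_query(must=(), should=(), must_not=()):
    """Assemble a bool query, omitting clauses that are empty."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"query": {"bool": {name: items for name, items in clauses.items() if items}}}


# account_id is not 10011 AND address is test
body = bool_query(
    must=[match("address", "test")],
    must_not=[match("account_id", "10011")],
)
print(body)
```

Building the field names in code also guards against typos such as a misspelled address key, which would otherwise silently match nothing.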

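Filter Queries

A filter narrows down the documents selected by a query; when no query is given, the filter is applied to all documents.

Filtering on a single field
When looking up an exact value, we usually do not want relevance scores to be computed; we only want documents to be included or excluded quickly. So we wrap the term filter in a constant_score query, which makes ES run it in non-scoring mode. In principle, non-scoring queries are faster than scoring ones.
1. Filter a single field with term
  Query all documents and keep the users whose name is fanyank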
```
{
    "query": {
        "constant_score": {
            "filter": {
                "term": {
                    "name": "fanyank"
                }
            }
        }
    }
}
```
2. Use terms to match any of several values of a field
  Query all documents and keep the users whose name is fanyank or jerry
```
{
    "query": {
        "constant_score": {
            "filter": {
                "terms": {
                    "name": [
                        "fanyank",
                        "jerry"
                    ]
                }
            }
        }
    }
}
```

Combining filters on multiple fields
In some scenarios we need to combine several filters. For example, how should the ES query for the following SQL be written?

select * from customer where (name = 'fanyank' and age = 10)
or (age = 2)

For this we need to filter on multiple fields. Before starting, it helps to understand the bool query, because bool provides AND, OR, and NOT logic.
The bool query has the following structure:

{
    "bool": {
        "must": [],
        "must_not": [],
        "should": []
    }
}

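Where
must is the AND clause
must_not is the NOT clause
should is the OR clause

Combining multiple term filters with AND / OR / NOT
Returning to the SQL statement above, we can write the following query (it uses the filtered query, which still works in Elasticsearch 2.2 but has been deprecated since 2.0 in favor of bool with a filter clause):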

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "should": [
                        {"term": {"age": 2}},
                        {
                            "bool": {
                                "must": [
                                    {"term": {"name": "fanyank"}},
                                    {"term": {"age": 10}}
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }
}
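
To check that this filter really mirrors the SQL, it can be evaluated locally against a few sample documents. The evaluator below is a simplified sketch, not Elasticsearch code: it supports only term and bool, and it assumes that in a filter at least one should clause has to match. The sample customers are made up.

```python
def matches(filter_, doc):
    """Evaluate a term/bool filter against a plain dict (local illustration only)."""
    if "term" in filter_:
        field, value = next(iter(filter_["term"].items()))
        return doc.get(field) == value
    if "bool" in filter_:
        clause = filter_["bool"]
        must_ok = all(matches(f, doc) for f in clause.get("must", []))
        must_not_ok = not any(matches(f, doc) for f in clause.get("must_not", []))
        should = clause.get("should", [])
        should_ok = any(matches(f, doc) for f in should) if should else True
        return must_ok and must_not_ok and should_ok
    raise ValueError("unsupported filter: {!r}".format(filter_))


# (name = 'fanyank' and age = 10) or (age = 2)
sql_filter = {
    "bool": {
        "should": [
            {"term": {"age": 2}},
            {"bool": {"must": [{"term": {"name": "fanyank"}}, {"term": {"age": 10}}]}},
        ]
    }
}

customers = [
    {"name": "fanyank", "age": 10},
    {"name": "fanyank", "age": 29},
    {"name": "jerry", "age": 2},
]
print([c for c in customers if matches(sql_filter, c)])
```

Only the first and third customers pass, which is what the SQL WHERE clause selects.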

Range queries

select * from customer where create_time between '2018-11-01' and '2018-11-30'

The corresponding ES query is:

{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "create_time": {
                        "gte": "2018-11-01",
                        "lte": "2018-11-30"
                    }
                }
            }
        }
    }
}
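
Month boundaries are easy to get wrong when typed by hand (November has only 30 days), so a range filter for a whole month can be derived from the calendar instead. A minimal sketch; `month_range_filter` is a hypothetical helper, and the `yyyy-MM-dd` string format is assumed to match the field's date format.

```python
import calendar


def month_range_filter(field, year, month):
    """Build a range filter covering one calendar month, inclusive on both ends."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return {
        "range": {
            field: {
                "gte": "{:04d}-{:02d}-01".format(year, month),
                "lte": "{:04d}-{:02d}-{:02d}".format(year, month, last_day),
            }
        }
    }


print(month_range_filter("create_time", 2018, 11))
```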

Combining range and terms

select * from customer where 
name in ('fanyank','jerry')
and create_time between '2018-11-01' and '2018-11-03'

The corresponding ES query is:

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "name": [
                                    "fanyank",
                                    "jerry"
                                ]
                            }
                        },
                        {
                            "range": {
                                "create_time": {
                                    "gte": "2018-11-01",
                                    "lte": "2018-11-03"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

Aggregations

avg
Use avg to compute the average value of a field
Compute the students' average grade

{
    "aggs": {
        "avg_grade": {
            "avg": {
                "field": "grade"
            }
        }
    }
}

The result looks like this:

{
    ...

    "aggregations": {
        "avg_grade": {
            "value": 75
        }
    }
}

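cardinality
Use cardinality to count the distinct values of a field (the count is approximate for fields with very many distinct values)
Count the students: because each student's ID is unique, the number of distinct IDs equals the number of students. Counting distinct names instead would give a wrong student count, because different students may share the same name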
```
{
    "size": 0,
    "aggs": {
        "student_count": {
            "cardinality": {
                "field": "id"
            }
        }
    }
}
```
The result looks like this:
```
{
    ...

    "aggregations": {
        "student_count": {
            "value": 100
        }
    }
}
```

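stats
Compute grade statistics (min, max, avg, count, sum). The example uses extended_stats, which additionally returns sum_of_squares, variance, std_deviation, and std_deviation_bounds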

{
    "size": 0,
    "aggs": {
        "student_grade_stats": {
            "extended_stats": {
                "field": "grade"
            }
        }
    }
}

The result looks like this:

{
    ...

    "aggregations": {
        "student_grade_stats": {
        "count": 9,
        "min": 72,
        "max": 99,
        "avg": 86,
        "sum": 774,
        "sum_of_squares": 67028,
        "variance": 51.55555555555556,
        "std_deviation": 7.180219742846005,
        "std_deviation_bounds": {
            "upper": 100.36043948569201,
            "lower": 71.63956051430799
        }
        }
    }
}
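
The extra fields in this response follow from count, sum, and sum_of_squares. The sketch below recomputes them, assuming population variance and bounds of two standard deviations around the mean, which is what the numbers in the response are consistent with. `derived_stats` is a made-up name.

```python
import math


def derived_stats(count, total, sum_of_squares, sigma=2):
    """Recompute avg, population variance, and deviation bounds from the raw sums."""
    avg = total / count
    variance = sum_of_squares / count - avg ** 2
    std = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std,
        "upper": avg + sigma * std,
        "lower": avg - sigma * std,
    }


stats = derived_stats(count=9, total=774, sum_of_squares=67028)
print(stats)
```

Up to floating-point rounding, the result matches the avg, variance, std_deviation, and std_deviation_bounds values shown above.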

terms
terms groups documents by the distinct values of a given field; each group is returned as a bucket
Group the students by gender

{
    "size": 0,
    "aggs": {
        "genders": {
            "terms": {
                "field": "gender"
            }
        }
    }
}

The result looks like this:

{
    ...

    "aggregations" : {
        "genders" : {
            "doc_count_error_upper_bound": 0, 
            "sum_other_doc_count": 0, 
            "buckets" : [ 
                {
                    "key" : "male",
                    "doc_count" : 10
                },
                {
                    "key" : "female",
                    "doc_count" : 10
                }
            ]
        }
    }
}

value_count
Count the number of students. value_count counts values without de-duplicating them, so two students with the same name are both counted
```
{
    "size": 0,
    "aggs": {
        "student_count": {
            "value_count": {
                "field": "name"
            }
        }
    }
}
```
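
The difference between value_count and cardinality comes down to whether duplicates are kept. A local Python analogy with made-up sample names (note that cardinality in Elasticsearch is an approximate count, whereas `set` is exact):

```python
names = ["fanyank", "jerry", "fanyank", "tom"]

value_count = len(names)        # every value is counted, duplicates included
cardinality = len(set(names))   # duplicates collapse into one distinct value

print(value_count, cardinality)  # 4 3
```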

Nested aggregations
Compute the average math score separately for male and female students

{
    "aggs": {
        "groupby_gender": {
            "terms": {
                "field": "gender"
            },
            "aggs": {
                "avg_math_score": {
                    "avg": {
                        "field": "math"
                    }
                }
            }
        }
    }
}

The result looks like this:

{
    ...

    "aggregations": {
    "groupby_gender": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "male",
                "doc_count": 2,
                "avg_math_score": {
                    "value": 94.9
                }
            },
            {
                "key": "female",
                "doc_count": 1,
                "avg_math_score": {
                    "value": 9.9
                }
            }
        ]
    }
}
}
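
Conceptually, a terms aggregation with a nested avg is a group-by followed by an average per group. The sketch below reproduces that logic locally; `avg_by` and the sample students are invented for this illustration.

```python
from collections import defaultdict


def avg_by(docs, group_field, value_field):
    """Group docs by one field and average another, like terms + nested avg."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    }


students = [
    {"gender": "male", "math": 90.0},
    {"gender": "male", "math": 100.0},
    {"gender": "female", "math": 80.0},
]
print(avg_by(students, "gender", "math"))
```

Each key of the result corresponds to one bucket in the response above, with its doc_count and nested average.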

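top_hits
top_hits is also an aggregation. It lets us see selected fields of the documents inside each bucket (otherwise we can only see the field being aggregated on).
A concrete use case: group students by name, compute the average score of the students who share a name, and additionally list the students in each group sorted by age in descending order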

{   
    "size": 0,
    "aggs": {
        "group-by-name": {
            "terms": {
                "field": "name"
            },
            "aggs": {
                "add-top-hits": {
                    "top_hits": {
                    "sort": [
                            {
                                "age": {
                                    "order": "desc"
                                }
                            }
                    ],
                        "_source": {
                            "include": [
                                "age"
                            ]
                        }
                    }
                }, 
                "score-avg": {
                    "avg": {
                        "field": "score"
                    }
                }
            }
        }
    }
}

The result looks like this:

{
    "took": 379,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 6,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "group-by-name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "fanyank",
                    "doc_count": 3,
                    "score-avg": {
                        "value": 7.3999999999999995
                    },
                    "add-top-hits": {
                        "hits": {
                            "total": 3,
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "customer",
                                    "_type": "external",
                                    "_id": "AWc_90rRosSO2HBCMsNS",
                                    "_score": null,
                                    "_source": {
                                        "age": 29
                                    },
                                    "sort": [
                                        29
                                    ]
                                },
                                {
                                    "_index": "customer",
                                    "_type": "external",
                                    "_id": "AWc_9xGqosSO2HBCMsNR",
                                    "_score": null,
                                    "_source": {
                                        "age": 10
                                    },
                                    "sort": [
                                        10
                                    ]
                                },
                                {
                                    "_index": "customer",
                                    "_type": "external",
                                    "_id": "AWdAA4_cosSO2HBCMsNW",
                                    "_score": null,
                                    "_source": {
                                        "age": 8
                                    },
                                    "sort": [
                                        8
                                    ]
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "jerry",
                    "doc_count": 2,
                    "score-avg": {
                        "value": 6.95
                    },
                    "add-top-hits": {
                        "hits": {
                            "total": 2,
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "customer",
                                    "_type": "external",
                                    "_id": "AWdABHJyosSO2HBCMsNX",
                                    "_score": null,
                                    "_source": {
                                        "age": 8
                                    },
                                    "sort": [
                                        8
                                    ]
                                },
                                {
                                    "_index": "customer",
                                    "_type": "external",
                                    "_id": "AWc_95cqosSO2HBCMsNT",
                                    "_score": null,
                                    "_source": {
                                        "age": 2
                                    },
                                    "sort": [
                                        2
                                    ]
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "tom",
                    "doc_count": 1,
                    "score-avg": {
                        "value": 12.8
                    },
                    "add-top-hits": {
                        "hits": {
                            "total": 1,
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "customer",
                                    "_type": "external",
                                    "_id": "AWc_9oRhosSO2HBCMsNQ",
                                    "_score": null,
                                    "_source": {
                                        "age": 28
                                    },
                                    "sort": [
                                        28
                                    ]
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
}

script
Although ES provides a wide range of statistical functions, they often fall short in complex business scenarios; in those cases we compute the statistics with our own scripts.
First, enable scripting in the config/elasticsearch.yml configuration file:

script.inline: on
script.indexed: on

Restart ES after changing the configuration.

Count the distinct values obtained by concatenating firstName and lastName

{
    "size": 0,
    "aggs": {
        "fullNameCount": {
            "cardinality": {
                "script": "doc['firstName'].value + ' ' + doc['lastName'].value"
            }
        }
    }
}

Sum the students' scores; if a student has no score, use the student's age as the score instead (not very sensible, admittedly)

{
    "size": 0,
    "aggs": {
        "group_by_name": {
            "terms": {"field": "name"},
            "aggs": {
                "score_or_age": {
                    "sum": {
                        "script": "if(doc['score'].value==0){return doc['age'].value}else{return doc['score'].value}"
                    }
                }
            }
        }
    }
}

Simplified version (using the ternary operator)

{
    "size": 0,
    "aggs": {
        "group_by_name": {
            "terms": {"field": "name"},
            "aggs": {
                "score_or_age": {
                    "sum": {
                        "script": "return (doc['score'].value==0) ? doc['age'].value : doc['score'].value"
                    }
                }
            }
        }
    }
}

Filter on string length (check that target_field exists before querying)

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "script": {
                                "script": "doc['target_field'] ? doc['target_field'].getValue().length() < 10 : 0"
                            }
                        }
                    ]
                }
            }
        }
    }
}

Data Dictionary

The search result below is used to explain the meaning of each field

{
"took": 63,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 1000,
    "max_score": 1,
    "hits": [
    {
        "_index": "bank",
        "_type": "account",
        "_id": "1",
        "_score": 1,
        "_source": {
        "account_number": 1,
        "balance": 39225,
        "firstname": "Amber",
        "lastname": "Duke",
        "age": 32,
        "gender": "M",
        "address": "880 Holmes Lane",
        "employer": "Pyrami",
        "email": "amberduke@pyrami.com",
        "city": "Brogan",
        "state": "IL"
        }
    },
    {
        "_index": "bank",
        "_type": "account",
        "_id": "6",
        "_score": 1,
        "_source": {
        "account_number": 6,
        "balance": 5686,
        "firstname": "Hattie",
        "lastname": "Bond",
        "age": 36,
        "gender": "M",
        "address": "671 Bristol Street",
        "employer": "Netagy",
        "email": "hattiebond@netagy.com",
        "city": "Dante",
        "state": "TN"
        }
    }
    ]
}
}

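Data dictionary:

took: time the query took, in milliseconds
timed_out: whether the query timed out
_shards: number of shards the query touched, with the counts of successes and failures
hits: the search results
- hits.total: total number of documents matching the query
- hits.hits: the returned results, as an array
- _score: how well the document matches the query
- max_score: the highest score among the matches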
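
As a usage illustration, the fields above can be pulled out of a parsed response like this. It is a sketch that assumes the response has already been decoded from JSON into a Python dict; `summarize` is a made-up helper, and the sample is a trimmed copy of the result shown earlier.

```python
def summarize(response):
    """Pull the commonly used pieces out of a search response dict."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": hits["total"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


response = {
    "took": 63,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": 1,
        "hits": [
            {"_index": "bank", "_type": "account", "_id": "1", "_score": 1,
             "_source": {"account_number": 1, "balance": 39225}},
            {"_index": "bank", "_type": "account", "_id": "6", "_score": 1,
             "_source": {"account_number": 6, "balance": 5686}},
        ],
    },
}
print(summarize(response))
```

Note that total (1000) counts all matching documents, while ids only covers the hits actually returned in this response.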

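Filtering Out Empty Strings and Handling Null Values in ES

First, be clear about whether you want to filter out empty strings or null values. Suppose the field we want to filter on is non_field

Empty strings
An empty string looks like this: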

{
    ...
    "non_field": ""
}

To filter out empty strings, first check the type of the non_field field in the mapping. If it is

{
    "non_field": {
        "type": "string",
        "index": "not_analyzed"  /* analysis disabled */
    }
    /* or this type */
    "non_field": {
        "type": "keyword"
    }
}

then the following query filters out empty strings directly:

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must_not": {
                        "term": {
                            "non_field": ""
                        }
                    }
                }
            }
        }
    }
}

If non_field is an analyzed string in the mapping, there are two options: change the type in the mapping, or filter with a script.
The script-based filter looks like this:

{
    "query": {
        "filtered": {
            "filter": {
                "script": {
                    "script": "_source.non_field.length() != 0"
                }
            }
        }
    }
}

A null value looks like this:

{
    ...
    "non_field": null
    /* or the field is missing entirely */
}

Filtering out null values is straightforward:

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": {
                        "exists": {"field": "non_field"}
                    }
                }
            }
        }
    }
}

Filtering out both empty strings and null values
If non_field is not analyzed, use:

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must_not": {
                        "term": {
                            "non_field": ""
                        }
                    },
                    "must": {
                        "exists": {
                            "field": "non_field"
                        }
                    }
                }
            }
        }
    }
}

If non_field is analyzed, use:

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "exists": {
                                "field": "non_field"
                            }
                        }, {
                            "script": {
                                "script": "_source.non_field.length() != 0"
                            }
                        }
                    ]
                }
            }
        }
    }
}
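
The combined condition (field exists, is not null, and is not an empty string) can be stated as a small predicate, which is handy for checking expectations against sample documents before running the query. This is a local sketch in Python, not Elasticsearch code, and `has_non_empty_value` is a made-up name.

```python
def has_non_empty_value(doc, field):
    """True when the field exists, is not null, and is not an empty string."""
    value = doc.get(field)  # missing fields and explicit nulls both give None
    return value is not None and value != ""


docs = [{"non_field": "x"}, {"non_field": ""}, {"non_field": None}, {}]
print([has_non_empty_value(d, "non_field") for d in docs])  # [True, False, False, False]
```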

Last edited: 2020.04.14 11:00:19
