Elasticsearch 7 Study Notes (Part 3)

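12. Aggregations in Depth

12.1 Two core concepts: bucket and metric

bucket: a group of documents, similar to GROUP BY in MySQL.

metric: a statistic computed over one group of documents; that is, an aggregation operation run against a single bucket, such as the average, maximum, or minimum.

12.2 Appliance store example: which TV color sells best

The examples use TV sales data from an appliance store and analyze sales counts and revenue by brand and color from several angles.

Create the index and mapping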

PUT /tvs
{
    "mappings": {
        "properties": {
            "price": {
                "type": "long"
            },
            "color": {
                "type": "keyword"
            },
            "brand": {
                "type": "keyword"
            },
            "sold_date": {
                "type": "date"
            }
        }
    }
}

Add data

POST /tvs/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2019-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "綠色", "brand" : "小米", "sold_date" : "2019-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "藍色", "brand" : "TCL", "sold_date" : "2019-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "綠色", "brand" : "TCL", "sold_date" : "2019-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "紅色", "brand" : "三星", "sold_date" : "2020-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "藍色", "brand" : "小米", "sold_date" : "2020-02-12" }

Find which TV color sells best

GET /tvs/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}

size: set to 0 so that only the aggregation results are returned, not the documents that were aggregated
aggs: fixed keyword; declares the grouping/aggregation to run
popular_colors: every aggregation needs a name; the name is arbitrary, so pick anything you like
terms: group by the values of a field
field: the field whose values define the groups

Fields in the response:

hits.hits: empty because size is 0; otherwise the documents that were aggregated would be returned here
aggregations: the aggregation results
popular_colors: the name we gave the aggregation
buckets: the buckets produced from the specified field
key: the field value that a bucket represents
doc_count: the number of documents in the bucket, which here is the number of TVs sold in that color

By default the buckets are sorted by doc_count in descending order.
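The grouping that terms performs can be mimicked in a few lines of Python. This is only a sketch of the idea, not how Elasticsearch implements it; the list `colors` is a hand-copied stand-in for the color field of the eight sample documents, and `terms_buckets` is a hypothetical helper name.

```python
from collections import Counter

# Color of each of the eight sample TVs, in bulk-request order.
colors = ["紅色", "紅色", "綠色", "藍色", "綠色", "紅色", "紅色", "藍色"]


def terms_buckets(values):
    """Group equal values and sort the groups by doc_count, descending."""
    counts = Counter(values)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


buckets = terms_buckets(colors)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

The first bucket is the best-selling color, which is the answer the request above is after.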

12.3 bucket + metric in practice: average price per TV color

GET /tvs/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": { 
            "avg_price": { 
               "avg": {
                  "field": "price" 
               }
            }
         }
      }
   }
}

12.4 Nested buckets: multi-level drill-down by color and brand

Compute the average price per color and, within each color, the average price per brand

GET /tvs/_search 
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "color_avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "brand_avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

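Note that sub-aggregations are evaluated inside each bucket of their parent aggregation, so the nesting of the request mirrors the drill-down order: first by color, then by brand within each color.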

12.5 More metrics: maximum and minimum price per TV color

More metrics

count: every bucket produced by terms already carries a doc_count, which serves as the count
avg: the average of the specified field within a bucket
max: the largest value of the specified field within a bucket
min: the smallest value of the specified field within a bucket
sum: the sum of the specified field within a bucket

GET /tvs/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "avg_price": { "avg": { "field": "price" } },
            "min_price" : { "min": { "field": "price"} }, 
            "max_price" : { "max": { "field": "price"} },
            "sum_price" : { "sum": { "field": "price" } } 
         }
      }
   }
}
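To make the bucket + metric split concrete, here is a small Python sketch that computes the same four metrics per color. It assumes the eight sample documents reduced to (color, price) pairs; `sales` and `metrics_per_bucket` are illustrative names, not anything Elasticsearch provides.

```python
from collections import defaultdict

# (color, price) pairs copied from the sample data.
sales = [
    ("紅色", 1000), ("紅色", 2000), ("綠色", 3000), ("藍色", 1500),
    ("綠色", 1200), ("紅色", 2000), ("紅色", 8000), ("藍色", 2500),
]


def metrics_per_bucket(rows):
    """Bucket rows by color, then compute avg/min/max/sum on each bucket."""
    groups = defaultdict(list)
    for color, price in rows:
        groups[color].append(price)
    return {
        color: {
            "avg_price": sum(prices) / len(prices),
            "min_price": min(prices),
            "max_price": max(prices),
            "sum_price": sum(prices),
        }
        for color, prices in groups.items()
    }


metrics = metrics_per_bucket(sales)
print(metrics["紅色"])
```

The bucketing step and the metric step are independent: any other metric could be added to the inner dictionary without touching the grouping.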

12.6 histogram in practice: sales count and revenue by price range

histogram: like terms, it creates buckets, but it takes a numeric field and groups documents by the range that the field value falls into. For example:

"histogram":{ 
  "field": "price",
  "interval": 2000
},

interval: 2000 splits the values into the ranges 0~2000, 2000~4000, 4000~6000, 6000~8000, and 8000~10000; each range includes its lower bound and excludes its upper bound

A price of 2500 falls into the 2000~4000 range, so that document goes into the 2000~4000 bucket.

terms builds buckets by putting documents with the same field value together, whereas histogram builds them by range. Once the buckets exist, any metric (avg, count, sum, max, min, and so on) can be computed on each of them.

GET /tvs/_search
{
   "size" : 0,
   "aggs":{
      "price":{
         "histogram":{ 
            "field": "price",
            "interval": 2000
         },
         "aggs":{
            "revenue": {
               "sum": { 
                 "field" : "price"
               }
             }
         }
      }
   }
}
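The bucket key of a histogram is just the value rounded down to a multiple of the interval. The sketch below applies that rule to the sample prices; `histogram` is a hypothetical helper, and unlike Elasticsearch it only returns buckets that contain documents (Elasticsearch may additionally return the empty 4000 and 6000 buckets).

```python
from collections import defaultdict

prices = [1000, 2000, 3000, 1500, 1200, 2000, 8000, 2500]


def histogram(values, interval):
    """Bucket each value under floor(value / interval) * interval."""
    buckets = defaultdict(lambda: {"doc_count": 0, "revenue": 0})
    for value in values:
        key = (value // interval) * interval
        buckets[key]["doc_count"] += 1
        buckets[key]["revenue"] += value  # the "sum" sub-aggregation
    return dict(sorted(buckets.items()))


price_buckets = histogram(prices, 2000)
for key, bucket in price_buckets.items():
    print(key, bucket)
```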

12.7 date histogram in practice: TV sales per month

histogram: buckets by a fixed numeric interval;
date histogram: buckets a date field by a date interval such as a month;

GET /tvs/_search
{
   "size" : 0,
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold_date",
            "calendar_interval": "month", 
            "format": "yyyy-MM-dd",
            "min_doc_count" : 0, 
            "extended_bounds" : { 
                "min" : "2019-01-01",
                "max" : "2020-12-31"
            }
         }
      }
   }
}

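min_doc_count: with a value of 0, a date interval such as 2019-01-01~2019-01-31 is returned even if it contains no documents at all; with a higher value such empty buckets are dropped

extended_bounds: min and max force the buckets to start at min and run through max, even where there are no documents. It does not filter anything: documents outside the bounds still get their own buckets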
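How min_doc_count = 0 and extended_bounds combine can be sketched in Python: count documents per month, then walk every month between the bounds and emit a zero where nothing was sold. This is an illustration under the assumption of monthly calendar buckets; `monthly_histogram` is a made-up helper.

```python
from collections import Counter
from datetime import date

sold_dates = [
    date(2019, 10, 28), date(2019, 11, 5), date(2019, 5, 18), date(2019, 7, 2),
    date(2019, 8, 19), date(2019, 11, 5), date(2020, 1, 1), date(2020, 2, 12),
]


def monthly_histogram(dates, bounds_min, bounds_max):
    """Count per month; bounds only widen the range, they never filter."""
    counts = Counter((d.year, d.month) for d in dates)
    first = min([(bounds_min.year, bounds_min.month)] + list(counts))
    last = max([(bounds_max.year, bounds_max.month)] + list(counts))
    result = {}
    year, month = first
    while (year, month) <= last:
        result[f"{year:04d}-{month:02d}-01"] = counts.get((year, month), 0)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


monthly = monthly_histogram(sold_dates, date(2019, 1, 1), date(2020, 12, 31))
print(len(monthly), monthly["2019-11-01"], monthly["2019-01-01"])
```

Months without sales still appear with a count of 0, which is what a chart of sales per month usually needs.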

12.8 Drill-down: revenue per brand per quarter

GET /tvs/_search
{
  "size": 0,
  "aggs": {
    "group_by_sold_date": {
      "date_histogram": {
        "field": "sold_date",
        "calendar_interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2019-01-01",
          "max": "2020-12-31"
        }
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "sum_price": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "total_sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

12.9 Search + aggregation: sales per color for a given brand

GET /tvs/_search 
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "小米"
      }
    }
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      }
    }
  }
}

By default, every aggregation in Elasticsearch runs over the documents matched by the query in the same request.

12.10 global bucket: comparing one brand against all brands

An aggregation normally runs only within the scope of the query results.

This comparison needs two results: one aggregated over the query results, and one aggregated over all documents.

GET /tvs/_search 
{
  "size": 0, 
  "query": {
    "term": {
      "brand": {
        "value": "長虹"
      }
    }
  },
  "aggs": {
    "single_brand_avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "all": {
      "global": {},
      "aggs": {
        "all_brand_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

global: the global bucket brings all documents into the aggregation scope, regardless of the query
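The difference between the two scopes is easy to see in plain Python. The sketch assumes the sample data reduced to (brand, price) pairs; `compare` is a hypothetical helper that returns both averages under the same names used in the request above.

```python
# (brand, price) pairs copied from the sample data.
sales = [
    ("長虹", 1000), ("長虹", 2000), ("小米", 3000), ("TCL", 1500),
    ("TCL", 1200), ("長虹", 2000), ("三星", 8000), ("小米", 2500),
]


def average(values):
    values = list(values)
    return sum(values) / len(values)


def compare(rows, brand):
    """Average inside the query scope versus average over everything."""
    matched = [price for b, price in rows if b == brand]  # query scope
    everything = [price for _, price in rows]             # global scope
    return {
        "single_brand_avg_price": average(matched),
        "all_brand_avg_price": average(everything),
    }


comparison = compare(sales, "長虹")
print(comparison)
```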

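12.11 Filter + aggregation: average price of TVs priced at 1200 or more

An aggregation can be scoped either by a scoring query (search + aggregation)
or by a non-scoring filter (filter + aggregation), as here: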

GET /tvs/_search 
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 1200
          }
        }
      }
    }
  },
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

12.12 bucket filter: a brand's average price over recent time windows

GET /tvs/_search 
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "長虹"
      }
    }
  },
  "aggs": {
    "recent_150d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-150d"
          }
        }
      },
      "aggs": {
        "recent_150d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "recent_140d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-140d"
          }
        }
      },
      "aggs": {
        "recent_140d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "recent_130d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-130d"
          }
        }
      },
      "aggs": {
        "recent_130d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

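bucket filter: a filter aggregation applies its own filter to the documents that feed the aggs nested under it, so each named bucket can cover a different time window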

12.13 Sorting: order colors by average price, descending

The default order is by each bucket's doc_count, descending

GET /tvs/_search 
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Specifying the sort order

GET /tvs/_search 
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "order": {
          "avg_price": "desc"
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

12.14 Sorting by the deepest metric in a color + brand drill-down

GET /tvs/_search 
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand",
            "order": {
              "avg_price": "desc"
            }
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

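12.15 Easily parallelized aggregations, the triangle trade-off, and approximate aggregations

Easily parallelized aggregation: max

Some aggregations parallelize trivially. For max, each node computes its own maximum and returns it, and the maximum of those results is the answer.

Others do not. For count(distinct), each node cannot simply deduplicate locally and have the counts added up, because the same value may exist on several nodes, and collecting every distinct value in one place is expensive when there is a lot of data.

To keep performance acceptable, Elasticsearch uses approximate aggregation: each node produces an estimate, and the estimates are combined into the final result.
The approximate result is not fully accurate, but it is fast, typically tens of times faster than a fully exact algorithm.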
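A tiny Python sketch shows why max merges cleanly across nodes while a distinct count does not. The two "shards" below are invented for illustration and simply split a few of the sample sales in two.

```python
# Two hypothetical shards, each holding part of the sales data.
shard_a_brands = ["長虹", "小米", "TCL"]
shard_b_brands = ["小米", "三星", "長虹"]
shard_a_prices = [1000, 3000, 1500]
shard_b_prices = [2500, 8000, 2000]

# max: merging the per-shard answers gives the exact global answer.
global_max = max(max(shard_a_prices), max(shard_b_prices))

# count(distinct): adding per-shard distinct counts double-counts brands
# that live on both shards; the exact answer needs the values themselves.
naive_distinct = len(set(shard_a_brands)) + len(set(shard_b_brands))
exact_distinct = len(set(shard_a_brands) | set(shard_b_brands))

print(global_max, naive_distinct, exact_distinct)
```

Shipping whole value sets between nodes is what becomes too costly at scale, which is the motivation for approximate algorithms.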

The triangle trade-off

Accuracy + real time + big data --> pick two

(1) Accuracy + real time: no big data; the data set is small, usually runs on a single machine, and anything works
(2) Accuracy + big data: Hadoop, batch processing, not real time; handles massive data with exact results, but may run for hours
(3) Big data + real time: Elasticsearch; not exact, approximate estimates, possibly with an error rate of a few percent

Approximate aggregation algorithms

With an approximate algorithm: latency around 100 ms, about 0.5% error

With a 100% exact algorithm: latency typically from 5 s to tens of seconds, or even tens of minutes or hours, with 0% error

12.16 The cardinality metric: counting distinct brands sold per month

cardinality metric: deduplicates the specified field within each bucket and returns the number of distinct values, similar to count(distinct)

GET /tvs/_search
{
  "size" : 0,
  "aggs" : {
      "months" : {
        "date_histogram": {
          "field": "sold_date",
          "calendar_interval": "month"
        },
        "aggs": {
          "distinct_brands" : {
              "cardinality" : {
                "field" : "brand"
              }
          }
        }
      }
  }
}

12.17 cardinality: tuning memory overhead, and the HLL algorithm

cardinality is an approximate count(distinct): error rate within roughly 5%, latency around 100 ms

precision_threshold trades accuracy against memory overhead

GET /tvs/_search
{
    "size" : 0,
    "aggs" : {
        "distinct_brand" : {
            "cardinality" : {
              "field" : "brand",
              "precision_threshold" : 100 
            }
        }
    }
}

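When deduplicating brand, if the number of unique values is within precision_threshold, the cardinality result is almost guaranteed to be 100% accurate.

The cardinality algorithm uses about precision_threshold * 8 bytes of memory: 100 * 8 = 800 bytes.
That is very little memory, and as long as the number of unique values really stays within the threshold the result is close to exact; with millions of unique values the error rate stays within about 5%.

The larger precision_threshold is, the more memory is used: 1000 * 8 = 8000 bytes, about 8 KB. In exchange, results stay accurate for fields with more unique values.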
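The memory arithmetic can be written down as a one-line function. It only restates the rule of thumb from the text (about 8 bytes per unit of precision_threshold); `cardinality_memory_bytes` is a hypothetical helper, not an Elasticsearch API.

```python
def cardinality_memory_bytes(precision_threshold: int) -> int:
    """Approximate memory per cardinality aggregation: threshold * 8 bytes."""
    return precision_threshold * 8


for threshold in (100, 1000):
    print(threshold, cardinality_memory_bytes(threshold), "bytes")
```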

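HyperLogLog++ (HLL) performance optimization

cardinality is built on the HLL algorithm, so its performance is HLL's performance.

HLL hashes every unique value and estimates the distinct count from the hashes, so the result carries some error.

By default the hash of every field value is computed on the fly when a cardinality request arrives. This hashing can be moved forward to index time.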
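To show how a distinct count can be estimated from hashes alone, here is a simplified HyperLogLog in Python. It is a teaching sketch, not the HyperLogLog++ variant Elasticsearch uses: it has no bias correction beyond the small-range linear-counting step, and it uses SHA-1 from hashlib as a stand-in for murmur3. The parameter `p` (number of index bits) and the sample values are assumptions.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with 2**p one-byte registers."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash
        index = h >> (64 - p)                      # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1    # position of the leftmost 1 bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)             # linear counting for small sets
    return raw


# 30000 values, but only 10000 distinct ones.
values = [f"user-{i % 10000}" for i in range(30000)]
estimate = hll_estimate(values)
print(round(estimate))
```

The memory used depends only on the number of registers, not on how many values are fed in, which is why the approach scales to very large data sets.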

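Build the hash at index time (the murmur3 field type is provided by the mapper-murmur3 plugin, which must be installed)

PUT /tvs
{
  "mappings": {
      "properties": {
        "brand": {
          "type": "keyword",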
          "fields": {
            "hash": {
              "type": "murmur3" 
            }
          }
        }
      }
    }
}

Deduplicate using the hash field

GET /tvs/_search
{
    "size" : 0,
    "aggs" : {
        "distinct_brand" : {
            "cardinality" : {
              "field" : "brand.hash",
              "precision_threshold" : 100 
            }
        }
    }
}

12.18 percentiles and website latency statistics

Requirement: a website records the latency of every request, and we need tp50, tp95, and tp99

tp50: the latency that 50% of requests stay within
tp95: the latency that 95% of requests stay within
tp99: the latency that 99% of requests stay within

Set up the mapping

PUT /website
{
    "mappings": {
      "properties": {
          "latency": {
              "type": "long"
          },
          "province": {
              "type": "keyword"
          },
          "timestamp": {
              "type": "date"
          }
      }
    }
}

Add data

POST /website/_bulk
{ "index": {}}
{ "latency" : 105, "province" : "江蘇", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 83, "province" : "江蘇", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 92, "province" : "江蘇", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 112, "province" : "江蘇", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 68, "province" : "江蘇", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 76, "province" : "江蘇", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 101, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 275, "province" : "新疆", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 166, "province" : "新疆", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 654, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 389, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 302, "province" : "新疆", "timestamp" : "2016-10-29" }

Compute the statistics

GET /website/_search 
{
  "size": 0,
  "aggs": {
    "latency_percentiles": {
      "percentiles": {
        "field": "latency",
        "percents": [
          50,
          95,
          99
        ]
      }
    },
    "latency_avg": {
      "avg": {
        "field": "latency"
      }
    }
  }
}

The 50th percentile, for example, is the latency that 50% of the requests do not exceed. The values are estimates, not exact.

GET /website/_search 
{
  "size": 0,
  "aggs": {
    "group_by_province": {
      "terms": {
        "field": "province"
      },
      "aggs": {
        "latency_percentiles": {
          "percentiles": {
            "field": "latency",
            "percents": [
              50,
              95,
              99
            ]
          }
        },
        "latency_avg": {
          "avg": {
            "field": "latency"
          }
        }
      }
    }
  }
}
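For intuition, percentiles can be computed exactly on a data set this small. The sketch below uses the nearest-rank definition on the twelve sample latencies; Elasticsearch estimates percentiles with TDigest and interpolates, so its numbers will generally differ from these. `percentile` is a hypothetical helper.

```python
latencies = [105, 83, 92, 112, 68, 76, 101, 275, 166, 654, 389, 302]


def percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least percent% at or below it."""
    ordered = sorted(values)
    rank = -(-percent * len(ordered) // 100)  # integer ceiling of percent * n / 100
    return ordered[max(rank, 1) - 1]


tp50 = percentile(latencies, 50)
tp95 = percentile(latencies, 95)
tp99 = percentile(latencies, 99)
print(tp50, tp95, tp99)
```

With only twelve samples the high percentiles collapse onto the slowest request, which is why percentile statistics need a reasonable amount of data to be meaningful.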

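12.19 percentile ranks and website latency SLA statistics

SLA: the service level you commit to providing

Suppose the latency SLA of our website is that 100% of requests complete within 200 ms; large companies commonly require this.

If a request takes more than 1 s, the incident is escalated to a class-A failure, meaning the site's performance and user experience have degraded sharply.

Requirement: what percentage of requests complete within 200 ms, and what percentage within 1000 ms; this is the percentile ranks metric

In practice percentile ranks is used even more often than percentiles.

Group by province and compute the share of requests whose latency is within 200 ms and within 1000 ms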

GET /website/_search 
{
  "size": 0,
  "aggs": {
    "group_by_province": {
      "terms": {
        "field": "province"
      },
      "aggs": {
        "latency_percentile_ranks": {
          "percentile_ranks": {
            "field": "latency",
            "values": [
              200,
              1000
            ]
          }
        }
      }
    }
  }
}
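percentile ranks answers the inverse question of percentiles: given a value, what share of the data is at or below it. A Python sketch over the sample latencies, computed exactly rather than with TDigest; `percentile_ranks` is a made-up helper.

```python
from collections import defaultdict

# (province, latency in ms) pairs copied from the sample data.
requests = [
    ("江蘇", 105), ("江蘇", 83), ("江蘇", 92), ("江蘇", 112), ("江蘇", 68), ("江蘇", 76),
    ("新疆", 101), ("新疆", 275), ("新疆", 166), ("新疆", 654), ("新疆", 389), ("新疆", 302),
]


def percentile_ranks(rows, thresholds):
    """Per province, the percentage of requests at or below each threshold."""
    groups = defaultdict(list)
    for province, latency in rows:
        groups[province].append(latency)
    return {
        province: {
            t: 100.0 * sum(v <= t for v in values) / len(values)
            for t in thresholds
        }
        for province, values in groups.items()
    }


ranks = percentile_ranks(requests, [200, 1000])
print(ranks)
```

A 200 ms SLA is met in a province only when its rank for 200 is 100%.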

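Optimizing percentiles: the TDigest algorithm summarizes the data with many nodes and estimates percentiles from them. It is approximate, with some error; the more nodes, the more accurate.

compression: caps the number of nodes at compression * 20; the default of 100 gives at most 2000 nodes. Larger values use more memory and are more accurate, but slower.

A node takes about 32 bytes, so 100 * 20 * 32 = 64000 bytes, roughly 64 KB.

If you need percentiles to be more accurate, set compression higher.

12.20 How aggregations work internally: the doc values forward index

How does aggregation work internally?
What happens inside Elasticsearch when aggs such as terms, or the avg and max metrics, are executed?
Which data structure is used to run the aggregation? Is it the inverted index?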

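12.21 doc values: the mechanism in depth

How doc values work

(1) Generated at index time

Doc values, that is, the forward index, are generated when a document is written with PUT/POST.

(2) The core mechanism is similar to the inverted index

The forward index is also written to disk files, and the OS cache then caches them to speed up access to doc values.
If the OS cache is too small to hold all of the doc values, the part that does not fit stays in the disk files and is read from there.

(3) Performance: give the JVM less memory; on a 64 GB server, at most 16 GB for the JVM

The Elasticsearch guidance is to rely heavily on the OS cache for caching and performance. Caching in JVM memory is discouraged because it brings GC overhead and OOM risk.
Give the JVM less memory and the OS cache more: on a 64 GB server, at most 16 GB for the JVM and the remaining tens of GB for the OS cache.
The OS cache speeds up caching and querying for both doc values and the inverted index.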

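Column compression

doc1: 550
doc2: 550
doc3: 500

Identical values are merged: for 550, doc1 and doc2 only need to keep a reference to a single 550.

(1) If all values are the same, store just that single value
(2) With fewer than 256 distinct values, use table encoding, a compression mode
(3) With more than 256 distinct values, check for a greatest common divisor; if there is one, divide every value by it and store the divisor once

doc1: 36
doc2: 24

12 --> doc1: 3, doc2: 2 --> store the greatest common divisor 12 once, next to the reduced values

If there is no common divisor, offset-based compression is used instead.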
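Both numeric encodings can be sketched in Python: dividing by the greatest common divisor, and storing offsets from the smallest value. This illustrates the idea only, not Lucene's actual on-disk format; `gcd_encode` and `offset_encode` are hypothetical helpers and assume a non-empty list of positive integers.

```python
from functools import reduce
from math import gcd


def gcd_encode(values):
    """Store the common divisor once and keep the smaller quotients."""
    divisor = reduce(gcd, values)
    return divisor, [v // divisor for v in values]


def offset_encode(values):
    """Store the minimum once and keep each value's distance from it."""
    base = min(values)
    return base, [v - base for v in values]


divisor, quotients = gcd_encode([36, 24])
base, offsets = offset_encode([1001, 1005, 1003])
print(divisor, quotients)
print(base, offsets)
```

In both cases the per-document numbers get smaller, so they need fewer bits each, while the original values can be rebuilt exactly by multiplying or adding back.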

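Disabling doc values

If a field really never needs doc values (it is never used for aggregations and similar operations), they can be disabled to save disk space.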

PUT /my_index
{
  "mappings": {
      "properties": {
        "my_field": {
          "type":       "keyword",
          "doc_values": false 
        }
      }
    }
}

12.22 Aggregating on string fields, and a first look at fielddata

Running an aggregation on an analyzed field produces an error

GET /test_index/_search 
{
  "aggs": {
    "group_by_test_field": {
      "terms": {
        "field": "test_field"
      }
    }
  }
}

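Running an aggregation directly on an analyzed field fails. The error says, roughly, that fielddata must be enabled so that the forward index data can be loaded into memory before an analyzed field can be aggregated, and that this can use a lot of memory.

With fielddata=true on the analyzed field the aggregation runs, but the result is probably not what we want, because the buckets are the individual terms produced by analysis rather than the original strings.

To aggregate on an analyzed field, fielddata must be set to true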

POST /test_index/_mapping
{
  "properties": {
    "test_field": {
      "type": "text",
      "fielddata": true
    }
  }
}

GET /test_index/_search 
{
  "size": 0, 
  "aggs": {
    "group_by_test_field": {
      "terms": {
        "field": "test_field"
      }
    }
  }
}

Aggregate on the string field through its built-in, non-analyzed keyword sub-field

GET /test_index/_search 
{
  "size": 0,
  "aggs": {
    "group_by_test_field": {
      "terms": {
        "field": "test_field.keyword"
      }
    }
  }
}

Aggregating on a non-analyzed field works directly; there is no need to set fielddata=true.

How analyzed fields work with fielddata

doc values --> available for every non-analyzed field, which can therefore be aggregated --> for a non-analyzed field, doc values are generated automatically at index time --> aggregations on such fields use doc values automatically

Analyzed fields have no doc values. At index time no doc values forward index is built for an analyzed field, because the analyzed terms would take up too much space, so aggregating on analyzed fields is not supported by default.

Because an analyzed field has no doc values by default, aggregating on it directly fails.

An analyzed field has to use fielddata instead, which lives entirely in memory and has a structure similar to doc values. With ngrams or a large number of terms it will inevitably use a lot of memory.

If you must aggregate on an analyzed field, set fielddata=true. When the aggregation runs, Elasticsearch builds a fielddata forward index for the field on the spot, with a structure similar to doc values,
but it is only loaded into memory, and the aggregation on the analyzed field then runs against that in-memory fielddata.

That is why aggregating on an analyzed field first fails with an error asking for fielddata=true: it warns us that the fielddata (the uninverted index, i.e. the forward index) will be loaded into memory and consume memory.

Why must fielddata live in memory? Aggregating an analyzed string by term requires more complex algorithms and operations, and running them against disk and the OS cache would perform poorly.

12.23 fielddata memory control and the circuit breaker

Core fielddata mechanics

fielddata is loaded lazily: it is only loaded when an aggregation runs on an analyzed field, and it is loaded at field level;

for one field of one index, the data of all documents is loaded, not just a few. It is built at query time, not at index time.

fielddata memory limit

indices.fielddata.cache.size: 20%
When fielddata exceeds this share of the heap, fielddata already in memory is evicted.
The default is unbounded. Setting a limit caps memory use, but it can cause frequent evictions and reloads, heavy I/O, memory fragmentation, and GC pressure.

Monitoring fielddata memory usage

GET /_stats/fielddata?fields=*
GET /_nodes/stats/indices/fielddata?fields=*
GET /_nodes/stats/indices/fielddata?level=indices&fields=*

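circuit breaker

If a single query loads more fielddata than the available memory, the node runs out of memory (OOM).

The circuit breaker estimates how much fielddata a query would load; if that exceeds the limit, it trips and the query fails immediately.

indices.breaker.fielddata.limit: memory limit for fielddata; 40% of the JVM heap by default in 7.x
indices.breaker.request.limit: memory limit for per-request structures such as aggregations; 60% by default in 7.x
indices.breaker.total.limit: combined limit across the breakers; 70% by default, or 95% when real-memory accounting is enabled

Fine-grained loading control with the fielddata frequency filter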

POST /test_index/_mapping
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true,
      "fielddata_frequency_filter": {
        "min": 0.01,
        "min_segment_size": 500
      }
    }
  }
}

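min: only load fielddata for terms that appear in at least 1% of the documents

For example, with 1000 documents in total, the term hello must appear in at least 10 of them before its fielddata is loaded into memory.

min_segment_size: segments with fewer than 500 documents do not load fielddata

fielddata is loaded segment by segment; if a segment holds fewer than 500 documents, its fielddata is not loaded.

This is rarely configured; it is enough to know it exists.

fielddata preloading and preloading of global ordinals

If you really do aggregate on an analyzed field, building fielddata and loading it into memory at query time every time can be slow.

Can fielddata be built and loaded into memory ahead of time?

fielddata preloading (the request below uses the pre-5.0 syntax; Elasticsearch 7 no longer accepts the string type or fielddata.loading, and only global ordinals can still be loaded eagerly)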

POST /test_index/_mapping
{
  "properties": {
    "test_field": {
      "type": "string",
      "fielddata": {
        "loading" : "eager" 
      }
    }
  }
}

This moves fielddata generation and loading from query time to index time: fielddata is built and loaded into memory as the inverted index is built,
so aggregations on the analyzed field become much faster.

Preloading global ordinals

How global ordinals work

doc1: status1
doc2: status2
doc3: status2
doc4: status1

When a field has many repeated values, each distinct value is given a global ordinal, for example:

status1 --> 0
status2 --> 1

The documents can then be stored as:

doc1: 0
doc2: 1
doc3: 1
doc4: 0

fielddata is built in the same form. The benefit is that repeated strings are stored only once, which reduces memory use.

POST /test_index/_mapping
{
  "properties": {
    "test_field": {
      "type": "text",
      "fielddata": true,
      "eager_global_ordinals": true
    }
  }
}
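The ordinal mapping described above is a dictionary encoding, which a few lines of Python can illustrate. This is a sketch of the idea only; `build_ordinals` is a hypothetical helper, and real global ordinals are built per shard across segments.

```python
def build_ordinals(values):
    """Assign each distinct term an integer and store integers per document."""
    terms = sorted(set(values))                     # term dictionary, stored once
    ordinal_of = {term: i for i, term in enumerate(terms)}
    return terms, [ordinal_of[v] for v in values]   # one small int per document


terms, per_doc = build_ordinals(["status1", "status2", "status2", "status1"])
print(terms)
print(per_doc)
```

A terms aggregation can then count integers and translate only the final bucket keys back to strings through `terms`.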

12.24 Optimizing for huge numbers of buckets: from depth-first to breadth-first

How depth-first and breadth-first collection work when the number of buckets is very large

13. Data Modeling in Practice

13.1 Relational vs. document data models

Relational data model: the three normal forms --> each entity is split into its own table, and the tables are linked through primary and foreign keys --> no redundant data is stored;

Elasticsearch data model: similar to an object-oriented model. All related data is put into one JSON document, so the relationships and the complete data are kept together.

13.2 Linking users and blogs with an application-level join

When modeling, the related data is still split into separate entities, much like the model in a relational database.

User information:

PUT /website_users/_doc/1
{
  "name":     "小魚兒",
  "email":    "xiaoyuer@sina.com",
  "birthday":      "1980-01-01"
}

Blog published by the user

PUT /website_blogs/_doc/1
{
  "title":    "我的第一篇博客",
  "content":     "這是我的第一篇博客，開通啦！！！",
  "userId":     1 
}

Querying is then an application-level join: the application first fetches one set of data (the user), then a second set (the blogs), and links them itself.

Pros and cons

Pros: no redundant data, easy to maintain
Cons: the join happens in the application; if there is a lot of related data, the second query becomes very large and performance suffers
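The two-step lookup can be sketched with plain dictionaries standing in for the two indices. Everything here is illustrative: `users`, `blogs`, and `blogs_by_user_name` are invented names, and each list comprehension represents one search request.

```python
# In-memory stand-ins for the website_users and website_blogs indices.
users = {
    1: {"name": "小魚兒", "email": "xiaoyuer@sina.com"},
    2: {"name": "花無缺", "email": "huawuque@sina.com"},
}
blogs = [
    {"title": "我的第一篇博客", "userId": 1},
    {"title": "花無缺的身世揭秘", "userId": 2},
]


def blogs_by_user_name(name):
    """Application-level join: find the user ids first, then their blogs."""
    user_ids = {uid for uid, user in users.items() if user["name"] == name}  # search 1
    return [blog for blog in blogs if blog["userId"] in user_ids]            # search 2


found = blogs_by_user_name("小魚兒")
print([blog["title"] for blog in found])
```

The second step carries every matching user id, so its size grows with the number of users matched, which is the cost named in the cons above.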

13.3 Linking users and blogs with redundant data

PUT /website_users/_doc/1
{
  "name":     "小魚兒",
  "email":    "xiaoyuer@sina.com",
  "birthday":      "1980-01-01"
}

The blog document redundantly stores the user name field

PUT /website_blogs/_doc/1
{
  "title": "小魚兒的第一篇博客",
  "content": "大家好，我是小魚兒。。。",
  "userInfo": {
    "userId": 1,
    "userName": "小魚兒"
  }
}

Redundant data means putting the fields you may search by together with the data you want to find in a single document.

Pros and cons

Pros: fast; no second search is needed
Cons: redundant data and higher maintenance cost; when a field changes, every related document has to be updated

13.4 Grouping blogs by the user who published them

Add test data:

POST /website_users/_doc/3
{
  "name": "黃藥師",
  "email": "huangyaoshi@sina.com",
  "birthday": "1970-10-24"
}

PUT /website_blogs/_doc/3
{
  "title": "我是黃藥師",
  "content": "我是黃藥師啊，各位同學們！！！",
  "userInfo": {
    "userId": 3,
    "userName": "黃藥師"
  }
}

PUT /website_users/_doc/2
{
  "name": "花無缺",
  "email": "huawuque@sina.com",
  "birthday": "1980-02-02"
}

PUT /website_blogs/_doc/4
{
  "title": "花無缺的身世揭秘",
  "content": "大家好，我是花無缺，所以我的身世是。。。",
  "userInfo": {
    "userId": 2,
    "userName": "花無缺"
  }
}

Group the blogs by the user who published them

GET /website_blogs/_search 
{
  "size": 0, 
  "aggs": {
    "group_by_username": {
      "terms": {
        "field": "userInfo.userName.keyword"
      },
      "aggs": {
        "top_blogs": {
          "top_hits": {
            "_source": {
              "includes": "title"
            }, 
            "size": 5
          }
        }
      }
    }
  }
}

13.5 Modeling a file system and searching files

Data modeling for data with a multi-level hierarchy, such as a file system

Set up the file system index

PUT /fs
{
  "settings": {
    "analysis": {
      "analyzer": {
        "paths": { 
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}

path_hierarchy example: for the file path /a/b/c/d, the path_hierarchy tokenizer emits the tokens /a, /a/b, /a/b/c, /a/b/c/d
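The tokens that path_hierarchy produces are simply every ancestor prefix of the path. A Python sketch of that rule, assuming "/" as the delimiter and an absolute path; `path_hierarchy` here is a hypothetical function, not the tokenizer itself.

```python
def path_hierarchy(path, delimiter="/"):
    """Return every ancestor prefix of an absolute path, shortest first."""
    parts = [part for part in path.split(delimiter) if part]
    return [delimiter + delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


tokens = path_hierarchy("/workspace/projects/helloworld")
print(tokens)

# A term lookup for a parent directory succeeds because the parent is a token.
print("/workspace" in tokens)
```

This is why a term query for a parent directory against the analyzed path field finds files in any subdirectory below it.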

PUT /fs/_mapping
{
  "properties": {
    "name": { 
      "type":  "keyword"
    },
    "path": { 
      "type":  "keyword",
      "fields": {
        "tree": { 
          "type":     "text",
          "analyzer": "paths"
        }
      }
    }
  }
}

Add data

PUT /fs/_doc/1
{
  "name":     "README.txt", 
  "path":     "/workspace/projects/helloworld", 
  "contents": "這是我的第一個elasticsearch程序"
}

Searching the file system

Search requirement 1: find the files whose content contains elasticsearch and that are in the /workspace/projects/helloworld directory

GET /fs/_search 
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contents": "elasticsearch"
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "path": "/workspace/projects/helloworld"
              }
            }
          }
        }
      ]
    }
  }
}

Search requirement 2: find all files under /workspace, at any depth, whose content contains elasticsearch

GET /fs/_search 
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contents": "elasticsearch"
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "path.tree": "/workspace"
              }
            }
          }
        }
      ]
    }
  }
}

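13.6 Pessimistic concurrency control with a global lock

If several threads all try to rename README.txt under /workspace/projects/helloworld, concurrent modification has to be handled safely.

Acquiring the global lock

PUT /fs/_create/global
{}

fs: the index you want to lock
_create: forces a create; if the document with id global already exists, the request fails with an error
global: the id of the document that represents the global lock

Releasing the lock

DELETE /fs/_doc/global

The lock is simply a document with a fixed id: create it, do the work, then delete it so that others can proceed. The open question is what happens if the program crashes before it deletes the lock: the lock then stays in place until something removes it.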

13.7 Pessimistic concurrency control with document locks

A script takes the lock for one specific document id

POST /fs/_update/1
{
  "upsert": { "process_id": 123 },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.process_id != params.process_id) { throw new Exception('locked by another process'); } ctx.op = 'none';",
    "params": {
      "process_id": 123
    }
  }
}

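13.8 Pessimistic concurrency control with shared and exclusive locks

Shared vs. exclusive locks (comparable to a read/write lock):

Shared lock: the data is shared; many threads can hold a shared lock on the same data at once and read it
Exclusive lock: only one thread can hold it, and that thread may then create, update, or delete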
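The rules for the two lock types fit in a small state machine. The class below is a single-threaded Python sketch of those rules only; it is not how the lock would be stored in Elasticsearch, and `SharedExclusiveLock` is a made-up name.

```python
class SharedExclusiveLock:
    """Tracks who may read or write: many readers or exactly one writer."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def acquire_shared(self):
        if self.writer:
            return False          # a writer blocks all readers
        self.readers += 1
        return True

    def release_shared(self):
        self.readers -= 1

    def acquire_exclusive(self):
        if self.writer or self.readers > 0:
            return False          # any holder blocks a writer
        self.writer = True
        return True

    def release_exclusive(self):
        self.writer = False


lock = SharedExclusiveLock()
steps = [
    lock.acquire_shared(),     # first reader gets in
    lock.acquire_shared(),     # second reader gets in too
    lock.acquire_exclusive(),  # writer is refused while readers exist
]
lock.release_shared()
lock.release_shared()
steps.append(lock.acquire_exclusive())  # writer succeeds once readers are gone
steps.append(lock.acquire_shared())     # reader is refused while writer holds
print(steps)
```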

13.9 Modeling blogs and comments with nested objects

An experiment that shows why nested objects are needed

PUT /website/_doc/6
{
  "title": "花無缺發表的一篇帖子",
  "content":  "我是花無缺，大家要不要考慮一下投資房產和買股票的事情啊。。。",
  "tags":  [ "投資", "理財" ],
  "comments": [ 
    {
      "name":    "小魚兒",
      "comment": "什么股票啊？推薦一下唄",
      "age":     28,
      "stars":   4,
      "date":    "2016-09-01"
    },
    {
      "name":    "黃藥師",
      "comment": "我喜歡投資房產，風險大收益也大",
      "age":     31,
      "stars":   5,
      "date":    "2016-10-22"
    }
  ]
}

Search for blogs that were commented on by 黃藥師 at age 28

GET /website/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "comments.name": "黃藥師" }},
        { "match": { "comments.age":  28      }} 
      ]
    }
  }
}

The query returns the document, which is not what we expect: the 28-year-old commenter is 小魚兒, while 黃藥師 is 31.

The object type flattens the objects of a JSON array in its underlying storage, roughly like this:

{
  "title":            [ "花無缺", "發表", "一篇", "帖子" ],
  "content":          [ "我", "是", "花無缺", "大家", "要不要", "考慮", "一下", "投資", "房產", "買", "股票", "事情" ],
  "tags":             [ "投資", "理財" ],
  "comments.name":    [ "小魚兒", "黃藥師" ],
  "comments.comment": [ "什么", "股票", "推薦", "我", "喜歡", "投資", "房產", "風險", "收益", "大" ],
  "comments.age":     [ 28, 31 ],
  "comments.stars":   [ 4, 5 ],
  "comments.date":    [ "2016-09-01", "2016-10-22" ]
}
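The effect of flattening can be reproduced in Python. The sketch keeps only the name and age of the two comments; `flatten`, `match_flattened`, and `match_nested` are hypothetical helpers that contrast the two ways of matching.

```python
comments = [
    {"name": "小魚兒", "age": 28},
    {"name": "黃藥師", "age": 31},
]


def flatten(objects, prefix="comments."):
    """Collect each field's values into one list, losing which object they came from."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(prefix + key, []).append(value)
    return flat


def match_flattened(flat, name, age):
    # Each condition is checked against the whole list independently.
    return name in flat["comments.name"] and age in flat["comments.age"]


def match_nested(objects, name, age):
    # Both conditions must hold within the same comment.
    return any(obj["name"] == name and obj["age"] == age for obj in objects)


flat = flatten(comments)
print(match_flattened(flat, "黃藥師", 28))  # cross-object match
print(match_nested(comments, "黃藥師", 28))
```

The flattened form matches because one comment supplies the name and a different comment supplies the age; keeping each comment as its own unit removes that false match.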

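The nested object type solves the problem caused by the way the object type is stored.

Change the mapping so that comments is nested instead of object. The type of an existing field cannot be changed in place, so the index has to be deleted and recreated with this mapping, and the document indexed again.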

PUT /website
{
  "mappings": {
      "properties": {
        "comments": {
          "type": "nested", 
          "properties": {
            "name":    { "type": "text"  },
            "comment": { "type": "text"  },
            "age":     { "type": "short"   },
            "stars":   { "type": "short"   },
            "date":    { "type": "date"    }
          }
        }
      }
    }
}

Run the query

GET /website/_search 
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "花無缺"
          }
        },
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "comments.name": "黃藥師"
                    }
                  },
                  {
                    "match": {
                      "comments.age": 28
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

13.10 Aggregating nested blog comments

Aggregation requirement 1: bucket the comments by date, one bucket per month, and compute the average star rating of the comments in each month

GET /website/_search 
{
  "size": 0, 
  "aggs": {
    "comments_path": {
      "nested": {
        "path": "comments"
      }, 
      "aggs": {
        "group_by_comments_date": {
          "date_histogram": {
            "field": "comments.date",
            "calendar_interval": "month",
            "format": "yyyy-MM"
          },
          "aggs": {
            "avg_stars": {
              "avg": {
                "field": "comments.stars"
              }
            }
          }
        }
      }
    }
  }
}

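Aggregation requirement 2: bucket the comments by commenter age in ranges of 10 years, then, within each age range, count the tags of the blogs those comments belong to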

GET /website/_search 
{
  "size": 0,
  "aggs": {
    "comments_path": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "group_by_comments_age": {
          "histogram": {
            "field": "comments.age",
            "interval": 10
          },
          "aggs": {
            "reverse_path": {
              "reverse_nested": {}, 
              "aggs": {
                "group_by_tags": {
                  "terms": {
                    "field": "tags.keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

14. Advanced Operations (rarely used)

14.1 Inspecting data in depth with term vectors

GET /twitter/_termvectors/1
GET /twitter/_termvectors/1?fields=text

GET /my_index/_termvectors/1
{
  "fields" : ["fullname"],
  "offsets" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}

14.2 Highlighting search results in depth

Simple example

GET /blog_website/_search 
{
  "query": {
    "match": {
      "title": "博客"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

Matched terms are wrapped in <em></em> tags, which the page can then style (for example in red). So if a field you list under highlight contains the search term, the term is highlighted inside that field's text.

GET /blog_website/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "博客"
          }
        },
        {
          "match": {
            "content": "博客"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    }
  }
}

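By default (require_field_match is true), only fields that the query actually matches are highlighted, so the fields under highlight should line up with the fields in the query.

The three highlighters

plain highlighter: the standard Lucene highlighter. These notes treat it as the default; in Elasticsearch 7 the default is the unified highlighter, and plain has to be requested with "type": "plain".

postings highlighter, enabled by index_options=offsets in the mapping. In Elasticsearch 7 this is no longer a separate type: the unified highlighter uses the postings offsets when they are indexed.

(1) Faster than the plain highlighter, because the text being highlighted does not need to be re-analyzed
(2) Uses less disk space than storing term vectors
(3) Splits the text into sentences and highlights per sentence, which gives better results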

GET /blog_website/_search 
{
  "query": {
    "match": {
      "content": "博客"
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

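Choose according to your situation. In most cases the plain highlighter is enough and needs no extra settings;
if highlighting performance matters a lot, index offsets (index_options=offsets) so the postings-based approach can be used;
if the field values are very large, over 1 MB, consider the fast vector highlighter (fvh), which needs term_vector set to with_positions_offsets in the mapping.

Setting the highlight HTML tags (the default is the <em> tag)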

GET /blog_website/_search 
{
  "query": {
    "match": {
      "content": "博客"
    }
  },
  "highlight": {
    "pre_tags": ["<tag1>"],
    "post_tags": ["</tag1>"],
    "fields": {
      "content": {
        "type": "plain"
      }
    }
  }
}
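
As a rough picture of what pre_tags and post_tags do, the plain-Java sketch below wraps every literal occurrence of a term in a pair of tags. It is a simplification: the real highlighters work on analyzed tokens and offsets rather than on raw substrings, and SimpleHighlighter is a made-up name.

```java
// Plain-Java illustration only; this is not how Elasticsearch highlights internally.
class SimpleHighlighter {
    // Wrap each literal occurrence of term in preTag ... postTag.
    static String highlight(String text, String term, String preTag, String postTag) {
        return text.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        System.out.println(highlight("my blog post about blog tools", "blog", "<tag1>", "</tag1>"));
    }
}
```

The opening and closing tags should form a matching pair, otherwise the resulting HTML is malformed.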

Configuring highlight fragments

GET /blog_website/_search
{
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3, "no_match_size": 150 }
        }
    }
}

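fragment_size: a field value may be, say, 10,000 characters long, which you cannot show on a page in full; this sets the length in characters of each highlighted fragment to display. The default is 100.

number_of_fragments: the highlighted text may yield several fragments; this sets the maximum number to return. The default is 5.

no_match_size: when no fragment matches, return this many characters from the start of the field instead. The default is 0, meaning nothing is returned.

14.3 Templating searches with search templates

A search template is an advanced feature that turns a search into a template; each time you run the search you call the template and pass in a few parameters.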

Basic example

GET /website_blogs/_search/template
{
  "source": {
    "query": {
      "match": {
        "{{field}}": "{{value}}"
      }
    }
  },
  "params": {
    "field": "title",
    "value": "黃藥師"
  }
}
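
The substitution step can be pictured with the plain-Java sketch below, which replaces each {{name}} placeholder with the matching entry from params. It is only an illustration under simplifying assumptions: it handles plain variables, not Mustache sections such as {{#toJson}} or {{#join}}, and it does no JSON escaping. TemplateSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; not the Mustache engine that Elasticsearch uses.
class TemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    // Replace each {{name}} with params.get(name); unknown names render as empty strings.
    static String render(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("field", "title");
        params.put("value", "abc");
        System.out.println(render("{\"query\":{\"match\":{\"{{field}}\":\"{{value}}\"}}}", params));
    }
}
```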

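The part shown below can be moved into a script file and replaced with "file": "search_by_title". File-based templates were removed in Elasticsearch 6.0; on Elasticsearch 7, store the template with PUT _scripts/search_by_title and reference it with "id": "search_by_title" instead.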

    "query": {
      "match": {
        "{{field}}": "{{value}}"
      }
    }

Using a JSON string (toJson)

GET /website_blogs/_search/template
{
  "source": "{\"query\": {\"match\": {{#toJson}}matchCondition{{/toJson}}}}",
  "params": {
    "matchCondition": {
      "title": "黃藥師"
    }
  }
}

Using join

GET /website_blogs/_search/template
{
  "source": {
    "query": {
      "match": {
        "title": "{{#join delimiter=' '}}titles{{/join delimiter=' '}}"
      }
    }
  },
  "params": {
    "titles": ["黃藥師", "花無缺"]
  }
}

This is equivalent to:

GET /website_blogs/_search/
{
  "query": { 
    "match" : { 
      "title" : "黃藥師 花無缺" 
    } 
  }
}

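Conditionals

Save this more complex template in advance under the es config/scripts directory, with the .mustache extension and the file name conditional. (On Elasticsearch 7, where file-based templates no longer exist, store it with PUT _scripts/conditional instead and use "id" in place of "file" in the request.)

Its contents: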

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "line": "{{text}}" 
        }
      },
      "filter": {
        {{#line_no}} 
          "range": {
            "line_no": {
              {{#start}} 
                "gte": "{{start}}" 
                {{#end}},{{/end}} 
              {{/start}} 
              {{#end}} 
                "lte": "{{end}}" 
              {{/end}} 
            }
          }
        {{/line_no}} 
      }
    }
  }
}

Query

GET /website_blogs/_search/template
{
  "file": "conditional",
  "params": {
    "text": "博客",
    "line_no": true,
    "start": 1,
    "end": 10
  }
}

Saving search templates

Templates live under config/scripts with the .mustache extension.

A usage idea

In a large team, different people often need to run similar searches.
The engineers responsible for the underlying operations can package these as search templates and place them in the scripts directory of every es process.
Other teams then no longer need to hand-write the same complex queries over and over; they call a search template and pass in a few parameters.

14.4 Search suggestions with the completion suggester

suggest, completion suggest, search recommendation, search hints: all of these refer to auto completion.

For example, when you search Baidu for "大話西游", Baidu automatically suggests "大話西游電影", "大話西游小說", and "大話西游手游".

You do not need to type the whole text you have in mind; the search engine suggests what you are probably looking for.

Initialize the index

PUT /news_website
{
  "mappings": {
      "properties" : {
        "title" : {
          "type": "text",
          "analyzer": "ik_max_word",
          "fields": {
            "suggest" : {
              "type" : "completion",
              "analyzer": "ik_max_word"
            }
          }
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }
}

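The completion type is implemented for very high performance. It is built neither as an inverted index nor as a forward index, but as a dedicated data structure (a finite state transducer, FST) used only for prefix search.
It is kept entirely in memory, so prefix-based auto completion is very fast.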

PUT /news_website/_doc/1
{
  "title": "大話西游電影",
  "content": "大話西游的電影時隔20年即將在2017年4月重映"
}
PUT /news_website/_doc/2
{
  "title": "大話西游小說",
  "content": "某知名網絡小說作家已經完成了大話西游同名小說的出版"
}
PUT /news_website/_doc/3
{
  "title": "大話西游手游",
  "content": "網易游戲近日出品了大話西游經典IP的手游，正在火爆內測中"
}

Run the suggest query

GET /news_website/_search
{
  "suggest": {
    "my-suggest" : {
      "prefix" : "大話西游",
      "completion" : {
        "field" : "title.suggest"
      }
    }
  }
}
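
To give a feel for why prefix lookup on a dedicated in-memory structure is fast, here is a minimal plain-Java sketch that uses a character trie. This is a stand-in chosen for readability: Elasticsearch uses an FST, not this trie, and PrefixSuggester is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; a trie stands in for the FST that Elasticsearch builds.
class PrefixSuggester {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean terminal; // true when a complete title ends at this node
    }

    private final Node root = new Node();

    void add(String title) {
        Node node = root;
        for (char ch : title.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new Node());
        }
        node.terminal = true;
    }

    // Walk down the prefix once, then collect every title stored below that node.
    List<String> suggest(String prefix) {
        Node node = root;
        for (char ch : prefix.toCharArray()) {
            node = node.children.get(ch);
            if (node == null) {
                return new ArrayList<>();
            }
        }
        List<String> out = new ArrayList<>();
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.terminal) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        PrefixSuggester s = new PrefixSuggester();
        s.add("dahua movie");
        s.add("dahua novel");
        s.add("other title");
        System.out.println(s.suggest("dahua"));
    }
}
```

The cost of a lookup depends on the prefix length and the number of matches, not on the total number of indexed titles.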

Plain match query, for comparison

GET /news_website/_search
{
  "query": {
    "match": {
      "content": "大話西游電影"
    }
  }
}

14.5 Customizing your mapping strategy with dynamic mapping templates

Suppose a mapping or a field does not exist yet, and you want es to recognize inserted data automatically and derive the mapping, including the data type of each field. This is dynamic mapping.

The catch is that you may have your own requirements for dynamic mapping. By default, when es sees a number such as field: 10, it maps the field as long; when it sees field: "10", it maps the field as text with a built-in keyword sub-field. Out of the box you cannot change these defaults.

What you want is for dynamic mapping to follow your own rules instead of the defaults.

Dynamic mapping templates

You define templates in advance. When data is inserted, any new field that matches one of your predefined rules is mapped according to that template, which decides the field's data type.

Matching templates by type

Dynamic templates can match in two ways. The first matches on the data type es would assign to the new field by default.
The second matches on the name of the new field, either a fixed name or a wildcard pattern.

Matching by default type

PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      },
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 500
              }
            }
          }
        }
      }
    ]
  }
}

Matching by field name

PUT /my_index 
{
  "mappings": {
    "dynamic_templates": [
      {
        "string_as_integer": {
          "match_mapping_type": "string",
          "match": "long_*",
          "unmatch": "*_text",
          "mapping": {
            "type": "integer"
          }
        }
      }
    ]
  }
}
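
The matching rule in this template can be sketched in plain Java as follows: the template applies when the value was detected as a string, the field name matches long_*, and the name does not match *_text. The class and method names are hypothetical and the wildcard handling only covers the simple * case.

```java
import java.util.regex.Pattern;

// Plain-Java illustration only; not Elasticsearch's own matching code.
class DynamicTemplateMatch {
    // Simple wildcard match where '*' stands for any run of characters.
    static boolean wildcard(String pattern, String name) {
        StringBuilder regex = new StringBuilder();
        for (char ch : pattern.toCharArray()) {
            if (ch == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Mirrors match_mapping_type=string, match=long_*, unmatch=*_text.
    static boolean applies(String fieldName, boolean detectedAsString) {
        return detectedAsString
                && wildcard("long_*", fieldName)
                && !wildcard("*_text", fieldName);
    }

    public static void main(String[] args) {
        System.out.println(applies("long_count", true));
        System.out.println(applies("long_count_text", true));
    }
}
```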

14.6 Using the geo_point data type

Define the mapping

PUT /hotel
{
  "mappings": {
    "properties": {
      "location":{
        "type": "geo_point"
      }
    }
  }
}

Add data

PUT /hotel/_doc/1
{
  "name":"四季酒店",
  "location":{
    "lat":30.558456,
    "lon":104.073273
  }
}

lat: latitude, lon: longitude

PUT /hotel/_doc/2
{
  "name":"成都威斯凱爾凱特酒店",
  "location":"30.5841,104.061939"
}

PUT /hotel/_doc/3
{
  "name":"北京天安門廣場",
  "location":{
    "lat":39.909187,
    "lon":116.397451
  }
}

In the string form used for document 2, latitude comes first and longitude second.

Query documents inside a rectangle (defined by its top-left and bottom-right corner points)

GET /hotel/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lat": 40,
          "lon": 100
        },
        "bottom_right":{
           "lat": 30,
          "lon": 106
        }
      }
    }
  }
}
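
The containment test behind a bounding box can be sketched in a few lines of plain Java. This sketch assumes the box does not cross the 180th meridian; BoundingBox is a hypothetical name, and the coordinates in main are the hotel and landmark locations indexed above.

```java
// Plain-Java illustration only; assumes the box does not cross the date line.
class BoundingBox {
    // top/bottom are latitudes, left/right are longitudes of the two corners.
    static boolean contains(double top, double left, double bottom, double right,
                            double lat, double lon) {
        return lat <= top && lat >= bottom && lon >= left && lon <= right;
    }

    public static void main(String[] args) {
        // Document 1 (Chengdu) against the box from the query.
        System.out.println(contains(40, 100, 30, 106, 30.558456, 104.073273));
        // Document 3 (Beijing): its longitude lies east of the box.
        System.out.println(contains(40, 100, 30, 106, 39.909187, 116.397451));
    }
}
```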

Find documents whose name contains 成都 (Chengdu) and that lie inside the given box

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "成都"
          }
        }
      ],
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 40,
              "lon": 100
            },
            "bottom_right": {
              "lat": 30,
              "lon": 106
            }
          }
        }
      }
    }
  }
}

Search within a polygon defined by several points

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": [
        {
          "geo_polygon": {
            "location": {
              "points": [
                {
                  "lat": 40,
                  "lon": 100
                },
                {
                  "lat": 30,
                  "lon": 106
                },
                {
                  "lat": 35,
                  "lon": 120
                }
              ]
            }
          }
        }
      ]
    }
  }
}
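
A common way to test whether a point lies inside a polygon is ray casting, sketched below in plain Java. It treats latitude and longitude as flat coordinates, which is an approximation and an assumption of this sketch, not a description of how Elasticsearch evaluates the query. PolygonSketch is a hypothetical name.

```java
// Plain-Java illustration only; planar ray casting on (lat, lon) pairs.
class PolygonSketch {
    // polygon[i] = {lat, lon}; the last vertex connects back to the first.
    static boolean contains(double[][] polygon, double lat, double lon) {
        boolean inside = false;
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double latI = polygon[i][0], lonI = polygon[i][1];
            double latJ = polygon[j][0], lonJ = polygon[j][1];
            // Does the edge (j -> i) cross the point's latitude, east of the point?
            boolean crosses = (latI > lat) != (latJ > lat)
                    && lon < (lonJ - lonI) * (lat - latI) / (latJ - latI) + lonI;
            if (crosses) {
                inside = !inside; // each crossing flips inside/outside
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[][] triangle = {{40, 100}, {30, 106}, {35, 120}};
        System.out.println(contains(triangle, 35, 108));
        System.out.println(contains(triangle, 30.558456, 104.073273));
    }
}
```

Under this planar approximation, the three documents indexed above all fall outside the triangle used in the query.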

Search within 100 km of a given coordinate

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "100km",
            "location": {
              "lat": 30,
              "lon": 116
            }
          }
        }
      ]
    }
  }
}
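
A distance filter boils down to computing the great-circle distance between two points and comparing it with the limit. The plain-Java sketch below uses the haversine formula with a mean Earth radius of 6371 km; both choices are assumptions of this sketch, and the numbers may differ slightly from what Elasticsearch computes.

```java
// Plain-Java illustration only; haversine with an assumed mean Earth radius.
class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0;

    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    static boolean within(double limitKm, double lat1, double lon1, double lat2, double lon2) {
        return distanceKm(lat1, lon1, lat2, lon2) <= limitKm;
    }

    public static void main(String[] args) {
        // Origin from the query against document 1 (Chengdu).
        System.out.println(distanceKm(30, 116, 30.558456, 104.073273));
        System.out.println(within(100, 30, 116, 30.558456, 104.073273));
    }
}
```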

Count the hotels between 100 and 300 miles from the origin (the ranges use the unit given in "unit", here mi)

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "agg_by_distance_range": {
      "geo_distance": {
        "field": "location",
        "origin": {
          "lat": 30,
          "lon": 106
        },
        "unit": "mi", 
        "ranges": [
          {
            "from": 100,
            "to": 300
          }
        ]
      }
    }
  }
}

15. Mastering the ES Java API

15.1 Cluster sniffing and the car dealership case background

Client cluster sniffing

By default, the client sends requests round-robin across the nodes we specify by hand. As in the code below, we can give the client several nodes manually.

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(new HttpHost("192.168.6.1", 9200)
    , new HttpHost("192.168.6.2", 9200)));
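
The round-robin behaviour over a fixed node list can be pictured with this plain-Java sketch. It is not the client's actual implementation; RoundRobinNodes is a hypothetical name and the addresses are the two sample nodes from the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java illustration only; not the real client's node selection code.
class RoundRobinNodes {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        this.nodes = nodes;
    }

    // Each call returns the next node in order, wrapping around at the end of the list.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    public static void main(String[] args) {
        RoundRobinNodes selector = new RoundRobinNodes(
                Arrays.asList("192.168.6.1:9200", "192.168.6.2:9200"));
        for (int i = 0; i < 3; i++) {
            System.out.println(selector.pick());
        }
    }
}
```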

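But what if there are hundreds or thousands of nodes? Do we add them all by hand?

For this the es client offers cluster node sniffing. With sniffing enabled, the client connects to the few nodes we specified,
fetches the list of all data nodes from the cluster state, and uses that full list to update its internal list of nodes to send requests to.
With the legacy TransportClient the node list is refreshed every 5 seconds by default. For the REST client, sniffing is provided by the separate elasticsearch-rest-client-sniffer module (the Sniffer class), which has its own refresh interval.

    // Legacy TransportClient style
    Settings settings = Settings.builder()
            .put("cluster.name", "docker-cluster")
            // Enable cluster node sniffing
            .put("client.transport.sniff", true)
            .build();

Note that the es client does not add master nodes to the node list, so that search and similar requests are not sent to them.

So in practice we only need to specify a few master nodes, or even a single node; the client discovers the rest of the cluster and, in the legacy style above, refreshes the list every 5 seconds.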

15.2 Adjusting a car's latest price with upsert

Create the mapping

PUT /car_shop
{
  "mappings": {
      "properties": {
        "brand": {
          "type": "text",
          "analyzer": "ik_max_word",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "name": {
          "type": "text",
          "analyzer": "ik_max_word",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
}

Java code: update the document if it exists, otherwise insert it

IndexRequest indexRequest = new IndexRequest("car_shop");
indexRequest.id("1");
indexRequest.source(XContentFactory.jsonBuilder()
        .startObject()
        .field("brand", "寶馬")
        .field("name", "寶馬320")
        .field("price", 320000)
        .field("produce_date", "2020-01-01")
        .endObject());

UpdateRequest updateRequest = new UpdateRequest("car_shop", "1");
updateRequest.doc(XContentFactory.jsonBuilder()
        .startObject()
        .field("price", 320000)
        .endObject()).upsert(indexRequest);

UpdateResponse response = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
System.out.println(response.getResult());
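
The semantics of the request above can be summarised with a small in-memory sketch in plain Java: if the id is missing, the upsert document is inserted as a whole; if it exists, only the partial doc is merged in. UpsertSketch and its string results are made up for illustration, and the real API can also report a no-op when nothing changes.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; an in-memory stand-in for the update-with-upsert semantics.
class UpsertSketch {
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    // Returns "CREATED" when upsertDoc was inserted, "UPDATED" when partialDoc was merged.
    String upsert(String id, Map<String, Object> partialDoc, Map<String, Object> upsertDoc) {
        Map<String, Object> existing = index.get(id);
        if (existing == null) {
            index.put(id, new HashMap<>(upsertDoc));
            return "CREATED";
        }
        existing.putAll(partialDoc);
        return "UPDATED";
    }

    Map<String, Object> get(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        Map<String, Object> full = new HashMap<>();
        full.put("brand", "BMW");
        full.put("price", 320000);
        Map<String, Object> partial = new HashMap<>();
        partial.put("price", 310000);

        UpsertSketch store = new UpsertSketch();
        System.out.println(store.upsert("1", partial, full));
        System.out.println(store.upsert("1", partial, full));
        System.out.println(store.get("1"));
    }
}
```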

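15.3 Comparing the specs and prices of several cars with mget

Scenario: on car websites, or inside a 4S dealership that sells several brands, the system can pull up the information of multiple cars and show them side by side on one page for comparison.

mget: fetch several documents in one request and display them together.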

PUT /car_shop/_doc/2
{
    "brand": "奔馳",
    "name": "奔馳C200",
    "price": 350000,
    "produce_date": "2020-01-05"
}

Java code:

MultiGetRequest multiGetRequest = new MultiGetRequest();
multiGetRequest.add("car_shop", "1");
multiGetRequest.add("car_shop", "2");

MultiGetResponse multiGetResponse = restHighLevelClient.mget(multiGetRequest, RequestOptions.DEFAULT);
MultiGetItemResponse[] responses = multiGetResponse.getResponses();
for(MultiGetItemResponse response:responses){
    System.out.println(response.getResponse().getSourceAsMap());
}

15.4 Bulk-uploading sales data from multiple 4S dealerships

Business scenario: a car sales company owns many 4S dealerships, whose sales data arrives gradually over a period of time.
We want to buffer, say, 1000 sales records in memory and then upload them to es in a single batch.

Java code:

BulkRequest bulkRequest = new BulkRequest();

// Index a document
JSONObject car = new JSONObject();
car.put("brand", "奔馳");
car.put("name", "奔馳C200");
car.put("price", 350000);
car.put("produce_date", "2020-01-05");
car.put("sale_price", 360000);
car.put("sale_date", "2020-02-03");
bulkRequest.add(new IndexRequest("car_sales").id("3").source(car.toJSONString(), XContentType.JSON));

// Update a document
bulkRequest.add(new UpdateRequest("car_shop", "2").doc(XContentFactory.jsonBuilder()
        .startObject()
        .field("sale_price", "290000")
        .endObject()));

// Delete a document
bulkRequest.add(new DeleteRequest("car_shop").id("1"));

BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
System.out.println(bulk.hasFailures() +" " +bulk.buildFailureMessage());
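
The "buffer 1000 records, then upload once" part of the scenario is not shown in the snippet above. One way to picture it is the plain-Java sketch below, where BatchBuffer is a hypothetical helper and the sink stands in for building and sending one bulk request. The client library also ships a BulkProcessor helper for this purpose, which is likely the better choice in real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration only; the sink is a placeholder for sending one bulk request.
class BatchBuffer<T> {
    private final int capacity;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int capacity, Consumer<List<T>> sink) {
        this.capacity = capacity;
        this.sink = sink;
    }

    // Buffer the item and flush automatically once the batch is full.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= capacity) {
            flush();
        }
    }

    // Send whatever is buffered; call this once more at shutdown for the final partial batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        BatchBuffer<String> buffer = new BatchBuffer<>(3,
                batch -> System.out.println("uploading " + batch.size() + " records"));
        for (int i = 0; i < 7; i++) {
            buffer.add("sale-" + i);
        }
        buffer.flush();
    }
}
```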

15.5 Bulk-downloading monthly sales data with scroll

When you need to download a large amount of data from es, for example exporting to Excel for a business report, and there are hundreds of thousands or even millions of documents, use scroll to fetch and process the data in batches.

// Create the search request and set the index
SearchRequest searchRequest = new SearchRequest("car_shop");
// Keep the scroll context alive for 60 seconds. This is not the time needed to process every document in the result set:
// the timeout is refreshed by each scroll request, so it only needs to cover processing one batch
searchRequest.scroll(TimeValue.timeValueMillis(60000));

// Build the query
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchQuery("brand", "奔馳"));
// Number of hits returned per batch
searchSourceBuilder.size(2);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

// First page of results
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();

int page = 1;
// Iterate over the hits until a batch comes back empty
while (searchHits != null && searchHits.length > 0) {
    System.out.println(String.format("-------- page %s -------", page++));
    for (SearchHit searchHit : searchHits) {
        System.out.println(searchHit.getSourceAsString());
    }
    System.out.println("=========================");

    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(TimeValue.timeValueMillis(60000));
    // Let an IOException propagate: swallowing it would leave searchResponse unchanged
    // and the loop would process the same page forever
    searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);

    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}

// Clear the scroll context
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
// setScrollIds() can be used instead to clear several scroll ids at once
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest,RequestOptions.DEFAULT);
System.out.println("succeeded:" + clearScrollResponse.isSucceeded());

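Once all the data has been fetched, clear the scroll_id explicitly.
es does clean up expired scroll contexts automatically, but while a scroll_id is alive it holds a snapshot of the result set, which consumes significant resources and keeps file descriptors open. So clear it as soon as you are done.

15.6 A paginated search-by-brand template using search template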

Map<String, Object> params = new HashMap<>(1);
params.put("brand", "奔馳");

SearchTemplateRequest templateRequest = new SearchTemplateRequest();
templateRequest.setScript("{\n" +
        "  \"query\": {\n" +
        "    \"match\": {\n" +
        "      \"brand\": \"{{brand}}\" \n" +
        "    }\n" +
        "  }\n" +
        "}\n");
templateRequest.setScriptParams(params);
templateRequest.setScriptType(ScriptType.INLINE);
templateRequest.setRequest(new SearchRequest("car_shop"));

SearchTemplateResponse templateResponse = restHighLevelClient.searchTemplate(templateRequest, RequestOptions.DEFAULT);
SearchHit[] hits = templateResponse.getResponse().getHits().getHits();
if(null!=hits && hits.length!=0){
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}else {
    System.out.println("No matching documents");
}

15.7 Full-text, exact, and prefix search on car brands

@Test
public void fullSearch() throws IOException {
    
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("brand", "奔馳"));
    search(searchSourceBuilder);
    System.out.println("-----------------------------");

    searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.multiMatchQuery("寶馬", "brand", "name"));
    search(searchSourceBuilder);
    System.out.println("-----------------------------");

    searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.prefixQuery("name", "奔"));
    search(searchSourceBuilder);
    System.out.println("-----------------------------");

}

private void search(SearchSourceBuilder searchSourceBuilder) throws IOException {
    SearchRequest searchRequest = new SearchRequest("car_shop");
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    if(searchHits!=null && searchHits.length!=0){
        for (SearchHit searchHit : searchHits) {
            System.out.println(searchHit.getSourceAsString());
        }
    }
}

15.8 Combining multiple conditions when searching car brands

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.boolQuery()
        .must(QueryBuilders.matchQuery("brand", "奔馳"))
        .mustNot(QueryBuilders.termQuery("name.raw", "奔馳C203"))
        .should(QueryBuilders.termQuery("produce_date", "2020-01-02"))
        .filter(QueryBuilders.rangeQuery("price").gte("280000").lt("500000"))
);
    
SearchRequest searchRequest = new SearchRequest("car_shop");
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] searchHits = searchResponse.getHits().getHits();
if (searchHits != null && searchHits.length != 0) {
    for (SearchHit searchHit : searchHits) {
        System.out.println(searchHit.getSourceAsString());
    }
}

15.9 Searching for nearby 4S dealerships by geolocation

The field must be mapped as a geo_point

POST /car_shop/_mapping
{
  "properties": {
      "pin": {
          "properties": {
              "location": {
                  "type": "geo_point"
              }
          }
      }
  }
}

Add data

PUT /car_shop/_doc/5
{
    "name": "上海至全寶馬4S店",
    "pin" : {
        "location" : {
            "lat" : 40.12,
            "lon" : -71.34
        }
    }
}

Search the rectangle defined by two corner points

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.geoBoundingBoxQuery("pin.location")
        .setCorners(40.73, -74.1, 40.01, -71.12));

Search an area defined by three points, for example Shanghai Mansion, the Oriental Pearl Tower, and Shanghai Railway Station

searchSourceBuilder = new SearchSourceBuilder();
List<GeoPoint> points = new ArrayList<>();
points.add(new GeoPoint(40.73, -74.1));
points.add(new GeoPoint(40.01, -71.12));
points.add(new GeoPoint(50.56, -90.58));
searchSourceBuilder.query(QueryBuilders.geoPolygonQuery("pin.location", points));

Search for 4S dealerships within 200 km of the current position

searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.geoDistanceQuery("pin.location")
        .point(40, -70).distance(200, DistanceUnit.KILOMETERS));