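Main topics: modifying and customizing analyzers, a brief look at the root object, and dynamic mapping
1. Modifying and customizing analyzers
1.1 The default analyzer: standard
standard tokenizer: splits text on word boundaries
standard token filter: does nothing (it was removed in Elasticsearch 7.0)
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, it
1.2 Changing analyzer settings
Enable the English stop-word token filter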
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"es_std": {
"type": "standard",
"stopwords": "_english_"
}
}
}
}
}
Try running the following two requests and compare the results
GET /my_index/_analyze
{
"analyzer": "standard",
"text": "a dog is in the house"
}
GET /my_index/_analyze
{
"analyzer": "es_std",
"text":"a dog is in the house"
}
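The difference between the two requests can be approximated outside Elasticsearch. The sketch below is a rough imitation only: the function name `standard_like_analyze` is hypothetical, the regular expression is a simplification of the real word-boundary rules, and `SOME_ENGLISH_STOPWORDS` is a small subset of the `_english_` list.

```python
import re

# Assumption: a small subset of the _english_ stop-word list, for illustration only.
SOME_ENGLISH_STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "the"}

def standard_like_analyze(text, stopwords=None):
    """Rough imitation of the standard analyzer's steps (not the real implementation)."""
    # 1. Tokenizer: split on word boundaries (approximated here with \w+).
    tokens = re.findall(r"\w+", text)
    # 2. Lowercase token filter.
    tokens = [t.lower() for t in tokens]
    # 3. Stop token filter: skipped unless a stop-word set is supplied,
    #    mirroring "disabled by default".
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    return tokens

text = "a dog is in the house"
standard_tokens = standard_like_analyze(text)
es_std_tokens = standard_like_analyze(text, SOME_ENGLISH_STOPWORDS)
print(standard_tokens)  # ['a', 'dog', 'is', 'in', 'the', 'house']
print(es_std_tokens)    # ['dog', 'house']
```

With the stop filter off, every word survives; with it on, only dog and house remain, which is the kind of difference to look for in the two _analyze responses.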
1.3 Building your own custom analyzer
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": { // define a custom char_filter that converts the & symbol to and
"&_to_and": {
"type": "mapping",
"mappings": [
"&=> and"
]
}
},
"filter": { // define custom stop words
"my_stopwords": {
"type": "stop",
"stopwords": [
"the", // the and a are the stop words
"a"
]
}
},
"analyzer": { // define the custom analyzer
"my_analyzer": {
"type": "custom",
"char_filter": [
"html_strip",
"&_to_and"
],
"tokenizer": "standard",
"filter": [
"lowercase",
"my_stopwords"
]
}
}
}
}
}
Test it and observe the result
GET /my_index/_analyze
{
"text": "tom&jerry are a friend in the house, <a>, HAHA!!",
"analyzer": "my_analyzer"
}
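The order in which my_analyzer processes text (char filters first, then the tokenizer, then token filters) can be sketched as follows. This is an approximation under stated assumptions, not Elasticsearch's code: `my_analyzer_sketch` is a hypothetical name, the tag-stripping regex is a crude stand-in for html_strip, and the replacement for & is assumed to be the bare string "and". The real _analyze output may differ in details.

```python
import re

MY_STOPWORDS = {"the", "a"}  # same words as the my_stopwords filter

def my_analyzer_sketch(text):
    """Approximate the char_filter -> tokenizer -> filter order of my_analyzer."""
    # char_filter 1: crude stand-in for html_strip (drops anything shaped like a tag).
    text = re.sub(r"<[^>]+>", "", text)
    # char_filter 2: the "&_to_and" mapping; replacement assumed to be plain "and".
    text = text.replace("&", "and")
    # tokenizer: word-boundary split, approximated with \w+.
    tokens = re.findall(r"\w+", text)
    # filter 1: lowercase.
    tokens = [t.lower() for t in tokens]
    # filter 2: custom stop words.
    return [t for t in tokens if t not in MY_STOPWORDS]

tokens = my_analyzer_sketch("tom&jerry are a friend in the house, <a>, HAHA!!")
print(tokens)  # ['tomandjerry', 'are', 'friend', 'in', 'house', 'haha']
```

Char filters run on the raw string before any tokens exist, which is why the & is rewritten before the text is split into words.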
Use the custom analyzer
PUT /my_index/_mapping
{
"properties": {
"content": { // use the custom analyzer for the content field
"type": "text",
"analyzer": "my_analyzer"
}
}
}
2. root object
2.1 The root object concept
The root object is the mapping JSON for a type. It contains properties, metadata fields (_id, _source, _type), settings (analyzer), and other settings (such as include_in_all)
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {}
}
}
}
2.2 properties
type, index, analyzer
PUT /my_index/_mapping/
{
"properties": {
"title": {
"type": "text"
}
}
}
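2.3 _source
Benefits
(1) A query returns the complete document directly; there is no need to fetch the document id first and then send a second request for the document
(2) partial update is implemented on top of _source
(3) reindex works directly from _source, so the data does not have to be queried from a database (or other external store) and modified again
(4) The returned fields can be customized based on _source
(5) Debugging a query is easier because _source is directly visible
If you do not need these benefits, you can disable _source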
PUT /my_index/_mapping
{
"_source": {
"enabled": false
}
}
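2.4 _all
Packs all fields together into a single _all field and indexes it. When a search does not specify any field, it runs against the _all field. The _all field was deprecated in Elasticsearch 6.0 and removed in 7.0, so this section applies to earlier versions.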
PUT /my_index/_mapping/my_type3
{
"_all": {"enabled": false}
}
You can also set include_in_all at the field level to control whether the field's value is included in the _all field
PUT /my_index/_mapping/my_type4
{
"properties": {
"my_field": {
"type": "text",
"include_in_all": false
}
}
}
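3. Customizing dynamic mapping
3.1 Customizing the dynamic policy
true: an unknown field triggers dynamic mapping
false: an unknown field is ignored (it is not indexed, but stays in _source)
strict: an unknown field causes an error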
PUT /my_index
{
"mappings": {
"dynamic": "strict",
"properties": {
"title": {
"type": "text"
},
"address": {
"type": "object",
"dynamic": "true"
}
}
}
}
Try inserting a document with a content field; the request is rejected because content is not allowed
PUT /my_index/_doc/1
{
"title": "my article",
"content": "this is my article",
"address": {
"province": "guangdong",
"city": "guangzhou"
}
}
The address field does not have this problem: its dynamic setting is true, so new subfields are mapped dynamically
PUT /my_index/_doc/1
{
"title": "my article",
"address": {
"province": "guangdong",
"city": "guangzhou"
}
}
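The three dynamic settings from 3.1 can be modeled in a few lines. This is a simplified model, not Elasticsearch's implementation: the function `new_fields` and its return value are hypothetical, and the only behavior it assumes is that an inner object without its own dynamic setting inherits the parent's.

```python
def new_fields(mapping, doc, inherited="true", prefix=""):
    """Return the paths of fields that would be added dynamically.

    Raises ValueError when an unknown field arrives under dynamic=strict.
    """
    policy = str(mapping.get("dynamic", inherited)).lower()
    known = mapping.get("properties", {})
    added = []
    for name, value in doc.items():
        path = prefix + name
        if name in known:
            # Known object field: recurse, passing the current policy down.
            if isinstance(value, dict):
                added += new_fields(known[name], value, policy, path + ".")
        elif policy == "strict":
            raise ValueError(f"field [{path}] is not allowed (dynamic=strict)")
        elif policy == "true":
            added.append(path)
        # policy == "false": the unknown field is silently ignored.
    return added

MAPPING = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "address": {"type": "object", "dynamic": "true"},
    },
}

ok_doc = {"title": "my article",
          "address": {"province": "guangdong", "city": "guangzhou"}}
bad_doc = dict(ok_doc, content="this is my article")

added = new_fields(MAPPING, ok_doc)
print(added)  # ['address.province', 'address.city']

try:
    new_fields(MAPPING, bad_doc)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

The document with content is rejected at the top level, while the subfields of address are accepted because address overrides the strict policy with true.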
3.2 Customizing the dynamic mapping policy
(1) date_detection
By default, strings in certain formats, such as yyyy-MM-dd, are detected as dates. If a field first receives a value like 2017-01-01, dynamic mapping makes it a date field, and a later value such as "hello world" then causes an error. You can turn off date_detection for an index and, where needed, explicitly map individual fields as date.
PUT /my_index/_mapping
{
"date_detection": false
}
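The failure described under date_detection can be reproduced with a toy model. The helper names `guess_type` and `index_values` are hypothetical, and `date.fromisoformat` stands in for Elasticsearch's own date formats, which are more extensive.

```python
from datetime import date

def guess_type(value, date_detection=True):
    """Guess a field type for a string value, the way dynamic mapping might."""
    if date_detection:
        try:
            date.fromisoformat(value)  # accepts strings such as 2017-01-01
            return "date"
        except ValueError:
            pass
    return "text"

def index_values(values, date_detection=True):
    """The first value fixes the field type; later values must fit it."""
    field_type = None
    for v in values:
        if field_type is None:
            field_type = guess_type(v, date_detection)
        elif field_type == "date" and guess_type(v) != "date":
            raise ValueError(f"cannot parse {v!r} as a date")
    return field_type

values = ["2017-01-01", "hello world"]

try:
    index_values(values)  # field becomes date, second value fails
    failed = False
except ValueError:
    failed = True

type_without_detection = index_values(values, date_detection=False)
print(failed, type_without_detection)  # True text
```

With detection off, the first value is treated as ordinary text, so the later "hello world" value fits the field.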
(2) Define your own dynamic mapping template (type level)
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"longs_as_strings": {
"match_mapping_type": "string",
"match": "long_*",
"unmatch": "*_text",
"mapping": {
"type": "long"
}
}
}
]
}
}
Insert data
PUT my_index/_doc/1
{
"long_num": "5",
"long_text": "foo"
}
long_num is mapped as long
long_text keeps the default string mapping
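Why the two fields end up with different types follows from how match and unmatch combine. The sketch below imitates that decision with Python's `fnmatchcase`; the function `resolve_type` is hypothetical, and treating the patterns as shell-style wildcards is an approximation of the template's simple * matching.

```python
from fnmatch import fnmatchcase

# Same values as the longs_as_strings template.
TEMPLATE = {
    "match_mapping_type": "string",
    "match": "long_*",
    "unmatch": "*_text",
    "mapping": {"type": "long"},
}

def resolve_type(field, value, template=TEMPLATE):
    """Return the template's type if the field qualifies, else the detected type."""
    detected = "string" if isinstance(value, str) else "other"
    if (detected == template["match_mapping_type"]
            and fnmatchcase(field, template["match"])
            and not fnmatchcase(field, template["unmatch"])):
        return template["mapping"]["type"]
    return detected

long_num_type = resolve_type("long_num", "5")
long_text_type = resolve_type("long_text", "foo")
print(long_num_type)   # long
print(long_text_type)  # string
```

long_text does match long_*, but it also matches the unmatch pattern *_text, so the template is skipped for it.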
For more options, see the official documentation
Dynamic templates | Elasticsearch Reference [7.6] | Elastic https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dynamic-templates.html