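1. Implementing relational-database normal forms
Normal forms --> split each data entity into its own table and link the tables through primary/foreign key relationships --> ensure there is no redundant data.
Redundant data means putting the fields you may search on and the data you want to retrieve together in the same doc.
Pros and cons of no redundant data
Pros: no redundancy, easy to maintain
Cons: joins have to be done in the application layer; if there is a lot of related data, the queries become very large and performance is poor
Pros and cons of redundant data
Pros: high performance; no need to run two searches
Cons: redundant data is costly to maintain --> whenever a user's username changes, both the user type and the blog type have to be updated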
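The trade-off above can be sketched in plain Python. This is a minimal illustration, not Elasticsearch client code. The dictionaries are hypothetical in-memory stand-ins for the users and blogs types, and the helper names are made up for the example. The normalized lookup needs two passes, while the denormalized one needs a single pass. A rename, however, has to touch every blog that copied the name.

```python
# Hypothetical in-memory stand-ins for the users and blogs types (not ES client code).
users = {3: {"name": "黃藥師"}, 2: {"name": "花無缺"}}

# Normalized: a blog stores only the id of its author.
blogs_normalized = [
    {"title": "我是黃藥師", "userId": 3},
    {"title": "花無缺的身世揭秘", "userId": 2},
]

# Denormalized: a blog also carries a copy of the author's name.
blogs_denormalized = [
    {"title": "我是黃藥師", "userInfo": {"userId": 3, "userName": "黃藥師"}},
    {"title": "花無缺的身世揭秘", "userInfo": {"userId": 2, "userName": "花無缺"}},
]


def titles_by_user_normalized(name):
    """Application-layer join: needs two separate searches."""
    # Search 1: find the ids of the users with this name.
    ids = {uid for uid, user in users.items() if user["name"] == name}
    # Search 2: find the blogs that reference one of those ids.
    return [b["title"] for b in blogs_normalized if b["userId"] in ids]


def titles_by_user_denormalized(name):
    """One search: the name is already stored inside every blog doc."""
    return [b["title"] for b in blogs_denormalized
            if b["userInfo"]["userName"] == name]


def rename_user(user_id, new_name):
    """Denormalized rename: update the user doc plus every blog that copied the name.

    Returns how many docs had to be written."""
    users[user_id]["name"] = new_name
    touched = 1
    for b in blogs_denormalized:
        if b["userInfo"]["userId"] == user_id:
            b["userInfo"]["userName"] = new_name
            touched += 1
    return touched
```

With more blogs per user, the count returned by `rename_user` grows with the number of blogs. That growth is the maintenance cost named above.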
(1) Create more test data
PUT /website/users/3
{
"name": "黃藥師",
"email": "huangyaoshi@sina.com",
"birthday": "1970-10-24"
}
PUT /website/blogs/3
{
"title": "我是黃藥師",
"content": "我是黃藥師啊,各位同學們!!!",
"userInfo": {
"userId": 3,
"userName": "黃藥師"
}
}
PUT /website/users/2
{
"name": "花無缺",
"email": "huawuque@sina.com",
"birthday": "1980-02-02"
}
PUT /website/blogs/4
{
"title": "花無缺的身世揭秘",
"content": "大家好,我是花無缺,所以我的身世是。。。",
"userInfo": {
"userId": 2,
"userName": "花無缺"
}
}
(2) Group the blogs by the user who posted them (up to 5 titles per user)
GET /website/blogs/_search
{
"size": 0,
"aggs": {
"group_by_username": {
"terms": {
"field": "userInfo.userName.keyword"
},
"aggs": {
"top_blogs": {
"top_hits": {
"_source": {
"includes": ["title"]
},
"size": 5
}
}
}
}
}
}
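This aggregation can be approximated in memory. The sketch below groups blog titles by userInfo.userName and keeps up to `size` titles per group. It is only an analogy: real terms buckets are computed per shard, and top_hits orders hits by score, whereas the sketch keeps input order.

```python
from collections import defaultdict


def group_titles_by_user(blogs, size=5):
    """Rough analogue of terms(userInfo.userName.keyword) + top_hits(title, size)."""
    groups = defaultdict(list)
    for blog in blogs:
        groups[blog["userInfo"]["userName"]].append(blog["title"])
    # terms buckets are ordered by doc_count, largest first (sorted() keeps ties stable)
    ordered = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return [
        {"key": name, "doc_count": len(titles), "top_titles": titles[:size]}
        for name, titles in ordered
    ]


blogs = [
    {"title": "我是黃藥師", "userInfo": {"userId": 3, "userName": "黃藥師"}},
    {"title": "花無缺的身世揭秘", "userInfo": {"userId": 2, "userName": "花無缺"}},
]
for bucket in group_titles_by_user(blogs):
    print(bucket["key"], bucket["doc_count"], bucket["top_titles"])
```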
2. Modeling hierarchical data such as a file system
(1) path_hierarchy: splits text that is a directory path into one token per directory level
Example:
PUT /fs
{
"settings": {
"analysis": {
"analyzer": {
"paths":{
"tokenizer":"path_hierarchy"
}
}
}
}
}
Test:
GET /fs/_analyze
{
"analyzer": "paths",
"text": "a/b/c"
}
Result:
{
"tokens": [
{
"token": "a",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "a/b",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": "a/b/c",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
}
]
}
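The token list above can be reproduced with a small Python function. This is a simplified emulation of path_hierarchy with its default `/` delimiter, written for illustration. It ignores the tokenizer's other options (such as `replacement`, `skip` and `reverse`) and is not the Lucene implementation.

```python
def path_hierarchy(text, delimiter="/"):
    """Simplified emulation of the path_hierarchy tokenizer (default settings only)."""
    prefixes = []
    for i, ch in enumerate(text):
        # Each delimiter after position 0 closes one directory prefix.
        if ch == delimiter and i > 0:
            prefixes.append(text[:i])
    prefixes.append(text)  # the full path is always the last token
    # Every token starts at offset 0; as in the _analyze output, all share position 0.
    return [
        {"token": p, "start_offset": 0, "end_offset": len(p), "position": 0}
        for p in prefixes
    ]


for t in path_hierarchy("a/b/c"):
    print(t["token"], t["start_offset"], t["end_offset"])
```

For a path with a leading slash such as `/workspace/projects/helloworld`, the tokens are `/workspace`, `/workspace/projects` and `/workspace/projects/helloworld`.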
(2) Hands-on example
PUT /fs/_mapping/file
{
"properties": {
"name": {
"type": "keyword"
},
"path": {
"type": "keyword",
"fields": {
"tree": {
"type": "text",
"analyzer": "paths"
}
}
}
}
}
PUT /fs/file/1
{
"name": "README.txt",
"path": "/workspace/projects/helloworld",
"contents": "這是我的第一個elasticsearch程序"
}
PUT /fs/file/2
{
"name": "README.txt",
"path": "/workspace/projects/helloworld2",
"contents": "這是我的第一個elasticsearch程序"
}
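File search requirement: find the files whose contents include elasticsearch and that are in the /workspace/projects/helloworld directory. Because path is a keyword field, the following query matches the path exactly, without analysis: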
GET /fs/file/_search
{
"query": {
"bool": {
"must": [
{"match": {
"contents": "elasticsearch"
}},
{"constant_score": {
"filter": {
"term": {
"path": "/workspace/projects/helloworld"
}
}
}}
]
}
}
}
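Next, search all files under the /workspace directory whose contents include elasticsearch. Changing the path in the query above to /workspace finds nothing, because the keyword field only holds the complete path. Querying the path.tree sub-field, which is analyzed with path_hierarchy, finds them: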
GET /fs/file/_search
{
"query": {
"bool": {
"must": [
{"match": {
"contents": "elasticsearch"
}},
{"constant_score": {
"filter": {
"term": {
"path.tree": "/workspace"
}
}
}}
]
}
}
}
This way both files are found.
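A small simulation shows why the keyword term misses /workspace while path.tree matches both files. The file list mirrors the two docs indexed above. Matching the contents with a plain substring test is a simplification of the match query.

```python
files = [
    {"name": "README.txt", "path": "/workspace/projects/helloworld",
     "contents": "這是我的第一個elasticsearch程序"},
    {"name": "README.txt", "path": "/workspace/projects/helloworld2",
     "contents": "這是我的第一個elasticsearch程序"},
]


def path_tokens(path, delimiter="/"):
    """Directory prefixes indexed into path.tree (default path_hierarchy behaviour)."""
    tokens = [path[:i] for i, ch in enumerate(path) if ch == delimiter and i > 0]
    return tokens + [path]


def search(word, path_value, use_tree):
    hits = []
    for f in files:
        if word not in f["contents"]:  # stand-in for the match query on contents
            continue
        if use_tree:
            ok = path_value in path_tokens(f["path"])  # term on path.tree
        else:
            ok = f["path"] == path_value  # term on the keyword field: exact value only
        if ok:
            hits.append(f["path"])
    return hits
```

Because the tokens are whole directory levels, a term on `/workspace/projects/helloworld` against path.tree still does not match `/workspace/projects/helloworld2`.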
3. Pessimistic concurrency control with a global lock, using the _create syntax
PUT /fs/lock/global/_create
{}
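fs: the index you want to lock
lock: a type you designate for holding the global lock on this index
global: the id of the doc that represents the global lock
_create: forces the request to be a create; if the /fs/lock/global doc already exists, the create fails with an error
If another thread tries to acquire the lock at the same time, it gets an error: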
PUT /fs/lock/global/_create
{}
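Pros and cons of the global lock
Pros: very simple to use, low cost
Cons: the whole index is locked, so operations on every doc in the index are blocked and the concurrency of the whole system becomes very low
This approach is convenient when locking and unlocking are infrequent and the work done while holding the lock is short.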
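After the lock is taken, another client can still operate on the index. Is something wrong?
No. This lock only makes a second _create of the lock doc fail. It is advisory, so it protects the data only if every writer tries to acquire it first. Updates and inserts sent directly to the data docs are not blocked; they can only be guarded with versions (optimistic concurrency control). When the work is finished, release the lock by deleting the lock doc: DELETE /fs/lock/global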
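The create-only behaviour that the global lock relies on can be modeled with an in-memory store. This is a sketch under assumptions:
- `LockStore`, `try_lock` and `unlock` are hypothetical names.
- The mutex plays the role of Elasticsearch's atomic single-document write.
- `ConflictError` stands for the 409 conflict that _create returns when the doc already exists.

```python
import threading


class ConflictError(Exception):
    """Stands in for the 409 conflict that _create returns if the doc exists."""


class LockStore:
    """Hypothetical in-memory stand-in for the /fs/lock type."""

    def __init__(self):
        self._docs = {}
        self._mutex = threading.Lock()  # check-and-insert must be atomic

    def create(self, doc_id, source):
        with self._mutex:
            if doc_id in self._docs:
                raise ConflictError(doc_id)
            self._docs[doc_id] = source

    def delete(self, doc_id):
        with self._mutex:
            self._docs.pop(doc_id, None)


def try_lock(store):
    """Like PUT /fs/lock/global/_create {}: True if this caller got the lock."""
    try:
        store.create("global", {})
        return True
    except ConflictError:
        return False


def unlock(store):
    """Like DELETE /fs/lock/global."""
    store.delete("global")
```

Nothing in `LockStore` stops a caller from writing data without calling `try_lock` first. This is the advisory nature described above.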
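4. Pessimistic concurrency control with document locks
(1) A document lock is acquired with a script
A document lock, as the name suggests, locks only the docs you are about to add, delete, or update. Once those docs are locked, other threads that also go through the lock cannot add, delete, or update them.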
POST /fs/lock/1/_update
{
"upsert": { "process_id": 123 },
"script": "if ( ctx._source.process_id != process_id ) { assert false }; ctx.op = 'noop';",
"params": {
"process_id": 123
}
}
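/fs/lock is fixed: the lock type under fs is dedicated to locking
/fs/lock/id, e.g. 1: the id is the id of the doc you want to lock, so this lock (itself a doc) represents the lock on that data doc
upsert: if the lock doc does not exist yet, the upsert body is indexed as-is and the script does not run; this is how the lock is acquired
params contains process_id, the unique id of the process that will add, delete, or update the doc. It is important: the lock doc records the id of the process that locked the corresponding doc, so that other processes arriving later know the doc is already locked by someone else
assert false: if the lock was not taken by the current process, throw an exception so the request fails
ctx.op = 'noop': make no modification to the lock doc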
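The upsert-plus-script logic can be traced step by step in Python. This is an illustrative model, not Elasticsearch code:
- The `locks` dict stands for the /fs/lock type.
- `LockHeldError` is a hypothetical name for the failure triggered by `assert false`.
- The returned strings are just labels for the three outcomes.

```python
class LockHeldError(Exception):
    """Raised where the script's `assert false` would fail the update."""


def lock_doc(locks, doc_id, process_id):
    """Model of POST /fs/lock/<doc_id>/_update with upsert + script."""
    if doc_id not in locks:
        # Lock doc missing: the upsert body is indexed and the script does not run.
        locks[doc_id] = {"process_id": process_id}
        return "created"
    if locks[doc_id]["process_id"] != process_id:
        # Another process holds the lock: `assert false` makes the request fail.
        raise LockHeldError(f"doc {doc_id} is locked by {locks[doc_id]['process_id']}")
    # Same process: ctx.op = 'noop', the lock doc is left unchanged.
    return "noop"
```

Calling `lock_doc` twice with the same process id is harmless, so a process can safely re-acquire a lock it already holds.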
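(2) Full document lock walkthrough
Acquire the lock. Here judge-lock is a file script, assumed to contain the same check as the inline script in (1); in ES 5.x, file scripts are read from the config/scripts directory, e.g. config/scripts/judge-lock.groovy: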
POST /fs/lock/1/_update
{
"upsert": {"process_id":321},
"script": {
"lang": "groovy",
"file": "judge-lock",
"params": {"process_id":321}
}
}
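Release the lock:
POST /fs/_refresh
A refresh on its own does not release anything; it only makes the lock docs visible to search so they can be found. The lock is released by deleting the lock doc, e.g. DELETE /fs/lock/1.
Like the global lock, this lock is advisory: it only stops other clients that also try to acquire it through the same _update. A client that writes to the data docs directly is not blocked.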
A shared lock is also built with the _update syntax; only the data written to the lock doc differs.
5. Modeling the blog-comment relationship with nested objects
(1) Why nested objects are needed
Modeling with redundant data actually uses the object type. Here we introduce another kind of object type: the nested object type.
PUT /website/blogs/6
{
"title": "花無缺發表的一篇帖子",
"content": "我是花無缺,大家要不要考慮一下投資房產和買股票的事情啊。。。",
"tags": [ "投資", "理財" ],
"comments": [
{
"name": "小魚兒",
"comment": "什么股票?推薦一下唄",
"age": 28,
"stars": 4,
"date": "2016-09-01"
},
{
"name": "黃藥師",
"comment": "我喜歡投資房產,風險大收益也大",
"age": 31,
"stars": 5,
"date": "2016-10-22"
}
]
}
Example: find blogs that have a comment written by 黃藥師 at age 28
GET /website/blogs/_search
{
"query": {
"bool": {
"must": [
{"match": {
"comments.name": "黃藥師"
}},
{
"match": {
"comments.age": 28
}
}
]
}
}
}
Logically this blog should not be returned (黃藥師 is 31; the 28-year-old commenter is 小魚兒), but it is.
(2) How object-type data is stored underneath
{
"title": [ "花無缺", "發表", "一篇", "帖子" ],
"content": [ "我", "是", "花無缺", "大家", "要不要", "考慮", "一下", "投資", "房產", "買", "股票", "事情" ],
"tags": [ "投資", "理財" ],
"comments.name": [ "小魚兒", "黃藥師" ],
"comments.comment": [ "什么", "股票", "推薦", "我", "喜歡", "投資", "房產", "風險", "收益", "大" ],
"comments.age": [ 28, 31 ],
"comments.stars": [ 4, 5 ],
"comments.date": [ 2016-09-01, 2016-10-22 ]
}
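Underneath, the object type flattens the data in a JSON array: the values of all comments are merged into one list per field, so the link between a commenter's name and age is lost. That is why the query above returns the whole blog.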
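The flattening can be reproduced with a short Python function that collects every leaf value under its dotted field path. It is a simplification: values are kept as-is rather than analyzed into tokens, and the match query is reduced to a membership test.

```python
def flatten(doc):
    """Collect leaf values per dotted path; arrays of objects are merged, as with the object type."""
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)  # array elements share the same field path
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


blog = {
    "title": "花無缺發表的一篇帖子",
    "comments": [
        {"name": "小魚兒", "age": 28},
        {"name": "黃藥師", "age": 31},
    ],
}
flat = flatten(blog)
# Each condition is checked against a merged list, so the blog matches even
# though no single comment has name 黃藥師 and age 28.
matched = "黃藥師" in flat["comments.name"] and 28 in flat["comments.age"]
```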
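(3) Use the nested object type to fix the problem caused by the object type's underlying structure
Change the mapping so that comments is of type nested instead of object. An existing field cannot be changed from object to nested. Delete the index first (DELETE /website), create it with the mapping below, and then index blog 6 again.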
PUT /website
{
"mappings": {
"blogs": {
"properties": {
"comments": {
"type": "nested",
"properties": {
"name": { "type": "text" },
"comment": { "type": "text" },
"age": { "type": "short" },
"stars": { "type": "short" },
"date": { "type": "date" }
}
}
}
}
}
}
Underneath, each nested object is stored as a separate hidden document:
{
"comments.name": [ "小魚兒" ],
"comments.comment": [ "什么", "股票", "推薦" ],
"comments.age": [ 28 ],
"comments.stars": [ 4 ],
"comments.date": [ 2016-09-01 ]
}
{
"comments.name": [ "黃藥師" ],
"comments.comment": [ "我", "喜歡", "投資", "房產", "風險", "收益", "大" ],
"comments.age": [ 31 ],
"comments.stars": [ 5 ],
"comments.date": [ 2016-10-22 ]
}
{
"title": [ "花無缺", "發表", "一篇", "帖子" ],
"content": [ "我", "是", "花無缺", "大家", "要不要", "考慮", "一下", "投資", "房產", "買", "股票", "事情" ],
"tags": [ "投資", "理財" ]
}
GET /website/blogs/_search
{
"query": {
"bool": {
"must": [
{"match": {
"title": "花無缺"
}},
{
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{"match": {
"comments.name":"黃藥師"
}},
{
"match": {
"comments.age": 28
}
}
]
}
}
}
}
]
}
}
}
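Now the blog is no longer returned. Inside a nested query, both conditions must match the same comment object, and no comment has name 黃藥師 together with age 28.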
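A nested query, by contrast, evaluates the inner query against each comment object on its own. The sketch below models the blog as one root doc plus one separate doc per comment, as in the layout shown above, and checks both conditions per comment. Like a plain object-flattening check, it compares raw values instead of analyzed tokens. `to_lucene_docs` is a hypothetical helper name.

```python
blog = {
    "title": "花無缺發表的一篇帖子",
    "comments": [
        {"name": "小魚兒", "age": 28},
        {"name": "黃藥師", "age": 31},
    ],
}


def to_lucene_docs(doc):
    """One separate doc per nested comment, plus the root doc without the comments."""
    nested = [dict(c) for c in doc["comments"]]
    root = {k: v for k, v in doc.items() if k != "comments"}
    return nested + [root]


def nested_match(doc, name, age):
    # Both conditions must hold for the same comment object.
    return any(c["name"] == name and c["age"] == age for c in doc["comments"])
```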
(4) Aggregation requirement 1: bucket the comments by comment date, one bucket per month, and get the average star rating of the comments in each month
GET /website/blogs/_search
{
"size": 0,
"aggs": {
"comments_path": {
"nested": {
"path": "comments"
},
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "comments.date",
"interval": "month",
"format": "yyyy-MM-dd"
},
"aggs": {
"avg_stars": {
"avg": {
"field": "comments.stars"
}
}
}
}
}
}
}
}
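The result of this nested date_histogram can be approximated in memory. The sketch makes these assumptions:
- Dates are ISO yyyy-MM-dd strings.
- Each bucket is keyed by the first day of its month, matching the `format` parameter.
- Unlike date_histogram, it does not emit empty months between the first and last bucket.

```python
from collections import defaultdict


def avg_stars_per_month(blogs):
    """In-memory analogue of nested(comments) > date_histogram(month) > avg(stars)."""
    stars_by_month = defaultdict(list)
    for blog in blogs:
        for comment in blog["comments"]:
            month_key = comment["date"][:7] + "-01"  # e.g. "2016-10-22" -> "2016-10-01"
            stars_by_month[month_key].append(comment["stars"])
    return {month: sum(s) / len(s) for month, s in sorted(stars_by_month.items())}


blogs = [{
    "comments": [
        {"date": "2016-09-01", "stars": 4},
        {"date": "2016-10-22", "stars": 5},
    ]
}]
print(avg_stars_per_month(blogs))  # {'2016-09-01': 4.0, '2016-10-01': 5.0}
```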
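(5) reverse_nested: inside a nested aggregation, steps back out to the parent (blog) docs so that their fields can be aggregated. Here the comments are bucketed by commenter age in steps of 10, and the blog tags are counted within each age bucket.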
GET /website/blogs/_search
{
"size": 0,
"aggs": {
"comments_path": {
"nested": {
"path": "comments"
},
"aggs": {
"group_age": {
"histogram": {
"field": "comments.age",
"interval": 10
},
"aggs": {
"reverse_path": {
"reverse_nested": {},
"aggs": {
"group_tags": {
"terms": {
"field": "tags.keyword"
}
}
}
}
}
}
}
}
}
}
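The effect of reverse_nested can be sketched in memory as follows. Comments are bucketed by age with interval 10, using key = floor(age / 10) * 10. For each bucket, the tags of the parent blogs are counted, with each blog counted at most once per bucket, since reverse_nested joins back to distinct parent docs. This is an approximation, not the aggregation engine.

```python
from collections import Counter, defaultdict


def tags_per_age_bucket(blogs, interval=10):
    """nested(comments) > histogram(age) > reverse_nested > terms(tags), in memory."""
    buckets = defaultdict(Counter)
    for blog in blogs:
        seen = set()  # each parent blog is counted once per age bucket
        for comment in blog["comments"]:
            key = (comment["age"] // interval) * interval
            if key in seen:
                continue
            seen.add(key)
            buckets[key].update(blog["tags"])
    return {key: dict(counts) for key, counts in sorted(buckets.items())}


blog = {
    "tags": ["投資", "理財"],
    "comments": [{"name": "小魚兒", "age": 28}, {"name": "黃藥師", "age": 31}],
}
print(tags_per_age_bucket([blog]))
```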
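6. Parent-child data modeling
Modeling with object and nested object has a drawback: like redundant data, it puts several pieces of data together in one doc, so the maintenance cost is relatively high.
Parent-child modeling works like normalized (normal-form) modeling in a relational database. The entities are split apart and linked through a parent-child relationship, so the data does not all have to live together. The parent doc and the child docs can each be updated without affecting the other.
(1) Case background: managing R&D center employees. An IT company has several R&D centers, and each R&D center has several employees.
Set up the relationship mapping. This is the core of parent-child modeling: several types have a parent-child relationship, and _parent specifies the parent type.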
PUT /company
{
"mappings": {
"rd_center": {},
"employee": {
"_parent": {
"type": "rd_center"
}
}
}
}
POST /company/rd_center/_bulk
{ "index": { "_id": "1" }}
{ "name": "北京研發總部", "city": "北京", "country": "中國" }
{ "index": { "_id": "2" }}
{ "name": "上海研發中心", "city": "上海", "country": "中國" }
{ "index": { "_id": "3" }}
{ "name": "硅谷人工智能實驗室", "city": "硅谷", "country": "美國" }
When docs are routed to shards, the rd_center doc with id=1 is by default routed to a shard based on its own id.
PUT /company/employee/1?parent=1
{
"name": "張三",
"birthday": "1970-10-24",
"hobby": "爬山"
}
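The core of maintaining the parent-child relationship: parent=1 specifies the id of this doc's parent doc. The parent id is also used as the child's routing value, so the employee is stored on the same shard as its rd_center parent, which has_child and has_parent rely on.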
POST /company/employee/_bulk
{ "index": { "_id": 2, "parent": "1" }}
{ "name": "李四", "birthday": "1982-05-16", "hobby": "游泳" }
{ "index": { "_id": 3, "parent": "2" }}
{ "name": "王二", "birthday": "1979-04-01", "hobby": "爬山" }
{ "index": { "_id": 4, "parent": "3" }}
{ "name": "趙五", "birthday": "1987-05-11", "hobby": "騎馬" }
(2) Verification
Search for R&D centers that have employees born in 1980 or later
GET /company/rd_center/_search
{
"query": {
"has_child": {
"type": "employee",
"query": {
"range": {
"birthday": {
"gte": "1980-01-01"
}
}
}
}
}
}
Search for R&D centers that have an employee named 張三
GET /company/rd_center/_search
{
"query": {
"has_child": {
"type": "employee",
"query": {
"match": {
"name":"張三"
}
}
}
}
}
Search for R&D centers with at least 2 employees
GET /company/rd_center/_search
{
"query": {
"has_child": {
"type": "employee",
"min_children": 2,
"query": {
"match_all": {}
}
}
}
}
Search for employees of the R&D centers in China
GET /company/employee/_search
{
"query": {
"has_parent": {
"parent_type": "rd_center",
"query": {
"term": {
"country.keyword": {
"value": "中國"
}
}
}
}
}
}
For each country, count the employees and break them down by hobby
GET /company/rd_center/_search
{
"size": 0,
"aggs": {
"group_country": {
"terms": {
"field": "country.keyword"
},
"aggs": {
"group_employee": {
"children": {
"type": "employee"
},
"aggs": {
"group_hobby": {
"terms": {
"field": "hobby.keyword"
}
}
}
}
}
}
}
}
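The semantics of has_child, has_parent and the children aggregation can be checked against the sample data with a small in-memory model. This is an illustration only:
- The dictionaries mirror the rd_center and employee docs indexed above.
- String comparison on ISO dates stands in for the range query.
- Exact equality stands in for match.

```python
from collections import Counter, defaultdict

rd_centers = {
    "1": {"name": "北京研發總部", "country": "中國"},
    "2": {"name": "上海研發中心", "country": "中國"},
    "3": {"name": "硅谷人工智能實驗室", "country": "美國"},
}
employees = [  # "parent" holds the id of the rd_center parent doc
    {"parent": "1", "name": "張三", "birthday": "1970-10-24", "hobby": "爬山"},
    {"parent": "1", "name": "李四", "birthday": "1982-05-16", "hobby": "游泳"},
    {"parent": "2", "name": "王二", "birthday": "1979-04-01", "hobby": "爬山"},
    {"parent": "3", "name": "趙五", "birthday": "1987-05-11", "hobby": "騎馬"},
]


def has_child(pred, min_children=1):
    """Ids of rd_centers with at least min_children matching employees."""
    counts = Counter(e["parent"] for e in employees if pred(e))
    return sorted(pid for pid, n in counts.items() if n >= min_children)


def has_parent(pred):
    """Names of employees whose rd_center parent matches pred."""
    return [e["name"] for e in employees if pred(rd_centers[e["parent"]])]


def hobbies_per_country():
    """terms(country) > children(employee) > terms(hobby)."""
    result = defaultdict(Counter)
    for e in employees:
        result[rd_centers[e["parent"]]["country"]][e["hobby"]] += 1
    return {country: dict(c) for country, c in result.items()}
```

For example, `has_child(lambda e: e["birthday"] >= "1980-01-01")` corresponds to the first query above and returns the ids of centers 1 and 3.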
7. Modeling and searching three-level (grandparent, parent, child) data
PUT /company
{
"mappings": {
"country": {},
"rd_center": {
"_parent": {
"type": "country"
}
},
"employee": {
"_parent": {
"type": "rd_center"
}
}
}
}
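country -> rd_center -> employee: a three-level data model. PUT /company fails if the index from section 6 still exists, so delete it first with DELETE /company.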
POST /company/country/_bulk
{ "index": { "_id": "1" }}
{ "name": "中國" }
{ "index": { "_id": "2" }}
{ "name": "美國" }
POST /company/rd_center/_bulk
{ "index": { "_id": "1", "parent": "1" }}
{ "name": "北京研發總部" }
{ "index": { "_id": "2", "parent": "1" }}
{ "name": "上海研發中心" }
{ "index": { "_id": "3", "parent": "2" }}
{ "name": "硅谷人工智能實驗室" }
PUT /company/employee/1?parent=1&routing=1
{
"name": "張三",
"dob": "1970-10-24",
"hobby": "爬山"
}
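About the routing parameter: it must be set to the grandparent's id, otherwise the three levels may end up on different shards.
A country doc is routed by its own id. An rd_center doc, given a parent, is routed by the country's id. If an employee also specified only a parent, it would be routed by the rd_center's id, so the three levels would not necessarily land on the same shard. For the grandchild, you therefore have to set routing manually to the grandparent's id.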
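The routing rule can be illustrated with a small model of shard selection. In 5.x the formula is shard = hash(routing) % number_of_primary_shards. The routing value defaults to the doc's own id, and for a child it defaults to the parent's id. The sketch makes three assumptions:
- `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses.
- The index has 5 primary shards (the 5.x default).
- Employee 5 is a hypothetical doc.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumption: the ES 5.x default number_of_shards


def routing_value(doc_id, parent=None, routing=None):
    """Explicit routing wins; otherwise the parent id; otherwise the doc's own id."""
    if routing is not None:
        return routing
    if parent is not None:
        return parent
    return doc_id


def shard_for(value):
    # crc32 is only a stand-in for the Murmur3 hash that ES really uses.
    return zlib.crc32(value.encode("utf-8")) % NUM_PRIMARY_SHARDS


# country 1 (中國) -> rd_center 2 (上海研發中心) -> hypothetical employee 5
country_shard = shard_for(routing_value("1"))
rd_center_shard = shard_for(routing_value("2", parent="1"))
employee_parent_only = shard_for(routing_value("5", parent="2"))
employee_with_routing = shard_for(routing_value("5", parent="2", routing="1"))
```

Whether `employee_parent_only` happens to equal `country_shard` depends on the hash. Only an explicit `routing="1"` guarantees that the grandchild lands on its grandparent's shard.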
Search for the countries that have employees whose hobby is 爬山 (hiking)
GET /company/country/_search
{
"query": {
"has_child": {
"type": "rd_center",
"query": {
"has_child": {
"type": "employee",
"query": {
"match": {
"hobby": "爬山"
}
}
}
}
}
}
}