1. What is a document

In Elasticsearch, the term document has a specific meaning. It refers to the top-level, or root, object that is serialized to JSON and stored in Elasticsearch under a unique ID.

2. Document metadata

A document consists of more than its data. It also carries metadata, that is, information about the document. The three required metadata elements are:

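_index  where the document lives
_type   the class of object the document represents
_id     the unique identifier of the document

Other metadata elements are covered separately.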

3. Indexing a document

Together, the three metadata elements above uniquely identify a document. We can supply the _id ourselves or let the index API generate one for us.

PUT /{index}/{type}/{id}
{
  "field": "value",
  ...
}

Example:

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}

Response:

{
   "_index":    "website",
   "_type":     "blog",
   "_id":       "123",
   "_version":  1,
   "created":   true
}

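If we do not provide an _id, Elasticsearch generates one for us. Note that the request uses POST instead of PUT and that the URL ends at the type: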

POST /website/blog/
{
  "title": "My second blog entry",
  "text":  "Still trying this out...",
  "date":  "2014/01/01"
}

Response:

{
   "_index":    "website",
   "_type":     "blog",
   "_id":       "AVFgSgVHUP18jI2wRx0w",
   "_version":  1,
   "created":   true
}
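
The two request forms shown so far differ only in the HTTP method and in whether the path ends with an ID. As a rough sketch of how a client might choose between them (the helper name `index_request` is made up for these notes and is not part of any Elasticsearch client):

```python
from typing import Optional, Tuple


def index_request(index: str, doc_type: str,
                  doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return the (HTTP method, path) pair for indexing one document."""
    if doc_id is None:
        # No ID supplied: POST to the type and let Elasticsearch pick one.
        return "POST", f"/{index}/{doc_type}/"
    # ID supplied: PUT to the full document address.
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


print(index_request("website", "blog", "123"))
print(index_request("website", "blog"))
```

The function only builds the request line; sending the JSON body is left to whatever HTTP client is in use.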

Auto-generated IDs are URL-safe, Base64-encoded GUID strings 20 characters long.
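
To see where the length of 20 comes from: Base64 turns every 3 bytes into 4 characters, so a 20-character string carries 15 bytes. The sketch below is not Elasticsearch's ID generator. It only encodes 15 random bytes with Python's standard library to show the length and alphabet, and `random_url_safe_id` is a hypothetical name.

```python
import base64
import os


def random_url_safe_id(num_bytes: int = 15) -> str:
    """Encode random bytes as URL-safe Base64 (illustration only)."""
    raw = os.urandom(num_bytes)
    # urlsafe_b64encode uses '-' and '_' in place of '+' and '/'.
    return base64.urlsafe_b64encode(raw).decode("ascii")


doc_id = random_url_safe_id()
print(doc_id, len(doc_id))
```

Because 15 is a multiple of 3, the encoded string needs no `=` padding.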

4. Retrieving a document

GET /website/blog/123?pretty

Response:

{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 1,
  "found" :    true,
  "_source" :  {
      "title": "My first blog entry",
      "text":  "Just trying this out...",
      "date":  "2014/01/01"
  }
}

If we request a document that does not exist, we still get a JSON response body.

curl -i -XGET 'http://localhost:9200/website/blog/124?pretty'

Response:

HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83

{
  "_index" : "website",
  "_type" :  "blog",
  "_id" :    "124",
  "found" :  false
}
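
Client code therefore has to check the status code or the found flag before touching _source. A minimal sketch, assuming the response body arrives as a JSON string shaped like the two responses above (`extract_source` is a hypothetical helper, not a library function):

```python
import json
from typing import Any, Dict, Optional


def extract_source(status_code: int, body: str) -> Optional[Dict[str, Any]]:
    """Return the document's _source, or None if it was not found."""
    payload = json.loads(body)
    if status_code == 404 or not payload.get("found", False):
        return None
    return payload["_source"]


found_body = (
    '{"_index": "website", "_type": "blog", "_id": "123", "_version": 1, '
    '"found": true, "_source": {"title": "My first blog entry"}}'
)
missing_body = '{"_index": "website", "_type": "blog", "_id": "124", "found": false}'

print(extract_source(200, found_body))
print(extract_source(404, missing_body))
```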

4.1 Retrieving part of a document

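By default, a GET request returns the whole document, exactly as stored in the _source field. But perhaps you are only interested in the title field. A single field can be requested with the _source parameter, and multiple fields can be given as a comma-separated list.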

GET /website/blog/123?_source=title,text

Response:

{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 1,
  "found" :   true,
  "_source" : {
      "title": "My first blog entry" ,
      "text":  "Just trying this out..."
  }
}
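
When the field list comes from code, the query string can be assembled with the standard library. This is only a sketch: `get_path_with_source` is a made-up helper name, and it builds the path without sending anything.

```python
from typing import Iterable
from urllib.parse import urlencode


def get_path_with_source(index: str, doc_type: str, doc_id: str,
                         fields: Iterable[str]) -> str:
    """Build a GET path that asks for only the given _source fields."""
    # safe="," keeps the comma-separated list readable instead of %2C.
    query = urlencode({"_source": ",".join(fields)}, safe=",")
    return f"/{index}/{doc_type}/{doc_id}?{query}"


print(get_path_with_source("website", "blog", "123", ["title", "text"]))
```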

Or, if you want just the _source field without any metadata, you can use the _source endpoint:

GET /website/blog/123/_source

Response:

{
   "title": "My first blog entry",
   "text":  "Just trying this out...",
   "date":  "2014/01/01"
}

4. Updating a whole document

Documents in Elasticsearch are immutable; they cannot be modified. Instead, to update an existing document, we reindex or replace it, which we can do with the same index API:

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}

Response:

{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 2,
  "created":   false
}

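Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. Although you can no longer access the old version, it does not disappear immediately. As you continue to index more data, Elasticsearch cleans up these deleted documents in the background.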
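
The replace-and-mark-deleted behaviour can be pictured with a small in-memory model. This is a toy for these notes and says nothing about how Elasticsearch actually stores data; the class `ToyIndex` and its methods are invented for the illustration.

```python
from typing import Any, Dict, List, Tuple


class ToyIndex:
    """In-memory stand-in for one index; not how Elasticsearch stores data."""

    def __init__(self) -> None:
        self.live: Dict[str, Tuple[int, Dict[str, Any]]] = {}
        self.deleted: List[Tuple[str, int, Dict[str, Any]]] = []

    def index(self, doc_id: str, source: Dict[str, Any]) -> Dict[str, Any]:
        """Store a full document, replacing any existing one with this ID."""
        previous = self.live.get(doc_id)
        version = 1
        if previous is not None:
            old_version, old_source = previous
            # The old copy is never edited in place, only marked as deleted.
            self.deleted.append((doc_id, old_version, old_source))
            version = old_version + 1
        self.live[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": previous is None}

    def purge_deleted(self) -> int:
        """Drop the copies marked as deleted, like a background cleanup."""
        removed = len(self.deleted)
        self.deleted.clear()
        return removed


blog = ToyIndex()
first = blog.index("123", {"text": "Just trying this out..."})
second = blog.index("123", {"text": "I am starting to get the hang of this..."})
print(first)
print(second)
print(len(blog.deleted))
```

The second call reports version 2 with created set to false, matching the response above, while the first copy sits in the deleted list until the cleanup runs.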

5. Creating a new document

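The combination of _index, _type, and _id uniquely identifies a document. So the simplest way to make sure we are creating a new document is to use the POST form of the index request and let Elasticsearch generate a unique _id: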

POST /website/blog/
{ ... }

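However, if we already have our own _id, we must tell Elasticsearch to accept the index request only when no document with the same _index, _type, and _id exists. There are two ways to do this, and both do the same thing; which one you use is a matter of convenience. One is the op_type=create query-string parameter. The other is the /_create endpoint: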

PUT /website/blog/123/_create
{ ... }

If the request to create a new document succeeds, Elasticsearch returns the metadata and an HTTP response code of 201 Created. If, on the other hand, a document with the same _index, _type, and _id already exists, Elasticsearch responds with 409 Conflict and an error message like the following:

{
   "error": {
      "root_cause": [
         {
            "type": "document_already_exists_exception",
            "reason": "[blog][123]: document already exists",
            "shard": "0",
            "index": "website"
         }
      ],
      "type": "document_already_exists_exception",
      "reason": "[blog][123]: document already exists",
      "shard": "0",
      "index": "website"
   },
   "status": 409
}
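
On the client side, the two outcomes can be told apart by the status code alone. A minimal sketch, assuming the response body has already been parsed into a dict; `was_created` is a hypothetical helper, and treating every other status as an error is a choice made for this example.

```python
from typing import Any, Dict


def was_created(status_code: int, body: Dict[str, Any]) -> bool:
    """Return True if the create succeeded, False if the ID was taken."""
    if status_code == 201:
        return True
    if status_code == 409:
        return False
    # Anything else is unexpected for a create request in this sketch.
    raise RuntimeError(f"unexpected response {status_code}: {body}")


conflict_body = {
    "error": {
        "type": "document_already_exists_exception",
        "reason": "[blog][123]: document already exists",
    },
    "status": 409,
}
print(was_created(201, {"_id": "123", "_version": 1, "created": True}))
print(was_created(409, conflict_body))
```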

6. Deleting a document

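The syntax for deleting a document follows the same pattern we have already seen, but uses the DELETE method:

DELETE /website/blog/123

If the document is found, the response looks like this: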

{
  "found" :    true,
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 3
}

If the document is not found, the response looks like this (note that _version is still incremented):

{
  "found" :    false,
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 4
}

7. The empty search

GET /_search

Response:

{
   "hits" : {
      "total" :       14,
      "hits" : [
        {
          "_index":   "us",
          "_type":    "tweet",
          "_id":      "7",
          "_score":   1,
          "_source": {
             "date":    "2014-09-17",
             "name":    "John Smith",
             "tweet":   "The Query DSL is really powerful and flexible",
             "user_id": 2
          }
       },
        ... 9 RESULTS REMOVED ...
      ],
      "max_score" :   1
   },
   "took" :           4,
   "_shards" : {
      "failed" :      0,
      "successful" :  10,
      "total" :       10
   },
   "timed_out" :      false
}

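The most important part of the response is hits, which contains a total field giving the total number of matching documents, and a hits array holding the first ten of those documents.

max_score is the highest _score of any document that matches the query. took tells us how many milliseconds the entire search request took to execute.

_shards tells us the total number of shards involved in the query, and how many of them succeeded and how many failed. We would not normally expect shards to fail, but it can happen. If we suffered a catastrophic failure in which we lost both the primary and the replica copies of the same shard, there would be no copy of that shard left to answer search requests. In that case, Elasticsearch reports the shard as failed but continues to return results from the remaining shards.

timed_out tells us whether the query timed out. By default, search requests do not time out. If low response times matter more to you than complete results, you can specify a timeout of 10 or 10ms (10 milliseconds), or 1s (1 second):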

GET /_search?timeout=10ms

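Note that timeout does not stop the query from running. It only tells the node to return the results gathered so far and close the connection. In the background, other shards may still be executing the query even though results have already been sent. Use a timeout because your SLA (service-level agreement) matters to you, not because you want to abort long-running queries.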
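
The fields described in this section can be read out of a parsed response with a few lines of code. The sketch below assumes the response has the shape shown in this section (in particular, that hits.total is a plain number); `summarize_search` and the keys of the dict it returns are names invented for this example.

```python
from typing import Any, Dict


def summarize_search(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the headline numbers out of a parsed search response."""
    hits = response["hits"]
    shards = response["_shards"]
    return {
        "total": hits["total"],
        "returned": len(hits["hits"]),
        "max_score": hits["max_score"],
        "took_ms": response["took"],
        # Results are partial if the search timed out or any shard failed.
        "partial": response["timed_out"] or shards["failed"] > 0,
    }


sample = {
    "hits": {"total": 14, "hits": [{"_id": "7", "_score": 1}], "max_score": 1},
    "took": 4,
    "_shards": {"failed": 0, "successful": 10, "total": 10},
    "timed_out": False,
}
print(summarize_search(sample))
```

Checking the partial flag matters most when a timeout is set, since a timed-out search still returns a normal-looking body.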

8. Multiple indices, multiple types

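If we do not restrict the search to a particular index or type, it searches every document in the cluster. Elasticsearch forwards the search request to a primary or replica of every shard, gathers the overall top 10 results, and returns them to us. Often, however, you want to search within one or more specific indices, and within one or more specific types. We can do this by specifying the index and type in the URL, as follows: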

/gb/_search               # search all types in the gb index
/g*,u*/_search            # search all types in any index whose name begins with g or u
/gb/user/_search          # search the user type in the gb index
/_all/user,tweet/_search  # search the user and tweet types in all indices

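When you search within a single index, Elasticsearch forwards the request to a primary or replica of every shard in that index and then gathers the results from each shard. Searching across multiple indices works in exactly the same way; there are simply more shards involved.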
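
The URL patterns listed above follow one rule: an optional comma-separated index list, an optional comma-separated type list, then _search, with _all standing in for the index list when only types are given. A small sketch of that rule (`search_path` is a hypothetical helper, not a client API):

```python
from typing import List, Optional, Sequence


def search_path(indices: Optional[Sequence[str]] = None,
                types: Optional[Sequence[str]] = None) -> str:
    """Build the _search path for the given indices and types."""
    parts: List[str] = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # Types need an index segment in front of them; _all means every index.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["g*", "u*"]))
print(search_path(["gb"], ["user"]))
print(search_path(types=["user", "tweet"]))
```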

Elasticsearch study notes 2
