ES October 10, 2021

Search and Query

Query context

Query example

GET kibana_sample_data_ecommerce/_search
{
  "size": 1
}


{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4675,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "VB2OXXwBYKHeDs_3_B9c",
        "_score" : 1.0,
        "_source" : {
          "category" : [
            "Men's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Eddie",
          "customer_full_name" : "Eddie Underwood",
          "customer_gender" : "MALE",
          "customer_id" : 38,
          "customer_last_name" : "Underwood",
          "customer_phone" : "",
          "day_of_week" : "Monday",
          "day_of_week_i" : 0,
          "email" : "eddie@underwood-family.zzz",
          "manufacturer" : [
            "Elitelligence",
            "Oceanavigations"
          ],
          "order_date" : "2021-10-18T09:28:48+00:00",
          "order_id" : 584677,
          "products" : [
            {
              "base_price" : 11.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Elitelligence",
              "tax_amount" : 0,
              "product_id" : 6283,
              "category" : "Men's Clothing",
              "sku" : "ZO0549605496",
              "taxless_price" : 11.99,
              "unit_discount_amount" : 0,
              "min_price" : 6.35,
              "_id" : "sold_product_584677_6283",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Basic T-shirt - dark blue/white",
              "price" : 11.99,
              "taxful_price" : 11.99,
              "base_unit_price" : 11.99
            },
            {
              "base_price" : 24.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Oceanavigations",
              "tax_amount" : 0,
              "product_id" : 19400,
              "category" : "Men's Clothing",
              "sku" : "ZO0299602996",
              "taxless_price" : 24.99,
              "unit_discount_amount" : 0,
              "min_price" : 11.75,
              "_id" : "sold_product_584677_19400",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Sweatshirt - grey multicolor",
              "price" : 24.99,
              "taxful_price" : 24.99,
              "base_unit_price" : 24.99
            }
          ],
          "sku" : [
            "ZO0549605496",
            "ZO0299602996"
          ],
          "taxful_total_price" : 36.98,
          "taxless_total_price" : 36.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "eddie",
          "geoip" : {
            "country_iso_code" : "EG",
            "location" : {
              "lon" : 31.3,
              "lat" : 30.1
            },
            "region_name" : "Cairo Governorate",
            "continent_name" : "Africa",
            "city_name" : "Cairo"
          }
        }
      }
    ]
  }
}

Structure

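{
  "took" : 0,                  -- time taken by the request, in milliseconds
  "timed_out" : false,         -- whether the request timed out
  "_shards" : {                -- shards involved in the request
    "total" : 1,               -- 1 shard in total
    "successful" : 1,          -- 1 succeeded
    "skipped" : 0,             -- 0 skipped
    "failed" : 0               -- 0 failed
  },
  "hits" : {                   -- search results
    "total" : {                -- hit count
      "value" : 4675,          -- 4675 matching documents
      "relation" : "eq"        -- the count is exact ("gte" would mean it is a lower bound)
    },
    "max_score" : 1.0,         -- highest score among the returned hits
    "hits" : [                 -- the returned documents
      {
        "_index" : "kibana_sample_data_ecommerce",    -- index the document belongs to
        "_type" : "_doc",                -- mapping type: custom types were allowed before 7.0, from 7.0 it is always _doc
        "_id" : "VB2OXXwBYKHeDs_3_B9c",  -- document id
        "_score" : 1.0,                  -- relevance score; by default hits are sorted by score, highest first
        "_source" : {                    -- the original document as it was indexed (abbreviated here)
          "customer_full_name" : "Eddie Underwood",
          "customer_gender" : "MALE",
          "customer_id" : 38,
          "customer_last_name" : "Underwood"
        }
      }
    ]
  }
}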

Relevance score

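For each matching document, Elasticsearch computes a relevance score from the query conditions. When no sort order is specified, results are listed by relevance score, from highest to lowest.

Before Elasticsearch 5.0, the default similarity used to compute relevance scores was TF/IDF. Since 5.0, the default is BM25.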
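To make BM25 concrete, here is a minimal Python sketch of the textbook formula applied to pre-tokenized documents. It assumes whitespace tokenization and the common defaults k1 = 1.2 and b = 0.75. Lucene's actual implementation differs in details such as how document length is encoded, so these numbers will not match Elasticsearch's _score exactly.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against query_terms with textbook BM25."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for doc in docs if t in doc) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue  # a term that does not occur adds nothing
            # Rare terms get a larger IDF than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            length_norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / (freq + length_norm)
        scores.append(score)
    return scores


docs = [
    "huawei mate book".split(),
    "huawei mate pad".split(),
    "huawei mate phone".split(),
]
print(bm25_scores("huawei mate book".split(), docs))
```

The first document scores highest because book is the rarest query term. The other two documents tie, because huawei and mate occur in every document and so carry the same low IDF.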

Metadata

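  1. Disabling _source
    Pros: saves the overhead of storing and returning data that queries do not need.

    Cons (you lose the following):

    • The update, update_by_query and reindex APIs.
    • Highlighting.
    • The ability to reindex in order to change mappings or analyzers, or to upgrade an index to a new major version.
    • The ability to debug queries or aggregations by viewing the original document used at index time.
    • Possibly, in the future, automatic repair of index corruption.

    Summary: if disk space is the concern, a higher index compression level (index.codec: best_compression) is a better choice than disabling _source.

    GET kibana_sample_data_ecommerce/_search
    {
      "_source": false,   -- query time only: do not return _source (it is still stored)
      "size": 1
    }
    Result: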
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 6,
        "successful" : 6,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4744,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : ".kibana-event-log-7.8.0-000001",
            "_type" : "_doc",
            "_id" : "Ih2OXXwBYKHeDs_3uR8r",
            "_score" : 1.0,
            "_source" : { }         -- no _source fields are returned
          }
        ]
      }
    }
    
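  2. Source filtering

    includes: which fields to return in the result.

    excludes: which fields to leave out of the result. This only affects what is returned; the excluded fields can still be searched.

    Usage:

    1. Define the filter in the mapping. Wildcards are supported, but this approach is not recommended: the excluded fields are removed from the stored _source itself rather than just hidden, and a mapping is hard to change once the index exists.

      PUT user  -- create the user index with a _source filter in its mappings
      {
        "mappings": {
          "_source": {
            "includes": [
              "name",
              "age"
            ],
            "excludes": [
              "sex",
              "birth"
            ]
          }
        }
      }
      
      PUT user/_doc/1  -- index one document
      {
        "name": "空痕影",
        "age": 18,
        "birth": "2000-12-12",
        "sex": "男"
      }
      
      GET user/_search  -- search
      Result: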
      
      {
        "took" : 880,
        "timed_out" : false,
        "_shards" : {
          "total" : 1,
          "successful" : 1,
          "skipped" : 0,
          "failed" : 0
        },
        "hits" : {
          "total" : {
            "value" : 1,
            "relation" : "eq"
          },
          "max_score" : 1.0,
          "hits" : [
            {
              "_index" : "user",
              "_type" : "_doc",
              "_id" : "1",
              "_score" : 1.0,
              "_source" : {    -- only name and age are shown; birth and sex are not
                "name" : "空痕影",
                "age" : 18
              }
            }
          ]
        }
      }
      
    2. Specify _source dynamically at query time

      • "_source": false

      • "_source": "obj.*"

      • "_source": ["obj1.*", "obj2.*"]

      • "_source": { "includes": ["obj1.*", "obj2.*"], "excludes": ["*.obj3"] }

        Note: if includes and excludes overlap, excludes wins, so fields matched by both are not returned.

      
      GET user/_search
      {
        "_source": {
          "includes": ["name","age","birth"],
          "excludes": ["age"]
        }
      }
      Result:
      {
        "took" : 0,
        "timed_out" : false,
        "_shards" : {
          "total" : 1,
          "successful" : 1,
          "skipped" : 0,
          "failed" : 0
        },
        "hits" : {
          "total" : {
            "value" : 1,
            "relation" : "eq"
          },
          "max_score" : 1.0,
          "hits" : [
            {
              "_index" : "user",
              "_type" : "_doc",
              "_id" : "1",
              "_score" : 1.0,
              "_source" : {
                "name" : "空痕影",
                "birth" : "2000-12-12"
              }
            }
          ]
        }
      }
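The include/exclude rules can be modeled in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation. It assumes shell-style wildcards (via fnmatch), treats a pattern as matching a field when it matches the field's dotted path or any parent object path, and applies excludes last so they win on overlap. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def _flatten(obj, prefix=""):
    """Yield (dotted_path, value) for every leaf of a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            yield from _flatten(value, path + ".")
        else:
            yield path, value


def _matches(path, patterns):
    """True if the dotted path, or any parent object path, matches a pattern."""
    parts = path.split(".")
    candidates = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(c, p) for c in candidates for p in patterns)


def filter_source(source, includes=(), excludes=()):
    """Simplified model of _source includes/excludes; excludes win on overlap."""
    result = {}
    for path, value in _flatten(source):
        if includes and not _matches(path, includes):
            continue  # not requested
        if _matches(path, excludes):
            continue  # explicitly excluded, even if also included
        # Rebuild the nested structure for the kept leaf.
        node = result
        *parents, leaf = path.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


doc = {"name": "空痕影", "age": 18, "birth": "2000-12-12", "sex": "男"}
print(filter_source(doc, includes=["name", "age", "birth"], excludes=["age"]))
# {'name': '空痕影', 'birth': '2000-12-12'}
```

This reproduces the query-time result above: age is both included and excluded, so it is dropped.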
      

Query String

  • Match all documents

    GET user/_search

  • Search a specific field (on text fields the value is still analyzed, so this is not a strict exact match)

    GET user/_search?q=name:空痕影

  • With pagination and sorting

    GET user/_search?from=0&size=2&sort=age:asc

    Note: when results are sorted by a field, _score is returned as null. Add track_scores=true if you still need scores.

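  • Search without naming a field, which is equivalent to searching all indexed fields (older versions did this through the _all field, which was removed in 7.0)

    GET user/_search?q=空痕影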
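When these URI searches are sent from outside Kibana (curl, a script, an HTTP client), the q value has to be URL-encoded, because it contains ':' and non-ASCII characters. Here is a minimal Python sketch that uses only the standard library. The host http://localhost:9200 and the helper name search_url are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base, index, params=None):
    """Build a URI-search URL. urlencode percent-encodes ':' and non-ASCII text.
    A dict is used because "from" is a Python keyword and cannot be a kwarg."""
    url = f"{base}/{index}/_search"
    if params:
        url += "?" + urlencode(params)
    return url


base = "http://localhost:9200"  # assumed local cluster address
print(search_url(base, "user"))
url = search_url(base, "user", {"q": "name:空痕影"})
print(url)
print(parse_qs(urlsplit(url).query))  # decoding round-trips to the original value
print(search_url(base, "user", {"from": 0, "size": 2, "sort": "age:asc"}))
```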

Full-text queries

  • match: analyzes the query text and matches documents containing any of the resulting terms (OR by default)

    GET user/_search
    {
      "query": {
       "match":{
         "device": "huawei mate book"
       }
      }
    }
    Result:
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.1143606,
        "hits" : [
          {
            "_index" : "user",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.1143606,
            "_source" : {
              "name" : "空痕影2",
              "age" : 15,
              "birth" : "2011-12-12",
              "sex" : "男",
              "device" : "huawei mate book"
            }
          },
          {
            "_index" : "user",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.13353139,
            "_source" : {
              "name" : "空痕影1",
              "age" : 18,
              "birth" : "2010-12-12",
              "sex" : "男",
              "device" : "huawei mate pad"
            }
          },
          {
            "_index" : "user",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 0.13353139,
            "_source" : {
              "name" : "空痕影3",
              "age" : 20,
              "birth" : "1996-12-12",
              "sex" : "男",
              "device" : "huawei mate phone"
            }
          }
        ]
      }
    }
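    Analysis: the default standard analyzer splits the query text into three terms, huawei, mate and book, and matches them against the analyzed terms of the device field. Document 2 contains all three terms and scores highest; documents 1 and 3 contain only huawei and mate.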
    
  • match_all: matches every document

    GET user/_search
    {
      "query": {
       "match_all":{}
      }
    }
    
  • multi_match: runs a match query against several fields

    // Finds documents whose name or desc field contains the term "3".
    GET user/_search
    {
      "query": {
        "multi_match": {
          "query": "3",
          "fields": ["name","desc"]
        }
      }
    }
    Result:
    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 1.3862942,
        "hits" : [
          {
            "_index" : "user",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.3862942,
            "_source" : {
              "name" : "空痕影3",
              "age" : 20,
              "birth" : "1996-12-12",
              "sex" : "男",
              "device" : "huawei mate phone"
            }
          },
          {
            "_index" : "user",
            "_type" : "_doc",
            "_id" : "4",
            "_score" : 0.6931471,
            "_source" : {
              "name" : "空痕影4",
              "age" : 20,
              "birth" : "1996-12-12",
              "sex" : "男",
              "device" : "huawei mate phone",
              "desc" : "这是第3条数据"
            }
          }
        ]
      }
    }
    
    
  • match_phrase: phrase query

    // Finds documents in which every term of the phrase appears, in order and adjacent: here, mate immediately followed by book.
    
    GET user/_search
    {
      "query": {
        "match_phrase": {
          "device": "mate book"
        }
      }
    }
    
    Result:
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.3093333,
        "hits" : [
          {
            "_index" : "user",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.3093333,
            "_source" : {
              "name" : "空痕影2",
              "age" : 15,
              "birth" : "2011-12-12",
              "sex" : "男",
              "device" : "huawei mate book"
            }
          }
        ]
      }
    }
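The behavior of match, multi_match and match_phrase can be modeled with a toy analyzer. This Python sketch is a simplification under stated assumptions. analyze is a hypothetical stand-in for the standard analyzer. The per-field score in multi_match only counts matching terms instead of computing BM25. The "xiaomi pad" device is hypothetical data added to show a non-match. The operator argument mirrors the real operator option of match ("or" by default, or "and"), and best_fields is the default multi_match type.

```python
import re


def analyze(text):
    """Toy stand-in for the standard analyzer: lowercase, keep runs of
    Latin letters/digits, and emit each CJK character as its own token."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


def match(value, query, operator="or"):
    """match: analyze both sides; any query term (or, with "and", every term) must occur."""
    terms = set(analyze(value))
    hits = [t in terms for t in analyze(query)]
    return all(hits) if operator == "and" else any(hits)


def multi_match_best_fields(doc, query, fields):
    """multi_match, best_fields type: the score is that of the best single field.
    Here a field's score is just the number of query terms it contains."""
    q = analyze(query)
    return max(sum(t in set(analyze(doc.get(f, ""))) for t in q) for f in fields)


def match_phrase(value, phrase):
    """match_phrase with the default slop of 0: phrase tokens must appear in order and adjacent."""
    tokens, target = analyze(value), analyze(phrase)
    n = len(target)
    return n > 0 and any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


devices = ["huawei mate pad", "huawei mate book", "huawei mate phone", "xiaomi pad"]
print([d for d in devices if match(d, "huawei mate book")])
print([d for d in devices if match(d, "huawei mate book", operator="and")])
print([d for d in devices if match_phrase(d, "mate book")])

user4 = {"name": "空痕影4", "desc": "这是第3条数据"}
print(multi_match_best_fields(user4, "3", ["name", "desc"]))
```

With the default OR, every huawei device matches. With "and", or with match_phrase, only "huawei mate book" matches.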
    

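Exact-value queries: term

term: matches documents in which an indexed term is exactly equal to the search term.

Differences between term and match_phrase:

  • match_phrase analyzes the search text. Every resulting term must appear among the field's analyzed terms, in the same order, and by default the terms must be adjacent.
  • term does not analyze the search value. Whether the field value in the source document was analyzed is controlled by the field mapping, for example by using a keyword field.

term and keyword both mean "not analyzed", but they apply to different sides:

  • term: the search value is not analyzed.
  • keyword: a field type; the field value in the source data is indexed as a single term, without analysis.
// The ik analyzer splits "NFC手机" into two terms: nfc and 手机.
// This matches documents whose analyzed name terms include the complete term "nfc手机"; with only nfc and 手机 indexed, nothing matches.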
GET product/_search
{
  "query": {
    "term": {
      "name": {
        "value": "nfc手机"
      }
    }
  }
}


// keyword: the value is indexed as-is, without analysis
// Matches documents whose original name value is exactly "nfc手机" (case-sensitive)
GET product/_search
{
  "query": {
    "term": {
      "name.keyword": {
        "value": "nfc手机"
      }
    }
  }
}
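The difference between querying name and name.keyword can be shown with a small Python model. It is a sketch under loud assumptions: WORDS is a hypothetical two-word dictionary standing in for ik's word list, the document value "NFC手机" is hypothetical, and keyword is modeled without a normalizer, so it is case-sensitive.

```python
import re

# Hypothetical tiny dictionary standing in for ik's word list.
WORDS = ["手机", "小米"]
TOKEN = re.compile("|".join(WORDS) + r"|[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text):
    """Toy ik-like analyzer: lowercase, dictionary words first, then Latin/digit
    runs, then single CJK characters."""
    return TOKEN.findall(text.lower())


def index_name(value):
    """What gets indexed for a text field "name" with a "name.keyword" sub-field."""
    return {
        "name": set(analyze(value)),  # text: analyzed terms
        "name.keyword": {value},      # keyword: the whole value, unchanged
    }


def term_query(indexed, field, value):
    """term: the search value is NOT analyzed; it must equal an indexed term."""
    return value in indexed[field]


doc = index_name("NFC手机")
print(sorted(doc["name"]))
for field, value in [("name", "nfc手机"), ("name", "nfc"), ("name", "NFC"),
                     ("name.keyword", "nfc手机"), ("name.keyword", "NFC手机")]:
    print(field, value, term_query(doc, field, value))
```

Note the "NFC" case: because term does not analyze its input, the uppercase value does not match the lowercased term nfc that the analyzer indexed.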

terms: matches documents containing any one of the listed terms

// Finds products whose name field contains the term 小米 or the term nfc.
GET product/_search
{
  "query": {
    "terms": {
      "name": [
        "小米",
        "nfc"
      ]
    }
  }
}
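The "any of" semantics of terms fits in one line of Python. In this sketch the products and their indexed term sets are hypothetical, as if an analyzer had already processed each name. The listed values are compared as-is, without analysis, just like in term.

```python
def terms_query(indexed_terms, values):
    """terms: match if ANY of the (unanalyzed) values equals an indexed term."""
    return not set(indexed_terms).isdisjoint(values)


# Hypothetical products and the terms an analyzer might have produced for "name".
products = {
    "1": {"小米", "nfc", "手机"},
    "2": {"华为", "nfc", "手机"},
    "3": {"苹果", "耳机"},
}
print([pid for pid, terms in products.items() if terms_query(terms, ["小米", "nfc"])])
```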

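range: range queries

// Finds products priced from 1000 to 3000, inclusive (gte = greater than or equal, lte = less than or equal)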
GET product/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 3000
      }
    }
  }
}    
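The four bound parameters of range differ only in whether the bound itself is included. The following Python sketch models that rule. in_range is a hypothetical helper, not part of any Elasticsearch client.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Model of range query bounds: gte/lte are inclusive, gt/lt are exclusive."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


prices = [999, 1000, 2499, 3000, 3001]
print([p for p in prices if in_range(p, gte=1000, lte=3000)])  # [1000, 2499, 3000]
print([p for p in prices if in_range(p, gt=1000, lt=3000)])    # [2499]
```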

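Filters

The main differences between query and filter:

  • A query runs in query context and asks "how well does this document match?": it computes a relevance score for every hit.
  • A filter runs in filter context and asks "does this document match, yes or no?": it computes no score.
  • Filter results can also be cached, which makes repeated filters faster.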
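The split between scoring and yes/no filtering, and why only the latter is easy to cache, can be sketched in Python. The documents are hypothetical. functools.lru_cache stands in for Elasticsearch's node query cache, which in reality is more selective about which filters it caches.

```python
from functools import lru_cache

# Hypothetical documents: id -> fields.
DOCS = {
    1: {"sex": "男", "device": "huawei mate pad"},
    2: {"sex": "男", "device": "huawei mate book"},
    3: {"sex": "女", "device": "huawei mate book"},
}


@lru_cache(maxsize=None)
def filter_ids(field, value):
    """Filter context: a yes/no test per document, no score. The result depends
    only on (field, value), so it can be cached and reused."""
    return frozenset(i for i, d in DOCS.items() if d.get(field) == value)


def query_scores(term):
    """Query context: each matching document also gets a score (a toy term count)."""
    scores = {}
    for i, d in DOCS.items():
        count = d["device"].split().count(term)
        if count:
            scores[i] = count
    return scores


def search(term, sex):
    """Score with the query, then keep only documents that pass the (cached) filter."""
    allowed = filter_ids("sex", sex)
    return {i: s for i, s in query_scores(term).items() if i in allowed}


print(search("book", "男"))
print(search("mate", "男"))
print(filter_ids.cache_info().hits)  # the second search reused the cached filter
```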

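Compound queries: bool query

A bool query combines multiple query clauses. It follows a "more matches is better" approach: the scores of all matching must and should clauses are added together to form the document's score.

// Structure:
{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "filter" :   [],
      "must_not" : []
   }
}
  • must: the clause must appear in matching documents and contributes to the score.
  • filter: the clause must appear in matching documents, but unlike must, its score is ignored. Filter clauses run in filter context, so scoring is skipped and the clauses are considered for caching.
  • should: the clause should appear in matching documents (logical OR); matching should clauses add to the score.
  • must_not: the clause must not appear in matching documents. It runs in filter context, so scoring is ignored and the clause is considered for caching; it contributes a score of 0.

minimum_should_match: the number or percentage of should clauses that a returned document must match. If the bool query has at least one should clause and no must or filter clauses, the default is 1; otherwise the default is 0.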
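The clause rules and the minimum_should_match defaults above can be summarized in a small Python model. It is a sketch, not Elasticsearch's implementation. Clauses are hypothetical callables that return a score, or None for no match. term_clause is a made-up helper. Percentages are rounded down, as Elasticsearch documents for positive percentages. The parameter is named filter_ to avoid shadowing Python's built-in filter.

```python
def resolve_msm(msm, n_should, has_must_or_filter):
    """How many should clauses must match. Default: 1 if there is at least one
    should clause and no must/filter clause, otherwise 0. "NN%" rounds down."""
    if msm is None:
        return 1 if n_should and not has_must_or_filter else 0
    if isinstance(msm, str) and msm.endswith("%"):
        return n_should * int(msm[:-1]) // 100
    return int(msm)


def bool_query(doc, must=(), filter_=(), should=(), must_not=(),
               minimum_should_match=None):
    """Score one document, or return None if it is excluded.
    Each clause is a callable: doc -> score (float) if it matches, else None."""
    # must and filter are both required; filter results are not scored.
    if any(c(doc) is None for c in must) or any(c(doc) is None for c in filter_):
        return None
    # must_not excludes the document and never contributes a score.
    if any(c(doc) is not None for c in must_not):
        return None
    should_scores = [s for s in (c(doc) for c in should) if s is not None]
    required = resolve_msm(minimum_should_match, len(should), bool(must or filter_))
    if len(should_scores) < required:
        return None
    # "More matches is better": matching must and should scores are summed.
    return sum(c(doc) for c in must) + sum(should_scores)


def term_clause(field, value, score=1.0):
    """Hypothetical helper: matches when value is among the doc's terms for field."""
    return lambda doc: score if value in doc.get(field, ()) else None


doc = {"device": {"huawei", "mate", "book"}, "sex": {"男"}}
print(bool_query(doc, must=[term_clause("device", "huawei")],
                 should=[term_clause("device", "book"), term_clause("device", "pad")]))
```

With a filter clause present, the default minimum_should_match drops to 0, so should clauses only add to the score and a document can match without any of them.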
