Elasticsearch-26-String Sorting Problem and Solution

The string sorting problem

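If you sort on an analyzed string (text) field, the results are often wrong. The analyzer splits the value into several terms, and Elasticsearch then sorts each document by one of those terms instead of by the whole string.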
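The following minimal Python sketch imitates this behaviour under two assumptions:

* It replaces Elasticsearch's standard analyzer with a simplified one that lowercases and splits on whitespace.
* It follows the documented default for multi-valued fields: a descending sort uses each document's largest value (mode max).

The two titles are hypothetical. They show the analyzed sort and the whole-string sort disagreeing.

```python
def analyze(text: str) -> list[str]:
    """Simplified stand-in for the standard analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def sort_analyzed_desc(titles: list[str]) -> list[str]:
    """Descending sort on an analyzed field: each title is keyed by its largest term."""
    return sorted(titles, key=lambda t: max(analyze(t)), reverse=True)


def sort_raw_desc(titles: list[str]) -> list[str]:
    """Descending sort on the whole, unanalyzed string."""
    return sorted(titles, reverse=True)


titles = ["a zebra", "b apple"]  # hypothetical titles
print(sort_analyzed_desc(titles))  # ['a zebra', 'b apple']  ("zebra" > "b")
print(sort_raw_desc(titles))       # ['b apple', 'a zebra']  ("b apple" > "a zebra")
```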

How to solve it

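The usual solution is to index the string field twice:

* once analyzed, for full-text search
* once not analyzed, for sorting

In Elasticsearch this is done with a multi-field. The field itself is a text field, and a sub-field defined under its "fields" key holds the unanalyzed copy.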
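The idea can be modelled with a toy Python sketch. It is only an illustration of "one field, two indexes", not how Elasticsearch is implemented. The class DualIndexedField is hypothetical, and the analyzer is simplified to lowercase plus whitespace split. Search goes through the analyzed terms, while ordering uses the raw value.

```python
from collections import defaultdict


class DualIndexedField:
    """Hypothetical toy model of one string field indexed twice."""

    def __init__(self) -> None:
        # Analyzed copy: term -> ids of the documents containing it (used for search).
        self.inverted: dict[str, set[int]] = defaultdict(set)
        # Unanalyzed copy: doc id -> whole original value (used for sorting).
        self.raw: dict[int, str] = {}

    def index(self, doc_id: int, value: str) -> None:
        for term in value.lower().split():  # simplified analyzer
            self.inverted[term].add(doc_id)
        self.raw[doc_id] = value

    def search_sorted(self, term: str, descending: bool = True) -> list[int]:
        """Match on the analyzed terms, then order the hits by the raw value."""
        hits = self.inverted.get(term.lower(), set())
        return sorted(hits, key=lambda d: self.raw[d], reverse=descending)


field = DualIndexedField()
for doc_id, title in [(1, "second article"), (2, "first article"), (3, "third article")]:
    field.index(doc_id, title)
print(field.search_sorted("article"))  # [3, 1, 2]
```

Keeping the two copies separate lets each one be optimised for its own job. Tokens suit matching, and the whole string suits ordering.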

Example

We created a website index earlier, so delete it first:

DELETE /website

Then recreate the index and define the mapping by hand:

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title":{
          "type": "text",
          "fields": {   // Key point: define another field inside title
            "raw":{    // sub-field name; queried as title.raw
              "type": "keyword"  // keyword fields are not analyzed: the whole value is indexed as a single term
            }
          },
          "fielddata": true     // enable fielddata (an in-memory forward index) so the analyzed title itself can be sorted on; covered in detail later
        },
        "content":{
          "type": "text"
        },
        "post_date":{
          "type": "date"
        },
        "author_id":{
          "type": "long"
        }
      }
    }
  }
}

Once the index exists, add some data:

PUT /website/article/1
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2017-02-01",
  "author_id": 110
}

PUT /website/article/2
{
  "title": "first article",
  "content": "this is my frist article",
  "post_date": "2017-01-01",
  "author_id": 110
}

PUT /website/article/3
{
  "title": "third article",
  "content": "this is my third article",
  "post_date": "2017-03-01",
  "author_id": 110
}

With the data in place, run a query sorted by title:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        },
        "sort": [
          "third"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-02-01",
          "author_id": 110
        },
        "sort": [
          "second"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my frist article",
          "post_date": "2017-01-01",
          "author_id": 110
        },
        "sort": [
          "first"
        ]
      }
    ]
  }
}

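The sort column in the response shows that each document was sorted by a single analyzed term (third, second, first) rather than by its full title. For a descending sort on a multi-valued field, Elasticsearch picks each document's largest term by default. Now sort on the title.raw sub-field created above and compare: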
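The sort values above can be reproduced with a small Python sketch. It relies on two assumptions:

* The analyzer is simplified to lowercase plus whitespace split.
* Elasticsearch's documented defaults for multi-valued fields apply: max for descending sorts and min for ascending ones.

The sketch shows that the descending order only happens to look right. Each distinguishing word sorts above "article". An ascending sort on title would give every document the same key, "article".

```python
def analyze(text: str) -> list[str]:
    """Simplified stand-in for the standard analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def sort_key(text: str, order: str) -> str:
    """Key used for an analyzed field: largest term for desc, smallest term for asc."""
    terms = analyze(text)
    return max(terms) if order == "desc" else min(terms)


titles = ["second article", "first article", "third article"]
print([sort_key(t, "desc") for t in titles])  # ['second', 'first', 'third']
print([sort_key(t, "asc") for t in titles])   # ['article', 'article', 'article']
```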

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        },
        "sort": [
          "third article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-02-01",
          "author_id": 110
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my frist article",
          "post_date": "2017-01-01",
          "author_id": 110
        },
        "sort": [
          "first article"
        ]
      }
    ]
  }
}

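Look at the sort values in the response again. This time they are the full, unanalyzed titles, so the documents are sorted by the whole string rather than by individual terms.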