Elasticsearch-26-String sorting: the problem and a solution

The string sorting problem

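Sorting on a string field often gives inaccurate results. A string field is analyzed, so its value is split into several terms, and the sort is then performed on those individual terms rather than on the whole value, which is not the result we want.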
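The effect can be illustrated with a small Python model. This is only a sketch under stated assumptions: `analyze` is a hypothetical stand-in that lowercases and splits on whitespace (a real analyzer does more), and `analyzed_sort_key` assumes the default sort mode for a multi-valued field, which uses the largest term for descending order and the smallest term for ascending order.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def analyzed_sort_key(text, order="desc"):
    """Sort key for an analyzed field: one term stands in for the whole value."""
    tokens = analyze(text)
    return max(tokens) if order == "desc" else min(tokens)


# Hypothetical titles chosen so that the two orderings disagree.
titles = ["a zebra", "b apple"]

# Descending sort on the analyzed terms: "zebra" beats "b".
by_terms = sorted(titles, key=analyzed_sort_key, reverse=True)

# Descending sort on the whole, un-analyzed string: "b apple" beats "a zebra".
by_whole_value = sorted(titles, reverse=True)

print(by_terms)
print(by_whole_value)
```

In this model the two orderings come out reversed relative to each other, which is the kind of surprise that sorting on an analyzed field can produce.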

How to solve it

The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.

Example

We created an index named website earlier, so delete it first:

DELETE /website

Then recreate the index and define the mapping by hand:

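PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "text",
          "fields": { // the key part: a second, un-analyzed sub-field inside title
            "raw": { // name of the sub-field, addressed as title.raw
              "type": "keyword" // keyword is indexed as a single term, i.e. not analyzed (replaces the legacy string type with "index": "not_analyzed")
            }
          },
          "fielddata": true // builds the forward (document-to-term) structure for the analyzed title; covered in detail later
        },
        "content": {
          "type": "text"
        },
        "post_date": {
          "type": "date"
        },
        "author_id": {
          "type": "long"
        }
      }
    }
  }
}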
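The same multi-field mapping can be sketched as a Python dictionary, which is convenient when the request body is built by a script. This is a minimal sketch, assuming Elasticsearch 5.x or later, where `keyword` is the un-analyzed string type (the counterpart of a `not_analyzed` string). The helper name `text_with_raw_subfield` is hypothetical and not part of any client library.

```python
import json


def text_with_raw_subfield(enable_fielddata=True):
    """Mapping fragment: an analyzed text field plus an un-analyzed 'raw' sub-field."""
    field = {
        "type": "text",
        # The sub-field is addressed as "<field name>.raw" in queries and sorts.
        "fields": {"raw": {"type": "keyword"}},
    }
    if enable_fielddata:
        # Only needed if you also want to sort on the analyzed field itself.
        field["fielddata"] = True
    return field


mapping = {
    "mappings": {
        "article": {
            "properties": {
                "title": text_with_raw_subfield(),
                "content": {"type": "text"},
                "post_date": {"type": "date"},
                "author_id": {"type": "long"},
            }
        }
    }
}

# The printed JSON can be used as the body of PUT /website.
print(json.dumps(mapping, indent=2))
```

If you only ever sort on `title.raw`, passing `enable_fielddata=False` leaves fielddata off for the analyzed field.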

Once the index exists, add some data to it:

PUT /website/article/1
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2017-02-01",
  "author_id": 110
}

PUT /website/article/2
{
  "title": "first article",
  "content": "this is my frist article",
  "post_date": "2017-01-01",
  "author_id": 110
}

PUT /website/article/3
{
  "title": "third article",
  "content": "this is my third article",
  "post_date": "2017-03-01",
  "author_id": 110
}

With the data in place, run a query that sorts by title:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        },
        "sort": [
          "third"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-02-01",
          "author_id": 110
        },
        "sort": [
          "second"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my frist article",
          "post_date": "2017-01-01",
          "author_id": 110
        },
        "sort": [
          "first"
        ]
      }
    ]
  }
}

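The sort value of each hit shows that the ordering was done on the analyzed terms: each title is represented by a single term, not by the whole string. Now sort on the title.raw sub-field defined in the mapping and compare the result: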

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        },
        "sort": [
          "third article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-02-01",
          "author_id": 110
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my frist article",
          "post_date": "2017-01-01",
          "author_id": 110
        },
        "sort": [
          "first article"
        ]
      }
    ]
  }
}

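Look at the sort values in this response: each one is the complete title, so this time the sort was performed directly on the whole, un-analyzed string.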
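One way to check this programmatically is to compare each hit's sort value with its stored title. The snippet below is a rough sketch: `sorted_on_whole_title` is a hypothetical helper, and the two dictionaries are trimmed-down copies of the responses shown in this post, keeping only the keys the check needs.

```python
def sorted_on_whole_title(response):
    """True if every hit was sorted on its complete title rather than on one term."""
    hits = response["hits"]["hits"]
    return all(hit["sort"] == [hit["_source"]["title"]] for hit in hits)


# Trimmed response for the sort on "title" (analyzed field).
analyzed_response = {"hits": {"hits": [
    {"_id": "3", "_source": {"title": "third article"}, "sort": ["third"]},
    {"_id": "1", "_source": {"title": "second article"}, "sort": ["second"]},
    {"_id": "2", "_source": {"title": "first article"}, "sort": ["first"]},
]}}

# Trimmed response for the sort on "title.raw" (un-analyzed sub-field).
raw_response = {"hits": {"hits": [
    {"_id": "3", "_source": {"title": "third article"}, "sort": ["third article"]},
    {"_id": "1", "_source": {"title": "second article"}, "sort": ["second article"]},
    {"_id": "2", "_source": {"title": "first article"}, "sort": ["first article"]},
]}}

print(sorted_on_whole_title(analyzed_response))
print(sorted_on_whole_title(raw_response))
```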