Elasticsearch-38 - Hands-on case study - term filter search

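Until now we have tested the ES APIs with ad hoc demos. From this post on, we will work through a single case study to use these APIs in more depth, and later implement the same features with the Java API.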

Scenario

We use an IT forum as the setting, define its search requirements, and implement them.

Test data
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }

We use the _bulk API to add the data. For now each document has only these fields: articleID, userID, hidden, and postDate.

Once that has run, let's look at the mapping that dynamic mapping created for us.

GET /forum/_mapping/article

Response:

{
  "forum": {
    "mappings": {
      "article": {
        "properties": {
          "articleID": {
            "type": "text",
            "fields": {
              "keyword": { // 1
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "hidden": {
            "type": "boolean"
          },
          "postDate": {
            "type": "date"
          },
          "userID": {
            "type": "long"
          }
        }
      }
    }
  }
}

Look at the spot marked 1: "articleID" has type text, yet it also contains an "articleID.keyword" field. What is that for?

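In newer versions of ES, when dynamic mapping maps a string field as type=text, it sets up two fields by default. One is the field itself, such as "articleID", which is analyzed. The other is field.keyword, such as "articleID.keyword", which is not analyzed. The keyword sub-field also carries the setting "ignore_above": 256, which means a string longer than 256 characters is not indexed in that sub-field at all (it is skipped, not truncated).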
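As a rough illustration of ignore_above, the following Python sketch models what a keyword sub-field would index for one string value. It assumes the documented behaviour that over-long values are skipped rather than truncated; the function name keyword_terms is hypothetical and is not part of any ES client.

```python
from typing import List

IGNORE_ABOVE = 256  # the value shown in the mapping above


def keyword_terms(value: str, ignore_above: int = IGNORE_ABOVE) -> List[str]:
    """Model the terms a keyword sub-field would index for one string value."""
    if len(value) > ignore_above:
        # Too long: the value is skipped entirely, so no term is indexed.
        return []
    # Short enough: the whole string is indexed verbatim as a single term.
    return [value]


if __name__ == "__main__":
    print(keyword_terms("XHDK-A-1293-#fJ3"))
    print(keyword_terms("x" * 300))
```

The second call returns an empty list, which is why a term query on the keyword sub-field cannot find a value longer than the limit.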

Using the term filter

term filter/query: the search text is not analyzed. It is looked up in the inverted index exactly as given, so whatever you type is what gets matched.
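To make the idea of an exact lookup concrete, here is a toy inverted index in Python built from the userID and hidden values of the test data. It is only a sketch of the concept, not how ES or Lucene store data internally; build_index and term_lookup are hypothetical names.

```python
from collections import defaultdict

# userID and hidden values copied from the test data above.
DOCS = {
    1: {"userID": 1, "hidden": False},
    2: {"userID": 1, "hidden": False},
    3: {"userID": 2, "hidden": False},
    4: {"userID": 2, "hidden": True},
}


def build_index(docs, field):
    """Map each exact field value to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        index[source[field]].add(doc_id)
    return index


def term_lookup(index, value):
    """Look the value up as-is, with no analysis, like a term filter."""
    return sorted(index.get(value, set()))


if __name__ == "__main__":
    print(term_lookup(build_index(DOCS, "userID"), 1))      # [1, 2]
    print(term_lookup(build_index(DOCS, "hidden"), False))  # [1, 2, 3]
```

The lookup is a single dictionary access on the unmodified value, which is the property the requirements below rely on.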

Requirement 1: search posts by user ID
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "userID": 1
        }
      }
    }
  }
}
Requirement 2: search for posts that are not hidden
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "hidden": false
        }
      }
    }
  }
}
Requirement 3: search posts by post date
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "postDate": "2017-01-01"
        }
      }
    }
  }
}
Requirement 4: search posts by article ID
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "articleID": "XHDK-A-1293-#fJ3"
        }
      }
    }
  }
}

Response:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

As you can see, there are no hits at all, even though the document exists. Why?

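When the data was added, string values were analyzed by default before the inverted index was built, so the index holds the resulting tokens rather than the original string. term does not analyze the search text, so the full string "XHDK-A-1293-#fJ3" matches none of those tokens and nothing is found.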
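The mismatch can be illustrated with a rough Python approximation of the default analyzer. This sketch only splits on non-alphanumeric characters and lowercases the pieces; the real standard analyzer follows Unicode segmentation rules and handles many more cases, so treat approximate_standard_analyzer as a hypothetical stand-in.

```python
import re


def approximate_standard_analyzer(text):
    """Split on non-alphanumeric characters and lowercase each piece.

    This is a simplification of the real analyzer, good enough for ASCII ids.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


if __name__ == "__main__":
    article_id = "XHDK-A-1293-#fJ3"
    tokens = approximate_standard_analyzer(article_id)
    print(tokens)                # ['xhdk', 'a', '1293', 'fj3']
    # A term lookup compares the unanalyzed query text against these tokens.
    print(article_id in tokens)  # False
```

Under this approximation the index contains four short lowercase tokens, and the original id is not among them, which matches the empty result above.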

We can search on the keyword field that ES created automatically, as shown in the mapping above.

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "articleID.keyword": "XHDK-A-1293-#fJ3"
        }
      }
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01"
        }
      }
    ]
  }
}

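Now the document is found. There is a catch, though: with ignore_above set to 256, the keyword sub-field does not index values longer than 256 characters, so a very long value still cannot be found. In that case it is better to rebuild the index and set the mapping manually.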

Delete the index

DELETE /forum

Create the index manually, declaring articleID as not analyzed

PUT /forum
{
  "mappings": {
    "article": {
      "properties": {
        "articleID": {
          "type": "keyword"
        }
      }
    }
  }
}

Then add the test data above again.
Now query by articleID once more.

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "articleID": "XHDK-A-1293-#fJ3"
        }
      }
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01"
        }
      }
    ]
  }
}

This time the query finds the document.

Summary

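  1. term filter: searches by exact value. Number, boolean, and date types support it naturally.
  2. A string field that needs exact matching must be mapped as not analyzed when the index is created before term can be used on it: "index": "not_analyzed" on a string field in older versions, or simply type keyword in newer versions.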
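Every request in this post wraps a single term clause in the same constant_score filter. As a small convenience, the wrapper could be generated by a helper like the one below. This is only a sketch that builds the JSON body; term_filter_query is a hypothetical name, and sending the body to ES is left out on purpose.

```python
import json


def term_filter_query(field, value):
    """Build the constant_score + term filter body used throughout this post."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "term": {field: value}
                }
            }
        }
    }


if __name__ == "__main__":
    # The bodies for requirements 1 and 2, printed as JSON.
    print(json.dumps(term_filter_query("userID", 1), indent=2))
    print(json.dumps(term_filter_query("hidden", False), indent=2))
```

The printed JSON can be pasted as the body of GET /forum/article/_search.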