Elasticsearch-89: Aggregation Examples for Nested Objects

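How do you run aggregations over data that lives inside a nested object? Using the data from the previous post as background, let's look at two cases.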

Case 1

Requirement: bucket the comments by comment date and get the average stars of the comments in each month.
Request:

GET /website/blogs/_search
{
  "size": 0,
  "aggs": {
    "comments_path": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "group_by_comments_date": {
          "date_histogram": {
            "field": "comments.date",
            "interval": "month",
            "format": "yyyy-MM"
          },
          "aggs": {
            "avg_stars": {
              "avg": {
                "field": "comments.stars"
              }
            }
          }
        }
      }
    }
  }
}

Response:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "comments_path": {
      "doc_count": 2,
      "group_by_comments_date": {
        "buckets": [
          {
            "key_as_string": "2016-09",
            "key": 1472688000000,
            "doc_count": 1,
            "avg_stars": {
              "value": 4
            }
          },
          {
            "key_as_string": "2016-10",
            "key": 1475280000000,
            "doc_count": 1,
            "avg_stars": {
              "value": 5
            }
          }
        ]
      }
    }
  }
}

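Looking at the request: the first aggs sets the nested path, the second aggs buckets the comments by month on comments.date, and the third aggs computes the average of comments.stars within each bucket.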
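To make the three levels concrete, here is a minimal Python sketch that mimics the same computation in memory. It only illustrates the semantics and is not how Elasticsearch executes the query. The sample document, its field values, and the function name avg_stars_by_month are hypothetical, chosen to be consistent with the response above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical blog document with two nested comments (assumed sample data).
blogs = [
    {
        "title": "example post",
        "comments": [
            {"date": date(2016, 9, 1), "stars": 4},
            {"date": date(2016, 10, 22), "stars": 5},
        ],
    }
]


def avg_stars_by_month(docs):
    """Mimic nested -> date_histogram(month) -> avg(stars)."""
    stars_per_month = defaultdict(list)
    for doc in docs:
        # Level 1 ("nested"): step into each comment of the parent document.
        for comment in doc["comments"]:
            # Level 2 ("date_histogram"): bucket key formatted as yyyy-MM.
            month = comment["date"].strftime("%Y-%m")
            stars_per_month[month].append(comment["stars"])
    # Level 3 ("avg"): average the stars inside each bucket.
    return {m: sum(s) / len(s) for m, s in sorted(stars_per_month.items())}


result = avg_stars_by_month(blogs)
print(result)
```

Each comment is treated as its own unit here, which is the point of the nested aggregation: the metrics are computed over the nested comment objects rather than over the parent blog document.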

Case 2

Requirement: bucket the comments by commenter age, then group by the blog's tags within each bucket.

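Here the age is a field inside comments, while tags lives outside, on the blog document itself. So how do you aggregate on a field outside the nested object? Take a look at the request first.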

GET /website/blogs/_search
{
  "size": 0,
  "aggs": {
    "comments_path": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "group_by_comments_age": {
          "histogram": {
            "field": "comments.age",
            "interval": 10
          },
          "aggs": {
            "reverse_path": {
              "reverse_nested": {},
              "aggs": {
                "group_by_tags": {
                  "terms": {
                    "field": "tags.keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "comments_path": {
      "doc_count": 2,
      "group_by_comments_age": {
        "buckets": [
          {
            "key": 20,
            "doc_count": 1,
            "reverse_path": {
              "doc_count": 1,
              "group_by_tags": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "投资",
                    "doc_count": 1
                  },
                  {
                    "key": "理财",
                    "doc_count": 1
                  }
                ]
              }
            }
          },
          {
            "key": 30,
            "doc_count": 1,
            "reverse_path": {
              "doc_count": 1,
              "group_by_tags": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "投资",
                    "doc_count": 1
                  },
                  {
                    "key": "理财",
                    "doc_count": 1
                  }
                ]
              }
            }
          }
        ]
      }
    }
  }
}

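First, the first aggs again sets the path. Drilling down, the second aggs buckets the comments by age in ranges of 10. Drilling down again, the third aggs uses reverse_nested: {}, which steps back out to the parent document so that fields outside the nested object become available. The last drill-down groups by tags; because tags is an analyzed (tokenized) field, the terms aggregation uses tags.keyword.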
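The step back out to the parent can be sketched in plain Python as well. This is a rough in-memory illustration under assumptions, not Elasticsearch's actual implementation: the sample ages (28 and 31) and the function name tags_by_age_bucket are hypothetical, picked only so that the output lines up with the response above.

```python
from collections import Counter, defaultdict

# Hypothetical parent document; ages and tags are assumed sample values.
blogs = [
    {
        "tags": ["投资", "理财"],
        "comments": [{"age": 28}, {"age": 31}],
    }
]


def tags_by_age_bucket(docs, interval=10):
    """Mimic nested -> histogram(age) -> reverse_nested -> terms(tags)."""
    # For each age bucket, remember which parent documents have a comment in it.
    parents_per_bucket = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for comment in doc["comments"]:
            bucket = (comment["age"] // interval) * interval
            parents_per_bucket[bucket].add(doc_id)
    # "reverse_nested": go back to each parent document once, then count its tags.
    result = {}
    for bucket, doc_ids in sorted(parents_per_bucket.items()):
        counts = Counter()
        for doc_id in doc_ids:
            counts.update(docs[doc_id]["tags"])
        result[bucket] = dict(counts)
    return result


tag_counts = tags_by_age_bucket(blogs)
print(tag_counts)
```

Collecting parent ids in a set mirrors the fact that the counts under reverse_path refer to parent blog documents, so a blog with several comments in the same age bucket is still counted once for each of its tags.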