Elasticsearch-69: In-Depth Data Aggregation Analysis II

Common metric operations

The previous article used the avg and count operations. In general, these are the most commonly used metric operations:

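  • count: the number of documents. When you group with terms, every bucket automatically gets a doc_count, which is effectively the count
  • avg: the average of the specified field's values within a bucket
  • max: the maximum of the specified field's values within a bucket
  • min: the minimum of the specified field's values within a bucket
  • sum: the sum of the specified field's values within a bucket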

Example

Requirement: for each TV color, count the TVs sold and compute the average, maximum, minimum, and total price.

Request body:

GET tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "max_price": {
          "max": {
            "field": "price"
          }
        },
        "min_price": {
          "min": {
            "field": "price"
          }
        },
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Response:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "color": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "红色",
          "doc_count": 4,
          "max_price": {
            "value": 8000
          },
          "min_price": {
            "value": 1000
          },
          "avg_price": {
            "value": 3250
          },
          "sum_price": {
            "value": 13000
          }
        },
        {
          "key": "绿色",
          "doc_count": 2,
          "max_price": {
            "value": 3000
          },
          "min_price": {
            "value": 1200
          },
          "avg_price": {
            "value": 2100
          },
          "sum_price": {
            "value": 4200
          }
        },
        {
          "key": "蓝色",
          "doc_count": 2,
          "max_price": {
            "value": 2500
          },
          "min_price": {
            "value": 1500
          },
          "avg_price": {
            "value": 2000
          },
          "sum_price": {
            "value": 4000
          }
        }
      ]
    }
  }
}
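
The request above is a terms bucket aggregation with four metric sub-aggregations. As a rough mental model, the Python sketch below performs the same computation in memory. It groups documents by color, then computes count, avg, max, min, and sum for each group. The eight (color, price) pairs are an assumption: the original does not list the indexed documents, so these values were chosen only to be consistent with the response shown. The function name terms_with_metrics is hypothetical, and the sketch illustrates the semantics rather than how Elasticsearch actually executes the aggregation.

```python
from collections import defaultdict

# Hypothetical (color, price) pairs for the 8 TVs: an assumption chosen to
# match the response above, not data given in the original article.
TVS = [
    ("红色", 1000), ("红色", 2000), ("绿色", 3000), ("蓝色", 1500),
    ("绿色", 1200), ("红色", 2000), ("红色", 8000), ("蓝色", 2500),
]


def terms_with_metrics(docs):
    """Group by color (like terms), then compute metrics per bucket."""
    buckets = defaultdict(list)
    for color, price in docs:
        buckets[color].append(price)

    result = {}
    for color, prices in buckets.items():
        result[color] = {
            "doc_count": len(prices),               # count
            "avg_price": sum(prices) / len(prices),  # avg
            "max_price": max(prices),                # max
            "min_price": min(prices),                # min
            "sum_price": sum(prices),                # sum
        }
    # Order by doc_count descending, ties by key, mirroring the response.
    ordered = sorted(result.items(), key=lambda kv: (-kv[1]["doc_count"], kv[0]))
    return dict(ordered)


if __name__ == "__main__":
    for color, metrics in terms_with_metrics(TVS).items():
        print(color, metrics)
```

With these assumed documents, the buckets come out in the same order and with the same values as the response above. Note that avg is returned as a float (3250.0) in Python.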

histogram

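All of the requests above group with terms, which puts documents that share the same field value into the same bucket. histogram, by contrast, divides documents into buckets by value ranges.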

For example, suppose we need to compute sales volume and revenue for each price range:

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "histogram": {
        "field": "price",
        "interval": 2000
      },
      "aggs": {
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Response:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_price": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 3,
          "sum_price": {
            "value": 3700
          }
        },
        {
          "key": 2000,
          "doc_count": 4,
          "sum_price": {
            "value": 9500
          }
        },
        {
          "key": 4000,
          "doc_count": 0,
          "sum_price": {
            "value": 0
          }
        },
        {
          "key": 6000,
          "doc_count": 0,
          "sum_price": {
            "value": 0
          }
        },
        {
          "key": 8000,
          "doc_count": 1,
          "sum_price": {
            "value": 8000
          }
        }
      ]
    }
  }
}

Let's look more closely at this part of the request body:

"histogram": {
  "field": "price",
  "interval": 2000
}

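Like terms, histogram is a bucket aggregation. field is the field to group by, and interval is the width of each range. With an interval of 2000, as in our request, the ranges are 0-2000, 2000-4000, 4000-6000, and so on. Each range includes its lower bound and excludes its upper bound, so a price of exactly 2000 falls into the 2000-4000 bucket. Every document is placed into a bucket according to its price value. Once the buckets exist, metric operations are applied to them exactly as before.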
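
The bucketing rule can be sketched in Python. Elasticsearch's histogram documentation describes a bucket's key as floor((value - offset) / interval) * interval + offset, where offset defaults to 0. The sketch below applies that rule, then fills in the empty buckets between the lowest and highest key, which matches the 4000 and 6000 buckets in the response above. The price list is an assumption reconstructed to match that response, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical prices of the 8 TVs (an assumption; not listed in the original).
PRICES = [1000, 2000, 3000, 1500, 1200, 2000, 8000, 2500]


def histogram_key(value, interval, offset=0):
    """Lower bound of the bucket [key, key + interval) that value falls into."""
    return math.floor((value - offset) / interval) * interval + offset


def histogram_sum(prices, interval):
    """Bucket prices by range and sum each bucket, also emitting the empty
    buckets between the smallest and the largest non-empty key."""
    counts, sums = Counter(), Counter()
    for p in prices:
        k = histogram_key(p, interval)
        counts[k] += 1
        sums[k] += p
    lo, hi = min(counts), max(counts)
    return [
        {"key": k, "doc_count": counts[k], "sum_price": sums[k]}
        for k in range(lo, hi + 1, interval)
    ]


if __name__ == "__main__":
    for bucket in histogram_sum(PRICES, 2000):
        print(bucket)
```

The half-open ranges explain why the two 2000-yuan TVs are counted in the 2000 bucket rather than the 0 bucket.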

date histogram

Requirement: count TV sales per month

date histogram divides documents into buckets at a fixed date interval. We specify a date-type field and the interval to use.

Request:

GET tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "sale": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      }
    }
  }
}

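In the request, interval is month, so buckets are divided by calendar month. For example, 2017-01-01 to 2017-01-31 forms one bucket. Elasticsearch then checks each document's sold_date value to determine which bucket it falls into.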

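min_doc_count: setting it to 0 means a bucket is returned even when no documents fall into its interval, instead of being filtered out. This is also what makes extended_bounds useful, because empty buckets are never returned when min_doc_count is greater than 0.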

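extended_bounds: extends the range over which buckets are created to the given start and end dates. As a result, every month from 2016-01 through 2017-12 gets a bucket, even months with no data. It extends the range rather than filtering it: documents dated outside these bounds would still get their own buckets.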

Response:

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sale": {
      "buckets": [
        {
          "key_as_string": "2016-01-01",
          "key": 1451606400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-02-01",
          "key": 1454284800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-03-01",
          "key": 1456790400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-04-01",
          "key": 1459468800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-05-01",
          "key": 1462060800000,
          "doc_count": 1
        },
        {
          "key_as_string": "2016-06-01",
          "key": 1464739200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-07-01",
          "key": 1467331200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2016-08-01",
          "key": 1470009600000,
          "doc_count": 1
        },
        {
          "key_as_string": "2016-09-01",
          "key": 1472688000000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-10-01",
          "key": 1475280000000,
          "doc_count": 1
        },
        {
          "key_as_string": "2016-11-01",
          "key": 1477958400000,
          "doc_count": 2
        },
        {
          "key_as_string": "2016-12-01",
          "key": 1480550400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-01-01",
          "key": 1483228800000,
          "doc_count": 1
        },
        {
          "key_as_string": "2017-02-01",
          "key": 1485907200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2017-03-01",
          "key": 1488326400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-04-01",
          "key": 1491004800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-05-01",
          "key": 1493596800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-06-01",
          "key": 1496275200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-07-01",
          "key": 1498867200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-08-01",
          "key": 1501545600000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-09-01",
          "key": 1504224000000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-10-01",
          "key": 1506816000000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-11-01",
          "key": 1509494400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-12-01",
          "key": 1512086400000,
          "doc_count": 0
        }
      ]
    }
  }
}

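In the response, key_as_string is the bucket's start date rendered with the requested format, key is the same instant as a 13-digit epoch timestamp in milliseconds, and doc_count is the number of documents in the bucket.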
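
The following Python sketch mimics this monthly date histogram. It truncates each sold_date to the first day of its month and emits every month from the extended_bounds minimum to the maximum, including empty months as min_doc_count: 0 requires. It then computes key as UTC midnight in epoch milliseconds. It assumes the UTC time zone, which is the date_histogram default when time_zone is not set. The sold_date values are hypothetical, chosen to match the response above, and the helper names are illustrative only.

```python
from collections import Counter
from datetime import date, datetime, timezone

# Hypothetical sold_date values of the 8 TVs (an assumption; not in the original).
SOLD_DATES = ["2016-10-28", "2016-11-05", "2016-05-18", "2016-07-02",
              "2016-08-19", "2016-11-05", "2017-01-01", "2017-02-12"]


def month_start(d):
    """First day of the month containing d."""
    return d.replace(day=1)


def next_month(d):
    """First day of the month after d's month."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def to_epoch_millis(d):
    """Bucket key: midnight UTC of the bucket start, in epoch milliseconds."""
    dt = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def monthly_histogram(dates, bounds_min, bounds_max):
    """Count docs per calendar month, emitting every month in the bounds
    (even empty ones), like min_doc_count 0 plus extended_bounds."""
    counts = Counter(month_start(date.fromisoformat(s)) for s in dates)
    # extended_bounds widens the range; it never drops months that have data.
    m = min([month_start(date.fromisoformat(bounds_min)), *counts])
    last = max([month_start(date.fromisoformat(bounds_max)), *counts])
    buckets = []
    while m <= last:
        buckets.append({
            "key_as_string": m.isoformat(),  # like "format": "yyyy-MM-dd"
            "key": to_epoch_millis(m),
            "doc_count": counts[m],
        })
        m = next_month(m)
    return buckets


if __name__ == "__main__":
    for bucket in monthly_histogram(SOLD_DATES, "2016-01-01", "2017-12-31"):
        print(bucket)
```

For example, the first bucket's key, 1451606400000, is 2016-01-01T00:00:00Z expressed in milliseconds.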

Case study

Requirement: compute each brand's sales revenue in each quarter

Request:

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_quarter": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "sum_of_price": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "total_sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

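The request first groups documents by quarter. Each quarter bucket then has two drill-down sub-aggregations. group_by_brand groups the quarter's documents by brand, and total_sum_price computes the quarter's total revenue. group_by_brand drills down one more level with sum_of_price to compute each brand's revenue within the quarter.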

Response:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_quarter": {
      "buckets": [
        {
          "key_as_string": "2016-01-01",
          "key": 1451606400000,
          "doc_count": 0,
          "total_sum_price": {
            "value": 0
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key_as_string": "2016-04-01",
          "key": 1459468800000,
          "doc_count": 1,
          "total_sum_price": {
            "value": 3000
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "小米",
                "doc_count": 1,
                "sum_of_price": {
                  "value": 3000
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2016-07-01",
          "key": 1467331200000,
          "doc_count": 2,
          "total_sum_price": {
            "value": 2700
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "TCL",
                "doc_count": 2,
                "sum_of_price": {
                  "value": 2700
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2016-10-01",
          "key": 1475280000000,
          "doc_count": 3,
          "total_sum_price": {
            "value": 5000
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "长虹",
                "doc_count": 3,
                "sum_of_price": {
                  "value": 5000
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2017-01-01",
          "key": 1483228800000,
          "doc_count": 2,
          "total_sum_price": {
            "value": 10500
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "三星",
                "doc_count": 1,
                "sum_of_price": {
                  "value": 8000
                }
              },
              {
                "key": "小米",
                "doc_count": 1,
                "sum_of_price": {
                  "value": 2500
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2017-04-01",
          "key": 1491004800000,
          "doc_count": 0,
          "total_sum_price": {
            "value": 0
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key_as_string": "2017-07-01",
          "key": 1498867200000,
          "doc_count": 0,
          "total_sum_price": {
            "value": 0
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key_as_string": "2017-10-01",
          "key": 1506816000000,
          "doc_count": 0,
          "total_sum_price": {
            "value": 0
          },
          "group_by_brand": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        }
      ]
    }
  }
}
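
A small Python sketch can reproduce the nesting in this response. It builds quarter buckets (date_histogram with interval quarter), brand buckets inside each quarter (terms), and price sums at both levels (sum). The documents are hypothetical, reconstructed to be consistent with the responses in this article. The function names are illustrative. Brand buckets are ordered by doc_count descending with ties broken by key, mirroring the order seen in the response.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (brand, sold_date, price) documents: an assumption chosen to
# match the responses in this article, not data given in the original.
DOCS = [
    ("长虹", "2016-10-28", 1000), ("长虹", "2016-11-05", 2000),
    ("小米", "2016-05-18", 3000), ("TCL", "2016-07-02", 1500),
    ("TCL", "2016-08-19", 1200), ("长虹", "2016-11-05", 2000),
    ("三星", "2017-01-01", 8000), ("小米", "2017-02-12", 2500),
]


def quarter_start(d):
    """First day of the calendar quarter containing d."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def next_quarter(q):
    """First day of the quarter after the one starting at q."""
    return date(q.year + (q.month == 10), (q.month + 2) % 12 + 1, 1)


def quarterly_brand_revenue(docs, bounds_min, bounds_max):
    """date_histogram(quarter) -> terms(brand) -> sum(price), plus a
    per-quarter total_sum_price, emitting empty quarters (min_doc_count 0)."""
    # quarter start -> brand -> list of prices
    quarters = defaultdict(lambda: defaultdict(list))
    for brand, sold, price in docs:
        quarters[quarter_start(date.fromisoformat(sold))][brand].append(price)

    # Like extended_bounds: widen the range, never drop quarters with data.
    q = min([quarter_start(date.fromisoformat(bounds_min)), *quarters])
    last = max([quarter_start(date.fromisoformat(bounds_max)), *quarters])
    result = []
    while q <= last:
        brands = quarters.get(q, {})
        brand_buckets = [
            {"key": b, "doc_count": len(p), "sum_of_price": sum(p)}
            for b, p in brands.items()
        ]
        # doc_count descending, ties by key, mirroring the response above
        brand_buckets.sort(key=lambda x: (-x["doc_count"], x["key"]))
        result.append({
            "key_as_string": q.isoformat(),
            "doc_count": sum(len(p) for p in brands.values()),
            "total_sum_price": sum(sum(p) for p in brands.values()),
            "group_by_brand": brand_buckets,
        })
        q = next_quarter(q)
    return result


if __name__ == "__main__":
    for quarter in quarterly_brand_revenue(DOCS, "2016-01-01", "2017-12-31"):
        print(quarter)
```

Each quarter's total_sum_price equals the sum of its brands' sum_of_price values, because both are computed over the same documents in that quarter bucket.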