If you experience very high master node CPU usage and low indexing throughput, you may be seeing the effects of a put-mapping conflict. The easiest way to confirm this is to query the pending tasks API:
> curl --silent -X GET https://${ES_ENDPOINT}/_cat/pending_tasks
107800 55ms HIGH put-mapping
107801 55ms HIGH put-mapping
107802 13ms HIGH put-mapping
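If the queue is dominated by put-mapping tasks, you can get a rough count of the backlog with a quick grep (a minimal sketch reusing the same ES_ENDPOINT variable as above):
> curl --silent -X GET https://${ES_ENDPOINT}/_cat/pending_tasks | grep -c put-mapping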
The hot_threads API provides further confirmation: look for a clusterService#updateTask hot thread consuming significant CPU on the master node. This example shows 23%, but 70%+ is more common:
> curl --silent -X GET "https://${ES_ENDPOINT}/_nodes/hot_threads?pretty" | egrep 'cpu usage by thread'
61.7% (308.3ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][refresh][T#1]'
58.5% (292.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
10.1% (50.4ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
23.2% (115.8ms out of 500ms) cpu usage by thread 'elasticsearch[master_node][clusterService#updateTask][T#1]'
0.4% (1.8ms out of 500ms) cpu usage by thread 'qtp1424067142-182'
87.0% (434.9ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
11.1% (55.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
1.0% (4.9ms out of 500ms) cpu usage by thread 'qtp1888932945-545'
3.1% (15.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
2.1% (10.5ms out of 500ms) cpu usage by thread 'qtp1888932945-645'
0.7% (3.2ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
2.6% (12.8ms out of 500ms) cpu usage by thread 'qtp1888932945-629'
0.0% (209micros out of 500ms) cpu usage by thread 'MetricAggregationClientPublisher-1'
0.0% (140.6micros out of 500ms) cpu usage by thread 'qtp1424067142-76'
2.7% (13.3ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
0.5% (2.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
0.1% (275micros out of 500ms) cpu usage by thread 'MetricAggregationClientPublisher-1'
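To cut down the noise, the same check can be aimed at the elected master alone using the _master node filter of the nodes API (a sketch; the filter is part of standard Elasticsearch, though some managed services restrict node-level endpoints):
> curl --silent -X GET "https://${ES_ENDPOINT}/_nodes/_master/hot_threads" | egrep 'cpu usage by thread'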
If you are using AWS Elasticsearch with error logs enabled and shipped to CloudWatch, this CloudWatch Logs Insights query can quantify the volume of put-mapping conflicts occurring:
fields @timestamp
| filter @message like /failed to put mappings on indices/
| parse @message "[*][*][*] [*] failed to put mappings on indices [[[__PATH__]]], type [*]" as dateTimeString, severity, mappingAction, es_host, index
| sort @timestamp desc
| limit 250
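To see which indices are generating the conflicts, the same parse can feed a stats aggregation (a variation on the query above, not a verified recipe; the field names come from the parse statement):
fields @timestamp
| filter @message like /failed to put mappings on indices/
| parse @message "[*][*][*] [*] failed to put mappings on indices [[[__PATH__]]], type [*]" as dateTimeString, severity, mappingAction, es_host, index
| stats count(*) as conflicts by index
| sort conflicts desc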
The #1 priority should be to make sure that all Beats shippers are on the exact same version. Also, send each source to its own index: all Filebeat logs should go to a Filebeat index, all Metricbeat logs to a Metricbeat index, and so on. If you are using an older version of Logstash, use tags to route outputs to different indices, as in the sketch below.
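For example, a Logstash output block can route on the tags attached by the Beats configs (a minimal sketch; it assumes your Filebeat and Metricbeat configurations add "filebeat" and "metricbeat" tags, and that ES_ENDPOINT is set in the Logstash environment; adjust index names, ports, and credentials to your setup):
output {
  # Route each Beats source to its own daily index to avoid cross-source mapping conflicts
  if "filebeat" in [tags] {
    elasticsearch {
      hosts => ["https://${ES_ENDPOINT}:443"]
      index => "filebeat-%{+YYYY.MM.dd}"
    }
  } else if "metricbeat" in [tags] {
    elasticsearch {
      hosts => ["https://${ES_ENDPOINT}:443"]
      index => "metricbeat-%{+YYYY.MM.dd}"
    }
  }
}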