Elasticsearch Put-Mapping Conflict

If you are seeing very high master CPU usage and low indexing throughput, you may be experiencing a put-mapping conflict: many concurrent mapping updates queuing up as cluster-state changes on the master. The quickest way to confirm is to query the pending_tasks API:

> curl --silent -X GET https://${ES_ENDPOINT}/_cat/pending_tasks
107800 55ms HIGH put-mapping
107801 55ms HIGH put-mapping
107802 13ms HIGH put-mapping

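A backlog like the one above can be quantified with a simple count. The pipeline below is a sketch: in practice you would pipe the live `_cat/pending_tasks` output through `grep -c`, but here it runs against the captured sample output shown above so the command is self-contained.

```shell
# Live form (assumes ES_ENDPOINT is set):
#   curl --silent "https://${ES_ENDPOINT}/_cat/pending_tasks" | grep -c put-mapping
# Same pipeline against the sample output from above:
sample='107800 55ms HIGH put-mapping
107801 55ms HIGH put-mapping
107802 13ms HIGH put-mapping'
printf '%s\n' "$sample" | grep -c put-mapping
```

A steadily growing count (rather than a handful of tasks that drain quickly) is the signature of a conflict.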
The hot_threads API provides further confirmation: look for a clusterService#updateTask thread consuming significant CPU on the master node. The example below shows 23.2%, but during a conflict 70%+ is more common:

> curl --silent -X GET https://${ES_ENDPOINT}/_nodes/hot_threads?pretty | egrep 'cpu usage by thread'
   61.7% (308.3ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][refresh][T#1]'
   58.5% (292.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
   10.1% (50.4ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
   23.2% (115.8ms out of 500ms) cpu usage by thread 'elasticsearch[master_node][clusterService#updateTask][T#1]'
    0.4% (1.8ms out of 500ms) cpu usage by thread 'qtp1424067142-182'
   87.0% (434.9ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
   11.1% (55.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
    1.0% (4.9ms out of 500ms) cpu usage by thread 'qtp1888932945-545'
    3.1% (15.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
    2.1% (10.5ms out of 500ms) cpu usage by thread 'qtp1888932945-645'
    0.7% (3.2ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
    2.6% (12.8ms out of 500ms) cpu usage by thread 'qtp1888932945-629'
    0.0% (209micros out of 500ms) cpu usage by thread 'MetricAggregationClientPublisher-1'
    0.0% (140.6micros out of 500ms) cpu usage by thread 'qtp1424067142-76'
    2.7% (13.3ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#2]'
    0.5% (2.6ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][bulk][T#1]'
    0.1% (275micros out of 500ms) cpu usage by thread 'MetricAggregationClientPublisher-1'
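Since the interesting line is buried among worker-node bulk and refresh threads, it helps to filter for the master's cluster-state update thread directly. This sketch runs the filter against a captured excerpt of the output above; the commented `curl` shows the live form under the same `ES_ENDPOINT` assumption as earlier:

```shell
# Live form:
#   curl --silent "https://${ES_ENDPOINT}/_nodes/hot_threads" \
#     | grep 'clusterService#updateTask'
# Against a sample excerpt, print just the CPU percentage of that thread:
hot_threads="   61.7% (308.3ms out of 500ms) cpu usage by thread 'elasticsearch[worker_node][refresh][T#1]'
   23.2% (115.8ms out of 500ms) cpu usage by thread 'elasticsearch[master_node][clusterService#updateTask][T#1]'"
printf '%s\n' "$hot_threads" | awk '/clusterService#updateTask/ {print $1}'
```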

If your AWS Elasticsearch domain has error logs enabled and shipped to CloudWatch, this CloudWatch Logs Insights query can quantify the volume of put-mapping conflicts occurring:

fields @timestamp
| filter @message like /failed to put mappings on indices/
| parse @message "[*][*][*] [*] failed to put mappings on indices [[[__PATH__]]], type [*]" as dateTimeString, severity, mappingAction, es_host, index
| sort @timestamp desc
| limit 250
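The same extraction can be checked locally against a raw error-log line, which is handy for validating the parse pattern before running it in Logs Insights. The log line below is an illustrative example, not real cluster output:

```shell
# Hypothetical sample of a "failed to put mappings" error-log line:
line='[2023-01-01T00:00:00,000][WARN ][o.e.c.m.MetadataMappingService] [es-master-1] failed to put mappings on indices [[[filebeat-7.10.0]]], type [_doc]'
# Extract the mapping type, mirroring the trailing "type [*]" capture above:
printf '%s\n' "$line" | sed -n 's/.*type \[\([^]]*\)\].*/\1/p'
```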

The #1 priority should be to make sure that all Beats are on the exact same version. Also, send every source to its own index: all Filebeat logs should go to a Filebeat index, all Metricbeat logs to a Metricbeat index, and so on. If using an older version of Logstash, use tags to filter outputs to different indices.
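One common way to get per-Beat indices with Logstash is to key the index name off the metadata each Beat attaches to its events. A minimal sketch of the elasticsearch output, assuming a recent Logstash and the same `ES_ENDPOINT` as above:

```text
# Route each Beat (filebeat, metricbeat, ...) to its own daily index.
output {
  elasticsearch {
    hosts => ["https://${ES_ENDPOINT}:443"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
  }
}
```

With this in place, events from each Beat carry their own mapping in their own index, so a Filebeat template change can no longer fight a Metricbeat one over the same mapping.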

