For me tuning the threadpool search queue_size solved the issue. I tried a number of other things and this is the one that solved it.
I added this to my elasticsearch.yml
threadpool.search.queue_size: 10000
and then restarted elasticsearch.
Reasoning… (from the docs)
A node holds several thread pools in order to improve how threads
memory consumption are managed within a node. Many of these pools also
have queues associated with them, which allow pending requests to be
held instead of discarded.
and for search in particular…
For count/search operations. Defaults to fixed with a size of int((#
of available_processors * 3) / 2) + 1, queue_size of 1000.
For more information you can refer to the elasticsearch docs here…
I had trouble finding this information so I hope this helps others!