Showing posts from June, 2016

Elasticsearch Indexing Performance Cheatsheet

Do you plan to index large amounts of data in Elasticsearch? Or are you already doing so, only to find that throughput is too low? Here is a collection of tips and ideas to increase indexing throughput with Elasticsearch. Some of them I have tried successfully myself; others I have only read about and found reasonable. In any case, I hope you will find them useful.

General Performance

Before doing anything more specific, it makes sense to follow the advice given in the Elasticsearch documentation on configuration. In a nutshell:

Set the maximum number of open file descriptors for the user running Elasticsearch to at least 32k or 64k.
If possible, consider disabling swapping for the Elasticsearch process memory. Note, however, that in a virtualized environment this may not behave as expected.
Set -Xms to the same value as -Xmx (the same result can be achieved by setting the ES_HEAP_SIZE environment variable).
Leave some amount of physical memory unassigned so t
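The file descriptor and heap tips above lend themselves to a quick sanity check. The following is only a minimal sketch, not part of the original post: the function names `parse_heap_size` and `check_settings` are made up for illustration, the 32k floor is taken from the "at least 32k or 64k" advice, and the heap values passed in at the bottom are placeholder assumptions that you would replace with your own JVM settings.

```python
import re

# Floor taken from the "at least 32k or 64k" advice; 32k is assumed as the minimum.
MIN_OPEN_FILES = 32 * 1024

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_size(value):
    """Convert a JVM-style size such as '512m' or '4g' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized heap size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)


def check_settings(open_files_limit, xms, xmx):
    """Return a list of warnings for settings that contradict the tips."""
    warnings = []
    if open_files_limit < MIN_OPEN_FILES:
        warnings.append(
            f"open file limit {open_files_limit} is below {MIN_OPEN_FILES}"
        )
    if parse_heap_size(xms) != parse_heap_size(xmx):
        warnings.append(f"-Xms ({xms}) differs from -Xmx ({xmx})")
    return warnings


if __name__ == "__main__":
    import resource  # Unix-only standard library module

    # Soft limit for the current user; run this as the Elasticsearch user.
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        soft_limit = float("inf")  # unlimited always passes the check

    # Placeholder heap values (assumption); substitute your real -Xms/-Xmx.
    for warning in check_settings(soft_limit, xms="2g", xmx="2g"):
        print("WARNING:", warning)
```

Run as the user that starts Elasticsearch, the script prints nothing when both checks pass. It does not verify swapping or the amount of unassigned physical memory.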


Here’s a little story... about a developer working in a software company who was building up a nice platform. One day the boss calls the developer on the phone and says: "There’s a new task I need you to do for a really important client, and it must be done by the end of the day. All that’s needed," the boss continues, "is to add a small piece of functionality to that class method you’ve been working on... it shouldn’t be too complex..."

The days go by... and that small class, edit by edit, grows into a small code monster: the more you feed it with IFs, the bigger it grows! Do you think it’s just a fairy tale? Well, it’s not! What follows below is a single class method taken from real code that is live on a server somewhere on the Internet right now. And this is just a "baby monster". The more you feed it, the bigger it grows!

How to join the Campaign

public ArrayList eseguiAnalisi() { ArrayList listaSegnalazioni = new ArrayLi