ElasticSearch – Getting started

Context

A few weeks ago, our team decided to use ElasticSearch for the search functionality of the project we are implementing. I had no previous experience implementing any search engine, so I was excited to get my hands on something really new. What I am going to describe here is the process I followed to do a spike on ElasticSearch and what I learned from it.

Why should you care about it

ElasticSearch is useful in several scenarios. It is very good for searching product catalogs, building recommendation engines, aggregating and searching logs for faster debugging, data mining and pattern identification, and much more. There is a chance the product you are working on might need such a thing.

Some basic concepts first

As an entry point to this task, I first had to learn the basic concepts of ElasticSearch. In essence, ElasticSearch is a document database that uses Lucene as a search engine. It is fast and horizontally scalable, though it is a very memory-hungry system. It stores all documents inside so-called indices. An index can be imagined as something like a database in the relational world. Inside indices, you can have types, which you might think of as database tables (not exactly like that, but a good analogy).

The data of an index is saved in one or more shards (the default is 5, but this is configurable). Every shard can have one or more replicas. A replica is just a copy of a shard, and it serves to optimize performance and provide failover. If you don't have millions of records, fewer shards might give you better performance.

Setting up ElasticSearch on your computer

To start experimenting, I set up an instance on my computer. Setting up ElasticSearch is pretty easy (at least at the beginning). There are numerous approaches you could follow, and they are very well documented on the installation page. The approach I took was to run it as a Docker container on my local machine so I could start experimenting right away, and to have a Chef script do the installation for production.

If you already have Docker installed, getting ElasticSearch up and running on your machine is a matter of minutes: first, execute the pull command to get the image
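A sketch of what that looks like, assuming the 5.x image from Elastic's registry (pick the tag that matches the version you want):

```
docker pull docker.elastic.co/elasticsearch/elasticsearch:5.6.16
```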

and running the container
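Something like the following should work; the container name and the 9300 transport-port mapping are my choices, not requirements:

```
# run detached, exposing the HTTP API (9200) and the transport port (9300)
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 \
  docker.elastic.co/elasticsearch/elasticsearch:5.6.16
```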

This command runs the container and exposes ElasticSearch on port 9200. If you open http://localhost:9200 in your browser (username: elastic, password: changeme), you will receive a response like
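The exact values (node name, cluster UUID, and so on) will differ on your machine, but the shape of the response is roughly:

```
{
  "name" : "qEhPzCl",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "x0EGlEnQcW...",
  "version" : {
    "number" : "5.6.16"
  },
  "tagline" : "You Know, for Search"
}
```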

This shows that ElasticSearch is running inside the Docker container and you can start playing with it. You can find all of these installation details on Elastic's site.

Loading data to an index

Now that ElasticSearch is running, the next step is to populate it with some data so we can test the search functionality. At its simplest, we can communicate with ElasticSearch using the curl CLI tool, or with an HTTP tool like Postman or Sense (a Chrome extension). Data can be pushed manually one record at a time, or in bulk. As a first step, we could create an index by executing
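For example (the settings here are my assumptions, matching the shard and replica counts described below; the empty person mapping simply declares the type):

```
curl -XPUT 'http://localhost:9200/documents' -u elastic:changeme \
  -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "person": {}
  }
}'
```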

This will create an index named documents with two shards and one replica per shard. It will also create a type named person.

To insert a person document, we can execute a POST request with the person's data
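A sketch with made-up person fields (first_name, last_name, and phone are my examples, not a required schema):

```
curl -XPOST 'http://localhost:9200/documents/person/1' -u elastic:changeme \
  -H 'Content-Type: application/json' -d '
{
  "first_name": "John",
  "last_name": "Doe",
  "phone": "1234567"
}'
```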

The URL segments map to {index}/{type}/{id}. If we didn't specify an id, ElasticSearch would create one for us. The response from ElasticSearch for this request would look like
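For a 5.x cluster, the acknowledgement looks roughly like this:

```
{
  "_index": "documents",
  "_type": "person",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
```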

From the response, I can read that I posted to the documents index and the person type, and that the request was successful.

For the sake of simplicity, we just posted the data to ElasticSearch without doing any data mapping. If we don't provide one, ElasticSearch will create a dynamic mapping. If you want to enforce the types of the uploaded data, you might consider creating a mapping when creating the index, as sketched below. Please take a look at the mapping documentation for more details.
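A minimal sketch of an explicit mapping at index-creation time; the field names and types here are my assumptions, not a prescribed schema:

```
curl -XPUT 'http://localhost:9200/documents' -u elastic:changeme \
  -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "first_name": { "type": "text" },
        "last_name":  { "type": "text" },
        "phone":      { "type": "keyword" }
      }
    }
  }
}'
```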

Loading more data

Posting one or two documents is easy; however, that would be of little help in really testing the search functionality. To make the test more realistic, what I did was export data from our PostgreSQL database to a JSON file and then post the file to ElasticSearch using a curl command
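A sketch of the bulk upload, assuming a file named persons.json; note that the _bulk endpoint expects newline-delimited JSON, where every document is preceded by an action line such as {"index": {}}:

```
curl -XPOST 'http://localhost:9200/documents/person/_bulk' -u elastic:changeme \
  -H 'Content-Type: application/x-ndjson' --data-binary '@persons.json'
```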

If the export file is too large, you might want to split it into smaller parts (I got the right size by trial and error).

Searching

ElasticSearch offers a plethora of query combinations. The query documentation is large and describes all the ways to construct complex queries with filters and aggregations. In its simplest form, for the sake of demonstration, if I want to search for the person whose phone number is '1234567', the query would look like
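In the URI-search form, assuming the hypothetical phone field from earlier, it could be as short as:

```
curl 'http://localhost:9200/documents/person/_search?q=phone:1234567' -u elastic:changeme
```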

or in a more complex form
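using the full query DSL in the request body (a match query is one option; term or bool variants would also work):

```
curl -XPOST 'http://localhost:9200/documents/person/_search' -u elastic:changeme \
  -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "phone": "1234567"
    }
  }
}'
```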

Either one returns the same result. For more complex queries, please consult the documentation.

Conclusion

In such a short article, I could only scratch the tip of the iceberg that is ElasticSearch. This article summarises the steps you can take to bring up an instance of ElasticSearch, load some data into it, and start experimenting. Happy hacking.
