Back up your database

Backing up your database regularly is one of those tasks that is very easy to accomplish but is quite often neglected until the minute you face your first disaster. Backup solutions range from complex systems you might engineer down to simple ones like copying and pasting files. Depending on your application, the need for a complex solution will vary. So what could be a minimal solution that is viable and stable?

The simplest solution in my opinion should contain these elements:

  • Create a dump of the database
  • Copy the dump to another location (different from the DB server)

Now depending on your RDBMS and operating system, you can take different approaches to implement these two tasks. On a Linux system with MySQL or PostgreSQL, my approach would be to write a shell script which does these two things. One such script might look like this:
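A minimal sketch, assuming a MySQL database; the database name, user, password variable, and backup path are placeholders you would adapt:

    #!/bin/bash

    DATE=$(date +"%Y-%m-%d_%H%M%S")        # current timestamp, used in the file name

    FILE=/var/backups/db_name-$DATE.sql    # complete path of the backup file

    mysqldump --user=backup_user --password="$DB_PASSWORD" db_name > "$FILE"   # dump the database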

Line no. 3 creates the DATE variable which holds the current timestamp. It is used to construct the name of the backup file. Line no. 5 creates the FILE variable which holds the complete path of the backup file. Line no. 7 is the MySQL command to dump the database named “db_name” to the path determined in line no. 5.

So far, this code completes the first task of backing up the database. The next task is to copy it somewhere else so we don’t have the DB and the backup on the same disk. Now this “somewhere” could be some other server, some other disk, or somewhere else, e.g. cloud storage. My preferred choice would be to put it in cloud storage like AWS S3. With the AWS CLI installed, copying the backup file to an AWS S3 bucket is a one-liner:
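For example, with a placeholder bucket name:

    aws s3 cp "$FILE" s3://my-backup-bucket/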

The complete file would look like:
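    #!/bin/bash

    DATE=$(date +"%Y-%m-%d_%H%M%S")        # current timestamp, used in the file name

    FILE=/var/backups/db_name-$DATE.sql    # complete path of the backup file

    mysqldump --user=backup_user --password="$DB_PASSWORD" db_name > "$FILE"   # dump the database

    aws s3 cp "$FILE" s3://my-backup-bucket/   # copy the dump off the server

The database name, credentials, backup path, and bucket name are, again, placeholders to adapt to your setup.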

Conclusion

I know a lot of people might not agree with me that this is a good solution, but in my opinion, this is the minimum code that does the work. What you need to do next is add this script to the /etc/cron.daily folder and it will execute daily and do the backup for you (please make sure that permissions are set correctly so it can be executed). It is not the most elegant solution, but it does the job.
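For example, assuming the script was saved as backup.sh (on Debian-based systems, drop the file extension, since run-parts skips files whose names contain a dot):

    sudo cp backup.sh /etc/cron.daily/db-backup
    sudo chmod 755 /etc/cron.daily/db-backup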

As a bonus point, I would also add a “done” notification to the script which sends an email or posts to a Slack channel or whatever notification suits you, so you know that your script has executed.
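With a Slack incoming webhook, for instance, that is one more line at the end of the script (the webhook URL variable is a placeholder for your own):

    curl -X POST -H 'Content-type: application/json' \
         --data "{\"text\": \"Database backup completed: $FILE\"}" \
         "$SLACK_WEBHOOK_URL"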

ElasticSearch – Getting started

Context

A few weeks ago, our team decided to use ElasticSearch for the search functionality of the project we are implementing. I had no previous experience implementing any search engine, so I was excited to get my hands on something really new. What I am going to describe here is the process I followed to do a spike on ElasticSearch and what I learned out of it.

Why should you care about it

Elasticsearch is useful in several scenarios. It is very good for searching product catalogs, building recommendation engines, aggregating and searching logs for faster debugging, data mining and pattern identification, and much more. There is a chance the product you are working on might need such a thing.

Some basic concepts first

As an entry point to this task, I first had to learn the basic concepts of ElasticSearch. In essence, ElasticSearch is a document database which uses Lucene as a search engine. It is fast and horizontally scalable, though it is a very memory-hungry system. It stores all documents inside so-called indices. An index can be imagined as something like a database in the relational world. Inside indices, you can have types, which you might think of as database tables (not exactly the same, but a good analogy).

The data of an index is saved in one or more shards (the default is 5, but this is configurable). Every shard can have one or more replicas. A replica is just a copy of the shard, and it serves to optimize performance and failover. If you don’t have millions of documents, it might be good to have fewer shards for better performance.

Set up ElasticSearch on your computer

To start experimenting, I began by setting up an instance on my computer. Setting up ElasticSearch is pretty easy (at least in the beginning). There are numerous approaches you could follow and they are very well documented in their installation page. The approach I took was to run it as a Docker container on my local machine so I could start experimenting right away, and for production to have a Chef script do the installation for us.

If you already have Docker installed, having ElasticSearch up and running on your machine is a matter of minutes: execute a pull command to get the image
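The exact image tag below is an assumption; any 5.x tag from Elastic's registry works the same way:

    docker pull docker.elastic.co/elasticsearch/elasticsearch:5.6.16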

and running the container
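A single-node form suitable for local experiments (the container name is arbitrary; binding the transport to loopback keeps ElasticSearch in development mode so the bootstrap checks don't block startup):

    docker run -d --name elasticsearch -p 9200:9200 \
        -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" \
        docker.elastic.co/elasticsearch/elasticsearch:5.6.16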

This command runs the container and exposes ElasticSearch on port 9200. If you open http://localhost:9200 in your browser (username: elastic, password: changeme), you will receive a response like
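(illustrative; the node name, cluster UUID, and exact version numbers will differ on your machine):

    {
      "name" : "2drqXVK",
      "cluster_name" : "docker-cluster",
      "cluster_uuid" : "...",
      "version" : {
        "number" : "5.6.16",
        "lucene_version" : "6.6.1"
      },
      "tagline" : "You Know, for Search"
    }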

This shows that ElasticSearch is running inside the Docker container and you can start playing with it. You can find all these installation details on Elastic’s site.

Loading data to an index

Now that ElasticSearch is running, the next step is to populate it with some data so we can test the search functionality. At its simplest, we can communicate with ElasticSearch using the curl CLI tool, or with an HTTP tool like Postman or the Sense Chrome extension. Pushing data can be done manually, one record at a time, or in bulk. As a first step, we could create an index by executing
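A sketch using curl; the documents index and person type come from the text, the shard settings match what is described below, and the default elastic/changeme credentials are assumed:

    curl -u elastic:changeme -XPUT 'http://localhost:9200/documents' \
         -H 'Content-Type: application/json' -d'
    {
      "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
      },
      "mappings": {
        "person": {}
      }
    }'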

This will create an index named documents with two shards and one replica per shard. It will also create a type named person.

To insert a person document, we can execute a POST request with the person’s data
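The field names here are only an illustration; any JSON document works:

    curl -u elastic:changeme -XPOST 'http://localhost:9200/documents/person/1' \
         -H 'Content-Type: application/json' -d'
    {
      "first_name": "John",
      "last_name": "Doe",
      "phone": "1234567"
    }'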

The URL segments map to {index}/{type}/{id}. If we didn’t specify an id, ElasticSearch would create one for us. The response from ElasticSearch for this request would look like
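(illustrative output; the _shards numbers depend on your shard and replica settings):

    {
      "_index": "documents",
      "_type": "person",
      "_id": "1",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "created": true
    }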

From the response, I can read that I posted to the index documents and the type person, and that the request was successful.

For the sake of simplicity, we just posted the data to ElasticSearch without doing any data mapping. If we don’t do it, Elastic will do a dynamic mapping. If you want to enforce the type of the data uploaded, you might consider creating a mapping when creating the index. Please take a look at the mapping documentation for more details.

Loading more data

Posting one or two documents is easy; however, that would be of little help to really test the search functionality. To make the test more realistic, what I did was export data from our PostgreSQL database to a JSON file and then post the file to ElasticSearch using a curl command
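A sketch; people.json is a placeholder for the export file, and note that the _bulk endpoint expects newline-delimited JSON in which every document is preceded by an action line such as {"index":{}}:

    curl -u elastic:changeme -XPOST 'http://localhost:9200/documents/person/_bulk' \
         -H 'Content-Type: application/x-ndjson' --data-binary @people.json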

If the export file is too large, you might want to split it into smaller parts (I got the right size by ‘trial and error’).

Searching

ElasticSearch offers a plethora of query combinations. The query documentation is large and describes all the ways to construct complex queries with filters and aggregations. In its simplest form, for the sake of demonstration, if I want to search for the person whose phone number is ‘1234567’, the query would look like
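Assuming the field is called phone, as in the insert example above, a lightweight URI search does the job:

    curl -u elastic:changeme 'http://localhost:9200/documents/person/_search?q=phone:1234567'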

or in a more complex form
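The same search expressed with the query DSL in the request body:

    curl -u elastic:changeme -XPOST 'http://localhost:9200/documents/person/_search' \
         -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match": {
          "phone": "1234567"
        }
      }
    }'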

Either one would return the same result. For more complex queries, please consult the documentation.

Conclusion

With such a short article, I could only touch the tip of the iceberg that is ElasticSearch. This article summarises the steps you can take to bring up an instance of ElasticSearch, load some data into it, and start experimenting. Happy hacking.