A Quick Introduction to Elasticsearch for Node Developers

Elasticsearch <3 NodeJS, find out why


If you ask ten different developers “What is Elasticsearch?”, don’t be surprised if each of them gives a different answer. None of them is wrong, of course; Elasticsearch is simply that versatile. This versatility has made Elasticsearch a popular solution to many programming problems.

If you have wanted to experiment with this technology but haven’t had the chance yet, this tutorial will introduce you to the basic concepts of Elasticsearch. Then, we will use it to build a simple search engine with Node.js.


What is Elasticsearch?

This is a question that could have ten different answers. If I had to condense them into a single statement, I would say that Elasticsearch is a distributed, open-source search and analytics engine.

Elasticsearch is built on top of Apache Lucene. It stores data in the JSON format in a structure based on documents. In this regard, it is similar to a NoSQL database like MongoDB.

You can store and search a massive amount of data with Elasticsearch in near real time. It also provides a REST API for storing and searching data. Elasticsearch is highly scalable, and its distributed backend allows tasks such as search, indexing, and analytics to be spread across a cluster of nodes.

To better understand how it works internally, let’s look at some of the core components and concepts used by Elasticsearch.

Cluster

A cluster is similar to any other cluster we see in a distributed system. It’s a collection of nodes, which are individual Elasticsearch servers.

Node

A node is a single instance of Elasticsearch where you can store, index, and search data. Nodes can play different roles.

Each cluster has a master node that is responsible for cluster-wide management and configurations. Data nodes in a cluster are responsible for dealing with data like performing CRUD operations and responding to search and data aggregation queries.

You can send queries to any node in a cluster, but some nodes, called client nodes (also known as coordinating nodes), forward the requests they receive to the master node or the data nodes without processing them on their own.

Document

Documents are the basic unit of information that a node can index. Data in a document is stored in the JSON format. For example, if you are building an E-commerce website, you can store the details of each product in a different document. Data can be of different types, from numbers to texts to dates.

Index

The concept of an index in Elasticsearch is different from databases like MySQL.

In Elasticsearch, an index is a collection of documents that have similar characteristics. For example, in the previous E-commerce website, you can create an index of products, with all the individual product documents.

Each index has a unique name. You can use this name when performing CRUD or search operations on its documents. It is also the highest level of entity you can query in Elasticsearch.

Shards and Replicas

You can divide a single index into multiple parts, which are called shards. Each shard behaves as a fully-functional and independent “index.” You can distribute shards of an index across multiple nodes, ensuring that the data stored in a single node doesn’t exceed its capacity.

You can protect the system against node failures, which could leave individual shards inaccessible, by creating “replicas” of shards. They are redundant copies of the shards stored in your cluster of nodes. Replicas help to scale up the query capacity of the cluster as well.
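As a rough illustration of how shard and replica counts are expressed, the sketch below builds the settings body that Elasticsearch accepts when an index is created. The helper name buildIndexSettings and the counts (3 shards, 1 replica) are assumptions made for this example; only the setting names number_of_shards and number_of_replicas come from Elasticsearch itself.

```javascript
// Hypothetical helper: builds the body for an index-creation request.
// number_of_shards and number_of_replicas are Elasticsearch index settings;
// the specific counts used below are only illustrative.
function buildIndexSettings(shards, replicas) {
    if (!Number.isInteger(shards) || shards < 1) {
        throw new Error("shards must be a positive integer")
    }
    if (!Number.isInteger(replicas) || replicas < 0) {
        throw new Error("replicas must be a non-negative integer")
    }
    return {
        settings: {
            number_of_shards: shards,     // primary shards the index is split into
            number_of_replicas: replicas, // copies kept of each primary shard
        },
    }
}

const body = buildIndexSettings(3, 1)
// Total shard copies stored in the cluster: primaries plus their replicas.
const totalShardCopies =
    body.settings.number_of_shards * (1 + body.settings.number_of_replicas)
console.log(body, totalShardCopies)
```

With the elasticsearch client used later in this tutorial, a body like this could be passed to esClient.indices.create({ index: "products", body }) before any documents are indexed.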

Inverted Index

Elasticsearch uses the concept of the inverted index to deliver fast full-text search results. The inverted index is a data structure similar to a hash map since it maps individual words to their locations. It’s used by most search engines to deliver fast results when querying a large data set.

In Elasticsearch, the inverted index identifies every unique word that appears in the text fields of the documents in an index and maps each word to all the documents it appears in.

Now, when a full-text search query is sent to Elasticsearch, it uses the inverted index to find the documents that contain each word of the query, and it does so in a matter of milliseconds despite the large size of the data set.

For example, let’s consider two documents that contain two different texts: “programming is for programmers” and “programmers are awesome”.

If these two are the only documents in our index, the inverted index created by Elasticsearch would look something like this:

Word          Document 1   Document 2
programming   true         false
is            true         false
for           true         false
programmers   true         true
are           false        true
awesome       false        true

Note that the actual inverted index stores more information than what is included in the above table.
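To make the idea concrete, here is a minimal sketch of an inverted index built in plain JavaScript from the two example documents. It is a deliberate simplification: the function names, the document IDs doc1 and doc2, and the naive lowercase-and-split analysis are assumptions for illustration, not how Elasticsearch stores or analyzes text internally.

```javascript
// Builds a simplified inverted index: each word maps to the set of document IDs
// that contain it. Real Elasticsearch analysis and storage are far more involved.
function buildInvertedIndex(documents) {
    const index = new Map()
    for (const [docId, text] of Object.entries(documents)) {
        // Naive analysis: lowercase the text and split on non-letter characters.
        const words = text.toLowerCase().split(/[^a-z]+/).filter(Boolean)
        for (const word of words) {
            if (!index.has(word)) index.set(word, new Set())
            index.get(word).add(docId)
        }
    }
    return index
}

// Returns the IDs of documents that contain at least one word of the query.
function search(index, query) {
    const matches = new Set()
    for (const word of query.toLowerCase().split(/[^a-z]+/).filter(Boolean)) {
        for (const docId of index.get(word) || []) matches.add(docId)
    }
    return [...matches]
}

const documents = {
    doc1: "programming is for programmers",
    doc2: "programmers are awesome",
}
const invertedIndex = buildInvertedIndex(documents)
console.log(search(invertedIndex, "programmers")) // [ 'doc1', 'doc2' ]
console.log(search(invertedIndex, "awesome"))     // [ 'doc2' ]
```

A lookup only touches the entries for the words in the query, which is why the cost of a search does not grow with the total amount of text stored.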

Mapping

The mapping defines a field type (or types) for each field stored in a document. As I mentioned before, Elasticsearch can store data of different types, such as numbers, text, and dates.

However, you can choose not to define the field types initially. Elasticsearch has a feature called dynamic mapping which automatically detects and adds new fields to the index.

Elasticsearch indexing works differently for data that belong to different field types. For example, text fields are stored in inverted indices while numeric and geo fields are stored in BKD trees.

You can also define more than one field type for a data unit. For example, you can give a string both the “text” and “keyword” field types. The string is then indexed as both a text field and a keyword field. This is important because we can search for that string with full-text search and also use it as a keyword for aggregations and sorting.
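As an illustration, a mapping that indexes a product name as both text and keyword could look roughly like the object below. The field names follow the products used later in this tutorial, while the type choices and the sub-field name raw are assumptions made for this sketch. The small helper simply lists the field paths such a mapping produces.

```javascript
// Illustrative mapping body for a "products" index. The type choices and the
// "raw" sub-field name are assumptions, not part of the original tutorial.
const productMapping = {
    properties: {
        id: { type: "keyword" },
        name: {
            type: "text",                 // analyzed for full-text search
            fields: {
                raw: { type: "keyword" }, // exact value for sorting and aggregations
            },
        },
        price: { type: "float" },
        description: { type: "text" },
    },
}

// Lists every indexed field path, including multi-field sub-fields such as "name.raw".
function listFieldPaths(mapping) {
    const paths = []
    for (const [field, definition] of Object.entries(mapping.properties)) {
        paths.push(`${field} (${definition.type})`)
        for (const [sub, subDef] of Object.entries(definition.fields || {})) {
            paths.push(`${field}.${sub} (${subDef.type})`)
        }
    }
    return paths
}

console.log(listFieldPaths(productMapping))
```

With a mapping like this, a full-text query would target name, while sorting or aggregating would target name.raw.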


Where is Elasticsearch used?

Elasticsearch is a technology used for a number of use cases. Let’s take a look at some of them.

  • Website search: Some websites need to provide fast search features. For example, on an E-commerce site, users should be able to search for products and get the results quickly. We can implement this, along with extras like auto-completion, using Elasticsearch.
  • Logging and log analytics: Elasticsearch is used to store and analyze logs from web applications and other similar applications in real-time.
  • More analytics usage: We can also use Elasticsearch for other analytical tasks like security analysis and business analytics. The Elastic Stack provides useful tools such as Kibana, which is used for data visualization and management, to make these tasks easier.
  • System monitoring and infrastructure metrics: Again, Elasticsearch is used to gather, store, and process performance metrics from different systems and visualize them in real-time.

Build a search engine with Elasticsearch and Node

After all that reading, you have now reached the interesting part. We are going to implement a simple Node API that interacts with Elasticsearch. In this tutorial, we will create endpoints to index new records and to search the stored data.

First, make sure to install Elasticsearch on your device following this guide. After installation, start Elasticsearch and make sure it is working properly by sending a request to the Elasticsearch server.

curl http://127.0.0.1:9200

Then, set up a new Node project for this tutorial.

We will use the Node Elasticsearch client module named elasticsearch, so install that package along with express and body-parser.

npm install elasticsearch express body-parser

In the app.js file of our project, set up the Node server as you would normally do.

const express = require("express")
const bodyParser = require("body-parser")
const elasticsearch = require("elasticsearch")
const app = express()
app.use(bodyParser.json())

app.listen(process.env.PORT || 3000, () => {
    console.log("connected")
})

Now, the initial setup is complete. We can start working with Elasticsearch by creating an Elasticsearch client.

const esClient = new elasticsearch.Client({
    host: "http://127.0.0.1:9200",
})
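Before wiring up any routes, it can help to confirm that the client can actually reach the cluster. The sketch below wraps the client's ping call in a small helper. The name isClusterReachable and the 3000 ms timeout are assumptions for this example, and a stand-in client is used so the snippet runs without a server.

```javascript
// Hypothetical helper: resolves to true when the cluster answers a ping.
// It accepts any object exposing a promise-returning ping() method,
// such as the esClient created above.
async function isClusterReachable(client, timeoutMs = 3000) {
    try {
        await client.ping({ requestTimeout: timeoutMs })
        return true
    } catch (err) {
        return false
    }
}

// Stand-in client so the sketch runs without a server; replace it with esClient.
const fakeClient = { ping: () => Promise.resolve(true) }
isClusterReachable(fakeClient).then(ok => {
    console.log(ok ? "Elasticsearch is reachable" : "Elasticsearch is down")
})
```

In the real application, you would call isClusterReachable(esClient) once at startup and log a warning if it resolves to false.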

Next, we will create the POST /products endpoint. It accepts POST requests to index new products into an index called products in Elasticsearch.

For this, we can use the index method in the elasticsearch module.

app.post("/products", (req, res) => {
    esClient.index({
        index: 'products',
        body: {
            "id": req.body.id,
            "name": req.body.name,
            "price": req.body.price,
            "description": req.body.description,
        }
    })
    .then(response => {
        return res.json({"message": "Indexing successful"})
    })
    .catch(err => {
        return res.status(500).json({"message": "Error"})
    })
})

Since the index “products” doesn’t exist when we first start the server, the first request to this endpoint will prompt Elasticsearch to create the new index.

We can test this route by sending a new request via Postman. If your application is working properly, you will see the response message “Indexing successful”.
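The endpoint above indexes whatever the request body contains. If you want to catch malformed products before they reach Elasticsearch, a small validation step could run first. The following is only a sketch: the helper name buildProductDocument, the validation rules, and the sample product values are assumptions for illustration.

```javascript
// Hypothetical helper: validates a request body and returns the document to index.
// The rules below (non-empty name, non-negative numeric price) are assumptions.
function buildProductDocument(body) {
    const { id, name, price, description = "" } = body || {}
    if (typeof name !== "string" || name.trim() === "") {
        throw new Error("name is required")
    }
    if (typeof price !== "number" || Number.isNaN(price) || price < 0) {
        throw new Error("price must be a non-negative number")
    }
    return { id, name: name.trim(), price, description }
}

// Sample input with made-up values, similar to what Postman would send.
const sampleBody = { id: 1, name: " Apple iPhone 11 ", price: 699 }
console.log(buildProductDocument(sampleBody))
```

Inside the route handler, the returned object could be passed as the body of esClient.index, and a thrown error could be turned into a 400 response.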

Trying our Products API


I have indexed several products with words like iPhone and Apple in them. Since we want to see how Elasticsearch responds to full-text queries, I recommend that you, too, index some products with somewhat similar names.

Next, let’s create the GET /products endpoint. It handles GET requests that carry a text query describing the product the user is searching for. We use this text query to search the name fields of the products indexed in Elasticsearch so that the server can respond with a list of products similar to what the user is looking for.

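app.get("/products", (req, res) => {
    // Reject requests without usable search text instead of throwing on trim()
    const searchText = (req.query.text || "").toString().trim()
    if (!searchText) {
        return res.status(400).json({"message": "Query parameter 'text' is required"})
    }
    esClient.search({
        index: "products",
        body: {
            query: {
                // Full-text match against the name field
                match: {"name": searchText}
            }
        }
    })
    .then(response => {
        return res.json(response)
    })
    .catch(err => {
        return res.status(500).json({"message": "Error"})
    })
})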

Again, we can test this route by sending a request via Postman. Since I saved a list of products that have names like Apple and iPhone, I use a search term that includes one of those words. The search term I used is “blue iphone backcover”.

Searching using Elasticsearch


The server responds with two products, one named “Apple iPhone 11 backcover” and another named “Apple iPhone 11”.

Though the 2nd result is not what the user was looking for, Elasticsearch considers it a hit because it matches one word in the search query, “iphone”. However, it is listed after the result that is more relevant to our query.
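The second product is a hit because a match query, by default, treats the words of the search text as alternatives: a document qualifies if it contains any of them. The match query also accepts an operator option that requires every word to be present. The sketch below builds both variants; the helper name buildNameQuery and its requireAllWords flag are assumptions for illustration.

```javascript
// Builds the search body for the name field.
// requireAllWords=false behaves like the tutorial's query (any word may match);
// requireAllWords=true uses the match query's "and" operator.
function buildNameQuery(text, requireAllWords = false) {
    return {
        query: {
            match: {
                name: {
                    query: text.trim(),
                    operator: requireAllWords ? "and" : "or",
                },
            },
        },
    }
}

console.log(JSON.stringify(buildNameQuery("blue iphone backcover", true)))
```

Note that with the "and" operator, the search for “blue iphone backcover” would only return products whose names contain all three words, so neither of the two products above would match.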

If you look at the returned results again, you will see that each hit comes with a field called _score. It indicates how relevant the result is to the search query. A higher score means higher relevance, and the hits are listed in descending order of score, so the best match is displayed at the top.
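The raw search response is fairly verbose, so an API would often trim it down before returning it. The sketch below flattens each hit into its ID, score, and stored fields. The response object is a hand-written stand-in with made-up IDs and scores, and it assumes the hits live under response.hits.hits, as they do in the response returned by the elasticsearch client used in this tutorial.

```javascript
// Illustrative response shape; the IDs and scores are made up, not real output.
const sampleResponse = {
    hits: {
        hits: [
            { _id: "a1", _score: 1.2, _source: { name: "Apple iPhone 11 backcover" } },
            { _id: "b2", _score: 0.4, _source: { name: "Apple iPhone 11" } },
        ],
    },
}

// Flattens each hit into { id, score, ...document fields }, keeping the
// order in which Elasticsearch returned the hits.
function summarizeHits(response) {
    return response.hits.hits.map(hit => ({
        id: hit._id,
        score: hit._score,
        ...hit._source,
    }))
}

console.log(summarizeHits(sampleResponse))
```

In the GET /products handler, res.json(summarizeHits(response)) could replace res.json(response) if you prefer the shorter payload.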


Conclusion

As you may have discovered in this tutorial, Elasticsearch is quite a fun tool to work with. There’s much more to Elasticsearch than what I covered here, but I hope this tutorial was enough to convince you to give it a try in your next project. So, have fun with Elasticsearch.
