
Elasticsearch
Basic Interview Q&A
1. What is Elasticsearch? Explain in brief.
Elasticsearch is an open-source, distributed search and analytics engine based on the Apache Lucene library. It is designed to operate in near real-time, providing scalable, fast, and accurate text-based search capabilities along with numerous analytical functions.
Elasticsearch can store, search, and analyze large volumes of structured and unstructured data, making it a popular choice for use cases such as full-text search, log and event data analysis, and application performance monitoring.
2. Define ELK stack.
The ELK Stack is a collection of three open-source products or projects – Elasticsearch, Logstash, and Kibana – developed and maintained by Elastic. The acronym "ELK" is derived from the initials of these products.
These tools work together to deliver end-to-end log and event data management, analysis, and visualization.
3. What are the primary use cases of Elasticsearch?
Elasticsearch has numerous use cases across various domains due to its powerful search and analytical capabilities. Some of its use cases include:
- Full-text search: Elasticsearch makes complex search queries easier by quickly searching across large datasets. It is particularly helpful for websites, applications, or businesses that require instant and relevant search results.
- Log and event data analysis: Elasticsearch helps in quickly analyzing log data, system events, and application events, improving system monitoring and problem diagnosis.
- Anomaly detection: It can be used to detect unusual patterns, such as fraudulent activities, cyber-attacks, or performance issues, by analyzing stored data in real time.
- Data visualization: Elasticsearch can be combined with other tools like Kibana to create interactive data visualizations. This makes it easier to explore, understand, and use data effectively.
- Metrics and performance monitoring: Elasticsearch helps collect, analyze, and visualize performance metrics such as response time and system load, which aids in system optimization and capacity planning.
- Autocompletion and spell correction: Elasticsearch can provide real-time auto-completion and spell correction while users search, enhancing user experience and making searches more efficient.
- Geo-spatial search: Elasticsearch supports searching and filtering data based on geographical location, enabling distance-based search results and location-based analytics.
4. What is an Elasticsearch index?
An Elasticsearch index is a logical namespace that stores collections of documents with similar characteristics. An Elasticsearch index is identified by a unique name, which helps in referring to the index during various operations such as searching, updating, and deleting.
In Elasticsearch, the data is stored in the form of JSON documents. Elasticsearch utilizes a data structure known as an inverted index, which is specifically designed to enable rapid full-text search. The inverted index records every distinct word that appears in any document and, for each word, lists the documents in which it appears.
5. How does Elasticsearch ensure data reliability?
Elasticsearch ensures data reliability through several features, including:
- Replication: Data is replicated to multiple nodes in the cluster, which protects against data loss in the event of a node failure.
- Sharding: Data is divided into shards, which can be distributed across multiple nodes. This improves performance and scalability.
- Snapshots and Restores: Elasticsearch provides a snapshot and restore feature that allows you to create and restore backups of your data. This protects against data loss due to human error or disasters.
- Monitoring and alerting: Elasticsearch provides many monitoring and alerting features that can help you to identify and address potential data problems.
- Security: Elasticsearch can be configured with several security features to protect your data from unauthorized access.
6. What is a node in Elasticsearch? What are the different types of nodes in Elasticsearch?
A node in Elasticsearch refers to a single running instance of the Elasticsearch process in a cluster. A node stores data and takes part in the indexing and search capabilities of the cluster.
Nodes communicate with one another to distribute data and workload, ensuring a balanced and high-performing cluster. Nodes can be configured with different roles, which determine their responsibilities in the cluster.
By using nodes, Elasticsearch can scale to handle large amounts of data and traffic. Nodes can be added to a cluster as needed, and they can be removed without affecting the availability of data. This makes Elasticsearch a highly scalable and reliable solution for storing and searching data.
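You can assign roles to a node by setting node.roles in elasticsearch.yml. If you don't set node.roles, the node is assigned a default set of roles that covers most of the roles below. The voting-only role is not assigned by default, and a coordinating-only node is one configured with an empty role list. The main node types are: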
- Master-eligible nodes: These nodes are responsible for cluster-wide actions such as creating or deleting indices, managing nodes, and maintaining the overall cluster health. They participate in the election process for a master node, and one of them is elected as the master node.
- Data nodes: Data nodes store the actual data, called shards. They perform data-related operations such as indexing, searching, and aggregating. They also manage data replication to ensure high availability and resilience.
- Ingest nodes: These nodes pre-process incoming data before indexing. They use Elasticsearch's ingest pipelines to transform, enrich, and filter the data as it is ingested.
- Coordinating (or client) nodes: These nodes route search requests and handle query results. They do not store data or perform ingest processing but act as smart load balancers that optimize the distribution of queries and aggregations.
- Machine learning nodes: Machine learning nodes are dedicated to running machine learning jobs in Elasticsearch, commonly used for anomaly detection and data analysis.
- Cross-cluster search (CCS) nodes: CCS nodes enable querying multiple Elasticsearch clusters at once, acting as a single point to execute federated searches across these clusters.
- Voting-only nodes: These nodes are master-eligible but cannot be elected as master. Their primary function is to vote in master node elections, helping prevent tied votes and maintain cluster stability.
When setting node.roles, make sure to cross-check that nodes have been assigned roles as per your cluster's needs. For instance, master and data roles are a must for every cluster.
7. What is a shard in Elasticsearch? What are the different types of shards in Elasticsearch?
A shard in Elasticsearch is a logical division of an index. An index can have one or more shards, and each shard can be stored on a different node in the cluster. Shards are used to distribute data across multiple nodes, which improves performance and scalability.
There are two types of shards in Elasticsearch:
- Primary shards
- Replica shards.
Primary shards are responsible for storing the original data, while replica shards store copies of that data. By default, each index has one primary shard, but you can configure more primary shards when you create the index to improve performance. You can also add replica shards at any time to increase the availability of your data in the event of a node failure.
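As a rough illustration of how documents are spread across primary shards, the sketch below routes document IDs to shards by hashing. It assumes a simplified formula, hash(routing value) modulo the number of primary shards, and uses zlib.crc32 purely as a stand-in hash; the function name and document IDs are hypothetical.

```python
import zlib

def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard number."""
    # crc32 is only a stand-in hash (an assumption for this sketch).
    # What matters is that the same input always gives the same shard.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

doc_ids = ["user-1", "user-2", "user-3", "user-4"]
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
print(placement)
```

Because the result depends on the shard count, changing the number of primary shards would send the same ID to a different shard. That is one reason the primary shard count is normally chosen when the index is created.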
8. What is a replica in Elasticsearch?
A replica in Elasticsearch is a copy of a primary shard. Replicas are used to improve the availability of data in the event of a node failure. By default, each index has one primary shard and one replica shard. The number of replica shards can be configured, and it is recommended to have at least one replica shard for each primary shard.
A replica is never allocated to the same node as its primary shard. This ensures that if one node fails, the data will still be available on the other nodes. Write operations are applied to the primary shard and then forwarded to its replicas, which keeps them in sync. Since replicas, just like primary shards, store a part of the index data, they can serve read requests, i.e., search and aggregation queries.
Having more replicas means that the search and read workload can be distributed among the primary and replica shards, which improves query performance and reduces the overall load on individual primary shards.
9. What is a document in Elasticsearch?
A document in Elasticsearch is a basic unit of information that can be indexed, stored, and searched. Documents are represented in JSON (JavaScript Object Notation) format, which is both human-readable and machine-parseable.
Each document consists of a collection of fields with their respective values, which can be of various data types like text, numbers, dates, geolocations, or booleans.
10. How do you create, delete, list, and query indices in Elasticsearch?
You can use the following commands:
- Command to create a new index – PUT /test_index?pretty
- Command to delete an index – DELETE /test_index?pretty
- Command to list all index names and their basic information – GET _cat/indices?v
- Command to query an index – GET test_index/_search
- Command to query multiple indices – GET test_index1,test_index2/_search
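The small sketch below shows how these request lines could be assembled in code, in particular the comma-separated index list used when searching several indices at once. The helper name search_path is hypothetical; the settings keys number_of_shards and number_of_replicas are the standard index settings, shown here as an optional body for the create request.

```python
def search_path(*index_names: str) -> str:
    """Build the path of a search request for one or more indices."""
    # Index names are joined with commas and no spaces.
    return "/" + ",".join(index_names) + "/_search"

# Optional body for "PUT /test_index": one primary shard, one replica.
create_index_body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 1}
}

single = search_path("test_index")
multi = search_path("test_index1", "test_index2")
print("GET " + single)
print("GET " + multi)
```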
11. What is the Elasticsearch query language?
The Elasticsearch query language is referred to as the Query DSL (Domain Specific Language). It is a powerful and flexible language used for expressing queries in Elasticsearch. Query DSL is built on top of JSON and is used to construct complex queries, filters, and aggregations.
Query DSL features a wide range of queries and search capabilities, which can be categorized into:
- Full-text queries
- Term level queries
- Compound queries
- Join queries
- Geo queries
- Specialized queries
Query DSL also supports various other features like pagination, source filtering, highlighting, and sorting, enabling users to build even more sophisticated search and analysis capabilities.
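To make these features concrete, here is a minimal sketch of a search request body built as a Python dictionary and serialized to JSON. The field names (title, published_at) are assumptions chosen for illustration; the top-level keys are the usual Query DSL ones for a full-text query, pagination, sorting, source filtering, and highlighting.

```python
import json

search_body = {
    "query": {"match": {"title": "distributed search"}},  # full-text query
    "from": 0,     # pagination: offset of the first hit
    "size": 10,    # pagination: number of hits to return
    "sort": [{"published_at": {"order": "desc"}}],        # newest first
    "_source": ["title", "published_at"],                 # source filtering
    "highlight": {"fields": {"title": {}}},               # highlight matches in title
}

# This JSON would be sent as the body of a request such as GET /test_index/_search.
print(json.dumps(search_body, indent=2))
```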
12. What do you understand by index alias in Elasticsearch?
An index alias in Elasticsearch is a secondary name that can be used to refer to an index. Aliases can be used to make it easier to manage and use indexes. Aliases allow you to perform operations on multiple indices simultaneously or simplify index management by hiding the complexity of the underlying index structure.
Here are some of the benefits of using aliases in Elasticsearch:
- Simplicity: Aliases make it easier to manage and use indexes by providing a secondary name that can be used to refer to an index.
- Flexibility: Aliases can be used to group together a set of indexes, which can be useful if you want to perform the same operation on a group of indexes.
- Scalability: Aliases can be used to make it easier to scale your Elasticsearch cluster by providing a way to refer to a group of indexes.
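One common use of aliases is switching readers from an old index to a new one without changing client code. The sketch below builds the body of an alias update request (sent to the _aliases endpoint) that removes the alias from one index and adds it to another. The helper name and the index and alias names are hypothetical.

```python
import json

def swap_alias(alias: str, old_index: str, new_index: str) -> dict:
    """Build a request body that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Body for a request such as POST /_aliases.
body = swap_alias("logs", "logs-v1", "logs-v2")
print(json.dumps(body, indent=2))
```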
13. Explain the concept of Elasticsearch mapping.
In Elasticsearch, a mapping is a JSON object that defines the structure of a document. It specifies the fields that are allowed in a document, as well as their data types and other properties.
Mappings are used to control how documents are stored and indexed, and they also affect how documents can be searched and analyzed. Mappings are a powerful tool that can be used to store data in a structured way. They make it easier to search, filter, and analyze your data.
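As an illustration, the sketch below builds a mapping for a hypothetical product index and lists the data type chosen for each field. The field names are assumptions; the type names (text, keyword, float, date, boolean) are standard Elasticsearch field types.

```python
import json

create_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # analyzed for full-text search
            "status": {"type": "keyword"},     # exact-value matching and aggregations
            "price": {"type": "float"},
            "created_at": {"type": "date"},
            "in_stock": {"type": "boolean"},
        }
    }
}

# Summarize field name -> data type.
field_types = {
    name: spec["type"]
    for name, spec in create_index_body["mappings"]["properties"].items()
}
print(json.dumps(field_types, indent=2))
```

A body like this could be sent when creating the index, so that the fields are typed explicitly rather than inferred from the first documents.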
14. What are analyzers in Elasticsearch?
In Elasticsearch, an analyzer is a component that is used to tokenize text. Analyzers are used to break down text into smaller units called tokens. These tokens are then used to index and search the text. The primary goal of analyzers is to transform the raw text into a structured format (tokens) that can be efficiently searched and analyzed.
An analyzer consists of three main components:
- Tokenizer: The tokenizer breaks the input text into a sequence of terms (tokens), usually by splitting it on whitespace or punctuation boundaries.
- Token filters: Token filters process the stream of tokens generated by the tokenizer and can modify, add, or remove tokens.
- Character filters: Character filters are used to preprocess the input text before it reaches the tokenizer. They can modify, add, or remove individual characters from the text.
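The three stages can be sketched in plain Python as a toy pipeline. This is not how Elasticsearch implements analysis; it only mirrors the order of the stages (character filters, then the tokenizer, then token filters). The stop-word list and function names are assumptions.

```python
import re
from typing import List

STOP_WORDS = {"the", "a", "an", "is", "and"}  # tiny illustrative list

def char_filter(text: str) -> str:
    """Character filter: strip HTML-like tags before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenizer(text: str) -> List[str]:
    """Tokenizer: split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)

def token_filters(tokens: List[str]) -> List[str]:
    """Token filters: lowercase every token, then drop stop words."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOP_WORDS]

def analyze(text: str) -> List[str]:
    return token_filters(tokenizer(char_filter(text)))

tokens = analyze("<p>The Quick brown fox is FAST</p>")
print(tokens)  # ['quick', 'brown', 'fox', 'fast']
```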
15. What is Kibana?
Kibana is an open-source data visualization and exploration tool that works on top of Elasticsearch. Kibana allows you to:
- Visualize and explore data stored in Elasticsearch indexes by creating various types of visualizations such as pie charts, line charts, histograms, and heat maps.
- Analyze Elasticsearch data in real-time by creating interactive, shareable dashboards that can display multiple visualizations.
- Manage Elasticsearch, including configuring index patterns, adding new fields, and applying mapping changes.
- Discover and explore raw data in Elasticsearch, filtering and searching based on specific fields or queries.
- Monitor Elasticsearch performance by tracking various metrics and logs.
16. How does Elasticsearch scale horizontally?
Elasticsearch scales horizontally by distributing data across multiple nodes. Each node in an Elasticsearch cluster can store and process data. By adding more nodes to the cluster, you can increase the amount of data that can be stored and processed. Elasticsearch uses the concept of sharding and replication to scale horizontally.
- Sharding: Sharding is the process of splitting data into smaller units called shards, each containing a portion of the data. Each shard is a fully functional and independent index that can be hosted on any node within the Elasticsearch cluster.
- Replication: Replication creates replica shards, which are redundant or backup copies of the primary shards, for increased fault tolerance and improved read performance.
17. Explain the role of creating, reading, updating, and deleting documents in Elasticsearch.
Elasticsearch is a document-based NoSQL database designed for fast and flexible full-text search, analytics, and data manipulation. The primary operations you can perform on the documents stored in Elasticsearch are creating, reading, updating, and deleting, collectively known as CRUD operations.
- Creating documents: Creating a document in Elasticsearch involves adding a new document with a unique ID to a specific index.
- Reading documents: Reading refers to retrieving a document or a group of documents from Elasticsearch based on specific criteria. You can fetch a document using its unique ID or perform searches using various queries that may involve matching, filtering, or aggregating the data.
- Updating documents: Updating a document involves modifying the content of an existing document or adding new fields.
- Deleting documents: Deleting documents removes data from Elasticsearch. You can delete documents by specifying their unique IDs or by using queries to delete multiple documents that match certain criteria.
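The four operations map onto REST requests roughly as sketched below. The index name, document ID, and field values are hypothetical; the paths follow the document APIs (_doc for indexing, fetching, and deleting, and _update for partial updates).

```python
import json

INDEX = "products"   # hypothetical index name
DOC_ID = "1"         # hypothetical document ID

crud_requests = [
    ("PUT", f"/{INDEX}/_doc/{DOC_ID}", {"name": "keyboard", "price": 49.0}),  # create (or replace)
    ("GET", f"/{INDEX}/_doc/{DOC_ID}", None),                                 # read by ID
    ("POST", f"/{INDEX}/_update/{DOC_ID}", {"doc": {"price": 45.0}}),         # partial update
    ("DELETE", f"/{INDEX}/_doc/{DOC_ID}", None),                              # delete by ID
]

for method, path, body in crud_requests:
    line = f"{method} {path}"
    print(line if body is None else f"{line} {json.dumps(body)}")
```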
18. What is an Elasticsearch cluster?
An Elasticsearch cluster is a group of one or more interconnected nodes (servers) working together to handle search, indexing, and data management tasks. The cluster enables horizontal scaling, distributes data and operations across multiple nodes, and achieves high availability and fault tolerance by replicating data across these nodes.
Key concepts associated with an Elasticsearch cluster are:
- Nodes
- Index
- Shards
- Replicas
- Cluster state
19. What is the significance of the _source field in Elasticsearch?
The _source field in Elasticsearch is an important system field that stores the original JSON object that was passed when a document was indexed. It is an essential part of Elasticsearch as it enables a variety of functionalities and provides several benefits:
- Document Retrieval: When you fetch a document or perform a search in Elasticsearch, the _source field allows you to return complete or partial original JSON objects to the user.
- Partial Updates: Elasticsearch supports partial updates, which let you send only the fields that changed instead of the whole document. Internally, Elasticsearch reads the existing document from the _source field, merges in the changes, and reindexes the result, so partial updates require the _source field to be enabled.
- Source Filtering: Elasticsearch provides source filtering, which allows you to control the fields returned in search results. With source filtering, you can specify a whitelist or a blacklist of fields to include or exclude from the _source field.
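The effect of source filtering can be imitated with a small function that applies include and exclude patterns to a document. This is a simplified sketch that only looks at top-level field names and uses Python's fnmatchcase for wildcards; the function name and the sample document are hypothetical.

```python
from fnmatch import fnmatchcase

def filter_source(source, includes=("*",), excludes=()):
    """Keep top-level fields that match an include pattern and no exclude pattern."""
    return {
        field: value
        for field, value in source.items()
        if any(fnmatchcase(field, p) for p in includes)
        and not any(fnmatchcase(field, p) for p in excludes)
    }

doc = {"title": "Intro", "body": "long text", "meta_author": "kim", "meta_tags": ["es"]}

only_meta = filter_source(doc, includes=["meta_*"])
no_body = filter_source(doc, excludes=["body"])
print(only_meta)
print(no_body)
```

In a real search request, the equivalent is expressed in the request body, for example "_source": {"includes": ["meta_*"], "excludes": ["body"]}.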
20. Describe the inverted index in Elasticsearch.
The inverted index is a core data structure used by Elasticsearch for efficient full-text search and retrieval. It is the backbone of Elasticsearch's search capabilities, enabling fast and accurate keyword-based search queries.
An inverted index works by breaking down text documents into smaller units called tokens. These tokens are then stored in a database, along with a list of the documents that contain the token. When a search query is performed, the inverted index is used to quickly find the documents that contain the search terms.
An inverted index is a powerful tool that can be used to search large amounts of text data and is used by various popular search engines such as Google, Bing, and Yahoo.
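A toy version of this structure fits in a few lines of Python. The sketch below maps each token to the set of document IDs that contain it and answers multi-term queries by intersecting those sets. It splits on whitespace only, which is a deliberate simplification; the sample documents are made up.

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "quick thinking wins",
}

# token -> set of IDs of documents containing that token
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        inverted[token].add(doc_id)

def search(*terms):
    """Return sorted IDs of documents that contain every term (AND semantics)."""
    postings = [inverted.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(search("quick"))           # [1, 3]
print(search("brown", "quick"))  # [1]
```

The lookup cost depends on the number of query terms and the size of their posting lists, not on scanning every document, which is what makes this layout fast for keyword search.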
21. Explain the concept of eventual consistency in Elasticsearch.
Eventual consistency is a model used by some distributed systems like Elasticsearch, wherein the system guarantees that all nodes will eventually have a consistent view of the data, but not necessarily immediately after a write operation. In other words, it allows for temporary inconsistencies between nodes in order to prioritize better performance and availability.
Eventual consistency is a good choice for many applications because it provides a good balance between consistency and availability. With eventual consistency, you can be sure that your data will eventually be consistent, but you may not get the latest data immediately.
22. What are the key differences between RDBMS and Elasticsearch?
RDBMS (Relational Database Management System) and Elasticsearch are both data management systems but have different architectural principles.
- Data Model: RDBMS uses a relational data model, where data is organized into tables with rows and columns. Elasticsearch uses a document-based data model, where data is stored as JSON documents within indices.
- Query language: RDBMS uses SQL (Structured Query Language) to interact with the database, including actions like creating, retrieving, updating, or deleting data. Elasticsearch uses a query DSL (Domain Specific Language), which is JSON-based, to perform search and analytics operations on the indexed data.
- Scaling: RDBMS generally scales vertically, requiring more powerful hardware to handle larger datasets or increased operations. Elasticsearch is designed for horizontal scaling, distributing the data and operations across multiple nodes within a cluster, enabling better performance and capacity management.
- Performance: RDBMS is optimized for transactional processing, including various consistency guarantees and support for ACID properties. Elasticsearch is built for high-performance search and analytics, prioritizing fast response times and real-time indexing capabilities.
23. Describe the parent-child relationship in Elasticsearch.
A parent-child relationship in Elasticsearch is a way to model a hierarchical relationship between documents. In a parent-child relationship, one document (the parent) can have one or more child documents. The parent document is the root of the hierarchy, and the child documents are its descendants.
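To create a parent-child relationship, define a join field in the index mapping that names the parent and child relations. Each child document then stores, in that join field, its relation name and the ID of its parent document. Child documents must be indexed on the same shard as their parent, which is done by using the parent ID as the routing value. (Older Elasticsearch versions used a dedicated parent field in the child document for the same purpose.)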
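In recent Elasticsearch versions, this kind of relationship is typically modeled with a join field. The sketch below shows what the mapping and the two documents could look like as request bodies. The index name (qa), the field name (relation), and the relation names (question, answer) are assumptions chosen for illustration.

```python
import json

# Mapping: a join field declaring that a "question" can have "answer" children.
mapping_body = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"question": "answer"}}
        }
    }
}

# Parent document, indexed with ID 1.
parent_doc = {"text": "What is a shard?", "relation": {"name": "question"}}

# Child document: names its relation and points at the parent's ID.
child_doc = {
    "text": "A logical division of an index.",
    "relation": {"name": "answer", "parent": "1"},
}

# The child is indexed with the parent's ID as the routing value,
# so that both documents are stored on the same shard.
child_request_path = "/qa/_doc/2?routing=1"

print(json.dumps(mapping_body, indent=2))
print("PUT " + child_request_path, json.dumps(child_doc))
```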
24. What are aggregations in Elasticsearch? What are the different types of aggregations in Elasticsearch?
Aggregations in Elasticsearch are a powerful feature that allows you to analyze, summarize, and perform complex calculations on your dataset in real-time. Aggregations provide the capability to group and extract actionable insights from indexed data, which can be used for data visualization, reporting, and analytical purposes.
There are three main types of aggregations in Elasticsearch:
- Bucket aggregations: Bucket aggregations group documents into buckets based on a common value. For example, you can use a bucket aggregation to group documents by the value of a field, such as the product name or the user ID.
- Metric aggregations: Metric aggregations calculate metrics, such as the count, sum, average, or minimum value of a field. For example, you can use a metric aggregation to calculate the total number of documents, the sum of all prices, or the average age of all users.
- Pipeline aggregations: Pipeline aggregations work on the outputs of other aggregations rather than directly on document data. They can be used to chain multiple aggregations or to perform additional calculations, such as cumulative sums or moving averages.
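The three kinds can be combined in one request. The sketch below builds an aggregation body with a date_histogram bucket aggregation, a sum metric inside each bucket, and a cumulative_sum pipeline aggregation that reads the metric's output. The field names (sold_at, price) and aggregation names are hypothetical.

```python
import json

aggs_body = {
    "size": 0,  # return only aggregation results, no individual hits
    "aggs": {
        "sales_per_month": {  # bucket aggregation: one bucket per calendar month
            "date_histogram": {"field": "sold_at", "calendar_interval": "month"},
            "aggs": {
                "revenue": {"sum": {"field": "price"}},  # metric aggregation per bucket
                "running_revenue": {                     # pipeline aggregation
                    "cumulative_sum": {"buckets_path": "revenue"}
                },
            },
        }
    },
}

print(json.dumps(aggs_body, indent=2))
```

The pipeline aggregation never touches documents directly; its buckets_path refers to the revenue metric computed in the same bucket aggregation.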
25. What are field data types in Elasticsearch mapping?
In Elasticsearch, field data types define the nature of the values stored in a field and determine how the data is indexed, stored, and searched. When defining an index mapping, you can specify the field data type for each field to ensure appropriate handling and interpretation of the data.
Elasticsearch supports various field data types, each suited for different kinds of data such as text, keyword, numeric, date, boolean, binary, array, and object.
26. What are Elasticsearch refresh and flush operations?
In Elasticsearch, refresh and flush are index management operations that handle the process of making indexed documents available for search and maintaining the durability and integrity of the data.
Refresh is an operation that makes recent changes to the index available for search. When you add or update a document in Elasticsearch, the change is not immediately available for search. Instead, it is first added to a buffer in memory. The refresh operation writes the contents of that buffer into a new searchable segment, making the changes visible to searches. Refreshes happen periodically, by default about once per second, which is why Elasticsearch is described as near real-time.
A flush operation ensures that data that has so far been protected only by the transaction log (translog) is permanently written to the index on disk, providing durability and data integrity. It commits the pending changes to disk and then starts a new transaction log, since the old entries are no longer needed for recovery.
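The difference between the two operations can be shown with a very simplified toy model of a shard's write path. This is a teaching sketch, not a description of Elasticsearch internals; the class and attribute names are hypothetical.

```python
class ToyShard:
    """Very simplified model of one shard's write path."""

    def __init__(self):
        self.buffer = []      # newly indexed docs, not yet searchable
        self.segments = []    # searchable segments (each a list of docs)
        self.translog = []    # operations not yet covered by a commit
        self.committed = 0    # number of segments covered by the last commit

    def index(self, doc):
        # New documents go to the in-memory buffer and the transaction log.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new searchable segment.
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []

    def flush(self):
        # Simplification: write out the buffer, commit all segments,
        # then start a fresh transaction log.
        self.refresh()
        self.committed = len(self.segments)
        self.translog = []

    def search(self, term):
        # Only documents in segments are visible to searches.
        return [d for seg in self.segments for d in seg if term in d]

shard = ToyShard()
shard.index("hello world")
before = shard.search("hello")  # buffer has not been refreshed yet
shard.refresh()
after = shard.search("hello")
shard.flush()
print(before, after, shard.translog, shard.committed)
```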
27. What is Elasticsearch cat()?
In Elasticsearch, the cat API (exposed under the _cat endpoint) is a set of simple and concise APIs that provide information about the cluster, nodes, indices, and other components in a human-readable format. The cat API is primarily used for troubleshooting, monitoring, and obtaining quick insights into the state and health of an Elasticsearch cluster.
Some of the common cat APIs are cat.indices, cat.nodes, cat.health, and cat.allocation among others.
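For reference, the sketch below maps the names used above to the REST paths they correspond to and builds the matching request line. The v query parameter asks for column headers in the output. The dictionary and helper name are hypothetical conveniences for this example.

```python
CAT_ENDPOINTS = {
    "cat.indices": "/_cat/indices",
    "cat.nodes": "/_cat/nodes",
    "cat.health": "/_cat/health",
    "cat.allocation": "/_cat/allocation",
}

def cat_request(name: str, verbose: bool = True) -> str:
    """Build the request line for a cat API, optionally with column headers (?v)."""
    path = CAT_ENDPOINTS[name]
    return f"GET {path}?v" if verbose else f"GET {path}"

for api_name in CAT_ENDPOINTS:
    print(cat_request(api_name))
```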
28. Explain the function of cat.indices in Elasticsearch.
In Elasticsearch, the cat.indices API provides a way to retrieve information about the indices in the cluster in a human-readable format. It allows you to obtain various details and statistics about the indices, such as their names, sizes, health status, number of documents, and more.
The cat.indices API is primarily used for monitoring and troubleshooting purposes, as it provides a quick and concise overview of the indices within the Elasticsearch cluster. It is often utilized by administrators and developers to gather essential information about the state and performance of the indices.
29. What is the use of cat.nodes in Elasticsearch?
In Elasticsearch, the cat.nodes API is used to retrieve information about the nodes in an Elasticsearch cluster. It provides a concise and human-readable overview of the individual nodes, their roles, statuses, resource usage, and other relevant metrics.
It helps administrators, developers, and operators monitor the health, resource utilization, and roles of individual nodes, allowing for better cluster management, troubleshooting, and performance optimization.
30. What is the cat.health API in Elasticsearch?
In Elasticsearch, the cat.health API is used to retrieve information about the overall health status of the cluster. It provides a concise and human-readable overview of the health of the cluster and its indices. The cat.health API is useful for monitoring the overall health of the Elasticsearch cluster and gaining visibility into any potential issues related to shard allocation, replica shards, or unassigned shards.
It allows administrators and operators to quickly assess the state of the cluster and take appropriate actions to ensure its stability and performance.
31. Discuss Elasticsearch filter context and query context.
In Elasticsearch, queries can be executed in two different contexts: the filter context and the query context.
- Filter context: Filter context is used when you want to filter out documents that do not match certain criteria. For example, you could use filter context to find all documents that were created after a certain date. In filter context, no scores are calculated, so the order of the documents in the results is not determined by relevance. Filter context is generally more performance-efficient since it bypasses the scoring process.
- Query Context: Query context is used when you want to rank documents based on their relevance to a query, such as a full-text search or a match on multiple fields. For example, you could use query context to find all documents that contain the word "Elasticsearch". In query context, scores are calculated, so the documents in the results are ordered by relevance.
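Both contexts often appear in the same request. The sketch below builds a bool query in which the must clause runs in query context and contributes to the score, while the filter clauses run in filter context and only include or exclude documents. The field names and values are assumptions for illustration.

```python
import json

search_body = {
    "query": {
        "bool": {
            "must": [  # query context: affects the relevance score
                {"match": {"title": "Elasticsearch"}}
            ],
            "filter": [  # filter context: yes/no match, no scoring
                {"term": {"status": "published"}},
                {"range": {"created_at": {"gte": "2023-01-01"}}},
            ],
        }
    }
}

print(json.dumps(search_body, indent=2))
```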
32. Explain the differences between a query and a filter.
In Elasticsearch, queries and filters are used to retrieve specific documents from an index, but they have some key differences in terms of their functionality and usage.
- Query: The primary purpose of a query is to determine the relevance of documents to a search query. It calculates a relevance score for each document based on how well it matches the query criteria. Queries are used when you want to retrieve the most relevant documents based on their relevance scores.
- Filter: The main purpose of a filter is to narrow down the search results by applying specific conditions or filters. Filters are used when you want to include or exclude documents based on certain criteria without considering their relevance scores.
Wrapping up
The comprehensive set of Elasticsearch questions and answers we have provided here can offer great insight and knowledge to both hiring managers and developers alike. If you are a hiring manager, these practical questions help assess a candidate's knowledge and expertise in Elasticsearch, ensuring they possess the necessary skills for the job.
By evaluating candidates' understanding of Elasticsearch's features, optimization techniques, data management, and cluster scalability, hiring managers can confidently identify qualified individuals. If you want Turing to help with pre-vetted Elasticsearch candidates for your full-time roles, then you can hire top Elasticsearch developers by signing up with us.
Developers can use these questions to prepare for their interview and fine-tune their understanding of different concepts related to Elasticsearch. For remote Elasticsearch jobs, sign up at Turing and find dream jobs at Fortune 500 companies.











