Top Elasticsearch Interview Questions and Answers
What is Elasticsearch?
Elasticsearch is a popular, open-source search and analytics engine based on Apache Lucene. It's a NoSQL database that stores data as JSON documents, making it well-suited for handling unstructured and semi-structured data. Elasticsearch is known for its speed, scalability, and ease of use, enabling near real-time search capabilities.
Creator and Release Date of Elasticsearch
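Shay Banon created Elasticsearch and first released it in February 2010. Early versions were licensed under Apache 2.0; since version 7.11 (2021) the default distribution has been offered under the Elastic License and SSPL, with AGPLv3 added as a further option in 2024.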
Latest Elasticsearch Version
As of this writing, the latest version of Elasticsearch is 8.x. Check the official Elasticsearch website for the most up-to-date information on releases.
Key Features of Elasticsearch
- Open-source and free to use.
- RESTful API for easy interaction.
- Supports various data types (including geolocation and text).
- Schema-flexible; dynamic mapping infers field types, so documents can have different fields without a predefined schema.
- Near real-time search capabilities.
- Highly scalable and distributed.
Basic Elasticsearch Operations
- Creating indices (databases).
- Indexing documents (adding data).
- Searching documents.
- Updating documents.
- Deleting documents and indices.
- Freezing indices (to reduce memory overhead for rarely searched data; available in older versions only).
Elasticsearch Web Access Port
Elasticsearch's default HTTP port is 9200. You can change it with the `http.port` setting in the `elasticsearch.yml` file.
Prerequisites for Working with Elasticsearch
- Familiarity with JSON and REST APIs.
- A Java runtime (recent Elasticsearch releases bundle their own JDK, so a separate installation is usually unnecessary).
- (Optional) A tool like Kibana for visualization or a plugin like elasticsearch-head.
Elasticsearch Indices
An index in Elasticsearch is analogous to a database in a relational database system. It's a logical namespace that groups documents of a similar type.
Elasticsearch GUI
Elasticsearch itself doesn't have a built-in GUI. Tools like Kibana provide a user-friendly interface for interacting with Elasticsearch.
ELK Stack
The ELK stack (Elasticsearch, Logstash, Kibana) is a popular combination of tools used for log management and analytics:
- Elasticsearch: Stores log data.
- Logstash: Processes and transforms log data.
- Kibana: Visualizes log data.
Tokenizers in Elasticsearch
Tokenizers break down text strings into individual words or terms (tokens) during indexing. Elasticsearch provides various tokenizers (e.g., standard, whitespace, n-gram).
Analyzers in Elasticsearch
Analyzers process text for indexing, typically consisting of a tokenizer and one or more filters. Analyzers prepare text data for efficient searching.
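The tokenizer-plus-filters pipeline can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Elasticsearch's implementation: the function names, the regular expression, and the tiny stopword list are all assumptions made for the example.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "and", "of"})

def word_tokenizer(text):
    """Tokenizer step: split the text into word tokens."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    """Token filter: normalize every token to lowercase."""
    return [token.lower() for token in tokens]

def stop_filter(tokens):
    """Token filter: drop common words that carry little meaning."""
    return [token for token in tokens if token not in STOPWORDS]

def analyze(text):
    """Analyzer: one tokenizer followed by a chain of token filters."""
    tokens = word_tokenizer(text)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens)

print(analyze("The Quick Brown-Fox and the Lazy Dog"))
```

The terms returned by `analyze` are what would end up in the index, which is why a search for `fox` can find a document that originally contained `Brown-Fox`.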
Frozen Indices
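Freezing an index makes it read-only and releases most of the heap memory it would otherwise hold, at the cost of slower searches. Frozen indices are still searchable but cannot be updated until unfrozen. The freeze API was deprecated in Elasticsearch 7.14 and removed in 8.0, so it applies to older clusters only.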
Elasticsearch Mapping
Mapping defines how data fields are stored and indexed in Elasticsearch. This involves specifying data types for the fields.
Deleting an Index in Elasticsearch
Example
DELETE my_index
Near Real-Time (NRT) Search in Elasticsearch
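Elasticsearch's NRT capability means that newly indexed documents become searchable very quickly, typically within about one second. That delay comes from the index refresh interval, which defaults to one second and is configurable.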
Elasticsearch APIs
Elasticsearch provides RESTful APIs for managing the cluster, indices, and documents. Common API categories include:
- Document APIs
- Search APIs
- Index APIs
- Cluster APIs
- Admin APIs
Multi-Document APIs
Multi-document APIs (like Bulk API, Update By Query API) enable efficient batch operations on multiple documents.
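The Bulk API accepts a newline-delimited JSON body in which each action line is followed by the document source, and the body must end with a newline. A minimal sketch of building such a body is shown here; the helper name `build_bulk_body`, the index name, and the sample documents are assumptions for illustration, and the sketch only builds the payload without sending it.

```python
import json

def build_bulk_body(index_name, docs):
    """Build an NDJSON payload of `index` actions for the Bulk API.

    `docs` is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body(
    "my_index",
    [("1", {"title": "first"}), ("2", {"title": "second"})],
)
print(body)
```

Such a body would be sent as `POST _bulk` with the `Content-Type: application/x-ndjson` header.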
Fetching Documents in Elasticsearch
You can fetch documents using either GET requests (with query parameters) or POST requests (with a query in the request body) using Elasticsearch's search API.
Query DSL (Query Domain-Specific Language) in Elasticsearch
Elasticsearch uses Query DSL, a JSON-based query language, for searching and querying documents. Queries written in it are translated into Apache Lucene queries for execution.
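As a rough illustration, a Query DSL request body is just nested JSON, so it can be assembled as a Python dictionary. The field names `title` and `status` and the helper `build_search_body` are hypothetical; the `bool`, `must`, `filter`, `match`, and `term` keys are standard Query DSL clauses.

```python
import json

def build_search_body(text, status, page_size=10):
    """Build a search body: full-text match on `title`, exact filter on `status`."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"title": text}}],
                # Non-scoring exact-value clause.
                "filter": [{"term": {"status": status}}],
            }
        },
        "size": page_size,
    }

search_body = build_search_body("elasticsearch interview", "published")
print(json.dumps(search_body, indent=2))
```

The resulting JSON would be sent in the body of a request such as `POST my_index/_search`.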
Clusters in Elasticsearch
A cluster in Elasticsearch is a collection of nodes working together to provide search and indexing capabilities. A single Elasticsearch installation runs as a node. Multiple nodes form a cluster.
Schema (Mapping) in Elasticsearch
Elasticsearch uses mappings to define the structure and data types of fields within an index.
Documents in Elasticsearch
Documents are the basic units of data storage in Elasticsearch. Each document is a JSON object with key-value pairs representing its fields, and different documents can have different fields. Documents are stored within indices and are uniquely identified by an ID, which you can supply yourself or let Elasticsearch generate automatically.
Document Types in Elasticsearch
Document types (also called mapping types) once provided a way to logically group similar documents within an index. Elasticsearch 6.x restricted new indices to a single type, 7.x deprecated types in the APIs, and 8.x removed them, so modern indices hold documents without a type.
Shards in Elasticsearch
An index in Elasticsearch can be split into multiple shards. Each shard is a self-contained, independently searchable and manageable unit of data that can be stored and managed on different nodes. Sharding enhances performance and scalability by distributing data across the cluster.
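How a document ends up on a particular shard can be sketched as a hash of its routing value (by default the document ID) taken modulo the number of primary shards. The sketch below is a simplification under stated assumptions: `zlib.crc32` stands in for Elasticsearch's own hash function, and `pick_shard` is a hypothetical helper, not a real API.

```python
import zlib
from collections import defaultdict

def pick_shard(routing_value, num_primary_shards):
    """Map a routing value to a shard number in [0, num_primary_shards).

    crc32 is only a stand-in hash for this illustration.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# Distribute a few hypothetical document IDs over three primary shards.
placement = defaultdict(list)
for doc_id in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
    placement[pick_shard(doc_id, 3)].append(doc_id)

for shard, doc_ids in sorted(placement.items()):
    print(f"shard {shard}: {doc_ids}")
```

Because the shard number depends on the shard count, the same document would map elsewhere if the count changed, which is why the number of primary shards is fixed when an index is created.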
Companies Using Elasticsearch
Many companies use Elasticsearch for its search and analytics capabilities.
- Netflix
- Udemy
- Shopify
- Walmart
- Uber
- Slack
- Adobe
Index Lifecycle Management (ILM) in Elasticsearch
ILM automates index management through policies that move indices through lifecycle phases (hot, warm, cold, frozen, delete). This optimizes resource utilization by, for example, rolling over active indices, making older and less frequently accessed indices read-only, and eventually deleting them.
Basic Document Operations in Elasticsearch
- Adding a document (`POST`).
- Retrieving a document (`GET`).
- Updating a document (`PUT`, `POST`).
- Deleting a document (`DELETE`).
Inverted Index in Elasticsearch
An inverted index is a data structure that maps terms (words) to the documents containing those terms. It allows for fast full-text search by enabling Elasticsearch to quickly locate documents containing specific words or phrases.
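A toy version of this structure fits in a few lines of Python. It is a sketch only: tokens are produced by lowercasing and splitting on whitespace, the sample documents are made up, and real Lucene indexes also store positions, frequencies, and more.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return the IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick red fox",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "quick fox"))
```

A lookup touches only the entries for the query terms rather than scanning every document, which is what makes full-text search fast.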
`from` and `size` Parameters in Elasticsearch
The `from` and `size` parameters control pagination in Elasticsearch search results, specifying the starting offset and the number of results to return.
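The arithmetic behind page-based navigation is simple enough to sketch. The helpers `page_params` and `apply_from_size` are hypothetical names; the second one merely imitates on a Python list what the two parameters do to a ranked list of hits.

```python
def page_params(page_number, page_size):
    """Translate a 1-based page number into `from` and `size` values."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    return {"from": (page_number - 1) * page_size, "size": page_size}

def apply_from_size(hits, from_, size):
    """Imitate from/size on an in-memory list of ranked hits."""
    return hits[from_:from_ + size]

params = page_params(page_number=3, page_size=10)
all_hits = list(range(25))  # stand-in for 25 ranked hits
print(params)
print(apply_from_size(all_hits, params["from"], params["size"]))
```

Deep pages get expensive because each shard has to collect `from + size` hits; by default Elasticsearch rejects requests where that sum exceeds 10,000 (`index.max_result_window`).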
Match vs. Term Queries
Match Query | Term Query |
---|---|
Analyzes the query string with the field's analyzer, then performs a full-text search for the resulting terms; intended for `text` fields. | Looks up the exact term without analyzing it; intended for `keyword`, numeric, date, and other exact-value fields. |
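The practical consequence is easiest to see on an analyzed text field, where the indexed terms have already been lowercased. The sketch below imitates both query types over a tiny in-memory term set; the `analyze` function, the sample text, and the two query helpers are assumptions for illustration and not Elasticsearch code.

```python
import re

def analyze(text):
    """Simplified analyzer: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# Terms stored in the index for an analyzed text field.
indexed_terms = set(analyze("Quick Brown Fox"))

def match_query(query_text):
    """Analyze the query, then match if any resulting term is indexed."""
    return any(token in indexed_terms for token in analyze(query_text))

def term_query(term):
    """Look up the term exactly as given, with no analysis."""
    return term in indexed_terms

print(match_query("QUICK dog"))  # analyzed to "quick", "dog" before lookup
print(term_query("Quick"))       # looked up verbatim
print(term_query("quick"))
```

This is why a term query for `Quick` on a `text` field can come back empty even though the original document contains that word.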
Downloading Elasticsearch
The type of file you download for Elasticsearch depends on your operating system:
- Windows: `.zip`
- Linux: `.tar.gz`
- macOS: `.tar.gz`
- Debian/Ubuntu: `.deb`
Integrating Elasticsearch with Other Tools
Elasticsearch integrates with various tools and technologies:
- Kibana (for visualization).
- Logstash (for log processing).
- Many other tools and services.
Cluster Health in Elasticsearch
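Cluster health indicates the overall status of an Elasticsearch cluster in terms of shard allocation. Green means every primary and replica shard is allocated. Yellow means all primary shards are allocated but at least one replica is not, so data is fully available but redundancy is reduced. Red means at least one primary shard is unallocated, so some data is unavailable.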
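The status colors follow from which kinds of shards are unassigned: red when any primary is unassigned, yellow when only replicas are, and green otherwise. A small sketch of that rule is shown here; the function name and the `(is_primary, is_assigned)` tuple layout are assumptions for the example, not an Elasticsearch data format.

```python
def cluster_status(shards):
    """Derive a health color from a list of (is_primary, is_assigned) pairs."""
    # Any unassigned primary means some data is unavailable.
    if any(is_primary and not is_assigned for is_primary, is_assigned in shards):
        return "red"
    # Otherwise, an unassigned replica only reduces redundancy.
    if any(not is_assigned for _, is_assigned in shards):
        return "yellow"
    return "green"

healthy = [(True, True), (False, True)]
missing_replica = [(True, True), (False, False)]
missing_primary = [(True, False), (False, True)]

print(cluster_status(healthy))
print(cluster_status(missing_replica))
print(cluster_status(missing_primary))
```

A single-node cluster with the default of one replica per shard typically reports yellow for this reason: the replica has no second node to be allocated to.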
Checking Cluster Health
GET _cluster/health
Write Operations on Frozen Indices
Write operations (indexing, updating, deleting) are not allowed on frozen indices. Indices must be unfrozen before writing is possible.
X-Pack (Elasticsearch SQL)
X-Pack is a set of Elastic Stack features (security, monitoring, SQL, and others) that has shipped with the default distribution since version 6.3. Its SQL feature lets you run SQL queries against your Elasticsearch data.
Ingest Node in Elasticsearch
An ingest node preprocesses documents before they are indexed by running them through an ingest pipeline, a sequence of processors that each transform the document (for example by adding, removing, or renaming fields) before it is stored.
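The idea of a processor chain can be imitated in memory with a list of small functions applied in order. The helpers below are loosely modeled on the `set`, `remove`, and `lowercase` processors, but they are simplified stand-ins written for this example, and the field names are hypothetical.

```python
def set_field(field, value):
    """Return a processor that sets `field` to a fixed value."""
    def process(doc):
        doc[field] = value
        return doc
    return process

def remove_field(field):
    """Return a processor that drops `field` if it is present."""
    def process(doc):
        doc.pop(field, None)
        return doc
    return process

def lowercase_field(field):
    """Return a processor that lowercases a string field."""
    def process(doc):
        if isinstance(doc.get(field), str):
            doc[field] = doc[field].lower()
        return doc
    return process

def run_pipeline(processors, doc):
    """Apply each processor in order to a copy of the document."""
    doc = dict(doc)  # shallow copy so the caller's document is untouched
    for process in processors:
        doc = process(doc)
    return doc

pipeline = [set_field("env", "production"), remove_field("debug"), lowercase_field("level")]
raw = {"level": "WARN", "debug": True, "message": "disk almost full"}
print(run_pipeline(pipeline, raw))
```

Order matters in such a chain: each processor sees the document as the previous one left it.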
Repositories and Snapshots in Elasticsearch
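A repository is a storage location for Elasticsearch snapshots, such as a shared filesystem or a cloud object store. Snapshots are backups of individual indices or of the whole cluster; they let you restore data after a failure and archive old indices so that they can be deleted from the cluster to free up disk space.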
Configuring path.repo
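The `path.repo` setting in the `elasticsearch.yml` file lists the filesystem locations that may be registered as shared filesystem (`fs`) snapshot repositories. Snapshots are then stored in the repository directory you register under one of those locations.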
`wait_for_completion` Parameter
The `wait_for_completion` parameter in snapshot APIs controls whether the request waits for the snapshot operation to finish or returns immediately after the operation is started. Setting it to `true` makes the request wait until the snapshot is complete; the default is `false`.
Restore API in Elasticsearch
The restore API allows you to restore a previously created snapshot back into a cluster, recovering your data.
Restore API Example
POST /_snapshot/my_repository/my_snapshot/_restore