Top Elasticsearch Interview Questions and Answers

What is Elasticsearch?

Elasticsearch is a popular, open-source search and analytics engine based on Apache Lucene. It's a NoSQL database that stores data as JSON documents, making it well-suited for handling unstructured and semi-structured data. Elasticsearch is known for its speed, scalability, and ease of use, enabling near real-time search capabilities.

Creator and Release Date of Elasticsearch

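Shay Banon created Elasticsearch and first released it in February 2010. It was originally licensed under the Apache 2.0 license. Starting with version 7.11 (2021) it moved to a dual Elastic License/SSPL model, and an AGPLv3 option was added in 2024.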

Latest Elasticsearch Version

As of this writing, the latest version of Elasticsearch is 8.x. Check the official Elasticsearch website for the most up-to-date information on releases.

Key Features of Elasticsearch

  • Open-source and free to use.
  • RESTful API for easy interaction.
  • Supports various data types (including geolocation and text).
  • Schema-flexible; dynamic mapping infers field types, so documents can have different fields.
  • Near real-time search capabilities.
  • Highly scalable and distributed.

Basic Elasticsearch Operations

  • Creating indices (databases).
  • Indexing documents (adding data).
  • Searching documents.
  • Updating documents.
  • Deleting documents and indices.
  • Freezing indices (to reduce memory use).

Elasticsearch Web Access Port

Elasticsearch's default HTTP port is 9200. This can be changed by modifying the elasticsearch.yml file.

Prerequisites for Working with Elasticsearch

  • Familiarity with JSON and REST APIs.
  • A Java runtime. Elasticsearch 7.0 and later ship with a bundled JDK, so a separate JDK install is only needed for older versions.
  • (Optional) A tool like Kibana for visualization or a plugin like elasticsearch-head.

Elasticsearch Indices

An index in Elasticsearch is analogous to a database in a relational database system. It's a logical namespace that groups documents of a similar type.

Elasticsearch GUI

Elasticsearch itself doesn't have a built-in GUI. Tools like Kibana provide a user-friendly interface for interacting with Elasticsearch.

ELK Stack

The ELK stack (Elasticsearch, Logstash, Kibana) is a popular combination of tools used for log management and analytics:

  • Elasticsearch: Stores log data.
  • Logstash: Processes and transforms log data.
  • Kibana: Visualizes log data.

Tokenizers in Elasticsearch

Tokenizers break text into individual terms (tokens). They run as part of analysis, both when a document is indexed and when a full-text query is analyzed. Elasticsearch provides various tokenizers (e.g., standard, whitespace, n-gram).
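
As a rough illustration of how tokenizers differ, the following Python sketch mimics three of them on a sample string. The functions are simplified stand-ins written for this example, not Elasticsearch's implementations (the real standard tokenizer follows Unicode text segmentation rules).

```python
import re

def whitespace_tokenize(text):
    """Split on whitespace only; punctuation stays attached to tokens."""
    return text.split()

def simple_word_tokenize(text):
    """Rough stand-in for the standard tokenizer: keep runs of word characters."""
    return re.findall(r"\w+", text)

def ngram_tokenize(text, min_gram=2, max_gram=3):
    """Emit every substring whose length is between min_gram and max_gram."""
    return [
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    ]

sample = "Quick-brown fox!"
print(whitespace_tokenize(sample))   # ['Quick-brown', 'fox!']
print(simple_word_tokenize(sample))  # ['Quick', 'brown', 'fox']
print(ngram_tokenize("fox"))         # ['fo', 'fox', 'ox']
```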

Analyzers in Elasticsearch

Analyzers process text for indexing and searching. An analyzer consists of zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order. Analyzers prepare text data for efficient searching.
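
To make the pipeline idea concrete, here is a minimal Python sketch of an analyzer built from a character filter, a tokenizer, and two token filters. All function names and the stop-word list are invented for this example; Elasticsearch's built-in analyzers are more sophisticated.

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "of"}  # illustrative list, not Elasticsearch's

def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: split into runs of word characters."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    """Token filter: drop tokens found in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text):
    """Run character filter -> tokenizer -> token filters, in that order."""
    tokens = tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stop_words):
        tokens = token_filter(tokens)
    return tokens

print(analyze("<b>The</b> Quick Brown Fox"))  # ['quick', 'brown', 'fox']
```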

Frozen Indices

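Freezing an index makes it read-only and releases most of the heap memory it would otherwise hold, at the cost of slower searches. Frozen indices are still searchable but cannot be updated until unfrozen. The freeze API was deprecated in version 7.14 and removed in 8.0.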

Elasticsearch Mapping

Mapping defines how data fields are stored and indexed in Elasticsearch. This involves specifying data types for the fields.
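
For illustration, a mapping request body might look like the Python dictionary below. The index name "products" and all field names are hypothetical; the field types shown (text, keyword, float, date, geo_point) are standard Elasticsearch types.

```python
import json

# Hypothetical mapping for a "products" index; the field names are made up.
mapping_body = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},          # analyzed for full-text search
            "sku": {"type": "keyword"},        # stored as-is for exact matching
            "price": {"type": "float"},
            "created_at": {"type": "date"},
            "location": {"type": "geo_point"},
        }
    }
}

# This JSON would be sent as the body of: PUT /products
print(json.dumps(mapping_body, indent=2))
```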

Deleting an Index in Elasticsearch

The delete index API permanently removes an index along with its documents, shards, and metadata.

Example

DELETE /my_index

Near Real-Time (NRT) Search in Elasticsearch

Elasticsearch's NRT capability means that newly indexed documents become searchable very quickly, typically within about one second, because each index is refreshed every second by default (the index.refresh_interval setting).

Elasticsearch APIs

Elasticsearch provides RESTful APIs for managing the cluster, indices, and documents. Common API categories include:

  • Document APIs
  • Search APIs
  • Index APIs
  • Cluster APIs
  • Admin APIs

Multi-Document APIs

Multi-document APIs (like Bulk API, Update By Query API) enable efficient batch operations on multiple documents.
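
The Bulk API expects a newline-delimited JSON (NDJSON) body in which each action line is followed by its document source. The sketch below builds such a body in Python; the helper name build_bulk_body and the sample documents are assumptions made for this example.

```python
import json

def build_bulk_body(index, docs):
    """Return an NDJSON string: one action line, then one source line, per doc."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = {"1": {"title": "first"}, "2": {"title": "second"}}
body = build_bulk_body("my_index", docs)
print(body)  # would be POSTed to /_bulk with Content-Type: application/x-ndjson
```

Sending many documents in one request like this avoids the per-request overhead of indexing them one at a time.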

Fetching Documents in Elasticsearch

You can fetch a single document by ID with a GET request, or search for documents with the search API, using either a GET request with query parameters or a POST request with the query in the request body.

Query DSL (Query Domain-Specific Language) in Elasticsearch

Elasticsearch provides a JSON-based Query DSL for searching and querying documents. Queries are executed by the underlying Apache Lucene library.
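
As a small example of what a Query DSL request can look like, the Python dictionary below combines a full-text clause with an exact filter inside a bool query. The field names "title" and "status" and the index name are hypothetical.

```python
import json

# Hypothetical fields: "title" (text) and "status" (keyword).
search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "quick fox"}}],    # scored full-text clause
            "filter": [{"term": {"status": "published"}}],  # exact, unscored clause
        }
    }
}

# This JSON would be sent as the body of: POST /my_index/_search
print(json.dumps(search_body, indent=2))
```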

Clusters in Elasticsearch

A cluster in Elasticsearch is a collection of nodes working together to provide search and indexing capabilities. A single Elasticsearch installation runs as a node. Multiple nodes form a cluster.

Schema (Mapping) in Elasticsearch

Elasticsearch uses mappings to define the structure and data types of fields within an index.

Documents in Elasticsearch

Documents are the basic units of data storage in Elasticsearch. Each document is a JSON object whose key-value pairs represent its fields, and different documents can have different fields. Documents are stored within indices, and each is uniquely identified by an ID that you supply or that Elasticsearch generates automatically.

Document Types in Elasticsearch

Document types (also called mapping types) provided a way to logically group similar documents within an index. Since version 6.0 an index can contain only a single type; types were deprecated in 7.0 and removed in 8.0.

Shards in Elasticsearch

An index in Elasticsearch can be split into multiple shards. Each shard is a self-contained Lucene index that can be searched independently and placed on any node in the cluster. Primary shards can also have replica copies for redundancy. Sharding enhances performance and scalability by distributing data across the cluster.
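
The way a document is assigned to a shard can be sketched as a hash of its routing value (by default the document ID) modulo the number of primary shards. The Python snippet below is a simplified model under that assumption: zlib.crc32 stands in for the Murmur3 hash Elasticsearch uses internally, and the function name pick_shard is invented for this example.

```python
import zlib

def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document ID) to a primary shard."""
    # zlib.crc32 is a stand-in; Elasticsearch uses a Murmur3 hash internally.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards

for doc_id in ("user-1", "user-2", "user-3"):
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the result depends on the shard count, changing the number of primary shards would send existing IDs to different shards, which is one reason that number is fixed when the index is created.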

Companies Using Elasticsearch

Many companies use Elasticsearch for its search and analytics capabilities.

  • Netflix
  • Udemy
  • Shopify
  • Walmart
  • Uber
  • Slack
  • Adobe

Index Lifecycle Management (ILM) in Elasticsearch

ILM (Index Lifecycle Management) automates index management by defining policies that control how indices transition through lifecycle phases (hot, warm, cold, frozen, delete). This optimizes resource utilization by, for example, rolling over to a new index at a size or age threshold, making older, less frequently accessed indices read-only, or deleting them.
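
A policy of this kind might be expressed as the JSON body below, shown here as a Python dictionary. The policy name my_policy and all thresholds are illustrative assumptions, not recommendations.

```python
import json

# Illustrative thresholds only; tune them to your own retention needs.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {"rollover": {"max_age": "7d", "max_primary_shard_size": "50gb"}}
            },
            "warm": {"min_age": "30d", "actions": {"readonly": {}}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

# This JSON would be sent as the body of: PUT /_ilm/policy/my_policy
print(json.dumps(ilm_policy, indent=2))
```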

Basic Document Operations in Elasticsearch

  • Adding a document (POST, or PUT with an explicit ID).
  • Retrieving a document (GET).
  • Updating a document (PUT, POST).
  • Deleting a document (DELETE).
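
The operations above map onto REST endpoints roughly as in the following Python sketch. It assumes the typeless endpoints used in Elasticsearch 7.x and later; the helper name document_request is invented for this example.

```python
def document_request(operation, index, doc_id=None):
    """Return the (HTTP method, path) pair for a basic document operation."""
    if operation == "create" and doc_id is None:
        return ("POST", f"/{index}/_doc")  # Elasticsearch generates the ID
    if doc_id is None:
        raise ValueError(f"{operation!r} requires a document ID")
    routes = {
        "create": ("PUT", f"/{index}/_doc/{doc_id}"),      # index with a chosen ID
        "get": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),  # partial update
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    return routes[operation]

print(document_request("create", "my_index"))       # ('POST', '/my_index/_doc')
print(document_request("update", "my_index", "1"))  # ('POST', '/my_index/_update/1')
```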

Inverted Index in Elasticsearch

An inverted index is a data structure that maps terms (words) to the documents containing those terms. It allows for fast full-text search by enabling Elasticsearch to quickly locate documents containing specific words or phrases.
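
The idea can be shown with a toy inverted index in Python. This is a teaching sketch that lowercases and splits on whitespace; Lucene's actual structure also stores positions, frequencies, and compressed postings.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term in the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "quick red fox"}
index = build_inverted_index(docs)
print(sorted(index["brown"]))              # [1, 2]
print(sorted(search(index, "quick fox")))  # [1, 3]
```

A lookup touches only the postings for the query terms, rather than scanning every document.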

`from` and `size` Parameters in Elasticsearch

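The from and size parameters control pagination in Elasticsearch search results: from is the zero-based offset of the first hit to return (default 0), and size is the number of hits to return (default 10). By default, from + size cannot exceed 10,000 (the index.max_result_window setting).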
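
A page number can be translated into these two parameters as in the Python sketch below. The helper name page_params is an assumption for this example; the 10,000 limit reflects the default value of index.max_result_window.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def page_params(page, page_size):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; consider search_after")
    return {"from": offset, "size": page_size}

print(page_params(3, 20))  # {'from': 40, 'size': 20}
```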

Match vs. Term Queries

  • Match query: analyzes the query string, then performs a full-text search for documents containing any of the resulting terms.
  • Term query: searches for the exact term as given, without analyzing the query; best suited to keyword, numeric, and date fields.
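
The difference is easiest to see on a text field whose contents were lowercased at index time. The Python sketch below is a toy model with a made-up analyzer, not Elasticsearch code, but it shows why a term query for "Quick" can miss a document that a match query finds.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()

# Terms stored in the index for one document's text field.
indexed_terms = set(analyze("Quick Brown Fox"))

def term_query(value):
    """Compare the value to indexed terms exactly as given (no analysis)."""
    return value in indexed_terms

def match_query(text):
    """Analyze the query text first, then match if any resulting term is indexed."""
    return any(term in indexed_terms for term in analyze(text))

print(term_query("Quick"))   # False: the index holds "quick", not "Quick"
print(term_query("quick"))   # True
print(match_query("Quick"))  # True: the query is lowercased before lookup
```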

Downloading Elasticsearch

The type of file you download for Elasticsearch depends on your operating system:

  • Windows: .zip
  • Linux: .tar.gz
  • macOS: .tar.gz
  • Debian/Ubuntu: .deb

Integrating Elasticsearch with Other Tools

Elasticsearch integrates with various tools and technologies:

  • Kibana (for visualization).
  • Logstash (for log processing).
  • Many other tools and services.

Cluster Health in Elasticsearch

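Cluster health indicates the overall status of an Elasticsearch cluster. Green means all primary and replica shards are allocated. Yellow means all primary shards are allocated but one or more replicas are not. Red means at least one primary shard is unassigned, so some data is unavailable.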
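
The rule behind the three colors can be written as a small Python function. This is a simplified model for illustration: the function name and its two count arguments are assumptions, and the real cluster computes status per index and per shard before rolling it up.

```python
def cluster_status(unassigned_primaries, unassigned_replicas):
    """Derive the health color from counts of unassigned shards."""
    if unassigned_primaries > 0:
        return "red"      # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"   # all data available, but redundancy is reduced
    return "green"        # every primary and replica is allocated

print(cluster_status(0, 0))  # green
print(cluster_status(0, 2))  # yellow
print(cluster_status(1, 0))  # red
```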

Checking Cluster Health

GET _cluster/health

Write Operations on Frozen Indices

Write operations (indexing, updating, deleting) are not allowed on frozen indices. Indices must be unfrozen before writing is possible.

x-pack (Elasticsearch SQL)

X-Pack is a set of Elastic Stack features (security, monitoring, alerting, SQL, and more) that has been bundled with the default distribution since version 6.3. Its SQL feature lets you run SQL queries against your Elasticsearch data.

Ingest Node in Elasticsearch

An ingest node preprocesses documents before they are indexed by running them through ingest pipelines, which are ordered sequences of processors. This allows for transformations (like adding or removing fields) before the documents are stored.
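
As a sketch of the idea, the Python snippet below defines a pipeline with the set and remove processors and simulates its effect on one document. The pipeline name, field names, and the apply_pipeline helper are assumptions for this example; the simulation covers only these two processors and ignores missing fields, which the real remove processor does not do by default.

```python
import json

# Hypothetical pipeline definition using the "set" and "remove" processors.
pipeline = {
    "description": "Tag documents and drop a temporary field",
    "processors": [
        {"set": {"field": "environment", "value": "production"}},
        {"remove": {"field": "tmp"}},
    ],
}

def apply_pipeline(pipeline, doc):
    """Toy simulation of the two processors above on a single document."""
    result = dict(doc)
    for processor in pipeline["processors"]:
        (name, options), = processor.items()  # each processor has exactly one key
        if name == "set":
            result[options["field"]] = options["value"]
        elif name == "remove":
            result.pop(options["field"], None)
    return result

# The definition would be sent as the body of: PUT /_ingest/pipeline/my_pipeline
print(json.dumps(pipeline, indent=2))
print(apply_pipeline(pipeline, {"message": "hello", "tmp": 1}))
```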

Repositories and Snapshots in Elasticsearch

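A repository is a storage location for Elasticsearch snapshots, such as a shared file system or cloud object storage. Snapshots are backups of your indices (or of the whole cluster) that let you restore data later. Archiving old indices to a snapshot also lets you delete them from the cluster to free up disk space.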

Configuring path.repo

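The path.repo setting in the elasticsearch.yml file lists the file system locations that may be registered as shared file system (fs) snapshot repositories. It must be set on every master and data node before such a repository can be registered.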

wait_for_completion Parameter

The wait_for_completion parameter in snapshot APIs controls whether the request waits for the snapshot operation to finish or returns immediately after the operation is started. Setting it to true makes the request wait until the snapshot is complete; the default is false.
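
For example, a create-snapshot request with this parameter could be assembled as in the Python sketch below. The helper name create_snapshot_request is invented for this example, and the repository and snapshot names are placeholders.

```python
from urllib.parse import urlencode

def create_snapshot_request(repository, snapshot, wait_for_completion=False):
    """Return the (HTTP method, path) pair for a create-snapshot call."""
    query = urlencode({"wait_for_completion": str(wait_for_completion).lower()})
    return ("PUT", f"/_snapshot/{repository}/{snapshot}?{query}")

print(create_snapshot_request("my_repository", "my_snapshot", wait_for_completion=True))
# ('PUT', '/_snapshot/my_repository/my_snapshot?wait_for_completion=true')
```

Waiting is convenient in scripts that must not continue until the backup exists, while the default suits large snapshots that would otherwise hold the HTTP connection open for a long time.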

Restore API in Elasticsearch

The restore API allows you to restore a previously created snapshot back into a cluster, recovering your data.

Restore API Example

POST /_snapshot/my_repository/my_snapshot/_restore