The Basics of Elasticsearch
March 15, 2023
The following videos are helpful for getting an overview of Elasticsearch.
- Elasticsearch architecture
- About search relevance
Documents and index
- Document: A JSON object, roughly equivalent to a row in an RDBMS table.
- Index: A collection of documents of the same type. For example, in an e-commerce service one index might hold users and another might hold products.
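As a rough mental model, an index can be pictured as a named collection of JSON documents keyed by ID. The sketch below is a plain in-memory illustration, not how Elasticsearch stores data; the index names `users` and `products`, the field names, and the helper `get_document` are all hypothetical.

```python
import json

# Hypothetical in-memory model: each index name maps document IDs to documents.
# Each document is a JSON object, much like a row in an RDBMS table.
indices = {
    "users": {
        "1": {"name": "Alice", "email": "alice@example.com"},
    },
    "products": {
        "100": {"title": "Coffee mug", "price": 12.5},
        "101": {"title": "Tea pot", "price": 30.0},
    },
}

def get_document(index: str, doc_id: str) -> str:
    """Return one document from the given index, serialized as JSON."""
    return json.dumps(indices[index][doc_id])

print(get_document("products", "100"))  # {"title": "Coffee mug", "price": 12.5}
```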
Elasticsearch Cluster
Elasticsearch is a distributed system for search.
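- Shard: A partition of an index. The documents in an index are split across multiple shards, which can be spread over multiple nodes.
- Primary shard: The original shard. Each document belongs to exactly one primary shard.
- Replica shard: A copy of a primary shard. Replicas provide redundancy if a node fails and can also serve search requests, which increases read throughput.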
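Elasticsearch decides which primary shard a document goes to with a routing formula whose simplified form is `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document ID. The sketch below mimics that idea under stated assumptions: `zlib.crc32` stands in for Elasticsearch's real Murmur3-based hash, and the shard and replica counts are hypothetical. It does not model nodes.

```python
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 3  # assumption; normally fixed when the index is created
NUM_REPLICAS = 1        # assumption; one replica per primary shard

def route(doc_id: str, num_primary_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a primary shard for a document (crc32 stands in for Murmur3)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def build_shards(doc_ids):
    """Assign each document to one primary shard, then copy primaries to replicas."""
    primaries = defaultdict(list)
    for doc_id in doc_ids:
        primaries[route(doc_id)].append(doc_id)
    # Each replica holds the same documents as its primary.
    replicas = {
        (shard, r): list(docs)
        for shard, docs in primaries.items()
        for r in range(NUM_REPLICAS)
    }
    return dict(primaries), replicas

primaries, replicas = build_shards([f"doc-{i}" for i in range(10)])
for shard, docs in sorted(primaries.items()):
    print(shard, docs)
```

Because the shard is derived from the hash modulo the shard count, the same ID always lands on the same shard, which is also why changing the number of primary shards is not a simple in-place operation.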
Search
Basic terms for search:
- Term: A word used to search for documents
Elasticsearch finds the documents that contain a term by looking the term up in an inverted index.
To build an inverted index, the text of each document is first tokenized, that is, split into tokens. In most cases, each word becomes one token.
The inverted index then maps each token to the documents that contain it, so documents can be looked up by token.
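The tokenize-then-index steps above can be sketched in a few lines of Python. This is a toy illustration, not Elasticsearch's analyzer: the regex-plus-lowercase `tokenize` is an assumed stand-in for a real analyzer, and `build_inverted_index` and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a rough analyzer stand-in)."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Look up one term, normalizing it the same way the documents were tokenized."""
    tokens = tokenize(term)
    if not tokens:
        return set()
    return index.get(tokens[0], set())

docs = {
    "1": "The quick brown fox",
    "2": "The lazy brown dog",
    "3": "A quick red fox jumps",
}
inverted = build_inverted_index(docs)
print(sorted(search(inverted, "Quick")))  # ['1', '3']
print(sorted(search(inverted, "brown")))  # ['1', '2']
```

Normalizing the query with the same tokenizer as the documents is what lets "Quick" match "quick"; a lookup is then a single dictionary access instead of a scan over every document.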
Relevance score
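There are several algorithms for scoring search results. Common factors include:
- Term frequency: Frequency of a term in a document
- Document frequency: Number of documents that contain the term. A term that appears in many documents is less useful for telling them apart, so it is weighted lower (inverse document frequency).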
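Elasticsearch's default similarity is BM25, which combines both factors: higher term frequency raises the score (with diminishing returns), and a higher document frequency lowers the term's weight. The sketch below uses the textbook BM25 formula with Elasticsearch's default parameters `k1=1.2` and `b=0.75`; Lucene's actual implementation differs in some details, and the helper names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (a rough analyzer stand-in)."""
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with textbook BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)  # term frequency within this document
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Rarer terms (low df) get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more occurrences.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores

docs = {"1": "fox fox fox", "2": "fox dog", "3": "dog cat"}
scores = bm25_scores("fox", docs)
print(sorted(scores, key=scores.get, reverse=True))  # ['1', '2', '3']
```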
Trade-off in relevance: precision vs. recall

- Precision: The fraction of retrieved results that are actually relevant
- True Positive / (True Positive + False Positive)
- Recall: The fraction of all relevant documents that are retrieved
- True Positive / (True Positive + False Negative)
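The two formulas above can be computed directly from the set of retrieved documents and the set of documents that are actually relevant. The function name and the sample IDs below are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute precision and recall from retrieved and relevant document IDs."""
    tp = len(retrieved & relevant)  # relevant documents that were retrieved
    fp = len(retrieved - relevant)  # retrieved but not relevant
    fn = len(relevant - retrieved)  # relevant but missed
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

retrieved = {"1", "2", "3", "4"}
relevant = {"1", "2", "5"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

The trade-off shows up when more results are returned: retrieving all ten IDs `"1"` to `"10"` raises recall to 1.0 but drops precision to 0.3.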
Questions
- How does Elasticsearch search data? Does each shard hold its own inverted index for looking up a word?
- Yes. See this article
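Since each shard has its own inverted index over only its own documents, a search is sent to every shard (one copy each, primary or replica) and the per-shard hits are merged. The sketch below shows that scatter-gather idea without scoring; the `Shard` class, the shard count, and the use of `zlib.crc32` in place of Elasticsearch's Murmur3 routing hash are all assumptions for illustration.

```python
import re
import zlib
from collections import defaultdict

NUM_SHARDS = 2  # assumption for illustration

def tokenize(text):
    """Lowercase and split into word tokens (a rough analyzer stand-in)."""
    return re.findall(r"\w+", text.lower())

class Shard:
    """A hypothetical shard holding an inverted index over only its own documents."""
    def __init__(self):
        self.inverted = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.inverted[token].add(doc_id)

    def search(self, term):
        return self.inverted.get(term.lower(), set())

shards = [Shard() for _ in range(NUM_SHARDS)]

def index_document(doc_id, text):
    # Route each document to exactly one shard (crc32 stands in for Murmur3).
    shards[zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS].add(doc_id, text)

def search_all(term):
    """Scatter the query to every shard, then gather and merge the hits."""
    hits = set()
    for shard in shards:
        hits |= shard.search(term)
    return hits

for doc_id, text in {"1": "quick fox", "2": "lazy dog", "3": "quick dog"}.items():
    index_document(doc_id, text)
print(sorted(search_all("quick")))  # ['1', '3']
```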