Introduction to Search Request
Get introduced to Elasticsearch search utilities.
What is a search request?
A search request in Elasticsearch is a query sent to an Elasticsearch cluster to retrieve data that matches particular criteria. For example, we might want to find articles that include the term “marvel” or filter products with a price below 1000. The search request defines the parameters of the search, including the index or indices to search, the query to run, any filters to apply, and any sorting or aggregation rules to use.
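As a rough illustration, the body of such a request can be written as a Python dictionary that mirrors the JSON sent to the _search endpoint. The index name products and the field names price and category are assumptions made for this sketch; they do not come from this lesson.

```python
import json

# Hypothetical index and field names, chosen only for illustration.
index_name = "products"

search_body = {
    # The query to run: products with a price below 1000.
    "query": {"range": {"price": {"lt": 1000}}},
    # Sorting rule: cheapest products first.
    "sort": [{"price": {"order": "asc"}}],
    # Aggregation rule: count the matching products per category.
    "aggs": {"by_category": {"terms": {"field": "category"}}},
    # Return at most 10 hits.
    "size": 10,
}

# The body would be sent as JSON in a request such as GET /products/_search.
print(f"GET /{index_name}/_search")
print(json.dumps(search_body, indent=2))
```

Nothing here contacts a cluster; the sketch only shows how the index, query, sort, and aggregation parts of a request fit together.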
Elasticsearch provides a flexible and powerful search engine that supports a wide range of query types. These query types are commonly classified into three categories:
- Full-text queries
- Term-level queries
- Compound queries
Each category groups several related query types, which are described in the sections that follow.
Full-text queries
Full-text queries are powerful search queries designed to efficiently search and analyze unstructured or semi-structured text within Elasticsearch. These queries are specifically tailored to analyze fields such as text fields, which contain textual data or documents.
With a full-text query, the search extends to the complete text within selected fields, beyond basic keyword or exact match searches. This extensive search capability is particularly valuable when dealing with large amounts of textual data, allowing for comprehensive exploration and retrieval of pertinent information.
When executing a full-text query, the query text undergoes analysis using an analyzer. This analysis process generates a list of tokens. These tokens are subsequently compared against the inverted index, which allows for efficient retrieval of relevant information. The distinguishing factor among various full-text queries lies in how they are matched with the inverted index, optimizing the search process for specific requirements.
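The analysis and lookup steps described above can be sketched in plain Python. This is a toy model, not how Elasticsearch is implemented: the analyze function only lowercases and splits text, whereas real analyzers can do considerably more, and the sample documents are made up for the example.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and split it on non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]

# Hypothetical documents, keyed by document ID.
docs = {
    1: "Marvel announces a new film",
    2: "A marvel of modern engineering",
    3: "New engineering jobs announced",
}

# Build the inverted index: token -> set of IDs of the documents containing it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for token in analyze(text):
        inverted_index[token].add(doc_id)

def match(query_text):
    """Return the IDs of documents containing at least one token of the analyzed query."""
    hits = set()
    for token in analyze(query_text):
        hits |= inverted_index.get(token, set())
    return sorted(hits)

print(analyze("New MARVEL film!"))  # ['new', 'marvel', 'film']
print(match("MARVEL film"))         # [1, 2]
```

Because the query text goes through the same analyzer as the documents, the uppercase "MARVEL" still finds documents that were indexed with the token "marvel".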
We will be covering the following full-text queries:
- Match query: This is the standard query for performing full-text searches and supporting fuzzy matching, phrase, and proximity queries. It searches for documents that contain the specified term or terms.
- Multi-match: This query extends the functionality of the match query by allowing it to be executed on multiple fields simultaneously.
- Match phrase: This query specifically looks for an exact phrase within a field, preserving the order of the terms. It only considers documents that contain the exact phrase as matches.
- Match phrase prefix: This query combines the match phrase and prefix queries. It searches for a phrase within a field, allowing for a prefix match on the last term of the phrase.
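To make the four full-text queries more concrete, the sketch below writes each one as a Python dictionary in the shape of the Elasticsearch Query DSL. The field names title and body and the query texts are hypothetical; only the query names and their parameters come from the Query DSL.

```python
import json

# Field names ("title", "body") and query texts are hypothetical examples.

# Match query: analyzed full-text search on one field, here with fuzzy matching enabled.
match_query = {"match": {"title": {"query": "marvel heroes", "fuzziness": "AUTO"}}}

# Multi-match query: the same query text run against several fields.
multi_match_query = {"multi_match": {"query": "marvel heroes", "fields": ["title", "body"]}}

# Match phrase query: the terms must appear together and in this order.
match_phrase_query = {"match_phrase": {"title": {"query": "marvel heroes"}}}

# Match phrase prefix query: the last term ("he") is treated as a prefix.
match_phrase_prefix_query = {"match_phrase_prefix": {"title": {"query": "marvel he"}}}

full_text_queries = [
    match_query,
    multi_match_query,
    match_phrase_query,
    match_phrase_prefix_query,
]

# Each query would be placed under the "query" key of a search request body.
for query in full_text_queries:
    print(json.dumps({"query": query}))
```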
Term-level queries
Term-level queries are designed to operate on exact terms or values within fields. These queries focus on matching specific terms rather than analyzing the text or considering the entire content of fields. Term-level queries are particularly useful for structured data or fields that do not undergo text analysis, such as date ranges, IP addresses, prices, or product IDs.
Unlike full-text queries, term-level queries do not perform any tokenization or stemming. They match terms exactly as they appear in the search query against the terms stored in the inverted index.
We will be covering the following term-level queries:
- Term query: It is used to search for exact matches of a term in a specific field. It can be useful for searching for values like IDs or other fields where exact matches are required.
- Range query: It is used to search for documents within a specified range of values in a specific field. It can be useful for searching for documents with a certain date range or for numeric values.
- Prefix query: It is used to search for documents that contain a term with a specific prefix in a specific field. It can be useful for searching for documents with similar values with a common prefix.
- Wildcard query: It is used to search for documents that contain a term with a specific pattern of characters in a specific field. It can be useful for searching for documents with similar values that match a specific pattern.
- Fuzzy query: It is used to search for documents that contain terms similar to a specified term in a specific field. It can be useful for searching for documents with misspellings, typos, or other minor variations of a search term.
- Regexp query: It is used to search for documents that contain terms matching a regular expression in a specific field.
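The term-level queries listed above can be sketched the same way, as Python dictionaries in the shape of the Query DSL. The field names product_id, price, and brand and all of the values are assumptions made for illustration.

```python
import json

# Field names and values are hypothetical examples.

# Term query: exact match on a single value.
term_query = {"term": {"product_id": {"value": "SKU-1042"}}}

# Range query: prices from 100 (inclusive) up to 1000 (exclusive).
range_query = {"range": {"price": {"gte": 100, "lt": 1000}}}

# Prefix query: terms that start with "SKU-10".
prefix_query = {"prefix": {"product_id": {"value": "SKU-10"}}}

# Wildcard query: "*" stands for any sequence of characters.
wildcard_query = {"wildcard": {"product_id": {"value": "SKU-1*2"}}}

# Fuzzy query: terms within a small edit distance of a misspelled value.
fuzzy_query = {"fuzzy": {"brand": {"value": "samsnug", "fuzziness": "AUTO"}}}

# Regexp query: terms that match a regular expression.
regexp_query = {"regexp": {"product_id": {"value": "SKU-[0-9]+"}}}

term_level_queries = [
    term_query,
    range_query,
    prefix_query,
    wildcard_query,
    fuzzy_query,
    regexp_query,
]

# Each query would be placed under the "query" key of a search request body.
for query in term_level_queries:
    print(json.dumps({"query": query}))
```

None of these values is analyzed before matching, so a term query for "sku-1042" would be a different search from one for "SKU-1042".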
Compound queries
Compound queries are query types that allow us to combine multiple queries together to create more complex and advanced search behaviors. These queries can be used to construct powerful search conditions by combining different types of queries and applying logical operations to them. One commonly used type of compound query in Elasticsearch is the boolean query.
The boolean query is a compound query that allows us to combine multiple clauses using boolean logic operators, such as must, should, and must_not. It provides a flexible and expressive way to define complex search conditions by combining the results of individual queries.
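A minimal sketch of a boolean query, written as a Python dictionary in the shape of the Query DSL, might look like the following. The field names title, body, and status and their values are hypothetical.

```python
import json

# Field names and values are hypothetical examples.
bool_query = {
    "bool": {
        # must: every clause here has to match.
        "must": [{"match": {"title": "marvel"}}],
        # should: optional clauses; matching them raises the relevance score.
        "should": [{"match": {"body": "avengers"}}],
        # must_not: documents matching these clauses are excluded.
        "must_not": [{"term": {"status": "draft"}}],
    }
}

# The compound query goes under the "query" key like any other query.
print(json.dumps({"query": bool_query}, indent=2))
```

Note how the clauses mix a full-text query (match) with a term-level query (term); a boolean query can nest any other query type, including further boolean queries.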
Search relevance
Search relevance is the measure of the accuracy of the relationship between the search query and the search results. In other words, search relevance is how well the search results answer the user’s question or satisfy their information needs.
Two measures commonly used to assess search relevance are precision and recall.
Precision
Precision is a measure of the accuracy of the search results. It is calculated as the number of relevant documents returned by the search divided by the total number of documents returned.
We can think of precision as the share of the documents returned by the search that are actually relevant to the user’s query. For example, if a user runs a query for “Pepperoni Pizza,” and the search returns 30 documents with “Pepperoni Pizza,” 15 documents with “Veggie Pizza,” and 5 documents with “Margherita Pizza,” then the precision is 60%. This is calculated as 30 (the number of returned “Pepperoni Pizza” documents) divided by 50 (the total number of returned documents), multiplied by 100.
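The arithmetic in this example can be checked with a few lines of Python. The precision function is a helper written for this sketch; it is not part of Elasticsearch.

```python
def precision(relevant_returned, total_returned):
    """Fraction of the returned documents that are relevant."""
    if total_returned == 0:
        raise ValueError("precision is undefined when no documents are returned")
    return relevant_returned / total_returned

# Result counts from the "Pepperoni Pizza" example above.
returned = {"Pepperoni Pizza": 30, "Veggie Pizza": 15, "Margherita Pizza": 5}

total = sum(returned.values())                     # 50 documents returned in total
p = precision(returned["Pepperoni Pizza"], total)  # 30 / 50

print(f"{p:.0%}")  # 60%
```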
Recall
Recall is a measure of the completeness of the search results. It is calculated as the number of relevant documents returned by the search divided by the total number of relevant documents that exist in the index.
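This definition can be sketched in Python using sets of document IDs. The recall function and the IDs below are hypothetical and serve only to show the calculation.

```python
def recall(returned_ids, relevant_ids):
    """Fraction of all relevant documents that the search returned."""
    if not relevant_ids:
        raise ValueError("recall is undefined when no relevant documents exist")
    return len(returned_ids & relevant_ids) / len(relevant_ids)

# Hypothetical document IDs, made up for this sketch.
relevant_ids = {1, 2, 3, 4, 5}   # every relevant document in the index
returned_ids = {1, 2, 3, 4, 9}   # documents the search returned

r = recall(returned_ids, relevant_ids)  # 4 of the 5 relevant documents were returned

print(f"{r:.0%}")  # 80%
```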
Recall measures how many of the relevant documents in the index the search actually retrieves. For example, let’s suppose a user runs a query for “Pepperoni Pizza.” In that case, 40 documents qualify as “Pepperoni Pizza,” and ...