Custom Analyzers: Character filters
Explore the built-in character filters offered by Elasticsearch.
Overview
A character filter is a component of an analyzer that receives the original text as a stream of characters and transforms that stream by adding, removing, or changing characters before it reaches the tokenizer. It is typically used to clean up the input text by removing special characters or by replacing specific characters or sequences of characters.
There are three types of character filters available in Elasticsearch:
- The HTML strip filter
- The mapping character filter
- The pattern-replace filter
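To make the idea of "transforming the character stream before tokenization" concrete, here is a minimal Python sketch. It is only an analogy, not how Elasticsearch implements these filters: the function names (mapping_filter, pattern_replace_filter, apply_char_filters) and the sample mappings are hypothetical, and the real mapping filter's matching rules may differ from the simple sequential replacement used here.

```python
import re

def mapping_filter(text, mappings):
    # Rough analogue of a mapping character filter:
    # replace each configured key with its value.
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text

def pattern_replace_filter(text, pattern, replacement):
    # Rough analogue of a pattern-replace character filter:
    # replace every regex match with the replacement string.
    return re.sub(pattern, replacement, text)

def apply_char_filters(text, filters):
    # Character filters run in order, each one seeing the
    # output of the previous one, before any tokenization.
    for char_filter in filters:
        text = char_filter(text)
    return text

# Hypothetical configuration: spell out "&" and mask digits.
filters = [
    lambda t: mapping_filter(t, {"&": "and"}),
    lambda t: pattern_replace_filter(t, r"\d", "#"),
]

print(apply_char_filters("Tom & Jerry 42", filters))  # Tom and Jerry ##
```

The point of the sketch is the ordering: every filter works on raw characters, and the tokenizer only ever sees the final, cleaned-up string.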
HTML strip
HTML strip (html_strip) is a character filter that removes HTML elements from the text and replaces HTML entities with their decoded values (for example, &amp; becomes &).
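The two steps described above (drop the tags, decode the entities) can be approximated in a few lines of Python. This is a rough sketch using only the standard library, not the Elasticsearch implementation; strip_html and TAG_PATTERN are hypothetical names, and the real filter handles cases this sketch does not attempt (for instance, it may turn block-level tags into line breaks).

```python
import html
import re

# Naive tag matcher: anything between "<" and the next ">".
# An assumption for illustration; real HTML parsing is more involved.
TAG_PATTERN = re.compile(r"<[^>]+>")

def strip_html(text):
    # Step 1: remove HTML elements (the tags themselves, not their content).
    without_tags = TAG_PATTERN.sub("", text)
    # Step 2: decode HTML entities such as &amp; or &lt;.
    return html.unescape(without_tags)

print(strip_html("<b>Fish &amp; chips</b>"))  # Fish & chips
```

Tags are removed before entities are decoded so that an escaped sequence such as &lt;b&gt; survives as the literal text <b> instead of being stripped as if it were markup.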
Example
The following request tests a custom analyzer that uses the html_strip character filter with the keyword tokenizer, which returns the entire text as a single token:
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<p>This is the <b>ElasticSearch</b> course.</p>"
}
The custom analyzer produces the ...