edgy_text and edgeNgram Explanation
edgy_text Field Type: The edgy_text field type typically uses Edge N-gram tokenization to create prefix-based tokens at index time. Because every prefix of each term is indexed, a user can type just the beginning of a word (or term) and still get matches, which is what autocomplete and search-as-you-type features need.
How edgy_text works in Solr: The edgy_text field type in Solr is usually implemented using the Edge N-gram filter. This filter generates the prefixes of each token in the input text and indexes those prefixes. It works well for fields where you want prefix-based matching or autocomplete-like behavior.
edgeNgram Field Type: The edgeNgram filter, when used in a Solr field type, breaks each word of the input data (like product names, categories, or SKUs) into progressively longer prefixes.
If a field like product_name contains “laptop”, the edgeNgram filter (with a minimum gram size of 1) produces the following tokens:
- “l”
- “la”
- “lap”
- “lapt”
- “lapto”
- “laptop”
This allows a search query like “lap” to match “laptop”, which makes the filter a good fit for search-as-you-type or autocomplete features.
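The prefix expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Solr's implementation; the function name edge_ngrams and its parameters min_gram and max_gram are made up here to mirror the filter's minGramSize and maxGramSize settings.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 25) -> list[str]:
    """Return the prefixes of `token` with lengths from min_gram to max_gram.

    Hypothetical helper that mimics what an edge n-gram filter emits
    for a single, already tokenized word.
    """
    # Never ask for a prefix longer than the token itself
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("laptop"))
# ['l', 'la', 'lap', 'lapt', 'lapto', 'laptop']
```

Raising min_gram drops the shortest prefixes, which reduces index size at the cost of requiring more typed characters before anything matches.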
Here’s how you can configure both edgy_text and edgeNgram in Solr:
edgy_text FieldType Definition: This edgy_text field type uses the Edge N-gram filter to break words into progressively longer prefixes.
<fieldType name="edgy_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Standard tokenizer splits the text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase before building prefixes so matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Edge N-gram filter to create prefixes from each word -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same tokenization, but no Edge N-gram filter: the typed prefix is matched as-is against the indexed prefixes -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
minGramSize="1": Specifies that the smallest prefix will be of length 1 (e.g., “l” from “laptop”).
maxGramSize="25": Limits the generated prefixes to a maximum of 25 characters. You can adjust this based on your needs.
edgy_text for Fields: Once you’ve defined this field type, you can apply it to any field where you want to enable prefix matching. For instance, for a product name or SKU:
<field name="product_name" type="edgy_text" indexed="true" stored="true"/>
<field name="sku" type="edgy_text" indexed="true" stored="true"/>
These fields now support prefix matching, so a query for “lap” returns documents containing “laptop”.
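One detail worth checking in any such setup is whether the query analyzer also expands the typed text into edge n-grams. The sketch below is a simplified simulation, not Solr code: analyze, index_terms, and matches are hypothetical helpers, and the query is treated as an OR over its terms. It suggests why expanding the query tends to over-match, since a query for “lap” then also contains the term “l”.

```python
import re


def analyze(text: str) -> list[str]:
    """Rough stand-in for tokenizing and lowercasing (splits on non-alphanumerics)."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def index_terms(text: str, min_gram: int = 1, max_gram: int = 25) -> set[str]:
    """Index-time analysis: expand every token into its edge n-grams."""
    terms = set()
    for token in analyze(text):
        for n in range(min_gram, min(len(token), max_gram) + 1):
            terms.add(token[:n])
    return terms


def matches(query: str, doc: str, ngram_query: bool) -> bool:
    """Return True if any query term equals any indexed term of the document."""
    doc_terms = index_terms(doc)
    # Either expand the query into prefixes too, or keep its tokens as typed
    query_terms = index_terms(query) if ngram_query else set(analyze(query))
    return bool(doc_terms & query_terms)


print(matches("lap", "Gaming Laptop", ngram_query=False))  # True
print(matches("lap", "Leather bag", ngram_query=False))    # False
print(matches("lap", "Leather bag", ngram_query=True))     # True: both sides contain "l"
```

Under these assumptions, keeping the Edge N-gram filter on the index side only gives the expected prefix behavior.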
Consider an e-commerce site where users type product names into a search box. As the user types the beginning of a product name (e.g., “lap” for “laptop”), Solr can return relevant results quickly because the typed prefix is looked up as an ordinary indexed term: the field already contains the prefixes of each word in the product name. The trade-off is a larger index, since every word is stored as several tokens.
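As a rough sketch of what the search box might send on each keystroke, the snippet below builds a request URL for Solr's select handler. The host, the core name products, and the helper build_suggest_url are assumptions made for illustration; q, df, q.op, fl, and rows are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_suggest_url(
    prefix: str,
    base_url: str = "http://localhost:8983/solr/products",  # hypothetical core
    rows: int = 5,
) -> str:
    """Build a select URL that searches product_name for the typed prefix."""
    params = {
        "q": prefix,            # text typed so far; query syntax characters are not escaped here
        "df": "product_name",   # default field: the edgy_text field defined earlier
        "q.op": "AND",          # require every typed word to match
        "fl": "product_name",   # return only the product name
        "rows": rows,           # keep the suggestion list short
    }
    return f"{base_url}/select?{urlencode(params)}"


print(build_suggest_url("lap"))
# http://localhost:8983/solr/products/select?q=lap&df=product_name&q.op=AND&fl=product_name&rows=5
```

A real client would also escape or reject characters that have special meaning in Solr's query syntax before sending user input.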
edgy_text with Edge N-gram: This combination helps create an efficient, powerful search feature for prefix-based matches, such as autocompletion.