Heads up!
Before you dive into using NLP models with your Opensolr index, please contact us to request the NLP models to be installed for your Opensolr index.
We’ll reply with the correct path to use for the `.bin` files in your `schema.xml` or `solrconfig.xml`. Or, if you’d rather avoid all the hassle, just ask us to set it up for you: done and done.
This is your step-by-step guide to using AI-powered OpenNLP models with Opensolr. In this walkthrough, we’ll cover Named Entity Recognition (NER) using default OpenNLP models, so you can start extracting valuable information (like people, places, and organizations) directly from your indexed data.
⚠️ Note:
Currently, these models are enabled by default only in the Germany, Solr Version 9 environment. So, if you want an easy life, create your index there!
We’re happy to set up the models in any region (or even your dedicated Opensolr infrastructure for corporate accounts) if you reach out via our Support Helpdesk.
You can also download OpenNLP default models from us or the official OpenNLP website.
Create your Opensolr Index
Edit Your schema.xml
Select `schema.xml` to edit.
**Dynamic Field (for storing entities):**
<dynamicField name="*_s" type="string" multiValued="true" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" storeOffsetsWithPositions="true" />
**NLP Tokenizer fieldType:**
<fieldType name="text_nlp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="en-sent.bin"
               tokenizerModel="en-token.bin"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPChunkerFilterFactory" chunkerModel="en-chunker.bin"/>
    <filter class="solr.TypeAsPayloadFilterFactory"/>
  </analyzer>
</fieldType>
- **Important:** Don’t use the `text_nlp` type for your dynamic fields! It’s only for the update processor.
Save, then Edit Your solrconfig.xml
Add the following `updateRequestProcessorChain` (and corresponding `requestHandler`):
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">nlp</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="nlp">
  <!-- Extract English People Names -->
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">en-ner-person.bin</str>
    <str name="analyzerFieldType">text_nlp</str>
    <arr name="source">
      <str>title</str>
      <str>description</str>
    </arr>
    <str name="dest">people_s</str>
  </processor>
  <!-- Extract Spanish People Names -->
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">es-ner-person.bin</str>
    <str name="analyzerFieldType">text_nlp</str>
    <arr name="source">
      <str>title</str>
      <str>description</str>
    </arr>
    <str name="dest">people_s</str>
  </processor>
  <!-- Extract Locations -->
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">en-ner-location.bin</str>
    <str name="analyzerFieldType">text_nlp</str>
    <arr name="source">
      <str>title</str>
      <str>description</str>
    </arr>
    <str name="dest">location_s</str>
  </processor>
  <!-- Extract Organizations -->
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">en-ner-organization.bin</str>
    <str name="analyzerFieldType">text_nlp</str>
    <arr name="source">
      <str>title</str>
      <str>description</str>
    </arr>
    <str name="dest">organization_s</str>
  </processor>
  <!-- Language Detection -->
  <processor class="org.apache.solr.update.processor.OpenNLPLangDetectUpdateProcessorFactory">
    <str name="langid.fl">title,text,description</str>
    <str name="langid.langField">language_s</str>
    <str name="langid.model">langdetect-183.bin</str>
  </processor>
  <!-- Remove duplicate extracted entities -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldRegex">.*_s</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
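Because both the English and the Spanish person extractors write to `people_s`, the same name can end up in that field more than once, which is why the chain finishes with a de-duplication step. The Python sketch below only illustrates that idea; it is not Solr's implementation. The function name is made up for this example, and treating `fieldRegex` as an unanchored match is an assumption worth checking against the Solr documentation.

```python
import re


def dedupe_matching_fields(doc, field_regex=r".*_s"):
    """Return a copy of `doc` with duplicate values removed from matching fields.

    Illustration only: this mimics the intent of the de-duplication step in
    the chain above, not Solr's actual code. An unanchored regex search on
    the field name is an assumption.
    """
    pattern = re.compile(field_regex)
    cleaned = {}
    for name, value in doc.items():
        if pattern.search(name) and isinstance(value, list):
            # dict.fromkeys keeps the first occurrence of each value, in order.
            cleaned[name] = list(dict.fromkeys(value))
        else:
            # Non-matching or single-valued fields pass through unchanged.
            cleaned[name] = value
    return cleaned
```

Fields whose names do not match the pattern, such as `title` or `description`, are left untouched.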
Populate Test Data (for the impatient!)
Sample JSON:
{
  "id": "1",
  "title": "Jack Sparrow was a pirate. Many feared him. He used to live in downtown Las Vegas.",
  "description": "Jack Sparrow and Janette Sparrowa, are now on their way to Monte Carlo for the summer vacation, after working hard for Microsoft, creating the new and exciting Windows 11 which everyone now loves. :)",
  "text": "The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component. Learn more about how you can get involved."
}
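One way to send this sample document to the index is Solr's JSON update endpoint. Since `/update` defaults to the `nlp` chain in the configuration above, posting there should run the extractors. The following is a minimal sketch in Python using only the standard library. The host, index name, and function names are placeholders invented for this example, and it assumes the index accepts updates without credentials (add an `Authorization` header if yours is protected).

```python
import json
import urllib.request

# Hypothetical placeholders: use the host and index name shown for your own
# Opensolr index. Neither value comes from this guide.
SOLR_BASE_URL = "https://YOUR_HOST/solr"
INDEX_NAME = "your_index"


def build_update_request(base_url, index_name, docs):
    """Build a POST request that sends `docs` to Solr's JSON update handler."""
    url = f"{base_url.rstrip('/')}/{index_name}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_documents(base_url, index_name, docs):
    """Send the documents and return Solr's parsed JSON response."""
    request = build_update_request(base_url, index_name, docs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Shortened copy of the sample document (the "text" field is omitted here).
sample_doc = {
    "id": "1",
    "title": "Jack Sparrow was a pirate. Many feared him. "
             "He used to live in downtown Las Vegas.",
    "description": "Jack Sparrow and Janette Sparrowa, are now on their way "
                   "to Monte Carlo for the summer vacation, after working "
                   "hard for Microsoft, creating the new and exciting "
                   "Windows 11 which everyone now loves. :)",
}

# Uncomment once the placeholders above point at a real index:
# print(index_documents(SOLR_BASE_URL, INDEX_NAME, [sample_doc]))
```

The `commit=true` parameter makes the document searchable right away, which is convenient for a quick test but usually not what you want for bulk indexing.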
See the Magic!
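To check what was extracted, you can query the document back and look at the destination fields from the update chain (`people_s`, `location_s`, `organization_s`, `language_s`). The sketch below uses Solr's standard `/select` handler. The host, index name, and function names are hypothetical, and it again assumes no authentication is required.

```python
import json
import urllib.parse
import urllib.request

# Destination fields configured in the update chain above.
ENTITY_FIELDS = ["people_s", "location_s", "organization_s", "language_s"]


def build_select_url(base_url, index_name, doc_id):
    """Build a /select URL that returns only the id and the entity fields."""
    params = urllib.parse.urlencode({
        "q": f"id:{doc_id}",
        "fl": ",".join(["id"] + ENTITY_FIELDS),
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{index_name}/select?{params}"


def extract_entities(solr_response):
    """Map each returned document id to the entity fields stored for it."""
    results = {}
    for doc in solr_response.get("response", {}).get("docs", []):
        results[doc["id"]] = {
            field: doc[field] for field in ENTITY_FIELDS if field in doc
        }
    return results


def fetch_entities(base_url, index_name, doc_id):
    """Query Solr and return the extracted entities for one document."""
    url = build_select_url(base_url, index_name, doc_id)
    with urllib.request.urlopen(url) as response:
        return extract_entities(json.load(response))


# Hypothetical host and index name; replace before running:
# print(fetch_entities("https://YOUR_HOST/solr", "your_index", "1"))
```

A field that is missing from the result simply means the corresponding model found nothing in `title` or `description` for that document.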
If any step trips you up, contact us and we’ll gladly assist you—whether it’s model enablement, schema help, or just a friendly chat about Solr and AI. 🤝
Happy Solr-ing & entity extracting!