🚀 UPDATE Oct 29, 2024: OpenNLP + Opensolr Integration Guide

Heads up!
Before you dive into using NLP models with your Opensolr index, please contact us to request the NLP models to be installed for your Opensolr index.
We’ll reply with the correct path to use for the .bin files in your schema.xml or solrconfig.xml. Or, if you’d rather avoid all the hassle, just ask us to set it up for you—done and done.


What’s this all about?

This is your step-by-step guide to using AI-powered OpenNLP models with Opensolr. In this walkthrough, we’ll cover Named Entity Recognition (NER) using default OpenNLP models, so you can start extracting valuable information (like people, places, and organizations) directly from your indexed data.

⚠️ Note:
Currently, these models are enabled by default only in the Germany, Solr Version 9 environment. So, if you want an easy life, create your index there!
We’re happy to set up the models in any region (or even your dedicated Opensolr infrastructure for corporate accounts) if you reach out via our Support Helpdesk.

You can also download OpenNLP default models from us or the official OpenNLP website.


🛠️ Step-by-Step: Enable NLP Entity Extraction

  1. Create your Opensolr Index

    • Use this guide to create your Opensolr index (Solr 7, 8, or 9).
    • Pro Tip: Creating your index in the Germany Solr 9 Web Crawler Environment skips most of the manual steps below.
  2. Edit Your schema.xml

    • Go to the Opensolr Control Panel.
    • Click your Index Name → Configuration tab → select schema.xml to edit.

    • Add these snippets:

    Dynamic Field (for storing entities):

<dynamicField name="*_s" type="string" multiValued="true" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" storeOffsetsWithPositions="true" />

    NLP Tokenizer fieldType:

<fieldType name="text_nlp" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.OpenNLPTokenizerFactory"
            sentenceModel="en-sent.bin"
            tokenizerModel="en-token.bin"/>
         <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="en-pos-maxent.bin"/>
         <filter class="solr.OpenNLPChunkerFilterFactory" chunkerModel="en-chunker.bin"/>
         <filter class="solr.TypeAsPayloadFilterFactory"/>
     </analyzer>
 </fieldType>

    • Important: Don’t use the text_nlp type for your dynamic fields! It’s only referenced by the update processor, through analyzerFieldType, to analyze text during entity extraction.
  3. Save, then Edit Your solrconfig.xml

    • Add the following updateRequestProcessorChain (and corresponding requestHandler):
<requestHandler name="/update" class="solr.UpdateRequestHandler" >
    <lst name="defaults">
        <str name="update.chain">nlp</str>
    </lst>
</requestHandler>
<updateRequestProcessorChain name="nlp">
    <!-- Extract English People Names -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">en-ner-person.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">people_s</str>
    </processor>
    <!-- Extract Spanish People Names -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">es-ner-person.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">people_s</str>
    </processor>
    <!-- Extract Locations -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">en-ner-location.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">location_s</str>
    </processor>
    <!-- Extract Organizations -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">en-ner-organization.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">organization_s</str>
    </processor>
    <!-- Language Detection -->
    <processor class="org.apache.solr.update.processor.OpenNLPLangDetectUpdateProcessorFactory">
        <str name="langid.fl">title,text,description</str>
        <str name="langid.langField">language_s</str>
        <str name="langid.model">langdetect-183.bin</str>
    </processor>
    <!-- Remove duplicate extracted entities -->
    <processor class="solr.UniqFieldsUpdateProcessorFactory">
        <str name="fieldRegex">.*_s</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
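
Before saving, it can help to double-check which model writes to which field. The following is a minimal Python sketch, not part of Opensolr: it parses a shortened copy of the chain above with the standard library and lists each NER processor's model, source fields, and destination field. The helper names (summarize_ner_processors, last_processor_class) are made up for illustration, and the check that solr.RunUpdateProcessorFactory comes last reflects the usual Solr convention rather than anything Opensolr-specific.

```python
import xml.etree.ElementTree as ET

NER_CLASS = "solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory"

# Shortened copy of the chain from this guide (one NER processor only).
CHAIN_XML = """
<updateRequestProcessorChain name="nlp">
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">en-ner-person.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">people_s</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
"""


def summarize_ner_processors(chain_xml):
    """Return model, source fields and dest field for each NER processor."""
    root = ET.fromstring(chain_xml)
    summary = []
    for processor in root.findall("processor"):
        if processor.get("class") != NER_CLASS:
            continue  # skip language detection, dedupe, run processor, etc.
        summary.append({
            "model": processor.findtext("str[@name='modelFile']"),
            "sources": [s.text for s in processor.findall("arr[@name='source']/str")],
            "dest": processor.findtext("str[@name='dest']"),
        })
    return summary


def last_processor_class(chain_xml):
    """Return the class of the final processor in the chain."""
    return ET.fromstring(chain_xml).findall("processor")[-1].get("class")


for entry in summarize_ner_processors(CHAIN_XML):
    print(entry["model"], entry["sources"], "->", entry["dest"])
```

Every dest value should end in _s so that it matches the dynamic field added in schema.xml.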
  4. Populate Test Data (for the impatient!)

    • If you’re using the Germany Solr 9 Web Crawler, you can crawl your site and extract all the juicy entities automatically.
    • Or, insert a sample doc via Solr Admin:

    Sample JSON:

{
    "id": "1",
    "title": "Jack Sparrow was a pirate. Many feared him. He used to live in downtown Las Vegas.",
    "description": "Jack Sparrow and Janette Sparrowa, are now on their way to Monte Carlo for the summer vacation, after working hard for Microsoft, creating the new and exciting Windows 11 which everyone now loves. :)",
    "text": "The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component. Learn more about how you can get involved."
}
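
If you’d rather script this than click through Solr Admin, the same document can be posted to Solr’s JSON update handler. Here is a minimal Python sketch using only the standard library. The host name and index name in SOLR_BASE are placeholders, not real Opensolr values, and build_update_request is a made-up helper. Your index may also require credentials, which this sketch leaves out.

```python
import json
import urllib.request

# Hypothetical placeholder: use the host and index name shown in your control panel.
SOLR_BASE = "https://example-host.opensolr.com/solr/my_index"


def build_update_request(doc, base_url=SOLR_BASE):
    """Build a POST request for Solr's JSON update handler."""
    # commit=true asks Solr to commit right away so the doc becomes searchable.
    url = base_url + "/update?commit=true"
    # The JSON update handler accepts a list of documents.
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sample_doc = {
    "id": "1",
    "title": "Jack Sparrow was a pirate. Many feared him. "
             "He used to live in downtown Las Vegas.",
    "description": "Jack Sparrow and Janette Sparrowa, are now on their way "
                   "to Monte Carlo for the summer vacation.",
}

request = build_update_request(sample_doc)
print(request.get_method(), request.full_url)

# To actually send it (needs network access and a real index URL):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Because the /update handler above defaults to the nlp chain, a document posted this way should pass through the entity extraction processors without any extra parameters.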
  5. See the Magic!

    • Visit the query tab to see extracted entities in action!
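
The same check can be done from a script. The Python sketch below builds a /select URL that returns only the entity fields from this guide, and pulls those fields out of a parsed JSON response. SOLR_BASE is a placeholder, the helper names are made up for illustration, and the response in the usage lines is hand-written in Solr’s standard JSON shape. It is not real model output, and what the models actually extract from your data may differ.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: use the host and index name shown in your control panel.
SOLR_BASE = "https://example-host.opensolr.com/solr/my_index"

# Destination fields written by the update chain in this guide.
ENTITY_FIELDS = ["people_s", "location_s", "organization_s", "language_s"]


def build_select_url(base_url=SOLR_BASE, query="*:*"):
    """Build a query URL that returns only the id and the entity fields."""
    params = {
        "q": query,
        "fl": ",".join(["id"] + ENTITY_FIELDS),
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


def entities_by_doc(solr_response):
    """Map each document id to its entity fields (empty list if absent)."""
    result = {}
    for doc in solr_response["response"]["docs"]:
        result[doc["id"]] = {field: doc.get(field, []) for field in ENTITY_FIELDS}
    return result


# Hand-written example response, for illustration only.
fake_response = {
    "response": {
        "numFound": 1,
        "docs": [{"id": "1", "people_s": ["Jack Sparrow"]}],
    }
}

print(build_select_url())
print(entities_by_doc(fake_response))
```

Fields that were not populated for a document simply come back as empty lists, which makes it easy to spot a model that isn’t wired up yet.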


Need a hand?

If any step trips you up, contact us and we’ll gladly assist you—whether it’s model enablement, schema help, or just a friendly chat about Solr and AI. 🤝


Happy Solr-ing & entity extracting!





