Hybrid Search

Opensolr Hybrid Search — find answers to your questions


Step-by-Step Guide
Testing Your Opensolr AI Search Engine
Four powerful features ship with every Opensolr Web Crawler index — intent-based Vector Search, instant AI Hints, one-click Document Reader, and hands-on Query Elevation.
Crawl → Index → Embed → Solr → Search
Your complete AI search pipeline — fully managed, out of the box
Intent-Based Vector Search
Instead of matching exact keywords, vector search understands what you mean. A query like "winter hat" finds wool beanies, fleece earflap caps, and knit headwear — even when those exact words aren't on the page. Opensolr uses BGE-m3 embeddings (1024 dimensions) combined with traditional BM25 scoring for the best of both worlds: semantic understanding plus keyword precision.
Example: the query "winter hat" returns Wool Winter Cap (98%), Knit Beanie Set (94%), and Fleece Earflap Hat (89%), matched via BGE-m3 1024-dimensional vector embeddings.
Hybrid Scoring (BM25 + Vectors) · BGE-m3 1024-dim · Multilingual
AI Hints — Instant Answers from Your Content
Before your users even scroll through results, AI Hints delivers a concise, AI-generated answer right at the top of the page. It uses RAG (Retrieval-Augmented Generation) — the AI retrieves the most relevant passages from YOUR indexed content, then generates a focused answer. No hallucinations, no external data — every hint is grounded in your actual pages.
Example: for "best pellet heater for garage?", the AI Hint (retrieved from YOUR indexed content via RAG) might read: look for 40,000+ BTU models with a thermostat; ventilation is required for enclosed spaces; see the top-rated pellet heaters in the results below.
RAG-Powered · Grounded in Your Data · Zero Hallucinations
Document Reader — Summarize Any Search Result
Every search result includes a "Read" button. Click it, and the AI reads the entire web page, extracts the key information, and generates a clean summary — in seconds. You can then download the summary as a PDF. No need to visit the page, skim through ads, or parse dense content yourself.
Example: a result like "Best Pellet Heaters 2026 — Expert Reviews" (heatersguide.com/pellet-heaters-2026) gets a Read button; the AI Reader's page summary might read "Top 5 pellet heaters ranked by efficiency, noise level, and value. Castle 12327 rated best overall at $1,299...", with a Download PDF option.
One-Click Summaries · PDF Export · Key Feature Extraction
Query Elevation — Pin & Exclude Search Results
Take full control of what your users see. Query Elevation lets you pin important results to the top or exclude irrelevant ones — directly from the Search UI, with zero code and no reindexing required. Perfect for promoting landing pages, burying outdated content, or curating high-value queries.
Example: in the search results, a Product Landing Page (yoursite.com/products/best-seller) can be pinned to #1, forced to the top for this query, while an excluded result is hidden from it. Drag to reorder when multiple results are pinned.
  • Pin — Force a specific result to the top for a given search query
  • Exclude — Hide a result completely so it never appears for that query
  • Exclude All — Apply the rule globally, across every search query
  • Drag & drop — Reorder pinned results to control exactly which one shows first
Zero Code Required · Exclude Irrelevant Results · Pin & Reorder

Try It Live

Test these demo search engines with real vector search. Try conceptual, intent-based queries to see how vector similarity goes beyond keyword matching:

  • climate disasters hurricanes floods wildfires
  • space exploration mars colonization economy
  • ancient microbes life beyond earth

Every demo page includes built-in dev tools — query parameter inspector, full Solr debugQuery output, crawl statistics, and search analytics.


Using the Solr API Directly

Advanced users can query the Solr API directly, using the hybrid search techniques described below.

Example Solr endpoints (credentials: 123 / 123):

https://de9.solrcluster.com/solr/vector/select?wt=json&indent=true&q=*:*&rows=2
https://fi.solrcluster.com/solr/rueb/select?wt=json&indent=true&q=*:*&rows=2
https://chicago96.solrcluster.com/solr/peilishop/select?wt=json&indent=true&q=*:*&rows=2

Simple Lexical Query

curl -u 123:123 "https://de9.solrcluster.com/solr/vector/select?q=climate+change&rows=5&wt=json"

Pure Vector Query (KNN)

curl -u 123:123 "https://de9.solrcluster.com/solr/vector/select?q={!knn%20f=embeddings%20topK=50}[0.123,0.432,0.556,...]&wt=json"

Replace the vector array with your own embedding from the OpenSolr AI NLP API.

Hybrid Query (Lexical + Vector)

curl -u 123:123 "https://de9.solrcluster.com/solr/vector/select?q={!bool%20should=$lexicalQuery%20should=$vectorQuery}&lexicalQuery={!edismax%20qf=content}climate+change&vectorQuery={!knn%20f=embeddings%20topK=50}[0.12,0.43,0.66,...]&wt=json"

Combines traditional keyword scoring with semantic vector similarity — best of both worlds.


Getting Embeddings via OpenSolr API

Generate vector embeddings for any text using the Opensolr embed endpoint:

function postEmbeddingRequest($email, $api_key, $core_name, $payload) {
    $apiUrl = "https://api.opensolr.com/solr_manager/api/embed";
    $postFields = http_build_query([
        'email'      => $email,
        'api_key'    => $api_key,
        'index_name' => $core_name,
        'payload'    => is_array($payload) ? json_encode($payload) : $payload
    ]);

    $ch = curl_init($apiUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postFields,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/x-www-form-urlencoded'],
        CURLOPT_TIMEOUT        => 30,
    ]);

    $response = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on transport errors
    return $response === false ? null : json_decode($response, true);
}

The response includes the vector embedding array you can pass directly to Solr.
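The same round trip can be sketched in Python using only the standard library: take the embedding array from the embed response and wrap it in a `{!knn}` query string. This is an illustrative sketch, not part of any Opensolr SDK; `knn_param` and `solr_knn_search` are helper names of our own.

```python
import json
import urllib.parse
import urllib.request

def knn_param(embedding, field="embeddings", top_k=50):
    """Format an embedding list as a Solr {!knn} query parser string."""
    vector = ",".join(f"{v:g}" for v in embedding)
    return f"{{!knn f={field} topK={top_k}}}[{vector}]"

def solr_knn_search(solr_url, user, password, embedding, rows=5):
    """POST a pure-vector KNN query to Solr using HTTP basic auth."""
    data = urllib.parse.urlencode(
        {"q": knn_param(embedding), "rows": rows, "wt": "json"}
    ).encode()
    req = urllib.request.Request(solr_url, data=data)
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, solr_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(mgr)
    )
    with opener.open(req) as resp:
        return json.load(resp)
```

For example, `solr_knn_search("https://de9.solrcluster.com/solr/vector/select", "123", "123", embedding)` would run the pure vector query shown above, where `embedding` is the 1024-dimension array returned by the embed endpoint.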


Code Examples

PHP

<?php
$url = 'https://de9.solrcluster.com/solr/vector/select?wt=json';
$params = [
    'q'            => '{!bool should=$lexicalQuery should=$vectorQuery}',
    'lexicalQuery' => '{!edismax qf=content}climate disasters',
    'vectorQuery'  => '{!knn f=embeddings topK=50}[0.12,0.43,0.56,0.77]'
];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERPWD, '123:123');
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($params));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

echo $response;

Python

import requests
from requests.auth import HTTPBasicAuth

url = "https://de9.solrcluster.com/solr/vector/select"
params = {
    'q': '{!bool should=$lexicalQuery should=$vectorQuery}',
    'lexicalQuery': '{!edismax qf=content}climate disasters',
    'vectorQuery': '{!knn f=embeddings topK=50}[0.12,0.43,0.56,0.77]',
    'wt': 'json'
}

response = requests.post(url, data=params, auth=HTTPBasicAuth('123', '123'))
print(response.json())

JavaScript (AJAX)

<script>
fetch('https://de9.solrcluster.com/solr/vector/select?wt=json&q={!knn%20f=embeddings%20topK=10}[0.11,0.22,0.33]', {
    headers: { 'Authorization': 'Basic ' + btoa('123:123') }
})
.then(r => r.json())
.then(console.log);
</script>

Quick Reference

  • Adjust topK to control how many similar results to retrieve (usually 20-100).
  • Use {!bool should=...} for softer relevance mixing — vector similarity has more influence on ranking.
  • For best hybrid results, always combine both lexical and vector queries.
  • All demo search pages include built-in query inspector, debugQuery, crawl stats, and search analytics.
Ready to Add AI Search to Your Site?
Get a fully managed vector search engine with AI Hints and Document Reader — set up in minutes.

Search Tuning — Per-Index Relevancy Controls

Search Relevancy
Search Tuning — Fine-Tune How Your Search Ranks Results
Every index is different. A news site needs freshness. A product catalog needs exact matches. A knowledge base needs semantic understanding. Search Tuning gives you visual controls to shape relevancy per index — no code, no config files, instant effect.
Example — default settings vs. after tuning, for the query "wireless headphones review":

Default settings:
  1. Sony WH-1000XM5 Review — posted yesterday, comprehensive review
  2. Best Wireless Headphones 2024 — old roundup from 2 years ago
  3. Headphone Buying Guide

After tuning:
  1. Sony WH-1000XM5 Review (FRESH) — posted yesterday, boosted by freshness
  2. Headphone Buying Guide — semantically relevant, matched by meaning
  3. Best Wireless Headphones 2024 — older content ranked lower

Freshness boost + semantic balance pushed the new review up and demoted stale content.

Where to Find It
Search Tuning lives inside Index Settings in your Opensolr dashboard. Open your index, click the gear icon or Index Settings, and expand the Search Tuning section. Every change saves automatically — move a slider, and your very next search uses the new settings.

The Six Controls

Field Weights
Control how much each field contributes to relevancy ranking. The four searchable fields are Title, Description, URI, and Text (full body). Use the master slider to quickly shift between title-focused ranking (great for navigational queries) and text-focused ranking (great for deep content search).
Default: Title 5.0, Description 4.0, URI 0.5, Text 0.01 — title-heavy. Drag the master slider right to give body text more influence, or type exact values into each field.
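These weights correspond directly to Solr's edismax `qf` parameter (the hybrid examples further down this page use the same syntax). A tiny illustrative helper, not Opensolr's internal code:

```python
def qf_param(weights):
    """Render a mapping of field weights as a Solr edismax qf parameter,
    e.g. {"title": 5.0} -> "title^5"."""
    return " ".join(f"{field}^{boost:g}" for field, boost in weights.items())

# The documented defaults would render as:
# qf_param({"title": 5.0, "description": 4.0, "uri": 0.5, "text": 0.01})
```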
Freshness Boost
How much newer content is preferred over older content. Higher values push recently published or updated pages toward the top. Uses the document's creation_date field with a time-decay curve — recent documents get the biggest boost, which fades over days and weeks.
Range: 10 (barely noticeable) to 1000 (aggressively fresh). Default: 100. Only applies when search mode is set to "Fresh" — standard search ignores this setting.
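Solr's `recip()` function is the usual way to express such a time decay: `recip(x,m,a,b) = a / (m*x + b)`. A sketch of the shape of the curve — the constants below are illustrative only, not Opensolr's actual tuning:

```python
def recip(x, m, a, b):
    """Solr's recip() function query: a / (m*x + b)."""
    return a / (m * x + b)

def freshness_boost(age_days, strength=100):
    """Illustrative decay over document age in days: a document published
    today gets the full boost, and the boost fades as age grows."""
    return strength * recip(age_days, 1, 1, 1)
```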
Minimum Match
How many of the user's search words must appear in a document for it to be considered a match. Three presets:
  • Flexible — some words can be missing; shows more results
  • Balanced — most words must match; a good middle ground
  • Strict — all words must match; fewest but most precise results
Default: System-managed (adapts automatically for vector indexes). Choose a preset to override.
Semantic vs Keyword Balance
Controls how much weight goes to semantic (vector) understanding versus exact keyword matching. Only available on vector-enabled indexes (those with embeddings in the schema). Move left for keyword-heavy results, right for semantic-heavy.
Range: 0.0 (pure keyword) to 3.0 (heavily semantic). Default: 1.5 — balanced. The system also adapts dynamically based on query length (longer queries get more semantic weight), but your override takes priority.
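Conceptually the balance is a weighted blend of the two scores. An illustrative sketch — `hybrid_score` and `adaptive_weight` are our own names, not Opensolr internals:

```python
def hybrid_score(lexical, vector, semantic_weight=1.5):
    """Blend a BM25 (lexical) score with a vector-similarity score.
    semantic_weight mirrors the 0.0-3.0 slider: 0.0 ignores vectors,
    higher values let semantic similarity dominate."""
    return lexical + semantic_weight * vector

def adaptive_weight(query, base=1.5):
    """Illustrative heuristic only: longer queries lean more semantic,
    capped at the slider's 3.0 maximum."""
    return min(3.0, base + 0.1 * max(0, len(query.split()) - 3))
```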
Result Quality Threshold
The minimum relevance score a document must reach to appear in results. Raise it to filter out weak matches and show only highly relevant results. Lower it to be more inclusive and show everything that has some match.
Range: 0.0 (show everything) to 1.0 (only near-perfect matches). Default: 0.60 — filters out low-relevance noise while keeping useful results.
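In effect the threshold is a post-filter on normalized scores; an illustrative sketch:

```python
def apply_threshold(results, threshold=0.60):
    """Drop results whose relevance score falls below the threshold.
    Scores here are assumed already normalized to the 0.0-1.0 range."""
    return [r for r in results if r["score"] >= threshold]
```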
Results Per Page
How many search results are returned in each page. Applies to both the Opensolr Search UI and API responses. Higher values show more results but increase response size.
Range: 10 to 200. Default: 50. Adjust based on your UI layout — grid layouts work well with 20-30, list layouts with 50+.

How It Works — Under the Hood

  1. You move a slider — change any control in the Search Tuning panel. The value saves automatically after a 400ms debounce; no Save button needed.
  2. Stored per index — your custom value is saved to your index configuration. A NULL value means "use system defaults," so resetting a control removes the override entirely.
  3. Applied on next search — when a search request comes in, the engine loads your custom values and applies them as overrides on top of the system defaults. No reindexing, no restart. The very next query uses your tuning.

Reset Behavior

Every control has its own Reset button that restores it to the system default. There's also a Reset All to Defaults button at the bottom of the panel that clears all customizations at once.

Reset Individual Control
Click the Reset button next to any control. The value goes back to system default and the override is removed from your index. System defaults include adaptive behavior — for example, vector indexes automatically adjust semantic weight based on query length.
Reset All to Defaults
Clears every custom value at once. Your index goes back to behaving exactly like it did before you opened Search Tuning. All adaptive behaviors are restored.

Quick Recipes

News Site Prioritize fresh articles
Set Freshness Boost to 500-800. Set Minimum Match to Flexible. Leave field weights at defaults — titles already have the highest weight, and news articles have strong titles.
Knowledge Base Semantic understanding first
Set Semantic vs Keyword to 2.0-2.5 (more semantic). Set Minimum Match to Flexible. Set Field Weights — increase Text weight to 1.0+ so body content has more influence. Freshness doesn't matter for evergreen docs, keep it low (10-30).
E-Commerce Exact product matches
Set Minimum Match to Strict — users searching for "blue wireless headphones" should see results with all three words. Keep Semantic at 1.0-1.5 so typos still work. Set Result Quality Threshold to 0.70+ to cut weak matches. Results Per Page at 20-30 for grid layouts.
Blog / Content Site Deep content discovery
Increase Text field weight to 0.5-1.0 (use the master slider toward "Text-focused"). Set Freshness at 100-200 for moderate recency bias. Minimum Match on Balanced. Semantic at 2.0 for natural-language queries that blog readers tend to use.

Defaults at a Glance

  • Title Weight: default 5.0, range 0 – 20
  • Description Weight: default 4.0, range 0 – 20
  • URI Weight: default 0.5, range 0 – 20
  • Text Weight: default 0.01, range 0 – 20
  • Freshness Boost: default 100, range 10 – 1,000
  • Minimum Match: default System-managed; options Flexible / Balanced / Strict
  • Semantic vs Keyword: default 1.5, range 0.0 – 3.0
  • Result Quality Threshold: default 0.60, range 0.0 – 1.0
  • Results Per Page: default 50, range 10 – 200

FAQ

Do I need to reindex after changing tuning settings?
No. Search Tuning controls are applied at query time, not index time. Your changes take effect on the very next search request.
What happens if I don't customize anything?
Everything stays at system defaults. The search engine uses battle-tested defaults that work well for most use cases, including adaptive behavior for vector indexes that adjusts parameters based on query length.
Does Semantic vs Keyword show up for all indexes?
No. It only appears on vector-enabled indexes — those using the embeddings field for semantic search. Non-vector indexes use pure keyword search, so the control isn't shown.
Does Freshness Boost always apply?
Only when the user searches with Fresh mode enabled (the "Fresh" toggle on the search UI, or fresh=yes in the API). Standard search does not apply freshness boosting regardless of this setting.
Can I set different tuning for different indexes?
Yes — that's the whole point. Every index has its own Search Tuning settings. A news index can have high freshness and flexible matching, while a product index on the same account has strict matching and low freshness. Each index is tuned independently.

Ready to Tune Your Search?
Open Index Settings in your dashboard and expand Search Tuning. Changes take effect on the very next search.


Comparison
Opensolr Web Crawler vs Algolia
You need search on your website — or you have data that needs to be searchable. Here's what that actually takes with each platform, and why Opensolr gives you a complete search engine while Algolia gives you an API and a to-do list.
How Opensolr Works — From Zero to Full Search in Minutes
  1. Create Index: name your index with the __dense suffix for vector search support
  2. Paste Your URL: enter your website URL; configure scope, depth and follow rules (or just use defaults)
  3. Start Crawl: click Start; the multi-threaded crawler with JS rendering handles HTML, PDF, DOCX, Excel, PPT and more
  4. Monitor: crawl stats show progress, pages crawled, errors and status codes

What You Get — Included, No Extra Cost
  • Hybrid Search: Vector + Lexical + RRF, 3 tunable modes
  • AI Search Hints: streaming LLM answers from your own content
  • Full Search UI: dark/light, facets, infinite scroll, mobile-ready, embed code
  • Query Elevation: pin and exclude results per query or globally
  • Analytics: top queries, zero-results, click tracking and CTR
  • JS Rendering: auto-detects React, Next.js, Angular, Vue and more
  • 21 File Formats: PDF, DOCX, XLSX, PPTX, ODT, RTF, MSG and more
  • Price Extraction: auto-extracts prices with range slider
  • Spellcheck: "Did you mean?" plus vector semantic understanding
  • Data Ingestion API: push JSON or upload files, up to 100 docs per batch
  • Dedup Protection: URI-based document identity auto-rejects duplicates
  • Rich Text Extraction: PDF, DOCX, PPTX, ODT auto-extracted via API

All included. Fixed monthly price. No per-query charges. No per-record fees. Crawl your website, push data via API, or both. Same index, same search, same everything.

What It Actually Takes to Get Search Working
The real comparison isn't features — it's effort.
Algolia
Steps to get search on your website
  1. Sign up for an Algolia account
  2. Read their API documentation
  3. Structure your content as JSON records
  4. Write code to push records via their API
  5. Build a frontend UI with InstantSearch widgets
  6. Configure ranking and relevance rules
  7. Set up analytics (paid add-on on some plans)
  8. Write update scripts when content changes
  9. Hope your bill doesn't spike next month
Developer required. Weeks of integration work.
Opensolr Web Crawler
Steps to get search on your website
  1. Create an Opensolr Index (add the __dense suffix)
  2. Paste your website URL
  3. Click Start Crawl
Done.
Full hybrid search, AI hints, analytics, elevation — all live.
No developer needed. Minutes, not weeks.
Optional: tune scope, schedule recrawls, customize embed code, pin results, read analytics — but none of that is required to be up and running.
Fixed price. No surprises. No per-query tax.
Feature-by-Feature Breakdown
Everything Opensolr includes out of the box — no add-ons, no extra cost.
1 Zero-Code Web Crawling
Opensolr
  • Paste your URL, click Start — that's it
  • Multi-threaded crawler with intelligent JS rendering
  • Three-tier rendering pipeline: curl-cffi, httpx, Playwright headless Chromium
  • Auto-detects SPAs (React, Next.js, Angular, Vue, Nuxt, SvelteKit, Gatsby)
  • Crawls 21 MIME types: HTML, PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP, RTF, MSG, and more
  • Robots.txt obedience, spider trap detection, sitemap following
  • Configurable scope: domain, subdomain, path, or full web
  • Scheduled recrawls (hourly, daily, weekly)
Algolia
  • No built-in crawler — you must write code to push records via their API
  • Algolia Crawler exists as a separate paid product with limited features
  • You structure your data as JSON and maintain push scripts
  • When your content changes, you update and re-push — manually or via custom scripts
  • No document format extraction — want PDF search? Build your own pipeline
2 True Hybrid Search — Three Modes, Full Control
Opensolr
  • Vector mode: Normalized weighted sum of lexical + vector scores with tunable weights and log normalization
  • RRF mode: Reciprocal Rank Fusion — two separate requests merged mathematically for the best of both worlds
  • Solr mode: Lexical-first search with vector reranking — precision-focused
  • 1024-dimensional multilingual embeddings (50+ languages) on title and description fields
  • KNN cosine similarity on dense vectors
  • Per-field boost weights, phrase matching multipliers, minimum match tuning
  • Typos? The vector model understands what you meant, not just what you typed — semantic understanding makes traditional typo tolerance look primitive
  • Plus spellcheck with "Did you mean?" suggestions on top of that
Algolia
  • "NeuralSearch" exists but it's a black box — no control over modes, weights, or normalization
  • No user-tunable hybrid parameters
  • No choice between search strategies
  • Typo tolerance is good, but it only handles character-level errors — it doesn't understand meaning
  • Cannot tune field weights, phrase boosting, or minimum match
3 AI-Powered Search Summaries
Opensolr
  • Streaming AI hints powered by a GPU-accelerated LLM
  • Context-aware: sends top results (title + description + content) to the LLM
  • Real-time Server-Sent Events streaming directly in the search UI
  • Answers appear as the user searches — no extra clicks, no separate page
  • Built on your own indexed content, not hallucinated from training data
Algolia
  • No built-in LLM integration
  • To get AI summaries, you'd build your own RAG pipeline on top of Algolia
  • That means another service, another API, another bill
4 Complete Search UI — Ready to Embed
Opensolr
  • Full themed search page with dark and light modes
  • Infinite scroll or traditional pagination
  • Faceted navigation (language, locale, source, custom facets)
  • OG image previews, favicons, content type icons
  • Mobile-responsive out of the box
  • Configurable via URL parameters — no code needed
  • One-line embed code: drop an iframe and you're done
Algolia
  • Provides InstantSearch.js widget library for React, Vue, Angular
  • YOU assemble the UI from components
  • More flexible for developers, but far more work for everyone else
  • No ready-to-embed, zero-code search page
5 Full Analytics Suite — Built In, Not Upsold
Opensolr
  • Query Analytics: Top queries, daily trends, query length distribution, CSV export
  • Zero-Results Dashboard: Every zero-result query tracked by unique IP — find your content gaps
  • Click Analytics with CTR: Track which results get clicked, click-through rates per query, detect low-CTR queries that need better results
  • Bulk management: Select and delete junk/test queries across all tabs
  • All included in every plan
Algolia
  • Analytics exists but is a paid add-on on higher tiers
  • Click analytics requires additional client-side integration code
  • No zero-result tracking out of the box
  • You pay more to understand how your own users search
6 Query Elevation — Pin, Exclude and Curate Results
Opensolr
  • Pin specific documents to the top for specific queries
  • Exclude documents from appearing for specific queries
  • Global wildcard rules that apply to ALL queries
  • Visual elevation bar directly on the search results page
  • One-click pin/exclude while browsing results — no context switching
Algolia
  • Has "Rules" for pinning and hiding results
  • But the Rules UI is separate from search results — you can't pin while searching
  • No global wildcard rules
  • More cumbersome workflow for result curation
7 Automatic Price Extraction and Filtering
Opensolr
  • Crawler automatically extracts prices from JSON-LD, microdata, and meta tags
  • Price range slider in the search UI (no code needed)
  • Sort by price (ascending, descending, or by relevance)
  • Currency detection and display
  • Works for e-commerce sites out of the box
Algolia
  • You must manually structure price data in your JSON records
  • No automatic extraction from web pages
  • Price faceting available but requires manual schema design
8 21 Document Formats — Crawled and Indexed Automatically
Opensolr
  • HTML, PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP, RTF, MSG (Outlook email), plain text, XML, RSS, JSON
  • Full text extraction with metadata preservation
  • Document reader lets users view extracted content inline without leaving search results
  • The crawler handles everything — you don't convert, parse, or pre-process anything
Algolia
  • Only indexes JSON records you push via API
  • Want to search PDFs? Build a PDF extraction pipeline yourself
  • Want to search Word documents? Same story
  • Every non-HTML format is your problem to solve
9 Sentiment Analysis and Language Detection
Opensolr
  • VADER sentiment scoring on every crawled page (positive, negative, neutral, compound)
  • Language detection via langid (50+ languages)
  • Language and locale facets in the search UI
  • All automatic — no configuration needed
Algolia
  • No sentiment analysis
  • Basic language detection but nothing automatic or enriching
10 Spellcheck, Stemming and Text Analysis
Opensolr
  • "Did you mean?" spellcheck suggestions
  • Edge n-grams for instant prefix matching (autocomplete)
  • ASCII folding for accent-insensitive search (cafe = café)
  • Stemming and synonym support
  • And on top of all that, vector search that understands meaning regardless of exact spelling
Algolia
  • Typo tolerance is solid (one of Algolia's strengths)
  • But it's a black box — no tuning available
  • No vector-level semantic understanding of typos
11 URL Exclusion and Content Control
Opensolr
  • Exclude specific URL patterns from search results via search.xml config
  • Regex-based exclusion patterns
  • Combined with Query Elevation for full result curation
Algolia
  • No URL exclusion mechanism — you'd remove records via API calls
  • Content control is code-driven, not configuration-driven
12 Predictable Pricing — No Per-Query Tax
Opensolr
  • Fixed monthly pricing — search all you want
  • No per-search-request charges
  • No per-record-per-month charges
  • Everything included: crawling, hybrid search, AI hints, analytics, elevation, search UI
  • Your bill this month is the same as next month
Algolia
  • Charges per search request ($1/1,000 searches on some plans)
  • Charges per record per month
  • Analytics, AI, and advanced features are paid add-ons
  • A traffic spike can make your bill jump 5-10x overnight
  • Algolia's pricing page is deliberately confusing — good luck figuring out your actual cost before you're committed
See our plans: Opensolr Pricing
13 Your Data, Your Infrastructure
Opensolr
  • Data lives on dedicated Solr clusters
  • No vendor lock-in — standard Apache Solr under the hood
  • Master-replica architecture with automatic failover
  • Full Solr API access to your index
  • Can migrate to self-hosted Solr at any time — your schema, your data, your rules
Algolia
  • Proprietary engine — your data is in their cloud, in their format
  • Migration out requires rebuilding everything from scratch
  • No standard API compatibility with anything else
  • You're locked in the moment you integrate
14 Data Ingestion API — Push Any Data Into Your Index Live
Opensolr
  • POST JSON payloads with up to 100 documents per batch
  • Or upload a .json file — ideal for large batches from CMS exports or data pipelines
  • URI-based document identity: every document needs a URI, and the document ID is always md5(uri). Same URI = same document. Resubmit to update.
  • Automatic dedup protection: duplicate URIs already in the queue are rejected before processing
  • Rich text extraction: set rtf: true and pass a URL to a PDF, DOCX, PPTX, ODT, XLSX, or RTF — Opensolr extracts the text for you
  • Full Solr error reporting per document — know exactly which document failed and why
  • Returns doc_ids array in every response for tracking
  • All the same features: dense vectors, hybrid search, AI hints, query elevation, analytics, the full search UI
  • Complete PHP, Python and cURL examples in the documentation
Algolia
  • JSON record push via API — this is their only method of getting data in
  • No file upload — you must always construct and send JSON programmatically
  • No built-in rich text extraction — want to index a PDF? Extract it yourself first
  • No URI-based dedup — you manage document identity and deduplication in your own code
  • No integrated queue or job status — you build your own pipeline
  • Per-record-per-month charges on top of everything else
The Bottom Line
Algolia is an API. You still need to build everything around it — the crawling, the extraction, the UI, the AI layer. Opensolr is the entire search engine, ready to go. Crawl your website with zero code, push structured data via the Data Ingestion API, or do both at once — and you get hybrid vector search, AI summaries, analytics, query elevation, rich text extraction, dedup protection, and a complete search UI — all for a fixed monthly price with no per-query surprises.

For the price of a pizza, you get what would take a team of developers weeks to build on top of Algolia.

Hybrid Search in Opensolr: A Modern Approach

Hybrid Search in Apache Solr: Modern Power, Classic Roots

The Evolution of Search: From Keywords to Vectors 🔍➡️🧠

Important Prerequisites

First, make sure you have this embeddings field in your schema.xml:
<!--VECTORS-->
<field name="embeddings" type="vector" indexed="true" stored="true" multiValued="false" required="false" />
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" similarityFunction="cosine"/>

⚠️ Pay very close attention to vectorDimension: it must match the dimension of the embeddings produced by your model. If you use the Opensolr Index Embedding API, it must be exactly 1024, since the Opensolr Embed API endpoint uses the BAAI/bge-m3 embedding model.
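Once the schema has the field, a document carrying a vector indexes like any other. An illustrative Python helper (a real vector must have exactly 1024 values; 3 are shown for brevity):

```python
import json

def solr_update_payload(doc_id, title, embedding):
    """Build a JSON update body for Solr's /update endpoint.
    The embeddings list length must equal the schema's vectorDimension
    (1024 for the Opensolr embed API)."""
    return json.dumps([{
        "id": doc_id,
        "title": title,
        "embeddings": embedding,  # list of floats
    }])
```

POST the resulting body to your index's /update?commit=true endpoint with Content-type: application/json.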


Opensolr also supports the native Solr /schema API, so you can run these two requests to add the field type and field to your schema.xml:
$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema -H 'Content-type:application/json' -d '{
  "add-field-type": {
    "name": "vector",
    "class": "solr.DenseVectorField",
    "vectorDimension": 1024,
    "similarityFunction": "cosine"
  }
}'

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema -H 'Content-type:application/json' -d '{
  "add-field": {
    "name": "embeddings",
    "type": "vector",
    "indexed": true,
    "stored": false,
    "multiValued": false,
    "required": false
  }
}'

Set "stored": true if you want to see the vectors for debugging. Note that vectorDimension and similarityFunction belong on the field type, not on the field itself; adjust vectorDimension in the field type to match your embedder's output size.

Second, make sure you have this in solrconfig.xml, so that atomic updates work with the Opensolr Index Embedding API:
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
      
        <updateLog>
          <int name="numVersionBuckets">65536</int>
          <int name="maxNumLogsToKeep">10</int>
          <int name="numRecordsToKeep">10</int>
        </updateLog>

.....

</updateHandler>
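With the updateLog in place, an embedding can be attached to an existing document via a Solr atomic update. A minimal sketch (the field name follows the schema above; note that atomic updates require the document's other fields to be stored or have docValues):

```python
import json

def atomic_embedding_update(doc_id, embedding):
    """Build a Solr atomic-update body that replaces only the embeddings
    field via the "set" modifier, leaving other stored fields intact."""
    return json.dumps([{"id": doc_id, "embeddings": {"set": embedding}}])
```

POST this body to the index's /update?commit=true endpoint, just like a normal document update.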

Why Vector Search Isn’t a Silver Bullet ⚠️

As much as we love innovation, vector search still has a few quirks:

  • Mystery Rankings: Why did document B leapfrog document A? Sometimes, it’s anyone’s guess. 🕳️
  • Chunky Business: Embedding models are picky eaters—they work best with just the right size of text chunks.
  • Keyword Nostalgia: Many users still expect the comfort of exact matches. “Where’s my keyword?” they ask. (Fair question!)

Hybrid Search: The Best of Both Worlds 🤝

Hybrid search bridges the gap—combining trusty keyword (lexical) search with smart vector (neural) search for results that are both sharp and relevant.

How It Works

  1. Double the Fun: Run a classic keyword query and a KNN vector search at the same time, creating two candidate lists.
  2. Clever Combining: Merge and rank for maximum “aha!” moments.
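One common way to do the "clever combining" step is Reciprocal Rank Fusion, the RRF mode mentioned earlier on this page. A sketch of the textbook algorithm, not Opensolr's internal implementation:

```python
def rrf_merge(lexical_ids, vector_ids, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank).
    k=60 is the constant commonly used in the RRF literature. Documents
    appearing in both lists accumulate score from each, so agreement
    between keyword and vector ranking pushes a document up."""
    scores = {}
    for ranked in (lexical_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a document ranked #2 lexically and #1 by vector similarity outranks a document that only tops one of the two lists.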

Apache Solr Does Hybrid Search (Despite the Rumors) 💡

Contrary to the grapevine, Solr can absolutely do hybrid search—even if the docs are a little shy about it. If your schema mixes traditional fields with a solr.DenseVectorField, you’re all set.


Candidate Selection: Boolean Query Parser to the Rescue 🦸‍♂️

Solr’s Boolean Query Parser lets you mix and match candidate sets with flair:

Union Example

q={!bool should=$lexicalQuery should=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]

Result: All unique hits from both searches. No duplicates, more to love! ❤️

Intersection Example

q={!bool must=$lexicalQuery must=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]

Result: Only the most relevant docs—where both worlds collide. 🤝


Also be mindful of the Solr version you are using: we were only able to make this work on Solr version 9.0. Beware, it did not work on Solr 9.6, where only reranking queries worked (as shown below).

Here are all the parameters we sent to Solr to make this hybrid search work on Solr version 9.0:

Classic Solr Edismax Search combined with dense vector search (UNION)

{
  "mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
  "df":"title",
  "ps":"3",
  "bf":"recip(rord(timestamp),1,1500,500)^90",
  "fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
  "start":"0",
  "fq":"+content_type:text*",
  "rows":"100",
  "vectorQuery":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
  "q":"{!bool must=$lexicalQuery must=$vectorQuery}",
  "qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
  "pf":"title^15 description^7 uri^9",
  "lexicalQuery":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
  "pf3":"text^5",
  "pf2":"tdescription^6"
}

Solr 9.6 reranking query (it also works on Solr 9.0):

{
  "mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
  "df":"title",
  "ps":"3",
  "bf":"recip(rord(timestamp),1,1500,500)^90",
  "fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
  "start":"0",
  "fq":"+content_type:text*",
  "rows":"100",
  "q":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
  "rqq":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
  "qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
  "pf":"title^15 description^7 uri^9",
  "pf3":"text^5",
  "pf2":"tdescription^6",
  "rq":"{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}"
}
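If you assemble these requests in code, the moving parts are just three query parameters referencing each other. Here is a minimal Python sketch (field names and boosts are taken from the example above; this is illustrative, not the Opensolr client):

```python
def hybrid_params(user_query, vector, top_k=100):
    """Build the Solr params for the bool-query hybrid search (Solr 9.0 style).

    `vector` is the query embedding, already computed by your embedder.
    """
    vec = "[" + ",".join(f"{v:.6f}" for v in vector) + "]"
    return {
        # top-level boolean query composing the two candidate generators
        "q": "{!bool must=$lexicalQuery must=$vectorQuery}",
        # lexical side: all edismax knobs stay inside the subquery
        "lexicalQuery": "{!edismax qf=$qf mm=$mm}" + user_query,
        # vector side: knn over the dense vector field
        "vectorQuery": f"{{!knn f=embeddings topK={top_k}}}{vec}",
        "qf": "title^10 description^5 uri^3 text^2",
        "mm": "1<100% 2<70% 3<45%",
        "rows": 100,
    }
```

POST the resulting dict as form parameters (or inside the JSON Request API's params block); swap must for should to get the union instead of the intersection.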

A few remarks:

🎹 This is based on the classic Opensolr Web Crawler Index, which does most of its work within the fields: title, description, text, uri.

📰 Index is populated with data crawled from various public news websites.

🔗 We embedded a concatenation of title, description and the first 50 sentences of text.

💼 We use the Opensolr Query Embed API to embed our query at search-time.

🏃🏻‍♂️ You can see this search in action here.

👩🏻‍💻 You can also see the Solr data and make your own queries on it. This index's Solr API is here.

🔐 Credentials are: Username: 123 / Password: 123 -> Enjoy! 🥳
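For the curious, building the string we embed per document (title + description + the first 50 sentences of text) can be sketched like this (a naive sentence splitter; the production crawler may differ):

```python
import re

def text_to_embed(doc, max_sentences=50):
    """Concatenate title, description and the first N sentences of the body text."""
    sentences = re.split(r"(?<=[.!?])\s+", doc.get("text", "").strip())
    head = " ".join(sentences[:max_sentences])
    parts = [doc.get("title", ""), doc.get("description", ""), head]
    return " ".join(p for p in parts if p)
```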


Cheat Sheet

🤥 Below is a cheat sheet of the parameters and where they belong when you run knn queries. Solr is very picky about what goes with knn and what doesn't. For example, for the union query we were unable to use highlighting. But if you follow the specs below, you probably won't be getting any "Query can not be null" Solr errors... (or will you? 🤭)


What Belongs Inside {!edismax} in lexicalQuery? 🧾

| Parameter                                 | Inside lexicalQuery? | Why                                      |
|-------------------------------------------|----------------------|------------------------------------------|
| q                                         | ✅ YES               | Required for the subquery to function    |
| qf, pf, bf, bq, mm, ps                    | ✅ YES               | All edismax features must go inside      |
| defType                                   | ❌ NO                | Already defined by {!edismax}            |
| hl, spellcheck, facet, rows, start, sort  | ❌ NO                | These are top-level Solr request features |

💡 Hybrid Query Cheat Sheet

Here’s how to do it right when you want all the bells and whistles (highlighting, spellcheck, deep edismax):

# TOP-LEVEL BOOLEAN QUERY COMPOSING EDISMAX AND KNN
q={!bool should=$lexicalQuery should=$vectorQuery}

# LEXICAL QUERY: ALL YOUR EDISMAX STUFF GOES HERE
&lexicalQuery={!edismax v=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}

# VECTOR QUERY
&vectorQuery={!knn f=vectorField topK=10}[0.123, -0.456, ...]

# EDISMAX PARAMS
&qtext='flying machine'
&qf=title^6 description^3 text^2 uri^4
&pf=text^10
&mm=1<100% 2<75% 3<50% 6<30%
&bf=recip(ms(NOW,publish_date),3.16e-11,1,1)

# NON-QUERY STUFF
&hl=true
&hl.fl=text
&hl.q=$lexicalQuery
&spellcheck=true
&spellcheck.q=$qtext
&rows=20
&start=0
&sort=score desc

In Summary

Hybrid search gives you the sharp accuracy of keywords and the deep smarts of vectors—all in one system. With Solr, you can have classic reliability and modern magic. 🍦✨

“Why choose between classic and cutting-edge, when you can have both? Double-scoop your search!”

Happy hybrid searching! 🥳

Read Full Answer

Using NLP Models

🧠 Using NLP Models in Your Solr schema_extra_types.xml

Leverage the power of Natural Language Processing (NLP) right inside Solr!
With built-in support for OpenNLP models, you can add advanced tokenization, part-of-speech tagging, named entity recognition, and much more—no PhD required.


🚀 Why Use NLP Models in Solr?

Integrating NLP in your schema allows you to:

  • Extract nouns, verbs, or any part-of-speech you fancy.
  • Perform more relevant searches by filtering, stemming, and synonymizing.
  • Create blazing-fast autocomplete and suggestion features via EdgeNGrams.
  • Support multi-language, linguistically smart queries.

In short: your Solr becomes smarter and your users get better search results.


⚙️ Example: Dutch Edge NGram Nouns Field

Here’s a typical fieldType in your schema_extra_types.xml using OpenNLP:

<fieldType name="text_edge_nouns_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="/opt/nlp/nl-sent.bin" tokenizerModel="/opt/nlp/nl-token.bin"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="/opt/nlp/nl-pos-maxent.bin"/>
    <filter class="solr.TypeTokenFilterFactory" types="pos_edge_nouns_nl.txt" useWhitelist="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="/opt/nlp/nl-sent.bin" tokenizerModel="/opt/nlp/nl-token.bin"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="/opt/nlp/nl-pos-maxent.bin"/>
    <filter class="solr.TypeTokenFilterFactory" types="pos_edge_nouns_nl.txt" useWhitelist="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_edge_nouns_nl.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
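Before indexing anything, you can sanity-check the chain through Solr's field analysis endpoint (core name and sample text are placeholders):

```
GET /solr/<core>/analysis/field
    ?analysis.fieldtype=text_edge_nouns_nl
    &analysis.fieldvalue=De snelle bruine vos springt over de luie hond
```

The response shows the token stream after each tokenizer and filter, so you can verify that only nouns survive the whitelist.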

🔎 Important Details

  • Model Paths:
    Always reference the full absolute path for NLP model files. For example:

    sentenceModel="/opt/nlp/nl-sent.bin"
    tokenizerModel="/opt/nlp/nl-token.bin"
    posTaggerModel="/opt/nlp/nl-pos-maxent.bin"
    

    This ensures Solr always finds your precious language models—no “file not found” drama!

  • Type Token Filtering:
    The TypeTokenFilterFactory with useWhitelist="true" will only keep tokens matching the allowed parts of speech (like nouns, verbs, etc.), as defined in pos_edge_nouns_nl.txt. This keeps your index tight and focused.

  • Synonym Graphs:
    Add SynonymGraphFilterFactory to enable query-side expansion. This is great for handling multiple word forms, synonyms, and local lingo.
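For reference, the two sidecar files above are plain text, one entry per line. The entries below are purely illustrative: the POS tag names must match whatever tagset your Dutch model actually emits, and synonyms use the standard Solr syntax:

```
# pos_edge_nouns_nl.txt — token types to keep (whitelist)
N
NN
NNS

# synonyms_edge_nouns_nl.txt — standard Solr synonym syntax
fiets, rijwiel
auto, wagen
```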


🧑‍🔬 Best Practices & Gotchas

  • Keep your NLP model files up to date and tested for your language version!
  • If using multiple languages, make sure you have the right models for each language. (No, Dutch models won’t help with Klingon. Yet.)
  • EdgeNGram and NGram fields are fantastic for autocomplete—but don’t overdo it, as they can bloat your index if not tuned.
  • Use RemoveDuplicatesTokenFilterFactory to keep things clean and efficient.

🌍 Not Just for Dutch!

You can set up similar analyzers for English, or for any other language you like. For example:

<fieldType name="text_nouns_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="/opt/nlp/en-sent.bin" tokenizerModel="/opt/nlp/en-token.bin"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="/opt/nlp/en-pos-maxent.bin"/>
    <filter class="solr.TypeTokenFilterFactory" types="pos_nouns_en.txt" useWhitelist="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="/opt/nlp/en-sent.bin" tokenizerModel="/opt/nlp/en-token.bin"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="/opt/nlp/en-pos-maxent.bin"/>
    <filter class="solr.TypeTokenFilterFactory" types="pos_nouns_en.txt" useWhitelist="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_nouns_en.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

📦 Keep It Organized

  • Store all model files in a single, logical directory (like /opt/nlp/), and keep a README so you know what’s what.
  • Protect those models! They’re your “brains” for language tasks.

🛠️ Wrap-up

Using NLP models in your Solr analyzers will supercharge your search, make autocomplete smarter, and help users find what they’re actually looking for (even if they type like my cat walks on a keyboard).

Need more examples?
Check out the Solr Reference Guide - OpenNLP Integration or Opensolr documentation.


Happy indexing, and may your tokens always be well-typed! 😸🤓

Read Full Answer

How to use OpenNLP (NER) with Opensolr

UPDATE Oct 29, 2024: OpenNLP + Opensolr Integration Guide

Heads up!
Before you dive into using NLP models with your Opensolr index, please contact us to request the NLP models to be installed for your Opensolr index.
We'll reply with the correct path to use for the .bin files in your schema.xml or solrconfig.xml. Or, if you'd rather avoid all the hassle, just ask us to set it up for you—done and done.


What’s this all about?

This is your step-by-step guide to using AI-powered OpenNLP models with Opensolr. In this walkthrough, we’ll cover Named Entity Recognition (NER) using default OpenNLP models, so you can start extracting valuable information (like people, places, and organizations) directly from your indexed data.

⚠️ Note:
Currently, these models are enabled by default only in the Germany, Solr Version 9 environment. So, if you want an easy life, create your index there!
We’re happy to set up the models in any region (or even your dedicated Opensolr infrastructure for corporate accounts) if you reach out via our Support Helpdesk.

Add New Opensolr Index

You can also download OpenNLP default models from us or the official OpenNLP website.


🛠️ Step-by-Step: Enable NLP Entity Extraction

  1. Create your Opensolr Index

    • Use this guide to create your Opensolr index (Solr 7, 8, or 9).
    • Pro Tip: Creating your index in the Germany Solr 9 Web Crawler Environment skips most of the manual steps below.
  2. Edit Your schema.xml

    • Go to the Opensolr Control Panel.
    • Click your Index Name → Configuration tab → select schema.xml to edit.
    Edit schema.xml
    • Add these snippets:

      Dynamic Field (for storing entities):

<dynamicField name="*_s" type="string" multiValued="true" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" storeOffsetsWithPositions="true" />
      NLP Tokenizer fieldType:
<fieldType name="text_nlp" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.OpenNLPTokenizerFactory"
            sentenceModel="en-sent.bin"
            tokenizerModel="en-token.bin"/>
         <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="en-pos-maxent.bin"/>
         <filter class="solr.OpenNLPChunkerFilterFactory" chunkerModel="en-chunker.bin"/>
         <filter class="solr.TypeAsPayloadFilterFactory"/>
     </analyzer>
 </fieldType>
    • Important: Don’t use the text_nlp type for your dynamic fields! It’s only for the update processor.
  3. Save, then Edit Your solrconfig.xml

    Save schema.xml
    • Add the following updateRequestProcessorChain (and corresponding requestHandler):
<requestHandler name="/update" class="solr.UpdateRequestHandler" >
    <lst name="defaults">
        <str name="update.chain">nlp</str>
    </lst>
</requestHandler>
<updateRequestProcessorChain name="nlp">
    <!-- Extract English People Names -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">en-ner-person.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">people_s</str>
    </processor>
    <!-- Extract Spanish People Names -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">es-ner-person.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">people_s</str>
    </processor>
    <!-- Extract Locations -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">en-ner-location.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">location_s</str>
    </processor>
    <!-- Extract Organizations -->
    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
        <str name="modelFile">en-ner-organization.bin</str>
        <str name="analyzerFieldType">text_nlp</str>
        <arr name="source">
            <str>title</str>
            <str>description</str>
        </arr>
        <str name="dest">organization_s</str>
    </processor>
    <!-- Language Detection -->
    <processor class="org.apache.solr.update.processor.OpenNLPLangDetectUpdateProcessorFactory">
        <str name="langid.fl">title,text,description</str>
        <str name="langid.langField">language_s</str>
        <str name="langid.model">langdetect-183.bin</str>
    </processor>
    <!-- Remove duplicate extracted entities -->
    <processor class="solr.UniqFieldsUpdateProcessorFactory">
        <str name="fieldRegex">.*_s</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
  4. Populate Test Data (for the impatient!)

    • If you’re using the Germany Solr 9 Web Crawler, you can crawl your site and extract all the juicy entities automatically.
    • Or, insert a sample doc via Solr Admin:
    Solr Admin Panel Add Docs to Solr Index

    Sample JSON:

{
    "id": "1",
    "title": "Jack Sparrow was a pirate. Many feared him. He used to live in downtown Las Vegas.",
    "description": "Jack Sparrow and Janette Sparrowa, are now on their way to Monte Carlo for the summer vacation, after working hard for Microsoft, creating the new and exciting Windows 11 which everyone now loves. :)",
    "text": "The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component.Learn more about how you can get involved."
}
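If you'd rather skip the Admin UI, a document can also be posted straight to the update handler from the command line; the nlp chain runs automatically because it's wired into /update (host and core are placeholders):

```
curl -X POST -H 'Content-Type: application/json' \
  'https://<your-solr-host>/solr/<core>/update?commit=true' \
  --data-binary '[{"id":"1","title":"Jack Sparrow was a pirate. He used to live in downtown Las Vegas."}]'
```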
  5. See the Magic!

    • Visit the query tab to see extracted entities in action!
    Solr Query Opensolr NLP End Result
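Because the extracted entities end up in ordinary multi-valued string fields, they also make great facets, so you can build an entity overview in a single request (host and core are placeholders):

```
GET /solr/<core>/select
    ?q=*:*
    &rows=0
    &facet=true
    &facet.field=people_s
    &facet.field=location_s
    &facet.field=organization_s
```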

Need a hand?

If any step trips you up, contact us and we'll gladly assist you—whether it's model enablement, schema help, or just a friendly chat about Solr and AI. 🤝


Happy Solr-ing & entity extracting!

Read Full Answer