🚀 Hybrid Search in Apache Solr: Modern Power, Classic Roots

The Evolution of Search: From Keywords to Vectors 🔍➡️🧠

Important Prerequisites

First, make sure you have this embeddings field and its field type in your `schema.xml`:

<!--VECTORS-->
<field name="embeddings" type="vector" indexed="true" stored="true" multiValued="false" required="false" />
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="384" similarityFunction="cosine"/>

⚠️ Pay very close attention to vectorDimension: it has to match the size of the embeddings your model produces. If you are using the Opensolr Index Embedding API, it has to be exactly 384, because the Opensolr Embed API endpoint uses the all-MiniLM-L6-v2 embedding model.
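A dimension mismatch is easier to catch before a document ever reaches Solr. Here is a minimal sketch of such a check; the function name `validate_embedding` and the constant `EXPECTED_DIM` are hypothetical helpers, not part of any Opensolr or Solr API.

```python
EXPECTED_DIM = 384  # assumption: must equal vectorDimension in your schema


def validate_embedding(vector, expected_dim=EXPECTED_DIM):
    """Return the vector as a list of floats, or raise ValueError on a size mismatch."""
    values = [float(x) for x in vector]
    if len(values) != expected_dim:
        raise ValueError(
            f"embedding has {len(values)} dimensions, schema expects {expected_dim}"
        )
    return values
```

Calling it right after the embedding step means a model swap (say, to a 768-dimension model) fails loudly in your own code instead of at indexing time.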

Opensolr also supports the native Solr /schema API, so you can instead run these two commands to add the field type and the field to your schema.

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fieldtypes -H 'Content-type:application/json' -d '{
  "add-field-type": {
    "name": "vector",
    "class": "solr.DenseVectorField",
    "vectorDimension": 384,
    "similarityFunction": "cosine"
  }
}'

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fields -H 'Content-type:application/json' -d '{
  "add-field": {
    "name":"embeddings",
    "type":"vector",
    "indexed":true,
    "stored":false, // true if you want to see the vectors for debugging
    "multiValued":false,
    "required":false,
    "dimension":384,  // adjust to your embedder size
    "similarityFunction":"cosine"
  }
}'

Second, make sure you have this in `solrconfig.xml`, so that atomic updates work with the Opensolr Index Embedding API:

<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">

        <updateLog>
          <int name="numVersionBuckets">65536</int>
          <int name="maxNumLogsToKeep">10</int>
          <int name="numRecordsToKeep">10</int>
        </updateLog>

.....

</updateHandler>
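For reference, this is roughly what an atomic update that touches only the vector field looks like if you ever write vectors yourself. It is a sketch under assumptions: the unique key is assumed to be `id`, the helper name `build_atomic_update` is hypothetical, and the three-value vector is a placeholder (a real one needs all 384 values). The `{"set": ...}` modifier is standard Solr atomic-update syntax.

```python
import json


def build_atomic_update(doc_id, vector, field="embeddings"):
    """Build a Solr atomic-update document that sets only the vector field."""
    # "set" replaces the field value and leaves the other fields of the document alone.
    return {"id": doc_id, field: {"set": [float(x) for x in vector]}}


# Placeholder vector for illustration only; real vectors must match vectorDimension.
payload = json.dumps([build_atomic_update("doc-1", [0.1, -0.2, 0.3])])
print(payload)
```

The resulting JSON array is what you would POST to the index's /update handler with a JSON content type.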

Why Vector Search Isn’t a Silver Bullet ⚠️

As much as we love innovation, vector search still has a few quirks:

Mystery Rankings: Why did document B leapfrog document A? Sometimes, it’s anyone’s guess. 🕳️
Chunky Business: Embedding models are picky eaters—they work best with just the right size of text chunks.
Keyword Nostalgia: Many users still expect the comfort of exact matches. “Where’s my keyword?” they ask. (Fair question!)

Hybrid Search: The Best of Both Worlds 🤝

Hybrid search bridges the gap—combining trusty keyword (lexical) search with smart vector (neural) search for results that are both sharp and relevant.

How It Works

Double the Fun: Run a classic keyword query and a KNN vector search at the same time, creating two candidate lists.
Clever Combining: Merge and rank for maximum “aha!” moments.
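To make the "merge and rank" step concrete, here is a small client-side sketch that fuses two candidate lists. It uses reciprocal rank fusion (RRF), which is a swapped-in technique chosen for illustration and not what Solr does internally. The function name, the constant `k=60` and the sample ids are all assumptions.

```python
def reciprocal_rank_fusion(lexical_ids, vector_ids, k=60):
    """Merge two ranked id lists; documents ranked high in either list float to the top."""
    scores = {}
    for ranked in (lexical_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1 / (k + rank); a doc found in both lists gets both shares.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the output stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["a", "b", "c"]   # keyword candidates, best first
vector = ["c", "a", "d"]    # KNN candidates, best first
print(reciprocal_rank_fusion(lexical, vector))  # ['a', 'c', 'b', 'd']
```

Documents that appear in both lists ("a" and "c") outrank those found by only one search, which is the intuition behind hybrid ranking.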

Apache Solr Does Hybrid Search (Despite the Rumors) 💡

Contrary to the grapevine, Solr can absolutely do hybrid search—even if the docs are a little shy about it. If your schema mixes traditional fields with a solr.DenseVectorField, you’re all set.

Candidate Selection: Boolean Query Parser to the Rescue 🦸‍♂️

Solr’s Boolean Query Parser lets you mix and match candidate sets with flair:

Union Example

q={!bool should=$lexicalQuery should=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]

Result: All unique hits from both searches. No duplicates, more to love! ❤️

Intersection Example

q={!bool must=$lexicalQuery must=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]

Result: Only the most relevant docs—where both worlds collide. 🤝
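If you build these requests from code, the two examples differ only in the clause name (should versus must). The sketch below generates the same three parameters; the helper names are hypothetical, and `text_field` and `vector` are simply the field names used in the examples above.

```python
def format_vector(vector):
    """Serialize a vector the way the knn parser expects: [v1, v2, ...]."""
    return "[" + ", ".join(repr(float(x)) for x in vector) + "]"


def build_hybrid_params(terms, vector, mode="union",
                        text_field="text_field", vector_field="vector", top_k=10):
    """Return the q, lexicalQuery and vectorQuery parameters for a hybrid request."""
    # union -> should clauses, intersection -> must clauses
    clause = {"union": "should", "intersection": "must"}[mode]
    return {
        "q": f"{{!bool {clause}=$lexicalQuery {clause}=$vectorQuery}}",
        "lexicalQuery": f"{{!type=edismax qf={text_field}}}{terms}",
        "vectorQuery": f"{{!knn f={vector_field} topK={top_k}}}{format_vector(vector)}",
    }


# Placeholder vector; a real one must have as many values as vectorDimension.
for name, value in build_hybrid_params("term1", [0.001, -0.422, -0.284]).items():
    print(f"{name}={value}")
```

Pass the resulting dictionary to your HTTP client as request parameters so that braces, dollar signs and brackets get URL-encoded for you.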

Be mindful of the Solr version you are using: we were only able to make these Boolean hybrid queries work on Solr 9.0. They did not work on Solr 9.6, where only reranking queries worked (as shown below).

Here are all the parameters we sent to Solr to make this hybrid search work on Solr version 9.0:

Classic Solr Edismax Search combined with dense vector search (the query below uses must clauses, i.e. an intersection; use should clauses instead for a union)

{
  "mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
  "df":"title",
  "ps":"3",
  "bf":"recip(rord(timestamp),1,1500,500)^90",
  "fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
  "start":"0",
  "fq":"+content_type:text*",
  "rows":"100",
  "vectorQuery":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
  "q":"{!bool must=$lexicalQuery must=$vectorQuery}",
  "qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
  "pf":"title^15 description^7 uri^9",
  "lexicalQuery":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
  "pf3":"text^5",
  "pf2":"tdescription^6"
}

Solr 9.6 reranking query. (It also works in Solr 9.0):

{
  "mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
  "df":"title",
  "ps":"3",
  "bf":"recip(rord(timestamp),1,1500,500)^90",
  "fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
  "start":"0",
  "fq":"+content_type:text*",
  "rows":"100",
  "q":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
  "rqq":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
  "qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
  "pf":"title^15 description^7 uri^9",
  "pf3":"text^5",
  "pf2":"tdescription^6",
  "rq":"{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}"
}
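To see what reRankDocs and reRankWeight do to the ordering, here is a minimal sketch of the arithmetic, assuming Solr's default rerank behavior of adding the weighted rerank score to the first-pass score. The function name and the sample scores are made up for illustration; Solr does all of this server-side.

```python
def rerank(first_pass, rerank_scores, rerank_docs=100, rerank_weight=3.0):
    """Rescore the top rerank_docs hits and leave the rest in first-pass order.

    first_pass: list of (doc_id, score) pairs, best first (the knn results here).
    rerank_scores: doc_id -> score from the rerank query (the edismax query here);
                   documents the rerank query does not match are absent.
    """
    head, tail = first_pass[:rerank_docs], first_pass[rerank_docs:]
    rescored = [
        (doc_id, score + rerank_weight * rerank_scores.get(doc_id, 0.0))
        for doc_id, score in head
    ]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + tail


knn_hits = [("a", 1.0), ("b", 0.5), ("c", 0.25)]
keyword_scores = {"b": 0.5, "c": 2.0}
print(rerank(knn_hits, keyword_scores, rerank_docs=2))
# [('b', 2.0), ('a', 1.0), ('c', 0.25)]
```

Note that "c" has the best keyword score but stays last, because it falls outside the reranked window. That is why reRankDocs is set to the same value as topK and rows in the query above.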

A few remarks:

🎹 This is based on the classic Opensolr Web Crawler Index, which does most of its work within the fields: title, description, text, uri.

📰 Index is populated with data crawled from various public news websites.

🔗 We embedded a concatenation of title, description and the first 50 sentences of text.

💼 We use the Opensolr Query Embed API, to embed our query at search-time.

🏃🏻‍♂️ You can see this search in action, here.

👩🏻‍💻 You can also see the Solr data and make your own queries on it. This index’s Solr API is here.

🔐 Credentials are: Username: 123 / Password: 123 -> Enjoy! 🥳
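The embedding input mentioned in the remarks above (title, description and the first 50 sentences of text) could be assembled along these lines. This is only a sketch: the helper name, the single-space separator and the naive regex sentence splitter are assumptions, since the remarks do not say how sentences were split or joined.

```python
import re


def build_embedding_text(title, description, text, max_sentences=50):
    """Concatenate title, description and the first max_sentences sentences of text."""
    # Naive splitter: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    parts = [title.strip(), description.strip(), " ".join(sentences[:max_sentences])]
    # Skip empty parts so missing fields do not leave stray separators.
    return " ".join(p for p in parts if p)


print(build_embedding_text("Title", "Short description.", "One. Two! Three?", max_sentences=2))
# Title Short description. One. Two!
```

Capping the text keeps the input within a size the embedding model handles well, which ties back to the "Chunky Business" point earlier.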

Cheat Sheet

🤥 Below is a cheat sheet of the fields and how you’re supposed to use them if you run knn queries. Solr is very picky about what goes with knn and what doesn’t. For example, for the Union query, we were unable to use highlighting. But if you follow the specs below, you probably won’t be getting any "Query can not be null" Solr errors… (or will you? 🤭)

What Belongs Inside `{!edismax}` in `lexicalQuery`? 🧾

Parameter	Inside `lexicalQuery`	Why
`q`	✅ YES	Required for the subquery to function
`qf`, `pf`, `bf`, `bq`, `mm`, `ps`	✅ YES	All edismax features must go inside
`defType`	❌ NO	Already defined by `{!edismax}`
`hl`, `spellcheck`, `facet`, `rows`, `start`, `sort`	❌ NO	These are top-level Solr request features

💡 Hybrid Query Cheat Sheet

Here’s how to do it right when you want all the bells and whistles (highlighting, spellcheck, deep edismax):

# TOP-LEVEL BOOLEAN QUERY COMPOSING EDISMAX AND KNN
q={!bool should=$lexicalQuery should=$vectorQuery}

# LEXICAL QUERY: ALL YOUR EDISMAX STUFF GOES HERE
&lexicalQuery={!edismax q=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}

# VECTOR QUERY
&vectorQuery={!knn f=vectorField topK=10}[0.123, -0.456, ...]

# EDISMAX PARAMS
&qtext='flying machine'
&qf=title^6 description^3 text^2 uri^4
&pf=text^10
&mm=1<100% 2<75% 3<50% 6<30%
&bf=recip(ms(NOW,publish_date),3.16e-11,1,1)

# NON-QUERY STUFF
&hl=true
&hl.fl=text
&hl.q=$lexicalQuery
&spellcheck=true
&spellcheck.q=$qtext
&rows=20
&start=0
&sort=score desc
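The parameters above are shown unencoded for readability. When sending them over HTTP, every value has to be URL-encoded, which is easy to get wrong by hand with all the braces, dollar signs and percent signs. Here is a sketch that encodes the cheat-sheet parameters verbatim; the function name is hypothetical and the vector is passed in by the caller, because the one above is elided.

```python
from urllib.parse import parse_qs, urlencode


def build_query_string(vector_literal):
    """URL-encode the cheat-sheet parameters; vector_literal is e.g. '[0.1, -0.2]'."""
    params = {
        # Top-level boolean query composing edismax and knn
        "q": "{!bool should=$lexicalQuery should=$vectorQuery}",
        "lexicalQuery": "{!edismax q=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}",
        "vectorQuery": "{!knn f=vectorField topK=10}" + vector_literal,
        # Edismax params, referenced from lexicalQuery
        "qtext": "'flying machine'",
        "qf": "title^6 description^3 text^2 uri^4",
        "pf": "text^10",
        "mm": "1<100% 2<75% 3<50% 6<30%",
        "bf": "recip(ms(NOW,publish_date),3.16e-11,1,1)",
        # Top-level request features
        "hl": "true",
        "hl.fl": "text",
        "hl.q": "$lexicalQuery",
        "spellcheck": "true",
        "spellcheck.q": "$qtext",
        "rows": "20",
        "start": "0",
        "sort": "score desc",
    }
    return urlencode(params)


# Placeholder two-value vector; a real one must match vectorDimension.
query_string = build_query_string("[0.123, -0.456]")
print(query_string)
```

Decoding the string again with `parse_qs` should give back exactly the values you put in, which is a quick way to confirm that nothing such as the % signs in mm was mangled.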

In Summary

Hybrid search gives you the sharp accuracy of keywords and the deep smarts of vectors—all in one system. With Solr, you can have classic reliability and modern magic. 🍦✨

“Why choose between classic and cutting-edge, when you can have both? Double-scoop your search!”

Happy hybrid searching! 🥳