🚀 Hybrid Search in Apache Solr: Modern Power, Classic Roots

The Evolution of Search: From Keywords to Vectors 🔍➡️🧠

Important Prerequisites

First, make sure your schema.xml contains the following:

<!--VECTORS-->
<field name="embeddings" type="vector" indexed="true" stored="true" multiValued="false" required="false" />
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="384" similarityFunction="cosine"/>

Alternatively, create the field type and the field through Solr's native Schema API (this requires a managed schema that the API is allowed to edit):

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema -H 'Content-type:application/json' -d '{
  "add-field-type": {
    "name": "vector",
    "class": "solr.DenseVectorField",
    "vectorDimension": 384,
    "similarityFunction": "cosine"
  }
}'

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema -H 'Content-type:application/json' -d '{
  "add-field": {
    "name":"embeddings",
    "type":"vector",
    "indexed":true,
    "stored":false,
    "multiValued":false,
    "required":false
  }
}'

Set "stored" to true if you want to see the vectors while debugging. The vector dimension (384 here; adjust it to your embedding model's output size) and the similarity function are properties of the "vector" field type created above, not of the field itself.
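Whichever way you create the field, every vector you index must have exactly vectorDimension values. Here is a minimal Python sketch of building a JSON update document for the "embeddings" field with a length check up front; the "text" field name and the placeholder vector are assumptions for illustration only.

```python
import json

# Must equal vectorDimension of the "vector" field type (384 in the examples above).
VECTOR_DIMENSION = 384


def build_solr_doc(doc_id: str, text: str, embedding: list[float]) -> dict:
    """Return a Solr document whose "embeddings" field holds a dense vector.

    The "text" field name is hypothetical; use whatever text fields your schema has.
    """
    if len(embedding) != VECTOR_DIMENSION:
        # Solr rejects vectors whose length differs from vectorDimension,
        # so fail early instead of at indexing time.
        raise ValueError(f"expected {VECTOR_DIMENSION} values, got {len(embedding)}")
    return {"id": doc_id, "text": text, "embeddings": [float(x) for x in embedding]}


if __name__ == "__main__":
    # Placeholder vector; in practice it comes from your embedding model.
    doc = build_solr_doc("doc-1", "A flying machine", [0.01] * VECTOR_DIMENSION)
    # The JSON update handler accepts an array of documents.
    body = json.dumps([doc])
    print(len(doc["embeddings"]), len(body) > 0)
```

The resulting body can be POSTed with Content-type:application/json to https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/update, followed by a commit.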

Second, make sure the update handler in your solrconfig.xml has the update log enabled:

<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">

        <updateLog>
          <int name="numVersionBuckets">65536</int>
          <int name="maxNumLogsToKeep">10</int>
          <int name="numRecordsToKeep">10</int>
        </updateLog>

.....

</updateHandler>

Vector search has quickly become a core tool for modern search platforms. With advances in language models, we can encode text into high-dimensional vectors, making it possible to find not just what you type, but what you mean. It’s like giving your search engine a sixth sense! 🕵️‍♂️


Why Vector Search Isn’t a Silver Bullet ⚠️

As much as we love innovation, vector search still has a few quirks:

  • Mystery Rankings: Why did document B leapfrog document A? Sometimes, it’s anyone’s guess. 🕳️
  • Chunky Business: Embedding models are picky eaters—they work best with just the right size of text chunks.
  • Keyword Nostalgia: Many users still expect the comfort of exact matches. “Where’s my keyword?” they ask. (Fair question!)

Hybrid Search: The Best of Both Worlds 🤝

Hybrid search bridges the gap—combining trusty keyword (lexical) search with smart vector (neural) search for results that are both sharp and relevant.

How It Works

  1. Double the Fun: Run a classic keyword query and a KNN vector search at the same time, creating two candidate lists.
  2. Clever Combining: Merge the two lists into one candidate set and rank it with a combined score, for maximum "aha!" moments.

Apache Solr Does Hybrid Search (Despite the Rumors) 💡

Contrary to the grapevine, Solr can absolutely do hybrid search—even if the docs are a little shy about it. If your schema mixes traditional fields with a solr.DenseVectorField, you’re all set.


Candidate Selection: Boolean Query Parser to the Rescue 🦸‍♂️

Solr’s Boolean Query Parser lets you mix and match candidate sets with flair:

Union Example

q={!bool should=$lexicalQuery should=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=embeddings topK=10}[0.001, -0.422, -0.284, ...]

Result: All unique hits from both searches. No duplicates, more to love! ❤️

Intersection Example

q={!bool must=$lexicalQuery must=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=embeddings topK=10}[0.001, -0.422, -0.284, ...]

Result: Only the most relevant docs—where both worlds collide. 🤝
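To make the union and intersection semantics concrete, here is a toy Python simulation with made-up scores. It assumes Lucene's boolean scoring, where a document's score is the sum of the scores of the clauses it matches; it does not call Solr, and the score values are purely illustrative.

```python
# Made-up scores standing in for edismax (lexical) and knn (vector) results.
lexical_hits = {"A": 7.2, "B": 3.1, "C": 1.4}    # doc id -> lexical score
vector_hits = {"B": 0.91, "D": 0.88, "A": 0.75}  # doc id -> knn score (topK=3)


def bool_should(lex: dict[str, float], vec: dict[str, float]) -> dict[str, float]:
    """Union: a doc matches if either clause matches; matching clause scores are summed."""
    return {doc: lex.get(doc, 0.0) + vec.get(doc, 0.0) for doc in lex.keys() | vec.keys()}


def bool_must(lex: dict[str, float], vec: dict[str, float]) -> dict[str, float]:
    """Intersection: a doc must match both clauses; both scores are summed."""
    return {doc: lex[doc] + vec[doc] for doc in lex.keys() & vec.keys()}


def ranked(scores: dict[str, float]) -> list[str]:
    """Doc ids ordered by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


print(ranked(bool_should(lexical_hits, vector_hits)))  # ['A', 'B', 'C', 'D']
print(ranked(bool_must(lexical_hits, vector_hits)))    # ['A', 'B']
```

Notice how the unnormalized lexical scores dominate both rankings; that imbalance is exactly what the score-combination strategies in the ranking section address.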


Pre-Filtering vs. Post-Filtering: Timing Matters 🕰️

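  • Pre-filtering (Solr 9.1+): filter queries (fq) are applied while the K nearest neighbours are being searched, so you get K matches whenever at least K documents pass the filters.
  • Post-filtering: the filter runs after the K nearest neighbours have been found, which can leave you with fewer than K.

Which one you get depends on the filter query: a filter whose query type supports post-filtering and that is marked cache=false with a cost of 100 or more runs as a post-filter; other filters act as pre-filters. Need more detail? Check Solr's Query Guide 📖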
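The difference between the two timings is easy to see on a tiny example. This Python sketch uses a hypothetical five-document corpus with an "in stock" flag and exact brute-force cosine search; Solr itself uses an approximate HNSW index, so this only illustrates why post-filtering can return fewer than K hits.

```python
import math

# Hypothetical corpus: doc id -> (2-d vector, in_stock flag).
DOCS = {
    "A": ([1.0, 0.0], False),
    "B": ([0.9, 0.1], False),
    "C": ([0.8, 0.3], True),
    "D": ([0.1, 1.0], True),
    "E": ([0.0, 1.0], True),
}


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn(query: list[float], candidates, k: int) -> list[str]:
    """Exact (brute-force) top-k by cosine similarity over the given doc ids."""
    return sorted(candidates, key=lambda d: cosine(query, DOCS[d][0]), reverse=True)[:k]


def pre_filter(query: list[float], k: int) -> list[str]:
    # Filter first, then search: up to k hits among the docs that pass the filter.
    allowed = [d for d, (_, in_stock) in DOCS.items() if in_stock]
    return knn(query, allowed, k)


def post_filter(query: list[float], k: int) -> list[str]:
    # Search first, then filter: some of the k neighbours may be discarded.
    return [d for d in knn(query, DOCS, k) if DOCS[d][1]]


q = [1.0, 0.0]
print(pre_filter(q, 2))   # ['C', 'D']
print(post_filter(q, 2))  # []
```

Here the two nearest neighbours overall are both out of stock, so post-filtering leaves nothing, while pre-filtering still finds two in-stock matches.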


Ranking Results: Who’s on Top? 🏆

Mixing lexical and vector scores isn’t just math—it’s art (with a little science):

🧮 Sum Normalized Scores

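Normalize the lexical scores to 0–1 (BM25 scores have no fixed upper bound) and add them to the KNN scores, which already fall in 0–1 for cosine similarity. Easy math, solid baseline.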
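A minimal Python sketch of the sum strategy, assuming per-query min-max normalization of the lexical scores (one common choice, not the only one) and made-up scores. It also shows the mapping Lucene, and therefore Solr, uses to turn cosine similarity into a 0–1 KNN score.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1]; if all scores are equal, map them to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def knn_score_from_cosine(cos_sim: float) -> float:
    # Lucene maps cosine similarity from [-1, 1] to a score in [0, 1].
    return (1.0 + cos_sim) / 2.0


def sum_fusion(lexical: dict[str, float], vector: dict[str, float]) -> dict[str, float]:
    """Normalized lexical score + KNN score; a doc missing from a list gets 0 for it."""
    lex = min_max(lexical)
    return {d: lex.get(d, 0.0) + vector.get(d, 0.0) for d in lex.keys() | vector.keys()}


lexical = {"A": 12.0, "B": 6.0, "C": 2.0}   # raw BM25-style scores (made up)
vector = {"B": knn_score_from_cosine(0.8),  # 0.9
          "C": knn_score_from_cosine(0.6),  # 0.8
          "D": knn_score_from_cosine(0.4)}  # 0.7

fused = sum_fusion(lexical, vector)
print(sorted(fused, key=fused.get, reverse=True))  # ['B', 'A', 'C', 'D']
```

B wins because it is strong on both signals, even though A has the best raw lexical score.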

✖️ Multiply Normalized Scores

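Scale lexical scores into a range with a small positive floor (like 0.1–1) and multiply them by the KNN scores. The floor keeps the weakest lexical match from zeroing out a strong vector score.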
Tip: Test with real data—let your results do the talking!
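A minimal Python sketch of the multiply strategy with made-up scores. For simplicity it only combines documents present in both lists (as with the intersection query); how to treat a document missing from one list is a design decision this sketch does not make for you.

```python
def scale(scores: dict[str, float], new_min: float = 0.1, new_max: float = 1.0) -> dict[str, float]:
    """Linearly rescale scores into [new_min, new_max]; equal scores map to new_max."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: new_max for doc in scores}
    span = new_max - new_min
    return {doc: new_min + (s - lo) * span / (hi - lo) for doc, s in scores.items()}


def product_fusion(lexical: dict[str, float], vector: dict[str, float],
                   new_min: float = 0.1) -> dict[str, float]:
    """Multiply scaled lexical scores by KNN scores for docs present in both lists."""
    lex = scale(lexical, new_min=new_min)
    return {d: lex[d] * vector[d] for d in lex.keys() & vector.keys()}


lexical = {"A": 12.0, "B": 6.0, "C": 2.0}  # raw lexical scores (made up)
vector = {"A": 0.6, "B": 0.9, "C": 0.95}   # KNN scores in [0, 1] (made up)

fused = product_fusion(lexical, vector)
print(sorted(fused, key=fused.get, reverse=True))  # ['A', 'B', 'C']

# With a floor of 0, the weakest lexical match is zeroed out despite its high KNN score.
print(product_fusion(lexical, vector, new_min=0.0)["C"])  # 0.0
```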


Ready for Next-Level? Learning To Rank 🤖

Why handcraft rules when a model can learn what works? Solr’s Learning To Rank (LTR) lets you blend scores with machine-learned finesse.

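Sample feature set (${lexicalQuery} and ${queryVector} are external feature information (efi) values supplied at query time):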

[
  {"name": "lexicalScore", "class": "org.apache.solr.ltr.feature.SolrFeature", "params": { "q" : "{!func}scale(query(${lexicalQuery}),0,1)" }},
  {"name": "vectorSimilarityScore", "class": "org.apache.solr.ltr.feature.SolrFeature", "params": { "q" : "{!func}vectorSimilarity(FLOAT32, COSINE, embeddings, ${queryVector})" }}
]

Train your model outside Solr, then plug it in for search that adapts and improves.
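As a rough illustration of "train outside, plug in", this Python sketch fits two feature weights by ordinary least squares on hypothetical judged examples and emits a model in the JSON shape of Solr's LinearModel. Production setups usually train with a dedicated learning-to-rank library; plain least squares is a stand-in chosen only to keep the sketch dependency-free, and the model name "hybridModel" is hypothetical.

```python
import json

# Hypothetical judged examples: (lexicalScore, vectorSimilarityScore) -> relevance label.
# Real feature values would come from Solr's LTR feature logging; labels from your judgements.
FEATURES = ["lexicalScore", "vectorSimilarityScore"]
TRAINING = [
    ((1.0, 0.2), 0.42),
    ((0.5, 0.9), 0.69),
    ((0.2, 0.6), 0.42),
    ((0.8, 0.8), 0.72),
]


def fit_weights(rows):
    """Least-squares fit of label ~ w1*f1 + w2*f2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(f[0] * f[0] for f, _ in rows)
    s12 = sum(f[0] * f[1] for f, _ in rows)
    s22 = sum(f[1] * f[1] for f, _ in rows)
    t1 = sum(f[0] * y for f, y in rows)
    t2 = sum(f[1] * y for f, y in rows)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear; cannot solve for two weights")
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


weights = fit_weights(TRAINING)

# Model definition in the shape of Solr's LinearModel; "hybridModel" is a hypothetical name.
model = {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "hybridModel",
    "features": [{"name": name} for name in FEATURES],
    "params": {"weights": {name: round(w, 4) for name, w in zip(FEATURES, weights)}},
}
print(json.dumps(model, indent=2))
```

The feature set is uploaded to the index's /schema/feature-store endpoint and the model JSON to /schema/model-store; the model is then applied through the rq parameter with the ltr query parser, passing the efi values the features expect.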


What Belongs Inside {!edismax} in lexicalQuery? 🧾

Parameter | Inside lexicalQuery? | Why
v (the query text) | ✅ YES | Required for the subquery to function (alternatively, put the text right after the closing })
qf, pf, bf, bq, mm, ps | ✅ YES | All edismax features go inside
defType | ❌ NO | Already defined by {!edismax}
hl, spellcheck, facet, rows, start, sort | ❌ NO | These are top-level Solr request features

💡 Hybrid Query Cheat Sheet

Here’s how to do it right when you want all the bells and whistles (highlighting, spellcheck, deep edismax):

# TOP-LEVEL BOOLEAN QUERY COMPOSING EDISMAX AND KNN
q={!bool should=$lexicalQuery should=$vectorQuery}

# LEXICAL QUERY: ALL YOUR EDISMAX STUFF GOES HERE
&lexicalQuery={!edismax v=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}

# VECTOR QUERY
&vectorQuery={!knn f=embeddings topK=10}[0.123, -0.456, ...]

# EDISMAX PARAMS
&qtext=flying machine
&qf=title^6 description^3 text^2 uri^4
&pf=text^10
&mm=1<100% 2<75% 3<50% 6<30%
&bf=recip(ms(NOW,publish_date),3.16e-11,1,1)

# NON-QUERY STUFF
&hl=true
&hl.fl=text
&hl.q={!edismax v=$qtext qf=$qf}
&spellcheck=true
&spellcheck.q=${qtext}
&rows=20
&start=0
&sort=score desc
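The cheat sheet contains characters that are special in URLs (%, <, {, $, ^ and spaces), so the parameters need URL-encoding before they are sent. A minimal Python sketch using only the standard library, assuming the "embeddings" field from the schema section and a placeholder 384-value query vector; hl.q and spellcheck.q are written out directly so they do not depend on parameter dereferencing.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder query vector; in practice it comes from the same embedding model
# used at indexing time and must have vectorDimension (384) values.
query_vector = [0.0] * 384
qtext = "flying machine"

params = {
    "q": "{!bool should=$lexicalQuery should=$vectorQuery}",
    # The query text is passed to edismax with the v local parameter.
    "lexicalQuery": "{!edismax v=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}",
    "vectorQuery": "{!knn f=embeddings topK=10}" + str(query_vector),
    "qtext": qtext,
    "qf": "title^6 description^3 text^2 uri^4",
    "pf": "text^10",
    "mm": "1<100% 2<75% 3<50% 6<30%",
    "bf": "recip(ms(NOW,publish_date),3.16e-11,1,1)",
    "hl": "true",
    "hl.fl": "text",
    "hl.q": "{!edismax v=$qtext qf=$qf}",
    "spellcheck": "true",
    "spellcheck.q": qtext,
    "rows": "20",
    "start": "0",
    "sort": "score desc",
}

# urlencode percent-escapes %, <, {, $, ^ and turns spaces into "+".
body = urlencode(params)
# Decoding gives back the original values unchanged.
assert parse_qs(body)["mm"][0] == params["mm"]
print(body[:60])
```

Sending the encoded string as an application/x-www-form-urlencoded POST body to https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/select, rather than in the URL, avoids URL length limits with long vectors.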

In Summary

Hybrid search gives you the sharp accuracy of keywords and the deep smarts of vectors—all in one system. With Solr, you can have classic reliability and modern magic. 🍦✨

“Why choose between classic and cutting-edge, when you can have both? Double-scoop your search!”

Happy hybrid searching! 🥳





