schema.xml
<!--VECTORS-->
<field name="embeddings" type="vector" indexed="true" stored="true" multiValued="false" required="false" />
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="384" similarityFunction="cosine"/>
$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fieldtypes -H 'Content-type:application/json' -d '{
"add-field-type": {
"name": "vector",
"class": "solr.DenseVectorField",
"vectorDimension": 384,
"similarityFunction": "cosine"
}
}'
# Set "stored" to true if you want to see the vectors for debugging.
# The vector size and similarity function come from the "vector" field type defined above; adjust vectorDimension there to match your embedder.
$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fields -H 'Content-type:application/json' -d '{
"add-field": {
"name":"embeddings",
"type":"vector",
"indexed":true,
"stored":false,
"multiValued":false,
"required":false
}
}'
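If you script your schema changes, the two payloads above could be generated rather than typed by hand. This is a minimal sketch in Python, assuming the field and type names used in this article. The helper name `build_schema_commands` is hypothetical, and it only builds the JSON bodies; nothing is sent to your index.

```python
import json


def build_schema_commands(dimension=384, similarity="cosine",
                          field="embeddings", field_type="vector",
                          stored=False):
    """Return the add-field-type and add-field payloads as JSON strings.

    Hypothetical helper: names and defaults mirror the examples above.
    """
    add_type = {
        "add-field-type": {
            "name": field_type,
            "class": "solr.DenseVectorField",
            # Must match the output size of your embedding model.
            "vectorDimension": dimension,
            "similarityFunction": similarity,
        }
    }
    add_field = {
        "add-field": {
            "name": field,
            "type": field_type,
            "indexed": True,
            # Set stored=True if you want to see the vectors for debugging.
            "stored": stored,
            "multiValued": False,
            "required": False,
        }
    }
    return json.dumps(add_type, indent=2), json.dumps(add_field, indent=2)


type_json, field_json = build_schema_commands()
print(type_json)
print(field_json)
```

Each string can be passed as the `-d` body of the matching curl command. Generating both from one function keeps the type name in the field definition in sync with the field type.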
solrconfig.xml
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog>
<int name="numVersionBuckets">65536</int>
<int name="maxNumLogsToKeep">10</int>
<int name="numRecordsToKeep">10</int>
</updateLog>
.....
</updateHandler>
Vector search has quickly become a core tool for modern search platforms. With advances in language models, we can encode text into high-dimensional vectors, making it possible to find not just what you type, but what you mean. It’s like giving your search engine a sixth sense! 🕵️‍♂️
As much as we love innovation, vector search still has a few quirks of its own.
Hybrid search bridges the gap—combining trusty keyword (lexical) search with smart vector (neural) search for results that are both sharp and relevant.
Contrary to the grapevine, Solr can absolutely do hybrid search, even if the docs are a little shy about it. If your schema mixes traditional fields with a solr.DenseVectorField, you’re all set.
Solr’s Boolean Query Parser lets you mix and match candidate sets with flair:
q={!bool should=$lexicalQuery should=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]
Result: All unique hits from both searches. No duplicates, more to love! ❤️
q={!bool must=$lexicalQuery must=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]
Result: Only the most relevant docs—where both worlds collide. 🤝
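The two requests above differ only in the Boolean clause (`should` versus `must`), so both can come from one small builder. This is a rough sketch in Python, assuming the field names from the examples (`text_field`, `vector`). The function name `hybrid_params` is hypothetical, and the sketch only builds the query string.

```python
from urllib.parse import urlencode


def hybrid_params(text, vector, mode="should",
                  lexical_field="text_field", vector_field="vector",
                  top_k=10):
    """Build Solr request parameters for a hybrid Boolean query.

    mode="should" returns the union of both candidate sets,
    mode="must" returns only documents matched by both.
    """
    if mode not in ("should", "must"):
        raise ValueError("mode must be 'should' or 'must'")
    # The knn parser expects the vector as a bracketed, comma-separated list.
    vec = "[" + ", ".join(str(x) for x in vector) + "]"
    return {
        "q": "{!bool %s=$lexicalQuery %s=$vectorQuery}" % (mode, mode),
        "lexicalQuery": "{!type=edismax qf=%s}%s" % (lexical_field, text),
        "vectorQuery": "{!knn f=%s topK=%d}%s" % (vector_field, top_k, vec),
    }


params = hybrid_params("term1", [0.001, -0.422, -0.284], mode="must")
# urlencode escapes characters such as {, }, $ and spaces for use in a URL.
print(urlencode(params))
```

Append the printed string to your index's select URL. In a real request the vector would carry as many values as the field's configured dimension.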
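If you add filter queries (fq) alongside a KNN query, you can influence whether a filter runs before or after the vector search with the filter’s cost parameter. Need more detail? Check Solr’s Query Guide 📖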
Mixing lexical and vector scores isn’t just math—it’s art (with a little science):
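Normalize lexical scores to the 0–1 range and add them to the KNN scores. Easy math, solid baseline.
Scale lexical scores to a range such as 0.1–1 and multiply them by the KNN scores. The non-zero lower bound keeps the weakest lexical match from zeroing out the product.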
Tip: Test with real data—let your results do the talking!
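To get a feel for how the two blending strategies behave, you can try them offline before wiring them into a query. This is a minimal sketch in Python with made-up scores. The function names and sample values are hypothetical, and it illustrates the arithmetic only, not how Solr computes scores internally.

```python
def min_max(scores, low=0.0, high=1.0):
    """Rescale a {doc_id: score} mapping linearly into [low, high]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; map every document to the upper bound.
        return {doc: high for doc in scores}
    return {doc: low + (s - lo) * (high - low) / (hi - lo)
            for doc, s in scores.items()}


def combine_sum(lexical, knn):
    """Normalize lexical scores to 0-1 and add the KNN scores.

    Documents missing from one side contribute 0 for that side.
    """
    norm = min_max(lexical, 0.0, 1.0)
    return {doc: norm.get(doc, 0.0) + knn.get(doc, 0.0)
            for doc in set(norm) | set(knn)}


def combine_product(lexical, knn):
    """Scale lexical scores to 0.1-1 and multiply by the KNN scores.

    Only documents present on both sides are kept (an assumption of
    this sketch), since a missing score would zero out the product.
    """
    norm = min_max(lexical, 0.1, 1.0)
    return {doc: norm[doc] * knn[doc] for doc in set(norm) & set(knn)}


# Hypothetical scores for four documents.
lexical = {"a": 12.0, "b": 6.0, "c": 3.0}
knn = {"a": 0.6, "b": 0.9, "d": 0.8}

for name, combined in (("sum", combine_sum(lexical, knn)),
                       ("product", combine_product(lexical, knn))):
    ranking = sorted(combined, key=combined.get, reverse=True)
    print(name, ranking)
```

Comparing the two rankings on a sample of your own queries is a quick way to see which strategy suits your data.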
Why handcraft rules when a model can learn what works? Solr’s Learning To Rank (LTR) lets you blend scores with machine-learned finesse.
Sample Feature Set:
[
{"name": "lexicalScore", "class": "org.apache.solr.ltr.feature.SolrFeature", "params": { "q" : "{!func}scale(query(${lexicalQuery}),0,1)" }},
{"name": "vectorSimilarityScore", "class": "org.apache.solr.ltr.feature.SolrFeature", "params": { "q" : "{!func}vectorSimilarity(FLOAT32, DOT_PRODUCT, vectorField, ${queryVector})" }}
]
Train your model outside Solr, then plug it in for search that adapts and improves.
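Once training gives you weights, they need to be expressed as a model definition that references the features by name. As a rough sketch, the Python below builds a linear model definition for the two features in the sample feature set. The model name and the weights are placeholders, not trained values, and the function name is hypothetical.

```python
import json


def linear_model(name, weights):
    """Build an LTR linear model definition from {feature_name: weight}."""
    return {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        # Feature names must match the ones in the feature set.
        "features": [{"name": feature} for feature in weights],
        "params": {"weights": dict(weights)},
    }


# Placeholder weights; replace them with the output of your own training.
model = linear_model("hybridLinearModel",
                     {"lexicalScore": 0.4, "vectorSimilarityScore": 0.6})
print(json.dumps(model, indent=2))
```

The printed JSON is what you would upload to your index's model store after the feature set itself has been uploaded.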
What goes inside {!edismax} in lexicalQuery? 🧾
Parameter | Inside lexicalQuery | Why |
---|---|---|
q | ✅ YES | Required for the subquery to function |
qf, pf, bf, bq, mm, ps | ✅ YES | All edismax features must go inside |
defType | ❌ NO | Already defined by {!edismax} |
hl, spellcheck, facet, rows, start, sort | ❌ NO | These are top-level Solr request features |
Here’s how to do it right when you want all the bells and whistles (highlighting, spellcheck, deep edismax):
# TOP-LEVEL BOOLEAN QUERY COMPOSING EDISMAX AND KNN
q={!bool should=$lexicalQuery should=$vectorQuery}
# LEXICAL QUERY: ALL YOUR EDISMAX STUFF GOES HERE
&lexicalQuery={!edismax q=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}
# VECTOR QUERY
&vectorQuery={!knn f=vectorField topK=10}[0.123, -0.456, ...]
# EDISMAX PARAMS
&qtext='flying machine'
&qf=title^6 description^3 text^2 uri^4
&pf=text^10
&mm=1<100% 2<75% 3<50% 6<30%
&bf=recip(ms(NOW,publish_date),3.16e-11,1,1)
# NON-QUERY STUFF
&hl=true
&hl.fl=text
&hl.q=$lexicalQuery
&spellcheck=true
&spellcheck.q=$qtext
&rows=20
&start=0
&sort=score desc
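Values such as `mm=1<100% 2<75%` contain characters that must be URL-encoded before the request goes over the wire. As a hedged sketch, here is one way to assemble the request above in Python and let the standard library handle the escaping. The function name `build_request` is hypothetical, and the parameter values are copied from the example.

```python
from urllib.parse import urlencode


def build_request(text, vector, rows=20, start=0):
    """Return the URL-encoded query string for the hybrid request."""
    vec = "[" + ", ".join(str(x) for x in vector) + "]"
    params = [
        # Top-level Boolean query composing edismax and knn
        ("q", "{!bool should=$lexicalQuery should=$vectorQuery}"),
        # Lexical query: all edismax options are referenced here
        ("lexicalQuery", "{!edismax q=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}"),
        # Vector query
        ("vectorQuery", "{!knn f=vectorField topK=10}" + vec),
        # edismax params
        ("qtext", text),
        ("qf", "title^6 description^3 text^2 uri^4"),
        ("pf", "text^10"),
        ("mm", "1<100% 2<75% 3<50% 6<30%"),
        ("bf", "recip(ms(NOW,publish_date),3.16e-11,1,1)"),
        # Non-query features stay at the top level
        ("hl", "true"),
        ("hl.fl", "text"),
        ("hl.q", "$lexicalQuery"),
        ("spellcheck", "true"),
        ("spellcheck.q", "$qtext"),
        ("rows", rows),
        ("start", start),
        ("sort", "score desc"),
    ]
    return urlencode(params)


query_string = build_request("flying machine", [0.123, -0.456])
print(query_string)
```

Keeping the parameters as an ordered list of pairs mirrors the layout of the example, so it stays easy to compare the two side by side.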
Hybrid search gives you the sharp accuracy of keywords and the deep smarts of vectors—all in one system. With Solr, you can have classic reliability and modern magic. 🍦✨
“Why choose between classic and cutting-edge, when you can have both? Double-scoop your search!”
Happy hybrid searching! 🥳