schema.xml
(works with):
<!--VECTORS-->
<field name="embeddings" type="vector" indexed="true" stored="true" multiValued="false" required="false" />
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="384" similarityFunction="cosine"/>
⚠️ Pay very close attention to the vectorDimension: it has to match the dimension of the embeddings that your embedding model produces. If you are using the Opensolr Index Embedding API, this has to be exactly 384.
This works with the Opensolr Embed API Endpoint, which uses the all-MiniLM-L6-v2 embedding model.
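A mismatch between the vector length and vectorDimension is easy to catch before indexing. The following is a minimal Python sketch, not part of the Opensolr tooling: the helper name check_embedding is hypothetical, and the constant simply mirrors the 384 configured above.

```python
import math

# Assumption: 384 mirrors vectorDimension in the field type above
# (the size produced by all-MiniLM-L6-v2).
EXPECTED_DIMENSION = 384


def check_embedding(vector, expected=EXPECTED_DIMENSION):
    """Return the vector as a list of floats, or raise if it cannot be indexed."""
    values = [float(v) for v in vector]
    if len(values) != expected:
        raise ValueError(
            f"embedding has {len(values)} dimensions, schema expects {expected}"
        )
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values
```

Calling check_embedding on each vector before sending it to Solr turns a dimension mismatch into an early, readable error on the client side.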
schema.xml
$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fieldtypes -H 'Content-type:application/json' -d '{
"add-field-type": {
"name": "vector",
"class": "solr.DenseVectorField",
"vectorDimension": 384,
"similarityFunction": "cosine"
}
}'
$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fields -H 'Content-type:application/json' -d '{
"add-field": {
"name":"embeddings",
"type":"vector",
"indexed":true,
"stored":false, // true if you want to see the vectors for debugging
"multiValued":false,
"required":false,
"dimension":384, // adjust to your embedder size
"similarityFunction":"cosine"
}
}'
solrconfig.xml
for atomic updates to use with the Opensolr Index Embedding API:
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog>
<int name="numVersionBuckets">65536</int>
<int name="maxNumLogsToKeep">10</int>
<int name="numRecordsToKeep">10</int>
</updateLog>
.....
</updateHandler>
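Solr atomic updates rely on the update log configured above. As a rough illustration of what such an update looks like for the vector field, here is a minimal Python sketch that builds the JSON body using Solr's standard set modifier. The helper name, the document id, and the three-value vector are placeholders, and the Opensolr Index Embedding API may build its updates differently.

```python
import json


def atomic_vector_update(doc_id, vector, field="embeddings"):
    """Build a Solr atomic-update document that sets only the vector field."""
    return {"id": doc_id, field: {"set": [float(v) for v in vector]}}


# Hypothetical document id and a tiny placeholder vector
# (a real vector for this schema has 384 values).
payload = json.dumps([atomic_vector_update("doc-1", [0.1, -0.2, 0.3])])
print(payload)
```

The resulting JSON array would be posted to the index's update handler. Keep in mind that atomic updates only preserve the other fields of a document if those fields are stored or have docValues.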
As much as we love innovation, vector search still has a few quirks.
Hybrid search bridges the gap—combining trusty keyword (lexical) search with smart vector (neural) search for results that are both sharp and relevant.
Contrary to the grapevine, Solr can absolutely do hybrid search, even if the docs are a little shy about it. If your schema mixes traditional fields with a solr.DenseVectorField, you’re all set.
Solr’s Boolean Query Parser lets you mix and match candidate sets with flair:
q={!bool should=$lexicalQuery should=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]
Result: All unique hits from both searches. No duplicates, more to love! ❤️
q={!bool must=$lexicalQuery must=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]
Result: Only the most relevant docs—where both worlds collide. 🤝
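The two query shapes above differ only in the boolean clause (should for the union, must for the intersection), so they are easy to generate from code. The following Python sketch assembles the request parameters; the function name hybrid_params and its defaults are illustrative assumptions, while the field names text_field and vector are taken from the examples above.

```python
from urllib.parse import urlencode


def hybrid_params(text, vector, mode="union", text_field="text_field",
                  vector_field="vector", top_k=10):
    """Return Solr request parameters for a union or intersection hybrid query."""
    # "should" keeps hits from either query, "must" keeps hits found by both.
    clause = {"union": "should", "intersection": "must"}[mode]
    vector_literal = "[" + ", ".join(repr(float(v)) for v in vector) + "]"
    return {
        "q": f"{{!bool {clause}=$lexicalQuery {clause}=$vectorQuery}}",
        "lexicalQuery": f"{{!type=edismax qf={text_field}}}{text}",
        "vectorQuery": f"{{!knn f={vector_field} topK={top_k}}}{vector_literal}",
    }


# Placeholder three-value vector; a real query vector has as many
# values as the vectorDimension of the field.
params = hybrid_params("term1", [0.001, -0.422, -0.284], mode="union")
print(urlencode(params))
```

The URL-encoded string can be appended to the index's select URL, or the dictionary can be passed as form data to an HTTP client.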
You also have to be mindful of the Solr version you are using, since we were able to make this work only on Solr version 9.0. Beware this did not work on Solr 9.6! Only reranking queries worked on Solr 9.6 (as shown below).
Basically, at this point, here are all the parameters we sent to Solr to make this hybrid search work on Solr version 9.0:
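Classic Solr Edismax Search combined with dense vector search (UNION). Note that q below uses must for both clauses, which keeps only documents matched by both queries; switch both to should for a true union.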
{
"mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
"df":"title",
"ps":"3",
"bf":"recip(rord(timestamp),1,1500,500)^90",
"fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
"start":"0",
"fq":"+content_type:text*",
"rows":"100",
"vectorQuery":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
"q":"{!bool must=$lexicalQuery must=$vectorQuery}",
"qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
"pf":"title^15 description^7 uri^9",
"lexicalQuery":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
"pf3":"text^5",
"pf2":"tdescription^6"
}
Dense vector search reranked with a classic Solr Edismax query (RERANK)
{
"mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
"df":"title",
"ps":"3",
"bf":"recip(rord(timestamp),1,1500,500)^90",
"fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
"start":"0",
"fq":"+content_type:text*",
"rows":"100",
"q":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
"rqq":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
"qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
"pf":"title^15 description^7 uri^9",
"pf3":"text^5",
"pf2":"tdescription^6",
"rq":"{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}"
}
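The reranking request above can be assembled the same way: the knn query becomes the main q, and the edismax query is referenced from rq through rqq. This Python sketch is only an illustration. The function name rerank_params is hypothetical, and it carries over just a subset of the parameters listed above (qf, pf, mm).

```python
def rerank_params(text, vector, vector_field="embeddings", top_k=100,
                  rerank_docs=100, rerank_weight=3):
    """Return parameters for a knn query reranked by an edismax query."""
    vector_literal = "[" + ",".join(repr(float(v)) for v in vector) + "]"
    return {
        # Main query: dense vector search.
        "q": f"{{!knn f={vector_field} topK={top_k}}}{vector_literal}",
        # Rerank query: classic edismax over the user's text.
        "rqq": f"{{!edismax qf=$qf pf=$pf mm=$mm}}{text}",
        "rq": (f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
               f"reRankWeight={rerank_weight}}}"),
        # Values copied from the parameter listing above.
        "qf": "title^10 description^5 uri^3 text^2 phonetic_title^0.1",
        "pf": "title^15 description^7 uri^9",
        "mm": "1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
    }


# Placeholder vector; a real one has 384 values for this index.
params = rerank_params("trump tariffs", [-0.0241, 0.0319])
```

Keeping reRankDocs no larger than topK is a reasonable starting point, since the knn query cannot return more than topK candidates to rerank.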
🎹 This is based on the classic Opensolr Web Crawler Index, which does most of its work within the fields title, description, text, and uri.
📰 The index is populated with data crawled from various public news websites.
🔗 We embedded a concatenation of title, description, and the first 50 sentences of text.
💼 We use the Opensolr Query Embed API to embed our query at search time.
🏃🏻‍♂️ You can see this search in action here.
👩🏻‍💻 You can also see the Solr data and make your own queries on it. This index’s Solr API is here.
🔐 Credentials are: Username: 123 / Password: 123 -> Enjoy! 🥳
🤥 Below is a cheat-sheet of the fields and how you’re supposed to use them if you run knn queries. Solr is very picky about what goes with knn and what doesn’t. For example, for the Union query, we were unable to use highlighting. But if you follow the specs below, you probably won’t be getting any Query can not be null Solr errors… (or will you? 🤭)
{!edismax} in lexicalQuery? 🧾
Parameter | Inside lexicalQuery | Why |
---|---|---|
q | ✅ YES | Required for the subquery to function |
qf, pf, bf, bq, mm, ps | ✅ YES | All edismax features must go inside |
defType | ❌ NO | Already defined by {!edismax} |
hl, spellcheck, facet, rows, start, sort | ❌ NO | These are top-level Solr request features |
Here’s how to do it right when you want all the bells and whistles (highlighting, spellcheck, deep edismax):
# TOP-LEVEL BOOLEAN QUERY COMPOSING EDISMAX AND KNN
q={!bool should=$lexicalQuery should=$vectorQuery}
# LEXICAL QUERY: ALL YOUR EDISMAX STUFF GOES HERE
&lexicalQuery={!edismax q=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}
# VECTOR QUERY
&vectorQuery={!knn f=vectorField topK=10}[0.123, -0.456, ...]
# EDISMAX PARAMS
&qtext='flying machine'
&qf=title^6 description^3 text^2 uri^4
&pf=text^10
&mm=1<100% 2<75% 3<50% 6<30%
&bf=recip(ms(NOW,publish_date),3.16e-11,1,1)
# NON-QUERY STUFF
&hl=true
&hl.fl=text
&hl.q=$lexicalQuery
&spellcheck=true
&spellcheck.q=$qtext
&rows=20
&start=0
&sort=score desc
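The split described in the cheat-sheet (edismax parameters referenced from inside lexicalQuery, everything else kept at the top level) can be enforced in code. The following Python sketch mirrors the request layout shown above; the function name build_request and the two parameter sets are assumptions drawn from the cheat-sheet, not an official Solr or Opensolr API.

```python
# Parameter names taken from the cheat-sheet above.
EDISMAX_PARAMS = {"qf", "pf", "bf", "bq", "mm", "ps"}
TOP_LEVEL_PARAMS = {"hl", "spellcheck", "facet", "rows", "start", "sort"}


def build_request(qtext, vector_literal, options,
                  vector_field="vectorField", top_k=10):
    """Assemble hybrid-search parameters, referencing edismax options from lexicalQuery."""
    params = {
        "q": "{!bool should=$lexicalQuery should=$vectorQuery}",
        "vectorQuery": f"{{!knn f={vector_field} topK={top_k}}}{vector_literal}",
        "qtext": qtext,
    }
    refs = ["q=$qtext"]
    for name, value in options.items():
        if name in EDISMAX_PARAMS:
            # Referenced inside {!edismax ...} via parameter dereferencing.
            refs.append(f"{name}=${name}")
        elif name.split(".")[0] not in TOP_LEVEL_PARAMS:
            # For example defType, which {!edismax} already defines.
            raise ValueError(f"{name} is not covered by the cheat-sheet")
        params[name] = value
    params["lexicalQuery"] = "{!edismax " + " ".join(refs) + "}"
    return params


request = build_request(
    "flying machine",
    "[0.123, -0.456]",  # placeholder vector, far shorter than a real one
    {"qf": "title^6 description^3 text^2 uri^4", "mm": "1<100% 2<75% 3<50% 6<30%",
     "hl": "true", "hl.fl": "text", "rows": "20"},
)
```

Rejecting unknown names up front is a deliberate choice here: a parameter placed at the wrong level tends to fail silently or produce confusing Solr errors, so it is cheaper to stop before the request is sent.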
Hybrid search gives you the sharp accuracy of keywords and the deep smarts of vectors—all in one system. With Solr, you can have classic reliability and modern magic. 🍦✨
“Why choose between classic and cutting-edge, when you can have both? Double-scoop your search!”
Happy hybrid searching! 🥳