Hybrid Search



Step-by-Step Guide
Testing Your Opensolr AI Search Engine
Four powerful features ship with every Opensolr Web Crawler index — intent-based Vector Search, instant AI Hints, one-click Document Reader, and hands-on Query Elevation.
Crawl → Index → Embed → Solr → Search
Your complete AI search pipeline, fully managed, out of the box
Intent-Based Vector Search
Instead of matching exact keywords, vector search understands what you mean. A query like "winter hat" finds wool beanies, fleece earflap caps, and knit headwear — even when those exact words aren't on the page. Opensolr uses BGE-m3 embeddings (1024 dimensions) combined with traditional BM25 scoring for the best of both worlds: semantic understanding plus keyword precision.
Illustration: the query "winter hat" is turned by BGE-m3 into a 1024-dimensional vector embedding and matches Wool Winter Cap (98%), Knit Beanie Set (94%) and Fleece Earflap Hat (89%).
Hybrid Scoring (BM25 + Vectors) · BGE-m3 1024-dim · Multilingual
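To make "understands what you mean" concrete: with similarityFunction="cosine" (the setting used in the schema examples later on this page), vector search ranks documents by the cosine similarity between the query embedding and each document embedding. A minimal sketch, using toy 3-dimensional vectors and made-up documents in place of real 1024-dimensional BGE-m3 embeddings:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, docs):
    """Return (title, score) pairs sorted by cosine similarity, best first."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical): real BGE-m3 embeddings have 1024 dimensions.
query = [0.9, 0.1, 0.0]  # stands in for the embedding of "winter hat"
docs = {
    "Wool Winter Cap": [0.8, 0.2, 0.1],
    "Knit Beanie Set": [0.7, 0.3, 0.2],
    "Garden Hose": [0.0, 0.1, 0.9],
}

for title, score in rank_by_similarity(query, docs):
    print(f"{title}: {score:.2f}")
```

Real embeddings come from the model rather than hand-picked numbers; the point is only that ranking follows vector closeness, not shared words.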
AI Hints — Instant Answers from Your Content
Before your users even scroll through results, AI Hints delivers a concise, AI-generated answer right at the top of the page. It uses RAG (Retrieval-Augmented Generation) — the AI retrieves the most relevant passages from YOUR indexed content, then generates a focused answer. No hallucinations, no external data — every hint is grounded in your actual pages.
Illustration: for the question "best pellet heater for garage?", RAG retrieves from YOUR indexed content and the AI Hint reads: Look for 40,000+ BTU models with thermostat. Ventilation required for enclosed spaces. See top-rated pellet heaters in results below.
RAG-Powered · Grounded in Your Data · Zero Hallucinations
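The retrieve-then-generate shape of RAG fits in a few lines. This is an illustration under stated assumptions, not Opensolr's AI Hints pipeline: the toy retriever scores passages by word overlap (standing in for the real hybrid search), the passages are made up, and the LLM call itself is left out, so only the grounding prompt is built.

```python
def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by how many question words they contain."""
    q_words = set(question.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model: it may only answer from the retrieved passages."""
    sources = "\n".join(f"- {p}" for p in context)
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say so.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


passages = [  # hypothetical indexed content
    "Pellet heaters for a garage should have a thermostat",
    "Enclosed spaces need ventilation when running a pellet heater",
    "Our store opens at 9am",
]
question = "best pellet heater for garage"
prompt = build_prompt(question, retrieve(question, passages))
print(prompt)  # this prompt would then be sent to the LLM
```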
Document Reader — Summarize Any Search Result
Every search result includes a "Read" button. Click it, and the AI reads the entire web page, extracts the key information, and generates a clean summary — in seconds. You can then download the summary as a PDF. No need to visit the page, skim through ads, or parse dense content yourself.
Illustration: a search result titled "Best Pellet Heaters 2026: Expert Reviews" (heatersguide.com/pellet-heaters-2026) with its Read button. The AI Reader shows a Page Summary ("Top 5 pellet heaters ranked by efficiency, noise level, and value. Castle 12327 rated best overall at $1,299...") and a Download PDF button.
One-Click Summaries · PDF Export · Key Feature Extraction
Query Elevation — Pin & Exclude Search Results
Take full control of what your users see. Query Elevation lets you pin important results to the top or exclude irrelevant ones — directly from the Search UI, with zero code and no reindexing required. Perfect for promoting landing pages, burying outdated content, or curating high-value queries.
Illustration: in the search results, "Product Landing Page" (yoursite.com/products/best-seller) is Pinned #1, forced to the top for this query; pinned results can be dragged to reorder; an excluded result is hidden from this query.
  • Pin — Force a specific result to the top for a given search query
  • Exclude — Hide a result completely so it never appears for that query
  • Exclude All — Apply the rule globally, across every search query
  • Drag & drop — Reorder pinned results to control exactly which one shows first
Zero Code Required · Exclude Irrelevant Results · Pin & Reorder
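The four rules above can be modelled as a small function over a ranked list of document IDs. This is a hypothetical plain-Python sketch of the behaviour (it assumes a pinned document is shown even if the query did not rank it), not how Opensolr stores or applies elevation rules:

```python
def apply_elevation(ranked_ids, pinned=(), excluded=(), excluded_all=()):
    """Apply pin / exclude rules to a ranked list of document IDs.

    pinned       -- IDs forced to the top, in this order (the drag & drop order)
    excluded     -- IDs hidden for this query only
    excluded_all -- IDs hidden for every query (the "Exclude All" rule)
    """
    hidden = set(excluded) | set(excluded_all)
    top = [doc_id for doc_id in pinned if doc_id not in hidden]
    pinned_set = set(top)
    rest = [
        doc_id
        for doc_id in ranked_ids
        if doc_id not in hidden and doc_id not in pinned_set
    ]
    return top + rest


print(apply_elevation(["a", "b", "c", "d"], pinned=["c"], excluded=["b"]))
```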

Try It Live

Test these demo search engines with real vector search. Use conceptual, intent-based queries:

Try these conceptual queries to see how vector similarity goes beyond keyword matching:

  • climate disasters hurricanes floods wildfires
  • space exploration mars colonization economy
  • ancient microbes life beyond earth

Every demo page includes built-in dev tools — query parameter inspector, full Solr debugQuery output, crawl statistics, and search analytics.


Using the Solr API Directly

Direct API access for advanced users — learn more about hybrid search.

Example Solr endpoints (credentials: 123 / 123):

https://de9.solrcluster.com/solr/vector/select?wt=json&indent=true&q=*:*&rows=2
https://fi.solrcluster.com/solr/rueb/select?wt=json&indent=true&q=*:*&rows=2
https://chicago96.solrcluster.com/solr/peilishop/select?wt=json&indent=true&q=*:*&rows=2

Simple Lexical Query

curl -u 123:123 "https://de9.solrcluster.com/solr/vector/select?q=climate+change&rows=5&wt=json"

Pure Vector Query (KNN)

curl -g -u 123:123 'https://de9.solrcluster.com/solr/vector/select?q={!knn%20f=embeddings%20topK=50}[0.123,0.432,0.556,...]&wt=json'

The -g (--globoff) flag stops curl from treating {} and [] as URL globbing patterns.

Replace the vector array with your own embedding from the Opensolr AI NLP API.
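Hand-writing a vector into a URL is error-prone: the braces and brackets need percent-encoding, and the vector length must equal the schema's vectorDimension. A minimal sketch using only the standard library, assuming the embeddings field and the 1024 dimensions used on this page; the function names are hypothetical:

```python
from urllib.parse import urlencode

EXPECTED_DIM = 1024  # must equal vectorDimension in the schema (BGE-m3 here)


def knn_query(vector, field="embeddings", top_k=50, expected_dim=EXPECTED_DIM):
    """Build the {!knn} local-params query string for Solr."""
    if expected_dim is not None and len(vector) != expected_dim:
        raise ValueError(
            f"vector has {len(vector)} values, schema expects {expected_dim}"
        )
    values = ",".join(repr(float(x)) for x in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


def knn_select_url(base, vector, **kwargs):
    """Full /select URL; urlencode percent-encodes {, !, [, ] and commas."""
    params = {"q": knn_query(vector, **kwargs), "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


# Toy 2-dimensional vector, so the dimension check is relaxed to 2.
print(knn_select_url("https://de9.solrcluster.com/solr/vector", [0.1, 0.2], expected_dim=2))
```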

Hybrid Query (Lexical + Vector)

curl -g -u 123:123 'https://de9.solrcluster.com/solr/vector/select?q={!bool%20should=$lexicalQuery%20should=$vectorQuery}&lexicalQuery={!edismax%20qf=content}climate+change&vectorQuery={!knn%20f=embeddings%20topK=50}[0.12,0.43,0.66,...]&wt=json'

Single quotes stop the shell from expanding $lexicalQuery and $vectorQuery; -g disables curl's URL globbing.

Combines traditional keyword scoring with semantic vector similarity — best of both worlds.


Getting Embeddings via Opensolr API

Generate vector embeddings for any text using this endpoint:

function postEmbeddingRequest($email, $api_key, $core_name, $payload) {
    $apiUrl = "https://api.opensolr.com/solr_manager/api/embed";
    $postFields = http_build_query([
        'email'      => $email,
        'api_key'    => $api_key,
        'index_name' => $core_name,
        'payload'    => is_array($payload) ? json_encode($payload) : $payload
    ]);

    $ch = curl_init($apiUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postFields,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/x-www-form-urlencoded'],
        CURLOPT_TIMEOUT        => 30,
    ]);

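    $response = curl_exec($ch);
    if ($response === false) {
        // Network, TLS or timeout failure: surface curl's error message
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("Embedding request failed: $error");
    }
    curl_close($ch);
    return json_decode($response, true);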
}

The response includes the vector embedding array you can pass directly to Solr.
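A Python equivalent of the PHP helper above, using only the standard library. It sends the same four form fields to the same endpoint; the structure of the JSON response is not documented here, so the sketch just returns it decoded. Credentials and index name are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

EMBED_URL = "https://api.opensolr.com/solr_manager/api/embed"


def build_embed_fields(email, api_key, core_name, payload):
    """Form fields, mirroring the PHP helper: lists/dicts are JSON-encoded."""
    if isinstance(payload, (list, dict)):
        payload = json.dumps(payload)
    return {
        "email": email,
        "api_key": api_key,
        "index_name": core_name,
        "payload": payload,
    }


def post_embedding_request(email, api_key, core_name, payload, timeout=30):
    """POST the fields as application/x-www-form-urlencoded; return decoded JSON."""
    body = urlencode(build_embed_fields(email, api_key, core_name, payload)).encode()
    req = Request(
        EMBED_URL,
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```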


Code Examples

PHP

<?php
$url = 'https://de9.solrcluster.com/solr/vector/select?wt=json';
$params = [
    'q'            => '{!bool should=$lexicalQuery should=$vectorQuery}',
    'lexicalQuery' => '{!edismax qf=content}climate disasters',
    'vectorQuery'  => '{!knn f=embeddings topK=50}[0.12,0.43,0.56,0.77]' // placeholder: a real query vector needs 1024 values
];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERPWD, '123:123');
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($params));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

echo $response;

Python

import requests
from requests.auth import HTTPBasicAuth

url = "https://de9.solrcluster.com/solr/vector/select"
params = {
    'q': '{!bool should=$lexicalQuery should=$vectorQuery}',
    'lexicalQuery': '{!edismax qf=content}climate disasters',
    'vectorQuery': '{!knn f=embeddings topK=50}[0.12,0.43,0.56,0.77]',  # placeholder: a real query vector needs 1024 values
    'wt': 'json'
}

response = requests.post(url, data=params, auth=HTTPBasicAuth('123', '123'))
response.raise_for_status()  # fail loudly on HTTP errors (e.g. a 400 from Solr)
print(response.json())

JavaScript (AJAX)

<script>
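// Let URLSearchParams percent-encode the {!knn ...} local params and the vector.
// Placeholder vector: a real query vector needs 1024 values.
const params = new URLSearchParams({
    wt: 'json',
    q: '{!knn f=embeddings topK=10}[0.11,0.22,0.33]'
});
fetch('https://de9.solrcluster.com/solr/vector/select?' + params, {
    headers: { 'Authorization': 'Basic ' + btoa('123:123') }
})
.then(r => r.json())
.then(console.log);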
</script>

Quick Reference

  • Adjust topK to control how many similar results to retrieve (usually 20-100).
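  • Use {!bool should=...} (union) for softer relevance mixing: a document can qualify through either the lexical or the vector side, and the scores of the sides it matches are added. Use must to make a side mandatory.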
  • For best hybrid results, always combine both lexical and vector queries.
  • All demo search pages include built-in query inspector, debugQuery, crawl stats, and search analytics.
Ready to Add AI Search to Your Site?
Get a fully managed vector search engine with AI Hints and Document Reader — set up in minutes.
Custom Plans Available

This is a premium feature available on custom plans tailored to your needs and budget. For small websites, we can even provide these features for free after validating your use case. Contact us at support@opensolr.com to discuss your requirements.


Search Tuning — Per-Index Relevancy Controls


Search Tuning is a panel inside the Index Settings of every Opensolr index that lets you shape how results are ranked for that one index, with no code changes and no re-indexing. Every setting is read at query time, so the very next search you run after moving a slider already uses the new value.

This page documents the controls exactly as they appear in the Opensolr admin panel today, with the real defaults from config/search_defaults.php and the live UI in tools_new.php. If you are a Drupal or WordPress user, the same settings exist in your CMS module / plugin, with slightly different per-CMS defaults; see the note at the bottom of this page.

Where to find it

Open your index in the dashboard, then hit the Search Tuning section in the Index Settings sidebar:

https://opensolr.com/admin/solr_manager/tools/YOUR_INDEX_NAME

Every change is saved automatically (debounced ~400 ms). There is no Save button.

The controls, in the order they appear in the UI

The panel shows different controls depending on whether your index is vector-enabled or pure lexical. Vector-only controls are clearly noted below.

What “vector-enabled” means — and what it costs

A vector-enabled index has a 1024-dimension dense-vector field (embeddings) in its schema, populated at index time by our GPU-backed embedding service (E5-large-instruct, multilingual). That vector field is what unlocks the AI / hybrid capabilities of Opensolr search:

  • Hybrid Search — lexical (BM25) and semantic (KNN) combined inside {!bool} with the four hybrid search modes described below (Union / Keywords Required / Meaning Required / Intersection).
  • Pure Vector / Semantic Search — KNN-only mode with the Semantic ↔ Lexical Balance slider pushed all the way left.
  • AI Hints — the streaming AI summary panel above search results.
  • AI Reader — the full-screen AI-generated reading-mode for any single document.

None of these features are included in the standard Opensolr hosting plans. Standard plans give you full lexical search, faceting, autocomplete, spellcheck, query elevation, multilingual support, and all the rest — out of the box, no add-ons. Vector and AI features run on dedicated GPU infrastructure (LLM + embedding model) which is significantly more expensive to operate than a CPU-only Solr cluster, so they live on custom AI-enabled plans tailored per customer based on expected query volume, embedding budget, and feature mix.

If your index name ends in __dense (e.g. tdr__dense, fluke__dense), it is already on a vector-enabled plan and the AI controls in this panel are live. If your index name does not end in __dense and is not one of the legacy whitelisted vector cores, the Semantic ↔ Lexical Balance, Search Mode and Vector Candidate Pool controls will not appear in the UI: the index is lexical-only.

To enable AI / vector search on an existing index, or to create a new __dense index for it, contact support@opensolr.com for a quote. See also the Hybrid Search architecture overview and the platform docs page on AI & Vector Search.

Defaults at a glance

Control | Default | Range | Vector only?
--- | --- | --- | ---
Field weight: Title | 0.10 | 0.01 – 1.0 | no
Field weight: Description | 0.05 | 0.01 – 1.0 | no
Field weight: URI | 0.01 | 0.01 – 1.0 | no
Field weight: Text | 0.01 | 0.01 – 1.0 | no
Field weight: LD Text (text_t) | 0.01 | 0.01 – 1.0 | no
Semantic ↔ Lexical Balance | 0.30 | 0.10 – 1.00 (10% – 100%) | yes
Minimum Match | Flexible | Flexible / Balanced / Strict / Custom | no
Search Mode | Union | Union / KW Required / Meaning Required / Intersection / Lexical Only | yes
Vector Candidate Pool (topK) | 200 | 10 – 1000 | yes
Content Quality Boost | 0 (off) | 0 – 1.0 | no
Minimum Relevance Score | 0 (off) | 0 – 1.0 | no
Results Per Page | 10 | 10 – 200 | no

Defaults are read from addons/default/modules/solr_manager/config/search_defaults.php on the Opensolr platform. A NULL value in your index configuration means "use the system default" — clicking the per-control Reset button restores that NULL.

1. Field Weights

Five inputs in a single row: Title, Description, URI, Text, LD Text. Each value is the multiplier applied to that field in the underlying Solr edismax qf / pf parameter when the lexical side of the query runs.

Why the defaults are so small (0.01–0.1): they are deliberately kept low so the lexical score stays in the same magnitude as the KNN cosine similarity score (0–1) when the two are combined inside {!bool}. If you push them much higher, lexical wins everything and semantic relevance becomes meaningless. If you push them lower, semantic wins everything and exact-keyword matches lose. Stick within 0.01–1.0 and tune relative to each other rather than absolutely.

The fifth field (LD Text): the text_t Solr field is the cleaner, structured-text version of text, populated from JSON-LD that the crawler extracted at index time. It usually has less boilerplate / nav / footer noise than text, so giving it a slightly higher weight (e.g. 0.03) often improves ranking on content-heavy sites.

Phrase boosting (Solr pf, pf2, pf3) is automatic on top of qf at 0.8x / 0.4x / 0.2x of the same field weights respectively. This rewards results where the query terms appear close together as a phrase or bigram/trigram. There is no UI for it because the multipliers track the field weights you set.
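A sketch of how the five weights could turn into edismax parameters, with pf / pf2 / pf3 at 0.8x / 0.4x / 0.2x of the field weights as described. The Solr field names title, description, uri and text are assumptions (text_t is the documented LD Text field), and the helper names are hypothetical:

```python
PHRASE_MULTIPLIERS = {"pf": 0.8, "pf2": 0.4, "pf3": 0.2}

# Shipped defaults from the table above (field names partly assumed).
DEFAULT_WEIGHTS = {
    "title": 0.10,
    "description": 0.05,
    "uri": 0.01,
    "text": 0.01,
    "text_t": 0.01,
}


def boost_list(weights, multiplier=1.0):
    """Render {'title': 0.1, ...} in Solr's 'title^0.1 ...' boost syntax."""
    return " ".join(
        f"{field}^{round(w * multiplier, 4):g}" for field, w in weights.items()
    )


def edismax_field_params(weights=DEFAULT_WEIGHTS):
    """qf from the raw weights; pf/pf2/pf3 scaled from the same weights."""
    params = {"qf": boost_list(weights)}
    for name, mult in PHRASE_MULTIPLIERS.items():
        params[name] = boost_list(weights, mult)
    return params


for name, value in edismax_field_params().items():
    print(f"{name}={value}")
```

Rounding to four decimals keeps float noise such as 0.08000000000000002 out of the Solr parameter strings.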

2. Semantic ↔ Lexical Balance (vector indexes only)

A slider 10% ↔ 100% mapping to a lexical_weight value of 0.10 to 1.00 (default 0.30). It controls how much the lexical (BM25) score contributes to the final ranking versus the vector (KNN) score.

Internally we apply a non-linear reshape: pow(BM25 + 1, lexical_weight) - 1. That means:

  • 1.00 — full BM25, lexical dominates. Best for product/SKU/identifier search.
  • 0.50 — sqrt-like compression of BM25, gentle blend.
  • 0.30 — default, vector takes the lead, lexical plays a supporting role. Good general-purpose value for content sites.
  • 0.10 — lexical heavily compressed, vector dominates. Best for natural-language / question-style queries.

Named-entity matches (people, places, brands) keep their dominance at higher slider values; conceptual / ambiguous queries benefit from lower values.
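The reshape is easy to check numerically. A minimal sketch of the formula above: at a lexical_weight of 0.5, a BM25 score of 3 becomes exactly 1.0 and a score of 15 becomes 3.0, while at 1.0 the BM25 score passes through unchanged.

```python
def reshape_bm25(bm25: float, lexical_weight: float) -> float:
    """pow(BM25 + 1, lexical_weight) - 1, as described above."""
    return (bm25 + 1.0) ** lexical_weight - 1.0


# Lower weights compress large BM25 scores much harder than small ones,
# pulling lexical scores toward the 0-1 range of cosine similarity.
for w in (1.0, 0.5, 0.3, 0.1):
    print(w, [round(reshape_bm25(s, w), 2) for s in (0.0, 3.0, 15.0)])
```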

3. Search Mode (vector indexes only)

Five radio buttons that decide how the lexical query and the vector KNN query are combined inside the {!bool} query. Default is Union.

Mode | {!bool} structure | When to use it
--- | --- | ---
Union (default) | should + should | Broadest. Either signal surfaces a doc. Best for general site search.
Keywords Required | must + should | Doc must match keywords; semantic adds ranking boost. Best for product / part-number / brand search.
Meaning Required | should + must | Doc must be semantically relevant; keywords add ranking boost. Best for natural-language / Q&A queries.
Intersection | must + must | Both signals required. Most precise, most restrictive. Use when you want very tight relevance.
Lexical Only | no vector | Skips the embedding API entirely. Pure BM25. Use when you want deterministic keyword behaviour, the embedding service is slow, or you want a fallback that doesn't depend on vector availability. Automatically zeros the Minimum Relevance Score.
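The table maps directly onto the top-level q parameter. A sketch, assuming the $lexicalQuery / $vectorQuery parameter names used in the API examples on this page; the mode keys are hypothetical identifiers for the UI labels:

```python
# (lexical clause, vector clause) for each hybrid mode, as in the table above.
BOOL_MODES = {
    "union": ("should", "should"),
    "keywords_required": ("must", "should"),
    "meaning_required": ("should", "must"),
    "intersection": ("must", "must"),
}


def hybrid_q(mode: str):
    """Top-level q for a search mode; None means plain edismax, no {!knn}."""
    if mode == "lexical_only":
        return None
    lexical_clause, vector_clause = BOOL_MODES[mode]
    return f"{{!bool {lexical_clause}=$lexicalQuery {vector_clause}=$vectorQuery}}"


for mode in (*BOOL_MODES, "lexical_only"):
    print(mode, hybrid_q(mode))
```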

4. Minimum Match (mm)

Controls how many of the user's typed query terms must appear in a document for it to qualify as a match. Default preset: Flexible. Three presets plus a Custom option for raw Solr mm syntax.

Preset | mm value | Behaviour
--- | --- | ---
Flexible (default) | 2<65% 4<50% 8<40% | Lenient. 1–2 term queries match all terms; 3–4 terms need 65%; 5–8 terms need 50%; 9+ terms need 40%. Best for broad, conversational searches.
Balanced | 2<90% 5<75% 8<60% 12<50% | Proven middle ground. 1–2 terms match all; 3–5 terms need 90%; 6–8 need 75%; 9–12 need 60%; 13+ need 50%.
Strict | 2<95% 5<90% 8<80% | Precise. Almost all terms must match: 1–2 terms all; 3–5 terms 95%; 6–8 terms 90%; 9+ terms still 80%.
Custom | your own | Any valid Solr mm value: positive integer (3), negative integer (-2), percentage (75%), negative percentage (-25%), or tiered (2<90% 5<75%).

Tiered syntax: N<M% means that when the query has more than N terms, M% of them must match. Tiers chain: 2<90% 5<75% means 1–2 term queries require all terms to match, 3–5 term queries need 90%, and 6+ need 75%. Solr rounds a positive percentage down, so 90% of 3 terms is 2.
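To see how a given mm value plays out for a specific query length, here is a simplified Python re-implementation of Solr's mm arithmetic (integers, percentages, negatives and tiers, with percentages truncated toward zero). It is a sketch for experimenting, not Solr's own code:

```python
import re


def min_should_match(clauses: int, spec: str) -> int:
    """How many of `clauses` optional terms must match for the mm value `spec`."""
    spec = re.sub(r"\s*<\s*", "<", spec.strip())  # accept "2 < 90%" too
    if "<" in spec:
        # Tiered: each "N<X" applies once there are more than N clauses.
        result = clauses  # at or below the first bound, all clauses required
        for tier in spec.split():
            bound, tier_spec = tier.split("<", 1)
            if clauses <= int(bound):
                return result
            result = min_should_match(clauses, tier_spec)
        return result
    if spec.endswith("%"):
        calc = clauses * int(spec[:-1]) / 100
        # int() truncates toward zero: 2.7 -> 2, -0.75 -> 0
        result = clauses + int(calc) if calc < 0 else int(calc)
    else:
        n = int(spec)
        result = clauses + n if n < 0 else n  # negative means "all but n"
    return max(0, min(result, clauses))


for terms in (2, 3, 5, 6, 9):
    print(terms, min_should_match(terms, "2<90% 5<75%"))
```

For the Flexible preset, min_should_match(9, "2<65% 4<50% 8<40%") returns 3: the 40% tier applies, and 40% of 9 terms is 3.6, rounded down.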

5. Vector Candidate Pool (topK) (vector indexes only)

Slider 10 – 1000 (default 200). Maps directly to the KNN topK parameter: {!knn f=embeddings topK=200}[query_vector]. It is the size of the candidate pool the vector side considers before the {!bool} combiner ranks across both sides.

Higher = broader semantic recall (more documents considered) but slower per-query latency and more facet-count noise; lower = faster and tighter, but you may lose semantically relevant docs that didn't make the topK cut. 200 works for most indexes up to ~1M docs. Consider 300–500 for very large indexes where you observe missing semantic matches; consider 50–100 if your index is small and you want maximum speed.

6. Content Quality Boost

Slider 0 – 1.0 (default 0 / off). When > 0, an additive boost is applied via bf=linear(quality_f, weight, 0) on the edismax side. The quality_f Solr field is computed at indexing time based on title length, description length, body text length, and presence of an og:image.

Effect: rich pages (long descriptions, body text, images) are pushed up; thin/stub pages (one-line titles, no body) drop. Useful when your index mixes substantial articles with low-effort pages and you want depth to win.

Try 0.3–0.5 for sites mixing detailed articles with stubs. Leave at 0 if your content is reasonably uniform — the boost just becomes noise.

7. Minimum Relevance Score

Slider 0 – 1.0 (default 0 / off). Applied as a Solr frange post-filter on the combined {!bool}(lexical+vector) score: fq={!frange l=X cache=false cost=200}query($q). Drops results whose final score falls below the threshold.

Why it exists: the lexical side of the query is wrapped in {!func} which means it scores every doc (zero for non-lexical hits), and KNN topK always returns N docs even when cosine similarity is near-zero. Without this filter, numFound inflates to the size of your base filter set and facet counts get polluted by noise docs.

  • 0 — off, all candidates returned. numFound may be huge. Default for backwards compatibility.
  • 0.1 – 0.3 — recommended sweet spot. Kills pure noise (sqrt(BM25+1)-1 easily clears 0.2; cosine >= 0.2 also clears it).
  • 0.4+ — increasingly strict. Use when results feel off-topic and you want only highly relevant docs.

Auto-zeroed when Lexical Only mode is selected — the threshold is calibrated for the combined hybrid score and would filter all results in pure keyword mode.
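Each of the two controls above translates into a single Solr parameter. A hypothetical helper producing the strings from the Content Quality Boost and Minimum Relevance Score sections, including the auto-zeroing in Lexical Only mode; where the bf ends up (for example referenced as bf=$bf inside the edismax lexicalQuery) is an assumption:

```python
def score_shaping_params(quality_boost=0.0, min_relevance=0.0, lexical_only=False):
    """Extra Solr params for Content Quality Boost and Minimum Relevance Score.

    A value of 0 means "off", so the parameter is simply not sent.
    """
    params = {}
    if quality_boost > 0:
        # Additive boost: linear(x, m, c) = m*x + c over the quality_f field
        params["bf"] = f"linear(quality_f,{quality_boost:g},0)"
    if lexical_only:
        min_relevance = 0.0  # threshold is calibrated for the hybrid score only
    if min_relevance > 0:
        # cache=false plus cost >= 100 makes frange run as a post-filter
        params["fq"] = f"{{!frange l={min_relevance:g} cache=false cost=200}}query($q)"
    return params


print(score_shaping_params(quality_boost=0.3, min_relevance=0.2))
```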

8. Results Per Page

Slider 10 – 200 (default 10). Number of results the search UI returns per page by default. Applies to both the hosted Search UI at search.opensolr.com/INDEX and to API responses when no explicit rows param is sent.

Higher values give you more results per request but increase response size and bandwidth. 10 is a UI-friendly default. 20–30 works well for grid layouts. 50+ for list-style result pages or when you do client-side filtering across the whole result set.

Quick recipes

Site type | Recommended settings
--- | ---
News / blog (surface fresh content fast) | Min Match: Flexible. Search Mode: Union. Lexical balance ~0.3. Quality Boost 0.3 (penalises thin posts). Min Relevance Score 0.2 (cuts noise from broad queries).
Knowledge base / docs (natural-language Q&A) | Min Match: Flexible. Search Mode: Meaning Required. Lexical balance 0.2. Bump Text and LD Text weights to 0.05–0.10. Quality Boost 0.3. Min Relevance Score 0.25.
E-commerce / product catalog (exact matches matter) | Min Match: Strict. Search Mode: Keywords Required. Lexical balance 0.6–0.8. Title weight 0.2–0.3 (product names live in titles). Min Relevance Score 0.3.
Internal site search / navigation (find the page they typed) | Min Match: Balanced. Search Mode: Lexical Only (no embedding round-trip; faster). Lexical balance 0.7 (inert while Lexical Only is selected; it applies if you switch back to a hybrid mode). URI weight 0.05 (URL slugs help on nav-style queries).
Mixed-content portal (the system defaults) | Don't change anything. The shipped defaults (Flexible / Union / 0.30 lexical / topK 200) are tuned for general-purpose hybrid search.

How it works under the hood

  1. You move a slider. JS calls POST /admin/solr_manager/save_search_tuning/{INDEX} with the field name and the new value. CSRF-protected. Debounced ~400 ms.
  2. The value is stored on your index row in default_solr_cores.search_* (e.g. search_lexical_weight, search_mm, search_bool_mode). NULL means “use the system default from search_defaults.php”.
  3. On the next search request, Hybrid_search.php reads your row, merges it on top of search_defaults.php, and constructs the final Solr query. No restart, no re-index.
  4. Reset sends a special null sentinel that nulls the column, dropping the override entirely.
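Step 3's "merge on top of the defaults", with NULL meaning "use the default", can be sketched as follows. The column names search_lexical_weight, search_mm and search_bool_mode come from step 2; search_topk and the stored value "union" are hypothetical, and the default values come from the Defaults at a glance table:

```python
# Hypothetical subset of the system defaults.
SYSTEM_DEFAULTS = {
    "search_lexical_weight": 0.30,
    "search_mm": "2<65% 4<50% 8<40%",
    "search_bool_mode": "union",
    "search_topk": 200,
}


def effective_settings(index_row, defaults=SYSTEM_DEFAULTS):
    """Per-index values win; NULL (None) or a missing column uses the default."""
    merged = {}
    for key, default in defaults.items():
        value = index_row.get(key)  # a missing column behaves like NULL
        merged[key] = default if value is None else value
    return merged


# Reset is equivalent to setting the column back to None.
print(effective_settings({"search_lexical_weight": 0.6, "search_mm": None}))
```

Checking `is None` rather than truthiness matters: an explicit override of 0 (for example a quality boost switched off) must not fall back to the default.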

Reset behaviour

Every control has a small Reset button next to it that restores that one setting to the system default (it nulls the corresponding DB column on your index). At the bottom of the panel a Reset All to Defaults button does the same for every control at once.

CMS module / plugin equivalents

The same controls exist in the official Drupal module and WordPress plugin, but they are stored at the CMS layer and have slightly different shipped defaults so the modules feel right out of the box for those CMSes:

If you operate the index directly via the Opensolr admin (not through a CMS module), this page (FAQ #202) is the canonical reference and the values you see in the admin UI match the table above.

Frequently asked

Do I need to re-index after changing tuning settings?
No. Tuning is read at query time. The next search uses your new values.

Why is the Semantic ↔ Lexical Balance slider missing on my index?
It only appears for vector-enabled indexes — those with a dense-vector field in the schema (typically the __dense suffix or one of the legacy whitelisted vector cores). Lexical-only indexes don't have a vector side to balance against.

What's the difference between Content Quality Boost and Minimum Relevance Score?
Quality Boost rewards rich docs (additive, in the score). Minimum Relevance Score filters out low-score docs (post-filter, before they reach the response). They're independent and you can use both together.

If I switch to Lexical Only, what happens to the vector / topK / balance settings?
They become inert. The query stops calling the embedding API entirely — no {!knn} clause is generated, no {!bool} combiner runs, just plain edismax with your field weights and mm. Min Relevance Score is auto-zeroed because its threshold is calibrated for the combined hybrid score.

Can I version / export / restore my tuning settings?
The values live on the default_solr_cores row for your index. You can read them via the Get Index API and re-apply them via the same admin save endpoint. There is no built-in version history yet.

The Freshness Window control I used to see is gone — where did it go?
Freshness Window and Default Search Mode controls were removed from the UI in April 2026 because users now set their own fresh= filter and date facets directly in the search UI. The backend settings are still readable for backwards compatibility but no longer surfaced in the admin panel.


Hybrid Search in Opensolr: A Modern Approach

Hybrid Search in Apache Solr: Modern Power, Classic Roots

The Evolution of Search: From Keywords to Vectors 🔍➡️🧠

Important Prerequisites

First make sure you have this embeddings field in your schema.xml (works with):
<!--VECTORS-->
<field name="embeddings" type="vector" indexed="true" stored="true" multiValued="false" required="false" />
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" similarityFunction="cosine"/>

⚠️ Pay very close attention to vectorDimension: it must match the dimension of the embeddings your embedding model produces. If you use the Opensolr Index Embedding API, it must be exactly 1024, because the Opensolr Embed API endpoint uses the BAAI/bge-m3 embedding model.


Opensolr also supports the native Solr Schema API, so you can instead run these two commands, in order, to add the field type and the field to your schema:
$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema -H 'Content-type:application/json' -d '{
  "add-field-type": {
    "name": "vector",
    "class": "solr.DenseVectorField",
    "vectorDimension": 1024,
    "similarityFunction": "cosine"
  }
}'

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema -H 'Content-type:application/json' -d '{
  "add-field": {
    "name": "embeddings",
    "type": "vector",
    "indexed": true,
    "stored": false,
    "multiValued": false,
    "required": false
  }
}'

Set "stored" to true if you want to see the vectors for debugging. The vector dimension and similarity function are properties of the vector field type, so adjust vectorDimension there to match your embedding model.

Second, make sure you have this in solrconfig.xml, for the atomic updates used by the Opensolr Index Embedding API:
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
      
        <updateLog>
          <int name="numVersionBuckets">65536</int>
          <int name="maxNumLogsToKeep">10</int>
          <int name="numRecordsToKeep">10</int>
        </updateLog>

.....

</updateHandler>

Already on Opensolr? Web Crawler indexes come with hybrid search ready out of the box — the embeddings field, BGE-m3 vectors, and the hybrid query pipeline are all pre-configured. No schema edits, no embedding setup. The manual setup above is for custom Opensolr indexes or self-hosted Solr.


Why Vector Search Isn’t a Silver Bullet ⚠️

As much as we love innovation, vector search still has a few quirks:

  • Mystery Rankings: Why did document B leapfrog document A? Sometimes, it’s anyone’s guess. 🕳️
  • Chunky Business: Embedding models are picky eaters—they work best with just the right size of text chunks.
  • Keyword Nostalgia: Many users still expect the comfort of exact matches. “Where’s my keyword?” they ask. (Fair question!)

Hybrid Search: The Best of Both Worlds 🤝

Hybrid search bridges the gap—combining trusty keyword (lexical) search with smart vector (neural) search for results that are both sharp and relevant.

How It Works

  1. Double the Fun: Run a classic keyword query and a KNN vector search at the same time, creating two candidate lists.
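  2. Clever Combining: Merge the two lists and rank the result (in Solr, with the Boolean Query Parser or by reranking one list with the other) for maximum “aha!” moments.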

Tuning the Balance: On Opensolr, Search Tuning gives vector-enabled indexes a Semantic ↔ Lexical Balance slider (10% to 100%, default 30%; higher values give keyword scoring more weight). The system also adapts dynamically based on query length: short queries lean keyword, longer queries lean semantic.


Apache Solr Does Hybrid Search (Despite the Rumors) 💡

Contrary to the grapevine, Solr can absolutely do hybrid search—even if the docs are a little shy about it. If your schema mixes traditional fields with a solr.DenseVectorField, you’re all set.


Candidate Selection: Boolean Query Parser to the Rescue 🦸‍♂️

Solr’s Boolean Query Parser lets you mix and match candidate sets with flair:

Union Example

q={!bool should=$lexicalQuery should=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]

Result: All unique hits from both searches. No duplicates, more to love! ❤️

Intersection Example

q={!bool must=$lexicalQuery must=$vectorQuery}&
lexicalQuery={!type=edismax qf=text_field}term1&
vectorQuery={!knn f=vector topK=10}[0.001, -0.422, -0.284, ...]

Result: Only the most relevant docs—where both worlds collide. 🤝
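In Lucene's BooleanQuery (which {!bool} builds), a matching document's score is the sum of the scores of the clauses it matched. A toy Python model of the two examples above, with made-up scores standing in for the edismax and KNN results:

```python
def combine(lexical: dict, vector: dict, mode: str = "union") -> list:
    """Mimic {!bool}: should + should = union, must + must = intersection.

    A document's score is the sum of the scores of the clauses it matched.
    """
    if mode == "union":
        ids = lexical.keys() | vector.keys()
    elif mode == "intersection":
        ids = lexical.keys() & vector.keys()
    else:
        raise ValueError(f"unknown mode: {mode}")
    scored = {d: lexical.get(d, 0.0) + vector.get(d, 0.0) for d in ids}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


lexical_hits = {"a": 2.0, "b": 1.0}   # hypothetical edismax scores
vector_hits = {"b": 0.5, "c": 0.25}   # hypothetical KNN similarity scores

print(combine(lexical_hits, vector_hits, "union"))
print(combine(lexical_hits, vector_hits, "intersection"))
```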


You also have to be mindful of the Solr version you are using: in our tests this only worked on Solr 9.0. Beware: it did not work on Solr 9.6, where only the reranking query (shown below) worked.

Here are all the parameters we sent to Solr to make this hybrid search work on Solr 9.0:

Classic Solr edismax search combined with dense vector search (INTERSECTION: the q below uses must + must; use should + should for a UNION)

{
  "mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
  "df":"title",
  "ps":"3",
  "bf":"recip(rord(timestamp),1,1500,500)^90",
  "fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
  "start":"0",
  "fq":"+content_type:text*",
  "rows":"100",
  "vectorQuery":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
  "q":"{!bool must=$lexicalQuery must=$vectorQuery}",
  "qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
  "pf":"title^15 description^7 uri^9",
  "lexicalQuery":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
  "pf3":"text^5",
  "pf2":"tdescription^6"
}

Solr 9.6 reranking query (it also works on Solr 9.0):

{
  "mm":"1<100% 2<70% 3<45% 5<30% 7<20% 10<10%",
  "df":"title",
  "ps":"3",
  "bf":"recip(rord(timestamp),1,1500,500)^90",
  "fl":"score,meta_file_modification_date*,score,og_image,id,uri,description,title,meta_icon,content_type,creation_date,timestamp,meta_robots,content_type,meta_domain,meta_*,text",
  "start":"0",
  "fq":"+content_type:text*",
  "rows":"100",
  "q":"{!knn f=embeddings topK=100}[-0.024160323664546,...,0.031963128596544]",
  "rqq":"{!edismax qf=$qf bf=$bf ps=$ps pf=$pf pf2=$pf2 pf3=$pf3 mm=$mm}trump tariffs",
  "qf":"title^10 description^5 uri^3 text^2 phonetic_title^0.1",
  "pf":"title^15 description^7 uri^9",
  "pf3":"text^5",
  "pf2":"tdescription^6",
  "rq":"{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}"
}
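The rerank variant works differently: the KNN query alone selects and orders the candidates, then {!rerank} rescores the top reRankDocs of them. A simplified model of Solr's default additive rerank (score = original + reRankWeight × reRankQuery score for documents the reRankQuery matches), with made-up scores:

```python
def rerank(first_pass, rerank_scores, rerank_docs=100, rerank_weight=3.0):
    """first_pass: (doc_id, score) pairs from the main query, best first.
    rerank_scores: doc_id -> reRankQuery score, for docs that match it."""
    window, tail = first_pass[:rerank_docs], first_pass[rerank_docs:]
    # Inside the window: original score + weight * reRankQuery score (0 if no match)
    rescored = [
        (doc, score + rerank_weight * rerank_scores.get(doc, 0.0))
        for doc, score in window
    ]
    rescored.sort(key=lambda item: item[1], reverse=True)
    # Outside the window: untouched, kept below the reranked block
    return rescored + list(tail)


knn_hits = [("a", 0.9), ("b", 0.8), ("c", 0.7)]  # hypothetical KNN results
edismax_scores = {"c": 0.5}                      # only "c" matches the keywords

print(rerank(knn_hits, edismax_scores, rerank_docs=3))
```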

A few remarks:

🎹 This is based on the classic Opensolr Web Crawler index, which does most of its work within the fields title, description, text and uri.

📰 Index is populated with data crawled from various public news websites.

🔗 We embedded a concatenation of title, description and the first 50 sentences of text.

💼 We use the Opensolr Query Embed API, to embed our query at search-time.

🏃🏻‍♂️ You can see this search in action, here.

👩🏻‍💻 You can also see the Solr data and run your own queries on it. This index's Solr API is here.

📦 For content the crawler can't reach, the Data Ingestion API lets you push documents via REST — each document automatically gets BGE-m3 embeddings, sentiment analysis, and language detection.

🔐 Credentials are: Username: 123 / Password: 123 -> Enjoy! 🥳


Cheat Sheet

🤥 Below is a cheat sheet of the parameters and where they belong when you run knn queries. Solr is very picky about what goes with knn and what doesn't. For example, with the Union query we were unable to use highlighting. But if you follow the specs below, you probably won't get any "Query can not be null" Solr errors... (or will you? 🤭)


What Belongs Inside {!edismax} in lexicalQuery? 🧾

Parameter | Inside lexicalQuery? | Why
--- | --- | ---
q (the query text) | ✅ YES | Required for the subquery to function; pass it as v=$qtext or as text after the closing brace
qf, pf, bf, bq, mm, ps | ✅ YES | All edismax features must go inside
defType | ❌ NO | Already defined by {!edismax}
hl, spellcheck, facet, rows, start, sort | ❌ NO | These are top-level Solr request features

💡 Hybrid Query Cheat Sheet

Here’s how to do it right when you want all the bells and whistles (highlighting, spellcheck, deep edismax):

# TOP-LEVEL BOOLEAN QUERY COMPOSING EDISMAX AND KNN
q={!bool should=$lexicalQuery should=$vectorQuery}

# LEXICAL QUERY: ALL YOUR EDISMAX STUFF GOES HERE
&lexicalQuery={!edismax v=$qtext qf=$qf pf=$pf mm=$mm bf=$bf}

# VECTOR QUERY
&vectorQuery={!knn f=vectorField topK=10}[0.123, -0.456, ...]

# EDISMAX PARAMS
&qtext=flying machine
&qf=title^6 description^3 text^2 uri^4
&pf=text^10
&mm=1<100% 2<75% 3<50% 6<30%
&bf=recip(ms(NOW,publish_date),3.16e-11,1,1)

# NON-QUERY STUFF
&hl=true
&hl.fl=text
&hl.q={!edismax qf=$qf v=$qtext}
&spellcheck=true
&spellcheck.q=flying machine
&rows=20
&start=0
&sort=score desc

In Summary

Hybrid search gives you the sharp accuracy of keywords and the deep smarts of vectors—all in one system. With Solr, you can have classic reliability and modern magic. 🍦✨

"Why choose between classic and cutting-edge, when you can have both? Double-scoop your search!"


Opensolr: Hybrid Search Without the Complexity

Everything above — schema fields, embedding pipelines, boolean query composition, reranking — is what you wire up manually on your own Solr. On Opensolr, the hard parts are handled for you:

  • Web Crawler Indexes — Hybrid search works out of the box. The crawler automatically generates 1024-dim BGE-m3 embeddings for every page it crawls. No schema setup, no embedding code. Point it at a URL and you have a hybrid search engine. (Web Crawler)

  • One-Click Embeddings — Have an existing index? The Embedding API generates vectors for every document in your index with one call. No external model hosting, no batch scripts. (Index Embedding API)

  • Search Tuning — A visual slider controls the keyword vs semantic balance per index (Semantic ↔ Lexical Balance, 10% to 100%; higher values give keyword BM25 scoring more weight). The system also adapts dynamically based on query length. No config files, instant effect. (Search Tuning)

  • Query Elevation — When the hybrid algorithm ranks something wrong for a specific query, pin the right document to the top or exclude the wrong one. Instant, no reindexing. (Query Elevation)

  • Click Analytics — See which hybrid results users actually click. High impressions with low CTR means the keyword/semantic balance needs adjusting — or the result needs pinning. (Click Analytics)

  • Data Ingestion API — Push documents from databases, APIs, or internal systems. Every document gets automatic BGE-m3 embeddings, sentiment analysis, and language detection — same enrichment pipeline as the crawler. (Data Ingestion API)

  • EDisMax + Vectors — The lexicalQuery examples in this guide use EDisMax for the keyword side. Same parameters (qf, pf, mm, ps), same behavior — Opensolr just removes the infrastructure work.

  • Error Audit — Hybrid queries can fail silently (wrong vector dimensions, missing fields, malformed KNN syntax). Error Audit captures every Solr error from the last 7 days, parsed and searchable — so you catch these before users do.

The hybrid search concepts in this guide apply directly on Opensolr — same Solr, same query syntax. The difference is you skip the embedding infrastructure, the schema plumbing, and the scoring guesswork.

Happy hybrid searching! 🥳
