Solr API

Querying the Solr API: Search Parameters Explained

What is the Solr /select API?

Apache Solr exposes a powerful search API at the /select endpoint. This is how you query your index — you send an HTTP request with search parameters, and Solr returns matching documents as JSON.

Your Opensolr Index has its own URL (found in your Index Control Panel), and you query it like this:

https://YOUR_INDEX_HOST/solr/YOUR_INDEX_NAME/select?q=your+search+terms&wt=json

Your index is protected by HTTP Basic Authentication over HTTPS. The default username is opensolr and the password is your Secret API Key (found in your Control Panel Dashboard). You can change these credentials in the Security section.
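
As a rough sketch of what such a request might look like from JavaScript, the helper below assembles the URL and the Basic Authentication header. The function name buildSelectRequest and the placeholder YOUR_SECRET_API_KEY are illustrative assumptions, not part of any Opensolr library; substitute your own host, index name and credentials.

```javascript
// Build the URL and headers for a /select request.
// host, index, user and apiKey are placeholders you supply yourself.
function buildSelectRequest(host, index, params, user, apiKey) {
    var search = new URLSearchParams();
    Object.keys(params).forEach(function (name) {
        // Arrays become repeated parameters (e.g. several fq values).
        [].concat(params[name]).forEach(function (value) {
            search.append(name, value);
        });
    });
    return {
        url: 'https://' + host + '/solr/' + encodeURIComponent(index) +
            '/select?' + search.toString(),
        headers: {
            // HTTP Basic Authentication: base64 of "user:password".
            Authorization: 'Basic ' + btoa(user + ':' + apiKey)
        }
    };
}

var request = buildSelectRequest(
    'YOUR_INDEX_HOST', 'YOUR_INDEX_NAME',
    { q: 'opensolr documentation', wt: 'json', rows: 10 },
    'opensolr', 'YOUR_SECRET_API_KEY'
);
// Send it with, for example:
// fetch(request.url, { headers: request.headers })
//     .then(function (response) { return response.json(); });
```

If this runs in a browser, the key is visible to every visitor, so you may prefer to issue the request from your own server.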


🔍 The Solr Query Inspector: Your Cheat Sheet

Here is the best part: you do not need to figure out all these parameters from scratch.

The Opensolr Search UI has a built-in Solr Query Inspector that shows you the exact parameters used for every search. Here is how to use it:

  1. Go to your search UI at https://search.opensolr.com/YOUR_INDEX_NAME
  2. Type a search query and hit enter
  3. Look for the magnifying glass icon in the bottom-right corner of the page
  4. Click it to open the Solr Query Inspector modal
  5. You will see a table listing every single parameter sent to Solr, with their values
  6. Click the "Copy Params" button to copy all parameters to your clipboard

You can then paste these parameters directly into your application. This is the fastest way to get a working hybrid search — just copy what the built-in UI does and adapt it.


Essential Parameters

Here are the most important parameters for querying your Web Crawler index:

Basic Search

Parameter  Description                                            Example
q          The search query                                       q=opensolr documentation
df         Default search field (if q does not specify a field)   df=title
rows       Number of results to return                            rows=10
start      Offset for pagination (0-based)                        start=20 (page 3 with rows=10)
wt         Response format                                        wt=json
fl         Fields to return (comma-separated)                     fl=id,uri,title,description,og_image,score
sort       Sort order                                             sort=creation_date desc or sort=score desc

Filter Queries (fq)

Filter queries narrow results without affecting the relevancy score. They are also cached by Solr, making repeated filtering very fast.

fq=content_type:text*              // Only text content types (e.g. text/html)
fq=meta_domain:yoursite.com        // Only from a specific domain
fq=meta_detected_language:en       // Only English pages
fq=creation_date:[NOW-7DAY TO *]   // Only from the last 7 days
fq=-uri_s:*/admin*                 // Exclude admin pages

You can use multiple fq parameters — they are AND-ed together.
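
One way to send several fq values from JavaScript, sketched with the standard URLSearchParams class (the filter values are simply the examples from above):

```javascript
var params = new URLSearchParams({ q: 'opensolr documentation', wt: 'json' });

// append() keeps earlier values, so each call adds one more fq parameter.
['meta_domain:yoursite.com', 'meta_detected_language:en', '-uri_s:*/admin*']
    .forEach(function (filter) {
        params.append('fq', filter);
    });

// URL-encoded query string, ready to append after "/select?".
var queryString = params.toString();
```

Using set() instead of append() would overwrite the previous fq, leaving only the last filter.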


🧠 Hybrid Search: Combining Keywords and AI

This is where Opensolr really shines. The hybrid search combines traditional keyword matching (lexical search) with AI-powered semantic search (vector similarity) in a single query.

The formula looks like this:

q={!func}sum(
    product(VECTOR_WEIGHT, query($vectorQuery)),
    product(LEXICAL_WEIGHT, div(log(sum(1, query($lexicalQuery))), sum(log(sum(1, query($lexicalQuery))), 20)))
)

Do not panic — here is what it means in plain English:

  • vectorQuery — Searches the vector embeddings field using KNN (K-Nearest Neighbors). Finds pages that are semantically similar to your query, even if they do not share the same exact words.
  • lexicalQuery — Traditional keyword search using Solr's eDisMax parser. Finds pages that contain your exact search terms, with intelligent weighting.
  • The formula combines both scores — vector similarity gets direct weight, while the lexical score is dampened with a logarithmic function to prevent keyword-stuffed pages from dominating.
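
To get a feel for the numbers, the formula can be reproduced outside Solr. This is only a sketch: the function names are made up for illustration, the weights and input scores are placeholders, and it relies on the fact that Solr's log() function is base 10.

```javascript
// Solr's log() function query is base 10, so Math.log10 mirrors it.
// Dampening: log(1 + s) / (log(1 + s) + 20)
function dampenLexical(lexicalScore) {
    var logged = Math.log10(1 + lexicalScore);
    return logged / (logged + 20);
}

// sum(product(VECTOR_WEIGHT, vector), product(LEXICAL_WEIGHT, dampened lexical))
function hybridScore(vectorScore, lexicalScore, vectorWeight, lexicalWeight) {
    return vectorWeight * vectorScore +
        lexicalWeight * dampenLexical(lexicalScore);
}

// Placeholder inputs: similarity 0.8, eDisMax score 9, both weights 1.
var example = hybridScore(0.8, 9, 1, 1);
```

The dampened lexical term grows very slowly (a raw score of 9 gives about 0.048, a raw score of 99 about 0.091), so LEXICAL_WEIGHT has to be fairly large before keyword matches weigh as much as vector similarity.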

The Sub-Queries

Vector Query:

vectorQuery={!knn f=embeddings topK=250}[embedding_vector_here]

This tells Solr to find the 250 nearest documents in the vector space. The embedding vector is generated from your search query text using the same AI model that was used to embed the indexed pages.

Lexical Query:

lexicalQuery={!edismax
    qf="title^5 description^4 uri^0.5 text^0.01"
    pf="title^10 description^8 uri^1 text^0.02"
    pf2="title^5 description^4 uri^0.5 text^0.01"
    pf3="title^2.5 description^2 uri^0.25 text^0.005"
    ps=0 ps2=1 ps3=2
    mm="2<-1 5<-2"
}your search terms here

This is the keyword search component. The field weights are:

  • title (5x): Most important. With body text weighted at 0.01, a title match counts far more than a body-text match.
  • description (4x): Second most important.
  • uri (0.5x): URL matches get a small boost.
  • text (0.01x): Body text matches get minimal direct weight (phrase matches via pf, pf2 and pf3 add a little more).

The pf / pf2 / pf3 parameters boost exact phrases, bigrams, and trigrams respectively, which means results that match your query as a complete phrase rank higher than results that just happen to contain the same words scattered around.

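mm (minimum match) controls how many of your search terms must appear. Each N<-K condition applies only when the query has more than N terms: with 1 or 2 terms, every term must match; with 3 to 5 terms, all but one must match; with 6 or more terms, all but two must match.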
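
The mm rule can be checked with a few lines of code. The sketch below handles only the "N<-K" integer form used in this query (not percentages), assumes the conditions are listed in ascending order, and follows Solr's rule that a condition applies only when the number of terms is greater than N. The function name requiredMatches is an illustrative assumption.

```javascript
// Return how many terms must match for a given mm spec and term count.
function requiredMatches(mmSpec, clauseCount) {
    var required = clauseCount; // up to the first threshold, all must match
    mmSpec.trim().split(/\s+/).forEach(function (condition) {
        var parts = condition.split('<');
        var threshold = parseInt(parts[0], 10);
        var adjustment = parseInt(parts[1], 10);
        // A condition applies only when there are MORE terms than its threshold.
        if (clauseCount > threshold) {
            // Negative values mean "all but K"; positive values are absolute.
            required = adjustment < 0 ? clauseCount + adjustment : adjustment;
        }
    });
    return required;
}

[1, 2, 3, 5, 6, 8].forEach(function (terms) {
    console.log(terms + ' terms: ' + requiredMatches('2<-1 5<-2', terms) + ' must match');
});
```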


Highlighting

To get search term highlighting in your results (bold matching words in snippets):

hl=true
hl.fl=uri,title,description,text
hl.method=unified
hl.fragsize=200
hl.snippets=1
hl.tag.pre=<em>
hl.tag.post=</em>

The highlighted text is returned in a separate highlighting section of the JSON response, keyed by document id.


Spellcheck

To get "Did you mean...?" suggestions:

spellcheck=true
spellcheck.q=your search terms
spellcheck.count=5
spellcheck.collate=true
spellcheck.maxCollationTries=15
spellcheck.maxCollations=3

The suggestions appear in the spellcheck section of the response.


Faceting

To get counts of results grouped by field values (e.g., how many results per language):

facet=true
facet.field=meta_detected_language
facet.field=currency_s
facet.mincount=1
facet.sort=index

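This returns a facet_counts section with value-count pairs. With facet.sort=index the values come back in alphabetical order; use facet.sort=count to list the most frequent values first.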


🚫 Excluding Pages from Results

You can exclude certain pages or URL patterns from your search results using filter queries. This is useful for hiding taxonomy pages, tag pages, admin pages, or other non-content URLs.

In the Opensolr Control Panel, you can configure these exclusions in the search.xml configuration file. They are added as fq (filter query) parameters in the appends section, which means they are automatically applied to every search query:

<lst name="appends">
    <str name="fq">-uri_s:*/taxonomy*</str>
    <str name="fq">-uri_s:*/admin*</str>
    <str name="fq">-uri_s:*/tag/*</str>
    <str name="fq">-uri_s:*/store-locator*</str>
</lst>

The - prefix means "exclude". The uri_s field is the exact string version of the URL, and * is a wildcard. So -uri_s:*/taxonomy* means "exclude any URL containing /taxonomy".

You can also exclude by other fields:

<str name="fq">-og_image:"https://example.com/default-placeholder.jpg"</str>

This is a powerful way to clean up your search results without re-crawling.

Tip: You can also apply these exclusions dynamically in your application by adding fq parameters to your API queries, without modifying the search.xml config.

Understanding the Solr JSON Response

When you query your Opensolr Web Crawler index via the /select API, you get back a JSON object. This page explains every section of that response so you know exactly how to parse it in your application.


The Basic Response Structure

Here is a simplified example of what comes back when you search:

{
  "responseHeader": {
    "status": 0,
    "QTime": 12
  },
  "response": {
    "numFound": 1542,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      { ... },
      { ... }
    ]
  },
  "highlighting": {
    "doc_id_1": { ... },
    "doc_id_2": { ... }
  },
  "facet_counts": {
    "facet_fields": { ... }
  },
  "spellcheck": {
    "suggestions": [ ... ],
    "collations": [ ... ]
  }
}

responseHeader

"responseHeader": {
  "status": 0,
  "QTime": 12
}

Field   Description
status  0 means success. Any other value means an error occurred.
QTime   Query execution time in milliseconds. This is how long Solr took to process your query (not including network time). A good search returns in under 50ms.

response

This is the main section containing your search results.

"response": {
  "numFound": 1542,
  "start": 0,
  "numFoundExact": true,
  "docs": [ ... ]
}

Field          Description
numFound       Total number of documents matching your query. This is the number you display as "Showing results 1-10 of 1,542".
start          The offset you requested (from the start parameter).
numFoundExact  Whether numFound is exact (true) or an approximation. Almost always true.
docs           Array of document objects (your actual search results).

A Document Object

Each item in the docs array looks like this:

{
  "id": "bc434cee9e50b48f80095c9f49bdeae3",
  "uri": "https://yoursite.com/blog/my-article",
  "title": "My Article Title",
  "description": "A short description of the article...",
  "text": "The full body text of the page...",
  "og_image": "https://yoursite.com/images/article-thumb.jpg",
  "meta_icon": "https://yoursite.com/favicon.ico",
  "meta_domain": "yoursite.com",
  "meta_detected_language": "en",
  "content_type": "text/html",
  "creation_date": "2026-02-20T14:30:00Z",
  "timestamp": 1771684200,
  "sent_pos": 0.153,
  "sent_neu": 0.752,
  "sent_neg": 0.095,
  "sent_com": 0.9856,
  "score": 1.8234,
  "price_f": 29.99,
  "currency_s": "USD"
}

The fields you get back depend on what you specified in the fl (field list) parameter. The score field is special — it is computed at query time and represents the relevancy score for that document.

Tip: Only request the fields you actually need. Requesting text (full body text) for every result increases response size significantly. If you only display titles and snippets, use fl=id,uri,title,description,og_image,score.


highlighting

If you set hl=true in your query, you get a highlighting section keyed by document ID:

"highlighting": {
  "bc434cee9e50b48f80095c9f49bdeae3": {
    "title": ["My <em>Article</em> Title"],
    "description": ["A short description about <em>article</em> writing..."],
    "text": ["...the main body discusses <em>article</em> formatting and..."]
  },
  "another_doc_id": {
    "title": ["Another <em>Article</em>"]
  }
}

How to use it in your code:

  1. Loop through your docs array
  2. For each doc, look up highlighting[doc.id]
  3. If a highlighted version exists for a field, use it instead of the raw field value
  4. The <em> tags wrap the matching search terms; style them with CSS (bold, background color, etc.)

// Example: get the best title to display
var docId = doc.id;
var hl = data.highlighting[docId] || {};
var displayTitle = (hl.title && hl.title[0]) ? hl.title[0] : doc.title;

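Important: Highlighted text contains HTML (<em> tags), so it has to be inserted with innerHTML rather than textContent. Solr HTML-escapes the rest of the snippet only when hl.encoder=html is set; with the default encoder the stored text comes back unescaped, so confirm that setting (or sanitize the snippet yourself) before using innerHTML.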
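
If you would rather not depend on how the index is configured, a defensive option is to escape the whole snippet yourself and then re-enable only the highlight tags. This is a sketch that assumes Solr returned the snippet without escaping (its default encoder) and that the tags are <em> and </em>; if the snippet is already escaped, this would escape it a second time. The function names are illustrative.

```javascript
// Escape the five HTML-significant characters.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Escape everything, then turn only the <em> and </em> tags back into markup.
function safeSnippet(snippet) {
    return escapeHtml(snippet)
        .replace(/&lt;em&gt;/g, '<em>')
        .replace(/&lt;\/em&gt;/g, '</em>');
}

// Pick the highlighted value when present, otherwise the raw field value.
function displayField(doc, highlighting, field) {
    var hl = (highlighting && highlighting[doc.id]) || {};
    var value = (hl[field] && hl[field][0]) ? hl[field][0] : (doc[field] || '');
    return safeSnippet(value);
}
```

The result of displayField(doc, data.highlighting, 'title') can then be assigned to innerHTML, since any other markup in the text has been neutralized.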


facet_counts

If you set facet=true, you get a facet_counts section:

"facet_counts": {
  "facet_fields": {
    "meta_detected_language": [
      "en", 834,
      "de", 245,
      "fr", 123,
      "es", 89,
      "nl", 42
    ],
    "currency_s": [
      "USD", 156,
      "EUR", 89
    ]
  }
}

The facet arrays alternate between value and count: ["en", 834, "de", 245, ...] means 834 results in English, 245 in German, etc.

Parsing facets in JavaScript:

var langFacets = data.facet_counts.facet_fields.meta_detected_language;
var facets = [];
for (var i = 0; i < langFacets.length; i += 2) {
    facets.push({ value: langFacets[i], count: langFacets[i+1] });
}
// facets = [{value: "en", count: 834}, {value: "de", count: 245}, ...]

You can then render these as clickable filter buttons or a sidebar. When the user clicks "English (834)", add fq=meta_detected_language:en to your query to filter.
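
A minimal sketch of the click handling, assuming the facet field is a plain string field. The helper names facetFilter and toggleFilter are made up for illustration. Quoting the value keeps entries that contain spaces or colons from being parsed as query syntax.

```javascript
// Build an fq value such as meta_detected_language:"en".
// Backslashes and double quotes inside the value are escaped.
function facetFilter(field, value) {
    var escaped = String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
    return field + ':"' + escaped + '"';
}

// Add the filter if it is not active yet, remove it if it is.
function toggleFilter(activeFilters, filter) {
    return activeFilters.indexOf(filter) === -1
        ? activeFilters.concat([filter])
        : activeFilters.filter(function (f) { return f !== filter; });
}

var active = toggleFilter([], facetFilter('meta_detected_language', 'en'));
// Each entry in "active" is then sent as its own fq parameter.
```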


spellcheck

If you enabled spellcheck, you get suggestions for misspelled words:

"spellcheck": {
  "suggestions": [
    "opnesolr", {
      "numFound": 1,
      "suggestion": ["opensolr"]
    }
  ],
  "collations": [
    "collation", "opensolr search"
  ]
}

The collations section is the most useful — it gives you a corrected version of the entire query that you can show as "Did you mean: opensolr search?".

Parsing collations:

// Fall back to an empty array when spellcheck or its collations are missing
var collations = (data.spellcheck && data.spellcheck.collations) || [];
if (collations.length >= 2) {
    var suggestion = collations[1]; // The corrected query string
    // Display: "Did you mean: <a>suggestion</a>?"
}

Error Responses

If something goes wrong, the response looks like this:

{
  "responseHeader": {
    "status": 400
  },
  "error": {
    "msg": "undefined field: nonexistent_field",
    "code": 400
  }
}

Always check responseHeader.status — if it is not 0, read the error.msg for details. Common errors:

  • 400 — Bad query syntax, undefined field, or invalid parameter value
  • 500 — Server error (rare, usually a misconfigured query)
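
One possible way to centralize this check is a small guard that every parsed response passes through. The function name assertSolrOk is an illustrative assumption, not an existing API.

```javascript
// Throw when the parsed Solr response reports a non-zero status.
function assertSolrOk(data) {
    var status = data && data.responseHeader
        ? data.responseHeader.status
        : undefined;
    if (status !== 0) {
        var message = (data && data.error && data.error.msg) || 'Unknown Solr error';
        var err = new Error('Solr error ' + status + ': ' + message);
        err.code = data && data.error ? data.error.code : undefined;
        throw err;
    }
    return data;
}

// Usage: var docs = assertSolrOk(data).response.docs;
```

Solr also sets the HTTP status code on errors, so checking the HTTP response status before parsing the body is a reasonable first line of defense.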

stats

If you requested stats on a numeric field (e.g., stats=true&stats.field=price_f):

"stats": {
  "stats_fields": {
    "price_f": {
      "min": 4.99,
      "max": 299.99,
      "count": 156,
      "missing": 1386,
      "sum": 12450.50,
      "mean": 79.81
    }
  }
}

This is useful for displaying price ranges or other numeric summaries in your UI.

Pagination, Sorting & Freshness Boosting

This page covers three essential features for any search UI: paginating through results, sorting in different orders, and boosting recent content so fresh pages appear first.


Pagination

Solr uses offset-based pagination with two parameters:

Parameter  Description                 Default
rows       Number of results per page  10
start      Zero-based offset           0

How It Works

  • Page 1: start=0&rows=10 → results 1-10
  • Page 2: start=10&rows=10 → results 11-20
  • Page 3: start=20&rows=10 → results 21-30
  • Page N: start=(N-1)*rows&rows=10

The total number of results is in response.numFound. Use it to calculate the number of pages:

var totalResults = data.response.numFound;
var resultsPerPage = 10;
var totalPages = Math.ceil(totalResults / resultsPerPage);
var currentPage = Math.floor(data.response.start / resultsPerPage) + 1;

Building Page Links

function buildPageUrl(page) {
    var start = (page - 1) * resultsPerPage;
    return '/search?q=' + encodeURIComponent(query) + '&start=' + start + '&rows=' + resultsPerPage;
}

// Show: « 1 2 3 4 5 ... 155 »  (e.g. 1,542 results at 10 per page)

Performance Tip

Do not use very large start values (e.g., start=10000). Deep pagination is expensive for Solr. If you need to go beyond a few hundred pages, consider using cursorMark pagination instead:

sort=score desc,id asc&cursorMark=*

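The response includes a nextCursorMark value that you pass back as cursorMark in the next request. Keep start at 0 (or omit it) and keep the unique key (id here) in the sort. When nextCursorMark equals the value you sent, you have reached the end of the results. This is much more efficient for deep pagination.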
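
The cursor loop can be sketched as follows. fetchPage is a hypothetical function you supply: it takes a parameter object, performs the /select request, and resolves to the parsed JSON. maxPages is a safety limit added for this example.

```javascript
// Collect documents page by page using cursorMark.
async function fetchAllWithCursor(fetchPage, baseParams, maxPages) {
    var docs = [];
    var cursorMark = '*'; // "*" starts a new cursor
    for (var page = 0; page < maxPages; page++) {
        var params = Object.assign({}, baseParams, {
            sort: 'score desc,id asc', // the sort must include the unique key
            cursorMark: cursorMark
        });
        var data = await fetchPage(params);
        docs = docs.concat(data.response.docs);
        // An unchanged cursor means there are no more results.
        if (data.nextCursorMark === cursorMark) {
            break;
        }
        cursorMark = data.nextCursorMark;
    }
    return docs;
}
```

For a user-facing "next page" button you would normally store nextCursorMark and request one page at a time rather than looping.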


Sorting

By default, results are sorted by relevancy score (most relevant first). You can override this with the sort parameter.

Common Sort Options

sort=score desc                    // By relevancy (default)
sort=creation_date desc            // Newest first
sort=creation_date asc             // Oldest first
sort=title_s asc                   // Alphabetical by title
sort=price_f asc                   // Cheapest first
sort=price_f desc                  // Most expensive first
sort=sent_com desc                 // Most positive sentiment first
sort=score desc,creation_date desc // By relevancy, then by date for ties

Combining Sort with Score

You can sort by multiple fields. If two documents have the same score, the second sort field breaks the tie:

sort=score desc, creation_date desc

This gives you relevancy-sorted results where equally relevant results show the newer one first.

Important: To sort by a field, it must be single-valued and either indexed or have docValues=true in the schema (docValues is the more efficient option). All the metadata, date, and numeric fields in the Web Crawler schema have docValues enabled, so you can sort by any of them.


⏱️ Freshness Boosting

Freshness boosting makes recent content rank higher in search results without completely overriding relevancy. A page published yesterday about your search topic will rank higher than a page from 3 years ago about the same topic — but a highly relevant old page can still outrank a barely relevant new one.

How to Enable Freshness

Add a boost function (bf) parameter to your eDisMax query:

bf=recip(ms(NOW,creation_date),3.16e-11,1,1)

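This is a reciprocal function, recip(x,m,a,b) = a / (m*x + b), where x is the document's age in milliseconds and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year. The boost is largest for brand-new documents and decays slowly with age:

  • Published today → about 1.0
  • Published last week → about 0.98
  • Published last month → about 0.92
  • Published last year → about 0.5
  • Published 5 years ago → about 0.17

Because bf is added to the relevancy score, how much this matters depends on how large your other scores are.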
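
To see the actual boost values, the function can be evaluated outside Solr. The sketch below mirrors recip(x,m,a,b) = a / (m*x + b) with the constants from the bf above; the name freshnessBoost is illustrative.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// recip(ms(NOW,creation_date), 3.16e-11, 1, 1) for a document of a given age.
function freshnessBoost(ageInDays) {
    var ageInMs = ageInDays * MS_PER_DAY;
    return 1 / (3.16e-11 * ageInMs + 1);
}

[0, 7, 30, 365, 1825].forEach(function (days) {
    console.log(days + ' days old: ' + freshnessBoost(days).toFixed(3));
});
```

A one-year-old page receives roughly half the boost of a page published today, and a five-year-old page roughly one sixth. Raising the second argument makes the decay steeper.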

How Freshness Works in Hybrid Search

In the Opensolr hybrid search, freshness can be applied via the bf parameter inside the lexical query component. The Opensolr Search UI automatically applies freshness boosting when the user selects a freshness filter:

Filter               What It Does
fresh=yes            Applies a general freshness boost (recent pages rank higher)
fresh=today          Strongly boosts pages published today
fresh=previous_week  Boosts pages from the past 7 days
fresh=no             No freshness boost; pure relevancy sorting

Date Range Filtering vs. Freshness Boosting

These are two different things:

  • Date range filter (fq) — Hard cutoff. Pages outside the range are completely excluded:

    fq=creation_date:[NOW-30DAY TO *]
    

    This shows ONLY pages from the last 30 days. Older pages are gone.

  • Freshness boost (bf) — Soft preference. Recent pages rank higher, but old pages still appear if they are highly relevant.

For most search UIs, freshness boosting is better than hard filtering because you do not accidentally hide great content just because it is old.


Putting It All Together

Here is a complete query combining pagination, sorting, and freshness (shown unencoded and split across lines for readability; URL-encode the values in a real request):

/select?
  q=your search terms
  &defType=edismax
  &qf=title^5 description^4 text^1
  &bf=recip(ms(NOW,creation_date),3.16e-11,1,1)
  &rows=10
  &start=20
  &sort=score desc
  &fl=id,uri,title,description,og_image,creation_date,score
  &wt=json

This searches with relevancy + freshness boost, returns page 3 (results 21-30), and includes the fields you need for rendering.
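
In a real request the spaces, carets and parentheses need URL encoding. One way to produce the encoded form is the standard URLSearchParams class; the variable names here are illustrative.

```javascript
var page = 3;
var rows = 10;

var params = new URLSearchParams({
    q: 'your search terms',
    defType: 'edismax',
    qf: 'title^5 description^4 text^1',
    bf: 'recip(ms(NOW,creation_date),3.16e-11,1,1)',
    rows: String(rows),
    start: String((page - 1) * rows), // page 3 -> start=20
    sort: 'score desc',
    fl: 'id,uri,title,description,og_image,creation_date,score',
    wt: 'json'
});

// Encoded path and query string, to be appended to your index URL.
var selectPath = '/select?' + params.toString();
```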
