Querying the Solr API: Search Parameters Explained

What is the Solr /select API?

Apache Solr exposes a powerful search API at the /select endpoint. This is how you query your index — you send an HTTP request with search parameters, and Solr returns matching documents as JSON.

Your Opensolr Index has its own URL (found in your Index Control Panel), and you query it like this:

https://YOUR_INDEX_HOST/solr/YOUR_INDEX_NAME/select?q=your+search+terms&wt=json

Your index is protected by HTTP Basic Authentication over HTTPS. The default username is opensolr and the password is your Secret API Key (found in your Control Panel Dashboard). You can change these credentials in the Security section.
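
As a minimal sketch, the request above can be assembled with Python's standard library. The host, index name, and key are placeholders, and build_select_request is a hypothetical helper name, not part of any Opensolr client library:

```python
import base64
import urllib.parse
import urllib.request


def build_select_request(host, index, username, password, params):
    """Build a GET request for /select carrying an HTTP Basic Auth header."""
    query = urllib.parse.urlencode(params)
    url = f"https://{host}/solr/{index}/select?{query}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder values: substitute your own host, index name, and Secret API Key.
request = build_select_request(
    "YOUR_INDEX_HOST", "YOUR_INDEX_NAME", "opensolr", "YOUR_SECRET_API_KEY",
    {"q": "your search terms", "wt": "json"},
)
print(request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen and parse the response body with json.load.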


🔍 The Solr Query Inspector: Your Cheat Sheet

Here is the best part: you do not need to figure out all these parameters from scratch.

The Opensolr Search UI has a built-in Solr Query Inspector that shows you the exact parameters used for every search. Here is how to use it:

  1. Go to your search UI at https://search.opensolr.com/YOUR_INDEX_NAME
  2. Type a search query and hit enter
  3. Look for the magnifying glass icon in the bottom-right corner of the page
  4. Click it to open the Solr Query Inspector modal
  5. You will see a table listing every single parameter sent to Solr, with their values
  6. Click the "Copy Params" button to copy all parameters to your clipboard

You can then paste these parameters directly into your application. This is the fastest way to get a working hybrid search — just copy what the built-in UI does and adapt it.


Essential Parameters

Here are the most important parameters for querying your Web Crawler index:

Basic Search

Parameter  Description                                           Example
q          The search query                                      q=opensolr documentation
df         Default search field (if q does not specify a field)  df=title
rows       Number of results to return                           rows=10
start      Offset for pagination (0-based)                       start=20 (page 3 with rows=10)
wt         Response format                                       wt=json
fl         Fields to return (comma-separated)                    fl=id,uri,title,description,og_image,score
sort       Sort order                                            sort=creation_date desc or sort=score desc
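
Because start is a 0-based offset rather than a page number, applications usually convert between the two. A small sketch, where page_to_start is a hypothetical helper name:

```python
def page_to_start(page, rows=10):
    """Convert a 1-based page number into Solr's 0-based start offset."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * rows


# Page 3 with 10 results per page begins at offset 20.
params = {"q": "opensolr documentation", "rows": 10, "start": page_to_start(3)}
print(params["start"])
```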

Filter Queries (fq)

Filter queries narrow results without affecting the relevancy score. They are also cached by Solr, making repeated filtering very fast.

fq=content_type:text*              // Only text content types (e.g. text/html)
fq=meta_domain:yoursite.com        // Only from a specific domain
fq=meta_detected_language:en       // Only English pages
fq=creation_date:[NOW-7DAY TO *]   // Only from the last 7 days
fq=-uri_s:*/admin*                 // Exclude admin pages

You can use multiple fq parameters — they are AND-ed together.
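
When building the query string in code, repeated fq parameters are easiest to express as a list of pairs, since a plain dictionary cannot hold the same key twice. A sketch using Python's standard library, with filter values taken from the examples above:

```python
from urllib.parse import parse_qs, urlencode

# A list of (name, value) pairs keeps every fq; urlencode also escapes
# characters such as ':', '[', and spaces.
params = [
    ("q", "opensolr documentation"),
    ("fq", "meta_detected_language:en"),
    ("fq", "creation_date:[NOW-7DAY TO *]"),
    ("wt", "json"),
]
query_string = urlencode(params)
print(query_string)
```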


🧠 Hybrid Search: Combining Keywords and AI

This is where Opensolr really shines. The hybrid search combines traditional keyword matching (lexical search) with AI-powered semantic search (vector similarity) in a single query.

The formula looks like this:

q={!func}sum(
    product(VECTOR_WEIGHT, query($vectorQuery)),
    product(LEXICAL_WEIGHT, div(log(sum(1, query($lexicalQuery))), sum(log(sum(1, query($lexicalQuery))), 20)))
)

Do not panic — here is what it means in plain English:

  • vectorQuery — Searches the vector embeddings field using KNN (K-Nearest Neighbors). Finds pages that are semantically similar to your query, even if they do not share the same exact words.
  • lexicalQuery — Traditional keyword search using Solr's eDisMax parser. Finds pages that contain your exact search terms, with intelligent weighting.
  • The formula combines both scores — vector similarity gets direct weight, while the lexical score is dampened with a logarithmic function to prevent keyword-stuffed pages from dominating.
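
To see how the two parts behave numerically, the formula can be mirrored in a few lines of Python. This is only a sketch of the arithmetic, not how Solr evaluates it internally: the weights of 1.0 and the sample scores are illustrative assumptions, and math.log10 is used because Solr's log() function is base 10.

```python
import math


def hybrid_score(vector_score, lexical_score, vector_weight, lexical_weight):
    """Weighted vector score plus a log-dampened, saturating lexical term."""
    log_lex = math.log10(1 + lexical_score)   # Solr's log() is base 10
    lexical_part = log_lex / (log_lex + 20)   # always stays below 1
    return vector_weight * vector_score + lexical_weight * lexical_part


# Illustrative values only: a vector similarity of 0.8 and a raw lexical score of 99.
print(hybrid_score(0.8, 99, vector_weight=1.0, lexical_weight=1.0))
```

With these numbers the lexical term contributes 2 / 22, about 0.09, so even a large raw keyword score adds only a modest amount on top of the vector similarity.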

The Sub-Queries

Vector Query:

vectorQuery={!knn f=embeddings topK=250}[embedding_vector_here]

This tells Solr to find the 250 nearest documents in the vector space. The embedding vector is generated from your search query text using the same AI model that was used to embed the indexed pages.
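
A sketch of formatting an embedding into the vectorQuery string. The function name knn_query and the three-number vector are illustrative assumptions; a real vector comes from your embedding model and must have the same number of dimensions as the indexed field:

```python
def knn_query(vector, field="embeddings", top_k=250):
    """Format an embedding as a {!knn} local-params query string."""
    values = ",".join(repr(float(v)) for v in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


# Toy vector for illustration only.
print(knn_query([0.12, -0.5, 0.33]))
# {!knn f=embeddings topK=250}[0.12,-0.5,0.33]
```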

Lexical Query:

lexicalQuery={!edismax
    qf="title^5 description^4 uri^0.5 text^0.01"
    pf="title^10 description^8 uri^1 text^0.02"
    pf2="title^5 description^4 uri^0.5 text^0.01"
    pf3="title^2.5 description^2 uri^0.25 text^0.005"
    ps=0 ps2=1 ps3=2
    mm="2<-1 5<-2"
}your search terms here

This is the keyword search component. The field weights are:

  • title (5x): Most important. With text at 0.01x, a title match carries 500 times the boost of a body text match.
  • description (4x): Second most important.
  • uri (0.5x): URL matches get a small boost.
  • text (0.01x): Body text matches get minimal direct weight (but phrase matches in pf boost them).

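The pf / pf2 / pf3 parameters boost documents in which the whole query, adjacent word pairs (bigrams), and adjacent word triples (trigrams) appear as phrases, respectively. Results that match your query as a complete phrase therefore rank higher than results that just happen to contain the same words scattered around. The ps, ps2, and ps3 values set the slop for each of these boosts, that is, how many positions the words may be moved apart and still count as a phrase match.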

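mm (minimum match) controls how many of your search terms must appear. In Solr's syntax, each N<X condition applies when the query has more than N terms, so with "2<-1 5<-2": for queries with 1-2 terms, all must match; for 3-5 terms, all but one must match; for 6 or more terms, all but two must match.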
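
As a sketch of how Solr's documented mm syntax reads the value 2<-1 5<-2 (each N<X condition applies only when the query has more than N optional clauses), the thresholds can be written out as a small function. required_matches is a hypothetical helper for illustration, not a Solr API:

```python
def required_matches(num_terms):
    """Minimum number of terms that must match under mm="2<-1 5<-2"."""
    if num_terms <= 2:
        return num_terms        # 1-2 terms: all are required
    if num_terms <= 5:
        return num_terms - 1    # 3-5 terms: all but one
    return num_terms - 2        # 6 or more terms: all but two


for n in (1, 2, 3, 5, 6, 8):
    print(n, required_matches(n))
```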


Highlighting

To get search term highlighting in your results (bold matching words in snippets):

hl=true
hl.fl=uri,title,description,text
hl.method=unified
hl.fragsize=200
hl.snippets=1
hl.tag.pre=<em>
hl.tag.post=</em>

The highlighted text is returned in a separate highlighting section of the JSON response, keyed by document id.
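
Because the snippets live apart from the documents, client code has to join the two by id. A sketch under the assumption that the unique key field is named id; the sample dictionary is hand-written to resemble a Solr JSON response and is not real output:

```python
def attach_snippets(response, field="text"):
    """Pair each document id with its first highlighted snippet for `field`, if any."""
    highlighting = response.get("highlighting", {})
    results = []
    for doc in response["response"]["docs"]:
        snippets = highlighting.get(doc["id"], {}).get(field, [])
        results.append((doc["id"], snippets[0] if snippets else None))
    return results


# Hand-written sample shaped like a Solr JSON response (illustration only).
sample = {
    "response": {"docs": [{"id": "a1"}, {"id": "b2"}]},
    "highlighting": {"a1": {"text": ["Query your <em>index</em> over HTTP"]}, "b2": {}},
}
print(attach_snippets(sample))
```

Documents without a highlight for the requested field come back with None, so the caller can fall back to the plain description.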


Spellcheck

To get "Did you mean...?" suggestions:

spellcheck=true
spellcheck.q=your search terms
spellcheck.count=5
spellcheck.collate=true
spellcheck.maxCollationTries=15
spellcheck.maxCollations=3

The suggestions appear in the spellcheck section of the response.
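
A sketch of pulling the "Did you mean" rewrites out of that section. The exact layout of the spellcheck section varies with Solr version and response settings; this assumes collations arrive as a flat list alternating the name "collation" with its value, and that the value becomes an object with a collationQuery key when spellcheck.collateExtendedResults is enabled. Check it against a real response from your index before relying on it:

```python
def collation_queries(response):
    """Extract suggested query rewrites from spellcheck.collations."""
    collations = response.get("spellcheck", {}).get("collations", [])
    queries = []
    # Assumed layout: ["collation", value, "collation", value, ...]
    for name, value in zip(collations[0::2], collations[1::2]):
        if name != "collation":
            continue
        # Extended results wrap the rewritten query in an object.
        queries.append(value["collationQuery"] if isinstance(value, dict) else value)
    return queries


# Hand-written sample (illustration only, not real output).
sample = {"spellcheck": {"collations": ["collation", "opensolr documentation"]}}
print(collation_queries(sample))
```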


Faceting

To get counts of results grouped by field values (e.g., how many results per language):

facet=true
facet.field=meta_detected_language
facet.field=currency_s
facet.mincount=1
facet.sort=index

This returns a facet_counts section with value-count pairs.
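
With Solr's default JSON settings, each facet field is returned as a flat list that alternates values and counts. A sketch of converting it into a dictionary; the sample counts are invented for illustration:

```python
def facet_pairs(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"].get(field, [])
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample (illustration only, not real output).
sample = {
    "facet_counts": {
        "facet_fields": {"meta_detected_language": ["de", 12, "en", 340]}
    }
}
print(facet_pairs(sample, "meta_detected_language"))
# {'de': 12, 'en': 340}
```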


🚫 Excluding Pages from Results

You can exclude certain pages or URL patterns from your search results using filter queries. This is useful for hiding taxonomy pages, tag pages, admin pages, or other non-content URLs.

In the Opensolr Control Panel, you can configure these exclusions in the search.xml configuration file. They are added as fq (filter query) parameters in the appends section, which means they are automatically applied to every search query:

<lst name="appends">
    <str name="fq">-uri_s:*/taxonomy*</str>
    <str name="fq">-uri_s:*/admin*</str>
    <str name="fq">-uri_s:*/tag/*</str>
    <str name="fq">-uri_s:*/store-locator*</str>
</lst>

The - prefix means "exclude". The uri_s field is the exact string version of the URL, and * is a wildcard. So -uri_s:*/taxonomy* means "exclude any URL containing /taxonomy".

You can also exclude by other fields:

<str name="fq">-og_image:"https://example.com/default-placeholder.jpg"</str>

This is a powerful way to clean up your search results without re-crawling.

Tip: You can also apply these exclusions dynamically in your application by adding fq parameters to your API queries, without modifying the search.xml config.
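
A sketch of generating such exclusions in application code. exclusion_filters is a hypothetical helper; it assumes the fragments are simple URL path pieces, since characters that are special in Solr query syntax (spaces, colons, quotes) would need escaping first:

```python
def exclusion_filters(path_fragments, field="uri_s"):
    """Build one negative wildcard filter query per URL fragment to exclude."""
    return [f"-{field}:*{fragment}*" for fragment in path_fragments]


filters = exclusion_filters(["/taxonomy", "/admin", "/tag/"])
# Each filter becomes its own fq parameter on the request.
params = [("q", "your search terms")] + [("fq", f) for f in filters]
print(filters)
```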