Pagination, Sorting & Freshness Boosting

This page covers three essential features for any search UI: paginating through results, sorting in different orders, and boosting recent content so fresh pages appear first.


Pagination

Solr uses offset-based pagination with two parameters:

Parameter    Description                   Default
rows         Number of results per page    10
start        Zero-based offset             0

How It Works

  • Page 1: start=0&rows=10 → results 1-10
  • Page 2: start=10&rows=10 → results 11-20
  • Page 3: start=20&rows=10 → results 21-30
  • Page N: start=(N-1)*10&rows=10 → results (N-1)*10+1 to N*10
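The same mapping from page number to Solr parameters, as a minimal JavaScript sketch. `pageToParams` is a hypothetical helper name, pages are assumed to be numbered from 1, and `lastResult` is not clamped to numFound, so the final page may hold fewer results:

```javascript
// Hypothetical helper: convert a 1-based page number into Solr's start/rows
// parameters, plus the 1-based range of results that page covers.
function pageToParams(page, rows) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  var start = (page - 1) * rows;   // zero-based offset of the first result
  return {
    start: start,
    rows: rows,
    firstResult: start + 1,        // e.g. 21 for page 3 with rows=10
    lastResult: start + rows       // not clamped to numFound
  };
}

console.log(pageToParams(3, 10));
```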

The total number of results is in response.numFound. Use it to calculate the number of pages:

var totalResults = data.response.numFound;
var resultsPerPage = 10;
var totalPages = Math.ceil(totalResults / resultsPerPage);
var currentPage = Math.floor(data.response.start / resultsPerPage) + 1;

Building Page Links

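// Assumes `query` (the user's search string) and `resultsPerPage` are defined elsewhere.
function buildPageUrl(page) {
    var start = (page - 1) * resultsPerPage;   // zero-based offset for this page
    return '/search?q=' + encodeURIComponent(query) + '&start=' + start + '&rows=' + resultsPerPage;
}

// Show: « 1 2 3 4 5 ... 154 »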
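One way to produce a page strip like « 1 2 3 4 5 ... 154 » is sketched below. `buildPageList` and its `windowSize` argument are hypothetical, not part of the Opensolr API; the page numbers it returns could be passed to buildPageUrl to create the links.

```javascript
// Hypothetical helper: list the page links to display, e.g.
// [1, 2, 3, 4, 5, '...', 154]. Shows `windowSize` pages around the
// current page and always includes the first and last pages.
function buildPageList(currentPage, totalPages, windowSize) {
  var half = Math.floor(windowSize / 2);
  var from = Math.max(1, currentPage - half);
  var to = Math.min(totalPages, from + windowSize - 1);
  from = Math.max(1, to - windowSize + 1); // keep the window full near the end

  var pages = [];
  if (from > 1) {
    pages.push(1);
    if (from > 2) pages.push('...');       // gap between page 1 and the window
  }
  for (var p = from; p <= to; p++) pages.push(p);
  if (to < totalPages) {
    if (to < totalPages - 1) pages.push('...'); // gap before the last page
    pages.push(totalPages);
  }
  return pages;
}
```

For example, '« ' + buildPageList(1, 154, 5).join(' ') + ' »' gives « 1 2 3 4 5 ... 154 ». When there are no results (totalPages is 0) it returns an empty list.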

Performance Tip

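Avoid very large start values (e.g., start=10000). Deep pagination is expensive because Solr has to collect and sort the top start + rows matches on every request, so each page costs more than the one before it. If you need to go beyond a few hundred pages, consider using cursorMark pagination instead: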

sort=score desc,id asc&cursorMark=*

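The response includes a nextCursorMark value that you pass back as cursorMark in the next request. The sort must include the uniqueKey field (id asc above) as a tiebreaker, and start must be omitted or set to 0. When nextCursorMark comes back equal to the cursorMark you sent, there are no more results. This is much more efficient for deep pagination, though it only moves forward one page at a time and cannot jump to an arbitrary page number.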
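A minimal sketch of the cursorMark loop in JavaScript. `fetchAllWithCursor`, `makeFetchPage`, the rows value of 100 and the base URL you pass in are assumptions for illustration; the loop stops when Solr returns the same cursor it was sent, or after `maxPages` requests as a safety limit.

```javascript
// Minimal cursorMark loop. `fetchPage(cursorMark)` is a hypothetical function
// you supply that resolves to the parsed JSON response of one /select request.
async function fetchAllWithCursor(fetchPage, maxPages) {
  var cursorMark = '*';                 // '*' means "start from the beginning"
  var docs = [];
  for (var i = 0; i < maxPages; i++) {
    var data = await fetchPage(cursorMark);
    docs = docs.concat(data.response.docs);
    // Solr signals the end by returning the cursor it was given.
    if (data.nextCursorMark === cursorMark) break;
    cursorMark = data.nextCursorMark;
  }
  return docs;
}

// Example fetchPage built on the global fetch (browsers, Node 18+).
// baseUrl is a placeholder such as 'https://your-host/solr/your_index/select'.
function makeFetchPage(baseUrl, query) {
  return async function (cursorMark) {
    var params = new URLSearchParams({
      q: query,
      sort: 'score desc,id asc',        // must include the uniqueKey (id)
      rows: '100',
      cursorMark: cursorMark,           // no start parameter with cursors
      wt: 'json'
    });
    var res = await fetch(baseUrl + '?' + params.toString());
    if (!res.ok) throw new Error('Solr request failed: ' + res.status);
    return res.json();
  };
}
```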


Sorting

By default, results are sorted by relevancy score (most relevant first). You can override this with the sort parameter.

Common Sort Options

sort=score desc                    // By relevancy (default)
sort=creation_date desc            // Newest first
sort=creation_date asc             // Oldest first
sort=title_s asc                   // Alphabetical by title
sort=price_f asc                   // Cheapest first
sort=price_f desc                  // Most expensive first
sort=sent_com desc                 // Most positive sentiment first
sort=score desc,creation_date desc // By relevancy, then by date for ties
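In a search UI the sort parameter usually comes from a dropdown. A hedged sketch that maps hypothetical UI choices to the sort strings listed above and falls back to relevancy for anything unknown:

```javascript
// Hypothetical whitelist of UI sort choices -> Solr sort strings
// (field names taken from the list above).
var SORT_OPTIONS = {
  relevance: 'score desc',
  relevance_newest: 'score desc,creation_date desc',
  newest: 'creation_date desc',
  oldest: 'creation_date asc',
  title: 'title_s asc',
  price_low: 'price_f asc',
  price_high: 'price_f desc',
  positive: 'sent_com desc'
};

// Returns the URL-encoded sort parameter; unknown choices use relevancy.
function sortParam(choice) {
  var known = Object.prototype.hasOwnProperty.call(SORT_OPTIONS, choice);
  var sort = known ? SORT_OPTIONS[choice] : SORT_OPTIONS.relevance;
  return 'sort=' + encodeURIComponent(sort);
}
```

Whitelisting sort values this way means users cannot push arbitrary sort expressions through the URL.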

Combining Sort with Score

You can sort by multiple fields. If two documents have the same score, the second sort field breaks the tie:

sort=score desc, creation_date desc

This gives you relevancy-sorted results where equally relevant results show the newer one first.

Important: When sorting by a field, that field must be single-valued and should have docValues=true in the schema. All the metadata, date, and numeric fields in the Web Crawler schema have docValues enabled, so you can sort by any of them.


โฑ๏ธ Freshness Boosting

Freshness boosting makes recent content rank higher in search results without completely overriding relevancy. Given two pages that match your search equally well, the one published yesterday will rank above the one from 3 years ago, but a highly relevant old page can still outrank a barely relevant new one.

How to Enable Freshness

Add a boost function (bf) parameter to your eDisMax query:

bf=recip(ms(NOW,creation_date),3.16e-11,1,1)

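This is a reciprocal function: recip(x,m,a,b) computes a/(m*x+b), and here x = ms(NOW,creation_date) is the document's age in milliseconds. The constant 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, so with a=b=1 the boost is about 1/(1 + age in years). It is highest for brand-new documents and falls off gradually:

  • Published today → about 1.0 (the maximum)
  • Published one week ago → about 0.98
  • Published one month ago → about 0.92
  • Published one year ago → about 0.5
  • Published 5 years ago → about 0.17

Because bf adds this value to the relevancy score, how much it actually moves results depends on the size of your scores. Increase a to give freshness more weight, or increase m to make the boost decay faster.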
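To see how steeply this boost falls off, the recip formula can be evaluated directly. This is a plain re-implementation for experimenting with m, a and b, not Solr's own code; `freshnessBoost` is a hypothetical helper name.

```javascript
// Solr's recip(x, m, a, b) computes a / (m * x + b).
function recip(x, m, a, b) {
  return a / (m * x + b);
}

var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Value added by bf=recip(ms(NOW,creation_date),3.16e-11,1,1)
// for a document that is `ageInDays` old.
function freshnessBoost(ageInDays) {
  return recip(ageInDays * MS_PER_DAY, 3.16e-11, 1, 1);
}

[0, 7, 30, 365, 5 * 365].forEach(function (days) {
  console.log(days + ' days old -> ' + freshnessBoost(days).toFixed(2));
});
```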

How Freshness Works in Hybrid Search

In the Opensolr hybrid search, freshness can be applied via the bf parameter inside the lexical query component. The Opensolr Search UI automatically applies freshness boosting when the user selects a freshness filter:

Filter               What It Does
fresh=yes            Applies a general freshness boost (recent pages rank higher)
fresh=today          Strongly boosts pages published today
fresh=previous_week  Boosts pages from the past 7 days
fresh=no             No freshness boost; results are ranked by relevancy alone

Date Range Filtering vs. Freshness Boosting

These are two different things:

  • Date range filter (fq) โ€” Hard cutoff. Pages outside the range are completely excluded:

    fq=creation_date:[NOW-30DAY TO *]
    

    This shows ONLY pages from the last 30 days. Older pages are gone.

  • Freshness boost (bf) โ€” Soft preference. Recent pages rank higher, but old pages still appear if they are highly relevant.

For most search UIs, freshness boosting is better than hard filtering because you do not accidentally hide great content just because it is old.
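The two approaches differ only in which parameter is sent. A minimal sketch, with `freshnessParams` as a hypothetical helper that returns URLSearchParams holding either the fq filter or the bf boost shown above:

```javascript
// Hypothetical helper: 'filter' -> hard date cutoff (fq),
// 'boost' -> soft freshness preference (bf), anything else -> neither.
function freshnessParams(mode, days) {
  var params = new URLSearchParams();
  if (mode === 'filter') {
    if (!Number.isInteger(days) || days < 1) {
      throw new RangeError('days must be a positive integer');
    }
    // Documents older than `days` are excluded entirely.
    params.set('fq', 'creation_date:[NOW-' + days + 'DAY TO *]');
  } else if (mode === 'boost') {
    // Older documents stay in the results, just ranked lower.
    params.set('bf', 'recip(ms(NOW,creation_date),3.16e-11,1,1)');
  }
  return params;
}
```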


Putting It All Together

Here is a complete query URL combining pagination, sorting, and freshness:

/select?
  q=your search terms
  &defType=edismax
  &qf=title^5 description^4 text^1
  &bf=recip(ms(NOW,creation_date),3.16e-11,1,1)
  &rows=10
  &start=20
  &sort=score desc
  &fl=id,uri,title,description,og_image,creation_date,score
  &wt=json

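This searches with relevancy plus a freshness boost, returns page 3 (results 21-30), and includes the fields you need for rendering. The parameters are split across lines for readability; in a real request they form one URL, with each value URL-encoded (for example, spaces in q become + or %20).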
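Assembling the same request in JavaScript with URLSearchParams takes care of the encoding. This is a sketch: `buildSearchUrl` is a hypothetical name, '/select' stays relative (prefix it with your own index URL), and the field names are the ones used in the example query.

```javascript
// Sketch: build the combined pagination + sorting + freshness query.
function buildSearchUrl(query, page, rows) {
  var params = new URLSearchParams({
    q: query,
    defType: 'edismax',
    qf: 'title^5 description^4 text^1',
    bf: 'recip(ms(NOW,creation_date),3.16e-11,1,1)', // freshness boost
    rows: String(rows),
    start: String((page - 1) * rows),               // page 3, rows 10 -> 20
    sort: 'score desc',
    fl: 'id,uri,title,description,og_image,creation_date,score',
    wt: 'json'
  });
  return '/select?' + params.toString();            // values are URL-encoded
}
```

For example, buildSearchUrl('your search terms', 3, 10) produces the page-3 request described above.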