Opensolr Changelog

Recent updates and improvements to the Opensolr platform.

Drupal Mar 29, 2026

  • Improved: The language selector dropdown now lists the active language first instead of sorting alphabetically. The Crawler Settings section has moved above Content Types for a better admin UX.
  • Improved: The Data Crawler and Data Ingestion admin tabs now show live document counts next to each content type. Checked types show their real count, unchecked types show 0, and a running total updates instantly as you toggle checkboxes.
  • Improved: Opensolr Search for Drupal now fully supports multilingual meta tags and JSON-LD structured data. Product categories, tags, and brands are output in the page's language, so the Web Crawler extracts the same translated facet values as Data Ingestion.

Drupal Mar 28, 2026

  • Improved: Opensolr Search for Drupal gains a Stop Crawl button, price display with currency symbols, and optimized query embeddings for better search relevance.

API Mar 28, 2026

  • Improved: New is_query parameter on the embed and batch embed API endpoints. Set is_query=1 when embedding search queries to get vectors optimized for retrieval.

Web Crawler Mar 28, 2026

  • Improved: Core-wide thread limit enforcement. The max_threads setting now caps the total number of concurrent crawler processes across all start URLs of an index, not per URL. Setting it to 1 means exactly one process runs at a time.
  • Improved: Smarter description extraction. The crawler no longer picks up CSS, JavaScript, or theme builder residue as page descriptions. Descriptions are taken, in order of priority, from meta tags, then JSON-LD structured data, then the first two sentences of the extracted text.
  • Improved: Crawler settings changes now take effect immediately. When you save a new thread count, crawl mode, renderer, or pause setting, the active crawl schedule is updated automatically, with no need to stop and restart the crawl.
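The core-wide limit behaves like one counting semaphore shared by every start URL of an index. The sketch below is a minimal Python illustration under that assumption: it uses threads in place of crawler processes, and `IndexCrawlLimiter`, `crawl_all`, and `fetch` are hypothetical names, not Opensolr's crawler code.

```python
import threading
import time
from typing import Callable, Iterable


class IndexCrawlLimiter:
    """Hypothetical per-index limiter: one semaphore shared by all start URLs."""

    def __init__(self, max_threads: int) -> None:
        self._slots = threading.BoundedSemaphore(max_threads)
        self._lock = threading.Lock()
        self.active = 0  # tasks running right now
        self.peak = 0    # highest concurrency observed

    def run(self, task: Callable[[], None]) -> None:
        with self._slots:  # blocks while max_threads tasks are already running
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                task()
            finally:
                with self._lock:
                    self.active -= 1


def crawl_all(start_urls: Iterable[str], max_threads: int,
              fetch: Callable[[str], None]) -> int:
    """Crawl every start URL under one shared limit; return peak concurrency."""
    limiter = IndexCrawlLimiter(max_threads)
    workers = [
        threading.Thread(target=limiter.run, args=(lambda u=url: fetch(u),))
        for url in start_urls
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return limiter.peak


if __name__ == "__main__":
    urls = [f"https://example.com/site{i}/" for i in range(5)]
    # With max_threads=1 the five start URLs are crawled strictly one at a time.
    print(crawl_all(urls, 1, lambda u: time.sleep(0.01)))
```

Because the semaphore belongs to the index rather than to each URL, adding more start URLs never raises total concurrency above max_threads.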
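The description priority described above can be sketched as a simple fallback chain. This is a minimal Python illustration, not the crawler's actual code. `pick_description` is a hypothetical helper that receives the meta description, the raw JSON-LD blocks, and the extracted body text. The `_CODE_LIKE` pattern is only a crude heuristic for spotting CSS/JS that has leaked into text.

```python
import json
import re
from typing import Iterable, Optional

# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Crude heuristic for CSS/JS residue: braces, "function(" or arrow functions.
_CODE_LIKE = re.compile(r"[{}]|function\s*\(|=>")


def _clean(value: object) -> Optional[str]:
    """Collapse whitespace; reject empty or code-like strings."""
    if not isinstance(value, str):
        return None
    value = " ".join(value.split())
    if not value or _CODE_LIKE.search(value):
        return None
    return value


def pick_description(meta: Optional[str], jsonld_blocks: Iterable[str],
                     text: str) -> str:
    # 1. Meta description content
    desc = _clean(meta)
    if desc:
        return desc
    # 2. First usable "description" in any JSON-LD block
    for raw in jsonld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common; skip it
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict):
                desc = _clean(item.get("description"))
                if desc:
                    return desc
    # 3. First two sentences of the extracted text, skipping code-like ones
    sentences = [s for s in _SENTENCE_END.split(" ".join(text.split()))
                 if s and not _CODE_LIKE.search(s)]
    return " ".join(sentences[:2])
```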

Search Mar 28, 2026

  • Improved: Price badges in search results now show currency symbols (€, $, £, ¥, etc.) instead of raw currency codes. Prices also display correctly on every index, regardless of schema configuration.
  • Improved: More accurate vector search. The embedding model now uses instruction-tuned prefixes that optimize query vectors for retrieval. Expect 5–15% better recall on natural-language queries and stronger cross-language matching (e.g. searching in Romanian and finding English results).
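Mapping ISO 4217 codes to symbols can be sketched with a lookup table. The table below is hypothetical and deliberately small. Symbols are always placed before the amount, although real locale conventions differ, and unknown codes fall back to the raw code.

```python
from decimal import Decimal

# Hypothetical, deliberately small ISO 4217 code -> symbol table.
CURRENCY_SYMBOLS = {"EUR": "€", "USD": "$", "GBP": "£", "JPY": "¥"}
# Currencies conventionally shown without minor units.
ZERO_DECIMAL = {"JPY"}


def format_price(amount, currency_code: str) -> str:
    code = currency_code.strip().upper()
    places = 0 if code in ZERO_DECIMAL else 2
    # Decimal keeps string amounts like "19.99" exact; floats go through str() first.
    value = f"{Decimal(str(amount)):,.{places}f}"
    symbol = CURRENCY_SYMBOLS.get(code)
    if symbol is None:
        return f"{value} {code}"  # unknown code: fall back to the raw code
    return f"{symbol}{value}"
```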
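Instruction-tuned embedding models are typically asymmetric: the query gets an instruction prefix, while indexed documents are embedded as-is. The sketch below uses the `Instruct: ...` / `Query: ...` format of the multilingual-e5-*-instruct model family as an example. It is an assumption that Opensolr's model uses this exact format. One plausible reading is that the `is_query=1` flag on the embed API selects the query side. `TASK` and `prepare_for_embedding` are hypothetical.

```python
# Assumed task instruction, modeled on the multilingual-e5-*-instruct format.
TASK = "Given a web search query, retrieve relevant passages that answer the query"


def prepare_for_embedding(text: str, is_query: bool) -> str:
    body = " ".join(text.split())  # normalize whitespace
    if is_query:
        # Queries carry the instruction so the model produces a retrieval vector.
        return f"Instruct: {TASK}\nQuery: {body}"
    # Indexed documents are embedded without any instruction.
    return body


if __name__ == "__main__":
    # A Romanian query ("cheap electric bikes") that may match English documents.
    print(prepare_for_embedding("biciclete electrice ieftine", is_query=True))
```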

Drupal Mar 25, 2026

Security Mar 25, 2026

  • Improved: A three-round security audit of the Drupal module added XSS protection on AI streaming output (DOMParser sanitizer), Solr injection prevention on range filters and facet values, CSRF protection on all admin write endpoints, and safe URL generation in CLI/cron contexts. More than 35 issues were found and fixed across all module files.
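Preventing Solr injection in filter queries usually comes down to two rules: escape every user-supplied facet value, and accept only real numbers as range bounds. The Drupal module is written in PHP, so the Python sketch below only illustrates the idea. `facet_fq`, `range_fq`, and the escaped character set (modeled on SolrJ's `ClientUtils.escapeQueryChars`) are assumptions, not the module's code.

```python
import math
import re
from typing import Optional

# Characters escaped by SolrJ's ClientUtils.escapeQueryChars, plus whitespace.
_SOLR_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&;/\s])')
_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def escape_solr_value(value: str) -> str:
    """Backslash-escape every character with meaning in Lucene query syntax."""
    return _SOLR_SPECIAL.sub(r"\\\1", value)


def _check_field(field: str) -> str:
    if not _FIELD_NAME.fullmatch(field):
        raise ValueError(f"invalid field name: {field!r}")
    return field


def facet_fq(field: str, value: str) -> str:
    return f"{_check_field(field)}:{escape_solr_value(value)}"


def _bound(value: Optional[object]) -> str:
    if value is None:
        return "*"  # open-ended range
    number = float(value)  # raises ValueError/TypeError for non-numeric input
    if not math.isfinite(number):
        raise ValueError(f"range bound must be finite: {value!r}")
    return repr(number)


def range_fq(field: str, low: Optional[object], high: Optional[object]) -> str:
    return f"{_check_field(field)}:[{_bound(low)} TO {_bound(high)}]"
```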

Data Ingestion Mar 25, 2026

  • Improved: Full CJK and UTF-8 safety across the entire enrichment pipeline. Japanese, Chinese, emoji, and accented characters no longer crash batch processing. All text is sanitized before embedding, before the Solr push, and before JSON encoding, so a single malformed byte in a PDF can no longer kill an entire batch.
  • Improved: The Ingestion API now skips individual duplicate documents instead of rejecting the entire batch, so one duplicate URI no longer blocks 49 other documents from being indexed. The Ingestion Queue also auto-refreshes every 20 seconds while jobs are pending.
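A minimal sketch of the kind of sanitizing step described above. It assumes text may arrive either as raw bytes (e.g. extracted from a PDF) or as Python strings containing lone surrogates. In `sanitize_text`, a hypothetical helper, invalid bytes become U+FFFD and unencodable surrogates become `?`. Control characters other than tab, newline, and carriage return are dropped, so the result is always safe to JSON-encode.

```python
import json
import unicodedata
from typing import Union

_KEEP = {"\t", "\n", "\r"}


def sanitize_text(raw: Union[bytes, str]) -> str:
    if isinstance(raw, bytes):
        # Invalid UTF-8 sequences become U+FFFD instead of raising.
        text = raw.decode("utf-8", errors="replace")
    else:
        # Lone surrogates cannot be UTF-8 encoded; replace them with "?".
        text = raw.encode("utf-8", errors="replace").decode("utf-8")
    # Drop control characters (category Cc) except tab, newline, and CR.
    return "".join(ch for ch in text
                   if ch in _KEEP or unicodedata.category(ch) != "Cc")


if __name__ == "__main__":
    pdf_bytes = b"caf\xc3\xa9 \xff \xe6\x97\xa5\xe6\x9c\xac \xf0\x9f\x99\x82"
    # ensure_ascii=False keeps CJK and emoji readable in the JSON payload.
    print(json.dumps({"text": sanitize_text(pdf_bytes)}, ensure_ascii=False))
```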
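Skipping duplicates per document instead of per batch amounts to partitioning the batch. In the hypothetical sketch below, `pending_uris` stands for URIs already queued in pending jobs. Each skipped document is reported with a reason, and the rest of the batch proceeds.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_batch(docs: Iterable[Dict], pending_uris: Set[str]
                ) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Accept unique documents; report duplicates individually instead of failing."""
    seen = set(pending_uris)  # copy: URIs already queued in pending jobs
    accepted: List[Dict] = []
    skipped: List[Tuple[str, str]] = []
    for doc in docs:
        uri = doc.get("uri")
        if not uri:
            skipped.append(("", "missing uri"))
        elif uri in seen:
            skipped.append((uri, "duplicate uri"))
        else:
            seen.add(uri)
            accepted.append(doc)
    return accepted, skipped
```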

Documentation Mar 23, 2026

Web Crawler Mar 20, 2026

  • Improved: The Web Crawler now automatically removes documents from the search index when their pages return a non-200 status code (404, 500, etc.) during a crawl. Previously, dead pages could remain in search results indefinitely.
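The cleanup step can be sketched as building a Solr JSON update command that deletes by ID. This assumes, purely for illustration, that each page's URL is its uniqueKey value. `dead_page_delete_body` is a hypothetical helper. Its output would be POSTed as `application/json` to the index's `/update` handler, followed by a commit.

```python
import json
from typing import Dict, Optional


def dead_page_delete_body(statuses: Dict[str, int]) -> Optional[str]:
    """Build a Solr JSON delete-by-ID command for pages that did not return 200.

    Assumes each page's URL is used as the document's uniqueKey.
    """
    dead = sorted(url for url, status in statuses.items() if status != 200)
    if not dead:
        return None  # nothing to delete; skip the update request
    return json.dumps({"delete": dead})


if __name__ == "__main__":
    crawl = {"https://example.com/": 200,
             "https://example.com/old-page": 404,
             "https://example.com/broken": 500}
    print(dead_page_delete_body(crawl))
```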

Search Mar 18, 2026

  • Improved: The search clear button (✕) is now easier to use on mobile. It is larger, has a generous padded tap target, and shows a press animation on touch, making it easier to clear a query on any device.
  • Improved: Fresh mode is now a date-window filter. Search Tuning lets you set a Freshness Window of 2 to 365 days, and when "Fresh" mode is selected, only content published within that window is returned. This replaces the old boost-factor approach, which could push recent but off-topic content above the most relevant results.
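In Solr terms, a date window is a filter query using date math rather than a boost. A minimal sketch, assuming a hypothetical `creation_date` field and hypothetical mode names `fresh` and `relevance`. Only the 2 to 365 day bounds come from the entry above.

```python
from typing import Dict
from urllib.parse import urlencode


def build_search_params(q: str, mode: str, window_days: int = 30,
                        date_field: str = "creation_date") -> Dict[str, str]:
    params = {"q": q}
    if mode == "fresh":
        if not 2 <= window_days <= 365:
            raise ValueError("Freshness Window must be between 2 and 365 days")
        # Filter (not boost): documents outside the window are excluded entirely.
        params["fq"] = f"{date_field}:[NOW/DAY-{window_days}DAYS TO NOW]"
    return params


if __name__ == "__main__":
    print(urlencode(build_search_params("solr tuning", "fresh", 30)))
```

Rounding the lower bound with `NOW/DAY` keeps the filter string stable for a whole day, which lets Solr's filter cache reuse it across requests.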

Search Mar 17, 2026

  • Improved: Search relevancy now defaults to Flexible minimum match, so queries return more results by default, especially longer natural-language searches. Short queries (1–2 words) still require every term to match, while longer queries allow partial matches for better recall. Per-index Search Tuning overrides are unaffected.
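Flexible minimum match corresponds to Solr's `mm` parameter for the dismax and edismax parsers. The exact value Opensolr uses is not stated here. The sketch below assumes the conditional spec `2<75%` (all terms required for 1 or 2 terms, otherwise 75% rounded down) and emulates how many terms that spec requires.

```python
# Assumed Solr parameters for "Flexible" minimum match (actual value not stated).
FLEXIBLE_MM_PARAMS = {"defType": "edismax", "mm": "2<75%"}


def required_matches(n_terms: int, threshold: int = 2, percent: int = 75) -> int:
    """How many query terms a conditional mm spec like "2<75%" requires.

    Up to `threshold` terms: all must match. Above it: `percent` of them,
    rounded down as Solr does for percentages.
    """
    if n_terms <= threshold:
        return n_terms
    return n_terms * percent // 100


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "terms ->", required_matches(n), "must match")
```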

Documentation Mar 15, 2026

Control Panel Mar 13, 2026

  • Improved: Reload and Reset error messages in the Error Audit are now human-readable. Raw Java stack traces are replaced with a short root-cause summary and a direct link to the Error Log for full details.
  • Improved: Index Reset is now far more robust. After clearing the index, the reset process verifies that it is actually empty. If the standard reset fails (locked segments, corrupt index), it automatically falls back to a hard reset that deletes the data directory and rebuilds the index from scratch. The UI now reports the actual reset status instead of always showing success.
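The verify-then-fall-back flow can be sketched as follows. `SolrIndex`, `delete_all`, `count`, and `hard_reset` are a hypothetical interface, not Opensolr's internals. The point is that success is reported only after a document count of zero is confirmed, and the caller receives the real outcome rather than an unconditional success.

```python
from typing import Protocol


class SolrIndex(Protocol):
    """Hypothetical interface to one index; not an Opensolr API."""

    def delete_all(self) -> None: ...
    def count(self) -> int: ...
    def hard_reset(self) -> None: ...  # wipe data directory, recreate core


def reset_index(index: SolrIndex) -> str:
    """Return "reset" or "hard_reset"; raise if the index cannot be emptied."""
    try:
        index.delete_all()      # standard reset (delete everything and commit)
        if index.count() == 0:  # verify instead of assuming success
            return "reset"
    except Exception:           # locked segments, corrupt index, timeouts...
        pass
    index.hard_reset()          # fallback: rebuild from scratch
    if index.count() != 0:
        raise RuntimeError("index is still not empty after a hard reset")
    return "hard_reset"
```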

Documentation Mar 8, 2026

  • Improved: The Data Ingestion API documentation now covers the content_type field in full. The field reference explains the default behavior, how the field controls web vs. media display in search results, and how MIME types are auto-detected when using rtf:true. All code examples (cURL, PHP, Python) now include content_type.
  • Improved: Comprehensive code examples added to the Data Ingestion API documentation: complete, working PHP and Python examples for both submission methods (JSON body and file upload), with error handling and job status polling, plus updated cURL examples for every workflow.
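Auto-detecting a MIME type from a URI can be approximated with Python's standard `mimetypes` module. This is only a sketch of the idea. The `text/html` fallback for extension-less web pages is an assumption, not the documented default of the `content_type` field, and `guess_content_type` is a hypothetical helper.

```python
import mimetypes
from urllib.parse import urlsplit


def guess_content_type(uri: str, default: str = "text/html") -> str:
    """Guess a MIME type from the URI's path extension.

    The "text/html" fallback for extension-less pages is an assumption.
    """
    path = urlsplit(uri).path  # ignore ?query and #fragment
    mime, _encoding = mimetypes.guess_type(path)
    return mime or default


if __name__ == "__main__":
    for uri in ("https://example.com/files/report.pdf?dl=1",
                "https://example.com/about"):
        # Each document in an ingestion batch would carry its content_type.
        print({"uri": uri, "content_type": guess_content_type(uri)})
```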

Data Ingestion Mar 8, 2026

  • Improved: Cleaner error display in the ingestion queue table. Error columns now show a short summary such as "30 ok, 20 doc(s) failed — click for details" instead of the full raw error. Click the message to open the Job Detail modal with a complete per-document breakdown.
  • Improved: Detailed Solr error reporting in the Data Ingestion Queue. When a document fails at the Solr level (unknown field, type mismatch, schema violation), the exact Solr error is captured and shown in the Job Detail modal, so you no longer have to guess why a document was rejected.

API Mar 8, 2026

  • Improved: The Data Ingestion API now returns a doc_ids array in every successful response, listing the auto-generated document ID (the md5 of the URI) for each document in your batch. Use these IDs to track, query, or update specific documents in your index.
  • Improved: A URI is now mandatory for every document in the Data Ingestion API. The document ID is always generated as md5(uri), making the URI the single source of truth for document identity: the same URI always maps to the same document, and resubmitting a URI updates the existing document. Duplicate URIs in pending jobs are automatically rejected to prevent accidental double-indexing.
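Because the ID is derived from the URI, a client can predict it before submitting. A minimal sketch, assuming the ID is the lowercase hex MD5 digest of the URI's exact UTF-8 bytes with no normalization. It also assumes `doc_ids` is returned in submission order. Both are assumptions to verify against a real response, and `doc_id_for` and `expected_doc_ids` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List


def doc_id_for(uri: str) -> str:
    """Assumed ID scheme: hex MD5 of the URI's exact UTF-8 bytes."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


def expected_doc_ids(batch: List[Dict[str, str]]) -> List[str]:
    # Assumes the response lists doc_ids in the order documents were submitted.
    return [doc_id_for(doc["uri"]) for doc in batch]


if __name__ == "__main__":
    batch = [{"uri": "https://example.com/a"}, {"uri": "https://example.com/b"}]
    print(expected_doc_ids(batch))
```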

Search Mar 7, 2026

  • Improved: The Pin, Exclude, and Exclude All buttons on the search elevation toolbar are now high-contrast and color-coded (orange for Pin, red for Exclude), so they clearly stand out as interactive controls.
  • Improved: Elevation actions are now mutually exclusive per document. Clicking Pin on an excluded result automatically removes the exclusion first, and vice versa, so conflicting rules no longer linger.
  • Improved: Query Analytics & Tools. The former Query Statistics page has been redesigned as a tabbed application, with Overview, Queries, and Elevation Rules each in their own lazily loaded (AJAX) tab. Elevation rules now show full document details (title, description, URL) instead of raw Solr IDs, group queries into collapsible accordion sections, and offer a regex search that instantly finds any elevated document across all rules.
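Mutual exclusivity is easy to model if each query's rule keeps pins and exclusions side by side and every action first removes the document from the opposite list. The sketch below is hypothetical. `to_xml` emits the `<query>`/`<doc>` shape used by Solr's QueryElevationComponent (`elevate.xml`), where `exclude="true"` marks an excluded document.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ElevationRule:
    """Hypothetical per-query rule: each document is pinned, excluded, or neither."""

    pinned: List[str] = field(default_factory=list)  # order = display order
    excluded: Set[str] = field(default_factory=set)

    def pin(self, doc_id: str) -> None:
        self.excluded.discard(doc_id)  # drop a conflicting exclusion first
        if doc_id not in self.pinned:
            self.pinned.append(doc_id)

    def exclude(self, doc_id: str) -> None:
        if doc_id in self.pinned:  # drop a conflicting pin first
            self.pinned.remove(doc_id)
        self.excluded.add(doc_id)

    def to_xml(self, query_text: str) -> str:
        """Render one <query> entry in Solr's elevate.xml format."""
        query = ET.Element("query", text=query_text)
        for doc_id in self.pinned:
            ET.SubElement(query, "doc", id=doc_id)
        for doc_id in sorted(self.excluded):
            ET.SubElement(query, "doc", id=doc_id, exclude="true")
        return ET.tostring(query, encoding="unicode")
```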

Control Panel Mar 6, 2026

  • Improved: The Add New Index page now uses a sidebar filter panel instead of dropdown menus. Region, Version, Country, Type, and Crawler filters are always visible on the left, with a result count next to each value. Click a value to filter by it, and click it again to clear it. Active filters are highlighted, and a Clear All link resets everything. On mobile, the filters collapse behind a sticky Filters button at the top of the page.
  • Improved: The Add New Index page is now more compact and easier to scan. Fonts, cards, and filter controls have been tightened so you can compare more server options at a glance without scrolling. Each card cleanly shows the key details: Solr version, region, and server type.

Web Crawler Mar 5, 2026

  • Improved: Faster Playwright rendering in Chrome mode. Pages now render in about 0.5–1 s instead of 2–25 s. The old approach waited for all network activity to stop (analytics, trackers, ad pixels), which stalled on busy pages. The renderer now waits for the DOM to load, gives JavaScript 500 ms to hydrate, and then grabs the content.

Web Crawler Mar 4, 2026

  • Improved: More reliable Solr batch indexing during crawls. When a batch insert into Solr fails (e.g. due to a temporary overload or timeout), the documents stay in the local buffer and are retried on the next flush cycle instead of being silently lost.
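The keep-until-acknowledged behavior can be sketched as a buffer that removes documents only after a successful send. `SolrBatchBuffer` and the `send` callable are hypothetical. In a real crawler, `send` would POST the batch to Solr and raise on a timeout or overload.

```python
from typing import Callable, List


class SolrBatchBuffer:
    """Hypothetical crawler-side buffer that only drops documents Solr accepted."""

    def __init__(self, send: Callable[[List[dict]], None], batch_size: int = 50):
        self._send = send  # e.g. POSTs the batch to /update; raises on failure
        self.batch_size = batch_size
        self.pending: List[dict] = []

    def add(self, doc: dict) -> None:
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Try to send everything buffered; keep it all if the send fails."""
        if not self.pending:
            return True
        batch = list(self.pending)
        try:
            self._send(batch)
        except Exception:  # timeout, overload, connection reset...
            return False   # documents stay buffered for the next flush
        del self.pending[:len(batch)]
        return True
```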

Web Crawler Mar 2, 2026

  • Improved: Smarter Resume for the Web Crawler. Clicking Resume now always launches the crawler, even when the queue appears empty. Previously, the UI refused to resume when no pages were left in the queue, yet that is exactly when Resume is needed, because the crawler discovers new content by re-reading your sitemaps. The misleading "nothing to resume" message is gone.
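Resuming from sitemaps means re-reading them and diffing their URLs against what the crawler already knows, which can yield new work even when the queue is empty. A minimal sketch using the standard sitemap namespace. `new_urls_from_sitemap` is hypothetical, and sitemap index files (which list other sitemaps rather than pages) are not handled.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def new_urls_from_sitemap(xml_text: str, known: Set[str]) -> List[str]:
    """Return page URLs listed in a <urlset> sitemap that are not yet known."""
    root = ET.fromstring(xml_text)
    found: List[str] = []
    seen = set(known)  # copy so the caller's set is left untouched
    for loc in root.findall(f"{NS}url/{NS}loc"):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            found.append(url)
    return found
```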