Why: Drupal’s search_api_solr probes /admin/luke for capabilities. On large cores or restricted setups, that call can take seconds and return megabytes, causing PHP-FPM timeouts. Mapping /admin/luke → Ping returns a fast, tiny 200 OK.
When to use this
Drupal shows messages like “Solr was unable to retrieve additional information about the server” and/or
Logs include cURL error 28 (timeout after ~15s) while hitting /admin/luke, and
Your Solr index is hosted on OpenSolr and /admin/luke is restricted or very slow.
In the Configuration Files section, locate and edit your solrconfig.xml. (You can use the built-in file editor; no shell access needed.)
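For reference, a minimal handler of this kind reuses Solr's standard PingRequestHandler, mounted under the /admin/luke path. The sketch below follows the stock ping-handler example from Solr's sample solrconfig.xml; adjust it to your setup:

```xml
<!-- Answer /admin/luke probes with a fast, tiny ping response -->
<requestHandler name="/admin/luke" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <lst name="defaults">
    <str name="echoParams">all</str>
  </lst>
</requestHandler>
```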
Replace the existing /admin/luke request handler block with the snippet above. (Remove or comment out any previous <requestHandler name="/admin/luke" ...> section.)
Save your changes.
OpenSolr will automatically detect and reload your index configuration gracefully — no manual restart required.
You should get a quick 200 response with a small JSON payload.
(Optional) Clear Drupal caches after the fix:
drush cr
Notes & trade-offs
This disables Luke introspection at /admin/luke. Admin tools that rely on the full Luke dump will no longer work on that path.
(Search and indexing remain fully functional.)
Alternatively, you can configure Drupal to skip schema checks entirely:
If you can’t edit Solr configs, a front proxy rule can stub /admin/luke instead:
RewriteRule ^/solr/(.*)/admin/luke.*$ - [R=200,L]
Why this works (one-liner)
Drupal only needs a quick “OK” response during capability probing. Replacing the heavy Luke dump with a fast ping avoids the long stalls that block PHP-FPM and Apache — and OpenSolr’s reload process applies this instantly and gracefully.
Absolutely classic Solr blunder! This one comes up often when people get a little too creative with their schema changes. Let’s break it down, old-school style.
What’s Happening Here?
You’re seeing this error:
cannot change field "tcngramm_X3b_cn_field_content_title_1" from index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
This is a Solr schema error. In short, Solr stores index “options” for fields—how much information (positions, offsets, etc.) is tracked for each field. Once a field is indexed with a certain setting, you can’t later try to index with different options without reindexing or fixing the schema.
Why Did This Happen?
At some point, your field tcngramm_X3b_cn_field_content_title_1 was defined/indexed with:
indexOptions=DOCS_AND_FREQS_AND_POSITIONS
Now, you (or Drupal, or the module, or something) are trying to index it with:
indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
Solr doesn’t retroactively change old docs. After changing the schema, delete and reindex all docs for this core/collection.
Traditional steps:
Update schema/field type as needed
Reload Solr core/collection
Wipe the index (delete all docs)
Reindex from Drupal/Search API
Double-Check Drupal Search API Solr Settings
If you’re using field types or options like “highlighting”, it might set storeOffsetsWithPositions=true automatically. Check those module settings!
Summary Table (For the Busy Solr Admin):
Step | What to do
1. Check field definition | In Solr, ensure index options are consistent
2. Update as needed | Change to add offsets if required
3. Reload schema | Reload Solr core/collection
4. Wipe & reindex | Delete all docs and reindex
5. Check Drupal configs | Make sure module options match Solr setup
The Classic Gotcha:
Once a field’s been indexed a certain way, all documents forever after must use the same index options (unless you reindex everything after changing the schema). It’s like family traditions—break one, and there’s chaos at Thanksgiving.
Still stuck?
Contact Us about your issue and we'll look deeper into it.
java.lang.IllegalArgumentException: cannot change DocValues type from <type> to <other_type> for field <fieldName>
The Solr schema documentation explains DocValues this way:
DocValues is recommended (required, if you are using *Point fields) for faceting, grouping, sorting and function queries. DocValues will make the index faster to load, more NRT-friendly and more memory-efficient. [DocValues are] currently only supported by StrField, UUIDField, all *PointFields, and depending on the field type, they might require the field to be single-valued, be required or have a default value.
DocValues is set in many schema fields similar to this one:
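In a typical schema, that looks something like this (the field name and type here are illustrative):

```xml
<field name="manu_exact" type="string" indexed="false" stored="false" docValues="true"/>
```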
The DocValues type is determined internally by Solr in the following way:
DocValues are only available for specific field types. The type chosen determines the underlying Lucene docValues type that will be used. The available Solr field types are:
StrField and UUIDField: if the field is single-valued (i.e., multiValued is false), Lucene will use the SORTED type; if the field is multi-valued, Lucene will use the SORTED_SET type.
Any Trie* numeric fields, date fields and EnumField: if single-valued, Lucene will use the NUMERIC type; if multi-valued, the SORTED_SET type.
Boolean fields.
Int|Long|Float|Double|Date PointField: if single-valued, Lucene will use the NUMERIC type; if multi-valued, the SORTED_NUMERIC type.
From the above, you can see that Solr's complaint that it "cannot change a DocValues type" implies that someone has changed the definition of the named field, probably by altering the DocValues setting or by changing the multiValued setting.
Possible Solutions:
A field change requires you to reset your index (remove all index data) from your Opensolr Index Control Panel and re-index the data. Otherwise, the conflict between the schema and the existing index will cause many downstream error messages.
If that fails, try creating a new index and re-indexing the data into it. (This requires setting up your Drupal server connection information again.)
When indexing data in Solr, you may encounter the following error:
ERROR (qtp123456-78) [<replica>] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid Number: <bad value> for field <numeric field>
This message means Solr attempted to index a value into a numeric field (such as int, float, double, or long), but the provided value wasn't a valid number. When this happens, Solr will drop the entire document from the index—so this error is no joke!
Why Does This Happen?
This error occurs if:
Non-numeric data is sent to a numeric field (e.g., "foo" instead of 42).
Your application code or data pipeline is generating or passing bad values.
There is legacy or inconsistent data sneaking through.
There’s a schema mismatch—maybe the field is defined as numeric in schema.xml, but your code sends a string.
Common Scenarios
Null, empty, or malformed values in incoming data (e.g., "", "NaN", "NULL", "abc")
Data imported from legacy systems or third parties
Changes in business logic or code that alter value formatting
How to Fix It
Check Your Data:
Verify that the value sent to Solr matches the numeric type defined in schema.xml.
Example: For a field <field name="price" type="pfloat" .../>, make sure you’re sending a valid float (e.g., 12.99).
Validate Before Sending:
Implement input validation in your application or ETL pipeline.
Reject or sanitize bad data before indexing.
Update Your Schema (If Needed):
If the field should accept text, not numbers, update your schema.xml accordingly.
Otherwise, fix the source of the non-numeric data.
Handle Legacy or Unknown Data:
Clean up legacy datasets or add migration scripts to convert values.
Default bad values to 0 or omit the field, depending on your business logic.
Log and monitor failed documents—don’t let errors slip by unnoticed.
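The validation step above can be sketched in a few lines of Python (the field names and the choice to drop invalid values are illustrative; your pipeline may prefer substituting a default instead):

```python
def coerce_numeric(value, target=float):
    """Try to coerce a raw value to the numeric type the Solr field expects.

    Returns the coerced number, or None if the value is not a valid number
    (so the caller can drop the field or substitute a default).
    """
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    try:
        number = target(text)
    except ValueError:
        return None
    # Reject NaN/inf, which Solr numeric fields will not accept either.
    if number != number or number in (float("inf"), float("-inf")):
        return None
    return number

def sanitize_doc(doc, numeric_fields):
    """Drop invalid numeric values from a document before sending it to Solr."""
    clean = dict(doc)
    for field, target in numeric_fields.items():
        coerced = coerce_numeric(clean.get(field), target)
        if coerced is None:
            clean.pop(field, None)  # omit the field rather than break the doc
        else:
            clean[field] = coerced
    return clean
```

Running every outgoing document through a filter like this prevents a single bad value from causing Solr to reject the whole document.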
Sample Error & Diagnosis
Sample Error Log:
org.apache.solr.common.SolrException: Invalid Number: "foo" for field price
Diagnosis:
Field: price (expected: float)
Value received: "foo" (invalid)
Resolution:
Update your application logic to ensure only numeric values are sent for the price field.
Still Stuck?
Don’t sweat it—sometimes even the best code lets a gremlin slip through.
If you keep running into this issue or can’t locate the source, contact OPENSOLR support for expert help.
Heads up, Solr wranglers! We’ve moved the maxBooleanClauses setting into your solr.xml (in your Solr home directory alongside your cores/collections).
<solr>
  <int name="maxBooleanClauses">90589</int> <!-- 🛠️ Bump your limit here -->
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
  </shardHandlerFactory>
</solr>
😱 Symptom
org.apache.solr.common.SolrException: Too many boolean clauses (1025).
Default limit: 1024 clauses.
Triggers: large synonym files, wildcard expansions, or overly broad query filters.
🔍 Root Causes
Synonym explosion: Each synonym generates extra clauses.
Wildcard & prefix queries: * or leading wildcards (*term) expand to many terms.
Why am I not getting results for query A, in AJAX, but I am getting results for query A, without AJAX? Why am I not getting results for query A? Why am I not getting results for query B?
Opensolr provides the Solr as a Service platform. Solr search results themselves are not Opensolr's responsibility; how your queries behave depends solely on your Solr implementation, or on the implementation of the CMS system you are using.
AnalyzingInfixSuggester is a Solr suggester implementation that provides powerful autocomplete and type-ahead functionality. Unlike basic prefix suggesters, it can match on substrings within terms, so a query for "york" would match "New York." It builds an internal lookup data structure (a Lucene index) at startup or upon first use, which is what makes it both powerful and problematic in certain configurations.
Why Does It Fail on Index Reload?
The root cause is that AnalyzingInfixSuggester builds its lookup structure during core initialization. When you reload your Opensolr Index, the suggester tries to rebuild this lookup from data that may not yet be available or accessible. This is especially common when:
The suggester dictionary does not have a dedicated storeDir configured, causing multiple suggesters to collide on the same directory.
The Solr configuration was exported by Drupal's search_api_solr module, which may not include the necessary directory separation for each dictionary.
The data directory is locked or incomplete during the reload cycle.
How to Fix It
Option 1: Add a Separate storeDir for Each Suggester Dictionary
Edit your solrconfig.xml (or solrconfig_extra.xml if using Drupal) and make sure each suggester dictionary has its own storeDir parameter:
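The exact component layout depends on your configuration, but the key addition is a distinct storeDir per dictionary. A sketch follows; the suggester name, source field, and directory name are placeholders:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester_1</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <!-- Dedicated directory so dictionaries do not collide on reload -->
    <str name="storeDir">suggester_1_dir</str>
  </lst>
</searchComponent>
```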
BlendedInfixLookupFactory is a more robust alternative that also supports infix matching but handles the reload lifecycle better. Replace AnalyzingInfixLookupFactory with BlendedInfixLookupFactory in your configuration:
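The change is a one-line swap of the lookupImpl parameter inside your suggester definition:

```xml
<str name="lookupImpl">BlendedInfixLookupFactory</str>
```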
If you do not need autocomplete functionality, you can remove the suggester configuration from your solrconfig.xml entirely. Look for any <searchComponent name="suggest"> blocks and the associated <requestHandler name="/suggest"> and remove them.
After Making Changes
Once you have updated your configuration files in the Opensolr Config Files Editor, click Reload in your Opensolr Index control panel. If the reload succeeds without errors, the fix is working. Check the Error Log to confirm there are no remaining issues.
In Opensolr, a default new index features a search.xml Solr configuration file (referenced from solrconfig.xml) that you will have to address, since it contains a default /select requestHandler that may not be what you are looking for when you are using your own custom datasets, etc.
You can fix this by simply removing the reference to search.xml inside your solrconfig.xml file:
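How search.xml is referenced varies between configurations; assuming it is pulled in via an XInclude directive, you would remove or comment out a line like this hypothetical one:

```xml
<!-- <xi:include href="search.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/> -->
```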
UPDATE: As of Feb 08 2023, the new Opensolr Config Files Uploader, should take care of these dependencies automatically, so the steps below should not be necessary.
However, if you still run into issues, you can try the steps below:
There is often the case (as it is with drupal), that your config files will contain files like schema_extra.xml, or solrconfig_extra.xml
In this case, your main schema.xml and solrconfig.xml will contain references to various fields and types that are defined in those extra files.
Therefore, you need to split your config files archive into multiple archives, and upload them as follows:
- First, upload the extra files (zip up the schema_extra.xml and other *_extra.xml files and upload that zip first).
- Second, upload the main schema.xml file, along with all other resource files, such as stopwords.txt, synonyms.txt, etc.
- Third, upload a zip archive that contains solrconfig.xml alone.
Understanding the "Undefined field _text_" Error in Apache Solr
What Causes the "Undefined field text" Error?
The error message:
org.apache.solr.common.SolrException: Undefined field _text_
means that Solr received a query, filter, or request that references a field named _text_, but this field is not defined in your Solr schema.
This typically happens when:
Your schema.xml does not declare a <field> or <dynamicField> for _text_.
The field is expected by your queries or by a default configuration (e.g., df=_text_), but does not exist.
You are using Solr default configs (like in the example core) elsewhere, but your current core/schema does not have _text_.
A copyField configuration is missing or misconfigured, so nothing is populating _text_.
Upgrading from Solr 4.x/5.x to 6.x+ (or SolrCloud): _text_ was common in old examples, but new setups may not include it.
What Is _text_ in Solr?
_text_ is a conventional field name, not a built-in Solr field.
In many Solr demos, _text_ is a catch-all field that copies the content of multiple other fields (using <copyField>), to make full-text searching easier.
If your schema doesn't define _text_, and a query refers to it, you’ll get this error.
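To define it, add a catch-all field to your schema.xml. The attribute values below are the common convention; adjust the type to one that actually exists in your schema:

```xml
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
```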
type="text_general" uses Solr's default full-text analysis (use your appropriate type).
multiValued="true" allows multiple fields to be copied into _text_.
stored="false" saves space if you only need it for searching.
2. Use <copyField> to Populate _text_
Add to your schema.xml:
<!-- Example: copy title, description, and content into _text_ -->
<copyField source="title" dest="_text_"/>
<copyField source="description" dest="_text_"/>
<copyField source="content" dest="_text_"/>
Remove references to _text_ in your query, config, and client code.
Set the default search field (df) to an existing field (e.g., title or content).
Or, define and use a new catch-all field with a different name.
Summary Table
Cause | Solution
_text_ not defined | Add <field> for _text_ in schema.xml
_text_ not populated | Add <copyField> for sources to _text_
Query refers to _text_ | Update query to use an existing field, or define _text_
Migrated core/config | Add _text_ back, or adjust queries/configs to not use it
Classic Troubleshooting Steps
Check your schema.xml for <field name="_text_".../>
Check for <copyField ... dest="_text_"/> entries
Check queries for q=...&df=_text_ or similar
Check your Solr logs for stack traces and field definitions
Conclusion
The “Undefined field _text_” error means you’re referencing a field that isn’t defined or populated.
Restore _text_ with <field> and <copyField>, or update your queries/configs to not depend on it.
📦 Adjusting Solr Upload Limits and Boolean Clauses Like a Pro
When your Solr-powered app says “the file’s too big” or “too many boolean clauses!”—don’t panic. Here’s how to tweak your solrconfig.xml so you can upload more, search harder, and (almost) never hit a wall.
🚀 How to Change the Form Data Upload Limit
The key setting: formdataUploadLimitInKB in your solrconfig.xml file.
multipartUploadLimitInKB – For files uploaded via multipart POST.
formdataUploadLimitInKB – For files uploaded as form data.
Increase the value as needed. Warning: Don’t make it TOO big—your JVM heap and security folks will not thank you.
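Both limits live on the <requestParsers> element inside <requestDispatcher> in solrconfig.xml. The values below are illustrative:

```xml
<requestDispatcher>
  <!-- Values are in KB; 20480 KB = 20 MB -->
  <requestParsers multipartUploadLimitInKB="20480"
                  formdataUploadLimitInKB="20480"/>
</requestDispatcher>
```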
⚠️ Heads Up #1: Boolean Clauses
No matter how big you set those upload limits, if your query contains too many boolean clauses (lots of OR or AND), Solr will throw a “too many boolean clauses” error.
Solution:
Increase maxBooleanClauses in your solrconfig.xml:
<maxBooleanClauses>2048</maxBooleanClauses>
🧠 Heads Up #2: Alternate Query Strategies
If you’re hitting the limit even after increasing it, consider splitting your mega-query into several smaller queries and combining results in your app (think of it as Solr-powered pagination, but for logic!).
Example:
Query 1: https://opensolr-server.solrcluster.com/solr/production/select?q=*:*&fq=field_name:(VAL1 OR ... OR VALn)
Query 2: https://opensolr-server.solrcluster.com/solr/production/select?q=*:*&fq=field_name:(VALn+1 OR ... OR VALm)
And so on...
Where n and m are each less than your maxBooleanClauses setting.
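The splitting above can be sketched in a few lines (the base URL and field name are placeholders for your own):

```python
from urllib.parse import urlencode

def build_chunked_fq_urls(base_url, field, values, max_clauses=1024):
    """Split a long OR filter into several queries, each under maxBooleanClauses."""
    urls = []
    for i in range(0, len(values), max_clauses):
        chunk = values[i:i + max_clauses]
        # One fq per request, e.g. field_name:(VAL1 OR VAL2 OR ...)
        fq = "{}:({})".format(field, " OR ".join(chunk))
        urls.append(base_url + "?" + urlencode({"q": "*:*", "fq": fq}))
    return urls
```

Your application then runs each URL and merges the result sets.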
Sometimes, in the shared Opensolr cloud, the data folder may get corrupted so that it can't be read from or written to. One easy fix is to simply remove your index and create another one, preferably under another name.
If that doesn't work, please contact us, and we'll be happy to fix it up for you.
Also keep in mind that there may be other causes, so make sure to check your error log by clicking the Error Log button inside your Opensolr index control panel, and keep refreshing that page to make sure the errors you see are current.
If you do see errors in there, please email them to us, at support@opensolr.com and we'll fix it for you.
Common Causes of Indexing Failures
Field Names Do Not Match the Schema
One of the most common reasons data fails to index is a mismatch between the field names in your documents and the field names defined in your schema.xml. Solr will reject documents that contain fields not defined in the schema (unless you have a dynamicField rule that matches). Check that every field you are sending exists in your schema, and remember that field names are case-sensitive.
Missing Commit After Indexing
Solr does not make newly indexed documents searchable until a commit is issued. If you are sending documents via the API but they do not appear in search results, make sure you are either:
Sending a commit command after your indexing batch: <commit/>
Appending ?commit=true to your update request URL
Relying on autoCommit settings in your solrconfig.xml (check that <autoCommit> is configured with a reasonable maxTime value)
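If you rely on autoCommit, the relevant solrconfig.xml block looks like this (times are in milliseconds; the values are illustrative):

```xml
<autoCommit>
  <maxTime>15000</maxTime>        <!-- hard commit at most every 15s -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>         <!-- soft commit makes new docs searchable -->
</autoSoftCommit>
```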
Document Size or Field Value Too Large
If individual documents contain very large field values (for example, a full PDF text in a single field), the indexing request may time out or fail silently. Try indexing a smaller document first to isolate whether size is the issue.
Authentication or Network Issues
If your Opensolr Index uses HTTP Basic Authentication, make sure your indexing client is sending the correct credentials with every request. An authentication failure may not always produce an obvious error in your application but will result in a 401 Unauthorized response from Solr.
Step-by-Step Debugging
Check the Error Log: Click the Error Log button in your Opensolr Index control panel. This will show you any exceptions Solr encountered while processing your documents. Common errors include unknown field, undefined field type, and document size exceeds limit.
Verify Schema Fields: Open your schema.xml in the Config Files Editor and confirm that every field in your indexing payload is defined. Pay attention to the type attribute — sending a string to a numeric field will cause a parsing error.
Test with a Simple cURL POST: Send a minimal test document directly to your Opensolr Index to rule out issues with your application code:
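A minimal test might look like the following; the hostname, index name, and credentials are placeholders for your own:

```shell
curl -u your_user:your_pass \
  "https://your-server.opensolr.com/solr/your_index/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[{"id": "test-doc-1", "title": "indexing smoke test"}]'
```

Then query for q=id:test-doc-1 to confirm the document is searchable.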
If this works but your application does not, the problem is in your application's indexing code.
Check Your Document Count: Query your index with q=*:*&rows=0 to see the total number of documents. If the count is not increasing after indexing attempts, the documents are being rejected.
Integration-Specific Tips
Drupal with Search API Solr
If you are using Drupal's Search API Solr module, make sure the Solr server connection is configured correctly in the Drupal admin panel. Run drush search-api:rebuild-tracker to reset the indexing tracker, then trigger a re-index. Check the Drupal watchdog logs (drush watchdog:show) for Solr-related errors.
WordPress with WPSolr or Ajax Search Pro
For WordPress integrations, verify the Solr endpoint URL and credentials in the plugin settings. Some plugins require you to manually trigger an initial full index from the plugin settings page. Check the WordPress debug log (wp-content/debug.log) if indexing appears to do nothing.
Custom Applications
If you are posting documents from a custom application using SolrJ, Python (pysolr, sunburnt), or any other client library, make sure you are handling the Solr response correctly. A successful index request returns an HTTP 200 status. Any other status code indicates a problem — parse the response body for the specific error message.
If you get an error such as Unknown field... or Missing field..., and your schema.xml already contains those fields, make sure you disable schemaless mode in solrconfig.xml.
Just head on to the Config Files Editor in your opensolr index control panel, and locate a snippet that looks like this:
class="ManagedIndexSchemaFactory"
According to the Solr documentation, you can disable the ManagedIndexSchemaFactory by replacing the schemaFactory definition in solrconfig.xml with the classic implementation, which also disables the dynamic schema REST APIs:
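In solrconfig.xml, the classic factory is declared like this:

```xml
<schemaFactory class="ClassicIndexSchemaFactory"/>
```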
Also do not forget to remove the entire snippet regarding the ManagedIndexSchemaFactory, so that you won't accidentally use both.
Common Reasons Your Schema Changes Are Not Applied
1. You Have Not Reloaded Your Index
After editing your schema.xml through the Opensolr Config Files Editor, you must reload your Opensolr Index for changes to take effect. Simply saving the file is not enough. Click the Reload button in your Opensolr Index control panel after every schema change.
2. Managed-Schema Is Overriding Your schema.xml
Solr can operate in two schema modes: classic mode (using schema.xml) and managed mode (using managed-schema). If your solrconfig.xml contains a ManagedIndexSchemaFactory definition, Solr ignores schema.xml entirely and uses managed-schema instead. This is the most common reason schema.xml changes appear to have no effect.
To check which mode is active, open your solrconfig.xml in the Config Files Editor and search for schemaFactory. If you see:
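A managed-mode definition looks along these lines (this is the common default; your attribute values may differ):

```xml
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
```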
Then your index is in managed mode. Replace it with the classic factory as described above to use schema.xml.
3. Typos in Field Names or Types
A common mistake is a mismatch between the field name in your schema.xml and the field name used when indexing documents. Field names are case-sensitive. For example, productName and productname are two different fields. Double-check your field definitions against the actual field names in your indexing requests.
4. Missing Field Types
If you reference a type in your field definition that does not exist in the schema, Solr will reject the configuration on reload. Make sure every type="..." attribute in your <field> definitions matches a <fieldType> defined elsewhere in the same schema file.
Step-by-Step Debugging Guide
Check the Error Log: After reloading, click the Error Log button in your Opensolr control panel. If the schema has syntax errors or missing field types, the error messages will appear here.
Verify the Active Schema: Use the Opensolr API or the Config Files Editor to confirm which schema file is currently active. If managed-schema exists and is being used, your schema.xml edits will not apply.
Test with a Simple Field: Add a single new field to your schema, reload, and try indexing a document with that field. This isolates whether the issue is with the reload process or with a specific field definition.
Compare Field Names: Make sure the field names in your indexing code (Drupal, WordPress, custom scripts) exactly match the field names in schema.xml, including case.
schema.xml vs managed-schema
Understanding the difference between these two approaches is important:
schema.xml (Classic Mode): You manually edit the XML file to define fields and field types. Changes require a reload. This gives you full control and is recommended when you want a predictable, version-controlled schema.
managed-schema (Managed Mode): Solr manages the schema file and allows changes through the Schema API (REST endpoints). This enables dynamic field creation without editing XML files but can lead to unexpected fields being added automatically.
On Opensolr, most users prefer classic schema.xml mode because it provides explicit control over the index structure. If you need to switch between modes, see our FAQ on moving from schema.xml to managed-schema.
Opensolr and JTS Topology Suite: A Match Made in Geo Heaven
Yes, you heard right: Opensolr now supports the JTS Topology Suite! And we do it with the kind of grace you’d expect from a platform that respects tradition while embracing progress.
What’s the Big Deal About JTS Topology Suite?
The JTS Topology Suite is like the Swiss Army Knife for geometric operations in Java. It's powerful, precise, and (let’s be honest) something you never knew you needed until you did. While it doesn’t come bundled with vanilla Solr, we’ve gone ahead and taken care of that for you.
Is It Enabled for Me?
In most Opensolr servers and data centers, JTS support is switched on by default. If you find that your index seems blissfully unaware of this geometric upgrade, don’t panic—simply contact our support team. We'll flip the switch faster than you can say “topology.”
Do I Need to Set Anything Up?
Nope! Once enabled, no extra setup is required on your part. That means less tinkering and more time for what really matters—like reading classic Solr documentation for fun.
Summary
JTS Topology Suite is now supported in Opensolr (not in default Solr).
If you’re uploading or saving configuration files using the Opensolr Editor, you might occasionally be greeted by an error that looks a little something like this:
Error loading class 'solr.ICUCollationField'
Don’t worry—this doesn’t mean the sky is falling or that your config files have started speaking in tongues.
Why Does This Happen?
The error above simply means the ICU (International Components for Unicode) library isn’t enabled on your Opensolr server (yet!). This library is required if your configuration references classes like solr.ICUCollationField—usually for advanced language collation and sorting.
How Do I Fix It?
The solution is delightfully simple: Contact Opensolr Support and request that we enable the ICU library for your server.
A real human (yes, a human!) will flip the right switches for your server, and you’ll be back to uploading config files in no time.
Not Sure If This Is Your Error?
If you’re not sure what sort of error you’re running into—or just want to peek under the hood—you can always check your Error Logs after uploading config files:
Click your Index Name in the Opensolr Dashboard
Go to Tools → Error Logs
Check the logs to spot any ICU or other config errors. If it smells like ICU, contact us—if it smells like something else, well… contact us anyway. We're here to help!
TL;DR
Get an ICU error? Ask Opensolr Support to enable the library for your server.
Not sure what’s broken? Check your Error Logs (Index Name → Tools → Error Logs).
You're trying to index a document and Solr throws this at you:
java.lang.IllegalArgumentException: Document contains at least one
immense term in field="your_field_name" (whose UTF8 encoding is
longer than the max length 32766), all of which were skipped.
Please correct the analyzer to not produce such terms.
This means a single term (token) in one of your fields exceeds 32,766 bytes — the hard limit imposed by Lucene's inverted index format. Lucene literally cannot store a term that large, so the document gets rejected.
What's Actually Happening
When Solr indexes text, it breaks it into tokens (terms) using your analyzer chain. Each token becomes an entry in the inverted index, and Lucene caps each entry at 32,766 bytes of UTF-8.
The problem is: if your field type does not tokenize the input (or barely tokenizes it), the entire field value is treated as a single giant term.
Common Causes
1. Using a string Field Type for Long Text
This is the number one cause. The string field type (solr.StrField) stores the entire value as a single token — no analysis, no tokenization. It's designed for short, exact-match values like IDs, tags, or status codes.
If you accidentally assign a string type to a field that receives full HTML pages, article bodies, or concatenated text, you'll hit the limit fast.
<!-- ❌ This will break on large content -->
<field name="sm_aggregated_field" type="string" indexed="true" stored="true"/>
2. A KeywordTokenizer with No Further Processing
The KeywordTokenizer treats the entire input as one token — same problem as string, just wrapped in a field type definition.
<!-- ❌ Still one giant token -->
<fieldType name="text_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
3. Raw HTML or Encoded Data in the Field
Even with a proper tokenizer, if your application is sending raw HTML, Base64-encoded blobs, or serialized objects into a text field, you can end up with enormous single tokens — especially from long URLs in src or href attributes, inline CSS/JS, or data URIs.
The prefix in your error message is a clue. Decoding those bytes:
That's HTML with an embedded image — a classic sign of raw HTML being pushed into a field that can't handle it.
4. Aggregated / Concatenated Fields
Some applications (like Drupal's Search API) create aggregated fields that combine multiple source fields into one. If the combined content is huge and the field type doesn't tokenize, you get the immense term error.
How to Fix It
Solution 1: Change the Field Type to a Tokenized Type (Recommended)
The most straightforward fix. Switch your field from string to a text-based type that has a proper tokenizer:
<!-- ✅ Standard tokenized text field -->
<field name="sm_aggregated_field" type="text_general" indexed="true" stored="true"/>
Or if you need n-gram partial matching:
<!-- ✅ N-gram tokenized field -->
<field name="sm_aggregated_field" type="text_ngram" indexed="true" stored="true"/>
Common tokenized field types available in most Solr schemas:
Field Type | Tokenizer | Best For
text_general | StandardTokenizer | General full-text search
text_en | StandardTokenizer + stemming | English language content
text_ws | WhitespaceTokenizer | Whitespace-delimited text
text_ngram | NGramTokenizer | Partial / substring matching
Solution 2: Strip HTML Before Indexing
If the field is receiving raw HTML, strip it at the application level before sending it to Solr:
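At the application level, a minimal HTML stripper using only the Python standard library might look like this (a sketch; your CMS or framework likely ships its own utilities for this):

```python
from html.parser import HTMLParser

class HTMLStripper(HTMLParser):
    """Collects only the text content, dropping tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_html(raw: str) -> str:
    """Return the plain text of an HTML fragment, whitespace-normalized."""
    stripper = HTMLStripper()
    stripper.feed(raw)
    return " ".join(" ".join(stripper.chunks).split())
```

Because tags, attributes, and data URIs never reach the tokenizer, no single giant token can form from markup.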
Or use Solr's built-in HTMLStripCharFilterFactory in your field type:
<!-- ✅ Strip HTML during analysis -->
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```
Solution 3: Truncate the Field Value
If you don't need the full content indexed (e.g., it's just for display), you can truncate at the application level or use Solr's LengthFilterFactory to drop oversized tokens:
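The filter goes into the field type's analyzer chain. Note that LengthFilterFactory measures characters, not bytes, so keep the max comfortably below the 32,766-byte limit; the value below is illustrative:

```xml
<filter class="solr.LengthFilterFactory" min="1" max="10000"/>
```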
Note: this only works if you don't need to search within the field.
Drupal / Search API Users
If you're using Drupal with Search API and the field is sm_aggregated_field, this is almost certainly an aggregated fulltext field combining multiple content fields. The fix:
In your Opensolr schema.xml, find the field definition for sm_aggregated_field
Change its type from string to text_general (or another tokenized type)
Save and reload your Opensolr Index
Re-index your content in Drupal
Quick Checklist
☐ Check the field type in your schema.xml — is it string or text_*?
☐ Check for raw HTML — are you stripping tags before indexing?
☐ Check aggregated fields — are multiple fields concatenated into one?
☐ Check for binary/encoded data — Base64, data URIs, serialized blobs?
☐ After changing schema — always reload the Opensolr Index and re-index your data
Got a field type question? Reach out to us at support@opensolr.com — we're happy to help you pick the right analyzer chain for your data. 🙌