Leader Not Available — Index Fetch Failed During Replication

The Error

Replication of your Opensolr index fails with an error like this:

WARN IndexFetcher:509 - Leader at: http://172.x.x.x/solr/your_core
is not available. Index fetch failed by exception:
org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:
Error from server at http://172.x.x.x/solr/your_core:
Expected mime type in [application/octet-stream,
application/vnd.apache.solr.javabin] but got text/html.

You may also see variations such as:

Leader at: http://... is not available
Index fetch failed by exception
Expected mime type application/octet-stream but got text/html
Expected mime type application/vnd.apache.solr.javabin but got text/html
RemoteSolrException: Error from server
IndexFetcher:509

All of these point to the same root cause: the follower tried to fetch index data from the leader, but received an HTML page instead of binary Solr data.

What Is Actually Happening

Solr replication works by having follower (replica) nodes periodically poll the leader (master) node to download index segments. This communication uses Solr's binary protocol, which expects responses in application/octet-stream or application/vnd.apache.solr.javabin format.

When the follower receives text/html instead, the reply did not come from Solr's replication handler: a web server, proxy, or error page answered the request, either on the leader itself or somewhere between the follower and the leader. The follower cannot parse HTML as binary index data, so replication fails.
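The check described above can be illustrated with a minimal sketch. This is not Solr's client code; it only mirrors the idea of comparing the response's Content-Type against the two binary types named in the error message. The function names are hypothetical.

```python
# Illustrative sketch only: mimics the idea of the content-type check,
# not the actual SolrJ implementation.
EXPECTED_MIME_TYPES = {
    "application/octet-stream",
    "application/vnd.apache.solr.javabin",
}


def media_type(content_type_header: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def check_replication_response(content_type_header: str) -> None:
    """Raise ValueError if the response is not one of the expected binary types."""
    got = media_type(content_type_header)
    if got not in EXPECTED_MIME_TYPES:
        expected = ", ".join(sorted(EXPECTED_MIME_TYPES))
        raise ValueError(f"Expected mime type in [{expected}] but got {got}.")
```

A header such as "text/html; charset=UTF-8" fails this check, which is the situation the log message reports.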

Common Causes

1. Leader Solr Process Is Down

The most common cause. The leader server is running, but the Solr/Jetty process on it has crashed, run out of memory, or was stopped for maintenance. The web server (Apache/Nginx) on the leader is still responding — but it returns an HTML error page (like a 503 or a default landing page) instead of Solr's binary response.

2. Leader Index Was Removed or Renamed

The specific index (core) that the follower is trying to replicate no longer exists on the leader. The leader's Solr is running, but when asked for /solr/your_core/replication, it returns an HTML error because that core is not loaded.

3. Network or Firewall Issues

A firewall, load balancer, or reverse proxy between the follower and leader is intercepting the request and returning its own HTML error page (e.g., a 502 Bad Gateway or a challenge page).

4. Leader Is Overloaded

The leader is under heavy load (large indexing operations, high query traffic, or garbage collection pauses). It cannot respond to the replication request in time, and the connection is either dropped or handled by a proxy that returns an HTML timeout page.

5. Wrong Replication URL

The follower's solrconfig.xml has an incorrect leader URL — pointing to the wrong IP, wrong port, or wrong core name. The request reaches a server, but not Solr's replication handler.

How Opensolr Handles This

If you are using an Opensolr Resilient Cluster, replication is fully managed. Opensolr's infrastructure automatically monitors leader health and restarts Solr processes when they go down. Transient replication failures are normal during brief maintenance windows or rolling restarts — the follower will catch up automatically on the next replication cycle.

For manually configured replication, you need to investigate and resolve the root cause yourself.

How to Diagnose

Check the Leader

The error message contains the leader URL. Verify the leader is alive:

# Check if Solr responds on the leader
curl -s "http://LEADER_HOST:PORT/solr/CORE_NAME/admin/ping"

# Check replication handler directly
curl -s "http://LEADER_HOST:PORT/solr/CORE_NAME/replication?command=details&wt=json"

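If you get HTML back instead of JSON, Solr's handler for that core did not answer the request: either Solr is not serving that core, or a web server or proxy in front of it replied instead.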
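If you prefer to script the two checks above, the following is a rough sketch using only the Python standard library. The helper names classify_body and probe are hypothetical, and the HTML detection is a simple heuristic (Content-Type header plus the first bytes of the body), not an exhaustive test.

```python
import json
import urllib.error
import urllib.request


def classify_body(content_type: str, body: bytes) -> str:
    """Label a response as 'html', 'json', or 'other' from its header and body."""
    media = content_type.split(";", 1)[0].strip().lower()
    head = body.lstrip()[:200].lower()
    if media == "text/html" or head.startswith((b"<!doctype html", b"<html")):
        return "html"
    try:
        json.loads(body)
        return "json"
    except ValueError:  # covers JSON syntax errors and undecodable bytes
        return "other"


def probe(url: str, timeout: float = 10.0) -> tuple:
    """Fetch url and return (HTTP status, body classification).

    Connection-level failures (refused, timed out) are not caught here and
    propagate as urllib.error.URLError or OSError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            content_type = resp.headers.get("Content-Type", "")
            return resp.status, classify_body(content_type, resp.read())
    except urllib.error.HTTPError as err:
        # Error responses (404, 502, 503, ...) still carry a body worth classifying.
        content_type = err.headers.get("Content-Type", "")
        return err.code, classify_body(content_type, err.read())
```

Calling probe() on the leader's /replication?command=details&wt=json URL and getting "html" back matches the failure described in this article; "json" suggests Solr itself answered.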

Check the Error Audit

Use the Error Audit dashboard in your Opensolr control panel. Search for IndexFetcher to see the full timeline of replication failures — how often they occur, when they started, and whether they resolved on their own.

Look at the error frequency:

Occasional errors (a few per day) → Transient. The leader was briefly unavailable during a restart or GC pause. No action needed if replication recovers.
Continuous errors (every replication cycle) → Persistent. The leader is down or misconfigured. Investigate immediately.
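The transient versus persistent distinction can be expressed as a small rule of thumb. The sketch below assumes you have exported the timestamps of IndexFetcher failures yourself (the Error Audit export format is not covered here), and the threshold of five consecutive poll cycles is an arbitrary assumption, not an Opensolr setting.

```python
from datetime import datetime, timedelta


def classify_failures(failure_times, now, poll_interval=timedelta(seconds=30), cycles=5):
    """Return 'none', 'transient', or 'persistent' for a list of failure timestamps.

    'persistent' means at least `cycles` failures fall inside the most recent
    `cycles` poll intervals, i.e. roughly every recent replication attempt failed.
    """
    if not failure_times:
        return "none"
    window_start = now - cycles * poll_interval
    recent = [t for t in failure_times if window_start < t <= now]
    return "persistent" if len(recent) >= cycles else "transient"
```

Adjust poll_interval to match the pollInterval configured on your follower, otherwise the window will not line up with the actual replication cycles.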

Verify the Replication Configuration

In your follower's solrconfig.xml, check the replication handler:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="follower">
    <str name="leaderUrl">http://LEADER_HOST:PORT/solr/CORE_NAME/replication</str>
    <str name="pollInterval">00:00:30</str>
  </lst>
</requestHandler>

Make sure:

The leaderUrl points to the correct host, port, and core name
The URL ends with /replication (not just /solr/CORE_NAME)
The leader's IP/hostname is reachable from the follower network
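The first two checks in this list can be automated with a short script. This is a sketch under assumptions: it expects the follower/leaderUrl element names shown above and a URL path of the form /solr/CORE_NAME/replication, and the function name check_leader_url is hypothetical. It does not test network reachability.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

LEADER_URL_XPATH = (
    ".//requestHandler[@name='/replication']"
    "/lst[@name='follower']/str[@name='leaderUrl']"
)


def check_leader_url(solrconfig_xml: str) -> list:
    """Return a list of problems found with the follower's leaderUrl (empty if none)."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find(LEADER_URL_XPATH)
    if node is None or not (node.text or "").strip():
        return ['no leaderUrl found under <lst name="follower">']

    url = urlsplit(node.text.strip())
    problems = []
    if url.scheme not in ("http", "https") or not url.hostname:
        problems.append("leaderUrl is not an absolute http(s) URL")
    # Assumed layout: /solr/CORE_NAME/replication
    parts = [p for p in url.path.split("/") if p]
    if len(parts) != 3 or parts[0] != "solr" or parts[2] != "replication":
        problems.append("path should look like /solr/CORE_NAME/replication")
    return problems
```

Pass it the full text of the follower's solrconfig.xml; an empty list means the URL has the expected shape, not that the leader is actually reachable.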

How to Fix It

If the Leader Is Down → Restart It

For Opensolr managed clusters, the watchdog automatically restarts failed Solr processes. If the error persists for more than a few minutes, contact support.

For manual replication setups, restart Solr on the leader server and verify it comes back with the correct core loaded.

If the Core Is Missing → Recreate or Re-point

If the core was removed from the leader, either recreate it or update the follower's leaderUrl to point to the correct core.

If the URL Is Wrong → Fix solrconfig.xml

Update the leaderUrl in the follower's replication handler configuration to the correct leader endpoint. Then reload the follower core.

Force a Replication After Fixing

Once the leader is back, you can trigger an immediate replication instead of waiting for the next poll cycle:

curl "http://FOLLOWER_HOST:PORT/solr/CORE_NAME/replication?command=fetchindex"

Or use the Opensolr Replication API to trigger replication programmatically.
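For scripted setups, the same fetchindex call can be made from Python. This is a minimal sketch that talks to Solr's standard replication handler directly, not to the Opensolr Replication API; the function names are hypothetical, and the host, port, and core name are placeholders you must supply.

```python
import json
import urllib.request
from urllib.parse import urlencode


def replication_url(core_url: str, command: str) -> str:
    """Build a replication handler URL for the given core and command."""
    query = urlencode({"command": command, "wt": "json"})
    return f"{core_url.rstrip('/')}/replication?{query}"


def force_replication(follower_core_url: str, timeout: float = 10.0) -> dict:
    """Ask the follower to fetch the index now and return Solr's parsed JSON reply."""
    url = replication_url(follower_core_url, "fetchindex")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read())
```

After triggering the fetch, requesting replication_url(follower_core_url, "details") on the follower is one way to confirm that it has caught up with the leader.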

Key Points

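This error almost always means the problem is on the leader side or on the path to it, not in the follower's replication logic. The follower is working correctly: it is reporting that it did not get a valid Solr response from the leader URL it was given. The one follower-side exception is a wrong leaderUrl in its solrconfig.xml.
Receiving text/html instead of binary data means the response came from something other than Solr's replication handler, such as a web server, a proxy, or a generic error page.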
Transient occurrences during maintenance or restarts are normal and self-resolving.
Persistent occurrences indicate the leader is down or misconfigured and needs attention.
Use the Error Audit to track frequency and determine if the issue is transient or persistent.

Need help with replication issues? Contact us at support@opensolr.com — we will diagnose the leader and get your replication back on track.