How Parallel Solr Indexing Works - pcntl_fork Architecture

Master/worker architecture with pcntl_fork

How It Works

Master Process → Fork Workers → Parallel Indexing → Respawn

The turbo indexer uses pcntl_fork() and pcntl_exec() to spawn independent worker processes. Each worker:

  1. Bootstraps a fresh Drupal/Drush instance
  2. Requests a batch of items from the Search API tracker
  3. Indexes them to Solr
  4. Exits cleanly

The master process monitors workers and respawns each one as it completes, continuing until every item is indexed. The Search API tracker handles coordination - each worker claims a different batch of items to index, with minimal overlap.
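
The fork-and-respawn loop can be sketched in Python, whose `os.fork()`/`os.waitpid()` map directly onto PHP's `pcntl_fork()`/`pcntl_waitpid()`. The batch-claiming and worker bodies here are hypothetical stand-ins, not the indexer's actual code:

```python
import os

TOTAL_ITEMS = 20  # illustrative numbers, not real configuration
BATCH_SIZE = 5

def claim_batch(cursor, batch_size, total):
    """Stand-in for the Search API tracker handing out item IDs."""
    return list(range(cursor, min(cursor + batch_size, total)))

def run_worker(items):
    """Stand-in for: bootstrap Drupal, index `items` to Solr, exit."""
    os._exit(0)  # exit cleanly so the master can respawn

def master(num_workers=3):
    cursor = 0
    active = {}   # pid -> batch currently being indexed
    indexed = 0
    while cursor < TOTAL_ITEMS or active:
        # Respawn workers until the target count is running.
        while len(active) < num_workers and cursor < TOTAL_ITEMS:
            batch = claim_batch(cursor, BATCH_SIZE, TOTAL_ITEMS)
            cursor += len(batch)
            pid = os.fork()          # analogous to pcntl_fork()
            if pid == 0:
                run_worker(batch)    # child never returns
            active[pid] = batch
        # Block until a worker finishes (analogous to pcntl_waitpid()).
        pid, status = os.waitpid(-1, 0)
        batch = active.pop(pid)
        if os.waitstatus_to_exitcode(status) == 0:
            indexed += len(batch)
    return indexed
```

The real indexer differs in one key respect: instead of calling a function in the child, it `pcntl_exec()`s a fresh Drush process so each worker gets a clean Drupal bootstrap.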

💡 Why Parallel Works
Solr indexing is idempotent - sending the same document twice just overwrites it. This means occasional overlap between workers is harmless, so we can skip expensive locking mechanisms and run at full speed.
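
The idempotency argument can be shown with a toy in-memory "index" keyed by document ID (Solr overwrites by its `uniqueKey` field the same way a dict assignment does):

```python
# Toy model: two overlapping workers both index the same document.
index = {}

def upsert(doc):
    # Solr overwrites an existing document with the same uniqueKey;
    # modeled here as a plain dict assignment.
    index[doc["id"]] = doc

upsert({"id": 42, "title": "Hello"})  # worker A
upsert({"id": 42, "title": "Hello"})  # worker B, same item

assert len(index) == 1  # still exactly one document, no corruption
```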

Architecture

Master Process

The master process (started by drush ost) is responsible for:

  • Spawning the initial set of workers
  • Monitoring worker health via posix_kill()
  • Reaping zombie processes
  • Respawning workers when they complete
  • Displaying progress information

Worker Processes

Each worker is a completely independent PHP process that:

  • Has its own memory space (no shared state)
  • Bootstraps Drupal fresh (clean entity cache)
  • Processes exactly one batch then exits
  • Reports errors via exit codes
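
A worker's whole lifecycle is "claim one batch, index it, exit with a status code". A minimal sketch, with `index_batch` as a hypothetical stand-in for the Drupal bootstrap and Solr posting:

```python
import os

def index_batch(items):
    """Stand-in for bootstrapping Drupal and posting documents to Solr."""
    if not items:
        raise RuntimeError("tracker returned no items")
    return len(items)

def worker_main(items):
    """Process exactly one batch, then report the outcome via exit code."""
    try:
        index_batch(items)
    except Exception:
        os._exit(1)  # nonzero: master can log the failure
    os._exit(0)      # clean exit: batch fully indexed
```

Because the process exits after one batch, memory leaks and stale entity caches cannot accumulate across batches.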

Dynamic Worker Management

The indexer maintains the exact number of workers you request. If you specify --workers=7, it ensures 7 workers are always running:

  • Workers are counted in real-time using posix_kill($pid, 0)
  • Dead workers are detected and removed from tracking
  • New workers are spawned until the target count is reached
  • 150ms stagger between spawns prevents bootstrap race conditions
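
Those four bullets together form one "top-up" pass, sketched below. `spawn` is a hypothetical callable returning a new worker PID; the default stagger mirrors the 150ms bootstrap spacing:

```python
import os
import time

def alive(pid):
    """Liveness probe, like posix_kill($pid, 0)."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False

def top_up_workers(pids, target, spawn, stagger=0.15):
    """Prune dead PIDs, then spawn until `target` workers are running."""
    pids = [pid for pid in pids if alive(pid)]  # drop dead workers
    while len(pids) < target:
        pids.append(spawn())
        time.sleep(stagger)  # stagger bootstraps to avoid races
    return pids
```

The master runs this pass on every monitoring tick, so a crashed or completed worker is replaced within one iteration and the pool never drifts below the requested size.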