True Parallel Architecture (v2.1)

Fetch Items → Partition → Spawn Workers → Wait & Repeat

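Version 2.1 introduces a completely rewritten parallelism model based on pre-fetch partitioning. Because each worker receives its own set of items up front, workers no longer contend for the same items, and all of them can index at the same time. In practice this yields roughly a 5-10x speedup.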

🚀 What Changed
Previous versions used indexItems(), which has internal locking, so only one worker could claim items at a time. Now the master pre-fetches items and distributes them, so workers never compete.

The Problem with Traditional Approach

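Search API's $index->indexItems() method queries the tracker for pending items, marks them as in-progress, and indexes them. When multiple workers call this simultaneously, the method's internal locking lets only one worker claim items at a time, so the remaining workers sit idle while they wait.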

The Solution: Pre-fetch Partitioning

The master process now handles item fetching and distribution:

Master Process Loop:
│
├── 1. Fetch N items directly from database
│      SELECT item_id FROM search_api_item
│      WHERE status=1 LIMIT (workers × batch)
│
├── 2. Split into chunks (one per worker)
│      Worker 1: items 0-199
│      Worker 2: items 200-399
│      Worker 3: items 400-599
│      ... and so on
│
├── 3. Write chunks to temp files
│      /tmp/opensolr_chunks/chunk_index_0.json
│      /tmp/opensolr_chunks/chunk_index_1.json
│      ...
│
├── 4. Spawn workers (each gets a chunk file)
│      pcntl_fork() + pcntl_exec(drush opensolr:worker)
│
├── 5. Wait for ALL workers to complete
│      pcntl_waitpid() for each PID
│
├── 6. Clean up chunk files
│
└── 7. Repeat until no items remaining
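
The partitioning and chunk-file steps (2, 3 and 6 above) can be sketched as follows. This is a minimal illustration in Python, not the module's actual PHP code: the function names `partition` and `write_chunks`, the made-up item IDs, and the use of a throwaway temporary directory instead of /tmp/opensolr_chunks/ are all assumptions for the example. Only the chunk_index_N.json naming follows the layout shown above.

```python
import json
import math
import tempfile
from pathlib import Path


def partition(item_ids, workers):
    """Split item_ids into at most `workers` contiguous, non-overlapping chunks."""
    if not item_ids:
        return []
    size = math.ceil(len(item_ids) / workers)
    return [item_ids[i:i + size] for i in range(0, len(item_ids), size)]


def write_chunks(chunks, chunk_dir):
    """Write each chunk to chunk_index_<n>.json and return the file paths."""
    chunk_dir = Path(chunk_dir)
    chunk_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunks):
        path = chunk_dir / f"chunk_index_{n}.json"
        path.write_text(json.dumps(chunk))
        paths.append(path)
    return paths


def demo():
    # One round: 3 workers x batch of 200 = 600 pending IDs (hypothetical IDs).
    pending = [f"entity:node/{i}:en" for i in range(600)]
    chunks = partition(pending, workers=3)
    # The temporary directory is removed on exit, mirroring the cleanup step.
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_chunks(chunks, tmp)
        names = [p.name for p in paths]
        loaded = [json.loads(p.read_text()) for p in paths]
    return chunks, names, loaded
```

Because the chunks are contiguous slices of one fetched list, no item ID can appear in two chunk files, which is the property the workers rely on.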

Worker Process

Each worker now has a simple, focused job:

  1. Read assigned item IDs from chunk file
  2. Load entities via $datasource->loadMultiple()
  3. Index items via $index->indexSpecificItems()
  4. Exit
💡 Why This Works
Each worker has a dedicated, non-overlapping set of items. No contention, no locking, no wasted cycles. True parallel processing.
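
The worker's job can be sketched in a few lines. This is a hedged Python illustration rather than the module's PHP worker: `run_worker` and the `index_items` callback are hypothetical names, the callback stands in for the load-and-index calls listed above, and the demo runs the workers one after another in a single process purely to show that their inputs never overlap.

```python
import json
import tempfile
from pathlib import Path


def run_worker(chunk_file, index_items):
    """Read the assigned IDs from chunk_file, index them, return the count."""
    item_ids = json.loads(Path(chunk_file).read_text())
    index_items(item_ids)  # stand-in for loading entities and indexing them
    return len(item_ids)


def demo():
    indexed = []  # records every ID that any worker indexed
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for n, ids in enumerate([["a", "b"], ["c", "d"], ["e"]]):
            path = Path(tmp) / f"chunk_index_{n}.json"
            path.write_text(json.dumps(ids))
            files.append(path)
        counts = [run_worker(f, indexed.extend) for f in files]
    return counts, indexed
```

Each worker only ever sees the IDs in its own file, so it needs no lock and no coordination with its siblings.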

Resume Support

A key benefit of this architecture is full resume support. You can stop and restart the indexer at any time without missing or duplicating items.

How Resume Works

  1. Tracker is source of truth — The search_api_item table tracks each item's status
  2. Items marked after success — Status changes from 1 (pending) to 0 (indexed) only after successful indexing
  3. Safe to stop anytime — Incomplete items remain status=1 and are picked up on restart
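
The three rules above can be illustrated with a small simulation. This is a sketch under stated assumptions: the tracker is modeled as an in-memory Python dict instead of the search_api_item table, and `index_round` with its `fail_after` parameter is a hypothetical helper that mimics stopping the indexer partway through a round.

```python
PENDING, INDEXED = 1, 0  # same status values as described above


def index_round(tracker, limit, fail_after=None):
    """Index up to `limit` pending items; optionally stop early to mimic a stop."""
    pending = [item for item, status in tracker.items() if status == PENDING][:limit]
    done = 0
    for item in pending:
        if fail_after is not None and done >= fail_after:
            break  # interrupted: the remaining items keep status = 1
        # ... the item would be sent to the search backend here ...
        tracker[item] = INDEXED  # marked only after success
        done += 1
    return done


def demo():
    tracker = {f"item-{i}": PENDING for i in range(10)}
    first = index_round(tracker, limit=6, fail_after=4)  # stopped mid-round
    remaining_after_stop = sum(s == PENDING for s in tracker.values())
    total = first
    # Restart: keep running rounds until nothing is pending.
    while (n := index_round(tracker, limit=6)) > 0:
        total += n
    return first, remaining_after_stop, total, tracker
```

In the demo, four items are marked indexed before the simulated stop, six remain pending, and the restart picks up exactly those six.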

What Happens on Stop

Item State | Status | On Restart
Already indexed | status = 0 | Skipped (not re-fetched)
In-progress when stopped | status = 1 | Will be retried
Not yet fetched | status = 1 | Will be fetched
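✅ Safe Guarantee
No missed items and no duplicate tracker entries. Items that were in progress when the indexer stopped stay at status = 1 and are retried on restart, so every item ends up indexed across any number of stop/restart cycles.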

Resume Example

# Start indexing
drush ost --workers=10 --batch=500 --background

# Check progress
drush oss
# Shows: 45% complete, 300,000 remaining

# Stop for maintenance
drush osstop

# ... later ...

# Resume - picks up where it left off
drush ost --workers=10 --batch=500 --background

# Check progress
drush oss
# Shows: 45% complete, 300,000 remaining (same as before)

Performance Comparison

Metric | v2.0 (Old) | v2.1 (New)
10 workers requested | ~1 actually working | 10 actually working
Throughput | ~100-200 items/sec | ~500-1500 items/sec
Worker contention | High (competing for items) | None (pre-partitioned)
Resume support | Yes | Yes
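
To put the throughput figures in perspective, the arithmetic below estimates how long the 300,000 remaining items from the resume example would take at each rate, and how many master rounds that is with --workers=10 --batch=500. It is a rough back-of-the-envelope sketch: the helper names are hypothetical, and it assumes the throughput stays constant for the whole run.

```python
import math


def rounds_needed(remaining, workers, batch):
    """Master rounds required when each round fetches workers x batch items."""
    return math.ceil(remaining / (workers * batch))


def duration_range_seconds(remaining, low_rate, high_rate):
    """Return (best case, worst case) run time in seconds for a rate range."""
    return remaining / high_rate, remaining / low_rate


remaining = 300_000
rounds = rounds_needed(remaining, workers=10, batch=500)
old_best, old_worst = duration_range_seconds(remaining, 100, 200)
new_best, new_worst = duration_range_seconds(remaining, 500, 1500)
```

Under these assumptions the run takes 60 rounds. At the v2.0 rates it needs 1,500 to 3,000 seconds (25 to 50 minutes), while at the v2.1 rates it needs 200 to 600 seconds (about 3 to 10 minutes).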

Temp Files

Chunk files are stored in /tmp/opensolr_chunks/ and are automatically cleaned up after each round. If the indexer crashes, leftover chunk files may remain. Once no indexer or worker processes are still running, you can safely delete them:

rm -rf /tmp/opensolr_chunks/