True Parallel Architecture (v2.1)
Version 2.1 introduces a completely rewritten parallelism model based on pre-fetch partitioning. It eliminates contention between workers for the same items and achieves true parallel processing, with roughly a 5-10x throughput improvement over v2.0.
In v2.0, each worker called Search API's indexItems(), which has internal locking, so only one worker could claim items at a time. Now items are pre-fetched and distributed up front, so workers never compete.
The Problem with the Traditional Approach
Search API's $index->indexItems() method queries the tracker for pending items, marks them as in-progress, and indexes them. When multiple workers call this simultaneously:
- Only one worker successfully claims items (database locking)
- Other workers get 0 items and spin/retry
- Result: 10 workers running, but only 1 actually indexing
- No speedup despite spawning multiple processes
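To see why extra workers do not help, here is a toy Python model of the v2.0 pattern. It is hypothetical and not Search API's implementation: `ContendedTracker` and `worker` are made-up names, claiming and indexing run under a single lock, and a worker that finds the lock taken gets 0 items back and retries.

```python
import threading
import time


class ContendedTracker:
    """Toy model of the v2.0 pattern (hypothetical, not Search API code):
    claiming and indexing run under one lock, and a worker that finds the
    lock taken gets 0 items back instead of waiting."""

    def __init__(self, item_ids):
        self.pending = list(item_ids)
        self.lock = threading.Lock()
        self.active = 0      # workers indexing right now (only changed under the lock)
        self.max_active = 0  # peak number of workers indexing at the same time

    def index_items(self, batch):
        if not self.lock.acquire(blocking=False):
            return 0  # another worker holds the claim: nothing for us this time
        try:
            claimed, self.pending = self.pending[:batch], self.pending[batch:]
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            time.sleep(0.001 * len(claimed))  # simulated indexing work
            self.active -= 1
            return len(claimed)
        finally:
            self.lock.release()


def worker(tracker, batch, indexed, spins, idx):
    while tracker.pending:
        n = tracker.index_items(batch)
        if n == 0:
            spins[idx] += 1  # got nothing: back off briefly and retry
            time.sleep(0.0005)
        indexed[idx] += n


tracker = ContendedTracker(range(100))
indexed, spins = [0] * 4, [0] * 4
threads = [threading.Thread(target=worker, args=(tracker, 10, indexed, spins, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("indexed:", sum(indexed), "peak concurrency:", tracker.max_active)
```

All 100 items get indexed, but the peak number of workers indexing at once stays at 1 no matter how many threads are started; the rest of the time is spent spinning.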
The Solution: Pre-fetch Partitioning
The master process now handles item fetching and distribution:
Master Process Loop:
│
├── 1. Fetch N items directly from database
│ SELECT item_id FROM search_api_item
│ WHERE status=1 LIMIT (workers × batch)
│
├── 2. Split into chunks (one per worker)
│ Worker 1: items 0-199
│ Worker 2: items 200-399
│ Worker 3: items 400-599
│ ... and so on
│
├── 3. Write chunks to temp files
│ /tmp/opensolr_chunks/chunk_index_0.json
│ /tmp/opensolr_chunks/chunk_index_1.json
│ ...
│
├── 4. Spawn workers (each gets a chunk file)
│ pcntl_fork() + pcntl_exec(drush opensolr:worker)
│
├── 5. Wait for ALL workers to complete
│ pcntl_waitpid() for each PID
│
├── 6. Clean up chunk files
│
└── 7. Repeat until no items remaining
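The master loop above can be sketched in Python as a language-neutral illustration, not the module's PHP code. Assumptions: an SQLite file stands in for the Drupal database, the tracker table is reduced to hypothetical `item_id` and `status` columns, `subprocess.Popen` and `wait()` play the roles of `pcntl_fork()` + `pcntl_exec()` and `pcntl_waitpid()`, and the inline `WORKER_SRC` script is a hypothetical stand-in for `drush opensolr:worker` that only marks its items as indexed.

```python
import json
import os
import sqlite3
import subprocess
import sys
import tempfile

# Hypothetical stand-in for `drush opensolr:worker`: reads its chunk file
# and marks those items as indexed (status = 0) in the tracker table.
WORKER_SRC = """
import json, sqlite3, sys
db_path, chunk_path = sys.argv[1], sys.argv[2]
with open(chunk_path) as f:
    ids = json.load(f)
con = sqlite3.connect(db_path, timeout=30)
with con:  # commits on success
    con.executemany("UPDATE search_api_item SET status = 0 WHERE item_id = ?",
                    [(i,) for i in ids])
con.close()
"""


def run_master(db_path, chunk_dir, workers, batch):
    """Fetch, partition, spawn, wait, clean up; repeat until nothing is pending."""
    rounds = 0
    while True:
        # 1. Fetch up to workers x batch pending items straight from the tracker.
        con = sqlite3.connect(db_path)
        ids = [row[0] for row in con.execute(
            "SELECT item_id FROM search_api_item WHERE status = 1 "
            "ORDER BY item_id LIMIT ?", (workers * batch,))]
        con.close()
        if not ids:
            return rounds  # 7. No items remaining.

        # 2. Split into contiguous chunks of `batch` items, one per worker.
        chunks = [ids[i:i + batch] for i in range(0, len(ids), batch)]

        # 3. Write each chunk to its own file.
        os.makedirs(chunk_dir, exist_ok=True)
        paths = []
        for n, chunk in enumerate(chunks):
            path = os.path.join(chunk_dir, f"chunk_index_{n}.json")
            with open(path, "w") as f:
                json.dump(chunk, f)
            paths.append(path)

        try:
            # 4. Spawn one worker process per chunk file.
            procs = [subprocess.Popen([sys.executable, "-c", WORKER_SRC, db_path, p])
                     for p in paths]
            # 5. Wait for ALL workers before starting the next round.
            codes = [proc.wait() for proc in procs]
        finally:
            # 6. Remove this round's chunk files, even if something failed.
            for p in paths:
                os.remove(p)
        if any(codes):
            raise RuntimeError(f"worker exit codes: {codes}")
        rounds += 1


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "tracker.sqlite")
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE search_api_item (item_id INTEGER PRIMARY KEY, status INTEGER)")
    con.executemany("INSERT INTO search_api_item VALUES (?, 1)", [(i,) for i in range(1000)])
    con.commit()
    con.close()

    rounds = run_master(db_path, os.path.join(tmp, "chunks"), workers=4, batch=100)

    con = sqlite3.connect(db_path)
    pending = con.execute("SELECT COUNT(*) FROM search_api_item WHERE status = 1").fetchone()[0]
    con.close()
    leftover = os.listdir(os.path.join(tmp, "chunks"))

print(rounds, pending, leftover)  # 3 0 []
```

With 1,000 items, 4 workers, and a batch of 100, the rounds fetch 400, 400, and 200 items. Waiting for every worker before the next fetch keeps rounds from overlapping: the next SELECT only runs once each item from the previous round is either marked indexed or still pending.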
Worker Process
Each worker now has a simple, focused job:
- Read assigned item IDs from chunk file
- Load entities via $datasource->loadMultiple()
- Index items via $index->indexSpecificItems()
- Exit
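The worker steps can be sketched the same way. `run_worker`, `load_multiple`, and `index_specific_items` are hypothetical Python names: the two callbacks stand in for `$datasource->loadMultiple()` and `$index->indexSpecificItems()`, the fake loader simply omits IDs it cannot find, and the return value plays the role of the process exit code.

```python
import json
import os
import tempfile


def run_worker(chunk_path, load_multiple, index_specific_items):
    """Hypothetical worker entry point mirroring the steps listed above.

    load_multiple(ids) -> dict of id -> item     (stand-in for loadMultiple())
    index_specific_items(items) -> indexed ids   (stand-in for indexSpecificItems())
    Returns 0 if every loaded item was indexed, else 1.
    """
    # 1. Read the item IDs the master assigned to this worker.
    with open(chunk_path) as f:
        item_ids = json.load(f)

    # 2. Load all entities for the chunk in one call.
    items = load_multiple(item_ids)

    # 3. Index exactly these items; nothing is claimed from a shared queue.
    indexed = index_specific_items(items)

    # 4. Exit. Items not indexed stay pending in the tracker for a later round.
    return 0 if set(indexed) == set(items) else 1


catalog = {1: "node 1", 2: "node 2", 4: "node 4"}  # item 3 no longer exists


def fake_load(ids):
    return {i: catalog[i] for i in ids if i in catalog}


def fake_index(items):
    return list(items)  # pretend the backend accepted everything


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunk_index_0.json")
    with open(path, "w") as f:
        json.dump([1, 2, 3, 4], f)
    exit_code = run_worker(path, fake_load, fake_index)

print(exit_code)  # 0
```

A real entry point would pass that value to `sys.exit()`. Because the worker only touches the IDs in its own file, it needs no lock to decide what to index.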
Resume Support
A key benefit of this architecture is full resume support. You can stop and restart the indexer at any time without losing items; at worst, items that were mid-chunk when you stopped are indexed again on restart.
How Resume Works
- The tracker is the source of truth: the search_api_item table tracks each item's status.
- Items are marked only after success: status changes from 1 (pending) to 0 (indexed) only after successful indexing.
- Safe to stop anytime: incomplete items remain at status=1 and are picked up on restart.
What Happens on Stop
| Item State | Status | On Restart |
|---|---|---|
| Already indexed | status = 0 | Skipped (not re-fetched) |
| In-progress when stopped | status = 1 | Will be retried |
| Not yet fetched | status = 1 | Will be fetched |
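The resume rules and the table above can be simulated with an in-memory SQLite tracker. This is a sketch under simplifying assumptions: the table keeps only hypothetical `item_id` and `status` columns, two workers with a batch of 3 share one round, and `IndexerStopped` is a made-up exception standing in for a stop or crash that hits the second worker before its chunk is marked.

```python
import sqlite3


class IndexerStopped(Exception):
    """Hypothetical: simulates a stop or crash interrupting a worker."""


def make_tracker(n):
    con = sqlite3.connect(":memory:")
    # Simplified tracker: status 1 = pending, 0 = indexed.
    con.execute("CREATE TABLE search_api_item "
                "(item_id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")
    con.executemany("INSERT INTO search_api_item VALUES (?, 1)",
                    [(i,) for i in range(1, n + 1)])
    con.commit()
    return con


def fetch_pending(con, limit):
    return [row[0] for row in con.execute(
        "SELECT item_id FROM search_api_item WHERE status = 1 "
        "ORDER BY item_id LIMIT ?", (limit,))]


def index_chunk(con, chunk, stop=False):
    # (The backend call that sends the chunk to Solr is omitted.)
    if stop:
        raise IndexerStopped("stopped before the chunk finished")
    # Items are marked indexed only after the backend call succeeded.
    con.executemany("UPDATE search_api_item SET status = 0 WHERE item_id = ?",
                    [(i,) for i in chunk])
    con.commit()


con = make_tracker(10)
round_one = fetch_pending(con, limit=2 * 3)  # 2 workers x batch 3 -> items 1..6
index_chunk(con, round_one[:3])              # worker 1 finishes: 1..3 -> status 0
try:
    index_chunk(con, round_one[3:], stop=True)  # worker 2 is stopped mid-chunk
except IndexerStopped:
    pass

# Restart: 1..3 are skipped, 4..6 are retried, 7..10 are fetched for the first time.
print(fetch_pending(con, limit=100))  # [4, 5, 6, 7, 8, 9, 10]
```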
Resume Example
# Start indexing
drush ost --workers=10 --batch=500 --background
# Check progress
drush oss
# Shows: 45% complete, 300,000 remaining
# Stop for maintenance
drush osstop
# ... later ...
# Resume - picks up where it left off
drush ost --workers=10 --batch=500 --background
# Check progress
drush oss
# Shows: 45% complete, 300,000 remaining (same as before)
Performance Comparison
| Metric | v2.0 (Old) | v2.1 (New) |
|---|---|---|
| 10 workers requested | ~1 actually working | 10 actually working |
| Throughput | ~100-200 items/sec | ~500-1500 items/sec |
| Worker contention | High (competing for items) | None (pre-partitioned) |
| Resume support | Yes | Yes |
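As a rough illustration of what the throughput row means in practice, divide the 300,000 remaining items from the resume example by the table's ranges. These are the table's figures, not measurements for any particular site; actual throughput depends on the hardware, the Solr server, and the content being indexed.

```latex
\begin{aligned}
t &= \frac{N_{\text{remaining}}}{\text{throughput}} \\[4pt]
t_{\text{v2.0}} &= \frac{300{,}000}{200\ \text{items/s}} \text{ to } \frac{300{,}000}{100\ \text{items/s}}
  = 1{,}500 \text{ to } 3{,}000\ \text{s} \;(25 \text{ to } 50\ \text{min}) \\[4pt]
t_{\text{v2.1}} &= \frac{300{,}000}{1{,}500\ \text{items/s}} \text{ to } \frac{300{,}000}{500\ \text{items/s}}
  = 200 \text{ to } 600\ \text{s} \;(\approx 3.3 \text{ to } 10\ \text{min})
\end{aligned}
```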
Temp Files
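Chunk files are stored in /tmp/opensolr_chunks/ and are automatically cleaned up after each round. If the indexer crashes, you can safely delete any leftover chunk files, because the tracker, not the chunk files, records which items still need indexing: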
rm -rf /tmp/opensolr_chunks/