Solr's DataImportHandler (DIH) was removed in Solr 9. If you were using DIH to pull data from MySQL or PostgreSQL directly into Solr, you now need a simple script that queries your database, converts each row to a JSON document, and POSTs it to Solr's /update endpoint.
This guide gives you complete, working examples in curl, PHP, and Python — with MySQL connections, batching, and progress reporting.
Also see: How to Import XML Data into Opensolr if your source data is in XML format.
The Solr JSON Format
Solr accepts documents as a JSON array. Each object in the array becomes one document in your index:
```json
[
  {
    "id": "product-001",
    "title": "Wireless Headphones",
    "description": "Premium noise-cancelling wireless headphones",
    "price": 149.99,
    "category": "Electronics",
    "in_stock": true
  },
  {
    "id": "product-002",
    "title": "Bluetooth Speaker",
    "description": "Portable waterproof Bluetooth speaker",
    "price": 79.99,
    "category": "Electronics",
    "in_stock": true
  }
]
```
Key rules:
- The payload is a JSON array of objects
- Each object is one Solr document
- Field names must match fields in your `schema.xml`
- The `id` field is required (unique key for each document)
- Numeric fields should be sent as numbers (not strings): `149.99`, not `"149.99"`
- Boolean fields should be sent as `true`/`false` (not `"true"`)
- You can send as many documents per batch as you want (recommended: 500–1000 per batch)
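The typing rules above can be wrapped in a small helper that coerces a raw database row into a Solr-ready document. This is a minimal sketch: `to_solr_doc` is an illustrative name, and the fields `price`, `in_stock`, and `created_at` are the example schema used throughout this guide.

```python
import json
from datetime import datetime
from decimal import Decimal

def to_solr_doc(row):
    """Coerce a raw DB row (strings, Decimals, tinyints) into a
    correctly typed Solr document, per the rules above."""
    doc = dict(row)
    if doc.get("price") is not None:
        doc["price"] = float(doc["price"])  # number, not "149.99"
    if "in_stock" in doc:
        # real JSON boolean, not the string "true"
        doc["in_stock"] = doc["in_stock"] in (True, 1, "1", "true")
    if isinstance(doc.get("created_at"), datetime):
        doc["created_at"] = doc["created_at"].strftime("%Y-%m-%dT%H:%M:%SZ")
    return doc

row = {"id": "product-001", "price": Decimal("149.99"), "in_stock": 1}
# The payload is a JSON array: numeric price, boolean in_stock
print(json.dumps([to_solr_doc(row)]))
```

Running each row through a helper like this before batching keeps type conversion in one place instead of scattered through the import loop.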
Example 1: curl — Quick Import from a JSON File
If you already have your data exported as a JSON file, the simplest way to import it:
```bash
#!/bin/bash
# =====================================================================
# CONFIGURATION — Replace with YOUR Opensolr index details
# =====================================================================
SOLR_HOST="YOUR_HOST"
INDEX_NAME="YOUR_INDEX"
USERNAME="opensolr"
PASSWORD="YOUR_API_KEY"
JSON_FILE="products.json"

# Import the JSON file into Solr
echo "Importing $JSON_FILE into $INDEX_NAME..."

curl -u "$USERNAME:$PASSWORD" \
  -H "Content-Type: application/json" \
  --data-binary @"$JSON_FILE" \
  "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"

echo ""
echo "Done!"
```
You can also pipe a MySQL query straight into Solr, using the mysql CLI plus a short Python filter to convert the tab-separated output to JSON (note that this quick pipe sends every value as a string; use the scripted examples below when numeric or boolean types matter):
```bash
# One-liner: MySQL → JSON → Solr
mysql -u dbuser -pdbpass mydb -e "SELECT id, title, price FROM products" --batch --raw \
  | python3 -c "
import sys, json, csv
reader = csv.DictReader(sys.stdin, delimiter='\t')
docs = [dict(r) for r in reader]
print(json.dumps(docs))
" \
  | curl -u "$USERNAME:$PASSWORD" \
    -H "Content-Type: application/json" \
    -d @- \
    "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"
```
Example 2: PHP — MySQL to Solr with Batching
This script connects to your MySQL database, reads rows, converts them to JSON, and sends them to Solr in configurable batches. It handles millions of rows without running out of memory.
```php
<?php
// =====================================================================
// CONFIGURATION — Replace with YOUR details
// =====================================================================

// Your MySQL database
$db_host = "localhost";
$db_user = "your_db_user";
$db_pass = "your_db_password";
$db_name = "your_database";

// Your Opensolr index
$solr_host  = "YOUR_HOST";    // e.g. de9.solrcluster.com
$index_name = "YOUR_INDEX";   // e.g. my_products
$solr_user  = "opensolr";
$solr_pass  = "YOUR_API_KEY"; // from your Opensolr control panel

$batch_size = 500;            // documents per batch
$solr_url   = "https://$solr_host/solr/$index_name/update?commit=true&wt=json";

// =====================================================================
// CONNECT TO MYSQL
// =====================================================================
$db = new mysqli($db_host, $db_user, $db_pass, $db_name);
if ($db->connect_error) {
    die("MySQL connection failed: " . $db->connect_error . "\n");
}
$db->set_charset("utf8mb4");

// =====================================================================
// QUERY YOUR DATA
// =====================================================================
// Adjust this query to match YOUR table and columns.
// Column names become Solr field names — make sure they match your schema.xml
$sql = "SELECT
            id,
            title,
            description,
            price,
            category,
            CASE WHEN in_stock = 1 THEN 'true' ELSE 'false' END AS in_stock,
            DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%sZ') AS created_at
        FROM products
        ORDER BY id";

$result = $db->query($sql);
if (!$result) {
    die("Query failed: " . $db->error . "\n");
}

$total = $result->num_rows;
echo "Found $total rows to index\n";

// =====================================================================
// SEND TO SOLR IN BATCHES
// =====================================================================
$batch = [];
$sent  = 0;

while ($row = $result->fetch_assoc()) {
    // Convert numeric strings to actual numbers for Solr
    if (isset($row['price'])) {
        $row['price'] = (float)$row['price'];
    }
    // Convert boolean strings
    if (isset($row['in_stock'])) {
        $row['in_stock'] = ($row['in_stock'] === 'true');
    }

    $batch[] = $row;

    if (count($batch) >= $batch_size) {
        $sent += send_batch($batch, $solr_url, $solr_user, $solr_pass);
        $batch = [];
        echo "  Indexed $sent / $total\n";
    }
}

// Send remaining rows
if (!empty($batch)) {
    $sent += send_batch($batch, $solr_url, $solr_user, $solr_pass);
    echo "  Indexed $sent / $total\n";
}

echo "Done! $sent documents indexed.\n";
$db->close();

// =====================================================================
// HELPER: Send a batch of documents to Solr as JSON
// =====================================================================
function send_batch($docs, $url, $user, $pass) {
    $json = json_encode($docs, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_USERPWD        => "$user:$pass",
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $json,
        CURLOPT_HTTPHEADER     => ["Content-Type: application/json"],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 120,
    ]);
    $response  = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($http_code !== 200) {
        echo "  ERROR (HTTP $http_code): $response\n";
        return 0;
    }
    return count($docs);
}
```
Example 3: Python — MySQL to Solr with Batching
The same approach in Python, using `mysql-connector-python` and `requests`. Rows are streamed from an unbuffered cursor one at a time, so large result sets never have to fit in memory.
```python
# import_sql.py — Query MySQL and index into Opensolr
import mysql.connector
import requests
import json
import sys

# =====================================================================
# CONFIGURATION — Replace with YOUR details
# =====================================================================

# Your MySQL database
DB_HOST = "localhost"
DB_USER = "your_db_user"
DB_PASS = "your_db_password"
DB_NAME = "your_database"

# Your Opensolr index
SOLR_HOST = "YOUR_HOST"     # e.g. de9.solrcluster.com
INDEX_NAME = "YOUR_INDEX"   # e.g. my_products
SOLR_USER = "opensolr"
SOLR_PASS = "YOUR_API_KEY"  # from your Opensolr control panel

BATCH_SIZE = 500            # documents per batch
SOLR_URL = f"https://{SOLR_HOST}/solr/{INDEX_NAME}/update?commit=true&wt=json"


def send_batch(docs, url, auth):
    """Send a batch of documents to Solr as JSON."""
    resp = requests.post(
        url,
        data=json.dumps(docs, default=str).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        auth=auth,
        timeout=120,
    )
    if resp.status_code != 200:
        print(f"  ERROR (HTTP {resp.status_code}): {resp.text}")
        return 0
    return len(docs)


def main():
    auth = (SOLR_USER, SOLR_PASS)

    # =================================================================
    # CONNECT TO MYSQL
    # =================================================================
    conn = mysql.connector.connect(
        host=DB_HOST,
        user=DB_USER,
        password=DB_PASS,
        database=DB_NAME,
        charset="utf8mb4",
    )
    cursor = conn.cursor(dictionary=True)

    # =================================================================
    # QUERY YOUR DATA
    # =================================================================
    # Adjust this query to match YOUR table and columns.
    # Column names become Solr field names.
    sql = """
        SELECT id, title, description, price, category, in_stock, created_at
        FROM products
        ORDER BY id
    """
    cursor.execute(sql)

    # =================================================================
    # SEND TO SOLR IN BATCHES
    # =================================================================
    batch = []
    sent = 0
    total = 0

    for row in cursor:
        # Convert types for Solr
        doc = dict(row)

        # Convert Decimal to float
        if "price" in doc and doc["price"] is not None:
            doc["price"] = float(doc["price"])

        # Convert datetime to ISO string
        if "created_at" in doc and doc["created_at"] is not None:
            doc["created_at"] = doc["created_at"].strftime("%Y-%m-%dT%H:%M:%SZ")

        # Convert tinyint to boolean
        if "in_stock" in doc:
            doc["in_stock"] = bool(doc["in_stock"])

        batch.append(doc)
        total += 1

        if len(batch) >= BATCH_SIZE:
            sent += send_batch(batch, SOLR_URL, auth)
            batch = []
            print(f"  Indexed {sent} / {total}...")

    # Send remaining rows
    if batch:
        sent += send_batch(batch, SOLR_URL, auth)
        print(f"  Indexed {sent} / {total}...")

    print(f"Done! {sent} documents indexed out of {total} rows.")

    cursor.close()
    conn.close()


if __name__ == "__main__":
    main()
```
Install dependencies:
pip install mysql-connector-python requests
Example 4: Python — PostgreSQL to Solr
If your database is PostgreSQL instead of MySQL, swap the connector:
```python
# import_postgres.py — Query PostgreSQL and index into Opensolr
import psycopg2
import psycopg2.extras
import requests
import json

# =====================================================================
# CONFIGURATION
# =====================================================================
DB_HOST = "localhost"
DB_USER = "your_db_user"
DB_PASS = "your_db_password"
DB_NAME = "your_database"

SOLR_HOST = "YOUR_HOST"
INDEX_NAME = "YOUR_INDEX"
SOLR_USER = "opensolr"
SOLR_PASS = "YOUR_API_KEY"

BATCH_SIZE = 500
SOLR_URL = f"https://{SOLR_HOST}/solr/{INDEX_NAME}/update?commit=true&wt=json"


def send_batch(docs, url, auth):
    """Send a batch of documents to Solr as JSON."""
    resp = requests.post(
        url,
        data=json.dumps(docs, default=str).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        auth=auth,
        timeout=120,
    )
    if resp.status_code != 200:
        print(f"  ERROR (HTTP {resp.status_code}): {resp.text}")
        return 0
    return len(docs)


def main():
    auth = (SOLR_USER, SOLR_PASS)

    conn = psycopg2.connect(
        host=DB_HOST, user=DB_USER, password=DB_PASS, dbname=DB_NAME
    )
    cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

    cursor.execute("""
        SELECT id, title, description, price, category, in_stock, created_at
        FROM products
        ORDER BY id
    """)

    batch = []
    sent = 0
    total = 0

    for row in cursor:
        doc = dict(row)

        if "price" in doc and doc["price"] is not None:
            doc["price"] = float(doc["price"])
        if "created_at" in doc and doc["created_at"] is not None:
            doc["created_at"] = doc["created_at"].strftime("%Y-%m-%dT%H:%M:%SZ")
        if "in_stock" in doc:
            doc["in_stock"] = bool(doc["in_stock"])

        batch.append(doc)
        total += 1

        if len(batch) >= BATCH_SIZE:
            sent += send_batch(batch, SOLR_URL, auth)
            batch = []
            print(f"  Indexed {sent} / {total}...")

    if batch:
        sent += send_batch(batch, SOLR_URL, auth)

    print(f"Done! {sent} documents indexed out of {total} rows.")

    cursor.close()
    conn.close()


if __name__ == "__main__":
    main()
```
Install dependencies:
pip install psycopg2-binary requests
Handling Large Datasets
When importing millions of rows, keep these tips in mind:
| Tip | Why |
|---|---|
| Use batches of 500–1000 docs | Keeps memory low and avoids Solr timeouts |
| Use `commit=true` on the last batch only | Faster: intermediate batches use `commit=false`, final batch commits everything |
| Use server-side cursors | Prevents loading all rows into memory at once |
| Set `CURLOPT_TIMEOUT` to 120+ | Large batches may take longer than the default timeout |
| Cast types correctly | Send numbers as numbers, booleans as booleans, not strings |
| Add a unique `id` field | Required by Solr. Use your primary key or generate a UUID |
| Match field names to your schema | Column names in your SELECT become Solr field names |
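The batching tip above can be factored into a small generator that works with any row iterator, including a server-side cursor, so at most one batch is ever held in memory. This is a sketch with an illustrative `chunked` helper, not part of any library:

```python
def chunked(rows, size=500):
    """Yield lists of up to `size` rows from any iterable,
    holding at most one batch in memory at a time."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch  # remaining rows

# Works with a server-side cursor exactly like a list:
for batch in chunked(range(7), size=3):
    print(len(batch))
# prints 3, 3, 1
```

With psycopg2, giving the cursor a name (for example `conn.cursor(name="export")`) makes it server-side, so PostgreSQL streams rows to the client in chunks instead of sending the whole result set at once.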
Commit Strategy for Large Imports
For best performance on large imports, only commit on the final batch:
```php
// During the loop — no commit
$url_no_commit = "https://$solr_host/solr/$index_name/update?wt=json";

// After all batches — commit everything
$url_commit = "https://$solr_host/solr/$index_name/update?commit=true&wt=json";
curl_setopt($ch, CURLOPT_URL, $url_commit);
curl_setopt($ch, CURLOPT_POSTFIELDS, "[]");
curl_exec($ch);
```
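The same strategy in Python reduces to choosing the URL per batch. A minimal sketch; `update_url` and its `is_final` flag are illustrative names, not Solr options:

```python
def update_url(solr_host, index_name, is_final):
    """Return the /update URL for a batch: commit only on the final one."""
    commit = "true" if is_final else "false"
    return f"https://{solr_host}/solr/{index_name}/update?commit={commit}&wt=json"

batches = [["doc1", "doc2"], ["doc3"]]
for i, batch in enumerate(batches):
    url = update_url("YOUR_HOST", "YOUR_INDEX", is_final=(i == len(batches) - 1))
    print(url)  # only the last URL carries commit=true
```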
Deleting Old Data Before Re-Import
If you want to wipe the index before a full re-import:
```bash
# Delete all documents
curl -u "$USERNAME:$PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"delete":{"query":"*:*"}}' \
  "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"
```
Or delete selectively:
```bash
# Delete only products in a specific category
curl -u "$USERNAME:$PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"delete":{"query":"category:Electronics"}}' \
  "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"
```
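Besides delete-by-query, Solr's JSON update format also accepts delete-by-id, which is safer when you only want to remove specific documents. A minimal sketch that builds the payload (`delete_by_ids` is an illustrative helper; POST the result to the same `/update` endpoint shown above):

```python
import json

def delete_by_ids(ids):
    """Build a Solr JSON delete payload for a list of document ids."""
    return json.dumps({"delete": list(ids)})

payload = delete_by_ids(["product-001", "product-002"])
print(payload)  # JSON body for the /update endpoint
```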
Quick Reference
| Item | Value |
|---|---|
| Solr endpoint | https://HOST/solr/INDEX/update?commit=true&wt=json |
| Content-Type | application/json |
| Authentication | HTTP Basic Auth (username + API key) |
| Payload format | JSON array of objects: [{"id":"1", ...}, ...] |
| Batch size | 500–1000 documents per POST (recommended) |
| Required field | id (unique key) |
| Date format | ISO 8601: 2024-01-15T10:30:00Z |
| Related FAQ | Import from XML |
Need help importing your database into Opensolr? We are here to help — free of charge.
Contact Support