How to Import Data from MySQL (SQL) into Your Opensolr Index

[Diagram: SQL-to-Solr import flow. Query your database (MySQL, PostgreSQL, MariaDB, SQLite, or any other SQL database), convert the rows to JSON with a PHP or Python script, and POST them to Solr's /update?commit=true endpoint with Content-Type: application/json. DataImportHandler (DIH) was removed in Solr 9; use JSON POST to /update instead.]

Solr's DataImportHandler (DIH) was removed in Solr 9. If you were using DIH to pull data from MySQL or PostgreSQL directly into Solr, you now need a simple script that queries your database, converts each row to a JSON document, and POSTs it to Solr's /update endpoint.

This guide gives you complete, working examples in curl, PHP, and Python — with MySQL connections, batching, and progress reporting.

Also see: How to Import XML Data into Opensolr if your source data is in XML format.


The Solr JSON Format

Solr accepts documents as a JSON array. Each object in the array becomes one document in your index:

[
  {
    "id": "product-001",
    "title": "Wireless Headphones",
    "description": "Premium noise-cancelling wireless headphones",
    "price": 149.99,
    "category": "Electronics",
    "in_stock": true
  },
  {
    "id": "product-002",
    "title": "Bluetooth Speaker",
    "description": "Portable waterproof Bluetooth speaker",
    "price": 79.99,
    "category": "Electronics",
    "in_stock": true
  }
]

Key rules:

  • The payload is a JSON array of objects
  • Each object is one Solr document
  • Field names must match fields in your schema.xml
  • The id field is required (unique key for each document)
  • Numeric fields should be sent as numbers (not strings) — 149.99 not "149.99"
  • Boolean fields should be sent as true/false (not "true")
  • You can send as many documents per batch as you want (recommended: 500–1000 per batch)
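
The casting rules above can be sketched as a small normalization helper. This is illustrative rather than part of the import scripts below: the field names price, in_stock, and created_at are hypothetical and should be adapted to your own schema.

```python
from datetime import datetime
from decimal import Decimal

def normalize_row(row):
    """Coerce common SQL driver types into what Solr's JSON parser expects."""
    doc = {}
    for key, value in row.items():
        if isinstance(value, Decimal):
            doc[key] = float(value)  # numbers as numbers, not strings
        elif isinstance(value, datetime):
            doc[key] = value.strftime("%Y-%m-%dT%H:%M:%SZ")  # Solr date format
        elif key == "in_stock":
            doc[key] = bool(value)  # tinyint(1) 0/1 becomes true/false
        else:
            doc[key] = value
    return doc

row = {"id": "product-001", "price": Decimal("149.99"),
       "in_stock": 1, "created_at": datetime(2024, 1, 15, 10, 30)}
print(normalize_row(row))
```

The same idea appears inline in the PHP and Python examples below; a helper like this just keeps the rules in one place.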

Example 1: curl — Quick Import from a JSON File

If you already have your data exported as a JSON file, the simplest way to import it:

#!/bin/bash
# =====================================================================
# CONFIGURATION — Replace with YOUR Opensolr index details
# =====================================================================
SOLR_HOST="YOUR_HOST"
INDEX_NAME="YOUR_INDEX"
USERNAME="opensolr"
PASSWORD="YOUR_API_KEY"

JSON_FILE="products.json"

# Import the JSON file into Solr
echo "Importing $JSON_FILE into $INDEX_NAME..."

curl -u "$USERNAME:$PASSWORD" \
  -H "Content-Type: application/json" \
  --data-binary @"$JSON_FILE" \
  "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"

echo ""
echo "Done!"

You can also pipe a MySQL query directly to Solr using the mysql CLI and an inline Python filter. Note that every value arrives as a string this way, so reserve this one-liner for text fields or quick tests:

# One-liner: MySQL → JSON → Solr
mysql -u dbuser -pdbpass mydb -e "SELECT id, title, price FROM products" --batch --raw \
  | python3 -c "
import sys, json, csv
reader = csv.DictReader(sys.stdin, delimiter='\t')
docs = [dict(r) for r in reader]
print(json.dumps(docs))
" \
  | curl -u "$USERNAME:$PASSWORD" \
      -H "Content-Type: application/json" \
      -d @- \
      "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"

Example 2: PHP — MySQL to Solr with Batching

This script connects to your MySQL database, reads rows, converts them to JSON, and sends them to Solr in configurable batches. One caveat for very large tables: mysqli's query() buffers the entire result set in PHP memory by default; pass MYSQLI_USE_RESULT as a second argument to stream rows instead (num_rows is then unavailable until all rows have been read).

<?php
// =====================================================================
// CONFIGURATION — Replace with YOUR details
// =====================================================================

// Your MySQL database
$db_host = "localhost";
$db_user = "your_db_user";
$db_pass = "your_db_password";
$db_name = "your_database";

// Your Opensolr index
$solr_host  = "YOUR_HOST";         // e.g. de9.solrcluster.com
$index_name = "YOUR_INDEX";        // e.g. my_products
$solr_user  = "opensolr";
$solr_pass  = "YOUR_API_KEY";      // from your Opensolr control panel
$batch_size = 500;                 // documents per batch

$solr_url = "https://$solr_host/solr/$index_name/update?commit=true&wt=json";

// =====================================================================
// CONNECT TO MYSQL
// =====================================================================
$db = new mysqli($db_host, $db_user, $db_pass, $db_name);
if ($db->connect_error) {
    die("MySQL connection failed: " . $db->connect_error . "\n");
}
$db->set_charset("utf8mb4");

// =====================================================================
// QUERY YOUR DATA
// =====================================================================
// Adjust this query to match YOUR table and columns.
// Column names become Solr field names — make sure they match your schema.xml
$sql = "SELECT
            id,
            title,
            description,
            price,
            category,
            CASE WHEN in_stock = 1 THEN 'true' ELSE 'false' END AS in_stock,
            DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%sZ') AS created_at
        FROM products
        ORDER BY id";

$result = $db->query($sql);
if (!$result) {
    die("Query failed: " . $db->error . "\n");
}

$total = $result->num_rows;
echo "Found $total rows to index\n";

// =====================================================================
// SEND TO SOLR IN BATCHES
// =====================================================================
$batch = [];
$sent = 0;

while ($row = $result->fetch_assoc()) {
    // Convert numeric strings to actual numbers for Solr
    if (isset($row['price'])) {
        $row['price'] = (float)$row['price'];
    }
    // Convert boolean strings
    if (isset($row['in_stock'])) {
        $row['in_stock'] = ($row['in_stock'] === 'true');
    }

    $batch[] = $row;

    if (count($batch) >= $batch_size) {
        $sent += send_batch($batch, $solr_url, $solr_user, $solr_pass);
        $batch = [];
        echo "  Indexed $sent / $total\n";
    }
}

// Send remaining rows
if (!empty($batch)) {
    $sent += send_batch($batch, $solr_url, $solr_user, $solr_pass);
    echo "  Indexed $sent / $total\n";
}

echo "Done! $sent documents indexed.\n";
$db->close();

// =====================================================================
// HELPER: Send a batch of documents to Solr as JSON
// =====================================================================
function send_batch($docs, $url, $user, $pass) {
    $json = json_encode($docs, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_USERPWD        => "$user:$pass",
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $json,
        CURLOPT_HTTPHEADER     => ["Content-Type: application/json"],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 120,
    ]);

    $response = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($http_code !== 200) {
        echo "  ERROR (HTTP $http_code): $response\n";
        return 0;
    }

    return count($docs);
}

Example 3: Python — MySQL to Solr with Batching

Same approach in Python using mysql-connector-python and requests. Large result sets stream row by row: the connector's default cursor is unbuffered, so rows are fetched as you iterate rather than loaded up front.

# import_sql.py — Query MySQL and index into Opensolr
import mysql.connector
import requests
import json
import sys

# =====================================================================
# CONFIGURATION — Replace with YOUR details
# =====================================================================

# Your MySQL database
DB_HOST = "localhost"
DB_USER = "your_db_user"
DB_PASS = "your_db_password"
DB_NAME = "your_database"

# Your Opensolr index
SOLR_HOST  = "YOUR_HOST"        # e.g. de9.solrcluster.com
INDEX_NAME = "YOUR_INDEX"       # e.g. my_products
SOLR_USER  = "opensolr"
SOLR_PASS  = "YOUR_API_KEY"     # from your Opensolr control panel
BATCH_SIZE = 500                # documents per batch

SOLR_URL = f"https://{SOLR_HOST}/solr/{INDEX_NAME}/update?commit=true&wt=json"

def send_batch(docs, url, auth):
    """Send a batch of documents to Solr as JSON."""
    resp = requests.post(
        url,
        data=json.dumps(docs, default=str).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        auth=auth,
        timeout=120,
    )
    if resp.status_code != 200:
        print(f"  ERROR (HTTP {resp.status_code}): {resp.text}")
        return 0
    return len(docs)

def main():
    auth = (SOLR_USER, SOLR_PASS)

    # =====================================================================
    # CONNECT TO MYSQL
    # =====================================================================
    conn = mysql.connector.connect(
        host=DB_HOST,
        user=DB_USER,
        password=DB_PASS,
        database=DB_NAME,
        charset="utf8mb4",
    )
    cursor = conn.cursor(dictionary=True)

    # =====================================================================
    # QUERY YOUR DATA
    # =====================================================================
    # Adjust this query to match YOUR table and columns.
    # Column names become Solr field names.
    sql = """
        SELECT
            id,
            title,
            description,
            price,
            category,
            in_stock,
            created_at
        FROM products
        ORDER BY id
    """
    cursor.execute(sql)

    # =====================================================================
    # SEND TO SOLR IN BATCHES
    # =====================================================================
    batch = []
    sent = 0
    total = 0

    for row in cursor:
        # Convert types for Solr
        doc = dict(row)

        # Convert Decimal to float
        if "price" in doc and doc["price"] is not None:
            doc["price"] = float(doc["price"])

        # Convert datetime to ISO string
        if "created_at" in doc and doc["created_at"] is not None:
            doc["created_at"] = doc["created_at"].strftime("%Y-%m-%dT%H:%M:%SZ")

        # Convert tinyint to boolean
        if "in_stock" in doc:
            doc["in_stock"] = bool(doc["in_stock"])

        batch.append(doc)
        total += 1

        if len(batch) >= BATCH_SIZE:
            sent += send_batch(batch, SOLR_URL, auth)
            batch = []
            print(f"  Indexed {sent} / {total}...")

    # Send remaining rows
    if batch:
        sent += send_batch(batch, SOLR_URL, auth)
        print(f"  Indexed {sent} / {total}...")

    print(f"Done! {sent} documents indexed out of {total} rows.")
    cursor.close()
    conn.close()

if __name__ == "__main__":
    main()

Install dependencies:

pip install mysql-connector-python requests

Example 4: Python — PostgreSQL to Solr

If your database is PostgreSQL instead of MySQL, swap the connector:

# import_postgres.py — Query PostgreSQL and index into Opensolr
import psycopg2
import psycopg2.extras
import requests
import json

# =====================================================================
# CONFIGURATION
# =====================================================================
DB_HOST = "localhost"
DB_USER = "your_db_user"
DB_PASS = "your_db_password"
DB_NAME = "your_database"

SOLR_HOST  = "YOUR_HOST"
INDEX_NAME = "YOUR_INDEX"
SOLR_USER  = "opensolr"
SOLR_PASS  = "YOUR_API_KEY"
BATCH_SIZE = 500

SOLR_URL = f"https://{SOLR_HOST}/solr/{INDEX_NAME}/update?commit=true&wt=json"

def send_batch(docs, url, auth):
    """Send a batch of documents to Solr as JSON."""
    resp = requests.post(
        url,
        data=json.dumps(docs, default=str).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        auth=auth,
        timeout=120,
    )
    if resp.status_code != 200:
        print(f"  ERROR (HTTP {resp.status_code}): {resp.text}")
        return 0
    return len(docs)

def main():
    auth = (SOLR_USER, SOLR_PASS)

    conn = psycopg2.connect(
        host=DB_HOST, user=DB_USER, password=DB_PASS, dbname=DB_NAME
    )
    cursor = conn.cursor(
        name="solr_export",  # named cursor = server-side: streams rows instead of loading all into memory
        cursor_factory=psycopg2.extras.RealDictCursor,
    )

    cursor.execute("""
        SELECT id, title, description, price, category, in_stock, created_at
        FROM products ORDER BY id
    """)

    batch = []
    sent = 0
    total = 0

    for row in cursor:
        doc = dict(row)
        if "price" in doc and doc["price"] is not None:
            doc["price"] = float(doc["price"])
        if "created_at" in doc and doc["created_at"] is not None:
            doc["created_at"] = doc["created_at"].strftime("%Y-%m-%dT%H:%M:%SZ")
        if "in_stock" in doc:
            doc["in_stock"] = bool(doc["in_stock"])

        batch.append(doc)
        total += 1

        if len(batch) >= BATCH_SIZE:
            sent += send_batch(batch, SOLR_URL, auth)
            batch = []
            print(f"  Indexed {sent} / {total}...")

    if batch:
        sent += send_batch(batch, SOLR_URL, auth)
    print(f"Done! {sent} documents indexed out of {total} rows.")

    cursor.close()
    conn.close()

if __name__ == "__main__":
    main()

Install dependencies:

pip install psycopg2-binary requests

Handling Large Datasets

When importing millions of rows, keep these tips in mind:

  • Use batches of 500–1000 docs: keeps memory low and avoids Solr timeouts
  • Use commit=true on the last batch only: intermediate batches skip the commit; the final batch commits everything
  • Use server-side (or unbuffered) cursors: prevents loading all rows into memory at once
  • Set CURLOPT_TIMEOUT to 120+: large batches may take longer than the default timeout
  • Cast types correctly: send numbers as numbers, booleans as booleans, not strings
  • Add a unique id field: required by Solr; use your primary key or generate a UUID
  • Match field names to your schema: column names in your SELECT become Solr field names
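
The batching pattern that both scripts implement inline can be factored into a small generator. This is a sketch: batched is a hypothetical helper name, and the send_batch in the usage comment stands in for any of the upload functions shown earlier.

```python
def batched(rows, batch_size=500):
    """Yield lists of up to batch_size items from any iterable, streaming."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # whatever is left over after the loop
        yield batch

# Usage with a streaming cursor (memory stays flat):
#   for chunk in batched(cursor, 500):
#       send_batch(chunk, SOLR_URL, auth)
sizes = [len(chunk) for chunk in batched(range(1250), 500)]
print(sizes)
```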

Commit Strategy for Large Imports

For best performance on large imports, only commit on the final batch:

// During the loop, send each batch WITHOUT a commit:
$url_no_commit = "https://$solr_host/solr/$index_name/update?wt=json";

// After all batches, issue one final commit by POSTing an empty
// document array to the commit URL ($ch is the same cURL handle as above):
$url_commit = "https://$solr_host/solr/$index_name/update?commit=true&wt=json";
curl_setopt($ch, CURLOPT_URL, $url_commit);
curl_setopt($ch, CURLOPT_POSTFIELDS, "[]");
curl_exec($ch);
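
In Python, the same strategy comes down to which URL each batch is POSTed to. A sketch with a hypothetical update_url helper; the final empty-array POST mirrors the PHP fragment above:

```python
def update_url(host, index, commit=False):
    """Build the /update URL, appending commit=true only when requested."""
    url = f"https://{host}/solr/{index}/update?wt=json"
    if commit:
        url += "&commit=true"
    return url

# Intermediate batches go to the no-commit URL (fast);
# after the last batch, POST "[]" to the commit URL to commit everything.
print(update_url("YOUR_HOST", "YOUR_INDEX"))
print(update_url("YOUR_HOST", "YOUR_INDEX", commit=True))
```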

Deleting Old Data Before Re-Import

If you want to wipe the index before a full re-import:

# Delete all documents
curl -u "$USERNAME:$PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"delete":{"query":"*:*"}}' \
  "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"

Or delete selectively:

# Delete only products in a specific category
curl -u "$USERNAME:$PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"delete":{"query":"category:Electronics"}}' \
  "https://$SOLR_HOST/solr/$INDEX_NAME/update?commit=true&wt=json"

Quick Reference

  • Solr endpoint: https://HOST/solr/INDEX/update?commit=true&wt=json
  • Content-Type: application/json
  • Authentication: HTTP Basic Auth (username + API key)
  • Payload format: JSON array of objects: [{"id":"1", ...}, ...]
  • Batch size: 500–1000 documents per POST (recommended)
  • Required field: id (unique key)
  • Date format: ISO 8601 (2024-01-15T10:30:00Z)
  • Related FAQ: Import from XML
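
After an import, it is worth sanity-checking the document count against your row count. Querying https://HOST/solr/INDEX/select?q=*:*&rows=0&wt=json returns a JSON body whose response.numFound field is the total number of indexed documents. A minimal parse, using an illustrative response body:

```python
import json

# Shape of the body returned by /select?q=*:*&rows=0&wt=json
# (the numFound value here is illustrative)
body = '{"responseHeader": {"status": 0}, "response": {"numFound": 1250, "start": 0, "docs": []}}'

num_found = json.loads(body)["response"]["numFound"]
print(f"Index contains {num_found} documents")
```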

Need help importing your database into Opensolr? We are here to help — free of charge.

Contact Support