Batch vector embeddings

Opensolr API Endpoint: batch_embed

Overview

The batch_embed endpoint generates vector embeddings for multiple text payloads in a single request: up to 50 items at a time, each up to 50,000 characters. This is significantly faster than calling the single embed endpoint in a loop, because all texts are encoded in one GPU/CPU batch.


Endpoint URL

https://api.opensolr.com/solr_manager/api/batch_embed

Supports GET and POST requests.


Authentication Parameters

Parameter   Type    Required  Description
email       string  Yes       Your Opensolr registration email address.
api_key     string  Yes       Your API key from the Opensolr dashboard.
index_name  string  Yes       Name of your Opensolr index (for authentication).

Embedding Parameters

Parameter  Type        Required  Description
payloads   JSON array  Yes       A JSON array of strings to embed. Max 50 items, each max 50,000 characters.

You can pass payloads as:

  • A GET/POST parameter: payloads=["text1","text2","text3"]
  • A JSON body: {"payloads": ["text1", "text2", "text3"]}
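The two forms differ only in how the payloads array is serialized. A minimal Python sketch of both encodings, using only the standard library; the helper names form_encoded and json_body are hypothetical and not part of the Opensolr API:

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def form_encoded(auth: Dict[str, str], payloads: List[str]) -> str:
    """GET/POST parameter form: payloads is a JSON string, then URL-encoded."""
    return urlencode({**auth, "payloads": json.dumps(payloads)})


def json_body(auth: Dict[str, str], payloads: List[str]) -> str:
    """JSON body form: payloads stays a real JSON array inside the object."""
    return json.dumps({**auth, "payloads": payloads})


# Placeholder credentials, same as in the examples on this page.
auth = {"email": "your@email.com", "api_key": "YOUR_API_KEY", "index_name": "your_index"}
texts = ["text1", "text2", "text3"]

print(form_encoded(auth, texts))  # use as a query string or form body
print(json_body(auth, texts))     # send with Content-Type: application/json
```

For large batches, POST is likely the safer choice, since long query strings can exceed URL length limits in clients or proxies.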

Rate Limits

Limit                    Value
Max items per batch      50
Max characters per item  50,000
Min characters per item  2
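When you have more than 50 texts, you need to split them on the client side. The following is a minimal sketch, not an official client: make_batches is a hypothetical helper, and it assumes the character limits are counted the way Python's len() counts them.

```python
from typing import Iterator, List

MAX_ITEMS = 50        # max items per batch
MAX_CHARS = 50_000    # max characters per item
MIN_CHARS = 2         # min characters per item


def make_batches(texts: List[str]) -> Iterator[List[str]]:
    """Yield lists of at most MAX_ITEMS texts, rejecting out-of-range items."""
    # Validate everything first so no batch is sent before a bad item is found.
    for i, text in enumerate(texts):
        if not MIN_CHARS <= len(text) <= MAX_CHARS:
            raise ValueError(
                f"item {i} has {len(text)} characters; expected {MIN_CHARS}..{MAX_CHARS}"
            )
    for start in range(0, len(texts), MAX_ITEMS):
        yield texts[start:start + MAX_ITEMS]


# 120 texts are split into batches of 50, 50, and 20.
sizes = [len(batch) for batch in make_batches([f"doc {i}" for i in range(120)])]
print(sizes)
```

Each yielded list can then be sent as the payloads value of one batch_embed request.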

Example Response

{
  "embeddings": [
    [0.0123, -0.0456, ...],
    [0.0789, -0.0012, ...],
    [0.0345, -0.0678, ...]
  ],
  "dimension": 1024,
  "count": 3
}
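Before using the vectors, it can help to check that the response is internally consistent. The sketch below assumes the response shape shown above and an "error" key on failure, as used in the examples on this page; extract_embeddings is a hypothetical helper name.

```python
from typing import Any, Dict, List


def extract_embeddings(response: Dict[str, Any], expected: int) -> List[List[float]]:
    """Return the embeddings after checking count and dimension fields."""
    if "embeddings" not in response:
        raise RuntimeError(response.get("error", "Unknown error"))
    vectors = response["embeddings"]
    # One vector per submitted payload, and count must agree with the list.
    if len(vectors) != expected or response.get("count") != len(vectors):
        raise ValueError(f"expected {expected} embeddings, got {len(vectors)}")
    # Every vector must have the advertised dimension.
    dim = response.get("dimension")
    if any(len(vec) != dim for vec in vectors):
        raise ValueError(f"not all embeddings have dimension {dim}")
    return vectors


# Tiny stand-in response (real vectors have 1024 values each).
sample = {"embeddings": [[0.1, 0.2], [0.3, 0.4]], "dimension": 2, "count": 2}
print(len(extract_embeddings(sample, expected=2)))
```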

Quick Test (1-liner)

curl -s "https://api.opensolr.com/solr_manager/api/batch_embed?email=your@email.com&api_key=YOUR_API_KEY&index_name=your_index&payloads=%5B%22hello+world%22,%22opensolr+is+great%22%5D"

PHP Example

<?php
$url = "https://api.opensolr.com/solr_manager/api/batch_embed";

$payloads = [
    "Distributed search across multiple shards",
    "Real-time indexing with soft commits",
    "Faceted navigation for e-commerce catalogs"
];

$params = [
    "email"      => "your@email.com",
    "api_key"    => "YOUR_API_KEY",
    "index_name" => "your_index",
    "payloads"   => json_encode($payloads)
];

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query($params),
    CURLOPT_TIMEOUT        => 120
]);

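$raw = curl_exec($ch);
if ($raw === false) {
    // Transport-level failure (DNS, timeout, TLS, ...): no JSON to decode.
    echo "cURL error: " . curl_error($ch) . "\n";
    curl_close($ch);
    exit(1);
}
curl_close($ch);

$response = json_decode($raw, true);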

if (isset($response["embeddings"])) {
    echo "Got " . $response["count"] . " embeddings of dimension " . $response["dimension"] . "\n";
    foreach ($response["embeddings"] as $i => $vector) {
        echo "Embedding $i: [" . implode(", ", array_slice($vector, 0, 5)) . ", ...]\n";
    }
} else {
    echo "Error: " . ($response["error"] ?? "Unknown") . "\n";
}

Python Example

import requests
import json

url = "https://api.opensolr.com/solr_manager/api/batch_embed"

payloads = [
    "Distributed search across multiple shards",
    "Real-time indexing with soft commits",
    "Faceted navigation for e-commerce catalogs",
]

params = {
    "email": "your@email.com",
    "api_key": "YOUR_API_KEY",
    "index_name": "your_index",
    "payloads": json.dumps(payloads),
}

response = requests.post(url, data=params, timeout=120).json()

if "embeddings" in response:
    print(f'Got {response["count"]} embeddings of dimension {response["dimension"]}')
    for i, vec in enumerate(response["embeddings"]):
        print(f"Embedding {i}: {vec[:5]}...")
else:
    print(f'Error: {response.get("error", "Unknown")}')

JSON Body Example (alternative)

You can also send the payloads as a JSON body:

curl -s -X POST "https://api.opensolr.com/solr_manager/api/batch_embed" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your@email.com",
    "api_key": "YOUR_API_KEY",
    "index_name": "your_index",
    "payloads": ["hello world", "opensolr is great", "batch embeddings rock"]
  }'

Solr Schema Requirement

To store batch embeddings in Solr, your schema needs a vector field:

<field name="embeddings" type="vector" indexed="true" stored="false" multiValued="false"/>
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" required="false" similarityFunction="cosine"/>

⚠️ This API uses the BAAI/bge-m3 embedding model, which produces 1024-dimensional vectors, so vectorDimension in your schema must be set to 1024.
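With that field in place, each embedding can be attached to a Solr document as a plain array of floats. A minimal sketch, assuming your index uses "id" as its unique key (an assumption, not stated on this page) and the "embeddings" field defined above; to_solr_docs and knn_query are hypothetical helper names. The query string follows Solr's {!knn} query parser syntax.

```python
import json
from typing import Any, Dict, List


def to_solr_docs(ids: List[str], vectors: List[List[float]],
                 field: str = "embeddings") -> List[Dict[str, Any]]:
    """Pair each document id with its vector, ready for a JSON update request."""
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    return [{"id": doc_id, field: vec} for doc_id, vec in zip(ids, vectors)]


def knn_query(vector: List[float], top_k: int = 10, field: str = "embeddings") -> str:
    """Build a q= value for Solr's knn query parser."""
    return "{!knn f=%s topK=%d}%s" % (field, top_k, json.dumps(vector))


# Short stand-in vectors; real ones from batch_embed have 1024 values.
docs = to_solr_docs(["doc-1", "doc-2"], [[0.1, 0.2], [0.3, 0.4]])
print(json.dumps(docs))
print(knn_query([0.1, 0.2], top_k=5))
```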


Use Cases

  • Bulk preprocessing: Embed hundreds of documents before indexing them into Solr.
  • Comparison: Generate embeddings for multiple candidates and compare them using cosine similarity.
  • Clustering: Embed a set of texts and feed the vectors into a clustering algorithm.
  • Caching: Pre-compute embeddings for frequently searched queries.
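For the comparison use case, cosine similarity between two returned vectors can be computed without extra dependencies. This is a minimal sketch in pure Python; cosine_similarity is a hypothetical helper name, and for large batches a numeric library would likely be faster.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Rank candidates against a query vector, most similar first.
query = [1.0, 0.0]
candidates = {"same": [2.0, 0.0], "orthogonal": [0.0, 3.0], "opposite": [-1.0, 0.0]}
ranked = sorted(candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True)
print(ranked)
```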

Support

For more information or help, visit Opensolr Support or use your Opensolr dashboard.