Batch vector embeddings

Vector Embedding API

Opensolr API Endpoint: batch_embed

Overview

The batch_embed endpoint allows you to generate vector embeddings for multiple text payloads in a single request — up to 50 items at a time, each up to 50,000 characters. This is significantly faster than calling the single embed endpoint in a loop, because all texts are encoded in a single GPU/CPU batch.


Endpoint URL

https://api.opensolr.com/solr_manager/api/batch_embed

Supports GET and POST requests.


Authentication Parameters

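| Parameter  | Type   | Required | Description                                       |
|------------|--------|----------|---------------------------------------------------|
| email      | string | Yes      | Your Opensolr registration email address.         |
| api_key    | string | Yes      | Your API key from the Opensolr dashboard.         |
| index_name | string | Yes      | Name of your Opensolr index (for authentication). |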

Embedding Parameters

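| Parameter | Type       | Required | Description                                                                 |
|-----------|------------|----------|-----------------------------------------------------------------------------|
| payloads  | JSON array | Yes      | A JSON array of strings to embed. Max 50 items, each max 50,000 characters. |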

You can pass payloads as:

  • A GET/POST parameter: payloads=["text1","text2","text3"]
  • A JSON body: {"payloads": ["text1", "text2", "text3"]}
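
The difference between the two styles is how `payloads` is encoded: as a JSON string inside a form or query parameter, or as a real array inside a JSON document. The sketch below only builds the request data and sends nothing; the helper names (`build_form_params`, `build_get_url`, `build_json_body`) are illustrative and not part of the Opensolr API.

```python
import json
from urllib.parse import urlencode

# Endpoint URL as given in this document.
BATCH_EMBED_URL = "https://api.opensolr.com/solr_manager/api/batch_embed"


def build_form_params(email, api_key, index_name, payloads):
    """Parameter style: `payloads` travels as one JSON-encoded string."""
    return {
        "email": email,
        "api_key": api_key,
        "index_name": index_name,
        "payloads": json.dumps(list(payloads)),
    }


def build_get_url(params):
    """Percent-encode the parameters into a GET URL."""
    return BATCH_EMBED_URL + "?" + urlencode(params)


def build_json_body(email, api_key, index_name, payloads):
    """JSON-body style: `payloads` stays a real JSON array."""
    return json.dumps({
        "email": email,
        "api_key": api_key,
        "index_name": index_name,
        "payloads": list(payloads),
    })
```

For anything beyond a handful of short strings, POST is likely the safer choice, since a full batch can far exceed practical URL length limits.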

Rate Limits

| Limit                   | Value  |
|-------------------------|--------|
| Max items per batch     | 50     |
| Max characters per item | 50,000 |
| Min characters per item | 2      |
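
When you have more than 50 texts, one way to stay inside these limits is to validate each text and split the list into batches on the client before calling the endpoint. This is a minimal sketch: the function names are hypothetical, and it assumes the character limits correspond to Python's `len()` on a string, which may differ from how the server counts.

```python
# Limits copied from the table above.
MAX_ITEMS_PER_BATCH = 50
MAX_CHARS_PER_ITEM = 50_000
MIN_CHARS_PER_ITEM = 2


def check_payload(text):
    """Raise if a single text falls outside the documented limits."""
    if not isinstance(text, str):
        raise TypeError("each payload must be a string")
    if not MIN_CHARS_PER_ITEM <= len(text) <= MAX_CHARS_PER_ITEM:
        raise ValueError(
            f"payload length {len(text)} is outside "
            f"{MIN_CHARS_PER_ITEM}..{MAX_CHARS_PER_ITEM} characters"
        )


def split_into_batches(texts, batch_size=MAX_ITEMS_PER_BATCH):
    """Validate every text, then return consecutive batches of at most batch_size."""
    texts = list(texts)
    for text in texts:
        check_payload(text)
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
```

Each returned batch can then be sent as one `payloads` array.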

Query vs Document Embedding

Set is_query=1 when embedding search queries (user input at search time). Leave is_query=0 (the default) when embedding documents or passages at index time. For queries, the model prepends an instruction prefix that optimizes the vectors for retrieval, improving recall by 5–15%, especially on natural-language queries and cross-language matching.

Example Response

{
  "embeddings": [
    [0.0123, -0.0456, ...],
    [0.0789, -0.0012, ...],
    [0.0345, -0.0678, ...]
  ],
  "dimension": 1024,
  "count": 3
}
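
Before storing vectors, it can help to check that a response is internally consistent. The sketch below works only with the fields shown above plus the `error` key used in the examples further down; the function name `extract_vectors` is hypothetical. If you pair vectors with your input texts by position, note that this document does not state the ordering guarantee, so treat that as an assumption.

```python
def extract_vectors(response, expected_count=None):
    """Return the embeddings list after basic consistency checks.

    `response` is the decoded JSON object returned by batch_embed.
    """
    if "embeddings" not in response:
        # The examples in this document read the failure reason from "error".
        raise RuntimeError(f'batch_embed failed: {response.get("error", "Unknown")}')

    vectors = response["embeddings"]
    if response.get("count") != len(vectors):
        raise ValueError("count does not match the number of embeddings")
    if expected_count is not None and len(vectors) != expected_count:
        raise ValueError(f"expected {expected_count} embeddings, got {len(vectors)}")

    dimension = response.get("dimension")
    if any(len(vector) != dimension for vector in vectors):
        raise ValueError("a vector does not match the reported dimension")
    return vectors
```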

Quick Test (1-liner)

curl -s "https://api.opensolr.com/solr_manager/api/batch_embed?email=your@email.com&api_key=YOUR_API_KEY&index_name=your_index&payloads=%5B%22hello+world%22,%22opensolr+is+great%22%5D&is_query=0"

PHP Example

<?php
$url = "https://api.opensolr.com/solr_manager/api/batch_embed";

$payloads = [
    "Distributed search across multiple shards",
    "Real-time indexing with soft commits",
    "Faceted navigation for e-commerce catalogs"
];

$params = [
    "email"      => "your@email.com",
    "api_key"    => "YOUR_API_KEY",
    "index_name" => "your_index",
    "payloads"   => json_encode($payloads),
    "is_query"   => "0"  // Set to "1" for search queries
];

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query($params),
    CURLOPT_TIMEOUT        => 120
]);

$response = json_decode(curl_exec($ch), true);
curl_close($ch);

if (isset($response["embeddings"])) {
    echo "Got " . $response["count"] . " embeddings of dimension " . $response["dimension"] . "\n";
    foreach ($response["embeddings"] as $i => $vector) {
        echo "Embedding $i: [" . implode(", ", array_slice($vector, 0, 5)) . ", ...]\n";
    }
} else {
    echo "Error: " . ($response["error"] ?? "Unknown") . "\n";
}

Python Example

import requests
import json

url = "https://api.opensolr.com/solr_manager/api/batch_embed"

payloads = [
    "Distributed search across multiple shards",
    "Real-time indexing with soft commits",
    "Faceted navigation for e-commerce catalogs",
]

params = {
    "email": "your@email.com",
    "api_key": "YOUR_API_KEY",
    "index_name": "your_index",
    "payloads": json.dumps(payloads),
    "is_query": "0",  # Set to "1" for search queries
}

response = requests.post(url, data=params, timeout=120).json()

if "embeddings" in response:
    print(f'Got {response["count"]} embeddings of dimension {response["dimension"]}')
    for i, vec in enumerate(response["embeddings"]):
        print(f"Embedding {i}: {vec[:5]}...")
else:
    print(f'Error: {response.get("error", "Unknown")}')

JSON Body Example (alternative)

You can also send the payloads as a JSON body:

curl -s -X POST "https://api.opensolr.com/solr_manager/api/batch_embed" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your@email.com",
    "api_key": "YOUR_API_KEY",
    "index_name": "your_index",
    "payloads": ["hello world", "opensolr is great", "batch embeddings rock"],
    "is_query": "0"
  }'

Solr Schema Requirement

To store batch embeddings in Solr, your schema needs a vector field:

<field name="embeddings" type="vector" indexed="true" stored="false" multiValued="false"/>
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" required="false" similarityFunction="cosine"/>

⚠️ This API uses the intfloat/multilingual-e5-large-instruct embedding model, which produces 1024-dimensional vectors, so vectorDimension in your schema must be 1024.
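
With that field in place, an embedded document can be shaped for indexing and a query vector can be turned into a Solr KNN query string. The sketch below assumes the field name `embeddings` from the schema above and a unique key field named `id` (an assumption about your schema); the helper names are hypothetical. The `{!knn f=... topK=...}` syntax is Solr's standard KNN query parser. At search time, the query vector would normally come from a call with is_query=1.

```python
def to_solr_doc(doc_id, vector, field="embeddings", expected_dim=1024):
    """Build a Solr document dict holding one embedding vector.

    The "id" key assumes your schema's unique key field is named "id".
    """
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return {"id": doc_id, field: [float(x) for x in vector]}


def knn_query(vector, field="embeddings", top_k=10):
    """Format a vector as a Solr KNN query: {!knn f=<field> topK=<k>}[v1, v2, ...]."""
    values = ", ".join(repr(float(x)) for x in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"
```

The string returned by `knn_query` is meant to be passed as the `q` parameter of a Solr select request; for 1024-dimensional vectors it is long enough that sending the search as a POST is usually more practical.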


Use Cases

  • Bulk preprocessing: Embed hundreds of documents before indexing them into Solr.
  • Comparison: Generate embeddings for multiple candidates and compare them using cosine similarity.
  • Clustering: Embed a set of texts and feed the vectors into a clustering algorithm.
  • Caching: Pre-compute embeddings for frequently searched queries.
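
For the comparison use case, cosine similarity between two returned vectors can be computed without extra dependencies. This is a minimal pure-Python sketch with hypothetical function names; for large numbers of vectors a numerical library would be a better fit.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, candidates):
    """Sort a {name: vector} mapping by similarity to query_vector, best first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Here `candidates` would map your own labels or document IDs to vectors taken from the `embeddings` array of a batch_embed response.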

Support

For more information or help, visit Opensolr Support or use your Opensolr dashboard.

Custom Plans Available

This is a premium feature available on custom plans tailored to your needs and budget. For small websites, we can even provide these features for free after validating your use case. Contact us at support@opensolr.com to discuss your requirements.