Create vector embeddings

Vector Embedding API

Opensolr API Endpoint: embed

Overview

The embed endpoint generates vector embeddings for arbitrary text payloads (up to 50,000 characters) that you can use wherever you need them: in KNN searches, your own storage, or any downstream pipeline. This is ideal for embedding dynamic or ad-hoc content without first indexing it in Solr.


Endpoint URL

https://api.opensolr.com/solr_manager/api/embed

Supports only POST requests.


Authentication & Core Parameters

Parameter    Type     Required   Description
email        string   Yes        Your Opensolr registration email address.
api_key      string   Yes        Your API key from the Opensolr dashboard.
index_name   string   Yes        Name of your Opensolr index/core to use.

Embedding Parameters

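Parameter   Type     Required   Default   Description
payload     string   Yes        (none)    The raw text string to embed. Maximum: 50,000 characters.
is_query    0 or 1   No         0         Set to 1 when embedding a search query; leave at 0 when embedding documents or passages.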

Query vs Document Embedding

The embedding model (intfloat/multilingual-e5-large-instruct) is instruction-tuned: it produces better vectors when it knows whether the input is a search query or a document.

  • is_query=1: Prepends an instruction prefix that tells the model "this is a search query, optimize for retrieval." Use this when embedding user search strings before sending them to Solr KNN.
  • is_query=0 (default): No prefix. Use this when embedding documents/passages at index time.

This asymmetry improves recall by 5–15% on queries where the wording differs significantly from the indexed documents. It also enables cross-language matching (e.g., a Romanian query matching English documents).


Special Notes

  • payload can be any UTF-8 text (e.g., a document, user input, generated content, etc).
  • If payload is missing or shorter than 2 characters, the API returns a 404 error with a JSON error response.
  • A valid index_name is required to confirm that you are an Opensolr member; it is used for authentication only.
  • The response is returned as JSON.
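
The query/document distinction and the payload rules above can be folded into one small helper. This is a minimal sketch rather than part of the Opensolr API: the function name build_embed_params and the two constants are hypothetical, and it assumes the 2 to 50,000 limit is counted in characters rather than bytes, which the documentation does not spell out.

```python
# Hypothetical helper (not part of the Opensolr API): builds the POST form
# fields for the embed endpoint and checks the documented payload limits
# before any request is sent.
MIN_PAYLOAD_CHARS = 2
MAX_PAYLOAD_CHARS = 50_000  # assumption: counted in characters, not bytes


def build_embed_params(email, api_key, index_name, payload, is_query=False):
    """Return the form fields for an embed request.

    is_query=True  -> embedding a user search string (is_query=1)
    is_query=False -> embedding a document/passage (is_query=0, the default)
    """
    if not isinstance(payload, str):
        raise TypeError("payload must be a string")
    if not MIN_PAYLOAD_CHARS <= len(payload) <= MAX_PAYLOAD_CHARS:
        raise ValueError(
            f"payload must be {MIN_PAYLOAD_CHARS}-{MAX_PAYLOAD_CHARS} "
            f"characters, got {len(payload)}"
        )
    return {
        "email": email,
        "api_key": api_key,
        "index_name": index_name,
        "payload": payload,
        "is_query": "1" if is_query else "0",
    }


# Same credentials, two different roles for the text.
doc_params = build_embed_params(
    "your@email.com", "YOUR_API_KEY", "your_index",
    "Opensolr provides hosted Solr indexes.",
)
query_params = build_embed_params(
    "your@email.com", "YOUR_API_KEY", "your_index",
    "managed solr hosting", is_query=True,
)
print(doc_params["is_query"], query_params["is_query"])  # prints: 0 1
```

The returned dictionary can be passed as the data argument of requests.post, in the same way as the params dictionary in the Python example.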

Example Solr Schema Requirement

To store embeddings, your Solr schema must define an appropriate vector field, for example:

<field name="embeddings" type="vector" indexed="true" stored="false" multiValued="false"/>
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" required="false" similarityFunction="cosine"/>

Adjust the name and type as needed to fit your use case.

⚠️ IMPORTANT: This API uses the embedding model intfloat/multilingual-e5-large-instruct, which produces a 1024-dimensional vector representation of your text, so vectorDimension must be 1024 for any field that stores embeddings from this endpoint.
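
Once a query embedding is available, it can be searched against a field like the one defined above using Solr's knn query parser. The following is a minimal sketch, assuming the field is named embeddings as in the schema example; build_knn_query is a hypothetical helper, and the topK value of 10 is an arbitrary choice.

```python
def build_knn_query(vector, field="embeddings", top_k=10, expected_dim=1024):
    """Format a vector as a Solr {!knn} query string.

    expected_dim defaults to 1024 to match the vectorDimension in the
    schema example; pass a different value if your field differs.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    # The knn parser takes a bracketed, comma-separated list of floats.
    values = ", ".join(repr(float(v)) for v in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


# Tiny 4-dimensional vector, purely for illustration.
print(build_knn_query([0.1, 0.2, 0.3, 0.4], expected_dim=4))
# prints: {!knn f=embeddings topK=10}[0.1, 0.2, 0.3, 0.4]
```

The resulting string goes into the q parameter of a Solr select request. A 1024-element vector makes for a long query string, so sending the search as a POST is likely the safer choice.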


Example Usage

cURL

curl -X POST https://api.opensolr.com/solr_manager/api/embed \
  -d "email=your@email.com" \
  -d "api_key=YOUR_API_KEY" \
  -d "index_name=your_index" \
  -d "payload=Your text to embed here. This can be any UTF-8 content up to 50000 characters." \
  -d "is_query=0"

PHP

<?php
$url = 'https://api.opensolr.com/solr_manager/api/embed';

$params = [
    'email'      => 'your@email.com',
    'api_key'    => 'YOUR_API_KEY',
    'index_name' => 'your_index',
    'payload'    => 'Your text to embed here. This can be any UTF-8 content up to 50000 characters.',
    'is_query'   => '0',  // Set to '1' for search queries
];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($params));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/x-www-form-urlencoded']);

$response = curl_exec($ch);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

$data = json_decode($response, true);

if ($http_code === 200 && isset($data['embedding'])) {
    echo "Embedding generated: " . count($data['embedding']) . " dimensions\n";
    // $data['embedding'] is a 1024-element float array ready for KNN search
} else {
    echo "Error: " . ($data['ERROR'] ?? $response) . "\n";
}

Python

import requests

url = "https://api.opensolr.com/solr_manager/api/embed"

params = {
    "email": "your@email.com",
    "api_key": "YOUR_API_KEY",
    "index_name": "your_index",
    "payload": "Your text to embed here. This can be any UTF-8 content up to 50000 characters.",
    "is_query": "0",  # Set to "1" for search queries
}

resp = requests.post(url, data=params)
data = resp.json()

if resp.status_code == 200 and "embedding" in data:
    print(f"Embedding generated — {len(data['embedding'])} dimensions")
    # data["embedding"] is a 1024-element float list ready for KNN search
else:
    print(f"Error: {data.get('ERROR', resp.text)}")

Endpoint Behavior

  1. Authenticates the user using email and api_key.
  2. Checks for a valid payload parameter (must be 2-50,000 characters).
  3. Generates vector embeddings for the provided payload.
  4. Uses index_name for authentication only — the embedding is NOT stored anywhere.
  5. Responds with a JSON object containing the result.

Example Response

{
  "status": "success",
  "embedding": [/* vector values */],
  "length": 4381
}

Or, for invalid input:

{
  "ERROR": "Invalid payload"
}
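
The two response shapes above can be handled in one place. This is a sketch under stated assumptions: parse_embed_response and EmbedError are hypothetical names, and the 1024-dimension check relies on the model size given earlier on this page. It works on a status code and raw body, so it can be used with any HTTP client.

```python
import json


class EmbedError(Exception):
    """Raised when the embed endpoint does not return a usable embedding."""


def parse_embed_response(status_code, body, expected_dim=1024):
    """Return the embedding list from a raw response body, or raise EmbedError."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        raise EmbedError(f"HTTP {status_code}: response was not JSON") from None
    if not isinstance(data, dict):
        raise EmbedError(f"HTTP {status_code}: unexpected JSON structure")
    if status_code != 200 or "embedding" not in data:
        # Error responses carry an "ERROR" key, as in the example above.
        message = data.get("ERROR", "unknown error")
        raise EmbedError(f"HTTP {status_code}: {message}")
    embedding = data["embedding"]
    if len(embedding) != expected_dim:
        raise EmbedError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return embedding


# Simulated responses; no network call is made here.
ok_body = json.dumps(
    {"status": "success", "embedding": [0.0] * 1024, "length": 4381}
)
print(len(parse_embed_response(200, ok_body)))  # prints: 1024

try:
    parse_embed_response(404, '{"ERROR": "Invalid payload"}')
except EmbedError as exc:
    print(exc)  # prints: HTTP 404: Invalid payload
```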

Use Cases

  • Embed dynamic or user-generated text before adding it to your Solr index.
  • Integrate with chatbots, summarization tools, or any app needing AI-powered semantic representations on the fly.
  • Test or debug embedding functionality before bulk-embedding large datasets.
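
For the testing and debugging use case, one quick sanity check is to compare two embeddings by cosine similarity, the same similarity function configured in the schema example. The sketch below uses plain Python; cosine_similarity is a hypothetical helper and the short vectors are made up for illustration, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 2-dimensional vectors: identical direction, then orthogonal.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # prints: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # prints: 0.0
```

With real embeddings, a query and a closely related passage would be expected to score noticeably higher than an unrelated pair; if they do not, the is_query setting is a reasonable first thing to check.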

Support

For more information or help, visit Opensolr Support or use your Opensolr dashboard.


Custom Plans Available

This is a premium feature available on custom plans tailored to your needs and budget. For small websites, we can even provide these features for free after validating your use case. Contact us at support@opensolr.com to discuss your requirements.