AI-Vector Embedding API


Batch vector embeddings

Opensolr API Endpoint: batch_embed

Overview

The batch_embed endpoint allows you to generate vector embeddings for multiple text payloads in a single request — up to 50 items at a time, each up to 50,000 characters. This is significantly faster than calling the single embed endpoint in a loop, because all texts are encoded in a single GPU/CPU batch.


Endpoint URL

https://api.opensolr.com/solr_manager/api/batch_embed

Supports GET and POST requests.


Authentication Parameters

Parameter   Type    Required  Description
email       string  Yes       Your Opensolr registration email address.
api_key     string  Yes       Your API key from the Opensolr dashboard.
index_name  string  Yes       Name of your Opensolr index (for authentication).

Embedding Parameters

Parameter  Type        Required  Description
payloads   JSON array  Yes       A JSON array of strings to embed. Max 50 items, each max 50,000 characters.

You can pass payloads as:

  • A GET/POST parameter: payloads=["text1","text2","text3"]
  • A JSON body: {"payloads": ["text1", "text2", "text3"]}
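Both payload formats above can be produced with a few lines of standard-library Python. This is a minimal sketch. The helper names form_body and json_body are hypothetical, the credentials are placeholders, and nothing is sent over the network. It shows that in the parameter form the array travels as a JSON-encoded string, while in the JSON body it is a real JSON array.

```python
import json
from urllib.parse import urlencode, parse_qs


def form_body(email, api_key, index_name, payloads):
    """Form-encoded body (usable as a POST body or a GET query string).

    The list is serialized to a JSON string first, matching payloads=["text1","text2"].
    """
    return urlencode({
        "email": email,
        "api_key": api_key,
        "index_name": index_name,
        "payloads": json.dumps(payloads),
    })


def json_body(email, api_key, index_name, payloads):
    """JSON body in which payloads is a real JSON array, matching {"payloads": [...]}."""
    return json.dumps({
        "email": email,
        "api_key": api_key,
        "index_name": index_name,
        "payloads": payloads,
    })


texts = ["hello world", "opensolr is great"]
form = form_body("your@email.com", "YOUR_API_KEY", "your_index", texts)
body = json_body("your@email.com", "YOUR_API_KEY", "your_index", texts)

# Round-trip check: both encodings carry the same list of texts.
print(json.loads(parse_qs(form)["payloads"][0]))
print(json.loads(body)["payloads"])
```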

Request Limits

Limit                    Value
Max items per batch      50
Max characters per item  50,000
Min characters per item  2
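If you are embedding a larger collection, you can validate and split the texts on the client before calling batch_embed so that every request stays within these limits. This is a minimal sketch. It assumes that "characters" corresponds to Python string length (Unicode code points), and the function names are hypothetical.

```python
MAX_ITEMS = 50       # max items per batch (from the limits table)
MAX_CHARS = 50_000   # max characters per item
MIN_CHARS = 2        # min characters per item


def validate_text(text):
    """Raise ValueError if one payload item violates the documented limits.

    Assumption: "characters" is measured as Python str length (Unicode code points).
    """
    if not isinstance(text, str):
        raise ValueError("every payload item must be a string")
    if not MIN_CHARS <= len(text) <= MAX_CHARS:
        raise ValueError(f"item length {len(text)} is outside {MIN_CHARS}..{MAX_CHARS}")


def chunk_payloads(texts, size=MAX_ITEMS):
    """Validate every text, then split the list into batches of at most `size` items."""
    if not 1 <= size <= MAX_ITEMS:
        raise ValueError(f"batch size must be between 1 and {MAX_ITEMS}")
    for text in texts:
        validate_text(text)
    return [texts[i:i + size] for i in range(0, len(texts), size)]


docs = [f"document number {n}" for n in range(120)]
batches = chunk_payloads(docs)
print([len(batch) for batch in batches])  # [50, 50, 20]
```

Each element of batches can then be sent as the payloads of one batch_embed request.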

Example Response

{
  "embeddings": [
    [0.0123, -0.0456, ...],
    [0.0789, -0.0012, ...],
    [0.0345, -0.0678, ...]
  ],
  "dimension": 1024,
  "count": 3
}
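Before you use the vectors, it can help to check the response against what you sent. This sketch assumes the response shape shown above. It also assumes that errors arrive under an "error" key, as the PHP and Python examples below expect, and that vectors come back in the same order as the payloads. parse_batch_response is a hypothetical helper.

```python
import json


def parse_batch_response(raw, expected_count):
    """Return the list of vectors from a batch_embed JSON response after sanity checks."""
    data = json.loads(raw)
    if "embeddings" not in data:
        # Assumed error shape, following the "error" key used in the examples on this page.
        raise RuntimeError(data.get("error", "Unknown error"))
    vectors = data["embeddings"]
    if len(vectors) != expected_count or data.get("count", len(vectors)) != len(vectors):
        raise RuntimeError(f"expected {expected_count} embeddings, got {len(vectors)}")
    dim = data.get("dimension")
    for i, vec in enumerate(vectors):
        if dim is not None and len(vec) != dim:
            raise RuntimeError(f"embedding {i} has {len(vec)} values, expected {dim}")
    return vectors


# Toy response with 3-value vectors (real responses report dimension 1024).
sample = '{"embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], "dimension": 3, "count": 2}'
vectors = parse_batch_response(sample, expected_count=2)
print(len(vectors), len(vectors[0]))
```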

Quick Test (1-liner)

curl -s "https://api.opensolr.com/solr_manager/api/batch_embed?email=your@email.com&api_key=YOUR_API_KEY&index_name=your_index&payloads=%5B%22hello+world%22,%22opensolr+is+great%22%5D"

PHP Example

<?php
$url = "https://api.opensolr.com/solr_manager/api/batch_embed";

$payloads = [
    "Distributed search across multiple shards",
    "Real-time indexing with soft commits",
    "Faceted navigation for e-commerce catalogs"
];

$params = [
    "email"      => "your@email.com",
    "api_key"    => "YOUR_API_KEY",
    "index_name" => "your_index",
    "payloads"   => json_encode($payloads)
];

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query($params),
    CURLOPT_TIMEOUT        => 120
]);

$response = json_decode(curl_exec($ch), true);
curl_close($ch);

if (isset($response["embeddings"])) {
    echo "Got " . $response["count"] . " embeddings of dimension " . $response["dimension"] . "\n";
    foreach ($response["embeddings"] as $i => $vector) {
        echo "Embedding $i: [" . implode(", ", array_slice($vector, 0, 5)) . ", ...]\n";
    }
} else {
    echo "Error: " . ($response["error"] ?? "Unknown") . "\n";
}

Python Example

import requests
import json

url = "https://api.opensolr.com/solr_manager/api/batch_embed"

payloads = [
    "Distributed search across multiple shards",
    "Real-time indexing with soft commits",
    "Faceted navigation for e-commerce catalogs",
]

params = {
    "email": "your@email.com",
    "api_key": "YOUR_API_KEY",
    "index_name": "your_index",
    "payloads": json.dumps(payloads),
}

response = requests.post(url, data=params, timeout=120).json()

if "embeddings" in response:
    print(f'Got {response["count"]} embeddings of dimension {response["dimension"]}')
    for i, vec in enumerate(response["embeddings"]):
        print(f"Embedding {i}: {vec[:5]}...")
else:
    print(f'Error: {response.get("error", "Unknown")}')

JSON Body Example (alternative)

You can also send the payloads as a JSON body:

curl -s -X POST "https://api.opensolr.com/solr_manager/api/batch_embed" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your@email.com",
    "api_key": "YOUR_API_KEY",
    "index_name": "your_index",
    "payloads": ["hello world", "opensolr is great", "batch embeddings rock"]
  }'

Solr Schema Requirement

To store batch embeddings in Solr, your schema needs a vector field:

<field name="embeddings" type="vector" indexed="true" stored="false" multiValued="false"/>
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" required="false" similarityFunction="cosine"/>

⚠️ This API uses the BAAI/bge-m3 embedding model which produces 1024-dimensional vectors.
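After batch_embed returns vectors, you can attach each one to its document and send the documents to Solr. This is a minimal sketch. It assumes a uniqueKey field named id and a hypothetical text field named text, and the placeholder vectors stand in for real API output. The resulting JSON array uses the format that Solr's /update handler accepts. You would POST it, for example, to https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/update?commit=true with Content-Type: application/json.

```python
import json

EMBEDDING_DIM = 1024  # bge-m3 output size; must equal vectorDimension in the schema


def to_solr_docs(ids, texts, vectors, vector_field="embeddings", text_field="text"):
    """Pair each text with its vector and return a JSON array of Solr documents."""
    if not (len(ids) == len(texts) == len(vectors)):
        raise ValueError("ids, texts and vectors must have the same length")
    docs = []
    for doc_id, text, vec in zip(ids, texts, vectors):
        if len(vec) != EMBEDDING_DIM:
            raise ValueError(f"{doc_id}: vector has {len(vec)} values, expected {EMBEDDING_DIM}")
        docs.append({"id": doc_id, text_field: text, vector_field: vec})
    return json.dumps(docs)


# Placeholder (non-zero) vectors stand in for real batch_embed output.
texts = ["Distributed search across multiple shards", "Real-time indexing with soft commits"]
vectors = [[0.01] * EMBEDDING_DIM, [0.02] * EMBEDDING_DIM]
body = to_solr_docs(["doc-1", "doc-2"], texts, vectors)
print(body[:80])
```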


Use Cases

  • Bulk preprocessing: Embed hundreds of documents before indexing them into Solr.
  • Comparison: Generate embeddings for multiple candidates and compare them using cosine similarity.
  • Clustering: Embed a set of texts and feed the vectors into a clustering algorithm.
  • Caching: Pre-compute embeddings for frequently searched queries.
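For the comparison use case, you can compute cosine similarity (the same function configured in the schema above) directly on the returned vectors. This is a plain-Python sketch with toy 3-value vectors. rank_candidates is a hypothetical helper, and returning 0.0 for a zero vector is a convention chosen here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-value vectors; real batch_embed vectors have 1024 values.
query = [1.0, 0.0, 1.0]
candidates = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]
for index, score in rank_candidates(query, candidates):
    print(index, round(score, 3))
```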


Support

For more information or help, visit Opensolr Support or use your Opensolr dashboard.


Create vector embeddings

Opensolr API Endpoint: embed

Overview

The embed endpoint generates a vector embedding for an arbitrary text payload (up to 50,000 characters). You can then store the embedding in any Solr index or use it wherever you need it. This is ideal for embedding dynamic or ad-hoc content without having to index it in Solr first.


Endpoint URL

https://api.opensolr.com/solr_manager/api/embed

Supports only POST requests.


Authentication & Core Parameters

Parameter   Type    Required  Description
email       string  Yes       Your Opensolr registration email address.
api_key     string  Yes       Your API key from the Opensolr dashboard.
index_name  string  Yes       Name of your Opensolr index/core to use.

Embedding Parameters

Parameter  Type    Required  Default  Description
payload    string  Yes       (none)   The raw text string to embed. Maximum: 50,000 characters.

Special Notes

  • payload can be any UTF-8 text (e.g., a document, user input, generated content, etc.).
  • If payload is missing or shorter than 2 characters, the API returns a 404 error with a JSON error response.
  • You must specify a valid index_name to show that you are an Opensolr member (it is only used for authentication purposes).
  • The response is returned as JSON.

Example Solr Schema Requirement

To store embeddings, your Solr schema must define an appropriate vector field, for example:

<field name="embeddings" type="vector" indexed="true" stored="false" multiValued="false"/>
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" required="false" similarityFunction="cosine"/>

Adjust the field and type names as needed to fit your use case. ⚠️ IMPORTANT: This API uses the BAAI/bge-m3 embedding model, which produces a 1024-dimensional vector for your text, so vectorDimension must be 1024.


Example Usage

Example: POST Request

POST https://api.opensolr.com/solr_manager/api/embed
Content-Type: application/x-www-form-urlencoded

email=your@email.com&api_key=YOUR_API_KEY&index_name=your_index&payload=Your+text+to+embed+here.
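Here is the same request in standard-library Python. This is a minimal sketch. build_embed_request and embed are hypothetical helper names, and the credentials are placeholders. urlencode handles escaping of the payload (spaces, &, = and non-ASCII text). The endpoint reports invalid input as HTTP 404, so the sketch reads the JSON error body from the HTTPError that urlopen raises.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode, parse_qs
from urllib.request import Request, urlopen

EMBED_URL = "https://api.opensolr.com/solr_manager/api/embed"


def build_embed_request(email, api_key, index_name, payload):
    """Build (but do not send) a form-encoded POST request for the embed endpoint."""
    if not 2 <= len(payload) <= 50_000:
        raise ValueError("payload must be between 2 and 50,000 characters")
    body = urlencode({
        "email": email,
        "api_key": api_key,
        "index_name": index_name,
        "payload": payload,  # urlencode escapes spaces and special characters
    }).encode("utf-8")
    return Request(EMBED_URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def embed(email, api_key, index_name, payload, timeout=120):
    """Send the request and return the decoded JSON response."""
    req = build_embed_request(email, api_key, index_name, payload)
    try:
        with urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except HTTPError as err:
        # Invalid input is documented as a 404 with a JSON body such as {"ERROR": "Invalid payload"};
        # urlopen raises HTTPError for it, so decode the error body instead.
        return json.loads(err.read().decode("utf-8"))


req = build_embed_request("your@email.com", "YOUR_API_KEY", "your_index", "Your text to embed here.")
print(req.get_method(), parse_qs(req.data.decode("utf-8"))["payload"])
```

Calling embed("your@email.com", "YOUR_API_KEY", "your_index", "Your text to embed here.") performs the actual POST and returns the decoded JSON.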

Endpoint Behavior

  1. Authenticates the user using email and api_key.
  2. Checks for a valid payload parameter (must be 2-50,000 characters).
  3. Generates vector embeddings for the provided payload.
  4. Does not write to Solr: index_name is only used for authentication, so storing the embedding in your index is up to you.
  5. Responds with a JSON object containing the result.

Example Response

{
  "status": "success",
  "embedding": [/* vector values */],
  "length": 4381
}

Or, for invalid input:

{
  "ERROR": "Invalid payload"
}
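This small sketch tells the two response shapes apart. extract_embedding is a hypothetical helper. The meaning of the "length" field is not documented here, so the sketch ignores it. The expected_dim check defaults to 1024 to match bge-m3.

```python
import json


def extract_embedding(raw, expected_dim=1024):
    """Return the vector from an embed response, or raise ValueError on an error response."""
    data = json.loads(raw)
    if "ERROR" in data:
        raise ValueError(data["ERROR"])
    if data.get("status") != "success" or "embedding" not in data:
        raise ValueError(f"unexpected response: {data}")
    vector = data["embedding"]
    if expected_dim is not None and len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
    return vector


ok = '{"status": "success", "embedding": [0.12, -0.05, 0.33], "length": 24}'
print(extract_embedding(ok, expected_dim=3))

try:
    extract_embedding('{"ERROR": "Invalid payload"}')
except ValueError as err:
    print("embed failed:", err)
```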

Use Cases

  • Embed dynamic or user-generated text into your Solr index.
  • Integrate with chatbots, summarization tools, or any app needing AI-powered semantic representations on the fly.
  • Test or debug embedding functionality before bulk-embedding large datasets.




Create vector embeddings for every document in your Opensolr...

Opensolr API Endpoint: embed_opensolr_index

⚠️ Important Disclaimer (Read Before Use!)

In order to not lose any fields from your existing index, make sure that inside your schema.xml, all fields are defined with docValues="true" AND/OR stored="true".

Otherwise, when we create your embeddings vector, you will lose any field that does not have docValues="true" AND/OR stored="true".

Because of this, it's highly recommended to understand the implications of Solr atomic updates clearly. For most users, the safer approach is to create embeddings at indexing time, using the /embed endpoint, especially if you rely on non-stored or non-docValues fields for downstream features.

Please review the official documentation on Solr Atomic Updates to fully understand these implications before using this endpoint.
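Before you run this endpoint, you can scan your schema for fields that would be lost. This is a rough sketch that uses Python's xml.etree.ElementTree on a hypothetical schema excerpt. It only looks at explicitly declared <field> elements. It treats a missing stored attribute as true and a missing docValues attribute as false, whereas real defaults depend on the field type and schema version. It skips copyField destinations, which Solr rebuilds from their sources during atomic updates. It does not check dynamic fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema excerpt; in practice, read your own schema.xml.
SCHEMA_XML = """<schema name="example" version="1.6">
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="body" type="text_general" indexed="true" stored="false"/>
  <field name="price" type="pfloat" indexed="true" stored="false" docValues="true"/>
  <field name="all_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="all_text"/>
</schema>"""


def fields_at_risk(schema_xml):
    """List explicitly declared fields that are neither stored nor docValues.

    Simplification: a missing stored attribute counts as "true" and a missing
    docValues attribute as "false"; real defaults come from the fieldType.
    """
    root = ET.fromstring(schema_xml)
    # copyField destinations are regenerated from their sources, so skip them.
    copy_dests = {cf.get("dest") for cf in root.iter("copyField")}
    risky = []
    for field in root.iter("field"):
        name = field.get("name")
        stored = field.get("stored", "true") == "true"
        doc_values = field.get("docValues", "false") == "true"
        if not stored and not doc_values and name not in copy_dests:
            risky.append(name)
    return risky


print(fields_at_risk(SCHEMA_XML))  # ['body']
```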

⚠️ Important Pre-Req:

First, make sure you have this in schema.xml:

<!--VECTORS-->
<field name="embeddings" type="vector" indexed="true" stored="true" multiValued="false" required="false" />
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" similarityFunction="cosine"/>

Or run the following to create your field and type definition inside your schema.xml via the native Solr API:

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fieldtypes -H 'Content-type:application/json' -d '{
  "add-field-type": {
    "name": "vector",
    "class": "solr.DenseVectorField",
    "vectorDimension": 1024,
    "similarityFunction": "cosine"
  }
}'

$ curl -u <INDEX_USERNAME>:<INDEX_PASSWORD> https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/schema/fields -H 'Content-type:application/json' -d '{
  "add-field": {
    "name":"embeddings",
    "type":"vector",
    "indexed":true,
    "stored":true,
    "multiValued":false,
    "required":false
  }
}'

The vector dimension (1024) and the similarity function belong to the "vector" field type defined above, not to the field. Keep "stored": true to match the schema.xml definition above.

Second, make sure you have this in solrconfig.xml:

<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
      
        <updateLog>
          <int name="numVersionBuckets">65536</int>
          <int name="maxNumLogsToKeep">10</int>
          <int name="numRecordsToKeep">10</int>
        </updateLog>

.....

</updateHandler>

Overview

The embed_opensolr_index endpoint allows Opensolr users to generate and store text embeddings for the documents in their Opensolr indexes using an embedding model. These embeddings power advanced features such as semantic search, classification, and other AI capabilities on top of your Solr data.


Endpoint URL

https://api.opensolr.com/solr_manager/api/embed_opensolr_index

Supports both GET and POST methods.


Authentication Parameters

Parameter   Type    Required  Description
email       string  Yes       Your Opensolr registration email address.
api_key     string  Yes       Your API key from the Opensolr dashboard.
index_name  string  Yes       Name of your Opensolr index/core to be embedded.

Embedding Parameters

Parameter                       Type           Required  Default                 Description
emb_solr_fields                 string         No        title,description,text  Comma-separated list of Solr fields to embed (can be any valid fields in your index).
emb_solr_embeddings_field_name  string         No        embeddings              Name of the Solr field to store generated embeddings.
emb_full_solr_grab              bool | string  No        false                   If "yes", embed all documents in the index; otherwise use the pagination parameters below.
emb_solr_start                  integer        No        0                       Starting document offset (for pagination).
emb_solr_rows                   integer        No        10                      Number of documents to process in the current request (page size).

Special Notes on Field Configuration

  • The API grabs data from the fields listed in emb_solr_fields, which defaults to title,description,text, but you may specify any fields from your index for embedding.
  • You can specify emb_solr_embeddings_field_name to match the embeddings field in your schema.
  • IMPORTANT: Your Solr index must have an embeddings field defined in your schema.xml. Example configuration:
<field name="embeddings" type="vector" indexed="true" stored="false" multiValued="false"/>
<fieldType name="vector" class="solr.DenseVectorField" vectorDimension="1024" required="false" similarityFunction="cosine"/>
  • Replace embeddings and vector with your custom names if you use different field names.

Atomic Updates in Solr (Brief Explanation)

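Solr atomic updates let you send only the fields you want to change. Internally, however, Solr rebuilds the whole document from its stored (stored="true") and docValues (docValues="true") fields, applies your changes, and reindexes it. A field that is neither stored nor docValues cannot be rebuilt, so its values are lost after an atomic update. The exception is copyField destinations, which Solr regenerates from their source fields. This is why the disclaimer above matters. Non-stored fields also cannot be retrieved from Solr, so you cannot use them to generate embeddings after indexing time.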

If you ever replace an entire document (full overwrite), non-stored fields will be lost unless you explicitly provide their values again.
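For reference, the document built below is a standard Solr atomic update that sets only a vector field. This page does not describe exactly what embed_opensolr_index sends internally, so treat this as an illustration of the mechanism. atomic_set is a hypothetical helper, and the 3-value vector is a toy. The request names only embeddings, yet Solr still rebuilds the rest of the document from its stored and docValues fields.

```python
import json


def atomic_set(doc_id, field, value, id_field="id"):
    """Build a Solr atomic-update document that replaces one field via the "set" modifier."""
    return {id_field: doc_id, field: {"set": value}}


# Hypothetical update: attach a toy 3-value vector to an existing document.
update = [atomic_set("doc-1", "embeddings", [0.1, 0.2, 0.3])]
body = json.dumps(update)
print(body)  # [{"id": "doc-1", "embeddings": {"set": [0.1, 0.2, 0.3]}}]
```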


Parameter Details

  • email: Your Opensolr account email. Used for authentication.
  • api_key: Secret API key from your Opensolr dashboard.
  • index_name: Name of the Solr core/index (must belong to your account).
  • emb_solr_fields: Fields to embed from your Solr documents. Comma-separated. Only text fields are supported.
  • emb_solr_embeddings_field_name: The Solr field where the generated embedding vector will be stored.
  • emb_full_solr_grab: Set to yes to embed all documents in the index; otherwise, the endpoint uses pagination.
  • emb_solr_start: Offset for selecting documents (for pagination through large datasets).
  • emb_solr_rows: Number of documents to embed per request (batch size).
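When emb_full_solr_grab is not set to yes, a large index is processed page by page. This minimal sketch builds one GET URL per page. paged_requests is a hypothetical helper, and total_docs is assumed to be known in advance (for example, from the numFound of a Solr query).

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://api.opensolr.com/solr_manager/api/embed_opensolr_index"


def paged_requests(email, api_key, index_name, total_docs, rows=10,
                   fields="title,description,text", vector_field="embeddings"):
    """Build one GET URL per page so the pages together cover documents 0..total_docs-1."""
    if rows < 1:
        raise ValueError("rows must be at least 1")
    urls = []
    for start in range(0, total_docs, rows):
        params = {
            "email": email,
            "api_key": api_key,
            "index_name": index_name,
            "emb_solr_fields": fields,
            "emb_solr_embeddings_field_name": vector_field,
            "emb_solr_start": start,
            "emb_solr_rows": rows,
        }
        urls.append(f"{ENDPOINT}?{urlencode(params)}")
    return urls


urls = paged_requests("your@email.com", "YOUR_API_KEY", "your_index", total_docs=25, rows=10)
for url in urls:
    print(parse_qs(urlsplit(url).query)["emb_solr_start"][0])
```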

Example Usage

Example: Minimal POST Request

POST https://api.opensolr.com/solr_manager/api/embed_opensolr_index
Content-Type: application/x-www-form-urlencoded

email=your@email.com&api_key=YOUR_API_KEY&index_name=your_index

Example: Full POST Request

POST https://api.opensolr.com/solr_manager/api/embed_opensolr_index
Content-Type: application/x-www-form-urlencoded

email=your@email.com&api_key=YOUR_API_KEY&index_name=your_index&emb_solr_fields=title,content&emb_solr_embeddings_field_name=embeddings&emb_full_solr_grab=yes

Example: GET Request

GET https://api.opensolr.com/solr_manager/api/embed_opensolr_index?email=your@email.com&api_key=YOUR_API_KEY&index_name=your_index

Endpoint Behavior

  1. Authenticates the user with email and api_key.
  2. Selects the index named index_name.
  3. Fetches document data from the fields listed in emb_solr_fields.
  4. Generates embeddings for each document and stores them in the specified emb_solr_embeddings_field_name.
  5. If emb_full_solr_grab is yes, processes all documents; otherwise uses emb_solr_start and emb_solr_rows for batch processing.
  6. Responds with plain text status/progress updates.

Use Cases

  • Enable vector/semantic search on your Opensolr index
  • Prepare your data for AI and LLM-powered analysis
  • Periodically update embeddings after document changes
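Once the embeddings field is populated, semantic search means querying it with a vector produced by the same model, for example one returned by the embed endpoint. This sketch assumes your index runs Solr 9.0 or later, which provides the knn query parser for DenseVectorField. knn_query_params is a hypothetical helper, and the toy vector has 2 values instead of 1024.

```python
import json
from urllib.parse import urlencode


def knn_query_params(vector, field="embeddings", top_k=10, fl="id,score"):
    """Build Solr request parameters for a {!knn} query over a DenseVectorField."""
    vector_literal = json.dumps([float(x) for x in vector])
    return {"q": f"{{!knn f={field} topK={top_k}}}{vector_literal}", "fl": fl}


# Toy 2-value vector for readability; a real query vector has 1024 values from bge-m3.
params = knn_query_params([0.1, 0.2], top_k=5)
print(params["q"])  # {!knn f=embeddings topK=5}[0.1, 0.2]
query_string = urlencode(params)  # append to https://<OPENSOLR_INDEX_HOST>/solr/<OPENSOLR_INDEX_NAME>/select?
```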



AI-API - Summarize Context

Opensolr AI-Hints - Demo Implementation

The Opensolr AI-Hints API is free to use as part of your Opensolr account.
The Opensolr AI-Hints LLM generates a summary of the context, which comes either from your Opensolr Web Crawler Index or from a context you enter manually.
You can also pass other instructions to this API, for example for named-entity recognition (NER). The API is currently in beta and will improve over time.

  1. GET or POST https://api.opensolr.com/solr_manager/api/ai_summary
  2. Parameters:
    1. email - Required - your Opensolr registration email address
    2. api_key - Required - your Opensolr api_key
    3. index_name - Required - must be a valid Opensolr Index that belongs to you.
    4. query - Optional - a Solr search query string to search within your Opensolr Crawler Index. If left empty, you must supply the context manually, because no Solr search is performed that would otherwise provide it automatically.
    5. context - Optional - If present, this context is analyzed instead of the content coming from your Solr Web Crawler Index.
    6. instruction - Optional - The instruction to send to the AI LLM. Defaults to: "Answer the query in plain English language, or translate into English, and summarize all the key points about the dates, people, events, places, or other important things that you find in the context below. Focus on the main ideas." You can also use instructions such as "Extract a list of people names..." or "Translate this content into ... LANG", etc.
  3. Example: https://api.opensolr.com/solr_manager/api/ai_summary?email=your@email.com&api_key=YOUR_API_KEY&index_name=my_crawler_solr_index&instruction=Answer%20The%20Query&query=Who%20is%20Donald%20Trump?
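Building the example URL by hand is error-prone because query and instruction contain spaces and punctuation. This minimal sketch uses urllib.parse.urlencode. ai_summary_url is a hypothetical helper, and the credentials are placeholders. Following the note on the query parameter, it refuses to build a request that has neither a query nor a context. The endpoint accepts GET or POST, so you can send the same parameters as a POST body for long contexts.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

AI_SUMMARY_URL = "https://api.opensolr.com/solr_manager/api/ai_summary"


def ai_summary_url(email, api_key, index_name, query=None, context=None, instruction=None):
    """Build a GET URL for ai_summary, URL-encoding every parameter."""
    if not query and not context:
        # Without a query no Solr search runs, so the context must be supplied manually.
        raise ValueError("pass a query, a context, or both")
    params = {"email": email, "api_key": api_key, "index_name": index_name}
    for key, value in (("query", query), ("context", context), ("instruction", instruction)):
        if value:
            params[key] = value
    return f"{AI_SUMMARY_URL}?{urlencode(params)}"


url = ai_summary_url("your@email.com", "YOUR_API_KEY", "my_crawler_solr_index",
                     query="Who is Donald Trump?", instruction="Answer The Query")
print(url)
```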