Data Ingestion API
The Data Ingestion API lets you push documents directly into your Opensolr index from your own application, database, or script. Instead of crawling a website, you send your content as structured JSON — and Opensolr takes care of the rest, including AI enrichment with embeddings, sentiment analysis, and language detection. For the complete API reference with code examples in multiple languages, see the Data Ingestion API guide.
How Data Ingestion Works
When to Use Data Ingestion vs. Web Crawler
Opensolr gives you two ways to get content into your index. Here is how to pick the right one:
Data Ingestion API
Best for: databases, product catalogs, CMS content, custom applications, programmatic content, or anything that is not a public website. You control exactly what data goes into the index.
Web Crawler
Best for: public websites, blogs, documentation sites, or any content that is accessible via a URL. The crawler discovers pages automatically — no coding required.
Data Ingestion and the Web Crawler can feed the same index. For example, you might crawl your public website AND push product data from your database via the API. Both end up searchable in the same index.
API Endpoint
To push documents, send a POST request to the Data Ingestion endpoint. Here is an example using curl:
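A minimal sketch using the endpoint shown in the walkthrough below and the `documents` array wrapper described under batching. The `Bearer` scheme is an assumption: this page only says the API key goes in the Authorization header, so adjust the header to match the full API reference if yours differs.

```shell
# Sketch: push one document to the Data Ingestion endpoint.
# The Bearer scheme is an assumption; replace YOUR_INDEX_NAME and
# YOUR_API_KEY with the values from your Opensolr dashboard.
curl -X POST "https://api.opensolr.com/v1/ingest/YOUR_INDEX_NAME" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "documents": [
          {
            "uri": "https://example.com/products/123",
            "title": "Wireless Headphones",
            "description": "Noise-cancelling over-ear headphones.",
            "text": "Full product description and specifications...",
            "category": "Electronics",
            "price_f": 129.99,
            "currency_s": "USD"
          }
        ]
      }'
```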
Replace YOUR_INDEX_NAME with your Opensolr index name and YOUR_API_KEY with your API key (found in your Opensolr dashboard). Need to index PDFs, Word documents, or other binary files? See Document Extraction (extractOnly) for how to use the rtf=true parameter.
Document Structure
Each document you push is a JSON object with fields. Some fields are required, others are optional. Here is the full reference:
Required Fields
Every document must include these fields:
| Field | Type | Description |
|---|---|---|
| `uri` | String | A unique identifier for the document. Usually a URL or a unique ID like `product-123`. Also used for deduplication. |
| `title` | String | The title of the document. Displayed in search results as the clickable heading. |
| `description` | String | A short summary or excerpt. Displayed below the title in search results. |
| `text` | String | The full text content of the document. This is the main body that gets searched. Can be as long as needed. |
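Putting the four required fields together, a minimal document looks like the sketch below. The values are illustrative; piping the file through `python3 -m json.tool` simply confirms the JSON is well-formed before you send it.

```shell
# A minimal document: only the four required fields (values are examples).
cat <<'EOF' > doc.json
{
  "uri": "product-123",
  "title": "Wireless Headphones",
  "description": "Noise-cancelling over-ear headphones.",
  "text": "The full body text that gets searched. Can be as long as needed."
}
EOF
# Fails loudly if the JSON is malformed.
python3 -m json.tool doc.json
```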
Optional Fields
Add these fields to enrich your documents with extra metadata:
| Field | Type | Description |
|---|---|---|
| `category` | String | A category label (e.g., "Electronics", "Blog", "FAQ"). Useful for faceted filtering. |
| `author` | String | The author name. Useful for filtering or displaying in search results. |
| `og_image` | String (URL) | An image URL to display as a thumbnail in search results. |
| `price_f` | Float | A numeric price value. The `_f` suffix tells Opensolr this is a float (decimal number). |
| `currency_s` | String | Currency code (e.g., "USD", "EUR"). The `_s` suffix marks it as a string field. |
| `meta_*` | Varies | Any field starting with `meta_` is stored as custom metadata. Use appropriate type suffixes. |
Field Type Suffixes
When you create custom fields, add a suffix to tell Opensolr what type of data the field contains. This ensures proper indexing and enables features like sorting and filtering.
You do not need to define these fields in your schema ahead of time. Just use the right suffix and Opensolr automatically handles the field type. For example, sending rating_f: 4.5 automatically creates a float field called rating_f. For the full list of available field types and their behaviors, see the Index Field Reference.
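For instance, a document mixing the two suffixes documented above, `_f` (float) and `_s` (string). The field names and values are illustrative; no schema change is needed for either field.

```shell
# No schema change needed: the suffix alone sets the field type.
# Only the _f (float) and _s (string) suffixes documented above appear here.
cat <<'EOF' > suffixed-doc.json
{
  "uri": "product-456",
  "title": "USB-C Cable",
  "description": "Braided 2 m USB-C charging cable.",
  "text": "Full product details...",
  "rating_f": 4.5,
  "currency_s": "USD"
}
EOF
python3 -m json.tool suffixed-doc.json
```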
Each API request can contain up to 50 documents in the documents array. If you have more than 50 documents to push, split them into multiple requests. There is no limit on how many requests you can make — just keep each batch at 50 or fewer.
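Client-side, the 50-document cap comes down to a simple chunking loop. This sketch only computes the batch boundaries; the actual POST per batch is left as a comment.

```shell
# Split TOTAL documents into requests of at most BATCH_SIZE each.
TOTAL=120
BATCH_SIZE=50
start=0
requests=0
while [ "$start" -lt "$TOTAL" ]; do
  end=$(( start + BATCH_SIZE ))
  if [ "$end" -gt "$TOTAL" ]; then end=$TOTAL; fi
  requests=$(( requests + 1 ))
  echo "request $requests: documents $start to $(( end - 1 ))"
  # ...build the JSON body for this slice and POST it here...
  start=$end
done
```

With 120 documents this makes three requests: two full batches of 50 and a final batch of 20.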
AI Enrichment Pipeline
Every document you push through the Data Ingestion API is automatically enriched by AI before it hits your index. This happens behind the scenes — you do not need to do anything extra.
Here is what each enrichment step does:
- Embeddings — your document's text is converted into a 1024-dimensional numerical vector. This enables vector/semantic search — finding documents by meaning, not just keywords.
- Sentiment Analysis — the text is analyzed to determine its overall tone: positive, negative, or neutral. Useful for filtering reviews, feedback, or social media content.
- Language Detection — the language of the text is automatically identified. This enables language-aware search features and helps with multilingual indexes.
Ingestion Queue Dashboard
When you push documents, they enter an ingestion queue where they are processed one batch at a time. The dashboard shows you exactly what is happening with each batch.
Job States
Each ingestion job goes through a sequence of states as it is processed: it starts out Pending in the queue, moves to Running while its documents are enriched and indexed, and finishes as Completed or Failed. Jobs can also be in a Paused state if you manually pause them from the dashboard.
Available Actions
Run
Start processing a pending or paused job immediately.
Pause
Temporarily halt a running job. It will stay in the queue and can be resumed.
Resume
Continue processing a paused job from where it left off.
Retry
Re-process a failed job. Useful if the failure was caused by a temporary issue.
Delete
Remove a job from the queue entirely. The documents will not be indexed.
Progress Monitoring
While a job is processing, the dashboard shows a progress bar and counts: how many documents have been enriched and indexed out of the total. You can watch it in real time. For detailed documentation on all queue actions including pause, resume, retry, and monitoring, see Ingestion Queue Management.
Automatic Deduplication
Opensolr automatically prevents duplicate documents using your uri field. Here is how it works:
You can push the same document as many times as you want. As long as the uri stays the same, Opensolr will update the existing document instead of creating a duplicate. This makes it safe to run your ingestion script repeatedly.
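The upsert behavior can be imitated locally. This toy sketch (no API call) keys records on their first column the way Opensolr keys documents on `uri`: the later push wins, and no duplicate is created.

```shell
# Toy illustration only: later pushes with the same uri replace earlier
# ones, so each uri keeps exactly one (the latest) version.
deduped=$(printf '%s\n' \
    'product-123 v1' \
    'product-456 v1' \
    'product-123 v2' |
  awk '{ latest[$1] = $0 } END { for (uri in latest) print latest[uri] }' |
  sort)
echo "$deduped"
```

Three pushes come out as two entries, with `product-123 v2` having replaced `product-123 v1`.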
Step-by-Step: Pushing Your First Document
Here is a quick walkthrough to get your first document into your index via the API:
- Get your API key — find it in your Opensolr Dashboard under your account settings.
- Know your index name — this is the name you gave your index when you created it (e.g., `my-products`).
- Prepare your document — a JSON object with at least the four required fields: `uri`, `title`, `description`, and `text`.
- Send a POST request — to `https://api.opensolr.com/v1/ingest/YOUR_INDEX_NAME` with your API key in the Authorization header and your document in the request body (see the code example above).
- Check the response — a successful push returns a confirmation with the job ID. You can track the job in the Ingestion Queue Dashboard.
- Search for your document — once the job completes (usually within seconds), your document is live and searchable in your index.
Related Documentation
Web Crawler
Prefer automatic crawling? The Web Crawler discovers and indexes your website pages without code.
AI & Vector Search
Learn how the AI embeddings from ingestion power semantic search across your documents.
API Reference
Full API documentation with all endpoints, parameters, and response formats.