Data Ingestion API
The Data Ingestion API lets you push documents directly into your Opensolr index from your own application, database, or script. Instead of crawling a website, you send your content as structured JSON — and Opensolr takes care of the rest, including AI enrichment with embeddings, sentiment analysis, and language detection. For the complete API reference with code examples in multiple languages, see the Data Ingestion API guide.
How Data Ingestion Works
When to Use Data Ingestion vs. Web Crawler
Opensolr gives you two ways to get content into your index. Here is how to pick the right one:
Data Ingestion API
Best for: databases, product catalogs, CMS content, custom applications, programmatic content, or anything that is not a public website. You control exactly what data goes into the index.
Web Crawler
Best for: public websites, blogs, documentation sites, or any content that is accessible via a URL. The crawler discovers pages automatically — no coding required.
Data Ingestion and the Web Crawler can feed the same index. For example, you might crawl your public website AND push product data from your database via the API. Both end up searchable in the same index.
API Endpoint
To push documents, send a POST request to the Data Ingestion endpoint. Here is an example using curl:
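A minimal sketch using the endpoint shown in the walkthrough below and the `documents` array wrapper described under batching. The `Bearer` scheme is an assumption: this page only says the API key goes in the Authorization header, so adjust the header to match the full API reference if yours differs.

```shell
# Sketch: push one document to the Data Ingestion endpoint.
# The Bearer scheme is an assumption; replace YOUR_INDEX_NAME and
# YOUR_API_KEY with the values from your Opensolr dashboard.
curl -X POST "https://api.opensolr.com/v1/ingest/YOUR_INDEX_NAME" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "documents": [
          {
            "uri": "https://example.com/products/123",
            "title": "Wireless Headphones",
            "description": "Noise-cancelling over-ear headphones.",
            "text": "Full product description and specifications...",
            "category": "Electronics",
            "price_f": 129.99,
            "currency_s": "USD"
          }
        ]
      }'
```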
Replace YOUR_INDEX_NAME with your Opensolr index name and YOUR_API_KEY with your API key (found in your Opensolr dashboard). Need to index PDFs, Word documents, or other binary files? See Document Extraction (extractOnly) for how to use the rtf=true parameter.
Document Structure
Each document you push is a JSON object with fields. Some fields are required, others are optional. Here is the full reference:
Required Fields
Every document must include these fields:
| Field | Type | Description |
|---|---|---|
| `uri` | String | A unique identifier for the document. Usually a URL or a unique ID like `product-123`. Also used for deduplication. |
| `title` | String | The title of the document. Displayed in search results as the clickable heading. |
| `description` | String | A short summary or excerpt. Displayed below the title in search results. |
| `text` | String | The full text content of the document. This is the main body that gets searched. Can be as long as needed. |
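Putting the four required fields together, a minimal document looks like the sketch below. The values are illustrative; piping the file through `python3 -m json.tool` simply confirms the JSON is well-formed before you send it.

```shell
# A minimal document: only the four required fields (values are examples).
cat <<'EOF' > doc.json
{
  "uri": "product-123",
  "title": "Wireless Headphones",
  "description": "Noise-cancelling over-ear headphones.",
  "text": "The full body text that gets searched. Can be as long as needed."
}
EOF
# Fails loudly if the JSON is malformed.
python3 -m json.tool doc.json
```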
Optional Fields
Add these fields to enrich your documents with extra metadata:
| Field | Type | Description |
|---|---|---|
| `category` | String | A category label (e.g., "Electronics", "Blog", "FAQ"). Useful for faceted filtering. |
| `author` | String | The author name. Useful for filtering or displaying in search results. |
| `og_image` | String (URL) | An image URL to display as a thumbnail in search results. |
| `price_f` | Float | A numeric price value. The `_f` suffix tells Opensolr this is a float (decimal number). |
| `currency_s` | String | Currency code (e.g., "USD", "EUR"). The `_s` suffix marks it as a string field. |
| `meta_*` | Varies | Any field starting with `meta_` is stored as custom metadata. Use appropriate type suffixes. |
Field Type Suffixes
When you create custom fields, add a suffix to tell Opensolr what type of data the field contains. This ensures proper indexing and enables features like sorting and filtering.
You do not need to define these fields in your schema ahead of time. Just use the right suffix and Opensolr automatically handles the field type. For example, sending rating_f: 4.5 automatically creates a float field called rating_f. For the full list of available field types and their behaviors, see the Index Field Reference.
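For instance, a document mixing the two suffixes documented above, `_f` (float) and `_s` (string). The field names and values are illustrative; no schema change is needed for either field.

```shell
# No schema change needed: the suffix alone sets the field type.
# Only the _f (float) and _s (string) suffixes documented above appear here.
cat <<'EOF' > suffixed-doc.json
{
  "uri": "product-456",
  "title": "USB-C Cable",
  "description": "Braided 2 m USB-C charging cable.",
  "text": "Full product details...",
  "rating_f": 4.5,
  "currency_s": "USD"
}
EOF
python3 -m json.tool suffixed-doc.json
```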
Each API request can contain up to 50 documents in the documents array. If you have more than 50 documents to push, split them into multiple requests. There is no limit on how many requests you can make — just keep each batch at 50 or fewer.
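Client-side, the 50-document cap comes down to a simple chunking loop. This sketch only computes the batch boundaries; the actual POST per batch is left as a comment.

```shell
# Split TOTAL documents into requests of at most BATCH_SIZE each.
TOTAL=120
BATCH_SIZE=50
start=0
requests=0
while [ "$start" -lt "$TOTAL" ]; do
  end=$(( start + BATCH_SIZE ))
  if [ "$end" -gt "$TOTAL" ]; then end=$TOTAL; fi
  requests=$(( requests + 1 ))
  echo "request $requests: documents $start to $(( end - 1 ))"
  # ...build the JSON body for this slice and POST it here...
  start=$end
done
```

With 120 documents this makes three requests: two full batches of 50 and a final batch of 20.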
AI Enrichment Pipeline
Every document you push through the Data Ingestion API is automatically enriched by AI before it hits your index. This happens behind the scenes — you do not need to do anything extra.
Here is what each enrichment step does:
- Embeddings — your document's text is converted into a 1024-dimensional numerical vector. This enables vector/semantic search — finding documents by meaning, not just keywords.
- Sentiment Analysis — the text is analyzed to determine its overall tone: positive, negative, or neutral. Useful for filtering reviews, feedback, or social media content.
- Language Detection — the language of the text is automatically identified. This enables language-aware search features and helps with multilingual indexes.
Ingestion Queue Dashboard
When you push documents, they enter an ingestion queue where they are processed one batch at a time. The dashboard shows you exactly what is happening with each batch.
Job States
Each ingestion job goes through a sequence of states as it is processed: it starts out Pending in the queue, moves to Running while its documents are enriched and indexed, and finishes as Completed or Failed. Jobs can also be in a Paused state if you manually pause them from the dashboard.
Available Actions
Run
Start processing a pending or paused job immediately.
Pause
Temporarily halt a running job. It will stay in the queue and can be resumed.
Resume
Continue processing a paused job from where it left off.
Retry
Re-process a failed job. Useful if the failure was caused by a temporary issue.
Delete
Remove a job from the queue entirely. The documents will not be indexed.
Progress Monitoring
While a job is processing, the dashboard shows a progress bar and counts: how many documents have been enriched and indexed out of the total. You can watch it in real time. For detailed documentation on all queue actions including pause, resume, retry, and monitoring, see Ingestion Queue Management.
Automatic Deduplication
Opensolr automatically prevents duplicate documents using your uri field. Here is how it works:
You can push the same document as many times as you want. As long as the uri stays the same, Opensolr will update the existing document instead of creating a duplicate. This makes it safe to run your ingestion script repeatedly.
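The upsert behavior can be imitated locally. This toy sketch (no API call) keys records on their first column the way Opensolr keys documents on `uri`: the later push wins, and no duplicate is created.

```shell
# Toy illustration only: later pushes with the same uri replace earlier
# ones, so each uri keeps exactly one (the latest) version.
deduped=$(printf '%s\n' \
    'product-123 v1' \
    'product-456 v1' \
    'product-123 v2' |
  awk '{ latest[$1] = $0 } END { for (uri in latest) print latest[uri] }' |
  sort)
echo "$deduped"
```

Three pushes come out as two entries, with `product-123 v2` having replaced `product-123 v1`.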
Step-by-Step: Pushing Your First Document
Here is a quick walkthrough to get your first document into your index via the API:
- Get your API key — find it in your Opensolr Dashboard under your account settings.
- Know your index name — this is the name you gave your index when you created it (e.g., `my-products`).
- Prepare your document — a JSON object with at least the four required fields: `uri`, `title`, `description`, and `text`.
- Send a POST request — to `https://api.opensolr.com/v1/ingest/YOUR_INDEX_NAME` with your API key in the Authorization header and your document in the request body (see the code example above).
- Check the response — a successful push returns a confirmation with the job ID. You can track the job in the Ingestion Queue Dashboard.
- Search for your document — once the job completes (usually within seconds), your document is live and searchable in your index.
Related Documentation
Web Crawler
Prefer automatic crawling? The Web Crawler discovers and indexes your website pages without code.
AI & Vector Search
Learn how the AI embeddings from ingestion power semantic search across your documents.
API Reference
Full API documentation with all endpoints, parameters, and response formats.