Data Ingestion API - Push Documents to Solr

Push documents directly — no crawler needed


The Data Ingestion API lets you push documents directly into your Opensolr index from your own application, database, or script. Instead of crawling a website, you send your content as structured JSON — and Opensolr takes care of the rest, including AI enrichment with embeddings, sentiment analysis, and language detection. For the complete API reference with code examples in multiple languages, see the Data Ingestion API guide.

How Data Ingestion Works

[Diagram: Your application sends JSON documents via POST to the Opensolr API, which receives and validates them, runs AI enrichment (1024-dimensional embeddings, sentiment analysis, language detection), and writes them to the searchable Solr index.]

Send JSON documents via API — AI enrichment happens automatically before indexing.

When to Use Data Ingestion vs. Web Crawler

Opensolr gives you two ways to get content into your index. Here is how to pick the right one:

Data Ingestion API

Best for: databases, product catalogs, CMS content, custom applications, programmatic content, or anything that is not a public website. You control exactly what data goes into the index.

Web Crawler

Best for: public websites, blogs, documentation sites, or any content that is accessible via a URL. The crawler discovers pages automatically — no coding required.

You can use both!

Data Ingestion and the Web Crawler can feed the same index. For example, you might crawl your public website AND push product data from your database via the API. Both end up searchable in the same index.

API Endpoint

To push documents, send a POST request to the Data Ingestion endpoint. Here is an example using curl:

# Push a batch of documents to your Opensolr index
curl -X POST "https://api.opensolr.com/v1/ingest/YOUR_INDEX_NAME" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "documents": [
          {
            "uri": "https://example.com/product/123",
            "title": "Wireless Bluetooth Headphones",
            "description": "Noise-cancelling over-ear headphones with 30-hour battery",
            "text": "Full product details and specifications go here...",
            "category": "Electronics",
            "price_f": 79.99,
            "currency_s": "USD"
          }
        ]
      }'

Replace YOUR_INDEX_NAME with your Opensolr index name and YOUR_API_KEY with your API key (found in your Opensolr dashboard). Need to index PDFs, Word documents, or other binary files? See Document Extraction (extractOnly) for how to use the rtf=true parameter.
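If you are scripting pushes rather than using curl, the same request can be assembled in Python. The sketch below uses only the standard library; the endpoint URL and header names mirror the curl example above, while `build_ingest_request` is an illustrative helper, not part of the Opensolr API:

```python
import json

def build_ingest_request(index_name, api_key, documents):
    """Return (url, headers, body) for a Data Ingestion POST.

    Mirrors the curl example: Bearer auth header, JSON body with a
    "documents" array. Replace index_name and api_key with your own values.
    """
    url = f"https://api.opensolr.com/v1/ingest/{index_name}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"documents": documents})
    return url, headers, body

url, headers, body = build_ingest_request(
    "YOUR_INDEX_NAME",
    "YOUR_API_KEY",
    [{
        "uri": "https://example.com/product/123",
        "title": "Wireless Bluetooth Headphones",
        "description": "Noise-cancelling over-ear headphones with 30-hour battery",
        "text": "Full product details and specifications go here...",
        "category": "Electronics",
        "price_f": 79.99,
        "currency_s": "USD",
    }],
)
# Send with any HTTP client, e.g.:
#   urllib.request.urlopen(urllib.request.Request(url, body.encode(), headers))
```

Building the request separately from sending it also makes the payload easy to log or unit-test before it ever touches the network.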

Document Structure

Each document you push is a JSON object with fields. Some fields are required, others are optional. Here is the full reference:

Required Fields

Every document must include these fields:

Field Type Description
uri String A unique identifier for the document. Usually a URL or a unique ID like product-123. This is also used for deduplication.
title String The title of the document. Displayed in search results as the clickable heading.
description String A short summary or excerpt. Displayed below the title in search results.
text String The full text content of the document. This is the main body that gets searched. Can be as long as needed.
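Since a push fails validation without these four fields, it can be worth checking documents client-side first. A minimal sketch in Python (the helper name is illustrative, not part of the API):

```python
# The four required fields from the table above.
REQUIRED_FIELDS = ("uri", "title", "description", "text")

def missing_required(doc):
    """Return the names of required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not doc.get(f)]

doc = {
    "uri": "product-123",
    "title": "Wireless Bluetooth Headphones",
    "description": "Noise-cancelling over-ear headphones",
    "text": "Full product details and specifications go here...",
}
assert missing_required(doc) == []                 # complete document
assert missing_required({"uri": "product-123"}) == [
    "title", "description", "text",                # everything else missing
]
```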

Optional Fields

Add these fields to enrich your documents with extra metadata:

Field Type Description
category String A category label (e.g., "Electronics", "Blog", "FAQ"). Useful for faceted filtering.
author String The author name. Useful for filtering or displaying in search results.
og_image String (URL) An image URL to display as a thumbnail in search results.
price_f Float A numeric price value. The _f suffix tells Opensolr this is a float (decimal number).
currency_s String Currency code (e.g., "USD", "EUR"). The _s suffix marks it as a string field.
meta_* Varies Any field starting with meta_ is stored as custom metadata. Use appropriate type suffixes.

Field Type Suffixes

When you create custom fields, add a suffix to tell Opensolr what type of data the field contains. This ensures proper indexing and enables features like sorting and filtering.

Suffix Data Type Example
_s String (text) brand_s: "Nike"
_ss Multi-value string tags_ss: ["red","xl"]
_f Float (decimal) price_f: 29.99
_i Integer (whole number) stock_i: 142
_dt Date/Time published_dt: "2026-01-15T10:30:00Z"
Dynamic fields are automatic

You do not need to define these fields in your schema ahead of time. Just use the right suffix and Opensolr automatically handles the field type. For example, sending rating_f: 4.5 automatically creates a float field called rating_f. For the full list of available field types and their behaviors, see the Index Field Reference.
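The suffix rules above are mechanical enough to automate. Here is a hypothetical helper (not part of the Opensolr API) that picks a suffix from each value's Python type, assuming the suffix table above:

```python
from datetime import datetime

def with_suffixes(fields):
    """Return a copy of `fields` with Opensolr type suffixes appended,
    chosen from each value's Python type."""
    out = {}
    for name, value in fields.items():
        if isinstance(value, float):
            suffix = "_f"                       # float (decimal)
        elif isinstance(value, int):
            suffix = "_i"                       # integer (whole number)
        elif isinstance(value, datetime):
            suffix = "_dt"                      # date/time
            value = value.strftime("%Y-%m-%dT%H:%M:%SZ")
        elif isinstance(value, (list, tuple)):
            suffix = "_ss"                      # multi-value string
            value = [str(v) for v in value]
        else:
            suffix = "_s"                       # plain string
        out[name + suffix] = value
    return out

print(with_suffixes({"brand": "Nike", "price": 29.99, "stock": 142}))
# {'brand_s': 'Nike', 'price_f': 29.99, 'stock_i': 142}
```

Sending the resulting dict means `rating_f`, `stock_i`, and friends are created automatically as dynamic fields, as described above.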

Batch Limit: 50 Documents Per Request

Each API request can contain up to 50 documents in the documents array. If you have more than 50 documents to push, split them into multiple requests. There is no limit on how many requests you can make — just keep each batch at 50 or fewer.
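Because of the 50-document cap, larger data sets must be split client-side. A minimal batching sketch (the batch size is the documented limit; the function name is illustrative):

```python
BATCH_LIMIT = 50  # documented per-request maximum

def batches(documents, size=BATCH_LIMIT):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]

# 120 documents split into three requests of 50, 50, and 20.
docs = [{"uri": f"product-{i}"} for i in range(120)]
sizes = [len(batch) for batch in batches(docs)]
print(sizes)  # [50, 50, 20]
```

Each yielded batch becomes one POST to the ingestion endpoint; since there is no cap on request count, a loop over `batches(docs)` handles any data set size.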

AI Enrichment Pipeline

Every document you push through the Data Ingestion API is automatically enriched by AI before it hits your index. This happens behind the scenes — you do not need to do anything extra.

[Diagram: The raw document (your JSON input) passes through three enrichment steps: Embeddings (a 1024-dimensional vector, e.g. [0.23, -0.87, 0.15, ...]), Sentiment (positive/negative/neutral tone scoring), and Language (auto-detection, e.g. EN, DE, FR, ES).]

Here is what each enrichment step does:

  • Embeddings — your document's text is converted into a 1024-dimensional numerical vector. This enables vector/semantic search — finding documents by meaning, not just keywords.
  • Sentiment Analysis — the text is analyzed to determine its overall tone: positive, negative, or neutral. Useful for filtering reviews, feedback, or social media content.
  • Language Detection — the language of the text is automatically identified. This enables language-aware search features and helps with multilingual indexes.

Ingestion Queue Dashboard

When you push documents, they enter an ingestion queue where they are processed one batch at a time. The dashboard shows you exactly what is happening with each batch.

Job States

Each ingestion job goes through these states:

Pending (queued) → Processing (AI enriching) → Completed (in your index) or Failed (needs attention)

Jobs can also be in a Paused state if you manually pause them from the dashboard.

Available Actions

Run

Start processing a pending or paused job immediately.

Pause

Temporarily halt a running job. It will stay in the queue and can be resumed.

Resume

Continue processing a paused job from where it left off.

Retry

Re-process a failed job. Useful if the failure was caused by a temporary issue.

Delete

Remove a job from the queue entirely. The documents will not be indexed.

Progress Monitoring

While a job is processing, the dashboard shows a progress bar and counts: how many documents have been enriched and indexed out of the total. You can watch it in real time. For detailed documentation on all queue actions including pause, resume, retry, and monitoring, see Ingestion Queue Management.

Automatic Deduplication

Opensolr automatically prevents duplicate documents using your uri field. Here is how it works:

[Diagram: uri = "example.com/page/1" → md5(uri) → a1b2c3d4e5f6... (unique document ID). Same URI = same document.]

If you push a document with a URI that already exists, the old version is replaced — not duplicated. This means you can safely re-push updated content without worrying about duplicates.
Safe to re-push

You can push the same document as many times as you want. As long as the uri stays the same, Opensolr will update the existing document instead of creating a duplicate. This makes it safe to run your ingestion script repeatedly.
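The scheme described above — deriving the document ID from md5(uri) — can be sketched in a few lines. This is an illustration of the idea, not Opensolr's internal code:

```python
import hashlib

def document_id(uri):
    """Derive a stable document ID from the uri field via md5,
    as in the deduplication diagram above."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()

first = document_id("example.com/page/1")
again = document_id("example.com/page/1")
other = document_id("example.com/page/2")
assert first == again   # same URI -> same ID -> update, not a duplicate
assert first != other   # different URI -> a separate document
```

Because the ID is a pure function of the URI, re-running an ingestion script is idempotent: unchanged URIs simply overwrite their own previous versions.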

Step-by-Step: Pushing Your First Document

Here is a quick walkthrough to get your first document into your index via the API:

  1. Get your API key — find it in your Opensolr Dashboard under your account settings.
  2. Know your index name — this is the name you gave your index when you created it (e.g., my-products).
  3. Prepare your document as a JSON object with at least the four required fields: uri, title, description, and text.
  4. Send a POST request to https://api.opensolr.com/v1/ingest/YOUR_INDEX_NAME with your API key in the Authorization header and your document in the request body (see the code example above).
  5. Check the response — a successful push returns a confirmation with the job ID. You can track the job in the Ingestion Queue Dashboard.
  6. Search for your document — once the job completes (usually within seconds), your document is live and searchable in your index.

Related Documentation