Content Types
Choose exactly which WordPress content types get indexed: posts, pages, WooCommerce products, custom post types, and more. Each indexing method has its own content type selection, so you can fine-tune what gets crawled vs. what gets pushed via the API.
Two Separate Content Type Selections
The plugin maintains two independent sets of content type checkboxes: one for the Web Crawler and one for Data Ingestion. This gives you full control over what each method indexes.
Data Crawler Tab
Controls which content types appear in your XML sitemap. The web crawler fetches pages listed in the sitemap, so unchecking a type here removes it from the sitemap and the crawler skips it.
Data Ingestion Tab
Controls which content types get pushed directly to Solr via the API. Only checked types are included when you click Ingest All Now or when real-time sync fires on post save.
Available Content Types
The plugin automatically detects all public post types registered in your WordPress installation. Common types include:
Posts
Standard WordPress blog posts, the most common content type
Pages
Static WordPress pages: About, Contact, landing pages, etc.
Products
WooCommerce products, indexed with prices, categories, SKUs, and structured data
Custom Post Types
Any custom post type registered by your theme or plugins (e.g., Portfolios, Events, Testimonials)
Dynamic Document Counts
Each content type shows a live document count next to its checkbox: the number of published items of that type. As you check and uncheck types, the total counter at the bottom updates in real time to show you exactly how many documents will be indexed.
- Checked: shows the actual count of published items (e.g., "Posts (47)")
- Unchecked: shows 0, since those items will not be indexed
- Total: the sum of all checked types, updated live as you toggle checkboxes
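The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the plugin's code; the names `displayed_counts` and `total_documents` and the sample numbers are hypothetical.

```python
def displayed_counts(published, checked):
    """Return the count shown next to each checkbox.

    Checked types show their published-item count; unchecked types show 0.
    """
    return {ptype: (count if ptype in checked else 0)
            for ptype, count in published.items()}


def total_documents(published, checked):
    """Return the total counter: the sum over all checked types."""
    return sum(displayed_counts(published, checked).values())


# Hypothetical published-item counts per content type.
published = {"post": 47, "page": 12, "product": 130}
checked = {"post", "product"}

counts = displayed_counts(published, checked)
total = total_documents(published, checked)
print(counts, total)
```

Unchecking a type changes only the `checked` set, so the total can be recomputed on every toggle.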
Include Attached Files
Below the content type checkboxes, you will find an "Include attached files" option. When enabled, the crawler and ingestion system will also process files attached to your posts:
- PDFs: full-text extraction from PDF documents
- Word Documents: .docx and .doc files
- Spreadsheets: .xlsx and .xls files
- Presentations: .pptx files
- Text files: .txt, .odt, and other plain-text formats
File contents are extracted and indexed alongside the post they are attached to, making them fully searchable.
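As a rough illustration of the file-type selection above, the following Python sketch checks an attachment's extension against the formats named in the list. The function name, the `include_attached_files` parameter, and the exact extension set are assumptions for illustration. The plugin may accept more formats ("other plain-text formats") and may detect file types differently.

```python
from pathlib import PurePosixPath

# Extensions named in the list above; assumed to be a partial set.
INDEXABLE_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".txt", ".odt",
}


def is_indexable_attachment(filename, include_attached_files=True):
    """Return True if the attachment would be processed under this sketch.

    Nothing is processed when the "Include attached files" option is off.
    """
    if not include_attached_files:
        return False
    return PurePosixPath(filename).suffix.lower() in INDEXABLE_EXTENSIONS


print(is_indexable_attachment("annual-report.PDF"))
print(is_indexable_attachment("photo.jpg"))
```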
Recommended Setup
For best results, enable the same content types on both tabs. The web crawler handles comprehensive indexing on a schedule, while data ingestion provides real-time sync when you publish or update a post. Both methods produce identical Solr documents with the same document ID, so there are no duplicates: whichever method writes last simply updates the document.
How Content Types Affect Indexing
Sitemap Generation
Only checked content types in the Data Crawler tab are included in your auto-generated /opensolr-sitemap.xml. The crawler only indexes URLs it finds in the sitemap.
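The effect of the checkboxes on the sitemap can be pictured with a minimal Python sketch that emits a `<url>` entry only for items whose type is checked. The `build_sitemap` function, the item dictionaries, and the example URLs are hypothetical. The plugin generates its sitemap inside WordPress, and its actual output may include additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(items, checked_types):
    """Build a sitemap XML string containing only items of checked types."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for item in items:
        if item["type"] in checked_types:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = item["url"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical published items.
items = [
    {"type": "post", "url": "https://example.com/hello-world/"},
    {"type": "page", "url": "https://example.com/about/"},
    {"type": "product", "url": "https://example.com/product/widget/"},
]

# "product" is unchecked, so its URL never reaches the crawler.
sitemap_xml = build_sitemap(items, checked_types={"post", "page"})
print(sitemap_xml)
```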
API Ingestion
Only checked content types in the Data Ingestion tab are queued for bulk ingestion and picked up by the real-time sync hooks. Posts of unchecked types are ignored even when they are saved.
No Duplicates
Both methods use the same document ID: md5(uri). A post indexed by the crawler and later updated via ingestion remains a single document in your index, with no cleanup needed.
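To see why a shared ID prevents duplicates, here is a small Python sketch that uses a dictionary as a stand-in for the Solr index. It assumes the URI is hashed as a UTF-8 string exactly as written. How the plugin normalizes the URI before hashing (scheme, trailing slash, and so on) is not specified here, and the `upsert` helper and field names are hypothetical.

```python
import hashlib


def document_id(uri):
    """Return the hex MD5 digest of the URI, following the md5(uri) rule."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


# Stand-in for the index: one entry per document ID.
index = {}


def upsert(uri, fields, source):
    """Write a document; an existing entry with the same ID is overwritten."""
    index[document_id(uri)] = {**fields, "uri": uri, "indexed_by": source}


uri = "https://example.com/hello-world/"
upsert(uri, {"title": "Hello world"}, source="crawler")
upsert(uri, {"title": "Hello world (edited)"}, source="ingestion")

print(len(index), index[document_id(uri)]["indexed_by"])
```

Because both writes map to the same key, the second one replaces the first instead of adding a new entry, which matches the "whichever method writes last" behavior.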
Next, start the web crawler to index your content, or enable data ingestion for real-time sync.