
Importing data from XML into Opensolr

If you were using Solr's DataImport Handler (DIH), note that it is no longer available starting with Solr 9.x.
Instead, you can write a small script that imports data from XML files into your Opensolr Index:

#!/bin/bash
USERNAME="<OPENSOLR_INDEX_HTTP_AUTH_USERNAME>"
PASSWORD="<OPENSOLR_INDEX_HTTP_AUTH_PASSWORD>"

echo "Starting import on all indexes..."
echo ""

echo "Importing: <YOUR_OPENSOLR_INDEX_NAME>"
echo "Downloading the xml data file"
wget -q "<URL_TO_YOUR_XML_FILE>/<YOUR_XML_FILE_NAME>"
echo "Removing all data"
curl -s -u "$USERNAME:$PASSWORD" "https://<YOUR_OPENSOLR_INDEX_HOSTNAME>/solr/<YOUR_OPENSOLR_INDEX_NAME>/update?commit=true&wt=json&indent=true" -H "Content-Type: text/xml" -d "<delete><query>*:*</query></delete>"
echo ""
echo "Uploading and Importing all data into <YOUR_OPENSOLR_INDEX_NAME>"
curl -u "$USERNAME:$PASSWORD" "https://<YOUR_OPENSOLR_INDEX_HOSTNAME>/solr/<YOUR_OPENSOLR_INDEX_NAME>/update?commit=true&wt=json&indent=true" --progress-bar -H "Content-Type: text/xml" --data-binary "@<YOUR_XML_FILE_NAME>" | tee -a "/dev/null" ; test "${PIPESTATUS[0]}" -eq 0
echo ""
rm -f "<YOUR_XML_FILE_NAME>"
echo "Done!"
echo ""
echo ""
echo ""

To use the script, replace everything inside the <> brackets with your own values: your Opensolr Index name, your Opensolr Index hostname, the URL of your XML file, and so forth. You can find all of this information in your Opensolr Index Control Panel, except for the URL of your XML file, which is hosted somewhere on your end.
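
The script above prints Solr's JSON response but does not inspect it. As a minimal sketch, assuming the response carries the usual "responseHeader" with a "status" of 0 on success, a small helper can check the result. The function name solr_ok and the sample responses are hypothetical and only for illustration:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): succeeds only when
# the Solr JSON response passed as $1 reports "status":0 in its header.
solr_ok() {
  local response="$1"
  # Strip whitespace so indented and compact JSON match the same pattern.
  printf '%s' "$response" | tr -d '[:space:]' | grep -q '"status":0[,}]'
}

# Sample responses, made up for illustration only.
ok_response='{ "responseHeader":{ "status":0, "QTime":12}}'
bad_response='{ "responseHeader":{ "status":400, "QTime":3}, "error":{"msg":"sample error","code":400}}'

if solr_ok "$ok_response"; then echo "import succeeded"; fi
if ! solr_ok "$bad_response"; then echo "import failed"; fi
```

In the import script, you could capture the output of a curl call into a variable, for example response=$(curl -s ...), and then stop the script with solr_ok "$response" || exit 1 when Solr reports an error.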

Your XML file must use the classic Solr XML format.
This article should show you more about the Solr XML Data File format.
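
For a quick picture of that format, here is a minimal sketch that writes a two-document sample file. The file name sample_import.xml and the field names id and title are assumptions for illustration; your own file must use the fields defined in your index schema:

```shell
#!/bin/bash
# Write a tiny sample file in Solr's XML update format.
# File name and field names are placeholders, not values from your index.
cat > sample_import.xml <<'EOF'
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">First sample document</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">Second sample document</field>
  </doc>
</add>
EOF

# Count the documents in the file as a quick sanity check before uploading.
grep -c '<doc>' sample_import.xml
```

Each document sits in its own doc element inside a single add element, and every value goes into a field element whose name attribute matches a field in your schema.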




