Trigger Start the Web Crawler

  1. GET https://opensolr.com/solr_manager/api/start_crawl
  2. Parameters:
    1. email - (required) your opensolr registration email address
    2. api_key - (required) your opensolr api_key
    3. core_name - (required) the name of the core you wish to start the web crawler process for
    4. follow_docs - (optional) follow documents and images? (yes/no). Default: no
    5. clean - (optional) start fresh, or resume from where you left off? (yes/no). Default: no.
    6. relax - (optional) number of microseconds to wait before each request to a new HTTP link. Default: 200000 (0.2 s).
    7. max_threads - (optional) the number of CPU threads to use when crawling.
    8. max_traffic - (optional) maximum Mb to download when crawling through the entire website.
    9. max_filesize - (optional) maximum size of a file that the crawler will download and process.
    10. auth_username - (optional) if your starting URLs are using Basic HTTP Auth, you can enter the username here.
    11. auth_password - (optional) if your starting URLs are using Basic HTTP Auth, you can enter the password here.
    12. mode - (optional) the mode used to follow links. Possible values: 1, 2, 3, or 4. Default: 1.
      1. Follow all DOMAIN links (full site crawl; follows all subdomains *.site.com and all paths *.site.com/*)
      2. Follow all HOST links (limits the crawl to the HOST only, site.com and all its paths; does not follow other subdomains)
      3. Follow all PATH links (limits the crawl to the given PATH; site.com/path/ will not go outside /path/)
      4. Follow the initial URL only, without scraping any other links.
  3. Example: https://opensolr.com/solr_manager/api/start_crawl?email=support@opensolr.com&api_key=63c018d34b0106a6cc8fdab91c5f52b7&core_name=my_solr_core
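
The request described above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_start_crawl_url` and its validation rules are assumptions made for illustration, based only on the parameter list on this page. The response format is not documented here, so the sketch only builds the URL and leaves sending the request to the caller.

```python
from urllib.parse import urlencode

# Endpoint taken from this page.
BASE_URL = "https://opensolr.com/solr_manager/api/start_crawl"

# Optional parameter names as listed on this page.
OPTIONAL = (
    "follow_docs", "clean", "relax", "max_threads", "max_traffic",
    "max_filesize", "auth_username", "auth_password", "mode",
)


def build_start_crawl_url(email, api_key, core_name, **options):
    """Hypothetical helper: return the GET URL that starts the crawler."""
    unknown = set(options) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # mode is documented as one of 1, 2, 3, or 4.
    if "mode" in options and int(options["mode"]) not in (1, 2, 3, 4):
        raise ValueError("mode must be 1, 2, 3, or 4")
    # follow_docs and clean are documented as yes/no flags.
    for flag in ("follow_docs", "clean"):
        if flag in options and options[flag] not in ("yes", "no"):
            raise ValueError(f"{flag} must be 'yes' or 'no'")
    params = {"email": email, "api_key": api_key, "core_name": core_name}
    params.update(options)
    # urlencode percent-encodes reserved characters such as "@".
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_start_crawl_url(
        "support@opensolr.com",
        "63c018d34b0106a6cc8fdab91c5f52b7",
        "my_solr_core",
        clean="yes",    # start fresh instead of resuming
        mode=2,         # stay on the starting host
        relax=500000,   # wait 0.5 s between requests
    )
    print(url)
    # To actually trigger the crawl, send a GET request, for example:
    #   import urllib.request
    #   urllib.request.urlopen(url)
```

Because the credentials travel in the query string, it is safer to build the URL in code than to paste it by hand, and to avoid writing the full URL to shared logs.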