Author Archives: Mark Reyes

About Mark Reyes

Web Developer based out of Southern California.

Resources for Solr 4.5, Nutch 1.7 and AJAX Solr

I’ll be publishing documentation here and on GitHub that shows how to set up an Apache Solr instance, crawl and index a website with Apache Nutch, and finally integrate those results into the front end with AJAX Solr.

For now, here’s a list of resources which have proven to be helpful thus far:

Success with indexing Nutch 1.7 to Solr 4.5

Following up on my Stack Overflow post: the fix was to point both the crawl and the index at the URL of my Solr collection. In this case:


$ bin/nutch crawl urls -solr http://localhost:8983/solr/rockies -depth 1 -topN 5
$ bin/nutch solrindex http://localhost:8983/solr/rockies crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

Additionally, I set -depth to 1 and -topN to 5. The -depth option is the link depth to crawl from the seed page (here, one level of links from the main page), and -topN is the maximum number of pages retrieved at each level.
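To keep the collection URL and the two limits in one place, the crawl command can be assembled from variables. This is only a minimal dry-run sketch: the variable and function names (SOLR_BASE, COLLECTION, crawl_cmd) are hypothetical, the values are the ones from the commands above, and the script prints the command rather than running Nutch.

```shell
#!/bin/sh
# Hypothetical wrapper; names are illustrative, values come from the post.
SOLR_BASE="http://localhost:8983/solr"   # Solr host and port
COLLECTION="rockies"                     # Solr collection to index into
DEPTH=1                                  # link depth from the seed page
TOPN=5                                   # max pages retrieved per level

# Full URL of the collection that Nutch should index into.
SOLR_URL="$SOLR_BASE/$COLLECTION"

# Print the crawl command instead of executing it (dry run).
crawl_cmd() {
  echo "bin/nutch crawl urls -solr $SOLR_URL -depth $DEPTH -topN $TOPN"
}

crawl_cmd
```

Once the printed command looks right, it can be pasted into a shell in the Nutch directory. Raising DEPTH or TOPN widens the crawl, so it seems safer to start small as shown here.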