Cheat Sheet For DB Based SOLR Indexing
- Define data-config.xml (or whatever you have named your data configuration file):
- This file defines how data is read from the RDBMS into the documents to be indexed. Define the SQL for the full import, as well as for subsequent partial imports (called delta imports), in this file.
- How the data that is read gets mapped to fields: map the columns to SOLR fields here.
- Make sure that you test your SQL using your favorite RDBMS client.
- solrconfig.xml: Register the request handler and data-config.xml in solrconfig.xml.
- For example, if you want to expose your db import as dbimport, define a request handler, specify dbimport as the request's URL path, and map the handler to data-config.xml.
- schema.xml should contain all the fields that are defined in the document in data-config.xml. The schema specifies how those fields should be dealt with when documents are added to the index.
- You can define your datasource either in data-config.xml or in solrconfig.xml.
- You can index the data with an HTTP call to http://host:port/solr/dbimport?command=full-import (use whatever path you registered for 'dbimport' in your request handler).
- Make sure that the appropriate JDBC driver is in the lib path of SOLR.
- You can monitor the progress / status at: http://host:port/solr/admin/stats.jsp
- To look inside the index, use the web version of Luke, added as a SOLR plugin: http://host:port/solr/admin/luke (BTW, the best way to look into an index is to install Luke itself and point it at the data dir.)
- Cleanup / Re-index: You can either clean up the SOLR index by issuing a cleanup command on your dbimport, or simply wipe out the contents of the data directory. However, make sure that you really want to do it.
- You can do (very minimal) debugging of the indexing by specifying debug=true in your dbimport command. However, make sure that you also add commit=true, because documents are not committed automatically in debug mode.
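
To make the data-config.xml steps above a bit more concrete, here is a minimal sketch in Python that holds an example configuration as a string and lists its column-to-field mappings. The JDBC driver, URL, table name (item) and column names are all placeholders invented for illustration, not taken from a real setup; the ${dataimporter.last_index_time} and ${dataimporter.delta.id} variables are the ones described on the DataImportHandler wiki page.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-config.xml content. The driver, JDBC URL, table and
# column names are placeholders; adjust them to your own database.
DATA_CONFIG = """\
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="secret"/>
  <document>
    <entity name="item"
            query="SELECT id, name FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name FROM item
                              WHERE id = '${dataimporter.delta.id}'">
      <field column="id" name="id"/>
      <field column="name" name="title"/>
    </entity>
  </document>
</dataConfig>
"""


def field_mappings(config_xml):
    """Return {entity name: {DB column: SOLR field}} for a config string."""
    root = ET.fromstring(config_xml)  # also fails fast if the XML is malformed
    mappings = {}
    for entity in root.iter("entity"):
        mappings[entity.get("name")] = {
            field.get("column"): field.get("name")
            for field in entity.findall("field")
        }
    return mappings


if __name__ == "__main__":
    # Every SOLR field printed here should also exist in schema.xml.
    print(field_mappings(DATA_CONFIG))
```

Checking the mapping this way is only a convenience; the SQL itself still needs to be tested in your RDBMS client.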
For details: http://wiki.apache.org/solr/DataImportHandler
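
The import URLs mentioned in the list can be assembled with a small helper like the one below. This is only a sketch: the function name dih_url is made up, localhost and port 8983 are assumed values, and dbimport stands for whatever path you registered for your request handler. The delta-import and status commands are taken from the DataImportHandler wiki page rather than from the notes above.

```python
from urllib.parse import urlencode


def dih_url(host, port, handler="dbimport", command="full-import", **params):
    """Build a DataImportHandler URL (hypothetical helper, not part of SOLR).

    Extra keyword arguments such as debug="true" or commit="true" are
    appended as request parameters in the order they are given.
    """
    query = {"command": command}
    query.update(params)
    return f"http://{host}:{port}/solr/{handler}?{urlencode(query)}"


if __name__ == "__main__":
    # Assumed host and port; replace with your own.
    print(dih_url("localhost", 8983))
    print(dih_url("localhost", 8983, command="delta-import"))
    print(dih_url("localhost", 8983, command="status"))
    # Debug run that also commits the documents.
    print(dih_url("localhost", 8983, debug="true", commit="true"))
    # To actually trigger the import, fetch the URL, for example with
    # urllib.request.urlopen(url) against a running SOLR instance.
```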