Khanderao on Emerging And Integration Technologies

Monday, October 08, 2012

Cheat Sheet For DB Based SOLR Indexing

  1. Define data-config.xml (or whatever you name your data configuration file):
    1. This file defines how data is read from the RDBMS into the documents to be indexed. Define the SQL for the full import, as well as for subsequent partial imports (called delta imports), in this file.
    2. It also defines how the data read gets mapped to fields: map the columns to Solr fields here.
    3. Make sure you test your SQL in your favorite RDBMS client.
  2. solrconfig.xml: Register a request handler in solrconfig.xml and point it to data-config.xml.
    1. For example, if you want your DB import to be reachable as dbimport, define a request handler named /dbimport (class org.apache.solr.handler.dataimport.DataImportHandler) and set its config parameter to data-config.xml.
  3. schema.xml should contain all the fields defined for the document in data-config.xml. The schema specifies how those fields are dealt with when documents are added to the index.
  4. You can define your data source either in data-config.xml or in solrconfig.xml.
  5. You can index the data by invoking http://host:port/solr/dbimport?command=full-import over HTTP (use whatever path you configured for 'dbimport' in your request handler).
  6. Make sure the appropriate JDBC driver is in Solr's lib path.
  7. You can monitor the progress/status at: http://host:port/solr/admin/stats.jsp
  8. To look inside the index, use the web version of Luke added as a Solr plugin: http://host:port/solr/admin/luke. BTW, the best way to look into the indexes is to install Luke and point it at the data directory.
  9. Cleanup/re-index: You can either clean up the Solr index by running your dbimport with clean=true, or simply wipe out the contents of the data directory. However, make sure you really want to do this.
  10. You can debug indexing (very minimally) by specifying debug=true in your dbimport command. However, make sure you also add commit=true, since in debug mode the import does not commit documents by default.
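To tie steps 1 to 5, 9 and 10 together, here is a minimal Python sketch (standard library only) that emits a bare-bones data-config.xml and builds the DIH command URLs. The table `item`, its columns, the MySQL JDBC URL and the localhost:8983 address are hypothetical placeholders. The element and attribute names (dataSource, entity, query, deltaQuery, deltaImportQuery, field) and the request parameters (command, clean, commit, debug) are the ones DataImportHandler uses, but please check them against the DIH documentation for your Solr version.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_data_config(jdbc_driver: str, jdbc_url: str) -> str:
    """Return a minimal data-config.xml with full and delta import SQL.

    The 'item' table and its columns are hypothetical placeholders.
    """
    root = ET.Element("dataConfig")
    # The data source could instead be declared in solrconfig.xml (step 4).
    # Add user/password attributes as your database requires.
    ET.SubElement(root, "dataSource", type="JdbcDataSource",
                  driver=jdbc_driver, url=jdbc_url)
    doc = ET.SubElement(root, "document")
    entity = ET.SubElement(
        doc, "entity", name="item", pk="id",
        # Full import: read every row.
        query="SELECT id, name, last_modified FROM item",
        # Delta import, part 1: keys of rows changed since the last run.
        deltaQuery="SELECT id FROM item "
                   "WHERE last_modified > '${dataimporter.last_index_time}'",
        # Delta import, part 2: fetch each changed row by its key.
        deltaImportQuery="SELECT id, name, last_modified FROM item "
                         "WHERE id = '${dataimporter.delta.id}'",
    )
    # Map DB columns to Solr fields; each field name must exist in schema.xml.
    for column, field in [("id", "id"), ("name", "name"),
                          ("last_modified", "last_modified")]:
        ET.SubElement(entity, "field", column=column, name=field)
    return ET.tostring(root, encoding="unicode")


def dih_url(base: str, command: str, **params: str) -> str:
    """Build a DIH command URL such as ...?command=full-import&commit=true."""
    query = urlencode({"command": command, **params})
    return f"{base.rstrip('/')}?{query}"


if __name__ == "__main__":
    print(build_data_config("com.mysql.jdbc.Driver",
                            "jdbc:mysql://localhost:3306/mydb"))
    base = "http://localhost:8983/solr/dbimport"  # hypothetical host/port
    print(dih_url(base, "full-import"))                              # step 5
    print(dih_url(base, "delta-import"))                             # partial import
    print(dih_url(base, "full-import", clean="true", commit="true"))  # step 9
    print(dih_url(base, "full-import", debug="true", commit="true"))  # step 10
```

Running it prints the XML followed by URLs such as http://localhost:8983/solr/dbimport?command=full-import&clean=true&commit=true, which you can open in a browser or with any HTTP client.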
For details:

Thursday, October 04, 2012

Reclaiming Space from Dropped Big Tables in MySQL

So, in my earlier post, I mentioned the need to dynamically resize (increase) an EBS volume on EC2. Here is how I landed in that situation. In the prototype, my database grew very large, and I could not reclaim MySQL's InnoDB space even after dropping large tables or even whole databases. The ibdata1 file seems to be greedy and never gives space back. There must be a good technical reason why MySQL does not provide a utility to release unused space.

Anyhow, here are the steps for reclaiming the space. Disclaimer: as you know, I am not a DBA, but I have to do what I have to do:

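1. Take a SQL dump of all databases (e.g. mysqldump --all-databases). Deleting ibdata1 in step 3 destroys every InnoDB table on the server, not just the big ones.

2. Shut down MySQL.

3. Delete ibdata1, ib_logfile0 and ib_logfile1 from the filesystem.

4. Edit my.cnf (/etc/my.cnf) and add innodb_file_per_table to the [mysqld] section.
    With this setting, each table's data is stored in its own .ibd file and ibdata1 keeps only shared data such as the data dictionary and undo logs, so dropping a table deletes its file and returns the space to the operating system.

5. Start mysqld.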

6. Reload the data dump.
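Step 4 is the part that is easiest to get subtly wrong, since the option has to sit under the [mysqld] group. Below is a small Python sketch that adds innodb_file_per_table to the text of a my.cnf. The sample file contents are hypothetical, and the function only transforms text; copying the result over /etc/my.cnf (while mysqld is stopped, per step 2) is left to you.

```python
import re

OPTION = "innodb_file_per_table"


def ensure_file_per_table(cnf_text: str) -> str:
    """Return cnf_text with innodb_file_per_table enabled in [mysqld].

    Edits the text line by line (instead of round-tripping through
    configparser) so existing comments and ordering are preserved.
    """
    lines = cnf_text.splitlines()
    section = None
    insert_at = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        header = re.fullmatch(r"\[([^\]]+)\]", stripped)
        if header:
            section = header.group(1).strip().lower()
            if section == "mysqld":
                insert_at = i + 1  # first line after the [mysqld] header
            continue
        # MySQL option files accept '-' and '_' interchangeably in names.
        key = re.split(r"[=\s]", stripped, maxsplit=1)[0].replace("-", "_")
        if section == "mysqld" and key == OPTION:
            # Already listed (possibly as =0 or =OFF): normalize to enabled.
            lines[i] = OPTION
            return "\n".join(lines) + "\n"
    if insert_at is None:
        # No [mysqld] group yet: append one at the end of the file.
        lines += ([""] if lines else []) + ["[mysqld]", OPTION]
    else:
        lines.insert(insert_at, OPTION)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample my.cnf contents.
    sample = "[client]\nport=3306\n\n[mysqld]\ndatadir=/var/lib/mysql\n"
    print(ensure_file_per_table(sample), end="")
```

For the sample above, the option is inserted right after the [mysqld] header, ahead of datadir; running the function again on its own output leaves the file unchanged.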


Monday, October 01, 2012

Need Support for Dynamically Increasing the Size of an EBS Volume on a Running EC2 Instance

Recently I started a prototype involving Big Data processing on EC2. I started with a "guesstimated" size for the EBS volume. As with any POC guesstimate, I was wrong: the data very soon outgrew it and I needed more space. But I realized that we cannot dynamically increase the size of the EBS volume of a running instance. That seems like a problem to me. In my opinion, in today's world of virtualization and the pay-as-you-go model, vertical and linear scaling should be possible without any downtime.

Anyway, since this was a POC, I could afford a short downtime. BTW, the process for increasing the volume size is not that difficult. Here are the instructions from another blogger:
