Khanderao on Emerging And Integration Technologies: October 2012

Cheat Sheet For DB Based SOLR Indexing

Define data-config.xml (or whatever you have named your data configuration file):

This file defines how to read data from the RDBMS into the documents to be indexed. So, define your SQL for the full import as well as for subsequent partial imports (called delta imports) in this file.
It also defines how the data read gets mapped to fields: map your columns to SOLR fields here.
Make sure that you test your SQL using your favorite RDBMS client.
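
As a rough illustration of the points above, here is a minimal Python sketch that builds a data-config.xml with a single entity. The table name (item), the column and field names, the JDBC URL and the credentials are all made up for illustration. The deltaQuery / deltaImportQuery attributes and the ${dataimporter...} variables are taken from the DataImportHandler wiki, so please check them against your SOLR version.

```python
import xml.etree.ElementTree as ET


def build_data_config():
    """Build a minimal DataImportHandler config and return it as an XML string."""
    root = ET.Element("dataConfig")

    # Assumed connection details; replace with your own database settings.
    ET.SubElement(root, "dataSource", {
        "driver": "com.mysql.jdbc.Driver",
        "url": "jdbc:mysql://localhost/mydb",
        "user": "solr_reader",
        "password": "changeme",
    })

    document = ET.SubElement(root, "document")
    entity = ET.SubElement(document, "entity", {
        "name": "item",  # hypothetical table
        "pk": "id",
        # SQL for the full import.
        "query": "SELECT id, title, last_modified FROM item",
        # SQL that finds the ids of rows changed since the last import.
        "deltaQuery": "SELECT id FROM item "
                      "WHERE last_modified > '${dataimporter.last_index_time}'",
        # SQL that fetches one changed row during a delta import.
        "deltaImportQuery": "SELECT id, title, last_modified FROM item "
                            "WHERE id = '${dataimporter.delta.id}'",
    })

    # Map database columns to SOLR fields (these must also exist in schema.xml).
    for column, field in [("id", "id"), ("title", "title"),
                          ("last_modified", "last_modified")]:
        ET.SubElement(entity, "field", {"column": column, "name": field})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_data_config())
```

Generating the file this way is only a convenience for checking that the XML is well-formed; writing the same elements by hand works just as well.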

solrconfig.xml: Register the request handler and data-config.xml in solrconfig.xml

For example, if your db import is called dbimport, you can define a request handler, specify the request's URL (for example /dbimport) and map it to data-config.xml
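
To make the registration concrete, here is a small Python sketch that builds the requestHandler fragment for solrconfig.xml. The handler class name and the "config" default come from the DataImportHandler wiki; the /dbimport path and the data-config.xml file name are just the example names used in this post.

```python
import xml.etree.ElementTree as ET


def build_request_handler(path="/dbimport", config_file="data-config.xml"):
    """Build the <requestHandler> fragment that goes inside <config> in solrconfig.xml."""
    handler = ET.Element("requestHandler", {
        "name": path,  # URL path on which the import is served
        "class": "org.apache.solr.handler.dataimport.DataImportHandler",
    })
    defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
    config = ET.SubElement(defaults, "str", {"name": "config"})
    config.text = config_file  # points the handler at the data configuration file
    return ET.tostring(handler, encoding="unicode")


if __name__ == "__main__":
    print(build_request_handler())
```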

schema.xml should contain all the fields that are defined in the document in data-config.xml. The schema specifies how those fields should be dealt with when adding documents to the index.
You can define your datasource either in data-config.xml or in solrconfig.xml.
You can index the data by an HTTP invocation of http://host:port/solr/dbimport?command=full-import (please note that you should use whatever path you specified for 'dbimport' in your request handler).
Please make sure that the appropriate JDBC driver is in the lib path of SOLR.
You can monitor the progress / status at: http://host:port/solr/admin/stats.jsp
To look inside the index, use the web version of Luke that comes as a SOLR plugin: http://host:port/solr/admin/luke BTW the best way to look into indexes would be to install Luke and point it to the data dir.
Cleanup / Re-index: You can either clean up the SOLR indexes by issuing a cleanup command on your dbimport or simply wipe out the contents of the data directory. However, make sure that you really want to do it.
You can debug (very minimally) indexing by specifying debug=true in your dbimport command. However, make sure that you also add commit=true, since debug mode does not commit the documents automatically.
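
The HTTP invocations above can be scripted. Below is a minimal Python sketch that builds the dbimport URLs and, optionally, calls them. The base URL (localhost:8983 and the /solr/dbimport path) is an assumption; use your own host, port and handler path. The command names (full-import, delta-import, status) and the clean, commit and debug parameters are the ones described on the DataImportHandler wiki.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_url(base, command, **params):
    """Return the dbimport URL for a command plus optional parameters."""
    query = {"command": command}
    query.update(params)
    return base + "?" + urlencode(query)


def run_command(base, command, **params):
    """Call the handler over HTTP and return the raw response body."""
    with urlopen(dih_url(base, command, **params)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    base = "http://localhost:8983/solr/dbimport"  # assumed host, port and path
    # Only print the URLs here; call run_command() once SOLR is actually running.
    print(dih_url(base, "full-import", clean="true", commit="true"))
    print(dih_url(base, "full-import", debug="true", commit="true"))
    print(dih_url(base, "delta-import"))
    print(dih_url(base, "status"))
```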

For details: http://wiki.apache.org/solr/DataImportHandler

Labels: Indexing, SOLR

Reclaiming Space from Deleted Big Tables from MySQL

So, in my earlier post, I mentioned the need to dynamically resize (increase) an EBS volume on EC2. Here is how I landed in that situation. In the prototype, my database grew very large and I could not reclaim the InnoDB space of MySQL even after dropping large tables or even a whole database. The ibdata1 file seems to be greedy and never gives space back. There must be a good technical reason why MySQL does not provide a utility to release unused space.

Anyhow, here are the steps for reclaiming the space. Disclaimer: As you know, I am not a DBA, but I have to do what I have to do:

1. Take a SQL dump (mysqldump) of the entire db

2. Shutdown mysql

3. Delete (from the filesystem) ibdata1, ib_logfile0 and ib_logfile1

4. Edit my.cnf (/etc/my.cnf): add innodb_file_per_table
With this param, each table's data is stored in its own file and only metadata resides in ibdata1

5. Start mysqld

6. Reload the data dump.
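
The six steps above can be written down as a dry-run plan. This Python sketch only prints the commands and never executes anything. The dump file location, the /var/lib/mysql data directory, the root user and the "service mysqld" commands are assumptions about a typical Linux install, so adjust them to your setup and review every command before running it by hand.

```python
import shlex


def reclaim_plan(dump_file="/tmp/all_databases.sql", data_dir="/var/lib/mysql"):
    """Return (description, command) pairs for each step; command is None for manual steps."""
    innodb_files = [f"{data_dir}/{name}"
                    for name in ("ibdata1", "ib_logfile0", "ib_logfile1")]
    dump = shlex.quote(dump_file)
    return [
        ("1. Dump every database to one SQL file",
         f"mysqldump --all-databases -u root -p > {dump}"),
        ("2. Shut down MySQL (service name is an assumption)",
         "service mysqld stop"),
        ("3. Delete the shared tablespace and the log files",
         "rm " + " ".join(shlex.quote(path) for path in innodb_files)),
        ("4. Add innodb_file_per_table under [mysqld] in /etc/my.cnf (manual edit)",
         None),
        ("5. Start MySQL again",
         "service mysqld start"),
        ("6. Reload the dump",
         f"mysql -u root -p < {dump}"),
    ]


if __name__ == "__main__":
    for description, command in reclaim_plan():
        print(description)
        if command is not None:
            print("    " + command)
```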

Labels: mysql

Need a support to dynamically increase size of EBS volume of EC2 running instance

Recently I started a prototype involving Big Data processing on EC2. I started with a "guesstimated" size for the EBS volume. As with any POC / guesstimate, I was wrong: very soon the data grew and I needed more space. But I realized that we cannot dynamically increase the size of an EBS volume attached to a running instance. That seems like a problem to me. In my opinion, in today's world of virtualization and the pay-as-you-use model, vertical and linear scaling should not require any downtime.

Anyway, since I was doing this as a POC, I could afford a small downtime. BTW the process for increasing the volume size is not that difficult. Here are the instructions from another blogger: http://www.e-zest.net/blog/simple-steps-to-change-size-of-ebs-volume-in-ec2-of-aws-using-aws-console/

Khanderao on Emerging And Integration Technologies

Monday, October 08, 2012

Cheat Sheet For DB Based SOLR Indexing

Thursday, October 04, 2012

Reclaiming Space from Deleted Big Tables from MySQL

Monday, October 01, 2012

Need a support to dynamically increase size of EBS volume of EC2 running instance
