Tuesday, 22 May 2012

How to index data in Solr from a database automatically?

Most applications store their data in relational databases, and once the data grows large, Solr becomes an option. You can index the database data using the Solr DataImportHandler (DIH), which requires you to configure data-config.xml. All the required information about setting up Solr and data-config.xml is available at
http://abhijitbashetti.blogspot.in/2011/09/apache-solr-set-up-for-tomcat-on-linux.html and

the other useful link is http://abhijitbashetti.blogspot.in/2011/09/indexing-database-with-solr-34-from.html

Now the question is how to automate this. I used JBoss to schedule it.

In data-config.xml, use last_index_time as a variable that is resolved from the request. If the table you fetch the data from has no column recording when a row was last changed, add one, e.g. last_modification_date.

Each subsequent scheduled run, which updates the index with the data that changed in your database, compares last_modification_date with last_index_time and updates the index accordingly.

Here you can use the JBoss scheduling mechanism to fire a different URL at your Solr server on each run.

That is, add the variable to your query and send its value to Solr in the HTTP URL.

Your database query would look like this:



select  doc.document_id as id,
        doc.name as name,
        doc.author as author
        from   ds_document_c doc
        where doc.last_modification_date >= to_date(${dataimporter.request.last_index_time}, 'DD/MM/YYYY HH24:MI:SS');


and the HTTP URL for Solr would look like this:

http://localhost:8080/solr/select?qt=/dataimport&command=full-import&clean=false&commit=true&verbose=true&last_index_time='12/05/2012'
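As a rough sketch of what the scheduled job could do on each run, the Java below builds that URL for a given timestamp and sends it to Solr. The class and method names (SolrImportTrigger, buildImportUrl, trigger) are hypothetical and not part of Solr or JBoss. The timestamp is wrapped in single quotes and URL-encoded, on the assumption that the query substitutes the request parameter directly into to_date(...).

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: illustrative names only, not a Solr or JBoss API.
final class SolrImportTrigger {

    // Java equivalent of the 'DD/MM/YYYY HH24:MI:SS' mask used in the SQL query.
    static final String DATE_PATTERN = "dd/MM/yyyy HH:mm:ss";

    // Builds the import URL with the same parameters as the example URL.
    // solrBaseUrl is assumed to look like "http://localhost:8080/solr".
    static String buildImportUrl(String solrBaseUrl, Date lastIndexTime) {
        String stamp = new SimpleDateFormat(DATE_PATTERN, Locale.ROOT).format(lastIndexTime);
        try {
            // Quote the value so it arrives in the query as a SQL string literal,
            // then URL-encode it because it contains quotes, slashes and a space.
            String quoted = URLEncoder.encode("'" + stamp + "'", "UTF-8");
            return solrBaseUrl
                    + "/select?qt=/dataimport&command=full-import"
                    + "&clean=false&commit=true&verbose=true"
                    + "&last_index_time=" + quoted;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Sends the GET request and returns the HTTP status code of the response.
    static int trigger(String importUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(importUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A JBoss scheduled job, or any other timer, could call buildImportUrl with the start time of the previous successful run and pass the result to trigger. How that time is stored between runs is left open here.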


This will help you automate your database indexing.










