Thursday 29 September 2011

SOLR Oracle data-importer with variableResolver in data-config.xml


Most applications store data in relational databases like mysql , oracle , db2 ...and searching over such data is a common use-case. The DataImportHandler is a Solr contrib that provides a configuration driven way to import this data into Solr. For the same we will create the data-config.xml. The data-config.xml will have the variable in the query.



The data-config for MySql will look like this :

<dataConfig>
<dataSource name="ds-db" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/dbname" user="root" password="root"/>
<dataSource name="ds-file" type="BinFileDataSource"/>
<document name="documents">
<entity name="book" dataSource="ds-db" query="select distinct
book.id as id,
book.title,
book.author,
book.publisher,
from Books book
where book.book_added_date >= to_date($ {dataimporter.request.lastIndexDate}, 'DD/MM/YYYY HH24:MI:SS')))"

transformer="DateFormatTransformer">
<field column=”id” name=”id”/>
<field column=”title” name=”title”/>
<field column=”author” name=”author”/>
<field column=”publisher” name=”publisher”/>
<entity name=”content” query=”select description from content where content_id='${book.id}'”>
<field column=”description” name=”description”/>
</entity>
</entity>
</document>
</dataConfig>

In the url you need to pass the variable resolver with value.
The url to start the data-import in this case will be :
http://localhost:8080/solr/admin/select/?qt=/dataimport&command=full-import&clean=false&commit=true&lastIndexDate='08/05/2011 20:16:11'

For the first time indexing you need pass “lastIndexDate=null”.


The data-config for Oracle will look like this :

<dataConfig>
<dataSource name="ds-db" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@127.0.0.1:1521:test" user="dev" password="dev"/>
<dataSource name="ds-file" type="BinFileDataSource"/>
<document name="documents">
<entity name="book" dataSource="ds-db" query="select distinct
book.id as id,
book.title,
book.author,
book.publisher,
from Books book
where book.book_added_date >= to_date($ {dataimporter.request.lastIndexDate}, 'DD/MM/YYYY HH24:MI:SS')))"

transformer="DateFormatTransformer">
<field column=”id” name=”id”/>
<field column=”title” name=”title”/>
<field column=”author” name=”author”/>
<field column=”publisher” name=”publisher”/>
<entity name=”content” query=”select description from content where content_id='${book.Id}'”>
<field column=”description” name=”description”/>
</entity>
</entity>
</document>
</dataConfig>

The change here in data-config.xml for oracle id ${book.Id} and not the ${book.id}. It took me long time to find out this by debugging.






No comments:

Post a Comment