Most applications store data in relational databases like mysql , oracle , db2 ...and searching over such data is a common use-case. The DataImportHandler is a Solr contrib that provides a configuration driven way to import this data into Solr. For the same we will create the data-config.xml. The data-config.xml will have the variable in the query.
The data-config for MySql will look like this :
<dataConfig>
 <dataSource
name="ds-db" driver="com.mysql.jdbc.Driver"    
   url="jdbc:mysql://localhost/dbname" user="root"
password="root"/>
 <dataSource
name="ds-file" type="BinFileDataSource"/>
  <document
name="documents">
   <entity
name="book" dataSource="ds-db" query="select
distinct
    book.id
as id,
    book.title,
    book.author,
    book.publisher,
    from
Books book
    where
book.book_added_date >= to_date($
        {dataimporter.request.lastIndexDate}, 'DD/MM/YYYY
HH24:MI:SS')))"  
    transformer="DateFormatTransformer">
    <field
column=”id” name=”id”/>
    <field
column=”title” name=”title”/>
    <field
column=”author” name=”author”/>
    <field
column=”publisher” name=”publisher”/>
   <entity
name=”content” query=”select description from content where
     content_id='${book.id}'”>
    <field
column=”description” name=”description”/>
   </entity>
   </entity>
  </document>
</dataConfig>
In
the url you need to pass the variable resolver with value.
The
url to start the data-import in this case will be : 
http://localhost:8080/solr/admin/select/?qt=/dataimport&command=full-import&clean=false&commit=true&lastIndexDate='08/05/2011
20:16:11'
For
the first time indexing you need pass “lastIndexDate=null”.
The
data-config for Oracle will look like this : 
<dataConfig>
 <dataSource
name="ds-db" driver="oracle.jdbc.driver.OracleDriver"
       url="jdbc:oracle:thin:@127.0.0.1:1521:test"
user="dev" password="dev"/>
 <dataSource
name="ds-file" type="BinFileDataSource"/>
  <document
name="documents">
   <entity
name="book" dataSource="ds-db" query="select
distinct
    book.id
as id,
    book.title,
    book.author,
    book.publisher,
    from
Books book
    where
book.book_added_date >= to_date($
        {dataimporter.request.lastIndexDate}, 'DD/MM/YYYY
HH24:MI:SS')))"  
    transformer="DateFormatTransformer">
    <field
column=”id” name=”id”/>
    <field
column=”title” name=”title”/>
    <field
column=”author” name=”author”/>
    <field
column=”publisher” name=”publisher”/>
   <entity
name=”content” query=”select description from content where
     content_id='${book.Id}'”>
    <field
column=”description” name=”description”/>
   </entity>
   </entity>
  </document>
</dataConfig>
The
change here in data-config.xml for oracle id ${book.Id}
and not the ${book.id}.
It took me long time to find out this by debugging.
 
No comments:
Post a Comment