- Download Tomcat 5.5.33 from here.
- Install Tomcat (no special instructions here; just run the installer and select the directory where you wish to install it).
- Start Tomcat by running startup.sh in the bin directory.
- Verify the installation of Tomcat by going to http://localhost:8080
- Download Solr from one of the mirrors found here (I downloaded the apache-solr-3.4.0-src.tgz package) and unpack it, e.g. Solr is extracted to /home/abashetti/Downloads/apache-solr-3.4.0/
- Open a terminal and go to the extracted Apache Solr folder, e.g. cd /home/abashetti/Downloads/apache-solr-3.4.0/solr
- Create the Solr war by running the ant commands ant clean, ant compile and ant dist.
- ant dist creates the *solr*.war in the */solr/dist/ folder, e.g. the path for the war file is /home/abashetti/Downloads/apache-solr-3.4.0/solr/dist.
- To use the data import functionality, add the apache-solr-dataimporthandler and apache-solr-dataimporthandler-extras jars to the Solr lib folder.
- The apache-solr-dataimporthandler and apache-solr-dataimporthandler-extras jars are available at */apache-solr-3.4.0/solr/contrib/dataimporthandler/target/, e.g. I copied the jar files from /home/abashetti/Downloads/apache-solr-3.4.0/solr/contrib/dataimporthandler/target/ to the Solr lib path /home/abashetti/Downloads/apache-solr-3.4.0/solr/lib.
- Build the source code of Apache Tika using Maven. For Maven setup, read here.
- Copy the jar files named tika-app, tika-bundle, tika-core and tika-parsers from target to the Solr lib folder. In my case the Solr lib path is /home/abashetti/Downloads/apache-solr-3.4.0/solr/lib.
- Create the Solr war again after adding the jars by running ant clean, ant compile and ant dist.
- Create a directory for Solr. This is the SOLR HOME, where Solr will be hosted from (e.g. /home/abashetti/Downloads/solr).
- Copy the files and folders from /home/abashetti/Downloads/apache-solr-3.4.0/solr/example/solr to your SOLR HOME, e.g. the destination path is /home/abashetti/Downloads/solr/.
- Visit http://localhost:8080/solr/admin to make sure everything is still running.
- Go to the path /home/abashetti/Downloads/apache-solr-3.4.0/solr/example/solr/conf.
- Create a file data-config.xml. Add the database connection information and the query to this file.
- Configure the data sources in data-config.xml:
<dataConfig>
  <!-- JDBC connection used by the SQL entity -->
  <dataSource name="ds-db"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@127.0.0.1:1521:test" user="root"
              password="root"/>
  <!-- Binary file source read by Tika -->
  <dataSource name="ds-file"
              type="BinFileDataSource"/>
  <document name="documents">
    <!-- One row per document; lastIndexDate comes from the import request -->
    <entity name="document" dataSource="ds-db"
            query="select distinct
                     doc.document_id as id,
                     doc.title,
                     doc.author,
                     doc.publisher,
                     (case when doc.content_format_code not
                       in('doc','pdf','xml','txt','ppt','xls') then
                       ( select path.document_path from document_path
                         path where path.doc_id = doc.id )
                      else
                       ''
                      end) contentpath
                   from ds_document_c doc
                   where
                     doc.index_state_modification_date >= to_date('${dataimporter.request.lastIndexDate}', 'DD/MM/YYYY HH24:MI:SS')"
            transformer="DateFormatTransformer">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="author" name="author"/>
      <field column="publisher" name="publisher"/>
      <!-- Nested inside "document" so that ${document.CONTENTPATH} resolves against the parent row -->
      <entity name="textEntity"
              processor="TikaEntityProcessor"
              url="${document.CONTENTPATH}" dataSource="ds-file"
              format="text" onError="continue">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
Substitute the database username and password with your database credentials.
- Add the location of data-config.xml to solrconfig.xml under the DataImportHandler section:
<requestHandler
name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
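The query in data-config.xml reads ${dataimporter.request.lastIndexDate}, which DataImportHandler fills from a request parameter named lastIndexDate. As a minimal sketch, assuming Solr ends up deployed at http://localhost:8080/solr as in this post, a small shell helper can build the request URLs for the /dataimport handler registered above. The name dih_url and the sample date are illustrative only. The value is substituted into the query as-is, so whether it needs surrounding quotes depends on how the query is written.

```shell
#!/bin/sh
# dih_url is a hypothetical helper, not part of Solr.
# $1 = Solr base URL
# $2 = DataImportHandler command (for example full-import or status)
# $3 = optional value for the lastIndexDate request parameter
dih_url() {
    url="$1/dataimport?command=$2"
    if [ -n "${3:-}" ]; then
        # Spaces are not allowed in a URL, so encode them as %20.
        url="$url&lastIndexDate=$(printf '%s' "$3" | sed 's/ /%20/g')"
    fi
    printf '%s\n' "$url"
}

# Example usage once Solr is running in Tomcat (not executed here):
#   curl "$(dih_url http://localhost:8080/solr full-import '01/01/2011 00:00:00')"
#   curl "$(dih_url http://localhost:8080/solr status)"
```

Building the URL in a function keeps the encoding in one place; the curl calls are left as comments because they only work after the deployment steps further down.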
- Edit the schema.xml file. The schema.xml file contains all of the details about which fields your documents can contain, and how those fields should be dealt with when adding documents to the index, or when querying those fields.
<field name="id" type="integer" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="author" type="string" indexed="true" stored="true" />
<field name="publisher" type="string" indexed="true" stored="true" />
<field name="text" type="text" indexed="true" stored="true" />
Find the “<uniqueKey>” node and change it to:
<uniqueKey>id</uniqueKey>
Find the “<defaultSearchField>” node and change it to:
<defaultSearchField>text</defaultSearchField>
Delete all the “<copyField>” nodes.
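The schema edits above can be sanity-checked with a short shell function. This is only a sketch: check_schema is a hypothetical helper, and it does plain text matching with grep, so a commented-out copyField node would still be reported.

```shell
#!/bin/sh
# check_schema is a hypothetical helper, not part of Solr.
# $1 = path to schema.xml
# Returns 0 when the three edits described above appear to be in place.
check_schema() {
    grep -q '<uniqueKey>id</uniqueKey>' "$1" || {
        echo "uniqueKey is not id"
        return 1
    }
    grep -q '<defaultSearchField>text</defaultSearchField>' "$1" || {
        echo "defaultSearchField is not text"
        return 1
    }
    # Any remaining copyField line counts, even inside an XML comment.
    if grep -q '<copyField' "$1"; then
        echo "copyField nodes are still present"
        return 1
    fi
    echo "schema.xml looks consistent"
}

# Example usage (path taken from this post, adjust to your layout):
#   check_schema /home/abashetti/Downloads/solr/conf/schema.xml
```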
- Copy the *solr*.war file from the dist directory in the unzipped SOLR package to your Tomcat webapps folder.
- Rename the *solr*.war file to solr.war.
- Specify the Solr home in catalina.sh:
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/abashetti/Downloads/solr"
- Add the above line just below the line JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
- Copy the ojdbc6.jar file to */apache-tomcat-5.5.33/common/lib
- Now go to http://localhost:8080/solr/admin/dataimport.jsp and click on the /DATAIMPORT link. You will see the data importer console. Click the “Full Import With Cleaning” button to start indexing. Click the status button to see the progress of the indexing: while indexing is in progress the status is “busy”; otherwise it shows “Indexing completed for “number” of documents”.
- Once the indexing is completed, go to http://localhost:8080/solr/admin and click on the search button to check the result.
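The copy steps in this walkthrough (DataImportHandler jars into the Solr lib folder, example files into the Solr home, war into the Tomcat webapps folder) can be sketched as shell functions. This is a rough outline under assumptions, not a tested installer: the function names are hypothetical, the jar file names are matched with a glob because the exact version suffix is assumed, and all directories are passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helpers; nothing runs until a function is called.

# Copy the DataImportHandler jars from the contrib build output into Solr's lib.
# $1 = Solr source directory (the one containing contrib/, lib/ and dist/)
copy_dih_jars() {
    cp "$1"/contrib/dataimporthandler/target/apache-solr-dataimporthandler*.jar "$1/lib/"
}

# Populate a new Solr home from the example configuration.
# $1 = Solr source directory, $2 = Solr home to create
create_solr_home() {
    mkdir -p "$2" && cp -R "$1/example/solr/." "$2/"
}

# Deploy the first war found in dist/ into Tomcat under the name solr.war.
# $1 = Solr source directory, $2 = Tomcat directory
deploy_war() {
    for war in "$1"/dist/*solr*.war; do
        cp "$war" "$2/webapps/solr.war"
        return
    done
}

# Example usage with the paths from this post (not executed here):
#   SRC=/home/abashetti/Downloads/apache-solr-3.4.0/solr
#   copy_dih_jars "$SRC"
#   create_solr_home "$SRC" /home/abashetti/Downloads/solr
#   deploy_war "$SRC" /path/to/apache-tomcat-5.5.33
```

The ant build, the Tika jars and the JAVA_OPTS edit in catalina.sh are left out of the sketch and still need to be done by hand as described above.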
There's no folder named target. Help, please.
I have used Solr 3.4, and this setup is for that version. Which version are you using?
If you are using Solr 4.3.1, the data import handler jars are available at the path "*/solr-4.3.1/solr/build/contrib".
Are you still facing the same issue?
Which version of Solr are you using?