README.txt in Apache Solr Attachments 6.3

Same filename and directory in other branches

Apache Solr Attachments for 7.x

Requires the ability to run java and installation of tika 0.3 or higher,
or access to a solr server set up for content extraction (e.g. a Solr
1.4 final release).  For Solr, there is a patch to apply to the solrconfig
to add another request handler.

see:
http://lucene.apache.org/tika/gettingstarted.html
http://lucene.apache.org/tika/formats.html

Tika will extract many file formats, including PDFs, MS Office (2003 format
as well as new docx format).  Java 6 (aka 1.6) may be needed on some
platforms to support all formats.  The page on formats seems not to be 100%
up to date.  In particular, https://issues.apache.org/jira/browse/TIKA-152
is committed, so it does currently support MS Office 2007 documents to
some reasonable degee.

The easiest-to-find pre-built Tika app is available from the download page:
http://tika.apache.org/download.html

e.g. tika-app-1.1.jar

You can copy/move the jar to somewhere convenient, though it's probably a good idea
to keep it outside your docroot.

Solr 1.4.1 uses tika 0.4, 1.4.2-dev uses tika 0.7, and Solr 3.5.0 uses 0.10,
and newer releases of Sorl are likey to use 1.1+.  Note that the version
numbering changed, so 0.1 was followed by 1.0 and 1.1.

If you need to build tika from source using maven (mvn).  Get the tika
source from:
http://lucene.apache.org/tika/download.html

You may need to increase the memory for java/mvn using (for example):
export MAVEN_OPTS="-Xmx1024m -Xms512m"

mvn install

will build the full set of tika applications - it will build the app jar
in a location like tika-app/target/tika-app-1.1.jar

Copy tika-app-1.1.jar from there or point the module path to it.

See also build instructions at: http://drupal.org/node/540974#comment-1944082

If you are using Solr to extract your content, you need to copy (or symlink)
the contents of contrib/extraction/lib to a directory named lib under your
solr home, or alter solrconfig.xml to add the orgiginal directory as a
lib directory.

File

README.txt

View source


Apache Solr Attachments for 7.x

Requires the ability to run java and installation of tika 0.3 or higher,
or access to a solr server set up for content extraction (e.g. a Solr
1.4 final release).  For Solr, there is a patch to apply to the solrconfig
to add another request handler.

see:
http://lucene.apache.org/tika/gettingstarted.html
http://lucene.apache.org/tika/formats.html

Tika will extract many file formats, including PDFs, MS Office (2003 format
as well as new docx format).  Java 6 (aka 1.6) may be needed on some
platforms to support all formats.  The page on formats seems not to be 100%
up to date.  In particular, https://issues.apache.org/jira/browse/TIKA-152
is committed, so it does currently support MS Office 2007 documents to
some reasonable degee.

The easiest-to-find pre-built Tika app is available from the download page:
http://tika.apache.org/download.html

e.g. tika-app-1.1.jar

You can copy/move the jar to somewhere convenient, though it's probably a good idea
to keep it outside your docroot.

Solr 1.4.1 uses tika 0.4, 1.4.2-dev uses tika 0.7, and Solr 3.5.0 uses 0.10,
and newer releases of Sorl are likey to use 1.1+.  Note that the version
numbering changed, so 0.1 was followed by 1.0 and 1.1.

If you need to build tika from source using maven (mvn).  Get the tika
source from:
http://lucene.apache.org/tika/download.html

You may need to increase the memory for java/mvn using (for example):
export MAVEN_OPTS="-Xmx1024m -Xms512m"

mvn install

will build the full set of tika applications - it will build the app jar
in a location like tika-app/target/tika-app-1.1.jar

Copy tika-app-1.1.jar from there or point the module path to it.

See also build instructions at: http://drupal.org/node/540974#comment-1944082

If you are using Solr to extract your content, you need to copy (or symlink)
the contents of contrib/extraction/lib to a directory named lib under your
solr home, or alter solrconfig.xml to add the orgiginal directory as a
lib directory.

You are here

README.txt in Apache Solr Attachments 6.3

File

API Navigation