README.txt in Apache Solr Attachments 6.3

Apache Solr Attachments for 7.x

Requires either the ability to run Java with Tika 0.3 or higher installed,
or access to a Solr server set up for content extraction (e.g. a Solr
1.4 final release).  For Solr, there is a patch to apply to solrconfig.xml
that adds another request handler.

See:
http://lucene.apache.org/tika/gettingstarted.html
http://lucene.apache.org/tika/formats.html
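
If you go the Solr route, one way to check that the server can extract
content is to send it a file with curl.  This is only a sketch: the base
URL is an assumption, and /update/extract is the handler path from Solr's
stock example configuration, so the handler added by this module's patch
may live at a different path.  The function names are made up for
illustration.

```shell
#!/bin/sh
# Assumed values: adjust host, port, and handler path to your Solr setup.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
EXTRACT_PATH="${EXTRACT_PATH:-/update/extract}"

# Build the extraction URL.  extractOnly=true asks Solr to return the
# extracted content instead of indexing the document.
solr_extract_url() {
  echo "${SOLR_URL}${EXTRACT_PATH}?extractOnly=true"
}

# Send one file to Solr as a multipart upload and print the response.
solr_extract() {
  curl -s "$(solr_extract_url)" -F "file=@$1"
}
```

For example: solr_extract /path/to/document.pdf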

Tika can extract text from many file formats, including PDF and MS Office
(both the 2003 formats and the newer docx format).  Java 6 (aka 1.6) may be
needed on some platforms to support all formats.  The formats page does not
seem to be fully up to date.  In particular,
https://issues.apache.org/jira/browse/TIKA-152 has been committed, so Tika
currently supports MS Office 2007 documents to a reasonable degree.

The easiest-to-find pre-built Tika app is available from the download page:
http://tika.apache.org/download.html

e.g. tika-app-1.1.jar

You can copy/move the jar to somewhere convenient, though it's probably a good idea
to keep it outside your docroot.
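
Once the jar is in place, a quick way to confirm that Java and Tika work
is to extract text from a file on the command line.  A minimal sketch,
assuming the jar location below (hypothetical, use your own) and the Tika
app's -t (--text) option; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical location; adjust to wherever you put the jar
# (preferably outside the docroot).
TIKA_JAR="${TIKA_JAR:-/usr/local/share/tika/tika-app-1.1.jar}"

# Print the plain text of one document to stdout.
# -t (--text) asks the Tika app for plain-text output.
tika_extract_text() {
  if [ ! -f "$TIKA_JAR" ]; then
    echo "tika jar not found: $TIKA_JAR" >&2
    return 1
  fi
  java -jar "$TIKA_JAR" -t "$1"
}
```

For example: tika_extract_text /path/to/document.pdf > document.txt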

Solr 1.4.1 uses Tika 0.4, 1.4.2-dev uses Tika 0.7, Solr 3.5.0 uses Tika 0.10,
and newer releases of Solr are likely to use 1.1+.  Note that the version
numbering changed, so 0.10 was followed by 1.0 and 1.1.

If you need to build Tika from source using Maven (mvn), get the Tika
source from:
http://lucene.apache.org/tika/download.html

You may need to increase the memory for java/mvn using (for example):
export MAVEN_OPTS="-Xmx1024m -Xms512m"

Then run:

mvn install

This builds the full set of Tika applications, including the app jar
in a location like tika-app/target/tika-app-1.1.jar

Copy tika-app-1.1.jar from there or point the module path to it.
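
The copy step can be scripted so that it fails loudly when the build has
not produced a jar yet.  A rough sketch: both function names and both
directory arguments are placeholders, and it assumes the build output
matches tika-app/target/tika-app-*.jar as described above.

```shell
#!/bin/sh
# Print the path of the first app jar found under a built Tika source tree.
# $1 = top-level directory of the Tika source checkout.
find_tika_app_jar() {
  for jar in "$1"/tika-app/target/tika-app-*.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no tika-app jar under $1/tika-app/target; run 'mvn install' first" >&2
  return 1
}

# Copy the built jar into a destination directory of your choosing.
# $1 = Tika source directory, $2 = destination directory.
install_tika_app_jar() {
  jar=$(find_tika_app_jar "$1") || return 1
  mkdir -p "$2" && cp "$jar" "$2/"
}
```

For example: install_tika_app_jar ~/src/tika /usr/local/share/tika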

See also build instructions at: http://drupal.org/node/540974#comment-1944082

If you are using Solr to extract your content, you need to copy (or symlink)
the contents of contrib/extraction/lib to a directory named lib under your
Solr home, or alter solrconfig.xml to add the original directory as a
lib directory.
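
The symlink variant might look like the following.  This is a sketch, not
a tested installer: the function name is invented here, and the two
arguments (the unpacked Solr distribution and your Solr home) are
whatever paths apply on your system.

```shell
#!/bin/sh
# Link every jar from Solr's contrib/extraction/lib into <solr home>/lib.
# $1 = unpacked Solr distribution directory, $2 = Solr home directory.
link_extraction_libs() {
  # Resolve to an absolute path so the symlinks stay valid from anywhere.
  src=$(CDPATH= cd "$1/contrib/extraction/lib" 2>/dev/null && pwd) || {
    echo "not found: $1/contrib/extraction/lib" >&2
    return 1
  }
  dest="$2/lib"
  mkdir -p "$dest" || return 1
  for jar in "$src"/*.jar; do
    [ -f "$jar" ] || continue
    ln -sf "$jar" "$dest/"
  done
}
```

For example: link_extraction_libs /opt/apache-solr-1.4.1 /var/solr
If you would rather not link or copy anything, the solrconfig.xml route
uses a <lib dir="..."/> element pointing at contrib/extraction/lib.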
