README.txt in Apache Solr Search 8

This module integrates Drupal with the Apache Solr search platform. Solr search
can be used as a replacement for core content search and boasts both extra
features and better performance. Among the extra features is the ability to have
faceted search on facets ranging from content author to taxonomy to arbitrary
Field API fields.

The module comes with schema.xml, solrconfig.xml, and protwords.txt files, which
must be used in your Solr installation.

This module depends on the search framework in core.  When used in combination
with the core Search module, Apache Solr is not the default search. Access it
via a new tab on the default search page, called "Site".  You may configure it
to be the default at ?q=admin/config/search/settings

Updating from 6.x
-----------------

IMPORTANT: there is no upgrade path from 6.x-1.x or 6.x-2.x. If you previously
installed those modules you must disable and uninstall them prior to
installing 7.x-1.x.

You will have to install the new schema.xml and solrconfig.xml files, restart
the Solr server (or core), delete your index, and reindex all content.

Installation
------------

Prerequisites: Java 5 or higher (a.k.a. 1.5.x) and PHP 5.2.4 or higher.

Install the Apache Solr Drupal module as you would any Drupal module. Note
that the Drupal 7.x-1.x branch does not require the SolrPhpClient to
be installed. All necessary code is now included with this module.

Before enabling the module, you must have a working Solr server, or be
subscribed to a service like Acquia Search.

The Debian/Ubuntu packages for Solr should NOT be used to install Solr.
For example, do NOT install the solr or solr-jetty packages.

Download the latest Solr 1.4.x or 3.x release (e.g. 1.4.1 or 3.6.1) from:
http://www.apache.org/dyn/closer.cgi/lucene/solr/

Apache Lucene 3.1, 3.2, and 3.3 have a possible index corruption bug on
server crash or power loss (LUCENE-3418) and have bugs that interfere
with the Drupal admin reports. Solr 3.4 has a problem with
SortMissingLast, so Solr 3.5.0 or later is strongly preferred.

Unpack the Solr tarball somewhere not visible to the web (not in your
webserver docroot and not inside of your Drupal directory).

The Solr download comes with an example application that you can use for
testing, development, and even for smaller production sites. This
application is found at apache-solr-1.4.1/example.

You must use 3 Solr configuration files that come with the Drupal
module or the integration will not work correctly.

For Solr 1.4 use the ones found in:
solr-conf/solr-1.4/

For Solr 3.5.0 or 3.6.1 use:
solr-conf/solr-3.x/

While the Solr 1.4 files will work for Solr 3.5+, they are not optimal
and you will be missing important new features.

For example, when deploying Solr 1.4:

Rename apache-solr-1.4.1/example/solr/conf/schema.xml to something like
schema.bak. Then copy the solr-conf/solr-1.4/schema.xml that comes with
this Drupal module into its place.

Similarly, rename apache-solr-1.4.1/example/solr/conf/solrconfig.xml to
something like solrconfig.bak. Then copy the solr-conf/solr-1.4/solrconfig.xml
that comes with this module into its place.

Finally, rename apache-solr-1.4.1/example/solr/conf/protwords.txt to
protwords.bak. Then copy the solr-conf/solr-1.4/protwords.txt that comes
with this module into its place.
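
The three file swaps described above can be scripted. The following is a
minimal sketch, not something shipped with the module: the function name
install_solr_conf and its two arguments are hypothetical, and a POSIX shell
is assumed. Each stock file is kept as a .bak file before the module's
version is copied in.

```shell
# Hypothetical helper: back up Solr's stock config files and copy in the
# versions that come with the Drupal module.
#   $1 = Solr conf directory, e.g. apache-solr-1.4.1/example/solr/conf
#   $2 = module conf directory, e.g. the module's solr-conf/solr-1.4
install_solr_conf() {
  solr_conf=$1
  module_conf=$2
  for name in schema.xml solrconfig.xml protwords.txt; do
    # Strip the extension: schema.xml -> schema, protwords.txt -> protwords.
    base=${name%.*}
    # Keep the original as schema.bak, solrconfig.bak, or protwords.bak.
    if [ -f "$solr_conf/$name" ]; then
      mv "$solr_conf/$name" "$solr_conf/$base.bak" || return 1
    fi
    cp "$module_conf/$name" "$solr_conf/$name" || return 1
  done
}
```

It could be called as
install_solr_conf apache-solr-1.4.1/example/solr/conf solr-conf/solr-1.4
where the second path is relative to wherever the module is installed.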

Make sure that the conf directory includes the following files. The Solr core
may not load unless each one is present, even if only as an empty file:
solrconfig.xml
schema.xml
elevate.xml
mapping-ISOLatin1Accent.txt
protwords.txt
stopwords.txt
synonyms.txt
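
A quick way to confirm that all seven files are present is sketched below.
The function name check_conf_files is hypothetical and a POSIX shell is
assumed. It only reports missing files and leaves the directory untouched.

```shell
# Hypothetical helper: list any required files missing from a Solr conf
# directory. Returns 0 when all are present, 1 otherwise.
#   $1 = Solr conf directory, e.g. apache-solr-1.4.1/example/solr/conf
check_conf_files() {
  conf_dir=$1
  missing=0
  for name in solrconfig.xml schema.xml elevate.xml \
      mapping-ISOLatin1Accent.txt protwords.txt stopwords.txt synonyms.txt; do
    if [ ! -e "$conf_dir/$name" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_conf_files apache-solr-1.4.1/example/solr/conf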

Now start the Solr application by opening a shell, changing directory to
apache-solr-1.4.1/example, and executing the command java -jar start.jar

Test that your Solr server is now available by visiting
http://localhost:8983/solr/admin/
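
The same availability check can be made from a shell instead of a browser.
This is a minimal sketch that assumes curl is installed; the function name
solr_is_up is hypothetical, and the default URL is the example application's
address given above.

```shell
# Hypothetical helper: succeed only if the Solr admin page answers with a
# successful HTTP status within five seconds.
#   $1 = admin URL (optional; defaults to the example application's address)
solr_is_up() {
  url=${1:-http://localhost:8983/solr/admin/}
  # --fail makes curl exit non-zero on HTTP error responses.
  curl --silent --fail --max-time 5 --output /dev/null "$url"
}
```

For example: solr_is_up && echo "Solr is reachable"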

Now enable the "Apache Solr framework" and "Apache Solr search" modules.
Check at ?q=admin/settings/apachesolr that you can connect to Solr.
Then run cron on your Drupal site until your content is indexed. You
can monitor the index at ?q=admin/settings/apachesolr/index

The solrconfig.xml that comes with this module defines auto-commit, so
it may take a few minutes between running cron and when the new content
is visible in search.

To use facets, download the Facet API module from
http://drupal.org/project/facetapi
It allows you to define and place facets next to your search pages.
Once Facet API is enabled, first enable the facet blocks at
Administer > Site configuration > Apache Solr > Enabled filters
then position them as you like at Administer > Site building > Blocks.

Access Sub-module
-----------------
The included Apache Solr Access module integrates with the node access
system using node access grants. It does not (and cannot) work
with modules using hook_node_access() to block viewing of nodes because
it's impossible to apply those dynamic filters to as-yet-unknown search
results to return the correct number per page.  This same restriction
applies to any module that does content searching or listing (e.g. Views).

Settings.php
------------
You can override environment settings using the following syntax in your
settings.php:

$conf['apachesolr_environments']['my_env_id']['url'] = 'http://localhost:8983';

Configuration variables
-----------------------

The module provides some (hidden) variables that can be used to tweak its
behavior:

 - apachesolr_luke_limit: the limit (in terms of number of documents in the
   index) above which the module will not retrieve the number of terms per field
   when performing LUKE queries (for performance reasons).

 - apachesolr_tags_to_index: the list of HTML tags that the module will index
   (see apachesolr_index_add_tags_to_document()).

 - apachesolr_exclude_nodeapi_types: an array of node types each of which is
   an array of one or more module names, such as 'comment'.  Any type listed
   will have any listed modules' hook_node_update_index() implementation skipped
   when indexing. This can be useful for excluding comments or taxonomy links.

 - apachesolr_ping_timeout: the timeout (in seconds) after which the module will
   consider the Apache Solr server unavailable.

 - apachesolr_optimize_interval: the interval (in seconds) between automatic
   optimizations of the Apache Solr index. Set to 0 to disable.

 - apachesolr_cache_delay: the interval (in seconds) after an update after which
   the module will requery the Apache Solr for the index structure. Set it to
   your autocommit delay plus a few seconds.

 - apachesolr_query_class: the default query class to use.

 - apachesolr_index_comments_with_node: TRUE | FALSE. Whether to index comments
   along with each node.

 - apachesolr_cron_mass_limit: update or delete at most this many documents in
   each Solr request, such as when making {apachesolr_search_node} consistent
   with {node}.

 - apachesolr_index_user: the user account under which the indexing process
   runs.
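
Because these variables are hidden (the original text describes no settings
form for them), one way to set them is from the command line. The sketch
below assumes Drush is installed for a Drupal 7 site and uses its vset
command. The function name set_solr_variables, the DRUSH override, and the
values are illustrative assumptions, not recommendations from the module.

```shell
# Hypothetical helper: set a few of the hidden variables with Drush.
# The values below are illustrative only.
# Set DRUSH to use a specific drush executable; it defaults to "drush".
set_solr_variables() {
  drush_cmd=${DRUSH:-drush}
  # Treat Solr as unavailable after 4 seconds without a ping response.
  "$drush_cmd" vset apachesolr_ping_timeout 4 &&
  # Disable automatic index optimization.
  "$drush_cmd" vset apachesolr_optimize_interval 0 &&
  # Update or delete at most 500 documents per Solr request.
  "$drush_cmd" vset apachesolr_cron_mass_limit 500
}
```

Run set_solr_variables from within the Drupal site directory so that Drush
can find the site.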

Troubleshooting
---------------
Problem:
You use HTTP basic auth to limit access to your Solr server.

Solution:
Set the Server URL to include the username and password, for example
http://username:password@example.com:8080/solr

Problem:
Links to nodes appear in the search results with a different host name or
subdomain than is preferred, e.g. sometimes at http://example.com
and sometimes at http://www.example.com

Solution:
Set $base_url in settings.php to ensure that an identical absolute URL is
generated at all times when nodes are indexed.  Alternatively, set up a
redirect in .htaccess to prevent site visitors from accessing the site via more
than one site address.

Problem:
The 'Solr Index Queries' test fails with file permission errors.

Solution:
When running this test, Tomcat/Jetty should run as the same user as PHP
(often the same user as the webserver). This matters because PHP creates
folders on the fly during the test.


Themers
-------

See the inline docs for the apachesolr_theme() and apachesolr_search_theme()
functions in apachesolr.module and apachesolr_search.module.
