Apache Solr Multilingual
========================

Name: apachesolr_multilingual
Authors: Markus Kalkbrenner | Cocomore AG
         Matthias Huder | Cocomore AG
Drupal: 6.x
Sponsor: Cocomore AG - http://www.cocomore.com
                       http://drupal.cocomore.com


Description
===========

Apache Solr Multilingual cleanly extends Apache Solr Search
Integration to provide:
  * better support for non-English languages
  * support for multilingual search
  * an easy-to-use administration interface for non-English and
    multilingual search


Installation
============

1. Place the whole apachesolr_multilingual folder into your Drupal
   modules/ directory or, better, into sites/x/modules/.

2. Enable the apachesolr_multilingual module at
   admin/build/modules

3. Optional but recommended:
   Enable the apachesolr_multilingual_texfile module at
   admin/build/modules. Apache Solr requires some text files,
   such as stopwords.txt. This module adds an administration
   interface for these files to Drupal. If you don't use it,
   you need to maintain these files manually.
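If you prefer to script step 1, a minimal Python sketch could look like the one below. The helper name install_module and the default site directory "all" are assumptions for illustration; pass whatever directory replaces "x" in sites/x/modules/ on your site.

```python
import shutil
from pathlib import Path


def install_module(module_src: Path, drupal_root: Path, site_dir: str = "all") -> Path:
    """Copy a module folder into <drupal_root>/sites/<site_dir>/modules/.

    Hypothetical helper: site_dir stands in for the "x" in sites/x/modules/,
    and "all" is only an assumed default. copytree creates missing parents.
    """
    target = drupal_root / "sites" / site_dir / "modules" / module_src.name
    if target.exists():
        # Refuse to overwrite an existing copy; remove or back it up first.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(module_src, target)
    return target
```

Afterwards, enable the module at admin/build/modules as described in step 2.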

Now choose one of the following options to complete your setup:

1. Your site uses a single (unique) non-English language.
   If you additionally installed apachesolr_multilingual_texfile,
   continue at "A) Unique Language and Apache Solr Multilingual
   Texfile". Otherwise continue at "C) Unique Language".

2. Your site uses multiple languages (multilingual) and your
   content is assigned to languages using the locale module.
   If you additionally installed apachesolr_multilingual_texfile,
   continue at "B) Multiple Languages and Apache Solr Multilingual
   Texfile". Otherwise continue at "D) Multiple Languages".



A) Unique Language and Apache Solr Multilingual Texfile
=======================================================

1. Ensure that the language you want to cover is
   available and enabled at admin/settings/language.

2. Enable the language you want to cover at
   admin/settings/apachesolr/multilingual
   and "Save configuration".

3. Adjust all Solr text files to your needs at
   admin/settings/apachesolr/multilingual.

4. Download apachesolr_unique_language_config.zip at
   admin/settings/apachesolr/schema_generator

5. Extract apachesolr_unique_language_config.zip to your Solr
   conf directory and restart Solr.

6. "Re-index all content" at admin/settings/apachesolr/index.
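Steps 4 and 5 above leave you with a zip archive whose contents have to end up in Solr's conf directory. A minimal Python sketch of the extraction using the standard zipfile module follows. It assumes the archive keeps its files at the top level (not verified here), and the helper name extract_config is hypothetical. Restarting Solr depends on your servlet container and is not covered.

```python
import zipfile
from pathlib import Path


def extract_config(zip_path: Path, conf_dir: Path) -> list[str]:
    """Extract a generated config archive into Solr's conf directory.

    Hypothetical helper. Returns the names stored in the archive. Files that
    already exist with the same name (e.g. an old schema.xml) are
    overwritten, so back them up first.
    """
    conf_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(conf_dir)
    return names
```

The same function works for apachesolr_multilingual_config.zip in section B.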


B) Multiple Languages and Apache Solr Multilingual Texfile
==========================================================

1. Ensure that all the languages you want to cover with
   multilingual search are available and enabled at
   admin/settings/language

2. Enable all the languages you want to cover with
   multilingual search at admin/settings/apachesolr/multilingual
   and "Save configuration"

3. Adjust all Solr text files to your needs at
   admin/settings/apachesolr/multilingual.

4. Download apachesolr_multilingual_config.zip at
   admin/settings/apachesolr/schema_generator

5. Extract apachesolr_multilingual_config.zip to your Solr
   conf directory and restart Solr.

6. "Re-index all content" at admin/settings/apachesolr/index.
   It's important that you already have content in every language
   at this point. Otherwise the language-specific fields used in the
   next step won't be listed until you have indexed some content in
   that language.

7. Go to admin/settings/apachesolr/query-fields and set "Body" and
   "Title" to "Omit". Enable all language-specific bodies and titles,
   such as body_en or title_de, by selecting any value except
   "Omit". Don't forget to click "Save configuration".

8. Optional: As described in step 7, omit
     "Body text inside links (A tags)",
     "Body text inside H1 tags",
     "Body text inside H2 or H3 tags",
     "Body text inside H4, H5, or H6 tags",
     "Body text in inline tags like EM or STRONG"
   and turn on the language-specific fields, such as
     "tags_a_de",
     "tags_h1_de",
     "tags_h2_h3_de",
     "tags_h4_h5_h6_de",
     "tags_inline_de".

9. Optional: If you installed the "Taxonomy translation" module and
   turned on "Index taxonomy term translations" at
   admin/settings/apachesolr/multilingual, omit
   "All taxonomy term names" and enable the language-specific
   equivalents, such as "taxonomy_names_de", instead, as described
   in step 7.
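The query-field steps above follow one naming pattern: each generic field gets a language-specific counterpart named <field>_<language code>. Assuming that pattern holds for every enabled language (it is inferred from the examples body_en, title_de, tags_a_de and taxonomy_names_de, not from the module's code), a short Python sketch can produce a checklist of the fields to enable at admin/settings/apachesolr/query-fields. The function and constant names are hypothetical.

```python
# Field name prefixes taken from the examples in the steps above.
BASE_FIELDS = ["body", "title"]
TAG_FIELDS = ["tags_a", "tags_h1", "tags_h2_h3", "tags_h4_h5_h6", "tags_inline"]
TAXONOMY_FIELDS = ["taxonomy_names"]


def language_specific_fields(languages, tags=False, taxonomy=False):
    """Return the language-specific query fields to enable (i.e. not "Omit")."""
    prefixes = list(BASE_FIELDS)
    if tags:
        # Optional tag fields.
        prefixes += TAG_FIELDS
    if taxonomy:
        # Only with "Taxonomy translation" and indexed term translations.
        prefixes += TAXONOMY_FIELDS
    return [f"{prefix}_{lang}" for lang in languages for prefix in prefixes]


print(language_specific_fields(["en", "de"]))
# ['body_en', 'title_en', 'body_de', 'title_de']
```

The generic counterparts ("Body", "Title", the tag fields and "All taxonomy term names") are the ones to set to "Omit".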


C) Unique Language
==================

1. Ensure that the language you want to cover is
   available and enabled at admin/settings/language.

2. Enable the language you want to cover at
   admin/settings/apachesolr/multilingual
   and "Save configuration".

3. Download schema.xml for the unique language setup at
   admin/settings/apachesolr/schema_generator.

4. Copy schema.xml to your Solr conf directory.

5. Ensure that you have these four files in your Solr conf
   directory:
     stopwords.txt
     synonyms.txt
     protwords.txt
     compoundwords.txt

6. Restart Solr.

7. "Re-index all content" at admin/settings/apachesolr/index.
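The check for the four text files in the steps above is easy to automate. A hypothetical Python helper that reports which of them are still missing from a given conf directory (the directory you pass in depends on your installation):

```python
from pathlib import Path

# The four files listed above for the unique language setup.
REQUIRED_TEXT_FILES = ["stopwords.txt", "synonyms.txt", "protwords.txt", "compoundwords.txt"]


def missing_text_files(conf_dir: Path) -> list[str]:
    """Return the required text files that are not present in conf_dir."""
    return [name for name in REQUIRED_TEXT_FILES if not (conf_dir / name).is_file()]
```

For example, missing_text_files(Path("/path/to/solr/conf")) returns an empty list once all four files are in place, so it is safe to restart Solr.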


D) Multiple Languages
=====================

1. Ensure that all the languages you want to cover with
   multilingual search are available and enabled at
   admin/settings/language

2. Enable all the languages you want to cover with
   multilingual search at admin/settings/apachesolr/multilingual
   and "Save configuration"

3. Download schema.xml for the multilingual setup at
   admin/settings/apachesolr/schema_generator.

4. Copy schema.xml to your Solr conf directory.

5. Ensure that you have these four files in your Solr conf
   directory for each language:
     stopwords_LANGUAGE.txt
     synonyms_LANGUAGE.txt
     protwords_LANGUAGE.txt
     compoundwords_LANGUAGE.txt

6. Restart Solr.

7. "Re-index all content" at admin/settings/apachesolr/index.
   It's important that you already have content in every language
   at this point. Otherwise the language-specific fields used in the
   next step won't be listed until you have indexed some content in
   that language.

8. Go to admin/settings/apachesolr/query-fields and set "Body" and
   "Title" to "Omit". Enable all language-specific bodies and titles,
   such as body_en or title_de, by selecting any value except
   "Omit". Don't forget to click "Save configuration".

9. Optional: As described in step 8, omit
     "Body text inside links (A tags)",
     "Body text inside H1 tags",
     "Body text inside H2 or H3 tags",
     "Body text inside H4, H5, or H6 tags",
     "Body text in inline tags like EM or STRONG"
   and turn on the language-specific fields, such as
     "tags_a_de",
     "tags_h1_de",
     "tags_h2_h3_de",
     "tags_h4_h5_h6_de",
     "tags_inline_de".

10. Optional: If you installed the "Taxonomy translation" module and
    turned on "Index taxonomy term translations" at
    admin/settings/apachesolr/multilingual, omit
    "All taxonomy term names" and enable the language-specific
    equivalents, such as "taxonomy_names_de", instead, as described
    in step 8.
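In a multilingual setup the text file check has to run once per language. The sketch below assumes that LANGUAGE in the file names is replaced by the language code, for example stopwords_de.txt; that convention and the helper name are assumptions, not taken from the module's code.

```python
from pathlib import Path

TEXT_FILE_BASES = ["stopwords", "synonyms", "protwords", "compoundwords"]


def missing_language_files(conf_dir: Path, languages: list[str]) -> list[str]:
    """List the per-language text files that are absent from conf_dir.

    Assumes the naming <base>_<language code>.txt, e.g. stopwords_de.txt.
    """
    expected = [f"{base}_{lang}.txt" for lang in languages for base in TEXT_FILE_BASES]
    return [name for name in expected if not (conf_dir / name).is_file()]
```

Run it with the same language codes you enabled at admin/settings/apachesolr/multilingual before restarting Solr.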


Spell Checker
=============

How it works:
* The language-neutral spell checker doesn't use any stop words.
* As soon as a user limits the search with the language facet,
  spell checking becomes language-specific.
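The behaviour described above boils down to one decision per query. A Python sketch of that decision, with loudly hypothetical names: it assumes the language facet appears as a Solr filter query of the form "language:<code>" and that the dictionaries are called "spell" and "spell_<code>". The module's actual field and dictionary names may differ.

```python
def spellcheck_dictionary(filter_queries: list[str]) -> str:
    """Choose a spell checker dictionary (all names are hypothetical)."""
    languages = [fq.split(":", 1)[1] for fq in filter_queries if fq.startswith("language:")]
    if len(languages) == 1:
        # Search limited to one language: use that language's dictionary.
        return f"spell_{languages[0]}"
    # No (or an ambiguous) language filter: fall back to the
    # language-neutral dictionary, which is built without stop words.
    return "spell"
```

For example, a search filtered with "language:de" would use "spell_de", while an unfiltered search uses "spell".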

ToDo:
* Let the admin configure whether the spell checker becomes
  language-specific when the site language changes (language
  selector, URL, ...).
* Let the admin configure whether suggestions should be made in
  more than one language (expensive, because Solr needs to be
  queried once per language).


Apache Solr Text Files
======================

stopwords.txt
=============
TODO


protwords.txt
=============
TODO


synonyms.txt
=============
TODO


compoundwords.txt
=================
TODO


Troubleshooting
===============

Searching for words containing accents or umlauts does not work!
Make sure your servlet container (Tomcat, Jetty, ...) is configured
to accept UTF-8 characters in the URL. For Tomcat, add the attribute
URIEncoding="UTF-8" to your Connector definition in server.xml.
See Solr's documentation for details:
http://wiki.apache.org/solr/SolrInstall
http://wiki.apache.org/solr/SolrTomcat
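Why the URIEncoding setting matters can be seen with Python's standard urllib.parse: a UTF-8 percent-encoded umlaut turns into two wrong characters when the container decodes the URL as ISO-8859-1 (older Tomcat versions, for example, did this by default). The helper name decode_as_container is hypothetical.

```python
from urllib.parse import quote, unquote


def decode_as_container(term: str, container_encoding: str) -> str:
    """Percent-encode term as UTF-8 (as a UTF-8 client would), then decode
    it the way a servlet container using container_encoding would."""
    return unquote(quote(term), encoding=container_encoding)


print(quote("Müller"))                              # M%C3%BCller
print(decode_as_container("Müller", "iso-8859-1"))  # MÃ¼ller (broken)
print(decode_as_container("Müller", "utf-8"))       # Müller
```

Solr then searches for the broken term, which is why such queries find nothing until the container decodes URLs as UTF-8.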
