README.txt in Search by Page 6

Search by Page module for Drupal

This module adds page-oriented searching to the core Drupal Search module. It
can be used as an additional tab on the core Search page, or you can display
Search by Page separately (it still requires the core Search module to be
enabled, because it uses Search for its indexing).

Contents of this file:
- How it works
- Setup and Configuration
- Usage
- Theming
- Users and roles
- Other suggestions
- Copyright and License


-- How it works --

The core Search module works by indexing your content whenever cron is run, and
then looking in that index when someone requests a search on your site. The
Content tab of Search indexes all the content items on your site, by loading the
item and indexing the resulting body and comments. The User portion actually
doesn't index users -- at search time, it just looks for a user name that
matches (User Search doesn't look in any other profile fields, except that for
an administrative user doing a user search, it looks in the email field as
well). Other modules may add other tabs to Search as well.

Search by Page, in contrast, indexes the content of pages on your site, which
could be content pages, user profile pages, composites of content (such as
Views), or pages that are generated by other modules. The indexing in Search by
Page is done by first building the "content" region of each page to be indexed,
in the language(s) appropriate to the page and from the point of view of the
role you configure, and then adding that content to the Search index. Note that only
the "content" region of the page is indexed, not the sidebars, header, footer,
or other block regions in your theme. Also note that what is indexed is what is
output by your theme for each page, in contrast to core Search, which does not
depend on the theme's rendering of the page.
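
Because the indexed text comes from the theme's rendered HTML for the content
region, and the configuration lets you exclude the contents of specific HTML
tags (see the configuration options below), the text-extraction step can be
pictured roughly as follows. This is a minimal Python sketch for illustration
only, not the module's PHP code; the class and function names are hypothetical,
and it assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class ContentTextExtractor(HTMLParser):
    """Collect text from rendered HTML, skipping excluded tags (hypothetical)."""

    # Void elements never get a closing tag, so they must not change the depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, excluded_tags):
        super().__init__()
        self.excluded = {t.lower() for t in excluded_tags}
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.skip_depth or tag in self.excluded:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside every excluded element.
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def text_for_index(content_html, excluded_tags=()):
    """Return the words that would be handed to the search index."""
    parser = ContentTextExtractor(excluded_tags)
    parser.feed(content_html)
    parser.close()
    return " ".join(parser.parts)


sample = ("<h2>Contact us</h2><p>Call <strong>today</strong></p>"
          "<code>debug output</code>")
indexed_text = text_for_index(sample, ["code"])
```

Here indexed_text is "Contact us Call today": the text inside the excluded
<code> element never reaches the index.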

Search by Page also restricts search results to the current language. The core
Search module only does this for content search, and only if you have the
Internationalization module enabled.

One other difference between Search by Page and the usual content search of the
core Search module is in reindexing. Search by Page assumes that your page
content might change over time, so it periodically reindexes the pages on your
site, giving priority to pages that have been edited. In contrast, the core
Search module assumes that if a content item hasn't been edited, it doesn't need
to be reindexed. You have some additional control over this reindexing -- see
the configuration section below.
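
One way to picture "periodically reindex, giving priority to edited pages" is
the ordering below. This is a hypothetical Python model, not the module's
actual query: it assumes each indexed page records when it was last indexed
and last changed, and that only a limited number of pages are processed per
cron run.

```python
from dataclasses import dataclass


@dataclass
class IndexedPage:
    path: str
    last_indexed: int  # Unix timestamp of the last indexing pass (0 = never)
    last_changed: int  # Unix timestamp of the last edit


def reindex_queue(pages, limit):
    """Return up to `limit` pages to (re)index in this cron run.

    Pages edited since they were last indexed (or never indexed) come first;
    the rest follow, least recently indexed first, so every page is
    eventually refreshed even if it is never edited.
    """
    def priority(page):
        stale = page.last_indexed == 0 or page.last_changed > page.last_indexed
        # False sorts before True, so `not stale` puts edited pages first.
        return (not stale, page.last_indexed)

    return sorted(pages, key=priority)[:limit]


pages = [
    IndexedPage("about", last_indexed=100, last_changed=50),
    IndexedPage("news", last_indexed=200, last_changed=300),
    IndexedPage("contact", last_indexed=50, last_changed=10),
]
next_batch = [p.path for p in reindex_queue(pages, limit=2)]
```

With a limit of 2, the edited "news" page goes first, followed by "contact",
the unedited page that has waited longest since its last indexing.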

Your site may experience errors during content indexing, which you can see in
the Recent Log Entries report. The typical reason is that the item that is
being indexed cannot be viewed by the user role you chose for indexing; other
errors are also possible. If this happens, Search by Page will still mark the
page as "indexed" in the search index, so that the next cron run does not try to
index it again, which would keep working pages from being indexed. If you
ever want Search by Page to try indexing the failed pages again (after fixing
the cause of the error, presumably), there is a link to reset items with no
content in the index. This is located in the "Additional Actions" section of
the Search by Page configuration screen.
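
The "mark as indexed, with no content" behavior and the reset action can be
modeled as below. This is a hypothetical Python sketch: the SearchIndex class
and its methods are illustrative stand-ins, not the module's API, and an empty
string is assumed to mean "indexing failed".

```python
from dataclasses import dataclass, field


@dataclass
class SearchIndex:
    """Toy stand-in for the search index: path -> indexed text."""
    entries: dict = field(default_factory=dict)

    def index_page(self, path, render):
        try:
            text = render(path)
        except Exception:
            # Record the page as indexed anyway, with no content, so the
            # next cron run moves on instead of retrying it forever.
            text = ""
        self.entries[path] = text

    def pages_with_no_content(self):
        return [path for path, text in self.entries.items() if text == ""]

    def reset_empty(self):
        """Forget empty entries so the next cron run tries them again."""
        for path in self.pages_with_no_content():
            del self.entries[path]


def demo_render(path):
    # Hypothetical renderer: the indexing role cannot view "private".
    if path == "private":
        raise PermissionError("access denied for the indexing role")
    return f"text of {path}"


index = SearchIndex()
index.index_page("home", demo_render)
index.index_page("private", demo_render)
failed_before_reset = index.pages_with_no_content()
index.reset_empty()
```

After the reset, "private" is no longer in the index, so it becomes eligible
for indexing again once the cause of the error has been fixed.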


-- Setup and Configuration --

Search by Page does not know what the pages of your site are, so it doesn't
index anything by itself. You will need to enable and configure at least one
sub-module that lets you add paths to the search index, in order for this module
to do anything.

You will also need to set up one or more search "environments". Each environment
defines which paths are searchable, and has its own search URL and search block.

Four sub-modules are provided: "Paths" to index arbitrary paths to pages on your
site, "Nodes" to index content items of particular content types, "Users" to
index user profile pages for users of particular roles, and "Attachments" to
index files attached to content items.

The "Paths" sub-module is the most generic, but if you put a lot of paths in it,
your searches will run slower. (The technical reason is that each time someone
searches on your site, this module has to check whether that person has
permission to view each page in the list, to exclude pages the person doesn't
have permission to view from search results, and this has to be done via a PHP
loop rather than an SQL query because of how Drupal permissions work.)
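
To see why a long list of paths slows searches down, the sketch below filters
candidate results with one access check per path. It is a hypothetical Python
illustration: user_can_view stands in for Drupal's per-path access check and
is not a real function of the module. The point is that the number of checks
grows with the number of candidate paths on every search.

```python
def filter_results(result_paths, user_can_view):
    """Drop results the current user may not view.

    `user_can_view(path)` is a hypothetical per-path access check; it is
    called once per candidate path, in a loop, on every search.
    """
    return [path for path in result_paths if user_can_view(path)]


checked = []


def can_view(path):
    # Hypothetical rule: ordinary visitors may not view admin pages.
    checked.append(path)
    return not path.startswith("admin/")


visible = filter_results(["node/1", "admin/reports/status", "user/5"], can_view)
```

Three candidate paths cost three access checks; a thousand indexed paths
could cost up to a thousand checks per search.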

IMPORTANT NOTE: If you are using Search by Page Paths, the database user that
Drupal connects as must have permission to create temporary tables.

The "Attachments" sub-module indexes the text in certain types of files that are
attached to content items via either the CCK FileField module or the core Upload
module. This requires "helper" programs to extract the text from file
attachments, and the helper programs are configured using the separate Search
Files module (which you can find at http://drupal.org/project/search_files). It
is recommended that you download Search Files 6.x-2.x (which may be a "Beta"
version, but that is OK), and only enable the Search Files API module (and not
the other included modules). This will enable just the helper program setup
functionality, without enabling the other functionality of Search Files.

If you want to write your own sub-modules, see the search_by_page.api.php file
included with this module (or use one of the included sub-modules as an
example).

Once you have enabled sub-module(s), visit the path
admin/settings/search_by_page to set up search environments and define pages to
index for each environment. Then wait for cron to run (or visit the status
report page, admin/reports/status, and click on "run cron manually"). No pages
will be indexed until cron has run, and no search results will be returned
until pages have been indexed.

Other configuration options:
* You can change various labels and other text on the Search by Page
configuration pages.
* You can set the number of items Search by Page will index per cron run on
the Search by Page configuration pages. This is independent of the indexing
settings for the core Search module. If you are using Search by Page as an
independent search (rather than as a tab on the core Search page -- see section
below), you might want to set the core Search settings cron limit to zero, so
that only Search by Page items are added to the search index. But if you do
that, the % indexed reported on the Search page will never reach 100%, because
the core content items will never be indexed. So you will need to check the more
detailed status report farther down on that page.
* You can control the reindex cycling described in the How it Works section
above by using the minimum/maximum reindexing time settings, which are on a
per-module, per-environment basis. Setting the minimum reindexing time forces
Search by Page to wait at least this amount of time before reindexing that type
of page. Setting the maximum reindexing time forces Search by Page to reindex
that type of page immediately once this amount of time has passed. WARNING: Do
not choose too small a maximum reindexing time globally! This setting works by
marking the pages for immediate reindexing when this time has passed, and it can
interfere with the reindexing of new content.
* You can exclude the contents of specific HTML tags from indexing.
* You will also need to set permissions, which are separate from the
core Search permissions.
* You should also visit the main Search configuration screen, where you can set
options such as the number of items to index each cron run for core Search
modules, and the minimum word size for searching. You can also watch the
progress of indexing on that page (there is a detailed table near the bottom in
the Search by Page section).
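
The minimum/maximum reindexing time settings described above can be read as
the decision rule below. This is a hypothetical Python model, not the module's
code: times are in seconds, and treating a max_age of 0 as "no maximum" is an
assumption made for the sketch.

```python
def reindex_status(now, last_indexed, last_changed, min_age, max_age):
    """Classify one page for reindexing (hypothetical model).

    "wait"      - indexed less than min_age ago; not eligible yet
    "immediate" - max_age has passed; marked for immediate reindexing
    "edited"    - eligible, and edited since it was last indexed (preferred)
    "cycle"     - eligible for the normal, lower-priority reindex rotation
    """
    age = now - last_indexed
    if age < min_age:
        return "wait"
    if max_age and age >= max_age:  # assumption: max_age == 0 means no maximum
        return "immediate"
    if last_changed > last_indexed:
        return "edited"
    return "cycle"


HOUR = 3600
DAY = 24 * HOUR
now = 1_000_000
settings = {"min_age": HOUR, "max_age": 7 * DAY}

too_soon = reindex_status(now, now - 600, now - 60, **settings)
edited = reindex_status(now, now - 2 * DAY, now - DAY, **settings)
cycle = reindex_status(now, now - 2 * DAY, now - 3 * DAY, **settings)
overdue = reindex_status(now, now - 8 * DAY, now - 9 * DAY, **settings)
```

The warning about a small maximum shows up directly in this model: with
max_age set to one hour, the unedited page classified as "cycle" above becomes
"immediate", and so would almost every other page, crowding new content out
of the per-cron indexing limit.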


-- Usage --

You have two choices for how to use the module, once you have it set up:

a) There will be a new tab called "Pages" by default, included in Drupal's
built-in search page (in addition to the Content and Users tabs provided by
Search, and any other tabs added by other search modules). So, if a site visitor
performs a search from that tab, they will get the Search by Page results. This
will use whichever search environment you have set as the default.

b) You can also use Search by Page on its own. This is probably what you want
if you find three separate tabs called "Content", "Pages", and "Users" on the
core Search results page confusing, and if you have configured sub-modules so
that all the site content you want people to search is available from Search
by Page. To run Search by Page on its own, enable the Search by Page block for
your search environments, and/or add links to the paths you have defined for
your search environments to your menu system.


-- Theming --

The search form that is used by Search by Page on search pages and search blocks
can be themed using the search-by-page-form.tpl.php file provided (copy that
file into your theme and modify it).

Search results are themed using the search-result.tpl.php (each result item) and
search-results.tpl.php (the list of results) theme files from the core Search
module (in directory modules/search in your Drupal installation). If you are
using Search by Page Attachments, an additional variable is available,
$result['related_node'], which gives you the node object that the attachment is
attached to.

The heading shown on the search results page can be themed by overriding
theme_search_by_page_results_title(). The markup and text shown when there are
no search results can be themed by overriding
theme_search_by_page_no_results(). Default versions of both functions are in the
search_by_page.module file (copy the function into your theme's template.php
file, replace the word 'theme' in the function name with the name of your
theme, and modify the function).


-- Users and roles --

When you set up content items, attachments, etc. for searching within Search by
Page, you will need to choose a role to use for search indexing. This will make
Search by Page render your pages from the point of view of a user with that
role.

In order to do this, assuming you have used a non-anonymous role, Search by Page
will create its own user accounts for internal use, which you will see on your
Users management page. For instance, if you set up Search by Page Nodes to index
from the point of view of role "My role", Search by Page will set up a user
called "sbp indexing My role" with role "My role". The users that Search by Page
sets up will always have their status set to "blocked". During search indexing,
the account is set to "active" only temporarily, and only for the indexing
process, so no one should ever be able to see these users except site
administrators.
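
The account handling described above (one blocked account per indexing role,
made active only while indexing runs) can be sketched as follows. This is a
hypothetical Python model, not the module's code: the Account class and helper
names are illustrative, and the "anonymous user" role name follows Drupal 6's
default for the anonymous role.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    roles: set = field(default_factory=set)
    status: str = "blocked"  # helper accounts are normally blocked


def get_indexing_account(accounts, role):
    """Find or create the helper account for `role` (hypothetical helper)."""
    if role == "anonymous user":
        return None  # indexing as anonymous needs no helper account
    name = f"sbp indexing {role}"
    if name not in accounts:
        accounts[name] = Account(name=name, roles={role})
    return accounts[name]


@contextmanager
def temporarily_active(account):
    """Activate `account` only for the duration of the `with` block."""
    account.status = "active"
    try:
        yield account
    finally:
        account.status = "blocked"  # restored even if rendering fails


accounts = {}
acct = get_indexing_account(accounts, "My role")
with temporarily_active(acct) as indexing_user:
    status_during = indexing_user.status  # pages would be rendered here
status_after = acct.status
```

Using try/finally means the account goes back to "blocked" even when rendering
a page raises an error partway through indexing.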


-- Other suggestions --

The default behavior for Drupal's Search module (which is the technology used
for indexing/searching in Search by Page) is that only exact matches are
returned (except for the User search portion of Search, which matches substrings
of user names). For instance, this means that if you search for "quake", and a
page contains "quakes", "quaking", or "earthquake", it will not be matched.

To get around this limitation, I suggest using a "stemmer" module, such as
http://drupal.org/project/porterstemmer (You can search for "stemmer" on
drupal.org to find stemmers for other languages.) Stemmers enable matching on
inflected forms of words (verb forms, plurals, etc.), so they should give you
matches for "quaking" and "quakes" if you search for "quake". They wouldn't give
you a match for "earthquake", however.
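
The difference between exact word matching and stemmed matching can be shown
with the small Python sketch below. The suffix-stripping function is a crude
toy written for this illustration; it is NOT the Porter algorithm used by the
porterstemmer module, but it reproduces the behavior described above for the
"quake" example.

```python
import re

# Toy suffix list, checked in order; NOT the Porter stemming algorithm.
SUFFIXES = ("ing", "es", "ed", "s", "e")


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def exact_match(query, text):
    """Whole-word matching, as in default Drupal Search."""
    return query.lower() in tokens(text)


def crude_stem(word):
    """Strip one known suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_match(query, text):
    """Match if the query and some word in the text share a stem."""
    return crude_stem(query.lower()) in {crude_stem(t) for t in tokens(text)}


text = "Reports of quakes and quaking ground after the earthquake"
```

Here exact_match("quake", text) is False, while stemmed_match("quake", text)
is True because "quakes" and "quaking" reduce to the same stem as "quake".
"earthquake" alone still does not match, since its stem is "earthquak".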


-- Copyright and License --

Copyright 2009-2010 Jennifer Hodgdon, Poplar ProductivityWare LLC

Licensed under the GNU General Public License
