README.txt in Search by Page 6
Search by Page module for Drupal
This module adds page-oriented searching to the core Drupal Search module. It
can be used as an additional tab on the core Search page, or you can display
Search by Page separately (either way, the core Search module must be enabled,
because Search by Page uses it for indexing).
Contents of this file:
- How it works
- Setup and Configuration
- Usage
- Theming
- Users and roles
- Other suggestions
- Copyright and License
-- How it works --
The core Search module works by indexing your content whenever cron is run, and
then looking in that index when someone requests a search on your site. The
Content tab of Search indexes all the content items on your site, by loading the
item and indexing the resulting body and comments. The User portion actually
doesn't index users -- at search time, it just looks for a user name that
matches (User Search doesn't look in any other profile fields, except that for
an administrative user doing a user search, it looks in the email field as
well). Other modules may add other tabs to Search as well.
Search by Page, in contrast, indexes the content of pages on your site, which
could be content pages, user profile pages, composites of content (such as
Views), or pages that are generated by other modules. The indexing in Search by
Page is done by first building the "content" region of each page to be indexed,
in the language(s) appropriate to the page and with the viewpoint of the role
you configure, and then adding that content to the Search index. Note that only
the "content" region of the page is indexed, not the sidebars, header, footer,
or other block regions in your theme. Also note that what is indexed is what is
output by your theme for each page, in contrast to core Search, which does not
depend on the theme's rendering of the page.
Search by Page also restricts search results to the currently-enabled language.
The core Search module only does this for content search, and only if you
have the Internationalization module enabled.
One other difference between Search by Page and the usual content search of the
core Search module is in reindexing. Search by Page assumes that your page
content might change over time, so it periodically reindexes the pages on your
site, giving priority to pages that have been edited. In contrast, the core
Search module assumes that if a content item hasn't been edited, it doesn't need
to be reindexed. You have some additional control over this reindexing -- see
the configuration section below.
Your site may experience errors during content indexing, which you can see in
the Recent Log Entries report. The typical reason is that the item being
indexed cannot be viewed by the user role you chose for indexing; other errors
are also possible. If this happens, Search by Page will still mark the page as
"indexed" in the search index, so that the next cron run does not retry the
failing page and hold up pages that can be indexed. If you ever want Search by
Page to try indexing the failed pages again (presumably after fixing the cause
of the error), use the link that resets items with no content in the index. It
is located in the "Additional Actions" section of the Search by Page
configuration screen.
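The failure handling described above can be pictured with a small Python
sketch. It is an illustration only, not the module's code: the index is a plain
dictionary, and the names index_pages, reset_blank_items, and render are
hypothetical.

```python
def index_pages(paths, render, index):
    """Index each path once; a failed page is stored with empty content."""
    for path in paths:
        if path in index:
            continue  # already marked as indexed, so it is not retried
        try:
            index[path] = render(path)
        except PermissionError:
            index[path] = ""  # marked as indexed, but with no content


def reset_blank_items(index):
    """Forget pages with no content so that the next run tries them again."""
    blank = [path for path, content in index.items() if content == ""]
    for path in blank:
        del index[path]
    return blank


def render(path):
    # Hypothetical renderer: the indexing role cannot view one of the pages.
    if path == "members/only":
        raise PermissionError(path)
    return "Text of " + path


index = {}
index_pages(["about", "members/only"], render, index)
after_first_run = dict(index)
reset = reset_blank_items(index)
print(after_first_run, reset)
```

After the reset, only the pages that had content remain in the dictionary, so
the failed page is picked up again on the next run.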
-- Setup and Configuration --
Search by Page does not know what the pages of your site are, so it doesn't
index anything by itself. You will need to enable and configure at least one
sub-module that lets you add paths to the search index, in order for this module
to do anything.
You will also need to set up one or more search "environments". Each environment
defines which paths are searchable, and has its own search URL and search block.
Four sub-modules are provided: "Paths" to index arbitrary paths to pages on your
site, "Nodes" to index content items of particular content types, "Users" to
index user profile pages for users of particular roles, and "Attachments" to
index files attached to content items.
The "Paths" sub-module is the most generic, but if you put a lot of paths in it,
your searches will run slower. (The technical reason is that each time someone
searches on your site, this module has to check whether that person has
permission to view each page in the list, to exclude pages the person doesn't
have permission to view from search results, and this has to be done via a PHP
loop rather than an SQL query because of how Drupal permissions work.)
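As a rough illustration of why a long path list slows searching, here is a
minimal Python sketch of a per-path permission loop. The function names and the
access rule are hypothetical; the module itself does this check in PHP using
Drupal's access system.

```python
from typing import Callable, Iterable, List


def filter_viewable(paths: Iterable[str],
                    can_view: Callable[[str], bool]) -> List[str]:
    """Keep only the paths the searcher may view, checking one at a time."""
    viewable = []
    for path in paths:
        # One access check per path: the cost grows with the number of paths.
        if can_view(path):
            viewable.append(path)
    return viewable


# Hypothetical access rule for an anonymous visitor.
RESTRICTED = {"admin/reports", "private/notes"}


def anonymous_can_view(path: str) -> bool:
    return path not in RESTRICTED


indexed_paths = ["about", "admin/reports", "contact", "private/notes"]
visible = filter_viewable(indexed_paths, anonymous_can_view)
print(visible)
```

Every search repeats this loop over the whole list, which is why a short list
of paths keeps searches faster.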
IMPORTANT NOTE: If you are using Search by Page Paths, your database must be
set up with permission to create temporary tables.
The "Attachments" sub-module indexes the text in certain types of files that are
attached to content items via either the CCK FileField module or the core Upload
module. This requires "helper" programs to extract the text from file
attachments, and the helper programs are configured using the separate Search
Files module (which you can find at http://drupal.org/project/search_files). It
is recommended that you download Search Files 6.x-2.x (which may be a "Beta"
version, but that is OK), and only enable the Search Files API module (and not
the other included modules). This will enable just the helper program setup
functionality, without enabling the other functionality of Search Files.
If you want to write your own sub-modules, see the search_by_page.api.php file
included with this module (or use one of the included sub-modules as an
example).
Once you have enabled sub-module(s), visit the path
admin/settings/search_by_page to set up search environments and define pages to
index for each environment. Then wait for cron to run (or visit the status
report page, admin/reports/status, and click on "run cron manually"). No pages
will be indexed until cron has run, and no search results will come out until
pages have been indexed.
Other configuration options:
* You can change various labels and other text on the Search by Page
configuration pages.
* You can set the number of items Search by Page will index per cron run on
the Search by Page configuration pages. This is independent of the indexing
settings for the core Search module. If you are using Search by Page as an
independent search (rather than as a tab on the core Search page; see the Usage
section below), you might want to set the core Search cron limit to zero, so
that only Search by Page items are added to the search index. If you do that,
however, the percentage indexed reported on the Search page will never reach
100%, because the core content items will never be indexed, so you will need to
check the more detailed status report farther down on that page.
* You can control the reindex cycling described in the How it works section
above by using the minimum/maximum reindexing time settings, which are on a
per-module, per-environment basis. Setting the minimum reindexing time forces
Search by Page to wait at least this amount of time before reindexing that type
of page. Setting the maximum reindexing time forces Search by Page to reindex
that type of page immediately when this amount of time has passed. WARNING: Do
not choose too small a maximum reindexing time globally! This setting works by
marking the pages for immediate reindexing when this time has passed, and it can
interfere with the reindexing of new content.
* You can exclude the contents of specific HTML tags from indexing.
* You will also need to set permissions, which are separate from the
core Search permissions.
* You should also visit the main Search configuration screen, where you can set
options such as the number of items to index each cron run for core Search
modules, and the minimum word size for searching. You can also watch the
progress of indexing on that page (there is a detailed table near the bottom in
the Search by Page section).
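The minimum/maximum reindexing times in the list above can be sketched in
Python as follows. This is a simplified model, not the module's actual
scheduling code: the function name and state labels are hypothetical, and it
assumes the minimum time applies to edited pages too, which the module may
handle differently.

```python
HOUR = 3600
DAY = 24 * HOUR


def reindex_state(age, edited, min_time, max_time):
    """Classify a page by the time (in seconds) since it was last indexed."""
    if age < min_time:
        return "wait"        # too soon to reindex this page
    if age >= max_time:
        return "immediate"   # maximum time passed: reindex right away
    return "priority" if edited else "eligible"


# With a 1-day minimum and a 7-day maximum:
states = [
    reindex_state(2 * HOUR, True, DAY, 7 * DAY),
    reindex_state(3 * DAY, True, DAY, 7 * DAY),
    reindex_state(3 * DAY, False, DAY, 7 * DAY),
    reindex_state(8 * DAY, False, DAY, 7 * DAY),
]
print(states)

# With a very small maximum, an unedited 3-day-old page is also "immediate".
crowded = reindex_state(3 * DAY, False, DAY, DAY)
print(crowded)
```

The last call hints at the reason for the warning: when the maximum is small,
nearly every page ends up marked for immediate reindexing, and those pages
compete with everything else for the per-cron indexing limit.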
-- Usage --
You have two choices for how to use the module, once you have it set up:
a) There will be a new tab, called "Pages" by default, on Drupal's built-in
search page (in addition to the Content and Users tabs provided by Search, and
any tabs added by other search modules). If a site visitor performs a search
from that tab, they will get the Search by Page results. This will use
whichever search environment you have set as the default.
b) You can also use Search by Page on its own. This is probably what you want
if you think that having three separate tabs called "Content", "Pages", and
"Users" on the core Search results page is confusing, and if you have
configured sub-modules so that all the site content you want people to search
is available from Search by Page. To run Search by Page on its own, enable the
Search by Page block for your search environments, and/or add links to the
paths you have defined for your search environments to your menu system.
-- Theming --
The search form that is used by Search by Page on search pages and search blocks
can be themed using the search-by-page-form.tpl.php file provided (copy that
file into your theme and modify it).
Search results are themed using the search-result.tpl.php (each result item) and
search-results.tpl.php (the list of results) theme files from the core Search
module (in directory modules/search in your Drupal installation). If you are
using Search by Page Attachments, an additional variable is available,
$result['related_node'], which gives you the node object that the attachment is
attached to.
The heading shown on the search results page can be themed by overriding
theme_search_by_page_results_title(). The markup and text shown when there are
no search results can be themed by overriding
theme_search_by_page_no_results(). Default versions of both functions are in the
search_by_page.module file (copy the function into your theme's template.php
file, replace the word 'theme' in the function name with the name of your
theme, and modify the function).
-- Users and roles --
When you set up content items, attachments, etc. for searching within Search by
Page, you will need to choose a role to use for search indexing. This will make
Search by Page render your pages from the point of view of a user with that
role.
In order to do this, assuming you have used a non-anonymous role, Search by Page
will create its own user accounts for internal use, which you will see on your
Users management page. For instance, if you set up Search by Page Nodes to index
from the point of view of role "My role", Search by Page will set up a user
called "sbp indexing My role" with role "My role". The users that Search by Page
sets up will always have their status set to "blocked". During search indexing,
the account is set to "active" only temporarily, and only for the indexing
process, so no one should ever be able to see these users except site
administrators.
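The "blocked except while indexing" behavior can be illustrated with a short
Python sketch. The account is a plain dictionary and temporarily_active is a
hypothetical helper; the module itself manages real Drupal user accounts.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_active(account):
    """Activate the account for the duration of the block, then restore it."""
    previous = account["status"]
    account["status"] = "active"
    try:
        yield account
    finally:
        account["status"] = previous  # restored even if indexing fails


indexing_user = {
    "name": "sbp indexing My role",
    "roles": ["My role"],
    "status": "blocked",
}

seen_during_indexing = []
with temporarily_active(indexing_user) as user:
    # Pages would be rendered here from this user's point of view.
    seen_during_indexing.append(user["status"])

print(seen_during_indexing, indexing_user["status"])
```

Outside the indexing block the account is back in its blocked state, which
matches the behavior described above.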
-- Other suggestions --
The default behavior for Drupal's Search module (which is the technology used
for indexing/searching in Search by Page) is that only exact matches are
returned (except for the User search portion of Search, which matches substrings
of user names). For instance, this means that if you search for "quake", and a
page contains "quakes", "quaking", or "earthquake", it will not be matched.
To get around this limitation, I suggest using a "stemmer" module, such as
http://drupal.org/project/porterstemmer. (You can search for "stemmer" on
drupal.org to find stemmers for other languages.) Stemmers enable matching on
inflected forms of words (verb forms, plurals, etc.), so they should give you
matches for "quaking" and "quakes" if you search for "quake". They wouldn't give
you a match for "earthquake", however.
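To make the difference concrete, here is a small Python sketch comparing exact
matching with stemmed matching. The toy_stem function is a deliberately crude
suffix stripper written for this example; it is not the Porter algorithm and
not what the Porter Stemmer module does internally.

```python
SUFFIXES = ("ing", "es", "s", "e")


def toy_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_match(query, words):
    """Words identical to the query, ignoring case."""
    return [w for w in words if w.lower() == query.lower()]


def stemmed_match(query, words):
    """Words whose stem equals the stem of the query."""
    target = toy_stem(query)
    return [w for w in words if toy_stem(w) == target]


page_words = ["quakes", "quaking", "earthquake"]
exact = exact_match("quake", page_words)
stemmed = stemmed_match("quake", page_words)
print(exact)
print(stemmed)
```

Exact matching finds nothing, while the stemmed comparison matches "quakes" and
"quaking". "earthquake" is still missed, because stemming only removes
endings and does not look for words inside other words.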
-- Copyright and License --
Copyright 2009-2010 Jennifer Hodgdon, Poplar ProductivityWare LLC
Licensed under the GNU General Public License