function apachesolr_index_add_tags_to_document in Apache Solr Search 7
Same name and namespace in other branches
- 8 apachesolr.index.inc \apachesolr_index_add_tags_to_document()
- 6.3 apachesolr.index.inc \apachesolr_index_add_tags_to_document()
Extracts the contents of selected HTML tags from $text and appends them to the document's boost fields.
Parameters
ApacheSolrDocument $document: the Solr document whose boost fields are populated.
string $text: the text to scan for tags. It must be stripped of control characters beforehand.
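The requirement on $text can be illustrated with a minimal sketch. The page does not say which control characters have to be removed or which helper the module uses, so the function name `sketch_strip_control_chars()` and the exact character range below are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical helper (not part of the module as documented here).
 *
 * Replaces ASCII control characters with a space. Tab (\x09), line feed
 * (\x0A) and carriage return (\x0D) are kept; that choice is an assumption.
 */
function sketch_strip_control_chars($text) {
  // The escapes are interpreted by PCRE, not by PHP, because the pattern
  // is a single-quoted string.
  return preg_replace('@[\x00-\x08\x0B\x0C\x0E-\x1F]@', ' ', $text);
}
```

A caller would run the text through such a step first and pass the result as $text.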
1 call to apachesolr_index_add_tags_to_document()
- apachesolr_index_node_solr_document in ./apachesolr.index.inc - Builds the node-specific information for a Solr document.
File
- ./apachesolr.index.inc, line 405 - Functions related to Apache Solr indexing operations.
Code
function apachesolr_index_add_tags_to_document(ApacheSolrDocument $document, $text) {
  $tags_to_index = _apachesolr_tags_to_index();

  // Strip off all ignored tags.
  $allowed_tags = '<' . implode('><', array_keys($tags_to_index)) . '>';
  $text = strip_tags($text, $allowed_tags);

  preg_match_all('@<(' . implode('|', array_keys($tags_to_index)) . ')[^>]*>(.*)</\\1>@Ui', $text, $matches);
  foreach ($matches[1] as $key => $tag) {
    $tag = drupal_strtolower($tag);
    // We don't want to index links auto-generated by the url filter.
    if ($tag != 'a' || !preg_match('@(?:http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://|www\\.)[a-zA-Z0-9]+@', $matches[2][$key])) {
      if (!isset($document->{$tags_to_index[$tag]})) {
        $document->{$tags_to_index[$tag]} = '';
      }
      $document->{$tags_to_index[$tag]} .= ' ' . apachesolr_clean_text($matches[2][$key]);
    }
  }
}
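The extraction logic above can be tried outside Drupal with a standalone sketch. Everything in it is an assumption made for illustration: the function name `sketch_extract_tag_contents()`, the tag-to-field map passed in by the caller (standing in for `_apachesolr_tags_to_index()`), the shortened URL scheme list, and the use of `strtolower()` and `trim(strip_tags())` in place of `drupal_strtolower()` and `apachesolr_clean_text()`. It returns a plain array rather than writing to an ApacheSolrDocument.

```php
<?php
/**
 * Hypothetical standalone version of the tag extraction shown above.
 *
 * @param string $text
 *   HTML text, assumed to be free of control characters.
 * @param array $tags_to_index
 *   Map of lowercase tag name => field name (illustrative values only).
 *
 * @return array
 *   Field name => concatenated tag contents.
 */
function sketch_extract_tag_contents($text, array $tags_to_index) {
  $fields = array();
  $tag_names = array_keys($tags_to_index);

  // Drop every tag that is not in the map, keeping its inner text.
  $text = strip_tags($text, '<' . implode('><', $tag_names) . '>');

  // U makes the match ungreedy, i makes it case-insensitive, and the
  // backreference requires the closing tag to match the opening one.
  $pattern = '@<(' . implode('|', $tag_names) . ')[^>]*>(.*)</\\1>@Ui';
  preg_match_all($pattern, $text, $matches);

  foreach ($matches[1] as $key => $tag) {
    $tag = strtolower($tag);
    $content = $matches[2][$key];

    // Skip anchors whose text looks like a URL (shortened scheme list).
    if ($tag == 'a' && preg_match('@(?:https?://|ftp://|mailto:|www\\.)[a-zA-Z0-9]+@', $content)) {
      continue;
    }

    $field = $tags_to_index[$tag];
    if (!isset($fields[$field])) {
      $fields[$field] = '';
    }
    // Each hit is appended with a leading space, as in the original.
    $fields[$field] .= ' ' . trim(strip_tags($content));
  }
  return $fields;
}
```

Called with a map such as `array('h1' => 'tags_h1', 'strong' => 'tags_inline', 'a' => 'tags_a')`, the sketch collects heading, inline, and link text into separate fields. It skips any link whose visible text is itself a URL, which is how the original avoids indexing links generated by the url filter.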