function apachesolr_index_add_tags_to_document in Apache Solr Search 7

Same name and namespace in other branches
  1. 8 apachesolr.index.inc \apachesolr_index_add_tags_to_document()
  2. 6.3 apachesolr.index.inc \apachesolr_index_add_tags_to_document()

Extract HTML tag contents from $text and add to boost fields.

Parameters

ApacheSolrDocument $document: The document to which the boost fields are added.

string $text: Must be stripped of control characters beforehand.

1 call to apachesolr_index_add_tags_to_document()
apachesolr_index_node_solr_document in ./apachesolr.index.inc
Builds the node-specific information for a Solr document.

File

./apachesolr.index.inc, line 405
Functions related to Apache Solr indexing operations.

Code

function apachesolr_index_add_tags_to_document(ApacheSolrDocument $document, $text) {
  $tags_to_index = _apachesolr_tags_to_index();

  // Strip off all ignored tags.
  $allowed_tags = '<' . implode('><', array_keys($tags_to_index)) . '>';
  $text = strip_tags($text, $allowed_tags);
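  // Capture each remaining tag's name and its contents (ungreedy,
  // case-insensitive); the closing tag must match the opening tag.
  preg_match_all('@<(' . implode('|', array_keys($tags_to_index)) . ')[^>]*>(.*)</\\1>@Ui', $text, $matches);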
  foreach ($matches[1] as $key => $tag) {
    $tag = drupal_strtolower($tag);

    // We don't want to index links auto-generated by the url filter.
    if ($tag != 'a' || !preg_match('@(?:http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://|www\\.)[a-zA-Z0-9]+@', $matches[2][$key])) {
      if (!isset($document->{$tags_to_index[$tag]})) {
        $document->{$tags_to_index[$tag]} = '';
      }
      $document->{$tags_to_index[$tag]} .= ' ' . apachesolr_clean_text($matches[2][$key]);
    }
  }
}
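
Example

The extraction logic above can be tried outside Drupal with a minimal standalone sketch. It is not part of the module: the function name sketch_extract_tag_contents() is hypothetical, the tag-to-field map passed in is an assumed stand-in for whatever _apachesolr_tags_to_index() returns, strtolower() replaces drupal_strtolower(), a plain strip_tags()/trim() replaces apachesolr_clean_text(), and the URL scheme list is shortened. Results are collected in an array rather than on an ApacheSolrDocument.

```php
<?php
// Standalone sketch: no Drupal or Apache Solr module functions are used.

/**
 * Collects the contents of selected HTML tags, keyed by target field name.
 *
 * @param string $text
 *   HTML to scan.
 * @param array $tags_to_index
 *   Map of lowercase tag name => field name (assumed values, for illustration).
 *
 * @return array
 *   Map of field name => concatenated tag contents.
 */
function sketch_extract_tag_contents($text, array $tags_to_index) {
  $tag_names = array_keys($tags_to_index);

  // Keep only the tags of interest; all other markup is stripped.
  $text = strip_tags($text, '<' . implode('><', $tag_names) . '>');

  // Ungreedy (U), case-insensitive (i) match of <tag ...>contents</tag>.
  preg_match_all('@<(' . implode('|', $tag_names) . ')[^>]*>(.*)</\1>@Ui', $text, $matches);

  $fields = array();
  foreach ($matches[1] as $key => $tag) {
    $tag = strtolower($tag);
    $contents = $matches[2][$key];

    // Skip links whose text looks like a URL (shortened scheme list).
    if ($tag == 'a' && preg_match('@(?:https?://|ftp://|mailto:|www\.)[a-zA-Z0-9]+@', $contents)) {
      continue;
    }
    if (!isset($fields[$tags_to_index[$tag]])) {
      $fields[$tags_to_index[$tag]] = '';
    }
    // Stand-in for apachesolr_clean_text(): drop nested markup and trim.
    $fields[$tags_to_index[$tag]] .= ' ' . trim(strip_tags($contents));
  }
  return $fields;
}

$fields = sketch_extract_tag_contents(
  '<h1>Solr <em>Search</em></h1><p>Body</p><a href="/x">http://example.com</a><a href="/y">Docs</a>',
  array('h1' => 'tags_h1', 'a' => 'tags_a')
);
print_r($fields);
```

With this input, the heading text and the "Docs" link text are collected, while the paragraph and the link whose text is a bare URL are ignored. Each value begins with a space because, as in the function above, contents are appended with a ' ' separator.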