protected function HtmlFilter::parseHtml in Search API 8

Tokenizes an HTML string according to the HTML elements.

Assigns boost values to the elements' contents accordingly.

Parameters

string $text: The HTML string to parse, passed by reference. After the method call, the variable will contain the portion of the string after the current element, or an empty string (if there is no current element).

string|null $active_tag: (optional) The currently active tag, for which a closing tag has to be found. Internal use only.

float $boost: (optional) The currently active boost value. Internal use only.
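
The by-reference behavior of $text can be illustrated in isolation. The following is only a minimal sketch: consume_element() is a hypothetical helper written for this example and is not part of Search API. It returns the content of the current element and leaves the remainder of the string in the variable the caller passed in, or an empty string if no closing tag is found.

```php
<?php

/**
 * Hypothetical helper (not part of Search API).
 *
 * Returns the text up to the closing tag of $tag and leaves whatever follows
 * that closing tag in $text, which is passed by reference.
 */
function consume_element(string &$text, string $tag): string {
  $close = '</' . $tag . '>';
  $pos = strpos($text, $close);
  if ($pos === FALSE) {
    // No closing tag found: everything is consumed and $text becomes empty.
    $inner = $text;
    $text = '';
    return $inner;
  }
  $inner = substr($text, 0, $pos);
  // The caller's variable now holds only the part after the element.
  $text = substr($text, $pos + strlen($close));
  return $inner;
}

// The opening <h1> is assumed to have been consumed already.
$text = 'Title</h1><p>Body</p>';
$inner = consume_element($text, 'h1');
```

After the call, $inner holds the element's content and $text holds only the markup that follows it, so a caller can continue parsing from where the callee stopped.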

Return value

\Drupal\search_api\Plugin\search_api\data_type\value\TextTokenInterface[] Tokenized text with appropriate scores.

1 call to HtmlFilter::parseHtml()
HtmlFilter::processFieldValue in src/Plugin/search_api/processor/HtmlFilter.php
Processes a single text element in a field.

File

src/Plugin/search_api/processor/HtmlFilter.php, line 246

Class

HtmlFilter
Strips HTML tags from fulltext fields and decodes HTML entities.

Namespace

Drupal\search_api\Plugin\search_api\processor

Code

protected function parseHtml(&$text, $active_tag = NULL, $boost = 1.0) {
  $ret = [];
  while (($pos = strpos($text, '<')) !== FALSE) {
    $text_before = substr($text, 0, $pos);
    $text_after = substr($text, $pos + 1);

    // Attempt some small error tolerance when literal "<" characters aren't
    // escaped properly (and are free-standing).
    if (!preg_match('#^(/?)([-:_a-zA-Z0-9]+)#', $text_after, $m)) {
      $text = $text_before . '&lt;' . $text_after;
      continue;
    }
    if ($boost && $pos > 0) {
      $value = $this->normalizeText($text_before);
      if ($value !== '') {
        $ret[] = Utility::createTextToken($value, $boost);
      }
    }
    $text = $text_after;
    $pos = strpos($text, '>');
    $empty_tag = $text[$pos - 1] == '/';
    $text = substr($text, $pos + 1);
    if ($m[1]) {
      // Closing tag.
      if ($active_tag && $m[2] == $active_tag) {
        return $ret;
      }
    }
    elseif (!$empty_tag) {
      // Opening tag => recursive call.
      $inner_boost = $boost * ($this->configuration['tags'][$m[2]] ?? 1);
      $ret = array_merge($ret, $this->parseHtml($text, $m[2], $inner_boost));
    }
  }
  if ($text) {
    $value = $this->normalizeText($text);
    if ($value !== '') {
      $ret[] = Utility::createTextToken($value, $boost);
    }
    $text = '';
  }
  return $ret;
}
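
To try the recursion outside of Drupal, the method above can be approximated by a standalone function. This is a sketch under stated assumptions, not the module's implementation: parse_html_sketch() and the $tag_boosts parameter are hypothetical names, plain arrays stand in for the TextTokenInterface objects, trim() stands in for normalizeText(), and a guard for a missing ">" has been added that the method above does not have.

```php
<?php

/**
 * Hypothetical standalone approximation of the recursive tokenizer.
 *
 * @param string $text
 *   HTML to parse, passed by reference and consumed during parsing.
 * @param array $tag_boosts
 *   Map of tag name to boost factor (stands in for the "tags" setting).
 */
function parse_html_sketch(string &$text, array $tag_boosts, ?string $active_tag = NULL, float $boost = 1.0): array {
  $tokens = [];
  while (($pos = strpos($text, '<')) !== FALSE) {
    $before = substr($text, 0, $pos);
    $after = substr($text, $pos + 1);
    // A free-standing "<" is not a tag: escape it and keep scanning.
    if (!preg_match('#^(/?)([-:_a-zA-Z0-9]+)#', $after, $m)) {
      $text = $before . '&lt;' . $after;
      continue;
    }
    // Text in front of the tag gets the boost of the current element.
    $value = trim($before);
    if ($boost && $value !== '') {
      $tokens[] = ['text' => $value, 'boost' => $boost];
    }
    $end = strpos($after, '>');
    if ($end === FALSE) {
      // Unterminated tag: discard the rest (guard added for this sketch).
      $text = '';
      break;
    }
    $self_closing = $end > 0 && $after[$end - 1] === '/';
    $text = substr($after, $end + 1);
    if ($m[1] !== '') {
      // Closing tag of the active element: hand control back to the caller.
      if ($active_tag !== NULL && $m[2] === $active_tag) {
        return $tokens;
      }
    }
    elseif (!$self_closing) {
      // Opening tag: boosts of nested elements multiply.
      $inner_boost = $boost * ($tag_boosts[$m[2]] ?? 1);
      $tokens = array_merge($tokens, parse_html_sketch($text, $tag_boosts, $m[2], $inner_boost));
    }
  }
  // Whatever remains after the last tag belongs to the current element.
  $value = trim($text);
  if ($value !== '') {
    $tokens[] = ['text' => $value, 'boost' => $boost];
  }
  $text = '';
  return $tokens;
}

$html = '<h1>Title <b>bold</b></h1><p>Body</p>';
$tokens = parse_html_sketch($html, ['h1' => 5, 'b' => 2]);
```

With these example boosts, the nested element's text receives the product of its own boost and its parent's, and the text in the unconfigured "p" tag keeps the default boost of 1.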