protected function HtmlFilter::parseHtml in Search API 8
Tokenizes an HTML string according to the HTML elements.
Assigns boost values to the elements' contents accordingly.
Parameters
string $text: The HTML string to parse, passed by reference. After the method call, the variable will contain the portion of the string after the current element, or an empty string (if there is no current element).
string|null $active_tag: (optional) The currently active tag, for which a closing tag has to be found. Internal use only.
float $boost: (optional) The currently active boost value. Internal use only.
Return value
\Drupal\search_api\Plugin\search_api\data_type\value\TextTokenInterface[] Tokenized text with appropriate scores.
1 call to HtmlFilter::parseHtml()
- HtmlFilter::processFieldValue in src/Plugin/search_api/processor/HtmlFilter.php - Processes a single text element in a field.
File
- src/Plugin/search_api/processor/HtmlFilter.php, line 246
Class
- HtmlFilter
- Strips HTML tags from fulltext fields and decodes HTML entities.
Namespace
Drupal\search_api\Plugin\search_api\processor
Code
protected function parseHtml(&$text, $active_tag = NULL, $boost = 1.0) {
  $ret = [];
  while (($pos = strpos($text, '<')) !== FALSE) {
    $text_before = substr($text, 0, $pos);
    $text_after = substr($text, $pos + 1);
    // Attempt some small error tolerance when literal "<" characters aren't
    // escaped properly (and are free-standing).
    if (!preg_match('#^(/?)([-:_a-zA-Z0-9]+)#', $text_after, $m)) {
      $text = $text_before . '&lt;' . $text_after;
      continue;
    }
    if ($boost && $pos > 0) {
      $value = $this->normalizeText($text_before);
      if ($value !== '') {
        $ret[] = Utility::createTextToken($value, $boost);
      }
    }
    $text = $text_after;
    $pos = strpos($text, '>');
    $empty_tag = $text[$pos - 1] == '/';
    $text = substr($text, $pos + 1);
    if ($m[1]) {
      // Closing tag.
      if ($active_tag && $m[2] == $active_tag) {
        return $ret;
      }
    }
    elseif (!$empty_tag) {
      // Opening tag => recursive call.
      $inner_boost = $boost * ($this->configuration['tags'][$m[2]] ?? 1);
      $ret = array_merge($ret, $this->parseHtml($text, $m[2], $inner_boost));
    }
  }
  if ($text) {
    $value = $this->normalizeText($text);
    if ($value !== '') {
      $ret[] = Utility::createTextToken($value, $boost);
    }
    $text = '';
  }
  return $ret;
}
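Because parseHtml() is protected and depends on the processor's configuration, the recursion is easier to follow in isolation. The following is a minimal standalone sketch, not the module's code: sketch_parse_html(), sketch_normalize(), the $tag_boosts map, and the plain array tokens are all hypothetical stand-ins (for parseHtml(), normalizeText(), $this->configuration['tags'], and Utility::createTextToken() respectively), and the handling of an unterminated tag is a choice made only for this sketch. It illustrates how boosts multiply through nested tags and how the by-reference $text is consumed.

```php
<?php

// Hypothetical stand-in for normalizeText(): decode entities and collapse
// whitespace. The real method may do more.
function sketch_normalize(string $value): string {
  $value = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
  return trim(preg_replace('/\s+/', ' ', $value));
}

// Simplified sketch of the recursive tokenizer. $tag_boosts is a hypothetical
// map of tag name => boost, standing in for the processor's "tags" setting.
function sketch_parse_html(string &$text, array $tag_boosts, ?string $active_tag = NULL, float $boost = 1.0): array {
  $tokens = [];
  while (($pos = strpos($text, '<')) !== FALSE) {
    $before = substr($text, 0, $pos);
    $after = substr($text, $pos + 1);
    if (!preg_match('#^(/?)([-:_a-zA-Z0-9]+)#', $after, $m)) {
      // Free-standing "<": escape it so the next strpos() moves past it.
      $text = $before . '&lt;' . $after;
      continue;
    }
    // Text preceding the tag gets the boost of the enclosing element.
    if ($boost && $pos > 0) {
      $value = sketch_normalize($before);
      if ($value !== '') {
        $tokens[] = ['text' => $value, 'boost' => $boost];
      }
    }
    $end = strpos($after, '>');
    if ($end === FALSE) {
      // Unterminated tag: discard the remainder (sketch-only behavior).
      $text = '';
      break;
    }
    $empty_tag = $end > 0 && $after[$end - 1] === '/';
    $text = substr($after, $end + 1);
    if ($m[1]) {
      // Closing tag of the active element: hand the remainder back.
      if ($active_tag && $m[2] === $active_tag) {
        return $tokens;
      }
    }
    elseif (!$empty_tag) {
      // Opening tag: recurse with the multiplied boost.
      $inner = $boost * ($tag_boosts[$m[2]] ?? 1);
      $tokens = array_merge($tokens, sketch_parse_html($text, $tag_boosts, $m[2], $inner));
    }
  }
  if ($text !== '') {
    $value = sketch_normalize($text);
    if ($value !== '') {
      $tokens[] = ['text' => $value, 'boost' => $boost];
    }
    $text = '';
  }
  return $tokens;
}

$html = 'Intro <h2>Big <b>news</b></h2> tail';
$tokens = sketch_parse_html($html, ['h2' => 3, 'b' => 2]);
// $tokens now holds: Intro (1.0), Big (3.0), news (6.0), tail (1.0).
// $html is now '' because the whole string was consumed.
```

In this sketch the text inside the nested b element ends up with boost 3 × 2 = 6, since each opening tag multiplies the boost inherited from its parent, and tags missing from the map default to a factor of 1.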