protected function HtmlFilter::processFieldValue in Search API 8

Processes a single text element in a field.

The default implementation just calls process().

Parameters

string $value: The string value to preprocess, passed by reference. It can be modified in place; nothing has to be returned. It can either be left as a string or changed into an array of \Drupal\search_api\Plugin\search_api\data_type\value\TextTokenInterface objects. Setting it to anything else will result in undefined behavior.

string $type: The field's data type.

Overrides FieldsProcessorPluginBase::processFieldValue

File

src/Plugin/search_api/processor/HtmlFilter.php, line 195

Class

HtmlFilter
Strips HTML tags from fulltext fields and decodes HTML entities.

Namespace

Drupal\search_api\Plugin\search_api\processor

Code

protected function processFieldValue(&$value, $type) {

  // Remove invisible content.
  $text = preg_replace('@<(applet|audio|canvas|command|embed|iframe|map|menu|noembed|noframes|noscript|script|style|svg|video)[^>]*>.*</\\1>@siU', ' ', $value);

  // Let removed tags still delimit words.
  $is_text_type = $this->getDataTypeHelper()->isTextType($type);
  if ($is_text_type) {
    $text = str_replace(['<', '>'], [' <', '> '], $text);
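    if ($this->configuration['title']) {
      // Drop the title attribute from the tag and emit its text after the tag.
      $text = preg_replace('/(<[-a-z_]+[^>]*["\\s])title\\s*=\\s*("([^"]+)"|\'([^\']+)\')([^>]*>)/i', '$1 $5 $3$4 ', $text);
    }
    if ($this->configuration['alt']) {
      // Replace a tag that has an alt attribute with its alt text, wrapped in <img> tags.
      $text = preg_replace('/<[-a-z_]+[^>]*["\\s]alt\\s*=\\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i', ' <img>$2$3</img> ', $text);
    }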
  }
  if ($this->configuration['tags'] && $is_text_type) {
    $text = strip_tags($text, '<' . implode('><', array_keys($this->configuration['tags'])) . '>');
    $value = $this->parseHtml($text);
  }
  else {
    $text = strip_tags($text);
    $value = $this->normalizeText(trim($text));
  }
}
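
Example

The steps in the method above can be tried outside Drupal with a minimal sketch like the following. The function name html_to_index_text() is hypothetical and not part of Search API, the list of invisible elements is shortened, and the final whitespace collapse only stands in for normalizeText(), whose exact behavior is an assumption here. It covers only the branch without configured tags, with the alt option treated as enabled.

```php
<?php

/**
 * Hypothetical standalone helper that mimics the no-tags branch above.
 */
function html_to_index_text(string $html): string {
  // Remove elements whose content is never shown to the reader
  // (shortened list; the real method covers more element names).
  $text = preg_replace('@<(script|style|noscript)[^>]*>.*</\\1>@siU', ' ', $html);

  // Pad angle brackets so that adjacent tags still delimit words.
  $text = str_replace(['<', '>'], [' <', '> '], $text);

  // Turn a tag that has an alt attribute into its alt text.
  $text = preg_replace('/<[-a-z_]+[^>]*["\\s]alt\\s*=\\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i', ' <img>$2$3</img> ', $text);

  // Strip all remaining tags.
  $text = strip_tags($text);

  // Assumption: stand-in for normalizeText(); collapse runs of whitespace.
  return trim(preg_replace('/\s+/', ' ', $text));
}

// Prints "Hello A cat".
echo html_to_index_text('<p>Hello</p><img src="a.png" alt="A cat"><script>alert(1);</script>'), PHP_EOL;
```

Without the bracket padding step, '<b>one</b><i>two</i>' would be reduced to 'onetwo'; with it, the result is 'one two'.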