public function FilterHtml::getHTMLRestrictions in Drupal 8

Same name and namespace in other branches
  1. 9 core/modules/filter/src/Plugin/Filter/FilterHtml.php \Drupal\filter\Plugin\Filter\FilterHtml::getHTMLRestrictions()
  2. 10 core/modules/filter/src/Plugin/Filter/FilterHtml.php \Drupal\filter\Plugin\Filter\FilterHtml::getHTMLRestrictions()

Returns HTML allowed by this filter's configuration.

May be implemented by filters of the FilterInterface::TYPE_HTML_RESTRICTOR type. Filters of other types do not use it and should just return FALSE.

This callback function is only necessary for filters that strip away HTML tags (and possibly attributes). It allows other modules to learn, in a generic manner, which HTML tags and attributes a format allows.

Return value

array|false A nested array with *either* of the following keys:

  • 'allowed': (optional) the allowed tags as keys, and for each of those tags (keys) either of the following values:

    • TRUE to indicate any attribute is allowed
    • FALSE to indicate no attributes are allowed
    • an array to convey attribute restrictions: the keys must be attribute names (which may use a wildcard, e.g. "data-*"), the possible values are similar to the above:

      • TRUE to indicate any attribute value is allowed
      • FALSE to indicate the attribute is forbidden
      • an array to convey attribute value restrictions: the keys must be attribute values (which may use a wildcard, e.g. "xsd:*"), and the values are TRUE or FALSE, marking the attribute value as allowed or forbidden, respectively
  • 'forbidden_tags': (optional) the forbidden tags

There is one special case: the "wildcard tag", "*": any attribute restrictions on that pseudotag apply to all tags.

If no restrictions apply, then FALSE must be returned.

Here is a concrete example, for a very granular filter:

array(
  'allowed' => array(
    // Allows any attribute with any value on the <div> tag.
    'div' => TRUE,
    // Allows no attributes on the <p> tag.
    'p' => FALSE,
    // Allows the following attributes on the <a> tag:
    //  - 'href', with any value;
    //  - 'rel', with the 'nofollow' value.
    'a' => array(
      'href' => TRUE,
      'rel' => array(
        'nofollow' => TRUE,
      ),
    ),
    // Only allows the 'src' and 'alt' attributes on the <img> tag,
    // with any value.
    'img' => array(
      'src' => TRUE,
      'alt' => TRUE,
    ),
    // Allow RDFa on <span> tags, using only the dc, foaf, xsd and sioc
    // vocabularies/namespaces.
    'span' => array(
      'property' => array(
        'dc:*' => TRUE,
        'foaf:*' => TRUE,
      ),
      'datatype' => array(
        'xsd:*' => TRUE,
      ),
      'rel' => array(
        'sioc:*' => TRUE,
      ),
    ),
    // Forbid the 'style' and 'on*' ('onClick' etc.) attributes on any
    // tag.
    '*' => array(
      'style' => FALSE,
      'on*' => FALSE,
    ),
  ),
);
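
To make the nested structure above more tangible, here is a minimal sketch of how a consumer might look up whether one attribute value is permitted. It is plain PHP, not Drupal API: the function names (matches_pattern, find_rule, rule_allows_value, attribute_is_allowed) are hypothetical. It also makes assumptions the documentation does not spell out: only a trailing "*" is treated as a wildcard, an exact key wins over a wildcard key, and a matching rule on the "*" pseudotag takes precedence over the tag's own rule.

```php
<?php

/**
 * Matches a name or value against a key that may end in a "*" wildcard.
 */
function matches_pattern(string $pattern, string $subject): bool {
  if (substr($pattern, -1) === '*') {
    $prefix = substr($pattern, 0, -1);
    return strncmp($subject, $prefix, strlen($prefix)) === 0;
  }
  return $pattern === $subject;
}

/**
 * Finds the rule for $key; returns NULL when no key matches.
 */
function find_rule(array $rules, string $key) {
  // Assumption: an exact key wins over a wildcard key.
  if (array_key_exists($key, $rules)) {
    return $rules[$key];
  }
  foreach ($rules as $pattern => $rule) {
    if (matches_pattern((string) $pattern, $key)) {
      return $rule;
    }
  }
  return NULL;
}

/**
 * Applies an attribute rule (TRUE, FALSE or an array of values) to a value.
 */
function rule_allows_value($rule, string $value): bool {
  return is_array($rule) ? find_rule($rule, $value) === TRUE : $rule === TRUE;
}

/**
 * Checks one attribute value against the content of the 'allowed' key.
 */
function attribute_is_allowed(array $allowed, string $tag, string $attribute, string $value): bool {
  $tag = strtolower($tag);
  $attribute = strtolower($attribute);
  // A tag that is not listed is not allowed at all.
  if ($tag === '*' || !array_key_exists($tag, $allowed)) {
    return FALSE;
  }
  // Assumption: a matching rule on the "*" pseudotag takes precedence.
  $global_rule = is_array($allowed['*'] ?? NULL) ? find_rule($allowed['*'], $attribute) : NULL;
  if ($global_rule !== NULL) {
    return rule_allows_value($global_rule, $value);
  }
  $tag_rule = $allowed[$tag];
  if (!is_array($tag_rule)) {
    // TRUE allows any attribute, FALSE allows none.
    return $tag_rule === TRUE;
  }
  return rule_allows_value(find_rule($tag_rule, $attribute), $value);
}

// A shortened version of the example above.
$allowed = [
  'div' => TRUE,
  'p' => FALSE,
  'a' => ['href' => TRUE, 'rel' => ['nofollow' => TRUE]],
  'span' => ['property' => ['dc:*' => TRUE, 'foaf:*' => TRUE]],
  '*' => ['style' => FALSE, 'on*' => FALSE],
];

var_dump(attribute_is_allowed($allowed, 'a', 'rel', 'nofollow'));
var_dump(attribute_is_allowed($allowed, 'div', 'onClick', 'x()'));
```

Under these assumptions the first call reports the value as allowed and the second as forbidden, because 'onClick' is lowercased and then matches the 'on*' rule on the wildcard tag.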

A simpler example, for a very coarse filter:

array(
  'forbidden_tags' => array(
    'iframe',
    'script',
  ),
);

The simplest example possible: a filter that doesn't allow any HTML:

array(
  'allowed' => array(),
);

And for a filter that applies no restrictions, i.e. allows any HTML:

FALSE;
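
The return shapes shown above (FALSE, an 'allowed' list, a 'forbidden_tags' list) can be told apart with a few checks. The following is a rough sketch in plain PHP, not part of Drupal: tag_is_allowed is a hypothetical helper, and treating a tag as allowed when only 'forbidden_tags' is present and the tag is not listed is an assumption based on the description above.

```php
<?php

/**
 * Decides whether a tag passes a restrictions value as described above.
 */
function tag_is_allowed($restrictions, string $tag): bool {
  // FALSE means that no restrictions apply, so any HTML is allowed.
  if ($restrictions === FALSE) {
    return TRUE;
  }
  $tag = strtolower($tag);
  // A tag listed under 'forbidden_tags' is rejected.
  if (in_array($tag, $restrictions['forbidden_tags'] ?? [], TRUE)) {
    return FALSE;
  }
  // When 'allowed' is present, only the tags used as its keys pass. The "*"
  // key is the wildcard pseudotag, not a real tag.
  if (isset($restrictions['allowed'])) {
    return $tag !== '*' && array_key_exists($tag, $restrictions['allowed']);
  }
  // Assumption: only 'forbidden_tags' was given and the tag is not in it.
  return TRUE;
}

var_dump(tag_is_allowed(FALSE, 'script'));
var_dump(tag_is_allowed(['allowed' => []], 'p'));
var_dump(tag_is_allowed(['forbidden_tags' => ['iframe', 'script']], 'script'));
```

Note that an empty 'allowed' array and FALSE are opposites: the former rejects every tag, the latter accepts every tag.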

Overrides FilterBase::getHTMLRestrictions

See also

\Drupal\filter\Entity\FilterFormatInterface::getHtmlRestrictions()

1 call to FilterHtml::getHTMLRestrictions()
FilterHtml::filterAttributes in core/modules/filter/src/Plugin/Filter/FilterHtml.php
Provides filtering of tag attributes into accepted HTML.

File

core/modules/filter/src/Plugin/Filter/FilterHtml.php, line 245

Class

FilterHtml
Provides a filter to limit allowed HTML tags.

Namespace

Drupal\filter\Plugin\Filter

Code

public function getHTMLRestrictions() {
  if ($this->restrictions) {
    return $this->restrictions;
  }

  // Parse the allowed HTML setting, and gradually make the list of allowed
  // tags more specific.
  $restrictions = [
    'allowed' => [],
  ];

  // Make all the tags self-closing, so they will be parsed into direct
  // children of the body tag in the DomDocument.
  $html = str_replace('>', ' />', $this->settings['allowed_html']);

  // Protect any trailing * characters in attribute names, since DomDocument
  // strips them as invalid.
  $star_protector = '__zqh6vxfbk3cg__';
  $html = str_replace('*', $star_protector, $html);
  $body_child_nodes = Html::load($html)
    ->getElementsByTagName('body')
    ->item(0)->childNodes;
  foreach ($body_child_nodes as $node) {
    if ($node->nodeType !== XML_ELEMENT_NODE) {

      // Skip the empty text nodes inside tags.
      continue;
    }
    $tag = $node->tagName;
    if ($node->hasAttributes()) {

      // Mark the tag as allowed, assigning TRUE for each attribute name if
      // all values are allowed, or an array of specific allowed values.
      $restrictions['allowed'][$tag] = [];

      // Iterate over any attributes, and mark them as allowed.
      foreach ($node->attributes as $name => $attribute) {

        // Put back any trailing * on wildcard attribute name.
        $name = str_replace($star_protector, '*', $name);

        // Put back any trailing * on wildcard attribute value and parse out
        // the allowed attribute values.
        $allowed_attribute_values = preg_split('/\\s+/', str_replace($star_protector, '*', $attribute->value), -1, PREG_SPLIT_NO_EMPTY);

        // Sanitize the attribute value: it lists the allowed attribute values
        // but one allowed attribute value that some may be tempted to use
        // is specifically nonsensical: the asterisk. A prefix is required for
        // allowed attribute values with a wildcard. A wildcard by itself
        // would mean allowing all possible attribute values. But in that
        // case, one would not specify an attribute value at all.
        $allowed_attribute_values = array_filter($allowed_attribute_values, function ($value) use ($star_protector) {
          return $value !== '*';
        });
        if (empty($allowed_attribute_values)) {

          // If the value is the empty string all values are allowed.
          $restrictions['allowed'][$tag][$name] = TRUE;
        }
        else {

          // A non-empty attribute value is assigned, mark each of the
          // specified attribute values as allowed.
          foreach ($allowed_attribute_values as $value) {
            $restrictions['allowed'][$tag][$name][$value] = TRUE;
          }
        }
      }
    }
    else {

      // Mark the tag as allowed, but with no attributes allowed.
      $restrictions['allowed'][$tag] = FALSE;
    }
  }

  // The 'style' and 'on*' ('onClick' etc.) attributes are always forbidden,
  // and are removed by Xss::filter().
  // The 'lang', and 'dir' attributes apply to all elements and are always
  // allowed. The list of allowed values for the 'dir' attribute is enforced
  // by self::filterAttributes(). Note that those two attributes are in the
  // short list of globally usable attributes in HTML5. They are always
  // allowed since the correct values of lang and dir may only be known to
  // the content author. Of the other global attributes, they are not usually
  // added by hand to content, and especially the class attribute can have
  // undesired visual effects by allowing content authors to apply any
  // available style, so specific values should be explicitly allowed.
  // @see http://www.w3.org/TR/html5/dom.html#global-attributes
  $restrictions['allowed']['*'] = [
    'style' => FALSE,
    'on*' => FALSE,
    'lang' => TRUE,
    'dir' => [
      'ltr' => TRUE,
      'rtl' => TRUE,
    ],
  ];

  // Save this calculated result for re-use.
  $this->restrictions = $restrictions;
  return $restrictions;
}
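
The attribute value handling in the method above can be tried in isolation. The sketch below is plain PHP and leaves out the DOM parsing entirely: parse_allowed_values is a hypothetical function that mirrors only the steps of restoring the protected asterisks, splitting on whitespace, dropping a lone "*", and mapping an empty result to TRUE.

```php
<?php

/**
 * Turns a raw attribute value into TRUE or a map of allowed values.
 */
function parse_allowed_values(string $raw_value, string $star_protector) {
  // Put back the asterisks, then split the value on whitespace.
  $values = preg_split('/\s+/', str_replace($star_protector, '*', $raw_value), -1, PREG_SPLIT_NO_EMPTY);

  // A wildcard without a prefix would allow every value, so it is dropped.
  $values = array_filter($values, function ($value) {
    return $value !== '*';
  });

  // No values left: any attribute value is allowed.
  if (empty($values)) {
    return TRUE;
  }

  // Otherwise each listed value is marked as allowed.
  return array_fill_keys($values, TRUE);
}

$star_protector = '__zqh6vxfbk3cg__';

// Protect the asterisk the same way the method does before parsing.
$protected = str_replace('*', $star_protector, 'text-align-* foo');

var_dump(parse_allowed_values($protected, $star_protector));
var_dump(parse_allowed_values('', $star_protector));
```

The first call yields a map with the keys 'text-align-*' and 'foo', and the second yields TRUE, which matches the two branches that assign to $restrictions['allowed'][$tag][$name] in the method.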