public static function Unicode::validateUtf8 in Drupal 8

Same name and namespace in other branches
  1. 9 core/lib/Drupal/Component/Utility/Unicode.php \Drupal\Component\Utility\Unicode::validateUtf8()

Checks whether a string is valid UTF-8.

All functions designed to filter input should call this method, or its procedural wrapper drupal_validate_utf8(), to ensure they operate on valid UTF-8 strings. Otherwise the filter can be bypassed.

When text containing an invalid UTF-8 lead byte (0xC0 - 0xFF) is presented as UTF-8 to Internet Explorer 6, the browser may misinterpret subsequent bytes. When these subsequent bytes are HTML control characters such as quotes or angle brackets, parts of the text that were deemed safe by filters end up in locations that are potentially unsafe. For example, an onerror attribute that is outside of a tag, and thus deemed safe by a filter, can be interpreted by the browser as if it were inside the tag.

The function does not return FALSE for strings containing character codes above U+10FFFF, even though these are prohibited by RFC 3629.
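
The validate-before-filter pattern described above might look roughly like the following sketch. It is an illustration only: example_validate_utf8() is a hypothetical stand-in that copies the logic of Unicode::validateUtf8() so the snippet runs without Drupal, example_filter() is a made-up filter, and returning an empty string for invalid input is an assumed policy, not a requirement.

```php
<?php

// Hypothetical stand-in for Unicode::validateUtf8(), mirroring the logic
// documented on this page so the sketch is runnable outside Drupal.
function example_validate_utf8(string $text): bool {
  if (strlen($text) == 0) {
    return TRUE;
  }
  return preg_match('/^./us', $text) == 1;
}

// Hypothetical input filter. It refuses to process text that is not valid
// UTF-8, so that later escaping always operates on well-formed input.
function example_filter(string $input): string {
  if (!example_validate_utf8($input)) {
    // Assumed policy: discard the whole string rather than try to repair it.
    return '';
  }
  return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}
```

In a Drupal code base, the stand-in would be replaced by a call to \Drupal\Component\Utility\Unicode::validateUtf8().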

Parameters

string $text: The text to check.

Return value

bool TRUE if the text is valid UTF-8, FALSE if not.

4 calls to Unicode::validateUtf8()
drupal_validate_utf8 in core/includes/bootstrap.inc
Checks whether a string is valid UTF-8.
FileSystem::getDestinationFilename in core/lib/Drupal/Core/File/FileSystem.php
Determines the destination path for a file.
UnicodeTest::testValidateUtf8 in core/tests/Drupal/Tests/Component/Utility/UnicodeTest.php
Tests UTF-8 validation.
Xss::filter in core/lib/Drupal/Component/Utility/Xss.php
Filters HTML to prevent cross-site-scripting (XSS) vulnerabilities.

File

core/lib/Drupal/Component/Utility/Unicode.php, line 595

Class

Unicode
Provides Unicode-related conversions and operations.

Namespace

Drupal\Component\Utility

Code

public static function validateUtf8($text) {
  if (strlen($text) == 0) {
    return TRUE;
  }

  // With the PCRE_UTF8 modifier 'u', preg_match() fails silently on strings
  // containing invalid UTF-8 byte sequences. It does not reject character
  // codes above U+10FFFF (represented by 4 or more octets), though.
  return preg_match('/^./us', $text) == 1;
}
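
Although the pattern matches only the first character, PCRE checks the whole subject string for UTF-8 validity when the 'u' modifier is set, so an invalid byte anywhere in the string makes preg_match() fail. The sketch below illustrates this with a few sample inputs. sketch_validate_utf8() is a hypothetical copy of the method above, and the sample strings are invented for illustration.

```php
<?php

// Hypothetical standalone copy of the method body shown above.
function sketch_validate_utf8($text) {
  if (strlen($text) == 0) {
    return TRUE;
  }
  return preg_match('/^./us', $text) == 1;
}

// Sample inputs, chosen for illustration only.
$samples = [
  'empty string'         => '',
  'plain ASCII'          => 'hello',
  'single newline'       => "\n",          // Matched because of the 's' modifier.
  'valid multibyte'      => "caf\u{00E9}",
  'invalid byte at end'  => "hello\xFF",   // The first character is valid, the last byte is not.
  'truncated sequence'   => "caf\xC3",     // Lead byte without its continuation byte.
];

foreach ($samples as $label => $text) {
  // Prints the label followed by "true" or "false".
  echo $label . ': ' . var_export(sketch_validate_utf8($text), TRUE) . "\n";
}
```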