function drupal_validate_utf8 in Drupal 4
Same name and namespace in other branches
- 8 core/includes/bootstrap.inc \drupal_validate_utf8()
- 5 includes/bootstrap.inc \drupal_validate_utf8()
- 6 includes/bootstrap.inc \drupal_validate_utf8()
- 7 includes/bootstrap.inc \drupal_validate_utf8()
Checks whether a string is valid UTF-8.
All functions designed to filter input should call drupal_validate_utf8() first to ensure they operate on valid UTF-8 strings; otherwise the filter can be bypassed.
When text containing an invalid UTF-8 lead byte (0xC0 - 0xFF) is presented as UTF-8 to Internet Explorer 6, the browser may misinterpret the bytes that follow it. When these subsequent bytes are HTML control characters such as quotes or angle brackets, parts of the text that were deemed safe by filters end up in locations that are potentially unsafe. For example, an onerror attribute that is outside of a tag, and thus deemed safe by a filter, can be interpreted by the browser as if it were inside the tag.
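The following is a minimal sketch of the "validate first, then filter" pattern described above, not the code of any Drupal filter. The function example_escape_for_html() is a hypothetical name introduced for illustration, and the copy of drupal_validate_utf8() is included only so the sketch runs outside Drupal.

```php
<?php
// Stand-in copy of the function documented on this page, so the sketch
// can run outside a Drupal bootstrap.
if (!function_exists('drupal_validate_utf8')) {
  function drupal_validate_utf8($text) {
    if (strlen($text) == 0) {
      return TRUE;
    }
    return preg_match('/^./us', $text) == 1;
  }
}

// Hypothetical filter (assumption, not part of Drupal): reject invalid
// UTF-8 outright before escaping, so no stray lead byte reaches the browser.
function example_escape_for_html($text) {
  if (!drupal_validate_utf8($text)) {
    return '';
  }
  return htmlspecialchars($text, ENT_QUOTES);
}

// 0xC0 is an invalid lead byte; here it precedes a quote and an attribute.
$suspicious = "\xC0\" onerror=\"alert(1)";
$harmless   = "caf\xC3\xA9 <b>";

var_dump(example_escape_for_html($suspicious) === '');
var_dump(example_escape_for_html($harmless));
```

Returning an empty string for invalid input is one possible policy; a filter could equally log the input or report an error, as long as the invalid bytes are never passed through.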
This function exploits preg_match behaviour (since PHP 4.3.5) when used with the u modifier as a fast way to find invalid UTF-8. When the subject string contains invalid byte sequences, preg_match fails silently instead of reporting a match.
preg_match may not fail on 4 and 5 octet sequences, even though they are not supported by the specification.
The specific preg_match behaviour is present since PHP 4.3.5.
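The preg_match behaviour described above can be observed directly. This is a small sketch assuming a current PHP version; the variable names and sample strings are illustrative only. Both results are compared with 1, as the function itself does, so the sketch does not depend on the exact failure value preg_match returns.

```php
<?php
// "grüß" encoded as UTF-8: ü is C3 BC, ß is C3 9F.
$valid = "gr\xC3\xBC\xC3\x9F";
// The same word in Latin-1: 0xFC is not followed by continuation bytes,
// so this is not valid UTF-8.
$invalid = "gr\xFC\xDF";

// With the u modifier, the whole subject is checked for UTF-8 validity
// before any matching takes place.
$valid_ok   = preg_match('/^./us', $valid) == 1;
$invalid_ok = preg_match('/^./us', $invalid) == 1;

var_dump($valid_ok);
var_dump($invalid_ok);
```

On PHP 5.2 and later, preg_last_error() can additionally be compared with PREG_BAD_UTF8_ERROR to confirm why the call failed; that function is not available on the PHP 4 releases this page refers to.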
Parameters
$text: The text to check.
Return value
TRUE if the text is valid UTF-8, FALSE if not.
2 calls to drupal_validate_utf8()
- check_plain in includes/bootstrap.inc - Encode special characters in a plain-text string for display as HTML.
- filter_xss in modules/filter.module - Filters XSS. Based on kses by Ulf Harnhammar, see http://sourceforge.net/projects/kses
File
- includes/bootstrap.inc, line 650 - Functions that need to be loaded on every Drupal request.
Code
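function drupal_validate_utf8($text) {
  // An empty string is valid UTF-8, but the pattern below cannot match it.
  if (strlen($text) == 0) {
    return TRUE;
  }
  // With the u modifier, preg_match does not return 1 for invalid UTF-8.
  return preg_match('/^./us', $text) == 1;
}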