public function PhpTransliteration::transliterate in Drupal 9

Same name and namespace in other branches
  1. 8 core/lib/Drupal/Component/Transliteration/PhpTransliteration.php \Drupal\Component\Transliteration\PhpTransliteration::transliterate()

Transliterates text from Unicode to US-ASCII.

Parameters

string $string: The string to transliterate.

string $langcode: (optional) The language code of the language the string is in. Defaults to 'en' if not provided. Warning: this can be unfiltered user input.

string $unknown_character: (optional) The character to substitute for characters in $string without transliterated equivalents. Defaults to '?'.

int $max_length: (optional) If provided, return at most this many characters. The result is never cut in the middle of a single input character's transliteration; a replacement that does not fit is left out entirely.

Return value

string $string with non-US-ASCII characters transliterated to US-ASCII characters, and unknown characters replaced with $unknown_character.
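
The effect of $max_length can be illustrated with a small self-contained sketch. It assumes a tiny hand-written mapping table and a hypothetical function name, sketch_transliterate(); neither is part of Drupal, whose real implementation looks up replacements per language. The point is only to show that a multi-character replacement is dropped whole rather than cut in half.

```php
<?php

/**
 * Illustrative stand-in for the transliteration loop; not Drupal's code.
 */
function sketch_transliterate(string $string, string $unknown_character = '?', ?int $max_length = NULL): string {
  // Hypothetical three-entry mapping, keyed by UTF-8 character.
  $map = [
    "\u{00DF}" => 'ss', // LATIN SMALL LETTER SHARP S
    "\u{00E9}" => 'e',  // LATIN SMALL LETTER E WITH ACUTE
    "\u{00C6}" => 'AE', // LATIN CAPITAL LETTER AE
  ];
  $result = '';
  $length = 0;

  // Assumes $string is valid UTF-8, so preg_split() returns an array.
  foreach (preg_split('//u', $string, 0, PREG_SPLIT_NO_EMPTY) as $character) {
    if (strlen($character) === 1) {
      // Single-byte characters are ASCII and pass through unchanged.
      $to_add = $character;
    }
    else {
      $to_add = $map[$character] ?? $unknown_character;
    }

    if ($max_length !== NULL) {
      $length += strlen($to_add);
      if ($length > $max_length) {
        // The whole replacement would not fit, so stop before adding it.
        return $result;
      }
    }
    $result .= $to_add;
  }
  return $result;
}
```

With this sketch, sketch_transliterate("Stra\u{00DF}e", '?', 5) yields 'Stra': appending 'ss' would bring the length to 6, so the replacement is omitted entirely and processing stops.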

Overrides TransliterationInterface::transliterate

File

core/lib/Drupal/Component/Transliteration/PhpTransliteration.php, line 125

Class

PhpTransliteration
Implements transliteration without using the PECL extensions.

Namespace

Drupal\Component\Transliteration

Code

public function transliterate($string, $langcode = 'en', $unknown_character = '?', $max_length = NULL) {
  $result = '';
  $length = 0;
  $hash = FALSE;

  // Replace question marks with a unique hash if necessary. This is because
  // mb_convert_encoding() replaces all invalid characters with a question
  // mark.
  if ($unknown_character != '?' && strpos($string, '?') !== FALSE) {
    $hash = hash('sha256', $string);
    $string = str_replace('?', $hash, $string);
  }

  // Ensure the string is valid UTF-8 for preg_split(). Invalid byte sequences
  // will be replaced by a question mark.
  $string = mb_convert_encoding($string, 'UTF-8', 'UTF-8');

  // Use the provided unknown character instead of a question mark.
  if ($unknown_character != '?') {
    $string = str_replace('?', $unknown_character, $string);

    // Restore original question marks if necessary.
    if ($hash !== FALSE) {
      $string = str_replace($hash, '?', $string);
    }
  }

  // Split into Unicode characters and transliterate each one.
  foreach (preg_split('//u', $string, 0, PREG_SPLIT_NO_EMPTY) as $character) {
    $code = self::ordUTF8($character);
    if ($code == -1) {
      $to_add = $unknown_character;
    }
    else {
      $to_add = $this->replace($code, $langcode, $unknown_character);
    }

    // Check if this exceeds the maximum allowed length.
    if (isset($max_length)) {
      $length += strlen($to_add);
      if ($length > $max_length) {
        // There is no more space.
        return $result;
      }
    }
    $result .= $to_add;
  }
  return $result;
}
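
The question-mark handling at the top of the function can be tried in isolation. The sketch below copies only that step into a hypothetical helper, sketch_replace_invalid() (not a Drupal function), and assumes mbstring's default substitute character, '?', is in effect.

```php
<?php

/**
 * Replaces invalid UTF-8 bytes with $unknown_character while keeping any
 * literal question marks that were already present in the input.
 */
function sketch_replace_invalid(string $string, string $unknown_character): string {
  $hash = FALSE;

  // Hide genuine question marks behind a hex digest, which cannot itself
  // contain a '?'.
  if ($unknown_character !== '?' && strpos($string, '?') !== FALSE) {
    $hash = hash('sha256', $string);
    $string = str_replace('?', $hash, $string);
  }

  // Invalid byte sequences become '?' (the assumed default substitute).
  $string = mb_convert_encoding($string, 'UTF-8', 'UTF-8');

  if ($unknown_character !== '?') {
    // Every '?' left at this point came from an invalid sequence.
    $string = str_replace('?', $unknown_character, $string);

    // Bring the original question marks back.
    if ($hash !== FALSE) {
      $string = str_replace($hash, '?', $string);
    }
  }
  return $string;
}
```

For example, sketch_replace_invalid("What\xFF?", '_') returns 'What_?': the invalid byte 0xFF becomes the underscore, and the literal question mark survives.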