protected static function TokenizerTest::codepointToUtf8 in Search API 8

Converts a Unicode code point to a UTF-8 string.

PHP's "chr()" function returns a single byte, so it can only produce characters with values up to 255 (and only values below 128 are ASCII). This function instead converts a code point to the UTF-8 byte sequence of the corresponding Unicode character. Adapted from functions posted in user comments on several function pages on php.net.
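To illustrate the limitation described above, the following minimal sketch passes a CJK code point (U+4E2D) directly to chr(). The variable names are illustrative only and are not part of the test class.

```php
<?php
// chr() always yields exactly one byte; values outside 0..255 are
// bitwise and'ed with 255 before conversion.
$byte = chr(0x4E2D);      // 0x4E2D & 0xFF === 0x2D, the ASCII hyphen.
$length = strlen($byte);  // Number of bytes in the result.
$hex = bin2hex($byte);    // Hex dump of the single byte.

echo $length, ' ', $hex, "\n"; // prints "1 2d"
```

The result is one unrelated byte rather than the three-byte UTF-8 sequence for U+4E2D, which is why a dedicated helper is needed.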

Parameters

int $num: A Unicode code point.

Return value

string A UTF-8 string containing the character corresponding to that code point.

1 call to TokenizerTest::codepointToUtf8()
TokenizerTest::testCjkSupport in tests/src/Unit/Processor/TokenizerTest.php
Tests that the simplifyText() method handles CJK characters properly.

File

tests/src/Unit/Processor/TokenizerTest.php, line 273

Class

TokenizerTest
Tests the "Tokenizer" processor.

Namespace

Drupal\Tests\search_api\Unit\Processor

Code

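protected static function codepointToUtf8($num) {
  // One byte: 0xxxxxxx (U+0000 to U+007F).
  if ($num < 128) {
    return chr($num);
  }
  // Two bytes: 110xxxxx 10xxxxxx (U+0080 to U+07FF).
  if ($num < 2048) {
    return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
  }
  // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (U+0800 to U+FFFF).
  if ($num < 65536) {
    return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
  }
  // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (values below 0x200000).
  if ($num < 2097152) {
    return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
  }
  // Anything larger cannot be encoded in four bytes.
  return '';
}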
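
The method is protected, so it cannot be called from outside the test class. As a rough sketch for experimenting with the same bit arithmetic, the logic can be copied into a free function. The name codepoint_to_utf8() is hypothetical and does not exist in Search API.

```php
<?php
/**
 * Standalone copy of the encoding logic (hypothetical helper, for illustration).
 */
function codepoint_to_utf8(int $num): string {
  if ($num < 0x80) {
    // One byte: plain ASCII.
    return chr($num);
  }
  if ($num < 0x800) {
    // Two bytes: 5 payload bits in the lead byte, 6 in the continuation byte.
    return chr(($num >> 6) + 0xC0) . chr(($num & 0x3F) + 0x80);
  }
  if ($num < 0x10000) {
    // Three bytes: 4 + 6 + 6 payload bits.
    return chr(($num >> 12) + 0xE0)
      . chr((($num >> 6) & 0x3F) + 0x80)
      . chr(($num & 0x3F) + 0x80);
  }
  if ($num < 0x200000) {
    // Four bytes: 3 + 6 + 6 + 6 payload bits.
    return chr(($num >> 18) + 0xF0)
      . chr((($num >> 12) & 0x3F) + 0x80)
      . chr((($num >> 6) & 0x3F) + 0x80)
      . chr(($num & 0x3F) + 0x80);
  }
  // Too large for a four-byte sequence.
  return '';
}

echo bin2hex(codepoint_to_utf8(0x41)), "\n";    // prints "41"
echo bin2hex(codepoint_to_utf8(0xE9)), "\n";    // prints "c3a9"
echo bin2hex(codepoint_to_utf8(0x4E2D)), "\n";  // prints "e4b8ad"
echo bin2hex(codepoint_to_utf8(0x1F600)), "\n"; // prints "f09f9880"
```

Like the original method, this sketch does not reject surrogate code points (U+D800 to U+DFFF) or values above U+10FFFF, so it is suited to test fixtures rather than to validating arbitrary input.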