protected static function TokenizerTest::codepointToUtf8 in Search API 8
Converts a Unicode code point to a UTF-8 string.
The PHP function "chr()" returns a single byte, so it only works for characters with values up to 255. This function converts a number to the corresponding Unicode character, encoded as UTF-8. Adapted from functions supplied in comments on several functions on php.net.
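As a minimal sketch of the limitation described above, the following compares the single byte that chr() produces for 0xE9 with the two-byte UTF-8 sequence for U+00E9, built with the same shifts and masks as the function's two-byte branch. The variable names are chosen here for illustration only.

```php
<?php
// chr() always returns exactly one byte, here the raw byte 0xE9.
$single = chr(0xE9);

// Two-byte UTF-8 encoding of U+00E9: the lead byte carries the top
// bits (110xxxxx), the continuation byte the low six bits (10xxxxxx).
$cp = 0xE9;
$utf8 = chr(($cp >> 6) + 192) . chr(($cp & 63) + 128);

echo bin2hex($single), "\n"; // e9
echo bin2hex($utf8), "\n";   // c3a9
```

The lone byte 0xE9 is not a valid UTF-8 sequence on its own, which is why a multi-byte conversion is needed for anything outside ASCII.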
Parameters
int $num: A Unicode code point.
Return value
string A UTF-8 string containing the character corresponding to that code point.
1 call to TokenizerTest::codepointToUtf8()
- TokenizerTest::testCjkSupport in tests/src/Unit/Processor/TokenizerTest.php - Tests that the simplifyText() method handles CJK characters properly.
File
- tests/src/Unit/Processor/TokenizerTest.php, line 273
Class
- TokenizerTest: Tests the "Tokenizer" processor.
Namespace
Drupal\Tests\search_api\Unit\Processor
Code
protected static function codepointToUtf8($num) {
  if ($num < 128) {
    return chr($num);
  }
  if ($num < 2048) {
    return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
  }
  if ($num < 65536) {
    return chr(($num >> 12) + 224) . chr(($num >> 6 & 63) + 128) . chr(($num & 63) + 128);
  }
  if ($num < 2097152) {
    return chr(($num >> 18) + 240) . chr(($num >> 12 & 63) + 128) . chr(($num >> 6 & 63) + 128) . chr(($num & 63) + 128);
  }
  return '';
}
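
Because the method is protected, it cannot be called from outside the test class. To try the conversion in isolation, the sketch below copies the same logic into a free function. The name codepoint_to_utf8_sketch() is hypothetical and not part of Search API. It prints the resulting bytes in hexadecimal for one code point from each of the four branches.

```php
<?php
// Hypothetical standalone copy of the method's logic, for illustration only.
function codepoint_to_utf8_sketch(int $num): string {
  if ($num < 128) {
    // One byte: plain ASCII.
    return chr($num);
  }
  if ($num < 2048) {
    // Two bytes: 110xxxxx 10xxxxxx.
    return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
  }
  if ($num < 65536) {
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
    return chr(($num >> 12) + 224) . chr(($num >> 6 & 63) + 128) . chr(($num & 63) + 128);
  }
  if ($num < 2097152) {
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    return chr(($num >> 18) + 240) . chr(($num >> 12 & 63) + 128) . chr(($num >> 6 & 63) + 128) . chr(($num & 63) + 128);
  }
  // Anything larger does not fit in four bytes.
  return '';
}

echo bin2hex(codepoint_to_utf8_sketch(0x41)), "\n";    // 41
echo bin2hex(codepoint_to_utf8_sketch(0xE9)), "\n";    // c3a9
echo bin2hex(codepoint_to_utf8_sketch(0x4E2D)), "\n";  // e4b8ad
echo bin2hex(codepoint_to_utf8_sketch(0x1F600)), "\n"; // f09f9880
```

Note that the code as listed accepts any value below 2097152 and does not reject surrogates (U+D800 to U+DFFF) or values above U+10FFFF, so a caller that needs strictly valid UTF-8 would have to add its own range check.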