function _strip_symbols in Bibliography Module 6.2
Same name and namespace in other branches
- 7 includes/biblio.util.inc \_strip_symbols()
- 7.2 includes/biblio.util.inc \_strip_symbols()
Copyright (c) 2008, David R. Nadeau, NadeauSoftware.com. All rights reserved.
Strip symbol characters from UTF-8 text.
Characters stripped from the text include characters in the following Unicode categories:
- Modifier symbols
- Private use symbols
- Math symbols
- Other symbols
Exceptions are made for math symbols embedded within numbers (such as + - /), math symbols used within URLs (such as = ~), units of measure symbols, and ideograph parts. Currency symbols are not removed.
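The exception handling described above can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical helper, `example_strip_math()`, that is not part of the Bibliography module; it only shows how a Unicode category match followed by a negative lookbehind can leave selected symbols in place.

```php
<?php
// Hypothetical helper for illustration only; not part of biblio.util.inc.
// Removes math symbols (\p{Sm}) except the characters listed in $keep.
// $keep is inserted into a character class, so it must not be empty.
function example_strip_math($text, $keep = '\\+=~') {
  // \p{Sm} consumes one math symbol. The negative lookbehind then inspects
  // the symbol just consumed and rejects the match if it is in the keep list.
  return preg_replace('/\\p{Sm}(?<![' . $keep . '])/u', '', $text);
}

// The multiplication and not-equal signs are removed; "+" and "=" are kept.
echo example_strip_math('a + b = c × d ≠ e'), "\n";
```

Placing the lookbehind after the category match, rather than a lookahead before it, keeps the pattern short: the keep list is checked only for characters that already matched `\p{Sm}`.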
Parameters:
- $text: The UTF-8 text to strip.
Return value: The stripped UTF-8 text.
See also: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_symbol_char...
File
- includes/biblio.util.inc, line 352
Code
function _strip_symbols($text) {
  // Plus signs: ASCII, small, fullwidth, subscript, and superscript forms.
  $plus = '\\+\\x{FE62}\\x{FF0B}\\x{208A}\\x{207A}';
  // Figure dash (U+2012), subscript minus, and superscript minus.
  $minus = '\\x{2012}\\x{208B}\\x{207B}';
  // Units of measure: degree signs, square foot, and CJK squared unit symbols.
  $units = '\\x{00B0}\\x{2103}\\x{2109}\\x{23CD}';
  $units .= '\\x{32CC}-\\x{32CE}';
  $units .= '\\x{3300}-\\x{3357}';
  $units .= '\\x{3371}-\\x{33DF}';
  $units .= '\\x{33FF}';
  // Ideograph parts: CJK and Kangxi radicals, ideographic description
  // characters, Kanbun marks, CJK strokes, telegraph symbols, and Yi radicals.
  $ideo = '\\x{2E80}-\\x{2EF3}';
  $ideo .= '\\x{2F00}-\\x{2FD5}';
  $ideo .= '\\x{2FF0}-\\x{2FFB}';
  $ideo .= '\\x{3037}-\\x{303F}';
  $ideo .= '\\x{3190}-\\x{319F}';
  $ideo .= '\\x{31C0}-\\x{31CF}';
  $ideo .= '\\x{32C0}-\\x{32CB}';
  $ideo .= '\\x{3358}-\\x{3370}';
  $ideo .= '\\x{33E0}-\\x{33FE}';
  $ideo .= '\\x{A490}-\\x{A4C6}';
  // The patterns are applied in order; every match is replaced by one space.
  return preg_replace(array(
    // Remove modifier and private use symbols.
    '/[\\p{Sk}\\p{Co}]/u',
    // Remove math symbols except + - = ~ and fraction slash
    '/\\p{Sm}(?<![' . $plus . $minus . '=~\\x{2044}])/u',
    // Remove + - if space before, no number or currency after
    '/((?<= )|^)[' . $plus . $minus . ']+((?![\\p{N}\\p{Sc}])|$)/u',
    // Remove = if space before
    '/((?<= )|^)=+/u',
    // Remove + - = ~ if space after
    '/[' . $plus . $minus . '=~]+((?= )|$)/u',
    // Remove other symbols except units and ideograph parts
    '/\\p{So}(?<![' . $units . $ideo . '])/u',
    // Remove consecutive white space
    '/ +/',
  ), ' ', $text);
}
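The context-sensitive rule for plus and minus signs ("space before, no number or currency after") may be easier to follow in isolation. The sketch below is a simplified illustration under stated assumptions: `example_strip_dangling_plus()` is a hypothetical name, it handles only the ASCII plus sign, and it replaces matches with an empty string rather than a space.

```php
<?php
// Hypothetical helper for illustration only; not part of biblio.util.inc.
// Simplified to the ASCII plus sign; the module's pattern also covers other
// plus and minus forms and replaces matches with a space.
function example_strip_dangling_plus($text) {
  // Match a run of "+" that starts the string or follows a space, unless the
  // next character is a number (\p{N}) or a currency symbol (\p{Sc}).
  return preg_replace('/((?<= )|^)[+]+(?![\\p{N}\\p{Sc}])/u', '', $text);
}

// The leading "+" is dropped; "+5", "+$3", and "a+b" are left alone.
echo example_strip_dangling_plus('+ note, rate +5, fee +$3, a+b'), "\n";
```

The lookbehind and lookahead are zero-width, so only the sign itself is consumed and the surrounding space, digit, or currency symbol stays in the text.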