function csl_name::get_utf8_regex in Bibliography Module 7.2

Same name and namespace in other branches
  1. 6.2 modules/CiteProc/CSL.inc \csl_name::get_utf8_regex()
  2. 7 modules/CiteProc/CSL.inc \csl_name::get_utf8_regex()
1 call to csl_name::get_utf8_regex()
csl_name::get_regex_patterns in modules/CiteProc/CSL.inc

File

modules/CiteProc/CSL.inc, line 725

Class

csl_name

Code

function get_utf8_regex() {

  // Matches Unicode letters & digits.
  // Unicode-aware equivalent of "[:alnum:]".
  $alnum = "\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}";

  // Matches Unicode letters.
  // Unicode-aware equivalent of "[:alpha:]".
  $alpha = "\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}";

  // Matches Unicode control codes & characters not in other categories.
  // Unicode-aware equivalent of "[:cntrl:]".
  $cntrl = "\\p{C}";

  // Matches Unicode dashes & hyphens.
  $dash = "\\p{Pd}";

  // Matches Unicode digits.
  // Unicode-aware equivalent of "[:digit:]".
  $digit = "\\p{Nd}";

  // Matches Unicode printing characters (excluding space).
  // Unicode-aware equivalent of "[:graph:]".
  $graph = "^\\p{C}\t\n\f\r\\p{Z}";

  // Matches Unicode lower case letters.
  // Unicode-aware equivalent of "[:lower:]".
  $lower = "\\p{Ll}\\p{M}";

  // Matches Unicode printing characters (including space).
  // Same as "^\p{C}"; Unicode-aware equivalent of "[:print:]".
  $print = "\\P{C}";

  // Matches Unicode punctuation (printing characters excluding letters & digits).
  // Unicode-aware equivalent of "[:punct:]".
  $punct = "\\p{P}";

  // Matches Unicode whitespace (separating characters with no visual representation).
  // Unicode-aware equivalent of "[:space:]".
  $space = "\t\n\f\r\\p{Z}";

  // Matches Unicode upper case letters.
  // Unicode-aware equivalent of "[:upper:]".
  $upper = "\\p{Lu}\\p{Lt}";

  // Matches Unicode "word" characters.
  // Unicode-aware equivalent of "[:word:]" (or "[:alnum:]" plus "_").
  $word = "_\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}";

  // Defines the PCRE pattern modifier(s) to be used in conjunction with the above variables.
  // The "u" (PCRE_UTF8) pattern modifier causes PHP/PCRE to treat pattern strings as UTF-8.
  // More info: <http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php>
  $patternModifiers = "u";

  return array(
    $alnum,
    $alpha,
    $cntrl,
    $dash,
    $digit,
    $graph,
    $lower,
    $print,
    $punct,
    $space,
    $upper,
    $word,
    $patternModifiers,
  );
}
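
Usage sketch

The values returned above are character-class fragments rather than complete patterns, so a caller presumably wraps them in square brackets and appends the returned modifier. The following is a minimal sketch of that idea, not the code of csl_name::get_regex_patterns, which is not shown on this page. The helper example_get_utf8_regex() is a hypothetical stand-in that reproduces the same thirteen values in the same order so the example runs outside the class; the sample names are made up, and the sketch assumes a PCRE build with Unicode property support.

```php
<?php
// Hypothetical stand-in for csl_name::get_utf8_regex(): same values and
// order as the listing, so this sketch can run without the csl_name class.
function example_get_utf8_regex() {
  return array(
    "\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}",   // alnum
    "\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}",          // alpha
    "\\p{C}",                                // cntrl
    "\\p{Pd}",                               // dash
    "\\p{Nd}",                               // digit
    "^\\p{C}\t\n\f\r\\p{Z}",                 // graph
    "\\p{Ll}\\p{M}",                         // lower
    "\\P{C}",                                // print
    "\\p{P}",                                // punct
    "\t\n\f\r\\p{Z}",                        // space
    "\\p{Lu}\\p{Lt}",                        // upper
    "_\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}",  // word
    "u",                                     // pattern modifiers
  );
}

// The array is positional, so list() must name the entries in return order.
list($alnum, $alpha, $cntrl, $dash, $digit, $graph, $lower, $print, $punct,
  $space, $upper, $word, $patternModifiers) = example_get_utf8_regex();

// Wrap a fragment in [...] to form a character class, then add the modifier.
$startsUpper = "/^[" . $upper . "]/" . $patternModifiers;
$isUpper = preg_match($startsUpper, "\xC3\x89mile"); // "Émile" as UTF-8 bytes
$isLower = preg_match($startsUpper, "\xC3\xA9mile"); // "émile" as UTF-8 bytes

// Split a name on runs of Unicode whitespace, including a no-break space.
$parts = preg_split("/[" . $space . "]+/" . $patternModifiers,
  "Jean\xC2\xA0Paul  Sartre");

var_dump($isUpper, $isLower, $parts);
```

Note that $graph begins with "^", so placing it in brackets yields a negated class; it should therefore come first if combined with other fragments in one class.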