public static function SmartyPants::tokenizeHtml in Typogrify 8
Fallback tokenizer, used if Markdown is not present.
Regular expression derived from the _tokenize() subroutine in Brad Choate's MTRegex plugin. <http://www.bradchoate.com/past/mtregex.php>
Parameters
string $str: String containing HTML markup.
Return value
array An array of the tokens comprising the input string. Each token is either a tag (possibly with nested tags contained therein, such as <a href="<MTFoo>" />) or a run of text between tags. Each element of the array is itself a two-element array: the first element is either 'tag' or 'text', and the second is the actual value.
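To make the return shape concrete, here is a minimal sketch that copies the tokenizer logic into a standalone function. The name tokenize_html_sketch() is hypothetical and exists only for this illustration; it is not part of the Typogrify module.

```php
<?php
// Hypothetical standalone copy of the tokenizer logic, for illustration only.
function tokenize_html_sketch(string $str): array {
  $match = '(?s:<!(?:--.*?--\\s*)+>)|'
    . '(?s:<\\?.*?\\?>)|'
    . '(?:<[/!$]?[-a-zA-Z0-9:]+\\b(?>[^"\'>]+|"[^"]*"|\'[^\']*\')*>)';
  $parts = preg_split("{({$match})}", $str, -1, PREG_SPLIT_DELIM_CAPTURE);

  $tokens = [];
  $index = 0;
  foreach ($parts as $part) {
    // Same test as the original: odd-numbered, non-empty parts are text.
    $tokens[] = (++$index % 2 && $part != '') ? ['text', $part] : ['tag', $part];
  }
  return $tokens;
}

$tokens = tokenize_html_sketch('See <a href="<MTFoo>" /> here');
foreach ($tokens as [$type, $value]) {
  printf("%s: '%s'\n", $type, $value);
}
// text: 'See '
// tag: '<a href="<MTFoo>" />'
// text: ' here'
```

The nested <MTFoo> stays inside a single 'tag' token because the tag pattern consumes quoted attribute values whole.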
12 calls to SmartyPants::tokenizeHtml()
- SmartyPants::hyphenate in src/SmartyPants.php - Replaces '=' with '&shy;' (a soft hyphen) for easier manual hyphenating.
- SmartyPants::process in src/SmartyPants.php - SmartyPants.
- SmartyPants::smartAbbreviation in src/SmartyPants.php - Wrapping abbreviations and adding half space between digit grouping.
- SmartyPants::smartAmpersand in src/SmartyPants.php - Wrapping ampersands.
- SmartyPants::smartDashes in src/SmartyPants.php - SmartDashes.
File
- src/SmartyPants.php, line 1331
Class
- SmartyPants
- SmartyPants - Smart punctuation for web sites.
Namespace
Drupal\typogrify
Code
public static function tokenizeHtml($str) {
  $index = 0;
  $tokens = [];

  // Comment.
  $match = '(?s:<!(?:--.*?--\\s*)+>)|';
  // Processing instruction.
  $match .= '(?s:<\\?.*?\\?>)|';
  // Regular tags.
  $match .= '(?:<[/!$]?[-a-zA-Z0-9:]+\\b(?>[^"\'>]+|"[^"]*"|\'[^\']*\')*>)';

  // PREG_SPLIT_DELIM_CAPTURE keeps the matched tags in the result, so the
  // parts alternate: text, tag, text, tag, ...
  $parts = preg_split("{({$match})}", $str, -1, PREG_SPLIT_DELIM_CAPTURE);

  foreach ($parts as $part) {
    // Odd-numbered, non-empty parts are text. Everything else, including an
    // empty text part, is recorded as a 'tag' token.
    if (++$index % 2 && $part != '') {
      $tokens[] = [
        'text',
        $part,
      ];
    }
    else {
      $tokens[] = [
        'tag',
        $part,
      ];
    }
  }

  return $tokens;
}
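
Callers such as smartDashes presumably walk the token list and rewrite only the 'text' tokens, leaving markup untouched. The following is a sketch of that pattern under stated assumptions: the token list is hard-coded in the shape tokenizeHtml() returns, and the "--" replacement is a simplified stand-in, not the module's actual dash handling.

```php
<?php
// Hypothetical token list in the shape tokenizeHtml() returns.
$tokens = [
  ['text', 'Dashes -- like this '],
  ['tag', '<a title="a -- b">'],
  ['text', 'link -- text'],
  ['tag', '</a>'],
];

$result = '';
foreach ($tokens as [$type, $value]) {
  // Only runs of text are transformed; tags pass through unchanged, so the
  // "--" inside the title attribute is left alone.
  $result .= $type === 'text' ? str_replace('--', '&#8212;', $value) : $value;
}

echo $result, PHP_EOL;
// Dashes &#8212; like this <a title="a -- b">link &#8212; text</a>
```

Concatenating the values in order without any transformation reproduces the tokens' original text exactly, which is what makes this per-token rewriting safe.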