public static function SmartyPants::tokenizeHtml in Typogrify 8

Fallback tokenizer used when Markdown is not present.

Regular expression derived from the _tokenize() subroutine in Brad Choate's MTRegex plugin. <http://www.bradchoate.com/past/mtregex.php>

Parameters

string $str: String containing HTML markup.

Return value

array An array of the tokens comprising the input string. Each token is either a tag (possibly with nested tags contained therein, such as <a href="<MTFoo>" />) or a run of text between tags. Each element of the array is a two-element array: the first element is either 'tag' or 'text', and the second is the actual value.
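
The token structure described above can be illustrated with a small standalone sketch. It assumes nothing from Drupal: tokenize_html_sketch() is a hypothetical name used only for this example, and its body simply reuses the same pattern and loop as the method documented on this page, so it can run without the Typogrify module installed.

```php
<?php

// Standalone sketch: tokenize_html_sketch() is a hypothetical name, not part
// of Typogrify. It mirrors the pattern and loop of tokenizeHtml().
function tokenize_html_sketch(string $str): array {
  $index = 0;
  $tokens = [];
  // HTML comment, processing instruction, then regular tag.
  $match = '(?s:<!(?:--.*?--\\s*)+>)|';
  $match .= '(?s:<\\?.*?\\?>)|';
  $match .= '(?:<[/!$]?[-a-zA-Z0-9:]+\\b(?>[^"\'>]+|"[^"]*"|\'[^\']*\')*>)';
  $parts = preg_split("{({$match})}", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
  foreach ($parts as $part) {
    if (++$index % 2 && $part != '') {
      $tokens[] = ['text', $part];
    }
    else {
      $tokens[] = ['tag', $part];
    }
  }
  return $tokens;
}

// The quoted attribute value contains '>' but stays inside a single tag token.
$tokens = tokenize_html_sketch('See <a href="<MTFoo>" />here');
foreach ($tokens as [$type, $value]) {
  echo $type, ' => [', $value, ']', PHP_EOL;
}
// Prints:
// text => [See ]
// tag => [<a href="<MTFoo>" />]
// text => [here]
```

Because quoted attribute values are matched as a unit, the nested <MTFoo> does not end the surrounding tag early.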

12 calls to SmartyPants::tokenizeHtml()
SmartyPants::hyphenate in src/SmartyPants.php
Replaces '=' with '&shy;' for easier manual hyphenating.
SmartyPants::process in src/SmartyPants.php
SmartyPants.
SmartyPants::smartAbbreviation in src/SmartyPants.php
Wrapping abbreviations and adding half space between digit grouping.
SmartyPants::smartAmpersand in src/SmartyPants.php
Wrapping ampersands.
SmartyPants::smartDashes in src/SmartyPants.php
SmartDashes.

... (remaining calls not shown)

File

src/SmartyPants.php, line 1331

Class

SmartyPants
SmartyPants - Smart punctuation for web sites.

Namespace

Drupal\typogrify

Code

public static function tokenizeHtml($str) {
  $index = 0;
  $tokens = [];

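  // HTML comment.
  $match = '(?s:<!(?:--.*?--\\s*)+>)|';
  // Processing instruction.
  $match .= '(?s:<\\?.*?\\?>)|';
  // Regular tag; quoted attribute values may contain '>'.
  $match .= '(?:<[/!$]?[-a-zA-Z0-9:]+\\b(?>[^"\'>]+|"[^"]*"|\'[^\']*\')*>)';
  $parts = preg_split("{({$match})}", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
  foreach ($parts as $part) {
    // Odd positions hold the text between matches; even positions hold the
    // captured tags. An empty text part falls through to the 'tag' branch.
    if (++$index % 2 && $part != '') {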
      $tokens[] = [
        'text',
        $part,
      ];
    }
    else {
      $tokens[] = [
        'tag',
        $part,
      ];
    }
  }
  return $tokens;
}
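
A token list in this shape lets a caller change text while leaving markup untouched. The sketch below is an assumption about how such a list is typically consumed, not a copy of any Typogrify method: transform_text_tokens_sketch() is a hypothetical helper, and the token array is written out by hand so the example runs on its own.

```php
<?php

// Hypothetical helper, not part of Typogrify: applies $transform to 'text'
// tokens only and passes 'tag' tokens through unchanged.
function transform_text_tokens_sketch(array $tokens, callable $transform): string {
  $result = '';
  foreach ($tokens as [$type, $value]) {
    $result .= ($type === 'text') ? $transform($value) : $value;
  }
  return $result;
}

// Tokens in the two-element [type, value] shape, written out by hand.
$tokens = [
  ['text', 'Hello '],
  ['tag', '<em class="x">'],
  ['text', 'world'],
  ['tag', '</em>'],
  ['text', '!'],
];

echo transform_text_tokens_sketch($tokens, 'strtoupper'), PHP_EOL;
// Prints: HELLO <em class="x">WORLD</em>!
```

The tag names and attribute values keep their original case because only 'text' tokens are passed to the callback.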