class FeedsExXmlUtility in Feeds extensible parsers 7

Same name and namespace in other branches
  1. 7.2 src/Xml/Utility.php \FeedsExXmlUtility

Simple XML helpers.

File

src/Xml/Utility.php, line 11
Contains FeedsExXmlUtility.

class FeedsExXmlUtility {

  /**
   * Matches the characters of an XML element name.
   *
   * @var string
   */
  protected static $elementRegex = '[:A-Z_a-z\\xC0-\\xD6\\xD8-\\xF6\\xF8-\\x{2FF}\\x{370}-\\x{37D}\\x{37F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}][:A-Z_a-z\\xC0-\\xD6\\xD8-\\xF6\\xF8-\\x{2FF}\\x{370}-\\x{37D}\\x{37F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}\\.\\-0-9\\xB7\\x{0300}-\\x{036F}\\x{203F}-\\x{2040}]*';

  /**
   * Strips the default namespaces from an XML string.
   *
   * @param string $xml
   *   The XML string.
   *
   * @return string
   *   The XML string with the default namespaces removed.
   */
  public static function removeDefaultNamespaces($xml) {
    return preg_replace('/(<' . self::$elementRegex . '[^>]*)\\s+xmlns\\s*=\\s*("|\').*?(\\2)([^>]*>)/u', '$1$4', $xml);
  }

  /**
   * Converts named HTML entities to their UTF-8 equivalent.
   *
   * @param string $markup
   *   The string.
   *
   * @return string
   *   The converted string.
   */
  public static function decodeNamedHtmlEntities($markup) {
    if (version_compare(PHP_VERSION, '5.4.0', '<')) {
      $map = (require dirname(dirname(dirname(__FILE__))) . '/resources/html_entities.php');
    }
    else {
      $map = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML5, 'UTF-8'));
      unset($map['&amp;'], $map['&lt;'], $map['&gt;']);
    }
    return strtr($markup, $map);
  }

  /**
   * Creates an XML document.
   *
   * @param string $source
   *   The string containing the XML.
   * @param int $options
   *   (optional) Bitwise OR of the libxml option constants. Defaults to 0.
   *
   * @return DOMDocument
   *   The newly created DOMDocument.
   *
   * @throws RuntimeException
   *   Thrown if there is a fatal error parsing the XML.
   */
  public static function createXmlDocument($source, $options = 0) {
    $document = self::buildDomDocument();
    $options |= LIBXML_NONET;
    $options |= defined('LIBXML_COMPACT') ? LIBXML_COMPACT : 0;
    $options |= defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0;
    if (!$document->loadXML($source, $options)) {
      throw new RuntimeException(t('There was an error parsing the XML document.'));
    }
    return $document;
  }

  /**
   * Creates an HTML document.
   *
   * @param string $source
   *   The string containing the HTML.
   * @param int $options
   *   (optional) Bitwise OR of the libxml option constants. Defaults to 0.
   *
   * @return DOMDocument
   *   The newly created DOMDocument.
   *
   * @throws RuntimeException
   *   Thrown if there is a fatal error parsing the HTML.
   */
  public static function createHtmlDocument($source, $options = 0) {

    // Prepend a meta tag so that libxml parses the source as UTF-8.
    $source = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' . "\n" . $source;
    $document = self::buildDomDocument();

    // Pass in options if available.
    if (version_compare(PHP_VERSION, '5.4.0', '>=')) {
      $options |= LIBXML_NONET;
      $options |= defined('LIBXML_COMPACT') ? LIBXML_COMPACT : 0;
      $options |= defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0;
      $success = $document->loadHTML($source, $options);
    }
    else {
      $success = $document->loadHTML($source);
    }
    if (!$success) {
      throw new RuntimeException(t('There was an error parsing the HTML document.'));
    }
    return $document;
  }

  /**
   * Builds a DOMDocument setting some default values.
   *
   * @return DOMDocument
   *   A new DOMDocument.
   */
  protected static function buildDomDocument() {
    $document = new DOMDocument();
    $document->strictErrorChecking = FALSE;
    $document->resolveExternals = FALSE;

    // Libxml specific.
    $document->substituteEntities = FALSE;
    $document->recover = TRUE;
    return $document;
  }

}
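
The effect of removeDefaultNamespaces() is easiest to see with an XPath query: an unprefixed step such as //title does not match elements that live in a default namespace, so stripping the xmlns attribute lets plain expressions work. The following is a minimal standalone sketch, not the class itself. It assumes an ASCII-only element-name pattern in place of $elementRegex, and the function names strip_default_namespaces_sketch() and count_matches_sketch() are hypothetical.

```php
<?php
// Simplified stand-in for removeDefaultNamespaces(): same pattern shape, but
// with an ASCII-only element-name class instead of $elementRegex (assumption).
function strip_default_namespaces_sketch($xml) {
  $name = '[:A-Z_a-z][:A-Z_a-z.\-0-9]*';
  return preg_replace('/(<' . $name . '[^>]*)\s+xmlns\s*=\s*("|\').*?(\2)([^>]*>)/u', '$1$4', $xml);
}

// Hypothetical helper: counts the nodes an XPath expression selects.
function count_matches_sketch($xml, $expression) {
  $document = new DOMDocument();
  $document->loadXML($xml, LIBXML_NONET);
  $xpath = new DOMXPath($document);
  return $xpath->query($expression)->length;
}

$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>';
$stripped = strip_default_namespaces_sketch($atom);

echo $stripped, "\n";
// Number of //title matches with the default namespace in place.
echo count_matches_sketch($atom, '//title'), "\n";
// Number of //title matches once the default namespace is stripped.
echo count_matches_sketch($stripped, '//title'), "\n";
```

Because the pattern requires xmlns to be followed directly by optional whitespace and an equals sign, prefixed declarations such as xmlns:dc="..." are left in place.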

Members

Name | Modifiers | Type | Description
FeedsExXmlUtility::$elementRegex | protected static | property | Matches the characters of an XML element name.
FeedsExXmlUtility::buildDomDocument | protected static | function | Builds a DOMDocument setting some default values.
FeedsExXmlUtility::createHtmlDocument | public static | function | Creates an HTML document.
FeedsExXmlUtility::createXmlDocument | public static | function | Creates an XML document.
FeedsExXmlUtility::decodeNamedHtmlEntities | public static | function | Converts named HTML entities to their UTF-8 equivalent.
FeedsExXmlUtility::removeDefaultNamespaces | public static | function | Strips the default namespaces from an XML string.
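
As an illustration of decodeNamedHtmlEntities(), the sketch below copies its PHP 5.4+ branch into a standalone function (the name decode_named_entities_sketch() is hypothetical). Named entities such as &eacute; become UTF-8 characters, while &amp;, &lt; and &gt; are left encoded, presumably so that decoding cannot introduce raw markup characters into a string that will later be parsed as XML.

```php
<?php
// Mirrors the PHP >= 5.4 branch of decodeNamedHtmlEntities() as a standalone
// function; the function name is hypothetical.
function decode_named_entities_sketch($markup) {
  // Build an entity => character map from PHP's HTML5 translation table.
  $map = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML5, 'UTF-8'));
  // Keep the XML-significant entities encoded.
  unset($map['&amp;'], $map['&lt;'], $map['&gt;']);
  return strtr($markup, $map);
}

$decoded = decode_named_entities_sketch('<p>Caf&eacute; &amp; bar</p>');
echo $decoded, "\n";
```

Numeric character references (for example &#233;) are not in the map and pass through unchanged; XML parsers already understand them.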