class XPathHTMLParser in Feeds XPath Parser 8

Defines an HTML feed parser.

Plugin annotation


@Plugin(
  id = "feeds_xpathparser_html",
  title = @Translation("XPath HTML parser"),
  description = @Translation("Parse HTML files using XPath.")
)

Hierarchy

  • class \Drupal\feeds_xpathparser\ParserBase extends \Drupal\feeds\Plugin\ParserBase implements \Drupal\feeds\FeedPluginFormInterface, FormInterface
    • class \Drupal\feeds_xpathparser\Plugin\feeds\Parser\XPathHTMLParser


File

lib/Drupal/feeds_xpathparser/Plugin/feeds/Parser/XPathHTMLParser.php, line 24
Contains \Drupal\feeds_xpathparser\Plugin\feeds\Parser\XPathHTMLParser.

Namespace

Drupal\feeds_xpathparser\Plugin\feeds\Parser
class XPathHTMLParser extends ParserBase {

  /**
   * {@inheritdoc}
   */
  protected function setup(array $feed_config, FetcherResultInterface $fetcher_result) {
    if (!empty($feed_config['tidy'])) {
      $config = array(
        'merge-divs' => FALSE,
        'merge-spans' => FALSE,
        'join-styles' => FALSE,
        'drop-empty-paras' => FALSE,
        'wrap' => 0,
        'tidy-mark' => FALSE,
        'escape-cdata' => TRUE,
        'word-2000' => TRUE,
      );

      // Default tidy encoding is UTF8.
      $encoding = $feed_config['tidy_encoding'];
      $raw = tidy_repair_string(trim($fetcher_result->getRaw()), $config, $encoding);
    }
    else {
      $raw = $fetcher_result->getRaw();
    }
    $doc = new \DOMDocument();

    // Use our own error handling.
    $use = $this->errorStart();
    $success = $doc->loadHTML($raw);
    unset($raw);
    $this->errorStop($use, $feed_config['errors']);
    if (!$success) {
      throw new \RuntimeException(t('There was an error parsing the HTML document.'));
    }
    return $doc;
  }

  /**
   * {@inheritdoc}
   */
  protected function getRaw(\DOMNode $node) {
    return $this->doc->saveHTML($node);
  }

}
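The work that setup() and getRaw() do (optional Tidy repair, loading HTML with libxml warnings suppressed, and serializing a single matched node) can be sketched in plain PHP outside Drupal. This is a minimal sketch, not the module's code. The function names load_html_document() and node_raw_html() are hypothetical. The sketch approximates errorStart()/errorStop() with libxml_use_internal_errors(), and it only runs the Tidy step when the tidy extension is installed.

```php
<?php
// Hypothetical standalone sketch of what XPathHTMLParser::setup() and
// getRaw() do, written without Drupal. Function names are illustrative.

/**
 * Optionally repairs HTML with Tidy, then loads it into a DOMDocument.
 */
function load_html_document(string $raw, bool $tidy = false, string $encoding = 'utf8'): DOMDocument {
    if ($tidy && function_exists('tidy_repair_string')) {
        // Same Tidy options as setup(); needs the PHP tidy extension.
        $config = [
            'merge-divs' => false,
            'merge-spans' => false,
            'join-styles' => false,
            'drop-empty-paras' => false,
            'wrap' => 0,
            'tidy-mark' => false,
            'escape-cdata' => true,
            'word-2000' => true,
        ];
        $repaired = tidy_repair_string(trim($raw), $config, $encoding);
        if ($repaired !== false) {
            $raw = $repaired;
        }
    }

    $doc = new DOMDocument();
    // Buffer libxml warnings instead of emitting them (a stand-in for
    // errorStart()/errorStop()), then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $success = $doc->loadHTML($raw);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$success) {
        throw new RuntimeException('There was an error parsing the HTML document.');
    }
    return $doc;
}

/**
 * Returns the markup of a single node, like getRaw().
 */
function node_raw_html(DOMDocument $doc, DOMNode $node): string {
    return $doc->saveHTML($node);
}

// Usage: load a page, select nodes with XPath, and print each node's markup.
$doc = load_html_document('<ul><li class="item">One</li><li class="item">Two <b>bold</b></li></ul>');
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//li[@class="item"]') as $item) {
    echo node_raw_html($doc, $item), PHP_EOL;
}
```

DOMDocument::saveHTML() accepts an optional node argument (PHP 5.3.6 and later) and serializes only that node. This lets getRaw() return the markup of one matched element instead of the whole document.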

Members

Name | Modifiers | Type | Description | Overrides
FormInterface::getFormId | public | function | Returns a unique string identifying the form. | 236
FormInterface::submitForm | public | function | Form submission handler. | 192
ParserBase::$doc | protected | property | The DOMDocument to parse. |
ParserBase::$rawXML | protected | property | The mappings to return raw XML for. |
ParserBase::$xpath | protected | property | The DOMXPath object to use for XPath queries. |
ParserBase::baseForm | protected | function | |
ParserBase::buildForm | public | function | Form constructor. | Overrides FormInterface::buildForm
ParserBase::configDefaults | public | function | |
ParserBase::errorStart | protected | function | Starts custom error handling. |
ParserBase::errorStop | protected | function | Stops custom error handling. |
ParserBase::feedForm | public | function | |
ParserBase::feedFormValidate | public | function | Overrides parent::feedFormValidate(). |
ParserBase::filterMappings | protected | function | Filters mappings, returning the ones that belong to this parser. |
ParserBase::getMappingSources | public | function | |
ParserBase::getOwnMappings | protected | function | Gets the mappings that are defined by this parser. |
ParserBase::getUniques | protected | function | Gets the unique mapping targets that are used by this parser. |
ParserBase::kSort | protected | function | Recursively sorts an array. |
ParserBase::parse | public | function | |
ParserBase::parseSourceElement | protected | function | Parses one item from the context array. |
ParserBase::sourceDefaults | public | function | |
ParserBase::validateForm | public | function | Form validation handler. | Overrides FormInterface::validateForm
XPathHTMLParser::getRaw | protected | function | Returns the raw node value. | Overrides ParserBase::getRaw
XPathHTMLParser::setup | protected | function | Classes that use ParserBase must implement this. | Overrides ParserBase::setup
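The members table lists errorStart() and errorStop() but does not show their bodies. One plausible way to build such a pair is with libxml's internal error buffer. Before parsing, you switch buffering on. Afterwards, you collect the buffered errors, optionally report them, clear them, and restore the previous setting. This is a hypothetical sketch. The names error_start() and error_stop(), the message format, and reporting through trigger_error() are assumptions, not the module's actual implementation.

```php
<?php
// Hypothetical sketch of an errorStart()/errorStop() pair built on libxml's
// internal error buffer. Names and reporting behavior are assumptions.

/**
 * Clears stale libxml errors and starts buffering new ones.
 *
 * @return bool The previous libxml_use_internal_errors() setting.
 */
function error_start(): bool {
    libxml_clear_errors();
    return libxml_use_internal_errors(true);
}

/**
 * Collects buffered libxml errors and restores the previous setting.
 *
 * @param bool $previous Value returned by error_start().
 * @param bool $show_errors Whether to report each message as a PHP warning.
 *
 * @return string[] One formatted message per buffered error.
 */
function error_stop(bool $previous, bool $show_errors): array {
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('Line %d: %s', $error->line, trim($error->message));
    }
    if ($show_errors) {
        foreach ($messages as $message) {
            trigger_error($message, E_USER_WARNING);
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Usage: wrap a parse call the same way setup() wraps loadHTML().
$use = error_start();
$doc = new DOMDocument();
$doc->loadXML('<a><b></a>'); // Malformed on purpose; errors are buffered.
$messages = error_stop($use, false);
```

The demo calls loadXML() on deliberately malformed XML only because that reliably produces buffered errors. The same wrapping applies to loadHTML().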