protected function XPathHTMLParser::setup in Feeds XPath Parser 8

Classes that use ParserBase must implement this.

Parameters

array $feed_config: The configuration for the source.

\Drupal\feeds\FetcherResultInterface $fetcher_result: A FetcherResultInterface object.

Return value

\DOMDocument The DOMDocument to perform XPath queries on.
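As an illustration of what "perform XPath queries on" can look like, here is a minimal sketch using PHP's built-in DOMXPath class. The sample markup, the `item` class name, and the `$titles` variable are hypothetical and only stand in for whatever document setup() actually returns; this is not the parser's own query code.

```php
<?php
// Hypothetical stand-in for the \DOMDocument returned by setup().
$doc = new \DOMDocument();
$doc->loadHTML('<html><body><ul><li class="item">One</li><li class="item">Two</li></ul></body></html>');

// Wrap the document in a DOMXPath object and run a query against it.
$xpath = new \DOMXPath($doc);
$titles = [];
foreach ($xpath->query('//li[@class="item"]') as $node) {
  // textContent is the concatenated text of the node and its descendants.
  $titles[] = trim($node->textContent);
}
```

Each result of `query()` is a DOM node, so further relative queries can be run by passing the node as the second argument to `query()`.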

Overrides ParserBase::setup

File

lib/Drupal/feeds_xpathparser/Plugin/feeds/Parser/XPathHTMLParser.php, line 29
Contains \Drupal\feeds_xpathparser\Plugin\feeds\Parser\XPathHTMLParser.

Class

XPathHTMLParser
Defines an HTML feed parser.

Namespace

Drupal\feeds_xpathparser\Plugin\feeds\Parser

Code

protected function setup(array $feed_config, FetcherResultInterface $fetcher_result) {
  if (!empty($feed_config['tidy'])) {
    $config = array(
      'merge-divs' => FALSE,
      'merge-spans' => FALSE,
      'join-styles' => FALSE,
      'drop-empty-paras' => FALSE,
      'wrap' => 0,
      'tidy-mark' => FALSE,
      'escape-cdata' => TRUE,
      'word-2000' => TRUE,
    );

    // Default tidy encoding is UTF8.
    $encoding = $feed_config['tidy_encoding'];
    $raw = tidy_repair_string(trim($fetcher_result->getRaw()), $config, $encoding);
  }
  else {
    $raw = $fetcher_result->getRaw();
  }
  $doc = new \DOMDocument();

  // Use our own error handling.
  $use = $this->errorStart();
  $success = $doc->loadHTML($raw);
  unset($raw);
  $this->errorStop($use, $feed_config['errors']);
  if (!$success) {
    throw new \RuntimeException(t('There was an error parsing the HTML document.'));
  }
  return $doc;
}
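
The "use our own error handling" step above can be sketched outside of Drupal with PHP's libxml functions. This is a minimal sketch under assumptions: the helper name `load_html_quietly()` is hypothetical, and it only approximates what errorStart() and errorStop() might do, since their implementation in ParserBase is not shown on this page.

```php
<?php
/**
 * Loads HTML into a DOMDocument while collecting libxml errors.
 *
 * Hypothetical helper, not part of Feeds XPath Parser.
 *
 * @return array
 *   The \DOMDocument and the list of \LibXMLError objects, in that order.
 */
function load_html_quietly(string $raw): array {
  $doc = new \DOMDocument();

  // Collect parse errors internally instead of emitting PHP warnings,
  // remembering the previous setting so it can be restored afterwards.
  $previous = libxml_use_internal_errors(TRUE);
  $success = $doc->loadHTML($raw);

  // Copy the collected errors, then clear the buffer and restore the setting.
  $errors = libxml_get_errors();
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  if (!$success) {
    throw new \RuntimeException('There was an error parsing the HTML document.');
  }
  return [$doc, $errors];
}

// Malformed markup still loads; any problems are reported in $errors.
[$doc, $errors] = load_html_quietly('<p>Hello <b>world</p>');
$paragraphs = $doc->getElementsByTagName('p')->length;
```

Restoring the previous libxml setting matters when other code in the same request relies on the default warning behaviour.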