class Xml in Migrate Plus 8.3

Same name and namespace in other branches
  1. 8.5 src/Plugin/migrate_plus/data_parser/Xml.php \Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Xml
  2. 8.2 src/Plugin/migrate_plus/data_parser/Xml.php \Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Xml
  3. 8.4 src/Plugin/migrate_plus/data_parser/Xml.php \Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Xml

Obtain XML data for migration using the XMLReader pull parser.
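
As a rough illustration of the pull-parsing approach, the sketch below walks a document with XMLReader and tracks the current element path by depth, in the same spirit as fetchNextRow() in the class source. The function collect_items() and the sample document are hypothetical and not part of Migrate Plus; the real class additionally handles namespaces, predicates, and field selectors.

```php
<?php

/**
 * Collects the text of every element found at the given path.
 *
 * Hypothetical helper for illustration only; not part of Migrate Plus.
 *
 * @param string $xml
 *   The XML document as a string.
 * @param array $elements_to_match
 *   Element names from the root down, e.g. ['file', 'article'].
 *
 * @return string[]
 *   The trimmed text content of each matching element.
 */
function collect_items(string $xml, array $elements_to_match): array {
  $reader = new \XMLReader();
  $reader->XML($xml);
  $current_path = [];
  $items = [];
  while ($reader->read()) {
    if ($reader->nodeType == \XMLReader::ELEMENT) {
      // Record this element's name at its depth.
      $current_path[$reader->depth] = $reader->localName;
      if ($current_path == $elements_to_match) {
        // readString() returns the text content without moving the cursor.
        $items[] = trim($reader->readString());
      }
    }
    elseif ($reader->nodeType == \XMLReader::END_ELEMENT) {
      // Drop this element and anything deeper from the tracked path.
      foreach (array_keys($current_path) as $depth) {
        if ($depth >= $reader->depth) {
          unset($current_path[$depth]);
        }
      }
    }
  }
  $reader->close();
  return $items;
}

// Assumed sample input.
$xml = '<file><article>First</article><note>skip</note><article>Second</article></file>';
print_r(collect_items($xml, ['file', 'article']));
```

Only one element is materialized at a time, which is why a streaming reader suits large source files better than loading the whole document into memory.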

Plugin annotation


@DataParser(
  id = "xml",
  title = @Translation("XML")
)

Hierarchy

Expanded class hierarchy of Xml

File

src/Plugin/migrate_plus/data_parser/Xml.php, line 16

Namespace

Drupal\migrate_plus\Plugin\migrate_plus\data_parser
class Xml extends DataParserPluginBase {
  use XmlTrait;

  /**
   * The XMLReader we are encapsulating.
   *
   * @var \XMLReader
   */
  protected $reader;

  /**
   * Array of the element names from the query.
   *
   * 0-based from the first (root) element. For example, '/file/article' would
   * be stored as [0 => 'file', 1 => 'article'].
   *
   * @var array
   */
  protected $elementsToMatch = [];

  /**
   * An optional xpath predicate.
   *
   * Restricts the matching elements based on values in their children. Parsed
   * from the element query at construct time.
   *
   * @var string
   */
  protected $xpathPredicate = NULL;

  /**
   * Array representing the path to the current element as we traverse the XML.
   *
   * For example, if in an XML string like '<file><article>...</article></file>'
   * we are positioned within the article element, currentPath will be
   * [0 => 'file', 1 => 'article'].
   *
   * @var array
   */
  protected $currentPath = [];

  /**
   * Retains all elements with a given name to support extraction from parents.
   *
   * This is a hack to support field extraction of values in parents of the
   * 'context node', i.e., when $this->fieldSelectors() contains something like
   * '../nid'. Since we are using a streaming XML processor, it is too late to
   * snoop around parent elements again once we've located an element of
   * interest. So, grab elements with matching names and their depths, and
   * refer back to them when building the source row.
   *
   * @var array
   */
  protected $parentXpathCache = [];

  /**
   * Hash of the element names that should be captured into $parentXpathCache.
   *
   * @var array
   */
  protected $parentElementsOfInterest = [];

  /**
   * Element name matching mode.
   *
   * When matching element names, whether to compare to the namespace-prefixed
   * name, or the local name.
   *
   * @var bool
   */
  protected $prefixedName = FALSE;

  /**
   * {@inheritdoc}
   */
  public function __construct(array $configuration, $plugin_id, $plugin_definition) {
    parent::__construct($configuration, $plugin_id, $plugin_definition);
    $this->reader = new \XMLReader();

    // Suppress errors during parsing, so we can pick them up after.
    libxml_use_internal_errors(TRUE);

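    // Parse the element query. First capture group is the element path, second
    // (if present) is the xpath predicate. For example, '/file/article[@id=1]'
    // yields the path 'file/article' and the predicate '@id=1'.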
    preg_match_all('|^/([^\\[]+)\\[?(.*?)]?$|', $configuration['item_selector'], $matches);
    $element_path = $matches[1][0];
    $this->elementsToMatch = explode('/', $element_path);
    $predicate = $matches[2][0];
    if ($predicate) {
      $this->xpathPredicate = $predicate;
    }

    // If the element path contains any colons, it must be specifying
    // namespaces, so we need to compare using the prefixed element
    // name in next().
    if (strpos($element_path, ':') !== FALSE) {
      $this->prefixedName = TRUE;
    }
    foreach ($this->fieldSelectors() as $field_name => $xpath) {
      $prefix = substr($xpath, 0, 3);
      if ($prefix === '../') {
        $this->parentElementsOfInterest[] = str_replace('../', '', $xpath);
      }
      elseif ($prefix === '..\\') {
        $this->parentElementsOfInterest[] = str_replace('..\\', '', $xpath);
      }
    }
  }

  /**
   * Builds a \SimpleXmlElement rooted at the iterator's current location.
   *
   * The resulting SimpleXmlElement also contains any child nodes of the current
   * element.
   *
   * @return \SimpleXmlElement|false
   *   A \SimpleXmlElement when the document is parseable, or false if a
   *   parsing error occurred.
   *
   * @throws MigrateException
   */
  protected function getSimpleXml() {
    $node = $this->reader->expand();
    if ($node) {
      // We must associate the DOMNode with a DOMDocument to be able to import
      // it into SimpleXML. Despite appearances, this is almost twice as fast as
      // simplexml_load_string($this->reader->readOuterXML());
      $dom = new \DOMDocument();
      $node = $dom->importNode($node, TRUE);
      $dom->appendChild($node);
      $sxml_elem = simplexml_import_dom($node);
      $this->registerNamespaces($sxml_elem);
      return $sxml_elem;
    }
    else {
      foreach (libxml_get_errors() as $error) {
        $error_string = self::parseLibXmlError($error);
        throw new MigrateException($error_string);
      }
      return FALSE;
    }
  }

  /**
   * {@inheritdoc}
   */
  public function rewind() {

    // Reset our path tracker.
    $this->currentPath = [];
    parent::rewind();
  }

  /**
   * {@inheritdoc}
   */
  protected function openSourceUrl($url) {

    // (Re)open the provided URL.
    $this->reader->close();

    // Clear XML error buffer. Other Drupal code that executed during the
    // migration may have polluted the error buffer and could create false
    // positives in our error check below. We are only concerned with errors
    // that occur from attempting to load the XML string into an object here.
    libxml_clear_errors();
    return $this->reader->open($url, NULL, \LIBXML_NOWARNING);
  }

  /**
   * {@inheritdoc}
   */
  protected function fetchNextRow() {
    $target_element = NULL;

    // Loop over each node in the XML file, looking for elements at a path
    // matching the input query string (represented in $this->elementsToMatch).
    while ($this->reader->read()) {
      if ($this->reader->nodeType == \XMLReader::ELEMENT) {
        if ($this->prefixedName) {
          $this->currentPath[$this->reader->depth] = $this->reader->name;
          if (in_array($this->reader->name, $this->parentElementsOfInterest)) {
            $this->parentXpathCache[$this->reader->depth][$this->reader->name][] = $this->getSimpleXml();
          }
        }
        else {
          $this->currentPath[$this->reader->depth] = $this->reader->localName;
          if (in_array($this->reader->localName, $this->parentElementsOfInterest)) {
            $this->parentXpathCache[$this->reader->depth][$this->reader->name][] = $this->getSimpleXml();
          }
        }
        if ($this->currentPath == $this->elementsToMatch) {
          // We're positioned at the right element path - build the SimpleXML
          // object to enable proper xpath predicate evaluation.
          $target_element = $this->getSimpleXml();
          if ($target_element !== FALSE) {
            if (empty($this->xpathPredicate) || $this->predicateMatches($target_element)) {
              break;
            }
          }
        }
      }
      elseif ($this->reader->nodeType == \XMLReader::END_ELEMENT) {

        // Remove this element and any deeper ones from the current path.
        foreach ($this->currentPath as $depth => $name) {
          if ($depth >= $this->reader->depth) {
            unset($this->currentPath[$depth]);
          }
        }
        foreach ($this->parentXpathCache as $depth => $elements) {
          if ($depth > $this->reader->depth) {
            unset($this->parentXpathCache[$depth]);
          }
        }
      }
    }

    // If we've found the desired element, populate the currentItem and
    // currentId with its data.
    if ($target_element !== FALSE && !is_null($target_element)) {
      foreach ($this->fieldSelectors() as $field_name => $xpath) {
        $prefix = substr($xpath, 0, 3);
        if (in_array($prefix, ['../', '..\\'])) {
          $name = str_replace($prefix, '', $xpath);
          $up = substr_count($xpath, $prefix);
          $values = $this->getAncestorElements($up, $name);
        }
        else {
          $values = $target_element->xpath($xpath);
        }
        foreach ($values as $value) {

          // If the SimpleXMLElement doesn't render to a string of any sort,
          // and has children then return the whole object for the process
          // plugin or other row manipulation.
          if ($value->children() && !trim((string) $value)) {
            $this->currentItem[$field_name] = $value;
          }
          else {
            $this->currentItem[$field_name][] = (string) $value;
          }
        }
      }

      // Reduce single-value results to scalars.
      foreach ($this->currentItem as $field_name => $values) {
        if (count($values) == 1) {
          $this->currentItem[$field_name] = reset($values);
        }
      }
    }
  }

  /**
   * Tests whether the iterator's xpath predicate matches the provided element.
   *
   * This has limitations: in particular, it is easy to write predicates that
   * reference things outside this SimpleXMLElement's tree. Simpler predicates
   * should work as expected.
   *
   * @param \SimpleXMLElement $elem
   *   The element to test.
   *
   * @return bool
   *   True if the element matches the predicate, false if not.
   */
  protected function predicateMatches(\SimpleXMLElement $elem) {
    return !empty($elem->xpath('/*[' . $this->xpathPredicate . ']'));
  }

  /**
   * Gets an ancestor SimpleXMLElement, if the element name was registered.
   *
   * Gets the SimpleXMLElements with the given name found some number of levels
   * above the iterator's current position, but only for element names that
   * this Xml data parser was told to retain for future reference, that is,
   * names referenced by a field selector beginning with '../' (see
   * $parentElementsOfInterest).
   *
   * @param int $levels_up
   *   The number of levels back towards the root of the DOM tree to ascend
   *   before searching for the named element.
   * @param string $name
   *   The name of the desired element.
   *
   * @return \SimpleXMLElement[]|false
   *   The retained elements matching the level and name requirements, or false
   *   if none are present or none were retained.
   */
  public function getAncestorElements($levels_up, $name) {
    if ($levels_up > 0) {
      $levels_up *= -1;
    }
    $ancestor_depth = $this->reader->depth + $levels_up + 1;
    if ($ancestor_depth < 0) {
      return FALSE;
    }
    if (array_key_exists($ancestor_depth, $this->parentXpathCache) && array_key_exists($name, $this->parentXpathCache[$ancestor_depth])) {
      return $this->parentXpathCache[$ancestor_depth][$name];
    }
    else {
      return FALSE;
    }
  }

}
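
The way the constructor splits item_selector can be tried in isolation. The sketch below reuses the same regular expression; the function name parse_item_selector() and the returned array keys are hypothetical and exist only for this illustration. It assumes the selector starts with a single '/', since the pattern does not match otherwise.

```php
<?php

/**
 * Splits an item selector into element names and an optional predicate.
 *
 * Hypothetical helper for illustration only; not part of Migrate Plus.
 *
 * @param string $selector
 *   An item selector beginning with a single '/'.
 *
 * @return array
 *   An array with the keys 'elements', 'predicate', and 'prefixed'.
 */
function parse_item_selector(string $selector): array {
  // Same pattern as in Xml::__construct(): group 1 is the element path,
  // group 2 is the predicate between the square brackets, if any.
  preg_match_all('|^/([^\\[]+)\\[?(.*?)]?$|', $selector, $matches);
  $element_path = $matches[1][0];
  return [
    'elements' => explode('/', $element_path),
    'predicate' => $matches[2][0] ?: NULL,
    // A colon in the path indicates namespace-prefixed element names.
    'prefixed' => (bool) strpos($element_path, ':'),
  ];
}

var_export(parse_item_selector('/file/article[@type="news"]'));
var_export(parse_item_selector('/atom:feed/atom:entry'));
```

Because the pattern consumes exactly one leading slash, a selector written with '//' would leave an empty first element name that never matches the tracked path.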

Members

Name Modifiers Type Description Overrides
DataParserPluginBase::$activeUrl protected property Index of the currently-open url.
DataParserPluginBase::$currentId protected property Value of the ID for the current item when iterating.
DataParserPluginBase::$currentItem protected property Current item when iterating.
DataParserPluginBase::$dataFetcher protected property The data retrieval client.
DataParserPluginBase::$itemSelector protected property String indicating how to select an item's data from the source.
DataParserPluginBase::$urls protected property List of source urls.
DataParserPluginBase::count public function
DataParserPluginBase::create public static function
DataParserPluginBase::current public function
DataParserPluginBase::fieldSelectors protected function Return the selectors used to populate each configured field.
DataParserPluginBase::getDataFetcherPlugin public function Returns the initialized data fetcher plugin.
DataParserPluginBase::key public function
DataParserPluginBase::next public function Implementation of Iterator::next().
DataParserPluginBase::nextSource protected function Advances the data parser to the next source url.
DataParserPluginBase::valid public function
DependencySerializationTrait::$_entityStorages protected property An array of entity type IDs keyed by the property name of their storages.
DependencySerializationTrait::$_serviceIds protected property An array of service IDs keyed by property name used for serialization.
DependencySerializationTrait::__sleep public function 1
DependencySerializationTrait::__wakeup public function 2
MessengerTrait::$messenger protected property The messenger. 29
MessengerTrait::messenger public function Gets the messenger. 29
MessengerTrait::setMessenger public function Sets the messenger.
PluginBase::$configuration protected property Configuration information passed into the plugin. 1
PluginBase::$pluginDefinition protected property The plugin implementation definition. 1
PluginBase::$pluginId protected property The plugin_id.
PluginBase::DERIVATIVE_SEPARATOR constant A string which is used to separate base plugin IDs from the derivative ID.
PluginBase::getBaseId public function Gets the base_plugin_id of the plugin instance. Overrides DerivativeInspectionInterface::getBaseId
PluginBase::getDerivativeId public function Gets the derivative_id of the plugin instance. Overrides DerivativeInspectionInterface::getDerivativeId
PluginBase::getPluginDefinition public function Gets the definition of the plugin implementation. Overrides PluginInspectionInterface::getPluginDefinition 3
PluginBase::getPluginId public function Gets the plugin_id of the plugin instance. Overrides PluginInspectionInterface::getPluginId
PluginBase::isConfigurable public function Determines if the plugin is configurable.
StringTranslationTrait::$stringTranslation protected property The string translation service. 1
StringTranslationTrait::formatPlural protected function Formats a string containing a count of items.
StringTranslationTrait::getNumberOfPlurals protected function Returns the number of plurals supported by a given language.
StringTranslationTrait::getStringTranslation protected function Gets the string translation service.
StringTranslationTrait::setStringTranslation public function Sets the string translation service to use. 2
StringTranslationTrait::t protected function Translates a string to the current language or to a given language.
Xml::$currentPath protected property Array representing the path to the current element as we traverse the XML.
Xml::$elementsToMatch protected property Array of the element names from the query.
Xml::$parentElementsOfInterest protected property Hash of the element names that should be captured into $parentXpathCache.
Xml::$parentXpathCache protected property Retains all elements with a given name to support extraction from parents.
Xml::$prefixedName protected property Element name matching mode.
Xml::$reader protected property The XMLReader we are encapsulating.
Xml::$xpathPredicate protected property An optional xpath predicate.
Xml::fetchNextRow protected function Retrieves the next row of data from the open source URL, populating currentItem. Overrides DataParserPluginBase::fetchNextRow
Xml::getAncestorElements public function Gets an ancestor SimpleXMLElement, if the element name was registered.
Xml::getSimpleXml protected function Builds a \SimpleXmlElement rooted at the iterator's current location.
Xml::openSourceUrl protected function Opens the specified URL. Overrides DataParserPluginBase::openSourceUrl
Xml::predicateMatches protected function Tests whether the iterator's xpath predicate matches the provided element.
Xml::rewind public function Overrides DataParserPluginBase::rewind
Xml::__construct public function Constructs a \Drupal\Component\Plugin\PluginBase object. Overrides DataParserPluginBase::__construct
XmlTrait::parseLibXmlError public static function Parses a LibXMLError to an error message string.
XmlTrait::registerNamespaces protected function Registers the iterator's namespaces to a SimpleXMLElement.