taxonomy_xml.module in Taxonomy import/export via XML 6.2

This module makes it possible to import and export taxonomies as XML documents.

<?php

/**
 * @file
 *  This module makes it possible to import and export taxonomies as XML
 * documents.
 */

/**
 * 2010 updates for remote services and RDF syntax.
 * 2009 updates for taxonconcepts and RDF.
 * 2008 rewrite
 * Dan Morrison http://coders.co.nz
 *
 * Copyright (c) 2007  Nicolas Haase <nicolas.haase@team.ourbrisbane.com>
 * Copyright (c) 2006  Sami Khan <sami@etopian.com>
 * Copyright (c) 2005  Sheldon Rampton <sheldon@prwatch.org>
 *
 * This program is free software; you can redistribute it and/or modify it under
 * the terms of the GNU General Public License as published by the Free Software
 * Foundation; either version 2 of the License.
 */

/**
 * Canonical Predicates
 *
 * Following is a list of the canonical keywords used in the expected source
 * files to represent parent-child relationships and other attributes. These
 * words are based on one selected government vocabulary; many others are
 * likely.
 *
 * A hash table of SYNONYMS for these terms is included in this module for
 * maximum compatibility with other syntaxes found in the wild.
 * @see taxonomy_xml_relationship_synonyms()
 *
 * Refer also to ISO2788 for a description of the use of these structural
 * relations
 * "http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html#sect-thesauri"
 *
 *
 *
 * An example Thesaurus source file may be the text version of
 *
 * Thesaurus for Graphic Materials I: Subject Terms [Library of Congress]
 *
 * "http://www.loc.gov/rr/print/tgm1/downloadtgm1.html"
 */
define('TAXONOMY_XML_RELATED', 'Related Terms');
define('TAXONOMY_XML_PARENT', 'Broader Terms');
define('TAXONOMY_XML_CHILD', 'Narrower Terms');
define('TAXONOMY_XML_HAS_SYNONYM', 'Used for');
define('TAXONOMY_XML_SYNONYM_OF', 'Use');
define('TAXONOMY_XML_DESCRIPTION', 'Definition');
define('TAXONOMY_XML_IN_VOCABULARY', 'Part of');
define('TAXONOMY_XML_NAME', 'name');
define('TAXONOMY_XML_UNUSED', 'unused');

// Data coming from CSV may have a header row with a blank predicate
// That row is not a triple, defining a NULL value allows us to ignore it quietly
define('TAXONOMY_XML_NULL', '');

// Other is data that we want but doesn't map to Drupal
// It'll get stored by rdf.module if possible
define('TAXONOMY_XML_OTHER_PREDICATE', 'other_rdf');

/**
 * The batch process is tricky.
 * When we import a document, it may have any number of external references that
 * need retrieving and resolving.
 * As they are found, they are added to a queue.
 * As the queue is processed, it in turn adds more references that branch off
 * the process, probably growing exponentially for a few rounds.
 * To try and avoid the worst of the scaling problems, we will limit batch sizes
 * to manageable numbers.
 * It is impossible to estimate just how many references there are or how deep
 * the spidering process will go - we just have to aim and go.
 *
 * If you have a chunky server, good memory and a large timeout, it's probably
 * fine to push this up to 200 or more.
 */
define('TAXONOMY_XML_MAX_BATCH_SIZE', 50);

/**
 * Used to define some flags for the import options
 */
define('TAXONOMY_XML_DETERMINED_BY_SOURCE_FILE', 0);
define('TAXONOMY_XML_CREATE_NEW', -1);

/**
 * Menu root for our items
 */
define('TAXONOMY_XML_ADMIN', 'admin/content/taxonomy');

/**
 * As we allow upload, we should limit it to expected types
 */
define('TAXONOMY_XML_ALLOWED_UPLOAD_EXTENSIONS', 'txt csv xml rdf');

/**
 * Implementation of hook_help().
 */
function taxonomy_xml_help($path, $arg) {
  $doc_path = drupal_get_path('module', 'taxonomy_xml') . '/help';
  switch ($path) {
    case 'admin/modules#description':
      return t('Makes it possible to import and export taxonomy terms via XML.');
    case TAXONOMY_XML_ADMIN . '/import':
      return t("\n        You can upload or import a vocabulary and/or taxonomy terms\n        from a properly-formatted input document or web service.\n      ") . theme("more_help_link", url('admin/help/taxonomy_xml'));
    case TAXONOMY_XML_ADMIN . '/export':
      return t("\n        You can export XML documents for each vocabulary and its terms in\n        this website's taxonomies.\n        Choose the vocabulary from the list below.\n        See more about !taxonomy_formats in the module docs.\n        ", array(
        '!taxonomy_formats' => l(t("taxonomy formats"), "admin/help/taxonomy_xml"),
      ));
    case 'admin/help#taxonomy_xml':
      return t(file_get_contents($doc_path . '/help.html'), array(
        '!downloads' => url(TAXONOMY_XML_ADMIN . "/export"),
        '!upload' => url(TAXONOMY_XML_ADMIN . "/import"),
        '!formats' => url("{$doc_path}/formats.html"),
        '!services' => url(TAXONOMY_XML_ADMIN . "/import/services"),
        '!rdf' => url("{$doc_path}/rdf.html"),
      ));
    case TAXONOMY_XML_ADMIN . '/import/services':
      return file_get_contents("{$doc_path}/services.html");
  }
}

/**
 * Implementation of hook_menu: Define menu links.
 *
 * @note See hook_menu for a description of return values.
 */
function taxonomy_xml_menu() {
  if (!module_exists('taxonomy')) {
    return;
  }
  $items = array();
  $items[TAXONOMY_XML_ADMIN . '/export'] = array(
    'title' => t('Export'),
    'access arguments' => array(
      'export taxonomy',
    ),
    'page callback' => 'taxonomy_xml_export',
    'type' => MENU_LOCAL_TASK,
  );
  $items[TAXONOMY_XML_ADMIN . '/import'] = array(
    'title' => t('Import'),
    'access arguments' => array(
      'administer taxonomy',
    ),
    'page callback' => 'taxonomy_xml_import',
    'type' => MENU_LOCAL_TASK,
  );
  $items[TAXONOMY_XML_ADMIN . '_xml/flush'] = array(
    'title' => t('Delete cache file'),
    'access arguments' => array(
      'administer taxonomy',
    ),
    'page callback' => 'taxonomy_xml_flush_cache_file',
    'type' => MENU_CALLBACK,
  );
  $items['taxonomy_xml'] = array(
    'title' => t('Taxonomy XML'),
    'access arguments' => array(
      'access content',
    ),
    'page callback' => 'taxonomy_xml_file',
    'type' => MENU_CALLBACK,
  );
  $items[TAXONOMY_XML_ADMIN . '/import/services'] = array(
    'title' => t('About taxonomy_import services'),
    'access arguments' => array(
      'administer taxonomy',
    ),
    'page callback' => 'taxonomy_xml_about_services',
    'type' => MENU_LOCAL_TASK,
  );
  if (module_exists('rdf')) {

    // Export individual taxonomy term information
    // This approach is closer to how rdf.module expects to do it
    $items['taxonomy/term/%/rdf'] = array(
      'title' => 'RDF',
      'type' => MENU_CALLBACK,
      'access arguments' => array(
        'access content',
      ),
      'page callback' => 'taxonomy_xml_rdf_export_term',
      'page arguments' => array(
        2,
      ),
      'file' => 'taxonomy_xml_rdf.inc',
    );

    // Not a real URL, but it works as a placeholder
    $items['taxonomy/vocabulary/%/rdf'] = array(
      'title' => 'RDF',
      'type' => MENU_CALLBACK,
      'access arguments' => array(
        'access content',
      ),
      'page callback' => 'taxonomy_xml_rdf_export_vocabulary',
      'page arguments' => array(
        2,
      ),
      'file' => 'taxonomy_xml_rdf.inc',
    );
  }
  return $items;
}

/**
 * Implementation of hook_perm().
 */
function taxonomy_xml_perm() {
  return array(
    'export taxonomy',
  );
}

/**
 * Page callback. Displays an unordered list of all available vocabularies for
 * export
 *
 * @return
 *   An unordered HTML list
 */
function taxonomy_xml_export() {

  // return the list of vocabularies
  $output = '';
  $vocabularies = taxonomy_get_vocabularies();
  if (empty($vocabularies)) {
    $output .= t('There are no vocabularies present');
  }
  else {
    foreach ($vocabularies as $vocabulary) {
      $vocablist[$vocabulary->vid] = $vocabulary->name;
      // %d is the integer placeholder for db_query(); %n is not a valid one.
      $vocabcount = db_result(db_query("SELECT count(*) FROM {term_data} WHERE vid=%d", $vocabulary->vid));
      $vocablist[$vocabulary->vid] .= t(' (%vocabcount terms) ', array(
        '%vocabcount' => $vocabcount,
      ));
      $vocablist[$vocabulary->vid] .= ' ' . l('XML', "taxonomy_xml/{$vocabulary->vid}", array(
        'attributes' => array(
          'title' => "This format is Drupal-only. It closely matches the internal data structure, but is not portable outside of Drupal without work.",
        ),
      ));
      $vocablist[$vocabulary->vid] .= ' ' . l("RDF", "taxonomy_xml/{$vocabulary->vid}/rdf", array(
        'attributes' => array(
          'title' => "RDF is recommended for portability with external databases, although it is verbose and sometimes unreadable to humans.",
        ),
      ));
      $vocablist[$vocabulary->vid] .= ' ' . l("TCS", "taxonomy_xml/{$vocabulary->vid}/tcs", array(
        'attributes' => array(
          'title' => "The Taxon Concept Schema is used in Life Sciences to notate biological families of living things.",
        ),
      ));
    }
    // Go through theme() so that themes can override the list markup.
    $output = theme('item_list', $vocablist);
  }
  return $output;
}

/**
 * Return a flat file representation of the requested vocab
 *
 * Default format is the original custom Drupal XML file.
 */
function taxonomy_xml_file($vid, $format = 'xml') {

  // Retrieving Vocabulary name
  $vocabulary = taxonomy_vocabulary_load($vid);
  $vname = drupal_strtolower(str_replace(' ', '_', trim($vocabulary->name)));
  unset($vocabulary);

  // Load the appropriate library, guess at the format name file
  module_load_include('inc', 'taxonomy_xml', $format . '_format');
  taxonomy_xml_include_module_hooks();
  $create_funcname = "taxonomy_xml_{$format}_create";
  $file = $create_funcname($vid);
  if (!empty($_SERVER['HTTP_USER_AGENT']) && (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE 5.5') || strpos($_SERVER['HTTP_USER_AGENT'], 'Opera'))) {
    header('Content-Type: application/dummy');
  }
  else {
    header('Content-Type: text/xml; charset=UTF-8');
  }
  if (headers_sent()) {
    echo 'Some data has already been output to browser, can\'t send file';
  }
  // Content-Length is measured in bytes, so use strlen(), not the
  // character-counting drupal_strlen().
  header('Content-Length: ' . strlen($file));
  header("Content-Disposition: attachment; filename=taxonomy_{$vname}.{$format}.xml");
  echo $file;
}

/**
 * Return a representation of the requested term
 *
 * @todo UNFINISHED, currently a stub copy from full vocab export
 */
function taxonomy_xml_export_term($tid, $format = 'rdf') {
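  // Accept either a numeric term ID or an already-loaded term object.
  $term = is_numeric($tid) ? taxonomy_term_load($tid) : $tid;
  // Build a filename-safe label from the term name, as the vocabulary export
  // does, so that $vname is defined when the download headers are sent.
  $vname = drupal_strtolower(str_replace(' ', '_', trim($term->name)));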
  module_load_include('inc', 'taxonomy_xml', $format . '_format');
  taxonomy_xml_include_module_hooks();
  $create_funcname = "taxonomy_xml_{$format}_create_term";
  $file = $create_funcname($tid);
  if (!empty($_SERVER['HTTP_USER_AGENT']) && (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE 5.5') || strpos($_SERVER['HTTP_USER_AGENT'], 'Opera'))) {
    header('Content-Type: application/dummy');
  }
  else {
    header('Content-Type: text/xml; charset=UTF-8');
  }
  if (headers_sent()) {
    echo 'Some data has already been output to browser, can\'t send file';
  }
  // Content-Length is measured in bytes, so use strlen(), not the
  // character-counting drupal_strlen().
  header('Content-Length: ' . strlen($file));
  header("Content-Disposition: attachment; filename=taxonomy_{$vname}.{$format}.xml");
  echo $file;
}

/**
 * Menu callback for the import page.
 */
function taxonomy_xml_import() {
  if (module_exists('pathauto')) {
    $pathauto_taxonomy_pattern = variable_get("pathauto_taxonomy_pattern", '');
    if (!empty($pathauto_taxonomy_pattern)) {
      drupal_set_message(t('
        Pathauto has been profiled to slow down the import process by over 50%.
        If you are doing large updates, it really needs to be turned off during that process.
        This module will now <em>unset</em> the "Default path pattern" so that it has a hope of doing its job.
        See <a href="!pathauto_settings">Pathauto settings</a> to review your settings.', array(
        '!pathauto_settings' => url('admin/build/path/patterns'),
      )), 'warning');
      variable_set("pathauto_taxonomy_pattern", '');
    }
  }
  return drupal_get_form('taxonomy_xml_import_form');
}

/**
 * Builds the import form.
 *
 * Form contains a selector to choose the import method used (upload, URL, Web
 * Service). This selector should reveal or hide the appropriate secondary
 * parameters. Uses JS and a bit of CSS to show/hide. With no JS, all options
 * are shown but only the chosen one is used.
 *
 * @see taxonomy_xml_import_form_submit()
 */
function taxonomy_xml_import_form($form_state) {
  drupal_add_js(drupal_get_path('module', 'taxonomy_xml') . '/taxonomy_xml.js');
  drupal_add_css(drupal_get_path('module', 'taxonomy_xml') . '/taxonomy_xml.css');
  $formats = taxonomy_xml_formats();
  $vocs[TAXONOMY_XML_DETERMINED_BY_SOURCE_FILE] = t('[Determined by source file]');
  foreach (taxonomy_get_vocabularies() as $vid => $voc) {
    $vocs[$vid] = $voc->name;
  }
  $vocs[TAXONOMY_XML_CREATE_NEW] = t('[Create new]');
  $form['vid'] = array(
    '#type' => 'select',
    '#title' => t('Target vocabulary'),
    '#default_value' => variable_get('taxonomy_xml_vid', TAXONOMY_XML_CREATE_NEW),
    '#options' => $vocs,
    '#description' => t('
      The vocabulary into which terms should be loaded.
      If you choose a pre-existing vocabulary,
      existing vocabulary settings (tags, node types etc) will NOT be modified.
      If it is to be created new, they <em>may</em> be retained.
      Internal vocabulary ID "vid" cannot be imported.
    '),
  );
  $form['data_source'] = array(
    '#type' => 'fieldset',
    #'#title' => t('Data Source'),
    '#id' => 'data_source',
    '#attributes' => array(
      'id' => 'data_source',
    ),
  );
  $form['data_source']['protocol'] = array(
    '#type' => 'select',
    '#title' => t('Data Source'),
    '#options' => array(
      'none' => t('CHOOSE'),
      'upload-file' => t('Upload File'),
      'url' => t('Web URL'),
      'filepath' => t('Local File'),
      'service' => t('Web Service'),
    ),
    '#attributes' => array(
      'id' => 'protocol',
    ),
    '#default_value' => variable_get('taxonomy_xml_protocol', 'none'),
  );
  $form['data_source']['upload_file'] = array(
    '#type' => 'file',
    '#title' => t('File to import'),
  );
  global $user;
  if ($user->uid != 1) {
    $form['data_source']['upload_file']['#description'] = t('Only [%allowed_extensions] files are allowed.', array(
      '%allowed_extensions' => TAXONOMY_XML_ALLOWED_UPLOAD_EXTENSIONS,
    ));
  }
  $form['data_source']['url'] = array(
    '#type' => 'textfield',
    '#size' => 128,
    '#title' => t('URL to import from'),
    '#description' => t('
      Enter the URL of a file or web service containing a vocabulary definition.
      <br/>eg <em>http://www.w3.org/TR/2003/PR-owl-guide-20031215/food</em>
      <br/>The URL will be fetched requesting a content-type rdf/xml if available,
      this means sometimes you can enter URLs that look like web pages
      but we will actually retrieve the raw data.
      (On well-behaved sources)
    '),
    '#default_value' => variable_get('taxonomy_xml_url', ''),
  );
  $form['data_source']['filepath'] = array(
    '#type' => 'textfield',
    '#size' => 128,
    '#title' => t('Local filepath'),
    '#description' => t('
      Enter the path of a file containing an appropriately formatted vocabulary.
      The path can either be relative to the files directory or to Drupal root.
      <br>Eg: <pre>!example</pre>
    ', array(
      '!example' => drupal_get_path('module', 'taxonomy_xml') . '/samples/sample-Dewey_System-100_terms.csv',
    )),
    '#default_value' => variable_get('taxonomy_xml_filepath', ''),
  );
  $available_services = taxonomy_xml_lookup_services('lookup', 'options');
  $form['data_source']['service'] = array(
    '#type' => 'fieldset',
    '#attributes' => array(
      'id' => 'edit-service-wrapper',
    ),
    'service_id' => array(
      '#type' => 'select',
      '#title' => t('Taxonomy Server'),
      '#description' => t('
        Choose one of the available taxonomy server services.
        These preset services are defined in the taxonomy_xml module
        and may be extended by other contributed modules.
      '),
      '#default_value' => variable_get('taxonomy_xml_service_id', ''),
      '#options' => $available_services,
    ),
    'identifier' => array(
      '#type' => 'textfield',
      '#title' => t('Unique Identifier for this service'),
      '#description' => t('Optional ID, GUID, LSID, URI or other UID as (if) required by this service.'),
      '#default_value' => variable_get('taxonomy_xml_identifier', ''),
    ),
    'information' => array(
      '#type' => 'markup',
      '#value' => l('More info about remote taxonomy services', 'admin/content/taxonomy/import/services'),
    ),
  );
  $form['data_source']['format'] = array(
    '#type' => 'select',
    '#title' => t('Format of file'),
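    // Option keys from taxonomy_xml_formats() have the '_format' suffix
    // stripped, so the fallback must be 'xml' to match an available option.
    '#default_value' => variable_get('taxonomy_xml_format', 'xml'),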
    '#options' => $formats,
  );
  $form['recurse_down'] = array(
    '#type' => 'checkbox',
    '#title' => t('Recurse down the taxonomy tree'),
    '#description' => t('
      Some taxonomy sources return references to further external URL
      sources (child terms).
      Tick this if those references are to be followed.
      <br/>The recursion may get intensive, although the tasks will be "batched".
      <br/>Note: You will <b>need</b> taxonomy_guid, taxonomy_enhancer, rdf,
      or something similar to be recording the external GUIDs,
      or relationships (hierarchy) cannot be maintained over batches.
    '),
    '#default_value' => variable_get('taxonomy_xml_recurse_down', TRUE),
  );

  // Add this as a requirement!
  if (!module_exists('taxonomy_guid')) {
    $form['recurse_down']['#default_value'] = FALSE;
    $form['recurse_down']['#disabled'] = TRUE;
    $form['recurse_down']['#description'] = '<p><strong>' . t('Enable taxonomy_guid before trying recursion.') . '</strong></p>' . $form['recurse_down']['#description'];
  }
  $form['advanced'] = array(
    '#type' => 'fieldset',
    '#title' => t('Advanced'),
    '#collapsible' => TRUE,
    '#collapsed' => TRUE,
  );
  $form['advanced']['duplicate'] = array(
    '#type' => 'checkbox',
    '#title' => t('Allow duplicate terms'),
    '#description' => t('If you want to keep the same term in different positions in the vocabulary hierarchy, check this'),
    '#default_value' => variable_get('taxonomy_xml_duplicate', FALSE),
  );
  $form['advanced']['reuseids'] = array(
    '#type' => 'checkbox',
    '#title' => t('Re-use IDs'),
    '#description' => t('If the source data includes numeric IDs, try to use them as Drupal term IDs. This may have mixed results on sites that are not directly synched.'),
    '#default_value' => variable_get('taxonomy_xml_reuseids', FALSE),
  );
  $form['advanced']['taxonomy_xml_watchdog_level'] = array(
    '#type' => 'select',
    '#title' => t('Debug Level'),
    '#description' => t('To assist development, taxonomy_xml has the ability to display parsing messages as they are logged.'),
    '#options' => taxonomy_xml_watchdog_levels(),
    '#default_value' => variable_get('taxonomy_xml_watchdog_level', WATCHDOG_NOTICE),
  );
  $form['advanced']['flush_cache_description'] = array(
    '#type' => 'markup',
    '#prefix' => '<p>',
    '#suffix' => '</p>',
    '#value' => t('When retrieving remote data, a local cache is kept of successful file downloads. These are not expected to change, but may get in the way during testing. Flush the cache to delete them <em>all</em>.'),
  );
  $form['advanced']['flush_cache'] = array(
    '#type' => 'submit',
    '#value' => t('Flush Cache'),
    '#submit' => array(
      'taxonomy_xml_flush_file_cache',
    ),
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Import'),
  );
  $form['#attributes'] = array(
    'enctype' => 'multipart/form-data',
  );
  return $form;
}

/**
 * Imports the actual XML.
 */
function taxonomy_xml_import_form_submit($form, &$form_state) {

  #if ( function_exists('xdebug_start_trace')) { xdebug_start_trace(); }

  // This should not be set unless we are actually interrogating a service. Ensure it gets wiped.
  variable_del('taxonomy_xml_service_id');

  // Remember current prefs, just for convenience
  // Speaking of convenience, why does my form discard these values all the time?
  variable_set('taxonomy_xml_format', $form_state['values']['format']);
  variable_set('taxonomy_xml_vid', $form_state['values']['vid']);
  variable_set('taxonomy_xml_duplicate', $form_state['values']['duplicate']);
  variable_set('taxonomy_xml_reuseids', $form_state['values']['reuseids']);
  variable_set('taxonomy_xml_protocol', $form_state['values']['protocol']);
  variable_set('taxonomy_xml_identifier', $form_state['values']['identifier']);
  variable_set('taxonomy_xml_url', $form_state['values']['url']);
  variable_set('taxonomy_xml_filepath', $form_state['values']['filepath']);
  variable_set('taxonomy_xml_recurse_down', $form_state['values']['recurse_down']);
  variable_set('taxonomy_xml_watchdog_level', $form_state['values']['taxonomy_xml_watchdog_level']);
  taxonomy_xml_fetch_and_import($form_state['values']);
}

/**
 * Fetches the data according to the given method, then invokes the import on
 * that data.
 *
 * @param $form_values - as submitted from the import form, but could also be an
 * array set programmatically (by features or install hooks)
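 *
 * A minimal sketch of a programmatic call, assuming the sample CSV file that
 * ships with this module and a 'csv' format library are available. The values
 * below are illustrative assumptions, not a required configuration:
 * @code
 * taxonomy_xml_fetch_and_import(array(
 *   // Read from a local file rather than an upload, URL or web service.
 *   'protocol' => 'filepath',
 *   'filepath' => drupal_get_path('module', 'taxonomy_xml') . '/samples/sample-Dewey_System-100_terms.csv',
 *   // Format key without the '_format' suffix.
 *   'format' => 'csv',
 *   // Let the import create a new vocabulary named after the file.
 *   'vid' => TAXONOMY_XML_CREATE_NEW,
 * ));
 * @endcode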
 */
function taxonomy_xml_fetch_and_import($form_values) {

  // Allow either upload, web service or URL sources
  //
  switch ($form_values['protocol']) {
    case 'upload-file':

      // Retrieve submitted data and parse it
      global $user;

      // Try to limit bad uploads (limitation does not apply to uid1)
      $validators = array(
        'file_validate_extensions' => array(
          TAXONOMY_XML_ALLOWED_UPLOAD_EXTENSIONS,
        ),
      );
      if ($file = file_save_upload('upload_file', $validators)) {
        $fd = fopen($file->filepath, "rb");
        if (!$fd) {
          drupal_set_message(t('Vocabulary import failed: file %filename cannot be read.', array(
            '%filename' => $file->filename,
          )), 'error');
        }
        else {
          $info = fstat($fd);
          $len = $info["size"];
          $text = fread($fd, $len);
          fclose($fd);
          drupal_set_message(t('Loaded file %filename. Now processing it.', array(
            '%filename' => $file->filename,
          )));
          $form_values['file'] = $file;
          taxonomy_xml_invoke_import($text, $form_values);
        }
      }
      else {
        drupal_set_message(t('Vocabulary import failed: file was not uploaded.'), 'error');
      }
      break;
    case 'url':

      // Retrieve remote URL and parse it
      $url = $form_values['url'];
      taxonomy_xml_invoke_import_on_url($url, $form_values);
      break;
    case 'filepath':
      $filepath = $form_values['filepath'];
      taxonomy_xml_invoke_import_on_filepath($filepath, $form_values);
      break;
    case 'service':
      variable_set('taxonomy_xml_service_id', $form_values['service_id']);

      // Invoke service and parse response
      $services = taxonomy_xml_lookup_services(NULL, 'full');
      $service = $services[$form_values['service_id']];
      taxonomy_xml_invoke_service_request($service, $form_values);
      break;
  }

  // The initial import may have queued up some further process to do.
  // Check the queue and run it when this form goes.
  batch_set(taxonomy_xml_add_term_to_batch_queue());
}

/**
 * Do the actual importing from the given string, based on the parameters passed
 * from the form.
 *
 * @return
 *   TRUE on success, FALSE on failure. Writes a summary to the screen.
 *
 * @see taxonomy_xml_formats()
 * @see taxonomy_xml_HOOK_parse()
 */
function taxonomy_xml_invoke_import($text, $form_values, $url = NULL) {
  $vid = $form_values['vid'];
  if ($vid == TAXONOMY_XML_CREATE_NEW) {

    // Requested to create new vocab.
    $newname = !empty($form_values['file']) ? basename($form_values['file']->filename) : basename($url);
    $vocabulary = _taxonomy_xml_get_vocabulary_placeholder($newname);
    $vid = $vocabulary->vid;
    variable_set('taxonomy_xml_vid', $vid);
  }

  // Sitemenu implodes when I try to do bulk updates? Or is it pathauto
  // @todo figure this out
  // module_disable(array('sitemenu','pathauto'));
  // Conditionally include and invoke the appropriate format library
  $format = $form_values['format'];
  module_load_include('inc', 'taxonomy_xml', $format . '_format');
  taxonomy_xml_include_module_hooks();
  $funcname = "taxonomy_xml_{$format}_parse";
  if (function_exists($funcname)) {

    // All the action is here:
    $modified_terms = $funcname($text, $vid, $url);

    // Func may have modified vocab or vid during its import. reload (just for these messages).
    $vocabulary = taxonomy_vocabulary_load($vid);
    if (empty($vocabulary)) {
      drupal_set_message("Failed to create or update vocabulary. Invalid ID", 'error');
      return FALSE;
    }
    if (!empty($modified_terms)) {
      if (is_array($modified_terms)) {
        $term_list = array();
        foreach ($modified_terms as $list_term) {
          $term_list[] = l($list_term->name, 'admin/content/taxonomy/edit/term/' . $list_term->tid);
        }
        drupal_set_message(t('Updated %count term(s)', array(
          '%count' => count($modified_terms),
        )) . ' <i>' . implode(', ', $term_list) . '.</i> ');
        drupal_set_message(t("\n          Imported vocabulary %vocab_name.\n          You may now need to <a href='!settings_link'>Review the vocabulary settings</a>\n          or <a href='!list_link'>List the terms</a>", array(
          '%vocab_name' => $vocabulary->name,
          '!settings_link' => url('admin/content/taxonomy/edit/vocabulary/' . $vid),
          '!list_link' => url('admin/content/taxonomy/' . $vid),
        )));
      }
      else {

        // returned something that was not an array.
        // maybe it was just 'OK'
      }
      return TRUE;
    }
    else {
      drupal_set_message(t('Failed to import any new terms. This may be due to syntax or formatting errors in the import file.'), 'error');
      return FALSE;
    }
  }
  else {
    drupal_set_message("Unavailable format. {$funcname} was not found in formatting library {$format}_format .", 'error');
    return FALSE;
  }
}

/**
 * Load a vocabulary from the given URL and import it
 */
function taxonomy_xml_invoke_import_on_url($url, $form_values) {
  if ($url) {
    $text = taxonomy_xml_cached_get_contents($url);
    if (!empty($text)) {
      drupal_set_message(t('Retrieved Submitted URL %url. Now starting an import process.', array(
        '%url' => $url,
      )));
      return taxonomy_xml_invoke_import($text, $form_values, $url);
    }
    else {
      drupal_set_message(t('<strong>Taxonomy XML</strong>: Failed to retrieve content from <a href="!url">!url</a>. Check this URL and access to it. This will not work on sites requiring authentication', array(
        '!url' => $url,
      )), 'error');
    }
  }
  else {
    drupal_set_message(t('<strong>Taxonomy XML</strong>: No URL. A valid, readable URL required.'), 'error');
  }
  return FALSE;
}

/**
 * Load a vocabulary from the given local filepath and import it
 *
 */
function taxonomy_xml_invoke_import_on_filepath($filepath, $form_values) {

  // Retrieve local file and parse it
  if ($filepath) {
    $text = file_get_contents($filepath);
    if (!empty($text)) {
      drupal_set_message(t('Retrieved file %filepath. Now starting a %format import process.', array(
        '%filepath' => $filepath,
        '%format' => $form_values['format'],
      )));
      return taxonomy_xml_invoke_import($text, $form_values, $filepath);
    }
    else {
      drupal_set_message(t('<strong>Taxonomy XML</strong>: Failed to retrieve content from %filepath. Check this file exists and is readable', array(
        '%filepath' => $filepath,
      )), 'error');
    }
  }
  else {
    drupal_set_message(t('<strong>Taxonomy XML</strong>: No filepath. A valid, readable file path is required.'), 'error');
  }
  return FALSE;
}

/**
 * Return a list of available file formats.
 *
 * Scan the module directory for appropriate inc files.
 * More can be added as appropriate.
 *
 * A taxonomy_xml *_format.inc file should provide an implementation of the
 * hooks taxonomy_xml_FORMAT_parse() and/or taxonomy_xml_FORMAT_create() to
 * support reading or writing respectively.
 *
 * It may also check for further dependencies (ARC) as needed.
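 *
 * A minimal sketch of what such a library might contain, assuming a
 * hypothetical example_format.inc (the 'example' format name is made up for
 * illustration). The argument lists mirror how this module calls the hooks:
 * @code
 * // Called by taxonomy_xml_invoke_import() as $funcname($text, $vid, $url).
 * function taxonomy_xml_example_parse($text, $vid, $url = NULL) {
 *   // Parse $text, save terms into vocabulary $vid, and return the
 *   // modified term objects. An empty result is reported as a failure.
 *   return array();
 * }
 *
 * // Called by taxonomy_xml_file() as $create_funcname($vid).
 * function taxonomy_xml_example_create($vid) {
 *   // Return the serialized vocabulary as a string.
 *   return '';
 * }
 * @endcode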
 *
 * @return
 *   an Array keyed by format name, with the '_format' suffix stripped (
 *   [csv]   => CSV
 *   [rdf]   => RDF
 *   [xml]   => XML
 *  )
 */
function taxonomy_xml_formats() {
  $module_dir = drupal_get_path('module', 'taxonomy_xml');
  // Escape the dot and anchor the mask so only *_format.inc files match.
  $incs = file_scan_directory($module_dir, '.*_format\.inc$');
  $formats = array();
  foreach ($incs as $filepath => $file) {
    include_once $file->filename;
    $format_name = preg_replace('/_format$/', '', $file->name);
    $funcname = "taxonomy_xml_{$format_name}_requirements";
    $error = function_exists($funcname) ? $funcname() : NULL;
    if (empty($error)) {
      $formats[$format_name] = drupal_strtoupper($format_name);
    }
    else {
      drupal_set_message($error['taxonomy_xml_' . $format_name]['description'], 'warning');
    }
  }
  return $formats;
}

/**
 * Load includes for any modules that implement extensions to import or
 * export processes.
 *
 * We don't have much of an API, but any module can include
 * hook_taxonomy_xml_export_term if it wants to.
 */
function taxonomy_xml_include_module_hooks($reset = FALSE) {

  // I'm providing some default integration on behalf of these modules.
  // Find the files in the 'includes' directory
  $modules = array(
    'path',
    'geotaxonomy',
    'menu',
  );
  foreach (array_filter($modules, 'module_exists') as $module) {
    module_load_include('inc', 'taxonomy_xml', "includes/taxonomy_xml.{$module}");
  }
}

/**
 * Fetch the named vocab if it exists
 *
 * Create and return a useful placeholder if not.
 *
 * @param $name
 * @param $edit An array of additional vocabulary properties as used by
 * taxonomy_save_vocabulary, eg 'description', 'help', 'required', 'tags'
 *
 * @return
 *   The new vocab object.
 */
function _taxonomy_xml_get_vocabulary_placeholder($name, $edit = array()) {
  if ($vocabulary = taxonomy_xml_get_vocabulary_by_name($name)) {
    return $vocabulary;
  }

  // Create new vocab
  $vocabulary = array(
    'name' => $name,
    'relations' => TRUE,
    'hierarchy' => 2,
  ) + $edit;
  taxonomy_save_vocabulary($vocabulary);

  // Need to retrieve it from DB again - the result isn't given back to us.
  $vid = db_result(db_query("SELECT vid FROM {vocabulary} WHERE LOWER('%s') LIKE LOWER(name)", $vocabulary['name']));
  $vocabulary = taxonomy_vocabulary_load($vid);
  drupal_set_message(t('Created vocabulary %vid %vocabname to put these terms into. You probably want to <a href="!vocablink">go edit it now</a>.', array(
    '%vocabname' => $vocabulary->name,
    '%vid' => $vid,
    '!vocablink' => url('admin/content/taxonomy/edit/vocabulary/' . $vid),
  )));
  return $vocabulary;
}

/**
 * Create Vocabulary definitions.
 *
 * Use the vocabs defined as resources in the input.
 *
 * @param $vocabularies
 *   An array of vocabulary definition objects, extracted from the XML.
 *   Modified by reference with their deduced or new vid values.
 *
 * $vocabularies = array(
 *   '#vocabulary-3' => stdClass Object
 *     'name' => "Countries",
 *     'predicates'  => array(
 *       'description' => array( 0 => "A list of countries" ),
 *       'version'  => array( 0 => "2008-08-08" ),
 *     )
 *   )
 * )
 *
 * All 'predicates' will be compressed into properties. EG in the above example,
 * ['#vocabulary-3']['predicates']['description'][0] = "a list"
 * is flattened to
 * ['#vocabulary-3']['description'] = "a list"
 *
 * Either input format is fine.
 *
 */
function taxonomy_xml_absorb_vocabulary_definitions(&$vocabularies) {

  // See if we can match this definition against the given vid
  // - then on name.
  // If neither seems to exist, make a new one.
  if (is_array($vocabularies)) {

    #dpm(array("The vocabulary definition(s) found in the input file is ", $vocabularies));

    // There may be more than one vocab def per file, although this is unlikely
    // Diagnostics:
    if (count($vocabularies) > 1) {
      drupal_set_message(t("When importing, I found more than one vocabulary definition in the same resource. This could be confusing. <pre>!object</pre>", array(
        '!object' => print_r($vocabularies, 1),
      )), 'warning');
    }
    foreach ($vocabularies as $vocabid => &$vocab) {

      // Merge all predicate data into a simpler array, re-tagging the attributes as needed
      // - there's not a lot of metadata about vocabs we know how to import, but try anyway - for expansion
      if (!empty($vocab->predicates)) {
        taxonomy_xml_merge_predicates_into_attributes($vocab);
      }
      if (empty($vocab->name)) {
        drupal_set_message("We require a NAME to create a vocabulary. Vocabulary definition appeared to have no name. Using a label derived from the URI instead.", 'warning');

        // Make up a name based on the URI
        $vocab->name = taxonomy_xml_label_from_uri($vocabid);
      }
      $target_vocab = NULL;

      // Deduce the given vocab definition's vid, if given as a value
      if (isset($vocab->vid)) {
        $vocab->internal_id = $vocab->vid;
        drupal_set_message(t("Found a vocabulary definition in the input, called %vocabid. vid=%vid", array(
          '%vocabid' => $vocabid,
          '%vid' => $vocab->internal_id,
        )));

        // Try to maintain old Vocabulary IDs
        // Check if it's a good number to write into
        // If the input defines a vid BUT there is already a non-matching vocab with that number, we need a new number
        // If it DOES seem to match, we are safe
        $target_vocab = taxonomy_vocabulary_load($vocab->internal_id);
      }

      #dpm(array('loaded vocab' => $target_vocab, 'parsed vocab' => $vocab));
      if (!empty($target_vocab) && $target_vocab->name == $vocab->name) {

        // Looks like a great match
        $vocab->vid = $vocab->internal_id;
        drupal_set_message(t("Found matching target vocabulary '%vocab_name' vid=%vocab_vid", array(
          '%vocab_name' => $vocab->name,
          '%vocab_vid' => $vocab->vid,
        )));
      }
      else {
        if ($target_vocab) {
          drupal_set_message(t("The vocab ID given in the input file (%vocab_vid) conflicts with an existing vocabulary. We need a different ID... ", array(
            '%vocab_vid' => $vocab->vid,
          )));
        }
        unset($vocab->vid);

        // Vocab with this id exists, but is called something else - Do not merge with it
        // Look for a match by name instead
        if ($target_vocab = taxonomy_xml_get_vocabulary_by_name($vocab->name)) {

          // Found a local vocab called the same as the input vocab. That's a good enough match for us.
          $vocab->vid = $target_vocab->vid;
          drupal_set_message(t("Found a target vocabulary already in the database, matching by name '%name' vid=%vid . This will be used, but not updated.", array(
            '%name' => $vocab->name,
            '%vid' => $vocab->vid,
          )));
        }
      }

      // Have we found a target vocab yet?
      if (empty($vocab->vid)) {

        // Make a brand new one from the imported definition
        $vocab = _taxonomy_xml_get_vocabulary_placeholder($vocab->name);

        // Built a vocabulary from input details. Now save it
        $vocab_array = (array) $vocab;
        $status = taxonomy_save_vocabulary($vocab_array);
        $strings = array(
          '%name' => $vocab->name,
          '%description' => $vocab->description,
        );
        $vocab = taxonomy_vocabulary_load($vocab_array['vid']);
        if (!empty($vocab->vid)) {
          drupal_set_message(t("Made a new Drupal vocabulary definition from data found in the input. Vocab is called: '%name': %description ", $strings));
        }
        else {
          drupal_set_message(t("Failed to create a new vocabulary called: '%name' : %description \n This is fatal, aborting.", $strings), 'error');
          return FALSE;
        }

        #dpm($vocab);
      }
    }

    // end looping through found vocabs
  }
  else {
    drupal_set_message("The document provided no recognisable vocabulary definitions");
  }

  // This is the default (last found) vid. Probably should not be used, but we may have to make a guess.
  // Either an input file contains just one vocab - in which case this will be right,
  // or the input file contains multiple vocabularies - in which case the terms damn well ought to be tagged with which vocab to use.
  return isset($vocab->vid) ? $vocab->vid : NULL;
}
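
/*
 * Usage sketch for taxonomy_xml_absorb_vocabulary_definitions(). This is an
 * illustration only, not code taken from the module's parsers. The
 * '#vocabulary-3' key and the values are the sample data from the docblock
 * above.
 *
 * @code
 *   $vocabularies = array(
 *     '#vocabulary-3' => (object) array(
 *       'name' => 'Countries',
 *       'predicates' => array(
 *         'description' => array('A list of countries'),
 *       ),
 *     ),
 *   );
 *   $vid = taxonomy_xml_absorb_vocabulary_definitions($vocabularies);
 *   if ($vid === FALSE) {
 *     // A new vocabulary could not be created, so the import should stop.
 *   }
 * @endcode
 */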

/**
 * Convert aliased predicates into common ones.
 *
 * Given a term with a collection of named predicate relations, convert those
 * into canonic (known, defined) terms. This involves some duplication as the
 * original and true names are both packed into the $term->predicates array.
 * Only the true names are looked at later however.
 */
function taxonomy_xml_canonicize_predicates(&$term) {

  // Translate the predicate statements into what we need
  if (empty($term->predicates)) {
    $term->predicates = array();
  }

  // $predicate_synonyms is a translation array to match synonyms from various syntaxes with Drupal concepts
  $predicate_synonyms = taxonomy_xml_relationship_synonyms();
  foreach ($term->predicates as $predicate => $values) {
    $original_predicate = $predicate;

    // First translate misc terminology synonyms to the canonical predicate I use everywhere
    // This allows us to interpret several XML dialects at once
    if (isset($predicate_synonyms[$predicate]) && ($cannonic = $predicate_synonyms[$predicate])) {
      $predicate = $cannonic;
    }

    # drupal_set_message(t("Applying '$predicate' ($predicate) value of ". print_r($values, 1) ." found in $guid"));
    switch ($predicate) {
      case TAXONOMY_XML_DESCRIPTION:
        $term->description = taxonomy_xml_get_literal_string($values);
        break;
      case TAXONOMY_XML_NAME:

        // In the (hopefully never) case that a term has, eg, both a 'name'
        // and a 'title' set, and different, we may have conflicts to resolve.
        // Pre-empt this here by noting both, but revisit if we can come up
        // with better logic, eg ordering priority of synonyms.
        $val = taxonomy_xml_get_literal_string($values);
        if (isset($term->name) && $val != $term->name) {
          $term->name .= ' (' . $val . ')';
        }
        else {
          $term->name = $val;
        }
        break;
      case TAXONOMY_XML_PARENT:
      case TAXONOMY_XML_RELATED:
      case TAXONOMY_XML_CHILD:

        // A term relationship.
        // Store each referred URI under the canonical predicate name,
        // to be resolved and linked in later.
        foreach ($values as $i => $target_uri) {
          $term->predicates[$predicate][$i] = $target_uri;
        }
        break;
      case TAXONOMY_XML_HAS_SYNONYM:
        $term->synonyms_array = isset($term->synonyms_array) ? array_merge($term->synonyms_array, $values) : $values;
        $term->synonyms = implode("\n", array_unique($term->synonyms_array));
        break;
      case TAXONOMY_XML_IN_VOCABULARY:

        /* currently not used very much - more than one vocab per input file is rare
           // This term need to be in the vocabulary referred to by this URI
           // check our known vocabs to see if they are recognised
           // Do we know a vocab with an ID matching this 'isdefinedby' value?
           dpm(array('looking for vocab' => $values));
           foreach ($values as $value) { // probably just one...
           if (isset($vocabularies[$value])) {
           // I know this vocab!
           $term->vid = $vocabularies[$value]->vid;
           }
           }
           */
        break;
      case 'type':

      // These are already done. Ignore
      case TAXONOMY_XML_UNUSED:

        // Explicitly ignore these
        break;
      case TAXONOMY_XML_OTHER_PREDICATE:

        // These ones we'll try to save as RDF statements, attached to the term URI
        foreach ($values as $value) {
          $term->rdf[] = array(
            'subject' => NULL,
            'predicate' => $original_predicate,
            'object' => $value,
          );
        }
        watchdog('taxonomy_xml', "\n          Found a useful predicate '<b>%predicate</b> = %value'.\n          Making a note of it for pure-RDF storage.\n          ", array(
          '%predicate' => "{$predicate} ({$original_predicate})",
          '%subject' => isset($term->name) ? $term->name : $term->guid,
          '%value' => $value,
        ), WATCHDOG_INFO);
        break;
      default:

        // A valid, but unrecognised statement was found when flattening the input
        watchdog('taxonomy_xml', "\n          Dunno what to do with '<b>%predicate</b>'.\n          Subject '%subject' has value(s) = <pre>!values</pre>\n          A later content type may absorb this info,\n          but it's not a core term property.", array(
          '%predicate' => $predicate,
          '%subject' => isset($term->name) ? $term->name : $term->guid,
          '!values' => print_r($values, 1),
        ), WATCHDOG_DEBUG);
    }
  }
  if (!empty($term->guid)) {
    taxonomy_xml_set_term_guid($term, $term->guid);
  }

  #dpm(array(__FUNCTION__ => $term, 'transformed predicates are' => $term->predicates));
}

/**
 * Queue up an import action.
 *
 * If the currently processing term refers to other terms by URI, set up a
 * job to retrieve them recursively later.
 *
 * For all unknown $term->predicates[TAXONOMY_XML_CHILD] URIs, add a job to the
 * batch queue.
 *
 * Helper function for parser routines
 *
 * @see taxonomy_xml_add_term_to_batch_queue()
 */
function taxonomy_xml_add_all_children_to_queue($term) {

  #dpm(array('Queuing children of ' => $term));
  if (variable_get('taxonomy_xml_recurse_down', TRUE) && !empty($term->predicates[TAXONOMY_XML_CHILD])) {

    // Add child items to the import queue.
    $children = $term->predicates[TAXONOMY_XML_CHILD];

    #dpm(array('Queuing children' => $children));
    foreach ((array) $children as $child_ref) {

      // Check that it looks like a valid URL we can request.
      // Reset the parsed parts so nothing leaks over from the previous child.
      $scheme = "unknown";
      $url_parts = array();
      if (valid_url($child_ref)) {

        // The ref is a URI.
        // but LSID identifiers pass that test :)
        $url_parts = @parse_url($child_ref);
        $scheme = isset($url_parts['scheme']) ? $url_parts['scheme'] : 'no scheme';
      }
      if (isset($url_parts['host']) && $url_parts['host'] == '_') {

        // BEWARE, RDF bnodes ("_:123") may look like URIs
        // Ignore them
        continue;
      }
      if ($scheme == 'http') {

        // Check we don't know it already.
        if ($found_term = taxonomy_xml_get_term_by_guid($child_ref, $term->vid)) {

          #dpm("Found known term by URI, $child_ref is ". $found_term->tid );
          watchdog('taxonomy_xml', "While processing %term_name, found an existing local version\n            of its child. # !ref\n            This means it will not be re-queued.\n            ", array(
            '%term_name' => $term->name,
            '!ref' => l($found_term->tid, 'taxonomy/term/' . $found_term->tid),
          ), WATCHDOG_DEBUG);

          // This is cool, we have a handle on this term. Make a note in the global list
          $terms =& taxonomy_xml_current_terms();
          $terms[$child_ref] = $found_term;
        }
        else {

          // Save the request as a batch job to do later. Our session queue will tuck this away and remember it.
          // Create a placeholder so at least we know where this item is being imported to
          // Beware memory. This should be lightweight, as the batch API seems to be treating it inefficiently.
          $placeholder_term = (object) array(
            'guid' => $child_ref,
            'parent' => array(
              $term->tid => $term->tid,
            ),
            'vid' => $term->vid,
          );

          // Some data sources MAY supply a known name for this child, but that's too hard to extract
          // Trust the named resource will fill in the gaps, and just know it's a URI for now
          taxonomy_xml_add_term_to_batch_queue($placeholder_term);
          watchdog('taxonomy_xml', "\n            While processing %term_name,\n            Found a reference to child term !child_ref.\n            Queuing it for later retrieval and import", array(
            '%term_name' => $term->name,
            '!child_ref' => l($child_ref, $child_ref),
          ), WATCHDOG_NOTICE);
        }
      }
      else {

        // The referred term is not a URI, nor do we recognise its identifier so far.
        // It's a dangling reference. What can we do?
        // Handle URI/GUID lookup services?
        //
        // @todo Should do this with a hook/service-callback rather than built into this module.
        // @todo - this uses a global, should use batch context info
        // Let's see if it fits the pattern that a lookup service expects
        if ($service_id = variable_get('taxonomy_xml_service_id', '')) {
          $services = taxonomy_xml_lookup_services(NULL, 'full');
          $service = $services[$service_id];
          $lookup_uri = taxonomy_xml_sub_placeholders_into_pattern($service['pattern'], array(
            $service['identifier'] => $child_ref,
          ));

          #drupal_set_message(t('Will use service lookup to find !child_ref', array('!child_ref' => l($child_ref, $lookup_uri), '%name' => $service['name'])));
          $placeholder_term = (object) array(
            'guid' => $lookup_uri,
            'parent' => array(
              $term->tid => $term->tid,
            ),
          );
          taxonomy_xml_add_term_to_batch_queue($placeholder_term);
        }
        else {
          drupal_set_message(t('Cannot yet resolve non-URI references, and no resolver service is active. %child_ref', array(
            '%child_ref' => $child_ref,
          )));
        }
      }
    }
  }
}

/**
 * Merge all predicate data into a simpler array
 *
 * Re-tags the attributes as needed
 *
 * @param $object
 *   An object containing a 'predicates' array. For each predicate, a
 *   canonically named attribute will be attached to the object.
 */
function taxonomy_xml_merge_predicates_into_attributes(&$object) {
  if (empty($object)) {
    return;
  }
  $predicate_synonyms = taxonomy_xml_relationship_synonyms();

  // Diagnostics:
  if (empty($object->predicates)) {
    watchdog('taxonomy_xml', "When importing an object, I found some data with no predicates at all. This is odd, but probably no big deal. <pre>!object</pre>", array(
      '!object' => print_r($object, 1),
    ), WATCHDOG_NOTICE);
    $object->predicates = array();
  }
  foreach ($object->predicates as $predicate => $vals) {
    $predicate = isset($predicate_synonyms[$predicate]) ? $predicate_synonyms[$predicate] : $predicate;
    $object->{$predicate} = array_pop($vals);
  }
  if (empty($object->description) && isset($object->{TAXONOMY_XML_DESCRIPTION})) {
    $object->description = $object->{TAXONOMY_XML_DESCRIPTION};
  }
  return $object;
}
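
/*
 * Usage sketch for taxonomy_xml_merge_predicates_into_attributes().
 * Illustration only. It assumes 'version' has no entry in the table returned
 * by taxonomy_xml_relationship_synonyms(), so the predicate name is kept
 * unchanged.
 *
 * @code
 *   $object = (object) array(
 *     'predicates' => array(
 *       'version' => array('2008-08-08', '2009-01-01'),
 *     ),
 *   );
 *   taxonomy_xml_merge_predicates_into_attributes($object);
 *   // array_pop() is used, so when a predicate has several values only the
 *   // last one is kept: $object->version is now '2009-01-01'.
 * @endcode
 */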

/**
 * Either fetch the named term if it exists, or return a useful placeholder.
 *
 * The returned term has a 'synonyms_array' because that's easier to work with
 * than string concats in odd places.
 *
 * If $new = TRUE, then always return a fresh placeholder, do not attempt a
 * string name lookup.
 * Default is $new = FALSE, which attempts re-use of existing terms.
 */
function _taxonomy_xml_get_term_placeholder($name, $vid = 0, $new = FALSE) {

  #dpm("Getting placeholder '$name' vocab:$vid");
  if (!$new) {

    // Look for a pre-existing term by that name
    if ($name) {
      $term = taxonomy_xml_get_term_by_name_from_vocab($name, $vid);
    }
    else {

      // Assert input is OK. Just paranoia
      drupal_set_message(t("Asked to make a term with no name ... that can't be right. I refuse!"), 'error');
      return NULL;
    }
  }
  if (empty($term)) {
    $term = (object) array(
      'name' => $name,
      'vid' => $vid,
      'description' => '',
      'weight' => 0,
      'predicates' => array(),
      'synonyms_array' => array(),
    );
  }
  else {

    #drupal_set_message(t("A term called '!name' already exists. We will just update information onto it.", array('!name' => l($term->name, 'admin/content/taxonomy/edit/term/' . $term->tid) )), 'trace');
  }
  return $term;
}
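
/*
 * Usage sketch for _taxonomy_xml_get_term_placeholder(). Illustration only;
 * $vid is assumed to hold the id of an existing vocabulary.
 *
 * @code
 *   // Re-use an existing term called 'hotels' if the vocabulary has one.
 *   $term = _taxonomy_xml_get_term_placeholder('hotels', $vid);
 *
 *   // Always start from a blank placeholder, skipping the name lookup.
 *   $fresh = _taxonomy_xml_get_term_placeholder('hotels', $vid, TRUE);
 *   // $fresh has no tid yet. It only gets one once the term is saved.
 * @endcode
 */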

/**
 * Given a list of terms, set the related-terms and structure, and save again
 *
 * Helper function for bulk processes.
 *
 * The terms are currently indexed by either URI or name. The reference arrays
 * refer to either the URI or name. Scan the current array for the objects
 * (terms) being linked to.
 *
 * Input would look (in part) like this:
 *
 * $terms = array(
 *   '#123' => (
 *      'name' => 'hotels',
 *      'tid' => 23,
 *      'predicates' => (
 *        'See Also' => ['#135', 'camping']
 *        'Broader Term' => ['accommodation']
 *      )
 *    )
 *   '#135' => ( 'name' => 'motels', 'tid' => 35 )
 *   '#145' => ( 'name' => 'camping', 'tid' => 37 )
 *   'accommodation' => ( 'name' => 'accommodation', 'tid' => 11 )
 * )
 *
 * The process will read the 'predicates', figure out what they mean, figure out
 * which other term is being referenced, and create properties on the term
 * object.
 *
 * And will return the term objects with appropriate Drupal attributes
 *
 *   '#123' => (
 *      'name' => 'hotels',
 *      'tid' => 23,
 *      'parent' => 11,
 *      'relations' => array(35, 37),
 *    )
 *
 * Note that the key need have no relation with the tid,
 * and may be a full string, which will work just as well.
 * The above shows an example of both,
 * although that would be rare in the one import.
 *
 *
 * Relationships cannot be created if the target term is not included in the
 * $terms list itself. If we are adding to an existing vocab, doing a partial
 * merge, the target terms should have already been retrieved from the database
 * and included in the complete list.
 *
 *
 * @param $terms
 *   An indexed array of existing taxonomy term objects, possibly referring
 *   to each other by id. It's possible for the same term to be in the list
 *   twice, under different keys (indexed by tid, name or URL), but these
 *   should be HANDLES on the same object by reference, so changes will stick.
 */
function taxonomy_xml_set_term_relations(&$terms) {

  #drupal_set_message(t("Now connecting all known term relations and hierarchy links between this group of %count related terms.", array('%count' => count($terms) )), 'trace');
  $relationship_predicates = array(
    TAXONOMY_XML_PARENT,
    TAXONOMY_XML_CHILD,
    TAXONOMY_XML_RELATED,
  );
  foreach ($terms as $guid => &$term) {

    // Simplify the t() calls.
    $strings = array(
      '%tid' => $term->tid,
      '%guid' => $guid,
      '!name' => l($term->name, 'admin/content/taxonomy/edit/term/' . $term->tid),
    );

    // Avoid doing this again if we are stuck in a recursive loop,
    // batch, or working with duplicate handles
    if (isset($term->taxonomy_xml_linked)) {
      continue;
    }

    // Load this to detect looping relationships
    $ancestors = taxonomy_xml_get_term_ancestors($term);

    // The predicates (relationships) array may contain actual handles on terms, term ids,
    // or it may still contain URIs representing terms not yet loaded in this phase.
    // We need to resolve those external references into term handles (or at least tids) if possible.
    if (isset($term->predicates) && is_array($term->predicates)) {
      foreach ($term->predicates as $predicate => &$targets) {
        $strings['%predicate'] = $predicate;
        if (in_array($predicate, $relationship_predicates)) {

          // Keep a list just for logging
          $found_term_names = array();
          foreach ($targets as $target_ix => &$target) {

            // $target is the identifier guid.
            // We will initialize $terms[$target]
            watchdog('taxonomy_xml', "Term %termname references %target as a %predicate", array(
              '%termname' => $term->name,
              '%target' => $target,
              '%predicate' => $predicate,
            ), WATCHDOG_DEBUG);

            // Here we first try to find the referred term in the list of recently-made terms
            if (!isset($terms[$target])) {

              #drupal_set_message(t("Referenced term %target seems unknown so far, need to try a bigger lookup for it", array('%targetid' => $targetid)));

              // taxonomy_enhancer.module, if available, may have more data about our terms. Hopefully including a GUID.
              if ($found_term = taxonomy_xml_get_term_by_guid($target, $term->vid)) {
                $terms[$target] = $found_term;
                $found_term_names[] = l($found_term->name, "taxonomy/term/{$found_term->tid}") . ' ' . l('#', $target);
              }
              else {

                #drupal_set_message(t("We so far have no knowledge of the referenced term - ". $term->name ." '$predicate' !targetid. It should be imported later and linked in.", array('!targetid' => l($targetid, $targetid))));
                $found_term_names[] = $target;
              }
            }
            else {

              // already know about it
              $found_term_names[] = $terms[$target]->name;

              #if (empty($terms[$target]->name)) dpm($terms[$target]);

              # . (!empty($terms[$target]->tid)) ? " (taxonomy/term/{$terms[$target]->tid})" : '' ;
            }

            // Drupal is liable to implode if we ever establish a relationship
            // from a term to itself.
            // Let's never allow that, remove the relationship.
            if (isset($term->guid) && $term->guid == $target) {
              watchdog('taxonomy_xml', "Not supporting setting !name as related to itself as a %predicate. Avoiding a potential infinite loop.", $strings, WATCHDOG_WARNING);
              unset($term->predicates[$predicate][$target_ix]);
            }
          }

          // each referred term
          watchdog('taxonomy_xml', '%predicate relations of %term_name are : %targets', array(
            '%term_name' => $term->name,
            '%predicate' => $predicate,
            '%targets' => implode(', ', $found_term_names),
          ), WATCHDOG_INFO);
        }

        // Is a 'relationship' sort of predicate.

        #else {

        #    watchdog('taxonomy_xml', "$predicate is not a ". implode(',', $relationship_predicates));

        #}
      }

      // each type of predicate
    }

    // has predicates
    // Go through all and add relationships
    // Note that a modification was made by flagging $term->taxonomy_xml_relinked = TRUE;
    //
    // The linking earlier may have given us some duplicates if the source had redundant info, so filter for uniques
    if (isset($term->predicates[TAXONOMY_XML_PARENT]) && is_array($term->predicates[TAXONOMY_XML_PARENT])) {
      foreach (array_unique($term->predicates[TAXONOMY_XML_PARENT]) as $key => $othertermname) {
        if ($othertermname) {

          // Here we try to find the referred term in the list of recently-made terms
          if (isset($terms[$othertermname])) {
            $parent = $terms[$othertermname];
            if ($parent && isset($parent->tid)) {

              // Due to possible data inconsistencies (input from Freebase)
              // We need to take care to avoid infinite ancestry loops
              // Which Drupal doesn't handle.
              // It prevents that in the UI, not at the data level
              // So need to scan the ancestry tree to make sure we don't add this term
              // as a descendant of itself
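              // Use a separate variable so the ancestors of $term loaded
              // above are not overwritten before the child checks below.
              $parent_ancestors = taxonomy_xml_get_term_ancestors($parent);
              if (isset($parent_ancestors[$term->tid])) {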
                watchdog('taxonomy_xml', "Not setting !name as a descendant of itself. Avoiding a potential infinite loop.", $strings, WATCHDOG_WARNING);
                continue;
              }
              global $_taxonomy_xml_current_doc;
              drupal_set_message(t("!name # %tid is a child of !parent # %ptid (<a href='!source' style='font-size:x-small'>source</a>)", array(
                '!name' => l($term->name, 'admin/content/taxonomy/edit/term/' . $term->tid),
                '%tid' => $term->tid,
                '!parent' => l($parent->name, 'admin/content/taxonomy/edit/term/' . $parent->tid),
                '%ptid' => $parent->tid,
                '!source' => $_taxonomy_xml_current_doc,
              )));
              $term->parent[$parent->tid] = $parent->tid;
            }
          }
          else {

            #drupal_set_message(t("Couldn't find the parent identified as %termname for %name # %tid", array('%termname' => $othertermname, '%name' => $term->name, '%tid' => $term->tid) ));
          }
        }
      }
      $term->taxonomy_xml_relinked = TRUE;
    }

    #else{drupal_set_message(" $name ". $term->tid ." has no parent term");}

    // Run through children and set their parent to this.
    //
    // On the first run-through, it's premature to try and set children,
    // the link is supposed to be set from child -> parent.
    // But - if updating an existing parent, or if the child terms WERE successfully
    // found already and have tids, then DO go and set them as children.
    // This is only to support later modified imports where a parent hierarchy
    // is being re-adjusted over top of known terms.
    if (isset($term->predicates[TAXONOMY_XML_CHILD]) && is_array($term->predicates[TAXONOMY_XML_CHILD])) {
      foreach (array_unique($term->predicates[TAXONOMY_XML_CHILD]) as $key => $othertermname) {
        $strings['!child_guid'] = $othertermname;

        // It would create a loop if we ever set a child that was also an ancestor
        if (isset($terms[$othertermname]->tid) && isset($ancestors[$terms[$othertermname]->tid])) {
          $strings['%predicate'] = TAXONOMY_XML_CHILD;
          watchdog('taxonomy_xml', "Not supporting setting !name as related to an ancestor as a %predicate. Avoiding a potential infinite loop.", $strings, WATCHDOG_WARNING);
          unset($term->predicates[TAXONOMY_XML_CHILD][$key]);
          continue;
        }

        // For each child of this parent
        watchdog('taxonomy_xml', "!name # %tid Has a child identified as !child_guid", $strings, WATCHDOG_DEBUG);
        if (!empty($othertermname) && isset($terms[$othertermname])) {
          $child_term =& $terms[$othertermname];
          if (empty($child_term->tid)) {

            // This was a placeholder only, not yet made.
            // So the parent cannot force it to link back.
            continue;
          }
          $strings['!child'] = l($child_term->name, 'admin/content/taxonomy/edit/term/' . $child_term->tid);
          if (empty($child_term->parent)) {
            $child_term->parent = array();
          }

          // If it's not already set,
          if (!in_array($term->tid, $child_term->parent)) {

            // set the child to point to this parent
            $child_term->parent[$term->tid] = $term->tid;
            drupal_set_message(t("!name # %tid Has a child called !child", $strings));

            // Save it now. This may save too often, but
            // this change may have been missed otherwise, due to arbitrary processing order
            if (empty($child_term->taxonomy_xml_relinked)) {
              $save_term = (array) $child_term;
              taxonomy_save_term($save_term);
            }
          }
          else {
            watchdog('taxonomy_xml', "!name already knows it has a child called !child", $strings, WATCHDOG_DEBUG);
          }
        }
        else {
          watchdog('taxonomy_xml', "We haven't loaded child term !child_guid in this run, so not touching it. Could be if recursion is off.", $strings, WATCHDOG_DEBUG);
        }
      }
    }

    // Set the is-related link also
    if (isset($term->predicates[TAXONOMY_XML_RELATED]) && is_array($term->predicates[TAXONOMY_XML_RELATED])) {
      foreach (array_unique($term->predicates[TAXONOMY_XML_RELATED]) as $key => $othertermname) {
        if ($othertermname) {
          if (isset($terms[$othertermname])) {
            $related = $terms[$othertermname];
            $term->relations[$related->tid] = $related->tid;

            #drupal_set_message("Term ". $term->name ." ". $term->tid ." is related to $related->name ". $related->tid);
          }
          else {
            drupal_set_message(t("\n              Couldn't find the term called '%termname'\n              to link to '%name' as being related to this.\n              This relationship will be discarded. ", array(
              '%name' => $term->name,
              '%termname' => $othertermname,
              '%debug' => print_r(array_keys($terms), 1),
            )));

            #dpm(array('available terms' => array_keys($terms)));
          }
        }
      }
      $term->taxonomy_xml_relinked = TRUE;
    }
    if (!empty($term->synonyms_array)) {
      $term->synonyms = implode("\n", array_unique($term->synonyms_array));
      $term->taxonomy_xml_relinked = TRUE;
    }

    #dpm(array('Saving' => $term));
    $term->taxonomy_xml_linked = TRUE;

    // for efficiency, only re-save terms that really need it
    if (!empty($term->taxonomy_xml_relinked)) {
      $save_term = (array) $term;
      taxonomy_save_term($save_term);
      unset($term->taxonomy_xml_relinked);
    }
  }
}

/**
 * Return a list of all terms in the ancestry chain of the given term.
 *
 * The list includes all parents from multi-inheritance.
 *
 * Use this to check before setting up a parental chain that may loop
 *
 * @return
 *   array  keyed by tid, but in no specific order.
 * @see taxonomy_get_parents_all()
 */
function taxonomy_xml_get_term_ancestors($term) {
  $parents = taxonomy_get_parents_all($term->tid);

  // taxonomy_get_parents_all() puts the term itself first, so skip that.
  array_shift($parents);

  // index by tid
  $ancestors = array();
  foreach ($parents as $parent) {
    $ancestors[$parent->tid] = $parent;
  }
  return $ancestors;
}
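
/*
 * Usage sketch for taxonomy_xml_get_term_ancestors() as a loop guard.
 * Illustration only; $term and $parent are assumed to be saved term objects
 * that already have tids.
 *
 * @code
 *   $ancestors = taxonomy_xml_get_term_ancestors($parent);
 *   if ($term->tid != $parent->tid && !isset($ancestors[$term->tid])) {
 *     // $term is not already above $parent in the tree, so linking it
 *     // below $parent cannot create an ancestry loop.
 *     $term->parent[$parent->tid] = $parent->tid;
 *   }
 * @endcode
 */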

/**
 * Insert a URI serialization into a term object.
 *
 * Does NOT actually save the value, but puts values that will be saved by the
 * appropriate handlers (CCK, taxonomy_enhancer or rdf) when the term is saved.
 */
function taxonomy_xml_set_term_guid(&$term, $guid) {
  $term->guid = $guid;

  // CCK (only supported via nat or enhancer)
  if (module_exists('taxonomy_enhancer')) {
    $term->field_guid[0]['#value'] = $guid;
    $term->fields['field_guid'][0]['value'] = $guid;

    // taxonomy_enhancer swaps between these two versions of the same data when reading and writing
    // Do both, as te is unstable.
  }
  if (module_exists('rdf')) {

    // Set a statement that RDF.module can serialize
    $term->rdf[] = array(
      'predicate' => 'owl:sameAs',
      'object' => $guid,
    );
  }

  // @todo other ways of noting this?
}

/**
 * Return (and remember) the URI associated with this term.
 *
 * Abstracted into a getter, because it may be serialized in different ways.
 */
function taxonomy_xml_get_term_guid(&$term) {

  // taxonomy_guid will support this in core, if enabled.
  if (!empty($term->guid)) {
    return $term->guid;
  }

  // CCK (only supported via nat or enhancer)
  if (isset($term->field_guid)) {
    $term->guid = $term->field_guid[0]['#value'];
    return $term->guid;
  }

  // RDF.module
  if (module_exists('rdf')) {
    $term_url = taxonomy_xml_rdf_taxonomy_term_path($term);
    $about_term = rdf_query($term_url, 'owl:sameAs', NULL, array())
      ->to_array();
    foreach ($about_term as $sid => $statement) {
      $term->guid = $statement[2];
      return $term->guid;
    }
  }
}

/**
 * Deprecated, to avoid conflict with D7 entities, which use uri for their own
 * purpose.
 */
function taxonomy_xml_get_term_uri($term) {
  watchdog('taxonomy_xml', __FUNCTION__ . ' deprecated. Use taxonomy_xml_get_term_guid() instead', array(), WATCHDOG_NOTICE);
  return taxonomy_xml_get_term_guid($term);
}

/**
 * Special lookup for terms if they are saved with a URI or GUID
 *
 * Very specific to certain ways of serializing terms. REQUIRES one of:
 * taxonomy_guid.module,
 * taxonomy_enhancer and a field called field_guid, or
 * rdf.module and an owl:sameAs relation.
 */
function taxonomy_xml_get_term_by_guid($guid, $vid = NULL) {
  if (!$guid) {
    return NULL;
  }

  // Our own taxonomy_guid module is the simplest one, check that first.
  if (module_exists('taxonomy_guid')) {
    $terms = taxonomy_guid_get_term($guid, $vid);
    if (count($terms) > 1) {

      // Not sure how it happened, but we may get more than one result.
      watchdog('taxonomy_xml', "This is confusing, apparently there is more than one local match\n        with the GUID '%guid'.\n        <pre>!lookups</pre>", array(
        '%guid' => $guid,
        '!lookups' => print_r($terms, 1),
      ), WATCHDOG_WARNING);
    }
    if (!empty($terms)) {
      return reset($terms);
    }
  }

  // taxonomy_enhancer.module, if available, may have more data about our terms. Hopefully including a GUID.
  if (function_exists('taxonomy_enhancer_get_term')) {
    $searchterm = (object) array(
      'field_guid' => $guid,
    );
    $results = taxonomy_enhancer_get_term($searchterm);
    if (!empty($results)) {

      #drupal_set_message(t("Successfully found a known target term indexed by external <a href='!guid'>GUID</a>.", array('!guid' => $guid)));
      $term = array_pop($results);
    }
    else {

      #dpm("Couldn't find a known item with a URI = $guid ");
    }
  }
  if (module_exists('rdf')) {

    // Lookup the RDF values to see if this term has any 'sameAs' matches
    // Note the canonical or remote URL will be on the right of the triple
    $lookups = rdf_normalize(rdf_query(NULL, 'owl:sameAs', $guid));

    // Normalized data is indexed by [subject][predicate][] = object
    // We are looking for the value on the left - the subject.
    $local_term_paths = array_keys($lookups);
    if (count($local_term_paths) > 1) {

      // Not sure how it happened, but we may get more than one result.
      watchdog('taxonomy_xml', "This is confusing, apparently there is more than one local match\n        that is sameAs '%guid'.\n        <pre>!lookups</pre>\n        Possibly the same concept in a different vocabulary.\n        I'm only going to deal with one of them (the one in the current vocab - if any).", array(
        '%guid' => $guid,
        '!lookups' => print_r(array_keys($lookups), 1),
      ), WATCHDOG_DEBUG);
    }
    $term_base_url = url('taxonomy/term/', array(
      'absolute' => TRUE,
    ));

    // There is (almost certainly) only one value all down this tree,
    // but I don't know the keys, just iterate.
    foreach ((array) $lookups as $subject => $predicate_array) {
      foreach ($predicate_array as $predicate => $value_array) {
        foreach ($value_array as $i => $found_value) {

          // Stored in the database was a URI, backtrack to the term ID.
          // Is my term sameAs the URI?
          if (strstr($subject, $term_base_url)) {

            // yep, it's a local term URL
            $tid = intval(drupal_substr($subject, drupal_strlen($term_base_url)));
            if ($found_term = taxonomy_get_term($tid)) {
              watchdog('taxonomy_xml', 'Found <a href="!term_link">an existing term %term_name</a>
                in vocab %vid when looking for %guid', array(
                '%guid' => $guid,
                '%term_name' => $found_term->name,
                '%vid' => $found_term->vid,
                '!term_link' => url('taxonomy/term/' . $found_term->tid),
              ), WATCHDOG_DEBUG);

              // Now I found the term, check it's the right vocab (in case of
              // multiples). With no $vid given, accept any vocabulary.
              if (!$vid || $found_term->vid == $vid) {
                $term = $found_term;
              }
            }
          }

          // found a string match
        }

        // loop all result values
      }

      // loop all (1) predicates
      // Keep looking if that didn't work
      // (highly rare there's more than one)
    }

    // loop all (1) result
  }

  // RDF.module lookup
  return isset($term) ? $term : NULL;
}

/**
 * Given a URI ending in a #fragment or a /basename, return just that string.
 *
 * A definition entity may arrive without a proper readable label - only the
 * string at the end of the URI to identify it.
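 *
 * A minimal usage sketch. The URIs are illustrative assumptions, not values
 * taken from any particular import source.
 * @code
 * // A #fragment wins when one is present.
 * $label = taxonomy_xml_label_from_uri('http://www.w3.org/2004/02/skos/core#Concept');
 * // $label is 'Concept'.
 * // Otherwise the last path segment is used.
 * $label = taxonomy_xml_label_from_uri('http://example.com/vocab/Mammalia');
 * // $label is 'Mammalia'.
 * @endcode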
 */
function taxonomy_xml_label_from_uri($uri) {
  $url_parts = @parse_url($uri);
  if (!empty($url_parts['fragment'])) {
    return $url_parts['fragment'];
  }
  else {
    return basename($uri);
  }
}

/**
 * A caching version of file_get_contents.
 *
 * Used to try and minimize remote URL lookups.
 *
 * A URI with a #tag is NOT a different file, so all requests for that doc will
 * return the same result.
 *
 * File requests are usually either one huge one (possibly many times if it
 * uses internal anchors), or many little ones. As well as the file-level
 * caching, we'll statically cache the MRU.
 * Because we work in batches, the 'static' cache will only hit if the
 * wind is blowing the right way, but it's still a help.
 *
 * Uses content-negotiation! So if retrieving from a repository that speaks
 * that, we'll get the RDF out from under the rendered version.
 * When making a request, we say
 * "Accept: application/rdf+xml"
 * .. and sometimes we get it.
 *
 * @ingroup utility
 */
function taxonomy_xml_cached_get_contents($url, $flush = FALSE) {

  // Note this current active URL for reference in debugging in distant places
  global $_taxonomy_xml_current_doc;
  $_taxonomy_xml_current_doc = $url;
  $url_parts = @parse_url($url);

  // Double-check it's an URL.
  // Don't want random things that look a bit like an URL (LSID urns) getting through here
  if (empty($url_parts['scheme']) || !in_array($url_parts['scheme'], array('http', 'https', 'ftp'))) {
    watchdog('taxonomy_xml', "Not retrieving remote file. !url is not an HTTP, HTTPS or FTP URL", array(
      '!url' => l($url, $url),
    ), WATCHDOG_WARNING);
    return NULL;
  }

  // Discard any anchor before MD5-ing it or checking the cache
  // The last part does not count.
  // explode() replaces the deprecated split(); the anchor itself is not needed.
  list($url) = explode('#', $url, 2);

  // Check the MRU cache.
  static $old_url, $old_data;
  if ($url == $old_url && !$flush) {
    return $old_data;
  }
  $old_url = $url;
  $cachedir = file_directory_path() . '/url_cache';
  $save_as = $cachedir . '/' . md5($url);
  if (file_exists($save_as)) {
    $content = file_get_contents($save_as);
    $old_data = $content;
    if ($content) {

      // Occasionally got zero-length responses?
      $flush = l("flush", TAXONOMY_XML_ADMIN . '_xml/flush/' . md5($url));
      watchdog('taxonomy_xml', "Using locally cached copy !local_copy of !url !flush", array(
        '!local_copy' => l(md5($url), $save_as),
        '!url' => l($url, $url),
        '!flush' => $flush,
      ), WATCHDOG_DEBUG);
      return $content;
    }
  }

  // else
  file_check_directory($cachedir, FILE_CREATE_DIRECTORY);

  // Try to do content negotiation here.
  // Using HTTP_ACCEPT, I say I'd RATHER have rdf if you've got it.
  // Also tell them who we are and why we are scraping their data.
  $opts = array(
    'http' => array(
      'method' => "GET",
      'header' => "Accept: application/rdf+xml,*/* \r\n",
      'user_agent' => "taxonomy_xml.module data import running from a Drupal CMS. [" . variable_get('site_name', '') . "]",
    ),
  );
  $context = stream_context_create($opts);

  // Header set - will take effect on this request
  $content = file_get_contents($url, NULL, $context);
  if (!empty($content)) {
    file_put_contents($save_as, $content);
  }
  else {
    watchdog('taxonomy_xml', 'Failed to retrieve valid content from URL <a href="!url">!url</a>', array(
      '!url' => $url,
    ), WATCHDOG_ERROR);
  }
  $old_data = $content;
  return $content;
}

/**
 * Utility menu callback.
 */
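function taxonomy_xml_flush_cache_file($hash) {
  // Cache files are named by MD5 hash. Reject anything else, so a crafted
  // argument cannot point outside the cache directory.
  if (!preg_match('/^[0-9a-f]{32}$/', $hash)) {
    return "Invalid cache file name";
  }
  $cachedir = file_directory_path() . '/url_cache';
  unlink($cachedir . '/' . $hash);
  return "Deleted {$hash}";
}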

/**
 * Deletes ALL cached files. Accessed in the 'advanced' section of the import
 * form.
 */
function taxonomy_xml_flush_file_cache() {
  $cachedir = file_directory_path() . '/url_cache';
  $file_list = file_scan_directory($cachedir, '.*');
  foreach ($file_list as $file_info) {
    unlink($file_info->filename);
  }
  return "Deleted all files within {$cachedir}";
}

/**
 * Fetch a vocabulary by name.
 *
 * Utility Func extending taxonomy.module
 *
 * @return
 *  Vocabulary  object. NULL if not found
 *
 * @ingroup utility
 */
function taxonomy_xml_get_vocabulary_by_name($name) {
  $vs = taxonomy_get_vocabularies();
  foreach ($vs as $voc) {
    if ($voc->name == $name) {
      return $voc;
    }
  }
}

/**
 * Get a term by name from the given vocab.
 *
 * Loads the related terms and parent links as well.
 *
 * @see taxonomy_get_term_by_name()
 *
 * @return
 *  Term  object. NULL if not found
 *
 * @ingroup utility
 */
function taxonomy_xml_get_term_by_name_from_vocab($name, $vid) {
  $matched_terms = taxonomy_get_term_by_name($name);

  // This returns fully loaded terms with peripheral data attached.
  // Check it's the right vocab
  while (($term = array_pop($matched_terms)) && $term->vid != $vid) {
    continue;
  }
  if (!$term) {
    return NULL;
  }

  // What? there is no full taxonomy_term_load()?
  if ($parent_list = array_keys(taxonomy_get_parents($term->tid))) {
    $term->parent = array_combine($parent_list, $parent_list);
  }
  if ($relation_list = array_keys(taxonomy_get_related($term->tid))) {
    $term->relations = array_combine($relation_list, $relation_list);
  }
  $term->synonyms_array = taxonomy_get_synonyms($term->tid);

  // Do a full term load, extra data may be available
  module_invoke_all('taxonomy_term_load', $term);
  return $term;
}

/**
 * Return the full, internal taxonomy/term/n URI
 *
 * Note that we must always use the taxonomy/term/n system path, not the alias.
 *
 * @param
 *   Either  a $term or $tid is fine
 */
function taxonomy_xml_rdf_taxonomy_term_path($term) {

  #return url('taxonomy/term/' . (is_numeric($term) ? $term : ((is_object($term)) ? $term->tid : $term['tid'] )), array('absolute' => TRUE));

  // Don't alias this, Drupal, use the system path.
  return url('taxonomy/term/', array(
    'absolute' => TRUE,
  )) . (is_numeric($term) ? $term : (is_object($term) ? $term->tid : $term['tid']));
}

/**
 * A variation of taxonomy_get_term - load all the peripheral term info too
 */
function taxonomy_xml_get_term($tid, $vid) {
  if (!$vid) {
    $term = taxonomy_get_term($tid);
    $vid = $term->vid;
  }
  return taxonomy_get_tree($vid, $tid, -1, $max_depth = 1);
}

/**
 * Appends any missing data to the given term - by ref.
 */
function taxonomy_xml_taxonomy_term_load($term) {
  if ($parent_list = array_keys(taxonomy_get_parents($term->tid))) {
    $term->parent = array_combine($parent_list, $parent_list);
  }
  if ($relation_list = array_keys(taxonomy_get_related($term->tid))) {
    $term->relations = array_combine($relation_list, $relation_list);
  }
  $term->synonyms_array = taxonomy_get_synonyms($term->tid);
  if ($guid = taxonomy_xml_get_term_guid($term)) {
    $term->guid = $guid;
  }
}

/**
 * Helper function to check if a given string looks like an LSID.
 * If so, it returns it in an array of component bits.
 * If not, returns NULL.
 *
 * LSID is a "Life Sciences Identifier" GUID used to identify Taxonomic
 * concepts.
 * It's not great, and it's probably not going to carry on living, but it's out
 * there in metadata and web services.
 *
 * For convenience, it also returns a namespaced 'type' so we can quickly see
 * what 'type' of resource the LSID is referring to.
 * eg an LSID starting with 'urn:lsid:ubio.org:classificationbank' is the type
 * of entity that Drupal will call a term, and TCS would call a TaxonConcept.
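 *
 * A minimal usage sketch. The identifier 1164063 is a made-up value used only
 * for illustration.
 * @code
 * $lsid = taxonomy_xml_parse_lsid('urn:lsid:ubio.org:classificationbank:1164063');
 * // $lsid['authority'] is 'ubio.org'.
 * // $lsid['identifier'] is '1164063'.
 * // $lsid['type'] is 'urn:lsid:ubio.org:classificationbank'.
 * // $lsid['version'] is NULL, as no sixth part was given.
 *
 * // Anything with fewer than five colon-separated parts is rejected.
 * $not_lsid = taxonomy_xml_parse_lsid('http://example.com/');
 * // $not_lsid is NULL.
 * @endcode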
 */
function taxonomy_xml_parse_lsid($id) {
  $bits = explode(':', $id);
  if (count($bits) < 5) {
    return NULL;
  }
  $lsid = array(
    'urn' => $bits[0],
    'schema' => $bits[1],
    'authority' => $bits[2],
    'namespace' => $bits[3],
    'identifier' => $bits[4],
    'version' => @$bits[5],
    # optional
    'type' => implode(':', array(
      $bits[0],
      $bits[1],
      $bits[2],
      $bits[3],
    )),
  );
  if (count($bits) > 4 && $lsid['urn'] == 'urn' && $lsid['schema'] == 'lsid') {
    return $lsid;
  }
  return NULL;
}

/**
 * Manage batch queues by dividing them into recursive 'rounds'.
 *
 * This is required because the current implementation of batch processing isn't
 * actually as atomic as it looks, and it's easy to hit max_allowed_packets just
 * by adding things to the queue.
 *
 * Given a placeholder term item, make a note that it should be retrieved and
 * analysed when possible.
 *
 * @param $term a template term object, must include at least a URI that
 * indicates where the rest of the information should be sourced from.
 *
 * If no $term is given, this function RETURNS the remaining queue so far, in
 * the form of a batch configuration object that can be batch_set().
 *
 * Retrieving the queue empties the static list held here, so it can only be
 * done once.
 *
 * The returned queue will include a rider at the end of the operations that
 * will trigger a recursion if it finds that new terms have been added to this
 * list in the most recent round.
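 *
 * A minimal sketch of both calling modes, assuming a placeholder term that
 * carries only a guid. The URL is hypothetical.
 * @code
 * // Queue a placeholder; it will be fetched and parsed in a later round.
 * $placeholder = (object) array('guid' => 'http://example.com/concept/1');
 * taxonomy_xml_add_term_to_batch_queue($placeholder);
 *
 * // Called with no argument, collect (and empty) the queue. NULL means
 * // there was nothing queued, so there is nothing to batch_set().
 * if ($batch = taxonomy_xml_add_term_to_batch_queue()) {
 *   batch_set($batch);
 * }
 * @endcode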
 *
 * @ingroup batch_operations
 */
function taxonomy_xml_add_term_to_batch_queue($term = NULL) {
  if ($term) {
    $_SESSION['taxonomy_xml_batch_queue'][$term->guid] = array(
      'taxonomy_xml_import_from_url',
      array(
        $term,
      ),
    );
    watchdog('taxonomy_xml', "Batch Queued %term for import later...", array(
      '%term' => $term->guid,
    ), WATCHDOG_DEBUG);

    // To avoid overruns, ensure that batches are not too big
    if (count($_SESSION['taxonomy_xml_batch_queue']) >= TAXONOMY_XML_MAX_BATCH_SIZE) {
      batch_set(taxonomy_xml_add_term_to_batch_queue());
    }
  }
  else {

    // Called with no arg,
    // this means we want to return the queue so far, and flush it from here.
    if (!empty($_SESSION['taxonomy_xml_batch_queue'])) {

      // Prepare a batch config
      $batch_settings = array(
        'finished' => 'taxonomy_xml_batch_import_finished',
        'title' => t('Processing all queued import requests.'),
        'init_message' => t('Starting Batch Taxonomy Import.'),
        'progress_message' => t('Processed @current out of @total. (May require further recursion)', array()),
        'error_message' => t('Batch Taxonomy Import has encountered an error.'),
      );

      // Queue up our ops, and flush them from here.
      $batch_settings['operations'] = $_SESSION['taxonomy_xml_batch_queue'];
      unset($_SESSION['taxonomy_xml_batch_queue']);
      drupal_set_message(t("Retrieving the next batch queue. %operations_count operations in this batch.", array(
        '%operations_count' => count($batch_settings['operations']),
      )));

      // The last thing each round should do is queue up the next round. Add this callback to the operations.
      $batch_settings['operations']['final'] = array(
        'taxonomy_xml_batch_requeue_more',
        array(),
      );
      return $batch_settings;
    }
    else {

      // If the queue is empty, return NULL so the caller won't get confused by an empty batch
      return NULL;
    }
  }
}

/**
 * Import data from one URL. Function used by the batch operation
 *
 * @param $term_placeholder A term object that is expected to be provided with
 * at least a URI to go fetch data from. May have other context info (like the
 * parent attribute) already set. This data should be merged onto that which is
 * absorbed.
 *
 * @ingroup  batch_operations
 */
function taxonomy_xml_import_from_url($term_placeholder, &$context) {
  $text = taxonomy_xml_cached_get_contents($term_placeholder->guid);
  if (empty($text)) {
    drupal_set_message(__FUNCTION__ . ' ' . t('Retrieved no content from URL %url. Returning failure.', array(
      '%url' => $term_placeholder->guid,
    )), 'error');
    return FALSE;
  }

  #drupal_set_message(t('Retrieved Queued URL %url. Now processing it.', array('%url' => $term_placeholder->guid)));

  // taxonomy_xml_cached_get_contents() used content negotiation.
  // @todo maybe add rdf-alternative awareness.
  // If we got given XHTML+RDFa, maybe that's useful too
  $format = variable_get('taxonomy_xml_format', 'rdf');
  $vid = isset($term_placeholder->vid) ? $term_placeholder->vid : variable_get('taxonomy_xml_vid', 0);

  // Conditionally include and invoke the appropriate format library
  module_load_include('inc', 'taxonomy_xml', $format . '_format');
  taxonomy_xml_include_module_hooks();
  $funcname = "taxonomy_xml_{$format}_parse";
  if (function_exists($funcname)) {
    $terms = $funcname($text, $vid, $term_placeholder->guid);

    // $terms is an array, as one URL may produce several terms,
    // It also contains all the known terms we referred to this round, not just the new one.
  }
  else {
    // Nothing could be parsed, so there are no terms to report.
    $terms = array();
    watchdog('taxonomy_xml', 'Error loading expected parse function %funcname . This is pretty bad and wholly unexpected. The library %format_format must be broken?', array(
      '%funcname' => $funcname,
      '%format' => $format,
    ), WATCHDOG_ERROR);
  }

  #dpm(array('placeholder was' => $term_placeholder, 'terms are' => $terms));
  $context['message'] = "Imported from " . $term_placeholder->guid;
  if (!empty($terms[$term_placeholder->guid])) {
    $this_term = $terms[$term_placeholder->guid];
    $context['results'][$this_term->tid] = $this_term->name;
    $context['message'] .= "<br/>Result: " . $this_term->name;
  }
  return $terms;
}

/**
 * Batch callback action that should happen at the end of each round of
 * processing.
 *
 * The last thing that happens in a batch 'round' is the next round gets added
 * to the queue.
 *
 * @ingroup  batch_operations
 */
function taxonomy_xml_batch_requeue_more(&$context) {
  if ($new_jobs = taxonomy_xml_add_term_to_batch_queue()) {
    $context['message'] = t("Finished one round of imports, but the process found still more to do. Restarting to process a further %remaining_count items.", array(
      '%remaining_count' => count($new_jobs['operations']),
    ));
    batch_set($new_jobs);
    watchdog('taxonomy_xml', $context['message']);
  }
}

/**
 * Finalization callback when a round is done.
 *
 * @ingroup batch_operations
 */
function taxonomy_xml_batch_import_finished($success, $results, $operations) {
  if ($success) {
    $message = t("Completed a batch round. %count items processed.", array(
      '%count' => count($results),
    ));
    foreach ($results as $tid => $term_name) {
      $message .= " " . l($term_name, 'taxonomy/term/' . $tid);
    }
  }
  else {

    // An error occurred.
    // $operations contains the operations that remained unprocessed.
    $error_operation = reset($operations);
    $message = 'An error occurred while processing ' . $error_operation[0] . ' with arguments :' . print_r($error_operation[1], TRUE);
    watchdog('taxonomy_xml', "Batch error " . print_r(array(
      $success,
      $results,
      $operations,
    ), 1), array(), WATCHDOG_ERROR);
  }
  watchdog('taxonomy_xml', $message);
  drupal_set_message($message);
}

/**
 * Return a list of known taxonomy services. Optionally filtered by type,
 * or formatted for re-use.
 *
 * The sample services distributed with this module are in the file
 * lookup_services.inc
 *
 * @see taxonomy_xml_taxonomy_servers()
 *
 * @param $type
 *   If set, return only services of the named type. Currently 'search' or
 *   'lookup'.
 * @param $mode
 *   Either 'full', 'options' or 'links'. 'full' returns the whole description,
 *   'options' returns an array suitable for use in a select box, 'links' may
 *   return a list linked to more information about the service. @todo
 *
 * @return
 *   A structured array describing each service and how it can be invoked.
 *
 * @ingroup taxonomy_xml_services
 */
function taxonomy_xml_lookup_services($type = NULL, $mode = 'full') {

  // Use a hook lookup to allow any later modules to provide new search services
  module_load_include('inc', 'taxonomy_xml', 'lookup_services');
  static $services;
  if (empty($services)) {
    $services = module_invoke_all('taxonomy_servers');
  }
  $requested = $services;
  if ($type) {

    // filter out unwanted
    foreach ($requested as $id => $service) {
      if ($service['servicetype'] != $type) {
        unset($requested[$id]);
      }
    }
  }
  if ($mode == 'options') {

    // Return an array suitable for use in a form select element
    $options = array();
    foreach ($requested as $id => $service) {
      $options[$id] = $service['provider'] . " - " . $service['name'];
    }
    return $options;
  }
  return $requested;
}

/**
 * Replace URL patterns containing placeholders for data values.
 *
 * Used when invoking GET URL services
 *
 * Given a pattern like http://example.com/lookup?id=!id&rank=!rank and array
 * ('rank' => 'Genus', 'id' => 55596) those values will be placed into the URL.
 * Note that we add '!' before doing the sub, to avoid incorrect placements,
 * otherwise we'd just use strtr()
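 *
 * A minimal usage sketch of the substitution described above, using the same
 * illustrative pattern and values (example.com is not a real service):
 * @code
 * $pattern = 'http://example.com/lookup?id=!id&rank=!rank';
 * $values = array('rank' => 'Genus', 'id' => 55596);
 * $url = taxonomy_xml_sub_placeholders_into_pattern($pattern, $values);
 * // $url is 'http://example.com/lookup?id=55596&rank=Genus'.
 * @endcode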
 *
 * @ingroup taxonomy_xml_services
 */
function taxonomy_xml_sub_placeholders_into_pattern($pattern, $values) {
  $subs = array();
  foreach ($values as $var => $val) {
    $subs['!' . $var] = $val;
  }
  return strtr($pattern, $subs);
}

/**
 * Admin help page listing details of available services
 * A menu page callback
 *
 * @ingroup taxonomy_xml_services
 */
function taxonomy_xml_about_services() {
  drupal_set_title(t('About Taxonomy Import Services'));
  $services = taxonomy_xml_lookup_services();
  $output = "";
  foreach ($services as $service) {
    @($output .= "<dt>{$service['provider']} - {$service['name']} ({$service['protocol']} {$service['servicetype']})</dt>");
    @($output .= "<dd>{$service['description']} <br/> <em>{$service['format']}</em> <br/> " . l($service['about'], $service['about']) . "</dd>");
  }
  $output = "<dl>{$output}</dl>";
  return $output;
}

/**
 * Make a request on a remote taxonomy server and process the response
 *
 * Remote services may be slow or unavailable, so we need to allow for that.
 *
 * @param
 *   $service  an array describing properties of the chosen service
 * @param
 *   $values as submitted from a form, settings (such as an
 * item ID) to invoke the service with.
 *
 * @ingroup taxonomy_xml_services
 */
function taxonomy_xml_invoke_service_request($service, $values) {
  switch ($service['protocol']) {
    case 'URI':

      // Before running, need to make sure the preferences on the form
      // are in line with those set for the service -
      // ie, the format is selected right. Maybe more later?
      $values['format'] = $service['format'];
      variable_set('taxonomy_xml_format', $values['format']);
      $req = taxonomy_xml_sub_placeholders_into_pattern($service['pattern'], $values);

      // Need to break it into steps, to try and keep the times manageable.
      // Prepare a batch config
      $batch_settings = array(
        'title' => t('Invoking a request on taxonomy server %name.', array(
          '%name' => $service['name'],
        )),
        'operations' => array(),
        // The last operation will be to see if any more jobs were queued in the meantime.
        // unlimited batch recursion.
        'finished' => 'taxonomy_xml_batch_import_finished',
      );

      // Break down the steps.
      // #1, Fetch a response from the URL
      $batch_settings['operations'][] = array(
        'taxonomy_xml_cached_get_contents',
        array(
          $req,
        ),
      );

      // #2 That would have cached the response, so next time we open that file will be quicker
      $batch_settings['operations'][] = array(
        'taxonomy_xml_invoke_import_on_url',
        array(
          $req,
          $values,
        ),
      );

      // Ensure that any pending jobs in the queue get found and done.
      $batch_settings['operations']['final'] = array(
        'taxonomy_xml_batch_requeue_more',
        array(),
      );
      batch_set($batch_settings);
      break;
    case 'FILE':
      taxonomy_xml_invoke_import_on_filepath($service['filepath'], $values);
      break;
    default:
      drupal_set_message(t("Taxonomy server protocol %protocol is not yet supported", array(
        '%protocol' => $service['protocol'],
      )), 'warning');
  }
}

/**
 * Catches watchdog calls, to expose them to the UI if I feel like it.
 *
 * This means I don't have to do my own logging!
 * @see watchdog()
 */
function taxonomy_xml_watchdog($log_message) {
  if ($log_message['type'] == 'taxonomy_xml') {
    static $watchdog_level;
    if (empty($watchdog_level)) {
      $watchdog_level = variable_get('taxonomy_xml_watchdog_level', WATCHDOG_NOTICE);
    }
    if ($log_message['severity'] > $watchdog_level) {
      return;
    }
    $levels = taxonomy_xml_watchdog_levels();
    drupal_set_message($levels[$log_message['severity']] . ": " . t($log_message['message'], $log_message['variables']));
  }
}

/**
 * Catch taxonomy deletion so we know if they need to be rebuilt (by features)
 */
function taxonomy_xml_taxonomy($op, $type, $item) {
  if ($op == 'delete' && $type == 'vocabulary') {
    $taxonomy_xml_imports = variable_get('taxonomy_xml_imports', array());
    $feature_name = str_replace('features_', '', $item['module']);
    unset($taxonomy_xml_imports[$feature_name]);
    variable_set('taxonomy_xml_imports', $taxonomy_xml_imports);

    // Removed this vocab from the known imported list of features.
  }
}

/**
 * Text version of the integer constants. Used for log display.
 */
function taxonomy_xml_watchdog_levels() {
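  return array(
    WATCHDOG_DEBUG => 'debug',
    WATCHDOG_INFO => 'info',
    WATCHDOG_NOTICE => 'notice',
    WATCHDOG_WARNING => 'warning',
    WATCHDOG_ERROR => 'error',
    // The most severe levels, so taxonomy_xml_watchdog() always finds a label.
    WATCHDOG_CRITICAL => 'critical',
    WATCHDOG_ALERT => 'alert',
    WATCHDOG_EMERG => 'emergency',
  );
}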

/**
 * Safer entity encoding.
 *
 * htmlentities doesn't actually encode strings to anything XML-safe.
 * This is a slightly laborious emulation that does decimal entities not fancy
 * character ones.
 * Taken from http://nz.php.net/manual/en/function.htmlentities.php
 *
 * As such, it's screwing things up for non-English taxonomies.
 *
 * @ingroup Utility
 */
function xmlentities($str) {
  return preg_replace('/[^\\x00-\\x25\\x27-\\x7F]/e', '"&#".ord("$0").";"', $str);
}

/**
 * Returns a HANDLE on the current working list of terms.
 *
 * Basically behaving like a global, so we can cache and share the working list.
 *
 * Remember to fetch the list by handle, eg $terms =&
 * taxonomy_xml_current_terms() if you are planning on modifying the list.
 */
function &taxonomy_xml_current_terms() {
  static $terms;
  if (!isset($terms)) {
    $terms = array();
  }
  return $terms;
}

/**
 * Return an array of alternative wordings that may be used in the input files.
 *
 * ADD TO THIS AS NEEDED, to support more XML dialects, referring to the globals
 * defined at the top of the module.
 *
 * Different input files use different words to express the same concept. This
 * array tries to translate the meanings down into the core concepts used
 * internally. The reason that this list is so big and messy is because all the
 * different academic sources I've researched just use different terminology to
 * say the same thing.
 *
 * See ISO 2788 for notes on expressing thesauri.
 * or SKOS http://www.w3.org/2004/02/skos/vocabs
 *
 * or an alternative glossary:
 * "http://www.boxesandarrows.com/view/controlled_vocabularies_a_glosso_thesaurus"
 *
 * Each of these terms are predicates that would make up a 'triple' statement.
 * For a geographical taxonomy, a sample could be:
 *
 * Subject, Predicate, Object
 *
 * "United States of America", "Narrower", "Iowa"
 * "United States of America", "Broader", "North America"
 * "United States of America", "AKA", "USA"
 * "The States", "See", "United States of America"
 * 
 * The list below is non-namespaced. SOME of these predicates could and should
 * be namespaced, and if we were working purely with RDF, they would be. In
 * reality we work with all sorts of data sources, most of which are NOT
 * namespaced. Therefore, when resolving them, we reduce complex, namespaced
 * values to the last word at the end, the shortname, before flattening all
 * these synonyms.
 *
 * This leads to some collisions, but in this case it's intentional, as the
 * concept of 'label' is broadly the same no matter what namespace it was
 * found in.
 *
 * @see taxonomy_xml_canonicize_predicates()
 */
function taxonomy_xml_relationship_synonyms() {
  static $synonyms;
  if (!isset($synonyms)) {
    $synonyms = array(
      'Related Terms' => TAXONOMY_XML_RELATED,
      'Related' => TAXONOMY_XML_RELATED,
      'related' => TAXONOMY_XML_RELATED,
      // SKOS
      'RT' => TAXONOMY_XML_RELATED,
      // ISO2788
      'seeAlso' => TAXONOMY_XML_RELATED,
      // RDFS
      'Broader Terms' => TAXONOMY_XML_PARENT,
      // Library of Congress
      'Broader' => TAXONOMY_XML_PARENT,
      'broader' => TAXONOMY_XML_PARENT,
      // SKOS
      'Broad Term' => TAXONOMY_XML_PARENT,
      'BT' => TAXONOMY_XML_PARENT,
      // ISO2788
      'subClassOf' => TAXONOMY_XML_PARENT,
      // rdfs
      'SubClassOf' => TAXONOMY_XML_PARENT,
      // contentlabel
      'ChildOf' => TAXONOMY_XML_PARENT,
      // FOAF?
      'hypernym' => TAXONOMY_XML_PARENT,
      'hyponymOf' => TAXONOMY_XML_PARENT,
      // Wordnet
      'parent' => TAXONOMY_XML_PARENT,
      // lsid.zoology.gla.ac.uk
      'is child taxon of' => TAXONOMY_XML_PARENT,
      // TCS
      'biology.organism_classification.higher_classification' => TAXONOMY_XML_PARENT,
      // Freebase
      'music.genre.parent_genre' => TAXONOMY_XML_PARENT,
      // Freebase
      'media_common.media_genre.parent_genre' => TAXONOMY_XML_PARENT,
      // Freebase
      'broaderTransitive' => TAXONOMY_XML_PARENT,
      // Seen in SKOS from IPTC
      'Narrower Terms' => TAXONOMY_XML_CHILD,
      // Library of Congress
      'Narrower' => TAXONOMY_XML_CHILD,
      'Narrow Term' => TAXONOMY_XML_CHILD,
      // FONZ
      'narrower' => TAXONOMY_XML_CHILD,
      // SKOS
      'NT' => TAXONOMY_XML_CHILD,
      // ISO2788 ' Narrower Term'
      'superClassOf' => TAXONOMY_XML_CHILD,
      // should be complementary to the rdfs subClassOf
      'ParentOf' => TAXONOMY_XML_CHILD,
      'hasChild' => TAXONOMY_XML_CHILD,
      // uBio
      'hasCAVConcept' => TAXONOMY_XML_CHILD,
      // uBio
      'hyponym' => TAXONOMY_XML_CHILD,
      'hypernymOf' => TAXONOMY_XML_CHILD,
      // Wordnet
      'is parent taxon of' => TAXONOMY_XML_CHILD,
      // TCS
      'biology.organism_classification.lower_classifications' => TAXONOMY_XML_CHILD,
      // Freebase
      'music.genre.subgenre' => TAXONOMY_XML_CHILD,
      // Freebase
      'Description' => TAXONOMY_XML_DESCRIPTION,
      'description' => TAXONOMY_XML_DESCRIPTION, // DC
      'definition' => TAXONOMY_XML_DESCRIPTION, // SKOS
      'Definition' => TAXONOMY_XML_DESCRIPTION,
      'comment' => TAXONOMY_XML_DESCRIPTION,
      'gloss' => TAXONOMY_XML_DESCRIPTION,
      'Scope Note' => TAXONOMY_XML_DESCRIPTION, // Library of Congress
      'scopeNote' => TAXONOMY_XML_DESCRIPTION, // also Library of Congress
      'note' => TAXONOMY_XML_DESCRIPTION, // SKOS
      'SN' => TAXONOMY_XML_DESCRIPTION, // ISO2788 'Scope note'
      'Used for' => TAXONOMY_XML_HAS_SYNONYM,
      'UF' => TAXONOMY_XML_HAS_SYNONYM, // "used for", Alexandria Digital Library
      // 'UFT' => TAXONOMY_XML_HAS_SYNONYM, // "used for term", Alexandria Digital Library
      'AKA' => TAXONOMY_XML_HAS_SYNONYM,
      'synonym' => TAXONOMY_XML_HAS_SYNONYM,
      'altLabel' => TAXONOMY_XML_HAS_SYNONYM, // SKOS
      // 'notation' => TAXONOMY_XML_HAS_SYNONYM, // SKOS
      'equivalentClass' => TAXONOMY_XML_HAS_SYNONYM, // OWL
      'has synonym' => TAXONOMY_XML_HAS_SYNONYM, // TCS
      'has vernacular' => TAXONOMY_XML_HAS_SYNONYM, // TCS
      'common.topic.alias' => TAXONOMY_XML_HAS_SYNONYM, // Freebase
      'biology.organism_classification.scientific_name' => TAXONOMY_XML_HAS_SYNONYM, // Freebase
      'See' => TAXONOMY_XML_SYNONYM_OF, // Library of Congress
      'USE' => TAXONOMY_XML_SYNONYM_OF, // ISO2788
      'Use' => TAXONOMY_XML_SYNONYM_OF,
      'Preferred Term' => TAXONOMY_XML_SYNONYM_OF,
      'PT' => TAXONOMY_XML_SYNONYM_OF, // 'Preferred Term'
      // Do not use 'sameas', it's not.
      'related' => TAXONOMY_XML_RELATED,
      'seeAlso' => TAXONOMY_XML_RELATED,
      'See Also' => TAXONOMY_XML_RELATED, // Library of Congress
      'memberMeronymOf' => TAXONOMY_XML_RELATED, // Wordnet
      'similarTo' => TAXONOMY_XML_RELATED, // Wordnet (1999 rdfs)
      'RT' => TAXONOMY_XML_RELATED, // 'Related Term'
      'Related Term' => TAXONOMY_XML_RELATED, // FONZ
      'Part of' => TAXONOMY_XML_IN_VOCABULARY,
      'belongs-to-facet' => TAXONOMY_XML_IN_VOCABULARY, // XFML
      'isDefinedBy' => TAXONOMY_XML_IN_VOCABULARY, // rdfs
      'inScheme' => TAXONOMY_XML_IN_VOCABULARY, // SKOS
      'name' => TAXONOMY_XML_NAME,
      'title' => TAXONOMY_XML_NAME, // DC
      'lexicalForm' => TAXONOMY_XML_NAME,
      'label' => TAXONOMY_XML_NAME,
      'scientific name' => TAXONOMY_XML_NAME,
      'Scientific Name' => TAXONOMY_XML_NAME, // uBio
      'prefLabel' => TAXONOMY_XML_NAME, // SKOS
      'type.object.name' => TAXONOMY_XML_NAME, // Freebase
      'subPropertyOf' => TAXONOMY_XML_UNUSED, // ignore this
      'hasDescriptor' => TAXONOMY_XML_UNUSED,
      'subjectIndicator' => TAXONOMY_XML_UNUSED, // from SKOS/Topic Maps. Seen in the GCL. I have no idea what it's used for
      'type.object.key' => TAXONOMY_XML_UNUSED, // unused things from Freebase
      'license' => TAXONOMY_XML_UNUSED,
      'attributionName' => TAXONOMY_XML_UNUSED,
      'attributionURL' => TAXONOMY_XML_UNUSED,
      'example' => TAXONOMY_XML_UNUSED, // unused from LoC
      'created' => TAXONOMY_XML_UNUSED, // unused from LoC
      'modified' => TAXONOMY_XML_UNUSED, // unused from LoC
      'source' => TAXONOMY_XML_UNUSED, // unused from LoC
      'editorialNote' => TAXONOMY_XML_UNUSED, // From Library of Congress. Ignore
      'closeMatch' => TAXONOMY_XML_UNUSED, // From Library of Congress. Ignore
      // closeMatch COULD become TAXONOMY_XML_RELATED, though it appears in LOC
      // to refer to off-site resources, and that would be problematic to follow.
      'music.genre.albums' => TAXONOMY_XML_UNUSED, // Freebase - ignore for better debugging
      'music.genre.artists' => TAXONOMY_XML_UNUSED, // Freebase - ignore for better debugging
      'common.topic.webpage' => TAXONOMY_XML_UNUSED, // Freebase - ignore for better debugging
      'common.topic.article' => TAXONOMY_XML_UNUSED, // Freebase - ignore for better debugging
      'location.location.geolocation' => TAXONOMY_XML_UNUSED, // Freebase - ignore for better debugging
      'type.type.expected_by' => TAXONOMY_XML_UNUSED, // Freebase - ignore for better debugging
      'common.topic.image' => TAXONOMY_XML_UNUSED, // Freebase - ignore for better debugging
      'status' => TAXONOMY_XML_UNUSED, // Seen in SKOS from "http://metadataregistry.org/"
      // Some arbitrary known values may come in handy. Store them as pure RDF if we can!
      'sameAs' => TAXONOMY_XML_OTHER_PREDICATE,
      'notation' => TAXONOMY_XML_OTHER_PREDICATE, // SKOS
      'identifier' => TAXONOMY_XML_OTHER_PREDICATE, // uBio
      'type.type.instance' => TAXONOMY_XML_OTHER_PREDICATE,
    );
  }

  // By listing the deliberately unused attributes the parser finds,
  // we can still be alerted to other unrecognised tags found in the input.
  // Perhaps they could be used. Otherwise the unused ones cause too much noise.
  // Experimental mapping of Freebase geographic areas.
  $synonyms['location.location.contains'] = TAXONOMY_XML_CHILD;
  $synonyms['location.location.containedby'] = TAXONOMY_XML_PARENT;
  return $synonyms;
}

/**
 * Implementation of hook_features_api().
 *
 * taxonomy_xml 'features' are full taxonomy source definitions. Enabling a
 * taxonomy_xml feature will install the remote vocabulary for us.
 *
 * The 'component' for this module is named 'taxonomy_xml_source', not just
 * 'taxonomy_xml', to follow the recommended practice documented in
 * features.api.php.
 */
function taxonomy_xml_features_api() {
  return array(
    'taxonomy_xml_source' => array(
      'name' => t('Taxonomy Import (taxonomy_xml)'),
      'default_hook' => 'taxonomy_xml_source_default_items',
      'feature_source' => TRUE,
      // 'default_file' => FEATURES_DEFAULTS_INCLUDED,
      // Save the settings along with the normal features taxonomy export;
      // a separate file is not needed.
      'default_file' => FEATURES_DEFAULTS_CUSTOM,
      'default_filename' => 'features.taxonomy',
      'file' => drupal_get_path('module', 'taxonomy_xml') . '/taxonomy_xml.features.inc',
      'module' => 'taxonomy_xml',
    ),
  );
}

/**
 * Implements hook_rdf_mapping().
 *
 * Copied back from D7. Defines the predicate labels to be used when generating
 * RDF.
 *
 * @return array
 *   The rdf mapping for vocabularies and terms.
 */
function taxonomy_xml_rdf_mapping() {
  $RDF_DEFAULT_BUNDLE = '';
  return array(
    array(
      'type' => 'taxonomy_term',
      'bundle' => $RDF_DEFAULT_BUNDLE,
      'mapping' => array(
        'rdftype' => array(
          'skos:Concept',
        ),
        'name' => array(
          'predicates' => array(
            'rdfs:label',
            'skos:prefLabel',
          ),
        ),
        'description' => array(
          'predicates' => array(
            'skos:definition',
          ),
        ),
        'vid' => array(
          'predicates' => array(
            'skos:inScheme',
          ),
          'type' => 'rel',
          'callback' => 'taxonomy_xml_taxonomy_vocabulary_uri',
        ),
        'parent' => array(
          'predicates' => array(
            'skos:broader',
          ),
          'type' => 'rel',
          'callback' => 'taxonomy_xml_taxonomy_term_uri',
        ),
        'child' => array(
          'predicates' => array(
            'skos:narrower',
          ),
          'type' => 'rel',
          'callback' => 'taxonomy_xml_taxonomy_term_uri',
        ),
        'synonyms_array' => array(
          'predicates' => array(
            'skos:altLabel',
          ),
        ),
        'related' => array(
          'predicates' => array(
            'skos:related',
          ),
          'type' => 'rel',
          'callback' => 'taxonomy_xml_taxonomy_term_uri',
        ),
      ),
    ),
    array(
      'type' => 'taxonomy_vocabulary',
      'bundle' => $RDF_DEFAULT_BUNDLE,
      'mapping' => array(
        'rdftype' => array(
          'skos:ConceptScheme',
        ),
        'name' => array(
          'predicates' => array(
            'dc:title',
          ),
        ),
        'description' => array(
          'predicates' => array(
            'rdfs:comment',
          ),
        ),
      ),
    ),
  );
}
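
The mapping array returned by taxonomy_xml_rdf_mapping() can be read by looking up the predicates registered for a field of a given type. The following is a minimal sketch, not part of the module: example_rdf_predicates_for() is a hypothetical helper, and the $mappings literal is a trimmed stand-in for the real return value so that the snippet runs outside Drupal.

```php
<?php

/**
 * Hypothetical helper: find the predicates mapped to a field.
 *
 * @param array $mappings
 *   A list shaped like the return value of taxonomy_xml_rdf_mapping().
 * @param string $type
 *   The mapped type, e.g. 'taxonomy_term'.
 * @param string $field
 *   The field name, e.g. 'parent'.
 *
 * @return array
 *   The predicate names, or an empty array if nothing is mapped.
 */
function example_rdf_predicates_for($mappings, $type, $field) {
  foreach ($mappings as $definition) {
    if ($definition['type'] != $type) {
      continue;
    }
    if (isset($definition['mapping'][$field]['predicates'])) {
      return $definition['mapping'][$field]['predicates'];
    }
  }
  return array();
}

// Trimmed stand-in for the structure returned by taxonomy_xml_rdf_mapping().
$mappings = array(
  array(
    'type' => 'taxonomy_term',
    'bundle' => '',
    'mapping' => array(
      'rdftype' => array('skos:Concept'),
      'name' => array('predicates' => array('rdfs:label', 'skos:prefLabel')),
      'parent' => array(
        'predicates' => array('skos:broader'),
        'type' => 'rel',
      ),
    ),
  ),
);

// Prints the two predicates registered for the term name.
print implode(', ', example_rdf_predicates_for($mappings, 'taxonomy_term', 'name')) . "\n";
```

Returning an empty array for an unmapped field lets callers loop over the result without a separate existence check.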

Functions

Name Description
taxonomy_xml_about_services Admin help page listing details of available services A menu page callback
taxonomy_xml_absorb_vocabulary_definitions Create Vocabulary definitions.
taxonomy_xml_add_all_children_to_queue Queue up an import action.
taxonomy_xml_add_term_to_batch_queue Manage batch queues by dividing them into recursive 'rounds'.
taxonomy_xml_batch_import_finished Finalization callback when a round is done.
taxonomy_xml_batch_requeue_more Batch callback action that should happen at the end of each round of processing.
taxonomy_xml_cached_get_contents A caching version of file_get_contents.
taxonomy_xml_canonicize_predicates Convert aliased predicates into common ones.
taxonomy_xml_current_terms Returns a HANDLE on the current working list of terms.
taxonomy_xml_export Page callback. Displays an unordered list of all available vocabularies for export
taxonomy_xml_export_term Return a representation of the requested term
taxonomy_xml_features_api Implementation of hook_features_api().
taxonomy_xml_fetch_and_import Fetches the data according to the given method, then invokes the import on that data.
taxonomy_xml_file Return a flat file representation of the requested vocab
taxonomy_xml_flush_cache_file Utility menu callback.
taxonomy_xml_flush_file_cache Deletes ALL cached files. Accessed in the 'advanced' section of the import form.
taxonomy_xml_formats Return a list of available file formats.
taxonomy_xml_get_term A variation of taxonomy_get_term - load all the peripheral term info too
taxonomy_xml_get_term_ancestors Return a list of all terms in the ancestry chain of the given term.
taxonomy_xml_get_term_by_guid Special lookup for terms if they are saved with a URI or GUID
taxonomy_xml_get_term_by_name_from_vocab Get a term by name from the given vocab.
taxonomy_xml_get_term_guid Return (and remember) the URI associated with this term.
taxonomy_xml_get_term_uri Deprecated, to avoid conflict with D7 entities, which use uri for their own purpose
taxonomy_xml_get_vocabulary_by_name Fetch a vocabulary by name.
taxonomy_xml_help Implementation of hook_help().
taxonomy_xml_import Menu callback for the import page.
taxonomy_xml_import_form Builds the import form.
taxonomy_xml_import_form_submit Imports the actual XML.
taxonomy_xml_import_from_url Import data from one URL. Function used by the batch operation
taxonomy_xml_include_module_hooks Load includes for any modules that implement extensions to import or export processes.
taxonomy_xml_invoke_import Do the actual importing from the given string, based on the parameters passed from the form.
taxonomy_xml_invoke_import_on_filepath Load a vocabulary from the given local filepath and import it
taxonomy_xml_invoke_import_on_url Load a vocabulary from the given URL and import it
taxonomy_xml_invoke_service_request Make a request on a remote taxonomy server and process the response
taxonomy_xml_label_from_uri Given a URI ending in a #fragment or a /basename, return just that string.
taxonomy_xml_lookup_services Return a list of known taxonomy services. Optionally filtered by type, or formatted for re-use.
taxonomy_xml_menu Implementation of hook_menu: Define menu links.
taxonomy_xml_merge_predicates_into_attributes Merge all predicate data into a simpler array
taxonomy_xml_parse_lsid Helper function to check if a given string looks like an LSID. If so, it returns it in an array of component bits. If not, returns NULL.
taxonomy_xml_perm Implementation of hook_perm().
taxonomy_xml_rdf_mapping Implements hook_rdf_mapping().
taxonomy_xml_rdf_taxonomy_term_path Return the full, internal taxonomy/term/n URI
taxonomy_xml_relationship_synonyms Return an array of alternative wordings that may be used in the input files.
taxonomy_xml_set_term_guid Insert a URI serialization into a term object.
taxonomy_xml_set_term_relations Given a list of terms, set the related-terms and structure, and save again
taxonomy_xml_sub_placeholders_into_pattern Replace URL patterns containing placeholders for data values.
taxonomy_xml_taxonomy Catch taxonomy deletion so we know if they need to be rebuilt (by features)
taxonomy_xml_taxonomy_term_load Appends any missing data to the given term - by ref.
taxonomy_xml_watchdog Catches watchdog calls, to expose them to the UI if I feel like it.
taxonomy_xml_watchdog_levels Text version of the integer constants. Used for log display.
xmlentities Safer entity encoding.
_taxonomy_xml_get_term_placeholder Either fetch the named term if it exists, or return a useful placeholder.
_taxonomy_xml_get_vocabulary_placeholder Fetch the named vocab if it exists
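
As an illustration of one of the helpers listed above, taxonomy_xml_label_from_uri is described as returning the #fragment or /basename of a URI. The sketch below shows one way that behaviour could be written. It is not the module's code: example_label_from_uri() is a hypothetical stand-in, and giving the fragment priority over the path is an assumption.

```php
<?php

/**
 * Hypothetical stand-in: derive a short label from a URI.
 *
 * Assumes a #fragment, when present, is preferred over the path basename.
 */
function example_label_from_uri($uri) {
  $fragment = parse_url($uri, PHP_URL_FRAGMENT);
  if (is_string($fragment) && $fragment !== '') {
    return $fragment;
  }
  $path = parse_url($uri, PHP_URL_PATH);
  if (is_string($path) && $path !== '') {
    // basename() also copes with a trailing slash on the path.
    return basename($path);
  }
  return '';
}

print example_label_from_uri('http://www.w3.org/2004/02/skos/core#Concept') . "\n";
print example_label_from_uri('http://example.com/vocab/Animals') . "\n";
```

The first call prints the fragment (Concept) and the second prints the last path segment (Animals).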

Constants

Name Description
TAXONOMY_XML_ADMIN Menu root for our items
TAXONOMY_XML_ALLOWED_UPLOAD_EXTENSIONS As we allow upload, we should limit it to expected types
TAXONOMY_XML_CHILD
TAXONOMY_XML_CREATE_NEW
TAXONOMY_XML_DESCRIPTION
TAXONOMY_XML_DETERMINED_BY_SOURCE_FILE Used to define some flags for the import options
TAXONOMY_XML_HAS_SYNONYM
TAXONOMY_XML_IN_VOCABULARY
TAXONOMY_XML_MAX_BATCH_SIZE The batch process is tricky. When we import a document, it may have any number of external references that need retrieving and resolving. As they are found, they are added to a queue. As the queue is processed, it in turn adds more references…
TAXONOMY_XML_NAME
TAXONOMY_XML_NULL
TAXONOMY_XML_OTHER_PREDICATE
TAXONOMY_XML_PARENT
TAXONOMY_XML_RELATED Canonical predicates
TAXONOMY_XML_SYNONYM_OF
TAXONOMY_XML_UNUSED
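
The predicate constants above are the canonical names onto which the table from taxonomy_xml_relationship_synonyms() maps the many spellings found in source files. The following is a minimal sketch of that normalisation step, not the module's implementation: example_canonicize_predicates() is a hypothetical helper, the synonym table is a small excerpt, and keeping unrecognised predicates under their original name is an assumption.

```php
<?php

// Same values as the module's own define() calls; guarded so the snippet
// also runs where the module is already loaded.
if (!defined('TAXONOMY_XML_PARENT')) {
  define('TAXONOMY_XML_PARENT', 'Broader Terms');
}
if (!defined('TAXONOMY_XML_CHILD')) {
  define('TAXONOMY_XML_CHILD', 'Narrower Terms');
}
if (!defined('TAXONOMY_XML_DESCRIPTION')) {
  define('TAXONOMY_XML_DESCRIPTION', 'Definition');
}

/**
 * Hypothetical helper: regroup raw predicate data under canonical names.
 */
function example_canonicize_predicates($predicates, $synonyms) {
  $canonical = array();
  foreach ($predicates as $predicate => $values) {
    // Unknown predicates keep their original name so they stay visible.
    $key = isset($synonyms[$predicate]) ? $synonyms[$predicate] : $predicate;
    if (!isset($canonical[$key])) {
      $canonical[$key] = array();
    }
    // Cast so that a single string value is handled like a list of one.
    $canonical[$key] = array_merge($canonical[$key], (array) $values);
  }
  return $canonical;
}

// Small excerpt of the synonym table.
$synonyms = array(
  'broader' => TAXONOMY_XML_PARENT,
  'BT' => TAXONOMY_XML_PARENT,
  'narrower' => TAXONOMY_XML_CHILD,
  'scopeNote' => TAXONOMY_XML_DESCRIPTION,
);

// Made-up input mixing SKOS and ISO2788 spellings.
$raw = array(
  'broader' => array('Mammals'),
  'BT' => array('Vertebrates'),
  'scopeNote' => 'Domesticated felines.',
  'colour' => array('tabby'),
);

print_r(example_canonicize_predicates($raw, $synonyms));
```

Here both 'broader' and 'BT' end up under 'Broader Terms', which is why a single downstream code path can handle input from several thesaurus formats.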