function taxonomy_xml_cached_get_contents in Taxonomy import/export via XML 7

Same name and namespace in other branches

6.2 taxonomy_xml.module \taxonomy_xml_cached_get_contents()
6 taxonomy_xml.module \taxonomy_xml_cached_get_contents()

A caching version of file_get_contents.

Used to minimize remote URL lookups.

A URI with a fragment (#anchor) does NOT identify a different file, so all requests for that document return the same result.
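A minimal sketch of why the fragment does not matter to the cache: the URL is cut at the first '#' before it is hashed into a cache file name. The helper name example_cache_key() and the example.com URLs are hypothetical and not part of the module.

```php
<?php
/**
 * Hypothetical helper: derive a cache key from a URL, ignoring any fragment.
 */
function example_cache_key($url) {
  // A limit of 2 is enough: only the part before the first '#' is kept.
  list($document_url) = explode('#', $url, 2);
  return md5($document_url);
}

$a = example_cache_key('http://example.com/vocab.rdf#term-1');
$b = example_cache_key('http://example.com/vocab.rdf#term-2');
$c = example_cache_key('http://example.com/vocab.rdf');

// All three requests map to the same cached document.
var_dump($a === $b && $b === $c); // bool(true)
```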

File requests tend to be either one huge document (possibly requested many times if it uses internal anchors) or many small ones. In addition to the file-level cache, the most recently used (MRU) result is cached statically. Because we work in batches, the static cache will only hit some of the time, but it still helps.
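The single-entry MRU cache can be sketched as below. This is an illustration of the pattern rather than the module's code: example_mru_fetch() and the $loader callback are hypothetical names, and the loader stands in for the expensive file or network read. PHP static variables live only for the duration of one request, which is presumably why the static cache is unreliable across batch runs.

```php
<?php
/**
 * Hypothetical single-entry MRU cache built on static variables.
 */
function example_mru_fetch($key, callable $loader, $flush = FALSE) {
  static $last_key = NULL, $last_value = NULL;
  // Hit only when the same key is requested twice in a row.
  if (!$flush && $key === $last_key) {
    return $last_value;
  }
  // Miss: load the value and remember it, replacing the previous entry.
  $last_key = $key;
  $last_value = $loader($key);
  return $last_value;
}

$loads = 0;
$loader = function ($key) use (&$loads) {
  $loads++;
  return "contents of $key";
};

example_mru_fetch('a', $loader); // miss
example_mru_fetch('a', $loader); // hit, loader not called
example_mru_fetch('b', $loader); // miss, replaces 'a'
example_mru_fetch('a', $loader); // miss again
echo $loads; // prints 3
```

One huge document with many internal anchors benefits most from this pattern, since every lookup after the first is a hit.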

Uses content negotiation. When retrieving from a repository that supports it, we get the RDF out from under the rendered version. When making a request, we say "Accept: application/rdf+xml", and sometimes the server honors it.

Content negotiation is triggered by passing either ('format' => 'RDF') or ('mime' => 'application/rdf+xml') in the options. A second request for the same URL returns the (cached) result of the first regardless of the type requested, so 'flush' if the result is likely to differ.
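As a rough sketch of the negotiation step, the Accept header can be attached to the request through a PHP stream context. The helper name example_negotiation_context() is hypothetical, and the sketch handles only the 'mime' option; in the module, a 'format' option is first resolved to a MIME type via taxonomy_xml_format_info().

```php
<?php
/**
 * Hypothetical helper: build a stream context that asks for a MIME type.
 */
function example_negotiation_context(array $options = array()) {
  // Prefer RDF/XML unless the caller names another MIME type.
  $mime = !empty($options['mime']) ? $options['mime'] : 'application/rdf+xml';
  $opts = array(
    'http' => array(
      'method' => 'GET',
      // The "*/*" fallback accepts whatever the server has if it cannot comply.
      'header' => "Accept: $mime,*/*\r\n",
    ),
  );
  return stream_context_create($opts);
}

// Inspect the header without making a network request.
$context = example_negotiation_context(array('mime' => 'text/turtle'));
$params = stream_context_get_options($context);
echo $params['http']['header'];
```

The resulting context would be passed as the third argument to file_get_contents(). The server remains free to ignore the header, so callers should still check what actually came back.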

2 calls to taxonomy_xml_cached_get_contents()

taxonomy_xml_import_from_url in ./taxonomy_xml.process.inc: Import data from one URL. Function used by the batch operation
taxonomy_xml_invoke_import_on_url in ./taxonomy_xml.module: Load a vocabulary from the given URL and import it.

2 string references to 'taxonomy_xml_cached_get_contents'

taxonomy_xml_invoke_service_request in ./taxonomy_xml.process.inc: Make a request on a remote taxonomy server and process the response
url_taxonomy_service_form_submit in services/url.taxonomy_service.inc: What to do when loading from this service

File

./taxonomy_xml.module, line 600: Make it possible to import and export taxonomies as XML documents.

Code

function taxonomy_xml_cached_get_contents($url, $options = array(), $flush = FALSE) {

  // Note the currently active URL for reference when debugging in distant places.
  global $_taxonomy_xml_current_doc;
  $_taxonomy_xml_current_doc = $url;
  $url_parts = @parse_url($url);

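  // Double-check it's a URL.
  // Don't want random things that look a bit like a URL (LSID URNs) getting through here.
  // parse_url() may return FALSE or omit the scheme, so test for it explicitly.
  $scheme = isset($url_parts['scheme']) ? $url_parts['scheme'] : '';
  if (!in_array($scheme, array('http', 'https', 'ftp'), TRUE)) {
    // The severity is the fourth argument to watchdog(), not a variable.
    watchdog('taxonomy_xml', "Not retrieving remote file. !url is not an HTTP, HTTPS or FTP URL", array(
      '!url' => l($url, $url),
    ), WATCHDOG_WARNING);
    return NULL;
  }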

  // Discard any anchor before MD5-ing it or checking the cache.
  // The fragment does not identify a different document.
  list($url) = explode('#', $url, 2);

  // Check the MRU cache.
  static $old_url, $old_data;
  if ($url == $old_url && !$flush) {
    return $old_data;
  }
  $old_url = $url;

  // Drupal 6 used file_directory_path() . '/url_cache' here. See
  // http://drupal.org/update/modules/6/7#file_directory_path
  $cachedir = drupal_realpath('public://url_cache');
  $save_as = $cachedir . '/' . md5($url);
  if (file_exists($save_as)) {
    $content = file_get_contents($save_as);
    $old_data = $content;

    // Use the cached copy only if it is non-empty; zero-length responses
    // have occasionally been seen.
    if ($content) {
      // Use a separate variable so the $flush parameter is not overwritten.
      $flush_link = l("flush", TAXONOMY_XML_ADMIN . '_xml/flush/' . md5($url));
      watchdog('taxonomy_xml', "Using locally cached copy !local_copy of !url !flush", array(
        '!local_copy' => l(md5($url), $save_as),
        '!url' => l($url, $url),
        '!flush' => $flush_link,
      ), WATCHDOG_DEBUG);
      return $content;
    }
  }

  // else
  file_prepare_directory($cachedir, FILE_CREATE_DIRECTORY);

  // Try to do content negotiation here.
  if (!empty($options['format'])) {
    $format_info = taxonomy_xml_format_info($options['format']);
    if (!empty($format_info['mime'])) {
      $options['mime'] = $format_info['mime'];
    }
  }

  // Using HTTP_ACCEPT, I say I'd RATHER have rdf if you've got it.
  // Also tell them who we are and why we are scraping their data.
  $opts = array(
    'http' => array(
      'method' => "GET",
      'header' => "Accept: application/rdf+xml,*/* \r\n",
      'user_agent' => "taxonomy_xml.module data import running from a Drupal CMS. [" . variable_get('site_name', '') . "]",
    ),
  );
  if (!empty($options['mime'])) {
    watchdog('taxonomy_xml', '<a href="!url">URL</a> will be requested as MIME type "!mime"', array(
      '!url' => $url,
      '!mime' => $options['mime'],
    ), WATCHDOG_INFO);
    $opts['http']['header'] = "Accept: {$options['mime']},*/* \r\n";
  }
  $context = stream_context_create($opts);

  // Header set - will take effect on this request.
  $content = file_get_contents($url, NULL, $context);
  if (!empty($content)) {
    file_put_contents($save_as, $content);
  }
  else {
    // error_get_last() returns an array, or NULL if no error was recorded.
    $error = error_get_last();
    watchdog('taxonomy_xml', 'Failed to retrieve valid content from URL <a href="!url">!url</a>', array(
      '!url' => $url,
    ), WATCHDOG_ERROR);
    if (!empty($options['mime'])) {
      watchdog('taxonomy_xml', 'Request for <a href="!url">URL</a> failed. It may be that the server does not respond to "Accept: !mime" requests. %error', array(
        '!url' => $url,
        '!mime' => $options['mime'],
        '%error' => isset($error['message']) ? $error['message'] : '',
      ), WATCHDOG_ERROR);
    }
  }
  $old_data = $content;
  return $content;
}
