
function taxonomy_xml_relationship_synonyms in Taxonomy import/export via XML 7

Same name and namespace in other branches
  1. 5.2 taxonomy_xml.module \taxonomy_xml_relationship_synonyms()
  2. 5 taxonomy_xml.module \taxonomy_xml_relationship_synonyms()
  3. 6.2 taxonomy_xml.module \taxonomy_xml_relationship_synonyms()
  4. 6 taxonomy_xml.module \taxonomy_xml_relationship_synonyms()

Return an array of alternative wordings that may be used in the input files.

ADD TO THIS AS NEEDED to support more XML dialects, mapping each new word to one of the TAXONOMY_XML_* constants defined at the top of the module.

Different input files use different words to express the same concept. This array translates those words down into the core concepts used internally. The list is big and messy because the different academic sources I've researched each use their own terminology to say the same thing.
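As a rough illustration of the translation this array performs, the sketch below looks input words up in a small excerpt of the table. The TAXONOMY_XML_* values here are placeholder strings assumed for the example (the real constants are defined at the top of taxonomy_xml.module and may hold different values), and `example_canonical_predicate()` is a hypothetical helper name.

```php
<?php
// Placeholder values (assumed): the real constants live in taxonomy_xml.module.
const TAXONOMY_XML_PARENT = 'parent';
const TAXONOMY_XML_CHILD = 'child';
const TAXONOMY_XML_RELATED = 'related';

/**
 * Hypothetical helper: maps a dialect-specific predicate to a core concept.
 * Returns NULL when the word is not in the table.
 */
function example_canonical_predicate(string $predicate, array $synonyms): ?string {
  return $synonyms[$predicate] ?? NULL;
}

// A small excerpt of the synonyms table.
$synonyms = array(
  'Broader Terms' => TAXONOMY_XML_PARENT, // Library of Congress
  'broader' => TAXONOMY_XML_PARENT,       // SKOS
  'BT' => TAXONOMY_XML_PARENT,            // ISO2788
  'subClassOf' => TAXONOMY_XML_PARENT,    // rdfs
  'NT' => TAXONOMY_XML_CHILD,             // ISO2788
  'RT' => TAXONOMY_XML_RELATED,           // ISO2788
);

// Four wordings from four different sources collapse to one concept.
foreach (array('Broader Terms', 'broader', 'BT', 'subClassOf') as $word) {
  echo $word, ' => ', example_canonical_predicate($word, $synonyms), "\n";
}
```

PHP array keys are case-sensitive, which is why the real table has to list variants such as 'Broader' and 'broader' separately.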

See ISO 2788 for notes on expressing thesauri, or SKOS: http://www.w3.org/2004/02/skos/vocabs

Or an alternative glossary: http://www.boxesandarrows.com/view/controlled_vocabularies_a_glosso_thes...

Each of these terms is a predicate that would form part of a 'triple' statement. For a geographical taxonomy, a sample could be:

Subject, Predicate, Object
"United States of America", "Narrower", "Iowa"
"United States of America", "Broader", "North America"
"United States of America", "AKA", "USA"
"The States", "See", "United States of America"

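The sample triples can be held as plain arrays. The sketch below canonicalizes each predicate through an excerpt of the table and groups the results per subject, roughly the shape a term needs before it is saved. It is a hypothetical illustration, not the module's own code: `example_group_triples()` is an invented name and the constant values are assumed placeholders.

```php
<?php
// Placeholder values (assumed); the real ones are defined in taxonomy_xml.module.
const TAXONOMY_XML_PARENT = 'parent';
const TAXONOMY_XML_CHILD = 'child';
const TAXONOMY_XML_HAS_SYNONYM = 'has_synonym';
const TAXONOMY_XML_SYNONYM_OF = 'synonym_of';

// Excerpt of the synonyms table covering the sample predicates.
$synonyms = array(
  'Narrower' => TAXONOMY_XML_CHILD,
  'Broader' => TAXONOMY_XML_PARENT,
  'AKA' => TAXONOMY_XML_HAS_SYNONYM,
  'See' => TAXONOMY_XML_SYNONYM_OF,
);

// The sample statements as (subject, predicate, object) triples.
$triples = array(
  array('United States of America', 'Narrower', 'Iowa'),
  array('United States of America', 'Broader', 'North America'),
  array('United States of America', 'AKA', 'USA'),
  array('The States', 'See', 'United States of America'),
);

/**
 * Hypothetical helper: groups triples by subject, keyed by core concept.
 * Predicates missing from the table are kept under their original name.
 */
function example_group_triples(array $triples, array $synonyms): array {
  $terms = array();
  foreach ($triples as list($subject, $predicate, $object)) {
    $concept = $synonyms[$predicate] ?? $predicate;
    $terms[$subject][$concept][] = $object;
  }
  return $terms;
}

print_r(example_group_triples($triples, $synonyms));
```

Each concept holds a list of objects because one subject can have several parents, children or synonyms.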
The list below is non-namespaced. SOME of these predicates could and should be namespaced, and if we were working purely with RDF, they would be. In reality we work with all sorts of data sources, most of which are NOT namespaced. Therefore, when resolving them, we reduce complex, namespaced values to the last word at the end, the shortname, before flattening all these synonyms.

This leads to some collisions, but here they are intentional, as the concept of 'label' is broadly the same no matter which namespace it was found in.
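A minimal sketch of the shortname reduction, assuming the namespace ends at the last '#', '/' or ':'. The module's real resolution code (see taxonomy_xml_canonicize_predicates()) may differ; `example_predicate_shortname()` is a hypothetical name, and 'http://example.org/vocab/label' is an invented URI used only to show a collision.

```php
<?php
/**
 * Hypothetical helper: reduces a namespaced predicate to its shortname,
 * i.e. whatever follows the last '#', '/' or ':'.
 * Dots are deliberately NOT treated as separators, so dotted keys such as
 * 'music.genre.parent_genre' survive intact.
 */
function example_predicate_shortname(string $predicate): string {
  // Drop trailing separators so 'http://example.org/ns/term/' still yields 'term'.
  $trimmed = rtrim($predicate, '#/:');
  // The greedy '.*' consumes everything up to the LAST separator.
  return preg_replace('~^.*[#/:]~', '', $trimmed);
}

$inputs = array(
  'http://www.w3.org/2004/02/skos/core#broader',
  'rdfs:subClassOf',
  'http://www.w3.org/2000/01/rdf-schema#label',
  'http://example.org/vocab/label',
  'music.genre.parent_genre',
  'Broader Terms',
);
foreach ($inputs as $input) {
  echo $input, ' -> ', example_predicate_shortname($input), "\n";
}
```

Both 'label' URIs reduce to the same shortname, which is the intentional collision described above. Splitting on dots as well would break the dotted Freebase-style keys that appear in the table.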

See also

taxonomy_xml_canonicize_predicates()

4 calls to taxonomy_xml_relationship_synonyms()
taxonomy_xml_canonicize_predicates in ./taxonomy_xml.process.inc
Convert aliased predicates into common ones.
taxonomy_xml_csv_parse in formats/csv_format.inc
Scan the input CSV file and create a taxonomy structure out of it.
taxonomy_xml_merge_predicates_into_attributes in ./taxonomy_xml.process.inc
Merge all predicate data into a simpler array.
taxonomy_xml_relationship_synonyms_reverse in ./taxonomy_xml.module
Invert the synonyms array.
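taxonomy_xml_relationship_synonyms_reverse() is described only as inverting the synonyms array. Because many aliases share one value, a plain array_flip() would keep only the last alias per concept. The sketch below shows one lossless way to invert such a many-to-one map; it is an assumption about what an inverse could look like, not the module's actual implementation, and `example_invert_synonyms()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: inverts a many-to-one alias map into
 * concept => list of aliases, without losing any alias.
 */
function example_invert_synonyms(array $synonyms): array {
  $reverse = array();
  foreach ($synonyms as $alias => $concept) {
    $reverse[$concept][] = $alias;
  }
  return $reverse;
}

// Placeholder concept values (assumed) standing in for TAXONOMY_XML_* constants.
$synonyms = array(
  'BT' => 'parent',
  'broader' => 'parent',
  'NT' => 'child',
);

print_r(example_invert_synonyms($synonyms));
// By contrast, array_flip($synonyms) yields array('parent' => 'broader', 'child' => 'NT'),
// silently dropping 'BT'.
```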

File

./taxonomy_xml.module, line 1061
Make it possible to import and export taxonomies as XML documents.

Code

function taxonomy_xml_relationship_synonyms() {
  static $synonyms;
  if (!isset($synonyms)) {
    $synonyms = array(
      'type' => 'type', # Only used internally
      'Related Terms' => TAXONOMY_XML_RELATED,
      'Related' => TAXONOMY_XML_RELATED,
      'related' => TAXONOMY_XML_RELATED, # SKOS
      'closeMatch' => TAXONOMY_XML_RELATED, # SKOS
      'RT' => TAXONOMY_XML_RELATED, # ISO2788 'Related Term'
      'seeAlso' => TAXONOMY_XML_RELATED, # RDFS
      'Broader Terms' => TAXONOMY_XML_PARENT, # Library of Congress
      'Broader' => TAXONOMY_XML_PARENT,
      'broader' => TAXONOMY_XML_PARENT, # SKOS
      'Broad Term' => TAXONOMY_XML_PARENT,
      'BT' => TAXONOMY_XML_PARENT, # ISO2788
      'subClassOf' => TAXONOMY_XML_PARENT, # rdfs
      'SubClassOf' => TAXONOMY_XML_PARENT, # contentlabel
      'ChildOf' => TAXONOMY_XML_PARENT, # FOAF?
      'hypernym' => TAXONOMY_XML_PARENT,
      'hyponymOf' => TAXONOMY_XML_PARENT, # Wordnet
      'parent' => TAXONOMY_XML_PARENT, // lsid.zoology.gla.ac.uk
      'is child taxon of' => TAXONOMY_XML_PARENT, # TCS
      'biology.organism_classification.higher_classification' => TAXONOMY_XML_PARENT, # Freebase
      'music.genre.parent_genre' => TAXONOMY_XML_PARENT, # Freebase
      'Narrower Terms' => TAXONOMY_XML_CHILD, # Library of Congress
      'Narrower' => TAXONOMY_XML_CHILD,
      'Narrow Term' => TAXONOMY_XML_CHILD, # FONZ
      'narrower' => TAXONOMY_XML_CHILD, # SKOS
      'NT' => TAXONOMY_XML_CHILD, # ISO2788 'Narrower Term'
      'superClassOf' => TAXONOMY_XML_CHILD, # should be complementary to the rdfs subClassOf
      'ParentOf' => TAXONOMY_XML_CHILD,
      'hasChild' => TAXONOMY_XML_CHILD, # uBio
      'hasCAVConcept' => TAXONOMY_XML_CHILD, # uBio
      'hyponym' => TAXONOMY_XML_CHILD,
      'is parent taxon of' => TAXONOMY_XML_CHILD, # TCS
      'biology.organism_classification.lower_classifications' => TAXONOMY_XML_CHILD, # Freebase
      'music.genre.subgenre' => TAXONOMY_XML_CHILD, # Freebase
      'Description' => TAXONOMY_XML_DESCRIPTION,
      'description' => TAXONOMY_XML_DESCRIPTION, # DC
      'definition' => TAXONOMY_XML_DESCRIPTION, # SKOS
      'Definition' => TAXONOMY_XML_DESCRIPTION,
      'comment' => TAXONOMY_XML_DESCRIPTION,
      'gloss' => TAXONOMY_XML_DESCRIPTION,
      'Scope Note' => TAXONOMY_XML_DESCRIPTION, # Library of Congress
      'scopeNote' => TAXONOMY_XML_DESCRIPTION, # also Library of Congress
      'note' => TAXONOMY_XML_DESCRIPTION, # SKOS
      'SN' => TAXONOMY_XML_DESCRIPTION, # ISO2788 'Scope note'
      'Used for' => TAXONOMY_XML_HAS_SYNONYM,
      'AKA' => TAXONOMY_XML_HAS_SYNONYM,
      'synonym' => TAXONOMY_XML_HAS_SYNONYM,
      'altLabel' => TAXONOMY_XML_HAS_SYNONYM, # SKOS
      #'notation' => TAXONOMY_XML_HAS_SYNONYM, # SKOS
      'equivalentClass' => TAXONOMY_XML_HAS_SYNONYM, # OWL
      'has synonym' => TAXONOMY_XML_HAS_SYNONYM, # TCS
      'has vernacular' => TAXONOMY_XML_HAS_SYNONYM, # TCS
      'common.topic.alias' => TAXONOMY_XML_HAS_SYNONYM, # Freebase
      #'biology.organism_classification.scientific_name' => TAXONOMY_XML_HAS_SYNONYM, # Freebase
      'See' => TAXONOMY_XML_SYNONYM_OF, # Library of Congress
      'USE' => TAXONOMY_XML_SYNONYM_OF, # ISO2788
      'Use' => TAXONOMY_XML_SYNONYM_OF,
      'Preferred Term' => TAXONOMY_XML_SYNONYM_OF,
      'PT' => TAXONOMY_XML_SYNONYM_OF, # 'Preferred Term'

      # Do not use 'sameas', it's not.
      'See Also' => TAXONOMY_XML_RELATED, # Library of Congress
      'memberMeronymOf' => TAXONOMY_XML_RELATED, # Wordnet
      'similarTo' => TAXONOMY_XML_RELATED, # Wordnet (1999 rdfs)
      'Related Term' => TAXONOMY_XML_RELATED, # FONZ
      'Part of' => TAXONOMY_XML_IN_VOCABULARY,
      'belongs-to-facet' => TAXONOMY_XML_IN_VOCABULARY, # XFML
      'isDefinedBy' => TAXONOMY_XML_IN_VOCABULARY, # rdfs
      'inScheme' => TAXONOMY_XML_IN_VOCABULARY, # SKOS
      'member' => TAXONOMY_XML_IN_VOCABULARY, # SKOS
      'name' => TAXONOMY_XML_NAME,
      'title' => TAXONOMY_XML_NAME, # DC
      'lexicalForm' => TAXONOMY_XML_NAME,
      'label' => TAXONOMY_XML_NAME, # sig.ma and others
      'scientific name' => TAXONOMY_XML_NAME,
      'Scientific Name' => TAXONOMY_XML_NAME, # ubio
      'prefLabel' => TAXONOMY_XML_NAME, # SKOS
      'type.object.name' => TAXONOMY_XML_NAME, # Freebase

      # Listing the deliberately unused predicates here turns down the volume
      # in debug logs, while we are still alerted to other unrecognised tags
      # found in the input (perhaps they could be used).
      # Things NOT explicitly unused MAY be retained further down the process.
      'subPropertyOf' => TAXONOMY_XML_UNUSED, # ignore this
      'hasDescriptor' => TAXONOMY_XML_UNUSED,
      'subjectIndicator' => TAXONOMY_XML_UNUSED, # from SKOS/Topic Maps. Seen in the GCL. I have no idea what it's used for
      'type.object.key' => TAXONOMY_XML_UNUSED, # unused things from Freebase
      'license' => TAXONOMY_XML_UNUSED,
      'attributionName' => TAXONOMY_XML_UNUSED,
      'attributionURL' => TAXONOMY_XML_UNUSED,
      'example' => TAXONOMY_XML_UNUSED, # unused from LoC
      'created' => TAXONOMY_XML_UNUSED, # unused from LoC
      'modified' => TAXONOMY_XML_UNUSED, # unused from LoC
      'source' => TAXONOMY_XML_UNUSED, # unused from LoC
      'music.genre.albums' => TAXONOMY_XML_UNUSED, # Freebase - ignore for better debugging
      'music.genre.artists' => TAXONOMY_XML_UNUSED, # Freebase - ignore for better debugging
      'common.topic.webpage' => TAXONOMY_XML_UNUSED, # Freebase - ignore for better debugging
      'common.topic.article' => TAXONOMY_XML_UNUSED, # Freebase - ignore for better debugging
      'location.location.geolocation' => TAXONOMY_XML_UNUSED, # Freebase - ignore for better debugging
      'type.type.expected_by' => TAXONOMY_XML_UNUSED, # Freebase - ignore for better debugging
      'common.topic.image' => TAXONOMY_XML_UNUSED, # Freebase - ignore for better debugging
      'status' => TAXONOMY_XML_UNUSED, # Seen in SKOS from "http://metadataregistry.org/"
      'editorialNote' => TAXONOMY_XML_UNUSED, # From Library of Congress. Ignore
      'value' => TAXONOMY_XML_NAME, # Too unpredictable what context this may be found in
      'stylesheet' => TAXONOMY_XML_UNUSED, # RDFa - often parses to include useless stuff.
      'alternate' => TAXONOMY_XML_UNUSED, # RDFa
      'icon' => TAXONOMY_XML_UNUSED, # RDFa

      // Some arbitrary known values may come in handy. Store them as pure RDF if we can!
      'sameAs' => TAXONOMY_XML_OTHER_PREDICATE,
      'notation' => TAXONOMY_XML_OTHER_PREDICATE, # SKOS
      'identifier' => TAXONOMY_XML_OTHER_PREDICATE, # uBio
      'type.type.instance' => TAXONOMY_XML_OTHER_PREDICATE,
    );

    // Experimental mapping: Freebase geographic areas.
    // Set once, together with the rest of the static table.
    $synonyms['location.location.contains'] = TAXONOMY_XML_CHILD;
    $synonyms['location.location.containedby'] = TAXONOMY_XML_PARENT;
  }
  return $synonyms;
}
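The comments in the code explain why deliberately unused predicates are listed: it keeps them out of debug logs while anything genuinely unrecognised still surfaces. A minimal sketch of that triage, assuming placeholder constant values and a hypothetical helper `example_triage_predicates()`; the module's real logging happens elsewhere, and 'frobnicate' is an invented predicate.

```php
<?php
// Placeholder values (assumed); the real constants are defined in taxonomy_xml.module.
const TAXONOMY_XML_NAME = 'name';
const TAXONOMY_XML_UNUSED = 'unused';

/**
 * Hypothetical helper: splits incoming predicates into those to keep
 * (mapped to a core concept), those deliberately ignored, and unknown
 * ones worth reporting in a debug log.
 */
function example_triage_predicates(array $predicates, array $synonyms): array {
  $result = array('keep' => array(), 'ignored' => array(), 'unknown' => array());
  foreach ($predicates as $predicate) {
    if (!isset($synonyms[$predicate])) {
      $result['unknown'][] = $predicate;
    }
    elseif ($synonyms[$predicate] === TAXONOMY_XML_UNUSED) {
      $result['ignored'][] = $predicate;
    }
    else {
      $result['keep'][$predicate] = $synonyms[$predicate];
    }
  }
  return $result;
}

// Excerpt of the synonyms table.
$synonyms = array(
  'prefLabel' => TAXONOMY_XML_NAME,
  'editorialNote' => TAXONOMY_XML_UNUSED,
  'license' => TAXONOMY_XML_UNUSED,
);

$triage = example_triage_predicates(array('prefLabel', 'license', 'frobnicate'), $synonyms);
foreach ($triage['unknown'] as $predicate) {
  // Only these would need to show up in a debug log.
  echo "Unrecognised predicate: $predicate\n";
}
```

Without the UNUSED entries, 'license' would land in the unknown bucket and add noise to every import log.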