Stanford University Libraries is in the beginning stages of developing metadata creation and management processes using the ISO 191** series of standards. The libraries currently maintain a large quantity of spatial data at the Stanford Geospatial Center, located in the Earth Sciences Library. A majority of the metadata associated with these resources is provided in FGDC format; however, the metadata is often incomplete or inconsistent in essential areas.
The initial focus will be on the implementation of the following schemata:
- ISO 19115: Geographic Information – Metadata
- ISO 19115-2: Geographic Information – Metadata – Part 2: Extensions for Imagery and Gridded Data
- ISO 19139: Geographic Information – Metadata – XML Schema Implementation
- ISO 19110: Geographic Information – Methodology for Feature Cataloguing
Use of ISO 19115-2 is preferred over ISO 19115 because 19115-2 contains the entire 19115 standard as well as extension elements for describing acquisition information particular to raster datasets. It also allows for enhanced lineage and process documentation. However, the decision on which standard to employ will largely depend upon the capabilities of the system(s) used to create and exchange metadata.
The ISO standard defines a number of fixed domain values and namespaces in order to support the normalization of information. The schemas behind these namespaces supply validation rules that define input types for data entry, such as free-text character strings (gco:CharacterString), formatted dates (gco:Date), or decimal coordinates (gco:Decimal). Additionally, ISO relies heavily on codelists to provide a standard set of terms defined within a common knowledge domain. For example, an institution could use the Resource Constraints metadata element to communicate and manage access rights through its Restriction Code.
ResourceConstraints Element in ISO 19115-2:
<gmi:MI_Metadata...
<gmd:resourceConstraints>
<gmd:MD_LegalConstraints>
<gmd:accessConstraints>
<gmd:MD_RestrictionCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_RestrictionCode" codeListValue="license" codeSpace="005">license</gmd:MD_RestrictionCode>
</gmd:accessConstraints>
...</gmd:MD_LegalConstraints>
</gmd:resourceConstraints>
Definition from ISO Codelist:
<CodeDefinition gml:id="MD_RestrictionCode_license">
<gml:description>formal permission to do something</gml:description>
<gml:identifier>license</gml:identifier>
</CodeDefinition>
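As a rough illustration of how a codelist value could be checked by software, the following Python sketch reads the MD_RestrictionCode from a trimmed copy of the constraints example and compares it with the restriction codes listed in ISO 19115:2003. The function name `access_restrictions` and the hard-coded set are assumptions made for this sketch; a production workflow would more likely read the allowed values from the published codelist file.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"

# MD_RestrictionCode values as listed in ISO 19115:2003 (hard-coded here
# for illustration rather than parsed from gmxCodelists.xml).
RESTRICTION_CODES = {
    "copyright", "patent", "patentPending", "trademark", "license",
    "intellectualPropertyRights", "restricted", "otherRestrictions",
}

# Trimmed copy of the resourceConstraints example, with namespaces declared.
SAMPLE = """
<gmd:resourceConstraints xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:MD_LegalConstraints>
    <gmd:accessConstraints>
      <gmd:MD_RestrictionCode
          codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_RestrictionCode"
          codeListValue="license">license</gmd:MD_RestrictionCode>
    </gmd:accessConstraints>
  </gmd:MD_LegalConstraints>
</gmd:resourceConstraints>
"""


def access_restrictions(xml_text):
    """Return (codeListValue, is_known_code) for each access constraint."""
    root = ET.fromstring(xml_text.strip())
    path = f".//{{{GMD}}}accessConstraints/{{{GMD}}}MD_RestrictionCode"
    results = []
    for code in root.findall(path):
        value = code.get("codeListValue")
        results.append((value, value in RESTRICTION_CODES))
    return results


if __name__ == "__main__":
    for value, known in access_restrictions(SAMPLE):
        print(value, "(known code)" if known else "(not in codelist)")
```

Checking the codeListValue attribute rather than the element text is a design choice for this sketch: the attribute is the machine-readable value, while the text is a display label.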
A major departure from the FGDC-style metadata record is the encoding of entity and attribute information in a separate XML document using the ISO 19110 Feature Cataloguing specification. This schema defines a collection along with its features and their attributes. The reference to the 19110 metadata is expressed using the xlink:href attribute in the contentInfo section of the 19115-2 record. The xlink reference is generated using the universally unique identifier (uuid) assigned to each metadata file.
Example from ISO 19115-2:
<gmi:MI_Metadata...
<gmd:contentInfo>
<gmd:MD_FeatureCatalogueDescription>
<gmd:featureCatalogueCitation xlink:href="http://stanford.edu/uuid?71g91be3-1ff9-438b-bd89-9d89ec0bddfd"/>
</gmd:MD_FeatureCatalogueDescription>
</gmd:contentInfo>
...</gmi:MI_Metadata>
Example from corresponding ISO 19110 record:
<gfc:FC_FeatureCatalogue xmlns="http://www.isotc211.org/2005/gfc"
xmlns:gco="http://www.isotc211.org/2005/gco"
xmlns:gfc="http://www.isotc211.org/2005/gfc"
...uuid="71g91be3-1ff9-438b-bd89-9d89ec0bddfd">
<gmx:name>
<gco:CharacterString>Feature Catalog for Electric Transmission Lines in the United States - 1010 Update</gco:CharacterString>
</gmx:name>
...</gfc:FC_FeatureCatalogue>
The standalone 19110 record allows for the feature catalog XML to be associated with one or more metadata records without having to duplicate the information inside of another document.
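The link between the two records can be followed programmatically. The Python sketch below is a minimal illustration, assuming the uuid is carried as the query string of the xlink:href (as in the example URL above) and using trimmed, namespace-complete copies of both samples. The helper names `catalogue_ids` and `is_linked` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

GMD = "http://www.isotc211.org/2005/gmd"
XLINK = "http://www.w3.org/1999/xlink"

# Trimmed copy of the contentInfo section of the 19115-2 record.
RECORD = """
<gmd:contentInfo xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:MD_FeatureCatalogueDescription>
    <gmd:featureCatalogueCitation
        xlink:href="http://stanford.edu/uuid?71g91be3-1ff9-438b-bd89-9d89ec0bddfd"/>
  </gmd:MD_FeatureCatalogueDescription>
</gmd:contentInfo>
"""

# Trimmed copy of the 19110 record: only the root element and its uuid.
CATALOGUE = """
<gfc:FC_FeatureCatalogue xmlns:gfc="http://www.isotc211.org/2005/gfc"
                         uuid="71g91be3-1ff9-438b-bd89-9d89ec0bddfd"/>
"""


def catalogue_ids(record_xml):
    """Collect the uuids referenced by featureCatalogueCitation links."""
    root = ET.fromstring(record_xml.strip())
    ids = []
    for citation in root.iter(f"{{{GMD}}}featureCatalogueCitation"):
        href = citation.get(f"{{{XLINK}}}href")
        if href:
            # Assumption: the uuid is everything after the "?" in the URL.
            ids.append(urlsplit(href).query)
    return ids


def is_linked(record_xml, catalogue_xml):
    """True if the record references the uuid of the given catalogue."""
    catalogue_uuid = ET.fromstring(catalogue_xml.strip()).get("uuid")
    return catalogue_uuid in catalogue_ids(record_xml)


if __name__ == "__main__":
    print(catalogue_ids(RECORD))
    print(is_linked(RECORD, CATALOGUE))
```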
The flexibility provided by XLink is useful in a number of ways. One notable example is the ability to better manage contact information associated with an institution or individual. Spatial metadata may associate several responsible parties with the lifecycle of the data, and each named entity requires dozens of lines of XML. Complicating matters further, locations and access points for people and organizations will inevitably change over time. The ISO metadata model allows this information to be managed externally and associated with any number of records. The relationship can be expressed as a hyperlink that directs the user to the external document, or records can be resolved so that the external XML is included inside the main document. When modifications are made to the external file, all records that contain the xlink reference must be resolved again in order to be updated with the current information.
Example of unresolved contact element:
<gmd:contact xlink:href="http://www.stanford.edu/09A95C420FB821476665893256MOME37" xlink:title="Stanford Geospatial Center"/>
Example of a resolved record with contact information inserted from an external XML file:
<gmi:MI_Metadata...
<gmd:contact xlink:title="Stanford Geospatial Center">
<gmd:CI_ResponsibleParty uuid="09A95C420FB821476665893256MOME37">
<gmd:organisationName>
<gco:CharacterString>Stanford Geospatial Center</gco:CharacterString>
</gmd:organisationName>
<gmd:contactInfo xlink:type="simple">
<gmd:CI_Contact>
<gmd:address xlink:type="simple">
<gmd:CI_Address>
<gmd:deliveryPoint>
<gco:CharacterString>Branner Earth Sciences Library</gco:CharacterString>
</gmd:deliveryPoint>
<gmd:deliveryPoint>
<gco:CharacterString>397 Panama Mall</gco:CharacterString>
</gmd:deliveryPoint>
<gmd:city>
<gco:CharacterString>Stanford</gco:CharacterString>
</gmd:city>
<gmd:administrativeArea>
<gco:CharacterString>California</gco:CharacterString>
</gmd:administrativeArea>
<gmd:postalCode>
<gco:CharacterString>94305</gco:CharacterString>
</gmd:postalCode>
<gmd:electronicMailAddress>
<gco:CharacterString>brannerlibrary@stanford.edu</gco:CharacterString>
</gmd:electronicMailAddress>
</gmd:CI_Address>
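</gmd:address>
</gmd:CI_Contact>
</gmd:contactInfo>
...</gmd:CI_ResponsibleParty>
</gmd:contact>
...</gmi:MI_Metadata>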
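The resolution step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the behavior of any particular metadata system: the `EXTERNAL` dictionary is a hypothetical stand-in for fetching the external XML over HTTP, the contact fragment is shortened to the organisation name, and `resolve_contacts` is a name invented for the sketch.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"

# Hypothetical stand-in for the externally managed contact documents,
# keyed by the URL used in xlink:href.
EXTERNAL = {
    "http://www.stanford.edu/09A95C420FB821476665893256MOME37": """
<gmd:CI_ResponsibleParty xmlns:gmd="http://www.isotc211.org/2005/gmd"
                         xmlns:gco="http://www.isotc211.org/2005/gco"
                         uuid="09A95C420FB821476665893256MOME37">
  <gmd:organisationName>
    <gco:CharacterString>Stanford Geospatial Center</gco:CharacterString>
  </gmd:organisationName>
</gmd:CI_ResponsibleParty>
""",
}

UNRESOLVED = """
<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
                 xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:contact
      xlink:href="http://www.stanford.edu/09A95C420FB821476665893256MOME37"
      xlink:title="Stanford Geospatial Center"/>
</gmi:MI_Metadata>
"""


def resolve_contacts(record_xml, fragments):
    """Replace each empty, linked gmd:contact with the external content."""
    root = ET.fromstring(record_xml.strip())
    # list() so the tree is not modified while it is being walked.
    for contact in list(root.iter(f"{{{GMD}}}contact")):
        href = contact.get(HREF)
        if href in fragments and len(contact) == 0:
            contact.append(ET.fromstring(fragments[href].strip()))
            # Drop the href and keep xlink:title, as in the resolved example.
            del contact.attrib[HREF]
    return root


if __name__ == "__main__":
    resolved = resolve_contacts(UNRESOLVED, EXTERNAL)
    print(ET.tostring(resolved, encoding="unicode"))
```

Because the resolved copy no longer carries the href in this sketch, re-resolving after an external change would have to start again from the unresolved record.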
The ISO model supports similar methods for reusing metadata from within a single record. This is a very useful convention for encoding long stanzas of XML that are recurrent in the metadata. For example, series metadata will contain information that is universal to all resources in a data series while other metadata elements may be unique to just one specific file. Through the use of corresponding identifiers, the uuid and uuidref attributes allow for content from one section of an XML document (uuid) to be included in another section (uuidref). This approach to metadata management will improve data standardization and appearance, as well as reduce the need to express redundant information within a record.
Example from Series Metadata:
<gmd:DS_Series> ...
<gmd:MD_DataIdentification>
<gmd:citation>
<gmd:CI_Citation uuid="STA95C42-7FB1-2847-9945893256DADE10">
<gmd:title>
<gco:CharacterString>Climatic Research Unit (CRU) Time Series 3.0 Monthly Precipitation Time-Series Data (1901-2006)</gco:CharacterString>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2009-12-10</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication" codeSpace="002">publication</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
...</gmd:DS_Series>
Example of individual dataset metadata within the Series Record:
<gmd:DS_Series>...
<gmd:has>...
<gmd:identificationInfo>
<gmd:MD_DataIdentification>
<gmd:citation uuidref="STA95C42-7FB1-2847-9945893256DADE10"/>
<gmd:extent>
<gmd:EX_Extent>
<gmd:temporalElement>
<gmd:EX_TemporalExtent>
<gmd:extent>
<gml:TimePeriod gml:id="TimePeriod_1">
<gml:begin>
<gml:TimeInstant gml:id="begin_1">
<gml:timePosition>1901-01-01</gml:timePosition>
</gml:TimeInstant>
</gml:begin>
<gml:end>
<gml:TimeInstant gml:id="end_1">
<gml:timePosition>1901-01-31</gml:timePosition>
</gml:TimeInstant>
</gml:end>
</gml:TimePeriod>
...</gmd:extent>
...</gmd:DS_Series>
In the above example, the citation information from the identification section is inherited from the series metadata. The temporal extent metadata, which is specific to this particular file, is recorded separately.
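The uuid/uuidref mechanism can be illustrated with a short Python sketch that copies the content carrying a given uuid into every empty element that points to it with uuidref. The sample document is heavily trimmed (it is not schema-valid as written), and `expand_uuidrefs` is a hypothetical helper, not part of any ISO tooling.

```python
import copy
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

# Heavily trimmed series record: one citation defined with a uuid and one
# dataset-level citation that refers to it with uuidref.
SERIES = """
<gmd:DS_Series xmlns:gmd="http://www.isotc211.org/2005/gmd"
               xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:citation>
    <gmd:CI_Citation uuid="STA95C42-7FB1-2847-9945893256DADE10">
      <gmd:title>
        <gco:CharacterString>Climatic Research Unit (CRU) Time Series 3.0 Monthly Precipitation Time-Series Data (1901-2006)</gco:CharacterString>
      </gmd:title>
    </gmd:CI_Citation>
  </gmd:citation>
  <gmd:has>
    <gmd:citation uuidref="STA95C42-7FB1-2847-9945893256DADE10"/>
  </gmd:has>
</gmd:DS_Series>
"""


def expand_uuidrefs(xml_text):
    """Copy each uuid-identified element into the elements that reference it."""
    root = ET.fromstring(xml_text.strip())
    by_uuid = {el.get("uuid"): el for el in root.iter() if el.get("uuid")}
    for holder in list(root.iter()):
        ref = holder.get("uuidref")
        if ref in by_uuid and len(holder) == 0:
            clone = copy.deepcopy(by_uuid[ref])
            # Avoid two elements with the same uuid in the expanded document.
            clone.attrib.pop("uuid", None)
            holder.append(clone)
    return root


if __name__ == "__main__":
    expanded = expand_uuidrefs(SERIES)
    for title in expanded.iter(f"{{{GCO}}}CharacterString"):
        print(title.text)
```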
The structure of series records will largely depend upon the system used to create metadata. While the above example validates according to the schema definition, some systems create individual XML files that are linked through parent/child relationships, in which each child record references the series metadata file through a parentIdentifier uuid.
For example, the series record below uses the Scope Code ‘series’, and its fileIdentifier is used by any child dataset metadata that belongs to the series:
<gmd:fileIdentifier>
<gco:CharacterString>74d83c27-09e2-4e89-9d1e-a2f4af1d87e7</gco:CharacterString>
</gmd:fileIdentifier>
<gmd:hierarchyLevel>
<MD_ScopeCode xmlns="http://www.isotc211.org/2005/gmd"
codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode"
codeListValue="series"/>
</gmd:hierarchyLevel>
Example from corresponding child record with Scope Code ‘dataset’ and referenced parent (series record) identifier:
<gmd:parentIdentifier>
<gco:CharacterString>74d83c27-09e2-4e89-9d1e-a2f4af1d87e7</gco:CharacterString>
</gmd:parentIdentifier>
<gmd:hierarchyLevel>
<MD_ScopeCode xmlns="http://www.isotc211.org/2005/gmd"
codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode"
codeListValue="dataset"/>
</gmd:hierarchyLevel>
Decisions about how to handle series information should be influenced by system capabilities and necessary levels of description for discovery of dataset collections.
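For systems that take the parent/child approach, the series can be reassembled by grouping records on parentIdentifier. The following Python sketch shows one way this might look. The child identifiers `child-0001` and `child-0002`, the `make_record` helper, and the `group_children` function are all hypothetical and exist only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

SERIES_ID = "74d83c27-09e2-4e89-9d1e-a2f4af1d87e7"


def make_record(file_id, parent_id=None):
    """Build a minimal MD_Metadata record (illustration only)."""
    parent = (
        f"<gmd:parentIdentifier><gco:CharacterString>{parent_id}"
        "</gco:CharacterString></gmd:parentIdentifier>"
    ) if parent_id else ""
    return (
        f'<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">'
        f"<gmd:fileIdentifier><gco:CharacterString>{file_id}"
        "</gco:CharacterString></gmd:fileIdentifier>"
        f"{parent}</gmd:MD_Metadata>"
    )


# One series record and two child records; the child ids are made up.
RECORDS = [
    make_record(SERIES_ID),
    make_record("child-0001", SERIES_ID),
    make_record("child-0002", SERIES_ID),
]


def _text(root, tag):
    """Return the CharacterString under a top-level gmd element, if any."""
    el = root.find(f"{{{GMD}}}{tag}/{{{GCO}}}CharacterString")
    return el.text.strip() if el is not None and el.text else None


def group_children(records):
    """Map each parent fileIdentifier to the ids of its child records."""
    children = defaultdict(list)
    for xml_text in records:
        root = ET.fromstring(xml_text.strip())
        parent = _text(root, "parentIdentifier")
        if parent:
            children[parent].append(_text(root, "fileIdentifier"))
    return dict(children)


if __name__ == "__main__":
    for parent, kids in group_children(RECORDS).items():
        print(parent, "->", kids)
```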
While the architecture required to manage multiple XML records to describe one single file might at first seem complex, trends in managing geospatial metadata are increasingly leaning towards international acceptance of the ISO series in order to take advantage of its flexible and more semantically enabled data structure.
Currently, Stanford is surveying the effectiveness and overall quality of the ISO metadata for approximately 60 datasets. We will report back after our analysis is complete.
Links
ISO Schema Documentation
ISO Codelists