OpenGeoportal Ingest Technical Overview

OpenGeoportal Ingest (ogpIngest) is a Java web application built on the Spring MVC framework and Spring Security. XML is currently parsed by walking the DOM via the W3C DOM API, for speed and flexibility. Jackson is used for JSON parsing, SolrJ for interfacing with Solr, and Apache HttpComponents for HTTP connections. jQuery and Twitter Bootstrap are used on the web client.

The general mechanism of ogpIngest is to gather layer-level geospatial metadata from disparate sources (FGDC and ISO 19115/19139 XML documents, WMS GetCapabilities documents, remote OGP instances, library records, etc.), translate it to a common schema, validate the resulting fields, and ingest them into a local Solr index, where an OGP instance can then search the records. Issues and results are reported back to the user as they occur.

When the web form is submitted, an IngestStatus object with a jobId (UUID) is created and registered with the IngestStatusManager. The jobId is returned to the web client so that it can poll the IngestStatusController for information about the ingest job. At the same time, a new thread is spawned to handle that ingest job; the thread receives a reference to the IngestStatus along with the arguments from the web form submitted by the user. As messages (“Success”, “Warning”, “Fail”) are generated, they are added to the IngestStatus object. Once ingest is complete, a summary of info messages can be viewed in a spreadsheet report.
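The job-tracking flow described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class names IngestStatusSketch, IngestStatusManagerSketch, and IngestJobDemo, their methods, and the rule that only ".xml" files succeed are hypothetical stand-ins, not the actual ogpIngest classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified stand-in for IngestStatus.
class IngestStatusSketch {
    enum Level { SUCCESS, WARNING, FAIL }

    private final UUID jobId = UUID.randomUUID();
    // Thread-safe list: the worker thread writes while the web thread reads.
    private final List<String> messages = new CopyOnWriteArrayList<>();
    private volatile boolean complete = false;

    UUID getJobId() { return jobId; }
    void addMessage(Level level, String text) { messages.add(level + ": " + text); }
    List<String> getMessages() { return new ArrayList<>(messages); }
    void markComplete() { complete = true; }
    boolean isComplete() { return complete; }
}

// Hypothetical stand-in for IngestStatusManager: a registry keyed by jobId.
class IngestStatusManagerSketch {
    private final Map<UUID, IngestStatusSketch> jobs = new ConcurrentHashMap<>();

    // Creates a status object, registers it, and returns it to the caller.
    IngestStatusSketch register() {
        IngestStatusSketch status = new IngestStatusSketch();
        jobs.put(status.getJobId(), status);
        return status;
    }

    // What a polling controller would call with the jobId sent by the client.
    IngestStatusSketch lookup(UUID jobId) { return jobs.get(jobId); }
}

class IngestJobDemo {
    // Spawns a worker thread that records one message per file name.
    // The ".xml" check is a placeholder for real parsing and ingest.
    static Thread submit(IngestStatusSketch status, List<String> fileNames) {
        Thread worker = new Thread(() -> {
            for (String name : fileNames) {
                if (name.endsWith(".xml")) {
                    status.addMessage(IngestStatusSketch.Level.SUCCESS, name);
                } else {
                    status.addMessage(IngestStatusSketch.Level.FAIL, name + " is not XML");
                }
            }
            status.markComplete();
        });
        worker.start();
        return worker;
    }
}
```

In this sketch the caller would return status.getJobId() to the client right after submit(), and each poll would call lookup(jobId) to read the messages gathered so far.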

Generally, the first stage of the process is to parse the incoming metadata and populate an intermediate Metadata object. The Metadata object contains the fields relevant to the OGP Solr schema. Typically it is populated with some combination of data from the metadata document itself and information supplied in ingest.properties. The purpose of this intermediate object is twofold. First, it provides some level of decoupling from the Solr schema, which is likely to change over time. Second, its fields can be strongly typed where convenient, whereas Solr schemas are by nature somewhat loosely typed.
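As a rough illustration of this stage, the sketch below reads a few FGDC elements (title, westbc, eastbc, northbc, southbc) through the W3C DOM API and fills a typed intermediate object. MetadataSketch and FgdcParseSketch are hypothetical names, the field set is far smaller than a real Metadata object would need, and the institution argument stands in for a value that might come from ingest.properties.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical, pared-down intermediate object.
class MetadataSketch {
    String title;
    String institution;             // supplied by configuration, not the document
    double minX, minY, maxX, maxY;  // typed as numbers rather than strings
}

class FgdcParseSketch {
    // Returns the trimmed text of the first element with this tag, or null.
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Parses an FGDC document held in a string. A missing bounding element
    // would surface here as a NullPointerException; a fuller parser would
    // report it as a "Fail" message instead.
    static MetadataSketch parse(String xml, String institution) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse metadata document", e);
        }
        MetadataSketch m = new MetadataSketch();
        m.title = firstText(doc, "title");
        m.institution = institution;
        m.minX = Double.parseDouble(firstText(doc, "westbc"));
        m.maxX = Double.parseDouble(firstText(doc, "eastbc"));
        m.maxY = Double.parseDouble(firstText(doc, "northbc"));
        m.minY = Double.parseDouble(firstText(doc, "southbc"));
        return m;
    }
}
```

Because downstream code sees only the typed object, a renamed Solr field would affect the conversion step alone, not each parser.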

If parsing is successful, the process continues with SolrIngest. SolrIngest performs some validation on the Metadata fields (including enforcement of required fields) and converts the Metadata object into a SolrRecord object, which can be ingested directly into a Solr instance via the SolrJ library. Since every parsing process results in a Metadata object, the same SolrIngest can be shared by every type of ingest process.
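The validate-then-convert step might look roughly like the following. Everything here is an assumption made for illustration: ParsedLayer stands in for Metadata, a plain Map stands in for SolrRecord so the sketch runs without SolrJ, and the required-field list and Solr field names (LayerId, LayerDisplayName, Institution, MinX, MinY, MaxX, MaxY) are illustrative rather than taken from the actual schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the intermediate Metadata object.
class ParsedLayer {
    String layerId;
    String title;
    String institution;
    Double minX, minY, maxX, maxY;
}

class SolrIngestSketch {
    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns one message per problem; an empty list means the record is valid.
    static List<String> validate(ParsedLayer m) {
        List<String> errors = new ArrayList<>();
        if (isBlank(m.layerId)) errors.add("LayerId is required");
        if (isBlank(m.title)) errors.add("LayerDisplayName is required");
        if (isBlank(m.institution)) errors.add("Institution is required");
        if (m.minX == null || m.minY == null || m.maxX == null || m.maxY == null) {
            errors.add("Bounding box is required");
        } else if (m.minY > m.maxY || m.minY < -90 || m.maxY > 90) {
            // minX > maxX is tolerated here: a box may cross the antimeridian.
            errors.add("Bounding box is invalid");
        }
        return errors;
    }

    // Converts a valid record into field name/value pairs (the SolrRecord stand-in).
    static Map<String, Object> toSolrFields(ParsedLayer m) {
        List<String> errors = validate(m);
        if (!errors.isEmpty()) {
            throw new IllegalArgumentException("Invalid metadata: " + errors);
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("LayerId", m.layerId);
        fields.put("LayerDisplayName", m.title);
        fields.put("Institution", m.institution);
        fields.put("MinX", m.minX);
        fields.put("MinY", m.minY);
        fields.put("MaxX", m.maxX);
        fields.put("MaxY", m.maxY);
        return fields;
    }
}
```

Since validate and toSolrFields depend only on the intermediate object, an FGDC parser and a WMS GetCapabilities parser could both hand their results to the same code.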