Through the Open Geoportal, Stanford, Tufts, Columbia, UC Berkeley, and Harvard are working together to document, archive, and serve all of the National Atlas data that went offline last fall. The project is near completion, and all the data and metadata will be available around February 2015.
NYPL: Moving Historical Geodata to the Web, November 5th – 7th
The New York Public Library is hosting a three-day meeting, Setting a Research and Development Agenda for Moving Historical Geodata to the Web, made possible through generous support from The Alfred P. Sloan Foundation. The meeting will give key partners the opportunity to shape the future of the field through collaborative planning and a shared agenda. It will run from Wednesday, November 5, to Friday, November 7, 2014.
Greater NYC OGP Meeting Summary
Here is a summary of our Greater NYC OGP Meeting, held on 12/13/2013 at NYU. We had a great turnout of around 65 participants from the area. The complete agenda, with links to presentations and the participant list, can be found on the Greater NYC OGP Meeting Event Page. We will be following up closely on specific action items.
Summary and Key Action Items
1. Prepare cost-benefit analysis statement for potential partners
2. Hold meeting for those interested / outline next steps
   - Jeremiah & Eric (Columbia), Frank (CUNY Baruch), Wangyal (Princeton), Alan Leidner (Booz Allen), Wendy (GISMO), Matt (NYPL), Him (NYU)
   - Determine others to invite
3. Establish regional hubs/nodes for Tri-state area
   - Four hubs/nodes
      - Greater NYC including Long Island, Westchester County, etc.
         - Tentative coordination by Columbia, NYU, CUNY
      - Upstate New York
         - Tentative coordination by Cornell?
      - New Jersey
         - Tentative coordination by Princeton. Others such as Rutgers?
      - Connecticut
         - Tentative coordination by UConn, Yale, Tufts.
   - Other organizations with compatible infrastructure can become a direct node
   - Coordinate data collection and ingest from smaller organizations
      - GISMO potentially serve as organizing body
   - Coordinate collection development and metadata creation
   - Identify key data repositories for inclusion
4. Coordinate outreach for presenting case for benefits to community
   - GISMO forum
   - Hold a webinar
5. Establish greater NYC OGP information architecture
   - Storage and interface separate
   - Explore single interface through hosted services
   - Develop and present cost analysis for various options
   - Tufts to set up cloud-hosted instance
   - Independence for NY node important
Crawling Rockland Site
Data from the Rockland County (NY) GIS site is now available at http://www.WorldWideGeoWeb.com. The following image shows some of the search results after a crawl-based ingest of the Rockland data site:
Even though no web services may be available, these search results can still be previewed. The following image shows the Rockland boundaries for State Senate:
Here’s a screenshot with two separate bus routes previewed.
On the Rockland County site, many bus routes are stored in a single zip file. These are individually searchable and previewable on WorldWideGeoWeb.
Directed crawling of new sites presents new challenges and reveals limitations in the existing code. Several changes were made to successfully crawl the Rockland, NY data at https://geopower.jws.com/rockland/DataPage.jsp.
The Rockland site does not contain links to zip files. A typical link to a data file is https://geopower.jws.com/rockland/DownloadData.jsp?pck_oid=2464. The crawl code was changed to support links to servlets rather than only simple zip files.
The Rockland metadata files contain minimal, often cryptic titles, for example “monsey2” and “TZX”. These titles are not sufficient. Fortunately, they can be augmented with information scraped from the crawled web page. Specifically, the text from the anchor tag linking to the DownloadData.jsp servlet is concatenated to the title field in the XML metadata file. This creates user-friendly titles, for example “monsey2: TOR Bus Routes” and “TZX: Tappan Zee Express Bus Route”.
The Rockland site contains zip files that hold multiple shapefiles. For example, the file TOR.zip contains 6 separate bus routes, each in a separate shapefile. Each shapefile is ingested as a separate entity so it can be independently searched and previewed. The 28 links on the Rockland data page expand into 47 searchable, previewable spatial resources. Note that the OpenGeoPortal download operation pulls down entire zip files from the Rockland server, not the individual shapefiles.
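The link handling and title augmentation described above can be sketched in Python. This is only an illustration under stated assumptions, not the thesis code: the class name DataLinkParser, the function augment_title, and the servlet_names parameter are all hypothetical, and the "title: anchor text" format is inferred from the two example titles.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class DataLinkParser(HTMLParser):
    """Collect (url, anchor_text) pairs for links that look like data downloads."""

    def __init__(self, base_url, servlet_names=("DownloadData.jsp",)):
        super().__init__()
        self.base_url = base_url
        # Assumed list of download servlets; not taken from the original crawler.
        self.servlet_names = tuple(servlet_names)
        self.links = []      # finished (url, anchor_text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def _is_data_link(self, url):
        # Look only at the path so query strings such as ?pck_oid=... are ignored.
        path = urlparse(url).path
        return path.lower().endswith(".zip") or path.endswith(self.servlet_names)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self._is_data_link(self._href):
                # Collapse runs of whitespace in the anchor text.
                text = " ".join("".join(self._text).split())
                self.links.append((self._href, text))
            self._href = None


def augment_title(metadata_title, anchor_text):
    """Append the anchor text to a terse metadata title."""
    if not anchor_text or anchor_text.lower() == metadata_title.lower():
        return metadata_title
    return f"{metadata_title}: {anchor_text}"
```

Feeding a crawled page to DataLinkParser.feed() fills its links list; each anchor text can then be passed to augment_title together with the title read from the metadata file.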
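One way to treat each shapefile in a zip as its own entity is to group archive members by their path stem. The sketch below is an assumption about how this could be done in Python; the function name list_shapefile_layers is hypothetical and the thesis code may work differently.

```python
import io
import zipfile
from pathlib import PurePosixPath


def list_shapefile_layers(zip_bytes):
    """Return {layer_name: [member names]} for every .shp found in a zip archive."""
    layers = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        for name in names:
            path = PurePosixPath(name)
            if path.suffix.lower() != ".shp":
                continue
            # Sidecar files (.shx, .dbf, .prj, .shp.xml, ...) share the same path stem.
            prefix = str(path.with_suffix("")) + "."
            members = [n for n in names if n.startswith(prefix)]
            # Keyed by bare file name; two layers with the same name in
            # different folders would collide in this simplified version.
            layers[path.stem] = sorted(members)
    return layers
```

Each key in the returned dictionary would become one searchable record, while the download link for every record still points at the original zip file.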
Since the Rockland site only supports secure connections, the ingest code was enhanced to support https.
Crawling Westchester Site
At the NYC OGP meeting, I demoed my thesis site. Since there isn’t a video of it, here’s a write-up covering the same material. There are also some slides available.
The goal of my Master's thesis is to make all the world’s spatial data accessible. This goal is accomplished by expanding OpenGeoPortal in two significant ways. First, spatial data files are discovered via web crawls and then ingested. Second, the ability to preview and download layers without needing OGC protocols was developed. This expanded version of OpenGeoPortal is on the web at http://WorldWideGeoWeb.com.
Data on WorldWideGeoWeb.com was discovered by crawling the web, relying exclusively on HTTP GET requests. This is the same technique used by Google and other search engines. The WorldWideGeoWeb crawler can be instructed to crawl a specific site. Sites are searched for links to zip files. The ingest code retrieves and unzips these files. If they contain a shapefile, the bounding box is determined using the shp and prj files. Any metadata file is also parsed. Information about each discovered layer is ingested into WorldWideGeoWeb’s Solr instance.
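The bounding box is cheap to obtain because the shapefile format stores it in the fixed 100-byte header of the .shp file. A minimal Python sketch follows; the function name read_shp_bounding_box is hypothetical. It returns coordinates in the layer's native projection, so converting them with the .prj file, as the ingest described above would need to do, is outside this sketch.

```python
import struct


def read_shp_bounding_box(shp_bytes):
    """Return (xmin, ymin, xmax, ymax) from the 100-byte main file header of a .shp."""
    if len(shp_bytes) < 100:
        raise ValueError("a .shp file starts with a 100-byte header")
    # Bytes 0-3: file code, big-endian, always 9994.
    (file_code,) = struct.unpack(">i", shp_bytes[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    # Bytes 36-67: Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    return struct.unpack("<4d", shp_bytes[36:68])
```

Only the first 100 bytes are needed, so a crawler could in principle read the header without parsing any geometry.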
After ingest, OpenGeoPortal’s powerful search interface allows users to quickly and easily find spatial data layers. Preview of shapefiles on the map is based on parsing and rendering shapefiles entirely in JavaScript. It does not use image tiles from GeoServer or ArcGIS Server. When the user selects a layer to preview, the browser sends a request to the server to create a temporary, server-side copy of the zip file. During ingest, the URL of the zip file was stored in the Solr record. The server uses this URL to issue an HTTP GET request and create a local copy of the zip file. Then the file is unzipped. At this point the browser requests the .shp, .shx, .prj and .dbf elements of the shapefile. They are processed in JavaScript as binary data streams. If the data is not in a suitable projection, it is reprojected in the browser. Then the features in the shapefile are parsed and rendered on OpenGeoPortal’s map. Attributes in the .dbf file are displayed as features are moused over.
The following screenshot shows WorldWideGeoWeb.com. The search results were discovered by crawling Westchester County’s data web site at http://giswww.westchestergov.com/wcgis/DataWarehouse.htm. The map shows a preview for the layer titled “County Legislative Districts”. The browser debug panel at the bottom of the screenshot shows the network traffic generated by the preview request. The “cacheShapeFile.jsp” Ajax call told the server to copy the shapefile from http://giswww.westchestergov.com using an HTTP GET and unzip the results. After the Ajax request completes, the .shx, .shp, .dbf and .prj are requested by the browser and parsed in JavaScript. Transferring this 220-kilobyte layer first to the WorldWideGeoWeb server and then to the browser took just under 2 seconds.
The user can add any of these Westchester layers to the cart and download them. The zip files are transferred directly from the Westchester server to the browser. Clipping the data or converting it to another format is not supported.
WorldWideGeoWeb shows it is possible to build a powerful, interactive portal without requiring data holders to create web services. Data only available on web sites designed for people can be ingested using a web crawl and previewed using advanced JavaScript techniques that weren’t available when the OGC protocols were created. Since WorldWideGeoWeb is built on OpenGeoPortal, data available with web services can also be supported.
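The server-side caching step (fetch the stored zip URL with a single GET, then unzip) might look roughly like this in Python. It is a sketch only: the function name cache_shapefile, the cache directory layout, and the decision to keep just four file types are assumptions, and the real server code is not Python.

```python
import io
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

# The four parts the browser requests for a preview.
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj"}


def cache_shapefile(zip_url, cache_dir):
    """Download a zip with one HTTP GET and unpack its shapefile parts into cache_dir."""
    with urllib.request.urlopen(zip_url) as response:
        payload = response.read()
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for member in archive.namelist():
            # Keep only the file name so archive paths cannot escape cache_dir.
            name = PurePosixPath(member).name
            if Path(name).suffix.lower() in SHAPEFILE_PARTS:
                target = cache_dir / name
                target.write_bytes(archive.read(member))
                extracted.append(target)
    return sorted(extracted)
```

The returned paths are the files a web server would then hand to the browser as binary streams for client-side parsing.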
Limitations
There are significant limitations with the current version of the software. Most notable is its inability to deal with large shapefiles. Currently, shapefiles over one megabyte can cause the browser to hang. Search results are color-coded to advise the user. Green layers are small and should preview quickly. Yellow layers are larger but should preview without too much delay. Layers in red represent shapefiles over a megabyte and should not be previewed. Even these large layers can easily be downloaded, just not previewed.
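The color coding amounts to a simple size classification. In the Python sketch below, only the one-megabyte red limit comes from the text. The green/yellow boundary (YELLOW_THRESHOLD) is a made-up placeholder, and the function name preview_color is hypothetical.

```python
# The text says "one megabyte"; the exact byte count used here is an assumption.
ONE_MEGABYTE = 1_000_000
# Hypothetical cutoff: the write-up does not give the green/yellow boundary.
YELLOW_THRESHOLD = 250_000


def preview_color(shapefile_size_bytes):
    """Classify a layer for the search results list by shapefile size."""
    if shapefile_size_bytes > ONE_MEGABYTE:
        return "red"      # too large to preview in the browser
    if shapefile_size_bytes > YELLOW_THRESHOLD:
        return "yellow"   # previewable, with some delay
    return "green"        # small, should preview quickly
```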
My thesis code is not production quality.
Future Directions
During a crawl, only spatial resources in shapefiles are discovered, and their associated metadata must be in FGDC or ISO 19115 format. It would be trivial to add support for KML and KMZ files. Support for other file formats and metadata standards could also be integrated.
Crawling based on OGC protocols such as GetCapabilities and CSW could be added.
The ranking of the search results could be based on the “Page Rank” of the page that linked to the zip file.
Semi-spatial data such as web pages about places could be ingested and searched spatially.
Other Notes
I now work for Voyager Search (http://voyagersearch.com/). We are investigating how some of these ideas could be incorporated into their existing products. The code created for my thesis has been released under the GPL.
Some data an organization provides may be more critical or widely used. This data could be available via an OGC compliant server while other, less critical data is made available only via HTTP GET.
Review of Open Geoportal National Summit
A wonderful review and summary of the Open Geoportal National Summit.
Notes from the Open Geoportal National Summit
by Frank Donnelly, Geospatial Data Librarian at Baruch College CUNY
OGP 2.0 Release
UNH Receives IMLS Award
Empowering the University of New Hampshire User Community with the Power of PLACE.
The University of New Hampshire Library and its partner, the Earth Systems Research Center, have been awarded a grant in the amount of $474,156 from the Institute for Museum and Library Services, National Leadership Grants for Libraries Program (Grant Award Number: LG-05-13-0350-13) to build PLACE, the Position-based Location Archive Coordinate Explorer. PLACE will be a geospatial search interface that will use embedded geospatial coordinates to enable easier discovery of information that can be difficult to locate through text-based searching. Through PLACE, via a click or delineation of a search polygon on a web map, users will zoom to a region and will locate all UNH Library collections whose geographic extents intersect. Initially, PLACE will provide access to geographic collections focused on the region, but it will be flexible and expandable as collections grow. The project will provide users with access to these collections through a flexible visual interface and provide a toolkit for other institutions to implement in their geospatial collections. Ready access to embedded geospatial information in a flexible visual interface will contribute to the development of 21st-century skills by library users, such as visual, global, and environmental literacy.
The project will contribute to two open source communities: Open GeoPortal (OGP) and FEDORA. Tasks to accomplish our goals include creating standards-compliant metadata for prototype collections and ingesting digital objects into FEDORA, purchasing and configuring a dedicated server for our instance of OGP, and integrating OGP with the FEDORA Solr index to provide a basic level of OGP functionality. We will build new tools not currently available in GeoPortal using JavaScript and jQuery. The universal gazetteer tool will involve a common library of polygons, such as county boundaries, which will be available via pull-down lists. Time series data is important for assessing changes over time: a cross-reference table and a time slider on the interface will make it easier for users to select datasets by time periods. We plan usability studies throughout the project to optimize interface design, and enhancements for providing geospatial access to the unique geological fieldtrip guidebook literature, a feature supported in our needs analysis.
Contacts:
Thelma Thompson, Associate Professor & Government Information and Maps Librarian
(603) 862-1132; thelma.thompson@unh.edu
Eleta Exline, Assistant Professor & Scholarly Communication Librarian
(603) 862-4252; eleta.exline@unh.edu
Michael Routhier, Information Technologist, Earth Systems Research Center
(603) 862-1792; mike.routhier@unh.edu
NYC Open Geoportal Meeting – December 13th
LINK TO SUMMARY OF GREATER NYC OPEN GEOPORTAL MEETING
NYC Open Geoportal Meeting
Friday, December 13th
Location: New York University (NYU)
Map of Meeting Location (PDF)
Morning Sessions (8:30am-12:30pm)
NYU Kimmel Center for University Life – Room 914, 60 Washington Square S, New York
Afternoon Working Meeting (1:30-4:45pm)
NYU Bobst Library – Room 619, 70 Washington Square S, New York, NY
Through the support of the Alfred P. Sloan Foundation and NYU, we are happy to announce the first New York City Open Geoportal Meeting on Friday, December 13th hosted by New York University (NYU). The goals of the meeting are to:
1. Build relationships between individuals and organizations interested in geospatial data management around the greater New York City area.
2. Introduce the Open Geoportal community, software application, and underlying technology.
3. Work towards setting up a NYC Open Geoportal environment.
AGENDA
In the morning we will introduce OGP functionality, underlying technology, metadata requirements, and future developments. We will also allow selected participants to give an optional five-minute ignite talk introducing their interest in greater NYC geospatial data.
After the morning sessions, we will hold smaller working meetings to discuss a NYC OGP task force, answer detailed technical questions, and determine key action items to develop a NYC OGP instance. The afternoon is designed for those with an interest in spatial data infrastructure (SDI).
Agenda (PDF)
DETAILED AGENDA
Time | Title | Presenter |
Morning Conference (8:30am-12:30pm), NYU Kimmel Center for University Life – Room 914, 60 Washington Square S, New York, NY |
8:30-9:00 | Food & Refreshments | |
9:00-9:30am | Welcome & Introductions | Scott Collard, NYU |
9:30-10:00am | Ignite Sessions 1: Alan Leidner (Booz Allen; NYS GIS Association); Matt Knutzen (New York Public Library); Himanshu Mistry (New York University); Nathan Storey (PediaCities) | Selected Presenters |
10:00-10:30am | Overview of Open Geoportal | Patrick Florance, Tufts |
10:30-11:00am | Overview OGP Information Architecture & Underlying Technology | Chris Barnett, Tufts; Steve McDonald |
11:00-11:30am | Break with Refreshments | |
11:30-12:00pm | OGP Current & Future Developments: OGP 2.0, Harvester, Metadata Toolkit, Hosted Services, OGP Crawler | Chris Barnett & Patrick Florance, Tufts; Steve McDonald |
12:00-12:30 | Ignite Sessions 2: Steve Romalewski (Center for Urban Research, CUNY Graduate Center); Frank Donnelly (Baruch College–CUNY); Holly Orr (New York University); Eric Glass (Columbia University) | Selected Presenters |
12:30-1:30pm | Lunch | |
Afternoon Working Meetings (1:30-4:45pm), NYU Bobst Library – Room 619, 70 Washington Square S, New York, NY |
1:30-2:15pm | Greater NYC Spatial Data Infrastructure (SDI) | Participants discuss their SDI data holdings, IT infrastructure, staff, metadata, etc. |
2:15-3:00pm | Metadata Formats & Best Practices | Intro: Marc McGee, Harvard. Participants discuss metadata authoring, workflow, coordination (OpenGeoportal_Metadata) |
3:00-3:30pm | Break | |
3:30-4:30pm | Information Architecture (IA) & Governance | Participants discuss IA & Governance needed to establish a greater NYC OGP Instance |
4:30-4:45 | Summary | Patrick Florance, Tufts |
REGISTRATION
General registration is now closed.
Participants include Columbia University, NYU, CUNY Graduate Center, Hunter-CUNY, Baruch-CUNY, New York Public Library, Tufts University, Harvard University, and various local and federal government agencies among others.
QUESTIONS
Please contact patrick.florance@tufts.edu
BACKGROUND
The idea of the NYC OGP originated during the National OGP Summit. The Open Geoportal is a collaboratively developed, open source, federated web application for rapidly discovering, previewing, and retrieving geospatial data from multiple repositories. It is also a community of best practice for those interested in geospatial data management and spatial data infrastructure (SDI). It was developed by Tufts, Harvard, and MIT and currently has over 30 partner organizations.
Tufts Instance – just zoom in on the map to New York City, Boston, Afghanistan, etc. to see some relevant holdings
http://geodata.tufts.edu
Open Geoportal Organization
http://opengeoportal.org
Tufts Adds 200 Ecuador Layers including 2010 Census Data
Tufts recently added around 200 public data layers for Ecuador, including the complete 2010 Population and Housing Census. We processed and joined all the census data attributes down to the third-level administrative geography (Parroquias). The data is otherwise very difficult to access.