On Line Technical Chat (2012-02-22)

A bunch of OGP developers are getting together on Wednesday, February 22nd at 4pm Eastern Time. If you want to join us, email Steve McDonald (Stephen.McDonald@tufts.edu). Here’s our agenda:

  • Welcome
    • Introduce yourself and what you’re up to
  • OGP 1.1 Release
    • Server URLs: Config file changes and Solr Location field
    • GeoServer proxy for restricted layers
    • URL shortening
    • Save Image
    • Improved “Loading” indicator
    • Improved client side login, added logout
    • Download clean up
    • More POM file dependencies
    • Harvard PIN Authentication
    • Harvard’s link to a layer, sending arguments to OGP
    • Searching by solr keywords (e.g., LayerId:Tufts.Buildings)
    • Local JavaScript file loading (institutionSpecificJavaScript in ogpConfig.json needs documentation)
    • Shard evaluation support
    • OpenGeoIngest: Location field, test data generation
  • OpenGeoPortal 2012
    • April 13th all day meeting in Boston
  • Search
  • Future Plans
    • Raster ingest support

 

Larger Scale Search

Can OpenGeoPortal’s search scale to meet the community’s future needs?  To answer this question, I investigated scaling both vertically and horizontally.

What if a Solr instance had to handle many more layers?  Most OGP Solr instances currently provide access to under 10,000 GIS layers.  Searches typically execute in under 10 milliseconds.  To test with more layers, I wrote code to create new layers by processing my existing layers and changing their LayerId field.  With this approach I created a Solr instance containing over 60,000 layers.  Running on a development VM with 1.5 GB of RAM and only a single core, search times increased to 200 milliseconds.  That is slower than before, but still fast enough for interactive searching.
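The test data generation described above can be sketched roughly as follows. This is only an illustration, not the code I actually ran: the function name, the ".copyN" suffix scheme and the "Name" field are assumptions made for the example, and only the LayerId field comes from the description above.

```python
import copy

def clone_layers(layers, copies):
    """Return `copies` synthetic variants of each layer, each with a unique LayerId."""
    clones = []
    for layer in layers:
        for n in range(1, copies + 1):
            clone = copy.deepcopy(layer)
            # Only the LayerId changes; every other metadata field is kept as-is.
            # The ".copyN" suffix is a hypothetical naming scheme.
            clone["LayerId"] = f"{layer['LayerId']}.copy{n}"
            clones.append(clone)
    return clones

# "Name" is a made-up field used only to show that other metadata is preserved.
existing = [{"LayerId": "Tufts.Buildings", "Name": "Buildings"}]
synthetic = clone_layers(existing, copies=3)
```

Each clone would then be posted to the test Solr instance like any other layer record.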

Having a single Solr instance will eventually become a performance limitation.  When there are too many layers for a single machine to search quickly, it is important to distribute the search query across multiple machines.   Fortunately, this functionality is built into Solr.  The metadata is broken into multiple pieces called “shards”.  Each shard might hold, for example, 150,000 layers.   Each shard runs on its own Solr instance, on its own server.  A query is sent to one Solr instance.  This Solr instance shares the query with all the other shards, combines all the search results and returns them.  I have run tests where OGP connects to two Solr instances.  To test under the worst network latency I could arrange, I used one server at Tufts and one at Berkeley.  Although the time to perform a search slowed to 200 milliseconds, that is still fast enough.
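For anyone who wants to try this, a distributed query is an ordinary Solr select request with an added "shards" parameter listing the instances to search. Here is a minimal sketch that builds such a request URL. The host names are placeholders, not the real Tufts or Berkeley servers, and the helper function is hypothetical.

```python
from urllib.parse import urlencode

def distributed_query_url(coordinator, shards, query):
    """Build a Solr select URL that fans the query out to every listed shard."""
    params = {
        "q": query,
        # Solr expects a comma-separated list of host:port/path entries, without "http://".
        "shards": ",".join(shards),
        "wt": "json",
    }
    return f"{coordinator}/select?{urlencode(params)}"

# Placeholder hosts; substitute your own Solr instances.
url = distributed_query_url(
    "http://solr.example.edu:8983/solr",
    ["solr-a.example.edu:8983/solr", "solr-b.example.edu:8983/solr"],
    "LayerId:Tufts.Buildings",
)
print(url)
```

The instance that receives the request acts as the coordinator: it forwards the query to each shard and merges the results before returning them.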

Based on these tests, I estimate a single, very powerful server can probably handle over 100,000 layers.  Perhaps it could handle twice that.  When the limit of a single machine is reached, we can scale out to multiple shards running on multiple servers.  If I had to, I imagine I could build a Solr repository to handle half a million layers.
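As a rough back-of-the-envelope check, the number of shards is just the total layer count divided by the per-shard capacity, rounded up. The sketch below uses the 150,000 layers per shard figure from the example above. That figure is illustrative, not a measured limit.

```python
import math

def shards_needed(total_layers, layers_per_shard):
    """Smallest number of shards that can hold total_layers."""
    return math.ceil(total_layers / layers_per_shard)

# Half a million layers at an assumed 150,000 layers per shard.
print(shards_needed(500_000, 150_000))  # prints 4
```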

On-Line Tech Talk Notes

Thanks to the 12 people from 8 universities that joined our inaugural OGP Tech Talk.

Berkeley’s project to add OGP support for layers that have a bounding box but are not georeferenced was well received.  To support the effort, Tufts will add a new field to the Solr schema.  This will be in the OGP point release created at the end of January.  Support for non-georeferenced layers could also help us deal with collection level data.  The roll-out of this change to the Tufts production server will be coordinated with other OGP sites relying on the Tufts Solr instance.

Tufts will initiate a discussion about moving from setting map server locations in a config file to using the existing “location” field in the Solr schema.
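To give the discussion something concrete to react to, here is one way a lookup could work. Everything in it is an assumption: it supposes the “location” field holds a JSON object mapping a service name to a URL, that records carry an "Institution" field, and that the existing config file is keyed by institution. None of these formats has been agreed on.

```python
import json

def map_server_url(solr_record, config, service="wms"):
    """Prefer the URL stored on the Solr record; fall back to the config file."""
    raw = solr_record.get("Location")
    if raw:
        try:
            locations = json.loads(raw)
        except json.JSONDecodeError:
            locations = {}
        # Hypothetical format: {"wms": "http://..."}
        if isinstance(locations, dict) and service in locations:
            return locations[service]
    # Fallback mirrors today's behavior: one server per institution, set in config.
    return config[solr_record["Institution"]][service]

# Placeholder config and records for illustration only.
config = {"Tufts": {"wms": "http://config.example.edu/wms"}}
with_location = {"Institution": "Tufts", "Location": '{"wms": "http://layer.example.edu/wms"}'}
without_location = {"Institution": "Tufts"}
```

A fallback like this would let sites migrate gradually, since records without a location value keep using the config file.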

Getting OGP authentication to work with central authentication servers is a challenge.  We need to thoroughly document Harvard’s solution and understand the issues with Berkeley’s system.  This issue is critical because authentication is required for access to licensed/restricted data.  Dave Siegel at Harvard has started to document his solution at http://code.google.com/p/opengeoportal/wiki/PINAuthentication .  Berkeley will be sending code that demonstrates their authentication issue for troubleshooting.

There is very broad interest in shared cloud-based OGP resources.  To quote MIT, “instead of figuring out how and when to harvest metadata from many sites/systems, we’d like to explore all sites contributing metadata to one shared site that can be easily searched by all instances of OpenGeoPortal”.  Tufts will take the lead by generating a beta proposal and starting a discussion within the community.  The current OGP functionality of each institution controlling whose data they see will be preserved.  It will likely require metadata standards.  We envision administration access similar to the OGP source repository; staff from multiple institutions will have commit privileges.   Such a system will eventually require that we handle cross-institution de-duplication.

We will likely hold another OGP Tech Talk meeting at the end of January.  It may use a telephone based conference call to provide high quality audio and Adobe Connect for chat and screen sharing.  Please let Steve know if that would be a problem for you.

Additionally, feel free to use the mailing list for questions, meeting agenda items and announcements.

On-line Technical Chat

A bunch of OGP developers are getting together on Wednesday, January 11th at 4pm Eastern Time.  If you want to join us, email Steve McDonald (Stephen.McDonald@tufts.edu).  Here’s our agenda:

  • Status Update – what people are doing
  • Berkeley’s Project
  • Solr Schema Changes
  • Questions – Protected content programming

Here are the details on Berkeley’s project:

Here at Berkeley, we have about 40,000 scanned maps, mostly from topographic map sets. These maps have to be georeferenced, which is a long process, but we can assign bounding boxes to each file/sheet. We would like to make these maps available to our patrons for searching and visualizing while we are working on georeferencing them. We have managed to get non-georeferenced images into Geoserver, so we can use the same server and access method for the non-georeferenced images as for the georeferenced ones.

PROPOSAL

We propose that we add a new key value pair to the Solr record. The key would be ‘georeferenced’ and the values would be true or false.

The Solr record would have a bounding box, but the bounding box would be approximate.

A layer whose Solr record ‘georeferenced’ field has a value of false would behave in every way like a regular layer, except that on preview the layer would open in a new OpenLayers window.
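The preview rule in the proposal can be sketched as a small decision function. This is an illustration only: the return values "map" and "new_window" are hypothetical labels, and treating a missing field as georeferenced is an assumption made so that existing records keep their current behavior.

```python
def preview_mode(solr_record):
    """Decide how a layer should be previewed, based on the proposed field."""
    # Assumption: records without the field are ordinary georeferenced layers.
    if solr_record.get("georeferenced", True):
        return "map"          # preview on the shared map, as today
    return "new_window"       # open in a separate OpenLayers window

# The bounding box is approximate for non-georeferenced layers but still present.
scanned_sheet = {"LayerId": "UCB.example", "georeferenced": False}
regular_layer = {"LayerId": "Tufts.Buildings"}
```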

Since the URL for accessing a non-georeferenced layer is different from a regular WMS call, there will also be some code that will deal with that, but it can easily work off the ‘georeferenced’ metadata tag. Here is an example for the image 17076013_02_015a.tif:

http://linuxdev.lib.berkeley.edu:8080/geoserver/UCB/wms?service=WMS&version=1.1.0&request=GetMap&layers=UCB:images&CQL_FILTER=PATH='furtwangler/17076013_02_015a.tif'&styles=&bbox=0.0,-65536.0,65536.0,0.0&width=512&height=512&srs=EPSG:404000&format=application/openlayers

Notice the CQL_FILTER argument. This is a working URL, though the Geoserver is on a development system and may go down without notice.
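To show how the code might assemble such a request, here is a minimal sketch that rebuilds the example URL from an image path. The parameter values are copied from the example above. The function name is hypothetical, and the query string comes out percent-encoded, which is equivalent to the hand-written form.

```python
from urllib.parse import urlencode

def non_georeferenced_preview_url(wms_endpoint, image_path):
    """Build a GetMap URL that selects one scanned image via CQL_FILTER."""
    params = {
        "service": "WMS",
        "version": "1.1.0",
        "request": "GetMap",
        "layers": "UCB:images",
        # The CQL filter picks a single image out of the shared layer.
        "CQL_FILTER": f"PATH='{image_path}'",
        "styles": "",
        "bbox": "0.0,-65536.0,65536.0,0.0",
        "width": 512,
        "height": 512,
        "srs": "EPSG:404000",
        "format": "application/openlayers",
    }
    return f"{wms_endpoint}?{urlencode(params)}"

example = non_georeferenced_preview_url(
    "http://linuxdev.lib.berkeley.edu:8080/geoserver/UCB/wms",
    "furtwangler/17076013_02_015a.tif",
)
print(example)
```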

DEVELOPMENT

Since we have a pressing need for this modification, we would of course do the development and make the code available as an OGP module.

Welcome to OpenGeoportal.org

Attending FOSS4G? We’re having a Birds Of A Feather meeting on Wednesday, September 14th in room Tower A. Here’s the current agenda.

OpenGeoportal.io is a new group of geospatial professionals, developers, and librarians working together on the Open Geospatial Data Repository: a collaboratively developed, open source, federated web application to discover, preview, and retrieve geospatial data.  It is also a collaborative effort to share resources and best practices in the areas of application development, metadata creation, data sharing, data licensing, and data sources.  Please contact Patrick.Florance@Tufts.edu for more information.

GeoData@Tufts is a demo site of the Tufts instance of OpenGeoportal.  Please note that it is currently under development.

  • Only Tufts, MIT, and MassGIS data is previewable and downloadable.
  • Harvard unrestricted data is now previewable and downloadable.
  • 1.0 release: mid-September, 2011.