FOSS4G

We had a great time at FOSS4G. Here’s some photographic evidence:

Thanks to Keith Jenkins for the picture!

OpenGeoportal.org Birds of a Feather meeting at FOSS4G 2011
September 14, 2011
Notes written by Lisa Sweeney

Attending:
·     Stephen McDonald (Tufts)
·     Patrick Florance (Tufts) – via skype
·     Chris Barnett (Tufts) – via skype
·     David Siegel (Harvard)
·     Matt Bertrand (Harvard)
·     Lisa Sweeney (MIT)
·     Garey Mills (Berkeley)
·     John Ridener (Berkeley)
·     Patricia Carbajales (Stanford)
·     Renzo Sanchez-Silva (Stanford)
·     William (University of Alaska, Fairbanks: Geographic Information Network of Alaska (http://www.gina.alaska.edu/) – Will is involved in developing catalog and web mapping apps for the GINA group)
·     Laurie Allen (Haverford (part of Tri Co Colleges in PA), previously worked at Penn State)
·     Sean Gillies (NYU)
·     Brian Hamlin (David Byron Environmental Center; OSGEO CA chapter member)
·     Arnulf Christl (OSGEO)

Anticipated attendance, but did not make it:
·     Karel Charvat (CCSS)
·     Karen Payne (GIST, USAID, U of Georgia) – via skype
·     Massimo Di Stefano (RPI – WHOI)
·     Keith Jenkins (Cornell)

Brief Updates
·     OpenGeoportal is under a GPL license (http://www.gnu.org/).
·     The code is available through Google Code – http://code.google.com/p/opengeoportal/
·     More project information is available at: http://opengeoportal.org/
·     Berkeley has implemented the first live instance of the OpenGeoportal code: http://gis.lib.berkeley.edu:8080/ This instance includes geodata records from Berkeley, Harvard, MIT, MassGIS, Princeton, and Tufts.
·     Tufts anticipates going into production by the end of September. The demo site is at: http://geoportal-demo.atech.tufts.edu/
·     Chris Barnett is currently working on server side code to enable creation of one zip file of data selected from across multiple institutions by a user.
·     Berkeley has loaded about 400 data layers.
·     Stanford is currently scanning ~300 topographic maps per day; 10 terabytes in the first 6 months.
·     Alaska polar data; collaborating with ACER and USGS – working towards common data sharing schema
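
The single-zip download mentioned above could look roughly like the following. This is only a sketch, not the server side code Chris is writing: the function name, the input layout, and the per-institution folder convention are all assumptions made for illustration.

```python
import io
import zipfile


def bundle_layers(selected):
    """Pack layers chosen from several institutions into one zip archive.

    `selected` is an assumed input shape: it maps an institution name to a
    dict of {filename: file bytes}. Each institution gets its own folder so
    that identical filenames from different institutions cannot collide.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for institution, files in sorted(selected.items()):
            for filename, payload in sorted(files.items()):
                archive.writestr(f"{institution}/{filename}", payload)
    return buffer.getvalue()
```

A real implementation would also need to fetch each layer from the institution that hosts it and handle layers that are unavailable or restricted.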

Governance
Notes from the group:
·     We need to determine ways to make decisions and move forward as a community.
·     We are working with massive repositories – these institutions hold more data than most government entities. The opengeoportal provides a federated approach to rapid discovery of geospatial data. We want a stewarded, guided infrastructure. Geocommons provides a platform for people to contribute any data they want. The opengeoportal should provide discovery of quality, controlled data that is good for decision making – data that will persist 50 years from now. Partners, including USAID and the UN, hold the same vision.
·     Aim decision making towards consensus
·     A project steering committee could be enough to enable the decision making structure we need to move forward for now. We want to keep this a dynamic group so things can keep rolling. It could evolve into something more formal in the future. If we can avoid creating a legal entity we can be more nimble. We need to define what the governance committee is capable of.
·     It was asked “What is the highest purpose of this project? Is it truly a repository that grows into a thing of its own?” – A board with a voting structure with institutional identity could be VERY slow and very difficult to really let be free.
·     We want an organization that allows things to shift, change and mutate and come to new ways of dealing with the community. Currently this is limited.
·     We don’t want massive infrastructure and overhead around creating partnerships. Partnerships can gain more defined clarity as we move through real life circumstances. The Stanford NDIP grant could be used as an example of what it means to be a partner.
·     Use tools for communication that will enable openness in decision making and communication across areas of interest (for example: some people may want to track what is going on across all realms including code development, user interface, metadata, data, etc.; others may prefer to focus on particular realms). There is out-of-the-box forum software that could provide an intro page/central place with easy navigation into major topics/interest areas, making it easy for people to follow the topics of interest to them.
·     Should those contributing more have a stronger voice? There are multiple ways of contributing, for example: code, data sharing, personal involvement and leadership.
·     There needs to be a way for people to communicate ideas and concerns and update statuses
·     What types of powers are needed from institutions? For OSGEO, some people come as individuals, some come with institutional backing.
·     Including international data and working with international institutions is something to work towards.
·     Non academic institutions/groups could be good to partner with as well
·     There are a variety of environments to work with under the umbrella of governing as an open source project – incubation under OSGEO is one option (http://www.osgeo.org/incubator/); Kuali.org could be another option
·     Some people preferred to create different groups/email lists for handling different major topics/interest areas
·     An Agile development model was suggested. Arnulf noted that scrum requires a scrum master, team, and product owner (not a committee) – goal, resources, plan, and timeframe are required for first iteration. The stakeholders are the committee. This could be tried on a short term basis.
·     We could start with 4-5 institutions being involved in all decision making – no significant changes would be made without being vetted first (Harvard, Tufts, MIT, Berkeley, Stanford). Steve is the administrator on the Google Code site and could provide commit ability to organizations as well. We are currently working towards nightly build testing so that if something breaks the build, we would know quickly.
·     Not everyone who is interested is represented today
·     Individual entities can always maintain local control over what they want and which SOLR instance(s) to search
·     Code governance is low hanging fruit compared to overall governance. It was estimated we will not be able to achieve a working service model before Thanksgiving.

Federated Search
Options:
A – Institutions could run their own instance on their own hardware
B – A SOLR instance could be run in the cloud
C – a combination of A & B

Entities not contributing code, data or metadata may be interested in searching the opengeoportal
A suite of services could be offered, for example a cloud instance with a paid subscription – the Kuali Foundation provides an example of this (http://kuali.org/)
We should think about governance models for both.

A major benefit of the opengeoportal is the opportunity to reduce the level of effort and resources needed across all institutions by sharing resources for metadata creation, data loading, data hosting, code development, collection development, documentation, etc.
Running an instance in the cloud with multiple institutions contributing could reduce the workload across all institutions.

Harvard wants to run their own instance of opengeoportal for a variety of reasons.
Reasons to run an in house version at an institution could include: security concerns, protection of proprietary data, desire for a particular interface or branding
One possibility could be to run unlicensed data in the cloud and run proprietary data in house; a portal run internally to an institution could pull records from the cloud and combine with institutionally specific records and allow for whatever branding or particular interface enhancements desired
Since the code is GPL, licensing is protected.
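
The idea of an in-house portal pulling records from the cloud and combining them with institution-specific records can be sketched as a simple merge. This is an illustration under assumptions, not a description of how the portal works: records are taken to be plain dictionaries, and `LayerId` is an assumed name for the field that identifies a layer.

```python
def merge_records(cloud_records, local_records, id_field="LayerId"):
    """Combine shared cloud records with institution-specific local records.

    When both sources describe the same layer, the local record wins, so an
    institution keeps control over how its own holdings are presented.
    `id_field` is an assumed identifier field, not a confirmed schema name.
    """
    merged = {record[id_field]: record for record in cloud_records}
    merged.update({record[id_field]: record for record in local_records})
    return sorted(merged.values(), key=lambda record: record[id_field])
```

Letting the local record take precedence is one possible policy; a group could just as well decide that the shared cloud record is authoritative.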

Berkeley – Garey has had the portal up for 2 weeks, but would like 6 months to just collect impressions before taking any new direction; it’s usable as is.
John is creating services around the new portal and scheduling talks around campus to let people know it’s available.
Garey developed an app for ingesting data from the community – there are a number of GIS centers around campus with terabytes of data, and there is a desire to collect data from non-profits as well.

What type of data should be included in the opengeoportal?
Academic institutions desire access to historic and current curated data of long-lasting interest that could be useful in teaching and research
Requirements for publishing research data following standards are increasing for grant funded projects
Government entities, like MassGIS, are most interested in current data

Arnulf asked about the goals of the opengeoportal. His observation was this sounded more like a meta project/domain group that is pulling together existing software pieces to create the portal, rather than a software development group. Will there be code contributions back to the open source software projects being used by the portal?
Stephen noted there is the potential to help with search and contribute back code. “Today we have institutions with a known problem and money going forward toward solving it. We are learning how to collaborate with others.”

We all recognize Patrick has done a lot to get the opengeoportal to where it is now, but he stated he is looking to step back a bit. He stated he really wants everyone’s active leadership in this.

Next Steps
-     Draft key areas of governance and decision making practices to share with group for discussion – Lisa (MIT), Steve (Tufts), and Patricia (Stanford, working with Julie Sweetkind-Singer) offered to lead this
-     Communication – it will be critical to establish positive and strong lines of communication to keep things moving forward and avoid duplication of efforts. A code base now exists. Next we need to start thinking about how to create synergies and avoid duplication of efforts regarding areas like metadata creation, data loading, documentation, collection development, and learning from the experiences of the institutions that have been running geodata repositories for close to a decade (Harvard and MIT).
-     We should evaluate the communication tools being used to ensure they are open, easy to use, easy to maintain, preferably not requiring a subscription that would need to be maintained, and can be easily managed by groups. Opengeoportal.org is currently a WordPress site. Blog posts persist over time, but comments can be hard to track. WordPress charges a modest fee that requires annual renewal. Code currently sits in Google Code for free. GitHub has a built-in wiki for projects and is free as long as the repository is public. If opengeoportal incubates under OSGEO then OSGEO would provide some umbrella structure, including wiki space and mailing list options.
-     The contact information for fielding requests for information and administering mailing lists and code access should be shifted from individuals to groups willing to help sustain the efforts.
-     Establish how documentation is handled for each area – code, metadata, data, etc.

Follow Ups
·     It was suggested to follow-up in person with some working meetings to hammer out some of the details. Patrick has been seeking some funding to help support travel for these.
·     Suggested next in-person meeting:  end of Oct. around the wherecamp Boston meeting (oct.
·     The next in-person meeting could possibly be on the west coast (NoCal Bay area) just before the Christmas/New Year holiday break.

Future Requests
·     Backwards-compatible changes to Solr records – faceted searching will lead to schema changes, so a process needs to be in place so that we don’t break anything
·     Create centralized code and issue tracking system
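
One way to keep changes to Solr records backwards compatible is to only add fields, never rename or remove them, and to check that rule before records are re-ingested. The sketch below assumes records are plain dictionaries; the facet field names are hypothetical and are not the actual OpenGeoportal schema.

```python
# Hypothetical facet fields and their defaults; not the real schema.
NEW_FIELD_DEFAULTS = {"ThemeFacet": [], "PlaceFacet": []}


def upgrade_record(record, defaults=NEW_FIELD_DEFAULTS):
    """Return a copy of `record` with any missing new fields filled in.

    Existing fields are never renamed, removed, or overwritten, so clients
    written against the old record layout keep working.
    """
    upgraded = dict(record)
    for field, default in defaults.items():
        upgraded.setdefault(field, list(default))
    return upgraded


def is_backwards_compatible(old, new):
    """True if every field in `old` is still in `new` with the same value."""
    return all(field in new and new[field] == value for field, value in old.items())
```

A check like `is_backwards_compatible` could run as part of the nightly build testing so that a breaking schema change is noticed quickly.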

Meeting Agenda

Developer Meet-Up

Developers are meeting at the Tuesday evening Welcome Social at the Wynkoop Brewing Company.

Birds Of A Feather

There’s an OGP Birds Of A Feather meeting on Wednesday at 5:30pm in room Tower A.  Here’s our current agenda:

  • Introductions
  • Governance Issues (20 mins)
  • Federated Search (10 mins)
  • Contributions (10 mins)
  • The Future (20 mins)

Introductions

We’re a new community that needs to work together.  The first step will be to put faces to email addresses and learn what everyone is up to.

  • Your OpenGeoPortal status or plans
  • Your concerns and interests

Governance Issues

To date, the development of the OpenGeoPortal has been led by Tufts with valuable contributions from Harvard and MIT.  This must change.  We all want to evolve beyond this small team that was essential for the initial rapid development to a much broader, energetic community effort.

  • What should we be planning for, and on what schedule?
  • We’re thinking about a diverse advisory board with representation from multiple universities and non-educational organizations, representation from repository managers and developers.  Does that make sense?
  • Does anyone have experience as a founding board member?  Experience working with OSGeo?

Federated Search

Currently each institution runs its own portal with its own Solr search instance.  We manually share layer data as FGDC or Solr ingestable files.  It requires work to keep sharing and ingesting layers.  Should we try to do better?

  • Should we consider sharing via something like GeoNetwork or library infrastructure?
  • Do we want a single, cloud based Solr instance with data from many institutions?
  • Do we want a single portal running in the cloud?
  • Do we want GeoServer and a repository running in the cloud?
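
To make the single shared Solr instance question concrete: with all institutions' records in one index, a portal could still limit a search to the institutions it cares about by adding a filter query. The sketch below only builds the request URL. `q`, `fq`, `rows`, and `wt` are standard Solr query parameters, but the base URL is a placeholder and `Institution` is an assumed field name.

```python
from urllib.parse import urlencode


def build_search_url(solr_base, keyword, institutions=(), rows=20):
    """Build a Solr select URL, optionally restricted to some institutions.

    `solr_base` is a placeholder for the shared instance's URL, and
    "Institution" is an assumed field name used only for illustration.
    """
    params = [("q", keyword), ("rows", str(rows)), ("wt", "json")]
    if institutions:
        clause = " OR ".join(f'"{name}"' for name in institutions)
        params.append(("fq", f"Institution:({clause})"))
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"
```

For example, `build_search_url("http://localhost:8983/solr", "boston", ["Tufts", "MIT"])` would search only Tufts and MIT records, while leaving out the institution list would search everything in the shared index.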

Contributions

A vibrant community needs a “To Do” list.  What should be on it?  Here’s a start:

  • Improving communications
  • Thesaurus based auto-completion and search augmentation
  • Metadata editor
  • Faceted search
  • Automated sharing of metadata
  • Non-map data
  • Improving search
  • Other open-source projects to connect with

The Future

A BoF meeting is a start.  Where do we go from here?

  • Action items
  • Advisory board volunteers
  • Developers signed-up for tasks
  • Further meetings this week