Open Data Should Serve More Than Good Ideas

By Andrew Turner

Walk through any park or college campus and you will quickly notice dirt-worn pathways that lie between and connect the sidewalks, marking pedestrian shortcuts. These desire lines show an initial and repeated optimization that lies outside the paved paths. These ad-hoc networks are either adopted and paved or left to individual use: muddy in rain, undocumented and unsupported by groundskeeping. Entire city (and even national) road networks started as “cowpaths” and, through continued and growing use, became official infrastructure, roads and highways that are relied on as a matter of business.

This road network is the infrastructure that governments develop and maintain so that residents can build communities and businesses can conduct commercial transactions. Information systems are the next generation of government infrastructure. Tim O’Reilly has referred to “Government as a Platform,” identifying these services as a durable backbone on which we must be able to rely to build our numerous and diverse applications.

Opening data: prototype or infrastructure

Open data started as simple file sharing. In my own city of Washington, D.C., the data catalog was a large, easy-to-read list of datasets available in common formats, with metadata and the date each dataset was last updated: many of the common features of popular open data websites. Through a series of public contests and challenges, developers downloaded and used these files to build compelling applications that demonstrated the future of government information technology. In 2008 and 2009, the apps iLive.at, ParkItDC and 311.socialdc.org made use of these open datasets and launched to critical recognition. In 2014, none of these apps is still online.

The fact that not one of the three contest-winning apps is still available shows that the apps were interesting desire lines, but they were not a sustainable platform of information on which citizens could rely. They were indicative of the tendency to build simple, one-time applications that miss the next important step of becoming part of the platform they seek to improve. I have heard similar examples from other cities where civic hackers created well-meaning and well-built applications that sit so far outside existing government operations that they require continued manual upkeep by volunteers. Commonly, these services simply stop receiving updates.

Unfortunately, even the original Washington, D.C. data catalog has atrophied. When I went looking for more recent crime data, I found that the catalog had not been updated since approximately September 2013. I learned that the internal system was being migrated and the transformation process had faltered. Getting the data back online was not a priority: publishing crime data on the portal was too far removed from the actual work of analyzing and responding to crime for anyone to commit to restoring a separate feed on a defined timeline.

Despite the amazing capabilities technology can deliver, government is responsible first and foremost for serving people, not technology. In the end, everything a government does is meant to serve the community that elects it, funds it and, in many cases, works for it. When most civil engineers design roads, they do not apply grandiose design aesthetics and creativity. They pull open their codes and standards, determine the appropriate concrete mixture, depth and rebar based on specifications, and get to work building a road that supports the expected, reliable operations citizens need.

Illustration by Livien Yin

Operational open data is sustainable open data

Numerous studies, reports and community practices have made the case that open data has great potential benefit for residents and businesses, but open data also makes it easier for government agencies to share data with one another, improving efficiency and decision making.

Governments have a difficult and important job. As institutions, they are not enamored with new techniques or file formats. Creating and maintaining one-way extracts of data adds strain that will eventually give way under the pressures of time, money or personnel. Proponents of new technologies need to understand these processes and costs. Technologies must be aligned with government operations if they are ever to become part of the government platform.

For open data to move from a desire line in the grass to a part of the stable infrastructure, we need to design it from the beginning to be practical, sustainable and operational. Open data needs to be the way government operates, built into the living systems that manage and process the data in day-to-day business.

Techniques for opening data, such as extract, transform and load (ETL), allow tremendous flexibility to explore new paths and opportunities. They let governments observe and understand the desire lines of how people want to use data. Governments can then decide whether to make these datasets part of the supported network or to redesign their systems to accommodate new uses.
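To make the idea concrete, here is a minimal Python sketch of a one-way ETL job. It assumes a hypothetical internal `incidents` table (SQLite stands in for the real system of record) with hypothetical column names; it shows the shape of such a pipeline, not any city's actual setup. The output is a snapshot: nothing flows back, and if the internal schema changes during a migration, the job breaks until someone repairs it.

```python
import csv
import io
import sqlite3


def extract(conn: sqlite3.Connection) -> list[tuple]:
    """Extract: pull rows from the internal system (hypothetical table and columns)."""
    return conn.execute(
        "SELECT case_no, reported_on, category, officer_notes FROM incidents ORDER BY case_no"
    ).fetchall()


def transform(rows: list[tuple]) -> list[dict]:
    """Transform: drop internal-only fields and rename/normalize columns for publication."""
    return [
        {"id": case_no, "date": reported_on, "type": category.strip().lower()}
        for case_no, reported_on, category, _notes in rows
    ]


def load(records: list[dict]) -> str:
    """Load: write a CSV snapshot for the data catalog (returned as text in this sketch)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "date", "type"])
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Running `load(transform(extract(conn)))` against the internal database yields the CSV text a catalog would publish; keeping that text current means rerunning the whole job every time.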

I have seen and built many ETL tools and community applications that worked outside of government. They are fast and agile to implement, but they are not durable, long-lived platforms for information access.

We should encourage and work directly with government technical staff to specify and prototype application programming interfaces (APIs). APIs provide a sustainable infrastructure for bringing prototypes to adoption. By publishing an external interface to a service, the provider of that service makes a contract with end users that is independent of the tools used behind that interface. Further, an API makes clear to developers that this is a service the government provider intends to support.
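The sketch below illustrates that contract in Python. The endpoint function `get_incidents` defines the public JSON shape; the two backends behind it, a prototype that reads a CSV extract and an operational one that queries the system of record (SQLite stands in here), are hypothetical, as are all field, table and class names. Applications calling `get_incidents` receive the same response whichever backend is plugged in.

```python
import csv
import io
import json
import sqlite3
from datetime import date
from typing import Protocol


class IncidentSource(Protocol):
    """Hypothetical backend contract: anything that can list incidents since a date."""

    def incidents_since(self, since: date) -> list[dict]:
        ...


class CsvExtractSource:
    """Prototype backend: reads a one-way CSV extract, e.g. a file from a data catalog."""

    def __init__(self, csv_text: str) -> None:
        self._rows = list(csv.DictReader(io.StringIO(csv_text)))

    def incidents_since(self, since: date) -> list[dict]:
        return [
            {"id": row["id"], "date": row["date"], "type": row["type"]}
            for row in self._rows
            if date.fromisoformat(row["date"]) >= since
        ]


class OperationalDbSource:
    """Operational backend: queries the system of record directly."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def incidents_since(self, since: date) -> list[dict]:
        # ISO-8601 date strings compare correctly as text in SQLite.
        cur = self._conn.execute(
            "SELECT id, reported_on, category FROM incidents WHERE reported_on >= ? ORDER BY id",
            (since.isoformat(),),
        )
        # Map internal column names onto the public field names the API promises.
        return [{"id": str(i), "date": d, "type": c} for i, d, c in cur.fetchall()]


def get_incidents(source: IncidentSource, since: str) -> str:
    """Public API endpoint: the JSON shape is the contract, whatever the backend."""
    records = source.incidents_since(date.fromisoformat(since))
    return json.dumps({"since": since, "count": len(records), "incidents": records})


if __name__ == "__main__":
    prototype = CsvExtractSource("id,date,type\n1,2013-08-30,theft\n2,2013-09-02,assault\n")
    print(get_incidents(prototype, "2013-09-01"))
```

Because both backends map their data onto the same public field names, the provider can later retire the prototype without breaking the applications built on the endpoint.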

Data service prototypes can be built quickly with one set of tools. If successful, they can be migrated to existing government infrastructure while keeping the same interface on which applications already rely. Just as the increasing erosion of a well-worn dirt path gives insight into how it is used, usage analytics can help prioritize the development and operationalization of government open data. APIs make it possible to measure usage far more precisely than flat files do. Who knows how many dirt paths there are among the flat files provided by governments?
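As a hedged illustration, here is a minimal Python sketch of the kind of usage log an API gateway could keep in front of open data services. `UsageLog`, the dataset names and the client identifiers are all hypothetical. Ranking datasets by distinct client applications first and total requests second is one possible design choice: it keeps a single heavy user from making a path look more widely worn than it is.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageLog:
    """Hypothetical request log kept by an API gateway."""

    requests: Counter = field(default_factory=Counter)
    clients: defaultdict = field(default_factory=lambda: defaultdict(set))

    def record(self, dataset: str, client_id: str) -> None:
        # Each API call leaves a trace: which dataset, and which application asked for it.
        self.requests[dataset] += 1
        self.clients[dataset].add(client_id)

    def priorities(self) -> list[tuple[str, int, int]]:
        """Rank datasets by distinct client applications, then by total requests."""
        ranked = [(name, len(self.clients[name]), count) for name, count in self.requests.items()]
        return sorted(ranked, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    log = UsageLog()
    calls = [("crime", "app-a"), ("crime", "app-b"), ("crime", "app-a")] + [("permits", "app-c")] * 4
    for dataset, client in calls:
        log.record(dataset, client)
    print(log.priorities())  # → [('crime', 2, 3), ('permits', 1, 4)]
```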

Desire to collaboratively craft

In my six years of living in Washington, D.C., and watching the open government movement surge, I have observed growing enthusiasm among people within government for collaborating in public. Publishing an open data catalog and running a competition is a start, but government representatives are eager to talk about ideas, share code and data, and hear where they can open their infrastructure to new types of creative development.

While much of the commercial web is focused on applications, common and open access via APIs is often ignored. Perhaps this is one case where it is better that the government moves slowly and is just now entering the time of the programmable web. For those who volunteer their time and technical expertise hoping to improve their communities, remember that open data and the apps it powers should not be thought experiments. Work closely with governments to create services people need and platforms to power them.

Andrew Turner is the CTO of Esri R&D DC and is fascinated with the personal history of place. andrew@highearthorbit.com / @ajturner