Last week Virginia Tech’s University Libraries hosted its inaugural Open Data Week with six programs on a variety of open data topics. The new format builds on last year’s Open Data Day, which incorporated a hackathon and roundtable discussions. However, the weekend scheduling and a conflict with spring break this year spurred us to create a new event friendlier to academic schedules, with programs throughout the week. Though we hadn’t heard of anyone having an Open Data Week before, we know that Virginia Tech is supposed to “Invent The Future,” so we did. Here’s a summary of the week’s programs.
In our first program of the week, Data Anonymization: Lessons from a Millennium Challenge Corporation Impact Evaluation, Ralph P. Hall (Urban Affairs and Planning) and Eric Vance (Director of LISA, the Laboratory for Interdisciplinary Statistical Analysis) described their evaluation of a rural water supply project in Mozambique, which involved household surveys (slides, MCC documentation).
The first lesson learned from their evaluation was that everything is linked to informed consent. The primary takeaway here is the importance of distinguishing between anonymity and confidentiality (see slide 18), the latter of which gives researchers much more flexibility. In addition, there were difficulties translating the informed consent into Portuguese and local languages. Other lessons include not underestimating the time required to anonymize data, and designing survey instruments to minimize anonymization challenges. Unfortunately, the anonymization challenges resulted in an analysis that is not reproducible and data that cannot be shared with a follow-up evaluation team. Data anonymization is a persistent and complex issue that needs to be discussed more frequently, and will certainly be on the agenda of future Open Data Weeks.
Our session on The Freedom of Information Act (FOIA) featured three speakers: Wat Hopkins (Dept. of Communication), Steve Capaldo (University Legal Counsel), and Siddhartha Roy (Flint Water Study team).
Wat Hopkins focused on FOIA in Virginia. FOIA first emerged at the federal level from a 1964 Supreme Court case, and subsequently Virginia was among the first to implement FOIA at the state level in the late 1960s. FOIA laws vary greatly from state to state. In Virginia, FOIA applies to records and meetings. Record requests must receive a response within 5 days and do not need to be in writing (though federal FOIA does require written requests), and there are around 130 exemptions. Requests must come from a Virginia citizen, or a news organization with circulation or broadcast in some part of the state. For more information, see Virginia’s Freedom of Information Advisory Council, and the Virginia Coalition for Open Government’s FOI Citizens Guide. Ultimately, we can’t be responsible citizens without access to government information.
Steve Capaldo said that since Virginia Tech is a state agency, it is governed by Virginia FOIA. However, the university responds to requests from everyone, not just residents or the media, and will do so within 5 days. There are many exemptions, including some involving research (proprietary or classified research, and grant proposals), personnel records, and records involving security, such as building plans. He emphasized the importance of making requests as specific as possible in order to reduce the time and effort required to respond. And although it’s not required, Capaldo suggested that it can be helpful when requestors explain the context of their request, because sometimes information needs can be met in alternative ways.
Sid Roy, a member of the Flint Water Study team and a graduate student in Civil and Environmental Engineering, described the Flint water crisis, which has spanned 18 months and affected 100,000 people. In the process, an EPA employee was silenced and the fallout has included several resignations. The crisis response involved FOIA requests to the city of Flint, the Michigan Department of Environmental Quality, and the EPA. Interestingly, federal FOIA requires an acknowledgement of the request within 2 weeks, but there is no time limit for responding with the requested information. Roy relayed the FOIA advice of the project’s leader, Dr. Marc Edwards: first, be as specific as possible in your request, and second, make requests to a related agency that is not the primary target. For example, the team made FOIA requests to Flint in order to obtain communications and data from EPA. Although we ran out of time to discuss FOIA costs, according to the Flint Water Study GoFundMe page, their FOIA expenses came to $3,180 (while you are on that page, consider a donation!). In short, Roy recommended that FOIA should be in every scientist’s toolbox.
In Library Data Services: Supporting Data-Enabled Teaching and Research @ VT, Andi Ogier gave an overview of the three services offered: education (data management and fluency), curation (capturing context and ensuring reuse), and consulting (embedding informatics methods into research, and teaching about proprietary formats and the need for using open standards). Data Services strives to help researchers ensure that their data has an impact on the scholarly record, remains useful over time and across disciplines, and is openly shared for the benefit of humanity. The library helps with data management plans required by funders, and can assign DOIs to datasets. The presentation coincided with the beta release of VTechData, a data repository to help Virginia Tech researchers provide access to and preserve their data.
Show Me the (Open) Data! with librarians Ginny Pannabecker and Andi Ogier was a conversational, exploratory session devoted to identifying open data sets. At the session, they introduced a new guide to finding data, which in addition to listing data sources also includes definitions and information on citing data.
Scraping Websites: How to Automate the Collection of Data from the Web was led by Ben Schoenfeld of Code for New River Valley, a Code for America brigade that meets biweekly to work on civic projects. The slides explain that some programming skills are needed to effectively obtain and clean up data from websites lacking an API, and they outline the basic steps. The live demonstration, using local restaurant health inspection data, did a good job of showing what is possible. One of our developers in the library, Keith Gilbertson, wrote a blog post about the session and how he applied the skills he learned to a database of state salaries.
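For readers who did not attend, here is a minimal sketch of the general obtain-and-clean-up pattern, written in Python with only the standard library. It is not the code from the session: the sample HTML, the column names, and the `TableParser` and `scrape_table` names are all made up for illustration, and a real scraper would first download the page rather than read an inline string.

```python
from html.parser import HTMLParser

# Hypothetical sample standing in for a downloaded page. A real scraper would
# fetch the HTML first, for example with urllib.request.urlopen.
SAMPLE_HTML = """
<table id="inspections">
  <tr><th>Restaurant</th><th>Date</th><th>Violations</th></tr>
  <tr><td> Example Diner </td><td>2016-02-11</td><td> 3 </td></tr>
  <tr><td>Sample Cafe</td><td>2016-02-18</td><td>0</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Clean-up step: join the fragments and trim stray whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    records = []
    for cells in body:
        record = dict(zip(header, cells))
        # Convert the count from text to a number so it can be sorted or summed.
        record["Violations"] = int(record["Violations"])
        records.append(record)
    return records


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Real pages are rarely this tidy, which is where most of the clean-up effort goes. Many people use third-party parsing libraries instead of writing a parser by hand.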
Intro to APIs: What’s an API and How Can I Use One? was led by Neal Feierabend, also of Code for NRV (his slides follow the scraping slides, beginning with slide 17). After an explanation of what APIs (application programming interfaces) are and what types are available, the live demo explored a few APIs, beginning with the Google Maps API. Use of this API is free up to a certain number of page loads, and usage beyond that requires a fee, a model used by many popular APIs. This is one reason Craigslist switched from Google Maps to OpenStreetMap, which as an open mapping tool enables download of the data. Generally, good APIs are those that are well documented. Both Neal and Ben attested to the value of using Stack Overflow and searching the web when encountering coding problems. After the session I found out there are also web services for data extraction like import.io.
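To give a flavor of what using a web API looks like, here is a small Python sketch of the usual pattern: build a request URL from parameters, then parse the JSON that comes back. It is not the code from the session and does not call the Google Maps API. The endpoint `https://api.example.org/v1/geocode`, the parameter names, the response layout, and the sample coordinates are all assumptions made up for illustration, so check a real API's documentation for its actual URL and fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/v1/geocode"


def build_request_url(address, api_key):
    """Encode the query parameters and attach them to the endpoint."""
    query = urlencode({"address": address, "key": api_key})
    return f"{BASE_URL}?{query}"


def parse_location(payload):
    """Pull (lat, lng) out of a JSON response, or None if nothing matched.

    Assumes a made-up response shape: {"results": [{"lat": ..., "lng": ...}]}.
    """
    data = json.loads(payload)
    if not data.get("results"):
        return None
    first = data["results"][0]
    return first["lat"], first["lng"]


def geocode(address, api_key):
    """Send the request over the network and parse the reply (not run here)."""
    with urlopen(build_request_url(address, api_key), timeout=10) as response:
        return parse_location(response.read().decode("utf-8"))


# Illustrative response text standing in for what a server might return.
SAMPLE_RESPONSE = '{"results": [{"lat": 37.2296, "lng": -80.4139}]}'

url = build_request_url("Blacksburg, VA", "DEMO_KEY")
location = parse_location(SAMPLE_RESPONSE)
print(url)
print(location)
```

The API key parameter reflects the metered model described above: many providers use a key to count requests against a free quota.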
Thanks to all of our presenters and attendees, and please let us know if you have suggestions for Open Data Week programs. We hope to do it again next year!