Open@VT

Open Access, Open Data, and Open Educational Resources


Open Data Week Will Feature ContentMine, Data Visualization, Panel Discussions

The University Libraries will host its second Open Data Week on April 10-13, with opportunities to learn more about sharing, visualizing, finding, mining, and reusing data for research. In addition to panel discussions on open research data and on text and data mining, there will be two sessions on data visualization. From Tuesday through Thursday, join one or more sessions featuring guests Thomas Arrow and Stefan Kasberger from ContentMine to learn about open source tools in development for mining scholarly and research literature. ContentMine software “allows users to gather papers from many different sources, standardize the material, and process them to look up and/or search for key terms, phrases, patterns, and more.” Be sure to register for limited-capacity events (lunch on Wednesday 4/12 and the in-depth workshop on Thursday 4/13); links and the full schedule are below. For more information, see our Open Data Week guide, and use our hashtag, #VTODW.
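The quoted description boils down to a simple pipeline: gather documents, normalize the text, then search for key terms. The following sketch illustrates that idea only; it is not ContentMine's actual toolchain, and the sample papers and function name are invented for illustration.

```python
import re
from collections import Counter

# Illustrative sketch of the "gather, standardize, search" idea only.
# ContentMine's real tools work differently; here we just normalize
# text and tally case-insensitive whole-word matches for key terms.

def count_terms(documents, terms):
    """Count occurrences of each search term across a list of documents."""
    counts = Counter()
    for text in documents:
        normalized = text.lower()  # a stand-in for real standardization
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[term] += len(re.findall(pattern, normalized))
    return counts

# Invented sample "papers" for demonstration
papers = [
    "Phylogenetic analysis of beetle species using open data.",
    "Open data practices improve reproducibility in phylogenetics.",
]
print(count_terms(papers, ["open data", "phylogenetic"]))
```

A real pipeline would replace the hard-coded list with papers fetched from repositories and would normalize formats (PDF, XML, HTML) before searching, which is the hard part the ContentMine sessions cover.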

Open Data Week featuring ContentMine

Monday April 10
Open Research/Open Data Forum: Transparency, Sharing, and Reproducibility in Scholarship
6:30-8:00pm, in Torgersen Hall 1100 (NLI credit available)

Join our panelists for a discussion on challenges and opportunities related to sharing and using open data in research, including meeting funder and journal guidelines:

  • Daniel Chen (Ph.D. candidate in Genetics, Bioinformatics, and Computational Biology)
  • Karen DePauw (Vice President and Dean for Graduate Education)
  • Sally Morton (Dean, College of Science)
  • Jon Petters (Data Management Consultant, University Libraries)
  • David Radcliffe (English)
  • Laura Sands (Center for Gerontology)

Tuesday April 11
Introduction to ContentMine: Tools for Mining Scholarly Literature
9:30-10:45am, Newman Library Multipurpose Room (NLI credit available)

Join ContentMine instructors for an overview of text and data mining, and an introduction to ContentMine tools for text and data mining of scholarly and research literature.

Tuesday April 11
Data Visualization with Tableau
10:30am-12:00pm, Torgersen 1100 (NLI registration)

With the Tableau data visualization software, you or your students can easily turn research data into detailed, interactive visualizations that tell the story numbers alone struggle to express. The software can link directly to your data sources, so you always have the most up-to-date data on hand without manual exports, and it can generate hundreds of types of visualizations with interactive elements.

Wednesday April 12
Introduction to ContentMine: Tools for Mining Scholarly Literature
9:00-9:55am, Newman Library Multipurpose Room (NLI credit available)

Join ContentMine instructors for an overview of text and data mining, and an introduction to ContentMine tools for text and data mining of scholarly and research literature.

Wednesday April 12
Making Visible the Invisible: Data Visualization and Poster Design
9:30-11:00am, Newman 207A (NLI registration)

Visually representing data helps users and readers engage with the content, understand key findings, and retain information. Exploring, creating, and presenting these visual representations is becoming critical for teaching, academic research, and professional engagement. In this session we will explore the basics of data visualization and poster design, and look at a few tools to create different kinds of visualizations. We will also discuss the academic and professional value in visualizing data.

Wednesday April 12
ContentMine and Specialized Tools for Life Sciences Research
11:15-12:05pm, Newman Library Multipurpose Room (NLI credit available)

Join students in a computational biochemistry informatics class session for an introduction to ContentMine open source tools for text and data mining of research literature, with a focus on tools for exploring content in Life Sciences research (phylogeny and visualization).

Wednesday April 12
Lunch with ContentMine guest speakers and program participants
12:30-1:30pm, Location TBA (Registration required; limit: 50 participants)

Wednesday April 12
Text and Data Mining Forum
2:30-3:45pm, Newman Library Multipurpose Room (NLI credit available)

Join our panelists for a discussion about opportunities and challenges related to text and data mining, with a focus on research purposes and information access. Audience questions are encouraged.

  • Tom Arrow (ContentMine)
  • Tom Ewing (College of Liberal Arts and Human Sciences, Virginia Tech)
  • Weiguo (Patrick) Fan (Pamplin College of Business, Virginia Tech)
  • Ed Fox (Computer Science, Virginia Tech)
  • Leanna House (Statistics, Virginia Tech)
  • Brent Huang (Computer Science, Virginia Tech)

ContentMine logo

Wednesday April 12
Introduction to ContentMine: Tools for Mining Scholarly Literature
4:00-5:15pm, Newman ScaleUp Classroom (101S) (NLI credit available)

Join ContentMine instructors for an overview of text and data mining, and an introduction to ContentMine tools for text and data mining of scholarly and research literature.

Thursday April 13
ContentMine Tools to Explore Scholarly Literature: A Full Day, Hands-On Workshop
9:00am – 4:00pm, Newman Library 207A (Registration required; also, NLI credit available; Coffee and Lunch provided)

During this workshop participants will: (1) ensure the software is functioning on their laptop computers, (2) take part in individual and group hands-on exercises to become more familiar with ContentMine tools, and (3) have the opportunity, with support from the ContentMine instructors, to experiment with using ContentMine tools to mine scholarly literature and explore results specific to their own research project goals. Prior to the workshop, attendees will receive instructions to download software and make any other preparations to get the most out of the workshop.

Open Data Week in Review

Last week Virginia Tech’s University Libraries hosted its inaugural Open Data Week with six programs on a variety of open data topics. The new format builds on last year’s Open Data Day, which incorporated a hackathon and roundtable discussions. However, the weekend scheduling and a conflict with spring break this year spurred us to create a new event friendlier to academic schedules, with programs throughout the week. Though we hadn’t heard of anyone having an Open Data Week before, we know that Virginia Tech is supposed to “Invent The Future,” so we did. Here’s a summary of the week’s programs.

Open Data Week logo

In our first program of the week, Data Anonymization: Lessons from a Millennium Challenge Corporation Impact Evaluation, Ralph P. Hall (Urban Affairs and Planning) and Eric Vance (Director, LISA, the Laboratory for Interdisciplinary Statistical Analysis) described their evaluation of a rural water supply project in Mozambique, which involved household surveys (slides, MCC documentation).

Ralph P. Hall

The first lesson from their evaluation was that everything is linked to informed consent. The primary takeaway here is the importance of distinguishing between anonymity and confidentiality (see slide 18); the latter gives researchers much more flexibility. In addition, there were difficulties translating the informed consent into Portuguese and local languages. Other lessons include not underestimating the time required to anonymize data, and designing survey instruments to minimize anonymization challenges. Unfortunately, the anonymization challenges resulted in an analysis that is not reproducible and data that cannot be shared with a follow-up evaluation team. Data anonymization is a persistent and complex issue that needs to be discussed more frequently, and it will certainly be on the agenda of future Open Data Weeks.
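One small, common step in de-identifying survey data is replacing direct identifiers with opaque pseudonyms. The sketch below shows that step under invented names (the salt, IDs, and fields are hypothetical); as the talk stressed, this alone is not anonymization, since quasi-identifiers such as village, age, or household size can still re-identify respondents.

```python
import hashlib

# Hypothetical pseudonymization sketch: map each direct identifier to
# a deterministic, salted hash. The salt must stay out of the shared
# dataset, or the mapping can be reversed by brute force over IDs.

SALT = b"project-specific-secret"  # invented for illustration

def pseudonymize(household_id: str) -> str:
    """Deterministically map an identifier to an opaque token."""
    digest = hashlib.sha256(SALT + household_id.encode("utf-8"))
    return digest.hexdigest()[:12]

# Invented sample records
records = [
    {"household_id": "MZ-0417", "water_source": "borehole"},
    {"household_id": "MZ-0418", "water_source": "river"},
]
shared = [{**r, "household_id": pseudonymize(r["household_id"])} for r in records]
```

Determinism is the design point: the same household gets the same token across survey rounds, so records can still be linked for a follow-up evaluation without exposing the original IDs.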

Our session on The Freedom of Information Act (FOIA) featured three speakers: Wat Hopkins (Dept. of Communication), Steve Capaldo (University Legal Counsel), and Siddhartha Roy (Flint Water Study team).

Wat Hopkins

Wat Hopkins focused on FOIA in Virginia. FOIA first emerged at the federal level from a 1964 Supreme Court case, and subsequently Virginia was among the first to implement FOIA at the state level in the late 1960s. FOIA laws vary greatly from state to state. In Virginia, FOIA applies to records and meetings. Record requests must receive a response within 5 days and do not need to be in writing (though federal FOIA does require it), and there are around 130 exemptions. Requests must come from a Virginia citizen, or a news organization with circulation or broadcast in some part of the state. For more information, see Virginia’s Freedom of Information Advisory Council, and the Virginia Coalition for Open Government’s FOI Citizens Guide. Ultimately, we can’t be responsible citizens without access to government information.

Steve Capaldo said that since Virginia Tech is a state agency, it is governed by Virginia FOIA. However, the university responds to requests from everyone, not just residents or the media, and will do so within 5 days. There are many exemptions, including some involving research (proprietary or classified research, and grant proposals), personnel records, and records involving security, such as building plans. He emphasized the importance of making requests as specific as possible in order to reduce the time and effort required to respond. And although it’s not required, Capaldo suggested that it can be helpful when requestors explain the context of their request, because sometimes information needs can be met in alternative ways.

Sid Roy, a member of the Flint Water Study team and a graduate student in Civil and Environmental Engineering, described the Flint water crisis, which has spanned 18 months and affected 100,000 people. In the process, an EPA employee was silenced, and the fallout has included several resignations. The crisis response involved FOIA requests to the city of Flint, the Michigan Department of Environmental Quality, and the EPA. Interestingly, federal FOIA requires an acknowledgement of the request within 2 weeks, but there is no time limit for responding with the requested information. Roy relayed the FOIA advice of the project’s leader, Dr. Marc Edwards: first, be as specific as possible in your request, and second, make requests to a related agency that is not the primary target. For example, the team made FOIA requests to Flint in order to obtain communications and data from the EPA. Although we ran out of time to discuss FOIA costs, according to the Flint Water Study GoFundMe page, their FOIA expenses came to $3,180 (while you are on that page, consider a donation!). In short, Roy recommended that FOIA be in every scientist’s toolbox.

In Library Data Services: Supporting Data-Enabled Teaching and Research @ VT, Andi Ogier gave an overview of the three services offered: education (data management and fluency), curation (capturing context and ensuring reuse), and consulting (embedding informatics methods into research, and teaching about proprietary formats and the need for using open standards). Data Services strives to help researchers have their data achieve impact on the scholarly record, remain useful over time and across disciplines, and have it openly shared for the benefit of humanity. The library helps with data management plans required by funders, and can assign DOIs to datasets. The presentation coincided with the beta release of VTechData, a data repository to help Virginia Tech researchers provide access to and preserve their data.

Show Me the (Open) Data! with librarians Ginny Pannabecker and Andi Ogier was a conversational, exploratory session devoted to identifying open data sets. At the session, they introduced a new guide to finding data, which in addition to listing data sources also includes definitions and information on citing data.

Web Scraping session with Ben Schoenfeld

Scraping Websites: How to Automate the Collection of Data from the Web was led by Ben Schoenfeld of Code for New River Valley, a Code for America brigade that meets biweekly to work on civic projects. As the slides explain, some programming skills are needed to effectively obtain and clean up data from websites lacking an API, and the basic steps are outlined. The live demonstration, using local restaurant health inspection data, did a good job of showing what is possible. One of our developers in the library, Keith Gilbertson, wrote a blog post about the session and how he applied the skills he learned to a database of state salaries.
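The basic steps the session outlined, fetch a page, parse its HTML, and extract structured rows, can be sketched with nothing but the standard library. This is an illustrative toy, not the session's actual code: the page below is hard-coded (with invented restaurant data) so it runs offline, where a real scraper would fetch live pages and respect the site's robots.txt and terms of use.

```python
from html.parser import HTMLParser

# Invented sample page standing in for a fetched inspection-results page
SAMPLE_PAGE = """
<table>
  <tr><td>Joe's Diner</td><td>2 violations</td></tr>
  <tr><td>Cafe Main</td><td>0 violations</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = []        # cells of the row being read
        self._in_td = False   # are we inside a <td>?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.rows)  # [["Joe's Diner", "2 violations"], ["Cafe Main", "0 violations"]]
```

The cleanup step the session emphasized happens in `handle_data`; real pages need far more of it (stray whitespace, nested tags, inconsistent markup), which is why libraries like Beautiful Soup are popular for this work.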

Intro to APIs: What’s an API and How Can I Use One? was led by Neal Feierabend, also of Code for NRV (his slides follow the scraping slides, starting at slide 17). After an explanation of what APIs (application programming interfaces) are and what types are available, the live demo explored a few APIs, beginning with the Google Maps API. Use of this API is free up to a certain number of page loads, and usage beyond that requires a fee, a model used by many popular APIs. This is one reason Craigslist switched from Google Maps to OpenStreetMap, which as an open mapping tool enables download of the underlying data. Generally, good APIs are well documented. Both Neal and Ben attested to the value of searching the web and Stack Overflow when encountering coding problems. After the session I found out there are also web services for data extraction, such as import.io.
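Most web APIs of the kind demoed are just HTTP requests with query parameters, often including an API key that meters usage. The sketch below shows how such a request URL is assembled; the endpoint, parameter names, and key are invented to mirror the general style of geocoding APIs, not any provider's actual interface.

```python
from urllib.parse import urlencode

# Illustrative sketch: compose a typical keyed API request URL.
# The endpoint and parameter names are hypothetical.

def build_request(base_url: str, **params: str) -> str:
    """Compose an API request URL from a base endpoint and query parameters."""
    return base_url + "?" + urlencode(params)

url = build_request(
    "https://api.example.com/geocode",
    address="Newman Library, Blacksburg, VA",
    key="YOUR_API_KEY",  # usage beyond a free tier is billed per key
)
print(url)
```

Fetching the URL (with `urllib.request` or a library like requests) typically returns JSON, which is what makes APIs so much less brittle than scraping: the structure is documented and intended for programs, not browsers.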

Thanks to all of our presenters and attendees, and please let us know if you have suggestions for Open Data Week programs. We hope to do it again next year!