Data Science for Librarians: why you should care and what you can do

Big data and small data

2012 was the year when everyone got to know the term Big Data. As with other buzzwords, there are various definitions of the term. In his article on the topic, Timo Elliott enumerates seven definitions of Big Data, ranging from its original meaning to the aspects of technology and data destination, as well as its use as a new term for old stuff.

Big Data was listed among the Top 10 Technology Trends for 2013 by the renowned Gartner Inc. More importantly, people are seeing it at work every day. One example is Nate Silver’s prediction of the 2012 US presidential election. As a statistician, he correctly predicted the winner in all 50 states by exploring, collecting and analyzing large volumes of data.

On the other hand, many feel that the concept of Big Data is too broad and vague to define. Moreover, because of the resources and expertise Big Data requires, it may not be the best solution for smaller companies. In response, some people have coined the term “Small Data” for smaller datasets that often reside on smaller software or platforms and whose structure is more decentralized. Regardless of names and volumes, it’s probably safe to say that what really matters is not these dry catchphrases but data science itself, which includes (but is not limited to) collecting, managing, preserving and presenting data.

Why librarians should care about data

First of all, libraries are part of the broader information industry, and data are the source of the information we care about. As librarians, we offer library members all kinds of information (books, other materials, and reference services), and before we can offer it, we need to collect and manage it. As a result, many library processes can be seen as part of data science.

Another reason, perhaps a more direct one, is that our members (especially those of academic libraries) need these services, and we have the expertise to support them. Since January 2011, a data management plan has been required for all grant applications to the National Science Foundation. As a result, research communities need librarians to help them with data and data management, which is also a chance for librarians to rebrand themselves as more valuable facilitators of academic research.

Last but not least, when we offer services and members use our libraries, a huge amount of data is created, for example, data about our members and how they use the library. Once appropriately gathered and analyzed, these data can help us build better services.

Data services

Data services are one example of libraries participating in the data management movement, and many people see them as a promising library service for the future. An increasing number of academic libraries are offering these services to their university communities. The ACRL Research Planning and Review Committee identified data curation as one of the top ten trends in academic libraries in 2012.

In 2011, ACRL conducted a survey of how academic libraries in North America are offering data services. According to the survey, data services are still at an early stage in most libraries, but many libraries are doing a good job of using their traditional expertise to enter this field. For example, nearly half of the participants (44.1%) offer reference services that help members find and cite data or data sets.

Learning resources

As promising as the future of data services seems to be, it is easy to find gaps between where we are and where we will (or should) be. Besides traditional library skills, librarians definitely need better technical skills and knowledge of specific research fields to offer these services. This list of expertise may look daunting to newcomers to the field. But don’t worry. There are various resources you can make good use of.

The first kind of resource is open online courses. Bill Howe from the University of Washington offered a MOOC on Coursera this summer titled “Introduction to Data Science”, which covers some of the “hard-core” technical skills needed in this field. Another course with the same title will be offered this fall by the School of Information Studies at Syracuse University. For open online courses that are not that massive, RDMRose and 3TU.Datacentrum are two projects where you can find useful course materials.

More and more library researchers and librarians have been paying attention to this field, and a large number of books, articles and reports on the topic have been published. One example is the book “Introduction to Data Science” by Jeffrey M. Stanton, a professor in the School of Information Studies at Syracuse University. You can download the PDF version of this “open source” textbook from Professor Stanton’s website.

Another source worth mentioning is conferences. For example, at this year’s ALA Annual Conference there were a number of sessions on this topic. One of them was “Data, E-Data, Data Curation: Our New Frontier”, in which practitioners from the California Digital Library, Purdue University Libraries, and the University of Illinois at Chicago shared their experiences and observations of this field.

Is your library offering data services? Do you know of any other data management resources that librarians can use? And what’s your opinion of this article? We are interested in hearing from you.