for librarians, researchers and educators interested in information management

Relating to data


I am feeling a little bit pleased with myself, and more than a bit relieved, to have completed a six-week MOOC through Waikato University called Data Mining with Weka, run by Prof Ian Witten. It was an introduction to the basic principles and practical applications of data mining and machine learning, using Weka (Waikato Environment for Knowledge Analysis), open source software developed at the Department of Computer Science, Waikato University, NZ.

Embarking on the course with no knowledge about data mining, or even data itself, I felt a lot like a traveller in strange lands, attempting to learn a language and culture that was completely foreign to me. I’m not sure exactly why I enjoyed it so much, but it was strangely compelling and there was an odd sense of satisfaction mixed with slight uneasiness when I completed the assessments, knowing I knew what the correct answer was (most of the time), but not knowing how I knew it, or what it actually meant.

Anyway, I now know a small something about what a dataset looks like, what an attribute and an instance are, the importance of filtering, classifying, and evaluating data, and I had a taste of exploring data visualisation, among a few other things – all of which is probably the equivalent of being able to order a coffee in Spanish (as opposed to, say, having a conversation with a local about politics).

The subject of data seems to be making its presence felt in my consciousness in a number of ways lately.

A few weeks back I attended a seminar on research data (organised by WAGUL) where I was fortunate to hear from a range of interesting speakers, including the keynote speaker Sam Searle, e-research senior specialist in information management at Griffith University. Sam’s presentation covered a heap of topics, including what we mean by research data, what the key national services are, data citation, and the role of the information manager in research data. Some of the key points I came away with were:

  • There is emerging evidence of a citation advantage from sharing detailed research data. See the article by Timothy Smith in The Conversation, “Scientists must share early and share often to boost citations”
  • Only a small minority of academic libraries in the US, Canada and UK offer research data services, but they certainly plan to in the future (driven by funding agency requirements)
  • Without information professionals, research data management is more likely to be approached as a set of purely technical problems
  • Engagement with researchers around RDM is not about being an expert, but rather listening and facilitating a process
  • There are lots of new roles being created for information professionals in data management, often start-up in nature

Some of the key national services operating out of, and interweaving through, ANDS (Australian National Data Service) are:

  • Research Data Australia  “an Internet-based discovery service designed to provide rich connections between data, projects, researchers and institutions, and promote visibility of Australian research data collections in search engines”.
  • Australian Research Data Commons: sharing and describing data collections, and providing infrastructure.
  • Cite My Data: enables research organisations to assign DOIs to research datasets or collections.

Other important data projects are:

  • NeCTAR (National eResearch Collaboration Tools and Resources): “aims to enhance research collaboration by providing Information and Communication Technology (ICT) infrastructure to: create new centric research capabilities; simplify the combining of instruments, data, computing, and analysis applications; and access to multiple resources to develop research workflows”. This includes providing virtual laboratories, eResearch tools, a federated research cloud and a national server.
  • HuNI (Humanities Networked Infrastructure): is “combining Australia’s most significant cultural datasets to build a national virtual laboratory”. This includes enabling searches across databases and creating individual collections which can be analysed, annotated, exported, published and shared.

Clearly, data management is becoming an increasingly important issue for Australian universities, which are putting in place strategies and policies to help researchers manage their data. However, not many researchers think much about managing their data (as this survey by iVEC’s Florian Goessmann et al. indicates). There is a significant role for the library to play, particularly in providing information, creating awareness, and educating researchers on what is involved, what they should be thinking about, and what services and resources are available to assist in creating a data management strategy. Rather than try to provide solutions, the role for librarians is to be involved in engagement and discussion.

However, while information managers can provide people with things to think about and consider, is there more that we can be involved in?   Those who work in libraries are a diverse lot, and bring a whole range of different skills, interests and experience to bear.

Some of the ways in which I would like to be involved with data on a deeper level are:

  • Engage, collaborate with and contribute to particular research projects.  As an information professional I would like to be able to offer meaningful assistance by being involved with research groups at the grass roots level, assisting with real situations, dealing with specific types of data and the various issues these throw up.
  • Be an active contributor to research projects.  The digital humanities offers a lot of potential here, because it overlaps so well with interests of the library including archives, metadata, and storage of humanities data.  Data visualisation is one example where the library can play a central, collaborative role.
  • Not only develop skills and knowledge around data, but also be able to teach others – there is a role for the library to be involved in teaching digital literacies, of which understanding data is a key element.
  • Use data for projects within the library. Libraries can use their own data for research, for example through learning analytics, to improve the quality of the learning services they provide.

Having returned from my travels in the data mining MOOC more or less unscathed, I’ve been wondering if there are more courses on data around to help develop my infant skills, particularly in the area of data visualisation.  There’s one on Infographics and Data Visualisation running at the moment, and another on Information Visualisation starting in January, which I’m considering doing.

For now, though, I’m off to order a coffee (in English) and – let me be very clear about this – that is not a metaphor and in no way means I’m about to use linear regression to determine a correlation coefficient using 10-fold cross-validation. Adios!


This entry was posted on October 23, 2013 in data, digital humanities, research support.