for librarians, researchers and educators interested in information management
I am feeling a little bit pleased with myself, and more than a bit relieved, to have completed a six-week MOOC through Waikato University called Data Mining with Weka, run by Prof Ian Witten. It was an introduction to the basic principles and practical applications of data mining and machine learning, using Weka (Waikato Environment for Knowledge Analysis), open source software developed at the Department of Computer Science, Waikato University, NZ.
Embarking on the course with no knowledge about data mining, or even data itself, I felt a lot like a traveller in strange lands, attempting to learn a language and culture that was completely foreign to me. I’m not sure exactly why I enjoyed it so much, but it was strangely compelling and there was an odd sense of satisfaction mixed with slight uneasiness when I completed the assessments, knowing I knew what the correct answer was (most of the time), but not knowing how I knew it, or what it actually meant.
Anyway, I now know a small something about what a dataset looks like, what an attribute and an instance are, the importance of filtering, classifying, and evaluating data, and I had a taste of exploring data visualisation, among a few other things – all of which is probably the equivalent of being able to order a coffee (as opposed to, say, having a conversation with a local about politics) in Spanish.
The subject of data seems to be making its presence felt in my consciousness in a number of ways lately.
A few weeks back I attended a seminar on research data (organised by WAGUL) where I was fortunate to hear from a range of interesting speakers, including the keynote speaker Sam Searle, e-research senior specialist in information management at Griffith University. Sam’s presentation covered a heap of topics – including what we mean by research data, what the key national services are, data citation, and the role of the information manager in research data. Some of the key points I came away with were:
Some of the key national services operating out of, and interweaving through, ANDS (Australian National Data Service) are:
Clearly, data management is becoming an increasingly important issue for Australian universities, which are putting in place strategies and policies to help researchers manage their data. However, not many researchers think much about managing their data (as this survey by iVEC’s Florian Goessmann et al. indicates). There is a significant role for the library to play, particularly in providing information, creating awareness, and educating researchers on what is involved, what they should be thinking about, and what services and resources are available to assist in creating a data management strategy. Rather than try to provide solutions, the role for librarians is to be involved in engagement and discussion.
However, while information managers can provide people with things to think about and consider, is there more that we can be involved in? Those who work in libraries are a diverse lot, and bring a whole range of different skills, interests and experience to bear on their work.
Some of the ways in which I would like to be involved with data on a deeper level are:
Having returned from my travels in the data mining MOOC more or less unscathed, I’ve been wondering if there are more courses on data around to help develop my infant skills, particularly in the area of data visualisation. There’s one on Infographics and Data Visualisation running at the moment, and another on Information Visualisation starting in January, which I’m considering doing.
For now, though, I’m off to order a coffee (in English) and – let me be very clear about this – that is not a metaphor and in no way means I’m about to use linear regression to determine a correlation coefficient using 10-fold cross-validation. Adios!