By Derek McAuley, Hanif Rahemtulla, James Goulding and Catherine Souch
‘Open Data’ refers to the philosophical and methodological approach to democratising data, enabling individuals, communities and organisations to access and create value through the reuse of non-sensitive, publicly available information. This data is typically available online at no cost to citizen groups, non-governmental organisations (NGOs) and businesses. Some view this as the logical conclusion of the Freedom of Information (FoI) Acts in various countries: if citizens can ask for the data, why not simply publish it in the first place?
Today, Open Data is gathering momentum and forms part of a global movement, linked to others such as Open Access and Open Source. Open Data initiatives will, it is envisaged, support greater transparency and accountability within government, as well as lead to economic development in commercial sectors and improved public sector service delivery. Integral to this vision is that information hitherto held in hidden databases is opened to the public and, furthermore, released in a form that facilitates easy reuse.
To date, the Open Data movement has created great excitement in developer communities. Social and commercial entrepreneurs are producing a seemingly endless stream of innovative applications that repurpose and enrich publicly available data, across multiple sectors, including health, transport, education and the environment. This new wave of creativity is characterised by Sir Tim Berners-Lee (creator of the World Wide Web) as the combination of information, creative vision and digital technology.
However, smart governments should not rely solely on the organic growth produced by entrepreneurs. Rather, as argued by Eaves:
‘Forward-looking governments – those that want an engaged citizenry, a 21st-century workforce and a creative, knowledge-based economy in their jurisdiction – will reach out to universities, colleges and schools and encourage them to get their students using, visualising, writing about and generally engaging with Open Data.’
Such engagement would give this generation a sense of opportunity to interact with and participate in this wave of innovation and change, empowering citizens to improve services, reduce costs and boost productivity.
To illustrate with an example from the UK, consider www.police.uk, which had just launched at the time of writing amid a fanfare of publicity. It was immediately pilloried from various quarters for overloaded servers and sluggish service (indicating, at least, an intrigued public) and for inaccurate data. If the last criticism is true, then surely this was a great opportunity to encourage bottom-up correction of a large public database that police agencies work with on a daily basis. However, few voices pointed to such opportunities, or highlighted how combining this data with other information, for example economic data from the Office for National Statistics (ONS), could help researchers in geography, sociology and criminology develop valuable insights into the relationship between employment and crime.
Such researchers within higher education institutions are at the vanguard of the Open Data movement, whether as evangelists, users or technologists. Higher education has pioneered the use of web technologies, with institutions making large amounts of information available to students, commercial partners, funding agencies and staff. Yet much remains accessible only through FoI requests, and most data resides on static web pages rather than in common data formats that enable reuse. In the UK, the Joint Information Systems Committee (JISC) has been developing open data standards of this kind, in initiatives covering areas such as ePortfolios and course definitions; if adopted by a sufficiently large proportion of the sector, these would enable a wave of innovation.
Integral to this growth in innovative data use and repurposing is training in data literacy within higher education. Data literacy, defined here as the ability to identify, retrieve, evaluate and use information both to ask and to answer meaningful questions, is an important civic skill that forms the foundation of an innovative knowledge economy and an increasingly data-driven society. To see why, one need only reflect on a recent statement by Richard Sterling (former head of data.gov.uk). In July 2010 he acknowledged that the public are already struggling to make sense of the huge volume of datasets published online, expressing concern that individuals may be coming to conclusions that ‘weren’t quite valid’ after browsing the 5,850 datasets available on data.gov.uk. Sterling attributes this to the format of the information (e.g. its structure, configuration and pre-processing), which limits end users’ capacity to make use of the data. As Davis (2010) states, much public sector information is ‘simply not collected in a usable form at present’ (Allan, 2009), and the systematic organisation of information is not a neutral act: it involves decisions that affect both its interpretation and its future use (Snowdon, 2010). For example, the conclusions that can be drawn from the recently released crime statistics are clearly a function both of how the data is collected and of the level of aggregation used to preserve the privacy of individual households.
Further, addressing these challenges by providing online query and visualisation tools to ‘make it easier to analyse and visualise the data’, as proposed by Sterling (2010), assumes that the public have the knowledge and skills to interpret and use data, and that they understand the sources of uncertainty introduced when different open datasets are conflated. Even something as apparently simple as where an event took place is a complex problem. Data recorded against local government boundaries, which are subject to change, can only be interpreted correctly with access to a historical archive of those boundaries.
Herein lies an important distinction between two often-conflated memes: Open Data and Linked Data. (A ‘meme’, a term coined by Richard Dawkins, is a unit of social information, such as an idea or belief, that is transmitted from one person or group to another, by analogy with a gene.) The former represents an unequivocal step forward in access to, and public ownership of, large datasets. It is the latter, however, that has the potential to be a powerful, positive and disruptive force in higher education. As Meltzoff et al. (2009) observe:
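The boundary problem can be made concrete with a small sketch. It assumes a hypothetical archive in which each area code maps to a list of boundary versions, each valid over a date range. A record is interpreted by looking up the version in force on the date it was made. The area code, dates and version labels are all invented for illustration and do not correspond to any real boundary dataset.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical boundary archive: area code -> boundary versions, each valid
# from valid_from to valid_to inclusive (None means "still in force").
# The code, dates and labels are invented for illustration only.
BoundaryVersion = Tuple[date, Optional[date], str]

BOUNDARY_ARCHIVE: Dict[str, List[BoundaryVersion]] = {
    "AREA-001": [
        (date(2001, 1, 1), date(2008, 12, 31), "AREA-001 (2001 boundary)"),
        (date(2009, 1, 1), None, "AREA-001 (2009 boundary)"),
    ],
}


def boundary_on(area_code: str, recorded: date) -> str:
    """Return the boundary version in force for area_code on the recording date."""
    for valid_from, valid_to, label in BOUNDARY_ARCHIVE[area_code]:
        if valid_from <= recorded and (valid_to is None or recorded <= valid_to):
            return label
    raise LookupError(f"no boundary recorded for {area_code} on {recorded}")


# The same code denotes different geographies depending on when data was recorded.
print(boundary_on("AREA-001", date(2005, 6, 1)))  # AREA-001 (2001 boundary)
print(boundary_on("AREA-001", date(2010, 6, 1)))  # AREA-001 (2009 boundary)
```

Without such an archive, counts recorded against the same code in 2005 and in 2010 would appear directly comparable, even though they may describe two different areas.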
‘Insights from many different fields are converging to create a new science of learning that may transform education practice.’
It is Linked Data, with its facility to cross-correlate traditionally disparate, ring-fenced research resources, such as scientific, geographical, economic and sociological datasets, that will be a central tool in this transformation.
Linked Data uses familiar web addresses (HTTP URIs) to identify resources and to link Open Data sources together. This allows higher education to benefit from a ‘network effect’ as educational data is liberated from its traditional silos. Richer, interconnected information environments will produce richer learning environments and a host of new opportunities:
- simplifying resource discovery and promoting personal exploration of material;
- supporting the integration of distributed discourse while encouraging referencing skills;
- enhancing the construction of both personal and group knowledge while promoting self-directed learning;
- facilitating better argumentation and critical thinking through advanced reasoning over large volumes of resources.
Moreover, because Linked Data is a powerful tool for independent learning, it does all this with the added benefit of further disintermediating educators.
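The linking mechanism can be sketched minimally. Two datasets are published independently, but both refer to the same area by the same URI. A link in one can therefore be followed straight into the other without a bespoke mapping. The sketch models triples as plain Python tuples rather than using an RDF library. Every URI under example.org, both vocabulary terms and the value 0.07 are hypothetical.

```python
# Minimal Linked Data sketch: each dataset is a list of (subject, predicate,
# object) triples. All URIs and the value 0.07 are hypothetical placeholders.
AREA = "http://example.org/area/A1"
LOCATED_IN = "http://example.org/vocab/locatedIn"
UNEMPLOYMENT_RATE = "http://example.org/vocab/unemploymentRate"

# Published by one (hypothetical) body: crime reports and the area they occurred in.
crime_triples = [
    ("http://example.org/crime/report/1", LOCATED_IN, AREA),
    ("http://example.org/crime/report/2", LOCATED_IN, AREA),
]

# Published independently by another (hypothetical) body about the same area URI.
economic_triples = [
    (AREA, UNEMPLOYMENT_RATE, 0.07),
]


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def unemployment_for_report(report_uri):
    """Follow locatedIn from the crime data into the economic data via the shared URI."""
    rates = []
    for area in objects(crime_triples, report_uri, LOCATED_IN):
        rates.extend(objects(economic_triples, area, UNEMPLOYMENT_RATE))
    return rates


print(unemployment_for_report("http://example.org/crime/report/1"))  # [0.07]
```

In practice, such a traversal would more likely be written as a SPARQL query over the publishers' data. The point of the sketch is only that the shared identifier is what makes the join possible.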
Realisation of this potential has not only begun but is proceeding apace. The hard sciences have paved the way through projects such as Bio2RDF and Linked Life Data, which provide immense corpora of life-science information. Economists are harnessing Linked Data from public sector bodies such as the World Bank and the Office for National Statistics. They also draw on a growing number of private sector producers, such as Xignite, which provides access to live financial information. Geographers can draw on geospatial Linked Data services such as GeoNames and LinkedGeoData, the latter with its 350 million queryable geographical features. Sociologists now have unprecedented access to the European Union’s statistical data, thanks to the Riese project, with its 3 billion queryable Eurostat-derived facts.
The value of these resources to higher education lies not merely in their openness and accessibility, but in their interconnectivity. The ability to query as well as browse, and to fuse data from multiple sources, can generate both novel research discoveries and compelling educational experiences. Consider, for instance, the educational worth, research value and policy implications of linking socio-economic data from Riese with epidemiological patterns referenced by Linked Life Data. The result could then be joined with travel patterns indicated within LinkedGeoData.
Linked Data shows signs of gaining traction in higher education. However, ‘despite undoubted progress, the green shoots of a Linked Data ecology remain delicate’ (Miller, 2010), and so we must take great care to reinforce the progress of this revolution. Higher education technologies require scalable interdisciplinary design. Linked Data affords exactly that, but the policies surrounding it must be grounded in communication and the sharing of expertise among research disciplines. Several cross-cutting issues stand out, of which information literacy is the most pressing. The DBpedia project now exposes Wikipedia content as Linked Data, and services such as Freebase are expanding rapidly. Educating students to distinguish between reliable and unreliable resources is therefore paramount. For our part, we must not only provide methodologies for making this distinction, but also actively ensure that such distinctions are achievable in the first place. Bechhofer argues that we must therefore turn our attention to publishing requirements such as data provenance, quality and attribution. Unless these are addressed, simply publishing data into the cloud will not meet the requirements of reuse.
The Open Data revolution and emerging technologies such as Linked Data offer exciting opportunities for higher education, allowing substantial learning challenges to be met by interlinking resources across disciplines and institutions. Policy must reinforce the progress already made. It should encourage institutions to release their data openly in a linkable form and to deploy applications that use these resources within their educational programmes. Importantly, it should also encourage them to enhance emerging data vocabularies rather than engage in the top-down, didactic creation of new ones. Nonetheless, many challenges remain. There are fundamental epistemological differences in how different cultures, communities and disciplines (and even academics within a single discipline) view the same information. Hence we need to be aware of, and embrace, different and even conflicting vocabularies. New applications of data will revolutionise higher education, but higher education must in turn take the lead in driving up data literacy among staff, students and the wider population.
Professor Derek McAuley is Professor of Digital Economy in the School of Computer Science and Director of Horizon, a Digital Economy Research Institute, at the University of Nottingham, and Affiliated Lecturer at the University of Cambridge Computer Laboratory. He is a Fellow of the British Computer Society and a member of the UKCRC, a computing research expert panel of the IET and BCS.
Dr. Hanif Rahemtulla is a Geospatial Scientist at the University of Nottingham and External Lecturer and Honorary Fellow at University College London. His research focuses principally on three areas. The first is geographic information policy, particularly Open and Linked Data. The second is the handling and analysis of environmental information. The third is wider philosophical questions about the societal impacts of information and communication technologies.
Dr. James Goulding is an early career researcher with a rapidly growing list of international publications across the fields of data theory, location-based services, information retrieval and ubiquitous computing. He is a highly experienced software engineer with a passion for Open and Linked Data and an extensive range of programming skills, specialising in mobile technologies, distributed databases and artificial intelligence techniques.
Horizon Digital Economy Research at the University of Nottingham represents an initial £40 million investment by Research Councils UK, the University of Nottingham and more than 100 academic and industrial partners in both a Research Hub and a Doctoral Training Centre within the RCUK Digital Economy programme. Horizon brings together researchers with backgrounds in computer science, the geospatial sciences, engineering, psychology, sociology, business, social science, law and the arts. Their aim is to build an understanding of people and society into technology developments from the outset, and to ensure users benefit from these advances.
Dr. Catherine Souch is Head of Research and Higher Education at the Royal Geographical Society (with the Institute of British Geographers). Previously, she was Professor of Geography at Indiana University, USA.
The Royal Geographical Society (with the Institute of British Geographers) was founded in 1830 and is a world centre for geography: supporting research, education, expeditions and fieldwork, and promoting public engagement and informed enjoyment of the world.
- Eaves (2010). Learning from Libraries: The Literacy Challenge of Open Data. Available at: www.eaves.ca (Accessed June 2010)
- Davis, T. (2010). Open Data, Democracy and Public Sector Reform. Available at: http://www.practicalparticipation.co.uk/ (Accessed July 2010)
- Allan, R. (2009). The Power of Government Information. In J. Gøtze & C. B. Pedersen, eds. State of the eUnion: Government 2.0 and Onwards. AuthorHouse.
- Snowdon (2010). It’s information to data we need, not DIKW. Cognitive Edge. Available at: www.cognitive-edge.com/blogs/dave/2010/05/its_information_to_data_we_nee.php (Accessed May 2010)
- Sterling (2010). Open data hard to understand, says data.gov.uk chief. Available at: www.information-age.com (Accessed June 2010)
- Meltzoff, A. N., Kuhl, P. K., Movellan, J. and Sejnowski, T. J. (2009). Foundations for a new science of learning. Science, 325, 284–288.
- Miller, P. (2010). Linked Data Horizon Scan. Commissioned report, Joint Information Systems Committee (JISC).