Open data and the politics of scholarly practice

It’s now a good few years since academic publishers put in place differential pricing arrangements for developing economies in Africa. And although a complex transition is still under way, a large number of academic publications are freely available through Creative Commons licences, as pre-prints in online repositories or through the “gold” publication route, in which all costs and profits have been recovered up-front. So why, as Laura Czerniewicz asks in a recent and widely read article, does the north-south knowledge divide still persist?

As Laura points out, there are a number of contributing factors.

Firstly, it’s by no means all about access. Universities in the south lack research funds. Bandwidth may be available but it’s often constricted, unreliable and very expensive in comparison to the facilities that northern universities largely take for granted.

Secondly, while the editorial processes that scholarly journals use are central to the machinery of peer review, assurance and ethical practice, journals can also be gatekeepers that police disciplinary boundaries and provide reciprocal endorsements for scholarly cliques.

Thirdly, there is a cluster of epistemological issues: the assumption that “international knowledge” trumps “local” research, and that “grey literature” is of lesser value than papers written in scholarly genres.

Here, I’m interested in this third set of issues that Laura raises. In her words:

Our own perceptions of ‘science’ must be broadened to encompass the social sciences. … Research outputs need to be recognised as existing beyond the boundaries of the formal journal article. Incentives and reward systems need to be adjusted to encourage and legitimise the new, fairer practices that are made possible in a digitally networked world. And finally, the open access movement needs to broaden its focus from access to knowledge to full participation in knowledge creation and in scholarly communication.

To see how these changes could happen – how the long-established subservience of “southern knowledge” to northern paradigms in scholarly publishing can be challenged – it’s necessary to backtrack a little and take account of where digital publication is going.

We all know how online access has transformed the music and film industries. The same is happening with scholarly publishing. All major journals now provide online access and, soon, there will be no need for paper versions. This requires new business models. If journals are to continue as online subscriptions, then publishers must put them behind secure paywalls and charge for individual access by non-subscribers, finding ways to stop these digital copies being copied and re-distributed. Alternatively, all costs and profit margins must be recovered up front through what are known as “article processing charges”, with either authors or their universities paying to allow free-of-charge distribution after publication: what is known as “gold” open access.

Making this jungle still more complex to get through, public interest journals publish articles without subscription fees or article processing charges. And every author holds the copyright to the final version of their work – the paper as it stands just before being surrendered to a publisher – allowing them to distribute their work free of charge, online, and through a “green” open access repository, of which there are now hundreds. Today’s battleground is with those profit-hungry publishers who are “double dipping”: charging high online subscriptions as well as article processing charges for papers based on publicly funded research.

This is all of importance. But getting to a new equilibrium for the scholarly online business model will not, in itself, solve the imbalance in the north-south distribution of knowledge. Much more interesting in this regard are the possibilities in what can be called “interactive citation”.

How will reforming the citation system work?

By tradition, scholarly publications are grand constructions buttressed by the strength and standing of their citations. These foundations and load-bearing walls are a combination of previous, authoritative, publications and references to appropriate sets of data. In turn, these data sets will be in forms recognized and accepted by the tradition and rules of scholarly disciplines. Together, the standing of precedent and associated work and the quality of the appropriate data sets serve to authenticate the knowledge claims of the new work. Knock away the foundations – for example, by showing that the data sets were falsified – and the whole edifice comes crashing down.

So far most – but by no means all – online publications have stayed true to these traditions. Tracking back the citations invariably leads to a pdf file, arguably the most unimaginative digital format available. But this will change. Increasingly, online publications will make use of hyperlinks to dynamic resources. These could be the full and original datasets underpinning the work, allowing the reader to explore different possibilities. They could be links to datasets that, in their nature, change constantly. Digital links could also be to live feeds that allow the “reader” to engage directly with continuing research work as it unfolds.

Here are a few examples from my own discipline – Archaeology – that illustrate what this could mean.

Most archaeological excavations are publicly funded and generate masses of heavy, dirty stuff. Traditionally, it has only been possible to make the most slender of references to these collections, which will usually be in deep storage, unavailable to those readers who are curious. Now, online publications can include hyperlinks to all the original assemblages, including large photographic archives, inventories and statistical data.

Staying with the example of archaeology, big sites are often excavated over many years, and successively by different teams. This means that hyperlinks embedded in earlier online publications can access datasets that have changed subsequently, and continue to change. This allows the assumptions behind earlier publications to be tested. And, given that research such as this is publicly funded, and often has high levels of public interest, why not embed links in scholarly publications to live resources, such as webcams inside an excavation, or a laboratory?

Back to the issue here – how could such “interactive citations” overturn the current hierarchies of knowledge? Getting to this requires a second digression – the prevalent assumptions about the relative value of “international” versus “local” vehicles for publication.

Here, the most egregious example is the way in which British universities have interpreted the requirements of the UK’s periodic evaluation of research quality – the “Research Excellence Framework”. In the last version of this, subject area panels of senior academics produced a ranking system for “their” journals. 4* and 3* journals had to be “international”, and research by academics who had not published in 4* and 3* journals was not considered to be “excellent” and was invariably not included in a university’s submission to the funding council. Some universities in Britain are now moving academics who are not “international” into teaching-only positions or onto fixed term contracts.

Sadly, some universities in the south mimic this hierarchy of value. Here’s a hypothetical case study, this time in public health; a thought experiment that shows what this could lead to.

Let’s say a researcher in Cape Town is passionate about a local issue, such as the high incidence of foetal alcoholism, or drug resistant TB, or homelessness and child mortality. The results of the work may have immediate and high value to local health authorities and social services; the local professional community will benefit from the rapid availability of research results in a locally published, fully refereed, academic journal. This, though, is not an “international” publication and has no place in the dominant hierarchy of academic value.

But then add this twist. An academic in a UK-based university, in Cape Town for an academic conference, hears about the local study. He or she builds the continuing development of this work into a funding proposal and, with all due acknowledgement to the Cape Town team, submits the results to an academic journal in the UK. These research outcomes – in essence the same – are now “international”, excellent and of 4* quality.

The contradiction inherent in this kind of scenario – replicated in a wide range of disciplinary areas – is this. While the intention will invariably be progressive, to open up key areas of research across Africa, an outcome is to replicate the exploitative structure of nineteenth century colonialism in the knowledge economy of the twenty-first century.

Then, raw materials were exported from Africa, fashioned into high-value goods in Europe’s factories, and sold back into the colonies. Now, raw local data is exported from Africa, fashioned into high-value knowledge in Europe’s universities, and sold back to universities in Africa as high-cost journal publications.

Back to Laura’s question: how can the open access movement broaden its focus from access to knowledge to full participation in knowledge creation and in scholarly communication? Here, the key opportunity lies in further extending the possibilities inherent in “interactive citation”.

When the journey from the “doing” of research through to the finished product of the final publication is fully mapped, it becomes what Bruno Latour called a system of references. Seen through a digital lens, these route maps are a series of digital assemblages. In my archaeological world, such a system of digital references would include scanned field notes and maps, photographs taken for recording purposes, spreadsheets with lists and indices of excavated finds, digital records of laboratory results, interim reports and conclusions, correspondence and news releases and – eventually – the final pre-publication draft of the journal article. In traditional citation systems at their best, very little of this information trail is publicly available; more often than many will care to admit, a good deal is lost along the way.

Used appropriately, interactive citations embedded in digital publications will open up this rich chain of references. This will be valuable for all research endeavors. But how does it help with the stubborn and persistent north-south issue?

Another example, this time from human palaeontology. Some of our earlier ancestors left their bones in Kenya’s volatile landscape, resulting in remarkable levels of preservation over hundreds of thousands of years. These traces, from footprints in lakeside mud to human skulls that can tell us about the emerging brain, are part of Kenya’s cultural heritage and belong in the national museum in Nairobi, which is where, for the most part, they are.

These key traces of our common past have been excavated under permits from the Kenyan government and with public funding from North American and European agencies. The results of such fieldwork can make an academic career; this is the kind of research for which the eventual journal article is reported on the cover of Time Magazine.

One could assume that, for research of such widespread interest, the data leading to interpretation and publication would be available for further analysis, whether to ask new and interesting questions or to check whether published claims stand up to scrutiny. Not so. All too often key data sets are reserved as the intellectual property of the researcher. What is made available is metadata – data about data. Access to primary information may be withheld for many years. This means that while the original object may be housed in Africa, in its country of discovery, the gateway to the key information that makes sense of the original object is through a northern research institute, and subject to its permission.

Interactive citation can change this. If the concept of “open data” can be defined as access to the full chain of references that make up the citation, then the political economy of the knowledge landscape can be changed. In this scenario, some of the old shibboleths of academic quality management fall away. The dichotomy between “local” and “international” becomes irrelevant because, in a sense, all knowledge becomes local and all researchers become international. Similarly, the distinction between formal publications and “grey literature” becomes redundant; all reports are electronic files and what matters is the rigor of their review and verification rather than the status of a journal title.

Changes such as these are difficult, controversial and contested. This is not surprising; it’s what disruptive technologies do. We are in the midst of a major modification to the way knowledge is created and distributed; changes to a system that was invented in the seventeenth century and which formed the basis of the great scientific advances of the following three hundred years. And so it’s appropriate to end with a point of view from the Royal Society, which was founded in 1660:

The changes that are needed go to the heart of the scientific enterprise and are much more than a requirement to publish or disclose more data. Realising the benefits of open data requires effective communication through a more intelligent openness: data must be accessible and readily located; they must be intelligible to those who wish to scrutinise them; data must be assessable so that judgments can be made about their reliability and the competence of those who created them; and they must be usable by others. For data to meet these requirements it must be supported by explanatory metadata (data about data). As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database. We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable.


Laura Czerniewicz: “It’s time to redraw the world’s very unequal knowledge map”. The Conversation Africa, 8 July 2015

Royal Society: “Science as an Open Enterprise”. June 2012



2 thoughts on “Open data and the politics of scholarly practice”

  1. Thank you for this Martin, you make an important point about open data as a political tool, as a strategy for contributing to righting the skewed geopolitical shape of scholarly publishing. While open data is good for scholarship in general, it is imperative for southern researchers whose work should not (further) be appropriated or rendered invisible. It is indeed an opportunity for realigning knowledge relationships in general.

  2. Open Data is more practicable in some disciplines / using some research approaches than others.

    My PhD research is funded by a research council (ESRC) and, as such, should be “publicly owned” or at least, available. However, social science ethics regarding confidentiality, “anonymity” (in practice usually pseudonymity) and protection of sources place moral embargoes on the release of raw data. (Even “cleaned”, anonymised versions of transcripts can be problematic – only by rendering the text completely bland can one be certain that there remain insufficient distinguishable (ie identifiable) traces to allow someone else to identify – correctly or otherwise – an informant or participant. At some point, the data ceases to be of interest or use because anything of interest has been stripped.) Data is given to a researcher by a respondent according to a contract – they consider the purpose behind your asking them, and decide what they will share. Going beyond the agreed purpose is a breach of that contract – and not only renders the informants potentially vulnerable, but spoils the “field” for other researchers who will follow.

    In qualitative research, especially that of an ethnographic bent, the researcher is the instrument. I am certain that many things were told (or not told) to me because of who my informants saw me to be. Another researcher would almost certainly have been told different things, or the same things differently. Margaret Mead’s work in Samoa is probably the classic example of that.

    Also, the “second record” is important to the interpretation of the data – however full field notes might be, or whatever can be heard in a recording or seen in a video, they cannot fully capture the immersive experience of having been there at the time. These issues go to the heart of discussions around validity and reliability in social science, and notions of “replicability” imported from “hard” science, and are pertinent to notions of sharing data, or opening data, because they make assumptions about the nature of data that don’t always travel well across disciplines, approaches and locations.

    Research in creative and performing arts raises other issues. Where research is creative in nature, rather than following “traditional” scholarly paradigms, data may be located within the researcher – impressions, sensory responses, emotions, etc. – that can only be accessed through the research output – the painting, dance, song or sculpture. Does this absolve these researchers of making their data available, or will they be required to translate it into other, accessible forms – almost certainly disrupting the research process and possibly corrupting the outputs?

    Or is such data deemed of lesser value, less desirable in the “market” for open data, echoing the (already political) differential valuing of some disciplines as more “worthy” than others? Already in the UK only some disciplines are deemed worthy of receiving funding for undergraduate teaching.

    Politically, “open data” makes a great deal of sense. Putting it into practice across different disciplines, however, raises many questions.

