“The Right to Read is the Right to Mine…”

Those words are not only the tagline for an innovative text and data mining project called ContentMine, but are also a crucial component of the definition of Open Access.

The facts contained in scholarly articles are what make them so useful and so valuable. Researchers recognize that the digital environment gives them the opportunity to use these articles, and to make sense of these facts in entirely new ways. They want, and need, the ability to fully use these articles – to freely download and search, text mine, data mine, compute on and crawl them as data – in order to advance their work, to discover, to innovate.

Digital articles are, after all, simply small-scale aggregations of digital data. So it makes sense to empower users to employ the tools that are most appropriate to solving the problem at hand. Yet increasingly, we are seeing troubling signs that many commercial publishers are unwilling to support users who want to actually use the content in scholarly articles and not simply read the content in an analog fashion.

In an article in today’s Techdirt, Glyn Moody reports on a recent incident in which a statistician attempted to use content mining techniques to advance his work, which involves improving the detection of data fabrication, a legitimate and valuable academic pursuit.

The researcher, who works at an institution with a subscription to Elsevier’s ScienceDirect database, notes that he took care to conduct the necessary bulk downloading of articles from Elsevier’s database in a manner that would not disrupt other users.

Nevertheless, Moody reports that Elsevier contacted the researcher and instructed him to stop. The researcher notes that:

“Approximately two weeks after I started downloading psychology research papers, Elsevier notified my university that this was a violation of the access contract, that this could be considered stealing of content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading (which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university.”

To be fair, Elsevier does appear to have indicated to the researcher that he could use an Elsevier-provided API to continue to content mine articles. However, the researcher notes that the Elsevier API often returns only metadata rather than the valuable full text, which users can easily access via the Web. That makes the API a far less desirable option.

Elsevier’s response is troubling for a number of reasons. Using the threat of cutting off institution-wide paid access to ScienceDirect in response to a researcher’s legitimate use of content is extreme. Requiring researchers to use only Elsevier-approved tools to work on articles in an Elsevier-controlled environment is behavior that runs directly counter to promoting an open scholarly environment. And, perhaps most troubling of all, the downloading of articles at an institution with a legitimate subscription to the content was described as something that could be considered “stealing”. The tragedy of Aaron Swartz starkly illustrated the folly of this kind of thinking.

In an era when many commercial publishers insist on selling our institutions access to digital articles only in large bundles, touting the benefits of these bundles as “databases,” restricting the rights of users to fully use these databases is unacceptable. As Peter Murray-Rust and his team at ContentMine so eloquently note:

“The Right to Read is the Right to Mine. Anyone who has lawful access to read the literature with their eyes should be able to do so with a machine. We want to make this right a reality and enable everyone to perform research using humanity’s accumulated scientific knowledge.”

