Monday, October 25, 2021 News

Celebrating 30 Years of arXiv and Its Lasting Legacy on Scientific Advancement

Sharing preprints (the preliminary versions of scientific manuscripts before peer review and publication in an academic journal) over the internet was a new concept in 1991 when Paul Ginsparg launched the preprint server arXiv. This was before the World Wide Web took off, and there was skepticism about the transition to digital content.

Ginsparg’s idea was to level the global research playing field by providing access to the latest research results. Thirty years later, the free distribution service and open-access archive holds nearly 2 million scholarly papers.

The success of arXiv not only proved there was demand in high-energy physics but also eventually prompted other fields, from mathematics to biological sciences to sociology, to follow suit. Now, preprints in a variety of disciplines are accelerating discovery and demonstrating, more than ever during the global pandemic, the urgent need for information sharing in real time.

The work of arXiv has been important to the development of Open Access (OA). Indeed, the OA movement was almost called Open Archive, says Melissa Hagemann, senior program officer with the Open Society Foundations. She was inspired to organize a meeting 20 years ago that led to the Budapest Open Access Initiative, in part because arXiv demonstrated what was possible in disseminating academic research outside of a subscription-based model.

“It was one of the first to develop a membership model for an Open Access project,” says Hagemann of arXiv. “Since then, besides its amazing breadth, it has led to other [preprint servers] and shown that an Open Access project can be sustainable.”

As many open projects are being commercialized, Hagemann says investment in open infrastructure is critical now. “Building a healthy, sustainable open infrastructure is important—not only for the community to use—but to ensure that content going forward is open,” she says. Librarians can help by making sure research is placed in open institutional repositories and that preprint papers are valued on campus too, Hagemann adds.

Preprints have changed the scholarly communication landscape. To meet the moment, there is a growing call for investment in improving the publishing ecosystem.

Ginsparg says it’s an open question as to what the correct financial model will be moving forward. It may well be that many players explore a range of options that can scale, including professional societies spinning off a separate enterprise to operate a preprint platform. Despite the uncertainty of funding, Ginsparg says he remains “very optimistic” about the future of preprints.

“The proof of concept is there. They are not going to go away,” Ginsparg says of preprints. “Overall, when I look at the growth in arXiv, it’s clear the community wants and needs it…Somebody will figure out answers [to funding] because researchers have found this availability to be increasingly essential because it expedites research.”

While there is growing acceptance of preprints, concerns remain in some quarters about whether they undermine the sustainability of peer-reviewed journals and can lead to scooping. Some journals don’t have clear policies about posting preprints before publication. However, research shows at least 70 percent of journals accept preprints. Ginsparg adds that the articles are often of better quality because they have been refined in response to early critiques online, and that scooping worries are addressed with timestamps that chronicle researchers’ claims to their work. The advantages of preprints are clear, he says, and so far, no community that has adopted arXiv for rapid dissemination has since abandoned it.

Of all the research tracked during the pandemic, preprints account for about 20 percent of the outputs on COVID-19.

“COVID-19 dramatically increased the amount of exposure for preprints—not only individual preprints but the concept of preprints more broadly,” says Jessica Polka, executive director for ASAPbio, a nonprofit promoting transparency in science. “There was an incredible spike in activity last year as researchers were striving to communicate their work as quickly as possible. And that is overlaid on top of a more gradual pattern of growth that is still expanding in the life sciences and many other fields.”

Preprints got a big boost once some funders recognized them as legitimate forms of scientific communication and they were cited in grants as evidence of productivity. Many people in the United States became more willing to engage with preprints after the National Institutes of Health changed its policy to encourage their use, notes Polka. This fall, the American Heart Association updated its Open Access policy to directly support preprints, and the Australian Research Council announced it would reverse the ban on preprints in its fellowship application requirements after strong community pushback.

ASAPbio is working to build awareness and encourage a cultural change, hosting training sessions about preprints and emphasizing how they make the communication of science more efficient. Delays in journal publishing are prompting more scientists to consider alternatives.

“Preprints have the capacity to catalyze more significant and even broader changes in the way science is shared and evaluated,” Polka says. “The ability to share papers in a way that anyone in the world can provide feedback fundamentally changes the way researchers can interact. Traditional journal articles are static. By contrast, preprints are a work in progress, invite conversation, and can be updated as a result of that feedback.”

The availability of preprints has been embraced by some on the frontlines as a means to improve COVID-19 patient care and research. A recent study in Quantitative Science Studies shows that depositing work as a preprint can enhance both its citations and social impact.

Another milestone in the landscape was the 2013 launch of bioRxiv as an open access preprint repository for the biological sciences, followed by medRxiv in 2019 for the health sciences. John Inglis and Richard Sever were behind the establishment of both sites, which are operated by Cold Spring Harbor Laboratory (CSHL), a not-for-profit research and educational institution. Additional collaborators on medRxiv include Yale University and the medical publisher BMJ. Both bioRxiv and medRxiv are funded by CSHL and the Chan Zuckerberg Initiative.

“We have a generation of young scientists who are reaching principal investigator positions and bringing the information habits that they grew up with, which were based around the web,” Inglis says. “They see this kind of free, open distribution of scientific information as simply a natural evolution of how all kinds of information are being shared.”

Preprints are getting attention through news articles, social media, and email sharing with colleagues. Guardrails are in place to support responsible curation during this transition to a more public process of sharing science. Yet, Inglis says, most authors who have posted preprints are convinced it improved the eventual product they sent to a journal. (See bioRxiv user survey results.)

Sever agrees that authors see the benefits. “Altruism is great, especially when there’s something in it for you as well. And authors realize that if they put their work out there, it’s really good for the community as a whole, it’s also really good for them,” he says.

In the first two decades after arXiv was launched, Sever suggests, there was a notion that preprints were a peculiarity of physics, math and computational science. But many physicists who moved into biology helped nudge the field toward preprints, and the practice caught on: more than 130,000 papers have now been posted on bioRxiv. Then pressure mounted to expand to clinical research, and medRxiv was started, only to be thrust into the limelight by COVID-19.

“It was a real example of building something and people coming to it because there was an obvious need for it,” Sever says.

The pandemic fueled interest in medRxiv, which posted 220 manuscripts in January 2020, none on COVID-19, and 2,100 in May, 80 percent of them on the pandemic (40 percent of submissions ended up being rejected for a variety of reasons). Together, the two servers have posted about 20,000 COVID-19-related papers.

“People have really realized that science moves fast, and communication of science needs to move fast in an emergency,” Sever says.

Preprints can be credited with getting out the word quickly about the value of the drug dexamethasone in treating severe cases of COVID-19 and also with correcting misconceptions about the airborne spread of the virus. The results highlighted the need for revised health precautions and may have saved lives.

“There is growing interest within research communities, and many see the value of open, available platforms,” Inglis says, noting that preprints still only account for 5-10 percent of the published literature. “But there is still much ground to cover.”

Librarians can help get the word out by explaining to faculty and students that preprints are valuable research objects that demonstrate productivity and are an important part of the ecosystem, says Sever. The theory was always that preprints would advance science, but now there are concrete examples from researchers who say they are months ahead in their projects because of what they learned from posting their preprints or from the collaborations that resulted.

The explosion and acceptance of preprints during the pandemic have changed for the better the way rapid scientific developments are handled, says Philip Cohen, professor of sociology at the University of Maryland and founder of SocArXiv, an open platform for social science papers. “The overall effect has been to accelerate the pace of scholarly communication in science and its openness. So it’s been more inclusive, faster and more efficient, highlighting all the reasons preprints are good,” he says.

Preprint platforms share a common goal to disseminate information at a higher speed and lower cost, says Cohen, who launched SocArXiv in 2016, in collaboration with the Center for Open Science. It is now part of the University of Maryland Libraries.

“We felt it was important to come out with a service that was nonprofit and community-owned,” Cohen says, adding there was concern about commercial publishers co-opting the preprint market and hurting libraries with crippling cost hikes. “We wanted to gain some control among the research community for our process and our product.”

SocArXiv is a partnership between sociologists and librarians. It takes in about 150 papers a month and has grown to a total of about 8,000. Users are encouraged to share preprints, as well as code and data, and the final version of record. Cohen says some myth busting remains to be done, as some critics suggest that work released before peer review dilutes the research stream and risks spreading bad information.

“Our long-run hopes are dependent on getting academia, researchers and libraries together to work together to change policy, rules and norms of how we do scholarly communication,” he says. “That’s how we are going to succeed in the end. We win by changing the climate and culture of the research community. I’m not starry-eyed, but I am optimistic.”

Much has happened in the three decades since arXiv emerged as the first preprint server, changing the scholarly communication landscape. Some have followed its lead; others have embraced the concept but have also tailored aspects of the model to meet the needs of their discipline. As demands for rapid response to emergencies from public health to climate change intensify, so too will conversations about the value of preprints, along with the increasing urgency to use openness to advance science.

