Open Data

Open Data is research data that is freely available on the internet, permitting any user to download, copy, analyze, re-process, pass to software, or use for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.

In the digital age, data is the raw material on which discoveries are built, and unfettered access to research data, whether in the Life Sciences or the Social Sciences, is crucial to accelerating progress in research. Data plays a central role in our ability to predict and counter natural disasters, understand human biology, and develop advances in computing technology.

Despite its tremendous importance, research data today remains largely fragmented: isolated across millions of individual computers and blocked by disparate technical, legal, and financial restrictions.

The amount of scientific and scholarly data grows exponentially each year, yet we still lack the infrastructure, policies, and practices to harness this vital resource. While some high-profile projects, such as the Human Genome Project and the Large Hadron Collider, make their data openly accessible, too often data isn’t shared beyond those who generate it. The Internet was built by researchers to share data, but data sharing isn’t yet the norm in research.

The tremendous gap between what is possible with digital technology and our outdated infrastructure has led to the call for Open Data.

Open Data is research data that:

  1. Is freely available on the internet;
  2. Permits any user to download, copy, analyze, re-process, pass to software or use for any other purpose; and
  3. Is without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.

Open Data typically applies to a range of non-textual materials, including datasets, statistics, transcripts, survey results, and the metadata associated with these objects. The data is, in essence, the factual information that is necessary to replicate and verify research results. Open Data policies usually encompass the notion that machine extraction, manipulation, and meta-analysis of data should be permissible.

Open Data:

  • Accelerates the pace of discovery. When datasets are openly available, they can be easily accessed and used to create a fuller picture of a given area of inquiry, or analyzed by data mining software that can uncover connections not apparent to those who produced the original data.
  • Grows the economy. Researchers estimate that $3.2 trillion in economic output could be added to global GDP through Open Data across all sectors, with scientific and scholarly data playing an important role.[1]
  • Helps ensure we don’t miss breakthroughs. There are a huge number of ways to use or analyze any given dataset. What seems like noise to one person could be an important discovery to someone else with a different perspective or analytical technique.
  • Improves the integrity of the scientific and scholarly record. When the data that underlies findings is accessible, researchers can check each other’s work and ensure that conclusions are built upon a firm foundation.
  • Is becoming recognized by many in the research community as an important part of the research enterprise of the 21st century. From research funders like the US government to publishers, institutions involved in the research process are beginning to require that, at the very least, the data that underlies publications be made openly accessible.

Open Data has the potential to speed up the research process while simultaneously improving our confidence in its results. The access, use, and curation of this huge and growing body of data are central to the research enterprise.

[1] https://www.omidyar.com/sites/default/files/file_archive/insights/ON%20Report_061114_FNL.pdf
