Next Up for Agency Public Access Plans: NOAA
Heather Joseph, Executive Director, SPARC
The National Oceanic and Atmospheric Administration (NOAA) has released its plan to create policies ensuring public access to articles and data resulting from its funded research, as required by the February 2013 White House directive. The agency’s plan places a strong emphasis on building on its current technical infrastructure, as well as leveraging its well-established culture of data sharing.
NOAA Plan for Articles: Establish NOAA Institutional Repository
NOAA’s plan calls for the agency to establish an internal repository for its funded articles. The repository will be built using the “Stacks” technology created and currently used by the Centers for Disease Control and Prevention (CDC), which will act as the systems provider for the repository.
The NOAA plan calls for all agency-funded intramural and extramural researchers to deposit final, accepted manuscripts into the agency’s repository upon acceptance in a peer-reviewed journal. Unlike many of the other agencies that have released plans to date, NOAA will also require its investigators to submit technical reports, data reports, and technical memoranda into the repository – significantly increasing the scope of the materials covered by the agency’s policy.
NOAA will use the OSTP-suggested 12-month embargo period as its baseline. Like other agencies, it will provide stakeholders with a mechanism for petitioning the agency to change the embargo period. The plan indicates that requests must include evidence that outweighs the public benefit of having the embargo remain at one year. Given the interdisciplinary nature of its research, NOAA notes that it may also coordinate embargo period changes with other agencies or departments.
The CDC Stacks technology that NOAA will be using currently relies on the NIH Manuscript Submission system, through which authors and journals deposit articles, so presumably NOAA will adopt this submission module as well. CDC Stacks requires articles to be deposited in PDF/A format, and the NOAA plan indicates that this will be its preferred format; it is unclear whether additional formats, such as XML, will be supported.
The NOAA plan does not provide specific information on reuse rights for articles in its planned repository, other than noting that “there is no automated system for downloading all publications in CDC Stacks, which limits unauthorized redistribution. Users can request a copy of the publications that can be freely redistributed based on the publication’s license.” This would indicate that bulk downloading – and ready machine analysis – will be extremely difficult. NOAA indicates that it will attempt to monitor compliance using existing channels, but also explicitly notes that they currently “lack automated mechanisms to confirm compliance by grantees,” and that if these systems can’t be developed expeditiously, “manual verification” will have to be used.
The NOAA plan also indicates that the agency is aware of publisher efforts, such as CHORUS, that may be used to provide access to the final version of articles, and underscores that its own internal repository will be used in any event to ensure accessibility and long-term preservation of final, accepted articles. It also makes reference to a willingness to work with the SHARE project, should it evolve into something that has the capacity to house final manuscripts of articles or datasets.
NOAA Plan for Research Data: Building on a Strong Foundation
NOAA’s plan for providing public access to data builds on the agency’s existing strong foundation for data sharing. Currently, all intramural researchers are required to submit a Data Management Plan (DMP) outlining plans for managing, providing access to, and preserving over the long term any research data generated by NOAA-funded researchers. Extramural researchers are required to submit a Data Sharing Plan covering plans for access only. These requirements will be adjusted to require all NOAA programs to consider whether researchers must submit full DMPs when they apply for funding.
The NOAA plan is quite comprehensive in describing the definition of the data it covers – defining scientific data in terms of the types and formats of environmental data it generates. It is also unique among plans released so far in the scope of coverage, calling for the policy to ultimately apply to “all future results, and to all past results from current programs.” NOAA further states that the policy will also apply to “all legacy data currently archived at one of the NOAA National Data Centers,” something no other agency has addressed to date.
The plan is also notable for the clear guidance it provides on the timing of data sharing. Currently, funded researchers are required to make data “visible and accessible” within two years. The new plan calls for this time frame to be shortened to just one year. It also indicates that data underlying the conclusions of peer-reviewed articles will most likely be required to be made available at the time of the article’s publication, in appropriate repositories (presumably to be designated by NOAA).
Out of all of the plans released to date, the NOAA plan provides the clearest guidance on linking publications and data: specific datasets are to be cited in the reference list of a publication. The agency will assign agency-generated persistent digital identifiers (DOIs) to NOAA-produced datasets, so that they can be cited in journal articles and other documents. NOAA expects to issue complete guidance on the procedures and requirements for obtaining and using a NOAA DOI later this year.
Like many other agencies, NOAA’s plan for research data calls for the use of an agency Data Inventory. NOAA will build out its existing NOAA Data Catalog to enable researchers to discover data and connect it to scientific articles, other datasets, and related resources. The metadata describing the scientific data contained in the catalog will include, at a minimum, the common core metadata schema currently in use by the federal government.
The agency’s plan is also quite detailed in terms of outlined requirements for comprehensive metadata standards to be developed and used. NOAA has already developed a ‘Data Documentation Procedural Directive’ that requires data to use structured metadata based on specific ISO standards. The plan calls for additional training and tools for metadata creation and verification to be developed to support this function.
As an agency, NOAA has long maintained a set of comprehensive National Data Centers, which ensure long-term preservation of multiple formats of digital data. These Centers will continue to be maintained, and will play an increasingly important role as additional datasets are identified by the agency as requiring long-term preservation. NOAA also participates in public-private collaborations (such as the Open Geospatial Consortium) that are designed to develop and promote interoperability standards, and the agency will continue to place a premium on this important work.
Finally, NOAA joins the majority of the other U.S. agencies in noting that the Department will explore, along with other departments and agencies, the development of a “research data commons” for storage, discoverability, and reuse of data, with a particular focus on making the data underlying peer-reviewed scientific publications resulting from federally funded scientific research available for free at the time of publication.