File this under “all good things must come to an end.” Since 2009, we here at FGI have been posting “fugitive” federal government publications and advocating for the community to hunt for and report those documents to GPO. (“Fugitive” documents are now called “unreported” documents: those that should be part of the Federal Depository Library Program (FDLP) but have slipped through the cracks and remain uncatalogued and unpreserved.) Through these efforts, thousands of unreported documents were sent to GPO and made available for the long term through FDLP libraries and GPO. We couldn’t have done it without the help of the many volunteers throughout the FDLP who sent us their receipts when they reported those documents to GPO. And we couldn’t have done it without the dedication of folks like Daniel Cornwall, Jeffrey Hartsell-Gundy, and Meredith Johnston, who helped by checking the Lostdocs email account and posting new publications to the blog. I just archived the site in the Internet Archive’s Wayback Machine for posterity. Thanks again, everyone, for the work that you do in making sure federal publications are curated, described, preserved, and made available long-term! Please keep reporting those documents to GPO. It’s critically important!!
That is all.
FGI’s comments and recommendations for the GPO draft report of the task force on an “all-digital” FDLP
[editor’s note 10/28/2022: we updated the text below about 100% of government information being published digitally in order to clarify where we got that number and why we use the 100% figure rather than the 97% born-digital figure that is most frequently cited.]
We want to thank GPO Director Halpern for calling a “Task Force on a Digital FDLP” and all of the members of the task force for diligently working through the many thorny issues regarding the future of the Federal Depository Library Program (FDLP). Director Halpern has requested public comment on the draft report until October 14, 2022. We at FGI are submitting the following as our public comments.
1. The task force was asked to study “the feasibility of an all-digital FDLP.” The group was charged to define the scope of an all-digital depository program and recommend how to implement and operate it.
Although the task force working groups concluded that the FDLP can and should go “all-digital,” the draft report also consistently noted that “all digital” does not mean everything will be available only in digital formats (pp. 7, 10). The final report should emphasize this point and clearly state that print remains a viable format for some of our most important government publications as well as an important access method. It should also recommend exploring future print opportunities like “print and distribute on demand” as an option for depository libraries.
2. The draft report has lots of good ideas, but we suggest that some clarification and reorganization would bring the findings and recommendations of the different working groups into better focus.
We suggest that the final report begin with a clear “problem statement” that the report will then address. This statement should make two points:
- Currently 100% of government Public Information is published digitally. (We are extrapolating that 100% estimate from table 11 of the 2018 Library of Congress study “Disseminating and Preserving Digital Public Information Products Created by the U.S. Federal Government: A Case Study Report,” which showed that only three of the surveyed agencies reported that less than 100% of their publishing output was born-digital. This 2018 estimate is no doubt closer to reality for most agencies than the 2009 estimate of 97%.)
- Only a small fraction of that born-digital government information is currently being curated or preserved in any regular fashion. This has created an enormous preservation (and, therefore, long-term access) gap. Any “all digital FDLP” must recognize and address this enormous digital preservation gap.
(Note: We base this on the research we have done examining the contents of GPO’s Govinfo repository and 2020 End-of-Term crawl data. We found that the great bulk of new digital Public Information is produced by the executive branch (90% of all government PDFs, aka “publications,” on the government web are published by the executive branch), but only 2% of the born-digital PDFs in GPO’s Govinfo repository are from the executive branch. Meanwhile, LC’s web harvesting relies on GPO and NARA to take care of the executive branch [https://www.loc.gov/acq/devpol/webarchive.pdf] and, by law, NARA treats all executive branch web content as “records,” of which only 1-3% are typically preserved. For more details, see our post “Some facts about the born-digital ‘National Collection’”.)
These two points would put the Task Force’s recommendations into context. Actions designed to ensure “permanent no-fee public access to digital content” must focus primarily on ensuring the preservation of that content. No digital system can ensure “access” unless that system has control over and preserves the content it intends to make accessible.
3. To address the problem statement, we recommend that the report establish long-term goals from which all recommendations would flow. It would be persuasive and most helpful if the Task Force provided explicit connections between each recommendation and one or more of the goals, showing how the success of each recommendation can be evaluated in terms of the goals. We suggest the following goals:
- Increase the preservation of all federal government Public Information.
- Ensure permanent, no-fee access to federal government Public Information for the general public.
- Enhance discoverability and usability of federal government Public Information for all.
4. We believe that the draft report minimizes the need to preserve the existing national print collection. It emphasizes digitization of paper documents for access and accepts and proposes even looser rules for the discarding of paper collections without adequate safeguards for the preservation of the information in those collections in either paper or digital formats. Digitizing for the sake of better access is a noble objective, but preserving the born-digital content that is currently NOT being curated and in danger of loss is a much more urgent matter than enhancing access to already well-preserved paper collections.
5. We suggest that the final recommendations and action items be grouped or labeled in categories that will clarify their purpose and scope. For example:
- principles (free access, privacy, etc.)
- short term tasks
- long term objectives
6. Earlier this year we created a list of some specific long-term strategies which may be of use to the Task Force: “FGI’s recommendations for creating the ‘all-digital FDLP'”.
By reorganizing and refocusing the report on what is truly important — preservation first, access built on preserved content — the report will be clearer about the current status of preservation and access and how GPO and FDLP can contribute solutions to existing gaps and weaknesses.
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University
This is certainly good news! The Office of Science and Technology Policy (OSTP) yesterday released guidance for federal agencies to “ensure free, immediate, and equitable access to federally funded research.” This builds on the 2013 Obama administration’s Memorandum on Increasing Access to the Results of Federally Funded Research which directed all federal departments and agencies with more than $100 million in annual research and development expenditures to develop a plan to support increased public access to the results of federally funded research, with specific focus on access to scholarly publications and digital data resulting from such research. This new policy directs agencies to “update their public access policies as soon as possible, and no later than December 31, 2025.”
Of course, from FGI’s perspective, a key piece of making federally funded research open access is the curation, preservation and ongoing access to those publications. We wonder how this will impact the Government Publishing Office (GPO) in its quest to build the “National Collection of U.S. Government Public Information.”
Today, the White House Office of Science and Technology Policy (OSTP) updated U.S. policy guidance to make the results of taxpayer-supported research immediately available to the American public at no cost. In a memorandum to federal departments and agencies, Dr. Alondra Nelson, the head of OSTP, delivered guidance for agencies to update their public access policies as soon as possible to make publications and research funded by taxpayers publicly accessible, without an embargo or cost. All agencies will fully implement updated policies, including ending the optional 12-month embargo, no later than December 31, 2025.
This policy will likely yield significant benefits on a number of key priorities for the American people, from environmental justice to cancer breakthroughs, and from game-changing clean energy technologies to protecting civil liberties in an automated world.
John Oliver is a national hero, always talking about issues of importance in clear and exasperatingly funny ways. Take last night’s show in which he highlighted “environmental racism” – a term used to describe environmental injustice that “occurs within a racialized context both in practice and policy” (thanks wikipedia!). He clearly shows the connection with historical “red lining” — in which people of color were, through official government policy(!), denied the ability to purchase homes in certain areas and therefore kept segregated in many cities — and current environmental policy which often designates those same areas as “sacrifice zones” where heavy industry, toxic waste and superfund sites tend to be located.
Oliver does a great job of analyzing government policy. He points out that the Biden Administration has said publicly that it will focus on environmental justice (EPA even has an environmental justice website!), but he also notes that the administration is not meeting its promises on this front and needs to do more.
One thing he failed to mention (and I don’t blame him, because it is, after all, tangential to the issue of environmental racism) is that the EPA plans to sunset its online archive! According to an article in The Verge, which cites our pals at the Environmental Data and Governance Initiative (EDGI):
Come July, the EPA plans to retire the archive containing old news releases, policy changes, regulatory actions, and more. Those are important public resources, advocates say, but federal guidelines for maintaining public records still fall short when it comes to protecting digital assets.
It’s clear, as Oliver notes, that it’s going to take really big steps to address environmental racism. Local environmental groups will continue to be critical in pushing for changes in government policy and regulation, but they will also need continued access to environmental government information. That’s where librarians can and should do everything in their power to help address this horrible problem.
As a follow-up to our recent post, “Some facts about the born-digital ‘National Collection,’” we want to suggest some specific actions that GPO and FDLP libraries can take to do a better job of collecting and preserving born-digital content for the “National Collection.”
For context, our starting assumption is that GPO and FDLP have two connected priorities: preservation and user services. The two go hand-in-hand. To be “preserved,” content must be discoverable, deliverable, readable, understandable, and usable by people. Broadly speaking, this can be understood as “user services.” Addressing these priorities at scale will require innovative, collaborative approaches. Old solutions that do not scale will not work.
With regard to preservation, digital objects have to be under sufficient control of the preservationist to be preservable. As we pointed out in our previous post, the vast bulk of born-digital government Public Information is not being preserved by GPO or FDLP libraries. But, worse than this, GPO and FDLP have no active plan to address that gap in preservation. While there are lots of projects to digitize historic paper documents in FDLs, there is no active project to acquire, describe, store, manage, and preserve (i.e., curate!) the bulk of born-digital content (the End of Term crawl notwithstanding). Regardless of what minor steps GPO is taking, the results are, at best, insignificant when compared to the scale of the problem. What is needed is a recognition of the huge gap in digital preservation and a specific plan for developing active strategies to address it. Waiting for agencies to deposit with GPO doesn’t work. Simply advertising GPO’s publishing services is not enough. GPO needs new strategies.
The two most important aspects of user services are “discovery” (providing tools that enable users to find the information they need) and “usability” (providing tools that enable users to use the content they discover). The two approaches GPO uses for discoverability (catalog records in the Catalog of Government Publications and a hierarchical presentation of agencies and publication types and dates in govinfo.gov) are woefully inadequate for the 21st century. One resembles a legacy card catalog and the other resembles a 1990s Yahoo!-like directory interface. Each has some utility, but they are not sufficient. GPO needs to work with FDLP libraries to develop new user-centric tools for discovery.
As for usability, GPO’s approach is still very document-centric, designed to deliver one document at a time for reading. It should be evident to all that there are many more potential uses of government information than simply retrieving one document at a time. 21st-century users are more sophisticated and have more varied use-case needs than that. We believe that GPO should continue to provide the services it does through Govinfo, but it should supplement that work by developing programs, tools, and support for FDLs to develop new uses built on the specific use-case needs of Designated Communities of users and potential users. Doing that will have the additional benefit of helping drive collection development and preservation.
GPO already has policies in place that can be read to include the broader vision we offer here. For example, GPO’s Draft Strategic Plan Fiscal Years 2023 Through 2027, while explicitly mentioning digitizing paper collections, also includes the vague phrase “focus on adding new collections and filling the gaps in existing collections.” Although in context it seems to imply filling in gaps in paper/digitized collections, it could be taken as a broader mission to address the real preservation gap of new, born-digital content. Nevertheless, vague phrases are not enough. Policies and projects need to specifically address the massive and growing born-digital preservation gap with action plans.
Given our assumptions and priorities, here are some suggestions for steps GPO can take now.
- Publicly and explicitly acknowledge and publicize the born-digital preservation gap;
- Develop an aggressive, active strategy for gaining agreements with executive branch agencies to deposit their born-digital content with GPO. Work with Congress to provide funding to agencies for providing those deposits and to GPO for receiving and processing them;
- Develop an aggressive, active strategy to promote and enforce existing OMB A-130 policy (“making Government publications available to depository libraries through the Government Publishing Office regardless of format”) for depositing executive branch content with GPO. The policy exists, but OMB does nothing to enforce it. The strategy could include working with NARA, the Federal CIO Council, and the Federal Web Archiving Group (consisting of GPO, NARA, Library of Congress, the National Library of Medicine, the Smithsonian Institution, Department of Education, and Department of Health and Human Services) to support OMB enforcement of that policy and set new policies and regulations for preserving federal agency publications and data;
- Develop an aggressive, active strategy for the development of new tools for harvesting and processing Public Information and metadata, and for the processing of that harvested data for the automated generation of rich metadata for the description, management, preservation, discovery, delivery, and use of harvested data and metadata. Develop tools, workflows, and policies to help FDLs preserve born-digital government information. This can include identifying and acquiring unreported documents, new methods of selection to build digital collections, metadata creation, and the development of digital repositories connected by APIs and a robust system of stable Permanent Identifiers;
- Develop a plan for active, continual harvesting of born-digital content that remains undeposited by agencies with GPO. Develop new strategies for targeting content by document and file-type, use-case, and source. Develop workflows to allow FDLs and other libraries and harvesters to feed their web archiving activities into the National Collection through ingest or cooperative metadata creation, or both;
- Develop next-generation tools and methods for extracting digital objects and metadata from existing Web archives for inclusion in the National Bibliography;
- Develop an active plan for obtaining federal funding to support libraries, agencies, and GPO in doing this ongoing and critical work.
Now THAT’s an “all-digital FDLP”!
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University