Also see the collection development page.
Project Gutenberg eBooks are created by volunteers. This How-To contains some of the basics to get started in creating an eBook for submission to Project Gutenberg.
Project Gutenberg only accepts works that are in the public domain in the United States. Generally, new submissions to Project Gutenberg are digitized versions of printed books, most of which were published at least 95 years ago (see the Copyright How-To for details). Confirmation of public domain status is via the copy.pglaf.org site.
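As a rough illustration of the "at least 95 years" rule of thumb above, the sketch below pre-screens a publication year. It assumes the usual calendar-year convention, under which a 95-year term runs through 31 December, so a work published in year Y enters the US public domain on 1 January of Y + 96. The function name and constant are hypothetical, and the check ignores the many exceptions covered in the Copyright How-To. It is not a substitute for clearance via copy.pglaf.org.

```python
from datetime import date
from typing import Optional

# Assumption: the 95-year term mentioned in the text, counted in whole
# calendar years after the year of publication.
US_TERM_YEARS = 95


def likely_public_domain_us(publication_year: int,
                            today: Optional[date] = None) -> bool:
    """Rough pre-screen only; real clearance still goes through copy.pglaf.org."""
    today = today or date.today()
    # The term runs to the end of the calendar year, so the work is free
    # only once the current year is strictly later than year + 95.
    return publication_year + US_TERM_YEARS < today.year
```

For example, under this assumption a book published in 1928 passes the check from 1 January 2024 onward, while a book published in 1929 does not pass until 2025.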
Collection development focuses on literature and other written works that have enduring value for readers. Selections are made by volunteers with diverse interests, and essentially all eligible submissions are welcome.
Project Gutenberg does not avoid difficult or unpopular topics, or topics on which the societal views or state of knowledge reflected in a work differ greatly from those of today.
Project Gutenberg does not accept copyrighted or other contemporary items, even if copyright or licensing permission would be granted (this includes the various “open” licenses). Instead, see the self-publishing portal described in the Submitting Your own Work How-To. In the past, a greater variety of non-public domain and non-literary works was added to Project Gutenberg, including copyrighted & other donated works, non-print formats, and different encodings and file types. Today, there are many outlets for such other items, and Project Gutenberg is entirely focused on works in the public domain. This includes literature, reference works, and variations such as children’s books and travelogues.
Creating an eBook is a lot of work, and Project Gutenberg’s requirements are rather strict. If these steps seem daunting, you might be better off working with Distributed Proofreaders, where every volunteer contributes to a portion of the effort. Distributed Proofreaders has its own guidelines on submissions, as well as other policy, guidance, and community.
Being a “solo” producer involves obtaining copyright clearance, then scanning or harvesting to obtain images of a book’s pages, then engaging in many hours of proofreading and formatting, and, finally, ensuring the eBook is fully valid HTML, correctly spelled, and otherwise compliant with Project Gutenberg’s requirements. The detailed requirements are on the submission page, which also includes tools for automatically checking compliance. Any new submitter is invited to get in touch before getting started. Contact the “copyright” or “whitewashers” team; email addresses are at the bottom of the copyright and upload sites.
The first step in any eBook submission is to confirm that Project Gutenberg may legally distribute the eBook. Visit our Copyright How-To for details. Project Gutenberg will not accept any eBook without confirming copyright status. Generally, this means that a printed book has entered the public domain in the United States, typically because copyright has expired.
The master format for nearly all new Project Gutenberg eBooks is HTML. Project Gutenberg requires that the HTML be fully valid, and that any cascading style sheets (CSS) in the HTML conform to the standards published by the World Wide Web Consortium (W3C).
Whenever possible, Project Gutenberg also requires a plain text version of an eBook. We stress the inclusion of plain text because of its longevity: Project Gutenberg includes numerous text files that are over 30 years old. In that time, dozens of widely used file formats have come and gone. Text is accessible on all computers, and is also insurance against future obsolescence.
The only times when Project Gutenberg distributes an eBook without a plain text version are when plain text is impossible or impractical — for example, for our movies and MP3 audio files, and for some of our mathematical works.
This isn’t as hard as it sounds, if you start with HTML, and continuously confirm validity using the W3C’s online validator (see the upload page for details). Modern eBook producers almost always start with (valid) HTML, and then derive plain text from the HTML. These two “master” formats, text and HTML, are then submitted to Project Gutenberg. Automated tools then create derivative formats, including epub and mobi (the common e-reader formats).
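The "derive plain text from the HTML" step can be sketched with Python's standard library alone. This is a minimal illustration, not Project Gutenberg's own tooling: the class and function names, the list of block-level tags, and the 72-column wrap width are all assumptions made for the example. Real eBooks with tables, poetry, footnotes, or illustrations need far more careful handling.

```python
from html.parser import HTMLParser
import textwrap

# Assumption: a small, hand-picked set of tags that start a new text block.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
              "li", "blockquote", "br", "tr"}
# Content inside these tags is not part of the readable text.
SKIP_TAGS = {"head", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects readable text from HTML, one entry per block element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished paragraphs
        self.current = []     # text pieces of the paragraph in progress
        self.skip_depth = 0   # > 0 while inside a skipped element

    def flush(self):
        # Collapse runs of whitespace and store the paragraph if non-empty.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_text(html: str, width: int = 72) -> str:
    """Return wrapped plain text with a blank line between paragraphs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    wrapped = (textwrap.fill(block, width=width) for block in parser.blocks)
    return "\n\n".join(wrapped) + "\n"
```

Calling `html_to_text` on the contents of a validated HTML file gives a first-draft text file that still needs proofreading by hand. Note that this parser is lenient and does not check validity; validation remains a separate step using the W3C validator.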
Hand-crafted epub and mobi files are not currently accepted. This limitation is mainly to allow easy editing, because fixes are applied many times over the lifetime of every eBook. Project Gutenberg tries to limit the number of master formats, and then automate derived formats, to facilitate continuous improvement of items in the collection.
A small number of new eBooks use LaTeX or TEI as the master format, mostly those including lots of mathematical notation. Other less frequently used formats include reStructuredText (RST), Resource Description Framework (RDF), and a few others. Note that PDF, Word, and other word processor formats are not used as master formats, because they do not convert easily to valid HTML. In addition, they are much more challenging to fix or update than HTML and plain text.
The Volunteer’s FAQ has extensive guidance on best practices for Project Gutenberg eBooks. Some of this information is outdated, and the focus is on solo eBook producers. Get in touch if you need additional guidance.
Turning a physical book into an eBook is a wonderful way to preserve the book, and to make it more widely available. Historically, eBook creation was usually done by a single person typing in the physical book, a page at a time. This technique still works, of course, and is sometimes necessary (for example, if the book is damaged or extremely fragile).