EPUB

What is EPUB

EPUB stands for electronic publication. It is an open file format used for eBooks. While eBooks typically refer to digital books, EPUB is also meant, more generically, to cover documents that would traditionally have been distributed as PDFs. EPUB is the distribution and interchange format standard for digital publications and documents based on Web Standards, and it is an IDPF standard (the IDPF is now part of the W3C).

What the EPUB format does

EPUB defines a means of representing, packaging and encoding structured and semantically enhanced Web content (including XHTML, CSS, SVG, images, and other resources) for distribution in a single-file format. EPUB allows publishers to produce and send a single digital publication file through distribution channels, and offers consumers interoperability between software and hardware for unencrypted, reflowable digital books and other publications. An EPUB file is a ZIP archive, so the whole publication travels as one portable file.
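Because an EPUB file is a ZIP archive, its container layout can be reproduced with a general-purpose ZIP library. The sketch below uses Python's standard zipfile module and follows the OCF container rules as we understand them: a mimetype file holding `application/epub+zip` must be the first entry and must be stored uncompressed, and `META-INF/container.xml` must point to the package document. The package path `OEBPS/content.opf` and the `build_epub_container` helper are illustrative assumptions, and the output is only a container skeleton, not a complete, valid publication.

```python
import io
import zipfile

# container.xml tells a Reading System where the package document (.opf) lives.
# The path OEBPS/content.opf is a common convention, assumed here for illustration.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_container(files: dict[str, bytes]) -> bytes:
    """Return the bytes of a ZIP laid out as an EPUB container (hypothetical helper).

    `files` maps archive paths (e.g. "OEBPS/content.opf") to their contents.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # 1. mimetype must be the first entry and stored without compression.
        zf.writestr(zipfile.ZipInfo("mimetype"), b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # 2. META-INF/container.xml locates the package document.
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # 3. Everything else (package document, XHTML, CSS, images) may be compressed.
        for path, data in files.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Storing mimetype first and uncompressed places its bytes at a fixed offset near the start of the file, which lets tools recognize an EPUB without unzipping it.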

EPUB contains
  • XML structures
  • HTML and CSS resources
  • images
And, with EPUB 3:
  • JavaScript code
  • audio
  • video assets
As an open standard, the specification is freely accessible from the IDPF website, so developers can freely create applications that generate EPUB files (.epub) or render those produced by publishers.
More information on EPUB 3 can be found in the EPUB 3 overview on the EPUBZone website.
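A rendering application typically starts by finding the package document. The minimal Python sketch below assumes a well-formed archive with a single rootfile entry: it reads `META-INF/container.xml`, follows the rootfile's `full-path`, and reads the `version` attribute of the package document's root element. The `package_info` helper is hypothetical and performs no validation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def package_info(epub_bytes: bytes) -> tuple[str, str]:
    """Return (package document path, package version) for the bytes of an .epub file."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # container.xml lives at a fixed path inside every EPUB container.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
        if rootfile is None:
            raise ValueError("container.xml lists no rootfile")
        opf_path = rootfile.get("full-path")
        # The package document's root element carries the version attribute,
        # e.g. "2.0" for EPUB 2 and typically "3.0" for EPUB 3 publications.
        package = ET.fromstring(zf.read(opf_path))
        return opf_path, package.get("version")
```

For a file on disk you would pass its bytes, for example `package_info(pathlib.Path("book.epub").read_bytes())`, where `book.epub` is a placeholder name.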

Why EPUB

Using open Web Standards in EPUB brings many advantages to the publishing industry:

  • Web Standards are interoperable, meaning they aim to be usable on any kind of device; the same holds for the EPUB standard.
  • Developers of the EPUB specification benefit from the work of the entire Web community. For example, ebook accessibility builds on the W3C's work on the subject [WAI].
  • Developers of EPUB authoring solutions can build such tools as variants of Web authoring solutions. Developers of reading applications can use an off-the-shelf Web browser engine as the core of their rendering engine.

By using Web Standards, the publishing industry avoids reinventing the wheel, although it must still adapt this “wheel” to the chapters and pages of ebooks so that their historical context is preserved. As the reference format for distribution and interchange of books and media in the digital publishing industry, EPUB allows publishers to produce and send a single digital publication file through digital distribution networks, and offers consumers interoperability between software and hardware for reflowable or fixed-layout ebooks.

History of EPUB

2007 EPUB 2
It was initially standardized in 2007 as the successor to the Open eBook Publication Structure ("OEB"), which was developed in 1999.
2010 EPUB 2.0.1
This maintenance release was approved in 2010 and was the final release in the EPUB 2 branch.
2011 EPUB 3.0
EPUB 3 superseded EPUB 2 in October 2011, when EPUB 3.0 was approved as a Final Recommended Specification.
2014 EPUB 3.0.1
This maintenance release was approved as a Final Recommended Specification in June 2014.
2017 EPUB 3.1
In January 2017, EPUB 3.1, the first major update to EPUB 3, was approved as a Recommended Specification. It was the current version of the standard when this overview was written; EPUB 3.2 and EPUB 3.3 have since followed.

EPUB3 vs EPUB2

Added and/or improved in EPUB3:
Content Documents
  • HTML5: EPUB 2 supports XHTML 1.1 and DTBook. Because EPUB 3 supports the XML serialization of HTML5, it is now possible to use richer semantic markup (e.g. section, aside, figure).
  • Semantic Inflection: A new epub:type attribute, added to HTML5 markup, states the precise structural role of an element (for example, a chapter or a footnote), in line with the publisher's intended book semantics.
  • Navigation: EPUB 3 defines a new human- and machine-readable grammar for the navigation document, based on the HTML5 nav element. It replaces the EPUB 2 .ncx file, which is now deprecated.
  • SVG documents: They can now appear directly in the spine (they no longer need to be nested within an XHTML file).
  • MathML: The XML markup language for presenting mathematical notation is now a first-class citizen in EPUB publications.
  • Content switching: Simplified by defining its processing model, so that it no longer requires document preprocessing.
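The epub:type attribute and the nav-based navigation document work together: a Reading System can find the table of contents by looking for the nav element whose epub:type includes `toc`. The Python sketch below, using xml.etree.ElementTree, walks the nested ol/li/a structure and returns each entry with its nesting depth. It assumes a well-formed XHTML navigation document; the function name and the sample markup are illustrative.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute


def toc_entries(nav_xhtml: str) -> list[tuple[int, str, str]]:
    """Return (depth, label, href) for each link in the nav whose epub:type includes "toc"."""
    root = ET.fromstring(nav_xhtml)
    toc = None
    for nav in root.iter(XHTML + "nav"):
        # epub:type may hold several space-separated values.
        if "toc" in nav.get(EPUB_TYPE, "").split():
            toc = nav
            break
    if toc is None:
        return []

    entries: list[tuple[int, str, str]] = []

    def walk(ol: ET.Element, depth: int) -> None:
        for li in ol.findall(XHTML + "li"):
            link = li.find(XHTML + "a")
            if link is not None:
                label = "".join(link.itertext()).strip()
                entries.append((depth, label, link.get("href")))
            nested = li.find(XHTML + "ol")  # a sub-list means deeper entries
            if nested is not None:
                walk(nested, depth + 1)

    top = toc.find(XHTML + "ol")
    if top is not None:
        walk(top, 0)
    return entries


# Illustrative navigation document (file names are placeholders).
SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body>
  <nav epub:type="toc">
    <ol>
      <li><a href="ch1.xhtml">Chapter 1</a>
        <ol><li><a href="ch1.xhtml#s1">Section 1.1</a></li></ol>
      </li>
      <li><a href="ch2.xhtml">Chapter 2</a></li>
    </ol>
  </nav>
</body>
</html>"""
```

Unlike the EPUB 2 NCX, this document is ordinary XHTML, so the same file can also be displayed to readers as a contents page.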
Navigation
Linking
Scripting and Interactivity
  • Interactivity: JavaScript can now be used to script scrollbars, photo galleries, text pop-ups, and similar features.
  • Triggers: The epub:trigger element, which EPUB 3 adds to HTML5 content documents, allows declarative binding of activation events to actions (such as “play” or “pause” for audio).
  • Bindings: You can now script your own handlers for media types that Reading Systems do not support natively.
Styling and Layout
  • Fixed Layout: see Reflowable vs Fixed Layout.
  • Added CSS3 modules: EPUB 3 supports selected CSS3 modules and also includes alternate style tags, allowing custom viewing modes such as day and night.
  • Embedded Fonts: EPUB 3 requires Reading Systems to support the OpenType and WOFF font formats for embedded fonts in conjunction with the CSS @font-face rules.
  • Font Obfuscation: A new normative section on Font Obfuscation [OCF3] has been added to the Open Container Format specification.
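Font obfuscation is a simple, reversible transform rather than encryption. As we understand the IDPF algorithm (identified in `META-INF/encryption.xml` by the URI `http://www.idpf.org/2008/embedding`), the key is the SHA-1 digest of the publication's unique identifier with whitespace removed, and the first 1040 bytes of the font are XORed with that 20-byte key, repeated. The Python sketch below implements that reading; the helper names are hypothetical, and the OCF specification remains the authority on details such as which identifier is used.

```python
import hashlib

OBFUSCATED_LENGTH = 1040  # number of leading font bytes that are XORed


def obfuscation_key(unique_identifier: str) -> bytes:
    # Remove U+0020, U+0009, U+000D and U+000A, then hash the UTF-8 bytes
    # with SHA-1 to get a 20-byte key.
    cleaned = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(cleaned.encode("utf-8")).digest()


def obfuscate_font(font: bytes, unique_identifier: str) -> bytes:
    """XOR the first 1040 bytes of `font` with the repeating key (hypothetical helper).

    XOR is its own inverse, so the same call also de-obfuscates.
    """
    key = obfuscation_key(unique_identifier)
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(font[:OBFUSCATED_LENGTH]))
    return head + font[OBFUSCATED_LENGTH:]
```

Because XOR is its own inverse, a Reading System recovers the font by applying the same function with the same identifier.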
Rich Media and Speech
  • Media overlays: Now that audio can be included, EPUB defines Media Overlays, based on SMIL, to synchronize recorded audio with the text.
  • Text-to-speech: Text-to-speech rendering is now supported, using features such as SSML attributes in XHTML content documents.
  • Audio and video: EPUB 2 has no native audio or video support. Thanks to HTML5, EPUB 3 publications can reference audio and video assets through the audio and video elements, so modern browser engines can process them natively.
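Media Overlays are SMIL documents in which each par element pairs a text fragment with an audio clip. The Python sketch below lists those pairs in document order, which is the information a player needs to highlight the current fragment while the audio plays. It assumes the SMIL 3.0 namespace `http://www.w3.org/ns/SMIL` and handles only common clock-value forms (`4.5s`, `500ms`, `0:00:04.500`); the helper names and sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SMIL = "{http://www.w3.org/ns/SMIL}"


def clock_to_seconds(value: str) -> float:
    """Convert common SMIL clock values ("4.5s", "500ms", "0:00:04.500") to seconds."""
    value = value.strip()
    if value.endswith("ms"):  # check "ms" before "s"
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    # Full or partial clock value: [[hh:]mm:]ss[.fraction]
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def overlay_schedule(smil_xml: str) -> list[tuple[str, str, float, Optional[float]]]:
    """Return (text fragment, audio file, clip start, clip end) for each <par>."""
    schedule = []
    for par in ET.fromstring(smil_xml).iter(SMIL + "par"):
        text = par.find(SMIL + "text")
        audio = par.find(SMIL + "audio")
        if text is None or audio is None:
            continue  # pars without both parts are skipped in this sketch
        begin = clock_to_seconds(audio.get("clipBegin", "0s"))
        end_attr = audio.get("clipEnd")
        # A missing clipEnd means "play to the end of the audio file".
        end = clock_to_seconds(end_attr) if end_attr is not None else None
        schedule.append((text.get("src"), audio.get("src"), begin, end))
    return schedule


SAMPLE_SMIL = """<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
  <body>
    <seq epub:textref="chapter1.xhtml">
      <par id="p1">
        <text src="chapter1.xhtml#para1"/>
        <audio src="audio/chapter1.mp3" clipBegin="0s" clipEnd="4.5s"/>
      </par>
      <par id="p2">
        <text src="chapter1.xhtml#para2"/>
        <audio src="audio/chapter1.mp3" clipBegin="0:00:04.500" clipEnd="0:00:09.250"/>
      </par>
    </seq>
  </body>
</smil>"""
```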
Meta Data
  • Publication Metadata and Identity: New, mandatory metadata has been added: dcterms:modified, the date on which the publication was last modified.
  • Resource Metadata: New properties attributes in the Package Document allow additional metadata to be declared about individual resources.
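The mandatory dcterms:modified value has a fixed format: a UTC timestamp of the form CCYY-MM-DDThh:mm:ssZ. The small Python sketch below produces the corresponding meta element for the package metadata; the `modified_meta` helper is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def modified_meta(when: Optional[datetime] = None) -> str:
    """Return a <meta> element carrying dcterms:modified (hypothetical helper).

    The value is written as CCYY-MM-DDThh:mm:ssZ in UTC, without fractional seconds.
    Note: astimezone() treats a naive datetime as local time.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'<meta property="dcterms:modified">{stamp}</meta>'
```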
Containment
  • Remote Resources: EPUB 3 adds new restrictions on resources that are not located inside the container.
  • Whitespace in MIMETYPE file: The restriction against trailing whitespace has been removed.
  • Disallowed characters: The OCF list of disallowed characters has been extended.
Removed or changed:
  • DTBook: a file format similar to HTML, designed with special regard to the requirements of the visually impaired.
  • Out-of-Line XML Islands: XML documents that are not authored in a Preferred Vocabulary.
  • Tours: a tag marking points of interest in a publication.
  • Filesystem Container
  • Guide
  • NCX: part of the Digital Talking Book (DTB) specification, used in EPUB 2 publications to define the table of contents (TOC).
  • 2.0.1 meta element: the meta element annotating the version.