The Stoic Resilience of PDF Within a Digital Ecosystem

PDF, as a format for the dissemination of scholarly content, does have its detractors—so why does the stalwart PDF file stubbornly refuse to retire from service?

At present, PDF is the core output of Overleaf’s LaTeX-based typesetting. Converting “raw” author-produced LaTeX documents directly to XML or MathML presents technical challenges: depending on the tools and technologies performing the conversion, the input typically needs “normalizing” and the output may need varying forms of “cleanup”. Discussions of “print vs. digital” and the suitability of PDF as a container format for content distribution have been, and continue to be, played out in heated threads and debates across the web; however, we thought that a brief overview of the evolutionary path to today’s publishing context might be interesting.

From PostScript to JavaScript and beyond

The last few decades have borne witness to a stream of technological changes affecting the creation, production and dissemination of scholarly content. In the 1970s TeX liberated scientists and mathematicians from the strictures of proprietary typesetting systems; the 1980s saw the creation of PostScript, and new font technologies, which helped to spawn the Desktop Publishing revolution. Less than a decade after PostScript came the rise of the Web, HTML and the birth of PostScript’s child, PDF, which the publishing industry adopted as the de facto file-transfer format. Not long after this, XML-based technologies, including MathML, gained popularity and traction within journal publishing. Today, scholarly publishing can produce and disseminate content built using a plethora of digital technologies, including MathJax, SVG (Scalable Vector Graphics), JavaScript, CSS and HTML5 functionality. Furthermore, those technologies can be variously packaged and combined to produce digital books through container specifications such as EPUB. Other enabling technologies include Unicode for text encoding and OpenType font technologies, which work together to enable the communication, transmission and rendering of textual content that is dependent upon complex typographic rules. The rapid growth in video and audio content has been enabled by powerful desktop computers, tablets and mobile devices, all equipped with increasingly sophisticated browsers and access to fast communications technologies.

Content: complexity and consumption

Scholarly content can now be created and distributed in a wide range, or mixture, of digital formats, but the ecosystem used to access and “consume” it comprises an inhomogeneous mix of hardware and software technologies: a heady mix of vendor, reading device and operating system/platform. As the complexity of digital content increases, it can become more reliant upon the specific capabilities of the technology being used to read it. Simple text may suffer the indignity of, perhaps, the occasional missing glyph or the lack of some fine ligatures but, overall, it is likely to survive pretty much unscathed. As you move up the complexity chain to incorporate advanced mathematics, complex JavaScript/CSS, interactive features or text in complex-script languages, the reader’s experience can become increasingly dependent on their local environment, i.e., the capabilities of the software used to display the content, whether it runs in a web browser, on a tablet or on a mobile phone. Ensuring accuracy and fidelity of reproduction is an essential prerequisite for scholarly communications: you need to know that what you produce can be “consumed” by the vast majority of its potential readership without fear that technology limitations will degrade or restrict the reader’s experience. Perhaps this partly explains why the stalwart PDF file stubbornly refuses to retire from service and, for some, it remains the favoured way to read, print and share books or journal papers which contain highly complex content.
