Odysee and YouTube videos are now available

0 - Preface

I’m willing to believe that, for most of your life, every document you or the people around you have ever made was written in Microsoft Word (which you’ve obviously paid for and not pirated from 1337x), Google Docs or, in much rarer cases, LibreOffice Writer.

I can’t blame you. They do their job, but I don’t think they’re fit for someone like me.

You see, I am a sucker for open source software. I mean, just look at me. I’m a Linux user! And while LibreOffice Writer should do the trick in this case, it didn’t scratch my itch.

Ever since I discovered the power of LaTeX I have been fully enthralled, but I was left with one problem that I just couldn’t solve no matter how hard I tried: the production of ebooks.

The issue here is that LaTeX is a language that is meant to produce perfectly formatted PDFs. And, well, PDF and ePub are two very, very different formats.

PDF was made to solve a rather specific problem, that problem being printable documents. You can tell from how hard it is to edit one. You can fix a typo in a PDF, but then the formatting can and probably will break, leaving you with a rather ugly PDF. The whole thing is fixed in place and simply isn’t meant to be edited. It’s meant to be a compilation target: programs take an easily editable document and turn it into a final PDF.

In contrast, an ePub is just a ZIP file with some XHTML, CSS and whatever media there is inside of it, and maybe even some JavaScript. It was made for digital devices: by default the contents adapt to the size of the screen of whatever ebook reader you’re using, and you can swap out fonts and change font sizes to anything you want, with no complaint from the format or the readers. This is amazing because if, for example, you’re an editor and you find an error or typo in the ePub, you can simply unpack it, edit the text directly and suffer no formatting consequences, thanks to ePub’s reflowability. And because it’s a digital format, we can do things that PDF can’t (or shouldn’t), such as embedding audio and video files into the ebook.
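To make the unpack-and-edit idea concrete, here is a minimal sketch, assuming the common unzip and zip tools are installed. The function names and file names are made up for illustration. The one catch is repacking: the ePub container format expects the mimetype file to be the first entry in the archive and to be stored uncompressed, which is why it gets added separately.

```shell
# Hypothetical helpers; the names are illustrative only.

# Unpack an ePub into a folder so its XHTML and CSS can be edited.
epub_unpack() {
  unzip -q -o "$1" -d "$2"
}

# Repack a folder into an ePub. The mimetype file goes in first and
# uncompressed (-0); -X drops extra file attributes, -D skips directory entries.
epub_repack() {
  local src="$1" out
  # Resolve the output path before changing directory.
  out="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
  rm -f "$out"
  (
    cd "$src" || exit 1
    zip -q -X -0 "$out" mimetype &&
      zip -q -X -r -9 -D "$out" . -x mimetype
  )
}

# Usage:
#   epub_unpack output.epub book_src
#   ...edit the files inside book_src...
#   epub_repack book_src output-fixed.epub
```

Keeping the two steps as separate functions means you can leave the unpacked folder around and repack it as many times as you like while fixing typos.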

Additionally, if you want a fixed shape like PDF’s but still want the power that comes with embedding media, ePub does have a fixed layout mode, which could come in handy for, say, a textbook that makes use of various media. But I’ll be sticking to reflowable ePubs in this article, since I don’t know how to make fixed layout ones with the tools mentioned.

So, how can we write reflowable ePubs in a markup language, similar to how we write PDFs in LaTeX? The answer I want to offer you is Markdown and Pandoc.

1 - The tools of the trade

Introduction to the tools

You may recognize Markdown if you’re a programmer. Pretty much every modern software codebase comes with a README.md, the .md telling you that it’s a Markdown file. Markdown is a lightweight markup language that was originally designed to be converted to HTML. It specializes in writing text documents rather than building websites, and its syntax is extremely easy to learn, making it an optimal solution for our purposes. In fact, if you use a Static Site Generator like 11ty, Jekyll, Hugo or Zola you should already be familiar with Markdown, as it’s the most popular language for writing content in those generators. Plus, because Markdown maps so closely onto HTML, there is nothing stopping us from including actual HTML inside of our document to extend its features. This, of course, also includes adding CSS and JavaScript to it.
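As a small illustration of mixing the two, the sketch below writes a hypothetical sample.md that uses ordinary Markdown alongside raw HTML. The file name, the note class and the text are all made up; the point is only that the HTML sits right next to the Markdown.

```shell
# Write an illustrative Markdown file. The quoted 'EOF' keeps the shell
# from expanding anything inside the heredoc.
cat > sample.md <<'EOF'
# Chapter One

This is a paragraph with *emphasis* and a [link](https://example.com).

<div class="note">
<p>This block is plain HTML, so a stylesheet can target the <code>note</code> class.</p>
</div>

- A list item
- Another list item
EOF
```

Since an ePub is XHTML on the inside, the div should survive the conversion and can then be styled from a CSS file.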

I think you can see where I’m going with this.

Pandoc is a text document converter. It can convert between a vast array of formats and has a vast number of customization options to go along with them. With it we can convert our Markdown to an ePub with no problem, and even to a PDF, although I wouldn’t recommend that. I’ll go into why later, when I talk about LaTeX.

Before we start, I would recommend you install a more capable text editor than your typical Notepad or Notepad++, ideally one that is also used by programmers. I personally use VSCode, which is what I recommend if you’re absolutely new to this workflow. If you’re a programmer I’m sure you’re already using something that you like, NeoVim for example. But believe me, you will want quick access to that terminal interface.

I use VSCodium for philosophical reasons, and I would recommend that you do too, since VSCodium doesn’t have telemetry, unlike VSCode.

I’m going to list some VSCode extensions that you can install to hopefully make your life easier.

  • Bracket Pair Colorizer 2
  • Code Spell Checker
  • LaTeX Workshop
  • Prettier - Code formatter
  • Markdown Preview Enhanced

ePub

Starting is extremely easy. First, create a folder where you’ll keep your Markdown files and open it in your text editor. Then create a file with the .md extension, write some Markdown in it and invoke Pandoc. Normally you’ll invoke Pandoc through a terminal interface. The command

pandoc input.md -o output.epub

will generate the ePub you want.

If we want to add some CSS, we just need to add the --css flag along with the name of the CSS file:

pandoc input.md -o output.epub --css style.css

Now, I wouldn’t try to go too crazy with the CSS. I severely doubt that your weak, meek, effeminate, submissive and (if you bought it new) definitely overpriced ebook reader comes packed with the unmatched power of the Chromium Embedded Framework. I may be wrong, but I don’t think they’d include that if they actually cared about battery life, so expect only a reduced subset of CSS to work.
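As a rough starting point, here is a deliberately conservative stylesheet, written out with a heredoc so it can be pasted into a terminal. The selectors and values are only assumptions about what simple ebook readers tend to cope with (basic fonts, margins and alignment), not a tested compatibility list.

```shell
# Write a small, conservative style.css for use with Pandoc's --css flag.
cat > style.css <<'EOF'
/* Stick to simple properties; many readers ignore the fancy ones. */
body {
  font-family: serif;
  line-height: 1.4;
  margin: 0 5%;
}

h1, h2 {
  text-align: center;
  page-break-after: avoid;
}

p {
  margin: 0;
  text-indent: 1.5em;
  text-align: justify;
}

img {
  max-width: 100%;
}
EOF
```

If something looks off on your device, removing rules one at a time is usually the quickest way to find out which property the reader doesn’t like.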

I won’t go too in-depth with the options that Pandoc and Markdown have, as there is plenty of documentation online, along with forum posts and examples, that you can use to solve your issues.

Also, if you’re new to the world of command line programs it should be helpful to know that generally speaking there is no strict order in which you have to add the arguments for programs.

So, that solves the ePub question.

But how about PDFs?

PDF

Making a PDF is gonna be a bit more involved.

You can simply convert your Markdown file into a PDF directly using Pandoc. What Pandoc will do is turn your Markdown into LaTeX and then invoke a LaTeX compiler to turn that LaTeX into a PDF. Hell, you can even define some LaTeX settings, like the document class, in the Markdown’s YAML metadata block or in a separate YAML file. You can also tell Pandoc which LaTeX engine you want to use with the --pdf-engine flag. I personally prefer using LuaLaTeX.
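Here is a minimal sketch of that direct route, assuming both Pandoc and a LaTeX distribution that provides lualatex are installed. The file name book.md, the title, the author and the chosen values are placeholders; documentclass, papersize and fontsize are variables that Pandoc passes on to its LaTeX template.

```shell
# Illustrative Markdown file with a YAML metadata block at the top.
cat > book.md <<'EOF'
---
title: "An Example Book"
author: "Jane Doe"
documentclass: book
papersize: a5
fontsize: 11pt
---

# First chapter

Some text.
EOF

# Build the PDF only when the required tools are actually available.
if command -v pandoc >/dev/null 2>&1 && command -v lualatex >/dev/null 2>&1; then
  pandoc book.md -o book.pdf --pdf-engine=lualatex
else
  echo "pandoc or lualatex not found, skipping the PDF build" >&2
fi
```

The same metadata block is simply ignored where it doesn’t apply, so one source file can still feed both the ePub and the PDF.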

You will have to install a LaTeX distribution on your computer first as Pandoc doesn’t come with any of the popular LaTeX compilers by default. The LaTeX website has some recommendations for you depending on your operating system.

Markdown’s flexible design is, simply put, perhaps too free-flowing for the strictness of PDF. So my recommendation is to turn your Markdown into LaTeX and then continue working in LaTeX. By default the LaTeX that Pandoc outputs is very bare-bones: it is only a document fragment with no preamble, so you can’t even compile it on its own (passing the --standalone flag makes Pandoc wrap it in a complete document). Perhaps the bare version is better anyway, as it lets you format everything just how you want it. LaTeX’s logic is considerably different from Markdown’s, but I’m sure you’ll be able to get the hang of it.
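To see the difference between the two kinds of LaTeX output, the sketch below converts the same file twice. The file names are placeholders. Without --standalone Pandoc writes only the body of the document, and with it the output gets a preamble and a document environment from Pandoc’s default template.

```shell
# A tiny illustrative input file.
printf '# A chapter\n\nSome *text* to convert.\n' > demo.md

if command -v pandoc >/dev/null 2>&1; then
  # Fragment only: no \documentclass, no \begin{document}.
  pandoc demo.md -o fragment.tex

  # Complete document that a LaTeX compiler can process as-is.
  pandoc demo.md --standalone -o full.tex
else
  echo "pandoc not found, skipping the conversion" >&2
fi
```

The fragment is handy if you already have your own preamble and just want to paste the body in, while the standalone file is the quicker way to get something that compiles.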

Alternatively, you can brute-force Markdown to PDF directly by seeing what you can add to the YAML metadata. The available options are listed in the “Variables for LaTeX” section of the Pandoc manual.

Pandoc also supports converting your Markdown into a .docx or .odt file, which you can then edit in an actual word processor. Although, in my opinion, LaTeX is still the undisputed king for generating beautifully formatted PDFs.

2 - Additional optional tooling

I want to also talk about some extra tooling that we can use to make our experience a little better.

If you’re on Windows you will probably want to install WSL or something else that can provide commonplace UNIX and POSIX tools like bash or make by default.

For starters, how about some scripts to automate the process? A simple bash script or, if you want to be fancier, even a Makefile can make generating the files easier.
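A build script along those lines could look like the sketch below. Everything in it is an assumption made for illustration: the script name build.sh, the default file names, the build folder and the two targets. It only wraps Pandoc commands of the kind shown earlier.

```shell
#!/usr/bin/env bash
# build.sh - hypothetical build script; all file names are placeholders
# that can be overridden, e.g. SRC=novel.md ./build.sh epub

SRC="${SRC:-book.md}"
CSS="${CSS:-style.css}"
OUT_DIR="${OUT_DIR:-build}"

build_epub() {
  mkdir -p "$OUT_DIR"
  local args=("$SRC" -o "$OUT_DIR/$(basename "${SRC%.md}").epub")
  # Only pass --css when the stylesheet actually exists.
  if [ -f "$CSS" ]; then
    args+=(--css "$CSS")
  fi
  pandoc "${args[@]}"
}

build_tex() {
  mkdir -p "$OUT_DIR"
  pandoc "$SRC" --standalone -o "$OUT_DIR/$(basename "${SRC%.md}").tex"
}

if [ "$#" -eq 0 ]; then
  echo "usage: ./build.sh [epub] [tex]" >&2
fi

# Run the requested targets in order and stop at the first failure.
for target in "$@"; do
  case "$target" in
    epub) build_epub || exit 1 ;;
    tex)  build_tex || exit 1 ;;
    *)    echo "unknown target: $target" >&2; exit 1 ;;
  esac
done
```

After saving it and running chmod +x build.sh, calling ./build.sh epub tex would rebuild both outputs in one go.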

For storing these files you could use a plain old cloud storage solution such as Google Drive, but since we’re dealing with text files that may change a lot, we could, perhaps, sprinkle in some version control software. Git is the de facto standard for code these days, and since both Markdown and code are pure text you can use Git here without fuss. However, if you’re going to add binary files like images, videos or audio, you may want to consider also installing git-lfs, as storing binary files directly can very quickly increase your Git repository’s size and make operations on it take much longer. You can use any Git hosting website. The most popular options are GitHub and GitLab but, because I don’t like using them for philosophical reasons, I personally prefer Codeberg.
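Setting that up could look roughly like the sketch below, run from inside the folder that holds the book. The ignore list and the tracked file patterns are only examples, and the Git LFS part is skipped when git-lfs isn’t installed.

```shell
# Keep generated files out of the repository (example patterns).
printf '%s\n' '*.epub' '*.pdf' 'build/' >> .gitignore

# Start tracking the folder with Git.
git init
git add .gitignore

# Hand common binary formats over to Git LFS, if it is available.
if git lfs version >/dev/null 2>&1; then
  git lfs install
  git lfs track "*.png" "*.jpg" "*.mp3" "*.mp4"
  git add .gitattributes
else
  echo "git-lfs is not installed, binary files will be stored in Git directly" >&2
fi
```

From there it’s the usual git add and git commit cycle. It’s worth setting up the LFS patterns before adding any media, since files committed earlier stay in the regular history.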

3 - Some alternatives

⚠️

A disclaimer for these alternatives: I haven’t tested them extensively, so I’m not 100% certain that what I’m saying here is accurate.

⚠️

AsciiDoc

Starting with AsciiDoc.

If you’re not a fan of Markdown’s fragmentation issue, that is, its many flavors (which really isn’t an issue in our case, since Pandoc supports all the relevant flavors), or you find it too limited or lacking in features, there is a potential alternative you could look into:

AsciiDoc is a powerful alternative to Markdown. Out of the box, AsciiDoc is much more descriptive and powerful than Markdown while still maintaining a fairly similar syntax, and it has a single official implementation, with additional tooling provided by Asciidoctor. But the reason I didn’t go with AsciiDoc instead of Markdown is that Asciidoctor’s ePub generation is still very much in development (and said development appears to be rather slow at first glance) and, for now, it seems that it doesn’t support custom CSS, which is crucial to me. So I discarded it as a non-starter for my purposes, but you may find value in it, especially if you’re looking for something better suited to technical documentation.

tex4ebook (?)

I did claim at the beginning of this article that LaTeX is not made for ebooks but, despite my words, there does seem to be a project called tex4ebook that is trying to implement actual ePub tooling for LaTeX. I have barely touched this project, but it seems alright. If you’re coming from LaTeX land it may very well be worth checking out, perhaps even allowing you to ignore my personal Markdown workflow and stick to LaTeX throughout. Apparently it also supports creating fixed layout ePubs, which I don’t think is possible with Pandoc, but you are free to correct me.