Git Tracked Document Pipeline: Why Developers Are Replacing Word and InDesign with Markdown PDF Workflows
Published on May 27, 2026 by The Kestrel Tools Team • 7 min read
You’ve got a 200-page technical manual in Google Docs. Three people are editing it. Someone accidentally deletes a chapter, and the version history is a wall of unnamed snapshots with no meaningful diff. You can’t grep it. You can’t branch it. You can’t run CI on it. A git tracked document pipeline solves every one of these problems — and the 275-point Hacker News post “I Bypassed Adobe and Microsoft to Build a Git-Tracked Book Production Pipeline” shows you’re not the only one who’s had enough of proprietary document tooling.
The idea is simple: treat documents like code. Write in plaintext (Markdown, AsciiDoc, reStructuredText), store it in Git, and compile to PDF or EPUB at the end. The execution is where it gets interesting — and where most existing guides leave you stranded.
What Is a Git Tracked Document Pipeline (And Why Markdown to PDF)?
A git tracked document pipeline is a workflow where your document source files live in a version-controlled repository, authored in a plaintext format, and transformed into distributable outputs (PDF, EPUB, HTML) through an automated build step.
The core components:
| Layer | What It Does | Common Tools |
|---|---|---|
| Source format | Human-writable plaintext | Markdown, AsciiDoc, LaTeX |
| Version control | Diff, branch, merge, blame | Git |
| Build system | Transform source → output | Pandoc, Typst, WeasyPrint |
| CI/CD | Automate builds on push | GitHub Actions, GitLab CI |
| Output format | Distributable artifact | PDF, EPUB, HTML |
This isn’t new: LaTeX users have done this for decades. What’s new is that the tooling has finally gotten good enough for non-LaTeX workflows. Typst recompiles incrementally, fast enough to drive a live preview. Pandoc handles cross-references through filters such as pandoc-crossref. And Markdown is something every developer already knows.
Why Binary Document Formats Break Developer Workflows
The frustration driving this movement isn’t aesthetic. It’s operational. Binary formats like .docx, .indd, and .pdf (as source) create concrete problems:
You can’t diff them meaningfully. Git will tell you a .docx file changed. It won’t tell you that someone rewrote paragraph three. You get Binary files differ and nothing else. That means code review — the single most powerful quality gate in software engineering — doesn’t work for your documents.
You can’t merge them. Two people edit the same Word document on different branches. Now what? You’re manually comparing versions side-by-side, exactly like it’s 2003.
You can’t grep them. Need to find every instance of a deprecated product name across 40 documents? With plaintext, it’s grep -r "OldName" docs/. With Word files, you’re opening each one individually.
You can’t automate them. Want to auto-generate a changelog, inject a build number, or validate that every chapter has a summary? With plaintext source, it’s a 10-line script. With InDesign, it means scripting against a proprietary object model or paying for a plugin.
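Here is what that "10-line script" might look like for the summary check, as a minimal Bash sketch. It assumes a hypothetical convention, not something any tool requires: each chapter is a `.md` file in one directory, and its summary starts with a level-two `## Summary` heading. The function name `check_summaries` is illustrative.

```bash
#!/usr/bin/env bash
# check_summaries DIR: report every chapter in DIR that lacks a summary.
# Assumption (hypothetical convention): a summary starts with a line that is
# exactly "## Summary" (trailing whitespace allowed).
check_summaries() {
  local src_dir="${1:-src}" chapter missing=0
  for chapter in "$src_dir"/*.md; do
    # If the glob matched nothing, bash leaves the pattern unexpanded; skip it.
    [ -e "$chapter" ] || continue
    if ! grep -Eq '^## Summary[[:space:]]*$' "$chapter"; then
      echo "missing summary: $chapter" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In CI, `check_summaries src || exit 1` turns a missing summary into a failed build instead of a review comment.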
The Plaintext-First Architecture
Here’s what a production-grade document pipeline actually looks like:
```text
docs/
├── src/
│   ├── 01-introduction.md
│   ├── 02-architecture.md
│   ├── 03-api-reference.md
│   └── images/
│       ├── system-diagram.svg
│       └── flow-chart.svg
├── templates/
│   └── company-style.typ
├── build.sh
├── Makefile
└── .github/workflows/build-docs.yml
```
The key architectural decisions:
- One file per chapter/section: contributors can edit in parallel with far fewer merge conflicts
- Images as separate files, SVG where possible: SVG is plain text, so it diffs and can be generated by scripts
- Style separated from content: template changes don’t touch source files
- Build script checked into repo: anyone can reproduce the output
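The two-digit prefixes matter more than they look. Pandoc concatenates its input files in the order given, and the shell expands `src/*.md` in sorted order, so without zero-padding `10-appendix.md` would land before `2-setup.md`. A small guard can enforce the naming rule. This sketch assumes a hypothetical convention of `NN-slug.md` (exactly two digits, then a lowercase hyphenated slug) and also catches two chapters that share a number; `check_chapter_names` is an illustrative name.

```bash
#!/usr/bin/env bash
# check_chapter_names DIR: flag chapter files that would break glob ordering.
# Assumption (hypothetical convention): chapters are named NN-slug.md,
# e.g. 01-introduction.md, where NN is exactly two digits.
check_chapter_names() {
  local src_dir="${1:-src}" file name prefix bad=0 seen=" "
  for file in "$src_dir"/*.md; do
    [ -e "$file" ] || continue
    name=$(basename "$file")
    if [[ ! $name =~ ^[0-9]{2}-[a-z0-9-]+\.md$ ]]; then
      echo "unexpected chapter name: $name" >&2
      bad=1
      continue
    fi
    # Two files with the same prefix have no well-defined relative order.
    prefix=${name%%-*}
    case "$seen" in
      *" $prefix "*) echo "duplicate chapter number: $prefix" >&2; bad=1 ;;
    esac
    seen="$seen$prefix "
  done
  return "$bad"
}
```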
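The build step is typically one command. With Pandoc and a Typst template, tell Pandoc to use Typst as its PDF engine; otherwise it defaults to a LaTeX engine and the .typ template won’t apply (the template itself must be a Pandoc template written for the Typst writer):

```bash
pandoc src/*.md -o output/manual.pdf --pdf-engine=typst --template=templates/company-style.typ
```

Or, if you write the source directly in Typst markup rather than Markdown (Typst has been gaining traction for its speed):

```bash
typst compile src/main.typ output/manual.pdf
```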
Where PDF Utility Tools Fit in a Plaintext-First World
Here’s the part that surprises people: even in a plaintext-first pipeline, you still need PDF manipulation tools. The reason is that the real world doesn’t live entirely in your repo.
Merging generated outputs. Your pipeline produces per-chapter PDFs for review. Stakeholders want a single combined document. You need a merge tool.
Splitting received documents. A client sends you a 50-page contract as a single PDF. You need pages 12-15 for the appendix of your plaintext-sourced manual. You need a split tool.
Compressing for distribution. Your build produces a 45MB PDF with high-resolution diagrams, and many email providers cap attachments at around 25MB. You need compression.
Converting legacy content. You’re migrating from Word to Markdown, and the existing 300 pages need to be extracted, cleaned, and restructured. If all you have is the exported PDFs, PDF-to-text extraction is step one; if you still have the .docx files, Pandoc can convert them to Markdown directly.
These are the exact operations that Kestrel’s PDF tools handle — merge, split, compress, convert — all client-side, no upload required. They fit into the pipeline as post-processing utilities: your source stays in Git, your build stays automated, and the PDF tools handle the messy edges where your plaintext world meets everyone else’s binary world.
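When a step has to run unattended, for example inside the CI job set up later in this post, open-source command-line tools can cover the same four operations. The sketch below wraps qpdf (merge and split), Ghostscript (compress) and Poppler’s pdftotext (extract) in small Bash functions. It assumes those tools are installed; the function names are hypothetical and the file names in the comments are placeholders.

```bash
#!/usr/bin/env bash
# PDF post-processing helpers (sketch). Assumes qpdf, Ghostscript (gs) and
# Poppler's pdftotext are on PATH; each function fails early if its tool is not.

require_tool() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing tool: $1" >&2; return 1; }
}

# merge_pdfs OUT IN1 IN2 ...: concatenate PDFs in the order given,
# e.g. merge_pdfs review.pdf output/ch01.pdf output/ch02.pdf
merge_pdfs() {
  require_tool qpdf || return 1
  local out=$1
  shift
  qpdf --empty --pages "$@" -- "$out"
}

# extract_pages IN RANGE OUT, e.g. extract_pages contract.pdf 12-15 appendix.pdf
extract_pages() {
  require_tool qpdf || return 1
  qpdf "$1" --pages "$1" "$2" -- "$3"
}

# compress_pdf IN OUT: re-render through Ghostscript with the /ebook preset,
# which downsamples images to roughly 150 dpi.
compress_pdf() {
  require_tool gs || return 1
  gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
    -sOutputFile="$2" "$1"
}

# pdf_to_text IN OUT: plain-text extraction, keeping the physical layout.
pdf_to_text() {
  require_tool pdftotext || return 1
  pdftotext -layout "$1" "$2"
}
```

Whether /ebook gets a 45MB file under a 25MB limit depends on how much of the size comes from images, so check the result rather than assuming it.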
Setting Up Your First Git Tracked Document Pipeline
Here’s a minimal working setup you can adapt:
Step 1: Initialize the structure
```bash
mkdir -p docs/{src,templates,output}
cd docs && git init
```
Step 2: Write your source in Markdown
```markdown
# Chapter 1: Getting Started

This is your document source. It supports:

- Standard Markdown syntax
- Cross-references (with Pandoc filters)
- Code blocks with syntax highlighting
- Tables, footnotes, and citations
```
Step 3: Add a build command
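```bash
#!/bin/bash
# build.sh: compile all chapters into a single PDF.
set -euo pipefail

# output/ may not exist in a fresh clone (Git doesn't track empty dirs).
mkdir -p output

# --resource-path lets chapters reference images/... relative to src/.
pandoc src/*.md \
  --toc \
  --number-sections \
  --resource-path=.:src \
  --pdf-engine=tectonic \
  -o output/document.pdf
```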
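To inject a build number, one of the automation examples mentioned earlier, the script can stamp Git metadata into the document. A minimal sketch, assuming your template prints the `date` metadata field (in Pandoc’s default LaTeX template the title block, including the date, is only rendered when a title is set); `build_label` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# build_label DIR: a human-readable build identifier for the repo in DIR,
# e.g. "v1.2-3-gabc1234" (tag, commits since tag, short hash) or, with no
# tags, just the short hash; "-dirty" marks uncommitted changes.
# Falls back to "unversioned" outside a Git checkout or with no commits.
build_label() {
  git -C "${1:-.}" describe --tags --always --dirty 2>/dev/null || echo "unversioned"
}

# Hypothetical use in build.sh: stamp the label into the `date` field.
# pandoc src/*.md --toc --number-sections \
#   -M date="Build $(build_label .)" \
#   --pdf-engine=tectonic -o output/document.pdf
```

One CI caveat: actions/checkout makes a shallow clone without tags by default, so `git describe --tags` falls back to a bare commit hash there unless you set `fetch-depth: 0` on the checkout step.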
Step 4: Add CI (GitHub Actions example)
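```yaml
name: Build Docs
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Tectonic is not packaged in Ubuntu's default repositories, so use the
      # project's installer. librsvg2-bin provides rsvg-convert, which Pandoc
      # needs to embed SVG images in LaTeX-based PDF output.
      - run: |
          sudo apt-get update
          sudo apt-get install -y pandoc librsvg2-bin
          curl --proto '=https' --tlsv1.2 -fsSL https://drop-sh.fullyjustified.net | sh
          sudo mv tectonic /usr/local/bin/
      - run: chmod +x build.sh && ./build.sh
      - uses: actions/upload-artifact@v4
        with:
          name: document-pdf
          path: output/document.pdf
```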
Now every push produces a fresh PDF, attached to the workflow run as a downloadable artifact. Every change is tracked, and any earlier version can be rebuilt by checking out its commit.
The Trade-Offs Nobody Mentions
This workflow isn’t free of friction. Being honest about the costs:
- Collaborative editing with non-developers is harder. Your marketing team isn’t going to learn Git. Solutions exist (Prose.io, HedgeDoc, Obsidian Publish) but they add complexity.
- Rich layout is more work upfront. A complex magazine layout that takes 10 minutes in InDesign might take an hour to template in Typst. The payoff comes at scale — the 50th document using that template takes zero layout time.
- Toolchain setup isn’t trivial. Pandoc, LaTeX engines, font configuration — the first build environment takes time. Docker helps (there are pre-built images), but it’s still overhead.
- Some outputs genuinely need WYSIWYG. If your deliverable is a one-off flyer with precise visual placement, plaintext source is the wrong tool. Use the right tool for the job.
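On the Docker point: the Pandoc project publishes images such as pandoc/latex on Docker Hub, with pandoc as the entrypoint and /data as the working directory. A sketch of a containerized build follows. Note that this image uses its bundled LaTeX (pdflatex by default) rather than Tectonic, and the `DRY_RUN` switch and `PANDOC_IMAGE` override are hypothetical conveniences for inspecting the command before running it.

```bash
#!/usr/bin/env bash
# docker_build: run the Pandoc build inside the pandoc/latex image, so the
# host needs only Docker. Pin a specific image tag for reproducible output.
docker_build() {
  local image="${PANDOC_IMAGE:-pandoc/latex}"
  local -a cmd
  # Mount the repo root at /data (the image's working directory) and run as
  # the current user so generated files aren't owned by root.
  cmd=(docker run --rm
    --volume "$PWD:/data"
    --user "$(id -u):$(id -g)"
    "$image"
    --toc --number-sections --resource-path=.:src
    -o output/document.pdf)
  # The glob expands on the host relative to $PWD; since $PWD is mounted at
  # /data, the same relative paths resolve inside the container.
  cmd+=(src/*.md)
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    mkdir -p output && "${cmd[@]}"
  fi
}
```

Usage: `DRY_RUN=1 docker_build` prints the command; `docker_build` runs it.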
When This Architecture Wins
The plaintext-first pipeline pays for itself when:
- Multiple contributors need to work on the same document without conflicts
- Documents have long lifespans and many revisions (manuals, specs, policies)
- You need to audit who changed what, when, and why (blame + commit messages)
- Outputs need to be generated in multiple formats (PDF + HTML + EPUB)
- Quality gates matter (linting, spell-check, link validation in CI)
- Content is partially generated (API docs, changelogs, data-driven reports)
If you recognize your workflow in that list, you already know the next step.
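As an example of the quality gates in that list, here is a minimal link check that fails when a Markdown file points at a local image or file that doesn’t exist. It’s a sketch with stated limits: it only understands inline `[text](target)` links and images (no reference-style links, no titles inside the parentheses), skips external URLs and in-page `#anchors`, and resolves relative paths against each Markdown file’s own directory. If your build resolves images differently (for example through Pandoc’s --resource-path), adjust the lookup. `check_local_links` is a hypothetical name.

```bash
#!/usr/bin/env bash
# check_local_links DIR: fail if an inline Markdown link or image in
# DIR/*.md points at a local file that does not exist.
check_local_links() {
  local src_dir="${1:-src}" file target path broken=0
  for file in "$src_dir"/*.md; do
    [ -e "$file" ] || continue
    while IFS= read -r target; do
      target=${target#"]("}   # strip the leading "]("
      target=${target%")"}    # strip the trailing ")"
      target=${target%%#*}    # drop any #fragment
      case "$target" in
        ''|http://*|https://*|mailto:*) continue ;;
        /*) path=$target ;;
        *) path="$(dirname "$file")/$target" ;;
      esac
      if [ ! -e "$path" ]; then
        echo "broken link in $file: $target" >&2
        broken=1
      fi
    done < <(grep -oE '][(][^)]*[)]' "$file")
  done
  return "$broken"
}
```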
The Bigger Picture
The 275-point Hacker News post isn’t really about PDFs. It’s about ownership. Developers are used to owning their toolchain — choosing their editor, their build system, their deployment target. Documents have been the last holdout where proprietary tools dictated the workflow.
That’s changing. The tooling is ready. The community is building. And the utility tools that handle the messy edges — merging, splitting, compressing, converting — are there when your plaintext pipeline meets the rest of the world.
Ready to handle the PDF side of your pipeline? Try Kestrel’s PDF tools — merge, split, and compress without uploading your files anywhere.