
Browser-only Markdown→PDF with Mermaid: pagination math, SVG rasterisation, and the bugs in between

How TaskKit's markdown editor exports a real PDF — with rendered Mermaid diagrams — entirely in the browser. Plus the four bugs that took longer to find than to fix.


A teammate asked the other day if there was a way to render a markdown document with Mermaid diagrams to a PDF without uploading it. The answers we found were either “install this Chrome extension that wants Drive access” or “use this CLI that needs Node and Pandoc”. For a 30-second action that should be one button on a page, that’s a lot of moving parts and a lot of trust handed out.

So we built it into TaskKit’s markdown tool. The pipeline is interesting enough — and the bugs we hit getting there were specific enough — that it’s worth writing up.

What the pipeline actually does

Markdown text
  → render to HTML (small custom renderer, not a library)
  → live preview DOM (mermaid replaces <pre><code class="language-mermaid"> with <svg>)
  → on Export PDF:
      → off-screen clone of the preview's children
      → rasterize every <svg> into a flat PNG <img>
      → element-aware pagination: insert spacers so blocks don't straddle page boundaries
      → html2canvas-pro captures the cloned stage as one tall canvas
      → pdf-lib slices the canvas into A4-sized PNGs and embeds each as a page
  → user downloads the PDF

Every stage runs in the browser. No upload, no network call during export, no fonts pulled from CDNs. The whole thing is pdf-lib + html2canvas-pro + mermaid, dynamically imported on first use so the editor stays light for users who never export.
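The "dynamically imported on first use" part is just a cached import promise. A minimal sketch of the memoizer (the helper name is mine, and TaskKit's actual wiring may differ):

```typescript
// Cache the in-flight promise so a second export, or two concurrent
// exports, reuse one download: the loader runs exactly once.
function loadOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => (cached ??= load());
}

// In the editor, roughly:
//   const loadExportLibs = loadOnce(() =>
//     Promise.all([import("pdf-lib"), import("html2canvas-pro"), import("mermaid")]));
```

Caching the promise rather than the resolved value means a click during an in-flight load attaches to the same download instead of starting a second one.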

That’s the happy path. The actual story is four bugs in a row.

Bug 1: dangerouslySetInnerHTML wipes Mermaid SVGs every render

The first version had a React preview that did this:

<div
  ref={previewRef}
  className="prose-md ..."
  dangerouslySetInnerHTML={{ __html: html }}
/>

Plus a useEffect that walked the DOM after each render, found <pre><code class="language-mermaid"> blocks, and replaced them with rendered SVG.

That worked for the live preview. It broke the export.

When the user clicked the Export button, setExporting(true) triggered a React re-render. dangerouslySetInnerHTML re-applied the original html string — which contained the raw <pre><code class="language-mermaid"> blocks. The preview’s DOM was reset to the pre-rendered state. The mermaid effect’s deps [html, previewNode, mermaidError] hadn’t changed, so the effect didn’t re-fire. By the time the export pipeline cloned the preview, the SVGs were gone and only the raw <pre> blocks remained.

The fix: stop using dangerouslySetInnerHTML. Manage the preview’s DOM imperatively from a useLayoutEffect:

const [previewNode, setPreviewNode] = useState<HTMLDivElement | null>(null);

useLayoutEffect(() => {
  if (!previewNode) return;
  previewNode.innerHTML = html;
  // …then run mermaid replacement on the fresh DOM
}, [html, previewNode]);

return <div ref={setPreviewNode} className="prose-md ..." />;

Now React only writes to the DOM when html actually changes. Re-renders triggered by other state (exporting, copied) leave the rendered SVGs alone.

This is also why the ref is a callback ref (setPreviewNode) and not a useRef: if the preview div does get remounted (switching between Edit / Split / Preview view modes), the callback fires with the new node and the effect re-runs against fresh DOM.

Bug 2: rasterised SVGs measure as 0 pixels until decoded

The export step replaces every <svg> in the cloned stage with an <img> of a rasterised PNG. Cleaner than letting html2canvas-pro deal with inline SVG quirks (<style> scoping, <foreignObject> rendering).

const xml = new XMLSerializer().serializeToString(svgClone);
const blob = new Blob([xml], { type: "image/svg+xml;charset=utf-8" });
const url = URL.createObjectURL(blob);
const img = await loadImage(url); // promise wrapper around Image onload
URL.revokeObjectURL(url);

// Rasterise at 2x so diagram text stays crisp in the PDF
const canvas = document.createElement("canvas");
canvas.width = w * 2;
canvas.height = h * 2;
const ctx = canvas.getContext("2d")!;
ctx.scale(2, 2);
ctx.drawImage(img, 0, 0, w, h);

const replacement = document.createElement("img");
replacement.src = canvas.toDataURL("image/png");
svg.replaceWith(replacement);

This compiled. The export ran. The PDF had Mermaid diagrams that split across page boundaries — a thin strip of the diagram on page 1, the rest on page 2.

The page-aware pagination step (which I’ll get to) was supposed to push diagrams that would straddle a boundary onto the next page. It wasn’t pushing them. Why?

Because the previous step ran getBoundingClientRect() on the new <img> immediately after replaceWith. The PNG data URL hadn’t decoded yet. Layout reported height: 0. The pagination step has an early-return for zero-height blocks, so it skipped these images. By the time html2canvas captured the canvas, the images had loaded and were full size.

Two fixes layered:

// Reserve the right box up front so layout has the correct height
// before paint, even without decode.
replacement.style.cssText = `display:block;max-width:100%;height:auto;aspect-ratio:${w}/${h};`;
replacement.src = canvas.toDataURL("image/png");

// Wait for the bitmap to actually decode so getBoundingClientRect
// returns the correct height when pagination measures it next.
try {
  await replacement.decode();
} catch {
  /* aspect-ratio reservation handles the fallback */
}

svg.replaceWith(replacement);

aspect-ratio is the better fix — it tells the browser the box’s intrinsic shape before the image decodes, so layout settles immediately. await replacement.decode() is belt-and-suspenders. With both, pagination measures real heights.
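The reservation works because width and aspect ratio together pin the box before any pixels exist. A sketch of the arithmetic layout settles on (pure math, function name is mine):

```typescript
// The box layout reserves for an undecoded image: width clamps via
// max-width, height follows from the declared aspect-ratio. No decode
// is needed for getBoundingClientRect to report these numbers.
function reservedBox(w: number, h: number, maxW: number) {
  const width = Math.min(w, maxW);
  return { width, height: Math.round(width * (h / Math.max(w, 1))) };
}
```

A 1600×800 diagram in an 800-px-wide stage reserves an 800×400 box, which is exactly the height the pagination pass needs to measure.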

Bug 3: pagination boundaries didn’t match slice boundaries

This was the big one. The PDF still had diagrams straddling pages, even after fixing the timing.

The export has two boundary calculations that must agree:

  1. Slicing — after html2canvas returns one tall canvas, we slice it into A4-sized pieces, one per PDF page. The slice height in canvas pixels comes from canvas.width / contentW (PDF points to canvas-pixel ratio).
  2. Pagination — before html2canvas runs, we walk the DOM and insert spacers so blocks don’t straddle the upcoming slice cuts.

I had:

// Slicing — uses full canvas width
const pxPerPoint = canvas.width / contentW;
const sliceHeightPx = Math.floor(contentH * pxPerPoint);

// Pagination — uses the inner content width (mistake)
const stageContentW = PRINT_WIDTH_PX - STAGE_PADDING * 2;
const sourcePagePx = Math.floor(stageContentW * (contentH / contentW));

PRINT_WIDTH_PX = 800 (the off-screen stage width). STAGE_PADDING = 32 (left/right padding I’d added so content wasn’t flush with the edge). stageContentW = 736. The two formulas computed:

  • Slice cuts: every 1177 source pixels of stage height
  • Pagination spacers: every 1083 source pixels of stage height

So spacers were pushing content to the wrong place, ~94 pixels short of every actual slice cut. Diagrams that should have started cleanly at the top of page N started 94 pixels above and got truncated, leaving a thin strip on page N-1.

Two fixes — one mathematical, one structural:

// Match: derive page-source-pixels from full stage width, not inner
const sourcePagePx = Math.floor(PRINT_WIDTH_PX * (contentH / contentW));

…and dropped the stage padding entirely. The 32px top padding was offsetting main’s top from the canvas top by 32 source pixels, so even with the math fix, pagination (measured from main) and slicing (measured from canvas) couldn’t align. The PDF’s own MARGIN points already give the visible breathing room around content; padding the stage just introduced a second coordinate system to keep in sync.

After both: spacers land exactly where slices cut. Blocks land cleanly on page tops.
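Once the two formulas agree, the spacer pass itself is simple accounting. The real pass measures getBoundingClientRect and inserts spacer divs, but the arithmetic reduces to a pure function, sketched here with names of my own:

```typescript
// Block positions in stage source pixels, measured before any spacers.
interface Block { top: number; bottom: number; }

// For each block, the spacer height to insert above it so it doesn't
// straddle a page cut. Earlier spacers shift everything below them,
// so the running shift is applied as we walk down the document.
function spacerHeights(blocks: Block[], pagePx: number): number[] {
  let shift = 0;
  return blocks.map(({ top, bottom }) => {
    const t = top + shift;
    const b = bottom + shift;
    if (b <= t) return 0; // zero-height block: skip (Bug 2's early return)
    const startPage = Math.floor(t / pagePx);
    const endPage = Math.floor((b - 1) / pagePx);
    let spacer = 0;
    // Crosses a cut and fits on a single page: push it to the next page top.
    if (endPage > startPage && b - t <= pagePx) {
      spacer = (startPage + 1) * pagePx - t;
    }
    shift += spacer;
    return spacer;
  });
}
```

With pagePx = 1177 (the corrected figure), a block spanning 1100 to 1400 gets a 77-pixel spacer and starts page 2 flush with the cut; a block that fits entirely inside a page gets none.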

Bug 4: long inline code wraps as two pills

The renderer’s CSS gave <code> a background pill, padding, and a thin border:

.prose-md code {
  background: var(--soft);
  border: 1px solid var(--line);
  border-radius: 4px;
  padding: 0.05em 0.35em;
}

A line containing a long inline span like `profile.locationId === jwt.locationId` would wrap mid-content because the text before it had pushed the inline near the right edge. Each wrapped segment got its own padding pill, so a single logical <code> element appeared as two disconnected, fully-decorated pills with whitespace between them. Looked broken even though the HTML was right.

Two CSS rules fix it:

.prose-md code {
  white-space: nowrap;             /* keep inline code on one line */
  -webkit-box-decoration-break: clone;
  box-decoration-break: clone;     /* if it does wrap, decorate each segment */
}
.prose-md pre code { white-space: pre; }   /* preserve fenced-block layout */

white-space: nowrap keeps the span on one line as long as possible. Within an 800-px-wide print stage that’s almost always achievable. box-decoration-break: clone is the fallback: if a code span really has to wrap, each wrapped segment renders with its own complete pill (full padding, full background) — visually consistent, not broken-looking.

What’s still imperfect

Two limitations worth naming:

Mermaid font and external-asset handling. Mermaid’s securityLevel: "strict" mode (which we use) blocks inline JavaScript and <foreignObject> HTML rendering. It does not automatically inline external fonts or icon assets. Newer diagram types — architecture-beta especially, plus some sankey configurations — reference icon packs by URL. Letting the browser fetch those during the off-DOM SVG render would leak the user’s IP to icon CDNs during an export that’s supposed to be local-only. Wrong trade for this site.

The mitigation: a sanitiser pass on every SVG before serialisation that strips @font-face rules with external URLs, drops <image href="https://…"> and <use href="https://…"> elements, and forces a web-safe font stack onto <text> nodes:

function sanitizeSvgForOfflineRender(svg: SVGSVGElement): void {
  for (const style of svg.querySelectorAll("style")) {
    if ((style.textContent ?? "").includes("@font-face")) {
      style.textContent = (style.textContent ?? "").replace(/@font-face\s*\{[^}]*\}/g, "");
    }
  }
  for (const img of svg.querySelectorAll("image")) {
    const href = img.getAttribute("href") ?? img.getAttribute("xlink:href") ?? "";
    if (/^https?:\/\//i.test(href)) img.remove();
  }
  // …same for <use>, plus inline-style fallback fonts on <text>
}

The result: icon-heavy diagrams lose their icons but keep their geometry and layout. Fully supported diagram types (where everything renders correctly) are flowchart, sequence, class, state, ER, journey, gitGraph, timeline, mindmap, gantt, and pie. Architecture and the icon-pack variants of sankey degrade gracefully.

Wide diagrams. A landscape gantt or large flowchart at 2000+ pixels wide gets scaled to fit a 736-pixel A4 content column. The geometry survives; the text labels often become unreadable. The cleaner fix is per-page orientation — push wide diagrams onto their own A4-landscape page, keep the rest portrait. pdf-lib supports landscape pages (just call addPage([PAGE_H, PAGE_W])), but routing the right diagrams to the right pages adds a non-trivial layer to the pagination logic, and we wanted to ship something that’s correct for the common case first.

For now: landscape diagrams up to ~1.5× aspect ratio render legibly; wider than that, you’ll want to split them in the source markdown.
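If we do build the orientation layer, the page-size decision itself is cheap: pdf-lib's addPage takes a plain [width, height] tuple. A hypothetical routing rule, using the ~1.5× figure above as the threshold (constants and names are mine):

```typescript
// A4 in PDF points.
const PAGE_W = 595.28;
const PAGE_H = 841.89;

// Blocks wider than ~1.5x their height get their own landscape page
// (dimensions swapped); everything else flows on portrait pages.
function pageSize(blockW: number, blockH: number): [number, number] {
  const wide = blockW / Math.max(blockH, 1) > 1.5;
  return wide ? [PAGE_H, PAGE_W] : [PAGE_W, PAGE_H]; // [width, height]
}
```

The hard part is not this function; it's teaching the pagination pass that pages now come in two heights, which is why it hasn't shipped yet.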

Why browser-only

The reason this is a TaskKit tool and not, say, a Pandoc wrapper, is the trust posture. When you click Export PDF, your markdown — including any internal-architecture notes, customer data, or in-progress writing — never leaves the tab. The work happens in pdf-lib and html2canvas-pro, which are pure JavaScript and don’t initiate network calls during rendering. Mermaid renders diagrams entirely in-memory from text input. The font is whatever your OS already has loaded for the page.

If you’ve ever pasted a sensitive document into an online “Markdown to PDF” converter, you’ve already done the threat-model math. Doing it locally just removes the question.

The full markdown editor with PDF export is at /dev/markdown.

  • JSON Formatter — same pattern: heavy editor that loads its work-libs dynamically, pure-browser pipeline
  • PDF Merge, PDF Split — pdf-lib usage in a different shape
  • Privacy — the full version of “what does and doesn’t leave your device”