Why `btoa("café")` Throws (And How to Base64-Encode Unicode the Right Way)

Published on May 17, 2026 by The Kestrel Tools Team • 8 min read

You’re wiring up a quick share link. You grab a string, hand it to btoa, and call it done. Then a coworker pastes a customer name with an accent into the test, and the page lights up red:

Uncaught DOMException: Failed to execute 'btoa' on 'Window':
The string to be encoded contains characters outside of the Latin1 range.

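The string was café, pasted with its accent in decomposed form (a plain e followed by the combining acute accent U+0301), which is how text from some systems arrives. Four letters, one accent, and a function that has lived in browsers since 1995 cannot encode it. Had the é been the precomposed U+00E9, btoa would not have thrown at all; it would have quietly emitted Latin-1 bytes that a UTF-8 backend cannot read, which is arguably worse. Welcome to btoa and Unicode in JavaScript: a 30-year-old API that assumes the world is Latin-1, and the half-dozen Stack Overflow answers older than your tech stack that still recommend a deprecated workaround.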

This post is a walkthrough of why btoa throws on Unicode, the one-line modern fix using TextEncoder, a roundtrip example with an emoji, the URL-safe Base64 variant you’ll need next, and the legacy unescape(encodeURIComponent(...)) pattern you should stop copy-pasting in 2026.

Why does `btoa` throw on Unicode in JavaScript?

Because btoa is defined to operate on a byte string, not a JavaScript string. It interprets each UTF-16 code unit as a single byte, which only works for values 0–255 (the Latin-1 range). A precomposed é (U+00E9) still squeaks in, but anything above U+00FF is outside that range: a combining acute accent (U+0301), 🎉 (U+1F389), 中 (U+4E2D), even a smart quote ’ (U+2019). For those, the spec says to throw InvalidCharacterError.

The root cause is historical. btoa and atob come from the original Netscape window object, which predates JavaScript’s UTF-16 string model. The HTML spec keeps the original semantics for compatibility: btoa takes a string where each character is already a byte. It is not a Unicode-aware function and was never meant to be one.

You can reproduce the whole error in five seconds:

btoa('hello');     // 'aGVsbG8=' — fine
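btoa('café');      // 'Y2Fm6Q==' when é is precomposed (U+00E9): Latin-1 bytes, not UTF-8
btoa('cafe\u0301'); // throws InvalidCharacterError (e + combining acute U+0301)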
btoa('🎉');        // throws InvalidCharacterError
btoa('snowman ☃'); // throws InvalidCharacterError

And here is the precise rule from the HTML spec: if any code unit in the input is greater than U+00FF, throw. There is no graceful fallback. There is no “try harder.” The function does exactly what it has always done; the input just isn’t a byte string.
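
A quick way to see that rule in action is to scan a string for the first code unit btoa would reject. The helper below is a hypothetical sketch (firstNonLatin1Index is an illustrative name, not a built-in) and uses only standard string methods:

```javascript
// Hypothetical helper: returns the index of the first UTF-16 code unit
// above 0xFF (the one btoa would reject), or -1 if every unit fits in a byte.
function firstNonLatin1Index(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) return i;
  }
  return -1;
}

console.log(firstNonLatin1Index('hello'));      // -1
console.log(firstNonLatin1Index('caf\u00e9'));  // -1 (precomposed é fits in one byte)
console.log(firstNonLatin1Index('cafe\u0301')); // 4  (the combining acute accent)
console.log(firstNonLatin1Index('\u{1F389}'));  // 0  (first surrogate of 🎉)
```

A result of -1 only means btoa will not throw. It does not mean the output is UTF-8, so the check is a diagnostic, not a fix.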

The modern fix: encode to bytes first, then Base64

The correct mental model is: btoa is a bytes-to-Base64 function, not a string-to-Base64 function. If you have a JavaScript string, your job is to turn it into bytes first. The right tool for that has been TextEncoder since 2017, and it’s been Baseline (every browser, every Node version since 11) for years.

Here is the complete, modern pattern:

function encodeBase64Unicode(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

encodeBase64Unicode('café');     // 'Y2Fmw6k='
encodeBase64Unicode('🎉');       // '8J+OiQ=='
encodeBase64Unicode('snowman ☃'); // 'c25vd21hbiDimIM='

The pipeline is: JavaScript string → UTF-8 bytes (via TextEncoder) → byte string (each byte stuffed into a Latin-1 character) → Base64 (via btoa). Every step is well-defined and lossless. The output is the same Base64 you’d get from python3 -c 'import base64; print(base64.b64encode("café".encode()))' in a terminal — interoperable with any backend that expects UTF-8 Base64.
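
To make the first step of that pipeline visible, it can help to print the UTF-8 bytes before they reach btoa. This is a small sketch; utf8Hex is a hypothetical helper name introduced only for illustration:

```javascript
// Hypothetical helper: show the UTF-8 bytes TextEncoder produces
// as space-separated two-digit hex.
function utf8Hex(str) {
  return Array.from(new TextEncoder().encode(str), (b) =>
    b.toString(16).padStart(2, '0')
  ).join(' ');
}

console.log(utf8Hex('caf\u00e9'));  // '63 61 66 c3 a9'  (é becomes two bytes)
console.log(utf8Hex('\u2603'));     // 'e2 98 83'        (☃ becomes three bytes)
console.log(utf8Hex('\u{1F389}'));  // 'f0 9f 8e 89'     (🎉 becomes four bytes)
```

Every value printed is at most ff, which is exactly the property btoa needs from its input.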

The inverse is the same in reverse. atob returns a byte string (each character a byte), and TextDecoder turns that back into a UTF-8 string:

function decodeBase64Unicode(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (char) => char.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

decodeBase64Unicode('Y2Fmw6k=');         // 'café'
decodeBase64Unicode('8J+OiQ==');         // '🎉'
decodeBase64Unicode('c25vd21hbiDimIM='); // 'snowman ☃'

Use this pair for every new piece of code. There is no scenario in 2026 where you should be reaching past TextEncoder/TextDecoder for this.

A roundtrip test you can paste into DevTools right now

The single sanity check that shows the whole bug and the whole fix in one go:

const input = 'café 🎉 中文';

// The broken way:
try {
  btoa(input);
} catch (err) {
  console.log('btoa direct:', err.name); // 'InvalidCharacterError'
}

// The right way:
const encoded = encodeBase64Unicode(input);
const decoded = decodeBase64Unicode(encoded);

console.log(encoded);          // 'Y2Fmw6kg8J+OiSDkuK3mloc='
console.log(decoded === input); // true

If decoded === input ever returns false for a well-formed string, you have a real bug: the encoder and the decoder disagree on the byte representation (CESU-8, UTF-16, or Latin-1 bytes from passing the string straight to btoa are the usual suspects). The one expected exception is a string containing lone surrogates, which TextEncoder replaces with U+FFFD. For everything else, that comparison should always be true, regardless of how exotic the input is.

Stop using `unescape(encodeURIComponent(...))`

If you Googled this problem before 2022, you saw this answer:

// Don't do this in 2026:
btoa(unescape(encodeURIComponent('café'))); // 'Y2Fmw6k='

It works. It will keep working in browsers for a long time, because the web doesn’t break things. But every part of it is the wrong shape for modern code:

unescape was moved to Annex B in ECMAScript 3 (1999) and still lives in B.2.1, the “compatibility” section that exists strictly so old web pages don’t break. New code shouldn’t reach into Annex B by choice.
The pipeline percent-encodes the string into ASCII, then un-percent-encodes it back into bytes, which is two extra string conversions for no semantic gain over TextEncoder.
It behaves differently on lone surrogate code units (a real concern when you’re encoding user-pasted strings that may have come from a buggy source). TextEncoder replaces each lone surrogate with U+FFFD and keeps going, which is the behavior the Encoding standard defines; the unescape/encodeURIComponent chain throws a URIError on the same input, so one malformed character aborts the whole encode.
Linters, type-checkers, and IDEs flag unescape because it’s deprecated. New developers will read it, search what it does, and learn that they should be using something else. That’s pure friction.
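
The lone-surrogate difference is easy to observe directly. The snippet below is a minimal sketch using only built-ins; the sample string is an invented example:

```javascript
// A high surrogate with no low surrogate after it: not well-formed UTF-16.
const lone = 'a\uD83Cb';

// TextEncoder substitutes U+FFFD, whose UTF-8 encoding is EF BF BD.
const loneBytes = Array.from(new TextEncoder().encode(lone));
console.log(loneBytes); // [97, 239, 191, 189, 98]

// encodeURIComponent refuses the same input outright.
let legacyError = null;
try {
  encodeURIComponent(lone);
} catch (err) {
  legacyError = err.name;
}
console.log(legacyError); // 'URIError'
```

Which behavior you want depends on the application, but the replacement path never fails halfway through a user's input.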

The only reason to keep this pattern around is if you’re maintaining code that has to run on a browser without TextEncoder — and the last such browser (IE 11) lost mainstream support years ago. If you ship a modern site in 2026, prefer the TextEncoder pattern in every new piece of code, and migrate older code when you touch it.

URL-safe Base64: the variant you’ll need next

Standard Base64 uses the alphabet A-Z a-z 0-9 + / plus = for padding. Three of those characters — +, /, and = — have special meanings in URLs, so a Base64 string dropped into a query parameter will sometimes need percent-encoding and sometimes mysteriously break depending on which framework decodes it.

URL-safe Base64 (defined in RFC 4648 §5) substitutes - for + and _ for /, and typically drops the = padding. It’s what JWTs use for their header, payload, and signature segments, and what most URL-embedded tokens expect. Once you’ve fixed the Unicode encoding, converting to URL-safe is a one-liner:

function encodeBase64UrlUnicode(str) {
  return encodeBase64Unicode(str)
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

function decodeBase64UrlUnicode(b64url) {
  const padded = b64url.replace(/-/g, '+').replace(/_/g, '/') +
    '==='.slice((b64url.length + 3) % 4);
  return decodeBase64Unicode(padded);
}

encodeBase64UrlUnicode('café 🎉'); // 'Y2Fmw6kg8J-OiQ'
decodeBase64UrlUnicode('Y2Fmw6kg8J-OiQ'); // 'café 🎉'

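The re-padding logic looks ugly but is correct: padded Base64 is always a multiple of 4 characters, so you pad back up to that boundary with = before decoding. Spec-compliant atob implementations use a forgiving decoder that also accepts unpadded input, so the re-padding is defensive there, but it matters as soon as the same string reaches a stricter decoder on another platform.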
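
The padding arithmetic is easier to trust once you see it applied to each valid length remainder. The sketch below isolates that one expression in a hypothetical helper named repad:

```javascript
// Hypothetical helper: the same re-padding expression, on its own.
// Valid unpadded Base64 has length % 4 equal to 0, 2, or 3 (never 1).
function repad(b64url) {
  return b64url + '==='.slice((b64url.length + 3) % 4);
}

console.log(repad('YQ'));   // 'YQ=='  (length % 4 === 2, two pad characters)
console.log(repad('YWI'));  // 'YWI='  (length % 4 === 3, one pad character)
console.log(repad('YWJj')); // 'YWJj'  (length % 4 === 0, nothing added)
```

A remainder of 1 cannot come from a real encoder, so input of that length is best treated as corrupt rather than padded.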

If you’re working with JWTs, OAuth state parameters, or any URL-friendly token format, this is the variant you want. If you’re just encoding for a data: URI or an HTTP header, standard Base64 is fine.

What about `Buffer` in Node and `Uint8Array.toBase64()`?

A quick tour of the non-browser corners:

Node.js: Use Buffer.from(str, 'utf8').toString('base64'). It’s correct, it’s been there since Node 0.x, and it does the UTF-8 conversion and Base64 in one step. There’s no btoa quirk to work around.
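
As a rough sketch of the Node-only route, assuming a reasonably recent Node release (the 'base64url' encoding name is not available in very old versions):

```javascript
// Node.js only: Buffer is not a browser global.
const nodeEncoded = Buffer.from('caf\u00e9 \u{1F389}', 'utf8').toString('base64');
console.log(nodeEncoded); // 'Y2Fmw6kg8J+OiQ=='

// Decoding reverses both steps in one call chain.
const nodeDecoded = Buffer.from(nodeEncoded, 'base64').toString('utf8');
console.log(nodeDecoded === 'caf\u00e9 \u{1F389}'); // true

// URL-safe alphabet, no padding.
const nodeUrlSafe = Buffer.from('caf\u00e9 \u{1F389}', 'utf8').toString('base64url');
console.log(nodeUrlSafe); // 'Y2Fmw6kg8J-OiQ'
```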

Uint8Array.prototype.toBase64() (TC39 Stage 3, 2024): A new built-in method that takes a byte array and returns Base64 directly, with options for URL-safe alphabet and padding. It has shipped in recent releases of the major browser engines, but older browsers and older Node LTS lines do not have it. When it lands everywhere, the pattern simplifies to:

// Future (2026+): when Uint8Array.toBase64 is universally available
new TextEncoder().encode('café').toBase64();          // 'Y2Fmw6k='
new TextEncoder().encode('🎉').toBase64({ alphabet: 'base64url', omitPadding: true }); // '8J-OiQ'

Until then, the TextEncoder + btoa pattern above is the cross-platform answer. You can feature-detect Uint8Array.prototype.toBase64 and progressively enhance, but the polyfill path is what should ship.
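
One way to do that feature detection, sketched under the assumption that both paths should return standard padded Base64 (utf8ToBase64 is a hypothetical wrapper name):

```javascript
// Hypothetical wrapper: prefer the native method when the runtime has it,
// otherwise fall back to the byte-string + btoa path.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  if (typeof bytes.toBase64 === 'function') {
    return bytes.toBase64(); // defaults: standard alphabet, with padding
  }
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

console.log(utf8ToBase64('caf\u00e9')); // 'Y2Fmw6k='
```

Callers get the same string from either branch, so the native path can take over silently as runtimes catch up.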

Common questions, answered concisely

Why does btoa('é') work but btoa('🎉') throw? é is U+00E9, which is exactly within the Latin-1 range (0–255). 🎉 is U+1F389, well above 255, so it throws. The cutoff is exactly at U+00FF.

Is the output the same as base64 on the command line? Yes, as long as your input is UTF-8 (which TextEncoder always produces). printf 'café' | base64 and encodeBase64Unicode('café') both return Y2Fmw6k=.

What about emoji with skin tone modifiers or ZWJ sequences? They work. UTF-8 encodes them as a run of multi-byte sequences: typically 4 bytes for each base emoji, 4 bytes for each skin-tone modifier, and 3 bytes for each zero-width joiner (U+200D). TextEncoder produces the right bytes, Base64 round-trips them, and TextDecoder returns the same code points in the same order, so the grapheme cluster renders exactly as before. No special handling needed.
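
A short check of those byte counts, written with escape sequences so the code points are explicit (the two sample emoji are arbitrary choices):

```javascript
const thumbsMedium = '\u{1F44D}\u{1F3FD}';        // thumbs up + medium skin tone
const womanCoder = '\u{1F469}\u200D\u{1F4BB}';    // woman + ZWJ + laptop

const enc = new TextEncoder();
const dec = new TextDecoder();

console.log(enc.encode(thumbsMedium).length); // 8  (4 + 4)
console.log(enc.encode(womanCoder).length);   // 11 (4 + 3 + 4)

// Decoding the bytes gives back the identical string.
console.log(dec.decode(enc.encode(womanCoder)) === womanCoder); // true
```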

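Can I use this for binary data like images? No. For actual binary data (a File, an ArrayBuffer, a Blob), skip the string step entirely. Use FileReader.readAsDataURL, or convert the bytes directly with btoa(String.fromCharCode(...new Uint8Array(buffer))). The spread form is fine for small buffers, but it passes every byte as a separate function argument and can throw a RangeError on large ones, so build the byte string in chunks for anything image-sized. TextEncoder is for strings.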
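
For larger binary payloads, one common approach is to build the byte string in fixed-size chunks so no single call receives too many arguments. This is a sketch; bytesToBase64 and the 0x8000 chunk size are illustrative choices, not values from any spec:

```javascript
// Hypothetical helper: Base64-encode raw bytes without spreading the
// whole array into a single String.fromCharCode call.
function bytesToBase64(bytes, chunkSize = 0x8000) {
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray creates a view, not a copy.
    const chunk = bytes.subarray(i, i + chunkSize);
    binary += String.fromCharCode.apply(null, chunk);
  }
  return btoa(binary);
}

console.log(bytesToBase64(new Uint8Array([0xff, 0x00, 0x80]))); // '/wCA'

// 200,000 zero bytes: 4 output characters for every 3 input bytes, rounded up.
console.log(bytesToBase64(new Uint8Array(200000)).length); // 266668
```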

Does this affect atob the same way? Yes, in reverse. atob returns a byte string. If you have multi-byte UTF-8 in there, calling decodeURIComponent(escape(atob(b64))) is the legacy fix; new TextDecoder().decode(Uint8Array.from(atob(b64), c => c.charCodeAt(0))) is the modern one.
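
To see what the reverse problem looks like, compare the raw atob output with the decoded result. A minimal sketch:

```javascript
const b64 = 'Y2Fmw6k='; // Base64 of the UTF-8 bytes for 'café'

// atob alone yields one character per byte, so the two-byte é shows up
// as two Latin-1 characters.
const raw = atob(b64);
console.log(raw);        // 'cafÃ©'
console.log(raw.length); // 5

// Reinterpreting those bytes as UTF-8 recovers the original text.
const text = new TextDecoder().decode(
  Uint8Array.from(raw, (c) => c.charCodeAt(0))
);
console.log(text); // 'café'
```

If decoded text in your app contains sequences like Ã©, a missing UTF-8 decode step is a likely cause.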

Verifying it in your own code

The single fastest sanity check: drop a string with at least one accent, one emoji, and one CJK character into a client-side Base64 Encoder and confirm three things — the encode succeeds, the decode round-trips to the original string, and the output matches what your backend produces from the same input.

If any of those fail, the encoder is most likely passing the string straight to btoa or encoding something other than UTF-8. A correct encoder uses TextEncoder under the hood, handles every Unicode code point, and produces the same bytes a Python or Go server would. Once you’ve seen one work, the bug pattern is unmistakable everywhere else.

The takeaway

btoa throws on Unicode because it’s a 1995 API that was specified for byte strings, not JavaScript’s UTF-16 strings. The fix isn’t to wrestle with btoa — it’s to convert your string to UTF-8 bytes first with TextEncoder, then feed those bytes to btoa.

Three patterns to remember:

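Encode: btoa(String.fromCharCode(...new TextEncoder().encode(str))) (fine for short strings; build the byte string in a loop or in chunks for large inputs, where the spread can exceed the engine’s argument limit)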
Decode: new TextDecoder().decode(Uint8Array.from(atob(b64), c => c.charCodeAt(0)))
URL-safe: swap + and / for - and _, drop the = padding, re-pad before atob.

Stop using unescape(encodeURIComponent(...)) in new code: it’s deprecated, throws on lone surrogates, and adds two unnecessary string conversions. When Uint8Array.prototype.toBase64() reaches Baseline across your target environments, simplify further.

If you want to sanity-check a string round-trip without writing any code, paste it into Kestrel Tools’ Base64 Encoder — it runs entirely client-side, handles Unicode correctly, and matches the bytes a UTF-8 backend would produce.
