Exploring atob(null)

atob(null) is a valid JavaScript expression returning a valid JavaScript string, '\x9Eée'.

What's happening here is, I think, interesting. First things first: thanks to type coercion, atob(null) is atob("null").
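The coercion is easy to check directly. A quick sketch, assuming an environment with a global atob (a browser, or Node.js 16 and later); the variable name is only for illustration:

```javascript
// atob converts its argument to a string before decoding,
// so null becomes the four-character string "null".
const coercedNull = String(null);

console.log(coercedNull);                  // "null"
console.log(coercedNull.length);           // 4
console.log(atob(null) === atob('null'));  // true
```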

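Since JS strings are UTF-16, the 4 characters of "null" really take 8 bytes in memory, but read as base64 they decode to just 3 bytes, which atob returns as a string with one character per byte: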

> [...atob(null)].map((i)=>i.charCodeAt(0))
[158, 233, 101]
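Where those three numbers come from can be worked out by hand: each base64 character stands for 6 bits, so four characters give 24 bits, or 3 bytes. Below is a minimal sketch of that arithmetic; ALPHABET and decodeQuad are names made up for this example, and it only handles a single unpadded group of four characters, unlike the real atob.

```javascript
// Standard base64 alphabet: a character's index in this string is its 6-bit value.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode exactly four base64 characters (no padding) into three bytes.
function decodeQuad(quad) {
  const [a, b, c, d] = [...quad].map((ch) => ALPHABET.indexOf(ch));
  const bits = (a << 18) | (b << 12) | (c << 6) | d; // 4 x 6 = 24 bits
  return [(bits >> 16) & 0xff, (bits >> 8) & 0xff, bits & 0xff];
}

console.log([...'null'].map((ch) => ALPHABET.indexOf(ch))); // [39, 46, 37, 37]
console.log(decodeQuad('null'));                             // [158, 233, 101]
```

In binary, 100111 101110 100101 100101 regroups into 10011110 11101001 01100101, which is 0x9E, 0xE9, 0x65.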

\x9E appears as an escape sequence because U+009E is a non-printable C1 control character, Privacy Message.

Everything works out great for Western European languages: atob maps each decoded byte to the code point of the same value, and Unicode code points 0x80-0xFF are the Latin-1 Supplement block.

What of encoding characters outside the first 2 blocks (Basic Latin and Latin-1 Supplement) then, given btoa is the inverse of atob?

> btoa('π')
Uncaught InvalidCharacterError: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.

I found this surprising. My intuition would have been that btoa works with arbitrary strings.

If and when armor64 gets a web implementation, I hope it offers UTF-8 encoding for JS strings, and only exposes the equivalent of btoa taking a Uint8Array and the equivalent of atob returning a Uint8Array.
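For plain base64, that shape of API can be approximated today on top of btoa and atob. The following is only a sketch of the idea, not armor64 and not an existing API: bytesToBase64, base64ToBytes, utf8ToBase64 and base64ToUtf8 are hypothetical names, and it assumes global btoa, atob, TextEncoder and TextDecoder (browsers, or Node.js 16 and later).

```javascript
// Hypothetical helpers: base64 works on bytes, and strings go through UTF-8 first.

// Bytes -> base64: turn each byte into one code unit in 0x00-0xFF, then call btoa.
function bytesToBase64(bytes) {
  let binary = '';
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Base64 -> bytes: atob yields one character per byte, so read the code units back.
function base64ToBytes(b64) {
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}

// String <-> base64 via an explicit UTF-8 step.
const utf8ToBase64 = (text) => bytesToBase64(new TextEncoder().encode(text));
const base64ToUtf8 = (b64) => new TextDecoder().decode(base64ToBytes(b64));

console.log(utf8ToBase64('π'));               // "z4A="
console.log(base64ToUtf8(utf8ToBase64('π'))); // "π"
console.log(base64ToBytes('null'));           // bytes 158, 233, 101
```

Building the intermediate string one character at a time is fine for short inputs; a large buffer would want a chunked approach.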
