The number of items in `hexvals` is 128 while the table size is 256, so
the remaining 128 items are filled with zero. As a result, values in
\xf0-\xff will be treated as zero when they should be rejected.
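A minimal sketch of the idea, not Jiffy's actual code: in C, a partial
initializer zero-fills the rest of the array, and zero is also the valid
value of the digit '0'. Filling every slot with a sentinel first avoids
that. The names `HEX_INVALID` and `init_hexvals` are hypothetical, and
the sentinel value 255 is an assumption.

```c
/* Hypothetical sentinel meaning "not a hex digit" (an assumption). */
#define HEX_INVALID 255

/* One entry per possible byte value, so no index is left implicit. */
static unsigned char hexvals[256];

/* Mark every byte invalid first, then fill in the 22 valid hex digits. */
static void init_hexvals(void)
{
    int i;

    for (i = 0; i < 256; i++) {
        hexvals[i] = HEX_INVALID;
    }
    for (i = 0; i < 10; i++) {
        hexvals['0' + i] = (unsigned char) i;
    }
    for (i = 0; i < 6; i++) {
        hexvals['a' + i] = (unsigned char) (10 + i);
        hexvals['A' + i] = (unsigned char) (10 + i);
    }
}
```

After `init_hexvals()` runs, a lookup such as `hexvals[0xff]` yields the
sentinel rather than a zero that could be mistaken for the digit '0'.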
This sounds rather insane to me, but I've managed to show that `(char)
-1` is converted to 255 on some platforms. This was reproduced on
ppc64el via QEMU on OS X. A simple program that does `fprintf(stderr,
"%d\r\n", (char) -1);` prints 255 to the console. Rather than relying on
the signedness of a char, I've just updated things to use an unsigned
char (which hopefully is never signed) and replaced -1 with 255 for the
sentinel value when converting hex values.
Thanks to Balint Reczey (@rbalint) for the report.
Fixes #74
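For illustration only, a small sketch of why the sentinel behaved
differently across platforms. The C standard leaves the signedness of
plain `char` up to the implementation, so `(char) -1` widens to -1 on
some targets and to 255 on others, while an `unsigned char` holding 255
widens to 255 everywhere. The function names here are hypothetical.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this compiler/target. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* (char) -1 widened to int: -1 if char is signed, UCHAR_MAX if not. */
static int widened_minus_one(void)
{
    char c = (char) -1;
    return (int) c;
}

/* An unsigned char sentinel widens to the same int on every platform. */
static int widened_unsigned_sentinel(void)
{
    unsigned char c = 255;
    return (int) c;
}
```

Comparing a table entry against `widened_unsigned_sentinel()` style
values does not depend on the platform, whereas a comparison against -1
silently fails wherever plain `char` is unsigned.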
I had a silly direction mistake in a bit shift that was causing the high
portion of all combining characters to be printed as \uD800, which is
obviously wrong. This bug only affects people using the non-default
uescape option during encoding.
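A hedged sketch of the standard UTF-16 surrogate split, to show which
direction the shift has to go. This is not the code from Jiffy; the
function names are hypothetical. The high surrogate takes the top ten
bits of the offset code point, which requires a right shift by 10.

```c
#include <stdio.h>

/*
 * Split a code point in 0x10000..0x10FFFF into a UTF-16 surrogate pair.
 * Returns 0 on success, -1 if cp is outside that range.
 */
static int to_surrogates(unsigned int cp, unsigned int *hi, unsigned int *lo)
{
    if (cp < 0x10000 || cp > 0x10FFFF) {
        return -1;
    }
    cp -= 0x10000;
    *hi = 0xD800 | (cp >> 10);   /* right shift: top ten bits */
    *lo = 0xDC00 | (cp & 0x3FF); /* mask: bottom ten bits */
    return 0;
}

/*
 * Write the \uXXXX\uXXXX form of cp into buf.
 * Returns the snprintf result, or -1 if cp needs no surrogate pair.
 */
static int format_uescape(unsigned int cp, char *buf, size_t len)
{
    unsigned int hi, lo;

    if (to_surrogates(cp, &hi, &lo) != 0) {
        return -1;
    }
    return snprintf(buf, len, "\\u%04X\\u%04X", hi, lo);
}
```

With this split, U+1F600 becomes the pair 0xD83D, 0xDE00. Only code
points below U+10400 legitimately have 0xD800 as their high surrogate.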
It was possible to pass some types of invalid UTF-8 through Jiffy's
encoder. Specifically, if uescaping isn't used, values that would decode
to 0xD800 through 0xDFFF, to 0xFFFE or 0xFFFF, or to values greater than
0x10FFFF would not be flagged as invalid. Now they are.
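The check described above can be sketched roughly as follows. This is an
illustration under assumptions, not the encoder's real code: the names
`is_valid_codepoint` and `decode3` are hypothetical, and `decode3` only
handles the three-byte UTF-8 form without validating the bytes.

```c
/* Returns 1 if cp is acceptable under the rules listed above, else 0. */
static int is_valid_codepoint(unsigned int cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0; /* UTF-16 surrogate range */
    }
    if (cp == 0xFFFE || cp == 0xFFFF) {
        return 0; /* noncharacters */
    }
    if (cp > 0x10FFFF) {
        return 0; /* beyond the Unicode range */
    }
    return 1;
}

/* Decode a three-byte UTF-8 sequence; continuation bytes are not checked. */
static unsigned int decode3(unsigned char b0, unsigned char b1,
                            unsigned char b2)
{
    return ((unsigned int) (b0 & 0x0F) << 12)
         | ((unsigned int) (b1 & 0x3F) << 6)
         | (unsigned int) (b2 & 0x3F);
}
```

For example, the bytes 0xED 0xA0 0x80 are well-formed at the byte level
but decode to 0xD800, so a check on the decoded value is what catches
them.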
The encoder can now return \u escaped Unicode data instead of leaving
it as UTF-8 byte sequences. This is done like so:
Eshell V5.8.3 (abort with ^G)
1> jiffy:encode(<<240, 144, 129, 128>>, [uescape]).
<<"\"\\uD800\\uDC40\"">>