number of items on `hexvals` is 128 while table size is 256, so
remaining 128 items are filled with zero. As a result, values in
\xf0-\xff will be treated as zero while should be rejected.
Previously if a key was malformed UTF-8 and the user specified the
`force_utf8` option we would fail to try and encode a fixed up version
of the object. This was due to missing a clause to catch the
`invalid_object_member_key` exception. This adds the clause and a couple
tests to ensure it works.
A single quote input was causing segfaults due to sneaking past the
string termination logic. This patch corrects that lapse in conditional
by only parsing strings where a closing quote was found. All other
strings are rejected as invalid.
Big thanks to Jean-Charles Campagne (@jccampagne) for reporting the
issue.
After a few requests and some reflection I've decided to stop escaping
forward slashes in strings while still accepting the escaped version
through the decoder. This appears to mimic the behavior of other popular
JSON libraries I've checked (Ruby and Python).
By default Jiffy is quite strict in what it encodes. By default it will
not allow invalid UTF-8 to be produced. This can cause issues when
attempting to encode JSON that was decoded by other libraries as UTF-8
semantics are not uniformly enforced.
This patch adds an option 'force_utf8' to the encoder. If encoding hits
an error for an invalid string it will forcefully mutate the object to
contain only valid UTF-8 and return the resulting encoded JSON.
For the most part this means it will strip any garbage data from
binaries replacing it replacement codepoint U+FFFD. Although, it will
also try and the common error of encoding surrogate pairs as three-byte
sequences and reencode them into UTF-8 properly.
I had a silly direction mistake in a bit shift that was causing the high
portion of all combining characters to be printed as \uD800 which is
obviously wrong. This bug only affects people using the non-default
uescape option during encoding.
It was possible to pass some types of invalid UTF-8 through Jiffy's
encoder. Specifically, if uescaping isn't used, values that would decode
from 0xD800 to 0xDFFFF, 0xFFFE, 0xFFFF, and values greater than 0x10FFFF
would not be flagged as invalid. Now they are.
The encode and decode functions now return the value directly without
being wrapped in a tuple on success. If there is an error, it is
thrown. This is to more closely match the semantics of term_to_binary
and binary_to_term.