As it turns out, I did not understand the documentation for
load/upgrade/unload correctly. load and upgrade are called conditionally,
depending on whether there's already code in the VM for the NIF. I.e., no
code means load is called, whereas if code exists, upgrade is called.
unload is called regardless, once per load/upgrade. This means that
load/upgrade in Jiffy's case will each create a state object and unload
will free it each time. I was missing the fact that unload is called
every time, and hence I don't need to clear the state in upgrade.
The docs aren't entirely clear on the order of calls for upgrades so
this is mostly just in case old_priv ever happens to not be what load
returned in priv.
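A minimal sketch of the lifecycle described above, to make the pairing of allocations and frees concrete. It is not Jiffy's actual code: `state_t`, `live_states`, and the `sketch_*` names are hypothetical, and the signatures are simplified (the real erl_nif callbacks also take an `ErlNifEnv*` and a `load_info` term).

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-load state object. */
typedef struct { int placeholder; } state_t;

/* Counts state objects that have been allocated but not yet freed. */
static int live_states = 0;

/* Called when no code for the NIF exists in the VM yet. */
static int sketch_load(void **priv)
{
    state_t *st = malloc(sizeof(*st));
    if (st == NULL) return 1;          /* non-zero signals failure */
    st->placeholder = 0;
    *priv = st;
    live_states++;
    return 0;
}

/* Called instead of load when code already exists. It creates a fresh
 * state and leaves old_priv alone, because unload will free that one. */
static int sketch_upgrade(void **priv, void **old_priv)
{
    (void)old_priv;
    return sketch_load(priv);
}

/* Called once for every successful load or upgrade. */
static void sketch_unload(void *priv)
{
    if (priv != NULL) {
        free(priv);
        live_states--;
    }
}
```

With this shape, a load followed by an upgrade leaves two live states, and the two matching unload calls bring the count back to zero.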
By default Jiffy is quite strict in what it encodes: it will not
allow invalid UTF-8 to be produced. This can cause issues when
attempting to encode JSON that was decoded by other libraries, as UTF-8
semantics are not uniformly enforced.
This patch adds an option 'force_utf8' to the encoder. If encoding hits
an error for an invalid string it will forcefully mutate the object to
contain only valid UTF-8 and return the resulting encoded JSON.
For the most part this means it will strip any garbage data from
binaries, replacing it with the replacement codepoint U+FFFD. It will
also try to fix the common error of encoding surrogate pairs as
three-byte sequences and reencode them into UTF-8 properly.
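The surrogate repair can be sketched roughly as follows. This is an illustration of the general technique, not the code in the patch; the function name `fix_surrogate_pair` is made up. It recognises a high surrogate and a low surrogate that were each written as a three-byte sequence (ED A0..AF xx, ED B0..BF xx) and rewrites the pair as one four-byte UTF-8 sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and fills out[0..3] if in[0..5] holds a surrogate pair encoded
 * as two three-byte sequences. Returns 0 (out untouched) otherwise. */
static int fix_surrogate_pair(const unsigned char *in, size_t len,
                              unsigned char out[4])
{
    uint32_t hi, lo, cp;

    if (len < 6) return 0;
    /* High surrogate U+D800..U+DBFF: ED A0..AF 80..BF */
    if (in[0] != 0xED || (in[1] & 0xF0) != 0xA0 || (in[2] & 0xC0) != 0x80)
        return 0;
    /* Low surrogate U+DC00..U+DFFF: ED B0..BF 80..BF */
    if (in[3] != 0xED || (in[4] & 0xF0) != 0xB0 || (in[5] & 0xC0) != 0x80)
        return 0;

    /* Decode each three-byte sequence back to its 16-bit surrogate value. */
    hi = 0xD000u | ((uint32_t)(in[1] & 0x3F) << 6) | (uint32_t)(in[2] & 0x3F);
    lo = 0xD000u | ((uint32_t)(in[4] & 0x3F) << 6) | (uint32_t)(in[5] & 0x3F);

    /* Combine the surrogates into the real code point. */
    cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);

    /* Emit the code point as a proper four-byte UTF-8 sequence. */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 1;
}
```

For example, the six bytes ED A0 80 ED B1 80 (U+D800 followed by U+DC40) come out as F0 90 81 80, which is U+10040.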
I had a silly direction mistake in a bit shift that was causing the high
surrogate of every surrogate pair to be printed as \uD800, which is
obviously wrong. This bug only affects people using the non-default
uescape option during encoding.
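For reference, a small sketch of the standard split of a code point above U+FFFF into a surrogate pair. The name `to_surrogates` is hypothetical and this is not the patched code itself; it only shows that the high half is obtained with a right shift by 10.

```c
#include <stdint.h>

/* Splits a code point in U+10000..U+10FFFF into UTF-16 surrogates.
 * The caller is assumed to have checked the range already. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v = cp - 0x10000u;               /* 20 significant bits */
    *hi = (uint16_t)(0xD800u | (v >> 10));    /* top 10 bits, shifted right */
    *lo = (uint16_t)(0xDC00u | (v & 0x3FFu)); /* bottom 10 bits */
}
```

U+10040 splits into D800 and DC40, while U+1F600 splits into D83D and DE00, so the high half is not always D800.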
Numbers like 1.0 were being encoded as <<"1">>, which can lead to a bit
of confusion. This patch merely checks whether a decimal point exists
and, if not, appends ".0" to the value.
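A rough sketch of that check, under stated assumptions: the formatted float is a NUL-terminated string in a caller-owned buffer, and `ensure_decimal_point` is a made-up name. Skipping exponent forms is my own addition, since appending ".0" to something like "1e10" would produce invalid JSON; the patch description does not say how that case is handled.

```c
#include <stddef.h>
#include <string.h>

/* Appends ".0" to buf when it contains no decimal point.
 * Returns 0 on success, -1 if buf (capacity cap) is too small. */
static int ensure_decimal_point(char *buf, size_t cap)
{
    size_t len = strlen(buf);

    if (strchr(buf, '.') != NULL)
        return 0;                       /* already looks like a float */

    /* Assumption, not from the patch: leave exponent forms untouched. */
    if (strchr(buf, 'e') != NULL || strchr(buf, 'E') != NULL)
        return 0;

    if (len + 3 > cap)                  /* need room for ".0" plus NUL */
        return -1;

    memcpy(buf + len, ".0", 3);         /* copies '.', '0' and the NUL */
    return 0;
}
```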
It was possible to pass some types of invalid UTF-8 through Jiffy's
encoder. Specifically, if uescaping isn't used, sequences that would
decode to values from 0xD800 to 0xDFFF, to 0xFFFE or 0xFFFF, or to values
greater than 0x10FFFF would not be flagged as invalid. Now they are.
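The rejected ranges can be written as a single predicate over a decoded code point. This is a sketch of the rule as stated above, with a hypothetical function name, not the encoder's actual validation routine.

```c
#include <stdint.h>

/* Returns 1 if cp may appear in the output, 0 if it must be rejected. */
static int is_valid_codepoint(uint32_t cp)
{
    if (cp >= 0xD800u && cp <= 0xDFFFu) return 0;  /* surrogate range */
    if (cp == 0xFFFEu || cp == 0xFFFFu) return 0;  /* noncharacters */
    if (cp > 0x10FFFFu) return 0;                  /* beyond Unicode */
    return 1;
}
```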
Based on the buffer doubling scheme in Yajl. This seems to make
Jiffy's encoder times less spiky, but I'm still a bit slower. It
appears to be related to floats or memory handling. Not sure how
to track this down.
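A generic sketch of a doubling buffer, for readers unfamiliar with the scheme. It is neither Yajl's nor Jiffy's code; `buf_t`, `buf_ensure`, `buf_append`, and the initial capacity of 16 are all assumptions chosen for illustration. The point is that capacity grows geometrically, so appends are amortised constant time and reallocations become rare.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;   /* heap storage, NULL until first append */
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes allocated */
} buf_t;

/* Makes sure at least `extra` more bytes fit, doubling capacity as needed. */
static int buf_ensure(buf_t *b, size_t extra)
{
    size_t need = b->len + extra;
    size_t cap = b->cap ? b->cap : 16;   /* assumed starting size */
    char *p;

    if (need <= b->cap) return 0;
    while (cap < need) cap *= 2;         /* double until it fits */

    p = realloc(b->data, cap);
    if (p == NULL) return -1;            /* old buffer is still valid */
    b->data = p;
    b->cap = cap;
    return 0;
}

/* Appends n bytes from src. Returns 0 on success, -1 on allocation failure. */
static int buf_append(buf_t *b, const char *src, size_t n)
{
    if (n == 0) return 0;
    if (buf_ensure(b, n) != 0) return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

Starting from an empty buffer, appending 5 bytes allocates 16, and appending 20 more grows the capacity to 32 in a single reallocation.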
The encoder can now return \u escaped unicode data instead of leaving
it as UTF-8 byte sequences. This is done like so:
Eshell V5.8.3 (abort with ^G)
1> jiffy:encode(<<240, 144, 129, 128>>, [uescape]).
<<"\"\\uD800\\uDC40\"">>
The encode and decode functions now return the value directly without
being wrapped in a tuple on success. If there is an error, it is
thrown. This is to more closely match the semantics of term_to_binary
and binary_to_term.
Any number that can't be decoded in C is now passed back
to Erlang for decoding.
Large numbers passed to the encoder will make it through
and be processed in Erlang after the main encoding
process.
* Refs became atoms to make sure they can live across calls
to the NIF functions.
* Initialized curr in decode so that I'm no longer pushing
random values into the Erlang VM.