This bug was due to an interaction between two optimizations. If we
attempt to flush the buffer before any bytes are used we refused.
However, in enc_ensure we were not checking whether the buffer was
actually flushed so we would allocate a new buffer for the request.
The easiest way to encounter this issue was by encoding a raw binary
longer than 2041 bytes (i.e., `jiffy:encode(<<"stuff...">>).`).
When "\u"-escaping a Unicode character, the esc_extra value doesn't
need to include the number of bytes in the input string. That is, if
a three-byte UTF-8 character is being escaped to a six-byte "\uXXXX"
sequence, esc_extra only needs to be increased by 3.
In the original PR for `return_trailer` @vlm pointed out that I wasn't
using enif_consume_timeslice correctly. This fixes that by changing out
its called.
Previously we attempted to define the total number of bytes to decode or
encode in a single NIF call and then would consume as much of the
timeslice as we processed. This is wrong because we may start the NIF
call with less than an entire timeslice left.
The new approach is to define the number of bytes to encode or decode
per reduction and then iteratively call enif_consume_timeslice until it
indicates that we should return.
This fixes a leak when encoding a bare bignum. Technically it would be
possible to hit this memory leak randomly with bignums in objects but
the chances are highly unlikely.
Thanks to @miriampena for the issue.
Fixes#69
This implements the `use_nil` option as discussed on issue #64. Passing
the atom `use_nil` as an option to both encode and decode will replace
the atom `null` with `nil` when decoding and encode `nil` as `null` when
encoding values.
Fixes#64Fixes#68
This patch adds initial support for decoding/encoding to/from the new
maps data type.
I'd like to thank Jihyun Yu (yjh0502) for the initial versions of this
work.
This adds a configurable limit on the number of bytes produced by
the encoder before yielding back to the Erlang VM. This is to avoid the
infamous scheduler collapse issues.
The `jiffy:encode/2` now takes an option `{bytes_per_iter,
pos_integer()}` that controls the yield frequency. The default value is
2048.
This is ground work to allow Jiffy to yield back to the scheduler.
Creating an encoder resource will allow for the necessary state to be
carried across NIF function invocations.
While using rebar's dependency management was surprisingly easy it
broke apps that tried to build Jiffy as a dependency due to relative
path #includes.
This also fixes a few other issues. Most notably it removes the use of
the ECMAScript compatible encoding due to JSON's lack of support for +/-
Inf and NaN.
Floating point numbers are no longer encoded as a one to one mapping
of their binary representation, but as short as possible (while still
being acurate). The double-conversion library [1] is used to do the
hard work.
The ECMAScript compatible conversion is used.
[1] https://code.google.com/p/double-conversion/
After a few requests and some reflection I've decided to stop escaping
forward slashes in strings while still accepting the escaped version
through the decoder. This appears to mimic the behavior of other popular
JSON libraries I've checked (Ruby and Python).
Apparently I missed this when refactoring a do/while loop into a
standard while loop. It was a harmless extra invocation to check if the
stack was empty.
By default Jiffy is quite strict in what it encodes. By default it will
not allow invalid UTF-8 to be produced. This can cause issues when
attempting to encode JSON that was decoded by other libraries as UTF-8
semantics are not uniformly enforced.
This patch adds an option 'force_utf8' to the encoder. If encoding hits
an error for an invalid string it will forcefully mutate the object to
contain only valid UTF-8 and return the resulting encoded JSON.
For the most part this means it will strip any garbage data from
binaries replacing it replacement codepoint U+FFFD. Although, it will
also try and the common error of encoding surrogate pairs as three-byte
sequences and reencode them into UTF-8 properly.
Numbers like 1.0 were being encoded as <<"1">> which can lead to a bit
of confusion. This merely checks if a decimial point exists and if not
it appends ".0" to the value.
Based on the buffer doubling scheme in Yajl. This seems to make
Jiffy's encoder times less spikey, but I'm still a bit slower. It
appears to be related to floats or memory handling. Not sure how
to track this down.
The encoder can now return \u escaped unicode data instead of leaving
it as UTF-8 byte sequences. This done like so:
Eshell V5.8.3 (abort with ^G)
1> jiffy:encode(<<240, 144, 129, 128>>, [uescape]).
<<"\"\\uD800\\uDC40\"">>
The encode and decode functions now return the value directly without
being wrapped in a tuple on success. If there is an error, it is
thrown. This is to more closely match the semantics of term_to_binary
and binary_to_term.
Any number that can't be decoded in C is now passed back
to Erlang for decoding.
Large numbers passed to the encoder will make it through
and be processed in Erlang after the main encoding
process.
* Refs became atoms to make sure they can live across calls
to the NIF functions.
* Initialized curr in decode so that I'm no longer pushing
random values into the Erlang VM.