This bug was due to an interaction between two optimizations. If we
attempted to flush the buffer before any bytes were used, we refused.
However, in enc_ensure we were not checking whether the buffer was
actually flushed so we would allocate a new buffer for the request.
The easiest way to encounter this issue was by encoding a raw binary
longer than 2041 bytes (i.e., `jiffy:encode(<<"stuff...">>).`).
If the input contained a mismatched end-of-array/object, the stack
could become empty before a call to dec_curr, which would look
beyond the bounds of the stack. If the value at this invalid
position happened to be st_array, we would pop too much from the
stack and overwrite the data that came before it.
This commit fixes this by letting dec_pop return the previous
state or st_invalid if the stack is empty, letting us exit
gracefully if the state isn't what we expect it to be.
dec_pop_assert is identical to the old dec_pop, tearing down the
emulator on internal errors.
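
A minimal sketch of the bounds-checked pop described above. The stack layout, the `ST_MAX` limit, and the `handle_array_end` caller are hypothetical and only illustrate the idea; they are not Jiffy's actual definitions.

```c
#include <stddef.h>

/* Hypothetical state tags; st_invalid means "nothing to pop". */
enum dec_state { st_value, st_object, st_array, st_key, st_invalid };

/* Hypothetical fixed-size state stack, for illustration only. */
#define ST_MAX 64

typedef struct {
    enum dec_state st_data[ST_MAX];
    size_t st_top;              /* number of states currently on the stack */
} dec_stack;

/* Push a state; returns 0 on overflow, 1 on success. */
static int dec_push(dec_stack *d, enum dec_state s)
{
    if (d->st_top >= ST_MAX) return 0;
    d->st_data[d->st_top++] = s;
    return 1;
}

/* Pop and return the previous top state, or st_invalid when the stack
 * is empty, so the caller never reads below the start of st_data. */
static enum dec_state dec_pop(dec_stack *d)
{
    if (d->st_top == 0) return st_invalid;
    return d->st_data[--d->st_top];
}

/* Example caller: on ']' report a syntax error (0) instead of touching
 * out-of-bounds memory when the bracket is unmatched. */
static int handle_array_end(dec_stack *d)
{
    return dec_pop(d) == st_array;
}
```

With this shape, a stray `]` on an empty stack turns into an ordinary "invalid JSON" result rather than a read past the stack.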
The number of items in `hexvals` is 128 while the table size is 256, so
the remaining 128 items are filled with zero. As a result, values in
\xf0-\xff are treated as zero when they should be rejected.
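
One way to avoid zero-filled gaps is to mark every entry invalid first and then fill in only the real hex digits. The sketch below builds the table at runtime; this is an illustrative alternative, not necessarily how the fix was made, and `HEX_INVALID`, `hexvals_init`, and `hex_to_int` are hypothetical names.

```c
#include <string.h>

#define HEX_INVALID 255   /* assumed sentinel: "not a hex digit" */

static unsigned char hexvals[256];

/* Mark all 256 entries invalid, then fill in the 22 valid digits. */
static void hexvals_init(void)
{
    int i;
    memset(hexvals, HEX_INVALID, sizeof(hexvals));
    for (i = 0; i < 10; i++) {
        hexvals['0' + i] = (unsigned char) i;
    }
    for (i = 0; i < 6; i++) {
        hexvals['a' + i] = (unsigned char) (10 + i);
        hexvals['A' + i] = (unsigned char) (10 + i);
    }
}

/* Returns 0..15 for a hex digit, or -1 for any other byte. */
static int hex_to_int(unsigned char c)
{
    unsigned char v = hexvals[c];
    return v == HEX_INVALID ? -1 : (int) v;
}
```

Because every slot starts out invalid, a byte that was never listed can no longer be mistaken for the digit zero.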
Some users of Jiffy have experienced issues when decoding large JSON
documents. Normally Jiffy expects smallish documents and returns any
strings as sub-binaries. When dealing with large documents these
sub-binary references can keep a large amount of RAM around unless the
user goes through and applies `binary:copy/1` on every string returned
from Jiffy. This however causes a large amount of CPU usage to do
something that Jiffy could do as it builds the JSON structure.
The `copy_strings` decoder option does exactly this. Instead of
returning sub-binaries Jiffy now copies every string into a newly
allocated binary. Users report that this fixes the memory issues
without significantly affecting performance.
When "\u"-escaping a Unicode character, the esc_extra value doesn't
need to include the number of bytes in the input string. That is, if
a three-byte UTF-8 character is being escaped to a six-byte "\uXXXX"
sequence, esc_extra only needs to be increased by 3.
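
The arithmetic can be sketched as a small helper. `escape_extra` is a hypothetical name, and the 4-byte case assumes the character is written as a surrogate pair ("\uXXXX\uXXXX", 12 bytes) as JSON requires for code points outside the BMP; the commit itself only discusses the 3-byte case.

```c
#include <stddef.h>

/* Extra output bytes needed when a UTF-8 sequence of `utf8_len` input
 * bytes is written as "\uXXXX" (6 bytes) or, for 4-byte sequences, as a
 * surrogate pair (12 bytes). The input bytes are already counted, so
 * only the difference is extra. Returns 0 for invalid lengths. */
static size_t escape_extra(size_t utf8_len)
{
    size_t escaped_len;
    if (utf8_len < 1 || utf8_len > 4) return 0;
    escaped_len = (utf8_len == 4) ? 12 : 6;
    return escaped_len - utf8_len;
}
```

For a three-byte character this yields 6 - 3 = 3 extra bytes, matching the description above.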
In the original PR for `return_trailer` @vlm pointed out that I wasn't
using enif_consume_timeslice correctly. This fixes that by changing how
it's called.
Previously we attempted to define the total number of bytes to decode or
encode in a single NIF call and then would consume as much of the
timeslice as we processed. This is wrong because we may start the NIF
call with less than an entire timeslice left.
The new approach is to define the number of bytes to encode or decode
per reduction and then iteratively call enif_consume_timeslice until it
indicates that we should return.
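
A rough sketch of the iterative approach. To keep it runnable outside the VM, `fake_consume_timeslice` is a stand-in that mimics the documented behavior of `enif_consume_timeslice(env, percent)` (it returns 1 once the timeslice is exhausted). The `fake_env` type, the one-percent step, and the `bytes_per_pct` parameter are assumptions for illustration, not Jiffy's actual values.

```c
#include <stddef.h>

/* Stand-in for the NIF environment: tracks how much of the current
 * timeslice is left, in percent. */
typedef struct { int percent_left; } fake_env;

/* Stand-in for enif_consume_timeslice: returns 1 when we should yield. */
static int fake_consume_timeslice(fake_env *env, int percent)
{
    env->percent_left -= percent;
    return env->percent_left <= 0;
}

/* Process input in small chunks, reporting each chunk to the scheduler,
 * and stop as soon as it says the slice is used up. Returns the new
 * position so the caller can resume from there on the next call. */
static size_t process_until_yield(fake_env *env, size_t total,
                                  size_t start, size_t bytes_per_pct)
{
    size_t pos = start;
    while (pos < total) {
        size_t chunk = total - pos;
        if (chunk > bytes_per_pct) chunk = bytes_per_pct;
        pos += chunk;   /* pretend to encode/decode `chunk` bytes here */
        if (fake_consume_timeslice(env, 1)) break;
    }
    return pos;
}
```

Starting the call with only part of a timeslice left simply means the loop stops earlier, which is the case the old fixed-size approach got wrong.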
Previously Jiffy would throw an error about trailing data if any
non-whitespace character was encountered after the first term had been
decoded.
This patch adds a decoder option `return_trailer` that will instead
return a sub-binary starting at the first non-whitespace character. This
allows users to decode multiple terms from a single iodata()
term.
Thanks to @vlm for the original patch.
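
Locating the start of the trailer amounts to skipping JSON whitespace after the first term. A minimal sketch, assuming a hypothetical `trailer_offset` helper that is given the position just past the decoded term:

```c
#include <stddef.h>

/* Returns the offset of the first non-whitespace byte at or after `pos`,
 * or `len` if only JSON whitespace (space, tab, LF, CR) remains. */
static size_t trailer_offset(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len) {
        switch (buf[pos]) {
            case ' ':
            case '\t':
            case '\n':
            case '\r':
                pos++;          /* skip whitespace and keep scanning */
                break;
            default:
                return pos;     /* the trailer starts here */
        }
    }
    return len;
}
```

A result equal to `len` means there is no trailer; anything smaller is where the returned sub-binary would begin.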
This sounds rather insane to me but I've managed to show that `(char)
-1` is converted to 255 on some platforms. This was reproduced on
ppc64el via Qemu on OS X. A simple program that does `fprintf(stderr,
"%d\r\n", (char) -1);` prints 255 to the console. Rather than rely on
the signedness of a char I've just updated things to use an unsigned
char (which hopefully is never signed) and replaced -1 with 255 for the
sentinel value when converting hex values.
Thanks to Balint Reczey (@rbalint) for the report.
Fixes #74
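
The signedness problem can be illustrated with a small sketch. Whether plain `char` is signed is implementation-defined in C, so a `-1` sentinel stored in a `char` only compares equal to `-1` on some platforms, while 255 stored in an `unsigned char` behaves the same everywhere. The function names are hypothetical.

```c
#include <limits.h>

/* 1 if plain `char` is signed on this platform, 0 otherwise. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Storing -1 in a plain char: the comparison only succeeds where char
 * is signed. Where it is unsigned the stored value is 255. */
static int sentinel_matches_plain_char(void)
{
    char c = (char) -1;
    return c == -1;
}

/* Storing 255 in an unsigned char: the comparison succeeds everywhere. */
static int sentinel_matches_unsigned_char(void)
{
    unsigned char c = 255;
    return c == 255;
}
```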
This fixes a leak when encoding a bare bignum. Technically it would be
possible to hit this memory leak randomly with bignums in objects but
this is highly unlikely.
Thanks to @miriampena for the issue.
Fixes #69
The `val` variable is a register value that we need to be able to return
at any time from `decode_iter`. If it happened that a yield was
triggered while processing trailing whitespace the lack of persistence
caused decode to return a term initialized from a random integer value.
Obviously the Erlang VM did not enjoy this.
Thanks to @michalpalka for the report.
Fixes #66
This implements the `use_nil` option as discussed on issue #64. Passing
the atom `use_nil` as an option to both encode and decode will replace
the atom `null` with `nil` when decoding and encode `nil` as `null` when
encoding values.
Fixes #64
Fixes #68
Rather than worry about truncation when casting from a possibly 64-bit
value down to a possibly 32-bit size_t we just limit the total bytes per
invocation to 4G using an unsigned integer.
Thanks to @seriyps for the report.
Fixes #61
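
A minimal sketch of the clamping idea, assuming a hypothetical `clamp_bytes_per_iter` helper and a platform where `unsigned int` is 32 bits (which is what makes the limit 4G):

```c
#include <limits.h>

/* Clamp a possibly 64-bit byte count to what fits in an unsigned int,
 * so later conversions to narrower types cannot silently truncate. */
static unsigned int clamp_bytes_per_iter(unsigned long long requested)
{
    if (requested > UINT_MAX) return UINT_MAX;
    return (unsigned int) requested;
}
```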
This patch adds initial support for decoding/encoding to/from the new
maps data type.
I'd like to thank Jihyun Yu (yjh0502) for the initial versions of this
work.
This adds a configurable limit on the number of bytes produced by
the encoder before yielding back to the Erlang VM. This is to avoid the
infamous scheduler collapse issues.
`jiffy:encode/2` now takes an option `{bytes_per_iter,
pos_integer()}` that controls the yield frequency. The default value is
2048.
This adds a configurable limit on the number of bytes consumed by
the decoder before yielding back to the Erlang VM. This is to avoid the
infamous scheduler collapse issues.
`jiffy:decode/2` now takes an option `{bytes_per_iter,
pos_integer()}` that controls the yield frequency. The default value is
2048.
This is groundwork to allow Jiffy to yield back to the scheduler.
Creating an encoder resource will allow for the necessary state to be
carried across NIF function invocations.