If the input contained a mismatched end-of-array/object, the stack
could become empty before a call to dec_curr, which would look
beyond the bounds of the stack. If the value at this invalid
position happened to be st_array, we would pop too much from the
stack and overwrite the data that came before it.
This commit fixes this by letting dec_pop return the previous
state or st_invalid if the stack is empty, letting us exit
gracefully if the state isn't what we expect it to be.
dec_pop_assert is identical to the old dec_pop, tearing down the
emulator on internal errors.
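
The bounds-checked pop described above could look roughly like the sketch below. It is an illustration under assumptions, not Jiffy's actual code: the `Stack` struct, `STACK_MAX`, `dec_push`, the extra enum members, and all signatures are hypothetical. Only the names `dec_pop`, `dec_pop_assert`, `st_array`, and `st_invalid` come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state set: only st_array and st_invalid are named in the
 * commit message, the other members are assumed for illustration. */
enum state { st_invalid = -1, st_value, st_object, st_array };

#define STACK_MAX 64 /* assumed fixed capacity for this sketch */

typedef struct {
    enum state items[STACK_MAX];
    size_t top; /* number of states currently on the stack */
} Stack;

/* Push a state; returns 0 on success, -1 if the stack is full. */
int dec_push(Stack *s, enum state st)
{
    if (s->top >= STACK_MAX)
        return -1;
    s->items[s->top++] = st;
    return 0;
}

/* Pop and return the previous state, or st_invalid if the stack is
 * empty, so the caller never reads below the start of the stack. */
enum state dec_pop(Stack *s)
{
    if (s->top == 0)
        return st_invalid;
    return s->items[--s->top];
}

/* Pop used for internal invariants: abort the process if the popped
 * state is not the one the caller expects. */
void dec_pop_assert(Stack *s, enum state expected)
{
    enum state got = dec_pop(s);
    assert(got == expected);
    (void)got; /* avoid an unused-variable warning under NDEBUG */
}
```

With this shape, a stray `]` on an empty stack makes `dec_pop` return `st_invalid`, which the caller can turn into an ordinary decode error instead of touching memory outside the stack.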
Some users of Jiffy have experienced issues when decoding large JSON
documents. Normally Jiffy expects smallish documents and returns any
strings as sub-binaries. When dealing with large documents these
sub-binary references can keep a large amount of RAM around unless the
user goes through and applies `binary:copy/1` on every string returned
from Jiffy. This however causes a large amount of CPU usage to do
something that Jiffy could do as it builds the JSON structure.
The `copy_strings` decoder option does exactly this. Instead of
returning sub-binaries Jiffy now copies every string into a newly
allocated binary. Users report that this fixes the memory issues without
significantly hurting performance.
In the original PR for `return_trailer` @vlm pointed out that I wasn't
using enif_consume_timeslice correctly. This fixes that by changing how
it's called.
Previously we attempted to define the total number of bytes to decode or
encode in a single NIF call and then would consume as much of the
timeslice as we processed. This is wrong because we may start the NIF
call with less than an entire timeslice left.
The new approach is to define the number of bytes to encode or decode
per reduction and then iteratively call enif_consume_timeslice until it
indicates that we should return.
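
The iterative approach can be sketched as below. This is a simulation under assumptions, not the NIF code: `FakeEnv` and `fake_consume_timeslice` are hypothetical stand-ins for the real environment and `enif_consume_timeslice` (which likewise returns non-zero once the timeslice is exhausted), and reporting 1 percent per chunk of `bytes_per_chunk` bytes is an arbitrary choice made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the NIF environment: tracks how much of the
 * current timeslice is left, in percent. */
typedef struct {
    int remaining_percent;
} FakeEnv;

/* Stand-in for enif_consume_timeslice(env, percent): returns non-zero
 * once the simulated timeslice is used up. */
int fake_consume_timeslice(FakeEnv *env, int percent)
{
    assert(percent >= 1 && percent <= 100);
    env->remaining_percent -= percent;
    return env->remaining_percent <= 0;
}

/* Process input in chunks, reporting consumed time after each chunk.
 * Returns 1 when all input is processed, 0 when the caller should
 * yield and resume later from *pos. */
int process_until_yield(FakeEnv *env, const unsigned char *buf, size_t len,
                        size_t *pos, size_t bytes_per_chunk)
{
    assert(bytes_per_chunk > 0 && *pos <= len);
    while (*pos < len) {
        size_t left = len - *pos;
        size_t step = left < bytes_per_chunk ? left : bytes_per_chunk;

        /* Real work would decode or encode buf[*pos .. *pos + step). */
        (void)buf;
        *pos += step;

        /* Ask after every chunk, because the call may have started
         * with only part of a timeslice left. */
        if (*pos < len && fake_consume_timeslice(env, 1))
            return 0;
    }
    return 1;
}
```

The point of the loop is that the work budget is never fixed up front: the function keeps going until the scheduler stand-in says to stop, however much of the timeslice was left at entry.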
Previously Jiffy would throw an error about trailing data if any
non-whitespace character was encountered after the first term had been
decoded.
This patch adds a decoder option `return_trailer` that will instead
return a sub-binary starting at the first non-whitespace character. This
allows users to decode multiple terms from a single iodata() term.
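
Locating the start of the trailer amounts to skipping whitespace after the first term. A minimal sketch follows, assuming the decoder already knows the offset where the first term ended. The names `is_json_ws` and `trailer_offset` are hypothetical and do not come from Jiffy.

```c
#include <assert.h>
#include <stddef.h>

/* JSON whitespace: space, tab, line feed, carriage return. */
int is_json_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Given the offset just past the first decoded term, return the offset
 * of the first non-whitespace byte after it, or len if only whitespace
 * (or nothing) remains. */
size_t trailer_offset(const char *buf, size_t len, size_t term_end)
{
    size_t i = term_end;
    assert(term_end <= len);
    while (i < len && is_json_ws(buf[i]))
        i++;
    return i;
}
```

A result equal to `len` means there is no trailer. Any smaller value marks where the next term would begin, which is where the returned sub-binary would start.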
Thanks to @vlm for the original patch.
The `val` variable is a register value that we need to be able to return
at any time from `decode_iter`. If a yield was triggered while
processing trailing whitespace, the lack of persistence caused decode
to return a term initialized from a random integer value.
Obviously the Erlang VM did not enjoy this.
Thanks to @michalpalka for the report.
Fixes #66
This implements the `use_nil` option as discussed on issue #64. Passing
the atom `use_nil` as an option to both encode and decode will replace
the atom `null` with `nil` when decoding and encode `nil` as `null` when
encoding values.
Fixes #64
Fixes #68
This patch adds initial support for decoding/encoding to/from the new
maps data type.
I'd like to thank Jihyun Yu (yjh0502) for the initial versions of this
work.
This adds a configurable limit on the number of bytes consumed by
the decoder before yielding back to the Erlang VM. This is to avoid the
infamous scheduler collapse issues.
`jiffy:decode/2` now takes an option `{bytes_per_iter,
pos_integer()}` that controls the yield frequency. The default value is
2048.
This is groundwork to allow Jiffy to yield back to the scheduler.
Creating a decoder resource will allow for the necessary state to be
carried across NIF function invocations.
A single quote input was causing segfaults because it slipped past the
string termination logic. This patch corrects that lapse in the
conditional by only parsing strings where a closing quote was found.
All other strings are rejected as invalid.
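
The check can be sketched as a bounded scan for the closing quote. This assumes the failing input was an opening quote with no closing quote. `find_closing_quote` is a hypothetical helper, not a function in Jiffy.

```c
#include <assert.h>
#include <stddef.h>

/* Scan for the closing quote of a JSON string whose opening quote is at
 * buf[start]. Returns the index of the closing quote, or -1 if the
 * input ends first, so the caller can reject the string as invalid
 * instead of reading past the end of the buffer. */
long find_closing_quote(const char *buf, size_t len, size_t start)
{
    size_t i = start + 1;
    assert(start < len && buf[start] == '"');
    while (i < len) {
        if (buf[i] == '\\') {
            i += 2; /* skip the backslash and the character it escapes */
            continue;
        }
        if (buf[i] == '"')
            return (long)i;
        i++;
    }
    return -1; /* ran out of input before a closing quote */
}
```

The string is only parsed when the result is non-negative. A lone `"` falls through the loop without entering it and is rejected.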
Big thanks to Jean-Charles Campagne (@jccampagne) for reporting the
issue.
The encoder can now return \u escaped unicode data instead of leaving
it as UTF-8 byte sequences. This is done like so:
Eshell V5.8.3 (abort with ^G)
1> jiffy:encode(<<240, 144, 129, 128>>, [uescape]).
<<"\"\\uD800\\uDC40\"">>
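
The surrogate pair in that output follows from the standard UTF-16 encoding of code points above U+FFFF. The sketch below reproduces the arithmetic for the four input bytes. `utf8_decode4` and `uescape_codepoint` are hypothetical helpers written for illustration, not Jiffy's encoder, and the decoder assumes a well-formed 4-byte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decode a well-formed 4-byte UTF-8 sequence into a code point.
 * No validation is performed in this sketch. */
unsigned long utf8_decode4(const unsigned char *b)
{
    return ((unsigned long)(b[0] & 0x07) << 18) |
           ((unsigned long)(b[1] & 0x3F) << 12) |
           ((unsigned long)(b[2] & 0x3F) << 6) |
           (unsigned long)(b[3] & 0x3F);
}

/* Write the \u escape for cp into out: one escape for code points in
 * the Basic Multilingual Plane, a UTF-16 surrogate pair above it.
 * Returns the number of characters written (6 or 12). */
int uescape_codepoint(unsigned long cp, char *out, size_t out_len)
{
    unsigned long hi, lo;
    assert(cp <= 0x10FFFF && out_len >= 13);
    if (cp < 0x10000)
        return snprintf(out, out_len, "\\u%04lX", cp);
    cp -= 0x10000;
    hi = 0xD800 + (cp >> 10);   /* high surrogate: top 10 bits */
    lo = 0xDC00 + (cp & 0x3FF); /* low surrogate: bottom 10 bits */
    return snprintf(out, out_len, "\\u%04lX\\u%04lX", hi, lo);
}
```

For the bytes 240, 144, 129, 128 the decoded code point is U+10040. Subtracting 0x10000 leaves 0x40, which gives the high surrogate D800 and the low surrogate DC40.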
The encode and decode functions now return the value directly without
being wrapped in a tuple on success. If there is an error, it is
thrown. This is to more closely match the semantics of term_to_binary
and binary_to_term.
Any number that can't be decoded in C is now passed back
to Erlang for decoding.
Large numbers passed to the encoder will make it through
and be processed in Erlang after the main encoding
process.
* Refs became atoms to make sure they can live across calls
to the NIF functions.
* Initialized curr in decode so that I'm no longer pushing
random values into the Erlang VM.