erlAarango binary serialization library

  1. VelocyPack (VPack)
  2. ==================
  3. Version 1
  4. VelocyPack (VPack) is a fast and compact serialization format
  5. ## Generalities
  6. VPack is (unsigned) byte oriented, so VPack values are simply sequences
  7. of bytes and are platform independent. Values are not necessarily
  8. aligned, so all access to larger subvalues must be properly organised to
  9. avoid alignment assumptions of the CPU.
  10. ## Value types
  11. We describe a single VPack value, which is recursive in nature, but
  12. resides (with two exceptions, see below) in a single contiguous block of
  13. memory. Assume that the value starts at address A, the first byte V
  14. indicates the type (and often the length) of the VPack value at hand:
  15. We first give an overview with a brief but accurate description for
  16. reference, for arrays and objects see below for details:
  17. - `0x00` : none - this indicates absence of any type and value,
  18. this is not allowed in VPack values
  19. - `0x01` : empty array
  20. - `0x02` : array without index table (all subitems have the same
  21. byte length), 1-byte byte length
  22. - `0x03` : array without index table (all subitems have the same
  23. byte length), 2-byte byte length
  24. - `0x04` : array without index table (all subitems have the same
  25. byte length), 4-byte byte length
  26. - `0x05` : array without index table (all subitems have the same
  27. byte length), 8-byte byte length
  28. - `0x06` : array with 1-byte index table offsets, bytelen and # subvals
  29. - `0x07` : array with 2-byte index table offsets, bytelen and # subvals
  30. - `0x08` : array with 4-byte index table offsets, bytelen and # subvals
  31. - `0x09` : array with 8-byte index table offsets, bytelen and # subvals
  32. - `0x0a` : empty object
  33. - `0x0b` : object with 1-byte index table offsets, sorted by
  34. attribute name, 1-byte bytelen and # subvals
  35. - `0x0c` : object with 2-byte index table offsets, sorted by
  36. attribute name, 2-byte bytelen and # subvals
  37. - `0x0d` : object with 4-byte index table offsets, sorted by
  38. attribute name, 4-byte bytelen and # subvals
  39. - `0x0e` : object with 8-byte index table offsets, sorted by
  40. attribute name, 8-byte bytelen and # subvals
  41. - `0x0f` : object with 1-byte index table offsets, not sorted by
  42. attribute name, 1-byte bytelen and # subvals - OBSOLETE
  43. - `0x10` : object with 2-byte index table offsets, not sorted by
  44. attribute name, 2-byte bytelen and # subvals - OBSOLETE
  45. - `0x11` : object with 4-byte index table offsets, not sorted by
  46. attribute name, 4-byte bytelen and # subvals - OBSOLETE
  47. - `0x12` : object with 8-byte index table offsets, not sorted by
  48. attribute name, 8-byte bytelen and # subvals - OBSOLETE
  49. - `0x13` : compact array, no index table
  50. - `0x14` : compact object, no index table
  51. - `0x15`-`0x16` : reserved
  52. - `0x17` : illegal - this type can be used to indicate a value that
  53. is illegal in the embedding application
  54. - `0x18` : null
  55. - `0x19` : false
  56. - `0x1a` : true
  57. - `0x1b` : double IEEE-754, 8 bytes follow, stored as little
  58. endian uint64 equivalent
  59. - `0x1c` : UTC-date in milliseconds since the epoch, stored as 8 byte
  60. signed int, little endian, two's complement
  61. - `0x1d` : external (only in memory): a char* pointing to the actual
  62. place in memory, where another VPack item resides, not
  63. allowed in VPack values on disk or on the network
  64. - `0x1e` : minKey, nonsensical value that compares less than all other values
  65. - `0x1f` : maxKey, nonsensical value that compares greater than all other values
  66. - `0x20`-`0x27` : signed int, little endian, 1 to 8 bytes, number is V - `0x1f`,
  67. two's complement
  68. - `0x28`-`0x2f` : uint, little endian, 1 to 8 bytes, number is V - `0x27`
  69. - `0x30`-`0x39` : small integers 0, 1, ... 9
  70. - `0x3a`-`0x3f` : small negative integers -6, -5, ..., -1
  71. - `0x40`-`0xbe` : UTF-8-string, using V - `0x40` bytes (not Unicode characters!),
  72. length 0 is possible, so `0x40` is the empty string,
  73. maximal length is 126, note that strings here are not
  74. zero-terminated and may contain NUL bytes
  75. - `0xbf` : long UTF-8-string, next 8 bytes are length of string in
  76. bytes (not Unicode characters) as little endian unsigned
  77. integer, note that long strings are not zero-terminated
  78. and may contain NUL bytes
  79. - `0xc0`-`0xc7` : binary blob, next V - `0xbf` bytes are the length of blob in
  80. bytes, note that binary blobs are not zero-terminated
  81. - `0xc8`-`0xcf` : positive long packed BCD-encoded float, V - `0xc7` bytes follow
  82. that encode in a little endian way the length of the
  83. mantissa in bytes. Directly after that follow 4 bytes
  84. encoding the (power of 10) exponent, by which the mantissa
  85. is to be multiplied, stored as little endian two's
  86. complement signed 32-bit integer. After that, as many
  87. bytes follow as the length information at the beginning
  88. has specified, each byte encodes two digits in
  89. big-endian packed BCD.
  90. Example: 12345 decimal can be encoded as
  91. `c8 03 00 00 00 00 01 23 45` or
  92. `c8 03 ff ff ff ff 12 34 50`
  93. - `0xd0`-`0xd7` : negative long packed BCD-encoded float, V - `0xcf` bytes
  94. follow that encode in a little endian way the length of
  95. the mantissa in bytes. After that, same as positive long
  96. packed BCD-encoded float above.
  97. - `0xd8`-`0xed` : reserved
  98. - `0xee`-`0xef` : value tagging for logical types
  99. - `0xf0`-`0xff` : custom types
  100. ## Arrays
  101. Empty arrays are simply a single byte `0x01`.
  102. We next describe the type cases `0x02` to `0x09`, see below for the
  103. special compact type `0x13`.
  104. Non-empty arrays look like one of the following:
  105. one of 0x02 to 0x05
  106. BYTELENGTH
  107. OPTIONAL UNUSED: padding
  108. sub VPack values
  109. or
  110. 0x06
  111. BYTELENGTH in 1 byte
  112. NRITEMS in 1 byte
  113. OPTIONAL UNUSED: 6 bytes of padding
  114. sub VPack values
  115. INDEXTABLE with 1 byte per entry
  116. or
  117. 0x07
  118. BYTELENGTH in 2 bytes
  119. NRITEMS in 2 bytes
  120. OPTIONAL UNUSED: 4 bytes of padding
  121. sub VPack values
  122. INDEXTABLE with 2 bytes per entry
  123. or
  124. 0x08
  125. BYTELENGTH in 4 bytes
  126. NRITEMS in 4 bytes
  127. sub VPack values
  128. INDEXTABLE with 4 byte per entry
  129. or
  130. 0x09
  131. BYTELENGTH in 8 bytes
  132. sub VPack values
  133. INDEXTABLE with 8 byte per entry
  134. NRITEMS in 8 bytes
  135. If any optional padding is allowed for a type, the padding must consist
  136. of exactly that many bytes that the length of the padding, the length of
  137. BYTELENGTH and the length of NRITEMS (if present) sums up to 8. If the
  138. length of BYTELENGTH is already 8, there is no padding allowed. The
  139. entire padding must consist of zero bytes (ASCII NUL).
  140. Numbers (for byte length, number of subvalues and offsets in the
  141. INDEXTABLE) are little endian unsigned integers, using 1 byte for
  142. types `0x02` and `0x06`, 2 bytes for types `0x03` and `0x07`, 4 bytes for types
  143. `0x04` and `0x08`, and 8 bytes for types `0x05` and `0x09`.
  144. NRITEMS is a single number as described above.
  145. The INDEXTABLE consists of:
  146. - for types `0x06`-`0x09` an array of offsets (unaligned, in the number
  147. format described above) earlier offsets reside at lower addresses.
  148. Offsets are measured from the start of the VPack value.
  149. Non-empty arrays of types `0x06` to `0x09` have a small header including
  150. their byte length, the number of subvalues, then all the subvalues and
  151. finally an index table containing offsets to the subvalues. To find the
  152. index table, find the number of subvalues, then the end, and from that
  153. the base of the index table, considering how wide its entries are.
  154. For types `0x02` to `0x05` there is no offset table and no number of items.
  155. The first item begins at address A+2, A+3, A+5 or respectively A+9,
  156. depending on the type and thus the width of the byte length field. Note
  157. the following special rule: The actual position of the first subvalue
  158. is allowed to be further back, after some run of padding zero bytes.
  159. For example, if 2 bytes are used for the byte length (BYTELENGTH),
  160. then an optional padding of 6 zero bytes is allowed to follow, and
  161. the actual VPack subvalues can start at A+9.
  162. This is to give a program that builds a VPack value the opportunity to
  163. reserve 8 bytes in the beginning and only later find out that fewer bytes
  164. suffice to write the byte length. One can determine the number of
  165. subvalues by finding the first subvalue, its byte length, and
  166. dividing the amount of available space by it.
  167. For types `0x06` to `0x09` the offset table describes where the subvalues
  168. reside. It is not necessary for the subvalues to start immediately after
  169. the number of subvalues field.
  170. As above, it is allowed to include optional padding. Again here, any
  171. padding must consist of a run of consecutive zero bytes (ASCII NUL) and
  172. must be as long that it fills up the length of BYTELENGTH and the length
  173. of NRITEMS to 8.
  174. For example, if both BYTELENGTH and NRITEMS can be expressed using 2 bytes
  175. each, the sum of their lengths is 4. It is therefore allowed to add 4
  176. bytes of padding here, so that the first subvalue could be at address A+9.
  177. There is one exception for the 8-byte numbers case (type `0x09`):
  178. In this case the number of elements is moved behind the index table.
  179. This is to get away without moving memory when one has reserved 8 bytes
  180. in the beginning and later noticed that all 8 bytes are needed for the
  181. byte length. For this case it is not allowed to include any padding.
  182. All offsets are measured from base A.
  183. *Example*:
  184. `[1,2,3]` has the hex dump
  185. 02 05 31 32 33
  186. in the most compact representation, but the following are equally
  187. possible, though not necessarily advised to use:
  188. *Examples*:
  189. 03 06 00 31 32 33
  190. 04 08 00 00 00 31 32 33
  191. 05 0c 00 00 00 00 00 00 00 31 32 33
  192. 06 09 03 31 32 33 03 04 05
  193. 07 0e 00 03 00 31 32 33 05 00 06 00 07 00
  194. 08 18 00 00 00 03 00 00 00 31 32 33 09 00 00 00 0a 00 00 00 0b 00 00 00
  195. 09
  196. 2c 00 00 00 00 00 00 00
  197. 31 32 33
  198. 09 00 00 00 00 00 00 00
  199. 0a 00 00 00 00 00 00 00
  200. 0b 00 00 00 00 00 00 00
  201. 03 00 00 00 00 00 00 00
  202. Note that it is not recommended to encode short arrays in too long a
  203. format.
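The index table lookup described above can be sketched in Python. This is an illustrative reader only, not the reference implementation: the function name `array_item_offsets` is hypothetical, and the sketch covers only types `0x06` to `0x09`.

```python
# Hypothetical helper: return the subvalue offsets of an array with an
# index table (types 0x06-0x09), following the layout described above.
def array_item_offsets(buf):
    head = buf[0]
    if not 0x06 <= head <= 0x09:
        raise ValueError("not an array with index table")
    width = 1 << (head - 0x06)  # 1, 2, 4 or 8 bytes per number
    byte_length = int.from_bytes(buf[1:1 + width], "little")
    if head == 0x09:
        # 8-byte case: NRITEMS is stored behind the index table.
        n = int.from_bytes(buf[byte_length - 8:byte_length], "little")
        table = byte_length - 8 - n * width
    else:
        n = int.from_bytes(buf[1 + width:1 + 2 * width], "little")
        table = byte_length - n * width
    # Offsets are measured from the start of the VPack value.
    return [
        int.from_bytes(buf[table + i * width:table + (i + 1) * width], "little")
        for i in range(n)
    ]
```

Applied to the hex dumps above, each offset points at one of the bytes `31`, `32`, `33`.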
  204. We now describe the special type `0x13`, which is useful for a
  205. particularly compact array representation. Note that to some extent this
  206. goes against the principles of the VelocyPack format, since quick access
  207. to subvalues is no longer possible, all items in the array must be
  208. scanned to find a particular one. However, there are certain use cases
  209. for VelocyPack which only require sequential access (for example JSON
  210. dumping) and have a particular need for compactness.
  211. The overall format of this array type is
  212. 0x13 as type byte
  213. BYTELENGTH
  214. sub VPack values
  215. NRITEMS
  216. There is no index table at all, although the sub VelocyPack values can
  217. have different byte sizes. The BYTELENGTH and NRITEMS are encoded in a
  218. special format, which we describe now.
  219. The BYTELENGTH consists of 1 to 8 bytes, of which all but the last one
  220. have their high bit set. Thus, the high bits determine how many bytes
  221. are actually used. The lower 7 bits of all these bytes together comprise
  222. the actual byte length in a little endian fashion. That is, the byte at
  223. address A+1 contains the least significant 7 bits (0 to 6) of the byte length,
  224. the following byte at address A+2 contains the bits 7 to 13, and so on.
  225. Since the total number of bytes is limited to 8, this encodes unsigned
  226. integers of up to 56 bits, which is the overall limit for the size of
  227. such a compact array representation.
  228. The NRITEMS entry is encoded essentially the same, except that it is
  229. laid out in reverse order in memory. That is, one has to use the
  230. BYTELENGTH to find the end of the array value and go back bytes until
  231. one finds a byte with high bit reset. The last byte (at the highest
  232. memory address) contains the least significant 7 bits of the NRITEMS
  233. value, the second one bits 7 to 13 and so on.
  234. Here is an example, the array [1, 16] can be encoded as follows:
  235. 13 06
  236. 31 28 10
  237. 02
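The variable-length encoding of BYTELENGTH and NRITEMS can be sketched as follows. The helper names (`read_varlen_forward`, `read_varlen_backward`, `compact_header`) are made up for this illustration, and the sketch does no bounds checking.

```python
def read_varlen_forward(buf, pos):
    """Read a 7-bits-per-byte little endian number starting at pos.
    Returns (value, number_of_bytes_used)."""
    value, shift, used = 0, 0, 0
    while True:
        b = buf[pos + used]
        value |= (b & 0x7F) << shift
        used += 1
        shift += 7
        if b & 0x80 == 0:  # high bit reset: this was the last byte
            return value, used

def read_varlen_backward(buf, end):
    """Read the same encoding laid out in reverse order; the least
    significant 7 bits are in the byte at index end - 1."""
    value, shift, pos = 0, 0, end - 1
    while True:
        b = buf[pos]
        value |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80 == 0:
            return value
        pos -= 1

def compact_header(buf):
    """Return (byte_length, nritems, offset_of_first_subvalue)."""
    byte_length, used = read_varlen_forward(buf, 1)
    nritems = read_varlen_backward(buf, byte_length)
    return byte_length, nritems, 1 + used
```

For the example above, `compact_header(bytes.fromhex("13 06 31 28 10 02"))` yields a byte length of 6, 2 items, and a first subvalue at offset 2.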
  238. ## Objects
  239. Empty objects are simply a single byte `0x0a`.
  240. We next describe the type cases `0x0b` to `0x12`, see below for the
  241. special compact type `0x14`.
  242. Non-empty objects look like this:
  243. one of 0x0b - 0x12
  244. BYTELENGTH
  245. optional NRITEMS
  246. sub VPack values as pairs of attribute and value
  247. optional INDEXTABLE
  248. NRITEMS for the 8-byte case
  249. Numbers (for byte length, number of subvalues and offsets in the
  250. INDEXTABLE) are little endian unsigned integers, using 1 byte for
  251. types `0x0b` and `0x0f`, 2 bytes for types `0x0c` and `0x10`, 4 bytes for types
  252. `0x0d` and `0x11`, and 8 bytes for types `0x0e` and `0x12`.
  253. NRITEMS is a single number as described above.
  254. The INDEXTABLE consists of:
  255. - an array of offsets (unaligned, in the number format described
  256. above) earlier offsets reside at lower addresses.
  257. Offsets are measured from the beginning of the VPack value.
  258. Non-empty objects have a small header including their byte length, the
  259. number of subvalues, then all the subvalues and finally an index table
  260. containing offsets to the subvalues. To find the index table, find the
  261. number of subvalues, then the end, and from that the base of the index
  262. table, considering how wide its entries are.
  263. For all types the offset table describes where the subvalues reside. It
  264. is not necessary for the subvalues to start immediately after the number
  265. of subvalues field. For performance reasons when building the value, it
  266. could be desirable to reserve 8 bytes for the byte length and the number
  267. of subvalues and not fill the gap, even though it turns out later that
  268. offsets and thus the byte length only uses 2 bytes, say.
  269. There is one special case: the empty object is simply stored as the
  270. single byte `0x0a`.
  271. There is another exception: For 8-byte numbers (`0x0e` and `0x12`) the number of
  272. subvalues is stored behind the INDEXTABLE. This is to get away without
  273. moving memory when one has reserved 8 bytes in the beginning and later
  274. noticed that all 8 bytes are needed for the byte length.
  275. All offsets are measured from base A.
  276. Each entry consists of two parts, the key and the value, they are
  277. encoded as normal VPack values as above, the first is always a short or
  278. long UTF-8 string starting with a byte `0x40`-`0xbf` as described below. The
  279. second is any other VPack value.
  280. There is one extension: For the key it is possible to use the positive
  281. small integer values `0x30`-`0x39` or an unsigned integer starting with a
  282. type byte of `0x28`-`0x2f`. Any such integer value is an index into an
  283. outside-given table of attribute names. These are convenient when only
  284. very few attribute names occur or some are repeated very often. The
  285. standard way to encode such an attribute name table is as a VPack array
  286. of strings as specified here.
  287. Objects are always stored with sorted key/value pairs, sorted by bytewise
  288. comparisons of the keys on each nesting level. Sorting has some overhead
  289. but will allow looking up keys in logarithmic time later. Note that only the
  290. index table needs to be sorted, it is not required that the offsets in
  291. these tables are increasing. Since the index table resides after the actual
  292. subvalues, one can build up a complex VPack value by writing linearly.
  293. Example: the object `{"a": 12, "b": true, "c": "xyz"}` can have the hexdump:
  294. 0b
  295. 13 03
  296. 41 62 1a
  297. 41 61 28 0c
  298. 41 63 43 78 79 7a
  299. 06 03 0a
  300. The same object could have been done with an index table with longer
  301. entries, as in this example:
  302. 0d
  303. 22 00 00 00
  304. 03 00 00 00
  305. 41 62 1a
  306. 41 61 28 0c
  307. 41 63 43 78 79 7a
  308. 0c 00 00 00 09 00 00 00 10 00 00 00
  309. Similarly with type `0x0c` and 2-byte offsets, byte length and number of
  310. subvalues, or with type `0x0e` and 8-byte numbers.
  311. Note that it is not recommended to encode short objects with too long
  312. index tables.
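Because the index table is sorted by key, a lookup can use binary search. The following Python sketch illustrates this under simplifying assumptions: it handles only types `0x0b` to `0x0d`, assumes every key is a short string (`0x40`-`0xbe`), and uses the hypothetical name `object_lookup`.

```python
def object_lookup(buf, key):
    """Return the offset of the value stored under key (bytes), or None.
    Sketch only: sorted objects 0x0b-0x0d with short string keys."""
    head = buf[0]
    if not 0x0B <= head <= 0x0D:
        raise ValueError("sketch supports only types 0x0b-0x0d")
    width = 1 << (head - 0x0B)  # 1, 2 or 4 bytes per number
    byte_length = int.from_bytes(buf[1:1 + width], "little")
    n = int.from_bytes(buf[1 + width:1 + 2 * width], "little")
    table = byte_length - n * width  # index table sits at the end
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        entry = table + mid * width
        off = int.from_bytes(buf[entry:entry + width], "little")
        key_len = buf[off] - 0x40  # short string: length is in the type byte
        found = buf[off + 1:off + 1 + key_len]
        if found == key:
            return off + 1 + key_len  # the value follows the key directly
        if found < key:  # bytewise comparison
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

With the first hex dump above, looking up `b"a"` returns offset 8, where the value `28 0c` (the integer 12) begins.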
  313. ### Special compact objects
  314. We now describe the special type `0x14`, which is useful for a
  315. particularly compact object representation. Note that to some extent
  316. this goes against the principles of the VelocyPack format, since quick
  317. access to subvalues is no longer possible, all key/value pairs in the
  318. object must be scanned to find a particular one. However, there are
  319. certain use cases for VelocyPack which only require sequential access
  320. (for example JSON dumping) and have a particular need for compactness.
  321. The overall format of this object type is
  322. 0x14 as type byte
  323. BYTELENGTH
  324. sub VPack key/value pairs
  325. NRPAIRS
  326. There is no index table at all, although the sub VelocyPack values can
  327. have different byte sizes. The BYTELENGTH and NRPAIRS are encoded in a
  328. special format, which we describe now. It is the same as for the special
  329. compact array type `0x13`, which we repeat here for the sake of
  330. completeness.
  331. The BYTELENGTH consists of 1 to 8 bytes, of which all but the last one
  332. have their high bit set. Thus, the high bits determine how many bytes
  333. are actually used. The lower 7 bits of all these bytes together comprise
  334. the actual byte length in a little endian fashion. That is, the byte at
  335. address A+1 contains the least significant 7 bits (0 to 6) of the byte
  336. length, the following byte at address A+2 contains the bits 7 to 13, and
  337. so on. Since the total number of bytes is limited to 8, this encodes
  338. unsigned integers of up to 56 bits, which is the overall limit for the
  339. size of such a compact object representation.
  340. The NRPAIRS entry is encoded essentially the same, except that it
  341. is laid out in reverse order in memory. That is, one has to use the
  342. BYTELENGTH to find the end of the object value and go back bytes until
  343. one finds a byte with high bit reset. The last byte (at the highest
  344. memory address) contains the least significant 7 bits of the NRPAIRS
  345. value, the second one bits 7 to 13 and so on.
  346. Here is an example, the object `{"a":1, "b":16}` can be encoded as follows:
  347. 14 0a
  348. 41 61 31 41 62 28 10
  349. 02
  350. ## Doubles
  351. Type `0x1b` indicates a double IEEE-754 value using the 8 bytes following
  352. the type byte. To guarantee platform independence the details of the
  353. byte order are as follows. Encoding is done by using memcpy to copy the
  354. internal double value to a uint64\_t. This 64-bit unsigned integer is
  355. then stored as little endian 8 byte integer in the VPack value. Decoding
  356. works in the opposite direction. This should sort out the undetermined
  357. byte order in IEEE-754 in practice.
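In Python, the same reinterpretation can be sketched with the standard `struct` module standing in for `memcpy`. The function names are hypothetical.

```python
import struct

def encode_double(x):
    # Reinterpret the double's bits as a uint64 in native byte order,
    # then store that integer as 8 little endian bytes.
    (bits,) = struct.unpack("=Q", struct.pack("=d", x))
    return b"\x1b" + bits.to_bytes(8, "little")

def decode_double(buf):
    if buf[0] != 0x1B:
        raise ValueError("not a double")
    bits = int.from_bytes(buf[1:9], "little")
    (x,) = struct.unpack("=d", struct.pack("=Q", bits))
    return x
```

For instance, 1.0 has the bit pattern `0x3ff0000000000000`, so its encoding is `1b 00 00 00 00 00 00 f0 3f`.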
  358. ## Dates
  359. Type `0x1c` indicates a signed 64-bit integer stored in 8 bytes little
  360. endian two's complement notation directly after the type. The value means
  361. a universal UTC-time measured in milliseconds since the epoch, which is
  362. 00:00 on 1 January 1970 UTC.
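A small Python sketch of this encoding, using only the standard library. The names `EPOCH`, `encode_date` and `decode_date` are chosen for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_date(dt):
    # Whole milliseconds since the epoch, as signed 64-bit little endian.
    millis = (dt - EPOCH) // timedelta(milliseconds=1)
    return b"\x1c" + millis.to_bytes(8, "little", signed=True)

def decode_date(buf):
    if buf[0] != 0x1C:
        raise ValueError("not a UTC date")
    millis = int.from_bytes(buf[1:9], "little", signed=True)
    return EPOCH + timedelta(milliseconds=millis)
```

Dates before the epoch come out as negative millisecond counts, which is why the integer is signed.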
  363. ## External VPack values
  364. This type is only for use within memory, not for data exchange over disk
  365. or network. Therefore, we only need to specify that the following k
  366. bytes, where k is the size of a char* on the current architecture, are the memcpy of such a char*. That char*
  367. points to the actual VPack value elsewhere in memory.
  368. ## Artificial minimal and maximal keys
  369. These values of types `0x1e` and `0x1f` have no meaning other than comparing
  370. smaller or greater respectively than any other VPack value. The idea is
  371. that these can be used in systems that define a total order on all VPack
  372. values to specify left or right ends of infinite intervals.
  373. ## Integer types
  374. There are different ways to specify integers. For small values -6 to 9
  375. inclusively there are specific type bytes in the range `0x30` to `0x3f` to
  376. allow for storage in a single byte. After that there are signed and
  377. unsigned integer types that can code in the type byte the number of
  378. bytes used (ranges `0x20`-`0x27` for signed and `0x28`-`0x2f` for unsigned).
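The integer ranges can be illustrated with a small encoder and decoder. This is a sketch with hypothetical function names; it encodes positive values above 9 as unsigned integers, which is one possible choice and not something the format mandates.

```python
def encode_int(n):
    if 0 <= n <= 9:
        return bytes([0x30 + n])  # small integers 0..9
    if -6 <= n <= -1:
        return bytes([0x40 + n])  # -6 -> 0x3a, ..., -1 -> 0x3f
    if n > 0:
        size = (n.bit_length() + 7) // 8
        if size > 8:
            raise OverflowError("does not fit into 8 bytes")
        return bytes([0x27 + size]) + n.to_bytes(size, "little")
    # Negative: smallest two's complement width that holds n.
    size = ((~n).bit_length() + 8) // 8
    if size > 8:
        raise OverflowError("does not fit into 8 bytes")
    return bytes([0x1F + size]) + n.to_bytes(size, "little", signed=True)

def decode_int(buf):
    v = buf[0]
    if 0x30 <= v <= 0x39:
        return v - 0x30
    if 0x3A <= v <= 0x3F:
        return v - 0x40
    if 0x20 <= v <= 0x27:
        size = v - 0x1F
        return int.from_bytes(buf[1:1 + size], "little", signed=True)
    if 0x28 <= v <= 0x2F:
        size = v - 0x27
        return int.from_bytes(buf[1:1 + size], "little")
    raise ValueError("not an integer type byte")
```

This reproduces the bytes `28 0c` used for the value 12 in the object example.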
  379. ## Null and boolean values
  380. These three values use a single byte to store the corresponding JSON
  381. values.
  382. ## Strings
  383. Strings are stored as UTF-8 encoded byte sequences. There are two
  384. variants, a short one and a long one. In the short one, the byte length
  385. (not the number of UTF-8 characters) is directly encoded in the type,
  386. and this works up to and including byte length 126. Types `0x40` to `0xbe`
  387. are used for this and the byte length is V - `0x40`, if V is the type
  388. byte. For strings longer than 126 bytes, the type byte is `0xbf` and the
  389. byte length of the string is stored in the first 8 bytes after the type
  390. byte, using a little endian unsigned integer representation. The actual
  391. string follows after these 8 bytes. There is no terminating zero byte in
  392. either case and the string may contain zero bytes.
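A sketch of both string variants in Python, following the type table where `0x40` is the empty string. The function names are hypothetical.

```python
def encode_string(s):
    data = s.encode("utf-8")
    if len(data) <= 126:
        # Short string: the byte length is folded into the type byte.
        return bytes([0x40 + len(data)]) + data
    # Long string: 8-byte little endian length follows the type byte.
    return b"\xbf" + len(data).to_bytes(8, "little") + data

def decode_string(buf):
    v = buf[0]
    if 0x40 <= v <= 0xBE:
        n = v - 0x40
        return buf[1:1 + n].decode("utf-8")
    if v == 0xBF:
        n = int.from_bytes(buf[1:9], "little")
        return buf[9:9 + n].decode("utf-8")
    raise ValueError("not a string type byte")
```

The length counts bytes, so a two-byte UTF-8 character such as "é" is stored with type byte `0x42`.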
  393. ## Binary data
  394. The type bytes `0xc0` to `0xc7` allow storing arbitrary binary byte
  395. sequences as a VPack value. The format is as follows: If V is the type
  396. byte, then V - `0xbf` bytes follow it to make a little endian unsigned
  397. integer representing the length of the binary data, which directly
  398. follows these length bytes. No alignment is guaranteed. The content is
  399. entirely up to the user.
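As a sketch, a binary blob can be written and read like this in Python. The function names are hypothetical, and the encoder picks the smallest length field that fits, which is a choice of this sketch.

```python
def encode_binary(data):
    n = len(data)
    size = max(1, (n.bit_length() + 7) // 8)  # bytes needed for the length
    if size > 8:
        raise OverflowError("blob too long")
    return bytes([0xBF + size]) + n.to_bytes(size, "little") + data

def decode_binary(buf):
    v = buf[0]
    if not 0xC0 <= v <= 0xC7:
        raise ValueError("not a binary blob")
    size = v - 0xBF
    n = int.from_bytes(buf[1:1 + size], "little")
    return bytes(buf[1 + size:1 + size + n])
```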
  400. ## Packed BCD long floats
  401. These types are used to represent arbitrary precision decimal numbers.
  402. There are different types for positive and negative numbers. The overall
  403. format of these values is:
  404. one of 0xc8 - 0xcf (positive) or of 0xd0 - 0xd7 (negative)
  405. LENGTH OF MANTISSA in bytes
  406. EXPONENT (as 4-byte little endian signed two's complement integer)
  407. MANTISSA (as packed BCD-encoded integer, big-endian)
  408. The type byte describes the sign of the number as well as the number of
  409. bytes used to specify the byte length of the mantissa. As usual, if V is
  410. the type byte, then V - `0xc7` (in the positive case) or V - `0xcf` (in the
  411. negative case) bytes are used for the length of the mantissa, stored as
  412. little endian unsigned integer directly after the type byte. After
  413. this follow exactly 4 bytes (little endian signed two's complement
  414. integer) to specify the exponent. After the exponent, the actual
  415. mantissa bytes follow.
  416. Packed BCD is used, so that each byte stores exactly 2 decimal digits as
  417. in `0x34` for the decimal digits 34. Therefore, the mantissa always has an
  418. even number of decimal digits. Note that the mantissa is stored in big
  419. endian form, to make parsing and dumping efficient. This leads to the
  420. "unholy nibble problem": When a JSON parser sees the beginning of a
  421. longish number, it does not know whether an even or odd number of digits
  422. follow. However, for efficiency reasons it wants to start writing bytes
  423. to the output as it reads the input. This is where the exponent comes
  424. to the rescue, which is illustrated by the following example.
  425. 12345 decimal can be encoded as:
  426. c8 03 00 00 00 00 01 23 45
  427. c8 03 ff ff ff ff 12 34 50
  428. The former encoding puts a leading 0 in the first byte and uses exponent
  429. 0, the latter encoding directly starts putting two decimal digits in one
  430. byte and then in the end has to "erase" the trailing 0 by using exponent
  431. -1, encoded by the 4 byte sequence `ff ff ff ff`.
  432. Therefore, the unholy nibble problem is solved and parsing (and indeed
  433. dumping) can be efficient.
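Decoding such a value can be sketched with Python's `decimal` module. The function name `decode_bcd` is hypothetical; building the `Decimal` from a digit tuple avoids any rounding by the decimal context.

```python
from decimal import Decimal

def decode_bcd(buf):
    v = buf[0]
    if 0xC8 <= v <= 0xCF:
        sign, len_bytes = 0, v - 0xC7  # positive
    elif 0xD0 <= v <= 0xD7:
        sign, len_bytes = 1, v - 0xCF  # negative
    else:
        raise ValueError("not a packed BCD float")
    mantissa_len = int.from_bytes(buf[1:1 + len_bytes], "little")
    pos = 1 + len_bytes
    exponent = int.from_bytes(buf[pos:pos + 4], "little", signed=True)
    pos += 4
    digits = []
    for b in buf[pos:pos + mantissa_len]:
        # Big-endian packed BCD: high nibble first, two digits per byte.
        digits.append(b >> 4)
        digits.append(b & 0x0F)
    return Decimal((sign, tuple(digits), exponent))
```

Both encodings of 12345 shown above decode to the same numeric value.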
  434. ## Tagging
  435. Types `0xee`-`0xef` are used for tagging of values to implement logical
  436. types.
  437. For example, if type `0x1c` did not exist, the database driver could
  438. serialize a timestamp object (Date in JavaScript, Instant in Java, etc)
  439. into a Unix timestamp, a 64-bit integer. Assuming the lack of schema,
  440. upon deserialization it would not be possible to tell an integer from
  441. a timestamp and deserialize the value accordingly.
  442. Type tagging resolves this by attaching an integer tag to values that
  443. can then be read when deserializing the value, e.g. that tag=1 is a
  444. timestamp and the relevant timestamp class should be used.
  445. The tag values are specified separately and applications can also
  446. specify their own to have the database driver deserialize their specific
  447. data types into the appropriate classes (including models).
  448. Essentially this is object-relational mapping for parts of documents.
  449. The format of the type is:
  450. 0xee
  451. TAG number in 1 byte
  452. sub VPack value
  453. or
  454. 0xef
  455. TAG number in 8 bytes, little-endian encoding
  456. sub VPack value
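Reading the tag header might look like this in Python. This is a sketch with a hypothetical function name; interpreting the tag number is left to the application.

```python
def split_tag(buf):
    """Return (tag, offset_of_sub_value) for a tagged value."""
    if buf[0] == 0xEE:
        return buf[1], 2  # 1-byte tag
    if buf[0] == 0xEF:
        return int.from_bytes(buf[1:9], "little"), 9  # 8-byte tag
    raise ValueError("not a tagged value")
```

The sub VPack value starts at the returned offset and is decoded like any other value.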
  457. ## Custom types
  458. Note that custom types should usually not be used for data exchange but
  459. only internally in systems. Nevertheless, the design of this part of
  460. the specification is made such that it is possible by generic methods
  461. to derive the byte length of each custom data type.
  462. The following user-defined types exist:
  463. - `0xf0` : 1 byte payload, directly following the type byte
  464. - `0xf1` : 2 bytes payload, directly following the type byte
  465. - `0xf2` : 4 bytes payload, directly following the type byte
  466. - `0xf3` : 8 bytes payload, directly following the type byte
  467. - `0xf4`-`0xf6` : length of the payload is described by a single further
  468. unsigned byte directly following the type byte, the
  469. payload of that many bytes follows
  470. - `0xf7`-`0xf9` : length of the payload is described by two bytes (little
  471. endian unsigned integer) directly following the type
  472. byte, the payload of that many bytes follows
  473. - `0xfa`-`0xfc` : length of the payload is described by four bytes (little
  474. endian unsigned integer) directly following the type
  475. byte, the payload of that many bytes follows
  476. - `0xfd`-`0xff` : length of the payload is described by eight bytes (little
  477. endian unsigned integer) directly following the type
  478. byte, the payload of that many bytes follows
  479. Note: In types `0xf4` to `0xff` the "payload" refers to the actual data not
  480. including the length specification.
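The byte length of any custom value can be derived generically from the list above, as in this Python sketch. The function name `custom_byte_length` is hypothetical.

```python
def custom_byte_length(buf):
    """Total length in bytes of a custom-typed value, type byte included."""
    v = buf[0]
    if 0xF0 <= v <= 0xF3:
        # Fixed payload of 1, 2, 4 or 8 bytes.
        return 1 + (1 << (v - 0xF0))
    if 0xF4 <= v <= 0xFF:
        # Length field of 1, 2, 4 or 8 bytes, three type bytes per width.
        width = 1 << ((v - 0xF4) // 3)
        payload = int.from_bytes(buf[1:1 + width], "little")
        return 1 + width + payload
    raise ValueError("not a custom type")
```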
  481. ## Portability
  482. Serialized booleans, integers, strings, arrays, objects etc. all have a
  483. defined endianness and length, which is platform-independent. These types are
  484. fully portable in serialized VelocyPack.
  485. There are still a few caveats when it comes to portability:
  486. It is possible to build up very large values on a 64 bit system, but it may not be
  487. possible to read them back on a 32 bit system. This is because the maximum memory
  488. allocation size on a 32 bit system may be severely limited compared to a 64 bit system,
  489. i.e. a 32 bit OS may simply not allow allocating buffers larger than 4 GB. This
  490. is not a limitation of VelocyPack, but a limitation of 32 bit architectures.
  491. If all VelocyPack values are kept small enough so that they are well below the
  492. 32 bit length boundaries, this should not matter though.
  493. The VelocyPack type *External* contains just a raw pointer to memory, which should
  494. only be used during the buildup of VelocyPack values in memory. The *External* type
  495. is not supposed to be used in VelocyPack values that are serialized and stored
  496. persistently, and then later read back from persistence. Doing it anyway is not
  497. portable and will also pose a security risk.
  498. Not using the *External* type for any data that is serialized will avoid this problem
  499. entirely.
  500. The VelocyPack type *Custom* is completely user-defined, and there is no default
  501. implementation for them. So it is up to the embedder to make these custom type
  502. bindings portable if portability of them is a concern.
  503. VelocyPack *Double* values are serialized as integer equivalents in a specific way,
  504. and unserialized back into integers that overlay an IEEE-754 double-precision
  505. floating point value in memory. We found this to be sufficiently portable for our
  506. needs, although at least in theory there may be portability issues with some systems.
  507. The [following](https://en.wikipedia.org/wiki/Endianness#Floating_point) was used as
  508. a backing for our "reasonably portable in the real world" assumptions:
  509. > It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness. Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., implementing IEEE 754), one may in practice safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type.