| Internet Engineering Task Force (IETF) C. Bormann |
| Request for Comments: 7049 Universitaet Bremen TZI |
| Category: Standards Track P. Hoffman |
| ISSN: 2070-1721 VPN Consortium |
| October 2013 |
| |
| |
| Concise Binary Object Representation (CBOR) |
| |
| Abstract |
| |
| The Concise Binary Object Representation (CBOR) is a data format |
| whose design goals include the possibility of extremely small code |
| size, fairly small message size, and extensibility without the need |
| for version negotiation. These design goals make it different from |
| earlier binary serializations such as ASN.1 and MessagePack. |
| |
| Status of This Memo |
| |
| This is an Internet Standards Track document. |
| |
| This document is a product of the Internet Engineering Task Force |
| (IETF). It represents the consensus of the IETF community. It has |
| received public review and has been approved for publication by the |
| Internet Engineering Steering Group (IESG). Further information on |
| Internet Standards is available in Section 2 of RFC 5741. |
| |
| Information about the current status of this document, any errata, |
| and how to provide feedback on it may be obtained at |
| http://www.rfc-editor.org/info/rfc7049. |
| |
| Copyright Notice |
| |
| Copyright (c) 2013 IETF Trust and the persons identified as the |
| document authors. All rights reserved. |
| |
| This document is subject to BCP 78 and the IETF Trust's Legal |
| Provisions Relating to IETF Documents |
| (http://trustee.ietf.org/license-info) in effect on the date of |
| publication of this document. Please review these documents |
| carefully, as they describe your rights and restrictions with respect |
| to this document. Code Components extracted from this document must |
| include Simplified BSD License text as described in Section 4.e of |
| the Trust Legal Provisions and are provided without warranty as |
| described in the Simplified BSD License. |
| |
| |
| |
| |
| |
| |
| Bormann & Hoffman Standards Track [Page 1] |
| |
| RFC 7049 CBOR October 2013 |
| |
| |
| Table of Contents |
| |
| 1. Introduction |
| 1.1. Objectives |
| 1.2. Terminology |
| 2. Specification of the CBOR Encoding |
| 2.1. Major Types |
| 2.2. Indefinite Lengths for Some Major Types |
| 2.2.1. Indefinite-Length Arrays and Maps |
| 2.2.2. Indefinite-Length Byte Strings and Text Strings |
| 2.3. Floating-Point Numbers and Values with No Content |
| 2.4. Optional Tagging of Items |
| 2.4.1. Date and Time |
| 2.4.2. Bignums |
| 2.4.3. Decimal Fractions and Bigfloats |
| 2.4.4. Content Hints |
| 2.4.4.1. Encoded CBOR Data Item |
| 2.4.4.2. Expected Later Encoding for CBOR-to-JSON Converters |
| 2.4.4.3. Encoded Text |
| 2.4.5. Self-Describe CBOR |
| 3. Creating CBOR-Based Protocols |
| 3.1. CBOR in Streaming Applications |
| 3.2. Generic Encoders and Decoders |
| 3.3. Syntax Errors |
| 3.3.1. Incomplete CBOR Data Items |
| 3.3.2. Malformed Indefinite-Length Items |
| 3.3.3. Unknown Additional Information Values |
| 3.4. Other Decoding Errors |
| 3.5. Handling Unknown Simple Values and Tags |
| 3.6. Numbers |
| 3.7. Specifying Keys for Maps |
| 3.8. Undefined Values |
| 3.9. Canonical CBOR |
| 3.10. Strict Mode |
| 4. Converting Data between CBOR and JSON |
| 4.1. Converting from CBOR to JSON |
| 4.2. Converting from JSON to CBOR |
| 5. Future Evolution of CBOR |
| 5.1. Extension Points |
| 5.2. Curating the Additional Information Space |
| 6. Diagnostic Notation |
| 6.1. Encoding Indicators |
| 7. IANA Considerations |
| 7.1. Simple Values Registry |
| 7.2. Tags Registry |
| 7.3. Media Type ("MIME Type") |
| 7.4. CoAP Content-Format |
| 7.5. The +cbor Structured Syntax Suffix Registration |
| 8. Security Considerations |
| 9. Acknowledgements |
| 10. References |
| 10.1. Normative References |
| 10.2. Informative References |
| Appendix A. Examples |
| Appendix B. Jump Table |
| Appendix C. Pseudocode |
| Appendix D. Half-Precision |
| Appendix E. Comparison of Other Binary Formats to CBOR's Design Objectives |
| E.1. ASN.1 DER, BER, and PER |
| E.2. MessagePack |
| E.3. BSON |
| E.4. UBJSON |
| E.5. MSDTP: RFC 713 |
| E.6. Conciseness on the Wire |
| |
| 1. Introduction |
| |
| There are hundreds of standardized formats for binary representation |
| of structured data (also known as binary serialization formats). Of |
| those, some are for specific domains of information, while others are |
| generalized for arbitrary data. In the IETF, probably the best-known |
| formats in the latter category are ASN.1's BER and DER [ASN.1]. |
| |
| The format defined here follows some specific design goals that are |
| not well met by current formats. The underlying data model is an |
| extended version of the JSON data model [RFC4627]. It is important |
| to note that this is not a proposal that the grammar in RFC 4627 be |
| extended in general, since doing so would cause a significant |
| backwards incompatibility with already deployed JSON documents. |
| Instead, this document simply defines its own data model that starts |
| from JSON. |
| |
| Appendix E lists some existing binary formats and discusses how well |
| they do or do not fit the design objectives of the Concise Binary |
| Object Representation (CBOR). |
| |
| 1.1. Objectives |
| |
| The objectives of CBOR, roughly in decreasing order of importance, |
| are: |
| |
| 1. The representation must be able to unambiguously encode most |
| common data formats used in Internet standards. |
| |
| * It must represent a reasonable set of basic data types and |
| structures using binary encoding. "Reasonable" here is |
| largely influenced by the capabilities of JSON, with the major |
| addition of binary byte strings. The structures supported are |
| limited to arrays and trees; loops and lattice-style graphs |
| are not supported. |
| |
| * There is no requirement that all data formats be uniquely |
| encoded; that is, it is acceptable that the number "7" might |
| be encoded in multiple different ways. |
| |
| 2. The code for an encoder or decoder must be able to be compact in |
| order to support systems with very limited memory, processor |
| power, and instruction sets. |
| |
| * An encoder and a decoder need to be implementable in a very |
| small amount of code (for example, in class 1 constrained |
| nodes as defined in [CNN-TERMS]). |
| |
| * The format should use contemporary machine representations of |
| data (for example, not requiring binary-to-decimal |
| conversion). |
| |
| 3. Data must be able to be decoded without a schema description. |
| |
| * Similar to JSON, encoded data should be self-describing so |
| that a generic decoder can be written. |
| |
| 4. The serialization must be reasonably compact, but data |
| compactness is secondary to code compactness for the encoder and |
| decoder. |
| |
| * "Reasonable" here is bounded above in size by JSON, and bounded |
| below by the need to keep implementation complexity low. |
| Using either general compression schemes or extensive |
| bit-fiddling violates the complexity goals. |
| |
| 5. The format must be applicable to both constrained nodes and high- |
| volume applications. |
| |
| * This means it must be reasonably frugal in CPU usage for both |
| encoding and decoding. This is relevant both for constrained |
| nodes and for potential usage in applications with a very high |
| volume of data. |
| |
| 6. The format must support all JSON data types for conversion to and |
| from JSON. |
| |
| * It must support a reasonable level of conversion as long as |
| the data represented is within the capabilities of JSON. It |
| must be possible to define a unidirectional mapping towards |
| JSON for all types of data. |
| |
| 7. The format must be extensible, and the extended data must be |
| decodable by earlier decoders. |
| |
| * The format is designed for decades of use. |
| |
| * The format must support a form of extensibility that allows |
| fallback so that a decoder that does not understand an |
| extension can still decode the message. |
| |
| * The format must be able to be extended in the future by later |
| IETF standards. |
| |
| 1.2. Terminology |
| |
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", |
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this |
| document are to be interpreted as described in RFC 2119, BCP 14 |
| [RFC2119] and indicate requirement levels for compliant CBOR |
| implementations. |
| |
| The term "byte" is used in its now-customary sense as a synonym for |
| "octet". All multi-byte values are encoded in network byte order |
| (that is, most significant byte first, also known as "big-endian"). |
| |
| This specification makes use of the following terminology: |
| |
| Data item: A single piece of CBOR data. The structure of a data |
| item may contain zero, one, or more nested data items. The term |
| is used both for the data item in representation format and for |
| the abstract idea that can be derived from that by a decoder. |
| |
| Decoder: A process that decodes a CBOR data item and makes it |
| available to an application. Formally speaking, a decoder |
| contains a parser to break up the input using the syntax rules of |
| CBOR, as well as a semantic processor to prepare the data in a |
| form suitable to the application. |
| |
| Encoder: A process that generates the representation format of a |
| CBOR data item from application information. |
| |
| Data Stream: A sequence of zero or more data items, not further |
| assembled into a larger containing data item. The independent |
| data items that make up a data stream are sometimes also referred |
| to as "top-level data items". |
| |
| Well-formed: A data item that follows the syntactic structure of |
| CBOR. A well-formed data item uses the initial bytes and the byte |
| strings and/or data items that are implied by their values as |
| defined in CBOR and is not followed by extraneous data. |
| |
| Valid: A data item that is well-formed and also follows the semantic |
| restrictions that apply to CBOR data items. |
| |
| Stream decoder: A process that decodes a data stream and makes each |
| of the data items in the sequence available to an application as |
| they are received. |
| |
| Where bit arithmetic or data types are explained, this document uses |
| the notation familiar from the programming language C, except that |
| "**" denotes exponentiation. Similar to the "0x" notation for |
| hexadecimal numbers, numbers in binary notation are prefixed with |
| "0b". Underscores can be added to such a number solely for |
| readability, so 0b00100001 (0x21) might be written 0b001_00001 to |
| emphasize the desired interpretation of the bits in the byte; in this |
| case, it is split into three bits and five bits. |
| |
| 2. Specification of the CBOR Encoding |
| |
| A CBOR-encoded data item is structured and encoded as described in |
| this section. The encoding is summarized in Table 5. |
| |
| The initial byte of each data item contains both information about |
| the major type (the high-order 3 bits, described in Section 2.1) and |
| additional information (the low-order 5 bits). When the value of the |
| additional information is less than 24, it is directly used as a |
| small unsigned integer. When it is 24 to 27, the additional bytes |
| for a variable-length integer immediately follow; the values 24 to 27 |
| of the additional information specify that its length is a 1-, 2-, |
| 4-, or 8-byte unsigned integer, respectively. Additional information |
| value 31 is used for indefinite-length items, described in |
| Section 2.2. Additional information values 28 to 30 are reserved for |
| future expansion. |
| |
| In all additional information values, the resulting integer is |
| interpreted depending on the major type. It may represent the actual |
| data: for example, in integer types, the resulting integer is used |
| for the value itself. It may instead supply length information: for |
| example, in byte strings it gives the length of the byte string data |
| that follows. |
| |
| A CBOR decoder implementation can be based on a jump table with all |
| 256 defined values for the initial byte (Table 5). A decoder in a |
| constrained implementation can instead use the structure of the |
| initial byte and following bytes for more compact code (see |
| Appendix C for a rough impression of how this could look). |
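
The split of the initial byte described above can be illustrated with a short Python sketch. It is not part of the specification: the function name decode_head and its return shape are assumptions made for illustration only. The sketch returns the raw unsigned integer that the additional information denotes; how that integer is interpreted (value, length, tag, or float bits) still depends on the major type.

```python
def decode_head(data: bytes, offset: int = 0):
    """Split the initial byte and read the integer that may follow it.

    Hypothetical helper. Returns a tuple
    (major_type, additional_info, argument, next_offset); argument is
    None for additional information 31 (indefinite length / "break").
    """
    initial = data[offset]
    major_type = initial >> 5            # high-order 3 bits
    additional = initial & 0b000_11111   # low-order 5 bits
    offset += 1
    if additional < 24:
        # The additional information is itself the small unsigned integer.
        return major_type, additional, additional, offset
    if additional <= 27:
        size = 1 << (additional - 24)    # 24..27 -> 1, 2, 4, or 8 bytes
        if offset + size > len(data):
            raise ValueError("incomplete data item")
        argument = int.from_bytes(data[offset:offset + size], "big")
        return major_type, additional, argument, offset + size
    if additional == 31:
        return major_type, additional, None, offset
    raise ValueError("additional information 28..30 is reserved")
```

For instance, decode_head(bytes.fromhex("1901f4")) yields major type 0 with the argument 500, consuming three bytes.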
| |
| 2.1. Major Types |
| |
| The following lists the major types and the additional information |
| and other bytes associated with the type. |
| |
| Major type 0: an unsigned integer. The 5-bit additional information |
| is either the integer itself (for additional information values 0 |
| through 23) or the length of additional data. Additional |
| information 24 means the value is represented in an additional |
| uint8_t, 25 means a uint16_t, 26 means a uint32_t, and 27 means a |
| uint64_t. For example, the integer 10 is denoted as the one byte |
| 0b000_01010 (major type 0, additional information 10). The |
| integer 500 would be 0b000_11001 (major type 0, additional |
| information 25) followed by the two bytes 0x01f4, which is 500 in |
| decimal. |
| |
| Major type 1: a negative integer. The encoding follows the rules |
| for unsigned integers (major type 0), except that the value is |
| then -1 minus the encoded unsigned integer. For example, the |
| integer -500 would be 0b001_11001 (major type 1, additional |
| information 25) followed by the two bytes 0x01f3, which is 499 in |
| decimal. |
| |
| Major type 2: a byte string. The string's length in bytes is |
| represented following the rules for unsigned integers (major type |
| 0). For example, a byte string whose length is 5 would have an |
| initial byte of 0b010_00101 (major type 2, additional information |
| 5 for the length), followed by 5 bytes of binary content. A byte |
| string whose length is 500 would have 3 initial bytes of |
| 0b010_11001 (major type 2, additional information 25 to indicate a |
| two-byte length) followed by the two bytes 0x01f4 for a length of |
| 500, followed by 500 bytes of binary content. |
| |
| Major type 3: a text string, specifically a string of Unicode |
| characters that is encoded as UTF-8 [RFC3629]. The format of this |
| type is identical to that of byte strings (major type 2), that is, |
| as with major type 2, the length gives the number of bytes. This |
| type is provided for systems that need to interpret or display |
| human-readable text, and allows the differentiation between |
| unstructured bytes and text that has a specified repertoire and |
| encoding. In contrast to formats such as JSON, the Unicode |
| characters in this type are never escaped. Thus, a newline |
| character (U+000A) is always represented in a string as the byte |
| 0x0a, and never as the bytes 0x5c6e (the characters "\" and "n") |
| or as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and |
| "a"). |
| |
| Major type 4: an array of data items. Arrays are also called lists, |
| sequences, or tuples. The array's length follows the rules for |
| byte strings (major type 2), except that the length denotes the |
| number of data items, not the length in bytes that the array takes |
| up. Items in an array do not need to all be of the same type. |
| For example, an array that contains 10 items of any type would |
| have an initial byte of 0b100_01010 (major type of 4, additional |
| information of 10 for the length) followed by the 10 remaining |
| items. |
| |
| Major type 5: a map of pairs of data items. Maps are also called |
| tables, dictionaries, hashes, or objects (in JSON). A map consists |
| of pairs of data items, each pair being a key that is immediately |
| followed by a value. The map's length follows |
| the rules for byte strings (major type 2), except that the length |
| denotes the number of pairs, not the length in bytes that the map |
| takes up. For example, a map that contains 9 pairs would have an |
| initial byte of 0b101_01001 (major type of 5, additional |
| information of 9 for the number of pairs) followed by the 18 |
| remaining items. The first item is the first key, the second item |
| is the first value, the third item is the second key, and so on. |
| A map that has duplicate keys may be well-formed, but it is not |
| valid, and thus it causes indeterminate decoding; see also |
| Section 3.7. |
| |
| Major type 6: optional semantic tagging of other major types. See |
| Section 2.4. |
| |
| Major type 7: floating-point numbers and simple data types that need |
| no content, as well as the "break" stop code. See Section 2.3. |
| |
| These eight major types lead to a simple table showing which of the |
| 256 possible values for the initial byte of a data item are used |
| (Table 5). |
| |
| In major types 6 and 7, many of the possible values are reserved for |
| future specification. See Section 7 for more information on these |
| values. |
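
As an illustration of major types 0 through 5, the following Python sketch encodes native Python values using definite lengths. The names encode_head and encode are hypothetical, and the mapping from Python types to major types is an assumption of this sketch. It always picks the shortest form for the integer, which the encoding permits but does not require.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode the initial byte(s) for a major type and an unsigned integer."""
    if not 0 <= major_type <= 6:
        raise ValueError("this sketch handles major types 0..6 only")
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument does not fit in 64 bits")
    if argument < 24:
        return bytes([major_type << 5 | argument])
    # Additional information 24..27 selects a 1-, 2-, 4-, or 8-byte integer.
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | additional]) + argument.to_bytes(size, "big")


def encode(item) -> bytes:
    """Encode ints, bytes, str, list, and dict (major types 0..5)."""
    if isinstance(item, bool):
        raise TypeError("simple values (major type 7) are outside this sketch")
    if isinstance(item, int):
        if item >= 0:
            return encode_head(0, item)
        return encode_head(1, -1 - item)      # value is -1 minus the argument
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item
    if isinstance(item, str):
        utf8 = item.encode("utf-8")           # never escaped, length in bytes
        return encode_head(3, len(utf8)) + utf8
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body  # length counts pairs
    raise TypeError(f"unsupported type: {type(item).__name__}")
```

With these definitions, encode(500).hex() gives "1901f4" and encode(-500).hex() gives "3901f3", matching the examples for major types 0 and 1.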
| |
| 2.2. Indefinite Lengths for Some Major Types |
| |
| Four CBOR items (arrays, maps, byte strings, and text strings) can be |
| encoded with an indefinite length using additional information value |
| 31. This is useful if the encoding of the item needs to begin before |
| the number of items inside the array or map, or the total length of |
| the string, is known. (The application of this is often referred to |
| as "streaming" within a data item.) |
| |
| Indefinite-length arrays and maps are dealt with differently than |
| indefinite-length byte strings and text strings. |
| |
| 2.2.1. Indefinite-Length Arrays and Maps |
| |
| Indefinite-length arrays and maps are simply opened without |
| indicating the number of data items that will be included in the |
| array or map, using the additional information value of 31. The |
| initial major type and additional information byte is followed by the |
| elements of the array or map, just as they would be in other arrays |
| or maps. The end of the array or map is indicated by encoding a |
| "break" stop code in a place where the next data item would normally |
| have been included. The "break" is encoded with major type 7 and |
| additional information value 31 (0b111_11111) but is not itself a |
| data item: it is just a syntactic feature to close the array or map. |
| That is, the "break" stop code comes after the last item in the array |
| or map, and it cannot occur anywhere else in place of a data item. |
| In this way, indefinite-length arrays and maps look identical to |
| other arrays and maps except for beginning with the additional |
| information value 31 and ending with the "break" stop code. |
| |
| Arrays and maps with indefinite lengths allow any number of items |
| (for arrays) and key/value pairs (for maps) to be given before the |
| "break" stop code. There is no restriction against nesting |
| indefinite-length array or map items. A "break" only terminates a |
| single item, so nested indefinite-length items need exactly as many |
| "break" stop codes as there are type bytes starting an indefinite- |
| length item. |
| |
| For example, assume an encoder wants to represent the abstract array |
| [1, [2, 3], [4, 5]]. The definite-length encoding would be |
| 0x8301820203820405: |
| |
| 83 -- Array of length 3 |
| 01 -- 1 |
| 82 -- Array of length 2 |
| 02 -- 2 |
| 03 -- 3 |
| 82 -- Array of length 2 |
| 04 -- 4 |
| 05 -- 5 |
| |
| Indefinite-length encoding could be applied independently to each of |
| the three arrays encoded in this data item, as required, leading to |
| representations such as: |
| |
| 0x9f018202039f0405ffff |
| 9F -- Start indefinite-length array |
| 01 -- 1 |
| 82 -- Array of length 2 |
| 02 -- 2 |
| 03 -- 3 |
| 9F -- Start indefinite-length array |
| 04 -- 4 |
| 05 -- 5 |
| FF -- "break" (inner array) |
| FF -- "break" (outer array) |
| |
| |
| 0x9f01820203820405ff |
| 9F -- Start indefinite-length array |
| 01 -- 1 |
| 82 -- Array of length 2 |
| 02 -- 2 |
| 03 -- 3 |
| 82 -- Array of length 2 |
| 04 -- 4 |
| 05 -- 5 |
| FF -- "break" |
| |
| |
| 0x83018202039f0405ff |
| 83 -- Array of length 3 |
| 01 -- 1 |
| 82 -- Array of length 2 |
| 02 -- 2 |
| 03 -- 3 |
| 9F -- Start indefinite-length array |
| 04 -- 4 |
| 05 -- 5 |
| FF -- "break" |
| |
| |
| 0x83019f0203ff820405 |
| 83 -- Array of length 3 |
| 01 -- 1 |
| 9F -- Start indefinite-length array |
| 02 -- 2 |
| 03 -- 3 |
| FF -- "break" |
| 82 -- Array of length 2 |
| 04 -- 4 |
| 05 -- 5 |
| |
| |
| An example of an indefinite-length map (that happens to have two |
| key/value pairs) might be: |
| |
| 0xbf6346756ef563416d7421ff |
| BF -- Start indefinite-length map |
| 63 -- First key, UTF-8 string length 3 |
| 46756e -- "Fun" |
| F5 -- First value, true |
| 63 -- Second key, UTF-8 string length 3 |
| 416d74 -- "Amt" |
| 21 -- -2 |
| FF -- "break" |
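
The examples above can be checked with a small recursive decoder. The following Python sketch is illustrative only: decode_item and the BREAK sentinel are names invented here, the decoder assumes well-formed input (it omits most bounds checks), and it covers only integers, text strings, arrays, maps, and the simple values false and true.

```python
BREAK = object()  # sentinel for the "break" stop code, which is not a data item


def decode_item(data: bytes, pos: int = 0, allow_break: bool = False):
    """Decode one item starting at pos; return (value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        if not allow_break:
            raise ValueError('"break" outside an indefinite-length item')
        return BREAK, pos
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    elif info == 31 and major in (4, 5):
        arg = None                      # indefinite-length array or map
    else:
        raise ValueError("not handled in this sketch")
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major == 3:
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:
        items = []
        while arg is None or len(items) < arg:
            value, pos = decode_item(data, pos, allow_break=arg is None)
            if value is BREAK:
                break                   # closes only this array
            items.append(value)
        return items, pos
    if major == 5:
        result, pairs = {}, 0
        while arg is None or pairs < arg:
            key, pos = decode_item(data, pos, allow_break=arg is None)
            if key is BREAK:
                break                   # "break" may only replace a key
            result[key], pos = decode_item(data, pos)
            pairs += 1
        return result, pos
    if major == 7 and info in (20, 21):
        return info == 21, pos
    raise ValueError("not handled in this sketch")
```

All five encodings of the nested array decode to the same Python list, because each "break" closes only the innermost open indefinite-length item.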
| |
| 2.2.2. Indefinite-Length Byte Strings and Text Strings |
| |
| Indefinite-length byte strings and text strings are actually a |
| concatenation of zero or more definite-length byte or text strings |
| ("chunks") that are together treated as one contiguous string. |
| Indefinite-length strings are opened with the major type and |
| additional information value of 31, but what follows is a series of |
| byte or text strings that have definite lengths (the chunks). The |
| end of the series of chunks is indicated by encoding the "break" stop |
| code (0b111_11111) in a place where the next chunk in the series |
| would occur. The contents of the chunks are concatenated together, |
| and the overall length of the indefinite-length string will be the |
| sum of the lengths of all of the chunks. In summary, an indefinite- |
| length string is encoded similarly to how an indefinite-length array |
| of its chunks would be encoded, except that the major type of the |
| indefinite-length string is that of a (text or byte) string and |
| matches the major types of its chunks. |
| |
| For indefinite-length byte strings, every data item (chunk) between |
| the indefinite-length indicator and the "break" MUST be a definite- |
| length byte string item; if the parser sees any item type other than |
| a byte string before it sees the "break", it is an error. |
| |
| For example, assume the sequence: |
| |
| 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 |
| |
| 5F -- Start indefinite-length byte string |
| 44 -- Byte string of length 4 |
| aabbccdd -- Bytes content |
| 43 -- Byte string of length 3 |
| eeff99 -- Bytes content |
| FF -- "break" |
| |
| After decoding, this results in a single byte string with seven |
| bytes: 0xaabbccddeeff99. |
| |
| Text strings with indefinite lengths act the same as byte strings |
| with indefinite lengths, except that all their chunks MUST be |
| definite-length text strings. Note that this implies that the bytes |
| of a single UTF-8 character cannot be spread between chunks: a new |
| chunk can only be started at a character boundary. |
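
The chunk rules for indefinite-length strings can be sketched in Python as follows. The function name decode_indefinite_string is hypothetical, and the sketch handles only the indefinite-length form. It rejects chunks of a different major type, rejects nested indefinite-length chunks, and checks each text chunk as UTF-8 on its own, so a character split across two chunks is reported as an error.

```python
def decode_indefinite_string(data: bytes, pos: int = 0):
    """Decode an indefinite-length byte or text string; return (value, next_pos)."""
    initial = data[pos]
    major = initial >> 5
    if major not in (2, 3) or initial & 0x1F != 31:
        raise ValueError("not an indefinite-length string")
    pos += 1
    chunks = []
    while True:
        head = data[pos]
        pos += 1
        if head == 0xFF:                 # "break" ends the series of chunks
            break
        info = head & 0x1F
        if head >> 5 != major or info == 31:
            raise ValueError("chunk must be a definite-length string of the same major type")
        if info < 24:
            length = info
        elif info <= 27:
            size = 1 << (info - 24)
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("reserved additional information")
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("incomplete chunk")
        if major == 3:
            chunk.decode("utf-8")        # each text chunk must be valid by itself
        chunks.append(chunk)
        pos += length
    joined = b"".join(chunks)
    return (joined.decode("utf-8") if major == 3 else joined), pos
```

Applied to the byte sequence in the example above, the sketch returns the seven bytes 0xaabbccddeeff99.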
| |
| 2.3. Floating-Point Numbers and Values with No Content |
| |
| Major type 7 is for two types of data: floating-point numbers and |
| "simple values" that do not need any content. Each value of the |
| 5-bit additional information in the initial byte has its own separate |
| meaning, as defined in Table 1. Like the major types for integers, |
| items of this major type do not carry content data; all the |
| information is in the initial bytes. |
| |
| +-------------+--------------------------------------------------+ |
| | 5-Bit Value | Semantics | |
| +-------------+--------------------------------------------------+ |
| | 0..23 | Simple value (value 0..23) | |
| | | | |
| | 24 | Simple value (value 32..255 in following byte) | |
| | | | |
| | 25 | IEEE 754 Half-Precision Float (16 bits follow) | |
| | | | |
| | 26 | IEEE 754 Single-Precision Float (32 bits follow) | |
| | | | |
| | 27 | IEEE 754 Double-Precision Float (64 bits follow) | |
| | | | |
| | 28-30 | (Unassigned) | |
| | | | |
| | 31 | "break" stop code for indefinite-length items | |
| +-------------+--------------------------------------------------+ |
| |
| Table 1: Values for Additional Information in Major Type 7 |
| |
| As with all other major types, the 5-bit value 24 signifies a single- |
| byte extension: it is followed by an additional byte to represent the |
| simple value. (To minimize confusion, only the values 32 to 255 are |
| used.) This maintains the structure of the initial bytes: as for the |
| other major types, the length of these always depends on the |
| additional information in the first byte. Table 2 lists the values |
| assigned and available for simple types. |
| |
| +---------+-----------------+ |
| | Value | Semantics | |
| +---------+-----------------+ |
| | 0..19 | (Unassigned) | |
| | | | |
| | 20 | False | |
| | | | |
| | 21 | True | |
| | | | |
| | 22 | Null | |
| | | | |
| | 23 | Undefined value | |
| | | | |
| | 24..31 | (Reserved) | |
| | | | |
| | 32..255 | (Unassigned) | |
| +---------+-----------------+ |
| |
| Table 2: Simple Values |
| |
| The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit |
| IEEE 754 binary floating-point values. These floating-point values |
| are encoded in the additional bytes of the appropriate size. (See |
| Appendix D for some information about 16-bit floating point.) |
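
A decoder for major type 7 might be sketched in Python as shown below, relying on the struct module's "e", "f", and "d" format characters for the three IEEE 754 sizes. The name decode_major7 and the ("simple", n) tuple used for simple values without a natural Python equivalent (including the undefined value, 23) are arbitrary choices made for this illustration.

```python
import struct

SIMPLE = {20: False, 21: True, 22: None}
FLOAT_FORMATS = {25: ">e", 26: ">f", 27: ">d"}   # big-endian half, single, double


def decode_major7(data: bytes, pos: int = 0):
    """Decode a major type 7 item; return (value, next_pos)."""
    initial = data[pos]
    if initial >> 5 != 7:
        raise ValueError("not major type 7")
    info = initial & 0x1F
    pos += 1
    if info in FLOAT_FORMATS:
        fmt = FLOAT_FORMATS[info]
        (value,) = struct.unpack_from(fmt, data, pos)
        return value, pos + struct.calcsize(fmt)
    if info == 24:
        simple = data[pos]
        if simple < 32:
            raise ValueError("only values 32..255 are used in the two-byte form")
        return ("simple", simple), pos + 1
    if info < 24:
        if info in SIMPLE:
            return SIMPLE[info], pos
        return ("simple", info), pos     # undefined (23) or unassigned
    if info == 31:
        raise ValueError('"break" is not a data item')
    raise ValueError("additional information 28..30 is unassigned")
```

For example, the three bytes 0xf93c00 decode to the half-precision value 1.0, and the single byte 0xf6 decodes to null (None in this sketch).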
| |
| 2.4. Optional Tagging of Items |
| |
| In CBOR, a data item can optionally be preceded by a tag to give it |
| additional semantics while retaining its structure. A tag has major |
| type 6; the integer encoded in its initial bytes identifies the tag, |
| and the (sole) tagged data item is carried as content data. If a |
| tag requires structured data, this structure is encoded into the |
| nested data item. The definition of a tag usually restricts what |
| kinds of nested data item or items can be carried by a tag. |
| |
| The initial bytes of the tag follow the rules for unsigned integers |
| (major type 0). The tag is followed by a single data item of any |
| type. For example, assume that a byte string of length 12 is marked |
| with a tag to indicate it is a positive bignum (Section 2.4.2). This |
| would be marked as 0b110_00010 (major type 6, additional information |
| 2 for the tag) followed by 0b010_01100 (major type 2, additional |
| information of 12 for the length) followed by the 12 bytes of the |
| bignum. |
| |
| Decoders do not need to understand tags, and thus tags may be of |
| little value in applications where the implementation creating a |
| particular CBOR data item and the implementation decoding that stream |
| know the semantic meaning of each item in the data flow. Their |
| primary purpose in this specification is to define common data types |
| such as dates. A secondary purpose is to allow optional tagging when |
| the decoder is a generic CBOR decoder that might be able to benefit |
| from hints about the content of items. Understanding the semantic |
| tags is optional for a decoder; it can just jump over the initial |
| bytes of the tag and interpret the tagged data item itself. |
| |
| A tag always applies to the data item that directly follows it. |
| Thus, if tag A is followed by tag B, which is followed by data item |
| C, tag A applies to the result of applying tag B on data item C. |
| That is, a tagged item is a data item consisting of a tag and a |
| value. The content of the tagged item is the data item (the value) |
| that is being tagged. |
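
The tagging rules can be illustrated with a Python sketch that wraps a byte string in tag 2 and that peels nested tags off the front of an encoded item. The names encode_head, encode_positive_bignum, and split_tags are hypothetical. The bignum helper assumes the content is the magnitude in network byte order, as Section 2.4.2 specifies.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode initial byte(s) using the unsigned-integer rules of major type 0."""
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_positive_bignum(n: int) -> bytes:
    """Tag 2 followed by a byte string holding n in network byte order."""
    if n < 0:
        raise ValueError("tag 2 is for non-negative values")
    magnitude = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    return encode_head(6, 2) + encode_head(2, len(magnitude)) + magnitude


def split_tags(data: bytes, pos: int = 0):
    """Return (tags, content_pos): tags outermost first, then where the content starts."""
    tags = []
    while data[pos] >> 5 == 6:
        info = data[pos] & 0x1F
        pos += 1
        if info < 24:
            tag = info
        elif info <= 27:
            size = 1 << (info - 24)
            tag = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("invalid additional information for a tag")
        tags.append(tag)
    return tags, pos
```

A 12-byte bignum such as 1 << 95 therefore starts with the bytes 0xc2 0x4c, as in the example above. For an item of the form "tag A, tag B, data item C", split_tags returns [A, B]: B applies to C, and A applies to the result.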
| |
| IANA maintains a registry of tag values as described in Section 7.2. |
| Table 3 provides a list of initial values, with definitions in the |
| rest of this section. |
| |
| +--------------+------------------+---------------------------------+ |
| | Tag | Data Item | Semantics | |
| +--------------+------------------+---------------------------------+ |
| | 0 | UTF-8 string | Standard date/time string; see | |
| | | | Section 2.4.1 | |
| | | | | |
| | 1 | multiple | Epoch-based date/time; see | |
| | | | Section 2.4.1 | |
| | | | | |
| | 2 | byte string | Positive bignum; see Section | |
| | | | 2.4.2 | |
| | | | | |
| | 3 | byte string | Negative bignum; see Section | |
| | | | 2.4.2 | |
| | | | | |
| | 4 | array | Decimal fraction; see Section | |
| | | | 2.4.3 | |
| | | | | |
| | 5 | array | Bigfloat; see Section 2.4.3 | |
| | | | | |
| | 6..20 | (Unassigned) | (Unassigned) | |
| | | | | |
| | 21 | multiple | Expected conversion to | |
| | | | base64url encoding; see | |
| | | | Section 2.4.4.2 | |
| | | | | |
| | 22 | multiple | Expected conversion to base64 | |
| | | | encoding; see Section 2.4.4.2 | |
| | | | | |
| | 23 | multiple | Expected conversion to base16 | |
| | | | encoding; see Section 2.4.4.2 | |
| | | | | |
| | 24 | byte string | Encoded CBOR data item; see | |
| | | | Section 2.4.4.1 | |
| | | | | |
| | 25..31 | (Unassigned) | (Unassigned) | |
| | | | | |
| | 32 | UTF-8 string | URI; see Section 2.4.4.3 | |
| | | | | |
| | 33 | UTF-8 string | base64url; see Section 2.4.4.3 | |
| | | | | |
| | 34 | UTF-8 string | base64; see Section 2.4.4.3 | |
| | | | | |
| | 35 | UTF-8 string | Regular expression; see | |
| | | | Section 2.4.4.3 | |
| | | | | |
| | 36 | UTF-8 string | MIME message; see Section | |
| | | | 2.4.4.3 | |
| | | | | |
| | 37..55798 | (Unassigned) | (Unassigned) | |
| | | | | |
| | 55799 | multiple | Self-describe CBOR; see | |
| | | | Section 2.4.5 | |
| | | | | |
| | 55800+ | (Unassigned) | (Unassigned) | |
| +--------------+------------------+---------------------------------+ |
| |
| Table 3: Values for Tags |
| |
| 2.4.1. Date and Time |
| |
| Tag value 0 is for date/time strings that follow the standard format |
| described in [RFC3339], as refined by Section 3.3 of [RFC4287]. |
| |
| Tag value 1 is for numerical representation of seconds relative to |
| 1970-01-01T00:00Z in UTC time. (For the non-negative values that the |
| Portable Operating System Interface (POSIX) defines, the number of |
| seconds is counted in the same way as for POSIX "seconds since the |
| epoch" [TIME_T].) The tagged item can be a positive or negative |
| integer (major types 0 and 1), or a floating-point number (major type |
| 7 with additional information 25, 26, or 27). Note that the number |
| can be negative (time before 1970-01-01T00:00Z) and, if a floating- |
| point number, indicate fractional seconds. |
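
As an illustration of tag 1, the following sketch encodes an epoch-based time as either an integer or a double-precision float. The function names `encode_head` and `encode_epoch` are hypothetical, and always using the 64-bit float format for non-integer input is an assumption made for brevity, not a requirement of this section.

```python
import struct


def encode_head(major: int, argument: int) -> bytes:
    """Encode a major type and argument using the shortest available form."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_epoch(seconds) -> bytes:
    """Encode seconds relative to 1970-01-01T00:00Z as a tag 1 item."""
    tag = encode_head(6, 1)
    if isinstance(seconds, int):
        if seconds >= 0:
            return tag + encode_head(0, seconds)       # major type 0
        return tag + encode_head(1, -1 - seconds)      # major type 1
    # Assumption: fractional seconds are always sent as a 64-bit float.
    return tag + b"\xfb" + struct.pack(">d", seconds)
```

A negative integer such as -1 (one second before the epoch) is encoded with major type 1, giving the two bytes c1 20.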
| |
| 2.4.2. Bignums |
| |
| Bignums are integers that do not fit into the basic integer |
| representations provided by major types 0 and 1. They are encoded as |
| a byte string data item, which is interpreted as an unsigned integer |
| n in network byte order. For tag value 2, the value of the bignum is |
| n. For tag value 3, the value of the bignum is -1 - n. Decoders |
| that understand these tags MUST be able to decode bignums that have |
| leading zeroes. |
| |
| For example, the number 18446744073709551616 (2**64) is represented |
| as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major |
| type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 |
| and eight bytes 0x00). In hexadecimal: |
| |
| C2 -- Tag 2 |
| 49 -- Byte string of length 9 |
| 010000000000000000 -- Bytes content |
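
A minimal sketch of the bignum rules, assuming payloads shorter than 24 bytes so that the length fits in the initial byte of the byte string. The names `encode_bignum` and `decode_bignum` are hypothetical.

```python
def encode_bignum(value: int) -> bytes:
    """Encode an integer as a tag 2 (positive) or tag 3 (negative) bignum."""
    tag, n = (2, value) if value >= 0 else (3, -1 - value)
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(payload) >= 24:
        raise ValueError("this sketch only handles payloads under 24 bytes")
    # 0xC0 | tag is major type 6; 0x40 | length is major type 2.
    return bytes([0xC0 | tag, 0x40 | len(payload)]) + payload


def decode_bignum(tag: int, payload: bytes) -> int:
    """Interpret the byte string content of a bignum; leading zeroes are fine."""
    n = int.from_bytes(payload, "big")      # network byte order
    if tag == 2:
        return n
    if tag == 3:
        return -1 - n
    raise ValueError("not a bignum tag")
```

Calling `encode_bignum(2**64)` yields the eleven bytes c2 49 01 00 00 00 00 00 00 00 00, and `decode_bignum` accepts the same content with extra leading zero bytes.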
| |
| 2.4.3. Decimal Fractions and Bigfloats |
| |
| Decimal fractions combine an integer mantissa with a base-10 scaling |
| factor. They are most useful if an application needs the exact |
| representation of a decimal fraction such as 1.1 because there is no |
| exact representation for many decimal fractions in binary floating |
| point. |
| |
| Bigfloats combine an integer mantissa with a base-2 scaling factor. |
| They are binary floating-point values that can exceed the range or |
| the precision of the three IEEE 754 formats supported by CBOR |
| (Section 2.3). Bigfloats may also be used by constrained |
| applications that need some basic binary floating-point capability |
| without the need for supporting IEEE 754. |
| |
| A decimal fraction or a bigfloat is represented as a tagged array |
| that contains exactly two integer numbers: an exponent e and a |
| mantissa m. Decimal fractions (tag 4) use base-10 exponents; the |
| value of a decimal fraction data item is m*(10**e). Bigfloats (tag |
| 5) use base-2 exponents; the value of a bigfloat data item is |
| m*(2**e). The exponent e MUST be represented in an integer of major |
| type 0 or 1, while the mantissa also can be a bignum (Section 2.4.2). |
| |
| An example of a decimal fraction is that the number 273.15 could be |
| represented as 0b110_00100 (major type of 6 for the tag, additional |
| information of 4 for the type of tag), followed by 0b100_00010 (major |
| type of 4 for the array, additional information of 2 for the length |
| of the array), followed by 0b001_00001 (major type of 1 for the first |
| integer, additional information of 1 for the value of -2), followed |
| by 0b000_11001 (major type of 0 for the second integer, additional |
| information of 25 for a two-byte value), followed by |
| 0b0110101010110011 (27315 in two bytes). In hexadecimal: |
| |
| C4 -- Tag 4 |
| 82 -- Array of length 2 |
| 21 -- -2 |
| 19 6ab3 -- 27315 |
| |
| An example of a bigfloat is that the number 1.5 could be represented |
| as 0b110_00101 (major type of 6 for the tag, additional information |
| of 5 for the type of tag), followed by 0b100_00010 (major type of 4 |
| for the array, additional information of 2 for the length of the |
| array), followed by 0b001_00000 (major type of 1 for the first |
| integer, additional information of 0 for the value of -1), followed |
| by 0b000_00011 (major type of 0 for the second integer, additional |
| information of 3 for the value of 3). In hexadecimal: |
| |
| C5 -- Tag 5 |
| 82 -- Array of length 2 |
| 20 -- -1 |
| 03 -- 3 |
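
The value formulas m*(10**e) and m*(2**e) can be checked with exact rational arithmetic. This is a minimal sketch in which the function name `tagged_fraction` is hypothetical and the array is assumed to have already been decoded into its two integers.

```python
from fractions import Fraction


def tagged_fraction(tag: int, exponent: int, mantissa: int) -> Fraction:
    """Return the exact value of a decoded [exponent, mantissa] array.

    Tag 4 (decimal fraction) uses base 10; tag 5 (bigfloat) uses base 2.
    """
    bases = {4: 10, 5: 2}
    if tag not in bases:
        raise ValueError("expected tag 4 or tag 5")
    return mantissa * Fraction(bases[tag]) ** exponent
```

With the two examples above, `tagged_fraction(4, -2, 27315)` is exactly 27315/100 and `tagged_fraction(5, -1, 3)` is exactly 3/2.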
| |
| Decimal fractions and bigfloats provide no representation of |
| Infinity, -Infinity, or NaN; if these are needed in place of a |
| decimal fraction or bigfloat, the IEEE 754 half-precision |
| representations from Section 2.3 can be used. For constrained |
| applications, where there is a choice between representing a specific |
| number as an integer and as a decimal fraction or bigfloat (such as |
| when the exponent is small and non-negative), there is a quality-of- |
| implementation expectation that the integer representation is used |
| directly. |
| |
| 2.4.4. Content Hints |
| |
| The tags in this section are for content hints that might be used by |
| generic CBOR processors. |
| |
| 2.4.4.1. Encoded CBOR Data Item |
| |
| Sometimes it is beneficial to carry an embedded CBOR data item that |
| is not meant to be decoded immediately at the time the enclosing data |
| item is being parsed. Tag 24 (CBOR data item) can be used to tag the |
| embedded byte string as a data item encoded in CBOR format. |
| |
| 2.4.4.2. Expected Later Encoding for CBOR-to-JSON Converters |
| |
| Tags 21 to 23 indicate that a byte string might require a specific |
| encoding when interoperating with a text-based representation. These |
| tags are useful when an encoder knows that the byte string data it is |
| writing is likely to be later converted to a particular JSON-based |
| usage. That usage specifies that some strings are encoded as base64, |
| base64url, and so on. The encoder uses byte strings instead of doing |
| the encoding itself to reduce the message size, to reduce the code |
| size of the encoder, or both. The encoder does not know whether or |
| not the converter will be generic, and therefore wants to say what it |
| believes is the proper way to convert binary strings to JSON. |
| |
| The data item tagged can be a byte string or any other data item. In |
| the latter case, the tag applies to all of the byte string data items |
| contained in the data item, except for those contained in a nested |
| data item tagged with an expected conversion. |
| |
| These three tag types suggest conversions to three of the base data |
| encodings defined in [RFC4648]. For base64url encoding, padding is |
| not used (see Section 3.2 of RFC 4648); that is, all trailing equals |
| signs ("=") are removed from the base64url-encoded string. Later |
| tags might be defined for other data encodings of RFC 4648 or for |
| other ways to encode binary data in strings. |
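
A CBOR-to-JSON converter could honor tags 21 to 23 roughly as follows. This is a sketch only: the names `CONVERTERS` and `expected_text` are hypothetical, and keeping the padding for tag 22 is an assumption, since the text above only addresses padding for base64url.

```python
import base64


def to_base64url(data: bytes) -> str:
    # Padding is not used for base64url: strip trailing "=" characters.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def to_base64(data: bytes) -> str:
    # Assumption: classic base64 keeps its padding.
    return base64.b64encode(data).decode("ascii")


def to_base16(data: bytes) -> str:
    return base64.b16encode(data).decode("ascii")


# Expected-conversion tag -> function producing the JSON string content.
CONVERTERS = {21: to_base64url, 22: to_base64, 23: to_base16}


def expected_text(tag: int, data: bytes) -> str:
    """Convert a byte string according to its enclosing conversion tag."""
    return CONVERTERS[tag](data)
```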
| |
| 2.4.4.3. Encoded Text |
| |
| Some text strings hold data that have formats widely used on the |
| Internet, and sometimes those formats can be validated and presented |
| to the application in appropriate form by the decoder. There are |
| tags for some of these formats. |
| |
| o Tag 32 is for URIs, as defined in [RFC3986]; |
| |
| o Tags 33 and 34 are for base64url- and base64-encoded text strings, |
| as defined in [RFC4648]; |
| |
| o Tag 35 is for regular expressions in Perl Compatible Regular |
| Expressions (PCRE) / JavaScript syntax [ECMA262]; |
| |
| o Tag 36 is for MIME messages (including all headers), as defined in |
| [RFC2045]. |
| |
| Note that tags 33 and 34 differ from 21 and 22 in that the data is |
| transported in base-encoded form for the former and in raw byte |
| string form for the latter. |
| |
| 2.4.5. Self-Describe CBOR |
| |
| In many applications, it will be clear from the context that CBOR is |
| being employed for encoding a data item. For instance, a specific |
| protocol might specify the use of CBOR, or a media type is indicated |
| that specifies its use. However, there may be applications where |
| such context information is not available, such as when CBOR data is |
| stored in a file and disambiguating metadata is not in use. Here, it |
| may help to have some distinguishing characteristics for the data |
| itself. |
| |
| Tag 55799 is defined for this purpose. It does not impart any |
| special semantics on the data item that follows; that is, the |
| semantics of a data item tagged with tag 55799 is exactly identical |
| to the semantics of the data item itself. |
| |
| The serialization of this tag is 0xd9d9f7, which appears not to be in |
| use as a distinguishing mark for frequently used file types. In |
| particular, it is not a valid start of a Unicode text in any Unicode |
| encoding if followed by a valid CBOR data item. |
| |
| For instance, a decoder might be able to parse both CBOR and JSON. |
| Such a decoder would need to mechanically distinguish the two |
| formats. An easy way for an encoder to help the decoder would be to |
| tag the entire CBOR item with tag 55799, the serialization of which |
| will never be found at the beginning of a JSON text. |
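
A sketch of how a combined CBOR/JSON reader might use the self-describe tag; the names `SELF_DESCRIBE`, `add_self_describe`, and `is_self_described` are hypothetical. Note that the absence of the prefix does not prove the data is not CBOR, only that the encoder did not apply tag 55799.

```python
# Serialization of tag 55799: major type 6, two-byte argument 0xd9f7.
SELF_DESCRIBE = bytes.fromhex("d9d9f7")


def add_self_describe(encoded_item: bytes) -> bytes:
    """Wrap an already-encoded CBOR data item in tag 55799."""
    return SELF_DESCRIBE + encoded_item


def is_self_described(data: bytes) -> bool:
    """True if the data starts with the self-describe tag."""
    return data.startswith(SELF_DESCRIBE)
```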
| |
| 3. Creating CBOR-Based Protocols |
| |
| Data formats such as CBOR are often used in environments where there |
| is no format negotiation. A specific design goal of CBOR is to not |
| need any included or assumed schema: a decoder can take a CBOR item |
| and decode it with no other knowledge. |
| |
| Of course, in real-world implementations, the encoder and the decoder |
| will have a shared view of what should be in a CBOR data item. For |
| example, an agreed-to format might be "the item is an array whose |
| first value is a UTF-8 string, second value is an integer, and |
| subsequent values are zero or more floating-point numbers" or "the |
| item is a map that has byte strings for keys and contains at least |
| one pair whose key is 0xab01". |
| |
| This specification puts no restrictions on CBOR-based protocols. An |
| encoder can be capable of encoding as many or as few types of values |
| as is required by the protocol in which it is used; a decoder can be |
| capable of understanding as many or as few types of values as is |
| required by the protocols in which it is used. This lack of |
| restrictions allows CBOR to be used in extremely constrained |
| environments. |
| |
| This section discusses some considerations in creating CBOR-based |
| protocols. It is advisory only and explicitly excludes any language |
| from RFC 2119 other than words that could be interpreted as "MAY" in |
| the sense of RFC 2119. |
| |
| 3.1. CBOR in Streaming Applications |
| |
| In a streaming application, a data stream may be composed of a |
| sequence of CBOR data items concatenated back-to-back. In such an |
| environment, the decoder immediately begins decoding a new data item |
| if data is found after the end of a previous data item. |
| |
| Not all of the bytes making up a data item may be immediately |
| available to the decoder; some decoders will buffer additional data |
| until a complete data item can be presented to the application. |
| Other decoders can present partial information about a top-level data |
| item to an application, such as the nested data items that could |
| already be decoded, or even parts of a byte string that hasn't |
| completely arrived yet. |
| |
| Note that some applications and protocols will not want to use |
| indefinite-length encoding. Using indefinite-length encoding allows |
| an encoder to not need to marshal all the data for counting, but it |
| requires a decoder to allocate increasing amounts of memory while |
| waiting for the end of the item. This might be fine for some |
| applications but not others. |
| |
| 3.2. Generic Encoders and Decoders |
| |
| A generic CBOR decoder can decode all well-formed CBOR data and |
| present it to an application. CBOR data is well-formed if it uses |
| the initial bytes, as well as the byte strings and/or data items that |
| are implied by their values, in the manner defined by CBOR, and no |
| extraneous data follows (Appendix C). |
| |
| Even though CBOR attempts to minimize these cases, not all well- |
| formed CBOR data is valid: for example, the format excludes simple |
| values below 32 that are encoded with an extension byte. Also, |
| specific tags may make semantic constraints that may be violated, |
| such as by including a tag in a bignum tag or by following a byte |
| string within a date tag. Finally, the data may be invalid, such as |
| invalid UTF-8 strings or date strings that do not conform to |
| [RFC3339]. There is no requirement that generic encoders and |
| decoders make unnatural choices for their application interface to |
| enable the processing of invalid data. Generic encoders and decoders |
| are expected to forward simple values and tags even if their specific |
| codepoints are not registered at the time the encoder/decoder is |
| written (Section 3.5). |
| |
| Generic decoders provide ways to present well-formed CBOR values, |
| both valid and invalid, to an application. The diagnostic notation |
| (Section 6) may be used to present well-formed CBOR values to humans. |
| |
| Generic encoders provide an application interface that allows the |
| application to specify any well-formed value, including simple values |
| and tags unknown to the encoder. |
| |
| 3.3. Syntax Errors |
| |
| A decoder encountering a CBOR data item that is not well-formed |
| generally can choose to completely fail the decoding (issue an error |
| and/or stop processing altogether), substitute the problematic data |
| and data items using a decoder-specific convention that clearly |
| indicates there has been a problem, or take some other action. |
| |
| 3.3.1. Incomplete CBOR Data Items |
| |
| The representation of a CBOR data item has a specific length, |
| determined by its initial bytes and by the structure of any data |
| items enclosed in the data items. If less data is available, this |
| can be treated as a syntax error. A decoder may also implement |
| incremental parsing, that is, decode the data item as far as it is |
| available and present the data found so far (such as in an event- |
| based interface), with the option of continuing the decoding once |
| further data is available. |
| |
| Examples of incomplete data items include: |
| |
| o A decoder expects a certain number of array or map entries but |
| instead encounters the end of the data. |
| |
| o A decoder processes what it expects to be the last pair in a map |
| and comes to the end of the data. |
| |
| o A decoder has just seen a tag and then encounters the end of the |
| data. |
| |
| o A decoder has seen the beginning of an indefinite-length item but |
| encounters the end of the data before it sees the "break" stop |
| code. |
| |
| 3.3.2. Malformed Indefinite-Length Items |
| |
| Examples of malformed indefinite-length data items include: |
| |
| o Within an indefinite-length byte string or text, a decoder finds |
| an item that is not of the appropriate major type before it finds |
| the "break" stop code. |
| |
| o Within an indefinite-length map, a decoder encounters the "break" |
| stop code immediately after reading a key (the value is missing). |
| |
| Another error is finding a "break" stop code at a point in the data |
| where there is no immediately enclosing (unclosed) indefinite-length |
| item. |
| |
| 3.3.3. Unknown Additional Information Values |
| |
| At the time of writing, some additional information values are |
| unassigned and reserved for future versions of this document (see |
| Section 5.2). Since the overall syntax for these additional |
| information values is not yet defined, a decoder that sees an |
| additional information value that it does not understand cannot |
| continue parsing. |
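
The three kinds of syntax error in Section 3.3 can be told apart by a small well-formedness checker. The following is a minimal sketch, not a definitive implementation: the names `Incomplete`, `Malformed`, `skip_item`, and `split_stream` are hypothetical, recursion depth is not limited, and no data model values are produced. `Incomplete` corresponds to Section 3.3.1 (more data might complete the item), while `Malformed` covers Sections 3.3.2 and 3.3.3.

```python
class Incomplete(Exception):
    """More bytes are needed before the item can be judged."""


class Malformed(Exception):
    """The bytes seen so far can never become a well-formed item."""


def read_head(data, pos):
    """Return (major, info, argument, next_pos); argument is None for info 31."""
    if pos >= len(data):
        raise Incomplete("missing initial byte")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise Incomplete("truncated argument")
        return major, info, int.from_bytes(data[pos:pos + size], "big"), pos + size
    if info == 31 and major in (2, 3, 4, 5, 7):
        return major, info, None, pos
    raise Malformed("unassigned additional information value")


def skip_item(data, pos=0, in_indefinite=False):
    """Return (end, is_break) for the item that starts at pos."""
    major, info, argument, pos = read_head(data, pos)
    if info == 31:
        if major == 7:
            if not in_indefinite:
                raise Malformed('"break" outside an indefinite-length item')
            return pos, True
        count = 0
        while True:
            # Chunks of an indefinite-length string must be definite-length
            # strings of the same major type.
            if major in (2, 3) and pos < len(data) and data[pos] != 0xFF:
                if data[pos] >> 5 != major or data[pos] & 0x1F == 31:
                    raise Malformed("chunk of the wrong type")
            pos, is_break = skip_item(data, pos, in_indefinite=True)
            if is_break:
                if major == 5 and count % 2:
                    raise Malformed("map key without a value")
                return pos, False
            count += 1
    if major in (0, 1, 7):
        return pos, False
    if major in (2, 3):
        if pos + argument > len(data):
            raise Incomplete("truncated string content")
        return pos + argument, False
    children = {4: argument, 5: 2 * argument, 6: 1}[major]
    for _ in range(children):
        pos, _unused = skip_item(data, pos)
    return pos, False


def split_stream(buffer):
    """Split back-to-back items; return (complete_items, unconsumed_tail)."""
    items, pos = [], 0
    while pos < len(buffer):
        try:
            end, _unused = skip_item(buffer, pos)
        except Incomplete:
            break                       # wait for more data to arrive
        items.append(buffer[pos:end])
        pos = end
    return items, buffer[pos:]
```

A streaming receiver could call `split_stream` each time new bytes arrive, keep the returned tail, and prepend it to the next chunk; a `Malformed` exception signals that no amount of further data will help.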
| |
| 3.4. Other Decoding Errors |
| |
| A CBOR data item may be syntactically well-formed but present a |
| problem with interpreting the data encoded in it in the CBOR data |
| model. Generally speaking, a decoder that finds a data item with |
| such a problem might issue a warning, might stop processing |
| altogether, might handle the error and make the problematic value |
| available to the application as such, or take some other type of |
| action. |
| |
| Such problems might include: |
| |
| Duplicate keys in a map: Generic decoders (Section 3.2) make data |
| available to applications using the native CBOR data model. That |
| data model includes maps (key-value mappings with unique keys), |
| not multimaps (key-value mappings where multiple entries can have |
| the same key). Thus, a generic decoder that gets a CBOR map item |
| that has duplicate keys will decode to a map with only one |
| instance of that key, or it might stop processing altogether. On |
| the other hand, a "streaming decoder" may not even be able to |
| notice (Section 3.7). |
| |
| Inadmissible type on the value following a tag: Tags (Section 2.4) |
| specify what type of data item is supposed to follow the tag; for |
| example, the tags for positive or negative bignums are supposed to |
| be put on byte strings. A decoder that decodes the tagged data |
| item into a native representation (a native big integer in this |
| example) is expected to check the type of the data item being |
| tagged. Even decoders that don't have such native representations |
| available in their environment may perform the check on those tags |
| known to them and react appropriately. |
| |
| Invalid UTF-8 string: A decoder might or might not want to verify |
| that the sequence of bytes in a UTF-8 string (major type 3) is |
| actually valid UTF-8 and react appropriately. |
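
Two of the checks above are easy to sketch once the key/value pairs and the raw string bytes have been decoded. The function names `build_map` and `decode_text` are hypothetical. One assumption to be aware of: Python treats 1 and 1.0 as the same dictionary key, so this sketch reports them as duplicates.

```python
def build_map(pairs):
    """Build a dict from decoded (key, value) pairs, rejecting duplicate keys.

    Keys are assumed to be hashable Python values.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


def decode_text(raw: bytes) -> str:
    """Decode the content of a major type 3 string, verifying the UTF-8."""
    # The default strict error handler raises UnicodeDecodeError on bad input.
    return raw.decode("utf-8")
```

A decoder that prefers to keep going could catch these exceptions and hand the problematic value to the application instead of stopping.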
| |
| 3.5. Handling Unknown Simple Values and Tags |
| |
| A decoder that comes across a simple value (Section 2.3) that it does |
| not recognize, such as a value that was added to the IANA registry |
| after the decoder was deployed or a value that the decoder chose not |
| to implement, might issue a warning, might stop processing |
| altogether, might handle the error by making the unknown value |
| available to the application as such (as is expected of generic |
| decoders), or take some other type of action. |
| |
| A decoder that comes across a tag (Section 2.4) that it does not |
| recognize, such as a tag that was added to the IANA registry after |
| the decoder was deployed or a tag that the decoder chose not to |
| implement, might issue a warning, might stop processing altogether, |
| might handle the error and present the unknown tag value together |
| with the contained data item to the application (as is expected of |
| generic decoders), might ignore the tag and simply present the |
| contained data item only to the application, or take some other type |
| of action. |
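
One way a generic decoder might present unknown tags to the application, sketched under stated assumptions: the names `UnknownTag`, `KNOWN_TAGS`, and `apply_tag` are hypothetical, only the two bignum tags are treated as known, and the `reject_unknown` flag stands in for the "stop processing" choice.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UnknownTag:
    """Wrapper that hands an unrecognized tag and its content to the caller."""
    tag: int
    value: Any


def _bignum(sign):
    def convert(value):
        # Inadmissible type on the tagged item: bignums need a byte string.
        if not isinstance(value, bytes):
            raise TypeError("bignum tags expect a byte string")
        n = int.from_bytes(value, "big")
        return n if sign > 0 else -1 - n
    return convert


# Assumption: this decoder only understands tags 2 and 3.
KNOWN_TAGS = {2: _bignum(+1), 3: _bignum(-1)}


def apply_tag(tag, value, reject_unknown=False):
    """Convert a tagged item, or wrap it if the tag is not recognized."""
    handler = KNOWN_TAGS.get(tag)
    if handler is not None:
        return handler(value)
    if reject_unknown:
        raise ValueError(f"unrecognized tag {tag}")
    return UnknownTag(tag, value)
```

Returning a distinct wrapper type lets the calling application tell "tag 1234 around this value" apart from the bare value, which silently dropping the tag would not.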
| |
| 3.6. Numbers |
| |
| For the purposes of this specification, all number representations |
| for the same numeric value are equivalent. This means that an |
| encoder can encode a floating-point value of 0.0 as the integer 0. |
| However, it also means that an application that expects to find |
| integer values only might find floating-point values if the encoder |
| decides these are desirable, such as when the floating-point value is |
| more compact than a 64-bit integer. |
| |
| An application or protocol that uses CBOR might restrict the |
| representations of numbers. For instance, a protocol that only deals |
| with integers might say that floating-point numbers may not be used |
| and that decoders of that protocol do not need to be able to handle |
| floating-point numbers. Similarly, a protocol or application that |
| uses CBOR might say that decoders need to be able to handle either |
| type of number. |
| |
| CBOR-based protocols should take into account that different language |
| environments pose different restrictions on the range and precision |
| of numbers that are representable. For example, the JavaScript |
| number system treats all numbers as floating point, which may result |
| in silent loss of precision in decoding integers with more than 53 |
| significant bits. A protocol that uses numbers should define its |
| expectations on the handling of non-trivial numbers in decoders and |
| receiving applications. |
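
The precision limit can be demonstrated with a short check of whether an integer survives a round trip through a double-precision float, which is what a JavaScript-style number system would store. The function name `survives_as_double` is hypothetical.

```python
def survives_as_double(n: int) -> bool:
    """True if n is exactly representable as an IEEE 754 double."""
    try:
        return int(float(n)) == n
    except OverflowError:
        # Magnitude exceeds the range of a double altogether.
        return False
```

For example, 2**53 survives, while 2**53 + 1 does not: it is silently rounded to 2**53 when converted.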
| |
| A CBOR-based protocol that includes floating-point numbers can |
| restrict which of the three formats (half-precision, single- |
| precision, and double-precision) are to be supported. For an |
| integer-only application, a protocol may want to completely exclude |
| the use of floating-point values. |
| |
| A CBOR-based protocol designed for compactness may want to exclude |
| specific integer encodings that are longer than necessary for the |
| application, such as to save the need to implement 64-bit integers. |
| There is an expectation that encoders will use the most compact |
| integer representation that can represent a given value. However, a |
| compact application should accept values that use a longer-than- |
| needed encoding (such as encoding "0" as 0b000_11001 followed by two |
| bytes of 0x00) as long as the application can decode an integer of |
| the given size. |
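
A lenient decoder for unsigned integers might look like the following sketch, which accepts longer-than-needed encodings and merely reports whether the shortest form was used. The function name `decode_uint` and the returned flag are hypothetical.

```python
def decode_uint(data: bytes):
    """Decode a major type 0 item; return (value, used_shortest_form)."""
    if not data or data[0] >> 5 != 0:
        raise ValueError("not an unsigned integer")
    info = data[0] & 0x1F
    if info < 24:
        return info, True
    if info > 27:
        raise ValueError("unassigned additional information value")
    size = 1 << (info - 24)             # 1, 2, 4, or 8 bytes
    if len(data) < 1 + size:
        raise ValueError("truncated integer")
    value = int.from_bytes(data[1:1 + size], "big")
    # Smallest value that actually needs this many bytes: 24, 2**8, 2**16, 2**32.
    minimum = 24 if size == 1 else 1 << (4 * size)
    return value, value >= minimum
```

The three bytes 19 00 00 decode to 0 with the flag set to False: acceptable for a lenient decoder, but not what a compact encoder is expected to produce.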
| |
| 3.7. Specifying Keys for Maps |
| |
| The encoding and decoding applications need to agree on what types of |
| keys are going to be used in maps. In applications that need to |
| interwork with JSON-based applications, keys probably should be |
| limited to UTF-8 strings only; otherwise, there has to be a specified |
| mapping from the other CBOR types to Unicode characters, and this |
| often leads to implementation errors. In applications where keys are |
| numeric in nature and numeric ordering of keys is important to the |
| application, directly using the numbers for the keys is useful. |
| |
| If multiple types of keys are to be used, consideration should be |
| given to how these types would be represented in the specific |
| programming environments that are to be used. For example, in |
| JavaScript objects, a key of integer 1 cannot be distinguished from a |
| key of string "1". This means that, if integer keys are used, the |
| simultaneous use of string keys that look like numbers needs to be |
| avoided. Again, this leads to the conclusion that keys should be of |
| a single CBOR type. |
| |
| Decoders that deliver data items nested within a CBOR data item |
| immediately on decoding them ("streaming decoders") often do not keep |
| the state that is necessary to ascertain uniqueness of a key in a |
| map. Similarly, an encoder that can start encoding data items before |
| the enclosing data item is completely available ("streaming encoder") |
| may want to reduce its overhead significantly by relying on its data |
| source to maintain uniqueness. |
| |
| A CBOR-based protocol should make an intentional decision about what |
| to do when a receiving application does see multiple identical keys |
| in a map. The resulting rule in the protocol should respect the CBOR |
| data model: it cannot prescribe a specific handling of the entries |
| with the identical keys, except that it might have a rule that having |
| identical keys in a map indicates a malformed map and that the |
| decoder has to stop with an error. Duplicate keys are also |
| prohibited by CBOR decoders that are using strict mode |
| (Section 3.10). |
| |
| The CBOR data model for maps does not allow ascribing semantics to |
| the order of the key/value pairs in the map representation. |
| Thus, it would be a very bad practice to define a CBOR-based protocol |
| in such a way that changing the key/value pair order in a map would |
| change the semantics, apart from trivial aspects (cache usage, etc.). |
| (A CBOR-based protocol can prescribe a specific order of |
| serialization, such as for canonicalization.) |
| |
| Applications for constrained devices that have maps with 24 or fewer |
| frequently used keys should consider using small integers (and those |
| with up to 48 frequently used keys should consider also using small |
| negative integers) because the keys can then be encoded in a single |
| byte. |
| |
| 3.8. Undefined Values |
| |
| In some CBOR-based protocols, the simple value (Section 2.3) of |
| Undefined might be used by an encoder as a substitute for a data item |
| with an encoding problem, in order to allow the rest of the enclosing |
| data items to be encoded without harm. |
| |
| 3.9. Canonical CBOR |
| |
| Some protocols may want encoders to only emit CBOR in a particular |
| canonical format; those protocols might also have the decoders check |
| that their input is canonical. Those protocols are free to define |
| what they mean by a canonical format and what encoders and decoders |
| are expected to do. This section lists some suggestions for such |
| protocols. |
| |
| If a protocol considers "canonical" to mean that two encoder |
| implementations starting with the same input data will produce the |
| same CBOR output, the following four rules would suffice: |
| |
| o Integers must be as small as possible. |
| |
| * 0 to 23 and -1 to -24 must be expressed in the same byte as the |
| major type; |
| |
| * 24 to 255 and -25 to -256 must be expressed only with an |
| additional uint8_t; |
| |
| * 256 to 65535 and -257 to -65536 must be expressed only with an |
| additional uint16_t; |
| |
| * 65536 to 4294967295 and -65537 to -4294967296 must be expressed |
| only with an additional uint32_t. |
| |
| o The expression of lengths in major types 2 through 5 must be as |
| short as possible. The rules for these lengths follow the above |
| rule for integers. |
| |
| o The keys in every map must be sorted lowest value to highest. |
| Sorting is performed on the bytes of the representation of the key |
| data items without paying attention to the 3/5 bit splitting for |
| major types. (Note that this rule allows maps that have keys of |
| different types, even though that is probably a bad practice that |
| could lead to errors in some canonicalization implementations.) |
| The sorting rules are: |
| |
| * If two keys have different lengths, the shorter one sorts |
| earlier; |
| |
| * If two keys have the same length, the one with the lower value |
| in (byte-wise) lexical order sorts earlier. |
| |
| o Indefinite-length items must be made into definite-length items. |
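
The integer, length, and key-ordering rules above can be sketched as follows. The names `encode_head` and `canonical_map` are hypothetical, and the keys and values are assumed to be already encoded in canonical, definite-length form.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Encode a major type with the shortest possible argument."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def canonical_map(encoded_pairs):
    """Serialize (encoded_key, encoded_value) pairs as a canonical map.

    Shorter keys sort first; keys of equal length sort in byte-wise
    lexical order, which is how Python compares bytes objects.
    """
    ordered = sorted(encoded_pairs, key=lambda pair: (len(pair[0]), pair[0]))
    body = b"".join(key + value for key, value in ordered)
    return encode_head(5, len(ordered)) + body
```

With keys 100 (18 64), -1 (20), "z" (61 7a), and 10 (0a), the resulting order is 10, -1, 100, "z": the two one-byte keys come first, then the two-byte keys in byte order.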
| |
| If a protocol allows for IEEE floats, then additional |
| canonicalization rules might need to be added. One example rule |
| might be to have all floats start as a 64-bit float, then do a test |
| conversion to a 32-bit float; if the result is the same numeric |
| value, use the shorter value and repeat the process with a test |
| conversion to a 16-bit float. (This rule selects 16-bit float for |
| positive and negative Infinity as well.) Also, there are many |
| representations for NaN. If NaN is an allowed value, it must always |
| be represented as 0xf97e00. |
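
The example rule for floats can be sketched with Python's `struct` module, whose `e`, `f`, and `d` format characters correspond to the 16-bit, 32-bit, and 64-bit IEEE 754 formats. The function names `survives` and `canonical_float` are hypothetical.

```python
import math
import struct


def survives(fmt: str, value: float) -> bool:
    """True if value keeps its numeric value when packed with fmt."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, value))[0] == value
    except OverflowError:
        # struct refuses finite values that are out of range for the format.
        return False


def canonical_float(value: float) -> bytes:
    """Encode a float in the shortest format that preserves its value."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")                  # the single allowed NaN
    if not survives(">f", value):                       # test conversion to 32 bits
        return b"\xfb" + struct.pack(">d", value)
    if not survives(">e", value):                       # then to 16 bits
        return b"\xfa" + struct.pack(">f", value)
    return b"\xf9" + struct.pack(">e", value)
```

Positive and negative Infinity pass both test conversions and therefore end up in the 16-bit form, as the rule predicts.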
| |
| CBOR tags present additional considerations for canonicalization. |
| The absence or presence of tags in a canonical format is determined |
| by the optionality of the tags in the protocol. In a CBOR-based |
| protocol that allows optional tagging anywhere, the canonical format |
| must not allow them. In a protocol that requires tags in certain |
| places, the tag needs to appear in the canonical format. A CBOR- |
| based protocol that uses canonicalization might instead say that all |
| tags that appear in a message must be retained regardless of whether |
| they are optional. |
| |
| 3.10. Strict Mode |
| |
| Some areas of application of CBOR do not require canonicalization |
| (Section 3.9) but may require that different decoders reach the same |
| (semantically equivalent) results, even in the presence of |
| potentially malicious data. This can be required if one application |
| (such as a firewall or other protecting entity) makes a decision |
| based on the data that another application, which independently |
| decodes the data, relies on. |
| |
| Normally, it is the responsibility of the sender to avoid ambiguously |
| decodable data. However, the sender might be an attacker specially |
| making up CBOR data such that it will be interpreted differently by |
| different decoders in an attempt to exploit that as a vulnerability. |
| Generic decoders used in applications where this might be a problem |
| need to support a strict mode in which it is also the responsibility |
| of the receiver to reject ambiguously decodable data. It is expected |
| that firewalls and other security systems that decode CBOR will only |
| decode in strict mode. |
| |
| A decoder in strict mode will reliably reject any data that could be |
| interpreted by other decoders in different ways. It will reliably |
| reject data items with syntax errors (Section 3.3). It will also |
| expend the effort to reliably detect other decoding errors |
| (Section 3.4). In particular, a strict decoder needs to have an API |
| that reports an error (and does not return data) for a CBOR data item |
| that contains any of the following: |
| |
| o a map (major type 5) that has more than one entry with the same |
| key |
| |
| o a tag that is used on a data item of the incorrect type |
| |
| o a data item that is incorrectly formatted for the type given to |
| it, such as invalid UTF-8 or data that cannot be interpreted with |
| the specific tag that it has been tagged with |
| |
| A decoder in strict mode can do one of two things when it encounters |
| a tag or simple value that it does not recognize: |
| |
| o It can report an error (and not return data). |
| |
| o It can emit the unknown item (type, value, and, for tags, the |
| decoded tagged data item) to the application calling the decoder |
| with an indication that the decoder did not recognize that tag or |
| simple value. |
| |
| The latter approach, which is also appropriate for non-strict |
| decoders, supports forward compatibility with newly registered tags |
| and simple values without the requirement to update the encoder at |
| the same time as the calling application. (For this, the API for the |
| decoder needs to have a way to mark unknown items so that the calling |
| application can handle them in a manner appropriate for the program.) |
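| |
| One possible shape for such an API is sketched below. The names |
| UnknownTag and resolve_tag, the handler table, and the |
| reject_unknown switch are all assumptions made for illustration; |
| the specification does not prescribe any particular interface. |
| |
```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class UnknownTag:
    """Marks a tag the decoder did not recognize (hypothetical type)."""
    number: int
    content: Any


def resolve_tag(number: int, content: Any,
                handlers: Dict[int, Callable[[Any], Any]],
                reject_unknown: bool = False) -> Any:
    """Apply a known tag handler, or mark or reject an unknown tag."""
    handler = handlers.get(number)
    if handler is not None:
        return handler(content)
    if reject_unknown:
        # First option: report an error and do not return data.
        raise ValueError(f"unrecognized tag {number}")
    # Second option: hand the tag number and decoded content to the caller.
    return UnknownTag(number, content)
```
| |
| The calling application can then test for UnknownTag and decide for |
| itself whether to process, ignore, or refuse the item. |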
| |
| Since some of this processing may have an appreciable cost (in |
| particular with duplicate detection for maps), support of strict mode |
| is not a requirement placed on all CBOR decoders. |
| |
| Some encoders will rely on their applications to provide input data |
| in such a way that the result is unambiguously decodable CBOR. A |
| generic encoder also may want to provide a strict mode where it |
| reliably limits its output to unambiguously decodable CBOR, |
| independent of whether or not its application is providing |
| API-conformant data. |
| |
| 4. Converting Data between CBOR and JSON |
| |
| This section gives non-normative advice about converting between CBOR |
| and JSON. Implementations of converters are free to use whichever |
| advice here they want. |
| |
| It is worth noting that a JSON text is a sequence of characters, not |
| an encoded sequence of bytes, while a CBOR data item consists of |
| bytes, not characters. |
| |
| 4.1. Converting from CBOR to JSON |
| |
| Most of the types in CBOR have direct analogs in JSON. However, some |
| do not, and someone implementing a CBOR-to-JSON converter has to |
| consider what to do in those cases. The following non-normative |
| advice deals with these by converting them to a single substitute |
| value, such as a JSON null. |
| |
| o An integer (major type 0 or 1) becomes a JSON number. |
| |
| o A byte string (major type 2) that is not embedded in a tag that |
| specifies a proposed encoding is encoded in base64url without |
| padding and becomes a JSON string. |
| |
| o A UTF-8 string (major type 3) becomes a JSON string. Note that |
| JSON requires escaping certain characters (RFC 4627, Section 2.5): |
| quotation mark (U+0022), reverse solidus (U+005C), and the "C0 |
| control characters" (U+0000 through U+001F). All other characters |
| are copied unchanged into the JSON UTF-8 string. |
| |
| o An array (major type 4) becomes a JSON array. |
| |
| o A map (major type 5) becomes a JSON object. This is possible |
| directly only if all keys are UTF-8 strings. A converter might |
| also convert other keys into UTF-8 strings (such as by converting |
| integers into strings containing their decimal representation); |
| however, doing so introduces a danger of key collision. |
| |
| o False (major type 7, additional information 20) becomes a JSON |
| false. |
| |
| o True (major type 7, additional information 21) becomes a JSON |
| true. |
| |
| o Null (major type 7, additional information 22) becomes a JSON |
| null. |
| |
| o A floating-point value (major type 7, additional information 25 |
| through 27) becomes a JSON number if it is finite (that is, it can |
| be represented in a JSON number); if the value is non-finite (NaN, |
| or positive or negative Infinity), it is represented by the |
| substitute value. |
| |
| o Any other simple value (major type 7, any additional information |
| value not yet discussed) is represented by the substitute value. |
| |
| o A bignum (major type 6, tag value 2 or 3) is represented by |
| encoding its byte string in base64url without padding and becomes |
| a JSON string. For tag value 3 (negative bignum), a "~" (ASCII |
| tilde) is inserted before the base-encoded value. (The conversion |
| to a binary blob instead of a number is to prevent a likely |
| numeric overflow for the JSON decoder.) |
| |
| o A byte string with an encoding hint (major type 6, tag value 21 |
| through 23) is encoded as described and becomes a JSON string. |
| |
| o For all other tags (major type 6, any other tag value), the |
| embedded CBOR item is represented as a JSON value; the tag value |
| is ignored. |
| |
| o Indefinite-length items are made definite before conversion. |
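| |
| The advice above can be sketched as a function over already-decoded |
| values. The sketch assumes a decoder that delivers Python int, |
| float, bytes, str, list, dict, bool, and None, and that wraps tagged |
| items in a hypothetical Tagged class; it raises an error for |
| non-string map keys rather than converting them, and it leaves the |
| encoding-hint tags 21 through 23 unimplemented. |
| |
```python
import base64
import math
from dataclasses import dataclass
from typing import Any


@dataclass
class Tagged:
    """A tag number with its decoded content (hypothetical type)."""
    number: int
    content: Any


def _b64url(raw: bytes) -> str:
    """base64url without padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def to_json_value(item: Any, substitute: Any = None) -> Any:
    """Map a decoded CBOR value to something json.dumps() accepts."""
    if item is None or isinstance(item, (bool, int, str)):
        return item
    if isinstance(item, float):
        # NaN and the infinities have no JSON number form.
        return item if math.isfinite(item) else substitute
    if isinstance(item, bytes):
        return _b64url(item)
    if isinstance(item, list):
        return [to_json_value(element, substitute) for element in item]
    if isinstance(item, dict):
        converted = {}
        for key, value in item.items():
            if not isinstance(key, str):
                raise TypeError("only text-string keys convert directly")
            converted[key] = to_json_value(value, substitute)
        return converted
    if isinstance(item, Tagged):
        if item.number in (2, 3) and isinstance(item.content, bytes):
            text = _b64url(item.content)
            return "~" + text if item.number == 3 else text
        if item.number in (21, 22, 23):
            raise NotImplementedError("encoding hints are left out of this sketch")
        return to_json_value(item.content, substitute)  # tag value is ignored
    return substitute  # any other simple value
```
| |
| The result can be passed to a JSON serializer, which takes care of |
| the string escaping described above. |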
| |
| 4.2. Converting from JSON to CBOR |
| |
| All JSON values, once decoded, directly map into one or more CBOR |
| values. As with any kind of CBOR generation, decisions have to be |
| made with respect to number representation. In a suggested |
| conversion: |
| |
| o JSON numbers without fractional parts (integer numbers) are |
| represented as integers (major types 0 and 1, possibly major type |
| 6 tag value 2 and 3), choosing the shortest form; integers longer |
| than an implementation-defined threshold (which is usually either |
| 32 or 64 bits) may instead be represented as floating-point |
| values. (If the JSON was generated from a JavaScript |
| implementation, its precision is already limited to 53 bits |
| maximum.) |
| |
| o Numbers with fractional parts are represented as floating-point |
| values. Preferably, the shortest exact floating-point |
| representation is used; for instance, 1.5 is represented in a |
| 16-bit floating-point value (not all implementations will be |
| capable of efficiently finding the minimum form, though). There |
| may be an implementation-defined limit to the precision that will |
| affect the precision of the represented values. Decimal |
| representation should only be used if that is specified in a |
| protocol. |
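| |
| Finding the shortest exact floating-point representation can be |
| sketched by trying each width in turn and keeping the first one that |
| round-trips. The function name encode_float_shortest is an |
| assumption; the sketch relies on the "e", "f", and "d" format codes |
| of Python's struct module for half, single, and double precision. |
| |
```python
import math
import struct


def encode_float_shortest(value: float) -> bytes:
    """Encode a float in the shortest CBOR form that preserves it exactly."""
    if math.isnan(value):
        # NaN never compares equal to itself, so it is handled up front;
        # this sketch always emits the half-precision quiet NaN.
        return bytes.fromhex("f97e00")
    for fmt, initial_byte in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)
```
| |
| For instance, 1.5 fits in half precision and 100000.0 in single |
| precision, while 1.1 needs the full double-precision form. |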
| |
| CBOR has been designed to generally provide a more compact encoding |
| than JSON. One implementation strategy that might come to mind is to |
| perform a JSON-to-CBOR encoding in place in a single buffer. This |
| strategy would need to carefully consider a number of pathological |
| cases; for example, strings that are longer (or much longer) than |
| 255 bytes and contain no or very few escapes may expand when |
| encoded as UTF-8 strings in CBOR. Similarly, a few of the binary |
| floating-point representations might cause expansion from some short |
| decimal representations (1.1, 1e9) in JSON. This may be hard to get |
| right, and any ensuing vulnerabilities may be exploited by an |
| attacker. |
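| |
| The expansion cases can be made concrete with a small size |
| comparison. The helper names below are assumptions made for this |
| illustration; the sizes follow from the CBOR head lengths and from |
| the JSON quoting rules. |
| |
```python
import json


def cbor_text_length(text: str) -> int:
    """Size in bytes of a definite-length CBOR text string."""
    payload = len(text.encode("utf-8"))
    if payload < 24:
        head = 1
    elif payload < 1 << 8:
        head = 2
    elif payload < 1 << 16:
        head = 3
    elif payload < 1 << 32:
        head = 5
    else:
        head = 9
    return head + payload


def json_text_length(text: str) -> int:
    """Size in bytes of the same string as a JSON string in UTF-8."""
    return len(json.dumps(text, ensure_ascii=False).encode("utf-8"))


# A 256-byte string without escapes: JSON adds two quotes, while CBOR
# needs a three-byte head, so the CBOR form is one byte longer.
sample = "a" * 256
growth = cbor_text_length(sample) - json_text_length(sample)

# The JSON text 1.1 is three bytes; its exact binary form is the
# nine-byte double-precision encoding shown in Appendix A.
float_growth = len(bytes.fromhex("fb3ff199999999999a")) - len("1.1")
```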
| |
| 5. Future Evolution of CBOR |
| |
| Successful protocols evolve over time. New ideas appear, |
| implementation platforms improve, related protocols are developed and |
| evolve, and new requirements from applications and protocols are |
| added. Facilitating protocol evolution is therefore an important |
| design consideration for any protocol development. |
| |
| For protocols that will use CBOR, CBOR provides some useful |
| mechanisms to facilitate their evolution. Best practices for this |
| are well known, particularly from the development of JSON-based |
| protocols. Therefore, such best practices are outside the scope of |
| this specification. |
| |
| However, facilitating the evolution of CBOR itself is very well |
| within its scope. CBOR is designed both to provide a stable basis |
| for development of CBOR-based protocols and to be able to evolve. |
| |
| Since a successful protocol may live for decades, CBOR needs to be |
| designed for decades of use and evolution. This section provides |
| some guidance for the evolution of CBOR. It is necessarily more |
| subjective than other parts of this document. It is also necessarily |
| incomplete, lest it turn into a textbook on protocol development. |
| |
| 5.1. Extension Points |
| |
| In a protocol design, opportunities for evolution are often included |
| in the form of extension points. For example, there may be a |
| codepoint space that is not fully allocated from the outset, and the |
| protocol is designed to tolerate and embrace implementations that |
| start using more codepoints than initially allocated. |
| |
| Sizing the codepoint space may be difficult because the range |
| required may be hard to predict. An attempt should be made to make |
| the codepoint space large enough so that it can slowly be filled over |
| the intended lifetime of the protocol. |
| |
| CBOR has three major extension points: |
| |
| o the "simple" space (values in major type 7). Of the 24 efficient |
| (and 224 slightly less efficient) values, only a small number have |
| been allocated. Implementations receiving an unknown simple data |
| item may be able to process it as such, given that the structure |
| of the value is indeed simple. The IANA registry in Section 7.1 |
| is the appropriate way to address the extensibility of this |
| codepoint space. |
| |
| o the "tag" space (values in major type 6). Again, only a small |
| part of the codepoint space has been allocated, and the space is |
| abundant (although the early numbers are more efficient than the |
| later ones). Implementations receiving an unknown tag can choose |
| to simply ignore it or to process it as an unknown tag wrapping |
| the following data item. The IANA registry in Section 7.2 is the |
| appropriate way to address the extensibility of this codepoint |
| space. |
| |
| o the "additional information" space. An implementation receiving |
| an unknown additional information value has no way to continue |
| parsing, so allocating codepoints to this space is a major step. |
| There are also very few codepoints left. |
| |
| 5.2. Curating the Additional Information Space |
| |
| The human mind is sometimes drawn to filling in little perceived gaps |
| to make something neat. We expect the remaining gaps in the |
| codepoint space for the additional information values to be an |
| attractor for new ideas, just because they are there. |
| |
| The present specification does not manage the additional information |
| codepoint space by an IANA registry. Instead, allocations out of |
| this space can only be done by updating this specification. |
| |
| For an additional information value of n >= 24, the size of the |
| additional data typically is 2**(n-24) bytes. Therefore, additional |
| information values 28 and 29 should be viewed as candidates for |
| 128-bit and 256-bit quantities, in case a need arises to add them to |
| the protocol. Additional information value 30 is then the only |
| additional information value available for general allocation, and |
| there should be a very good reason for allocating it before assigning |
| it through an update of this protocol. |
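| |
| The size rule can be written out as a short calculation. The |
| function names are assumptions; only values 24 through 27 are |
| actually defined with a fixed-size argument, and the figures for 28 |
| and 29 are the projection described above, not an allocation. |
| |
```python
def argument_bytes(additional_info: int) -> int:
    """Bytes following the initial byte for additional information 24..27."""
    if not 24 <= additional_info <= 27:
        raise ValueError("only 24..27 carry a fixed-size argument")
    return 2 ** (additional_info - 24)


def projected_bits(additional_info: int) -> int:
    """Width in bits implied by the 2**(n-24) pattern (a projection for 28, 29)."""
    return 8 * 2 ** (additional_info - 24)
```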
| |
| 6. Diagnostic Notation |
| |
| CBOR is a binary interchange format. To facilitate documentation and |
| debugging, and in particular to facilitate communication between |
| entities cooperating in debugging, this section defines a simple |
| human-readable diagnostic notation. All actual interchange always |
| happens in the binary format. |
| |
| Note that this truly is a diagnostic format; it is not meant to be |
| parsed. Therefore, no formal definition (as in ABNF) is given in |
| this document. (Implementers looking for a text-based format for |
| representing CBOR data items in configuration files may also want to |
| consider YAML [YAML].) |
| |
| The diagnostic notation is loosely based on JSON as it is defined in |
| RFC 4627, extending it where needed. |
| |
| The notation borrows the JSON syntax for numbers (integer and |
| floating point), True (>true<), False (>false<), Null (>null<), UTF-8 |
| strings, arrays, and maps (maps are called objects in JSON; the |
| diagnostic notation extends JSON here by allowing any data item in |
| the key position). Undefined is written >undefined< as in |
| JavaScript. The non-finite floating-point numbers Infinity, |
| -Infinity, and NaN are written exactly as in this sentence (this is |
| also a way they can be written in JavaScript, although JSON does not |
| allow them). A tagged item is written as an integer number for the |
| tag followed by the item in parentheses; for instance, an RFC 3339 |
| (ISO 8601) date could be notated as: |
| |
| 0("2013-03-21T20:04:00Z") |
| |
| or the equivalent relative time as |
| |
| 1(1363896240) |
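| |
| That the two notations denote the same instant can be checked with |
| a few lines of Python. The function names are assumptions, and the |
| sketch only handles whole seconds in UTC written with a trailing |
| "Z". |
| |
```python
from datetime import datetime, timezone

RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"


def epoch_to_rfc3339(seconds: int) -> str:
    """Render the content of a tag 1 item as the text a tag 0 item carries."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(RFC3339_UTC)


def rfc3339_to_epoch(text: str) -> int:
    """Convert a UTC date/time string to seconds since the epoch."""
    moment = datetime.strptime(text, RFC3339_UTC).replace(tzinfo=timezone.utc)
    return int(moment.timestamp())
```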
| |
| Byte strings are notated in one of the base encodings, without |
| padding, enclosed in single quotes, prefixed by >h< for base16, >b32< |
| for base32, >h32< for base32hex, >b64< for base64 or base64url (the |
| actual encodings do not overlap, so the string remains unambiguous). |
| For example, the byte string 0x12345678 could be written h'12345678', |
| b32'CI2FM6A', or b64'EjRWeA'. |
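| |
| The three forms in this example can be produced with Python's |
| base64 module, as in the following sketch. The function name |
| diag_bytes is an assumption, and the base32hex prefix is left out |
| for brevity. |
| |
```python
import base64


def diag_bytes(raw: bytes, prefix: str = "h") -> str:
    """Write a byte string in diagnostic notation without padding."""
    if prefix == "h":
        text = raw.hex()
    elif prefix == "b32":
        text = base64.b32encode(raw).decode("ascii")
    elif prefix == "b64":
        text = base64.b64encode(raw).decode("ascii")
    else:
        raise ValueError("prefix not handled in this sketch")
    return f"{prefix}'{text.rstrip('=')}'"
```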
| |
| Unassigned simple values are given as "simple()" with the appropriate |
| integer in the parentheses. For example, "simple(42)" indicates |
| major type 7, value 42. |
| |
| 6.1. Encoding Indicators |
| |
| Sometimes it is useful to indicate in the diagnostic notation which |
| of several alternative representations was actually used; for |
| example, a data item written >1.5< by a diagnostic decoder might have |
| been encoded as a half-, single-, or double-precision float. |
| |
| The convention for encoding indicators is that an underscore, |
| together with all immediately following characters that are |
| alphanumeric or underscore, forms an encoding indicator and can be |
| ignored by anyone not interested in this information. Encoding |
| indicators are always optional. |
| |
| A single underscore can be written after the opening brace of a map |
| or the opening bracket of an array to indicate that the data item was |
| represented in indefinite-length format. For example, [_ 1, 2] |
| contains an indicator that an indefinite-length representation was |
| used to represent the data item [1, 2]. |
| |
| An underscore followed by a decimal digit n indicates that the |
| preceding item (or, for arrays and maps, the item starting with the |
| preceding bracket or brace) was encoded with an additional |
| information value of 24+n. For example, 1.5_1 is a half-precision |
| floating-point number, while 1.5_3 is encoded as double precision. |
| This encoding indicator is not shown in Appendix A. (Note that the |
| encoding indicator "_" is thus an abbreviation of the full form "_7", |
| which is not used.) |
| |
| As a special case, byte and text strings of indefinite length can be |
| notated in the form (_ h'0123', h'4567') and (_ "foo", "bar"). |
| |
| 7. IANA Considerations |
| |
| IANA has created two registries for new CBOR values. The registries |
| are separate, that is, not under an umbrella registry, and follow the |
| rules in [RFC5226]. IANA has also assigned a new MIME media type and |
| an associated Constrained Application Protocol (CoAP) Content-Format |
| entry. |
| |
| 7.1. Simple Values Registry |
| |
| IANA has created the "Concise Binary Object Representation (CBOR) |
| Simple Values" registry. The initial values are shown in Table 2. |
| |
| New entries in the range 0 to 19 are assigned by Standards Action. |
| It is suggested that these Standards Actions allocate values starting |
| with the number 16 in order to reserve the lower numbers for |
| contiguous blocks (if any). |
| |
| New entries in the range 32 to 255 are assigned by Specification |
| Required. |
| |
| 7.2. Tags Registry |
| |
| IANA has created the "Concise Binary Object Representation (CBOR) |
| Tags" registry. The initial values are shown in Table 3. |
| |
| New entries in the range 0 to 23 are assigned by Standards Action. |
| New entries in the range 24 to 255 are assigned by Specification |
| Required. New entries in the range 256 to 18446744073709551615 are |
| assigned by First Come First Served. The template for registration |
| requests is: |
| |
| o Data item |
| |
| o Semantics (short form) |
| |
| In addition, First Come First Served requests should include: |
| |
| o Point of contact |
| |
| o Description of semantics (URL) |
| This description is optional; the URL can point to something like |
| an Internet-Draft or a web page. |
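| |
| The three ranges of the tags registry can be summarized in a small |
| lookup, shown here only as a reading aid. The function name is an |
| assumption. |
| |
```python
MAX_TAG = 18446744073709551615  # 2**64 - 1, the largest tag value


def tag_registration_policy(tag: int) -> str:
    """Return the registration policy that applies to a tag value."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag value out of range")
    if tag <= 23:
        return "Standards Action"
    if tag <= 255:
        return "Specification Required"
    return "First Come First Served"
```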
| |
| 7.3. Media Type ("MIME Type") |
| |
| The Internet media type [RFC6838] for CBOR data is application/cbor. |
| |
| Type name: application |
| |
| Subtype name: cbor |
| |
| Required parameters: n/a |
| |
| Optional parameters: n/a |
| |
| Encoding considerations: binary |
| |
| Security considerations: See Section 8 of this document |
| |
| Interoperability considerations: n/a |
| |
| Published specification: This document |
| |
| Applications that use this media type: None yet, but it is expected |
| that this format will be deployed in protocols and applications. |
| |
| Additional information: |
| Magic number(s): n/a |
| File extension(s): .cbor |
| Macintosh file type code(s): n/a |
| |
| Person & email address to contact for further information: |
| Carsten Bormann |
| cabo@tzi.org |
| |
| Intended usage: COMMON |
| |
| Restrictions on usage: none |
| |
| Author: |
| Carsten Bormann <cabo@tzi.org> |
| |
| Change controller: |
| The IESG <iesg@ietf.org> |
| |
| 7.4. CoAP Content-Format |
| |
| Media Type: application/cbor |
| |
| Encoding: - |
| |
| Id: 60 |
| |
| Reference: [RFC7049] |
| |
| 7.5. The +cbor Structured Syntax Suffix Registration |
| |
| Name: Concise Binary Object Representation (CBOR) |
| |
| +suffix: +cbor |
| |
| References: [RFC7049] |
| |
| Encoding Considerations: CBOR is a binary format. |
| |
| Interoperability Considerations: n/a |
| |
| Fragment Identifier Considerations: |
| The syntax and semantics of fragment identifiers specified for |
| +cbor SHOULD be as specified for "application/cbor". (At |
| publication of this document, there is no fragment identification |
| syntax defined for "application/cbor".) |
| |
| The syntax and semantics for fragment identifiers for a specific |
| "xxx/yyy+cbor" SHOULD be processed as follows: |
| |
| For cases defined in +cbor, where the fragment identifier resolves |
| per the +cbor rules, then process as specified in +cbor. |
| |
| For cases defined in +cbor, where the fragment identifier does not |
| resolve per the +cbor rules, then process as specified in |
| "xxx/yyy+cbor". |
| |
| For cases not defined in +cbor, then process as specified in |
| "xxx/yyy+cbor". |
| |
| Security Considerations: See Section 8 of this document |
| |
| Contact: |
| Apps Area Working Group (apps-discuss@ietf.org) |
| |
| Author/Change Controller: |
| The Apps Area Working Group. |
| The IESG has change control over this registration. |
| |
| 8. Security Considerations |
| |
| A network-facing application can exhibit vulnerabilities in its |
| processing logic for incoming data. Complex parsers are well known |
| as a likely source of such vulnerabilities, such as the ability to |
| remotely crash a node, or even remotely execute arbitrary code on it. |
| CBOR attempts to narrow the opportunities for introducing such |
| vulnerabilities by reducing parser complexity, by giving the entire |
| range of encodable values a meaning where possible. |
| |
| Resource exhaustion attacks might attempt to lure a decoder into |
| allocating very big data items (strings, arrays, maps) or exhaust the |
| stack depth by setting up deeply nested items. Decoders need to have |
| appropriate resource management to mitigate these attacks. (Items |
| for which very large sizes are given can also attempt to exploit |
| integer overflow vulnerabilities.) |
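| |
| One way a decoder might organize such resource management is |
| sketched below. The class, its method names, and the default limits |
| are assumptions chosen for illustration; suitable limits depend on |
| the application. |
| |
```python
class ResourceLimitError(ValueError):
    """Raised when input exceeds a configured limit (hypothetical name)."""


class DecodeBudget:
    """Tracks nesting depth and checks declared sizes before allocation."""

    def __init__(self, max_depth: int = 32, max_string_bytes: int = 1 << 20):
        self.max_depth = max_depth
        self.max_string_bytes = max_string_bytes
        self.depth = 0

    def enter_container(self) -> None:
        """Call before decoding the contents of an array, map, or tag."""
        if self.depth >= self.max_depth:
            raise ResourceLimitError("nesting too deep")
        self.depth += 1

    def leave_container(self) -> None:
        """Call after the contents of a container have been decoded."""
        self.depth -= 1

    def check_string(self, declared_length: int, remaining_input: int) -> None:
        """Reject a string whose declared size cannot or should not be honored."""
        if declared_length > remaining_input:
            raise ResourceLimitError("declared length exceeds remaining input")
        if declared_length > self.max_string_bytes:
            raise ResourceLimitError("string larger than the configured limit")
```
| |
| Checking the declared length against the remaining input before |
| allocating means that a head claiming 2**64-1 bytes is refused |
| without reserving any memory for it. |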
| |
| Applications where a CBOR data item is examined by a gatekeeper |
| function and later used by a different application may exhibit |
| vulnerabilities when multiple interpretations of the data item are |
| possible. For example, an attacker could make use of duplicate keys |
| in maps and precision issues in numbers to make the gatekeeper base |
| its decisions on a different interpretation than the one that will be |
| used by the second application. Protocols that are used in a |
| security context should be defined in such a way that these multiple |
| interpretations are reliably reduced to a single one. To facilitate |
| this, encoder and decoder implementations used in such contexts |
| should provide at least one strict mode of operation (Section 3.10). |
| |
| 9. Acknowledgements |
| |
| CBOR was inspired by MessagePack. MessagePack was developed and |
| promoted by Sadayuki Furuhashi ("frsyuki"). This reference to |
| MessagePack is solely for attribution; CBOR is not intended as a |
| version of or replacement for MessagePack, as it has different design |
| goals and requirements. |
| |
| The need for functionality beyond the original MessagePack |
| Specification became obvious to many people at about the same time |
| around the year 2012. BinaryPack is a minor derivation of |
| MessagePack that was developed by Eric Zhang for the binaryjs |
| project. A similar, but different, extension was made by Tim Caswell |
| for his msgpack-js and msgpack-js-browser projects. Many people have |
| contributed to the recent discussion about extending MessagePack to |
| separate text string representation from byte string representation. |
| |
| The encoding of the additional information in CBOR was inspired by |
| the encoding of length information designed by Klaus Hartke for CoAP. |
| |
| This document also incorporates suggestions made by many people, |
| notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore, Matthew |
| Lepinski, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray, |
| Tony Finch, Tony Hansen, and Yaron Sheffer. |
| |
| 10. References |
| |
| 10.1. Normative References |
| |
| [ECMA262] European Computer Manufacturers Association, "ECMAScript |
| Language Specification 5.1 Edition", ECMA Standard |
| ECMA-262, June 2011, <http://www.ecma-international.org/ |
| publications/files/ecma-st/ECMA-262.pdf>. |
| |
| [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail |
| Extensions (MIME) Part One: Format of Internet Message |
| Bodies", RFC 2045, November 1996. |
| |
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate |
| Requirement Levels", BCP 14, RFC 2119, March 1997. |
| |
| [RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the |
| Internet: Timestamps", RFC 3339, July 2002. |
| |
| [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO |
| 10646", STD 63, RFC 3629, November 2003. |
| |
| [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform |
| Resource Identifier (URI): Generic Syntax", STD 66, RFC |
| 3986, January 2005. |
| |
| [RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom |
| Syndication Format", RFC 4287, December 2005. |
| |
| [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data |
| Encodings", RFC 4648, October 2006. |
| |
| [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an |
| IANA Considerations Section in RFCs", BCP 26, RFC 5226, |
| May 2008. |
| |
| [TIME_T] The Open Group Base Specifications, "Vol. 1: Base |
| Definitions, Issue 7", Section 4.15 'Seconds Since the |
| Epoch', IEEE Std 1003.1, 2013 Edition, 2013, |
| <http://pubs.opengroup.org/onlinepubs/9699919799/ |
| basedefs/V1_chap04.html#tag_04_15>. |
| |
| 10.2. Informative References |
| |
| [ASN.1] International Telecommunication Union, "Information |
| Technology -- ASN.1 encoding rules: Specification of Basic |
| Encoding Rules (BER), Canonical Encoding Rules (CER) and |
| Distinguished Encoding Rules (DER)", ITU-T Recommendation |
| X.690, 1994. |
| |
| [BSON] Various, "BSON - Binary JSON", 2013, |
| <http://bsonspec.org/>. |
| |
| [CNN-TERMS] |
| Bormann, C., Ersue, M., and A. Keranen, "Terminology for |
| Constrained Node Networks", Work in Progress, July 2013. |
| |
| [MessagePack] |
| Furuhashi, S., "MessagePack", 2013, <http://msgpack.org/>. |
| |
| [RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission |
| Protocol", RFC 713, April 1976. |
| |
| [RFC4627] Crockford, D., "The application/json Media Type for |
| JavaScript Object Notation (JSON)", RFC 4627, July 2006. |
| |
| [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type |
| Specifications and Registration Procedures", BCP 13, RFC |
| 6838, January 2013. |
| |
| [UBJSON] The Buzz Media, "Universal Binary JSON Specification", |
| 2013, <http://ubjson.org/>. |
| |
| [YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup |
| Language (YAML[TM]) Version 1.2", 3rd Edition, October |
| 2009, <http://www.yaml.org/spec/1.2/spec.html>. |
| |
| Appendix A. Examples |
| |
| The following table provides some CBOR-encoded values in hexadecimal |
| (right column), together with diagnostic notation for these values |
| (left column). Note that the string "\u00fc" is one form of |
| diagnostic notation for a UTF-8 string containing the single Unicode |
| character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut). |
| Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a |
| single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often |
| representing "water"), and "\ud800\udd51" is a UTF-8 string in |
| diagnostic notation with a single character U+10151 (GREEK ACROPHONIC |
| ATTIC FIFTY STATERS). (Note that all these single-character strings |
| could also be represented in native UTF-8 in diagnostic notation, |
| just not in an ASCII-only specification like the present one.) In |
| the diagnostic notation provided for bignums, their intended numeric |
| value is shown as a decimal number (such as 18446744073709551616) |
| instead of showing a tagged byte string (such as |
| 2(h'010000000000000000')). |
| |
| +------------------------------+------------------------------------+ |
| | Diagnostic | Encoded | |
| +------------------------------+------------------------------------+ |
| | 0 | 0x00 | |
| | | | |
| | 1 | 0x01 | |
| | | | |
| | 10 | 0x0a | |
| | | | |
| | 23 | 0x17 | |
| | | | |
| | 24 | 0x1818 | |
| | | | |
| | 25 | 0x1819 | |
| | | | |
| | 100 | 0x1864 | |
| | | | |
| | 1000 | 0x1903e8 | |
| | | | |
| | 1000000 | 0x1a000f4240 | |
| | | | |
| | 1000000000000 | 0x1b000000e8d4a51000 | |
| | | | |
| | 18446744073709551615 | 0x1bffffffffffffffff | |
| | | | |
| | 18446744073709551616 | 0xc249010000000000000000 | |
| | | | |
| | -18446744073709551616 | 0x3bffffffffffffffff | |
| | | | |
| | -18446744073709551617 | 0xc349010000000000000000 | |
| | | | |
| | -1 | 0x20 | |
| | | | |
| | -10 | 0x29 | |
| | | | |
| | -100 | 0x3863 | |
| | | | |
| | -1000 | 0x3903e7 | |
| | | | |
| | 0.0 | 0xf90000 | |
| | | | |
| | -0.0 | 0xf98000 | |
| | | | |
| | 1.0 | 0xf93c00 | |
| | | | |
| | 1.1 | 0xfb3ff199999999999a | |
| | | | |
| | 1.5 | 0xf93e00 | |
| | | | |
| | 65504.0 | 0xf97bff | |
| | | | |
| | 100000.0 | 0xfa47c35000 | |
| | | | |
| | 3.4028234663852886e+38 | 0xfa7f7fffff | |
| | | | |
| | 1.0e+300 | 0xfb7e37e43c8800759c | |
| | | | |
| | 5.960464477539063e-8 | 0xf90001 | |
| | | | |
| | 0.00006103515625 | 0xf90400 | |
| | | | |
| | -4.0 | 0xf9c400 | |
| | | | |
| | -4.1 | 0xfbc010666666666666 | |
| | | | |
| | Infinity | 0xf97c00 | |
| | | | |
| | NaN | 0xf97e00 | |
| | | | |
| | -Infinity | 0xf9fc00 | |
| | | | |
| | Infinity | 0xfa7f800000 | |
| | | | |
| | NaN | 0xfa7fc00000 | |
| | | | |
| | -Infinity | 0xfaff800000 | |
| | | | |
| | Infinity | 0xfb7ff0000000000000 | |
| | | | |
| | NaN | 0xfb7ff8000000000000 | |
| | | | |
| | -Infinity | 0xfbfff0000000000000 | |
| | | | |
| | false | 0xf4 | |
| | | | |
| | true | 0xf5 | |
| | | | |
| | null | 0xf6 | |
| | | | |
| | undefined | 0xf7 | |
| | | | |
| | simple(16) | 0xf0 | |
| | | | |
| | simple(24) | 0xf818 | |
| | | | |
| | simple(255) | 0xf8ff | |
| | | | |
| | 0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a | |
| | | 30343a30305a | |
| | | | |
| | 1(1363896240) | 0xc11a514b67b0 | |
| | | | |
| | 1(1363896240.5) | 0xc1fb41d452d9ec200000 | |
| | | | |
| | 23(h'01020304') | 0xd74401020304 | |
| | | | |
| | 24(h'6449455446') | 0xd818456449455446 | |
| | | | |
| | 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 | |
| | | 616d706c652e636f6d | |
| | | | |
| | h'' | 0x40 | |
| | | | |
| | h'01020304' | 0x4401020304 | |
| | | | |
| | "" | 0x60 | |
| | | | |
| | "a" | 0x6161 | |
| | | | |
| | "IETF" | 0x6449455446 | |
| | | | |
| | "\"\\" | 0x62225c | |
| | | | |
| | "\u00fc" | 0x62c3bc | |
| | | | |
| | "\u6c34" | 0x63e6b0b4 | |
| | | | |
| | "\ud800\udd51" | 0x64f0908591 | |
| | | | |
| | [] | 0x80 | |
| | | | |
| | [1, 2, 3] | 0x83010203 | |
| | | | |
| | [1, [2, 3], [4, 5]] | 0x8301820203820405 | |
| | | | |
| | [1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x98190102030405060708090a0b0c0d0e | |
| | 10, 11, 12, 13, 14, 15, 16, | 0f101112131415161718181819 | |
| | 17, 18, 19, 20, 21, 22, 23, | | |
| | 24, 25] | | |
| | | | |
| | {} | 0xa0 | |
| | | | |
| | {1: 2, 3: 4} | 0xa201020304 | |
| | | | |
| | {"a": 1, "b": [2, 3]} | 0xa26161016162820203 | |
| | | | |
| | ["a", {"b": "c"}] | 0x826161a161626163 | |
| | | | |
| | {"a": "A", "b": "B", "c": | 0xa5616161416162614261636143616461 | |
| | "C", "d": "D", "e": "E"} | 4461656145 | |
| | | | |
| | (_ h'0102', h'030405') | 0x5f42010243030405ff | |
| | | | |
| | (_ "strea", "ming") | 0x7f657374726561646d696e67ff | |
| | | | |
| | [_ ] | 0x9fff | |
| | | | |
| | [_ 1, [2, 3], [_ 4, 5]] | 0x9f018202039f0405ffff | |
| | | | |
| | [_ 1, [2, 3], [4, 5]] | 0x9f01820203820405ff | |
| | | | |
| | [1, [2, 3], [_ 4, 5]] | 0x83018202039f0405ff | |
| | | | |
| | [1, [_ 2, 3], [4, 5]] | 0x83019f0203ff820405 | |
| | | | |
| | [_ 1, 2, 3, 4, 5, 6, 7, 8, | 0x9f0102030405060708090a0b0c0d0e0f | |
| | 9, 10, 11, 12, 13, 14, 15, | 101112131415161718181819ff | |
| | 16, 17, 18, 19, 20, 21, 22, | | |
| | 23, 24, 25] | | |
| | | | |
| | {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | |
| | | | |
| | ["a", {_ "b": "c"}] | 0x826161bf61626163ff | |
| | | | |
| | {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | |
| +------------------------------+------------------------------------+ |
| |
| Table 4: Examples of Encoded CBOR Data Items |
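| |
| The integer rows of Table 4 can be reproduced with a short encoder. |
| This is a sketch with assumed function names; it always chooses the |
| shortest head and falls back to a bignum (tag 2 or 3) when the value |
| does not fit in 64 bits. |
| |
```python
def encode_head(major: int, argument: int) -> bytes:
    """Encode a major type and argument using the shortest form."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | info]) + argument.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 or 1, or as a bignum if too large."""
    major, argument = (0, value) if value >= 0 else (1, -1 - value)
    if argument < 1 << 64:
        return encode_head(major, argument)
    # Bignum: tag 2 (positive) or tag 3 (negative) around a byte string.
    raw = argument.to_bytes((argument.bit_length() + 7) // 8, "big")
    return encode_head(6, 2 + major) + encode_head(2, len(raw)) + raw
```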
| |
| Appendix B. Jump Table |
| |
| For brevity, this jump table does not show initial bytes that are |
| reserved for future extension. It also only shows a selection of the |
| initial bytes that can be used for optional features. (All unsigned |
| integers are in network byte order.) |
| |
| +-----------------+-------------------------------------------------+ |
| | Byte | Structure/Semantics | |
| +-----------------+-------------------------------------------------+ |
| | 0x00..0x17 | Integer 0x00..0x17 (0..23) | |
| | | | |
| | 0x18 | Unsigned integer (one-byte uint8_t follows) | |
| | | | |
| | 0x19 | Unsigned integer (two-byte uint16_t follows) | |
| | | | |
| | 0x1a | Unsigned integer (four-byte uint32_t follows) | |
| | | | |
| | 0x1b | Unsigned integer (eight-byte uint64_t follows) | |
| | | | |
| | 0x20..0x37 | Negative integer -1-0x00..-1-0x17 (-1..-24) | |
| | | | |
| | 0x38 | Negative integer -1-n (one-byte uint8_t for n | |
| | | follows) | |
| | | | |
| | 0x39 | Negative integer -1-n (two-byte uint16_t for n | |
| | | follows) | |
| | | | |
| | 0x3a | Negative integer -1-n (four-byte uint32_t for n | |
| | | follows) | |
| | | | |
| | 0x3b | Negative integer -1-n (eight-byte uint64_t for | |
| | | n follows) | |
| | | | |
| | 0x40..0x57 | byte string (0x00..0x17 bytes follow) | |
| | | | |
| | 0x58 | byte string (one-byte uint8_t for n, and then n | |
| | | bytes follow) | |
| | | | |
| | 0x59 | byte string (two-byte uint16_t for n, and then | |
| | | n bytes follow) | |
| | | | |
| | 0x5a | byte string (four-byte uint32_t for n, and then | |
| | | n bytes follow) | |
| | | | |
| | 0x5b | byte string (eight-byte uint64_t for n, and | |
| | | then n bytes follow) | |
| | | | |
| | 0x5f | byte string, byte strings follow, terminated by | |
| | | "break" | |
| | | | |
| | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow) | |
| | | | |
| | 0x78 | UTF-8 string (one-byte uint8_t for n, and then | |
| | | n bytes follow) | |
| | | | |
| | 0x79 | UTF-8 string (two-byte uint16_t for n, and then | |
| | | n bytes follow) | |
| | | | |
| | 0x7a | UTF-8 string (four-byte uint32_t for n, and | |
| | | then n bytes follow) | |
| | | | |
| | 0x7b | UTF-8 string (eight-byte uint64_t for n, and | |
| | | then n bytes follow) | |
| | | | |
| | 0x7f | UTF-8 string, UTF-8 strings follow, terminated | |
| | | by "break" | |
| | | | |
| | 0x80..0x97 | array (0x00..0x17 data items follow) | |
| | | | |
| | 0x98 | array (one-byte uint8_t for n, and then n data | |
| | | items follow) | |
| | | | |
| | 0x99 | array (two-byte uint16_t for n, and then n data | |
| | | items follow) | |
| | | | |
| | 0x9a | array (four-byte uint32_t for n, and then n | |
| | | data items follow) | |
| | | | |
| | 0x9b | array (eight-byte uint64_t for n, and then n | |
| | | data items follow) | |
| | | | |
| | 0x9f | array, data items follow, terminated by "break" | |
| | | | |
| | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow) | |
| | | | |
| | 0xb8 | map (one-byte uint8_t for n, and then n pairs | |
| | | of data items follow) | |
| | | | |
| | 0xb9 | map (two-byte uint16_t for n, and then n pairs | |
| | | of data items follow) | |
| | | | |
| | 0xba | map (four-byte uint32_t for n, and then n pairs | |
| | | of data items follow) | |
| | | | |
| | 0xbb | map (eight-byte uint64_t for n, and then n | |
| | | pairs of data items follow) | |
| | | | |
| | 0xbf | map, pairs of data items follow, terminated by | |
| | | "break" | |
| | | | |
| | 0xc0 | Text-based date/time (data item follows; see | |
| | | Section 2.4.1) | |
| | | | |
| | 0xc1 | Epoch-based date/time (data item follows; see | |
| | | Section 2.4.1) | |
| | | | |
| | 0xc2 | Positive bignum (data item "byte string" | |
| | | follows) | |
| | | | |
| | 0xc3 | Negative bignum (data item "byte string" | |
| | | follows) | |
| | | | |
| | 0xc4 | Decimal Fraction (data item "array" follows; | |
| | | see Section 2.4.3) | |
| | | | |
| | 0xc5 | Bigfloat (data item "array" follows; see | |
| | | Section 2.4.3) | |
| | | | |
| | 0xc6..0xd4 | (tagged item) | |
| | | | |
| | 0xd5..0xd7 | Expected Conversion (data item follows; see | |
| | | Section 2.4.4.2) | |
| | | | |
| | 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a | |
| | | data item follow) | |
| | | | |
| | 0xe0..0xf3 | (simple value) | |
| | | | |
| | 0xf4 | False | |
| | | | |
| | 0xf5 | True | |
| | | | |
| | 0xf6 | Null | |
| | | | |
| | 0xf7 | Undefined | |
| | | | |
| | 0xf8 | (simple value, one byte follows) | |
| | | | |
| | 0xf9 | Half-Precision Float (two-byte IEEE 754) | |
| | | | |
| | 0xfa | Single-Precision Float (four-byte IEEE 754) | |
| | | | |
| | 0xfb | Double-Precision Float (eight-byte IEEE 754) | |
| | | | |
| | 0xff | "break" stop code | |
| +-----------------+-------------------------------------------------+ |
| |
| Table 5: Jump Table for Initial Byte |
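
The regularity behind Table 5 comes from the split of the initial byte into a major type (high 3 bits) and additional information (low 5 bits), as done by the pseudocode in Appendix C. The following Python sketch is only an illustration; the names MAJOR_TYPES, split_initial_byte, and argument_length are hypothetical, and the strings in MAJOR_TYPES are informal labels rather than normative terms.

```python
# Informal labels for major types 0..7 (assumed wording, for display only).
MAJOR_TYPES = ("unsigned integer", "negative integer", "byte string",
               "UTF-8 string", "array", "map", "tagged item",
               "simple value or float")

def split_initial_byte(ib):
    """Split an initial byte into (major type, additional information)."""
    return ib >> 5, ib & 0x1f

def argument_length(ai):
    """Number of argument bytes that follow the initial byte.

    Returns 0 when the value is carried in the initial byte itself and
    None for the reserved values 28..30 and for 31 (indefinite/break).
    """
    if ai < 24:
        return 0
    if ai <= 27:
        return 1 << (ai - 24)   # 24 -> 1, 25 -> 2, 26 -> 4, 27 -> 8
    return None
```

With these helpers, split_initial_byte(0x9f) gives (4, 31), that is, an array of indefinite length, and argument_length(27) gives 8, in line with the eight-byte rows of the table.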
| |
| Appendix C. Pseudocode |
| |
| The well-formedness of a CBOR item can be checked by the pseudocode |
| in Figure 1. The data is well-formed if and only if: |
| |
| o the pseudocode does not "fail"; and |
| |
| o after execution of the pseudocode, no bytes are left in the input |
| (except in streaming applications). |
| |
| The pseudocode has the following prerequisites: |
| |
| o take(n) reads n bytes from the input data and returns them as a |
| byte string. If n bytes are no longer available, take(n) fails. |
| |
| o uint() converts a byte string into an unsigned integer by |
| interpreting the byte string in network byte order. |
| |
| o Arithmetic works as in C. |
| |
| o All variables are unsigned integers of sufficient range. |
| |
| well_formed (breakable = false) { |
|   // process initial bytes |
|   ib = uint(take(1)); |
|   mt = ib >> 5; |
|   val = ai = ib & 0x1f; |
|   switch (ai) { |
|     case 24: val = uint(take(1)); break; |
|     case 25: val = uint(take(2)); break; |
|     case 26: val = uint(take(4)); break; |
|     case 27: val = uint(take(8)); break; |
|     case 28: case 29: case 30: fail(); |
|     case 31: |
|       return well_formed_indefinite(mt, breakable); |
|   } |
|   // process content |
|   switch (mt) { |
|     // case 0, 1, 7 do not have content; just use val |
|     case 2: case 3: take(val); break; // bytes/UTF-8 |
|     case 4: for (i = 0; i < val; i++) well_formed(); break; |
|     case 5: for (i = 0; i < val*2; i++) well_formed(); break; |
|     case 6: well_formed(); break;     // 1 embedded data item |
|   } |
|   return mt;                          // finite data item |
| } |
| |
| well_formed_indefinite(mt, breakable) { |
|   switch (mt) { |
|     case 2: case 3: |
|       while ((it = well_formed(true)) != -1) |
|         if (it != mt)                 // need finite embedded |
|           fail();                     //    of same type |
|       break; |
|     case 4: while (well_formed(true) != -1); break; |
|     case 5: while (well_formed(true) != -1) well_formed(); break; |
|     case 7: |
|       if (breakable) |
|         return -1;                    // signal break out |
|       else fail();                    // no enclosing indefinite |
|     default: fail();                  // wrong mt |
|   } |
|   return 0;                           // no break out |
| } |
| |
| Figure 1: Pseudocode for Well-Formedness Check |
| |
| Note that the remaining complexity of a complete CBOR decoder is |
| about presenting data that has been parsed to the application in an |
| appropriate form. |
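
The pseudocode in Figure 1 can be transliterated into Python roughly as follows. This is a sketch under stated assumptions, not a reference implementation: the names NotWellFormed, Checker, and is_well_formed are hypothetical, the whole input is assumed to be available as a bytes object, and the non-streaming rule (no bytes left over) is applied.

```python
class NotWellFormed(Exception):
    """Raised where the pseudocode would call fail()."""

BREAK = -1  # same sentinel as the pseudocode's "break out" signal

class Checker:
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        # read n bytes or fail if the input ends too early
        if self.pos + n > len(self.data):
            raise NotWellFormed("input ends too early")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable=False):
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1f
        val = ai
        if 24 <= ai <= 27:
            # 1, 2, 4, or 8 argument bytes in network byte order
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif ai in (28, 29, 30):
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        if mt in (2, 3):                # bytes/UTF-8
            self.take(val)
        elif mt == 4:                   # array: val items
            for _ in range(val):
                self.well_formed()
        elif mt == 5:                   # map: val pairs
            for _ in range(val * 2):
                self.well_formed()
        elif mt == 6:                   # tag: 1 embedded data item
            self.well_formed()
        return mt                       # finite data item

    def well_formed_indefinite(self, mt, breakable):
        if mt in (2, 3):
            while True:
                it = self.well_formed(True)
                if it == BREAK:
                    break
                if it != mt:            # chunks: finite, same type
                    raise NotWellFormed("bad chunk in indefinite string")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()      # value may not be a break
        elif mt == 7:
            if breakable:
                return BREAK            # signal break out
            raise NotWellFormed("no enclosing indefinite-length item")
        else:
            raise NotWellFormed("wrong major type for indefinite length")
        return 0                        # no break out

def is_well_formed(data):
    checker = Checker(data)
    try:
        checker.well_formed()
    except NotWellFormed:
        return False
    return checker.pos == len(data)     # no bytes may be left over
```

For instance, is_well_formed(bytes.fromhex("9f018202039f0405ffff")) is true, while a lone "break" byte (0xff) or a truncated array such as 0x8301 is rejected.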
| |
| Major types 0 and 1 are designed in such a way that they can be |
| encoded in C from a signed integer without actually doing an if-then- |
| else for positive/negative (Figure 2). This uses the fact that |
| (-1-n), the transformation for major type 1, is the same as ~n |
| (bitwise complement) in C unsigned arithmetic; ~n can then be |
| expressed as (-1)^n for the negative case, while 0^n leaves n |
| unchanged for non-negative. The sign of a number can be converted to |
| -1 for negative and 0 for non-negative (0 or positive) by arithmetic- |
| shifting the number by one bit less than the bit length of the number |
| (for example, by 63 for 64-bit numbers). |
| |
| void encode_sint(int64_t n) { |
|   uint64_t ui = n >> 63;    // extend sign to whole length |
|   mt = ui & 0x20;           // extract major type |
|   ui ^= n;                  // complement negatives |
|   if (ui < 24) |
|     *p++ = mt + ui; |
|   else if (ui < 256) { |
|     *p++ = mt + 24; |
|     *p++ = ui; |
|   } else |
|     ... |
| |
| Figure 2: Pseudocode for Encoding a Signed Integer |
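
The same sign trick can be tried out in Python, where the right shift of a negative integer is arithmetic. The following is only a sketch: the 64-bit mask emulates C unsigned arithmetic, the name MASK64 is hypothetical, and the handling of values of 256 and above (elided in Figure 2) is filled in here as an assumption based on the argument widths listed in Table 5.

```python
MASK64 = (1 << 64) - 1  # emulates uint64_t wrap-around

def encode_sint(n):
    """Encode n (in the int64_t range) as major type 0 or 1."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("n does not fit in int64_t")
    ui = (n >> 63) & MASK64   # all ones for negative n, zero otherwise
    mt = ui & 0x20            # 0x20 is major type 1 in initial-byte position
    ui ^= n & MASK64          # ~n (= -1-n) for negatives, n otherwise
    if ui < 24:
        return bytes([mt + ui])
    # assumption: remaining cases follow the 1/2/4/8-byte pattern
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if ui < (1 << (8 * size)):
            return bytes([mt + ai]) + ui.to_bytes(size, "big")
```

For example, encode_sint(-2) returns the single byte 0x21, the encoding of -2 that appears in the last row of Table 4.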
| |
| Appendix D. Half-Precision |
| |
| As half-precision floating-point numbers were only added to IEEE 754 |
| in 2008, today's programming platforms often still have only limited |
| support for them. Even on platforms without such support, it is very |
| easy to include at least decoding support for them. An example of a |
| small decoder for half-precision floating-point numbers in the C |
| language is shown in Figure 3. A similar program for Python is in |
| Figure 4; this code assumes that the 2-byte value has already been |
| decoded as an (unsigned short) integer in network byte order (as |
| would be done by the pseudocode in Appendix C). |
| |
| #include <math.h> |
| |
| double decode_half(unsigned char *halfp) { |
|   int half = (halfp[0] << 8) + halfp[1];  // network byte order |
|   int exp = (half >> 10) & 0x1f;          // 5-bit exponent |
|   int mant = half & 0x3ff;                // 10-bit mantissa |
|   double val; |
|   if (exp == 0) val = ldexp(mant, -24);   // zero or subnormal |
|   else if (exp != 31) val = ldexp(mant + 1024, exp - 25); // normal |
|   else val = mant == 0 ? INFINITY : NAN; |
|   return half & 0x8000 ? -val : val;      // apply sign bit |
| } |
| |
| Figure 3: C Code for a Half-Precision Decoder |
| |
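| import struct |
| from math import ldexp |
| |
| def decode_single(single): |
|     return struct.unpack("!f", struct.pack("!I", single))[0] |
| |
| def decode_half(half): |
|     # move sign, exponent, and mantissa to single-precision positions |
|     valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16 |
|     if ((half & 0x7c00) != 0x7c00): |
|         # finite value: rebias the exponent (127 - 15 = 112) |
|         return ldexp(decode_single(valu), 112) |
|     # infinity or NaN: set the single-precision exponent to all ones |
|     return decode_single(valu | 0x7f800000) |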
| |
| Figure 4: Python Code for a Half-Precision Decoder |
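
On Python 3.6 and later, the struct module can also unpack half-precision values directly with the "e" format character, which makes it easy to cross-check a hand-written decoder. The sketch below is an illustration only; it transliterates the ldexp-based approach of Figure 3 into Python, and the names decode_half_ldexp and decode_half_struct are hypothetical.

```python
import math
import struct

def decode_half_ldexp(half):
    """Decode a 16-bit integer holding an IEEE 754 half-precision value."""
    exp = (half >> 10) & 0x1f
    mant = half & 0x3ff
    if exp == 0:
        val = math.ldexp(mant, -24)              # zero or subnormal
    elif exp != 31:
        val = math.ldexp(mant + 1024, exp - 25)  # normal number
    else:
        val = math.inf if mant == 0 else math.nan
    return -val if half & 0x8000 else val

def decode_half_struct(half):
    """Same result via the struct module's half-precision format."""
    return struct.unpack("!e", struct.pack("!H", half))[0]
```

Both functions agree on all 65536 possible inputs, treating any two NaN results as equal; for example, 0x3c00 decodes to 1.0 and 0x7bff to 65504.0.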
| |
| Appendix E. Comparison of Other Binary Formats to CBOR's Design |
| Objectives |
| |
| The proposal for CBOR follows a history of binary formats that is as |
| long as the history of computers themselves. Different formats have |
| had different objectives. In most cases, the objectives of the |
| format were never stated, although they can sometimes be implied by |
| the context where the format was first used. Some formats were meant |
| to be universally usable, although history has proven that no binary |
| format meets the needs of all protocols and applications. |
| |
| CBOR differs from many of these formats in that it starts with a set |
| of objectives and attempts to meet just those. This section |
| compares a few of the dozens of formats with CBOR's objectives in |
| order to help the reader decide if they want to use CBOR or a |
| different format for a particular protocol or application. |
| |
| Note that the discussion here is not meant to be a criticism of any |
| format: to the best of our knowledge, no format before CBOR was meant |
| to cover CBOR's objectives in the priority we have assigned them. A |
| brief recap of the objectives from Section 1.1 is: |
| |
| 1. unambiguous encoding of most common data formats from Internet |
| standards |
| |
| 2. code compactness for encoder or decoder |
| |
| 3. no schema description needed |
| |
| 4. reasonably compact serialization |
| |
| 5. applicability to constrained and unconstrained applications |
| |
| 6. good JSON conversion |
| |
| 7. extensibility |
| |
| E.1. ASN.1 DER, BER, and PER |
| |
| [ASN.1] has many serializations. In the IETF, DER and BER are the |
| most common. The serialized output is not particularly compact for |
| many items, and the code needed to decode numeric items can be |
| complex on a constrained device. |
| |
| Few (if any) IETF protocols have adopted one of the several variants |
| of Packed Encoding Rules (PER). There could be many reasons for |
| this, but one that is commonly stated is that PER makes use of the |
| schema even for parsing the surface structure of the data stream, |
| requiring significant tool support. There are different versions of |
| the ASN.1 schema language in use, which has also hampered adoption. |
| |
| E.2. MessagePack |
| |
| [MessagePack] is a concise, widely implemented counted binary |
| serialization format, similar in many properties to CBOR, although |
| somewhat less regular. While the data model can be used to represent |
| JSON data, MessagePack has also been used in many remote procedure |
| call (RPC) applications and for long-term storage of data. |
| |
| MessagePack has been essentially stable since it was first published |
| around 2011; it has not yet had a transition. The evolution of |
| MessagePack is impeded by an imperative to maintain complete |
| backwards compatibility with existing stored data, while only few |
| bytecodes are still available for extension. Repeated requests over |
| the years from the MessagePack user community to separate out binary |
| and text strings in the encoding have recently led to an extension |
| proposal that would leave MessagePack's "raw" data ambiguous between |
| its usages for binary and text data. The extension mechanism for |
| MessagePack remains unclear. |
| |
| E.3. BSON |
| |
| [BSON] is a data format that was developed for the storage of JSON- |
| like maps (JSON objects) in the MongoDB database. Its major |
| distinguishing feature is the capability for in-place update, |
| forgoing a compact representation. BSON uses a counted |
| representation except for map keys, which are null-byte terminated. |
| While BSON can be used for the representation of JSON-like objects on |
| the wire, its specification is dominated by the requirements of the |
| database application and has become somewhat baroque. The status of |
| how BSON extensions will be implemented remains unclear. |
| |
| E.4. UBJSON |
| |
| [UBJSON] has a design goal to make JSON faster and somewhat smaller, |
| using a binary format that is limited to exactly the data model JSON |
| uses. Thus, there is expressly no intention to support, for example, |
| binary data; however, there is a "high-precision number", expressed |
| as a character string in JSON syntax. UBJSON is not optimized for |
| code compactness, and its type byte coding is optimized for human |
| recognition and not for compact representation of native types such |
| as small integers. Although UBJSON is mostly counted, it provides a |
| reserved "unknown-length" value to support streaming of arrays and |
| maps (JSON objects). Within these containers, UBJSON also has a |
| "Noop" type for padding. |
| |
| E.5. MSDTP: RFC 713 |
| |
| The Message Services Data Transmission Protocol (MSDTP) is a very |
| early example of a compact message format; it is described in |
| [RFC0713], written in 1976. It is included here for its historical |
| value, not because it was ever widely used. |
| |
| E.6. Conciseness on the Wire |
| |
| While CBOR's design objective of code compactness for encoders and |
| decoders is a higher priority than its objective of conciseness on |
| the wire, many people focus on the wire size. Table 6 shows some |
| encoding examples for the simple nested array [1, [2, 3]]; where some |
| form of indefinite-length encoding is supported by the encoding, |
| [_ 1, [2, 3]] (indefinite length on the outer array) is also shown. |
| |
| +---------------+-------------------------+-------------------------+ |
| | Format | [1, [2, 3]] | [_ 1, [2, 3]] | |
| +---------------+-------------------------+-------------------------+ |
| | RFC 713 | c2 05 81 c2 02 82 83 | | |
| | | | | |
| | ASN.1 BER | 30 0b 02 01 01 30 06 02 | 30 80 02 01 01 30 06 02 | |
| | | 01 02 02 01 03 | 01 02 02 01 03 00 00 | |
| | | | | |
| | MessagePack | 92 01 92 02 03 | | |
| | | | | |
| | BSON | 22 00 00 00 10 30 00 01 | | |
| | | 00 00 00 04 31 00 13 00 | | |
| | | 00 00 10 30 00 02 00 00 | | |
| | | 00 10 31 00 03 00 00 00 | | |
| | | 00 00 | | |
| | | | | |
| | UBJSON | 61 02 42 01 61 02 42 02 | 61 ff 42 01 61 02 42 02 | |
| | | 42 03 | 42 03 45 | |
| | | | | |
| | CBOR | 82 01 82 02 03 | 9f 01 82 02 03 ff | |
| +---------------+-------------------------+-------------------------+ |
| |
| Table 6: Examples for Different Levels of Conciseness |
| |
| Authors' Addresses |
| |
| Carsten Bormann |
| Universitaet Bremen TZI |
| Postfach 330440 |
| D-28359 Bremen |
| Germany |
| |
| Phone: +49-421-218-63921 |
| EMail: cabo@tzi.org |
| |
| |
| Paul Hoffman |
| VPN Consortium |
| |
| EMail: paul.hoffman@vpnc.org |
| |