Big documentation update, particularly tags (#290)
* Big documentation update, particularly tags
* More documentation fixes
---------
Co-authored-by: Laurence Lundblade <lgl@securitytheory.com>
diff --git a/doc/Tagging.md b/doc/Tagging.md
index abb75f9..acaa1f7 100644
--- a/doc/Tagging.md
+++ b/doc/Tagging.md
@@ -1,31 +1,50 @@
@anchor CBORTags
-# Types and Tagging in CBOR
+# QCBOR-oriented Introduction to Tags
## New Types
-CBOR provides a means for defining new data types beyond the primitive
-types like integers and strings. These new types can be the simple
-association of some further semantics with a primitive type or they
-can be large complex aggregations.
+CBOR allows for the definition of new data types beyond basic
+primitives like integers, strings, arrays and such. These new types can either be
+simple extensions of a primitive type with additional semantics or
+more complex structures involving large aggregations.
-The explicit means for identifying these new types as called tagging.
-It uses a simple unsigned integer known as the tag number to indicate
-that the following CBOR is of that type. Officially speaking, a "tag"
-is made up of exactly a tag number and tag content. The tag content
-is exactly a single data item, but note that a single data item can be
-a map or an array which contains further nested maps and arrays and
-thus arbitrarily complex.
+The mechanism for identifying these new types is called tagging.
+Tagging uses a simple unsigned integer to indicate that the following
+CBOR item is a different type.
-The tag numbers can be up to UINT64_MAX, so there are a lot of them.
-Some defined in standards and registered in the IANA CBOR Tag
-Registry. A very large range is available for proprietary tags.
+For example, when an encoded integer is preceded by the encoded
+tag number 1, the integer represents an epoch date.
+It's important to note that CBOR uses the word "tag" in an unusual
+way. In CBOR, a "tag" refers to the combination of the tag number and
+the tag content. By the normal dictionary definition, a "tag" would
+be just a tag number, not an aggregation of tag number and tag content.
+By analogy, if you attach a small label to an elephant's ear, the
+"tag" in CBOR terms would be the combination of the label (tag number)
+and the elephant (tag content).
+
+QCBOR always uses the term "tag number" to refer to the integer that
+identifies the type, "tag content" to refer to the target of the
+indicating integer and "tag" to refer to the full combination of both.
+
+The tag content is always a single data item. However, this item can
+itself be a complex structure, such as a map or an array, which may
+contain nested maps and arrays, allowing for arbitrarily complex tag
+content.
+
+Tag numbers can range up to UINT64_MAX, providing a large number of
+possible tags. Some are defined by standards and registered in the
+IANA CBOR Tag Registry, while a substantial range is available for
+proprietary use.
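To make the byte-level picture concrete, here is a minimal sketch of how a tag number and its content are laid out on the wire. It does not use QCBOR; the helper names `cbor_put_head` and `encode_epoch_date_tag` are hypothetical and exist only for this illustration. It assumes the shortest-form head encoding from RFC 8949 and shows tag number 1 followed by an unsigned integer as the tag content.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CBOR major types used in this sketch (RFC 8949, section 3.1). */
enum { CBOR_MAJOR_UINT = 0, CBOR_MAJOR_TAG = 6 };

/*
 * Hypothetical helper: write one CBOR head (major type plus a 64-bit
 * argument) in its shortest form. Returns the number of bytes written,
 * between 1 and 9. The caller must provide at least 9 bytes of room.
 */
size_t cbor_put_head(uint8_t *out, unsigned major, uint64_t arg)
{
    uint8_t mt;
    size_t extra; /* number of argument bytes after the initial byte */

    assert(out != NULL && major <= 7);
    mt = (uint8_t)(major << 5);

    if (arg < 24) {
        out[0] = (uint8_t)(mt | arg);
        return 1;
    } else if (arg <= UINT8_MAX) {
        out[0] = (uint8_t)(mt | 24u);
        extra = 1;
    } else if (arg <= UINT16_MAX) {
        out[0] = (uint8_t)(mt | 25u);
        extra = 2;
    } else if (arg <= UINT32_MAX) {
        out[0] = (uint8_t)(mt | 26u);
        extra = 4;
    } else {
        out[0] = (uint8_t)(mt | 27u);
        extra = 8;
    }

    /* The argument bytes follow in big-endian order. */
    for (size_t i = 0; i < extra; i++) {
        out[1 + i] = (uint8_t)(arg >> (8 * (extra - 1 - i)));
    }
    return 1 + extra;
}

/*
 * Hypothetical helper: tag number 1 followed by an unsigned integer.
 * The two heads together form one "tag" in CBOR terms.
 * Needs up to 10 bytes of room in out.
 */
size_t encode_epoch_date_tag(uint8_t *out, uint64_t seconds)
{
    size_t n = cbor_put_head(out, CBOR_MAJOR_TAG, 1);  /* tag number  */
    n += cbor_put_head(out + n, CBOR_MAJOR_UINT, seconds); /* tag content */
    return n;
}
```

Because the tag number travels in the same 64-bit argument field as an integer value, the same helper covers tag numbers all the way up to UINT64_MAX. With QCBOR itself, the tag number would be written with QCBOREncode_AddTagNumber() rather than by hand.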
+
+
+@anchor AreTagsOptional
## Are Tags "Optional"?
The description of tags in
-[RFC 7049] (https://tools.ietf.org/html/rfc7049) and in some other
+[RFC 7049](https://tools.ietf.org/html/rfc7049) and in some other
places can lead one to think they are optional prefixes that can be
ignored at times. This is not true.
@@ -35,11 +54,11 @@
ignoring a typedef or struct in C.
However, it is common in CBOR-based protocols to use the format,
-semantics or definition of the tag content without it actually being a
+semantics and definition of the tag content without it actually being a
*tag*. One can think of this as *borrowing* the tag content or implied
type information.
-For example, [RFC 8392] (https://tools.ietf.org/html/rfc8392) which
+For example, [RFC 8392](https://tools.ietf.org/html/rfc8392) which
defines a CBOR Web Token, a CWT, says that the NumericDate field is
represented as a CBOR numeric date described as tag 1 in the CBOR
standard, but with the tag number 1 omitted from the encoding. A
@@ -73,42 +92,52 @@
Finally, every CBOR protocol should explicitly spell out how it is
using each tag, borrowing tag content and such. If the protocol you
are trying to implement doesn't, ask the designer. Generally,
-protocols designs should not allow for some data item to optional be
+protocol designs should not allow for some data item to be
either a tag or to be the borrowed tag content. While allowing this
tag optionality is a form of Postel's law, "be liberal in what you
accept", current wisdom is somewhat the opposite.
-## Types and Tags in QCBOR
+## QCBOR Tag APIs
-QCBOR explicitly supports all the tags defined in
-[RFC 7049] (https://tools.ietf.org/html/rfc7049). It has specific APIs
-and data structures for encoding and decoding them.
+The encoding APIs are in @ref inc/qcbor/qcbor_tag_encode.h "qcbor_tag_encode.h"
+and the decoding APIs are in @ref inc/qcbor/qcbor_tag_decode.h "qcbor_tag_decode.h".
-These APIs and structures can support either the full and proper tag
-form or the borrowed content form that is not a tag.
+The base primitives for encoding and decoding tag numbers are
+QCBOREncode_AddTagNumber() and QCBORDecode_VGetNextTagNumber(). These
+are used in constructing and decoding tags. Note that for decoding,
+all tag numbers have to be consumed before decoding the tag
+content. This is different from QCBOR v1 where tag numbers did not
+have to be explicitly consumed.
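The "consume every tag number first" rule can be pictured with a small standalone sketch. This is not QCBOR's implementation and does not call QCBOR; `cbor_get_head` and `consume_tag_numbers` are hypothetical names used only here. The sketch assumes definite-length heads and simply reads heads until it reaches one that is not a tag number, which is then the head of the tag content.

```c
#include <stddef.h>
#include <stdint.h>

#define CBOR_MAJOR_TAG 6u

/*
 * Hypothetical helper: decode one CBOR head at in[*pos] and advance
 * *pos past it. Returns 0 on success, -1 on truncated or unsupported
 * input.
 */
int cbor_get_head(const uint8_t *in, size_t len, size_t *pos,
                  unsigned *major, uint64_t *arg)
{
    if (*pos >= len) {
        return -1;
    }
    uint8_t initial = in[(*pos)++];
    unsigned info = initial & 0x1Fu;
    *major = (unsigned)(initial >> 5);

    if (info < 24) {
        *arg = info;
        return 0;
    }
    if (info > 27) {
        return -1; /* indefinite lengths and reserved values: not handled */
    }
    size_t extra = (size_t)1 << (info - 24); /* 1, 2, 4 or 8 bytes */
    if (len - *pos < extra) {
        return -1;
    }
    *arg = 0;
    for (size_t i = 0; i < extra; i++) {
        *arg = (*arg << 8) | in[(*pos)++];
    }
    return 0;
}

/*
 * Hypothetical helper: read all tag numbers in front of an item,
 * outermost first, then stop at the head of the tag content.
 * Returns 0 on success, -1 on bad input or too many tag numbers.
 */
int consume_tag_numbers(const uint8_t *in, size_t len, size_t *pos,
                        uint64_t *tags, size_t max_tags, size_t *num_tags,
                        unsigned *content_major, uint64_t *content_arg)
{
    *num_tags = 0;
    for (;;) {
        unsigned major;
        uint64_t arg;

        if (cbor_get_head(in, len, pos, &major, &arg) != 0) {
            return -1;
        }
        if (major != CBOR_MAJOR_TAG) {
            /* Not a tag number, so this head starts the tag content. */
            *content_major = major;
            *content_arg = arg;
            return 0;
        }
        if (*num_tags == max_tags) {
            return -1;
        }
        tags[(*num_tags)++] = arg;
    }
}
```

With QCBOR, the equivalent step is to call QCBORDecode_VGetNextTagNumber() until no tag numbers remain and only then fetch the content.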
-The original QCBOR APIs for encoding tags did not allow for encoding
-the borrowed content format. They only support the proper tag
-format. With spiffy decode, a second set of APIs was added that takes
-and argument to indicate whether the proper tag should be output or
-just the borrowed content format should be output. The first set are
-the "AddXxxx" functions and the second the "AddTXxxx" functions.
+QCBOR also provides APIs for directly encoding and decoding all the
+tags standardized in [RFC 8949](https://tools.ietf.org/html/rfc8949)
+for dates, big numbers and such. For encoding their names start with
+"QCBOREncode_AddT" and for decoding they start with "QCBOREncode_Get"
+(TODO: fix this). These APIs can handle
-When decoding with QCBORDecode_GetNext(), the non-spiffy decode API,
-the proper tag form is automatically recognized by the tag number and
-decoded into QCBORItem. This decoding API however cannot recognize
-borrowed content format. The caller must tell QCBOR when borrowed
-content format is expected.
+These APIs and structures support both the full tag form and the
+borrowed content form that is not a tag. An argument of type @ref xxx
+and @ref QCBORDecodeTagReq is provided respectively to the tag encode
+and decode functions to distinguish between full tags and borrowed
+content.
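The distinction between a full tag and borrowed content can be sketched as a small decision function. Everything named here (`tag_req`, `parsed_item`, `get_epoch_date` and the result codes) is hypothetical and is not QCBOR's API. The exact error behavior is an assumption made for illustration, in particular the choice to reject a tag number when borrowed content was requested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the caller's stated expectation. */
enum tag_req {
    REQ_FULL_TAG,         /* tag number 1 must be present     */
    REQ_BORROWED_CONTENT, /* tag number must be absent        */
    REQ_EITHER            /* accept both; not recommended     */
};

/* Hypothetical already-parsed item: an integer, possibly tagged. */
struct parsed_item {
    bool     has_tag_number;
    uint64_t tag_number;
    int64_t  seconds;
};

enum date_result {
    DATE_OK,
    DATE_ERR_TAG_MISSING,
    DATE_ERR_UNEXPECTED_TAG
};

/*
 * Decide whether the item is acceptable as an epoch date under the
 * caller's requirement. On success the seconds count is copied out.
 */
enum date_result get_epoch_date(const struct parsed_item *item,
                                enum tag_req req, int64_t *seconds)
{
    /* Any tag number other than 1 means the item is a different type. */
    if (item->has_tag_number && item->tag_number != 1) {
        return DATE_ERR_UNEXPECTED_TAG;
    }
    if (req == REQ_FULL_TAG && !item->has_tag_number) {
        return DATE_ERR_TAG_MISSING;
    }
    /* Assumption: a protocol that borrows content forbids the tag number. */
    if (req == REQ_BORROWED_CONTENT && item->has_tag_number) {
        return DATE_ERR_UNEXPECTED_TAG;
    }
    *seconds = item->seconds;
    return DATE_OK;
}
```

A CWT-style field, which omits tag number 1, would be decoded with the borrowed-content setting, while a protocol that carries the real tag would use the full-tag setting.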
-The spiffy decode APIs for the particular tags are the way the caller
-tells QCBOR to expect borrowed content format. These spiffy decode
-APIs can also decode the proper tag as well. When asked to decode a
-proper tag and the input CBOR is not, it is a decode validity
-error. These APIs take an argument which says whether to expect the
-proper tag or just the borrowed content. They can also be told to
-allow either to "be liberal in what you accept", but this is not
-recommended.
+Early versions of QCBOR did not support encoding borrowed content. The
+old APIs for dates, big numbers and such are listed as deprecated, but
+will continue to be supported. The encode side has functions like
+QCBOREncode_AddDateEpoch() rather than
+QCBOREncode_AddTDateEpoch(). The tag decode APIs always supported
+borrowed content.
+
+Last, QCBORDecode_InstallTagDecoders() allows callbacks to be
+installed that will fire on a particular tag number. These callbacks
+decode the tag content and put it into a QCBORItem with a new QCBOR
+data type. The decoded tags show up as a @ref QCBORItem fetched by
+QCBORDecode_VGetNext().
+
+A set of callbacks called @ref QCBORDecode_TagDecoderTablev1 is
+provided for all the standard tags from RFC 8949. These were built
+into QCBOR v1, but are not automatically installed in QCBOR v2.
## Nested Tags
@@ -117,8 +146,9 @@
encloses another, the enclosed tag is the content for the enclosing
tag.
-Encoding nested tags is easy with QCBOREncode_AddTagNumber(). Just call it
-several times before calling the functions to encode the tag content.
+Encoding nested tags is easy with QCBOREncode_AddTagNumber(). Just
+call it several times before calling the functions to encode the tag
+content.
When QCBOR decodes tags it does so by first completely processing the
built-in tags that it knows how to process. It returns that processed
@@ -144,6 +174,7 @@
the decode context and have to be fetched with
QCBORDecode_GetNthTagOfLast().
+
## Tags for Encapsulated CBOR
Tag 24 and 63 deserve special mention. The content of these tags is a
@@ -154,7 +185,7 @@
either a map or an array or a tag that is defined to be complex and
nested. With tag 63, the content can be a sequence of integers not
held together in a map or array. Tag 63 is defined in
-[RFC 8742] (https://tools.ietf.org/html/rfc8742).
+[RFC 8742](https://tools.ietf.org/html/rfc8742).
The point of encapsulating CBOR this way is often so it can be
cryptographically signed. It works well with off-the-shelf encoders
@@ -190,6 +221,7 @@
payload is known to contain CBOR, like the case of a CWT, then QCBOR's
QCBORDecode_EnterBstrWrapped() can be used to decode it.
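As a byte-level illustration of tag 24, the sketch below wraps an already-encoded CBOR item in a byte string and prefixes it with tag number 24. It does not use QCBOR; `wrap_in_tag24` is a hypothetical name, and the sketch assumes payloads of at most 65535 bytes to keep the length encoding short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: emit 24(h'<cbor>') into out.
 * Returns the number of bytes written, or 0 if the payload is longer
 * than this sketch supports or the output buffer is too small.
 */
size_t wrap_in_tag24(uint8_t *out, size_t cap,
                     const uint8_t *cbor, size_t cbor_len)
{
    size_t n = 0;
    size_t head_len;

    if (cbor_len > 0xFFFF) {
        return 0; /* longer payloads are outside this sketch */
    }
    head_len = cbor_len < 24 ? 1 : (cbor_len <= 0xFF ? 2 : 3);
    if (cap < 2 + head_len + cbor_len) {
        return 0;
    }

    out[n++] = 0xD8; /* major type 6 (tag), tag number in the next byte */
    out[n++] = 24;   /* tag number 24: encoded CBOR data item           */

    /* Byte string head (major type 2) carrying the payload length. */
    if (cbor_len < 24) {
        out[n++] = (uint8_t)(0x40 | cbor_len);
    } else if (cbor_len <= 0xFF) {
        out[n++] = 0x58;
        out[n++] = (uint8_t)cbor_len;
    } else {
        out[n++] = 0x59;
        out[n++] = (uint8_t)(cbor_len >> 8);
        out[n++] = (uint8_t)(cbor_len & 0xFF);
    }

    /* The wrapped bytes are copied verbatim and are not re-encoded. */
    if (cbor_len > 0) {
        memcpy(out + n, cbor, cbor_len);
    }
    return n + cbor_len;
}
```

Because the inner bytes are carried unchanged inside a byte string, a signature computed over them stays valid regardless of how the outer message is encoded.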
+
## Tags that Can be Ignored
There are a few specially defined tags that can actually be
@@ -216,98 +248,16 @@
## Standard Tags and the Tags Registry
Tags used in CBOR protocols should at least be registered in the
-[IANA CBOR Tags Registry] (https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml).
-A small number of tags (0-23), are full IETF standards. Further, tags
-24-255 require published documentation, but are not full IETF
-standards. Beyond tag 255, the tags are first come first served.
+[IANA CBOR Tags Registry](https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml).
+A small number of tags (0-23) must be full IETF standards. Tags
+24-255 require published documentation. Beyond tag 255, the tags are
+first come first served. Any tag can be an IETF standard if the
+authors choose to take it through the process.
There is no range for private use, so any tag used in a CBOR protocol
should be registered. The range of tag values is very large to
accommodate this.
-As described above, It is common to use data types from the registry
-in a CBOR protocol without the explicit tag, so in a way the registry
-is a registry of data types.
-
-
-
-## Tag Decoding
-
-QCBOR offers two ways to decoding tags.
-
-The first is by registering a call back that can transform the tag into
-a QCBORItem itendified by a new QCBOR Type. It is limited in that
-the decoded data must fit into the 24 bytes of a QCBORItem values. It is
-good for new data types.
-
-The second is by getting tag numbers in the course of decoding. This
-is more suitable for tags numbers that indicate message types, those
-that alter the decode flow.
-
-QCBOR v2 (when not in v1 compatibility) requires all tags be consumed.
-If they are not consumed by one of the above methods, xxxx error occurs.
-They are never optional (as they were described in RFC 7049) just is
-it is not optional to ignore whether an item is a string rather than
-an integer.
-
-In v2
-
-TODO: make clean this up
-
-When asking for specific tag decode, for example GetDateEpoch()
-
-Tag required
- - No tag gives error xxxx
- - The epoch date tag by itself succeeds
- - The epoch date tag with wrong content gives error yyy
- - The epoch date tag with additional
- - The additional have been consumed -- suceeds
- - The aditional tags have not been consumed -- gives error aaa
- - Another tag gives --- error zzz
-
- Tag not required
- - No tags, correct tag content -- success
- - No tags, incorrect tag content type error yyy
- - Another tag, not consumed --- error aaa
- - Another tag consumed -- success
- - Another tag consumed and made into another type --- error xxxx
-
- Tag optional
- - No tags, correct content -- success
- - No tags, incorrect content -- error yyy
- - Expected tag -- success
- - Another tag, consumed -- success
- - Another tag, not consumed tag content correct -- error, probably aaa
- - Another tag, consumed and made into another type -- error xxx
- - Expected tag + another tag, not consumed -- error aaa
-
-
- Now fan out for ALLOW_EXTRA --- yuckkkkk
-
- Ignore ALLOW_EXTRA in v2?
-
- Fan out for v1
-
-
-
-
- 0(140) good date tag
- 50000(140)
- interpret as a date with some other tag on it -- must be consumed, so unconsumed tag error
- intepret this as not a date -- wrong type error
- subjective depending on whether tag content decoder is installed
- 1(140) same as above
-
- Tag optional
-
-
-
-
-
-
-## See Also
-
-See @ref Tags-Overview and @ref Tag-Usage.
-
-
-
+As described above, it is common to use data types from the registry
+in a CBOR protocol without the explicit tag, to borrow the content, so
+in a way the IANA registry is a registry of data types.
diff --git a/doc/mainpage.dox b/doc/mainpage.dox
index 18c27f8..2a723ce 100644
--- a/doc/mainpage.dox
+++ b/doc/mainpage.dox
@@ -3,13 +3,28 @@
@par Table of Contents
API Reference
-- Encoding functions: @ref inc/qcbor/qcbor_encode.h "qcbor_encode.h"
-- Main/Basic decode functions: @ref inc/qcbor/qcbor_main_decode.h "qcbor_main_decode.h"
-- Spiffy decode functions: @ref inc/qcbor/qcbor_spiffy_decode.h "qcbor_spiffy_decode.h"
-- Tag decode functions: @ref inc/qcbor/qcbor_tag_decode.h "qcbor_tag_decode.h"
-- Number decode functions: @ref inc/qcbor/qcbor_number_decode.h "qcbor_number_decode.h"
+- Common
+ - Error codes and common constants: @ref inc/qcbor/qcbor_common.h "qcbor_common.h"
+- Encoding
+ - Main/Basic encode functions: @ref inc/qcbor/qcbor_main_encode.h "qcbor_main_encode.h"
+ - Number encode functions: @ref inc/qcbor/qcbor_number_encode.h "qcbor_number_encode.h"
+ - Tag encode functions: @ref inc/qcbor/qcbor_tag_encode.h "qcbor_tag_encode.h"
+- Decoding
+ - Main/Basic decode functions: @ref inc/qcbor/qcbor_main_decode.h "qcbor_main_decode.h"
+ - Spiffy decode functions: @ref inc/qcbor/qcbor_spiffy_decode.h "qcbor_spiffy_decode.h"
+ - Tag decode functions: @ref inc/qcbor/qcbor_tag_decode.h "qcbor_tag_decode.h"
+ - Number decode functions: @ref inc/qcbor/qcbor_number_decode.h "qcbor_number_decode.h"
+
+Note: the API Reference is largely complete; the subject matter below is partial.
Subject Matter
-- @ref SpiffyDecode
+- @ref Overview "QCBOR overview and implementation limits"
+- @ref Building "Building QCBOR with make and cmake"
+- @ref CodeSize "Minimizing the amount of code linked and disabling features"
+- @ref SpiffyDecode "The 'spiffy' decode functions for easier map decoding"
+- CBOR Numbers
+ - @ref BigNumbers "Characteristics, encoding and decoding big numbers"
+- Tags
+ - @ref CBORTags
*/