Duplicate map label detection for encoding (#209)
This adds duplicate map label detection during encoding as part of sorting.
Map sorting was substantially reworked.
UsefulOutBuf_Compare() now takes an explicit length for each of the two regions it compares, which makes it more generally useful.
* Duplicate detection for encoding
* rework UsefulOutBuf_Compare and test
* Dup detection seems to be working
* Final tidy-up
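
For reviewers, the idea of detecting duplicates as part of sorting can be sketched as below. This is only an illustration under stated assumptions, not the code in this change: ExampleLabel, ExampleLabel_Compare() and ExampleLabels_SortAndCheck() are hypothetical names, and the sketch sorts an array of label descriptors rather than swapping encoded items in the output buffer. The sign convention follows the UsefulOutBuf_Compare() description (positive when the first string sorts ahead, including when it is a shorter prefix of the second).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one encoded map label; not a QCBOR type. */
typedef struct {
   const uint8_t *pBytes;
   size_t         uLen;
} ExampleLabel;

/* Returns positive if a sorts ahead of b, negative if b sorts ahead
 * of a, and 0 only if both are byte-for-byte identical. A string that
 * is a strict prefix of the other sorts ahead of it. */
static int ExampleLabel_Compare(ExampleLabel a, ExampleLabel b)
{
   size_t uMin = a.uLen < b.uLen ? a.uLen : b.uLen;

   for(size_t i = 0; i < uMin; i++) {
      if(a.pBytes[i] != b.pBytes[i]) {
         return (int)b.pBytes[i] - (int)a.pBytes[i];
      }
   }
   if(a.uLen == b.uLen) {
      return 0;
   }
   return a.uLen < b.uLen ? 1 : -1;
}

/* Bubble sort that only ever compares and swaps adjacent labels.
 * Returns 0 on success and -1 as soon as two adjacent labels compare
 * equal. The last pass makes no swaps and visits every adjacent pair
 * of the sorted array, so any duplicate is found by then. */
static int ExampleLabels_SortAndCheck(ExampleLabel *pLabels, size_t uCount)
{
   int bSwapped;

   assert(pLabels != NULL || uCount == 0);

   do {
      bSwapped = 0;
      for(size_t i = 0; i + 1 < uCount; i++) {
         int nCmp = ExampleLabel_Compare(pLabels[i], pLabels[i + 1]);
         if(nCmp == 0) {
            return -1; /* duplicate label */
         }
         if(nCmp < 0) {
            /* Out of order: the second label sorts ahead of the first. */
            ExampleLabel Tmp = pLabels[i];
            pLabels[i]       = pLabels[i + 1];
            pLabels[i + 1]   = Tmp;
            bSwapped         = 1;
         }
      }
   } while(bSwapped);

   return 0;
}
```

Because equal labels must end up next to each other once the array is sorted, the duplicate check needs no comparisons beyond the ones the sort already makes.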
---------
Co-authored-by: Laurence Lundblade <lgl@securitytheory.com>
diff --git a/inc/qcbor/UsefulBuf.h b/inc/qcbor/UsefulBuf.h
index eb0a691..1a4a3bf 100644
--- a/inc/qcbor/UsefulBuf.h
+++ b/inc/qcbor/UsefulBuf.h
@@ -1,6 +1,6 @@
/* =========================================================================
* Copyright (c) 2016-2018, The Linux Foundation.
- * Copyright (c) 2018-2022, Laurence Lundblade.
+ * Copyright (c) 2018-2024, Laurence Lundblade.
* Copyright (c) 2021, Arm Limited. All rights reserved.
* All rights reserved.
*
@@ -43,6 +43,7 @@
when who what, where, why
-------- ---- --------------------------------------------------
+ 28/02/2022 llundblade Rearrange UsefulOutBuf_Compare().
19/11/2023 llundblade Add UsefulOutBuf_GetOutput().
19/11/2023 llundblade Add UsefulOutBuf_Swap().
19/11/2023 llundblade Add UsefulOutBuf_Compare().
@@ -1401,34 +1402,41 @@
*
* @param[in] pUOutBuf Pointer to the @ref UsefulOutBuf.
* @param[in] uStart1 Offset of first bytes to compare.
- * @param[in] uStart2 Offset of second bytes to compare.
+ * @param[in] uLen1 Length of first bytes to compare.
+ * @param[in] uStart2 Offset of second bytes to compare.
+ * @param[in] uLen2 Length of second bytes to compare.
*
* @return 0 for equality, positive if uStart1 is lexographically larger,
* negative if uStart2 is lexographically larger.
- *
+ *
* This looks into bytes that have been output at the offsets @c start1
* and @c start2. It compares bytes at those two starting points until
- * they are not equal or the end of the output data is reached from
- * one of the starting points.
+ * they are not equal or @c uLen1 or @c uLen2 is reached. If the
+ * length of the string given is off the end of the output data, the
+ * string will be effectively truncated to the data in the output
+ * buffer for the comparison.
*
* This returns positive when @c uStart1 lexographically sorts ahead
* of @c uStart2 and vice versa. Zero is returned if the strings
- * compare equally. This only happens when the end of the valid data
- * is reached from one of the starting points and the comparison up to
- * that point is equality.
+ * compare equally.
+ *
+ * If lengths are unequal and the first string is an exact prefix of
+ * the second string, then a positive value will be returned and vice
+ * versa.
*
* If either start is past the end of data in the output buffer, 0
* will be returned. It is the caller's responsibility to make sure
- * the offsets are not off the end such that a comparison is actually
+ * the offsets are not off the end so that a comparison is actually
* being made. No data will ever be read off the end of the buffer so
* this safe no matter what offsets are passed.
*
* This is a relatively odd function in that it works on data in the
- * output buffer. It is employed by QCBOR to sort CBOR-encoded maps that
- * are in the output buffer.
+ * output buffer. It is employed by QCBOR to sort CBOR-encoded maps
+ * that are in the output buffer.
*/
-int UsefulOutBuf_Compare(UsefulOutBuf *pUOutBuf, size_t uStart1, size_t uStart2);
-
+int UsefulOutBuf_Compare(UsefulOutBuf *pUOutBuf,
+ size_t uStart1, size_t uLen1,
+ size_t uStart2, size_t uLen2);
/**
* @brief Swap two regions of output bytes.
diff --git a/inc/qcbor/qcbor_encode.h b/inc/qcbor/qcbor_encode.h
index f27af1f..694c2a3 100644
--- a/inc/qcbor/qcbor_encode.h
+++ b/inc/qcbor/qcbor_encode.h
@@ -2188,7 +2188,7 @@
* This is the same as QCBOREncode_CloseMap(), but the open map that
* is being close must be of indefinite length.
*/
-static void
+static void
QCBOREncode_CloseMapIndefiniteLength(QCBOREncodeContext *pCtx);
@@ -2198,22 +2198,25 @@
* @param[in] pCtx The encoding context to close the map in .
*
* This is the same as QCBOREncode_CloseMap() except it sorts the map
- * per RFC 8949 Section 4.2.1. This sort is lexicographic of the CBOR-encoded
- * map labels.
+ * per RFC 8949 Section 4.2.1 and checks for duplicate map keys. This
+ * sort is lexicographic of the CBOR-encoded map labels.
*
* This is more expensive than most things in the encoder. It uses
- * bubble sort which runs in n-squared time where n is the number of
- * map items. Sorting large maps on slow CPUs might be slow. This is
- * also increases the object code size of the encoder by about 30%
+ * bubble sort which runs in n-squared time where @c n is the number
+ * of map items. Sorting large maps on slow CPUs might be slow. This
+ * also increases the object code size of the encoder by about 30%
* (500-1000 bytes).
*
- * Bubble sort was selected so as to not need an extra buffer to track
- * map item offsets. Bubble sort works well even though map items are
- * not all the same size because it always swaps adjacent items.
+ * Bubble sort was selected so as to not require configuration of
+ * a buffer to track map item offsets. Bubble sort works well even
+ * though map items are not all the same size because it always swaps
+ * adjacent items.
*/
-void QCBOREncode_CloseAndSortMap(QCBOREncodeContext *pCtx);
+void
+QCBOREncode_CloseAndSortMap(QCBOREncodeContext *pCtx);
-void QCBOREncode_CloseAndSortMapIndef(QCBOREncodeContext *pCtx);
+void
+QCBOREncode_CloseAndSortMapIndef(QCBOREncodeContext *pCtx);
/**