x86_64 MULADDC assembly: add missing constraints about memory

MULADDC_CORE reads from (%%rsi) and writes to (%%rdi). This fragment is
repeated up to 16 times, and %%rsi and %%rdi are s and d on entry
respectively. Hence the complete asm statement reads 16 64-bit words
from memory starting at s, and writes 16 64-bit words starting at d.

Without any declaration of modified memory, Clang 12 and Clang 13 generated
non-working code for mbedtls_mpi_mod_exp. The constraints make the unit
tests pass with Clang 12.

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
diff --git a/include/mbedtls/bn_mul.h b/include/mbedtls/bn_mul.h
index 17d057f..1dea22b 100644
--- a/include/mbedtls/bn_mul.h
+++ b/include/mbedtls/bn_mul.h
@@ -189,9 +189,9 @@
         "addq   $8, %%rdi\n"
 
 #define MULADDC_STOP                        \
-        : "+c" (c), "+D" (d), "+S" (s)      \
-        : "b" (b)                           \
-        : "rax", "rdx", "r8"                \
+        : "+c" (c), "+D" (d), "+S" (s), "+m" (*(uint64_t (*)[16]) d) \
+        : "b" (b), "m" (*(const uint64_t (*)[16]) s)                 \
+        : "rax", "rdx", "r8"                                         \
     );
 
 #endif /* AMD64 */