Add option to avoid 64-bit multiplication

Motivation is similar to NO_UDBL_DIVISION.

The alternative implementation of 64-bit mult is straightforward and aims at
obvious correctness. Also, visual examination of the generate assembly show
that it's quite efficient with clang, armcc5 and arm-clang. However current
GCC generates fairly inefficient code for it.

I tried to rework the code in order to make GCC generate more efficient code.
Unfortunately the only way to do that is to get rid of 64-bit add and handle
the carry manually, but this causes other compilers to generate less efficient
code with branches, which is not acceptable from a side-channel point of view.

So let's keep the obvious code that works for most compilers and hope future
versions of GCC learn to manage registers in a sensible way in that context.

See https://bugs.launchpad.net/gcc-arm-embedded/+bug/1775263
diff --git a/include/mbedtls/config.h b/include/mbedtls/config.h
index 663c984..bde5a45 100644
--- a/include/mbedtls/config.h
+++ b/include/mbedtls/config.h
@@ -85,6 +85,28 @@
 //#define MBEDTLS_NO_UDBL_DIVISION
 
 /**
+ * \def MBEDTLS_NO_64BIT_MULTIPLICATION
+ *
+ * The platform lacks support for 32x32 -> 64-bit multiplication.
+ *
+ * Used in:
+ *      library/poly1305.c
+ *
+ * Some parts of the library may use multiplication of two unsigned 32-bit
+ * operands with a 64-bit result in order to speed up computations. On some
+ * platforms, this is not available in hardware and has to be implemented in
+ * software, usually in a library provided by the toolchain.
+ *
+ * Sometimes it is not desirable to have to link to that library. This option
+ * removes the dependency of that library on platforms that lack a hardware
+ * 64-bit multiplier by embedding a software implementation in Mbed TLS.
+ *
+ * Note that depending on the compiler, this may decrease performance compared
+ * to using the library function provided by the toolchain.
+ */
+//#define MBEDTLS_NO_64BIT_MULTIPLICATION
+
+/**
  * \def MBEDTLS_HAVE_SSE2
  *
  * CPU supports SSE2 instruction set.