Add DSP support for fully connected int16

Authors: Sebastian-Larsson, oscarandersson8218 (GitHub)

Adding a DSP implementation for fully connected int16, as well as a
unit test.

Also replacing __SXTB16(__ROR()) with __SXTB16_RORn in read_and_pad,
since the new implementation uses read_and_pad and the combined
intrinsic is faster with GCC.
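
For reference, a minimal portable sketch of why the two forms are
interchangeable. The helper names below (ror32, sxtb16, sxtb16_ror8,
rotations_match) are hypothetical stand-ins written in plain C, not
the CMSIS intrinsics themselves, and a rotation of 8 bits is assumed
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit rotate right (what __ROR computes). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Sign-extend one byte to a 16-bit lane, returned in the low 16 bits. */
static uint32_t sext8_to_lane(uint32_t byte)
{
    int32_t v = (int32_t)((byte & 0xFFu) ^ 0x80u) - 0x80;
    return (uint32_t)v & 0xFFFFu;
}

/* Stand-in for SXTB16: sign-extend bytes 0 and 2 into two 16-bit lanes. */
static uint32_t sxtb16(uint32_t x)
{
    return (sext8_to_lane(x >> 16) << 16) | sext8_to_lane(x);
}

/* Stand-in for the fused form with a rotation of 8: bytes 1 and 3 are
 * sign-extended directly, with no separate rotate step. */
static uint32_t sxtb16_ror8(uint32_t x)
{
    return (sext8_to_lane(x >> 24) << 16) | sext8_to_lane(x >> 8);
}

/* Returns 1 when rotate-then-extend equals the fused form for x. */
static int rotations_match(uint32_t x)
{
    return sxtb16(ror32(x, 8)) == sxtb16_ror8(x);
}

/* Small self-check over a few bit patterns. */
static void self_check(void)
{
    assert(rotations_match(0x00000000u));
    assert(rotations_match(0xFFFFFFFFu));
    assert(rotations_match(0x807F01FFu));
}
```

Both forms yield the same two sign-extended lanes; the fused form only
removes the separate rotate step.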

Change-Id: I4a40a3f9137435e822957c6c8f971e65c1bf9706
diff --git a/ARM.CMSIS.pdsc b/ARM.CMSIS.pdsc
index c9d4551..9267266 100644
--- a/ARM.CMSIS.pdsc
+++ b/ARM.CMSIS.pdsc
@@ -13,7 +13,7 @@
       CMSIS-DSP: 1.10.0 (see revision history for details)
       CMSIS-NN: 3.1.0 (see revision history for details)
        - Support for int16 convolution and fully connected for reference implementation
-       - Support for DSP extension optimization for int16 convolution
+       - Support for DSP extension optimization for int16 convolution and fully connected
     </release>
     <release version="5.8.0" date="2021-06-24">
       CMSIS-Core(M): 5.5.0 (see revision history for details)