Fix the CAS spinlock implementation

Make the spinlock implementation use the ARMv8.1-LSE CAS instruction
only when a platform build option selects it. Previously, the CAS-based
implementation was selected unconditionally for all ARMv8.1+ platforms.

The previous CAS spinlock implementation had a bug: spin_unlock() issued
an `sev` immediately after the `stlr`, which is not sufficient. A `dsb`
is needed to ensure that the `stlr` completes before the `sev`. Because
a `dsb` is heavyweight, a better solution is to use load-exclusive
semantics to monitor the lock, so that a waiter wakes up from `wfe` when
a store to the lock occurs. This patch implements that approach.
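
The idea can be sketched in C as follows. This is an illustration only,
not the code added by this patch: the names spinlock_sketch_t,
spin_lock_sketch(), spin_unlock_sketch() and wait_for_lock_store() are
hypothetical, the lock is assumed to be a 32-bit word (0 = free,
1 = held), and the non-AArch64 branch is a plain polling fallback so the
sketch builds on any host.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
typedef struct {
	_Atomic uint32_t lock;
} spinlock_sketch_t;

/* Wait until the lock word may have changed. */
static inline void wait_for_lock_store(spinlock_sketch_t *l)
{
#if defined(__aarch64__)
	uint32_t val;

	/*
	 * The load-exclusive arms the exclusive monitor for the lock
	 * address. If the lock is still held, wfe sleeps until an event
	 * arrives; a store to the lock (such as the unlock store) clears
	 * the monitor, which generates that event. No sev is required.
	 */
	__asm__ volatile(
		"ldaxr	%w0, [%1]\n"
		"cbz	%w0, 1f\n"
		"wfe\n"
		"1:\n"
		: "=&r"(val)
		: "r"(&l->lock)
		: "memory");
#else
	/* Assumption: portable fallback that simply polls the lock. */
	while (atomic_load_explicit(&l->lock, memory_order_relaxed) != 0) {
	}
#endif
}

static inline void spin_lock_sketch(spinlock_sketch_t *l)
{
	for (;;) {
		uint32_t expected = 0;

		/* Compare-and-swap 0 -> 1 with acquire ordering. */
		if (atomic_compare_exchange_strong_explicit(
			    &l->lock, &expected, 1,
			    memory_order_acquire, memory_order_relaxed)) {
			return;
		}

		/* Lock is held: wait for a store, then retry the CAS. */
		wait_for_lock_store(l);
	}
}

static inline void spin_unlock_sketch(spinlock_sketch_t *l)
{
	/* Store-release only; waiters are woken by the store itself. */
	atomic_store_explicit(&l->lock, 0, memory_order_release);
}
```

A wfe can also return for unrelated events, which is why the waiter
always loops back and retries the compare-and-swap instead of assuming
it now owns the lock.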

Change-Id: I5283ce4a889376e4cc01d1b9d09afa8229a2e522
Signed-off-by: Soby Mathew <soby.mathew@arm.com>
Signed-off-by: Olivier Deprez <olivier.deprez@arm.com>
diff --git a/docs/design/firmware-design.rst b/docs/design/firmware-design.rst
index dc08208..2cbd9c9 100644
--- a/docs/design/firmware-design.rst
+++ b/docs/design/firmware-design.rst
@@ -2540,8 +2540,11 @@
 This Architecture Extension is targeted when ``ARM_ARCH_MAJOR`` > 8, or when
 ``ARM_ARCH_MAJOR`` == 8 and ``ARM_ARCH_MINOR`` >= 1.
 
--  The Compare and Swap instruction is used to implement spinlocks. Otherwise,
-   the load-/store-exclusive instruction pair is used.
+-  By default, a load-/store-exclusive instruction pair is used to implement
+   spinlocks. The ``USE_SPINLOCK_CAS`` build option when set to 1 selects the
+   spinlock implementation using the ARMv8.1-LSE Compare and Swap instruction.
+   Notice this instruction is only available in AArch64 execution state, so
+   the option is only available to AArch64 builds.
 
 Armv8.2-A
 ~~~~~~~~~