Make generic code work in presence of system caches

On the ARMv8 architecture, cache maintenance operations by set/way on the last
level of integrated cache do not affect the system cache. This means that such a
flush or clean operation could result in the data being pushed out to the system
cache rather than main memory. Another CPU could access this data before it
enables its data cache or MMU. Such accesses could be serviced from the main
memory instead of the system cache. If the data in the sysem cache has not yet
been flushed or evicted to main memory then there could be a loss of
coherency. The only mechanism to guarantee that the main memory will be updated
is to use cache maintenance operations to the PoC by MVA(See section D3.4.11
(System level caches) of ARMv8-A Reference Manual (Issue A.g/ARM DDI0487A.G).

This patch removes the reliance of Trusted Firmware on the flush by set/way
operation to ensure visibility of data in the main memory. Cache maintenance
operations by MVA are now used instead. The following are the broad category of
changes:

1. The RW areas of BL2/BL31/BL32 are invalidated by MVA before the C runtime is
   initialised. This ensures that any stale cache lines at any level of cache
   are removed.

2. Updates to global data in runtime firmware (BL31) by the primary CPU are made
   visible to secondary CPUs using a cache clean operation by MVA.

3. Cache maintenance by set/way operations are only used prior to power down.

NOTE: NON-UPSTREAM TRUSTED FIRMWARE CODE SHOULD MAKE EQUIVALENT CHANGES IN
ORDER TO FUNCTION CORRECTLY ON PLATFORMS WITH SUPPORT FOR SYSTEM CACHES.

Fixes ARM-software/tf-issues#205

Change-Id: I64f1b398de0432813a0e0881d70f8337681f6e9a
diff --git a/lib/aarch64/cache_helpers.S b/lib/aarch64/cache_helpers.S
index 0dbab1b..476b906 100644
--- a/lib/aarch64/cache_helpers.S
+++ b/lib/aarch64/cache_helpers.S
@@ -32,6 +32,7 @@
 #include <asm_macros.S>
 
 	.globl	flush_dcache_range
+	.globl	clean_dcache_range
 	.globl	inv_dcache_range
 	.globl	dcsw_op_louis
 	.globl	dcsw_op_all
@@ -39,25 +40,39 @@
 	.globl	dcsw_op_level2
 	.globl	dcsw_op_level3
 
+/*
+ * This macro can be used for implementing various data cache operations `op`
+ */
+.macro do_dcache_maintenance_by_mva op
+	dcache_line_size x2, x3
+	add	x1, x0, x1
+	sub	x3, x2, #1
+	bic	x0, x0, x3
+loop_\op:
+	dc	\op, x0
+	add	x0, x0, x2
+	cmp	x0, x1
+	b.lo    loop_\op
+	dsb	sy
+	ret
+.endm
 	/* ------------------------------------------
 	 * Clean+Invalidate from base address till
 	 * size. 'x0' = addr, 'x1' = size
 	 * ------------------------------------------
 	 */
 func flush_dcache_range
-	dcache_line_size x2, x3
-	add	x1, x0, x1
-	sub	x3, x2, #1
-	bic	x0, x0, x3
-flush_loop:
-	dc	civac, x0
-	add	x0, x0, x2
-	cmp	x0, x1
-	b.lo    flush_loop
-	dsb	sy
-	ret
+	do_dcache_maintenance_by_mva civac
 endfunc flush_dcache_range
 
+	/* ------------------------------------------
+	 * Clean from base address till size.
+	 * 'x0' = addr, 'x1' = size
+	 * ------------------------------------------
+	 */
+func clean_dcache_range
+	do_dcache_maintenance_by_mva cvac
+endfunc clean_dcache_range
 
 	/* ------------------------------------------
 	 * Invalidate from base address till
@@ -65,17 +80,7 @@
 	 * ------------------------------------------
 	 */
 func inv_dcache_range
-	dcache_line_size x2, x3
-	add	x1, x0, x1
-	sub	x3, x2, #1
-	bic	x0, x0, x3
-inv_loop:
-	dc	ivac, x0
-	add	x0, x0, x2
-	cmp	x0, x1
-	b.lo    inv_loop
-	dsb	sy
-	ret
+	do_dcache_maintenance_by_mva ivac
 endfunc inv_dcache_range