Enable data caches early with hardware-assisted coherency

At present, warm-booted CPUs keep their data caches disabled when
enabling the MMU, and the caches remain disabled until the CPUs enter
coherency later.

On systems with hardware-assisted coherency, for which the
HW_ASSISTED_COHERENCY build flag is enabled, warm-booted CPUs can
enable both the data caches and the MMU at once.
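
As a rough illustration of how the new behaviour would be selected, the
flag can be passed on the make command line like any other build option.
This is only a sketch: <platform> is a placeholder, not a real platform
name, and a given platform may need further options that are not shown.

```shell
# Hypothetical invocation; substitute a platform whose CPUs are
# cache-coherent out of reset for <platform>.
make PLAT=<platform> HW_ASSISTED_COHERENCY=1 all
```

Without the flag, the build keeps the existing behaviour and passes
DISABLE_DCACHE when enabling the MMU on warm boot.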

Change-Id: Icb0adb026e01aecf34beadf49c88faa9dd368327
Signed-off-by: Jeenu Viswambharan <jeenu.viswambharan@arm.com>
diff --git a/bl31/aarch64/bl31_entrypoint.S b/bl31/aarch64/bl31_entrypoint.S
index d14a68d..f6a21dc 100644
--- a/bl31/aarch64/bl31_entrypoint.S
+++ b/bl31/aarch64/bl31_entrypoint.S
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2013-2016, ARM Limited and Contributors. All rights reserved.
+ * Copyright (c) 2013-2017, ARM Limited and Contributors. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -180,24 +180,29 @@
 		_init_c_runtime=0				\
 		_exception_vectors=runtime_exceptions
 
-	/* --------------------------------------------
-	 * Enable the MMU with the DCache disabled. It
-	 * is safe to use stacks allocated in normal
-	 * memory as a result. All memory accesses are
-	 * marked nGnRnE when the MMU is disabled. So
-	 * all the stack writes will make it to memory.
-	 * All memory accesses are marked Non-cacheable
-	 * when the MMU is enabled but D$ is disabled.
-	 * So used stack memory is guaranteed to be
-	 * visible immediately after the MMU is enabled
-	 * Enabling the DCache at the same time as the
-	 * MMU can lead to speculatively fetched and
-	 * possibly stale stack memory being read from
-	 * other caches. This can lead to coherency
-	 * issues.
-	 * --------------------------------------------
+	/*
+	 * We're about to enable MMU and participate in PSCI state coordination.
+	 *
+	 * The PSCI implementation invokes platform routines that enable CPUs to
+	 * participate in coherency. On a system where CPUs are not
+	 * cache-coherent out of reset, having caches enabled until such time
+	 * might lead to coherency issues (resulting from stale data getting
+	 * speculatively fetched, among others). Therefore we keep data caches
+	 * disabled while enabling the MMU, thereby forcing data accesses to
+	 * Normal memory to be Non-cacheable (these will always be coherent
+	 * with main memory).
+	 *
+	 * On systems with hardware-assisted coherency, where CPUs are expected
+	 * to be cache-coherent out of reset without needing explicit software
+	 * intervention, PSCI need not invoke platform routines to enter
+	 * coherency (as CPUs already are); and there's no reason to have caches
+	 * disabled either.
 	 */
+#if HW_ASSISTED_COHERENCY
+	mov	x0, #0
+#else
 	mov	x0, #DISABLE_DCACHE
+#endif
 	bl	bl31_plat_enable_mmu
 
 	bl	psci_warmboot_entrypoint
diff --git a/bl32/sp_min/aarch32/entrypoint.S b/bl32/sp_min/aarch32/entrypoint.S
index e2ab923..d934bb8 100644
--- a/bl32/sp_min/aarch32/entrypoint.S
+++ b/bl32/sp_min/aarch32/entrypoint.S
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2016, ARM Limited and Contributors. All rights reserved.
+ * Copyright (c) 2016-2017, ARM Limited and Contributors. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -231,24 +231,27 @@
 		_init_c_runtime=0				\
 		_exception_vectors=sp_min_vector_table
 
-	/* --------------------------------------------
-	 * Enable the MMU with the DCache disabled. It
-	 * is safe to use stacks allocated in normal
-	 * memory as a result. All memory accesses are
-	 * marked nGnRnE when the MMU is disabled. So
-	 * all the stack writes will make it to memory.
-	 * All memory accesses are marked Non-cacheable
-	 * when the MMU is enabled but D$ is disabled.
-	 * So used stack memory is guaranteed to be
-	 * visible immediately after the MMU is enabled
-	 * Enabling the DCache at the same time as the
-	 * MMU can lead to speculatively fetched and
-	 * possibly stale stack memory being read from
-	 * other caches. This can lead to coherency
-	 * issues.
-	 * --------------------------------------------
+	/*
+	 * We're about to enable MMU and participate in PSCI state coordination.
+	 *
+	 * The PSCI implementation invokes platform routines that enable CPUs to
+	 * participate in coherency. On a system where CPUs are not
+	 * cache-coherent out of reset, having caches enabled until such time
+	 * might lead to coherency issues (resulting from stale data getting
+	 * speculatively fetched, among others). Therefore we keep data caches
+	 * disabled while enabling the MMU, thereby forcing data accesses to
+	 * Normal memory to be Non-cacheable (these will always be coherent
+	 * with main memory).
+	 *
+	 * On systems where CPUs are cache-coherent out of reset, however, PSCI
+	 * need not invoke platform routines to enter coherency (as CPUs already
+	 * are), and there's no reason to have caches disabled either.
 	 */
+#if HW_ASSISTED_COHERENCY
+	mov	r0, #0
+#else
 	mov	r0, #DISABLE_DCACHE
+#endif
 	bl	bl32_plat_enable_mmu
 
 	bl	sp_min_warm_boot