PSCI: Optimize call paths if all participants are cache-coherent

The current PSCI implementation can apply certain optimizations on the
assumption that all PSCI participants are cache-coherent:

  - Skip performing cache maintenance during power-up.

  - Skip performing cache maintenance during power-down:

    At present, on the power-down path, the CPU driver disables the caches
    and the MMU, and performs cache maintenance in preparation for powering
    down the CPU. This means that PSCI must perform additional cache
    maintenance on the extant stack for correct functioning.

    If all participating CPUs are cache-coherent, the CPU driver needs to
    neither disable the MMU nor perform cache maintenance. The CPU being
    powered down therefore remains cache-coherent throughout all PSCI call
    paths. This in turn means that PSCI cache maintenance operations are
    not required during power-down.

  - Choose spin locks instead of bakery locks:

    The current PSCI implementation must synchronize both cache-coherent
    and non-cache-coherent participants. Mutual exclusion primitives are
    not guaranteed to function on non-coherent memory, so the current
    implementation has to resort to bakery locks, which make no use of
    hardware mutual exclusion support.

    If all participants are cache-coherent, the implementation can enable
    the MMU and data caches early, and replace bakery locks with spin
    locks. Spin locks use the architectural mutual exclusion primitives
    and are both lighter and faster.

The optimizations are applied when the HW_ASSISTED_COHERENCY build option is
enabled, as all PSCI participants are expected to be cache-coherent in such
systems.

Change-Id: Iac51c3ed318ea7e2120f6b6a46fd2db2eae46ede
Signed-off-by: Jeenu Viswambharan <jeenu.viswambharan@arm.com>
diff --git a/docs/psci-lib-integration-guide.md b/docs/psci-lib-integration-guide.md
index f290966..d81b328 100644
--- a/docs/psci-lib-integration-guide.md
+++ b/docs/psci-lib-integration-guide.md
@@ -176,7 +176,9 @@
    * The page tables must be setup and the MMU enabled
    * The C runtime environment must be setup and stack initialized
    * The Data cache must be enabled prior to invoking any of the PSCI library
-     interfaces except for `psci_warmboot_entrypoint()`.
+     interfaces except for `psci_warmboot_entrypoint()`. However, if the
+     build option `HW_ASSISTED_COHERENCY` is enabled, data caches are
+     expected to be enabled even for `psci_warmboot_entrypoint()`.
 
 Further requirements for each interface can be found in the interface
 description.
@@ -270,11 +272,11 @@
     Return   : void
 
 This function performs the warm boot initialization/restoration as mandated by
-[PSCI spec]. For AArch32, on wakeup from power down the CPU resets to secure
-SVC mode and the EL3 Runtime Software must perform the prerequisite
-initializations mentioned at top of this section. This function must be called
-with Data cache disabled but with MMU initialized and enabled. The major
-actions performed by this function are:
+[PSCI spec]. For AArch32, on wakeup from power down the CPU resets to secure SVC
+mode and the EL3 Runtime Software must perform the prerequisite initializations
+mentioned at the top of this section. This function must be called with the data
+cache disabled (unless the build option `HW_ASSISTED_COHERENCY` is enabled) but
+with the MMU initialized and enabled. The major actions performed are:
 
   * Invalidates the stack and enables the data cache.
   * Initializes architecture and PSCI state coordination.
diff --git a/lib/psci/psci_common.c b/lib/psci/psci_common.c
index 026690d..1be37c0 100644
--- a/lib/psci/psci_common.c
+++ b/lib/psci/psci_common.c
@@ -79,7 +79,8 @@
 #endif
 ;
 
-DEFINE_BAKERY_LOCK(psci_locks[PSCI_NUM_NON_CPU_PWR_DOMAINS]);
+/* Lock for PSCI state coordination */
+DEFINE_PSCI_LOCK(psci_locks[PSCI_NUM_NON_CPU_PWR_DOMAINS]);
 
 cpu_pd_node_t psci_cpu_pd_nodes[PLATFORM_CORE_COUNT];
 
@@ -992,3 +993,33 @@
 }
 
 #endif
+
+/*******************************************************************************
+ * Initiate power down sequence, by calling power down operations registered for
+ * this CPU.
+ ******************************************************************************/
+void psci_do_pwrdown_sequence(unsigned int power_level)
+{
+#if HW_ASSISTED_COHERENCY
+	/*
+	 * With hardware-assisted coherency, the CPU drivers only initiate the
+	 * power down sequence, without performing cache-maintenance operations
+	 * in software. Data caches and MMU remain enabled both before and after
+	 * this call.
+	 */
+	prepare_cpu_pwr_dwn(power_level);
+#else
+	/*
+	 * Without hardware-assisted coherency, the CPU drivers disable data
+	 * caches and MMU, then perform cache-maintenance operations in
+	 * software.
+	 *
+	 * We call prepare_cpu_pwr_dwn() to initiate the power down sequence.
+	 * We currently have data caches and MMU enabled, but the function will
+	 * return with data caches and MMU disabled. We must therefore ensure
+	 * that the stack memory is flushed out to main memory before we start
+	 * popping from it again.
+	 */
+	psci_do_pwrdown_cache_maintenance(power_level);
+#endif
+}
diff --git a/lib/psci/psci_off.c b/lib/psci/psci_off.c
index 94cf2ed..4ba7865 100644
--- a/lib/psci/psci_off.c
+++ b/lib/psci/psci_off.c
@@ -119,10 +119,9 @@
 #endif
 
 	/*
-	 * Arch. management. Perform the necessary steps to flush all
-	 * cpu caches.
+	 * Arch. management. Initiate power down sequence.
 	 */
-	psci_do_pwrdown_cache_maintenance(psci_find_max_off_lvl(&state_info));
+	psci_do_pwrdown_sequence(psci_find_max_off_lvl(&state_info));
 
 #if ENABLE_RUNTIME_INSTRUMENTATION
 	PMF_CAPTURE_TIMESTAMP(rt_instr_svc,
diff --git a/lib/psci/psci_on.c b/lib/psci/psci_on.c
index f4bb797..675ed66 100644
--- a/lib/psci/psci_on.c
+++ b/lib/psci/psci_on.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2013-2016, ARM Limited and Contributors. All rights reserved.
+ * Copyright (c) 2013-2017, ARM Limited and Contributors. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -165,10 +165,12 @@
 	 */
 	psci_plat_pm_ops->pwr_domain_on_finish(state_info);
 
+#if !HW_ASSISTED_COHERENCY
 	/*
 	 * Arch. management: Enable data cache and manage stack memory
 	 */
 	psci_do_pwrup_cache_maintenance();
+#endif
 
 	/*
 	 * All the platform specific actions for turning this cpu
diff --git a/lib/psci/psci_private.h b/lib/psci/psci_private.h
index 7f0204a..a27e215 100644
--- a/lib/psci/psci_private.h
+++ b/lib/psci/psci_private.h
@@ -39,6 +39,7 @@
 #include <spinlock.h>
 
 #if HW_ASSISTED_COHERENCY
+
 /*
  * On systems with hardware-assisted coherency, make PSCI cache operations NOP,
  * as PSCI participants are cache-coherent, and there's no need for explicit
@@ -49,7 +50,21 @@
 #define psci_inv_cpu_data(member)
 
 #define psci_dsbish()
+
+/*
+ * On systems where participant CPUs are cache-coherent, we can use spinlocks
+ * instead of bakery locks.
+ */
+#define DEFINE_PSCI_LOCK(_name)		spinlock_t _name
+#define DECLARE_PSCI_LOCK(_name)	extern DEFINE_PSCI_LOCK(_name)
+
+#define psci_lock_get(non_cpu_pd_node)				\
+	spin_lock(&psci_locks[(non_cpu_pd_node)->lock_index])
+#define psci_lock_release(non_cpu_pd_node)			\
+	spin_unlock(&psci_locks[(non_cpu_pd_node)->lock_index])
+
 #else
+
 /*
  * If not all PSCI participants are cache-coherent, perform cache maintenance
  * and issue barriers wherever required to coordinate state.
@@ -59,19 +74,24 @@
 #define psci_inv_cpu_data(member)		inv_cpu_data(member)
 
 #define psci_dsbish()				dsbish()
-#endif
 
 /*
- * The following helper macros abstract the interface to the Bakery
- * Lock API.
+ * Use bakery locks for state coordination as not all PSCI participants are
+ * cache-coherent.
  */
-#define psci_lock_init(non_cpu_pd_node, idx)			\
-	((non_cpu_pd_node)[(idx)].lock_index = (idx))
+#define DEFINE_PSCI_LOCK(_name)		DEFINE_BAKERY_LOCK(_name)
+#define DECLARE_PSCI_LOCK(_name)	DECLARE_BAKERY_LOCK(_name)
+
 #define psci_lock_get(non_cpu_pd_node)				\
 	bakery_lock_get(&psci_locks[(non_cpu_pd_node)->lock_index])
 #define psci_lock_release(non_cpu_pd_node)			\
 	bakery_lock_release(&psci_locks[(non_cpu_pd_node)->lock_index])
 
+#endif
+
+#define psci_lock_init(non_cpu_pd_node, idx)			\
+	((non_cpu_pd_node)[(idx)].lock_index = (idx))
+
 /*
  * The PSCI capability which are provided by the generic code but does not
  * depend on the platform or spd capabilities.
@@ -189,8 +209,8 @@
 extern cpu_pd_node_t psci_cpu_pd_nodes[PLATFORM_CORE_COUNT];
 extern unsigned int psci_caps;
 
-/* One bakery lock is required for each non-cpu power domain */
-DECLARE_BAKERY_LOCK(psci_locks[PSCI_NUM_NON_CPU_PWR_DOMAINS]);
+/* One lock is required per non-CPU power domain node */
+DECLARE_PSCI_LOCK(psci_locks[PSCI_NUM_NON_CPU_PWR_DOMAINS]);
 
 /*******************************************************************************
  * SPD's power management hooks registered with PSCI
@@ -227,6 +247,14 @@
 void psci_print_power_domain_map(void);
 unsigned int psci_is_last_on_cpu(void);
 int psci_spd_migrate_info(u_register_t *mpidr);
+void psci_do_pwrdown_sequence(unsigned int power_level);
+
+/*
+ * prepare_cpu_pwr_dwn() is called directly only when HW_ASSISTED_COHERENCY is
+ * enabled. Otherwise, the call requires post-call stack maintenance, which is
+ * handled in assembly.
+ */
+void prepare_cpu_pwr_dwn(unsigned int power_level);
 
 /* Private exported functions from psci_on.c */
 int psci_cpu_on_start(u_register_t target_cpu,
diff --git a/lib/psci/psci_suspend.c b/lib/psci/psci_suspend.c
index 23e5ada..08c8fd6 100644
--- a/lib/psci/psci_suspend.c
+++ b/lib/psci/psci_suspend.c
@@ -121,13 +121,11 @@
 #endif
 
 	/*
-	 * Arch. management. Perform the necessary steps to flush all
-	 * cpu caches. Currently we assume that the power level correspond
-	 * the cache level.
+	 * Arch. management. Initiate power down sequence.
 	 * TODO : Introduce a mechanism to query the cache level to flush
 	 * and the cpu-ops power down to perform from the platform.
 	 */
-	psci_do_pwrdown_cache_maintenance(max_off_lvl);
+	psci_do_pwrdown_sequence(max_off_lvl);
 
 #if ENABLE_RUNTIME_INSTRUMENTATION
 	PMF_CAPTURE_TIMESTAMP(rt_instr_svc,
@@ -304,12 +302,10 @@
 	 */
 	psci_plat_pm_ops->pwr_domain_suspend_finish(state_info);
 
-	/*
-	 * Arch. management: Enable the data cache, manage stack memory and
-	 * restore the stashed EL3 architectural context from the 'cpu_context'
-	 * structure for this cpu.
-	 */
+#if !HW_ASSISTED_COHERENCY
+	/* Arch. management: Enable data cache and perform stack maintenance. */
 	psci_do_pwrup_cache_maintenance();
+#endif
 
 	/* Re-init the cntfrq_el0 register */
 	counter_freq = plat_get_syscnt_freq2();