libc: optimize pthread mutex lock/unlock operations (1/2)

This patch provides several small optimizations to the
implementation of mutex locking and unlocking. A
follow-up patch will remove the global recursion lock
and make a few more aggressive changes; I thought it
would be simpler to split this change into two parts.

+ New behaviour: pthread_mutex_lock et al now detect
  recursive mutex overflows and will return EAGAIN in
  this case, as suggested by POSIX. Before, the counter
  would just wrap to 0.

- Remove unnecessary reloads of the mutex value from memory
  by storing it in a local variable (mvalue)

- Remove unnecessary reload of the mutex value by passing
  the 'shared' local variable to _normal_lock / _normal_unlock

- Remove unnecessary reload of the mutex value by using a
  new macro (MUTEX_VALUE_OWNER()) to compare the thread id
  for recursive/errorcheck mutexes

- Use a common inlined function to increment the counter
  of a recursive mutex. Also do not use the global
  recursion lock in this case to speed it up.
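
For illustration only, the recursive re-lock path described
above might look roughly like the sketch below. The bit
layout, the sketch_* names and every macro value are
assumptions made up for this example, not the real libc
definitions; only the names mvalue and MUTEX_VALUE_OWNER()
come from this patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only:
 * bits 0-15 hold the owner thread id,
 * bits 16-28 hold the recursion counter. */
#define MUTEX_OWNER_MASK     0xffffu
#define MUTEX_COUNTER_SHIFT  16
#define MUTEX_COUNTER_MASK   (0x1fffu << MUTEX_COUNTER_SHIFT)
#define MUTEX_COUNTER_ONE    (1u << MUTEX_COUNTER_SHIFT)

/* Extract the owner from an already-loaded mutex value. */
#define MUTEX_VALUE_OWNER(v) ((int)((v) & MUTEX_OWNER_MASK))

/* Hypothetical stand-in for the real mutex type. */
typedef struct { volatile uint32_t value; } sketch_mutex_t;

/* Common inlined helper: bump the recursion counter, or
 * report EAGAIN instead of letting it wrap to 0. */
static inline int
sketch_recursive_increment(sketch_mutex_t *m, uint32_t mvalue)
{
    if ((mvalue & MUTEX_COUNTER_MASK) == MUTEX_COUNTER_MASK)
        return EAGAIN;  /* counter is already at its maximum */

    /* Sketch only: a plain store is used here. A real
     * implementation may need an atomic update if other
     * threads can change other bits of the value. */
    m->value = mvalue + MUTEX_COUNTER_ONE;
    return 0;
}

/* Re-lock attempt by thread 'tid' on a recursive mutex. */
static int
sketch_recursive_relock(sketch_mutex_t *m, int tid)
{
    uint32_t mvalue = m->value;  /* single load, reused below */

    if (MUTEX_VALUE_OWNER(mvalue) != tid)
        return EBUSY;  /* sketch only: real code would go on
                          to contend for the lock */

    return sketch_recursive_increment(m, mvalue);
}
```

The point of the sketch is that the value is read from
memory once, the owner check works on that local copy, and
the counter overflow is detected before it is stored.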

Change-Id: I106934ec3a8718f8f852ef547f3f0e9d9435c816
