The semantics of the stack bounds are either inconsistent or not described.
In various places, the code assumed that the 'end' boundary (highest
address) was either excluded, included, the highest addressable word,
or the highest addressable byte.
This was very visible, for example, when doing:
  ./vg-in-place -d -d ./helgrind/tests/tc01_simple_race|&grep regi
giving
  --24040:2:stacks     register 0xBEDB4000-0xBEDB4FFF as stack 0
  --24040:2:stacks     register 0x402C000-0x4A2C000 as stack 1
showing that the main stack end was (on x86) not the highest word
but the highest byte, while for thread 1 the registered end
was a byte that is not part of the stack.
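
One way to read the two log lines above, assuming 4 KiB pages (an
assumption made for this illustration, not something stated in the log):
an 'end' that is the highest addressable byte sits just below a page
boundary, while an 'end' that is itself page aligned is the first byte
past the stack. The helper names below are hypothetical and are not part
of Valgrind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* True if addr is a multiple of the page size. */
static inline bool sketch_is_page_aligned(uintptr_t addr)
{
   return (addr % SKETCH_PAGE_SIZE) == 0;
}

/* An 'end' that is the highest addressable byte is followed
   immediately by a page boundary. */
static inline bool sketch_end_is_last_byte(uintptr_t end)
{
   assert(end != UINTPTR_MAX);   /* avoid wrap-around in end + 1 */
   return sketch_is_page_aligned(end + 1);
}

/* An 'end' that is page aligned is one past the last stack byte. */
static inline bool sketch_end_is_one_past(uintptr_t end)
{
   return sketch_is_page_aligned(end);
}
```

Under that assumption, 0xBEDB4FFF (stack 0) classifies as a last byte
and 0x4A2C000 (stack 1) as one past the end.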

The attached patch ensures that the stack bounds semantics are documented
and consistent. Also, some of the stack handling code is factorised.

The convention that the patch enforces and documents is:
start is the lowest addressable byte, end is the highest addressable byte.
(The words 'min' and 'max' have been kept where already used, as this
wording is consistent with the new semantics of start/end.)
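
A minimal sketch of what this convention implies for size computations,
using a hypothetical SketchRange type (not a Valgrind type): with both
bounds included, the size is end - start + 1, and an exclusive end must
be decremented before it is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; both bounds are included. */
typedef struct {
   uintptr_t start;  /* lowest addressable byte  */
   uintptr_t end;    /* highest addressable byte */
} SketchRange;

/* Number of bytes in the range; the +1 accounts for 'end' being included. */
static inline size_t sketch_size(SketchRange r)
{
   return (size_t)(r.end - r.start) + 1;
}

/* Convert a [start, past_end) pair into the inclusive convention. */
static inline SketchRange sketch_from_exclusive(uintptr_t start,
                                                uintptr_t past_end)
{
   SketchRange r;
   assert(past_end > start);   /* an empty range has no highest byte */
   r.start = start;
   r.end   = past_end - 1;
   return r;
}
```

For instance, the range 0x402C000-0x4A2C000 from the log above, read as
an exclusive pair, becomes [0x402C000-0x4A2BFFF] with a size of 0xA00000
bytes.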

In various debug logs, brackets [ and ] are now used to make clear that
both bounds are included.

The code to guess and register the client stack was duplicated
in all the platform-specific syswrap-<plat>-<os>.c files.
This code has been factorised into syswrap-generic.c.
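
The arithmetic of the factorised guess can be sketched as follows. This
is a standalone illustration with hypothetical names and an assumed
4 KiB page size; it mirrors the computation in the patch (highest byte
is the page round-up of sp minus 1, size is highest byte minus segment
start plus 1) but leaves out the segment lookup and the registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Round an address up to the next page boundary (no-op if aligned). */
static inline uintptr_t sketch_pg_roundup(uintptr_t a)
{
   return (a + SKETCH_PAGE_SIZE - 1) & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical result type: guessed highest byte and stack size. */
typedef struct {
   uintptr_t highest_byte;
   size_t    szB;
} SketchGuess;

/* Guess the stack from the initial sp and the start of the segment
   in which sp is mapped. Both bounds of the result are included. */
static inline SketchGuess sketch_guess_stack(uintptr_t sp,
                                             uintptr_t seg_start)
{
   SketchGuess g;
   assert(seg_start <= sp);   /* sp must lie inside the segment */
   g.highest_byte = sketch_pg_roundup(sp) - 1;
   g.szB          = (size_t)(g.highest_byte - seg_start) + 1;
   return g;
}
```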

The patch has been regression tested on
   x86, amd64, ppc32/64, s390x.
On arm64, it has been compiled and one test has been run.
Not compiled/not tested on darwin, android, mips32/64, arm.


In more detail, the patch does the following:

coregrind/pub_core_aspacemgr.h
include/valgrind.h
include/pub_tool_machine.h
coregrind/pub_core_scheduler.h
coregrind/pub_core_stacks.h
  - document start/end semantics in various functions
 also in pub_tool_machine.h:
  - replace unclear 'bottommost address' with 'lowest address'
    (unclear because the stack bottom is, or at least can be
     interpreted as, the 'functional' bottom of the stack, which is
     the highest address for a stack growing downwards).
coregrind/pub_core_initimg.h
  replace unclear clstack_top by clstack_end
coregrind/m_main.c
  updated to clstack_end

coregrind/pub_core_threadstate.h
  renamed client_stack_highest_word to client_stack_highest_byte
coregrind/m_scheduler/scheduler.c
  computes client_stack_highest_byte as the highest addressable byte
  Update comments in call to VG_(show_sched_status)
coregrind/m_machine.c
coregrind/m_stacktrace.c
  updated to client_stack_highest_byte, and switched 
    stack_lowest/highest_word to stack_lowest/highest_byte accordingly

coregrind/m_stacks.c
  clarify semantics of start/end,
  added a comment to indicate why we invert start/end in register call
  (note that the code in find_stack_by_addr was already assuming that
  end was included, as the checks were doing e.g.
    sp >= i->start && sp <= i->end)
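
A small sketch of the two behaviours mentioned for m_stacks.c, with
hypothetical names (SketchStack, sketch_register, sketch_sp_in_stack)
standing in for the real list-based code: registration swaps the bounds
if the caller passes them inverted, and the lookup treats 'end' as
included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the registered stack record. */
typedef struct {
   uintptr_t start;  /* lowest stack byte, included  */
   uintptr_t end;    /* highest stack byte, included */
} SketchStack;

/* Accept the bounds in either order, as the register call does for
   backward compatibility with callers passing start/end inverted. */
static inline SketchStack sketch_register(uintptr_t start, uintptr_t end)
{
   SketchStack s;
   if (start > end) {
      uintptr_t t = end;
      end   = start;
      start = t;
   }
   s.start = start;
   s.end   = end;
   return s;
}

/* Inclusive membership test: sp == end is still inside the stack. */
static inline bool sketch_sp_in_stack(const SketchStack *s, uintptr_t sp)
{
   assert(s != NULL);
   return sp >= s->start && sp <= s->end;
}
```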

coregrind/pub_core_clientstate.h
coregrind/m_clientstate.c
  renames Addr  VG_(clstk_base) to Addr  VG_(clstk_start_base)
    (start to indicate it is the lowest address, base suffix kept
     to indicate it is the initial lowest address).

coregrind/m_initimg/initimg-darwin.c
   updated to  VG_(clstk_start_base)
   replace unclear iicii.clstack_top by iicii.clstack_end
   updated clstack_max_size computation according to both bounds included.

coregrind/m_initimg/initimg-linux.c
   updated to  VG_(clstk_start_base)
   updated VG_(clstk_end) computation according to both bounds included.
   replace unclear iicii.clstack_top by iicii.clstack_end

coregrind/pub_core_aspacemgr.h
  extern Addr VG_(am_startup) : clarify semantic of the returned value
coregrind/m_aspacemgr/aspacemgr-linux.c
   removed a copy of a comment that was already in pub_core_aspacemgr.h
     (avoid double maintenance)
   renamed unclear suggested_clstack_top to suggested_clstack_end
    (note that here, it looks like suggested_clstack_top was already
     the last addressable byte)

* factorisation of the stack guessing and registration causes
  mechanical changes in the following files:
      coregrind/m_syswrap/syswrap-ppc64-linux.c
      coregrind/m_syswrap/syswrap-x86-darwin.c
      coregrind/m_syswrap/syswrap-amd64-linux.c
      coregrind/m_syswrap/syswrap-arm-linux.c
      coregrind/m_syswrap/syswrap-generic.c
      coregrind/m_syswrap/syswrap-mips64-linux.c
      coregrind/m_syswrap/syswrap-ppc32-linux.c
      coregrind/m_syswrap/syswrap-amd64-darwin.c
      coregrind/m_syswrap/syswrap-mips32-linux.c
      coregrind/m_syswrap/priv_syswrap-generic.h
      coregrind/m_syswrap/syswrap-x86-linux.c
      coregrind/m_syswrap/syswrap-s390x-linux.c
      coregrind/m_syswrap/syswrap-darwin.c
      coregrind/m_syswrap/syswrap-arm64-linux.c
 Some files to look at in more detail:
  syswrap-darwin.c : the handling of sysctl(kern.usrstack) looked
    buggy to me, and has probably been made correct by the fact that
    VG_(clstk_end) is now the last addressable byte. However, I am unsure
    about this, as I could not find any documentation about
    sysctl(kern.usrstack). I only found several occurrences on the web,
    showing that the result of this is page aligned, which I guess
    means it must be 1 + the last addressable byte.
  syswrap-x86-darwin.c and syswrap-amd64-darwin.c
   I suspect the code that was computing client_stack_highest_word
   was wrong, and the patch makes it correct.
  syswrap-mips64-linux.c
    not sure what to do for this code. This is the only code
    that was guessing the stack differently from others.
    Kept (almost) untouched. To be discussed with mips maintainers.

coregrind/pub_core_libcassert.h
coregrind/m_libcassert.c
  * void VG_(show_sched_status):
     renamed Bool valgrind_stack_usage to Bool stack_usage
     if stack_usage, shows both the valgrind stack usage and
     the client stack boundaries
coregrind/m_scheduler/scheduler.c
coregrind/m_gdbserver/server.c
coregrind/m_gdbserver/remote-utils.c
   Updated comments in callers to VG_(show_sched_status)



git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14392 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/coregrind/m_aspacemgr/aspacemgr-linux.c b/coregrind/m_aspacemgr/aspacemgr-linux.c
index 30b6149..88951c9 100644
--- a/coregrind/m_aspacemgr/aspacemgr-linux.c
+++ b/coregrind/m_aspacemgr/aspacemgr-linux.c
@@ -1609,19 +1609,11 @@
    add_segment( &seg );
 }
 
-/* Initialise the address space manager, setting up the initial
-   segment list, and reading /proc/self/maps into it.  This must
-   be called before any other function.
-
-   Takes a pointer to the SP at the time V gained control.  This is
-   taken to be the highest usable address (more or less).  Based on
-   that (and general consultation of tea leaves, etc) return a
-   suggested end address for the client's stack. */
-
+/* See description in pub_core_aspacemgr.h */
 Addr VG_(am_startup) ( Addr sp_at_startup )
 {
    NSegment seg;
-   Addr     suggested_clstack_top;
+   Addr     suggested_clstack_end;
 
    aspacem_assert(sizeof(Word)   == sizeof(void*));
    aspacem_assert(sizeof(Addr)   == sizeof(void*));
@@ -1660,7 +1652,7 @@
    // 0x7fff:5c000000..0x7fff:ffe00000? is stack, dyld, shared cache
 # endif
 
-   suggested_clstack_top = -1; // ignored; Mach-O specifies its stack
+   suggested_clstack_end = -1; // ignored; Mach-O specifies its stack
 
 #else /* !defined(VGO_darwin) */
 
@@ -1690,7 +1682,7 @@
    aspacem_vStart -= 0x10000000; // 256M
 #  endif
 
-   suggested_clstack_top = aspacem_maxAddr - 16*1024*1024ULL
+   suggested_clstack_end = aspacem_maxAddr - 16*1024*1024ULL
                                            + VKI_PAGE_SIZE;
 
 #endif /* #else of 'defined(VGO_darwin)' */
@@ -1699,7 +1691,7 @@
    aspacem_assert(VG_IS_PAGE_ALIGNED(aspacem_maxAddr + 1));
    aspacem_assert(VG_IS_PAGE_ALIGNED(aspacem_cStart));
    aspacem_assert(VG_IS_PAGE_ALIGNED(aspacem_vStart));
-   aspacem_assert(VG_IS_PAGE_ALIGNED(suggested_clstack_top + 1));
+   aspacem_assert(VG_IS_PAGE_ALIGNED(suggested_clstack_end + 1));
 
    VG_(debugLog)(2, "aspacem", 
                     "              minAddr = 0x%010llx (computed)\n", 
@@ -1714,8 +1706,8 @@
                     "               vStart = 0x%010llx (computed)\n", 
                     (ULong)aspacem_vStart);
    VG_(debugLog)(2, "aspacem", 
-                    "suggested_clstack_top = 0x%010llx (computed)\n", 
-                    (ULong)suggested_clstack_top);
+                    "suggested_clstack_end = 0x%010llx (computed)\n", 
+                    (ULong)suggested_clstack_end);
 
    if (aspacem_cStart > Addr_MIN) {
       init_resvn(&seg, Addr_MIN, aspacem_cStart-1);
@@ -1751,7 +1743,7 @@
    VG_(am_show_nsegments)(2, "With contents of /proc/self/maps");
 
    AM_SANITY_CHECK;
-   return suggested_clstack_top;
+   return suggested_clstack_end;
 }
 
 
diff --git a/coregrind/m_clientstate.c b/coregrind/m_clientstate.c
index b251f1b..e1a6040 100644
--- a/coregrind/m_clientstate.c
+++ b/coregrind/m_clientstate.c
@@ -45,9 +45,9 @@
 // TODO: get rid of as many of these as possible.
 
 /* ***Initial*** lowest address of the stack segment of the main thread.
-   The main stack will grow if needed but VG_(clstk_base) will
+   The main stack will grow if needed but VG_(clstk_start_base) will
    not be changed according to the growth. */
-Addr  VG_(clstk_base)  = 0;
+Addr  VG_(clstk_start_base)  = 0;
 /* Initial highest address of the stack segment of the main thread. */
 Addr  VG_(clstk_end)   = 0;
 UWord VG_(clstk_id)    = 0;
diff --git a/coregrind/m_gdbserver/remote-utils.c b/coregrind/m_gdbserver/remote-utils.c
index 599919c..6ee1f7e 100644
--- a/coregrind/m_gdbserver/remote-utils.c
+++ b/coregrind/m_gdbserver/remote-utils.c
@@ -44,7 +44,7 @@
       Int i;
       vki_sigset_t cursigset;
       VG_(show_sched_status) (True,  // host_stacktrace
-                              True,  // valgrind_stack_usage
+                              True,  // stack_usage
                               True); // exited_threads
       VG_(sigprocmask) (0,           // dummy how.
                         NULL,        // do not change the sigmask
diff --git a/coregrind/m_gdbserver/server.c b/coregrind/m_gdbserver/server.c
index d8d3a19..072bd68 100644
--- a/coregrind/m_gdbserver/server.c
+++ b/coregrind/m_gdbserver/server.c
@@ -398,7 +398,7 @@
          break;
       case  5: /* scheduler */
          VG_(show_sched_status) (True,  // host_stacktrace
-                                 True,  // valgrind_stack_usage
+                                 True,  // stack_usage
                                  True); // exited_threads
          ret = 1;
          break;
diff --git a/coregrind/m_initimg/initimg-darwin.c b/coregrind/m_initimg/initimg-darwin.c
index 89d8584..2ae934e 100644
--- a/coregrind/m_initimg/initimg-darwin.c
+++ b/coregrind/m_initimg/initimg-darwin.c
@@ -410,7 +410,7 @@
 
    /* Record stack extent -- needed for stack-change code. */
    /* GrP fixme really? */
-   VG_(clstk_base) = clstack_start;
+   VG_(clstk_start_base) = clstack_start;
    VG_(clstk_end)  = clstack_end;
 
    if (0)
@@ -548,18 +548,18 @@
    //   p: load_client()     [for 'info']
    //   p: fix_environment() [for 'env']
    //--------------------------------------------------------------
-   iicii.clstack_top = info.stack_end - 1;
-   iifii.clstack_max_size = info.stack_end - info.stack_start;
+   iicii.clstack_end = info.stack_end;
+   iifii.clstack_max_size = info.stack_end - info.stack_start + 1;
    
    iifii.initial_client_SP = 
        setup_client_stack( iicii.argv - 1, env, &info, 
-                           iicii.clstack_top, iifii.clstack_max_size );
+                           iicii.clstack_end, iifii.clstack_max_size );
 
    VG_(free)(env);
 
    VG_(debugLog)(2, "initimg",
                  "Client info: "
-                 "initial_IP=%p initial_SP=%p stack=%p..%p\n", 
+                 "initial_IP=%p initial_SP=%p stack=[%p..%p]\n", 
                  (void*)(iifii.initial_client_IP),
                  (void*)(iifii.initial_client_SP),
                  (void*)(info.stack_start), 
diff --git a/coregrind/m_initimg/initimg-linux.c b/coregrind/m_initimg/initimg-linux.c
index 2d942c1..a8e7d27 100644
--- a/coregrind/m_initimg/initimg-linux.c
+++ b/coregrind/m_initimg/initimg-linux.c
@@ -564,8 +564,8 @@
      vg_assert(!sr_isError(res)); 
 
      /* Record stack extent -- needed for stack-change code. */
-     VG_(clstk_base) = anon_start -inner_HACK;
-     VG_(clstk_end)  = VG_(clstk_base) + anon_size +inner_HACK -1;
+     VG_(clstk_start_base) = anon_start -inner_HACK;
+     VG_(clstk_end)  = VG_(clstk_start_base) + anon_size +inner_HACK -1;
 
    }
 
@@ -917,7 +917,7 @@
       iifii.initial_client_SP
          = setup_client_stack( init_sp, env, 
                                &info, &iifii.client_auxv, 
-                               iicii.clstack_top, iifii.clstack_max_size );
+                               iicii.clstack_end, iifii.clstack_max_size );
 
       VG_(free)(env);
 
diff --git a/coregrind/m_libcassert.c b/coregrind/m_libcassert.c
index 94f28a1..d48751f 100644
--- a/coregrind/m_libcassert.c
+++ b/coregrind/m_libcassert.c
@@ -39,6 +39,8 @@
 #include "pub_core_libcassert.h"
 #include "pub_core_libcprint.h"
 #include "pub_core_libcproc.h"      // For VG_(gettid)()
+#include "pub_core_machine.h"
+#include "pub_core_stacks.h"
 #include "pub_core_stacktrace.h"
 #include "pub_core_syscall.h"
 #include "pub_core_tooliface.h"     // For VG_(details).{name,bug_reports_to}
@@ -294,7 +296,7 @@
 
 // Print the scheduler status.
 static void show_sched_status_wrk ( Bool host_stacktrace,
-                                    Bool valgrind_stack_usage,
+                                    Bool stack_usage,
                                     Bool exited_threads,
                                     UnwindStartRegs* startRegsIN)
 {
@@ -349,7 +351,19 @@
                   VG_(name_of_ThreadStatus)(VG_(threads)[i].status) );
       if (VG_(threads)[i].status != VgTs_Empty)
          VG_(get_and_pp_StackTrace)( i, BACKTRACE_DEPTH );
-      if (valgrind_stack_usage && stack != 0)
+      if (stack_usage && VG_(threads)[i].client_stack_highest_byte != 0 ) {
+         Addr start, end;
+         
+         start = end = 0;
+         VG_(stack_limits)(VG_(threads)[i].client_stack_highest_byte,
+                           &start, &end);
+         if (start != end)
+            VG_(printf)("client stack range: [%p %p] client SP: %p\n",
+                        (void*)start, (void*)end, (void*)VG_(get_SP)(i));
+         else
+            VG_(printf)("client stack range: ???????\n");
+      }
+      if (stack_usage && stack != 0)
           VG_(printf)("valgrind stack top usage: %ld of %ld\n",
                       VG_STACK_ACTIVE_SZB 
                       - VG_(am_get_VgStack_unused_szB)(stack,
@@ -360,11 +374,11 @@
 }
 
 void VG_(show_sched_status) ( Bool host_stacktrace,
-                              Bool valgrind_stack_usage,
+                              Bool stack_usage,
                               Bool exited_threads)
 {
    show_sched_status_wrk (host_stacktrace,
-                          valgrind_stack_usage,
+                          stack_usage,
                           exited_threads,
                           NULL);
 }
@@ -374,7 +388,7 @@
                               UnwindStartRegs* startRegsIN )
 {
    show_sched_status_wrk (True,  // host_stacktrace
-                          False, // valgrind_stack_usage
+                          False, // stack_usage
                           False, // exited_threads
                           startRegsIN);
    VG_(printf)(
@@ -485,7 +499,7 @@
    VG_(umsg)("Valgrind has to exit now.  Sorry.  Bye!\n");
    VG_(umsg)("\n");
    VG_(show_sched_status)(False,  // host_stacktrace
-                          False,  // valgrind_stack_usage
+                          False,  // stack_usage
                           False); // exited_threads
    VG_(exit)(1);
 }
diff --git a/coregrind/m_machine.c b/coregrind/m_machine.c
index 1c767b6..42d8ce8 100644
--- a/coregrind/m_machine.c
+++ b/coregrind/m_machine.c
@@ -380,7 +380,7 @@
       if (VG_(threads)[i].status != VgTs_Empty) {
          *tid       = i;
          *stack_min = VG_(get_SP)(i);
-         *stack_max = VG_(threads)[i].client_stack_highest_word;
+         *stack_max = VG_(threads)[i].client_stack_highest_byte;
          return True;
       }
    }
@@ -391,7 +391,7 @@
 {
    vg_assert(0 <= tid && tid < VG_N_THREADS && tid != VG_INVALID_THREADID);
    vg_assert(VG_(threads)[tid].status != VgTs_Empty);
-   return VG_(threads)[tid].client_stack_highest_word;
+   return VG_(threads)[tid].client_stack_highest_byte;
 }
 
 SizeT VG_(thread_get_stack_size)(ThreadId tid)
diff --git a/coregrind/m_main.c b/coregrind/m_main.c
index 796c9eb..7fdc3a7 100644
--- a/coregrind/m_main.c
+++ b/coregrind/m_main.c
@@ -1649,7 +1649,7 @@
    vg_assert(VKI_PAGE_SIZE <= VKI_MAX_PAGE_SIZE);
    vg_assert(VKI_PAGE_SIZE     == (1 << VKI_PAGE_SHIFT));
    vg_assert(VKI_MAX_PAGE_SIZE == (1 << VKI_MAX_PAGE_SHIFT));
-   the_iicii.clstack_top = VG_(am_startup)( the_iicii.sp_at_startup );
+   the_iicii.clstack_end = VG_(am_startup)( the_iicii.sp_at_startup );
    VG_(debugLog)(1, "main", "Address space manager is running\n");
 
    //--------------------------------------------------------------
@@ -2365,7 +2365,7 @@
    //--------------------------------------------------------------
    // register client stack
    //--------------------------------------------------------------
-   VG_(clstk_id) = VG_(register_stack)(VG_(clstk_base), VG_(clstk_end));
+   VG_(clstk_id) = VG_(register_stack)(VG_(clstk_start_base), VG_(clstk_end));
 
    //--------------------------------------------------------------
    // Show the address space state so far
diff --git a/coregrind/m_scheduler/scheduler.c b/coregrind/m_scheduler/scheduler.c
index 0ee1416..c25a759 100644
--- a/coregrind/m_scheduler/scheduler.c
+++ b/coregrind/m_scheduler/scheduler.c
@@ -620,7 +620,7 @@
 
       VG_(threads)[i].status                    = VgTs_Empty;
       VG_(threads)[i].client_stack_szB          = 0;
-      VG_(threads)[i].client_stack_highest_word = (Addr)NULL;
+      VG_(threads)[i].client_stack_highest_byte = (Addr)NULL;
       VG_(threads)[i].err_disablement_level     = 0;
       VG_(threads)[i].thread_name               = NULL;
    }
@@ -656,8 +656,8 @@
    vg_assert(VG_IS_PAGE_ALIGNED(clstack_end+1));
    vg_assert(VG_IS_PAGE_ALIGNED(clstack_size));
 
-   VG_(threads)[tid_main].client_stack_highest_word 
-      = clstack_end + 1 - sizeof(UWord);
+   VG_(threads)[tid_main].client_stack_highest_byte 
+      = clstack_end;
    VG_(threads)[tid_main].client_stack_szB 
       = clstack_size;
 
@@ -2167,7 +2167,7 @@
          VG_(printf)("\n------------ Sched State at %d ms ------------\n",
                      (Int)now);
          VG_(show_sched_status)(True,  // host_stacktrace
-                                True,  // valgrind_stack_usage
+                                True,  // stack_usage
                                 True); // exited_threads);
       }
    }
diff --git a/coregrind/m_stacks.c b/coregrind/m_stacks.c
index d96f70e..d116547 100644
--- a/coregrind/m_stacks.c
+++ b/coregrind/m_stacks.c
@@ -87,8 +87,8 @@
  */
 typedef struct _Stack {
    UWord id;
-   Addr start;
-   Addr end;
+   Addr start; // Lowest stack byte, included.
+   Addr end;   // Highest stack byte, included.
    struct _Stack *next;
 } Stack;
 
@@ -183,6 +183,9 @@
    Stack *i;
 
    if (start > end) {
+      /* If caller provides addresses in reverse order, swap them.
+         Ugly but not doing that breaks backward compatibility with
+         (user) code registering stacks with start/end inverted . */
       Addr t = end;
       end = start;
       start = t;
@@ -199,7 +202,7 @@
       current_stack = i;
    }
 
-   VG_(debugLog)(2, "stacks", "register %p-%p as stack %lu\n",
+   VG_(debugLog)(2, "stacks", "register [%p-%p] as stack %lu\n",
                     (void*)start, (void*)end, i->id);
 
    return i->id;
@@ -246,9 +249,11 @@
 
    while (i) {
       if (i->id == id) {
-         VG_(debugLog)(2, "stacks", "change stack %lu from %p-%p to %p-%p\n",
+         VG_(debugLog)(2, "stacks", 
+                       "change stack %lu from [%p-%p] to [%p-%p]\n",
                        id, (void*)i->start, (void*)i->end,
                            (void*)start,    (void*)end);
+         /* FIXME : swap start/end like VG_(register_stack) ??? */
          i->start = start;
          i->end = end;
          return;
@@ -271,7 +276,6 @@
    }
 }
 
-
 /* complaints_stack_switch reports that SP has changed by more than some
    threshold amount (by default, 2MB).  We take this to mean that the
    application is switching to a new stack, for whatever reason.
diff --git a/coregrind/m_stacktrace.c b/coregrind/m_stacktrace.c
index 64e1ac4..be171b5 100644
--- a/coregrind/m_stacktrace.c
+++ b/coregrind/m_stacktrace.c
@@ -1395,8 +1395,8 @@
    VG_(memset)( &startRegs, 0, sizeof(startRegs) );
    VG_(get_UnwindStartRegs)( &startRegs, tid );
 
-   Addr stack_highest_word = VG_(threads)[tid].client_stack_highest_word;
-   Addr stack_lowest_word  = 0;
+   Addr stack_highest_byte = VG_(threads)[tid].client_stack_highest_byte;
+   Addr stack_lowest_byte  = 0;
 
 #  if defined(VGP_x86_linux)
    /* Nasty little hack to deal with syscalls - if libc is using its
@@ -1428,7 +1428,7 @@
 
    /* See if we can get a better idea of the stack limits */
    VG_(stack_limits)( (Addr)startRegs.r_sp,
-                      &stack_lowest_word, &stack_highest_word );
+                      &stack_lowest_byte, &stack_highest_byte );
 
    /* Take into account the first_ip_delta. */
    startRegs.r_pc += (Long)(Word)first_ip_delta;
@@ -1436,13 +1436,13 @@
    if (0)
       VG_(printf)("tid %d: stack_highest=0x%08lx ip=0x%010llx "
                   "sp=0x%010llx\n",
-		  tid, stack_highest_word,
+		  tid, stack_highest_byte,
                   startRegs.r_pc, startRegs.r_sp);
 
    return VG_(get_StackTrace_wrk)(tid, ips, max_n_ips, 
                                        sps, fps,
                                        &startRegs,
-                                       stack_highest_word);
+                                       stack_highest_byte);
 }
 
 static void printIpDesc(UInt n, Addr ip, void* uu_opaque)
diff --git a/coregrind/m_syswrap/priv_syswrap-generic.h b/coregrind/m_syswrap/priv_syswrap-generic.h
index 598e27d..b3372f3 100644
--- a/coregrind/m_syswrap/priv_syswrap-generic.h
+++ b/coregrind/m_syswrap/priv_syswrap-generic.h
@@ -36,6 +36,12 @@
 #include "priv_types_n_macros.h"  // DECL_TEMPLATE
 
 
+/* Guess the client stack from the segment in which sp is mapped.
+   Register the guessed stack using VG_(register_stack).
+   Setup tst client_stack_highest_byte and client_stack_szB.
+   If sp is not in a mapped segment, does nothing. */
+extern void ML_(guess_and_register_stack) (Addr sp, ThreadState* tst);
+
 // Return true if address range entirely contained within client
 // address space.
 extern
diff --git a/coregrind/m_syswrap/syswrap-amd64-darwin.c b/coregrind/m_syswrap/syswrap-amd64-darwin.c
index 77a583e..b51b652 100644
--- a/coregrind/m_syswrap/syswrap-amd64-darwin.c
+++ b/coregrind/m_syswrap/syswrap-amd64-darwin.c
@@ -373,7 +373,7 @@
    if ((flags & 0x01000000) == 0) {
       // kernel allocated stack - needs mapping
       Addr stack = VG_PGROUNDUP(sp) - stacksize;
-      tst->client_stack_highest_word = stack+stacksize;
+      tst->client_stack_highest_byte = stack+stacksize-1;
       tst->client_stack_szB = stacksize;
 
       // pthread structure
@@ -547,7 +547,7 @@
       record_named_port(tst->tid, kport, MACH_PORT_RIGHT_SEND, "wqthread-%p");
       
       // kernel allocated stack - needs mapping
-      tst->client_stack_highest_word = stack+stacksize;
+      tst->client_stack_highest_byte = stack+stacksize-1;
       tst->client_stack_szB = stacksize;
 
       // GrP fixme scheduler lock?!
diff --git a/coregrind/m_syswrap/syswrap-amd64-linux.c b/coregrind/m_syswrap/syswrap-amd64-linux.c
index 92e9d55..02d5a46 100644
--- a/coregrind/m_syswrap/syswrap-amd64-linux.c
+++ b/coregrind/m_syswrap/syswrap-amd64-linux.c
@@ -210,7 +210,6 @@
    ThreadState* ptst = VG_(get_ThreadState)(ptid);
    ThreadState* ctst = VG_(get_ThreadState)(ctid);
    UWord*       stack;
-   NSegment const* seg;
    SysRes       res;
    Long         rax;
    vki_sigset_t blockall, savedmask;
@@ -264,27 +263,7 @@
       See #226116. */
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
 
-   /* We don't really know where the client stack is, because its
-      allocated by the client.  The best we can do is look at the
-      memory mappings and try to derive some useful information.  We
-      assume that esp starts near its highest possible value, and can
-      only go down to the start of the mmaped segment. */
-   seg = VG_(am_find_nsegment)((Addr)rsp);
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(rsp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
-
-      if (debug)
-	 VG_(printf)("tid %d: guessed client stack range %#lx-%#lx\n",
-		     ctid, seg->start, VG_PGROUNDUP(rsp));
-   } else {
-      VG_(message)(Vg_UserMsg,
-                   "!? New thread %d starts with RSP(%#lx) unmapped\n",
-		   ctid, rsp);
-      ctst->client_stack_szB  = 0;
-   }
+   ML_(guess_and_register_stack) (rsp, ctst);
 
    /* Assume the clone will succeed, and tell any tool that wants to
       know that this thread has come into existence.  If the clone
diff --git a/coregrind/m_syswrap/syswrap-arm-linux.c b/coregrind/m_syswrap/syswrap-arm-linux.c
index 7680ad0..b91edcc 100644
--- a/coregrind/m_syswrap/syswrap-arm-linux.c
+++ b/coregrind/m_syswrap/syswrap-arm-linux.c
@@ -177,7 +177,6 @@
    ThreadState* ctst = VG_(get_ThreadState)(ctid);
    UInt r0;
    UWord *stack;
-   NSegment const* seg;
    SysRes res;
    vki_sigset_t blockall, savedmask;
 
@@ -215,20 +214,7 @@
       See #226116. */
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
 
-   seg = VG_(am_find_nsegment)((Addr)sp);
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(sp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-   
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
-   
-      if (debug)
-         VG_(printf)("tid %d: guessed client stack range %#lx-%#lx\n",
-         ctid, seg->start, VG_PGROUNDUP(sp));
-   } else {
-      VG_(message)(Vg_UserMsg, "!? New thread %d starts with sp+%#lx) unmapped\n", ctid, sp);
-      ctst->client_stack_szB  = 0;
-   }
+   ML_(guess_and_register_stack) (sp, ctst);
 
    vg_assert(VG_(owns_BigLock_LL)(ptid));
    VG_TRACK ( pre_thread_ll_create, ptid, ctid );
diff --git a/coregrind/m_syswrap/syswrap-arm64-linux.c b/coregrind/m_syswrap/syswrap-arm64-linux.c
index 063898a..6ebfc7e 100644
--- a/coregrind/m_syswrap/syswrap-arm64-linux.c
+++ b/coregrind/m_syswrap/syswrap-arm64-linux.c
@@ -226,7 +226,6 @@
    ThreadState* ptst = VG_(get_ThreadState)(ptid);
    ThreadState* ctst = VG_(get_ThreadState)(ctid);
    UWord*       stack;
-   NSegment const* seg;
    SysRes       res;
    ULong        x0;
    vki_sigset_t blockall, savedmask;
@@ -280,28 +279,7 @@
       See #226116. */
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
 
-   /* We don't really know where the client stack is, because its
-      allocated by the client.  The best we can do is look at the
-      memory mappings and try to derive some useful information.  We
-      assume that xsp starts near its highest possible value, and can
-      only go down to the start of the mmaped segment. */
-   seg = VG_(am_find_nsegment)((Addr)child_xsp);
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(child_xsp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-   
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
-   
-      if (debug)
-         VG_(printf)("tid %d: guessed client stack range %#lx-%#lx\n",
-         ctid, seg->start, VG_PGROUNDUP(child_xsp));
-   } else {
-      VG_(message)(
-         Vg_UserMsg,
-         "!? New thread %d starts with sp+%#lx) unmapped\n", ctid, child_xsp
-      );
-      ctst->client_stack_szB  = 0;
-   }
+   ML_(guess_and_register_stack)(child_xsp, ctst);
 
    /* Assume the clone will succeed, and tell any tool that wants to
       know that this thread has come into existence.  If the clone
diff --git a/coregrind/m_syswrap/syswrap-darwin.c b/coregrind/m_syswrap/syswrap-darwin.c
index 0af2a9c..a736f39 100644
--- a/coregrind/m_syswrap/syswrap-darwin.c
+++ b/coregrind/m_syswrap/syswrap-darwin.c
@@ -174,26 +174,7 @@
 
 void find_stack_segment(ThreadId tid, Addr sp)
 {
-   /* We don't really know where the client stack is, because it's
-      allocated by the client.  The best we can do is look at the
-      memory mappings and try to derive some useful information.  We
-      assume that esp starts near its highest possible value, and can
-      only go down to the start of the mmaped segment. */
-   ThreadState *tst = VG_(get_ThreadState)(tid);
-   const NSegment *seg = VG_(am_find_nsegment)(sp);
-   if (seg && seg->kind != SkResvn) {
-      tst->client_stack_highest_word = (Addr)VG_PGROUNDUP(sp);
-      tst->client_stack_szB = tst->client_stack_highest_word - seg->start;
-
-      if (1)
-         VG_(printf)("tid %d: guessed client stack range %#lx-%#lx\n",
-                     tid, seg->start, VG_PGROUNDUP(sp));
-   } else {
-       VG_(printf)("couldn't find user stack\n");
-      VG_(message)(Vg_UserMsg, "!? New thread %d starts with SP(%#lx) unmapped\n",
-                   tid, sp);
-      tst->client_stack_szB  = 0;
-   }
+   ML_(guess_and_register_stack) (sp, VG_(get_ThreadState)(tid));
 }
 
 
@@ -3783,6 +3764,10 @@
             Addr *oldp = (Addr *)ARG3;
             size_t *oldlenp = (size_t *)ARG4;
             if (oldlenp) {
+               // According to some searches on the net, it looks like USRSTACK
+               // gives the address of the byte following the highest byte of the stack
+               // As VG_(clstk_end) is the address of the highest addressable byte, we
+               // add +1.
                Addr stack_end = VG_(clstk_end)+1;
                size_t oldlen = *oldlenp;
                // always return actual size
diff --git a/coregrind/m_syswrap/syswrap-generic.c b/coregrind/m_syswrap/syswrap-generic.c
index d9df953..4fd4e50 100644
--- a/coregrind/m_syswrap/syswrap-generic.c
+++ b/coregrind/m_syswrap/syswrap-generic.c
@@ -60,6 +60,7 @@
 #include "pub_core_syswrap.h"
 #include "pub_core_tooliface.h"
 #include "pub_core_ume.h"
+#include "pub_core_stacks.h"
 
 #include "priv_types_n_macros.h"
 #include "priv_syswrap-generic.h"
@@ -67,6 +68,35 @@
 #include "config.h"
 
 
+void ML_(guess_and_register_stack) (Addr sp, ThreadState* tst)
+{
+   Bool debug = False;
+   NSegment const* seg;
+
+   /* We don't really know where the client stack is, because its
+      allocated by the client.  The best we can do is look at the
+      memory mappings and try to derive some useful information.  We
+      assume that sp starts near its highest possible value, and can
+      only go down to the start of the mmaped segment. */
+   seg = VG_(am_find_nsegment)(sp);
+   if (seg && seg->kind != SkResvn) {
+      tst->client_stack_highest_byte = (Addr)VG_PGROUNDUP(sp)-1;
+      tst->client_stack_szB = tst->client_stack_highest_byte - seg->start + 1;
+
+      VG_(register_stack)(seg->start, tst->client_stack_highest_byte);
+
+      if (debug)
+	 VG_(printf)("tid %d: guessed client stack range [%#lx-%#lx]\n",
+		     tst->tid, seg->start, tst->client_stack_highest_byte);
+   } else {
+      VG_(message)(Vg_UserMsg,
+                   "!? New thread %d starts with SP(%#lx) unmapped\n",
+		   tst->tid, sp);
+      tst->client_stack_highest_byte = 0;
+      tst->client_stack_szB  = 0;
+   }
+}
+
 /* Returns True iff address range is something the client can
    plausibly mess with: all of it is either already belongs to the
    client or is free or a reservation. */
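The bound arithmetic used by ML_(guess_and_register_stack) above can be
illustrated in isolation. The following is only a minimal sketch: the
names sketch_pg_roundup, sketch_highest_byte and sketch_size, and the
4096-byte page size, are assumptions made for the example and are not
Valgrind definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the page size; an assumption for this sketch. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/* Round addr up to the next page boundary (unchanged if already aligned). */
static uintptr_t sketch_pg_roundup(uintptr_t addr)
{
   return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Highest addressable byte of the guessed stack: the byte just below
   the page boundary at or above sp. */
static uintptr_t sketch_highest_byte(uintptr_t sp)
{
   return sketch_pg_roundup(sp) - 1;
}

/* Size in bytes of the inclusive interval [start, end]. */
static uintptr_t sketch_size(uintptr_t start, uintptr_t end)
{
   assert(start <= end);
   return end - start + 1;
}
```

With the thread 1 segment start quoted in the description (0x402C000)
and an sp somewhere in the last page below 0x4A2C000, the highest byte
comes out as 0x4A2BFFF and the size as 0xA00000, so the registered
range no longer includes a byte outside the stack.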
diff --git a/coregrind/m_syswrap/syswrap-mips32-linux.c b/coregrind/m_syswrap/syswrap-mips32-linux.c
index c0cd811..5888841 100644
--- a/coregrind/m_syswrap/syswrap-mips32-linux.c
+++ b/coregrind/m_syswrap/syswrap-mips32-linux.c
@@ -248,7 +248,6 @@
    ThreadState * ctst = VG_ (get_ThreadState) (ctid);
    UInt ret = 0;
    UWord * stack;
-   NSegment const *seg;
    SysRes res;
    vki_sigset_t blockall, savedmask;
 
@@ -283,22 +282,8 @@
       See #226116. */ 
 
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
-   seg = VG_ (am_find_nsegment) ((Addr) sp);
 
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr) VG_PGROUNDUP (sp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-      VG_ (register_stack) (seg->start, ctst->client_stack_highest_word);
-      if (debug)
-         VG_ (printf) ("tid %d: guessed client stack range %#lx-%#lx\n",
-
-      ctid, seg->start, VG_PGROUNDUP (sp));
-   } else {
-      VG_ (message) (Vg_UserMsg,
-                     "!? New thread %d starts with sp+%#lx) unmapped\n",
-                     ctid, sp);
-      ctst->client_stack_szB = 0;
-   }
+   ML_(guess_and_register_stack) (sp, ctst);
 
    VG_TRACK (pre_thread_ll_create, ptid, ctid);
    if (flags & VKI_CLONE_SETTLS) {
diff --git a/coregrind/m_syswrap/syswrap-mips64-linux.c b/coregrind/m_syswrap/syswrap-mips64-linux.c
index fb5067c..3d5cb7d 100644
--- a/coregrind/m_syswrap/syswrap-mips64-linux.c
+++ b/coregrind/m_syswrap/syswrap-mips64-linux.c
@@ -232,10 +232,12 @@
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
    seg = VG_(am_find_nsegment)((Addr)sp);
 
+   // FIXME mips64: the code below differs significantly from the code
+   // factorised in syswrap-generic.c, e.g. it does not round sp. Why?
    if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = sp;
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
+      ctst->client_stack_highest_byte = sp;
+      ctst->client_stack_szB = ctst->client_stack_highest_byte - seg->start + 1;
+      VG_(register_stack)(seg->start, ctst->client_stack_highest_byte);
       if (debug)
         VG_(printf)("tid %d: guessed client stack range %#lx-%#lx\n",
                     ctid, seg->start, sp /* VG_PGROUNDUP (sp) */ );
diff --git a/coregrind/m_syswrap/syswrap-ppc32-linux.c b/coregrind/m_syswrap/syswrap-ppc32-linux.c
index c79156c..9b4edf1 100644
--- a/coregrind/m_syswrap/syswrap-ppc32-linux.c
+++ b/coregrind/m_syswrap/syswrap-ppc32-linux.c
@@ -244,7 +244,6 @@
    ThreadState* ctst = VG_(get_ThreadState)(ctid);
    ULong        word64;
    UWord*       stack;
-   NSegment const* seg;
    SysRes       res;
    vki_sigset_t blockall, savedmask;
 
@@ -310,27 +309,7 @@
       See #226116. */
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
 
-   /* We don't really know where the client stack is, because its
-      allocated by the client.  The best we can do is look at the
-      memory mappings and try to derive some useful information.  We
-      assume that esp starts near its highest possible value, and can
-      only go down to the start of the mmaped segment. */
-   seg = VG_(am_find_nsegment)(sp);
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(sp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
-
-      if (debug)
-	 VG_(printf)("\ntid %d: guessed client stack range %#lx-%#lx\n",
-		     ctid, seg->start, VG_PGROUNDUP(sp));
-   } else {
-      VG_(message)(Vg_UserMsg,
-                   "!? New thread %d starts with R1(%#lx) unmapped\n",
-		   ctid, sp);
-      ctst->client_stack_szB  = 0;
-   }
+   ML_(guess_and_register_stack) (sp, ctst);
 
    /* Assume the clone will succeed, and tell any tool that wants to
       know that this thread has come into existence.  If the clone
diff --git a/coregrind/m_syswrap/syswrap-ppc64-linux.c b/coregrind/m_syswrap/syswrap-ppc64-linux.c
index b67cf30..d660833 100644
--- a/coregrind/m_syswrap/syswrap-ppc64-linux.c
+++ b/coregrind/m_syswrap/syswrap-ppc64-linux.c
@@ -394,7 +394,6 @@
    ThreadState* ctst = VG_(get_ThreadState)(ctid);
    ULong        word64;
    UWord*       stack;
-   NSegment const* seg;
    SysRes       res;
    vki_sigset_t blockall, savedmask;
 
@@ -460,27 +459,7 @@
       See #226116. */
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
 
-   /* We don't really know where the client stack is, because its
-      allocated by the client.  The best we can do is look at the
-      memory mappings and try to derive some useful information.  We
-      assume that esp starts near its highest possible value, and can
-      only go down to the start of the mmaped segment. */
-   seg = VG_(am_find_nsegment)(sp);
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(sp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
-
-      if (debug)
-	 VG_(printf)("\ntid %d: guessed client stack range %#lx-%#lx\n",
-		     ctid, seg->start, VG_PGROUNDUP(sp));
-   } else {
-      VG_(message)(Vg_UserMsg,
-                   "!? New thread %d starts with R1(%#lx) unmapped\n",
-		   ctid, sp);
-      ctst->client_stack_szB  = 0;
-   }
+   ML_(guess_and_register_stack) (sp, ctst);
 
    /* Assume the clone will succeed, and tell any tool that wants to
       know that this thread has come into existence.  If the clone
diff --git a/coregrind/m_syswrap/syswrap-s390x-linux.c b/coregrind/m_syswrap/syswrap-s390x-linux.c
index 664c6ff..ee852d4 100644
--- a/coregrind/m_syswrap/syswrap-s390x-linux.c
+++ b/coregrind/m_syswrap/syswrap-s390x-linux.c
@@ -216,7 +216,6 @@
    ThreadState* ptst = VG_(get_ThreadState)(ptid);
    ThreadState* ctst = VG_(get_ThreadState)(ctid);
    UWord*       stack;
-   NSegment const* seg;
    SysRes       res;
    ULong        r2;
    vki_sigset_t blockall, savedmask;
@@ -262,27 +261,7 @@
    /* have the parents thread group */
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
 
-   /* We don't really know where the client stack is, because its
-      allocated by the client.  The best we can do is look at the
-      memory mappings and try to derive some useful information.  We
-      assume that esp starts near its highest possible value, and can
-      only go down to the start of the mmaped segment. */
-   seg = VG_(am_find_nsegment)((Addr)sp);
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(sp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
-
-      if (debug)
-	 VG_(printf)("tid %d: guessed client stack range %#lx-%#lx\n",
-		     ctid, seg->start, VG_PGROUNDUP(sp));
-   } else {
-      VG_(message)(Vg_UserMsg,
-                   "!? New thread %d starts with SP(%#lx) unmapped\n",
-		   ctid, sp);
-      ctst->client_stack_szB  = 0;
-   }
+   ML_(guess_and_register_stack) (sp, ctst);
 
    /* Assume the clone will succeed, and tell any tool that wants to
       know that this thread has come into existence.  If the clone
diff --git a/coregrind/m_syswrap/syswrap-x86-darwin.c b/coregrind/m_syswrap/syswrap-x86-darwin.c
index b7c8241..b4a6f63 100644
--- a/coregrind/m_syswrap/syswrap-x86-darwin.c
+++ b/coregrind/m_syswrap/syswrap-x86-darwin.c
@@ -314,7 +314,7 @@
    if ((flags & 0x01000000) == 0) {
       // kernel allocated stack - needs mapping
       Addr stack = VG_PGROUNDUP(sp) - stacksize;
-      tst->client_stack_highest_word = stack+stacksize;
+      tst->client_stack_highest_byte = stack+stacksize-1;
       tst->client_stack_szB = stacksize;
 
       // pthread structure
@@ -475,7 +475,7 @@
       record_named_port(tst->tid, kport, MACH_PORT_RIGHT_SEND, "wqthread-%p");
       
       // kernel allocated stack - needs mapping
-      tst->client_stack_highest_word = stack+stacksize;
+      tst->client_stack_highest_byte = stack+stacksize-1;
       tst->client_stack_szB = stacksize;
 
       // GrP fixme scheduler lock?!
diff --git a/coregrind/m_syswrap/syswrap-x86-linux.c b/coregrind/m_syswrap/syswrap-x86-linux.c
index c07c628..2ec4d1a 100644
--- a/coregrind/m_syswrap/syswrap-x86-linux.c
+++ b/coregrind/m_syswrap/syswrap-x86-linux.c
@@ -217,7 +217,6 @@
    ThreadState* ptst = VG_(get_ThreadState)(ptid);
    ThreadState* ctst = VG_(get_ThreadState)(ctid);
    UWord*       stack;
-   NSegment const* seg;
    SysRes       res;
    Int          eax;
    vki_sigset_t blockall, savedmask;
@@ -275,28 +274,8 @@
       See #226116. */
    ctst->os_state.threadgroup = ptst->os_state.threadgroup;
 
-   /* We don't really know where the client stack is, because its
-      allocated by the client.  The best we can do is look at the
-      memory mappings and try to derive some useful information.  We
-      assume that esp starts near its highest possible value, and can
-      only go down to the start of the mmaped segment. */
-   seg = VG_(am_find_nsegment)((Addr)esp);
-   if (seg && seg->kind != SkResvn) {
-      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(esp);
-      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
-
-      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
-
-      if (debug)
-	 VG_(printf)("tid %d: guessed client stack range %#lx-%#lx\n",
-		     ctid, seg->start, VG_PGROUNDUP(esp));
-   } else {
-      VG_(message)(Vg_UserMsg,
-                   "!? New thread %d starts with ESP(%#lx) unmapped\n",
-		   ctid, esp);
-      ctst->client_stack_szB  = 0;
-   }
-
+   ML_(guess_and_register_stack) (esp, ctst);
+
    /* Assume the clone will succeed, and tell any tool that wants to
       know that this thread has come into existence.  We cannot defer
       it beyond this point because sys_set_thread_area, just below,
diff --git a/coregrind/pub_core_aspacemgr.h b/coregrind/pub_core_aspacemgr.h
index 4dd62cb..890ac3a 100644
--- a/coregrind/pub_core_aspacemgr.h
+++ b/coregrind/pub_core_aspacemgr.h
@@ -58,7 +58,7 @@
    Takes a pointer to the SP at the time V gained control.  This is
    taken to be the highest usable address (more or less).  Based on
    that (and general consultation of tea leaves, etc) return a
-   suggested end address for the client's stack. */
+   suggested end address (highest addressable byte) for the client's stack. */
 extern Addr VG_(am_startup) ( Addr sp_at_startup );
 
 
diff --git a/coregrind/pub_core_clientstate.h b/coregrind/pub_core_clientstate.h
index 64ec161..dc93d1f 100644
--- a/coregrind/pub_core_clientstate.h
+++ b/coregrind/pub_core_clientstate.h
@@ -42,8 +42,9 @@
 
 // Address space globals
 
-extern Addr  VG_(clstk_base);	 // client stack range
-extern Addr  VG_(clstk_end);
+// client stack range
+extern Addr  VG_(clstk_start_base); // *Initial* lowest byte address
+extern Addr  VG_(clstk_end);        // Highest byte address
 extern UWord VG_(clstk_id);      // client stack id
 
 /* linux only: where is the client auxv ? */
diff --git a/coregrind/pub_core_initimg.h b/coregrind/pub_core_initimg.h
index a9a8ab3..5623498 100644
--- a/coregrind/pub_core_initimg.h
+++ b/coregrind/pub_core_initimg.h
@@ -72,7 +72,7 @@
    /* ------ Mandatory fields ------ */
    const HChar*  toolname;
    Addr    sp_at_startup;
-   Addr    clstack_top;
+   Addr    clstack_end; // Highest addressable byte of the stack
    /* ------ Per-OS fields ------ */
    HChar** argv;
    HChar** envp;
@@ -96,7 +96,7 @@
    /* ------ Mandatory fields ------ */
    const HChar*  toolname;
    Addr    sp_at_startup;
-   Addr    clstack_top;
+   Addr    clstack_end; // Highest addressable byte of the stack
    /* ------ Per-OS fields ------ */
    HChar** argv;
    HChar** envp;
diff --git a/coregrind/pub_core_libcassert.h b/coregrind/pub_core_libcassert.h
index de4b3cc..6e1df85 100644
--- a/coregrind/pub_core_libcassert.h
+++ b/coregrind/pub_core_libcassert.h
@@ -83,11 +83,13 @@
    Mostly for debugging V.
    The following activates optional output:
      host_stacktrace : shows the host stacktrace.
-     valgrind_stack_usage : shows how much of the valgrind stack was used.
+     stack_usage : shows the following:
+          - how much of the valgrind stack was used;
+          - the client stack range.
      exited_thread_slots : show information for thread slots that were used
         but the thread has now exited. */
 extern void VG_(show_sched_status) ( Bool host_stacktrace,
-                                     Bool valgrind_stack_usage,
+                                     Bool stack_usage,
                                      Bool exited_threads);
 
 #endif   // __PUB_CORE_LIBCASSERT_H
diff --git a/coregrind/pub_core_scheduler.h b/coregrind/pub_core_scheduler.h
index 6b24aeb..aa9cf04 100644
--- a/coregrind/pub_core_scheduler.h
+++ b/coregrind/pub_core_scheduler.h
@@ -96,7 +96,8 @@
 extern ThreadId VG_(scheduler_init_phase1) ( void );
 
 // Initialise, phase 2.  Is passed the extent of the root thread's
-// client stack and the root ThreadId decided on by phase 1.
+// client stack (its end, i.e. highest addressable byte, and its size)
+// and the root ThreadId decided on by phase 1.
 extern void VG_(scheduler_init_phase2) ( ThreadId main_tid, 
                                          Addr     clstack_end, 
                                          SizeT    clstack_size );
diff --git a/coregrind/pub_core_stacks.h b/coregrind/pub_core_stacks.h
index 71bdfca..31f5f72 100644
--- a/coregrind/pub_core_stacks.h
+++ b/coregrind/pub_core_stacks.h
@@ -38,6 +38,15 @@
 // purposes of detecting stack switches.
 //--------------------------------------------------------------------
 
+/* Convention for start and end:
+   'start' is the lowest address, 'end' is the highest address.
+   'start' and 'end' bytes are included in the stack.
+   In other words, the stack is the byte interval ['start', 'end']
+   (both bounds are included).
+
+   Note: for compatibility reasons, VG_(register_stack) accepts
+   'start' bigger than 'end' and will (transparently) swap 'start/end'
+   to register the stack. */
 extern UWord VG_(register_stack)   ( Addr start, Addr end );
 extern void  VG_(deregister_stack) ( UWord id );
 extern void  VG_(change_stack)     ( UWord id, Addr start, Addr end );
diff --git a/coregrind/pub_core_threadstate.h b/coregrind/pub_core_threadstate.h
index c2ebb1c..bd1bc06 100644
--- a/coregrind/pub_core_threadstate.h
+++ b/coregrind/pub_core_threadstate.h
@@ -341,11 +341,11 @@
    /* The allocated size of this thread's stack */
    SizeT client_stack_szB;
 
-   /* Address of the highest legitimate word in this stack.  This is
+   /* Address of the highest legitimate byte in this stack.  This is
       used for error messages only -- not critical for execution
       correctness.  Is is set for all stacks, specifically including
       ThreadId == 1 (the main thread). */
-   Addr client_stack_highest_word;
+   Addr client_stack_highest_byte;
 
    /* Alternate signal stack */
    vki_stack_t altstack;
diff --git a/include/pub_tool_machine.h b/include/pub_tool_machine.h
index 0f004f1..c80d49b 100644
--- a/include/pub_tool_machine.h
+++ b/include/pub_tool_machine.h
@@ -142,17 +142,21 @@
 // Returns False at the end.  'tid' is the iterator and you can only
 // safely change it by making calls to these functions.
 extern void VG_(thread_stack_reset_iter) ( /*OUT*/ThreadId* tid );
+// stack_min is the address of the lowest stack byte,
+// stack_max is the address of the highest stack byte.
+// In other words, the live stack is [stack_min, stack_max].
 extern Bool VG_(thread_stack_next)       ( /*MOD*/ThreadId* tid,
                                            /*OUT*/Addr* stack_min, 
                                            /*OUT*/Addr* stack_max );
 
-// Returns .client_stack_highest_word for the given thread
+// Returns .client_stack_highest_byte for the given thread,
+// i.e. the highest addressable byte of the stack.
 extern Addr VG_(thread_get_stack_max) ( ThreadId tid );
 
 // Returns how many bytes have been allocated for the stack of the given thread
 extern SizeT VG_(thread_get_stack_size) ( ThreadId tid );
 
-// Returns the bottommost address of the alternate signal stack.
+// Returns the lowest address of the alternate signal stack.
 // See also the man page of sigaltstack().
 extern Addr VG_(thread_get_altstack_min) ( ThreadId tid );
 
diff --git a/include/valgrind.h b/include/valgrind.h
index a65f03a..6954d75 100644
--- a/include/valgrind.h
+++ b/include/valgrind.h
@@ -6507,7 +6507,9 @@
                                VG_USERREQ__MEMPOOL_EXISTS,        \
                                pool, 0, 0, 0, 0)
 
-/* Mark a piece of memory as being a stack. Returns a stack id. */
+/* Mark a piece of memory as being a stack. Returns a stack id.
+   start is the lowest addressable stack byte, end is the highest
+   addressable stack byte. */
 #define VALGRIND_STACK_REGISTER(start, end)                       \
     (unsigned)VALGRIND_DO_CLIENT_REQUEST_EXPR(0,                  \
                                VG_USERREQ__STACK_REGISTER,        \
@@ -6519,7 +6521,9 @@
     VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__STACK_DEREGISTER, \
                                     id, 0, 0, 0, 0)
 
-/* Change the start and end address of the stack id. */
+/* Change the start and end address of the stack id.
+   start is the new lowest addressable stack byte, end is the new highest
+   addressable stack byte. */
 #define VALGRIND_STACK_CHANGE(id, start, end)                     \
     VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__STACK_CHANGE,     \
                                     id, start, end, 0, 0)
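For client code, the documented convention means that a stack carved out
of a buffer of 'size' bytes should be registered with end = start + size - 1,
not start + size. The following is only a sketch of that computation: the
type SketchStackBounds and the helper sketch_bounds_for_buffer are
hypothetical names introduced for the example and are not part of valgrind.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: inclusive stack bounds [start, end]. */
typedef struct {
   uintptr_t start;  /* lowest addressable stack byte  */
   uintptr_t end;    /* highest addressable stack byte */
} SketchStackBounds;

/* Compute the inclusive bounds of a stack occupying 'size' bytes
   starting at 'base'. */
static SketchStackBounds sketch_bounds_for_buffer(const void *base, size_t size)
{
   SketchStackBounds b;
   assert(base != NULL && size > 0);
   b.start = (uintptr_t)base;
   b.end   = b.start + size - 1;  /* last byte of the buffer, not one past it */
   return b;
}
```

In a program that includes valgrind.h, b.start and b.end would then be the
values to pass as the start and end arguments of VALGRIND_STACK_REGISTER.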