

 Status
 ~~~~~~

 As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
 the 64-bit ARM architecture.  Currently it supports integer and FP
 instructions and can run anything generated by gcc-4.8.2 -O3.  The
 port is under active development.

 Current limitations, as of mid-May 2014:

 * limited support of vector (SIMD) instructions.  Initial target is
   support for instructions created by gcc-4.8.2 -O3
   (via autovectorisation).  This is complete.

 * Integration with the built-in GDB server:
    - works ok (breakpoint, attach to a process blocked in a syscall, ...)
    - still to do:
       arm64 xml register description files (allowing shadow registers
                                             to be looked at).
       cpsr transfer to/from gdb to be looked at (see also arm equivalent code)

 * limited syscall support

 There has been extensive testing of the baseline simulation of integer
 and FP instructions.  Memcheck is also believed to work, at least for
 small examples.  Other tools appear to at least not crash when running
 /bin/date.

 Enough syscalls and instructions are supported for substantial
 programs to work.  Firefox 26 is able to start up and quit.  The noise
 level from Memcheck is low enough to make it practical to use for real
 debugging.


 Building
 ~~~~~~~~

 You could probably build it directly on a target OS, using the normal
 non-cross scheme:

   ./autogen.sh ; ./configure --prefix=.. ; make ; make install

 Development so far has, however, been done by cross-compiling, viz:

   export CC=aarch64-linux-gnu-gcc
   export LD=aarch64-linux-gnu-ld
   export AR=aarch64-linux-gnu-ar

   ./autogen.sh
   ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
               --enable-only64bit
   make -j4
   make -j4 install

 Doing this assumes that the install path (`pwd`/Inst) is valid on
 both host and target, which isn't normally the case.  To avoid
 this limitation, do instead:

   ./configure --prefix=/install/path/on/target \
               --host=aarch64-unknown-linux \
               --enable-only64bit
   make -j4
   make -j4 install DESTDIR=/a/temp/dir/on/host
   # and then copy the contents of DESTDIR to the target.

 See README.android for more examples of cross-compile building.


 Implementation tidying-up/TODO notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 UnwindStartRegs -- what should that contain?


 vki-arm64-linux.h: vki_sigaction_base
 I really don't think that __vki_sigrestore_t sa_restorer
 should be present.  Adding it surely puts sa_mask at a wrong
 offset compared to (kernel) reality.  But not having it causes
 compilation of m_signals.c to fail in hard to understand ways,
 so adding it temporarily.
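
 The offset concern above can be checked mechanically.  The following
 is a minimal sketch using two hypothetical structs (sigact_no_restorer
 and sigact_with_restorer are illustrative names, not the real vki or
 kernel definitions) to show how an extra sa_restorer-style member
 moves the mask field:

```c
#include <stddef.h>

/* Hypothetical layouts for illustration only; these are NOT the real
   vki_sigaction_base or the kernel's struct sigaction. */
struct sigact_no_restorer {
    void (*handler)(int);
    unsigned long flags;
    unsigned long mask[1];
};

struct sigact_with_restorer {
    void (*handler)(int);
    unsigned long flags;
    void (*restorer)(void);   /* the member in question */
    unsigned long mask[1];
};

/* Number of bytes by which the extra member moves the mask field. */
static size_t sa_mask_shift(void)
{
    return offsetof(struct sigact_with_restorer, mask)
         - offsetof(struct sigact_no_restorer, mask);
}
```

 On an LP64 target the shift is one pointer's worth of bytes.
 Comparing the real offsets against the kernel headers for the target
 would settle which layout is right.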


 m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF
 is there at the moment, but 0x00000000 is probably what it should be.
 Also, fix indentation/tab-vs-space stuff


 ./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
 it to __vki_uint128_t, but what's the defn of that?


 m_debuginfo/priv_storage.h: need proper defn of DiCfSI


 readdwarf.c: is this correct?
 #elif defined(VGP_arm64_linux)
 #  define FP_REG         29    //???
 #  define SP_REG         31    //???
 #  define RA_REG_DEFAULT 30    //???


 vki-arm64-linux.h:
 re linux-3.10.5/include/uapi/asm-generic/sembuf.h
 I'd say the amd64 version has padding it shouldn't have.  Check?


 syswrap-linux.c run_a_thread_NORETURN assembly sections:
 it seems that tst->os_state.exitcode has word type,
 in which case the ppc64_linux use of lwz to read it is wrong
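
 To illustrate why a 32-bit load of a word-sized field is suspect,
 here is a small sketch.  It assumes the word type is 64 bits wide on
 the platform in question; fake_os_state is a hypothetical stand-in,
 not the real thread state structure.  A 32-bit load at the field's
 address sees only the first four bytes, which on a big-endian machine
 are the high half of the value:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a word-sized (assumed 64-bit) field. */
typedef struct { uint64_t exitcode; } fake_os_state;

/* Read only the first 4 bytes at the field's address, as a 32-bit
   load instruction would. */
static uint32_t load32_at_field(const fake_os_state *s)
{
    uint32_t v;
    memcpy(&v, &s->exitcode, sizeof v);
    return v;
}

/* Returns 1 when the first byte in memory is the most significant. */
static int is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}
```

 For a small exit code such as 42, the 32-bit read yields 42 on a
 little-endian machine but 0 on a big-endian one.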


 syswrap-linux.c ML_(do_fork_clone)
 assuming that VGP_arm64_linux is the same as VGP_arm_linux here


 dispatch-arm64-linux.S: FIXME: set up FP control state before
 entering generated code.  Also fix screwy indentation.


 dispatcher-ery general: what's a good (predictor-friendly) way to
 branch to a register?


 in vki-arm64-scnums.h
 //#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
 Probably want to reenable that and clean up accordingly


 putIRegXXorZR: figure out a way that the computed value is actually
 used, so as to keep any memory reads that might generate it, alive.
 (else the simulation can lose exceptions).  At least, for writes to
 the zero register generated by loads .. or .. can any other
 integer instructions that write to a register cause exceptions?


 loads/stores: generate stack alignment checks as necessary


 fix barrier insns: ISB, DMB


 fix atomic loads/stores


 FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
 IROps so as to avoid double rounding
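
 The double rounding problem can be seen with a small sketch.  It is
 not how the IR would be generated; it only compares a multiply-then-add
 that rounds twice against a result that is rounded once.  The
 single-rounding version is emulated in double precision, which is
 exact for the inputs used here but is not a general fused
 multiply-add.  Both function names are hypothetical:

```c
/* Unfused: the product is rounded to float, then the sum is rounded
   again. */
static float madd_unfused(float a, float b, float c)
{
    volatile float product = a * b;  /* volatile forces the intermediate rounding */
    return product + c;
}

/* Emulated single rounding: for the inputs used here the float
   product and the sum are both exact in double, so only the final
   conversion to float rounds. */
static float madd_single_rounding(float a, float b, float c)
{
    double exact = (double)a * (double)b + (double)c;
    return (float)exact;
}
```

 With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is
 1 - 2^-26, which rounds to 1.0 as a float.  The unfused form therefore
 returns 0, whereas the single-rounding form returns -2^-26.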


 ARM64Instr_Call getRegUsage: re-check relative to what
 getAllocableRegs_ARM64 makes available


 Make dispatch-arm64-linux.S save any callee-saved Q regs
 I think what is required is to save D8-D15 and nothing more than that.


 wrapper for __NR3264_fstat -- correct?


 PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
 definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.


 sigframe-arm64-linux.c: build_sigframe: references to nonexistent
 siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
 replaced by zero.  Also in synth_ucontext.


 m_debugger.c:
 uregs.pstate   = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
 Is that remotely correct?


 host_arm64_defs.c: emit_ARM64Instr:
 ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
 MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
 dependencies on the top half of the register.  (Or at least check
 the semantics of INS Vd.D[0] to see if it zeroes out the top.)


 preferredVectorSubTypeFromSize: review perf effects and decide
 on a types-for-subparts policy


 fold_IRExpr_Unop: add a reduction rule for this
 1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
 viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)
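
 The proposed rule relies on both sides computing the same value.  A
 minimal C model of the three primops makes that easy to check.  The
 model_ functions are illustrative stand-ins written for this note,
 not VEX code:

```c
#include <stdint.h>

/* CmpNEZ64: 1 if x is non-zero, else 0 (a 1-bit result). */
static unsigned model_CmpNEZ64(uint64_t x)
{
    return x != 0;
}

/* 1Sto64: sign-extend a 1-bit value to 64 bits. */
static uint64_t model_1Sto64(unsigned bit)
{
    return bit ? ~(uint64_t)0 : 0;
}

/* CmpwNEZ64: all ones if x is non-zero, else all zeroes. */
static uint64_t model_CmpwNEZ64(uint64_t x)
{
    return x ? ~(uint64_t)0 : 0;
}
```

 For any x, model_1Sto64(model_CmpNEZ64(x)) equals model_CmpwNEZ64(x),
 so the rewrite removes one operation without changing the result.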


 check insn selection for memcheck-only primops:
 Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
 widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


 isel: get rid of various cases where zero is put into a register
 and just use xzr instead.  Especially for CmpNEZ64/32.  And for
 writing zeroes into the CC thunk fields.


 /* Keep this list in sync with that in iselNext below */
 /* Keep this list in sync with that for Ist_Exit above */
 uh .. they are not in sync


 very stupid:
 imm64  x23, 0xFFFFFFFFFFFFFFA0
 17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2
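
 The sixteen bytes above decode as MOVZ followed by three MOVKs, where
 a single MOVN x23, #0x5F would give the same value.  The sketch below
 reproduces the encodings and counts how many move-wide instructions a
 constant needs when both the MOVZ and MOVN starting points are
 considered.  The helper names are hypothetical, and other options
 such as logical immediates are ignored:

```c
#include <stdint.h>

enum { OPC_MOVN = 0, OPC_MOVZ = 2, OPC_MOVK = 3 };

/* Encode a 64-bit move-wide instruction: hw selects the 16-bit lane. */
static uint32_t enc_movewide64(uint32_t opc, uint32_t hw,
                               uint32_t imm16, uint32_t rd)
{
    return 0x92800000u | (opc << 29) | (hw << 21)
         | ((imm16 & 0xFFFFu) << 5) | (rd & 31u);
}

/* Count the 16-bit chunks of imm that differ from skip. */
static int chunks_not_equal(uint64_t imm, uint16_t skip)
{
    int n = 0;
    for (int i = 0; i < 4; i++) {
        if ((uint16_t)(imm >> (16 * i)) != skip)
            n++;
    }
    return n;
}

/* Instructions needed using MOVZ+MOVK or MOVN+MOVK, whichever is
   shorter.  At least one instruction is always required. */
static int movewide_insns_needed(uint64_t imm)
{
    int via_movz = chunks_not_equal(imm, 0x0000);
    int via_movn = chunks_not_equal(imm, 0xFFFF);
    int best = via_movz < via_movn ? via_movz : via_movn;
    return best == 0 ? 1 : best;
}
```

 For 0xFFFFFFFFFFFFFFA0 the MOVZ route needs four instructions and the
 MOVN route needs one, since only the lowest chunk differs from 0xFFFF.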


 valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
 also add CFI annotations


 could possibly bring r29 into use, which would be useful as it is
 callee saved


 ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt
 can't always simplify the general-case IR to a shift in such cases.
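
 As a rough sketch of what such a special case might look like, the
 function below recognises the 64-bit UBFM forms that are plain LSR or
 LSL, based on the immr and imms fields.  classify_ubfm64 and the enum
 are hypothetical names, not part of the existing front end:

```c
typedef enum { BFM_NOT_SHIFT, BFM_LSL, BFM_LSR } ubfm_kind;

/* Classify a 64-bit UBFM by its immr/imms fields (each 0..63).
   On a match, *amount receives the shift distance. */
static ubfm_kind classify_ubfm64(unsigned immr, unsigned imms,
                                 unsigned *amount)
{
    if (imms == 63) {
        /* UBFM Xd, Xn, #sh, #63  is  LSR Xd, Xn, #sh */
        *amount = immr;
        return BFM_LSR;
    }
    if (imms + 1 == immr) {
        /* UBFM Xd, Xn, #(64-sh), #(63-sh)  is  LSL Xd, Xn, #sh */
        *amount = 63 - imms;
        return BFM_LSL;
    }
    return BFM_NOT_SHIFT;
}
```

 The same test with imms == 63 applied to SBFM identifies ASR.  Other
 field combinations, such as immr = 0 and imms = 7 (a byte
 zero-extension), would still go through the general-case IR.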


 LDP,STP (immediate, simm7) (FP&VEC)
 should zero out hi parts of dst registers in the LDP case


 DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
 rather than doing it "by hand"


 Any place where ZeroHI64ofV128 is used in conjunction with
 FP vector IROps: find a way to make sure that arithmetic on
 the upper half of the values is "harmless."


 math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
 inline scalar code


 chainXDirect_ARM64: use direct jump forms when possible
