
Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture.  Currently it supports integer and FP
instructions and can run almost anything generated by gcc-4.8.2 -O2.
The port is under active development.

Current limitations, as of mid-Jan 2014:

* threaded apps won't work, due to inadequate sys_clone() support.

* almost no support for vector (SIMD) instructions.

* Integration with the built-in GDB server:
   - basically works, but breakpoints cause crashes due to the missing
     unchainXDirect_ARM64, which is needed by LibVEX_UnChain.
     Use --vgdb=full to bypass the problem.
   - still to do:
      * arm64 xml register description files (allowing shadow
        registers to be looked at)
      * ptrace invoker: currently disabled for both arm and arm64
      * cpsr transfer to/from gdb to be looked at (see also arm
        equivalent code)

There has been extensive testing of the baseline simulation of integer
and FP instructions.  Memcheck is also believed to work, at least for
small examples.  Other tools appear to at least not crash when running
/bin/date.


Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme:

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far has, however, been done by cross-compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case.  To avoid
this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?


vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present.  Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality.  But not having it causes
compilation of m_signals.c to fail in hard to understand ways,
so adding it temporarily.


m_trampoline.S: what's the unexecutable-insn value?  0xFFFFFFFF
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff.


./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
it to __vki_uint128_t, but what's the defn of that?


m_debuginfo/priv_storage.h: need proper defn of DiCfSI


readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
#  define FP_REG         29    //???
#  define SP_REG         31    //???
#  define RA_REG_DEFAULT 30    //???
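
For reference when checking these values: in the DWARF register
numbering for AArch64, numbers 0..30 denote X0..X30 and 31 denotes SP,
and the procedure call standard uses X29 as the frame pointer and X30
as the link register.  A minimal sketch of that mapping follows.  The
helper dwarf_regname_arm64 is hypothetical and is not part of
readdwarf.c; the enum merely restates the three values above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The three values under review, restated for the sketch. */
enum { FP_REG = 29, SP_REG = 31, RA_REG_DEFAULT = 30 };

/* Hypothetical helper (not in readdwarf.c): write into buf the name of
   the AArch64 integer register that DWARF register number n denotes.
   Only the integer range 0..31 is handled here. */
static const char* dwarf_regname_arm64 ( int n, char* buf, size_t nbuf )
{
   assert(n >= 0 && n <= 31);
   if (n == 31)
      snprintf(buf, nbuf, "sp");     /* DWARF 31 is SP, not XZR */
   else
      snprintf(buf, nbuf, "x%d", n); /* DWARF 0..30 are X0..X30 */
   return buf;
}
```

Under that numbering, FP_REG names x29, RA_REG_DEFAULT names x30 and
SP_REG names sp, which is what the definitions above assume.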


vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have.  Check?


syswrap-linux.c run_a_thread_NORETURN assembly sections:
it seems that tst->os_state.exitcode has word type, in which case
the ppc64_linux use of lwz to read it is wrong


syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here


dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code.  Also fix screwy indentation.


dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?


in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to reenable that and clean up accordingly


putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep alive any memory reads that might generate it
(else the simulation can lose exceptions).  At least, for writes to
the zero register generated by loads.  Or can any other integer
instructions that write to a register cause exceptions?


loads/stores: generate stack alignment checks as necessary


fix barrier insns: ISB, DMB


fix atomic loads/stores


FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding
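
To illustrate why the double rounding matters, here is a small
single-precision sketch.  It is not VEX code, and both function names
are made up.  The "fused" result is emulated by computing in double,
which only equals a true fused multiply-add when the double-precision
sum is exact, as it is for the inputs in the usage note.

```c
#include <assert.h>

/* Unfused: the product is rounded to float before the add, so the
   final result has been rounded twice. */
static float mul_then_add ( float a, float b, float c )
{
   volatile float p = a * b;  /* volatile forces the intermediate
                                 rounding and prevents contraction */
   return p + c;
}

/* Emulated single rounding (assumption: illustration only).  The
   product of two floats is exact in double; the sum is exact in double
   only for suitable inputs, in which case the final conversion is the
   only rounding step. */
static float fused_emulated ( float a, float b, float c )
{
   return (float)((double)a * (double)b + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is
1 - 2^-26, which rounds to 1.0 in single precision.  mul_then_add
therefore returns 0, whereas fused_emulated returns -2^-26.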


ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available


Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.


wrapper for __NR3264_fstat -- correct?


PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.


sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero.  Also in synth_ucontext.


m_debugger.c:
uregs.pstate   = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?


host_arm64_defs.c: emit_ARM64Instr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register.  (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)


preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy


fold_IRExpr_Unop: add a reduction rule for this
1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)
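
The rule is sound because both sides compute the same 64-bit value.  A
sketch of the three primops as plain C functions (the function names
are invented for illustration and are not iropt code):

```c
#include <assert.h>
#include <stdint.h>

/* CmpNEZ64: 1 if x is nonzero, else 0 (a 1-bit result). */
static unsigned cmpNEZ64 ( uint64_t x )
{
   return x != 0;
}

/* 1Sto64: sign-extend a 1-bit value to 64 bits. */
static uint64_t sx1to64 ( unsigned b )
{
   return (uint64_t)0 - (uint64_t)(b & 1);
}

/* CmpwNEZ64: all zeroes if x is zero, otherwise all ones. */
static uint64_t cmpwNEZ64 ( uint64_t x )
{
   return x == 0 ? 0 : ~(uint64_t)0;
}
```

For every x, sx1to64(cmpNEZ64(x)) equals cmpwNEZ64(x), so the two-op
form can be replaced by the single op.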


check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


isel: get rid of various cases where zero is put into a register
and just use xzr instead.  Especially for CmpNEZ64/32.  And for
writing zeroes into the CC thunk fields.


/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync


very stupid:
imm64  x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2 
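
The bytes above decode to a MOVZ followed by three MOVKs.  Since
~0x5F == 0xFFFFFFFFFFFFFFA0, a single MOVN x23, #0x5F would do.  A
rough sketch of how the instruction count could be chosen follows.
The function names are hypothetical, this is not the code in
host_arm64_defs.c, and logical (bitmask) immediates are not
considered.

```c
#include <assert.h>
#include <stdint.h>

/* Number of MOVZ/MOVN/MOVK instructions needed to materialise imm,
   starting with MOVZ or MOVN, whichever leaves fewer 16-bit chunks to
   patch up with MOVK. */
static int imm64_insn_count ( uint64_t imm )
{
   int i, nonzero = 0, nonones = 0, n;
   for (i = 0; i < 4; i++) {
      uint16_t h = (uint16_t)(imm >> (16 * i));
      if (h != 0x0000) nonzero++;   /* chunk MOVZ would not provide */
      if (h != 0xFFFF) nonones++;   /* chunk MOVN would not provide */
   }
   n = nonzero < nonones ? nonzero : nonones;
   return n == 0 ? 1 : n;           /* always at least one insn */
}

/* Encode MOVN Xd, #imm16, LSL #(16*hw), 64-bit variant. */
static uint32_t enc_movn_x ( unsigned rd, unsigned imm16, unsigned hw )
{
   assert(rd < 32 && imm16 <= 0xFFFF && hw < 4);
   return 0x92800000u | (hw << 21) | (imm16 << 5) | rd;
}
```

For 0xFFFFFFFFFFFFFFA0 the count is 1, and enc_movn_x(23, 0x5F, 0)
gives 0x92800BF7.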


valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations


could possibly bring r29 into use, which would be useful as it is
callee-saved


ubfm/sbfm etc: special-case the forms that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.
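
For the 64-bit UBFM case, the architectural aliases are LSR #sh ==
UBFM #sh, #63 and LSL #sh == UBFM #((64-sh) % 64), #(63-sh).  A
sketch of how the front end might recognise them follows.  The names
BfKind, classify_ubfm64 and ubfm64 are made up for this note; ubfm64
is only a reference model used to check the classification.

```c
#include <stdint.h>

typedef enum { BF_GENERAL, BF_LSL, BF_LSR } BfKind;

/* Classify a 64-bit UBFM by its immr/imms fields (each 0..63).  For
   the LSL/LSR cases, *sh receives the shift amount. */
static BfKind classify_ubfm64 ( unsigned immr, unsigned imms,
                                unsigned* sh )
{
   if (imms == 63)       { *sh = immr;      return BF_LSR; }
   if (imms + 1 == immr) { *sh = 63 - imms; return BF_LSL; }
   return BF_GENERAL;
}

/* Reference semantics of 64-bit UBFM, for checking only. */
static uint64_t ubfm64 ( uint64_t x, unsigned immr, unsigned imms )
{
   if (imms >= immr) {
      /* UBFX-like: extract bits immr..imms down to bit 0 */
      unsigned w = imms - immr + 1;
      uint64_t field = x >> immr;
      return w == 64 ? field : (field & ((1ULL << w) - 1));
   } else {
      /* UBFIZ-like: low imms+1 bits, placed at bit 64-immr */
      unsigned w = imms + 1;      /* w <= 63 because imms < immr */
      return (x & ((1ULL << w) - 1)) << (64 - immr);
   }
}
```

For example, immr=60, imms=59 classifies as LSL by 4, and immr=8,
imms=63 as LSR by 8.  The SBFM/ASR case (imms == 63) could be handled
along the same lines.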


LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case


DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"


Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."


math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code
