Fix for crashes and failures due to 32-bit x86 struct layout.

(Revert "Revert "Fix for crashes and failures due to 32-bit x86 struct layout.""

This reverts commit 871eb011c1b5e9474ea021a4096824669b765b32.)

Add explicit padding to structure types, including invokable function
parameter structure types.  The padding does not change field offset
or structure size -- it makes explicit any padding that was implicit
due to the ABI.  This ensures that if the frontend compiles for an ABI
with stricter alignment requirements than the backend compiles for,
the frontend and backend will still agree on structure layout (field
offset and structure size).  This is important for 32-bit x86: The
frontend compiles for 32-bit ARM ABI, in which 64-bit scalars are
64-bit aligned; but the 32-bit x86 ABI says that 64-bit scalars are
only 32-bit aligned.
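
As a minimal sketch of the idea (the struct below is hypothetical and
not taken from slang; its field names and sizes are assumptions chosen
for illustration), once the padding is spelled out as ordinary fields,
the offsets and size no longer depend on whether 64-bit scalars are
4-byte or 8-byte aligned:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example type: the implicit ARM padding is written out
// as explicit char arrays, in the spirit of what the commit describes.
struct PaddedExample {
  int32_t a;
  char pad0[4];  // explicit padding before the 64-bit field
  int64_t b;
  int32_t c;
  char pad1[4];  // explicit tail padding
};

// These hold whether int64_t is 4-byte aligned (32-bit x86) or
// 8-byte aligned (32-bit ARM), because no implicit padding remains.
static_assert(offsetof(PaddedExample, b) == 8, "b must sit at offset 8");
static_assert(offsetof(PaddedExample, c) == 16, "c must sit at offset 16");
static_assert(sizeof(PaddedExample) == 24, "size must be 24 bytes");
```

Only the required alignment of the struct as a whole can still differ
between the two ABIs; field offsets and structure size agree.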

(Ideally, we would pad only exported structure types; but the most
convenient time to insert the padding is as soon as a structure type
definition is complete, so that we don't have to modify the AST to
update references to the structure's fields.  Unfortunately, this is
long before we can tell whether or not a structure type is exported.)

We had partial fixes for the 32-bit x86 problem in the backend (bcc),
but they were incomplete.  They compute field offsets according to the
ARM layout (thereby compatible with the frontend, including reflected
code and Allocation cell size) rather than the x86 layout; but:
- A stack-based local variable was allocated according to the
  (potentially smaller) x86 size rather than the ARM size, whereas
  field accesses occurred at ARM offsets, potentially spilling off
  the end of the local variable.
- Despite the old fixes, certain analyses/transformations (for
  example, certain loop optimizations) look at structure sizes and field
  offsets according to x86 rules rather than ARM rules.
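
To make the size mismatch concrete, here is a rough sketch of a layout
calculation under simplified rules (each field is placed at the next
multiple of its alignment, and the struct size is rounded up to the
largest field alignment).  The names Field, Layout and computeLayout
are hypothetical helpers for illustration only; they are not part of
slang or bcc.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one field: size and alignment in bytes.
struct Field {
  uint64_t size;
  uint64_t align;
};

// Hypothetical result: per-field offsets and pre-field padding, plus
// tail padding and total size, all in bytes.
struct Layout {
  std::vector<uint64_t> offsets;
  std::vector<uint64_t> prePadding;
  uint64_t tailPadding;
  uint64_t size;
};

static uint64_t alignUp(uint64_t value, uint64_t align) {
  return (value + align - 1) / align * align;
}

Layout computeLayout(const std::vector<Field> &fields) {
  Layout out{};
  uint64_t end = 0;          // first byte past the previous field
  uint64_t structAlign = 1;  // largest field alignment seen so far
  for (const Field &f : fields) {
    const uint64_t offset = alignUp(end, f.align);
    out.offsets.push_back(offset);
    out.prePadding.push_back(offset - end);
    end = offset + f.size;
    if (f.align > structAlign) structAlign = f.align;
  }
  out.size = alignUp(end, structAlign);
  out.tailPadding = out.size - end;
  return out;
}
```

For a struct holding a 32-bit int followed by a 64-bit long,
computeLayout({{4, 4}, {8, 8}}) (ARM-style alignment) places the long
at offset 8 for a size of 16, whereas computeLayout({{4, 4}, {8, 4}})
(x86-style alignment) places it at offset 4 for a size of 12.  An
8-byte access at the ARM offset therefore ends at byte 16, past the
end of a 12-byte x86-sized local variable.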

Also, for the benefit of libbcc change
  https://android-review.googlesource.com/#/c/358954/
make slang header files available as a module "slang_headers".

Bug: http://b/29154200
Bug: http://b/28070272

Test: (aosp_x86_64-eng emulator, full_fugu-eng, aosp_angler-eng) x
      (RsTest 32-bit, RsTest 64-bit, cts -m RenderscriptTest)
      tests/slang_test.py
      lit-tests/run-lit-tests.sh
  RsTest includes forthcoming additional regression tests:
    https://android-review.googlesource.com/#/c/299370/
  Tried (unmodified slang,   modified bcc) and
        (  modified slang, unmodified bcc) and
        (  modified slang,   modified bcc)
  By instrumenting modified bcc, confirmed that:
  - Special x8632 layout transformations only run with unmodified slang,
    and only when test is compiled for x8632.
  "Modified bcc" is a forthcoming bcc change to turn off the "partial
  fixes" (x8632 layout transformations) mentioned above, and to verify
  that front end (Module) and back end (TargetMachine) agree on
  the layout of every exported struct type:
    https://android-review.googlesource.com/#/c/299531/

Change-Id: I25aac8e88812b5d3198e99f1929d4908ce663c46
diff --git a/Android.bp b/Android.bp
index 14748da..f0bda4f 100644
--- a/Android.bp
+++ b/Android.bp
@@ -65,6 +65,18 @@
     "libLLVMBitWriter_3_2",
 ]
 
+// Exported header files
+cc_library_headers {
+    name: "slang_headers",
+    export_include_dirs: ["."],
+    host_supported: true,
+    target: {
+        windows: {
+            enabled: true,
+        },
+    },
+}
+
 // Static library libslang for host
 // ========================================================
 cc_library_host_static {
diff --git a/lit-tests/padding/bitfield.rs b/lit-tests/padding/bitfield.rs
new file mode 100644
index 0000000..2975c6b
--- /dev/null
+++ b/lit-tests/padding/bitfield.rs
@@ -0,0 +1,92 @@
+// RUN: %Slang %s
+
+// RUN: %rs-filecheck-wrapper %s --check-prefix=CHECK-LL
+//     Check that the data types are properly padded:
+// CHECK-LL: %struct.NoBitfield{{(\.[0-9]+)?}} = type { i32, [4 x i8], i64, float, [4 x i8] }
+// CHECK-LL: %struct.Bitfield{{(\.[0-9]+)?}} = type { i32, [4 x i8], i64, i8 }
+//     Check that only NoBitfield is exported:
+// CHECK-LL: !\23rs_export_type = !{![[NODE:[0-9]+]]}
+// CHECK-LL: ![[NODE]] = !{!"NoBitfield"}
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --type=NoBitfield --check-prefix=CHECK-JAVA-STRUCT %s
+// CHECK-JAVA-STRUCT:      public static Element createElement(RenderScript rs) {
+// CHECK-JAVA-STRUCT-NEXT:     Element.Builder eb = new Element.Builder(rs);
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.I32(rs), "I");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_1");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.I64(rs), "L");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.F32(rs), "F");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_2");
+// CHECK-JAVA-STRUCT-NEXT:     return eb.create();
+// CHECK-JAVA-STRUCT-NEXT: }
+
+#pragma version(1)
+#pragma rs java_package_name(foo)
+
+// There is a C99 rule (under "Structure and union members") that
+// reads "One special guarantee is made in order to simplify the use
+// of unions: if a union contains several structures that share a
+// common initial sequence, and if the union object currently contains
+// one of these structures, it is permitted to inspect the common
+// initial part of any of them anywhere that a declaration of the
+// completed type of the union is visible. Two structures share a
+// common initial sequence if corresponding members have compatible
+// types (and, for bit-fields, the same widths) for a sequence of one
+// or more initial members."
+//
+// We want to ensure that the common initial sequences of exported
+// and non-exported types have the same layout.
+
+// An exported type (because we declare a global variable of this type)
+struct NoBitfield {
+    int I;
+    // expect 4 bytes of padding here
+    long L;
+    float F;
+    // expect 4 bytes of padding here
+};
+
+struct NoBitfield junk;  // just to make this an exported type
+
+// A non-exported type that shares a common initial sequence with NoBitfield
+struct Bitfield {
+    int I;
+    // expect 4 bytes of padding here
+    long L;
+    uint U:3;
+};
+
+union CommonInitialSequence {
+    struct NoBitfield nbf;
+    struct   Bitfield  bf;
+};
+
+static union CommonInitialSequence U, V;
+
+static struct NoBitfield *nbf;
+static struct   Bitfield * bf;
+
+// Note: Sets through the exported type (NoBitfield)
+void setUnion(long argL, int argI) {
+    nbf->L = argL;
+    nbf->I = argI;
+}
+
+bool failed = false;
+
+// Note: Tests through the non-exported type (Bitfield)
+void testUnion(long argL, int argI) {
+    failed |= ((bf->I != argI) || (bf->L != argL));
+}
+
+// Note: Prevent compiler from optimizing setUnion()/testUnion()
+//       to convert indirect accesses through nbf/bf into direct
+//       accesses through U or V.
+void choose(int i) {
+    if (i) {
+        nbf = &U.nbf;
+         bf = &U. bf;
+    } else {
+        nbf = &V.nbf;
+         bf = &V. bf;
+    }
+}
diff --git a/lit-tests/padding/more_structs.rs b/lit-tests/padding/more_structs.rs
new file mode 100644
index 0000000..aabbc91
--- /dev/null
+++ b/lit-tests/padding/more_structs.rs
@@ -0,0 +1,122 @@
+// RUN: %Slang %s
+
+// RUN: %rs-filecheck-wrapper %s --check-prefix=CHECK-LL
+//
+//     Check that the data types are properly padded:
+//
+// CHECK-LL: %struct.char_struct{{(\.[0-9]+)?}} = type { i16, [6 x i8], i64 }
+// CHECK-LL: %struct.five_struct{{(\.[0-9]+)?}} = type { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }
+//
+//     Check that the helper function for unpacking an invokable's arguments
+//     accesses a properly padded struct:
+//
+// CHECK-LL: define void @.helper_check_char_struct({ i16, [6 x i8], i64 }* nocapture)
+// CHECK-LL: [[C_F1_ADDR:%[0-9]+]] = getelementptr inbounds { i16, [6 x i8], i64 }, { i16, [6 x i8], i64 }* %0, i{{[0-9]+}} 0, i32 0
+// CHECK-LL: [[C_F1_VAL:%[0-9]+]] = load i16, i16* [[C_F1_ADDR]]
+// CHECK-LL: [[C_F2_ADDR:%[0-9]+]] = getelementptr inbounds { i16, [6 x i8], i64 }, { i16, [6 x i8], i64 }* %0, i{{[0-9]+}} 0, i32 2
+// CHECK-LL: [[C_F2_VAL:%[0-9]+]] = load i64, i64* [[C_F2_ADDR]]
+// CHECK-LL: tail call void @check_char_struct(i16 [[C_F1_VAL]], i64 [[C_F2_VAL]])
+//
+// CHECK-LL: define void @.helper_check_five_struct({ i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }* nocapture)
+// CHECK-LL: [[F_F1_ADDR:%[0-9]+]] = getelementptr inbounds { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }, { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }* %0, i{{[0-9]+}} 0, i32 0
+// CHECK-LL: [[F_F1_VAL:%[0-9]+]] = load i8, i8* [[F_F1_ADDR]]
+// CHECK-LL: [[F_F2_ADDR:%[0-9]+]] = getelementptr inbounds { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }, { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }* %0, i{{[0-9]+}} 0, i32 2
+// CHECK-LL: [[F_F2_VAL:%[0-9]+]] = load i64, i64* [[F_F2_ADDR]]
+// CHECK-LL: [[F_F3_ADDR:%[0-9]+]] = getelementptr inbounds { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }, { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }* %0, i{{[0-9]+}} 0, i32 3
+// CHECK-LL: [[F_F3_VAL:%[0-9]+]] = load i16, i16* [[F_F3_ADDR]]
+// CHECK-LL: [[F_F4_ADDR:%[0-9]+]] = getelementptr inbounds { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }, { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }* %0, i{{[0-9]+}} 0, i32 5
+// CHECK-LL: [[F_F4_VAL:%[0-9]+]] = load i64, i64* [[F_F4_ADDR]]
+// CHECK-LL: [[F_F5_ADDR:%[0-9]+]] = getelementptr inbounds { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }, { i8, [7 x i8], i64, i16, [6 x i8], i64, half, [6 x i8] }* %0, i{{[0-9]+}} 0, i32 6
+// CHECK-LL: [[F_F5_VAL:%[0-9]+]] = load half, half* [[F_F5_ADDR]]
+// CHECK-LL: tail call void @check_five_struct(i8 [[F_F1_VAL]], i64 [[F_F2_VAL]], i16 [[F_F3_VAL]], i64 [[F_F4_VAL]], half [[F_F5_VAL]])
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --type=char_struct --check-prefix=CHECK-JAVA-CHAR-STRUCT %s
+// CHECK-JAVA-CHAR-STRUCT:      public static Element createElement(RenderScript rs) {
+// CHECK-JAVA-CHAR-STRUCT-NEXT:     Element.Builder eb = new Element.Builder(rs);
+// CHECK-JAVA-CHAR-STRUCT-NEXT:     eb.add(Element.I16(rs), "f1");
+// CHECK-JAVA-CHAR-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_1");
+// CHECK-JAVA-CHAR-STRUCT-NEXT:     eb.add(Element.U16(rs), "#rs_padding_2");
+// CHECK-JAVA-CHAR-STRUCT-NEXT:     eb.add(Element.I64(rs), "f2");
+// CHECK-JAVA-CHAR-STRUCT-NEXT:     return eb.create();
+// CHECK-JAVA-CHAR-STRUCT-NEXT: }
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --type=five_struct --check-prefix=CHECK-JAVA-FIVE-STRUCT %s
+// CHECK-JAVA-FIVE-STRUCT:      public static Element createElement(RenderScript rs) {
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     Element.Builder eb = new Element.Builder(rs);
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.I8(rs), "f1");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_1");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.U16(rs), "#rs_padding_2");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.U8(rs), "#rs_padding_3");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.I64(rs), "f2");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.I16(rs), "f3");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_4");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.U16(rs), "#rs_padding_5");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.I64(rs), "f4");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.F16(rs), "f5");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_6");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     eb.add(Element.U16(rs), "#rs_padding_7");
+// CHECK-JAVA-FIVE-STRUCT-NEXT:     return eb.create();
+// CHECK-JAVA-FIVE-STRUCT-NEXT: }
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --check-prefix=CHECK-JAVA-INVOKE %s
+//
+// CHECK-JAVA-INVOKE:      public void invoke_check_char_struct(short arg1, long arg2) {
+// CHECK-JAVA-INVOKE-NEXT:     FieldPacker check_char_struct_fp = new FieldPacker(16);
+// CHECK-JAVA-INVOKE-NEXT:     check_char_struct_fp.addI16(arg1);
+// CHECK-JAVA-INVOKE-NEXT:     check_char_struct_fp.skip(6);
+// CHECK-JAVA-INVOKE-NEXT:     check_char_struct_fp.addI64(arg2);
+// CHECK-JAVA-INVOKE-NEXT:     invoke(mExportFuncIdx_check_char_struct, check_char_struct_fp);
+// CHECK-JAVA-INVOKE-NEXT: }
+//
+// CHECK-JAVA-INVOKE:      public void invoke_check_five_struct(byte arg1, long arg2, short arg3, long arg4, short arg5) {
+// CHECK-JAVA-INVOKE-NEXT:     FieldPacker check_five_struct_fp = new FieldPacker(40);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.addI8(arg1);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.skip(7);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.addI64(arg2);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.addI16(arg3);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.skip(6);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.addI64(arg4);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.addI16(arg5);
+// CHECK-JAVA-INVOKE-NEXT:     check_five_struct_fp.skip(6);
+// CHECK-JAVA-INVOKE-NEXT:     invoke(mExportFuncIdx_check_five_struct, check_five_struct_fp);
+// CHECK-JAVA-INVOKE-NEXT: }
+
+// Some more structs with different field types and/or more fields.
+
+#pragma version(1)
+#pragma rs java_package_name(foo)
+
+typedef struct char_struct {
+    short f1;
+    // expect 6 bytes of padding here
+    long f2;
+} char_struct;
+
+char_struct g_char_struct;
+
+typedef struct five_struct {
+    char f1;
+    // expect 7 bytes of padding here
+    long f2;
+    short f3;
+    // expect 6 bytes of padding here
+    long f4;
+    half f5;
+    // expect 6 bytes of padding here
+} five_struct;
+
+five_struct g_five_struct;
+
+bool failed = false;
+
+void check_char_struct(short arg1, long arg2) {
+    failed |= ((g_char_struct.f1 != arg1) || (g_char_struct.f2 != arg2));
+}
+
+void check_five_struct(char arg1, long arg2, short arg3, long arg4, half arg5) {
+    failed |= ((g_five_struct.f1 != arg1) ||
+               (g_five_struct.f2 != arg2) ||
+               (g_five_struct.f3 != arg3) ||
+               (g_five_struct.f4 != arg4) ||
+               (g_five_struct.f5 != arg5));
+}
diff --git a/lit-tests/padding/small_struct.rs b/lit-tests/padding/small_struct.rs
new file mode 100644
index 0000000..95057f7
--- /dev/null
+++ b/lit-tests/padding/small_struct.rs
@@ -0,0 +1,50 @@
+// RUN: %Slang %s
+
+// RUN: %rs-filecheck-wrapper %s --check-prefix=CHECK-LL
+//     Check that the data type small_struct is properly padded:
+// CHECK-LL: %struct.small_struct{{(\.[0-9]+)?}} = type { i32, [4 x i8], i64 }
+//     Check that the helper function for unpacking an invokable's arguments
+//     accesses a properly padded struct:
+// CHECK-LL: define void @.helper_checkStruct({ i32, [4 x i8], i64 }* nocapture)
+// CHECK-LL: [[FIELD_I_ADDR:%[0-9]+]] = getelementptr inbounds { i32, [4 x i8], i64 }, { i32, [4 x i8], i64 }* %0, i{{[0-9]+}} 0, i32 0
+// CHECK-LL: [[FIELD_I_VAL:%[0-9]+]] = load i32, i32* [[FIELD_I_ADDR]]
+// CHECK-LL: [[FIELD_L_ADDR:%[0-9]+]] = getelementptr inbounds { i32, [4 x i8], i64 }, { i32, [4 x i8], i64 }* %0, i{{[0-9]+}} 0, i32 2
+// CHECK-LL: [[FIELD_L_VAL:%[0-9]+]] = load i64, i64* [[FIELD_L_ADDR]]
+// CHECK-LL: call void @checkStruct(i32 [[FIELD_I_VAL]], i64 [[FIELD_L_VAL]])
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --type=small_struct --check-prefix=CHECK-JAVA-STRUCT %s
+// CHECK-JAVA-STRUCT:      public static Element createElement(RenderScript rs) {
+// CHECK-JAVA-STRUCT-NEXT:     Element.Builder eb = new Element.Builder(rs);
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.I32(rs), "i");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_1");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.I64(rs), "l");
+// CHECK-JAVA-STRUCT-NEXT:     return eb.create();
+// CHECK-JAVA-STRUCT-NEXT: }
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --check-prefix=CHECK-JAVA-INVOKE %s
+// CHECK-JAVA-INVOKE:      public void invoke_checkStruct(int argI, long argL) {
+// CHECK-JAVA-INVOKE-NEXT:     FieldPacker checkStruct_fp = new FieldPacker(16);
+// CHECK-JAVA-INVOKE-NEXT:     checkStruct_fp.addI32(argI);
+// CHECK-JAVA-INVOKE-NEXT:     checkStruct_fp.skip(4);
+// CHECK-JAVA-INVOKE-NEXT:     checkStruct_fp.addI64(argL);
+// CHECK-JAVA-INVOKE-NEXT:     invoke(mExportFuncIdx_checkStruct, checkStruct_fp);
+// CHECK-JAVA-INVOKE-NEXT: }
+
+// Same as small_struct_2.rs except for order of fields (hence location of padding) in struct small_struct[_2].
+
+#pragma version(1)
+#pragma rs java_package_name(foo)
+
+typedef struct small_struct {
+    int i;
+    // expect 4 bytes of padding here
+    long l;
+} small_struct;
+
+small_struct g_small_struct;
+
+bool failed = false;
+
+void checkStruct(int argI, long argL) {
+    failed |= ((g_small_struct.i != argI) || (g_small_struct.l != argL));
+}
diff --git a/lit-tests/padding/small_struct_2.rs b/lit-tests/padding/small_struct_2.rs
new file mode 100644
index 0000000..b622cc5
--- /dev/null
+++ b/lit-tests/padding/small_struct_2.rs
@@ -0,0 +1,50 @@
+// RUN: %Slang %s
+
+// RUN: %rs-filecheck-wrapper %s --check-prefix=CHECK-LL
+//     Check that the data type small_struct_2 is properly padded:
+// CHECK-LL: %struct.small_struct_2{{(\.[0-9]+)?}} = type { i64, i32, [4 x i8] }
+//     Check that the helper function for unpacking an invokable's arguments
+//     accesses a properly (un)padded struct:
+// CHECK-LL: define void @.helper_checkStruct({ i64, i32, [4 x i8] }* nocapture)
+// CHECK-LL: [[FIELD_L_ADDR:%[0-9]+]] = getelementptr inbounds { i64, i32, [4 x i8] }, { i64, i32, [4 x i8] }* %0, i{{[0-9]+}} 0, i32 0
+// CHECK-LL: [[FIELD_L_VAL:%[0-9]+]] = load i64, i64* [[FIELD_L_ADDR]]
+// CHECK-LL: [[FIELD_I_ADDR:%[0-9]+]] = getelementptr inbounds { i64, i32, [4 x i8] }, { i64, i32, [4 x i8] }* %0, i{{[0-9]+}} 0, i32 1
+// CHECK-LL: [[FIELD_I_VAL:%[0-9]+]] = load i32, i32* [[FIELD_I_ADDR]]
+// CHECK-LL: call void @checkStruct(i64 [[FIELD_L_VAL]], i32 [[FIELD_I_VAL]])
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --type=small_struct_2 --check-prefix=CHECK-JAVA-STRUCT %s
+// CHECK-JAVA-STRUCT:      public static Element createElement(RenderScript rs) {
+// CHECK-JAVA-STRUCT-NEXT:     Element.Builder eb = new Element.Builder(rs);
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.I64(rs), "l");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.I32(rs), "i");
+// CHECK-JAVA-STRUCT-NEXT:     eb.add(Element.U32(rs), "#rs_padding_1");
+// CHECK-JAVA-STRUCT-NEXT:     return eb.create();
+// CHECK-JAVA-STRUCT-NEXT: }
+
+// RUN: %scriptc-filecheck-wrapper --lang=Java --check-prefix=CHECK-JAVA-INVOKE %s
+// CHECK-JAVA-INVOKE:      public void invoke_checkStruct(long argL, int argI) {
+// CHECK-JAVA-INVOKE-NEXT:     FieldPacker checkStruct_fp = new FieldPacker(16);
+// CHECK-JAVA-INVOKE-NEXT:     checkStruct_fp.addI64(argL);
+// CHECK-JAVA-INVOKE-NEXT:     checkStruct_fp.addI32(argI);
+// CHECK-JAVA-INVOKE-NEXT:     checkStruct_fp.skip(4);
+// CHECK-JAVA-INVOKE-NEXT:     invoke(mExportFuncIdx_checkStruct, checkStruct_fp);
+// CHECK-JAVA-INVOKE-NEXT: }
+
+// Same as small_struct.rs except for order of fields (hence location of padding) in struct small_struct[_2].
+
+#pragma version(1)
+#pragma rs java_package_name(foo)
+
+typedef struct small_struct_2 {
+    long l;
+    int i;
+    // expect 4 bytes of padding here
+} small_struct_2;
+
+small_struct_2 g_small_struct_2;
+
+bool failed = false;
+
+void checkStruct(long argL, int argI) {
+    failed |= ((g_small_struct_2.l != argL) || (g_small_struct_2.i != argI));
+}
diff --git a/lit-tests/rs-filecheck-wrapper.sh b/lit-tests/rs-filecheck-wrapper.sh
index 8f6d718..fb37f02 100755
--- a/lit-tests/rs-filecheck-wrapper.sh
+++ b/lit-tests/rs-filecheck-wrapper.sh
@@ -1,14 +1,15 @@
 #!/bin/bash -e
 
 # RS Invocation script to FileCheck
-# Usage: rs-filecheck-wrapper.sh <output-directory> <path-to-FileCheck> <source>
+# Usage: rs-filecheck-wrapper.sh <output-directory> <path-to-FileCheck> <source> [<more-args>]
 
 OUTDIR=$1
 FILECHECK=$2
 SOURCEFILE=$3
+shift 3
 
 FILECHECK_INPUTFILE=`basename $SOURCEFILE | sed 's/\.rs\$/.ll/'`
 
 # This runs FileCheck on both the 32 bit and the 64 bit bitcode files.
-$FILECHECK -input-file $OUTDIR/bc32/$FILECHECK_INPUTFILE $SOURCEFILE
-$FILECHECK -input-file $OUTDIR/bc64/$FILECHECK_INPUTFILE $SOURCEFILE
+$FILECHECK -input-file $OUTDIR/bc32/$FILECHECK_INPUTFILE $SOURCEFILE "$@"
+$FILECHECK -input-file $OUTDIR/bc64/$FILECHECK_INPUTFILE $SOURCEFILE "$@"
diff --git a/lit-tests/scriptc-filecheck-wrapper.sh b/lit-tests/scriptc-filecheck-wrapper.sh
index 7e16872..9485f2b 100755
--- a/lit-tests/scriptc-filecheck-wrapper.sh
+++ b/lit-tests/scriptc-filecheck-wrapper.sh
@@ -8,6 +8,8 @@
   help_str="Usage: %s --output=<output-dir> \
 --filecheck=<path-to-filecheck> \
 --lang=[Java/C++] \
+[--type=<typename>] \
+[--check-prefix=<prefix>] \
 <.rs file>\n"
 
   printf "$help_str" $0
@@ -25,6 +27,12 @@
   --lang*)
     lang="${arg#*=}"
     ;;
+  --type*)
+    type="${arg#*=}"
+    ;;
+  --check-prefix=*)
+    check_prefix="${arg}"
+    ;;
   --help)
     print_help
     exit 0
@@ -51,9 +59,20 @@
 
 if [[ $lang == "Java" ]]
 then
-  filecheck_inputfile=foo/ScriptC_${rsfile_basename%.*}.java
+  if [[ (-z $type) ]]
+  then
+    filecheck_inputfile=foo/ScriptC_${rsfile_basename%.*}.java
+  else
+    filecheck_inputfile=foo/ScriptField_${type}.java
+  fi
 elif [[ $lang == "C++" ]]
 then
+  if [[ (-n $type) ]]
+  then
+    echo --type not supported for C++
+    print_help
+    exit 1
+  fi
   filecheck_inputfile=ScriptC_${rsfile_basename%.*}.h
 else
   echo Unknown language "$lang"
@@ -73,4 +92,4 @@
   exit 1
 fi
 
-"$filecheck" -input-file "$outdir"/$filecheck_inputfile "$rsfile"
+"$filecheck" -input-file "$outdir"/$filecheck_inputfile ${check_prefix} "$rsfile"
diff --git a/slang_backend.cpp b/slang_backend.cpp
index 5ff90b9..16f6e41 100644
--- a/slang_backend.cpp
+++ b/slang_backend.cpp
@@ -18,11 +18,13 @@
 
 #include <string>
 #include <vector>
+#include <iostream>
 
 #include "clang/AST/ASTContext.h"
 #include "clang/AST/Attr.h"
 #include "clang/AST/Decl.h"
 #include "clang/AST/DeclGroup.h"
+#include "clang/AST/RecordLayout.h"
 
 #include "clang/Basic/Diagnostic.h"
 #include "clang/Basic/TargetInfo.h"
@@ -339,7 +341,205 @@
   }
 }
 
+// Insert explicit padding fields into struct to follow the current layout.
+//
+// A similar algorithm is present in PadHelperFunctionStruct().
+void Backend::PadStruct(clang::RecordDecl* RD) {
+  // Example of padding:
+  //
+  //   // ORIGINAL CODE                   // TRANSFORMED CODE
+  //   struct foo {                       struct foo {
+  //     int a;                             int a;
+  //     // 4 bytes of padding              char <RS_PADDING_FIELD_NAME>[4];
+  //     long b;                            long b;
+  //     int c;                             int c;
+  //     // 4 bytes of (tail) padding       char <RS_PADDING_FIELD_NAME>[4];
+  //   };                                 };
+
+  // We collect all of RD's fields in a vector FieldsInfo.  We
+  // represent tail padding as an entry in the FieldsInfo vector with a
+  // null FieldDecl.
+  typedef std::pair<size_t, clang::FieldDecl*> FieldInfoType;  // (pre-field padding bytes, field)
+  std::vector<FieldInfoType> FieldsInfo;
+
+  // RenderScript is C99-based, so we only expect to see fields.  We
+  // could iterate over fields, but instead let's iterate over
+  // everything, to verify that there are only fields.
+  for (clang::Decl* D : RD->decls()) {
+    clang::FieldDecl* FD = clang::dyn_cast<clang::FieldDecl>(D);
+    slangAssert(FD && "found a non field declaration within a struct");
+    FieldsInfo.push_back(std::make_pair(size_t(0), FD));
+  }
+
+  clang::ASTContext& ASTC = mContext->getASTContext();
+
+  // ASTContext caches record layout.  We may transform the record in a way
+  // that would render this cached information incorrect.  clang does
+  // not provide any way to invalidate this cached information.  We
+  // take the following approach:
+  //
+  // 1. ASSUME that record layout has not yet been computed for RD.
+  //
+  // 2. Create a temporary clone of RD, and compute its layout.
+  //    ASSUME that we know how to clone RD in a way that copies all the
+  //    properties that are relevant to its layout.
+  //
+  // 3. Use the layout information from the temporary clone to
+  //    transform RD.
+  //
+  // NOTE: ASTContext also caches TypeInfo (see
+  //       ASTContext::getTypeInfo()).  ASSUME that inserting padding
+  //       fields doesn't change the type in any way that affects
+  //       TypeInfo.
+  //
+  // NOTE: A RecordType knows its associated RecordDecl -- so even
+  //       while we're manipulating RD, the associated RecordType
+  //       still recognizes RD as its RecordDecl.  ASSUME that we
+  //       don't do anything during our manipulation that would cause
+  //       the RecordType to be followed to RD while RD is in a
+  //       partially transformed state.
+
+  // The assumptions above may be brittle, and if they are incorrect,
+  // we may get mysterious failures.
+
+  // create a temporary clone
+  clang::RecordDecl* RDForLayout =
+      clang::RecordDecl::Create(ASTC, clang::TTK_Struct, RD->getDeclContext(),
+                                clang::SourceLocation(), clang::SourceLocation(),
+                                nullptr /* IdentifierInfo */);
+  RDForLayout->startDefinition();
+  RDForLayout->setTypeForDecl(RD->getTypeForDecl());
+  RDForLayout->setAttrs(RD->getAttrs());
+  RDForLayout->completeDefinition();
+
+  // move all fields from RD to RDForLayout
+  for (const auto &info : FieldsInfo) {
+    RD->removeDecl(info.second);
+    RDForLayout->addDecl(info.second);
+  }
+
+  const clang::ASTRecordLayout& RL = ASTC.getASTRecordLayout(RDForLayout);
+
+  // An exportable type cannot contain a bitfield.  However, it's
+  // possible that this current type might have a bitfield and yet
+  // share a common initial sequence with an exportable type, so even
+  // if the current type has a bitfield, the current type still
+  // needs to have explicit padding inserted (in case the two types
+  // under discussion are members of a union).  We don't need to
+  // insert any padding after the bitfield, however, because that
+  // would be beyond the common initial sequence.
+  bool foundBitField = false;
+
+  // Is there any padding in this struct?
+  bool foundPadding = false;
+
+  unsigned fieldNo = 0;
+  uint64_t fieldPrePaddingOffset = 0;  // byte offset of pre-field padding within struct
+  for (auto &info : FieldsInfo) {
+    const clang::FieldDecl* FD = info.second;
+
+    if ((foundBitField = FD->isBitField()))
+      break;
+
+    const uint64_t fieldOffset = RL.getFieldOffset(fieldNo) >> 3;
+    const size_t prePadding = fieldOffset - fieldPrePaddingOffset;
+    foundPadding |= (prePadding != 0);
+    info.first = prePadding;
+
+    // get ready for the next field
+    //
+    //   assumes that getTypeSize() is the storage size of the Type -- for example,
+    //   that it includes a struct's tail padding (if any)
+    //
+    fieldPrePaddingOffset = fieldOffset + (ASTC.getTypeSize(FD->getType()) >> 3);
+    ++fieldNo;
+  }
+
+  if (!foundBitField) {
+    // In order to ensure that the front end (including reflected
+    // code) and back end agree on struct size (not just field
+    // offsets) we may need to add explicit tail padding, just as we've
+    // added explicit padding between fields.
+    slangAssert(RL.getSize().getQuantity() >= fieldPrePaddingOffset);
+    if (const size_t tailPadding = RL.getSize().getQuantity() - fieldPrePaddingOffset) {
+      foundPadding = true;
+      FieldsInfo.push_back(std::make_pair(tailPadding, nullptr));
+    }
+  }
+
+  if (false /* change to "true" for extra debugging output */) {
+    if (foundPadding) {
+      std::cout << "PadStruct(" << RD->getNameAsString() << "):" << std::endl;
+      for (const auto &info : FieldsInfo)
+        std::cout << "  " << info.first << ", " << (info.second ? info.second->getNameAsString() : "<tail>") << std::endl;
+    }
+  }
+
+  if (foundPadding && Slang::IsLocInRSHeaderFile(RD->getLocation(), mSourceMgr)) {
+    mContext->ReportError(RD->getLocation(), "system structure contains padding: '%0'")
+        << RD->getName();
+  }
+
+  // now move fields from RDForLayout to RD, and add any necessary
+  // padding fields
+  const clang::QualType byteType = ASTC.getIntTypeForBitwidth(8, false /* not signed */);
+  clang::IdentifierInfo* const paddingIdentifierInfo = &ASTC.Idents.get(RS_PADDING_FIELD_NAME);
+  for (const auto &info : FieldsInfo) {
+    if (info.first != 0) {
+      // Create a padding field: "char <RS_PADDING_FIELD_NAME>[<info.first>];"
+
+      // TODO: Do we need to do anything else to keep this field from being shown in debugger?
+      //       There's no source location, and the field is marked as implicit.
+      const clang::QualType paddingType =
+          ASTC.getConstantArrayType(byteType,
+                                    llvm::APInt(sizeof(info.first) << 3, info.first),
+                                    clang::ArrayType::Normal, 0 /* IndexTypeQuals */);
+      clang::FieldDecl* const FD =
+          clang::FieldDecl::Create(ASTC, RD, clang::SourceLocation(), clang::SourceLocation(),
+                                   paddingIdentifierInfo,
+                                   paddingType,
+                                   nullptr,  // TypeSourceInfo*
+                                   nullptr,  // BW (bitwidth)
+                                   false,    // Mutable = false
+                                   clang::ICIS_NoInit);
+      FD->setImplicit(true);
+      RD->addDecl(FD);
+    }
+    if (info.second != nullptr) {
+      RDForLayout->removeDecl(info.second);
+      RD->addDecl(info.second);
+    }
+  }
+
+  // There does not appear to be any safe way to delete a RecordDecl
+  // -- for example, there is no RecordDecl destructor to invalidate
+  // cached record layout, and if we were to get unlucky, some future
+  // RecordDecl could be allocated in the same place as a deleted
+  // RDForLayout and "inherit" the cached record layout from
+  // RDForLayout.
+}
+
 void Backend::HandleTagDeclDefinition(clang::TagDecl *D) {
+  // we want to insert explicit padding fields into structs per http://b/29154200 and http://b/28070272
+  switch (D->getTagKind()) {
+    case clang::TTK_Struct:
+      PadStruct(llvm::cast<clang::RecordDecl>(D));
+      break;
+
+    case clang::TTK_Union:
+      // cannot be part of an exported type
+      break;
+
+    case clang::TTK_Enum:
+      // a scalar
+      break;
+
+    case clang::TTK_Class:
+    case clang::TTK_Interface:
+    default:
+      slangAssert(false && "Unexpected TagTypeKind");
+      break;
+  }
   mGen->HandleTagDeclDefinition(D);
 }
 
@@ -590,6 +790,61 @@
   }
 }
 
+// A similar algorithm is present in Backend::PadStruct().
+static void PadHelperFunctionStruct(llvm::Module *M,
+                                    llvm::StructType **paddedStructType,
+                                    std::vector<unsigned> *origFieldNumToPaddedFieldNum,
+                                    llvm::StructType *origStructType) {
+  slangAssert(origFieldNumToPaddedFieldNum->empty());
+  origFieldNumToPaddedFieldNum->resize(2 * origStructType->getNumElements());
+
+  llvm::LLVMContext &llvmContext = M->getContext();
+
+  const llvm::DataLayout *DL = &M->getDataLayout();
+  const llvm::StructLayout *SL = DL->getStructLayout(origStructType);
+
+  // Field types -- including any padding fields we need to insert.
+  std::vector<llvm::Type *> paddedFieldTypes;
+  paddedFieldTypes.reserve(2 * origStructType->getNumElements());
+
+  // Is there any padding in this struct?
+  bool foundPadding = false;
+
+  llvm::Type *const byteType = llvm::Type::getInt8Ty(llvmContext);
+  unsigned origFieldNum = 0, paddedFieldNum = 0;
+  uint64_t fieldPrePaddingOffset = 0;  // byte offset of pre-field padding within struct
+  for (llvm::Type *fieldType : origStructType->elements()) {
+    const uint64_t fieldOffset = SL->getElementOffset(origFieldNum);
+    const size_t prePadding = fieldOffset - fieldPrePaddingOffset;
+    if (prePadding != 0) {
+      foundPadding = true;
+      paddedFieldTypes.push_back(llvm::ArrayType::get(byteType, prePadding));
+      ++paddedFieldNum;
+    }
+    paddedFieldTypes.push_back(fieldType);
+    (*origFieldNumToPaddedFieldNum)[origFieldNum] = paddedFieldNum;
+
+    // get ready for the next field
+    fieldPrePaddingOffset = fieldOffset + DL->getTypeAllocSize(fieldType);
+    ++origFieldNum;
+    ++paddedFieldNum;
+  }
+
+  // In order to ensure that the front end (including reflected code)
+  // and back end agree on struct size (not just field offsets) we may
+  // need to add explicit tail padding, just as we've added explicit
+  // padding between fields.
+  slangAssert(SL->getSizeInBytes() >= fieldPrePaddingOffset);
+  if (const size_t tailPadding = SL->getSizeInBytes() - fieldPrePaddingOffset) {
+    foundPadding = true;
+    paddedFieldTypes.push_back(llvm::ArrayType::get(byteType, tailPadding));
+  }
+
+  *paddedStructType = (foundPadding
+                       ? llvm::StructType::get(llvmContext, paddedFieldTypes)
+                       : origStructType);
+}
+
 void Backend::dumpExportFunctionInfo(llvm::Module *M) {
   if (mExportFuncMetadata == nullptr)
     mExportFuncMetadata =
@@ -617,7 +872,10 @@
 
       // Create helper function
       {
-        llvm::StructType *HelperFunctionParameterTy = nullptr;
+        llvm::StructType *OrigHelperFunctionParameterTy = nullptr;
+        llvm::StructType *PaddedHelperFunctionParameterTy = nullptr;
+
+        std::vector<unsigned> OrigFieldNumToPaddedFieldNum;
         std::vector<bool> isStructInput;
 
         if (!F->getArgumentList().empty()) {
@@ -632,11 +890,14 @@
                   isStructInput.push_back(false);
               }
           }
-          HelperFunctionParameterTy =
+          OrigHelperFunctionParameterTy =
               llvm::StructType::get(mLLVMContext, HelperFunctionParameterTys);
+          PadHelperFunctionStruct(M,
+                                  &PaddedHelperFunctionParameterTy, &OrigFieldNumToPaddedFieldNum,
+                                  OrigHelperFunctionParameterTy);
         }
 
-        if (!EF->checkParameterPacketType(HelperFunctionParameterTy)) {
+        if (!EF->checkParameterPacketType(OrigHelperFunctionParameterTy)) {
           fprintf(stderr, "Failed to export function %s: parameter type "
                           "mismatch during creation of helper function.\n",
                   EF->getName().c_str());
@@ -646,16 +907,16 @@
             fprintf(stderr, "Expected:\n");
             Expected->getLLVMType()->dump();
           }
-          if (HelperFunctionParameterTy) {
+          if (OrigHelperFunctionParameterTy) {
             fprintf(stderr, "Got:\n");
-            HelperFunctionParameterTy->dump();
+            OrigHelperFunctionParameterTy->dump();
           }
         }
 
         std::vector<llvm::Type*> Params;
-        if (HelperFunctionParameterTy) {
+        if (PaddedHelperFunctionParameterTy) {
           llvm::PointerType *HelperFunctionParameterTyP =
-              llvm::PointerType::getUnqual(HelperFunctionParameterTy);
+              llvm::PointerType::getUnqual(PaddedHelperFunctionParameterTy);
           Params.push_back(HelperFunctionParameterTyP);
         }
 
@@ -688,17 +949,17 @@
 
           // getelementptr and load instruction for all elements in
           // parameter .p
-          for (size_t i = 0; i < EF->getNumParameters(); i++) {
+          for (size_t origFieldNum = 0; origFieldNum < EF->getNumParameters(); origFieldNum++) {
             // getelementptr
             Idx[1] = llvm::ConstantInt::get(
-              llvm::Type::getInt32Ty(mLLVMContext), i);
+              llvm::Type::getInt32Ty(mLLVMContext), OrigFieldNumToPaddedFieldNum[origFieldNum]);
 
             llvm::Value *Ptr = NULL;
 
             Ptr = IB->CreateInBoundsGEP(HelperFunctionParameter, Idx);
 
             // Load is only required for non-struct ptrs
-            if (isStructInput[i]) {
+            if (isStructInput[origFieldNum]) {
                 Params.push_back(Ptr);
             } else {
                 llvm::Value *V = IB->CreateLoad(Ptr);
diff --git a/slang_backend.h b/slang_backend.h
index 38aa1e4..4bb596f 100644
--- a/slang_backend.h
+++ b/slang_backend.h
@@ -123,6 +123,20 @@
   // rsForEachWithOptions().
   void LowerRSForEachCall(clang::FunctionDecl* FD, bool isKernel);
 
+  // Insert explicit padding fields into a struct so that it follows the
+  // layout defined by the RenderScript ABI (32-bit or 64-bit ARM).
+  //
+  // The padding does not change field offset or structure size -- it
+  // makes explicit any padding that was implicit due to the ABI.
+  // This ensures that if the frontend compiles for an ABI with
+  // stricter alignment requirements than the backend compiles for,
+  // the frontend and backend will still agree on structure layout
+  // (field offset and structure size).  This is important for 32-bit
+  // x86: The frontend compiles for 32-bit ARM ABI, in which 64-bit
+  // scalars are 64-bit aligned; but the 32-bit x86 ABI says that
+  // 64-bit scalars are only 32-bit aligned.
+  void PadStruct(clang::RecordDecl* RD);
+
  protected:
   llvm::LLVMContext &mLLVMContext;
   clang::DiagnosticsEngine &mDiagEngine;
diff --git a/slang_rs_export_type.cpp b/slang_rs_export_type.cpp
index ecd83b0..030d47b 100644
--- a/slang_rs_export_type.cpp
+++ b/slang_rs_export_type.cpp
@@ -1526,6 +1526,9 @@
       return nullptr;
     }
 
+    if (FD->isImplicit() && (FD->getName() == RS_PADDING_FIELD_NAME))
+      continue;
+
     // Type
     RSExportType *ET = RSExportElement::CreateFromDecl(Context, FD);
 
diff --git a/slang_rs_export_type.h b/slang_rs_export_type.h
index 7921386..fcdd9cd 100644
--- a/slang_rs_export_type.h
+++ b/slang_rs_export_type.h
@@ -33,6 +33,7 @@
 
 #include "slang_rs_exportable.h"
 
+#define RS_PADDING_FIELD_NAME ".rs.padding"
 
 inline const clang::Type* GetCanonicalType(const clang::Type* T) {
   if (T == nullptr) {
diff --git a/slang_version.h b/slang_version.h
index 6f96f76..7c07389 100644
--- a/slang_version.h
+++ b/slang_version.h
@@ -74,7 +74,8 @@
   M = 2300,
   M_RS_OBJECT = 2310,
   N = 2400,
-  CURRENT = N
+  N_STRUCT_EXPLICIT_PADDING = 2410,
+  CURRENT = N_STRUCT_EXPLICIT_PADDING
 };
 }  // namespace SlangVersion