Update libjpeg-turbo to 1.2.0.

This change applies the upstream changes from 1.1.90 (r722) to 1.2.0 (r733).
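
Note for callers: the 1.2.0 README-turbo.txt describes a compile-time check, `#ifdef JCS_ALPHA_EXTENSIONS`, for the new RGBA/BGRA/ABGR/ARGB colorspace constants. The sketch below is not part of this change. It shows one way an application might fall back when built against older headers, where the X byte of the RGBX-style formats is undefined after decompression. The names `fill_alpha` and `needs_manual_alpha` are hypothetical. The snippet deliberately does not include jpeglib.h, so a standalone build takes the fallback branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libjpeg-turbo: writes 0xFF into the
 * padding byte of every 4-byte pixel in a row. alpha_offset is 3 for
 * RGBX/BGRX layouts and 0 for XBGR/XRGB layouts. */
static void fill_alpha(unsigned char *row, size_t width, size_t alpha_offset)
{
  size_t x;

  assert(row != NULL);
  assert(alpha_offset < 4);
  for (x = 0; x < width; x++)
    row[4 * x + alpha_offset] = 0xFF;
}

/* Returns 1 when the headers in use lack the alpha colorspace extensions,
 * meaning the caller has to make the X byte opaque itself. */
static int needs_manual_alpha(void)
{
#ifdef JCS_ALPHA_EXTENSIONS
  return 0;  /* request JCS_EXT_RGBA etc.; the library sets the byte to 0xFF */
#else
  return 1;  /* request JCS_EXT_RGBX etc., then call fill_alpha() per row */
#endif
}
```

With headers that define `JCS_ALPHA_EXTENSIONS`, the application would set `cinfo.out_color_space` to one of the alpha constants and skip the manual pass.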

BUG=none
TEST=webkit layout_tests
Review URL: https://chromiumcodereview.appspot.com/9232002

git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/libjpeg_turbo@118072 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
diff --git a/ChangeLog.txt b/ChangeLog.txt
index d1c877a..083a0a4 100644
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -1,3 +1,25 @@
+1.2.0
+=====
+
+[1] Fixed build issue with YASM on Unix systems (the libjpeg-turbo build system
+was not adding the current directory to the assembler include path, so YASM
+was not able to find jsimdcfg.inc.)
+
+[2] Fixed out-of-bounds read in SSE2 SIMD code that occurred when decompressing
+a JPEG image to a bitmap buffer whose size was not a multiple of 16 bytes.
+This was more of an annoyance than an actual bug, since it did not cause any
+actual run-time problems, but the issue showed up when running libjpeg-turbo in
+valgrind.  See http://crbug.com/72399 for more information.
+
+[3] Added a compile-time macro (LIBJPEG_TURBO_VERSION) that can be used to
+check the version of libjpeg-turbo against which an application was compiled.
+
+[4] Added new RGBA/BGRA/ABGR/ARGB colorspace extension constants (libjpeg API)
+and pixel formats (TurboJPEG API), which allow applications to specify that,
+when decompressing to a 4-component RGB buffer, the unused byte should be set
+to 0xFF so that it can be interpreted as an opaque alpha channel.
+
+
 1.1.90 (1.2 beta1)
 ==================
 
diff --git a/README b/README
index 2ead09e..afa052d 100644
--- a/README
+++ b/README
@@ -62,7 +62,7 @@
 This package contains C software to implement JPEG image encoding, decoding,
 and transcoding.  JPEG (pronounced "jay-peg") is a standardized compression
 method for full-color and gray-scale images.  JPEG's strong suit is compressing
-photographic images or other types of images which have smooth color and
+photographic images or other types of images that have smooth color and
 brightness transitions between neighboring pixels.  Images with sharp lines or
 other abrupt features may not compress well with JPEG, and a higher JPEG
 quality may have to be used to avoid visible compression artifacts with such
@@ -274,7 +274,7 @@
 ================
 
 The ISO JPEG standards committee actually promotes different formats like
-"JPEG 2000" or "JPEG XR" which are incompatible with original DCT-based
+"JPEG 2000" or "JPEG XR", which are incompatible with original DCT-based
 JPEG.  IJG therefore does not support these formats (see REFERENCES).  Indeed,
 one of the original reasons for developing this free software was to help
 force convergence on common, interoperable format standards for JPEG files.
diff --git a/README-turbo.txt b/README-turbo.txt
index ac37556..327647c 100644
--- a/README-turbo.txt
+++ b/README-turbo.txt
@@ -2,7 +2,7 @@
 **     Background
 *******************************************************************************
 
-libjpeg-turbo is a derivative of libjpeg which uses SIMD instructions (MMX,
+libjpeg-turbo is a derivative of libjpeg that uses SIMD instructions (MMX,
 SSE2, etc.) to accelerate baseline JPEG compression and decompression on x86
 and x86-64 systems.  On such systems, libjpeg-turbo is generally 2-4x as fast
 as the unmodified version of libjpeg, all else being equal.
@@ -59,7 +59,7 @@
 **     Using libjpeg-turbo
 *******************************************************************************
 
-libjpeg-turbo includes two APIs which can be used to compress and decompress
+libjpeg-turbo includes two APIs that can be used to compress and decompress
 JPEG images:
 
   TurboJPEG/OSS:  This API wraps libjpeg-turbo and provides an easy-to-use
@@ -122,7 +122,7 @@
 Windows releases can obtain it from the Visual C++ 2008 Redistributable
 Package, which is available as a free download from Microsoft's web site.
 
-NOTE:  Features of libjpeg which require passing a C run time structure, such
+NOTE:  Features of libjpeg that require passing a C run time structure, such
 as a file handle, from an application to libjpeg will probably not work with
 the version of the libjpeg-turbo DLL distributed in the libjpeg-turbo SDK for
 Visual C++, unless the application is also built to use the Visual C++ 2008 C
@@ -208,9 +208,9 @@
 Colorspace Extensions
 =====================
 
-libjpeg-turbo includes extensions which allow JPEG images to be compressed
-directly from (and decompressed directly to) buffers which use BGR, BGRX,
-RGBX, XBGR, and XRGB pixel ordering.  This is implemented with six new
+libjpeg-turbo includes extensions that allow JPEG images to be compressed
+directly from (and decompressed directly to) buffers that use BGR, BGRX,
+RGBX, XBGR, and XRGB pixel ordering.  This is implemented with ten new
 colorspace constants:
 
   JCS_EXT_RGB   /* red/green/blue */
@@ -219,11 +219,15 @@
   JCS_EXT_BGRX  /* blue/green/red/x */
   JCS_EXT_XBGR  /* x/blue/green/red */
   JCS_EXT_XRGB  /* x/red/green/blue */
+  JCS_EXT_RGBA  /* red/green/blue/alpha */
+  JCS_EXT_BGRA  /* blue/green/red/alpha */
+  JCS_EXT_ABGR  /* alpha/blue/green/red */
+  JCS_EXT_ARGB  /* alpha/red/green/blue */
 
 Setting cinfo.in_color_space (compression) or cinfo.out_color_space
 (decompression) to one of these values will cause libjpeg-turbo to read the
 red, green, and blue values from (or write them to) the appropriate position in
-the pixel when YUV conversion is performed.
+the pixel when compressing from/decompressing to an RGB buffer.
 
 Your application can check for the existence of these extensions at compile
 time with:
@@ -233,13 +237,28 @@
 At run time, attempting to use these extensions with a version of libjpeg
 that doesn't support them will result in a "Bogus input colorspace" error.
 
+When using the RGBX, BGRX, XBGR, and XRGB colorspaces during decompression, the
+X byte is undefined, and in order to ensure the best performance, libjpeg-turbo
+can set that byte to whatever value it wishes.  If an application expects the X
+byte to be used as an alpha channel, then it should specify JCS_EXT_RGBA,
+JCS_EXT_BGRA, JCS_EXT_ABGR, or JCS_EXT_ARGB.  When these colorspace constants
+are used, the X byte is guaranteed to be 0xFF, which is interpreted as opaque.
+
+Your application can check for the existence of the alpha channel colorspace
+extensions at compile time with:
+
+  #ifdef JCS_ALPHA_EXTENSIONS
+
+jcstest.c, located in the libjpeg-turbo source tree, demonstrates how to check
+for the existence of the colorspace extensions at compile time and run time.
+
 =================================
 libjpeg v7 and v8 API/ABI support
 =================================
 
 libjpeg v7 and v8 added new features to the API/ABI, and, unfortunately, the
 compression and decompression structures were extended in a backward-
-incompatible manner to accommodate these features.  Thus, programs which are
+incompatible manner to accommodate these features.  Thus, programs that are
 built to use libjpeg v7 or v8 did not work with libjpeg-turbo, since it is
 based on the libjpeg v6b code base.  Although libjpeg v7 and v8 are still not
 as widely used as v6b, enough programs (including a few Linux distros) have
@@ -247,7 +266,7 @@
 API/ABI in libjpeg-turbo.
 
 Some of the libjpeg v7 and v8 features -- DCT scaling, to name one -- involve
-deep modifications to the code which cannot be accommodated by libjpeg-turbo
+deep modifications to the code that cannot be accommodated by libjpeg-turbo
 without either breaking compatibility with libjpeg v6b or producing an
 unsupportable mess.  In order to fully support libjpeg v8 with all of its
 features, we would have to essentially port the SIMD extensions to the libjpeg
@@ -258,8 +277,8 @@
 
 By passing an argument of --with-jpeg7 or --with-jpeg8 to configure, or an
 argument of -DWITH_JPEG7=1 or -DWITH_JPEG8=1 to cmake, you can build a version
-of libjpeg-turbo which emulates the libjpeg v7 or v8 API/ABI, so that programs
-which are built against libjpeg v7 or v8 can be run with libjpeg-turbo.  The
+of libjpeg-turbo that emulates the libjpeg v7 or v8 API/ABI, so that programs
+that are built against libjpeg v7 or v8 can be run with libjpeg-turbo.  The
 following section describes which libjpeg v7+ features are supported and which
 aren't.
 
diff --git a/README.chromium b/README.chromium
index 955e277..8e02302 100644
--- a/README.chromium
+++ b/README.chromium
@@ -1,12 +1,12 @@
 Name: libjpeg-turbo
 URL: http://sourceforge.net/projects/libjpeg-turbo/
-Version: 1.1.90
+Version: 1.2
 License File: LICENSE.txt
 Security Critical: yes
 
 Description:
 This consists of three components:
-* A partial copy of libjpeg-turbo 1.1.90 (r722);
+* A partial copy of libjpeg-turbo 1.2 (r733);
 * A build file (libjpeg.gyp), and;
 * Patched header files used by Chromium.
 
@@ -23,4 +23,4 @@
 * Supported motion-JPEG frames that do not have DHT markers.
 * Fixed valgrind errors.
 The 'google.patch' file represents our changes from the original
-libjpeg-turbo-1.1.90.
+libjpeg-turbo-1.2.
diff --git a/google.patch b/google.patch
index 7f337c4..3e88c25 100644
--- a/google.patch
+++ b/google.patch
@@ -1,6 +1,6 @@
 Index: jdmarker.c
 ===================================================================
---- jdmarker.c	(revision 722)
+--- jdmarker.c	(revision 733)
 +++ jdmarker.c	(working copy)
 @@ -906,7 +906,7 @@
    }
@@ -166,7 +166,7 @@
        cinfo->unread_marker = 0;	/* processed the marker */
 Index: jmorecfg.h
 ===================================================================
---- jmorecfg.h	(revision 722)
+--- jmorecfg.h	(revision 733)
 +++ jmorecfg.h	(working copy)
 @@ -153,14 +153,18 @@
  /* INT16 must hold at least the values -32768..32767. */
@@ -203,7 +203,7 @@
  /*
 Index: jpeglib.h
 ===================================================================
---- jpeglib.h	(revision 722)
+--- jpeglib.h	(revision 733)
 +++ jpeglib.h	(working copy)
 @@ -15,6 +15,10 @@
  #ifndef JPEGLIB_H
@@ -336,7 +336,7 @@
 +#endif  // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_
 Index: simd/jcgrass2-64.asm
 ===================================================================
---- simd/jcgrass2-64.asm	(revision 722)
+--- simd/jcgrass2-64.asm	(revision 733)
 +++ simd/jcgrass2-64.asm	(working copy)
 @@ -30,7 +30,7 @@
  	SECTION	SEG_CONST
@@ -349,7 +349,7 @@
  
 Index: simd/jiss2fst.asm
 ===================================================================
---- simd/jiss2fst.asm	(revision 722)
+--- simd/jiss2fst.asm	(revision 733)
 +++ simd/jiss2fst.asm	(working copy)
 @@ -59,7 +59,7 @@
  %define CONST_SHIFT     (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -371,7 +371,7 @@
  	push	ebp
 Index: simd/jiss2red-64.asm
 ===================================================================
---- simd/jiss2red-64.asm	(revision 722)
+--- simd/jiss2red-64.asm	(revision 733)
 +++ simd/jiss2red-64.asm	(working copy)
 @@ -73,7 +73,7 @@
  	SECTION	SEG_CONST
@@ -402,7 +402,7 @@
  	push	rbp
 Index: simd/jcclrss2-64.asm
 ===================================================================
---- simd/jcclrss2-64.asm	(revision 722)
+--- simd/jcclrss2-64.asm	(revision 733)
 +++ simd/jcclrss2-64.asm	(working copy)
 @@ -37,7 +37,7 @@
  
@@ -415,7 +415,7 @@
  	push	rbp
 Index: simd/ji3dnflt.asm
 ===================================================================
---- simd/ji3dnflt.asm	(revision 722)
+--- simd/ji3dnflt.asm	(revision 733)
 +++ simd/ji3dnflt.asm	(working copy)
 @@ -27,7 +27,7 @@
  	SECTION	SEG_CONST
@@ -437,7 +437,7 @@
  	push	ebp
 Index: simd/jsimdcpu.asm
 ===================================================================
---- simd/jsimdcpu.asm	(revision 722)
+--- simd/jsimdcpu.asm	(revision 733)
 +++ simd/jsimdcpu.asm	(working copy)
 @@ -29,7 +29,7 @@
  ;
@@ -450,7 +450,7 @@
  	push	ebx
 Index: simd/jdsammmx.asm
 ===================================================================
---- simd/jdsammmx.asm	(revision 722)
+--- simd/jdsammmx.asm	(revision 733)
 +++ simd/jdsammmx.asm	(working copy)
 @@ -22,7 +22,7 @@
  	SECTION	SEG_CONST
@@ -499,7 +499,7 @@
  	push	ebp
 Index: simd/jdmerss2-64.asm
 ===================================================================
---- simd/jdmerss2-64.asm	(revision 722)
+--- simd/jdmerss2-64.asm	(revision 733)
 +++ simd/jdmerss2-64.asm	(working copy)
 @@ -35,7 +35,7 @@
  	SECTION	SEG_CONST
@@ -512,7 +512,7 @@
  
 Index: simd/jdmrgmmx.asm
 ===================================================================
---- simd/jdmrgmmx.asm	(revision 722)
+--- simd/jdmrgmmx.asm	(revision 733)
 +++ simd/jdmrgmmx.asm	(working copy)
 @@ -40,7 +40,7 @@
  %define gotptr		wk(0)-SIZEOF_POINTER	; void * gotptr
@@ -534,7 +534,7 @@
  	push	ebp
 Index: simd/jdsamss2.asm
 ===================================================================
---- simd/jdsamss2.asm	(revision 722)
+--- simd/jdsamss2.asm	(revision 733)
 +++ simd/jdsamss2.asm	(working copy)
 @@ -22,7 +22,7 @@
  	SECTION	SEG_CONST
@@ -583,7 +583,7 @@
  	push	ebp
 Index: simd/jiss2flt-64.asm
 ===================================================================
---- simd/jiss2flt-64.asm	(revision 722)
+--- simd/jiss2flt-64.asm	(revision 733)
 +++ simd/jiss2flt-64.asm	(working copy)
 @@ -38,7 +38,7 @@
  	SECTION	SEG_CONST
@@ -605,7 +605,7 @@
  	push	rbp
 Index: simd/jfss2int-64.asm
 ===================================================================
---- simd/jfss2int-64.asm	(revision 722)
+--- simd/jfss2int-64.asm	(revision 733)
 +++ simd/jfss2int-64.asm	(working copy)
 @@ -67,7 +67,7 @@
  	SECTION	SEG_CONST
@@ -627,7 +627,7 @@
  	push	rbp
 Index: simd/jcqnts2f.asm
 ===================================================================
---- simd/jcqnts2f.asm	(revision 722)
+--- simd/jcqnts2f.asm	(revision 733)
 +++ simd/jcqnts2f.asm	(working copy)
 @@ -35,7 +35,7 @@
  %define workspace	ebp+16		; FAST_FLOAT * workspace
@@ -649,7 +649,7 @@
  	push	ebp
 Index: simd/jdmrgss2.asm
 ===================================================================
---- simd/jdmrgss2.asm	(revision 722)
+--- simd/jdmrgss2.asm	(revision 733)
 +++ simd/jdmrgss2.asm	(working copy)
 @@ -40,7 +40,7 @@
  %define gotptr		wk(0)-SIZEOF_POINTER	; void * gotptr
@@ -660,88 +660,7 @@
  
  EXTN(jsimd_h2v1_merged_upsample_sse2):
  	push	ebp
-@@ -307,6 +307,41 @@
- 	movdqa	xmmA,xmmD
- 	sub	ecx, byte SIZEOF_XMMWORD
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store the lower 8 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	ecx, byte SIZEOF_MMWORD
-+	jb	short .column_st7
-+	movq	MMWORD [edi], xmmA
-+	add	edi, byte SIZEOF_MMWORD
-+	sub	ecx, byte SIZEOF_MMWORD
-+	psrldq	xmmA, SIZEOF_MMWORD
-+.column_st7:
-+	; Store the lower 4 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	ecx, byte SIZEOF_DWORD
-+	jb	short .column_st3
-+	movd	DWORD [edi], xmmA
-+	add	edi, byte SIZEOF_DWORD
-+	sub	ecx, byte SIZEOF_DWORD
-+	psrldq	xmmA, SIZEOF_DWORD
-+.column_st3:
-+	; Store the lower 2 bytes of eax to the output when it has enough
-+	; space.
-+	movd	eax, xmmA
-+	cmp	ecx, byte SIZEOF_WORD
-+	jb	short .column_st1
-+	mov	WORD [edi], ax
-+	add	edi, byte SIZEOF_WORD
-+	sub	ecx, byte SIZEOF_WORD
-+	shr	eax, 16
-+.column_st1:
-+	; Store the lower 1 byte of eax to the output when it has enough
-+	; space.
-+	test	ecx, ecx
-+	jz	short .endcolumn
-+	mov	BYTE [edi], al
-+%else
- 	mov	eax,ecx
- 	xor	ecx, byte 0x0F
- 	shl	ecx, 2
-@@ -346,6 +381,7 @@
- 	por	xmmE,xmmC
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [edi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %else ; RGB_PIXELSIZE == 4 ; -----------
- 
-@@ -434,6 +470,22 @@
- 	movdqa	xmmA,xmmD
- 	sub	ecx, byte SIZEOF_XMMWORD/4
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store two pixels (8 bytes) of xmmA to the output when it has enough
-+	; space.
-+	cmp	ecx, byte SIZEOF_XMMWORD/8
-+	jb	short .column_st7
-+	movq	MMWORD [edi], xmmA
-+	add	edi, byte SIZEOF_XMMWORD/8*4
-+	sub	ecx, byte SIZEOF_XMMWORD/8
-+	psrldq	xmmA, SIZEOF_XMMWORD/8*4
-+.column_st7:
-+	; Store one pixel (4 bytes) of xmmA to the output when it has enough
-+	; space.
-+	test	ecx, ecx
-+	jz	short .endcolumn
-+	movd	DWORD [edi], xmmA
-+%else
- 	cmp	ecx, byte SIZEOF_XMMWORD/16
- 	jb	short .endcolumn
- 	mov	eax,ecx
-@@ -473,6 +525,7 @@
- 	por	xmmE,xmmG
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [edi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %endif ; RGB_PIXELSIZE ; ---------------
- 
-@@ -507,7 +560,7 @@
+@@ -560,7 +560,7 @@
  %define output_buf(b)		(b)+20		; JSAMPARRAY output_buf
  
  	align	16
@@ -752,7 +671,7 @@
  	push	ebp
 Index: simd/jfmmxint.asm
 ===================================================================
---- simd/jfmmxint.asm	(revision 722)
+--- simd/jfmmxint.asm	(revision 733)
 +++ simd/jfmmxint.asm	(working copy)
 @@ -66,7 +66,7 @@
  	SECTION	SEG_CONST
@@ -774,7 +693,7 @@
  	push	ebp
 Index: simd/jcgryss2-64.asm
 ===================================================================
---- simd/jcgryss2-64.asm	(revision 722)
+--- simd/jcgryss2-64.asm	(revision 733)
 +++ simd/jcgryss2-64.asm	(working copy)
 @@ -37,7 +37,7 @@
  
@@ -787,7 +706,7 @@
  	push	rbp
 Index: simd/jcqnts2i.asm
 ===================================================================
---- simd/jcqnts2i.asm	(revision 722)
+--- simd/jcqnts2i.asm	(revision 733)
 +++ simd/jcqnts2i.asm	(working copy)
 @@ -35,7 +35,7 @@
  %define workspace	ebp+16		; DCTELEM * workspace
@@ -809,7 +728,7 @@
  	push	ebp
 Index: simd/jiss2fst-64.asm
 ===================================================================
---- simd/jiss2fst-64.asm	(revision 722)
+--- simd/jiss2fst-64.asm	(revision 733)
 +++ simd/jiss2fst-64.asm	(working copy)
 @@ -60,7 +60,7 @@
  %define CONST_SHIFT     (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -831,7 +750,7 @@
  	push	rbp
 Index: simd/jiss2flt.asm
 ===================================================================
---- simd/jiss2flt.asm	(revision 722)
+--- simd/jiss2flt.asm	(revision 733)
 +++ simd/jiss2flt.asm	(working copy)
 @@ -37,7 +37,7 @@
  	SECTION	SEG_CONST
@@ -853,7 +772,7 @@
  	push	ebp
 Index: simd/jiss2int.asm
 ===================================================================
---- simd/jiss2int.asm	(revision 722)
+--- simd/jiss2int.asm	(revision 733)
 +++ simd/jiss2int.asm	(working copy)
 @@ -66,7 +66,7 @@
  	SECTION	SEG_CONST
@@ -875,7 +794,7 @@
  	push	ebp
 Index: simd/jfsseflt-64.asm
 ===================================================================
---- simd/jfsseflt-64.asm	(revision 722)
+--- simd/jfsseflt-64.asm	(revision 733)
 +++ simd/jfsseflt-64.asm	(working copy)
 @@ -38,7 +38,7 @@
  	SECTION	SEG_CONST
@@ -897,7 +816,7 @@
  	push	rbp
 Index: simd/jccolss2-64.asm
 ===================================================================
---- simd/jccolss2-64.asm	(revision 722)
+--- simd/jccolss2-64.asm	(revision 733)
 +++ simd/jccolss2-64.asm	(working copy)
 @@ -34,7 +34,7 @@
  	SECTION	SEG_CONST
@@ -910,7 +829,7 @@
  
 Index: simd/jcsamss2-64.asm
 ===================================================================
---- simd/jcsamss2-64.asm	(revision 722)
+--- simd/jcsamss2-64.asm	(revision 733)
 +++ simd/jcsamss2-64.asm	(working copy)
 @@ -41,7 +41,7 @@
  ; r15 = JSAMPARRAY output_data
@@ -932,7 +851,7 @@
  	push	rbp
 Index: simd/jdclrss2-64.asm
 ===================================================================
---- simd/jdclrss2-64.asm	(revision 722)
+--- simd/jdclrss2-64.asm	(revision 733)
 +++ simd/jdclrss2-64.asm	(working copy)
 @@ -39,7 +39,7 @@
  %define WK_NUM		2
@@ -943,90 +862,9 @@
  
  EXTN(jsimd_ycc_rgb_convert_sse2):
  	push	rbp
-@@ -290,6 +290,41 @@
- 	movdqa	xmmA,xmmD
- 	sub	rcx, byte SIZEOF_XMMWORD
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store the lower 8 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	rcx, byte SIZEOF_MMWORD
-+	jb	short .column_st7
-+	movq	MMWORD [rdi], xmmA
-+	add	rdi, byte SIZEOF_MMWORD
-+	sub	rcx, byte SIZEOF_MMWORD
-+	psrldq	xmmA, SIZEOF_MMWORD
-+.column_st7:
-+	; Store the lower 4 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	rcx, byte SIZEOF_DWORD
-+	jb	short .column_st3
-+	movd	DWORD [rdi], xmmA
-+	add	rdi, byte SIZEOF_DWORD
-+	sub	rcx, byte SIZEOF_DWORD
-+	psrldq	xmmA, SIZEOF_DWORD
-+.column_st3:
-+	; Store the lower 2 bytes of rax to the output when it has enough
-+	; space.
-+	movd	rax, xmmA
-+	cmp	rcx, byte SIZEOF_WORD
-+	jb	short .column_st1
-+	mov	WORD [rdi], ax
-+	add	rdi, byte SIZEOF_WORD
-+	sub	rcx, byte SIZEOF_WORD
-+	shr	rax, 16
-+.column_st1:
-+	; Store the lower 1 byte of rax to the output when it has enough
-+	; space.
-+	test	rcx, rcx
-+	jz	short .nextrow
-+	mov	BYTE [rdi], al
-+%else
- 	mov	rax,rcx
- 	xor	rcx, byte 0x0F
- 	shl	rcx, 2
-@@ -329,6 +364,7 @@
- 	por	xmmE,xmmC
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [rdi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %else ; RGB_PIXELSIZE == 4 ; -----------
- 
-@@ -413,6 +449,22 @@
- 	movdqa	xmmA,xmmD
- 	sub	rcx, byte SIZEOF_XMMWORD/4
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store two pixels (8 bytes) of xmmA to the output when it has enough
-+	; space.
-+	cmp	rcx, byte SIZEOF_XMMWORD/8
-+	jb	short .column_st7
-+	movq	MMWORD [rdi], xmmA
-+	add	rdi, byte SIZEOF_XMMWORD/8*4
-+	sub	rcx, byte SIZEOF_XMMWORD/8
-+	psrldq	xmmA, SIZEOF_XMMWORD/8*4
-+.column_st7:
-+	; Store one pixel (4 bytes) of xmmA to the output when it has enough
-+	; space.
-+	test	rcx, rcx
-+	jz	short .nextrow
-+	movd	DWORD [rdi], xmmA
-+%else
- 	cmp	rcx, byte SIZEOF_XMMWORD/16
- 	jb	near .nextrow
- 	mov	rax,rcx
-@@ -452,6 +504,7 @@
- 	por	xmmE,xmmG
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [rdi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %endif ; RGB_PIXELSIZE ; ---------------
- 
 Index: simd/jdcolmmx.asm
 ===================================================================
---- simd/jdcolmmx.asm	(revision 722)
+--- simd/jdcolmmx.asm	(revision 733)
 +++ simd/jdcolmmx.asm	(working copy)
 @@ -35,7 +35,7 @@
  	SECTION	SEG_CONST
@@ -1039,7 +877,7 @@
  
 Index: simd/jcclrmmx.asm
 ===================================================================
---- simd/jcclrmmx.asm	(revision 722)
+--- simd/jcclrmmx.asm	(revision 733)
 +++ simd/jcclrmmx.asm	(working copy)
 @@ -40,7 +40,7 @@
  %define gotptr		wk(0)-SIZEOF_POINTER	; void * gotptr
@@ -1052,7 +890,7 @@
  	push	ebp
 Index: simd/jfsseflt.asm
 ===================================================================
---- simd/jfsseflt.asm	(revision 722)
+--- simd/jfsseflt.asm	(revision 733)
 +++ simd/jfsseflt.asm	(working copy)
 @@ -37,7 +37,7 @@
  	SECTION	SEG_CONST
@@ -1074,7 +912,7 @@
  	push	ebp
 Index: simd/jdmrgss2-64.asm
 ===================================================================
---- simd/jdmrgss2-64.asm	(revision 722)
+--- simd/jdmrgss2-64.asm	(revision 733)
 +++ simd/jdmrgss2-64.asm	(working copy)
 @@ -39,7 +39,7 @@
  %define WK_NUM		3
@@ -1085,88 +923,7 @@
  
  EXTN(jsimd_h2v1_merged_upsample_sse2):
  	push	rbp
-@@ -294,6 +294,41 @@
- 	movdqa	xmmA,xmmD
- 	sub	rcx, byte SIZEOF_XMMWORD
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store the lower 8 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	rcx, byte SIZEOF_MMWORD
-+	jb	short .column_st7
-+	movq	MMWORD [rdi], xmmA
-+	add	rdi, byte SIZEOF_MMWORD
-+	sub	rcx, byte SIZEOF_MMWORD
-+	psrldq	xmmA, SIZEOF_MMWORD
-+.column_st7:
-+	; Store the lower 4 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	rcx, byte SIZEOF_DWORD
-+	jb	short .column_st3
-+	movd	DWORD [rdi], xmmA
-+	add	rdi, byte SIZEOF_DWORD
-+	sub	rcx, byte SIZEOF_DWORD
-+	psrldq	xmmA, SIZEOF_DWORD
-+.column_st3:
-+	; Store the lower 2 bytes of rax to the output when it has enough
-+	; space.
-+	movd	rax, xmmA
-+	cmp	rcx, byte SIZEOF_WORD
-+	jb	short .column_st1
-+	mov	WORD [rdi], ax
-+	add	rdi, byte SIZEOF_WORD
-+	sub	rcx, byte SIZEOF_WORD
-+	shr	rax, 16
-+.column_st1:
-+	; Store the lower 1 byte of rax to the output when it has enough
-+	; space.
-+	test	rcx, rcx
-+	jz	short .endcolumn
-+	mov	BYTE [rdi], al
-+%else
- 	mov	rax,rcx
- 	xor	rcx, byte 0x0F
- 	shl	rcx, 2
-@@ -333,6 +368,7 @@
- 	por	xmmE,xmmC
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [edi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %else ; RGB_PIXELSIZE == 4 ; -----------
- 
-@@ -420,6 +456,22 @@
- 	movdqa	xmmA,xmmD
- 	sub	rcx, byte SIZEOF_XMMWORD/4
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store two pixels (8 bytes) of xmmA to the output when it has enough
-+	; space.
-+	cmp	rcx, byte SIZEOF_XMMWORD/8
-+	jb	short .column_st7
-+	movq	MMWORD [rdi], xmmA
-+	add	rdi, byte SIZEOF_XMMWORD/8*4
-+	sub	rcx, byte SIZEOF_XMMWORD/8
-+	psrldq	xmmA, SIZEOF_XMMWORD/8*4
-+.column_st7:
-+	; Store one pixel (4 bytes) of xmmA to the output when it has enough
-+	; space.
-+	test	rcx, rcx
-+	jz	short .endcolumn
-+	movd	DWORD [rdi], xmmA
-+%else
- 	cmp	rcx, byte SIZEOF_XMMWORD/16
- 	jb	near .endcolumn
- 	mov	rax,rcx
-@@ -459,6 +511,7 @@
- 	por	xmmE,xmmG
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [edi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %endif ; RGB_PIXELSIZE ; ---------------
- 
-@@ -490,7 +543,7 @@
+@@ -543,7 +543,7 @@
  ; r13 = JSAMPARRAY output_buf
  
  	align	16
@@ -1177,7 +934,7 @@
  	push	rbp
 Index: simd/jdmermmx.asm
 ===================================================================
---- simd/jdmermmx.asm	(revision 722)
+--- simd/jdmermmx.asm	(revision 733)
 +++ simd/jdmermmx.asm	(working copy)
 @@ -35,7 +35,7 @@
  	SECTION	SEG_CONST
@@ -1190,7 +947,7 @@
  
 Index: simd/jdcolss2.asm
 ===================================================================
---- simd/jdcolss2.asm	(revision 722)
+--- simd/jdcolss2.asm	(revision 733)
 +++ simd/jdcolss2.asm	(working copy)
 @@ -35,7 +35,7 @@
  	SECTION	SEG_CONST
@@ -1203,7 +960,7 @@
  
 Index: simd/jiss2red.asm
 ===================================================================
---- simd/jiss2red.asm	(revision 722)
+--- simd/jiss2red.asm	(revision 733)
 +++ simd/jiss2red.asm	(working copy)
 @@ -72,7 +72,7 @@
  	SECTION	SEG_CONST
@@ -1234,7 +991,7 @@
  	push	ebp
 Index: simd/jcclrss2.asm
 ===================================================================
---- simd/jcclrss2.asm	(revision 722)
+--- simd/jcclrss2.asm	(revision 733)
 +++ simd/jcclrss2.asm	(working copy)
 @@ -38,7 +38,7 @@
  
@@ -1247,7 +1004,7 @@
  	push	ebp
 Index: simd/jdmerss2.asm
 ===================================================================
---- simd/jdmerss2.asm	(revision 722)
+--- simd/jdmerss2.asm	(revision 733)
 +++ simd/jdmerss2.asm	(working copy)
 @@ -35,7 +35,7 @@
  	SECTION	SEG_CONST
@@ -1260,7 +1017,7 @@
  
 Index: simd/jfss2fst-64.asm
 ===================================================================
---- simd/jfss2fst-64.asm	(revision 722)
+--- simd/jfss2fst-64.asm	(revision 733)
 +++ simd/jfss2fst-64.asm	(working copy)
 @@ -53,7 +53,7 @@
  %define CONST_SHIFT     (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -1282,7 +1039,7 @@
  	push	rbp
 Index: simd/jimmxfst.asm
 ===================================================================
---- simd/jimmxfst.asm	(revision 722)
+--- simd/jimmxfst.asm	(revision 733)
 +++ simd/jimmxfst.asm	(working copy)
 @@ -59,7 +59,7 @@
  %define CONST_SHIFT     (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -1304,7 +1061,7 @@
  	push	ebp
 Index: simd/jcqntmmx.asm
 ===================================================================
---- simd/jcqntmmx.asm	(revision 722)
+--- simd/jcqntmmx.asm	(revision 733)
 +++ simd/jcqntmmx.asm	(working copy)
 @@ -35,7 +35,7 @@
  %define workspace	ebp+16		; DCTELEM * workspace
@@ -1326,7 +1083,7 @@
  	push	ebp
 Index: simd/jfss2fst.asm
 ===================================================================
---- simd/jfss2fst.asm	(revision 722)
+--- simd/jfss2fst.asm	(revision 733)
 +++ simd/jfss2fst.asm	(working copy)
 @@ -52,7 +52,7 @@
  %define CONST_SHIFT     (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -1348,7 +1105,7 @@
  	push	ebp
 Index: simd/jcgrammx.asm
 ===================================================================
---- simd/jcgrammx.asm	(revision 722)
+--- simd/jcgrammx.asm	(revision 733)
 +++ simd/jcgrammx.asm	(working copy)
 @@ -33,7 +33,7 @@
  	SECTION	SEG_CONST
@@ -1361,7 +1118,7 @@
  
 Index: simd/jf3dnflt.asm
 ===================================================================
---- simd/jf3dnflt.asm	(revision 722)
+--- simd/jf3dnflt.asm	(revision 733)
 +++ simd/jf3dnflt.asm	(working copy)
 @@ -27,7 +27,7 @@
  	SECTION	SEG_CONST
@@ -1383,7 +1140,7 @@
  	push	ebp
 Index: simd/jdcolss2-64.asm
 ===================================================================
---- simd/jdcolss2-64.asm	(revision 722)
+--- simd/jdcolss2-64.asm	(revision 733)
 +++ simd/jdcolss2-64.asm	(working copy)
 @@ -35,7 +35,7 @@
  	SECTION	SEG_CONST
@@ -1396,7 +1153,7 @@
  
 Index: simd/jdsamss2-64.asm
 ===================================================================
---- simd/jdsamss2-64.asm	(revision 722)
+--- simd/jdsamss2-64.asm	(revision 733)
 +++ simd/jdsamss2-64.asm	(working copy)
 @@ -23,7 +23,7 @@
  	SECTION	SEG_CONST
@@ -1445,7 +1202,7 @@
  	push	rbp
 Index: simd/jcgrass2.asm
 ===================================================================
---- simd/jcgrass2.asm	(revision 722)
+--- simd/jcgrass2.asm	(revision 733)
 +++ simd/jcgrass2.asm	(working copy)
 @@ -30,7 +30,7 @@
  	SECTION	SEG_CONST
@@ -1458,7 +1215,7 @@
  
 Index: simd/jcsammmx.asm
 ===================================================================
---- simd/jcsammmx.asm	(revision 722)
+--- simd/jcsammmx.asm	(revision 733)
 +++ simd/jcsammmx.asm	(working copy)
 @@ -40,7 +40,7 @@
  %define output_data(b)	(b)+28	; JSAMPARRAY output_data
@@ -1480,7 +1237,7 @@
  	push	ebp
 Index: simd/jcqnts2f-64.asm
 ===================================================================
---- simd/jcqnts2f-64.asm	(revision 722)
+--- simd/jcqnts2f-64.asm	(revision 733)
 +++ simd/jcqnts2f-64.asm	(working copy)
 @@ -36,7 +36,7 @@
  ; r12 = FAST_FLOAT * workspace
@@ -1502,7 +1259,7 @@
  	push	rbp
 Index: simd/jcqnt3dn.asm
 ===================================================================
---- simd/jcqnt3dn.asm	(revision 722)
+--- simd/jcqnt3dn.asm	(revision 733)
 +++ simd/jcqnt3dn.asm	(working copy)
 @@ -35,7 +35,7 @@
  %define workspace	ebp+16		; FAST_FLOAT * workspace
@@ -1524,7 +1281,7 @@
  	push	ebp
 Index: simd/jcsamss2.asm
 ===================================================================
---- simd/jcsamss2.asm	(revision 722)
+--- simd/jcsamss2.asm	(revision 733)
 +++ simd/jcsamss2.asm	(working copy)
 @@ -40,7 +40,7 @@
  %define output_data(b)	(b)+28		; JSAMPARRAY output_data
@@ -1546,7 +1303,7 @@
  	push	ebp
 Index: simd/jimmxint.asm
 ===================================================================
---- simd/jimmxint.asm	(revision 722)
+--- simd/jimmxint.asm	(revision 733)
 +++ simd/jimmxint.asm	(working copy)
 @@ -66,7 +66,7 @@
  	SECTION	SEG_CONST
@@ -1568,7 +1325,7 @@
  	push	ebp
 Index: simd/jcgrymmx.asm
 ===================================================================
---- simd/jcgrymmx.asm	(revision 722)
+--- simd/jcgrymmx.asm	(revision 733)
 +++ simd/jcgrymmx.asm	(working copy)
 @@ -41,7 +41,7 @@
  %define gotptr		wk(0)-SIZEOF_POINTER	; void * gotptr
@@ -1581,7 +1338,7 @@
  	push	ebp
 Index: simd/jfss2int.asm
 ===================================================================
---- simd/jfss2int.asm	(revision 722)
+--- simd/jfss2int.asm	(revision 733)
 +++ simd/jfss2int.asm	(working copy)
 @@ -66,7 +66,7 @@
  	SECTION	SEG_CONST
@@ -1603,7 +1360,7 @@
  	push	ebp
 Index: simd/jcgryss2.asm
 ===================================================================
---- simd/jcgryss2.asm	(revision 722)
+--- simd/jcgryss2.asm	(revision 733)
 +++ simd/jcgryss2.asm	(working copy)
 @@ -39,7 +39,7 @@
  
@@ -1616,7 +1373,7 @@
  	push	ebp
 Index: simd/jccolmmx.asm
 ===================================================================
---- simd/jccolmmx.asm	(revision 722)
+--- simd/jccolmmx.asm	(revision 733)
 +++ simd/jccolmmx.asm	(working copy)
 @@ -37,7 +37,7 @@
  	SECTION	SEG_CONST
@@ -1629,7 +1386,7 @@
  
 Index: simd/jimmxred.asm
 ===================================================================
---- simd/jimmxred.asm	(revision 722)
+--- simd/jimmxred.asm	(revision 733)
 +++ simd/jimmxred.asm	(working copy)
 @@ -72,7 +72,7 @@
  	SECTION	SEG_CONST
@@ -1660,7 +1417,7 @@
  	push	ebp
 Index: simd/jsimdext.inc
 ===================================================================
---- simd/jsimdext.inc	(revision 722)
+--- simd/jsimdext.inc	(revision 733)
 +++ simd/jsimdext.inc	(working copy)
 @@ -73,6 +73,9 @@
  ; * *BSD family Unix using elf format
@@ -1672,7 +1429,7 @@
  ; mark stack as non-executable
  section .note.GNU-stack noalloc noexec nowrite progbits
  
-@@ -373,4 +376,14 @@
+@@ -375,4 +378,14 @@
  ;
  %include "jsimdcfg.inc"
  
@@ -1689,7 +1446,7 @@
  ; --------------------------------------------------------------------------
 Index: simd/jdclrmmx.asm
 ===================================================================
---- simd/jdclrmmx.asm	(revision 722)
+--- simd/jdclrmmx.asm	(revision 733)
 +++ simd/jdclrmmx.asm	(working copy)
 @@ -40,7 +40,7 @@
  %define gotptr		wk(0)-SIZEOF_POINTER	; void * gotptr
@@ -1702,7 +1459,7 @@
  	push	ebp
 Index: simd/jccolss2.asm
 ===================================================================
---- simd/jccolss2.asm	(revision 722)
+--- simd/jccolss2.asm	(revision 733)
 +++ simd/jccolss2.asm	(working copy)
 @@ -34,7 +34,7 @@
  	SECTION	SEG_CONST
@@ -1715,7 +1472,7 @@
  
 Index: simd/jisseflt.asm
 ===================================================================
---- simd/jisseflt.asm	(revision 722)
+--- simd/jisseflt.asm	(revision 733)
 +++ simd/jisseflt.asm	(working copy)
 @@ -37,7 +37,7 @@
  	SECTION	SEG_CONST
@@ -1737,7 +1494,7 @@
  	push	ebp
 Index: simd/jcqnts2i-64.asm
 ===================================================================
---- simd/jcqnts2i-64.asm	(revision 722)
+--- simd/jcqnts2i-64.asm	(revision 733)
 +++ simd/jcqnts2i-64.asm	(working copy)
 @@ -36,7 +36,7 @@
  ; r12 = DCTELEM * workspace
@@ -1759,7 +1516,7 @@
  	push	rbp
 Index: simd/jdclrss2.asm
 ===================================================================
---- simd/jdclrss2.asm	(revision 722)
+--- simd/jdclrss2.asm	(revision 733)
 +++ simd/jdclrss2.asm	(working copy)
 @@ -40,7 +40,7 @@
  %define gotptr		wk(0)-SIZEOF_POINTER	; void * gotptr
@@ -1770,90 +1527,9 @@
  
  EXTN(jsimd_ycc_rgb_convert_sse2):
  	push	ebp
-@@ -302,6 +302,41 @@
- 	movdqa	xmmA,xmmD
- 	sub	ecx, byte SIZEOF_XMMWORD
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store the lower 8 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	ecx, byte SIZEOF_MMWORD
-+	jb	short .column_st7
-+	movq	MMWORD [edi], xmmA
-+	add	edi, byte SIZEOF_MMWORD
-+	sub	ecx, byte SIZEOF_MMWORD
-+	psrldq	xmmA, SIZEOF_MMWORD
-+.column_st7:
-+	; Store the lower 4 bytes of xmmA to the output when it has enough
-+	; space.
-+	cmp	ecx, byte SIZEOF_DWORD
-+	jb	short .column_st3
-+	movd	DWORD [edi], xmmA
-+	add	edi, byte SIZEOF_DWORD
-+	sub	ecx, byte SIZEOF_DWORD
-+	psrldq	xmmA, SIZEOF_DWORD
-+.column_st3:
-+	; Store the lower 2 bytes of eax to the output when it has enough
-+	; space.
-+	movd	eax, xmmA
-+	cmp	ecx, byte SIZEOF_WORD
-+	jb	short .column_st1
-+	mov	WORD [edi], ax
-+	add	edi, byte SIZEOF_WORD
-+	sub	ecx, byte SIZEOF_WORD
-+	shr	eax, 16
-+.column_st1:
-+	; Store the lower 1 byte of eax to the output when it has enough
-+	; space.
-+	test	ecx, ecx
-+	jz	short .nextrow
-+	mov	BYTE [edi], al
-+%else
- 	mov	eax,ecx
- 	xor	ecx, byte 0x0F
- 	shl	ecx, 2
-@@ -341,6 +376,7 @@
- 	por	xmmE,xmmC
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [edi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %else ; RGB_PIXELSIZE == 4 ; -----------
- 
-@@ -426,6 +462,22 @@
- 	movdqa	xmmA,xmmD
- 	sub	ecx, byte SIZEOF_XMMWORD/4
- .column_st15:
-+%ifdef STRICT_MEMORY_ACCESS
-+	; Store two pixels (8 bytes) of xmmA to the output when it has enough
-+	; space.
-+	cmp	ecx, byte SIZEOF_XMMWORD/8
-+	jb	short .column_st7
-+	movq	MMWORD [edi], xmmA
-+	add	edi, byte SIZEOF_XMMWORD/8*4
-+	sub	ecx, byte SIZEOF_XMMWORD/8
-+	psrldq	xmmA, SIZEOF_XMMWORD/8*4
-+.column_st7:
-+	; Store one pixel (4 bytes) of xmmA to the output when it has enough
-+	; space.
-+	test	ecx, ecx
-+	jz	short .nextrow
-+	movd	DWORD [edi], xmmA
-+%else
- 	cmp	ecx, byte SIZEOF_XMMWORD/16
- 	jb	short .nextrow
- 	mov	eax,ecx
-@@ -465,6 +517,7 @@
- 	por	xmmE,xmmG
- .adj0:	; ----------------
- 	maskmovdqu xmmA,xmmE			; movntdqu XMMWORD [edi], xmmA
-+%endif ; STRICT_MEMORY_ACCESS ; ---------------
- 
- %endif ; RGB_PIXELSIZE ; ---------------
- 
 Index: simd/jcqntsse.asm
 ===================================================================
---- simd/jcqntsse.asm	(revision 722)
+--- simd/jcqntsse.asm	(revision 733)
 +++ simd/jcqntsse.asm	(working copy)
 @@ -35,7 +35,7 @@
  %define workspace	ebp+16		; FAST_FLOAT * workspace
@@ -1875,7 +1551,7 @@
  	push	ebp
 Index: simd/jiss2int-64.asm
 ===================================================================
---- simd/jiss2int-64.asm	(revision 722)
+--- simd/jiss2int-64.asm	(revision 733)
 +++ simd/jiss2int-64.asm	(working copy)
 @@ -67,7 +67,7 @@
  	SECTION	SEG_CONST
@@ -1897,7 +1573,7 @@
  	push	rbp
 Index: simd/jfmmxfst.asm
 ===================================================================
---- simd/jfmmxfst.asm	(revision 722)
+--- simd/jfmmxfst.asm	(revision 733)
 +++ simd/jfmmxfst.asm	(working copy)
 @@ -52,7 +52,7 @@
  %define CONST_SHIFT     (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -1919,7 +1595,7 @@
  	push	ebp
 Index: jdarith.c
 ===================================================================
---- jdarith.c	(revision 722)
+--- jdarith.c	(revision 733)
 +++ jdarith.c	(working copy)
 @@ -150,8 +150,8 @@
     */
@@ -1934,7 +1610,7 @@
    temp = e->a - qe;
 Index: jdhuff.c
 ===================================================================
---- jdhuff.c	(revision 722)
+--- jdhuff.c	(revision 733)
 +++ jdhuff.c	(working copy)
 @@ -742,7 +742,7 @@
   * this module, since we'll just re-assign them on the next call.)
diff --git a/jccolor.c b/jccolor.c
index 12804f3..0d8910a 100644
--- a/jccolor.c
+++ b/jccolor.c
@@ -225,6 +225,7 @@
                                   num_rows);
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       extrgbx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
                                    num_rows);
       break;
@@ -233,14 +234,17 @@
                                   num_rows);
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       extbgrx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
                                    num_rows);
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       extxbgr_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
                                    num_rows);
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       extxrgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
                                    num_rows);
       break;
@@ -270,6 +274,7 @@
                                    num_rows);
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       extrgbx_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
                                     num_rows);
       break;
@@ -278,14 +283,17 @@
                                    num_rows);
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       extbgrx_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
                                     num_rows);
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       extxbgr_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
                                     num_rows);
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       extxrgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
                                     num_rows);
       break;
@@ -458,6 +466,10 @@
   case JCS_EXT_BGRX:
   case JCS_EXT_XBGR:
   case JCS_EXT_XRGB:
+  case JCS_EXT_RGBA:
+  case JCS_EXT_BGRA:
+  case JCS_EXT_ABGR:
+  case JCS_EXT_ARGB:
     if (cinfo->input_components != rgb_pixelsize[cinfo->in_color_space])
       ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE);
     break;
@@ -492,7 +504,11 @@
              cinfo->in_color_space == JCS_EXT_BGR ||
              cinfo->in_color_space == JCS_EXT_BGRX ||
              cinfo->in_color_space == JCS_EXT_XBGR ||
-             cinfo->in_color_space == JCS_EXT_XRGB) {
+             cinfo->in_color_space == JCS_EXT_XRGB ||
+             cinfo->in_color_space == JCS_EXT_RGBA ||
+             cinfo->in_color_space == JCS_EXT_BGRA ||
+             cinfo->in_color_space == JCS_EXT_ABGR ||
+             cinfo->in_color_space == JCS_EXT_ARGB) {
       if (jsimd_can_rgb_gray())
         cconvert->pub.color_convert = jsimd_rgb_gray_convert;
       else {
@@ -512,6 +528,10 @@
   case JCS_EXT_BGRX:
   case JCS_EXT_XBGR:
   case JCS_EXT_XRGB:
+  case JCS_EXT_RGBA:
+  case JCS_EXT_BGRA:
+  case JCS_EXT_ABGR:
+  case JCS_EXT_ARGB:
     if (cinfo->num_components != 3)
       ERREXIT(cinfo, JERR_BAD_J_COLORSPACE);
     if (cinfo->in_color_space == cinfo->jpeg_color_space &&
@@ -530,7 +550,11 @@
         cinfo->in_color_space == JCS_EXT_BGR ||
         cinfo->in_color_space == JCS_EXT_BGRX ||
         cinfo->in_color_space == JCS_EXT_XBGR ||
-        cinfo->in_color_space == JCS_EXT_XRGB) {
+        cinfo->in_color_space == JCS_EXT_XRGB ||
+        cinfo->in_color_space == JCS_EXT_RGBA ||
+        cinfo->in_color_space == JCS_EXT_BGRA ||
+        cinfo->in_color_space == JCS_EXT_ABGR ||
+        cinfo->in_color_space == JCS_EXT_ARGB) {
       if (jsimd_can_rgb_ycc())
         cconvert->pub.color_convert = jsimd_rgb_ycc_convert;
       else {
diff --git a/jconfig.h b/jconfig.h
index dacc059..794edef 100644
--- a/jconfig.h
+++ b/jconfig.h
@@ -7,10 +7,10 @@
 #endif /* JPEG_LIB_VERSION */
 
 /* Support arithmetic encoding */
-#define C_ARITH_CODING_SUPPORTED 1
+/* #undef C_ARITH_CODING_SUPPORTED */
 
 /* Support arithmetic decoding */
-#define D_ARITH_CODING_SUPPORTED 1
+/* #undef D_ARITH_CODING_SUPPORTED */
 
 /* Define if your compiler supports prototypes */
 #ifndef HAVE_PROTOTYPES
diff --git a/jcparam.c b/jcparam.c
index 27b5a03..557fdc9 100644
--- a/jcparam.c
+++ b/jcparam.c
@@ -3,7 +3,7 @@
  *
  * Copyright (C) 1991-1998, Thomas G. Lane.
  * Modified 2003-2008 by Guido Vollbeding.
- * Copyright (C) 2009-2010, D. R. Commander.
+ * Copyright (C) 2009-2011, D. R. Commander.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -398,6 +398,10 @@
   case JCS_EXT_BGRX:
   case JCS_EXT_XBGR:
   case JCS_EXT_XRGB:
+  case JCS_EXT_RGBA:
+  case JCS_EXT_BGRA:
+  case JCS_EXT_ABGR:
+  case JCS_EXT_ARGB:
     jpeg_set_colorspace(cinfo, JCS_YCbCr);
     break;
   case JCS_YCbCr:
diff --git a/jdcolor.c b/jdcolor.c
index 05d389a..a9a9220 100644
--- a/jdcolor.c
+++ b/jdcolor.c
@@ -224,6 +224,7 @@
                                   num_rows);
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       ycc_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,
                                    num_rows);
       break;
@@ -232,14 +233,17 @@
                                   num_rows);
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       ycc_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,
                                    num_rows);
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       ycc_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
                                    num_rows);
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       ycc_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
                                    num_rows);
       break;
@@ -316,6 +320,7 @@
                                    num_rows);
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       gray_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,
                                     num_rows);
       break;
@@ -324,14 +329,17 @@
                                    num_rows);
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       gray_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,
                                     num_rows);
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       gray_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
                                     num_rows);
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       gray_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
                                     num_rows);
       break;
@@ -471,6 +479,10 @@
   case JCS_EXT_BGRX:
   case JCS_EXT_XBGR:
   case JCS_EXT_XRGB:
+  case JCS_EXT_RGBA:
+  case JCS_EXT_BGRA:
+  case JCS_EXT_ABGR:
+  case JCS_EXT_ARGB:
     cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];
     if (cinfo->jpeg_color_space == JCS_YCbCr) {
       if (jsimd_can_ycc_rgb())
diff --git a/jdmaster.c b/jdmaster.c
index 14520da..c73ec02 100644
--- a/jdmaster.c
+++ b/jdmaster.c
@@ -2,7 +2,7 @@
  * jdmaster.c
  *
  * Copyright (C) 1991-1997, Thomas G. Lane.
- * Copyright (C) 2009-2010, D. R. Commander.
+ * Copyright (C) 2009-2011, D. R. Commander.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -57,7 +57,11 @@
       cinfo->out_color_space != JCS_EXT_BGR &&
       cinfo->out_color_space != JCS_EXT_BGRX &&
       cinfo->out_color_space != JCS_EXT_XBGR &&
-      cinfo->out_color_space != JCS_EXT_XRGB) ||
+      cinfo->out_color_space != JCS_EXT_XRGB &&
+      cinfo->out_color_space != JCS_EXT_RGBA &&
+      cinfo->out_color_space != JCS_EXT_BGRA &&
+      cinfo->out_color_space != JCS_EXT_ABGR &&
+      cinfo->out_color_space != JCS_EXT_ARGB) ||
       cinfo->out_color_components != rgb_pixelsize[cinfo->out_color_space])
     return FALSE;
   /* and it only handles 2h1v or 2h2v sampling ratios */
@@ -209,6 +213,10 @@
   case JCS_EXT_BGRX:
   case JCS_EXT_XBGR:
   case JCS_EXT_XRGB:
+  case JCS_EXT_RGBA:
+  case JCS_EXT_BGRA:
+  case JCS_EXT_ABGR:
+  case JCS_EXT_ARGB:
     cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];
     break;
   case JCS_YCbCr:
diff --git a/jdmerge.c b/jdmerge.c
index c75497a..c813080 100644
--- a/jdmerge.c
+++ b/jdmerge.c
@@ -329,6 +329,7 @@
                                            output_buf);
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       extrgbx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
@@ -337,14 +338,17 @@
                                            output_buf);
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       extbgrx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       extxbgr_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       extxrgb_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
@@ -371,6 +375,7 @@
                                            output_buf);
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       extrgbx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
@@ -379,14 +384,17 @@
                                            output_buf);
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       extbgrx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       extxbgr_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       extxrgb_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
                                             output_buf);
       break;
diff --git a/jmorecfg.h b/jmorecfg.h
index ac4cd37..0b4c8cf 100644
--- a/jmorecfg.h
+++ b/jmorecfg.h
@@ -2,7 +2,7 @@
  * jmorecfg.h
  *
  * Copyright (C) 1991-1997, Thomas G. Lane.
- * Copyright (C) 2009, D. R. Commander.
+ * Copyright (C) 2009, 2011, D. R. Commander.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -319,7 +319,7 @@
 #define RGB_BLUE	2	/* Offset of Blue */
 #define RGB_PIXELSIZE	3	/* JSAMPLEs per RGB scanline element */
 
-#define JPEG_NUMCS 12
+#define JPEG_NUMCS 16
 
 #define EXT_RGB_RED        0
 #define EXT_RGB_GREEN      1
@@ -353,22 +353,26 @@
 
 static const int rgb_red[JPEG_NUMCS] = {
   -1, -1, RGB_RED, -1, -1, -1, EXT_RGB_RED, EXT_RGBX_RED,
-  EXT_BGR_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED
+  EXT_BGR_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED,
+  EXT_RGBX_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED
 };
 
 static const int rgb_green[JPEG_NUMCS] = {
   -1, -1, RGB_GREEN, -1, -1, -1, EXT_RGB_GREEN, EXT_RGBX_GREEN,
-  EXT_BGR_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN
+  EXT_BGR_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN,
+  EXT_RGBX_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN
 };
 
 static const int rgb_blue[JPEG_NUMCS] = {
   -1, -1, RGB_BLUE, -1, -1, -1, EXT_RGB_BLUE, EXT_RGBX_BLUE,
-  EXT_BGR_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE
+  EXT_BGR_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE,
+  EXT_RGBX_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE
 };
 
 static const int rgb_pixelsize[JPEG_NUMCS] = {
   -1, -1, RGB_PIXELSIZE, -1, -1, -1, EXT_RGB_PIXELSIZE, EXT_RGBX_PIXELSIZE,
-  EXT_BGR_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE
+  EXT_BGR_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE,
+  EXT_RGBX_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE
 };
 
 /* Definitions for speed-related optimizations. */
diff --git a/jpeglib.h b/jpeglib.h
index e9cbb0b..3403d3f 100644
--- a/jpeglib.h
+++ b/jpeglib.h
@@ -3,7 +3,7 @@
  *
  * Copyright (C) 1991-1998, Thomas G. Lane.
  * Modified 2002-2009 by Guido Vollbeding.
- * Copyright (C) 2009-2010, D. R. Commander.
+ * Copyright (C) 2009-2011, D. R. Commander.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -215,12 +215,13 @@
 /* Known color spaces. */
 
 #define JCS_EXTENSIONS 1
+#define JCS_ALPHA_EXTENSIONS 1
 
 typedef enum {
 	JCS_UNKNOWN,		/* error/unspecified */
 	JCS_GRAYSCALE,		/* monochrome */
 	JCS_RGB,		/* red/green/blue as specified by the RGB_RED, RGB_GREEN,
-                 RGB_BLUE, and RGB_PIXELSIZE macros */
+				   RGB_BLUE, and RGB_PIXELSIZE macros */
 	JCS_YCbCr,		/* Y/Cb/Cr (also known as YUV) */
 	JCS_CMYK,		/* C/M/Y/K */
 	JCS_YCCK,		/* Y/Cb/Cr/K */
@@ -229,7 +230,18 @@
 	JCS_EXT_BGR,		/* blue/green/red */
 	JCS_EXT_BGRX,		/* blue/green/red/x */
 	JCS_EXT_XBGR,		/* x/blue/green/red */
-	JCS_EXT_XRGB		/* x/red/green/blue */
+	JCS_EXT_XRGB,		/* x/red/green/blue */
+	/* When out_color_space is set to JCS_EXT_RGBX, JCS_EXT_BGRX,
+	   JCS_EXT_XBGR, or JCS_EXT_XRGB during decompression, the X byte is
+	   undefined, and in order to ensure the best performance,
+	   libjpeg-turbo can set that byte to whatever value it wishes.  Use
+	   the following colorspace constants to ensure that the X byte is set
+	   to 0xFF, so that it can be interpreted as an opaque alpha
+	   channel. */
+	JCS_EXT_RGBA,		/* red/green/blue/alpha */
+	JCS_EXT_BGRA,		/* blue/green/red/alpha */
+	JCS_EXT_ABGR,		/* alpha/blue/green/red */
+	JCS_EXT_ARGB		/* alpha/red/green/blue */
 } J_COLOR_SPACE;
 
 /* DCT/IDCT algorithm options. */
diff --git a/libjpeg.gyp b/libjpeg.gyp
index 6e9ab28..5301ab3 100644
--- a/libjpeg.gyp
+++ b/libjpeg.gyp
@@ -35,10 +35,8 @@
             'WITH_SIMD', 'MOTION_JPEG_SUPPORTED',
           ],
           'sources': [
-            'jaricom.c',
             'jcapimin.c',
             'jcapistd.c',
-            'jcarith.c',
             'jccoefct.c',
             'jccolor.c',
             'jcdctmgr.c',
@@ -56,7 +54,6 @@
             'jcsample.c',
             'jdapimin.c',
             'jdapistd.c',
-            'jdarith.c',
             'jdatadst.c',
             'jdatasrc.c',
             'jdcoefct.c',
diff --git a/simd/jdclrss2-64.asm b/simd/jdclrss2-64.asm
index bd7334c..522fc5a 100644
--- a/simd/jdclrss2-64.asm
+++ b/simd/jdclrss2-64.asm
@@ -311,7 +311,7 @@
 .column_st3:
 	; Store the lower 2 bytes of rax to the output when it has enough
 	; space.
-	movd	rax, xmmA
+	movd	eax, xmmA
 	cmp	rcx, byte SIZEOF_WORD
 	jb	short .column_st1
 	mov	WORD [rdi], ax
diff --git a/simd/jdmrgss2-64.asm b/simd/jdmrgss2-64.asm
index f7983f7..3bf4148 100644
--- a/simd/jdmrgss2-64.asm
+++ b/simd/jdmrgss2-64.asm
@@ -315,7 +315,7 @@
 .column_st3:
 	; Store the lower 2 bytes of rax to the output when it has enough
 	; space.
-	movd	rax, xmmA
+	movd	eax, xmmA
 	cmp	rcx, byte SIZEOF_WORD
 	jb	short .column_st1
 	mov	WORD [rdi], ax
diff --git a/simd/jdmrgss2.asm b/simd/jdmrgss2.asm
index 79712c6..f1c3c9d 100644
--- a/simd/jdmrgss2.asm
+++ b/simd/jdmrgss2.asm
@@ -476,7 +476,7 @@
 	cmp	ecx, byte SIZEOF_XMMWORD/8
 	jb	short .column_st7
 	movq	MMWORD [edi], xmmA
-	add	edi, byte SIZEOF_XMMWORD/8*4
+	add	edi, byte SIZEOF_XMMWORD/2
 	sub	ecx, byte SIZEOF_XMMWORD/8
 	psrldq	xmmA, SIZEOF_XMMWORD/8*4
 .column_st7:
diff --git a/simd/jsimd_arm.c b/simd/jsimd_arm.c
index a9d920c..5a095f2 100644
--- a/simd/jsimd_arm.c
+++ b/simd/jsimd_arm.c
@@ -189,18 +189,22 @@
       neonfct=jsimd_extrgb_ycc_convert_neon;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       neonfct=jsimd_extrgbx_ycc_convert_neon;
       break;
     case JCS_EXT_BGR:
       neonfct=jsimd_extbgr_ycc_convert_neon;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       neonfct=jsimd_extbgrx_ycc_convert_neon;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       neonfct=jsimd_extxbgr_ycc_convert_neon;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       neonfct=jsimd_extxrgb_ycc_convert_neon;
       break;
     default:
@@ -233,18 +237,22 @@
       neonfct=jsimd_ycc_extrgb_convert_neon;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       neonfct=jsimd_ycc_extrgbx_convert_neon;
       break;
     case JCS_EXT_BGR:
       neonfct=jsimd_ycc_extbgr_convert_neon;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       neonfct=jsimd_ycc_extbgrx_convert_neon;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       neonfct=jsimd_ycc_extxbgr_convert_neon;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       neonfct=jsimd_ycc_extxrgb_convert_neon;
       break;
   default:
diff --git a/simd/jsimd_i386.c b/simd/jsimd_i386.c
index f77c5ef..120eb02 100644
--- a/simd/jsimd_i386.c
+++ b/simd/jsimd_i386.c
@@ -142,6 +142,7 @@
       mmxfct=jsimd_extrgb_ycc_convert_mmx;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_extrgbx_ycc_convert_sse2;
       mmxfct=jsimd_extrgbx_ycc_convert_mmx;
       break;
@@ -150,14 +151,17 @@
       mmxfct=jsimd_extbgr_ycc_convert_mmx;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_extbgrx_ycc_convert_sse2;
       mmxfct=jsimd_extbgrx_ycc_convert_mmx;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_extxbgr_ycc_convert_sse2;
       mmxfct=jsimd_extxbgr_ycc_convert_mmx;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_extxrgb_ycc_convert_sse2;
       mmxfct=jsimd_extxrgb_ycc_convert_mmx;
       break;
@@ -191,6 +195,7 @@
       mmxfct=jsimd_extrgb_gray_convert_mmx;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_extrgbx_gray_convert_sse2;
       mmxfct=jsimd_extrgbx_gray_convert_mmx;
       break;
@@ -199,14 +204,17 @@
       mmxfct=jsimd_extbgr_gray_convert_mmx;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_extbgrx_gray_convert_sse2;
       mmxfct=jsimd_extbgrx_gray_convert_mmx;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_extxbgr_gray_convert_sse2;
       mmxfct=jsimd_extxbgr_gray_convert_mmx;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_extxrgb_gray_convert_sse2;
       mmxfct=jsimd_extxrgb_gray_convert_mmx;
       break;
@@ -240,6 +248,7 @@
       mmxfct=jsimd_ycc_extrgb_convert_mmx;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_ycc_extrgbx_convert_sse2;
       mmxfct=jsimd_ycc_extrgbx_convert_mmx;
       break;
@@ -248,14 +257,17 @@
       mmxfct=jsimd_ycc_extbgr_convert_mmx;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_ycc_extbgrx_convert_sse2;
       mmxfct=jsimd_ycc_extbgrx_convert_mmx;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_ycc_extxbgr_convert_sse2;
       mmxfct=jsimd_ycc_extxbgr_convert_mmx;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_ycc_extxrgb_convert_sse2;
       mmxfct=jsimd_ycc_extxrgb_convert_mmx;
       break;
@@ -532,6 +544,7 @@
       mmxfct=jsimd_h2v2_extrgb_merged_upsample_mmx;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2;
       mmxfct=jsimd_h2v2_extrgbx_merged_upsample_mmx;
       break;
@@ -540,14 +553,17 @@
       mmxfct=jsimd_h2v2_extbgr_merged_upsample_mmx;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2;
       mmxfct=jsimd_h2v2_extbgrx_merged_upsample_mmx;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2;
       mmxfct=jsimd_h2v2_extxbgr_merged_upsample_mmx;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2;
       mmxfct=jsimd_h2v2_extxrgb_merged_upsample_mmx;
       break;
@@ -582,6 +598,7 @@
       mmxfct=jsimd_h2v1_extrgb_merged_upsample_mmx;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2;
       mmxfct=jsimd_h2v1_extrgbx_merged_upsample_mmx;
       break;
@@ -590,14 +607,17 @@
       mmxfct=jsimd_h2v1_extbgr_merged_upsample_mmx;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2;
       mmxfct=jsimd_h2v1_extbgrx_merged_upsample_mmx;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2;
       mmxfct=jsimd_h2v1_extxbgr_merged_upsample_mmx;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2;
       mmxfct=jsimd_h2v1_extxrgb_merged_upsample_mmx;
       break;
diff --git a/simd/jsimd_x86_64.c b/simd/jsimd_x86_64.c
index 2951268..8d17db3 100644
--- a/simd/jsimd_x86_64.c
+++ b/simd/jsimd_x86_64.c
@@ -93,18 +93,22 @@
       sse2fct=jsimd_extrgb_ycc_convert_sse2;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_extrgbx_ycc_convert_sse2;
       break;
     case JCS_EXT_BGR:
       sse2fct=jsimd_extbgr_ycc_convert_sse2;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_extbgrx_ycc_convert_sse2;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_extxbgr_ycc_convert_sse2;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_extxrgb_ycc_convert_sse2;
       break;
     default:
@@ -128,18 +132,22 @@
       sse2fct=jsimd_extrgb_gray_convert_sse2;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_extrgbx_gray_convert_sse2;
       break;
     case JCS_EXT_BGR:
       sse2fct=jsimd_extbgr_gray_convert_sse2;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_extbgrx_gray_convert_sse2;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_extxbgr_gray_convert_sse2;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_extxrgb_gray_convert_sse2;
       break;
     default:
@@ -163,18 +171,22 @@
       sse2fct=jsimd_ycc_extrgb_convert_sse2;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_ycc_extrgbx_convert_sse2;
       break;
     case JCS_EXT_BGR:
       sse2fct=jsimd_ycc_extbgr_convert_sse2;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_ycc_extbgrx_convert_sse2;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_ycc_extxbgr_convert_sse2;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_ycc_extxrgb_convert_sse2;
       break;
     default:
@@ -373,18 +385,22 @@
       sse2fct=jsimd_h2v2_extrgb_merged_upsample_sse2;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2;
       break;
     case JCS_EXT_BGR:
       sse2fct=jsimd_h2v2_extbgr_merged_upsample_sse2;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2;
       break;
     default:
@@ -409,18 +425,22 @@
       sse2fct=jsimd_h2v1_extrgb_merged_upsample_sse2;
       break;
     case JCS_EXT_RGBX:
+    case JCS_EXT_RGBA:
       sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2;
       break;
     case JCS_EXT_BGR:
       sse2fct=jsimd_h2v1_extbgr_merged_upsample_sse2;
       break;
     case JCS_EXT_BGRX:
+    case JCS_EXT_BGRA:
       sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2;
       break;
     case JCS_EXT_XBGR:
+    case JCS_EXT_ABGR:
       sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2;
       break;
     case JCS_EXT_XRGB:
+    case JCS_EXT_ARGB:
       sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2;
       break;
     default:
diff --git a/simd/jsimdext.inc b/simd/jsimdext.inc
index 5af5332..265fa64 100644
--- a/simd/jsimdext.inc
+++ b/simd/jsimdext.inc
@@ -89,6 +89,8 @@
 %define SEG_CONST   .rodata progbits alloc noexec nowrite align=16
 %endif
 
+%define STRICT_MEMORY_ACCESS 1
+
 ; To make the code position-independent, append -DPIC to the commandline
 ;
 %define GOT_SYMBOL  _GLOBAL_OFFSET_TABLE_	; ELF supports PIC
diff --git a/tjunittest.c b/tjunittest.c
index 3a57347..d14ec52 100644
--- a/tjunittest.c
+++ b/tjunittest.c
@@ -65,10 +65,11 @@
 
 const char *pixFormatStr[TJ_NUMPF]=
 {
-	"RGB", "BGR", "RGBX", "BGRX", "XBGR", "XRGB", "Grayscale"
+	"RGB", "BGR", "RGBX", "BGRX", "XBGR", "XRGB", "Grayscale",
+	"RGBA", "BGRA", "ABGR", "ARGB"
 };
 
-const int alphaOffset[TJ_NUMPF] = {-1, -1, 3, 3, 0, 0, -1};
+const int alphaOffset[TJ_NUMPF] = {-1, -1, -1, -1, -1, -1, -1, 3, 3, 0, 0};
 
 const int _3byteFormats[]={TJPF_RGB, TJPF_BGR};
 const int _4byteFormats[]={TJPF_RGBX, TJPF_BGRX, TJPF_XBGR, TJPF_XRGB};
@@ -76,7 +77,7 @@
 const int _onlyRGB[]={TJPF_RGB};
 
 enum {YUVENCODE=1, YUVDECODE};
-int yuv=0, alloc=0;
+int yuv=0, alloc=0, alpha=0;
 
 int exitStatus=0;
 #define bailout() {exitStatus=-1;  goto bailout;}
@@ -511,6 +512,9 @@
 				flags);
 			decompTest(dhandle, dstBuf, size, w, h, pf, basename, subsamp,
 				flags);
+			if(pf>=TJPF_RGBX && pf<=TJPF_XRGB)
+				decompTest(dhandle, dstBuf, size, w, h, pf+(TJPF_RGBA-TJPF_RGBX),
+					basename, subsamp, flags);
 		}
 	}
 
diff --git a/turbojpeg.c b/turbojpeg.c
index 9fc60ce..e27f0da 100644
--- a/turbojpeg.c
+++ b/turbojpeg.c
@@ -147,12 +147,16 @@
 		case TJPF_BGR:
 			cinfo->in_color_space=JCS_EXT_BGR;  break;
 		case TJPF_RGBX:
+		case TJPF_RGBA:
 			cinfo->in_color_space=JCS_EXT_RGBX;  break;
 		case TJPF_BGRX:
+		case TJPF_BGRA:
 			cinfo->in_color_space=JCS_EXT_BGRX;  break;
 		case TJPF_XRGB:
+		case TJPF_ARGB:
 			cinfo->in_color_space=JCS_EXT_XRGB;  break;
 		case TJPF_XBGR:
+		case TJPF_ABGR:
 			cinfo->in_color_space=JCS_EXT_XBGR;  break;
 		#else
 		case TJPF_RGB:
@@ -213,20 +217,28 @@
 			dinfo->out_color_space=JCS_EXT_XRGB;  break;
 		case TJPF_XBGR:
 			dinfo->out_color_space=JCS_EXT_XBGR;  break;
+		#if JCS_ALPHA_EXTENSIONS==1
+		case TJPF_RGBA:
+			dinfo->out_color_space=JCS_EXT_RGBA;  break;
+		case TJPF_BGRA:
+			dinfo->out_color_space=JCS_EXT_BGRA;  break;
+		case TJPF_ARGB:
+			dinfo->out_color_space=JCS_EXT_ARGB;  break;
+		case TJPF_ABGR:
+			dinfo->out_color_space=JCS_EXT_ABGR;  break;
+		#endif
 		#else
 		case TJPF_RGB:
 			if(RGB_RED==0 && RGB_GREEN==1 && RGB_BLUE==2 && RGB_PIXELSIZE==3)
 			{
 				dinfo->out_color_space=JCS_RGB;  break;
 			}
+		#endif
 		default:
 			_throw("Unsupported pixel format");
-		#endif
 	}
 
-	#if JCS_EXTENSIONS!=1
 	bailout:
-	#endif
 	return retval;
 }
 
diff --git a/turbojpeg.h b/turbojpeg.h
index a065fd1..343788a 100644
--- a/turbojpeg.h
+++ b/turbojpeg.h
@@ -113,7 +113,7 @@
 /**
  * The number of pixel formats
  */
-#define TJ_NUMPF 7
+#define TJ_NUMPF 11
 
 /**
  * Pixel formats
@@ -135,32 +135,60 @@
   /**
    * RGBX pixel format.  The red, green, and blue components in the image are
    * stored in 4-byte pixels in the order R, G, B from lowest to highest byte
-   * address within each pixel.
+   * address within each pixel.  The X component is ignored when compressing
+   * and undefined when decompressing.
    */
   TJPF_RGBX,
   /**
    * BGRX pixel format.  The red, green, and blue components in the image are
    * stored in 4-byte pixels in the order B, G, R from lowest to highest byte
-   * address within each pixel.
+   * address within each pixel.  The X component is ignored when compressing
+   * and undefined when decompressing.
    */
   TJPF_BGRX,
   /**
    * XBGR pixel format.  The red, green, and blue components in the image are
    * stored in 4-byte pixels in the order R, G, B from highest to lowest byte
-   * address within each pixel.
+   * address within each pixel.  The X component is ignored when compressing
+   * and undefined when decompressing.
    */
   TJPF_XBGR,
   /**
    * XRGB pixel format.  The red, green, and blue components in the image are
    * stored in 4-byte pixels in the order B, G, R from highest to lowest byte
-   * address within each pixel.
+   * address within each pixel.  The X component is ignored when compressing
+   * and undefined when decompressing.
    */
   TJPF_XRGB,
   /**
    * Grayscale pixel format.  Each 1-byte pixel represents a luminance
    * (brightness) level from 0 to 255.
    */
-  TJPF_GRAY
+  TJPF_GRAY,
+  /**
+   * RGBA pixel format.  This is the same as @ref TJPF_RGBX, except that when
+   * decompressing, the X component is guaranteed to be 0xFF, which can be
+   * interpreted as an opaque alpha channel.
+   */
+  TJPF_RGBA,
+  /**
+   * BGRA pixel format.  This is the same as @ref TJPF_BGRX, except that when
+   * decompressing, the X component is guaranteed to be 0xFF, which can be
+   * interpreted as an opaque alpha channel.
+   */
+  TJPF_BGRA,
+  /**
+   * ABGR pixel format.  This is the same as @ref TJPF_XBGR, except that when
+   * decompressing, the X component is guaranteed to be 0xFF, which can be
+   * interpreted as an opaque alpha channel.
+   */
+  TJPF_ABGR,
+  /**
+   * ARGB pixel format.  This is the same as @ref TJPF_XRGB, except that when
+   * decompressing, the X component is guaranteed to be 0xFF, which can be
+   * interpreted as an opaque alpha channel.
+   */
+  TJPF_ARGB
 };
 
 /**
@@ -169,7 +197,7 @@
  * instance, if a pixel of format TJ_BGRX is stored in <tt>char pixel[]</tt>,
  * then the red component will be <tt>pixel[tjRedOffset[TJ_BGRX]]</tt>.
  */
-static const int tjRedOffset[TJ_NUMPF] = {0, 2, 0, 2, 3, 1, 0};
+static const int tjRedOffset[TJ_NUMPF] = {0, 2, 0, 2, 3, 1, 0, 0, 2, 3, 1};
 /**
  * Green offset (in bytes) for a given pixel format.  This specifies the number
  * of bytes that the green component is offset from the start of the pixel.
@@ -177,19 +205,19 @@
  * <tt>char pixel[]</tt>, then the green component will be
  * <tt>pixel[tjGreenOffset[TJ_BGRX]]</tt>.
  */
-static const int tjGreenOffset[TJ_NUMPF] = {1, 1, 1, 1, 2, 2, 0};
+static const int tjGreenOffset[TJ_NUMPF] = {1, 1, 1, 1, 2, 2, 0, 1, 1, 2, 2};
 /**
  * Blue offset (in bytes) for a given pixel format.  This specifies the number
  * of bytes that the Blue component is offset from the start of the pixel.  For
  * instance, if a pixel of format TJ_BGRX is stored in <tt>char pixel[]</tt>,
  * then the blue component will be <tt>pixel[tjBlueOffset[TJ_BGRX]]</tt>.
  */
-static const int tjBlueOffset[TJ_NUMPF] = {2, 0, 2, 0, 1, 3, 0};
+static const int tjBlueOffset[TJ_NUMPF] = {2, 0, 2, 0, 1, 3, 0, 2, 0, 1, 3};
 
 /**
  * Pixel size (in bytes) for a given pixel format.
  */
-static const int tjPixelSize[TJ_NUMPF] = {3, 3, 4, 4, 4, 4, 1};
+static const int tjPixelSize[TJ_NUMPF] = {3, 3, 4, 4, 4, 4, 1, 4, 4, 4, 4};
 
 
 /**
@@ -489,7 +517,7 @@
  *        size of your pre-allocated buffer.  In any case, unless you have
  *        set #TJFLAG_NOREALLOC, you should always check <tt>*jpegBuf</tt> upon
  *        return from this function, as it may have changed.
- * @param jpegSize pointer to an unsigned long variable which holds the size of
+ * @param jpegSize pointer to an unsigned long variable that holds the size of
  *        the JPEG image buffer.  If <tt>*jpegBuf</tt> points to a
  *        pre-allocated buffer, then <tt>*jpegSize</tt> should be set to the
  *        size of the buffer.  Upon return, <tt>*jpegSize</tt> will contain the
@@ -576,7 +604,7 @@
  * @param height height (in pixels) of the source image
  * @param pixelFormat pixel format of the source image (see @ref TJPF
  *        "Pixel formats".)
- * @param dstBuf pointer to an image buffer which will receive the YUV image.
+ * @param dstBuf pointer to an image buffer that will receive the YUV image.
  *        Use #tjBufSizeYUV() to determine the appropriate size for this buffer
  *        based on the image width, height, and level of chrominance
  *        subsampling.
@@ -608,11 +636,11 @@
  * @param handle a handle to a TurboJPEG decompressor or transformer instance
  * @param jpegBuf pointer to a buffer containing a JPEG image
  * @param jpegSize size of the JPEG image (in bytes)
- * @param width pointer to an integer variable which will receive the width (in
+ * @param width pointer to an integer variable that will receive the width (in
  *        pixels) of the JPEG image
- * @param height pointer to an integer variable which will receive the height
+ * @param height pointer to an integer variable that will receive the height
  *        (in pixels) of the JPEG image
- * @param jpegSubsamp pointer to an integer variable which will receive the
+ * @param jpegSubsamp pointer to an integer variable that will receive the
  *        level of chrominance subsampling used when compressing the JPEG image
  *        (see @ref TJSAMP "Chrominance subsampling options".)
  *
@@ -642,7 +670,7 @@
  * @param handle a handle to a TurboJPEG decompressor or transformer instance
  * @param jpegBuf pointer to a buffer containing the JPEG image to decompress
  * @param jpegSize size of the JPEG image (in bytes)
- * @param dstBuf pointer to an image buffer which will receive the decompressed
+ * @param dstBuf pointer to an image buffer that will receive the decompressed
  *        image.  This buffer should normally be <tt>pitch * scaledHeight</tt>
  *        bytes in size, where <tt>scaledHeight</tt> can be determined by
  *        calling #TJSCALED() with the JPEG image height and one of the scaling
@@ -695,7 +723,7 @@
  * @param handle a handle to a TurboJPEG decompressor or transformer instance
  * @param jpegBuf pointer to a buffer containing the JPEG image to decompress
  * @param jpegSize size of the JPEG image (in bytes)
- * @param dstBuf pointer to an image buffer which will receive the YUV image.
+ * @param dstBuf pointer to an image buffer that will receive the YUV image.
  *        Use #tjBufSizeYUV to determine the appropriate size for this buffer
  *        based on the image width, height, and level of subsampling.
  * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP
@@ -752,7 +780,7 @@
  *        the size of your pre-allocated buffer.  In any case, unless you have
  *        set #TJFLAG_NOREALLOC, you should always check <tt>dstBufs[i]</tt>
  *        upon return from this function, as it may have changed.
- * @param dstSizes pointer to an array of n unsigned long variables which will
+ * @param dstSizes pointer to an array of n unsigned long variables that will
  *        receive the actual sizes (in bytes) of each transformed JPEG image.
  *        If <tt>dstBufs[i]</tt> points to a pre-allocated buffer, then
  *        <tt>dstSizes[i]</tt> should be set to the size of the buffer.  Upon