Merge https://code.google.com/p/zopfli
diff --git a/CONTRIBUTORS b/CONTRIBUTORS
new file mode 100644
index 0000000..a1800be
--- /dev/null
+++ b/CONTRIBUTORS
@@ -0,0 +1,7 @@
+Mark Adler
+Jyrki Alakuijala
+Frédéric Kayser
+Daniel Reed
+Huzaifa Sidhpurwala
+Péter Szabó
+Lode Vandevenne
diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000..2e64530
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright 2011 Google Inc.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..ef159da
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,37 @@
+CC = gcc
+CXX = g++
+
+CFLAGS = -W -Wall -Wextra -ansi -pedantic -lm -O2
+CXXFLAGS = -W -Wall -Wextra -ansi -pedantic -O2
+
+ZOPFLILIB_SRC = src/zopfli/blocksplitter.c src/zopfli/cache.c\
+                src/zopfli/deflate.c src/zopfli/gzip_container.c\
+                src/zopfli/hash.c src/zopfli/katajainen.c\
+                src/zopfli/lz77.c src/zopfli/squeeze.c\
+                src/zopfli/tree.c src/zopfli/util.c\
+                src/zopfli/zlib_container.c src/zopfli/zopfli_lib.c
+ZOPFLILIB_OBJ := $(patsubst src/zopfli/%.c,%.o,$(ZOPFLILIB_SRC))
+ZOPFLIBIN_SRC := src/zopfli/zopfli_bin.c
+LODEPNG_SRC := src/zopflipng/lodepng/lodepng.cpp src/zopflipng/lodepng/lodepng_util.cpp
+ZOPFLIPNGLIB_SRC := src/zopflipng/zopflipng_lib.cc
+ZOPFLIPNGBIN_SRC := src/zopflipng/zopflipng_bin.cc
+
+.PHONY: zopfli zopflipng libzopfli clean
+
+# Zopfli binary
+zopfli:
+	$(CC) $(ZOPFLILIB_SRC) $(ZOPFLIBIN_SRC) $(CFLAGS) -o zopfli
+
+# Zopfli shared library
+libzopfli:
+	$(CC) $(ZOPFLILIB_SRC) $(CFLAGS) -fPIC -c
+	$(CC) $(ZOPFLILIB_OBJ) $(CFLAGS) -shared -Wl,-soname,libzopfli.so.1 -o libzopfli.so.1.0.1
+
+# ZopfliPNG binary
+zopflipng:
+	$(CC) $(ZOPFLILIB_SRC) $(CFLAGS) -c
+	$(CXX) $(ZOPFLILIB_OBJ) $(LODEPNG_SRC) $(ZOPFLIPNGLIB_SRC) $(ZOPFLIPNGBIN_SRC) $(CFLAGS) -o zopflipng
+
+# Remove all libraries and binaries
+clean:
+	rm -f zopflipng zopfli $(ZOPFLILIB_OBJ) libzopfli*
diff --git a/README b/README
new file mode 100644
index 0000000..b28b189
--- /dev/null
+++ b/README
@@ -0,0 +1,32 @@
+Zopfli Compression Algorithm is a compression library programmed in C to perform
+very good, but slow, deflate or zlib compression.
+
+The basic function to compress data is ZopfliCompress in zopfli.h. Use the
+ZopfliOptions object to set parameters that affect the speed and compression.
+Use the ZopfliInitOptions function to place the default values in the
+ZopfliOptions first.
+
+ZopfliCompress supports the deflate, gzip and zlib output formats, selected by a
+parameter. To support only one individual format, you can instead use
+ZopfliDeflate in deflate.h, ZopfliZlibCompress in zlib_container.h or
+ZopfliGzipCompress in gzip_container.h.
+
+ZopfliDeflate creates a valid deflate stream in memory, see:
+http://www.ietf.org/rfc/rfc1951.txt
+ZopfliZlibCompress creates a valid zlib stream in memory, see:
+http://www.ietf.org/rfc/rfc1950.txt
+ZopfliGzipCompress creates a valid gzip stream in memory, see:
+http://www.ietf.org/rfc/rfc1952.txt
+
+This library can only compress, not decompress. Existing zlib or deflate
+libraries can decompress the data.
+
+zopfli_bin.c is separate from the library and contains an example program to
+create very well compressed gzip files. Currently the makefile builds this
+program with the library statically linked in.
+
+To build the binary, use "make". To build the library as a shared Linux library,
+use "make libzopfli". The source code of Zopfli is under src/zopfli.
+
+Zopfli Compression Algorithm was created by Lode Vandevenne and Jyrki
+Alakuijala, based on an algorithm by Jyrki Alakuijala.
diff --git a/README.zopflipng b/README.zopflipng
new file mode 100644
index 0000000..84019ee
--- /dev/null
+++ b/README.zopflipng
@@ -0,0 +1,35 @@
+ZopfliPNG is a command line program to optimize Portable Network Graphics
+(PNG) images. This version has the following features:
+- uses Zopfli compression for the Deflate compression,
+- compares several strategies for choosing scanline filter codes,
+- chooses a suitable color type to losslessly encode the image,
+- removes all chunks that are unimportant for the typical web use (metadata,
+  text, etc...),
+- optionally alters the hidden colors of fully transparent pixels for more
+  compression, and,
+- optionally converts 16-bit color channels to 8-bit.
+
+This is an alpha-release for testing while improvements, particularly to add
+palette selection, are still being made. Feedback and bug reports are welcome.
+
+To build ZopfliPNG, use "make zopflipng", or compile all the sources except
+zopfli_bin.c.
+
+The main compression algorithm in ZopfliPNG is ported from WebP lossless, but
+naturally cannot give as much compression gain for PNGs as it does for a more
+modern compression codec like WebP
+https://developers.google.com/speed/webp/docs/webp_lossless_bitstream_specification.
+
+Compared to libpng -- an often used PNG encoder implementation -- ZopfliPNG uses
+2-3 orders of magnitude more CPU time for compression. Initial testing using a
+corpus of 1000 PNGs with translucency, randomly selected from the internet,
+gives a compression improvement of 12% compared to convert -q 95, but only 0.5%
+compared to pngout (from better of /f0 and /f5 runs).
+
+By releasing this software we hope to make images on the web load faster without
+a new image format, but the opportunities for optimization within PNG are
+limited. When targeting Android, Chrome, Opera, and Yandex browsers, or by using
+suitable plugins for other browsers, it is good to note that WebP lossless
+images are still 26% smaller than images recompressed with ZopfliPNG.
+
+2013-05-07, Lode Vandevenne and Jyrki Alakuijala
diff --git a/src/zopfli/blocksplitter.c b/src/zopfli/blocksplitter.c
new file mode 100644
index 0000000..68f5ff3
--- /dev/null
+++ b/src/zopfli/blocksplitter.c
@@ -0,0 +1,342 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "blocksplitter.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "deflate.h"
+#include "lz77.h"
+#include "squeeze.h"
+#include "tree.h"
+#include "util.h"
+
+/*
+The "f" for the FindMinimum function below.
+i: the current parameter of f(i)
+context: for your implementation
+*/
+typedef double FindMinimumFun(size_t i, void* context);
+
+/*
+Finds minimum of function f(i) where i is of type size_t, f(i) is of type
+double, i is in range start-end (excluding end).
+*/
+static size_t FindMinimum(FindMinimumFun f, void* context,
+                          size_t start, size_t end) {
+  if (end - start < 1024) {
+    double best = ZOPFLI_LARGE_FLOAT;
+    size_t result = start;
+    size_t i;
+    for (i = start; i < end; i++) {
+      double v = f(i, context);
+      if (v < best) {
+        best = v;
+        result = i;
+      }
+    }
+    return result;
+  } else {
+    /* Try to find minimum faster by recursively checking multiple points. */
+#define NUM 9  /* Good value: 9. */
+    size_t i;
+    size_t p[NUM];
+    double vp[NUM];
+    size_t besti;
+    double best;
+    double lastbest = ZOPFLI_LARGE_FLOAT;
+    size_t pos = start;
+
+    for (;;) {
+      if (end - start <= NUM) break;
+
+      for (i = 0; i < NUM; i++) {
+        p[i] = start + (i + 1) * ((end - start) / (NUM + 1));
+        vp[i] = f(p[i], context);
+      }
+      besti = 0;
+      best = vp[0];
+      for (i = 1; i < NUM; i++) {
+        if (vp[i] < best) {
+          best = vp[i];
+          besti = i;
+        }
+      }
+      if (best > lastbest) break;
+
+      start = besti == 0 ? start : p[besti - 1];
+      end = besti == NUM - 1 ? end : p[besti + 1];
+
+      pos = p[besti];
+      lastbest = best;
+    }
+    return pos;
+#undef NUM
+  }
+}
+
+/*
+Returns estimated cost of a block in bits.  It includes the size to encode the
+tree and the size to encode all literal, length and distance symbols and their
+extra bits.
+
+litlens: lz77 lit/lengths
+dists: lz77 distances
+lstart: start of block
+lend: end of block (not inclusive)
+*/
+static double EstimateCost(const unsigned short* litlens,
+                           const unsigned short* dists,
+                           size_t lstart, size_t lend) {
+  return ZopfliCalculateBlockSize(litlens, dists, lstart, lend, 2);
+}
+
+typedef struct SplitCostContext {
+  const unsigned short* litlens;
+  const unsigned short* dists;
+  size_t llsize;
+  size_t start;
+  size_t end;
+} SplitCostContext;
+
+
+/*
+Gets the cost which is the sum of the cost of the left and the right section
+of the data.
+type: FindMinimumFun
+*/
+static double SplitCost(size_t i, void* context) {
+  SplitCostContext* c = (SplitCostContext*)context;
+  return EstimateCost(c->litlens, c->dists, c->start, i) +
+      EstimateCost(c->litlens, c->dists, i, c->end);
+}
+
+static void AddSorted(size_t value, size_t** out, size_t* outsize) {
+  size_t i;
+  ZOPFLI_APPEND_DATA(value, out, outsize);
+  for (i = 0; i + 1 < *outsize; i++) {
+    if ((*out)[i] > value) {
+      size_t j;
+      for (j = *outsize - 1; j > i; j--) {
+        (*out)[j] = (*out)[j - 1];
+      }
+      (*out)[i] = value;
+      break;
+    }
+  }
+}
+
+/*
+Prints the block split points as decimal and hex values in the terminal.
+*/
+static void PrintBlockSplitPoints(const unsigned short* litlens,
+                                  const unsigned short* dists,
+                                  size_t llsize, const size_t* lz77splitpoints,
+                                  size_t nlz77points) {
+  size_t* splitpoints = 0;
+  size_t npoints = 0;
+  size_t i;
+  /* The input is given as lz77 indices, but we want to see the uncompressed
+  index values. */
+  size_t pos = 0;
+  if (nlz77points > 0) {
+    for (i = 0; i < llsize; i++) {
+      size_t length = dists[i] == 0 ? 1 : litlens[i];
+      if (lz77splitpoints[npoints] == i) {
+        ZOPFLI_APPEND_DATA(pos, &splitpoints, &npoints);
+        if (npoints == nlz77points) break;
+      }
+      pos += length;
+    }
+  }
+  assert(npoints == nlz77points);
+
+  fprintf(stderr, "block split points: ");
+  for (i = 0; i < npoints; i++) {
+    fprintf(stderr, "%d ", (int)splitpoints[i]);
+  }
+  fprintf(stderr, "(hex:");
+  for (i = 0; i < npoints; i++) {
+    fprintf(stderr, " %x", (int)splitpoints[i]);
+  }
+  fprintf(stderr, ")\n");
+
+  free(splitpoints);
+}
+
+/*
+Finds next block to try to split, the largest of the available ones.
+The largest is chosen to make sure that if only a limited amount of blocks is
+requested, their sizes are spread evenly.
+llsize: the size of the LZ77 data, which is the size of the done array here.
+done: array indicating which blocks starting at that position are no longer
+    splittable (splitting them increases rather than decreases cost).
+splitpoints: the splitpoints found so far.
+npoints: the amount of splitpoints found so far.
+lstart: output variable, giving start of block.
+lend: output variable, giving end of block.
+returns 1 if a block was found, 0 if no block found (all are done).
+*/
+static int FindLargestSplittableBlock(
+    size_t llsize, const unsigned char* done,
+    const size_t* splitpoints, size_t npoints,
+    size_t* lstart, size_t* lend) {
+  size_t longest = 0;
+  int found = 0;
+  size_t i;
+  for (i = 0; i <= npoints; i++) {
+    size_t start = i == 0 ? 0 : splitpoints[i - 1];
+    size_t end = i == npoints ? llsize - 1 : splitpoints[i];
+    if (!done[start] && end - start > longest) {
+      *lstart = start;
+      *lend = end;
+      found = 1;
+      longest = end - start;
+    }
+  }
+  return found;
+}
+
+void ZopfliBlockSplitLZ77(const ZopfliOptions* options,
+                          const unsigned short* litlens,
+                          const unsigned short* dists,
+                          size_t llsize, size_t maxblocks,
+                          size_t** splitpoints, size_t* npoints) {
+  size_t lstart, lend;
+  size_t i;
+  size_t llpos = 0;
+  size_t numblocks = 1;
+  unsigned char* done;
+  double splitcost, origcost;
+
+  if (llsize < 10) return;  /* This code fails on tiny files. */
+
+  done = (unsigned char*)malloc(llsize);
+  if (!done) exit(-1); /* Allocation failed. */
+  for (i = 0; i < llsize; i++) done[i] = 0;
+
+  lstart = 0;
+  lend = llsize;
+  for (;;) {
+    SplitCostContext c;
+
+    if (maxblocks > 0 && numblocks >= maxblocks) {
+      break;
+    }
+
+    c.litlens = litlens;
+    c.dists = dists;
+    c.llsize = llsize;
+    c.start = lstart;
+    c.end = lend;
+    assert(lstart < lend);
+    llpos = FindMinimum(SplitCost, &c, lstart + 1, lend);
+
+    assert(llpos > lstart);
+    assert(llpos < lend);
+
+    splitcost = EstimateCost(litlens, dists, lstart, llpos) +
+        EstimateCost(litlens, dists, llpos, lend);
+    origcost = EstimateCost(litlens, dists, lstart, lend);
+
+    if (splitcost > origcost || llpos == lstart + 1 || llpos == lend) {
+      done[lstart] = 1;
+    } else {
+      AddSorted(llpos, splitpoints, npoints);
+      numblocks++;
+    }
+
+    if (!FindLargestSplittableBlock(
+        llsize, done, *splitpoints, *npoints, &lstart, &lend)) {
+      break;  /* No further split is likely to reduce the compressed size. */
+    }
+
+    if (lend - lstart < 10) {
+      break;
+    }
+  }
+
+  if (options->verbose) {
+    PrintBlockSplitPoints(litlens, dists, llsize, *splitpoints, *npoints);
+  }
+
+  free(done);
+}
+
+void ZopfliBlockSplit(const ZopfliOptions* options,
+                      const unsigned char* in, size_t instart, size_t inend,
+                      size_t maxblocks, size_t** splitpoints, size_t* npoints) {
+  size_t pos = 0;
+  size_t i;
+  ZopfliBlockState s;
+  size_t* lz77splitpoints = 0;
+  size_t nlz77points = 0;
+  ZopfliLZ77Store store;
+
+  ZopfliInitLZ77Store(&store);
+
+  s.options = options;
+  s.blockstart = instart;
+  s.blockend = inend;
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  s.lmc = 0;
+#endif
+
+  *npoints = 0;
+  *splitpoints = 0;
+
+  /* Unintuitively, using a simple LZ77 method here instead of ZopfliLZ77Optimal
+  results in better blocks. */
+  ZopfliLZ77Greedy(&s, in, instart, inend, &store);
+
+  ZopfliBlockSplitLZ77(options,
+                       store.litlens, store.dists, store.size, maxblocks,
+                       &lz77splitpoints, &nlz77points);
+
+  /* Convert LZ77 positions to positions in the uncompressed input. */
+  pos = instart;
+  if (nlz77points > 0) {
+    for (i = 0; i < store.size; i++) {
+      size_t length = store.dists[i] == 0 ? 1 : store.litlens[i];
+      if (lz77splitpoints[*npoints] == i) {
+        ZOPFLI_APPEND_DATA(pos, splitpoints, npoints);
+        if (*npoints == nlz77points) break;
+      }
+      pos += length;
+    }
+  }
+  assert(*npoints == nlz77points);
+
+  free(lz77splitpoints);
+  ZopfliCleanLZ77Store(&store);
+}
+
+void ZopfliBlockSplitSimple(const unsigned char* in,
+                            size_t instart, size_t inend,
+                            size_t blocksize,
+                            size_t** splitpoints, size_t* npoints) {
+  size_t i = instart;
+  while (i < inend) {
+    ZOPFLI_APPEND_DATA(i, splitpoints, npoints);
+    i += blocksize;
+  }
+  (void)in;
+}
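
Note on the LZ77-to-byte conversion in ZopfliBlockSplit above: the loop walks the LZ77 symbols and advances a byte position by 1 for a literal (dist == 0) or by the match length otherwise. The following is a minimal standalone sketch of that idea, not code from this patch. The function name Lz77SplitsToBytePositions and the caller-provided output array are assumptions made for illustration; the patch itself appends to a dynamic array with ZOPFLI_APPEND_DATA.

```c
#include <assert.h>
#include <stddef.h>

/*
Hypothetical helper, not part of Zopfli. Converts sorted LZ77 split indices
into byte offsets in the uncompressed input. A symbol with dist == 0 is a
literal covering 1 byte; otherwise litlens holds the match length in bytes.
out must have room for nsplits entries. Returns the number of entries written.
*/
static size_t Lz77SplitsToBytePositions(const unsigned short* litlens,
                                        const unsigned short* dists,
                                        size_t llsize, size_t instart,
                                        const size_t* lz77splits,
                                        size_t nsplits, size_t* out) {
  size_t pos = instart;  /* byte offset of LZ77 symbol i */
  size_t n = 0;          /* split points converted so far */
  size_t i;
  assert(nsplits == 0 || (lz77splits != 0 && out != 0));
  for (i = 0; i < llsize && n < nsplits; i++) {
    if (lz77splits[n] == i) out[n++] = pos;
    pos += dists[i] == 0 ? 1 : litlens[i];
  }
  return n;
}
```

If the returned count is smaller than nsplits, some split index was out of range or out of order, which is the condition the patch guards with assert(*npoints == nlz77points).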
diff --git a/src/zopfli/blocksplitter.h b/src/zopfli/blocksplitter.h
new file mode 100644
index 0000000..6791702
--- /dev/null
+++ b/src/zopfli/blocksplitter.h
@@ -0,0 +1,77 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+Functions to choose good boundaries for block splitting. Deflate allows encoding
+the data in multiple blocks, with a separate Huffman tree for each block. The
+Huffman tree itself requires some bytes to encode, so by choosing certain
+blocks, you can either hurt, or enhance compression. These functions choose good
+ones that enhance it.
+*/
+
+#ifndef ZOPFLI_BLOCKSPLITTER_H_
+#define ZOPFLI_BLOCKSPLITTER_H_
+
+#include <stdlib.h>
+
+#include "zopfli.h"
+
+
+/*
+Does blocksplitting on LZ77 data.
+The output splitpoints are indices in the LZ77 data.
+litlens: lz77 lit/lengths
+dists: lz77 distances
+llsize: size of litlens and dists
+maxblocks: set a limit to the amount of blocks. Set to 0 to mean no limit.
+*/
+void ZopfliBlockSplitLZ77(const ZopfliOptions* options,
+                          const unsigned short* litlens,
+                          const unsigned short* dists,
+                          size_t llsize, size_t maxblocks,
+                          size_t** splitpoints, size_t* npoints);
+
+/*
+Does blocksplitting on uncompressed data.
+The output splitpoints are indices in the uncompressed bytes.
+
+options: general program options.
+in: uncompressed input data
+instart: where to start splitting
+inend: where to end splitting (not inclusive)
+maxblocks: maximum amount of blocks to split into, or 0 for no limit
+splitpoints: dynamic array to put the resulting split point coordinates into.
+  The coordinates are indices in the input array.
+npoints: pointer to amount of splitpoints, for the dynamic array. The amount of
+  blocks is the amount of splitpoints + 1.
+*/
+void ZopfliBlockSplit(const ZopfliOptions* options,
+                      const unsigned char* in, size_t instart, size_t inend,
+                      size_t maxblocks, size_t** splitpoints, size_t* npoints);
+
+/*
+Divides the input into equal blocks; does not even take LZ77 lengths into
+account.
+*/
+void ZopfliBlockSplitSimple(const unsigned char* in,
+                            size_t instart, size_t inend,
+                            size_t blocksize,
+                            size_t** splitpoints, size_t* npoints);
+
+#endif  /* ZOPFLI_BLOCKSPLITTER_H_ */
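
Note on ZopfliBlockSplitSimple declared above: per its definition in blocksplitter.c, it records a position every blocksize bytes starting at instart, so instart itself is the first recorded point. A minimal sketch of the same stepping with a fixed-capacity output follows. The name SplitEvery and the cap parameter are hypothetical and only serve the illustration; the real function grows a dynamic array instead.

```c
#include <assert.h>
#include <stddef.h>

/*
Hypothetical fixed-capacity variant of the equal-size split, not Zopfli's API.
Writes the start offset of each block in [instart, inend) to out, stopping
after cap entries. Returns the number of entries written.
*/
static size_t SplitEvery(size_t instart, size_t inend, size_t blocksize,
                         size_t* out, size_t cap) {
  size_t n = 0;
  size_t i;
  assert(blocksize > 0);  /* a zero step would never advance */
  for (i = instart; i < inend && n < cap; i += blocksize) {
    out[n++] = i;
  }
  return n;
}
```

The blocksize > 0 check is an addition in this sketch; the declaration above does not state how a zero blocksize is handled.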
diff --git a/src/zopfli/cache.c b/src/zopfli/cache.c
new file mode 100644
index 0000000..88a49ac
--- /dev/null
+++ b/src/zopfli/cache.c
@@ -0,0 +1,119 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "cache.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+
+void ZopfliInitCache(size_t blocksize, ZopfliLongestMatchCache* lmc) {
+  size_t i;
+  lmc->length = (unsigned short*)malloc(sizeof(unsigned short) * blocksize);
+  lmc->dist = (unsigned short*)malloc(sizeof(unsigned short) * blocksize);
+  /* Rather large amount of memory. */
+  lmc->sublen = (unsigned char*)malloc(ZOPFLI_CACHE_LENGTH * 3 * blocksize);
+
+  /* length > 0 and dist 0 is an invalid combination, used on purpose to
+  indicate that this cache value is not filled in yet. */
+  for (i = 0; i < blocksize; i++) lmc->length[i] = 1;
+  for (i = 0; i < blocksize; i++) lmc->dist[i] = 0;
+  for (i = 0; i < ZOPFLI_CACHE_LENGTH * blocksize * 3; i++) lmc->sublen[i] = 0;
+}
+
+void ZopfliCleanCache(ZopfliLongestMatchCache* lmc) {
+  free(lmc->length);
+  free(lmc->dist);
+  free(lmc->sublen);
+}
+
+void ZopfliSublenToCache(const unsigned short* sublen,
+                         size_t pos, size_t length,
+                         ZopfliLongestMatchCache* lmc) {
+  size_t i;
+  size_t j = 0;
+  unsigned bestlength = 0;
+  unsigned char* cache;
+
+#if ZOPFLI_CACHE_LENGTH == 0
+  return;
+#endif
+
+  cache = &lmc->sublen[ZOPFLI_CACHE_LENGTH * pos * 3];
+  if (length < 3) return;
+  for (i = 3; i <= length; i++) {
+    if (i == length || sublen[i] != sublen[i + 1]) {
+      cache[j * 3] = i - 3;
+      cache[j * 3 + 1] = sublen[i] % 256;
+      cache[j * 3 + 2] = (sublen[i] >> 8) % 256;
+      bestlength = i;
+      j++;
+      if (j >= ZOPFLI_CACHE_LENGTH) break;
+    }
+  }
+  if (j < ZOPFLI_CACHE_LENGTH) {
+    assert(bestlength == length);
+    cache[(ZOPFLI_CACHE_LENGTH - 1) * 3] = bestlength - 3;
+  } else {
+    assert(bestlength <= length);
+  }
+  assert(bestlength == ZopfliMaxCachedSublen(lmc, pos, length));
+}
+
+void ZopfliCacheToSublen(const ZopfliLongestMatchCache* lmc,
+                         size_t pos, size_t length,
+                         unsigned short* sublen) {
+  size_t i, j;
+  unsigned maxlength = ZopfliMaxCachedSublen(lmc, pos, length);
+  unsigned prevlength = 0;
+  unsigned char* cache;
+#if ZOPFLI_CACHE_LENGTH == 0
+  return;
+#endif
+  if (length < 3) return;
+  cache = &lmc->sublen[ZOPFLI_CACHE_LENGTH * pos * 3];
+  for (j = 0; j < ZOPFLI_CACHE_LENGTH; j++) {
+    unsigned length = cache[j * 3] + 3;
+    unsigned dist = cache[j * 3 + 1] + 256 * cache[j * 3 + 2];
+    for (i = prevlength; i <= length; i++) {
+      sublen[i] = dist;
+    }
+    if (length == maxlength) break;
+    prevlength = length + 1;
+  }
+}
+
+/*
+Returns the largest length cached in sublen for pos, or 0 if none is cached.
+*/
+unsigned ZopfliMaxCachedSublen(const ZopfliLongestMatchCache* lmc,
+                               size_t pos, size_t length) {
+  unsigned char* cache;
+#if ZOPFLI_CACHE_LENGTH == 0
+  return 0;
+#endif
+  cache = &lmc->sublen[ZOPFLI_CACHE_LENGTH * pos * 3];
+  (void)length;
+  if (cache[1] == 0 && cache[2] == 0) return 0;  /* No sublen cached. */
+  return cache[(ZOPFLI_CACHE_LENGTH - 1) * 3] + 3;
+}
+
+#endif  /* ZOPFLI_LONGEST_MATCH_CACHE */
diff --git a/src/zopfli/cache.h b/src/zopfli/cache.h
new file mode 100644
index 0000000..5ca0c50
--- /dev/null
+++ b/src/zopfli/cache.h
@@ -0,0 +1,66 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+The cache that speeds up ZopfliFindLongestMatch of lz77.c.
+*/
+
+#ifndef ZOPFLI_CACHE_H_
+#define ZOPFLI_CACHE_H_
+
+#include "util.h"
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+
+/*
+Cache used by ZopfliFindLongestMatch to remember previously found length/dist
+values.
+This is needed because the squeeze runs will ask for these values multiple
+times for the same position.
+Uses large amounts of memory, since it has to remember the distance belonging
+to every possible shorter-than-the-best length (the so-called "sublen" array).
+*/
+typedef struct ZopfliLongestMatchCache {
+  unsigned short* length;
+  unsigned short* dist;
+  unsigned char* sublen;
+} ZopfliLongestMatchCache;
+
+/* Initializes the ZopfliLongestMatchCache. */
+void ZopfliInitCache(size_t blocksize, ZopfliLongestMatchCache* lmc);
+
+/* Frees up the memory of the ZopfliLongestMatchCache. */
+void ZopfliCleanCache(ZopfliLongestMatchCache* lmc);
+
+/* Stores sublen array in the cache. */
+void ZopfliSublenToCache(const unsigned short* sublen,
+                         size_t pos, size_t length,
+                         ZopfliLongestMatchCache* lmc);
+
+/* Extracts sublen array from the cache. */
+void ZopfliCacheToSublen(const ZopfliLongestMatchCache* lmc,
+                         size_t pos, size_t length,
+                         unsigned short* sublen);
+/* Returns the largest length cached in sublen for pos, or 0 if none is. */
+unsigned ZopfliMaxCachedSublen(const ZopfliLongestMatchCache* lmc,
+                               size_t pos, size_t length);
+
+#endif  /* ZOPFLI_LONGEST_MATCH_CACHE */
+
+#endif  /* ZOPFLI_CACHE_H_ */
diff --git a/src/zopfli/deflate.c b/src/zopfli/deflate.c
new file mode 100644
index 0000000..4b0724b
--- /dev/null
+++ b/src/zopfli/deflate.c
@@ -0,0 +1,866 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "deflate.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "blocksplitter.h"
+#include "lz77.h"
+#include "squeeze.h"
+#include "tree.h"
+
+/*
+bp = bitpointer, always in range [0, 7].
+The outsize is the number of bytes necessary to encode the bits.
+Given the value of bp and the number of bytes, the number of bits represented
+is not simply bytesize * 8 + bp because even representing one bit requires a
+whole byte. It is: (bp == 0) ? (bytesize * 8) : ((bytesize - 1) * 8 + bp)
+*/
+static void AddBit(int bit,
+                   unsigned char* bp, unsigned char** out, size_t* outsize) {
+  if (*bp == 0) ZOPFLI_APPEND_DATA(0, out, outsize);
+  (*out)[*outsize - 1] |= bit << *bp;
+  *bp = (*bp + 1) & 7;
+}
+
+static void AddBits(unsigned symbol, unsigned length,
+                    unsigned char* bp, unsigned char** out, size_t* outsize) {
+  /* TODO(lode): make more efficient (add more bits at once). */
+  unsigned i;
+  for (i = 0; i < length; i++) {
+    unsigned bit = (symbol >> i) & 1;
+    if (*bp == 0) ZOPFLI_APPEND_DATA(0, out, outsize);
+    (*out)[*outsize - 1] |= bit << *bp;
+    *bp = (*bp + 1) & 7;
+  }
+}
+
+/*
+Adds bits, like AddBits, but the order is inverted. The deflate specification
+uses both orders in one standard.
+*/
+static void AddHuffmanBits(unsigned symbol, unsigned length,
+                           unsigned char* bp, unsigned char** out,
+                           size_t* outsize) {
+  /* TODO(lode): make more efficient (add more bits at once). */
+  unsigned i;
+  for (i = 0; i < length; i++) {
+    unsigned bit = (symbol >> (length - i - 1)) & 1;
+    if (*bp == 0) ZOPFLI_APPEND_DATA(0, out, outsize);
+    (*out)[*outsize - 1] |= bit << *bp;
+    *bp = (*bp + 1) & 7;
+  }
+}
+
+/*
+Ensures there are at least 2 distance codes to support buggy decoders.
+Zlib 1.2.1 and below have a bug where they fail if there isn't at least 1
+distance code (with length > 0), even though it's valid according to the
+deflate spec to have 0 distance codes. On top of that, some mobile phones
+require at least two distance codes. To support these decoders too (but
+potentially at the cost of a few bytes), add dummy code lengths of 1.
+References to this bug can be found in the changelog of
+Zlib 1.2.2 and here: http://www.jonof.id.au/forum/index.php?topic=515.0.
+
+d_lengths: the 32 lengths of the distance codes.
+*/
+static void PatchDistanceCodesForBuggyDecoders(unsigned* d_lengths) {
+  int num_dist_codes = 0; /* Amount of non-zero distance codes */
+  int i;
+  for (i = 0; i < 30 /* Ignore the two unused codes from the spec */; i++) {
+    if (d_lengths[i]) num_dist_codes++;
+    if (num_dist_codes >= 2) return; /* Two or more codes is fine. */
+  }
+
+  if (num_dist_codes == 0) {
+    d_lengths[0] = d_lengths[1] = 1;
+  } else if (num_dist_codes == 1) {
+    d_lengths[d_lengths[0] ? 1 : 0] = 1;
+  }
+}
+
+/*
+Encodes the Huffman tree and returns how many bits its encoding takes. If out
+is a null pointer, only returns the size and runs faster.
+*/
+static size_t EncodeTree(const unsigned* ll_lengths,
+                         const unsigned* d_lengths,
+                         int use_16, int use_17, int use_18,
+                         unsigned char* bp,
+                         unsigned char** out, size_t* outsize) {
+  unsigned lld_total;  /* Total amount of literal, length, distance codes. */
+  /* Runlength encoded version of lengths of litlen and dist trees. */
+  unsigned* rle = 0;
+  unsigned* rle_bits = 0;  /* Extra bits for rle values 16, 17 and 18. */
+  size_t rle_size = 0;  /* Size of rle array. */
+  size_t rle_bits_size = 0;  /* Should have same value as rle_size. */
+  unsigned hlit = 29;  /* 286 - 257 */
+  unsigned hdist = 29;  /* 32 - 1, but gzip does not like hdist > 29.*/
+  unsigned hclen;
+  unsigned hlit2;
+  size_t i, j;
+  size_t clcounts[19];
+  unsigned clcl[19];  /* Code length code lengths. */
+  unsigned clsymbols[19];
+  /* The order in which code length code lengths are encoded as per deflate. */
+  static const unsigned order[19] = {
+    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
+  };
+  int size_only = !out;
+  size_t result_size = 0;
+
+  for(i = 0; i < 19; i++) clcounts[i] = 0;
+
+  /* Trim zeros. */
+  while (hlit > 0 && ll_lengths[257 + hlit - 1] == 0) hlit--;
+  while (hdist > 0 && d_lengths[1 + hdist - 1] == 0) hdist--;
+  hlit2 = hlit + 257;
+
+  lld_total = hlit2 + hdist + 1;
+
+  for (i = 0; i < lld_total; i++) {
+    /* This is an encoding of a huffman tree, so now the length is a symbol */
+    unsigned char symbol = i < hlit2 ? ll_lengths[i] : d_lengths[i - hlit2];
+    unsigned count = 1;
+    if(use_16 || (symbol == 0 && (use_17 || use_18))) {
+      for (j = i + 1; j < lld_total && symbol ==
+          (j < hlit2 ? ll_lengths[j] : d_lengths[j - hlit2]); j++) {
+        count++;
+      }
+    }
+    i += count - 1;
+
+    /* Repetitions of zeroes */
+    if (symbol == 0 && count >= 3) {
+      if (use_18) {
+        while (count >= 11) {
+          unsigned count2 = count > 138 ? 138 : count;
+          if (!size_only) {
+            ZOPFLI_APPEND_DATA(18, &rle, &rle_size);
+            ZOPFLI_APPEND_DATA(count2 - 11, &rle_bits, &rle_bits_size);
+          }
+          clcounts[18]++;
+          count -= count2;
+        }
+      }
+      if (use_17) {
+        while (count >= 3) {
+          unsigned count2 = count > 10 ? 10 : count;
+          if (!size_only) {
+            ZOPFLI_APPEND_DATA(17, &rle, &rle_size);
+            ZOPFLI_APPEND_DATA(count2 - 3, &rle_bits, &rle_bits_size);
+          }
+          clcounts[17]++;
+          count -= count2;
+        }
+      }
+    }
+
+    /* Repetitions of any symbol */
+    if (use_16 && count >= 4) {
+      count--;  /* Since the first one is hardcoded. */
+      clcounts[symbol]++;
+      if (!size_only) {
+        ZOPFLI_APPEND_DATA(symbol, &rle, &rle_size);
+        ZOPFLI_APPEND_DATA(0, &rle_bits, &rle_bits_size);
+      }
+      while (count >= 3) {
+        unsigned count2 = count > 6 ? 6 : count;
+        if (!size_only) {
+          ZOPFLI_APPEND_DATA(16, &rle, &rle_size);
+          ZOPFLI_APPEND_DATA(count2 - 3, &rle_bits, &rle_bits_size);
+        }
+        clcounts[16]++;
+        count -= count2;
+      }
+    }
+
+    /* No or insufficient repetition */
+    clcounts[symbol] += count;
+    while (count > 0) {
+      if (!size_only) {
+        ZOPFLI_APPEND_DATA(symbol, &rle, &rle_size);
+        ZOPFLI_APPEND_DATA(0, &rle_bits, &rle_bits_size);
+      }
+      count--;
+    }
+  }
+
+  ZopfliCalculateBitLengths(clcounts, 19, 7, clcl);
+  if (!size_only) ZopfliLengthsToSymbols(clcl, 19, 7, clsymbols);
+
+  hclen = 15;
+  /* Trim zeros. */
+  while (hclen > 0 && clcounts[order[hclen + 4 - 1]] == 0) hclen--;
+
+  if (!size_only) {
+    AddBits(hlit, 5, bp, out, outsize);
+    AddBits(hdist, 5, bp, out, outsize);
+    AddBits(hclen, 4, bp, out, outsize);
+
+    for (i = 0; i < hclen + 4; i++) {
+      AddBits(clcl[order[i]], 3, bp, out, outsize);
+    }
+
+    for (i = 0; i < rle_size; i++) {
+      unsigned symbol = clsymbols[rle[i]];
+      AddHuffmanBits(symbol, clcl[rle[i]], bp, out, outsize);
+      /* Extra bits. */
+      if (rle[i] == 16) AddBits(rle_bits[i], 2, bp, out, outsize);
+      else if (rle[i] == 17) AddBits(rle_bits[i], 3, bp, out, outsize);
+      else if (rle[i] == 18) AddBits(rle_bits[i], 7, bp, out, outsize);
+    }
+  }
+
+  result_size += 14;  /* hlit, hdist, hclen bits */
+  result_size += (hclen + 4) * 3;  /* clcl bits */
+  for(i = 0; i < 19; i++) {
+    result_size += clcl[i] * clcounts[i];
+  }
+  /* Extra bits. */
+  result_size += clcounts[16] * 2;
+  result_size += clcounts[17] * 3;
+  result_size += clcounts[18] * 7;
+
+  /* Note: in case of "size_only" these are null pointers so no effect. */
+  free(rle);
+  free(rle_bits);
+
+  return result_size;
+}
+
+static void AddDynamicTree(const unsigned* ll_lengths,
+                           const unsigned* d_lengths,
+                           unsigned char* bp,
+                           unsigned char** out, size_t* outsize) {
+  int i;
+  int best = 0;
+  size_t bestsize = 0;
+
+  for(i = 0; i < 8; i++) {
+    size_t size = EncodeTree(ll_lengths, d_lengths,
+                             i & 1, i & 2, i & 4,
+                             0, 0, 0);
+    if (bestsize == 0 || size < bestsize) {
+      bestsize = size;
+      best = i;
+    }
+  }
+
+  EncodeTree(ll_lengths, d_lengths,
+             best & 1, best & 2, best & 4,
+             bp, out, outsize);
+}
+
+/*
+Gives the exact size of the tree, in bits, as it will be encoded in DEFLATE.
+*/
+static size_t CalculateTreeSize(const unsigned* ll_lengths,
+                                const unsigned* d_lengths) {
+  size_t result = 0;
+  int i;
+
+  for(i = 0; i < 8; i++) {
+    size_t size = EncodeTree(ll_lengths, d_lengths,
+                             i & 1, i & 2, i & 4,
+                             0, 0, 0);
+    if (result == 0 || size < result) result = size;
+  }
+
+  return result;
+}
+
+/*
+Adds all lit/len and dist codes from the lists as huffman symbols. Does not add
+end code 256. expected_data_size is the uncompressed block size, used for
+assert, but you can set it to 0 to not do the assertion.
+*/
+static void AddLZ77Data(const unsigned short* litlens,
+                        const unsigned short* dists,
+                        size_t lstart, size_t lend,
+                        size_t expected_data_size,
+                        const unsigned* ll_symbols, const unsigned* ll_lengths,
+                        const unsigned* d_symbols, const unsigned* d_lengths,
+                        unsigned char* bp,
+                        unsigned char** out, size_t* outsize) {
+  size_t testlength = 0;
+  size_t i;
+
+  for (i = lstart; i < lend; i++) {
+    unsigned dist = dists[i];
+    unsigned litlen = litlens[i];
+    if (dist == 0) {
+      assert(litlen < 256);
+      assert(ll_lengths[litlen] > 0);
+      AddHuffmanBits(ll_symbols[litlen], ll_lengths[litlen], bp, out, outsize);
+      testlength++;
+    } else {
+      unsigned lls = ZopfliGetLengthSymbol(litlen);
+      unsigned ds = ZopfliGetDistSymbol(dist);
+      assert(litlen >= 3 && litlen <= 288);
+      assert(ll_lengths[lls] > 0);
+      assert(d_lengths[ds] > 0);
+      AddHuffmanBits(ll_symbols[lls], ll_lengths[lls], bp, out, outsize);
+      AddBits(ZopfliGetLengthExtraBitsValue(litlen),
+              ZopfliGetLengthExtraBits(litlen),
+              bp, out, outsize);
+      AddHuffmanBits(d_symbols[ds], d_lengths[ds], bp, out, outsize);
+      AddBits(ZopfliGetDistExtraBitsValue(dist),
+              ZopfliGetDistExtraBits(dist),
+              bp, out, outsize);
+      testlength += litlen;
+    }
+  }
+  assert(expected_data_size == 0 || testlength == expected_data_size);
+}
+
+static void GetFixedTree(unsigned* ll_lengths, unsigned* d_lengths) {
+  size_t i;
+  for (i = 0; i < 144; i++) ll_lengths[i] = 8;
+  for (i = 144; i < 256; i++) ll_lengths[i] = 9;
+  for (i = 256; i < 280; i++) ll_lengths[i] = 7;
+  for (i = 280; i < 288; i++) ll_lengths[i] = 8;
+  for (i = 0; i < 32; i++) d_lengths[i] = 5;
+}
+
+/*
+Calculates size of the part after the header and tree of an LZ77 block, in bits.
+*/
+static size_t CalculateBlockSymbolSize(const unsigned* ll_lengths,
+                                       const unsigned* d_lengths,
+                                       const unsigned short* litlens,
+                                       const unsigned short* dists,
+                                       size_t lstart, size_t lend) {
+  size_t result = 0;
+  size_t i;
+  for (i = lstart; i < lend; i++) {
+    if (dists[i] == 0) {
+      result += ll_lengths[litlens[i]];
+    } else {
+      result += ll_lengths[ZopfliGetLengthSymbol(litlens[i])];
+      result += d_lengths[ZopfliGetDistSymbol(dists[i])];
+      result += ZopfliGetLengthExtraBits(litlens[i]);
+      result += ZopfliGetDistExtraBits(dists[i]);
+    }
+  }
+  result += ll_lengths[256]; /*end symbol*/
+  return result;
+}
+
+static size_t AbsDiff(size_t x, size_t y) {
+  if (x > y)
+    return x - y;
+  else
+    return y - x;
+}
+
+/*
+Change the population counts in a way that the subsequent Huffman tree
+compression, especially its rle-part, will be more likely to compress this data
+more efficiently. length contains the size of the histogram.
+*/
+void OptimizeHuffmanForRle(int length, size_t* counts) {
+  int i, k, stride;
+  size_t symbol, sum, limit;
+  int* good_for_rle;
+
+  /* 1) We don't want to touch the trailing zeros. We may break the
+  rules of the format by adding more data in the distance codes. */
+  for (; length >= 0; --length) {
+    if (length == 0) {
+      return;
+    }
+    if (counts[length - 1] != 0) {
+      /* Now counts[0..length - 1] does not have trailing zeros. */
+      break;
+    }
+  }
+  /* 2) Let's mark all population counts that already can be encoded
+  with an rle code.*/
+  good_for_rle = (int*)malloc(length * sizeof(int));
+  for (i = 0; i < length; ++i) good_for_rle[i] = 0;
+
+  /* Let's not spoil any of the existing good rle codes.
+  Mark any seq of 0's that is at least 5 long as a good_for_rle.
+  Mark any seq of non-0's that is at least 7 long as a good_for_rle.*/
+  symbol = counts[0];
+  stride = 0;
+  for (i = 0; i < length + 1; ++i) {
+    if (i == length || counts[i] != symbol) {
+      if ((symbol == 0 && stride >= 5) || (symbol != 0 && stride >= 7)) {
+        for (k = 0; k < stride; ++k) {
+          good_for_rle[i - k - 1] = 1;
+        }
+      }
+      stride = 1;
+      if (i != length) {
+        symbol = counts[i];
+      }
+    } else {
+      ++stride;
+    }
+  }
+
+  /* 3) Let's replace those population counts that lead to more rle codes. */
+  stride = 0;
+  limit = counts[0];
+  sum = 0;
+  for (i = 0; i < length + 1; ++i) {
+    if (i == length || good_for_rle[i]
+        /* Heuristic for selecting the stride ranges to collapse. */
+        || AbsDiff(counts[i], limit) >= 4) {
+      if (stride >= 4 || (stride >= 3 && sum == 0)) {
+        /* The stride must end, collapse what we have, if we have enough (4). */
+        int count = (sum + stride / 2) / stride;
+        if (count < 1) count = 1;
+        if (sum == 0) {
+          /* Don't make an all zeros stride to be upgraded to ones. */
+          count = 0;
+        }
+        for (k = 0; k < stride; ++k) {
+          /* We don't want to change value at counts[i],
+          that is already belonging to the next stride. Thus - 1. */
+          counts[i - k - 1] = count;
+        }
+      }
+      stride = 0;
+      sum = 0;
+      if (i < length - 3) {
+        /* All interesting strides have a count of at least 4,
+        at least when non-zeros. */
+        limit = (counts[i] + counts[i + 1] +
+                 counts[i + 2] + counts[i + 3] + 2) / 4;
+      } else if (i < length) {
+        limit = counts[i];
+      } else {
+        limit = 0;
+      }
+    }
+    ++stride;
+    if (i != length) {
+      sum += counts[i];
+    }
+  }
+
+  free(good_for_rle);
+}
+
+/*
+Calculates the bit lengths for the symbols for dynamic blocks. Chooses bit
+lengths that give the smallest size of tree encoding + encoding of all the
+symbols to have smallest output size. These are not necessarily the ideal
+Huffman bit lengths.
+*/
+static void GetDynamicLengths(const unsigned short* litlens,
+                              const unsigned short* dists,
+                              size_t lstart, size_t lend,
+                              unsigned* ll_lengths, unsigned* d_lengths) {
+  size_t ll_counts[288];
+  size_t d_counts[32];
+
+  ZopfliLZ77Counts(litlens, dists, lstart, lend, ll_counts, d_counts);
+  OptimizeHuffmanForRle(288, ll_counts);
+  OptimizeHuffmanForRle(32, d_counts);
+  ZopfliCalculateBitLengths(ll_counts, 288, 15, ll_lengths);
+  ZopfliCalculateBitLengths(d_counts, 32, 15, d_lengths);
+  PatchDistanceCodesForBuggyDecoders(d_lengths);
+}
+
+double ZopfliCalculateBlockSize(const unsigned short* litlens,
+                                const unsigned short* dists,
+                                size_t lstart, size_t lend, int btype) {
+  unsigned ll_lengths[288];
+  unsigned d_lengths[32];
+
+  double result = 3; /* bfinal and btype bits */
+
+  assert(btype == 1 || btype == 2); /* This is not for uncompressed blocks. */
+
+  if(btype == 1) {
+    GetFixedTree(ll_lengths, d_lengths);
+  } else {
+    GetDynamicLengths(litlens, dists, lstart, lend, ll_lengths, d_lengths);
+    result += CalculateTreeSize(ll_lengths, d_lengths);
+  }
+
+  result += CalculateBlockSymbolSize(
+      ll_lengths, d_lengths, litlens, dists, lstart, lend);
+
+  return result;
+}
+
+/*
+Adds a deflate block with the given LZ77 data to the output.
+options: global program options
+btype: the block type, must be 1 or 2
+final: whether to set the "final" bit on this block, must be the last block
+litlens: literal/length array of the LZ77 data, in the same format as in
+    ZopfliLZ77Store.
+dists: distance array of the LZ77 data, in the same format as in
+    ZopfliLZ77Store.
+lstart: where to start in the LZ77 data
+lend: where to end in the LZ77 data (not inclusive)
+expected_data_size: the uncompressed block size, used for assert, but you can
+  set it to 0 to not do the assertion.
+bp: output bit pointer
+out: dynamic output array to append to
+outsize: dynamic output array size
+*/
+static void AddLZ77Block(const ZopfliOptions* options, int btype, int final,
+                         const unsigned short* litlens,
+                         const unsigned short* dists,
+                         size_t lstart, size_t lend,
+                         size_t expected_data_size,
+                         unsigned char* bp,
+                         unsigned char** out, size_t* outsize) {
+  unsigned ll_lengths[288];
+  unsigned d_lengths[32];
+  unsigned ll_symbols[288];
+  unsigned d_symbols[32];
+  size_t detect_block_size = *outsize;
+  size_t compressed_size;
+  size_t uncompressed_size = 0;
+  size_t i;
+
+  AddBit(final, bp, out, outsize);
+  AddBit(btype & 1, bp, out, outsize);
+  AddBit((btype & 2) >> 1, bp, out, outsize);
+
+  if (btype == 1) {
+    /* Fixed block. */
+    GetFixedTree(ll_lengths, d_lengths);
+  } else {
+    /* Dynamic block. */
+    size_t detect_tree_size;
+    assert(btype == 2);
+
+    GetDynamicLengths(litlens, dists, lstart, lend, ll_lengths, d_lengths);
+
+    detect_tree_size = *outsize;
+    AddDynamicTree(ll_lengths, d_lengths, bp, out, outsize);
+    if (options->verbose) {
+      fprintf(stderr, "treesize: %d\n", (int)(*outsize - detect_tree_size));
+    }
+  }
+
+  ZopfliLengthsToSymbols(ll_lengths, 288, 15, ll_symbols);
+  ZopfliLengthsToSymbols(d_lengths, 32, 15, d_symbols);
+
+  detect_block_size = *outsize;
+  AddLZ77Data(litlens, dists, lstart, lend, expected_data_size,
+              ll_symbols, ll_lengths, d_symbols, d_lengths,
+              bp, out, outsize);
+  /* End symbol. */
+  AddHuffmanBits(ll_symbols[256], ll_lengths[256], bp, out, outsize);
+
+  for (i = lstart; i < lend; i++) {
+    uncompressed_size += dists[i] == 0 ? 1 : litlens[i];
+  }
+  compressed_size = *outsize - detect_block_size;
+  if (options->verbose) {
+    fprintf(stderr, "compressed block size: %d (%dk) (unc: %d)\n",
+           (int)compressed_size, (int)(compressed_size / 1024),
+           (int)(uncompressed_size));
+  }
+}
+
+static void DeflateDynamicBlock(const ZopfliOptions* options, int final,
+                                const unsigned char* in,
+                                size_t instart, size_t inend,
+                                unsigned char* bp,
+                                unsigned char** out, size_t* outsize) {
+  ZopfliBlockState s;
+  size_t blocksize = inend - instart;
+  ZopfliLZ77Store store;
+  int btype = 2;
+
+  ZopfliInitLZ77Store(&store);
+
+  s.options = options;
+  s.blockstart = instart;
+  s.blockend = inend;
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  s.lmc = (ZopfliLongestMatchCache*)malloc(sizeof(ZopfliLongestMatchCache));
+  ZopfliInitCache(blocksize, s.lmc);
+#endif
+
+  ZopfliLZ77Optimal(&s, in, instart, inend, &store);
+
+  /* For small blocks, encoding with a fixed tree can be smaller. For large
+  blocks, skip this expensive test; the dynamic tree will be better. */
+  if (store.size < 1000) {
+    double dyncost, fixedcost;
+    ZopfliLZ77Store fixedstore;
+    ZopfliInitLZ77Store(&fixedstore);
+    ZopfliLZ77OptimalFixed(&s, in, instart, inend, &fixedstore);
+    dyncost = ZopfliCalculateBlockSize(store.litlens, store.dists,
+        0, store.size, 2);
+    fixedcost = ZopfliCalculateBlockSize(fixedstore.litlens, fixedstore.dists,
+        0, fixedstore.size, 1);
+    if (fixedcost < dyncost) {
+      btype = 1;
+      ZopfliCleanLZ77Store(&store);
+      store = fixedstore;
+    } else {
+      ZopfliCleanLZ77Store(&fixedstore);
+    }
+  }
+
+  AddLZ77Block(s.options, btype, final,
+               store.litlens, store.dists, 0, store.size,
+               blocksize, bp, out, outsize);
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  ZopfliCleanCache(s.lmc);
+  free(s.lmc);
+#endif
+  ZopfliCleanLZ77Store(&store);
+}
+
+static void DeflateFixedBlock(const ZopfliOptions* options, int final,
+                              const unsigned char* in,
+                              size_t instart, size_t inend,
+                              unsigned char* bp,
+                              unsigned char** out, size_t* outsize) {
+  ZopfliBlockState s;
+  size_t blocksize = inend - instart;
+  ZopfliLZ77Store store;
+
+  ZopfliInitLZ77Store(&store);
+
+  s.options = options;
+  s.blockstart = instart;
+  s.blockend = inend;
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  s.lmc = (ZopfliLongestMatchCache*)malloc(sizeof(ZopfliLongestMatchCache));
+  ZopfliInitCache(blocksize, s.lmc);
+#endif
+
+  ZopfliLZ77OptimalFixed(&s, in, instart, inend, &store);
+
+  AddLZ77Block(s.options, 1, final, store.litlens, store.dists, 0, store.size,
+               blocksize, bp, out, outsize);
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  ZopfliCleanCache(s.lmc);
+  free(s.lmc);
+#endif
+  ZopfliCleanLZ77Store(&store);
+}
+
+static void DeflateNonCompressedBlock(const ZopfliOptions* options, int final,
+                                      const unsigned char* in, size_t instart,
+                                      size_t inend,
+                                      unsigned char* bp,
+                                      unsigned char** out, size_t* outsize) {
+  size_t i;
+  size_t blocksize = inend - instart;
+  unsigned short nlen = ~blocksize;
+
+  (void)options;
+  assert(blocksize < 65536);  /* Non compressed blocks are max this size. */
+
+  AddBit(final, bp, out, outsize);
+  /* BTYPE 00 */
+  AddBit(0, bp, out, outsize);
+  AddBit(0, bp, out, outsize);
+
+  /* Any bits of input up to the next byte boundary are ignored. */
+  *bp = 0;
+
+  ZOPFLI_APPEND_DATA(blocksize % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((blocksize / 256) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA(nlen % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((nlen / 256) % 256, out, outsize);
+
+  for (i = instart; i < inend; i++) {
+    ZOPFLI_APPEND_DATA(in[i], out, outsize);
+  }
+}
+
+static void DeflateBlock(const ZopfliOptions* options,
+                         int btype, int final,
+                         const unsigned char* in, size_t instart, size_t inend,
+                         unsigned char* bp,
+                         unsigned char** out, size_t* outsize) {
+  if (btype == 0) {
+    DeflateNonCompressedBlock(
+        options, final, in, instart, inend, bp, out, outsize);
+  } else if (btype == 1) {
+     DeflateFixedBlock(options, final, in, instart, inend, bp, out, outsize);
+  } else {
+    assert (btype == 2);
+    DeflateDynamicBlock(options, final, in, instart, inend, bp, out, outsize);
+  }
+}
+
+/*
+Does squeeze strategy where first block splitting is done, then each block is
+squeezed.
+Parameters: see description of the ZopfliDeflate function.
+*/
+static void DeflateSplittingFirst(const ZopfliOptions* options,
+                                  int btype, int final,
+                                  const unsigned char* in,
+                                  size_t instart, size_t inend,
+                                  unsigned char* bp,
+                                  unsigned char** out, size_t* outsize) {
+  size_t i;
+  size_t* splitpoints = 0;
+  size_t npoints = 0;
+  if (btype == 0) {
+    ZopfliBlockSplitSimple(in, instart, inend, 65535, &splitpoints, &npoints);
+  } else if (btype == 1) {
+    /* If all blocks are fixed tree, splitting into separate blocks only
+    increases the total size. Leave npoints at 0, this represents 1 block. */
+  } else {
+    ZopfliBlockSplit(options, in, instart, inend,
+                     options->blocksplittingmax, &splitpoints, &npoints);
+  }
+
+  for (i = 0; i <= npoints; i++) {
+    size_t start = i == 0 ? instart : splitpoints[i - 1];
+    size_t end = i == npoints ? inend : splitpoints[i];
+    DeflateBlock(options, btype, i == npoints && final, in, start, end,
+                 bp, out, outsize);
+  }
+
+  free(splitpoints);
+}
+
+/*
+Does squeeze strategy where first the best possible lz77 is done, and then based
+on that data, block splitting is done.
+Parameters: see description of the ZopfliDeflate function.
+*/
+static void DeflateSplittingLast(const ZopfliOptions* options,
+                                 int btype, int final,
+                                 const unsigned char* in,
+                                 size_t instart, size_t inend,
+                                 unsigned char* bp,
+                                 unsigned char** out, size_t* outsize) {
+  size_t i;
+  ZopfliBlockState s;
+  ZopfliLZ77Store store;
+  size_t* splitpoints = 0;
+  size_t npoints = 0;
+
+  if (btype == 0) {
+    /* This function only supports LZ77 compression. DeflateSplittingFirst
+       supports the special case of noncompressed data. Punt it to that one. */
+    DeflateSplittingFirst(options, btype, final, in, instart, inend,
+                          bp, out, outsize);
+    return;
+  }
+  assert(btype == 1 || btype == 2);
+
+  ZopfliInitLZ77Store(&store);
+
+  s.options = options;
+  s.blockstart = instart;
+  s.blockend = inend;
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  s.lmc = (ZopfliLongestMatchCache*)malloc(sizeof(ZopfliLongestMatchCache));
+  ZopfliInitCache(inend - instart, s.lmc);
+#endif
+
+  if (btype == 2) {
+    ZopfliLZ77Optimal(&s, in, instart, inend, &store);
+  } else {
+    assert (btype == 1);
+    ZopfliLZ77OptimalFixed(&s, in, instart, inend, &store);
+  }
+
+  if (btype == 1) {
+    /* If all blocks are fixed tree, splitting into separate blocks only
+    increases the total size. Leave npoints at 0, which represents 1 block. */
+  } else {
+    ZopfliBlockSplitLZ77(options, store.litlens, store.dists, store.size,
+                         options->blocksplittingmax, &splitpoints, &npoints);
+  }
+
+  for (i = 0; i <= npoints; i++) {
+    size_t start = i == 0 ? 0 : splitpoints[i - 1];
+    size_t end = i == npoints ? store.size : splitpoints[i];
+    AddLZ77Block(options, btype, i == npoints && final,
+                 store.litlens, store.dists, start, end, 0,
+                 bp, out, outsize);
+  }
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  ZopfliCleanCache(s.lmc);
+  free(s.lmc);
+#endif
+
+  ZopfliCleanLZ77Store(&store);
+  free(splitpoints);
+}
+
+/*
+Deflate a part, to allow ZopfliDeflate() to use multiple master blocks if
+needed.
+It is possible to call this function multiple times in a row, shifting
+instart and inend to next bytes of the data. If instart is larger than 0, then
+previous bytes are used as the initial dictionary for LZ77.
+This function will usually output multiple deflate blocks. If final is 1, then
+the final bit will be set on the last block.
+*/
+void ZopfliDeflatePart(const ZopfliOptions* options, int btype, int final,
+                       const unsigned char* in, size_t instart, size_t inend,
+                       unsigned char* bp, unsigned char** out,
+                       size_t* outsize) {
+  if (options->blocksplitting) {
+    if (options->blocksplittinglast) {
+      DeflateSplittingLast(options, btype, final, in, instart, inend,
+                           bp, out, outsize);
+    } else {
+      DeflateSplittingFirst(options, btype, final, in, instart, inend,
+                            bp, out, outsize);
+    }
+  } else {
+    DeflateBlock(options, btype, final, in, instart, inend, bp, out, outsize);
+  }
+}
+
+void ZopfliDeflate(const ZopfliOptions* options, int btype, int final,
+                   const unsigned char* in, size_t insize,
+                   unsigned char* bp, unsigned char** out, size_t* outsize) {
+#if ZOPFLI_MASTER_BLOCK_SIZE == 0
+  ZopfliDeflatePart(options, btype, final, in, 0, insize, bp, out, outsize);
+#else
+  size_t i = 0;
+  while (i < insize) {
+    int masterfinal = (i + ZOPFLI_MASTER_BLOCK_SIZE >= insize);
+    int final2 = final && masterfinal;
+    size_t size = masterfinal ? insize - i : ZOPFLI_MASTER_BLOCK_SIZE;
+    ZopfliDeflatePart(options, btype, final2,
+                      in, i, i + size, bp, out, outsize);
+    i += size;
+  }
+#endif
+  if (options->verbose) {
+    fprintf(stderr,
+            "Original Size: %d, Deflate: %d, Compression: %f%% Removed\n",
+            (int)insize, (int)*outsize,
+            100.0 * ((double)insize - (double)*outsize) / (double)insize);
+  }
+}
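Note on the master-block loop in ZopfliDeflate: a minimal sketch of how the
input is cut into chunks, assuming a nonzero block size. Chunk, NextChunk and
CountChunks are hypothetical names used only for illustration and are not part
of zopfli. With the loop as written, an empty input produces no chunk at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, not part of zopfli: bounds of one master block. */
typedef struct {
  size_t start;
  size_t end;   /* Not inclusive. */
  int is_last;  /* 1 if this chunk reaches the end of the input. */
} Chunk;

/* Computes the chunk that starts at offset i, mirroring the arithmetic of
   the while loop in ZopfliDeflate. */
static Chunk NextChunk(size_t i, size_t insize, size_t block_size) {
  Chunk c;
  assert(block_size > 0 && i < insize);
  c.is_last = (i + block_size >= insize);
  c.start = i;
  c.end = c.is_last ? insize : i + block_size;
  return c;
}

/* Counts how many chunks the loop would process for the given sizes. */
static size_t CountChunks(size_t insize, size_t block_size) {
  size_t i = 0;
  size_t n = 0;
  while (i < insize) {
    Chunk c = NextChunk(i, insize, block_size);
    i = c.end;
    n++;
  }
  return n;
}
```

Only the chunk with is_last set may carry the caller's final flag, which is
what the final2 variable expresses in the loop.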
diff --git a/src/zopfli/deflate.h b/src/zopfli/deflate.h
new file mode 100644
index 0000000..189c77a
--- /dev/null
+++ b/src/zopfli/deflate.h
@@ -0,0 +1,86 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#ifndef ZOPFLI_DEFLATE_H_
+#define ZOPFLI_DEFLATE_H_
+
+/*
+Functions to compress according to the DEFLATE specification, using the
+"squeeze" LZ77 compression backend.
+*/
+
+#include "zopfli.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+Compresses according to the deflate specification and appends the compressed
+result to the output.
+This function will usually output multiple deflate blocks. If final is 1, then
+the final bit will be set on the last block.
+
+options: global program options
+btype: the deflate block type. Use 2 for best compression.
+  -0: non compressed blocks (00)
+  -1: blocks with fixed tree (01)
+  -2: blocks with dynamic tree (10)
+final: whether this is the last section of the input; sets the final bit on the
+  last deflate block.
+in: the input bytes
+insize: number of input bytes
+bp: bit pointer for the output array. This must initially be 0, and for
+  consecutive calls must be reused (it can have values from 0-7). This is
+  because deflate appends blocks as bit-based data, rather than on byte
+  boundaries.
+out: pointer to the dynamic output array to which the result is appended. Must
+  be freed after use.
+outsize: pointer to the dynamic output array size.
+*/
+void ZopfliDeflate(const ZopfliOptions* options, int btype, int final,
+                   const unsigned char* in, size_t insize,
+                   unsigned char* bp, unsigned char** out, size_t* outsize);
+
+/*
+Like ZopfliDeflate, but allows specifying the start and end byte with instart
+and inend. Only that part is compressed, but earlier bytes are still used for
+the back window.
+*/
+void ZopfliDeflatePart(const ZopfliOptions* options, int btype, int final,
+                       const unsigned char* in, size_t instart, size_t inend,
+                       unsigned char* bp, unsigned char** out,
+                       size_t* outsize);
+
+/*
+Calculates block size in bits.
+litlens: lz77 lit/lengths
+dists: lz77 distances
+lstart: start of block
+lend: end of block (not inclusive)
+*/
+double ZopfliCalculateBlockSize(const unsigned short* litlens,
+                                const unsigned short* dists,
+                                size_t lstart, size_t lend, int btype);
+
+#ifdef __cplusplus
+}  /* extern "C" */
+#endif
+
+#endif  /* ZOPFLI_DEFLATE_H_ */
diff --git a/src/zopfli/gzip_container.c b/src/zopfli/gzip_container.c
new file mode 100644
index 0000000..8a062f2
--- /dev/null
+++ b/src/zopfli/gzip_container.c
@@ -0,0 +1,117 @@
+/*
+Copyright 2013 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "gzip_container.h"
+#include "util.h"
+
+#include <stdio.h>
+
+#include "deflate.h"
+
+/* Table of CRCs of all 8-bit messages. */
+static unsigned long crc_table[256];
+
+/* Flag: has the table been computed? Initially false. */
+static int crc_table_computed = 0;
+
+/* Makes the table for a fast CRC. */
+static void MakeCRCTable(void) {
+  unsigned long c;
+  int n, k;
+  for (n = 0; n < 256; n++) {
+    c = (unsigned long) n;
+    for (k = 0; k < 8; k++) {
+      if (c & 1) {
+        c = 0xedb88320L ^ (c >> 1);
+      } else {
+        c = c >> 1;
+      }
+    }
+    crc_table[n] = c;
+  }
+  crc_table_computed = 1;
+}
+
+
+/*
+Updates a running crc with the bytes buf[0..len-1] and returns
+the updated crc. The crc should be initialized to zero.
+*/
+static unsigned long UpdateCRC(unsigned long crc,
+                               const unsigned char *buf, size_t len) {
+  unsigned long c = crc ^ 0xffffffffL;
+  size_t n;
+
+  if (!crc_table_computed)
+    MakeCRCTable();
+  for (n = 0; n < len; n++) {
+    c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
+  }
+  return c ^ 0xffffffffL;
+}
+
+/* Returns the CRC of the bytes buf[0..len-1]. */
+static unsigned long CRC(const unsigned char* buf, size_t len) {
+  return UpdateCRC(0L, buf, len);
+}
+
+/*
+Compresses the data according to the gzip specification.
+*/
+void ZopfliGzipCompress(const ZopfliOptions* options,
+                        const unsigned char* in, size_t insize,
+                        unsigned char** out, size_t* outsize) {
+  unsigned long crcvalue = CRC(in, insize);
+  unsigned char bp = 0;
+
+  ZOPFLI_APPEND_DATA(31, out, outsize);  /* ID1 */
+  ZOPFLI_APPEND_DATA(139, out, outsize);  /* ID2 */
+  ZOPFLI_APPEND_DATA(8, out, outsize);  /* CM */
+  ZOPFLI_APPEND_DATA(0, out, outsize);  /* FLG */
+  /* MTIME */
+  ZOPFLI_APPEND_DATA(0, out, outsize);
+  ZOPFLI_APPEND_DATA(0, out, outsize);
+  ZOPFLI_APPEND_DATA(0, out, outsize);
+  ZOPFLI_APPEND_DATA(0, out, outsize);
+
+  ZOPFLI_APPEND_DATA(2, out, outsize);  /* XFL, 2 indicates best compression. */
+  ZOPFLI_APPEND_DATA(3, out, outsize);  /* OS follows Unix conventions. */
+
+  ZopfliDeflate(options, 2 /* Dynamic block */, 1,
+                in, insize, &bp, out, outsize);
+
+  /* CRC */
+  ZOPFLI_APPEND_DATA(crcvalue % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((crcvalue >> 8) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((crcvalue >> 16) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((crcvalue >> 24) % 256, out, outsize);
+
+  /* ISIZE */
+  ZOPFLI_APPEND_DATA(insize % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((insize >> 8) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((insize >> 16) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((insize >> 24) % 256, out, outsize);
+
+  if (options->verbose) {
+    fprintf(stderr,
+            "Original Size: %d, Gzip: %d, Compression: %f%% Removed\n",
+            (int)insize, (int)*outsize,
+            100.0 * ((double)insize - (double)*outsize) / (double)insize);
+  }
+}
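Note on the CRC in gzip_container.c: the table built by MakeCRCTable is a
speed-up for the standard reflected CRC-32 with polynomial 0xedb88320. As a
cross-check, the same checksum can be computed bit by bit without a table.
This is a minimal sketch; Crc32Bitwise is a hypothetical name and is not part
of zopfli.

```c
#include <assert.h>
#include <stddef.h>

/* Bitwise CRC-32 (reflected polynomial 0xedb88320), without a lookup table.
   Hypothetical helper for illustration only. */
static unsigned long Crc32Bitwise(const unsigned char* buf, size_t len) {
  unsigned long c = 0xffffffffUL;  /* Initial value: all ones. */
  size_t n;
  int k;
  assert(buf != 0 || len == 0);
  for (n = 0; n < len; n++) {
    c ^= buf[n];
    for (k = 0; k < 8; k++) {
      /* Shift out one bit; apply the polynomial if that bit was set. */
      c = (c & 1) ? (0xedb88320UL ^ (c >> 1)) : (c >> 1);
    }
  }
  return (c ^ 0xffffffffUL) & 0xffffffffUL;  /* Final inversion. */
}
```

For the nine ASCII bytes "123456789" this yields 0xcbf43926, the usual CRC-32
check value, and a table-driven implementation should agree with it.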
diff --git a/src/zopfli/gzip_container.h b/src/zopfli/gzip_container.h
new file mode 100644
index 0000000..8f5ed90
--- /dev/null
+++ b/src/zopfli/gzip_container.h
@@ -0,0 +1,50 @@
+/*
+Copyright 2013 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#ifndef ZOPFLI_GZIP_H_
+#define ZOPFLI_GZIP_H_
+
+/*
+Functions to compress according to the Gzip specification.
+*/
+
+#include "zopfli.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+Compresses according to the gzip specification and appends the compressed
+result to the output.
+
+options: global program options
+out: pointer to the dynamic output array to which the result is appended. Must
+  be freed after use.
+outsize: pointer to the dynamic output array size.
+*/
+void ZopfliGzipCompress(const ZopfliOptions* options,
+                        const unsigned char* in, size_t insize,
+                        unsigned char** out, size_t* outsize);
+
+#ifdef __cplusplus
+}  /* extern "C" */
+#endif
+
+#endif  /* ZOPFLI_GZIP_H_ */
diff --git a/src/zopfli/hash.c b/src/zopfli/hash.c
new file mode 100644
index 0000000..a3b294f
--- /dev/null
+++ b/src/zopfli/hash.c
@@ -0,0 +1,135 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "hash.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#define HASH_SHIFT 5
+#define HASH_MASK 32767
+
+void ZopfliInitHash(size_t window_size, ZopfliHash* h) {
+  size_t i;
+
+  h->val = 0;
+  h->head = (int*)malloc(sizeof(*h->head) * 65536);
+  h->prev = (unsigned short*)malloc(sizeof(*h->prev) * window_size);
+  h->hashval = (int*)malloc(sizeof(*h->hashval) * window_size);
+  for (i = 0; i < 65536; i++) {
+    h->head[i] = -1;  /* -1 indicates no head so far. */
+  }
+  for (i = 0; i < window_size; i++) {
+    h->prev[i] = i;  /* If prev[j] == j, then prev[j] is uninitialized. */
+    h->hashval[i] = -1;
+  }
+
+#ifdef ZOPFLI_HASH_SAME
+  h->same = (unsigned short*)malloc(sizeof(*h->same) * window_size);
+  for (i = 0; i < window_size; i++) {
+    h->same[i] = 0;
+  }
+#endif
+
+#ifdef ZOPFLI_HASH_SAME_HASH
+  h->val2 = 0;
+  h->head2 = (int*)malloc(sizeof(*h->head2) * 65536);
+  h->prev2 = (unsigned short*)malloc(sizeof(*h->prev2) * window_size);
+  h->hashval2 = (int*)malloc(sizeof(*h->hashval2) * window_size);
+  for (i = 0; i < 65536; i++) {
+    h->head2[i] = -1;
+  }
+  for (i = 0; i < window_size; i++) {
+    h->prev2[i] = i;
+    h->hashval2[i] = -1;
+  }
+#endif
+}
+
+void ZopfliCleanHash(ZopfliHash* h) {
+  free(h->head);
+  free(h->prev);
+  free(h->hashval);
+
+#ifdef ZOPFLI_HASH_SAME_HASH
+  free(h->head2);
+  free(h->prev2);
+  free(h->hashval2);
+#endif
+
+#ifdef ZOPFLI_HASH_SAME
+  free(h->same);
+#endif
+}
+
+/*
+Updates the sliding hash value with the given byte. All calls to this function
+must be made on consecutive input characters. Since the hash value consists of
+multiple input bytes, a few warmups with this function are needed initially.
+*/
+static void UpdateHashValue(ZopfliHash* h, unsigned char c) {
+  h->val = (((h->val) << HASH_SHIFT) ^ (c)) & HASH_MASK;
+}
+
+void ZopfliUpdateHash(const unsigned char* array, size_t pos, size_t end,
+                      ZopfliHash* h) {
+  unsigned short hpos = pos & ZOPFLI_WINDOW_MASK;
+#ifdef ZOPFLI_HASH_SAME
+  size_t amount = 0;
+#endif
+
+  UpdateHashValue(h, pos + ZOPFLI_MIN_MATCH <= end ?
+      array[pos + ZOPFLI_MIN_MATCH - 1] : 0);
+  h->hashval[hpos] = h->val;
+  if (h->head[h->val] != -1 && h->hashval[h->head[h->val]] == h->val) {
+    h->prev[hpos] = h->head[h->val];
+  }
+  else h->prev[hpos] = hpos;
+  h->head[h->val] = hpos;
+
+#ifdef ZOPFLI_HASH_SAME
+  /* Update "same". */
+  if (h->same[(pos - 1) & ZOPFLI_WINDOW_MASK] > 1) {
+    amount = h->same[(pos - 1) & ZOPFLI_WINDOW_MASK] - 1;
+  }
+  while (pos + amount + 1 < end &&
+      array[pos] == array[pos + amount + 1] && amount < (unsigned short)(-1)) {
+    amount++;
+  }
+  h->same[hpos] = amount;
+#endif
+
+#ifdef ZOPFLI_HASH_SAME_HASH
+  h->val2 = ((h->same[hpos] - ZOPFLI_MIN_MATCH) & 255) ^ h->val;
+  h->hashval2[hpos] = h->val2;
+  if (h->head2[h->val2] != -1 && h->hashval2[h->head2[h->val2]] == h->val2) {
+    h->prev2[hpos] = h->head2[h->val2];
+  }
+  else h->prev2[hpos] = hpos;
+  h->head2[h->val2] = hpos;
+#endif
+}
+
+void ZopfliWarmupHash(const unsigned char* array, size_t pos, size_t end,
+                      ZopfliHash* h) {
+  (void)end;
+  UpdateHashValue(h, array[pos + 0]);
+  UpdateHashValue(h, array[pos + 1]);
+}
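Note on the rolling hash in hash.c: with a shift of 5 and a 15-bit mask, any
byte fed in more than three updates ago has been shifted out of the hash
entirely, which is why two warmup bytes plus the current byte are enough. A
minimal sketch, assuming an int of at least 32 bits; the SKETCH_ macros,
RollHash and HashThreeBytes are hypothetical names and are not part of zopfli.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HASH_SHIFT 5
#define SKETCH_HASH_MASK 32767  /* 15 bits. */

/* Same update rule as UpdateHashValue, on a plain int instead of ZopfliHash. */
static int RollHash(int val, unsigned char c) {
  return ((val << SKETCH_HASH_SHIFT) ^ c) & SKETCH_HASH_MASK;
}

/* Hypothetical helper: feeds three bytes into the hash, starting from seed. */
static int HashThreeBytes(int seed, const unsigned char* p) {
  int v = seed & SKETCH_HASH_MASK;  /* Keeps the shift within int range. */
  size_t i;
  assert(p != 0);
  for (i = 0; i < 3; i++) {
    v = RollHash(v, p[i]);
  }
  return v;
}
```

Calling HashThreeBytes with different seeds on the same three bytes gives the
same value, because three shifts of 5 bits push every seed bit past the mask.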
diff --git a/src/zopfli/hash.h b/src/zopfli/hash.h
new file mode 100644
index 0000000..79c2479
--- /dev/null
+++ b/src/zopfli/hash.h
@@ -0,0 +1,70 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+The hash for ZopfliFindLongestMatch of lz77.c.
+*/
+
+#ifndef ZOPFLI_HASH_H_
+#define ZOPFLI_HASH_H_
+
+#include "util.h"
+
+typedef struct ZopfliHash {
+  int* head;  /* Hash value to index of its most recent occurrence. */
+  unsigned short* prev;  /* Index to index of prev. occurrence of same hash. */
+  int* hashval;  /* Index to hash value at this index. */
+  int val;  /* Current hash value. */
+
+#ifdef ZOPFLI_HASH_SAME_HASH
+  /* Fields with similar purpose as the above hash, but for the second hash with
+  a value that is calculated differently.  */
+  int* head2;  /* Hash value to index of its most recent occurrence. */
+  unsigned short* prev2;  /* Index to index of prev. occurrence of same hash. */
+  int* hashval2;  /* Index to hash value at this index. */
+  int val2;  /* Current hash value. */
+#endif
+
+#ifdef ZOPFLI_HASH_SAME
+  unsigned short* same;  /* Amount of repetitions of same byte after this. */
+#endif
+} ZopfliHash;
+
+/* Allocates and initializes all fields of ZopfliHash. */
+void ZopfliInitHash(size_t window_size, ZopfliHash* h);
+
+/* Frees all fields of ZopfliHash. */
+void ZopfliCleanHash(ZopfliHash* h);
+
+/*
+Updates the hash values based on the current position in the array. All calls
+to this must be made for consecutive bytes.
+*/
+void ZopfliUpdateHash(const unsigned char* array, size_t pos, size_t end,
+                      ZopfliHash* h);
+
+/*
+Prepopulates hash:
+Fills in the initial values in the hash, before ZopfliUpdateHash can be used
+correctly.
+*/
+void ZopfliWarmupHash(const unsigned char* array, size_t pos, size_t end,
+                      ZopfliHash* h);
+
+#endif  /* ZOPFLI_HASH_H_ */
diff --git a/src/zopfli/katajainen.c b/src/zopfli/katajainen.c
new file mode 100644
index 0000000..783ea08
--- /dev/null
+++ b/src/zopfli/katajainen.c
@@ -0,0 +1,251 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+Bounded package merge algorithm, based on the paper
+"A Fast and Space-Economical Algorithm for Length-Limited Coding" by
+Jyrki Katajainen, Alistair Moffat, and Andrew Turpin.
+*/
+
+#include "katajainen.h"
+#include <assert.h>
+#include <stdlib.h>
+
+typedef struct Node Node;
+
+/*
+Nodes forming chains. Also used to represent leaves.
+*/
+struct Node {
+  size_t weight;  /* Total weight (symbol count) of this chain. */
+  Node* tail;  /* Previous node(s) of this chain, or 0 if none. */
+  int count;  /* Leaf symbol index, or number of leaves before this chain. */
+  char inuse;  /* Tracking for garbage collection. */
+};
+
+/*
+Memory pool for nodes.
+*/
+typedef struct NodePool {
+  Node* nodes;  /* The pool. */
+  Node* next;  /* Pointer to a possibly free node in the pool. */
+  int size;  /* Size of the memory pool. */
+} NodePool;
+
+/*
+Initializes a chain node with the given values and marks it as in use.
+*/
+static void InitNode(size_t weight, int count, Node* tail, Node* node) {
+  node->weight = weight;
+  node->count = count;
+  node->tail = tail;
+  node->inuse = 1;
+}
+
+/*
+Finds a free location in the memory pool. Performs garbage collection if needed.
+lists: If given, used to mark in-use nodes during garbage collection.
+maxbits: Size of lists.
+pool: Memory pool to get free node from.
+*/
+static Node* GetFreeNode(Node* (*lists)[2], int maxbits, NodePool* pool) {
+  for (;;) {
+    if (pool->next >= &pool->nodes[pool->size]) {
+      /* Garbage collection. */
+      int i;
+      for (i = 0; i < pool->size; i++) {
+        pool->nodes[i].inuse = 0;
+      }
+      if (lists) {
+        for (i = 0; i < maxbits * 2; i++) {
+          Node* node;
+          for (node = lists[i / 2][i % 2]; node; node = node->tail) {
+            node->inuse = 1;
+          }
+        }
+      }
+      pool->next = &pool->nodes[0];
+    }
+    if (!pool->next->inuse) break;  /* Found one. */
+    pool->next++;
+  }
+  return pool->next++;
+}
+
+
+/*
+Performs a Boundary Package-Merge step. Puts a new chain in the given list. The
+new chain is, depending on the weights, a leaf or a combination of two chains
+from the previous list.
+lists: The lists of chains.
+maxbits: Number of lists.
+leaves: The leaves, one per symbol.
+numsymbols: Number of leaves.
+pool: the node memory pool.
+index: The index of the list in which a new chain or leaf is required.
+final: Whether this is the last time this function is called. If so, no
+  further recursive calls are needed.
+*/
+static void BoundaryPM(Node* (*lists)[2], int maxbits,
+    Node* leaves, int numsymbols, NodePool* pool, int index, char final) {
+  Node* newchain;
+  Node* oldchain;
+  int lastcount = lists[index][1]->count;  /* Count of last chain of list. */
+
+  if (index == 0 && lastcount >= numsymbols) return;
+
+  newchain = GetFreeNode(lists, maxbits, pool);
+  oldchain = lists[index][1];
+
+  /* These are set up before the recursive calls below, so that there is a list
+  pointing to the new node, to let the garbage collection know it's in use. */
+  lists[index][0] = oldchain;
+  lists[index][1] = newchain;
+
+  if (index == 0) {
+    /* New leaf node in list 0. */
+    InitNode(leaves[lastcount].weight, lastcount + 1, 0, newchain);
+  } else {
+    size_t sum = lists[index - 1][0]->weight + lists[index - 1][1]->weight;
+    if (lastcount < numsymbols && sum > leaves[lastcount].weight) {
+      /* New leaf inserted in list, so count is incremented. */
+      InitNode(leaves[lastcount].weight, lastcount + 1, oldchain->tail,
+          newchain);
+    } else {
+      InitNode(sum, lastcount, lists[index - 1][1], newchain);
+      if (!final) {
+        /* Two lookahead chains of previous list used up, create new ones. */
+        BoundaryPM(lists, maxbits, leaves, numsymbols, pool, index - 1, 0);
+        BoundaryPM(lists, maxbits, leaves, numsymbols, pool, index - 1, 0);
+      }
+    }
+  }
+}
+
+/*
+Initializes each list with the two lowest-weight leaves as its lookahead
+chains.
+*/
+static void InitLists(
+    NodePool* pool, const Node* leaves, int maxbits, Node* (*lists)[2]) {
+  int i;
+  Node* node0 = GetFreeNode(0, maxbits, pool);
+  Node* node1 = GetFreeNode(0, maxbits, pool);
+  InitNode(leaves[0].weight, 1, 0, node0);
+  InitNode(leaves[1].weight, 2, 0, node1);
+  for (i = 0; i < maxbits; i++) {
+    lists[i][0] = node0;
+    lists[i][1] = node1;
+  }
+}
+
+/*
+Converts result of boundary package-merge to the bitlengths. The result in the
+last chain of the last list contains the amount of active leaves in each list.
+chain: Chain to extract the bit length from (last chain from last list).
+*/
+static void ExtractBitLengths(Node* chain, Node* leaves, unsigned* bitlengths) {
+  Node* node;
+  for (node = chain; node; node = node->tail) {
+    int i;
+    for (i = 0; i < node->count; i++) {
+      bitlengths[leaves[i].count]++;
+    }
+  }
+}
+
+/* Comparator for sorting the leaves; has the signature expected by qsort. */
+static int LeafComparator(const void* a, const void* b) {
+  size_t wa = ((const Node*)a)->weight;
+  size_t wb = ((const Node*)b)->weight;
+  return (wa > wb) - (wa < wb);  /* Avoids truncating a size_t difference. */
+}
+
+int ZopfliLengthLimitedCodeLengths(
+    const size_t* frequencies, int n, int maxbits, unsigned* bitlengths) {
+  NodePool pool;
+  int i;
+  int numsymbols = 0;  /* Number of symbols with frequency > 0. */
+  int numBoundaryPMRuns;
+
+  /* Array of lists of chains. Each list requires only two lookahead chains at
+  a time, so each list is an array of two Node*'s. */
+  Node* (*lists)[2];
+
+  /* One leaf per symbol. Only numsymbols leaves will be used. */
+  Node* leaves = (Node*)malloc(n * sizeof(*leaves));
+
+  /* Initialize all bitlengths at 0. */
+  for (i = 0; i < n; i++) {
+    bitlengths[i] = 0;
+  }
+
+  /* Count used symbols and place them in the leaves. */
+  for (i = 0; i < n; i++) {
+    if (frequencies[i]) {
+      leaves[numsymbols].weight = frequencies[i];
+      leaves[numsymbols].count = i;  /* Index of symbol this leaf represents. */
+      numsymbols++;
+    }
+  }
+
+  /* Check special cases and error conditions. */
+  if ((1 << maxbits) < numsymbols) {
+    free(leaves);
+    return 1;  /* Error, too few maxbits to represent symbols. */
+  }
+  if (numsymbols == 0) {
+    free(leaves);
+    return 0;  /* No symbols at all. OK. */
+  }
+  if (numsymbols == 1) {
+    bitlengths[leaves[0].count] = 1;
+    free(leaves);
+    return 0;  /* Only one symbol, give it bitlength 1, not 0. OK. */
+  }
+
+  /* Sort the leaves from lightest to heaviest. */
+  qsort(leaves, numsymbols, sizeof(Node), LeafComparator);
+
+  /* Initialize node memory pool. */
+  pool.size = 2 * maxbits * (maxbits + 1);
+  pool.nodes = (Node*)malloc(pool.size * sizeof(*pool.nodes));
+  pool.next = pool.nodes;
+  for (i = 0; i < pool.size; i++) {
+    pool.nodes[i].inuse = 0;
+  }
+
+  lists = (Node* (*)[2])malloc(maxbits * sizeof(*lists));
+  InitLists(&pool, leaves, maxbits, lists);
+
+  /* In the last list, 2 * numsymbols - 2 active chains need to be created. Two
+  are already created in the initialization. Each BoundaryPM run creates one. */
+  numBoundaryPMRuns = 2 * numsymbols - 4;
+  for (i = 0; i < numBoundaryPMRuns; i++) {
+    char final = i == numBoundaryPMRuns - 1;
+    BoundaryPM(lists, maxbits, leaves, numsymbols, &pool, maxbits - 1, final);
+  }
+
+  ExtractBitLengths(lists[maxbits - 1][1], leaves, bitlengths);
+
+  free(lists);
+  free(leaves);
+  free(pool.nodes);
+  return 0;  /* OK. */
+}
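Note on sorting size_t weights with qsort: the comparator only needs to report
the sign of the comparison. Returning the difference of two size_t values as an
int can misorder entries, because the unsigned difference may not fit in an
int. A minimal sketch of a three-way comparison; ThreeWay, CompareSizes and
MinAfterSort are hypothetical names and are not part of zopfli.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns -1, 0 or 1 without subtracting the operands. */
static int ThreeWay(size_t a, size_t b) {
  return (a > b) - (a < b);
}

/* qsort-compatible wrapper over plain size_t elements. */
static int CompareSizes(const void* a, const void* b) {
  return ThreeWay(*(const size_t*)a, *(const size_t*)b);
}

/* Hypothetical helper: sorts three weights and returns the smallest. */
static size_t MinAfterSort(size_t a, size_t b, size_t c) {
  size_t w[3];
  w[0] = a;
  w[1] = b;
  w[2] = c;
  qsort(w, 3, sizeof(w[0]), CompareSizes);
  assert(w[0] <= w[1] && w[1] <= w[2]);
  return w[0];
}
```

The comparison stays correct even when one weight is close to the maximum
size_t value, which is the case a subtraction-based comparator gets wrong.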
diff --git a/src/zopfli/katajainen.h b/src/zopfli/katajainen.h
new file mode 100644
index 0000000..ee8a91e
--- /dev/null
+++ b/src/zopfli/katajainen.h
@@ -0,0 +1,42 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#ifndef ZOPFLI_KATAJAINEN_H_
+#define ZOPFLI_KATAJAINEN_H_
+
+#include <string.h>
+
+/*
+Outputs minimum-redundancy length-limited code bitlengths for symbols with the
+given counts. The bitlengths are limited by maxbits.
+
+The output is tailored for DEFLATE: symbols that never occur get a bit length
+of 0, and if only a single symbol occurs at least once, its bitlength will be 1,
+and not 0 as would theoretically be needed for a single symbol.
+
+frequencies: The number of occurrences of each symbol.
+n: The number of symbols.
+maxbits: Maximum bit length, inclusive.
+bitlengths: Output, the bitlengths for the symbol prefix codes.
+return: 0 for OK, non-0 for error.
+*/
+int ZopfliLengthLimitedCodeLengths(
+    const size_t* frequencies, int n, int maxbits, unsigned* bitlengths);
+
+#endif  /* ZOPFLI_KATAJAINEN_H_ */
diff --git a/src/zopfli/lz77.c b/src/zopfli/lz77.c
new file mode 100644
index 0000000..26186b4
--- /dev/null
+++ b/src/zopfli/lz77.c
@@ -0,0 +1,482 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "lz77.h"
+#include "util.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+void ZopfliInitLZ77Store(ZopfliLZ77Store* store) {
+  store->size = 0;
+  store->litlens = 0;
+  store->dists = 0;
+}
+
+void ZopfliCleanLZ77Store(ZopfliLZ77Store* store) {
+  free(store->litlens);
+  free(store->dists);
+}
+
+void ZopfliCopyLZ77Store(
+    const ZopfliLZ77Store* source, ZopfliLZ77Store* dest) {
+  size_t i;
+  ZopfliCleanLZ77Store(dest);
+  dest->litlens =
+      (unsigned short*)malloc(sizeof(*dest->litlens) * source->size);
+  dest->dists = (unsigned short*)malloc(sizeof(*dest->dists) * source->size);
+
+  if (!dest->litlens || !dest->dists) exit(-1); /* Allocation failed. */
+
+  dest->size = source->size;
+  for (i = 0; i < source->size; i++) {
+    dest->litlens[i] = source->litlens[i];
+    dest->dists[i] = source->dists[i];
+  }
+}
+
+/*
+Appends the length and distance to the LZ77 arrays of the ZopfliLZ77Store.
+store: the ZopfliLZ77Store to append to.
+*/
+void ZopfliStoreLitLenDist(unsigned short length, unsigned short dist,
+                           ZopfliLZ77Store* store) {
+  size_t size2 = store->size;  /* Needed for using ZOPFLI_APPEND_DATA twice. */
+  ZOPFLI_APPEND_DATA(length, &store->litlens, &store->size);
+  ZOPFLI_APPEND_DATA(dist, &store->dists, &size2);
+}
+
+/*
+Gets a score of the length given the distance. Typically, the score of the
+length is the length itself, but if the distance is very long, decrease the
+score of the length a bit to make up for the fact that long distances use large
+amounts of extra bits.
+
+This is not an accurate score, it is a heuristic only for the greedy LZ77
+implementation. More accurate cost models are employed later. Making this
+heuristic more accurate may hurt rather than improve compression.
+
+The two direct uses of this heuristic are:
+-avoid using a length of 3 in combination with a long distance. This only has
+ an effect if length == 3.
+-make a slightly better choice between the two options of the lazy matching.
+
+Indirectly, this affects:
+-the block split points if the default of block splitting first is used, in a
+ rather unpredictable way
+-the first zopfli run, so it affects the chance of the first run being closer
+ to the optimal output
+*/
+static int GetLengthScore(int length, int distance) {
+  /*
+  At 1024, the distance uses 9+ extra bits and this seems to be the sweet spot
+  on tested files.
+  */
+  return distance > 1024 ? length - 1 : length;
+}
+
+void ZopfliVerifyLenDist(const unsigned char* data, size_t datasize, size_t pos,
+                         unsigned short dist, unsigned short length) {
+
+  /* TODO(lode): make this only run in a debug compile, it's for assert only. */
+  size_t i;
+
+  assert(pos + length <= datasize);
+  for (i = 0; i < length; i++) {
+    if (data[pos - dist + i] != data[pos + i]) {
+      assert(data[pos - dist + i] == data[pos + i]);
+      break;
+    }
+  }
+}
+
+/*
+Finds how long the match of scan and match is. Can be used to find how many
+bytes starting from scan, and from match, are equal. Returns a pointer to the
+first byte from scan that differs from the corresponding byte from match.
+scan is the position to compare
+match is the earlier position to compare.
+end is one past the last byte to compare; it is returned if no byte differs.
+safe_end is a few (8) bytes before end, for comparing multiple bytes at once.
+*/
+static const unsigned char* GetMatch(const unsigned char* scan,
+                                     const unsigned char* match,
+                                     const unsigned char* end,
+                                     const unsigned char* safe_end) {
+
+  if (sizeof(size_t) == 8) {
+    /* 8 checks at once per array bounds check (size_t is 64-bit). */
+    while (scan < safe_end && *((size_t*)scan) == *((size_t*)match)) {
+      scan += 8;
+      match += 8;
+    }
+  } else if (sizeof(unsigned int) == 4) {
+    /* 4 checks at once per array bounds check (unsigned int is 32-bit). */
+    while (scan < safe_end
+        && *((unsigned int*)scan) == *((unsigned int*)match)) {
+      scan += 4;
+      match += 4;
+    }
+  } else {
+    /* do 8 checks at once per array bounds check. */
+    while (scan < safe_end && *scan == *match && *++scan == *++match
+          && *++scan == *++match && *++scan == *++match
+          && *++scan == *++match && *++scan == *++match
+          && *++scan == *++match && *++scan == *++match) {
+      scan++; match++;
+    }
+  }
+
+  /* The remaining few bytes. */
+  while (scan != end && *scan == *match) {
+    scan++; match++;
+  }
+
+  return scan;
+}
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+/*
+Gets distance, length and sublen values from the cache if possible.
+Returns 1 if it got the values from the cache, 0 if not.
+Updates the limit value to a smaller one if possible with more limited
+information from the cache.
+*/
+static int TryGetFromLongestMatchCache(ZopfliBlockState* s,
+    size_t pos, size_t* limit,
+    unsigned short* sublen, unsigned short* distance, unsigned short* length) {
+  /* The LMC cache starts at the beginning of the block rather than the
+     beginning of the whole array. */
+  size_t lmcpos = pos - s->blockstart;
+
+  /* Length > 0 and dist 0 is an invalid combination, which indicates on
+     purpose that this cache value is not filled in yet. */
+  unsigned char cache_available = s->lmc && (s->lmc->length[lmcpos] == 0 ||
+      s->lmc->dist[lmcpos] != 0);
+  unsigned char limit_ok_for_cache = cache_available &&
+      (*limit == ZOPFLI_MAX_MATCH || s->lmc->length[lmcpos] <= *limit ||
+      (sublen && ZopfliMaxCachedSublen(s->lmc,
+          lmcpos, s->lmc->length[lmcpos]) >= *limit));
+
+  if (s->lmc && limit_ok_for_cache && cache_available) {
+    if (!sublen || s->lmc->length[lmcpos]
+        <= ZopfliMaxCachedSublen(s->lmc, lmcpos, s->lmc->length[lmcpos])) {
+      *length = s->lmc->length[lmcpos];
+      if (*length > *limit) *length = *limit;
+      if (sublen) {
+        ZopfliCacheToSublen(s->lmc, lmcpos, *length, sublen);
+        *distance = sublen[*length];
+        if (*limit == ZOPFLI_MAX_MATCH && *length >= ZOPFLI_MIN_MATCH) {
+          assert(sublen[*length] == s->lmc->dist[lmcpos]);
+        }
+      } else {
+        *distance = s->lmc->dist[lmcpos];
+      }
+      return 1;
+    }
+    /* Can't use much of the cache, since the "sublens" need to be calculated,
+       but at least we already know when to stop. */
+    *limit = s->lmc->length[lmcpos];
+  }
+
+  return 0;
+}
+
+/*
+Stores the found sublen, distance and length in the longest match cache, if
+possible.
+*/
+static void StoreInLongestMatchCache(ZopfliBlockState* s,
+    size_t pos, size_t limit,
+    const unsigned short* sublen,
+    unsigned short distance, unsigned short length) {
+  /* The LMC cache starts at the beginning of the block rather than the
+     beginning of the whole array. */
+  size_t lmcpos = pos - s->blockstart;
+
+  /* Length > 0 and dist 0 is an invalid combination, which indicates on
+     purpose that this cache value is not filled in yet. */
+  unsigned char cache_available = s->lmc && (s->lmc->length[lmcpos] == 0 ||
+      s->lmc->dist[lmcpos] != 0);
+
+  if (s->lmc && limit == ZOPFLI_MAX_MATCH && sublen && !cache_available) {
+    assert(s->lmc->length[lmcpos] == 1 && s->lmc->dist[lmcpos] == 0);
+    s->lmc->dist[lmcpos] = length < ZOPFLI_MIN_MATCH ? 0 : distance;
+    s->lmc->length[lmcpos] = length < ZOPFLI_MIN_MATCH ? 0 : length;
+    assert(!(s->lmc->length[lmcpos] == 1 && s->lmc->dist[lmcpos] == 0));
+    ZopfliSublenToCache(sublen, lmcpos, length, s->lmc);
+  }
+}
+#endif
+
+void ZopfliFindLongestMatch(ZopfliBlockState* s, const ZopfliHash* h,
+    const unsigned char* array,
+    size_t pos, size_t size, size_t limit,
+    unsigned short* sublen, unsigned short* distance, unsigned short* length) {
+  unsigned short hpos = pos & ZOPFLI_WINDOW_MASK, p, pp;
+  unsigned short bestdist = 0;
+  unsigned short bestlength = 1;
+  const unsigned char* scan;
+  const unsigned char* match;
+  const unsigned char* arrayend;
+  const unsigned char* arrayend_safe;
+#if ZOPFLI_MAX_CHAIN_HITS < ZOPFLI_WINDOW_SIZE
+  int chain_counter = ZOPFLI_MAX_CHAIN_HITS;  /* For quitting early. */
+#endif
+
+  unsigned dist = 0;  /* Not unsigned short on purpose. */
+
+  int* hhead = h->head;
+  unsigned short* hprev = h->prev;
+  int* hhashval = h->hashval;
+  int hval = h->val;
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  if (TryGetFromLongestMatchCache(s, pos, &limit, sublen, distance, length)) {
+    assert(pos + *length <= size);
+    return;
+  }
+#endif
+
+  assert(limit <= ZOPFLI_MAX_MATCH);
+  assert(limit >= ZOPFLI_MIN_MATCH);
+  assert(pos < size);
+
+  if (size - pos < ZOPFLI_MIN_MATCH) {
+    /* The rest of the code assumes there are at least ZOPFLI_MIN_MATCH bytes to
+       try. */
+    *length = 0;
+    *distance = 0;
+    return;
+  }
+
+  if (pos + limit > size) {
+    limit = size - pos;
+  }
+  arrayend = &array[pos] + limit;
+  arrayend_safe = arrayend - 8;
+
+  assert(hval < 65536);
+
+  pp = hhead[hval];  /* During the whole loop, p == hprev[pp]. */
+  p = hprev[pp];
+
+  assert(pp == hpos);
+
+  dist = p < pp ? pp - p : ((ZOPFLI_WINDOW_SIZE - p) + pp);
+
+  /* Go through all distances. */
+  while (dist < ZOPFLI_WINDOW_SIZE) {
+    unsigned short currentlength = 0;
+
+    assert(p < ZOPFLI_WINDOW_SIZE);
+    assert(p == hprev[pp]);
+    assert(hhashval[p] == hval);
+
+    if (dist > 0) {
+      assert(pos < size);
+      assert(dist <= pos);
+      scan = &array[pos];
+      match = &array[pos - dist];
+
+      /* Testing the byte at position bestlength first, goes slightly faster. */
+      if (pos + bestlength >= size
+          || *(scan + bestlength) == *(match + bestlength)) {
+
+#ifdef ZOPFLI_HASH_SAME
+        unsigned short same0 = h->same[pos & ZOPFLI_WINDOW_MASK];
+        if (same0 > 2 && *scan == *match) {
+          unsigned short same1 = h->same[(pos - dist) & ZOPFLI_WINDOW_MASK];
+          unsigned short same = same0 < same1 ? same0 : same1;
+          if (same > limit) same = limit;
+          scan += same;
+          match += same;
+        }
+#endif
+        scan = GetMatch(scan, match, arrayend, arrayend_safe);
+        currentlength = scan - &array[pos];  /* The found length. */
+      }
+
+      if (currentlength > bestlength) {
+        if (sublen) {
+          unsigned short j;
+          for (j = bestlength + 1; j <= currentlength; j++) {
+            sublen[j] = dist;
+          }
+        }
+        bestdist = dist;
+        bestlength = currentlength;
+        if (currentlength >= limit) break;
+      }
+    }
+
+
+#ifdef ZOPFLI_HASH_SAME_HASH
+    /* Switch to the other hash once this will be more efficient. */
+    if (hhead != h->head2 && bestlength >= h->same[hpos] &&
+        h->val2 == h->hashval2[p]) {
+      /* Now use the hash that encodes the length and first byte. */
+      hhead = h->head2;
+      hprev = h->prev2;
+      hhashval = h->hashval2;
+      hval = h->val2;
+    }
+#endif
+
+    pp = p;
+    p = hprev[p];
+    if (p == pp) break;  /* Uninited prev value. */
+
+    dist += p < pp ? pp - p : ((ZOPFLI_WINDOW_SIZE - p) + pp);
+
+#if ZOPFLI_MAX_CHAIN_HITS < ZOPFLI_WINDOW_SIZE
+    chain_counter--;
+    if (chain_counter <= 0) break;
+#endif
+  }
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  StoreInLongestMatchCache(s, pos, limit, sublen, bestdist, bestlength);
+#endif
+
+  assert(bestlength <= limit);
+
+  *distance = bestdist;
+  *length = bestlength;
+  assert(pos + *length <= size);
+}
+
+void ZopfliLZ77Greedy(ZopfliBlockState* s, const unsigned char* in,
+                      size_t instart, size_t inend,
+                      ZopfliLZ77Store* store) {
+  size_t i = 0, j;
+  unsigned short leng;
+  unsigned short dist;
+  int lengthscore;
+  size_t windowstart = instart > ZOPFLI_WINDOW_SIZE
+      ? instart - ZOPFLI_WINDOW_SIZE : 0;
+  unsigned short dummysublen[259];
+
+  ZopfliHash hash;
+  ZopfliHash* h = &hash;
+
+#ifdef ZOPFLI_LAZY_MATCHING
+  /* Lazy matching. */
+  unsigned prev_length = 0;
+  unsigned prev_match = 0;
+  int prevlengthscore;
+  int match_available = 0;
+#endif
+
+  if (instart == inend) return;
+
+  ZopfliInitHash(ZOPFLI_WINDOW_SIZE, h);
+  ZopfliWarmupHash(in, windowstart, inend, h);
+  for (i = windowstart; i < instart; i++) {
+    ZopfliUpdateHash(in, i, inend, h);
+  }
+
+  for (i = instart; i < inend; i++) {
+    ZopfliUpdateHash(in, i, inend, h);
+
+    ZopfliFindLongestMatch(s, h, in, i, inend, ZOPFLI_MAX_MATCH, dummysublen,
+                           &dist, &leng);
+    lengthscore = GetLengthScore(leng, dist);
+
+#ifdef ZOPFLI_LAZY_MATCHING
+    /* Lazy matching. */
+    prevlengthscore = GetLengthScore(prev_length, prev_match);
+    if (match_available) {
+      match_available = 0;
+      if (lengthscore > prevlengthscore + 1) {
+        ZopfliStoreLitLenDist(in[i - 1], 0, store);
+        if (lengthscore >= ZOPFLI_MIN_MATCH && leng < ZOPFLI_MAX_MATCH) {
+          match_available = 1;
+          prev_length = leng;
+          prev_match = dist;
+          continue;
+        }
+      } else {
+        /* Add previous to output. */
+        leng = prev_length;
+        dist = prev_match;
+        lengthscore = prevlengthscore;
+        /* Add to output. */
+        ZopfliVerifyLenDist(in, inend, i - 1, dist, leng);
+        ZopfliStoreLitLenDist(leng, dist, store);
+        for (j = 2; j < leng; j++) {
+          assert(i < inend);
+          i++;
+          ZopfliUpdateHash(in, i, inend, h);
+        }
+        continue;
+      }
+    }
+    else if (lengthscore >= ZOPFLI_MIN_MATCH && leng < ZOPFLI_MAX_MATCH) {
+      match_available = 1;
+      prev_length = leng;
+      prev_match = dist;
+      continue;
+    }
+    /* End of lazy matching. */
+#endif
+
+    /* Add to output. */
+    if (lengthscore >= ZOPFLI_MIN_MATCH) {
+      ZopfliVerifyLenDist(in, inend, i, dist, leng);
+      ZopfliStoreLitLenDist(leng, dist, store);
+    } else {
+      leng = 1;
+      ZopfliStoreLitLenDist(in[i], 0, store);
+    }
+    for (j = 1; j < leng; j++) {
+      assert(i < inend);
+      i++;
+      ZopfliUpdateHash(in, i, inend, h);
+    }
+  }
+
+  ZopfliCleanHash(h);
+}
+
+void ZopfliLZ77Counts(const unsigned short* litlens,
+                      const unsigned short* dists,
+                      size_t start, size_t end,
+                      size_t* ll_count, size_t* d_count) {
+  size_t i;
+
+  for (i = 0; i < 288; i++) {
+    ll_count[i] = 0;
+  }
+  for (i = 0; i < 32; i++) {
+    d_count[i] = 0;
+  }
+
+  for (i = start; i < end; i++) {
+    if (dists[i] == 0) {
+      ll_count[litlens[i]]++;
+    } else {
+      ll_count[ZopfliGetLengthSymbol(litlens[i])]++;
+      d_count[ZopfliGetDistSymbol(dists[i])]++;
+    }
+  }
+
+  ll_count[256] = 1;  /* End symbol. */
+}
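Note on GetMatch in lz77.c above: it compares several bytes per bounds check by reading size_t or unsigned int values through casted unsigned char pointers, which can be unaligned. The following is a minimal sketch of the same idea using memcpy for the wide loads. It is not part of this commit. The name match_length is hypothetical, it returns a byte count instead of a pointer, and it assumes a C99 compiler for uint64_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of zopfli: returns how many leading bytes of
   scan and match are equal, comparing at most max_len bytes. */
static size_t match_length(const unsigned char* scan,
                           const unsigned char* match, size_t max_len) {
  size_t n = 0;
  assert(scan != NULL && match != NULL);

  /* Compare 8 bytes per iteration. memcpy into a local integer avoids
     dereferencing a possibly unaligned pointer. */
  while (n + 8 <= max_len) {
    uint64_t a, b;
    memcpy(&a, scan + n, 8);
    memcpy(&b, match + n, 8);
    if (a != b) break;
    n += 8;
  }

  /* Finish byte by byte to find the exact position of the first mismatch. */
  while (n < max_len && scan[n] == match[n]) n++;

  return n;
}
```

The byte loop at the end plays the same role as the "remaining few bytes" loop in GetMatch: the wide comparison only says that some byte in the group differs, not which one.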
diff --git a/src/zopfli/lz77.h b/src/zopfli/lz77.h
new file mode 100644
index 0000000..55186a7
--- /dev/null
+++ b/src/zopfli/lz77.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+Functions for basic LZ77 compression and utilities for the "squeeze" LZ77
+compression.
+*/
+
+#ifndef ZOPFLI_LZ77_H_
+#define ZOPFLI_LZ77_H_
+
+#include <stdlib.h>
+
+#include "cache.h"
+#include "hash.h"
+#include "zopfli.h"
+
+/*
+Stores lit/length and dist pairs for LZ77.
+Parameter litlens: Contains the literal symbols or length values.
+Parameter dists: Contains the distances. A value is 0 to indicate that there is
+no dist and the corresponding litlens value is a literal instead of a length.
+Parameter size: The size of both the litlens and dists arrays.
+The memory can best be managed by using ZopfliInitLZ77Store to initialize it,
+ZopfliCleanLZ77Store to destroy it, and ZopfliStoreLitLenDist to append values.
+
+*/
+typedef struct ZopfliLZ77Store {
+  unsigned short* litlens;  /* Lit or len. */
+  unsigned short* dists;  /* If 0: indicates literal in corresponding litlens,
+      if > 0: length in corresponding litlens, this is the distance. */
+  size_t size;
+} ZopfliLZ77Store;
+
+void ZopfliInitLZ77Store(ZopfliLZ77Store* store);
+void ZopfliCleanLZ77Store(ZopfliLZ77Store* store);
+void ZopfliCopyLZ77Store(const ZopfliLZ77Store* source, ZopfliLZ77Store* dest);
+void ZopfliStoreLitLenDist(unsigned short length, unsigned short dist,
+                           ZopfliLZ77Store* store);
+
+/*
+Some state information for compressing a block.
+This is currently a bit under-used (with mainly only the longest match cache),
+but is kept for easy future expansion.
+*/
+typedef struct ZopfliBlockState {
+  const ZopfliOptions* options;
+
+#ifdef ZOPFLI_LONGEST_MATCH_CACHE
+  /* Cache for length/distance pairs found so far. */
+  ZopfliLongestMatchCache* lmc;
+#endif
+
+  /* The start (inclusive) and end (not inclusive) of the current block. */
+  size_t blockstart;
+  size_t blockend;
+} ZopfliBlockState;
+
+/*
+Finds the longest match (length and corresponding distance) for LZ77
+compression.
+Even when not using "sublen", it can be more efficient to provide an array,
+because only then the caching is used.
+array: the data
+pos: position in the data to find the match for
+size: size of the data
+limit: limits the match length to at most this value (default should be 258).
+    This allows finding a shorter dist for that length (= less extra bits).
+    Must be in the range [ZOPFLI_MIN_MATCH, ZOPFLI_MAX_MATCH].
+sublen: output array of 259 elements, or null. Has, for each length, the
+    smallest distance required to reach this length. Only 256 of its 259 values
+    are used, the first 3 are ignored (the shortest length is 3. It is purely
+    for convenience that the array is made 3 longer).
+*/
+void ZopfliFindLongestMatch(
+    ZopfliBlockState *s, const ZopfliHash* h, const unsigned char* array,
+    size_t pos, size_t size, size_t limit,
+    unsigned short* sublen, unsigned short* distance, unsigned short* length);
+
+/*
+Verifies if length and dist are indeed valid, only used for assertion.
+*/
+void ZopfliVerifyLenDist(const unsigned char* data, size_t datasize, size_t pos,
+                         unsigned short dist, unsigned short length);
+
+/*
+Counts the number of literal, length and distance symbols in the given lz77
+arrays.
+litlens: lz77 lit/lengths
+dists: lz77 distances
+start: where to begin counting in litlens and dists
+end: where to stop counting in litlens and dists (not inclusive)
+ll_count: count of each lit/len symbol, must have size 288 (see deflate
+    standard)
+d_count: count of each dist symbol, must have size 32 (see deflate standard)
+*/
+void ZopfliLZ77Counts(const unsigned short* litlens,
+                      const unsigned short* dists,
+                      size_t start, size_t end,
+                      size_t* ll_count, size_t* d_count);
+
+/*
+Does LZ77 using an algorithm similar to gzip, with lazy matching, rather than
+with the slow but better "squeeze" implementation.
+The result is placed in the ZopfliLZ77Store.
+If instart is larger than 0, it uses values before instart as starting
+dictionary.
+*/
+void ZopfliLZ77Greedy(ZopfliBlockState* s, const unsigned char* in,
+                      size_t instart, size_t inend,
+                      ZopfliLZ77Store* store);
+
+#endif  /* ZOPFLI_LZ77_H_ */
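Note on the ZopfliLZ77Store convention documented in lz77.h above: an entry with dists[i] == 0 is a literal whose byte value is in litlens[i], and any other entry is a match of length litlens[i] at distance dists[i]. The sketch below expands such parallel arrays back into bytes to make the convention concrete. It is not part of this commit, and expand_lz77 and its error convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of zopfli: expands parallel litlens/dists
   arrays into out. Returns the number of bytes written, or (size_t)-1 if a
   distance points before the start of out or out is too small. */
static size_t expand_lz77(const unsigned short* litlens,
                          const unsigned short* dists, size_t size,
                          unsigned char* out, size_t outcap) {
  size_t pos = 0, i, k;
  assert(litlens != NULL && dists != NULL && out != NULL);

  for (i = 0; i < size; i++) {
    if (dists[i] == 0) {
      /* Literal: litlens[i] holds the byte value. */
      if (pos >= outcap) return (size_t)-1;
      out[pos++] = (unsigned char)litlens[i];
    } else {
      /* Match: copy litlens[i] bytes starting dists[i] bytes back. The copy
         goes byte by byte because source and destination may overlap. */
      if (dists[i] > pos || litlens[i] > outcap - pos) return (size_t)-1;
      for (k = 0; k < litlens[i]; k++) {
        out[pos] = out[pos - dists[i]];
        pos++;
      }
    }
  }
  return pos;
}
```

For example, the entries ('a', 0), ('b', 0), (4, 2) expand to the six bytes "ababab": two literals followed by a length 4 match that starts two bytes back and overlaps its own output.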
diff --git a/src/zopfli/squeeze.c b/src/zopfli/squeeze.c
new file mode 100644
index 0000000..09e7e2e
--- /dev/null
+++ b/src/zopfli/squeeze.c
@@ -0,0 +1,546 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "squeeze.h"
+
+#include <assert.h>
+#include <math.h>
+#include <stdio.h>
+
+#include "blocksplitter.h"
+#include "deflate.h"
+#include "tree.h"
+#include "util.h"
+
+typedef struct SymbolStats {
+  /* The literal and length symbols. */
+  size_t litlens[288];
+  /* The 32 unique dist symbols, not the 32768 possible dists. */
+  size_t dists[32];
+
+  double ll_symbols[288];  /* Length of each lit/len symbol in bits. */
+  double d_symbols[32];  /* Length of each dist symbol in bits. */
+} SymbolStats;
+
+/* Sets everything to 0. */
+static void InitStats(SymbolStats* stats) {
+  memset(stats->litlens, 0, 288 * sizeof(stats->litlens[0]));
+  memset(stats->dists, 0, 32 * sizeof(stats->dists[0]));
+
+  memset(stats->ll_symbols, 0, 288 * sizeof(stats->ll_symbols[0]));
+  memset(stats->d_symbols, 0, 32 * sizeof(stats->d_symbols[0]));
+}
+
+static void CopyStats(SymbolStats* source, SymbolStats* dest) {
+  memcpy(dest->litlens, source->litlens, 288 * sizeof(dest->litlens[0]));
+  memcpy(dest->dists, source->dists, 32 * sizeof(dest->dists[0]));
+
+  memcpy(dest->ll_symbols, source->ll_symbols,
+         288 * sizeof(dest->ll_symbols[0]));
+  memcpy(dest->d_symbols, source->d_symbols, 32 * sizeof(dest->d_symbols[0]));
+}
+
+/* Sets result to the weighted sum of the symbol frequencies of both stats. */
+static void AddWeighedStatFreqs(const SymbolStats* stats1, double w1,
+                                const SymbolStats* stats2, double w2,
+                                SymbolStats* result) {
+  size_t i;
+  for (i = 0; i < 288; i++) {
+    result->litlens[i] =
+        (size_t) (stats1->litlens[i] * w1 + stats2->litlens[i] * w2);
+  }
+  for (i = 0; i < 32; i++) {
+    result->dists[i] =
+        (size_t) (stats1->dists[i] * w1 + stats2->dists[i] * w2);
+  }
+  result->litlens[256] = 1;  /* End symbol. */
+}
+
+typedef struct RanState {
+  unsigned int m_w, m_z;
+} RanState;
+
+static void InitRanState(RanState* state) {
+  state->m_w = 1;
+  state->m_z = 2;
+}
+
+/* Get random number: "Multiply-With-Carry" generator of G. Marsaglia */
+static unsigned int Ran(RanState* state) {
+  state->m_z = 36969 * (state->m_z & 65535) + (state->m_z >> 16);
+  state->m_w = 18000 * (state->m_w & 65535) + (state->m_w >> 16);
+  return (state->m_z << 16) + state->m_w;  /* 32-bit result. */
+}
+
+static void RandomizeFreqs(RanState* state, size_t* freqs, int n) {
+  int i;
+  for (i = 0; i < n; i++) {
+    if ((Ran(state) >> 4) % 3 == 0) freqs[i] = freqs[Ran(state) % n];
+  }
+}
+
+static void RandomizeStatFreqs(RanState* state, SymbolStats* stats) {
+  RandomizeFreqs(state, stats->litlens, 288);
+  RandomizeFreqs(state, stats->dists, 32);
+  stats->litlens[256] = 1;  /* End symbol. */
+}
+
+static void ClearStatFreqs(SymbolStats* stats) {
+  size_t i;
+  for (i = 0; i < 288; i++) stats->litlens[i] = 0;
+  for (i = 0; i < 32; i++) stats->dists[i] = 0;
+}
+
+/*
+Function that calculates a cost based on a model for the given LZ77 symbol.
+litlen: means literal symbol if dist is 0, length otherwise.
+*/
+typedef double CostModelFun(unsigned litlen, unsigned dist, void* context);
+
+/*
+Cost model which should exactly match fixed tree.
+type: CostModelFun
+*/
+static double GetCostFixed(unsigned litlen, unsigned dist, void* unused) {
+  (void)unused;
+  if (dist == 0) {
+    if (litlen <= 143) return 8;
+    else return 9;
+  } else {
+    int dbits = ZopfliGetDistExtraBits(dist);
+    int lbits = ZopfliGetLengthExtraBits(litlen);
+    int lsym = ZopfliGetLengthSymbol(litlen);
+    double cost = 0;
+    if (lsym <= 279) cost += 7;
+    else cost += 8;
+    cost += 5;  /* Every dist symbol has length 5. */
+    return cost + dbits + lbits;
+  }
+}
+
+/*
+Cost model based on symbol statistics.
+type: CostModelFun
+*/
+static double GetCostStat(unsigned litlen, unsigned dist, void* context) {
+  SymbolStats* stats = (SymbolStats*)context;
+  if (dist == 0) {
+    return stats->ll_symbols[litlen];
+  } else {
+    int lsym = ZopfliGetLengthSymbol(litlen);
+    int lbits = ZopfliGetLengthExtraBits(litlen);
+    int dsym = ZopfliGetDistSymbol(dist);
+    int dbits = ZopfliGetDistExtraBits(dist);
+    return stats->ll_symbols[lsym] + lbits + stats->d_symbols[dsym] + dbits;
+  }
+}
+
+/*
+Finds the minimum possible cost this cost model can return for valid length and
+distance symbols.
+*/
+static double GetCostModelMinCost(CostModelFun* costmodel, void* costcontext) {
+  double mincost;
+  int bestlength = 0; /* length that has lowest cost in the cost model */
+  int bestdist = 0; /* distance that has lowest cost in the cost model */
+  int i;
+  /*
+  Table of distances that have a different distance symbol in the deflate
+  specification. Each value is the first distance that has a new symbol. Only
+  different symbols affect the cost model so only these need to be checked.
+  See RFC 1951 section 3.2.5. Compressed blocks (length and distance codes).
+  */
+  static const int dsymbols[30] = {
+    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385, 513,
+    769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577
+  };
+
+  mincost = ZOPFLI_LARGE_FLOAT;
+  for (i = 3; i < 259; i++) {
+    double c = costmodel(i, 1, costcontext);
+    if (c < mincost) {
+      bestlength = i;
+      mincost = c;
+    }
+  }
+
+  mincost = ZOPFLI_LARGE_FLOAT;
+  for (i = 0; i < 30; i++) {
+    double c = costmodel(3, dsymbols[i], costcontext);
+    if (c < mincost) {
+      bestdist = dsymbols[i];
+      mincost = c;
+    }
+  }
+
+  return costmodel(bestlength, bestdist, costcontext);
+}
+
+/*
+Performs the forward pass for "squeeze". Gets the optimal length to reach
+every byte from a previous byte, using cost calculations.
+s: the ZopfliBlockState
+in: the input data array
+instart: where to start
+inend: where to stop (not inclusive)
+costmodel: function to calculate the cost of some lit/len/dist pair.
+costcontext: abstract context for the costmodel function
+length_array: output array of size (inend - instart + 1) which will receive the
+    best length to reach this byte from a previous byte.
+returns the cost that was, according to the costmodel, needed to get to the end.
+*/
+static double GetBestLengths(ZopfliBlockState *s,
+                             const unsigned char* in,
+                             size_t instart, size_t inend,
+                             CostModelFun* costmodel, void* costcontext,
+                             unsigned short* length_array) {
+  /* Best cost to get here so far. */
+  size_t blocksize = inend - instart;
+  float* costs;
+  size_t i = 0, k;
+  unsigned short leng;
+  unsigned short dist;
+  unsigned short sublen[259];
+  size_t windowstart = instart > ZOPFLI_WINDOW_SIZE
+      ? instart - ZOPFLI_WINDOW_SIZE : 0;
+  ZopfliHash hash;
+  ZopfliHash* h = &hash;
+  double result;
+  double mincost = GetCostModelMinCost(costmodel, costcontext);
+
+  if (instart == inend) return 0;
+
+  costs = (float*)malloc(sizeof(float) * (blocksize + 1));
+  if (!costs) exit(-1); /* Allocation failed. */
+
+  ZopfliInitHash(ZOPFLI_WINDOW_SIZE, h);
+  ZopfliWarmupHash(in, windowstart, inend, h);
+  for (i = windowstart; i < instart; i++) {
+    ZopfliUpdateHash(in, i, inend, h);
+  }
+
+  for (i = 1; i < blocksize + 1; i++) costs[i] = ZOPFLI_LARGE_FLOAT;
+  costs[0] = 0;  /* Because it's the start. */
+  length_array[0] = 0;
+
+  for (i = instart; i < inend; i++) {
+    size_t j = i - instart;  /* Index in the costs array and length_array. */
+    ZopfliUpdateHash(in, i, inend, h);
+
+#ifdef ZOPFLI_SHORTCUT_LONG_REPETITIONS
+    /* If we're in a long repetition of the same character and have more than
+    ZOPFLI_MAX_MATCH characters before and after our position. */
+    if (h->same[i & ZOPFLI_WINDOW_MASK] > ZOPFLI_MAX_MATCH * 2
+        && i > instart + ZOPFLI_MAX_MATCH + 1
+        && i + ZOPFLI_MAX_MATCH * 2 + 1 < inend
+        && h->same[(i - ZOPFLI_MAX_MATCH) & ZOPFLI_WINDOW_MASK]
+            > ZOPFLI_MAX_MATCH) {
+      double symbolcost = costmodel(ZOPFLI_MAX_MATCH, 1, costcontext);
+      /* Set the length to reach each one to ZOPFLI_MAX_MATCH, and the cost to
+      the cost corresponding to that length. Doing this, we skip
+      ZOPFLI_MAX_MATCH values to avoid calling ZopfliFindLongestMatch. */
+      for (k = 0; k < ZOPFLI_MAX_MATCH; k++) {
+        costs[j + ZOPFLI_MAX_MATCH] = costs[j] + symbolcost;
+        length_array[j + ZOPFLI_MAX_MATCH] = ZOPFLI_MAX_MATCH;
+        i++;
+        j++;
+        ZopfliUpdateHash(in, i, inend, h);
+      }
+    }
+#endif
+
+    ZopfliFindLongestMatch(s, h, in, i, inend, ZOPFLI_MAX_MATCH, sublen,
+                           &dist, &leng);
+
+    /* Literal. */
+    if (i + 1 <= inend) {
+      double newCost = costs[j] + costmodel(in[i], 0, costcontext);
+      assert(newCost >= 0);
+      if (newCost < costs[j + 1]) {
+        costs[j + 1] = newCost;
+        length_array[j + 1] = 1;
+      }
+    }
+    /* Lengths. */
+    for (k = 3; k <= leng && i + k <= inend; k++) {
+      double newCost;
+
+      /* Calling the cost model is expensive, avoid this if we are already at
+      the minimum possible cost that it can return. */
+      if (costs[j + k] - costs[j] <= mincost) continue;
+
+      newCost = costs[j] + costmodel(k, sublen[k], costcontext);
+      assert(newCost >= 0);
+      if (newCost < costs[j + k]) {
+        assert(k <= ZOPFLI_MAX_MATCH);
+        costs[j + k] = newCost;
+        length_array[j + k] = k;
+      }
+    }
+  }
+
+  assert(costs[blocksize] >= 0);
+  result = costs[blocksize];
+
+  ZopfliCleanHash(h);
+  free(costs);
+
+  return result;
+}
+
+/*
+Calculates the optimal path of lz77 lengths to use, from the calculated
+length_array. The length_array must contain the optimal length to reach that
+byte. The path will be filled with the lengths to use, so its data size will be
+the amount of lz77 symbols.
+*/
+static void TraceBackwards(size_t size, const unsigned short* length_array,
+                           unsigned short** path, size_t* pathsize) {
+  size_t index = size;
+  if (size == 0) return;
+  for (;;) {
+    ZOPFLI_APPEND_DATA(length_array[index], path, pathsize);
+    assert(length_array[index] <= index);
+    assert(length_array[index] <= ZOPFLI_MAX_MATCH);
+    assert(length_array[index] != 0);
+    index -= length_array[index];
+    if (index == 0) break;
+  }
+
+  /* Mirror result. */
+  for (index = 0; index < *pathsize / 2; index++) {
+    unsigned short temp = (*path)[index];
+    (*path)[index] = (*path)[*pathsize - index - 1];
+    (*path)[*pathsize - index - 1] = temp;
+  }
+}
+
+static void FollowPath(ZopfliBlockState* s,
+                       const unsigned char* in, size_t instart, size_t inend,
+                       unsigned short* path, size_t pathsize,
+                       ZopfliLZ77Store* store) {
+  size_t i, j, pos = 0;
+  size_t windowstart = instart > ZOPFLI_WINDOW_SIZE
+      ? instart - ZOPFLI_WINDOW_SIZE : 0;
+
+  size_t total_length_test = 0;
+
+  ZopfliHash hash;
+  ZopfliHash* h = &hash;
+
+  if (instart == inend) return;
+
+  ZopfliInitHash(ZOPFLI_WINDOW_SIZE, h);
+  ZopfliWarmupHash(in, windowstart, inend, h);
+  for (i = windowstart; i < instart; i++) {
+    ZopfliUpdateHash(in, i, inend, h);
+  }
+
+  pos = instart;
+  for (i = 0; i < pathsize; i++) {
+    unsigned short length = path[i];
+    unsigned short dummy_length;
+    unsigned short dist;
+    assert(pos < inend);
+
+    ZopfliUpdateHash(in, pos, inend, h);
+
+    /* Add to output. */
+    if (length >= ZOPFLI_MIN_MATCH) {
+      /* Get the distance by recalculating longest match. The found length
+      should match the length from the path. */
+      ZopfliFindLongestMatch(s, h, in, pos, inend, length, 0,
+                             &dist, &dummy_length);
+      assert(!(dummy_length != length && length > 2 && dummy_length > 2));
+      ZopfliVerifyLenDist(in, inend, pos, dist, length);
+      ZopfliStoreLitLenDist(length, dist, store);
+      total_length_test += length;
+    } else {
+      length = 1;
+      ZopfliStoreLitLenDist(in[pos], 0, store);
+      total_length_test++;
+    }
+
+
+    assert(pos + length <= inend);
+    for (j = 1; j < length; j++) {
+      ZopfliUpdateHash(in, pos + j, inend, h);
+    }
+
+    pos += length;
+  }
+
+  ZopfliCleanHash(h);
+}
+
+/* Calculates the entropy of the statistics */
+static void CalculateStatistics(SymbolStats* stats) {
+  ZopfliCalculateEntropy(stats->litlens, 288, stats->ll_symbols);
+  ZopfliCalculateEntropy(stats->dists, 32, stats->d_symbols);
+}
+
+/* Appends the symbol statistics from the store. */
+static void GetStatistics(const ZopfliLZ77Store* store, SymbolStats* stats) {
+  size_t i;
+  for (i = 0; i < store->size; i++) {
+    if (store->dists[i] == 0) {
+      stats->litlens[store->litlens[i]]++;
+    } else {
+      stats->litlens[ZopfliGetLengthSymbol(store->litlens[i])]++;
+      stats->dists[ZopfliGetDistSymbol(store->dists[i])]++;
+    }
+  }
+  stats->litlens[256] = 1;  /* End symbol. */
+
+  CalculateStatistics(stats);
+}
+
+/*
+Does a single run for ZopfliLZ77Optimal. For good compression, repeated runs
+with updated statistics should be performed.
+
+s: the block state
+in: the input data array
+instart: where to start
+inend: where to stop (not inclusive)
+path: pointer to dynamically allocated memory to store the path
+pathsize: pointer to the size of the dynamic path array
+length_array: array of size (inend - instart + 1) used to store lengths
+costmodel: function to use as the cost model for this squeeze run
+costcontext: abstract context for the costmodel function
+store: place to output the LZ77 data
+returns the cost that was, according to the costmodel, needed to get to the end.
+    This is not the actual cost.
+*/
+static double LZ77OptimalRun(ZopfliBlockState* s,
+    const unsigned char* in, size_t instart, size_t inend,
+    unsigned short** path, size_t* pathsize,
+    unsigned short* length_array, CostModelFun* costmodel,
+    void* costcontext, ZopfliLZ77Store* store) {
+  double cost = GetBestLengths(
+      s, in, instart, inend, costmodel, costcontext, length_array);
+  free(*path);
+  *path = 0;
+  *pathsize = 0;
+  TraceBackwards(inend - instart, length_array, path, pathsize);
+  FollowPath(s, in, instart, inend, *path, *pathsize, store);
+  assert(cost < ZOPFLI_LARGE_FLOAT);
+  return cost;
+}
+
+void ZopfliLZ77Optimal(ZopfliBlockState *s,
+                       const unsigned char* in, size_t instart, size_t inend,
+                       ZopfliLZ77Store* store) {
+  /* Length to reach each position with smallest cost. */
+  size_t blocksize = inend - instart;
+  unsigned short* length_array =
+      (unsigned short*)malloc(sizeof(unsigned short) * (blocksize + 1));
+  unsigned short* path = 0;
+  size_t pathsize = 0;
+  ZopfliLZ77Store currentstore;
+  SymbolStats stats, beststats, laststats;
+  int i;
+  double cost;
+  double bestcost = ZOPFLI_LARGE_FLOAT;
+  double lastcost = 0;
+  /* Try randomizing the costs a bit once the size stabilizes. */
+  RanState ran_state;
+  int lastrandomstep = -1;
+
+  if (!length_array) exit(-1); /* Allocation failed. */
+
+  InitRanState(&ran_state);
+  InitStats(&stats);
+  ZopfliInitLZ77Store(&currentstore);
+
+  /* Do regular deflate, then loop multiple shortest path runs, each time using
+  the statistics of the previous run. */
+
+  /* Initial run. */
+  ZopfliLZ77Greedy(s, in, instart, inend, &currentstore);
+  GetStatistics(&currentstore, &stats);
+
+  /* Repeat the optimal run, each time with the cost model derived from the
+  statistics of the previous run. */
+  for (i = 0; i < s->options->numiterations; i++) {
+    ZopfliCleanLZ77Store(&currentstore);
+    ZopfliInitLZ77Store(&currentstore);
+    LZ77OptimalRun(s, in, instart, inend, &path, &pathsize,
+                   length_array, GetCostStat, (void*)&stats,
+                   &currentstore);
+    cost = ZopfliCalculateBlockSize(currentstore.litlens, currentstore.dists,
+                                    0, currentstore.size, 2);
+    if (s->options->verbose_more || (s->options->verbose && cost < bestcost)) {
+      fprintf(stderr, "Iteration %d: %d bit\n", i, (int) cost);
+    }
+    if (cost < bestcost) {
+      /* Copy to the output store. */
+      ZopfliCopyLZ77Store(&currentstore, store);
+      CopyStats(&stats, &beststats);
+      bestcost = cost;
+    }
+    CopyStats(&stats, &laststats);
+    ClearStatFreqs(&stats);
+    GetStatistics(&currentstore, &stats);
+    if (lastrandomstep != -1) {
+      /* This makes it converge slower but better. Do it only once the
+      randomness kicks in so that if the user does few iterations, it gives a
+      better result sooner. */
+      AddWeighedStatFreqs(&stats, 1.0, &laststats, 0.5, &stats);
+      CalculateStatistics(&stats);
+    }
+    if (i > 5 && cost == lastcost) {
+      CopyStats(&beststats, &stats);
+      RandomizeStatFreqs(&ran_state, &stats);
+      CalculateStatistics(&stats);
+      lastrandomstep = i;
+    }
+    lastcost = cost;
+  }
+
+  free(length_array);
+  free(path);
+  ZopfliCleanLZ77Store(&currentstore);
+}
+
+void ZopfliLZ77OptimalFixed(ZopfliBlockState *s,
+                            const unsigned char* in,
+                            size_t instart, size_t inend,
+                            ZopfliLZ77Store* store)
+{
+  /* Length to reach each position with smallest cost. */
+  size_t blocksize = inend - instart;
+  unsigned short* length_array =
+      (unsigned short*)malloc(sizeof(unsigned short) * (blocksize + 1));
+  unsigned short* path = 0;
+  size_t pathsize = 0;
+
+  if (!length_array) exit(-1); /* Allocation failed. */
+
+  s->blockstart = instart;
+  s->blockend = inend;
+
+  /* Shortest path for the fixed tree. This should give the shortest possible
+  result for it; no repeated runs are needed since the tree is known. */
+  LZ77OptimalRun(s, in, instart, inend, &path, &pathsize,
+                 length_array, GetCostFixed, 0, store);
+
+  free(length_array);
+  free(path);
+}
diff --git a/src/zopfli/squeeze.h b/src/zopfli/squeeze.h
new file mode 100644
index 0000000..e850aaa
--- /dev/null
+++ b/src/zopfli/squeeze.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+The squeeze functions do enhanced LZ77 compression by optimal parsing with a
+cost model, rather than greedily choosing the longest length or using a single
+step of lazy matching like regular implementations.
+
+Since the cost model is based on the Huffman tree that can only be calculated
+after the LZ77 data is generated, there is a chicken and egg problem, and
+multiple runs are done with updated cost models to converge to a better
+solution.
+*/
+
+#ifndef ZOPFLI_SQUEEZE_H_
+#define ZOPFLI_SQUEEZE_H_
+
+#include "lz77.h"
+
+/*
+Calculates lit/len and dist pairs for given data.
+If instart is larger than 0, it uses values before instart as starting
+dictionary.
+*/
+void ZopfliLZ77Optimal(ZopfliBlockState *s,
+                       const unsigned char* in, size_t instart, size_t inend,
+                       ZopfliLZ77Store* store);
+
+/*
+Does the same as ZopfliLZ77Optimal, but optimized for the fixed tree of the
+deflate standard.
+The fixed tree rarely gives the best compression, but this gives the best
+possible LZ77 encoding for the fixed tree.
+This does not create or output any fixed tree, only LZ77 data optimized for
+use with a fixed tree.
+If instart is larger than 0, it uses values before instart as starting
+dictionary.
+*/
+void ZopfliLZ77OptimalFixed(ZopfliBlockState *s,
+                            const unsigned char* in,
+                            size_t instart, size_t inend,
+                            ZopfliLZ77Store* store);
+
+#endif  /* ZOPFLI_SQUEEZE_H_ */
diff --git a/src/zopfli/tree.c b/src/zopfli/tree.c
new file mode 100644
index 0000000..c457511
--- /dev/null
+++ b/src/zopfli/tree.c
@@ -0,0 +1,101 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "tree.h"
+
+#include <assert.h>
+#include <math.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "katajainen.h"
+#include "util.h"
+
+void ZopfliLengthsToSymbols(const unsigned* lengths, size_t n, unsigned maxbits,
+                            unsigned* symbols) {
+  size_t* bl_count = (size_t*)malloc(sizeof(size_t) * (maxbits + 1));
+  size_t* next_code = (size_t*)malloc(sizeof(size_t) * (maxbits + 1));
+  unsigned bits, i;
+  unsigned code;
+
+  for (i = 0; i < n; i++) {
+    symbols[i] = 0;
+  }
+
+  /* 1) Count the number of codes for each code length. Let bl_count[N] be the
+  number of codes of length N, N >= 1. */
+  for (bits = 0; bits <= maxbits; bits++) {
+    bl_count[bits] = 0;
+  }
+  for (i = 0; i < n; i++) {
+    assert(lengths[i] <= maxbits);
+    bl_count[lengths[i]]++;
+  }
+  /* 2) Find the numerical value of the smallest code for each code length. */
+  code = 0;
+  bl_count[0] = 0;
+  for (bits = 1; bits <= maxbits; bits++) {
+    code = (code + bl_count[bits-1]) << 1;
+    next_code[bits] = code;
+  }
+  /* 3) Assign numerical values to all codes, using consecutive values for all
+  codes of the same length with the base values determined at step 2. */
+  for (i = 0;  i < n; i++) {
+    unsigned len = lengths[i];
+    if (len != 0) {
+      symbols[i] = next_code[len];
+      next_code[len]++;
+    }
+  }
+
+  free(bl_count);
+  free(next_code);
+}
+
+void ZopfliCalculateEntropy(const size_t* count, size_t n, double* bitlengths) {
+  static const double kInvLog2 = 1.4426950408889;  /* 1.0 / log(2.0) */
+  unsigned sum = 0;
+  unsigned i;
+  double log2sum;
+  for (i = 0; i < n; ++i) {
+    sum += count[i];
+  }
+  log2sum = (sum == 0 ? log(n) : log(sum)) * kInvLog2;
+  for (i = 0; i < n; ++i) {
+    /* When the count of the symbol is 0, but its cost is requested anyway, it
+    means the symbol will appear at least once anyway, so give it the cost as if
+    its count is 1. */
+    if (count[i] == 0) bitlengths[i] = log2sum;
+    else bitlengths[i] = log2sum - log(count[i]) * kInvLog2;
+    /* Depending on compiler and architecture, the above subtraction of two
+    floating point numbers may give a negative result very close to zero
+    instead of zero (e.g. -5.973954e-17 with gcc 4.1.2 on Ubuntu 11.04). Clamp
+    it to zero. These floating point imprecisions do not affect the cost model
+    significantly so this is ok. */
+    if (bitlengths[i] < 0 && bitlengths[i] > -1e-5) bitlengths[i] = 0;
+    assert(bitlengths[i] >= 0);
+  }
+}
+
+void ZopfliCalculateBitLengths(const size_t* count, size_t n, int maxbits,
+                               unsigned* bitlengths) {
+  int error = ZopfliLengthLimitedCodeLengths(count, n, maxbits, bitlengths);
+  (void) error;
+  assert(!error);
+}
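Note on tree.c: the three numbered steps in ZopfliLengthsToSymbols follow the canonical Huffman code construction described in the DEFLATE specification. The sketch below restates them as a stand-alone helper so they can be checked in isolation. The name lengths_to_codes and the fixed 15-bit limit are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone helper, not part of the patch: assigns canonical
   Huffman codes for code lengths of at most 15 bits (the DEFLATE limit). */
static void lengths_to_codes(const unsigned* lengths, size_t n,
                             unsigned* codes) {
  unsigned bl_count[16] = {0};
  unsigned next_code[16] = {0};
  unsigned code = 0;
  unsigned bits;
  size_t i;

  /* Step 1: count how many symbols use each code length. */
  for (i = 0; i < n; i++) {
    assert(lengths[i] <= 15);
    bl_count[lengths[i]]++;
  }
  bl_count[0] = 0;  /* Length 0 means the symbol is unused. */

  /* Step 2: find the smallest code value for each code length. */
  for (bits = 1; bits <= 15; bits++) {
    code = (code + bl_count[bits - 1]) << 1;
    next_code[bits] = code;
  }

  /* Step 3: hand out consecutive values within each code length. */
  for (i = 0; i < n; i++) {
    codes[i] = 0;
    if (lengths[i] != 0) codes[i] = next_code[lengths[i]]++;
  }
}
```

With the code lengths (3, 3, 3, 3, 3, 2, 4, 4), the symbol with length 2 receives code 00 and the two symbols with length 4 receive 1110 and 1111, which matches the worked example in the DEFLATE specification.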
diff --git a/src/zopfli/tree.h b/src/zopfli/tree.h
new file mode 100644
index 0000000..4d6f469
--- /dev/null
+++ b/src/zopfli/tree.h
@@ -0,0 +1,51 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+Utilities for creating and using Huffman trees.
+*/
+
+#ifndef ZOPFLI_TREE_H_
+#define ZOPFLI_TREE_H_
+
+#include <string.h>
+
+/*
+Calculates the bitlengths for the Huffman tree, based on the counts of each
+symbol.
+*/
+void ZopfliCalculateBitLengths(const size_t* count, size_t n, int maxbits,
+                               unsigned *bitlengths);
+
+/*
+Converts a series of Huffman tree bitlengths, to the bit values of the symbols.
+*/
+void ZopfliLengthsToSymbols(const unsigned* lengths, size_t n, unsigned maxbits,
+                            unsigned* symbols);
+
+/*
+Calculates the entropy of each symbol, based on the counts of each symbol. The
+result is similar to the result of ZopfliCalculateBitLengths, but with the
+actual theoretical bit lengths according to the entropy. Since the resulting
+values are fractional, they cannot be used to encode the tree specified by
+DEFLATE.
+*/
+void ZopfliCalculateEntropy(const size_t* count, size_t n, double* bitlengths);
+
+#endif  /* ZOPFLI_TREE_H_ */
diff --git a/src/zopfli/util.c b/src/zopfli/util.c
new file mode 100644
index 0000000..d207145
--- /dev/null
+++ b/src/zopfli/util.c
@@ -0,0 +1,213 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "util.h"
+
+#include "zopfli.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+int ZopfliGetDistExtraBits(int dist) {
+#ifdef __GNUC__
+  if (dist < 5) return 0;
+  return (31 ^ __builtin_clz(dist - 1)) - 1; /* log2(dist - 1) - 1 */
+#else
+  if (dist < 5) return 0;
+  else if (dist < 9) return 1;
+  else if (dist < 17) return 2;
+  else if (dist < 33) return 3;
+  else if (dist < 65) return 4;
+  else if (dist < 129) return 5;
+  else if (dist < 257) return 6;
+  else if (dist < 513) return 7;
+  else if (dist < 1025) return 8;
+  else if (dist < 2049) return 9;
+  else if (dist < 4097) return 10;
+  else if (dist < 8193) return 11;
+  else if (dist < 16385) return 12;
+  else return 13;
+#endif
+}
+
+int ZopfliGetDistExtraBitsValue(int dist) {
+#ifdef __GNUC__
+  if (dist < 5) {
+    return 0;
+  } else {
+    int l = 31 ^ __builtin_clz(dist - 1); /* log2(dist - 1) */
+    return (dist - (1 + (1 << l))) & ((1 << (l - 1)) - 1);
+  }
+#else
+  if (dist < 5) return 0;
+  else if (dist < 9) return (dist - 5) & 1;
+  else if (dist < 17) return (dist - 9) & 3;
+  else if (dist < 33) return (dist - 17) & 7;
+  else if (dist < 65) return (dist - 33) & 15;
+  else if (dist < 129) return (dist - 65) & 31;
+  else if (dist < 257) return (dist - 129) & 63;
+  else if (dist < 513) return (dist - 257) & 127;
+  else if (dist < 1025) return (dist - 513) & 255;
+  else if (dist < 2049) return (dist - 1025) & 511;
+  else if (dist < 4097) return (dist - 2049) & 1023;
+  else if (dist < 8193) return (dist - 4097) & 2047;
+  else if (dist < 16385) return (dist - 8193) & 4095;
+  else return (dist - 16385) & 8191;
+#endif
+}
+
+int ZopfliGetDistSymbol(int dist) {
+#ifdef __GNUC__
+  if (dist < 5) {
+    return dist - 1;
+  } else {
+    int l = (31 ^ __builtin_clz(dist - 1)); /* log2(dist - 1) */
+    int r = ((dist - 1) >> (l - 1)) & 1;
+    return l * 2 + r;
+  }
+#else
+  if (dist < 193) {
+    if (dist < 13) {  /* dist 0..13. */
+      if (dist < 5) return dist - 1;
+      else if (dist < 7) return 4;
+      else if (dist < 9) return 5;
+      else return 6;
+    } else {  /* dist 13..193. */
+      if (dist < 17) return 7;
+      else if (dist < 25) return 8;
+      else if (dist < 33) return 9;
+      else if (dist < 49) return 10;
+      else if (dist < 65) return 11;
+      else if (dist < 97) return 12;
+      else if (dist < 129) return 13;
+      else return 14;
+    }
+  } else {
+    if (dist < 2049) {  /* dist 193..2049. */
+      if (dist < 257) return 15;
+      else if (dist < 385) return 16;
+      else if (dist < 513) return 17;
+      else if (dist < 769) return 18;
+      else if (dist < 1025) return 19;
+      else if (dist < 1537) return 20;
+      else return 21;
+    } else {  /* dist 2049..32768. */
+      if (dist < 3073) return 22;
+      else if (dist < 4097) return 23;
+      else if (dist < 6145) return 24;
+      else if (dist < 8193) return 25;
+      else if (dist < 12289) return 26;
+      else if (dist < 16385) return 27;
+      else if (dist < 24577) return 28;
+      else return 29;
+    }
+  }
+#endif
+}
+
+int ZopfliGetLengthExtraBits(int l) {
+  static const int table[259] = {
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
+    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0
+  };
+  return table[l];
+}
+
+int ZopfliGetLengthExtraBitsValue(int l) {
+  static const int table[259] = {
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 0,
+    1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5,
+    6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6,
+    7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
+    13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2,
+    3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
+    10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
+    29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
+    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6,
+    7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
+    27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0
+  };
+  return table[l];
+}
+
+/*
+Returns symbol in range [257-285] (inclusive).
+*/
+int ZopfliGetLengthSymbol(int l) {
+  static const int table[259] = {
+    0, 0, 0, 257, 258, 259, 260, 261, 262, 263, 264,
+    265, 265, 266, 266, 267, 267, 268, 268,
+    269, 269, 269, 269, 270, 270, 270, 270,
+    271, 271, 271, 271, 272, 272, 272, 272,
+    273, 273, 273, 273, 273, 273, 273, 273,
+    274, 274, 274, 274, 274, 274, 274, 274,
+    275, 275, 275, 275, 275, 275, 275, 275,
+    276, 276, 276, 276, 276, 276, 276, 276,
+    277, 277, 277, 277, 277, 277, 277, 277,
+    277, 277, 277, 277, 277, 277, 277, 277,
+    278, 278, 278, 278, 278, 278, 278, 278,
+    278, 278, 278, 278, 278, 278, 278, 278,
+    279, 279, 279, 279, 279, 279, 279, 279,
+    279, 279, 279, 279, 279, 279, 279, 279,
+    280, 280, 280, 280, 280, 280, 280, 280,
+    280, 280, 280, 280, 280, 280, 280, 280,
+    281, 281, 281, 281, 281, 281, 281, 281,
+    281, 281, 281, 281, 281, 281, 281, 281,
+    281, 281, 281, 281, 281, 281, 281, 281,
+    281, 281, 281, 281, 281, 281, 281, 281,
+    282, 282, 282, 282, 282, 282, 282, 282,
+    282, 282, 282, 282, 282, 282, 282, 282,
+    282, 282, 282, 282, 282, 282, 282, 282,
+    282, 282, 282, 282, 282, 282, 282, 282,
+    283, 283, 283, 283, 283, 283, 283, 283,
+    283, 283, 283, 283, 283, 283, 283, 283,
+    283, 283, 283, 283, 283, 283, 283, 283,
+    283, 283, 283, 283, 283, 283, 283, 283,
+    284, 284, 284, 284, 284, 284, 284, 284,
+    284, 284, 284, 284, 284, 284, 284, 284,
+    284, 284, 284, 284, 284, 284, 284, 284,
+    284, 284, 284, 284, 284, 284, 284, 285
+  };
+  return table[l];
+}
+
+void ZopfliInitOptions(ZopfliOptions* options) {
+  options->verbose = 0;
+  options->verbose_more = 0;
+  options->numiterations = 15;
+  options->blocksplitting = 1;
+  options->blocksplittinglast = 0;
+  options->blocksplittingmax = 15;
+}
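Note on util.c: the __GNUC__ branches of the distance helpers rely on __builtin_clz, while the fallback branches use explicit range checks. The sketch below shows the same log2-based formulas with a portable loop, which may help when checking that both branches agree. The names floor_log2, dist_symbol and dist_extra_bits are hypothetical and not part of the patch.

```c
#include <assert.h>

/* Hypothetical portable helper: floor(log2(v)) for v >= 1. */
static int floor_log2(unsigned v) {
  int l = 0;
  while (v >>= 1) l++;
  return l;
}

/* Distance symbol (0..29) for a DEFLATE distance in [1, 32768]. */
static int dist_symbol(int dist) {
  int l, r;
  assert(dist >= 1 && dist <= 32768);
  if (dist < 5) return dist - 1;
  l = floor_log2((unsigned)(dist - 1));
  /* r picks the lower or upper half of the power-of-two range. */
  r = ((dist - 1) >> (l - 1)) & 1;
  return l * 2 + r;
}

/* Number of extra bits that follow the distance symbol. */
static int dist_extra_bits(int dist) {
  assert(dist >= 1 && dist <= 32768);
  if (dist < 5) return 0;
  return floor_log2((unsigned)(dist - 1)) - 1;
}
```

For example, dist 24576 maps to symbol 28 and dist 24577 to symbol 29, the same boundary that the non-GNU branch of ZopfliGetDistSymbol encodes as `dist < 24577`.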
diff --git a/src/zopfli/util.h b/src/zopfli/util.h
new file mode 100644
index 0000000..4188f51
--- /dev/null
+++ b/src/zopfli/util.h
@@ -0,0 +1,175 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+Several utilities, including: #defines to try different compression results,
+basic deflate specification values and generic program options.
+*/
+
+#ifndef ZOPFLI_UTIL_H_
+#define ZOPFLI_UTIL_H_
+
+#include <string.h>
+#include <stdlib.h>
+
+/* Minimum and maximum length that can be encoded in deflate. */
+#define ZOPFLI_MAX_MATCH 258
+#define ZOPFLI_MIN_MATCH 3
+
+/*
+The window size for deflate. Must be a power of two. This should be 32768, the
+maximum possible by the deflate spec. Anything less hurts compression more than
+speed.
+*/
+#define ZOPFLI_WINDOW_SIZE 32768
+
+/*
+The window mask used to wrap indices into the window. This is why the
+window size must be a power of two.
+*/
+#define ZOPFLI_WINDOW_MASK (ZOPFLI_WINDOW_SIZE - 1)
+
+/*
+A block structure of huge, non-smart, blocks to divide the input into, to allow
+operating on huge files without exceeding memory, such as the 1GB wiki9 corpus.
+The whole compression algorithm, including the smarter block splitting, will
+be executed independently on each huge block.
+Dividing into huge blocks hurts compression, but not much relative to the size.
+Set this to, for example, 20MB (20000000). Set it to 0 to disable master blocks.
+*/
+#define ZOPFLI_MASTER_BLOCK_SIZE 20000000
+
+/*
+Used to initialize costs for example
+*/
+#define ZOPFLI_LARGE_FLOAT 1e30
+
+/*
+For longest match cache. max 256. Uses huge amounts of memory but makes it
+faster. Uses this many times three bytes per single byte of the input data.
+This is so because longest match finding has to find the exact distance
+that belongs to each length for the best lz77 strategy.
+Good values: e.g. 5, 8.
+*/
+#define ZOPFLI_CACHE_LENGTH 8
+
+/*
+Limits the max hash chain hits for this hash value. This has an effect only
+on files where the hash value is the same very often. On these files, this
+gives worse compression (the value should ideally be 32768, which is the
+ZOPFLI_WINDOW_SIZE, while zlib uses 4096 even for best level), but makes it
+faster on some specific files.
+Good value: e.g. 8192.
+*/
+#define ZOPFLI_MAX_CHAIN_HITS 8192
+
+/*
+Whether to use the longest match cache for ZopfliFindLongestMatch. This cache
+consumes a lot of memory but speeds it up. No effect on compression size.
+*/
+#define ZOPFLI_LONGEST_MATCH_CACHE
+
+/*
+Enable to remember the number of successive identical bytes in the hash chain
+for finding the longest match.
+Required for ZOPFLI_HASH_SAME_HASH and ZOPFLI_SHORTCUT_LONG_REPETITIONS.
+This has no effect on the compression result, and enabling it increases speed.
+*/
+#define ZOPFLI_HASH_SAME
+
+/*
+Switch to a faster hash based on the info from ZOPFLI_HASH_SAME once the
+best length so far is long enough. This is way faster for files with lots of
+identical bytes, on which the compressor is otherwise too slow. Regular files
+are unaffected or maybe a tiny bit slower.
+This has no effect on the compression result, only on speed.
+*/
+#define ZOPFLI_HASH_SAME_HASH
+
+/*
+Enable this, to avoid slowness for files which are a repetition of the same
+character more than a multiple of ZOPFLI_MAX_MATCH times. This should not affect
+the compression result.
+*/
+#define ZOPFLI_SHORTCUT_LONG_REPETITIONS
+
+/*
+Whether to use lazy matching in the greedy LZ77 implementation. This gives a
+better result of ZopfliLZ77Greedy, but the effect this has on the optimal LZ77
+varies from file to file.
+*/
+#define ZOPFLI_LAZY_MATCHING
+
+/*
+Gets the symbol for the given length, cfr. the DEFLATE spec.
+Returns the symbol in the range [257-285] (inclusive)
+*/
+int ZopfliGetLengthSymbol(int l);
+
+/* Gets the amount of extra bits for the given length, cfr. the DEFLATE spec. */
+int ZopfliGetLengthExtraBits(int l);
+
+/* Gets value of the extra bits for the given length, cfr. the DEFLATE spec. */
+int ZopfliGetLengthExtraBitsValue(int l);
+
+/* Gets the symbol for the given dist, cfr. the DEFLATE spec. */
+int ZopfliGetDistSymbol(int dist);
+
+/* Gets the amount of extra bits for the given dist, cfr. the DEFLATE spec. */
+int ZopfliGetDistExtraBits(int dist);
+
+/* Gets value of the extra bits for the given dist, cfr. the DEFLATE spec. */
+int ZopfliGetDistExtraBitsValue(int dist);
+
+/*
+Appends value to dynamically allocated memory, doubling its allocation size
+whenever needed.
+
+value: the value to append, type T
+data: pointer to the dynamic array to append to, type T**
+size: pointer to the size of the array to append to, type size_t*. This is the
+size that you consider the array to be, not the internal allocation size.
+Precondition: allocated size of data is at least a power of two greater than or
+equal to *size.
+*/
+#ifdef __cplusplus /* C++ cannot assign void* from malloc to *data */
+#define ZOPFLI_APPEND_DATA(/* T */ value, /* T** */ data, /* size_t* */ size) {\
+  if (!((*size) & ((*size) - 1))) {\
+    /*double alloc size if it's a power of two*/\
+    void** data_void = reinterpret_cast<void**>(data);\
+    *data_void = (*size) == 0 ? malloc(sizeof(**data))\
+                              : realloc((*data), (*size) * 2 * sizeof(**data));\
+  }\
+  (*data)[(*size)] = (value);\
+  (*size)++;\
+}
+#else /* C gives problems with strict-aliasing rules for (void**) cast */
+#define ZOPFLI_APPEND_DATA(/* T */ value, /* T** */ data, /* size_t* */ size) {\
+  if (!((*size) & ((*size) - 1))) {\
+    /*double alloc size if it's a power of two*/\
+    (*data) = (*size) == 0 ? malloc(sizeof(**data))\
+                           : realloc((*data), (*size) * 2 * sizeof(**data));\
+  }\
+  (*data)[(*size)] = (value);\
+  (*size)++;\
+}
+#endif
+
+
+#endif  /* ZOPFLI_UTIL_H_ */
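Note on util.h: ZOPFLI_APPEND_DATA keeps no separate capacity field. It grows the allocation whenever *size is 0 or a power of two, which is why the allocation must always be the next power of two at or above *size. The sketch below shows the same growth rule as a typed function that also reports allocation failure. The name append_int and its return convention are assumptions for illustration. Unlike the macro, it expects *data to be NULL while *size is 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical typed variant of the append macro. Returns 1 on success and
   0 if allocation failed, in which case the array is left untouched.
   Assumes *data is NULL when *size is 0. */
static int append_int(int value, int** data, size_t* size) {
  if (!(*size & (*size - 1))) {
    /* *size is 0 or a power of two, so the current allocation is full. */
    size_t newcap = (*size == 0) ? 1 : *size * 2;
    int* grown = (int*)realloc(*data, newcap * sizeof(**data));
    if (!grown) return 0;
    *data = grown;
  }
  (*data)[(*size)++] = value;
  return 1;
}
```

Appending 100 values this way triggers a reallocation only at sizes 0, 1, 2, 4, 8, 16, 32 and 64, so the amortized cost per append stays constant.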
diff --git a/src/zopfli/zlib_container.c b/src/zopfli/zlib_container.c
new file mode 100644
index 0000000..5b7d0aa
--- /dev/null
+++ b/src/zopfli/zlib_container.c
@@ -0,0 +1,79 @@
+/*
+Copyright 2013 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "zlib_container.h"
+#include "util.h"
+
+#include <stdio.h>
+
+#include "deflate.h"
+
+
+/* Calculates the adler32 checksum of the data */
+static unsigned adler32(const unsigned char* data, size_t size)
+{
+  static const unsigned sums_overflow = 5550;
+  unsigned s1 = 1;
+  unsigned s2 = 1 >> 16;  /* High 16 bits of the initial Adler-32 value 1: 0. */
+
+  while (size > 0) {
+    size_t amount = size > sums_overflow ? sums_overflow : size;
+    size -= amount;
+    while (amount > 0) {
+      s1 += (*data++);
+      s2 += s1;
+      amount--;
+    }
+    s1 %= 65521;
+    s2 %= 65521;
+  }
+
+  return (s2 << 16) | s1;
+}
+
+void ZopfliZlibCompress(const ZopfliOptions* options,
+                        const unsigned char* in, size_t insize,
+                        unsigned char** out, size_t* outsize) {
+  unsigned char bitpointer = 0;
+  unsigned checksum = adler32(in, insize);
+  unsigned cmf = 120;  /* CM 8, CINFO 7. See zlib spec.*/
+  unsigned flevel = 0;
+  unsigned fdict = 0;
+  unsigned cmfflg = 256 * cmf + fdict * 32 + flevel * 64;
+  unsigned fcheck = 31 - cmfflg % 31;
+  cmfflg += fcheck;
+
+  ZOPFLI_APPEND_DATA(cmfflg / 256, out, outsize);
+  ZOPFLI_APPEND_DATA(cmfflg % 256, out, outsize);
+
+  ZopfliDeflate(options, 2 /* dynamic block */, 1 /* final */,
+                in, insize, &bitpointer, out, outsize);
+
+  ZOPFLI_APPEND_DATA((checksum >> 24) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((checksum >> 16) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA((checksum >> 8) % 256, out, outsize);
+  ZOPFLI_APPEND_DATA(checksum % 256, out, outsize);
+
+  if (options->verbose) {
+    fprintf(stderr,
+            "Original Size: %d, Zlib: %d, Compression: %f%% Removed\n",
+            (int)insize, (int)*outsize,
+            100.0 * (double)(insize - *outsize) / (double)insize);
+  }
+}
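Note on zlib_container.c: the container consists of a two-byte header whose 16-bit value must be a multiple of 31, the raw deflate stream, and a big-endian Adler-32 checksum. The sketch below is a simplified reference for the two computed parts. It reduces modulo 65521 on every byte instead of batching 5550 bytes as the patch does, and it wraps the check value with an extra `% 31` so that it always fits in five bits. The names adler32_sketch and zlib_header are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference Adler-32: slower than the batched version in the
   patch, but easy to verify by hand. */
static unsigned long adler32_sketch(const unsigned char* data, size_t size) {
  unsigned long s1 = 1;
  unsigned long s2 = 0;
  size_t i;
  for (i = 0; i < size; i++) {
    s1 = (s1 + data[i]) % 65521UL;
    s2 = (s2 + s1) % 65521UL;
  }
  return (s2 << 16) | s1;
}

/* Builds the 16-bit CMF/FLG header value; FCHECK makes it divisible by 31. */
static unsigned zlib_header(unsigned cmf, unsigned flevel, unsigned fdict) {
  unsigned cmfflg = 256 * cmf + flevel * 64 + fdict * 32;
  cmfflg += (31 - cmfflg % 31) % 31;
  return cmfflg;
}
```

With cmf = 120, flevel = 0 and fdict = 0, as used in ZopfliZlibCompress, the header value is 0x7801, so the output starts with the bytes 0x78 0x01.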
diff --git a/src/zopfli/zlib_container.h b/src/zopfli/zlib_container.h
new file mode 100644
index 0000000..9ddfb9c
--- /dev/null
+++ b/src/zopfli/zlib_container.h
@@ -0,0 +1,50 @@
+/*
+Copyright 2013 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#ifndef ZOPFLI_ZLIB_H_
+#define ZOPFLI_ZLIB_H_
+
+/*
+Functions to compress according to the Zlib specification.
+*/
+
+#include "zopfli.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+Compresses according to the zlib specification and appends the compressed
+result to the output.
+
+options: global program options
+out: pointer to the dynamic output array to which the result is appended. Must
+  be freed after use.
+outsize: pointer to the dynamic output array size.
+*/
+void ZopfliZlibCompress(const ZopfliOptions* options,
+                        const unsigned char* in, size_t insize,
+                        unsigned char** out, size_t* outsize);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  /* ZOPFLI_ZLIB_H_ */
diff --git a/src/zopfli/zopfli.h b/src/zopfli/zopfli.h
new file mode 100644
index 0000000..56512a2
--- /dev/null
+++ b/src/zopfli/zopfli.h
@@ -0,0 +1,97 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#ifndef ZOPFLI_ZOPFLI_H_
+#define ZOPFLI_ZOPFLI_H_
+
+#include <stddef.h>
+#include <stdlib.h> /* for size_t */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+Options used throughout the program.
+*/
+typedef struct ZopfliOptions {
+  /* Whether to print output */
+  int verbose;
+
+  /* Whether to print more detailed output */
+  int verbose_more;
+
+  /*
+  Maximum amount of times to rerun forward and backward pass to optimize LZ77
+  compression cost. Good values: 10, 15 for small files, 5 for files over
+  several MB in size or it will be too slow.
+  */
+  int numiterations;
+
+  /*
+  If true, splits the data into multiple deflate blocks with optimal choice
+  for the block boundaries. Block splitting gives better compression. Default:
+  true (1).
+  */
+  int blocksplitting;
+
+  /*
+  If true, chooses the optimal block split points only after doing the iterative
+  LZ77 compression. If false, chooses the block split points first, then does
+  iterative LZ77 on each individual block. Depending on the file, either first
+  or last gives the best compression. Default: false (0).
+  */
+  int blocksplittinglast;
+
+  /*
+  Maximum number of blocks to split into (0 for unlimited, but this can give
+  extreme results that hurt compression on some files). Default value: 15.
+  */
+  int blocksplittingmax;
+} ZopfliOptions;
+
+/* Initializes options with default values. */
+void ZopfliInitOptions(ZopfliOptions* options);
+
+/* Output format */
+typedef enum {
+  ZOPFLI_FORMAT_GZIP,
+  ZOPFLI_FORMAT_ZLIB,
+  ZOPFLI_FORMAT_DEFLATE
+} ZopfliFormat;
+
+/*
+Compresses according to the given output format and appends the result to the
+output.
+
+options: global program options
+output_type: the output format to use
+out: pointer to the dynamic output array to which the result is appended. Must
+  be freed after use
+outsize: pointer to the dynamic output array size
+*/
+void ZopfliCompress(const ZopfliOptions* options, ZopfliFormat output_type,
+                    const unsigned char* in, size_t insize,
+                    unsigned char** out, size_t* outsize);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  /* ZOPFLI_ZOPFLI_H_ */
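Reviewer note on zopfli.h: the header documents out and outsize as a dynamic output array to which the result is appended and which the caller must free. A minimal sketch of that calling convention follows. The helper append_bytes is hypothetical and is not how Zopfli grows its buffers internally; it only illustrates the contract that the caller starts with an empty array, the callee appends, and the caller frees.

```c
#include <stdlib.h>
#include <string.h>

/*
Hypothetical helper, not part of Zopfli: appends n bytes from src to the
dynamic array (*out, *outsize), growing it with realloc.
Returns 1 on success, 0 on allocation failure (the array is left unchanged).
*/
int append_bytes(unsigned char** out, size_t* outsize,
                 const unsigned char* src, size_t n) {
  unsigned char* grown;
  if (n == 0) return 1;  /* nothing to append */
  grown = (unsigned char*)realloc(*out, *outsize + n);
  if (!grown) return 0;
  memcpy(grown + *outsize, src, n);
  *out = grown;
  *outsize += n;
  return 1;
}
```

A caller would initialize out to 0 and outsize to 0 before the first call, exactly as CompressFile does before calling ZopfliCompress, and call free(out) when done.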
diff --git a/src/zopfli/zopfli_bin.c b/src/zopfli/zopfli_bin.c
new file mode 100644
index 0000000..8a147ef
--- /dev/null
+++ b/src/zopfli/zopfli_bin.c
@@ -0,0 +1,203 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+/*
+Zopfli compressor program. It can output gzip-, zlib- or deflate-compatible
+data. By default it creates a .gz file. This tool can only compress, not
+decompress. Decompression can be done by any standard gzip, zlib or deflate
+decompressor.
+*/
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "deflate.h"
+#include "gzip_container.h"
+#include "zlib_container.h"
+
+/*
+Loads a file into a memory array.
+*/
+static void LoadFile(const char* filename,
+                     unsigned char** out, size_t* outsize) {
+  FILE* file;
+
+  *out = 0;
+  *outsize = 0;
+  file = fopen(filename, "rb");
+  if (!file) return;
+
+  fseek(file , 0 , SEEK_END);
+  *outsize = ftell(file);
+  rewind(file);
+
+  *out = (unsigned char*)malloc(*outsize);
+
+  if (*outsize && (*out)) {
+    size_t testsize = fread(*out, 1, *outsize, file);
+    if (testsize != *outsize) {
+      /* It could be a directory */
+      free(*out);
+      *out = 0;
+      *outsize = 0;
+    }
+  }
+
+  assert(!(*outsize) || *out);  /* If size is not zero, out must be allocated. */
+  fclose(file);
+}
+
+/*
+Saves a file from a memory array, overwriting the file if it existed.
+*/
+static void SaveFile(const char* filename,
+                     const unsigned char* in, size_t insize) {
+  FILE* file = fopen(filename, "wb" );
+  assert(file);
+  fwrite((char*)in, 1, insize, file);
+  fclose(file);
+}
+
+/*
+outfilename: filename to write output to, or 0 to write to stdout instead
+*/
+static void CompressFile(const ZopfliOptions* options,
+                         ZopfliFormat output_type,
+                         const char* infilename,
+                         const char* outfilename) {
+  unsigned char* in;
+  size_t insize;
+  unsigned char* out = 0;
+  size_t outsize = 0;
+  LoadFile(infilename, &in, &insize);
+  if (insize == 0) {
+    fprintf(stderr, "Invalid filename: %s\n", infilename);
+    return;
+  }
+
+  ZopfliCompress(options, output_type, in, insize, &out, &outsize);
+
+  if (outfilename) {
+    SaveFile(outfilename, out, outsize);
+  } else {
+    size_t i;
+    for (i = 0; i < outsize; i++) {
+      /* Works only if terminal does not convert newlines. */
+      printf("%c", out[i]);
+    }
+  }
+
+  free(out);
+  free(in);
+}
+
+/*
+Concatenates two strings of any length. The result must be freed by the caller.
+*/
+static char* AddStrings(const char* str1, const char* str2) {
+  size_t len = strlen(str1) + strlen(str2);
+  char* result = (char*)malloc(len + 1);
+  if (!result) exit(-1); /* Allocation failed. */
+  strcpy(result, str1);
+  strcat(result, str2);
+  return result;
+}
+
+static char StringsEqual(const char* str1, const char* str2) {
+  return strcmp(str1, str2) == 0;
+}
+
+int main(int argc, char* argv[]) {
+  ZopfliOptions options;
+  ZopfliFormat output_type = ZOPFLI_FORMAT_GZIP;
+  const char* filename = 0;
+  int output_to_stdout = 0;
+  int i;
+
+  ZopfliInitOptions(&options);
+
+  for (i = 1; i < argc; i++) {
+    const char* arg = argv[i];
+    if (StringsEqual(arg, "-v")) options.verbose = 1;
+    else if (StringsEqual(arg, "-c")) output_to_stdout = 1;
+    else if (StringsEqual(arg, "--deflate")) {
+      output_type = ZOPFLI_FORMAT_DEFLATE;
+    }
+    else if (StringsEqual(arg, "--zlib")) output_type = ZOPFLI_FORMAT_ZLIB;
+    else if (StringsEqual(arg, "--gzip")) output_type = ZOPFLI_FORMAT_GZIP;
+    else if (StringsEqual(arg, "--splitlast")) options.blocksplittinglast = 1;
+    else if (arg[0] == '-' && arg[1] == '-' && arg[2] == 'i'
+        && arg[3] >= '0' && arg[3] <= '9') {
+      options.numiterations = atoi(arg + 3);
+    }
+    else if (StringsEqual(arg, "-h")) {
+      fprintf(stderr,
+          "Usage: zopfli [OPTION]... FILE\n"
+          "  -h    gives this help\n"
+          "  -c    write the result on standard output, instead of disk"
+          " filename + '.gz'\n"
+          "  -v    verbose mode\n"
+          "  --i#  perform # iterations (default 15). More gives"
+          " more compression but is slower."
+          " Examples: --i10, --i50, --i1000\n");
+      fprintf(stderr,
+          "  --gzip        output to gzip format (default)\n"
+          "  --zlib        output to zlib format instead of gzip\n"
+          "  --deflate     output to deflate format instead of gzip\n"
+          "  --splitlast   do block splitting last instead of first\n");
+      return 0;
+    }
+  }
+
+  if (options.numiterations < 1) {
+    fprintf(stderr, "Error: must have 1 or more iterations\n");
+    return 0;
+  }
+
+  for (i = 1; i < argc; i++) {
+    if (argv[i][0] != '-') {
+      char* outfilename;
+      filename = argv[i];
+      if (output_to_stdout) {
+        outfilename = 0;
+      } else if (output_type == ZOPFLI_FORMAT_GZIP) {
+        outfilename = AddStrings(filename, ".gz");
+      } else if (output_type == ZOPFLI_FORMAT_ZLIB) {
+        outfilename = AddStrings(filename, ".zlib");
+      } else {
+        assert(output_type == ZOPFLI_FORMAT_DEFLATE);
+        outfilename = AddStrings(filename, ".deflate");
+      }
+      if (options.verbose && outfilename) {
+        fprintf(stderr, "Saving to: %s\n", outfilename);
+      }
+      CompressFile(&options, output_type, filename, outfilename);
+      free(outfilename);
+    }
+  }
+
+  if (!filename) {
+    fprintf(stderr,
+            "Please provide filename\nFor help, type: %s -h\n", argv[0]);
+  }
+
+  return 0;
+}
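Reviewer note on zopfli_bin.c: LoadFile assigns the result of ftell directly to a size_t and does not check fseek, so a failing ftell (which returns -1L) would turn into a very large allocation request. The sketch below is one possible way to make those checks explicit. The name load_file_checked and its return convention are assumptions for illustration, not code from this patch.

```c
#include <stdio.h>
#include <stdlib.h>

/*
Hypothetical variant of LoadFile, not the code in this patch.
Returns 1 on success and 0 on any failure. On failure *out is NULL and
*outsize is 0. An empty file counts as success with *outsize == 0.
*/
int load_file_checked(const char* filename,
                      unsigned char** out, size_t* outsize) {
  FILE* file;
  long size;
  unsigned char* data = NULL;

  *out = NULL;
  *outsize = 0;

  file = fopen(filename, "rb");
  if (!file) return 0;

  /* Both fseek and ftell can fail; ftell reports failure as -1L. */
  if (fseek(file, 0, SEEK_END) != 0 || (size = ftell(file)) < 0 ||
      fseek(file, 0, SEEK_SET) != 0) {
    fclose(file);
    return 0;
  }

  if (size > 0) {
    data = (unsigned char*)malloc((size_t)size);
    /* A short read is treated as an error, like in the original LoadFile. */
    if (!data || fread(data, 1, (size_t)size, file) != (size_t)size) {
      free(data);
      fclose(file);
      return 0;
    }
  }

  fclose(file);
  *out = data;
  *outsize = (size_t)size;
  return 1;
}
```

Returning a status separately from the size would also let CompressFile tell an unreadable file apart from an empty one, which the current insize == 0 test cannot do.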
diff --git a/src/zopfli/zopfli_lib.c b/src/zopfli/zopfli_lib.c
new file mode 100644
index 0000000..5f5b214
--- /dev/null
+++ b/src/zopfli/zopfli_lib.c
@@ -0,0 +1,42 @@
+/*
+Copyright 2011 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+*/
+
+#include "zopfli.h"
+
+#include "deflate.h"
+#include "gzip_container.h"
+#include "zlib_container.h"
+
+#include <assert.h>
+
+void ZopfliCompress(const ZopfliOptions* options, ZopfliFormat output_type,
+                    const unsigned char* in, size_t insize,
+                    unsigned char** out, size_t* outsize) {
+  if (output_type == ZOPFLI_FORMAT_GZIP) {
+    ZopfliGzipCompress(options, in, insize, out, outsize);
+  } else if (output_type == ZOPFLI_FORMAT_ZLIB) {
+    ZopfliZlibCompress(options, in, insize, out, outsize);
+  } else if (output_type == ZOPFLI_FORMAT_DEFLATE) {
+    unsigned char bp = 0;
+    ZopfliDeflate(options, 2 /* Dynamic block */, 1,
+                  in, insize, &bp, out, outsize);
+  } else {
+    assert(0);
+  }
+}
diff --git a/src/zopflipng/lodepng/lodepng.cpp b/src/zopflipng/lodepng/lodepng.cpp
new file mode 100644
index 0000000..e4f35d7
--- /dev/null
+++ b/src/zopflipng/lodepng/lodepng.cpp
@@ -0,0 +1,6260 @@
+/*
+LodePNG version 20131222
+
+Copyright (c) 2005-2013 Lode Vandevenne
+
+This software is provided 'as-is', without any express or implied
+warranty. In no event will the authors be held liable for any damages
+arising from the use of this software.
+
+Permission is granted to anyone to use this software for any purpose,
+including commercial applications, and to alter it and redistribute it
+freely, subject to the following restrictions:
+
+    1. The origin of this software must not be misrepresented; you must not
+    claim that you wrote the original software. If you use this software
+    in a product, an acknowledgment in the product documentation would be
+    appreciated but is not required.
+
+    2. Altered source versions must be plainly marked as such, and must not be
+    misrepresented as being the original software.
+
+    3. This notice may not be removed or altered from any source
+    distribution.
+*/
+
+/*
+The manual and changelog are in the header file "lodepng.h"
+Rename this file to lodepng.cpp to use it for C++, or to lodepng.c to use it for C.
+*/
+
+#include "lodepng.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef LODEPNG_COMPILE_CPP
+#include <fstream>
+#endif /*LODEPNG_COMPILE_CPP*/
+
+#define VERSION_STRING "20131222"
+
+/*
+This source file is built up in the following large parts. The code sections
+with the "LODEPNG_COMPILE_" #defines divide this up further in an intermixed way.
+-Tools for C and common code for PNG and Zlib
+-C Code for Zlib (huffman, deflate, ...)
+-C Code for PNG (file format chunks, adam7, PNG filters, color conversions, ...)
+-The C++ wrapper around all of the above
+*/
+
+/*The malloc, realloc and free functions are defined here with "lodepng_" in
+front of the name, so that you can easily change them to others related to your
+platform if needed. Everything else in the code calls these. Pass
+-DLODEPNG_NO_COMPILE_ALLOCATORS to the compiler, or comment out
+#define LODEPNG_COMPILE_ALLOCATORS in the header, to disable the ones here and
+define them in your own project's source files without needing to change
+lodepng source code. Don't forget to remove "static" if you copypaste them
+from here.*/
+
+#ifdef LODEPNG_COMPILE_ALLOCATORS
+static void* lodepng_malloc(size_t size)
+{
+  return malloc(size);
+}
+
+static void* lodepng_realloc(void* ptr, size_t new_size)
+{
+  return realloc(ptr, new_size);
+}
+
+static void lodepng_free(void* ptr)
+{
+  free(ptr);
+}
+#else /*LODEPNG_COMPILE_ALLOCATORS*/
+void* lodepng_malloc(size_t size);
+void* lodepng_realloc(void* ptr, size_t new_size);
+void lodepng_free(void* ptr);
+#endif /*LODEPNG_COMPILE_ALLOCATORS*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* // Tools for C, and common code for PNG and Zlib.                       // */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/*
+Often in case of an error a value is assigned to a variable and then it breaks
+out of a loop (to go to the cleanup phase of a function). This macro does that.
+It makes the error handling code shorter and more readable.
+
+Example: if(!uivector_resizev(&frequencies_ll, 286, 0)) ERROR_BREAK(83);
+*/
+#define CERROR_BREAK(errorvar, code)\
+{\
+  errorvar = code;\
+  break;\
+}
+
+/*version of CERROR_BREAK that assumes the common case where the error variable is named "error"*/
+#define ERROR_BREAK(code) CERROR_BREAK(error, code)
+
+/*Set error var to the error code, and return it.*/
+#define CERROR_RETURN_ERROR(errorvar, code)\
+{\
+  errorvar = code;\
+  return code;\
+}
+
+/*Try the code, if it returns error, also return the error.*/
+#define CERROR_TRY_RETURN(call)\
+{\
+  unsigned error = call;\
+  if(error) return error;\
+}
+
+/*
+About uivector, ucvector and string:
+-All of them wrap dynamic arrays or text strings in a similar way.
+-LodePNG was originally written in C++. The vectors replace the std::vectors that were used in the C++ version.
+-The string tools are made to avoid problems with compilers that declare things like strncat as deprecated.
+-They're not used in the interface, only internally in this file as static functions.
+-As with many other structs in this file, the init and cleanup functions serve as ctor and dtor.
+*/
+
+#ifdef LODEPNG_COMPILE_ZLIB
+/*dynamic vector of unsigned ints*/
+typedef struct uivector
+{
+  unsigned* data;
+  size_t size; /*size in number of unsigned ints*/
+  size_t allocsize; /*allocated size in bytes*/
+} uivector;
+
+static void uivector_cleanup(void* p)
+{
+  ((uivector*)p)->size = ((uivector*)p)->allocsize = 0;
+  lodepng_free(((uivector*)p)->data);
+  ((uivector*)p)->data = NULL;
+}
+
+/*returns 1 if success, 0 if failure ==> nothing done*/
+static unsigned uivector_resize(uivector* p, size_t size)
+{
+  if(size * sizeof(unsigned) > p->allocsize)
+  {
+    size_t newsize = size * sizeof(unsigned) * 2;
+    void* data = lodepng_realloc(p->data, newsize);
+    if(data)
+    {
+      p->allocsize = newsize;
+      p->data = (unsigned*)data;
+      p->size = size;
+    }
+    else return 0;
+  }
+  else p->size = size;
+  return 1;
+}
+
+/*resize and give all new elements the value*/
+static unsigned uivector_resizev(uivector* p, size_t size, unsigned value)
+{
+  size_t oldsize = p->size, i;
+  if(!uivector_resize(p, size)) return 0;
+  for(i = oldsize; i < size; i++) p->data[i] = value;
+  return 1;
+}
+
+static void uivector_init(uivector* p)
+{
+  p->data = NULL;
+  p->size = p->allocsize = 0;
+}
+
+#ifdef LODEPNG_COMPILE_ENCODER
+/*returns 1 if success, 0 if failure ==> nothing done*/
+static unsigned uivector_push_back(uivector* p, unsigned c)
+{
+  if(!uivector_resize(p, p->size + 1)) return 0;
+  p->data[p->size - 1] = c;
+  return 1;
+}
+
+/*copy q to p, returns 1 if success, 0 if failure ==> nothing done*/
+static unsigned uivector_copy(uivector* p, const uivector* q)
+{
+  size_t i;
+  if(!uivector_resize(p, q->size)) return 0;
+  for(i = 0; i < q->size; i++) p->data[i] = q->data[i];
+  return 1;
+}
+#endif /*LODEPNG_COMPILE_ENCODER*/
+#endif /*LODEPNG_COMPILE_ZLIB*/
+
+/* /////////////////////////////////////////////////////////////////////////// */
+
+/*dynamic vector of unsigned chars*/
+typedef struct ucvector
+{
+  unsigned char* data;
+  size_t size; /*used size*/
+  size_t allocsize; /*allocated size*/
+} ucvector;
+
+/*returns 1 if success, 0 if failure ==> nothing done*/
+static unsigned ucvector_resize(ucvector* p, size_t size)
+{
+  if(size * sizeof(unsigned char) > p->allocsize)
+  {
+    size_t newsize = size * sizeof(unsigned char) * 2;
+    void* data = lodepng_realloc(p->data, newsize);
+    if(data)
+    {
+      p->allocsize = newsize;
+      p->data = (unsigned char*)data;
+      p->size = size;
+    }
+    else return 0; /*error: not enough memory*/
+  }
+  else p->size = size;
+  return 1;
+}
+
+#ifdef LODEPNG_COMPILE_PNG
+
+static void ucvector_cleanup(void* p)
+{
+  ((ucvector*)p)->size = ((ucvector*)p)->allocsize = 0;
+  lodepng_free(((ucvector*)p)->data);
+  ((ucvector*)p)->data = NULL;
+}
+
+static void ucvector_init(ucvector* p)
+{
+  p->data = NULL;
+  p->size = p->allocsize = 0;
+}
+
+#ifdef LODEPNG_COMPILE_DECODER
+/*resize and give all new elements the value*/
+static unsigned ucvector_resizev(ucvector* p, size_t size, unsigned char value)
+{
+  size_t oldsize = p->size, i;
+  if(!ucvector_resize(p, size)) return 0;
+  for(i = oldsize; i < size; i++) p->data[i] = value;
+  return 1;
+}
+#endif /*LODEPNG_COMPILE_DECODER*/
+#endif /*LODEPNG_COMPILE_PNG*/
+
+#ifdef LODEPNG_COMPILE_ZLIB
+/*you can convert from vector to buffer&size and vice versa. If you use
+init_buffer to take over a buffer and size, you do not need to call cleanup*/
+static void ucvector_init_buffer(ucvector* p, unsigned char* buffer, size_t size)
+{
+  p->data = buffer;
+  p->allocsize = p->size = size;
+}
+#endif /*LODEPNG_COMPILE_ZLIB*/
+
+#if (defined(LODEPNG_COMPILE_PNG) && defined(LODEPNG_COMPILE_ANCILLARY_CHUNKS)) || defined(LODEPNG_COMPILE_ENCODER)
+/*returns 1 if success, 0 if failure ==> nothing done*/
+static unsigned ucvector_push_back(ucvector* p, unsigned char c)
+{
+  if(!ucvector_resize(p, p->size + 1)) return 0;
+  p->data[p->size - 1] = c;
+  return 1;
+}
+#endif /*(defined(LODEPNG_COMPILE_PNG) && defined(LODEPNG_COMPILE_ANCILLARY_CHUNKS)) || defined(LODEPNG_COMPILE_ENCODER)*/
+
+
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#ifdef LODEPNG_COMPILE_PNG
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+/*returns 1 if success, 0 if failure ==> nothing done*/
+static unsigned string_resize(char** out, size_t size)
+{
+  char* data = (char*)lodepng_realloc(*out, size + 1);
+  if(data)
+  {
+    data[size] = 0; /*null termination char*/
+    *out = data;
+  }
+  return data != 0;
+}
+
+/*init a char* for use as an empty, null-terminated string*/
+static void string_init(char** out)
+{
+  *out = NULL;
+  string_resize(out, 0);
+}
+
+/*free the string again*/
+static void string_cleanup(char** out)
+{
+  lodepng_free(*out);
+  *out = NULL;
+}
+
+static void string_set(char** out, const char* in)
+{
+  size_t insize = strlen(in), i = 0;
+  if(string_resize(out, insize))
+  {
+    for(i = 0; i < insize; i++)
+    {
+      (*out)[i] = in[i];
+    }
+  }
+}
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+#endif /*LODEPNG_COMPILE_PNG*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+
+unsigned lodepng_read32bitInt(const unsigned char* buffer)
+{
+  return ((unsigned)buffer[0] << 24) | ((unsigned)buffer[1] << 16) | ((unsigned)buffer[2] << 8) | (unsigned)buffer[3];
+}
+
+#if defined(LODEPNG_COMPILE_PNG) || defined(LODEPNG_COMPILE_ENCODER)
+/*buffer must have at least 4 allocated bytes available*/
+static void lodepng_set32bitInt(unsigned char* buffer, unsigned value)
+{
+  buffer[0] = (unsigned char)((value >> 24) & 0xff);
+  buffer[1] = (unsigned char)((value >> 16) & 0xff);
+  buffer[2] = (unsigned char)((value >>  8) & 0xff);
+  buffer[3] = (unsigned char)((value      ) & 0xff);
+}
+#endif /*defined(LODEPNG_COMPILE_PNG) || defined(LODEPNG_COMPILE_ENCODER)*/
+
+#ifdef LODEPNG_COMPILE_ENCODER
+static void lodepng_add32bitInt(ucvector* buffer, unsigned value)
+{
+  ucvector_resize(buffer, buffer->size + 4); /*todo: give error if resize failed*/
+  lodepng_set32bitInt(&buffer->data[buffer->size - 4], value);
+}
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / File IO                                                                / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#ifdef LODEPNG_COMPILE_DISK
+
+unsigned lodepng_load_file(unsigned char** out, size_t* outsize, const char* filename)
+{
+  FILE* file;
+  long size;
+
+  /*provide some proper output values in case an error happens*/
+  *out = 0;
+  *outsize = 0;
+
+  file = fopen(filename, "rb");
+  if(!file) return 78;
+
+  /*get filesize:*/
+  fseek(file , 0 , SEEK_END);
+  size = ftell(file);
+  rewind(file);
+
+  /*read contents of the file into the vector*/
+  *outsize = 0;
+  *out = (unsigned char*)lodepng_malloc((size_t)size);
+  if(size && (*out)) (*outsize) = fread(*out, 1, (size_t)size, file);
+
+  fclose(file);
+  if(!(*out) && size) return 83; /*the above malloc failed*/
+  return 0;
+}
+
+/*write given buffer to the file, overwriting the file, it doesn't append to it.*/
+unsigned lodepng_save_file(const unsigned char* buffer, size_t buffersize, const char* filename)
+{
+  FILE* file;
+  file = fopen(filename, "wb" );
+  if(!file) return 79;
+  fwrite((char*)buffer , 1 , buffersize, file);
+  fclose(file);
+  return 0;
+}
+
+#endif /*LODEPNG_COMPILE_DISK*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* // End of common code and tools. Start of Zlib-related code.            // */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#ifdef LODEPNG_COMPILE_ZLIB
+#ifdef LODEPNG_COMPILE_ENCODER
+/*TODO: this ignores potential out of memory errors*/
+#define addBitToStream(/*size_t**/ bitpointer, /*ucvector**/ bitstream, /*unsigned char*/ bit)\
+{\
+  /*add a new byte at the end*/\
+  if(((*bitpointer) & 7) == 0) ucvector_push_back(bitstream, (unsigned char)0);\
+  /*earlier bit of huffman code is in a less significant bit of an earlier byte*/\
+  (bitstream->data[bitstream->size - 1]) |= (bit << ((*bitpointer) & 0x7));\
+  (*bitpointer)++;\
+}
+
+static void addBitsToStream(size_t* bitpointer, ucvector* bitstream, unsigned value, size_t nbits)
+{
+  size_t i;
+  for(i = 0; i < nbits; i++) addBitToStream(bitpointer, bitstream, (unsigned char)((value >> i) & 1));
+}
+
+static void addBitsToStreamReversed(size_t* bitpointer, ucvector* bitstream, unsigned value, size_t nbits)
+{
+  size_t i;
+  for(i = 0; i < nbits; i++) addBitToStream(bitpointer, bitstream, (unsigned char)((value >> (nbits - 1 - i)) & 1));
+}
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+#ifdef LODEPNG_COMPILE_DECODER
+
+#define READBIT(bitpointer, bitstream) ((bitstream[bitpointer >> 3] >> (bitpointer & 0x7)) & (unsigned char)1)
+
+static unsigned char readBitFromStream(size_t* bitpointer, const unsigned char* bitstream)
+{
+  unsigned char result = (unsigned char)(READBIT(*bitpointer, bitstream));
+  (*bitpointer)++;
+  return result;
+}
+
+static unsigned readBitsFromStream(size_t* bitpointer, const unsigned char* bitstream, size_t nbits)
+{
+  unsigned result = 0, i;
+  for(i = 0; i < nbits; i++)
+  {
+    result += ((unsigned)READBIT(*bitpointer, bitstream)) << i;
+    (*bitpointer)++;
+  }
+  return result;
+}
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / Deflate - Huffman                                                      / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#define FIRST_LENGTH_CODE_INDEX 257
+#define LAST_LENGTH_CODE_INDEX 285
+/*256 literals, the end code, some length codes, and 2 unused codes*/
+#define NUM_DEFLATE_CODE_SYMBOLS 288
+/*the distance codes have their own symbols, 30 used, 2 unused*/
+#define NUM_DISTANCE_SYMBOLS 32
+/*the code length codes. 0-15: code lengths, 16: copy previous 3-6 times, 17: 3-10 zeros, 18: 11-138 zeros*/
+#define NUM_CODE_LENGTH_CODES 19
+
+/*the base lengths represented by codes 257-285*/
+static const unsigned LENGTHBASE[29]
+  = {3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 43, 51, 59,
+     67, 83, 99, 115, 131, 163, 195, 227, 258};
+
+/*the extra bits used by codes 257-285 (added to base length)*/
+static const unsigned LENGTHEXTRA[29]
+  = {0, 0, 0, 0, 0, 0, 0,  0,  1,  1,  1,  1,  2,  2,  2,  2,  3,  3,  3,  3,
+      4,  4,  4,   4,   5,   5,   5,   5,   0};
+
+/*the base backwards distances (the bits of distance codes appear after length codes and use their own huffman tree)*/
+static const unsigned DISTANCEBASE[30]
+  = {1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385, 513,
+     769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577};
+
+/*the extra bits of backwards distances (added to base)*/
+static const unsigned DISTANCEEXTRA[30]
+  = {0, 0, 0, 0, 1, 1, 2,  2,  3,  3,  4,  4,  5,  5,   6,   6,   7,   7,   8,
+       8,    9,    9,   10,   10,   11,   11,   12,    12,    13,    13};
+
+/*the order in which "code length alphabet code lengths" are stored, out of this
+the huffman tree of the dynamic huffman tree lengths is generated*/
+static const unsigned CLCL_ORDER[NUM_CODE_LENGTH_CODES]
+  = {16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};
+
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/*
+Huffman tree struct, containing multiple representations of the tree
+*/
+typedef struct HuffmanTree
+{
+  unsigned* tree2d;
+  unsigned* tree1d;
+  unsigned* lengths; /*the lengths of the codes of the 1d-tree*/
+  unsigned maxbitlen; /*maximum number of bits a single code can get*/
+  unsigned numcodes; /*number of symbols in the alphabet = number of codes*/
+} HuffmanTree;
+
+/*function used for debug purposes to draw the tree in ascii art with C++*/
+/*
+static void HuffmanTree_draw(HuffmanTree* tree)
+{
+  std::cout << "tree. length: " << tree->numcodes << " maxbitlen: " << tree->maxbitlen << std::endl;
+  for(size_t i = 0; i < tree->numcodes; i++)
+  {
+    if(tree->lengths[i])
+      std::cout << i << " " << tree->tree1d[i] << " " << tree->lengths[i] << std::endl;
+  }
+  std::cout << std::endl;
+}*/
+
+static void HuffmanTree_init(HuffmanTree* tree)
+{
+  tree->tree2d = 0;
+  tree->tree1d = 0;
+  tree->lengths = 0;
+}
+
+static void HuffmanTree_cleanup(HuffmanTree* tree)
+{
+  lodepng_free(tree->tree2d);
+  lodepng_free(tree->tree1d);
+  lodepng_free(tree->lengths);
+}
+
+/*the tree representation used by the decoder. return value is error*/
+static unsigned HuffmanTree_make2DTree(HuffmanTree* tree)
+{
+  unsigned nodefilled = 0; /*up to which node it is filled*/
+  unsigned treepos = 0; /*position in the tree (1 of the numcodes columns)*/
+  unsigned n, i;
+
+  tree->tree2d = (unsigned*)lodepng_malloc(tree->numcodes * 2 * sizeof(unsigned));
+  if(!tree->tree2d) return 83; /*alloc fail*/
+
+  /*
+  convert tree1d[] to tree2d[][]. In the 2D array, a value of 32767 means
+  uninited, a value >= numcodes is an address to another bit, a value < numcodes
+  is a code. The 2 rows are the 2 possible bit values (0 or 1), there are as
+  many columns as codes - 1.
+  A good huffman tree has N * 2 - 1 nodes, of which N - 1 are internal nodes.
+  Here, the internal nodes are stored (what their 0 and 1 option point to).
+  There is only memory for such a good tree currently; if there are more nodes
+  (due to too long length codes), error 55 will happen
+  */
+  for(n = 0; n < tree->numcodes * 2; n++)
+  {
+    tree->tree2d[n] = 32767; /*32767 here means the tree2d isn't filled there yet*/
+  }
+
+  for(n = 0; n < tree->numcodes; n++) /*the codes*/
+  {
+    for(i = 0; i < tree->lengths[n]; i++) /*the bits for this code*/
+    {
+      unsigned char bit = (unsigned char)((tree->tree1d[n] >> (tree->lengths[n] - i - 1)) & 1);
+      if(treepos > tree->numcodes - 2) return 55; /*oversubscribed, see comment in lodepng_error_text*/
+      if(tree->tree2d[2 * treepos + bit] == 32767) /*not yet filled in*/
+      {
+        if(i + 1 == tree->lengths[n]) /*last bit*/
+        {
+          tree->tree2d[2 * treepos + bit] = n; /*put the current code in it*/
+          treepos = 0;
+        }
+        else
+        {
+          /*put address of the next step in here, first that address has to be found of course
+          (it's just nodefilled + 1)...*/
+          nodefilled++;
+          /*addresses encoded with numcodes added to it*/
+          tree->tree2d[2 * treepos + bit] = nodefilled + tree->numcodes;
+          treepos = nodefilled;
+        }
+      }
+      else treepos = tree->tree2d[2 * treepos + bit] - tree->numcodes;
+    }
+  }
+
+  for(n = 0; n < tree->numcodes * 2; n++)
+  {
+    if(tree->tree2d[n] == 32767) tree->tree2d[n] = 0; /*remove possible remaining 32767's*/
+  }
+
+  return 0;
+}
+
+/*
+Second step for the ...makeFromLengths and ...makeFromFrequencies functions.
+numcodes, lengths and maxbitlen must already be filled in correctly. return
+value is error.
+*/
+static unsigned HuffmanTree_makeFromLengths2(HuffmanTree* tree)
+{
+  uivector blcount;
+  uivector nextcode;
+  unsigned bits, n, error = 0;
+
+  uivector_init(&blcount);
+  uivector_init(&nextcode);
+
+  tree->tree1d = (unsigned*)lodepng_malloc(tree->numcodes * sizeof(unsigned));
+  if(!tree->tree1d) error = 83; /*alloc fail*/
+
+  if(!uivector_resizev(&blcount, tree->maxbitlen + 1, 0)
+  || !uivector_resizev(&nextcode, tree->maxbitlen + 1, 0))
+    error = 83; /*alloc fail*/
+
+  if(!error)
+  {
+    /*step 1: count number of instances of each code length*/
+    for(bits = 0; bits < tree->numcodes; bits++) blcount.data[tree->lengths[bits]]++;
+    /*step 2: generate the nextcode values*/
+    for(bits = 1; bits <= tree->maxbitlen; bits++)
+    {
+      nextcode.data[bits] = (nextcode.data[bits - 1] + blcount.data[bits - 1]) << 1;
+    }
+    /*step 3: generate all the codes*/
+    for(n = 0; n < tree->numcodes; n++)
+    {
+      if(tree->lengths[n] != 0) tree->tree1d[n] = nextcode.data[tree->lengths[n]]++;
+    }
+  }
+
+  uivector_cleanup(&blcount);
+  uivector_cleanup(&nextcode);
+
+  if(!error) return HuffmanTree_make2DTree(tree);
+  else return error;
+}
+
+/*
+given the code lengths (as stored in the PNG file), generate the tree as defined
+by Deflate. maxbitlen is the maximum bits that a code in the tree can have.
+return value is error.
+*/
+static unsigned HuffmanTree_makeFromLengths(HuffmanTree* tree, const unsigned* bitlen,
+                                            size_t numcodes, unsigned maxbitlen)
+{
+  unsigned i;
+  tree->lengths = (unsigned*)lodepng_malloc(numcodes * sizeof(unsigned));
+  if(!tree->lengths) return 83; /*alloc fail*/
+  for(i = 0; i < numcodes; i++) tree->lengths[i] = bitlen[i];
+  tree->numcodes = (unsigned)numcodes; /*number of symbols*/
+  tree->maxbitlen = maxbitlen;
+  return HuffmanTree_makeFromLengths2(tree);
+}
+
+#ifdef LODEPNG_COMPILE_ENCODER
+
+/*
+"Coin" is the terminology used for the package-merge algorithm and the
+coin collector's problem. This is used to generate the huffman tree.
+A coin can be multiple coins (when they're merged)
+*/
+typedef struct Coin
+{
+  uivector symbols;
+  float weight; /*the sum of all weights in this coin*/
+} Coin;
+
+static void coin_init(Coin* c)
+{
+  uivector_init(&c->symbols);
+}
+
+/*argument c is void* so that this dtor can be given as function pointer to the vector resize function*/
+static void coin_cleanup(void* c)
+{
+  uivector_cleanup(&((Coin*)c)->symbols);
+}
+
+static void coin_copy(Coin* c1, const Coin* c2)
+{
+  c1->weight = c2->weight;
+  uivector_copy(&c1->symbols, &c2->symbols);
+}
+
+static void add_coins(Coin* c1, const Coin* c2)
+{
+  size_t i;
+  for(i = 0; i < c2->symbols.size; i++) uivector_push_back(&c1->symbols, c2->symbols.data[i]);
+  c1->weight += c2->weight;
+}
+
+static void init_coins(Coin* coins, size_t num)
+{
+  size_t i;
+  for(i = 0; i < num; i++) coin_init(&coins[i]);
+}
+
+static void cleanup_coins(Coin* coins, size_t num)
+{
+  size_t i;
+  for(i = 0; i < num; i++) coin_cleanup(&coins[i]);
+}
+
+static int coin_compare(const void* a, const void* b) {
+  float wa = ((const Coin*)a)->weight;
+  float wb = ((const Coin*)b)->weight;
+  return wa > wb ? 1 : wa < wb ? -1 : 0;
+}
+
+static unsigned append_symbol_coins(Coin* coins, const unsigned* frequencies, unsigned numcodes, size_t sum)
+{
+  unsigned i;
+  unsigned j = 0; /*index of present symbols*/
+  for(i = 0; i < numcodes; i++)
+  {
+    if(frequencies[i] != 0) /*only include symbols that are present*/
+    {
+      coins[j].weight = frequencies[i] / (float)sum;
+      if(!uivector_push_back(&coins[j].symbols, i)) return 83; /*alloc fail*/
+      j++;
+    }
+  }
+  return 0;
+}
+
+unsigned lodepng_huffman_code_lengths(unsigned* lengths, const unsigned* frequencies,
+                                      size_t numcodes, unsigned maxbitlen)
+{
+  unsigned i, j;
+  size_t sum = 0, numpresent = 0;
+  unsigned error = 0;
+  Coin* coins; /*the coins of the currently calculated row*/
+  Coin* prev_row; /*the previous row of coins*/
+  unsigned numcoins;
+  unsigned coinmem;
+
+  if(numcodes == 0) return 80; /*error: a tree of 0 symbols is not supposed to be made*/
+
+  for(i = 0; i < numcodes; i++)
+  {
+    if(frequencies[i] > 0)
+    {
+      numpresent++;
+      sum += frequencies[i];
+    }
+  }
+
+  for(i = 0; i < numcodes; i++) lengths[i] = 0;
+
+  /*ensure at least two present symbols. There should be at least one symbol
+  according to RFC 1951 section 3.2.7. Some decoders incorrectly require two. To
+  make these work as well, ensure there are at least two symbols. The
+  Package-Merge code below also doesn't work correctly if there's only one
+  symbol: it'd give it the theoretical 0 bits, but in practice zlib wants 1 bit*/
+  if(numpresent == 0)
+  {
+    lengths[0] = lengths[1] = 1; /*note that for RFC 1951 section 3.2.7, only lengths[0] = 1 is needed*/
+  }
+  else if(numpresent == 1)
+  {
+    for(i = 0; i < numcodes; i++)
+    {
+      if(frequencies[i])
+      {
+        lengths[i] = 1;
+        lengths[i == 0 ? 1 : 0] = 1;
+        break;
+      }
+    }
+  }
+  else
+  {
+    /*Package-Merge algorithm represented by coin collector's problem
+    For every symbol, maxbitlen coins will be created*/
+
+    coinmem = numpresent * 2; /*max amount of coins needed with the current algo*/
+    coins = (Coin*)lodepng_malloc(sizeof(Coin) * coinmem);
+    prev_row = (Coin*)lodepng_malloc(sizeof(Coin) * coinmem);
+    if(!coins || !prev_row)
+    {
+      lodepng_free(coins);
+      lodepng_free(prev_row);
+      return 83; /*alloc fail*/
+    }
+    init_coins(coins, coinmem);
+    init_coins(prev_row, coinmem);
+
+    /*first row, lowest denominator*/
+    error = append_symbol_coins(coins, frequencies, numcodes, sum);
+    numcoins = numpresent;
+    qsort(coins, numcoins, sizeof(Coin), coin_compare);
+    if(!error)
+    {
+      unsigned numprev = 0;
+      for(j = 1; j <= maxbitlen && !error; j++) /*each of the remaining rows*/
+      {
+        unsigned tempnum;
+        Coin* tempcoins;
+        /*swap prev_row and coins, and their amounts*/
+        tempcoins = prev_row; prev_row = coins; coins = tempcoins;
+        tempnum = numprev; numprev = numcoins; numcoins = tempnum;
+
+        cleanup_coins(coins, numcoins);
+        init_coins(coins, numcoins);
+
+        numcoins = 0;
+
+        /*fill in the merged coins of the previous row*/
+        for(i = 0; i + 1 < numprev; i += 2)
+        {
+          /*merge prev_row[i] and prev_row[i + 1] into new coin*/
+          Coin* coin = &coins[numcoins++];
+          coin_copy(coin, &prev_row[i]);
+          add_coins(coin, &prev_row[i + 1]);
+        }
+        /*fill in all the original symbols again*/
+        if(j < maxbitlen)
+        {
+          error = append_symbol_coins(coins + numcoins, frequencies, numcodes, sum);
+          numcoins += numpresent;
+        }
+        qsort(coins, numcoins, sizeof(Coin), coin_compare);
+      }
+    }
+
+    if(!error)
+    {
+      /*calculate the lengths of each symbol, as the number of times a coin of each symbol is used*/
+      for(i = 0; i < numpresent - 1; i++)
+      {
+        Coin* coin = &coins[i];
+        for(j = 0; j < coin->symbols.size; j++) lengths[coin->symbols.data[j]]++;
+      }
+    }
+
+    cleanup_coins(coins, coinmem);
+    lodepng_free(coins);
+    cleanup_coins(prev_row, coinmem);
+    lodepng_free(prev_row);
+  }
+
+  return error;
+}
+
+/*Create the Huffman tree given the symbol frequencies*/
+static unsigned HuffmanTree_makeFromFrequencies(HuffmanTree* tree, const unsigned* frequencies,
+                                                size_t mincodes, size_t numcodes, unsigned maxbitlen)
+{
+  unsigned error = 0;
+  while(numcodes > mincodes && !frequencies[numcodes - 1]) numcodes--; /*trim zeroes*/
+  tree->maxbitlen = maxbitlen;
+  tree->numcodes = (unsigned)numcodes; /*number of symbols*/
+  tree->lengths = (unsigned*)lodepng_realloc(tree->lengths, numcodes * sizeof(unsigned));
+  if(!tree->lengths) return 83; /*alloc fail*/
+  /*initialize all lengths to 0*/
+  memset(tree->lengths, 0, numcodes * sizeof(unsigned));
+
+  error = lodepng_huffman_code_lengths(tree->lengths, frequencies, numcodes, maxbitlen);
+  if(!error) error = HuffmanTree_makeFromLengths2(tree);
+  return error;
+}
+
+static unsigned HuffmanTree_getCode(const HuffmanTree* tree, unsigned index)
+{
+  return tree->tree1d[index];
+}
+
+static unsigned HuffmanTree_getLength(const HuffmanTree* tree, unsigned index)
+{
+  return tree->lengths[index];
+}
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+/*get the literal and length code tree of a deflated block with fixed tree, as per the deflate specification*/
+static unsigned generateFixedLitLenTree(HuffmanTree* tree)
+{
+  unsigned i, error = 0;
+  unsigned* bitlen = (unsigned*)lodepng_malloc(NUM_DEFLATE_CODE_SYMBOLS * sizeof(unsigned));
+  if(!bitlen) return 83; /*alloc fail*/
+
+  /*288 possible codes: 0-255=literals, 256=endcode, 257-285=lengthcodes, 286-287=unused*/
+  for(i =   0; i <= 143; i++) bitlen[i] = 8;
+  for(i = 144; i <= 255; i++) bitlen[i] = 9;
+  for(i = 256; i <= 279; i++) bitlen[i] = 7;
+  for(i = 280; i <= 287; i++) bitlen[i] = 8;
+
+  error = HuffmanTree_makeFromLengths(tree, bitlen, NUM_DEFLATE_CODE_SYMBOLS, 15);
+
+  lodepng_free(bitlen);
+  return error;
+}
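+
+/*
+For reference, a sketch of the codes that the canonical assignment yields for
+the lengths above, as tabulated in RFC 1951 section 3.2.6:
+  symbols   0 - 143: 8 bits, codes  00110000 through  10111111
+  symbols 144 - 255: 9 bits, codes 110010000 through 111111111
+  symbols 256 - 279: 7 bits, codes   0000000 through   0010111
+  symbols 280 - 287: 8 bits, codes  11000000 through  11000111
+*/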
+
+/*get the distance code tree of a deflated block with fixed tree, as specified in the deflate specification*/
+static unsigned generateFixedDistanceTree(HuffmanTree* tree)
+{
+  unsigned i, error = 0;
+  unsigned* bitlen = (unsigned*)lodepng_malloc(NUM_DISTANCE_SYMBOLS * sizeof(unsigned));
+  if(!bitlen) return 83; /*alloc fail*/
+
+  /*there are 32 distance codes, but 30-31 are unused*/
+  for(i = 0; i < NUM_DISTANCE_SYMBOLS; i++) bitlen[i] = 5;
+  error = HuffmanTree_makeFromLengths(tree, bitlen, NUM_DISTANCE_SYMBOLS, 15);
+
+  lodepng_free(bitlen);
+  return error;
+}
+
+#ifdef LODEPNG_COMPILE_DECODER
+
+/*
+returns the code, or (unsigned)(-1) if error happened
+inbitlength is the length of the complete buffer, in bits (so its byte length times 8)
+*/
+static unsigned huffmanDecodeSymbol(const unsigned char* in, size_t* bp,
+                                    const HuffmanTree* codetree, size_t inbitlength)
+{
+  unsigned treepos = 0, ct;
+  for(;;)
+  {
+    if(*bp >= inbitlength) return (unsigned)(-1); /*error: end of input memory reached without endcode*/
+    /*
+    decode the symbol from the tree. The "readBitFromStream" code is inlined in
+    the expression below because this is the biggest bottleneck while decoding
+    */
+    ct = codetree->tree2d[(treepos << 1) + READBIT(*bp, in)];
+    (*bp)++;
+    if(ct < codetree->numcodes) return ct; /*the symbol is decoded, return it*/
+    else treepos = ct - codetree->numcodes; /*symbol not yet decoded, instead move tree position*/
+
+    if(treepos >= codetree->numcodes) return (unsigned)(-1); /*error: it appeared outside the codetree*/
+  }
+}
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#ifdef LODEPNG_COMPILE_DECODER
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / Inflator (Decompressor)                                                / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/*get the tree of a deflated block with fixed tree, as specified in the deflate specification*/
+static void getTreeInflateFixed(HuffmanTree* tree_ll, HuffmanTree* tree_d)
+{
+  /*TODO: check for out of memory errors*/
+  generateFixedLitLenTree(tree_ll);
+  generateFixedDistanceTree(tree_d);
+}
+
+/*get the tree of a deflated block with dynamic tree, the tree itself is also Huffman compressed with a known tree*/
+static unsigned getTreeInflateDynamic(HuffmanTree* tree_ll, HuffmanTree* tree_d,
+                                      const unsigned char* in, size_t* bp, size_t inlength)
+{
+  /*make sure that length values that aren't filled in will be 0, or a wrong tree will be generated*/
+  unsigned error = 0;
+  unsigned n, HLIT, HDIST, HCLEN, i;
+  size_t inbitlength = inlength * 8;
+
+  /*see comments in deflateDynamic for explanation of the context and these variables, it is analogous*/
+  unsigned* bitlen_ll = 0; /*lit,len code lengths*/
+  unsigned* bitlen_d = 0; /*dist code lengths*/
+  /*code length code lengths ("clcl"), the bit lengths of the huffman tree used to compress bitlen_ll and bitlen_d*/
+  unsigned* bitlen_cl = 0;
+  HuffmanTree tree_cl; /*the code tree for code length codes (the huffman tree for compressed huffman trees)*/
+
+  if(((*bp) >> 3) + 2 >= inlength) return 49; /*error: the bit pointer is or will go past the memory*/
+
+  /*number of literal/length codes + 257. Unlike the spec, the value 257 is added to it here already*/
+  HLIT =  readBitsFromStream(bp, in, 5) + 257;
+  /*number of distance codes. Unlike the spec, the value 1 is added to it here already*/
+  HDIST = readBitsFromStream(bp, in, 5) + 1;
+  /*number of code length codes. Unlike the spec, the value 4 is added to it here already*/
+  HCLEN = readBitsFromStream(bp, in, 4) + 4;
+
+  HuffmanTree_init(&tree_cl);
+
+  while(!error)
+  {
+    /*read the code length codes out of 3 * (amount of code length codes) bits*/
+
+    bitlen_cl = (unsigned*)lodepng_malloc(NUM_CODE_LENGTH_CODES * sizeof(unsigned));
+    if(!bitlen_cl) ERROR_BREAK(83 /*alloc fail*/);
+
+    for(i = 0; i < NUM_CODE_LENGTH_CODES; i++)
+    {
+      if(i < HCLEN) bitlen_cl[CLCL_ORDER[i]] = readBitsFromStream(bp, in, 3);
+      else bitlen_cl[CLCL_ORDER[i]] = 0; /*if not, it must stay 0*/
+    }
+
+    error = HuffmanTree_makeFromLengths(&tree_cl, bitlen_cl, NUM_CODE_LENGTH_CODES, 7);
+    if(error) break;
+
+    /*now we can use this tree to read the lengths for the tree that this function will return*/
+    bitlen_ll = (unsigned*)lodepng_malloc(NUM_DEFLATE_CODE_SYMBOLS * sizeof(unsigned));
+    bitlen_d = (unsigned*)lodepng_malloc(NUM_DISTANCE_SYMBOLS * sizeof(unsigned));
+    if(!bitlen_ll || !bitlen_d) ERROR_BREAK(83 /*alloc fail*/);
+    for(i = 0; i < NUM_DEFLATE_CODE_SYMBOLS; i++) bitlen_ll[i] = 0;
+    for(i = 0; i < NUM_DISTANCE_SYMBOLS; i++) bitlen_d[i] = 0;
+
+    /*i is the current symbol we're reading in the part that contains the code lengths of lit/len and dist codes*/
+    i = 0;
+    while(i < HLIT + HDIST)
+    {
+      unsigned code = huffmanDecodeSymbol(in, bp, &tree_cl, inbitlength);
+      if(code <= 15) /*a length code*/
+      {
+        if(i < HLIT) bitlen_ll[i] = code;
+        else bitlen_d[i - HLIT] = code;
+        i++;
+      }
+      else if(code == 16) /*repeat previous*/
+      {
+        unsigned replength = 3; /*read in the 2 bits that indicate repeat length (3-6)*/
+        unsigned value; /*set value to the previous code*/
+
+        if(*bp >= inbitlength) ERROR_BREAK(50); /*error, bit pointer jumps past memory*/
+        if (i == 0) ERROR_BREAK(54); /*can't repeat previous if i is 0*/
+
+        replength += readBitsFromStream(bp, in, 2);
+
+        if(i < HLIT + 1) value = bitlen_ll[i - 1];
+        else value = bitlen_d[i - HLIT - 1];
+        /*repeat this value in the next lengths*/
+        for(n = 0; n < replength; n++)
+        {
+          if(i >= HLIT + HDIST) ERROR_BREAK(13); /*error: i is larger than the amount of codes*/
+          if(i < HLIT) bitlen_ll[i] = value;
+          else bitlen_d[i - HLIT] = value;
+          i++;
+        }
+      }
+      else if(code == 17) /*repeat "0" 3-10 times*/
+      {
+        unsigned replength = 3; /*read in the bits that indicate repeat length*/
+        if(*bp >= inbitlength) ERROR_BREAK(50); /*error, bit pointer jumps past memory*/
+
+        replength += readBitsFromStream(bp, in, 3);
+
+        /*repeat this value in the next lengths*/
+        for(n = 0; n < replength; n++)
+        {
+          if(i >= HLIT + HDIST) ERROR_BREAK(14); /*error: i is larger than the amount of codes*/
+
+          if(i < HLIT) bitlen_ll[i] = 0;
+          else bitlen_d[i - HLIT] = 0;
+          i++;
+        }
+      }
+      else if(code == 18) /*repeat "0" 11-138 times*/
+      {
+        unsigned replength = 11; /*read in the bits that indicate repeat length*/
+        if(*bp >= inbitlength) ERROR_BREAK(50); /*error, bit pointer jumps past memory*/
+
+        replength += readBitsFromStream(bp, in, 7);
+
+        /*repeat this value in the next lengths*/
+        for(n = 0; n < replength; n++)
+        {
+          if(i >= HLIT + HDIST) ERROR_BREAK(15); /*error: i is larger than the amount of codes*/
+
+          if(i < HLIT) bitlen_ll[i] = 0;
+          else bitlen_d[i - HLIT] = 0;
+          i++;
+        }
+      }
+      else /*if(code == (unsigned)(-1))*/ /*huffmanDecodeSymbol returns (unsigned)(-1) in case of error*/
+      {
+        if(code == (unsigned)(-1))
+        {
+          /*return error code 10 or 11 depending on the situation that happened in huffmanDecodeSymbol
+          (10=no endcode, 11=wrong jump outside of tree)*/
+          error = (*bp) > inbitlength ? 10 : 11;
+        }
+        else error = 16; /*nonexistent code, this can never happen*/
+        break;
+      }
+    }
+    if(error) break;
+
+    if(bitlen_ll[256] == 0) ERROR_BREAK(64); /*the length of the end code 256 must be larger than 0*/
+
+    /*now we've finally got HLIT and HDIST, so generate the code trees, and the function is done*/
+    error = HuffmanTree_makeFromLengths(tree_ll, bitlen_ll, NUM_DEFLATE_CODE_SYMBOLS, 15);
+    if(error) break;
+    error = HuffmanTree_makeFromLengths(tree_d, bitlen_d, NUM_DISTANCE_SYMBOLS, 15);
+
+    break; /*end of error-while*/
+  }
+
+  lodepng_free(bitlen_cl);
+  lodepng_free(bitlen_ll);
+  lodepng_free(bitlen_d);
+  HuffmanTree_cleanup(&tree_cl);
+
+  return error;
+}
+
+/*inflate a block with dynamic or fixed Huffman tree*/
+static unsigned inflateHuffmanBlock(ucvector* out, const unsigned char* in, size_t* bp,
+                                    size_t* pos, size_t inlength, unsigned btype)
+{
+  unsigned error = 0;
+  HuffmanTree tree_ll; /*the huffman tree for literal and length codes*/
+  HuffmanTree tree_d; /*the huffman tree for distance codes*/
+  size_t inbitlength = inlength * 8;
+
+  HuffmanTree_init(&tree_ll);
+  HuffmanTree_init(&tree_d);
+
+  if(btype == 1) getTreeInflateFixed(&tree_ll, &tree_d);
+  else if(btype == 2) error = getTreeInflateDynamic(&tree_ll, &tree_d, in, bp, inlength);
+
+  while(!error) /*decode all symbols until end reached, breaks at end code*/
+  {
+    /*code_ll is literal, length or end code*/
+    unsigned code_ll = huffmanDecodeSymbol(in, bp, &tree_ll, inbitlength);
+    if(code_ll <= 255) /*literal symbol*/
+    {
+      if((*pos) >= out->size)
+      {
+        /*reserve more room at once*/
+        if(!ucvector_resize(out, ((*pos) + 1) * 2)) ERROR_BREAK(83 /*alloc fail*/);
+      }
+      out->data[(*pos)] = (unsigned char)(code_ll);
+      (*pos)++;
+    }
+    else if(code_ll >= FIRST_LENGTH_CODE_INDEX && code_ll <= LAST_LENGTH_CODE_INDEX) /*length code*/
+    {
+      unsigned code_d, distance;
+      unsigned numextrabits_l, numextrabits_d; /*extra bits for length and distance*/
+      size_t start, forward, backward, length;
+
+      /*part 1: get length base*/
+      length = LENGTHBASE[code_ll - FIRST_LENGTH_CODE_INDEX];
+
+      /*part 2: get extra bits and add the value of that to length*/
+      numextrabits_l = LENGTHEXTRA[code_ll - FIRST_LENGTH_CODE_INDEX];
+      if(*bp >= inbitlength) ERROR_BREAK(51); /*error, bit pointer will jump past memory*/
+      length += readBitsFromStream(bp, in, numextrabits_l);
+
+      /*part 3: get distance code*/
+      code_d = huffmanDecodeSymbol(in, bp, &tree_d, inbitlength);
+      if(code_d > 29)
+      {
+        if(code_d == (unsigned)(-1)) /*huffmanDecodeSymbol returns (unsigned)(-1) in case of error*/
+        {
+          /*return error code 10 or 11 depending on the situation that happened in huffmanDecodeSymbol
+          (10=no endcode, 11=wrong jump outside of tree)*/
+          error = (*bp) > inlength * 8 ? 10 : 11;
+        }
+        else error = 18; /*error: invalid distance code (30-31 are never used)*/
+        break;
+      }
+      distance = DISTANCEBASE[code_d];
+
+      /*part 4: get extra bits from distance*/
+      numextrabits_d = DISTANCEEXTRA[code_d];
+      if(*bp >= inbitlength) ERROR_BREAK(51); /*error, bit pointer will jump past memory*/
+
+      distance += readBitsFromStream(bp, in, numextrabits_d);
+
+      /*part 5: fill in all the out[n] values based on the length and dist*/
+      start = (*pos);
+      if(distance > start) ERROR_BREAK(52); /*too long backward distance*/
+      backward = start - distance;
+      if((*pos) + length >= out->size)
+      {
+        /*reserve more room at once*/
+        if(!ucvector_resize(out, ((*pos) + length) * 2)) ERROR_BREAK(83 /*alloc fail*/);
+      }
+
+      for(forward = 0; forward < length; forward++)
+      {
+        out->data[(*pos)] = out->data[backward];
+        (*pos)++;
+        backward++;
+        if(backward >= start) backward = start - distance;
+      }
+    }
+    else if(code_ll == 256)
+    {
+      break; /*end code, break the loop*/
+    }
+    else /*if(code == (unsigned)(-1))*/ /*huffmanDecodeSymbol returns (unsigned)(-1) in case of error*/
+    {
+      /*return error code 10 or 11 depending on the situation that happened in huffmanDecodeSymbol
+      (10=no endcode, 11=wrong jump outside of tree)*/
+      error = (*bp) > inlength * 8 ? 10 : 11;
+      break;
+    }
+  }
+
+  HuffmanTree_cleanup(&tree_ll);
+  HuffmanTree_cleanup(&tree_d);
+
+  return error;
+}
+
+static unsigned inflateNoCompression(ucvector* out, const unsigned char* in, size_t* bp, size_t* pos, size_t inlength)
+{
+  /*go to first boundary of byte*/
+  size_t p;
+  unsigned LEN, NLEN, n, error = 0;
+  while(((*bp) & 0x7) != 0) (*bp)++;
+  p = (*bp) / 8; /*byte position*/
+
+  /*read LEN (2 bytes) and NLEN (2 bytes)*/
+  if(p + 4 >= inlength) return 52; /*error, bit pointer will jump past memory*/
+  LEN = in[p] + 256 * in[p + 1]; p += 2;
+  NLEN = in[p] + 256 * in[p + 1]; p += 2;
+
+  /*check if 16-bit NLEN is really the one's complement of LEN*/
+  if(LEN + NLEN != 65535) return 21; /*error: NLEN is not one's complement of LEN*/
+
+  if((*pos) + LEN >= out->size)
+  {
+    if(!ucvector_resize(out, (*pos) + LEN)) return 83; /*alloc fail*/
+  }
+
+  /*read the literal data: LEN bytes are now stored in the out buffer*/
+  if(p + LEN > inlength) return 23; /*error: reading outside of in buffer*/
+  for(n = 0; n < LEN; n++) out->data[(*pos)++] = in[p++];
+
+  (*bp) = p * 8;
+
+  return error;
+}
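+
+/*
+Worked example of the stored block header parsed above (hypothetical input, for
+illustration only): a block holding the 5 bytes "hello" continues after the
+byte boundary with
+  05 00            LEN  = 5     (little endian)
+  FA FF            NLEN = 65530 (one's complement of LEN)
+  68 65 6C 6C 6F   the LEN literal bytes
+so LEN + NLEN = 5 + 65530 = 65535 passes the check, and 5 bytes are copied.
+*/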
+
+static unsigned lodepng_inflatev(ucvector* out,
+                                 const unsigned char* in, size_t insize,
+                                 const LodePNGDecompressSettings* settings)
+{
+  /*bit pointer in the "in" data, current byte is bp >> 3, current bit is bp & 0x7 (from lsb to msb of the byte)*/
+  size_t bp = 0;
+  unsigned BFINAL = 0;
+  size_t pos = 0; /*byte position in the out buffer*/
+
+  unsigned error = 0;
+
+  (void)settings;
+
+  while(!BFINAL)
+  {
+    unsigned BTYPE;
+    if(bp + 2 >= insize * 8) return 52; /*error, bit pointer will jump past memory*/
+    BFINAL = readBitFromStream(&bp, in);
+    BTYPE = 1 * readBitFromStream(&bp, in);
+    BTYPE += 2 * readBitFromStream(&bp, in);
+
+    if(BTYPE == 3) return 20; /*error: invalid BTYPE*/
+    else if(BTYPE == 0) error = inflateNoCompression(out, in, &bp, &pos, insize); /*no compression*/
+    else error = inflateHuffmanBlock(out, in, &bp, &pos, insize, BTYPE); /*compression, BTYPE 01 or 10*/
+
+    if(error) return error;
+  }
+
+  /*Only now we know the true size of out, resize it to that*/
+  if(!ucvector_resize(out, pos)) error = 83; /*alloc fail*/
+
+  return error;
+}
+
+unsigned lodepng_inflate(unsigned char** out, size_t* outsize,
+                         const unsigned char* in, size_t insize,
+                         const LodePNGDecompressSettings* settings)
+{
+  unsigned error;
+  ucvector v;
+  ucvector_init_buffer(&v, *out, *outsize);
+  error = lodepng_inflatev(&v, in, insize, settings);
+  *out = v.data;
+  *outsize = v.size;
+  return error;
+}
+
+static unsigned inflate(unsigned char** out, size_t* outsize,
+                        const unsigned char* in, size_t insize,
+                        const LodePNGDecompressSettings* settings)
+{
+  if(settings->custom_inflate)
+  {
+    return settings->custom_inflate(out, outsize, in, insize, settings);
+  }
+  else
+  {
+    return lodepng_inflate(out, outsize, in, insize, settings);
+  }
+}
+
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#ifdef LODEPNG_COMPILE_ENCODER
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / Deflator (Compressor)                                                  / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+static const size_t MAX_SUPPORTED_DEFLATE_LENGTH = 258;
+
+/*bitlen is the size in bits of the code*/
+static void addHuffmanSymbol(size_t* bp, ucvector* compressed, unsigned code, unsigned bitlen)
+{
+  addBitsToStreamReversed(bp, compressed, code, bitlen);
+}
+
+/*search the index in the array that has the largest value smaller than or equal to the given value.
+The given array must be sorted (if the value is at least the last element, or smaller than every element, it returns array_size - 1)*/
+static size_t searchCodeIndex(const unsigned* array, size_t array_size, size_t value)
+{
+  /*linear search implementation*/
+  /*for(size_t i = 1; i < array_size; i++) if(array[i] > value) return i - 1;
+  return array_size - 1;*/
+
+  /*binary search implementation (not that much faster) (precondition: array_size > 0)*/
+  size_t left  = 1;
+  size_t right = array_size - 1;
+  while(left <= right)
+  {
+    size_t mid = (left + right) / 2;
+    if(array[mid] <= value) left = mid + 1; /*the value to find is more to the right*/
+    else if(array[mid - 1] > value) right = mid - 1; /*the value to find is more to the left*/
+    else return mid - 1;
+  }
+  return array_size - 1;
+}
+
+static void addLengthDistance(uivector* values, size_t length, size_t distance)
+{
+  /*values in encoded vector are those used by deflate:
+  0-255: literal bytes
+  256: end
+  257-285: length/distance pair (length code, followed by extra length bits, distance code, extra distance bits)
+  286-287: invalid*/
+
+  unsigned length_code = (unsigned)searchCodeIndex(LENGTHBASE, 29, length);
+  unsigned extra_length = (unsigned)(length - LENGTHBASE[length_code]);
+  unsigned dist_code = (unsigned)searchCodeIndex(DISTANCEBASE, 30, distance);
+  unsigned extra_distance = (unsigned)(distance - DISTANCEBASE[dist_code]);
+
+  uivector_push_back(values, length_code + FIRST_LENGTH_CODE_INDEX);
+  uivector_push_back(values, extra_length);
+  uivector_push_back(values, dist_code);
+  uivector_push_back(values, extra_distance);
+}
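+
+/*
+Worked example for addLengthDistance (a sketch, assuming LENGTHBASE and
+DISTANCEBASE hold the base values of RFC 1951 section 3.2.5 and that
+FIRST_LENGTH_CODE_INDEX is 257; these are defined outside this part of the
+file). For length = 20 and distance = 100:
+  length code 269 covers lengths 19-22, so length_code = 12, extra_length = 1
+  distance code 13 covers distances 97-128, so dist_code = 13, extra_distance = 3
+and the four values pushed are 269, 1, 13, 3.
+*/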
+
+static const unsigned HASH_BIT_MASK = 65535;
+static const unsigned HASH_NUM_VALUES = 65536;
+static const unsigned HASH_NUM_CHARACTERS = 3;
+static const unsigned HASH_SHIFT = 2;
+/*
+The HASH_NUM_CHARACTERS value is used to make encoding faster by using longer
+sequences to generate a hash value from the stream bytes. Setting it to 3
+gives exactly the same compression as the brute force method, since deflate's
+run length encoding starts with lengths of 3. Setting it to higher values,
+like 6, can make the encoding faster (not always though!), but will cause the
+encoding to miss any length between 3 and this value, so that the compression
+may be worse (but this can vary too depending on the image, sometimes it is
+even a bit better instead).
+The HASH_NUM_VALUES is the amount of unique possible hash values that
+combinations of bytes can give, the higher it is the more memory is needed, but
+if it's too low the advantage of hashing is gone.
+*/
+
+typedef struct Hash
+{
+  int* head; /*hash value to head circular pos*/
+  int* val; /*circular pos to hash value*/
+  /*circular pos to prev circular pos*/
+  unsigned short* chain;
+  unsigned short* zeros; /*circular pos to number of zero bytes counted from that pos (see countZeros)*/
+} Hash;
+
+static unsigned hash_init(Hash* hash, unsigned windowsize)
+{
+  unsigned i;
+  hash->head = (int*)lodepng_malloc(sizeof(int) * HASH_NUM_VALUES);
+  hash->val = (int*)lodepng_malloc(sizeof(int) * windowsize);
+  hash->chain = (unsigned short*)lodepng_malloc(sizeof(unsigned short) * windowsize);
+  hash->zeros = (unsigned short*)lodepng_malloc(sizeof(unsigned short) * windowsize);
+
+  if(!hash->head || !hash->val || !hash->chain || !hash->zeros) return 83; /*alloc fail*/
+
+  /*initialize hash table*/
+  for(i = 0; i < HASH_NUM_VALUES; i++) hash->head[i] = -1;
+  for(i = 0; i < windowsize; i++) hash->val[i] = -1;
+  for(i = 0; i < windowsize; i++) hash->chain[i] = i; /*same value as index indicates uninitialized*/
+
+  return 0;
+}
+
+static void hash_cleanup(Hash* hash)
+{
+  lodepng_free(hash->head);
+  lodepng_free(hash->val);
+  lodepng_free(hash->chain);
+  lodepng_free(hash->zeros);
+}
+
+static unsigned getHash(const unsigned char* data, size_t size, size_t pos)
+{
+  unsigned result = 0;
+  if (HASH_NUM_CHARACTERS == 3 && pos + 2 < size) {
+    result ^= (data[pos + 0] << (0 * HASH_SHIFT));
+    result ^= (data[pos + 1] << (1 * HASH_SHIFT));
+    result ^= (data[pos + 2] << (2 * HASH_SHIFT));
+  } else {
+    size_t amount, i;
+    if(pos >= size) return 0;
+    amount = HASH_NUM_CHARACTERS;
+    if(pos + amount >= size) amount = size - pos;
+    for(i = 0; i < amount; i++) result ^= (data[pos + i] << (i * HASH_SHIFT));
+  }
+  return result & HASH_BIT_MASK;
+}
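+
+/*
+Worked example for getHash with HASH_NUM_CHARACTERS = 3 and HASH_SHIFT = 2
+(illustrative input, not from the source): for the bytes 'a', 'b', 'c'
+(0x61, 0x62, 0x63) the result is
+  0x61 ^ (0x62 << 2) ^ (0x63 << 4) = 0x061 ^ 0x188 ^ 0x630 = 0x7D9
+Three zero bytes hash to 0, and a hash value of 0 is what the zeros speedup in
+encodeLZ77 tests for. With these settings the result never exceeds 0xFFF
+(0xFF << 4 is 0xFF0), so only part of the HASH_NUM_VALUES table is used.
+*/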
+
+static unsigned countZeros(const unsigned char* data, size_t size, size_t pos)
+{
+  const unsigned char* start = data + pos;
+  const unsigned char* end = start + MAX_SUPPORTED_DEFLATE_LENGTH;
+  if(end > data + size) end = data + size;
+  data = start;
+  while (data != end && *data == 0) data++;
+  /*subtracting two addresses returned as 32-bit number (max value is MAX_SUPPORTED_DEFLATE_LENGTH)*/
+  return (unsigned)(data - start);
+}
+
+/*wpos = pos & (windowsize - 1)*/
+static void updateHashChain(Hash* hash, size_t wpos, int hashval)
+{
+  hash->val[wpos] = hashval;
+  if(hash->head[hashval] != -1) hash->chain[wpos] = hash->head[hashval];
+  hash->head[hashval] = wpos;
+}
+
+/*
+LZ77-encode the data. Return value is error code. The input is raw bytes, the output
+is in the form of unsigned integers with codes representing for example literal bytes, or
+length/distance pairs.
+It uses a hash table technique to let it encode faster. When doing LZ77 encoding, a
+sliding window (of windowsize) is used, and all past bytes in that window can be used as
+the "dictionary". A brute force search through all possible distances would be slow, and
+this hash technique is one out of several ways to speed this up.
+*/
+static unsigned encodeLZ77(uivector* out, Hash* hash,
+                           const unsigned char* in, size_t inpos, size_t insize, unsigned windowsize,
+                           unsigned minmatch, unsigned nicematch, unsigned lazymatching)
+{
+  unsigned pos, i, error = 0;
+  /*for large window lengths, assume the user wants no compression loss. Otherwise, max hash chain length speedup.*/
+  unsigned maxchainlength = windowsize >= 8192 ? windowsize : windowsize / 8;
+  unsigned maxlazymatch = windowsize >= 8192 ? MAX_SUPPORTED_DEFLATE_LENGTH : 64;
+
+  unsigned usezeros = 1; /*not sure if setting it to false for windowsize < 8192 is better or worse*/
+  unsigned numzeros = 0;
+
+  unsigned offset; /*the offset represents the distance in LZ77 terminology*/
+  unsigned length;
+  unsigned lazy = 0;
+  unsigned lazylength = 0, lazyoffset = 0;
+  unsigned hashval;
+  unsigned current_offset, current_length;
+  const unsigned char *lastptr, *foreptr, *backptr;
+  unsigned hashpos, prevpos;
+
+  if(windowsize <= 0 || windowsize > 32768) return 60; /*error: windowsize smaller/larger than allowed*/
+  if((windowsize & (windowsize - 1)) != 0) return 90; /*error: must be power of two*/
+
+  if(nicematch > MAX_SUPPORTED_DEFLATE_LENGTH) nicematch = MAX_SUPPORTED_DEFLATE_LENGTH;
+
+  for(pos = inpos; pos < insize; pos++)
+  {
+    size_t wpos = pos & (windowsize - 1); /*position in the 'circular' hash buffers*/
+    unsigned chainlength = 0;
+
+    hashval = getHash(in, insize, pos);
+    updateHashChain(hash, wpos, hashval);
+
+    if(usezeros && hashval == 0)
+    {
+      if (numzeros == 0) numzeros = countZeros(in, insize, pos);
+      else if (pos + numzeros >= insize || in[pos + numzeros - 1] != 0) numzeros--;
+      hash->zeros[wpos] = numzeros;
+    }
+    else
+    {
+      numzeros = 0;
+    }
+
+    /*the length and offset found for the current position*/
+    length = 0;
+    offset = 0;
+
+    prevpos = hash->head[hashval];
+    hashpos = hash->chain[prevpos];
+
+    lastptr = &in[insize < pos + MAX_SUPPORTED_DEFLATE_LENGTH ? insize : pos + MAX_SUPPORTED_DEFLATE_LENGTH];
+
+    /*search for the longest string*/
+    for(;;)
+    {
+      /*stop when we went completely around the circular buffer*/
+      if(prevpos < wpos && hashpos > prevpos && hashpos <= wpos) break;
+      if(prevpos > wpos && (hashpos <= wpos || hashpos > prevpos)) break;
+      if(chainlength++ >= maxchainlength) break;
+
+      current_offset = hashpos <= wpos ? wpos - hashpos : wpos - hashpos + windowsize;
+      if(current_offset > 0)
+      {
+        /*test the next characters*/
+        foreptr = &in[pos];
+        backptr = &in[pos - current_offset];
+
+        /*common case in PNGs is lots of zeros. Quickly skip over them as a speedup*/
+        if(usezeros && hashval == 0 && hash->val[hashpos] == 0 /*hashval[hashpos] may be out of date*/)
+        {
+          unsigned skip = hash->zeros[hashpos];
+          if(skip > numzeros) skip = numzeros;
+          backptr += skip;
+          foreptr += skip;
+        }
+
+        while(foreptr != lastptr && *backptr == *foreptr) /*lastptr caps the match at deflate's maximum length*/
+        {
+          ++backptr;
+          ++foreptr;
+        }
+        current_length = (unsigned)(foreptr - &in[pos]);
+
+        if(current_length > length)
+        {
+          length = current_length; /*the longest length*/
+          offset = current_offset; /*the offset that is related to this longest length*/
+          /*jump out once a length of at least nicematch is found (speed gain). This also jumps
+          out if length is MAX_SUPPORTED_DEFLATE_LENGTH, since nicematch is capped to that*/
+          if(current_length >= nicematch) break;
+        }
+      }
+
+      if(hashpos == hash->chain[hashpos]) break;
+
+      prevpos = hashpos;
+      hashpos = hash->chain[hashpos];
+    }
+
+    if(lazymatching)
+    {
+      if(!lazy && length >= 3 && length <= maxlazymatch && length < MAX_SUPPORTED_DEFLATE_LENGTH)
+      {
+        lazy = 1;
+        lazylength = length;
+        lazyoffset = offset;
+        continue; /*try the next byte*/
+      }
+      if(lazy)
+      {
+        lazy = 0;
+        if(pos == 0) ERROR_BREAK(81);
+        if(length > lazylength + 1)
+        {
+          /*push the previous character as literal*/
+          if(!uivector_push_back(out, in[pos - 1])) ERROR_BREAK(83 /*alloc fail*/);
+        }
+        else
+        {
+          length = lazylength;
+          offset = lazyoffset;
+          hash->head[hashval] = -1; /*the same hashchain update will be done, this ensures no wrong alteration*/
+          pos--;
+        }
+      }
+    }
+    if(length >= 3 && offset > windowsize) ERROR_BREAK(86 /*too big (or overflown negative) offset*/);
+
+    /*encode it as length/distance pair or literal value*/
+    if(length < 3) /*only lengths of 3 or higher are supported as length/distance pair*/
+    {
+      if(!uivector_push_back(out, in[pos])) ERROR_BREAK(83 /*alloc fail*/);
+    }
+    else if(length < minmatch || (length == 3 && offset > 4096))
+    {
+      /*compensate for the fact that longer offsets have more extra bits, a
+      length of only 3 may not be worth it then*/
+      if(!uivector_push_back(out, in[pos])) ERROR_BREAK(83 /*alloc fail*/);
+    }
+    else
+    {
+      addLengthDistance(out, length, offset);
+      for(i = 1; i < length; i++)
+      {
+        pos++;
+        wpos = pos & (windowsize - 1);
+        hashval = getHash(in, insize, pos);
+        updateHashChain(hash, wpos, hashval);
+        if(usezeros && hashval == 0)
+        {
+          if (numzeros == 0) numzeros = countZeros(in, insize, pos);
+          else if (pos + numzeros >= insize || in[pos + numzeros - 1] != 0) numzeros--;
+          hash->zeros[wpos] = numzeros;
+        }
+        else
+        {
+          numzeros = 0;
+        }
+      }
+    }
+  } /*end of the loop through each character of input*/
+
+  return error;
+}
+
+/* /////////////////////////////////////////////////////////////////////////// */
+
+static unsigned deflateNoCompression(ucvector* out, const unsigned char* data, size_t datasize)
+{
+  /*non-compressed deflate block data: 1 bit BFINAL, 2 bits BTYPE, 5 padding bits up to the next byte boundary,
+  2 bytes LEN, 2 bytes NLEN, LEN bytes literal DATA*/
+
+  size_t i, j, numdeflateblocks = (datasize + 65534) / 65535;
+  unsigned datapos = 0;
+  for(i = 0; i < numdeflateblocks; i++)
+  {
+    unsigned BFINAL, BTYPE, LEN, NLEN;
+    unsigned char firstbyte;
+
+    BFINAL = (i == numdeflateblocks - 1);
+    BTYPE = 0;
+
+    firstbyte = (unsigned char)(BFINAL + ((BTYPE & 1) << 1) + ((BTYPE & 2) << 1));
+    ucvector_push_back(out, firstbyte);
+
+    LEN = 65535;
+    if(datasize - datapos < 65535) LEN = (unsigned)datasize - datapos;
+    NLEN = 65535 - LEN;
+
+    ucvector_push_back(out, (unsigned char)(LEN % 256));
+    ucvector_push_back(out, (unsigned char)(LEN / 256));
+    ucvector_push_back(out, (unsigned char)(NLEN % 256));
+    ucvector_push_back(out, (unsigned char)(NLEN / 256));
+
+    /*Uncompressed (stored) data*/
+    for(j = 0; j < 65535 && datapos < datasize; j++)
+    {
+      ucvector_push_back(out, data[datapos++]);
+    }
+  }
+
+  return 0;
+}
+
+/*
+write the lz77-encoded data, which has lit, len and dist codes, to compressed stream using huffman trees.
+tree_ll: the tree for lit and len codes.
+tree_d: the tree for distance codes.
+*/
+static void writeLZ77data(size_t* bp, ucvector* out, const uivector* lz77_encoded,
+                          const HuffmanTree* tree_ll, const HuffmanTree* tree_d)
+{
+  size_t i = 0;
+  for(i = 0; i < lz77_encoded->size; i++)
+  {
+    unsigned val = lz77_encoded->data[i];
+    addHuffmanSymbol(bp, out, HuffmanTree_getCode(tree_ll, val), HuffmanTree_getLength(tree_ll, val));
+    if(val > 256) /*for a length code, 3 more things have to be added*/
+    {
+      unsigned length_index = val - FIRST_LENGTH_CODE_INDEX;
+      unsigned n_length_extra_bits = LENGTHEXTRA[length_index];
+      unsigned length_extra_bits = lz77_encoded->data[++i];
+
+      unsigned distance_code = lz77_encoded->data[++i];
+
+      unsigned distance_index = distance_code;
+      unsigned n_distance_extra_bits = DISTANCEEXTRA[distance_index];
+      unsigned distance_extra_bits = lz77_encoded->data[++i];
+
+      addBitsToStream(bp, out, length_extra_bits, n_length_extra_bits);
+      addHuffmanSymbol(bp, out, HuffmanTree_getCode(tree_d, distance_code),
+                       HuffmanTree_getLength(tree_d, distance_code));
+      addBitsToStream(bp, out, distance_extra_bits, n_distance_extra_bits);
+    }
+  }
+}
+
+/*Deflate for a block of type "dynamic", that is, with freely, optimally, created huffman trees*/
+static unsigned deflateDynamic(ucvector* out, size_t* bp, Hash* hash,
+                               const unsigned char* data, size_t datapos, size_t dataend,
+                               const LodePNGCompressSettings* settings, int final)
+{
+  unsigned error = 0;
+
+  /*
+  A block is compressed as follows: The PNG data is lz77 encoded, resulting in
+  literal bytes and length/distance pairs. This is then huffman compressed with
+  two huffman trees. One huffman tree is used for the lit and len values ("ll"),
+  another huffman tree is used for the dist values ("d"). These two trees are
+  stored using their code lengths, and to compress even more these code lengths
+  are also run-length encoded and huffman compressed. This gives a huffman tree
+  of code lengths "cl". The code lengths used to describe this third tree are
+  the code length code lengths ("clcl").
+  */
+
+  /*The lz77 encoded data, represented with integers since there will also be length and distance codes in it*/
+  uivector lz77_encoded;
+  HuffmanTree tree_ll; /*tree for lit,len values*/
+  HuffmanTree tree_d; /*tree for distance codes*/
+  HuffmanTree tree_cl; /*tree for encoding the code lengths representing tree_ll and tree_d*/
+  uivector frequencies_ll; /*frequency of lit,len codes*/
+  uivector frequencies_d; /*frequency of dist codes*/
+  uivector frequencies_cl; /*frequency of code length codes*/
+  uivector bitlen_lld; /*lit,len,dist code lengths (in bits), literally (without repeat codes).*/
+  uivector bitlen_lld_e; /*bitlen_lld encoded with repeat codes (this is a rudimentary run length compression)*/
+  /*bitlen_cl is the code length code lengths ("clcl"). The bit lengths of codes to represent tree_cl
+  (these are written as is in the file, it would be crazy to compress these using yet another huffman
+  tree that needs to be represented by yet another set of code lengths)*/
+  uivector bitlen_cl;
+  size_t datasize = dataend - datapos;
+
+  /*
+  Due to the huffman compression of huffman tree representations ("two levels"), there are some analogies:
+  bitlen_lld is to tree_cl what data is to tree_ll and tree_d.
+  bitlen_lld_e is to bitlen_lld what lz77_encoded is to data.
+  bitlen_cl is to bitlen_lld_e what bitlen_lld is to lz77_encoded.
+  */
+
+  unsigned BFINAL = final;
+  size_t numcodes_ll, numcodes_d, i;
+  unsigned HLIT, HDIST, HCLEN;
+
+  uivector_init(&lz77_encoded);
+  HuffmanTree_init(&tree_ll);
+  HuffmanTree_init(&tree_d);
+  HuffmanTree_init(&tree_cl);
+  uivector_init(&frequencies_ll);
+  uivector_init(&frequencies_d);
+  uivector_init(&frequencies_cl);
+  uivector_init(&bitlen_lld);
+  uivector_init(&bitlen_lld_e);
+  uivector_init(&bitlen_cl);
+
+  /*This while loop never loops due to a break at the end, it is here to
+  allow breaking out of it to the cleanup phase on error conditions.*/
+  while(!error)
+  {
+    if(settings->use_lz77)
+    {
+      error = encodeLZ77(&lz77_encoded, hash, data, datapos, dataend, settings->windowsize,
+                         settings->minmatch, settings->nicematch, settings->lazymatching);
+      if(error) break;
+    }
+    else
+    {
+      if(!uivector_resize(&lz77_encoded, datasize)) ERROR_BREAK(83 /*alloc fail*/);
+      /*no LZ77, but still will be Huffman compressed. lz77_encoded holds only datasize entries, so index it relative to datapos*/
+      for(i = datapos; i < dataend; i++) lz77_encoded.data[i - datapos] = data[i];
+    }
+
+    if(!uivector_resizev(&frequencies_ll, 286, 0)) ERROR_BREAK(83 /*alloc fail*/);
+    if(!uivector_resizev(&frequencies_d, 30, 0)) ERROR_BREAK(83 /*alloc fail*/);
+
+    /*Count the frequencies of lit, len and dist codes*/
+    for(i = 0; i < lz77_encoded.size; i++)
+    {
+      unsigned symbol = lz77_encoded.data[i];
+      frequencies_ll.data[symbol]++;
+      if(symbol > 256)
+      {
+        unsigned dist = lz77_encoded.data[i + 2];
+        frequencies_d.data[dist]++;
+        i += 3;
+      }
+    }
+    frequencies_ll.data[256] = 1; /*there will be exactly 1 end code, at the end of the block*/
+
+    /*Make both huffman trees, one for the lit and len codes, one for the dist codes*/
+    error = HuffmanTree_makeFromFrequencies(&tree_ll, frequencies_ll.data, 257, frequencies_ll.size, 15);
+    if(error) break;
+    /*2, not 1, is chosen for mincodes: some buggy PNG decoders require at least 2 symbols in the dist tree*/
+    error = HuffmanTree_makeFromFrequencies(&tree_d, frequencies_d.data, 2, frequencies_d.size, 15);
+    if(error) break;
+
+    numcodes_ll = tree_ll.numcodes; if(numcodes_ll > 286) numcodes_ll = 286;
+    numcodes_d = tree_d.numcodes; if(numcodes_d > 30) numcodes_d = 30;
+    /*store the code lengths of both generated trees in bitlen_lld*/
+    for(i = 0; i < numcodes_ll; i++) uivector_push_back(&bitlen_lld, HuffmanTree_getLength(&tree_ll, (unsigned)i));
+    for(i = 0; i < numcodes_d; i++) uivector_push_back(&bitlen_lld, HuffmanTree_getLength(&tree_d, (unsigned)i));
+
+    /*run-length compress bitlen_lld into bitlen_lld_e by using repeat codes 16 (copy length 3-6 times),
+    17 (3-10 zeroes), 18 (11-138 zeroes)*/
+    for(i = 0; i < (unsigned)bitlen_lld.size; i++)
+    {
+      unsigned j = 0; /*amount of repetitions*/
+      while(i + j + 1 < (unsigned)bitlen_lld.size && bitlen_lld.data[i + j + 1] == bitlen_lld.data[i]) j++;
+
+      if(bitlen_lld.data[i] == 0 && j >= 2) /*repeat code for zeroes*/
+      {
+        j++; /*include the first zero*/
+        if(j <= 10) /*repeat code 17 supports max 10 zeroes*/
+        {
+          uivector_push_back(&bitlen_lld_e, 17);
+          uivector_push_back(&bitlen_lld_e, j - 3);
+        }
+        else /*repeat code 18 supports max 138 zeroes*/
+        {
+          if(j > 138) j = 138;
+          uivector_push_back(&bitlen_lld_e, 18);
+          uivector_push_back(&bitlen_lld_e, j - 11);
+        }
+        i += (j - 1);
+      }
+      else if(j >= 3) /*repeat code for value other than zero*/
+      {
+        size_t k;
+        unsigned num = j / 6, rest = j % 6;
+        uivector_push_back(&bitlen_lld_e, bitlen_lld.data[i]);
+        for(k = 0; k < num; k++)
+        {
+          uivector_push_back(&bitlen_lld_e, 16);
+          uivector_push_back(&bitlen_lld_e, 6 - 3);
+        }
+        if(rest >= 3)
+        {
+          uivector_push_back(&bitlen_lld_e, 16);
+          uivector_push_back(&bitlen_lld_e, rest - 3);
+        }
+        else j -= rest;
+        i += j;
+      }
+      else /*too short to benefit from repeat code*/
+      {
+        uivector_push_back(&bitlen_lld_e, bitlen_lld.data[i]);
+      }
+    }
+
+    /*generate tree_cl, the huffmantree of huffmantrees*/
+
+    if(!uivector_resizev(&frequencies_cl, NUM_CODE_LENGTH_CODES, 0)) ERROR_BREAK(83 /*alloc fail*/);
+    for(i = 0; i < bitlen_lld_e.size; i++)
+    {
+      frequencies_cl.data[bitlen_lld_e.data[i]]++;
+      /*after a repeat code come the bits that specify the number of repetitions,
+      those don't need to be in the frequencies_cl calculation*/
+      if(bitlen_lld_e.data[i] >= 16) i++;
+    }
+
+    error = HuffmanTree_makeFromFrequencies(&tree_cl, frequencies_cl.data,
+                                            frequencies_cl.size, frequencies_cl.size, 7);
+    if(error) break;
+
+    if(!uivector_resize(&bitlen_cl, tree_cl.numcodes)) ERROR_BREAK(83 /*alloc fail*/);
+    for(i = 0; i < tree_cl.numcodes; i++)
+    {
+      /*lengths of the code length tree are in the order specified by deflate*/
+      bitlen_cl.data[i] = HuffmanTree_getLength(&tree_cl, CLCL_ORDER[i]);
+    }
+    while(bitlen_cl.data[bitlen_cl.size - 1] == 0 && bitlen_cl.size > 4)
+    {
+      /*remove zeros at the end, but minimum size must be 4*/
+      if(!uivector_resize(&bitlen_cl, bitlen_cl.size - 1)) ERROR_BREAK(83 /*alloc fail*/);
+    }
+    if(error) break;
+
+    /*
+    Write everything into the output
+
+    After the BFINAL and BTYPE, the dynamic block consists of the following:
+    - 5 bits HLIT, 5 bits HDIST, 4 bits HCLEN
+    - (HCLEN+4)*3 bits code lengths of code length alphabet
+    - HLIT + 257 code lengths of lit/length alphabet (encoded using the code length
+      alphabet, + possible repetition codes 16, 17, 18)
+    - HDIST + 1 code lengths of distance alphabet (encoded using the code length
+      alphabet, + possible repetition codes 16, 17, 18)
+    - compressed data
+    - 256 (end code)
+    */
+
+    /*Write block type*/
+    addBitToStream(bp, out, BFINAL);
+    addBitToStream(bp, out, 0); /*first bit of BTYPE "dynamic"*/
+    addBitToStream(bp, out, 1); /*second bit of BTYPE "dynamic"*/
+
+    /*write the HLIT, HDIST and HCLEN values*/
+    HLIT = (unsigned)(numcodes_ll - 257);
+    HDIST = (unsigned)(numcodes_d - 1);
+    HCLEN = (unsigned)bitlen_cl.size - 4;
+    /*trim zeroes for HCLEN. HLIT and HDIST were already trimmed at tree creation*/
+    while(!bitlen_cl.data[HCLEN + 4 - 1] && HCLEN > 0) HCLEN--;
+    addBitsToStream(bp, out, HLIT, 5);
+    addBitsToStream(bp, out, HDIST, 5);
+    addBitsToStream(bp, out, HCLEN, 4);
+
+    /*write the code lengths of the code length alphabet*/
+    for(i = 0; i < HCLEN + 4; i++) addBitsToStream(bp, out, bitlen_cl.data[i], 3);
+
+    /*write the lengths of the lit/len AND the dist alphabet*/
+    for(i = 0; i < bitlen_lld_e.size; i++)
+    {
+      addHuffmanSymbol(bp, out, HuffmanTree_getCode(&tree_cl, bitlen_lld_e.data[i]),
+                       HuffmanTree_getLength(&tree_cl, bitlen_lld_e.data[i]));
+      /*extra bits of repeat codes*/
+      if(bitlen_lld_e.data[i] == 16) addBitsToStream(bp, out, bitlen_lld_e.data[++i], 2);
+      else if(bitlen_lld_e.data[i] == 17) addBitsToStream(bp, out, bitlen_lld_e.data[++i], 3);
+      else if(bitlen_lld_e.data[i] == 18) addBitsToStream(bp, out, bitlen_lld_e.data[++i], 7);
+    }
+
+    /*write the compressed data symbols*/
+    writeLZ77data(bp, out, &lz77_encoded, &tree_ll, &tree_d);
+    /*error: the length of the end code 256 must be larger than 0*/
+    if(HuffmanTree_getLength(&tree_ll, 256) == 0) ERROR_BREAK(64);
+
+    /*write the end code*/
+    addHuffmanSymbol(bp, out, HuffmanTree_getCode(&tree_ll, 256), HuffmanTree_getLength(&tree_ll, 256));
+
+    break; /*end of error-while*/
+  }
+
+  /*cleanup*/
+  uivector_cleanup(&lz77_encoded);
+  HuffmanTree_cleanup(&tree_ll);
+  HuffmanTree_cleanup(&tree_d);
+  HuffmanTree_cleanup(&tree_cl);
+  uivector_cleanup(&frequencies_ll);
+  uivector_cleanup(&frequencies_d);
+  uivector_cleanup(&frequencies_cl);
+  uivector_cleanup(&bitlen_lld_e);
+  uivector_cleanup(&bitlen_lld);
+  uivector_cleanup(&bitlen_cl);
+
+  return error;
+}
+
+static unsigned deflateFixed(ucvector* out, size_t* bp, Hash* hash,
+                             const unsigned char* data,
+                             size_t datapos, size_t dataend,
+                             const LodePNGCompressSettings* settings, int final)
+{
+  HuffmanTree tree_ll; /*tree for literal values and length codes*/
+  HuffmanTree tree_d; /*tree for distance codes*/
+
+  unsigned BFINAL = final;
+  unsigned error = 0;
+  size_t i;
+
+  HuffmanTree_init(&tree_ll);
+  HuffmanTree_init(&tree_d);
+
+  generateFixedLitLenTree(&tree_ll);
+  generateFixedDistanceTree(&tree_d);
+
+  addBitToStream(bp, out, BFINAL);
+  addBitToStream(bp, out, 1); /*first bit of BTYPE*/
+  addBitToStream(bp, out, 0); /*second bit of BTYPE*/
+
+  if(settings->use_lz77) /*LZ77 encoded*/
+  {
+    uivector lz77_encoded;
+    uivector_init(&lz77_encoded);
+    error = encodeLZ77(&lz77_encoded, hash, data, datapos, dataend, settings->windowsize,
+                       settings->minmatch, settings->nicematch, settings->lazymatching);
+    if(!error) writeLZ77data(bp, out, &lz77_encoded, &tree_ll, &tree_d);
+    uivector_cleanup(&lz77_encoded);
+  }
+  else /*no LZ77, but still will be Huffman compressed*/
+  {
+    for(i = datapos; i < dataend; i++)
+    {
+      addHuffmanSymbol(bp, out, HuffmanTree_getCode(&tree_ll, data[i]), HuffmanTree_getLength(&tree_ll, data[i]));
+    }
+  }
+  /*add END code*/
+  if(!error) addHuffmanSymbol(bp, out, HuffmanTree_getCode(&tree_ll, 256), HuffmanTree_getLength(&tree_ll, 256));
+
+  /*cleanup*/
+  HuffmanTree_cleanup(&tree_ll);
+  HuffmanTree_cleanup(&tree_d);
+
+  return error;
+}
+
+static unsigned lodepng_deflatev(ucvector* out, const unsigned char* in, size_t insize,
+                                 const LodePNGCompressSettings* settings)
+{
+  unsigned error = 0;
+  size_t i, blocksize, numdeflateblocks;
+  size_t bp = 0; /*the bit pointer*/
+  Hash hash;
+
+  if(settings->btype > 2) return 61;
+  else if(settings->btype == 0) return deflateNoCompression(out, in, insize);
+  else if(settings->btype == 1) blocksize = insize;
+  else /*if(settings->btype == 2)*/
+  {
+    blocksize = insize / 8 + 8;
+    if(blocksize < 65535) blocksize = 65535;
+  }
+
+  numdeflateblocks = (insize + blocksize - 1) / blocksize;
+  if(numdeflateblocks == 0) numdeflateblocks = 1;
+
+  error = hash_init(&hash, settings->windowsize);
+  if(error) return error;
+
+  for(i = 0; i < numdeflateblocks && !error; i++)
+  {
+    int final = i == numdeflateblocks - 1;
+    size_t start = i * blocksize;
+    size_t end = start + blocksize;
+    if(end > insize) end = insize;
+
+    if(settings->btype == 1) error = deflateFixed(out, &bp, &hash, in, start, end, settings, final);
+    else if(settings->btype == 2) error = deflateDynamic(out, &bp, &hash, in, start, end, settings, final);
+  }
+
+  hash_cleanup(&hash);
+
+  return error;
+}
+
+unsigned lodepng_deflate(unsigned char** out, size_t* outsize,
+                         const unsigned char* in, size_t insize,
+                         const LodePNGCompressSettings* settings)
+{
+  unsigned error;
+  ucvector v;
+  ucvector_init_buffer(&v, *out, *outsize);
+  error = lodepng_deflatev(&v, in, insize, settings);
+  *out = v.data;
+  *outsize = v.size;
+  return error;
+}
+
+static unsigned deflate(unsigned char** out, size_t* outsize,
+                        const unsigned char* in, size_t insize,
+                        const LodePNGCompressSettings* settings)
+{
+  if(settings->custom_deflate)
+  {
+    return settings->custom_deflate(out, outsize, in, insize, settings);
+  }
+  else
+  {
+    return lodepng_deflate(out, outsize, in, insize, settings);
+  }
+}
+
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / Adler32                                                                  */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+static unsigned update_adler32(unsigned adler, const unsigned char* data, unsigned len)
+{
+  unsigned s1 = adler & 0xffff;
+  unsigned s2 = (adler >> 16) & 0xffff;
+
+  while(len > 0)
+  {
+    /*at least 5550 sums can be done before the sums overflow, saving a lot of modulo operations*/
+    unsigned amount = len > 5550 ? 5550 : len;
+    len -= amount;
+    while(amount > 0)
+    {
+      s1 += (*data++);
+      s2 += s1;
+      amount--;
+    }
+    s1 %= 65521;
+    s2 %= 65521;
+  }
+
+  return (s2 << 16) | s1;
+}
+
+/*Return the adler32 of the bytes data[0..len-1]*/
+static unsigned adler32(const unsigned char* data, unsigned len)
+{
+  return update_adler32(1L, data, len);
+}
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / Zlib                                                                   / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#ifdef LODEPNG_COMPILE_DECODER
+
+unsigned lodepng_zlib_decompress(unsigned char** out, size_t* outsize, const unsigned char* in,
+                                 size_t insize, const LodePNGDecompressSettings* settings)
+{
+  unsigned error = 0;
+  unsigned CM, CINFO, FDICT;
+
+  if(insize < 2) return 53; /*error, size of zlib data too small*/
+  /*read information from zlib header*/
+  if((in[0] * 256 + in[1]) % 31 != 0)
+  {
+    /*error: 256 * in[0] + in[1] must be a multiple of 31, the FCHECK value is supposed to be made that way*/
+    return 24;
+  }
+
+  CM = in[0] & 15;
+  CINFO = (in[0] >> 4) & 15;
+  /*FCHECK = in[1] & 31;*/ /*FCHECK is already tested above*/
+  FDICT = (in[1] >> 5) & 1;
+  /*FLEVEL = (in[1] >> 6) & 3;*/ /*FLEVEL is not used here*/
+
+  if(CM != 8 || CINFO > 7)
+  {
+    /*error: only compression method 8 (deflate with a sliding window of at most 32K) is supported by the PNG spec*/
+    return 25;
+  }
+  if(FDICT != 0)
+  {
+    /*error: the specification of PNG says about the zlib stream:
+      "The additional flags shall not specify a preset dictionary."*/
+    return 26;
+  }
+
+  error = inflate(out, outsize, in + 2, insize - 2, settings);
+  if(error) return error;
+
+  if(!settings->ignore_adler32)
+  {
+    unsigned ADLER32 = lodepng_read32bitInt(&in[insize - 4]);
+    unsigned checksum = adler32(*out, (unsigned)(*outsize));
+    if(checksum != ADLER32) return 58; /*error, adler checksum not correct, data must be corrupted*/
+  }
+
+  return 0; /*no error*/
+}
+
+static unsigned zlib_decompress(unsigned char** out, size_t* outsize, const unsigned char* in,
+                                size_t insize, const LodePNGDecompressSettings* settings)
+{
+  if(settings->custom_zlib)
+  {
+    return settings->custom_zlib(out, outsize, in, insize, settings);
+  }
+  else
+  {
+    return lodepng_zlib_decompress(out, outsize, in, insize, settings);
+  }
+}
+
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#ifdef LODEPNG_COMPILE_ENCODER
+
+unsigned lodepng_zlib_compress(unsigned char** out, size_t* outsize, const unsigned char* in,
+                               size_t insize, const LodePNGCompressSettings* settings)
+{
+  /*initially, *out must be NULL and *outsize 0. If *out points to memory that was
+  not allocated by the allocator used here, growing the buffer will crash*/
+  ucvector outv;
+  size_t i;
+  unsigned error;
+  unsigned char* deflatedata = 0;
+  size_t deflatesize = 0;
+
+  unsigned ADLER32;
+  /*zlib data: 1 byte CMF (CM+CINFO), 1 byte FLG, deflate data, 4 byte ADLER32 checksum of the uncompressed data*/
+  unsigned CMF = 120; /*0b01111000: CM 8, CINFO 7. With CINFO 7, any window size up to 32768 can be used.*/
+  unsigned FLEVEL = 0;
+  unsigned FDICT = 0;
+  unsigned CMFFLG = 256 * CMF + FDICT * 32 + FLEVEL * 64;
+  unsigned FCHECK = 31 - CMFFLG % 31;
+  CMFFLG += FCHECK;
+
+  /*ucvector-controlled version of the output buffer, for dynamic array*/
+  ucvector_init_buffer(&outv, *out, *outsize);
+
+  ucvector_push_back(&outv, (unsigned char)(CMFFLG / 256));
+  ucvector_push_back(&outv, (unsigned char)(CMFFLG % 256));
+
+  error = deflate(&deflatedata, &deflatesize, in, insize, settings);
+
+  if(!error)
+  {
+    ADLER32 = adler32(in, (unsigned)insize);
+    for(i = 0; i < deflatesize; i++) ucvector_push_back(&outv, deflatedata[i]);
+    lodepng_free(deflatedata);
+    lodepng_add32bitInt(&outv, ADLER32);
+  }
+
+  *out = outv.data;
+  *outsize = outv.size;
+
+  return error;
+}
+
+/* compress using the default or custom zlib function */
+static unsigned zlib_compress(unsigned char** out, size_t* outsize, const unsigned char* in,
+                              size_t insize, const LodePNGCompressSettings* settings)
+{
+  if(settings->custom_zlib)
+  {
+    return settings->custom_zlib(out, outsize, in, insize, settings);
+  }
+  else
+  {
+    return lodepng_zlib_compress(out, outsize, in, insize, settings);
+  }
+}
+
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+#else /*no LODEPNG_COMPILE_ZLIB*/
+
+#ifdef LODEPNG_COMPILE_DECODER
+static unsigned zlib_decompress(unsigned char** out, size_t* outsize, const unsigned char* in,
+                                size_t insize, const LodePNGDecompressSettings* settings)
+{
+  if (!settings->custom_zlib) return 87; /*no custom zlib function provided */
+  return settings->custom_zlib(out, outsize, in, insize, settings);
+}
+#endif /*LODEPNG_COMPILE_DECODER*/
+#ifdef LODEPNG_COMPILE_ENCODER
+static unsigned zlib_compress(unsigned char** out, size_t* outsize, const unsigned char* in,
+                              size_t insize, const LodePNGCompressSettings* settings)
+{
+  if (!settings->custom_zlib) return 87; /*no custom zlib function provided */
+  return settings->custom_zlib(out, outsize, in, insize, settings);
+}
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+#endif /*LODEPNG_COMPILE_ZLIB*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#ifdef LODEPNG_COMPILE_ENCODER
+
+/*this is a good tradeoff between speed and compression ratio*/
+#define DEFAULT_WINDOWSIZE 2048
+
+void lodepng_compress_settings_init(LodePNGCompressSettings* settings)
+{
+  /*compress with dynamic huffman tree (not in the mathematical sense, just not the predefined one)*/
+  settings->btype = 2;
+  settings->use_lz77 = 1;
+  settings->windowsize = DEFAULT_WINDOWSIZE;
+  settings->minmatch = 3;
+  settings->nicematch = 128;
+  settings->lazymatching = 1;
+
+  settings->custom_zlib = 0;
+  settings->custom_deflate = 0;
+  settings->custom_context = 0;
+}
+
+const LodePNGCompressSettings lodepng_default_compress_settings = {2, 1, DEFAULT_WINDOWSIZE, 3, 128, 1, 0, 0, 0};
+
+
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+#ifdef LODEPNG_COMPILE_DECODER
+
+void lodepng_decompress_settings_init(LodePNGDecompressSettings* settings)
+{
+  settings->ignore_adler32 = 0;
+
+  settings->custom_zlib = 0;
+  settings->custom_inflate = 0;
+  settings->custom_context = 0;
+}
+
+const LodePNGDecompressSettings lodepng_default_decompress_settings = {0, 0, 0, 0};
+
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* // End of Zlib related code. Begin of PNG related code.                 // */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#ifdef LODEPNG_COMPILE_PNG
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / CRC32                                                                  / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/* CRC polynomial: 0xedb88320 */
+static unsigned lodepng_crc32_table[256] = {
+           0u, 1996959894u, 3993919788u, 2567524794u,  124634137u, 1886057615u, 3915621685u, 2657392035u,
+   249268274u, 2044508324u, 3772115230u, 2547177864u,  162941995u, 2125561021u, 3887607047u, 2428444049u,
+   498536548u, 1789927666u, 4089016648u, 2227061214u,  450548861u, 1843258603u, 4107580753u, 2211677639u,
+   325883990u, 1684777152u, 4251122042u, 2321926636u,  335633487u, 1661365465u, 4195302755u, 2366115317u,
+   997073096u, 1281953886u, 3579855332u, 2724688242u, 1006888145u, 1258607687u, 3524101629u, 2768942443u,
+   901097722u, 1119000684u, 3686517206u, 2898065728u,  853044451u, 1172266101u, 3705015759u, 2882616665u,
+   651767980u, 1373503546u, 3369554304u, 3218104598u,  565507253u, 1454621731u, 3485111705u, 3099436303u,
+   671266974u, 1594198024u, 3322730930u, 2970347812u,  795835527u, 1483230225u, 3244367275u, 3060149565u,
+  1994146192u,   31158534u, 2563907772u, 4023717930u, 1907459465u,  112637215u, 2680153253u, 3904427059u,
+  2013776290u,  251722036u, 2517215374u, 3775830040u, 2137656763u,  141376813u, 2439277719u, 3865271297u,
+  1802195444u,  476864866u, 2238001368u, 4066508878u, 1812370925u,  453092731u, 2181625025u, 4111451223u,
+  1706088902u,  314042704u, 2344532202u, 4240017532u, 1658658271u,  366619977u, 2362670323u, 4224994405u,
+  1303535960u,  984961486u, 2747007092u, 3569037538u, 1256170817u, 1037604311u, 2765210733u, 3554079995u,
+  1131014506u,  879679996u, 2909243462u, 3663771856u, 1141124467u,  855842277u, 2852801631u, 3708648649u,
+  1342533948u,  654459306u, 3188396048u, 3373015174u, 1466479909u,  544179635u, 3110523913u, 3462522015u,
+  1591671054u,  702138776u, 2966460450u, 3352799412u, 1504918807u,  783551873u, 3082640443u, 3233442989u,
+  3988292384u, 2596254646u,   62317068u, 1957810842u, 3939845945u, 2647816111u,   81470997u, 1943803523u,
+  3814918930u, 2489596804u,  225274430u, 2053790376u, 3826175755u, 2466906013u,  167816743u, 2097651377u,
+  4027552580u, 2265490386u,  503444072u, 1762050814u, 4150417245u, 2154129355u,  426522225u, 1852507879u,
+  4275313526u, 2312317920u,  282753626u, 1742555852u, 4189708143u, 2394877945u,  397917763u, 1622183637u,
+  3604390888u, 2714866558u,  953729732u, 1340076626u, 3518719985u, 2797360999u, 1068828381u, 1219638859u,
+  3624741850u, 2936675148u,  906185462u, 1090812512u, 3747672003u, 2825379669u,  829329135u, 1181335161u,
+  3412177804u, 3160834842u,  628085408u, 1382605366u, 3423369109u, 3138078467u,  570562233u, 1426400815u,
+  3317316542u, 2998733608u,  733239954u, 1555261956u, 3268935591u, 3050360625u,  752459403u, 1541320221u,
+  2607071920u, 3965973030u, 1969922972u,   40735498u, 2617837225u, 3943577151u, 1913087877u,   83908371u,
+  2512341634u, 3803740692u, 2075208622u,  213261112u, 2463272603u, 3855990285u, 2094854071u,  198958881u,
+  2262029012u, 4057260610u, 1759359992u,  534414190u, 2176718541u, 4139329115u, 1873836001u,  414664567u,
+  2282248934u, 4279200368u, 1711684554u,  285281116u, 2405801727u, 4167216745u, 1634467795u,  376229701u,
+  2685067896u, 3608007406u, 1308918612u,  956543938u, 2808555105u, 3495958263u, 1231636301u, 1047427035u,
+  2932959818u, 3654703836u, 1088359270u,  936918000u, 2847714899u, 3736837829u, 1202900863u,  817233897u,
+  3183342108u, 3401237130u, 1404277552u,  615818150u, 3134207493u, 3453421203u, 1423857449u,  601450431u,
+  3009837614u, 3294710456u, 1567103746u,  711928724u, 3020668471u, 3272380065u, 1510334235u,  755167117u
+};
+
+/*Return the CRC of the bytes buf[0..len-1].*/
+unsigned lodepng_crc32(const unsigned char* buf, size_t len)
+{
+  unsigned c = 0xffffffffL;
+  size_t n;
+
+  for(n = 0; n < len; n++)
+  {
+    c = lodepng_crc32_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
+  }
+  return c ^ 0xffffffffL;
+}
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / Reading and writing single bits and bytes from/to stream for LodePNG   / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+static unsigned char readBitFromReversedStream(size_t* bitpointer, const unsigned char* bitstream)
+{
+  unsigned char result = (unsigned char)((bitstream[(*bitpointer) >> 3] >> (7 - ((*bitpointer) & 0x7))) & 1);
+  (*bitpointer)++;
+  return result;
+}
+
+static unsigned readBitsFromReversedStream(size_t* bitpointer, const unsigned char* bitstream, size_t nbits)
+{
+  unsigned result = 0;
+  size_t i;
+  for(i = nbits - 1; i < nbits; i--) /*i is unsigned: the loop ends when i wraps around past 0*/
+  {
+    result += (unsigned)readBitFromReversedStream(bitpointer, bitstream) << i;
+  }
+  return result;
+}
+
+#ifdef LODEPNG_COMPILE_DECODER
+static void setBitOfReversedStream0(size_t* bitpointer, unsigned char* bitstream, unsigned char bit)
+{
+  /*the current bit in bitstream must be 0 for this to work*/
+  if(bit)
+  {
+    /*in the reversed stream, an earlier bit is stored in a more significant bit of its byte*/
+    bitstream[(*bitpointer) >> 3] |= (bit << (7 - ((*bitpointer) & 0x7)));
+  }
+  (*bitpointer)++;
+}
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+static void setBitOfReversedStream(size_t* bitpointer, unsigned char* bitstream, unsigned char bit)
+{
+  /*the current bit in bitstream may be 0 or 1 for this to work*/
+  if(bit == 0) bitstream[(*bitpointer) >> 3] &=  (unsigned char)(~(1 << (7 - ((*bitpointer) & 0x7))));
+  else         bitstream[(*bitpointer) >> 3] |=  (1 << (7 - ((*bitpointer) & 0x7)));
+  (*bitpointer)++;
+}
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / PNG chunks                                                             / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+unsigned lodepng_chunk_length(const unsigned char* chunk)
+{
+  return lodepng_read32bitInt(&chunk[0]);
+}
+
+void lodepng_chunk_type(char type[5], const unsigned char* chunk)
+{
+  unsigned i;
+  for(i = 0; i < 4; i++) type[i] = chunk[4 + i];
+  type[4] = 0; /*null termination char*/
+}
+
+unsigned char lodepng_chunk_type_equals(const unsigned char* chunk, const char* type)
+{
+  if(strlen(type) != 4) return 0;
+  return (chunk[4] == type[0] && chunk[5] == type[1] && chunk[6] == type[2] && chunk[7] == type[3]);
+}
+
+unsigned char lodepng_chunk_ancillary(const unsigned char* chunk)
+{
+  return((chunk[4] & 32) != 0);
+}
+
+unsigned char lodepng_chunk_private(const unsigned char* chunk)
+{
+  return((chunk[6] & 32) != 0);
+}
+
+unsigned char lodepng_chunk_safetocopy(const unsigned char* chunk)
+{
+  return((chunk[7] & 32) != 0);
+}
+
+unsigned char* lodepng_chunk_data(unsigned char* chunk)
+{
+  return &chunk[8];
+}
+
+const unsigned char* lodepng_chunk_data_const(const unsigned char* chunk)
+{
+  return &chunk[8];
+}
+
+unsigned lodepng_chunk_check_crc(const unsigned char* chunk)
+{
+  unsigned length = lodepng_chunk_length(chunk);
+  unsigned CRC = lodepng_read32bitInt(&chunk[length + 8]);
+  /*the CRC is taken of the data and the 4 chunk type letters, not the length*/
+  unsigned checksum = lodepng_crc32(&chunk[4], length + 4);
+  if(CRC != checksum) return 1;
+  else return 0;
+}
+
+void lodepng_chunk_generate_crc(unsigned char* chunk)
+{
+  unsigned length = lodepng_chunk_length(chunk);
+  unsigned CRC = lodepng_crc32(&chunk[4], length + 4);
+  lodepng_set32bitInt(chunk + 8 + length, CRC);
+}
+
+unsigned char* lodepng_chunk_next(unsigned char* chunk)
+{
+  unsigned total_chunk_length = lodepng_chunk_length(chunk) + 12;
+  return &chunk[total_chunk_length];
+}
+
+const unsigned char* lodepng_chunk_next_const(const unsigned char* chunk)
+{
+  unsigned total_chunk_length = lodepng_chunk_length(chunk) + 12;
+  return &chunk[total_chunk_length];
+}
+
+unsigned lodepng_chunk_append(unsigned char** out, size_t* outlength, const unsigned char* chunk)
+{
+  unsigned i;
+  unsigned total_chunk_length = lodepng_chunk_length(chunk) + 12;
+  unsigned char *chunk_start, *new_buffer;
+  size_t new_length = (*outlength) + total_chunk_length;
+  if(new_length < total_chunk_length || new_length < (*outlength)) return 77; /*integer overflow happened*/
+
+  new_buffer = (unsigned char*)lodepng_realloc(*out, new_length);
+  if(!new_buffer) return 83; /*alloc fail*/
+  (*out) = new_buffer;
+  (*outlength) = new_length;
+  chunk_start = &(*out)[new_length - total_chunk_length];
+
+  for(i = 0; i < total_chunk_length; i++) chunk_start[i] = chunk[i];
+
+  return 0;
+}
+
+unsigned lodepng_chunk_create(unsigned char** out, size_t* outlength, unsigned length,
+                              const char* type, const unsigned char* data)
+{
+  unsigned i;
+  unsigned char *chunk, *new_buffer;
+  size_t new_length = (*outlength) + length + 12;
+  if(new_length < length + 12 || new_length < (*outlength)) return 77; /*integer overflow happened*/
+  new_buffer = (unsigned char*)lodepng_realloc(*out, new_length);
+  if(!new_buffer) return 83; /*alloc fail*/
+  (*out) = new_buffer;
+  (*outlength) = new_length;
+  chunk = &(*out)[(*outlength) - length - 12];
+
+  /*1: length*/
+  lodepng_set32bitInt(chunk, (unsigned)length);
+
+  /*2: chunk name (4 letters)*/
+  chunk[4] = type[0];
+  chunk[5] = type[1];
+  chunk[6] = type[2];
+  chunk[7] = type[3];
+
+  /*3: the data*/
+  for(i = 0; i < length; i++) chunk[8 + i] = data[i];
+
+  /*4: CRC (of the chunkname characters and the data)*/
+  lodepng_chunk_generate_crc(chunk);
+
+  return 0;
+}
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / Color types and such                                                   / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/*return type is a LodePNG error code*/
+static unsigned checkColorValidity(LodePNGColorType colortype, unsigned bd) /*bd = bitdepth*/
+{
+  switch(colortype)
+  {
+    case 0: if(!(bd == 1 || bd == 2 || bd == 4 || bd == 8 || bd == 16)) return 37; break; /*grey*/
+    case 2: if(!(                                 bd == 8 || bd == 16)) return 37; break; /*RGB*/
+    case 3: if(!(bd == 1 || bd == 2 || bd == 4 || bd == 8            )) return 37; break; /*palette*/
+    case 4: if(!(                                 bd == 8 || bd == 16)) return 37; break; /*grey + alpha*/
+    case 6: if(!(                                 bd == 8 || bd == 16)) return 37; break; /*RGBA*/
+    default: return 31;
+  }
+  return 0; /*allowed color type / bits combination*/
+}
+
+static unsigned getNumColorChannels(LodePNGColorType colortype)
+{
+  switch(colortype)
+  {
+    case 0: return 1; /*grey*/
+    case 2: return 3; /*RGB*/
+    case 3: return 1; /*palette*/
+    case 4: return 2; /*grey + alpha*/
+    case 6: return 4; /*RGBA*/
+  }
+  return 0; /*nonexistent color type*/
+}
+
+static unsigned lodepng_get_bpp_lct(LodePNGColorType colortype, unsigned bitdepth)
+{
+  /*bits per pixel is number of channels * bits per channel*/
+  return getNumColorChannels(colortype) * bitdepth;
+}
+
+/* ////////////////////////////////////////////////////////////////////////// */
+
+void lodepng_color_mode_init(LodePNGColorMode* info)
+{
+  info->key_defined = 0;
+  info->key_r = info->key_g = info->key_b = 0;
+  info->colortype = LCT_RGBA;
+  info->bitdepth = 8;
+  info->palette = 0;
+  info->palettesize = 0;
+}
+
+void lodepng_color_mode_cleanup(LodePNGColorMode* info)
+{
+  lodepng_palette_clear(info);
+}
+
+unsigned lodepng_color_mode_copy(LodePNGColorMode* dest, const LodePNGColorMode* source)
+{
+  size_t i;
+  lodepng_color_mode_cleanup(dest);
+  *dest = *source;
+  if(source->palette)
+  {
+    dest->palette = (unsigned char*)lodepng_malloc(1024);
+    if(!dest->palette && source->palettesize) return 83; /*alloc fail*/
+    for(i = 0; i < source->palettesize * 4; i++) dest->palette[i] = source->palette[i];
+  }
+  return 0;
+}
+
+static int lodepng_color_mode_equal(const LodePNGColorMode* a, const LodePNGColorMode* b)
+{
+  size_t i;
+  if(a->colortype != b->colortype) return 0;
+  if(a->bitdepth != b->bitdepth) return 0;
+  if(a->key_defined != b->key_defined) return 0;
+  if(a->key_defined)
+  {
+    if(a->key_r != b->key_r) return 0;
+    if(a->key_g != b->key_g) return 0;
+    if(a->key_b != b->key_b) return 0;
+  }
+  if(a->palettesize != b->palettesize) return 0;
+  for(i = 0; i < a->palettesize * 4; i++)
+  {
+    if(a->palette[i] != b->palette[i]) return 0;
+  }
+  return 1;
+}
+
+void lodepng_palette_clear(LodePNGColorMode* info)
+{
+  if(info->palette) lodepng_free(info->palette);
+  info->palette = 0;
+  info->palettesize = 0;
+}
+
+unsigned lodepng_palette_add(LodePNGColorMode* info,
+                             unsigned char r, unsigned char g, unsigned char b, unsigned char a)
+{
+  unsigned char* data;
+  /*the palette buffer is allocated once, at the exact size needed for the
+  maximum of 256 colors, so it never has to be resized afterwards*/
+  if(!info->palette) /*allocate palette if empty*/
+  {
+    /*room for 256 colors with 4 bytes each*/
+    data = (unsigned char*)lodepng_realloc(info->palette, 1024);
+    if(!data) return 83; /*alloc fail*/
+    else info->palette = data;
+  }
+  info->palette[4 * info->palettesize + 0] = r;
+  info->palette[4 * info->palettesize + 1] = g;
+  info->palette[4 * info->palettesize + 2] = b;
+  info->palette[4 * info->palettesize + 3] = a;
+  info->palettesize++;
+  return 0;
+}
+
+unsigned lodepng_get_bpp(const LodePNGColorMode* info)
+{
+  /*calculate bits per pixel out of colortype and bitdepth*/
+  return lodepng_get_bpp_lct(info->colortype, info->bitdepth);
+}
+
+unsigned lodepng_get_channels(const LodePNGColorMode* info)
+{
+  return getNumColorChannels(info->colortype);
+}
+
+unsigned lodepng_is_greyscale_type(const LodePNGColorMode* info)
+{
+  return info->colortype == LCT_GREY || info->colortype == LCT_GREY_ALPHA;
+}
+
+unsigned lodepng_is_alpha_type(const LodePNGColorMode* info)
+{
+  return (info->colortype & 4) != 0; /*4 or 6*/
+}
+
+unsigned lodepng_is_palette_type(const LodePNGColorMode* info)
+{
+  return info->colortype == LCT_PALETTE;
+}
+
+unsigned lodepng_has_palette_alpha(const LodePNGColorMode* info)
+{
+  size_t i;
+  for(i = 0; i < info->palettesize; i++)
+  {
+    if(info->palette[i * 4 + 3] < 255) return 1;
+  }
+  return 0;
+}
+
+unsigned lodepng_can_have_alpha(const LodePNGColorMode* info)
+{
+  return info->key_defined
+      || lodepng_is_alpha_type(info)
+      || lodepng_has_palette_alpha(info);
+}
+
+size_t lodepng_get_raw_size(unsigned w, unsigned h, const LodePNGColorMode* color)
+{
+  return (w * h * lodepng_get_bpp(color) + 7) / 8;
+}
+
+size_t lodepng_get_raw_size_lct(unsigned w, unsigned h, LodePNGColorType colortype, unsigned bitdepth)
+{
+  return (w * h * lodepng_get_bpp_lct(colortype, bitdepth) + 7) / 8;
+}
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+
+static void LodePNGUnknownChunks_init(LodePNGInfo* info)
+{
+  unsigned i;
+  for(i = 0; i < 3; i++) info->unknown_chunks_data[i] = 0;
+  for(i = 0; i < 3; i++) info->unknown_chunks_size[i] = 0;
+}
+
+static void LodePNGUnknownChunks_cleanup(LodePNGInfo* info)
+{
+  unsigned i;
+  for(i = 0; i < 3; i++) lodepng_free(info->unknown_chunks_data[i]);
+}
+
+static unsigned LodePNGUnknownChunks_copy(LodePNGInfo* dest, const LodePNGInfo* src)
+{
+  unsigned i;
+
+  LodePNGUnknownChunks_cleanup(dest);
+
+  for(i = 0; i < 3; i++)
+  {
+    size_t j;
+    dest->unknown_chunks_size[i] = src->unknown_chunks_size[i];
+    dest->unknown_chunks_data[i] = (unsigned char*)lodepng_malloc(src->unknown_chunks_size[i]);
+    if(!dest->unknown_chunks_data[i] && dest->unknown_chunks_size[i]) return 83; /*alloc fail*/
+    for(j = 0; j < src->unknown_chunks_size[i]; j++)
+    {
+      dest->unknown_chunks_data[i][j] = src->unknown_chunks_data[i][j];
+    }
+  }
+
+  return 0;
+}
+
+/******************************************************************************/
+
+static void LodePNGText_init(LodePNGInfo* info)
+{
+  info->text_num = 0;
+  info->text_keys = NULL;
+  info->text_strings = NULL;
+}
+
+static void LodePNGText_cleanup(LodePNGInfo* info)
+{
+  size_t i;
+  for(i = 0; i < info->text_num; i++)
+  {
+    string_cleanup(&info->text_keys[i]);
+    string_cleanup(&info->text_strings[i]);
+  }
+  lodepng_free(info->text_keys);
+  lodepng_free(info->text_strings);
+}
+
+static unsigned LodePNGText_copy(LodePNGInfo* dest, const LodePNGInfo* source)
+{
+  size_t i = 0;
+  dest->text_keys = 0;
+  dest->text_strings = 0;
+  dest->text_num = 0;
+  for(i = 0; i < source->text_num; i++)
+  {
+    CERROR_TRY_RETURN(lodepng_add_text(dest, source->text_keys[i], source->text_strings[i]));
+  }
+  return 0;
+}
+
+void lodepng_clear_text(LodePNGInfo* info)
+{
+  LodePNGText_cleanup(info);
+}
+
+unsigned lodepng_add_text(LodePNGInfo* info, const char* key, const char* str)
+{
+  char** new_keys = (char**)(lodepng_realloc(info->text_keys, sizeof(char*) * (info->text_num + 1)));
+  char** new_strings = (char**)(lodepng_realloc(info->text_strings, sizeof(char*) * (info->text_num + 1)));
+  if(!new_keys || !new_strings)
+  {
+    lodepng_free(new_keys);
+    lodepng_free(new_strings);
+    return 83; /*alloc fail*/
+  }
+
+  info->text_num++;
+  info->text_keys = new_keys;
+  info->text_strings = new_strings;
+
+  string_init(&info->text_keys[info->text_num - 1]);
+  string_set(&info->text_keys[info->text_num - 1], key);
+
+  string_init(&info->text_strings[info->text_num - 1]);
+  string_set(&info->text_strings[info->text_num - 1], str);
+
+  return 0;
+}
+
+/******************************************************************************/
+
+static void LodePNGIText_init(LodePNGInfo* info)
+{
+  info->itext_num = 0;
+  info->itext_keys = NULL;
+  info->itext_langtags = NULL;
+  info->itext_transkeys = NULL;
+  info->itext_strings = NULL;
+}
+
+static void LodePNGIText_cleanup(LodePNGInfo* info)
+{
+  size_t i;
+  for(i = 0; i < info->itext_num; i++)
+  {
+    string_cleanup(&info->itext_keys[i]);
+    string_cleanup(&info->itext_langtags[i]);
+    string_cleanup(&info->itext_transkeys[i]);
+    string_cleanup(&info->itext_strings[i]);
+  }
+  lodepng_free(info->itext_keys);
+  lodepng_free(info->itext_langtags);
+  lodepng_free(info->itext_transkeys);
+  lodepng_free(info->itext_strings);
+}
+
+static unsigned LodePNGIText_copy(LodePNGInfo* dest, const LodePNGInfo* source)
+{
+  size_t i = 0;
+  dest->itext_keys = 0;
+  dest->itext_langtags = 0;
+  dest->itext_transkeys = 0;
+  dest->itext_strings = 0;
+  dest->itext_num = 0;
+  for(i = 0; i < source->itext_num; i++)
+  {
+    CERROR_TRY_RETURN(lodepng_add_itext(dest, source->itext_keys[i], source->itext_langtags[i],
+                                        source->itext_transkeys[i], source->itext_strings[i]));
+  }
+  return 0;
+}
+
+void lodepng_clear_itext(LodePNGInfo* info)
+{
+  LodePNGIText_cleanup(info);
+}
+
+unsigned lodepng_add_itext(LodePNGInfo* info, const char* key, const char* langtag,
+                           const char* transkey, const char* str)
+{
+  char** new_keys = (char**)(lodepng_realloc(info->itext_keys, sizeof(char*) * (info->itext_num + 1)));
+  char** new_langtags = (char**)(lodepng_realloc(info->itext_langtags, sizeof(char*) * (info->itext_num + 1)));
+  char** new_transkeys = (char**)(lodepng_realloc(info->itext_transkeys, sizeof(char*) * (info->itext_num + 1)));
+  char** new_strings = (char**)(lodepng_realloc(info->itext_strings, sizeof(char*) * (info->itext_num + 1)));
+  if(!new_keys || !new_langtags || !new_transkeys || !new_strings)
+  {
+    lodepng_free(new_keys);
+    lodepng_free(new_langtags);
+    lodepng_free(new_transkeys);
+    lodepng_free(new_strings);
+    return 83; /*alloc fail*/
+  }
+
+  info->itext_num++;
+  info->itext_keys = new_keys;
+  info->itext_langtags = new_langtags;
+  info->itext_transkeys = new_transkeys;
+  info->itext_strings = new_strings;
+
+  string_init(&info->itext_keys[info->itext_num - 1]);
+  string_set(&info->itext_keys[info->itext_num - 1], key);
+
+  string_init(&info->itext_langtags[info->itext_num - 1]);
+  string_set(&info->itext_langtags[info->itext_num - 1], langtag);
+
+  string_init(&info->itext_transkeys[info->itext_num - 1]);
+  string_set(&info->itext_transkeys[info->itext_num - 1], transkey);
+
+  string_init(&info->itext_strings[info->itext_num - 1]);
+  string_set(&info->itext_strings[info->itext_num - 1], str);
+
+  return 0;
+}
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+
+void lodepng_info_init(LodePNGInfo* info)
+{
+  lodepng_color_mode_init(&info->color);
+  info->interlace_method = 0;
+  info->compression_method = 0;
+  info->filter_method = 0;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  info->background_defined = 0;
+  info->background_r = info->background_g = info->background_b = 0;
+
+  LodePNGText_init(info);
+  LodePNGIText_init(info);
+
+  info->time_defined = 0;
+  info->phys_defined = 0;
+
+  LodePNGUnknownChunks_init(info);
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+}
+
+void lodepng_info_cleanup(LodePNGInfo* info)
+{
+  lodepng_color_mode_cleanup(&info->color);
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  LodePNGText_cleanup(info);
+  LodePNGIText_cleanup(info);
+
+  LodePNGUnknownChunks_cleanup(info);
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+}
+
+unsigned lodepng_info_copy(LodePNGInfo* dest, const LodePNGInfo* source)
+{
+  lodepng_info_cleanup(dest);
+  *dest = *source;
+  lodepng_color_mode_init(&dest->color);
+  CERROR_TRY_RETURN(lodepng_color_mode_copy(&dest->color, &source->color));
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  CERROR_TRY_RETURN(LodePNGText_copy(dest, source));
+  CERROR_TRY_RETURN(LodePNGIText_copy(dest, source));
+
+  LodePNGUnknownChunks_init(dest);
+  CERROR_TRY_RETURN(LodePNGUnknownChunks_copy(dest, source));
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+  return 0;
+}
+
+void lodepng_info_swap(LodePNGInfo* a, LodePNGInfo* b)
+{
+  LodePNGInfo temp = *a;
+  *a = *b;
+  *b = temp;
+}
+
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/*index: bitgroup index, bits: bitgroup size(1, 2 or 4), in: bitgroup value, out: octet array to add bits to*/
+static void addColorBits(unsigned char* out, size_t index, unsigned bits, unsigned in)
+{
+  unsigned m = bits == 1 ? 7 : bits == 2 ? 3 : 1; /*8 / bits - 1*/
+  /*p = the partial index in the byte, e.g. with 4 palettebits it is 0 for first half or 1 for second half*/
+  unsigned p = index & m;
+  in &= (1 << bits) - 1; /*filter out any other bits of the input value*/
+  in = in << (bits * (m - p));
+  if(p == 0) out[index * bits / 8] = in;
+  else out[index * bits / 8] |= in;
+}
+
+typedef struct ColorTree ColorTree;
+
+/*
+One node of a color tree.
+This is the data structure used to count the number of unique colors and to get a palette
+index for a color. It's like an octree, but because the alpha channel is used too, each
+node has 16 instead of 8 children.
+*/
+struct ColorTree
+{
+  ColorTree* children[16]; /*up to 16 pointers to ColorTree of next level*/
+  int index; /*the payload. Only has a meaningful value if this is in the last level*/
+};
+
+static void color_tree_init(ColorTree* tree)
+{
+  int i;
+  for(i = 0; i < 16; i++) tree->children[i] = 0;
+  tree->index = -1;
+}
+
+static void color_tree_cleanup(ColorTree* tree)
+{
+  int i;
+  for(i = 0; i < 16; i++)
+  {
+    if(tree->children[i])
+    {
+      color_tree_cleanup(tree->children[i]);
+      lodepng_free(tree->children[i]);
+    }
+  }
+}
+
+/*returns -1 if color not present, its index otherwise*/
+static int color_tree_get(ColorTree* tree, unsigned char r, unsigned char g, unsigned char b, unsigned char a)
+{
+  int bit = 0;
+  for(bit = 0; bit < 8; bit++)
+  {
+    int i = 8 * ((r >> bit) & 1) + 4 * ((g >> bit) & 1) + 2 * ((b >> bit) & 1) + 1 * ((a >> bit) & 1);
+    if(!tree->children[i]) return -1;
+    else tree = tree->children[i];
+  }
+  return tree ? tree->index : -1;
+}
+
+#ifdef LODEPNG_COMPILE_ENCODER
+static int color_tree_has(ColorTree* tree, unsigned char r, unsigned char g, unsigned char b, unsigned char a)
+{
+  return color_tree_get(tree, r, g, b, a) >= 0;
+}
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+/*color is not allowed to already exist.
+Index should be >= 0 (it's signed to be compatible with using -1 for "doesn't exist")*/
+static void color_tree_add(ColorTree* tree,
+                           unsigned char r, unsigned char g, unsigned char b, unsigned char a, int index)
+{
+  int bit;
+  for(bit = 0; bit < 8; bit++)
+  {
+    int i = 8 * ((r >> bit) & 1) + 4 * ((g >> bit) & 1) + 2 * ((b >> bit) & 1) + 1 * ((a >> bit) & 1);
+    if(!tree->children[i])
+    {
+      tree->children[i] = (ColorTree*)lodepng_malloc(sizeof(ColorTree));
+      color_tree_init(tree->children[i]);
+    }
+    tree = tree->children[i];
+  }
+  tree->index = index;
+}
+
+/*put a pixel, given its RGBA color, into image of any color type*/
+static unsigned rgba8ToPixel(unsigned char* out, size_t i,
+                             const LodePNGColorMode* mode, ColorTree* tree /*for palette*/,
+                             unsigned char r, unsigned char g, unsigned char b, unsigned char a)
+{
+  if(mode->colortype == LCT_GREY)
+  {
+    unsigned char grey = r; /*((unsigned short)r + g + b) / 3*/
+    if(mode->bitdepth == 8) out[i] = grey;
+    else if(mode->bitdepth == 16) out[i * 2 + 0] = out[i * 2 + 1] = grey;
+    else
+    {
+      /*take the most significant bits of grey*/
+      grey = (grey >> (8 - mode->bitdepth)) & ((1 << mode->bitdepth) - 1);
+      addColorBits(out, i, mode->bitdepth, grey);
+    }
+  }
+  else if(mode->colortype == LCT_RGB)
+  {
+    if(mode->bitdepth == 8)
+    {
+      out[i * 3 + 0] = r;
+      out[i * 3 + 1] = g;
+      out[i * 3 + 2] = b;
+    }
+    else
+    {
+      out[i * 6 + 0] = out[i * 6 + 1] = r;
+      out[i * 6 + 2] = out[i * 6 + 3] = g;
+      out[i * 6 + 4] = out[i * 6 + 5] = b;
+    }
+  }
+  else if(mode->colortype == LCT_PALETTE)
+  {
+    int index = color_tree_get(tree, r, g, b, a);
+    if(index < 0) return 82; /*color not in palette*/
+    if(mode->bitdepth == 8) out[i] = index;
+    else addColorBits(out, i, mode->bitdepth, index);
+  }
+  else if(mode->colortype == LCT_GREY_ALPHA)
+  {
+    unsigned char grey = r; /*((unsigned short)r + g + b) / 3*/
+    if(mode->bitdepth == 8)
+    {
+      out[i * 2 + 0] = grey;
+      out[i * 2 + 1] = a;
+    }
+    else if(mode->bitdepth == 16)
+    {
+      out[i * 4 + 0] = out[i * 4 + 1] = grey;
+      out[i * 4 + 2] = out[i * 4 + 3] = a;
+    }
+  }
+  else if(mode->colortype == LCT_RGBA)
+  {
+    if(mode->bitdepth == 8)
+    {
+      out[i * 4 + 0] = r;
+      out[i * 4 + 1] = g;
+      out[i * 4 + 2] = b;
+      out[i * 4 + 3] = a;
+    }
+    else
+    {
+      out[i * 8 + 0] = out[i * 8 + 1] = r;
+      out[i * 8 + 2] = out[i * 8 + 3] = g;
+      out[i * 8 + 4] = out[i * 8 + 5] = b;
+      out[i * 8 + 6] = out[i * 8 + 7] = a;
+    }
+  }
+
+  return 0; /*no error*/
+}
+
+/*put a pixel, given its RGBA16 color, into an image of any color type with 16-bit depth*/
+static unsigned rgba16ToPixel(unsigned char* out, size_t i,
+                              const LodePNGColorMode* mode,
+                              unsigned short r, unsigned short g, unsigned short b, unsigned short a)
+{
+  if(mode->bitdepth != 16) return 85; /*must be 16 for this function*/
+  if(mode->colortype == LCT_GREY)
+  {
+    unsigned short grey = r; /*((unsigned)r + g + b) / 3*/
+    out[i * 2 + 0] = (grey >> 8) & 255;
+    out[i * 2 + 1] = grey & 255;
+  }
+  else if(mode->colortype == LCT_RGB)
+  {
+    out[i * 6 + 0] = (r >> 8) & 255;
+    out[i * 6 + 1] = r & 255;
+    out[i * 6 + 2] = (g >> 8) & 255;
+    out[i * 6 + 3] = g & 255;
+    out[i * 6 + 4] = (b >> 8) & 255;
+    out[i * 6 + 5] = b & 255;
+  }
+  else if(mode->colortype == LCT_GREY_ALPHA)
+  {
+    unsigned short grey = r; /*((unsigned)r + g + b) / 3*/
+    out[i * 4 + 0] = (grey >> 8) & 255;
+    out[i * 4 + 1] = grey & 255;
+    out[i * 4 + 2] = (a >> 8) & 255;
+    out[i * 4 + 3] = a & 255;
+  }
+  else if(mode->colortype == LCT_RGBA)
+  {
+    out[i * 8 + 0] = (r >> 8) & 255;
+    out[i * 8 + 1] = r & 255;
+    out[i * 8 + 2] = (g >> 8) & 255;
+    out[i * 8 + 3] = g & 255;
+    out[i * 8 + 4] = (b >> 8) & 255;
+    out[i * 8 + 5] = b & 255;
+    out[i * 8 + 6] = (a >> 8) & 255;
+    out[i * 8 + 7] = a & 255;
+  }
+
+  return 0; /*no error*/
+}
+
+/*Get RGBA8 color of pixel with index i (y * width + x) from the raw image with given color type.*/
+static unsigned getPixelColorRGBA8(unsigned char* r, unsigned char* g,
+                                   unsigned char* b, unsigned char* a,
+                                   const unsigned char* in, size_t i,
+                                   const LodePNGColorMode* mode,
+                                   unsigned fix_png)
+{
+  if(mode->colortype == LCT_GREY)
+  {
+    if(mode->bitdepth == 8)
+    {
+      *r = *g = *b = in[i];
+      if(mode->key_defined && *r == mode->key_r) *a = 0;
+      else *a = 255;
+    }
+    else if(mode->bitdepth == 16)
+    {
+      *r = *g = *b = in[i * 2 + 0];
+      if(mode->key_defined && 256U * in[i * 2 + 0] + in[i * 2 + 1] == mode->key_r) *a = 0;
+      else *a = 255;
+    }
+    else
+    {
+      unsigned highest = ((1U << mode->bitdepth) - 1U); /*highest possible value for this bit depth*/
+      size_t j = i * mode->bitdepth;
+      unsigned value = readBitsFromReversedStream(&j, in, mode->bitdepth);
+      *r = *g = *b = (value * 255) / highest;
+      if(mode->key_defined && value == mode->key_r) *a = 0;
+      else *a = 255;
+    }
+  }
+  else if(mode->colortype == LCT_RGB)
+  {
+    if(mode->bitdepth == 8)
+    {
+      *r = in[i * 3 + 0]; *g = in[i * 3 + 1]; *b = in[i * 3 + 2];
+      if(mode->key_defined && *r == mode->key_r && *g == mode->key_g && *b == mode->key_b) *a = 0;
+      else *a = 255;
+    }
+    else
+    {
+      *r = in[i * 6 + 0];
+      *g = in[i * 6 + 2];
+      *b = in[i * 6 + 4];
+      if(mode->key_defined && 256U * in[i * 6 + 0] + in[i * 6 + 1] == mode->key_r
+         && 256U * in[i * 6 + 2] + in[i * 6 + 3] == mode->key_g
+         && 256U * in[i * 6 + 4] + in[i * 6 + 5] == mode->key_b) *a = 0;
+      else *a = 255;
+    }
+  }
+  else if(mode->colortype == LCT_PALETTE)
+  {
+    unsigned index;
+    if(mode->bitdepth == 8) index = in[i];
+    else
+    {
+      size_t j = i * mode->bitdepth;
+      index = readBitsFromReversedStream(&j, in, mode->bitdepth);
+    }
+
+    if(index >= mode->palettesize)
+    {
+      /*This is an error according to the PNG spec, but fix_png can ignore it*/
+      if(!fix_png) return (mode->bitdepth == 8 ? 46 : 47); /*index out of palette*/
+      *r = *g = *b = 0;
+      *a = 255;
+    }
+    else
+    {
+      *r = mode->palette[index * 4 + 0];
+      *g = mode->palette[index * 4 + 1];
+      *b = mode->palette[index * 4 + 2];
+      *a = mode->palette[index * 4 + 3];
+    }
+  }
+  else if(mode->colortype == LCT_GREY_ALPHA)
+  {
+    if(mode->bitdepth == 8)
+    {
+      *r = *g = *b = in[i * 2 + 0];
+      *a = in[i * 2 + 1];
+    }
+    else
+    {
+      *r = *g = *b = in[i * 4 + 0];
+      *a = in[i * 4 + 2];
+    }
+  }
+  else if(mode->colortype == LCT_RGBA)
+  {
+    if(mode->bitdepth == 8)
+    {
+      *r = in[i * 4 + 0];
+      *g = in[i * 4 + 1];
+      *b = in[i * 4 + 2];
+      *a = in[i * 4 + 3];
+    }
+    else
+    {
+      *r = in[i * 8 + 0];
+      *g = in[i * 8 + 2];
+      *b = in[i * 8 + 4];
+      *a = in[i * 8 + 6];
+    }
+  }
+
+  return 0; /*no error*/
+}
+
+/*Similar to getPixelColorRGBA8, but with all the for loops inside of the color
+mode test cases, optimized to convert the colors much faster when converting
+to RGBA or RGB with 8 bits per channel. buffer must be RGBA or RGB output with
+enough memory; if has_alpha is true the output is RGBA. mode is the color mode
+of the input buffer.*/
+static unsigned getPixelColorsRGBA8(unsigned char* buffer, size_t numpixels,
+                                    unsigned has_alpha, const unsigned char* in,
+                                    const LodePNGColorMode* mode,
+                                    unsigned fix_png)
+{
+  unsigned num_channels = has_alpha ? 4 : 3;
+  size_t i;
+  if(mode->colortype == LCT_GREY)
+  {
+    if(mode->bitdepth == 8)
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = buffer[1] = buffer[2] = in[i];
+        if(has_alpha) buffer[3] = mode->key_defined && in[i] == mode->key_r ? 0 : 255;
+      }
+    }
+    else if(mode->bitdepth == 16)
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = buffer[1] = buffer[2] = in[i * 2];
+        if(has_alpha) buffer[3] = mode->key_defined && 256U * in[i * 2 + 0] + in[i * 2 + 1] == mode->key_r ? 0 : 255;
+      }
+    }
+    else
+    {
+      unsigned highest = ((1U << mode->bitdepth) - 1U); /*highest possible value for this bit depth*/
+      size_t j = 0;
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        unsigned value = readBitsFromReversedStream(&j, in, mode->bitdepth);
+        buffer[0] = buffer[1] = buffer[2] = (value * 255) / highest;
+        if(has_alpha) buffer[3] = mode->key_defined && value == mode->key_r ? 0 : 255;
+      }
+    }
+  }
+  else if(mode->colortype == LCT_RGB)
+  {
+    if(mode->bitdepth == 8)
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = in[i * 3 + 0];
+        buffer[1] = in[i * 3 + 1];
+        buffer[2] = in[i * 3 + 2];
+        if(has_alpha) buffer[3] = mode->key_defined && buffer[0] == mode->key_r
+           && buffer[1] == mode->key_g && buffer[2] == mode->key_b ? 0 : 255;
+      }
+    }
+    else
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = in[i * 6 + 0];
+        buffer[1] = in[i * 6 + 2];
+        buffer[2] = in[i * 6 + 4];
+        if(has_alpha) buffer[3] = mode->key_defined
+           && 256U * in[i * 6 + 0] + in[i * 6 + 1] == mode->key_r
+           && 256U * in[i * 6 + 2] + in[i * 6 + 3] == mode->key_g
+           && 256U * in[i * 6 + 4] + in[i * 6 + 5] == mode->key_b ? 0 : 255;
+      }
+    }
+  }
+  else if(mode->colortype == LCT_PALETTE)
+  {
+    unsigned index;
+    size_t j = 0;
+    for(i = 0; i < numpixels; i++, buffer += num_channels)
+    {
+      if(mode->bitdepth == 8) index = in[i];
+      else index = readBitsFromReversedStream(&j, in, mode->bitdepth);
+
+      if(index >= mode->palettesize)
+      {
+        /*This is an error according to the PNG spec, but fix_png can ignore it*/
+        if(!fix_png) return (mode->bitdepth == 8 ? 46 : 47); /*index out of palette*/
+        buffer[0] = buffer[1] = buffer[2] = 0;
+        if(has_alpha) buffer[3] = 255;
+      }
+      else
+      {
+        buffer[0] = mode->palette[index * 4 + 0];
+        buffer[1] = mode->palette[index * 4 + 1];
+        buffer[2] = mode->palette[index * 4 + 2];
+        if(has_alpha) buffer[3] = mode->palette[index * 4 + 3];
+      }
+    }
+  }
+  else if(mode->colortype == LCT_GREY_ALPHA)
+  {
+    if(mode->bitdepth == 8)
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = buffer[1] = buffer[2] = in[i * 2 + 0];
+        if(has_alpha) buffer[3] = in[i * 2 + 1];
+      }
+    }
+    else
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = buffer[1] = buffer[2] = in[i * 4 + 0];
+        if(has_alpha) buffer[3] = in[i * 4 + 2];
+      }
+    }
+  }
+  else if(mode->colortype == LCT_RGBA)
+  {
+    if(mode->bitdepth == 8)
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = in[i * 4 + 0];
+        buffer[1] = in[i * 4 + 1];
+        buffer[2] = in[i * 4 + 2];
+        if(has_alpha) buffer[3] = in[i * 4 + 3];
+      }
+    }
+    else
+    {
+      for(i = 0; i < numpixels; i++, buffer += num_channels)
+      {
+        buffer[0] = in[i * 8 + 0];
+        buffer[1] = in[i * 8 + 2];
+        buffer[2] = in[i * 8 + 4];
+        if(has_alpha) buffer[3] = in[i * 8 + 6];
+      }
+    }
+  }
+
+  return 0; /*no error*/
+}
+
+/*Get RGBA16 color of pixel with index i (y * width + x) from the raw image with
+given color type, but the given color type must be 16-bit itself.*/
+static unsigned getPixelColorRGBA16(unsigned short* r, unsigned short* g, unsigned short* b, unsigned short* a,
+                                    const unsigned char* in, size_t i, const LodePNGColorMode* mode)
+{
+  if(mode->bitdepth != 16) return 85; /*error: this function only supports 16-bit input*/
+
+  if(mode->colortype == LCT_GREY)
+  {
+    *r = *g = *b = 256 * in[i * 2 + 0] + in[i * 2 + 1];
+    if(mode->key_defined && 256U * in[i * 2 + 0] + in[i * 2 + 1] == mode->key_r) *a = 0;
+    else *a = 65535;
+  }
+  else if(mode->colortype == LCT_RGB)
+  {
+    *r = 256 * in[i * 6 + 0] + in[i * 6 + 1];
+    *g = 256 * in[i * 6 + 2] + in[i * 6 + 3];
+    *b = 256 * in[i * 6 + 4] + in[i * 6 + 5];
+    if(mode->key_defined && 256U * in[i * 6 + 0] + in[i * 6 + 1] == mode->key_r
+       && 256U * in[i * 6 + 2] + in[i * 6 + 3] == mode->key_g
+       && 256U * in[i * 6 + 4] + in[i * 6 + 5] == mode->key_b) *a = 0;
+    else *a = 65535;
+  }
+  else if(mode->colortype == LCT_GREY_ALPHA)
+  {
+    *r = *g = *b = 256 * in[i * 4 + 0] + in[i * 4 + 1];
+    *a = 256 * in[i * 4 + 2] + in[i * 4 + 3];
+  }
+  else if(mode->colortype == LCT_RGBA)
+  {
+    *r = 256 * in[i * 8 + 0] + in[i * 8 + 1];
+    *g = 256 * in[i * 8 + 2] + in[i * 8 + 3];
+    *b = 256 * in[i * 8 + 4] + in[i * 8 + 5];
+    *a = 256 * in[i * 8 + 6] + in[i * 8 + 7];
+  }
+  else return 85; /*error: this function only supports 16-bit input, not palettes*/
+
+  return 0; /*no error*/
+}
+
+/*
+Converts raw pixel data from the color type in mode_in to the color type in mode_out.
+The return value is a LodePNG error code (0 means no error).
+The out buffer must have (w * h * bpp + 7) / 8 bytes, where bpp is the bits per pixel of the output
+color type (lodepng_get_bpp). For < 8 bpp images, there may _not_ be padding bits at the end of scanlines.
+*/
+unsigned lodepng_convert(unsigned char* out, const unsigned char* in,
+                         LodePNGColorMode* mode_out, const LodePNGColorMode* mode_in,
+                         unsigned w, unsigned h, unsigned fix_png)
+{
+  unsigned error = 0;
+  size_t i;
+  ColorTree tree;
+  size_t numpixels = (size_t)w * (size_t)h; /*cast first so the product is not truncated to unsigned*/
+
+  if(lodepng_color_mode_equal(mode_out, mode_in))
+  {
+    size_t numbytes = lodepng_get_raw_size(w, h, mode_in);
+    for(i = 0; i < numbytes; i++) out[i] = in[i];
+    return error;
+  }
+
+  if(mode_out->colortype == LCT_PALETTE)
+  {
+    size_t palsize = 1 << mode_out->bitdepth;
+    if(mode_out->palettesize < palsize) palsize = mode_out->palettesize;
+    color_tree_init(&tree);
+    for(i = 0; i < palsize; i++)
+    {
+      unsigned char* p = &mode_out->palette[i * 4];
+      color_tree_add(&tree, p[0], p[1], p[2], p[3], i);
+    }
+  }
+
+  if(mode_in->bitdepth == 16 && mode_out->bitdepth == 16)
+  {
+    for(i = 0; i < numpixels; i++)
+    {
+      unsigned short r = 0, g = 0, b = 0, a = 0;
+      error = getPixelColorRGBA16(&r, &g, &b, &a, in, i, mode_in);
+      if(error) break;
+      error = rgba16ToPixel(out, i, mode_out, r, g, b, a);
+      if(error) break;
+    }
+  }
+  else if(mode_out->bitdepth == 8 && mode_out->colortype == LCT_RGBA)
+  {
+    error = getPixelColorsRGBA8(out, numpixels, 1, in, mode_in, fix_png);
+  }
+  else if(mode_out->bitdepth == 8 && mode_out->colortype == LCT_RGB)
+  {
+    error = getPixelColorsRGBA8(out, numpixels, 0, in, mode_in, fix_png);
+  }
+  else
+  {
+    unsigned char r = 0, g = 0, b = 0, a = 0;
+    for(i = 0; i < numpixels; i++)
+    {
+      error = getPixelColorRGBA8(&r, &g, &b, &a, in, i, mode_in, fix_png);
+      if(error) break;
+      error = rgba8ToPixel(out, i, mode_out, &tree, r, g, b, a);
+      if(error) break;
+    }
+  }
+
+  if(mode_out->colortype == LCT_PALETTE)
+  {
+    color_tree_cleanup(&tree);
+  }
+
+  return error;
+}
+
+#ifdef LODEPNG_COMPILE_ENCODER
+
+typedef struct ColorProfile
+{
+  unsigned char sixteenbit; /*needs more than 8 bits per channel*/
+  unsigned char sixteenbit_done;
+
+
+  unsigned char colored; /*not greyscale*/
+  unsigned char colored_done;
+
+  unsigned char key; /*a color key is required, or more*/
+  unsigned short key_r; /*these values are always in 16-bit bitdepth in the profile*/
+  unsigned short key_g;
+  unsigned short key_b;
+  unsigned char alpha; /*alpha channel, or alpha palette, required*/
+  unsigned char alpha_done;
+
+  unsigned numcolors;
+  ColorTree tree; /*for listing the counted colors, up to 256*/
+  unsigned char* palette; /*size 1024. Remember up to the first 256 RGBA colors*/
+  unsigned maxnumcolors; /*if more than that amount counted*/
+  unsigned char numcolors_done;
+
+  unsigned greybits; /*amount of bits required for greyscale (1, 2, 4, 8). Does not take 16 bit into account.*/
+  unsigned char greybits_done;
+
+} ColorProfile;
+
+static void color_profile_init(ColorProfile* profile, const LodePNGColorMode* mode)
+{
+  profile->sixteenbit = 0;
+  profile->sixteenbit_done = mode->bitdepth == 16 ? 0 : 1;
+
+  profile->colored = 0;
+  profile->colored_done = lodepng_is_greyscale_type(mode) ? 1 : 0;
+
+  profile->key = 0;
+  profile->key_r = profile->key_g = profile->key_b = 0; /*read later even when no key was found*/
+  profile->alpha = 0;
+  profile->alpha_done = lodepng_can_have_alpha(mode) ? 0 : 1;
+
+  profile->numcolors = 0;
+  color_tree_init(&profile->tree);
+  profile->palette = (unsigned char*)lodepng_malloc(1024);
+  profile->maxnumcolors = 257;
+  if(lodepng_get_bpp(mode) <= 8)
+  {
+    int bpp = lodepng_get_bpp(mode);
+    profile->maxnumcolors = bpp == 1 ? 2 : (bpp == 2 ? 4 : (bpp == 4 ? 16 : 256));
+  }
+  profile->numcolors_done = 0;
+
+  profile->greybits = 1;
+  profile->greybits_done = lodepng_get_bpp(mode) == 1 ? 1 : 0;
+}
+
+static void color_profile_cleanup(ColorProfile* profile)
+{
+  color_tree_cleanup(&profile->tree);
+  lodepng_free(profile->palette);
+}
+
+/*function used for debug purposes with C++*/
+/*void printColorProfile(ColorProfile* p)
+{
+  std::cout << "sixteenbit: " << (int)p->sixteenbit << std::endl;
+  std::cout << "sixteenbit_done: " << (int)p->sixteenbit_done << std::endl;
+  std::cout << "colored: " << (int)p->colored << std::endl;
+  std::cout << "colored_done: " << (int)p->colored_done << std::endl;
+  std::cout << "key: " << (int)p->key << std::endl;
+  std::cout << "key_r: " << (int)p->key_r << std::endl;
+  std::cout << "key_g: " << (int)p->key_g << std::endl;
+  std::cout << "key_b: " << (int)p->key_b << std::endl;
+  std::cout << "alpha: " << (int)p->alpha << std::endl;
+  std::cout << "alpha_done: " << (int)p->alpha_done << std::endl;
+  std::cout << "numcolors: " << (int)p->numcolors << std::endl;
+  std::cout << "maxnumcolors: " << (int)p->maxnumcolors << std::endl;
+  std::cout << "numcolors_done: " << (int)p->numcolors_done << std::endl;
+  std::cout << "greybits: " << (int)p->greybits << std::endl;
+  std::cout << "greybits_done: " << (int)p->greybits_done << std::endl;
+}*/
+
+/*Returns how many bits are needed to represent the given value (max 8 bits)*/
+unsigned getValueRequiredBits(unsigned short value)
+{
+  if(value == 0 || value == 255) return 1;
+  /*The scaling of 2-bit and 4-bit values uses multiples of 85 and 17*/
+  if(value % 17 == 0) return value % 85 == 0 ? 2 : 4;
+  return 8;
+}
+
+/*profile must already have been initialized with mode.
+It's ok to set some parameters of profile to done already.*/
+static unsigned get_color_profile(ColorProfile* profile,
+                                  const unsigned char* in,
+                                  size_t numpixels /*must be full image size, for certain filesize based choices*/,
+                                  const LodePNGColorMode* mode,
+                                  unsigned fix_png)
+{
+  unsigned error = 0;
+  size_t i;
+
+  if(mode->bitdepth == 16)
+  {
+    for(i = 0; i < numpixels; i++)
+    {
+      unsigned short r, g, b, a;
+      error = getPixelColorRGBA16(&r, &g, &b, &a, in, i, mode);
+      if(error) break;
+
+      /*a color is considered good for 8-bit if the first byte and the second byte are equal
+        (so if it's divisible by 257), NOT necessarily if the second byte is 0*/
+      if(!profile->sixteenbit_done
+          && (((r & 255) != ((r >> 8) & 255))
+           || ((g & 255) != ((g >> 8) & 255))
+           || ((b & 255) != ((b >> 8) & 255))))
+      {
+        profile->sixteenbit = 1;
+        profile->sixteenbit_done = 1;
+        profile->greybits_done = 1; /*greybits is not applicable anymore at 16-bit*/
+        profile->numcolors_done = 1; /*counting colors no longer useful, palette doesn't support 16-bit*/
+      }
+
+      if(!profile->colored_done && (r != g || r != b))
+      {
+        profile->colored = 1;
+        profile->colored_done = 1;
+        profile->greybits_done = 1; /*greybits is not applicable anymore*/
+      }
+
+      if(!profile->alpha_done && a != 65535)
+      {
+        /*only use color key if numpixels large enough to justify tRNS chunk size*/
+        if(a == 0 && numpixels > 16 && !(profile->key && (r != profile->key_r || g != profile->key_g || b != profile->key_b)))
+        {
+          if(!profile->alpha && !profile->key)
+          {
+            profile->key = 1;
+            profile->key_r = r;
+            profile->key_g = g;
+            profile->key_b = b;
+          }
+        }
+        else
+        {
+          profile->alpha = 1;
+          profile->alpha_done = 1;
+          profile->greybits_done = 1; /*greybits is not applicable anymore*/
+        }
+      }
+
+      /* Color key cannot be used if an opaque pixel also has that RGB color. */
+      if(!profile->alpha_done && a == 65535 && profile->key
+          && r == profile->key_r && g == profile->key_g && b == profile->key_b)
+      {
+        profile->alpha = 1;
+        profile->alpha_done = 1;
+        profile->greybits_done = 1; /*greybits is not applicable anymore*/
+      }
+
+      if(!profile->greybits_done)
+      {
+        /*assuming 8-bit r, this test does not care about 16-bit*/
+        unsigned bits = getValueRequiredBits(r);
+        if(bits > profile->greybits) profile->greybits = bits;
+        if(profile->greybits >= 8) profile->greybits_done = 1;
+      }
+
+      if(!profile->numcolors_done)
+      {
+        /*assuming 8-bit rgba, this test does not care about 16-bit*/
+        if(!color_tree_has(&profile->tree, (unsigned char)r, (unsigned char)g, (unsigned char)b, (unsigned char)a))
+        {
+          color_tree_add(&profile->tree, (unsigned char)r, (unsigned char)g, (unsigned char)b, (unsigned char)a,
+            profile->numcolors);
+          if(profile->numcolors < 256)
+          {
+            unsigned char* p = profile->palette;
+            unsigned n = profile->numcolors; /*palette slot; named n to avoid shadowing the loop index i*/
+            p[n * 4 + 0] = (unsigned char)r;
+            p[n * 4 + 1] = (unsigned char)g;
+            p[n * 4 + 2] = (unsigned char)b;
+            p[n * 4 + 3] = (unsigned char)a;
+          }
+          profile->numcolors++;
+          if(profile->numcolors >= profile->maxnumcolors) profile->numcolors_done = 1;
+        }
+      }
+
+      if(profile->alpha_done && profile->numcolors_done
+      && profile->colored_done && profile->sixteenbit_done && profile->greybits_done)
+      {
+        break;
+      }
+    }
+  }
+  else /* < 16-bit */
+  {
+    for(i = 0; i < numpixels; i++)
+    {
+      unsigned char r = 0, g = 0, b = 0, a = 0;
+      error = getPixelColorRGBA8(&r, &g, &b, &a, in, i, mode, fix_png);
+      if(error) break;
+
+      if(!profile->colored_done && (r != g || r != b))
+      {
+        profile->colored = 1;
+        profile->colored_done = 1;
+        profile->greybits_done = 1; /*greybits is not applicable anymore*/
+      }
+
+      if(!profile->alpha_done && a != 255)
+      {
+        if(a == 0 && !(profile->key && (r != profile->key_r || g != profile->key_g || b != profile->key_b)))
+        {
+          if(!profile->key)
+          {
+            profile->key = 1;
+            profile->key_r = r;
+            profile->key_g = g;
+            profile->key_b = b;
+          }
+        }
+        else
+        {
+          profile->alpha = 1;
+          profile->alpha_done = 1;
+          profile->greybits_done = 1; /*greybits is not applicable anymore*/
+        }
+      }
+
+      /* Color key cannot be used if an opaque pixel also has that RGB color. */
+      if(!profile->alpha_done && a == 255 && profile->key
+          && r == profile->key_r && g == profile->key_g && b == profile->key_b)
+      {
+        profile->alpha = 1;
+        profile->alpha_done = 1;
+        profile->greybits_done = 1; /*greybits is not applicable anymore*/
+      }
+
+      if(!profile->greybits_done)
+      {
+        unsigned bits = getValueRequiredBits(r);
+        if(bits > profile->greybits) profile->greybits = bits;
+        if(profile->greybits >= 8) profile->greybits_done = 1;
+      }
+
+      if(!profile->numcolors_done)
+      {
+        if(!color_tree_has(&profile->tree, r, g, b, a))
+        {
+
+          color_tree_add(&profile->tree, r, g, b, a, profile->numcolors);
+          if(profile->numcolors < 256)
+          {
+            unsigned char* p = profile->palette;
+            unsigned n = profile->numcolors; /*palette slot; named n to avoid shadowing the loop index i*/
+            p[n * 4 + 0] = r;
+            p[n * 4 + 1] = g;
+            p[n * 4 + 2] = b;
+            p[n * 4 + 3] = a;
+          }
+          profile->numcolors++;
+          if(profile->numcolors >= profile->maxnumcolors) profile->numcolors_done = 1;
+        }
+      }
+
+      if(profile->alpha_done && profile->numcolors_done && profile->colored_done && profile->greybits_done)
+      {
+        break;
+      }
+    }
+  }
+
+  /*make the profile's key always 16-bit for consistency*/
+  if(mode->bitdepth < 16)
+  {
+    /*repeat each byte twice*/
+    profile->key_r *= 257;
+    profile->key_g *= 257;
+    profile->key_b *= 257;
+  }
+
+  return error;
+}
+
+static void setColorKeyFrom16bit(LodePNGColorMode* mode_out, unsigned r, unsigned g, unsigned b, unsigned bitdepth)
+{
+  unsigned mask = (1 << bitdepth) - 1;
+  mode_out->key_defined = 1;
+  mode_out->key_r = r & mask;
+  mode_out->key_g = g & mask;
+  mode_out->key_b = b & mask;
+}
+
+/*updates values of mode with a potentially smaller color model. mode_out should
+contain the user chosen color model, but will be overwritten with the new chosen one.*/
+unsigned lodepng_auto_choose_color(LodePNGColorMode* mode_out,
+                                   const unsigned char* image, unsigned w, unsigned h,
+                                   const LodePNGColorMode* mode_in,
+                                   LodePNGAutoConvert auto_convert)
+{
+  ColorProfile profile;
+  unsigned error = 0;
+  int no_nibbles = auto_convert == LAC_AUTO_NO_NIBBLES || auto_convert == LAC_AUTO_NO_NIBBLES_NO_PALETTE;
+  int no_palette = auto_convert == LAC_AUTO_NO_PALETTE || auto_convert == LAC_AUTO_NO_NIBBLES_NO_PALETTE;
+
+  if(auto_convert == LAC_ALPHA)
+  {
+    if(mode_out->colortype != LCT_RGBA && mode_out->colortype != LCT_GREY_ALPHA) return 0;
+  }
+
+  color_profile_init(&profile, mode_in);
+  if(auto_convert == LAC_ALPHA)
+  {
+    profile.colored_done = 1;
+    profile.greybits_done = 1;
+    profile.numcolors_done = 1;
+    profile.sixteenbit_done = 1;
+  }
+  error = get_color_profile(&profile, image, (size_t)w * (size_t)h, mode_in, 0 /*fix_png*/);
+  if(!error && auto_convert == LAC_ALPHA)
+  {
+    if(!profile.alpha)
+    {
+      mode_out->colortype = (mode_out->colortype == LCT_RGBA ? LCT_RGB : LCT_GREY);
+      if(profile.key) setColorKeyFrom16bit(mode_out, profile.key_r, profile.key_g, profile.key_b, mode_out->bitdepth);
+    }
+  }
+  else if(!error && auto_convert != LAC_ALPHA)
+  {
+    mode_out->key_defined = 0;
+
+    if(profile.sixteenbit)
+    {
+      mode_out->bitdepth = 16;
+      if(profile.alpha)
+      {
+        mode_out->colortype = profile.colored ? LCT_RGBA : LCT_GREY_ALPHA;
+      }
+      else
+      {
+        mode_out->colortype = profile.colored ? LCT_RGB : LCT_GREY;
+        if(profile.key) setColorKeyFrom16bit(mode_out, profile.key_r, profile.key_g, profile.key_b, mode_out->bitdepth);
+      }
+    }
+    else /*less than 16 bits per channel*/
+    {
+      /*don't add palette overhead if image hasn't got a lot of pixels*/
+      unsigned n = profile.numcolors;
+      int palette_ok = !no_palette && n <= 256 && (n * 2 < w * h);
+      unsigned palettebits = n <= 2 ? 1 : (n <= 4 ? 2 : (n <= 16 ? 4 : 8));
+      int grey_ok = !profile.colored && !profile.alpha; /*grey without alpha, with potentially low bits*/
+      if(palette_ok || grey_ok)
+      {
+        if(!palette_ok || (grey_ok && profile.greybits <= palettebits))
+        {
+          unsigned grey = profile.key_r;
+          mode_out->colortype = LCT_GREY;
+          mode_out->bitdepth = profile.greybits;
+          if(profile.key) setColorKeyFrom16bit(mode_out, grey, grey, grey, mode_out->bitdepth);
+        }
+        else
+        {
+          /*fill in the palette*/
+          unsigned i;
+          unsigned char* p = profile.palette;
+          /*remove potential earlier palette*/
+          lodepng_palette_clear(mode_out);
+          for(i = 0; i < profile.numcolors; i++)
+          {
+            error = lodepng_palette_add(mode_out, p[i * 4 + 0], p[i * 4 + 1], p[i * 4 + 2], p[i * 4 + 3]);
+            if(error) break;
+          }
+
+          mode_out->colortype = LCT_PALETTE;
+          mode_out->bitdepth = palettebits;
+        }
+      }
+      else /*8-bit per channel*/
+      {
+        mode_out->bitdepth = 8;
+        if(profile.alpha)
+        {
+          mode_out->colortype = profile.colored ? LCT_RGBA : LCT_GREY_ALPHA;
+        }
+        else
+        {
+          mode_out->colortype = profile.colored ? LCT_RGB : LCT_GREY /*LCT_GREY normally won't occur, already done earlier*/;
+          if(profile.key) setColorKeyFrom16bit(mode_out, profile.key_r, profile.key_g, profile.key_b, mode_out->bitdepth);
+        }
+      }
+    }
+  }
+
+  color_profile_cleanup(&profile);
+
+  if(mode_out->colortype == LCT_PALETTE && mode_in->palettesize == mode_out->palettesize)
+  {
+    /*In this case keep the palette order of the input, so that the user can choose an optimal one*/
+    size_t i;
+    for(i = 0; i < mode_in->palettesize * 4; i++)
+    {
+      mode_out->palette[i] = mode_in->palette[i];
+    }
+  }
+
+  if(no_nibbles && mode_out->bitdepth < 8)
+  {
+    /*a palette may keep its small number of colors; only the indices are stored with 8 bits*/
+    mode_out->bitdepth = 8;
+  }
+
+  return error;
+}
+
+#endif /* #ifdef LODEPNG_COMPILE_ENCODER */
+
+/*
+Paeth predictor, used by PNG filter type 4
+The parameters are of type short, but should come from unsigned chars, the shorts
+are only needed to make the paeth calculation correct.
+*/
+static unsigned char paethPredictor(short a, short b, short c)
+{
+  short pa = abs(b - c);
+  short pb = abs(a - c);
+  short pc = abs(a + b - c - c);
+
+  if(pc < pa && pc < pb) return (unsigned char)c;
+  else if(pb < pa) return (unsigned char)b;
+  else return (unsigned char)a;
+}
+
+/*shared values used by multiple Adam7 related functions*/
+
+static const unsigned ADAM7_IX[7] = { 0, 4, 0, 2, 0, 1, 0 }; /*x start values*/
+static const unsigned ADAM7_IY[7] = { 0, 0, 4, 0, 2, 0, 1 }; /*y start values*/
+static const unsigned ADAM7_DX[7] = { 8, 8, 4, 4, 2, 2, 1 }; /*x delta values*/
+static const unsigned ADAM7_DY[7] = { 8, 8, 8, 4, 4, 2, 2 }; /*y delta values*/
+
+/*
+Outputs various dimensions and positions in the image related to the Adam7 reduced images.
+passw: output containing the width of the 7 passes
+passh: output containing the height of the 7 passes
+filter_passstart: output containing the index of the start and end of each
+ reduced image with filter bytes
+padded_passstart: output containing the index of the start and end of each
+ reduced image when without filter bytes but with padded scanlines
+passstart: output containing the index of the start and end of each reduced
+ image without padding between scanlines, but still padding between the images
+w, h: width and height of non-interlaced image
+bpp: bits per pixel
+"padded" is only relevant if bpp is less than 8 and a scanline or image does not
+ end at a full byte
+*/
+static void Adam7_getpassvalues(unsigned passw[7], unsigned passh[7], size_t filter_passstart[8],
+                                size_t padded_passstart[8], size_t passstart[8], unsigned w, unsigned h, unsigned bpp)
+{
+  /*the passstart values have 8 values: the 8th one indicates the byte after the end of the 7th (= last) pass*/
+  unsigned i;
+
+  /*calculate width and height in pixels of each pass*/
+  for(i = 0; i < 7; i++)
+  {
+    passw[i] = (w + ADAM7_DX[i] - ADAM7_IX[i] - 1) / ADAM7_DX[i];
+    passh[i] = (h + ADAM7_DY[i] - ADAM7_IY[i] - 1) / ADAM7_DY[i];
+    if(passw[i] == 0) passh[i] = 0;
+    if(passh[i] == 0) passw[i] = 0;
+  }
+
+  filter_passstart[0] = padded_passstart[0] = passstart[0] = 0;
+  for(i = 0; i < 7; i++)
+  {
+    /*if passw[i] is 0, it's 0 bytes, not 1 (no filtertype-byte)*/
+    filter_passstart[i + 1] = filter_passstart[i]
+                            + ((passw[i] && passh[i]) ? passh[i] * (1 + (passw[i] * bpp + 7) / 8) : 0);
+    /*bits padded if needed to fill full byte at end of each scanline*/
+    padded_passstart[i + 1] = padded_passstart[i] + passh[i] * ((passw[i] * bpp + 7) / 8);
+    /*only padded at end of reduced image*/
+    passstart[i + 1] = passstart[i] + (passh[i] * passw[i] * bpp + 7) / 8;
+  }
+}
+
+#ifdef LODEPNG_COMPILE_DECODER
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / PNG Decoder                                                            / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/*read the information from the header and store it in the LodePNGInfo. return value is error*/
+unsigned lodepng_inspect(unsigned* w, unsigned* h, LodePNGState* state,
+                         const unsigned char* in, size_t insize)
+{
+  LodePNGInfo* info = &state->info_png;
+  if(insize == 0 || in == 0)
+  {
+    CERROR_RETURN_ERROR(state->error, 48); /*error: the given data is empty*/
+  }
+  if(insize < 33) /*8-byte signature + 25-byte IHDR chunk (length, type, 13 data bytes, CRC read below)*/
+  {
+    CERROR_RETURN_ERROR(state->error, 27); /*error: the data length is smaller than the length of a PNG header*/
+  }
+
+  /*when decoding a new PNG image, make sure all parameters created after previous decoding are reset*/
+  lodepng_info_cleanup(info);
+  lodepng_info_init(info);
+
+  if(in[0] != 137 || in[1] != 80 || in[2] != 78 || in[3] != 71
+     || in[4] != 13 || in[5] != 10 || in[6] != 26 || in[7] != 10)
+  {
+    CERROR_RETURN_ERROR(state->error, 28); /*error: the first 8 bytes are not the correct PNG signature*/
+  }
+  if(in[12] != 'I' || in[13] != 'H' || in[14] != 'D' || in[15] != 'R')
+  {
+    CERROR_RETURN_ERROR(state->error, 29); /*error: it doesn't start with an IHDR chunk!*/
+  }
+
+  /*read the values given in the header*/
+  *w = lodepng_read32bitInt(&in[16]);
+  *h = lodepng_read32bitInt(&in[20]);
+  info->color.bitdepth = in[24];
+  info->color.colortype = (LodePNGColorType)in[25];
+  info->compression_method = in[26];
+  info->filter_method = in[27];
+  info->interlace_method = in[28];
+
+  if(!state->decoder.ignore_crc)
+  {
+    unsigned CRC = lodepng_read32bitInt(&in[29]);
+    unsigned checksum = lodepng_crc32(&in[12], 17);
+    if(CRC != checksum)
+    {
+      CERROR_RETURN_ERROR(state->error, 57); /*invalid CRC*/
+    }
+  }
+
+  /*error: only compression method 0 is allowed in the specification*/
+  if(info->compression_method != 0) CERROR_RETURN_ERROR(state->error, 32);
+  /*error: only filter method 0 is allowed in the specification*/
+  if(info->filter_method != 0) CERROR_RETURN_ERROR(state->error, 33);
+  /*error: only interlace methods 0 and 1 exist in the specification*/
+  if(info->interlace_method > 1) CERROR_RETURN_ERROR(state->error, 34);
+
+  state->error = checkColorValidity(info->color.colortype, info->color.bitdepth);
+  return state->error;
+}
+
+static unsigned unfilterScanline(unsigned char* recon, const unsigned char* scanline, const unsigned char* precon,
+                                 size_t bytewidth, unsigned char filterType, size_t length)
+{
+  /*
+  For PNG filter method 0
+  unfilter a PNG image scanline by scanline. when the pixels are smaller than 1 byte,
+  the filter works byte per byte (bytewidth = 1)
+  precon is the previous unfiltered scanline, recon the result, scanline the current one
+  the incoming scanlines do NOT include the filtertype byte, that one is given in the parameter filterType instead
+  recon and scanline MAY be the same memory address! precon must be disjoint.
+  */
+
+  size_t i;
+  switch(filterType)
+  {
+    case 0:
+      for(i = 0; i < length; i++) recon[i] = scanline[i];
+      break;
+    case 1:
+      for(i = 0; i < bytewidth; i++) recon[i] = scanline[i];
+      for(i = bytewidth; i < length; i++) recon[i] = scanline[i] + recon[i - bytewidth];
+      break;
+    case 2:
+      if(precon)
+      {
+        for(i = 0; i < length; i++) recon[i] = scanline[i] + precon[i];
+      }
+      else
+      {
+        for(i = 0; i < length; i++) recon[i] = scanline[i];
+      }
+      break;
+    case 3:
+      if(precon)
+      {
+        for(i = 0; i < bytewidth; i++) recon[i] = scanline[i] + precon[i] / 2;
+        for(i = bytewidth; i < length; i++) recon[i] = scanline[i] + ((recon[i - bytewidth] + precon[i]) / 2);
+      }
+      else
+      {
+        for(i = 0; i < bytewidth; i++) recon[i] = scanline[i];
+        for(i = bytewidth; i < length; i++) recon[i] = scanline[i] + recon[i - bytewidth] / 2;
+      }
+      break;
+    case 4:
+      if(precon)
+      {
+        for(i = 0; i < bytewidth; i++)
+        {
+          recon[i] = (scanline[i] + precon[i]); /*paethPredictor(0, precon[i], 0) is always precon[i]*/
+        }
+        for(i = bytewidth; i < length; i++)
+        {
+          recon[i] = (scanline[i] + paethPredictor(recon[i - bytewidth], precon[i], precon[i - bytewidth]));
+        }
+      }
+      else
+      {
+        for(i = 0; i < bytewidth; i++)
+        {
+          recon[i] = scanline[i];
+        }
+        for(i = bytewidth; i < length; i++)
+        {
+          /*paethPredictor(recon[i - bytewidth], 0, 0) is always recon[i - bytewidth]*/
+          recon[i] = (scanline[i] + recon[i - bytewidth]);
+        }
+      }
+      break;
+    default: return 36; /*error: nonexistent filter type given*/
+  }
+  return 0;
+}
+
+static unsigned unfilter(unsigned char* out, const unsigned char* in, unsigned w, unsigned h, unsigned bpp)
+{
+  /*
+  For PNG filter method 0
+  this function unfilters a single image (e.g. without interlacing this is called once, with Adam7 seven times)
+  out must have enough bytes allocated already, in must have the scanlines + 1 filtertype byte per scanline
+  w and h are image dimensions or dimensions of reduced image, bpp is bits per pixel
+  in and out are allowed to be the same memory address (but aren't the same size since in has the extra filter bytes)
+  */
+
+  unsigned y;
+  unsigned char* prevline = 0;
+
+  /*bytewidth is used for filtering, is 1 when bpp < 8, number of bytes per pixel otherwise*/
+  size_t bytewidth = (bpp + 7) / 8;
+  size_t linebytes = (w * bpp + 7) / 8;
+
+  for(y = 0; y < h; y++)
+  {
+    size_t outindex = linebytes * y;
+    size_t inindex = (1 + linebytes) * y; /*the extra filterbyte added to each row*/
+    unsigned char filterType = in[inindex];
+
+    CERROR_TRY_RETURN(unfilterScanline(&out[outindex], &in[inindex + 1], prevline, bytewidth, filterType, linebytes));
+
+    prevline = &out[outindex];
+  }
+
+  return 0;
+}
+
+/*
+in: Adam7 interlaced image, with no padding bits between scanlines, but between
+ reduced images so that each reduced image starts at a byte.
+out: the same pixels, but re-ordered so that they're now a non-interlaced image with size w*h
+bpp: bits per pixel
+out has the following size in bits: w * h * bpp.
+in is possibly bigger due to padding bits between reduced images.
+out must be big enough AND must be 0 everywhere if bpp < 8 in the current implementation
+(because that's likely a little bit faster)
+NOTE: comments about padding bits are only relevant if bpp < 8
+*/
+static void Adam7_deinterlace(unsigned char* out, const unsigned char* in, unsigned w, unsigned h, unsigned bpp)
+{
+  unsigned passw[7], passh[7];
+  size_t filter_passstart[8], padded_passstart[8], passstart[8];
+  unsigned i;
+
+  Adam7_getpassvalues(passw, passh, filter_passstart, padded_passstart, passstart, w, h, bpp);
+
+  if(bpp >= 8)
+  {
+    for(i = 0; i < 7; i++)
+    {
+      unsigned x, y, b;
+      size_t bytewidth = bpp / 8;
+      for(y = 0; y < passh[i]; y++)
+      for(x = 0; x < passw[i]; x++)
+      {
+        size_t pixelinstart = passstart[i] + (y * passw[i] + x) * bytewidth;
+        size_t pixeloutstart = ((ADAM7_IY[i] + y * ADAM7_DY[i]) * w + ADAM7_IX[i] + x * ADAM7_DX[i]) * bytewidth;
+        for(b = 0; b < bytewidth; b++)
+        {
+          out[pixeloutstart + b] = in[pixelinstart + b];
+        }
+      }
+    }
+  }
+  else /*bpp < 8: Adam7 with pixels < 8 bit is a bit trickier: with bit pointers*/
+  {
+    for(i = 0; i < 7; i++)
+    {
+      unsigned x, y, b;
+      unsigned ilinebits = bpp * passw[i];
+      unsigned olinebits = bpp * w;
+      size_t obp, ibp; /*bit pointers (for out and in buffer)*/
+      for(y = 0; y < passh[i]; y++)
+      for(x = 0; x < passw[i]; x++)
+      {
+        ibp = (8 * passstart[i]) + (y * ilinebits + x * bpp);
+        obp = (ADAM7_IY[i] + y * ADAM7_DY[i]) * olinebits + (ADAM7_IX[i] + x * ADAM7_DX[i]) * bpp;
+        for(b = 0; b < bpp; b++)
+        {
+          unsigned char bit = readBitFromReversedStream(&ibp, in);
+          /*note that this function assumes the out buffer is completely 0, use setBitOfReversedStream otherwise*/
+          setBitOfReversedStream0(&obp, out, bit);
+        }
+      }
+    }
+  }
+}
+
+static void removePaddingBits(unsigned char* out, const unsigned char* in,
+                              size_t olinebits, size_t ilinebits, unsigned h)
+{
+  /*
+  After unfiltering there are still padding bits if the scanline bit count is not a multiple of 8. They need
+  to be removed (except at last scanline of (Adam7-reduced) image) before working with pure image buffers
+  for the Adam7 code, the color convert code and the output to the user.
+  in and out are allowed to be the same buffer, in may also be higher but still overlapping; in must
+  have >= ilinebits*h bits, out must have >= olinebits*h bits, olinebits must be <= ilinebits
+  also used to move bits after earlier such operations happened, e.g. in a sequence of reduced images from Adam7
+  only useful if (ilinebits - olinebits) is a value in the range 1..7
+  */
+  unsigned y;
+  size_t diff = ilinebits - olinebits;
+  size_t ibp = 0, obp = 0; /*input and output bit pointers*/
+  for(y = 0; y < h; y++)
+  {
+    size_t x;
+    for(x = 0; x < olinebits; x++)
+    {
+      unsigned char bit = readBitFromReversedStream(&ibp, in);
+      setBitOfReversedStream(&obp, out, bit);
+    }
+    ibp += diff;
+  }
+}
+
+/*out must be buffer big enough to contain full image, and in must contain the full decompressed data from
+the IDAT chunks (with filter index bytes and possible padding bits)
+return value is error*/
+static unsigned postProcessScanlines(unsigned char* out, unsigned char* in,
+                                     unsigned w, unsigned h, const LodePNGInfo* info_png)
+{
+  /*
+  This function converts the filtered-padded-interlaced data into pure 2D image buffer with the PNG's colortype.
+  Steps:
+  *) if no Adam7: 1) unfilter 2) remove padding bits (= possible extra bits per scanline if bpp < 8)
+  *) if adam7: 1) 7x unfilter 2) 7x remove padding bits 3) Adam7_deinterlace
+  NOTE: the in buffer will be overwritten with intermediate data!
+  */
+  unsigned bpp = lodepng_get_bpp(&info_png->color);
+  if(bpp == 0) return 31; /*error: invalid colortype*/
+
+  if(info_png->interlace_method == 0)
+  {
+    if(bpp < 8 && w * bpp != ((w * bpp + 7) / 8) * 8)
+    {
+      CERROR_TRY_RETURN(unfilter(in, in, w, h, bpp));
+      removePaddingBits(out, in, w * bpp, ((w * bpp + 7) / 8) * 8, h);
+    }
+    /*we can immediately unfilter into the out buffer, no other steps needed*/
+    else CERROR_TRY_RETURN(unfilter(out, in, w, h, bpp));
+  }
+  else /*interlace_method is 1 (Adam7)*/
+  {
+    unsigned passw[7], passh[7]; size_t filter_passstart[8], padded_passstart[8], passstart[8];
+    unsigned i;
+
+    Adam7_getpassvalues(passw, passh, filter_passstart, padded_passstart, passstart, w, h, bpp);
+
+    for(i = 0; i < 7; i++)
+    {
+      CERROR_TRY_RETURN(unfilter(&in[padded_passstart[i]], &in[filter_passstart[i]], passw[i], passh[i], bpp));
+      /*TODO: possible efficiency improvement: if in this reduced image the bits fit nicely in 1 scanline,
+      move bytes instead of bits or move not at all*/
+      if(bpp < 8)
+      {
+        /*remove padding bits in scanlines; after this there still may be padding
+        bits between the different reduced images: each reduced image still starts nicely at a byte*/
+        removePaddingBits(&in[passstart[i]], &in[padded_passstart[i]], passw[i] * bpp,
+                          ((passw[i] * bpp + 7) / 8) * 8, passh[i]);
+      }
+    }
+
+    Adam7_deinterlace(out, in, w, h, bpp);
+  }
+
+  return 0;
+}
+
+static unsigned readChunk_PLTE(LodePNGColorMode* color, const unsigned char* data, size_t chunkLength)
+{
+  unsigned pos = 0, i;
+  if(color->palette) lodepng_free(color->palette);
+  color->palettesize = chunkLength / 3;
+  color->palette = (unsigned char*)lodepng_malloc(4 * color->palettesize);
+  if(!color->palette && color->palettesize)
+  {
+    color->palettesize = 0;
+    return 83; /*alloc fail*/
+  }
+  if(color->palettesize > 256) return 38; /*error: palette too big*/
+
+  for(i = 0; i < color->palettesize; i++)
+  {
+    color->palette[4 * i + 0] = data[pos++]; /*R*/
+    color->palette[4 * i + 1] = data[pos++]; /*G*/
+    color->palette[4 * i + 2] = data[pos++]; /*B*/
+    color->palette[4 * i + 3] = 255; /*alpha*/
+  }
+
+  return 0; /* OK */
+}
+
+static unsigned readChunk_tRNS(LodePNGColorMode* color, const unsigned char* data, size_t chunkLength)
+{
+  unsigned i;
+  if(color->colortype == LCT_PALETTE)
+  {
+    /*error: more alpha values given than there are palette entries*/
+    if(chunkLength > color->palettesize) return 38;
+
+    for(i = 0; i < chunkLength; i++) color->palette[4 * i + 3] = data[i];
+  }
+  else if(color->colortype == LCT_GREY)
+  {
+    /*error: this chunk must be 2 bytes for greyscale image*/
+    if(chunkLength != 2) return 30;
+
+    color->key_defined = 1;
+    color->key_r = color->key_g = color->key_b = 256 * data[0] + data[1];
+  }
+  else if(color->colortype == LCT_RGB)
+  {
+    /*error: this chunk must be 6 bytes for RGB image*/
+    if(chunkLength != 6) return 41;
+
+    color->key_defined = 1;
+    color->key_r = 256 * data[0] + data[1];
+    color->key_g = 256 * data[2] + data[3];
+    color->key_b = 256 * data[4] + data[5];
+  }
+  else return 42; /*error: tRNS chunk not allowed for other color models*/
+
+  return 0; /* OK */
+}
+
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+/*background color chunk (bKGD)*/
+static unsigned readChunk_bKGD(LodePNGInfo* info, const unsigned char* data, size_t chunkLength)
+{
+  if(info->color.colortype == LCT_PALETTE)
+  {
+    /*error: this chunk must be 1 byte for indexed color image*/
+    if(chunkLength != 1) return 43;
+
+    info->background_defined = 1;
+    info->background_r = info->background_g = info->background_b = data[0];
+  }
+  else if(info->color.colortype == LCT_GREY || info->color.colortype == LCT_GREY_ALPHA)
+  {
+    /*error: this chunk must be 2 bytes for greyscale image*/
+    if(chunkLength != 2) return 44;
+
+    info->background_defined = 1;
+    info->background_r = info->background_g = info->background_b
+                                 = 256 * data[0] + data[1];
+  }
+  else if(info->color.colortype == LCT_RGB || info->color.colortype == LCT_RGBA)
+  {
+    /*error: this chunk must be 6 bytes for RGB image*/
+    if(chunkLength != 6) return 45;
+
+    info->background_defined = 1;
+    info->background_r = 256 * data[0] + data[1];
+    info->background_g = 256 * data[2] + data[3];
+    info->background_b = 256 * data[4] + data[5];
+  }
+
+  return 0; /* OK */
+}
+
+/*text chunk (tEXt)*/
+static unsigned readChunk_tEXt(LodePNGInfo* info, const unsigned char* data, size_t chunkLength)
+{
+  unsigned error = 0;
+  char *key = 0, *str = 0;
+  unsigned i;
+
+  while(!error) /*not really a while loop, only used to break on error*/
+  {
+    unsigned length, string2_begin;
+
+    length = 0;
+    while(length < chunkLength && data[length] != 0) length++;
+    /*even though it's not allowed by the standard, no error is thrown if
+    there's no null termination char, if the text is empty*/
+    if(length < 1 || length > 79) CERROR_BREAK(error, 89); /*keyword too short or long*/
+
+    key = (char*)lodepng_malloc(length + 1);
+    if(!key) CERROR_BREAK(error, 83); /*alloc fail*/
+
+    key[length] = 0;
+    for(i = 0; i < length; i++) key[i] = data[i];
+
+    string2_begin = length + 1; /*skip keyword null terminator*/
+
+    length = chunkLength < string2_begin ? 0 : chunkLength - string2_begin;
+    str = (char*)lodepng_malloc(length + 1);
+    if(!str) CERROR_BREAK(error, 83); /*alloc fail*/
+
+    str[length] = 0;
+    for(i = 0; i < length; i++) str[i] = data[string2_begin + i];
+
+    error = lodepng_add_text(info, key, str);
+
+    break;
+  }
+
+  lodepng_free(key);
+  lodepng_free(str);
+
+  return error;
+}
+
+/*compressed text chunk (zTXt)*/
+static unsigned readChunk_zTXt(LodePNGInfo* info, const LodePNGDecompressSettings* zlibsettings,
+                               const unsigned char* data, size_t chunkLength)
+{
+  unsigned error = 0;
+  unsigned i;
+
+  unsigned length, string2_begin;
+  char *key = 0;
+  ucvector decoded;
+
+  ucvector_init(&decoded);
+
+  while(!error) /*not really a while loop, only used to break on error*/
+  {
+    for(length = 0; length < chunkLength && data[length] != 0; length++) ;
+    if(length + 2 >= chunkLength) CERROR_BREAK(error, 75); /*no null termination, corrupt?*/
+    if(length < 1 || length > 79) CERROR_BREAK(error, 89); /*keyword too short or long*/
+
+    key = (char*)lodepng_malloc(length + 1);
+    if(!key) CERROR_BREAK(error, 83); /*alloc fail*/
+
+    key[length] = 0;
+    for(i = 0; i < length; i++) key[i] = data[i];
+
+    if(data[length + 1] != 0) CERROR_BREAK(error, 72); /*the 0 byte indicating compression must be 0*/
+
+    string2_begin = length + 2;
+    if(string2_begin > chunkLength) CERROR_BREAK(error, 75); /*no null termination, corrupt?*/
+
+    length = chunkLength - string2_begin;
+    /*will fail if zlib error, e.g. if length is too small*/
+    error = zlib_decompress(&decoded.data, &decoded.size,
+                            (unsigned char*)(&data[string2_begin]),
+                            length, zlibsettings);
+    if(error) break;
+    ucvector_push_back(&decoded, 0);
+
+    error = lodepng_add_text(info, key, (char*)decoded.data);
+
+    break;
+  }
+
+  lodepng_free(key);
+  ucvector_cleanup(&decoded);
+
+  return error;
+}
+
+/*international text chunk (iTXt)*/
+static unsigned readChunk_iTXt(LodePNGInfo* info, const LodePNGDecompressSettings* zlibsettings,
+                               const unsigned char* data, size_t chunkLength)
+{
+  unsigned error = 0;
+  unsigned i;
+
+  unsigned length, begin, compressed;
+  char *key = 0, *langtag = 0, *transkey = 0;
+  ucvector decoded;
+  ucvector_init(&decoded);
+
+  while(!error) /*not really a while loop, only used to break on error*/
+  {
+    /*Quick check if the chunk length isn't too small. Even without check
+    it'd still fail with other error checks below if it's too short. This just gives a different error code.*/
+    if(chunkLength < 5) CERROR_BREAK(error, 30); /*iTXt chunk too short*/
+
+    /*read the key*/
+    for(length = 0; length < chunkLength && data[length] != 0; length++) ;
+    if(length + 3 >= chunkLength) CERROR_BREAK(error, 75); /*no null termination char, corrupt?*/
+    if(length < 1 || length > 79) CERROR_BREAK(error, 89); /*keyword too short or long*/
+
+    key = (char*)lodepng_malloc(length + 1);
+    if(!key) CERROR_BREAK(error, 83); /*alloc fail*/
+
+    key[length] = 0;
+    for(i = 0; i < length; i++) key[i] = data[i];
+
+    /*read the compression method*/
+    compressed = data[length + 1];
+    if(data[length + 2] != 0) CERROR_BREAK(error, 72); /*the 0 byte indicating compression must be 0*/
+
+    /*even though it's not allowed by the standard, no error is thrown if
+    there's no null termination char, if the text is empty for the next 3 texts*/
+
+    /*read the langtag*/
+    begin = length + 3;
+    length = 0;
+    for(i = begin; i < chunkLength && data[i] != 0; i++) length++;
+
+    langtag = (char*)lodepng_malloc(length + 1);
+    if(!langtag) CERROR_BREAK(error, 83); /*alloc fail*/
+
+    langtag[length] = 0;
+    for(i = 0; i < length; i++) langtag[i] = data[begin + i];
+
+    /*read the transkey*/
+    begin += length + 1;
+    length = 0;
+    for(i = begin; i < chunkLength && data[i] != 0; i++) length++;
+
+    transkey = (char*)lodepng_malloc(length + 1);
+    if(!transkey) CERROR_BREAK(error, 83); /*alloc fail*/
+
+    transkey[length] = 0;
+    for(i = 0; i < length; i++) transkey[i] = data[begin + i];
+
+    /*read the actual text*/
+    begin += length + 1;
+
+    length = chunkLength < begin ? 0 : chunkLength - begin;
+
+    if(compressed)
+    {
+      /*will fail if zlib error, e.g. if length is too small*/
+      error = zlib_decompress(&decoded.data, &decoded.size,
+                              (unsigned char*)(&data[begin]),
+                              length, zlibsettings);
+      if(error) break;
+      if(decoded.allocsize < decoded.size) decoded.allocsize = decoded.size;
+      ucvector_push_back(&decoded, 0);
+    }
+    else
+    {
+      if(!ucvector_resize(&decoded, length + 1)) CERROR_BREAK(error, 83 /*alloc fail*/);
+
+      decoded.data[length] = 0;
+      for(i = 0; i < length; i++) decoded.data[i] = data[begin + i];
+    }
+
+    error = lodepng_add_itext(info, key, langtag, transkey, (char*)decoded.data);
+
+    break;
+  }
+
+  lodepng_free(key);
+  lodepng_free(langtag);
+  lodepng_free(transkey);
+  ucvector_cleanup(&decoded);
+
+  return error;
+}
+
+static unsigned readChunk_tIME(LodePNGInfo* info, const unsigned char* data, size_t chunkLength)
+{
+  if(chunkLength != 7) return 73; /*invalid tIME chunk size*/
+
+  info->time_defined = 1;
+  info->time.year = 256 * data[0] + data[1];
+  info->time.month = data[2];
+  info->time.day = data[3];
+  info->time.hour = data[4];
+  info->time.minute = data[5];
+  info->time.second = data[6];
+
+  return 0; /* OK */
+}
+
+static unsigned readChunk_pHYs(LodePNGInfo* info, const unsigned char* data, size_t chunkLength)
+{
+  if(chunkLength != 9) return 74; /*invalid pHYs chunk size*/
+
+  info->phys_defined = 1;
+  info->phys_x = 16777216 * data[0] + 65536 * data[1] + 256 * data[2] + data[3];
+  info->phys_y = 16777216 * data[4] + 65536 * data[5] + 256 * data[6] + data[7];
+  info->phys_unit = data[8];
+
+  return 0; /* OK */
+}
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+
+/*read a PNG, the result will be in the same color type as the PNG (hence "generic")*/
+static void decodeGeneric(unsigned char** out, unsigned* w, unsigned* h,
+                          LodePNGState* state,
+                          const unsigned char* in, size_t insize)
+{
+  unsigned char IEND = 0;
+  const unsigned char* chunk;
+  size_t i;
+  ucvector idat; /*the data from idat chunks*/
+  ucvector scanlines;
+
+  /*for unknown chunk order*/
+  unsigned unknown = 0;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  unsigned critical_pos = 1; /*1 = after IHDR, 2 = after PLTE, 3 = after IDAT*/
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+
+  /*provide some proper output values if error will happen*/
+  *out = 0;
+
+  state->error = lodepng_inspect(w, h, state, in, insize); /*reads header and resets other parameters in state->info_png*/
+  if(state->error) return;
+
+  ucvector_init(&idat);
+  chunk = &in[33]; /*first byte of the first chunk after the header*/
+
+  /*loop through the chunks, ignoring unknown chunks and stopping at IEND chunk.
+  IDAT data is put at the start of the in buffer*/
+  while(!IEND && !state->error)
+  {
+    unsigned chunkLength;
+    const unsigned char* data; /*the data in the chunk*/
+
+    /*error: size of the in buffer too small to contain next chunk*/
+    if((size_t)((chunk - in) + 12) > insize || chunk < in) CERROR_BREAK(state->error, 30);
+
+    /*length of the data of the chunk, excluding the length bytes, chunk type and CRC bytes*/
+    chunkLength = lodepng_chunk_length(chunk);
+    /*error: chunk length larger than the max PNG chunk size*/
+    if(chunkLength > 2147483647) CERROR_BREAK(state->error, 63);
+
+    if((size_t)((chunk - in) + chunkLength + 12) > insize || (chunk + chunkLength + 12) < in)
+    {
+      CERROR_BREAK(state->error, 64); /*error: size of the in buffer too small to contain next chunk*/
+    }
+
+    data = lodepng_chunk_data_const(chunk);
+
+    /*IDAT chunk, containing compressed image data*/
+    if(lodepng_chunk_type_equals(chunk, "IDAT"))
+    {
+      size_t oldsize = idat.size;
+      if(!ucvector_resize(&idat, oldsize + chunkLength)) CERROR_BREAK(state->error, 83 /*alloc fail*/);
+      for(i = 0; i < chunkLength; i++) idat.data[oldsize + i] = data[i];
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+      critical_pos = 3;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    }
+    /*IEND chunk*/
+    else if(lodepng_chunk_type_equals(chunk, "IEND"))
+    {
+      IEND = 1;
+    }
+    /*palette chunk (PLTE)*/
+    else if(lodepng_chunk_type_equals(chunk, "PLTE"))
+    {
+      state->error = readChunk_PLTE(&state->info_png.color, data, chunkLength);
+      if(state->error) break;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+      critical_pos = 2;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    }
+    /*palette transparency chunk (tRNS)*/
+    else if(lodepng_chunk_type_equals(chunk, "tRNS"))
+    {
+      state->error = readChunk_tRNS(&state->info_png.color, data, chunkLength);
+      if(state->error) break;
+    }
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+    /*background color chunk (bKGD)*/
+    else if(lodepng_chunk_type_equals(chunk, "bKGD"))
+    {
+      state->error = readChunk_bKGD(&state->info_png, data, chunkLength);
+      if(state->error) break;
+    }
+    /*text chunk (tEXt)*/
+    else if(lodepng_chunk_type_equals(chunk, "tEXt"))
+    {
+      if(state->decoder.read_text_chunks)
+      {
+        state->error = readChunk_tEXt(&state->info_png, data, chunkLength);
+        if(state->error) break;
+      }
+    }
+    /*compressed text chunk (zTXt)*/
+    else if(lodepng_chunk_type_equals(chunk, "zTXt"))
+    {
+      if(state->decoder.read_text_chunks)
+      {
+        state->error = readChunk_zTXt(&state->info_png, &state->decoder.zlibsettings, data, chunkLength);
+        if(state->error) break;
+      }
+    }
+    /*international text chunk (iTXt)*/
+    else if(lodepng_chunk_type_equals(chunk, "iTXt"))
+    {
+      if(state->decoder.read_text_chunks)
+      {
+        state->error = readChunk_iTXt(&state->info_png, &state->decoder.zlibsettings, data, chunkLength);
+        if(state->error) break;
+      }
+    }
+    else if(lodepng_chunk_type_equals(chunk, "tIME"))
+    {
+      state->error = readChunk_tIME(&state->info_png, data, chunkLength);
+      if(state->error) break;
+    }
+    else if(lodepng_chunk_type_equals(chunk, "pHYs"))
+    {
+      state->error = readChunk_pHYs(&state->info_png, data, chunkLength);
+      if(state->error) break;
+    }
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    else /*it's not an implemented chunk type, so ignore it: skip over the data*/
+    {
+      /*error: unknown critical chunk (5th bit of first byte of chunk type is 0)*/
+      if(!lodepng_chunk_ancillary(chunk)) CERROR_BREAK(state->error, 69);
+
+      unknown = 1;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+      if(state->decoder.remember_unknown_chunks)
+      {
+        state->error = lodepng_chunk_append(&state->info_png.unknown_chunks_data[critical_pos - 1],
+                                            &state->info_png.unknown_chunks_size[critical_pos - 1], chunk);
+        if(state->error) break;
+      }
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    }
+
+    if(!state->decoder.ignore_crc && !unknown) /*check CRC if wanted, only on known chunk types*/
+    {
+      if(lodepng_chunk_check_crc(chunk)) CERROR_BREAK(state->error, 57); /*invalid CRC*/
+    }
+
+    if(!IEND) chunk = lodepng_chunk_next_const(chunk);
+  }
+
+  ucvector_init(&scanlines);
+  if(!state->error)
+  {
+    /*maximum final image length is already reserved in the vector's length - this is not really necessary*/
+    if(!ucvector_resize(&scanlines, lodepng_get_raw_size(*w, *h, &state->info_png.color) + *h))
+    {
+      state->error = 83; /*alloc fail*/
+    }
+  }
+  if(!state->error)
+  {
+    /*decompress with the Zlib decompressor*/
+    state->error = zlib_decompress(&scanlines.data, &scanlines.size, idat.data,
+                                   idat.size, &state->decoder.zlibsettings);
+  }
+  ucvector_cleanup(&idat);
+
+  if(!state->error)
+  {
+    ucvector outv;
+    ucvector_init(&outv);
+    if(!ucvector_resizev(&outv,
+        lodepng_get_raw_size(*w, *h, &state->info_png.color), 0)) state->error = 83; /*alloc fail*/
+    if(!state->error) state->error = postProcessScanlines(outv.data, scanlines.data, *w, *h, &state->info_png);
+    *out = outv.data;
+  }
+  ucvector_cleanup(&scanlines);
+}
+
+unsigned lodepng_decode(unsigned char** out, unsigned* w, unsigned* h,
+                        LodePNGState* state,
+                        const unsigned char* in, size_t insize)
+{
+  *out = 0;
+  decodeGeneric(out, w, h, state, in, insize);
+  if(state->error) return state->error;
+  if(!state->decoder.color_convert || lodepng_color_mode_equal(&state->info_raw, &state->info_png.color))
+  {
+    /*same color type, no copying or converting of data needed*/
+    /*store the info_png color settings on the info_raw so that the info_raw still reflects what colortype
+    the raw image has to the end user*/
+    if(!state->decoder.color_convert)
+    {
+      state->error = lodepng_color_mode_copy(&state->info_raw, &state->info_png.color);
+      if(state->error) return state->error;
+    }
+  }
+  else
+  {
+    /*color conversion needed; sort of copy of the data*/
+    unsigned char* data = *out;
+    size_t outsize;
+
+    /*TODO: check if this works according to the statement in the documentation: "The converter can convert
+    from greyscale input color type, to 8-bit greyscale or greyscale with alpha"*/
+    if(!(state->info_raw.colortype == LCT_RGB || state->info_raw.colortype == LCT_RGBA)
+       && !(state->info_raw.bitdepth == 8))
+    {
+      return 56; /*unsupported color mode conversion*/
+    }
+
+    outsize = lodepng_get_raw_size(*w, *h, &state->info_raw);
+    *out = (unsigned char*)lodepng_malloc(outsize);
+    if(!(*out))
+    {
+      state->error = 83; /*alloc fail*/
+    }
+    else state->error = lodepng_convert(*out, data, &state->info_raw, &state->info_png.color, *w, *h, state->decoder.fix_png);
+    lodepng_free(data);
+  }
+  return state->error;
+}
+
+unsigned lodepng_decode_memory(unsigned char** out, unsigned* w, unsigned* h, const unsigned char* in,
+                               size_t insize, LodePNGColorType colortype, unsigned bitdepth)
+{
+  unsigned error;
+  LodePNGState state;
+  lodepng_state_init(&state);
+  state.info_raw.colortype = colortype;
+  state.info_raw.bitdepth = bitdepth;
+  error = lodepng_decode(out, w, h, &state, in, insize);
+  lodepng_state_cleanup(&state);
+  return error;
+}
+
+unsigned lodepng_decode32(unsigned char** out, unsigned* w, unsigned* h, const unsigned char* in, size_t insize)
+{
+  return lodepng_decode_memory(out, w, h, in, insize, LCT_RGBA, 8);
+}
+
+unsigned lodepng_decode24(unsigned char** out, unsigned* w, unsigned* h, const unsigned char* in, size_t insize)
+{
+  return lodepng_decode_memory(out, w, h, in, insize, LCT_RGB, 8);
+}
+
+#ifdef LODEPNG_COMPILE_DISK
+unsigned lodepng_decode_file(unsigned char** out, unsigned* w, unsigned* h, const char* filename,
+                             LodePNGColorType colortype, unsigned bitdepth)
+{
+  unsigned char* buffer;
+  size_t buffersize;
+  unsigned error;
+  error = lodepng_load_file(&buffer, &buffersize, filename);
+  if(!error) error = lodepng_decode_memory(out, w, h, buffer, buffersize, colortype, bitdepth);
+  lodepng_free(buffer);
+  return error;
+}
+
+unsigned lodepng_decode32_file(unsigned char** out, unsigned* w, unsigned* h, const char* filename)
+{
+  return lodepng_decode_file(out, w, h, filename, LCT_RGBA, 8);
+}
+
+unsigned lodepng_decode24_file(unsigned char** out, unsigned* w, unsigned* h, const char* filename)
+{
+  return lodepng_decode_file(out, w, h, filename, LCT_RGB, 8);
+}
+#endif /*LODEPNG_COMPILE_DISK*/
+
+void lodepng_decoder_settings_init(LodePNGDecoderSettings* settings)
+{
+  settings->color_convert = 1;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  settings->read_text_chunks = 1;
+  settings->remember_unknown_chunks = 0;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+  settings->ignore_crc = 0;
+  settings->fix_png = 0;
+  lodepng_decompress_settings_init(&settings->zlibsettings);
+}
+
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#if defined(LODEPNG_COMPILE_DECODER) || defined(LODEPNG_COMPILE_ENCODER)
+
+void lodepng_state_init(LodePNGState* state)
+{
+#ifdef LODEPNG_COMPILE_DECODER
+  lodepng_decoder_settings_init(&state->decoder);
+#endif /*LODEPNG_COMPILE_DECODER*/
+#ifdef LODEPNG_COMPILE_ENCODER
+  lodepng_encoder_settings_init(&state->encoder);
+#endif /*LODEPNG_COMPILE_ENCODER*/
+  lodepng_color_mode_init(&state->info_raw);
+  lodepng_info_init(&state->info_png);
+  state->error = 1;
+}
+
+void lodepng_state_cleanup(LodePNGState* state)
+{
+  lodepng_color_mode_cleanup(&state->info_raw);
+  lodepng_info_cleanup(&state->info_png);
+}
+
+void lodepng_state_copy(LodePNGState* dest, const LodePNGState* source)
+{
+  lodepng_state_cleanup(dest);
+  *dest = *source;
+  lodepng_color_mode_init(&dest->info_raw);
+  lodepng_info_init(&dest->info_png);
+  dest->error = lodepng_color_mode_copy(&dest->info_raw, &source->info_raw); if(dest->error) return;
+  dest->error = lodepng_info_copy(&dest->info_png, &source->info_png); if(dest->error) return;
+}
+
+#endif /* defined(LODEPNG_COMPILE_DECODER) || defined(LODEPNG_COMPILE_ENCODER) */
+
+#ifdef LODEPNG_COMPILE_ENCODER
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* / PNG Encoder                                                            / */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+/*chunkName must be string of 4 characters*/
+static unsigned addChunk(ucvector* out, const char* chunkName, const unsigned char* data, size_t length)
+{
+  CERROR_TRY_RETURN(lodepng_chunk_create(&out->data, &out->size, (unsigned)length, chunkName, data));
+  out->allocsize = out->size; /*fix the allocsize again*/
+  return 0;
+}
+
+static void writeSignature(ucvector* out)
+{
+  /*8 bytes PNG signature, aka the magic bytes*/
+  ucvector_push_back(out, 137);
+  ucvector_push_back(out, 80);
+  ucvector_push_back(out, 78);
+  ucvector_push_back(out, 71);
+  ucvector_push_back(out, 13);
+  ucvector_push_back(out, 10);
+  ucvector_push_back(out, 26);
+  ucvector_push_back(out, 10);
+}
+
+static unsigned addChunk_IHDR(ucvector* out, unsigned w, unsigned h,
+                              LodePNGColorType colortype, unsigned bitdepth, unsigned interlace_method)
+{
+  unsigned error = 0;
+  ucvector header;
+  ucvector_init(&header);
+
+  lodepng_add32bitInt(&header, w); /*width*/
+  lodepng_add32bitInt(&header, h); /*height*/
+  ucvector_push_back(&header, (unsigned char)bitdepth); /*bit depth*/
+  ucvector_push_back(&header, (unsigned char)colortype); /*color type*/
+  ucvector_push_back(&header, 0); /*compression method*/
+  ucvector_push_back(&header, 0); /*filter method*/
+  ucvector_push_back(&header, interlace_method); /*interlace method*/
+
+  error = addChunk(out, "IHDR", header.data, header.size);
+  ucvector_cleanup(&header);
+
+  return error;
+}
+
+static unsigned addChunk_PLTE(ucvector* out, const LodePNGColorMode* info)
+{
+  unsigned error = 0;
+  size_t i;
+  ucvector PLTE;
+  ucvector_init(&PLTE);
+  for(i = 0; i < info->palettesize * 4; i++)
+  {
+    /*add all channels except alpha channel*/
+    if(i % 4 != 3) ucvector_push_back(&PLTE, info->palette[i]);
+  }
+  error = addChunk(out, "PLTE", PLTE.data, PLTE.size);
+  ucvector_cleanup(&PLTE);
+
+  return error;
+}
+
+static unsigned addChunk_tRNS(ucvector* out, const LodePNGColorMode* info)
+{
+  unsigned error = 0;
+  size_t i;
+  ucvector tRNS;
+  ucvector_init(&tRNS);
+  if(info->colortype == LCT_PALETTE)
+  {
+    size_t amount = info->palettesize;
+    /*the tail of palette values that all have 255 as alpha, does not have to be encoded*/
+    for(i = info->palettesize; i > 0; i--)
+    {
+      if(info->palette[4 * (i - 1) + 3] == 255) amount--;
+      else break;
+    }
+    /*add only alpha channel*/
+    for(i = 0; i < amount; i++) ucvector_push_back(&tRNS, info->palette[4 * i + 3]);
+  }
+  else if(info->colortype == LCT_GREY)
+  {
+    if(info->key_defined)
+    {
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_r / 256));
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_r % 256));
+    }
+  }
+  else if(info->colortype == LCT_RGB)
+  {
+    if(info->key_defined)
+    {
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_r / 256));
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_r % 256));
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_g / 256));
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_g % 256));
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_b / 256));
+      ucvector_push_back(&tRNS, (unsigned char)(info->key_b % 256));
+    }
+  }
+
+  error = addChunk(out, "tRNS", tRNS.data, tRNS.size);
+  ucvector_cleanup(&tRNS);
+
+  return error;
+}
+
+static unsigned addChunk_IDAT(ucvector* out, const unsigned char* data, size_t datasize,
+                              LodePNGCompressSettings* zlibsettings)
+{
+  ucvector zlibdata;
+  unsigned error = 0;
+
+  /*compress with the Zlib compressor*/
+  ucvector_init(&zlibdata);
+  error = zlib_compress(&zlibdata.data, &zlibdata.size, data, datasize, zlibsettings);
+  if(!error) error = addChunk(out, "IDAT", zlibdata.data, zlibdata.size);
+  ucvector_cleanup(&zlibdata);
+
+  return error;
+}
+
+static unsigned addChunk_IEND(ucvector* out)
+{
+  unsigned error = 0;
+  error = addChunk(out, "IEND", 0, 0);
+  return error;
+}
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+
+static unsigned addChunk_tEXt(ucvector* out, const char* keyword, const char* textstring)
+{
+  unsigned error = 0;
+  size_t i;
+  ucvector text;
+  ucvector_init(&text);
+  for(i = 0; keyword[i] != 0; i++) ucvector_push_back(&text, (unsigned char)keyword[i]);
+  if(i < 1 || i > 79) { ucvector_cleanup(&text); return 89; } /*error: invalid keyword size; free the partial buffer*/
+  ucvector_push_back(&text, 0); /*0 termination char*/
+  for(i = 0; textstring[i] != 0; i++) ucvector_push_back(&text, (unsigned char)textstring[i]);
+  error = addChunk(out, "tEXt", text.data, text.size);
+  ucvector_cleanup(&text);
+
+  return error;
+}
+
+static unsigned addChunk_zTXt(ucvector* out, const char* keyword, const char* textstring,
+                              LodePNGCompressSettings* zlibsettings)
+{
+  unsigned error = 0;
+  ucvector data, compressed;
+  size_t i, textsize = strlen(textstring);
+
+  ucvector_init(&data);
+  ucvector_init(&compressed);
+  for(i = 0; keyword[i] != 0; i++) ucvector_push_back(&data, (unsigned char)keyword[i]);
+  if(i < 1 || i > 79) { ucvector_cleanup(&data); return 89; } /*error: invalid keyword size; free the partial buffer*/
+  ucvector_push_back(&data, 0); /*0 termination char*/
+  ucvector_push_back(&data, 0); /*compression method: 0*/
+
+  error = zlib_compress(&compressed.data, &compressed.size,
+                        (unsigned char*)textstring, textsize, zlibsettings);
+  if(!error)
+  {
+    for(i = 0; i < compressed.size; i++) ucvector_push_back(&data, compressed.data[i]);
+    error = addChunk(out, "zTXt", data.data, data.size);
+  }
+
+  ucvector_cleanup(&compressed);
+  ucvector_cleanup(&data);
+  return error;
+}
+
+static unsigned addChunk_iTXt(ucvector* out, unsigned compressed, const char* keyword, const char* langtag,
+                              const char* transkey, const char* textstring, LodePNGCompressSettings* zlibsettings)
+{
+  unsigned error = 0;
+  ucvector data;
+  size_t i, textsize = strlen(textstring);
+
+  ucvector_init(&data);
+
+  for(i = 0; keyword[i] != 0; i++) ucvector_push_back(&data, (unsigned char)keyword[i]);
+  if(i < 1 || i > 79) { ucvector_cleanup(&data); return 89; } /*error: invalid keyword size; free the partial buffer*/
+  ucvector_push_back(&data, 0); /*null termination char*/
+  ucvector_push_back(&data, compressed ? 1 : 0); /*compression flag*/
+  ucvector_push_back(&data, 0); /*compression method*/
+  for(i = 0; langtag[i] != 0; i++) ucvector_push_back(&data, (unsigned char)langtag[i]);
+  ucvector_push_back(&data, 0); /*null termination char*/
+  for(i = 0; transkey[i] != 0; i++) ucvector_push_back(&data, (unsigned char)transkey[i]);
+  ucvector_push_back(&data, 0); /*null termination char*/
+
+  if(compressed)
+  {
+    ucvector compressed_data;
+    ucvector_init(&compressed_data);
+    error = zlib_compress(&compressed_data.data, &compressed_data.size,
+                          (unsigned char*)textstring, textsize, zlibsettings);
+    if(!error)
+    {
+      for(i = 0; i < compressed_data.size; i++) ucvector_push_back(&data, compressed_data.data[i]);
+    }
+    ucvector_cleanup(&compressed_data);
+  }
+  else /*not compressed*/
+  {
+    for(i = 0; textstring[i] != 0; i++) ucvector_push_back(&data, (unsigned char)textstring[i]);
+  }
+
+  if(!error) error = addChunk(out, "iTXt", data.data, data.size);
+  ucvector_cleanup(&data);
+  return error;
+}
+
+static unsigned addChunk_bKGD(ucvector* out, const LodePNGInfo* info)
+{
+  unsigned error = 0;
+  ucvector bKGD;
+  ucvector_init(&bKGD);
+  if(info->color.colortype == LCT_GREY || info->color.colortype == LCT_GREY_ALPHA)
+  {
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_r / 256));
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_r % 256));
+  }
+  else if(info->color.colortype == LCT_RGB || info->color.colortype == LCT_RGBA)
+  {
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_r / 256));
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_r % 256));
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_g / 256));
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_g % 256));
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_b / 256));
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_b % 256));
+  }
+  else if(info->color.colortype == LCT_PALETTE)
+  {
+    ucvector_push_back(&bKGD, (unsigned char)(info->background_r % 256)); /*palette index*/
+  }
+
+  error = addChunk(out, "bKGD", bKGD.data, bKGD.size);
+  ucvector_cleanup(&bKGD);
+
+  return error;
+}
+
+static unsigned addChunk_tIME(ucvector* out, const LodePNGTime* time)
+{
+  unsigned error = 0;
+  unsigned char* data = (unsigned char*)lodepng_malloc(7);
+  if(!data) return 83; /*alloc fail*/
+  data[0] = (unsigned char)(time->year / 256);
+  data[1] = (unsigned char)(time->year % 256);
+  data[2] = time->month;
+  data[3] = time->day;
+  data[4] = time->hour;
+  data[5] = time->minute;
+  data[6] = time->second;
+  error = addChunk(out, "tIME", data, 7);
+  lodepng_free(data);
+  return error;
+}
+
+static unsigned addChunk_pHYs(ucvector* out, const LodePNGInfo* info)
+{
+  unsigned error = 0;
+  ucvector data;
+  ucvector_init(&data);
+
+  lodepng_add32bitInt(&data, info->phys_x);
+  lodepng_add32bitInt(&data, info->phys_y);
+  ucvector_push_back(&data, info->phys_unit);
+
+  error = addChunk(out, "pHYs", data.data, data.size);
+  ucvector_cleanup(&data);
+
+  return error;
+}
+
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+
+static void filterScanline(unsigned char* out, const unsigned char* scanline, const unsigned char* prevline,
+                           size_t length, size_t bytewidth, unsigned char filterType)
+{
+  size_t i;
+  switch(filterType)
+  {
+    case 0: /*None*/
+      for(i = 0; i < length; i++) out[i] = scanline[i];
+      break;
+    case 1: /*Sub*/
+      if(prevline)
+      {
+        for(i = 0; i < bytewidth; i++) out[i] = scanline[i];
+        for(i = bytewidth; i < length; i++) out[i] = scanline[i] - scanline[i - bytewidth];
+      }
+      else
+      {
+        for(i = 0; i < bytewidth; i++) out[i] = scanline[i];
+        for(i = bytewidth; i < length; i++) out[i] = scanline[i] - scanline[i - bytewidth];
+      }
+      break;
+    case 2: /*Up*/
+      if(prevline)
+      {
+        for(i = 0; i < length; i++) out[i] = scanline[i] - prevline[i];
+      }
+      else
+      {
+        for(i = 0; i < length; i++) out[i] = scanline[i];
+      }
+      break;
+    case 3: /*Average*/
+      if(prevline)
+      {
+        for(i = 0; i < bytewidth; i++) out[i] = scanline[i] - prevline[i] / 2;
+        for(i = bytewidth; i < length; i++) out[i] = scanline[i] - ((scanline[i - bytewidth] + prevline[i]) / 2);
+      }
+      else
+      {
+        for(i = 0; i < bytewidth; i++) out[i] = scanline[i];
+        for(i = bytewidth; i < length; i++) out[i] = scanline[i] - scanline[i - bytewidth] / 2;
+      }
+      break;
+    case 4: /*Paeth*/
+      if(prevline)
+      {
+        /*paethPredictor(0, prevline[i], 0) is always prevline[i]*/
+        for(i = 0; i < bytewidth; i++) out[i] = (scanline[i] - prevline[i]);
+        for(i = bytewidth; i < length; i++)
+        {
+          out[i] = (scanline[i] - paethPredictor(scanline[i - bytewidth], prevline[i], prevline[i - bytewidth]));
+        }
+      }
+      else
+      {
+        for(i = 0; i < bytewidth; i++) out[i] = scanline[i];
+        /*paethPredictor(scanline[i - bytewidth], 0, 0) is always scanline[i - bytewidth]*/
+        for(i = bytewidth; i < length; i++) out[i] = (scanline[i] - scanline[i - bytewidth]);
+      }
+      break;
+    default: return; /*nonexistent filter type given*/
+  }
+}
+
+/* log2 approximation. Slightly faster than std::log. */
+static float flog2(float f)
+{
+  float result = 0;
+  while(f > 32) { result += 4; f /= 16; }
+  while(f > 2) { result++; f /= 2; }
+  return result + 1.442695f * (f * f * f / 3 - 3 * f * f / 2 + 3 * f - 1.83333f);
+}
+
+static unsigned filter(unsigned char* out, const unsigned char* in, unsigned w, unsigned h,
+                       const LodePNGColorMode* info, const LodePNGEncoderSettings* settings)
+{
+  /*
+  For PNG filter method 0
+  out must be a buffer of size h + h * ((w * bpp + 7) / 8), because each of the h
+  scanlines takes linebytes bytes plus 1 extra byte for the filter type
+  */
+
+  unsigned bpp = lodepng_get_bpp(info);
+  /*the width of a scanline in bytes, not including the filter type*/
+  size_t linebytes = (w * bpp + 7) / 8;
+  /*bytewidth is used for filtering, is 1 when bpp < 8, number of bytes per pixel otherwise*/
+  size_t bytewidth = (bpp + 7) / 8;
+  const unsigned char* prevline = 0;
+  unsigned x, y;
+  unsigned error = 0;
+  LodePNGFilterStrategy strategy = settings->filter_strategy;
+
+  /*
+  There is a heuristic called the minimum sum of absolute differences heuristic, suggested by the PNG standard:
+   *  If the image type is Palette, or the bit depth is smaller than 8, then do not filter the image (i.e.
+      use fixed filtering, with the filter None).
+   * (The other case) If the image type is Grayscale or RGB (with or without Alpha), and the bit depth is
+     not smaller than 8, then use adaptive filtering heuristic as follows: independently for each row, apply
+     all five filters and select the filter that produces the smallest sum of absolute values per row.
+  This heuristic is used if filter strategy is LFS_MINSUM and filter_palette_zero is true.
+
+  If filter_palette_zero is true and filter_strategy is not LFS_MINSUM, the above heuristic is followed,
+  but for "the other case", whatever strategy filter_strategy is set to instead of the minimum sum
+  heuristic is used.
+  */
+  if(settings->filter_palette_zero &&
+     (info->colortype == LCT_PALETTE || info->bitdepth < 8)) strategy = LFS_ZERO;
+
+  if(bpp == 0) return 31; /*error: invalid color type*/
+
+  if(strategy == LFS_ZERO)
+  {
+    for(y = 0; y < h; y++)
+    {
+      size_t outindex = (1 + linebytes) * y; /*the extra filterbyte added to each row*/
+      size_t inindex = linebytes * y;
+      out[outindex] = 0; /*filter type byte*/
+      filterScanline(&out[outindex + 1], &in[inindex], prevline, linebytes, bytewidth, 0);
+      prevline = &in[inindex];
+    }
+  }
+  else if(strategy == LFS_MINSUM)
+  {
+    /*adaptive filtering*/
+    size_t sum[5];
+    ucvector attempt[5]; /*five filtering attempts, one for each filter type*/
+    size_t smallest = 0;
+    unsigned type, bestType = 0;
+
+    for(type = 0; type < 5; type++)
+    {
+      ucvector_init(&attempt[type]);
+      if(!ucvector_resize(&attempt[type], linebytes)) error = 83; /*alloc fail; no early return, so the cleanup below runs*/
+    }
+
+    if(!error)
+    {
+      for(y = 0; y < h; y++)
+      {
+        /*try the 5 filter types*/
+        for(type = 0; type < 5; type++)
+        {
+          filterScanline(attempt[type].data, &in[y * linebytes], prevline, linebytes, bytewidth, type);
+
+          /*calculate the sum of the result*/
+          sum[type] = 0;
+          if(type == 0)
+          {
+            for(x = 0; x < linebytes; x++) sum[type] += (unsigned char)(attempt[type].data[x]);
+          }
+          else
+          {
+            for(x = 0; x < linebytes; x++)
+            {
+              /*For differences, each byte should be treated as signed, values above 127 are negative
+              (converted to signed char). Filtertype 0 isn't a difference though, so use unsigned there.
+              This means filtertype 0 is almost never chosen, but that is justified.*/
+              signed char s = (signed char)(attempt[type].data[x]);
+              sum[type] += s < 0 ? -s : s;
+            }
+          }
+
+          /*check if this is smallest sum (or if type == 0 it's the first case so always store the values)*/
+          if(type == 0 || sum[type] < smallest)
+          {
+            bestType = type;
+            smallest = sum[type];
+          }
+        }
+
+        prevline = &in[y * linebytes];
+
+        /*now fill the out values*/
+        out[y * (linebytes + 1)] = bestType; /*the first byte of a scanline will be the filter type*/
+        for(x = 0; x < linebytes; x++) out[y * (linebytes + 1) + 1 + x] = attempt[bestType].data[x];
+      }
+    }
+
+    for(type = 0; type < 5; type++) ucvector_cleanup(&attempt[type]);
+  }
+  else if(strategy == LFS_ENTROPY)
+  {
+    float sum[5];
+    ucvector attempt[5]; /*five filtering attempts, one for each filter type*/
+    float smallest = 0;
+    unsigned type, bestType = 0;
+    unsigned count[256];
+
+    for(type = 0; type < 5; type++)
+    {
+      ucvector_init(&attempt[type]);
+      if(!ucvector_resize(&attempt[type], linebytes)) error = 83; /*alloc fail; no early return, so the cleanup below runs*/
+    }
+
+    for(y = 0; !error && y < h; y++)
+    {
+      /*try the 5 filter types*/
+      for(type = 0; type < 5; type++)
+      {
+        filterScanline(attempt[type].data, &in[y * linebytes], prevline, linebytes, bytewidth, type);
+        for(x = 0; x < 256; x++) count[x] = 0;
+        for(x = 0; x < linebytes; x++) count[attempt[type].data[x]]++;
+        count[type]++; /*the filter type itself is part of the scanline*/
+        sum[type] = 0;
+        for(x = 0; x < 256; x++)
+        {
+          float p = count[x] / (float)(linebytes + 1);
+          sum[type] += count[x] == 0 ? 0 : flog2(1 / p) * p;
+        }
+        /*check if this is smallest sum (or if type == 0 it's the first case so always store the values)*/
+        if(type == 0 || sum[type] < smallest)
+        {
+          bestType = type;
+          smallest = sum[type];
+        }
+      }
+
+      prevline = &in[y * linebytes];
+
+      /*now fill the out values*/
+      out[y * (linebytes + 1)] = bestType; /*the first byte of a scanline will be the filter type*/
+      for(x = 0; x < linebytes; x++) out[y * (linebytes + 1) + 1 + x] = attempt[bestType].data[x];
+    }
+
+    for(type = 0; type < 5; type++) ucvector_cleanup(&attempt[type]);
+  }
+  else if(strategy == LFS_PREDEFINED)
+  {
+    for(y = 0; y < h; y++)
+    {
+      size_t outindex = (1 + linebytes) * y; /*the extra filterbyte added to each row*/
+      size_t inindex = linebytes * y;
+      unsigned type = settings->predefined_filters[y];
+      out[outindex] = type; /*filter type byte*/
+      filterScanline(&out[outindex + 1], &in[inindex], prevline, linebytes, bytewidth, type);
+      prevline = &in[inindex];
+    }
+  }
+  else if(strategy == LFS_BRUTE_FORCE)
+  {
+    /*brute force filter chooser.
+    deflate the scanline after every filter attempt to see which one deflates best.
+    This is very slow and gives only slightly smaller, sometimes even larger, result*/
+    size_t size[5];
+    ucvector attempt[5]; /*five filtering attempts, one for each filter type*/
+    size_t smallest = 0;
+    unsigned type = 0, bestType = 0;
+    unsigned char* dummy;
+    LodePNGCompressSettings zlibsettings = settings->zlibsettings;
+    /*use the fixed tree for the attempts on purpose, so that the tree is not adapted to each filter
+    type; this simulates the real case where the tree is the same for the whole image. A dynamic tree
+    sometimes gives a better result anyway: the fixed tree is sometimes worse and only rarely better
+    for compression, but it makes this a bit less slow, so it's worth doing.*/
+    zlibsettings.btype = 1;
+    /*a custom encoder likely doesn't read the btype setting and is optimized for complete PNG
+    images only, so disable it*/
+    zlibsettings.custom_zlib = 0;
+    zlibsettings.custom_deflate = 0;
+    for(type = 0; type < 5; type++)
+    {
+      ucvector_init(&attempt[type]);
+      if(!ucvector_resize(&attempt[type], linebytes)) error = 83; /*alloc fail*/
+    }
+    for(y = 0; !error && y < h; y++) /*try the 5 filter types*/
+    {
+      for(type = 0; type < 5; type++)
+      {
+        unsigned testsize = attempt[type].size;
+        /*if(testsize > 8) testsize /= 8;*/ /*it already works well enough by testing a part of the row*/
+
+        filterScanline(attempt[type].data, &in[y * linebytes], prevline, linebytes, bytewidth, type);
+        size[type] = 0;
+        dummy = 0;
+        zlib_compress(&dummy, &size[type], attempt[type].data, testsize, &zlibsettings);
+        lodepng_free(dummy);
+        /*check if this is smallest size (or if type == 0 it's the first case so always store the values)*/
+        if(type == 0 || size[type] < smallest)
+        {
+          bestType = type;
+          smallest = size[type];
+        }
+      }
+      prevline = &in[y * linebytes];
+      out[y * (linebytes + 1)] = bestType; /*the first byte of a scanline will be the filter type*/
+      for(x = 0; x < linebytes; x++) out[y * (linebytes + 1) + 1 + x] = attempt[bestType].data[x];
+    }
+    for(type = 0; type < 5; type++) ucvector_cleanup(&attempt[type]);
+  }
+  else return 88; /* unknown filter strategy */
+
+  return error;
+}
+
+static void addPaddingBits(unsigned char* out, const unsigned char* in,
+                           size_t olinebits, size_t ilinebits, unsigned h)
+{
+  /*The opposite of the removePaddingBits function
+  olinebits must be >= ilinebits*/
+  unsigned y;
+  size_t diff = olinebits - ilinebits;
+  size_t obp = 0, ibp = 0; /*bit pointers*/
+  for(y = 0; y < h; y++)
+  {
+    size_t x;
+    for(x = 0; x < ilinebits; x++)
+    {
+      unsigned char bit = readBitFromReversedStream(&ibp, in);
+      setBitOfReversedStream(&obp, out, bit);
+    }
+    /*obp += diff; --> no, fill in some value in the padding bits too, to avoid
+    "Use of uninitialised value of size ###" warning from valgrind*/
+    for(x = 0; x < diff; x++) setBitOfReversedStream(&obp, out, 0);
+  }
+}
+
+/*
+in: non-interlaced image with size w*h
+out: the same pixels, but re-ordered according to PNG's Adam7 interlacing, with
+ no padding bits between scanlines, but with padding between reduced images so
+ that each reduced image starts at a byte boundary.
+bpp: bits per pixel
+in has no padding bits at all, not even between scanlines;
+its size in bits is w * h * bpp.
+out is possibly bigger due to padding bits between reduced images
+NOTE: comments about padding bits are only relevant if bpp < 8
+*/
+static void Adam7_interlace(unsigned char* out, const unsigned char* in, unsigned w, unsigned h, unsigned bpp)
+{
+  unsigned passw[7], passh[7];
+  size_t filter_passstart[8], padded_passstart[8], passstart[8];
+  unsigned i;
+
+  Adam7_getpassvalues(passw, passh, filter_passstart, padded_passstart, passstart, w, h, bpp);
+
+  if(bpp >= 8)
+  {
+    for(i = 0; i < 7; i++)
+    {
+      unsigned x, y, b;
+      size_t bytewidth = bpp / 8;
+      for(y = 0; y < passh[i]; y++)
+      for(x = 0; x < passw[i]; x++)
+      {
+        size_t pixelinstart = ((ADAM7_IY[i] + y * ADAM7_DY[i]) * w + ADAM7_IX[i] + x * ADAM7_DX[i]) * bytewidth;
+        size_t pixeloutstart = passstart[i] + (y * passw[i] + x) * bytewidth;
+        for(b = 0; b < bytewidth; b++)
+        {
+          out[pixeloutstart + b] = in[pixelinstart + b];
+        }
+      }
+    }
+  }
+  else /*bpp < 8: Adam7 with pixels < 8 bit is a bit trickier: with bit pointers*/
+  {
+    for(i = 0; i < 7; i++)
+    {
+      unsigned x, y, b;
+      unsigned ilinebits = bpp * passw[i];
+      unsigned olinebits = bpp * w;
+      size_t obp, ibp; /*bit pointers (for out and in buffer)*/
+      for(y = 0; y < passh[i]; y++)
+      for(x = 0; x < passw[i]; x++)
+      {
+        ibp = (ADAM7_IY[i] + y * ADAM7_DY[i]) * olinebits + (ADAM7_IX[i] + x * ADAM7_DX[i]) * bpp;
+        obp = (8 * passstart[i]) + (y * ilinebits + x * bpp);
+        for(b = 0; b < bpp; b++)
+        {
+          unsigned char bit = readBitFromReversedStream(&ibp, in);
+          setBitOfReversedStream(&obp, out, bit);
+        }
+      }
+    }
+  }
+}
+
+/*allocates *out, which receives the uncompressed IDAT chunk data and must be freed by the caller;
+in must contain the full image. The return value is the error code*/
+static unsigned preProcessScanlines(unsigned char** out, size_t* outsize, const unsigned char* in,
+                                    unsigned w, unsigned h,
+                                    const LodePNGInfo* info_png, const LodePNGEncoderSettings* settings)
+{
+  /*
+  This function converts the pure 2D image with the PNG's colortype, into filtered-padded-interlaced data. Steps:
+  *) if no Adam7: 1) add padding bits (= possible extra bits per scanline if bpp < 8) 2) filter
+  *) if adam7: 1) Adam7_interlace 2) 7x add padding bits 3) 7x filter
+  */
+  unsigned bpp = lodepng_get_bpp(&info_png->color);
+  unsigned error = 0;
+
+  if(info_png->interlace_method == 0)
+  {
+    *outsize = h + (h * ((w * bpp + 7) / 8)); /*image size plus an extra byte per scanline + possible padding bits*/
+    *out = (unsigned char*)lodepng_malloc(*outsize);
+    if(!(*out) && (*outsize)) error = 83; /*alloc fail*/
+
+    if(!error)
+    {
+      /*non multiple of 8 bits per scanline, padding bits needed per scanline*/
+      if(bpp < 8 && w * bpp != ((w * bpp + 7) / 8) * 8)
+      {
+        unsigned char* padded = (unsigned char*)lodepng_malloc(h * ((w * bpp + 7) / 8));
+        if(!padded) error = 83; /*alloc fail*/
+        if(!error)
+        {
+          addPaddingBits(padded, in, ((w * bpp + 7) / 8) * 8, w * bpp, h);
+          error = filter(*out, padded, w, h, &info_png->color, settings);
+        }
+        lodepng_free(padded);
+      }
+      else
+      {
+        /*we can immediately filter into the out buffer, no other steps needed*/
+        error = filter(*out, in, w, h, &info_png->color, settings);
+      }
+    }
+  }
+  else /*interlace_method is 1 (Adam7)*/
+  {
+    unsigned passw[7], passh[7];
+    size_t filter_passstart[8], padded_passstart[8], passstart[8];
+    unsigned char* adam7;
+
+    Adam7_getpassvalues(passw, passh, filter_passstart, padded_passstart, passstart, w, h, bpp);
+
+    *outsize = filter_passstart[7]; /*image size plus an extra byte per scanline + possible padding bits*/
+    *out = (unsigned char*)lodepng_malloc(*outsize);
+    if(!(*out)) error = 83; /*alloc fail*/
+
+    adam7 = (unsigned char*)lodepng_malloc(passstart[7]);
+    if(!adam7 && passstart[7]) error = 83; /*alloc fail*/
+
+    if(!error)
+    {
+      unsigned i;
+
+      Adam7_interlace(adam7, in, w, h, bpp);
+      for(i = 0; i < 7; i++)
+      {
+        if(bpp < 8)
+        {
+          unsigned char* padded = (unsigned char*)lodepng_malloc(padded_passstart[i + 1] - padded_passstart[i]);
+          if(!padded) ERROR_BREAK(83); /*alloc fail*/
+          addPaddingBits(padded, &adam7[passstart[i]],
+                         ((passw[i] * bpp + 7) / 8) * 8, passw[i] * bpp, passh[i]);
+          error = filter(&(*out)[filter_passstart[i]], padded,
+                         passw[i], passh[i], &info_png->color, settings);
+          lodepng_free(padded);
+        }
+        else
+        {
+          error = filter(&(*out)[filter_passstart[i]], &adam7[padded_passstart[i]],
+                         passw[i], passh[i], &info_png->color, settings);
+        }
+
+        if(error) break;
+      }
+    }
+
+    lodepng_free(adam7);
+  }
+
+  return error;
+}
+
+/*
+palette must have 4 * palettesize bytes allocated, and given in format RGBARGBARGBARGBA...
+returns 0 if the palette is opaque,
+returns 1 if the palette has a single color with alpha 0 ==> color key
+returns 2 if the palette is semi-translucent.
+*/
+static unsigned getPaletteTranslucency(const unsigned char* palette, size_t palettesize)
+{
+  size_t i, key = 0;
+  unsigned r = 0, g = 0, b = 0; /*the value of the color with alpha 0, so long as color keying is possible*/
+  for(i = 0; i < palettesize; i++)
+  {
+    if(!key && palette[4 * i + 3] == 0)
+    {
+      r = palette[4 * i + 0]; g = palette[4 * i + 1]; b = palette[4 * i + 2];
+      key = 1;
+      i = (size_t)(-1); /*restart from beginning, to detect earlier opaque colors with key's value*/
+    }
+    else if(palette[4 * i + 3] != 255) return 2;
+    /*when key, no opaque RGB may have key's RGB*/
+    else if(key && r == palette[i * 4 + 0] && g == palette[i * 4 + 1] && b == palette[i * 4 + 2]) return 2;
+  }
+  return key;
+}
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+static unsigned addUnknownChunks(ucvector* out, unsigned char* data, size_t datasize)
+{
+  unsigned char* inchunk = data;
+  while((size_t)(inchunk - data) < datasize)
+  {
+    CERROR_TRY_RETURN(lodepng_chunk_append(&out->data, &out->size, inchunk));
+    out->allocsize = out->size; /*fix the allocsize again*/
+    inchunk = lodepng_chunk_next(inchunk);
+  }
+  return 0;
+}
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+
+unsigned lodepng_encode(unsigned char** out, size_t* outsize,
+                        const unsigned char* image, unsigned w, unsigned h,
+                        LodePNGState* state)
+{
+  LodePNGInfo info;
+  ucvector outv;
+  unsigned char* data = 0; /*uncompressed version of the IDAT chunk data*/
+  size_t datasize = 0;
+
+  /*provide proper output values in case an error happens*/
+  *out = 0;
+  *outsize = 0;
+  state->error = 0;
+
+  lodepng_info_init(&info);
+  lodepng_info_copy(&info, &state->info_png);
+
+  if((info.color.colortype == LCT_PALETTE || state->encoder.force_palette)
+      && (info.color.palettesize == 0 || info.color.palettesize > 256))
+  {
+    state->error = 68; /*invalid palette size, it is only allowed to be 1-256*/
+    return state->error;
+  }
+
+  if(state->encoder.auto_convert != LAC_NO)
+  {
+    state->error = lodepng_auto_choose_color(&info.color, image, w, h, &state->info_raw,
+                                             state->encoder.auto_convert);
+  }
+  if(state->error) return state->error;
+
+  if(state->encoder.zlibsettings.btype > 2)
+  {
+    CERROR_RETURN_ERROR(state->error, 61); /*error: nonexistent btype*/
+  }
+  if(state->info_png.interlace_method > 1)
+  {
+    CERROR_RETURN_ERROR(state->error, 71); /*error: nonexistent interlace mode*/
+  }
+
+  state->error = checkColorValidity(info.color.colortype, info.color.bitdepth);
+  if(state->error) return state->error; /*error: unexisting color type given*/
+  state->error = checkColorValidity(state->info_raw.colortype, state->info_raw.bitdepth);
+  if(state->error) return state->error; /*error: unexisting color type given*/
+
+  if(!lodepng_color_mode_equal(&state->info_raw, &info.color))
+  {
+    unsigned char* converted;
+    size_t size = ((size_t)w * h * lodepng_get_bpp(&info.color) + 7) / 8; /*size_t math avoids unsigned overflow*/
+
+    converted = (unsigned char*)lodepng_malloc(size);
+    if(!converted && size) state->error = 83; /*alloc fail*/
+    if(!state->error)
+    {
+      state->error = lodepng_convert(converted, image, &info.color, &state->info_raw, w, h, 0 /*fix_png*/);
+    }
+    if(!state->error) state->error = preProcessScanlines(&data, &datasize, converted, w, h, &info, &state->encoder);
+    lodepng_free(converted);
+  }
+  else state->error = preProcessScanlines(&data, &datasize, image, w, h, &info, &state->encoder);
+
+  ucvector_init(&outv);
+  while(!state->error) /*while only executed once, to break on error*/
+  {
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+    size_t i;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    /*write signature and chunks*/
+    writeSignature(&outv);
+    /*IHDR*/
+    addChunk_IHDR(&outv, w, h, info.color.colortype, info.color.bitdepth, info.interlace_method);
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+    /*unknown chunks between IHDR and PLTE*/
+    if(info.unknown_chunks_data[0])
+    {
+      state->error = addUnknownChunks(&outv, info.unknown_chunks_data[0], info.unknown_chunks_size[0]);
+      if(state->error) break;
+    }
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    /*PLTE*/
+    if(info.color.colortype == LCT_PALETTE)
+    {
+      addChunk_PLTE(&outv, &info.color);
+    }
+    if(state->encoder.force_palette && (info.color.colortype == LCT_RGB || info.color.colortype == LCT_RGBA))
+    {
+      addChunk_PLTE(&outv, &info.color);
+    }
+    /*tRNS*/
+    if(info.color.colortype == LCT_PALETTE && getPaletteTranslucency(info.color.palette, info.color.palettesize) != 0)
+    {
+      addChunk_tRNS(&outv, &info.color);
+    }
+    if((info.color.colortype == LCT_GREY || info.color.colortype == LCT_RGB) && info.color.key_defined)
+    {
+      addChunk_tRNS(&outv, &info.color);
+    }
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+    /*bKGD (must come between PLTE and the IDAT chunks)*/
+    if(info.background_defined) addChunk_bKGD(&outv, &info);
+    /*pHYs (must come before the IDAT chunks)*/
+    if(info.phys_defined) addChunk_pHYs(&outv, &info);
+
+    /*unknown chunks between PLTE and IDAT*/
+    if(info.unknown_chunks_data[1])
+    {
+      state->error = addUnknownChunks(&outv, info.unknown_chunks_data[1], info.unknown_chunks_size[1]);
+      if(state->error) break;
+    }
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    /*IDAT (multiple IDAT chunks must be consecutive)*/
+    state->error = addChunk_IDAT(&outv, data, datasize, &state->encoder.zlibsettings);
+    if(state->error) break;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+    /*tIME*/
+    if(info.time_defined) addChunk_tIME(&outv, &info.time);
+    /*tEXt and/or zTXt*/
+    for(i = 0; i < info.text_num; i++)
+    {
+      if(strlen(info.text_keys[i]) > 79)
+      {
+        state->error = 66; /*text chunk keyword too long*/
+        break;
+      }
+      if(strlen(info.text_keys[i]) < 1)
+      {
+        state->error = 67; /*text chunk keyword too short*/
+        break;
+      }
+      if(state->encoder.text_compression)
+      {
+        addChunk_zTXt(&outv, info.text_keys[i], info.text_strings[i], &state->encoder.zlibsettings);
+      }
+      else
+      {
+        addChunk_tEXt(&outv, info.text_keys[i], info.text_strings[i]);
+      }
+    }
+    /*LodePNG version id in text chunk*/
+    if(state->encoder.add_id)
+    {
+      unsigned already_added_id_text = 0;
+      for(i = 0; i < info.text_num; i++)
+      {
+        if(!strcmp(info.text_keys[i], "LodePNG"))
+        {
+          already_added_id_text = 1;
+          break;
+        }
+      }
+      if(already_added_id_text == 0)
+      {
+        addChunk_tEXt(&outv, "LodePNG", VERSION_STRING); /*it's shorter as tEXt than as zTXt chunk*/
+      }
+    }
+    /*iTXt*/
+    for(i = 0; i < info.itext_num; i++)
+    {
+      if(strlen(info.itext_keys[i]) > 79)
+      {
+        state->error = 66; /*text chunk keyword too long*/
+        break;
+      }
+      if(strlen(info.itext_keys[i]) < 1)
+      {
+        state->error = 67; /*text chunk keyword too short*/
+        break;
+      }
+      addChunk_iTXt(&outv, state->encoder.text_compression,
+                    info.itext_keys[i], info.itext_langtags[i], info.itext_transkeys[i], info.itext_strings[i],
+                    &state->encoder.zlibsettings);
+    }
+
+    /*unknown chunks between IDAT and IEND*/
+    if(info.unknown_chunks_data[2])
+    {
+      state->error = addUnknownChunks(&outv, info.unknown_chunks_data[2], info.unknown_chunks_size[2]);
+      if(state->error) break;
+    }
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+    addChunk_IEND(&outv);
+
+    break; /*this isn't really a while loop; no error happened so break out now!*/
+  }
+
+  lodepng_info_cleanup(&info);
+  lodepng_free(data);
+  /*instead of cleaning the vector up, give it to the output*/
+  *out = outv.data;
+  *outsize = outv.size;
+
+  return state->error;
+}
+
+unsigned lodepng_encode_memory(unsigned char** out, size_t* outsize, const unsigned char* image,
+                               unsigned w, unsigned h, LodePNGColorType colortype, unsigned bitdepth)
+{
+  unsigned error;
+  LodePNGState state;
+  lodepng_state_init(&state);
+  state.info_raw.colortype = colortype;
+  state.info_raw.bitdepth = bitdepth;
+  state.info_png.color.colortype = colortype;
+  state.info_png.color.bitdepth = bitdepth;
+  lodepng_encode(out, outsize, image, w, h, &state);
+  error = state.error;
+  lodepng_state_cleanup(&state);
+  return error;
+}
+
+unsigned lodepng_encode32(unsigned char** out, size_t* outsize, const unsigned char* image, unsigned w, unsigned h)
+{
+  return lodepng_encode_memory(out, outsize, image, w, h, LCT_RGBA, 8);
+}
+
+unsigned lodepng_encode24(unsigned char** out, size_t* outsize, const unsigned char* image, unsigned w, unsigned h)
+{
+  return lodepng_encode_memory(out, outsize, image, w, h, LCT_RGB, 8);
+}
+
+#ifdef LODEPNG_COMPILE_DISK
+unsigned lodepng_encode_file(const char* filename, const unsigned char* image, unsigned w, unsigned h,
+                             LodePNGColorType colortype, unsigned bitdepth)
+{
+  unsigned char* buffer;
+  size_t buffersize;
+  unsigned error = lodepng_encode_memory(&buffer, &buffersize, image, w, h, colortype, bitdepth);
+  if(!error) error = lodepng_save_file(buffer, buffersize, filename);
+  lodepng_free(buffer);
+  return error;
+}
+
+unsigned lodepng_encode32_file(const char* filename, const unsigned char* image, unsigned w, unsigned h)
+{
+  return lodepng_encode_file(filename, image, w, h, LCT_RGBA, 8);
+}
+
+unsigned lodepng_encode24_file(const char* filename, const unsigned char* image, unsigned w, unsigned h)
+{
+  return lodepng_encode_file(filename, image, w, h, LCT_RGB, 8);
+}
+#endif /*LODEPNG_COMPILE_DISK*/
+
+void lodepng_encoder_settings_init(LodePNGEncoderSettings* settings)
+{
+  lodepng_compress_settings_init(&settings->zlibsettings);
+  settings->filter_palette_zero = 1;
+  settings->filter_strategy = LFS_MINSUM;
+  settings->auto_convert = LAC_AUTO;
+  settings->force_palette = 0;
+  settings->predefined_filters = 0;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  settings->add_id = 0;
+  settings->text_compression = 1;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+}
+
+#endif /*LODEPNG_COMPILE_ENCODER*/
+#endif /*LODEPNG_COMPILE_PNG*/
+
+#ifdef LODEPNG_COMPILE_ERROR_TEXT
+/*
+This returns the description of a numerical error code in English. This is also
+the documentation of all the error codes.
+*/
+const char* lodepng_error_text(unsigned code)
+{
+  switch(code)
+  {
+    case 0: return "no error, everything went ok";
+    case 1: return "nothing done yet"; /*the Encoder/Decoder has done nothing yet, error checking makes no sense yet*/
+    case 10: return "end of input memory reached without huffman end code"; /*while huffman decoding*/
+    case 11: return "error in code tree made it jump outside of huffman tree"; /*while huffman decoding*/
+    case 13: return "problem while processing dynamic deflate block";
+    case 14: return "problem while processing dynamic deflate block";
+    case 15: return "problem while processing dynamic deflate block";
+    case 16: return "unexisting code while processing dynamic deflate block";
+    case 17: return "end of out buffer memory reached while inflating";
+    case 18: return "invalid distance code while inflating";
+    case 19: return "end of out buffer memory reached while inflating";
+    case 20: return "invalid deflate block BTYPE encountered while decoding";
+    case 21: return "NLEN is not ones complement of LEN in a deflate block";
+     /*end of out buffer memory reached while inflating:
+     This can happen if the inflated deflate data is longer than the amount of bytes required to fill up
+     all the pixels of the image, given the color depth and image dimensions. Something that doesn't
+     happen in a normal, well encoded, PNG image.*/
+    case 22: return "end of out buffer memory reached while inflating";
+    case 23: return "end of in buffer memory reached while inflating";
+    case 24: return "invalid FCHECK in zlib header";
+    case 25: return "invalid compression method in zlib header";
+    case 26: return "FDICT encountered in zlib header while it's not used for PNG";
+    case 27: return "PNG file is smaller than a PNG header";
+    /*Checks the magic file header, the first 8 bytes of the PNG file*/
+    case 28: return "incorrect PNG signature, it's no PNG or corrupted";
+    case 29: return "first chunk is not the header chunk";
+    case 30: return "chunk length too large, chunk broken off at end of file";
+    case 31: return "illegal PNG color type or bpp";
+    case 32: return "illegal PNG compression method";
+    case 33: return "illegal PNG filter method";
+    case 34: return "illegal PNG interlace method";
+    case 35: return "chunk length of a chunk is too large or the chunk too small";
+    case 36: return "illegal PNG filter type encountered";
+    case 37: return "illegal bit depth for this color type given";
+    case 38: return "the palette is too big"; /*more than 256 colors*/
+    case 39: return "more palette alpha values given in tRNS chunk than there are colors in the palette";
+    case 40: return "tRNS chunk has wrong size for greyscale image";
+    case 41: return "tRNS chunk has wrong size for RGB image";
+    case 42: return "tRNS chunk appeared while it was not allowed for this color type";
+    case 43: return "bKGD chunk has wrong size for palette image";
+    case 44: return "bKGD chunk has wrong size for greyscale image";
+    case 45: return "bKGD chunk has wrong size for RGB image";
+    /*Is the palette too small?*/
+    case 46: return "a value in indexed image is larger than the palette size (bitdepth = 8)";
+    /*Is the palette too small?*/
+    case 47: return "a value in indexed image is larger than the palette size (bitdepth < 8)";
+    /*the input data is empty, maybe a PNG file doesn't exist or is in the wrong path*/
+    case 48: return "empty input or file doesn't exist";
+    case 49: return "jumped past memory while generating dynamic huffman tree";
+    case 50: return "jumped past memory while generating dynamic huffman tree";
+    case 51: return "jumped past memory while inflating huffman block";
+    case 52: return "jumped past memory while inflating";
+    case 53: return "size of zlib data too small";
+    case 54: return "repeat symbol in tree while there was no value symbol yet";
+    /*jumped past tree while generating huffman tree, this could be when the
+    tree will have more leaves than symbols after generating it out of the
+    given lengths. They call this an oversubscribed dynamic bit lengths tree in zlib.*/
+    case 55: return "jumped past tree while generating huffman tree";
+    case 56: return "given output image colortype or bitdepth not supported for color conversion";
+    case 57: return "invalid CRC encountered (checking CRC can be disabled)";
+    case 58: return "invalid ADLER32 encountered (checking ADLER32 can be disabled)";
+    case 59: return "requested color conversion not supported";
+    case 60: return "invalid window size given in the settings of the encoder (must be 0-32768)";
+    case 61: return "invalid BTYPE given in the settings of the encoder (only 0, 1 and 2 are allowed)";
+    /*LodePNG leaves the choice of RGB to greyscale conversion formula to the user.*/
+    case 62: return "conversion from color to greyscale not supported";
+    case 63: return "length of a chunk too long, max allowed for PNG is 2147483647 bytes per chunk"; /*(2^31-1)*/
+    /*this would result in the inability of a deflated block to ever contain an end code. It must be at least 1.*/
+    case 64: return "the length of the END symbol 256 in the Huffman tree is 0";
+    case 66: return "the length of a text chunk keyword given to the encoder is longer than the maximum of 79 bytes";
+    case 67: return "the length of a text chunk keyword given to the encoder is smaller than the minimum of 1 byte";
+    case 68: return "tried to encode a PLTE chunk with a palette that has less than 1 or more than 256 colors";
+    case 69: return "unknown chunk type with 'critical' flag encountered by the decoder";
+    case 71: return "unexisting interlace mode given to encoder (must be 0 or 1)";
+    case 72: return "while decoding, nonexistent compression method encountered in zTXt or iTXt chunk (it must be 0)";
+    case 73: return "invalid tIME chunk size";
+    case 74: return "invalid pHYs chunk size";
+    /*length could be wrong, or data chopped off*/
+    case 75: return "no null termination char found while decoding text chunk";
+    case 76: return "iTXt chunk too short to contain required bytes";
+    case 77: return "integer overflow in buffer size";
+    case 78: return "failed to open file for reading"; /*file doesn't exist or couldn't be opened for reading*/
+    case 79: return "failed to open file for writing";
+    case 80: return "tried creating a tree of 0 symbols";
+    case 81: return "lazy matching at pos 0 is impossible";
+    case 82: return "color conversion to palette requested while a color isn't in palette";
+    case 83: return "memory allocation failed";
+    case 84: return "given image too small to contain all pixels to be encoded";
+    case 85: return "internal color conversion bug";
+    case 86: return "impossible offset in lz77 encoding (internal bug)";
+    case 87: return "must provide custom zlib function pointer if LODEPNG_COMPILE_ZLIB is not defined";
+    case 88: return "invalid filter strategy given for LodePNGEncoderSettings.filter_strategy";
+    case 89: return "text chunk keyword too short or long: must have size 1-79";
+    /*the windowsize in the LodePNGCompressSettings. Requiring POT(==> & instead of %) makes encoding 12% faster.*/
+    case 90: return "windowsize must be a power of two";
+  }
+  return "unknown error code";
+}
+#endif /*LODEPNG_COMPILE_ERROR_TEXT*/
+
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* // C++ Wrapper                                                          // */
+/* ////////////////////////////////////////////////////////////////////////// */
+/* ////////////////////////////////////////////////////////////////////////// */
+
+#ifdef LODEPNG_COMPILE_CPP
+namespace lodepng
+{
+
+#ifdef LODEPNG_COMPILE_DISK
+void load_file(std::vector<unsigned char>& buffer, const std::string& filename)
+{
+  std::ifstream file(filename.c_str(), std::ios::in|std::ios::binary|std::ios::ate);
+
+  /*get filesize*/
+  std::streamsize size = 0;
+  if(file.seekg(0, std::ios::end).good()) size = file.tellg();
+  if(file.seekg(0, std::ios::beg).good()) size -= file.tellg();
+
+  /*read contents of the file into the vector*/
+  buffer.resize(size_t(size));
+  if(size > 0) file.read((char*)(&buffer[0]), size);
+}
+
+/*write given buffer to the file, overwriting the file, it doesn't append to it.*/
+void save_file(const std::vector<unsigned char>& buffer, const std::string& filename)
+{
+  std::ofstream file(filename.c_str(), std::ios::out|std::ios::binary);
+  file.write(buffer.empty() ? 0 : (char*)&buffer[0], std::streamsize(buffer.size()));
+}
+#endif //LODEPNG_COMPILE_DISK
+
+#ifdef LODEPNG_COMPILE_ZLIB
+#ifdef LODEPNG_COMPILE_DECODER
+unsigned decompress(std::vector<unsigned char>& out, const unsigned char* in, size_t insize,
+                    const LodePNGDecompressSettings& settings)
+{
+  unsigned char* buffer = 0;
+  size_t buffersize = 0;
+  unsigned error = zlib_decompress(&buffer, &buffersize, in, insize, &settings);
+  if(buffer)
+  {
+    out.insert(out.end(), &buffer[0], &buffer[buffersize]);
+    lodepng_free(buffer);
+  }
+  return error;
+}
+
+unsigned decompress(std::vector<unsigned char>& out, const std::vector<unsigned char>& in,
+                    const LodePNGDecompressSettings& settings)
+{
+  return decompress(out, in.empty() ? 0 : &in[0], in.size(), settings);
+}
+#endif //LODEPNG_COMPILE_DECODER
+
+#ifdef LODEPNG_COMPILE_ENCODER
+unsigned compress(std::vector<unsigned char>& out, const unsigned char* in, size_t insize,
+                  const LodePNGCompressSettings& settings)
+{
+  unsigned char* buffer = 0;
+  size_t buffersize = 0;
+  unsigned error = zlib_compress(&buffer, &buffersize, in, insize, &settings);
+  if(buffer)
+  {
+    out.insert(out.end(), &buffer[0], &buffer[buffersize]);
+    lodepng_free(buffer);
+  }
+  return error;
+}
+
+unsigned compress(std::vector<unsigned char>& out, const std::vector<unsigned char>& in,
+                  const LodePNGCompressSettings& settings)
+{
+  return compress(out, in.empty() ? 0 : &in[0], in.size(), settings);
+}
+#endif //LODEPNG_COMPILE_ENCODER
+#endif //LODEPNG_COMPILE_ZLIB
+
+
+#ifdef LODEPNG_COMPILE_PNG
+
+State::State()
+{
+  lodepng_state_init(this);
+}
+
+State::State(const State& other)
+{
+  lodepng_state_init(this);
+  lodepng_state_copy(this, &other);
+}
+
+State::~State()
+{
+  lodepng_state_cleanup(this);
+}
+
+State& State::operator=(const State& other)
+{
+  lodepng_state_copy(this, &other);
+  return *this;
+}
+
+#ifdef LODEPNG_COMPILE_DECODER
+
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h, const unsigned char* in,
+                size_t insize, LodePNGColorType colortype, unsigned bitdepth)
+{
+  unsigned char* buffer = 0;
+  unsigned error = lodepng_decode_memory(&buffer, &w, &h, in, insize, colortype, bitdepth);
+  if(buffer && !error)
+  {
+    State state;
+    state.info_raw.colortype = colortype;
+    state.info_raw.bitdepth = bitdepth;
+    size_t buffersize = lodepng_get_raw_size(w, h, &state.info_raw);
+    out.insert(out.end(), &buffer[0], &buffer[buffersize]);
+  }
+  lodepng_free(buffer); /*free unconditionally so the buffer is not leaked on error*/
+  return error;
+}
+
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                const std::vector<unsigned char>& in, LodePNGColorType colortype, unsigned bitdepth)
+{
+  return decode(out, w, h, in.empty() ? 0 : &in[0], in.size(), colortype, bitdepth);
+}
+
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                State& state,
+                const unsigned char* in, size_t insize)
+{
+  unsigned char* buffer = NULL;
+  unsigned error = lodepng_decode(&buffer, &w, &h, &state, in, insize);
+  if(buffer && !error)
+  {
+    size_t buffersize = lodepng_get_raw_size(w, h, &state.info_raw);
+    out.insert(out.end(), &buffer[0], &buffer[buffersize]);
+  }
+  lodepng_free(buffer);
+  return error;
+}
+
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                State& state,
+                const std::vector<unsigned char>& in)
+{
+  return decode(out, w, h, state, in.empty() ? 0 : &in[0], in.size());
+}
+
+#ifdef LODEPNG_COMPILE_DISK
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h, const std::string& filename,
+                LodePNGColorType colortype, unsigned bitdepth)
+{
+  std::vector<unsigned char> buffer;
+  load_file(buffer, filename);
+  return decode(out, w, h, buffer, colortype, bitdepth);
+}
+#endif //LODEPNG_COMPILE_DISK
+#endif //LODEPNG_COMPILE_DECODER
+
+#ifdef LODEPNG_COMPILE_ENCODER
+unsigned encode(std::vector<unsigned char>& out, const unsigned char* in, unsigned w, unsigned h,
+                LodePNGColorType colortype, unsigned bitdepth)
+{
+  unsigned char* buffer = 0;
+  size_t buffersize;
+  unsigned error = lodepng_encode_memory(&buffer, &buffersize, in, w, h, colortype, bitdepth);
+  if(buffer)
+  {
+    out.insert(out.end(), &buffer[0], &buffer[buffersize]);
+    lodepng_free(buffer);
+  }
+  return error;
+}
+
+unsigned encode(std::vector<unsigned char>& out,
+                const std::vector<unsigned char>& in, unsigned w, unsigned h,
+                LodePNGColorType colortype, unsigned bitdepth)
+{
+  if(lodepng_get_raw_size_lct(w, h, colortype, bitdepth) > in.size()) return 84;
+  return encode(out, in.empty() ? 0 : &in[0], w, h, colortype, bitdepth);
+}
+
+unsigned encode(std::vector<unsigned char>& out,
+                const unsigned char* in, unsigned w, unsigned h,
+                State& state)
+{
+  unsigned char* buffer = 0;
+  size_t buffersize;
+  unsigned error = lodepng_encode(&buffer, &buffersize, in, w, h, &state);
+  if(buffer)
+  {
+    out.insert(out.end(), &buffer[0], &buffer[buffersize]);
+    lodepng_free(buffer);
+  }
+  return error;
+}
+
+unsigned encode(std::vector<unsigned char>& out,
+                const std::vector<unsigned char>& in, unsigned w, unsigned h,
+                State& state)
+{
+  if(lodepng_get_raw_size(w, h, &state.info_raw) > in.size()) return 84;
+  return encode(out, in.empty() ? 0 : &in[0], w, h, state);
+}
+
+#ifdef LODEPNG_COMPILE_DISK
+unsigned encode(const std::string& filename,
+                const unsigned char* in, unsigned w, unsigned h,
+                LodePNGColorType colortype, unsigned bitdepth)
+{
+  std::vector<unsigned char> buffer;
+  unsigned error = encode(buffer, in, w, h, colortype, bitdepth);
+  if(!error) save_file(buffer, filename);
+  return error;
+}
+
+unsigned encode(const std::string& filename,
+                const std::vector<unsigned char>& in, unsigned w, unsigned h,
+                LodePNGColorType colortype, unsigned bitdepth)
+{
+  if(lodepng_get_raw_size_lct(w, h, colortype, bitdepth) > in.size()) return 84;
+  return encode(filename, in.empty() ? 0 : &in[0], w, h, colortype, bitdepth);
+}
+#endif //LODEPNG_COMPILE_DISK
+#endif //LODEPNG_COMPILE_ENCODER
+#endif //LODEPNG_COMPILE_PNG
+} //namespace lodepng
+#endif /*LODEPNG_COMPILE_CPP*/
diff --git a/src/zopflipng/lodepng/lodepng.h b/src/zopflipng/lodepng/lodepng.h
new file mode 100644
index 0000000..c497a5c
--- /dev/null
+++ b/src/zopflipng/lodepng/lodepng.h
@@ -0,0 +1,1716 @@
+/*
+LodePNG version 20131222
+
+Copyright (c) 2005-2013 Lode Vandevenne
+
+This software is provided 'as-is', without any express or implied
+warranty. In no event will the authors be held liable for any damages
+arising from the use of this software.
+
+Permission is granted to anyone to use this software for any purpose,
+including commercial applications, and to alter it and redistribute it
+freely, subject to the following restrictions:
+
+    1. The origin of this software must not be misrepresented; you must not
+    claim that you wrote the original software. If you use this software
+    in a product, an acknowledgment in the product documentation would be
+    appreciated but is not required.
+
+    2. Altered source versions must be plainly marked as such, and must not be
+    misrepresented as being the original software.
+
+    3. This notice may not be removed or altered from any source
+    distribution.
+*/
+
+#ifndef LODEPNG_H
+#define LODEPNG_H
+
+#include <string.h> /*for size_t*/
+
+#ifdef __cplusplus
+#include <vector>
+#include <string>
+#endif /*__cplusplus*/
+
+/*
+The following #defines are used to create code sections. They can be disabled
+to disable code sections, which can give faster compile time and smaller binary.
+The "NO_COMPILE" defines are designed to be used to pass as defines to the
+compiler command to disable them without modifying this header, e.g.
+-DLODEPNG_NO_COMPILE_ZLIB for gcc.
+*/
+/*deflate & zlib. If disabled, you must specify alternative zlib functions in
+the custom_zlib field of the compress and decompress settings*/
+#ifndef LODEPNG_NO_COMPILE_ZLIB
+#define LODEPNG_COMPILE_ZLIB
+#endif
+/*png encoder and png decoder*/
+#ifndef LODEPNG_NO_COMPILE_PNG
+#define LODEPNG_COMPILE_PNG
+#endif
+/*deflate&zlib decoder and png decoder*/
+#ifndef LODEPNG_NO_COMPILE_DECODER
+#define LODEPNG_COMPILE_DECODER
+#endif
+/*deflate&zlib encoder and png encoder*/
+#ifndef LODEPNG_NO_COMPILE_ENCODER
+#define LODEPNG_COMPILE_ENCODER
+#endif
+/*the optional built in harddisk file loading and saving functions*/
+#ifndef LODEPNG_NO_COMPILE_DISK
+#define LODEPNG_COMPILE_DISK
+#endif
+/*support for chunks other than IHDR, IDAT, PLTE, tRNS, IEND: ancillary and unknown chunks*/
+#ifndef LODEPNG_NO_COMPILE_ANCILLARY_CHUNKS
+#define LODEPNG_COMPILE_ANCILLARY_CHUNKS
+#endif
+/*ability to convert error numerical codes to English text string*/
+#ifndef LODEPNG_NO_COMPILE_ERROR_TEXT
+#define LODEPNG_COMPILE_ERROR_TEXT
+#endif
+/*Compile the default allocators (C's free, malloc and realloc). If you disable this,
+you can define the functions lodepng_free, lodepng_malloc and lodepng_realloc in your
+source files with custom allocators.*/
+#ifndef LODEPNG_NO_COMPILE_ALLOCATORS
+#define LODEPNG_COMPILE_ALLOCATORS
+#endif
+/*compile the C++ version (you can disable the C++ wrapper here even when compiling for C++)*/
+#ifdef __cplusplus
+#ifndef LODEPNG_NO_COMPILE_CPP
+#define LODEPNG_COMPILE_CPP
+#endif
+#endif
+
+#ifdef LODEPNG_COMPILE_PNG
+/*The PNG color types (also used for raw).*/
+typedef enum LodePNGColorType
+{
+  LCT_GREY = 0, /*greyscale: 1,2,4,8,16 bit*/
+  LCT_RGB = 2, /*RGB: 8,16 bit*/
+  LCT_PALETTE = 3, /*palette: 1,2,4,8 bit*/
+  LCT_GREY_ALPHA = 4, /*greyscale with alpha: 8,16 bit*/
+  LCT_RGBA = 6 /*RGB with alpha: 8,16 bit*/
+} LodePNGColorType;
+
+#ifdef LODEPNG_COMPILE_DECODER
+/*
+Converts PNG data in memory to raw pixel data.
+out: Output parameter. Pointer to buffer that will contain the raw pixel data.
+     After decoding, its size is w * h * (bytes per pixel) bytes larger than
+     initially. Bytes per pixel depends on colortype and bitdepth.
+     Must be freed after usage with free(*out).
+     Note: for 16-bit per channel colors, uses big endian format like PNG does.
+w: Output parameter. Pointer to width of pixel data.
+h: Output parameter. Pointer to height of pixel data.
+in: Memory buffer with the PNG file.
+insize: size of the in buffer.
+colortype: the desired color type for the raw output image. See explanation on PNG color types.
+bitdepth: the desired bit depth for the raw output image. See explanation on PNG color types.
+Return value: LodePNG error code (0 means no error).
+*/
+unsigned lodepng_decode_memory(unsigned char** out, unsigned* w, unsigned* h,
+                               const unsigned char* in, size_t insize,
+                               LodePNGColorType colortype, unsigned bitdepth);
+
+/*Same as lodepng_decode_memory, but always decodes to 32-bit RGBA raw image*/
+unsigned lodepng_decode32(unsigned char** out, unsigned* w, unsigned* h,
+                          const unsigned char* in, size_t insize);
+
+/*Same as lodepng_decode_memory, but always decodes to 24-bit RGB raw image*/
+unsigned lodepng_decode24(unsigned char** out, unsigned* w, unsigned* h,
+                          const unsigned char* in, size_t insize);
+
+#ifdef LODEPNG_COMPILE_DISK
+/*
+Load PNG from disk, from file with given name.
+Same as the other decode functions, but instead takes a filename as input.
+*/
+unsigned lodepng_decode_file(unsigned char** out, unsigned* w, unsigned* h,
+                             const char* filename,
+                             LodePNGColorType colortype, unsigned bitdepth);
+
+/*Same as lodepng_decode_file, but always decodes to 32-bit RGBA raw image.*/
+unsigned lodepng_decode32_file(unsigned char** out, unsigned* w, unsigned* h,
+                               const char* filename);
+
+/*Same as lodepng_decode_file, but always decodes to 24-bit RGB raw image.*/
+unsigned lodepng_decode24_file(unsigned char** out, unsigned* w, unsigned* h,
+                               const char* filename);
+#endif /*LODEPNG_COMPILE_DISK*/
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+
+#ifdef LODEPNG_COMPILE_ENCODER
+/*
+Converts raw pixel data into a PNG image in memory. The colortype and bitdepth
+  of the output PNG image cannot be chosen, they are automatically determined
+  by the colortype, bitdepth and content of the input pixel data.
+  Note: for 16-bit per channel colors, needs big endian format like PNG does.
+out: Output parameter. Pointer to buffer that will contain the PNG image data.
+     Must be freed after usage with free(*out).
+outsize: Output parameter. Pointer to the size in bytes of the out buffer.
+image: The raw pixel data to encode. The size of this buffer should be
+       w * h * (bytes per pixel), bytes per pixel depends on colortype and bitdepth.
+w: width of the raw pixel data in pixels.
+h: height of the raw pixel data in pixels.
+colortype: the color type of the raw input image. See explanation on PNG color types.
+bitdepth: the bit depth of the raw input image. See explanation on PNG color types.
+Return value: LodePNG error code (0 means no error).
+*/
+unsigned lodepng_encode_memory(unsigned char** out, size_t* outsize,
+                               const unsigned char* image, unsigned w, unsigned h,
+                               LodePNGColorType colortype, unsigned bitdepth);
+
+/*Same as lodepng_encode_memory, but always encodes from 32-bit RGBA raw image.*/
+unsigned lodepng_encode32(unsigned char** out, size_t* outsize,
+                          const unsigned char* image, unsigned w, unsigned h);
+
+/*Same as lodepng_encode_memory, but always encodes from 24-bit RGB raw image.*/
+unsigned lodepng_encode24(unsigned char** out, size_t* outsize,
+                          const unsigned char* image, unsigned w, unsigned h);
+
+#ifdef LODEPNG_COMPILE_DISK
+/*
+Converts raw pixel data into a PNG file on disk.
+Same as the other encode functions, but instead takes a filename as output.
+NOTE: This overwrites existing files without warning!
+*/
+unsigned lodepng_encode_file(const char* filename,
+                             const unsigned char* image, unsigned w, unsigned h,
+                             LodePNGColorType colortype, unsigned bitdepth);
+
+/*Same as lodepng_encode_file, but always encodes from 32-bit RGBA raw image.*/
+unsigned lodepng_encode32_file(const char* filename,
+                               const unsigned char* image, unsigned w, unsigned h);
+
+/*Same as lodepng_encode_file, but always encodes from 24-bit RGB raw image.*/
+unsigned lodepng_encode24_file(const char* filename,
+                               const unsigned char* image, unsigned w, unsigned h);
+#endif /*LODEPNG_COMPILE_DISK*/
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+
+#ifdef LODEPNG_COMPILE_CPP
+namespace lodepng
+{
+#ifdef LODEPNG_COMPILE_DECODER
+/*Same as lodepng_decode_memory, but decodes to an std::vector.*/
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                const unsigned char* in, size_t insize,
+                LodePNGColorType colortype = LCT_RGBA, unsigned bitdepth = 8);
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                const std::vector<unsigned char>& in,
+                LodePNGColorType colortype = LCT_RGBA, unsigned bitdepth = 8);
+#ifdef LODEPNG_COMPILE_DISK
+/*
+Converts PNG file from disk to raw pixel data in memory.
+Same as the other decode functions, but instead takes a filename as input.
+*/
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                const std::string& filename,
+                LodePNGColorType colortype = LCT_RGBA, unsigned bitdepth = 8);
+#endif //LODEPNG_COMPILE_DISK
+#endif //LODEPNG_COMPILE_DECODER
+
+#ifdef LODEPNG_COMPILE_ENCODER
+/*Same as lodepng_encode_memory, but encodes to an std::vector.*/
+unsigned encode(std::vector<unsigned char>& out,
+                const unsigned char* in, unsigned w, unsigned h,
+                LodePNGColorType colortype = LCT_RGBA, unsigned bitdepth = 8);
+unsigned encode(std::vector<unsigned char>& out,
+                const std::vector<unsigned char>& in, unsigned w, unsigned h,
+                LodePNGColorType colortype = LCT_RGBA, unsigned bitdepth = 8);
+#ifdef LODEPNG_COMPILE_DISK
+/*
+Converts raw pixel data (32-bit RGBA by default) into a PNG file on disk.
+Same as the other encode functions, but instead takes a filename as output.
+NOTE: This overwrites existing files without warning!
+*/
+unsigned encode(const std::string& filename,
+                const unsigned char* in, unsigned w, unsigned h,
+                LodePNGColorType colortype = LCT_RGBA, unsigned bitdepth = 8);
+unsigned encode(const std::string& filename,
+                const std::vector<unsigned char>& in, unsigned w, unsigned h,
+                LodePNGColorType colortype = LCT_RGBA, unsigned bitdepth = 8);
+#endif //LODEPNG_COMPILE_DISK
+#endif //LODEPNG_COMPILE_ENCODER
+} //namespace lodepng
+#endif /*LODEPNG_COMPILE_CPP*/
+#endif /*LODEPNG_COMPILE_PNG*/
+
+#ifdef LODEPNG_COMPILE_ERROR_TEXT
+/*Returns an English description of the numerical error code.*/
+const char* lodepng_error_text(unsigned code);
+#endif /*LODEPNG_COMPILE_ERROR_TEXT*/
+
+#ifdef LODEPNG_COMPILE_DECODER
+/*Settings for zlib decompression*/
+typedef struct LodePNGDecompressSettings LodePNGDecompressSettings;
+struct LodePNGDecompressSettings
+{
+  unsigned ignore_adler32; /*if 1, continue and don't give an error message if the Adler32 checksum is corrupted*/
+
+  /*use custom zlib decoder instead of built in one (default: null)*/
+  unsigned (*custom_zlib)(unsigned char**, size_t*,
+                          const unsigned char*, size_t,
+                          const LodePNGDecompressSettings*);
+  /*use custom deflate decoder instead of built in one (default: null)
+  if custom_zlib is used, custom_inflate is ignored since only the built in
+  zlib function will call custom_inflate*/
+  unsigned (*custom_inflate)(unsigned char**, size_t*,
+                             const unsigned char*, size_t,
+                             const LodePNGDecompressSettings*);
+
+  const void* custom_context; /*optional custom settings for custom functions*/
+};
+
+extern const LodePNGDecompressSettings lodepng_default_decompress_settings;
+void lodepng_decompress_settings_init(LodePNGDecompressSettings* settings);
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#ifdef LODEPNG_COMPILE_ENCODER
+/*
+Settings for zlib compression. Tweaking these settings tweaks the balance
+between speed and compression ratio.
+*/
+typedef struct LodePNGCompressSettings LodePNGCompressSettings;
+struct LodePNGCompressSettings /*deflate = compress*/
+{
+  /*LZ77 related settings*/
+  unsigned btype; /*the block type for LZ (0, 1 or 2; 3 is reserved, see zlib standard). Should be 2 for proper compression.*/
+  unsigned use_lz77; /*whether or not to use LZ77. Should be 1 for proper compression.*/
+  unsigned windowsize; /*must be a power of two <= 32768. higher compresses more but is slower. Typical value: 2048.*/
+  unsigned minmatch; /*minimum lz77 length. 3 is normally best, 6 can be better for some PNGs. Default: 0*/
+  unsigned nicematch; /*stop searching if >= this length found. Set to 258 for best compression. Default: 128*/
+  unsigned lazymatching; /*use lazy matching: better compression but a bit slower. Default: true*/
+
+  /*use custom zlib encoder instead of built in one (default: null)*/
+  unsigned (*custom_zlib)(unsigned char**, size_t*,
+                          const unsigned char*, size_t,
+                          const LodePNGCompressSettings*);
+  /*use custom deflate encoder instead of built in one (default: null)
+  if custom_zlib is used, custom_deflate is ignored since only the built in
+  zlib function will call custom_deflate*/
+  unsigned (*custom_deflate)(unsigned char**, size_t*,
+                             const unsigned char*, size_t,
+                             const LodePNGCompressSettings*);
+
+  const void* custom_context; /*optional custom settings for custom functions*/
+};
+
+extern const LodePNGCompressSettings lodepng_default_compress_settings;
+void lodepng_compress_settings_init(LodePNGCompressSettings* settings);
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+#ifdef LODEPNG_COMPILE_PNG
+/*
+Color mode of an image. Contains all information required to decode the pixel
+bits to RGBA colors. This information is the same as used in the PNG file
+format, and is used both for PNG and raw image data in LodePNG.
+*/
+typedef struct LodePNGColorMode
+{
+  /*header (IHDR)*/
+  LodePNGColorType colortype; /*color type, see PNG standard or documentation further in this header file*/
+  unsigned bitdepth;  /*bits per sample, see PNG standard or documentation further in this header file*/
+
+  /*
+  palette (PLTE and tRNS)
+
+  Dynamically allocated with the colors of the palette, including alpha.
+  When encoding a PNG, to store your colors in the palette of the LodePNGColorMode, first use
+  lodepng_palette_clear, then for each color use lodepng_palette_add.
+  If you encode an image without alpha with palette, don't forget to put value 255 in each A byte of the palette.
+
+  When decoding, by default you can ignore this palette, since LodePNG already
+  fills the palette colors in the pixels of the raw RGBA output.
+
+  The palette is only supported for color type 3.
+  */
+  unsigned char* palette; /*palette in RGBARGBA... order. Must be either 0 (not allocated) or point to a buffer of 1024 bytes*/
+  size_t palettesize; /*palette size in number of colors (amount of bytes is 4 * palettesize)*/
+
+  /*
+  transparent color key (tRNS)
+
+  This color uses the same bit depth as the bitdepth value in this struct, which can be 1-bit to 16-bit.
+  For greyscale PNGs, r, g and b will all 3 be set to the same.
+
+  When decoding, by default you can ignore this information, since LodePNG sets
+  pixels with this key to transparent already in the raw RGBA output.
+
+  The color key is only supported for color types 0 and 2.
+  */
+  unsigned key_defined; /*is a transparent color key given? 0 = false, 1 = true*/
+  unsigned key_r;       /*red/greyscale component of color key*/
+  unsigned key_g;       /*green component of color key*/
+  unsigned key_b;       /*blue component of color key*/
+} LodePNGColorMode;
+
+/*init, cleanup and copy functions to use with this struct*/
+void lodepng_color_mode_init(LodePNGColorMode* info);
+void lodepng_color_mode_cleanup(LodePNGColorMode* info);
+/*return value is error code (0 means no error)*/
+unsigned lodepng_color_mode_copy(LodePNGColorMode* dest, const LodePNGColorMode* source);
+
+void lodepng_palette_clear(LodePNGColorMode* info);
+/*add 1 color to the palette*/
+unsigned lodepng_palette_add(LodePNGColorMode* info,
+                             unsigned char r, unsigned char g, unsigned char b, unsigned char a);
+
+/*get the total amount of bits per pixel, based on colortype and bitdepth in the struct*/
+unsigned lodepng_get_bpp(const LodePNGColorMode* info);
+/*get the amount of color channels used, based on colortype in the struct.
+If a palette is used, it counts as 1 channel.*/
+unsigned lodepng_get_channels(const LodePNGColorMode* info);
+/*is it a greyscale type? (only colortype 0 or 4)*/
+unsigned lodepng_is_greyscale_type(const LodePNGColorMode* info);
+/*has it got an alpha channel? (only colortype 4 or 6)*/
+unsigned lodepng_is_alpha_type(const LodePNGColorMode* info);
+/*has it got a palette? (only colortype 3)*/
+unsigned lodepng_is_palette_type(const LodePNGColorMode* info);
+/*only returns true if there is a palette and there is a value in the palette with alpha < 255.
+Loops through the palette to check this.*/
+unsigned lodepng_has_palette_alpha(const LodePNGColorMode* info);
+/*
+Check if the given color info indicates the possibility of having non-opaque pixels in the PNG image.
+Returns true if the image can have translucent or invisible pixels (it can still be opaque if it doesn't use such pixels).
+Returns false if the image can only have opaque pixels.
+In detail, it returns true only if it's a color type with alpha, or has a palette with non-opaque values,
+or if "key_defined" is true.
+*/
+unsigned lodepng_can_have_alpha(const LodePNGColorMode* info);
+/*Returns the byte size of a raw image buffer with given width, height and color mode*/
+size_t lodepng_get_raw_size(unsigned w, unsigned h, const LodePNGColorMode* color);
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+/*The information of a Time chunk in PNG.*/
+typedef struct LodePNGTime
+{
+  unsigned year;    /*2 bytes used (0-65535)*/
+  unsigned month;   /*1-12*/
+  unsigned day;     /*1-31*/
+  unsigned hour;    /*0-23*/
+  unsigned minute;  /*0-59*/
+  unsigned second;  /*0-60 (to allow for leap seconds)*/
+} LodePNGTime;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+
+/*Information about the PNG image, except pixels, width and height.*/
+typedef struct LodePNGInfo
+{
+  /*header (IHDR), palette (PLTE) and transparency (tRNS) chunks*/
+  unsigned compression_method;/*compression method of the original file. Always 0.*/
+  unsigned filter_method;     /*filter method of the original file*/
+  unsigned interlace_method;  /*interlace method of the original file*/
+  LodePNGColorMode color;     /*color type and bits, palette and transparency of the PNG file*/
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  /*
+  suggested background color chunk (bKGD)
+  This color uses the same color mode as the PNG (except alpha channel), which can be 1-bit to 16-bit.
+
+  For greyscale PNGs, r, g and b will all 3 be set to the same. When encoding
+  the encoder writes the red one. For palette PNGs: When decoding, the RGB value
+  will be stored, not a palette index. But when encoding, specify the index of
+  the palette in background_r, the other two are then ignored.
+
+  The decoder does not use this background color to edit the color of pixels.
+  */
+  unsigned background_defined; /*is a suggested background color given?*/
+  unsigned background_r;       /*red component of suggested background color*/
+  unsigned background_g;       /*green component of suggested background color*/
+  unsigned background_b;       /*blue component of suggested background color*/
+
+  /*
+  non-international text chunks (tEXt and zTXt)
+
+  The char** arrays each contain num strings. The actual messages are in
+  text_strings, while text_keys are keywords that give a short description what
+  the actual text represents, e.g. Title, Author, Description, or anything else.
+
+  A keyword is minimum 1 character and maximum 79 characters long. It's
+  discouraged to use a single line length longer than 79 characters for texts.
+
+  Don't allocate these text buffers yourself. Use the init/cleanup functions
+  correctly and use lodepng_add_text and lodepng_clear_text.
+  */
+  size_t text_num; /*the amount of texts in these char** buffers (there may be more texts in itext)*/
+  char** text_keys; /*the keyword of a text chunk (e.g. "Comment")*/
+  char** text_strings; /*the actual text*/
+
+  /*
+  international text chunks (iTXt)
+  Similar to the non-international text chunks, but with additional strings
+  "langtags" and "transkeys".
+  */
+  size_t itext_num; /*the amount of international texts in this PNG*/
+  char** itext_keys; /*the English keyword of the text chunk (e.g. "Comment")*/
+  char** itext_langtags; /*language tag for this text's language, ISO/IEC 646 string, e.g. ISO 639 language tag*/
+  char** itext_transkeys; /*keyword translated to the international language - UTF-8 string*/
+  char** itext_strings; /*the actual international text - UTF-8 string*/
+
+  /*time chunk (tIME)*/
+  unsigned time_defined; /*set to 1 to make the encoder generate a tIME chunk*/
+  LodePNGTime time;
+
+  /*phys chunk (pHYs)*/
+  unsigned phys_defined; /*if 0, there is no pHYs chunk and the values below are undefined; if 1, there is one*/
+  unsigned phys_x; /*pixels per unit in x direction*/
+  unsigned phys_y; /*pixels per unit in y direction*/
+  unsigned phys_unit; /*may be 0 (unknown unit) or 1 (metre)*/
+
+  /*
+  unknown chunks
+  There are 3 buffers, one for each position in the PNG where unknown chunks can appear.
+  Each buffer contains all unknown chunks for that position consecutively.
+  The 3 buffers are the unknown chunks between certain critical chunks:
+  0: IHDR-PLTE, 1: PLTE-IDAT, 2: IDAT-IEND
+  Do not allocate or traverse this data yourself. Use the chunk traversing functions declared
+  later, such as lodepng_chunk_next and lodepng_chunk_append, to read/write this struct.
+  */
+  unsigned char* unknown_chunks_data[3];
+  size_t unknown_chunks_size[3]; /*size in bytes of the unknown chunks, given for protection*/
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+} LodePNGInfo;
+
+/*init, cleanup and copy functions to use with this struct*/
+void lodepng_info_init(LodePNGInfo* info);
+void lodepng_info_cleanup(LodePNGInfo* info);
+/*return value is error code (0 means no error)*/
+unsigned lodepng_info_copy(LodePNGInfo* dest, const LodePNGInfo* source);
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+void lodepng_clear_text(LodePNGInfo* info); /*use this to clear the texts again after you filled them in*/
+unsigned lodepng_add_text(LodePNGInfo* info, const char* key, const char* str); /*push back both texts at once*/
+
+void lodepng_clear_itext(LodePNGInfo* info); /*use this to clear the itexts again after you filled them in*/
+unsigned lodepng_add_itext(LodePNGInfo* info, const char* key, const char* langtag,
+                           const char* transkey, const char* str); /*push back the 4 texts of 1 chunk at once*/
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+
+/*
+Converts raw buffer from one color type to another color type, based on
+LodePNGColorMode structs to describe the input and output color type.
+See the reference manual at the end of this header file to see which color conversions are supported.
+return value = LodePNG error code (0 if all went ok, an error if the conversion isn't supported)
+The out buffer must have size (w * h * bpp + 7) / 8, where bpp is the bits per pixel
+of the output color type (lodepng_get_bpp)
+The fix_png value works as described in struct LodePNGDecoderSettings.
+Note: for 16-bit per channel colors, uses big endian format like PNG does.
+*/
+unsigned lodepng_convert(unsigned char* out, const unsigned char* in,
+                         LodePNGColorMode* mode_out, const LodePNGColorMode* mode_in,
+                         unsigned w, unsigned h, unsigned fix_png);
+
+#ifdef LODEPNG_COMPILE_DECODER
+/*
+Settings for the decoder. This contains settings for the PNG and the Zlib
+decoder, but not the Info settings from the Info structs.
+*/
+typedef struct LodePNGDecoderSettings
+{
+  LodePNGDecompressSettings zlibsettings; /*in here is the setting to ignore Adler32 checksums*/
+
+  unsigned ignore_crc; /*ignore CRC checksums*/
+  /*
+  The fix_png setting, if 1, makes the decoder tolerant towards some PNG images
+  that do not correctly follow the PNG specification. This only supports errors
+  that are fixable, were found in images that are actually used on the web, and
+  are silently tolerated by other decoders as well. Currently only one such fix
+  is implemented: if a palette index is out of bounds given the palette size,
+  interpret it as opaque black.
+  By default this value is 0, which makes it stop with an error on such images.
+  */
+  unsigned fix_png;
+  unsigned color_convert; /*whether to convert the PNG to the color type you want. Default: yes*/
+
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  unsigned read_text_chunks; /*if false but remember_unknown_chunks is true, they're stored in the unknown chunks*/
+  /*store all bytes from unknown chunks in the LodePNGInfo (off by default, useful for a png editor)*/
+  unsigned remember_unknown_chunks;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+} LodePNGDecoderSettings;
+
+void lodepng_decoder_settings_init(LodePNGDecoderSettings* settings);
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#ifdef LODEPNG_COMPILE_ENCODER
+/*Strategy the encoder uses to choose the PNG filter type of each scanline. Default: LFS_MINSUM*/
+typedef enum LodePNGFilterStrategy
+{
+  /*every filter at zero*/
+  LFS_ZERO,
+  /*Use filter that gives minimum sum, as described in the official PNG filter heuristic.*/
+  LFS_MINSUM,
+  /*Use the filter type that gives smallest Shannon entropy for this scanline. Depending
+  on the image, this is better or worse than minsum.*/
+  LFS_ENTROPY,
+  /*
+  Brute-force-search PNG filters by compressing each filter for each scanline.
+  Experimental, very slow, and only rarely gives better compression than MINSUM.
+  */
+  LFS_BRUTE_FORCE,
+  /*use predefined_filters buffer: you specify the filter type for each scanline*/
+  LFS_PREDEFINED
+} LodePNGFilterStrategy;
+
+/*automatically use color type with less bits per pixel if losslessly possible. Default: LAC_AUTO*/
+typedef enum LodePNGAutoConvert
+{
+  LAC_NO, /*use color type user requested*/
+  LAC_ALPHA, /*use color type user requested, but if only opaque pixels and RGBA or grey+alpha, use RGB or grey*/
+  LAC_AUTO, /*use PNG color type that can losslessly represent the uncompressed image the smallest possible*/
+  /*
+  like AUTO, but do not choose 1, 2 or 4 bit per pixel types.
+  Sometimes a PNG image compresses worse with fewer than 8 bits per pixel.
+  */
+  LAC_AUTO_NO_NIBBLES,
+  /*
+  like AUTO, but never choose palette color type. For small images, encoding
+  the palette may take more bytes than what is gained. Note that AUTO also
+  already prevents encoding the palette for extremely small images, but that may
+  not be sufficient because due to the compression it cannot predict when to
+  switch.
+  */
+  LAC_AUTO_NO_PALETTE,
+  LAC_AUTO_NO_NIBBLES_NO_PALETTE
+} LodePNGAutoConvert;
+
+
+/*
+Automatically chooses color type that gives smallest amount of bits in the
+output image, e.g. grey if there are only greyscale pixels, palette if there
+are less than 256 colors, ...
+The auto_convert parameter allows limiting it to not use palette, ...
+*/
+unsigned lodepng_auto_choose_color(LodePNGColorMode* mode_out,
+                                   const unsigned char* image, unsigned w, unsigned h,
+                                   const LodePNGColorMode* mode_in,
+                                   LodePNGAutoConvert auto_convert);
+
+/*Settings for the encoder.*/
+typedef struct LodePNGEncoderSettings
+{
+  LodePNGCompressSettings zlibsettings; /*settings for the zlib encoder, such as window size, ...*/
+
+  LodePNGAutoConvert auto_convert; /*how to automatically choose output PNG color type, if at all*/
+
+  /*If true, follows the official PNG heuristic: if the PNG uses a palette or lower than
+  8 bit depth, set all filters to zero. Otherwise use the filter_strategy. Note that to
+  completely follow the official PNG heuristic, filter_palette_zero must be true and
+  filter_strategy must be LFS_MINSUM*/
+  unsigned filter_palette_zero;
+  /*Which filter strategy to use when not using zeroes due to filter_palette_zero.
+  Set filter_palette_zero to 0 to ensure always using your chosen strategy. Default: LFS_MINSUM*/
+  LodePNGFilterStrategy filter_strategy;
+  /*Used if filter_strategy is LFS_PREDEFINED. In that case, this must point to a buffer with
+  one entry per scanline in the image, and each value must be a valid PNG filter type (0 to 4).
+  You have to clean up this buffer; LodePNG will never free it. Don't forget that filter_palette_zero
+  must be set to 0 to ensure this is also used on palette or low bitdepth images.*/
+  const unsigned char* predefined_filters;
+
+  /*force creating a PLTE chunk if colortype is 2 or 6 (= a suggested palette).
+  If colortype is 3, PLTE is _always_ created.*/
+  unsigned force_palette;
+#ifdef LODEPNG_COMPILE_ANCILLARY_CHUNKS
+  /*add LodePNG identifier and version as a text chunk, for debugging*/
+  unsigned add_id;
+  /*encode text chunks as zTXt chunks instead of tEXt chunks, and use compression in iTXt chunks*/
+  unsigned text_compression;
+#endif /*LODEPNG_COMPILE_ANCILLARY_CHUNKS*/
+} LodePNGEncoderSettings;
+
+void lodepng_encoder_settings_init(LodePNGEncoderSettings* settings);
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+
+#if defined(LODEPNG_COMPILE_DECODER) || defined(LODEPNG_COMPILE_ENCODER)
+/*The settings, state and information for extended encoding and decoding.*/
+typedef struct LodePNGState
+{
+#ifdef LODEPNG_COMPILE_DECODER
+  LodePNGDecoderSettings decoder; /*the decoding settings*/
+#endif /*LODEPNG_COMPILE_DECODER*/
+#ifdef LODEPNG_COMPILE_ENCODER
+  LodePNGEncoderSettings encoder; /*the encoding settings*/
+#endif /*LODEPNG_COMPILE_ENCODER*/
+  LodePNGColorMode info_raw; /*specifies the format in which you would like to get the raw pixel buffer*/
+  LodePNGInfo info_png; /*info of the PNG image obtained after decoding*/
+  unsigned error;
+#ifdef LODEPNG_COMPILE_CPP
+  //For the lodepng::State subclass.
+  virtual ~LodePNGState(){}
+#endif
+} LodePNGState;
+
+/*init, cleanup and copy functions to use with this struct*/
+void lodepng_state_init(LodePNGState* state);
+void lodepng_state_cleanup(LodePNGState* state);
+void lodepng_state_copy(LodePNGState* dest, const LodePNGState* source);
+#endif /* defined(LODEPNG_COMPILE_DECODER) || defined(LODEPNG_COMPILE_ENCODER) */
+
+#ifdef LODEPNG_COMPILE_DECODER
+/*
+Same as lodepng_decode_memory, but uses a LodePNGState to allow custom settings and
+getting much more information about the PNG image and color mode.
+*/
+unsigned lodepng_decode(unsigned char** out, unsigned* w, unsigned* h,
+                        LodePNGState* state,
+                        const unsigned char* in, size_t insize);
+
+/*
+Read the PNG header, but not the actual data. This returns only the information
+that is in the header chunk of the PNG, such as width, height and color type. The
+information is placed in the info_png field of the LodePNGState.
+*/
+unsigned lodepng_inspect(unsigned* w, unsigned* h,
+                         LodePNGState* state,
+                         const unsigned char* in, size_t insize);
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+
+#ifdef LODEPNG_COMPILE_ENCODER
+/*This function allocates the out buffer with standard malloc and stores the size in *outsize.*/
+unsigned lodepng_encode(unsigned char** out, size_t* outsize,
+                        const unsigned char* image, unsigned w, unsigned h,
+                        LodePNGState* state);
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+/*
+The lodepng_chunk functions are normally not needed, except to traverse the
+unknown chunks stored in the LodePNGInfo struct, or add new ones to it.
+It also allows traversing the chunks of an encoded PNG file yourself.
+
+PNG standard chunk naming conventions:
+First byte: uppercase = critical, lowercase = ancillary
+Second byte: uppercase = public, lowercase = private
+Third byte: must be uppercase
+Fourth byte: uppercase = unsafe to copy, lowercase = safe to copy
+*/
+
+/*get the length of the data of the chunk. Total chunk length has 12 bytes more.*/
+unsigned lodepng_chunk_length(const unsigned char* chunk);
+
+/*puts the 4-byte type in null terminated string*/
+void lodepng_chunk_type(char type[5], const unsigned char* chunk);
+
+/*check if the type is the given type*/
+unsigned char lodepng_chunk_type_equals(const unsigned char* chunk, const char* type);
+
+/*0: it's one of the critical chunk types, 1: it's an ancillary chunk (see PNG standard)*/
+unsigned char lodepng_chunk_ancillary(const unsigned char* chunk);
+
+/*0: public, 1: private (see PNG standard)*/
+unsigned char lodepng_chunk_private(const unsigned char* chunk);
+
+/*0: the chunk is unsafe to copy, 1: the chunk is safe to copy (see PNG standard)*/
+unsigned char lodepng_chunk_safetocopy(const unsigned char* chunk);
+
+/*get pointer to the data of the chunk, where the input points to the header of the chunk*/
+unsigned char* lodepng_chunk_data(unsigned char* chunk);
+const unsigned char* lodepng_chunk_data_const(const unsigned char* chunk);
+
+/*returns 0 if the crc is correct, 1 if it's incorrect (0 for OK as usual!)*/
+unsigned lodepng_chunk_check_crc(const unsigned char* chunk);
+
+/*generates the correct CRC from the data and puts it in the last 4 bytes of the chunk*/
+void lodepng_chunk_generate_crc(unsigned char* chunk);
+
+/*iterate to next chunks. don't use on IEND chunk, as there is no next chunk then*/
+unsigned char* lodepng_chunk_next(unsigned char* chunk);
+const unsigned char* lodepng_chunk_next_const(const unsigned char* chunk);
+
+/*
+Appends chunk to the data in out. The given chunk should already have its chunk header.
+The out variable and outlength are updated to reflect the new reallocated buffer.
+Returns error code (0 if it went ok)
+*/
+unsigned lodepng_chunk_append(unsigned char** out, size_t* outlength, const unsigned char* chunk);
+
+/*
+Appends new chunk to out. The chunk to append is given by giving its length, type
+and data separately. The type is a 4-letter string.
+The out variable and outlength are updated to reflect the new reallocated buffer.
+Returns error code (0 if it went ok)
+*/
+unsigned lodepng_chunk_create(unsigned char** out, size_t* outlength, unsigned length,
+                              const char* type, const unsigned char* data);
+
+
+/*Calculate CRC32 of buffer*/
+unsigned lodepng_crc32(const unsigned char* buf, size_t len);
+#endif /*LODEPNG_COMPILE_PNG*/
+
+
+#ifdef LODEPNG_COMPILE_ZLIB
+/*
+This zlib part can be used independently to zlib compress and decompress a
+buffer. It cannot be used to create gzip files however, and it only supports the
+part of zlib that is required for PNG, it does not support dictionaries.
+*/
+
+#ifdef LODEPNG_COMPILE_DECODER
+/*Inflate a buffer. Inflate is the decompression step of deflate. Out buffer must be freed after use.*/
+unsigned lodepng_inflate(unsigned char** out, size_t* outsize,
+                         const unsigned char* in, size_t insize,
+                         const LodePNGDecompressSettings* settings);
+
+/*
+Decompresses Zlib data. Reallocates the out buffer and appends the data. The
+data must be according to the zlib specification.
+Either, *out must be NULL and *outsize must be 0, or, *out must be a valid
+buffer and *outsize its size in bytes. out must be freed by user after usage.
+*/
+unsigned lodepng_zlib_decompress(unsigned char** out, size_t* outsize,
+                                 const unsigned char* in, size_t insize,
+                                 const LodePNGDecompressSettings* settings);
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#ifdef LODEPNG_COMPILE_ENCODER
+/*
+Compresses data with Zlib. Reallocates the out buffer and appends the data.
+Zlib adds a small header and trailer around the deflate data.
+The data is output in the format of the zlib specification.
+Either, *out must be NULL and *outsize must be 0, or, *out must be a valid
+buffer and *outsize its size in bytes. out must be freed by user after usage.
+*/
+unsigned lodepng_zlib_compress(unsigned char** out, size_t* outsize,
+                               const unsigned char* in, size_t insize,
+                               const LodePNGCompressSettings* settings);
+
+/*
+Find length-limited Huffman code for given frequencies. This function is in the
+public interface only for tests, it's used internally by lodepng_deflate.
+*/
+unsigned lodepng_huffman_code_lengths(unsigned* lengths, const unsigned* frequencies,
+                                      size_t numcodes, unsigned maxbitlen);
+
+/*Compress a buffer with deflate. See RFC 1951. Out buffer must be freed after use.*/
+unsigned lodepng_deflate(unsigned char** out, size_t* outsize,
+                         const unsigned char* in, size_t insize,
+                         const LodePNGCompressSettings* settings);
+
+#endif /*LODEPNG_COMPILE_ENCODER*/
+#endif /*LODEPNG_COMPILE_ZLIB*/
+
+#ifdef LODEPNG_COMPILE_DISK
+/*
+Load a file from disk into buffer. The function allocates the out buffer, and
+after usage you should free it.
+out: output parameter, contains pointer to loaded buffer.
+outsize: output parameter, size of the allocated out buffer
+filename: the path to the file to load
+return value: error code (0 means ok)
+*/
+unsigned lodepng_load_file(unsigned char** out, size_t* outsize, const char* filename);
+
+/*
+Save a file from buffer to disk. Warning, if it exists, this function overwrites
+the file without warning!
+buffer: the buffer to write
+buffersize: size of the buffer to write
+filename: the path to the file to save to
+return value: error code (0 means ok)
+*/
+unsigned lodepng_save_file(const unsigned char* buffer, size_t buffersize, const char* filename);
+#endif /*LODEPNG_COMPILE_DISK*/
+
+#ifdef LODEPNG_COMPILE_CPP
+//The LodePNG C++ wrapper uses std::vectors instead of manually allocated memory buffers.
+namespace lodepng
+{
+#ifdef LODEPNG_COMPILE_PNG
+class State : public LodePNGState
+{
+  public:
+    State();
+    State(const State& other);
+    virtual ~State();
+    State& operator=(const State& other);
+};
+
+#ifdef LODEPNG_COMPILE_DECODER
+//Same as other lodepng::decode, but using a State for more settings and information.
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                State& state,
+                const unsigned char* in, size_t insize);
+unsigned decode(std::vector<unsigned char>& out, unsigned& w, unsigned& h,
+                State& state,
+                const std::vector<unsigned char>& in);
+#endif /*LODEPNG_COMPILE_DECODER*/
+
+#ifdef LODEPNG_COMPILE_ENCODER
+//Same as other lodepng::encode, but using a State for more settings and information.
+unsigned encode(std::vector<unsigned char>& out,
+                const unsigned char* in, unsigned w, unsigned h,
+                State& state);
+unsigned encode(std::vector<unsigned char>& out,
+                const std::vector<unsigned char>& in, unsigned w, unsigned h,
+                State& state);
+#endif /*LODEPNG_COMPILE_ENCODER*/
+
+#ifdef LODEPNG_COMPILE_DISK
+/*
+Load a file from disk into an std::vector. If the vector is empty, then either
+the file doesn't exist or is an empty file.
+*/
+void load_file(std::vector<unsigned char>& buffer, const std::string& filename);
+
+/*
+Save the binary data in an std::vector to a file on disk. The file is overwritten
+without warning.
+*/
+void save_file(const std::vector<unsigned char>& buffer, const std::string& filename);
+#endif //LODEPNG_COMPILE_DISK
+#endif //LODEPNG_COMPILE_PNG
+
+#ifdef LODEPNG_COMPILE_ZLIB
+#ifdef LODEPNG_COMPILE_DECODER
+//Zlib-decompress an unsigned char buffer
+unsigned decompress(std::vector<unsigned char>& out, const unsigned char* in, size_t insize,
+                    const LodePNGDecompressSettings& settings = lodepng_default_decompress_settings);
+
+//Zlib-decompress an std::vector
+unsigned decompress(std::vector<unsigned char>& out, const std::vector<unsigned char>& in,
+                    const LodePNGDecompressSettings& settings = lodepng_default_decompress_settings);
+#endif //LODEPNG_COMPILE_DECODER
+
+#ifdef LODEPNG_COMPILE_ENCODER
+//Zlib-compress an unsigned char buffer
+unsigned compress(std::vector<unsigned char>& out, const unsigned char* in, size_t insize,
+                  const LodePNGCompressSettings& settings = lodepng_default_compress_settings);
+
+//Zlib-compress an std::vector
+unsigned compress(std::vector<unsigned char>& out, const std::vector<unsigned char>& in,
+                  const LodePNGCompressSettings& settings = lodepng_default_compress_settings);
+#endif //LODEPNG_COMPILE_ENCODER
+#endif //LODEPNG_COMPILE_ZLIB
+} //namespace lodepng
+#endif /*LODEPNG_COMPILE_CPP*/
+
+/*
+TODO:
+[.] test if there are no memory leaks or security exploits - done a lot but needs to be checked often
+[.] check compatibility with various compilers  - done but needs to be redone for every newer version
+[X] converting color to 16-bit per channel types
+[ ] read all public PNG chunk types (but never let the color profile and gamma ones touch RGB values)
+[ ] make sure encoder generates no chunks with size > (2^31)-1
+[ ] partial decoding (stream processing)
+[X] let the "isFullyOpaque" function check color keys and transparent palettes too
+[X] better name for the variables "codes", "codesD", "codelengthcodes", "clcl" and "lldl"
+[ ] don't stop decoding on errors like 69, 57, 58 (make warnings)
+[ ] make option to choose if the raw image with non multiple of 8 bits per scanline should have padding bits or not
+[ ] let the C++ wrapper catch exceptions coming from the standard library and return LodePNG error codes
+*/
+
+#endif /*LODEPNG_H inclusion guard*/
+
+/*
+LodePNG Documentation
+---------------------
+
+0. table of contents
+--------------------
+
+  1. about
+   1.1. supported features
+   1.2. features not supported
+  2. C and C++ version
+  3. security
+  4. decoding
+  5. encoding
+  6. color conversions
+    6.1. PNG color types
+    6.2. color conversions
+    6.3. padding bits
+    6.4. A note about 16-bits per channel and endianness
+  7. error values
+  8. chunks and PNG editing
+  9. compiler support
+  10. examples
+   10.1. decoder C++ example
+   10.2. decoder C example
+  11. changes
+  12. contact information
+
+
+1. about
+--------
+
+PNG is a file format to store raster images losslessly with good compression,
+supporting different color types and alpha channel.
+
+LodePNG is a PNG codec according to the Portable Network Graphics (PNG)
+Specification (Second Edition) - W3C Recommendation 10 November 2003.
+
+The specifications used are:
+
+*) Portable Network Graphics (PNG) Specification (Second Edition):
+     http://www.w3.org/TR/2003/REC-PNG-20031110
+*) RFC 1950 ZLIB Compressed Data Format version 3.3:
+     http://www.gzip.org/zlib/rfc-zlib.html
+*) RFC 1951 DEFLATE Compressed Data Format Specification ver 1.3:
+     http://www.gzip.org/zlib/rfc-deflate.html
+
+The most recent version of LodePNG can currently be found at
+http://lodev.org/lodepng/
+
+LodePNG works both in C (ISO C90) and C++, with a C++ wrapper that adds
+extra functionality.
+
+LodePNG consists of two files:
+-lodepng.h: the header file for both C and C++
+-lodepng.c(pp): give it the name lodepng.c or lodepng.cpp (or .cc) depending on your usage
+
+If you want to start using LodePNG right away without reading this doc, get the
+examples from the LodePNG website to see how to use it in code, or check the
+smaller examples in chapter 10 here.
+
+LodePNG is simple but only supports the basic requirements. To achieve
+simplicity, the following design choices were made: There are no dependencies
+on any external library. There are functions to decode and encode a PNG with
+a single function call, and extended versions of these functions taking a
+LodePNGState struct allowing to specify or get more information. By default
+the colors of the raw image are always RGB or RGBA, no matter what color type
+the PNG file uses. To read and write files, there are simple functions to
+convert the files to/from buffers in memory.
+
+This all makes LodePNG suitable for loading textures in games, demos and small
+programs, ... It's less suitable for full-fledged image editors, loading PNGs
+over network (it requires all the image data to be available before decoding can
+begin), life-critical systems, ...
+
+1.1. supported features
+-----------------------
+
+The following features are supported:
+
+*) decoding of PNGs with any color type, bit depth and interlace mode, to a 24- or 32-bit color raw image,
+   or the same color type as the PNG
+*) encoding of PNGs, from any raw image to 24- or 32-bit color, or the same color type as the raw image
+*) Adam7 interlace and deinterlace for any color type
+*) loading the image from hard disk or decoding it from a buffer from sources other than hard disk
+*) support for alpha channels, including RGBA color model, translucent palettes and color keying
+*) zlib decompression (inflate)
+*) zlib compression (deflate)
+*) CRC32 and ADLER32 checksums
+*) handling of unknown chunks, which makes it possible to write a PNG editor that keeps custom and unknown chunks.
+*) the following chunks are supported (generated/interpreted) by both encoder and decoder:
+    IHDR: header information
+    PLTE: color palette
+    IDAT: pixel data
+    IEND: the final chunk
+    tRNS: transparency for palettized images
+    tEXt: textual information
+    zTXt: compressed textual information
+    iTXt: international textual information
+    bKGD: suggested background color
+    pHYs: physical dimensions
+    tIME: modification time
+
+1.2. features not supported
+---------------------------
+
+The following features are _not_ supported:
+
+*) some features needed to make a conformant PNG-Editor might be still missing.
+*) partial loading/stream processing. All data must be available and is processed in one call.
+*) The following public chunks are not supported but treated as unknown chunks by LodePNG
+    cHRM, gAMA, iCCP, sRGB, sBIT, hIST, sPLT
+   Some of these are not supported on purpose: LodePNG wants to provide the RGB values
+   stored in the pixels, not values modified by system dependent gamma or color models.
+
+
+2. C and C++ version
+--------------------
+
+The C version uses buffers allocated with malloc that you need to free()
+yourself. Whenever you use a struct from the C version, call its init and
+cleanup functions to avoid exploits and memory leaks.
+
+The C++ version has extra functions with std::vectors in the interface and the
+lodepng::State class which is a LodePNGState with constructor and destructor.
+
+These files work without modification for both C and C++ compilers because all
+the additional C++ code is in "#ifdef __cplusplus" blocks that make C-compilers
+ignore it, and the C code is made to compile both with strict ISO C90 and C++.
+
+To use the C++ version, you need to rename the source file to lodepng.cpp
+(instead of lodepng.c), and compile it with a C++ compiler.
+
+To use the C version, you need to rename the source file to lodepng.c (instead
+of lodepng.cpp), and compile it with a C compiler.
+
+
+3. Security
+-----------
+
+Even though it is carefully designed, LodePNG may still contain exploitable
+bugs. If you discover one, please let me know, and it will be fixed.
+
+When using LodePNG, care has to be taken with the C version of LodePNG, as well
+as the C-style structs when working with C++. The following conventions are used
+for all C-style structs:
+
+-if a struct has a corresponding init function, always call the init function when making a new one
+-if a struct has a corresponding cleanup function, call it before the struct disappears to avoid memory leaks
+-if a struct has a corresponding copy function, use the copy function instead of "=".
+ The destination must already be initialized.
+
+
+4. Decoding
+-----------
+
+Decoding converts a PNG compressed image to a raw pixel buffer.
+
+Most documentation on using the decoder is at its declarations in the header
+above. For C, simple decoding can be done with functions such as
+lodepng_decode32, and more advanced decoding can be done with the struct
+LodePNGState and lodepng_decode. For C++, all decoding can be done with the
+various lodepng::decode functions, and lodepng::State can be used for advanced
+features.
+
+When using the LodePNGState, it uses the following fields for decoding:
+*) LodePNGInfo info_png: it stores extra information about the PNG (the input) in here
+*) LodePNGColorMode info_raw: here you can say what color mode of the raw image (the output) you want to get
+*) LodePNGDecoderSettings decoder: you can specify a few extra settings for the decoder to use
+
+LodePNGInfo info_png
+--------------------
+
+After decoding, this contains extra information about the PNG image, except the
+actual pixels, width and height, because these are returned directly by the
+decoder functions.
+
+It contains for example the original color type of the PNG image, text comments,
+suggested background color, etc... More details about the LodePNGInfo struct are
+at its declaration documentation.
+
+LodePNGColorMode info_raw
+-------------------------
+
+When decoding, here you can specify which color type you want
+the resulting raw image to be. If this is different from the colortype of the
+PNG, then the decoder will automatically convert the result. This conversion
+always works, except if you want it to convert a color PNG to greyscale or to
+a palette with missing colors.
+
+By default, 32-bit color is used for the result.
+
+LodePNGDecoderSettings decoder
+------------------------------
+
+The settings can be used to ignore the errors caused by invalid CRC and Adler32
+checksums, and to disable the decoding of tEXt chunks.
+
+There's also a setting color_convert, true by default. If false, no conversion
+is done, the resulting data will be as it was in the PNG (after decompression)
+and you'll have to puzzle the colors of the pixels together yourself using the
+color type information in the LodePNGInfo.
+
+
+5. Encoding
+-----------
+
+Encoding converts a raw pixel buffer to a PNG compressed image.
+
+Most documentation on using the encoder is at its declarations in the header
+above. For C, simple encoding can be done with functions such as
+lodepng_encode32, and more advanced encoding can be done with the struct
+LodePNGState and lodepng_encode. For C++, all encoding can be done with the
+various lodepng::encode functions, and lodepng::State can be used for advanced
+features.
+
+Like the decoder, the encoder can also give errors. However, it gives fewer
+errors, since the encoder input is trusted while the decoder input (a PNG image
+that could be forged by anyone) is not.
+
+When using the LodePNGState, it uses the following fields for encoding:
+*) LodePNGInfo info_png: here you specify how you want the PNG (the output) to be.
+*) LodePNGColorMode info_raw: here you say what color type the raw image (the input) has
+*) LodePNGEncoderSettings encoder: you can specify a few settings for the encoder to use
+
+LodePNGInfo info_png
+--------------------
+
+When encoding, you use this in the opposite way from decoding: you fill in the
+values you want the PNG to have before encoding. By default it's not needed to
+specify a color type for the PNG since it's automatically chosen, but it's
+possible to choose it yourself given the right settings.
+
+The encoder will not always exactly match the LodePNGInfo struct you give, but
+it matches it as closely as possible. Some things are ignored by the encoder.
+The encoder uses, for example, the following settings from it when applicable:
+colortype and bitdepth, text chunks, time chunk, the color key, the palette, the
+background color, the interlace method, unknown chunks, ...
+
+When encoding to a PNG with colortype 3, the encoder will generate a PLTE chunk.
+If the palette contains any colors for which the alpha channel is not 255 (so
+there are translucent colors in the palette), it'll add a tRNS chunk.
+
+LodePNGColorMode info_raw
+-------------------------
+
+You specify the color type of the raw image that you give to the input here,
+including a possible transparent color key and palette you happen to be using in
+your raw image data.
+
+By default, 32-bit color is assumed, meaning your input has to be in RGBA
+format with 4 bytes (unsigned chars) per pixel.
+
+LodePNGEncoderSettings encoder
+------------------------------
+
+The following settings are supported (some are in sub-structs):
+*) auto_convert: when this option is enabled, the encoder will
+automatically choose the smallest possible color mode (including color key) that
+can encode the colors of all pixels without information loss.
+*) btype: the block type for LZ77. 0 = uncompressed, 1 = fixed huffman tree,
+   2 = dynamic huffman tree (best compression). Should be 2 for proper
+   compression.
+*) use_lz77: whether or not to use LZ77 for compressed block types. Should be
+   true for proper compression.
+*) windowsize: the window size used by the LZ77 encoder (1 - 32768). Has value
+   2048 by default, but can be set to 32768 for better, but slow, compression.
+*) force_palette: if colortype is 2 or 6, you can make the encoder write a PLTE
+   chunk if force_palette is true. This can be used as a suggested palette to
+   convert to by viewers that don't support more than 256 colors (if those
+   still exist).
+*) add_id: add text chunk "Encoder: LodePNG <version>" to the image.
+*) text_compression: default 1. If 1, it'll store texts as zTXt instead of tEXt chunks.
+  zTXt chunks use zlib compression on the text. This gives a smaller result on
+  large texts but a larger result on small texts (such as a single program name).
+  It's all tEXt or all zTXt though, there's no separate setting per text yet.
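+
+As a minimal sketch of the settings above (not a definitive recipe), the C++
+snippet below encodes an RGBA buffer with a larger LZ77 window and without the
+"Encoder" text chunk. It assumes the zlib-related settings (btype, use_lz77,
+windowsize) live in the sub-struct encoder.zlibsettings; encodeSlowButSmall is
+a hypothetical helper name.
+
+#include "lodepng.h"
+#include <vector>
+
+//rgba holds w * h * 4 bytes, ordered RGBARGBA...
+unsigned encodeSlowButSmall(std::vector<unsigned char>& png,
+                            const std::vector<unsigned char>& rgba,
+                            unsigned w, unsigned h)
+{
+  lodepng::State state;
+  state.encoder.zlibsettings.btype = 2; //dynamic huffman tree
+  state.encoder.zlibsettings.use_lz77 = 1;
+  state.encoder.zlibsettings.windowsize = 32768; //better but slower than 2048
+  state.encoder.add_id = 0; //don't add the "Encoder: LodePNG <version>" text
+  state.encoder.text_compression = 1; //zTXt instead of tEXt
+  return lodepng::encode(png, rgba, w, h, state); //0 means no error
+}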
+
+
+6. color conversions
+--------------------
+
+An important thing to note about LodePNG, is that the color type of the PNG, and
+the color type of the raw image, are completely independent. By default, when
+you decode a PNG, you get the result as a raw image in the color type you want,
+no matter whether the PNG was encoded with a palette, greyscale or RGBA color.
+And if you encode an image, by default LodePNG will automatically choose the PNG
+color type that gives good compression based on the values of colors and amount
+of colors in the image. It can be configured to let you control it instead as
+well, though.
+
+To be able to do this, LodePNG does conversions from one color mode to another.
+It can convert from almost any color type to any other color type, except the
+following conversions: RGB to greyscale is not supported, and converting to a
+palette when the palette doesn't have a required color is not supported. This is
+not supported on purpose: this is information loss which requires a color
+reduction algorithm that is beyond the scope of a PNG encoder (yes, RGB to grey
+is easy, but there are multiple ways if you want to give some channels more
+weight).
+
+By default, when decoding, you get the raw image in 32-bit RGBA or 24-bit RGB
+color, no matter what color type the PNG has. And by default when encoding,
+LodePNG automatically picks the best color model for the output PNG, and expects
+the input image to be 32-bit RGBA or 24-bit RGB. So, unless you want to control
+the color format of the images yourself, you can skip this chapter.
+
+6.1. PNG color types
+--------------------
+
+A PNG image can have many color types, ranging from 1-bit color to 64-bit color,
+as well as palettized color modes. After the zlib decompression and unfiltering
+in the PNG image is done, the raw pixel data will have that color type and thus
+a certain amount of bits per pixel. If you want the output raw image after
+decoding to have another color type, a conversion is done by LodePNG.
+
+The PNG specification gives the following color types:
+
+0: greyscale, bit depths 1, 2, 4, 8, 16
+2: RGB, bit depths 8 and 16
+3: palette, bit depths 1, 2, 4 and 8
+4: greyscale with alpha, bit depths 8 and 16
+6: RGBA, bit depths 8 and 16
+
+Bit depth is the amount of bits per pixel per color channel. So the total amount
+of bits per pixel is: amount of channels * bitdepth.
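+
+For example, 16-bit RGBA has 4 channels * 16 bits = 64 bits per pixel, and
+2-bit greyscale has 1 channel * 2 bits = 2 bits per pixel. The same rule as a
+small sketch (channelCount and bitsPerPixel are hypothetical helper names, not
+part of LodePNG):
+
+//amount of channels for each PNG color type number listed above
+unsigned channelCount(unsigned colortype)
+{
+  switch(colortype)
+  {
+    case 0: return 1; //greyscale
+    case 2: return 3; //RGB
+    case 3: return 1; //palette: one index per pixel
+    case 4: return 2; //greyscale with alpha
+    case 6: return 4; //RGBA
+  }
+  return 0; //not a valid PNG color type
+}
+
+unsigned bitsPerPixel(unsigned colortype, unsigned bitdepth)
+{
+  return channelCount(colortype) * bitdepth;
+}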
+
+6.2. color conversions
+----------------------
+
+As explained in the sections about the encoder and decoder, you can specify
+color types and bit depths in info_png and info_raw to change the default
+behaviour.
+
+If, when decoding, you want the raw image to be something other than the default,
+you need to set the color type and bit depth you want in the LodePNGColorMode,
+or the parameters of the simple function of LodePNG you're using.
+
+If, when encoding, you use another color type than the default in the input
+image, you need to specify its color type and bit depth in the LodePNGColorMode
+of the raw image, or use the parameters of the simple function of LodePNG you're
+using.
+
+If, when encoding, you don't want LodePNG to choose the output PNG color type
+but control it yourself, you need to set auto_convert in the encoder settings
+to LAC_NO, and specify the color type you want in the LodePNGInfo of the
+encoder.
+
+If you do any of the above, LodePNG may need to do a color conversion, which
+follows the rules below, and may sometimes not be allowed.
+
+To avoid some confusion:
+-the decoder converts from PNG to raw image
+-the encoder converts from raw image to PNG
+-the colortype and bitdepth in LodePNGColorMode info_raw, are those of the raw image
+-the colortype and bitdepth in the color field of LodePNGInfo info_png, are those of the PNG
+-when encoding, the color type in LodePNGInfo is ignored if auto_convert
+ is enabled, it is automatically generated instead
+-when decoding, the color type in LodePNGInfo is set by the decoder to that of the original
+ PNG image, but it can be ignored since the raw image has the color type you requested instead
+-if the color type of the LodePNGColorMode and PNG image aren't the same, a conversion
+ between the color types is done if the color types are supported. If it is not
+ supported, an error is returned. If the types are the same, no conversion is done.
+-even though some conversions aren't supported, LodePNG supports loading PNGs from any
+ colortype and saving PNGs to any colortype, sometimes it just requires preparing
+ the raw image correctly before encoding.
+-both encoder and decoder use the same color converter.
+
+Non supported color conversions:
+-color to greyscale: no error is thrown, but the result will look ugly because
+only the red channel is taken
+-anything, to palette when that palette does not have that color in it: in this
+case an error is thrown
+
+Supported color conversions:
+-anything to 8-bit RGB, 8-bit RGBA, 16-bit RGB, 16-bit RGBA
+-any grey or grey+alpha, to grey or grey+alpha
+-anything to a palette, as long as the palette has the requested colors in it
+-removing alpha channel
+-higher to smaller bitdepth, and vice versa
+
+If you want no color conversion to be done:
+-In the encoder, you can make it save a PNG with any color type by giving the
+raw color mode and LodePNGInfo the same color mode, and setting auto_convert to
+LAC_NO.
+-In the decoder, you can make it store the pixel data in the same color type
+as the PNG has, by setting the color_convert setting to false. Settings in
+info_raw are then ignored.
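+
+A minimal sketch of both cases in C++, assuming the fields and constants named
+above (auto_convert, LAC_NO, color_convert) and 8-bit greyscale input for the
+encoder; encodeGreyAsIs and decodeAsStored are hypothetical helper names:
+
+#include "lodepng.h"
+#include <vector>
+
+//grey holds w * h bytes, one 8-bit grey value per pixel
+unsigned encodeGreyAsIs(std::vector<unsigned char>& png,
+                        const std::vector<unsigned char>& grey,
+                        unsigned w, unsigned h)
+{
+  lodepng::State state;
+  state.info_raw.colortype = LCT_GREY; //color mode of the raw image
+  state.info_raw.bitdepth = 8;
+  state.info_png.color.colortype = LCT_GREY; //same color mode for the PNG
+  state.info_png.color.bitdepth = 8;
+  state.encoder.auto_convert = LAC_NO; //don't let the encoder choose
+  return lodepng::encode(png, grey, w, h, state);
+}
+
+//raw gets the pixels in the color type of the PNG itself, which can be read
+//from state.info_png.color afterwards
+unsigned decodeAsStored(std::vector<unsigned char>& raw, unsigned& w,
+                        unsigned& h, lodepng::State& state,
+                        const std::vector<unsigned char>& png)
+{
+  state.decoder.color_convert = 0; //settings in info_raw are now ignored
+  return lodepng::decode(raw, w, h, state, png);
+}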
+
+The function lodepng_convert does the color conversion. It is available in the
+interface but normally isn't needed since the encoder and decoder already call
+it.
+
+6.3. padding bits
+-----------------
+
+In the PNG file format, if a less than 8-bit per pixel color type is used and the scanlines
+have a bit amount that isn't a multiple of 8, then padding bits are used so that each
+scanline starts at a fresh byte. But that is NOT true for the LodePNG raw input and output.
+The raw input image you give to the encoder, and the raw output image you get from the decoder
+will NOT have these padding bits, e.g. in the case of a 1-bit image with a width
+of 7 pixels, the first pixel of the second scanline will be the 8th bit of the first byte,
+not the first bit of a new byte.
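+
+A sketch of what that means for reading a pixel from such a raw buffer. It
+assumes 1 bit per pixel and that the first pixel of a byte is in its most
+significant bit, as in the PNG format; getPixel1Bit is a hypothetical helper:
+
+#include <stddef.h>
+
+//returns 0 or 1: the value of pixel (x, y) in a 1-bit raw image of width w
+unsigned getPixel1Bit(const unsigned char* raw, unsigned w, unsigned x, unsigned y)
+{
+  size_t index = (size_t)y * w + x; //no padding: scanlines follow each other
+  return (raw[index / 8] >> (7 - index % 8)) & 1;
+}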
+
+6.4. A note about 16-bits per channel and endianness
+----------------------------------------------------
+
+LodePNG uses unsigned char arrays for 16-bit per channel colors too, just like
+for any other color format. The 16-bit values are stored in big endian (most
+significant byte first) in these arrays. This is the opposite order of the
+little endian used by x86 CPUs.
+
+LodePNG always uses big endian because the PNG file format does so internally.
+Conversions to formats other than those PNG uses internally are not supported by
+LodePNG on purpose: there are myriads of formats, including endianness of 16-bit
+colors, the order in which you store R, G, B and A, and so on. Supporting and
+converting to/from all that is outside the scope of LodePNG.
+
+This may mean that, depending on your use case, you may want to convert the big
+endian output of LodePNG to little endian with a for loop. This is certainly not
+always needed, many applications and libraries support big endian 16-bit colors
+anyway, but it means you cannot simply cast the unsigned char* buffer to an
+unsigned short* buffer on x86 CPUs.
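+
+Such a loop could look like the sketch below, which builds each value from its
+two bytes and so works regardless of the endianness of the CPU. toNative16 is
+a hypothetical helper name:
+
+#include <stddef.h>
+
+//in: 2 * count bytes of big endian 16-bit values, out: count native values
+void toNative16(unsigned short* out, const unsigned char* in, size_t count)
+{
+  size_t i;
+  for(i = 0; i < count; i++)
+  {
+    out[i] = (unsigned short)((in[2 * i] << 8) | in[2 * i + 1]);
+  }
+}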
+
+
+7. error values
+---------------
+
+All functions in LodePNG that return an error code, return 0 if everything went
+OK, or a non-zero code if there was an error.
+
+The meaning of the LodePNG error values can be retrieved with the function
+lodepng_error_text: given the numerical error code, it returns a description
+of the error in English as a string.
+
+Check the implementation of lodepng_error_text to see the meaning of each code.
+
+
+8. chunks and PNG editing
+-------------------------
+
+If you want to add extra chunks to a PNG you encode, or use LodePNG for a PNG
+editor that should follow the rules about handling of unknown chunks, or if your
+program is able to read other types of chunks than the ones handled by LodePNG,
+then that's possible with the chunk functions of LodePNG.
+
+A PNG chunk has the following layout:
+
+4 bytes length
+4 bytes type name
+length bytes data
+4 bytes CRC
+
+8.1. iterating through chunks
+-----------------------------
+
+If you have a buffer containing the PNG image data, then the first chunk (the
+IHDR chunk) starts at byte number 8 of that buffer. The first 8 bytes are the
+signature of the PNG and are not part of a chunk. But if you start at byte 8
+then you have a chunk, and can check the following things of it.
+
+NOTE: none of these functions check for memory buffer boundaries. To avoid
+exploits, always make sure the buffer contains all the data of the chunks.
+When using lodepng_chunk_next, make sure the returned value is within the
+allocated memory.
+
+unsigned lodepng_chunk_length(const unsigned char* chunk):
+
+Get the length of the chunk's data. The total chunk length is this length + 12.
+
+void lodepng_chunk_type(char type[5], const unsigned char* chunk):
+unsigned char lodepng_chunk_type_equals(const unsigned char* chunk, const char* type):
+
+Get the type of the chunk, or check whether it is of a certain type.
+
+unsigned char lodepng_chunk_critical(const unsigned char* chunk):
+unsigned char lodepng_chunk_private(const unsigned char* chunk):
+unsigned char lodepng_chunk_safetocopy(const unsigned char* chunk):
+
+Check if the chunk is critical in the PNG standard (only IHDR, PLTE, IDAT and IEND are).
+Check if the chunk is private (public chunks are part of the standard, private ones not).
+Check if the chunk is safe to copy. If it's not, then, when modifying data in a critical
+chunk, unsafe to copy chunks of the old image may NOT be saved in the new one if your
+program doesn't handle that type of unknown chunk.
+
+unsigned char* lodepng_chunk_data(unsigned char* chunk):
+const unsigned char* lodepng_chunk_data_const(const unsigned char* chunk):
+
+Get a pointer to the start of the data of the chunk.
+
+unsigned lodepng_chunk_check_crc(const unsigned char* chunk):
+void lodepng_chunk_generate_crc(unsigned char* chunk):
+
+Check if the crc is correct or generate a correct one.
+
+unsigned char* lodepng_chunk_next(unsigned char* chunk):
+const unsigned char* lodepng_chunk_next_const(const unsigned char* chunk):
+
+Iterate to the next chunk. This works if you have a buffer with consecutive chunks. Note that these
+functions do no boundary checking of the allocated data whatsoever, so make sure there is enough
+data available in the buffer to be able to go to the next chunk.
+
+unsigned lodepng_chunk_append(unsigned char** out, size_t* outlength, const unsigned char* chunk):
+unsigned lodepng_chunk_create(unsigned char** out, size_t* outlength, unsigned length,
+                              const char* type, const unsigned char* data):
+
+These functions are used to create new chunks that are appended to the data in *out that has
+length *outlength. The append function appends an existing chunk to the new data. The create
+function creates a new chunk with the given parameters and appends it. Type is the 4-letter
+name of the chunk.
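+
+As a sketch of iterating with the boundary checks asked for above, the C++
+function below counts the chunks in a PNG held in memory and stops with an error
+as soon as a chunk would run past the end of the buffer. countChunks is a
+hypothetical helper name:
+
+#include "lodepng.h"
+#include <vector>
+
+//returns 0 and sets count if all chunks fit in the buffer, 1 otherwise
+unsigned countChunks(size_t& count, const std::vector<unsigned char>& png)
+{
+  count = 0;
+  if(png.size() < 8) return 1; //too small to even hold the PNG signature
+  size_t pos = 8; //skip the 8-byte PNG signature
+  while(pos < png.size())
+  {
+    if(png.size() - pos < 12) return 1; //no room for length, type and CRC
+    size_t length = lodepng_chunk_length(&png[pos]);
+    if(length > png.size() - pos - 12) return 1; //data runs past the end
+    count++;
+    pos += length + 12; //same position lodepng_chunk_next_const would give
+  }
+  return 0;
+}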
+
+8.2. chunks in info_png
+-----------------------
+
+The LodePNGInfo struct contains fields with the unknown chunk in it. It has 3
+buffers (each with size) to contain 3 types of unknown chunks:
+the ones that come before the PLTE chunk, the ones that come between the PLTE
+and the IDAT chunks, and the ones that come after the IDAT chunks.
+It's necessary to make the distinction between these 3 cases because the PNG
+standard requires keeping the ordering of unknown chunks relative to the critical
+chunks, but does not impose any other ordering rules.
+
+info_png.unknown_chunks_data[0] is the chunks before PLTE
+info_png.unknown_chunks_data[1] is the chunks after PLTE, before IDAT
+info_png.unknown_chunks_data[2] is the chunks after IDAT
+
+The chunks in these 3 buffers can be iterated through and read in the same
+way as described in the previous subchapter.
+
+When using the decoder to decode a PNG, you can make it store all unknown chunks
+if you set the option settings.remember_unknown_chunks to 1. By default, this
+option is off (0).
+
+The encoder will always encode unknown chunks that are stored in the info_png.
+If you need it to add a particular chunk that isn't known by LodePNG, you can
+use lodepng_chunk_append or lodepng_chunk_create to add it to the chunk data in
+info_png.unknown_chunks_data[x].
+
+Chunks that are known by LodePNG should not be added in that way. E.g. to make
+LodePNG add a bKGD chunk, set background_defined to true and add the correct
+parameters there instead.
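+
+A sketch of adding a custom chunk before encoding. It assumes the size that
+belongs to each buffer is stored in info_png.unknown_chunks_size[x]; the chunk
+name "prVt" and the helper addPrivateChunk are made up for illustration:
+
+#include "lodepng.h"
+
+//appends a private, safe-to-copy chunk to the chunks after IDAT; returns 0 if ok
+unsigned addPrivateChunk(lodepng::State& state,
+                         const unsigned char* data, unsigned length)
+{
+  return lodepng_chunk_create(&state.info_png.unknown_chunks_data[2],
+                              &state.info_png.unknown_chunks_size[2],
+                              length, "prVt", data);
+}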
+
+
+9. compiler support
+-------------------
+
+No libraries other than the current standard C library are needed to compile
+LodePNG. For the C++ version, only the standard C++ library is needed on top.
+Add the files lodepng.c(pp) and lodepng.h to your project, include
+lodepng.h where needed, and your program can read/write PNG files.
+
+If performance is important, use optimization when compiling! For both the
+encoder and decoder, this makes a large difference.
+
+Make sure that LodePNG is compiled with the same compiler of the same version
+and with the same settings as the rest of the program, or the interfaces with
+std::vectors and std::strings in C++ can be incompatible.
+
+CHAR_BIT must be 8 or higher, because LodePNG uses unsigned chars for octets.
+
+*) gcc and g++
+
+LodePNG is developed in gcc so this compiler is natively supported. It gives no
+warnings with compiler options "-Wall -Wextra -pedantic -ansi", with gcc and g++
+version 4.7.1 on Linux, 32-bit and 64-bit.
+
+*) Mingw
+
+The Mingw compiler (a port of gcc) for Windows is fully supported by LodePNG.
+
+*) Visual Studio 2005 and up, Visual C++ Express Edition 2005 and up
+
+Visual Studio may give warnings about 'fopen' being deprecated. A multiplatform library
+can't support the proposed Visual Studio alternative however, so LodePNG keeps using
+fopen. If you don't want to see the deprecated warnings, put this on top of lodepng.h
+before the inclusions:
+#define _CRT_SECURE_NO_DEPRECATE
+
+Other than the above warnings, LodePNG should be warning-free with warning
+level 3 (W3). Warning level 4 (W4) will give warnings about integer conversions.
+I'm not planning to resolve these warnings. To get rid of them, let Visual
+Studio use warning level W3 for lodepng.cpp only: right click lodepng.cpp,
+Properties, C/C++, General, Warning Level: Level 3 (/W3).
+
+Visual Studio may want "stdafx.h" files to be included in each source file and
+give an error "unexpected end of file while looking for precompiled header".
+That is not standard C++ and will not be added to the stock LodePNG. You can
+disable it for lodepng.cpp only by right clicking it, Properties, C/C++,
+Precompiled Headers, and set it to Not Using Precompiled Headers there.
+
+*) Visual Studio 6.0
+
+LodePNG support for Visual Studio 6.0 is not guaranteed because VS6 doesn't
+follow the C++ standard correctly.
+
+*) Comeau C/C++
+
+Version 20070107 compiles without problems on the Comeau C/C++ Online Test Drive
+at http://www.comeaucomputing.com/tryitout in both C90 and C++ mode.
+
+*) Compilers on Macintosh
+
+LodePNG has been reported to work with both gcc and LLVM on Macintosh, for both
+C and C++.
+
+*) Other Compilers
+
+If you encounter problems on other compilers, feel free to let me know and I may
+try to fix it if the compiler complies with modern standards.
+
+
+10. examples
+------------
+
+These decoder examples show the most basic usage of LodePNG. More complex
+examples can be found on the LodePNG website.
+
+10.1. decoder C++ example
+-------------------------
+
+#include "lodepng.h"
+#include <iostream>
+
+int main(int argc, char *argv[])
+{
+  const char* filename = argc > 1 ? argv[1] : "test.png";
+
+  //load and decode
+  std::vector<unsigned char> image;
+  unsigned width, height;
+  unsigned error = lodepng::decode(image, width, height, filename);
+
+  //if there's an error, display it
+  if(error) std::cout << "decoder error " << error << ": " << lodepng_error_text(error) << std::endl;
+
+  //the pixels are now in the vector "image", 4 bytes per pixel, ordered RGBARGBA..., use it as texture, draw it, ...
+}
+
+10.2. decoder C example
+-----------------------
+
+#include "lodepng.h"
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(int argc, char *argv[])
+{
+  unsigned error;
+  unsigned char* image;
+  unsigned width, height;
+  const char* filename = argc > 1 ? argv[1] : "test.png";
+
+  error = lodepng_decode32_file(&image, &width, &height, filename);
+
+  if(error) printf("decoder error %u: %s\n", error, lodepng_error_text(error));
+
+  / * use image here * /
+
+  free(image);
+  return 0;
+}
+
+
+11. changes
+-----------
+
+The version number of LodePNG is the date of the change given in the format
+yyyymmdd.
+
+Some changes aren't backwards compatible. Those are indicated with a (!)
+symbol.
+
+*) 22 dec 2013: Power of two windowsize required for optimization.
+*) 15 apr 2013: Fixed bug with LAC_ALPHA and color key.
+*) 25 mar 2013: Added an optional feature to ignore some PNG errors (fix_png).
+*) 11 mar 2013 (!): Bugfix with custom free. Changed from "my" to "lodepng_"
+    prefix for the custom allocators and made it possible with a new #define to
+    use custom ones in your project without needing to change lodepng's code.
+*) 28 jan 2013: Bugfix with color key.
+*) 27 oct 2012: Tweaks in text chunk keyword length error handling.
+*) 8 oct 2012 (!): Added new filter strategy (entropy) and new auto color mode
+    (no palette). Better deflate tree encoding. New compression tweak settings.
+    Faster color conversions while decoding. Some internal cleanups.
+*) 23 sep 2012: Reduced warnings in Visual Studio a little bit.
+*) 1 sep 2012 (!): Removed #define's for giving custom (de)compression functions
+    and made it work with function pointers instead.
+*) 23 jun 2012: Added more filter strategies. Made it easier to use custom alloc
+    and free functions and toggle #defines from compiler flags. Small fixes.
+*) 6 may 2012 (!): Made plugging in custom zlib/deflate functions more flexible.
+*) 22 apr 2012 (!): Made interface more consistent, renaming a lot. Removed
+    redundant C++ codec classes. Reduced amount of structs. Everything changed,
+    but it is cleaner now imho and functionality remains the same. Also fixed
+    several bugs and shrank the implementation code. Made new samples.
+*) 6 nov 2011 (!): By default, the encoder now automatically chooses the best
+    PNG color model and bit depth, based on the amount and type of colors of the
+    raw image. For this, autoLeaveOutAlphaChannel replaced by auto_choose_color.
+*) 9 oct 2011: simpler hash chain implementation for the encoder.
+*) 8 sep 2011: lz77 encoder lazy matching instead of greedy matching.
+*) 23 aug 2011: tweaked the zlib compression parameters after benchmarking.
+    A bug with the PNG filtertype heuristic was fixed, so that it chooses much
+    better ones (it's quite significant). A setting to do an experimental, slow,
+    brute force search for PNG filter types is added.
+*) 17 aug 2011 (!): changed some C zlib related function names.
+*) 16 aug 2011: made the code less wide (max 120 characters per line).
+*) 17 apr 2011: code cleanup. Bugfixes. Convert low to 16-bit per sample colors.
+*) 21 feb 2011: fixed compiling for C90. Fixed compiling with sections disabled.
+*) 11 dec 2010: encoding is made faster, based on suggestion by Peter Eastman
+    to optimize long sequences of zeros.
+*) 13 nov 2010: added LodePNG_InfoColor_hasPaletteAlpha and
+    LodePNG_InfoColor_canHaveAlpha functions for convenience.
+*) 7 nov 2010: added LodePNG_error_text function to get error code description.
+*) 30 oct 2010: made decoding slightly faster
+*) 26 oct 2010: (!) changed some C function and struct names (more consistent).
+     Reorganized the documentation and the declaration order in the header.
+*) 08 aug 2010: only changed some comments and external samples.
+*) 05 jul 2010: fixed bug thanks to warnings in the new gcc version.
+*) 14 mar 2010: fixed bug where too much memory was allocated for char buffers.
+*) 02 sep 2008: fixed bug where it could create empty tree that linux apps could
+    read by ignoring the problem but windows apps couldn't.
+*) 06 jun 2008: added more error checks for out of memory cases.
+*) 26 apr 2008: added a few more checks here and there to ensure more safety.
+*) 06 mar 2008: crash with encoding of strings fixed
+*) 02 feb 2008: support for international text chunks added (iTXt)
+*) 23 jan 2008: small cleanups, and #defines to divide code in sections
+*) 20 jan 2008: support for unknown chunks allowing using LodePNG for an editor.
+*) 18 jan 2008: support for tIME and pHYs chunks added to encoder and decoder.
+*) 17 jan 2008: ability to encode and decode compressed zTXt chunks added
+    Also various fixes, such as in the deflate and the padding bits code.
+*) 13 jan 2008: Added ability to encode Adam7-interlaced images. Improved
+    filtering code of encoder.
+*) 07 jan 2008: (!) changed LodePNG to use ISO C90 instead of C++. A
+    C++ wrapper around this provides an interface almost identical to before.
+    Having LodePNG be pure ISO C90 makes it more portable. The C and C++ code
+    are together in these files but it works both for C and C++ compilers.
+*) 29 dec 2007: (!) changed most integer types to unsigned int + other tweaks
+*) 30 aug 2007: bug fixed which makes this Borland C++ compatible
+*) 09 aug 2007: some VS2005 warnings removed again
+*) 21 jul 2007: deflate code placed in new namespace separate from zlib code
+*) 08 jun 2007: fixed bug with 2- and 4-bit color, and small interlaced images
+*) 04 jun 2007: improved support for Visual Studio 2005: crash with accessing
+    invalid std::vector element [0] fixed, and level 3 and 4 warnings removed
+*) 02 jun 2007: made the encoder add a tag with version by default
+*) 27 may 2007: zlib and png code separated (but still in the same file),
+    simple encoder/decoder functions added for more simple usage cases
+*) 19 may 2007: minor fixes, some code cleaning, new error added (error 69),
+    moved some examples from here to lodepng_examples.cpp
+*) 12 may 2007: palette decoding bug fixed
+*) 24 apr 2007: changed the license from BSD to the zlib license
+*) 11 mar 2007: very simple addition: ability to encode bKGD chunks.
+*) 04 mar 2007: (!) tEXt chunk related fixes, and support for encoding
+    palettized PNG images. Plus little interface change with palette and texts.
+*) 03 mar 2007: Made it encode dynamic Huffman shorter with repeat codes.
+    Fixed a bug where the end code of a block had length 0 in the Huffman tree.
+*) 26 feb 2007: Huffman compression with dynamic trees (BTYPE 2) now implemented
+    and supported by the encoder, resulting in smaller PNGs at the output.
+*) 27 jan 2007: Made the Adler-32 test faster so that a timewaste is gone.
+*) 24 jan 2007: gave encoder an error interface. Added color conversion from any
+    greyscale type to 8-bit greyscale with or without alpha.
+*) 21 jan 2007: (!) Totally changed the interface. It allows more color types
+    to convert to and is more uniform. See the manual for how it works now.
+*) 07 jan 2007: Some cleanup & fixes, and a few changes over the last days:
+    encode/decode custom tEXt chunks, separate classes for zlib & deflate, and
+    at last made the decoder give errors for incorrect Adler32 or Crc.
+*) 01 jan 2007: Fixed bug with encoding PNGs with less than 8 bits per channel.
+*) 29 dec 2006: Added support for encoding images without alpha channel, and
+    cleaned out code as well as making certain parts faster.
+*) 28 dec 2006: Added "Settings" to the encoder.
+*) 26 dec 2006: The encoder now does LZ77 encoding and produces much smaller files now.
+    Removed some code duplication in the decoder. Fixed little bug in an example.
+*) 09 dec 2006: (!) Placed output parameters of public functions as first parameter.
+    Fixed a bug of the decoder with 16-bit per color.
+*) 15 oct 2006: Changed documentation structure
+*) 09 oct 2006: Encoder class added. It encodes a valid PNG image from the
+    given image buffer, however for now it's not compressed.
+*) 08 sep 2006: (!) Changed to interface with a Decoder class
+*) 30 jul 2006: (!) LodePNG_InfoPng , width and height are now retrieved in different
+    way. Renamed decodePNG to decodePNGGeneric.
+*) 29 jul 2006: (!) Changed the interface: image info is now returned as a
+    struct of type LodePNG::LodePNG_Info, instead of a vector, which was a bit clumsy.
+*) 28 jul 2006: Cleaned the code and added new error checks.
+    Corrected terminology "deflate" into "inflate".
+*) 23 jun 2006: Added SDL example in the documentation in the header, this
+    example allows easy debugging by displaying the PNG and its transparency.
+*) 22 jun 2006: (!) Changed way to obtain error value. Added
+    loadFile function for convenience. Made decodePNG32 faster.
+*) 21 jun 2006: (!) Changed type of info vector to unsigned.
+    Changed position of palette in info vector. Fixed an important bug that
+    happened on PNGs with an uncompressed block.
+*) 16 jun 2006: Internally changed unsigned into unsigned where
+    needed, and performed some optimizations.
+*) 07 jun 2006: (!) Renamed functions to decodePNG and placed them
+    in LodePNG namespace. Changed the order of the parameters. Rewrote the
+    documentation in the header. Renamed files to lodepng.cpp and lodepng.h
+*) 22 apr 2006: Optimized and improved some code
+*) 07 sep 2005: (!) Changed to std::vector interface
+*) 12 aug 2005: Initial release (C++, decoder only)
+
+
+12. contact information
+-----------------------
+
+Feel free to contact me with suggestions, problems, comments, ... concerning
+LodePNG. If you encounter a PNG image that doesn't work properly with this
+decoder, feel free to send it and I'll use it to find and fix the problem.
+
+My email address is (puzzle the account and domain together with an @ symbol):
+Domain: gmail dot com.
+Account: lode dot vandevenne.
+
+
+Copyright (c) 2005-2013 Lode Vandevenne
+*/
diff --git a/src/zopflipng/lodepng/lodepng_util.cpp b/src/zopflipng/lodepng/lodepng_util.cpp
new file mode 100644
index 0000000..a429b69
--- /dev/null
+++ b/src/zopflipng/lodepng/lodepng_util.cpp
@@ -0,0 +1,656 @@
+/*
+LodePNG Utils
+
+Copyright (c) 2005-2012 Lode Vandevenne
+
+This software is provided 'as-is', without any express or implied
+warranty. In no event will the authors be held liable for any damages
+arising from the use of this software.
+
+Permission is granted to anyone to use this software for any purpose,
+including commercial applications, and to alter it and redistribute it
+freely, subject to the following restrictions:
+
+    1. The origin of this software must not be misrepresented; you must not
+    claim that you wrote the original software. If you use this software
+    in a product, an acknowledgment in the product documentation would be
+    appreciated but is not required.
+
+    2. Altered source versions must be plainly marked as such, and must not be
+    misrepresented as being the original software.
+
+    3. This notice may not be removed or altered from any source
+    distribution.
+*/
+
+#include "lodepng_util.h"
+#include <iostream>
+
+namespace lodepng
+{
+
+LodePNGInfo getPNGHeaderInfo(const std::vector<unsigned char>& png)
+{
+  unsigned w, h;
+  lodepng::State state;
+  lodepng_inspect(&w, &h, &state, &png[0], png.size());
+  return state.info_png;
+}
+
+unsigned getChunkInfo(std::vector<std::string>& names, std::vector<size_t>& sizes,
+                      const std::vector<unsigned char>& png)
+{
+  // Listing chunks is based on the original file, not the decoded png info.
+  const unsigned char *chunk, *begin, *end;
+  end = &png.back() + 1;
+  begin = chunk = &png.front() + 8;
+
+  while(chunk + 8 < end && chunk >= begin)
+  {
+    char type[5];
+    lodepng_chunk_type(type, chunk);
+    if(std::string(type).size() != 4) return 1;
+
+    names.push_back(type);
+    sizes.push_back(lodepng_chunk_length(chunk));
+
+    chunk = lodepng_chunk_next_const(chunk);
+  }
+  return 0;
+}
+
+unsigned getChunks(std::vector<std::string> names[3],
+                   std::vector<std::vector<unsigned char> > chunks[3],
+                   const std::vector<unsigned char>& png)
+{
+  const unsigned char *chunk, *next, *begin, *end;
+  end = &png.back() + 1;
+  begin = chunk = &png.front() + 8;
+
+  int location = 0;
+
+  while(chunk + 8 < end && chunk >= begin)
+  {
+    char type[5];
+    lodepng_chunk_type(type, chunk);
+    std::string name(type);
+    if(name.size() != 4) return 1;
+
+    next = lodepng_chunk_next_const(chunk);
+    if(next > end) return 1; //chunk length runs past the end of the buffer
+
+    if(name == "IHDR")
+    {
+      location = 0;
+    }
+    else if(name == "PLTE")
+    {
+      location = 1;
+    }
+    else if(name == "IDAT")
+    {
+      location = 2;
+    }
+    else if(name != "IEND")
+    {
+      names[location].push_back(name);
+      chunks[location].push_back(std::vector<unsigned char>(chunk, next));
+    }
+
+    chunk = next;
+  }
+  return 0;
+}
+
+
+unsigned insertChunks(std::vector<unsigned char>& png,
+                      const std::vector<std::vector<unsigned char> > chunks[3])
+{
+  const unsigned char *chunk, *next, *begin, *end;
+  end = &png.back() + 1;
+  begin = chunk = &png.front() + 8;
+
+  size_t l0 = 0; //location 0: IHDR-l0-PLTE (or IHDR-l0-l1-IDAT)
+  size_t l1 = 0; //location 1: PLTE-l1-IDAT (or IHDR-l0-l1-IDAT)
+  size_t l2 = 0; //location 2: IDAT-l2-IEND
+
+  while(chunk + 8 < end && chunk >= begin)
+  {
+    char type[5];
+    lodepng_chunk_type(type, chunk);
+    std::string name(type);
+    if(name.size() != 4) return 1;
+
+    next = lodepng_chunk_next_const(chunk);
+
+    if(name == "PLTE")
+    {
+      if(l0 == 0) l0 = chunk - begin + 8;
+    }
+    else if(name == "IDAT")
+    {
+      if(l0 == 0) l0 = chunk - begin + 8;
+      if(l1 == 0) l1 = chunk - begin + 8;
+    }
+    else if(name == "IEND")
+    {
+      if(l2 == 0) l2 = chunk - begin + 8;
+    }
+
+    chunk = next;
+  }
+
+  std::vector<unsigned char> result;
+  result.insert(result.end(), png.begin(), png.begin() + l0);
+  for(size_t i = 0; i < chunks[0].size(); i++) result.insert(result.end(), chunks[0][i].begin(), chunks[0][i].end());
+  result.insert(result.end(), png.begin() + l0, png.begin() + l1);
+  for(size_t i = 0; i < chunks[1].size(); i++) result.insert(result.end(), chunks[1][i].begin(), chunks[1][i].end());
+  result.insert(result.end(), png.begin() + l1, png.begin() + l2);
+  for(size_t i = 0; i < chunks[2].size(); i++) result.insert(result.end(), chunks[2][i].begin(), chunks[2][i].end());
+  result.insert(result.end(), png.begin() + l2, png.end());
+
+  png = result;
+  return 0;
+}
+
+unsigned getFilterTypesInterlaced(std::vector<std::vector<unsigned char> >& filterTypes,
+                                  const std::vector<unsigned char>& png)
+{
+  //Get color type and interlace type
+  lodepng::State state;
+  unsigned w, h;
+  unsigned error;
+  error = lodepng_inspect(&w, &h, &state, &png[0], png.size());
+
+  if(error) return 1;
+
+  //Read literal data from all IDAT chunks
+  const unsigned char *chunk, *begin, *end;
+  end = &png.back() + 1;
+  begin = chunk = &png.front() + 8;
+
+  std::vector<unsigned char> zdata;
+
+  while(chunk + 8 < end && chunk >= begin)
+  {
+    char type[5];
+    lodepng_chunk_type(type, chunk);
+    if(std::string(type).size() != 4) return 1; //Probably not a PNG file
+
+    if(std::string(type) == "IDAT")
+    {
+      const unsigned char* cdata = lodepng_chunk_data_const(chunk);
+      unsigned clength = lodepng_chunk_length(chunk);
+
+      for(unsigned i = 0; i < clength; i++)
+      {
+        zdata.push_back(cdata[i]);
+      }
+    }
+
+    chunk = lodepng_chunk_next_const(chunk);
+  }
+
+  //Decompress all IDAT data
+  std::vector<unsigned char> data;
+  error = lodepng::decompress(data, &zdata[0], zdata.size());
+
+  if(error) return 1;
+
+  if(state.info_png.interlace_method == 0)
+  {
+    filterTypes.resize(1);
+
+    //A line is 1 filter byte + all pixels
+    size_t linebytes = 1 + lodepng_get_raw_size(w, 1, &state.info_png.color);
+
+    for(size_t i = 0; i < data.size(); i += linebytes)
+    {
+      filterTypes[0].push_back(data[i]);
+    }
+  }
+  else
+  {
+    //Interlaced
+    filterTypes.resize(7);
+    static const unsigned ADAM7_IX[7] = { 0, 4, 0, 2, 0, 1, 0 }; /*x start values*/
+    static const unsigned ADAM7_IY[7] = { 0, 0, 4, 0, 2, 0, 1 }; /*y start values*/
+    static const unsigned ADAM7_DX[7] = { 8, 8, 4, 4, 2, 2, 1 }; /*x delta values*/
+    static const unsigned ADAM7_DY[7] = { 8, 8, 8, 4, 4, 2, 2 }; /*y delta values*/
+    size_t pos = 0;
+    for(int j = 0; j < 7; j++)
+    {
+      unsigned w2 = (w - ADAM7_IX[j] + ADAM7_DX[j] - 1) / ADAM7_DX[j];
+      unsigned h2 = (h - ADAM7_IY[j] + ADAM7_DY[j] - 1) / ADAM7_DY[j];
+      if(ADAM7_IX[j] >= w || ADAM7_IY[j] >= h) w2 = h2 = 0;
+      size_t linebytes = 1 + lodepng_get_raw_size(w2, 1, &state.info_png.color);
+      for(size_t i = 0; i < h2; i++)
+      {
+        filterTypes[j].push_back(data[pos]);
+        pos += linebytes;
+      }
+    }
+  }
+  return 0; /* OK */
+}
+
+
+unsigned getFilterTypes(std::vector<unsigned char>& filterTypes, const std::vector<unsigned char>& png)
+{
+  std::vector<std::vector<unsigned char> > passes;
+  unsigned error = getFilterTypesInterlaced(passes, png);
+  if(error) return error;
+
+  if(passes.size() == 1)
+  {
+    filterTypes.swap(passes[0]);
+  }
+  else
+  {
+    lodepng::State state;
+    unsigned w, h;
+    lodepng_inspect(&w, &h, &state, &png[0], png.size());
+    /*
+    Interlaced. Simplify it: alternate passes 6 and 7 in a single vector so
+    that one filter per scanline of the uninterlaced image is given, with each
+    filter corresponding as closely as possible to what it would be for a
+    non-interlaced image.
+    */
+    for(size_t i = 0; i < h; i++)
+    {
+      filterTypes.push_back(i % 2 == 0 ? passes[5][i / 2] : passes[6][i / 2]);
+    }
+  }
+  return 0; /* OK */
+}
+
+int getPaletteValue(const unsigned char* data, size_t i, int bits)
+{
+  if(bits == 8) return data[i];
+  else if(bits == 4) return (data[i / 2] >> ((i % 2) * 4)) & 15;
+  else if(bits == 2) return (data[i / 4] >> ((i % 4) * 2)) & 3;
+  else if(bits == 1) return (data[i / 8] >> (i % 8)) & 1;
+  else return 0;
+}
+
+//This uses a stripped down version of picoPNG to extract detailed zlib information while decompressing.
+static const unsigned long LENBASE[29] =
+    {3,4,5,6,7,8,9,10,11,13,15,17,19,23,27,31,35,43,51,59,67,83,99,115,131,163,195,227,258};
+static const unsigned long LENEXTRA[29] =
+    {0,0,0,0,0,0,0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4,  4,  5,  5,  5,  5,  0};
+static const unsigned long DISTBASE[30] =
+    {1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577};
+static const unsigned long DISTEXTRA[30] =
+    {0,0,0,0,1,1,2, 2, 3, 3, 4, 4, 5, 5,  6,  6,  7,  7,  8,  8,   9,   9,  10,  10,  11,  11,  12,   12,   13,   13};
+static const unsigned long CLCL[19] =
+    {16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15}; //code length code lengths
+
+struct ExtractZlib // Zlib decompression and information extraction
+{
+  std::vector<ZlibBlockInfo>* zlibinfo;
+  int error;
+
+  ExtractZlib(std::vector<ZlibBlockInfo>* output) : zlibinfo(output), error(0) {}
+
+  unsigned long readBitFromStream(size_t& bitp, const unsigned char* bits)
+  {
+    unsigned long result = (bits[bitp >> 3] >> (bitp & 0x7)) & 1;
+    bitp++;
+    return result;
+  }
+
+  unsigned long readBitsFromStream(size_t& bitp, const unsigned char* bits, size_t nbits)
+  {
+    unsigned long result = 0;
+    for(size_t i = 0; i < nbits; i++) result += (readBitFromStream(bitp, bits)) << i;
+    return result;
+  }
+
+  struct HuffmanTree
+  {
+    int makeFromLengths(const std::vector<unsigned long>& bitlen, unsigned long maxbitlen)
+    { //make tree given the lengths
+      unsigned long numcodes = (unsigned long)(bitlen.size()), treepos = 0, nodefilled = 0;
+      std::vector<unsigned long> tree1d(numcodes), blcount(maxbitlen + 1, 0), nextcode(maxbitlen + 1, 0);
+      //count number of instances of each code length
+      for(unsigned long bits = 0; bits < numcodes; bits++) blcount[bitlen[bits]]++;
+      for(unsigned long bits = 1; bits <= maxbitlen; bits++)
+      {
+        nextcode[bits] = (nextcode[bits - 1] + blcount[bits - 1]) << 1;
+      }
+      //generate all the codes
+      for(unsigned long n = 0; n < numcodes; n++) if(bitlen[n] != 0) tree1d[n] = nextcode[bitlen[n]]++;
+      tree2d.clear(); tree2d.resize(numcodes * 2, 32767); //32767 here means the tree2d isn't filled there yet
+      for(unsigned long n = 0; n < numcodes; n++) //the codes
+      for(unsigned long i = 0; i < bitlen[n]; i++) //the bits for this code
+      {
+        unsigned long bit = (tree1d[n] >> (bitlen[n] - i - 1)) & 1;
+        if(treepos > numcodes - 2) return 55;
+        if(tree2d[2 * treepos + bit] == 32767) //not yet filled in
+        {
+          if(i + 1 == bitlen[n])
+          {
+            //last bit
+            tree2d[2 * treepos + bit] = n;
+            treepos = 0;
+          }
+          else
+          {
+            //addresses are encoded as values > numcodes
+            tree2d[2 * treepos + bit] = ++nodefilled + numcodes;
+            treepos = nodefilled;
+          }
+        }
+        else treepos = tree2d[2 * treepos + bit] - numcodes; //subtract numcodes from address to get address value
+      }
+      return 0;
+    }
+    int decode(bool& decoded, unsigned long& result, size_t& treepos, unsigned long bit) const
+    { //Decodes a symbol from the tree
+      unsigned long numcodes = (unsigned long)tree2d.size() / 2;
+      if(treepos >= numcodes) return 11; //error: position is outside the code tree
+      result = tree2d[2 * treepos + bit];
+      decoded = (result < numcodes);
+      treepos = decoded ? 0 : result - numcodes;
+      return 0;
+    }
+    //2D representation of a huffman tree: one dimension is "0" or "1", the other contains all nodes and leaves.
+    std::vector<unsigned long> tree2d;
+  };
+
+  void inflate(std::vector<unsigned char>& out, const std::vector<unsigned char>& in, size_t inpos = 0)
+  {
+    size_t bp = 0, pos = 0; //bit pointer and byte pointer
+    error = 0;
+    unsigned long BFINAL = 0;
+    while(!BFINAL && !error)
+    {
+      size_t uncomprblockstart = pos;
+      size_t bpstart = bp;
+      if(inpos + (bp >> 3) >= in.size()) { error = 52; return; } //error, bit pointer will jump past memory
+      BFINAL = readBitFromStream(bp, &in[inpos]);
+      unsigned long BTYPE = readBitFromStream(bp, &in[inpos]); BTYPE += 2 * readBitFromStream(bp, &in[inpos]);
+      zlibinfo->resize(zlibinfo->size() + 1);
+      zlibinfo->back().btype = BTYPE;
+      if(BTYPE == 3) { error = 20; return; } //error: invalid BTYPE
+      else if(BTYPE == 0) inflateNoCompression(out, &in[inpos], bp, pos, in.size());
+      else inflateHuffmanBlock(out, &in[inpos], bp, pos, in.size(), BTYPE);
+      size_t uncomprblocksize = pos - uncomprblockstart;
+      zlibinfo->back().compressedbits = bp - bpstart;
+      zlibinfo->back().uncompressedbytes = uncomprblocksize;
+    }
+  }
+
+  void generateFixedTrees(HuffmanTree& tree, HuffmanTree& treeD) //get the tree of a deflated block with fixed tree
+  {
+    std::vector<unsigned long> bitlen(288, 8), bitlenD(32, 5);
+    for(size_t i = 144; i <= 255; i++) bitlen[i] = 9;
+    for(size_t i = 256; i <= 279; i++) bitlen[i] = 7;
+    tree.makeFromLengths(bitlen, 15);
+    treeD.makeFromLengths(bitlenD, 15);
+  }
+
+  //the code tree for Huffman codes, dist codes, and code length codes
+  HuffmanTree codetree, codetreeD, codelengthcodetree;
+  unsigned long huffmanDecodeSymbol(const unsigned char* in, size_t& bp, const HuffmanTree& codetree, size_t inlength)
+  {
+    //decode a single symbol from given list of bits with given code tree. return value is the symbol
+    bool decoded; unsigned long ct;
+    for(size_t treepos = 0;;)
+    {
+      if((bp & 0x07) == 0 && (bp >> 3) > inlength) { error = 10; return 0; } //error: end reached without endcode
+      error = codetree.decode(decoded, ct, treepos, readBitFromStream(bp, in));
+      if(error) return 0; //stop, an error happened
+      if(decoded) return ct;
+    }
+  }
+
+  void getTreeInflateDynamic(HuffmanTree& tree, HuffmanTree& treeD,
+                             const unsigned char* in, size_t& bp, size_t inlength)
+  {
+    size_t bpstart = bp;
+    //get the tree of a deflated block with dynamic tree, the tree itself is also Huffman compressed with a known tree
+    std::vector<unsigned long> bitlen(288, 0), bitlenD(32, 0);
+    if(bp >> 3 >= inlength - 2) { error = 49; return; } //the bit pointer is or will go past the memory
+    size_t HLIT =  readBitsFromStream(bp, in, 5) + 257; //number of literal/length codes + 257
+    size_t HDIST = readBitsFromStream(bp, in, 5) + 1; //number of dist codes + 1
+    size_t HCLEN = readBitsFromStream(bp, in, 4) + 4; //number of code length codes + 4
+    zlibinfo->back().hlit = HLIT - 257;
+    zlibinfo->back().hdist = HDIST - 1;
+    zlibinfo->back().hclen = HCLEN - 4;
+    std::vector<unsigned long> codelengthcode(19); //lengths of tree to decode the lengths of the dynamic tree
+    for(size_t i = 0; i < 19; i++) codelengthcode[CLCL[i]] = (i < HCLEN) ? readBitsFromStream(bp, in, 3) : 0;
+    //code length code lengths
+    for(size_t i = 0; i < codelengthcode.size(); i++) zlibinfo->back().clcl.push_back(codelengthcode[i]);
+    error = codelengthcodetree.makeFromLengths(codelengthcode, 7); if(error) return;
+    size_t i = 0, replength;
+    while(i < HLIT + HDIST)
+    {
+      unsigned long code = huffmanDecodeSymbol(in, bp, codelengthcodetree, inlength); if(error) return;
+      zlibinfo->back().treecodes.push_back(code); //tree symbol code
+      if(code <= 15)  { if(i < HLIT) bitlen[i++] = code; else bitlenD[i++ - HLIT] = code; } //a length code
+      else if(code == 16) //repeat previous
+      {
+        if(bp >> 3 >= inlength) { error = 50; return; } //error, bit pointer jumps past memory
+        replength = 3 + readBitsFromStream(bp, in, 2);
+        unsigned long value; //set value to the previous code
+        if((i - 1) < HLIT) value = bitlen[i - 1];
+        else value = bitlenD[i - HLIT - 1];
+        for(size_t n = 0; n < replength; n++) //repeat this value in the next lengths
+        {
+          if(i >= HLIT + HDIST) { error = 13; return; } //error: i is larger than the amount of codes
+          if(i < HLIT) bitlen[i++] = value; else bitlenD[i++ - HLIT] = value;
+        }
+      }
+      else if(code == 17) //repeat "0" 3-10 times
+      {
+        if(bp >> 3 >= inlength) { error = 50; return; } //error, bit pointer jumps past memory
+        replength = 3 + readBitsFromStream(bp, in, 3);
+        zlibinfo->back().treecodes.push_back(replength); //tree symbol code repetitions
+        for(size_t n = 0; n < replength; n++) //repeat this value in the next lengths
+        {
+          if(i >= HLIT + HDIST) { error = 14; return; } //error: i is larger than the amount of codes
+          if(i < HLIT) bitlen[i++] = 0; else bitlenD[i++ - HLIT] = 0;
+        }
+      }
+      else if(code == 18) //repeat "0" 11-138 times
+      {
+        if(bp >> 3 >= inlength) { error = 50; return; } //error, bit pointer jumps past memory
+        replength = 11 + readBitsFromStream(bp, in, 7);
+        zlibinfo->back().treecodes.push_back(replength); //tree symbol code repetitions
+        for(size_t n = 0; n < replength; n++) //repeat this value in the next lengths
+        {
+          if(i >= HLIT + HDIST) { error = 15; return; } //error: i is larger than the amount of codes
+          if(i < HLIT) bitlen[i++] = 0; else bitlenD[i++ - HLIT] = 0;
+        }
+      }
+      else { error = 16; return; } //error: somehow a nonexistent code appeared. This can never happen.
+    }
+    if(bitlen[256] == 0) { error = 64; return; } //the length of the end code 256 must be larger than 0
+    error = tree.makeFromLengths(bitlen, 15);
+    if(error) return; //now we've finally got HLIT and HDIST, so generate the code trees, and the function is done
+    error = treeD.makeFromLengths(bitlenD, 15);
+    if(error) return;
+    zlibinfo->back().treebits = bp - bpstart;
+    //lit/len/end symbol lengths
+    for(size_t i = 0; i < bitlen.size(); i++) zlibinfo->back().litlenlengths.push_back(bitlen[i]);
+    //dist lengths
+    for(size_t i = 0; i < bitlenD.size(); i++) zlibinfo->back().distlengths.push_back(bitlenD[i]);
+  }
+
+  void inflateHuffmanBlock(std::vector<unsigned char>& out,
+                           const unsigned char* in, size_t& bp, size_t& pos, size_t inlength, unsigned long btype)
+  {
+    size_t numcodes = 0, numlit = 0, numlen = 0; //for logging
+    if(btype == 1) { generateFixedTrees(codetree, codetreeD); }
+    else if(btype == 2) { getTreeInflateDynamic(codetree, codetreeD, in, bp, inlength); if(error) return; }
+    for(;;)
+    {
+      unsigned long code = huffmanDecodeSymbol(in, bp, codetree, inlength); if(error) return;
+      numcodes++;
+      zlibinfo->back().lz77_lcode.push_back(code); //output code
+      zlibinfo->back().lz77_dcode.push_back(0);
+      zlibinfo->back().lz77_lbits.push_back(0);
+      zlibinfo->back().lz77_dbits.push_back(0);
+      zlibinfo->back().lz77_lvalue.push_back(0);
+      zlibinfo->back().lz77_dvalue.push_back(0);
+
+      if(code == 256) break; //end code
+      else if(code <= 255) //literal symbol
+      {
+        out.push_back((unsigned char)(code));
+        pos++;
+        numlit++;
+      }
+      else if(code >= 257 && code <= 285) //length code
+      {
+        size_t length = LENBASE[code - 257], numextrabits = LENEXTRA[code - 257];
+        if((bp >> 3) >= inlength) { error = 51; return; } //error, bit pointer will jump past memory
+        length += readBitsFromStream(bp, in, numextrabits);
+        unsigned long codeD = huffmanDecodeSymbol(in, bp, codetreeD, inlength); if(error) return;
+        if(codeD > 29) { error = 18; return; } //error: invalid dist code (30-31 are never used)
+        unsigned long dist = DISTBASE[codeD], numextrabitsD = DISTEXTRA[codeD];
+        if((bp >> 3) >= inlength) { error = 51; return; } //error, bit pointer will jump past memory
+        dist += readBitsFromStream(bp, in, numextrabitsD);
+        size_t start = pos, back = start - dist; //backwards
+        for(size_t i = 0; i < length; i++)
+        {
+          out.push_back(out[back++]);
+          pos++;
+          if(back >= start) back = start - dist;
+        }
+        numlen++;
+        zlibinfo->back().lz77_dcode.back() = codeD; //output distance code
+        zlibinfo->back().lz77_lbits.back() = numextrabits; //output length extra bits
+        zlibinfo->back().lz77_dbits.back() = numextrabitsD; //output dist extra bits
+        zlibinfo->back().lz77_lvalue.back() = length; //output length
+        zlibinfo->back().lz77_dvalue.back() = dist; //output dist
+      }
+    }
+    zlibinfo->back().numlit = numlit; //output number of literal symbols
+    zlibinfo->back().numlen = numlen; //output number of length symbols
+  }
+
+  void inflateNoCompression(std::vector<unsigned char>& out,
+                            const unsigned char* in, size_t& bp, size_t& pos, size_t inlength)
+  {
+    while((bp & 0x7) != 0) bp++; //go to first boundary of byte
+    size_t p = bp / 8;
+    if(p >= inlength - 4) { error = 52; return; } //error, bit pointer will jump past memory
+    unsigned long LEN = in[p] + 256 * in[p + 1], NLEN = in[p + 2] + 256 * in[p + 3]; p += 4;
+    if(LEN + NLEN != 65535) { error = 21; return; } //error: NLEN is not one's complement of LEN
+    if(p + LEN > inlength) { error = 23; return; } //error: reading outside of in buffer
+    for(unsigned long n = 0; n < LEN; n++)
+    {
+      out.push_back(in[p++]); //read LEN bytes of literal data
+      pos++;
+    }
+    bp = p * 8;
+  }
+
+  int decompress(std::vector<unsigned char>& out, const std::vector<unsigned char>& in) //returns error value
+  {
+    if(in.size() < 2) { return 53; } //error, size of zlib data too small
+    //error: 256 * in[0] + in[1] must be a multiple of 31, the FCHECK value is supposed to be made that way
+    if((in[0] * 256 + in[1]) % 31 != 0) { return 24; }
+    unsigned long CM = in[0] & 15, CINFO = (in[0] >> 4) & 15, FDICT = (in[1] >> 5) & 1;
+    //error: only compression method 8: inflate with sliding window of 32k is supported by the PNG spec
+    if(CM != 8 || CINFO > 7) { return 25; }
+    //error: the PNG spec says about the zlib stream: "The additional flags shall not specify a preset dictionary."
+    if(FDICT != 0) { return 26; }
+    inflate(out, in, 2);
+    return error; //note: adler32 checksum was skipped and ignored
+  }
+};
+
+struct ExtractPNG //PNG decoding and information extraction
+{
+  std::vector<ZlibBlockInfo>* zlibinfo;
+  int error;
+
+  ExtractPNG(std::vector<ZlibBlockInfo>* output) : zlibinfo(output), error(0) {}
+
+  void decode(const unsigned char* in, size_t size)
+  {
+    error = 0;
+    if(size == 0 || in == 0) { error = 48; return; } //the given data is empty
+    readPngHeader(&in[0], size); if(error) return;
+    size_t pos = 33; //first byte of the first chunk after the header
+    std::vector<unsigned char> idat; //the data from idat chunks
+    bool IEND = false;
+    //loop through the chunks, ignoring unknown chunks and stopping at IEND chunk.
+    //IDAT data is collected into the idat vector
+    while(!IEND)
+    {
+      //error: size of the in buffer too small to contain next chunk
+      if(pos + 8 >= size) { error = 30; return; }
+      size_t chunkLength = read32bitInt(&in[pos]); pos += 4;
+      if(chunkLength > 2147483647) { error = 63; return; }
+      //error: size of the in buffer too small to contain next chunk
+      if(pos + chunkLength >= size) { error = 35; return; }
+      //IDAT chunk, containing compressed image data
+      if(in[pos + 0] == 'I' && in[pos + 1] == 'D' && in[pos + 2] == 'A' && in[pos + 3] == 'T')
+      {
+        idat.insert(idat.end(), &in[pos + 4], &in[pos + 4 + chunkLength]);
+        pos += (4 + chunkLength);
+      }
+      else if(in[pos + 0] == 'I' && in[pos + 1] == 'E' && in[pos + 2] == 'N' && in[pos + 3] == 'D')
+      {
+          pos += 4;
+          IEND = true;
+      }
+      else //it's not an implemented chunk type, so ignore it: skip over the data
+      {
+        pos += (chunkLength + 4); //skip 4 letters and uninterpreted data of unimplemented chunk
+      }
+      pos += 4; //step over CRC (which is ignored)
+    }
+    std::vector<unsigned char> out; //now the out buffer will be filled
+    ExtractZlib zlib(zlibinfo); //decompress with the Zlib decompressor
+    error = zlib.decompress(out, idat);
+    if(error) return; //stop if the zlib decompressor returned an error
+  }
+
+  //read the information from the header and store it in the Info
+  void readPngHeader(const unsigned char* in, size_t inlength)
+  {
+    if(inlength < 29) { error = 27; return; } //error: the data length is smaller than the length of the header
+    if(in[0] != 137 || in[1] != 80 || in[2] != 78 || in[3] != 71
+    || in[4] != 13 || in[5] != 10 || in[6] != 26 || in[7] != 10) { error = 28; return; } //no PNG signature
+    //error: it doesn't start with a IHDR chunk!
+    if(in[12] != 'I' || in[13] != 'H' || in[14] != 'D' || in[15] != 'R') { error = 29; return; }
+  }
+
+  unsigned long readBitFromReversedStream(size_t& bitp, const unsigned char* bits)
+  {
+    unsigned long result = (bits[bitp >> 3] >> (7 - (bitp & 0x7))) & 1;
+    bitp++;
+    return result;
+  }
+
+  unsigned long readBitsFromReversedStream(size_t& bitp, const unsigned char* bits, unsigned long nbits)
+  {
+    unsigned long result = 0;
+    for(size_t i = nbits - 1; i < nbits; i--) result += ((readBitFromReversedStream(bitp, bits)) << i);
+    return result;
+  }
+
+  void setBitOfReversedStream(size_t& bitp, unsigned char* bits, unsigned long bit)
+  {
+    bits[bitp >> 3] |=  (bit << (7 - (bitp & 0x7))); bitp++;
+  }
+
+  unsigned long read32bitInt(const unsigned char* buffer)
+  {
+    return ((unsigned long)buffer[0] << 24) | (buffer[1] << 16) | (buffer[2] << 8) | buffer[3];
+  }
+};
+
+void extractZlibInfo(std::vector<ZlibBlockInfo>& zlibinfo, const std::vector<unsigned char>& in)
+{
+  ExtractPNG decoder(&zlibinfo);
+  decoder.decode(&in[0], in.size());
+
+  if(decoder.error) std::cout << "extract error: " << decoder.error << std::endl;
+}
+
+} // namespace lodepng
diff --git a/src/zopflipng/lodepng/lodepng_util.h b/src/zopflipng/lodepng/lodepng_util.h
new file mode 100644
index 0000000..b18ac71
--- /dev/null
+++ b/src/zopflipng/lodepng/lodepng_util.h
@@ -0,0 +1,151 @@
+/*
+LodePNG Utils
+
+Copyright (c) 2005-2012 Lode Vandevenne
+
+This software is provided 'as-is', without any express or implied
+warranty. In no event will the authors be held liable for any damages
+arising from the use of this software.
+
+Permission is granted to anyone to use this software for any purpose,
+including commercial applications, and to alter it and redistribute it
+freely, subject to the following restrictions:
+
+    1. The origin of this software must not be misrepresented; you must not
+    claim that you wrote the original software. If you use this software
+    in a product, an acknowledgment in the product documentation would be
+    appreciated but is not required.
+
+    2. Altered source versions must be plainly marked as such, and must not be
+    misrepresented as being the original software.
+
+    3. This notice may not be removed or altered from any source
+    distribution.
+*/
+
+/*
+Extra C++ utilities for LodePNG, for convenience.
+*/
+
+#include <string>
+#include <vector>
+#include "lodepng.h"
+
+#pragma once
+
+namespace lodepng
+{
+
+/*
+Returns info from the header of the PNG by value, purely for convenience.
+Does NOT check for errors. Returns bogus info if the PNG has an error.
+Does not require cleanup of allocated memory because no palette or text chunk
+info is in the LodePNGInfo object after checking only the header of the PNG.
+*/
+LodePNGInfo getPNGHeaderInfo(const std::vector<unsigned char>& png);
+
+/*
+Get the names and sizes of all chunks in the PNG file.
+Returns 0 if ok, non-0 if error happened.
+*/
+unsigned getChunkInfo(std::vector<std::string>& names, std::vector<size_t>& sizes,
+                      const std::vector<unsigned char>& png);
+
+/*
+Returns the names and full chunks (including the name and everything else that
+makes up the chunk) for all chunks except IHDR, PLTE, IDAT and IEND.
+It separates the chunks into 3 separate lists, representing the chunks between
+certain critical chunks: 0: IHDR-PLTE, 1: PLTE-IDAT, 2: IDAT-IEND
+Returns 0 if ok, non-0 if error happened.
+*/
+unsigned getChunks(std::vector<std::string> names[3],
+                   std::vector<std::vector<unsigned char> > chunks[3],
+                   const std::vector<unsigned char>& png);
+
+/*
+Inserts chunks into the given png file. The chunks must be fully encoded,
+including length, type, content and CRC.
+The array index determines where it goes:
+0: between IHDR and PLTE, 1: between PLTE and IDAT, 2: between IDAT and IEND.
+They're appended at the end of those locations within the PNG.
+Returns 0 if ok, non-0 if error happened.
+*/
+unsigned insertChunks(std::vector<unsigned char>& png,
+                      const std::vector<std::vector<unsigned char> > chunks[3]);
+
+/*
+Get the filtertypes of each scanline in this PNG file.
+Returns 0 if ok, 1 if PNG decoding error happened.
+
+For a non-interlaced PNG, it returns one filtertype per scanline, in order.
+
+For interlaced PNGs, it returns a result as if it's not interlaced. It returns
+one filtertype per scanline, in order. The values are taken from passes 6 and
+7 of the Adam7 interlacing, alternating between the two, so that they
+correspond as closely as possible to their scanlines.
+*/
+unsigned getFilterTypes(std::vector<unsigned char>& filterTypes, const std::vector<unsigned char>& png);
+
+/*
+Get the filtertypes of each scanline in every interlace pass of this PNG file.
+Returns 0 if ok, 1 if PNG decoding error happened.
+
+For a non-interlaced PNG, it returns one filtertype per scanline, in order, in
+a single std::vector in filterTypes.
+
+For an interlaced PNG, it returns 7 std::vectors in filterTypes, one for each
+Adam7 pass. The amount of values per pass can be calculated as follows, where
+w and h are the size of the image and all divisions are integer divisions:
+pass 1: (h + 7) / 8
+pass 2: w <= 4 ? 0 : (h + 7) / 8
+pass 3: h <= 4 ? 0 : (h + 3) / 8
+pass 4: w <= 2 ? 0 : (h + 3) / 4
+pass 5: h <= 2 ? 0 : (h + 1) / 4
+pass 6: w <= 1 ? 0 : (h + 1) / 2
+pass 7: h <= 1 ? 0 : h / 2
+*/
+unsigned getFilterTypesInterlaced(std::vector<std::vector<unsigned char> >& filterTypes,
+                                  const std::vector<unsigned char>& png);
+
+/*
+Returns the value of the i-th pixel in an image with 1, 2, 4 or 8-bit color.
+E.g. if bits is 4 and i is 5, it returns the 5th nibble (4-bit group), which
+is the second half of the 3rd byte, in big endian (PNG's endian order).
+*/
+int getPaletteValue(const unsigned char* data, size_t i, int bits);
+
+/*
+The information for extractZlibInfo.
+*/
+struct ZlibBlockInfo
+{
+  int btype; //block type (0-2)
+  size_t compressedbits; //size of compressed block in bits
+  size_t uncompressedbytes; //size of uncompressed block in bytes
+
+  // only filled in for block type 2
+  size_t treebits; //encoded tree size in bits
+  int hlit; //the HLIT value that was filled in for this tree
+  int hdist; //the HDIST value that was filled in for this tree
+  int hclen; //the HCLEN value that was filled in for this tree
+  std::vector<int> clcl; //19 code length code lengths (compressed tree's tree)
+  std::vector<int> treecodes; //N tree codes, with values 0-18. Values 17 or 18 are followed by the repetition value.
+  std::vector<int> litlenlengths; //288 code lengths for lit/len symbols
+  std::vector<int> distlengths; //32 code lengths for dist symbols
+
+  // only filled in for block types 1 or 2
+  std::vector<int> lz77_lcode; //LZ77 codes. 0-255: literals. 256: end symbol. 257-285: length code of length/dist pairs
+  // the next vectors have the same size as lz77_lcode, but an element only has meaningful value if lz77_lcode contains a length code.
+  std::vector<int> lz77_dcode;
+  std::vector<int> lz77_lbits;
+  std::vector<int> lz77_dbits;
+  std::vector<int> lz77_lvalue;
+  std::vector<int> lz77_dvalue;
+  size_t numlit; //number of lit codes in this block
+  size_t numlen; //number of len codes in this block
+};
+
+//Extracts all info needed from a PNG file to reconstruct the zlib compression exactly.
+void extractZlibInfo(std::vector<ZlibBlockInfo>& zlibinfo, const std::vector<unsigned char>& in);
+
+} // namespace lodepng
diff --git a/src/zopflipng/zopflipng_bin.cc b/src/zopflipng/zopflipng_bin.cc
new file mode 100644
index 0000000..3faea06
--- /dev/null
+++ b/src/zopflipng/zopflipng_bin.cc
@@ -0,0 +1,407 @@
+// Copyright 2013 Google Inc. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+// Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+
+// Command line tool to recompress and optimize PNG images, using zopflipng_lib.
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "lodepng/lodepng.h"
+#include "zopflipng_lib.h"
+
+// Returns directory path (including last slash) in dir, filename without
+// extension in file, extension (including the dot) in ext
+void GetFileNameParts(const std::string& filename,
+    std::string* dir, std::string* file, std::string* ext) {
+  size_t npos = (size_t)(-1);
+  size_t slashpos = filename.find_last_of("/\\");
+  std::string nodir;
+  if (slashpos == npos) {
+    *dir = "";
+    nodir = filename;
+  } else {
+    *dir = filename.substr(0, slashpos + 1);
+    nodir = filename.substr(slashpos + 1);
+  }
+  size_t dotpos = nodir.find_last_of('.');
+  if (dotpos == npos) {
+    *file = nodir;
+    *ext = "";
+  } else {
+    *file = nodir.substr(0, dotpos);
+    *ext = nodir.substr(dotpos);
+  }
+}
+
+// Returns the size of the file
+size_t GetFileSize(const std::string& filename) {
+  size_t size;
+  FILE* file = fopen(filename.c_str(), "rb");
+  if (!file) return 0;
+  fseek(file , 0 , SEEK_END);
+  size = static_cast<size_t>(ftell(file));
+  fclose(file);
+  return size;
+}
+
+void ShowHelp() {
+  printf("ZopfliPNG, a Portable Network Graphics (PNG) image optimizer.\n"
+         "\n"
+         "Usage: zopflipng [options]... infile.png outfile.png\n"
+         "       zopflipng [options]... --prefix=[fileprefix] [files.png]...\n"
+         "\n"
+         "If the output file exists, it is considered a result from a"
+         " previous run and not overwritten if its filesize is smaller.\n"
+         "\n"
+         "Options:\n"
+         "-m: compress more: use more iterations (depending on file size) and"
+         " use block split strategy 3\n"
+         "--prefix=[fileprefix]: Adds a prefix to output filenames. May also"
+         " contain a directory path. When using a prefix, multiple input files"
+         " can be given and the output filenames are generated with the"
+         " prefix\n"
+         " If --prefix is specified without value, 'zopfli_' is used.\n"
+         " If input file names contain the prefix, they are not processed but"
+         " considered as output from previous runs. This is handy when using"
+         " *.png wildcard expansion with multiple runs.\n"
+         "-y: do not ask about overwriting files.\n"
+         "--lossy_transparent: remove colors behind alpha channel 0. No visual"
+         " difference, removes hidden information.\n"
+         "--lossy_8bit: convert 16-bit per channel image to 8-bit per"
+         " channel.\n"
+         "-d: dry run: don't save any files, just see the console output"
+         " (e.g. for benchmarking)\n"
+         "--always_zopflify: always output the image encoded by Zopfli, even if"
+         " it's bigger than the original, for benchmarking the algorithm. Not"
+         " good for real optimization.\n"
+         "-q: use quick, but not very good, compression"
+         " (e.g. for only trying the PNG filter and color types)\n"
+         "--iterations=[number]: number of iterations, more iterations makes it"
+         " slower but provides slightly better compression. Default: 15 for"
+         " small files, 5 for large files.\n"
+         "--splitting=[0-3]: block split strategy:"
+         " 0=none, 1=first, 2=last, 3=try both and take the best\n"
+         "--filters=[types]: filter strategies to try:\n"
+         " 0-4: give all scanlines PNG filter type 0-4\n"
+         " m: minimum sum\n"
+         " e: entropy\n"
+         " p: predefined (keep from input, this likely overlaps another"
+         " strategy)\n"
+         " b: brute force (experimental)\n"
+         " By default, if this argument is not given, one that is most likely"
+         " the best for this image is chosen by trying faster compression with"
+         " each type.\n"
+         " If this argument is used, all given filter types"
+         " are tried with slow compression and the best result retained. A good"
+         " set of filters to try is --filters=0me.\n"
+         "--keepchunks=nAME,nAME,...: keep metadata chunks with these names"
+         " that would normally be removed, e.g. tEXt,zTXt,iTXt,gAMA, ... \n"
+         " Due to adding extra data, this increases the result size. By default"
+         " ZopfliPNG only keeps the following chunks because they are"
+         " essential: IHDR, PLTE, tRNS, IDAT and IEND.\n"
+         "\n"
+         "Usage examples:\n"
+         "Optimize a file and overwrite if smaller: zopflipng infile.png"
+         " outfile.png\n"
+         "Compress more: zopflipng -m infile.png outfile.png\n"
+         "Optimize multiple files: zopflipng --prefix a.png b.png c.png\n"
+         "Compress really well, trying all filter strategies: zopflipng"
+         " --iterations=500 --splitting=3 --filters=01234mepb"
+         " --lossy_8bit --lossy_transparent infile.png outfile.png\n");
+}
+
+void PrintSize(const char* label, size_t size) {
+  printf("%s: %d (%dK)\n", label, (int) size, (int) size / 1024);
+}
+
+void PrintResultSize(const char* label, size_t oldsize, size_t newsize) {
+  printf("%s: %d (%dK). Percentage of original: %.3f%%\n",
+         label, (int) newsize, (int) newsize / 1024, newsize * 100.0 / oldsize);
+}
+
+int main(int argc, char *argv[]) {
+  if (argc < 2) {
+    ShowHelp();
+    return 0;
+  }
+
+  ZopfliPNGOptions png_options;
+
+  // cmd line options
+  bool always_zopflify = false;  // overwrite file even if we have bigger result
+  bool yes = false;  // do not ask to overwrite files
+  bool dryrun = false;  // never save anything
+
+  std::string user_out_filename;  // output filename if no prefix is used
+  bool use_prefix = false;
+  std::string prefix = "zopfli_";  // prefix for output filenames
+
+  std::vector<std::string> files;
+  std::vector<char> options;
+  for (int i = 1; i < argc; i++) {
+    std::string arg = argv[i];
+    if (arg[0] == '-' && arg.size() > 1 && arg[1] != '-') {
+      for (size_t pos = 1; pos < arg.size(); pos++) {
+        char c = arg[pos];
+        if (c == 'y') {
+          yes = true;
+        } else if (c == 'd') {
+          dryrun = true;
+        } else if (c == 'm') {
+          png_options.num_iterations *= 4;
+          png_options.num_iterations_large *= 4;
+          png_options.block_split_strategy = 3;
+        } else if (c == 'q') {
+          png_options.use_zopfli = false;
+        } else if (c == 'h') {
+          ShowHelp();
+          return 0;
+        } else {
+          printf("Unknown flag: %c\n", c);
+          return 0;
+        }
+      }
+    } else if (arg[0] == '-' && arg.size() > 1 && arg[1] == '-') {
+      size_t eq = arg.find('=');
+      std::string name = arg.substr(0, eq);
+      std::string value = eq >= arg.size() - 1 ? "" : arg.substr(eq + 1);
+      int num = atoi(value.c_str());
+      if (name == "--always_zopflify") {
+        always_zopflify = true;
+      } else if (name == "--lossy_transparent") {
+        png_options.lossy_transparent = true;
+      } else if (name == "--lossy_8bit") {
+        png_options.lossy_8bit = true;
+      } else if (name == "--iterations") {
+        if (num < 1) num = 1;
+        png_options.num_iterations = num;
+        png_options.num_iterations_large = num;
+      } else if (name == "--splitting") {
+        if (num < 0 || num > 3) num = 1;
+        png_options.block_split_strategy = num;
+      } else if (name == "--filters") {
+        for (size_t j = 0; j < value.size(); j++) {
+          ZopfliPNGFilterStrategy strategy = kStrategyZero;
+          char f = value[j];
+          switch (f) {
+            case '0': strategy = kStrategyZero; break;
+            case '1': strategy = kStrategyOne; break;
+            case '2': strategy = kStrategyTwo; break;
+            case '3': strategy = kStrategyThree; break;
+            case '4': strategy = kStrategyFour; break;
+            case 'm': strategy = kStrategyMinSum; break;
+            case 'e': strategy = kStrategyEntropy; break;
+            case 'p': strategy = kStrategyPredefined; break;
+            case 'b': strategy = kStrategyBruteForce; break;
+            default:
+              printf("Unknown filter strategy: %c\n", f);
+              return 1;
+          }
+          png_options.filter_strategies.push_back(strategy);
+          // Enable auto filter strategy only if no user-specified filter is
+          // given.
+          png_options.auto_filter_strategy = false;
+        }
+      } else if (name == "--keepchunks") {
+        bool correct = true;
+        if ((value.size() + 1) % 5 != 0) correct = false;
+        for (size_t i = 0; i + 4 <= value.size() && correct; i += 5) {
+          png_options.keepchunks.push_back(value.substr(i, 4));
+          if (i > 4 && value[i - 1] != ',') correct = false;
+        }
+        if (!correct) {
+          printf("Error: keepchunks format must be like for example:\n"
+                 " --keepchunks=gAMA,cHRM,sRGB,iCCP\n");
+          return 0;
+        }
+      } else if (name == "--prefix") {
+        use_prefix = true;
+        if (!value.empty()) prefix = value;
+      } else if (name == "--help") {
+        ShowHelp();
+        return 0;
+      } else {
+        printf("Unknown flag: %s\n", name.c_str());
+        return 0;
+      }
+    } else {
+      files.push_back(argv[i]);
+    }
+  }
+
+  if (!use_prefix) {
+    if (files.size() == 2) {
+      // The second filename is the output instead of an input if no prefix is
+      // given.
+      user_out_filename = files[1];
+      files.resize(1);
+    } else {
+      printf("Please provide one input and output filename\n\n");
+      ShowHelp();
+      return 0;
+    }
+  }
+
+  size_t total_in_size = 0;
+  // Total output size, taking input size if the input file was smaller
+  size_t total_out_size = 0;
+  // Total output size that zopfli produced, even if input was smaller, for
+  // benchmark information
+  size_t total_out_size_zopfli = 0;
+  size_t total_errors = 0;
+  size_t total_files = 0;
+  size_t total_files_smaller = 0;
+  size_t total_files_saved = 0;
+  size_t total_files_equal = 0;
+
+  for (size_t i = 0; i < files.size(); i++) {
+    if (use_prefix && files.size() > 1) {
+      std::string dir, file, ext;
+      GetFileNameParts(files[i], &dir, &file, &ext);
+      // Skip files that are themselves output of a previous run, so that you
+      // don't get zopfli_zopfli_zopfli_... files after multiple runs.
+      if (file.find(prefix) == 0) continue;
+    }
+
+    total_files++;
+
+    printf("Optimizing %s\n", files[i].c_str());
+    std::vector<unsigned char> image;
+    unsigned w, h;
+    std::vector<unsigned char> origpng;
+    unsigned error;
+    lodepng::State inputstate;
+    std::vector<unsigned char> resultpng;
+
+    lodepng::load_file(origpng, files[i]);
+    error = ZopfliPNGOptimize(origpng, png_options, true, &resultpng);
+
+    if (error) {
+      printf("Decoding error %u: %s\n", error, lodepng_error_text(error));
+    }
+
+    // Verify result, check that the result causes no decoding errors
+    if (!error) {
+      error = lodepng::decode(image, w, h, inputstate, resultpng);
+      if (error) printf("Error: verification of result failed.\n");
+    }
+
+    if (error) {
+      printf("There was an error\n");
+      total_errors++;
+    } else {
+      size_t origsize = GetFileSize(files[i]);
+      size_t resultsize = resultpng.size();
+
+      if (resultsize < origsize) {
+        printf("Result is smaller\n");
+      } else if (resultsize == origsize) {
+        printf("Result has exact same size\n");
+      } else {
+        printf(always_zopflify
+            ? "Original was smaller\n"
+            : "Preserving original PNG since it was smaller\n");
+      }
+      PrintSize("Input size", origsize);
+      PrintResultSize("Result size", origsize, resultsize);
+
+      std::string out_filename = user_out_filename;
+      if (use_prefix) {
+        std::string dir, file, ext;
+        GetFileNameParts(files[i], &dir, &file, &ext);
+        out_filename = dir + prefix + file + ext;
+      }
+      bool different_output_name = out_filename != files[i];
+
+      total_in_size += origsize;
+      total_out_size_zopfli += resultpng.size();
+      if (resultpng.size() < origsize) total_files_smaller++;
+      else if (resultpng.size() == origsize) total_files_equal++;
+
+      if (!always_zopflify && resultpng.size() > origsize) {
+        // Set output file to input since input was smaller.
+        resultpng = origpng;
+      }
+
+      size_t origoutfilesize = GetFileSize(out_filename);
+      bool already_exists = true;
+      if (origoutfilesize == 0) already_exists = false;
+
+      // When using a prefix, and the output file already exists, assume it's
+      // from a previous run. If that file is smaller, it may represent a
+      // previous run with different parameters that gave a smaller PNG image.
+      // In that case, do not overwrite it. This behaviour can be removed by
+      // adding the always_zopflify flag.
+      bool keep_earlier_output_file = already_exists &&
+          resultpng.size() >= origoutfilesize && !always_zopflify && use_prefix;
+
+      if (keep_earlier_output_file) {
+        // An output file from a previous run is kept, add that file's size
+        // to the output size statistics.
+        total_out_size += origoutfilesize;
+        if (different_output_name) {
+          printf(resultpng.size() == origoutfilesize
+              ? "File not written because a previous run was as good.\n"
+              : "File not written because a previous run was better.\n");
+        }
+      } else {
+        bool confirmed = true;
+        if (!yes && !dryrun && already_exists) {
+          printf("File %s exists, overwrite? (y/N) ", out_filename.c_str());
+          char answer = 0;
+          // Take the first character as the answer, then consume the rest of
+          // the line (including the newline) with getchar.
+          while (int input = getchar()) {
+            if (input == '\n' || input == EOF) break;
+            else if (!answer) answer = input;
+          }
+          confirmed = answer == 'y' || answer == 'Y';
+        }
+        if (confirmed) {
+          if (!dryrun) {
+            lodepng::save_file(resultpng, out_filename);
+            total_files_saved++;
+          }
+          total_out_size += resultpng.size();
+        } else {
+          // An output file from a previous run is kept, add that file's size
+          // to the output size statistics.
+          total_out_size += origoutfilesize;
+        }
+      }
+    }
+    printf("\n");
+  }
+
+  if (total_files > 1) {
+    printf("Summary for all files:\n");
+    printf("Files tried: %d\n", (int) total_files);
+    printf("Files smaller: %d\n", (int) total_files_smaller);
+    if (total_files_equal) {
+      printf("Files equal: %d\n", (int) total_files_equal);
+    }
+    printf("Files saved: %d\n", (int) total_files_saved);
+    if (total_errors) printf("Errors: %d\n", (int) total_errors);
+    PrintSize("Total input size", total_in_size);
+    PrintResultSize("Total output size", total_in_size, total_out_size);
+    PrintResultSize("Benchmark result size",
+                    total_in_size, total_out_size_zopfli);
+  }
+
+  if (dryrun) printf("No files were written because dry run was specified\n");
+
+  return total_errors;
+}
diff --git a/src/zopflipng/zopflipng_lib.cc b/src/zopflipng/zopflipng_lib.cc
new file mode 100755
index 0000000..c310790
--- /dev/null
+++ b/src/zopflipng/zopflipng_lib.cc
@@ -0,0 +1,425 @@
+// Copyright 2013 Google Inc. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+// Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+
+// See zopflipng_lib.h
+
+#include "zopflipng_lib.h"
+
+#include <stdio.h>
+#include <set>
+#include <vector>
+
+#include "lodepng/lodepng.h"
+#include "lodepng/lodepng_util.h"
+#include "../zopfli/deflate.h"
+
+ZopfliPNGOptions::ZopfliPNGOptions()
+  : lossy_transparent(false)
+  , lossy_8bit(false)
+  , auto_filter_strategy(true)
+  , use_zopfli(true)
+  , num_iterations(15)
+  , num_iterations_large(5)
+  , block_split_strategy(1) {
+}
+
+// Deflate compressor passed as function pointer to LodePNG to have it use Zopfli
+// as its compression backend.
+unsigned CustomPNGDeflate(unsigned char** out, size_t* outsize,
+                          const unsigned char* in, size_t insize,
+                          const LodePNGCompressSettings* settings) {
+  const ZopfliPNGOptions* png_options =
+      static_cast<const ZopfliPNGOptions*>(settings->custom_context);
+  unsigned char bp = 0;
+  ZopfliOptions options;
+  ZopfliInitOptions(&options);
+
+  options.numiterations = insize < 200000
+      ? png_options->num_iterations : png_options->num_iterations_large;
+
+  if (png_options->block_split_strategy == 3) {
+    // Try both block splitting first and last.
+    unsigned char* out2 = 0;
+    size_t outsize2 = 0;
+    options.blocksplittinglast = 0;
+    ZopfliDeflate(&options, 2 /* Dynamic */, 1, in, insize, &bp, out, outsize);
+    bp = 0;
+    options.blocksplittinglast = 1;
+    ZopfliDeflate(&options, 2 /* Dynamic */, 1,
+                  in, insize, &bp, &out2, &outsize2);
+
+    if (outsize2 < *outsize) {
+      free(*out);
+      *out = out2;
+      *outsize = outsize2;
+      printf("Block splitting last was better\n");
+    } else {
+      free(out2);
+    }
+  } else {
+    if (png_options->block_split_strategy == 0) options.blocksplitting = 0;
+    options.blocksplittinglast = png_options->block_split_strategy == 2;
+    ZopfliDeflate(&options, 2 /* Dynamic */, 1, in, insize, &bp, out, outsize);
+  }
+
+  return 0;  // OK
+}
+
+// Returns 32-bit integer value for RGBA color.
+static unsigned ColorIndex(const unsigned char* color) {
+  return color[0] + 256u * color[1] + 65536u * color[2] + 16777216u * color[3];
+}
+
+// Counts the unique colors in the image, stopping at 257. If
+// transparent_counts_as_one is enabled, every color with alpha channel 0 is
+// treated as a single color with index 0.
+void CountColors(std::set<unsigned>* unique,
+                 const unsigned char* image, unsigned w, unsigned h,
+                 bool transparent_counts_as_one) {
+  unique->clear();
+  for (size_t i = 0; i < w * h; i++) {
+    unsigned index = ColorIndex(&image[i * 4]);
+    if (transparent_counts_as_one && image[i * 4 + 3] == 0) index = 0;
+    unique->insert(index);
+    if (unique->size() > 256) break;
+  }
+}
+
+// Remove RGB information from pixels with alpha=0
+void LossyOptimizeTransparent(lodepng::State* inputstate, unsigned char* image,
+    unsigned w, unsigned h) {
+  // First check if we want to preserve potential color-key background color,
+  // or instead use the last encountered RGB value all the time to save bytes.
+  bool key = true;
+  for (size_t i = 0; i < w * h; i++) {
+    if (image[i * 4 + 3] > 0 && image[i * 4 + 3] < 255) {
+      key = false;
+      break;
+    }
+  }
+  std::set<unsigned> count;  // Color count, up to 257.
+  CountColors(&count, image, w, h, true);
+  // If true, means palette is possible so avoid using different RGB values for
+  // the transparent color.
+  bool palette = count.size() <= 256;
+
+  // Choose the color key or first initial background color.
+  int r = 0, g = 0, b = 0;
+  if (key || palette) {
+    for (size_t i = 0; i < w * h; i++) {
+      if (image[i * 4 + 3] == 0) {
+        // Use the RGB value of the first transparent pixel: it is a valid color
+        // key and, for palette images, a color present in the input palette.
+        r = image[i * 4 + 0];
+        g = image[i * 4 + 1];
+        b = image[i * 4 + 2];
+        break;
+      }
+    }
+  }
+
+  for (size_t i = 0; i < w * h; i++) {
+    // if alpha is 0, alter the RGB value to a possibly more efficient one.
+    if (image[i * 4 + 3] == 0) {
+      image[i * 4 + 0] = r;
+      image[i * 4 + 1] = g;
+      image[i * 4 + 2] = b;
+    } else {
+      if (!key && !palette) {
+        // Use the last encountered RGB value if no key or palette is used: that
+        // way more values can be 0 thanks to the PNG filter types.
+        r = image[i * 4 + 0];
+        g = image[i * 4 + 1];
+        b = image[i * 4 + 2];
+      }
+    }
+  }
+
+  // If there are now fewer colors, update palette of input image to match this.
+  if (palette && inputstate->info_png.color.palettesize > 0) {
+    CountColors(&count, image, w, h, false);
+    if (count.size() < inputstate->info_png.color.palettesize) {
+      std::vector<unsigned char> palette_out;
+      unsigned char* palette_in = inputstate->info_png.color.palette;
+      for (size_t i = 0; i < inputstate->info_png.color.palettesize; i++) {
+        if (count.count(ColorIndex(&palette_in[i * 4])) != 0) {
+          palette_out.push_back(palette_in[i * 4 + 0]);
+          palette_out.push_back(palette_in[i * 4 + 1]);
+          palette_out.push_back(palette_in[i * 4 + 2]);
+          palette_out.push_back(palette_in[i * 4 + 3]);
+        }
+      }
+      inputstate->info_png.color.palettesize = palette_out.size() / 4;
+      for (size_t i = 0; i < palette_out.size(); i++) {
+        palette_in[i] = palette_out[i];
+      }
+    }
+  }
+}
+
+// Tries to optimize given a single PNG filter strategy.
+// Returns 0 if ok, other value for error
+unsigned TryOptimize(
+    const std::vector<unsigned char>& image, unsigned w, unsigned h,
+    const lodepng::State& inputstate, bool bit16,
+    const std::vector<unsigned char>& origfile,
+    ZopfliPNGFilterStrategy filterstrategy,
+    bool use_zopfli, int windowsize, const ZopfliPNGOptions* png_options,
+    std::vector<unsigned char>* out) {
+  unsigned error = 0;
+
+  lodepng::State state;
+  state.encoder.zlibsettings.windowsize = windowsize;
+  if (use_zopfli && png_options->use_zopfli) {
+    state.encoder.zlibsettings.custom_deflate = CustomPNGDeflate;
+    state.encoder.zlibsettings.custom_context = png_options;
+  }
+
+  if (inputstate.info_png.color.colortype == LCT_PALETTE) {
+    // Make it preserve the original palette order
+    lodepng_color_mode_copy(&state.info_raw, &inputstate.info_png.color);
+    state.info_raw.colortype = LCT_RGBA;
+    state.info_raw.bitdepth = 8;
+  }
+  if (bit16) {
+    state.info_raw.bitdepth = 16;
+  }
+
+  state.encoder.filter_palette_zero = 0;
+
+  std::vector<unsigned char> filters;
+  switch (filterstrategy) {
+    case kStrategyZero:
+      state.encoder.filter_strategy = LFS_ZERO;
+      break;
+    case kStrategyMinSum:
+      state.encoder.filter_strategy = LFS_MINSUM;
+      break;
+    case kStrategyEntropy:
+      state.encoder.filter_strategy = LFS_ENTROPY;
+      break;
+    case kStrategyBruteForce:
+      state.encoder.filter_strategy = LFS_BRUTE_FORCE;
+      break;
+    case kStrategyOne:
+    case kStrategyTwo:
+    case kStrategyThree:
+    case kStrategyFour:
+      // Set the filters of all scanlines to that number.
+      filters.resize(h, filterstrategy);
+      state.encoder.filter_strategy = LFS_PREDEFINED;
+      state.encoder.predefined_filters = &filters[0];
+      break;
+    case kStrategyPredefined:
+      lodepng::getFilterTypes(filters, origfile);
+      state.encoder.filter_strategy = LFS_PREDEFINED;
+      state.encoder.predefined_filters = &filters[0];
+      break;
+    default:
+      break;
+  }
+
+  state.encoder.add_id = false;
+  state.encoder.text_compression = 1;
+
+  error = lodepng::encode(*out, image, w, h, state);
+
+  // For very small output, also try without a palette: the result may be
+  // smaller because no palette has to be stored.
+  if (!error && out->size() < 4096) {
+    lodepng::State teststate;
+    std::vector<unsigned char> temp;
+    lodepng::decode(temp, w, h, teststate, *out);
+    LodePNGColorMode& color = teststate.info_png.color;
+    if (color.colortype == LCT_PALETTE) {
+      std::vector<unsigned char> out2;
+      state.encoder.auto_convert = LAC_ALPHA;
+      bool grey = true;
+      for (size_t i = 0; i < color.palettesize; i++) {
+        if (color.palette[i * 4 + 0] != color.palette[i * 4 + 2]
+            || color.palette[i * 4 + 1] != color.palette[i * 4 + 2]) {
+          grey = false;
+          break;
+        }
+      }
+      if (grey) state.info_png.color.colortype = LCT_GREY_ALPHA;
+
+      error = lodepng::encode(out2, image, w, h, state);
+      if (!error && out2.size() < out->size()) out->swap(out2);
+    }
+  }
+
+  if (error) {
+    printf("Encoding error %u: %s\n", error, lodepng_error_text(error));
+    return error;
+  }
+
+  return 0;
+}
+
+// Use fast compression to check which PNG filter strategy gives the smallest
+// output. The slow, high-quality compression then only needs to run on that
+// filter strategy.
+unsigned AutoChooseFilterStrategy(const std::vector<unsigned char>& image,
+                                  unsigned w, unsigned h,
+                                  const lodepng::State& inputstate, bool bit16,
+                                  const std::vector<unsigned char>& origfile,
+                                  int numstrategies,
+                                  ZopfliPNGFilterStrategy* strategies,
+                                  bool* enable) {
+  std::vector<unsigned char> out;
+  size_t bestsize = 0;
+  int bestfilter = 0;
+
+  // A large window size should still be used to do the quick compression to
+  // try out filter strategies: which filter strategy is the best depends
+  // largely on the window size, the closer to the actual used window size the
+  // better.
+  int windowsize = 8192;
+
+  for (int i = 0; i < numstrategies; i++) {
+    out.clear();
+    unsigned error = TryOptimize(image, w, h, inputstate, bit16, origfile,
+                                 strategies[i], false, windowsize, 0, &out);
+    if (error) return error;
+    if (bestsize == 0 || out.size() < bestsize) {
+      bestsize = out.size();
+      bestfilter = i;
+    }
+  }
+
+  for (int i = 0; i < numstrategies; i++) {
+    enable[i] = (i == bestfilter);
+  }
+
+  return 0;  /* OK */
+}
+
+// Keeps chunks with given names from the original png by literally copying them
+// into the new png
+void KeepChunks(const std::vector<unsigned char>& origpng,
+                const std::vector<std::string>& keepnames,
+                std::vector<unsigned char>* png) {
+  std::vector<std::string> names[3];
+  std::vector<std::vector<unsigned char> > chunks[3];
+
+  lodepng::getChunks(names, chunks, origpng);
+  std::vector<std::vector<unsigned char> > keepchunks[3];
+
+  // There are 3 distinct locations in a PNG file for chunks: between IHDR and
+  // PLTE, between PLTE and IDAT, and between IDAT and IEND. Keep each chunk at
+  // its corresponding location in the new PNG.
+  for (size_t i = 0; i < 3; i++) {
+    for (size_t j = 0; j < names[i].size(); j++) {
+      for (size_t k = 0; k < keepnames.size(); k++) {
+        if (keepnames[k] == names[i][j]) {
+          keepchunks[i].push_back(chunks[i][j]);
+        }
+      }
+    }
+  }
+
+  lodepng::insertChunks(*png, keepchunks);
+}
+
+int ZopfliPNGOptimize(const std::vector<unsigned char>& origpng,
+    const ZopfliPNGOptions& png_options,
+    bool verbose,
+    std::vector<unsigned char>* resultpng) {
+  // Use the largest possible deflate window size
+  int windowsize = 32768;
+
+  ZopfliPNGFilterStrategy filterstrategies[kNumFilterStrategies] = {
+    kStrategyZero, kStrategyOne, kStrategyTwo, kStrategyThree, kStrategyFour,
+    kStrategyMinSum, kStrategyEntropy, kStrategyPredefined, kStrategyBruteForce
+  };
+  bool strategy_enable[kNumFilterStrategies] = {
+    false, false, false, false, false, false, false, false, false
+  };
+  std::string strategy_name[kNumFilterStrategies] = {
+    "zero", "one", "two", "three", "four",
+    "minimum sum", "entropy", "predefined", "brute force"
+  };
+  for (size_t i = 0; i < png_options.filter_strategies.size(); i++) {
+    strategy_enable[png_options.filter_strategies[i]] = true;
+  }
+
+  std::vector<unsigned char> image;
+  unsigned w, h;
+  unsigned error;
+  lodepng::State inputstate;
+  error = lodepng::decode(image, w, h, inputstate, origpng);
+
+  if (error) {
+    if (verbose) {
+      printf("Decoding error %u: %s\n", error, lodepng_error_text(error));
+    }
+    return error;
+  }
+
+  bool bit16 = false;  // Using 16-bit per channel raw image
+  if (inputstate.info_png.color.bitdepth == 16 && !png_options.lossy_8bit) {
+    // Decode as 16-bit
+    image.clear();
+    error = lodepng::decode(image, w, h, origpng, LCT_RGBA, 16);
+    bit16 = true;
+  }
+
+  if (!error) {
+    // If lossy_transparent, remove RGB information from pixels with alpha=0
+    if (png_options.lossy_transparent && !bit16) {
+      LossyOptimizeTransparent(&inputstate, &image[0], w, h);
+    }
+
+    if (png_options.auto_filter_strategy) {
+      error = AutoChooseFilterStrategy(image, w, h, inputstate, bit16,
+                                       origpng,
+                                       /* Don't try brute force */
+                                       kNumFilterStrategies - 1,
+                                       filterstrategies, strategy_enable);
+    }
+  }
+
+  if (!error) {
+    size_t bestsize = 0;
+
+    for (int i = 0; i < kNumFilterStrategies; i++) {
+      if (!strategy_enable[i]) continue;
+
+      std::vector<unsigned char> temp;
+      error = TryOptimize(image, w, h, inputstate, bit16, origpng,
+                          filterstrategies[i], true /* use_zopfli */,
+                          windowsize, &png_options, &temp);
+      if (!error) {
+        if (verbose) {
+          printf("Filter strategy %s: %d bytes\n",
+                 strategy_name[i].c_str(), (int) temp.size());
+        }
+        if (bestsize == 0 || temp.size() < bestsize) {
+          bestsize = temp.size();
+          (*resultpng).swap(temp);  // Store best result so far in the output.
+        }
+      }
+    }
+
+    if (!png_options.keepchunks.empty()) {
+      KeepChunks(origpng, png_options.keepchunks, resultpng);
+    }
+  }
+
+  return error;
+}
diff --git a/src/zopflipng/zopflipng_lib.h b/src/zopflipng/zopflipng_lib.h
new file mode 100644
index 0000000..cb749fc
--- /dev/null
+++ b/src/zopflipng/zopflipng_lib.h
@@ -0,0 +1,79 @@
+// Copyright 2013 Google Inc. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// Author: lode.vandevenne@gmail.com (Lode Vandevenne)
+// Author: jyrki.alakuijala@gmail.com (Jyrki Alakuijala)
+
+// Library to recompress and optimize PNG images. Uses Zopfli as the compression
+// backend, chooses optimal PNG color model, and tries out several PNG filter
+// strategies.
+
+#ifndef ZOPFLIPNG_LIB_H_
+#define ZOPFLIPNG_LIB_H_
+
+#include <string>
+#include <vector>
+
+enum ZopfliPNGFilterStrategy {
+  kStrategyZero = 0,
+  kStrategyOne = 1,
+  kStrategyTwo = 2,
+  kStrategyThree = 3,
+  kStrategyFour = 4,
+  kStrategyMinSum,
+  kStrategyEntropy,
+  kStrategyPredefined,
+  kStrategyBruteForce,
+  kNumFilterStrategies /* Not a strategy but used for the size of this enum */
+};
+
+struct ZopfliPNGOptions {
+  ZopfliPNGOptions();
+
+  // Allow altering hidden colors of fully transparent pixels
+  bool lossy_transparent;
+  // Convert 16-bit per channel images to 8-bit per channel
+  bool lossy_8bit;
+
+  // Filter strategies to try
+  std::vector<ZopfliPNGFilterStrategy> filter_strategies;
+
+  // Automatically choose filter strategy using faster, weaker compression
+  bool auto_filter_strategy;
+
+  // Names of PNG chunks to keep: these chunks are copied literally from the
+  // original PNG into the resulting one
+  std::vector<std::string> keepchunks;
+
+  // Use Zopfli deflate compression
+  bool use_zopfli;
+
+  // Zopfli number of iterations
+  int num_iterations;
+
+  // Zopfli number of iterations on large images
+  int num_iterations_large;
+
+  // 0=none, 1=first, 2=last, 3=both
+  int block_split_strategy;
+};
+
+// Returns 0 on success, error code otherwise.
+// If verbose is true, it will print some info while working.
+int ZopfliPNGOptimize(const std::vector<unsigned char>& origpng,
+    const ZopfliPNGOptions& png_options,
+    bool verbose,
+    std::vector<unsigned char>* resultpng);
+
+#endif  // ZOPFLIPNG_LIB_H_