Use Puffer (instead of zlib library) to find deflate locations

Currently, we depend on zlib to find deflate locations in gzip and zip
archives. This means a deflate stream has to be passed once to the zlib library
and twice to puffer (once to find the deflate subblock locations and once to
find the puff locations). This patch changes the puffer function to terminate
the call when the final block is reached. Hence, it can be used to identify the
locations of the deflate subblocks in a single pass (plus one pass for the puff
locations). This reduces the delta payload generation time.

Bug: 80148379
Test: unittests
Test: test_corpus.py
Test: measure_patch_size.py

Change-Id: I4e7e6ed1edc2d5636124600d41f776332cda570a
7 files changed
tree: 633cd3a9efa37338599e29899e5f8f575360b91f
  1. scripts/
  2. src/
  3. .clang-format
  4. Android.bp
  5. libpuffdiff.pc
  6. libpuffpatch.pc
  7. LICENSE
  8. Makefile
  9. OWNERS
  10. PRESUBMIT.cfg
  11. PREUPLOAD.cfg
  12. puffin.gyp
  13. README.md
  14. README.version
README.md

Puffin

Source code for Puffin: A utility for deterministic DEFLATE recompression.

TODO(ahassani): Describe the directory structure and how-tos.

Glossary

  • Alphabet: A value that occurs in the input stream. It can be a literal [0..255], an end-of-block symbol [256], a length [257..285], or a distance [0..29].

  • Huffman code: A variable-length code that represents the Huffman encoding of an alphabet. Huffman codes can be derived uniquely from a Huffman code length array.

  • Huffman code array: An array in which the index identifies a Huffman code and the element at that index is the corresponding alphabet. Throughout the code, Huffman code arrays are identified by vectors with the postfix hcodes_.

  • Huffman reverse code array: An array in which the index identifies an alphabet and the element at that index is the Huffman code of that alphabet. Throughout the code, Huffman reverse code arrays are identified by vectors with the postfix rcodes_.

  • Huffman code length: The number of bits in a Huffman code.

  • Huffman code length array: An array of Huffman code lengths, indexed by alphabet. Throughout the code, Huffman code length arrays are identified by vectors with the postfix lens_.
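
To make the relationship between the three arrays in the glossary concrete, here is a minimal sketch of how a code length array determines the other two. It follows the canonical code construction in RFC 1951, section 3.2.2. The function names `BuildReverseCodes` and `BuildCodeArray`, the `kInvalid` marker, and the choice to index the code array by the code left-aligned to `max_bits` are assumptions made for illustration; they are not taken from Puffin's sources, which may lay out these tables differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for table slots that no code maps to (hypothetical, for this sketch).
constexpr uint16_t kInvalid = 0xFFFF;

// Builds a reverse code array: rcodes[alphabet] = Huffman code of that
// alphabet. lens[alphabet] is its code length in bits; 0 means "unused".
// Codes are assigned as described in RFC 1951, section 3.2.2.
std::vector<uint16_t> BuildReverseCodes(const std::vector<uint8_t>& lens,
                                        size_t max_bits) {
  // Count how many alphabets use each code length.
  std::vector<uint32_t> bl_count(max_bits + 1, 0);
  for (uint8_t len : lens) {
    assert(len <= max_bits);
    if (len != 0) bl_count[len]++;
  }

  // Compute the smallest code value for each code length.
  std::vector<uint32_t> next_code(max_bits + 1, 0);
  uint32_t code = 0;
  for (size_t bits = 1; bits <= max_bits; ++bits) {
    code = (code + bl_count[bits - 1]) << 1;
    next_code[bits] = code;
  }

  // Hand out consecutive codes to alphabets of the same length.
  std::vector<uint16_t> rcodes(lens.size(), 0);
  for (size_t a = 0; a < lens.size(); ++a) {
    if (lens[a] != 0) {
      rcodes[a] = static_cast<uint16_t>(next_code[lens[a]]++);
    }
  }
  return rcodes;
}

// Builds a code array: hcodes[code] = alphabet. Because codes of different
// lengths can share a numeric value, each code is left-aligned to max_bits
// and every index that starts with that code maps to the same alphabet.
std::vector<uint16_t> BuildCodeArray(const std::vector<uint8_t>& lens,
                                     const std::vector<uint16_t>& rcodes,
                                     size_t max_bits) {
  std::vector<uint16_t> hcodes(size_t{1} << max_bits, kInvalid);
  for (size_t a = 0; a < lens.size(); ++a) {
    if (lens[a] == 0) continue;
    const size_t shift = max_bits - lens[a];
    const size_t base = static_cast<size_t>(rcodes[a]) << shift;
    for (size_t i = 0; i < (size_t{1} << shift); ++i) {
      hcodes[base + i] = static_cast<uint16_t>(a);
    }
  }
  return hcodes;
}
```

With the example lengths from RFC 1951, {3, 3, 3, 3, 3, 2, 4, 4}, and `max_bits` set to 4, the sixth alphabet (length 2) receives the code 00 and therefore owns the first four slots of the 16-entry code array, while the two 4-bit codes 1110 and 1111 each own a single slot.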