|author||Grace Cham <email@example.com>||Mon May 23 14:56:59 2022 +0900|
|committer||Grace Cham <firstname.lastname@example.org>||Mon May 23 14:56:59 2022 +0900|
puffin: include <unistd.h> explicitly It is removed from build/build_config.h from libchrome r950791 onwards. For unlink, close. Bug: 231676446 Test: FEATURES=test emerge-hatch puffin Change-Id: Iae91fd56ac1403c4fe47b6a8939339a6d3049cdc
Puffin is a deterministic deflate recompressor. It is mainly used for patching deflate compressed images (zip, gzip, etc.) because patches created from deflate files/streams are usually large (deflate has a bit-aligned format, hence, changing one byte in the raw data can cause the entire deflate stream to change drastically.)
Puffin has two tools (operations):
puffpatch (shown here.) The purpose of
puffdiff operation is to create a patch between a source and a target file with one or both of them having some deflate streams. This patch is used in the
puffpatch operation to generate the target file from the source file deterministically. The patch itself is created by
bsdiff library (but can be replaced by any other diffing mechanism). But, before it uses
bsdiff to create the patch, we have to transform both the source and target files into a specific format so
bsdiff can produce smaller patches. We define
puff operation to perform such a transformation. The resulting stream is called a
puff stream. The reverse transformation (from
puff stream to deflate stream) is called a
huff is used in the client to transform the
puff stream back into its original deflate stream deterministically.
For information about deflate format see RFC 1951.
puff is an operation that decompresses only the Huffman part of the deflate stream and keeps the structure of the LZ77 coding unchanged. This is roughly equivalent of decompressing ‘half way’.
huff is the exact opposite of
puff and it deterministically converts the
puff stream back to its original deflate stream. This deterministic conversion is based on two facts:
puffstream. This means the deflate stream can be reconstructed deterministically using the Huffman tables.
The inclusion of Huffman tables in the
puff stream has minuscule space burden (on average maximum 320 bytes for each block. There is roughly one block per 32KB of uncompressed data).
bsdiff of two
puffed streams has much smaller patch in comparison to their deflate streams, but it is larger than uncompressed streams.
The major benefits
huffare deterministic operations.
huffis in order of 10X faster than full recompression. It is even faster than Huffman algorithm because it already has the Huffman tables and does not need to reconstruct it.
A deflate compressed file (gzip, zip, etc.) contains multiple deflate streams at different locations. When this file is puffed, the resulting
puff stream contains both puffs and the raw data that existed between the deflates in the compressed file. When performing
huff operation, the location of puffs in the
puff stream and deflates in the deflate stream should be passed upon. This is necessary as
huff operation has to know exactly where the locations of both puffs and deflates are. (See the following image)
puffpatch requires deflates and puffs locations in both the source and target streams. These location information is saved using Protobufs in the patch generated by
bsdiff. One requirement for these two operations are that
puffpatch has to be efficient in the client. Client devices have limited memory and CPU bandwidth and it is necessary that each of these operations are performed with the most efficiency available. In order to achieve this efficiency a new operation can be added to
bspatch, that reads and writes into a
puff streams using special interfaces for puffing and huffing on the fly.
Depending on the scheme for storing the Huffman tables, the payload size can change. We discovered that the following scheme does not produce the smallest payload, but it is the most deterministic one. In a deflate stream, Huffman tables for literals/length and distances are themselves Huffman coded. In this format, we also
puff the Huffman tables into the
puff stream instead of completely decompressing them.
There are three tables stored in this structure very similar to the one defined in RFC 1951. A Huffman table can be defined as an array of unsigned integer code length values. Three Puffed Huffman tables appear like the following scheme. The first table (codes) is the Huffman table used to decode the next two Huffman tables. The second Huffman table is used to decode literals and lengths, and the third Huffman table is used to decode distances.
Literals lists are constructed by a “length” value followed by “length” bytes of literals. The Puffer should try to merge adjacent literals lists as much as possible into one literals list in the
puff stream. This Is a length value followed by length bytes of literals (Even if there is only one literal.)
This Is a Length value followed by a Distance value.
Currently Puffin is being used in both Android and Chrome OS and is built differently in each of them. There is also a Makefile build, but it is not comprehensive.
Puffin builds an executable
puffin which can be used to perform diffing and patching algorithms using Puffin format. To get the list of options available run:
It can also be used as a library (currently used by update_engine) that provides different APIs.