doc/02-bitpacking.tex - platform/external/libvorbis - Git at Google

 % -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
 %!TEX root = Vorbis_I_spec.tex
 % $Id$
 \section{Bitpacking Convention} \label{vorbis:spec:bitpacking}

 \subsection{Overview}

 The Vorbis codec uses relatively unstructured raw packets containing
 arbitrary-width binary integer fields.  Logically, these packets are a
 bitstream in which bits are coded one-by-one by the encoder and then
 read one-by-one in the same monotonically increasing order by the
 decoder.  Most current binary storage arrangements group bits into a
 native word size of eight bits (octets), sixteen bits, thirty-two bits
 or, less commonly other fixed word sizes.  The Vorbis bitpacking
 convention specifies the correct mapping of the logical packet
 bitstream into an actual representation in fixed-width words.


 \subsubsection{octets, bytes and words}

 In most contemporary architectures, a 'byte' is synonymous with an
 'octet', that is, eight bits.  This has not always been the case;
 seven, ten, eleven and sixteen bit 'bytes' have been used.  For
 purposes of the bitpacking convention, a byte implies the native,
 smallest integer storage representation offered by a platform.  On
 modern platforms, this is generally assumed to be eight bits (not
 necessarily because of the processor but because of the
 filesystem/memory architecture.  Modern filesystems invariably offer
 bytes as the fundamental atom of storage).  A 'word' is an integer
 size that is a grouped multiple of this smallest size.

 The most ubiquitous architectures today consider a 'byte' to be an
 octet (eight bits) and a word to be a group of two, four or eight
 bytes (16, 32 or 64 bits).  Note however that the Vorbis bitpacking
 convention is still well defined for any native byte size; Vorbis uses
 the native bit-width of a given storage system. This document assumes
 that a byte is one octet for purposes of example.

 \subsubsection{bit order}

 A byte has a well-defined 'least significant' bit (LSb), which is the
 only bit set when the byte is storing the two's complement integer
 value +1.  A byte's 'most significant' bit (MSb) is at the opposite
 end of the byte. Bits in a byte are numbered from zero at the LSb to
 $n$ ($n=7$ in an octet) for the
 MSb.


 \subsubsection{byte order}

 Words are native groupings of multiple bytes.  Several byte orderings
 are possible in a word; the common ones are 3-2-1-0 ('big endian' or
 'most significant byte first' in which the highest-valued byte comes
 first), 0-1-2-3 ('little endian' or 'least significant byte first' in
 which the lowest value byte comes first) and less commonly 3-1-2-0 and
 0-2-1-3 ('mixed endian').

 The Vorbis bitpacking convention specifies storage and bitstream
 manipulation at the byte, not word, level, thus host word ordering is
 of a concern only during optimization when writing high performance
 code that operates on a word of storage at a time rather than by byte.
 Logically, bytes are always coded and decoded in order from byte zero
 through byte $n$.


 \subsubsection{coding bits into byte sequences}

 The Vorbis codec has need to code arbitrary bit-width integers, from
 zero to 32 bits wide, into packets.  These integer fields are not
 aligned to the boundaries of the byte representation; the next field
 is written at the bit position at which the previous field ends.

 The encoder logically packs integers by writing the LSb of a binary
 integer to the logical bitstream first, followed by next least
 significant bit, etc, until the requested number of bits have been
 coded.  When packing the bits into bytes, the encoder begins by
 placing the LSb of the integer to be written into the least
 significant unused bit position of the destination byte, followed by
 the next-least significant bit of the source integer and so on up to
 the requested number of bits.  When all bits of the destination byte
 have been filled, encoding continues by zeroing all bits of the next
 byte and writing the next bit into the bit position 0 of that byte.
 Decoding follows the same process as encoding, but by reading bits
 from the byte stream and reassembling them into integers.


 \subsubsection{signedness}

 The signedness of a specific number resulting from decode is to be
 interpreted by the decoder given decode context.  That is, the three
 bit binary pattern 'b111' can be taken to represent either 'seven' as
 an unsigned integer, or '-1' as a signed, two's complement integer.
 The encoder and decoder are responsible for knowing if fields are to
 be treated as signed or unsigned.


 \subsubsection{coding example}

 Code the 4 bit integer value '12' [b1100] into an empty bytestream.
 Bytestream result:

 \begin{Verbatim}[commandchars=\\\{\}]
               |
               V

         7 6 5 4 3 2 1 0
 byte 0 [0 0 0 0 1 1 0 0]  <-
 byte 1 [               ]
 byte 2 [               ]
 byte 3 [               ]
              ...
 byte n [               ]  bytestream length == 1 byte

 \end{Verbatim}


 Continue by coding the 3 bit integer value '-1' [b111]:

 \begin{Verbatim}[commandchars=\\\{\}]
         |
         V

         7 6 5 4 3 2 1 0
 byte 0 [0 1 1 1 1 1 0 0]  <-
 byte 1 [               ]
 byte 2 [               ]
 byte 3 [               ]
              ...
 byte n [               ]  bytestream length == 1 byte
 \end{Verbatim}


 Continue by coding the 7 bit integer value '17' [b0010001]:

 \begin{Verbatim}[commandchars=\\\{\}]
           |
           V

         7 6 5 4 3 2 1 0
 byte 0 [1 1 1 1 1 1 0 0]
 byte 1 [0 0 0 0 1 0 0 0]  <-
 byte 2 [               ]
 byte 3 [               ]
              ...
 byte n [               ]  bytestream length == 2 bytes
                           bit cursor == 6
 \end{Verbatim}


 Continue by coding the 13 bit integer value '6969' [b110 11001110 01]:

 \begin{Verbatim}[commandchars=\\\{\}]
                 |
                 V

         7 6 5 4 3 2 1 0
 byte 0 [1 1 1 1 1 1 0 0]
 byte 1 [0 1 0 0 1 0 0 0]
 byte 2 [1 1 0 0 1 1 1 0]
 byte 3 [0 0 0 0 0 1 1 0]  <-
              ...
 byte n [               ]  bytestream length == 4 bytes

 \end{Verbatim}


 \subsubsection{decoding example}

 Reading from the beginning of the bytestream encoded in the above example:

 \begin{Verbatim}[commandchars=\\\{\}]
                       |
                       V

         7 6 5 4 3 2 1 0
 byte 0 [1 1 1 1 1 1 0 0]  <-
 byte 1 [0 1 0 0 1 0 0 0]
 byte 2 [1 1 0 0 1 1 1 0]
 byte 3 [0 0 0 0 0 1 1 0]  bytestream length == 4 bytes

 \end{Verbatim}


 We read two, two-bit integer fields, resulting in the returned numbers
 'b00' and 'b11'.  Two things are worth noting here:

 \begin{itemize}
 \item Although these four bits were originally written as a single
 four-bit integer, reading some other combination of bit-widths from the
 bitstream is well defined.  There are no artificial alignment
 boundaries maintained in the bitstream.

 \item The second value is the
 two-bit-wide integer 'b11'.  This value may be interpreted either as
 the unsigned value '3', or the signed value '-1'.  Signedness is
 dependent on decode context.
 \end{itemize}


 \subsubsection{end-of-packet alignment}

 The typical use of bitpacking is to produce many independent
 byte-aligned packets which are embedded into a larger byte-aligned
 container structure, such as an Ogg transport bitstream.  Externally,
 each bytestream (encoded bitstream) must begin and end on a byte
 boundary.  Often, the encoded bitstream is not an integer number of
 bytes, and so there is unused (uncoded) space in the last byte of a
 packet.

 Unused space in the last byte of a bytestream is always zeroed during
 the coding process.  Thus, should this unused space be read, it will
 return binary zeroes.

 Attempting to read past the end of an encoded packet results in an
 'end-of-packet' condition.  End-of-packet is not to be considered an
 error; it is merely a state indicating that there is insufficient
 remaining data to fulfill the desired read size.  Vorbis uses truncated
 packets as a normal mode of operation, and as such, decoders must
 handle reading past the end of a packet as a typical mode of
 operation. Any further read operations after an 'end-of-packet'
 condition shall also return 'end-of-packet'.


 \subsubsection{reading zero bits}

 Reading a zero-bit-wide integer returns the value '0' and does not
 increment the stream cursor.  Reading to the end of the packet (but
 not past, such that an 'end-of-packet' condition has not triggered)
 and then reading a zero bit integer shall succeed, returning 0, and
 not trigger an end-of-packet condition.  Reading a zero-bit-wide
 integer after a previous read sets 'end-of-packet' shall also fail
 with 'end-of-packet'.
	% -- mode: latex; TeX-master: "Vorbis_I_spec"; --
	%!TEX root = Vorbis_I_spec.tex
	% $Id$
	\section{Bitpacking Convention} \label{vorbis:spec:bitpacking}

	\subsection{Overview}

	The Vorbis codec uses relatively unstructured raw packets containing
	arbitrary-width binary integer fields. Logically, these packets are a
	bitstream in which bits are coded one-by-one by the encoder and then
	read one-by-one in the same monotonically increasing order by the
	decoder. Most current binary storage arrangements group bits into a
	native word size of eight bits (octets), sixteen bits, thirty-two bits
	or, less commonly other fixed word sizes. The Vorbis bitpacking
	convention specifies the correct mapping of the logical packet
	bitstream into an actual representation in fixed-width words.


	\subsubsection{octets, bytes and words}

	In most contemporary architectures, a 'byte' is synonymous with an
	'octet', that is, eight bits. This has not always been the case;
	seven, ten, eleven and sixteen bit 'bytes' have been used. For
	purposes of the bitpacking convention, a byte implies the native,
	smallest integer storage representation offered by a platform. On
	modern platforms, this is generally assumed to be eight bits (not
	necessarily because of the processor but because of the
	filesystem/memory architecture. Modern filesystems invariably offer
	bytes as the fundamental atom of storage). A 'word' is an integer
	size that is a grouped multiple of this smallest size.

	The most ubiquitous architectures today consider a 'byte' to be an
	octet (eight bits) and a word to be a group of two, four or eight
	bytes (16, 32 or 64 bits). Note however that the Vorbis bitpacking
	convention is still well defined for any native byte size; Vorbis uses
	the native bit-width of a given storage system. This document assumes
	that a byte is one octet for purposes of example.

	\subsubsection{bit order}

	A byte has a well-defined 'least significant' bit (LSb), which is the
	only bit set when the byte is storing the two's complement integer
	value +1. A byte's 'most significant' bit (MSb) is at the opposite
	end of the byte. Bits in a byte are numbered from zero at the LSb to
	$n$ ($n=7$ in an octet) for the
	MSb.



	\subsubsection{byte order}

	Words are native groupings of multiple bytes. Several byte orderings
	are possible in a word; the common ones are 3-2-1-0 ('big endian' or
	'most significant byte first' in which the highest-valued byte comes
	first), 0-1-2-3 ('little endian' or 'least significant byte first' in
	which the lowest value byte comes first) and less commonly 3-1-2-0 and
	0-2-1-3 ('mixed endian').

	The Vorbis bitpacking convention specifies storage and bitstream
	manipulation at the byte, not word, level, thus host word ordering is
	of a concern only during optimization when writing high performance
	code that operates on a word of storage at a time rather than by byte.
	Logically, bytes are always coded and decoded in order from byte zero
	through byte $n$.



	\subsubsection{coding bits into byte sequences}

	The Vorbis codec has need to code arbitrary bit-width integers, from
	zero to 32 bits wide, into packets. These integer fields are not
	aligned to the boundaries of the byte representation; the next field
	is written at the bit position at which the previous field ends.

	The encoder logically packs integers by writing the LSb of a binary
	integer to the logical bitstream first, followed by next least
	significant bit, etc, until the requested number of bits have been
	coded. When packing the bits into bytes, the encoder begins by
	placing the LSb of the integer to be written into the least
	significant unused bit position of the destination byte, followed by
	the next-least significant bit of the source integer and so on up to
	the requested number of bits. When all bits of the destination byte
	have been filled, encoding continues by zeroing all bits of the next
	byte and writing the next bit into the bit position 0 of that byte.
	Decoding follows the same process as encoding, but by reading bits
	from the byte stream and reassembling them into integers.



	\subsubsection{signedness}

	The signedness of a specific number resulting from decode is to be
	interpreted by the decoder given decode context. That is, the three
	bit binary pattern 'b111' can be taken to represent either 'seven' as
	an unsigned integer, or '-1' as a signed, two's complement integer.
	The encoder and decoder are responsible for knowing if fields are to
	be treated as signed or unsigned.



	\subsubsection{coding example}

	Code the 4 bit integer value '12' [b1100] into an empty bytestream.
	Bytestream result:

	\begin{Verbatim}[commandchars=\\\{\}]
	\|
	V

	7 6 5 4 3 2 1 0
	byte 0 [0 0 0 0 1 1 0 0] <-
	byte 1 [ ]
	byte 2 [ ]
	byte 3 [ ]
	...
	byte n [ ] bytestream length == 1 byte

	\end{Verbatim}


	Continue by coding the 3 bit integer value '-1' [b111]:

	\begin{Verbatim}[commandchars=\\\{\}]
	\|
	V

	7 6 5 4 3 2 1 0
	byte 0 [0 1 1 1 1 1 0 0] <-
	byte 1 [ ]
	byte 2 [ ]
	byte 3 [ ]
	...
	byte n [ ] bytestream length == 1 byte
	\end{Verbatim}


	Continue by coding the 7 bit integer value '17' [b0010001]:

	\begin{Verbatim}[commandchars=\\\{\}]
	\|
	V

	7 6 5 4 3 2 1 0
	byte 0 [1 1 1 1 1 1 0 0]
	byte 1 [0 0 0 0 1 0 0 0] <-
	byte 2 [ ]
	byte 3 [ ]
	...
	byte n [ ] bytestream length == 2 bytes
	bit cursor == 6
	\end{Verbatim}


	Continue by coding the 13 bit integer value '6969' [b110 11001110 01]:

	\begin{Verbatim}[commandchars=\\\{\}]
	\|
	V

	7 6 5 4 3 2 1 0
	byte 0 [1 1 1 1 1 1 0 0]
	byte 1 [0 1 0 0 1 0 0 0]
	byte 2 [1 1 0 0 1 1 1 0]
	byte 3 [0 0 0 0 0 1 1 0] <-
	...
	byte n [ ] bytestream length == 4 bytes

	\end{Verbatim}




	\subsubsection{decoding example}

	Reading from the beginning of the bytestream encoded in the above example:

	\begin{Verbatim}[commandchars=\\\{\}]
	\|
	V

	7 6 5 4 3 2 1 0
	byte 0 [1 1 1 1 1 1 0 0] <-
	byte 1 [0 1 0 0 1 0 0 0]
	byte 2 [1 1 0 0 1 1 1 0]
	byte 3 [0 0 0 0 0 1 1 0] bytestream length == 4 bytes

	\end{Verbatim}


	We read two, two-bit integer fields, resulting in the returned numbers
	'b00' and 'b11'. Two things are worth noting here:

	\begin{itemize}
	\item Although these four bits were originally written as a single
	four-bit integer, reading some other combination of bit-widths from the
	bitstream is well defined. There are no artificial alignment
	boundaries maintained in the bitstream.

	\item The second value is the
	two-bit-wide integer 'b11'. This value may be interpreted either as
	the unsigned value '3', or the signed value '-1'. Signedness is
	dependent on decode context.
	\end{itemize}




	\subsubsection{end-of-packet alignment}

	The typical use of bitpacking is to produce many independent
	byte-aligned packets which are embedded into a larger byte-aligned
	container structure, such as an Ogg transport bitstream. Externally,
	each bytestream (encoded bitstream) must begin and end on a byte
	boundary. Often, the encoded bitstream is not an integer number of
	bytes, and so there is unused (uncoded) space in the last byte of a
	packet.

	Unused space in the last byte of a bytestream is always zeroed during
	the coding process. Thus, should this unused space be read, it will
	return binary zeroes.

	Attempting to read past the end of an encoded packet results in an
	'end-of-packet' condition. End-of-packet is not to be considered an
	error; it is merely a state indicating that there is insufficient
	remaining data to fulfill the desired read size. Vorbis uses truncated
	packets as a normal mode of operation, and as such, decoders must
	handle reading past the end of a packet as a typical mode of
	operation. Any further read operations after an 'end-of-packet'
	condition shall also return 'end-of-packet'.



	\subsubsection{reading zero bits}

	Reading a zero-bit-wide integer returns the value '0' and does not
	increment the stream cursor. Reading to the end of the packet (but
	not past, such that an 'end-of-packet' condition has not triggered)
	and then reading a zero bit integer shall succeed, returning 0, and
	not trigger an end-of-packet condition. Reading a zero-bit-wide
	integer after a previous read sets 'end-of-packet' shall also fail
	with 'end-of-packet'.