doc/stereo.html - platform/external/libvorbis - Git at Google

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html>
 <head>

 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
 <title>Ogg Vorbis Documentation</title>

 <style type="text/css">
 body {
   margin: 0 18px 0 18px;
   padding-bottom: 30px;
   font-family: Verdana, Arial, Helvetica, sans-serif;
   color: #333333;
   font-size: .8em;
 }

 a {
   color: #3366cc;
 }

 img {
   border: 0;
 }

 #xiphlogo {
   margin: 30px 0 16px 0;
 }

 #content p {
   line-height: 1.4;
 }

 h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
   font-weight: bold;
   color: #ff9900;
   margin: 1.3em 0 8px 0;
 }

 h1 {
   font-size: 1.3em;
 }

 h2 {
   font-size: 1.2em;
 }

 h3 {
   font-size: 1.1em;
 }

 li {
   line-height: 1.4;
 }

 #copyright {
   margin-top: 30px;
   line-height: 1.5em;
   text-align: center;
   font-size: .8em;
   color: #888888;
   clear: both;
 }
 </style>

 </head>

 <body>

 <div id="xiphlogo">
   <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
 </div>

 <h1>Ogg Vorbis stereo-specific channel coupling discussion</h1>

 <h2>Abstract</h2>

 <p>The Vorbis audio CODEC provides a channel coupling
 mechanisms designed to reduce effective bitrate by both eliminating
 interchannel redundancy and eliminating stereo image information
 labeled inaudible or undesirable according to spatial psychoacoustic
 models. This document describes both the mechanical coupling
 mechanisms available within the Vorbis specification, as well as the
 specific stereo coupling models used by the reference
 <tt>libvorbis</tt> codec provided by xiph.org.</p>

 <h2>Mechanisms</h2>

 <p>In encoder release beta 4 and earlier, Vorbis supported multiple
 channel encoding, but the channels were encoded entirely separately
 with no cross-analysis or redundancy elimination between channels.
 This multichannel strategy is very similar to the mp3's <em>dual
 stereo</em> mode and Vorbis uses the same name for its analogous
 uncoupled multichannel modes.</p>

 <p>However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and
 later implement a coupled channel strategy. Vorbis has two specific
 mechanisms that may be used alone or in conjunction to implement
 channel coupling. The first is <em>channel interleaving</em> via
 residue backend type 2, and the second is <em>square polar
 mapping</em>. These two general mechanisms are particularly well
 suited to coupling due to the structure of Vorbis encoding, as we'll
 explore below, and using both we can implement both totally
 <em>lossless stereo image coupling</em> [bit-for-bit decode-identical
 to uncoupled modes], as well as various lossy models that seek to
 eliminate inaudible or unimportant aspects of the stereo image in
 order to enhance bitrate. The exact coupling implementation is
 generalized to allow the encoder a great deal of flexibility in
 implementation of a stereo or surround model without requiring any
 significant complexity increase over the combinatorially simpler
 mid/side joint stereo of mp3 and other current audio codecs.</p>

 <p>A particular Vorbis bitstream may apply channel coupling directly to
 more than a pair of channels; polar mapping is hierarchical such that
 polar coupling may be extrapolated to an arbitrary number of channels
 and is not restricted to only stereo, quadraphonics, ambisonics or 5.1
 surround. However, the scope of this document restricts itself to the
 stereo coupling case.</p>

 <a name="sqpm"></a>
 <h3>Square Polar Mapping</h3>

 <h4>maximal correlation</h4>

 <p>Recall that the basic structure of a a Vorbis I stream first generates
 from input audio a spectral 'floor' function that serves as an
 MDCT-domain whitening filter. This floor is meant to represent the
 rough envelope of the frequency spectrum, using whatever metric the
 encoder cares to define. This floor is subtracted from the log
 frequency spectrum, effectively normalizing the spectrum by frequency.
 Each input channel is associated with a unique floor function.</p>

 <p>The basic idea behind any stereo coupling is that the left and right
 channels usually correlate. This correlation is even stronger if one
 first accounts for energy differences in any given frequency band
 across left and right; think for example of individual instruments
 mixed into different portions of the stereo image, or a stereo
 recording with a dominant feature not perfectly in the center. The
 floor functions, each specific to a channel, provide the perfect means
 of normalizing left and right energies across the spectrum to maximize
 correlation before coupling. This feature of the Vorbis format is not
 a convenient accident.</p>

 <p>Because we strive to maximally correlate the left and right channels
 and generally succeed in doing so, left and right residue is typically
 nearly identical. We could use channel interleaving (discussed below)
 alone to efficiently remove the redundancy between the left and right
 channels as a side effect of entropy encoding, but a polar
 representation gives benefits when left/right correlation is
 strong.</p>

 <h4>point and diffuse imaging</h4>

 <p>The first advantage of a polar representation is that it effectively
 separates the spatial audio information into a 'point image'
 (magnitude) at a given frequency and located somewhere in the sound
 field, and a 'diffuse image' (angle) that fills a large amount of
 space simultaneously. Even if we preserve only the magnitude (point)
 data, a detailed and carefully chosen floor function in each channel
 provides us with a free, fine-grained, frequency relative intensity
 stereo*. Angle information represents diffuse sound fields, such as
 reverberation that fills the entire space simultaneously.</p>

 <p>*<em>Because the Vorbis model supports a number of different possible
 stereo models and these models may be mixed, we do not use the term
 'intensity stereo' talking about Vorbis; instead we use the terms
 'point stereo', 'phase stereo' and subcategories of each.</em></p>

 <p>The majority of a stereo image is representable by polar magnitude
 alone, as strong sounds tend to be produced at near-point sources;
 even non-diffuse, fast, sharp echoes track very accurately using
 magnitude representation almost alone (for those experimenting with
 Vorbis tuning, this strategy works much better with the precise,
 piecewise control of floor 1; the continuous approximation of floor 0
 results in unstable imaging). Reverberation and diffuse sounds tend
 to contain less energy and be psychoacoustically dominated by the
 point sources embedded in them. Thus, we again tend to concentrate
 more represented energy into a predictably smaller number of numbers.
 Separating representation of point and diffuse imaging also allows us
 to model and manipulate point and diffuse qualities separately.</p>

 <h4>controlling bit leakage and symbol crosstalk</h4>

 <p>Because polar
 representation concentrates represented energy into fewer large
 values, we reduce bit 'leakage' during cascading (multistage VQ
 encoding) as a secondary benefit. A single large, monolithic VQ
 codebook is more efficient than a cascaded book due to entropy
 'crosstalk' among symbols between different stages of a multistage cascade.
 Polar representation is a way of further concentrating entropy into
 predictable locations so that codebook design can take steps to
 improve multistage codebook efficiency. It also allows us to cascade
 various elements of the stereo image independently.</p>

 <h4>eliminating trigonometry and rounding</h4>

 <p>Rounding and computational complexity are potential problems with a
 polar representation. As our encoding process involves quantization,
 mixing a polar representation and quantization makes it potentially
 impossible, depending on implementation, to construct a coupled stereo
 mechanism that results in bit-identical decompressed output compared
 to an uncoupled encoding should the encoder desire it.</p>

 <p>Vorbis uses a mapping that preserves the most useful qualities of
 polar representation, relies only on addition/subtraction (during
 decode; high quality encoding still requires some trig), and makes it
 trivial before or after quantization to represent an angle/magnitude
 through a one-to-one mapping from possible left/right value
 permutations. We do this by basing our polar representation on the
 unit square rather than the unit-circle.</p>

 <p>Given a magnitude and angle, we recover left and right using the
 following function (note that A/B may be left/right or right/left
 depending on the coupling definition used by the encoder):</p>

 <pre>
       if(magnitude>0)
         if(angle>0){
           A=magnitude;
           B=magnitude-angle;
         }else{
           B=magnitude;
           A=magnitude+angle;
         }
       else
         if(angle>0){
           A=magnitude;
           B=magnitude+angle;
         }else{
           B=magnitude;
           A=magnitude-angle;
         }
     }
 </pre>

 <p>The function is antisymmetric for positive and negative magnitudes in
 order to eliminate a redundant value when quantizing. For example, if
 we're quantizing to integer values, we can visualize a magnitude of 5
 and an angle of -2 as follows:</p>

 <p><img src="squarepolar.png" alt="square polar"/></p>

 <p>This representation loses or replicates no values; if the range of A
 and B are integral -5 through 5, the number of possible Cartesian
 permutations is 121. Represented in square polar notation, the
 possible values are:</p>

 <pre>
  0, 0

 -1,-2  -1,-1  -1, 0  -1, 1

  1,-2   1,-1   1, 0   1, 1

 -2,-4  -2,-3  -2,-2  -2,-1  -2, 0  -2, 1  -2, 2  -2, 3

  2,-4   2,-3   ... following the pattern ...

  ...   5, 1   5, 2   5, 3   5, 4   5, 5   5, 6   5, 7   5, 8   5, 9

 </pre>

 <p>...for a grand total of 121 possible values, the same number as in
 Cartesian representation (note that, for example, <tt>5,-10</tt> is
 the same as <tt>-5,10</tt>, so there's no reason to represent
 both. 2,10 cannot happen, and there's no reason to account for it.)
 It's also obvious that this mapping is exactly reversible.</p>

 <h3>Channel interleaving</h3>

 <p>We can remap and A/B vector using polar mapping into a magnitude/angle
 vector, and it's clear that, in general, this concentrates energy in
 the magnitude vector and reduces the amount of information to encode
 in the angle vector. Encoding these vectors independently with
 residue backend #0 or residue backend #1 will result in bitrate
 savings. However, there are still implicit correlations between the
 magnitude and angle vectors. The most obvious is that the amplitude
 of the angle is bounded by its corresponding magnitude value.</p>

 <p>Entropy coding the results, then, further benefits from the entropy
 model being able to compress magnitude and angle simultaneously. For
 this reason, Vorbis implements residue backend #2 which pre-interleaves
 a number of input vectors (in the stereo case, two, A and B) into a
 single output vector (with the elements in the order of
 A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus
 each vector to be coded by the vector quantization backend consists of
 matching magnitude and angle values.</p>

 <p>The astute reader, at this point, will notice that in the theoretical
 case in which we can use monolithic codebooks of arbitrarily large
 size, we can directly interleave and encode left and right without
 polar mapping; in fact, the polar mapping does not appear to lend any
 benefit whatsoever to the efficiency of the entropy coding. In fact,
 it is perfectly possible and reasonable to build a Vorbis encoder that
 dispenses with polar mapping entirely and merely interleaves the
 channel. Libvorbis based encoders may configure such an encoding and
 it will work as intended.</p>

 <p>However, when we leave the ideal/theoretical domain, we notice that
 polar mapping does give additional practical benefits, as discussed in
 the above section on polar mapping and summarized again here:</p>

 <ul>
 <li>Polar mapping aids in controlling entropy 'leakage' between stages
 of a cascaded codebook.</li>
 <li>Polar mapping separates the stereo image
 into point and diffuse components which may be analyzed and handled
 differently.</li>
 </ul>

 <h2>Stereo Models</h2>

 <h3>Dual Stereo</h3>

 <p>Dual stereo refers to stereo encoding where the channels are entirely
 separate; they are analyzed and encoded as entirely distinct entities.
 This terminology is familiar from mp3.</p>

 <h3>Lossless Stereo</h3>

 <p>Using polar mapping and/or channel interleaving, it's possible to
 couple Vorbis channels losslessly, that is, construct a stereo
 coupling encoding that both saves space but also decodes
 bit-identically to dual stereo. OggEnc 1.0 and later uses this
 mode in all high-bitrate encoding.</p>

 <p>Overall, this stereo mode is overkill; however, it offers a safe
 alternative to users concerned about the slightest possible
 degradation to the stereo image or archival quality audio.</p>

 <h3>Phase Stereo</h3>

 <p>Phase stereo is the least aggressive means of gracefully dropping
 resolution from the stereo image; it affects only diffuse imaging.</p>

 <p>It's often quoted that the human ear is deaf to signal phase above
 about 4kHz; this is nearly true and a passable rule of thumb, but it
 can be demonstrated that even an average user can tell the difference
 between high frequency in-phase and out-of-phase noise. Obviously
 then, the statement is not entirely true. However, it's also the case
 that one must resort to nearly such an extreme demonstration before
 finding the counterexample.</p>

 <p>'Phase stereo' is simply a more aggressive quantization of the polar
 angle vector; above 4kHz it's generally quite safe to quantize noise
 and noisy elements to only a handful of allowed phases, or to thin the
 phase with respect to the magnitude. The phases of high amplitude
 pure tones may or may not be preserved more carefully (they are
 relatively rare and L/R tend to be in phase, so there is generally
 little reason not to spend a few more bits on them)</p>

 <h4>example: eight phase stereo</h4>

 <p>Vorbis may implement phase stereo coupling by preserving the entirety
 of the magnitude vector (essential to fine amplitude and energy
 resolution overall) and quantizing the angle vector to one of only
 four possible values. Given that the magnitude vector may be positive
 or negative, this results in left and right phase having eight
 possible permutation, thus 'eight phase stereo':</p>

 <p><img src="eightphase.png" alt="eight phase"/></p>

 <p>Left and right may be in phase (positive or negative), the most common
 case by far, or out of phase by 90 or 180 degrees.</p>

 <h4>example: four phase stereo</h4>

 <p>Similarly, four phase stereo takes the quantization one step further;
 it allows only in-phase and 180 degree out-out-phase signals:</p>

 <p><img src="fourphase.png" alt="four phase"/></p>

 <h3>example: point stereo</h3>

 <p>Point stereo eliminates the possibility of out-of-phase signal
 entirely. Any diffuse quality to a sound source tends to collapse
 inward to a point somewhere within the stereo image. A practical
 example would be balanced reverberations within a large, live space;
 normally the sound is diffuse and soft, giving a sonic impression of
 volume. In point-stereo, the reverberations would still exist, but
 sound fairly firmly centered within the image (assuming the
 reverberation was centered overall; if the reverberation is stronger
 to the left, then the point of localization in point stereo would be
 to the left). This effect is most noticeable at low and mid
 frequencies and using headphones (which grant perfect stereo
 separation). Point stereo is is a graceful but generally easy to
 detect degradation to the sound quality and is thus used in frequency
 ranges where it is least noticeable.</p>

 <h3>Mixed Stereo</h3>

 <p>Mixed stereo is the simultaneous use of more than one of the above
 stereo encoding models, generally using more aggressive modes in
 higher frequencies, lower amplitudes or 'nearly' in-phase sound.</p>

 <p>It is also the case that near-DC frequencies should be encoded using
 lossless coupling to avoid frame blocking artifacts.</p>

 <h3>Vorbis Stereo Modes</h3>

 <p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
 constructed out of lossless and point stereo. Phase stereo was used
 in the rc2 encoder, but is not currently used for simplicity's sake. It
 will likely be re-added to the stereo model in the future.</p>

 <div id="copyright">
   The Xiph Fish Logo is a
   trademark (&trade;) of Xiph.Org.<br/>

   These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
 </div>

 </body>
 </html>
	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
	<html>
	<head>

	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
	<title>Ogg Vorbis Documentation</title>

	<style type="text/css">
	body {
	margin: 0 18px 0 18px;
	padding-bottom: 30px;
	font-family: Verdana, Arial, Helvetica, sans-serif;
	color: #333333;
	font-size: .8em;
	}

	a {
	color: #3366cc;
	}

	img {
	border: 0;
	}

	#xiphlogo {
	margin: 30px 0 16px 0;
	}

	#content p {
	line-height: 1.4;
	}

	h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
	font-weight: bold;
	color: #ff9900;
	margin: 1.3em 0 8px 0;
	}

	h1 {
	font-size: 1.3em;
	}

	h2 {
	font-size: 1.2em;
	}

	h3 {
	font-size: 1.1em;
	}

	li {
	line-height: 1.4;
	}

	#copyright {
	margin-top: 30px;
	line-height: 1.5em;
	text-align: center;
	font-size: .8em;
	color: #888888;
	clear: both;
	}
	</style>

	</head>

	<body>

	<div id="xiphlogo">
	<a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
	</div>

	<h1>Ogg Vorbis stereo-specific channel coupling discussion</h1>

	<h2>Abstract</h2>

	<p>The Vorbis audio CODEC provides a channel coupling
	mechanisms designed to reduce effective bitrate by both eliminating
	interchannel redundancy and eliminating stereo image information
	labeled inaudible or undesirable according to spatial psychoacoustic
	models. This document describes both the mechanical coupling
	mechanisms available within the Vorbis specification, as well as the
	specific stereo coupling models used by the reference
	<tt>libvorbis</tt> codec provided by xiph.org.</p>

	<h2>Mechanisms</h2>

	<p>In encoder release beta 4 and earlier, Vorbis supported multiple
	channel encoding, but the channels were encoded entirely separately
	with no cross-analysis or redundancy elimination between channels.
	This multichannel strategy is very similar to the mp3's <em>dual
	stereo</em> mode and Vorbis uses the same name for its analogous
	uncoupled multichannel modes.</p>

	<p>However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and
	later implement a coupled channel strategy. Vorbis has two specific
	mechanisms that may be used alone or in conjunction to implement
	channel coupling. The first is <em>channel interleaving</em> via
	residue backend type 2, and the second is <em>square polar
	mapping</em>. These two general mechanisms are particularly well
	suited to coupling due to the structure of Vorbis encoding, as we'll
	explore below, and using both we can implement both totally
	<em>lossless stereo image coupling</em> [bit-for-bit decode-identical
	to uncoupled modes], as well as various lossy models that seek to
	eliminate inaudible or unimportant aspects of the stereo image in
	order to enhance bitrate. The exact coupling implementation is
	generalized to allow the encoder a great deal of flexibility in
	implementation of a stereo or surround model without requiring any
	significant complexity increase over the combinatorially simpler
	mid/side joint stereo of mp3 and other current audio codecs.</p>

	<p>A particular Vorbis bitstream may apply channel coupling directly to
	more than a pair of channels; polar mapping is hierarchical such that
	polar coupling may be extrapolated to an arbitrary number of channels
	and is not restricted to only stereo, quadraphonics, ambisonics or 5.1
	surround. However, the scope of this document restricts itself to the
	stereo coupling case.</p>

	<a name="sqpm"></a>
	<h3>Square Polar Mapping</h3>

	<h4>maximal correlation</h4>

	<p>Recall that the basic structure of a a Vorbis I stream first generates
	from input audio a spectral 'floor' function that serves as an
	MDCT-domain whitening filter. This floor is meant to represent the
	rough envelope of the frequency spectrum, using whatever metric the
	encoder cares to define. This floor is subtracted from the log
	frequency spectrum, effectively normalizing the spectrum by frequency.
	Each input channel is associated with a unique floor function.</p>

	<p>The basic idea behind any stereo coupling is that the left and right
	channels usually correlate. This correlation is even stronger if one
	first accounts for energy differences in any given frequency band
	across left and right; think for example of individual instruments
	mixed into different portions of the stereo image, or a stereo
	recording with a dominant feature not perfectly in the center. The
	floor functions, each specific to a channel, provide the perfect means
	of normalizing left and right energies across the spectrum to maximize
	correlation before coupling. This feature of the Vorbis format is not
	a convenient accident.</p>

	<p>Because we strive to maximally correlate the left and right channels
	and generally succeed in doing so, left and right residue is typically
	nearly identical. We could use channel interleaving (discussed below)
	alone to efficiently remove the redundancy between the left and right
	channels as a side effect of entropy encoding, but a polar
	representation gives benefits when left/right correlation is
	strong.</p>

	<h4>point and diffuse imaging</h4>

	<p>The first advantage of a polar representation is that it effectively
	separates the spatial audio information into a 'point image'
	(magnitude) at a given frequency and located somewhere in the sound
	field, and a 'diffuse image' (angle) that fills a large amount of
	space simultaneously. Even if we preserve only the magnitude (point)
	data, a detailed and carefully chosen floor function in each channel
	provides us with a free, fine-grained, frequency relative intensity
	stereo*. Angle information represents diffuse sound fields, such as
	reverberation that fills the entire space simultaneously.</p>

	<p>*<em>Because the Vorbis model supports a number of different possible
	stereo models and these models may be mixed, we do not use the term
	'intensity stereo' talking about Vorbis; instead we use the terms
	'point stereo', 'phase stereo' and subcategories of each.</em></p>

	<p>The majority of a stereo image is representable by polar magnitude
	alone, as strong sounds tend to be produced at near-point sources;
	even non-diffuse, fast, sharp echoes track very accurately using
	magnitude representation almost alone (for those experimenting with
	Vorbis tuning, this strategy works much better with the precise,
	piecewise control of floor 1; the continuous approximation of floor 0
	results in unstable imaging). Reverberation and diffuse sounds tend
	to contain less energy and be psychoacoustically dominated by the
	point sources embedded in them. Thus, we again tend to concentrate
	more represented energy into a predictably smaller number of numbers.
	Separating representation of point and diffuse imaging also allows us
	to model and manipulate point and diffuse qualities separately.</p>

	<h4>controlling bit leakage and symbol crosstalk</h4>

	<p>Because polar
	representation concentrates represented energy into fewer large
	values, we reduce bit 'leakage' during cascading (multistage VQ
	encoding) as a secondary benefit. A single large, monolithic VQ
	codebook is more efficient than a cascaded book due to entropy
	'crosstalk' among symbols between different stages of a multistage cascade.
	Polar representation is a way of further concentrating entropy into
	predictable locations so that codebook design can take steps to
	improve multistage codebook efficiency. It also allows us to cascade
	various elements of the stereo image independently.</p>

	<h4>eliminating trigonometry and rounding</h4>

	<p>Rounding and computational complexity are potential problems with a
	polar representation. As our encoding process involves quantization,
	mixing a polar representation and quantization makes it potentially
	impossible, depending on implementation, to construct a coupled stereo
	mechanism that results in bit-identical decompressed output compared
	to an uncoupled encoding should the encoder desire it.</p>

	<p>Vorbis uses a mapping that preserves the most useful qualities of
	polar representation, relies only on addition/subtraction (during
	decode; high quality encoding still requires some trig), and makes it
	trivial before or after quantization to represent an angle/magnitude
	through a one-to-one mapping from possible left/right value
	permutations. We do this by basing our polar representation on the
	unit square rather than the unit-circle.</p>

	<p>Given a magnitude and angle, we recover left and right using the
	following function (note that A/B may be left/right or right/left
	depending on the coupling definition used by the encoder):</p>

	<pre>
	if(magnitude>0)
	if(angle>0){
	A=magnitude;
	B=magnitude-angle;
	}else{
	B=magnitude;
	A=magnitude+angle;
	}
	else
	if(angle>0){
	A=magnitude;
	B=magnitude+angle;
	}else{
	B=magnitude;
	A=magnitude-angle;
	}
	}
	</pre>

	<p>The function is antisymmetric for positive and negative magnitudes in
	order to eliminate a redundant value when quantizing. For example, if
	we're quantizing to integer values, we can visualize a magnitude of 5
	and an angle of -2 as follows:</p>

	<p><img src="squarepolar.png" alt="square polar"/></p>

	<p>This representation loses or replicates no values; if the range of A
	and B are integral -5 through 5, the number of possible Cartesian
	permutations is 121. Represented in square polar notation, the
	possible values are:</p>

	<pre>
	0, 0

	-1,-2 -1,-1 -1, 0 -1, 1

	1,-2 1,-1 1, 0 1, 1

	-2,-4 -2,-3 -2,-2 -2,-1 -2, 0 -2, 1 -2, 2 -2, 3

	2,-4 2,-3 ... following the pattern ...

	... 5, 1 5, 2 5, 3 5, 4 5, 5 5, 6 5, 7 5, 8 5, 9

	</pre>

	<p>...for a grand total of 121 possible values, the same number as in
	Cartesian representation (note that, for example, <tt>5,-10</tt> is
	the same as <tt>-5,10</tt>, so there's no reason to represent
	both. 2,10 cannot happen, and there's no reason to account for it.)
	It's also obvious that this mapping is exactly reversible.</p>

	<h3>Channel interleaving</h3>

	<p>We can remap and A/B vector using polar mapping into a magnitude/angle
	vector, and it's clear that, in general, this concentrates energy in
	the magnitude vector and reduces the amount of information to encode
	in the angle vector. Encoding these vectors independently with
	residue backend #0 or residue backend #1 will result in bitrate
	savings. However, there are still implicit correlations between the
	magnitude and angle vectors. The most obvious is that the amplitude
	of the angle is bounded by its corresponding magnitude value.</p>

	<p>Entropy coding the results, then, further benefits from the entropy
	model being able to compress magnitude and angle simultaneously. For
	this reason, Vorbis implements residue backend #2 which pre-interleaves
	a number of input vectors (in the stereo case, two, A and B) into a
	single output vector (with the elements in the order of
	A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus
	each vector to be coded by the vector quantization backend consists of
	matching magnitude and angle values.</p>

	<p>The astute reader, at this point, will notice that in the theoretical
	case in which we can use monolithic codebooks of arbitrarily large
	size, we can directly interleave and encode left and right without
	polar mapping; in fact, the polar mapping does not appear to lend any
	benefit whatsoever to the efficiency of the entropy coding. In fact,
	it is perfectly possible and reasonable to build a Vorbis encoder that
	dispenses with polar mapping entirely and merely interleaves the
	channel. Libvorbis based encoders may configure such an encoding and
	it will work as intended.</p>

	<p>However, when we leave the ideal/theoretical domain, we notice that
	polar mapping does give additional practical benefits, as discussed in
	the above section on polar mapping and summarized again here:</p>

	<ul>
	<li>Polar mapping aids in controlling entropy 'leakage' between stages
	of a cascaded codebook.</li>
	<li>Polar mapping separates the stereo image
	into point and diffuse components which may be analyzed and handled
	differently.</li>
	</ul>

	<h2>Stereo Models</h2>

	<h3>Dual Stereo</h3>

	<p>Dual stereo refers to stereo encoding where the channels are entirely
	separate; they are analyzed and encoded as entirely distinct entities.
	This terminology is familiar from mp3.</p>

	<h3>Lossless Stereo</h3>

	<p>Using polar mapping and/or channel interleaving, it's possible to
	couple Vorbis channels losslessly, that is, construct a stereo
	coupling encoding that both saves space but also decodes
	bit-identically to dual stereo. OggEnc 1.0 and later uses this
	mode in all high-bitrate encoding.</p>

	<p>Overall, this stereo mode is overkill; however, it offers a safe
	alternative to users concerned about the slightest possible
	degradation to the stereo image or archival quality audio.</p>

	<h3>Phase Stereo</h3>

	<p>Phase stereo is the least aggressive means of gracefully dropping
	resolution from the stereo image; it affects only diffuse imaging.</p>

	<p>It's often quoted that the human ear is deaf to signal phase above
	about 4kHz; this is nearly true and a passable rule of thumb, but it
	can be demonstrated that even an average user can tell the difference
	between high frequency in-phase and out-of-phase noise. Obviously
	then, the statement is not entirely true. However, it's also the case
	that one must resort to nearly such an extreme demonstration before
	finding the counterexample.</p>

	<p>'Phase stereo' is simply a more aggressive quantization of the polar
	angle vector; above 4kHz it's generally quite safe to quantize noise
	and noisy elements to only a handful of allowed phases, or to thin the
	phase with respect to the magnitude. The phases of high amplitude
	pure tones may or may not be preserved more carefully (they are
	relatively rare and L/R tend to be in phase, so there is generally
	little reason not to spend a few more bits on them)</p>

	<h4>example: eight phase stereo</h4>

	<p>Vorbis may implement phase stereo coupling by preserving the entirety
	of the magnitude vector (essential to fine amplitude and energy
	resolution overall) and quantizing the angle vector to one of only
	four possible values. Given that the magnitude vector may be positive
	or negative, this results in left and right phase having eight
	possible permutation, thus 'eight phase stereo':</p>

	<p><img src="eightphase.png" alt="eight phase"/></p>

	<p>Left and right may be in phase (positive or negative), the most common
	case by far, or out of phase by 90 or 180 degrees.</p>

	<h4>example: four phase stereo</h4>

	<p>Similarly, four phase stereo takes the quantization one step further;
	it allows only in-phase and 180 degree out-out-phase signals:</p>

	<p><img src="fourphase.png" alt="four phase"/></p>

	<h3>example: point stereo</h3>

	<p>Point stereo eliminates the possibility of out-of-phase signal
	entirely. Any diffuse quality to a sound source tends to collapse
	inward to a point somewhere within the stereo image. A practical
	example would be balanced reverberations within a large, live space;
	normally the sound is diffuse and soft, giving a sonic impression of
	volume. In point-stereo, the reverberations would still exist, but
	sound fairly firmly centered within the image (assuming the
	reverberation was centered overall; if the reverberation is stronger
	to the left, then the point of localization in point stereo would be
	to the left). This effect is most noticeable at low and mid
	frequencies and using headphones (which grant perfect stereo
	separation). Point stereo is is a graceful but generally easy to
	detect degradation to the sound quality and is thus used in frequency
	ranges where it is least noticeable.</p>

	<h3>Mixed Stereo</h3>

	<p>Mixed stereo is the simultaneous use of more than one of the above
	stereo encoding models, generally using more aggressive modes in
	higher frequencies, lower amplitudes or 'nearly' in-phase sound.</p>

	<p>It is also the case that near-DC frequencies should be encoded using
	lossless coupling to avoid frame blocking artifacts.</p>

	<h3>Vorbis Stereo Modes</h3>

	<p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
	constructed out of lossless and point stereo. Phase stereo was used
	in the rc2 encoder, but is not currently used for simplicity's sake. It
	will likely be re-added to the stereo model in the future.</p>

	<div id="copyright">
	The Xiph Fish Logo is a
	trademark (™) of Xiph.Org.<br/>

	These pages © 1994 - 2005 Xiph.Org. All rights reserved.
	</div>

	</body>
	</html>