| Encapsulation of FLAC in ISO Base Media File Format |
| Version 0.0.4 (draft) |
| |
| Table of Contents |
| 1 Scope |
| 2 Supporting Normative References |
| 3 Design Rules of Encapsulation |
| 3.1 File Type Identification |
| 3.2 Overview of Track Structure |
| 3.3 Definition of FLAC sample |
| 3.3.1 Sample entry format |
| 3.3.2 FLAC Specific Box |
| 3.3.3 Sample format |
| 3.3.4 Duration of FLAC sample |
| 3.3.5 Sub-sample |
| 3.3.6 Random Access |
| 3.3.6.1 Random Access Point |
| 3.4 Basic Structure (informative) |
| 3.4.1 Initial Movie |
| 3.5 Example of Encapsulation (informative) |
| 4 Acknowledgements |
| 5 Author's Address |
| |
| 1 Scope |
| |
| This document specifies the normative mapping for encapsulation of |
| FLAC coded audio bitstreams in ISO Base Media file format and its |
| derivatives. The encapsulation of FLAC coded bitstreams in |
| QuickTime file format is outside the scope of this specification. |
| |
| 2 Supporting Normative References |
| |
| [1] ISO/IEC 14496-12:2012 Corrected version |
| |
| Information technology — Coding of audio-visual objects — Part |
| 12: ISO base media file format |
| |
| [2] ISO/IEC 14496-12:2012/Amd.1:2013 |
| |
| Information technology — Coding of audio-visual objects — Part |
| 12: ISO base media file format AMENDMENT 1: Various |
| enhancements including support for large metadata |
| |
| [3] FLAC format specification |
| |
| https://xiph.org/flac/format.html |
| |
| Definition of the FLAC Audio Codec stream format |
| |
| [4] FLAC-in-Ogg mapping specification |
| |
| https://xiph.org/flac/ogg_mapping.html |
| |
| Ogg Encapsulation for the FLAC Audio Codec |
| |
| [5] Matroska specification |
| |
| 3 Design Rules of Encapsulation |
| |
| 3.1 File Type Identification |
| |
| This specification does not define any brand to declare files |
| which conform to this specification. Files which conform to |
| this specification shall contain at least one brand which |
| supports the requirements and the requirements described in |
| this clause without contradiction in the compatible brands |
| list of the File Type Box. The minimal support of the |
| encapsulation of FLAC bitstreams in ISO Base Media file format |
| requires the 'isom' brand. |
| |
| 3.2 Overview of Track Structure |
| |
| FLAC coded audio shall be encapsulated into the ISO Base |
| Media File Format as media data within an audio track. |
| |
| + The handler_type field in the Handler Reference Box |
| shall be set to 'soun'. |
| |
| + The Media Information Box shall contain the Sound Media |
| Header Box. |
| |
| + The codingname of the sample entry is 'fLaC'. |
| |
| This specification does not define any encapsulation |
| using MP4AudioSampleEntry with objectTypeIndication |
| specified by the MPEG-4 Registration Authority |
| (http://www.mp4ra.org/). See section 'Sample entry |
| format' for the definition of the sample entry. |
| |
| + The 'dfLa' box is added to the sample entry to convey |
| initializing information for the decoder. |
| |
| See section 'FLAC Specific Box' for the definition of |
| the box contents. |
| |
| + A FLAC sample is exactly one FLAC frame as described |
| in the format specification[3]. See section |
| 'Sample format' for details of the frame contents. |
| |
| + Every FLAC sample is a sync sample. No pre-roll or |
| lapping is required. See section 'Random Access' for |
| further details. |
| |
| 3.3 Definition of a FLAC sample |
| |
| 3.3.1 Sample entry format |
| |
| For any track containing one or more FLAC bitstreams, a |
| sample entry describing the corresponding FLAC bitstream |
| shall be present inside the Sample Table Box. This version |
| of the specification defines only one sample entry format |
| named FLACSampleEntry whose codingname is 'fLaC'. This |
| sample entry includes exactly one FLAC Specific Box |
| defined in section 'FLAC specific box' as a mandatory box |
| and indicates that FLAC samples described by this sample |
| entry are stored by the sample format described in section |
| 'Sample format'. |
| |
| The syntax and semantics of the FLACSampleEntry is shown |
| as follows. The data fields of this box and native |
| FLAC[3] structures encoded within FLAC blocks are both |
| stored in big-endian format, though for purposes of the |
| ISO BMFF container, FLAC native metadata and data blocks |
| are treated as unstructured octet streams. |
| |
| class FLACSampleEntry() extends AudioSampleEntry ('fLaC'){ |
| FLACSpecificBox(); |
| } |
| |
| The fields of the AudioSampleEntry portion shall be set as |
| follows: |
| |
| + channelcount: |
| |
| The channelcount field shall be set equal to the |
| channel count specified by the FLAC bitstream's native |
| METADATA_BLOCK_STREAMINFO header as described in [3]. |
| Note that the FLAC FRAME_HEADER structure that begins |
| each FLAC sample redundantly encodes channel number; |
| the number of channels declared in each FRAME_HEADER |
| MUST match the number of channels declared here and in |
| the METADATA_BLOCK_STREAMINFO header. |
| |
| + samplesize: |
| |
| The samplesize field shall be set equal to the bits |
| per sample specified by the FLAC bitstream's native |
| METADATA_BLOCK_STREAMINFO header as described in [3]. |
| Note that the FLAC FRAME_HEADER structure that begins |
| each FLAC sample redundantly encodes the number of |
| bits per sample; the bits per sample declared in each |
| FRAME_HEADER MUST match the samplesize declared here |
| and the bits per sample field declared in the |
| METADATA_BLOCK_STREAMINFO header. |
| |
| + samplerate: |
| |
| When possible, the samplerate field shall be set |
| equal to the sample rate specified by the FLAC |
| bitstream's native METADATA_BLOCK_STREAMINFO header |
| as described in [3], left-shifted by 16 bits to |
| create the appropriate 16.16 fixed-point |
| representation. |
| |
| When the bitstream's native sample rate is greater |
| than the maximum expressible value of 65535 Hz, |
| the samplerate field shall hold the greatest |
| expressible regular division of that rate. I.e. |
| the samplerate field shall hold 48000.0 for |
| native sample rates of 96 and 192 kHz. In the |
| case of unusual sample rates which do not have |
| an expressible regular division, the maximum value |
| of 65535.0 Hz should be used. |
| |
| High-rate FLAC bitstreams are common, and the native |
| value from the METADATA_BLOCK_STREAMINFO header in |
| the FLACSpecificBox MUST be read to determine the |
| correct sample rate of the bitstream. |
| |
| Note that the FLAC FRAME_HEADER structure that begins |
| each FLAC sample redundantly encodes the sample rate; |
| the sample rate declared in each FRAME_HEADER MUST |
| match the sample rate declared in the |
| METADATA_BLOCK_STREAMINFO header, and here in the |
| AudioSampleEntry portion of the FLACSampleEntry |
| as much as is allowed by the encoding restrictions |
| described above. |
| |
| Finally, the FLACSpecificBox carries codec headers: |
| |
| + FLACSpecificBox |
| |
| This box contains initializing information for the |
| decoder as defined in section 'FLAC specific box'. |
| |
| 3.3.2 FLAC Specific Box |
| |
| Exactly one FLAC Specific Box shall be present in each |
| FLACSampleEntry. This specification defines version 0 |
| of this box. If incompatible changes occur in future |
| versions of this specification, another version number |
| will be defined. The data fields of this box and native |
| FLAC[3] structures encoded within FLAC blocks are both |
| stored in big-endian format, though for purposes of the |
| ISO BMFF container, FLAC native metadata and data blocks |
| are treated as unstructured octet streams. |
| |
| The syntax and semantics of the FLAC Specific Box is shown |
| as follows. |
| |
| class FLACMetadataBlock { |
| unsigned int(1) LastMetadataBlockFlag; |
| unsigned int(7) BlockType; |
| unsigned int(24) Length; |
| unsigned int(8) BlockData[Length]; |
| } |
| |
| aligned(8) class FLACSpecificBox |
| extends FullBox('dfLa', version=0, 0){ |
| for (i=0; ; i++) { // to end of box |
| FLACMetadataBlock(); |
| } |
| } |
| |
| + Version: |
| |
| The Version field shall be set to 0. |
| |
| In the future versions of this specification, this |
| field may be set to other values. And without support |
| of those values, the reader shall not read the fields |
| after this within the FLACSpecificBox. |
| |
| + Flags: |
| |
| The Flags field shall be set to 0. |
| |
| After the FullBox header, the box contains a sequence of |
| FLAC[3] native-metadata block structures that fill the |
| remainder of the box. |
| |
| Each FLACMetadataBlock structure consists of three fields |
| filling a total of four bytes that form a FLAC[3] native |
| METADATA_BLOCK_HEADER, followed by raw octet bytes that |
| comprise the FLAC[3] native METADATA_BLOCK_DATA. |
| |
| + LastMetadataBlockFlag: |
| |
| The LastMetadataBlockFlag field maps semantically to |
| the FLAC[3] native METADATA_BLOCK_HEADER |
| Last-metadata-block flag as defined in the FLAC[3] |
| file specification. |
| |
| The LastMetadataBlockFlag is set to 1 if this |
| MetadataBlock is the last metadata block in the |
| FLACSpecificBox. It is set to 0 otherwise. |
| |
| + BlockType: |
| |
| The BlockType field maps semantically to the FLAC[3] |
| native METADATA_BLOCK_HEADER BLOCK_TYPE field as |
| defined in the FLAC[3] file specification. |
| |
| The BlockType is set to a valid FLAC[3] BLOCK_TYPE |
| value that identifies the type of this native metadata |
| block. The BlockType of the first FLACMetadataBlock |
| must be set to 0, signifying this is a FLAC[3] native |
| METADATA_BLOCK_STREAMINFO block. |
| |
| + Length: |
| |
| The Length field maps semantically to the FLAC[3] |
| native METADATA_BLOCK_HEADER Length field as |
| defined in the FLAC[3] file specification. |
| |
| The length field specifies the number of bytes of |
| MetadataBlockData to follow. |
| |
| + BlockData |
| |
| The BlockData field maps semantically to the FLAC[3] |
| native METADATA_BLOCK_HEADER METADATA_BLOCK_DATA as |
| defined in the FLAC[3] file specification. |
| |
| Taken together, the bytes of the FLACMetadataBlock form a |
| complete FLAC[3] native METADATA_BLOCK structure. |
| |
| Note that a minimum of a single FLACMetadataBlock, |
| consisting of a FLAC[3] native METADATA_BLOCK_STREAMINFO |
| structure, is required. Should the FLACSpecificBox |
| contain more than a single FLACMetadataBlock structure, |
| the FLACMetadataBlock containing the FLAC[3] native |
| METADATA_BLOCK_STREAMINFO must occur first in the list. |
| |
| Other containers that package FLAC audio streams, such as |
| Ogg[4] and Matroska[5], wrap FLAC[3] native metadata without |
| modification similar to this specification. When |
| repackaging or remuxing FLAC[3] streams from another |
| format that contains FLAC[3] native metadata into an ISO |
| BMFF file, the complete FLAC[3] native metadata should be |
| preserved in the ISO BMFF stream as described above. It |
| is also allowed to parse this native metadata and include |
| contextually redundant ISO BMFF-native repackagings and/or |
| reparsings of FLAC[3] native metadata, so long as the |
| native metadata is also preserved. |
| |
| 3.3.3 Sample format |
| |
| A FLAC sample is exactly one FLAC audio FRAME (as defined |
| in the FLAC[3] file specification) belonging to a FLAC |
| bitstreams. The FLAC sample data begins with a complete |
| FLAC FRAME_HEADER, followed by one FLAC SUBFRAME per |
| channel, any necessary bit padding, and ends with the |
| usual FLAC FRAME_FOOTER. |
| |
| Note that the FLAC native FRAME_HEADER structure that |
| begins each FLAC sample redundantly encodes channel count, |
| sample rate, and sample size. The values of these fields |
| must agree both with the values declared in the FLAC |
| METADATA_BLOCK_STREAMINFO structure as well as the |
| FLACSampleEntry box. |
| |
| 3.3.4 Duration of a FLAC sample |
| |
| The duration of any given FLAC sample is determined by |
| dividing the decoded block size of a FLAC frame, as |
| encoded in the FLAC FRAME's FRAME_HEADER structure, by the |
| value of the timescale field in the Media Header Box. |
| FLAC samples are permitted to have variable durations |
| within a given audio stream. FLAC does not use padding |
| values. |
| |
| 3.3.5 Sub-sample |
| |
| Sub-samples are not defined for FLAC samples in this |
| specification. |
| |
| 3.3.6 Random Access |
| |
| This subclause describes the nature of the random access |
| of FLAC sample. |
| |
| 3.3.6.1 Random Access Point |
| |
| All FLAC samples can be independently decoded |
| i.e. every FLAC sample is a sync sample. The Sync |
| Sample Box shall not be present as long as there are |
| no samples other than FLAC samples in the same |
| track. The sample_is_non_sync_sample field for FLAC |
| samples shall be set to 0. |
| |
| 3.4 Basic Structure (informative) |
| |
| 3.4.1 Initial Movie |
| |
| This subclause shows a basic structure of the Movie Box as follows: |
| |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| |moov| | | | | | | | Movie Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | |mvhd| | | | | | | Movie Header Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | |trak| | | | | | | Track Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | |tkhd| | | | | | Track Header Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | |edts|* | | | | | Edit Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | |elst|* | | | | Edit List Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | |mdia| | | | | | Media Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | |mdhd| | | | | Media Header Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | |hdlr| | | | | Handler Reference Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | |minf| | | | | Media Information Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | |smhd| | | | Sound Media Header Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | |dinf| | | | Data Information Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | |dref| | | Data Reference Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | | |url | | DataEntryUrlBox | |
| +----+----+----+----+----+----+ or +----+------------------------------+ |
| | | | | | | |urn | | DataEntryUrnBox | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | |stbl| | | | Sample Table | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | |stsd| | | Sample Description Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | | |fLaC| | FLACSampleEntry | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | | | |dfLa| FLAC Specific Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | |stts| | | Decoding Time to Sample Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | |stsc| | | Sample To Chunk Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | |stsz| | | Sample Size Box | |
| +----+----+----+----+----+ or +----+----+------------------------------+ |
| | | | | | |stz2| | | Compact Sample Size Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | | | | |stco| | | Chunk Offset Box | |
| +----+----+----+----+----+ or +----+----+------------------------------+ |
| | | | | | |co64| | | Chunk Large Offset Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | |mvex|* | | | | | | Movie Extends Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| | | |trex|* | | | | | Track Extends Box | |
| +----+----+----+----+----+----+----+----+------------------------------+ |
| |
| Figure 1 - Basic structure of Movie Box |
| |
| It is strongly recommended that the order of boxes should |
| follow the above structure. Boxes marked with an asterisk |
| (*) may or may not be present depending on context. For |
| most boxes listed above, the definition is as is defined |
| in ISO/IEC 14496-12 [1]. The additional boxes and the |
| additional requirements, restrictions and recommendations |
| to the other boxes are described in this specification. |
| |
| 3.5 Example of Encapsulation (informative) |
| [File] |
| size = 17790 |
| [ftyp: File Type Box] |
| position = 0 |
| size = 24 |
| major_brand = mp42 : MP4 version 2 |
| minor_version = 0 |
| compatible_brands |
| brand[0] = mp42 : MP4 version 2 |
| brand[1] = isom : ISO Base Media file format |
| [moov: Movie Box] |
| position = 24 |
| size = 757 |
| [mvhd: Movie Header Box] |
| position = 32 |
| size = 108 |
| version = 0 |
| flags = 0x000000 |
| creation_time = UTC 2014/12/12, 18:41:19 |
| modification_time = UTC 2014/12/12, 18:41:19 |
| timescale = 48000 |
| duration = 33600 (00:00:00.700) |
| rate = 1.000000 |
| volume = 1.000000 |
| reserved = 0x0000 |
| reserved = 0x00000000 |
| reserved = 0x00000000 |
| transformation matrix |
| | a, b, u | | 1.000000, 0.000000, 0.000000 | |
| | c, d, v | = | 0.000000, 1.000000, 0.000000 | |
| | x, y, w | | 0.000000, 0.000000, 1.000000 | |
| pre_defined = 0x00000000 |
| pre_defined = 0x00000000 |
| pre_defined = 0x00000000 |
| pre_defined = 0x00000000 |
| pre_defined = 0x00000000 |
| pre_defined = 0x00000000 |
| next_track_ID = 2 |
| [iods: Object Descriptor Box] |
| position = 140 |
| size = 33 |
| version = 0 |
| flags = 0x000000 |
| [tag = 0x10: MP4_IOD] |
| expandableClassSize = 16 |
| ObjectDescriptorID = 1 |
| URL_Flag = 0 |
| includeInlineProfileLevelFlag = 0 |
| reserved = 0xf |
| ODProfileLevelIndication = 0xff |
| sceneProfileLevelIndication = 0xff |
| audioProfileLevelIndication = 0xfe |
| visualProfileLevelIndication = 0xff |
| graphicsProfileLevelIndication = 0xff |
| [tag = 0x0e: ES_ID_Inc] |
| expandableClassSize = 4 |
| Track_ID = 1 |
| [trak: Track Box] |
| position = 173 |
| size = 608 |
| [tkhd: Track Header Box] |
| position = 181 |
| size = 92 |
| version = 0 |
| flags = 0x000007 |
| Track enabled |
| Track in movie |
| Track in preview |
| creation_time = UTC 2014/12/12, 18:41:19 |
| modification_time = UTC 2014/12/12, 18:41:19 |
| track_ID = 1 |
| reserved = 0x00000000 |
| duration = 33600 (00:00:00.700) |
| reserved = 0x00000000 |
| reserved = 0x00000000 |
| layer = 0 |
| alternate_group = 0 |
| volume = 1.000000 |
| reserved = 0x0000 |
| transformation matrix |
| | a, b, u | | 1.000000, 0.000000, 0.000000 | |
| | c, d, v | = | 0.000000, 1.000000, 0.000000 | |
| | x, y, w | | 0.000000, 0.000000, 1.000000 | |
| width = 0.000000 |
| height = 0.000000 |
| [mdia: Media Box] |
| position = 273 |
| size = 472 |
| [mdhd: Media Header Box] |
| position = 281 |
| size = 32 |
| version = 0 |
| flags = 0x000000 |
| creation_time = UTC 2014/12/12, 18:41:19 |
| modification_time = UTC 2014/12/12, 18:41:19 |
| timescale = 48000 |
| duration = 34560 (00:00:00.720) |
| language = und |
| pre_defined = 0x0000 |
| [hdlr: Handler Reference Box] |
| position = 313 |
| size = 51 |
| version = 0 |
| flags = 0x000000 |
| pre_defined = 0x00000000 |
| handler_type = soun |
| reserved = 0x00000000 |
| reserved = 0x00000000 |
| reserved = 0x00000000 |
| name = Xiph Audio Handler |
| [minf: Media Information Box] |
| position = 364 |
| size = 381 |
| [smhd: Sound Media Header Box] |
| position = 372 |
| size = 16 |
| version = 0 |
| flags = 0x000000 |
| balance = 0.000000 |
| reserved = 0x0000 |
| [dinf: Data Information Box] |
| position = 388 |
| size = 36 |
| [dref: Data Reference Box] |
| position = 396 |
| size = 28 |
| version = 0 |
| flags = 0x000000 |
| entry_count = 1 |
| [url : Data Entry Url Box] |
| position = 412 |
| size = 12 |
| version = 0 |
| flags = 0x000001 |
| location = in the same file |
| [stbl: Sample Table Box] |
| position = 424 |
| size = 321 |
| [stsd: Sample Description Box] |
| position = 432 |
| size = 79 |
| version = 0 |
| flags = 0x000000 |
| entry_count = 1 |
| [fLaC: Audio Description] |
| position = 448 |
| size = 63 |
| reserved = 0x000000000000 |
| data_reference_index = 1 |
| reserved = 0x0000 |
| reserved = 0x0000 |
| reserved = 0x00000000 |
| channelcount = 2 |
| samplesize = 16 |
| pre_defined = 0 |
| reserved = 0 |
| samplerate = 48000.000000 |
| [dfLa: FLAC Specific Box] |
| position = 484 |
| size = 50 |
| version = 0 |
| flags = 0x000000 |
| [FLACMetadataBlock] |
| LastMetadataBlockFlag = 1 |
| BlockType = 0 |
| Length = 34 |
| BlockData[34]; |
| [stts: Decoding Time to Sample Box] |
| position = 492 |
| size = 24 |
| version = 0 |
| flags = 0x000000 |
| entry_count = 1 |
| entry[0] |
| sample_count = 18 |
| sample_delta = 1920 |
| [stsc: Sample To Chunk Box] |
| position = 516 |
| size = 40 |
| version = 0 |
| flags = 0x000000 |
| entry_count = 2 |
| entry[0] |
| first_chunk = 1 |
| samples_per_chunk = 13 |
| sample_description_index = 1 |
| entry[1] |
| first_chunk = 2 |
| samples_per_chunk = 5 |
| sample_description_index = 1 |
| [stsz: Sample Size Box] |
| position = 556 |
| size = 92 |
| version = 0 |
| flags = 0x000000 |
| sample_size = 0 (variable) |
| sample_count = 18 |
| entry_size[0] = 977 |
| entry_size[1] = 938 |
| entry_size[2] = 939 |
| entry_size[3] = 938 |
| entry_size[4] = 934 |
| entry_size[5] = 945 |
| entry_size[6] = 948 |
| entry_size[7] = 956 |
| entry_size[8] = 955 |
| entry_size[9] = 930 |
| entry_size[10] = 933 |
| entry_size[11] = 934 |
| entry_size[12] = 972 |
| entry_size[13] = 977 |
| entry_size[14] = 958 |
| entry_size[15] = 949 |
| entry_size[16] = 962 |
| entry_size[17] = 848 |
| [stco: Chunk Offset Box] |
| position = 648 |
| size = 24 |
| version = 0 |
| flags = 0x000000 |
| entry_count = 2 |
| chunk_offset[0] = 686 |
| chunk_offset[1] = 12985 |
| [free: Free Space Box] |
| position = 672 |
| size = 8 |
| [mdat: Media Data Box] |
| position = 680 |
| size = 17001 |
| |
| 4 Acknowledgements |
| |
| This spec draws heavily from the Opus-in-ISOBMFF specification |
| work done by Yusuke Nakamura <muken.the.vfrmaniac |at| gmail.com> |
| |
| Thank you to Tim Terriberry, David Evans, and Yusuke Nakamura |
| for valuable feedback. Thank you to Ralph Giles for editorial |
| help. |
| |
| 5 Author Address |
| |
| Monty Montgomery <cmontgomery@mozilla.com> |