This document will, as the C++ port matures, serve as a log of how different parts of the library work. As of today, it contains some general information but mostly CMap-specific details.
One of the important goals in sfntly is thread safety, which is why tables can only be created with their nested Builder class and are immutable after creation.
CMapTable
CMap = character map; it converts code points in a code page to glyph IDs.
The CMapTable is a table of CMaps (CMaps are also tables; there is one for every encoding supported by the font). An encoding-dependent character map is represented in one of several formats (format numbers run from 0 to 14), of which formats 0 and 4 are the most used; sfntly/C++ will initially only support formats 0, 2, 4 and 12.
CMapFormat0
Byte encoding table
Format 0 is a basic table where a character’s glyph ID is looked up in a 256-entry glyphIdArray. As it only supports 256 characters, it can only encode ASCII and ISO 8859-x (alphabet-based languages).
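As a minimal sketch of the format 0 lookup described above (not the sfntly implementation), the table can be modelled as a 256-entry byte array indexed by the character code. The names Format0Table and GlyphId are hypothetical, and returning 0 for out-of-range characters is an assumption based on glyph 0 conventionally being the missing glyph.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a format 0 CMap: one glyph ID byte per character code.
struct Format0Table {
  // Value-initialized, so every character maps to glyph 0 until set.
  std::array<uint8_t, 256> glyph_id_array{};

  // Returns the glyph ID for `character`, or 0 (assumed missing glyph) when
  // the character lies outside the 0..255 range this format can encode.
  int32_t GlyphId(int32_t character) const {
    if (character < 0 || character > 255) {
      return 0;
    }
    return glyph_id_array[static_cast<std::size_t>(character)];
  }
};
```

A caller would fill glyph_id_array from the font data and then call GlyphId once per character; anything above 255 simply cannot be represented in this format.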
CMapFormat2
High-byte mapping through table
Chinese, Japanese and Korean (CJK) need special 2-byte encodings for their code points, such as Shift-JIS.
CMapFormat4
Segment mapping to delta values
This is the preferred format for Unicode Basic Multilingual Plane (BMP) encodings according to the Microsoft spec. Format 4 defines segments (contiguous ranges of characters; variable length). Finding a character’s glyph ID means first finding the segment it is part of using a binary search (the segments are sorted). A segment has a startCode and an endCode (the minimum and maximum code points in the segment), an idDelta (delta for all code points in the segment) and an idRangeOffset (offset into glyphIdArray or 0).
idDelta and idRangeOffset may seem to be the same thing, since both are offsets. In fact, idRangeOffset locates the glyph index inside glyphIdArray by relying on the fact that the array immediately follows the idRangeOffset table in the font file. The segment’s offset is idRangeOffset[i], a byte offset measured from the location of idRangeOffset[i] itself; since the idRangeOffset table contains 16-bit words and not bytes, the value is divided by 2.
glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]))
idDelta[i] is another kind of segment offset, used when idRangeOffset[i] = 0, in which case it is added directly to the character code.
glyphIndex = idDelta[i] + c
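The two formulas can be combined into one lookup routine. The following is a sketch under stated assumptions, not the sfntly code: Format4Table and Format4GlyphId are hypothetical names, and the tables are held as separate vectors rather than as one contiguous block of font data. Because of that, the pointer arithmetic on &idRangeOffset[i] is rebased into an index into glyph_id_array by subtracting the segment count. Two details come from the OpenType specification rather than from the text above: idDelta arithmetic is modulo 65536, and a non-zero value read from glyphIdArray also has idDelta[i] added to it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of a format 4 CMap (parallel arrays).
struct Format4Table {
  std::vector<uint16_t> start_code;
  std::vector<uint16_t> end_code;         // sorted ascending
  std::vector<uint16_t> id_delta;         // added modulo 65536
  std::vector<uint16_t> id_range_offset;  // byte offset, or 0
  std::vector<uint16_t> glyph_id_array;
};

// Returns the glyph ID for character c, or 0 (missing glyph) if unmapped.
inline uint16_t Format4GlyphId(const Format4Table& t, uint16_t c) {
  const std::size_t seg_count = t.end_code.size();

  // Binary search for the first segment whose end_code >= c.
  std::size_t lo = 0;
  std::size_t hi = seg_count;
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (t.end_code[mid] < c) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  if (lo == seg_count || t.start_code[lo] > c) {
    return 0;  // c falls in a gap between segments
  }
  const std::size_t i = lo;

  if (t.id_range_offset[i] == 0) {
    // glyphIndex = idDelta[i] + c; the cast wraps modulo 65536.
    return static_cast<uint16_t>(t.id_delta[i] + c);
  }

  // Position counted in words from the start of the idRangeOffset table.
  // glyphIdArray begins seg_count words after that start.
  const std::size_t pos =
      i + t.id_range_offset[i] / 2 + (c - t.start_code[i]);
  if (pos < seg_count || pos - seg_count >= t.glyph_id_array.size()) {
    return 0;  // malformed offset; treat as missing glyph
  }
  const uint16_t glyph = t.glyph_id_array[pos - seg_count];
  return glyph == 0 ? 0 : static_cast<uint16_t>(glyph + t.id_delta[i]);
}
```

The bounds check before indexing glyph_id_array is a defensive choice for this sketch; a font file with a bad idRangeOffset would otherwise read outside the array.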
CMapTable is the main class and the container for all other CMap-related classes.
CMapTable::CMapId describes a pair of IDs, platform ID and encoding ID, that form the CMap’s ID. The ID a CMap has is usually a good indicator of the format the CMap uses (Unicode CMaps are usually either format 4 or format 12).
CMapTable::CMapIdComparator
CMapTable::CMapIterator: iteration through the CMapTable is supported through a Java-style iterator.
CMapTable::CMapFilter: Java-style filter; CMapIterator supports filtering CMaps. By default, it accepts every CMap.
CMapTable::CMapIdFilter extends CMapFilter; only accepts one type of CMap. Used in conjunction with CMapIterator, this is how the CMap getters are implemented.
CMapTable::Builder is the only way to create a CMapTable.
CMapTable::CMap is the abstract base class that all CMapFormat* classes derive from. It defines basic functions and the abstract CMapTable::CMap::CharacterIterator class to iterate through the characters in the map. The basic implementation just loops through every character between a start and an end. This is overridden so that format-specific iteration is performed.
CMapFormat0 (mostly done?)
CMapFormat2 (needs builders)
[todo: will add images soon; need to upload to svn]
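To illustrate how a Java-style iterator, a filter and an ID filter can combine into a CMap getter, here is a minimal sketch. The class names mirror the ones listed above, but they are simplified stand-ins declared at namespace scope; the method names (Accept, HasNext, Next) and the FindCMap helper are assumptions for illustration and are not taken from the sfntly headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for CMapTable::CMapId and CMapTable::CMap.
struct CMapId {
  int32_t platform_id;
  int32_t encoding_id;
  bool operator==(const CMapId& other) const {
    return platform_id == other.platform_id &&
           encoding_id == other.encoding_id;
  }
};

struct CMap {
  CMapId id;
  int32_t format;
};

// Accepts every CMap by default; subclasses narrow the selection.
class CMapFilter {
 public:
  virtual ~CMapFilter() = default;
  virtual bool Accept(const CMapId&) const { return true; }
};

// Accepts only CMaps whose ID equals the wanted one.
class CMapIdFilter : public CMapFilter {
 public:
  explicit CMapIdFilter(CMapId wanted) : wanted_(wanted) {}
  bool Accept(const CMapId& id) const override { return id == wanted_; }

 private:
  CMapId wanted_;
};

// Java-style iterator: call HasNext() before each Next().
class CMapIterator {
 public:
  CMapIterator(const std::vector<CMap>& cmaps, const CMapFilter& filter)
      : cmaps_(cmaps), filter_(filter) {}

  bool HasNext() {
    // Skip entries rejected by the filter.
    while (index_ < cmaps_.size() && !filter_.Accept(cmaps_[index_].id)) {
      ++index_;
    }
    return index_ < cmaps_.size();
  }

  // Precondition: the preceding HasNext() call returned true.
  const CMap& Next() { return cmaps_[index_++]; }

 private:
  const std::vector<CMap>& cmaps_;
  const CMapFilter& filter_;
  std::size_t index_ = 0;
};

// A getter built on the iterator: first CMap with the given ID, or nullptr.
inline const CMap* FindCMap(const std::vector<CMap>& cmaps, CMapId id) {
  CMapIdFilter filter(id);
  CMapIterator it(cmaps, filter);
  return it.HasNext() ? &it.Next() : nullptr;
}
```

Iterating with a plain CMapFilter visits every CMap, while FindCMap shows the "filter plus iterator" shape of a getter: the iterator does the walking and the filter decides which entries are visible.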
Building a data table in sfntly is done by the FontDataTable::Builder::build method, which defines the general pipeline and leaves the details to each implementing subclass (CMapTable::Builder for example). Note: sub* methods are table specific.
ReadableFontDataPtr data = internalReadData()
There are 2 private fields in the FontDataTable::Builder class: rData and wData, for ReadableFontData and WritableFontData. This function returns rData if there is any, or wData (cast to readable font data) if rData is null. They hold the same data!
if (model_changed_)
A font is essentially a binary blob when loaded inside a FontData object. A model is the Java/C++ collection of objects that represent the same data in a manipulable format. If you ask for the model (even if you don’t write to it), it will count as changed and the underlying raw data will get updated.
  if (!subReadyToSerialize())
    return NULL
  else
    size = subDataToSerialize()
    WritableDataPtr new_data = container_->getNewData(size)
    subSerialize(new_data)
    data = new_data
FontDataTablePtr table = subBuildTable(data)
This is where the table is actually built; subBuildTable is overridden by every class of table, but a table header is always added.
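The pipeline above is an instance of the template method pattern: the base builder fixes the order of the steps and subclasses supply the sub* hooks. The following is a compact sketch of that shape, not the sfntly code. Bytes, Table, TableBuilder and ByteListBuilder are hypothetical stand-ins (for FontData, FontDataTable, FontDataTable::Builder and a concrete table builder), the hook named subDataToSerialize above appears here as SubDataSizeToSerialize, and the table header step is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;  // stand-in for FontData

struct Table {  // stand-in for FontDataTable
  Bytes data;
};

class TableBuilder {  // stand-in for FontDataTable::Builder
 public:
  explicit TableBuilder(Bytes raw) : data_(std::move(raw)) {}
  virtual ~TableBuilder() = default;

  // General pipeline; the Sub* hooks are table specific.
  std::unique_ptr<Table> Build() {
    Bytes data = data_;  // plays the role of internalReadData()
    if (model_changed_) {
      if (!SubReadyToSerialize()) {
        return nullptr;
      }
      Bytes new_data(SubDataSizeToSerialize());
      SubSerialize(&new_data);
      data = std::move(new_data);
    }
    return SubBuildTable(std::move(data));
  }

 protected:
  void SetModelChanged() { model_changed_ = true; }
  virtual bool SubReadyToSerialize() = 0;
  virtual std::size_t SubDataSizeToSerialize() = 0;
  virtual void SubSerialize(Bytes* out) = 0;
  virtual std::unique_ptr<Table> SubBuildTable(Bytes data) = 0;

 private:
  Bytes data_;
  bool model_changed_ = false;
};

// Hypothetical concrete builder whose model is an editable byte list.
class ByteListBuilder : public TableBuilder {
 public:
  explicit ByteListBuilder(Bytes raw)
      : TableBuilder(raw), model_(std::move(raw)) {}

  // Asking for the model marks it as changed, even without a write.
  Bytes& Model() {
    SetModelChanged();
    return model_;
  }

 protected:
  bool SubReadyToSerialize() override { return !model_.empty(); }
  std::size_t SubDataSizeToSerialize() override { return model_.size(); }
  void SubSerialize(Bytes* out) override { *out = model_; }
  std::unique_ptr<Table> SubBuildTable(Bytes data) override {
    return std::unique_ptr<Table>(new Table{std::move(data)});
  }

 private:
  Bytes model_;
};
```

If Model() is never called, Build() wraps the original bytes unchanged; once the model has been requested, the data is re-serialized from the model, and an empty model (not ready to serialize in this sketch) makes Build() return a null pointer.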
Subtables are lazily built
When creating the object view of the font and dealing with lots of tables, it would be wasteful to create builders for every subtable there is since most users only do fairly high level manipulation of the font. Instead, only the tables at font level are fully built.
All other subtables have builders that contain valid FontData, but the object view is not created by default. For the CMapTable, this means that if you don’t go through the getCMapBuilders() method, the CMap builders are not initialized. So, the builder map would seem to be empty when calling its size() method, but there are CMaps in the font when calling numCMaps(internalReadFont()).
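The lazy behaviour can be sketched as follows. This is an illustration only: LazyCMapTableBuilder, CMapBuilder and BuilderCount are hypothetical names, and the raw data is reduced to one byte per CMap (its format number) so that counting CMaps needs no parsing. The point it shows is that the count read from the raw data and the size of the builder map disagree until GetCMapBuilders() is called.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical raw layout: one byte per CMap, holding its format number.
using RawData = std::vector<uint8_t>;

struct CMapBuilder {
  int format;
};

class LazyCMapTableBuilder {
 public:
  explicit LazyCMapTableBuilder(RawData raw) : raw_(std::move(raw)) {}

  // Counts CMaps by inspecting the raw data only; no builders are created.
  std::size_t NumCMaps() const { return raw_.size(); }

  // Size of the object view as it currently stands (0 until initialized).
  std::size_t BuilderCount() const { return builders_.size(); }

  // The first call creates the per-CMap builders from the raw data.
  std::map<std::size_t, CMapBuilder>& GetCMapBuilders() {
    if (!initialized_) {
      for (std::size_t i = 0; i < raw_.size(); ++i) {
        builders_[i] = CMapBuilder{raw_[i]};
      }
      initialized_ = true;
    }
    return builders_;
  }

 private:
  RawData raw_;
  std::map<std::size_t, CMapBuilder> builders_;
  bool initialized_ = false;
};
```

Code that only needs to know how many CMaps exist never pays for building the object view; code that edits CMaps triggers the initialization on first access.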
Sfntly/Java uses a native ICU-based API for encoding characters. Sfntly/C++ uses ICU directly. In unit tests we assume text is encoded in UTF-16. Public APIs will use ICU classes like UnicodeString.