<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html5>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<!-- N.B. Sentences in this document are double-spaced so that Emacs
sentence-editing functions work more reliably. -->
<head>
<title>DCTV trace analysis system</title>
<link rel="stylesheet" href="reset.css" />
<link rel="stylesheet" href="styles.css" />
</head>
<body>
<div id="container">
<header id="header">DCTV</header>
<nav id="sidebar">
<ul>
<li>
<a href="#introduction">Introduction</a>
<ul>
<li><a href="#quickstart">Quick start</a></li>
<li><a href="#background">Background</a></li>
<li><a href="#conventions">Document conventions</a></li>
<li><a href="#datamodel">Data model</a></li>
<li><a href="#differences">Differences from standard SQL</a></li>
</ul>
</li>
<li><a href="#syntax">Syntax reference</a></li>
<li><a href="#standard_library">Standard library</a></li>
<li><a href="#example">Worked example</a></li>
</ul>
</nav>
<main id="manual">
<h1><a name="introduction">Introduction</a></h1>
<p>DCTV is a data exploration toolkit designed for both
interactive and batch analysis of trace files and other
heterogeneous time series data. It's designed to answer
complex questions about the sort of data that one frequently
finds in records of system activity.</p>
<p>Important features of DCTV are:</p>
<ul>
<li>SQL1999 querying of trace files</li>
<li>specialized relational algebra and SQL syntax for time series</li>
<li>comprehensive dimensional analysis for unit conversion
and error detection</li>
<li>support for analyzing very large (larger than memory)
trace files</li>
<li>powerful GUI for interactive trace exploration</li>
</ul>
<p>Use cases include:</p>
<ul>
<li>examining CPU time spent by a particular application</li>
<li>examining CPU time spent in <emph>part</emph> of an
application</li>
<li>examining memory activity of the whole system to determine
what caused a game to miss a frame deadline</li>
<li>finding which functions cause the most page faults
during app startup</li>
<li>tracking down slow memory leaks</li>
<li>finding why a real-time thread took too long to run and
poll a device</li>
<li>bulk analysis of traces from production to extract metrics for a
dashboard</li>
</ul>
<p>DCTV is a "power user" tool: using it effectively requires
an understanding of both the system components that generate
the trace events being queried and SQL-like declarative
query systems. This document aims to
describe and document DCTV's functionality, walk through a few
examples of trace analysis, and invite the reader to
investigate further.</p>
<h2><a name="quickstart">Quick start</a></h2>
<aside class="warning">
DCTV is under active development and is not yet stable.
It also currently runs on Linux systems only; a port to
macOS is underway. See the <a href="go/dctv-db-design">DB
design document</a> for further information on checking out and
building the source code.
</aside>
<h3>Getting DCTV</h3>
<ol>
<li>Be running gLinux (we'll port eventually)</li>
<li><code>git clone sso://team/dctv/dctv</code></li>
<li><code>make dev</code></li>
<li>follow prompts; install dependencies</li>
<li>while the build is broken, complain to dancol@, goto 2</li>
<li><code>./dctv</code></li>
</ol>
<h3>Hello world</h3>
<code class="blockquote"><![CDATA[$ ./dctv repl mytrace=mytrace.ftrace
Type .help for help.
DCTV> SELECT COUNT(*) FROM mytrace.scheduler.timeslices_p_cpu;
COUNT()
-------
32362
]]></code>
<h2><a name="background">Background</a></h2>
<blockquote>
Life is just one damned thing after another.
<cite>Arnold J. Toynbee</cite>
</blockquote>
<h3>Purpose of DCTV</h3>
<p>A trace file by itself is of limited utility: it's
gigabytes of detailed, low-level records of system activity.
When we analyze a trace file, what we really want to do is
<emph>pose questions</emph> to that trace file and get back
meaningful answers. The information we want lies in the
non-trivial <emph>relationships</emph> between trace events,
the relationships between relationships, and so on. This
layering limits the kind of trace analysis that's possible
through ad-hoc inspection of the trace events
themselves.</p>
<p>After we pose questions to a trace file and get answers, we
frequently want to use these answers as the basis for further
questions. In this way, we gradually increase the level of
abstraction of our analysis, moving from questions posed in
terms of raw trace events to ones posed in terms of the
problem we're actually trying to solve.</p>
<p>DCTV is a question-answering machine. By incrementally
constructing queries and then querying against them (for
example, using the <code>WITH</code> construction), users
extract increasingly abstract data from trace files, data not
directly represented by discrete and specific low-level events
in a trace. The SQL REPL and the GUI both provide
information-querying capabilities.</p>
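<p>As a rough illustration of this layering, the following
Python sketch builds one query on top of another with
<code>WITH</code>. It runs against SQLite rather than DCTV,
and the <code>raw_events</code> table, its columns, and its
rows are all invented for the example; DCTV's own table names
and SQL extensions will differ.</p>
<code class="blockquote"><![CDATA[
```python
import sqlite3

# Stand-in database: SQLite, not DCTV.  Table and column names are hypothetical.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE raw_events (ts INTEGER, cpu INTEGER, name TEXT)")
connection.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [(10, 0, "wakeup"), (20, 1, "wakeup"), (30, 0, "switch"), (40, 0, "wakeup")],
)

# Each WITH clause answers one question; the next clause queries that answer.
query = """
WITH wakeups AS (
    SELECT ts, cpu FROM raw_events WHERE name = 'wakeup'
),
wakeups_per_cpu AS (
    SELECT cpu, COUNT(*) AS n FROM wakeups GROUP BY cpu
)
SELECT cpu, n FROM wakeups_per_cpu ORDER BY cpu
"""
rows = connection.execute(query).fetchall()
```
]]></code>
<p>The first clause works in terms of raw events, the second in
terms of the first, and the final <code>SELECT</code> never
mentions raw events at all.</p>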
<p>DCTV also provides a <a href="#standard_library">standard
library</a> of ready-made building blocks that users can query
during trace analysis.</p>
<h3>Other trace analysis tools</h3>
<p>DCTV is not the first such tool for trace analysis.
It integrates the best parts of WPA, LISA, and Perfetto's
trace analysis models.</p>
<code>TODO(dancol): flesh out this section</code>
<h2><a name="conventions">Document conventions</a></h2>
<p>This document currently assumes the reader is familiar with
the basics of SQL and the basics of trace processing, focusing
on DCTV's specific features in this area.</p>
<h3>Time tables</h3>
<p>Some figures below are "time tables" (they have "Time ▶" in
the upper-left). They represent timelines, where each row in
the table is a separate and independent data series.
Some tables represent operands and results; in this case, a
thick black line separates the input rows and output rows.</p>
<h3>Function signatures</h3>
<p>Table-valued function signatures are given in Python
syntax, with a bare <code>*</code> signifying that all
arguments following the <code>*</code> are keyword-only and
cannot be specified positionally. That is, if a function
signature is <code>foo(*, bar=7)</code>, then you have to
write either <code>foo()</code> (using <code>bar</code>'s
default value) or <code>foo(bar=&gt;5)</code> (specifying an
explicit value for the keyword argument); you can't write
<code>foo(1)</code>, because <code>bar</code> can't be
specified positionally.</p>
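<p>Because the signature notation is borrowed from Python, the
rule can be tried out in Python itself. This small sketch
defines the same hypothetical <code>foo</code>; note that
Python spells the keyword argument <code>bar=5</code>, where
the SQL form used in this document is
<code>bar=&gt;5</code>.</p>
<code class="blockquote"><![CDATA[
```python
def foo(*, bar=7):
    """Hypothetical function whose only argument is keyword-only."""
    return bar


default_result = foo()        # uses bar's default value
explicit_result = foo(bar=5)  # keyword form; the document's SQL form is bar=>5

try:
    foo(1)                    # positional use of a keyword-only argument
    positional_rejected = False
except TypeError:
    positional_rejected = True
```
]]></code>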
<h2><a name="datamodel">Data model</a></h2>
<p>DCTV is designed around querying one or more trace files
using SQL queries. DCTV performs no hardcoded pre-processing
of trace files: we model each event in a trace file as a row
of the "raw events" table corresponding to that event's type.
Each field in an event is a column in that event's table;
users extract higher-level information from these low-level
events by defining views in terms of these low-level events.
By querying the views, users can extract higher-level trace
events; users can also define views in terms of other views to
answer more abstract questions.</p>
<h3>Table types</h3>
<p>DCTV's query engine provides the tables and set functions
that any SQL system provides, but extends these facilities
with a set of operators and functions dedicated to working
with heterogeneous time series. Tables in DCTV are
first-class <emph>typed</emph> objects: tables are either
regular tables, span tables, or event tables. Each type of
table has a set of query operations that it supports; DCTV
provides functions to convert one type of table to another as
needed.</p>
<aside class="note">It's always possible to "view" one of
DCTV's special table types as a regular table by just using
regular table operations (like the non-<code>SPAN</code>
variant of <code>SELECT</code>) on it. The result of any of
these non-special operations is itself a regular
table.</aside>
<p>This table summarizes the special operations DCTV supports.
Don't worry if you don't recognize some of these terms (like
"partitioned span table"): they're defined below.</p>
<table class="general">
<tr>
<th>Operation</th>
<th>Left operand</th>
<th>Right operand</th>
<th>Result</th>
</tr>
<tr>
<td>SELECT</td>
<td>Regular table</td>
<td>N/A</td>
<td>Regular table</td>
</tr>
<tr>
<td>SELECT</td>
<td>Span table</td>
<td>N/A</td>
<td>Regular table</td>
</tr>
<tr>
<td>SELECT SPAN</td>
<td>Span table</td>
<td>N/A</td>
<td>Span table</td>
</tr>
<tr>
<td>SPAN JOIN</td>
<td>Unpartitioned span table</td>
<td>Unpartitioned span table</td>
<td>Unpartitioned span table</td>
</tr>
<tr>
<td>SPAN BROADCAST INTO</td>
<td>Unpartitioned span table</td>
<td>Partitioned span table</td>
<td>Partitioned span table</td>
</tr>
<tr>
<td>SPAN BROADCAST FROM</td>
<td>Partitioned span table</td>
<td>Unpartitioned span table</td>
<td>Partitioned span table</td>
</tr>
<tr>
<td>GROUP USING PARTITION</td>
<td>Partitioned span table</td>
<td>N/A</td>
<td>Unpartitioned span table</td>
</tr>
<tr>
<td>GROUP USING SPANS FROM</td>
<td>Partitioned span table</td>
<td>Unpartitioned span table</td>
<td>Partitioned span table</td>
</tr>
<tr>
<td>GROUP USING SPANS FROM</td>
<td>Unpartitioned span table</td>
<td>Unpartitioned span table</td>
<td>Unpartitioned span table</td>
</tr>
</table>
<p>A <dfn>regular SQL table</dfn> is essentially a list of
points in high-dimensional space, with each column in the
table representing one dimension along which a point can
vary.</p>
<p>A <dfn>span table</dfn> represents data that vary over the
time dimension. An interval of time over which the data in a
span table remain the same is called a <dfn>span</dfn>.
The collection of time-varying data described by a span table
is the <dfn>payload</dfn> of that span table.</p>
<!-- TODO(dancol): talk about different time basis? -->
<p>All span tables have two special columns:
<dfn><code>_ts</code></dfn> and
<dfn><code>_duration</code></dfn>. <code>_ts</code> is an
<code>INT64</code> timestamp, in nanoseconds since the start
of the trace. <code>_duration</code> is a non-zero
<code>INT64</code> number of nanoseconds that the span covers.
(That is, the span describes the half-open region of time
[<code>_ts</code>, <code>_ts</code> +
<code>_duration</code>).)</p>
<p><code>_ts</code> and <code>_duration</code> are always
non-<code>NULL</code>, and a span table is always ordered by
increasing values of <code>_ts</code>. Spans in a span table
cannot "overlap": a span must end either before or at exactly
the same time as the next span begins. (Spans from different
partitions may overlap, however: see immediately below.) A
span table need not be contiguous: that is, it's legal for
gaps to exist between spans.</p>
<p>For example, imagine that you're looking at a Christmas
tree light that changes color in time with music. We might
describe the color of the light using spans. The following
diagram depicts how we might use spans to describe the light's
state. Each pair of numbers (one above the table, one below)
indicates the time corresponding to the vertical line
connecting them.</p>
<table class="spanop">
<caption>Light color</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
<tr>
<td>Color</td>
<td colspan="2">Red</td>
<td colspan="1" class="empty" />
<td colspan="1">Green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
</table>
<p>Here, the light was red from time one to time three and
then green from time four to time five, inclusive. (From time
three to time four, the light was off; we're choosing to
represent "off" as the absence of a span, but an equally valid
choice would be to make a span with a special "Off" value for
the color.)</p>
<p>It's useful to look at the physical table representation
of the above set of spans.</p>
<table class="general spantable">
<caption>Light color (span table representation)</caption>
<tr>
<th>_ts</th>
<th>_duration</th>
<th>color</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>red</td>
</tr>
<tr>
<td>4</td>
<td>1</td>
<td>green</td>
</tr>
</table>
<p>Note that one row in the physical table representation of a
span table corresponds to one <emph>logical</emph> span.</p>
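<p>The rules above can be written down as a small check.
The following Python sketch is only an illustration, not part
of DCTV: it models a span table as a list of dictionaries,
treats a span as ending at <code>_ts + _duration</code> (so
that a span may end exactly where the next begins), and assumes
that durations must be positive rather than merely
non-zero.</p>
<code class="blockquote"><![CDATA[
```python
def is_valid_span_table(rows):
    """Return True if rows (dicts with _ts and _duration) obey the span table rules."""
    previous_end = None
    for row in rows:
        ts, duration = row.get("_ts"), row.get("_duration")
        if ts is None or duration is None:
            return False  # _ts and _duration are never NULL
        if duration <= 0:
            return False  # assumption: durations are positive, not merely non-zero
        if previous_end is not None and ts < previous_end:
            return False  # out of order, or overlapping the previous span
        previous_end = ts + duration
    return True


# The "Light color" span table from above; the gap between 3 and 4 is legal.
light_color = [
    {"_ts": 1, "_duration": 2, "color": "red"},
    {"_ts": 4, "_duration": 1, "color": "green"},
]

# A hypothetical broken table: the first span runs past the start of the second.
overlapping = [
    {"_ts": 1, "_duration": 3, "color": "red"},
    {"_ts": 2, "_duration": 1, "color": "green"},
]
```
]]></code>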
<aside class="note">
It's because span tables are always ordered by
<code>_ts</code> that DCTV disallows queries of the form
<code>SELECT SPAN ... ORDER BY ...</code>. Re-ordering a
span table makes no sense. If you want to
<code>SELECT</code> from a span table without making the
result a span table, you can instead view the span table as
a regular table by using the non-<code>SPAN</code> variant
of <code>SELECT</code> (<code>SELECT * FROM my_span_table</code>);
in this mode, <code>SELECT</code> lets you order
the result set however you want.
</aside>
<p>An <dfn>event table</dfn> is like a span table, but without
the <code>_duration</code> column. It represents a sequence
of "points" in time. The advantage of using an event table
over a regular SQL table to represent points is automatic
integration of the event table into time-based operations
on spans.</p>
<h3>Partitions</h3>
<p>A span table is either a <dfn>partitioned span table</dfn>
or a <dfn>non-partitioned span table</dfn>. A non-partitioned
span table is just the kind of span table described above.
A partitioned span table, by contrast, has an additional
special column, the <dfn>partition column</dfn>.
A partitioned span table is basically a bundle of logical
partition tables all combined into a single table under a
single name. Each distinct <emph>value</emph> of the
partition column, which is called a <dfn>partition</dfn>,
defines one independent sequence of spans.</p>
<p>All of DCTV's operations on span tables know about
partitioned span tables (the partition column is part of the
span table's type) and operate on each partition within a span
table independently. There are also operations that transform
a partitioned span table into a non-partitioned span table
through the use of SQL grouping operators.</p>
<p>It's useful to bundle sequences of spans this way instead of
putting each in its own table: using a partitioned span
table, we can operate on groups of related time series
uniformly without having to change our queries depending on
how many different time series we have. For example, a
CPU-related query should look the same on any system no matter
how many CPUs it has!</p>
<p>DCTV currently allows a span table to have either zero or
one partition column, but not more. This limit is just an
implementation limit, and in the future, DCTV will allow
partitioning by more than one column.</p>
<p>Let's look at our Christmas tree light example, but with
partitions. Here, we're looking at two lights, one called
"light#0" and another called "light#1". We use a sequence of
spans to describe each light's state. It's critical to
understand that each light has a distinct state history, but
that we store all of these histories in the same physical
table, using a column to describe the specific light that a
specific row describes.</p>
<aside class="note">For the remainder of this document, when
the character "#" appears in a span row label, it refers to a
specific partition of a partitioned span table.</aside>
<table class="spanop">
<caption>Colors of two lights</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
<tr>
<td>Light#0</td>
<td colspan="2">Red</td>
<td colspan="1" class="empty" />
<td colspan="1">Green</td>
</tr>
<tr>
<td>Light#1</td>
<td colspan="1">Green</td>
<td colspan="3">Red</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
</table>
<p>Here's the physical partitioned span table representation
of the logical spans from the above diagram.</p>
<table class="general spantable">
<caption>Colors of two lights (span table representation)</caption>
<tr>
<th>_ts</th>
<th>_duration</th>
<th>lightno</th>
<th>color</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0</td>
<td>red</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>green</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>1</td>
<td>red</td>
</tr>
<tr>
<td>4</td>
<td>1</td>
<td>0</td>
<td>green</td>
</tr>
</table>
<p>Like an unpartitioned span table, a partitioned span table
is ordered by increasing <code>_ts</code>. If spans
from two different partitions begin at the same time, the
relative order of the rows sharing that <code>_ts</code> value
is unspecified.</p>
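<p>To make the "bundle of logical tables" idea concrete, this
Python sketch (an illustration only; the helper names are
invented) splits the two-light table above into its partitions
and shows that the no-overlap rule holds within each partition
even though it does not hold across the table as a whole.</p>
<code class="blockquote"><![CDATA[
```python
from collections import defaultdict


def split_partitions(rows, partition_column):
    """Group the rows of a partitioned span table by partition value."""
    partitions = defaultdict(list)
    for row in rows:  # rows arrive ordered by _ts, so each partition stays ordered
        partitions[row[partition_column]].append(row)
    return dict(partitions)


def is_non_overlapping(spans):
    """True if every span ends at or before the start of the next one."""
    return all(a["_ts"] + a["_duration"] <= b["_ts"]
               for a, b in zip(spans, spans[1:]))


# The "Colors of two lights" table from above, partitioned by lightno.
two_lights = [
    {"_ts": 1, "_duration": 2, "lightno": 0, "color": "red"},
    {"_ts": 1, "_duration": 1, "lightno": 1, "color": "green"},
    {"_ts": 2, "_duration": 3, "lightno": 1, "color": "red"},
    {"_ts": 4, "_duration": 1, "lightno": 0, "color": "green"},
]
by_light = split_partitions(two_lights, "lightno")
```
]]></code>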
<aside class="example">
A real world use of spans is analyzing CPU-specific data.
On a multi-CPU system, each CPU has its own frequency.
A CPU might change from 800MHz to 1GHz and then down to
600MHz, while another CPU, at the same time, might change
its frequency from 600MHz to 800MHz and then up to 1GHz.
Each of the two time series (the first CPU's frequency
history and the second CPU's frequency history) is an
independent time series.
</aside>
<h3>Span operations</h3>
<p>While we can apply normal SQL querying operations to span
tables, we can answer certain questions much more conveniently
by using DCTV's special span operations, which are designed to
make it easy to work with real-world time series data.</p>
<h4>Span join</h4>
<p>The <dfn>span join</dfn> family of operations merges spans
together in a timewise-correct way and generates new spans
divided on the common boundaries of the spans that flow as
input into the span join.</p>
<p>It's easiest to demonstrate a span join visually.</p>
<!-- TODO(dancol): can we make this diagram more fun? -->
<table class="spanop">
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
<tr>
<td>Size</td>
<td colspan="2">tiny</td>
<td colspan="1">giant</td>
</tr>
<tr>
<td>Species</td>
<td colspan="1">fish</td>
<td colspan="2">squirrel</td>
</tr>
<tr class="result-divider"><td>SPAN JOIN</td></tr>
<tr>
<td>Phenotype</td>
<td>tiny fish</td>
<td>tiny squirrel</td>
<td>giant squirrel</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
</table>
<p>Here, we're joining two hypothetical time series (as
represented by span tables), a time series of sizes and a time
series of animal types. (Imagine we're trying to reconstruct
the state of an animal given a record of the transmutation
spells some novice sorcerer's apprentice might have haphazardly
cast.)</p>
<p>In this trace, the "make the animal tiny" spell was in
effect from timestamp one to timestamp three (inclusive), and
the "make the animal giant" spell was in effect from timestamp
three onward. Likewise, the "make the animal a fish" spell was in
effect from timestamp one to timestamp two (inclusive) and the
"make the animal a squirrel" spell was in effect from
timestamp two onward. The first row depicts the result of the
size spells, and the second row depicts the effect of the
animal-type spell. (We imagine that each spell cancels the
effect of the last spell of the same type.)</p>
<p>The last row, "phenotype", represents a span table giving
the type of animal that we observe at each moment, inferred
from the effects of the previous two rows. Note that the
result span table has a span division wherever any of the
inputs has a span division. We ensure that all the properties
of any of the input spans stay constant "within" any of the
output spans, allowing for correct future computation
involving these values.
</p>
<p>It may be informative to look at the row-wise representation
of the above span tables:</p>
<table class="general spantable">
<caption>Size</caption>
<tr>
<th>_ts</th>
<th>_duration</th>
<th>size</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>tiny</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>giant</td>
</tr>
</table>
<table class="general spantable">
<caption>Species</caption>
<tr>
<th>_ts</th>
<th>_duration</th>
<th>species</th>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>fish</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>squirrel</td>
</tr>
</table>
<table class="general spantable">
<caption>Phenotype</caption>
<tr>
<th>_ts</th>
<th>_duration</th>
<th>size</th>
<th>species</th>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>tiny</td>
<td>fish</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>tiny</td>
<td>squirrel</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>giant</td>
<td>squirrel</td>
</tr>
</table>
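<p>One way to reproduce the phenotype table by hand is to cut
time at every input boundary and keep the pieces that both
inputs cover. The Python sketch below does exactly that for
two unpartitioned span tables modeled as lists of dictionaries.
It is a deliberately naive illustration with invented helper
names, not a description of how DCTV implements span joins; the
inputs are the size and species spans as drawn in the diagram
above.</p>
<code class="blockquote"><![CDATA[
```python
SPECIAL = ("_ts", "_duration")


def covering_span(rows, start, end):
    """Return the span in rows that covers the region from start to end, or None."""
    for row in rows:
        if row["_ts"] <= start and end <= row["_ts"] + row["_duration"]:
            return row
    return None


def span_inner_join(left, right):
    """Join two unpartitioned span tables on their common boundaries."""
    boundaries = sorted({edge
                         for row in left + right
                         for edge in (row["_ts"], row["_ts"] + row["_duration"])})
    result = []
    for start, end in zip(boundaries, boundaries[1:]):
        left_span = covering_span(left, start, end)
        right_span = covering_span(right, start, end)
        if left_span is None or right_span is None:
            continue  # inner join: both inputs must cover the piece
        payload = {key: value
                   for row in (left_span, right_span)
                   for key, value in row.items() if key not in SPECIAL}
        result.append({"_ts": start, "_duration": end - start, **payload})
    return result


size = [
    {"_ts": 1, "_duration": 2, "size": "tiny"},
    {"_ts": 3, "_duration": 1, "size": "giant"},
]
species = [
    {"_ts": 1, "_duration": 1, "species": "fish"},
    {"_ts": 2, "_duration": 2, "species": "squirrel"},
]
phenotype = span_inner_join(size, species)
```
]]></code>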
<h4>Span join: inner and outer</h4>
<p>What happens when spans don't line up exactly?</p>
<p>Span joins come in two varieties, named after the varieties
of regular SQL joins: <dfn>inner span join</dfn> and
<dfn>outer span join</dfn>. When all the inputs to a span
join cover the same period of time, the difference doesn't
matter. But when there are gaps in one sequence or another,
the difference becomes important. Just as in the previous
section, we'll start with a diagram.</p>
<table class="spanop">
<caption>Sample inputs</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
<tr>
<td>Breath</td>
<td colspan="1">fire</td>
<td class="empty"/>
<td colspan="1">ice</td>
</tr>
<tr>
<td>Color</td>
<td colspan="1">red</td>
<td colspan="2">green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
</table>
<p>Here, we see that there is no magic breath spell in effect
from time two to time three, inclusive. What happens when we
perform a span join on these span tables? It depends on the
kind of span join.</p>
<table class="spanop">
<caption>Span inner join</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
<tr>
<td>Breath</td>
<td colspan="1">fire</td>
<td class="empty"/>
<td colspan="1">ice</td>
</tr>
<tr>
<td>Color</td>
<td colspan="1">red</td>
<td colspan="2">green</td>
</tr>
<tr class="result-divider">
<td>Span inner join</td>
</tr>
<tr>
<td>Phenotype</td>
<td colspan="1">fire-breathing red</td>
<td class="empty"/>
<td colspan="1">ice-breathing green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
</table>
<table class="spanop">
<caption>Span outer join</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
<tr>
<td>Breath</td>
<td colspan="1">fire</td>
<td class="empty"/>
<td colspan="1">ice</td>
</tr>
<tr>
<td>Color</td>
<td colspan="1">red</td>
<td colspan="2">green</td>
</tr>
<tr class="result-divider">
<td>Span outer join</td>
</tr>
<tr>
<td>Phenotype</td>
<td colspan="1">fire-breathing red</td>
<td colspan="1"><code>NULL</code>-breathing green</td>
<td colspan="1">ice-breathing green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
</table>
<p>In the span inner join case, we emit an output span only
when <i>all</i> input spans cover a time interval. In the
span outer join case, we emit an output span when <i>any</i>
input span covers a specific time region, providing NULL for
the value of any payload column not provided by a span for
that region.</p>
<p>The table representations of the two result span tables may
make the result more clear.</p>
<table class="general spantable">
<caption>Span inner join result (table view)</caption>
<tr>
<th>_ts</th>
<th>_duration</th>
<th>breath</th>
<th>color</th>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>fire</td>
<td>red</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>ice</td>
<td>green</td>
</tr>
</table>
<table class="general spantable">
<caption>Span outer join result (table view)</caption>
<tr>
<th>_ts</th>
<th>_duration</th>
<th>breath</th>
<th>color</th>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>fire</td>
<td>red</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td><code>NULL</code></td>
<td>green</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>ice</td>
<td>green</td>
</tr>
</table>
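<p>The difference between the two varieties comes down to one
condition. The following Python sketch (again a naive
illustration with invented names, not DCTV's implementation)
cuts time at every input boundary and then either requires both
inputs to cover a piece (inner) or only one of them (outer),
using <code>None</code> to stand in for
<code>NULL</code>.</p>
<code class="blockquote"><![CDATA[
```python
SPECIAL = ("_ts", "_duration")


def payload_columns(rows):
    """Names of the non-special columns that appear in rows."""
    return sorted({key for row in rows for key in row if key not in SPECIAL})


def covering_span(rows, start, end):
    """Return the span in rows that covers the region from start to end, or None."""
    for row in rows:
        if row["_ts"] <= start and end <= row["_ts"] + row["_duration"]:
            return row
    return None


def span_join(left, right, outer=False):
    """Inner or outer span join of two unpartitioned span tables."""
    boundaries = sorted({edge
                         for row in left + right
                         for edge in (row["_ts"], row["_ts"] + row["_duration"])})
    columns = payload_columns(left) + payload_columns(right)
    result = []
    for start, end in zip(boundaries, boundaries[1:]):
        found = [covering_span(table, start, end) for table in (left, right)]
        present = [row for row in found if row is not None]
        if not present:
            continue  # no input covers this piece: even an outer join emits nothing
        if not outer and len(present) != 2:
            continue  # inner join: every input must cover the piece
        out = {"_ts": start, "_duration": end - start}
        out.update({column: None for column in columns})  # None plays the role of NULL
        for row in present:
            out.update({k: v for k, v in row.items() if k not in SPECIAL})
        result.append(out)
    return result


breath = [
    {"_ts": 1, "_duration": 1, "breath": "fire"},
    {"_ts": 3, "_duration": 1, "breath": "ice"},
]
color = [
    {"_ts": 1, "_duration": 1, "color": "red"},
    {"_ts": 2, "_duration": 2, "color": "green"},
]
inner = span_join(breath, color)
outer = span_join(breath, color, outer=True)
```
]]></code>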
<p>Note that even a span outer join won't produce a result
span that covers a period of time that no input span covered,
as the following diagram indicates.</p>
<table class="spanop">
<caption>Holes in span outer join</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
<tr>
<td>Breath</td>
<td colspan="1" class="empty"/>
<td colspan="1" class="empty"/>
<td colspan="1">ice</td>
</tr>
<tr>
<td>Color</td>
<td class="empty"/>
<td colspan="1">red</td>
<td colspan="1">green</td>
</tr>
<tr class="result-divider">
<td>Span outer join </td>
</tr>
<tr>
<td>Phenotype</td>
<td class="empty"/>
<td colspan="1"><code>NULL</code>-breathing red</td>
<td colspan="1">ice-breathing green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span><span>4</span></td>
</tr>
</table>
<h4>Span broadcast</h4>
<p>A <dfn>span broadcast</dfn> is a special kind of span join
that operates on two span tables, one partitioned and one not.
Normally, DCTV treats each partition within a partitioned span
table as a separate time series and operates on each
independently; DCTV refuses to perform span operations on span
tables partitioned by different columns or between partitioned
and non-partitioned span tables, since the desired operation
isn't obvious.</p>
<p>With a span broadcast, we can tell DCTV to perform a
special kind of span join between a partitioned and
non-partitioned table, "broadcasting" the non-partitioned span
into every partition in the partitioned span table in such a
way that the result has useful properties.</p>
<p>The overall result is <emph>almost</emph> as if we copied
the non-partitioned span table N times, once for each of the N
partitions, into a new partitioned span table, and then joined
that new partitioned span table with the partitioned
span table that we started with. The difference
between this hypothetical operation and span broadcast is that
span broadcast doesn't generate any output spans for regions
not covered by any span in the partitioned span table, even if
that region is covered by the non-partitioned span table.</p>
<p>Another way to think of it is that span broadcast "labels"
each span in a partitioned span table with the payload of the
non-partitioned table. The output of a span broadcast
operation is partitioned in the same way as its partitioned
input.</p>
<p>As usual, a diagram may be illustrative. Here, "Size#0"
and "Size#1" indicate two partitions of the same partitioned
span table, "Size" (let's suppose animals 0 and 1 have
different size spells cast on them). "Color" is the input
non-partitioned span table (let's suppose color spells affect
all animals at the same time).</p>
<table class="spanop">
<caption>Sample inputs</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
<tr>
<td>Size #0</td>
<td colspan="1">tiny</td>
<td colspan="2">giant</td>
</tr>
<tr>
<td>Size #1</td>
<td colspan="3">tiny</td>
</tr>
<tr>
<td>Color</td>
<td colspan="1">red</td>
<td colspan="1" class="empty" />
<td colspan="2">green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
</table>
<p>Just like regular span joins, span broadcasts come in
<dfn>span inner broadcast</dfn> and <dfn>span outer
broadcast</dfn> varieties, depicted below. Note that the time
period from four to five doesn't appear in the result span
tables: from time four to time five, we have a color span
from the non-partitioned span table, but no spans from size, the
partitioned span table.</p>
<table class="spanop">
<caption>Inner broadcast of color into size</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
<tr>
<td>Size #0</td>
<td colspan="1">tiny</td>
<td colspan="2">giant</td>
</tr>
<tr>
<td>Size #1</td>
<td colspan="3">tiny</td>
</tr>
<tr>
<td>Color</td>
<td colspan="1">red</td>
<td colspan="1" class="empty" />
<td colspan="2">green</td>
</tr>
<tr class="result-divider">
<td>Inner broadcast</td>
</tr>
<tr>
<td>Result#0</td>
<td colspan="1">tiny red</td>
<td colspan="1" class="empty" />
<td colspan="1">giant green</td>
</tr>
<tr>
<td>Result#1</td>
<td colspan="1">tiny red</td>
<td colspan="1" class="empty" />
<td colspan="1">tiny green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
</table>
<table class="spanop">
<caption>Outer broadcast of color into size</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
<tr>
<td>Size #0</td>
<td colspan="1">tiny</td>
<td colspan="2">giant</td>
</tr>
<tr>
<td>Size #1</td>
<td colspan="3">tiny</td>
</tr>
<tr>
<td>Color</td>
<td colspan="1">red</td>
<td colspan="1" class="empty" />
<td colspan="2">green</td>
</tr>
<tr class="result-divider">
<td>Outer broadcast</td>
</tr>
<tr>
<td>Result#0</td>
<td colspan="1">tiny red</td>
<td colspan="1"><code>NULL</code>-colored giant</td>
<td colspan="1">giant green</td>
</tr>
<tr>
<td>Result#1</td>
<td colspan="1">tiny red</td>
<td colspan="1"><code>NULL</code>-colored tiny</td>
<td colspan="1">tiny green</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span><span>5</span></td>
</tr>
</table>
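<p>The "labeling" view of span broadcast translates fairly
directly into code. In this Python sketch, each span of the
partitioned table is cut wherever the non-partitioned table has
a boundary, and each piece is labeled with the non-partitioned
payload covering it. The helper names and the
<code>animal</code> partition column are invented for the
example, <code>None</code> stands in for <code>NULL</code>, and
the approach is a naive illustration rather than DCTV's
implementation.</p>
<code class="blockquote"><![CDATA[
```python
SPECIAL = ("_ts", "_duration")


def covering_span(rows, start, end):
    """Return the span in rows that covers the region from start to end, or None."""
    for row in rows:
        if row["_ts"] <= start and end <= row["_ts"] + row["_duration"]:
            return row
    return None


def span_broadcast(partitioned, broadcast, outer=False):
    """Label every span of a partitioned table with the broadcast table's payload."""
    label_columns = sorted({k for row in broadcast for k in row if k not in SPECIAL})
    result = []
    for span in partitioned:  # each row already belongs to exactly one partition
        start, end = span["_ts"], span["_ts"] + span["_duration"]
        cuts = {start, end}
        for row in broadcast:
            for edge in (row["_ts"], row["_ts"] + row["_duration"]):
                if start < edge < end:
                    cuts.add(edge)  # split only inside the partitioned span
        cuts = sorted(cuts)
        for piece_start, piece_end in zip(cuts, cuts[1:]):
            label = covering_span(broadcast, piece_start, piece_end)
            if label is None and not outer:
                continue  # inner broadcast drops unlabeled pieces
            out = dict(span, _ts=piece_start, _duration=piece_end - piece_start)
            for column in label_columns:
                out[column] = None if label is None else label.get(column)
            result.append(out)
    result.sort(key=lambda row: row["_ts"])  # keep the result ordered by _ts
    return result


size = [
    {"_ts": 1, "_duration": 1, "animal": 0, "size": "tiny"},
    {"_ts": 1, "_duration": 3, "animal": 1, "size": "tiny"},
    {"_ts": 2, "_duration": 2, "animal": 0, "size": "giant"},
]
color = [
    {"_ts": 1, "_duration": 1, "color": "red"},
    {"_ts": 3, "_duration": 2, "color": "green"},
]
inner = span_broadcast(size, color)
outer = span_broadcast(size, color, outer=True)
```
]]></code>
<p>Because the loop only ever walks spans of the partitioned
table, nothing is emitted for the period from four to five,
which only the color table covers.</p>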
<p>In general, we use a span broadcast when we have a number
of different things happening at the same time (each
represented by one partition of a span table) and we want to
"mix into" this span table knowledge of something that affects
the environment as a whole.</p>
<aside class="example">We might denote a period of a few
seconds during which the app
com.flashlightco.myflashlight starts up in response to a
launcher tap. (This is not an efficient flashlight app.) If
we have a table of process activity, partitioned by CPU, we
can apply a span inner broadcast to the process activity table
and narrow our view of that table to the interval during which
the flashlight app was starting, but keep the result
partitioned by CPU.
</aside>
<h4>Span group</h4>
<p>A <dfn>span group</dfn> operation is the opposite of a span
join, in a sense. It merges spans together and applies SQL
set functions (like <code>MAX</code> and <code>SUM</code>) to
the payloads of the merged spans, forming for each payload a
combined value determined through the usual SQL aggregation
operation.</p>
<p>Here's a diagram.</p>
<table class="spanop">
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
<tr>
<td>Number arms</td>
<td>2</td>
<td>5</td>
<td>0</td>
<td>7</td>
<td>2</td>
<td>4</td>
<td>9</td>
<td>0</td>
</tr>
<tr>
<td>Periods</td>
<td colspan="2">A</td>
<td colspan="2">B</td>
<td colspan="2">C</td>
<td colspan="2">D</td>
</tr>
<tr class="result-divider"><td>Span group</td></tr>
<tr>
<td><code>MAX(arms)</code></td>
<td colspan="2">5</td>
<td colspan="2">7</td>
<td colspan="2">4</td>
<td colspan="2">9</td>
</tr>
<tr>
<td><code>MIN(arms)</code></td>
<td colspan="2">2</td>
<td colspan="2">0</td>
<td colspan="2">2</td>
<td colspan="2">0</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
</table>
<p>Here, our hapless sorcerer repeatedly changed the number
of arms that our poor animal had. We want to
determine, based on the record of arm-number changes, for each
relatively broad interval A, B, C, and D, the minimum and
maximum number of arms our animal had during that
interval.</p>
<p>A span group operation involves two span tables: the
<dfn>grouped</dfn> table and the <dfn>grouper</dfn> table.
The grouped table ("number of arms", in our example) supplies
the source data for the grouping operations; the grouper table
(here, "periods") supplies the spans that define the groups
and hence the output spans. The grouped table may or may not be
partitioned; if it is partitioned, DCTV applies grouping to
each partition individually. Currently, the grouper table
must not be partitioned.</p>
<p>A span group operation always emits one output span for
each span in its grouper input span table. If no grouped span
overlaps with a given grouper span, all its aggregate values
end up being <code>NULL</code>. An example follows.</p>
<table class="spanop">
<caption>Illustration of span group behavior with missing
grouped values</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
<tr>
<td>Number arms</td>
<td>2</td>
<td>5</td>
<td>0</td>
<td class="empty" />
<td class="empty" />
<td class="empty" />
<td>9</td>
<td>0</td>
</tr>
<tr>
<td>Periods</td>
<td colspan="2">A</td>
<td colspan="2">B</td>
<td colspan="2">C</td>
<td colspan="2">D</td>
</tr>
<tr class="result-divider"><td>Span group</td></tr>
<tr>
<td><code>MAX(arms)</code></td>
<td colspan="2">5</td>
<td colspan="2">0</td>
<td colspan="2"><code>NULL</code></td>
<td colspan="2">9</td>
</tr>
<tr>
<td><code>MIN(arms)</code></td>
<td colspan="2">2</td>
<td colspan="2">0</td>
<td colspan="2"><code>NULL</code></td>
<td colspan="2">0</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
</table>
<!-- TODO(dancol): make this paragraph more clear -->
<p>Span group operations have two flavors: <dfn>span group and
intersect</dfn> and <dfn>span group and union</dfn>.
The difference matters only when the grouped table has
multiple partitions. In intersect mode, DCTV includes payloads
from the grouped span table only for intervals in which every
partition has a span; in union mode, it includes payloads for
intervals in which at least one partition has a span.</p>
<aside class="note">If we want the output of a span group to
include only the regions of time covered by the grouped spans,
first span join the grouper with the grouped span table, then
use the result as the grouper span table in the span
group.</aside>
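<p>The span group semantics described above can be modeled in a
few lines of Python. This is only a sketch, not DCTV's
implementation: the function name <code>span_group</code>, the
tuple layout, and the half-open interval convention are all
assumptions made for illustration. It reproduces the "missing
grouped values" table, with <code>None</code> standing in for
<code>NULL</code>.</p>

```python
def span_group(grouped, grouper, aggregates):
    """Aggregate grouped-span payloads over each grouper span.

    grouped:    list of (start, end, value) spans, half-open [start, end)
    grouper:    list of (start, end, label) spans
    aggregates: dict mapping an output name to a function such as max or min
    """
    out = []
    for g_start, g_end, label in grouper:
        # Collect payloads of grouped spans that overlap this grouper span.
        values = [
            value
            for start, end, value in grouped
            if end > g_start and g_end > start
        ]
        # One output row per grouper span; aggregates are None (NULL)
        # when no grouped span overlaps.
        row = {"period": label}
        for name, fn in aggregates.items():
            row[name] = fn(values) if values else None
        out.append(row)
    return out


# The "missing grouped values" example: no arm data from time 4 to time 7.
arms = [(1, 2, 2), (2, 3, 5), (3, 4, 0), (7, 8, 9), (8, 9, 0)]
periods = [(1, 3, "A"), (3, 5, "B"), (5, 7, "C"), (7, 9, "D")]
result = span_group(arms, periods, {"max_arms": max, "min_arms": min})
for row in result:
    print(row)
```

<p>Period C overlaps no grouped span, so both of its aggregates
come out as <code>None</code>, while the other periods match
the table.</p>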
<h4>Span departition</h4>
<p>A <dfn>span departition</dfn> operation transforms a
partitioned span table into a non-partitioned span table by
combining the partition payloads with SQL set functions.
This operation is useful mainly when we have a "split up" view
of activity on the system and want to derive a whole-system
view by matching up all the partitions.</p>
<p>To return to our magical forensics example, imagine our
apprentice cast some very expensive add-arms-to-animals spells
on a number of different animals. We're billed for arms based
on the total number we're using at any one time (there's a
license server and everything), so we want to reconstruct,
based on a record of each animal's arm count, the number of
arms we were using in total at a particular moment. In the
following table, "Arms#0", "Arms#1", and so on denote the
partitions of a single "Arms" span table.</p>
<table class="spanop">
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
<tr>
<td>Arms#0</td>
<td colspan="3">2</td>
<td colspan="2">7</td>
<td>4</td>
<td>9</td>
<td>0</td>
</tr>
<tr>
<td>Arms#1</td>
<td colspan="5">2</td>
<td colspan="3">4</td>
</tr>
<tr class="result-divider"><td>Departition</td></tr>
<tr>
<td><code>SUM(arms)</code></td>
<td colspan="3">4</td>
<td colspan="2">9</td>
<td colspan="1">8</td>
<td colspan="1">13</td>
<td colspan="1">4</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
</table>
<p>A span departition operation resembles a span join
followed by a span group operation, but it's specified
separately so that we can work with partitioned span tables
without knowing in advance how many partitions we have or
having to expand our queries to work with each partition
separately.</p>
<p>Span departitions come in two varieties, the <dfn>span
departition and union</dfn> and <dfn>span departition and
intersect</dfn> operations, with the difference concerning the
treatment of missing data. The following tables illustrate the
difference, starting with an arm history in which each
partition lacks data for part of the timeline.</p>
<table class="spanop">
<caption>Arm history with missing data</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
<tr>
<td>Arms#0</td>
<td colspan="3" class="empty" />
<td colspan="2">7</td>
<td>4</td>
<td>9</td>
<td>0</td>
</tr>
<tr>
<td>Arms#1</td>
<td colspan="5">2</td>
<td colspan="3" class="empty" />
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
</table>
<p>In intersect mode, we generate an output span for a region
of time only when <emph>all</emph> partitions have a span
covering that period.</p>
<table class="spanop">
<caption>Departition intersect result</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
<tr>
<td>Arms#0</td>
<td colspan="3" class="empty" />
<td colspan="2">7</td>
<td>4</td>
<td>9</td>
<td>0</td>
</tr>
<tr>
<td>Arms#1</td>
<td colspan="5">2</td>
<td colspan="3" class="empty" />
</tr>
<tr class="result-divider">
<td>Departition intersect</td>
</tr>
<tr>
<td><code>SUM(arms)</code></td>
<td colspan="3" class="empty" />
<td colspan="2">9</td>
<td colspan="3" class="empty" />
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
</table>
<p>By contrast, in union mode, we generate an output span when
<emph>any</emph> partition covers a unit in time. We treat
any missing partitions as contributing <code>NULL</code> to
the output aggregation for each span. Note that SQL
aggregation functions just skip <code>NULL</code> values, so
the sums below are correct.</p>
<table class="spanop">
<caption>Departition union result</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
<tr>
<td>Arms#0</td>
<td colspan="3" class="empty" />
<td colspan="2">7</td>
<td>4</td>
<td>9</td>
<td>0</td>
</tr>
<tr>
<td>Arms#1</td>
<td colspan="5">2</td>
<td colspan="3" class="empty" />
</tr>
<tr class="result-divider">
<td>Departition union</td>
</tr>
<tr>
<td><code>SUM(arms)</code></td>
<td colspan="3">2</td>
<td colspan="2">9</td>
<td colspan="1">4</td>
<td colspan="1">9</td>
<td colspan="1">0</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span></td>
<td><span>7</span></td>
<td><span>8</span><span>9</span></td>
</tr>
</table>
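<p>The two departition modes can be sketched in Python as
follows. This is an illustrative model under stated
assumptions, not DCTV's implementation: the function name
<code>departition</code>, the dictionary-of-partitions input,
and the half-open intervals are hypothetical, and the sketch
assumes that spans within a single partition never overlap.
The input is the "arm history with missing data" shown
earlier.</p>

```python
def departition(partitions, mode="union", aggregate=sum):
    """Merge partitioned spans into one unpartitioned span list.

    partitions: dict mapping a partition key to a list of
                (start, end, value) spans, half-open [start, end)
    mode:       "intersect" emits output only where every partition has
                a span; "union" emits output where any partition has one
    """
    # Every span boundary is a point at which the output may change.
    cuts = sorted({t
                   for spans in partitions.values()
                   for start, end, _ in spans
                   for t in (start, end)})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Payloads of the spans covering [lo, hi).  Absent partitions
        # contribute nothing, like NULLs skipped by SQL aggregates.
        values = [value
                  for spans in partitions.values()
                  for start, end, value in spans
                  if lo >= start and end >= hi]
        if not values:
            continue
        if mode == "intersect" and len(values) != len(partitions):
            continue
        out.append((lo, hi, aggregate(values)))
    return out


# Arm history with missing data: partition 0 starts at time 4,
# partition 1 ends at time 6.
arms = {
    0: [(4, 6, 7), (6, 7, 4), (7, 8, 9), (8, 9, 0)],
    1: [(1, 6, 2)],
}
print(departition(arms, "intersect"))
print(departition(arms, "union"))
```

<p>Intersect mode yields a single span covering times four to
six with a sum of 9; union mode yields five spans with sums 2,
9, 4, 9, and 0, as in the result tables.</p>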
<h3>Trace processing intrinsic functions</h3>
<p>DCTV aims to be a general-purpose time series analysis
program, one that just happens to be especially useful for
processing Android system traces. Its general approach is to
avoid system- and metric-specific data processing routines and
provide general-purpose operators that users can combine to
analyze data in particular situations.</p>
<p>The previous section describes operations that DCTV
provides in the form of query operators. DCTV also provides
some operations, usually less common ones, in the form of
table-valued functions.</p>
<h4><a name="time_series_to_span_conversion">Time series to span conversion</a></h4>
<p>Recall that DCTV exposes events from trace files as raw
data points, in event tables. We have to build span tables
from these raw data somehow, and the <a
href="#time_series_to_spans"><code>time_series_to_spans</code></a>
table-valued function does exactly that.</p>
<p><code>time_series_to_spans</code> takes as input a set of
event sources and a set of output column descriptors and
produces a span table as output. Logically, it consumes
events from the given sources, in time order, and constructs
spans by watching for "start" and "stop" events as denoted by
the input sources. Payload values attached to the event
sources become payload columns of the output span table
according to each output column's
specification.</p>
<p>Each source is either a "start-start" source or a "stop"
source. A start-start source models a set of events that
divide a timeline up into discrete chunks: each event ends the
span in progress and starts a new one.</p>
<p>Returning for a moment to our hypothetical wizardly
apprentice, we recall that an animal's size might change as
our apprentice casts various "change size" spells on it.
The raw, event-by-event record of spells cast by our
apprentice might look like this.</p>
<table class="general spantable">
<caption>Raw size spell record</caption>
<tr>
<th>_ts</th>
<th>size</th>
</tr>
<tr>
<td>1</td>
<td>tiny</td>
</tr>
<tr>
<td>3</td>
<td>huge</td>
</tr>
<tr>
<td>4</td>
<td>large</td>
</tr>
<tr>
<td>6</td>
<td>huge</td>
</tr>
</table>
<p>Processing this raw event table into spans using
<code>time_series_to_spans</code>, we end up with a span table
that looks like this. (The time scale goes to seven for
easier comparison with the next example.)</p>
<table class="spanop">
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
<tr>
<td>Size</td>
<td colspan="2">tiny</td>
<td colspan="1">huge</td>
<td colspan="2">large</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
</table>
<aside class="note">The final "huge" spell isn't reflected in
the output span table, because
<code>time_series_to_spans</code> ignores spans left "open"
(i.e., unclosed) at the end of processing. The intent of this
feature is to work with span inner join operations to
automatically ignore noisy partial-data "junk intervals" at
the beginning and end of traces. If a need arises,
<code>time_series_to_spans</code> could be extended in the
future to automatically close open spans.</aside>
<p><code>time_series_to_spans</code> also supports "stop"
events. These events don't start new spans, but do indicate
that any open span active at the time of the stop event should
be finished. In an operating system context, if
<code>sched_switch</code> is a start-start event, a CPU
hotplug off event might be a "stop" event, since it would
indicate that a CPU has stopped running tasks without
starting any new ones.</p>
<p>To return to our unfortunate apprentice example, suppose we
have an additional table of "size reset" spells that we know
were cast during the sequence of size change spells. A size
reset spell just returns a creature to whatever size it had
without any magical augmentation. The raw table might look
something like this.</p>
<table class="general spantable">
<caption>Raw size-reset spell record</caption>
<tr>
<th>_ts</th>
</tr>
<tr>
<td>5</td>
</tr>
<tr>
<td>7</td>
</tr>
</table>
<p>If we feed both our original size spell record event table
<emph>and</emph> our size-reset spell table into
<code>time_series_to_spans</code>, we end up with a span table
that looks like this.</p>
<table class="spanop">
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
<tr>
<td>Size</td>
<td colspan="2">tiny</td>
<td colspan="1">huge</td>
<td colspan="1">large</td>
<td colspan="1" class="empty" />
<td colspan="1">huge</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
</table>
<p>Note the differences: first, we now have a "hole" between
times five and six, because the stop table told us that we
stopped changing our poor confused creature's size at time
five and didn't start changing it again until time six.
Second, we have a "huge" span from time six to seven, because
the span beginning at time six is no longer left open after
<code>time_series_to_spans</code> ends.</p>
<aside class="note">If you want a span table that substitutes
a concrete value (say, "normal") for the hole, you can combine
a span outer join of the whole-trace span with
<code>COALESCE</code> on the payload column to make
one.</aside>
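<p>The behavior of start-start and stop events can be modeled
with a short Python sketch. It is not the real
<code>time_series_to_spans</code>: the name
<code>events_to_spans</code> and its argument layout are
hypothetical, only a single payload column drawn from the
"rising" edge is handled, and partitioning is left out.</p>

```python
def events_to_spans(starts, stops=()):
    """Build (start, end, payload) spans from timestamped events.

    starts: list of (ts, payload) "start-start" events
    stops:  iterable of timestamps of "stop" events
    """
    # Tag each event, then process everything in time order.
    events = [(ts, "start", payload) for ts, payload in starts]
    events += [(ts, "stop", None) for ts in stops]
    events.sort(key=lambda event: event[0])

    spans = []
    open_span = None  # (start_ts, payload) of the span in progress
    for ts, kind, payload in events:
        if open_span is not None:
            # Any event closes the span in progress.
            spans.append((open_span[0], ts, open_span[1]))
        # Only a start event opens a new span.
        open_span = (ts, payload) if kind == "start" else None
    # A span still open at the end of the input is dropped.
    return spans


size_spells = [(1, "tiny"), (3, "huge"), (4, "large"), (6, "huge")]
print(events_to_spans(size_spells))
print(events_to_spans(size_spells, stops=[5, 7]))
```

<p>Without stops, the final "huge" event never closes and so
produces no span. With stops at times five and seven, the
"large" span ends at five, a hole follows, and the last "huge"
span runs from six to seven.</p>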
<p>Each payload column that <code>time_series_to_spans</code>
generates is described by a "source specification".
The specification describes, for each output column, the
source event table from which we get the column's value and
the "edge" from which we draw the value. (The edge defaults
to "rising".) Using the "rising" edge means that we draw the
output payload column for a span from the event that started
the span; using "falling" instead tells
<code>time_series_to_spans</code> to draw the payload column
value from the <emph>closing</emph> event. We typically stick
with "rising" except in special cases.</p>
<p><code>time_series_to_spans</code> supports creating
partitioned span tables as well; each source specification can
be associated with a partition column in that source table.
All sources for a given call to
<code>time_series_to_spans</code> must be partitioned the same
way.</p>
<h4><a name="stackification">Stackification</a></h4>
<p>Not all raw input events look like a series of starts and
stops on a timeline. Another common pattern in raw input is
the "start-stop stack", in which a series of nested and
balanced start and stop events describe the construction and
teardown of a stack of some kind of thing.</p>
<p>Stacks can be anything: examples include procedure call
stacks, Android synchronous atrace regions, and nested
interrupt handlers. To keep with our hapless-apprentice
example theme, we'll imagine that spells are prepared by
simultaneous chanting, waving, and stirring, and that we have
distinct "start" and "stop" records for each activity.</p>
<p>Suppose we know at what time our apprentice starts a given
activity and know at what time an activity ends. Suppose also
that our apprentice at least paid enough attention in class to
understand that one always stops the magical activity one most
recently started.</p>
<p>(Note that at time five, a second chant begins even though
a chant was already ongoing. A friend must have joined
in.)</p>
<table class="general spantable">
<caption>Spell starts</caption>
<tr>
<th>_ts</th>
<th>activity</th>
</tr>
<tr>
<td>1</td>
<td>stir</td>
</tr>
<tr>
<td>3</td>
<td>wave</td>
</tr>
<tr>
<td>4</td>
<td>chant</td>
</tr>
<tr>
<td>5</td>
<td>chant</td>
</tr>
</table>
<table class="general spantable">
<caption>Spell stops</caption>
<tr>
<th>_ts</th>
</tr>
<tr>
<td>2</td>
</tr>
<tr>
<td>7</td>
</tr>
<tr>
<td>7</td>
</tr>
<tr>
<td>7</td>
</tr>
</table>
<p>What happens if we rearrange these data into spans?</p>
<table class="spanop">
<caption>Notional stackified spells</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
<tr>
<td>Effects</td>
<td colspan="1">[stir]</td>
<td colspan="1" class="empty" />
<td colspan="1">[wave]</td>
<td colspan="1">[wave, chant]</td>
<td colspan="2">[wave, chant, chant]</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
</table>
<p>This arrangement makes logical sense, but it isn't quite
compatible with DCTV's data model. Note that the value of
each cell is actually a list! Unlike some databases, DCTV
does not support composite (multi-part) values as column
values. But here we apparently have composite values in the
cells. How do we represent these spans as tables? By <a
href="https://en.wikipedia.org/wiki/Database_normalization">
normalization</a>.</p>
<table class="general spantable">
<caption>Stack contents</caption>
<tr>
<th>stack_id</th>
<th>depth</th>
<th>token</th>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>stir</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>wave</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
<td>wave</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>chant</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>wave</td>
</tr>
<tr>
<td>4</td>
<td>1</td>
<td>chant</td>
</tr>
<tr>
<td>4</td>
<td>2</td>
<td>chant</td>
</tr>
</table>
<table class="spanop">
<caption>Normalized stackified spells</caption>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
<tr>
<td>Stack Id</td>
<td colspan="1">1</td>
<td colspan="1" class="empty" />
<td colspan="1">2</td>
<td colspan="1">3</td>
<td colspan="2">4</td>
</tr>
<tr class="times">
<td>Time ▶</td>
<td><span>1</span></td>
<td><span>2</span></td>
<td><span>3</span></td>
<td><span>4</span></td>
<td><span>5</span></td>
<td><span>6</span><span>7</span></td>
</tr>
</table>
<p>Now, we can look up the stack corresponding to each span by
looking at that span's stack id payload and joining it against
the stack contents table. The stackify DCTV intrinsic
processes any kind of stack into these two tables (the stack
contents regular table and the "stack history" span
table).</p>
<aside class="note">This approach is admittedly pretty ugly,
but it works. If it turns out to be a big enough
problem, we may just implement support for composite values
(which is really just this approach under the hood).</aside>
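<p>The normalization described above can be sketched in Python.
This is a model of the idea rather than the stackify intrinsic
itself: the function name, the tuple layouts, and the choice to
reuse one stack id whenever an identical stack recurs are
assumptions. The sketch also assumes balanced input, as the
text does.</p>

```python
def stackify(starts, stops):
    """Turn nested start/stop events into two normalized tables.

    starts: list of (ts, token) start events
    stops:  list of timestamps; each stop pops the most recent start
    Returns (contents, history):
      contents: list of (stack_id, depth, token) rows
      history:  list of (start, end, stack_id) spans
    """
    events = [(ts, 1, token) for ts, token in starts]
    events += [(ts, 0, None) for ts in stops]
    # Sort by time; at equal times, stops (0) come before starts (1).
    events.sort(key=lambda event: (event[0], event[1]))

    contents, history = [], []
    stack_ids = {}  # tuple of tokens -> stack id
    stack = []      # current stack, bottom first
    since = None    # time at which the current stack took effect
    for ts, is_start, token in events:
        if stack and ts > since:
            # The stack was non-empty for a while: record that span.
            key = tuple(stack)
            if key not in stack_ids:
                stack_ids[key] = len(stack_ids) + 1
                contents += [(stack_ids[key], depth, tok)
                             for depth, tok in enumerate(stack)]
            history.append((since, ts, stack_ids[key]))
        if is_start:
            stack.append(token)
        else:
            stack.pop()
        since = ts
    return contents, history


spell_starts = [(1, "stir"), (3, "wave"), (4, "chant"), (5, "chant")]
spell_stops = [2, 7, 7, 7]
contents, history = stackify(spell_starts, spell_stops)
print(contents)
print(history)
```

<p>For the spell example, <code>history</code> holds the four
spans of the stack history table and <code>contents</code>
holds the seven rows of the stack contents table.</p>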
<h4><a name="span_generation">Generating span tables from thin air</a></h4>
<p>There are general utility functions to generate specialized
span tables useful for composing with others.
The <code>generate_sequential_spans</code> table-valued
function generates a sequence of spans according to the start
time, stop time, and duration specified in the call.
It's useful for generating spans to quantize the timeline into
discrete intervals and for generating "whole trace" spans that
act as inputs to span joins.</p>
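<p>As a rough illustration of what such a generator produces,
here is a Python sketch. The name
<code>sequential_spans</code> and the decision to clip the
final span at the stop time are assumptions; the real
<code>generate_sequential_spans</code> may treat a trailing
partial interval differently.</p>

```python
def sequential_spans(start, stop, duration):
    """Return (ts, duration) spans tiling [start, stop) back to back.

    The final span is clipped so that it does not run past `stop`.
    """
    if not duration > 0:
        raise ValueError("duration must be positive")
    spans = []
    ts = start
    while stop > ts:
        spans.append((ts, min(duration, stop - ts)))
        ts += duration
    return spans


# Quantize a ten-unit timeline into four-unit buckets.
print(sequential_spans(0, 10, 4))
# A single "whole trace" span: one bucket as long as the timeline.
print(sequential_spans(0, 10, 10))
```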
<p>Each trace namespace has a few convenience functions for
succinctly generating, using
<code>generate_sequential_spans</code>, certain kinds of span
tables. See the <a href="#standard_library">"standard
library"</a> reference below.</p>
<h3>Dimensional analysis</h3>
<p>DCTV provides a dimensional analysis feature to make it
easy and natural to query traces using naturally-specified
values and to avoid errors that can arise from accidental
nonsensical combinations of incompatible units. Each quantity
in a query is associated with a <dfn>unit</dfn> and these
units propagate through the query as it is processed.
Quantities with different units combine according to the rules
of <a
href="https://en.wikipedia.org/wiki/Dimensional_analysis">dimensional
analysis</a>. DCTV also knows how to convert from one
compatible unit to another. DCTV will signal errors rather
than produce results that are dimensional nonsense.</p>
<h4>Unit specification</h4>
<p>Units come from two sources:</p>
<ol>
<li>intrinsic tagging of
quantities with units during trace parsing, and</li>
<li>explicit
tagging of quantities with units in query syntax.</li>
</ol>
<p>The syntax for specifying a unit is just adding the name of
the unit after a numeric literal. For simple alphanumeric
unit names, a bare word is sufficient, e.g., <code>4ns</code>.
For more complicated units that contain operators that SQL
would otherwise interpret as part of expressions, the unit
name needs to be quoted with backticks, as in
<code>4`miles/hour`</code>. Without the backticks, SQL would
interpret <code>4miles/hour</code> as an attempt to divide the
quantity <code>4</code> by the column <code>hour</code>, which
is probably not what we want.</p>
<h4>Unit names</h4>
<!-- TODO(dancol): include the unit name list here -->
<p>DCTV understands both common and abbreviated names for
units. This document will eventually list all understood unit
names; for the moment, see <a
href="https://team.git.corp.google.com/dctv/dctv/+/master/src/dctv/units.txt">units.txt</a>
in the DCTV source code.</p>
<h4>Unit conversion</h4>
<p>Queries can explicitly convert units from one type to
another using the <code>IN</code> operator. <!-- TODO(dancol):
link to syntax --></p>
<aside class="example code-example"><![CDATA[DCTV> SELECT 4`inches` IN cm;
4 IN cm [cm]
------------
10.16]]></aside>
<p>In the DCTV REPL, column headers that denote a quantity
with unit list that unit in square brackets after the column
name. Above, we see <code>[cm]</code> at the end of the
column name, indicating that the <code>10.16</code> is
specified in terms of centimeters.</p>
<p>DCTV's unit analysis also understands rates. In the
example below, DCTV gives a unit in terms of miles, because
we're multiplying a rate, in miles per hour, by a unit of
time. The time unit here need not be the literal unit used in
the rate: DCTV will convert units as needed.</p>
<aside class="example code-example"><![CDATA[DCTV> SELECT 4`miles/hour` * 2`days`;
(4 * 2) [mi]
------------
192]]></aside>
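<p>The arithmetic behind these two results can be reproduced
with a small Python model of dimensional analysis. It is a
sketch only: the <code>UNITS</code> table and the
<code>Quantity</code> class are hypothetical and unrelated to
DCTV's internals. Each quantity carries a magnitude in base
units (metres and seconds here) plus a map of dimension
exponents; multiplication and division add or subtract
exponents, and conversion is refused when dimensions differ.
The conversion factors are the standard definitions of the
inch (0.0254 m) and the mile (1609.344 m).</p>

```python
from fractions import Fraction

# unit name -> (size of one unit in base units, dimension exponents)
UNITS = {
    "cm": (Fraction(1, 100), {"length": 1}),
    "inches": (Fraction(254, 10000), {"length": 1}),
    "miles": (Fraction(1609344, 1000), {"length": 1}),
    "hour": (Fraction(3600), {"time": 1}),
    "days": (Fraction(86400), {"time": 1}),
}


class Quantity:
    def __init__(self, base_value, dims):
        self.base_value = Fraction(base_value)
        # Drop zero exponents so that cancelled dimensions disappear.
        self.dims = {name: power for name, power in dims.items() if power}

    @classmethod
    def of(cls, value, unit):
        scale, dims = UNITS[unit]
        return cls(Fraction(value) * scale, dims)

    def _combine(self, other, sign):
        dims = dict(self.dims)
        for name, power in other.dims.items():
            dims[name] = dims.get(name, 0) + sign * power
        return dims

    def __mul__(self, other):
        return Quantity(self.base_value * other.base_value,
                        self._combine(other, 1))

    def __truediv__(self, other):
        return Quantity(self.base_value / other.base_value,
                        self._combine(other, -1))

    def to(self, unit):
        """Return the magnitude expressed in `unit`, or raise on mismatch."""
        scale, dims = UNITS[unit]
        if dims != self.dims:
            raise ValueError(f"cannot convert to {unit}: dimensions differ")
        return self.base_value / scale


print(float(Quantity.of(4, "inches").to("cm")))
speed = Quantity.of(4, "miles") / Quantity.of(1, "hour")
print(float((speed * Quantity.of(2, "days")).to("miles")))
```

<p>Asking for <code>Quantity.of(2, "days").to("cm")</code>
raises <code>ValueError</code>, which mirrors the way DCTV
signals an error instead of producing dimensional
nonsense.</p>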
<h2><a name="differences">Differences from standard SQL</a></h2>
<h3>Nested namespaces</h3>
<p>Standard SQL provides a two-level namespace for tables:
each table is named by an optional schema (followed by a dot),
and then a table name. DCTV, by contrast, allows for
arbitrarily deep nesting of namespaces, with each namespace
component separated by a period. (SQL's standard syntax is a
special case.) We use the nested namespace syntax to talk
about specific tables and views embedded in a "trace
sub-namespace", which we form when we mount a trace into the
global SQL namespace.</p>
<h3>Keyword arguments</h3>
<p>Standard SQL allows only positional arguments to function
calls. DCTV allows for Python-style keyword arguments as
well, with each keyword separated from its value by the "=&gt;"
token. See the <a href="#syntax">syntax reference</a> for
details.</p>
<h3>Extended table-valued-function-call syntax</h3>
<p>DCTV exposes some facilities as table-valued functions.
The arguments to these functions are evaluated in a context
different from normal SQL expression evaluation, and in this
context, DCTV supports extended syntax, including the use of
list and dictionary literals. (Table-valued functions are
Python functions and these list and dictionary literals become
list and dict values inside calls.) See the syntax reference
for details.</p>
<h3>Miscellaneous syntax extensions</h3>
<p>DCTV tries to keep users from having to fight with the
syntax. Wherever SQL requires a list of something to be
comma-separated, DCTV allows and ignores a trailing comma.
Where SQL requires a list terminator (e.g., semicolons after
each query statement), DCTV allows users to omit the list
terminator.</p>
<p>DCTV recognizes <code>&lt;&gt;</code> and the C-style
<code>!=</code> operators as equivalent.</p>
<p>DCTV provides the "spaceship" and "anti-spaceship"
operators <code>&lt;=&gt;</code> and <code>&lt;!=&gt;</code>,
respectively, which act like <code>==</code> and
<code>!=</code>, except that they treat <code>NULL</code> as
being equal to itself. (MySQL calls these operators "null
safe comparison operators".)</p>
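<p>The difference between ordinary and null-safe comparison can
be shown with a few lines of Python, using <code>None</code>
for <code>NULL</code>. The function names are made up for this
illustration, and the sketch assumes the MySQL-style behavior
mentioned above, in which a null-safe comparison always yields
a definite true or false.</p>

```python
def sql_eq(a, b):
    """Ordinary SQL equality: comparing with NULL (None) yields NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """The "spaceship" comparison: NULL equals NULL and nothing else."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def null_safe_ne(a, b):
    """The "anti-spaceship" comparison: the negation of null_safe_eq."""
    return not null_safe_eq(a, b)


print(sql_eq(None, None))
print(null_safe_eq(None, None))
print(null_safe_ne(None, 1))
```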
<p>In addition to the standard SQL <code>-- </code> comment
prefix, DCTV allows the use of <code>#</code> as a
Python-style comment prefix and the use of <code>/*</code> and
<code>*/</code> for C-style block comments.</p>
<h3>Missing features</h3>
<p>DCTV does not implement some features of more traditional
databases. The following table summarizes the features not
provided, whether we plan to provide them, and any additional
relevant information.</p>
<table class="general">
<tr>
<th>Feature</th>
<th>Status</th>
<th>Comment</th>
</tr>
<tr>
<td>INSERT/UPDATE/DELETE</td>
<td>Not planned</td>
<td>DCTV is immutable</td>
</tr>
<tr>
<td>SQL1999 window functions</td>
<td>Planned</td>
<td></td>
</tr>
<tr>
<td>SQL/PL</td>
<td>Planned</td>
<td>Will be accelerated</td>
</tr>
<tr>
<td>Recursive CTEs</td>
<td>Planned</td>
<td></td>
</tr>
<tr>
<td>Correlated subqueries</td>
<td>Planned</td>
<td></td>
</tr>
</table>
<h1><a name="syntax">Syntax reference</a></h1>
<h2>SQL Statement list</h2>
<p>The REPL accepts statement lists as top-level input.</p>
<object class="sytax" data="sql-stmt-list.syntax.svg" type="image/svg+xml" />
<h2>SQL statement</h2>
<p>A given SQL statement is either a SELECT, which performs a
query, or one of a few miscellaneous types of data management
operation.</p>
<object class="sytax" data="sql-stmt.syntax.svg" type="image/svg+xml" />
<h2>SELECT</h2>
<p>A SELECT is a combination of one or more "select core"
statements (combined together with operators like
<code>UNION</code>), all sorted and windowed.</p>
<p>Note that span tables cannot be combined using
SQL compound operators.</p>
<p>Common table expressions are "local" views that exist only
in the context of the following SELECT and
subsequently-defined common table expressions. (That is, the
common table expressions have lexical scope, and the names are
bound as with <code>let*</code> in Lisp.)</p>
<object class="sytax" data="select-stmt.syntax.svg" type="image/svg+xml" />
<h2>Regular select core</h2>
<p>This diagram shows the syntax for the main body of a
<code>SELECT</code> statement. If the keyword
<code>SPAN</code> appears after the <code>SELECT</code>, the
"type" of the result of the <code>SELECT</code> is a span
table; otherwise, it's a regular table.</p>
<p>In <code>SPAN</code> mode, <code>SELECT</code> always
includes the special columns <code>_ts</code>,
<code>_duration</code>, and (if partitioned) the partition
column in the selected column set. <code>SELECT SPAN FROM
...</code> (with no column list between the <code>SPAN</code>
and the <code>FROM</code>) indicates selecting
<emph>only</emph> these special columns. These special
columns may not be specified "by hand" in the result-column
list.</p>
<p>The <code>table-or-join</code> clause describes the syntax
for span join operators. The <code>GROUP...USING SPANS</code>
syntax describes a span group operation from the data model
section; the <code>GROUP...USING PARTITION</code> syntax
describes a span departition operation. <code>GROUP BY</code>
works exactly the same way it does in standard SQL.</p>
<p><code>HAVING</code> in span mode always filters the
<emph>generated</emph> spans; <code>WHERE</code> filters the
<emph>inputs</emph> to any span join and grouping operations,
analogously to the distinction between <code>WHERE</code> and
<code>HAVING</code> in standard SQL.</p>
<object class="sytax" data="regular-select.syntax.svg" type="image/svg+xml" />
<h2>Result column</h2>
<object class="sytax" data="result-column.syntax.svg" type="image/svg+xml" />
<h2>Table or join specification</h2>
<p>This element describes a "column source" from which a
<code>SELECT</code> draws columns. It can be a simple table
name, a call to a table-valued function, a subquery (of which
<code>VALUES</code> is a special case), or a join operation of
other column sources.</p>
<p>A comma joining two table specifications is equivalent to
<code>INNER JOIN</code>.</p>
<p><code>AS</code> assigns a local alias to one of these
column sources, the alias being useful in expressions in the
result-column clauses. If a column list comes after the
<code>AS</code>, the columns of the thing named with
<code>AS</code> are renamed to match the columns in the column
list that follows the <code>AS</code>, which must have the
same length as the set of columns in the named thing.</p>
<object class="sytax" data="table-or-join.syntax.svg" type="image/svg+xml" />
<h2>Conventional join</h2>
<p>A normal SQL join.</p>
<object class="sytax" data="conventional-join.syntax.svg" type="image/svg+xml" />
<h2>Span join</h2>
<p>Describes a span join operation. The <code>PARTITION
AS</code> clause provides the name of the partition column in
the resulting span table, which must be specified if the left
and right span tables have partition columns with
different names.</p>
<object class="sytax" data="span-join.syntax.svg" type="image/svg+xml" />
<h2>Span broadcast</h2>
<p>A span broadcast operation. In the
<code>BROADCAST...INTO</code> variant, the unpartitioned span
table is on the left, whereas in the
<code>BROADCAST...FROM</code> variant, the unpartitioned span
table is on the right.</p>
<object class="sytax" data="span-broadcast.syntax.svg" type="image/svg+xml" />
<h2>Table specification</h2>
<p>Names a single table, either as a name of a table
in the table namespace, a call to a table-valued
function in the table namespace, or a subquery.</p>
<object class="sytax" data="table-spec.syntax.svg" type="image/svg+xml" />
<h2>Table-valued function arglist</h2>
<p>The argument list for a table-valued function call. Note
the keyword arguments.</p>
<object class="sytax" data="tvf-arglist.syntax.svg" type="image/svg+xml" />
<h2>Table-valued function (TVF) expression</h2>
<p>The syntax for an expression in TVF context. Note the dict
and list literal syntax. A subquery is also a valid argument
to a table-valued function!</p>
<object class="sytax" data="tvf-expr.syntax.svg" type="image/svg+xml" />
<h2>SQL expression</h2>
<p>Syntax for expressions that can occur in a query
outside TVF context.</p>
<object class="sytax" data="expr.syntax.svg" type="image/svg+xml" />
<h2>Function call argument list</h2>
<p>The argument list for a call to a function in SQL
expression context. Note the keyword arguments and
the optional <code>DISTINCT</code> keyword.</p>
<object class="sytax" data="function-arglist.syntax.svg" type="image/svg+xml" />
<h2>Data type names</h2>
<p>List of allowed data type names.</p>
<object class="sytax" data="type-name.syntax.svg" type="image/svg+xml" />
<h2>Literal value syntax</h2>
<p>Literal values can appear either as regular SQL expressions
or as TVF expressions. In TVF (i.e., Python) context,
<code>TRUE</code> becomes <code>True</code>,
<code>FALSE</code> becomes <code>False</code>, and
<code>NULL</code> becomes <code>None</code>.</p>
<object class="sytax" data="literal-value.syntax.svg" type="image/svg+xml" />
<h2>Bind parameters</h2>
<p>Represents a parameter substitution in a query.
Positional argument numbers are assigned automatically to
positional <code>?</code> substitutions without explicit
numbers. Numbering starts from zero and proceeds
left-to-right during parsing: each unnumbered positional
substitution receives the next number in sequence.
Explicitly numbered substitutions do not affect this automatic
numbering.</p>
<object class="sytax" data="bind-parameter.syntax.svg" type="image/svg+xml" />
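<p>The automatic numbering rule can be sketched in Python. This
is only an illustration, not DCTV's parser: the function
<code>assign_positional_numbers</code> and the <code>?N</code>
spelling for an explicitly numbered substitution are assumptions
made for the example, and the sketch does not skip <code>?</code>
characters inside string literals. The syntax diagram above gives
the real grammar.</p>
<aside class="example code-example"><![CDATA[
```python
import re

# Hypothetical token shape: "?" optionally followed by an explicit number.
_PARAM = re.compile(r"\?(\d*)")


def assign_positional_numbers(query):
    """Return the substitution number of each ? parameter, left to right."""
    numbers = []
    next_auto = 0  # automatic numbering starts from zero
    for match in _PARAM.finditer(query):
        explicit = match.group(1)
        if explicit:
            # Explicit numbers are used as-is and do not advance the counter.
            numbers.append(int(explicit))
        else:
            numbers.append(next_auto)
            next_auto += 1
    return numbers
```
]]></aside>
<p>Under these assumptions, a query containing <code>?</code>,
<code>?5</code>, and <code>?</code> in that order would use
substitution numbers 0, 5, and 1.</p>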
<h2>Numeric literal</h2>
<object class="sytax" data="numeric-literal.syntax.svg" type="image/svg+xml" />
<h2>VALUES list</h2>
<p><code>VALUES</code> works just as it does in standard SQL
and allows query authors to include data inline. DCTV does
not provide a <code>CREATE TABLE</code> function, but one can
achieve a similar effect by using <code>CREATE VIEW</code>
with a <code>VALUES</code> select-part.</p>
<object class="sytax" data="values-list.syntax.svg" type="image/svg+xml" />
<div />
<object class="sytax" data="values-list-row.syntax.svg" type="image/svg+xml" />
<h2>Common table expression</h2>
<p>The common table expression part of a SQL query.
The optional column list in the name performs the same
column-renaming operation that the optional column list after
<code>AS</code> does.</p>
<object class="sytax" data="common-table-expression.syntax.svg" type="image/svg+xml" />
<h2>Namespace prefix</h2>
<p>Names a part of the DCTV namespace.</p>
<object class="sytax" data="ns-prefix.syntax.svg" type="image/svg+xml" />
<h2>Table namespace name</h2>
<p>Describes a table in the table namespace.</p>
<object class="sytax" data="table-ns-name.syntax.svg" type="image/svg+xml" />
<h2>Table-valued-function name</h2>
<p>Describes a table-valued-function in the table namespace.</p>
<object class="sytax" data="tvf-ns-name.syntax.svg" type="image/svg+xml" />
<h2>SQL function name</h2>
<p>Describes a SQL function name in the function namespace.
Note that the function namespace is distinct from the table
namespace.</p>
<object class="sytax" data="function-name.syntax.svg" type="image/svg+xml" />
<h2>CREATE VIEW</h2>
<p><code>CREATE VIEW</code> works just like it does in
standard SQL.</p>
<object class="sytax" data="create-view-stmt.syntax.svg" type="image/svg+xml" />
<h2>DROP VIEW</h2>
<p><code>DROP VIEW</code> works just like it does in
standard SQL.</p>
<object class="sytax" data="drop-view-stmt.syntax.svg" type="image/svg+xml" />
<h2>DROP ALL</h2>
<p><code>DROP ALL</code> drops everything from a prefix
of the DCTV namespace. It is useful for "unmounting"
traces by detaching a trace sub-namespace from the global
namespace.</p>
<object class="sytax" data="drop-all-stmt.syntax.svg" type="image/svg+xml" />
<h2>MOUNT TRACE</h2>
<p><code>MOUNT TRACE</code> "mounts" a trace file at a prefix
of the trace namespace. See the standard library section of
this manual for a description of what's available under the
mount prefix.</p>
<object class="sytax" data="mount-trace-stmt.syntax.svg" type="image/svg+xml" />
<h2>Ordering term</h2>
<p>A single ordering term in a <code>SELECT</code>.</p>
<object class="sytax" data="ordering-term.syntax.svg" type="image/svg+xml" />
<h2>SQL compound operators</h2>
<p>Ways of combining multiple <code>SELECT</code> "cores".</p>
<p>Not applicable to span tables.</p>
<object class="sytax" data="compound-operator-name.syntax.svg" type="image/svg+xml" />
<h2>Comment syntax</h2>
<object class="sytax" data="comment-syntax.syntax.svg" type="image/svg+xml" />
<h2>Operators</h2>
<p>The following table gives the precedence
of each operator. Rows are listed from highest precedence
to lowest, and operators on the same row have
the same precedence.</p>
<table class="general" style="width:50%">
<caption>Operator precedence</caption>
<tr>
<td>
*
/
%
//
</td>
</tr>
<tr>
<td>
+
-
</td>
</tr>
<tr>
<td>
&lt;&lt;
&gt;&gt;
</td>
</tr>
<tr>
<td>
&amp;
</td>
</tr>
<tr>
<td>
|
</td>
</tr>
<tr>
<td>
BETWEEN
</td>
</tr>
<tr>
<td>
=
&lt;=&gt;
&lt;!=&gt;
&gt;=
&gt;
&lt;=
&lt;
!=
&lt;&gt;
==
IS
</td>
</tr>
<tr>
<td>
NOT
</td>
</tr>
<tr>
<td>
AND
</td>
</tr>
<tr>
<td>
OR
</td>
</tr>
</table>
<table class="general" style="width:50%">
<caption>Operator descriptions</caption>
<tr>
<th>Operator</th>
<th>Description</th>
</tr>
<tr>
<td>*</td>
<td>Multiply</td>
</tr>
<tr>
<td>/</td>
<td>True division (yields a float)</td>
</tr>
<tr>
<td>%</td>
<td>Modulus</td>
</tr>
<tr>
<td>//</td>
<td>Floor division (truncates toward zero)</td>
</tr>
<tr>
<td>+</td>
<td>Addition</td>
</tr>
<tr>
<td>-</td>
<td>Subtraction</td>
</tr>
<tr>
<td>&lt;&lt;</td>
<td>Left shift</td>
</tr>
<tr>
<td>&gt;&gt;</td>
<td>Right shift</td>
</tr>
<tr>
<td>&amp;</td>
<td>Bitwise AND</td>
</tr>
<tr>
<td>|</td>
<td>Bitwise OR</td>
</tr>
<tr>
<td>BETWEEN</td>
<td>Standard SQL</td>
</tr>
<tr>
<td>= ==</td>
<td>Equality</td>
</tr>
<tr>
<td>&lt;=&gt; IS</td>
<td>NULL-safe equality</td>
</tr>
<tr>
<td>&lt;!=&gt; IS NOT</td>
<td>NULL-safe inequality</td>
</tr>
<tr>
<td>&gt;=</td>
<td>Greater than or equal</td>
</tr>
<tr>
<td>&gt;</td>
<td>Greater than</td>
</tr>
<tr>
<td>&lt;=</td>
<td>Less than or equal</td>
</tr>
<tr>
<td>&lt;</td>
<td>Less than</td>
</tr>
<tr>
<td>!= &lt;&gt;</td>
<td>Inequality</td>
</tr>
<tr>
<td>NOT</td>
<td>Logical negation</td>
</tr>
<tr>
<td>AND</td>
<td>Logical conjunction</td>
</tr>
<tr>
<td>OR</td>
<td>Logical disjunction</td>
</tr>
</table>
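<p>The difference between ordinary and NULL-safe comparison can
be sketched in Python, modeling <code>NULL</code> as
<code>None</code>. The helper names are hypothetical, and the
sketch assumes that <code>=</code> follows standard SQL
three-valued logic (a comparison involving <code>NULL</code>
yields <code>NULL</code>), which this manual does not state
explicitly.</p>
<aside class="example code-example"><![CDATA[
```python
def sql_eq(a, b):
    """Model of = / ==: assumed to yield NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """Model of <=> / IS: NULL equals NULL, and the result is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def null_safe_ne(a, b):
    """Model of <!=> / IS NOT: the negation of NULL-safe equality."""
    return not null_safe_eq(a, b)
```
]]></aside>
<p>In this model, comparing a value with <code>NULL</code> using
the ordinary operator gives <code>NULL</code>, while the NULL-safe
operators always give a definite true or false answer.</p>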
<h1><a name="standard_library">Trace standard library</a></h1>
<!-- TODO(dancol): document columns -->
<p>DCTV queries operate on a single shared per-session
namespace. (Each leaf level of the namespace has distinct
mappings for table-valued things and for SQL functions, but
the non-leaf nodes are shared between the namespaces.)</p>
<h2>Per-trace names</h2>
<p>DCTV makes a trace available by "mounting" it under a
namespace prefix. Names beginning with this prefix then refer
to the trace that was mounted. Multiple traces can be mounted
in the same session, and a single query can pull data from
multiple traces.</p>
<p>In the description below, <code>mytrace.</code>
refers to an arbitrary trace mountpoint.</p>
<dl>
<dt><code>mytrace.raw_events.*</code></dt>
<dd>
<p>These tables provide access to the raw events embedded
in the trace. For example,
<code>mytrace.raw_events.sched_switch</code> is a table of
<code>sched_switch</code> events, with one column for each
field in the ftrace event.</p>
<p>The DCTV event parser has a special case for
<code>trace_marker_write</code> events: we put each one in
a table whose name is formed by concatenating the event
name and the first part of the event payload. For example,
<code>`print|B`</code> refers to those
<code>trace_marker_write</code> events that begin with the
prefix <code>B|</code>, indicating the start of a
synchronous application-defined trace event. We need to
write <code>mytrace.raw_events.`print|B`</code> instead of
<code>mytrace.raw_events.print|B</code> because
<code>|</code> is normally an operator, so to treat it as
part of a table name, we need to escape it with
backticks.</p>
<aside class="note">This special case for
<code>trace_marker_write</code> is an ugly hack that exists
so that we can give a different "schema" (set of columns) to
each different kind of write event depending on its
payload.</aside>
</dd>
<dt><code>mytrace.scheduler.timeslices_p_cpu</code></dt>
<dd>
<p>This table is a span table partitioned by CPU
representing the scheduler activity of the system.</p>
</dd>
<dt><code>mytrace.scheduler.cpufreq_p_cpu</code></dt>
<dd>
<p>This table is a span table partitioned by CPU
representing the CPU frequency that each CPU is
known to have.</p>
</dd>
<dt><code>mytrace.last_ts</code></dt>
<dd>
<p>Single-column, single-value event table giving the
largest timestamp found in the trace. It's useful for
building spans that cover the whole trace, but see
<code>quantize</code> immediately below.</p>
</dd>
<dt><code>mytrace.quantize(interval=&gt;NULL)</code></dt>
<dd>
<p>This table-valued function generates a payloadless span
table that divides the trace timeline into fixed-size
spans of duration <code>interval</code>. This table is
useful for quantizing the trace timeline into fixed-size
blocks for display or analysis, and is designed to work
with span group operations.</p>
<p>If <code>interval</code> is <code>NULL</code>, the
function generates a span table with one huge span covering
the whole trace.</p>
<aside class="example code-example"><![CDATA[SELECT SPAN SUM(_duration)/5s AS non_idle_ratio
FROM (SELECT SPAN * FROM my_cpu_timeslices WHERE pid != 0)
GROUP USING SPANS FROM mytrace.quantize(5s) ]]></aside>
</dd>
</dl>
<h2>The DCTV namespace</h2>
<p>DCTV-specific query functions live under the
<code>dctv.</code> namespace prefix.</p>
<dl>
<dt><code><dfn><a name="time_series_to_spans">
dctv.time_series_to_spans(*, sources, columns, partition=&gt;NULL)
</a></dfn></code></dt>
<dd>
<p>This function implements the time series to span
conversion operation described <a
href="#time_series_to_span_conversion"> above</a>.</p>
<p><code>sources</code> is a list of source
specifications. Each source specification is a dict with
the following entries; entries are optional unless
otherwise indicated. As a convenience, a source
specification can also be a list, the elements of which
are turned into dict elements in the order given below.
If a source specification is neither a dict nor a list, it
is treated as if it were a dict with only the source
element provided. (This way, a bare table is a valid
event source.)</p>
<dl>
<dt>source</dt>
<dd>The event table providing the raw events that this routine
turns into spans. Mandatory.</dd>
<dt>role</dt>
<dd>Either <code>"start"</code> or <code>"stop"</code>, defaulting to
<code>"start"</code>. Indicates whether events from the given source
start and separate output spans (the former case) or only stop spans
that have already been started (the latter case).</dd>
<dt>partition</dt>
<dd>Either a string naming the column by which this
source is partitioned or <code>NULL</code>, indicating
that the source is unpartitioned. Defaults to
<code>NULL</code>.</dd>
<dt>timestamp</dt>
<dd>The name of the column in the source providing the timestamp.
Defaults to <code>"_ts"</code>.</dd>
<dt>nickname</dt>
<dd>An optional string assigning a name to this source that column
specifications in <code>columns</code> can reference.</dd>
</dl>
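<p>The three accepted shapes of a source specification can be
illustrated with a short Python sketch. The function
<code>normalize_source_spec</code> is hypothetical and not part
of DCTV; it only mirrors the rules described above, with
<code>None</code> standing in for <code>NULL</code> and a plain
string standing in for an event table.</p>
<aside class="example code-example"><![CDATA[
```python
# Entry names in the order listed above; list-form specs map onto them.
SOURCE_KEYS = ("source", "role", "partition", "timestamp", "nickname")
# Documented defaults; nickname is optional and has no default.
SOURCE_DEFAULTS = {"role": "start", "partition": None, "timestamp": "_ts"}


def normalize_source_spec(spec):
    """Turn a dict, list, or bare-table source spec into a full dict."""
    if isinstance(spec, dict):
        given = dict(spec)
    elif isinstance(spec, (list, tuple)):
        if len(spec) > len(SOURCE_KEYS):
            raise ValueError("too many elements in source specification")
        # Positional elements become dict entries in the documented order.
        given = dict(zip(SOURCE_KEYS, spec))
    else:
        # Anything else is treated as a bare event table.
        given = {"source": spec}
    if "source" not in given:
        raise ValueError("the source entry is mandatory")
    return {**SOURCE_DEFAULTS, **given}
```
]]></aside>
<p>For instance, the list form <code>["tbl", "stop", "cpu"]</code>
would be read as a source named <code>tbl</code> with role
<code>"stop"</code>, partition column <code>"cpu"</code>, and the
default timestamp column.</p>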
<p><code>columns</code> is a list of column
specifications, each representing one payload column in
the generated span table.</p>
<p>Each column specification is a dict with the elements
below. As a convenience, a column specification can also
be a list, the elements of which are turned into dict
elements in the order given below. If a column
specification is neither a dict nor a list, it must be a
string, and it is treated as if it were a dict with only
the column element set. (This way, a simple string is a
valid column descriptor in the case that we have only one
source.)</p>
<dl>
<dt>column</dt>
<dd>String naming the output column in the generated
span table. Mandatory.</dd>
<dt>source</dt>
<dd>Identifies the source that supplies this output
column. May be omitted when only one source is given to
the call; otherwise, must either be a number (naming a
source positionally) or a string (matching the nickname
given to a source in its specification).</dd>
<dt>source_column</dt>
<dd>Name of the column in the source event table that
supplies the value of the corresponding column in the
output table. Defaults to the name of the output
column.</dd>
<dt>edge</dt>
<dd>Either <code>"rising"</code> or
<code>"falling"</code>, defaulting to the former.
Determines which event supplies the value of the column
in the output table: the event that starts a span or the
event that ends a span.</dd>
</dl>
<p><code>partition</code> is the name of the partition
column in the output span table. If it is specified, all
sources must have their own partitions specified. If it
is not specified, then no source may be partitioned.</p>
<aside class="example code-example"><![CDATA[SELECT SPAN * FROM dctv.time_series_to_spans(
sources=>[my_raw_events_table],
columns=>["foo", "bar", "qux"],
)]]></aside>
<aside class="example code-example"><![CDATA[SELECT SPAN * FROM dctv.time_series_to_spans(
sources=>[{source=>table1, partition=>"cpu", nickname=>"foo"},
{source=>table2, partition=>"cpu", role=>"stop"}],
columns=>[{column=>"total_things",
source=>"foo",
source_column=>"last_things",
edge=>"falling"}],
partition=>"cpu")]]></aside>
<p>This routine looks pretty ugly when called. Most of
the time, you want to use one of the pre-defined span
tables in the <a href="#standard_library">standard
library</a>, which call <code>time_series_to_spans</code>
for you.</p>
</dd>
<dt><dfn><code>dctv.stack_history()</code></dfn></dt>
<dd>
<p>This table-valued function understands "nested" events,
turning them into stacks for further analysis.
It generates a span table mapping
time intervals to stack IDs.</p>
<p>See the <a href="#stackification">stackification</a>
sub-section of the data model section.</p>
<!-- TODO(dancol): document argument list -->
</dd>
<dt><dfn><code>dctv.stack_contents()</code></dfn></dt>
<dd>
<p>This table-valued function generates the
<emph>contents</emph> of the stack IDs generated by
<code>dctv.stack_history()</code>.</p>
<p>See the <a href="#stackification">stackification</a>
sub-section of the data model section.</p>
</dd>
<dt><dfn><code>dctv.generate_sequential_spans(start, stop, duration)</code></dfn></dt>
<dd>
<p>This table-valued function generates "synthetic" spans
useful for a variety of purposes. See the <a
href="#span_generation">span generation</a> sub-section of
the data model section above.</p>
<p><code>start</code> is a timestamp at which the spans
should start. <code>stop</code> is the time at which the
last span should end. <code>duration</code> is the length
of each generated span. Output spans are generated with
no gaps.</p>
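<p>The shape of the output can be sketched in Python. This is an
illustration only: the generator name is hypothetical, timestamps
and durations are assumed to be plain integers in the same unit,
and shortening the final span when the range is not a whole
multiple of <code>duration</code> is an assumption that this
manual does not confirm.</p>
<aside class="example code-example"><![CDATA[
```python
def sequential_spans(start, stop, duration):
    """Yield (timestamp, duration) pairs covering [start, stop) with no gaps."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    ts = start
    while ts < stop:
        # Assumption: the last span is shortened so it ends exactly at stop.
        span_duration = min(duration, stop - ts)
        yield ts, span_duration
        ts += span_duration
```
]]></aside>
<p>Each generated span begins exactly where the previous one
ends, which is what "no gaps" means here.</p>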
</dd>
</dl>
<h1><a name="example">Worked example</a></h1>
<p>After reading the manual above, you should be able to make sense of the following query.</p>
<p><code>TODO(dancol):</code> expand this section.</p>
<ol>
<li>Extract from “print|B” a list of frame-start events.</li>
<li>Take these events and, using time_series_to_spans’s
“start-start” mode, assemble a set of spans partitioning the
trace timeline into frames.</li>
<li>Select those frame-spans that lasted longer than 17ms,
i.e., that took a long time to render.</li>
<li>Intersect this bad-frame span set with the per-processor
span table describing what the system is actually
doing. Don’t consider the idle process.</li>
</ol>
<code class="blockquote"><![CDATA[WITH frames AS (SELECT SPAN * FROM dctv.time_series_to_spans(
sources=>[{source=>(SELECT * FROM trace.raw_events.`print|B`
WHERE name='eglBeginFrame'),
timestamp=>'ts'}],
columns=>[])),
bad_frames AS (SELECT SPAN * FROM frames WHERE _duration > 17ms),
bad_timeslices AS (SELECT SPAN * FROM trace.scheduler.timeslices_p_cpu
SPAN BROADCAST FROM bad_frames)
SELECT comm, cpu, SUM(_duration) AS totdur FROM bad_timeslices
WHERE pid != 0
GROUP BY comm, cpu
ORDER BY totdur DESC
LIMIT 20
]]></code>
</main>
</div>
</body>
</html>