// Copyright 2018 The Kythe Authors. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
Kythe Configurable Extraction
=============================
Craig Barber <craigbarber@google.com>
v0.1.1, 09-May-2018: Draft
:toc:

This document provides an overview of Kythe's configurable extraction framework
and serves as a getting-started guide for adding Kythe extraction support for a
new build system.

== Overview

The Kythe Configurable Extraction system is a general-purpose solution for
running Kythe extraction across a variety of build systems. The system consists
of a per-repository configuration file and a collection of tools that consume
that file to generate a customized extraction image tailored to that repository.
An extraction image defines a host environment (e.g., a Docker container) for
hermetically building the repository's contents with Kythe extractor tools
injected, in order to generate Kythe http://kythe.io/docs/kythe-index-pack.html#_compilation_unit_description_format[Compilation Units].
These units are consumed downstream for static analysis and indexing.

The generated extraction image is a https://docs.docker.com/[Docker image], a
standardized container format. https://git-scm.com[Git] is currently used to
retrieve repository contents; however, support for other source control tools
can be added as needed.

Note that this is a work in progress, not a finished product. The intent is to
document the system as it evolves in order to provide early adopters with a
means of trying it out and providing feedback.

== Extraction Configuration Schema

An extraction configuration is used to construct a Docker image suitable for
building and extracting a given repository. This schema defines a low-level
configuration format. Where practical, configuration settings will be inferred
automatically; where that is not possible, a user-friendly interface may be
added to let users control the extraction behavior directly. For the time
being, users who want a head start on enabling Kythe for their repositories can
use this intermediate configuration schema directly. The configuration schema is
defined in https://github.com/kythe/kythe/blob/master/kythe/proto/extraction_config.proto[extraction_config.proto].

== Extraction Configuration Usage

An instance of this configuration schema can be placed in the root directory of
the repository in a file named `.kythe-extraction-config`, formatted as a
https://developers.google.com/protocol-buffers/docs/proto3#json[JSON-encoded protobuf].
An example of an existing extraction configuration can be found here: https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/base/testdata/mvn_config.json[mvn_config.json].
The Dockerfile for the extraction image generated from `mvn_config.json` can be
found here: https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/base/testdata/expected_mvn_config.Dockerfile[expected_mvn_config.Dockerfile].

This configuration serves as input to the https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/extractrepo.go[extractrepo]
tool, which executes the Kythe extraction process on a given repository.
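
To make the file layout concrete, here is a minimal Go sketch that locates and
decodes `.kythe-extraction-config`. The `ExtractionConfig` struct, its
snake_case JSON keys, and the sample values are assumptions for illustration,
not code generated from the schema. proto3 JSON parsers accept both the original
field names and lowerCamelCase names (e.g. `requiredImage`), whereas this
`encoding/json` sketch only recognizes the keys in its struct tags. Nested
messages are left undecoded as raw JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// configFileName is the file the extraction tools look for in the repository root.
const configFileName = ".kythe-extraction-config"

// ExtractionConfig is a hypothetical, simplified mirror of the top-level fields
// of extraction_config.proto. Nested messages are kept as raw JSON, and the
// snake_case keys are an assumption about how the file is written.
type ExtractionConfig struct {
	RequiredImage []json.RawMessage `json:"required_image"`
	RunCommand    []json.RawMessage `json:"run_command"`
	EntryPoint    []string          `json:"entry_point"`
}

// parseConfig decodes a JSON-encoded extraction configuration.
func parseConfig(data []byte) (*ExtractionConfig, error) {
	var cfg ExtractionConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parsing extraction config: %w", err)
	}
	return &cfg, nil
}

// loadRepoConfig reads and decodes the configuration stored in the root of a
// checked-out repository.
func loadRepoConfig(repoDir string) (*ExtractionConfig, error) {
	data, err := os.ReadFile(filepath.Join(repoDir, configFileName))
	if err != nil {
		return nil, err
	}
	return parseConfig(data)
}

func main() {
	// A hypothetical configuration; the image URI and wrapper path are placeholders.
	sample := []byte(`{
  "required_image": [{"uri": "example.com/hypothetical/artifacts", "name": "artifacts"}],
  "entry_point": ["/usr/local/bin/hypothetical_wrapper"]
}`)
	cfg, err := parseConfig(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(cfg.RequiredImage), cfg.EntryPoint)
}
```

A production decoder would more likely use a protobuf JSON parser against the
generated message type, which handles both naming styles.
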
=== Extraction Configuration Components
....
repeated Image required_image
....

This field defines a set of base images whose artifacts are copied into the
generated extraction image. For each listed `required_image`, the generated
Dockerfile will contain:
....
FROM <image.uri> as <image.name>
# ...repeated...
COPY <image.copy_spec.source> <image.copy_spec.dest>
# ...repeated...
ENV <image.env_var.name>=<image.env_var.value>
....
The https://github.com/kythe/kythe/blob/006fd1149173162df2b6670f4bf3d4a38204ca4c/kythe/proto/extraction_config.proto#L38[Image]
message has the following parts:

`repeated CopySpec copy_spec`::
Defines a list of artifacts to be copied from the base image into the generated
extraction image.
`string uri`::
Defines the URI of a base Docker image. This can refer to images hosted in
either local or remote Docker container registries.
`string name`::
Defines a unique name for this image, which is referenced when copying
artifacts.
`repeated EnvVar env_var`::
Defines environment variables within the generated extraction image that relate
to the artifacts copied from the base image.
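
The mapping from one `Image` message to Dockerfile lines can be sketched in Go
as follows. The struct and field names are hypothetical mirrors of the message
parts listed above, and the output follows the `FROM`/`COPY`/`ENV` template
literally. In a real multi-stage Docker build, copying out of a named stage is
written `COPY --from=<name> ...`, so treat the generated Dockerfile linked
earlier as the authority on the exact form the tools emit.

```go
package main

import (
	"fmt"
	"strings"
)

// CopySpec, EnvVar and Image are hypothetical Go mirrors of the messages
// described above; the field names follow the template, not necessarily the proto.
type CopySpec struct {
	Source string
	Dest   string
}

type EnvVar struct {
	Name  string
	Value string
}

type Image struct {
	URI      string
	Name     string
	CopySpec []CopySpec
	EnvVar   []EnvVar
}

// renderImage emits the Dockerfile lines for one required_image, one COPY per
// copy_spec and one ENV per env_var, in the order shown in the template.
func renderImage(img Image) []string {
	lines := []string{fmt.Sprintf("FROM %s as %s", img.URI, img.Name)}
	for _, c := range img.CopySpec {
		lines = append(lines, fmt.Sprintf("COPY %s %s", c.Source, c.Dest))
	}
	for _, e := range img.EnvVar {
		lines = append(lines, fmt.Sprintf("ENV %s=%s", e.Name, e.Value))
	}
	return lines
}

func main() {
	// Placeholder values for illustration only.
	img := Image{
		URI:      "example.com/hypothetical/artifacts:latest",
		Name:     "artifacts",
		CopySpec: []CopySpec{{Source: "/opt/tool", Dest: "/opt/tool"}},
		EnvVar:   []EnvVar{{Name: "TOOL_HOME", Value: "/opt/tool"}},
	}
	fmt.Println(strings.Join(renderImage(img), "\n"))
}
```
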
....
repeated RunCommand run_command
....

This field configures the execution of arbitrary `RUN` commands during the
construction of the generated extraction image. This allows the installation of
required resources that may not have corresponding base Docker images. For each
listed `run_command`, the generated Dockerfile will contain:
....
RUN <command> "<arg[0]>" "<arg[1]>" ...
....
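
A small Go sketch of the `RUN` template, assuming a hypothetical `RunCommand`
struct that holds a command and a list of arguments. Arguments are wrapped in
double quotes using Go's `%q` escaping, which is adequate for simple arguments;
note that the shell in the image still expands `$` inside double quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// RunCommand is a hypothetical Go mirror of the run_command message.
type RunCommand struct {
	Command string
	Arg     []string
}

// renderRun emits the command bare, followed by each argument in double quotes.
func renderRun(rc RunCommand) string {
	parts := []string{rc.Command}
	for _, a := range rc.Arg {
		parts = append(parts, fmt.Sprintf("%q", a))
	}
	return "RUN " + strings.Join(parts, " ")
}

func main() {
	// Placeholder command for illustration.
	fmt.Println(renderRun(RunCommand{Command: "apt-get", Arg: []string{"install", "-y", "maven"}}))
	// Prints: RUN apt-get "install" "-y" "maven"
}
```
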
....
repeated string entry_point
....

This field defines the entry point for the generated image. The entry point is
the logic that runs when the generated image's container is started; this is
typically a script or binary that initiates the build and extraction process. An
example entry point binary can be found here:
https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/runextractor/runextractor.go[runextractor.go].
All listed `entry_point` elements are combined into a single instruction in the
generated Dockerfile:
....
ENTRYPOINT ["<entry_point[0]>", "<entry_point[1]>", ...]
....
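
Because the listed strings form a single exec-form `ENTRYPOINT`, which Docker
parses as a JSON array, a generator can JSON-encode each element. The Go sketch
below assumes that approach; the wrapper path and flag in `main` are
placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// renderEntryPoint combines every entry_point element into one exec-form
// ENTRYPOINT instruction, JSON-encoding each element so quotes and
// backslashes are escaped the way Docker's JSON parser expects.
func renderEntryPoint(entryPoint []string) (string, error) {
	quoted := make([]string, 0, len(entryPoint))
	for _, e := range entryPoint {
		b, err := json.Marshal(e)
		if err != nil {
			return "", err
		}
		quoted = append(quoted, string(b))
	}
	return "ENTRYPOINT [" + strings.Join(quoted, ", ") + "]", nil
}

func main() {
	line, err := renderEntryPoint([]string{"/usr/local/bin/hypothetical_wrapper", "--verbose"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(line)
}
```
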
== Extraction Image Volumes

Each generated extraction image contains default volumes for input and output
during the extraction process. These use the Docker https://docs.docker.com/storage/volumes/[volume]
feature to mount host directories within the running container.

/repo::
This volume contains the contents of the repository to be processed by the
Kythe extraction framework. It should be mounted with read and write access,
because build systems' configuration files commonly need pre-processing before
extraction can succeed.
/out::
This volume will contain the output artifacts of the Kythe extraction process in
the form of http://kythe.io/docs/kythe-index-pack.html[kindex] files (note: this
format may change in the future). Any diagnostic output from extractors is also
written here. This volume should also be mounted with read and write access.
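
The volume layout above translates to a `docker run` invocation along these
lines. This Go sketch only builds the argument list; the image name and host
directories are hypothetical, and the real tools may pass additional flags.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// dockerRunArgs mounts the repository at /repo and an output directory at /out.
// Host paths are made absolute so that -v treats them unambiguously as
// bind-mount sources; such mounts are read-write by default.
func dockerRunArgs(image, repoDir, outDir string) ([]string, error) {
	repoAbs, err := filepath.Abs(repoDir)
	if err != nil {
		return nil, err
	}
	outAbs, err := filepath.Abs(outDir)
	if err != nil {
		return nil, err
	}
	return []string{
		"run", "--rm",
		"-v", repoAbs + ":/repo",
		"-v", outAbs + ":/out",
		image,
	}, nil
}

func main() {
	// Image name and host paths are hypothetical placeholders.
	args, err := dockerRunArgs("hypothetical-extraction-image", "./my-repo", "./kythe-out")
	if err != nil {
		fmt.Println(err)
		return
	}
	cmd := exec.Command("docker", args...)
	fmt.Println(cmd.String())
	// Calling cmd.Run() would start the extraction container; it is omitted here.
}
```
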
== Extraction Image Environment Variables
In addition to environment variables defined by the configuration schema,
generated extraction images also contain a default set of environment variables
facilitating access to input and output for extractors running within the
container.
KYTHE_ROOT_DIRECTORY::
This environment variable points to the volume mount path for the */repo* volume.
KYTHE_OUTPUT_DIRECTORY::
This environment variable points to the volume mount path for the */out* volume.
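
An extractor or wrapper running inside the container can locate its input and
output through these variables. In this Go sketch, falling back to `/repo` and
`/out` when a variable is unset is an assumption, and passing in `getenv` is a
design choice that keeps the lookup testable without touching the process
environment.

```go
package main

import (
	"fmt"
	"os"
)

// Dirs holds the input and output locations seen by an extractor.
type Dirs struct {
	Root   string
	Output string
}

// resolveDirs reads KYTHE_ROOT_DIRECTORY and KYTHE_OUTPUT_DIRECTORY, falling
// back (by assumption) to the default volume paths when either is unset.
func resolveDirs(getenv func(string) string) Dirs {
	pick := func(name, fallback string) string {
		if v := getenv(name); v != "" {
			return v
		}
		return fallback
	}
	return Dirs{
		Root:   pick("KYTHE_ROOT_DIRECTORY", "/repo"),
		Output: pick("KYTHE_OUTPUT_DIRECTORY", "/out"),
	}
}

func main() {
	d := resolveDirs(os.Getenv)
	fmt.Printf("sources: %s, output: %s\n", d.Root, d.Output)
}
```
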
== Extraction Wrapper
In the process of enabling support for a new build system, it is common to
implement a build system wrapper which serves as the entry point for the
generated extraction image. This wrapper is responsible for any pre-processing
of build configuration files which might be necessary, as well as invoking the
build system with the arguments necessary to hook the extractor into the build
system's compilation step. An example of such a wrapper can be found here:
https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/runextractor/runextractor.go[runextractor.go].

A common pattern is to bundle the wrapper, along with any language-specific
extraction binaries, into an extraction artifacts base image that the extraction
configuration can then reference. An example of such an artifacts base image can
be found here: https://github.com/kythe/kythe/blob/master/kythe/java/com/google/devtools/kythe/extractors/java/artifacts/Dockerfile[kythe/extractors/java/artifacts].
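
The wrapper's two responsibilities can be outlined in Go as below. `preprocess`
and the `hypothetical-build-tool` command are placeholders; a real wrapper such
as `runextractor.go` supplies build-system-specific edits and the arguments that
route compilations through the Kythe extractor.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// preprocess is a hypothetical hook for rewriting build configuration files
// under repoDir before the build runs. It is a no-op in this sketch.
func preprocess(repoDir string) error {
	return nil
}

// buildCommand prepares a build invocation rooted at the repository directory.
// Env is left nil, so the child process inherits the KYTHE_* variables.
func buildCommand(repoDir, tool string, args ...string) *exec.Cmd {
	cmd := exec.Command(tool, args...)
	cmd.Dir = repoDir      // run the build from the repository root
	cmd.Stdout = os.Stdout // surface build output in the container logs
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	repo := os.Getenv("KYTHE_ROOT_DIRECTORY")
	if repo == "" {
		repo = "/repo"
	}
	if err := preprocess(repo); err != nil {
		fmt.Fprintln(os.Stderr, "pre-processing failed:", err)
		os.Exit(1)
	}
	// Placeholder for the real build tool and its extractor-specific arguments.
	cmd := buildCommand(repo, "hypothetical-build-tool", "build")
	fmt.Println("would run:", cmd.String(), "in", cmd.Dir)
	// A real wrapper would call cmd.Run() and propagate its error.
}
```
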
== Extraction Tools
The Kythe project contains a collection of tools available for running and
testing extraction manually. The documentation for these tools can be found
here: https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/README.md[README.md].

These tools require the following programs to be installed locally and
accessible on the `$PATH`: https://www.docker.com/get-docker[Docker] and https://git-scm.com/downloads[Git].

The https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/extractrepo/extractrepo.go[extractrepo]
binary runs an extraction manually. It reads an extraction configuration either
from a file passed as a command-line argument or, if none is given, from the
`.kythe-extraction-config` file in the root of the repository. The binary
generates the extraction image, clones the repository, and then runs the
extraction image's container to perform the Kythe extraction on its contents.
The usage for the binary is as follows:
....
extractrepo -repo <repo_uri> -output <output_file_path> [-config <config_file_path>]
....
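
The configuration lookup described above, where an explicit file takes
precedence over the one in the repository root, can be sketched in Go as
follows. The flag names come from the usage line; the clone directory and the
rest of the program are hypothetical and not taken from extractrepo's source.

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
)

// resolveConfigPath prefers an explicit -config value and otherwise falls back
// to .kythe-extraction-config in the root of the cloned repository.
func resolveConfigPath(configFlag, repoDir string) string {
	if configFlag != "" {
		return configFlag
	}
	return filepath.Join(repoDir, ".kythe-extraction-config")
}

func main() {
	repo := flag.String("repo", "", "URI of the repository to extract")
	output := flag.String("output", "", "path for the extraction output")
	config := flag.String("config", "", "optional path to an extraction config")
	flag.Parse()

	// Hypothetical location of the cloned repository.
	cloneDir := filepath.Join("/tmp", "hypothetical-clone")
	fmt.Printf("repo=%q output=%q config=%q\n", *repo, *output, resolveConfigPath(*config, cloneDir))
}
```
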
The https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/repotester/repotester.go[repotester]
binary provides a tool which runs an extraction on a given repository, and then
runs a smoke test to verify adequate file coverage on the extraction's output.
The usage for the binary is as follows:
....
repotester -repos <comma_delimited,repo_urls> [-config <config_file_path>] [-github_token <github_token>]
repotester -repo_list_file <file> [-config <config_file_path>] [-github_token <github_token>]
....
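
The coverage check can be pictured as comparing the repository's source files
with the files captured by extraction. The metric, the file-extension filter,
and the `0.5` threshold in this Go sketch are all hypothetical; repotester
defines its own criteria.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// coverage returns the fraction of repository files with extension ext that
// appear in the extracted set. No matching files counts as zero coverage.
func coverage(repoFiles []string, extracted map[string]bool, ext string) float64 {
	total, hit := 0, 0
	for _, f := range repoFiles {
		if filepath.Ext(f) != ext {
			continue
		}
		total++
		if extracted[f] {
			hit++
		}
	}
	if total == 0 {
		return 0
	}
	return float64(hit) / float64(total)
}

func main() {
	// Placeholder file lists for illustration.
	repoFiles := []string{"src/A.java", "src/B.java", "README.md"}
	extracted := map[string]bool{"src/A.java": true}
	c := coverage(repoFiles, extracted, ".java")
	const threshold = 0.5 // hypothetical pass mark
	fmt.Printf("coverage %.2f, passed=%v\n", c, c >= threshold)
}
```
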