| // Copyright 2018 The Kythe Authors. All rights reserved. |
| // |
| // Licensed under the Apache License, Version 2.0 (the "License"); |
| // you may not use this file except in compliance with the License. |
| // You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| |
| Kythe Configurable Extraction |
| ============================= |
| Craig Barber <craigbarber@google.com> |
| v0.1.1, 09-May-2018: Draft |
| :toc: |
| |
| This document provides an overview of Kythe's configurable extraction framework, |
| and serves as a de facto getting started guide for onboarding support for a |
| new build system for Kythe extraction. |
| |
| == Overview |
| |
| The Kythe Configurable Extraction system is designed to be a generalized |
| solution providing support for running Kythe extraction on a diversity of |
| build systems. The system consists of a per-repository configuration file and |
| a collection of tools that consume that file to generate a customized extraction |
| image tailored to that repository. An extraction image defines a host environment |
| for hermetically building the repository's contents (e.g., in Docker) with Kythe |
| extractor tools injected, to generate Kythe http://kythe.io/docs/kythe-index-pack.html#_compilation_unit_description_format[Compilation Units]. |
| These units are consumed downstream for static analysis and indexing. |
| |
| The generated extraction image is in the format of a https://docs.docker.com/[Docker image] |
| a standardized container format. https://git-scm.com[Git] is currently used |
| for retrieving repository contents, however support for other source control |
| tools can be added as needed. |
| |
| Note that this is a work in progress not a finished product. The intent is to |
| document the system as it evolves in order provide early adopters with a means of |
| trying it out and providing feedback. |
| |
| == Extraction Configuration Schema |
| |
| An extraction configuration is used to construct a Docker image suitable for |
| building and extracting a given repository. This schema defines a low-level |
| configuration format. Where practical, configuration settings will be inferred |
| automatically, but in cases where that is not possible, a user-friendly interface |
| may be added to allow users to control the extraction behavior directly. For the |
| time being, this intermediate configuration schema can be utilized by users who |
| would like to get a head-start on enabling Kythe on their repositories. The |
| configuration schema is defined within https://github.com/kythe/kythe/blob/master/kythe/proto/extraction_config.proto[extraction_config.proto]. |
| |
| == Extraction Configuration Usage |
| |
| Instances of this configuration schema can be placed in the root directory of |
| the repository in a file named: ".kythe-extraction-config", formatted as a https://developers.google.com/protocol-buffers/docs/proto3#json[JSON encoded protobuf]. |
| An example of an existing extraction configuration can be found here: https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/base/testdata/mvn_config.json[mvn_config.json]. |
| The corresponding extraction image which gets generated from the mvn_config.json |
| file can be found here: https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/base/testdata/expected_mvn_config.Dockerfile[expected_mvn_config.Dockerfile]. |
| This configuration serves as an input to the https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/extractrepo.go[extractrepo] |
| tool which executes the Kythe extraction process on a given repository. |
| |
| === Extraction Configuration Components |
| |
| .... |
| repeated Image required_image |
| .... |
| |
| This field defines a set of artifacts from a base image to copy into the |
| generated extraction image, where for each listed `required_image`, the Docker |
| image will have: |
| .... |
| FROM <image.uri> as <image.name> |
| # ...repeated... |
| COPY <image.copy_spec.source> <image.copy_spec.dest> |
| # ...repeated... |
| ENV <image.env_var.name>=<image.env_var.value> |
| .... |
| |
| The https://github.com/kythe/kythe/blob/006fd1149173162df2b6670f4bf3d4a38204ca4c/kythe/proto/extraction_config.proto#L38[Image] |
| message has the following parts: |
| |
| `repeated CopySpec copy_spec` defines a list of artifacts to be copied from the |
| base image into the generated extraction image. |
| |
| `string uri` defines the URI to a base docker image. This can refer to images |
| defined within either local or online docker container registries. |
| |
| `string name` defines a unique name for this image, to be referenced when |
| copying artifacts. |
| |
| `repeated EnvVar env_var` defines environment variables within the generated |
| extraction image related to the artifacts copied from the base image. |
| |
| .... |
| repeated RunCommand run_command |
| .... |
| |
| This field configures the execution of arbitrary RUN commands during the |
| construction of the generated extraction image. This provides for the |
| installation of required resources which may not have corresponding base docker |
| images. For each listed `run_command`, the Docker image will have: |
| .... |
| RUN <command> "<arg[0]>" "<arg[1]>" ... |
| .... |
| |
| .... |
| repeated string entry_point |
| .... |
| This field defines the entry point for the generated image. The entry point is |
| the logic which is run when the generated image's container is started. This is |
| typically a script or binary which intiates the build and extraction process. An |
| example entry point binary can be found here: |
| https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/runextractor/runextractor.go[runextractor.go]. |
| For each listed `entry_point` the Docker image will have: |
| .... |
| ENTRYPOINT ["<entrypoint[0]>", "<entrypoint[1]>", ...] |
| .... |
| |
| == Extraction Image Volumes |
| |
| Each generated extraction image contains default volumes for input and output |
| during the extraction process. These utilize the Docker https://docs.docker.com/storage/volumes/[volume] |
| feature to specify host directories which are mounted within the running |
| container. |
| |
| /repo:: |
| This volume contains the contents of the repository to be processed by the |
| Kythe extraction framework. It should have read and write privileges as it is |
| common for some build systems' configuration files to require pre-processing |
| in order for successful extraction. |
| |
| /out:: |
| This volume will contain the output artifacts of the Kythe extraction process in |
| the form of http://kythe.io/docs/kythe-index-pack.html[kindex] files, (note: this |
| format may change in the future). Any diagnostic output from extractors will |
| also be written here. This directory should have read and write privileges. |
| |
| == Extraction Image Environment Variables |
| |
| In addition to environment variables defined by the configuration schema, |
| generated extraction images also contain a default set of environment variables |
| facilitating access to input and output for extractors running within the |
| container. |
| |
| KYTHE_ROOT_DIRECTORY:: |
| This environment variable points to the volume mount path for the */repo* volume. |
| |
| KYTHE_OUTPUT_DIRECTORY:: |
| This environment variables points to the volume mount path for the */out* volume. |
| |
| == Extraction Wrapper |
| |
| In the process of enabling support for a new build system, it is common to |
| implement a build system wrapper which serves as the entry point for the |
| generated extraction image. This wrapper is responsible for any pre-processing |
| of build configuration files which might be necessary, as well as invoking the |
| build system with the arguments necessary to hook the extractor into the build |
| system's compilation step. An example of such a wrapper can be found here: |
| https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/runextractor/runextractor.go[runextractor.go]. |
| |
| A common pattern is to have the wrapper as well as any language specific |
| extraction binaries bundled within an extraction artifacts base image for use |
| in the extraction configuration. An example of such an artifacts base image can |
| be found here: https://github.com/kythe/kythe/blob/master/kythe/java/com/google/devtools/kythe/extractors/java/artifacts/Dockerfile[kythe/extractors/java/artifacts]. |
| |
| == Extraction Tools |
| |
| The Kythe project contains a collection of tools available for running and |
| testing extraction manually. The documentation for these tools can be found |
| here: https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/README.md[README.md]. |
| These tools require the following to programs to be locally installed and |
| accessible on the $PATH: https://www.docker.com/get-docker[Docker], https://git-scm.com/downloads[Git]. |
| |
| The https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/extractrepo/extractrepo.go[extractrepo] |
| binary provides a tool for running an extraction manually. It consumes an extraction configuration file either specified as a command line argument, or else contained within the ".kythe-extraction-config" file in the root of the repository. The |
| binary generates the extraction image, clones the repository, and then runs the |
| extraction image's container to perform the Kythe extraction on its contents. |
| The usage for the binary is as follows: |
| .... |
| extractrepo -repo <repo_uri> -output <output_file_path> -config [config_file_path] |
| .... |
| |
| The https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/extraction/repotester/repotester.go[repostester] |
| binary provides a tool which runs an extraction on a given repository, and then |
| runs a smoke test to verify adequate file coverage on the extraction's output. |
| The usage for the binary is as follows: |
| .... |
| repotester -repos <comma_delimited,repo_urls> [-config <config_file_path>] [-github_token <github_token>] |
| repotester -repo_list_file <file> [-config <config_file_path>] [-github_token <github_token>] |
| .... |