| // Copyright 2014 The Kythe Authors. All rights reserved. |
| // |
| // Licensed under the Apache License, Version 2.0 (the "License"); |
| // you may not use this file except in compliance with the License. |
| // You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| |
| An Overview of Kythe |
| ==================== |
| Michael J. Fromberger <fromberger@google.com> |
| v0.1.1, 28-Oct-2014: Draft |
| :toc: |
| :priority: 1000 |
| |
| == Introduction to Kythe |
| |
| The Kythe project was founded to provide and support tools and standards that |
| encourage interoperability among programs that manipulate source code. At a |
| high level, the main goal of Kythe is to provide a standard, language-agnostic |
| interchange mechanism, allowing tools that operate on source code -- including |
| build systems, compilers, interpreters, static analyses, editors, code-review |
| applications, and more -- to share information with each other smoothly. |
| |
| The remainder of this document gives a basic introduction to the ideas and |
| rationale behind Kythe, and provides links to other more specific documentation |
| about the project. |
| |
| == Background |
| |
| As the size and scope of a software project grows, developers rely more and |
| more on tools for routine tasks such as building, testing, deploying, |
| debugging, refactoring, and analyzing their source code. Even for a small |
| development team such as a startup, the toolchain used to write, test, and |
| deploy code can be fairly complex -- subsuming multiple languages, a variety of |
| source control systems, a variety of build & deployment tools, test frameworks, |
| and of course a great patchwork of scripts to tie it all together. |
| |
| When a codebase is small and dependencies are few, it's fairly easy to do these |
| tasks manually: A small codebase of a few thousand files can be imported into |
| an IDE and manipulated directly. As the codebase grows, however, it assumes |
| more and more dependencies on external (``third party'') libraries and tools. |
| Moreover, as a development team grows, the complexity of performing those tasks |
| increases to the point where it is difficult -- or even infeasible -- to build, |
| debug, and test the project efficiently on a single workstation. |
| |
| Kythe grew out of our experience creating a large-scale semantic index of |
| cross-references for the enormous, multilingual internal codebase at Google. |
| We found that engineers often lose a lot of time adapting a new tool to their |
| project, and in the process wind up re-inventing a solution to a problem that |
| had already been solved by some other program (_e.g.,_ the compiler or an |
| analysis tool embedded in the IDE). The main reason this happens is that the |
| existing solution usually doesn't play well with the other tools the developers |
| are using. Some teams `work around' this problem by forcing everyone to use |
| the same tools; but in our experience, that approach scales poorly: Integrated |
| environments work well for the languages and tools that are integrated, but the |
| cost of adding new pieces is high. Developers' tool preferences are highly |
| diverse and idiosyncratic, and developers' productivity declines sharply when |
| they are forced to use tools they dislike. |
| |
| The main premise of Kythe, therefore, is that programming language tools ought |
| to be able to talk to each other easily: Not just the tools for a given |
| language, but across all the languages used in your project -- and not just for |
| a single chosen development environment, but (potentially) for any workable |
| combination of tools your developers may use. Considering editors, compilers, |
| build systems, analysis tools, deployment, testing, continuous integration -- |
| there are a lot of options for each of these and more; but at present |
| relatively few combinations actually work well together in practice. |
| |
| == Goals of Kythe |
| |
| The best way to view Kythe is as a ``hub'' for connecting tools for various |
| languages, clients and build systems. By defining *language-agnostic |
| protocols and data formats* for representing, accessing and querying |
| source code information as data, Kythe allows *language analysis and |
| indexing to be run as services*. This, in turn, enables lightweight |
| (``thin'') composition of analysis tools with client tools such as |
| editors, IDEs, and code browsers. |
| |
| A hub-and-spoke model reduces the overall work to integrate _L_ languages, _C_ |
| clients, and _B_ build systems from a worst case of O(L×C×B) -- combinatorial |
| in the size of the ecosystem -- to O(L+C+B): Implementing Kythe compatibility |
| for a given compiler, editor, or build system is, roughly, a constant up-front |
| cost for each component, after which that component can interoperate with all |
| the existing pieces directly. |
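
As a rough illustration of the arithmetic above, the following Python sketch counts the integrations needed under each model. The function names and the sample ecosystem sizes are assumptions chosen for illustration; they are not part of Kythe.

```python
def pairwise_integrations(languages: int, clients: int, build_systems: int) -> int:
    """Worst case without a hub: one integration per
    (language, client, build system) combination."""
    return languages * clients * build_systems


def hub_integrations(languages: int, clients: int, build_systems: int) -> int:
    """With a hub: each component is adapted once to the shared format."""
    return languages + clients + build_systems


if __name__ == "__main__":
    # Hypothetical ecosystem: 3 languages, 4 clients, 2 build systems.
    print(pairwise_integrations(3, 4, 2))  # 3 * 4 * 2 = 24
    print(hub_integrations(3, 4, 2))       # 3 + 4 + 2 = 9
```

In this hypothetical ecosystem, adding a fourth language costs one more integration with a hub, versus eight more (4 clients × 2 build systems) in the worst case without one.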
| |
| To make this model work, Kythe provides a language-agnostic graph structure to |
| capture build-system and compiler metadata, as well as semantic information |
| about source code such as cross-references (_e.g.,_ definitions and their |
| usages, type information, and cross-language associations). By design, the |
| Kythe graph schema is liberal and extensible -- we've defined a number of |
| useful subgraphs, but new node and edge kinds are structured so that the graph |
| can easily be extended without recourse to a central authority. |
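
To make the idea of an extensible graph concrete, here is a minimal Python sketch of a graph whose node and edge kinds are plain strings, so a tool can introduce a new kind without registering it anywhere. The class names, field names, and kind strings below are hypothetical and do not reflect the actual Kythe schema or storage format; see the schema reference for the real definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical unique identifier for the node
    kind: str  # free-form string, so new node kinds need no registry


@dataclass(frozen=True)
class Edge:
    source: str  # name of the source node
    kind: str    # free-form string, so new edge kinds need no registry
    target: str  # name of the target node


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = Node(name, kind)

    def add_edge(self, source: str, kind: str, target: str) -> None:
        self.edges.append(Edge(source, kind, target))

    def targets(self, source: str, kind: str) -> list:
        """Return the targets of all edges of `kind` leaving `source`."""
        return [e.target for e in self.edges
                if e.source == source and e.kind == kind]


if __name__ == "__main__":
    g = Graph()
    g.add_node("main.cc#main", "function")   # illustrative kind names
    g.add_node("main.cc@10:4", "anchor")
    g.add_edge("main.cc@10:4", "refers-to", "main.cc#main")
    # A tool-specific edge kind, added without changing the Graph class.
    g.add_edge("main.cc#main", "example-tool/owned-by", "team-a")
    print(g.targets("main.cc@10:4", "refers-to"))
```

Because kinds are data rather than types in this sketch, a consumer that does not recognize `example-tool/owned-by` can simply ignore those edges.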
| |
| One of the basic design principles of Kythe is that interoperability should not |
| be `all-or-nothing': Tools should adjust gracefully to missing or incomplete |
| data. For many purposes, we've found that some information is almost always |
| better than none. At the same time, it is better to emit _incomplete_ data |
| than to emit _incorrect_ data. In practice, the important point is that tools |
| should not ``give up'' in the presence of incomplete data, as partial results |
| are often still useful. |
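
The principle above can be sketched as a lookup that returns whatever is known and omits the rest, rather than failing or guessing. The function name and the dictionary-based inputs in this Python sketch are assumptions for illustration, not a Kythe API.

```python
def describe_symbol(symbol: str, definitions: dict, types: dict) -> dict:
    """Collect whatever is known about `symbol`.

    Missing facts are left out of the result: the function neither raises
    an error (giving up) nor invents a placeholder (emitting incorrect data).
    """
    info = {"symbol": symbol}
    if symbol in definitions:
        info["definition"] = definitions[symbol]
    if symbol in types:
        info["type"] = types[symbol]
    return info


if __name__ == "__main__":
    # Hypothetical index in which type information for `parse` is missing.
    definitions = {"parse": "parser.py:12"}
    types = {}
    print(describe_symbol("parse", definitions, types))
```

A client such as an editor can still offer "jump to definition" from this partial result, even though it cannot show a type.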
| |
| == Non-goals of Kythe |
| |
| Although Kythe provides interoperability for many purposes, it does not cover |
| every possible situation. By design, there are some specific problems we are |
| explicitly _not_ attempting to solve with this project: |
| |
| * *Writing a compiler or optimizer.* Kythe's graph is meant to capture |
| high-level, cross-cutting information that has a similar character across a |
| variety of languages. |
| Low-level details like code generation and optimization are, by nature, |
| language-specific. While you could in principle model a compiler's internal |
| structures in Kythe's graph, that is not a primary goal. |
| |
| * *Replacing existing IRs.* Some tools (_e.g.,_ static analyzers) already have |
| expressive purpose-built internal representations for code. Kythe is not |
| meant to be a universal replacement for such IRs -- instead, our goal is to |
| provide a way for such tools to capture ``interesting subsets'' of an |
| analysis for sharing with other tools. |
| |
| *In short:* We are not interested in an http://en.wikipedia.org/wiki/UNCOL[UNCOL]. |
| |
| Our goal is to provide a language for sharing data between tools, and while we |
| find that this works well for a large class of interesting problems, there will |
| always be situations that are highly specific to a particular language or data |
| model. For such cases, it's entirely appropriate to use a representation that |
| is tuned to that purpose. |
| |
| == What Kythe Provides |
| |
| The core of the Kythe project centers on three themes, which are embodied |
| in our open-source tools and supported by the Kythe team at Google together |
| with any interested contributors: |
| |
| * *Language-agnostic graph storage format*. Kythe defines a simple, flexible, |
| and portable graph representation that is easy for an instrumented |
| compiler to emit and for clients to consume. |
| |
| * *Graph schema*. Kythe provides a simple, extensible |
| https://kythe.io/schema/[graph schema] for a variety of |
| interesting semantic cross-reference data in various languages, including |
| C++, Java, and (soon) Go. We also provide some simple, open-source tools |
| that make it easy to add new elements to the schema, and test whether an |
| analyzer that produces those elements has met its contract. |
| |
| * *Analyzers, tools and examples*. The Kythe project provides several |
| open-source tools for generating and manipulating Kythe data, including |
| indexers for C++, Java, and (soon) Go; a self-contained server that can use |
| Kythe data to answer cross-reference queries; and some UI example code that |
| shows how some of these pieces can be glued together. |
| |
| == What Kythe Requires |
| |
| Essentially all that is needed to participate in the Kythe ecosystem is for a |
| tool to consume and/or emit data in the Kythe format, and -- where appropriate |
| -- to follow the Kythe schema. You'll need: |
| |
| *To plug in a language*:: |
| A compiler that can be instrumented to produce an indexer that emits Kythe |
| data about source code in that language. |
| |
| *To plug in a build system*:: |
| A tool that can ``extract'' compilation information from the build process, |
| allowing a language-specific analyzer to be run on the code and its |
| dependencies. |
| |
| *To plug in a UI tool*:: |
| Any tool that can consume Kythe graph artifacts can use Kythe data to answer |
| questions about code. The only specific knowledge that needs to be baked |
| into the tool is the naming scheme. |
| |
| *To build a service or other analysis on Kythe data*:: |
| The Kythe data format is simple, flat, and easily portable. A tool or |
| service can quickly convert Kythe graph data into tabular or other structured |
| formats for quick serving, graph exploration, visualization, etc. |
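
As an illustration of the last item, the following Python sketch turns a flat list of edge records into a lookup table from each target to the locations that reference it. The `(source, kind, target)` tuple layout and the `refers-to` kind are hypothetical simplifications, not the actual Kythe data format.

```python
from collections import defaultdict


def build_xref_table(records, ref_kind="refers-to"):
    """Group flat (source, kind, target) records into a table that maps
    each target to the sorted list of sources that reference it."""
    table = defaultdict(list)
    for source, kind, target in records:
        if kind == ref_kind:
            table[target].append(source)
    return {target: sorted(sources) for target, sources in table.items()}


if __name__ == "__main__":
    # Hypothetical flat records, as an analyzer might emit them.
    records = [
        ("b.py:7", "refers-to", "util.parse"),
        ("a.py:3", "refers-to", "util.parse"),
        ("a.py:9", "refers-to", "util.format"),
        ("util.parse", "defined-in", "util.py"),
    ]
    for target, sources in build_xref_table(records).items():
        print(target, sources)
```

Records of kinds other than the requested one are skipped here, so a serving table can be built in a single pass over the data.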
| |
| Other documents and examples cover the details of how these pieces are |
| implemented in practice, but the unifying principles are the common data |
| representation and schema. |
| |
| == Related Documentation |
| |
| * link:kythe-compatible-compilers.html[Kythe-Compatible Tools] |
| * link:schema/[Kythe Schema Reference] |