//===----------------------------------------------------------------------===//
// C Language Family Front-end
//===----------------------------------------------------------------------===//
                                                             Chris Lattner

I. Introduction:
 
 clang: noun
    1. A loud, resonant, metallic sound.
    2. The strident call of a crane or goose.
    3. C-language family front-end toolkit.

 The world needs better compiler tools, tools which are built as libraries. This
 design point allows reuse of the tools in new and novel ways. However, building
 the tools as libraries isn't enough: they must have clean APIs, be as
 decoupled from each other as possible, and be easy to modify/extend.  This
 requires clean layering, decent design, and avoiding tying the libraries to a
 specific use.  Oh yeah, did I mention that we want the resultant libraries to
 be as fast as possible? :)

 This front-end is built as a component of the LLVM toolkit that can be used
 with the LLVM backend or independently of it.  In this spirit, the API has been
 carefully designed as the following components:
 
   libsupport  - Basic support library, reused from LLVM.
   libsystem   - System abstraction library, reused from LLVM.
   libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
                 file system caching for input source files.  This depends on
                 libsupport and libsystem.
   libast      - Provides classes to represent the C AST, the C type system,
                 builtin functions, and various helpers for analyzing and
                 manipulating the AST (visitors, pretty printers, etc).  This
                 library depends on libbasic.
   liblex      - C/C++/ObjC lexing and preprocessing, identifier hash table,
                 pragma handling, tokens, and macros.  This depends on libbasic.
   libparse    - C (for now) parsing and local semantic analysis. This library
                 invokes coarse-grained 'Actions' provided by the client to do
                 stuff (e.g. libsema builds ASTs).  This depends on liblex.
   libsema     - Provides a set of parser actions to build a standardized AST
                 for programs.  ASTs are 'streamed' out a top-level declaration
                 at a time, allowing clients to use decl-at-a-time processing,
                 build up entire translation units, or even build 'whole
                 program' ASTs depending on how they use the APIs.  This depends
                 on libast and libparse.

   librewrite  - Fast, scalable rewriting of source code.  This operates on
                 the raw syntactic text of source code, allowing a client
                 to insert and delete text in very large source files using
                 the same source location information embedded in ASTs.  This
                 is intended to be a low-level API that is useful for
                 higher-level clients and libraries such as code refactoring.

   libanalysis - Source-level dataflow analysis useful for performing analyses
                 such as computing live variables.  It also includes a
                 path-sensitive "graph-reachability" engine for writing
                 analyses that reason about different possible paths of
                 execution through source code.  This is currently being
                 employed to write a set of checks for finding bugs in software.
   libcodegen  - Lower the AST to LLVM IR for optimization & codegen.  Depends
                 on libast.
   clang       - An example driver, client of the libraries at various levels.
                 This depends on all these libraries, and on LLVM VMCore.
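
 The librewrite entry above describes inserting and deleting text by source
 location while leaving the original buffer alone.  Here is a minimal sketch of
 that idea.  It is not the librewrite API: the RewriteSketch class and its
 insertText/removeText methods are hypothetical names, and locations are
 assumed to be plain byte offsets into the original text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the librewrite API.
class RewriteSketch {
  struct Edit {
    std::size_t Pos;  // offset into the ORIGINAL text
    std::size_t Len;  // number of original characters removed (0 = insert)
    std::string Text; // replacement text (empty = pure removal)
  };
  std::string Source;
  std::vector<Edit> Edits;

public:
  explicit RewriteSketch(std::string Original) : Source(std::move(Original)) {}

  // Record an insertion before the character at original offset Pos.
  void insertText(std::size_t Pos, std::string Text) {
    Edits.push_back({Pos, 0, std::move(Text)});
  }

  // Record removal of Len characters starting at original offset Pos.
  void removeText(std::size_t Pos, std::size_t Len) {
    Edits.push_back({Pos, Len, std::string()});
  }

  // Apply all recorded edits in one left-to-right pass over the original.
  // Assumes every edit lies inside the buffer and removals do not overlap.
  std::string str() const {
    std::vector<Edit> Sorted = Edits;
    std::stable_sort(Sorted.begin(), Sorted.end(),
                     [](const Edit &A, const Edit &B) { return A.Pos < B.Pos; });
    std::string Out;
    std::size_t Cursor = 0;
    for (const Edit &E : Sorted) {
      if (E.Pos > Cursor) {
        Out.append(Source, Cursor, E.Pos - Cursor); // copy untouched text
        Cursor = E.Pos;
      }
      Out += E.Text;
      Cursor = std::max(Cursor, E.Pos + E.Len); // skip removed characters
    }
    Out.append(Source, Cursor, std::string::npos);
    return Out;
  }
};
```

 Because every edit is expressed against the original offsets, locations taken
 from an AST would stay valid no matter how many edits have been queued.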
 This front-end has been intentionally built as a DAG of libraries, making it
 easy to reuse individual parts or replace pieces if desired. For example, to
 build a preprocessor, you take the Basic and Lexer libraries. If you want an
 indexer, you take those plus the Parser library and provide some actions for
 indexing.  If you want a refactoring, static analysis, or source-to-source
 compiler tool, it makes sense to take those plus the AST building and semantic
 analyzer library.  Finally, if you want to use this with the LLVM backend,
 you'd take these components plus the AST to LLVM lowering code.
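
 The dependency DAG described above can be made concrete with a small sketch.
 The edges are copied from the "depends on" notes in the component list;
 librewrite and libanalysis are omitted because their dependencies are not
 stated there.  The librariesFor helper is a hypothetical name used only for
 this illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Direct dependencies as stated in the component list.
static const std::map<std::string, std::vector<std::string>> &directDeps() {
  static const std::map<std::string, std::vector<std::string>> Deps = {
      {"libsupport", {}},
      {"libsystem", {}},
      {"libbasic", {"libsupport", "libsystem"}},
      {"libast", {"libbasic"}},
      {"liblex", {"libbasic"}},
      {"libparse", {"liblex"}},
      {"libsema", {"libast", "libparse"}},
      {"libcodegen", {"libast"}},
  };
  return Deps;
}

// Add Lib and everything it transitively depends on to Out (depth-first).
static void collect(const std::string &Lib, std::set<std::string> &Out) {
  if (!Out.insert(Lib).second)
    return; // already visited
  auto It = directDeps().find(Lib);
  if (It == directDeps().end())
    return; // library with no recorded dependencies
  for (const std::string &Dep : It->second)
    collect(Dep, Out);
}

// Hypothetical helper: the libraries a tool needs, given the ones it uses
// directly.
std::set<std::string> librariesFor(const std::vector<std::string> &Roots) {
  std::set<std::string> Out;
  for (const std::string &Root : Roots)
    collect(Root, Out);
  return Out;
}
```

 Asking for {"liblex"} yields the lexer, libbasic, and the two LLVM support
 libraries, which matches the preprocessor example in the text; nothing from
 the parser, AST, or code generator is pulled in.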
 
 In the future I hope this toolkit will grow to include new and interesting
 components, including a C++ front-end, ObjC support, and a whole lot of other
 things.

 Finally, it should be pointed out that the goal here is to build something that
 is high-quality and industrial-strength: all the obnoxious features of the C
 family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
 prototypes, GCC/MS extensions, etc).  It cannot be used if it is not 'real'.


II. Usage of clang driver:

 * Basic Command-Line Options:
   - Help: clang --help
   - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
   - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
   - Enable metric printing: -stats

 * -fsyntax-only is currently the default mode.

 * -E mode works the same way as it does in GCC.

 * -Eonly mode does all preprocessing but does not print the output, which is
     useful for timing the preprocessor.
 
 * -fsyntax-only is currently partially implemented, lacking some
     semantic analysis (some errors and warnings are not produced).

 * -parse-noop parses code without building an AST.  This is useful
     for timing the cost of the parser without including AST building
     time.

 * -parse-ast builds ASTs, but doesn't print them.  This is most
     useful for timing AST building vs -parse-noop.
 
 * -parse-ast-print pretty-prints most expression and statement nodes.

 * -parse-ast-check checks that diagnostic messages that are expected
     are reported and that those which are reported are expected.

 * -dump-cfg builds ASTs and then CFGs.  CFGs are then pretty-printed.

 * -view-cfg builds ASTs and then CFGs.  CFGs are then visualized by
     invoking Graphviz.

     For more information on getting Graphviz to work with clang/LLVM,
     see: http://llvm.org/docs/ProgrammersManual.html#ViewGraph
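
 The difference between -parse-noop and -parse-ast follows from the 'Actions'
 design: the parser calls back into whatever client it is given.  The sketch
 below illustrates that pattern with a toy parser.  Every name in it
 (ActionsSketch, actOnDeclaration, parseDeclarations) is hypothetical and much
 simpler than the real parser interface.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback interface; the real parser's actions differ.
struct ActionsSketch {
  virtual ~ActionsSketch() = default;
  // Called once per parsed declaration; the default does nothing.
  virtual void actOnDeclaration(const std::string &, const std::string &) {}
};

// Analogue of -parse-noop: inherit the empty callbacks unchanged.
struct NoopActions : ActionsSketch {};

// Analogue of -parse-ast: build (here, merely record) a node per declaration.
struct RecordingActions : ActionsSketch {
  std::vector<std::string> Decls;
  void actOnDeclaration(const std::string &Type,
                        const std::string &Name) override {
    Decls.push_back(Type + " " + Name);
  }
};

// Toy "parser": accepts whitespace-separated `type name ;` triples and
// returns how many declarations it recognized.
int parseDeclarations(const std::string &Input, ActionsSketch &Actions) {
  std::istringstream In(Input);
  std::string Type, Name, Semi;
  int Count = 0;
  while (In >> Type >> Name >> Semi && Semi == ";") {
    Actions.actOnDeclaration(Type, Name);
    ++Count;
  }
  return Count;
}
```

 The parsing work is identical for both clients; only the cost of the callbacks
 differs, which is why timing one mode against the other isolates AST building.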

III. Current advantages over GCC:
 * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
 * All diagnostics have column numbers, include 'caret diagnostics', and
   highlight regions of interesting code (e.g. the LHS and RHS of a binop).
 * Full diagnostic customization by client (can format diagnostics however they
   like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
 * Built as a framework, can be reused by multiple tools.
 * All supported languages are linked into the same library (no cc1, cc1obj, ...).
 * mmaps source code read-only, so it does not dirty the pages as GCC does
   (lower memory footprint).
 * LLVM License, can be linked into non-GPL projects.
 * Full diagnostic control, per diagnostic.  Diagnostics are identified by ID.
 * Significantly faster than GCC at semantic analysis, parsing, preprocessing
   and lexing.
 * Defers exposing platform-specific stuff to as late as possible, tracks use of
   platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
 * The lexer doesn't rely on the "lexer hack": it has no notion of scope and
   does not categorize identifiers as types or variables -- this is up to the
   parser to decide.
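
 The last point can be sketched in a few lines.  In the statement "T * x;" the
 meaning depends on whether T names a type, and a lexer that avoids the "lexer
 hack" does not try to find out.  This is an illustration only, not clang's
 lexer: Tok, lex, and isPointerDeclaration are hypothetical names, and the
 TypeNames set stands in for whatever scope information the parser keeps.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class TokKind { Identifier, Punctuator };

struct Tok {
  TokKind Kind;
  std::string Text;
};

// Scope-free lexer sketch: every name becomes a plain Identifier token.
// It never asks whether the name is a typedef.
std::vector<Tok> lex(const std::string &Src) {
  std::vector<Tok> Toks;
  std::size_t I = 0;
  while (I < Src.size()) {
    unsigned char C = static_cast<unsigned char>(Src[I]);
    if (std::isspace(C)) {
      ++I;
      continue;
    }
    if (std::isalpha(C) || C == '_') {
      std::size_t Start = I;
      while (I < Src.size() &&
             (std::isalnum(static_cast<unsigned char>(Src[I])) ||
              Src[I] == '_'))
        ++I;
      Toks.push_back({TokKind::Identifier, Src.substr(Start, I - Start)});
    } else {
      // Anything else is treated as a one-character punctuator.
      Toks.push_back({TokKind::Punctuator, std::string(1, Src[I])});
      ++I;
    }
  }
  return Toks;
}

// The parser, which does track scope, decides what `T * x ;` means.
bool isPointerDeclaration(const std::vector<Tok> &Toks,
                          const std::set<std::string> &TypeNames) {
  return Toks.size() == 4 && Toks[0].Kind == TokKind::Identifier &&
         TypeNames.count(Toks[0].Text) != 0 && Toks[1].Text == "*" &&
         Toks[2].Kind == TokKind::Identifier && Toks[3].Text == ";";
}
```

 The same token stream is read as a pointer declaration when T is a known type
 name and as something else (such as a multiplication) when it is not, without
 the lexer ever being re-run.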

Potential Future Features:

 * Fine grained diag control within the source (#pragma enable/disable warning).
 * Better token tracking within macros?  (Token came from this line, which is
   a macro argument instantiated here, recursively instantiated here).
 * Fast #import with a module system.
 * Dependency tracking: change to header file doesn't recompile every function
   that textually depends on it: recompile only those functions that need it.
   This is aka 'incremental parsing'.

IV. Missing Functionality / Improvements:

clang driver:
 * Include search paths are hard-coded into the driver.  Doh.

File Manager:
 * Reduce syscalls for reduced compile time, see NOTES.txt.

Lexer:
 * Source character mapping.  GCC supports ASCII and UTF-8.
   See GCC options: -ftarget-charset and -ftarget-wide-charset.
 * Universal character support.  Experimental in GCC, enabled with
   -fextended-identifiers.
 * -fpreprocessed mode.

Preprocessor:
 * Know about Apple header maps.
 * #assert/#unassert
 * #line / #file directives (currently accepted and ignored).
 * MSExtension: "L#param" stringizes to a wide string literal.
 * Charize extension: "#define F(o) #@o  F(a)"  -> 'a'.
 * Consider merging the parser's expression parser into the preprocessor to
   eliminate duplicate code.
 * Add support for -M*

Traditional Preprocessor:
 * Currently, we have none. :)

Parser:
 * C90/K&R modes are only partially implemented.
 * __extension__ is currently just skipped and ignored.
 
Semantic Analysis:
 * Perhaps 85% done.

LLVM Code Gen:
 * Most of the easy stuff is done, probably 64.9% done so far.