//===----------------------------------------------------------------------===//
// C Language Family Front-end
//===----------------------------------------------------------------------===//
                                                             Chris Lattner

I. Introduction:
 
 clang: noun
    1. A loud, resonant, metallic sound.
    2. The strident call of a crane or goose.
    3. C-language front-end toolkit.

 The world needs better compiler tools, tools which are built as libraries. This
 design point allows reuse of the tools in new and novel ways. However, building
 the tools as libraries isn't enough: they must have clean APIs, be as
 decoupled from each other as possible, and be easy to modify/extend.  This
 requires clean layering, decent design, and avoiding tying the libraries to a
 specific use.  Oh yeah, did I mention that we want the resultant libraries to
 be as fast as possible? :)

 This front-end is built as a component of the LLVM toolkit (which really really
 needs a better name) that can be used with the LLVM backend or independently of
 it.  In this spirit, the API has been carefully designed to include the
 following components:
 
   libsupport  - Basic support library, reused from LLVM.
   libsystem   - System abstraction library, reused from LLVM.
   libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
                 file system caching for input source files.
   liblex      - C/C++/ObjC lexing and preprocessing, identifier hash table,
                 pragma handling, tokens, and macros.
   libparse    - C99 (for now) parsing and local semantic analysis. This library
                 invokes coarse-grained 'Actions' provided by the client to do
                 stuff (great idea shamelessly stolen from Devkit).  ObjC/C90
                 need to be added soon, K&R C and C++ can be added in the
                 future, but are not a high priority.
   libast      - Provides a set of parser actions to build a standardized AST
                 for programs.  AST can be built in two forms: streamlined and
                 'complete' mode, which captures *full* location info for every
                 token in the AST.  ASTs are 'streamed' out one top-level
                 declaration at a time, allowing clients to use decl-at-a-time
                 processing, build up entire translation units, or even build
                 'whole program' ASTs depending on how they use the APIs.
   libast2llvm - [Planned] Lower the AST to LLVM IR for optimization & codegen.
   clang       - An example client of the libraries at various levels.

 This front-end has been intentionally built as a stack, making it trivial
 to replace anything below a particular point.  For example, if you want a
 preprocessor, you take the Basic and Lexer libraries.  If you want an indexer,
 you take those plus the Parser library and provide some actions for indexing.
 If you want a refactoring, static analysis, or source-to-source compiler tool,
 it makes sense to take those plus the AST building library.  Finally, if you
 want to use this with the LLVM backend, you'd take these components plus the
 AST to LLVM lowering code.
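
 The layering described above can be sketched with a toy parser that reports
 what it sees through client-provided actions.  This is only an illustration:
 the names Actions, ActOnDeclaration, ParseDeclarations and IndexerActions are
 hypothetical and are not taken from the real libraries, and the "parser" only
 understands input of the form "int x; float y;".

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback interface.  The real Actions API is not shown in this
// README, so every name here is an assumption made for illustration.
struct Actions {
  virtual ~Actions() = default;
  // Called once for each top-level declaration the toy parser recognizes.
  virtual void ActOnDeclaration(const std::string &Type,
                                const std::string &Name) = 0;
};

// Toy "parser" for whitespace-separated "<type> <name>;" pairs.  It has no
// idea what the client does with the callbacks it makes.
void ParseDeclarations(const std::string &Source, Actions &Client) {
  std::istringstream In(Source);
  std::string Type, Name;
  while (In >> Type >> Name) {
    // A successful extraction never yields an empty token.
    assert(!Name.empty());
    if (Name.back() == ';')
      Name.pop_back();  // Drop the trailing semicolon.
    Client.ActOnDeclaration(Type, Name);
  }
}

// One possible client: an "indexer" that records declared names and builds
// no AST at all.
struct IndexerActions : Actions {
  std::vector<std::string> Names;
  void ActOnDeclaration(const std::string &,
                        const std::string &Name) override {
    Names.push_back(Name);
  }
};
```

 Passing an IndexerActions object to ParseDeclarations("int x; float y;", ...)
 leaves the names "x" and "y" in Names.  A different client, such as an AST
 builder, would plug into the same parser without changing it.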
 
 In the future I hope this toolkit will grow to include new and interesting
 components, including a C++ front-end, ObjC support, AST pretty printing
 support, and a whole lot of other things.

 Finally, it should be pointed out that the goal here is to build something that
 is high-quality and industrial-strength: all the obnoxious features of the C
 family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
 prototypes, GCC/MS extensions, etc).  It cannot be used if it is not 'real'.


II. Usage of clang driver:

 * Basic Command-Line Options:
   - Help: clang --help
   - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
   - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
   - Enable metric printing: -stats

 * -parse-noop is the default mode.

 * -E mode gives output nearly identical to GCC, though not all bugs in
   whitespace calculation have been emulated.

 * -fsyntax-only is currently unimplemented.
 
 * -parse-print-callbacks prints almost no callbacks so far.

 * -parse-ast builds ASTs, but doesn't print them.  This is most useful for
   timing AST building vs -parse-noop.
 
 * -parse-ast-print prints most expression and statement nodes, but some
   things are missing.


III. Current advantages over GCC:

 * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
 * All diagnostics have column numbers and include 'caret diagnostics'.
 * Full diagnostic customization by client (can format diagnostics however they
   like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
 * Built as a framework, can be reused by multiple tools.
 * All supported languages are linked into one library (no cc1, cc1obj, ...).
 * Source is mmap'd read-only, so pages are not dirtied as in GCC (smaller
   memory footprint).
 * BSD License, can be linked into non-GPL projects.
 * Full diagnostic control, per diagnostic.
 * Faster than GCC at parsing, lexing, and preprocessing.
 * Defers exposing platform-specific stuff to as late as possible, tracks use of
   platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
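
 The diagnostic customization point in the list above can be sketched as
 follows.  This README names a DiagnosticClient interface but does not show
 it, so the Diagnostic struct, the Format method, and the CaretClient and
 TerseClient classes below are all assumptions made for illustration, not the
 real API.

```cpp
#include <cassert>
#include <string>

// Hypothetical diagnostic record; the field names are assumptions.
struct Diagnostic {
  std::string File;
  unsigned Line;           // 1-based line number.
  unsigned Column;         // 1-based column number.
  std::string Message;
  std::string SourceLine;  // Text of the offending source line.
};

// Hypothetical client interface: each client decides how to render a
// diagnostic.
struct DiagnosticClient {
  virtual ~DiagnosticClient() = default;
  virtual std::string Format(const Diagnostic &D) const = 0;
};

// Renders file:line:column, echoes the source line, and puts a caret under
// the reported column.
struct CaretClient : DiagnosticClient {
  std::string Format(const Diagnostic &D) const override {
    assert(D.Column >= 1 && "columns are 1-based");
    return D.File + ":" + std::to_string(D.Line) + ":" +
           std::to_string(D.Column) + ": " + D.Message + "\n" +
           D.SourceLine + "\n" + std::string(D.Column - 1, ' ') + "^\n";
  }
};

// Terse line-only rendering, in the spirit of running with
// -fno-caret-diagnostics -fno-show-column.
struct TerseClient : DiagnosticClient {
  std::string Format(const Diagnostic &D) const override {
    return D.File + ":" + std::to_string(D.Line) + ": " + D.Message + "\n";
  }
};
```

 An IDE or refactoring tool would supply its own client in place of these two,
 for example one that returns structured data instead of text.  The caret
 placement here assumes the source line contains no tabs.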

Future Features:

 * Fine grained diag control within the source (#pragma enable/disable warning).
 * Faster than GCC at AST generation [measure when complete].
 * Better token tracking within macros?  (Token came from this line, which is
   a macro argument instantiated here, recursively instantiated here).
 * Fast #import!
 * Dependency tracking: a change to a header file doesn't recompile every
   function that textually depends on it: recompile only those functions that
   need it.


IV. Missing Functionality / Improvements

clang driver:
 * Include search paths are hard-coded into the driver.

File Manager:
 * Reduce syscalls, see NOTES.txt.

Lexer:
 * Source character mapping.  GCC supports ASCII and UTF-8.
   See GCC options: -ftarget-charset and -ftarget-wide-charset.
 * Universal character support.  Experimental in GCC, enabled with
   -fextended-identifiers.
 * -fpreprocessed mode.

Preprocessor:
 * Know about Apple header maps.
 * #assert/#unassert
 * #line / #file directives (currently accepted and ignored).
 * MSExtension: "L#param" stringizes to a wide string literal.
 * Consider merging the parser's expression parser into the preprocessor to
   eliminate duplicate code.
 * Add support for -M*

Traditional Preprocessor:
 * All.

Parser:
 * C90/K&R modes.  Need to get a copy of the C90 spec.
 * __extension__, __attribute__ [currently just skipped and ignored].
 * A lot of semantic analysis is missing.
 * "initializers", GCC inline asm.

Parser Actions:
 * All that are missing.
 * SemaActions vs MinimalActions.
 * Would like to either lazily resolve types [refactoring] or aggressively
   resolve them [C compiler].  Need to know whether something is a type or not
   to compile, but don't need to know what it is.
 * Implement a little devkit-style "indexer".
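
 The type-resolution item above turns on the difference between knowing
 whether a name is a type and knowing what type it is.  A minimal sketch of
 the former, assuming a hypothetical MinimalTypeTracker class (none of these
 names come from the real libraries), could look like this:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: records only *whether* a name is a type, never *what*
// type it denotes.
class MinimalTypeTracker {
  // One set of typedef names per open scope; back() is the innermost scope.
  std::vector<std::unordered_set<std::string>> Scopes;

public:
  MinimalTypeTracker() : Scopes(1) {}  // Start with the file scope open.

  void EnterScope() { Scopes.emplace_back(); }
  void ExitScope() {
    assert(Scopes.size() > 1 && "cannot exit the file scope");
    Scopes.pop_back();
  }

  // Called when the parser sees "typedef ... Name;".
  void ActOnTypedef(const std::string &Name) { Scopes.back().insert(Name); }

  // Searches from the innermost scope outwards.
  bool IsTypeName(const std::string &Name) const {
    for (auto It = Scopes.rbegin(); It != Scopes.rend(); ++It)
      if (It->count(Name))
        return true;
    return false;
  }
};

enum class StmtKind { Declaration, Expression };

// Classifies a statement of the form "A * B;": it declares a pointer B if A
// names a type, and is a multiplication expression otherwise.
StmtKind ClassifyStarStatement(const MinimalTypeTracker &T,
                               const std::string &First) {
  return T.IsTypeName(First) ? StmtKind::Declaration : StmtKind::Expression;
}
```

 This is enough to choose between the two parses of "A * B;" without building
 any type representation.  The sketch ignores the case where an inner variable
 declaration hides an outer typedef name, which a real implementation would
 have to handle.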
 
AST Builder:
 * Implement more nodes as actions are available.
 * Types.