<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>LLVM Assembly Language Reference Manual</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="author" content="Chris Lattner">
  <meta name="description" 
  content="LLVM Assembly Language Reference Manual.">
  <link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<body>
<div class="doc_title"> LLVM Language Reference Manual </div>
<ol>
  <li><a href="#abstract">Abstract</a></li>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#identifiers">Identifiers</a></li>
  <li><a href="#highlevel">High Level Structure</a>
    <ol>
      <li><a href="#modulestructure">Module Structure</a></li>
      <li><a href="#linkage">Linkage Types</a></li>
      <li><a href="#callingconv">Calling Conventions</a></li>
      <li><a href="#globalvars">Global Variables</a></li>
      <li><a href="#functionstructure">Functions</a></li>
      <li><a href="#paramattrs">Parameter Attributes</a></li>
      <li><a href="#moduleasm">Module-Level Inline Assembly</a></li>
      <li><a href="#datalayout">Data Layout</a></li>
    </ol>
  </li>
  <li><a href="#typesystem">Type System</a>
    <ol>
      <li><a href="#t_primitive">Primitive Types</a>    
        <ol>
          <li><a href="#t_classifications">Type Classifications</a></li>
        </ol>
      </li>
      <li><a href="#t_derived">Derived Types</a>
        <ol>
          <li><a href="#t_array">Array Type</a></li>
          <li><a href="#t_function">Function Type</a></li>
          <li><a href="#t_pointer">Pointer Type</a></li>
          <li><a href="#t_struct">Structure Type</a></li>
          <li><a href="#t_pstruct">Packed Structure Type</a></li>
          <li><a href="#t_vector">Vector Type</a></li>
          <li><a href="#t_opaque">Opaque Type</a></li>
        </ol>
      </li>
    </ol>
  </li>
  <li><a href="#constants">Constants</a>
    <ol>
      <li><a href="#simpleconstants">Simple Constants</a>
      <li><a href="#aggregateconstants">Aggregate Constants</a>
      <li><a href="#globalconstants">Global Variable and Function Addresses</a>
      <li><a href="#undefvalues">Undefined Values</a>
      <li><a href="#constantexprs">Constant Expressions</a>
    </ol>
  </li>
  <li><a href="#othervalues">Other Values</a>
    <ol>
      <li><a href="#inlineasm">Inline Assembler Expressions</a>
    </ol>
  </li>
  <li><a href="#instref">Instruction Reference</a>
    <ol>
      <li><a href="#terminators">Terminator Instructions</a>
        <ol>
          <li><a href="#i_ret">'<tt>ret</tt>' Instruction</a></li>
          <li><a href="#i_br">'<tt>br</tt>' Instruction</a></li>
          <li><a href="#i_switch">'<tt>switch</tt>' Instruction</a></li>
          <li><a href="#i_invoke">'<tt>invoke</tt>' Instruction</a></li>
          <li><a href="#i_unwind">'<tt>unwind</tt>'  Instruction</a></li>
          <li><a href="#i_unreachable">'<tt>unreachable</tt>' Instruction</a></li>
        </ol>
      </li>
      <li><a href="#binaryops">Binary Operations</a>
        <ol>
          <li><a href="#i_add">'<tt>add</tt>' Instruction</a></li>
          <li><a href="#i_sub">'<tt>sub</tt>' Instruction</a></li>
          <li><a href="#i_mul">'<tt>mul</tt>' Instruction</a></li>
          <li><a href="#i_udiv">'<tt>udiv</tt>' Instruction</a></li>
          <li><a href="#i_sdiv">'<tt>sdiv</tt>' Instruction</a></li>
          <li><a href="#i_fdiv">'<tt>fdiv</tt>' Instruction</a></li>
          <li><a href="#i_urem">'<tt>urem</tt>' Instruction</a></li>
          <li><a href="#i_srem">'<tt>srem</tt>' Instruction</a></li>
          <li><a href="#i_frem">'<tt>frem</tt>' Instruction</a></li>
        </ol>
      </li>
      <li><a href="#bitwiseops">Bitwise Binary Operations</a>
        <ol>
          <li><a href="#i_shl">'<tt>shl</tt>' Instruction</a></li>
          <li><a href="#i_lshr">'<tt>lshr</tt>' Instruction</a></li>
          <li><a href="#i_ashr">'<tt>ashr</tt>' Instruction</a></li>
          <li><a href="#i_and">'<tt>and</tt>' Instruction</a></li>
          <li><a href="#i_or">'<tt>or</tt>'  Instruction</a></li>
          <li><a href="#i_xor">'<tt>xor</tt>' Instruction</a></li>
        </ol>
      </li>
      <li><a href="#vectorops">Vector Operations</a>
        <ol>
          <li><a href="#i_extractelement">'<tt>extractelement</tt>' Instruction</a></li>
          <li><a href="#i_insertelement">'<tt>insertelement</tt>' Instruction</a></li>
          <li><a href="#i_shufflevector">'<tt>shufflevector</tt>' Instruction</a></li>
        </ol>
      </li>
      <li><a href="#memoryops">Memory Access and Addressing Operations</a>
        <ol>
          <li><a href="#i_malloc">'<tt>malloc</tt>'   Instruction</a></li>
          <li><a href="#i_free">'<tt>free</tt>'     Instruction</a></li>
          <li><a href="#i_alloca">'<tt>alloca</tt>'   Instruction</a></li>
         <li><a href="#i_load">'<tt>load</tt>'     Instruction</a></li>
         <li><a href="#i_store">'<tt>store</tt>'    Instruction</a></li>
         <li><a href="#i_getelementptr">'<tt>getelementptr</tt>' Instruction</a></li>
        </ol>
      </li>
      <li><a href="#convertops">Conversion Operations</a>
        <ol>
          <li><a href="#i_trunc">'<tt>trunc .. to</tt>' Instruction</a></li>
          <li><a href="#i_zext">'<tt>zext .. to</tt>' Instruction</a></li>
          <li><a href="#i_sext">'<tt>sext .. to</tt>' Instruction</a></li>
          <li><a href="#i_fptrunc">'<tt>fptrunc .. to</tt>' Instruction</a></li>
          <li><a href="#i_fpext">'<tt>fpext .. to</tt>' Instruction</a></li>
          <li><a href="#i_fptoui">'<tt>fptoui .. to</tt>' Instruction</a></li>
          <li><a href="#i_fptosi">'<tt>fptosi .. to</tt>' Instruction</a></li>
          <li><a href="#i_uitofp">'<tt>uitofp .. to</tt>' Instruction</a></li>
          <li><a href="#i_sitofp">'<tt>sitofp .. to</tt>' Instruction</a></li>
          <li><a href="#i_ptrtoint">'<tt>ptrtoint .. to</tt>' Instruction</a></li>
          <li><a href="#i_inttoptr">'<tt>inttoptr .. to</tt>' Instruction</a></li>
          <li><a href="#i_bitcast">'<tt>bitcast .. to</tt>' Instruction</a></li>
        </ol>
      </li>
      <li><a href="#otherops">Other Operations</a>
        <ol>
          <li><a href="#i_icmp">'<tt>icmp</tt>' Instruction</a></li>
          <li><a href="#i_fcmp">'<tt>fcmp</tt>' Instruction</a></li>
          <li><a href="#i_phi">'<tt>phi</tt>'   Instruction</a></li>
          <li><a href="#i_select">'<tt>select</tt>' Instruction</a></li>
          <li><a href="#i_call">'<tt>call</tt>'  Instruction</a></li>
          <li><a href="#i_va_arg">'<tt>va_arg</tt>'  Instruction</a></li>
        </ol>
      </li>
    </ol>
  </li>
  <li><a href="#intrinsics">Intrinsic Functions</a>
    <ol>
      <li><a href="#int_varargs">Variable Argument Handling Intrinsics</a>
        <ol>
          <li><a href="#int_va_start">'<tt>llvm.va_start</tt>' Intrinsic</a></li>
          <li><a href="#int_va_end">'<tt>llvm.va_end</tt>'   Intrinsic</a></li>
          <li><a href="#int_va_copy">'<tt>llvm.va_copy</tt>'  Intrinsic</a></li>
        </ol>
      </li>
      <li><a href="#int_gc">Accurate Garbage Collection Intrinsics</a>
        <ol>
          <li><a href="#int_gcroot">'<tt>llvm.gcroot</tt>' Intrinsic</a></li>
          <li><a href="#int_gcread">'<tt>llvm.gcread</tt>' Intrinsic</a></li>
          <li><a href="#int_gcwrite">'<tt>llvm.gcwrite</tt>' Intrinsic</a></li>
        </ol>
      </li>
      <li><a href="#int_codegen">Code Generator Intrinsics</a>
        <ol>
          <li><a href="#int_returnaddress">'<tt>llvm.returnaddress</tt>' Intrinsic</a></li>
          <li><a href="#int_frameaddress">'<tt>llvm.frameaddress</tt>'   Intrinsic</a></li>
          <li><a href="#int_stacksave">'<tt>llvm.stacksave</tt>' Intrinsic</a></li>
          <li><a href="#int_stackrestore">'<tt>llvm.stackrestore</tt>' Intrinsic</a></li>
          <li><a href="#int_prefetch">'<tt>llvm.prefetch</tt>' Intrinsic</a></li>
          <li><a href="#int_pcmarker">'<tt>llvm.pcmarker</tt>' Intrinsic</a></li>
          <li><a href="#int_readcyclecounter">'<tt>llvm.readcyclecounter</tt>' Intrinsic</a></li>
        </ol>
      </li>
      <li><a href="#int_libc">Standard C Library Intrinsics</a>
        <ol>
          <li><a href="#int_memcpy">'<tt>llvm.memcpy.*</tt>' Intrinsic</a></li>
          <li><a href="#int_memmove">'<tt>llvm.memmove.*</tt>' Intrinsic</a></li>
          <li><a href="#int_memset">'<tt>llvm.memset.*</tt>' Intrinsic</a></li>
          <li><a href="#int_sqrt">'<tt>llvm.sqrt.*</tt>' Intrinsic</a></li>
          <li><a href="#int_powi">'<tt>llvm.powi.*</tt>' Intrinsic</a></li>
        </ol>
      </li>
      <li><a href="#int_manip">Bit Manipulation Intrinsics</a>
        <ol>
          <li><a href="#int_bswap">'<tt>llvm.bswap.*</tt>' Intrinsics</a></li>
          <li><a href="#int_ctpop">'<tt>llvm.ctpop.*</tt>' Intrinsic </a></li>
          <li><a href="#int_ctlz">'<tt>llvm.ctlz.*</tt>' Intrinsic </a></li>
          <li><a href="#int_cttz">'<tt>llvm.cttz.*</tt>' Intrinsic </a></li>
          <li><a href="#int_bit_part_select">'<tt>llvm.bit.part_select.*</tt>' Intrinsic </a></li>
        </ol>
      </li>
      <li><a href="#int_debugger">Debugger intrinsics</a></li>
      <li><a href="#int_eh">Exception Handling intrinsics</a></li>
    </ol>
  </li>
</ol>

<div class="doc_author">
  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
            and <a href="mailto:vadve@cs.uiuc.edu">Vikram Adve</a></p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="abstract">Abstract </a></div>
<!-- *********************************************************************** -->
<p>This document is a reference manual for the LLVM assembly language. 
LLVM is an SSA-based representation that provides type safety,
low-level operations, flexibility, and the capability of representing
'all' high-level languages cleanly.  It is the common code
representation used throughout all phases of the LLVM compilation
strategy.</p>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="introduction">Introduction</a> </div>
<!-- *********************************************************************** -->
<p>The LLVM code representation is designed to be used in three
different forms: as an in-memory compiler IR, as an on-disk bytecode
representation (suitable for fast loading by a Just-In-Time compiler),
and as a human readable assembly language representation.  This allows
LLVM to provide a powerful intermediate representation for efficient
compiler transformations and analysis, while providing a natural means
to debug and visualize the transformations.  The three different forms
of LLVM are all equivalent.  This document describes the human readable
representation and notation.</p>
<p>The LLVM representation aims to be light-weight and low-level
while being expressive, typed, and extensible at the same time.  It
aims to be a "universal IR" of sorts, by being at a low enough level
that high-level ideas may be cleanly mapped to it (similar to how
microprocessors are "universal IR's", allowing many source languages to
be mapped to them).  By providing type information, LLVM can be used as
the target of optimizations: for example, through pointer analysis, it
can be proven that a C automatic variable is never accessed outside of
the current function, allowing it to be promoted to a simple SSA
value instead of a memory location.</p>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="wellformed">Well-Formedness</a> </div>
<div class="doc_text">
<p>It is important to note that this document describes 'well formed'
LLVM assembly language.  There is a difference between what the parser
accepts and what is considered 'well formed'.  For example, the
following instruction is syntactically okay, but not well formed:</p>
<pre>
  %x = <a href="#i_add">add</a> i32 1, %x
</pre>
<p>...because the definition of <tt>%x</tt> does not dominate all of
its uses. The LLVM infrastructure provides a verification pass that may
be used to verify that an LLVM module is well formed.  This pass is
automatically run by the parser after parsing input assembly and by
the optimizer before it outputs bytecode.  The violations pointed out
by the verifier pass indicate bugs in transformation passes or input to
the parser.</p>
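<p>As a rough illustration (the value names are hypothetical and not taken from
any real program), a fragment like the following would be well formed, because
each value is defined earlier in the same block than its uses and therefore
dominates them:</p>

<pre>
  %y = add i32 1, 2          <i>; %y is defined first...</i>
  %x = add i32 %y, %y        <i>; ...so both uses of %y are dominated by its definition</i>
</pre>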
<!-- Describe the typesetting conventions here. --> </div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="identifiers">Identifiers</a> </div>
<!-- *********************************************************************** -->
<p>LLVM uses three different forms of identifiers, for different
purposes:</p>
<ol>
  <li>Named values are represented as a string of characters with a '%' prefix.
  For example, %foo, %DivisionByZero, %a.really.long.identifier.  The actual
  regular expression used is '<tt>%[a-zA-Z$._][a-zA-Z$._0-9]*</tt>'.
  Identifiers which require other characters in their names can be surrounded
  with quotes.  In this way, anything except a <tt>&quot;</tt> character can be used
  in a name.</li>

  <li>Unnamed values are represented as an unsigned numeric value with a '%'
  prefix.  For example, %12, %2, %44.</li>

  <li>Constants, which are described in a <a href="#constants">section about
  constants</a>, below.</li>
</ol>

<p>LLVM requires that values start with a '%' sign for two reasons: compilers
don't need to worry about name clashes with reserved words, and the set of
reserved words may be expanded in the future without penalty.  Additionally,
unnamed identifiers allow a compiler to quickly come up with a temporary
variable without having to avoid symbol table conflicts.</p>
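<p>A small sketch may help to tell the forms apart.  The names <tt>%a</tt>,
<tt>%b</tt> and <tt>%sum</tt> are made up for illustration, and the number
given to the unnamed result depends on what precedes it in the function:</p>

<pre>
  %sum = add i32 %a, %b      <i>; %sum, %a and %b are named values</i>
  add i32 %sum, 8            <i>; 8 is a constant; the result is an unnamed value such as %0</i>
</pre>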

<p>Reserved words in LLVM are very similar to reserved words in other
languages. There are keywords for different opcodes 
('<tt><a href="#i_add">add</a></tt>', 
 '<tt><a href="#i_bitcast">bitcast</a></tt>', 
 '<tt><a href="#i_ret">ret</a></tt>', etc...), for primitive type names ('<tt><a
href="#t_void">void</a></tt>', '<tt><a href="#t_primitive">i32</a></tt>', etc...),
and others.  These reserved words cannot conflict with variable names, because
none of them start with a '%' character.</p>

<p>Here is an example of LLVM code to multiply the integer variable
'<tt>%X</tt>' by 8:</p>

<p>The easy way:</p>

<pre>
  %result = <a href="#i_mul">mul</a> i32 %X, 8
</pre>

<p>After strength reduction:</p>

<pre>
  %result = <a href="#i_shl">shl</a> i32 %X, i8 3
</pre>

<p>And the hard way:</p>

<pre>
  <a href="#i_add">add</a> i32 %X, %X           <i>; yields {i32}:%0</i>
  <a href="#i_add">add</a> i32 %0, %0           <i>; yields {i32}:%1</i>
  %result = <a href="#i_add">add</a> i32 %1, %1
</pre>

<p>This last way of multiplying <tt>%X</tt> by 8 illustrates several
important lexical features of LLVM:</p>
<ol>

  <li>Comments are delimited with a '<tt>;</tt>' and go until the end of
  line.</li>

  <li>Unnamed temporaries are created when the result of a computation is not
  assigned to a named value.</li>

  <li>Unnamed temporaries are numbered sequentially.</li>

</ol>

<p>...and it also shows a convention that we follow in this document.  When
demonstrating instructions, we will follow an instruction with a comment that
defines the type and name of value produced.  Comments are shown in italic
text.</p>


<!-- *********************************************************************** -->
<div class="doc_section"> <a name="highlevel">High Level Structure</a> </div>
<!-- *********************************************************************** -->

<!-- ======================================================================= -->
<div class="doc_subsection"> <a name="modulestructure">Module Structure</a>
</div>

<div class="doc_text">

<p>LLVM programs are composed of "Module"s, each of which is a
translation unit of the input programs.  Each module consists of
functions, global variables, and symbol table entries.  Modules may be
combined together with the LLVM linker, which merges function (and
global variable) definitions, resolves forward declarations, and merges
symbol table entries. Here is an example of the "hello world" module:</p>

<pre><i>; Declare the string constant as a global constant...</i>
<a href="#identifiers">%.LC0</a> = <a href="#linkage_internal">internal</a> <a href="#globalvars">constant</a> <a href="#t_array">[13 x i8 ]</a> c"hello world\0A\00"          <i>; [13 x i8 ]*</i>

<i>; External declaration of the puts function</i>
<a href="#functionstructure">declare</a> i32 %puts(i8 *)                                            <i>; i32(i8 *)* </i>

<i>; Definition of main function</i>
define i32 %main() {                                                 <i>; i32()* </i>
        <i>; Convert [13 x i8 ]* to i8 *...</i>
        %cast210 = <a href="#i_getelementptr">getelementptr</a> [13 x i8 ]* %.LC0, i64 0, i64 0 <i>; i8 *</i>

        <i>; Call puts function to write out the string to stdout...</i>
        <a href="#i_call">call</a> i32 %puts(i8 * %cast210)                              <i>; i32</i>
        <a href="#i_ret">ret</a> i32 0
}
</pre>

<p>This example is made up of a <a href="#globalvars">global variable</a>
named "<tt>.LC0</tt>", an external declaration of the "<tt>puts</tt>"
function, and a <a href="#functionstructure">function definition</a>
for "<tt>main</tt>".</p>

<p>In general, a module is made up of a list of global values,
where both functions and global variables are global values.  Global values are
represented by a pointer to a memory location (in this case, a pointer to an
array of char, and a pointer to a function), and have one of the following <a
href="#linkage">linkage types</a>.</p>

</div>

<!-- ======================================================================= -->
<div class="doc_subsection">
  <a name="linkage">Linkage Types</a>
</div>
<div class="doc_text">

<p>
All Global Variables and Functions have one of the following types of linkage:
</p>
<dl>
  <dt><tt><b><a name="linkage_internal">internal</a></b></tt>: </dt>

  <dd>Global values with internal linkage are only directly accessible by
  objects in the current module.  In particular, linking code into a module with
  an internal global value may cause the internal to be renamed as necessary to
  avoid collisions.  Because the symbol is internal to the module, all
  references can be updated.  This corresponds to the notion of the
  '<tt>static</tt>' keyword in C.
  <dt><tt><b><a name="linkage_linkonce">linkonce</a></b></tt>: </dt>
  <dd>Globals with "<tt>linkonce</tt>" linkage are merged with other globals of
  the same name when linkage occurs.  This is typically used to implement 
  inline functions, templates, or other code which must be generated in each 
  translation unit that uses it.  Unreferenced <tt>linkonce</tt> globals are 
  allowed to be discarded.
  <dt><tt><b><a name="linkage_weak">weak</a></b></tt>: </dt>

  <dd>"<tt>weak</tt>" linkage is exactly the same as <tt>linkonce</tt> linkage,
  except that unreferenced <tt>weak</tt> globals may not be discarded.  This is
  used for globals that may be emitted in multiple translation units, but that
  are not guaranteed to be emitted into every translation unit that uses them.
  One example of this is common globals in C, such as "<tt>int X;</tt>" at 
  global scope.
  <dt><tt><b><a name="linkage_appending">appending</a></b></tt>: </dt>

  <dd>"<tt>appending</tt>" linkage may only be applied to global variables of
  pointer to array type.  When two global variables with appending linkage are
  linked together, the two global arrays are appended together.  This is the
  LLVM, typesafe, equivalent of having the system linker append together
  "sections" with identical names when .o files are linked.
  <dt><tt><b><a name="linkage_externweak">extern_weak</a></b></tt>: </dt>
  <dd>The semantics of this linkage follow the ELF model: the symbol is weak
    until linked; if it is not linked, the symbol becomes null instead of being
    an undefined reference.
  </dd>

  <dt><tt><b><a name="linkage_external">externally visible</a></b></tt>:</dt>

  <dd>If none of the above identifiers is used, the global is externally
  visible, meaning that it participates in linkage and can be used to resolve
  external symbol references.
  </dd>
</dl>

<p>
The next two types of linkage target the Microsoft Windows platform
only. They are designed to support importing (exporting) symbols from (to)
DLLs.
</p>

<dl>
  <dt><tt><b><a name="linkage_dllimport">dllimport</a></b></tt>: </dt>

  <dd>"<tt>dllimport</tt>" linkage causes the compiler to reference a function
    or variable via a global pointer to a pointer that is set up by the DLL
    exporting the symbol. On Microsoft Windows targets, the pointer name is
    formed by combining <code>_imp__</code> and the function or variable name.
  </dd>

  <dt><tt><b><a name="linkage_dllexport">dllexport</a></b></tt>: </dt>

  <dd>"<tt>dllexport</tt>" linkage causes the compiler to provide a global
    pointer to a pointer in a DLL, so that it can be referenced with the
    <tt>dllimport</tt> attribute. On Microsoft Windows targets, the pointer
    name is formed by combining <code>_imp__</code> and the function or variable
    name.
  </dd>
</dl>

<p><a name="linkage_external"></a>For example, since the "<tt>.LC0</tt>"
variable is defined to be internal, if another module defined a "<tt>.LC0</tt>"
variable and was linked with this one, one of the two would be renamed,
preventing a collision.  Since "<tt>main</tt>" and "<tt>puts</tt>" are
external (i.e., lacking any linkage declarations), they are accessible
outside of the current module.</p>
<p>It is illegal for a function <i>declaration</i>
to have any linkage type other than "externally visible", <tt>dllimport</tt>,
or <tt>extern_weak</tt>.</p>
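<p>The sketch below shows how the linkage keywords might appear on a handful of
globals.  All of the names are hypothetical, and the syntax simply follows the
other examples in this document:</p>

<pre>
%.msg     = internal constant [3 x i8] c"hi\00"    <i>; private to this module, like C 'static'</i>
%inlined  = linkonce global i32 0                  <i>; merged by name; may be discarded if unreferenced</i>
%common   = weak global i32 0                      <i>; merged by name; kept even if unreferenced</i>
%handlers = appending global [1 x i32] [ i32 1 ]   <i>; arrays are appended together at link time</i>
%optional = extern_weak global i32                 <i>; becomes null if never linked</i>
%visible  = global i32 0                           <i>; no keyword: externally visible</i>

declare dllimport void %from_dll()                 <i>; imported from a DLL (Windows targets only)</i>
</pre>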
</div>

<!-- ======================================================================= -->
<div class="doc_subsection">
  <a name="callingconv">Calling Conventions</a>
</div>

<div class="doc_text">

<p>LLVM <a href="#functionstructure">functions</a>, <a href="#i_call">calls</a>
and <a href="#i_invoke">invokes</a> can all have an optional calling convention
specified for the call.  The calling convention of any pair of dynamic
caller/callee must match, or the behavior of the program is undefined.  The
following calling conventions are supported by LLVM, and more may be added in
the future:</p>

<dl>
  <dt><b>"<tt>ccc</tt>" - The C calling convention</b>:</dt>

  <dd>This calling convention (the default if no other calling convention is
  specified) matches the target C calling conventions.  This calling convention
  supports varargs function calls and tolerates some mismatch in the declared
  prototype and implemented declaration of the function (as does normal C). 
  </dd>

  <dt><b>"<tt>fastcc</tt>" - The fast calling convention</b>:</dt>

  <dd>This calling convention attempts to make calls as fast as possible
  (e.g. by passing things in registers).  This calling convention allows the
  target to use whatever tricks it wants to produce fast code for the target,
  without having to conform to an externally specified ABI.  Implementations of
  this convention should allow arbitrary tail call optimization to be supported.
  This calling convention does not support varargs and requires the prototype of
  all callees to exactly match the prototype of the function definition.
  </dd>

  <dt><b>"<tt>coldcc</tt>" - The cold calling convention</b>:</dt>

  <dd>This calling convention attempts to make code in the caller as efficient
  as possible under the assumption that the call is not commonly executed.  As
  such, these calls often preserve all registers so that the call does not break
  any live ranges in the caller side.  This calling convention does not support
  varargs and requires the prototype of all callees to exactly match the
  prototype of the function definition.
  </dd>

  <dt><b>"<tt>cc &lt;<em>n</em>&gt;</tt>" - Numbered convention</b>:</dt>

  <dd>Any calling convention may be specified by number, allowing
  target-specific calling conventions to be used.  Target specific calling
  conventions start at 64.
  </dd>
</dl>

<p>More calling conventions can be added/defined on an as-needed basis, to
support pascal conventions or any other well-known target-independent
convention.</p>
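<p>As a minimal sketch (the function names are hypothetical), a callee defined
with <tt>fastcc</tt> has to be called with <tt>fastcc</tt> as well, since a
mismatch between caller and callee leaves the behavior undefined:</p>

<pre>
define fastcc i32 %add_one(i32 %x) {
        %r = add i32 %x, 1
        ret i32 %r
}

define i32 %caller() {                           <i>; no keyword: the default, ccc</i>
        %v = call fastcc i32 %add_one(i32 41)    <i>; convention matches the callee</i>
        ret i32 %v
}

declare cc 64 void %target_hook()                <i>; numbered convention; target-specific numbers start at 64</i>
</pre>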

</div>

<!-- ======================================================================= -->
<div class="doc_subsection">
  <a name="visibility">Visibility Styles</a>
</div>

<div class="doc_text">

<p>
All Global Variables and Functions have one of the following visibility styles:
</p>

<dl>
  <dt><b>"<tt>default</tt>" - Default style</b>:</dt>

  <dd>On ELF, default visibility means that the declaration is visible to other
    modules and, in shared libraries, means that the declared entity may be
    overridden. On Darwin, default visibility means that the declaration is
    visible to other modules. Default visibility corresponds to "external
    linkage" in the language.
  </dd>

  <dt><b>"<tt>hidden</tt>" - Hidden style</b>:</dt>

  <dd>Two declarations of an object with hidden visibility refer to the same
    object if they are in the same shared object. Usually, hidden visibility
    indicates that the symbol will not be placed into the dynamic symbol table,
    so no other module (executable or shared library) can reference it
    directly.
  </dd>

</dl>
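
<p>For illustration only (the names are hypothetical), the visibility keyword
is assumed here to follow any linkage type and precede the rest of the
declaration, matching the order given for
<a href="#functionstructure">functions</a>:</p>

<pre>
%counter = hidden global i32 0          <i>; usually kept out of the dynamic symbol table</i>

define hidden void %helper() {          <i>; other modules cannot normally reference this directly</i>
        ret void
}
</pre>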

</div>

<!-- ======================================================================= -->
<div class="doc_subsection">
  <a name="globalvars">Global Variables</a>
</div>

<div class="doc_text">

<p>Global variables define regions of memory allocated at compilation time
instead of run-time.  Global variables may optionally be initialized, may have
an explicit section to be placed in, and may
have an optional explicit alignment specified.  A
variable may be defined as a global "constant," which indicates that the
contents of the variable will <b>never</b> be modified (enabling better
optimization, allowing the global data to be placed in the read-only section of
an executable, etc).  Note that variables that need runtime initialization
cannot be marked "constant" as there is a store to the variable.</p>

<p>
LLVM explicitly allows <em>declarations</em> of global variables to be marked
constant, even if the final definition of the global is not.  This capability
can be used to enable slightly better optimization of the program, but requires
the language definition to guarantee that optimizations based on the
'constantness' are valid for the translation units that do not include the
definition.
</p>

<p>As SSA values, global variables define pointer values that are in
scope (i.e. they dominate) all basic blocks in the program.  Global
variables always define a pointer to their "content" type because they
describe a region of memory, and all memory objects in LLVM are
accessed through pointers.</p>
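
<p>A short sketch of what this means in practice (the names are hypothetical):
the name of a global denotes the address of its storage, so the contents are
reached with <a href="#i_load">load</a> and <a href="#i_store">store</a>:</p>

<pre>
%counter = global i32 0                 <i>; %counter itself has type i32*</i>

define i32 %bump() {
        %old = load i32* %counter       <i>; read the current contents</i>
        %new = add i32 %old, 1
        store i32 %new, i32* %counter   <i>; write the updated value back</i>
        ret i32 %new
}
</pre>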

<p>LLVM allows an explicit section to be specified for globals.  If the target
supports it, it will emit globals to the section specified.</p>

<p>An explicit alignment may be specified for a global.  If not present, or if
the alignment is set to zero, the target chooses whatever alignment it finds
convenient.  If an explicit alignment is specified, the 
global is forced to have at least that much alignment.  All alignments must be
a power of 2.</p>

<p>For example, the following defines a global with an initializer, section,
   and alignment:</p>

<pre>
  %G = constant float 1.0, section "foo", align 4
</pre>

</div>


<!-- ======================================================================= -->
<div class="doc_subsection">
  <a name="functionstructure">Functions</a>
</div>

<div class="doc_text">

<p>LLVM function definitions consist of the "<tt>define</tt>" keyword, 
an optional <a href="#linkage">linkage type</a>, an optional 
<a href="#visibility">visibility style</a>, an optional 
<a href="#callingconv">calling convention</a>, a return type, an optional
<a href="#paramattrs">parameter attribute</a> for the return type, a function 
name, a (possibly empty) argument list (each with optional 
<a href="#paramattrs">parameter attributes</a>), an optional section, an
optional alignment, an opening curly brace, a list of basic blocks, and a
closing curly brace.  

LLVM function declarations consist of the "<tt>declare</tt>" keyword, an
optional <a href="#linkage">linkage type</a>, an optional
<a href="#visibility">visibility style</a>, an optional 
<a href="#callingconv">calling convention</a>, a return type, an optional
<a href="#paramattrs">parameter attribute</a> for the return type, a function 
name, a possibly empty list of arguments, and an optional alignment.</p>
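
<p>As a rough illustration of the two forms described above, the sketch below
declares one function and defines another. The names <tt>%puts</tt>,
<tt>%add_one</tt> and <tt>%x</tt> are made up for this example, and the optional
linkage, visibility, calling convention, parameter attributes, section, and
alignment are all omitted:</p>

<pre>
  ; Declaration: "declare", return type, name, argument list.  No body.
  declare i32 %puts(i8*)

  ; Definition: "define", return type, name, argument list, then the
  ; basic blocks between curly braces.
  define i32 %add_one(i32 %x) {
  entry:
    %sum = add i32 %x, 1
    ret i32 %sum
  }
</pre>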

<p>A function definition contains a list of basic blocks, forming the CFG for
the function.  Each basic block may optionally start with a label (giving the
basic block a symbol table entry), contains a list of instructions, and ends
with a <a href="#terminators">terminator</a> instruction (such as a branch or
function return).</p>

<p>The first basic block in a function is special in two ways: it is immediately
executed on entrance to the function, and it is not allowed to have predecessor
basic blocks (i.e. there can not be any branches to the entry block of a
function).  Because the block can have no predecessors, it also cannot have any
<a href="#i_phi">PHI nodes</a>.</p>
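
<p>The restriction on the entry block can be seen in a small loop. In this
sketch (the function and value names are hypothetical), the PHI node has to
live in the <tt>loop</tt> block, which has two predecessors, rather than in
<tt>entry</tt>, which has none:</p>

<pre>
  define i32 %count_to(i32 %n) {
  entry:                               ; entry block: no predecessors, no PHI nodes
    br label %loop
  loop:                                ; reached from %entry and from %loop itself
    %i = phi i32 [ 0, %entry ], [ %next, %loop ]
    %next = add i32 %i, 1
    %done = icmp eq i32 %next, %n
    br i1 %done, label %exit, label %loop
  exit:
    ret i32 %next
  }
</pre>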

<p>LLVM functions are identified by their name and type signature.  Hence, two
functions with the same name but different parameter lists or return values are
considered different functions, and LLVM will resolve references to each
appropriately.</p>

<p>LLVM allows an explicit section to be specified for functions.  If the target
supports it, it will emit functions to the section specified.</p>

<p>An explicit alignment may be specified for a function.  If not present, or if
the alignment is set to zero, the alignment of the function is set by the target
to whatever it feels convenient.  If an explicit alignment is specified, the
function is forced to have at least that much alignment.  All alignments must be
a power of 2.</p>

<!-- ======================================================================= -->
<div class="doc_subsection"><a name="paramattrs">Parameter Attributes</a></div>
<div class="doc_text">
  <p>The return type and each parameter of a function type may have a set of
  <i>parameter attributes</i> associated with them. Parameter attributes are
  used to communicate additional information about the result or parameters of
  a function. Parameter attributes are considered to be part of the function
  type, so two function types that differ only by the parameter attributes 
  are different function types.</p>

  <p>Parameter attributes are simple keywords that follow the type specified. If
  multiple parameter attributes are needed, they are space separated. For 
  example:</p>
  <pre>
    %someFunc = i16 (i8 sext %someParam) zext
    %someFunc = i16 (i8 zext %someParam) zext</pre>
  <p>Note that the two function types above are distinct because the parameter has
  a different attribute (sext in the first one, zext in the second). Also note
  that the attribute for the function result (zext) comes immediately after the
  argument list.</p>

  <p>Currently, only the following parameter attributes are defined:</p>
  <dl>
    <dt><tt>zext</tt></dt>
    <dd>This indicates that the parameter should be zero extended just before
    a call to this function.</dd>
    <dt><tt>sext</tt></dt>
    <dd>This indicates that the parameter should be sign extended just before
    a call to this function.</dd>
    <dt><tt>inreg</tt></dt>
    <dd>This indicates that the parameter should be placed in a register (if
    possible) when the function call is assembled. Support for this attribute is
    target-specific.</dd>
    <dt><tt>sret</tt></dt>
    <dd>This indicates that the parameter specifies the address of a structure
    that is the return value of the function in the source program.</dd>
    <dt><tt>noreturn</tt></dt>
    <dd>This function attribute indicates that the function never returns. This
    indicates to LLVM that every call to this function should be treated as if
    an <tt>unreachable</tt> instruction immediately followed the call.</dd> 
    <dt><tt>nounwind</tt></dt>
    <dd>This function attribute indicates that the function type does not use
    the unwind instruction and does not allow stack unwinding to propagate
    through it.</dd>
  </dl>

</div>

<!-- ======================================================================= -->
<div class="doc_subsection">
  <a name="moduleasm">Module-Level Inline Assembly</a>
</div>

<div class="doc_text">
<p>
Modules may contain "module-level inline asm" blocks, which correspond to the
GCC "file scope inline asm" blocks.  These blocks are internally concatenated by
LLVM and treated as a single unit, but may be separated in the .ll file if
desired.  The syntax is very simple:
</p>

<div class="doc_code"><pre>
  module asm "inline asm code goes here"
  module asm "more can go here"
</pre></div>

<p>The strings can contain any character; non-printable characters are written
   with an escape sequence.  The escape sequence is "\xx", where "xx" is the
   two-digit hexadecimal code of the character.
</p>
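
<p>For instance, a newline (hex code 0A) or a tab (hex code 09) could be
embedded as in the sketch below. The assembler text itself is only a
placeholder; what is valid there depends entirely on the target assembler:</p>

<div class="doc_code"><pre>
  module asm ".globl foo\0Afoo:"
  module asm "\09nop"
</pre></div>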

<p>
  The inline asm code is simply printed to the machine code .s file when
  assembly code is generated.
</p>
</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
  <a name="datalayout">Data Layout</a>
</div>

<div class="doc_text">
<p>A module may specify a target-specific data layout string that specifies how
data is to be laid out in memory. The syntax for the data layout is simply:<br/>
<pre>    target datalayout = "<i>layout specification</i>"
</pre>
The <i>layout specification</i> consists of a list of specifications separated
by the minus sign character ('-').  Each specification starts with a letter 
and may include other information after the letter to define some aspect of the
data layout.  The specifications accepted are as follows: </p>
<dl>
  <dt><tt>E</tt></dt>
  <dd>Specifies that the target lays out data in big-endian form. That is, the
  most significant byte of a value has the lowest address.</dd>
  <dt><tt>e</tt></dt>
  <dd>Specifies that the target lays out data in little-endian form. That is,
  the least significant byte of a value has the lowest address.</dd>
  <dt><tt>p:<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt>
  <dd>This specifies the <i>size</i> of a pointer and its <i>abi</i> and 
  <i>preferred</i> alignments. All sizes are in bits. Specifying the <i>pref</i>
  alignment is optional. If omitted, the preceding <tt>:</tt> should be omitted
  too.</dd>
  <dt><tt>i<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt>
  <dd>This specifies the alignment for an integer type of a given bit
  <i>size</i>. The value of <i>size</i> must be in the range [1,2^23).</dd>
  <dt><tt>v<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt>
  <dd>This specifies the alignment for a vector type of a given bit 
  <i>size</i>.</dd>
  <dt><tt>f<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt>
  <dd>This specifies the alignment for a floating point type of a given bit 
  <i>size</i>. The value of <i>size</i> must be either 32 (float) or 64
  (double).</dd>
  <dt><tt>a<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt>
  <dd>This specifies the alignment for an aggregate type of a given bit
  <i>size</i>.</dd>
</dl>
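<p>Putting these pieces together, a layout string for a hypothetical
little-endian target with 32-bit pointers might look like the following sketch.
The particular sizes and alignments are chosen only for illustration and do not
describe any real target:</p>
<pre>
  target datalayout = "e-p:32:32:32-i64:32:64-f64:32:64-v128:128:128"
</pre>
<p>Reading left to right: <tt>e</tt> selects little-endian layout;
<tt>p:32:32:32</tt> gives 32-bit pointers with 32-bit abi and preferred
alignment; <tt>i64:32:64</tt> gives i64 a 32-bit abi alignment and a 64-bit
preferred alignment; <tt>f64:32:64</tt> does the same for double; and
<tt>v128:128:128</tt> aligns 128-bit vectors to 128 bits.</p>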
<p>When constructing the data layout for a given target, LLVM starts with a
default set of specifications which are then (possibly) overridden by the
specifications in the <tt>datalayout</tt> keyword. The default specifications
are given in this list:</p>
<ul>
  <li><tt>E</tt> - big endian</li>
  <li><tt>p:32:64:64</tt> - 32-bit pointers with 64-bit alignment</li>
  <li><tt>i1:8:8</tt> - i1 is 8-bit (byte) aligned</li>
  <li><tt>i8:8:8</tt> - i8 is 8-bit (byte) aligned</li>
  <li><tt>i16:16:16</tt> - i16 is 16-bit aligned</li>
  <li><tt>i32:32:32</tt> - i32 is 32-bit aligned</li>
  <li><tt>i64:32:64</tt> - i64 has abi alignment of 32-bits but preferred
  alignment of 64-bits</li>
  <li><tt>f32:32:32</tt> - float is 32-bit aligned</li>
  <li><tt>f64:64:64</tt> - double is 64-bit aligned</li>
  <li><tt>v64:64:64</tt> - 64-bit vector is 64-bit aligned</li>
  <li><tt>v128:128:128</tt> - 128-bit vector is 128-bit aligned</li>
  <li><tt>a0:0:1</tt> - aggregates are 8-bit aligned</li>
</ul>
<p>When LLVM is determining the alignment for a given type, it uses the 
following rules:</p>
<ol>
  <li>If the type sought is an exact match for one of the specifications, that
  specification is used.</li>
  <li>If no match is found, and the type sought is an integer type, then the
  smallest integer type that is larger than the bitwidth of the sought type is
  used. If none of the specifications are larger than the bitwidth then the
  largest integer type is used. For example, given the default specifications
  above, the i7 type will use the alignment of i8 (next largest) while both
  i65 and i256 will use the alignment of i64 (largest specified).</li>
  <li>If no match is found, and the type sought is a vector type, then the
  largest vector type that is smaller than the sought vector type will be used
  as a fall back.  This happens because &lt;128 x double&gt; can be implemented in 
  terms of 64 &lt;2 x double&gt;, for example.</li>
</ol>
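<p>As a worked sketch of these rules, consider the globals below under the
default specifications listed above. The global names are hypothetical, and the
example assumes the assembler accepts the integer widths shown; the comments
simply trace which specification each lookup would select:</p>
<pre>
  %a = global i32 0    ; exact match: i32:32:32
  %b = global i7 0     ; no i7 entry; next larger integer is i8, so i8:8:8
  %c = global i65 0    ; nothing larger than 65 bits; largest is i64, so i64:32:64
  %d = global &lt;4 x double&gt; zeroinitializer
                       ; 256-bit vector, no v256 entry; falls back to v128:128:128
</pre>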
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="typesystem">Type System</a> </div>
<!-- *********************************************************************** -->
<p>The LLVM type system is one of the most important features of the
intermediate representation.  Being typed enables a number of
optimizations to be performed on the IR directly, without having to do
extra analyses on the side before the transformation.  A strong type
system makes it easier to read the generated code and enables novel
analyses and transformations that are not feasible to perform on normal
three address code representations.</p>
<!-- ======================================================================= -->
<div class="doc_subsection"> <a name="t_primitive">Primitive Types</a> </div>
<p>The primitive types are the fundamental building blocks of the LLVM
system. The current set of primitive types is as follows:</p>
<table class="layout">
  <tr class="layout">
    <td class="left">
      <table>
        <tbody>
        <tr><th>Type</th><th>Description</th></tr>
        <tr><td><tt><a name="t_void">void</a></tt></td><td>No value</td></tr>
        <tr><td><tt>i8</tt></td><td>8-bit value</td></tr>
        <tr><td><tt>i32</tt></td><td>32-bit value</td></tr>
        <tr><td><tt>float</tt></td><td>32-bit floating point value</td></tr>
        <tr><td><tt>label</tt></td><td>Branch destination</td></tr>
        </tbody>
      </table>
    </td>
    <td class="right">
      <table>
        <tbody>
          <tr><th>Type</th><th>Description</th></tr>
          <tr><td><tt>i1</tt></td><td>True or False value</td></tr>
          <tr><td><tt>i16</tt></td><td>16-bit value</td></tr>
          <tr><td><tt>i64</tt></td><td>64-bit value</td></tr>
         <tr><td><tt>double</tt></td><td>64-bit floating point value</td></tr>
        </tbody>
      </table>
    </td>
  </tr>
</table>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_classifications">Type
Classifications</a> </div>
<p>These different primitive types fall into a few useful
classifications:</p>

<table border="1" cellspacing="0" cellpadding="4">
  <tbody>
    <tr><th>Classification</th><th>Types</th></tr>
    <tr>
      <td><a name="t_integer">integer</a></td>
      <td><tt>i1, i8, i16, i32, i64</tt></td>
    </tr>
    <tr>
      <td><a name="t_floating">floating point</a></td>
      <td><tt>float, double</tt></td>
    </tr>
    <tr>
      <td><a name="t_firstclass">first class</a></td>
      <td><tt>i1, i8, i16, i32, i64, float, double, <br/>
          <a href="#t_pointer">pointer</a>, <a href="#t_vector">vector</a></tt>
      </td>
    </tr>
  </tbody>
</table>
<p>The <a href="#t_firstclass">first class</a> types are perhaps the
most important.  Values of these types are the only ones which can be
produced by instructions, passed as arguments, or used as operands to
instructions.  This means that all structures and arrays must be
manipulated either by pointer or by component.</p>
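
<p>As a sketch of what this means in practice, the function below never holds
the array itself as a value; it computes a pointer to one element and loads
that element, which is a first class <tt>i32</tt>. The names <tt>%table</tt> and
<tt>%second_entry</tt> are hypothetical:</p>

<pre>
  %table = global [4 x i32] zeroinitializer

  define i32 %second_entry() {
  entry:
    ; Address of element 1 of the array that %table points to.
    %p = getelementptr [4 x i32]* %table, i32 0, i32 1
    ; Only the i32 component is loaded, not the whole array.
    %v = load i32* %p
    ret i32 %v
  }
</pre>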
<!-- ======================================================================= -->
<div class="doc_subsection"> <a name="t_derived">Derived Types</a> </div>
<p>The real power in LLVM comes from the derived types in the system. 
This is what allows a programmer to represent arrays, functions,
pointers, and other useful types.  Note that these derived types may be
recursive: For example, it is possible to have a two dimensional array.</p>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_array">Array Type</a> </div>
<h5>Overview:</h5>
<p>The array type is a very simple derived type that arranges elements
sequentially in memory.  The array type requires a size (number of
elements) and an underlying data type.</p>

<pre>
  [&lt;# elements&gt; x &lt;elementtype&gt;]
</pre>

<p>The number of elements is a constant integer value; elementtype may
be any type with a size.</p>
<table class="layout">
  <tr class="layout">
    <td class="left">
      <tt>[40 x i32 ]</tt><br/>
      <tt>[41 x i32 ]</tt><br/>
      <tt>[40 x i8]</tt><br/>
    </td>
    <td class="left">
      Array of 40 32-bit integer values.<br/>
      Array of 41 32-bit integer values.<br/>
      Array of 40 8-bit integer values.<br/>
    </td>
  </tr>
</table>
<p>Here are some examples of multidimensional arrays:</p>
<table class="layout">
  <tr class="layout">
    <td class="left">
      <tt>[3 x [4 x i32]]</tt><br/>
      <tt>[12 x [10 x float]]</tt><br/>
      <tt>[2 x [3 x [4 x i16]]]</tt><br/>
    </td>
    <td class="left">
      3x4 array of 32-bit integer values.<br/>
      12x10 array of single precision floating point values.<br/>
      2x3x4 array of 16-bit integer values.<br/>
    </td>
  </tr>
</table>
<p>Note that 'variable sized arrays' can be implemented in LLVM with a zero 
length array.  Normally, accesses past the end of an array are undefined in