<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>LLVM Assembly Language Reference Manual</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="author" content="Chris Lattner">
<meta name="description"
content="LLVM Assembly Language Reference Manual.">
<link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<body>
<div class="doc_title"> LLVM Language Reference Manual </div>
<li><a href="#abstract">Abstract</a></li>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#identifiers">Identifiers</a></li>
<li><a href="#highlevel">High Level Structure</a>
<ol>
<li><a href="#modulestructure">Module Structure</a></li>
<li><a href="#linkage">Linkage Types</a></li>
<li><a href="#globalvars">Global Variables</a></li>
<li><a href="#functionstructure">Function Structure</a></li>
</ol>
</li>
<li><a href="#t_primitive">Primitive Types</a>
<ol>
<li><a href="#t_classifications">Type Classifications</a></li>
<li><a href="#t_function">Function Type</a></li>
<li><a href="#t_pointer">Pointer Type</a></li>
<li><a href="#t_packed">Packed Type</a></li>
<li><a href="#constants">Constants</a>
<ol>
<li><a href="#simpleconstants">Simple Constants</a>
<li><a href="#aggregateconstants">Aggregate Constants</a>
<li><a href="#globalconstants">Global Variable and Function Addresses</a>
<li><a href="#undefvalues">Undefined Values</a>
<li><a href="#constantexprs">Constant Expressions</a>
</ol>
<li><a href="#instref">Instruction Reference</a>
<ol>
<li><a href="#terminators">Terminator Instructions</a>
<ol>
<li><a href="#i_ret">'<tt>ret</tt>' Instruction</a></li>
<li><a href="#i_br">'<tt>br</tt>' Instruction</a></li>
<li><a href="#i_switch">'<tt>switch</tt>' Instruction</a></li>
<li><a href="#i_invoke">'<tt>invoke</tt>' Instruction</a></li>
<li><a href="#i_unwind">'<tt>unwind</tt>' Instruction</a></li>
<li><a href="#i_unreachable">'<tt>unreachable</tt>' Instruction</a></li>
<li><a href="#i_add">'<tt>add</tt>' Instruction</a></li>
<li><a href="#i_sub">'<tt>sub</tt>' Instruction</a></li>
<li><a href="#i_mul">'<tt>mul</tt>' Instruction</a></li>
<li><a href="#i_div">'<tt>div</tt>' Instruction</a></li>
<li><a href="#i_rem">'<tt>rem</tt>' Instruction</a></li>
<li><a href="#i_setcc">'<tt>set<i>cc</i></tt>' Instructions</a></li>
<li><a href="#bitwiseops">Bitwise Binary Operations</a>
<ol>
<li><a href="#i_and">'<tt>and</tt>' Instruction</a></li>
<li><a href="#i_or">'<tt>or</tt>' Instruction</a></li>
<li><a href="#i_xor">'<tt>xor</tt>' Instruction</a></li>
<li><a href="#i_shl">'<tt>shl</tt>' Instruction</a></li>
<li><a href="#i_shr">'<tt>shr</tt>' Instruction</a></li>
<li><a href="#memoryops">Memory Access Operations</a>
<ol>
<li><a href="#i_malloc">'<tt>malloc</tt>' Instruction</a></li>
<li><a href="#i_free">'<tt>free</tt>' Instruction</a></li>
<li><a href="#i_alloca">'<tt>alloca</tt>' Instruction</a></li>
<li><a href="#i_load">'<tt>load</tt>' Instruction</a></li>
<li><a href="#i_store">'<tt>store</tt>' Instruction</a></li>
<li><a href="#i_getelementptr">'<tt>getelementptr</tt>' Instruction</a></li>
</ol>
</li>
<li><a href="#i_phi">'<tt>phi</tt>' Instruction</a></li>
<li><a href="#i_cast">'<tt>cast .. to</tt>' Instruction</a></li>
<li><a href="#i_select">'<tt>select</tt>' Instruction</a></li>
<li><a href="#i_call">'<tt>call</tt>' Instruction</a></li>
<li><a href="#i_vanext">'<tt>vanext</tt>' Instruction</a></li>
<li><a href="#i_vaarg">'<tt>vaarg</tt>' Instruction</a></li>
<li><a href="#intrinsics">Intrinsic Functions</a>
<ol>
<li><a href="#int_varargs">Variable Argument Handling Intrinsics</a>
<ol>
<li><a href="#i_va_start">'<tt>llvm.va_start</tt>' Intrinsic</a></li>
<li><a href="#i_va_end">'<tt>llvm.va_end</tt>' Intrinsic</a></li>
<li><a href="#i_va_copy">'<tt>llvm.va_copy</tt>' Intrinsic</a></li>
</ol>
</li>
<li><a href="#int_gc">Accurate Garbage Collection Intrinsics</a>
<ol>
<li><a href="#i_gcroot">'<tt>llvm.gcroot</tt>' Intrinsic</a></li>
<li><a href="#i_gcread">'<tt>llvm.gcread</tt>' Intrinsic</a></li>
<li><a href="#i_gcwrite">'<tt>llvm.gcwrite</tt>' Intrinsic</a></li>
</ol>
</li>
<li><a href="#int_codegen">Code Generator Intrinsics</a>
<ol>
<li><a href="#i_returnaddress">'<tt>llvm.returnaddress</tt>' Intrinsic</a></li>
<li><a href="#i_frameaddress">'<tt>llvm.frameaddress</tt>' Intrinsic</a></li>
</ol>
</li>
<li><a href="#int_os">Operating System Intrinsics</a>
<ol>
<li><a href="#i_readport">'<tt>llvm.readport</tt>' Intrinsic</a></li>
<li><a href="#i_writeport">'<tt>llvm.writeport</tt>' Intrinsic</a></li>
<li><a href="#i_readio">'<tt>llvm.readio</tt>' Intrinsic</a></li>
<li><a href="#i_writeio">'<tt>llvm.writeio</tt>' Intrinsic</a></li>
<li><a href="#int_libc">Standard C Library Intrinsics</a>
<ol>
<li><a href="#i_memcpy">'<tt>llvm.memcpy</tt>' Intrinsic</a></li>
<li><a href="#i_memmove">'<tt>llvm.memmove</tt>' Intrinsic</a></li>
<li><a href="#i_memset">'<tt>llvm.memset</tt>' Intrinsic</a></li>
<li><a href="#i_isunordered">'<tt>llvm.isunordered</tt>' Intrinsic</a></li>
</ol>
</li>
<li><a href="#int_debugger">Debugger intrinsics</a></li>
<div class="doc_author">
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
and <a href="mailto:vadve@cs.uiuc.edu">Vikram Adve</a></p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="abstract">Abstract </a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>This document is a reference manual for the LLVM assembly language.
LLVM is an SSA based representation that provides type safety,
low-level operations, flexibility, and the capability of representing
'all' high-level languages cleanly. It is the common code
representation used throughout all phases of the LLVM compilation
strategy.</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="introduction">Introduction</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>The LLVM code representation is designed to be used in three
different forms: as an in-memory compiler IR, as an on-disk bytecode
representation (suitable for fast loading by a Just-In-Time compiler),
and as a human readable assembly language representation. This allows
LLVM to provide a powerful intermediate representation for efficient
compiler transformations and analysis, while providing a natural means
to debug and visualize the transformations. The three different forms
of LLVM are all equivalent. This document describes the human readable
representation and notation.</p>
<p>The LLVM representation aims to be light-weight and low-level
while being expressive, typed, and extensible at the same time. It
aims to be a "universal IR" of sorts, by being at a low enough level
that high-level ideas may be cleanly mapped to it (similar to how
microprocessors are "universal IR's", allowing many source languages to
be mapped to them). By providing type information, LLVM can be used as
the target of optimizations: for example, through pointer analysis, it
can be proven that a C automatic variable is never accessed outside of
the current function... allowing it to be promoted to a simple SSA
value instead of a memory location.</p>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="wellformed">Well-Formedness</a> </div>
<div class="doc_text">
<p>It is important to note that this document describes 'well formed'
LLVM assembly language. There is a difference between what the parser
accepts and what is considered 'well formed'. For example, the
following instruction is syntactically okay, but not well formed:</p>
<pre>
%x = <a href="#i_add">add</a> int 1, %x
</pre>
<p>...because the definition of <tt>%x</tt> does not dominate all of
its uses. The LLVM infrastructure provides a verification pass that may
be used to verify that an LLVM module is well formed. This pass is
automatically run by the parser after parsing input assembly, and by
the optimizer before it outputs bytecode. The violations pointed out
by the verifier pass indicate bugs in transformation passes or input to
the parser.</p>
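<p>By contrast, a value may depend on its own earlier value when the
dependence is routed through a <a href="#i_phi">PHI node</a>. The following is
a minimal sketch in the notation of this document; the function name
<tt>%count</tt> and all value names are hypothetical, chosen only for
illustration. Every definition here dominates its uses, because a PHI operand
is treated as being used at the end of the corresponding predecessor block:</p>
<pre>
int %count() {
entry:
        br label %loop
loop:
        %x = phi int [ 0, %entry ], [ %x.next, %loop ]   <i>; int</i>
        %x.next = add int 1, %x                          <i>; int</i>
        %done = seteq int %x.next, 10                    <i>; bool</i>
        br bool %done, label %exit, label %loop
exit:
        ret int %x.next
}
</pre>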
<!-- Describe the typesetting conventions here. --> </div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="identifiers">Identifiers</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>LLVM uses three different forms of identifiers, for different
purposes:</p>
<ol>
<li>Named values are represented as a string of characters with a '%' prefix.
For example, %foo, %DivisionByZero, %a.really.long.identifier. The actual
regular expression used is '<tt>%[a-zA-Z$._][a-zA-Z$._0-9]*</tt>'.
Identifiers which require other characters in their names can be surrounded
with quotes. In this way, anything except a <tt>"</tt> character can be used
in a name.</li>
<li>Unnamed values are represented as an unsigned numeric value with a '%'
prefix. For example, %12, %2, %44.</li>
<li>Constants, which are described in a <a href="#constants">section about
constants</a>, below.</li>
</ol>
<p>LLVM requires that values start with a '%' sign for two reasons: Compilers
don't need to worry about name clashes with reserved words, and the set of
reserved words may be expanded in the future without penalty. Additionally,
unnamed identifiers allow a compiler to quickly come up with a temporary
variable without having to avoid symbol table conflicts.</p>
<p>Reserved words in LLVM are very similar to reserved words in other
languages. There are keywords for different opcodes ('<tt><a
href="#i_add">add</a></tt>', '<tt><a href="#i_cast">cast</a></tt>', '<tt><a
href="#i_ret">ret</a></tt>', etc...), for primitive type names ('<tt><a
href="#t_void">void</a></tt>', '<tt><a href="#t_uint">uint</a></tt>', etc...),
and others. These reserved words cannot conflict with variable names, because
none of them start with a '%' character.</p>
<p>Here is an example of LLVM code to multiply the integer variable
'<tt>%X</tt>' by 8:</p>
<p>The easy way:</p>
<pre>
%result = <a href="#i_mul">mul</a> uint %X, 8
</pre>
<p>After strength reduction:</p>
<pre>
%result = <a href="#i_shl">shl</a> uint %X, ubyte 3
</pre>
<p>And the hard way:</p>
<pre>
<a href="#i_add">add</a> uint %X, %X <i>; yields {uint}:%0</i>
<a href="#i_add">add</a> uint %0, %0 <i>; yields {uint}:%1</i>
%result = <a href="#i_add">add</a> uint %1, %1
</pre>
<p>This last way of multiplying <tt>%X</tt> by 8 illustrates several
important lexical features of LLVM:</p>
<ol>
<li>Comments are delimited with a '<tt>;</tt>' and go until the end of
line.</li>
<li>Unnamed temporaries are created when the result of a computation is not
assigned to a named value.</li>
<li>Unnamed temporaries are numbered sequentially.</li>
</ol>
<p>...and it also shows a convention that we follow in this document. When
demonstrating instructions, we will follow an instruction with a comment that
defines the type and name of value produced. Comments are shown in italic
text.</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="highlevel">High Level Structure</a> </div>
<!-- *********************************************************************** -->
<!-- ======================================================================= -->
<div class="doc_subsection"> <a name="modulestructure">Module Structure</a>
</div>
<div class="doc_text">
<p>LLVM programs are composed of "Module"s, each of which is a
translation unit of the input programs. Each module consists of
functions, global variables, and symbol table entries. Modules may be
combined together with the LLVM linker, which merges function (and
global variable) definitions, resolves forward declarations, and merges
symbol table entries. Here is an example of the "hello world" module:</p>
<pre><i>; Declare the string constant as a global constant...</i>
<a href="#identifiers">%.LC0</a> = <a href="#linkage_internal">internal</a> <a
href="#globalvars">constant</a> <a href="#t_array">[13 x sbyte]</a> c"hello world\0A\00" <i>; [13 x sbyte]*</i>
<i>; External declaration of the puts function</i>
<a href="#functionstructure">declare</a> int %puts(sbyte*) <i>; int(sbyte*)* </i>
<i>; Definition of main function</i>
int %main() { <i>; int()* </i>
<i>; Convert [13x sbyte]* to sbyte *...</i>
%cast210 = <a
href="#i_getelementptr">getelementptr</a> [13 x sbyte]* %.LC0, long 0, long 0 <i>; sbyte*</i>
<i>; Call puts function to write out the string to stdout...</i>
<a
href="#i_call">call</a> int %puts(sbyte* %cast210) <i>; int</i>
<a
href="#i_ret">ret</a> int 0<br>}<br></pre>
<p>This example is made up of a <a href="#globalvars">global variable</a>
named "<tt>.LC0</tt>", an external declaration of the "<tt>puts</tt>"
function, and a <a href="#functionstructure">function definition</a>
for "<tt>main</tt>".</p>
<p>In general, a module is made up of a list of global values,
where both functions and global variables are global values. Global values are
represented by a pointer to a memory location (in this case, a pointer to an
array of char, and a pointer to a function), and have one of the following <a
href="#linkage">linkage types</a>.</p>
</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="linkage">Linkage Types</a>
</div>
<div class="doc_text">
<p>
All Global Variables and Functions have one of the following types of linkage:
</p>
<dl>
<dt><tt><b><a name="linkage_internal">internal</a></b></tt> </dt>
<dd>Global values with internal linkage are only directly accessible by
objects in the current module. In particular, linking code into a module with
an internal global value may cause the internal to be renamed as necessary to
avoid collisions. Because the symbol is internal to the module, all
references can be updated. This corresponds to the notion of the
'<tt>static</tt>' keyword in C, or the idea of "anonymous namespaces" in C++.
<dt><tt><b><a name="linkage_linkonce">linkonce</a></b></tt>: </dt>
<dd>"<tt>linkonce</tt>" linkage is similar to <tt>internal</tt> linkage, with
the twist that linking together two modules defining the same
<tt>linkonce</tt> globals will cause one of the globals to be discarded. This
is typically used to implement inline functions. Unreferenced
<tt>linkonce</tt> globals are allowed to be discarded.
<dt><tt><b><a name="linkage_weak">weak</a></b></tt>: </dt>
<dd>"<tt>weak</tt>" linkage is exactly the same as <tt>linkonce</tt> linkage,
except that unreferenced <tt>weak</tt> globals may not be discarded. This is
used to implement constructs in C such as "<tt>int X;</tt>" at global scope.
<dt><tt><b><a name="linkage_appending">appending</a></b></tt>: </dt>
<dd>"<tt>appending</tt>" linkage may only be applied to global variables of
pointer to array type. When two global variables with appending linkage are
linked together, the two global arrays are appended together. This is the
LLVM, typesafe, equivalent of having the system linker append together
"sections" with identical names when .o files are linked.
<dt><tt><b><a name="linkage_external">externally visible</a></b></tt>:</dt>
<dd>If none of the above identifiers are used, the global is externally
visible, meaning that it participates in linkage and can be used to resolve
external symbol references.
</dd>
</dl>
<p><a name="linkage_external">For example, since the "<tt>.LC0</tt>"
variable is defined to be internal, if another module defined a "<tt>.LC0</tt>"
variable and was linked with this one, one of the two would be renamed,
preventing a collision. Since "<tt>main</tt>" and "<tt>puts</tt>" are
external (i.e., lacking any linkage declarations), they are accessible
outside of the current module. It is illegal for a function <i>declaration</i>
to have any linkage type other than "externally visible".</a></p>
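<p>As an illustrative sketch (all names below are hypothetical), the linkage
keyword is written before the <tt>global</tt> or <tt>constant</tt> keyword of a
global variable, as with <tt>%.LC0</tt> above. It is assumed here that it is
likewise written before the return type of a function definition:</p>
<pre>
%counter = internal global int 0                 <i>; like 'static int counter;' in C</i>
%X       = weak global int 0                     <i>; like 'int X;' at global scope in C</i>
%table   = appending global [1 x int] [ int 7 ]  <i>; arrays are concatenated when linked</i>

linkonce int %get() {                            <i>; duplicates are discarded when linked</i>
        %v = load int* %counter                  <i>; int</i>
        ret int %v
}
</pre>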
</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="globalvars">Global Variables</a>
</div>
<div class="doc_text">
<p>Global variables define regions of memory allocated at compilation
time instead of run-time. Global variables may optionally be
initialized. A variable may be defined as a global "constant", which
indicates that the contents of the variable will never be modified
(enabling better optimization, allowing the global data to be placed in the
read-only section of an executable, etc).</p>
<p>As SSA values, global variables define pointer values that are in
scope (i.e. they dominate) all basic blocks in the program. Global
variables always define a pointer to their "content" type because they
describe a region of memory, and all memory objects in LLVM are
accessed through pointers.</p>
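<p>The sketch below illustrates these points with hypothetical names: the
global <tt>%G</tt> is declared with content type <tt>int</tt>, so the value
<tt>%G</tt> itself has type <tt>int*</tt> and must be read and written with
<tt><a href="#i_load">load</a></tt> and <tt><a href="#i_store">store</a></tt>.
The global <tt>%Limit</tt> is marked <tt>constant</tt>, so it may be read but
is never stored to.</p>
<pre>
%G     = global int 17              <i>; %G has type int*</i>
%Limit = constant int 100           <i>; contents are never modified</i>

int %bump() {
        %old = load int* %G         <i>; int</i>
        %new = add int %old, 1      <i>; int</i>
        store int %new, int* %G
        ret int %new
}
</pre>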
</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="functionstructure">Functions</a>
</div>
<div class="doc_text">
<p>LLVM function definitions are composed of a (possibly empty) argument list,
an opening curly brace, a list of basic blocks, and a closing curly brace. LLVM
function declarations are defined with the "<tt>declare</tt>" keyword, a
function name, and a function signature.</p>
<p>A function definition contains a list of basic blocks, forming the CFG for
the function. Each basic block may optionally start with a label (giving the
basic block a symbol table entry), contains a list of instructions, and ends
with a <a href="#terminators">terminator</a> instruction (such as a branch or
function return).</p>
<p>The first basic block in a function is special in two ways: it is immediately
executed on entrance to the function, and it is not allowed to have predecessor
basic blocks (i.e. there can not be any branches to the entry block of a
function). Because the block can have no predecessors, it also cannot have any
<a href="#i_phi">PHI nodes</a>.</p>
<p>LLVM functions are identified by their name and type signature. Hence, two
functions with the same name but different parameter lists or return values are
considered different functions, and LLVM will resolve references to each
appropriately.</p>
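<p>As a small sketch of these rules, the hypothetical function <tt>%max</tt>
below (the name and labels are illustrative only) has three basic blocks. The
entry block has no predecessors and no PHI nodes, and every block ends in a
terminator. The <tt>declare</tt> line repeats the declaration form used in the
"hello world" module.</p>
<pre>
declare int %puts(sbyte*)                 <i>; declaration: name and signature only</i>

int %max(int %a, int %b) {                <i>; definition: argument list and body</i>
entry:                                    <i>; executed on entrance to the function</i>
        %a.gt.b = setgt int %a, %b        <i>; bool</i>
        br bool %a.gt.b, label %ret.a, label %ret.b
ret.a:
        ret int %a
ret.b:
        ret int %b
}
</pre>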
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="typesystem">Type System</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>The LLVM type system is one of the most important features of the
intermediate representation. Being typed enables a number of
optimizations to be performed on the IR directly, without having to do
extra analyses on the side before the transformation. A strong type
system makes it easier to read the generated code and enables novel
analyses and transformations that are not feasible to perform on normal
three address code representations.</p>
</div>
<!-- ======================================================================= -->
<div class="doc_subsection"> <a name="t_primitive">Primitive Types</a> </div>
<div class="doc_text">
<p>The primitive types are the fundamental building blocks of the LLVM
system. The current set of primitive types are as follows:</p>
<table class="layout">
  <tr class="layout">
    <td class="left">
      <table>
        <tr><th>Type</th><th>Description</th></tr>
        <tr><td><tt>void</tt></td><td>No value</td></tr>
        <tr><td><tt>ubyte</tt></td><td>Unsigned 8 bit value</td></tr>
        <tr><td><tt>ushort</tt></td><td>Unsigned 16 bit value</td></tr>
        <tr><td><tt>uint</tt></td><td>Unsigned 32 bit value</td></tr>
        <tr><td><tt>ulong</tt></td><td>Unsigned 64 bit value</td></tr>
        <tr><td><tt>float</tt></td><td>32 bit floating point value</td></tr>
        <tr><td><tt>label</tt></td><td>Branch destination</td></tr>
      </table>
    </td>
    <td class="right">
      <table>
        <tr><th>Type</th><th>Description</th></tr>
        <tr><td><tt>bool</tt></td><td>True or False value</td></tr>
        <tr><td><tt>sbyte</tt></td><td>Signed 8 bit value</td></tr>
        <tr><td><tt>short</tt></td><td>Signed 16 bit value</td></tr>
        <tr><td><tt>int</tt></td><td>Signed 32 bit value</td></tr>
        <tr><td><tt>long</tt></td><td>Signed 64 bit value</td></tr>
        <tr><td><tt>double</tt></td><td>64 bit floating point value</td></tr>
      </table>
    </td>
  </tr>
</table>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_classifications">Type
Classifications</a> </div>
<div class="doc_text">
<p>These different primitive types fall into a few useful
classifications:</p>
<table border="1" cellspacing="0" cellpadding="4">
<tr><th>Classification</th><th>Types</th></tr>
<tr>
<td><a name="t_signed">signed</a></td>
<td><tt>sbyte, short, int, long, float, double</tt></td>
</tr>
<tr>
<td><a name="t_unsigned">unsigned</a></td>
<td><tt>ubyte, ushort, uint, ulong</tt></td>
</tr>
<tr>
<td><a name="t_integer">integer</a></td>
<td><tt>ubyte, sbyte, ushort, short, uint, int, ulong, long</tt></td>
</tr>
<tr>
<td><a name="t_integral">integral</a></td>
<td><tt>bool, ubyte, sbyte, ushort, short, uint, int, ulong, long</tt>
</td>
</tr>
<tr>
<td><a name="t_floating">floating point</a></td>
<td><tt>float, double</tt></td>
</tr>
<tr>
<td><a name="t_firstclass">first class</a></td>
<td><tt>bool, ubyte, sbyte, ushort, short, uint, int, ulong, long,<br>
float, double, <a href="#t_pointer">pointer</a>,
<a href="#t_packed">packed</a></tt></td>
</table>
<p>The <a href="#t_firstclass">first class</a> types are perhaps the
most important. Values of these types are the only ones which can be
produced by instructions, passed as arguments, or used as operands to
instructions. This means that all structures and arrays must be
manipulated either by pointer or by component.</p>
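<p>For instance, an array is not a first class value, so it cannot be loaded
as a whole. A sketch of access by component, using the hypothetical global
<tt>%A</tt> and function <tt>%third</tt>, first computes a pointer to one
element and then loads the first class <tt>int</tt> it refers to:</p>
<pre>
%A = global [4 x int] zeroinitializer

int %third() {
        %p = getelementptr [4 x int]* %A, long 0, long 2   <i>; int*</i>
        %v = load int* %p                                  <i>; int</i>
        ret int %v
}
</pre>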
</div>
<!-- ======================================================================= -->
<div class="doc_subsection"> <a name="t_derived">Derived Types</a> </div>
<div class="doc_text">
<p>The real power in LLVM comes from the derived types in the system.
This is what allows a programmer to represent arrays, functions,
pointers, and other useful types. Note that these derived types may be
recursive: For example, it is possible to have a two dimensional array.</p>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_array">Array Type</a> </div>
<div class="doc_text">
<p>The array type is a very simple derived type that arranges elements
sequentially in memory. The array type requires a size (number of
elements) and an underlying data type.</p>
<h5>Syntax:</h5>
<pre>
[&lt;# elements&gt; x &lt;elementtype&gt;]
</pre>
<p>The number of elements is a constant integer value; <tt>elementtype</tt> may
be any type with a size.</p>
<h5>Examples:</h5>
<table class="layout">
<tr class="layout">
<td class="left">
<tt>[40 x int ]</tt><br/>
<tt>[41 x int ]</tt><br/>
<tt>[40 x uint]</tt><br/>
</td>
<td class="left">
Array of 40 integer values.<br/>
Array of 41 integer values.<br/>
Array of 40 unsigned integer values.<br/>
</td>
</tr>
</table>
<p>Here are some examples of multidimensional arrays:</p>
<table class="layout">
<tr class="layout">
<td class="left">
<tt>[3 x [4 x int]]</tt><br/>
<tt>[12 x [10 x float]]</tt><br/>
<tt>[2 x [3 x [4 x uint]]]</tt><br/>
</td>
<td class="left">
3x4 array of integer values.<br/>
12x10 array of single precision floating point values.<br/>
2x3x4 array of unsigned integer values.<br/>
</td>
</tr>
</table>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_function">Function Type</a> </div>
<div class="doc_text">
<p>The function type can be thought of as a function signature. It
consists of a return type and a list of formal parameter types.
Function types are usually used to build virtual function tables
(which are structures of pointers to functions), for indirect function
calls, and when defining a function.</p>
<p>
The return type of a function type cannot be an aggregate type.
</p>
<pre> &lt;returntype&gt; (&lt;parameter list&gt;)<br></pre>
<p>Where '<tt>&lt;parameter list&gt;</tt>' is a comma-separated list of type
specifiers. Optionally, the parameter list may include a type <tt>...</tt>,
which indicates that the function takes a variable number of arguments.
Variable argument functions can access their arguments with the <a
href="#int_varargs">variable argument handling intrinsic</a> functions.</p>
<table class="layout">
<tr class="layout">
<td class="left">
<tt>int (int)</tt> <br/>
<tt>float (int, int *) *</tt><br/>
<tt>int (sbyte *, ...)</tt><br/>
</td>
<td class="left">
function taking an <tt>int</tt>, returning an <tt>int</tt><br/>
<a href="#t_pointer">Pointer</a> to a function that takes an
<tt>int</tt> and a <a href="#t_pointer">pointer</a> to <tt>int</tt>,
returning <tt>float</tt>.<br/>
A vararg function that takes at least one <a href="#t_pointer">pointer</a>
to <tt>sbyte</tt> (signed char in C), which returns an integer. This is
the signature for <tt>printf</tt> in LLVM.<br/>
</td>
</tr>
</table>
</div>
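<div class="doc_text">
<p>The following sketch shows function types in use. The function
<tt>%apply</tt> and its argument names are hypothetical. Its first argument has
type <tt>int (int)*</tt>, a pointer to a function, and the body makes an
indirect call through it. The <tt>declare</tt> line uses the vararg signature
given for <tt>printf</tt> above.</p>
<pre>
declare int %printf(sbyte*, ...)          <i>; int (sbyte*, ...)*</i>

int %apply(int (int)* %fn, int %x) {
        %r = call int %fn(int %x)         <i>; indirect call, yields int</i>
        ret int %r
}
</pre>
</div>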
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_struct">Structure Type</a> </div>
<div class="doc_text">
<p>The structure type is used to represent a collection of data members
together in memory. The packing of the field types is defined to match
the ABI of the underlying processor. The elements of a structure may
be any type that has a size.</p>
<p>Structures are accessed using '<tt><a href="#i_load">load</a></tt>'
and '<tt><a href="#i_store">store</a></tt>' by getting a pointer to a
field with the '<tt><a href="#i_getelementptr">getelementptr</a></tt>'
instruction.</p>
<table class="layout">
<tr class="layout">
<td class="left">
<tt>{ int, int, int }</tt><br/>
<tt>{ float, int (int) * }</tt><br/>
</td>
<td class="left">
a triple of three <tt>int</tt> values<br/>
A pair, where the first element is a <tt>float</tt> and the second element
is a <a href="#t_pointer">pointer</a> to a <a href="#t_function">function</a>
that takes an <tt>int</tt>, returning an <tt>int</tt>.<br/>
</td>
</tr>
</table>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_pointer">Pointer Type</a> </div>
<div class="doc_text">
<h5>Overview:</h5>
<p>As in many languages, the pointer type represents a pointer or
reference to another object, which must live in memory.</p>
<h5>Syntax:</h5>
<pre> &lt;type&gt; *<br></pre>
<h5>Examples:</h5>
<table class="layout">
<tr class="layout">
<td class="left">
<tt>[4x int]*</tt><br/>
<tt>int (int *) *</tt><br/>
</td>
<td class="left">
A <a href="#t_pointer">pointer</a> to <a href="#t_array">array</a> of
four <tt>int</tt> values<br/>
A <a href="#t_pointer">pointer</a> to a <a
href="#t_function">function</a> that takes a pointer to <tt>int</tt>,
returning an <tt>int</tt>.<br/>
</td>
</tr>
</table>
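<p>A minimal sketch of pointer use, with the hypothetical function name
<tt>%roundtrip</tt>: <tt><a href="#i_alloca">alloca</a></tt> yields a pointer
to a stack slot, and the value is written and read back through that
pointer.</p>
<pre>
int %roundtrip() {
        %slot = alloca int                <i>; int*</i>
        store int 42, int* %slot
        %v = load int* %slot              <i>; int</i>
        ret int %v
}
</pre>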
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="t_packed">Packed Type</a> </div>
<div class="doc_text">
<h5>Overview:</h5>
<p>A packed type is a simple derived type that represents a vector
of elements. Packed types are used when multiple primitive data values
are operated on in parallel using a single instruction (SIMD).
A packed type requires a size (number of
elements) and an underlying primitive data type. Packed types are
considered <a href="#t_firstclass">first class</a>.</p>
<h5>Syntax:</h5>
<pre> &lt; &lt;# elements&gt; x &lt;elementtype&gt; &gt;<br></pre>
<p>The number of elements is a constant integer value; <tt>elementtype</tt> may
be any integral or floating point type.</p>
<h5>Examples:</h5>
<table class="layout">
<tr class="layout">
<td class="left">
<tt>&lt;4 x int&gt;</tt><br/>
<tt>&lt;8 x float&gt;</tt><br/>
<tt>&lt;2 x uint&gt;</tt><br/>
</td>
<td class="left">
Packed vector of 4 integer values.<br/>
Packed vector of 8 floating-point values.<br/>
Packed vector of 2 unsigned integer values.<br/>
</td>
</tr>
</table>
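<p>As a sketch of how packed values might be used, the hypothetical function
<tt>%vadd</tt> below loads two packed vectors through pointers, adds them
elementwise, and stores the result. It assumes that
<tt><a href="#i_add">add</a></tt> accepts packed operands; see the description
of each instruction for the operand types it actually permits.</p>
<pre>
void %vadd(&lt;4 x float&gt;* %A, &lt;4 x float&gt;* %B, &lt;4 x float&gt;* %Out) {
        %a = load &lt;4 x float&gt;* %A                 <i>; &lt;4 x float&gt;</i>
        %b = load &lt;4 x float&gt;* %B                 <i>; &lt;4 x float&gt;</i>
        %sum = add &lt;4 x float&gt; %a, %b             <i>; &lt;4 x float&gt;</i>
        store &lt;4 x float&gt; %sum, &lt;4 x float&gt;* %Out
        ret void
}
</pre>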
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="constants">Constants</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>LLVM has several different basic types of constants. This section describes
them all and their syntax.</p>
</div>
<!-- ======================================================================= -->
<div class="doc_subsection"><a name="simpleconstants">Simple Constants</a></div>
<div class="doc_text">
<dl>
<dt><b>Boolean constants</b></dt>
<dd>The two strings '<tt>true</tt>' and '<tt>false</tt>' are both valid
constants of the <tt><a href="#t_primitive">bool</a></tt> type.
</dd>
<dt><b>Integer constants</b></dt>
<dd>Standard integers (such as '4') are constants of the <a
href="#t_integer">integer</a> type. Negative numbers may be used with signed
integer types.
</dd>
<dt><b>Floating point constants</b></dt>
<dd>Floating point constants use standard decimal notation (e.g. 123.421),
exponential notation (e.g. 1.23421e+2), or a more precise hexadecimal
notation (see below). Floating point constants must have a <a
href="#t_floating">floating point</a> type. </dd>
<dt><b>Null pointer constants</b></dt>
<dd>The identifier '<tt>null</tt>' is recognized as a null pointer constant,
and must be of <a href="#t_pointer">pointer type</a>.</dd>
</dl>
<p>The one non-intuitive notation for constants is the optional hexadecimal form
of floating point constants. For example, the form '<tt>double
0x432ff973cafa8000</tt>' is equivalent to (but harder to read than) '<tt>double
4.5e+15</tt>'. The only time hexadecimal floating point constants are required
(and the only time that they are generated by the disassembler) is when a
floating point constant must be emitted but it cannot be represented as a
decimal floating point number. For example, NaN's, infinities, and other
special values are represented in their IEEE hexadecimal format so that
assembly and disassembly do not cause any bits to change in the constants.</p>
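<p>For illustration, the hypothetical globals below use both notations. The
first two initializers denote the same <tt>double</tt>. The third is the IEEE
bit pattern for positive infinity, which has no decimal spelling.</p>
<pre>
%Big    = global double 4.5e+15
%BigHex = global double 0x432ff973cafa8000   <i>; same value as 4.5e+15</i>
%Inf    = global double 0x7ff0000000000000   <i>; +infinity</i>
</pre>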
</div>
<!-- ======================================================================= -->
<div class="doc_subsection"><a name="aggregateconstants">Aggregate Constants</a>
</div>
<div class="doc_text">
<dl>
<dt><b>Structure constants</b></dt>
<dd>Structure constants are represented with notation similar to structure
type definitions (a comma separated list of elements, surrounded by braces
(<tt>{}</tt>)). For example: "<tt>{ int 4, float 17.0 }</tt>". Structure
constants must have <a href="#t_struct">structure type</a>, and the number and
types of elements must match those specified by the type.
</dd>
<dt><b>Array constants</b></dt>
<dd>Array constants are represented with notation similar to array type
definitions (a comma separated list of elements, surrounded by square brackets
(<tt>[]</tt>)). For example: "<tt>[ int 42, int 11, int 74 ]</tt>". Array
constants must have <a href="#t_array">array type</a>, and the number and
types of elements must match those specified by the type.
</dd>
<dt><b>Packed constants</b></dt>
<dd>Packed constants are represented with notation similar to packed type
definitions: a comma separated list of elements, surrounded by
less-than/greater-than signs (<tt>&lt;&gt;</tt>). For example: "<tt>&lt; int 42,
int 11, int 74, int 100 &gt;</tt>". Packed constants must have <a
href="#t_packed">packed type</a>, and the number and types of elements must
match those specified by the type.
</dd>
<dt><b>Zero initialization</b></dt>
<dd>The string '<tt>zeroinitializer</tt>' can be used to zero initialize a
value of <em>any</em> type, including scalar and aggregate types.
This is often used to avoid having to print large zero initializers (e.g. for
large arrays), and is always exactly equivalent to using explicit zero
initializers.
</dd>
</dl>
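<p>The forms above can be sketched together as global initializers. The global
names are hypothetical and chosen only for illustration; in each case the
constant's element count and element types match the declared type:</p>
<pre>
%S = global { int, float } { int 4, float 17.0 }                <i>; structure constant</i>
%A = global [3 x int] [ int 42, int 11, int 74 ]                <i>; array constant</i>
%P = global &lt;4 x int&gt; &lt; int 42, int 11, int 74, int 100 &gt;    <i>; packed constant</i>
%Z = global [1000 x int] zeroinitializer                        <i>; 1000 zero elements</i>
</pre>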
</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="globalconstants">Global Variable and Function Addresses</a>
</div>
<div class="doc_text">
<p>The addresses of <a href="#globalvars">global variables</a> and <a
href="#functionstructure">functions</a> are always implicitly valid (link-time)
constants. These constants are explicitly referenced when the <a
href="#identifiers">identifier for the global</a> is used, and always have <a
href="#t_pointer">pointer</a> type. For example, the following is a legal LLVM
file:</p>
<pre>
%X = global int 17
%Y = global int 42
%Z = global [2 x int*] [ int* %X, int* %Y ]
</pre>
</div>
<!-- ======================================================================= -->
<div class="doc_subsection"><a name="undefvalues">Undefined Values</a></div>
<div class="doc_text">
<p>The string '<tt>undef</tt>' is recognized as a type-less constant that has
no specific value. Undefined values may be of any type, and may be used anywhere
a constant is permitted.</p>
<p>Undefined values indicate to the compiler that the program is well defined
no matter what value is used, giving the compiler more freedom to optimize.
</p>
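<p>A minimal sketch, with hypothetical names, of two places where
'<tt>undef</tt>' can stand in for a constant: as a global initializer whose
starting contents do not matter, and as an operand of an instruction:</p>
<pre>
%Scratch = global int undef      <i>; initial contents are unspecified</i>

int %anyvalue() {
  ret int undef                  <i>; the caller may observe any int value</i>
}
</pre>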
</div>
<!-- ======================================================================= -->
<div class="doc_subsection"><a name="constantexprs">Constant Expressions</a>
</div>
<div class="doc_text">
<p>Constant expressions are used to allow expressions involving other constants
to be used as constants. Constant expressions may be of any <a
href="#t_firstclass">first class</a> type, and may involve any LLVM operation
that does not have side effects (e.g. load and call are not supported). The
following is the syntax for constant expressions:</p>
<dl>
<dt><b><tt>cast ( CST to TYPE )</tt></b></dt>
<dd>Cast a constant to another type.</dd>
<dt><b><tt>getelementptr ( CSTPTR, IDX0, IDX1, ... )</tt></b></dt>
<dd>Perform the <a href="#i_getelementptr">getelementptr operation</a> on
constants. As with the <a href="#i_getelementptr">getelementptr</a>
instruction, the index list may have zero or more indexes, which are required
to make sense for the type of "CSTPTR".</dd>
<dt><b><tt>OPCODE ( LHS, RHS )</tt></b></dt>
<dd>Perform the specified operation on the LHS and RHS constants. OPCODE may
be any of the <a href="#binaryops">binary</a> or <a href="#bitwiseops">bitwise
binary</a> operations. The constraints on operands are the same as those for
the corresponding instruction (e.g. no bitwise operations on floating point
are allowed).</dd>
</dl>
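<p>The three forms above can be sketched as global initializers. The global
names are hypothetical, and the example assumes array indices written as
<tt>long</tt> constants:</p>
<pre>
%X      = global int 17
%A      = global [2 x int] [ int 1, int 2 ]

%XAddr  = global long cast (int* %X to long)                          <i>; cast</i>
%Second = global int* getelementptr ([2 x int]* %A, long 0, long 1)   <i>; address of %A[1]</i>
%Sum    = global int add (int 1, int 2)                               <i>; binary operation</i>
</pre>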
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="instref">Instruction Reference</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>The LLVM instruction set consists of several different
classifications of instructions: <a href="#terminators">terminator
instructions</a>, <a href="#binaryops">binary instructions</a>, <a
href="#memoryops">memory instructions</a>, and <a href="#otherops">other
instructions</a>.</p>
</div>
<!-- ======================================================================= -->
<div class="doc_subsection"> <a name="terminators">Terminator
Instructions</a> </div>
<div class="doc_text">
<p>As mentioned <a href="#functionstructure">previously</a>, every
basic block in a program ends with a "Terminator" instruction, which
indicates which block should be executed after the current block is
finished. These terminator instructions typically yield a '<tt>void</tt>'
value: they produce control flow, not values (the one exception being
the '<a href="#i_invoke"><tt>invoke</tt></a>' instruction).</p>
<p>There are six different terminator instructions: the '<a
href="#i_ret"><tt>ret</tt></a>' instruction, the '<a href="#i_br"><tt>br</tt></a>'
instruction, the '<a href="#i_switch"><tt>switch</tt></a>' instruction,
the '<a href="#i_invoke"><tt>invoke</tt></a>' instruction, the '<a
href="#i_unwind"><tt>unwind</tt></a>' instruction, and the '<a
href="#i_unreachable"><tt>unreachable</tt></a>' instruction.</p>
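<p>As a small sketch of how terminators shape a function, the hypothetical
function below computes an absolute value. Each of its three basic blocks ends
with exactly one terminator: a conditional '<tt>br</tt>' in the first block and
a '<tt>ret</tt>' in each of the other two:</p>
<pre>
int %abs(int %x) {
entry:
  %neg = setlt int %x, 0                         <i>; is %x negative?</i>
  br bool %neg, label %negate, label %done       <i>; terminator of 'entry'</i>
negate:
  %r = sub int 0, %x
  ret int %r                                     <i>; terminator of 'negate'</i>
done:
  ret int %x                                     <i>; terminator of 'done'</i>
}
</pre>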
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="i_ret">'<tt>ret</tt>'
Instruction</a> </div>
<div class="doc_text">
<pre>  ret &lt;type&gt; &lt;value&gt;       <i>; Return a value from a non-void function</i>
  ret void                 <i>; Return from void function</i>
</pre>
<p>The '<tt>ret</tt>' instruction is used to return control flow (and a
value) from a function, back to the caller.</p>
<p>There are two forms of the '<tt>ret</tt>' instruction: one that
returns a value and then causes control flow, and one that just causes
control flow to occur.</p>
<p>The '<tt>ret</tt>' instruction may return any '<a
href="#t_firstclass">first class</a>' type. Notice that a function is
not <a href="#wellformed">well formed</a> if there exists a '<tt>ret</tt>'
instruction inside of the function that returns a value that does not
match the return type of the function.</p>
<p>When the '<tt>ret</tt>' instruction is executed, control flow
returns back to the calling function's context. If the caller is a "<a
href="#i_call"><tt>call</tt></a>" instruction, execution continues at