<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>LLVM Bytecode File Format</title>
  <link rel="stylesheet" href="llvm.css" type="text/css">
  <style type="text/css">
    TR, TD { border: 2px solid gray; padding-left: 4pt; padding-right: 4pt; 
             padding-top: 2pt; padding-bottom: 2pt; }
    TH { border: 2px solid gray; font-weight: bold; font-size: 105%; }
    TABLE { text-align: center; border: 2px solid black; 
      border-collapse: collapse; margin-top: 1em; margin-left: 1em; 
      margin-right: 1em; margin-bottom: 1em; }
    .td_left { border: 2px solid gray; text-align: left; }
  </style>
</head>
<body>
<div class="doc_title"> LLVM Bytecode File Format </div>
<ol>
  <li><a href="#abstract">Abstract</a></li>
  <li><a href="#concepts">Concepts</a>
    <ol>
      <li><a href="#blocks">Blocks</a></li>
      <li><a href="#lists">Lists</a></li>
      <li><a href="#fields">Fields</a></li>
      <li><a href="#align">Alignment</a></li>
      <li><a href="#vbr">Variable Bit-Rate Encoding</a></li>
      <li><a href="#encoding">Encoding Primitives</a></li>
      <li><a href="#slots">Slots</a></li>
    </ol>
  </li>
  <li><a href="#general">General Structure</a> </li>
  <li><a href="#blockdefs">Block Definitions</a>
    <ol>
      <li><a href="#signature">Signature Block</a></li>
      <li><a href="#module">Module Block</a></li>
      <li><a href="#globaltypes">Global Type Pool</a></li>
      <li><a href="#globalinfo">Module Info Block</a></li>
      <li><a href="#constantpool">Global Constant Pool</a></li>
      <li><a href="#functiondefs">Function Definition</a></li>
      <li><a href="#compactiontable">Compaction Table</a></li>
      <li><a href="#instructionlist">Instruction List</a></li>
      <li><a href="#symtab">Symbol Table</a></li>
    </ol>
  </li>
  <li><a href="#versiondiffs">Version Differences</a>
    <ol>
      <li><a href="#vers12">Version 1.2 Differences From 1.3</a></li>
      <li><a href="#vers11">Version 1.1 Differences From 1.2</a></li>
      <li><a href="#vers10">Version 1.0 Differences From 1.1</a></li>
    </ol>
  </li>
</ol>
<div class="doc_author">
<p>Written by <a href="mailto:rspencer@x10sys.com">Reid Spencer</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="abstract">Abstract </a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>This document describes the LLVM bytecode file format. It specifies
the binary encoding rules of the bytecode file format so that
equivalent systems can encode bytecode files correctly. The LLVM
bytecode representation is used to store the intermediate
representation on disk in compacted form.</p>
<p>The LLVM bytecode format may change in the future, but LLVM will
always be backwards compatible with older formats. This document
describes only the most current version of the bytecode format. See <a
 href="#versiondiffs">Version Differences</a> for details on how
the current version differs from previous versions.</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"> <a name="concepts">Concepts</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>This section describes the general concepts of the bytecode file
format without getting into specific layout details. It is recommended
that you read this section thoroughly before interpreting the detailed
descriptions.</p>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="blocks">Blocks</a> </div>
<div class="doc_text">
<p>LLVM bytecode files consist simply of a sequence of blocks of bytes
using a binary encoding. Each block begins with a header of two
unsigned integers. The first value identifies the type of block and the
second value provides the size of the block in bytes. The block
identifier is used because it is possible for entire blocks to be
omitted from the file if they are empty. The block identifier helps the
reader determine which kind of block is next in the file. Note that
blocks can be nested within other blocks.</p>
<p> All blocks are variable length, and the block header specifies the
size of the block. All blocks begin at a byte offset that is a multiple
of four, that is, on a 32-bit boundary. The first block is 32-bit aligned
because it starts at offset 0. Each block is padded with zero fill
bytes to ensure that the next block also starts on a 32-bit boundary.</p>
</div>
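<div class="doc_text">
<p>As a rough illustration of the layout described above, the following
Python sketch walks the top-level blocks of a buffer. Two details are
assumptions made only for this example and are not stated in this section:
that each header field is a fixed four-byte little-endian unsigned integer,
and that the size field counts the block's contents but not its header. The
names <tt>walk_blocks</tt> and <tt>align32</tt> are hypothetical, and nested
blocks are not descended into.</p>

```python
HEADER_SIZE = 8  # assumed: two 4-byte unsigned integers (type, size)


def align32(offset):
    """Round offset up to the next multiple of 4 bytes (32 bits)."""
    return (offset + 3) & ~3


def walk_blocks(data):
    """Yield (block_type, payload) for each top-level block in data.

    Assumes little-endian header fields and a size that excludes the header.
    """
    offset = 0
    while len(data) - offset >= HEADER_SIZE:
        block_type = int.from_bytes(data[offset:offset + 4], "little")
        size = int.from_bytes(data[offset + 4:offset + 8], "little")
        start = offset + HEADER_SIZE
        yield block_type, data[start:start + size]
        # Skip the zero fill so the next header is read from a 32-bit boundary.
        offset = align32(start + size)
```

<p>Because the reader learns each block's type from its header, it can cope
with empty blocks that were omitted from the file: it simply sees the
identifier of whichever block comes next.</p>
</div>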
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="lists">Lists</a> </div>
<div class="doc_text">
<p>LLVM bytecode blocks often contain lists of things of a similar
type. For example, a function contains a list of instructions and a
function type contains a list of argument types. There are two basic
types of lists: length lists (<a href="#llist">llist</a>) and
null-terminated lists (<a href="#zlist">zlist</a>), as described below in
the <a href="#encoding">Encoding Primitives</a>.</p>
</div>
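<div class="doc_text">
<p>The difference between the two list kinds can be sketched as follows.
This is a conceptual example only: it assumes that every element, and the
length itself, fits in a single byte, whereas real lists hold fields encoded
as described in the Encoding Primitives. The function names
<tt>read_llist</tt> and <tt>read_zlist</tt> are hypothetical.</p>

```python
def read_llist(data, offset=0):
    """Length list: a count, then exactly that many elements."""
    count = data[offset]
    start = offset + 1
    items = list(data[start:start + count])
    return items, start + count


def read_zlist(data, offset=0):
    """Zero-terminated list: elements until a zero element is reached."""
    items = []
    while data[offset] != 0:
        items.append(data[offset])
        offset += 1
    return items, offset + 1  # step past the terminator
```

<p>Both functions return the decoded elements and the offset of the first
byte after the list. Note that a zero-terminated list cannot contain a zero
element, since that value marks the end of the list.</p>
</div>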
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="fields">Fields</a> </div>
<div class="doc_text">
<p>Fields are units of information that LLVM knows how to write atomically. Most 
fields have a uniform length or some kind of length indication built into their 
encoding. For example, a constant string (array of bytes) is written simply as 
the length followed by the characters. Although this is similar to a list, 
constant strings are treated atomically and are thus fields.</p>
<p>Fields use a condensed bit format specific to the type of information
they must contain. As few bits as possible are written for each field. The
sections that follow will provide the details on how these fields are
written and how the bits are to be interpreted.</p>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="align">Alignment</a> </div>
<div class="doc_text">
  <p>To support cross-platform differences, the bytecode file is aligned on 
  certain boundaries. This means that a small amount of padding (at most 3 
  bytes) will be added to ensure that the next entry is aligned to a 32-bit 
  boundary.</p>
</div>
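<div class="doc_text">
<p>On the writing side, the padding rule amounts to a small calculation. The
sketch below assumes the output is accumulated in a Python
<tt>bytearray</tt> and that the fill bytes are zero; the helper name
<tt>pad_to_32bit</tt> is hypothetical.</p>

```python
def pad_to_32bit(buf):
    """Append zero bytes until len(buf) is a multiple of 4.

    Returns the number of padding bytes added (0, 1, 2 or 3).
    """
    padding = -len(buf) % 4
    buf.extend(bytes(padding))
    return padding
```

<p>A buffer of five bytes, for example, receives three bytes of padding,
while a buffer whose length is already a multiple of four is left
unchanged.</p>
</div>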
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="vbr">Variable Bit-Rate Encoding</a>
</div>
<div class="doc_text">
<p>Most of the values written to LLVM bytecode files are small integers. To 
minimize the number of bytes written for these quantities, an encoding scheme 
similar to UTF-8 is used to write integer data. The scheme is known as
variable bit-rate (vbr) encoding. In this encoding, the high bit of
each byte indicates whether more bytes follow. If (byte &amp;
0x80) is non-zero in any given byte, another byte
immediately follows that also contributes to the value. For the final
byte, (byte &amp; 0x80) is zero (the high bit is not set). In each byte
only the low seven bits contribute to the value, and the first byte carries
the least significant bits. Consequently, 32-bit
quantities take from one to <em>five</em> bytes to encode. In
general, smaller quantities encode in fewer bytes, as shown in the table
below.</p>
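<p>The scheme just described can be sketched in Python as shown here. This
is a minimal illustration for non-negative integers only, not the code LLVM
itself uses; the names <tt>encode_vbr</tt> and <tt>decode_vbr</tt> are
hypothetical.</p>

```python
def encode_vbr(value):
    """Encode a non-negative integer, low-order seven bits first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # final byte: high bit clear
    return bytes(out)


def decode_vbr(data, offset=0):
    """Return (value, next_offset) for the vbr integer starting at offset."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # only the low seven bits count
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the final byte
            return value, offset
```

<p>With this sketch, 127 encodes to the single byte 0x7F, 128 encodes to the
two bytes 0x80 0x01, and the largest 32-bit value needs five bytes. The
table that follows gives the bits carried by each byte and the largest value
that fits in that many bytes.</p>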
<table>
  <tbody>
    <tr>
      <th>Byte #</th>
      <th>Significant Bits</th>
      <th>Maximum Value</th>
    </tr>
    <tr>
      <td>1</td>
      <td>0-6</td>
      <td>127</td>
    </tr>
    <tr>
      <td>2</td>
      <td>7-13</td>
      <td>16,383</td>
    </tr>
    <tr>
      <td>3</td>
      <td>14-20</td>
      <td>2,097,151</td>
    </tr>
    <tr>
      <td>4</td>
      <td>21-27</td>
      <td>268,435,455</td>
    </tr>
    <tr>
      <td>5</td>
      <td>28-34</td>
      <td>34,359,738,367</td>
    </tr>
    <tr>
      <td>6</td>
      <td>35-41</td>
      <td>4,398,046,511,103</td>
    </tr>
    <tr>
      <td>7</td>
      <td>42-48</td>
      <td>562,949,953,421,311</td>
    </tr>
    <tr>
      <td>8</td>
      <td>49-55</td>
      <td>72,057,594,037,927,935</td>
    </tr>
    <tr>
      <td>9</td>
      <td>56-62</td>
      <td>9,223,372,036,854,775,807</td>
    </tr>
    <tr>
      <td>10</td>
      <td>63-69</td>
      <td>1,180,591,620,717,411,303,423</td>
    </tr>