
Pointer Aliasing Rules
----------------------

Any memory access must be done through a pointer value associated with
an address range of the memory access, otherwise the behavior is
undefined. Pointer values are associated with address ranges according
to the following rules:

-  A pointer value is associated with the addresses associated with any
   value it is *based* on.
-  An address of a global variable is associated with the address range
   of the variable's storage.
-  The result value of an allocation instruction is associated with the
   address range of the allocated storage.
-  A null pointer in the default address-space is associated with no
   address.
-  An integer constant other than zero or a pointer value returned from
   a function not defined within LLVM may be associated with address
   ranges allocated through mechanisms other than those provided by
   LLVM. Such ranges shall not overlap with any ranges of addresses
   allocated by mechanisms provided by LLVM.

A pointer value is *based* on another pointer value according to the
following rules:

-  A pointer value formed from a ``getelementptr`` operation is *based*
   on the first operand of the ``getelementptr``.
-  The result value of a ``bitcast`` is *based* on the operand of the
   ``bitcast``.
-  A pointer value formed by an ``inttoptr`` is *based* on all pointer
   values that contribute (directly or indirectly) to the computation of
   the pointer's value.
-  The "*based* on" relationship is transitive.

Note that this definition of *"based"* is intentionally similar to the
definition of *"based"* in C99, though it is slightly weaker.
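
As an illustration of the *based* on rules, the following sketch derives
several pointers from one global. It is written in this document's
typed-pointer syntax, and the names ``@g`` and ``@based_on_example`` are
hypothetical. Every derived pointer is *based* on ``@g`` and is therefore
associated with the address range of its storage:

.. code-block:: llvm

   @g = global [4 x i32] zeroinitializer

   define i32 @based_on_example() {
   entry:
     ; %base is based on @g, the first operand of the getelementptr.
     %base = getelementptr [4 x i32]* @g, i64 0, i64 0
     ; %elt is based on %base and, by transitivity, on @g.
     %elt = getelementptr i32* %base, i64 2
     ; The result of a bitcast is based on its operand.
     %raw = bitcast i32* %elt to i8*
     ; %back is formed by inttoptr; %raw contributes to its value through
     ; %int, so %back is based on %raw.
     %int = ptrtoint i8* %raw to i64
     %back = inttoptr i64 %int to i32*
     ; The access goes through a pointer associated with @g's storage.
     %v = load i32* %back
     ret i32 %v
   }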

LLVM IR does not associate types with memory. The result type of a
``load`` merely indicates the size and alignment of the memory from
which to load, as well as the interpretation of the value. The first
operand type of a ``store`` similarly only indicates the size and
alignment of the store.

Consequently, type-based alias analysis, aka TBAA, aka
``-fstrict-aliasing``, is not applicable to general unadorned LLVM IR.
:ref:`Metadata <metadata>` may be used to encode additional information
which specialized optimization passes may use to implement type-based
alias analysis.
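
For instance, nothing in unadorned IR prevents a location written as an
``i32`` from being read back as a ``float``. A minimal sketch, in this
document's typed-pointer syntax (``@reinterpret`` is a hypothetical name):

.. code-block:: llvm

   define float @reinterpret(i32* %p) {
   entry:
     ; 1065353216 is 0x3F800000, the IEEE-754 single-precision encoding of 1.0.
     store i32 1065353216, i32* %p
     %fp = bitcast i32* %p to float*
     ; The load type only selects size, alignment, and interpretation:
     ; the same four bytes are now read as a float.
     %f = load float* %fp
     ret float %f
   }

Because memory itself carries no type, an alias analysis cannot conclude
from the types alone that the ``store`` and the ``load`` touch different
locations.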

.. _volatile:

Volatile Memory Accesses
------------------------

Certain memory accesses, such as :ref:`loads <i_load>`,
:ref:`stores <i_store>`, and :ref:`llvm.memcpy <int_memcpy>` calls, may be
marked ``volatile``. The optimizers must not change the number of
volatile operations or change their order of execution relative to other
volatile operations. The optimizers *may* change the order of volatile
operations relative to non-volatile operations. This is not Java's
"volatile" and has no cross-thread synchronization behavior.

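A minimal sketch of these rules, assuming ``%reg`` points at a location
whose contents may change between reads, such as a device register
(``@poll_twice`` is a hypothetical name, and the code uses this document's
typed-pointer syntax):

.. code-block:: llvm

   define i32 @poll_twice(i32* %reg, i32* %scratch) {
   entry:
     ; Both volatile loads must be kept, and in this order; they may not
     ; be merged into a single load.
     %a = load volatile i32* %reg
     ; This non-volatile store may be moved across the volatile loads.
     store i32 0, i32* %scratch
     %b = load volatile i32* %reg
     %sum = add i32 %a, %b
     ret i32 %sum
   }
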
.. _memmodel:

Memory Model for Concurrent Operations
--------------------------------------

The LLVM IR does not define any way to start parallel threads of
execution or to register signal handlers. Nonetheless, there are
platform-specific ways to create them, and we define LLVM IR's behavior
in their presence. This model is inspired by the C++0x memory model.

For a more informal introduction to this model, see the :doc:`Atomics` guide.

We define a *happens-before* partial order as the least partial order
that

-  Is a superset of single-thread program order, and
-  When ``a`` *synchronizes-with* ``b``, includes an edge from ``a`` to
   ``b``. *Synchronizes-with* pairs are introduced by platform-specific
   techniques, like pthread locks, thread creation, thread joining,
   etc., and by atomic instructions. (See also :ref:`Atomic Memory Ordering
   Constraints <ordering>`).

Note that program order does not introduce *happens-before* edges
between a thread and signals executing inside that thread.

Every (defined) read operation (load instructions, memcpy, atomic
loads/read-modify-writes, etc.) R reads a series of bytes written by
(defined) write operations (store instructions, atomic
stores/read-modify-writes, memcpy, etc.). For the purposes of this
section, initialized globals are considered to have a write of the
initializer which is atomic and happens before any other read or write
of the memory in question. For each byte of a read R, R\ :sub:`byte`
may see any write to the same byte, except:

-  If write\ :sub:`1`  happens before write\ :sub:`2`, and
   write\ :sub:`2` happens before R\ :sub:`byte`, then
   R\ :sub:`byte` does not see write\ :sub:`1`.
-  If R\ :sub:`byte` happens before write\ :sub:`3`, then
   R\ :sub:`byte` does not see write\ :sub:`3`.

Given that definition, R\ :sub:`byte` is defined as follows:

-  If R is volatile, the result is target-dependent. (Volatile is
   supposed to give guarantees which can support ``sig_atomic_t`` in
   C/C++, and may be used for accesses to addresses which do not behave
   like normal memory. It does not generally provide cross-thread
   synchronization.)
-  Otherwise, if there is no write to the same byte that happens before
   R\ :sub:`byte`, R\ :sub:`byte` returns ``undef`` for that byte.
-  Otherwise, if R\ :sub:`byte` may see exactly one write,
   R\ :sub:`byte` returns the value written by that write.
-  Otherwise, if R is atomic, and all the writes R\ :sub:`byte` may
   see are atomic, it chooses one of the values written. See the :ref:`Atomic
   Memory Ordering Constraints <ordering>` section for additional
   constraints on how the choice is made.
-  Otherwise R\ :sub:`byte` returns ``undef``.

R returns the value composed of the series of bytes it read. This
implies that some bytes within the value may be ``undef`` **without**
the entire value being ``undef``. Note that this only defines the
semantics of the operation; it doesn't mean that targets will emit more
than one instruction to read the series of bytes.

Note that in cases where none of the atomic intrinsics are used, this
model places only one restriction on IR transformations on top of what
is required for single-threaded execution: introducing a store to a byte
which might not otherwise be stored is not allowed in general.
(Specifically, in the case where another thread might write to and read
from an address, introducing a store can change a load that may see
exactly one write into a load that may see multiple writes.)
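
The restriction on introducing stores can be sketched with a conditional
store and a tempting but generally invalid rewrite of it. Both function
names are hypothetical, and the code uses this document's typed-pointer
syntax:

.. code-block:: llvm

   ; Original: when %c is false, no write to %p is performed.
   define void @cond_store(i1 %c, i32* %p) {
   entry:
     br i1 %c, label %then, label %exit
   then:
     store i32 1, i32* %p
     br label %exit
   exit:
     ret void
   }

   ; Not a valid replacement in general: this version always writes to %p.
   ; When %c is false it introduces a store that the original never
   ; performed, so a load in another thread that could see exactly one
   ; write may now see several and return undef.
   define void @cond_store_speculated(i1 %c, i32* %p) {
   entry:
     %old = load i32* %p
     %new = select i1 %c, i32 1, i32 %old
     store i32 %new, i32* %p
     ret void
   }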

.. _ordering:

Atomic Memory Ordering Constraints
----------------------------------

Atomic instructions (:ref:`cmpxchg <i_cmpxchg>`,
:ref:`atomicrmw <i_atomicrmw>`, :ref:`fence <i_fence>`,
:ref:`atomic load <i_load>`, and :ref:`atomic store <i_store>`) take
an ordering parameter that determines which other atomic instructions on
the same address they *synchronize with*. These semantics are borrowed
from Java and C++0x, but are somewhat more colloquial. If these
descriptions aren't precise enough, check those specs (see spec
references in the :doc:`atomics guide <Atomics>`).
:ref:`fence <i_fence>` instructions treat these orderings somewhat
differently since they don't take an address. See that instruction's
documentation for details.

For a simpler introduction to the ordering constraints, see the
:doc:`Atomics` guide.

``unordered``
    The set of values that can be read is governed by the happens-before
    partial order. A value cannot be read unless some operation wrote
    it. This is intended to provide a guarantee strong enough to model
    Java's non-volatile shared variables. This ordering cannot be
    specified for read-modify-write operations; it is not strong enough
    to make them atomic in any interesting way.
``monotonic``
    In addition to the guarantees of ``unordered``, there is a single
    total order for modifications by ``monotonic`` operations on each
    address. All modification orders must be compatible with the
    happens-before order. There is no guarantee that the modification
    orders can be combined to a global total order for the whole program
    (and this often will not be possible). The read in an atomic
    read-modify-write operation (:ref:`cmpxchg <i_cmpxchg>` and
    :ref:`atomicrmw <i_atomicrmw>`) reads the value in the modification
    order immediately before the value it writes. If one atomic read
    happens before another atomic read of the same address, the later
    read must see the same value or a later value in the address's
    modification order. This disallows reordering of ``monotonic`` (or
    stronger) operations on the same address. If an address is written
    ``monotonic``-ally by one thread, and other threads ``monotonic``-ally
    read that address repeatedly, the other threads must eventually see
    the write. This corresponds to the C++0x/C1x
    ``memory_order_relaxed``.
``acquire``
    In addition to the guarantees of ``monotonic``, a
    *synchronizes-with* edge may be formed with a ``release`` operation.
    This is intended to model C++'s ``memory_order_acquire``.
``release``
    In addition to the guarantees of ``monotonic``, if this operation
    writes a value which is subsequently read by an ``acquire``
    operation, it *synchronizes-with* that operation. (This isn't a
    complete description; see the C++0x definition of a release
    sequence.) This corresponds to the C++0x/C1x
    ``memory_order_release``.
``acq_rel`` (acquire+release)
    Acts as both an ``acquire`` and ``release`` operation on its
    address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
``seq_cst`` (sequentially consistent)
    In addition to the guarantees of ``acq_rel`` (``acquire`` for an
    operation which only reads, ``release`` for an operation which only
    writes), there is a global total order on all
    sequentially-consistent operations on all addresses, which is
    consistent with the *happens-before* partial order and with the
    modification orders of all the affected addresses. Each
    sequentially-consistent read sees the last preceding write to the
    same address in this global order. This corresponds to the C++0x/C1x
    ``memory_order_seq_cst`` and Java volatile.

.. _singlethread:

If an atomic operation is marked ``singlethread``, it only *synchronizes
with* or participates in modification and seq\_cst total orderings with
other operations running in the same thread (for example, in signal
handlers).
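
The ``release``/``acquire`` pairing described above can be sketched as a
simple message-passing pattern. The globals ``@data`` and ``@flag`` and
both function names are hypothetical; the two functions are assumed to run
in different threads, and the code uses this document's typed-pointer
syntax:

.. code-block:: llvm

   @data = global i32 0
   @flag = global i32 0

   define void @producer() {
   entry:
     ; Ordinary (non-atomic) write of the payload.
     store i32 42, i32* @data
     ; Publish the payload with a release store.
     store atomic i32 1, i32* @flag release, align 4
     ret void
   }

   define i32 @consumer() {
   entry:
     br label %spin
   spin:
     ; An acquire load that reads the value written by the release store
     ; synchronizes-with that store.
     %f = load atomic i32* @flag acquire, align 4
     %ready = icmp eq i32 %f, 1
     br i1 %ready, label %done, label %spin
   done:
     ; The store to @data happens before this load, so the load may see
     ; exactly one write.
     %v = load i32* @data
     ret i32 %v
   }

If both atomic accesses were ``monotonic`` instead, no *synchronizes-with*
edge would be formed and the final load of ``@data`` would race with the
store in ``@producer``.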

.. _fastmath:

Fast-Math Flags
---------------

LLVM IR floating-point binary ops (:ref:`fadd <i_fadd>`,
:ref:`fsub <i_fsub>`, :ref:`fmul <i_fmul>`, :ref:`fdiv <i_fdiv>`,
:ref:`frem <i_frem>`) have the following flags that can be set to enable
otherwise unsafe floating point optimizations:

``nnan``
   No NaNs - Allow optimizations to assume the arguments and result are not
   NaN. Such optimizations are required to retain defined behavior over
   NaNs, but the value of the result is undefined.

``ninf``
   No Infs - Allow optimizations to assume the arguments and result are not
   +/-Inf. Such optimizations are required to retain defined behavior over
   +/-Inf, but the value of the result is undefined.

``nsz``
   No Signed Zeros - Allow optimizations to treat the sign of a zero
   argument or result as insignificant.

``arcp``
   Allow Reciprocal - Allow optimizations to use the reciprocal of an
   argument rather than perform division.

``fast``
   Fast - Allow algebraically equivalent transformations that may
   dramatically change results in floating point (e.g. reassociate). This
   flag implies all the others.
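
As a brief sketch of how the flags are attached, they are written between
the opcode and the operand type, and several may be combined on one
instruction (``@fmf_example`` is a hypothetical name):

.. code-block:: llvm

   define float @fmf_example(float %a, float %b) {
   entry:
     ; The optimizer may assume neither operand nor the result is NaN or +/-Inf.
     %sum = fadd nnan ninf float %a, %b
     ; The division may be replaced by a multiplication with a reciprocal.
     %quot = fdiv arcp float %sum, %b
     ; fast implies nnan, ninf, nsz, and arcp, and also allows
     ; reassociation-style rewrites.
     %res = fmul fast float %quot, %a
     ret float %res
   }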

.. _typesystem:

Type System
===========

The LLVM type system is one of the most important features of the
intermediate representation. Being typed enables a number of
optimizations to be performed on the intermediate representation
directly, without having to do extra analyses on the side before the
transformation. A strong type system makes it easier to read the
generated code and enables novel analyses and transformations that are
not feasible to perform on normal three address code representations.

Type Classifications
--------------------

The types fall into a few useful classifications:


.. list-table::
   :header-rows: 1

   * - Classification
     - Types

   * - :ref:`integer <t_integer>`
     - ``i1``, ``i2``, ``i3``, ... ``i8``, ... ``i16``, ... ``i32``, ...
       ``i64``, ...

   * - :ref:`floating point <t_floating>`
     - ``half``, ``float``, ``double``, ``x86_fp80``, ``fp128``,
       ``ppc_fp128``


   * - first class

       .. _t_firstclass:

     - :ref:`integer <t_integer>`, :ref:`floating point <t_floating>`,
       :ref:`pointer <t_pointer>`, :ref:`vector <t_vector>`,
       :ref:`structure <t_struct>`, :ref:`array <t_array>`,
       :ref:`label <t_label>`, :ref:`metadata <t_metadata>`.

   * - :ref:`primitive <t_primitive>`
     - :ref:`label <t_label>`,
       :ref:`void <t_void>`,
       :ref:`integer <t_integer>`,
       :ref:`floating point <t_floating>`,
       :ref:`x86mmx <t_x86mmx>`,
       :ref:`metadata <t_metadata>`.

   * - :ref:`derived <t_derived>`
     - :ref:`array <t_array>`,
       :ref:`function <t_function>`,
       :ref:`pointer <t_pointer>`,
       :ref:`structure <t_struct>`,
       :ref:`vector <t_vector>`,
       :ref:`opaque <t_opaque>`.

The :ref:`first class <t_firstclass>` types are perhaps the most important.
Values of these types are the only ones which can be produced by
instructions.

.. _t_primitive:

Primitive Types
---------------

The primitive types are the fundamental building blocks of the LLVM
system.

.. _t_integer:

Integer Type
^^^^^^^^^^^^

Overview:
"""""""""

The integer type is a simple type that specifies an arbitrary bit width
for the desired integer. Any bit width from 1 bit to 2\ :sup:`23`\ -1
(about 8 million) can be specified.

Syntax:
"""""""

::

      iN

The number of bits the integer will occupy is specified by the ``N``
value.

Examples:
"""""""""

+----------------+------------------------------------------------+
| ``i1``         | a single-bit integer.                          |
+----------------+------------------------------------------------+
| ``i32``        | a 32-bit integer.                              |
+----------------+------------------------------------------------+
| ``i1942652``   | a really big integer of over 1 million bits.   |
+----------------+------------------------------------------------+

.. _t_floating:

Floating Point Types
^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1

   * - Type
     - Description

   * - ``half``
     - 16-bit floating point value

   * - ``float``
     - 32-bit floating point value

   * - ``double``
     - 64-bit floating point value

   * - ``fp128``
     - 128-bit floating point value (112-bit mantissa)

   * - ``x86_fp80``
     - 80-bit floating point value (X87)

   * - ``ppc_fp128``
     - 128-bit floating point value (two 64-bit values)

.. _t_x86mmx:

X86mmx Type
^^^^^^^^^^^

Overview:
"""""""""

The x86mmx type represents a value held in an MMX register on an x86
machine. The operations allowed on it are quite limited: parameters and
return values, load and store, and bitcast. User-specified MMX
instructions are represented as intrinsic or asm calls with arguments
and/or results of this type. There are no arrays, vectors or constants
of this type.

Syntax:
"""""""

::

      x86mmx

.. _t_void:

Void Type
^^^^^^^^^

Overview:
"""""""""

The void type does not represent any value and has no size.

Syntax:
"""""""

::

      void

.. _t_label:

Label Type
^^^^^^^^^^

Overview:
"""""""""

The label type represents code labels.

Syntax:
"""""""

::

      label

.. _t_metadata:

Metadata Type
^^^^^^^^^^^^^

Overview:
"""""""""

The metadata type represents embedded metadata. No derived types may be
created from metadata except for :ref:`function <t_function>` arguments.

Syntax:
"""""""

::

      metadata

.. _t_derived:

Derived Types
-------------

The real power in LLVM comes from the derived types in the system. This
is what allows a programmer to represent arrays, functions, pointers,
and other useful types. Each of these types contains one or more element
types which may be a primitive type, or another derived type. For
example, it is possible to have a two dimensional array, using an array
as the element type of another array.

.. _t_aggregate:

Aggregate Types
^^^^^^^^^^^^^^^

Aggregate Types are a subset of derived types that can contain multiple
member types. :ref:`Arrays <t_array>` and :ref:`structs <t_struct>` are
aggregate types. :ref:`Vectors <t_vector>` are not considered to be
aggregate types.

.. _t_array:

Array Type
^^^^^^^^^^

Overview:
"""""""""

The array type is a very simple derived type that arranges elements
sequentially in memory. The array type requires a size (number of
elements) and an underlying data type.

Syntax:
"""""""

::

      [<# elements> x <elementtype>]

The number of elements is a constant integer value; ``elementtype`` may
be any type with a size.

Examples:
"""""""""

+------------------+--------------------------------------+
| ``[40 x i32]``   | Array of 40 32-bit integer values.   |
+------------------+--------------------------------------+
| ``[41 x i32]``   | Array of 41 32-bit integer values.   |
+------------------+--------------------------------------+
| ``[4 x i8]``     | Array of 4 8-bit integer values.     |
+------------------+--------------------------------------+

Here are some examples of multidimensional arrays:

+-----------------------------+----------------------------------------------------------+
| ``[3 x [4 x i32]]``         | 3x4 array of 32-bit integer values.                      |
+-----------------------------+----------------------------------------------------------+
| ``[12 x [10 x float]]``     | 12x10 array of single precision floating point values.   |
+-----------------------------+----------------------------------------------------------+
| ``[2 x [3 x [4 x i16]]]``   | 2x3x4 array of 16-bit integer values.                    |
+-----------------------------+----------------------------------------------------------+

There is no restriction on indexing beyond the end of the array implied
by a static type (though there are restrictions on indexing beyond the
bounds of an allocated object in some cases). This means that
single-dimension 'variable sized array' addressing can be implemented in
LLVM with a zero length array type. An implementation of 'pascal style
arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for
example.
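
A minimal sketch of such a 'pascal style array', assuming the ``i32``
field holds the element count and the caller allocated enough storage
after it (``%pascal_array`` and ``@get_element`` are hypothetical names,
written in this document's typed-pointer syntax):

.. code-block:: llvm

   %pascal_array = type { i32, [0 x float] }

   define float @get_element(%pascal_array* %arr, i64 %i) {
   entry:
     ; Field 1 is the zero-length array; %i indexes past its static
     ; bound, which the array type itself does not forbid.
     %slot = getelementptr %pascal_array* %arr, i64 0, i32 1, i64 %i
     %v = load float* %slot
     ret float %v
   }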

.. _t_function:

Function Type
^^^^^^^^^^^^^

Overview:
"""""""""

The function type can be thought of as a function signature. It consists
of a return type and a list of formal parameter types. The return type
of a function type is a first class type or a void type.

Syntax:
"""""""

::

      <returntype> (<parameter list>)

...where '``<parameter list>``' is a comma-separated list of type
specifiers. Optionally, the parameter list may include a type ``...``,
which indicates that the function takes a variable number of arguments.
Variable argument functions can access their arguments with the
:ref:`variable argument handling intrinsic <int_varargs>` functions.
'``<returntype>``' is any type except :ref:`label <t_label>`.
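
As a sketch of a variable argument function type in use, the following
declares ``printf`` with the type ``i32 (i8*, ...)`` and calls it. The
global ``@fmt`` and the function ``@print_int`` are hypothetical, and the
call is spelled in this document's typed-pointer syntax, which names the
pointer-to-function type at the call site:

.. code-block:: llvm

   ; "%d\n" followed by a terminating NUL: four bytes in total.
   @fmt = private constant [4 x i8] c"%d\0A\00"

   declare i32 @printf(i8*, ...)

   define i32 @print_int(i32 %x) {
   entry:
     %fmt_ptr = getelementptr [4 x i8]* @fmt, i64 0, i64 0
     ; One fixed i8* argument, then the variable arguments.
     %n = call i32 (i8*, ...)* @printf(i8* %fmt_ptr, i32 %x)
     ret i32 %n
   }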

Examples:
"""""""""

+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``i32 (i32)``                   | function taking an ``i32``, returning an ``i32``                                                                                                                    |
+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``float (i16, i32 *) *``        | :ref:`Pointer <t_pointer>` to a function that takes an ``i16`` and a :ref:`pointer <t_pointer>` to ``i32``, returning ``float``.                                    |
+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``i32 (i8*, ...)``              | A vararg function that takes at least one :ref:`pointer <t_pointer>` to ``i8`` (char in C), which returns an integer. This is the signature for ``printf`` in LLVM. |
+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``{i32, i32} (i32)``            | A function taking an ``i32``, returning a :ref:`structure <t_struct>` containing two ``i32`` values                                                                 |
+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+

.. _t_struct:

Structure Type
^^^^^^^^^^^^^^

Overview:
"""""""""

The structure type is used to represent a collection of data members
together in memory. The elements of a structure may be any type that has
a size.

Structures in memory are accessed using '``load``' and '``store``' by
getting a pointer to a field with the '``getelementptr``' instruction.
Structures in registers are accessed using the '``extractvalue``' and
'``insertvalue``' instructions.
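
The two access styles can be sketched side by side. The type ``%pair`` and
the function names are hypothetical, and the code uses this document's
typed-pointer syntax:

.. code-block:: llvm

   %pair = type { i32, float }

   ; Structure in memory: compute the field address, then load from it.
   define float @second_in_memory(%pair* %p) {
   entry:
     %field = getelementptr %pair* %p, i64 0, i32 1
     %v = load float* %field
     ret float %v
   }

   ; Structure in a register: read a field with extractvalue.
   define float @second_in_register(%pair %agg) {
   entry:
     %v = extractvalue %pair %agg, 1
     ret float %v
   }

   ; Structure in a register: produce an updated copy with insertvalue.
   define %pair @set_first(%pair %agg, i32 %x) {
   entry:
     %updated = insertvalue %pair %agg, i32 %x, 0
     ret %pair %updated
   }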

Structures may optionally be "packed" structures, which indicate that
the alignment of the struct is one byte, and that there is no padding
between the elements. In non-packed structs, padding between field types
is inserted as defined by the DataLayout string in the module, which is
required to match what the underlying code generator expects.

Structures can either be "literal" or "identified". A literal structure
is defined inline with other types (e.g. ``{i32, i32}*``) whereas
identified types are always defined at the top level with a name.
Literal types are uniqued by their contents and can never be recursive
or opaque since there is no way to write one. Identified types can be
recursive, can be opaque, and are never uniqued.
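
The difference can be sketched with a recursive identified type next to a
literal type used inline. ``%list_node`` and both function names are
hypothetical, and the code uses this document's typed-pointer syntax:

.. code-block:: llvm

   ; Identified type: it has a name, so it can refer to itself.
   %list_node = type { i32, %list_node* }

   define i32 @value_of_next(%list_node* %n) {
   entry:
     %next_slot = getelementptr %list_node* %n, i64 0, i32 1
     %next = load %list_node** %next_slot
     %val_slot = getelementptr %list_node* %next, i64 0, i32 0
     %val = load i32* %val_slot
     ret i32 %val
   }

   ; Literal type: written inline wherever it is used.
   define { i32, i32 } @make_pair(i32 %a, i32 %b) {
   entry:
     %p0 = insertvalue { i32, i32 } undef, i32 %a, 0
     %p1 = insertvalue { i32, i32 } %p0, i32 %b, 1
     ret { i32, i32 } %p1
   }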

Syntax:
"""""""

::

      %T1 = type { <type list> }     ; Identified normal struct type
      %T2 = type <{ <type list> }>   ; Identified packed struct type

Examples:
"""""""""

+------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``{ i32, i32, i32 }``        | A triple of three ``i32`` values                                                                                                                                                      |
+------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``{ float, i32 (i32) * }``   | A pair, where the first element is a ``float`` and the second element is a :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32``, returning an ``i32``.  |
+------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``<{ i8, i32 }>``            | A packed struct known to be 5 bytes in size.                                                                                                                                          |
+------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

.. _t_opaque:

Opaque Structure Types
^^^^^^^^^^^^^^^^^^^^^^

Overview:
"""""""""

Opaque structure types are used to represent named structure types that
do not have a body specified. This corresponds (for example) to the C
notion of a forward declared structure.

Syntax:
"""""""

::

      %X = type opaque
      %52 = type opaque

Examples:
"""""""""

+--------------+-------------------+
| ``opaque``   | An opaque type.   |
+--------------+-------------------+

.. _t_pointer:

Pointer Type
^^^^^^^^^^^^

Overview:
"""""""""

The pointer type is used to specify memory locations. Pointers are
commonly used to reference objects in memory.

Pointer types may have an optional address space attribute defining the
numbered address space where the pointed-to object resides. The default
address space is number zero. The semantics of non-zero address spaces
are target-specific.

Note that LLVM does not permit pointers to void (``void*``) nor does it
permit pointers to labels (``label*``). Use ``i8*`` instead.
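
A short sketch of both points: ``i8*`` standing in for C's ``void*``, and
a pointer into a non-default address space. The ``malloc`` declaration
assumes a target where ``size_t`` is 64 bits wide, the function names are
hypothetical, and the code uses this document's typed-pointer syntax:

.. code-block:: llvm

   ; Assumed signature: i8* plays the role of C's void*.
   declare i8* @malloc(i64)

   define i32* @alloc_i32() {
   entry:
     %raw = call i8* @malloc(i64 4)
     ; Convert the untyped result to the pointer type actually needed.
     %typed = bitcast i8* %raw to i32*
     ret i32* %typed
   }

   ; The address space is part of the pointer type.
   define i32 @load_from_space5(i32 addrspace(5)* %p) {
   entry:
     %v = load i32 addrspace(5)* %p
     ret i32 %v
   }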

Syntax:
"""""""

::

      <type> *

Examples:
"""""""""

+-------------------------+--------------------------------------------------------------------------------------------------------------+
| ``[4 x i32]*``          | A :ref:`pointer <t_pointer>` to :ref:`array <t_array>` of four ``i32`` values.                               |
+-------------------------+--------------------------------------------------------------------------------------------------------------+
| ``i32 (i32*) *``        | A :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32*``, returning an ``i32``. |
+-------------------------+--------------------------------------------------------------------------------------------------------------+
| ``i32 addrspace(5)*``   | A :ref:`pointer <t_pointer>` to an ``i32`` value that resides in address space #5.                           |
+-------------------------+--------------------------------------------------------------------------------------------------------------+

.. _t_vector:

Vector Type
^^^^^^^^^^^

Overview:
"""""""""

A vector type is a simple derived type that represents a vector of
elements. Vector types are used when multiple primitive data are
operated on in parallel using a single instruction (SIMD). A vector type
requires a size (number of elements) and an underlying primitive data
type. Vector types are considered :ref:`first class <t_firstclass>`.

Syntax:
"""""""

::

      < <# elements> x <elementtype> >

The number of elements is a constant integer value larger than 0;
``elementtype`` may be any integer or floating point type, or a pointer to
these types. Vectors of size zero are not allowed.
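
As a sketch of vectors as first class values, an ordinary arithmetic
instruction can operate on all elements at once
(``@first_lane_of_sum`` is a hypothetical name):

.. code-block:: llvm

   define i32 @first_lane_of_sum(<4 x i32> %a, <4 x i32> %b) {
   entry:
     ; Element-wise addition of the four i32 lanes.
     %sum = add <4 x i32> %a, %b
     ; Read a single element back out of the vector.
     %lane0 = extractelement <4 x i32> %sum, i32 0
     ret i32 %lane0
   }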

Examples:
"""""""""

+-------------------+--------------------------------------------------+
| ``<4 x i32>``     | Vector of 4 32-bit integer values.               |
+-------------------+--------------------------------------------------+
| ``<8 x float>``   | Vector of 8 32-bit floating-point values.        |
+-------------------+--------------------------------------------------+
| ``<2 x i64>``     | Vector of 2 64-bit integer values.               |
+-------------------+--------------------------------------------------+
| ``<4 x i64*>``    | Vector of 4 pointers to 64-bit integer values.   |
+-------------------+--------------------------------------------------+
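
As a brief illustration of how vector types appear in practice, the
following sketch adds two ``<4 x i32>`` values element-wise with a single
:ref:`add <i_add>` instruction. The function name ``@vadd`` and its
parameter names are hypothetical and chosen only for this example.

.. code-block:: llvm

    define <4 x i32> @vadd(<4 x i32> %a, <4 x i32> %b) {
    entry:
      ; One instruction operates on all four lanes.
      %sum = add <4 x i32> %a, %b
      ret <4 x i32> %sum
    }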

Constants
=========

LLVM has several different basic types of constants. This section
describes them all and their syntax.

Simple Constants
----------------

**Boolean constants**
    The two strings '``true``' and '``false``' are both valid constants
    of the ``i1`` type.
**Integer constants**
    Standard integers (such as '4') are constants of the
    :ref:`integer <t_integer>` type. Negative numbers may be used with
    integer types.
**Floating point constants**
    Floating point constants use standard decimal notation (e.g.
    123.421), exponential notation (e.g. 1.23421e+2), or a more precise
    hexadecimal notation (see below). The assembler requires the exact
    decimal value of a floating-point constant. For example, the
    assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating
    decimal in binary. Floating point constants must have a :ref:`floating
    point <t_floating>` type.
**Null pointer constants**
    The identifier '``null``' is recognized as a null pointer constant
    and must be of :ref:`pointer type <t_pointer>`.
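
The simple constants above might appear in a module as follows. This is
a minimal sketch: the global names are hypothetical, and the floating
point values were picked to be exactly representable in their types.

.. code-block:: llvm

    @flag  = global i1 true          ; boolean constant
    @count = global i32 -7           ; negative integer constant
    @ratio = global float 1.25       ; decimal notation
    @big   = global double 4.5e+15   ; exponential notation
    @ptr   = global i32* null        ; null pointer constant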

The one non-intuitive notation for constants is the hexadecimal form of
floating point constants. For example, the form
'``double    0x432ff973cafa8000``' is equivalent to (but harder to read
than) '``double 4.5e+15``'. The only time hexadecimal floating point
constants are required (and the only time that they are generated by the
disassembler) is when a floating point constant must be emitted but it
cannot be represented as a decimal floating point number in a reasonable
number of digits. For example, NaNs, infinities, and other special
values are represented in their IEEE hexadecimal format so that assembly
and disassembly do not cause any bits to change in the constants.

When using the hexadecimal form, constants of types half, float, and
double are represented using the 16-digit form shown above (which
matches the IEEE 754 representation for double); half and float values
must, however, be exactly representable as IEEE 754 half and single
precision, respectively. Hexadecimal format is always used for long
double, and there are three forms of long double. The 80-bit format used
by x86 is represented as ``0xK`` followed by 20 hexadecimal digits. The
128-bit format used by PowerPC (two adjacent doubles) is represented by
``0xM`` followed by 32 hexadecimal digits. The IEEE 128-bit format is
represented by ``0xL`` followed by 32 hexadecimal digits; no currently
supported target uses this format. Long doubles will only work if they
match the long double format on your target. The IEEE 16-bit format
(half precision) is represented by ``0xH`` followed by 4 hexadecimal
digits. All hexadecimal formats are big-endian (sign bit at the left).
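
The following sketch shows a few constants written in the hexadecimal
forms described above. The global names are hypothetical. The bit
patterns are the standard IEEE encodings of the values named in the
comments, and the ``x86_fp80`` line assumes a target whose long double
uses the 80-bit x86 format.

.. code-block:: llvm

    @d   = global double 0x432ff973cafa8000        ; same as double 4.5e+15
    @one = global float 0x3ff0000000000000         ; float 1.0, 16-digit form
    @inf = global double 0x7ff0000000000000        ; +infinity
    @h   = global half 0xH3c00                     ; half 1.0
    @ld  = global x86_fp80 0xK3fff8000000000000000 ; 1.0, 80-bit x86 format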

There are no constants of type x86mmx.

Complex Constants
-----------------

Complex constants are a (potentially recursive) combination of simple
constants and smaller complex constants.

**Structure constants**
    Structure constants are represented with notation similar to
    structure type definitions (a comma separated list of elements,
    surrounded by braces (``{}``)). For example:
    "``{ i32 4, float 17.0, i32* @G }``", where "``@G``" is declared as
    "``@G = external global i32``". Structure constants must have
    :ref:`structure type <t_struct>`, and the number and types of elements
    must match those specified by the type.
**Array constants**
    Array constants are represented with notation similar to array type
    definitions (a comma separated list of elements, surrounded by
    square brackets (``[]``)). For example:
    "``[ i32 42, i32 11, i32 74 ]``". Array constants must have
    :ref:`array type <t_array>`, and the number and types of elements must
    match those specified by the type.
**Vector constants**
    Vector constants are represented with notation similar to vector
    type definitions (a comma separated list of elements, surrounded by
    less-than and greater-than signs (``<>``)). For example:
    "``< i32 42, i32 11, i32 74, i32 100 >``". Vector constants
    must have :ref:`vector type <t_vector>`, and the number and types of
    elements must match those specified by the type.
**Zero initialization**
    The string '``zeroinitializer``' can be used to zero-initialize a
    value of *any* type, including scalar and
    :ref:`aggregate <t_aggregate>` types. This is often used to avoid
    having to print large zero initializers (e.g. for large arrays) and
    is always exactly equivalent to using explicit zero initializers.
**Metadata node**
    A metadata node is a structure-like constant with :ref:`metadata
    type <t_metadata>`. For example:
    "``metadata !{ i32 0, metadata !"test" }``". Unlike other
    constants that are meant to be interpreted as part of the
    instruction stream, metadata is a place to attach additional
    information such as debug info.
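
The aggregate forms above can be sketched as global initializers. The
global names other than ``@G`` are hypothetical, and the 1024-element
array is an arbitrary size chosen to show where ``zeroinitializer``
saves space.

.. code-block:: llvm

    @G = external global i32

    ; Structure, array, and vector constants; each element count and
    ; element type matches the declared type.
    @S = global { i32, float, i32* } { i32 4, float 17.0, i32* @G }
    @A = global [3 x i32] [ i32 42, i32 11, i32 74 ]
    @V = global <4 x i32> < i32 42, i32 11, i32 74, i32 100 >

    ; Equivalent to writing out 1024 explicit "i32 0" elements.
    @Z = global [1024 x i32] zeroinitializer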

Global Variable and Function Addresses
--------------------------------------

The addresses of :ref:`global variables <globalvars>` and
:ref:`functions <functionstructure>` are always implicitly valid
(link-time) constants. These constants are explicitly referenced when
the :ref:`identifier for the global <identifiers>` is used and always have
:ref:`pointer <t_pointer>` type. For example, the following is a legal LLVM
file:

.. code-block:: llvm

    @X = global i32 17
    @Y = global i32 42
    @Z = global [2 x i32*] [ i32* @X, i32* @Y ]

.. _undefvalues:

Undefined Values
----------------

The string '``undef``' can be used anywhere a constant is expected, and
indicates that the user of the value may receive an unspecified
bit-pattern. Undefined values may be of any type (other than '``label``'
or '``void``').

Undefined values are useful because they indicate to the compiler that
the program is well defined no matter what value is used. This gives the
compiler more freedom to optimize. Here are some examples of
(potentially surprising) transformations that are valid (in pseudo IR):

.. code-block:: llvm

      %A = add %X, undef
      %B = sub %X, undef
      %C = xor %X, undef
    Safe:
      %A = undef
      %B = undef
      %C = undef

This is safe because all of the output bits are affected by the undef
bits. Any output bit can have a zero or one depending on the input bits.

.. code-block:: llvm

      %A = or %X, undef
      %B = and %X, undef
    Safe:
      %A = -1
      %B = 0
    Unsafe:
      %A = undef
      %B = undef

These logical operations have bits that are not always affected by the
input. For example, if ``%X`` has a zero bit, then the output of the
'``and``' operation will always be a zero for that bit, no matter what
the corresponding bit from the '``undef``' is. As such, it is unsafe to
optimize or assume that the result of the '``and``' is '``undef``'.
However, it is safe to assume that all bits of the '``undef``' could be
0, and optimize the '``and``' to 0. Likewise, it is safe to assume that
all the bits of the '``undef``' operand to the '``or``' could be set,
allowing the '``or``' to be folded to -1.
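
To make the bit-level argument concrete, the sketch below fixes ``%X``
to the constant 15 and uses the ``i8`` type. Both choices, and the
function name, are assumptions made only for illustration.

.. code-block:: llvm

    define i8 @and_with_undef() {
    entry:
      ; 15 is 00001111 in binary. Whatever bits undef supplies, the upper
      ; four bits of %B are zero, so %B is always in the range 0..15.
      ; Folding %B to 0 is safe; folding it to undef is not, because undef
      ; could later be treated as a value such as 16.
      %B = and i8 15, undef
      ret i8 %B
    }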

.. code-block:: llvm

      %A = select undef, %X, %Y
      %B = select undef, 42, %Y
      %C = select %X, %Y, undef
    Safe:
      %A = %X     (or %Y)
      %B = 42     (or %Y)
      %C = %Y
    Unsafe:
      %A = undef
      %B = undef
      %C = undef

This set of examples shows that undefined '``select``' (and conditional
branch) conditions can go *either way*, but they have to come from one
of the two operands. In the ``%A`` example, if ``%X`` and ``%Y`` were
both known to have a clear low bit, then ``%A`` would have to have a
cleared low bit. However, in the ``%C`` example, the optimizer is
allowed to assume that the '``undef``' operand could be the same as
``%Y``, allowing the whole '``select``' to be eliminated.

.. code-block:: llvm

      %A = xor undef, undef

      %B = undef
      %C = xor %B, %B

      %D = undef
      %E = icmp slt %D, 4
      %F = icmp sge %D, 4

    Safe:
      %A = undef
      %B = undef
      %C = undef
      %D = undef
      %E = undef
      %F = undef

This example points out that two '``undef``' operands are not
necessarily the same. This can be surprising to people who assume that
"``X^X``" is always zero, even if ``X`` is undefined (this behavior also
matches C semantics). This isn't true for a number of reasons, but the
short answer is that an '``undef``' "variable" can arbitrarily change
its value over its "live range". This is true because the variable
doesn't actually *have a live range*. Instead, the value is logically
read from arbitrary registers that happen to be around when needed, so
the value is not necessarily consistent over time. In fact, ``%A`` and
``%C`` need to have the same semantics or the core LLVM "replace all
uses with" concept would not hold.

.. code-block:: llvm

      %A = fdiv undef, %X
      %B = fdiv %X, undef
    Safe:
      %A = undef
    b: unreachable

These examples show the crucial difference between an *undefined value*
and *undefined behavior*. An undefined value (like '``undef``') is
allowed to have an arbitrary bit-pattern. This means that the ``%A``
operation can be constant folded to '``undef``', because the '``undef``'
could be an SNaN, and ``fdiv`` is not (currently) defined on SNaNs.
However, in the second example, we can make a more aggressive
assumption: because the ``undef`` is allowed to be an arbitrary value,
we are allowed to assume that it could be zero. Since a divide by zero
has *undefined behavior*, we are allowed to assume that the operation
does not execute at all. This allows us to delete the divide and all
code after it. Because the undefined operation "can't happen", the
optimizer can assume that it occurs in dead code.

.. code-block:: llvm

    a:  store undef -> %X
    b:  store %X -> undef
    Safe:
    a: <deleted>
    b: unreachable

These examples reiterate the ``fdiv`` example: a store *of* an undefined
value can be assumed to not have any effect; we can assume that the
value is overwritten with bits that happen to match what was already
there. However, a store *to* an undefined location could clobber
arbitrary memory; therefore, it has undefined behavior.
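
In concrete syntax, the two stores might be written as in the sketch
below. The function name and parameter names are hypothetical.

.. code-block:: llvm

    define void @stores(i32* %p, i32 %x) {
    entry:
      ; Store *of* an undefined value: may be deleted.
      store i32 undef, i32* %p
      ; Store *to* an undefined location: undefined behavior.
      store i32 %x, i32* undef
      ret void
    }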

.. _poisonvalues:

Poison Values
-------------

Poison values are similar to :ref:`undef values <undefvalues>`; however,
they also represent the fact that an instruction or constant expression
which cannot evoke side effects has nevertheless detected a condition
which results in undefined behavior.

There is currently no way of representing a poison value in the IR; they
only exist when produced by operations such as :ref:`add <i_add>` with
the ``nsw`` flag.
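
For example, in the following sketch (the function name ``@inc`` is
hypothetical), ``%y`` is a poison value whenever the addition overflows
in the signed sense, which happens when ``%x`` is 2147483647.

.. code-block:: llvm

    define i32 @inc(i32 %x) {
    entry:
      ; nsw: "no signed wrap"; signed overflow yields a poison value.
      %y = add nsw i32 %x, 1
      ret i32 %y
    }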

Poison value behavior is defined in terms of value *dependence*:

-  Values other than :ref:`phi <i_phi>` nodes depend on their operands.
-  :ref:`Phi <i_phi>` nodes depend on the operand corresponding to
   their dynamic predecessor basic block.
-  Function arguments depend on the corresponding actual argument values
   in the dynamic callers of their functions.
-  :ref:`Call <i_call>` instructions depend on the :ref:`ret <i_ret>`
   instructions that dynamically transfer control back to them.
-  :ref:`Invoke <i_invoke>` instructions depend on the
   :ref:`ret <i_ret>`, :ref:`resume <i_resume>`, or exception-throwing
   call instructions that dynamically transfer control back to them.
-  Non-volatile loads and stores depend on the most recent stores to all
   of the referenced memory addresses, following the order in the IR
   (including loads and stores implied by intrinsics such as
   :ref:`@llvm.memcpy <int_memcpy>`).
-  An instruction with externally visible side effects depends on the
   most recent preceding instruction with externally visible side
   effects, following the order in the IR. (This includes :ref:`volatile
   operations <volatile>`.)
-  An instruction *control-depends* on a :ref:`terminator