diff --git a/llvm/docs/GlobalISel.rst b/llvm/docs/GlobalISel.rst new file mode 100644 index 0000000000000000000000000000000000000000..fd247a534f68395639ec5fe19b602d9dba1eca4a --- /dev/null +++ b/llvm/docs/GlobalISel.rst @@ -0,0 +1,672 @@ +============================ +Global Instruction Selection +============================ + +.. contents:: + :local: + :depth: 1 + +.. warning:: + This document is a work in progress. It reflects the current state of the + implementation, as well as open design and implementation issues. + +Introduction +============ + +GlobalISel is a framework that provides a set of reusable passes and utilities +for instruction selection --- translation from LLVM IR to target-specific +Machine IR (MIR). + +GlobalISel is intended to be a replacement for SelectionDAG and FastISel, to +solve three major problems: + +* **Performance** --- SelectionDAG introduces a dedicated intermediate + representation, which has a compile-time cost. + + GlobalISel directly operates on the post-isel representation used by the + rest of the code generator, MIR. + It does require extensions to that representation to support arbitrary + incoming IR: :ref:`gmir`. + +* **Granularity** --- SelectionDAG and FastISel operate on individual basic + blocks, losing some global optimization opportunities. + + GlobalISel operates on the whole function. + +* **Modularity** --- SelectionDAG and FastISel are radically different and share + very little code. + + GlobalISel is built in a way that enables code reuse. For instance, both the + optimized and fast selectors share the :ref:`pipeline`, and targets can + configure that pipeline to better suit their needs. + + +.. _gmir: + +Generic Machine IR +================== + +Machine IR operates on physical registers, register classes, and (mostly) +target-specific instructions. + +To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to +Machine IR: + +.. contents:: + :local: + +``NOTE``: +The generic MIR (GMIR) representation still contains references to IR +constructs (such as ``GlobalValue``). Removing those should let us write more +accurate tests, or delete IR after building the initial MIR. However, it is +not part of the GlobalISel effort. + +.. _gmir-instructions: + +Generic Instructions +-------------------- + +The main addition is support for pre-isel generic machine instructions (e.g., +``G_ADD``). Like other target-independent instructions (e.g., ``COPY`` or +``PHI``), these are available on all targets. + +``TODO``: +While we're progressively adding instructions, one kind in particular exposes +interesting problems: compares and how to represent condition codes. +Some targets (x86, ARM) have generic comparisons setting multiple flags, +which are then used by predicated variants. +Others (IR) specify the predicate in the comparison and users just get a single +bit. SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to +represent this. + +The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides +a convenient way to create these generic instructions. + +.. _gmir-gvregs: + +Generic Virtual Registers +------------------------- + +Generic instructions operate on a new kind of register: "generic" virtual +registers. As opposed to non-generic vregs, they are not assigned a Register +Class. Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned +a :ref:`gmir-regbank`. + +``MachineRegisterInfo`` tracks the same information that it does for +non-generic vregs (e.g., use-def chains). Additionally, it also tracks the +:ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``, +its :ref:`gmir-regbank`, if any. + +For simplicity, most generic instructions only accept generic vregs: + +* instead of immediates, they use a gvreg defined by an instruction + materializing the immediate value (see :ref:`irtranslator-constants`). +* instead of physical register, they use a gvreg defined by a ``COPY``. + +``NOTE``: +We started with an alternative representation, where MRI tracks a size for +each gvreg, and instructions have lists of types. +That had two flaws: the type and size are redundant, and there was no generic +way of getting a given operand's type (as there was no 1:1 mapping between +instruction types and operands). +We considered putting the type in some variant of MCInstrDesc instead: +See `PR26576 `_: [GlobalISel] Generic MachineInstrs +need a type but this increases the memory footprint of the related objects + +.. _gmir-regbank: + +Register Bank +------------- + +A Register Bank is a set of register classes defined by the target. +A bank has a size, which is the maximum store size of all covered classes. + +In general, cross-class copies inside a bank are expected to be cheaper than +copies across banks. They are also coalesceable by the register coalescer, +whereas cross-bank copies are not. + +Also, equivalent operations can be performed on different banks using different +instructions. + +For example, X86 can be seen as having 3 main banks: general-purpose, x87, and +vector (which could be further split into a bank per domain for single vs +double precision instructions). + +Register banks are described by a target-provided API, +:ref:`RegisterBankInfo `. + +.. _gmir-llt: + +Low Level Type +-------------- + +Additionally, every generic virtual register has a type, represented by an +instance of the ``LLT`` class. + +Like ``EVT``/``MVT``/``Type``, it has no distinction between unsigned and signed +integer types. Furthermore, it also has no distinction between integer and +floating-point types: it mainly conveys absolutely necessary information, such +as size and number of vector lanes: + +* ``sN`` for scalars +* ``pN`` for pointers +* ```` for vectors +* ``unsized`` for labels, etc.. + +``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG. + +Here are some LLT examples and their ``EVT`` and ``Type`` equivalents: + + ============= ========= ====================================== + LLT EVT IR Type + ============= ========= ====================================== + ``s1`` ``i1`` ``i1`` + ``s8`` ``i8`` ``i8`` + ``s32`` ``i32`` ``i32`` + ``s32`` ``f32`` ``float`` + ``s17`` ``i17`` ``i17`` + ``s16`` N/A ``{i8, i8}`` + ``s32`` N/A ``[4 x i8]`` + ``p0`` ``iPTR`` ``i8*``, ``i32*``, ``%opaque*`` + ``p2`` ``iPTR`` ``i8 addrspace(2)*`` + ``<4 x s32>`` ``v4f32`` ``<4 x float>`` + ``s64`` ``v1f64`` ``<1 x double>`` + ``<3 x s32>`` ``v3i32`` ``<3 x i32>`` + ``unsized`` ``Other`` ``label`` + ============= ========= ====================================== + + +Rationale: instructions already encode a specific interpretation of types +(e.g., ``add`` vs. ``fadd``, or ``sdiv`` vs. ``udiv``). Also encoding that +information in the type system requires introducing bitcast with no real +advantage for the selector. + +Pointer types are distinguished by address space. This matches IR, as opposed +to SelectionDAG where address space is an attribute on operations. +This representation better supports pointers having different sizes depending +on their addressspace. + +``NOTE``: +Currently, LLT requires at least 2 elements in vectors, but some targets have +the concept of a '1-element vector'. Representing them as their underlying +scalar type is a nice simplification. + +``TODO``: +Currently, non-generic virtual registers, defined by non-pre-isel-generic +instructions, cannot have a type, and thus cannot be used by a pre-isel generic +instruction. Instead, they are given a type using a COPY. We could relax that +and allow types on all vregs: this would reduce the number of MI required when +emitting target-specific MIR early in the pipeline. This should purely be +a compile-time optimization. + +.. _pipeline: + +Core Pipeline +============= + +There are four required passes, regardless of the optimization mode: + +.. contents:: + :local: + +Additional passes can then be inserted at higher optimization levels or for +specific targets. For example, to match the current SelectionDAG set of +transformations: MachineCSE and a better MachineCombiner between every pass. + +``NOTE``: +In theory, not all passes are always necessary. +As an additional compile-time optimization, we could skip some of the passes by +setting the relevant MachineFunction properties. For instance, if the +IRTranslator did not encounter any illegal instruction, it would set the +``legalized`` property to avoid running the :ref:`milegalizer`. +Similarly, we considered specializing the IRTranslator per-target to directly +emit target-specific MI. +However, we instead decided to keep the core pipeline simple, and focus on +minimizing the overhead of the passes in the no-op cases. + + +.. _irtranslator: + +IRTranslator +------------ + +This pass translates the input LLVM IR ``Function`` to a GMIR +``MachineFunction``. + +``TODO``: +This currently doesn't support the more complex instructions, in particular +those involving control flow (``switch``, ``invoke``, ...). +For ``switch`` in particular, we can initially use the ``LowerSwitch`` pass. + +.. _api-calllowering: + +API: CallLowering +^^^^^^^^^^^^^^^^^ + +The ``IRTranslator`` (using the ``CallLowering`` target-provided utility) also +implements the ABI's calling convention by lowering calls, returns, and +arguments to the appropriate physical register usage and instruction sequences. + +.. _irtranslator-aggregates: + +Aggregates +^^^^^^^^^^ + +Aggregates are lowered to a single scalar vreg. +This differs from SelectionDAG's multiple vregs via ``GetValueVTs``. + +``TODO``: +As some of the bits are undef (padding), we should consider augmenting the +representation with additional metadata (in effect, caching computeKnownBits +information on vregs). +See `PR26161 `_: [GlobalISel] Value to vreg during +IR to MachineInstr translation for aggregate type + +.. _irtranslator-constants: + +Constant Lowering +^^^^^^^^^^^^^^^^^ + +The ``IRTranslator`` lowers ``Constant`` operands into uses of gvregs defined +by ``G_CONSTANT`` or ``G_FCONSTANT`` instructions. +Currently, these instructions are always emitted in the entry basic block. +In a ``MachineFunction``, each ``Constant`` is materialized by a single gvreg. + +This is beneficial as it allows us to fold constants into immediate operands +during :ref:`instructionselect`, while still avoiding redundant materializations +for expensive non-foldable constants. +However, this can lead to unnecessary spills and reloads in an -O0 pipeline, as +these vregs can have long live ranges. + +``TODO``: +We're investigating better placement of these instructions, in fast and +optimized modes. + + +.. _milegalizer: + +Legalizer +--------- + +This pass transforms the generic machine instructions such that they are legal. + +A legal instruction is defined as: + +* **selectable** --- the target will later be able to select it to a + target-specific (non-generic) instruction. + +* operating on **vregs that can be loaded and stored** -- if necessary, the + target can select a ``G_LOAD``/``G_STORE`` of each gvreg operand. + +As opposed to SelectionDAG, there are no legalization phases. In particular, +'type' and 'operation' legalization are not separate. + +Legalization is iterative, and all state is contained in GMIR. To maintain the +validity of the intermediate code, instructions are introduced: + +* ``G_SEQUENCE`` --- concatenate multiple registers into a single wider + register. + +* ``G_EXTRACT`` --- extract multiple registers (as contiguous sequences of bits) + from a single wider register. + +As they are expected to be temporary byproducts of the legalization process, +they are combined at the end of the :ref:`milegalizer` pass. +If any remain, they are expected to always be selectable, using loads and stores +if necessary. + +.. _api-legalizerinfo: + +API: LegalizerInfo +^^^^^^^^^^^^^^^^^^ + +Currently the API is broadly similar to SelectionDAG/TargetLowering, but +extended in two ways: + +* The set of available actions is wider, avoiding the currently very + overloaded ``Expand`` (which can cover everything from libcalls to + scalarization depending on the node's opcode). + +* Since there's no separate type legalization, independently varying + types on an instruction can have independent actions. For example a + ``G_ICMP`` has 2 independent types: the result and the inputs; we need + to be able to say that comparing 2 s32s is OK, but the s1 result + must be dealt with in another way. + +As such, the primary key when deciding what to do is the ``InstrAspect``, +essentially a tuple consisting of ``(Opcode, TypeIdx, Type)`` and mapping to a +suggested course of action. + +An example use might be: + + .. code-block:: c++ + + // The CPU can't deal with an s1 result, do something about it. + setAction({G_ICMP, 0, s1}, WidenScalar); + // An s32 input (the second type) is fine though. + setAction({G_ICMP, 1, s32}, Legal); + + +``TODO``: +An alternative worth investigating is to generalize the API to represent +actions using ``std::function`` that implements the action, instead of explicit +enum tokens (``Legal``, ``WidenScalar``, ...). + +``TODO``: +Moreover, we could use TableGen to initially infer legality of operation from +existing patterns (as any pattern we can select is by definition legal). +Expanding that to describe legalization actions is a much larger but +potentially useful project. + +.. _milegalizer-scalar-narrow: + +Scalar narrow types +^^^^^^^^^^^^^^^^^^^ + +In the AArch64 port, we currently mark as legal operations on narrow integer +types that have a legal equivalent in a wider type. + +For example, this: + + %2(GPR,s8) = G_ADD %0, %1 + +is selected to a 32-bit instruction: + + %2(GPR32) = ADDWrr %0, %1 + +This avoids unnecessarily legalizing operations that can be seen as legal: +8-bit additions are supported, but happen to have a 32-bit result with the high +24 bits undefined. + +``TODO``: +This has implications regarding vreg classes (as narrow values can now be +represented by wider vregs) and should be investigated further. + +``TODO``: +In particular, s1 comparison results can be represented as wider values in +different ways. +SelectionDAG has the notion of BooleanContents, which allows targets to choose +what true and false are when in a larger register: + +* ``ZeroOrOne`` --- if only 0 and 1 are valid bools, even in a larger register. +* ``ZeroOrMinusOne`` --- if -1 is true (common for vector instructions, + where compares produce -1). +* ``Undefined`` --- if only the low bit is relevant in determining truth. + +.. _milegalizer-non-power-of-2: + +Non-power of 2 types +^^^^^^^^^^^^^^^^^^^^ + +``TODO``: +Types which have a size that isn't a power of 2 aren't currently supported. +The setAction API will probably require changes to support them. +Even notionally explicitly specified operations only make suggestions +like "Widen" or "Narrow". The eventual type is still unspecified and a +search is performed by repeated doubling/halving of the type's +size. +This is incorrect for types that aren't a power of 2. It's reasonable to +expect we could construct an efficient set of side-tables for more general +lookups though, encoding a map from the integers (i.e. the size of the current +type) to types (the legal size). + +.. _milegalizer-vector: + +Vector types +^^^^^^^^^^^^ + +Vectors first get their element type legalized: ```` becomes +```` such that at least one operation is legal with ``sC``. + +This is currently specified by the function ``setScalarInVectorAction``, called +for example as: + + setScalarInVectorAction(G_ICMP, s1, WidenScalar); + +Next the number of elements is chosen so that the entire operation is +legal. This aspect is not controllable at the moment, but probably +should be (you could imagine disagreements on whether a ``<2 x s8>`` +operation should be scalarized or extended to ``<8 x s8>``). + + +.. _regbankselect: + +RegBankSelect +------------- + +This pass constrains the :ref:`gmir-gvregs` operands of generic +instructions to some :ref:`gmir-regbank`. + +It iteratively maps instructions to a set of per-operand bank assignment. +The possible mappings are determined by the target-provided +:ref:`RegisterBankInfo `. +The mapping is then applied, possibly introducing ``COPY`` instructions if +necessary. + +It traverses the ``MachineFunction`` top down so that all operands are already +mapped when analyzing an instruction. + +This pass could also remap target-specific instructions when beneficial. +In the future, this could replace the ExeDepsFix pass, as we can directly +select the best variant for an instruction that's available on multiple banks. + +.. _api-registerbankinfo: + +API: RegisterBankInfo +^^^^^^^^^^^^^^^^^^^^^ + +The ``RegisterBankInfo`` class describes multiple aspects of register banks. + +* **Banks**: ``addRegBankCoverage`` --- which register bank covers each + register class. + +* **Cross-Bank Copies**: ``copyCost`` --- the cost of a ``COPY`` from one bank + to another. + +* **Default Mapping**: ``getInstrMapping`` --- the default bank assignments for + a given instruction. + +* **Alternative Mapping**: ``getInstrAlternativeMapping`` --- the other + possible bank assignments for a given instruction. + +``TODO``: +All this information should eventually be static and generated by TableGen, +mostly using existing information augmented by bank descriptions. + +``TODO``: +``getInstrMapping`` is currently separate from ``getInstrAlternativeMapping`` +because the latter is more expensive: as we move to static mapping info, +both methods should be free, and we should merge them. + +.. _regbankselect-modes: + +RegBankSelect Modes +^^^^^^^^^^^^^^^^^^^ + +``RegBankSelect`` currently has two modes: + +* **Fast** --- For each instruction, pick a target-provided "default" bank + assignment. This is the default at -O0. + +* **Greedy** --- For each instruction, pick the cheapest of several + target-provided bank assignment alternatives. + +We intend to eventually introduce an additional optimizing mode: + +* **Global** --- Across multiple instructions, pick the cheapest combination of + bank assignments. + +``NOTE``: +On AArch64, we are considering using the Greedy mode even at -O0 (or perhaps at +backend -O1): because :ref:`gmir-llt` doesn't distinguish floating point from +integer scalars, the default assignment for loads and stores is the integer +bank, introducing cross-bank copies on most floating point operations. + + +.. _instructionselect: + +InstructionSelect +----------------- + +This pass transforms generic machine instructions into equivalent +target-specific instructions. It traverses the ``MachineFunction`` bottom-up, +selecting uses before definitions, enabling trivial dead code elimination. + +.. _api-instructionselector: + +API: InstructionSelector +^^^^^^^^^^^^^^^^^^^^^^^^ + +The target implements the ``InstructionSelector`` class, containing the +target-specific selection logic proper. + +The instance is provided by the subtarget, so that it can specialize the +selector by subtarget feature (with, e.g., a vector selector overriding parts +of a general-purpose common selector). +We might also want to parameterize it by MachineFunction, to enable selector +variants based on function attributes like optsize. + +The simple API consists of: + + .. code-block:: c++ + + virtual bool select(MachineInstr &MI) + +This target-provided method is responsible for mutating (or replacing) a +possibly-generic MI into a fully target-specific equivalent. +It is also responsible for doing the necessary constraining of gvregs into the +appropriate register classes. + +The ``InstructionSelector`` can fold other instructions into the selected MI, +by walking the use-def chain of the vreg operands. +As GlobalISel is Global, this folding can occur across basic blocks. + +``TODO``: +Currently, the Select pass is implemented with hand-written c++, similar to +FastISel, rather than backed by tblgen'erated pattern-matching. +We intend to eventually reuse SelectionDAG patterns. + + +.. _maintainability: + +Maintainability +=============== + +.. _maintainability-iterative: + +Iterative Transformations +------------------------- + +Passes are split into small, iterative transformations, with all state +represented in the MIR. + +This differs from SelectionDAG (in particular, the legalizer) using various +in-memory side-tables. + + +.. _maintainability-mir: + +MIR Serialization +----------------- + +.. FIXME: Update the MIRLangRef to include GMI additions. + +:ref:`gmir` is serializable (see :doc:`MIRLangRef`). +Combined with :ref:`maintainability-iterative`, this enables much finer-grained +testing, rather than requiring large and fragile IR-to-assembly tests. + +The current "stage" in the :ref:`pipeline` is represented by a set of +``MachineFunctionProperties``: + +* ``legalized`` +* ``regBankSelected`` +* ``selected`` + + +.. _maintainability-verifier: + +MachineVerifier +--------------- + +The pass approach lets us use the ``MachineVerifier`` to enforce invariants. +For instance, a ``regBankSelected`` function may not have gvregs without +a bank. + +``TODO``: +The ``MachineVerifier`` being monolithic, some of the checks we want to do +can't be integrated to it: GlobalISel is a separate library, so we can't +directly reference it from CodeGen. For instance, legality checks are +currently done in RegBankSelect/InstructionSelect proper. We could #ifdef out +the checks, or we could add some sort of verifier API. + + +.. _progress: + +Progress and Future Work +======================== + +The initial goal is to replace FastISel on AArch64. The next step will be to +replace SelectionDAG as the optimized ISel. + +``NOTE``: +While we iterate on GlobalISel, we strive to avoid affecting the performance of +SelectionDAG, FastISel, or the other MIR passes. For instance, the types of +:ref:`gmir-gvregs` are stored in a separate table in ``MachineRegisterInfo``, +that is destroyed after :ref:`instructionselect`. + +.. _progress-fastisel: + +FastISel Replacement +-------------------- + +For the initial FastISel replacement, we intend to fallback to SelectionDAG on +selection failures. + +Currently, compile-time of the fast pipeline is within 1.5x of FastISel. +We're optimistic we can get to within 1.1/1.2x, but beating FastISel will be +challenging given the multi-pass approach. +Still, supporting all IR (via a complete legalizer) and avoiding the fallback +to SelectionDAG in the worst case should enable better amortized performance +than SelectionDAG+FastISel. + +``NOTE``: +We considered never having a fallback to SelectionDAG, instead deciding early +whether a given function is supported by GlobalISel or not. The decision would +be based on :ref:`milegalizer` queries. +We abandoned that for two reasons: +a) on IR inputs, we'd need to basically simulate the :ref:`irtranslator`; +b) to be robust against unforeseen failures and to enable iterative +improvements. + +.. _progress-targets: + +Support For Other Targets +------------------------- + +In parallel, we're investigating adding support for other - ideally quite +different - targets. For instance, there is some initial AMDGPU support. + + +.. _porting: + +Porting GlobalISel to A New Target +================================== + +There are four major classes to implement by the target: + +* :ref:`CallLowering ` --- lower calls, returns, and arguments + according to the ABI. +* :ref:`RegisterBankInfo ` --- describe + :ref:`gmir-regbank` coverage, cross-bank copy cost, and the mapping of + operands onto banks for each instruction. +* :ref:`LegalizerInfo ` --- describe what is legal, and how + to legalize what isn't. +* :ref:`InstructionSelector ` --- select generic MIR + to target-specific MIR. + +Additionally: + +* ``TargetPassConfig`` --- create the passes constituting the pipeline, + including additional passes not included in the :ref:`pipeline`. +* ``GISelAccessor`` --- setup the various subtarget-provided classes, with a + graceful fallback to no-op when GlobalISel isn't enabled. diff --git a/llvm/docs/index.rst b/llvm/docs/index.rst index e24d795946e7c20b4c75e87514824f3fdf362ca6..29f2bd8955a7279b15a6de048d124a699e7c7349 100644 --- a/llvm/docs/index.rst +++ b/llvm/docs/index.rst @@ -272,6 +272,7 @@ For API clients and LLVM developers. FaultMaps MIRLangRef Coroutines + GlobalISel :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. @@ -390,6 +391,9 @@ For API clients and LLVM developers. :doc:`Coroutines` LLVM support for coroutines. +:doc:`GlobalISel` + This describes the prototype instruction selection replacement, GlobalISel. + Development Process Documentation =================================