Commits · 7dee697faa5361fc953909b8d4547f2d08af5cab · Roger Ferrer / llvm-epi-0.8

Apr 25, 2013

This patch adds the X86FixupLEAs pass, which will reduce instruction · 8b7ab4ba

Preston Gurd authored Apr 25, 2013

latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.

llvm-svn: 180573

8b7ab4ba

Jan 08, 2013

Pad Short Functions for Intel Atom · a01daace

Preston Gurd authored Jan 08, 2013

The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.
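
As a rough illustration of the rule described above (not the pass itself), the padding decision can be modeled as a threshold check over the cycles spent on the path from function entry to the return. The name `paddingCycles`, the constant `kThreshold`, and the per-instruction cycle counts are assumptions made for this sketch; the real pass works on MachineBasicBlocks and the target's scheduling data.

```cpp
#include <vector>

// Threshold from the commit message: pad when fewer than four cycles
// elapse between function entry and return. The constant name is hypothetical.
constexpr unsigned kThreshold = 4;

// Returns how many cycles of padding a return would need, given the assumed
// cycle cost of each instruction on the path from entry to the ret.
unsigned paddingCycles(const std::vector<unsigned> &pathCycles) {
  unsigned total = 0;
  for (unsigned c : pathCycles) {
    total += c;
    if (total >= kThreshold)
      return 0; // The path is already long enough; no padding needed.
  }
  return kThreshold - total; // Cycles still missing before the ret.
}
```

In this toy model an empty function needs four cycles of padding, a path costing two cycles needs two more, and any path of four or more cycles is left untouched.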

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879

a01daace

Jan 07, 2013

Switch TargetTransformInfo from an immutable analysis pass that requires · 664e354d

Chandler Carruth authored Jan 07, 2013

a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.

The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
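
The two tricks described above can be sketched in a few lines of standalone C++. This is a toy model only: `CostInfo`, `NoCostInfo`, `FancyCostInfo`, `pushOnto`, `addCost` and `mulCost` are hypothetical names invented for illustration, and none of them are LLVM's actual interfaces.

```cpp
// Toy stand-in for an analysis-group interface (all names hypothetical).
class CostInfo {
public:
  virtual ~CostInfo() = default;

  // Stack this pass on top of `prev` and tell every pass below it
  // which pass is now the top-most one.
  void pushOnto(CostInfo *prev) {
    Prev = prev;
    for (CostInfo *p = this; p; p = p->Prev)
      p->Top = this;
  }

  // Default behaviour: delegate to the previous pass in the chain.
  virtual unsigned addCost() const { return Prev->addCost(); }
  virtual unsigned mulCost() const { return Prev->mulCost(); }

protected:
  CostInfo *Prev = nullptr; // Next-lower pass in the stack.
  CostInfo *Top = this;     // Top-most initialized pass.
};

// The "NoFoo"-style bottom of the stack must implement every query.
class NoCostInfo : public CostInfo {
public:
  unsigned addCost() const override { return 1; }
  // One API implemented in terms of another. Asking the top of the stack
  // means a more precise addCost() above improves this answer too.
  unsigned mulCost() const override { return 3 * Top->addCost(); }
};

// A layered pass that overrides only the part it knows about.
class FancyCostInfo : public CostInfo {
public:
  unsigned addCost() const override { return 2; }
};
```

With a `FancyCostInfo` pushed onto a `NoCostInfo`, a call to `mulCost()` on the top pass falls through to the bottom pass, which in turn consults the top pass for `addCost()` and so returns 6 rather than 3.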

The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.

The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.

The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.

The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.

Finally, I'm very aware that many of the comments and much of the
documentation need to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

llvm-svn: 171681

664e354d

Jan 05, 2013

Revert revision 171524. Original message: · 478b6a47

Nadav Rotem authored Jan 05, 2013

URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603

478b6a47

Jan 04, 2013

The current Intel Atom microarchitecture has a feature whereby when a function · e36b685a

Preston Gurd authored Jan 04, 2013

returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524

e36b685a

Nov 26, 2012

Remove the X86 Maximal Stack Alignment Check pass as it is no longer necessary. · 4179e3f5

Chad Rosier authored Nov 26, 2012

This pass was conservative in that it always reserved the FP to enable dynamic
stack realignment, which allowed the RA to use aligned spills for vector
registers.  This happened even when spills were not necessary.  The RA has
since been improved to use unaligned spills when necessary.

The new behavior is to realign the stack if the frame pointer was already
reserved for some other reason, but don't reserve the frame pointer just
because a function contains vector virtual registers.

Part of rdar://12719844

llvm-svn: 168627

4179e3f5

Aug 01, 2012
- Whitespace. · 24c19d20
  Chad Rosier authored Aug 01, 2012
```
llvm-svn: 161122
```
  24c19d20
Jun 01, 2012

Implement the local-dynamic TLS model for x86 (PR3985) · 789acfb6

Hans Wennborg authored Jun 01, 2012

This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic accesses on
an execution path.
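
For context, the kind of source code that benefits looks roughly like the sketch below: two thread-local variables from the same module accessed on one execution path, so a single lookup of the TLS block's base address can serve both. The variable and function names are made up for illustration. The `tls_model` attribute is a GCC/Clang extension and only a request, so the toolchain may still choose or relax to a different model.

```cpp
// Two thread-local variables defined in the same module. The attribute asks
// for the local-dynamic model; the toolchain is free to pick a cheaper one.
static __thread int counterA __attribute__((tls_model("local-dynamic")));
static __thread int counterB __attribute__((tls_model("local-dynamic")));

// Both accesses sit on the same execution path, so one base-address
// computation for the TLS block could be shared between them.
int bumpBoth() {
  counterA += 1;
  counterB += 2;
  return counterA + counterB;
}
```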

llvm-svn: 157818

789acfb6

Mar 17, 2012
- Reorder includes in Target backends to follow coding standards. Remove some... · b25fda95
  Craig Topper authored Mar 17, 2012
```
Reorder includes in Target backends to follow coding standards. Remove some superfluous forward declarations.

llvm-svn: 152997
```
  b25fda95
Sep 28, 2011

Remove X86-dependent stuff from SSEDomainFix. · 30c81124

Jakob Stoklund Olesen authored Sep 27, 2011

This also enables domain swizzling for AVX code which required a few
trivial test changes.

The pass will be moved to lib/CodeGen shortly.

llvm-svn: 140659

30c81124

Aug 23, 2011

Introduce a pass to insert vzeroupper instructions to avoid AVX to · 2a3ffb5d

Bruno Cardoso Lopes authored Aug 23, 2011

SSE transition penalty. The pass is enabled through the "x86-use-vzeroupper"
llc command line option. This is only the first step (a very naive and
conservative one) to sketch out the idea; proper DFA is coming next
to allow smarter decisions. Comments and ideas, now and in further commits,
will be much appreciated.

llvm-svn: 138317

2a3ffb5d

Jul 25, 2011
- Code clean up. · f5bf1953
  Evan Cheng authored Jul 25, 2011
```
llvm-svn: 135954
```
  f5bf1953
- More refactoring. · b2531009
  Evan Cheng authored Jul 25, 2011
```
llvm-svn: 135939
```
  b2531009
- Refactor X86 target to separate MC code from Target code. · 7e763d86
  Evan Cheng authored Jul 25, 2011
```
llvm-svn: 135930
```
  7e763d86
Jul 11, 2011

- Eliminate MCCodeEmitter's dependency on TargetMachine. It now uses MCInstrInfo · c5e6d2f5

Evan Cheng authored Jul 11, 2011

  and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
  detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
  MCSubtargetInfo layer can be shared with other modules.
- These changes fix .code 16 / .code 32 support since mode switch is updated in
  MCSubtargetInfo so MC code emitter can do the right thing.

llvm-svn: 134884

c5e6d2f5

Jul 07, 2011
- Rename files for consistency. · 3ddfbd32
  Evan Cheng authored Jul 06, 2011
```
llvm-svn: 134546
```
  3ddfbd32
Jun 28, 2011
- Merge XXXGenRegisterNames.inc into XXXGenRegisterInfo.inc · 1e210d08
  Evan Cheng authored Jun 28, 2011
```
llvm-svn: 134024
```
  1e210d08
Jun 25, 2011
- Rename TargetDesc to MCTargetDesc · 3b960aca
  Evan Cheng authored Jun 24, 2011
```
llvm-svn: 133846
```
  3b960aca
Jun 24, 2011

Starting to refactor Target to separate out code that's needed to fully describe · 24753317

Evan Cheng authored Jun 24, 2011

target machine from code that is only needed by codegen. The goal is to
sink the essential target description into MC layer so we can start building
MC based tools without needing to link in the entire codegen.

First step is to refactor TargetRegisterInfo. This patch added a base class
MCRegisterInfo which TargetRegisterInfo is derived from. Changed TableGen to
separate register description from the rest of the stuff.

llvm-svn: 133782

24753317

Dec 20, 2010
- Add header... · ca2511d8
  Daniel Dunbar authored Dec 20, 2010
```
llvm-svn: 122247
```
  ca2511d8
- X86/MC/Mach-O: Split out createX86MachObjectWriter(). · 7da045e5
  Daniel Dunbar authored Dec 20, 2010
```
llvm-svn: 122246
```
  7da045e5
Jul 16, 2010
- Remove the X86::FP_REG_KILL pseudo-instruction and the X86FloatingPointRegKill · c30b4ddc
  Jakob Stoklund Olesen authored Jul 16, 2010
```
pass that inserted it.

It is no longer necessary to limit the live ranges of FP registers to a single
basic block.

llvm-svn: 108536
```
  c30b4ddc
Jul 10, 2010

Reapply bottom-up fast-isel, with several fixes for x86-32: · d7b5ce33

Dan Gohman authored Jul 10, 2010

 - Check getBytesToPopOnReturn().
 - Eschew ST0 and ST1 for return values.
 - Fix the PIC base register initialization so that it doesn't ever
   fail to end up at the top of the entry block.

llvm-svn: 108039

d7b5ce33

Apr 06, 2010

Fix PR6696 and PR6663 · 4dac8906

Jim Grosbach authored Apr 06, 2010

When a frame pointer is not otherwise required, and dynamic stack alignment
is necessary solely due to the spilling of a register with larger alignment
requirements than the default stack alignment, the frame pointer can be both
used as a general purpose register and a frame pointer. That goes poorly, for
obvious reasons. This patch brings back a bit of old logic for identifying
the use of such registers and conservatively reserves the frame pointer
during register allocation in such cases.

For now, implement for X86 only since it's 32-bit Linux which is hitting this,
and we want a targeted fix for 2.7. As a follow-on, this will be expanded
to handle other targets, as theoretically the problem could arise elsewhere
as well.

llvm-svn: 100559

4dac8906

Mar 25, 2010

Add a late SSEDomainFix pass that twiddles SSE instructions to avoid domain crossings. · 49e121d5

Jakob Stoklund Olesen authored Mar 25, 2010

On Nehalem and newer CPUs there is a 2-cycle latency penalty on using a register
in a different domain than where it was defined. Some instructions have
equivalents for different domains, like por/orps/orpd.

The SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equivalent opcodes where possible.
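
The opcode-swapping idea can be pictured with a small standalone model. This is only a sketch: the enums, the `kOrFamily` table and `pickForDomain` are hypothetical names, and the table covers just the por/orps/orpd family named above, with the assumption that por belongs to the integer domain, orps to packed single and orpd to packed double.

```cpp
#include <array>

// Execution domains and a handful of opcodes (illustrative only).
enum Domain { Int = 0, PS = 1, PD = 2 };
enum Opcode { POR, ORPS, ORPD, ADDPS };

// Equivalent opcodes, indexed by the domain they execute in.
constexpr std::array<Opcode, 3> kOrFamily = {POR, ORPS, ORPD};

// If `op` has equivalents in other domains, return the variant that matches
// the domain its operand was defined in; otherwise leave it unchanged.
Opcode pickForDomain(Opcode op, Domain operandDomain) {
  for (Opcode candidate : kOrFamily)
    if (candidate == op)
      return kOrFamily[operandDomain];
  return op; // No known equivalent, so a crossing cannot be avoided here.
}
```

Under these assumptions a `por` whose input was produced in the packed-single domain would be rewritten to `orps`, while `addps`, which has no entry in the table, is left as it is.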

This is a work in progress, in particular the pass doesn't do anything yet. SSE
instructions are tagged with their execution domain in TableGen using the last
two bits of TSFlags. Note that not all instructions are tagged correctly. Life
just isn't that simple.

The SSE execution domain issue is very similar to the ARM NEON/VFP pipeline
issue handled by NEONMoveFixPass. This pass may become target independent to
handle both.

llvm-svn: 99524

49e121d5

Mar 24, 2010
- Revert "Add a late SSEDomainFix pass that twiddles SSE instructions to avoid domain crossings." · a86ccbfe
  Jakob Stoklund Olesen authored Mar 23, 2010
```
This reverts commit 99345. It was breaking buildbots.

llvm-svn: 99352
```
  a86ccbfe
- Add a late SSEDomainFix pass that twiddles SSE instructions to avoid domain crossings. · 31da45b7
  Jakob Stoklund Olesen authored Mar 23, 2010
```
This is work in progress. So far, SSE execution domain tables are added to
X86InstrInfo, and a skeleton pass is enabled with -sse-domain-fix.

llvm-svn: 99345
```
  31da45b7
Mar 11, 2010
- MC: Provide the target triple to AsmBackend constructors. · 245f5b28
  Daniel Dunbar authored Mar 11, 2010
```
llvm-svn: 98220
```
  245f5b28
Feb 21, 2010
- MC/X86: Add stub AsmBackend. · 40eb7f09
  Daniel Dunbar authored Feb 21, 2010
```
llvm-svn: 96763
```
  40eb7f09
Feb 13, 2010
- rip out the 'heinous' x86 MCCodeEmitter implementation. · 509154e0
  Chris Lattner authored Feb 13, 2010
```
We still have the templated X86 JIT emitter, *and* the
almost-copy in X86InstrInfo for getting instruction sizes.

llvm-svn: 96059
```
  509154e0
- give MCCodeEmitters access to the current MCContext. · 741580a5
  Chris Lattner authored Feb 12, 2010
```
llvm-svn: 96038
```
  741580a5
Feb 05, 2010
- wire up 64-bit MCCodeEmitter. · 9c9453e5
  Chris Lattner authored Feb 05, 2010
```
llvm-svn: 95438
```
  9c9453e5
Feb 03, 2010
- stub out a new X86 encoder, which can be tried with · f914be06
  Chris Lattner authored Feb 03, 2010
```
-enable-new-x86-encoder until it's stable.

llvm-svn: 95256
```
  f914be06
- rename createX86MCCodeEmitter to more accurately reflect what it creates. · 2f750f3b
  Chris Lattner authored Feb 03, 2010
```
llvm-svn: 95254
```
  2f750f3b
Feb 02, 2010
- remove dead code. · 6a613783
  Chris Lattner authored Feb 02, 2010
```
llvm-svn: 95144
```
  6a613783
Dec 02, 2009
- Factor the stack alignment calculations out into a target independent pass. · 2c3a6c65
  Jim Grosbach authored Dec 02, 2009
```
No functionality change.

llvm-svn: 90336
```
  2c3a6c65
Aug 27, 2009

llvm-mc/X86: Implement single instruction encoding interface for MC. · 981a71c3

Daniel Dunbar authored Aug 27, 2009

 - Note, this is a gigantic hack, with the sole purpose of unblocking further
   work on the assembler (it's also possible to test the matcher more completely
   now).

 - Despite being a hack, it's actually good enough to work over all of 403.gcc
   (although some encodings are probably incorrect). This is a testament to the
   beauty of X86's MachineInstr, no doubt! ;)

llvm-svn: 80234

981a71c3

Jul 25, 2009
- Add new helpers for registering targets. · 5680b4f2
  Daniel Dunbar authored Jul 25, 2009
```
 - Less boilerplate == good.

llvm-svn: 77052
```
  5680b4f2
Jul 19, 2009
- Put Target definitions inside Target specific header, and llvm namespace. · 67038c13
  Daniel Dunbar authored Jul 18, 2009
```
llvm-svn: 76344
```
  67038c13
Jul 15, 2009

Reapply TargetRegistry refactoring commits. · e833810a

Daniel Dunbar authored Jul 15, 2009

--- Reverse-merging r75799 into '.':
 U   test/Analysis/PointerTracking
U    include/llvm/Target/TargetMachineRegistry.h
U    include/llvm/Target/TargetMachine.h
U    include/llvm/Target/TargetRegistry.h
U    include/llvm/Target/TargetSelect.h
U    tools/lto/LTOCodeGenerator.cpp
U    tools/lto/LTOModule.cpp
U    tools/llc/llc.cpp
U    lib/Target/PowerPC/PPCTargetMachine.h
U    lib/Target/PowerPC/AsmPrinter/PPCAsmPrinter.cpp
U    lib/Target/PowerPC/PPCTargetMachine.cpp
U    lib/Target/PowerPC/PPC.h
U    lib/Target/ARM/ARMTargetMachine.cpp
U    lib/Target/ARM/AsmPrinter/ARMAsmPrinter.cpp
U    lib/Target/ARM/ARMTargetMachine.h
U    lib/Target/ARM/ARM.h
U    lib/Target/XCore/XCoreTargetMachine.cpp
U    lib/Target/XCore/XCoreTargetMachine.h
U    lib/Target/PIC16/PIC16TargetMachine.cpp
U    lib/Target/PIC16/PIC16TargetMachine.h
U    lib/Target/Alpha/AsmPrinter/AlphaAsmPrinter.cpp
U    lib/Target/Alpha/AlphaTargetMachine.cpp
U    lib/Target/Alpha/AlphaTargetMachine.h
U    lib/Target/X86/X86TargetMachine.h
U    lib/Target/X86/X86.h
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86AsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86TargetMachine.cpp
U    lib/Target/MSP430/MSP430TargetMachine.cpp
U    lib/Target/MSP430/MSP430TargetMachine.h
U    lib/Target/CppBackend/CPPTargetMachine.h
U    lib/Target/CppBackend/CPPBackend.cpp
U    lib/Target/CBackend/CTargetMachine.h
U    lib/Target/CBackend/CBackend.cpp
U    lib/Target/TargetMachine.cpp
U    lib/Target/IA64/IA64TargetMachine.cpp
U    lib/Target/IA64/AsmPrinter/IA64AsmPrinter.cpp
U    lib/Target/IA64/IA64TargetMachine.h
U    lib/Target/IA64/IA64.h
U    lib/Target/MSIL/MSILWriter.cpp
U    lib/Target/CellSPU/SPUTargetMachine.h
U    lib/Target/CellSPU/SPU.h
U    lib/Target/CellSPU/AsmPrinter/SPUAsmPrinter.cpp
U    lib/Target/CellSPU/SPUTargetMachine.cpp
U    lib/Target/Mips/AsmPrinter/MipsAsmPrinter.cpp
U    lib/Target/Mips/MipsTargetMachine.cpp
U    lib/Target/Mips/MipsTargetMachine.h
U    lib/Target/Mips/Mips.h
U    lib/Target/Sparc/AsmPrinter/SparcAsmPrinter.cpp
U    lib/Target/Sparc/SparcTargetMachine.cpp
U    lib/Target/Sparc/SparcTargetMachine.h
U    lib/ExecutionEngine/JIT/TargetSelect.cpp
U    lib/Support/TargetRegistry.cpp

llvm-svn: 75820

e833810a