Commits · 3854bad4a1936dcf481926dba0e2bd28a3573843 · Roger Ferrer / llvm-epi-0.8

Mar 24, 2011

Andrew Trick authored Mar 23, 2011

I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.

llvm-svn: 128181

4ab9a165

Mar 23, 2011
- Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS. · 4046a0de
  Andrew Trick authored Mar 23, 2011
```
(target-specific branchless method for double-width relational comparisons on x86)

llvm-svn: 128175
```
  4046a0de
Mar 21, 2011

Re-apply r127953 with fixes: eliminate empty return block if it has no... · 0663f23b

Evan Cheng authored Mar 21, 2011

Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified.

llvm-svn: 127981

0663f23b

Mar 19, 2011

Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors · 327cd36f
Daniel Dunbar authored Mar 19, 2011
```
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
```
327cd36f

SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR · 824a7113

Evan Cheng authored Mar 19, 2011

to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953

824a7113

Add support for legalizing UINT_TO_FP of vectors on platforms which do · e7a101cc

Nadav Rotem authored Mar 19, 2011

not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.

llvm-svn: 127951

e7a101cc
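
Note on r127951: the message says the legalized code uses two vector INT_TO_FP operations but does not spell out how the input is split. A minimal scalar sketch of one way to do it, assuming each 32-bit lane is split into 16-bit halves that are converted with a signed conversion and recombined; `uint_to_fp_lanes` is a hypothetical name, not an LLVM function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not LLVM code: convert n unsigned 32-bit lanes to
 * float using only signed int-to-float conversions, one per 16-bit half. */
void uint_to_fp_lanes(const uint32_t *in, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        int32_t hi = (int32_t)(in[i] >> 16);     /* 0..65535, never negative */
        int32_t lo = (int32_t)(in[i] & 0xFFFFu); /* 0..65535, never negative */
        /* Both conversions and the scaling by 2^16 are exact in float;
         * only the final addition can round. */
        out[i] = (float)hi * 65536.0f + (float)lo;
    }
}
```

Because both halves are non-negative and small, the signed conversions never see a value with the sign bit set, which is the case a plain signed INT_TO_FP would get wrong for inputs of 2^31 and above.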

Mar 18, 2011

Revert r127852; it's apparently causing an ICE on mingw. · 59721e32
Eli Friedman authored Mar 18, 2011
```
llvm-svn: 127909
```
59721e32

Add a target-specific branchless method for double-width relational · 1a916a3c

Eli Friedman authored Mar 18, 2011

comparisons on x86.  Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.

This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.

llvm-svn: 127852

1a916a3c
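
Note on r127852: the claim that SUB+SBB sets the relevant flags the same way a double-width CMP would can be checked with a small flag model. The sketch below assumes a 64-bit comparison built from 32-bit halves; `flags_t`, `sub_sbb`, `ult64`, `slt64` and `check_pair` are hypothetical names for illustration, not code from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags left by the final SBB; field names mirror the x86 EFLAGS bits. */
typedef struct { bool cf, sf, of; } flags_t;

/* Hypothetical model: SUB on the low halves, then SBB on the high halves. */
static flags_t sub_sbb(uint64_t a, uint64_t b) {
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);

    bool borrow = alo < blo;                      /* CF after SUB lo */
    uint32_t res = ahi - bhi - (uint32_t)borrow;  /* result of SBB hi */

    flags_t f;
    f.cf = (uint64_t)ahi < (uint64_t)bhi + borrow; /* borrow out of hi */
    f.sf = (res >> 31) != 0;                       /* sign of hi result */
    /* Signed overflow: operand signs differ and the result sign
     * differs from the sign of the minuend. */
    f.of = (((ahi ^ bhi) & (ahi ^ res)) >> 31) != 0;
    return f;
}

/* Unsigned less-than reads CF, as after a CMP. */
bool ult64(uint64_t a, uint64_t b) { return sub_sbb(a, b).cf; }

/* Signed less-than reads SF != OF, as after a CMP. */
bool slt64(int64_t a, int64_t b) {
    flags_t f = sub_sbb((uint64_t)a, (uint64_t)b);
    return f.sf != f.of;
}

/* Cross-check the model against the compiler's native 64-bit compares. */
void check_pair(int64_t a, int64_t b) {
    assert(slt64(a, b) == (a < b));
    assert(ult64((uint64_t)a, (uint64_t)b) == ((uint64_t)a < (uint64_t)b));
}
```

The model deliberately leaves out ZF: after the SBB it reflects only the high half, so this sketch covers the less-than style conditions and not equality.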

Mar 17, 2011
- Move more logic into getTypeForExtArgOrReturn. · 2ef0c69d
  Cameron Zwarich authored Mar 17, 2011
```
llvm-svn: 127809
```
  2ef0c69d
- Rename getTypeForExtendedInteger() to getTypeForExtArgOrReturn(). · 34e7b3f7
  Cameron Zwarich authored Mar 17, 2011
```
llvm-svn: 127807
```
  34e7b3f7
Mar 16, 2011

The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte · ac106273

Cameron Zwarich authored Mar 16, 2011

rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.

This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.

llvm-svn: 127766

ac106273

Mar 11, 2011

Change the x86 32-bit scheduler to register pressure and fix up the · cf56a503

Eric Christopher authored Mar 11, 2011

corresponding testcases back to the previous versions.

Fixes some performance regressions only seen on 32-bit.

llvm-svn: 127441

cf56a503

Mar 10, 2011
- Revert 127359; it broke lencod. · d17ae4e9
  Stuart Hastings authored Mar 10, 2011
```
llvm-svn: 127382
```
  d17ae4e9
Mar 09, 2011
- X86 byval copies no longer always_inline. <rdar://problem/8706628> · 9955e2f9
  Stuart Hastings authored Mar 09, 2011
```
llvm-svn: 127359
```
  9955e2f9
- Target/X86: Tweak va_arg for Win64 not to miss taking va_start when number of fixed args > 4. · 58d1f93b
  NAKAMURA Takumi authored Mar 09, 2011
```
llvm-svn: 127328
```
  58d1f93b
Mar 08, 2011

X86: Fix the (saddo/ssub x, 1) -> incl/decl selection to check the right operand for 1. · 679cfb54
Benjamin Kramer authored Mar 08, 2011
```
Found by inspection.

llvm-svn: 127247
```
679cfb54

Turn on list-ilp scheduling by default on x86 and x86-64, fix up · eb19e9e9

Eric Christopher authored Mar 08, 2011

testcases accordingly. Some are currently xfailed and will be filed
as bugs to be fixed or understood.

Performance results:

roughly neutral on SPEC
some micro benchmarks in the llvm suite are up between 100 and 150%, only
a pair of regressions that are due to be investigated

john-the-ripper saw:
10% improvement in traditional DES
8% improvement in BSDI DES
59% improvement in FreeBSD MD5
67% improvement in OpenBSD Blowfish
14% improvement in LM DES

Small compile time impact.

llvm-svn: 127208

eb19e9e9

Mar 07, 2011
- Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo. · df616944
  Cameron Zwarich authored Mar 07, 2011
```
llvm-svn: 127175
```
  df616944
Mar 05, 2011

Increased the register pressure limit on x86_64 from 8 to 12 · 641e2d4f

Andrew Trick authored Mar 05, 2011

regs. This is the only change in this checkin that may affect the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.

Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.

Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.

llvm-svn: 127067

641e2d4f

Mar 02, 2011

[AVX] Fix mask predicates for 256-bit UNPCKLPS/D and implement · dd567b21

David Greene authored Mar 02, 2011

missing patterns for them.

Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.

llvm-svn: 126845

dd567b21

Feb 28, 2011

[AVX] Add decode support for VUNPCKLPS/D instructions, both 128-bit · 20a1cbef

David Greene authored Feb 28, 2011

and 256-bit forms.  Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.

llvm-svn: 126664

20a1cbef
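
Note on r126664: the reason the element count alone is not enough shows up as soon as the shuffle mask is written out. A minimal sketch, assuming the usual AVX behavior of interleaving within each 128-bit lane; `decode_unpckl_mask` is a hypothetical stand-in, not the LLVM decode routine:

```c
#include <assert.h>

/* Hypothetical decoder, not the LLVM routine: fill mask[] with the shuffle
 * indices of UNPCKL for a vector of num_elts elements, each elt_bits wide.
 * Indices 0..num_elts-1 select from the first source and
 * num_elts..2*num_elts-1 from the second. mask must hold num_elts entries. */
void decode_unpckl_mask(unsigned num_elts, unsigned elt_bits, unsigned *mask) {
    assert(elt_bits == 32 || elt_bits == 64);
    unsigned vec_bits = num_elts * elt_bits;
    assert(vec_bits == 128 || vec_bits == 256);

    unsigned lane_elts = 128 / elt_bits; /* elements per 128-bit lane */
    unsigned k = 0;
    for (unsigned lane = 0; lane < num_elts; lane += lane_elts) {
        /* Interleave the low half of this lane from both sources. */
        for (unsigned i = 0; i < lane_elts / 2; ++i) {
            mask[k++] = lane + i;            /* element from first source */
            mask[k++] = lane + i + num_elts; /* element from second source */
        }
    }
}
```

With four elements, a v4f32 input (one 128-bit lane) yields the mask 0, 4, 1, 5, while a v4f64 input (two 128-bit lanes) yields 0, 4, 2, 6, so the decoder needs the element width as well as the count.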

Feb 25, 2011
- Allow targets to specify the type of the RHS of a shift parameterized on the type of the LHS. · b2c80da4
  Owen Anderson authored Feb 25, 2011
```
llvm-svn: 126518
```
  b2c80da4
Feb 24, 2011
- remove command line option debugging hook. · 0152b7bc
  Chris Lattner authored Feb 24, 2011
```
llvm-svn: 126441
```
  0152b7bc
Feb 23, 2011

[AVX] General VUNPCKL codegen support. · 9a6040dc

David Greene authored Feb 22, 2011

llvm-svn: 126264

9a6040dc

Feb 22, 2011

Revert r124611 - "Keep track of incoming argument's location while emitting LiveIns." · f3292b21

Devang Patel authored Feb 21, 2011

In other words, do not keep track of an argument's location.  The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, the second line table entry marks the beginning of the function body.
Getting this to work requires some coordination with the debugger:
 - The debugger needs to be aware of the prolog_end attribute attached to line table entries.
 - The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+).

llvm-svn: 126155

f3292b21

Feb 20, 2011

If both operands are loads from stores in memory we can't use movlpd/movlps · ac6b001f

Eric Christopher authored Feb 20, 2011

since one needs to be a register operand. Just use movss instead of forcing
an operand into a register.

Fixes PR9239

llvm-svn: 126072

ac6b001f

Feb 19, 2011
- Fix typos. · c509ff69
  Eric Christopher authored Feb 19, 2011
```
llvm-svn: 126018
```
  c509ff69
Feb 17, 2011

[AVX] Reorganize X86ShuffleDecode into its own library · 3a2b508e

David Greene authored Feb 17, 2011

(LLVMX86Utils.a) to break cyclic library dependencies between
LLVMX86CodeGen.a and LLVMX86AsmParser.a.  Previously this code was in
a header file and marked static but AVX requires some additional
functionality here that won't be used by all clients.  Since including
unused static functions causes a gcc compiler warning, keeping it as a
header would break builds that use -Werror.  Putting this in its own
library solves both problems at once.

llvm-svn: 125765

3a2b508e

Feb 16, 2011
- Swap VT and DebugLoc operands of getExtLoad() for consistency with · 81c43060
  Stuart Hastings authored Feb 16, 2011
```
other getNode() methods.  Radar 9002173.

llvm-svn: 125665
```
  81c43060
Feb 13, 2011

Enhance ComputeMaskedBits to know that aligned frameindexes · 46c01a30

Chris Lattner authored Feb 13, 2011

have their low bits set to zero.  This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.

Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively.  Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST).  The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.

llvm-svn: 125470

46c01a30
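
Note on r125470: turning FI+cst into FI|cst is only valid because the low bits of an aligned frame index are known to be zero, so the addition cannot carry. A minimal sketch of that reasoning on plain integers; the function names are hypothetical and do not correspond to the SelectionDAG API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if base + offset and base | offset are guaranteed equal: no bit is
 * set in both operands, so the addition never produces a carry. */
bool add_is_or(uint64_t base, uint64_t offset) {
    return (base & offset) == 0;
}

/* Hypothetical stand-in for known-bits information on an aligned frame
 * index: the low log2(align) bits are known to be zero. */
uint64_t known_zero_low_bits(uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return align - 1;
}

/* An offset can be folded with OR when it only touches known-zero bits. */
bool offset_fits_known_zero(uint64_t align, uint64_t offset) {
    return (offset & ~known_zero_low_bits(align)) == 0;
}
```

For a 16-byte-aligned address such as 0x7ffc0010, an offset of 12 fits in the four known-zero bits, so `0x7ffc0010 + 12` and `0x7ffc0010 | 12` are the same value; an offset of 16 does not fit. A predicate that looks for a base plus constant offset therefore has to accept both the ADD form and this OR form.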

Feb 11, 2011

[AVX] Implement 256-bit vector lowering for SCALAR_TO_VECTOR.  This · 79827a5a

David Greene authored Feb 10, 2011

largely completes support for 128-bit fallback lowering for code that
is not 256-bit ready.

llvm-svn: 125315

79827a5a

Feb 10, 2011

[AVX] Implement 256-bit vector lowering for EXTRACT_VECTOR_ELT. · ce318e49

David Greene authored Feb 10, 2011

llvm-svn: 125284

ce318e49

Feb 09, 2011

[AVX] Implement 256-bit vector lowering for INSERT_VECTOR_ELT. · b36195ab

David Greene authored Feb 09, 2011

llvm-svn: 125187

b36195ab

Feb 08, 2011

[AVX] Implement BUILD_VECTOR lowering for 256-bit vectors.  For · 10b0db1d

David Greene authored Feb 08, 2011

anything but the simplest of cases, lower a 256-bit BUILD_VECTOR by
splitting it into 128-bit parts and recombining.

llvm-svn: 125105

10b0db1d

Feb 07, 2011

[AVX] Insert/extract subvector lowering support.  This includes a · 79651c52

David Greene authored Feb 07, 2011

couple of utility functions that will be used in other places for more
AVX lowering.

llvm-svn: 125029

79651c52
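
Note on r125029: the insert/extract subvector utilities are what make the 128-bit fallback lowering in the later AVX commits possible. A minimal sketch of the idea on plain arrays, assuming a 256-bit vector of eight floats; all type and function names here are hypothetical, not the LLVM utilities:

```c
#include <assert.h>
#include <string.h>

typedef struct { float e[4]; } v4f32; /* stands in for a 128-bit vector */
typedef struct { float e[8]; } v8f32; /* stands in for a 256-bit vector */

/* Extract the 128-bit half that starts at element idx (0 or 4). */
v4f32 extract128(v8f32 v, unsigned idx) {
    assert(idx == 0 || idx == 4);
    v4f32 r;
    memcpy(r.e, &v.e[idx], sizeof r.e);
    return r;
}

/* Return v with the half starting at element idx replaced by sub. */
v8f32 insert128(v8f32 v, v4f32 sub, unsigned idx) {
    assert(idx == 0 || idx == 4);
    memcpy(&v.e[idx], sub.e, sizeof sub.e);
    return v;
}

/* Stand-in for an operation that exists natively at 128 bits. */
static v4f32 add128(v4f32 a, v4f32 b) {
    v4f32 r;
    for (int i = 0; i < 4; ++i) r.e[i] = a.e[i] + b.e[i];
    return r;
}

/* 256-bit add built by splitting into halves and recombining. */
v8f32 add256(v8f32 a, v8f32 b) {
    v8f32 r = a;
    r = insert128(r, add128(extract128(a, 0), extract128(b, 0)), 0);
    r = insert128(r, add128(extract128(a, 4), extract128(b, 4)), 4);
    return r;
}
```

The same split, operate, recombine shape applies to any operation that has a 128-bit form but no 256-bit one.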

Feb 05, 2011

Target/X86: Tweak allocating shadow area (aka home) on Win64. It must be... · 1850c80a
NAKAMURA Takumi authored Feb 05, 2011
```
Target/X86: Tweak allocating shadow area (aka home) on Win64. It must be enough for caller to allocate one.

llvm-svn: 124949
```
1850c80a
lib/Target/X86/X86ISelLowering.cpp: Introduce a new variable "IsWin64". No functional changes. · b21c3db9
NAKAMURA Takumi authored Feb 05, 2011
```
llvm-svn: 124948
```
b21c3db9
Target/X86: Fix whitespace. · f7f319d4
NAKAMURA Takumi authored Feb 05, 2011
```
llvm-svn: 124946
```
f7f319d4

[AVX] Revert 124910 until clients are ready. · 96d07a82

David Greene authored Feb 05, 2011

llvm-svn: 124912

96d07a82

[AVX] Add some utilities to insert and extract 128-bit subvectors. · bdd48150

David Greene authored Feb 04, 2011

This allows us to easily support 256-bit operations that don't have
native 256-bit support.  This applies to integer operations, certain
types of shuffles and various other things.

llvm-svn: 124910

bdd48150