  1. Jan 26, 2011
  2. Jan 25, 2011
  3. Jan 24, 2011
  4. Jan 23, 2011
  5. Jan 21, 2011
  6. Jan 20, 2011
    • At -O123 the early-cse pass is run before instcombine has run. According to my · 8fb2c382
      Duncan Sands authored
      auto-simplifier the transform most missed by early-cse is (zext X) != 0 -> X != 0.
      This patch adds this transform and some related logic to InstructionSimplify
      and removes some of the logic from instcombine (unfortunately not all because
      there are several situations in which instcombine can improve things by making
      new instructions, whereas instsimplify is not allowed to do this).  At -O2 this
      often results in more than 15% more simplifications by early-cse, and results in
      hundreds of lines of bitcode being eliminated from the testsuite.  I did see some
      small negative effects in the testsuite, for example a few additional instructions
      in three programs.  One program, 483.xalancbmk, got an additional 35 instructions,
      which seems to be due to a function getting an additional instruction and then
      being inlined all over the place.
      
      llvm-svn: 123911
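The identity behind the (zext X) != 0 -> X != 0 rewrite can be checked exhaustively for a small bit width. What follows is a minimal Python sketch, not LLVM code: `zext`, `trunc`, and `identity_holds` are hypothetical helper names, and the sketch only verifies the arithmetic equivalence, not how InstructionSimplify applies it.

```python
def zext(x, from_bits):
    """Zero-extend: keep the low from_bits bits; the new high bits are zero."""
    return x & ((1 << from_bits) - 1)


def trunc(x, to_bits):
    """Truncate: drop everything above the low to_bits bits."""
    return x & ((1 << to_bits) - 1)


def identity_holds(from_bits=8):
    """Check (zext X) != 0  <=>  X != 0 for every from_bits-wide value X."""
    return all((zext(x, from_bits) != 0) == (x != 0)
               for x in range(1 << from_bits))


# Contrast case: truncation can discard the only set bit, so the same
# rewrite would be wrong for trunc. 256 is nonzero, but its low 8 bits are 0.
trunc_counterexample = (trunc(256, 8) != 0) != (256 != 0)
```

Zero extension only adds zero bits, so it can never change whether a value is zero. The `trunc` contrast shows why the rewrite is specific to extensions.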
  7. Jan 19, 2011
  8. Jan 18, 2011
  9. Jan 17, 2011
  10. Jan 16, 2011
    • Teach DAE to look for functions whose arguments are unused, and change all... · d3db8334
      Anders Carlsson authored
      Teach DAE to look for functions whose arguments are unused, and change all callers to pass in an undef value instead.
      
      llvm-svn: 123596
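The caller-side rewrite described in this commit can be pictured with a toy model. This is a rough Python sketch under stated assumptions, not the DAE pass itself: `UNDEF`, `unused_param_indices`, and `rewrite_call_args` are hypothetical names, and functions and calls are reduced to plain lists of strings.

```python
UNDEF = "undef"  # stand-in for an LLVM undef value (hypothetical marker)


def unused_param_indices(param_names, used_names):
    """Indices of parameters that the function body never reads."""
    return [i for i, name in enumerate(param_names) if name not in used_names]


def rewrite_call_args(args, dead_indices):
    """Return a copy of a call's arguments with dead positions replaced by UNDEF."""
    dead = set(dead_indices)
    return [UNDEF if i in dead else arg for i, arg in enumerate(args)]


# Toy function f(a, b, c) whose body only reads a and c.
dead = unused_param_indices(["a", "b", "c"], {"a", "c"})
new_args = rewrite_call_args(["%x", "%expensive", "%z"], dead)
```

In this model the function signature is left alone and only the call site changes. Once a caller passes `undef`, whatever computed the old argument may become dead if nothing else uses it.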
    • tidy up a comment, as suggested by duncan · 7c9f4c9c
      Chris Lattner authored
      llvm-svn: 123590
    • Don't merge two constants if we care about the address of both. · 751677a0
      Rafael Espindola authored
      This fixes the original testcase in PR8927. It also causes a clang
      binary built with a patched clang to increase in size by 0.21%.
      
      We can probably get some of the size back by writing a pass that
      detects that a global never has its pointer compared and adds
      unnamed_addr to it (maybe extend global opt). It is also possible that
      there are some other cases clang could add unnamed_addr to.
      
      I will investigate extending globalopt next.
      
      llvm-svn: 123584
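The merging rule stated in this commit can be written as a small predicate. This is a minimal Python sketch that models only the rule as worded in the message: `GlobalConstant` and `can_merge` are hypothetical names, and the real pass works on LLVM globals rather than on a dataclass like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalConstant:
    name: str
    initializer: bytes
    unnamed_addr: bool  # True when the address is not significant


def can_merge(a, b):
    """Merge only identical constants, and only if at least one address is unimportant."""
    same_contents = a.initializer == b.initializer
    both_addresses_matter = not a.unnamed_addr and not b.unnamed_addr
    return same_contents and not both_addresses_matter
```

Two constants with equal contents whose addresses both matter must stay distinct, since a program could compare their pointers. That is why marking more globals `unnamed_addr` could win back some of the size increase.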
    • fix PR8932, a case where arg promotion could infinitely promote. · e5f8de86
      Chris Lattner authored
      llvm-svn: 123574
    • simplify a little · ed1fb92c
      Chris Lattner authored
      llvm-svn: 123573
    • if an alloca is only ever accessed as a unit, and is accessed with load/store instructions, · 6fab2e94
      Chris Lattner authored
      then don't try to decimate it into its individual pieces.  This will just make a mess of the
      IR and is pointless if none of the elements are individually accessed.  This was generating
      really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
      as an {[8 x i8]} structure instead of {i64}.
      
      The testcase now is optimized to:
      
      define i64 @test2(i64 %X) {
        br label %L2
      
      L2:                                               ; preds = %0
        ret i64 %X
      }
      
      before we generated:
      
      define i64 @test2(i64 %X) {
        %sroa.store.elt = lshr i64 %X, 56
        %1 = trunc i64 %sroa.store.elt to i8
        %sroa.store.elt8 = lshr i64 %X, 48
        %2 = trunc i64 %sroa.store.elt8 to i8
        %sroa.store.elt9 = lshr i64 %X, 40
        %3 = trunc i64 %sroa.store.elt9 to i8
        %sroa.store.elt10 = lshr i64 %X, 32
        %4 = trunc i64 %sroa.store.elt10 to i8
        %sroa.store.elt11 = lshr i64 %X, 24
        %5 = trunc i64 %sroa.store.elt11 to i8
        %sroa.store.elt12 = lshr i64 %X, 16
        %6 = trunc i64 %sroa.store.elt12 to i8
        %sroa.store.elt13 = lshr i64 %X, 8
        %7 = trunc i64 %sroa.store.elt13 to i8
        %8 = trunc i64 %X to i8
        br label %L2
      
      L2:                                               ; preds = %0
        %9 = zext i8 %1 to i64
        %10 = shl i64 %9, 56
        %11 = zext i8 %2 to i64
        %12 = shl i64 %11, 48
        %13 = or i64 %12, %10
        %14 = zext i8 %3 to i64
        %15 = shl i64 %14, 40
        %16 = or i64 %15, %13
        %17 = zext i8 %4 to i64
        %18 = shl i64 %17, 32
        %19 = or i64 %18, %16
        %20 = zext i8 %5 to i64
        %21 = shl i64 %20, 24
        %22 = or i64 %21, %19
        %23 = zext i8 %6 to i64
        %24 = shl i64 %23, 16
        %25 = or i64 %24, %22
        %26 = zext i8 %7 to i64
        %27 = shl i64 %26, 8
        %28 = or i64 %27, %25
        %29 = zext i8 %8 to i64
        %30 = or i64 %29, %28
        ret i64 %30
      }
      
      In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
      PHIs are in play that instcombine backs off.  It's better to not generate this stuff
      in the first place.
      
      llvm-svn: 123571
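The long "before" listing in this commit amounts to an identity function: it splits the i64 into eight bytes with lshr/trunc and then rebuilds it with zext/shl/or. The following is a minimal Python sketch of that arithmetic, with `split_into_bytes` and `reassemble` as hypothetical names that mirror the shift amounts in the IR.

```python
SHIFTS = range(56, -8, -8)  # 56, 48, ..., 8, 0: one shift per byte, high to low


def split_into_bytes(x):
    """Mirror the lshr + trunc sequence: extract each byte of a 64-bit value."""
    return [(x >> shift) & 0xFF for shift in SHIFTS]


def reassemble(byte_values):
    """Mirror the zext + shl + or sequence: rebuild the 64-bit value."""
    result = 0
    for byte, shift in zip(byte_values, SHIFTS):
        result |= byte << shift
    return result


def roundtrip(x):
    """Split then reassemble; this is the identity for 64-bit values."""
    return reassemble(split_into_bytes(x))
```

Since `roundtrip(x)` equals `x` for any 64-bit input, the split does no useful work when no byte is accessed on its own, which matches the optimized form `ret i64 %X`.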
    • Use an irbuilder to get some trivial constant folding when doing a store · 7cd8cf7d
      Chris Lattner authored
      of a constant.
      
      llvm-svn: 123570