  Jan 16, 2011
      tidy up a comment, as suggested by duncan · 7c9f4c9c
      Chris Lattner authored
      llvm-svn: 123590
      if an alloca is only ever accessed as a unit, and is accessed with load/store instructions, · 6fab2e94
      Chris Lattner authored
      then don't try to decimate it into its individual pieces.  This will just make a mess of the
      IR and is pointless if none of the elements are individually accessed.  This was generating
      really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
      as an {[8 x i8]} structure instead of {i64}.
      
      The testcase now is optimized to:
      
      define i64 @test2(i64 %X) {
        br label %L2
      
      L2:                                               ; preds = %0
        ret i64 %X
      }
      
      before we generated:
      
      define i64 @test2(i64 %X) {
        %sroa.store.elt = lshr i64 %X, 56
        %1 = trunc i64 %sroa.store.elt to i8
        %sroa.store.elt8 = lshr i64 %X, 48
        %2 = trunc i64 %sroa.store.elt8 to i8
        %sroa.store.elt9 = lshr i64 %X, 40
        %3 = trunc i64 %sroa.store.elt9 to i8
        %sroa.store.elt10 = lshr i64 %X, 32
        %4 = trunc i64 %sroa.store.elt10 to i8
        %sroa.store.elt11 = lshr i64 %X, 24
        %5 = trunc i64 %sroa.store.elt11 to i8
        %sroa.store.elt12 = lshr i64 %X, 16
        %6 = trunc i64 %sroa.store.elt12 to i8
        %sroa.store.elt13 = lshr i64 %X, 8
        %7 = trunc i64 %sroa.store.elt13 to i8
        %8 = trunc i64 %X to i8
        br label %L2
      
      L2:                                               ; preds = %0
        %9 = zext i8 %1 to i64
        %10 = shl i64 %9, 56
        %11 = zext i8 %2 to i64
        %12 = shl i64 %11, 48
        %13 = or i64 %12, %10
        %14 = zext i8 %3 to i64
        %15 = shl i64 %14, 40
        %16 = or i64 %15, %13
        %17 = zext i8 %4 to i64
        %18 = shl i64 %17, 32
        %19 = or i64 %18, %16
        %20 = zext i8 %5 to i64
        %21 = shl i64 %20, 24
        %22 = or i64 %21, %19
        %23 = zext i8 %6 to i64
        %24 = shl i64 %23, 16
        %25 = or i64 %24, %22
        %26 = zext i8 %7 to i64
        %27 = shl i64 %26, 8
        %28 = or i64 %27, %25
        %29 = zext i8 %8 to i64
        %30 = or i64 %29, %28
        ret i64 %30
      }
      
      In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
      PHIs are in play that instcombine backs off.  It's better to not generate this stuff
      in the first place.
      
      llvm-svn: 123571
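The std::bitset case from PR8980 can be sketched in C++ roughly as follows (a hypothetical reduction, not the actual PR testcase). Clang lowers the bitset's storage as {[8 x i8]}, yet the alloca is only ever read and written as a unit, so shredding it into bytes only produces the mess shown above:

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical reduction of the PR8980 pattern: the bitset's alloca is
// lowered as {[8 x i8]} but is only ever accessed as a whole, so SROA
// should leave it intact rather than split it into eight i8 pieces.
uint64_t roundtrip(uint64_t x) {
    std::bitset<64> b(x);    // whole-aggregate store into the alloca
    return b.to_ullong();    // whole-aggregate load back out
}
```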
      Use an irbuilder to get some trivial constant folding when doing a store · 7cd8cf7d
      Chris Lattner authored
      of a constant.
      
      llvm-svn: 123570
      enhance FoldOpIntoPhi in instcombine to try harder when a phi has · d55581de
      Chris Lattner authored
      multiple uses.  In some cases, all the uses are the same operation,
      so instcombine can go ahead and promote the phi.  In the testcase
      this pushes an add out of the loop.
      
      llvm-svn: 123568
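In source terms, the payoff resembles the following hand-written rewrite (a loose C++ analogy, not the commit's actual testcase): when every use of the loop-carried value involves the same add, the adds can be folded together and performed once outside the loop.

```cpp
// Loose analogy only: "before" performs an extra add on every
// iteration; "after" is the equivalent form with the add pushed out of
// the loop, which is what promoting the phi enables.
int sum_before(const int *a, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc += a[i] + 1;     // add repeated inside the loop
    return acc;
}

int sum_after(const int *a, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc += a[i];
    return acc + n;          // the n adds of 1, performed once
}
```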
  Jan 13, 2011
      Fix whitespace. · 328e91bb
      Bob Wilson authored
      llvm-svn: 123396
      Check for empty structs, and for consistency, zero-element arrays. · c8056a95
      Bob Wilson authored
      llvm-svn: 123383
      Extend SROA to handle arrays accessed as homogeneous structs and vice versa. · 08713d3c
      Bob Wilson authored
      This is a minor extension of SROA to handle a special case that is
      important for some ARM NEON operations.  Some of the NEON intrinsics
      return multiple values, which are handled as struct types containing
      multiple elements of the same vector type.  The corresponding return
      types declared in the arm_neon.h header have equivalent arrays.  We
      need SROA to recognize that it can split up those arrays and structs
      into separate vectors, even though they are not always accessed with
      the same type.  SROA already handles loads and stores of an entire
      alloca by using insertvalue/extractvalue to access the individual
      pieces, and that code works the same regardless of whether the type
      is a struct or an array.  So, all that needs to be done is to check
      for compatible arrays and homogeneous structs.
      
      llvm-svn: 123381
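The mixed-type access pattern can be illustrated in C++ (hypothetical types; the real motivation is NEON intrinsic results, where a struct of identical vectors is accessed through the equivalent arm_neon.h array type):

```cpp
#include <cstdint>
#include <cstring>

struct Pair { uint32_t a, b; };          // homogeneous struct

// The same bytes are stored as a struct and re-read as an array of the
// common element type; tolerating a struct and a compatible array
// naming the same alloca is what this SROA extension adds.
uint32_t second_element(Pair p) {
    uint32_t arr[2];
    std::memcpy(arr, &p, sizeof arr);    // whole-aggregate copy
    return arr[1];                       // element access as an array
}
```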
      Make SROA more aggressive with allocas containing padding. · 12eec40c
      Bob Wilson authored
      SROA only split up structs and arrays one level at a time, so padding can
      only cause trouble if it is located in between the struct or array elements.
      
      llvm-svn: 123380
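A C++ sketch of the distinction (hypothetical types, assuming a typical ABI where int is 4-byte aligned): padding between an aggregate's immediate elements is what SROA must account for, while padding buried inside an element only matters once that element is itself split at the next level.

```cpp
// Hypothetical layout: Inner carries padding between c and i, but Outer
// has no padding between its two elements. Splitting Outer one level at
// a time never crosses Inner's internal padding; that padding is dealt
// with only when Inner is split in turn.
struct Inner { char c; int i; };
struct Outer { Inner in; int j; };

constexpr bool inner_has_padding =
    sizeof(Inner) > sizeof(char) + sizeof(int);
constexpr bool outer_has_padding =
    sizeof(Outer) > sizeof(Inner) + sizeof(int);
```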
  Nov 18, 2010
      finish a thought. · 1e37bbaf
      Chris Lattner authored
      llvm-svn: 119690
      allow eliminating an alloca that is just copied from a constant global · ac570131
      Chris Lattner authored
      if it is passed as a byval argument.  The byval argument will just be a
      read, so it is safe to read from the original global instead.  This allows
      us to promote away the %agg.tmp alloca in PR8582
      
      llvm-svn: 119686
      enhance the "alloca is just a memcpy from constant global" · f183d5c4
      Chris Lattner authored
      optimization to ignore calls that obviously can't modify the alloca
      because they are readonly/readnone.
      
      llvm-svn: 119683
      fix a small oversight in the "eliminate memcpy from constant global" · 7aeae25c
      Chris Lattner authored
      optimization.  If the alloca that is "memcpy'd from constant" also has
      a memcpy from *it*, ignore it: it is a load.  We now optimize the testcase to:
      
      define void @test2() {
        %B = alloca %T
        %a = bitcast %T* @G to i8*
        %b = bitcast %T* %B to i8*
        call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %a, i64 124, i32 4, i1 false)
        call void @bar(i8* %b)
        ret void
      }
      
      previously we would generate:
      
      define void @test() {
        %B = alloca %T
        %b = bitcast %T* %B to i8*
        %G.0 = getelementptr inbounds %T* @G, i32 0, i32 0
        %tmp3 = load i8* %G.0, align 4
        %G.1 = getelementptr inbounds %T* @G, i32 0, i32 1
        %G.15 = bitcast [123 x i8]* %G.1 to i8*
        %1 = bitcast [123 x i8]* %G.1 to i984*
        %srcval = load i984* %1, align 1
        %B.0 = getelementptr inbounds %T* %B, i32 0, i32 0
        store i8 %tmp3, i8* %B.0, align 4
        %B.1 = getelementptr inbounds %T* %B, i32 0, i32 1
        %B.12 = bitcast [123 x i8]* %B.1 to i8*
        %2 = bitcast [123 x i8]* %B.1 to i984*
        store i984 %srcval, i984* %2, align 1
        call void @bar(i8* %b)
        ret void
      }
      
      llvm-svn: 119682
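At the source level, the kind of temporary these memcpy-from-constant-global commits target looks roughly like this (hypothetical names and sizes; the real case passes the copy as a read-only byval argument):

```cpp
// Hypothetical sketch: tmp is memcpy'd from the constant global G and
// only read afterwards, so the optimizer may elide the copy and read
// from G directly.
struct T { int data[31]; };
const T G = { {1, 2, 3} };            // constant global

int read_first(const T *p) { return p->data[0]; }

int test_copy_elision() {
    T tmp = G;                         // "alloca memcpy'd from constant"
    return read_first(&tmp);           // read-only use of the copy
}
```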
  Oct 19, 2010
      Get rid of static constructors for pass registration · 6c18d1aa
      Owen Anderson authored
      Get rid of static constructors for pass registration.  Instead, every pass exposes an initializeMyPassFunction(), which
      must be called in the pass's constructor.  This function uses static dependency declarations to recursively initialize
      the pass's dependencies.
      
      Clients that only create passes through the createFooPass() APIs will require no changes.  Clients that want to use the
      CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
      before parsing commandline arguments.
      
      I have tested this with all standard configurations of clang and llvm-gcc on Darwin.  It is possible that there are problems
      with the static dependencies that will only be visible with non-standard options.  If you encounter any crash in pass
      registration/creation, please send the testcase to me directly.
      
      llvm-svn: 116820
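The scheme can be sketched generically (an illustrative C++ pattern, not the actual LLVM PassRegistry API): each pass exposes an initialize function that first initializes its declared dependencies, with a once-flag replacing the static constructors so the work happens exactly once, on demand, in any call order.

```cpp
#include <mutex>

// Illustrative sketch only: initializeMyPass() runs its dependency's
// initializer first, and std::once_flag guarantees each initializer's
// body executes a single time no matter how often it is called.
static int initCount = 0;

void initializeDominatorTreePass() {
    static std::once_flag once;
    std::call_once(once, [] { ++initCount; });
}

void initializeMyPass() {
    static std::once_flag once;
    std::call_once(once, [] {
        initializeDominatorTreePass();  // recursive dependency init
        ++initCount;
    });
}
```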
  Oct 01, 2010
      Massive rewrite of MMX: · dd224d23
      Dale Johannesen authored
      The x86_mmx type is used for MMX intrinsics, parameters and
      return values where these use MMX registers, and is also
      supported in load, store, and bitcast.
      
      Only the above operations generate MMX instructions, and optimizations
      do not operate on or produce MMX intrinsics. 
      
      MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
      smaller pieces.  Optimizations may occur on these forms and the
      result cast back to x86_mmx, provided the result feeds into a
      previously existing x86_mmx operation.
      
      The point of all this is to prevent optimizations from introducing
      MMX operations, which is unsafe due to the EMMS problem.
      
      llvm-svn: 115243
  Sep 01, 2010
      add a gross hack to work around a problem that Argiris reported · 34e5361e
      Chris Lattner authored
      on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
      which then cause random problems because the X86 backend is
      producing mmx stuff without inserting proper emms calls.
      
      In the short term, force off MMX datatypes.  In the long term,
      the X86 backend should not select generic vector types to MMX
      registers.  This is being worked on, but won't be done in time
      for 2.8.  rdar://8380055
      
      llvm-svn: 112696