  1. Feb 15, 2017
  2. Feb 14, 2017
  3. Feb 12, 2017
  4. Feb 10, 2017
  5. Feb 09, 2017
  6. Feb 08, 2017
  7. Feb 07, 2017
  8. Feb 04, 2017
  9. Feb 03, 2017
  10. Feb 02, 2017
    • Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · 93f9d5ce
      Nirav Dave authored
      This reverts commit r293893 which is miscompiling lua on ARM and
      bootstrapping for x86-windows.
      
      llvm-svn: 293915
    • In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · 4442667f
      Nirav Dave authored
          Recommitting after fixing X86 inc/dec chain bug.
      
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through
          simplified store merging search and chain alias analysis which only
          checks for parallel stores through the chain subgraph. This is
          cleaner, as it separates non-interfering loads/stores from the
          store-merging logic.
      
          When merging stores, search up the chain through a single load, and
          find all possible stores by looking down through a load and a
          TokenFactor to all stores visited.
      
          This improves the quality of the output SelectionDAG and the output
          Codegen (save perhaps for some ARM cases where we correctly construct
          wider loads, but then promote them to float operations which appear
          to require more expensive constant generation).
      
          Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of Memory operations on CopyFromReg
               nodes as these are captured by data dependence

            6. Forward load-store values through TokenFactors containing
               {CopyToReg,CopyFromReg} Values.
      
            7. Peephole to convert build_vector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit as
               in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 293893
    • AMDGPU: Use source modifiers with f16->f32 conversions · 9dba9bd4
      Matt Arsenault authored
      The operand types were defined to fit the fp16_to_fp node, which
      has the half as an integer type. v_cvt_f32_f16 does support
      source modifiers, so change this to have an FP type and modifiers.
      
      For targets without legal f16, this requires recognizing the
      bit operations and trying to produce them.
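
When the half value is carried as a 16-bit integer, the source modifiers correspond to simple bit operations on the sign bit. The following is a minimal sketch of that correspondence, assuming the IEEE binary16 layout with the sign in bit 15. The names `SignMask`, `fnegBits` and `fabsBits` are hypothetical and chosen for illustration; they are not LLVM functions.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of LLVM.
// Assumes IEEE binary16, where bit 15 holds the sign.
constexpr uint16_t SignMask = 0x8000;

// Negating a half stored as an integer flips the sign bit (xor).
constexpr uint16_t fnegBits(uint16_t H) {
  return static_cast<uint16_t>(H ^ SignMask);
}

// Taking the absolute value clears the sign bit (and with the inverted mask).
constexpr uint16_t fabsBits(uint16_t H) {
  return static_cast<uint16_t>(H & static_cast<uint16_t>(~SignMask));
}

// 0x3C00 encodes 1.0 and 0xBC00 encodes -1.0 in binary16.
static_assert(fnegBits(0x3C00) == 0xBC00, "neg flips the sign bit");
static_assert(fabsBits(0xBC00) == 0x3C00, "abs clears the sign bit");
```

A combine that wants to fold a modifier into the conversion would have to match these xor/and patterns on the integer input, which is presumably what "recognizing the bit operations" refers to.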
      
      llvm-svn: 293857
  11. Feb 01, 2017
    • [AMDGPU] Account workgroup size in LDS occupancy limits · 2b913b1f
      Stanislav Mekhanoshin authored
      Functions matching LDS use to occupancy return results for a workgroup
      of 64 workitems. The numbers have to be adjusted for bigger workgroups.
      For example, a workgroup of size 256 already occupies 4 waves just by
      itself. Given that all numbers of LDS use in the compiler are per
      workgroup, occupancy shall be multiplied by 4 in this case. Each group
      of 64 workitems is still limited by the same number, but 4 subgroups of
      64 workitems each can afford 4 times more LDS to get the same occupancy.
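
The scaling described above can be sketched as a small calculation. This is an illustration under stated assumptions, not the code from the patch: `WavefrontSize`, `wavesPerWorkGroup` and `scaleOccupancy` are hypothetical names, and any hardware cap on the number of waves is left out.

```cpp
// Hypothetical names for illustration only; not the LLVM subtarget API.
// Assumes a wave of 64 workitems, as in the description above.
constexpr unsigned WavefrontSize = 64;

// Number of 64-workitem waves needed to hold one workgroup (rounded up).
constexpr unsigned wavesPerWorkGroup(unsigned WorkGroupSize) {
  return (WorkGroupSize + WavefrontSize - 1) / WavefrontSize;
}

// Scale an occupancy figure that was computed for a 64-workitem workgroup
// to a larger workgroup by the number of waves the workgroup occupies.
constexpr unsigned scaleOccupancy(unsigned OccupancyFor64,
                                  unsigned WorkGroupSize) {
  return OccupancyFor64 * wavesPerWorkGroup(WorkGroupSize);
}

// A workgroup of 256 workitems occupies 4 waves by itself.
static_assert(wavesPerWorkGroup(256) == 4, "256 workitems span 4 waves");
```

With these definitions, an LDS-limited occupancy of 2 for a 64-workitem workgroup becomes `scaleOccupancy(2, 256) == 8` for a 256-workitem workgroup using the same amount of LDS.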
      
      In addition, this change initializes LDS size in the subtarget to a real
      value for SI+ targets. This is required since LDS size is a variable in
      these calculations.
      
      Differential Revision: https://reviews.llvm.org/D29423
      
      llvm-svn: 293837
    • AMDGPU: Allow clustering flat memory operations · 74f64833
      Matt Arsenault authored
      llvm-svn: 293809
    • AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops · d59e6404
      Matt Arsenault authored
      These were simply preserving the flags of the original operation,
      which was too conservative in most cases and incorrect for mul.
      
      nsw/nuw may be needed for some combines to clean up messes when
      intermediate sext_inregs are introduced later.
      
      Tested valid combinations with alive.
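
One way to see why copying the original flags can be wrong for mul is a small arithmetic check. The sketch below assumes the i16 operands are zero-extended to i32 before the multiply, which the message above does not state. All helper names are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only. Assumption: the promoted mul
// operates on zero-extended i16 operands.
constexpr int64_t sext16(uint16_t V) {
  return V < 0x8000 ? int64_t(V) : int64_t(V) - 0x10000;
}
constexpr int64_t zext16(uint16_t V) { return int64_t(V); }

// True if the signed i16 product does not wrap (nsw holds on the i16 mul).
constexpr bool mulNoSignedWrap16(uint16_t A, uint16_t B) {
  return sext16(A) * sext16(B) >= INT16_MIN &&
         sext16(A) * sext16(B) <= INT16_MAX;
}

// True if the product of the zero-extended operands fits in a signed i32
// (nsw would hold on the promoted mul).
constexpr bool promotedMulNoSignedWrap32(uint16_t A, uint16_t B) {
  return zext16(A) * zext16(B) <= INT32_MAX;
}

// True if the product of the zero-extended operands fits in an unsigned i32
// (nuw would hold on the promoted mul).
constexpr bool promotedMulNoUnsignedWrap32(uint16_t A, uint16_t B) {
  return zext16(A) * zext16(B) <= int64_t(UINT32_MAX);
}

// 0xFFFF is -1 as a signed i16: (-1) * (-1) = 1 does not wrap in i16, but
// 65535 * 65535 exceeds INT32_MAX once the operands are zero-extended.
static_assert(mulNoSignedWrap16(0xFFFF, 0xFFFF), "nsw holds in i16");
static_assert(!promotedMulNoSignedWrap32(0xFFFF, 0xFFFF), "nsw fails in i32");
```

Under the same assumption, the largest product of two zero-extended i16 values is 65535 * 65535, which still fits in an unsigned i32, so nuw on the promoted mul would be safe regardless of the original flags.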
      
      llvm-svn: 293776
    • AMDGPU: Cleanup fmin/fmax legacy function · da7a6565
      Matt Arsenault authored
      Use a more specific subtarget check and combine hasOneUse checks
      
      llvm-svn: 293726