  1. May 30, 2018
  2. May 29, 2018
    • [X86] Use VR128X instead of VR128 in EVEX instruction patterns. · dbd371e9
      Craig Topper authored
      llvm-svn: 333464
    • [X86] Rename the operands in the recently introduced MOVSS+FMA patterns so... · aba57bfe
      Craig Topper authored
      [X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction.
      
      The order should be controlled in the input pattern.
      
      llvm-svn: 333463
    • Fix build error introduced in rL333459 · f4f37509
      Sam Clegg authored
      The DEBUG macro was renamed LLVM_DEBUG.
      
      llvm-svn: 333462
    • [LoopInstSimplify] Re-implement the core logic of loop-instsimplify to · 4cbcbb07
      Chandler Carruth authored
      be both simpler and substantially more efficient.
      
      Rather than use a hand-rolled iteration technique that isn't quite the
      same as RPO, use the pre-built RPO loop body traversal utility.
      
      Because we visit the loop body in RPO, we reliably visit defs before
      their uses. Given that, the only reason to iterate is when simplifying
      a def that is used by a PHI node along a back-edge.
      With this patch, the first pass over the loop body is just a complete
      simplification of every instruction across the loop body. When we
      encounter a use of a simplified instruction that stems from a PHI node
      in the loop body that has already been visited (due to some cyclic CFG,
      potentially the loop itself, or a nested loop, or unstructured control
      flow), we recall that specific PHI node for the second iteration.
      Nothing else needs to be preserved from iteration to iteration.
      
      On the second and later iterations, only instructions known to have
      simplified inputs are considered, each time starting from a set of PHIs
      that had simplified inputs along the backedges.
      
      Dead instructions are collected along the way, but deleted in a batch at
      the end of each iteration making the iterations themselves substantially
      simpler. This uses a new batch API for recursively deleting dead
      instructions.
      
      This also changes the routine to visit subloops. Because simplification
      is fundamentally transitive, we may need to visit the entire loop body,
      including subloops, to handle knock-on simplification.
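The iteration scheme described above can be sketched with a toy model: one full RPO pass that simplifies everything, then later passes seeded only by PHIs whose back-edge inputs were simplified. The mini-IR and the constant-folding "simplification" below are invented for illustration; this is not LLVM's actual API.

```python
def simplify_loop(insts, rpo):
    """insts maps a name to ('const', k), ('add', a, b) or ('phi', [ins]).
    rpo lists names in reverse post-order, so defs precede uses except
    along back-edges into PHIs. Returns the names folded to constants."""
    folded = {}

    def try_fold(name):
        node = insts[name]
        if node[0] == 'const':
            folded[name] = node[1]
        elif node[0] == 'add':
            a, b = folded.get(node[1]), folded.get(node[2])
            if a is not None and b is not None:
                folded[name] = a + b
        elif node[0] == 'phi':
            vals = [folded.get(i) for i in node[1]]
            if None not in vals and len(set(vals)) == 1:
                folded[name] = vals[0]
        return name in folded

    def phi_users(name):
        return {n for n, node in insts.items()
                if node[0] == 'phi' and name in node[1]}

    # First pass: simplify every instruction once, in RPO, recording any
    # PHI that uses a simplified def (possibly along a back-edge).
    worklist = set()
    for name in rpo:
        if try_fold(name):
            worklist |= phi_users(name)

    # Later passes start only from the recorded PHIs; nothing else needs
    # to be carried from iteration to iteration.
    while worklist:
        seeds, worklist = worklist, set()
        changed = {p for p in seeds if p not in folded and try_fold(p)}
        for name in rpo:  # propagate to users of the changed values
            node = insts[name]
            if name in folded or node[0] != 'add':
                continue
            if node[1] in changed or node[2] in changed:
                if try_fold(name):
                    changed.add(name)
                    worklist |= phi_users(name)
    return folded
```

Here `p = phi [a, q]` cannot fold on the first pass because `q` is defined later and reaches `p` along a back-edge; only the second pass, seeded by `p`, completes the fold.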
      
      I've added a basic test file that helps demonstrate that all of these
      changes work. It includes both straightforward loops with
      simplifications as well as interesting PHI-structures, CFG-structures,
      and a nested loop case.
      
      Differential Revision: https://reviews.llvm.org/D47407
      
      llvm-svn: 333461
    • [X86] Fix a potential crash that could occur after r333419. · 5439b3d1
      Craig Topper authored
      The code could issue a truncate from a small type to a larger type. We need to extend in that case instead.
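A minimal model of the distinction, with an invented helper name rather than the actual SelectionDAG code: when the destination is wider than the source, the legalizer must extend, not truncate.

```python
def adjust_to_width(value, from_bits, to_bits):
    """Zero-extend or truncate an unsigned value of from_bits to to_bits.
    The bug was issuing a truncate even when to_bits > from_bits; the fix
    takes the extend path in that case instead."""
    value &= (1 << from_bits) - 1            # normalize the input width
    if to_bits < from_bits:
        return value & ((1 << to_bits) - 1)  # truncate to the narrow type
    return value                             # zero-extend: high bits stay 0
```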
      
      llvm-svn: 333460
    • [WebAssembly] Add more error checking to object file parsing · b7c62394
      Sam Clegg authored
      This should address some of the assert failures the fuzzer has been
      finding such as:
        https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6719
      
      Differential Revision: https://reviews.llvm.org/D47086
      
      llvm-svn: 333459
    • AMDGPU: Fix broken check lines · 4b3829d8
      Matt Arsenault authored
      llvm-svn: 333458
    • AMDGPU: Fix typo in option description · 2e4d338d
      Matt Arsenault authored
      llvm-svn: 333457
    • AMDGPU: Round up kernel argument allocation size · 1ea0402e
      Matt Arsenault authored
      AFAIK the driver's allocation will actually have to round this up
      anyway. It is useful to track the rounded-up size so that the end of
      the kernel argument segment is known to be dereferenceable, allowing
      a wider s_load_dword to be used for a short argument at the end of
      the segment.
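The rounding itself is the standard power-of-two align-up; a sketch (the helper name is invented):

```python
def align_to(size, align):
    """Round size up to a multiple of align (align a power of two)."""
    return (size + align - 1) & ~(align - 1)

# A 2-byte argument at the end of the segment, rounded up to 4 bytes,
# sits in a region known to be dereferenceable for a full 4-byte
# s_load_dword rather than requiring a narrower load.
```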
      
      llvm-svn: 333456
    • [RISCV] Add peepholes for Global Address lowering patterns · 97684419
      Sameer AbuAsal authored
      Summary:
        Base and offset are always separated when a GlobalAddress node is lowered
        (rL332641) as an optimization to reduce instruction count. However, this
        optimization is not profitable if the Global Address ends up being used in
        only one instruction.

        This patch adds peephole optimizations that merge an offset of
        an address calculation into the LUI %hi and ADDI %lo of the lowering sequence.
      
        The peephole handles three patterns:
      
       1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset
           --->
            ADDI (LUI %hi(global + offset)) %lo(global + offset)

         This generates:

         lui  a0, %hi(global + offset)
         addi a0, a0, %lo(global + offset)

         Instead of:

         lui  a0, %hi(global)
         addi a0, a0, %lo(global)
         addi a0, a0, offset

         This pattern is for cases when the offset is small enough to fit in the
         12-bit signed immediate field of ADDI.

       2) ADD (ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset)
           --->
            offset = hi_offset << 12
            ADDI (LUI %hi(global + offset)) %lo(global + offset)

         Which generates the ASM:

         lui  a0, %hi(global + offset)
         addi a0, a0, %lo(global + offset)

         Instead of:

         lui  a0, %hi(global)
         addi a0, a0, %lo(global)
         lui  a1, hi_offset
         add  a0, a0, a1

         This pattern is for cases when the offset doesn't fit in the immediate
         field of ADDI but its lower 12 bits are all zero.

       3) ADD (ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset))
           --->
            offset = (hi_offset << 12) + lo_offset
            ADDI (LUI %hi(global + offset)) %lo(global + offset)

         Which generates the ASM:

         lui  a1, %hi(global + offset)
         addi a1, a1, %lo(global + offset)

         Instead of:

         lui  a0, %hi(global)
         addi a0, a0, %lo(global)
         lui  a1, hi_offset
         addi a1, a1, lo_offset
         add  a0, a0, a1

         This pattern is for cases when the offset doesn't fit in the immediate
         field of ADDI and both its low 12 bits and high 20 bits are non-zero.
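All three patterns rely on the same %hi/%lo arithmetic: ADDI sign-extends its 12-bit immediate, so %hi rounds by 0x800 to compensate. A sketch of that arithmetic in an RV32 model (helper names are invented for illustration):

```python
MASK32 = 0xffffffff

def hi20(x):
    # %hi(x): upper 20 bits, rounded so that lui+addi reconstructs x
    return ((x + 0x800) >> 12) & 0xfffff

def lo12(x):
    # %lo(x): low 12 bits of x as a signed (sign-extended) immediate
    v = x & 0xfff
    return v - 0x1000 if v & 0x800 else v

def lui_addi(x):
    """Value produced by: lui a0, %hi(x); addi a0, a0, %lo(x)."""
    return ((hi20(x) << 12) + lo12(x)) & MASK32

def fits_simm12(off):
    # pattern 1 applies only when the offset fits ADDI's immediate field
    return -2048 <= off < 2048
```

Folding the offset into the relocations is value-correct because `lui_addi(g) + off` and `lui_addi(g + off)` agree modulo 2^32; the peepholes are about recognizing the DAG shapes where that fold is profitable.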
      
          Reviewers: asb
      
          Reviewed By: asb
      
          Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos,
        niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang
      
      llvm-svn: 333455
    • [BasicAA] Teach the analysis about atomic memcpy · 3a6c50f4
      Daniel Neilson authored
      Summary:
      A simple change to derive mod/ref info from the atomic memcpy
      intrinsic in the same way as from the regular memcpy intrinsic.
      
      llvm-svn: 333454
    • Update CodeView register names in a test that was missed in r333421. · 99feb567
      Douglas Yung authored
      llvm-svn: 333453
    • Remove unused DWARFUnit::HasDIEsParsed() · a3ad3c48
      Jan Kratochvil authored
      It was not implemented correctly after https://reviews.llvm.org/D46810,
      but it has not been used anywhere anyway.
      
      llvm-svn: 333452
    • AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as · 2ca6b1f2
      Konstantin Zhuravlyov authored
      it is set by CP
      
      Differential Revision: https://reviews.llvm.org/D47392
      
      llvm-svn: 333451
    • [COFF] Simplify symbol table output section computation · 4e518336
      Shoaib Meenai authored
      Rather than using a loop to compare symbol RVAs to the starting RVAs of
      sections to determine which section a symbol belongs to, just get the
      output section of a symbol directly via its chunk, and bail if the
      symbol doesn't have an output section, which avoids having to hardcode
      logic for handling dead symbols, CodeView symbols, etc. This was
      suggested by Reid Kleckner; thank you.
      
      This also fixes writing out symbol tables in the presence of RVA table
      input sections (e.g. .sxdata and .gfids). Such sections aren't written
      to the output file directly, so their RVA is 0, and the loop would thus
      fail to find an output section for them, resulting in a segfault. Extend
      some existing tests to cover this case.
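The difference between the two strategies can be sketched with a toy model (the data layout and names here are illustrative, not lld's actual classes):

```python
# Output sections as (name, start_rva, size).
SECTIONS = [(".text", 0x1000, 0x200), (".data", 0x2000, 0x100)]

def section_by_rva_scan(rva):
    """Old approach: compare the symbol's RVA against section ranges.
    Symbols in RVA-table sections (.sxdata, .gfids) keep RVA 0 because
    their chunks are never written to the output file, so no range
    matches and the loop has no answer."""
    for name, start, size in SECTIONS:
        if start <= rva < start + size:
            return name
    return None

def section_via_chunk(symbol):
    """New approach: ask the symbol's chunk for its output section
    directly, and bail out cleanly when there is none (dead symbols,
    CodeView symbols, RVA-table sections)."""
    chunk = symbol.get("chunk") or {}
    return chunk.get("output_section")
```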
      
      Fixes PR37584.
      
      Differential Revision: https://reviews.llvm.org/D47391
      
      llvm-svn: 333450
    • Fix compiler unused variable warning in DWARFUnit · 43b40939
      Jan Kratochvil authored
      Alex Langford has reported it from: https://reviews.llvm.org/D46810
      
      llvm-svn: 333449
    • [TableGen] Use explicit constructor for InstMemo · 33b6f9ac
      Florian Hahn authored
      This should fix a few buildbot failures with old
      GCC versions.
      
      llvm-svn: 333448
    • [CodeGen][Darwin] Set the calling-convention of thread-local variable · 1da9dbbc
      Akira Hatanaka authored
      initialization functions to 'cxx_fast_tlscc'.
      
      This fixes a bug where instructions calling initialization functions for
      thread-local static members of c++ template classes were using calling
      convention 'cxx_fast_tlscc' while the called functions weren't annotated
      with the calling convention.
      
      rdar://problem/40447463
      
      Differential Revision: https://reviews.llvm.org/D47354
      
      llvm-svn: 333447