  1. Jun 12, 2015
    • [ARM] Disabling vfp4 should disable fp16 · d9e39d53
      John Brawn authored
      ARMTargetParser::getFPUFeatures should disable fp16 whenever it
      disables vfp4, as otherwise something like -mcpu=cortex-a7 -mfpu=none
      leaves us with fp16 enabled (though the only effect that will have is
      a wrong build attribute).
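
      A minimal sketch of the dependency rule being enforced (a
      hypothetical helper, not the actual ARMTargetParser code): when
      vfp4 is dropped from the feature list, fp16 must go with it.

        #include <string>
        #include <vector>

        // Hypothetical sketch: disabling vfp4 must also disable the
        // dependent fp16 feature, otherwise -mfpu=none can leave
        // "+fp16" behind and emit a wrong build attribute.
        void disableVFP4(std::vector<std::string> &Features) {
          Features.push_back("-vfp4");
          Features.push_back("-fp16"); // fp16 depends on vfp4
        }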
      
      Differential Revision: http://reviews.llvm.org/D10397
      
      llvm-svn: 239599
    • [WinEH] Put finally pointers in the handler scope table field · 81d1cc00
      Reid Kleckner authored
      We were putting them in the filter field, which is correct for 64-bit
      but wrong for 32-bit.
      
      Also switch the order of scope table entry emission so outermost entries
      are emitted first, and fix an obvious state assignment bug.
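
      For reference, a rough sketch of the 32-bit scope table entry
      layout involved (field names are illustrative, following the
      documented three-field shape; the emitted table may differ):

        #include <cstdint>

        // Illustrative layout only: a __finally funclet has no filter
        // expression, so its pointer belongs in the handler field,
        // not the filter field.
        struct ScopeTableEntry {
          uint32_t EnclosingLevel; // state of lexically enclosing scope
          void *FilterFunc;        // filter expression; null for __finally
          void *HandlerFunc;       // __except block or __finally funclet
        };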
      
      llvm-svn: 239574
    • [WinEH] Create an llvm.x86.seh.exceptioninfo intrinsic · a9d62535
      Reid Kleckner authored
      This intrinsic is like framerecover plus a load. It recovers the EH
      registration stack allocation from the parent frame and loads the
      exception information field out of it, giving back a pointer to an
      EXCEPTION_POINTERS struct. It's designed for clang to use in SEH filter
      expressions instead of accessing the EXCEPTION_POINTERS parameter that
      is available on x64.
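
      As context, the source pattern this supports looks like the
      following (ordinary MSVC-style SEH; illustrative usage only,
      not code from the patch):

        #include <windows.h>

        // Filter expressions receive EXCEPTION_POINTERS via
        // GetExceptionInformation(); on x86, clang can lower that
        // through llvm.x86.seh.exceptioninfo instead of an x64-style
        // parameter.
        static int Filter(EXCEPTION_POINTERS *Info) {
          return Info->ExceptionRecord->ExceptionCode ==
                         EXCEPTION_ACCESS_VIOLATION
                     ? EXCEPTION_EXECUTE_HANDLER
                     : EXCEPTION_CONTINUE_SEARCH;
        }

        void Demo() {
          __try {
            *(volatile int *)0 = 0; // fault to exercise the filter
          } __except (Filter(GetExceptionInformation())) {
            // handled
          }
        }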
      
      This required a minor change to MC to allow defining a label variable to
      another absolute framerecover label variable.
      
      llvm-svn: 239567
  2. Jun 11, 2015
    • Object: Prepend __imp_ when mangling a dllimport symbol in IRObjectFile. · 82e657b5
      Peter Collingbourne authored
      We cannot prepend __imp_ in the IR mangler because a function reference may
      be emitted unmangled in a constant initializer. The linker is expected to
      resolve such references to thunks. This is covered by the new test case.
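
      A minimal sketch of the symbol-table-level mangling described
      (a hypothetical helper; the real change lives in IRObjectFile's
      symbol handling):

        #include <string>

        // dllimport symbols are reported with an __imp_ prefix at the
        // object-symbol level, while the IR-level name stays unmangled
        // so constant initializers can still reference the function.
        std::string getObjectSymbolName(const std::string &Name,
                                        bool IsDLLImport) {
          return IsDLLImport ? "__imp_" + Name : Name;
        }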
      
      Strictly speaking we ought to emit two undefined symbols, one with __imp_ and
      one without, as we cannot know which symbol the final object file will refer
      to. However, this would require rather intrusive changes to IRObjectFile,
      and lld works fine without it for now.
      
      This reimplements r239437, which was reverted in r239502.
      
      Differential Revision: http://reviews.llvm.org/D10400
      
      llvm-svn: 239560
    • This reverts commit r239529 and r239514. · 65d37e64
      Rafael Espindola authored
      Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
      Revert "Fixing MSVC 2013 build error."
      
      The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.
      
      llvm-svn: 239544
    • Revert "Fix merges of non-zero vector stores" · 2691c59e
      Reid Kleckner authored
      This reverts commit r239539.
      
      It was causing SDAG assertions while building freetype.
      
      llvm-svn: 239543
    • Fix merges of non-zero vector stores · e23a063d
      Matt Arsenault authored
      Now actually stores the non-zero constant instead of 0.
      I somehow forgot to include this part of r238108.
      
      The test change was just an independent instruction order swap,
      so just add another check line to satisfy CHECK-NEXT.
      
      llvm-svn: 239539
    • R600/SI: Add -mcpu=bonaire to a test that uses flat address space · 53e015f3
      Tom Stellard authored
      Flat instructions don't exist on SI, but there is a bug in the backend that
      allows them to be selected.
      
      llvm-svn: 239533
    • [AArch64] Match interleaved memory accesses into ldN/stN instructions. · 4566d18e
      Hao Liu authored
      Add a pass, AArch64InterleavedAccess, to identify and match
      interleaved memory accesses. The pass transforms an interleaved
      load/store into ldN/stN intrinsics. As the Loop Vectorizer
      disables optimization on interleaved accesses by default, this
      optimization is also disabled by default; it can be enabled with
      -aarch64-interleaved-access-opt=true.
      
      E.g. Transform an interleaved load (Factor = 2):
             %wide.vec = load <8 x i32>, <8 x i32>* %ptr
             %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>  ; Extract even elements
             %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>  ; Extract odd elements
           Into:
              %ld2 = call { <4 x i32>, <4 x i32> } @aarch64.neon.ld2(%ptr)
              %v0 = extractvalue { <4 x i32>, <4 x i32> } %ld2, 0
              %v1 = extractvalue { <4 x i32>, <4 x i32> } %ld2, 1
      
      E.g. Transform an interleaved store (Factor = 2):
             %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7>  ; Interleaved vec
             store <8 x i32> %i.vec, <8 x i32>* %ptr
           Into:
             %v0 = shuffle %i.vec, undef, <0, 1, 2, 3>
             %v1 = shuffle %i.vec, undef, <4, 5, 6, 7>
              call void @aarch64.neon.st2(%v0, %v1, %ptr)
      
      llvm-svn: 239514
    • [X86][SSE] Vectorized i8 and i16 shift operators · 5965680d
      Simon Pilgrim authored
      This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors
      avoid scalarization. It builds on the existing vectorized i8 SHL
      implementation, which moves the shift-amount bits up to the
      sign-bit position and applies the 4-, 2- and 1-bit shifts
      separately, with several improvements (see the sketch after this
      list):

      1 - SSE41 targets can use (v)pblendvb directly with the sign bit
      instead of performing a comparison to feed into a VSELECT node.
      2 - pre-SSE41 targets were masking and comparing against a 0x80
      constant; we avoid this by using the fact that a set sign bit
      means a negative integer, which can be compared against zero to
      feed the VSELECT, removing the need for a constant mask (zero
      generation is much cheaper).
      3 - SRA i8 needs to be unpacked to the upper byte of an i16 so
      that the i16 psraw instruction can be used for sign extension;
      this takes more work than SHL/SRL, but perf tests indicate it is
      still beneficial.

      The i16 implementation is similar but simpler than i8: 8-, 4-,
      2- and 1-bit shifts are needed, but less shift masking is
      involved. Note that SSE41's use of (v)pblendvb requires the i16
      shift amount to be splatted to both bytes.
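
      A minimal SSE2 sketch of the power-of-two shift ladder described
      above, for a variable per-byte SHL (my illustration, assuming
      shift amounts in [0,7]; the patch itself works on SelectionDAG
      nodes rather than intrinsics):

        #include <emmintrin.h>

        // Walk the shift amount's bits from bit 2 down to bit 0; after
        // each candidate shift, select per byte using the sign bit,
        // tested with a signed compare against zero rather than a 0x80
        // mask.
        static __m128i shl_v16i8(__m128i v, __m128i amt) {
          const __m128i zero = _mm_setzero_si128();
          __m128i a = _mm_slli_epi16(amt, 5); // amt bit 2 -> sign bit
          // No psllb exists: emulate i8 shifts with psllw plus a mask
          // clearing bits shifted across byte boundaries.
          __m128i sel = _mm_cmpgt_epi8(zero, a); // sign-bit-set bytes
          __m128i sh =
              _mm_and_si128(_mm_slli_epi16(v, 4), _mm_set1_epi8((char)0xF0));
          v = _mm_or_si128(_mm_and_si128(sel, sh), _mm_andnot_si128(sel, v));
          a = _mm_add_epi8(a, a); // amt bit 1 -> sign bit (paddb: no carry)
          sel = _mm_cmpgt_epi8(zero, a);
          sh = _mm_and_si128(_mm_slli_epi16(v, 2), _mm_set1_epi8((char)0xFC));
          v = _mm_or_si128(_mm_and_si128(sel, sh), _mm_andnot_si128(sel, v));
          a = _mm_add_epi8(a, a); // amt bit 0 -> sign bit
          sel = _mm_cmpgt_epi8(zero, a);
          sh = _mm_and_si128(_mm_slli_epi16(v, 1), _mm_set1_epi8((char)0xFE));
          v = _mm_or_si128(_mm_and_si128(sel, sh), _mm_andnot_si128(sel, v));
          return v;
        }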
      
      Tested on SSE2, SSE41 and AVX machines.
      
      Differential Revision: http://reviews.llvm.org/D9474
      
      llvm-svn: 239509
    • LLVM support for vector quad bit permute and gather instructions through builtins · ea1db8a6
      Nemanja Ivanovic authored
      This patch corresponds to review:
      http://reviews.llvm.org/D10096
      
      This is the back end portion of the patch related to D10095.
      The patch adds the instructions and back end intrinsics for:
      vbpermq
      vgbbd
      
      llvm-svn: 239505
  3. Jun 09, 2015
    • [NVPTX] fix a crash bug in NVPTXFavorNonGenericAddrSpaces · 75589ffc
      Jingyue Wu authored
      Summary:
      We used to assume V->RAUW only modifies the operand list of V's user.
      However, if V and V's user are Constants, RAUW may replace and invalidate V's
      user entirely.
      
      This patch fixes the above issue by letting the caller replace the
      operand instead of calling RAUW on Constants.
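
      A rough sketch of the safer pattern (illustrative, not the
      actual pass code): the caller rewrites its own operand slot
      rather than calling replaceAllUsesWith on a Constant, whose
      ConstantExpr users RAUW may rebuild and invalidate:

        #include "llvm/IR/Instruction.h"

        using namespace llvm;

        // Hypothetical helper: only the one use owned by the caller
        // is rewritten; other users of the old value, constant or
        // otherwise, are left untouched.
        static void replaceOperandWith(Instruction *User, unsigned OpIdx,
                                       Value *NewV) {
          User->setOperand(OpIdx, NewV);
        }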
      
      Test Plan: @nested_const_expr and @rauw in access-non-generic.ll
      
      Reviewers: broune, jholewinski
      
      Reviewed By: broune, jholewinski
      
      Subscribers: jholewinski, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D10345
      
      llvm-svn: 239435
    • [WinEH] Add 32-bit SEH state table emission prototype · f12c030f
      Reid Kleckner authored
      This gets all the handler info through to the asm printer and we can
      look at the .xdata tables now. I've convinced one small catch-all test
      case to work, but other than that, it would be a stretch to say this is
      functional.
      
      The state numbering algorithm avoids doing any scope reconstruction as
      we do for C++ to simplify the implementation.
      
      llvm-svn: 239433
    • [AArch64] Remove an overly conservative check when generating store pairs. · cf90acc1
      Chad Rosier authored
      Store instructions do not modify register values and therefore it's safe
      to form a store pair even if the source register has been read in between
      the two store instructions.
      
      Previously, the read of w1 (see below) prevented the formation of a stp.
      
              str     w0, [x2]
              ldr     w8, [x2, #8]
              add     w0, w8, w1
              str     w1, [x2, #4]
              ret
      
      We now generate the following code.
      
              stp     w0, w1, [x2]
              ldr     w8, [x2, #8]
              add     w0, w8, w1
              ret
      
      All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
      Performance results for SPEC2K were within noise.
      
      llvm-svn: 239432
    • Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions · d9699bc7
      Akira Hatanaka authored
      that was resetting it.
      
      Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
      the value of function attribute "disable-tail-calls" instead. Also,
      unconditionally add pass TailCallElim to the pipeline and check the function
      attribute at the start of runOnFunction to disable the pass on a per-function
      basis. 
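
      A minimal sketch of the per-function gating described
      (illustrative; names follow the LLVM pass API, but the patch's
      exact code may differ):

        #include "llvm/IR/Function.h"

        using namespace llvm;

        // Checked at the start of runOnFunction: honor the function
        // attribute instead of the removed
        // TargetOptions::DisableTailCalls flag.
        static bool shouldRunTailCallElim(const Function &F) {
          return F.getFnAttribute("disable-tail-calls").getValueAsString() !=
                 "true";
        }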
       
      This is part of the work to remove TargetMachine::resetTargetOptions, and since
      DisableTailCalls was the last non-fast-math option that was being reset in that
      function, we should be able to remove the function entirely after the work to
      propagate IR-level fast-math flags to DAG nodes is completed.
      
      Out-of-tree users should remove the uses of DisableTailCalls and make changes
      to attach attribute "disable-tail-calls"="true" or "false" to the functions in
      the IR.
      
      rdar://problem/13752163
      
      Differential Revision: http://reviews.llvm.org/D10099
      
      llvm-svn: 239427
    • The constant initialization for globals in NVPTX is generated as an · cd50135a
      Samuel Antao authored
      array of bytes. The generation of these byte arrays assumed a
      little endian host, which prevented big endian hosts from being
      used to generate PTX code. This patch fixes the problem by
      changing the way the bytes are extracted so that it works on
      both little and big endian hosts.
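
      A minimal sketch of the endian-independent approach (a
      hypothetical helper, not the patch's code): extract bytes by
      shifting the value rather than reinterpreting host memory, so
      the output order no longer depends on the host byte order:

        #include <cstdint>
        #include <vector>

        // Emit the little-endian encoding of V regardless of host
        // endianness.
        static std::vector<uint8_t> toLEBytes(uint64_t V, unsigned NumBytes) {
          std::vector<uint8_t> Bytes;
          for (unsigned I = 0; I != NumBytes; ++I)
            Bytes.push_back(uint8_t(V >> (8 * I))); // byte I of encoding
          return Bytes;
        }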
      
      llvm-svn: 239412
    • Implement computeKnownBits for min/max nodes · 705eb8f6
      Matt Arsenault authored
      llvm-svn: 239378
    • [NVPTX] run SROA after NVPTXFavorNonGenericAddrSpaces · 2e4d1dd0
      Jingyue Wu authored
      Summary:
      This cleans up most allocas NVPTXLowerKernelArgs emits for byval
      parameters.
      
      Test Plan: strengthens bug21465.ll to verify that no redundant
      local loads/stores remain.
      
      Reviewers: eliben, jholewinski
      
      Reviewed By: eliben, jholewinski
      
      Subscribers: jholewinski, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D10322
      
      llvm-svn: 239368