  1. Oct 07, 2019
  2. Oct 06, 2019
    • [X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217) · b4ba3cbd
      Simon Pilgrim authored
      If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.
      
      This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.
      
      Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.
      
      Fixes PR43217
      
      Differential Revision: https://reviews.llvm.org/D68544
      
      llvm-svn: 373871
      b4ba3cbd
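The idea in this commit message can be modelled in plain C++ without any intrinsics. The sketch below is only an illustration, not the X86 backend code: `Vec8f`, `broadcastLoad`, `extractLane0`, and `scaleAndSum` are hypothetical names, and the 8-lane array merely stands in for a 256-bit AVX register. It shows one memory read feeding both the vector use and the scalar use.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 8-lane float vector standing in for a 256-bit AVX register.
using Vec8f = std::array<float, 8>;

// Models a broadcast load: a single memory read whose value is replicated
// into every lane.
inline Vec8f broadcastLoad(const float *Ptr) {
  Vec8f V;
  V.fill(*Ptr);
  return V;
}

// Models the "free" scalar access: lane 0 of the broadcast already holds
// the scalar, so no second load is needed.
inline float extractLane0(const Vec8f &V) { return V[0]; }

// Uses the loaded value both as a vector (lane-wise multiply) and as a
// scalar (final add), while reading memory only once.
inline float scaleAndSum(const float *Ptr, const Vec8f &Data) {
  const Vec8f Splat = broadcastLoad(Ptr);
  float Sum = 0.0f;
  for (std::size_t I = 0; I < Data.size(); ++I)
    Sum += Data[I] * Splat[I];
  return Sum + extractLane0(Splat);
}
```

Calling `scaleAndSum(&X, Data)` dereferences `Ptr` exactly once; the scalar add reuses lane 0 of the splat rather than reloading `X`.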
    • Fix signed/unsigned warning. NFCI · d84cd7ca
      Simon Pilgrim authored
      llvm-svn: 373870
      d84cd7ca
    • [NFC][PowerPC] Reorganize CRNotPat multiclass patterns in PPCInstrInfo.td · e36415ca
      Amy Kwan authored
      This patch aims to group together the `CRNotPat` multiclass instantiations
      within the `PPCInstrInfo.td` file.
      
      Integer instantiations of the multiclass are grouped together into a section,
      and the floating point patterns are separated into their own section.
      
      Differential Revision: https://reviews.llvm.org/D67975
      
      llvm-svn: 373869
      e36415ca
    • [X86][SSE] Remove resolveTargetShuffleInputs and use getTargetShuffleInputs directly. · 739c9f0b
      Simon Pilgrim authored
      Move the resolveTargetShuffleInputsAndMask call so that it runs after the shuffle mask combine, just before the undef/zero constant fold.
      
      llvm-svn: 373868
      739c9f0b
    • [X86][SSE] Don't merge known undef/zero elements into target shuffle masks. · 42010dc8
      Simon Pilgrim authored
      Replaces setTargetShuffleZeroElements with getTargetShuffleAndZeroables which reports the Zeroable elements but doesn't merge them into the decoded target shuffle mask (the merging has been moved up into getTargetShuffleInputs until we can get rid of it entirely).
      
      This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs but instead we cache them in a parallel Zeroable mask.
      
      llvm-svn: 373867
      42010dc8
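The difference between merging known-zero elements into the mask and keeping them in a parallel Zeroable mask can be sketched as follows. This is a simplified model under stated assumptions, not the LLVM implementation: `kSentinelZero`, `mergeZeroable`, `ShuffleInfo`, and `keepParallel` are hypothetical names, and the zeroable set is modelled as a 64-bit bitmask, so masks are assumed to have at most 64 elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sentinel meaning "this lane is known to be zero".
constexpr int kSentinelZero = -2;

// Merging approach: known-zero lanes overwrite the mask entries, so the
// original source index for those lanes is lost.
inline std::vector<int> mergeZeroable(std::vector<int> Mask,
                                      std::uint64_t Zeroable) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if ((Zeroable >> I) & 1u)
      Mask[I] = kSentinelZero;
  return Mask;
}

// Parallel approach: the decoded mask is left untouched and the zeroable
// lanes are reported alongside it.
struct ShuffleInfo {
  std::vector<int> Mask;   // decoded shuffle mask, unmodified
  std::uint64_t Zeroable;  // bit I set => lane I is known zero
};

inline ShuffleInfo keepParallel(std::vector<int> Mask,
                                std::uint64_t Zeroable) {
  return ShuffleInfo{std::move(Mask), Zeroable};
}
```

With the parallel form, a later simplification can still see which source element each lane referred to, which is what the merged form throws away.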
    • Implements CWG 1601 in [over.ics.rank/4.2] · 344df110
      Richard Smith authored
      Summary:
      The overload resolution for enums with a fixed underlying type has changed in the C++14 standard. This patch implements the new rule.
      
      Patch by Mark de Wever!
      
      Reviewers: rsmith
      
      Reviewed By: rsmith
      
      Subscribers: cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D65695
      
      llvm-svn: 373866
      344df110
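A small example of the kind of call this rule affects, assuming a compiler that implements CWG 1601. The names `E`, `pick`, and `callWithEnum` are hypothetical. Converting `e` to `char` (its fixed underlying type) and to `int` (the promoted underlying type) are both integral promotions; under the CWG 1601 rule the conversion to the underlying type is the better one, so the `char` overload should be chosen. A compiler without the rule is expected to reject the call as ambiguous.

```cpp
// Hypothetical enumeration with a fixed underlying type of char.
enum E : char { e };

// Two hypothetical overloads: one taking the underlying type, one taking
// the promoted underlying type.
inline int pick(char) { return 1; }
inline int pick(int) { return 2; }

// Both overloads are viable via an integral promotion; CWG 1601 ranks the
// promotion to the underlying type (char) as the better conversion.
inline int callWithEnum() { return pick(e); }
```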
    • [X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal · 570ae49d
      Craig Topper authored
      
      Summary:
      The default legalization for v16i64->v16i8 tries to create a multi-stage truncate, concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops, so it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpckl instructions are all single uop instructions.
      
      I tried to handle this by just custom splitting the v16i64->v16i8 shuffle, hoping that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly.
      
      Reviewers: RKSimon, spatel
      
      Reviewed By: RKSimon
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D68428
      
      llvm-svn: 373864
      570ae49d
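The "truncate all the way, then concatenate" strategy can be modelled with scalar C++. This is a sketch of the data flow only, not of the instruction selection: `Half64`, `Half8`, `truncateHalf`, and `truncate16` are hypothetical names, each 8-element array stands in for one split piece of the wide vector, and the final copy stands in for the unpack-style concatenation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Half64 = std::array<std::uint64_t, 8>;  // one 8 x i64 piece
using Half8 = std::array<std::uint8_t, 8>;    // one 8 x i8 piece

// Truncate every 64-bit element directly to 8 bits in a single step,
// instead of going through intermediate element widths.
inline Half8 truncateHalf(const Half64 &In) {
  Half8 Out{};
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = static_cast<std::uint8_t>(In[I]);
  return Out;
}

// Truncate both pieces fully, then concatenate the two 8 x i8 results into
// the final 16 x i8 value.
inline std::array<std::uint8_t, 16> truncate16(const Half64 &Lo,
                                               const Half64 &Hi) {
  const Half8 L = truncateHalf(Lo);
  const Half8 H = truncateHalf(Hi);
  std::array<std::uint8_t, 16> Out{};
  for (std::size_t I = 0; I < 8; ++I) {
    Out[I] = L[I];
    Out[I + 8] = H[I];
  }
  return Out;
}
```

In this model each piece is truncated exactly once, which mirrors the commit's goal of minimizing the number of truncate operations and leaving only cheap concatenations afterwards.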