  1. Jan 08, 2018
  2. Jan 07, 2018
    • Davide Italiano's avatar
      Revert "[SCCP] Manually fold branches on undef." · e15bffe9
      Davide Italiano authored
      I thought this was responsible for PR35723, but I was
      wrong; the issue lies elsewhere. Revert while I debug.
      
      llvm-svn: 321975
      e15bffe9
    • Davide Italiano's avatar
      [SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()). · 4c39758a
      Davide Italiano authored
      The approach was never discussed, I wasn't able to reproduce this
      non-determinism, and the original author went AWOL.
      After a discussion on the ML, Philip suggested reverting this.
      
      llvm-svn: 321974
      4c39758a
    • Craig Topper's avatar
      [X86] Revert accidental change to CMakeLists.txt in r321952 · e9f44e1b
      Craig Topper authored
      I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change.
      
      llvm-svn: 321971
      e9f44e1b
    • Zvi Rackover's avatar
      X86 Tests: Add Tests for PMADDWD selection. NFC. · 93b8bd49
      Zvi Rackover authored
      Support for ISel to be added.
      
      llvm-svn: 321970
      93b8bd49
    • Simon Pilgrim's avatar
      [DAG] Fix for Bug PR34620 - Allow SimplifyDemandedBits to look through bitcasts · 998180da
      Simon Pilgrim authored
      Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This helps simplify cases where bitcasts of constants generated during or after legalization can't be folded away and therefore weren't picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering an lshr <16 x i8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand.
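
The known-bits reasoning described above can be illustrated with a minimal sketch on a single 8-bit lane. It is not the SelectionDAG code: the helper names `known_zero_after_lshr` and `and_is_redundant` are hypothetical, and the shift amount and masks are chosen only for illustration. The idea is that once the mask constant is visible (for example, through a bitcast), an AND that only clears bits already known to be zero can be dropped.

```python
MASK8 = 0xFF  # model one i8 lane of the vector


def known_zero_after_lshr(known_zero, shift):
    """Return the known-zero bits of an 8-bit value after a logical right shift.

    The existing known-zero bits move right, and the top `shift` bits
    become zero because a logical shift fills with zeros.
    """
    shifted_in = (MASK8 << (8 - shift)) & MASK8
    return ((known_zero >> shift) | shifted_in) & MASK8


def and_is_redundant(known_zero, and_mask):
    """An AND is a no-op if every bit it would clear is already known zero."""
    cleared_by_and = ~and_mask & MASK8
    return (cleared_by_and & ~known_zero & MASK8) == 0


# After shifting right by 3, the top three bits are known zero, so masking
# with 0x1F changes nothing, while masking with 0x0F still clears bit 4.
kz = known_zero_after_lshr(0x00, 3)
print(hex(kz), and_is_redundant(kz, 0x1F), and_is_redundant(kz, 0x0F))
```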
      
      Committed on behalf of @sameconrad (Sam Conrad)
      
      Differential Revision: https://reviews.llvm.org/D41643
      
      llvm-svn: 321969
      998180da
    • Craig Topper's avatar
      [X86] Remove unneeded code from combineGatherScatter that used to delete... · c1ec57c3
      Craig Topper authored
      [X86] Remove unneeded code from combineGatherScatter that used to delete SIGN_EXTEND_INREG nodes created during legalization of v2i1/v4i1 masks on KNL.
      
      v2i1/v4i1 are now legal on KNL so no sign_extend_inreg is generated.
      
      llvm-svn: 321968
      c1ec57c3
    • Craig Topper's avatar
      [X86] Make v2i1 and v4i1 legal types without VLX · d58c1655
      Craig Topper authored
      Summary:
      There are a few oddities that occur due to v1i1, v8i1, and v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.
      
      It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.
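
To make the "widen with 0s" idea concrete, here is a small sketch that models a mask register as an integer bitfield. The functions `widen_mask` and `masked_select` are hypothetical helpers for illustration, not LLVM or X86 APIs. Zero-filling the upper mask bits means a wider operation leaves the extra lanes untouched.

```python
def widen_mask(mask, narrow_lanes, wide_lanes):
    """Widen a narrow lane mask to wide_lanes bits, zero-filling the top.

    Only the low narrow_lanes bits of `mask` are meaningful; anything above
    them is discarded so the extra lanes are always disabled.
    """
    if narrow_lanes > wide_lanes:
        raise ValueError("cannot widen to a smaller lane count")
    return mask & ((1 << narrow_lanes) - 1)


def masked_select(mask, lanes, on_true, on_false):
    """Per lane, pick on_true[i] when mask bit i is set, else on_false[i]."""
    return [on_true[i] if (mask >> i) & 1 else on_false[i] for i in range(lanes)]


# A 4-lane mask with stray upper bits, widened for a 16-lane operation.
wide = widen_mask(0xFFFA, 4, 16)
print(bin(wide))
print(masked_select(wide, 16, [1] * 16, [0] * 16))
```

Only lanes 1 and 3 are selected here; the twelve upper lanes stay on the `on_false` side because their mask bits were filled with zeros.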
      
      This also fixes an issue where we couldn't properly implement a masked vextractf32x4 from zmm to xmm.
      
      We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
      
      I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
      
      There's definitely room for improvement with some follow up patches.
      
      Reviewers: RKSimon, zvi, guyblank
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41560
      
      llvm-svn: 321967
      d58c1655
    • Hal Finkel's avatar
      [LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of LoopVectorize.cpp · 0f1314c5
      Hal Finkel authored
      Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.
      
      VPlanBuilder.h is renamed to LoopVectorizationPlanner.h.

      The LoopVectorizationPlanner class is moved from LoopVectorize.cpp to
      LoopVectorizationPlanner.h.

      The LoopVectorizationCostModel::VectorizationFactor class is moved to
      LoopVectorizationPlanner.h (used by the planner class). This needs further
      streamlining work in later patches, so all I did was take it out of the
      CostModel class and move it to the header file.

      The callback function had to stay inside LoopVectorize.cpp since it calls an
      InnerLoopVectorizer member function declared in it.

      Next steps: make InnerLoopVectorizer, LoopVectorizationCostModel, and other
      classes more modular and more aligned with the VPlan direction, in small
      increments.
      
      Previous step was: r320900 (https://reviews.llvm.org/D41045)
      
      Patch by Hideki Saito, thanks!
      
      Differential Revision: https://reviews.llvm.org/D41420
      
      llvm-svn: 321962
      0f1314c5
    • Florian Hahn's avatar
      [CodeExtractor] Use subset of function attributes for extracted function. · 55be37e7
      Florian Hahn authored
      In addition to target-dependent attributes, we can also preserve a
      white-listed subset of target independent function attributes. The white-list
      excludes problematic attributes, most prominently:
      
      * attributes related to memory accesses, as alloca instructions
        could be moved in/out of the extracted block
      
      * control-flow dependent attributes, like no_return or thunk, as the
        relevant instructions might or might not get extracted.
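
The white-list approach described above can be sketched as a simple filter. This is an illustration only: the attribute names in `ILLUSTRATIVE_ALLOWED` are assumptions chosen for the example, not the actual list from the patch, and `attributes_for_extracted` is a hypothetical helper rather than a CodeExtractor function.

```python
# Assumed, illustrative white-list; the real set is defined in the patch.
ILLUSTRATIVE_ALLOWED = frozenset({"nounwind", "optsize", "minsize"})


def attributes_for_extracted(original_attrs, allowed=ILLUSTRATIVE_ALLOWED):
    """Keep only the attributes that are safe to copy to the extracted function.

    Anything not explicitly allowed is dropped, so memory-related or
    control-flow dependent attributes never leak onto the new function.
    """
    return [attr for attr in original_attrs if attr in allowed]


original = ["nounwind", "readonly", "noreturn", "optsize", "thunk"]
print(attributes_for_extracted(original))
```

Using an allow-list rather than a deny-list means that an attribute nobody considered is dropped by default, which is the conservative choice when the extracted region may differ from the original function.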
      
      Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
      propagated.
      
      
      Reviewers: efriedma, davidxl, davide, silvas
      
      Reviewed By: efriedma
      
      Differential Revision: https://reviews.llvm.org/D41334
      
      llvm-svn: 321961
      55be37e7
    • Craig Topper's avatar
      [PowerPC] Add an ISD::TRUNCATE to the legalization for ppc_is_decremented_ctr_nonzero · d461aefe
      Craig Topper authored
      Summary:
      I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.
      
      There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.
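
A toy model of the two cases above, assuming a tiny node type and a hypothetical `get_truncate` helper (this is not the SelectionDAG API): when the source and destination types already match, no truncate node is created and the operand is returned unchanged, which is the folding behavior the message attributes to SelectionDAG::getNode.

```python
from collections import namedtuple

# Minimal stand-in for a DAG node: an opcode, a value type, and one operand.
Node = namedtuple("Node", ["opcode", "vtype", "operand"])


def get_truncate(operand, to_type):
    """Build a truncate to to_type, folding it away when the types match."""
    if operand.vtype == to_type:
        return operand  # same-type truncate is a no-op, so reuse the operand
    return Node("truncate", to_type, operand)


# Case 1: i1 is legal, so the truncate is i1 -> i1 and disappears.
ctr_i1 = Node("is_decremented_ctr_nonzero", "i1", None)
print(get_truncate(ctr_i1, "i1") is ctr_i1)

# Case 2: after promotion both sides are i32, so it disappears as well.
ctr_i32 = Node("is_decremented_ctr_nonzero", "i32", None)
print(get_truncate(ctr_i32, "i32") is ctr_i32)
```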
      
      With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.
      
      Reviewers: hfinkel
      
      Reviewed By: hfinkel
      
      Subscribers: nemanjai, llvm-commits, kbarton
      
      Differential Revision: https://reviews.llvm.org/D41654
      
      llvm-svn: 321959
      d461aefe
    • Craig Topper's avatar
      a21f5511
    • Craig Topper's avatar
      [X86] Correct the load folding flags for xmm fp->mmx conversion instructions. · d0859a03
      Craig Topper authored
      The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.
      
      llvm-svn: 321956
      d0859a03
    • Craig Topper's avatar
      [X86] Don't put any EVEX_B instructions in the tablegen generated load folding tables. · 85657d59
      Craig Topper authored
      EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent.
      
      llvm-svn: 321954
      85657d59
    • Craig Topper's avatar
      89293a2a
    • Craig Topper's avatar
      [X86] Add some 8 and 16-bit instructions to the load folding tables. · a124ab10
      Craig Topper authored
      llvm-svn: 321952
      a124ab10
    • Craig Topper's avatar
      [X86] Add EVEX vcvtph2ps to the load folding tables. · 11aede13
      Craig Topper authored
      llvm-svn: 321951
      11aede13
    • Craig Topper's avatar
      [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex... · 40cc8338
      Craig Topper authored
      [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex versions of cvtps2ph to the store folding tables.
      
      The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and it gets used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.
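
The hazard can be reproduced with a byte-level sketch. It models a 128-bit register and a 16-byte stack slot as byte strings; `store_low64`, `store_full128`, and `reload128` are hypothetical helpers for illustration, not real instructions or LLVM functions.

```python
def store_low64(slot, reg):
    """Model the folded store: only the low 8 bytes of the slot are written."""
    slot[:8] = reg[:8]


def store_full128(slot, reg):
    """Model an ordinary spill: all 16 bytes of the slot are written."""
    slot[:] = reg


def reload128(slot):
    """Model the reload: all 16 bytes of the slot are read back."""
    return bytes(slot)


# Register after the xmm->xmm conversion: 64 bits of data, upper 64 bits zero.
reg = bytes(range(1, 9)) + bytes(8)

# The stack slot holds leftover data from earlier use.
folded_slot = bytearray(b"\xAA" * 16)
store_low64(folded_slot, reg)
print(reload128(folded_slot) == reg)  # upper half is stale, not zero

plain_slot = bytearray(b"\xAA" * 16)
store_full128(plain_slot, reg)
print(reload128(plain_slot) == reg)  # zeros in the upper half are preserved
```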
      
      llvm-svn: 321950
      40cc8338
    • Craig Topper's avatar
      [X86] Add CMP8ri8 to load folding tables. · 8fa800b8
      Craig Topper authored
      llvm-svn: 321949
      8fa800b8
  3. Jan 06, 2018