  1. Mar 31, 2016
  2. Mar 30, 2016
    • Sanjay Patel's avatar
      fix typos · 43d4144d
      Sanjay Patel authored
      llvm-svn: 264933
      43d4144d
    • Aaron Ballman's avatar
      Silencing warnings from MSVC 2015 Update 2. All of these changes silence... · ef0fe1ee
      Aaron Ballman authored
      Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
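
The warning quoted above fires when a 32-bit value is shifted and the result is only afterwards widened to 64 bits. A minimal sketch of the usual remedy follows; `bitMask64` is a hypothetical helper written for illustration and is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the LLVM patch): returns a 64-bit mask with
// only bit `Bit` set.
inline std::uint64_t bitMask64(unsigned Bit) {
  assert(Bit < 64 && "shift amount must be smaller than the type width");
  // Writing `1 << Bit` would shift a 32-bit int and widen the result
  // afterwards, which is the situation C4334 reports. Widening the left
  // operand first makes the shift itself 64 bits wide.
  return std::uint64_t(1) << Bit;
}
```

Widening the operand rather than the result also keeps shifts by 32 or more well defined.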
      
      llvm-svn: 264929
      ef0fe1ee
    • Matt Arsenault's avatar
      LegalizeDAG: Don't replace vector store with integer if not legal · 46ba3165
      Matt Arsenault authored
      For the same reason as the corresponding load change.
      
      Note that ExpandStore is completely broken for non-byte sized element
      vector stores, but preserve the current broken behavior which has tests
      for it. The behavior should be the same, but now introduces a new typed
      store that is incorrectly split later rather than doing it directly.
      
      llvm-svn: 264928
      46ba3165
    • Matt Arsenault's avatar
      LegalizeDAG: Don't replace vector load with integer unless legal · a4b1b6ea
      Matt Arsenault authored
      On AMDGPU we want to be able to promote i64/f64 loads to v2i32.
      If the access is unaligned, this would conclude that since i64 is legal,
      it would convert it back to i64 and there is an endless legalization
      loop.
      
      Extract the logic for scalarizing the load into a new TargetLowering
      function, where this can also replace the custom function AMDGPU
      has for this.
      
      llvm-svn: 264927
      a4b1b6ea
    • David Majnemer's avatar
      [IndVarSimplify] Don't insert after a catchswitch · 5d518386
      David Majnemer authored
      Widening a PHI requires us to insert a trunc.
      The logical place for this trunc is in the same BB as the PHI.
      This is not possible if the BB is terminated by a catchswitch.
      
      This fixes PR27133.
      
      llvm-svn: 264926
      5d518386
    • Simon Pilgrim's avatar
      [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros · c49bd2ed
      Simon Pilgrim authored
      Fix for an issue introduced in D17297, where we broke early from the loop detecting consecutive loads, which could leave us thinking a consecutive load with zeros was possible.
      
      llvm-svn: 264922
      c49bd2ed
    • Justin Lebar's avatar
      [NVPTX] Make NVVMReflect a function pass. · e3804cc9
      Justin Lebar authored
      Summary:
      Currently it's a module pass.  Make it a function pass so that we can
      move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
      which only accepts function passes.
      
      Reviewers: rnk
      
      Subscribers: tra, llvm-commits, jholewinski
      
      Differential Revision: http://reviews.llvm.org/D18615
      
      llvm-svn: 264919
      e3804cc9
    • Justin Lebar's avatar
      [PassManager] Make PassManagerBuilder::addExtension take an std::function,... · 2fe13231
      Justin Lebar authored
      [PassManager] Make PassManagerBuilder::addExtension take an std::function, rather than a function pointer.
      
      Summary:
      This gives callers flexibility to pass lambdas with captures, which lets
      callers avoid the C-style void*-ptr closure style.  (Currently, callers
      in clang store state in the PassManagerBuilderBase arg.)
      
      No functional change, and the new API is backwards-compatible.
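
To illustrate why accepting a std::function helps, here is a small self-contained sketch. `ToyBuilder`, `ToyPipeline`, `addLegacyPass` and `buildExample` are hypothetical stand-ins invented for this example; they are not the real PassManagerBuilder API and only mimic its shape.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pass pipeline; it only records pass names.
struct ToyPipeline {
  std::vector<std::string> Passes;
};

// Hypothetical stand-in for a builder with an extension hook.
class ToyBuilder {
public:
  using ExtensionFn = std::function<void(ToyPipeline &)>;

  // Accepts plain functions and capturing lambdas alike.
  void addExtension(ExtensionFn Fn) { Extensions.push_back(std::move(Fn)); }

  // Runs every registered extension in registration order.
  void populate(ToyPipeline &P) const {
    for (const ExtensionFn &Fn : Extensions)
      Fn(P);
  }

private:
  std::vector<ExtensionFn> Extensions;
};

// A plain function still converts to std::function, so older callers that
// passed function pointers keep compiling.
inline void addLegacyPass(ToyPipeline &P) { P.Passes.push_back("legacy"); }

inline ToyPipeline buildExample(const std::string &Name) {
  ToyBuilder B;
  B.addExtension(addLegacyPass);
  // The lambda captures Name by value; no void* side channel is needed.
  B.addExtension([Name](ToyPipeline &P) { P.Passes.push_back(Name); });
  ToyPipeline P;
  B.populate(P);
  return P;
}
```

Calling `buildExample("captured")` in this sketch records the plain-function extension first and the capturing lambda second.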
      
      Reviewers: chandlerc
      
      Subscribers: joker.eph, cfe-commits
      
      Differential Revision: http://reviews.llvm.org/D18613
      
      llvm-svn: 264918
      2fe13231
    • Justin Bogner's avatar
      test: Remove a test for a transform that hasn't existed in 5 years. · a5a63787
      Justin Bogner authored
      The TailDup transform was removed in r138841 in 2011, along with most
      of the tests for it. This test, however, was missed. Probably because
      it had already been XFAIL'd for 3 years at that point (since r52243!)
      and continued to fail when the opt flag for -tailduplicate stopped
      being valid.
      
      llvm-svn: 264916
      a5a63787
    • Hal Finkel's avatar
      Add a copy constructor to StringMap · 38bf13d0
      Hal Finkel authored
      There is code under review that requires StringMap to have a copy constructor,
      and this makes StringMap more consistent with our other containers (like
      DenseMap) that have copy constructors.
      
      Differential Revision: http://reviews.llvm.org/D18506
      
      llvm-svn: 264906
      38bf13d0
    • Hal Finkel's avatar
      [LoopVectorize] Don't vectorize loops when everything will be scalarized · 2e0ff2b2
      Hal Finkel authored
      This change prevents the loop vectorizer from vectorizing when all of the vector
      types it generates will be scalarized. I've run into this problem on the PPC's QPX
      vector ISA, which only holds floating-point vector types. The loop vectorizer
      will, however, happily vectorize loops with purely integer computation. Here's
      an example:
      
        LV: The Smallest and Widest types: 32 / 32 bits.
        LV: The Widest register is: 256 bits.
        LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
        LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
        LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
        LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
        LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
        LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
        LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
        LV: Scalar loop costs: 3.
        LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
        LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
        LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
        LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
        LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
        LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
        LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
        LV: Vector loop of width 2 costs: 2.
        LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
        LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
        LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
        LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
        LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
        LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
        LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
        LV: Vector loop of width 4 costs: 1.
        ...
        LV: Selecting VF: 8.
        LV: The target has 32 registers
        LV(REG): Calculating max register usage:
        LV(REG): At #0 Interval # 0
        LV(REG): At #1 Interval # 1
        LV(REG): At #2 Interval # 2
        LV(REG): At #4 Interval # 1
        LV(REG): At #5 Interval # 1
        LV(REG): VF = 8
      
      The problem is that the cost model here is not wrong, exactly. Since all of
      these operations are scalarized, their costs (aside from the uniform ones) are
      indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
      picked, the lower the relative overhead from the loop itself (and the
      induction-variable update and check), and so in a sense, picking the largest VF
      here is the right thing to do.
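
The loop costs in the log above can be reproduced with a little arithmetic. The sketch below assumes that the reported loop cost is the sum of the per-instruction costs divided by VF, truncated to an integer when printed; that reading is inferred from the numbers in the log, and `loopCostPerLane` is a hypothetical helper, not vectorizer code.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: sums the per-instruction cost estimates for one VF and
// divides by VF, truncating toward zero as integer division does.
inline int loopCostPerLane(const std::vector<int> &InstCosts, int VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  int Total = std::accumulate(InstCosts.begin(), InstCosts.end(), 0);
  return Total / VF;
}
```

With the per-instruction costs from the log, VF 1 gives (1 + 1 + 1) / 1 = 3, VF 2 gives (2 + 1 + 1) / 2 = 2, and VF 4 gives (4 + 1 + 1) / 4 = 1 after truncation. The scalarized store scales with VF while the induction update and compare stay at 1 each, so the per-lane cost keeps falling as VF grows.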
      
      The problem is that vectorizing like this, where all of the vectors will be
      scalarized in the backend, isn't really vectorizing, but rather interleaving.
      By itself, this would be okay, but then the vectorizer itself also interleaves,
      and that's where the problem manifests itself. There aren't actually enough
      scalar registers to support the normal interleave factor multiplied by a factor
      of VF (8 in this example). In other words, the problem with this is that our
      register-pressure heuristic does not account for scalarization.
      
      While we might want to improve our register-pressure heuristic, I don't think
      this is the right motivating case for that work. Here we have a more-basic
      problem: The job of the vectorizer is to vectorize things (interleaving aside),
      and if the IR it generates won't generate any actual vector code, then
      something is wrong. Thus, if every type looks like it will be scalarized (i.e.
      will be split into VF or more parts), then don't consider that VF.
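
The rule in the last sentence can be sketched as a simple predicate. This is an illustration only: `allTypesScalarized` and its `PartsPerType` input are hypothetical, and the way the real vectorizer obtains the number of parts per type is not shown here.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical predicate: PartsPerType[i] is the number of pieces that the
// i-th vector type would be split into by the target at this VF. Returns true
// when every type is split into VF or more parts, i.e. when nothing would
// remain a real vector.
inline bool allTypesScalarized(const std::vector<unsigned> &PartsPerType,
                               unsigned VF) {
  // Assumption of this sketch: with no types to inspect there is no evidence
  // of scalarization, so the VF is not rejected.
  if (PartsPerType.empty())
    return false;
  return std::all_of(PartsPerType.begin(), PartsPerType.end(),
                     [VF](unsigned Parts) { return Parts >= VF; });
}
```

A caller would skip any candidate VF for which this predicate returns true and keep evaluating the remaining ones.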
      
      This is not a problem specific to PPC/QPX, however. The problem comes up under
      SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
      reduced test case from PR26837 to this commit.
      
      Differential Revision: http://reviews.llvm.org/D18537
      
      llvm-svn: 264904
      2e0ff2b2
    • Rong Xu's avatar
      [PGO] PGOFuncName in LTO optimizations · b534166f
      Rong Xu authored
      PGOFuncNames are used as the key to retrieve the Function definition from the
      MD5 stored in the profile. For internal-linkage functions, we prefix the source
      file name to the PGOFuncNames. LTO's internalization privatizes many global-linkage
      symbols. This happens after value profile annotation, but those internal-linkage
      functions should not have a source prefix. To differentiate compiler-generated
      internal symbols from original ones, PGOFuncName metadata is
      created and attached to the original internal symbols in the value profile
      annotation step. If a symbol does not have the metadata, its original linkage
      must be non-internal.
      
      Also add a new map that maps PGOFuncName's MD5 value to the function definition.
      
      Differential Revision: http://reviews.llvm.org/D17895
      
      llvm-svn: 264902
      b534166f
    • Reid Kleckner's avatar
      [cmake] Instead of testing char16_t for MSVC compat, directly ask cl.exe its version · 88ad225e
      Reid Kleckner authored
      Credit to Aaron Ballman for thinking of this.
      
      llvm-svn: 264886
      88ad225e
    • Teresa Johnson's avatar
      Restore "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly" · 83c517c4
      Teresa Johnson authored
      This restores commit 264869, with a fix for Windows bots to properly
      escape '\' in the path when serializing out. Added test.
      
      llvm-svn: 264884
      83c517c4
    • Chad Rosier's avatar
      [AArch64] Fix warnings pointed out by Hal. · f7ac5f28
      Chad Rosier authored
      llvm-svn: 264882
      f7ac5f28
    • Reid Kleckner's avatar
      [cmake] Add -fms-compatibility-version=19 when clang-cl gives errors about char16_t · 2b3db2c1
      Reid Kleckner authored
      What we are really trying to do here is to figure out if we are using
      the 2015 STL. Unfortunately, so far as I know the MSVC STL does not
      define a version macro that we can check directly. Instead I wrote a
      check to see if char16_t works.
      
      llvm-svn: 264881
      2b3db2c1
    • Reid Kleckner's avatar
      [cmake] Allow EH usage with clang-cl · 8c18019d
      Reid Kleckner authored
      llvm-svn: 264880
      8c18019d
    • Rong Xu's avatar
      [PGO] Use ArrayRef in annotateValueSite() · 311ada11
      Rong Xu authored
      Use an ArrayRef parameter in annotateValueSite() instead of an array
      and its size.
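
The benefit of passing one view object instead of a pointer and a separate length can be sketched without LLVM headers. `ToyArrayRef` below is a hypothetical, heavily reduced stand-in for llvm::ArrayRef, and `sumCounts` is an invented consumer; neither reflects the actual annotateValueSite() signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, minimal non-owning view over a contiguous range of T.
template <typename T> class ToyArrayRef {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  ToyArrayRef(const T *D, std::size_t N) : Data(D), Length(N) {}
  // A C array converts implicitly, so its size can never be passed wrongly.
  template <std::size_t N>
  ToyArrayRef(const T (&Arr)[N]) : Data(Arr), Length(N) {}
  // A std::vector converts implicitly as well.
  ToyArrayRef(const std::vector<T> &V) : Data(V.data()), Length(V.size()) {}

  const T *begin() const { return Data; }
  const T *end() const { return Data + Length; }
  std::size_t size() const { return Length; }
};

// Invented consumer: one parameter replaces a (pointer, length) pair.
inline std::uint64_t sumCounts(ToyArrayRef<std::uint64_t> Counts) {
  std::uint64_t Total = 0;
  for (std::uint64_t C : Counts)
    Total += C;
  return Total;
}
```

The view does not own its elements, so the underlying array or vector must outlive the call.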
      
      Differential Revision: http://reviews.llvm.org/D18568
      
      llvm-svn: 264879
      311ada11
    • Tom Stellard's avatar
      AMDGPU/SI: Improve MachineSchedModel definition · 1d5e6d4b
      Tom Stellard authored
      This patch contains a few improvements to the model, including:
      
      - Using a single resource with a defined buffer size for each memory unit.
      - Setting the IssueWidth correctly.
      - Fixing latency values for memory instructions.
      
      shader-db stats:
      
      16429 shaders in 3231 tests
      Totals:
      SGPRS: 318232 -> 312328 (-1.86 %)
      VGPRS: 208996 -> 209346 (0.17 %)
      Code Size: 7147044 -> 7166440 (0.27 %) bytes
      LDS: 83 -> 83 (0.00 %) blocks
      Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
      Max Waves: 49182 -> 49243 (0.12 %)
      Wait states: 0 -> 0 (0.00 %)
      
      Differential Revision: http://reviews.llvm.org/D18453
      
      llvm-svn: 264877
      1d5e6d4b
    • Tom Stellard's avatar
      AMDGPU/SI: Enable lanemask tracking in misched · 0bc954e3
      Tom Stellard authored
      Summary:
      This results in higher register usage, but should make it easier for
      the compiler to hide latency.
      
      This patch is a prerequisite for some more scheduler improvements, and I
      think the increased register usage with this patch is acceptable, because
      when combined with the scheduler improvements, the total register usage
      will decrease.
      
      shader-db stats:
      
      2382 shaders in 478 tests
      Totals:
      SGPRS: 48672 -> 49088 (0.85 %)
      VGPRS: 34148 -> 34847 (2.05 %)
      Code Size: 1285816 -> 1289128 (0.26 %) bytes
      LDS: 28 -> 28 (0.00 %) blocks
      Scratch: 492544 -> 573440 (16.42 %) bytes per wave
      Max Waves: 6856 -> 6846 (-0.15 %)
      Wait states: 0 -> 0 (0.00 %)
      
      Depends on D18451
      
      Reviewers: nhaehnle, arsenm
      
      Subscribers: arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D18452
      
      llvm-svn: 264876
      0bc954e3
    • Jonas Paulsson's avatar
      [SystemZ] Add nop and nopr InstAliases. · f7612338
      Jonas Paulsson authored
      For compatibility with GAS, nop and nopr are recognized as aliases for
      bc and bcr, respectively. A mask of 0 turns these instructions
      effectively into no-operations.
      
      Reviewed by Ulrich Weigand.
      
      llvm-svn: 264875
      f7612338
    • Nirav Dave's avatar
      Remove HasFnAttribute guards to getFnAttribute calls · 8dd66e57
      Nirav Dave authored
      These checks are redundant and can be removed
      
      Reviewers: hans
      
      Subscribers: llvm-commits, mzolotukhin
      
      Differential Revision: http://reviews.llvm.org/D18564
      
      llvm-svn: 264872
      8dd66e57
    • Teresa Johnson's avatar
      Revert "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly" · 20beeea2
      Teresa Johnson authored
      This reverts commit r264869. I am seeing Windows bot failures due to the
      "\" in the path being mishandled (it seems to be interpreted wrongly at
      some point, and llvm-as | llvm-dis is yielding some junk
      characters). Need to investigate.
      
      llvm-svn: 264871
      20beeea2
    • Simon Pilgrim's avatar
      [X86][XOP] BITREVERSE lowering using VPPERM · b87ffe85
      Simon Pilgrim authored
      XOP's VPPERM can apply extra per-byte 'permute operations' as part of shuffling the bytes of a 128-bit vector. In this case we use it to perform BITREVERSE in a single instruction.
      
      llvm-svn: 264870
      b87ffe85
    • Teresa Johnson's avatar
      [ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly · 832a6790
      Teresa Johnson authored
      Summary:
      This change serializes out and in the SourceFileName to LLVM assembly
      so that it is preserved through "llvm-dis | llvm-as". This is
      necessary to ensure that the global identifiers created for local values
      in the module summary index are the same even if the bitcode is
      streamed out and read back from LLVM assembly.
      
      Serializing the summary itself to LLVM assembly is in progress.
      
      Reviewers: joker.eph
      
      Subscribers: llvm-commits, joker.eph
      
      Differential Revision: http://reviews.llvm.org/D18588
      
      llvm-svn: 264869
      832a6790
    • Simon Pilgrim's avatar
      [X86][SSE] Test the legalization of vector comparison results · 9490b56a
      Simon Pilgrim authored
      We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.
      
      llvm-svn: 264867
      9490b56a
    • Benjamin Kramer's avatar
      [NVPTX] Avoid temporary std::string and make single-use function local to the cpp file. · 9415e06d
      Benjamin Kramer authored
      No functionality change intended.
      
      llvm-svn: 264861
      9415e06d
    • Marianne Mailhot-Sarrasin's avatar
      gold-plugin: Fixed typo in an error message. · a5a750ea
      Marianne Mailhot-Sarrasin authored
      llvm-svn: 264860
      a5a750ea
    • Simon Pilgrim's avatar
      [X86][SSE] Added tests for clearing upper bits of vector elements · ab305a9d
      Simon Pilgrim authored
      Patterns based on PR6455
      
      llvm-svn: 264857
      ab305a9d
    • James Molloy's avatar
      [VectorUtils] Don't try and truncate PHIs to a smaller bitwidth · 8e46cd05
      James Molloy authored
      We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.
      
      However, we weren't bailing out in all the places we should have, and we ended up returning a PHI to be truncated, which has caused PR27018.
      
      This fixes PR27018.
      
      llvm-svn: 264852
      8e46cd05
    • Chandler Carruth's avatar
      [x86] Fix a horrible bug in our lowering of x86 floating point atomic · 8e06a10d
      Chandler Carruth authored
      operations.
      
      Specifically, we had code that tried to badly approximate reconstructing
      all of the possible variations on addressing modes in two x86
      instructions based on those in one pseudo instruction. This is not the
      first bug uncovered with doing this, so stop doing it altogether.
      Instead generically and pedantically copy every operand from the address
      over to both new instructions, and strip kill flags from any register
      operands.
      
      This fixes a subtle bug seen in the wild where we would mysteriously
      drop parts of the addressing mode, causing for example the index
      argument in the added test case to just be completely ignored.
      
      Hypothetically, this was an extremely bad miscompile because it actually
      caused a predictable and exploitable write of a 64-bit quantity to an
      unintended offset (the first element of the array instead of whatever
      other element was intended). As a consequence, in theory this could even
      have introduced security vulnerabilities.
      
      However, this was only something that could happen with an atomic
      floating point add. No other operation could trigger this bug, so it
      seems extremely unlikely to have occurred widely in the wild.
      
      But it did in fact occur, and frequently in scientific applications
      which were using relaxed atomic updates of a floating point value after
      adding a delta. Those would end up being quite badly miscompiled by
      LLVM, which is how we found this. Of course, this often looks like
      a race condition in the code, but it was actually a miscompile.
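
For orientation, the kind of source pattern described above can be sketched as follows. This is an assumed reconstruction from the description (a relaxed atomic load, an add of a delta, and a relaxed atomic store to an indexed element); `relaxedAddAt` is a hypothetical name, and whether a particular compiler version selects the affected lowering for it is not claimed here.

```cpp
#include <atomic>

// Hypothetical example: updates Values[Index] by Delta using relaxed atomic
// accesses. The load/add/store sequence is not a single atomic
// read-modify-write; only the individual load and store are atomic.
inline void relaxedAddAt(std::atomic<double> *Values, unsigned Index,
                         double Delta) {
  std::atomic<double> &Slot = Values[Index];
  double Old = Slot.load(std::memory_order_relaxed);
  Slot.store(Old + Delta, std::memory_order_relaxed);
}
```

In correctly compiled code only `Values[Index]` changes. The miscompile described in this commit dropped the index, so the write landed on the first element instead.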
      
      I suspect that this whole RELEASE_FADD thing was a complete mistake.
      There is no such operation, and I worry that anything other than add
      will get remarkably worse code generation. But that's not for this
      change....
      
      llvm-svn: 264845
      8e06a10d
    • Craig Topper's avatar
      [CodeGen] Mark EVT::getExtendedSizeInBits() as LLVM_READONLY. · e9ff01b2
      Craig Topper authored
      I think I had tried this a long time back and some bots failed. Hoping that was with an older gcc and maybe now it will work.
      
      llvm-svn: 264840
      e9ff01b2
    • Jingyue Wu's avatar
      [docs] Add gpucc publication and tutorial. · f190ed43
      Jingyue Wu authored
      llvm-svn: 264839
      f190ed43
    • Duncan P. N. Exon Smith's avatar
      IR: Constify LLVMContext::discardValueNames, NFC · 90717299
      Duncan P. N. Exon Smith authored
      llvm-svn: 264823
      90717299