  1. Oct 13, 2015
    • TransformUtils: Remove implicit ilist iterator conversions, NFC · 5b4c837c
      Duncan P. N. Exon Smith authored
      Continuing the work from last week to remove implicit ilist iterator
      conversions.  First related commit was probably r249767, with some more
      motivation in r249925.  This edition gets LLVMTransformUtils compiling
      without the implicit conversions.
      
      No functional change intended.
      
      llvm-svn: 250142
    • DAGCombiner: Don't stop finding better chain on 2 aliases · e5d9515f
      Matt Arsenault authored
      The comment says this was stopped because it was unlikely to be
      profitable. This is not true if you want to combine vector loads
      with multiple components.
      
      For a simple case that looks like:
      
      t0 = load t0 ...
      t1 = load t0 ...
      t2 = load t0 ...
      t3 = load t0 ...
      
      t4 = store t0:1, t0:1
        t5 = store t4, t1:0
          t6 = store t5, t2:0
            t7 = store t6, t3:0
      
      We want to get all of these stores onto a chain
      that is a TokenFactor of these N loads. This mostly
      solves the AMDGPU merge-stores.ll regressions
      with -combiner-alias-analysis for merging vector
      stores of vector loads.
      
      llvm-svn: 250138
    • x86: preserve flags when folding atomic operations · 986ed68e
      JF Bastien authored
      Summary:
      D4796 taught LLVM to fold some atomic integer operations into a single
      instruction. The pattern was unaware that the instructions clobbered
      flags.
      
      This patch adds the missing EFLAGS definition.
      
      Floating-point operations don't set flags, so the subsequent fadd
      optimization is correct. The same applies to the surrounding
      load/store optimizations.
      
      Reviewers: rsmith, rtrieu
      
      Subscribers: llvm-commits, reames, morisset
      
      Differential Revision: http://reviews.llvm.org/D13680
      
      llvm-svn: 250135
    • AMDGPU: Refactor isVGPRToSGPRCopy · f0d9e47d
      Matt Arsenault authored
      It should now correctly handle physical registers and make
      it easier to identify the other direction.
      
      llvm-svn: 250132
    • DAGCombiner: Combine extract_vector_elt from build_vector · 61dc235f
      Matt Arsenault authored
      This basic combine was surprisingly missing.
      AMDGPU legalizes many operations in terms of 32-bit vector components,
      so not doing this results in many extra copies and subregister extracts
      that need to be cleaned up later.
      
      InstCombine already does this for the hasOneUse case. The target hook
      exists to keep a handful of tests (e.g. ARM/vmov.ll) from regressing:
      without it, a vector materialize-repeated-immediate instruction would
      turn into a constant vector load followed by extra scalar copies from it.
      
      llvm-svn: 250129
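
      A toy sketch of the fold. It uses a made-up `Node` type rather than SelectionDAG's SDNode. When an extract_vector_elt reads a constant lane from a build_vector, the result is simply the matching build_vector operand. `Node`, `Op` and `combineExtractFromBuildVector` are hypothetical names. The target hook and any use-count checks in the real combine are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified node kinds (illustration only).
enum class Op { Constant, BuildVector, ExtractVectorElt, Other };

struct Node {
  Op op;
  long value = 0;                               // payload for Op::Constant
  std::vector<std::shared_ptr<Node>> operands;  // operand list
};

using NodeRef = std::shared_ptr<Node>;

// extract_vector_elt (build_vector e0, ..., eN-1), C  -->  eC
// Returns the replacement node, or the original node if the pattern does not match.
NodeRef combineExtractFromBuildVector(const NodeRef &n) {
  assert(n && "null node");
  if (n->op != Op::ExtractVectorElt || n->operands.size() != 2)
    return n;
  const NodeRef &vec = n->operands[0];
  const NodeRef &idx = n->operands[1];
  if (vec->op != Op::BuildVector || idx->op != Op::Constant)
    return n;  // unknown vector or variable index: leave it alone
  if (idx->value < 0 ||
      static_cast<std::size_t>(idx->value) >= vec->operands.size())
    return n;  // out-of-range index: not folded in this sketch
  return vec->operands[static_cast<std::size_t>(idx->value)];
}
```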
    • Assign correct edge weights to unwind destinations when lowering an invoke statement. · bf22f506
      Cong Hou authored
      When lowering an invoke statement, all unwind destinations are added directly as successors of the call-site block, but the weights of those new edges are not assigned properly; the default weight of 16 is used for all of them instead. This patch calculates the proper weights for those edges when collecting the unwind destinations.
      
      Differential revision: http://reviews.llvm.org/D13354
      
      llvm-svn: 250119
    • [SelectionDAG] Add common vector constant folding helper function · c8832fc2
      Simon Pilgrim authored
      We have a number of functions that implement constant folding of vectors (unary and binary ops) in nearly identical ways (and the differences don't appear to be critical).
      
      This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.
      
      After this initial patch I intend to begin enabling vector constant folding for a wider range of opcodes in SelectionDAG::getNode().
      
      Differential Revision: http://reviews.llvm.org/D13665
      
      llvm-svn: 250118
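
      A rough sketch of the shared-helper idea. It assumes vectors are modeled as lanes of `std::optional<int64_t>`, where an empty lane means non-constant. A single routine walks the lanes for any number of operands and applies a scalar folder, so unary and binary ops no longer need separate loops. `foldVectorLanes` and its types are illustrative only. This is not the signature of SelectionDAG::FoldConstantVectorArithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Each lane is either a known constant or unknown (std::nullopt).
using Lane = std::optional<std::int64_t>;
using VecConst = std::vector<Lane>;

// Scalar folder: given one lane value from each operand, produce the result lane (or fail).
using ScalarFold = std::function<Lane(const std::vector<std::int64_t> &)>;

// One helper for unary and binary ops: fold lane by lane and give up as soon
// as any lane is non-constant, the widths differ, or the scalar fold fails.
std::optional<VecConst> foldVectorLanes(const std::vector<VecConst> &ops,
                                        const ScalarFold &fold) {
  assert(!ops.empty() && "need at least one operand");
  const std::size_t numLanes = ops.front().size();
  for (const VecConst &op : ops)
    if (op.size() != numLanes)
      return std::nullopt;  // mismatched vector widths
  VecConst result;
  result.reserve(numLanes);
  for (std::size_t lane = 0; lane < numLanes; ++lane) {
    std::vector<std::int64_t> scalars;
    for (const VecConst &op : ops) {
      if (!op[lane])
        return std::nullopt;  // non-constant lane
      scalars.push_back(*op[lane]);
    }
    Lane folded = fold(scalars);
    if (!folded)
      return std::nullopt;
    result.push_back(folded);
  }
  return result;
}
```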
    • Fixed bugs in llvm-objdump while parsing Mach-O files from malformed archives · 90395545
      Kevin Enderby authored
      that caused aborts.  The aborts occurred because the characters of the ‘Size’
      field in the archive header were not all decimal digits.
      
      rdar://22983603
      
      llvm-svn: 250117
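
      A minimal sketch of this kind of validation. It assumes the usual Unix ar member header layout, where `Size` is a 10-byte ASCII decimal field padded with trailing spaces. Any non-digit is rejected instead of being trusted. `parseArSizeField` is a hypothetical helper, not the code from this commit. In this sketch the digits must also be left-aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Parse the 10-byte 'Size' field of an ar member header (decimal digits,
// right-padded with spaces). Returns std::nullopt for malformed input
// instead of aborting.
std::optional<std::uint64_t> parseArSizeField(std::string_view field) {
  assert(field.size() == 10 && "ar Size field is 10 bytes wide");
  const std::size_t last = field.find_last_not_of(' ');
  if (last == std::string_view::npos)
    return std::nullopt;  // all blanks: no size at all
  std::uint64_t size = 0;
  for (std::size_t i = 0; i <= last; ++i) {
    const char c = field[i];
    if (c < '0' || c > '9')
      return std::nullopt;  // non-decimal character (e.g. 'x', '-', embedded space)
    size = size * 10 + static_cast<std::uint64_t>(c - '0');
  }
  return size;  // at most 10 digits, so this cannot overflow 64 bits
}
```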
  2. Oct 12, 2015
    • Update the branch weight metadata in JumpThreading pass. · 3320bcd8
      Cong Hou authored
      In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). 
      
      Differential revision: http://reviews.llvm.org/D10979
      
      llvm-svn: 250089
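
      The scaling rule for BB->SuccBB, written out as a small function. This is a sketch only. `scaleThreadedWeight` is a hypothetical name, and plain integers stand in for LLVM's frequency and weight types. Multiplying the old weight by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB) equals multiplying by (Freq(BB->SuccBB) - Freq(PredBB->BB)) / Freq(BB->SuccBB), which keeps the arithmetic in integers.

```cpp
#include <cassert>
#include <cstdint>

// New BB->SuccBB weight after threading PredBB->BB->SuccBB:
//   weight * (1 - Freq(PredBB->BB) / Freq(BB->SuccBB))
// Overflow of weight * (difference) is not handled in this sketch.
std::uint64_t scaleThreadedWeight(std::uint64_t weight,
                                  std::uint64_t freqPredToBB,
                                  std::uint64_t freqBBToSucc) {
  assert(freqBBToSucc != 0 && "BB->SuccBB must have a non-zero frequency");
  if (freqPredToBB >= freqBBToSucc)
    return 0;  // all of BB->SuccBB's frequency came through PredBB->BB
  return weight * (freqBBToSucc - freqPredToBB) / freqBBToSucc;
}
```

      For example, with Freq(PredBB->BB) = 30 and Freq(BB->SuccBB) = 60, half of the edge's frequency disappears with the threaded path, so a weight of 100 becomes 50.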
    • Make Win64 localescape offsets FP relative instead of SP relative · 4a5f35c0
      Reid Kleckner authored
      We made them SP relative back in March (r233137) because that's the
      value the runtime passes to EH functions. With the new cleanuppad IR,
      funclets adjust their frame argument from SP to FP, so our offsets
      should now be FP-relative.
      
      llvm-svn: 250088
    • [x86] Fix wrong lowering of vsetcc nodes (PR25080). · b0fe4eb1
      Andrea Di Biagio authored
      Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
      assumption that for non-AVX512 targets, the source type and destination type
      of a type-legalized setcc node were always the same type.
      
      This assumption was unfortunately incorrect; the type legalizer is not always
      able to promote the return type of a setcc to the same type as the first
      operand of a setcc.
      
      In the case of a vsetcc node, the legalizer first checks if the first input
      operand has a legal type. If so, then it promotes the return type of the vsetcc
      to that same type. Otherwise, the return type is promoted to the 'next legal
      type', which, for vectors of MVT::i1, is always a 128-bit integer vector type.
      
      Example (-mattr=+avx):
      
        %0 = trunc <8 x i32> %a to <8 x i23>
        %1 = icmp eq <8 x i23> %0, zeroinitializer
      
      The initial selection DAG for the code above is:
      
      v8i1 = setcc t5, t7, seteq:ch
        t5: v8i23 = truncate t2
          t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
          t7: v8i32 = build_vector of all zeroes.
      
      The type legalizer would first check if 't5' has a legal type. If so, then it
      would reuse that same type to promote the return type of the setcc node.
      Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
      promote the return type of the setcc node. Consequently, the setcc return type
      is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
      following dag node:
        v8i16 = setcc t32, t25, seteq:ch
      
        where t32 and t25 are now values of type v8i32.
      
      Before this patch, function LowerVSETCC would have wrongly expanded the setcc
      to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
      instruction. In our case, ISel would have matched a VPCMPEQWrr:
        t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25
      
      However, t36 and t25 are both VR256, while the result is of class
      VR128. This inconsistency ended up causing the insertion of COPY instructions
      like this:
        %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3
      
      This is an invalid full copy (not a subregister copy).
      Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
      instruction" in the attempt to expand the malformed pseudo COPY instructions.
      
      This patch fixes the problem by adding the missing logic in LowerVSETCC to
      handle the corner case of a setcc with a 128-bit return type and a 256-bit operand type.
      
      This problem was originally reported by Dimitry as PR25080. It has been latent
      for a very long time. I have added the minimal reproducer from that Bugzilla
      report as test setcc-lowering.ll.
      
      Differential Revision: http://reviews.llvm.org/D13660
      
      llvm-svn: 250085
    • Add - and -= operators to BlockFrequency using saturating arithmetic. · 61e13de4
      Cong Hou authored
      llvm-svn: 250077
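
      A minimal sketch of what saturating `-` and `-=` mean for an unsigned frequency. It uses a hypothetical stand-in class `Freq` rather than llvm::BlockFrequency. Subtracting a larger value clamps to 0 instead of wrapping around to a huge number.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a block-frequency value type (not LLVM's class).
class Freq {
public:
  explicit Freq(std::uint64_t v = 0) : value(v) {}
  std::uint64_t get() const { return value; }

  // Saturating subtraction: clamps at 0 instead of wrapping around.
  Freq &operator-=(Freq rhs) {
    value = (value > rhs.value) ? value - rhs.value : 0;
    return *this;
  }

  // Implemented in terms of -= so both operators saturate identically.
  Freq operator-(Freq rhs) const {
    Freq result(*this);
    result -= rhs;
    return result;
  }

private:
  std::uint64_t value;
};
```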
    • combine predicates; NFCI · 0dc91b31
      Sanjay Patel authored
      llvm-svn: 250075
    • Turn const/const& into value type for BlockFrequency in functions of this... · 90c6cf8e
      Cong Hou authored
      Turn const/const& into value type for BlockFrequency in functions of this class. Also fix a naming issue. NFC.
      
      llvm-svn: 250074
    • AMDGPU: Register some more passes so -print-before works · 8c0ef8b3
      Matt Arsenault authored
      llvm-svn: 250071
    • Enable verifier after PeepholeOptimizer · 07a72bad
      Matt Arsenault authored
      No tests fail with this enabled, so I assume it was an accident
      that it wasn't enabled already.
      
      llvm-svn: 250070
    • Don't call PrepareEHLandingPad on non EH pads · 9abb3c06
      Reid Kleckner authored
      This was a minor bug in r249492. Calling PrepareEHLandingPad on a
      non-landingpad was a no-op, but it attempted to get the generic pointer
      register class, which apparently doesn't exist for some targets.
      
      llvm-svn: 250068
    • [WinEH] Remove CatchObjRecoverIdx · 99c1d13e
      David Majnemer authored
      CatchObjRecoverIdx was used by the old scheme; it is no longer
      relevant.
      
      llvm-svn: 250065
    • fix typos; NFC · b814ef1a
      Sanjay Patel authored
      llvm-svn: 250059
    • [Debug] Look through bitcasts to find argument registers · cca893ff
      Oliver Stannard authored
      On targets where f32 is not legal, we have to look through a BITCAST SDNode to
      find the register that an argument is stored in when emitting debug info, or we
      will not be able to emit a DW_AT_location for it.
      
      Differential Revision: http://reviews.llvm.org/D13005
      
      llvm-svn: 250056
    • [mips][FastISel] Clang-format switch statement. NFC. · 2a95f828
      Vasileios Kalintiris authored
      llvm-svn: 250053
    • fix capitalization; NFC · 53d1d8b7
      Sanjay Patel authored
      llvm-svn: 250049
    • Fix rename() sometimes failing if another process uses openFileForRead() · 7f68a716
      Greg Bedwell authored
      On Windows, fs::rename() could fail if another process was reading the
      file at the same time using fs::openFileForRead().  In most cases the user
      wouldn't notice, as fs::rename() will continue to retry for 2000ms.  Typically
      this is enough for the read to complete and a retry to succeed, but if the
      disk is being hit too hard then the response time might be longer than the
      retry time and the rename would fail with a permission error.
      
      Add FILE_SHARE_DELETE to the sharing flags for CreateFileW() in
      fs::openFileForRead() and try ReplaceFileW() prior to MoveFileExW()
      in fs::rename().
      
      Based on an initial patch by Edd Dawson!
      
      Differential Revision: http://reviews.llvm.org/D13647
      
      llvm-svn: 250046
    • [mips][ias] Implement macro expansion when bcc has an immediate where a register belongs. · b1ef88c1
      Daniel Sanders authored
      Summary: Fixes PR24915.
      
      Reviewers: vkalintiris
      
      Subscribers: emaste, seanbruno, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13533
      
      llvm-svn: 250042
    • [mips] Clean up most macro expansions to use the emit*() functions. · 2a5ce1ac
      Daniel Sanders authored
      Reviewers: vkalintiris
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13591
      
      llvm-svn: 250040
    • [mips] Handle undef when extracting subregs from FP64 registers. · 2fb8564d
      Daniel Sanders authored
      Summary:
      This removes unnecessary instructions when extracting from an undefined register
      and also fixes a crash for O32 when passing undef to a double argument
      held in integer registers.
      
      Reviewers: vkalintiris
      
      Subscribers: llvm-commits, zoran.jovanovic, petarj
      
      Differential Revision: http://reviews.llvm.org/D13467
      
      llvm-svn: 250039
    • GlobalOpt does not treat externally_initialized globals correctly · 939724cd
      Oliver Stannard authored
      GlobalOpt currently merges stores into the initialisers of internal,
      externally_initialized globals, but should not do so as the value of the global
      may change between the initialiser and any code in the module being run.
      
      llvm-svn: 250035
    • [ARM] Mark Swift MISched model as incomplete · fa4e994a
      James Molloy authored
      The Swift Machine Scheduler Model is incomplete. There are instructions
      missing which can trigger the "incomplete machine model" abort. This was
      observed when a downstream SchedMachineModel was added to the ARM
      target.
      
      Patch by Christof Douma!
      
      llvm-svn: 250033
    • [LoopVectorize] Shrink integer operations into the smallest type possible · 55d633bd
      James Molloy authored
      C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
      type (e.g. i32) whenever arithmetic is performed on them.
      
      For targets with native i8 or i16 operations, usually InstCombine can shrink
      the arithmetic type down again. However InstCombine refuses to create illegal
      types, so for targets without i8 or i16 registers, the lengthening and
      shrinking remains.
      
      Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
      their scalar equivalents do not, so during vectorization it is important to
      remove these lengthens and truncates when deciding the profitability of
      vectorization.
      
      The algorithm this uses starts at truncs and icmps, trawling their use-def
      chains until they terminate or instructions outside the loop are found (or
      unsafe instructions like inttoptr casts are found). If the use-def chains
      starting from different root instructions (truncs/icmps) meet, they are
      unioned. The demanded bits of each node in the graph are ORed together to form
      an overall mask of the demanded bits in the entire graph. The minimum bitwidth
      that graph can be truncated to is the bitwidth minus the number of leading
      zeroes in the overall mask.
      
      The intention is that this algorithm should "first do no harm", so it will
      never insert extra cast instructions. This is why the use-def graphs are
      unioned, so that subgraphs with different minimum bitwidths do not need casts
      inserted between them.
      
      This algorithm works hard to reduce compile time impact. DemandedBits are only
      queried if there are extends of illegal types and if a truncate to an illegal
      type is seen. In the general case, this results in a simple linear scan of the
      instructions in the loop.
      
      No non-noise compile time impact was seen on a clang bootstrap build.
      
      llvm-svn: 250032
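
      The minimum-bitwidth rule for the unioned graph, sketched for masks up to 64 bits wide. `minimumBitWidth` is a hypothetical helper; the real pass gets its masks from DemandedBits and works on APInt. The demanded-bits masks of every node are ORed together. The combined mask's leading zeros are then subtracted from the original width.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// OR the demanded-bits masks of all nodes in the (unioned) use-def graph,
// then subtract the leading zeros of the combined mask from the bit width.
unsigned minimumBitWidth(const std::vector<std::uint64_t> &demandedMasks,
                         unsigned bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 64 && "sketch handles up to 64 bits");
  std::uint64_t combined = 0;
  for (std::uint64_t m : demandedMasks)
    combined |= m;
  // Count leading zeros within the low `bitWidth` bits only.
  unsigned leadingZeros = 0;
  for (int bit = static_cast<int>(bitWidth) - 1; bit >= 0; --bit) {
    if (combined & (std::uint64_t{1} << bit))
      break;
    ++leadingZeros;
  }
  return bitWidth - leadingZeros;
}
```

      For instance, i8 values promoted to i32 whose graph only demands the low byte give a combined mask of 0xFF, so the i32 arithmetic can be shrunk back to 8 bits.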
    • [X86] Add XSAVE intrinsic family · 1db6d7af
      Amjad Aboud authored
      Add intrinsics for the
        XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
        XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
        XSAVEC instructions (XSAVEC/XSAVEC64)
        XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)
      
      Differential Revision: http://reviews.llvm.org/D13012
      
      llvm-svn: 250029
    • [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all... · a0922ed8
      Andrea Di Biagio authored
      [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set.
      
      This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
      chain of shuffles to be wrongly folded away when the combined shuffle mask has
      only one element.
      
      We may end up with a combined shuffle mask of one element as a result of
      multiple calls to function 'canWidenShuffleElements()'.
      Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
      the size of the elements being shuffled.
      For every pair of shuffle indices, function canWidenShuffleElements checks if
      indices refer to adjacent elements. If all pairs refer to "adjacent" elements
      then the shuffle mask is safely widened. As a consequence of widening, we end up
      with a new shuffle mask which is half the size of the original shuffle mask.
      
      The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
      indices. Function canWidenShuffleElements would combine each pair of
      SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
      logarithmic number of steps (4 in this case), the pshufb mask is simplified to
      a mask with only one index which is equal to SM_SentinelZero.
      
      Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
      of size one is always equivalent to an identity mask. So, the entire shuffle
      chain was just folded away as the combined shuffle mask was treated as a no-op
      mask.
      
      With this patch we now check if the only element of a combined shuffle mask is
      SM_SentinelZero. If so, we propagate a zero vector.
      
      Differential Revision: http://reviews.llvm.org/D13364
      
      llvm-svn: 250027
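
      A simplified sketch of the widening and of the corrected single-element check. It assumes a mask of ints in which `kSentinelZero` marks a zeroed lane; that value is picked for this sketch, and undef lanes are ignored. A 16-entry mask of all zero sentinels halves four times to a single zero sentinel, which must become a zero vector rather than an identity mask. `widenShuffleMask`, `widenFully` and `classifyWidenedMask` are hypothetical names, not the X86 lowering code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Sentinel for "this lane is zeroed" (value chosen for this sketch only).
constexpr int kSentinelZero = -2;

// Halve the mask by merging each index pair into one wider element. A pair
// widens if both lanes are zero, or if it selects adjacent source elements
// (2k, 2k+1), which become wide element k.
std::optional<std::vector<int>> widenShuffleMask(const std::vector<int> &mask) {
  if (mask.size() % 2 != 0)
    return std::nullopt;
  std::vector<int> widened;
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    const int lo = mask[i], hi = mask[i + 1];
    if (lo == kSentinelZero && hi == kSentinelZero)
      widened.push_back(kSentinelZero);
    else if (lo >= 0 && lo % 2 == 0 && hi == lo + 1)
      widened.push_back(lo / 2);
    else
      return std::nullopt;
  }
  return widened;
}

// Widen repeatedly until the mask cannot be widened further.
std::vector<int> widenFully(std::vector<int> mask) {
  while (mask.size() > 1) {
    auto next = widenShuffleMask(mask);
    if (!next)
      break;
    mask = std::move(*next);
  }
  return mask;
}

enum class ChainResult { Identity, ZeroVector, Other };

// The fixed check: a one-element mask is a no-op only if it selects element 0.
ChainResult classifyWidenedMask(const std::vector<int> &mask) {
  if (mask.size() != 1)
    return ChainResult::Other;
  if (mask[0] == kSentinelZero)
    return ChainResult::ZeroVector;
  return mask[0] == 0 ? ChainResult::Identity : ChainResult::Other;
}
```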
    • Test commit · d76b666a
      Zlatko Buljan authored
      llvm-svn: 250026
    • SCEV: Allow simple AddRec * Parameter products in delinearization · 374bce0c
      Tobias Grosser authored
      This patch also allows the -delinearize pass to delinearize expressions that do
      not have an outermost SCEVAddRec expression. The SCEV::delinearize
      infrastructure has allowed this since r240952, but the -delinearize pass had
      not been updated yet.
      
      llvm-svn: 250018
    • [X86] Use u8imm for the immediate type for all shift and rotate instructions.... · 8d2e6bc2
      Craig Topper authored
      [X86] Use u8imm for the immediate type for all shift and rotate instructions. This way the assembler will perform range checking. I believe this matches gas behavior.
      
      llvm-svn: 250016
    • [X86] Add support to assembler and MCInst lowering to use the other vmovq... · d6b661db
      Craig Topper authored
      [X86] Add support to assembler and MCInst lowering to use the other vmovq %xmmX, %xmmX encoding if it would be a shorter VEX encoding.
      
      llvm-svn: 250014
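
      A sketch of the operand-swapping decision, assuming the usual VEX rules. The 2-byte VEX prefix can only extend the ModRM.reg operand (via VEX.R). An xmm8-xmm15 register in ModRM.rm therefore needs VEX.B, which forces the 3-byte prefix. VMOVQ has two register-to-register forms:
      - 0F 7E, with the destination in reg and the source in rm.
      - 0F D6, with the source in reg and the destination in rm.
      Picking D6 when only the source is extended keeps the shorter prefix. `VmovqForm`, `needsThreeByteVex` and `chooseVmovqForm` are hypothetical names, not LLVM's MCInst lowering code.

```cpp
#include <cassert>

// The two register-to-register VMOVQ encodings (names invented for this sketch).
enum class VmovqForm { Form7E, FormD6 };

// An extended register (xmm8-xmm15) in ModRM.rm needs VEX.B, which only the
// 3-byte VEX prefix provides. 7E: reg = dst, rm = src. D6: reg = src, rm = dst.
bool needsThreeByteVex(VmovqForm form, unsigned dst, unsigned src) {
  const unsigned rmReg = (form == VmovqForm::Form7E) ? src : dst;
  return rmReg >= 8;
}

// Prefer the 7E form unless swapping to D6 avoids the 3-byte prefix.
VmovqForm chooseVmovqForm(unsigned dst, unsigned src) {
  assert(dst < 16 && src < 16 && "xmm0-xmm15 only in this sketch");
  if (needsThreeByteVex(VmovqForm::Form7E, dst, src) &&
      !needsThreeByteVex(VmovqForm::FormD6, dst, src))
    return VmovqForm::FormD6;  // moving the extended register into reg avoids VEX.B
  return VmovqForm::Form7E;
}
```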
    • [X86] Cleanup formatting a bit. NFC · 635e05df
      Craig Topper authored
      llvm-svn: 250013
    • [X86] Change the immediate for IN/OUT instructions to u8imm so the assembly... · 5be914ed
      Craig Topper authored
      [X86] Change the immediate for IN/OUT instructions to u8imm so the assembly parser will check the size.
      
      llvm-svn: 250012
    • [X86] Add some instruction aliases to get the assembly parser table to favor... · 95fffba2
      Craig Topper authored
      [X86] Add some instruction aliases to get the assembly parser table to favor arithmetic instructions with 8-bit immediates over the forms that implicitly use the ax/eax/rax.
      
      This allows us to remove the explicit code for working around the existing priority.
      
      llvm-svn: 250011
  3. Oct 11, 2015