  1. Jan 29, 2014
    • Chandler Carruth's avatar
      [LPM] Fix PR18643, another scary place where loop transforms failed to · d4be9dc0
      Chandler Carruth authored
      preserve loop simplify of enclosing loops.
      
      The problem here starts with LoopRotation which ends up cloning code out
      of the latch into the new preheader it is building. This can create
      a new edge from the preheader into the exit block of the loop which
      breaks LoopSimplify form. The code tries to fix this by splitting the
      critical edge between the latch and the exit block to get a new exit
      block that only the latch dominates. This sadly isn't sufficient.
      
      The exit block may be an exit block for multiple nested loops. When we
      clone an edge from the latch of the inner loop to the new preheader
      being built in the outer loop, we create an exiting edge from the outer
      loop to this exit block. Despite breaking the LoopSimplify form for the
      inner loop, this is fine for the outer loop. However, when we split the
      edge from the inner loop to the exit block, we create a new block which
      is in neither the inner nor outer loop as the new exit block. This is
      a predecessor to the old exit block, and so the split itself takes the
      outer loop out of LoopSimplify form. We need to split every edge
      entering the exit block from inside a loop nested more deeply than the
      exit block in order to preserve all of the loop simplify constraints.
      
      Once we try to do that, a problem with splitting critical edges
      surfaces. Previously, we took a very brute force approach to updating LoopSimplify
      form by re-computing it for all exit blocks. We don't need to do this,
      and doing this much will sometimes but not always overlap with the
      LoopRotate bug fix. Instead, the code needs to specifically handle the
      cases which can start to violate LoopSimplify -- they aren't that
      common. We need to see if the destination of the split edge was a loop
      exit block in simplified form for the loop of the source of the edge.
      For this to be true, all the predecessors need to be in the exact same
      loop as the source of the edge being split. If the dest block was
      originally in this form, we have to split all of the edges back into
      this loop to recover it. The old mechanism of doing this was
      conservatively correct because at least *one* of the exiting blocks it
      rewrote was the DestBB and so the DestBB's predecessors were fixed. But
      this is a much more targeted way of doing it. Making it targeted is
      important, because ballooning the set of edges touched prevents
      LoopRotate from being able to split edges *it* needs to split to
      preserve loop simplify in a coherent way -- the critical edge splitting
      would sometimes find some of the edges in need of splitting but not
      others.
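
The targeted check described above can be sketched on a toy CFG. This is only an illustrative model, not the code from the patch: `ToyCFG`, `destWasDedicatedExit`, and `edgesToSplit` are hypothetical names, and each block is assumed to map to a single innermost loop id (-1 meaning "in no loop").

```cpp
#include <map>
#include <string>
#include <vector>

// Toy CFG (hypothetical): predecessor lists plus the innermost loop id of
// every block; -1 means the block is not inside any loop.
struct ToyCFG {
  std::map<std::string, std::vector<std::string>> preds;
  std::map<std::string, int> loopOf;
};

// True if every predecessor of `dest` sits in exactly the same loop as
// `src`, i.e. `dest` was an exit block in simplified form for src's loop
// before the edge src -> dest is split.
bool destWasDedicatedExit(const ToyCFG &cfg, const std::string &src,
                          const std::string &dest) {
  const int srcLoop = cfg.loopOf.at(src);
  if (srcLoop < 0)
    return false; // src is not in a loop, so there is no form to preserve
  auto it = cfg.preds.find(dest);
  if (it == cfg.preds.end())
    return false;
  for (const std::string &pred : it->second)
    if (cfg.loopOf.at(pred) != srcLoop)
      return false;
  return true;
}

// Source blocks of the edges into `dest` that should be split together.
// If `dest` was in simplified form, all edges from that loop are split so
// the form can be recovered; otherwise only the requested edge is split.
std::vector<std::string> edgesToSplit(const ToyCFG &cfg, const std::string &src,
                                      const std::string &dest) {
  if (!destWasDedicatedExit(cfg, src, dest))
    return {src};
  return cfg.preds.at(dest);
}
```

In this model, an exit block reached from both the inner latch and a block of the outer loop is not treated as a dedicated exit of the inner loop, so only the single requested edge is split.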
      
      Many, *many* thanks for help from Nick reducing these test cases
      mightily. And helping lots with the analysis here as this one was quite
      tricky to track down.
      
      llvm-svn: 200393
      d4be9dc0
    • Renato Golin's avatar
      Enable EHABI by default · 8cea6e8f
      Renato Golin authored
      After all hard work to implement the EHABI and with the test-suite
      passing, it's time to turn it on by default and allow users to
      disable it as a work-around while we fix the eventual bugs that show
      up.
      
      This commit also removes the -arm-enable-ehabi-descriptors flag, since we
      want the tables to be printed every time the EHABI is turned on
      for non-Darwin ARM targets.
      
      Although MCJIT EHABI is not working yet (needs linking with the right
      libraries), this commit also fixes some relocations on MCJIT regarding
      the EH tables/lib calls, and updates some tests to avoid using EH tables
      when none are needed.
      
      The EH tests in the test-suite that were previously disabled on ARM
      now pass with these changes, so a follow-up commit on the test-suite
      will re-enable them.
      
      llvm-svn: 200388
      8cea6e8f
    • Venkatraman Govindaraju's avatar
      [Sparc] Use %r_disp32 for pc_rel entries in FDE as well. · 141d0e22
      Venkatraman Govindaraju authored
      This makes MCAsmInfo::getExprForFDESymbol() a virtual function and overrides it in SparcMCAsmInfo.
      
      llvm-svn: 200376
      141d0e22
    • NAKAMURA Takumi's avatar
      Revert r200340, "Add line table debug info to COFF files when using a win32 triple." · b366f01f
      NAKAMURA Takumi authored
      It was incompatible with --target=i686-win32.
      
      llvm-svn: 200375
      b366f01f
    • Venkatraman Govindaraju's avatar
      [Sparc] Use %r_disp32 for pc_rel entries in gcc_except_table and eh_frame. · fd5c1f94
      Venkatraman Govindaraju authored
      Otherwise, the assembler (gas) fails to assemble them with the error message
      "operation combines symbols in different segments". This is because MC computes
      pc_rel entries with a subtract expression between labels from different sections.
      
      llvm-svn: 200373
      fd5c1f94
    • Chandler Carruth's avatar
      [LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered" · 66f0b163
      Chandler Carruth authored
      because of the inside-out run of LoopSimplify in the LoopPassManager and
      the fact that LoopSimplify couldn't be "preserved" across two
      independent LoopPassManagers.
      
      Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
      node because it thought it was rewriting (via SCEV) the incoming value
      to a loop invariant value. While it may well be invariant for the
      current loop, it may be rewritten in terms of an enclosing loop's
      values. This in and of itself is fine, as the LCSSA PHI node in the
      enclosing loop for the inner loop value we're rewriting will have its
      own LCSSA PHI node if used outside of the enclosing loop. With me so
      far?
      
      Well, the current loop and the enclosing loop may share an exiting
      block and exit block, and when they do they also share LCSSA PHI nodes.
      In this case, it's not valid to RAUW through the LCSSA PHI node.
      
      Expected crazy test included.
      
      llvm-svn: 200372
      66f0b163
    • Arnold Schwaighofer's avatar
      LoopVectorizer: Don't count the induction variable multiple times · 1aab75ab
      Arnold Schwaighofer authored
      When estimating register pressure, don't count the induction variable multiple
      times. It is unlikely to be unrolled. This is currently disabled and hidden
      behind a flag ("enable-ind-var-reg-heur").
      
      llvm-svn: 200371
      1aab75ab
    • Venkatraman Govindaraju's avatar
      [SparcV9] Use correct register class (I64RegClass) to hold the address of ... · 50f32d94
      Venkatraman Govindaraju authored
      [SparcV9] Use correct register class (I64RegClass) to hold the address of  _GLOBAL_OFFSET_TABLE_ in sparcv9.
      
      llvm-svn: 200368
      50f32d94
    • Rafael Espindola's avatar
      Use a raw_stream to implement the mangler. · 310f501e
      Rafael Espindola authored
      This is a bit more convenient for some callers, but more importantly, it is
      easier to implement correctly. Doing this removes the patching of already
      printed data that was used for fastcall, fixing a crash with private fastcall
      symbols.
      
      llvm-svn: 200367
      310f501e
    • Kevin Qin's avatar
      [AArch64 NEON] Lower SELECT_CC with vector operand. · 92d64d2d
      Kevin Qin authored
      When the scalar compare is between floating-point values and the operands
      are vectors, we custom lower SELECT_CC to use a NEON SIMD compare, which
      generates fewer instructions.
      
      llvm-svn: 200365
      92d64d2d
    • Mark Seaborn's avatar
      Remove unnecessary call to pthread_mutexattr_setpshared() · efe919ff
      Mark Seaborn authored
      The default value of this attribute is PTHREAD_PROCESS_PRIVATE, so
      there's no point in calling pthread_mutexattr_setpshared() to set
      that.
      
      See: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getpshared.html
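
The claim about the default can be checked with a small probe. This is a minimal sketch using only the POSIX attribute calls; the helper name `defaultIsProcessPrivate` is hypothetical and not part of the patch.

```cpp
#include <pthread.h>

// Returns true if a freshly initialised mutex attribute object already
// reports PTHREAD_PROCESS_PRIVATE without any setpshared() call.
bool defaultIsProcessPrivate() {
  pthread_mutexattr_t attr;
  if (pthread_mutexattr_init(&attr) != 0)
    return false;
  int shared = -1;
  const int rc = pthread_mutexattr_getpshared(&attr, &shared);
  pthread_mutexattr_destroy(&attr);
  return rc == 0 && shared == PTHREAD_PROCESS_PRIVATE;
}
```

On a POSIX-conforming platform this is expected to return true; some toolchains need `-pthread` to build it.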
      
      This removes some ifdefs that tend to need to be extended for other
      platforms (e.g. for NaCl).
      
      Note that this call was in the first implementation of Mutex, added in
      r22403, so it doesn't appear to have been added in response to a
      performance problem.
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D2633
      
      llvm-svn: 200360
      efe919ff
    • David Majnemer's avatar
      MC: Clean up error paths in AsmParser::parseMacroArgument · 1625245b
      David Majnemer authored
      Use an RAII object instead of inserting a call to
      AsmLexer::setSkipSpace(true) in all error paths.
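
The pattern can be sketched roughly as follows. `ToyLexer`, `SkipSpaceRestorer`, and `parseArgument` are hypothetical stand-ins rather than the classes touched by the patch, and the guard is assumed to turn skip-space off on entry and back on at scope exit.

```cpp
// Hypothetical stand-in for a lexer with a skip-space toggle.
struct ToyLexer {
  bool skipSpace = true;
  void setSkipSpace(bool value) { skipSpace = value; }
};

// Disables skip-space on construction and re-enables it on destruction,
// so every return path restores the lexer state.
class SkipSpaceRestorer {
  ToyLexer &lexer;

public:
  explicit SkipSpaceRestorer(ToyLexer &l) : lexer(l) {
    lexer.setSkipSpace(false);
  }
  ~SkipSpaceRestorer() { lexer.setSkipSpace(true); }
  SkipSpaceRestorer(const SkipSpaceRestorer &) = delete;
  SkipSpaceRestorer &operator=(const SkipSpaceRestorer &) = delete;
};

// Returns true on error. Either way the guard's destructor runs, so no
// explicit setSkipSpace(true) call is needed before each return.
bool parseArgument(ToyLexer &lexer, bool simulateError) {
  SkipSpaceRestorer guard(lexer);
  if (simulateError)
    return true; // early error exit
  return false;  // normal exit
}
```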
      
      No functional change.
      
      llvm-svn: 200358
      1625245b
    • Rafael Espindola's avatar
      Make createObjectFile's signature a bit less error prone. · c3ceeb6f
      Rafael Espindola authored
      This will be better with C++11, but right now file_magic converts to bool,
      which makes the API really easy to misuse.
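
A rough illustration of the hazard, under the assumption that the type wraps an unscoped enum with an implicit conversion operator. `LegacyMagic`, `ScopedMagic`, and `wantsOwnership` are hypothetical names, and the scoped enum is just one way C++11 could tighten this; the commit does not say which C++11 feature it has in mind.

```cpp
#include <type_traits>

// Hypothetical pre-C++11 style wrapper: it converts implicitly to an
// unscoped enum, which in turn converts implicitly to bool.
struct LegacyMagic {
  enum Impl { unknown = 0, elf, macho };
  Impl value;
  LegacyMagic(Impl v) : value(v) {}
  operator Impl() const { return value; }
};

// C++11 scoped enum: no implicit conversion to bool or int.
enum class ScopedMagic { unknown = 0, elf, macho };

// Hypothetical function with a bool parameter that a magic value can be
// passed to by accident.
bool wantsOwnership(bool takeOwnership) { return takeOwnership; }

static_assert(std::is_convertible<LegacyMagic, bool>::value,
              "legacy wrapper silently converts to bool");
static_assert(!std::is_convertible<ScopedMagic, bool>::value,
              "scoped enum does not convert to bool");
```

With the legacy wrapper, `wantsOwnership(LegacyMagic(LegacyMagic::elf))` compiles and passes `true`; the same call with `ScopedMagic::elf` is rejected at compile time.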
      
      llvm-svn: 200357
      c3ceeb6f
    • David Woodhouse's avatar
      [Sparc] Fix breakage in r200345 · a86694ca
      David Woodhouse authored
      Oops. Don't do build tests on patches like that with --enable-targets=x86_64
      
      llvm-svn: 200355
      a86694ca
    • David Woodhouse's avatar
      Delete MCSubtargetInfo data members from target MCCodeEmitter classes · d2cca113
      David Woodhouse authored
      The subtarget info is explicitly passed to the EncodeInstruction
      method and we should use that subtarget info to influence any
      encoding decisions.
      
      llvm-svn: 200350
      d2cca113
    • David Woodhouse's avatar
      3fa98a65
    • David Woodhouse's avatar
      9784cef3
    • David Woodhouse's avatar
      Keep the MCSubtargetInfo in the MCRelaxableFragment class. · f5199f68
      David Woodhouse authored
      Needed to fix PR18303 to correctly re-encode the instruction if it
      is relaxed.
      
      We keep a copy of the MCSubtargetInfo to make sure that we are not
      affected by future changes to the subtarget info coming from the
      assembler (e.g. when parsing .code 16 directives).
      
      llvm-svn: 200347
      f5199f68
    • David Woodhouse's avatar
      Modify MCObjectStreamer EmitInstTo* interface · 6f3c73f7
      David Woodhouse authored
      Add MCSubtargetInfo parameter
      virtual void EmitInstToFragment(const MCInst &Inst, const MCSubtargetInfo &);
      virtual void EmitInstToData(const MCInst &Inst, const MCSubtargetInfo &);
      
      llvm-svn: 200346
      6f3c73f7
    • David Woodhouse's avatar
      Change MCStreamer EmitInstruction interface to take subtarget info · e6c13e4a
      David Woodhouse authored
      llvm-svn: 200345
      e6c13e4a
  2. Jan 28, 2014
    • Matheus Almeida's avatar
      [mips] Fix ELF header flags. · 2e03f243
      Matheus Almeida authored
      As opposed to GCC/GAS, the default ABI for Mips64 is n64.
      Compatibility bit should be set if o32 ABI is used when targeting Mips64.
      
      llvm-svn: 200332
      2e03f243
    • Gautam Chakrabarti's avatar
      [NVPTX] Fix emitting aggregate parameters · 2c283400
      Gautam Chakrabarti authored
      The code was missing the case for aggregate parameters and
      hence was emitting them as .b0 type. Also fixed a couple
      of comments.
      
      llvm-svn: 200325
      2c283400
    • Andrea Di Biagio's avatar
      [X86] Add extra rules for combining vselect dag nodes into movsd. · 2ea61f17
      Andrea Di Biagio authored
      This improves the fix committed at revision 199683 adding the
      following new target specific combine rules:
      
      1) fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
              (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))
      
      2) fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
              (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))
      
      3) fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
              (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))
      
      4) fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
              (v4f32 (bitcast (movsd (v2f64 (bitcast B)), (v2f64 (bitcast A))) ))
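
The lane arithmetic behind these folds can be checked with a small model. This is a sketch under two assumptions: lanes 0-1 of the four-lane vector form the low 64-bit element after the bitcast, and the movsd node takes its low 64-bit element from the second operand and its high element from the first. `vselect` and `movsdModel` are hypothetical helpers, not LLVM APIs.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;
using Mask4 = std::array<std::int32_t, 4>;

// vselect model: lane i takes a[i] where the mask lane is all-ones (-1),
// otherwise b[i].
V4 vselect(const Mask4 &mask, const V4 &a, const V4 &b) {
  V4 result{};
  for (int i = 0; i < 4; ++i)
    result[i] = (mask[i] == -1) ? a[i] : b[i];
  return result;
}

// movsd model on the bitcast 2 x 64-bit view: the low 64 bits (lanes 0-1)
// come from `second`, the high 64 bits (lanes 2-3) from `first`.
V4 movsdModel(const V4 &first, const V4 &second) {
  return V4{{second[0], second[1], first[2], first[3]}};
}
```

Under this model, mask <0,0,-1,-1> matches `movsdModel(A, B)` and mask <-1,-1,0,0> matches `movsdModel(B, A)`, mirroring rules 1-2 and 3-4.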
      
      llvm-svn: 200324
      2ea61f17
    • Adrian Prantl's avatar
      typo · c67655a7
      Adrian Prantl authored
      llvm-svn: 200323
      c67655a7
    • Rafael Espindola's avatar
      Fix pr14893. · ab73c493
      Rafael Espindola authored
      When simplifycfg moves an instruction, it must drop metadata it doesn't know
      is still valid with the changed preconditions. In particular, it must drop
      the range and tbaa metadata.
      
      The patch implements this with a utility function to drop all metadata not
      in a white list.
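
The white-list idea can be sketched on a plain map. This is a toy model, not the utility added by the patch: `MetadataMap` and `dropUnknownMetadata` are hypothetical names, and metadata is reduced to string kinds and payloads.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the metadata attached to one instruction:
// kind name -> payload.
using MetadataMap = std::map<std::string, std::string>;

// Erase every entry whose kind is not explicitly listed in `keep`.
// Unknown kinds are dropped by default, which is the conservative choice
// when an instruction moves to a place with different preconditions.
void dropUnknownMetadata(MetadataMap &metadata,
                         const std::set<std::string> &keep) {
  for (auto it = metadata.begin(); it != metadata.end();) {
    if (keep.count(it->first) == 0)
      it = metadata.erase(it);
    else
      ++it;
  }
}
```

A white list fails safe: a metadata kind added later is dropped until someone decides it is valid to keep.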
      
      llvm-svn: 200322
      ab73c493
    • Andrea Di Biagio's avatar
      [DAGCombiner] Avoid introducing an illegal build_vector when folding a sign_extend. · b6d39afb
      Andrea Di Biagio authored
      Make sure that we don't introduce illegal build_vector dag nodes
      when trying to fold a sign_extend of a build_vector.
      
      This fixes a regression introduced by r200234.
      Added test CodeGen/X86/fold-vector-sext-crash.ll
      to verify that llc no longer crashes with an assertion failure
      due to an illegal build_vector of type MVT::v4i64.
      
      Thanks to Ilia Filippov for spotting this regression and for
      providing a reproducible test case.
      
      llvm-svn: 200313
      b6d39afb
    • Iain Sandoe's avatar
      Provide a stub Target Streamer implementation for PPC MachO · 625b65a9
      Iain Sandoe authored
      At present, this handles .tc (error) and needs to be expanded to deal properly with .machine
      
      llvm-svn: 200309
      625b65a9
    • Chandler Carruth's avatar
      [vectorizer] Completely disable the block frequency guidance of the loop · b7836285
      Chandler Carruth authored
      vectorizer, placing it behind an off-by-default flag.
      
      It turns out that block frequency isn't what we want at all, here or
      elsewhere. This has been I think a nagging feeling for several of us
      working with it, but Arnold has given some really nice simple examples
      where the results are so comprehensively wrong that they aren't useful.
      
      I'm planning to email the dev list with a summary of why it's not really
      useful and a couple of ideas about how to better structure these types
      of heuristics.
      
      llvm-svn: 200294
      b7836285
    • Hal Finkel's avatar
      Handle spilling the PPC GPRC_NOR0 register class · 4e703bce
      Hal Finkel authored
      GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo
      register). As a result, we also need to check for it in the spilling code.
      
      llvm-svn: 200288
      4e703bce
    • Michel Danzer's avatar
      R600/SI: Add pattern for truncating i32 to i1 · bf1a6410
      Michel Danzer authored
      
      
      Fixes half a dozen piglit tests with radeonsi.
      
      Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
      llvm-svn: 200283
      bf1a6410
    • Jakob Stoklund Olesen's avatar
      Fix the DWARF EH encodings for Sparc PIC code. · 83c67735
      Jakob Stoklund Olesen authored
      Also emit the stubs that were generated for references to typeinfo
      symbols.
      
      llvm-svn: 200282
      83c67735
    • Reid Kleckner's avatar
      Update optimization passes to handle inalloca arguments · 26af2cae
      Reid Kleckner authored
      Summary:
      I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
      sites to check for inalloca if appropriate.
      
      I added tests for any change that would allow an optimization to fire on
      inalloca.
      
      Reviewers: nlewycky
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D2449
      
      llvm-svn: 200281
      26af2cae
    • Reid Kleckner's avatar
      x86: add implicit defs for cpuid · b2340d4c
      Reid Kleckner authored
      This avoids miscompiling MS inline asm in LLVM where we have to infer
      clobbers.  Test case forthcoming in Clang.
      
      llvm-svn: 200279
      b2340d4c
    • Chandler Carruth's avatar
      [LPM] Fix PR18616 where the shifts to the loop pass manager to extract · d84f776e
      Chandler Carruth authored
      LCSSA from it caused a crasher with the LoopUnroll pass.
      
      This crasher is really nasty. We destroy LCSSA form in a surprising way.
      When unrolling a loop into an outer loop, we not only need to restore
      LCSSA form for the outer loop, but for all children of the outer loop.
      This is somewhat obvious in retrospect, but hey!
      
      While this seems pretty heavy-handed, it's not that bad. Fundamentally,
      we only do this when we unroll a loop, which is already a heavyweight
      operation. We're unrolling all of these hypothetical inner loops as
      well, so their size and complexity is already on the critical path. This
      is just adding another pass over them to re-canonicalize.
      
      I have a test case from PR18616 that is great for reproducing this, but
      pretty useless to check in as it relies on many 10s of nested empty
      loops that get unrolled and deleted in just the right order. =/ What's
      worse is that investigating this has exposed another source of failure
      that is likely to be even harder to test. I'll try to come up with test
      cases for these fixes, but I want to get the fixes into the tree first
      as they're causing crashes in the wild.
      
      llvm-svn: 200273
      d84f776e
    • Juergen Ributzka's avatar
      [TLI] Add a new hook to TargetLowering to query the target if a load of a... · 659ce00d
      Juergen Ributzka authored
      [TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself.
      
      Before this patch we used getIntImmCost from TargetTransformInfo to determine if
      a load of a constant should be converted to just a constant, but the threshold
      for this was set to an arbitrary value. This value works well for the two
      targets (X86 and ARM) that implement this target-hook, but it isn't
      target-independent at all.
      
      Now targets have the possibility to decide directly if this optimization should
      be performed. The default value is set to false to preserve the current
      behavior. The target hook has been moved to TargetLowering, which removed the
      last use and need of TargetTransformInfo in SelectionDAG.
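
The shape of such a hook can be sketched as a virtual method with a conservative default. All names here (`ToyTargetLowering`, `shouldFoldConstantLoad`, `ToyCheapImmTarget`, `combinerFoldsLoad`) are hypothetical and do not reproduce the signature added by the patch; the 64-bit cutoff is an arbitrary illustrative policy.

```cpp
#include <cstdint>

// Hypothetical base class for target hooks.
struct ToyTargetLowering {
  virtual ~ToyTargetLowering() = default;
  // Default answer is false, so targets that do not override the hook
  // keep the load as it is.
  virtual bool shouldFoldConstantLoad(std::uint64_t /*imm*/,
                                      unsigned /*bits*/) const {
    return false;
  }
};

// Hypothetical target that opts in for immediates of up to 64 bits.
struct ToyCheapImmTarget : ToyTargetLowering {
  bool shouldFoldConstantLoad(std::uint64_t /*imm*/,
                              unsigned bits) const override {
    return bits <= 64;
  }
};

// The target-independent caller only asks the hook and no longer needs a
// hard-coded cost threshold.
bool combinerFoldsLoad(const ToyTargetLowering &tli, std::uint64_t imm,
                       unsigned bits) {
  return tli.shouldFoldConstantLoad(imm, bits);
}
```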
      
      llvm-svn: 200271
      659ce00d
    • Arnold Schwaighofer's avatar
      LoopVectorize: Support conditional stores by scalarizing · 18865db3
      Arnold Schwaighofer authored
      The vectorizer takes a loop like this and widens all instructions except for the
      store. The stores are scalarized/unrolled and hidden behind an "if" block.
      
        for (i = 0; i < 128; ++i) {
          if (a[i] < 10)
            a[i] += val;
        }
      
        for (i = 0; i < 128; i+=2) {
          v = a[i:i+1];
          v0 = (extract v, 0) + val;
          v1 = (extract v, 1) + val;
          if ((extract v, 0) < 10)
            a[i] = v0;
          if ((extract v, 1) < 10)
            a[i+1] = v1;
        }
      
      The vectorizer relies on subsequent optimizations to sink instructions into the
      conditional block where they are anticipated.
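
The transformation can be mimicked by hand in plain C++ to check that both loop shapes agree. This is a sketch of the idea with a width of 2, not vectorizer output; `scalarLoop`, `widenedLoop`, `kLen`, and `Buf` are hypothetical names.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLen = 128;
using Buf = std::array<int, kLen>;

// Original loop: a conditional read-modify-write per element.
void scalarLoop(Buf &a, int val) {
  for (std::size_t i = 0; i < kLen; ++i)
    if (a[i] < 10)
      a[i] += val;
}

// Hand-written model of the widened loop: load, add, and compare operate
// on two lanes at once, while each store stays scalar behind its own "if".
void widenedLoop(Buf &a, int val) {
  for (std::size_t i = 0; i < kLen; i += 2) {
    const int v[2] = {a[i], a[i + 1]};           // wide load
    const int sum[2] = {v[0] + val, v[1] + val}; // wide add, unconditional
    const bool keep[2] = {v[0] < 10, v[1] < 10}; // wide compare
    if (keep[0])
      a[i] = sum[0]; // scalarized store for lane 0
    if (keep[1])
      a[i + 1] = sum[1]; // scalarized store for lane 1
  }
}
```

Note that the add runs for every lane, including lanes whose store is skipped, which is why sinking work into the conditional block is left to later passes.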
      
      The flag "vectorize-num-stores-pred" controls whether and how many stores to
      handle this way. Vectorization of conditional stores is disabled by default for
      now.
      
      This patch also adds a change to the heuristic when the flag
      "enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
      loops until load/store ports are saturated. This heuristic uses TTI's
      getMaxUnrollFactor as a measure for load/store ports.
      
      I also added a second flag -enable-cond-stores-vec. It will enable vectorization
      of conditional stores. But there is no cost model for vectorization of
      conditional stores in place yet, so this will not do much good at the moment.
      
      rdar://15892953
      
      Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
      -vectorize-num-stores-pred=1 (before the BFI change):
      
       Performance Regressions:
         Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
         Applications/siod/siod         2.18%
       Performance Improvements:
         mesa                          -4.42%
         libquantum                    -4.15%
      
       With a patch that slightly changes the register heuristics (by subtracting the
       induction variable on both sides of the register pressure equation, as the
       induction variable is probably not really unrolled):
      
       Performance Regressions:
         Benchmarks/Ptrdist/yacr2/yacr2  7.73%
         Applications/siod/siod          1.97%
      
       Performance Improvements:
         libquantum                    -13.05% (we now also unroll quantum_toffoli)
         mesa                           -4.27%
      
      llvm-svn: 200270
      18865db3
    • Eric Christopher's avatar
      Revert r199871 and replace it with a simple check in the debug info · 2037caf8
      Eric Christopher authored
      code to see if we're emitting a function into a non-default
      text section. This is still a less-than-ideal solution, but more
      contained than r199871 to determine whether or not we're emitting
      code into an array of comdat sections.
      
      llvm-svn: 200269
      2037caf8
    • Eric Christopher's avatar
      Reformat slightly. · f07ee3ae
      Eric Christopher authored
      llvm-svn: 200264
      f07ee3ae