  1. Dec 05, 2013
    • Yi Jiang's avatar
      01cfa942
    • Renato Golin's avatar
      Move test to X86 dir · e593fea5
      Renato Golin authored
      Test is platform independent, but I don't want to force vector-width, or
      that could spoil the pragma test.
      
      llvm-svn: 196539
      e593fea5
    • Renato Golin's avatar
      Add #pragma vectorize enable/disable to LLVM · 729a3ae9
      Renato Golin authored
      The intended behaviour is to force vectorization on the presence
      of the flag (either turn on or off), and to continue the behaviour
      as expected in its absence. Tests were added to make sure all
      cases are covered in opt. No tests were added in other tools with
      the assumption that they should use the PassManagerBuilder in the
      same way.
      
      This patch also removes the outdated -late-vectorize flag, which was
      on by default and not helping much.
      
      The pragma metadata is being attached to the same place as other loop
      metadata, but nothing forbids one from attaching it to a function
      (to enable #pragma optimize) or basic blocks (to hint the basic-block
      vectorizers), etc. The logic should be the same all around.
      
      Patches to Clang to produce the metadata will be produced after the
      initial implementation is agreed upon and committed. Patches to other
      vectorizers (such as SLP and BB) will be added once we're happy with
      the pass manager changes.
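
The tri-state behaviour described above (force on, force off, or fall back to the default when the hint is absent) can be sketched as follows. This is an illustrative model only, not the LLVM implementation: the metadata key `vectorize.enable`, the dictionary representation of loop metadata, and the function name are all hypothetical.

```python
from typing import Optional

# Hypothetical key; the commit message does not name the metadata node.
HINT_KEY = "vectorize.enable"


def should_vectorize(loop_metadata: dict, default_decision: bool) -> bool:
    """Return whether a loop should be vectorized.

    A present hint forces the outcome either way; an absent hint leaves
    the default decision (pass enabled, cost model, ...) untouched.
    """
    hint: Optional[bool] = loop_metadata.get(HINT_KEY)
    if hint is None:
        # No pragma: continue the behaviour as before.
        return default_decision
    # Pragma present: force vectorization on or off.
    return bool(hint)
```

For example, `should_vectorize({"vectorize.enable": False}, True)` forces the loop to be skipped even though the default would have vectorized it.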
      
      llvm-svn: 196537
      729a3ae9
    • Yuchen Wu's avatar
      llvm-cov: Changed extension from .llcov to .gcov. · 9af3938b
      Yuchen Wu authored
      llvm-svn: 196530
      9af3938b
    • Andrew Trick's avatar
      MI-Sched: handle latency of in-order operations with the new machine model. · 880e573d
      Andrew Trick authored
      The per-operand machine model allows the target to define "unbuffered"
      processor resources. This change is a quick, cheap way to model stalls
      caused by the latency of operations that use such resources. This only
      applies when the processor's micro-op buffer size is non-zero
      (Out-of-Order). We can't precisely model in-order stalls during
      out-of-order execution, but this is an easy and effective
      heuristic. It benefits cortex-a9 scheduling when using the new
      machine model, which is not yet on by default.
      
      MI-Sched for armv7 was evaluated on Swift (and was not enabled only
      because of a performance bug related to predication). However, we never
      evaluated Cortex-A9 performance on MI-Sched in its current form. This
      change adds MI-Sched functionality to reach performance goals on
      A9. The only remaining change is to allow MI-Sched to run as a PostRA
      pass.
      
      I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
      -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false
      
      For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
      (min run time over 2 runs, filtering tiny changes)
      
      Speedups:
      | Benchmarks/BenchmarkGame/recursive         |  52.39% |
      | Benchmarks/VersaBench/beamformer           |  20.80% |
      | Benchmarks/Misc/pi                         |  19.97% |
      | Benchmarks/Misc/mandel-2                   |  19.95% |
      | SPEC/CFP2000/188.ammp                      |  18.72% |
      | Benchmarks/McCat/08-main/main              |  18.58% |
      | Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
      | Benchmarks/Olden/power                     |  17.11% |
      | Benchmarks/Misc-C++/mandel-text            |  16.47% |
      | Benchmarks/Misc/oourafft                   |  15.94% |
      | Benchmarks/Misc/flops-7                    |  14.99% |
      | Benchmarks/FreeBench/distray               |  14.26% |
      | SPEC/CFP2006/470.lbm                       |  14.00% |
      | mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
      | Benchmarks/SmallPT/smallpt                 |  10.36% |
      | Benchmarks/Misc-C++/Large/ray              |   8.97% |
      | Benchmarks/Misc/fp-convert                 |   8.75% |
      | Benchmarks/Olden/perimeter                 |   7.10% |
      | Benchmarks/Bullet/bullet                   |   7.03% |
      | Benchmarks/Misc/mandel                     |   6.75% |
      | Benchmarks/Olden/voronoi                   |   6.26% |
      | Benchmarks/Misc/flops-8                    |   5.77% |
      | Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
      | Benchmarks/MiBench/security-rijndael       |   5.15% |
      | Benchmarks/Misc/flops-6                    |   5.10% |
      | Benchmarks/Olden/tsp                       |   4.46% |
      | Benchmarks/MiBench/consumer-lame           |   4.28% |
      | Benchmarks/Misc/flops-5                    |   4.27% |
      | Benchmarks/mafft/pairlocalalign            |   4.19% |
      | Benchmarks/Misc/himenobmtxpa               |   4.07% |
      | Benchmarks/Misc/lowercase                  |   4.06% |
      | SPEC/CFP2006/433.milc                      |   3.99% |
      | Benchmarks/tramp3d-v4                      |   3.79% |
      | Benchmarks/FreeBench/pifft                 |   3.66% |
      | Benchmarks/Ptrdist/ks                      |   3.21% |
      | Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
      | SPEC/CINT2000/175.vpr                      |   3.12% |
      | Benchmarks/nbench                          |   2.98% |
      | SPEC/CFP2000/183.equake                    |   2.91% |
      | Benchmarks/Misc/perlin                     |   2.85% |
      | Benchmarks/Misc/flops-1                    |   2.82% |
      | Benchmarks/Misc-C++-EH/spirit              |   2.80% |
      | Benchmarks/Misc/flops-2                    |   2.77% |
      | Benchmarks/NPB-serial/is                   |   2.42% |
      | Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
      | Benchmarks/BenchmarkGame/n-body            |   2.28% |
      | Benchmarks/SciMark2-C/scimark2             |   2.27% |
      | Benchmarks/Olden/bh                        |   2.03% |
      | skidmarks10/skidmarks                      |   1.81% |
      | Benchmarks/Misc/flops                      |   1.72% |
      
      Slowdowns:
      | Benchmarks/llubenchmark/llu                | -14.14% |
      | Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
      | Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
      | Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
      | Benchmarks/Shootout/hash                   |  -2.35% |
      | Benchmarks/Prolangs-C++/ocean              |  -2.01% |
      | Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
      | Polybench/linear-algebra/kernels/3mm       |  -1.95% |
      | Benchmarks/McCat/09-vor/vor                |  -1.68% |
      
      llvm-svn: 196516
      880e573d
    • Arnold Schwaighofer's avatar
      SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use · 7ee53cac
      Arnold Schwaighofer authored
      We were creating external uses for scalar values in MustGather entries that also
      had a ScalarToTreeEntry (they are also present in a vectorized tuple). This
      meant we would keep a value 'alive' both as a scalar and as a vector, causing havoc.
      This is not necessary because when we create a MustGather vector we explicitly
      create external uses entries for the insertelement instructions of the
      MustGather vector elements.
      
      Fixes PR18129.
      
      radar://15582184
      
      llvm-svn: 196508
      7ee53cac
    • Kostya Serebryany's avatar
      [tsan] fix PR18146: sometimes a variable written into vptr could have an... · 2460c3fc
      Kostya Serebryany authored
      [tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations)
      
      llvm-svn: 196507
      2460c3fc
    • Justin Holewinski's avatar
      4459717b
    • Matheus Almeida's avatar
      [mips] Small code generation improvement for conditional operator (select) · a6beac1a
      Matheus Almeida authored
      in case the operands are constants and their difference is |1|.
      It should be possible in those cases to rematerialize the result using
      MIPS's slt and similar instructions.
      
      The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed
      otherwise the optimization implemented in this patch would have been triggered
      (difference between the operands was 1) and that would have changed the semantics
      of the tests.
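
To illustrate why a select between constants that differ by 1 needs no conditional move, here is a small arithmetic sketch. It assumes the condition has already been materialized as a 0/1 value (as `slt` produces); the function names and the add/subtract form are illustrative assumptions, not the exact sequence the backend emits, and 32-bit wraparound is ignored.

```python
def slt(a: int, b: int) -> int:
    """Model of MIPS slt: 1 if a < b, otherwise 0."""
    return 1 if a < b else 0


def select_adjacent_constants(cond_bit: int, true_val: int, false_val: int) -> int:
    """Compute (cond_bit ? true_val : false_val) without a branch or cmov.

    Only valid when the two constants differ by exactly 1.
    """
    diff = true_val - false_val
    if diff == 1:
        # true_val == false_val + 1, so adding the 0/1 condition suffices.
        return false_val + cond_bit
    if diff == -1:
        # true_val == false_val - 1, so subtract the 0/1 condition.
        return false_val - cond_bit
    raise ValueError("constants must differ by exactly 1")
```

With `x = 3`, `select_adjacent_constants(slt(x, 10), 5, 4)` yields the same value as `5 if x < 10 else 4`.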
      
      llvm-svn: 196498
      a6beac1a
    • Matheus Almeida's avatar
      [mips][msa] Fix issue with immediate fields of LD/ST instructions · 6b59c449
      Matheus Almeida authored
      not being correctly encoded/decoded.
      In more detail, immediate fields of LD/ST instructions should be
      divided/multiplied by the size of the data format before encoding and
      after decoding, respectively.
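
The scaling rule can be sketched as a round trip between a byte offset and the stored immediate field. This is a minimal model, not the LLVM encoder: the function names are hypothetical, and the 10-bit signed field width is an assumption made for the example.

```python
def encode_offset(byte_offset: int, elem_size: int, imm_bits: int = 10) -> int:
    """Divide the byte offset by the data-format size before encoding."""
    if byte_offset % elem_size != 0:
        raise ValueError("offset must be a multiple of the element size")
    scaled = byte_offset // elem_size
    lo = -(1 << (imm_bits - 1))
    hi = (1 << (imm_bits - 1)) - 1
    if not lo <= scaled <= hi:
        raise ValueError("scaled offset does not fit in the immediate field")
    # Store as a two's-complement field of imm_bits bits.
    return scaled & ((1 << imm_bits) - 1)


def decode_offset(field: int, elem_size: int, imm_bits: int = 10) -> int:
    """Sign-extend the field, then multiply by the data-format size."""
    sign_bit = 1 << (imm_bits - 1)
    scaled = (field ^ sign_bit) - sign_bit
    return scaled * elem_size
```

For a 4-byte data format, a byte offset of 16 is stored as 4 and decodes back to 16.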
      
      llvm-svn: 196494
      6b59c449
    • Tim Northover's avatar
      ARM: fix yet another stack-folding bug · e4def5e2
      Tim Northover authored
      We were trying to fold the stack adjustment into the wrong instruction in the
      situation where the entire basic-block was epilogue code. Really, it can only
      ever be valid to do the folding precisely where the "add sp, ..." would be
      placed so there's no need for a separate iterator to track that.
      
      Should fix PR18136.
      
      llvm-svn: 196493
      e4def5e2
    • Alp Toker's avatar
      Correct word hyphenations · f907b891
      Alp Toker authored
      This patch tries to avoid unrelated changes other than fixing a few
      hyphen-related ambiguities and contractions in nearby lines.
      
      llvm-svn: 196471
      f907b891
    • Rafael Espindola's avatar
      Hide the stub created for MO_ExternalSymbol too. · 01d19d02
      Rafael Espindola authored
      given
      
      declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
      declare void @foo()
      define void @bar() {
        call void @foo()
        call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
        ret void
      }
      
      We used to produce
      
      L_foo$stub:
              .indirect_symbol        _foo
              .ascii  "\364\364\364\364\364"
      
      _memset$stub:
              .indirect_symbol        _memset
              .ascii  "\364\364\364\364\364"
      
      We now produce a private stub for memset too.
      
      Stubs are not needed with recent linkers, but we still produce them for darwin8.
      
      Thanks to David Fang for confirming that gcc used to do this too.
      
      llvm-svn: 196468
      01d19d02
    • Matt Arsenault's avatar
      R600/SI: Add comments for number of used registers. · 89cc49fe
      Matt Arsenault authored
      llvm-svn: 196467
      89cc49fe
    • NAKAMURA Takumi's avatar
      Move llvm/test/MC/ELF/thumb-st_other.s to test/MC/ARM. · 57b20a7e
      NAKAMURA Takumi authored
      llvm-svn: 196457
      57b20a7e
    • Jiangning Liu's avatar
    • Cameron McInally's avatar
      Add FileCheck statements for r196435. · 164097a6
      Cameron McInally authored
      llvm-svn: 196449
      164097a6
    • Eric Christopher's avatar
      Make these two tests resilient in the face of compile unit size · c4dd56b9
      Eric Christopher authored
      changes.
      
      llvm-svn: 196444
      c4dd56b9
    • Logan Chien's avatar
      [mc] Fix ELF st_other flag. · ee36595c
      Logan Chien authored
      ELF_Other_Weakref and ELF_Other_ThumbFunc seem to be LLVM
      internal ELF symbol flags.  These should not be emitted to
      the object file.
      
      This commit defines ELF_STO_Shift for the target-defined
      flags for st_other, and increases the value of
      ELF_Other_Shift to 16.
      
      llvm-svn: 196440
      ee36595c
    • Cameron McInally's avatar
      Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load. · 30bbb214
      Cameron McInally authored
      Patch by Aleksey Bader.
      
      llvm-svn: 196435
      30bbb214
    • Kevin Enderby's avatar
      Fix a bug in darwin's 32-bit X86 handling of evaluating fixups. · 86496a45
      Kevin Enderby authored
      The bug occurs where it would use a scattered relocation entry but falls
      back to a normal relocation entry because the FixupOffset is more than 24 bits.
      
      The bug is in the X86MachObjectWriter::RecordScatteredRelocation() where
      it changes reference parameter FixedValue but then returns false to indicate
      it did not create a scattered relocation entry.  The fix is simply to save the
      original value of the parameter FixedValue at the start of the method and
      restore it if we are returning false in that case.
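
The save-and-restore pattern described above can be sketched as follows. Python has no reference parameters, so a mutable dictionary stands in for the by-reference `FixedValue`; the `adjustment` field, the function name, and the exact 24-bit limit check are hypothetical stand-ins rather than the actual writer code.

```python
MAX_SCATTERED_OFFSET = 0xFFFFFF  # largest offset that fits in 24 bits


def record_scattered_relocation(fixup_offset: int, state: dict) -> bool:
    """Try to record a scattered relocation.

    Returns False, leaving state["fixed_value"] untouched, when the
    caller must fall back to a normal relocation entry.
    """
    # Save the original value before modifying it.
    original = state["fixed_value"]

    # Hypothetical stand-in for the adjustment made for scattered entries.
    state["fixed_value"] -= state["adjustment"]

    if fixup_offset > MAX_SCATTERED_OFFSET:
        # Falling back: undo the modification so the caller sees the
        # value it passed in.
        state["fixed_value"] = original
        return False
    return True
```

Without the restore step, the caller would build the normal relocation entry from an already adjusted value.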
      
      rdar://15526046
      
      llvm-svn: 196432
      86496a45
  2. Dec 04, 2013
  3. Dec 03, 2013