- Dec 05, 2013
-
-
Yunzhong Gao authored
llvm-svn: 196523
-
Rafael Espindola authored
We use CSEBlocks to initialize a worklist: SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end()); so it must have a deterministic order. llvm-svn: 196520
-
Eric Christopher authored
llvm-svn: 196519
-
Alp Toker authored
llvm-svn: 196518
-
Andrew Trick authored
This allows a target to use MI-Sched as an in-order scheduler that will model strict resource conflicts without defining a processor itinerary. Instead, the target can now use the new per-operand machine model and define in-order resources with BufferSize=0. For example, this would allow restricting the type of operations that can be formed into a dispatch group. (Normally NumMicroOps is sufficient to enforce dispatch groups). If the intent is to model latency in in-order pipeline, as opposed to resource conflicts, then a resource with BufferSize=1 should be defined instead. This feature is only casually tested as there are no in-tree targets using it yet. However, Hal will be experimenting with POWER7. llvm-svn: 196517
-
Andrew Trick authored
The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (Out-of-Order). We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic. It benefits cortex-a9 scheduling when using the new machine model, which is not yet on by default. MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass. I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7: -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results: (min run time over 2 runs, filtering tiny changes) Speedups: | Benchmarks/BenchmarkGame/recursive | 52.39% | | Benchmarks/VersaBench/beamformer | 20.80% | | Benchmarks/Misc/pi | 19.97% | | Benchmarks/Misc/mandel-2 | 19.95% | | SPEC/CFP2000/188.ammp | 18.72% | | Benchmarks/McCat/08-main/main | 18.58% | | Benchmarks/Misc-C++/Large/sphereflake | 18.46% | | Benchmarks/Olden/power | 17.11% | | Benchmarks/Misc-C++/mandel-text | 16.47% | | Benchmarks/Misc/oourafft | 15.94% | | Benchmarks/Misc/flops-7 | 14.99% | | Benchmarks/FreeBench/distray | 14.26% | | SPEC/CFP2006/470.lbm | 14.00% | | mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% | | Benchmarks/SmallPT/smallpt | 10.36% | | Benchmarks/Misc-C++/Large/ray | 8.97% | | Benchmarks/Misc/fp-convert | 8.75% | | Benchmarks/Olden/perimeter | 7.10% | | Benchmarks/Bullet/bullet | 7.03% | | Benchmarks/Misc/mandel | 6.75% | | Benchmarks/Olden/voronoi | 6.26% | | Benchmarks/Misc/flops-8 | 5.77% | | Benchmarks/Misc/matmul_f64_4x4 | 5.19% | | Benchmarks/MiBench/security-rijndael | 5.15% | | Benchmarks/Misc/flops-6 | 5.10% | | Benchmarks/Olden/tsp | 4.46% | | Benchmarks/MiBench/consumer-lame | 4.28% | | Benchmarks/Misc/flops-5 | 4.27% | | Benchmarks/mafft/pairlocalalign | 4.19% | | Benchmarks/Misc/himenobmtxpa | 4.07% | | Benchmarks/Misc/lowercase | 4.06% | | SPEC/CFP2006/433.milc | 3.99% | | Benchmarks/tramp3d-v4 | 3.79% | | Benchmarks/FreeBench/pifft | 3.66% | | Benchmarks/Ptrdist/ks | 3.21% | | Benchmarks/Adobe-C++/loop_unroll | 3.12% | | SPEC/CINT2000/175.vpr | 3.12% | | Benchmarks/nbench | 2.98% | | SPEC/CFP2000/183.equake | 2.91% | | Benchmarks/Misc/perlin | 2.85% | | Benchmarks/Misc/flops-1 | 2.82% | | Benchmarks/Misc-C++-EH/spirit | 2.80% | | Benchmarks/Misc/flops-2 | 2.77% | | Benchmarks/NPB-serial/is | 2.42% | | Benchmarks/ASC_Sequoia/CrystalMk | 2.33% | | Benchmarks/BenchmarkGame/n-body | 2.28% | | Benchmarks/SciMark2-C/scimark2 | 2.27% | | Benchmarks/Olden/bh | 2.03% | | skidmarks10/skidmarks | 1.81% | | Benchmarks/Misc/flops | 1.72% | Slowdowns: | Benchmarks/llubenchmark/llu | -14.14% | | Benchmarks/Polybench/stencils/seidel-2d | -5.67% | | Benchmarks/Adobe-C++/functionobjects | -5.25% | | Benchmarks/Misc-C++/oopack_v1p8 | -5.00% | | Benchmarks/Shootout/hash | -2.35% | | Benchmarks/Prolangs-C++/ocean | -2.01% | | Benchmarks/Polybench/medley/floyd-warshall | -1.98% | | Polybench/linear-algebra/kernels/3mm | -1.95% | | Benchmarks/McCat/09-vor/vor | -1.68% | llvm-svn: 196516
-
Andrew Trick authored
llvm-svn: 196515
-
Andrew Trick authored
llvm-svn: 196514
-
Andrew Trick authored
llvm-svn: 196513
-
Hans Wennborg authored
as the location for grabbing clang-format.exe, and also output the .vsix here. This allows us to find clang-format.exe when building from a MSVC Solution. llvm-svn: 196512
-
Alp Toker authored
No practical difference in this case and would return 1 either way, but this is more self-explanatory. llvm-svn: 196511
-
Alp Toker authored
llvm-svn: 196510
-
Rafael Espindola authored
Should fix the msan and valgrind bots. llvm-svn: 196509
-
Arnold Schwaighofer authored
We were creating external uses for scalar values in MustGather entries that also had a ScalarToTreeEntry (they also are present in a vectorized tuple). This meant we would keep a value 'alive' as a scalar and vectorized causing havoc. This is not necessary because when we create a MustGather vector we explicitly create external uses entries for the insertelement instructions of the MustGather vector elements. Fixes PR18129. radar://15582184 llvm-svn: 196508
-
Kostya Serebryany authored
[tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations) llvm-svn: 196507
-
Alexey Samsonov authored
llvm-svn: 196506
-
Rui Ueyama authored
llvm-svn: 196505
-
Rui Ueyama authored
Currently we do not de-duplicate library files specified by /defaultlib option. As a result, the same files are added multiple times to the input graph. In particular, some popular files, such as kernel32.lib or oldnames.lib, are added more than 10 times during linking of LLD. That makes the linker slower, as it needs to parse the same file again and again. This patch solves the issue by de-duplicating. The same file will be added only once to the input graph. This patch improved the LLD linking time from 10.5 seconds to 7.7 seconds on my 4-core Core i7 Macbook Pro. llvm-svn: 196504
-
Justin Holewinski authored
llvm-svn: 196503
-
Alexey Samsonov authored
llvm-svn: 196501
-
Alexey Samsonov authored
llvm-svn: 196500
-
Sylvestre Ledru authored
Revert: "Patch from Todd Fiala that install the lldb.py module in the prefix directory and also makes install fail if the prefix directory can't be accessed" Does not respect the prefix llvm-svn: 196499
-
Matheus Almeida authored
in case the operands are constants and its difference is |1|. It should be possible in those cases to rematerialize the result using MIPS's slt and similar instructions. The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed otherwise the optimization implemented in this patch would have been triggered (difference between the operands was 1) and that would have changed the semantic of the tests. llvm-svn: 196498
-
Sergey Matveev authored
Instead of "if (common_flags()->verbosity) Report(...)" we now have macros. llvm-svn: 196497
-
Matheus Almeida authored
The structure of the code was slightly modified so that the next patch is easier to read/review. No functional changes. llvm-svn: 196496
-
Kostya Serebryany authored
llvm-svn: 196495
-
Matheus Almeida authored
not being correctly encoded/decoded. In more detail, immediate fields of LD/ST instructions should be divided/multiplied by the size of the data format before encoding and after decoding, respectively. llvm-svn: 196494
-
Tim Northover authored
We were trying to fold the stack adjustment into the wrong instruction in the situation where the entire basic-block was epilogue code. Really, it can only ever be valid to do the folding precisely where the "add sp, ..." would be placed so there's no need for a separate iterator to track that. Should fix PR18136. llvm-svn: 196493
-
Alexey Samsonov authored
llvm-svn: 196492
-
Kostya Serebryany authored
llvm-svn: 196491
-
Kostya Serebryany authored
[tsan] fix the include path that is broken in configure/make build but works in cmake build (PR18144). This is a quick fix. Will need to fix the configure/make build properly llvm-svn: 196490
-
Kostya Serebryany authored
llvm-svn: 196489
-
Richard Smith authored
delete on a class which has no array cookie and has no class-specific operator new. llvm-svn: 196488
-
Argyrios Kyrtzidis authored
Patch by Erik Verbruggen! llvm-svn: 196487
-
Argyrios Kyrtzidis authored
at a particular reparsing iteration. Passing '-remap-file-1=from:to' will remap the files in the second iteration. llvm-svn: 196486
-
Argyrios Kyrtzidis authored
lldb does not like semicolon as part of an option. llvm-svn: 196485
-
Alp Toker authored
Also use write() for unified diff output to avoid further processing by the print function (e.g. trailing newline). llvm-svn: 196484
-
Sylvestre Ledru authored
llvm-svn: 196483
-
Richard Smith authored
boxes yellow until we release, though. llvm-svn: 196482
-
Richard Smith authored
within their namespace, and such a redeclaration isn't required to be a definition any more. Update DR status page to say Clang 3.4 instead of SVN and add new Clang 3.5 category (but keep Clang 3.4 yellow for now). llvm-svn: 196481
-