- Dec 12, 2013
-
-
Yi Jiang authored
llvm-svn: 197109
-
Hal Finkel authored
llvm-svn: 197100
-
Hal Finkel authored
Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards.

Significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative):

speedups:
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168%
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598%
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058%
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391%
SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104%
SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061%
SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458%
SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234%
SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496%

regressions:
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156%
SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733%

llvm-svn: 197099
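The dispatch-group idea described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the commit: the `Inst` type, the `breakLoadAfterStore` helper, the use of an integer `addr` as a stand-in for alias information, and the `groupSize` parameter are all hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction model; names are illustrative only.
struct Inst {
  enum Kind { Load, Store, Other, GroupEndNop } kind;
  int addr; // stand-in for alias info: equal values mean "may alias"
};

// Returns a copy of `in` with one group-terminating nop placed before any
// load that may alias a store already in the current dispatch group.
// `groupSize` is an assumed parameter, not a documented hardware value.
std::vector<Inst> breakLoadAfterStore(const std::vector<Inst> &in,
                                      unsigned groupSize) {
  std::vector<Inst> out;
  std::vector<int> storesInGroup; // addresses stored to in the current group
  unsigned slots = 0;             // instructions already in the current group
  for (const Inst &i : in) {
    if (slots == groupSize) { // the group filled up on its own
      slots = 0;
      storesInGroup.clear();
    }
    if (i.kind == Inst::Load) {
      bool hazard = false;
      for (int a : storesInGroup)
        if (a == i.addr)
          hazard = true;
      if (hazard) { // end the group early with a single special nop
        out.push_back({Inst::GroupEndNop, 0});
        slots = 0;
        storesInGroup.clear();
      }
    }
    if (i.kind == Inst::Store)
      storesInGroup.push_back(i.addr);
    out.push_back(i);
    ++slots;
  }
  return out;
}
```

In this model a store followed by a possibly aliasing load in the same group gets one terminating nop between them, while a load to an unrelated address is left alone.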
-
Chad Rosier authored
intrinsics to use f32 types, rather than their vector equivalents. llvm-svn: 197090
-
Hal Finkel authored
For one predicate to subsume another, they must both check the same condition register. Failure to check this prerequisite was causing miscompiles. Fixes PR18003. llvm-svn: 197089
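As a rough illustration of the prerequisite described above, the following sketch rejects subsumption whenever the two predicates read different condition registers. The `Cond` enum, the `Predicate` struct, and the `subsumes` function are hypothetical names for this example and are not taken from the commit.

```cpp
#include <cassert>

// Hypothetical model of a branch predicate: a comparison outcome read from
// a particular condition register. Names are illustrative only.
enum class Cond { LT, LE, EQ, GE, GT };

struct Predicate {
  Cond cond;
  unsigned condReg; // which condition register the predicate reads
};

// True when `general` holds in every case where `specific` holds.
bool subsumes(const Predicate &general, const Predicate &specific) {
  // Prerequisite: outcomes held in different registers are unrelated.
  if (general.condReg != specific.condReg)
    return false;
  if (general.cond == specific.cond)
    return true;
  if (general.cond == Cond::LE)
    return specific.cond == Cond::LT || specific.cond == Cond::EQ;
  if (general.cond == Cond::GE)
    return specific.cond == Cond::GT || specific.cond == Cond::EQ;
  return false;
}
```

Without the register comparison at the top, "LE on register 0" would wrongly be treated as covering "LT on register 1", which is the kind of mistake that can lead to a miscompile.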
-
- Dec 11, 2013
-
-
Chad Rosier authored
use f32/f64 types, rather than their vector equivalents. llvm-svn: 197068
-
Chad Rosier authored
floating-point reciprocal square root step LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector equivalents. llvm-svn: 197067
-
Chad Rosier authored
point reciprocal exponent, and floating-point reciprocal square root estimate LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector equivalents. llvm-svn: 197066
-
Rafael Espindola authored
llvm-svn: 197064
-
Tom Stellard authored
This makes it a little easier to read. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 197058
-
Tom Stellard authored
This enables -print-before-all to dump MachineInstrs after it is run. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 197057
-
Tom Stellard authored
This enables -print-before-all to dump MachineInstrs after it is run. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 197056
-
Logan Chien authored
llvm-svn: 197052
-
Tim Northover authored
The tests were no longer using fast-isel at all (MachO needs an "ios" rather than "darwin" triple at the moment and Linux needs ARM mode). Once that was corrected, the verifier complained about a t2ADDri created for the alloca. llvm-svn: 197046
-
Elena Demikhovsky authored
I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll. I defined the HasAVX512 predicate as an AssemblerPredicate. This means that you should invoke llvm-mc with "-mcpu=knl" to get the encoding for AVX-512 instructions. I need this to let AsmMatcher set different encodings for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions). llvm-svn: 197041
-
Richard Sandiford authored
In such cases it's often better to test the result of the negation instead, since the negation also sets CC. llvm-svn: 197032
-
Reed Kotler authored
llvm-svn: 196999
-
Kevin Qin authored
llvm-svn: 196998
-
Rafael Espindola authored
llvm-svn: 196996
-
Rafael Espindola authored
llvm-svn: 196990
-
NAKAMURA Takumi authored
llvm-svn: 196988
-
Rafael Espindola authored
llvm-svn: 196987
-
Reid Kleckner authored
The combination of inline asm, stack realignment, and dynamic allocas turns out to be too common to reject out of hand. ASan inserts empty inline asm fragments and uses aligned allocas. Compiling any trivial function containing a dynamic alloca with ASan is enough to trigger the check. XFAIL the test cases that would be miscompiled and add one that uses the relevant functionality. llvm-svn: 196986
-
- Dec 10, 2013
-
-
Rafael Espindola authored
llvm-svn: 196976
-
Matt Arsenault authored
llvm-svn: 196971
-
David Fang authored
.weak_def_can_be_hidden was not yet supported by the system assembler. llvm-svn: 196970
-
Chad Rosier authored
intrinsic to use f32/f64 types, rather than their vector equivalents. llvm-svn: 196965
-
Chad Rosier authored
LLVM AArch64 intrinsics to use f32/f64, rather than their vector equivalents. llvm-svn: 196964
-
Chad Rosier authored
and fixed-point convert to floating-point LLVM AArch64 intrinsics. llvm-svn: 196963
-
Chad Rosier authored
LLVM AArch64 intrinsics. llvm-svn: 196962
-
Reid Kleckner authored
This re-lands commit r196876, which was reverted in r196879. The tests have been fixed to pass on platforms with a stack alignment larger than 4. Update to clang side tests will land shortly. llvm-svn: 196939
-
Tim Northover authored
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously true, unless they'd put the compiler in a box with a gun attached to a photon detector. This makes sure precisely one of the three formats is true for any triple and simplifies some target logic based on that. llvm-svn: 196934
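One way to get the "precisely one format is true" property is to store a single enum value and derive each query from it. The sketch below assumes the third format is ELF, which the message above does not name; the `TripleSketch` type, the `formatForOS` helper, and the OS-name mapping are likewise hypothetical.

```cpp
#include <cassert>
#include <string>

// Illustrative only: ELF as the third format is an assumption.
enum class ObjectFormat { ELF, COFF, MachO };

// Hypothetical classifier that picks exactly one format from an OS name.
ObjectFormat formatForOS(const std::string &os) {
  if (os == "darwin" || os == "ios")
    return ObjectFormat::MachO;
  if (os == "win32")
    return ObjectFormat::COFF;
  return ObjectFormat::ELF; // assumed default for everything else
}

struct TripleSketch {
  ObjectFormat format;
  // Each query compares against the single stored value, so two of them
  // can never be true at once.
  bool isELF() const { return format == ObjectFormat::ELF; }
  bool isCOFF() const { return format == ObjectFormat::COFF; }
  bool isMachO() const { return format == ObjectFormat::MachO; }
};

// Number of format queries that are true for `t`.
int formatsTrue(const TripleSketch &t) {
  return int(t.isELF()) + int(t.isCOFF()) + int(t.isMachO());
}
```

Because the queries share one underlying value, target logic can branch on any one of them without also checking the other two.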
-
Chad Rosier authored
that they use float/double rather than the vector equivalents when appropriate. llvm-svn: 196930
-
Chad Rosier authored
Specifically, reuse the ARM intrinsics when possible. llvm-svn: 196926
-
Andrea Di Biagio authored
immediately after SSE scalar fp instructions like addss or mulss. Added patterns to select SSE scalar fp arithmetic instructions from a scalar fp operation followed by a blend.

For example, given the following code:
  __m128 foo(__m128 A, __m128 B) {
    A[0] += B[0];
    return A;
  }

previously we generated:
  addss %xmm0, %xmm1
  movss %xmm1, %xmm0

now we generate:
  addss %xmm1, %xmm0

llvm-svn: 196925
-
Vincent Lejeune authored
llvm-svn: 196923
-
Vincent Lejeune authored
llvm-svn: 196922
-
Reed Kotler authored
Save S2 (reg 18) only when we are calling floating-point stubs that have a return value of float or complex. More work is needed to improve this, but this is the first step. llvm-svn: 196921
-
Elena Demikhovsky authored
llvm-svn: 196918
-
Elena Demikhovsky authored
llvm-svn: 196914
-