- Jan 16, 2019
-
-
Hans Wennborg authored
This broke the build, ending up with too-long command lines when invoking gen-msvc-exports.py.

> As it says in the subject, should have gone long enough now that this
> should be safe. This will greatly simplify dealing with LLVM for people
> that just want to use the C API on windows. This is a follow up from
> D35077.
>
> Patch by Jakob Bornecrantz!
>
> Differential revision: https://reviews.llvm.org/D56774

llvm-svn: 351329
-
Gabor Buella authored
For the given test, SROA detects a possible replacement and creates a correct alloca. After that, SROA adds lifetime markers for this new alloca. The function getNewAllocaSlicePtr tries to deduce the pointer type from the original alloca, which is split, to use it later in the lifetime intrinsic. For the test we ended up with the following code (rA is the initial alloca of [10 x float], which is split, and rA.sroa.0.0 is a new split allocation):

```
%rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]*   ; <----- this one causes the assertion and is an extra bitcast
%5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
```

isAllocaPromotable assumes that a user of the alloca may reach a lifetime marker through a bitcast, but that it must be the only bitcast and it must be to the i8* type. In the test it is not an i8* bitcast, so the check returns false and the assertion fires. Since the pointer being created is used only in lifetime markers, the proposed fix is to create a bitcast to i8* immediately, avoiding the creation of the extra bitcast. The test is greatly simplified to just reproduce the assertion.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewers: chandlerc, craig.topper

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D55934

llvm-svn: 351325
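A minimal IRBuilder sketch of the shape of the fix (illustrative only, not the actual SROA code; the helper name and parameters are hypothetical):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// Hypothetical helper: the pointer handed to the lifetime intrinsic is cast
// straight to i8*, so no intermediate bitcast to the original aggregate type
// (the cast that tripped isAllocaPromotable) is ever created.
static void emitLifetimeStartForNewAlloca(IRBuilder<> &B, AllocaInst *NewAI,
                                          uint64_t SizeInBytes) {
  Value *I8Ptr = B.CreateBitCast(NewAI, Type::getInt8PtrTy(B.getContext()));
  B.CreateLifetimeStart(I8Ptr, B.getInt64(SizeInBytes));
}
```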
-
Hans Wennborg authored
As it says in the subject, should have gone long enough now that this should be safe. This will greatly simplify dealing with LLVM for people that just want to use the C API on windows. This is a follow up from D35077. Patch by Jakob Bornecrantz! Differential revision: https://reviews.llvm.org/D56774 llvm-svn: 351324
-
Philip Pfaffe authored
Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan. Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan Subscribers: hiraditya, bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D56734 llvm-svn: 351322
-
Hans Wennborg authored
llvm-svn: 351320
-
Florian Hahn authored
The value returned by max() is the last valid value; adjust the comparison accordingly. The code added in D55073 creates TokenFactors with max() operands. Reviewers: aemerson, efriedma, RKSimon, craig.topper Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D56738 llvm-svn: 351318
-
Pavel Labath authored
Summary: The version of make_absolute which accepted a specific directory to use as the "base" for the computation could never fail, even though it returned a std::error_code. The reason for that seems to be historical -- the CWD flavour (which can fail due to failure to retrieve CWD) was there first, and the new version was implemented by extending that. This removes the error return value from the non-CWD overload and reimplements the CWD version on top of that. This enables us to remove some dead code where people were pessimistically trying to handle the errors returned from this function. Reviewers: zturner, sammccall Subscribers: hiraditya, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D56599 llvm-svn: 351317
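A usage sketch of the two overloads after this change (paths and error handling are illustrative assumptions):

```cpp
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

void makeAbsoluteExamples() {
  // CWD-based flavour: can still fail (e.g. if the current working directory
  // cannot be determined), so it keeps returning std::error_code.
  SmallString<128> P1("relative/file.txt");
  if (std::error_code EC = sys::fs::make_absolute(P1))
    errs() << "cannot make absolute: " << EC.message() << "\n";

  // Explicit-base flavour: cannot fail after this change, so there is no
  // error_code left to (pessimistically) handle.
  SmallString<128> P2("relative/file.txt");
  sys::fs::make_absolute("/some/base/dir", P2);
}
```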
-
Philip Pfaffe authored
Summary: Second iteration of D56433 which got reverted in rL350719. The problem in the previous version was that we dropped the thunk calling the tsan init function. The new version keeps the thunk which should appease dyld, but is not actually OK wrt. the current semantics of function passes. Hence, add a helper to insert the functions only on the first time. The helper allows hooking into the insertion to be able to append them to the global ctors list. Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan Subscribers: hiraditya, bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D56538 llvm-svn: 351314
-
Sam Parker authored
ReduceLoadWidth can trigger when a shifted mask is used, and this requires that the function return a shl node to correct for the offset. However, the way this was implemented meant that the returned result could be an existing node, which would be incorrect. This fixes the method of inserting the new node and replacing uses. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 351310
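A rough sketch of the intended shape (illustrative only, not the actual DAGCombiner code): the compensating shl is built as a fresh node and uses of the old value are redirected explicitly, rather than handing back a value that may alias a node that already exists in the DAG.

```cpp
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Illustrative helper: build the shl that compensates for the load offset and
// replace uses of the original node with it.
static void replaceWithShiftedNarrowLoad(SelectionDAG &DAG, SDNode *OldN,
                                         const SDLoc &DL, SDValue NarrowLoad,
                                         uint64_t ShiftAmt) {
  EVT VT = NarrowLoad.getValueType();
  SDValue Shl = DAG.getNode(ISD::SHL, DL, VT, NarrowLoad,
                            DAG.getConstant(ShiftAmt, DL, VT));
  DAG.ReplaceAllUsesOfValueWith(SDValue(OldN, 0), Shl);
}
```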
-
Hans Wennborg authored
llvm-svn: 351309
-
Martin Storsjö authored
This allows avoiding conflicts between paths that begin with the same chars as some llvm-rc options (which can be used with either slashes or dashes). Differential Revision: https://reviews.llvm.org/D56743 llvm-svn: 351305
-
Dmitry Venikov authored
Summary: Provides -C as alias to -demangle. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40069. Reviewers: jhenderson, ruiu, rnk, fjricci Reviewed By: jhenderson, ruiu Subscribers: rupprecht, erik.pilkington, llvm-commits Differential Revision: https://reviews.llvm.org/D56591 llvm-svn: 351300
-
Dan Gohman authored
llvm-svn: 351297
-
Tom Stellard authored
Summary: Check to make sure that the caller and the callee have compatible function arguments before promoting arguments. This uses the same TargetTransformInfo queries that are used to determine if attributes are compatible for inlining. The goal here is to avoid breaking ABI when a called function's ABI depends on a target feature that is not enabled in the caller. This is a very conservative fix for PR37358. Ideally we would have a more sophisticated check for ABI compatibility rather than checking if the attributes are compatible for inlining. Reviewers: echristo, chandlerc, eli.friedman, craig.topper Reviewed By: echristo, chandlerc Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits Differential Revision: https://reviews.llvm.org/D53554 llvm-svn: 351296
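The guard has roughly this shape; a minimal sketch using the existing TTI inline-compatibility query (the patch itself may route through a dedicated argument-ABI hook):

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Sketch only: refuse to promote arguments unless the target reports the
// caller's and callee's feature/attribute sets as compatible.
static bool safeToPromoteArguments(const Function &Caller,
                                   const Function &Callee,
                                   const TargetTransformInfo &TTI) {
  return TTI.areInlineCompatible(&Caller, &Callee);
}
```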
-
Serguei Katkov authored
InstCombine is able to transform a mem transfer intrinsic into a single store or a store/load pair. This might result in the generation of unaligned atomic loads/stores, which the backend will later transform into a libcall. That is not an evident gain, so it is better to keep the intrinsic as-is and handle it in the backend. Reviewers: reames, anna, apilipenko, mkazantsev Reviewed By: reames Subscribers: t.p.northover, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56582 llvm-svn: 351295
-
Peter Collingbourne authored
llvm-svn: 351293
-
Sam Clegg authored
This change bumps the version number of the wasm object file metadata. See https://github.com/WebAssembly/tool-conventions/pull/92 Differential Revision: https://reviews.llvm.org/D56758 llvm-svn: 351285
-
Aditya Nandakumar authored
https://reviews.llvm.org/D52803 This patch adds support to continuously CSE instructions during each of the GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by the CSEMIRBuilder. llvm-svn: 351283
-
Mandeep Singh Grang authored
Summary: Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc. Refer D53541 for the context. Clang counterpart D56748. Reviewers: rnk, efriedma Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56747 llvm-svn: 351281
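Assuming the renamed intrinsic is exposed as `Intrinsic::eh_recoverfp` (the exact enumerator is an assumption here), a caller-side sketch looks like this:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Assumption: the target-independent form is Intrinsic::eh_recoverfp, taking
// the parent function and its frame pointer (both i8*) like the old
// x86-specific intrinsic did.
static Value *emitRecoverFP(IRBuilder<> &B, Module &M, Value *ParentFn,
                            Value *FramePtr) {
  Function *Decl = Intrinsic::getDeclaration(&M, Intrinsic::eh_recoverfp);
  return B.CreateCall(Decl, {ParentFn, FramePtr});
}
```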
-
Craig Topper authored
llvm-svn: 351279
-
Craig Topper authored
That's really what it is. If we didn't use intrinsics for BLENDVPS/BLENDVPD/PBLENDVB all the way to isel, this is the node we would use. llvm-svn: 351278
-
Peter Collingbourne authored
The Android sanitizer tests are currently some of the most difficult to run correctly, requiring at least 3 build directories which have to be configured in just the right way and built in the correct order (see e.g. [1] and the functions that it calls). This patch adds a check-hwasan target which greatly simplifies running the hwasan tests for gn users, taking advantage of its support for multiple toolchains. With this the tests can be run simply by setting an NDK path and running "ninja check-hwasan" with a compatible Android device connected. The Linux/x86_64 and Android/aarch64 targets are tested in parallel. [1] https://github.com/llvm/llvm-zorg/blob/master/zorg/buildbot/builders/sanitizers/buildbot_android.sh Differential Revision: https://reviews.llvm.org/D56713 llvm-svn: 351277
-
Craig Topper authored
We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic-block-at-a-time isel in bad ways. llvm-svn: 351275
-
Changpeng Fang authored
Summary: We have seen a performance regression when v_add3 is generated: the v_mad pattern is broken when v_add3 is generated, and we also see increased register pressure. While we cannot properly estimate register pressure during instruction selection, we can give mad a higher priority. In this work, we raise the priority of mad24 in selection and resolve the performance regression. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D56745 llvm-svn: 351273
-
- Jan 15, 2019
-
-
Jonas Devlieghere authored
When generating a reproducer in LLDB we build up the mapping but don't immediately copy over the files on the file system. Rather than keeping a separate data structure with real and virtual paths, we might as well reuse the entries already stored in the YAMLVFSWriter to lazily copy over the files when needed. llvm-svn: 351266
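A minimal sketch of recording mappings in a YAMLVFSWriter and emitting the overlay (paths are illustrative; the lazy copy itself would happen later by walking these same entries):

```cpp
#include "llvm/Support/VirtualFileSystem.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

void writeReproducerMapping(raw_ostream &OS) {
  // Record virtual -> real path mappings; the underlying files can be copied
  // over lazily later, driven by the entries stored in the writer.
  vfs::YAMLVFSWriter Writer;
  Writer.addFileMapping("/reproducer/vfs/input.c", "/home/user/input.c");
  Writer.write(OS); // emit the overlay as YAML
}
```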
-
Jonas Devlieghere authored
This moves the RedirectingFileSystem into the header so it can be extended. This is needed in LLDB, where we need a way to obtain the external path to deal with FILE* and file descriptor APIs. Discussion on the mailing list: http://lists.llvm.org/pipermail/llvm-dev/2018-November/127755.html Differential revision: https://reviews.llvm.org/D54277 llvm-svn: 351265
-
Jordan Rupprecht authored
llvm-svn: 351259
-
Peter Collingbourne authored
Differential Revision: https://reviews.llvm.org/D56711 llvm-svn: 351258
-
Jordan Rupprecht authored
[llvm-ar] Resubmit recursive thin archive test with fix for full path names and better error messages llvm-svn: 351256
-
Peter Collingbourne authored
The path to the resource directory will end up being used in several more places once the support for running check-hwasan lands. This moves the definition to a central location so that it can be used from those places. Differential Revision: https://reviews.llvm.org/D56700 llvm-svn: 351255
-
Craig Topper authored
[X86] Add the GCCBuiltin name back to the deprecated avx512 gather intrinsics until the clang side patch for the new versions is approved. llvm-svn: 351254
-
Roman Lebedev authored
Summary:
Previously in D54095 I added support for extraction of `lshr` from `X` if we are to produce `BEXTR`. That was good, but the fix was partial; [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]] was still there. That pattern can also appear, roughly, when you have a large (64-bit) storage and then consume bits from it. It would not be unexpected to do the further computations in 32-bit width, and then the current code breaks, as the tests show.

The basic idea/pattern here is the following:
1. We have an `i64` input.
2. We perform an `i64` right-shift on it.
3. We `trunc`ate that shifted value.
4. We do all further work (masking) in `i32`.

Since we see `trunc`ation and not `lshr`, we give up and stop trying to extract that right-shift. BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64` and truncate the result after masking: https://rise4fun.com/Alive/K4B

```
Name: @bextr64_32_b1 -> @bextr64_32_b0
%shiftedval = lshr i64 %val, %numskipbits
%truncshiftedval = trunc i64 %shiftedval to i32
%widenumlowbits1 = zext i8 %numlowbits to i32
%notmask1 = shl nsw i32 -1, %widenumlowbits1
%mask1 = xor i32 %notmask1, -1
%res = and i32 %truncshiftedval, %mask1
=>
%shiftedval = lshr i64 %val, %numskipbits
%widenumlowbits = zext i8 %numlowbits to i64
%notmask = shl nsw i64 -1, %widenumlowbits
%mask = xor i64 %notmask, -1
%wideres = and i64 %shiftedval, %mask
%res = trunc i64 %wideres to i32
```

Thus, we are again able to extract that `lshr` into `BEXTR`'s control.

Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a good idea:

```
$ cat /tmp/old.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RSI
# LLVM-EXEGESIS-LIVEIN EDX
# LLVM-EXEGESIS-LIVEIN RDI
movq %rsi, %rcx
shrq %cl, %rdi
shll $8, %edx
bextrl %edx, %edi, %eax

$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o
---
mode: latency
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config: ''
  register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.6638, per_snippet_value: 2.6552 }
error: ''
info: ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...

$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o
---
mode: uops
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config: ''
  register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error: ''
info: ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
```

vs

```
$ cat /tmp/new.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RDX
# LLVM-EXEGESIS-LIVEIN SIL
# LLVM-EXEGESIS-LIVEIN RDI
shlq $8, %rdx
movzbl %sil, %eax
orq %rdx, %rax
bextrq %rax, %rdi, %rax

$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o
---
mode: latency
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config: ''
  register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.7454, per_snippet_value: 2.9816 }
error: ''
info: ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...

$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o
---
mode: uops
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config: ''
  register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error: ''
info: ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
```

^ latency increased (worse).

Except //maybe// not really. Like with all synthetic benchmarks, they //may// be misleading. Let's take a look at an actual real-world hotpath. In this case it's 'my' [[ https://github.com/darktable-org/rawspeed | RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ https://github.com/darktable-org/rawspeed/blob/e3316dc85127c2c29baa40f998f198a7b278bf36/src/librawspeed/decompressors/VC5Decompressor.cpp#L814 | GoPro VC5 decompressor ]]:

```
raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR
RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM
2018-12-22 21:23:03
Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench
Run on (8 X 4012.81 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.41, 2.41, 2.03
----------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                Time   CPU    Iterations  CPUTime,s  CPUTime/WallTime  Pixels  Pixels/CPUTime  Pixels/WallTime  Raws/CPUTime  Raws/WallTime  WallTime,s
----------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean    40 ms  40 ms  128         0.322244   7.96974           12M     37.4457M        298.534M         3.12047       24.8778        0.040465
GOPR9172.GPR/threads:8/real_time_median  39 ms  39 ms  128         0.312606   7.99155           12M     38.387M         306.788M         3.19891       25.5656        0.039115
GOPR9172.GPR/threads:8/real_time_stddev   4 ms   3 ms  128         0.0271557  0.130575          0       2.4941M         21.3909M         0.207842      1.78257        3.81081m
RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9
2018-12-22 21:23:08
Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Run on (8 X 4013.1 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.78, 2.50, 2.06
----------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                Time   CPU    Iterations  CPUTime,s  CPUTime/WallTime  Pixels  Pixels/CPUTime  Pixels/WallTime  Raws/CPUTime  Raws/WallTime  WallTime,s
----------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean    39 ms  39 ms  128         0.311533   7.97323           12M     38.6828M        308.471M         3.22356       25.706         0.0390928
GOPR9172.GPR/threads:8/real_time_median  38 ms  38 ms  128         0.304231   7.99005           12M     39.4437M        315.527M         3.28698       26.294         0.0380316
GOPR9172.GPR/threads:8/real_time_stddev   3 ms   3 ms  128         0.0229149  0.133814          0       2.26225M        19.1421M         0.188521      1.59517        3.13671m
Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Benchmark                                  Time     CPU      Time Old  Time New  CPU Old  CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_pvalue    0.0000   0.0000   U Test, Repetitions: 128 vs 128
GOPR9172.GPR/threads:8/real_time_mean     -0.0339  -0.0316   40        39        40       39
GOPR9172.GPR/threads:8/real_time_median   -0.0277  -0.0274   39        38        39       38
GOPR9172.GPR/threads:8/real_time_stddev   -0.1769  -0.1267    4         3         3        3
```

I.e. this results in //roughly// a -3% improvement in perf. While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]], it won't address it fully.

Reviewers: RKSimon, craig.topper, andreadb, spatel

Reviewed By: craig.topper

Subscribers: courbet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56052

llvm-svn: 351253
-
David Callahan authored
Summary: InvokeInst should be treated like CallInst and assigned a separate discriminator. This is particularly important when an Invoke is converted to a Call during compilation, since that can invalidate sample profile data collected with different link time optimizations. Reviewers: twoh, Kader, danielcdh, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56491 llvm-svn: 351251
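The predicate the discriminator assignment relies on is essentially the following (a simplified sketch, not the exact AddDiscriminators code):

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

// Invokes get their own discriminator, exactly like calls, so that sample
// profile data stays valid if the invoke is later lowered to a plain call.
static bool needsOwnDiscriminator(const Instruction &I) {
  return isa<CallInst>(I) || isa<InvokeInst>(I);
}
```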
-
Peter Collingbourne authored
While here, add a use_lld flag and default it to true when using clang on non-mac. Differential Revision: https://reviews.llvm.org/D56710 llvm-svn: 351248
-
Matt Morehouse authored
Summary: Comdat groups override weak symbol behavior, allowing the linker to keep the comdats for weak symbols in favor of comdats for strong symbols. Fixes the issue described in: https://bugs.chromium.org/p/chromium/issues/detail?id=918662 Reviewers: eugenis, pcc, rnk Reviewed By: pcc, rnk Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56516 llvm-svn: 351247
-
Peter Collingbourne authored
This allows the hwasan runtime to be built for Android aarch64. Differential Revision: https://reviews.llvm.org/D56628 llvm-svn: 351246
-
Peter Collingbourne authored
llvm-svn: 351242
-
Alexey Bataev authored
llvm-svn: 351240
-
Michael Trent authored
Summary: When running llvm-objdump with the -macho option, objdump will by default disassemble only the __TEXT,__text section (or __TEXT_EXEC,__text when disassembling MH_KEXT_BUNDLE files). The -disassemble-all option is treated no differently than -disassemble. This change updates llvm-objdump's MachO parsing code to disassemble all __text sections found in a file when -disassemble-all is specified. This is useful for disassembling files with more than one __text section, or when disassembling files whose __text section is not present in __TEXT. I added a lit test case that verifies "llvm-objdump -m -d" and "llvm-objdump -m -D" produce the expected results on a reference binary. I also updated the CommandGuide documentation for llvm-objdump.rst and verified it renders correctly as man and html. rdar://42899338 Reviewers: ab, pete, lhames Reviewed By: lhames Subscribers: rupprecht, llvm-commits Differential Revision: https://reviews.llvm.org/D56649 llvm-svn: 351238
-
Craig Topper authored
[X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar

In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen. I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics. I'll work on scatter and gatherpf/scatterpf next. Differential Revision: https://reviews.llvm.org/D56527 llvm-svn: 351234
-