- Feb 27, 2015
-
-
David Blaikie authored
Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278)

    import fileinput
    import sys
    import re
    pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
    for line in sys.stdin:
        sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
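As a quick, hedged illustration of what the migration script above does, the following sketch applies the same regex to a single made-up old-style test line; the input line and variable names are invented for this example, not taken from the commit:

    import re

    # Same pattern and replacement as the migration script above.
    pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

    old = "  %v = load i32* %ptr\n"        # hypothetical pre-change test line
    new = re.sub(pat, r"\1, \2\3*\4", old)
    print(new, end="")                     # "  %v = load i32, i32* %ptr"

The old pointer-typed operand is kept and the pointee type is simply repeated as the new explicit first argument, which is why a plain textual rewrite of the test suite suffices.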
-
David Blaikie authored
[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction

One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be handled separately).

* Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes.

* geps of vectors are transformed as:
      getelementptr <4 x float*> %x, ...
      -> getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look like:
      getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
      getelementptr float addrspace(1)* %x
      -> getelementptr float, float addrspace(1)* %x
  Then, eventually:
      getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by the same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970, r229196, r229197, r229198). The python script just massages stdin and writes the result to stdout; I then wrapped that in a shell script to handle replacing files, then used the usual find+xargs to migrate all the files.

update.py:

    import fileinput
    import sys
    import re

    ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
    normrep = re.compile(r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

    def conv(match, line):
        if not match:
            return line
        line = match.groups()[0]
        if len(match.groups()[5]) == 0:
            line += match.groups()[2]
        line += match.groups()[3]
        line += ", "
        line += match.groups()[1]
        line += "\n"
        return line

    for line in sys.stdin:
        if line.find("getelementptr ") == line.find("getelementptr inbounds"):
            if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
                line = conv(re.match(ibrep, line), line)
        elif line.find("getelementptr ") != line.find("getelementptr ("):
            line = conv(re.match(normrep, line), line)
        sys.stdout.write(line)

apply.sh:

    for name in "$@"
    do
        python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
        rm -f "$name.tmp"
    done

The actual commands:

From llvm/src:
    find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
    find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
    find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
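A hedged usage sketch, not taken from the commit: assuming update.py above has been saved next to this snippet, one invented old-style gep line can be piped through it to see the rewrite it performs.

    import subprocess

    # Hypothetical sample input; the expected output was traced by hand
    # against the ibrep regex and conv() in update.py above.
    sample = b"  %a = getelementptr inbounds i32* %p, i64 1\n"
    out = subprocess.run(["python3", "update.py"], input=sample,
                         capture_output=True, check=True).stdout
    print(out.decode(), end="")
    # ->  %a = getelementptr inbounds i32, i32* %p, i64 1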
-
Mehdi Amini authored
Summary: Currently fast-isel-abort will only abort for regular instructions, and just warn for function calls, terminators, and function arguments. There is already fast-isel-abort-args but nothing for calls and terminators.

This change turns the fast-isel-abort option into an integer option, so that multiple levels of strictness can be defined. This will help avoid being surprised when the "abort" option indeed does not abort, and enables the possibility to write tests that verify that no intrinsics are forgotten by fast-isel.

Reviewers: resistor, echristo

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D7941

From: Mehdi Amini <mehdi.amini@apple.com>

llvm-svn: 230775
-
Renato Golin authored
Patch by Patrick Wildt. llvm-svn: 230762
-
Petar Jovanovic authored
Not passing mtriple for one of the tests caused a regression failure on the MIPS buildbot. The issue was introduced by r230651. Differential Revision: http://reviews.llvm.org/D7938 llvm-svn: 230756
-
- Feb 26, 2015
-
-
Sumanth Gundapaneni authored
In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the features bits are not computed. This patch lets the asm printer emit ".cpu cortex-a9" directive for krait and the hwdiv feature is enabled through ".arch_extension". In short, krait is treated as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since it is not supported bu GNU GAS yet llvm-svn: 230651
-
- Feb 25, 2015
-
-
Renato Golin authored
Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-based LDR, STR, and ADD only allow offsets that are a multiple of 4. Make some changes to better make use of these instructions:

* Use word loads for anyext byte and halfword loads from the stack.
* Enforce 4-byte alignment on objects accessed in this way, to ensure that the offset is valid.
* Do the same for objects whose frame index is used, in order to avoid having to use more than one ADD to generate the frame index.
* Correct how many bits of offset we think AddrModeT1_s has.

Patch by John Brawn.

llvm-svn: 230496
-
- Feb 24, 2015
-
-
Simon Pilgrim authored
Added test case for PR22678 (check CONCAT_VECTORS DAG combiner pass doesn't introduce illegal types) llvm-svn: 230386
-
Tim Northover authored
The logic is almost there already, with our special homogeneous aggregate handling. Tweaking it like this allows front-ends to emit AAPCS compliant code without ever having to count registers or add discarded padding arguments. Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to apply the logic to all integer arrays for more consistency. llvm-svn: 230348
-
- Feb 20, 2015
-
-
Ahmed Bougacha authored
This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting.

The combine caused the crashes because we turned ISD::LOAD/STORE nodes into ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes into ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTLR.A==1?).

To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned.

Original commit message:

We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset).

rdar://19717869, rdar://14062261.

llvm-svn: 229932
-
- Feb 19, 2015
-
-
Peter Collingbourne authored
llvm-svn: 229798
-
- Feb 18, 2015
-
-
Bradley Smith authored
Add some of the missing M and R class Cortex CPUs, namely:

    Cortex-M0+ (called Cortex-M0plus for GCC compatibility)
    Cortex-M1
    SC000
    SC300
    Cortex-R5

llvm-svn: 229660
-
- Feb 17, 2015
-
-
Sanjay Patel authored
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, and this patch removes the constant check so we can canonicalize all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner is discussed in the earlier patch review. At least this patch is just removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389

llvm-svn: 229511
-
- Feb 11, 2015
-
-
James Molloy authored
Use TargetTransformInfo for the cost analysis. We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness" heuristic and use TTI directly. Generally NFC intended, but we're using a slightly different heuristic now so there is a slight test churn.

Test changes:
* combine-comparisons-by-cse.ll: Removed unneeded branch check.
* 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
* coalesce-subregs.ll: Superfluous block check.
* 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
* PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
* select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.

llvm-svn: 228826
-
- Feb 10, 2015
-
-
Bradley Smith authored
llvm-svn: 228696
-
- Feb 09, 2015
-
-
Akira Hatanaka authored
Fix a case where a reload instruction could be inserted into the wrong basic block. This would happen when the result of an invoke was used by a phi instruction in the invoke's normal destination block. An instruction to reload the invoke's value would get inserted before the critical edge was split and a new basic block (which is the correct insertion point for the reload) was created.

This commit fixes the bug by splitting the critical edge before all the reload instructions are inserted. Also, hoist up the code which computes the insertion point to the only place that needs that computation.

rdar://problem/15978721

llvm-svn: 228566
-
- Feb 08, 2015
-
-
Tim Northover authored
While various DAG combines try to guarantee that a vector SETCC operation will have the same output size as input, there's nothing intrinsic to either creation or LegalizeTypes that actually guarantees it, so the function needs to be ready to handle a mismatch. Fortunately this is easy enough, just extend or truncate the naturally compared result. I couldn't reproduce the failure in other backends that I know have SIMD, so it's probably only an issue for these two due to shared heritage. Should fix PR21645. llvm-svn: 228518
-
- Feb 07, 2015
-
-
David Majnemer authored
COFF section flags are not idempotent:
    'rd' will make a read-write section because 'd' implies write
    'dr' will make a read-only section because 'r' disables write

llvm-svn: 228490
-
- Feb 05, 2015
-
-
Ahmed Bougacha authored
llvm-svn: 228259
-
Ahmed Bougacha authored
The q8/d16 thing is silly; I'd be happy to hear about a better way to write those tests where simple substitution isn't enough. llvm-svn: 228258
-
- Feb 04, 2015
-
-
Renato Golin authored
Currently, Cortex-A72 is modelled as a Cortex-A57, except that the fp load balancing pass isn't enabled for Cortex-A72 as it's not profitable to have it enabled for this core. Patch by Ranjeet Singh. llvm-svn: 228140
-
Renato Golin authored
This reverts patches 223862, 224198, 224203, and 224754, which were all related to the vector load/store combining and were reverted/reapplied a few times due to the same alignment problems we're seeing now. Further tests, mainly self-hosting Clang, will be needed to reapply this patch in the future. llvm-svn: 228129
-
- Feb 02, 2015
-
-
Jan Wen Voung authored
Summary: Previously it only avoided optimizing signed comparisons to 0. Sometimes the DAGCombiner will optimize the unsigned comparisons to 0 before it gets to the peephole pass, but sometimes it doesn't. Fix for PR22373.

Test Plan: test/CodeGen/ARM/sub-cmp-peephole.ll

Reviewers: jfb, manmanren

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D7274

llvm-svn: 227809
-
- Jan 31, 2015
-
-
Saleem Abdulrasool authored
Now that -mstack-probe-size is piped through to the backend via the function attribute as on Windows x86, honour the value to permit handling of non-default values for stack probes. This is needed for /Gs with the clang-cl driver or -mstack-probe-size with the clang driver when targeting Windows on ARM. llvm-svn: 227667
-
- Jan 29, 2015
-
-
Charlie Turner authored
llvm-svn: 227429
-
- Jan 23, 2015
-
-
Jyoti Allur authored
_foo:
        smull   r0, r1, r1, r0
        smull   r2, r3, r3, r2
        adds    r0, r2, r0
        adc     r1, r3, r1
        bx      lr

to

_foo:
        smull   r0, r1, r1, r0
        smlal   r0, r1, r3, r2
        bx      lr

llvm-svn: 226904
-
- Jan 21, 2015
-
-
Jon Roelofs authored
Thumbv4t does not have lo->lo copies other than MOVS, and that can't be predicated. So emit MOVS when needed and bail if there's a predicate. http://reviews.llvm.org/D6592 llvm-svn: 226711
-
- Jan 19, 2015
-
-
Rafael Espindola authored
No change in this commit, but clang was changed to also produce trivial comdats when needed.

Original message:

Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848.

llvm-svn: 226467
-
- Jan 16, 2015
-
-
Timur Iskhodzhanov authored
This breaks AddressSanitizer (ninja check-asan) on Windows. llvm-svn: 226251
-
Rafael Espindola authored
This reverts commit r226173, adding r226038 back. No change in this commit, but clang was changed to also produce trivial comdats for constructors, destructors and vtables when needed.

Original message:

Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848.

llvm-svn: 226242
-
- Jan 15, 2015
-
-
Hal Finkel authored
Reapply r226071 with fixes. Two fixes:

1. We need to manually remove the old and create the new 'dead defs' associated with physical register definitions when we move the definition of the physical register from the copy point to the point of the original vreg def. This problem was picked up by the machine instruction verifier, and could trigger a verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've turned on the verifier in the tests.

2. When moving the def point of the phys reg up, we need to make sure that it is neither defined nor read in between the two instructions. We don't, however, extend the live ranges of phys reg defs to cover uses, so just checking for live-range overlap between the pair interval and the phys reg aliases won't pick up reads. As a result, we manually iterate over the range and check for reads. A test soon to be committed to the PowerPC backend will test this change.

Original commit message:

[RegisterCoalescer] Remove copies to reserved registers

This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this:

    <vreg> = something (maybe a load, for example)
    ... (things that don't use PHYSREG)
    PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc.)

Previously, the RegisterCoalescer handled only the opposite case (copying *from* a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY). An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems.

llvm-svn: 226200
-
Timur Iskhodzhanov authored
It breaks AddressSanitizer on Windows. llvm-svn: 226173
-
Hal Finkel authored
Reverting this while I investigate some bad behavior this is causing. As a possibly-related issue, adding -verify-machineinstrs to one of the test cases now fails because of this change:

    llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs

    *** Bad machine code: No instruction at def index ***
    - function:    foo
    - basic block: BB#0 return (0x10007e21f10) [0B;736B)
    - liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
    - register:    %DS
    Valno #3 is defined at 624r

    *** Bad machine code: Live segment doesn't end at a valid instruction ***
    - function:    foo
    - basic block: BB#0 return (0x10007e21f10) [0B;736B)
    - liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
    - register:    %DS
    [624r,624d:3)
    LLVM ERROR: Found 2 machine code errors.

where 624r corresponds exactly to the interval combining change:

    624B    %RSP<def> = COPY %vreg16; GR64:%vreg16
        Considering merging %vreg16 with %RSP
            RHS = %vreg16 [608r,624r:0)  0@608r
        updated: 608B   %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1]
    Success: %vreg16 -> %RSP
    Result = %RSP

llvm-svn: 226086
-
Hal Finkel authored
This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this:

    <vreg> = something (maybe a load, for example)
    ... (things that don't use PHYSREG)
    PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc.)

Previously, the RegisterCoalescer handled only the opposite case (copying *from* a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY). An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems.

llvm-svn: 226071
-
- Jan 14, 2015
-
-
Duncan P. N. Exon Smith authored
This commit moves `MDLocation`, finishing off PR21433. There's an accompanying clang commit for frontend testcases. I'll attach the testcase upgrade script I used to PR21433 to help out-of-tree frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get printed by the assembly writer.

llvm-svn: 226048
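The upgrade script referenced above isn't reproduced here. Purely as an illustration, a minimal sketch of a regex-based rewrite for the simple four-field form shown above might look like the following; it handles only that exact shape and is not the script attached to PR21433:

    import re
    import sys

    # Rewrites `!{i32 L, i32 C, !S, !I}` into the new MDLocation syntax.
    # Only the simple shape shown above is handled; real metadata needs more care.
    pat = re.compile(r"!\{i32 (\d+), i32 (\d+), (!\d+), (!\d+|null)\}")

    for line in sys.stdin:
        sys.stdout.write(pat.sub(
            r"!MDLocation(line: \1, column: \2, scope: \3, inlinedAt: \4)", line))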
-
Rafael Espindola authored
This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226038
-
Tim Northover authored
Somehow we seem to have ended up without any actual tests of the CodeGen side. Easy enough to fix. llvm-svn: 225930
-
- Jan 12, 2015
-
-
Adrian Prantl authored
Factor the emission of DWARF expressions out into a new class DwarfExpression that can be shared between AsmPrinter and DwarfUnit. This is the first step towards unifying the two entirely redundant implementations of dwarf expression emission in DwarfUnit and AsmPrinter.

Almost no functional change. Testcases were updated because asm comments that used to be on two lines now appear on the same line, which is actually preferable.

llvm-svn: 225706
-
- Jan 08, 2015
-
-
Kristof Beyls authored
This partially fixes PR13007 (ARM CodeGen fails with large stack alignment): it fixes ARM and Thumb2 targets, but not Thumb1, as it seems stack alignment for Thumb1 targets hasn't been supported at all.

Producing an aligned stack pointer is done by zeroing out the lower bits of the stack pointer. The BIC instruction was used for this. However, the immediate field of the BIC instruction only allows encoding an immediate that can zero out at most the 8 lower bits. When a larger alignment is requested, a BIC instruction cannot be used; llvm was silently producing incorrect code in this case.

This commit fixes code generation for large stack alignments by using the BFC instruction instead, when the BFC instruction is available. When not, it uses 2 instructions: a right shift, followed by a left shift to zero out the lower bits.

The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code that unconditionally uses BIC to realign the stack pointer, so it very likely has the same problem. However, I wasn't able to produce a test case for that. This commit adds an assert so that the compiler will fail the assert instead of silently generating wrong code if this is ever reached.

llvm-svn: 225446
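As a side note on the "8 lower bits" limit mentioned above, the sketch below is an assumption-laden illustration (not code from the commit): it checks which low-bit masks are representable as an ARM modified immediate, i.e. an 8-bit value rotated right by an even amount, which is what BIC's immediate field encodes.

    # Checks whether `value` fits ARM's modified-immediate encoding:
    # an 8-bit value rotated right by an even amount (0, 2, ..., 30).
    def is_arm_modified_imm(value):
        for rot in range(0, 32, 2):
            # Rotating the candidate left undoes the encoding's right rotation.
            rotated_left = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
            if rotated_left < 256:
                return True
        return False

    # Masks (1 << n) - 1, as BIC would need in order to clear the n low bits.
    print([n for n in range(1, 13) if is_arm_modified_imm((1 << n) - 1)])
    # -> [1, 2, 3, 4, 5, 6, 7, 8]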
-
- Jan 07, 2015
-
-
Charlie Turner authored
llvm-svn: 225348
-