- Jul 21, 2014
-
-
Tom Stellard authored
This pass converts 64-bit instructions to 32-bit when possible. llvm-svn: 213561
-
Tom Stellard authored
Therefore we don't need to add it to the implict defs list. llvm-svn: 213558
-
David Blaikie authored
Correct the ownership passing semantics of object::createBinary and make them explicit in the type system. createBinary documented that it destroyed the parameter in error cases, though by observation it does not. By passing the unique_ptr by value rather than lvalue reference, callers are now explicit about passing ownership and the function implements the documented contract. Remove the explicit documentation, since now the behavior cannot be anything other than what was documented, so it's redundant. Also drops a unique_ptr::release in llvm-nm that was always run on a null unique_ptr anyway. llvm-svn: 213557
-
David Blaikie authored
llvm-svn: 213556
-
David Blaikie authored
llvm-svn: 213554
-
Tom Stellard authored
There are a few more cleanups to do, but I ran into some problems with ext loads and trunc stores, when I tried to change some of the vector loads and stores from custom to legal, so I wasn't able to get rid of everything. llvm-svn: 213552
-
Tom Stellard authored
llvm-svn: 213551
-
Tom Stellard authored
llvm-svn: 213550
-
Tom Stellard authored
This operand is never used. llvm-svn: 213549
-
Daniel Sanders authored
We now emit this value when we need to contradict the default value. This restores support for binutils 2.24. When a suitable binutils has been released we can resume unconditionally emitting .module directives. This is preferable to omitting the .module directives since the .module directives protect against, for example, accidentally assembling FP32 code with -mfp64 and producing an unusuable object. llvm-svn: 213548
-
Robert Khasanov authored
Enabling HasAVX512{DQ,BW,VL} predicates. Adding VK2, VK4, VK32, VK64 masked register classes. Adding new types (v64i8, v32i16) to VR512. Extending calling conventions for new types (v64i8, v32i16) Patch by Zinovy Nis <zinovy.y.nis@intel.com> Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 213545
-
Tom Stellard authored
This implements a solution for constant initializers suggested by Vadim Girlin, where we store the data after the shader code and then use the S_GETPC instruction to compute its address. This saves use the trouble of creating a new buffer for constant data and then having to pass the pointer to the kernel via user SGPRs or the input buffer. llvm-svn: 213530
-
Tom Stellard authored
llvm-svn: 213529
-
Tom Stellard authored
llvm-svn: 213528
-
Tom Stellard authored
This allows us to explicitly define the type of fixup that is needed, so we can distinguish this from future fixup types. llvm-svn: 213527
-
Tom Stellard authored
llvm-svn: 213526
-
Daniel Sanders authored
This abstraction allows us to support the various records that can be placed in the .MIPS.options section in the future. We currently use it to record register usage information (the ODK_REGINFO record in our ELF64 spec). Each .MIPS.options record should subclass MipsOptionRecord and provide an implementation of EmitMipsOptionRecord. Patch by Matheus Almeida and Toma Tabacu llvm-svn: 213522
-
Hal Finkel authored
There were two generally-useful CaptureTracker classes defined in LLVM: the simple tracker defined in CaptureTracking (and made available via the PointerMayBeCaptured utility function), and the CapturesBefore tracker available only inside of AA. This change moves the CapturesBefore tracker into CaptureTracking, generalizes it slightly (by adding a ReturnCaptures parameter), and makes it generally available via a PointerMayBeCapturedBefore utility function. This logic will be needed, for example, to perform noalias function parameter attribute inference. No functionality change intended. llvm-svn: 213519
-
Aaron Ballman authored
Fixing an MSVC conversion warning about implicitly converting the shift results to 64-bits. No functional change intended. llvm-svn: 213515
-
Hal Finkel authored
The ability to identify function locals will exist outside of BasicAA (for example, logic for inferring noalias function arguments will need this), so make this concept generally accessible without code duplication. No functionality change. llvm-svn: 213514
-
Daniel Sanders authored
Fix a dangerous default case that caused MipsCodeEmitter to discard pseudo instructions it didn't recognize. It will now call llvm_unreachable() for unrecognized pseudo's and explicitly handles PseudoReturn, PseudoReturn64, PseudoIndirectBranch, PseudoIndirectBranch64, CFI_INSTRUCTION, IMPLICIT_DEF, and KILL. There may be other pseudos that need handling but this was enough for the ExecutionEngine tests to pass on my test system. llvm-svn: 213513
-
Daniel Sanders authored
We now emit this directive when we need to contradict the default value (e.g. -mno-odd-spreg is given) or an option changed the default value (e.g. -mfpxx is given). This restores support for the currently available head of binutils. However, at this point binutils 2.24 is still not sufficient since it does not support '.module fp=...'. llvm-svn: 213511
-
Tim Northover authored
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc, and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation path is now uniform, regardless of the input IR: fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall fpext -> FP16_TO_FP (if f16 illegal) -> libcall Each target should be able to select the version that best matches its operations and not be required to duplicate patterns for both fptrunc and FP_TO_FP16 (for example). As a result we can remove some redundant AArch64 patterns. llvm-svn: 213507
-
Chandler Carruth authored
'Worklist' consistently rather than a deeply confusing mixture of 'WorkList' and 'Worklist'. Notably, the very 'WorkList' of the DAG combiner was exposed to target specific DAG combines under an interface 'AddToWorklist' which was implemented by in turn calling 'AddToWorkList' in the combiner. This has sent me circling with the wrong case in grep one too many times. I chose to normalize on 'Worklist' because that one won the grep-vote for llvm/lib/... by a hundered hits or so, and it is used in places relatively "canonical" such as InstCombine's Worklist. Let's all jsut pick this casing, whether "correct", "good", or "bad" and be consistent... llvm-svn: 213506
-
Chandler Carruth authored
stack, filter all handle nodes from the DAG combiner worklist. This will also handle cases where other handle nodes might be (erroneously) added to the worklist and then cause bugs and explosions when deleted. For example, when running the legalizer within the DAG combiner, there are times when other handle nodes are used and can end up here. llvm-svn: 213505
-
Andrea Di Biagio authored
Canonicalize shuffles according to rules: * shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A) * shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B) * shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B) This patch helps identifying more shuffle pairs that could be combined reusing the already existing rules in the DAGCombiner. Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized shuffles are now folded into a single shuffle node by the DAGCombiner. Added more test cases to 'combine-vec-shuffle-4.ll'. llvm-svn: 213504
-
Andrea Di Biagio authored
This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'. This refactoring is in preperation of an upcoming change to the DAGCombiner. llvm-svn: 213503
-
Gerolf Hoflehner authored
Prevents hoisting of loads above stores and sinking of stores below loads in MergedLoadStoreMotion.cpp (rdar://15991737) llvm-svn: 213497
-
Ulrich Weigand authored
This patch adds infrastructure support for passing array types directly. These can be used by the front-end to pass aggregate types (coerced to an appropriate array type). The details of the array type being used inform the back-end about ABI-relevant properties. Specifically, the array element type encodes: - whether the parameter should be passed in FPRs, VRs, or just GPRs/stack slots (for float / vector / integer element types, respectively) - what the alignment requirements of the parameter are when passed in GPRs/stack slots (8 for float / 16 for vector / the element type size for integer element types) -- this corresponds to the "byval align" field Using the infrastructure provided by this patch, a companion patch to clang will enable two features: - In the ELFv2 ABI, pass (and return) "homogeneous" floating-point or vector aggregates in FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI) - As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates that fit fully in registers without using the "byval" mechanism The patch uses the functionArgumentNeedsConsecutiveRegisters callback to encode that special treatment is required for all directly-passed array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set as a results are then used to implement the required size and alignment rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc. As a related change, the ABI routines have to be modified to support passing floating-point types in GPRs. This is necessary because with homogeneous aggregates of 4-byte float type we can now run out of FPRs *before* we run out of the 64-byte argument save area that is shadowed by GPRs. Any extra floating-point arguments that no longer fit in FPRs must now be passed in GPRs until we run out of those too. Note that there was already code to pass floating-point arguments in GPRs used with vararg parameters, which was done by writing the argument out to the argument save area first and then reloading into GPRs. The patch re-implements this, however, in favor of code packing float arguments directly via extension/truncation, BITCAST, and BUILD_PAIR operations. This is required to support the ELFv2 ABI, since we cannot unconditionally write to the argument save area (which the caller might not have allocated). The change does, however, affect ELFv1 varags routines too; but even here the overall effect should be advantageous: Instead of loading the argument into the FPR, then storing the argument to the stack slot, and finally reloading the argument from the stack slot into a GPR, the new code now just loads the argument into the FPR, and subsequently loads the argument into the GPR (via BITCAST). That BITCAST might imply a save/reload from a stack temporary (in which case we're no worse than before); but it might be implemented more efficiently in some cases. The final part of the patch enables up to 8 FPRs and VRs for argument return in PPCCallingConv.td; this is required to support returning ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs since LLVM wil only look for which register to use if the parameter is marked as "direct" return anyway.) Reviewed by Hal Finkel. llvm-svn: 213493
-
Ulrich Weigand authored
This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a single CFI record refering to CR2. In ELFv2 instead, each of the CR fields is represented by its own CFI record. The advantage is that the compiler can now chose to save just a single (or two) CR fields instead of all of them, if those are the only ones that actually need saving. That can lead to more efficient code using mf(o)crf instead of the (slow) mfcr instruction. Note that this patch does not (yet) implement this more efficient code generation, but it does implement the part that is required to be ABI compliant: creating multiple CFI records if multiple CR fields are saved. Reviewed by Hal Finkel. llvm-svn: 213492
-
Ulrich Weigand authored
This patch enables the new ELFv2 ABI in the runtime dynamic loader. The loader has to implement the following features: - In the ELFv2 ABI, do not look up a function descriptor in .opd, but instead use the local entry point when resolving a direct call. - Update the TOC restore code to use the new TOC slot linkage area offset. - Create PLT stubs appropriate for the ELFv2 ABI. Note that this patch also adds common-code changes. These are necessary because the loader must check the newly added ELF flags: the e_flags header bits encoding the ABI version, and the st_other symbol table entry bits encoding the local entry point offset. There is currently no way to access these, so I've added ObjectFile::getPlatformFlags and SymbolRef::getOther accessors. Reviewed by Hal Finkel. llvm-svn: 213491
-
Ulrich Weigand authored
The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways: * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating two unused doublewords * the 64-byte "parameter save area" is now optional and need not be present in certain cases (it remains mandatory in functions with variable arguments, and functions that have any parameter that is passed on the stack) The following patch implements this required changes: - reducing the linkage area, and associated relocation of the TOC save slot, in getLinkageSize / getTOCSaveOffset (this requires updating all callers of these routines to pass in the isELFv2ABI flag). - (partially) handling the case where the parameter save are is optional This latter part requires some extra explanation: Currently, we still always allocate the parameter save area when *calling* a function. That is certainly always compliant with the ABI, but may cause code to allocate stack unnecessarily. This can be addressed by a follow-on optimization patch. On the *callee* side, in LowerFormalArguments, we *must* track correctly whether the ABI guarantees that the caller has allocated the parameter save area for our use, and the patch does so. However, there is one complication: the code that handles incoming "byval" arguments will currently *always* write to the parameter save area, because it has to force incoming register arguments to the stack since it must return an *address* to implement the byval semantics. To fix this, the patch changes the LowerFormalArguments code to write arguments to a freshly allocated stack slot on the function's own stack frame instead of the argument save area in those cases where that area is not present. Reviewed by Hal Finkel. llvm-svn: 213490
-
Ulrich Weigand authored
This patch builds upon the two preceding MC changes to implement the basic ELFv2 function call convention. In the ELFv1 ABI, a "function descriptor" was associated with every function, pointing to both the entry address and the related TOC base (and a static chain pointer for nested functions). Function pointers would actually refer to that descriptor, and the indirect call sequence needed to load up both entry address and TOC base. In the ELFv2 ABI, there are no more function descriptors, and function pointers simply refer to the (global) entry point of the function code. Indirect function calls simply branch to that address, after loading it up into r12 (as required by the ABI rules for a global entry point). Direct function calls continue to just do a "bl" to the target symbol; this will be resolved by the linker to the local entry point of the target function if it is local, and to a PLT stub if it is global. That PLT stub would then load the (global) entry point address of the final target into r12 and branch to it. Note that when performing a local function call, r2 must be set up to point to the current TOC base: if the target ends up local, the ABI requires that its local entry point is called with r2 set up; if the target ends up global, the PLT stub requires that r2 is set up. This patch implements all LLVM changes to implement that scheme: - No longer create a function descriptor when emitting a function definition (in EmitFunctionEntryLabel) - Emit two entry points *if* the function needs the TOC base (r2) anywhere (this is done EmitFunctionBodyStart; note that this cannot be done in EmitFunctionBodyStart because the global entry point prologue code must be *part* of the function as covered by debug info). - In order to make use tracking of r2 (as needed above) work correctly, mark direct function calls as implicitly using r2. - Implement the ELFv2 indirect function call sequence (no function descriptors; load target address into r12). - When creating an ELFv2 object file, emit the .abiversion 2 directive to tell the linker to create the appropriate version of PLT stubs. Reviewed by Hal Finkel. llvm-svn: 213489
-
Hal Finkel authored
Thanks to the lld-x86_64-darwin13 builder for catching this first. llvm-svn: 213488
-
Ulrich Weigand authored
As discussed in a previous checking to support the .localentry directive on PowerPC, we need to inspect the actual target symbol in needsRelocateWithSymbol to make the appropriate decision based on that symbol's st_other bits. Currently, needsRelocateWithSymbol does not get the target symbol. However, it is directly available to its sole caller. This patch therefore simply extends the needsRelocateWithSymbol by a new parameter "const MCSymbolData &SD", passes in the target symbol, and updates all derived implementations. In particular, in the PowerPC implementation, this patch removes the FIXME added by the previous checkin. llvm-svn: 213487
-
Hal Finkel authored
Prior to this change, the loop vectorizer did not make use of the alias analysis infrastructure. Instead, it performed memory dependence analysis using ScalarEvolution-based linear dependence checks within equivalence classes derived from the results of ValueTracking's GetUnderlyingObjects. Unfortunately, this meant that: 1. The loop vectorizer had logic that essentially duplicated that in BasicAA for aliasing based on identified objects. 2. The loop vectorizer could not partition the space of dependency checks based on information only easily available from within AA (TBAA metadata is currently the prime example). This means, for example, regardless of whether -fno-strict-aliasing was provided, the vectorizer would only vectorize this loop with a runtime memory-overlap check: void foo(int *a, float *b) { for (int i = 0; i < 1600; ++i) a[i] = b[i]; } This is suboptimal because the TBAA metadata already provides the information necessary to show that this check unnecessary. Of course, the vectorizer has a limit on the number of such checks it will insert, so in practice, ignoring TBAA means not vectorizing more-complicated loops that we should. This change causes the vectorizer to use an AliasSetTracker to keep track of the pointers in the loop. The resulting alias sets are then used to partition the space of dependency checks, and potential runtime checks; this results in more-efficient vectorizations. When pointer locations are added to the AliasSetTracker, two things are done: 1. The location size is set to UnknownSize (otherwise you'd not catch inter-iteration dependencies) 2. For instructions in blocks that would need to be predicated, TBAA is removed (because the metadata might have a control dependency on the condition being speculated). For non-predicated blocks, you can leave the TBAA metadata. This is safe because you can't have an iteration dependency on the TBAA metadata (if you did, and you unrolled sufficiently, you'd end up with the same pointer value used by two accesses that TBAA says should not alias, and that would yield undefined behavior). llvm-svn: 213486
-
Ulrich Weigand authored
A second binutils feature needed to support ELFv2 is the .localentry directive. In the ELFv2 ABI, functions may have two entry points: one for calling the routine locally via "bl", and one for calling the function via function pointer (either at the source level, or implicitly via a PLT stub for global calls). The two entry points share a single ELF symbol, where the ELF symbol address identifies the global entry point address, while the local entry point is found by adding a delta offset to the symbol address. That offset is encoded into three platform-specific bits of the ELF symbol st_other field. The .localentry directive instructs the assembler to set those fields to encode a particular offset. This is typically used by a function prologue sequence like this: func: addis r2, r12, (.TOC.-func)@ha addi r2, r2, (.TOC.-func)@l .localentry func, .-func Note that according to the ABI, when calling the global entry point, r12 must be set to point the global entry point address itself; while when calling the local entry point, r2 must be set to point to the TOC base. The two instructions between the global and local entry point in the above example translate the first requirement into the second. This patch implements support in the PowerPC MC streamers to emit the .localentry directive (both into assembler and ELF object output), as well as support in the assembler parser to parse that directive. In addition, there is another change required in MC fixup/relocation handling to properly deal with relocations targeting function symbols with two entry points: When the target function is known local, the MC layer would immediately handle the fixup by inserting the target address -- this is wrong, since the call may need to go to the local entry point instead. The GNU assembler handles this case by *not* directly resolving fixups targeting functions with two entry points, but always emits the relocation and relies on the linker to handle this case correctly. This patch changes LLVM MC to do the same (this is done via the processFixupValue routine). Similarly, there are cases where the assembler would normally emit a relocation, but "simplify" it to a relocation targeting a *section* instead of the actual symbol. For the same reason as above, this may be wrong when the target symbol has two entry points. The GNU assembler again handles this case by not performing this simplification in that case, but leaving the relocation targeting the full symbol, which is then resolved by the linker. This patch changes LLVM MC to do the same (via the needsRelocateWithSymbol routine). NOTE: The method used in this patch is overly pessimistic, since the needsRelocateWithSymbol routine currently does not have access to the actual target symbol, and thus must always assume that it might have two entry points. This will be improved upon by a follow-on patch that modifies common code to pass the target symbol when calling needsRelocateWithSymbol. Reviewed by Hal Finkel. llvm-svn: 213485
-
Ulrich Weigand authored
ELFv2 binaries are marked by a bit in the ELF header e_flags field. A new assembler directive .abiversion can be used to set that flag. This patch implements support in the PowerPC MC streamers to emit the .abiversion directive (both into assembler and ELF binary output), as well as support in the assembler parser to parse the .abiversion directive. Reviewed by Hal Finkel. llvm-svn: 213484
-
Ulrich Weigand authored
When handling an incoming byval argument, we need to possibly write incoming registers to the stack in order to create an on-stack image of the parameter, so we can return its address to common code. This currently uses CreateFixedObject to access the parts of the parameter save area where the argument is (or needs to be) stored. However, sometimes we need to access multiple parts of that area, e.g. to write multiple registers. The code currently uses a new CreateFixedObject call for each of these accesses, resulting in a patchwork of overlapping (fixed) stack objects. This doesn't really matter in the case of fixed objects, since any access to those turns into a fixed stackpointer + offset address anyway. However, with the upcoming ELFv2 patches, we may actually need to place an incoming argument into our *own* stack frame instead of the caller's. This means we need to use CreateStackObject instead, and we cannot have multiple overlapping instances of those. To make the rest of the argument handling code work equally in both situations, this patch refactors it to always use just a single call to CreateFixedObject, and access parts of that object as required using address arithmetic. This way, we can in a future patch substitute CreateStackObject without further changes. No change to generated code intended. llvm-svn: 213483
-
Ulrich Weigand authored
The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex nodes in a special manner, by tranlating them into a TargetFrameIndex node. This was done in most cases, but seems to have been neglected in one path: when the input tree has an OR of the FrameIndex with an immediate. This can happen if the FrameIndex can be proven to be sufficiently aligned that an OR of that immediate is equivalent to an ADD. The missing handling of FrameIndex in that case caused the SelectionDAG instruction selection to miss opportunities to merge the OR back into the FrameIndex node, leading to superfluous addi/ori instructions in the final assembler output. llvm-svn: 213482
-