  1. Jul 21, 2014
    • Tom Stellard's avatar
      R600/SI: Add instruction shrinking pass · 1aaad697
      Tom Stellard authored
      This pass converts 64-bit instructions to 32-bit when possible.
      
      llvm-svn: 213561
      1aaad697
    • Tom Stellard's avatar
      R600/SI: VOPC instructions explicitly define VCC · 63797d4a
      Tom Stellard authored
      Therefore we don't need to add it to the implicit defs list.
      
      llvm-svn: 213558
      63797d4a
    • David Blaikie's avatar
      Correct the ownership passing semantics of object::createBinary and make them... · 4b9ae52a
      David Blaikie authored
      Correct the ownership passing semantics of object::createBinary and make them explicit in the type system.
      
      createBinary documented that it destroyed the parameter in error cases,
      though by observation it does not. By passing the unique_ptr by value
      rather than lvalue reference, callers are now explicit about passing
      ownership and the function implements the documented contract. Remove
      the explicit documentation, since now the behavior cannot be anything
      other than what was documented, so it's redundant.
      
      Also drops a unique_ptr::release in llvm-nm that was always run on a
      null unique_ptr anyway.
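
A minimal sketch of the ownership contract described above, assuming hypothetical stand-in types (`Buffer`, `Binary`) and a hypothetical function `createBinarySketch`; none of these are the real LLVM declarations. It only illustrates why taking a `std::unique_ptr` by value makes the "destroyed on error" behaviour automatic.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not the LLVM types.
struct Buffer { std::string Data; };
struct Binary { std::unique_ptr<Buffer> Source; };

// Taking the unique_ptr by value means the callee owns the buffer from the
// moment of the call. On the error path the parameter goes out of scope and
// destroys the buffer, so the caller can never be left holding it.
std::unique_ptr<Binary> createBinarySketch(std::unique_ptr<Buffer> Buf) {
  if (!Buf || Buf->Data.empty())
    return nullptr;              // Error: Buf is destroyed on return.
  auto Bin = std::make_unique<Binary>();
  Bin->Source = std::move(Buf);  // Success: ownership moves into the result.
  return Bin;
}
```

A caller has to write `createBinarySketch(std::move(Buf))`, which makes the transfer of ownership visible at the call site; afterwards `Buf` is null whether or not the call succeeded.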
      
      llvm-svn: 213557
      4b9ae52a
    • David Blaikie's avatar
      Remove unused variable. · 370a67a5
      David Blaikie authored
      llvm-svn: 213554
      370a67a5
    • Tom Stellard's avatar
      R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code · e812f2fd
      Tom Stellard authored
      There are a few more cleanups to do, but I ran into some problems
      with ext loads and trunc stores, when I tried to change some of the
      vector loads and stores from custom to legal, so I wasn't able to
      get rid of everything.
      
      llvm-svn: 213552
      e812f2fd
    • Tom Stellard's avatar
      R600/SI: Use scratch memory for large private arrays · b02094e1
      Tom Stellard authored
      llvm-svn: 213551
      b02094e1
    • Tom Stellard's avatar
      R600/SI: Specify wavefront size for SI and CI · 42639a57
      Tom Stellard authored
      llvm-svn: 213550
      42639a57
    • Tom Stellard's avatar
      R600/SI: Remove vaddr operand from BUFFER_LOAD_*_OFFSET instructions · 8e44d948
      Tom Stellard authored
      This operand is never used.
      
      llvm-svn: 213549
      8e44d948
    • Daniel Sanders's avatar
      [mips] Do not emit '.module fp=...' unless we really need to. · e22244b7
      Daniel Sanders authored
      We now emit this value when we need to contradict the default value. This
      restores support for binutils 2.24.
      
      When a suitable binutils has been released we can resume unconditionally
      emitting .module directives. This is preferable to omitting the .module
      directives since the .module directives protect against, for example,
      accidentally assembling FP32 code with -mfp64 and producing an unusable object.
      
      llvm-svn: 213548
      e22244b7
    • Robert Khasanov's avatar
      [SKX] Enabling SKX target and AVX512BW, AVX512DQ, AVX512VL features. · bfa01313
      Robert Khasanov authored
      Enabling HasAVX512{DQ,BW,VL} predicates.
      Adding VK2, VK4, VK32, VK64 masked register classes.
      Adding new types (v64i8, v32i16) to VR512.
      Extending calling conventions for new types (v64i8, v32i16)
      
      Patch by Zinovy Nis <zinovy.y.nis@intel.com>
      Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
      
      llvm-svn: 213545
      bfa01313
    • Tom Stellard's avatar
      R600/SI: Store constant initializer data in constant memory · 067c8156
      Tom Stellard authored
      This implements a solution for constant initializers suggested
      by Vadim Girlin, where we store the data after the shader code
      and then use the S_GETPC instruction to compute its address.
      
      This saves us the trouble of creating a new buffer for constant data
      and then having to pass the pointer to the kernel via user SGPRs or the
      input buffer.
      
      llvm-svn: 213530
      067c8156
    • Tom Stellard's avatar
      R600/SI: Add isCFDepth0 Predicate to SALU addc pattern · b2114caf
      Tom Stellard authored
      llvm-svn: 213529
      b2114caf
    • Tom Stellard's avatar
      R600/SI: Use VALU for i1 XOR · 54a3b65b
      Tom Stellard authored
      llvm-svn: 213528
      54a3b65b
    • Tom Stellard's avatar
      R600/SI: Use a custom encoding method for simm16 in SOPP branch instructions · 01825afa
      Tom Stellard authored
      This allows us to explicitly define the type of fixup that is needed,
      so we can distinguish this from future fixup types.
      
      llvm-svn: 213527
      01825afa
    • Tom Stellard's avatar
      R600/SI: Rename SOPP operands to match the encoding fields · e08fe68b
      Tom Stellard authored
      llvm-svn: 213526
      e08fe68b
    • Daniel Sanders's avatar
      [mips] Add MipsOptionRecord abstraction and use it to implement .reginfo/.MIPS.options · 68c3747e
      Daniel Sanders authored
      This abstraction allows us to support the various records that can be placed in
      the .MIPS.options section in the future. We currently use it to record register
      usage information (the ODK_REGINFO record in our ELF64 spec).
      
      Each .MIPS.options record should subclass MipsOptionRecord and provide an
      implementation of EmitMipsOptionRecord.
      
      Patch by Matheus Almeida and Toma Tabacu
      
      llvm-svn: 213522
      68c3747e
    • Hal Finkel's avatar
      Move the CapturesBefore tracker from AA into CaptureTracking · b0356217
      Hal Finkel authored
      There were two generally-useful CaptureTracker classes defined in LLVM: the
      simple tracker defined in CaptureTracking (and made available via the
      PointerMayBeCaptured utility function), and the CapturesBefore tracker
      available only inside of AA. This change moves the CapturesBefore tracker into
      CaptureTracking, generalizes it slightly (by adding a ReturnCaptures
      parameter), and makes it generally available via a PointerMayBeCapturedBefore
      utility function.
      
      This logic will be needed, for example, to perform noalias function parameter
      attribute inference.
      
      No functionality change intended.
      
      llvm-svn: 213519
      b0356217
    • Aaron Ballman's avatar
      Fixing an MSVC conversion warning about implicitly converting the shift... · 6c078a59
      Aaron Ballman authored
      Fixing an MSVC conversion warning about implicitly converting the shift results to 64-bits. No functional change intended.
      
      llvm-svn: 213515
      6c078a59
    • Hal Finkel's avatar
      Move isIdentifiedFunctionLocal from BasicAA to AA · c782aa5a
      Hal Finkel authored
      The ability to identify function locals will exist outside of BasicAA (for
      example, logic for inferring noalias function arguments will need this), so
      make this concept generally accessible without code duplication.
      
      No functionality change.
      
      llvm-svn: 213514
      c782aa5a
    • Daniel Sanders's avatar
      [mips] Try to fix the test/ExecutionEngine tests on a MIPS host. · decb7a2b
      Daniel Sanders authored
      Fix a dangerous default case that caused MipsCodeEmitter to discard pseudo
      instructions it didn't recognize. It will now call llvm_unreachable() for
      unrecognized pseudos and explicitly handles PseudoReturn, PseudoReturn64,
      PseudoIndirectBranch, PseudoIndirectBranch64, CFI_INSTRUCTION, IMPLICIT_DEF,
      and KILL.
      
      There may be other pseudos that need handling but this was enough for the
      ExecutionEngine tests to pass on my test system.
      
      llvm-svn: 213513
      decb7a2b
    • Daniel Sanders's avatar
      [mips] Do not emit '.module [no]oddspreg' unless we really need to. · d7c27960
      Daniel Sanders authored
      We now emit this directive when we need to contradict the default value (e.g.
      -mno-odd-spreg is given) or an option changed the default value (e.g. -mfpxx
      is given).
      
      This restores support for the currently available head of binutils. However,
      at this point binutils 2.24 is still not sufficient since it does not support
      '.module fp=...'.
      
      llvm-svn: 213511
      d7c27960
    • Tim Northover's avatar
      CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext · f7a02c17
      Tim Northover authored
      This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
      and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
      path is now uniform, regardless of the input IR:
      
        fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
        fpext -> FP16_TO_FP (if f16 illegal) -> libcall
      
      Each target should be able to select the version that best matches its
      operations and not be required to duplicate patterns for both fptrunc
      and FP_TO_FP16 (for example).
      
      As a result we can remove some redundant AArch64 patterns.
      
      llvm-svn: 213507
      f7a02c17
    • Chandler Carruth's avatar
      [SDAG,cleanup] Switch the DAG combiner over to use the spelling · 3c0012be
      Chandler Carruth authored
      'Worklist' consistently rather than a deeply confusing mixture of
      'WorkList' and 'Worklist'.
      
      Notably, the very 'WorkList' of the DAG combiner was exposed to target
      specific DAG combines under an interface 'AddToWorklist' which was
      implemented by in turn calling 'AddToWorkList' in the combiner. This has
      sent me circling with the wrong case in grep one too many times.
      
      I chose to normalize on 'Worklist' because that one won the grep-vote
      for llvm/lib/... by a hundred hits or so, and it is used in places
      relatively "canonical" such as InstCombine's Worklist. Let's all just
      pick this casing, whether "correct", "good", or "bad" and be
      consistent...
      
      llvm-svn: 213506
      3c0012be
    • Chandler Carruth's avatar
      [SDAG] Rather than using a narrow test against the one dummy node on the · 24ceb0ce
      Chandler Carruth authored
      stack, filter all handle nodes from the DAG combiner worklist.
      
      This will also handle cases where other handle nodes might be
      (erroneously) added to the worklist and then cause bugs and explosions
      when deleted. For example, when running the legalizer within the DAG
      combiner, there are times when other handle nodes are used and can end
      up here.
      
      llvm-svn: 213505
      24ceb0ce
    • Andrea Di Biagio's avatar
      [DAGCombiner] Improve the shuffle-vector folding logic. · 0fb20131
      Andrea Di Biagio authored
      Canonicalize shuffles according to rules:
       *  shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
       *  shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
       *  shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)
      
      This patch helps identify more shuffle pairs that could be combined reusing
      the already existing rules in the DAGCombiner.
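
The canonicalization rules above swap the two operands of a shuffle, which only preserves the result if the mask is rewritten at the same time. The following is a rough sketch of that idea on plain integer vectors; `shuffle` and `commuteMask` are hypothetical helpers written for this illustration, not the SelectionDAG API, and undefined lanes are arbitrarily modelled as 0.

```cpp
#include <cassert>
#include <vector>

using Vec = std::vector<int>;
using Mask = std::vector<int>;  // -1 marks an undefined lane

// Select lanes from the concatenation of LHS and RHS: an index below N reads
// LHS[index], an index of N or more reads RHS[index - N].
Vec shuffle(const Vec &LHS, const Vec &RHS, const Mask &M) {
  const int N = static_cast<int>(LHS.size());
  Vec Out;
  Out.reserve(M.size());
  for (int Idx : M) {
    if (Idx < 0)
      Out.push_back(0);             // Undefined lane, modelled as 0 here.
    else if (Idx < N)
      Out.push_back(LHS[Idx]);
    else
      Out.push_back(RHS[Idx - N]);
  }
  return Out;
}

// Rewrite a mask so it selects the same lanes once the operands are swapped.
Mask commuteMask(const Mask &M, int N) {
  Mask Out;
  Out.reserve(M.size());
  for (int Idx : M) {
    if (Idx < 0)
      Out.push_back(Idx);           // Undefined stays undefined.
    else
      Out.push_back(Idx < N ? Idx + N : Idx - N);
  }
  return Out;
}
```

With `AB = shuffle(A, B, Inner)`, the forms `shuffle(A, AB, Outer)` and `shuffle(AB, A, commuteMask(Outer, N))` produce the same lanes, which is what lets the inner shuffle be moved to the left-hand side.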
      
      Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized
      shuffles are now folded into a single shuffle node by the DAGCombiner.
      Added more test cases to 'combine-vec-shuffle-4.ll'.
      
      llvm-svn: 213504
      0fb20131
    • Andrea Di Biagio's avatar
      [DAG] Refactor some logic. No functional change. · 4d8bd416
      Andrea Di Biagio authored
      This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp
      and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'.
      This refactoring is in preparation for an upcoming change to the DAGCombiner.
      
      llvm-svn: 213503
      4d8bd416
    • Gerolf Hoflehner's avatar
      Fix for regression: [Bug 20369] wrong code at -O3 on x86_64-linux-gnu in 64-bit mode · ae1ec299
      Gerolf Hoflehner authored
      Prevents hoisting of loads above stores and sinking of stores below loads
      in MergedLoadStoreMotion.cpp (rdar://15991737)
      
      llvm-svn: 213497
      ae1ec299
    • Ulrich Weigand's avatar
      [PowerPC] ELFv2 aggregate passing support · 85d5df25
      Ulrich Weigand authored
      This patch adds infrastructure support for passing array types
      directly.  These can be used by the front-end to pass aggregate
      types (coerced to an appropriate array type).  The details of the
      array type being used inform the back-end about ABI-relevant
      properties.  Specifically, the array element type encodes:
      - whether the parameter should be passed in FPRs, VRs, or just
        GPRs/stack slots  (for float / vector / integer element types,
        respectively)
      - what the alignment requirements of the parameter are when passed in
        GPRs/stack slots  (8 for float / 16 for vector / the element type
        size for integer element types) -- this corresponds to the
        "byval align" field
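
The mapping in the list above can be written down as a small lookup. This is only a sketch of the rule as stated in this message; the enum and function names are hypothetical and do not correspond to identifiers in the PowerPC back-end.

```cpp
#include <cassert>

// Hypothetical names used only for this illustration.
enum class ElemKind { Float, Vector, Integer };
enum class RegClass { FPR, VR, GPR };

struct PassingInfo {
  RegClass Regs;   // Preferred register class for the parameter.
  unsigned Align;  // Alignment in bytes when passed in GPRs/stack slots.
};

// Derive the ABI-relevant properties from the array element type:
// float -> FPRs, align 8; vector -> VRs, align 16;
// integer -> GPRs/stack slots, align equal to the element size.
PassingInfo classify(ElemKind Kind, unsigned IntElemSizeBytes) {
  switch (Kind) {
  case ElemKind::Float:
    return {RegClass::FPR, 8};
  case ElemKind::Vector:
    return {RegClass::VR, 16};
  case ElemKind::Integer:
    return {RegClass::GPR, IntElemSizeBytes};
  }
  return {RegClass::GPR, IntElemSizeBytes};  // Not reached for valid input.
}
```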
      
      Using the infrastructure provided by this patch, a companion patch
      to clang will enable two features:
      - In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
        or vector aggregates in FPRs and VRs (this is similar to the ARM
        homogeneous aggregate ABI)
      - As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
        that fit fully in registers without using the "byval" mechanism
      
      The patch uses the functionArgumentNeedsConsecutiveRegisters callback
      to encode that special treatment is required for all directly-passed
      array types.  The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
      as a result are then used to implement the required size and alignment
      rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.
      
      As a related change, the ABI routines have to be modified to support
      passing floating-point types in GPRs.  This is necessary because with
      homogeneous aggregates of 4-byte float type we can now run out of FPRs
      *before* we run out of the 64-byte argument save area that is shadowed
      by GPRs.  Any extra floating-point arguments that no longer fit in FPRs
      must now be passed in GPRs until we run out of those too.
      
      Note that there was already code to pass floating-point arguments in
      GPRs used with vararg parameters, which was done by writing the argument
      out to the argument save area first and then reloading into GPRs.  The
      patch re-implements this, however, in favor of code packing float arguments
      directly via extension/truncation, BITCAST, and BUILD_PAIR operations.
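
As a rough illustration of what packing via BITCAST and BUILD_PAIR amounts to, the sketch below reinterprets two 4-byte floats as integers and combines them into one 64-bit value without going through memory. The helper names are hypothetical, the code assumes IEEE-754 single-precision `float`, and which argument ends up in which half of the GPR is an ABI and endianness question this sketch does not try to answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// BITCAST step: reinterpret the 32 bits of a float as an integer.
std::uint32_t bitcastFloat(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// BUILD_PAIR step: combine a low and a high 32-bit half into a 64-bit value.
std::uint64_t buildPair(std::uint32_t Lo, std::uint32_t Hi) {
  return (static_cast<std::uint64_t>(Hi) << 32) | Lo;
}
```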
      
      This is required to support the ELFv2 ABI, since we cannot unconditionally
      write to the argument save area (which the caller might not have allocated).
      The change does, however, affect ELFv1 varargs routines too; but even here
      the overall effect should be advantageous: Instead of loading the argument
      into the FPR, then storing the argument to the stack slot, and finally
      reloading the argument from the stack slot into a GPR, the new code now
      just loads the argument into the FPR, and subsequently loads the argument
      into the GPR (via BITCAST).  That BITCAST might imply a save/reload from
      a stack temporary (in which case we're no worse than before); but it
      might be implemented more efficiently in some cases.
      
      The final part of the patch enables up to 8 FPRs and VRs for argument
      return in PPCCallingConv.td; this is required to support returning
      ELFv2 homogeneous aggregates.  (Note that this doesn't affect other ABIs
      since LLVM will only look for which register to use if the parameter is
      marked as "direct" return anyway.)
      
      Reviewed by Hal Finkel.
      
      llvm-svn: 213493
      85d5df25
    • Ulrich Weigand's avatar
      [PowerPC] ELFv2 explicit CFI for CR fields · be928cc2
      Ulrich Weigand authored
      This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI
      would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
      using just a single CFI record referring to CR2.   In ELFv2 instead,
      each of the CR fields is represented by its own CFI record.  The
      advantage is that the compiler can now choose to save just a single
      (or two) CR fields instead of all of them, if those are the only ones
      that actually need saving.  That can lead to more efficient code using
      mf(o)crf instead of the (slow) mfcr instruction.
      
      Note that this patch does not (yet) implement this more efficient
      code generation, but it does implement the part that is required to
      be ABI compliant: creating multiple CFI records if multiple CR fields
      are saved.
      
      Reviewed by Hal Finkel.
      
      llvm-svn: 213492
      be928cc2
    • Ulrich Weigand's avatar
      [PowerPC] ELFv2 dynamic loader support · 752b5c9e
      Ulrich Weigand authored
      This patch enables the new ELFv2 ABI in the runtime dynamic loader.
      The loader has to implement the following features:
      - In the ELFv2 ABI, do not look up a function descriptor in .opd, but
        instead use the local entry point when resolving a direct call.
      - Update the TOC restore code to use the new TOC slot linkage area
        offset.
      - Create PLT stubs appropriate for the ELFv2 ABI.
      
      Note that this patch also adds common-code changes. These are necessary
      because the loader must check the newly added ELF flags: the e_flags
      header bits encoding the ABI version, and the st_other symbol table
      entry bits encoding the local entry point offset.  There is currently
      no way to access these, so I've added ObjectFile::getPlatformFlags and
      SymbolRef::getOther accessors.
      
      Reviewed by Hal Finkel.
      
      llvm-svn: 213491
      752b5c9e
    • Ulrich Weigand's avatar
      [PowerPC] ELFv2 stack space reduction · 8658f17e
      Ulrich Weigand authored
      The ELFv2 ABI reduces the amount of stack required to implement an
      ABI-compliant function call in two ways:
      * the "linkage area" is reduced from 48 bytes to 32 bytes by
        eliminating two unused doublewords
      * the 64-byte "parameter save area" is now optional and need not be
        present in certain cases (it remains mandatory in functions with
        variable arguments, and functions that have any parameter that is
        passed on the stack)
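
The arithmetic behind the two savings above can be sketched as follows. The function names are hypothetical (they only loosely mirror the getLinkageSize routine mentioned in this message), and the sketch ignores stack alignment, local variables and register save slots.

```cpp
#include <cassert>

// Linkage area size in bytes: 48 under ELFv1, 32 under ELFv2.
unsigned linkageSize(bool IsELFv2) { return IsELFv2 ? 32 : 48; }

// Minimum stack a caller must provide for a call: the linkage area plus the
// 64-byte parameter save area. Under ELFv1 the save area is always present;
// under ELFv2 it is only needed for variable-argument callees or when some
// parameter is passed on the stack.
unsigned minCallFrameBytes(bool IsELFv2, bool NeedsParamSaveArea) {
  const unsigned ParamSaveArea = 64;
  const bool HasArea = !IsELFv2 || NeedsParamSaveArea;
  return linkageSize(IsELFv2) + (HasArea ? ParamSaveArea : 0);
}
```

Under these assumptions an ELFv1 call needs at least 112 bytes, an ELFv2 call that still requires the save area needs 96, and an ELFv2 call that can omit it needs 32.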
      
      This patch implements the required changes:
      - reducing the linkage area, and associated relocation of the TOC save
        slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
        callers of these routines to pass in the isELFv2ABI flag).
      - (partially) handling the case where the parameter save area is optional
      
      This latter part requires some extra explanation:  Currently, we still
      always allocate the parameter save area when *calling* a function.
      That is certainly always compliant with the ABI, but may cause code to
      allocate stack unnecessarily.  This can be addressed by a follow-on
      optimization patch.
      
      On the *callee* side, in LowerFormalArguments, we *must* track
      correctly whether the ABI guarantees that the caller has allocated
      the parameter save area for our use, and the patch does so. However,
      there is one complication: the code that handles incoming "byval"
      arguments will currently *always* write to the parameter save area,
      because it has to force incoming register arguments to the stack since
      it must return an *address* to implement the byval semantics.
      
      To fix this, the patch changes the LowerFormalArguments code to write
      arguments to a freshly allocated stack slot on the function's own stack
      frame instead of the argument save area in those cases where that area
      is not present.
      
      Reviewed by Hal Finkel.
      
      llvm-svn: 213490
      8658f17e
    • Ulrich Weigand's avatar
      [PowerPC] ELFv2 function call changes · aa0ac4f1
      Ulrich Weigand authored
      This patch builds upon the two preceding MC changes to implement the
      basic ELFv2 function call convention.  In the ELFv1 ABI, a "function
      descriptor" was associated with every function, pointing to both the
      entry address and the related TOC base (and a static chain pointer
      for nested functions).  Function pointers would actually refer to that
      descriptor, and the indirect call sequence needed to load up both entry
      address and TOC base.
      
      In the ELFv2 ABI, there are no more function descriptors, and function
      pointers simply refer to the (global) entry point of the function code.
      Indirect function calls simply branch to that address, after loading it
      up into r12 (as required by the ABI rules for a global entry point).
      Direct function calls continue to just do a "bl" to the target symbol;
      this will be resolved by the linker to the local entry point of the
      target function if it is local, and to a PLT stub if it is global.
      That PLT stub would then load the (global) entry point address of the
      final target into r12 and branch to it.  Note that when performing a
      local function call, r2 must be set up to point to the current TOC
      base: if the target ends up local, the ABI requires that its local
      entry point is called with r2 set up; if the target ends up global,
      the PLT stub requires that r2 is set up.
      
      This patch implements all LLVM changes to implement that scheme:
      - No longer create a function descriptor when emitting a function
        definition (in EmitFunctionEntryLabel)
      - Emit two entry points *if* the function needs the TOC base (r2)
        anywhere (this is done in EmitFunctionBodyStart; note that this cannot
        be done in EmitFunctionEntryLabel because the global entry point
        prologue code must be *part* of the function as covered by debug info).
      - In order to make use tracking of r2 (as needed above) work correctly,
        mark direct function calls as implicitly using r2.
      - Implement the ELFv2 indirect function call sequence (no function
        descriptors; load target address into r12).
      - When creating an ELFv2 object file, emit the .abiversion 2 directive
        to tell the linker to create the appropriate version of PLT stubs.  
      
      Reviewed by Hal Finkel.
      
      llvm-svn: 213489
      aa0ac4f1
    • Hal Finkel's avatar
      [LoopVectorize] Remove an unused private AA pointer · 07c9bb3d
      Hal Finkel authored
      Thanks to the lld-x86_64-darwin13 builder for catching this first.
      
      llvm-svn: 213488
      07c9bb3d
    • Ulrich Weigand's avatar
      [MC] Pass MCSymbolData to needsRelocateWithSymbol · 46797c69
      Ulrich Weigand authored
      As discussed in a previous checkin to support the .localentry
      directive on PowerPC, we need to inspect the actual target symbol
      in needsRelocateWithSymbol to make the appropriate decision based
      on that symbol's st_other bits.
      
      Currently, needsRelocateWithSymbol does not get the target symbol.
      However, it is directly available to its sole caller.  This patch
      therefore simply extends the needsRelocateWithSymbol by a new
      parameter "const MCSymbolData &SD", passes in the target symbol,
      and updates all derived implementations.
      
      In particular, in the PowerPC implementation, this patch removes
      the FIXME added by the previous checkin.
      
      llvm-svn: 213487
      46797c69
    • Hal Finkel's avatar
      [LoopVectorize] Use AA to partition potential dependency checks · 7ae00a12
      Hal Finkel authored
      Prior to this change, the loop vectorizer did not make use of the alias
      analysis infrastructure. Instead, it performed memory dependence analysis using
      ScalarEvolution-based linear dependence checks within equivalence classes
      derived from the results of ValueTracking's GetUnderlyingObjects.
      
      Unfortunately, this meant that:
        1. The loop vectorizer had logic that essentially duplicated that in BasicAA
           for aliasing based on identified objects.
        2. The loop vectorizer could not partition the space of dependency checks
           based on information only easily available from within AA (TBAA metadata is
           currently the prime example).
      
      This means, for example, regardless of whether -fno-strict-aliasing was
      provided, the vectorizer would only vectorize this loop with a runtime
      memory-overlap check:
      
      void foo(int *a, float *b) {
        for (int i = 0; i < 1600; ++i)
          a[i] = b[i];
      }
      
      This is suboptimal because the TBAA metadata already provides the information
      necessary to show that this check is unnecessary. Of course, the vectorizer has a
      limit on the number of such checks it will insert, so in practice, ignoring
      TBAA means not vectorizing more-complicated loops that we should.
      
      This change causes the vectorizer to use an AliasSetTracker to keep track of
      the pointers in the loop. The resulting alias sets are then used to partition
      the space of dependency checks, and potential runtime checks; this results in
      more-efficient vectorizations.
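
To make the effect of partitioning concrete, here is a heavily simplified sketch. It groups pointers into alias sets using a toy oracle that, when type information is enabled, treats accesses of different types as non-aliasing, then counts the pointer pairs that still share a set and would therefore need a dependency or runtime check. All names are hypothetical; the real AliasSetTracker and the vectorizer's read/write bookkeeping are considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class AccessType { Int, Float };

struct Access {
  const char *Name;
  AccessType Type;
};

// Toy alias oracle: without type information everything may alias; with it,
// accesses of different types are assumed not to alias (a TBAA-like rule).
bool mayAlias(const Access &A, const Access &B, bool UseTypeInfo) {
  return !UseTypeInfo || A.Type == B.Type;
}

// Partition the accesses into alias sets, then count the pairs that fall
// into the same set. Only those pairs need to be checked against each other.
std::size_t countCheckedPairs(const std::vector<Access> &Accesses,
                              bool UseTypeInfo) {
  // Set[I] is the representative index of the alias set containing access I.
  std::vector<std::size_t> Set(Accesses.size());
  for (std::size_t I = 0; I < Set.size(); ++I)
    Set[I] = I;
  for (std::size_t I = 0; I < Accesses.size(); ++I)
    for (std::size_t J = 0; J < I; ++J)
      if (mayAlias(Accesses[I], Accesses[J], UseTypeInfo)) {
        // Merge the two sets by relabelling I's set with J's representative.
        const std::size_t From = Set[I], To = Set[J];
        for (std::size_t &S : Set)
          if (S == From)
            S = To;
      }
  std::size_t Pairs = 0;
  for (std::size_t I = 0; I < Set.size(); ++I)
    for (std::size_t J = 0; J < I; ++J)
      if (Set[I] == Set[J])
        ++Pairs;
  return Pairs;
}
```

For the `foo(int *a, float *b)` loop, the two pointers share one set without type information (one pair to check) and land in separate sets with it (no pairs to check).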
      
      When pointer locations are added to the AliasSetTracker, two things are done:
        1. The location size is set to UnknownSize (otherwise you'd not catch
           inter-iteration dependencies)
        2. For instructions in blocks that would need to be predicated, TBAA is
           removed (because the metadata might have a control dependency on the condition
           being speculated).
      
      For non-predicated blocks, you can leave the TBAA metadata. This is safe
      because you can't have an iteration dependency on the TBAA metadata (if you
      did, and you unrolled sufficiently, you'd end up with the same pointer value
      used by two accesses that TBAA says should not alias, and that would yield
      undefined behavior).
      
      llvm-svn: 213486
      7ae00a12
    • Ulrich Weigand's avatar
      [PowerPC] ELFv2 MC support for .localentry directive · bb68610d
      Ulrich Weigand authored
      A second binutils feature needed to support ELFv2 is the .localentry
      directive.  In the ELFv2 ABI, functions may have two entry points:
      one for calling the routine locally via "bl", and one for calling the
      function via function pointer (either at the source level, or implicitly
      via a PLT stub for global calls).  The two entry points share a single
      ELF symbol, where the ELF symbol address identifies the global entry
      point address, while the local entry point is found by adding a delta
      offset to the symbol address.  That offset is encoded into three
      platform-specific bits of the ELF symbol st_other field.
      
      The .localentry directive instructs the assembler to set those fields
      to encode a particular offset.  This is typically used by a function
      prologue sequence like this:
      
      func:
              addis r2, r12, (.TOC.-func)@ha
              addi r2, r2, (.TOC.-func)@l
              .localentry func, .-func
      
      Note that according to the ABI, when calling the global entry point,
      r12 must be set to point to the global entry point address itself; while
      when calling the local entry point, r2 must be set to point to the TOC
      base.  The two instructions between the global and local entry point in
      the above example translate the first requirement into the second.
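
A sketch of how the local entry offset could be recovered from st_other. The bit position (the top three bits of the byte) and the power-of-two decoding formula are not stated in this message; they are written here from my reading of the ELFv2 ABI and should be checked against the specification. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: the three-bit field occupies bits 7..5 of st_other.
constexpr unsigned LocalEntryShift = 5;
constexpr unsigned LocalEntryMask = 7u << LocalEntryShift;

// Decode the three-bit field into a byte offset from the global entry point.
// Field values 0 and 1 decode to offset 0; value V >= 2 decodes to 1 << V.
unsigned decodeLocalEntryOffset(std::uint8_t StOther) {
  const unsigned Val = (StOther & LocalEntryMask) >> LocalEntryShift;
  return ((1u << Val) >> 2) << 2;
}
```

In the prologue shown above, the two instructions between the global and local entry points occupy 8 bytes (PowerPC instructions are 4 bytes each), which under this assumed encoding corresponds to a field value of 3.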
      
      This patch implements support in the PowerPC MC streamers to emit the
      .localentry directive (both into assembler and ELF object output), as
      well as support in the assembler parser to parse that directive.
      
      In addition, there is another change required in MC fixup/relocation
      handling to properly deal with relocations targeting function symbols
      with two entry points: When the target function is known local, the MC
      layer would immediately handle the fixup by inserting the target
      address -- this is wrong, since the call may need to go to the local
      entry point instead.  The GNU assembler handles this case by *not*
      directly resolving fixups targeting functions with two entry points,
      but always emits the relocation and relies on the linker to handle
      this case correctly.  This patch changes LLVM MC to do the same (this
      is done via the processFixupValue routine).
      
      Similarly, there are cases where the assembler would normally emit a
      relocation, but "simplify" it to a relocation targeting a *section*
      instead of the actual symbol.  For the same reason as above, this
      may be wrong when the target symbol has two entry points.  The GNU
      assembler again handles this case by not performing this simplification
      in that case, but leaving the relocation targeting the full symbol,
      which is then resolved by the linker.  This patch changes LLVM MC
      to do the same (via the needsRelocateWithSymbol routine).
      NOTE: The method used in this patch is overly pessimistic, since the
      needsRelocateWithSymbol routine currently does not have access to the
      actual target symbol, and thus must always assume that it might have
      two entry points.  This will be improved upon by a follow-on patch
      that modifies common code to pass the target symbol when calling
      needsRelocateWithSymbol.
      
      Reviewed by Hal Finkel.
      
      llvm-svn: 213485
      bb68610d
    • Ulrich Weigand's avatar
      [PowerPC] ELFv2 MC support for .abiversion directive · 0daa5164
      Ulrich Weigand authored
      ELFv2 binaries are marked by a bit in the ELF header e_flags field.
      A new assembler directive .abiversion can be used to set that flag.
      This patch implements support in the PowerPC MC streamers to emit the
      .abiversion directive (both into assembler and ELF binary output),
      as well as support in the assembler parser to parse the .abiversion
      directive.
      
      Reviewed by Hal Finkel.
      
      llvm-svn: 213484
      0daa5164
    • Ulrich Weigand's avatar
      [PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4 · 24195972
      Ulrich Weigand authored
      When handling an incoming byval argument, we need to possibly write
      incoming registers to the stack in order to create an on-stack image
      of the parameter, so we can return its address to common code.
      
      This currently uses CreateFixedObject to access the parts of the
      parameter save area where the argument is (or needs to be) stored.
      However, sometimes we need to access multiple parts of that area,
      e.g. to write multiple registers.  The code currently uses a new
      CreateFixedObject call for each of these accesses, resulting in
      a patchwork of overlapping (fixed) stack objects.
      
      This doesn't really matter in the case of fixed objects, since
      any access to those turns into a fixed stackpointer + offset
      address anyway.  However, with the upcoming ELFv2 patches, we
      may actually need to place an incoming argument into our *own*
      stack frame instead of the caller's.  This means we need to use
      CreateStackObject instead, and we cannot have multiple overlapping
      instances of those.
      
      To make the rest of the argument handling code work equally in
      both situations, this patch refactors it to always use just a
      single call to CreateFixedObject, and access parts of that object
      as required using address arithmetic.  This way, we can in a future
      patch substitute CreateStackObject without further changes.
      
      No change to generated code intended.
      
      llvm-svn: 213483
      24195972
    • Ulrich Weigand's avatar
      [PowerPC] Fix FrameIndex handling in SelectAddressRegImm · 55a96650
      Ulrich Weigand authored
      The PPCTargetLowering::SelectAddressRegImm routine needs to handle
      FrameIndex nodes in a special manner, by translating them into a
      TargetFrameIndex node.  This was done in most cases, but seems to
      have been neglected in one path: when the input tree has an OR of
      the FrameIndex with an immediate.  This can happen if the FrameIndex
      can be proven to be sufficiently aligned that an OR of that immediate
      is equivalent to an ADD.
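
The equivalence relied on here can be checked with plain integers: when the base address has no set bits in common with the immediate, which holds whenever the base is aligned to a power of two larger than the immediate, OR and ADD give the same result. The helper names below are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// True if Addr is a multiple of Align, where Align is a power of two.
bool isAligned(std::uint64_t Addr, std::uint64_t Align) {
  return (Addr & (Align - 1)) == 0;
}

// True if OR-ing the immediate into the base equals adding it.
bool orEqualsAdd(std::uint64_t Base, std::uint64_t Imm) {
  return (Base | Imm) == (Base + Imm);
}
```

For example, a 16-byte-aligned base such as 0x1000 combined with the immediate 12 gives 0x100C either way, while the base 0x1008 does not: the OR yields 0x100C but the ADD yields 0x1014.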
      
      The missing handling of FrameIndex in that case caused the SelectionDAG
      instruction selection to miss opportunities to merge the OR back into
      the FrameIndex node, leading to superfluous addi/ori instructions in
      the final assembler output.
      
      llvm-svn: 213482
      55a96650