  1. Dec 12, 2014
    • Emit Tag_ABI_FP_16bit_format build attribute. · 1a53996c
      Charlie Turner authored
      The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet
      supported, there is no user switch to change this behaviour. This build
      attribute should capture the default behaviour of the compiler, which is to
      expose the IEEE 754 version of __fp16.
      
      When -mfp16-format is supported, that will be the way to control the value of
      this build attribute.
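
      As a minimal sketch of what this records (the triple and the exact
      directive shown are illustrative, not taken from the patch): any module
      using the half type, compiled for an ARM target, should now carry the
      attribute, e.g.

        ; Compiled with, say, llc -mtriple=armv7-linux-gnueabi, the assembly
        ; output is expected to contain .eabi_attribute 38, 1
        ; (Tag_ABI_FP_16bit_format, value 1 = IEEE 754 half precision).
        define half @half_identity(half %x) {
        entry:
          ret half %x
        }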
      
      Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
      llvm-svn: 224115
  2. Dec 11, 2014
    • ARM: correctly expand LDR-lit based globals. · 2ac7e4b3
      Tim Northover authored
      Quite a major error here: the expansions for the Pseudos with and without
      folded load were mixed up. Fortunately it only affects ARM-mode, when not using
      movw/movt, on Darwin. I'm guessing no-one actually uses that combination.
      
      llvm-svn: 223986
  3. Dec 10, 2014
    • [ARM] Combine base-updating/post-incrementing vector load/stores. · 7efbac74
      Ahmed Bougacha authored
      We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
      when the base pointer is incremented after the load/store.
      
      We can do the same thing for generic load/stores.
      
      Note that we can only combine the first load/store+adds pair in
      a sequence (as might be generated for a v16f32 load for instance),
      because other combines turn the base pointer addition chain (each
      computing the address of the next load, from the address of the last
      load) into independent additions (common base pointer + this load's
      offset).
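
      A sketch of the generic pattern this now catches (hypothetical function,
      written in 2014-era typed-pointer IR syntax):

        ; The first load plus the pointer increment feeding the second load is
        ; expected to become a single post-incrementing vld1 (VLD1_UPD) with
        ; write-back, instead of a vld1 followed by a separate add.
        define <4 x i32> @load_pair(<4 x i32>* %base) {
          %p1 = getelementptr <4 x i32>* %base, i32 1
          %v0 = load <4 x i32>* %base, align 8
          %v1 = load <4 x i32>* %p1, align 8
          %sum = add <4 x i32> %v0, %v1
          ret <4 x i32> %sum
        }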
      
      Differential Revision: http://reviews.llvm.org/D6585
      
      llvm-svn: 223862
  4. Dec 09, 2014
  5. Dec 05, 2014
    • Add missing FP build attribute tests. · c96e95c1
      Charlie Turner authored
      The test file test/CodeGen/ARM/build-attributes.ll was missing several
      floating-point build attribute tests. The intention of this commit is that for
      each CPU / architecture currently tested, there are now tests that make sure
      the following attributes are sufficiently checked,
      
        * Tag_ABI_FP_rounding
        * Tag_ABI_FP_denormal
        * Tag_ABI_FP_exceptions
        * Tag_ABI_FP_user_exceptions
        * Tag_ABI_FP_number_model
      
      Also in this commit, the -unsafe-fp-math flag has been augmented with the full
      suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is,
      `-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim
      -enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast'
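
      As a sketch of how such a test is driven (the triple, CPU and CHECK
      lines here are illustrative, not copied from the test file):

        ; RUN: llc < %s -mtriple=armv7-linux-gnueabi -mcpu=cortex-a9 \
        ; RUN:   -enable-unsafe-fp-math -disable-fp-elim -enable-no-infs-fp-math \
        ; RUN:   -enable-no-nans-fp-math -fp-contract=fast | FileCheck %s
        ; Tag numbers from the ARM ABI addenda: FP_rounding = 19, FP_denormal = 20,
        ; FP_exceptions = 21, FP_user_exceptions = 22, FP_number_model = 23.
        ; CHECK-DAG: .eabi_attribute 20, {{[0-9]+}}
        ; CHECK-DAG: .eabi_attribute 23, {{[0-9]+}}
        define float @f(float %a, float %b) {
          %sum = fadd float %a, %b
          ret float %sum
        }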
      
      Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
      llvm-svn: 223454
  6. Dec 04, 2014
    • Fix thumbv4t indirect calls · 300d8ffd
      Jon Roelofs authored
      So there are a couple of issues with indirect calls on thumbv4t. First, the most
      'obvious' instruction, 'blx', isn't available until v5t. And secondly, the
      next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code
      because the saved off pc has its thumb bit cleared, so when the callee returns
      we end up in ARM mode.... yuck.
      
      The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.
      
      We could cut down on code size by sharing the landing pads between call sites
      that are close enough, but for the moment let's do correctness first and look at
      performance later.
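
      As a sketch of what this looks like (the label name below is made up;
      the real pad symbol is whatever the compiler generates):

        ; An indirect call such as this, built for a thumbv4t triple (no blx):
        define void @call_it(void ()* %fn) {
          call void %fn()
          ret void
        }
        ; is expected to lower to a 'bl' to a nearby pad that does the 'bx':
        ;         bl      .Lcall_via_r0    @ lr = return address, Thumb bit set
        ;         ...
        ; .Lcall_via_r0:
        ;         bx      r0               @ the actual indirect branch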
      
      
      Patch by: Iain Sandoe
      
      http://reviews.llvm.org/D6519
      
      llvm-svn: 223380
  7. Dec 03, 2014
    • Emit ABI_FP_rounding attribute. · f02c9248
      Charlie Turner authored
      LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When
      the user has specified this option, the Tag_ABI_FP_rounding attribute should be
      emitted with value 1. This option currently does not appear to disable
      transformations and optimizations that assume default floating point rounding
      behavior, AFAICT, but the intention should be recorded in the build attributes,
      regardless of what the compiler actually does with the intention.
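
      A sketch of the intended effect (triple and function are illustrative;
      Tag_ABI_FP_rounding is tag 19 in the ARM ABI addenda):

        ; RUN: llc < %s -mtriple=armv7-linux-gnueabi \
        ; RUN:     -enable-sign-dependent-rounding-fp-math | FileCheck %s
        ; CHECK: .eabi_attribute 19, 1
        define double @g(double %a, double %b) {
          %p = fmul double %a, %b
          ret double %p
        }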
      
      Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
      llvm-svn: 223218
    • Add tests for default value of Tag_ABI_FP_rounding. · 1620a69f
      Charlie Turner authored
      Change-Id: I051866d073fc6ce87ce3e693a3762da6d81f4393
      llvm-svn: 223217
  8. Dec 02, 2014
    • Emit Tag_ABI_FP_denormal correctly in fast-math mode. · 15f91c52
      Charlie Turner authored
      The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
      relevance to this patch is that input denormals are flushed to zero. The way in
      which they're flushed to zero depends on the architecture,
      
        * For VFPv2, it is implementation defined as to whether the sign of zero is
          preserved.
        * For VFPv3 and above, the sign of zero is always preserved when a denormal
          is flushed to zero.
      
      When FP support has been disabled, the strategy taken by this patch is to
      assume the software support will mirror the behaviour of the hardware support
      for the target *if it existed*. That is, for architectures which can only have
      VFPv2, it is assumed the software will flush to positive zero. For later
      architectures it is assumed the software will flush to zero preserving sign.
      
      Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
      llvm-svn: 223110
  9. Dec 01, 2014
  10. Nov 27, 2014
    • Stop uppercasing build attribute data. · 8d433691
      Charlie Turner authored
      The string data for string-valued build attributes were being unconditionally
      uppercased. There is no mention in the ARM ABI addenda about case conventions,
      so it's technically implementation defined as to whether the data are
      capitalised in some way or not. However, there are good reasons not to
      capitalise the data.
      
        * It's less work.
        * Some vendors may legitimately have case-sensitive checks for these
          attributes which would fail on LLVM generated object files.
        * There could be locale issues with uppercasing.
      
      The original reasons for uppercasing appear to have stemmed from an
      old CodeSourcery toolchain behaviour, see
      
      http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133
      
      This patch makes the emitted object file no longer capitalise string data;
      it is encoded as seen in the assembly source.
      
      Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
      llvm-svn: 222882
  11. Nov 17, 2014
    • Fix ARM triple parsing · 609bf923
      Renato Golin authored
      The triple parser should only accept existing architecture names
      when the triple starts with armv, armebv, thumbv or thumbebv.
      
      Patch by Gabor Ballabas.
      
      llvm-svn: 222129
    • [Thumb1] Re-write emitThumbRegPlusImmediate · 970b0d57
      Oliver Stannard authored
      This was motivated by a bug which caused code like this to be
      miscompiled:
        declare void @take_ptr(i8*)
        define void @test() {
          %addr1.32 = alloca i8
          %addr2.32 = alloca i32, i32 1028
          call void @take_ptr(i8* %addr1.32)
          ret void
        }
      
      This was emitting the following assembly to get the value of %addr1.32:
        add r0, sp, #1020
        add r0, r0, #8
      However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
      could not be assembled. The generated object file contained this,
      resulting in r0 holding SP+8 rather than SP+1028:
        add r0, sp, #1020
        add r0, sp, #8
      
      This function looked like it could have caused miscompilations for
      other combinations of registers and offsets (though I don't think it is
      currently called with these), and the heuristic it used did not match
      the emitted code in all cases.
      
      llvm-svn: 222125
    • Fix optimisations of SELECT_CC which assumed result is boolean · d29db9b9
      Oliver Stannard authored
      Some optimisations in DAGCombiner cause miscompilations for targets that use
      TargetLowering::UndefinedBooleanContent, because they assume that the results
      of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
      XORed. These optimisations are only valid for targets that use
      ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.
      
      This is a follow-up to D6210/r221693.
      
      llvm-svn: 222123
  12. Nov 14, 2014
    • ARM: refactor .cfi_def_cfa_offset emission. · 603d3165
      Tim Northover authored
      We used to track quite a few "adjusted" offsets through the FrameLowering code
      to account for changes in the prologue instructions as we went and allow the
      emission of correct CFA annotations. However, we were missing a couple of cases
      and the code was almost impenetrable.
      
      It's easier to just add any stack-adjusting instruction to a list and emit them
      together.
      
      llvm-svn: 222057
    • ARM: correctly calculate the offset of FP in its push. · 9d2d218f
      Tim Northover authored
      When we folded the DPR alignment gap into a push, we weren't noting the extra
      distance from the beginning of the push to the FP, and so FP ended up pointing
      at an incorrect offset.
      
      The .cfi_def_cfa_offset directives are still wrong in this case, but I think
      that can be improved by refactoring.
      
      llvm-svn: 222056
    • ARM: simplify test. · a0691c89
      Tim Northover authored
      The test's DWARF stubs were there just to trigger the emission of .cfi
      directives. Fortunately, the NetBSD ABI already demands proper DWARF unwind
      info, so it's easier to just use that triple.
      
      llvm-svn: 222055
  13. Nov 13, 2014
  14. Nov 11, 2014
    • Add Forward Control-Flow Integrity. · eb7a303d
      Tom Roeder authored
      This commit adds a new pass that can inject checks before indirect calls to
      make sure that these calls target known locations. It supports three types of
      checks and, at compile time, it can take the name of a custom function to call
      when an indirect call check fails. The default failure function ignores the
      error and continues.
      
      This pass incidentally moves the function JumpInstrTables::transformType from
      private to public and makes it static (with a new argument that specifies the
      table type to use); this is so that the CFI code can transform function types
      at call sites to determine which jump-instruction table to use for the check at
      that site.
      
      Also, this removes support for jumptables in ARM, pending further performance
      analysis and discussion.
      
      Review: http://reviews.llvm.org/D4167
      llvm-svn: 221708
    • LLVM incorrectly folds xor into select · 8c2c67e6
      Oliver Stannard authored
      LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
      (set_cc !cc x y), which is only correct when the xor has type i1.
      Instead, we should check that the constant operand to the xor is all
      ones.
      
      llvm-svn: 221693
  15. Nov 06, 2014
  16. Nov 05, 2014
    • ARM: try to add extra CS-register whenever stack alignment >= 8. · dc0d9e46
      Tim Northover authored
      We currently try to push an even number of registers to preserve 8-byte
      alignment during a function's prologue, but only when the stack alignment is
      precisely 8. Many of the reasons for doing it apply also when that alignment is > 8
      (the extra store is often free, and can save another stack adjustment, though
      less frequently for 16-byte stack alignment).
      
      llvm-svn: 221321
    • ARM/Dwarf: correctly align stack before callee-saved VPRs · 228c943f
      Tim Northover authored
      We were making an attempt to do this by adding an extra callee-saved GPR (so
      that there was an even number in the list), but when that failed we went ahead
      and pushed anyway.
      
      This had a couple of potential issues:
        + The .cfi directives we emit misplaced dN because they were based on
          PrologEpilogInserter's calculation.
        + Unaligned stores can be less efficient.
        + Unaligned stores can actually fault (likely only an issue in niche cases,
          but possible).
      
      This adds a final explicit stack adjustment if all other options fail, so that
      the actual locations of the registers match up with where they should be.
      
      llvm-svn: 221320
  17. Nov 03, 2014
    • Remove the cortex-a9-mp CPU. · 1d8cc909
      Charlie Turner authored
      This CPU definition is redundant. The Cortex-A9 is defined as
      supporting multiprocessing extensions. Remove its definition and
      update appropriate tests.
      
      LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
      difference between the two CPU definitions in ARM.td is that
      cortex-a9-mp contains the feature FeatureMP for multiprocessing
      extensions.
      
      This is redundant since the Cortex-A9 is defined as having
      multiprocessing extensions in the TRMs. armcc also defines the
      Cortex-A9 as having multiprocessing extensions by default.
      
      Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
      llvm-svn: 221166
  18. Oct 31, 2014
    • [CodeGenPrepare] Move extractelement close to store if they can be combined. · c32615df
      Quentin Colombet authored
      This patch adds an optimization in CodeGenPrepare to move an extractelement
      right before a store when the target can combine them.
      The optimization may promote any scalar operations to vector operations along
      the way to make that possible.
      
      
      ** Context **
      
      Some targets use different register files for both vector and scalar operations.
      This means that transitioning from one domain to another may incur a copy from one
      register file to another. These copies are not coalescable and may be expensive.
      For example, according to the scheduling model, on cortex-A8 a vector to GPR
      move is 20 cycles.
      
      
      ** Motivating Example **
      
      Let us consider an example:
      define void @foo(<2 x i32>* %addr1, i32* %dest) {
       %in1 = load <2 x i32>* %addr1, align 8
       %extract = extractelement <2 x i32> %in1, i32 1
       %out = or i32 %extract, 1
       store i32 %out, i32* %dest, align 4
       ret void
      }
      
      As it is, this IR generates the following assembly on armv7:
        vldr    d16, [r0]          @ vector load
        vmov.32 r0, d16[1]         @ cross-register-file copy: 20 cycles
        orr     r0, r0, #1         @ scalar bitwise or
        str     r0, [r1]           @ scalar store
        bx      lr
      
      Whereas we could generate much faster code:
        vldr     d16, [r0]          @ vector load
        vorr.i32 d16, #0x1          @ vector bitwise or
        vst1.32  {d16[1]}, [r1:32]  @ vector extract + store
        bx       lr
      
      Half of the computation made in the vector is useless, but this allows us to
      get rid of the expensive cross-register-file copy.
      
      
      ** Proposed Solution **
      
      To avoid this cross-register-copy penalty, we promote the scalar operations to
      vector operations. The penalty will be removed if we manage to promote the whole
      chain of computation in the vector domain.
      Currently, we do that only when the chain of computation ends with a store and the
      target is able to combine an extract with a store.
      
      Stores are the most likely candidates, because other instructions produce values
      that would need to be promoted and so extracted at some point[1]. Moreover,
      it is customary for targets to feature stores that perform a vector extract (see
      AArch64 and X86 for instance).
      
      The proposed implementation relies on the TargetTransformInfo to decide whether
      or not it is beneficial to promote a chain of computation in the vector domain.
      Unfortunately, this interface is rather inaccurate for this level of detail, and
      although this optimization may be beneficial for X86 and AArch64, the inaccuracy
      will lead to the optimization being too aggressive.
      Basically in TargetTransformInfo, everything that is legal has a cost of 1,
      whereas, even if a vector type is legal, usually a vector operation is slightly
      more expensive than its scalar counterpart. That will lead to too many
      promotions that may not be counterbalanced by the saving of the
      cross-register-file copy. For instance, on AArch64 this penalty is just 4
      cycles.
      
      For now, the optimization is only enabled for ARM prior to v8, since those
      processors have a larger penalty on cross-register-file copies, and the scope is
      limited to basic blocks. Because of these two factors, we limit the effects of
      the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
      frequency and everything on top of that.
      
      [1] We can imagine targets that can combine an extractelement with instructions
      other than just stores. If we want to go in that direction, the current
      interfaces must be augmented and, moreover, I think this becomes a global isel
      problem.
      
      Differential Revision: http://reviews.llvm.org/D5921
      
      <rdar://problem/14170854>
      
      llvm-svn: 220978
  19. Oct 30, 2014
  20. Oct 27, 2014
    • [ARM] Select VMAXNM and VMINNM regardless of operand order · 79efe41a
      Oliver Stannard authored
      Currently, the ARM backend will select the VMAXNM and VMINNM for these C
      expressions:
        (a < b) ? a : b
        (a > b) ? a : b
      but not these expressions:
        (a > b) ? b : a
        (a < b) ? b : a
      
      This patch allows all of these expressions to be matched.
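
      For example, with this change IR of the following shape (hypothetical
      function; assumes an FP-ARMv8-capable target and the fast-math
      conditions under which these instructions may be picked) can also
      select vminnm:

        ; (a > b) ? b : a, i.e. a floating-point minimum with the operands in
        ; the previously unmatched order.
        define float @fmin_swapped(float %a, float %b) {
          %cmp = fcmp ogt float %a, %b
          %min = select i1 %cmp, float %b, float %a
          ret float %min
        }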
      
      llvm-svn: 220671
  21. Oct 23, 2014
    • Do not emit intermediate register for zero FP immediate · 6fb9c2ea
      Renato Golin authored
      This updates the check for a double-precision zero floating-point constant to
      allow use of an instruction with an immediate value rather than a temporary
      register. Currently "a == 0.0", where "a" is of type "double", generates:
      
      vmov.i32        d16, #0x0
      vcmpe.f64       d0, d16
      
      With this change it becomes:
      
      vcmpe.f64        d0, #0
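
      A minimal IR trigger for reference (hypothetical function name):

        ; Comparing a double against +0.0; the zero no longer needs to be
        ; materialised into a temporary register before the compare.
        define i1 @is_zero(double %a) {
          %cmp = fcmp oeq double %a, 0.0
          ret i1 %cmp
        }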
      
      Patch by Sergey Dmitrouk.
      
      llvm-svn: 220486
    • [ARM, stack protector] If supported, use armv7 instructions. · 2ee0e9e6
      Akira Hatanaka authored
      This commit enables using movt/movw to load the stack guard address:
      
      movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
      movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
      ldr r0, [pc, r0]
      
      Previously a pc-relative load was emitted:
      
      ldr r0, LCPI0_0
      ldr r0, [pc, r0]
      
      rdar://problem/18740489
      
      llvm-svn: 220470
  22. Oct 20, 2014
    • ARM: rework Thumb1 frame index rewriting · 23075cce
      Tim Northover authored
      The previous code had a few problems, motivating the choices here.
      
      1. It could create instructions clobbering CPSR, but the incoming MachineInstr
         didn't reflect this. A potential source of corruption. This is why the patch
         has a new PseudoInst for use before lowering.
      2. Similarly, there was some code to handle the incoming instruction not being
         ARMCC::AL, but this would have caused massive problems if it was actually
         invoked when a complex offset needing more than one instruction was requested.
      3. It wasn't designed to handle unaligned pointers (or offsets). These should
         probably be minimised anyway, but the code needs to deal with them properly
         regardless.
      4. It had some rather dubious ad-hoc code to avoid calling
         emitThumbRegPlusImmediate, a function which should be designed to do precisely
         this job.
      
      We seem to cover the common cases correctly now, and hopefully can enhance
      emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
      future.
      
      llvm-svn: 220236
    • [ARM] Do not select SMULW[BT] or SMLAW[BT] · e8f63a54
      Oliver Stannard authored
      The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
      are incorrect. These instructions multiply a 32-bit and a 16-bit value
      (both signed) and return the top 32 bits of the 48-bit result. This
      preserves the 16 bits of overflow, whereas the patterns they currently
      match truncate the result to 16 bits then sign extend.
      
      To select these instructions, we would need to match an ISD::SMUL_LOHI,
      a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
      in an instruction pattern as it defines multiple values, so this would
      have to be done in C++. I have raised
      http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
      selection of these instructions.
      
      This fixes http://llvm.org/bugs/show_bug.cgi?id=19396
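
      For reference, a sketch in IR of what SMULWB actually computes (names
      are hypothetical); a correct pattern would need to match something of
      this shape rather than a truncate-then-sign-extend of the result:

        ; 32-bit x 16-bit signed multiply, keeping the top 32 bits of the
        ; 48-bit product.
        define i32 @smulwb_semantics(i32 %a, i16 %b) {
          %a.64 = sext i32 %a to i64
          %b.64 = sext i16 %b to i64
          %prod = mul i64 %a.64, %b.64      ; full product, sign-extended
          %hi   = ashr i64 %prod, 16        ; drop the low 16 bits
          %res  = trunc i64 %hi to i32      ; top 32 bits of the 48-bit result
          ret i32 %res
        }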
      
      llvm-svn: 220196
  23. Oct 15, 2014
    • ARM: remove ARM/Thumb distinction for preferred alignment. · cf6ce0c8
      Tim Northover authored
      Thumb1 has legitimate reasons for preferring 32-bit alignment of types
      i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
      a multiple of 4. However, this is a trade-off between code size and RAM usage;
      the DataLayout string is not the best place to represent it even if desired.
      
      So this patch removes the extra Thumb requirements, hopefully making ARM and
      Thumb completely compatible in this respect.
      
      llvm-svn: 219734
    • ARM: allow misaligned local variables in Thumb1 mode. · 9a4c043d
      Tim Northover authored
      There's no hard requirement on LLVM to align local variables to 32 bits, so the
      Thumb1 frame handling needs to be able to deal with variables that are only
      naturally aligned without falling over.
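
      A sketch of the kind of input this has to cope with (hypothetical
      function; the callee is just there to keep the alloca alive):

        declare void @use_byte(i8*)
        ; An i8 local with only its natural 1-byte alignment; Thumb1 frame
        ; lowering must handle its frame index without assuming 4-byte
        ; alignment.
        define void @byte_local() {
          %b = alloca i8, align 1
          call void @use_byte(i8* %b)
          ret void
        }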
      
      llvm-svn: 219733
  24. Oct 14, 2014
    • ARM: set preferred aggregate alignment to 32 universally. · aa09ac6e
      Tim Northover authored
      Before, ARM and Thumb mode code had different preferred alignments, which could
      lead to some rather unexpected results. There's justification for reducing it
      from the default 64-bits (wasted space), but I don't think there is for going
      below 32-bits.
      
      There's no actual ABI change here, just to reassure people.
      
      llvm-svn: 219719
  25. Oct 13, 2014
  26. Oct 08, 2014