  1. Jan 31, 2018
    • Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations" · 7746899f
      Evgeniy Stepanov authored
      Miscompiles code. Testcase pending.
      
      This reverts commit r323869.
      
      llvm-svn: 323929
    • Fix formatting for r323876. NFC · 12ed95e3
      Diana Picus authored
      llvm-svn: 323878
    • [ARM GlobalISel] Modernize LegalizerInfo. NFCI · 1d4421f6
      Diana Picus authored
      Start using the new LegalizerInfo API introduced in r323681.
      
      Keep the old API for opcodes that need Lowering in some circumstances
      (G_FNEG and G_UREM/G_SREM).
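      
      For orientation, a minimal sketch of what rules look like in the new,
      declarative LegalizerInfo API from r323681. This is illustrative only,
      not the actual ARM rules; the opcodes and types are chosen for the example:
      
      // Illustrative only: a rule in the style of the new LegalizerInfo API.
      // The exact opcodes/types the ARM backend handles this way may differ.
      using namespace TargetOpcode;
      const LLT s32 = LLT::scalar(32);
      
      getActionDefinitionsBuilder({G_ADD, G_SUB, G_AND, G_OR, G_XOR})
          .legalFor({s32})
          .clampScalar(0, s32, s32);
      
      // G_FNEG and G_UREM/G_SREM keep the old API for now, as noted above.
      computeTables();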
      
      llvm-svn: 323876
    • [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations · 2e442a78
      Pablo Barrio authored
      Summary:
      Expressions of the form x < 0 ? 0 : x and x < -1 ? -1 : x can be lowered using bit operations instead of branching or conditional moves.
      
      In Thumb mode this results in a two-instruction sequence: a shift followed by a bic or an orr. In ARM/Thumb2 mode, where the flexible second operand is available, the shift can be folded into a single bic/orr instruction. In most cases this results in smaller code and possibly fewer branches, and in no case larger code than before.
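      
      As an illustration (not part of the original patch), the branchless
      rewrites this lowering relies on can be written directly in C++. The
      function names are made up for the example; a 32-bit int and an
      arithmetic right shift of negative values are assumed:
      
      #include <cstdint>
      
      // x < 0 ? 0 : x  -->  mask off x when it is negative (ARM: asr + bic)
      int32_t sat_to_zero(int32_t x) {
        int32_t mask = x >> 31;   // all ones if x < 0, else all zeros
        return x & ~mask;
      }
      
      // x < -1 ? -1 : x  -->  force x to -1 when it is negative (ARM: asr + orr)
      int32_t sat_to_minus_one(int32_t x) {
        int32_t mask = x >> 31;   // all ones if x < 0, else all zeros
        return x | mask;
      }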
      
      Patch by Marten Svanfeldt.
      
      Reviewers: fhahn, pbarrio
      
      Reviewed By: pbarrio
      
      Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D42574
      
      llvm-svn: 323869
    • [ARM] Armv8.2-A FP16 code generation (part 2/3) · 98d5359e
      Sjoerd Meijer authored
      Half-precision arguments and return values are passed as if they were an int or
      float for ARM. This results in truncates and bitcasts to/from i16 and f16
      values, which are legalized very early to stack stores/loads. When FullFP16 is
      enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
      inefficient.
      
      Differential Revision: https://reviews.llvm.org/D42580
      
      llvm-svn: 323861
    • [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR <-> GPR. · aea42087
      Roger Ferrer Ibanez authored
      In Thumb1, with the new ADDCARRY / SUBCARRY nodes, the scheduler may need to do
      CPSR <-> GPR copies, but not all Thumb1 targets implement them.
      
      The scheduler can attempt, before resorting to a copy, to clone the instructions,
      but it does not currently do that for nodes with input glue. In this patch we
      introduce a target hook to let the target decide whether a glued machine node is
      still eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS.
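      
      A rough sketch of the idea, assuming a hook along these lines; the hook
      name and signature here are hypothetical, not necessarily what the patch
      adds:
      
      // Hypothetical illustration: a target hook that tells the scheduler
      // whether a machine node with glue may still be duplicated (copied)
      // instead of forcing a CPSR <-> GPR register copy.
      bool ARMTargetLowering::canCopyGluedNodeDuringSchedule(SDNode *N) const {
        switch (N->getMachineOpcode()) {
        case ARM::tADCS:
        case ARM::tSBCS:
          return true;   // cloning these avoids the problematic copy on Thumb1
        default:
          return false;  // stay conservative for everything else
        }
      }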
      
      As a follow-up to this change we should actually implement the copies for the
      Thumb1 targets that do support them, and restrict the hook to the targets that
      really cannot do such a copy, since these clones are not ideal.
      
      This change fixes PR35836.
      
      Differential Revision: https://reviews.llvm.org/D42051
      
      llvm-svn: 323857
  2. Jan 26, 2018
    • [ARM] Accept a subset of Thumb GPR register class when emitting an SP-relative load instruction · d2cc6fd9
      Momchil Velikov authored
      
      The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR`
      register class. The function serves to emit a `tLDRspi` instruction and
      certainly any subset of the `tGPR` register class is a valid destination of the
      load.
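      
      A minimal sketch of the kind of check this implies, assuming the usual
      TargetRegisterClass::hasSubClassEq query; the surrounding code is
      abbreviated and is not the literal patch:
      
      // Illustrative only: accept tGPR or any register class contained in it
      // as the destination of the SP-relative tLDRspi load.
      void Thumb1InstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
                                                 MachineBasicBlock::iterator I,
                                                 unsigned DestReg, int FI,
                                                 const TargetRegisterClass *RC,
                                                 const TargetRegisterInfo *TRI) const {
        assert(ARM::tGPRRegClass.hasSubClassEq(RC) &&
               "Unexpected register class for a Thumb1 SP-relative load");
        // ... emit the tLDRspi instruction as before ...
      }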
      
      Differential revision: https://reviews.llvm.org/D42535
      
      llvm-svn: 323514
    • [ARM] Armv8.2-A FP16 code generation (part 1/3) · 011de9c0
      Sjoerd Meijer authored
      This is the groundwork for Armv8.2-A FP16 code generation.
      
      Clang passes and returns _Float16 values as floats, together with the required
      bitconverts and truncs etc. to implement correct AAPCS behaviour; see D42318.
      We will implement half-precision argument passing/returning lowering in the ARM
      backend soon, but for now this means that this:
      
      _Float16 sub(_Float16 a, _Float16 b) {
        return a + b;
      }
      
      gets lowered to this:
      
      define float @sub(float %a.coerce, float %b.coerce) {
      entry:
        %0 = bitcast float %a.coerce to i32
        %tmp.0.extract.trunc = trunc i32 %0 to i16
        %1 = bitcast i16 %tmp.0.extract.trunc to half
        <SNIP>
        %add = fadd half %1, %3
        <SNIP>
      }
      
      When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
      legalization for "free", i.e. nothing changes and everything works as before;
      f16 argument passing/returning is also handled.
      
      When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
      we need to patch up: f16 argument passing and returning, which involves minor
      tweaks to avoid unnecessary code generation for some bitcasts.
      
      As a "demonstrator" that this works for the different FP16, FullFP16, softfp
      modes, etc., I've added match rules to the VSUB instruction description showing
      that we can codegen this instruction from IR, but more importantly, also to
      some conversion instructions. These conversions were causing issues before in
      the FP16 and FullFP16 cases.
      
      I've also added match rules to the VLDRH and VSTRH descriptions, so that we can
      actually compile the entire half-precision sub code example above. This showed
      that these loads and stores had the wrong addressing mode specified: AddrMode5
      instead of AddrMode5FP16, which turned out not to be implemented at all, so that
      has also been added.
      
      This is the minimal patch that shows all the different moving parts. In patch
      2/3 I will add some efficient lowering of bitcasts, and in 3/3 I will add the
      remaining Armv8.2-A FP16 instruction descriptions.
      
      
      Thanks to Sam Parker and Oliver Stannard for their help and reviews!
      
      
      Differential Revision: https://reviews.llvm.org/D38315
      
      llvm-svn: 323512
  3. Jan 19, 2018
    • [ARM] Fix perf regression in compare optimization. · dbc724f7
      Joel Galenson authored
      Fix a performance regression caused by r322737.
      
      While trying to make it easier to replace compares with existing adds and
      subtracts, I accidentally stopped it from doing so in some cases.  This should
      fix that.  I'm also fixing another potential bug in that commit.
      
      Differential Revision: https://reviews.llvm.org/D42263
      
      llvm-svn: 322972
    • Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) · 1e68724d
      Daniel Neilson authored
      Summary:
       This is a resurrection of work first proposed and discussed in Aug 2015:
         http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
      and initially landed (but then backed out) in Nov 2015:
         http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
      
       The @llvm.memcpy/memmove/memset intrinsics currently have an explicit alignment
      argument, which is required to be a constant integer. It represents the alignment
      of the dest (and source), and so must be the minimum of the actual alignments of
      the two.
      
       This change is the first in a series that allows source and dest to each
      have their own alignments by using the alignment attribute on their arguments.
      
       In this change we:
      1) Remove the alignment argument.
      2) Add alignment attributes to the source & dest arguments. We, temporarily,
         require that the alignments for source & dest be equal.
      
       For example, code which used to read:
        call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
      will now read
        call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
      
       Downstream users may have to update their lit tests that check for
      @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
      may help with updating the majority of your tests, but it does not catch all possible
      patterns, so some manual checking and updating will be required.
      
      s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
      s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g
      
       The remaining changes in the series will:
      Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
         source and dest alignments (see the sketch after this list of steps).
      Step 3) Update Clang to use the new IRBuilder API.
      Step 4) Update Polly to use the new IRBuilder API.
      Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
              and those that use MemIntrinsicInst::[get|set]Alignment() to use
              getDestAlignment() and getSourceAlignment() instead.
      Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
              MemIntrinsicInst::[get|set]Alignment() methods.
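      
      For Step 2, a hedged sketch of what a call through the expanded IRBuilder
      API could look like; the parameter names, order, and types here are an
      assumption for illustration, not the final interface:
      
      // Illustrative only: memcpy creation once source and dest alignments
      // are independent. Signature details are assumed, not final.
      //   Old, single-alignment form:
      //     Builder.CreateMemCpy(Dst, Src, Size, /*Align=*/4);
      //   Possible new form:
      Builder.CreateMemCpy(Dst, /*DstAlign=*/4, Src, /*SrcAlign=*/8, Size);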
      
      Reviewers: pete, hfinkel, lhames, reames, bollu
      
      Reviewed By: reames
      
      Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41675
      
      llvm-svn: 322965
  4. Jan 18, 2018
    • [CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64 · 1aa9061c
      Reid Kleckner authored
      Every known PE COFF target emits /EXPORT: linker flags into a .drectve
      section. The AsmPrinter should handle this.
      
      While we're at it, use global_values() and emit each export flag with
      its own .ascii directive. This should make the .s file output more
      readable.
      
      llvm-svn: 322788