  1. Feb 27, 2015
    • [opaque pointer type] Add textual IR support for explicit type parameter to load instruction · a79ac14f
      David Blaikie authored
      Essentially the same as the GEP change in r230786.
      
      A similar migration script can be used to update test cases, though a few more
      test case improvements/changes were required this time around (r229269-r229278):
      
      import sys
      import re
      
      # Match a load (with optional "atomic"/"volatile" markers) and its pointee
      # type; the replacement repeats that type as an explicit first argument,
      # turning "load <ty>* <op>" into "load <ty>, <ty>* <op>".
      pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
      
      for line in sys.stdin:
        sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
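      
      For reference, the rewrite turns a typed-pointer load such as
          %v = load i32* %ptr
      into one that repeats the pointee type as an explicit first argument:
          %v = load i32, i32* %ptr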
      
      Reviewers: rafael, dexonsmith, grosser
      
      Differential Revision: http://reviews.llvm.org/D7649
      
      llvm-svn: 230794
    • [opaque pointer type] Add textual IR support for explicit type parameter to... · 79e6c749
      David Blaikie authored
      [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
      
      One of several parallel first steps to remove the target type of pointers,
      replacing them with a single opaque pointer type.
      
      This adds an explicit type parameter to the gep instruction so that when the
      first parameter becomes an opaque pointer type, the type to gep through is
      still available to the instructions.
      
      * This doesn't modify gep operators, only instructions (operators will be
        handled separately)
      
      * Textual IR changes only. Bitcode (including upgrade) and changing the
        in-memory representation will be in separate changes.
      
      * geps of vectors are transformed as:
          getelementptr <4 x float*> %x, ...
        ->getelementptr float, <4 x float*> %x, ...
        Then, once the opaque pointer type is introduced, this will ultimately look
        like:
          getelementptr float, <4 x ptr> %x
        with the unambiguous interpretation that it is a vector of pointers to float.
      
      * address spaces remain on the pointer, not the type:
          getelementptr float addrspace(1)* %x
        ->getelementptr float, float addrspace(1)* %x
        Then, eventually:
          getelementptr float, ptr addrspace(1) %x
      
      Importantly, the massive amount of test case churn has been automated by the
      same crappy Python code. I had to manually update a few test cases that
      wouldn't fit the script's model (r228970, r229196, r229197, r229198). The
      Python script just massages stdin and writes the result to stdout; I
      then wrapped that in a shell script to handle replacing files, then
      used the usual find+xargs to migrate all the files.
      
      update.py:
      import sys
      import re
      
      # Two regexes, one for "getelementptr inbounds" and one for plain
      # "getelementptr". Group 1 is everything up to and including the gep
      # keyword(s); group 2 is the original pointer-typed operand plus the rest
      # of the line; groups 3-6 pick apart an optional "<N x " vector wrapper,
      # the pointee type, an optional addrspace, and the ">" that marks a
      # vector-of-pointers operand.
      ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
      normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
      
      def conv(match, line):
        if not match:
          return line
        # Rebuild as: prefix + explicit pointee type + ", " + original operand.
        line = match.groups()[0]
        if len(match.groups()[5]) == 0:
          # No trailing '>': any '<N x ' prefix is part of the pointee type
          # (a pointer to a vector), so keep it in the explicit type.
          line += match.groups()[2]
        # For a vector of pointers ('...*>'), the explicit type is just the
        # element type, per the vector-gep transformation described above.
        line += match.groups()[3]
        line += ", "
        line += match.groups()[1]
        line += "\n"
        return line
      
      for line in sys.stdin:
        # Only rewrite the first gep on the line, and skip constant-expression
        # geps ("getelementptr (" / "getelementptr inbounds ("), since gep
        # operators are handled separately from instructions.
        if line.find("getelementptr ") == line.find("getelementptr inbounds"):
          if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
            line = conv(re.match(ibrep, line), line)
        elif line.find("getelementptr ") != line.find("getelementptr ("):
          line = conv(re.match(normrep, line), line)
        sys.stdout.write(line)
      
      apply.sh:
      for name in "$@"
      do
        # Rewrite each file via a temp file; if update.py fails (e.g. on the
        # unicode issues noted below), the && skips the mv and the rm cleans up.
        python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
        rm -f "$name.tmp"
      done
      
      The actual commands (note the quoted -name patterns, so the shell doesn't
      expand the globs before find sees them):
      From llvm/src:
      find test/ -name '*.ll' | xargs ./apply.sh
      From llvm/src/tools/clang:
      find test/ -name '*.mm' -o -name '*.m' -o -name '*.cpp' -o -name '*.c' | xargs -I '{}' ../../apply.sh "{}"
      From llvm/src/tools/polly:
      find test/ -name '*.ll' | xargs ./apply.sh
      
      After that, run check-all (with llvm, clang, clang-tools-extra, lld,
      compiler-rt, and polly all checked out).
      
      The extra 'rm' in the apply.sh script is due to a few files in clang's test
      suite using interesting unicode stuff that my python script was throwing
      exceptions on. None of those files needed to be migrated, so it seemed
      sufficient to ignore those cases.
      
      Reviewers: rafael, dexonsmith, grosser
      
      Differential Revision: http://reviews.llvm.org/D7636
      
      llvm-svn: 230786
    • Change the fast-isel-abort option from bool to int to enable "levels" · 945a660c
      Mehdi Amini authored
      Summary:
      Currently fast-isel-abort will only abort for regular instructions,
      and just warn for function calls, terminators, and function arguments.
      There is already fast-isel-abort-args but nothing for calls and
      terminators.
      
      This change turns the fast-isel-abort option into an integer option,
      so that multiple levels of strictness can be defined.
      This will help avoid surprises when the "abort" option does not
      actually abort, and makes it possible to write tests that verify
      that no intrinsics are forgotten by fast-isel.
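      
      For example, such a test might invoke llc along these lines (an
      illustrative sketch; the exact meaning of each level is defined by this
      patch):
      
        llc -fast-isel -fast-isel-abort=1 ...  # abort on unselectable instructions
        llc -fast-isel -fast-isel-abort=3 ...  # also abort for args, calls, terminators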
      
      Reviewers: resistor, echristo
      
      Subscribers: jfb, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D7941
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 230775
    • As with NetBSD, Bitrig/ARM uses the Itanium ABI. · a78995c0
      Renato Golin authored
      Patch by Patrick Wildt.
      
      llvm-svn: 230762
    • Pass correct -mtriple for krait-cpu-div-attribute.ll · 1df91808
      Petar Jovanovic authored
      Not passing -mtriple for one of the tests caused a failure on the
      MIPS buildbot. The issue was introduced by r230651.
      
      Differential Revision: http://reviews.llvm.org/D7938
      
      llvm-svn: 230756
  2. Feb 26, 2015
    • Use ".arch_extension" ARM directive to support hwdiv on krait · 28a3b86b
      Sumanth Gundapaneni authored
      In the case of the "krait" CPU, the asm printer doesn't emit any ".cpu"
      directive, so the feature bits are not computed. This patch lets the asm
      printer emit a ".cpu cortex-a9" directive for krait, with the hwdiv
      feature enabled through ".arch_extension". In short, krait is treated
      as "cortex-a9" with hwdiv. We cannot emit ".cpu krait" since that CPU
      is not supported by GNU GAS yet.
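      
      The emitted directives thus look along these lines (a sketch; "idiv" is
      the GAS name assumed here for the hwdiv extension):
      
        .cpu cortex-a9
        .arch_extension idiv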
      
      llvm-svn: 230651
  3. Feb 25, 2015
    • Improve handling of stack accesses in Thumb-1 · b9887ef3
      Renato Golin authored
      Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-based LDR,
      STR, and ADD only allow offsets that are a multiple of 4. Make some changes
      to better make use of these instructions, as sketched after the list:
      
      * Use word loads for anyext byte and halfword loads from the stack.
      * Enforce 4-byte alignment on objects accessed in this way, to ensure that
        the offset is valid.
      * Do the same for objects whose frame index is used, in order to avoid having
        to use more than one ADD to generate the frame index.
      * Correct how many bits of offset we think AddrModeT1_s has.
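      
      A sketch of the first point (illustrative, not actual codegen output):
      Thumb-1 has no SP-relative LDRB/LDRH, so an anyext i8 load from a
      4-byte-aligned stack slot can use a word load instead:
      
        ldr r0, [sp, #4]    @ high bits are don't-care for an anyext load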
      
      Patch by John Brawn.
      
      llvm-svn: 230496
  4. Feb 20, 2015
    • [ARM] Re-re-apply VLD1/VST1 base-update combine. · db141ac3
      Ahmed Bougacha authored
      This re-applies r223862, r224198, r224203, and r224754, which were
      reverted in r228129 because they exposed Clang misalignment problems
      when self-hosting.
      
      The combine caused the crashes because we turned ISD::LOAD/STORE nodes
      into ARMISD::VLD1/VST1_UPD nodes.  When selecting addressing modes, we
      were very lax for the former, and only emitted the alignment operand
      (as in "[r1:128]") when it was larger than the standard alignment of
      the memory type.
      
      However, for ARMISD nodes, we just used the MMO alignment, no matter
      what.  In our case, we turned ISD nodes to ARMISD nodes, and this
      caused the alignment operands to start being emitted.
      
      And that's how we exposed alignment problems that were ignored before
      (but I believe they would have been caught with SCTLR.A==1?).
      
      To fix this, we can just mirror the hack done for ISD nodes:  only
      take into account the MMO alignment when the access is overaligned.
      
      Original commit message:
      We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
      when the base pointer is incremented after the load/store.
      
      We can do the same thing for generic load/stores.
      
      Note that we can only combine the first load/store+adds pair in
      a sequence (as might be generated for a v16f32 load for instance),
      because other combines turn the base pointer addition chain (each
      computing the address of the next load, from the address of the last
      load) into independent additions (common base pointer + this load's
      offset).
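      
      As a sketch of the combine (illustrative assembly, not actual output), a
      load followed by a base-pointer increment
      
        vld1.32 {d0, d1}, [r0]
        add     r0, r0, #16
      
      becomes a single post-incrementing load:
      
        vld1.32 {d0, d1}, [r0]!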
      
      rdar://19717869, rdar://14062261.
      
      llvm-svn: 229932
  5. Feb 18, 2015
    • [ARM] Add missing M/R class CPUs · 26c9922a
      Bradley Smith authored
      Add some of the missing M and R class Cortex CPUs, namely:
      
      Cortex-M0+ (called Cortex-M0plus for GCC compatibility)
      Cortex-M1
      SC000
      SC300
      Cortex-R5
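      
      These become selectable in the usual way, e.g. (an illustrative
      invocation; Cortex-M0+ is ARMv6-M, hence the thumbv6m triple):
      
        llc -mtriple=thumbv6m-none-eabi -mcpu=cortex-m0plus foo.ll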
      
      llvm-svn: 229660
  6. Feb 11, 2015
    • [SimplifyCFG] Swap to using TargetTransformInfo for cost analysis · 7c336576
      James Molloy authored
      
      We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
      heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
      different heuristic now so there is a slight test churn.
      
      Test changes:
        * combine-comparisons-by-cse.ll: Removed unneeded branch check.
        * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
        * coalesce-subregs.ll: Superfluous block check.
        * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
        * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
        * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.
      
      llvm-svn: 228826
  7. Feb 09, 2015
    • Fix a bug in DemoteRegToStack where a reload instruction was inserted into the wrong basic block · 8d3cb829
      Akira Hatanaka authored
      
      This would happen when the result of an invoke was used by a phi instruction
      in the invoke's normal destination block. An instruction to reload the invoke's
      value would get inserted before the critical edge was split and a new basic
      block (which is the correct insertion point for the reload) was created. This
      commit fixes the bug by splitting the critical edge before all the reload
      instructions are inserted.
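      
      The problematic shape, roughly (illustrative IR, hypothetical names):
      
        %r = invoke i32 @f()
                to label %normal unwind label %lpad
        ...
      normal:                ; has another predecessor, %other
        %p = phi i32 [ %r, %entry ], [ 0, %other ]
      
      The reload feeding the phi belongs in the block created by splitting the
      critical edge from the invoke to %normal, not before the edge is split.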
      
      Also, hoist the code which computes the insertion point into the only
      place that needs that computation.
      
      rdar://problem/15978721
      
      llvm-svn: 228566
  8. Feb 08, 2015
    • ARM & AArch64: teach LowerVSETCC that output type size may differ from input. · 45aa89c9
      Tim Northover authored
      While various DAG combines try to guarantee that a vector SETCC
      operation will have the same output size as input, there's nothing
      intrinsic to either creation or LegalizeTypes that actually guarantees
      it, so the function needs to be ready to handle a mismatch.
      
      Fortunately this is easy enough: just extend or truncate the naturally
      compared result.
      
      I couldn't reproduce the failure in other backends that I know have
      SIMD, so it's probably only an issue for these two due to shared
      heritage.
      
      Should fix PR21645.
      
      llvm-svn: 228518
  9. Feb 04, 2015
    • Adding support to LLVM for targeting Cortex-A72 · 60885044
      Renato Golin authored
      Currently, Cortex-A72 is modelled as a Cortex-A57, except that the FP
      load balancing pass isn't enabled for Cortex-A72, as it's not
      profitable to have it enabled for this core.
      
      Patch by Ranjeet Singh.
      
      llvm-svn: 228140
    • Reverting VLD1/VST1 base-updating/post-incrementing combining · 2a5c0a51
      Renato Golin authored
      This reverts patches 223862, 224198, 224203, and 224754, which were all
      related to the vector load/store combining and were reverted/reapplied
      a few times due to the same alignment problems we're seeing now.
      
      Further tests, mainly self-hosting Clang, will be needed to reapply this
      patch in the future.
      
      llvm-svn: 228129
  10. Jan 31, 2015
    • ARM: support stack probe size on Windows on ARM · fb8a66fb
      Saleem Abdulrasool authored
      Now that -mstack-probe-size is piped through to the backend via the function
      attribute, as on Windows x86, honour the value to permit handling of non-default
      values for stack probes.  This is needed for /Gs with the clang-cl driver or
      -mstack-probe-size with the clang driver when targeting Windows on ARM.
      
      llvm-svn: 227667
  11. Jan 19, 2015
    • Bring r226038 back. · 12ca34f5
      Rafael Espindola authored
      No change in this commit, but clang was changed to also produce trivial comdats when
      needed.
      
      Original message:
      
      Don't create new comdats in CodeGen.
      
      This patch stops the implicit creation of comdats during codegen.
      
      Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
      now produce the same result in pr19848.
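      
      In IR terms, frontends now emit the comdat themselves, along these lines
      (an illustrative sketch):
      
        $foo = comdat any
        define void @foo() comdat($foo) { ret void }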
      
      llvm-svn: 226467
  12. Jan 16, 2015
    • Revert r226242 - Revert Revert Don't create new comdats in CodeGen · 60b72136
      Timur Iskhodzhanov authored
      This breaks AddressSanitizer (ninja check-asan) on Windows
      
      llvm-svn: 226251
    • Revert "Revert Don't create new comdats in CodeGen" · 67a79e72
      Rafael Espindola authored
      This reverts commit r226173, adding r226038 back.
      
      No change in this commit, but clang was changed to also produce trivial comdats for
      constructors, destructors and vtables when needed.
      
      Original message:
      
      Don't create new comdats in CodeGen.
      
      This patch stops the implicit creation of comdats during codegen.
      
      Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
      now produce the same result in pr19848.
      
      llvm-svn: 226242
  13. Jan 15, 2015
    • Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"" · 5ef58eb8
      Hal Finkel authored
      Reapply r226071 with fixes. Two fixes:
      
       1. We need to manually remove the old and create the new 'dead defs'
          associated with physical register definitions when we move the definition of
          the physical register from the copy point to the point of the original vreg def.
      
          This problem was picked up by the machine instruction verifier, and could trigger a
          verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
          turned on the verifier in the tests.
      
       2. When moving the def point of the phys reg up, we need to make sure that it
          is neither defined nor read in between the two instructions. We don't, however,
          extend the live ranges of phys reg defs to cover uses, so just checking for
          live-range overlap between the pair interval and the phys reg aliases won't
          pick up reads. As a result, we manually iterate over the range and check for
          reads.
      
          A test soon to be committed to the PowerPC backend will test this change.
      
      Original commit message:
      
      [RegisterCoalescer] Remove copies to reserved registers
      
      This allows the RegisterCoalescer to join "non-flipped" range pairs with a
      physical destination register -- which allows the RegisterCoalescer to remove
      copies like this:
      
      <vreg> = something (maybe a load, for example)
      ... (things that don't use PHYSREG)
      PHYSREG = COPY <vreg>
      
      (with all of the restrictions normally applied by the RegisterCoalescer: having
      compatible register classes, etc.)
      
      Previously, the RegisterCoalescer handled only the opposite case (copying
      *from* a physical register). I don't handle the problem fully here, but try to
      get the common case where there is only one use of <vreg> (the COPY).
      
      An upcoming commit to the PowerPC backend will make this pattern much more
      common on PPC64/ELF systems.
      
      llvm-svn: 226200
    • Revert Don't create new comdats in CodeGen · f5adf13f
      Timur Iskhodzhanov authored
      It breaks AddressSanitizer on Windows.
      
      llvm-svn: 226173
    • Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers" · dd669615
      Hal Finkel authored
      Reverting this while I investigate some bad behavior this is causing. As a
      possibly-related issue, one of the test cases now fails when
      -verify-machineinstrs is added, because of this change:
      
        llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs
      
      *** Bad machine code: No instruction at def index ***
      - function:    foo
      - basic block: BB#0 return (0x10007e21f10) [0B;736B)
      - liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
      - register:    %DS
      Valno #3 is defined at 624r
      
      *** Bad machine code: Live segment doesn't end at a valid instruction ***
      - function:    foo
      - basic block: BB#0 return (0x10007e21f10) [0B;736B)
      - liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
      - register:    %DS
      [624r,624d:3)
      LLVM ERROR: Found 2 machine code errors.
      
      where 624r corresponds exactly to the interval combining change:
      
      624B    %RSP<def> = COPY %vreg16; GR64:%vreg16
              Considering merging %vreg16 with %RSP
                      RHS = %vreg16 [608r,624r:0)  0@608r
                      updated: 608B   %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1]
              Success: %vreg16 -> %RSP
              Result = %RSP
      
      llvm-svn: 226086
    • [RegisterCoalescer] Remove copies to reserved registers · 82996462
      Hal Finkel authored
      This allows the RegisterCoalescer to join "non-flipped" range pairs with a
      physical destination register -- which allows the RegisterCoalescer to remove
      copies like this:
      
      <vreg> = something (maybe a load, for example)
      ... (things that don't use PHYSREG)
      PHYSREG = COPY <vreg>
      
      (with all of the restrictions normally applied by the RegisterCoalescer: having
      compatible register classes, etc.)
      
      Previously, the RegisterCoalescer handled only the opposite case (copying
      *from* a physical register). I don't handle the problem fully here, but try to
      get the common case where there is only one use of <vreg> (the COPY).
      
      An upcoming commit to the PowerPC backend will make this pattern much more
      common on PPC64/ELF systems.
      
      llvm-svn: 226071
  14. Jan 14, 2015
    • IR: Move MDLocation into place · 98854699
      Duncan P. N. Exon Smith authored
      This commit moves `MDLocation`, finishing off PR21433.  There's an
      accompanying clang commit for frontend testcases.  I'll attach the
      testcase upgrade script I used to PR21433 to help out-of-tree
      frontends/backends.
      
      This changes the schema for `DebugLoc` and `DILocation` from:
      
          !{i32 3, i32 7, !7, !8}
      
      to:
      
          !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)
      
      Note that empty fields (line/column: 0 and inlinedAt: null) don't get
      printed by the assembly writer.
      
      llvm-svn: 226048
    • Don't create new comdats in CodeGen. · fad1639a
      Rafael Espindola authored
      This patch stops the implicit creation of comdats during codegen.
      
      Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
      now produce the same result in pr19848.
      
      llvm-svn: 226038
    • ARM: add test for crc32 instructions in CodeGen. · a203ca61
      Tim Northover authored
      Somehow we seem to have ended up without any actual tests of the
      CodeGen side. Easy enough to fix.
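      
      The tests exercise the ARM CRC32 intrinsics, along the lines of this
      illustrative snippet:
      
        %r = call i32 @llvm.arm.crc32b(i32 %crc, i32 %byte)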
      
      llvm-svn: 225930
  15. Jan 12, 2015
    • Debug info: Factor out the creation of DWARF expressions from AsmPrinter into a new class DwarfExpression · b16d9ebb
      Adrian Prantl authored
      Factor the creation of DWARF expressions out of AsmPrinter into a new
      class DwarfExpression that can be shared between AsmPrinter and DwarfUnit.
      
      This is the first step towards unifying the two entirely redundant
      implementations of dwarf expression emission in DwarfUnit and AsmPrinter.
      
      Almost no functional change: testcases were updated because asm comments
      that used to be on two lines now appear on the same line, which is
      actually preferable.
      
      llvm-svn: 225706
  16. Jan 08, 2015
    • Fix large stack alignment codegen for ARM and Thumb2 targets · 933de7aa
      Kristof Beyls authored
      This partially fixes PR13007 (ARM CodeGen fails with large stack
      alignment): it fixes ARM and Thumb2 targets, but not Thumb1, as stack
      alignment for Thumb1 targets seems never to have been supported.
      
      Producing an aligned stack pointer is done by zeroing out the lower
      bits of the stack pointer. The BIC instruction was used for this.
      However, the immediate field of the BIC instruction only allows
      encoding an immediate that can zero out at most the 8 lower bits.
      When a larger alignment is requested, a BIC instruction cannot
      be used; llvm was silently producing incorrect code in this case.
      
      This commit fixes code generation for large stack alignments by
      using the BFC instruction instead, when the BFC instruction is
      available.  When not, it uses 2 instructions: a right shift,
      followed by a left shift to zero out the lower bits.
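      
      A sketch of the three strategies on a scratch register (illustrative, not
      the exact emitted sequence):
      
        bic r4, r4, #0xFF    @ 256-byte alignment: 0xFF fits a BIC immediate
        bfc r4, #0, #12      @ 4096 bytes: 0xFFF is not encodable, so use BFC
        lsr r4, r4, #12      @ without BFC: shift right...
        lsl r4, r4, #12      @ ...then left to clear the low 12 bits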
      
      The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
      that unconditionally uses BIC to realign the stack pointer, so it
      very likely has the same problem. However, I wasn't able to
      produce a test case for that. This commit adds an assert so that
      the compiler will fail the assert instead of silently generating
      wrong code if this is ever reached.
      
      llvm-svn: 225446