  1. Aug 04, 2014
    • Chandler Carruth's avatar
      [x86] Implement more aggressive use of PACKUS chains for lowering common · 06e6f1ca
      Chandler Carruth authored
      patterns of v16i8 shuffles.
      
      This implements one of the more important FIXMEs for the SSE2 support in
      the new shuffle lowering. We now generate the optimal shuffle sequence
      for truncate-derived shuffles which show up essentially everywhere.
      
      Unfortunately, this exposes a weakness in other parts of the shuffle
      logic -- we can no longer form PSHUFB here. I'll add the necessary
      support for that and other things in a subsequent commit.
      
      llvm-svn: 214702
      06e6f1ca
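To illustrate why truncate-derived shuffles suit PACKUS: a v16i8 shuffle that keeps the low byte of every 16-bit lane is a truncation, and clearing the high byte of each lane first means the unsigned-saturating pack never changes a value. The scalar model below is only a sketch. `packus` and `truncateViaPackus` are hypothetical helper names, not LLVM functions, and the commit message does not spell out the exact instruction sequence it emits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8i16 = std::array<std::int16_t, 8>;
using V16i8 = std::array<std::uint8_t, 16>;

// Scalar model of PACKUSWB: every signed 16-bit lane is saturated to
// [0, 255]; lanes of `a` fill bytes 0..7, lanes of `b` fill bytes 8..15.
V16i8 packus(const V8i16 &a, const V8i16 &b) {
  auto saturate = [](std::int16_t v) -> std::uint8_t {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return static_cast<std::uint8_t>(v);
  };
  V16i8 out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = saturate(a[i]);
    out[i + 8] = saturate(b[i]);
  }
  return out;
}

// Truncate 16 x i16 to 16 x i8. Masking to 0x00FF first guarantees that
// every lane is already in [0, 255], so the saturation never fires.
V16i8 truncateViaPackus(V8i16 a, V8i16 b) {
  for (auto &v : a) v = static_cast<std::int16_t>(v & 0x00FF);
  for (auto &v : b) v = static_cast<std::int16_t>(v & 0x00FF);
  return packus(a, b);
}
```

Without the masking step, a lane such as 0x1234 would saturate to 255 instead of truncating to 0x34.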
    • Benjamin Kramer's avatar
      Update links to the gcc and java documentation that 404'd. · 2abde4f9
      Benjamin Kramer authored
      llvm-svn: 214700
      2abde4f9
    • Kevin Qin's avatar
      Revert "r214669 - MachineCombiner Pass for selecting faster instruction" · f31ecf3f
      Kevin Qin authored
      This commit broke "make check" for several hours, so get it reverted.
      
      llvm-svn: 214697
      f31ecf3f
    • NAKAMURA Takumi's avatar
      MemoryBuffer: Don't use mmap when FileSize is multiple of 4k on Cygwin. · 56bc3419
      NAKAMURA Takumi authored
      On Cygwin, getpagesize() returns 64k (AllocationGranularity).
      
      In r214580, the size of X86GenInstrInfo.inc became 1499136.
      
      FIXME: We should reorganize getPageSize() on Win32 again.
      MapFile allocates addresses along AllocationGranularity, but the view is mapped by physical page.
      
      llvm-svn: 214681
      56bc3419
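The numbers in this message can be checked directly. The sketch below assumes, as background that the commit message does not state, that MemoryBuffer avoids mmap for files ending exactly on a page boundary because no zero byte would follow the mapped data. `isMultipleOf` and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when `size` is an exact multiple of `granule` (a power of two).
constexpr bool isMultipleOf(std::uint64_t size, std::uint64_t granule) {
  return (size & (granule - 1)) == 0;
}

constexpr std::uint64_t kFileSize = 1499136;      // X86GenInstrInfo.inc after r214580
constexpr std::uint64_t kPhysicalPage = 4096;     // 4k physical page
constexpr std::uint64_t kCygwinPageSize = 65536;  // 64k reported by getpagesize()

// The file ends exactly on a 4k page boundary (366 pages) ...
static_assert(isMultipleOf(kFileSize, kPhysicalPage), "multiple of 4k");
// ... but a check against the 64k value reported on Cygwin does not see that.
static_assert(!isMultipleOf(kFileSize, kCygwinPageSize), "not a multiple of 64k");
```

This is why the commit tests against 4k rather than the value returned by getpagesize().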
    • Chandler Carruth's avatar
      [x86] Handle single input shuffles in the SSSE3 case more intelligently. · 37a18821
      Chandler Carruth authored
      I spent some time looking into a better or more principled way to handle
      this. For example, by detecting arbitrary "unneeded" ORs... But really,
      there wasn't any point. We just shouldn't build blatantly wrong code so
      late in the pipeline rather than adding more stages and logic later on
      to fix it. Avoiding this is just too simple.
      
      llvm-svn: 214680
      37a18821
    • Chandler Carruth's avatar
      [x86] Fix the test case added in r214670 and tweaked in r214674 further. · 7bbfd245
      Chandler Carruth authored
      Fundamentally, there isn't a really portable way to test the constant
      pool contents. Instead, pin this test to the bare-metal triple. This
      also makes it a 64-bit triple which allows us to only match a single
      constant pool rather than two. It can also just hard code the '.' prefix
      as the format should be stable now that it has a fixed triple. Finally,
      I've switched it to use CHECK-NEXT to be more precise in the instruction
      sequence expected and to use variables rather than hard coding decisions
      by the register allocator.
      
      llvm-svn: 214679
      7bbfd245
    • Peter Zotov's avatar
      [OCaml] Add Llvm.{string_of_const,const_element}. · 454b8560
      Peter Zotov authored
      llvm-svn: 214677
      454b8560
    • Peter Zotov's avatar
      [LLVM-C] Add LLVM{IsConstantString,GetAsString,GetElementAsConstant}. · f9aa882c
      Peter Zotov authored
      llvm-svn: 214676
      f9aa882c
    • Sanjay Patel's avatar
      Account for possible leading '.' in label string. · 065cabf4
      Sanjay Patel authored
      llvm-svn: 214674
      065cabf4
    • Chandler Carruth's avatar
      [x86] Don't add nodes to the combined set (and prune subsequent · cde4eb56
      Chandler Carruth authored
      combines) until they are legal.
      
      Doing it the old way could, when the stars align *just* right, cause
      a node to get into the combine set prior to being legalized. Then, when
      the same node showed up as an operand to another node later on (but not
      so much later on that it had been deleted as dead) we would fail to add
      it back to the worklist thinking it had already been combined. This
      would in turn cause it to not be legalized. Fortunately, we can also
      walk the operands looking for uncombined (and thus potentially
      un-legalized) nodes late. It will still ensure that we walk all operands
      of all nodes and send all of them through both the legalizer without
      changes and the combiner at least once. (Which was the original goal of
      this).
      
      I have a test case for this bug, but it is terribly brittle. For
      example, it will stop finding the bug the moment I enable the new
      shuffle lowering. I don't yet have any test case that reliably exercises
      this bug, and it isn't clear that it will be possible to craft one. It
      is entirely possible that with the new shuffle lowering the two forms of
      doing this are precisely equivalent. That doesn't mean we shouldn't take
      the more conservative approach of insisting on things in the combined
      set having survived the legalizer.
      
      llvm-svn: 214673
      cde4eb56
    • Saleem Abdulrasool's avatar
      X86: silence warning (-Wparentheses) · 557023e3
      Saleem Abdulrasool authored
      GCC 4.8.2 points out the ambiguity in evaluation of the assertion condition:
      
      lib/Target/X86/X86FloatingPoint.cpp:949:49: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
         assert(STReturns == 0 || isMask_32(STReturns) && N <= 2);
      
      llvm-svn: 214672
      557023e3
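Since `&&` binds more tightly than `||`, the condition already parses as `A || (B && C)`; the warning only asks for that grouping to be written out. The sketch below checks the equivalence over all boolean inputs. The function names are hypothetical, and the explicit form shown is the presumed fix, not text quoted from the patch.

```cpp
#include <cassert>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"
// Same shape as the original assertion condition: no explicit grouping.
constexpr bool unparenthesized(bool a, bool b, bool c) { return a || b && c; }
#pragma GCC diagnostic pop

// The grouping that the language rules already apply.
constexpr bool parenthesized(bool a, bool b, bool c) { return a || (b && c); }

// The other possible reading, which is NOT what the expression means.
constexpr bool otherGrouping(bool a, bool b, bool c) { return (a || b) && c; }

// Returns true when the first two forms agree on all eight inputs.
bool formsAgree() {
  for (int bits = 0; bits < 8; ++bits) {
    const bool a = (bits & 4) != 0;
    const bool b = (bits & 2) != 0;
    const bool c = (bits & 1) != 0;
    if (unparenthesized(a, b, c) != parenthesized(a, b, c))
      return false;
  }
  return true;
}
```

Adding the parentheses therefore changes no behaviour; it only documents the intended reading.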
    • Saleem Abdulrasool's avatar
      CodeGen: silence a warning · befa2153
      Saleem Abdulrasool authored
      GCC 4.8.2 objects to the tautological condition in the assert as the unsigned
      value is guaranteed to be >= 0.  Simplify the assertion by dropping the
      tautological condition.
      
      llvm-svn: 214671
      befa2153
    • Sanjay Patel's avatar
      fix for PR20354 - Miscompile of fabs due to vectorization · 2ef67440
      Sanjay Patel authored
      This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
      
      This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
      
      There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
      
      llvm-svn: 214670
      2ef67440
    • Gerolf Hoflehner's avatar
      MachineCombiner Pass for selecting faster instruction · 35ba4671
      Gerolf Hoflehner authored
       sequence -  AArch64 target support
      
       This patch turns off madd/msub generation in the DAGCombiner and generates
       them in the MachineCombiner instead. It replaces the original code sequence
       with the combined sequence when it is beneficial to do so.
      
       When there is no machine model support it always generates the madd/msub
       instruction. This is also true when the objective is to optimize for code
       size: when the combined sequence is shorter, it is always chosen and does
       not get evaluated.
      
       When there is a machine model the combined instruction sequence
       is evaluated for critical path and resource length using machine
       trace metrics and the original code sequence is replaced when it is
       determined to be faster.
      
       rdar://16319955
      
      llvm-svn: 214669
      35ba4671
  2. Aug 03, 2014
    • Gerolf Hoflehner's avatar
      MachineCombiner Pass for selecting faster instruction · 5e1207e5
      Gerolf Hoflehner authored
       sequence -  target independent framework
      
       When the DAGCombiner selects instruction sequences
       it could increase the critical path or resource length.
      
       For example, on arm64 there are multiply-accumulate instructions (madd,
       msub). If e.g. the equivalent multiply-add sequence is not on the
       critical path it makes sense to select it instead of the combined,
       single accumulate instruction (madd/msub). The reason is that the
       conversion from add+mul to the madd could lengthen the critical path
       by the latency of the multiply.
      
       But the DAGCombiner would always combine and select the madd/msub
       instruction.
      
       This patch uses machine trace metrics to estimate critical path length
       and resource length of an original instruction sequence vs a combined
       instruction sequence and picks the faster code based on its estimates.
      
       This patch only commits the target independent framework that evaluates
       and selects code sequences. The machine instruction combiner is turned
       off for all targets and is expected to evolve over time by gradually
       handling DAGCombiner patterns in the target specific code.
      
       This framework lays the groundwork for fixing
       rdar://16319955
      
      llvm-svn: 214666
      5e1207e5
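The critical-path argument above can be made concrete with a toy model. This is a sketch under stated assumptions, not the pass's actual cost model: the latencies are hypothetical, the combined instruction is assumed to need all three operands before it issues, and the names `Latencies`, `finishSeparate`, `finishCombined` and `combinedIsFaster` are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical latencies in cycles; real values come from a machine model.
struct Latencies {
  int mul;
  int add;
  int madd;
};

// Cycle at which x*y + z is available when using a separate mul and add.
// The multiply starts as soon as x and y are ready, without waiting for z.
int finishSeparate(int xyReady, int zReady, const Latencies &l) {
  const int mulDone = xyReady + l.mul;
  return std::max(mulDone, zReady) + l.add;
}

// Cycle at which the result is available from a single madd.
// Assumption: madd cannot issue until x, y and z are all ready.
int finishCombined(int xyReady, int zReady, const Latencies &l) {
  return std::max(xyReady, zReady) + l.madd;
}

// True when the combined instruction finishes strictly earlier.
bool combinedIsFaster(int xyReady, int zReady, const Latencies &l) {
  return finishCombined(xyReady, zReady, l) < finishSeparate(xyReady, zReady, l);
}
```

With `mul = 4`, `add = 1`, `madd = 4`: if all operands are ready at cycle 0, madd finishes at cycle 4 and the separate pair at cycle 5. If the accumulator arrives late at cycle 10, the separate pair finishes at cycle 11 while madd finishes at cycle 14, so combining lengthens the critical path.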
    • Saleem Abdulrasool's avatar
      MC: virtualise EmitWindowsUnwindTables · 4544c16e
      Saleem Abdulrasool authored
      This makes EmitWindowsUnwindTables a virtual function and lowers the
      implementation of the function to the X86WinCOFFStreamer.  This method is a
      target specific operation.  This enables making the behaviour target dependent
      by isolating it entirely to the target specific streamer.
      
      llvm-svn: 214664
      4544c16e
    • Saleem Abdulrasool's avatar
      MC: rename Win64EHFrameInfo to WinEH::FrameInfo · b3be7371
      Saleem Abdulrasool authored
      The frame information stored in this structure is driven by the requirements for
      Windows NT unwinding rather than Windows 64 specifically.  As a result, this
      type can be shared across multiple architectures (ARM, AXP, MIPS, PPC, SH).
      Rename this class in preparation for adding support for unwinding
      information for Windows on ARM.
      
      Take the opportunity to constify the members as everything except the
      ChainedParent is read-only.  This required some adjustment to the label
      handling.
      
      llvm-svn: 214663
      b3be7371
    • Matt Arsenault's avatar
      R600/SI: Fix extra whitespace in asm str · 9215b17e
      Matt Arsenault authored
      This slipped in in r214467, so something like
      
      V_MOV_B32_e32  v0, ... is now printed with 2 spaces
      between the instruction name and first operand.
      
      llvm-svn: 214660
      9215b17e
    • Manman Ren's avatar
      [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion. · 062f58d5
      Manman Ren authored
      When we have a covered lookup table, make sure we don't delete PHINodes that
      are cached in PHIs.
      
      rdar://17887153
      
      llvm-svn: 214642
      062f58d5
  3. Aug 02, 2014
    • Joerg Sonnenberger's avatar
      tlbia support · c03105ba
      Joerg Sonnenberger authored
      llvm-svn: 214640
      c03105ba
    • Joerg Sonnenberger's avatar
      mfdcr / mtdcr support · e8a167ce
      Joerg Sonnenberger authored
      llvm-svn: 214639
      e8a167ce
    • Erik Eckstein's avatar
      fix bug 20513 - Crash in SLP Vectorizer · 26a1bf7d
      Erik Eckstein authored
      llvm-svn: 214638
      26a1bf7d
    • James Molloy's avatar
      6b999ae6
    • Joerg Sonnenberger's avatar
      Don't use additional arguments for dss and friends to satisfy DSS_Form, · 99ab590a
      Joerg Sonnenberger authored
      when let can do the same thing. Keep the 64bit variants as codegen-only.
      While they have a different register class, the encoding is the same for
      32bit and 64bit mode. Having both present would otherwise confuse the
      disassembler.
      
      llvm-svn: 214636
      99ab590a
    • James Molloy's avatar
      [AArch64] Teach DAGCombiner that converting two consecutive loads into a... · ce45be04
      James Molloy authored
      [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available.
      
      The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!
      
      llvm-svn: 214634
      ce45be04
    • Chandler Carruth's avatar
      [x86] Remove the FIXME that was implemented in r214628. Managed to · 16c13cad
      Chandler Carruth authored
      forget to update the comment here... =/
      
      llvm-svn: 214630
      16c13cad
    • Chandler Carruth's avatar
      [x86] Give this test a bare metal triple so it doesn't use the weird · bec57b40
      Chandler Carruth authored
      Darwin x86 asm comment prefix designed to work around GAS on that
      platform. That makes the comment-matching of the test much more stable.
      
      llvm-svn: 214629
      bec57b40
    • Chandler Carruth's avatar
      [x86] Largely complete the use of PSHUFB in the new vector shuffle · 4c57955f
      Chandler Carruth authored
      lowering with a small addition to it and adding PSHUFB combining.
      
      There is one obvious place in the new vector shuffle lowering where we
      should form PSHUFBs directly: when without them we will unpack a vector
      of i8s across two different registers and do a potentially 4-way blend
      as i16s only to re-pack them into i8s afterward. This is the crazy
      expensive fallback path for i8 shuffles and we can just directly use
      pshufb here as it will always be cheaper (the unpack and pack are
      two instructions so even a single shuffle between them hits our
      three instruction limit for forming PSHUFB).
      
      However, this doesn't generate very good code in many cases, and it
      leaves a bunch of common patterns not using PSHUFB. So this patch also
      adds support for extracting a shuffle mask from PSHUFB in the X86
      lowering code, and uses it to handle PSHUFBs in the recursive shuffle
      combining. This allows us to combine through them, combine multiple ones
      together, and generally produce sufficiently high quality code.
      
      Extracting the PSHUFB mask is annoyingly complex because it could be
      either pre-legalization or post-legalization. At least this doesn't have
      to deal with re-materialized constants. =] I've added decode routines to
      handle the different patterns that show up at this level and we dispatch
      through them as appropriate.
      
      The two primary test cases are updated. For the v16 test case there is
      still a lot of room for improvement. Since I was going through it
      systematically I left behind a bunch of FIXME lines that I'm hoping to
      turn into ALL lines by the end of this.
      
      llvm-svn: 214628
      4c57955f
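As a rough picture of what extracting a shuffle mask from PSHUFB involves, the sketch below decodes a 128-bit PSHUFB control vector once its constant bytes are known: a set high bit zeroes the lane, otherwise the low four bits select a source byte. The names `decodePshufbMask`, `applyMask` and `kZeroLane` are hypothetical, and this does not model the pre- and post-legalization DAG forms the commit has to handle.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;
using Mask16 = std::array<int, 16>;

// Sentinel for a lane that PSHUFB zeroes (hypothetical, for this sketch).
constexpr int kZeroLane = -1;

// Decode a 128-bit PSHUFB control vector into per-lane source indices.
Mask16 decodePshufbMask(const Bytes16 &control) {
  Mask16 mask{};
  for (std::size_t i = 0; i < 16; ++i)
    mask[i] = (control[i] & 0x80) ? kZeroLane : (control[i] & 0x0F);
  return mask;
}

// Apply a decoded mask to a source vector, byte by byte.
Bytes16 applyMask(const Bytes16 &src, const Mask16 &mask) {
  Bytes16 out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = (mask[i] == kZeroLane)
                 ? std::uint8_t{0}
                 : src[static_cast<std::size_t>(mask[i])];
  return out;
}
```

Once the control bytes are in this index form, two byte shuffles can be composed by indexing one mask with the other, which is the kind of combining the message describes.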
    • Chandler Carruth's avatar
      [x86] Switch to using the variable we extracted this operand into. · d10b2924
      Chandler Carruth authored
      Spotted this missed refactoring by inspection when reading code, and it
      doesn't change the functionality at all.
      
      llvm-svn: 214627
      d10b2924
    • Chandler Carruth's avatar
      [x86] Fix a few typos in my comments spotted in passing. · 5219d4ef
      Chandler Carruth authored
      llvm-svn: 214626
      5219d4ef
    • Chandler Carruth's avatar
      [x86] Teach the target shuffle mask extraction to recognize unary forms · 34f9a987
      Chandler Carruth authored
      of normally binary shuffle instructions like PUNPCKL and MOVLHPS.
      
      This detects cases where a single register is used for both operands
      making the shuffle behave in a unary way. We detect this and adjust the
      mask to use the unary form which allows the existing DAG combine for
      shuffle instructions to actually work at all.
      
      As a consequence, this uncovered a number of obvious bugs in the
      existing DAG combine which are fixed. It also now canonicalizes several
      shuffles even with the existing lowering. These typically are trying to
      match the shuffle to the domain of the input where before we only really
      modeled them with the floating point variants. All of the cases which
      change to an integer shuffle here have something in the integer domain, so
      there are no more or fewer domain crosses here AFAICT. Technically, it
      might be better to go from a GPR directly to the floating point domain,
      but detecting floating point *outputs* despite integer inputs is a lot
      more code and seems unlikely to be worthwhile in practice. If folks are
      seeing domain-crossing regressions here though, let me know and I can
      hack something up to fix it.
      
      Also as a consequence, a bunch of missed opportunities to form pshufb
      now can be formed. Notably, splats of i8s now form pshufb.
      Interestingly, this improves the existing splat lowering too. We go from
      3 instructions to 1. Yes, we may tie up a register, but it seems very
      likely to be worth it, especially if splatting the 0th byte (the
      common case) as then we can use a zeroed register as the mask.
      
      llvm-svn: 214625
      34f9a987
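The unary adjustment described above can be sketched on plain index vectors. The example assumes the usual two-input convention in which indices in [0, n) select from the first operand and indices in [n, 2n) from the second, with negative values meaning undefined; `toUnaryMask` is a hypothetical helper, not the function the commit changes.

```cpp
#include <cassert>
#include <vector>

// Rewrite a two-input shuffle mask for the case where both inputs are the
// same register. Assumes the mask has one entry per vector element, so
// n == mask.size(). Indices that pointed into the second input are folded
// onto the first; undefined (negative) entries are left alone.
std::vector<int> toUnaryMask(std::vector<int> mask) {
  const int n = static_cast<int>(mask.size());
  for (int &m : mask) {
    assert(m < 2 * n && "index out of range for a two-input shuffle");
    if (m >= n)
      m -= n;
  }
  return mask;
}
```

For a four-element unpack-low mask `{0, 4, 1, 5}` this yields `{0, 0, 1, 1}`, and for a MOVLHPS-style mask `{0, 1, 4, 5}` it yields `{0, 1, 0, 1}`, both of which reference only one input.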
    • Chandler Carruth's avatar
      [x86] Teach my pshufb comment printer to handle VPSHUFB forms as well as · 2ad69eea
      Chandler Carruth authored
      PSHUFB forms. This will be important to update some AVX tests when I add
      PSHUFB combining.
      
      llvm-svn: 214624
      2ad69eea
    • Chandler Carruth's avatar
      [SDAG] Refactor the code which deletes nodes in the DAG combiner to do · 18066974
      Chandler Carruth authored
      so using a single helper which adds operands back onto the worklist.
      
      Several places didn't rigorously do this but a couple already did.
      Factoring them together and doing it rigorously is important to delete
      things recursively early on in the combiner and get a chance to see
      accurate hasOneUse values. While no existing test cases change, an
      upcoming patch to add DAG combining logic for PSHUFB requires this to
      work correctly.
      
      llvm-svn: 214623
      18066974
    • Owen Anderson's avatar
      Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be constant-folded · 9d5a8c28
      Owen Anderson authored
      during DAGCombine in certain circumstances.  Unfortunately, the circumstances required
      to trigger the issue seem to require a pretty specific interaction of DAGCombines,
      and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64.
      The functionality added here is replicated in essentially every other DAG combine,
      so it seems pretty obviously correct.
      
      llvm-svn: 214622
      9d5a8c28
    • Justin Bogner's avatar
      CodeGen: Remove commented out code · 0950d79f
      Justin Bogner authored
      These two lines have been commented out for over 4 years. They aren't
      helping anyone.
      
      llvm-svn: 214615
      0950d79f
    • Akira Hatanaka's avatar
      [ARM] In dynamic-no-pic mode, ARM's post-RA pseudo expansion was incorrectly · dc08c30d
      Akira Hatanaka authored
      expanding pseudo LOAD_STACK_GUARD using instructions that are normally used
      in pic mode. This patch fixes the bug.
      
      <rdar://problem/17886592>
      
      llvm-svn: 214614
      dc08c30d
    • Lang Hames's avatar
      [MCJIT] Fix an overly-aggressive check in RuntimeDyldMachOARM. · 70735351
      Lang Hames authored
      This should fix the MachO_ARM_PIC_relocations.s test failures on some 32-bit
      testers.
      
      llvm-svn: 214613
      70735351
    • Matt Arsenault's avatar
      R600: Cleanup fneg tests · 4de32444
      Matt Arsenault authored
      llvm-svn: 214612
      4de32444
    • Michael Gottesman's avatar
      Add a small utility called bisect that enables commandline bisecting on a counter. · 55fcf347
      Michael Gottesman authored
      This is something that I have found to be very useful in my work and I
      wanted to contribute it back to the community since several people in
      the past have asked me for something along these lines. (Jakob, I know
      this has been a while coming ;)
      
      To use it, create a script that takes a count as its first argument.
      The script passes the count to LLVM via a command line flag that
      disables a pass after that pass has run count times. The script then
      invokes a test of some sort and indicates via its exit status whether
      LLVM successfully compiled the test. Then you invoke bisect as follows:
      
      bisect --start=<start_num> --end=<end_num> ./script.sh "%(count)s"
      
      And bisect will continually call ./script.sh with various counts using
      the exit status to determine success and failure.
      
      llvm-svn: 214610
      55fcf347
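The search that such a tool performs can be sketched as a binary search over the counter. This is an illustration in C++ and not the utility's own code: it assumes the script succeeds at `start`, fails at `end`, and that there is a single switch point between them. `firstFailingCount` and the `passes` callback are hypothetical stand-ins for running ./script.sh and reading its exit status.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest count in (start, end] for which `passes` is false.
// Preconditions (assumed, not checked by running the script):
// passes(start) is true, passes(end) is false, and the result flips once.
long firstFailingCount(long start, long end,
                       const std::function<bool(long)> &passes) {
  assert(start < end);
  long good = start;  // highest count known to succeed
  long bad = end;     // lowest count known to fail
  while (bad - good > 1) {
    const long mid = good + (bad - good) / 2;
    if (passes(mid))
      good = mid;
    else
      bad = mid;
  }
  return bad;
}
```

Each probe halves the remaining range, so a range of 100 counts needs at most seven invocations of the script.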
    • Eric Fiselier's avatar
      [lit] Add --show-xfail flag to LIT. · c85f00a0
      Eric Fiselier authored
      Summary:
      This patch adds a --show-xfail flag. If this flag is specified then each xfail test will be printed to output.
      When it is not given xfail tests are ignored. Ignoring xfail tests is the current behavior.
      
      This flag is meant to mirror the --show-unsupported flag that was recently added.
      
      Reviewers: ddunbar, EricWF
      
      Reviewed By: EricWF
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D4750
      
      llvm-svn: 214609
      c85f00a0