  1. Apr 09, 2012
    • Bill Wendling's avatar
      8a49d049
    • Chandler Carruth's avatar
      Clean up and relax a restriction on the matching of global offsets into · 3779ac10
      Chandler Carruth authored
      x86 addressing modes. This allows PIE-based TLS offsets to fit directly
      into an addressing mode immediate offset, which is the last remaining
      code quality issue from PR12380. With this patch, that PR is completely
      fixed.
      
      To understand why this patch is correct to match these offsets into
      addressing mode immediates, break it down by cases:
      1) 32-bit is trivially correct, and unmodified here.
      2) 64-bit non-small mode is unchanged and never matches.
      3) 64-bit small PIC code which is RIP-relative is handled specially in
         the match to try to fit RIP into the base register. If it fails, it
         now early exits. This behavior is unchanged by the patch.
      4) 64-bit small non-PIC code which is not RIP-relative continues to work
         as it did before. These immediates are safe because the ABI ensures
         they fit in small mode. This behavior is unchanged.
      5) 64-bit small PIC code which is *not* using RIP-relative addressing.
         This is the only case changed by the patch, and the primary place you
         see it is in TLS, either the win64 section offset TLS or Linux
         local-exec TLS model in a PIC compilation. Here the ABI again ensures
         that the immediates fit because we are in small mode, and any other
         operations required due to the PIC relocation model have been handled
         externally to the Wrapper node (extra loads etc are made around the
         wrapper node in ISelLowering).
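
      The case breakdown above can be restated as a small decision function.
      This is only a sketch of the reasoning in the message, not the code in
      the x86 backend: AddrContext, CodeModel, and mayFoldGlobalOffset are
      hypothetical names, and treating case 3 as a single ripFitsAsBase flag
      is a simplifying assumption.

```cpp
#include <cassert>

// Hypothetical model of the situations listed in the commit message.
enum class CodeModel { Small, Other };

struct AddrContext {
  bool is64Bit;        // false means 32-bit mode
  CodeModel model;     // only Small is ever matched in 64-bit mode
  bool isRIPRelative;  // the address is formed relative to RIP
  bool ripFitsAsBase;  // case 3 only: RIP could be placed in the base register
};

// Returns true when the global offset may be folded into the addressing
// mode's immediate, following cases 1 through 5 of the message.
bool mayFoldGlobalOffset(const AddrContext &c) {
  if (!c.is64Bit)
    return true;                    // case 1: 32-bit is trivially correct
  if (c.model != CodeModel::Small)
    return false;                   // case 2: non-small mode never matches
  if (c.isRIPRelative)
    return c.ripFitsAsBase;         // case 3: early exit if RIP does not fit
  // Cases 4 and 5 differ only in PIC vs. non-PIC. After the patch both fold,
  // because small mode guarantees the immediate fits.
  return true;
}
```

      For example, a local-exec TLS offset in a 64-bit small-model PIE build
      would correspond to {true, CodeModel::Small, false, false}, which folds.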
      
      I've tested this as much as I can comparing it with GCC's output, and
      everything appears safe. I discussed this with Anton and it made sense
      to him at least at face value. That said, if there are issues with PIC
      code after this patch, yell and we can revert it.
      
      llvm-svn: 154304
      3779ac10
    • Chandler Carruth's avatar
      Fold 15 tiny test cases into a single file that implements the · 84b83426
      Chandler Carruth authored
      comprehensive testing of TLS codegen for x86. Convert all of the ones
      that were still using grep to use FileCheck. Remove some redundancies
      between them.
      
      Perhaps most interestingly expand the test cases so that they actually
      fully list the instruction snippet being tested. TLS operations are
      *very* narrowly defined, and so these seem reasonably stable. More
      importantly, the existing test cases already were crazy fine grained,
      expecting specific registers to be allocated. This just clarifies that
      no *other* instructions are expected, and fills in some crucial gaps
      that weren't being tested at all.
      
      This will make any subsequent changes to TLS much more clear during
      review.
      
      llvm-svn: 154303
      84b83426
    • Craig Topper's avatar
      Optimize code a bit. No functional change intended. · 6148fe65
      Craig Topper authored
      llvm-svn: 154299
      6148fe65
  2. Apr 08, 2012
    • Benjamin Kramer's avatar
      Silence sign-compare warning. · bb6ff087
      Benjamin Kramer authored
      llvm-svn: 154297
      bb6ff087
    • Duncan Sands's avatar
      Only have codegen turn fdiv by a constant into fmul by the reciprocal · 2f1dc381
      Duncan Sands authored
      when -ffast-math is enabled, i.e. don't just always do it if the reciprocal can
      be formed exactly.  There is already an IR level transform that does
      that, and it does it more carefully.
      
      llvm-svn: 154296
      2f1dc381
    • Craig Topper's avatar
      Simplify code that tries to do vector extracts for shuffles when the mask... · c8e2d91a
      Craig Topper authored
      Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently.
      
      llvm-svn: 154295
      c8e2d91a
    • Chandler Carruth's avatar
      Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa
      Chandler Carruth authored
      optimizations which are valid for position independent code being linked
      into a single executable, but not for such code being linked into
      a shared library.
      
      I discussed the design of this with Eric Christopher, and the decision
      was to support an optional bit rather than a completely separate
      relocation model. Fundamentally, this is still PIC relocation, it's just
      that certain optimizations are only valid under a PIC relocation model
      when the resulting code won't be in a shared library. The simplest path
      to here is to expose a single bit option in the TargetOptions. If folks
      have different/better designs, I'm all ears. =]
      
      I've included the first optimization based upon this: changing TLS
      models to the *Exec models when PIE is enabled. This is the LLVM
      component of PR12380 and is all of the hard work.
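
      The effect of the PIE bit on TLS model selection can be sketched as
      follows. The message only states that the models change to the *Exec
      variants when PIE is enabled; the enum, the TLSQuery struct, and the
      way each pair is split between its local and non-local variant are
      assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not LLVM's own declarations.
enum class TLSModel { GeneralDynamic, LocalDynamic, InitialExec, LocalExec };

struct TLSQuery {
  bool isPIC;            // compiling with a PIC relocation model
  bool isPIE;            // the extra bit: output is linked into an executable
  bool isLocalOrHidden;  // symbol cannot be preempted by another module
  bool isDefinedHere;    // the variable is defined in this module
};

TLSModel chooseTLSModel(const TLSQuery &q) {
  // PIC code that may end up in a shared library needs the dynamic models.
  if (q.isPIC && !q.isPIE)
    return q.isLocalOrHidden ? TLSModel::LocalDynamic
                             : TLSModel::GeneralDynamic;
  // Non-PIC code, or PIC code with the PIE bit set, can use the Exec models.
  return (q.isDefinedHere || q.isLocalOrHidden) ? TLSModel::LocalExec
                                                : TLSModel::InitialExec;
}
```

      Keeping PIE as a bit on top of PIC, rather than a separate relocation
      model, means only the first condition has to change.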
      
      llvm-svn: 154294
      ede4a8aa
    • Chandler Carruth's avatar
      Move the TLSModel information into the TargetMachine rather than hiding · 16f0ebcb
      Chandler Carruth authored
      in TargetLowering. There was already a FIXME about this location being
      odd. The interface is simplified as a consequence. This will also make
      it easier to change TLS models when compiling with PIE.
      
      llvm-svn: 154292
      16f0ebcb
    • Benjamin Kramer's avatar
      EngineBuilder::create is expected to take ownership of the TargetMachine... · 25a3d816
      Benjamin Kramer authored
      EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it.
      
      llvm-svn: 154288
      25a3d816
    • Chandler Carruth's avatar
      Remove an overzealous assert. The assert was trying to catch places · bed1abf9
      Chandler Carruth authored
      where a chain outside of the loop block-set ended up in the worklist for
      scheduling as part of the contiguous loop. However, asserting the first
      block in the chain is in the loop-set isn't a valid check -- we may be
      forced to drag a chain into the worklist due to one block in the chain
      being part of the loop even though the first block is *not* in the loop.
      This occurs when we have been forced to form a chain early due to
      un-analyzable branches.
      
      No test case here as I have no idea how to even begin reducing one, and
      it will be hopelessly fragile. We have to somehow end up with a loop
      header of an inner loop which is a successor of a basic block with an
      unanalyzable pair of branch instructions. Ow. Self-host triggers it so
      it is unlikely it will regress.
      
      This at least gets block placement back to passing selfhost and the test
      suite. There are still a lot of slowdowns that I don't like coming out of
      block placement, although there are now also a lot of speedups. =[ I'm
      seeing swings in both directions up to 10%. I'm going to try to find
      time to dig into this and see if we can turn this on for 3.1 as it does
      a really good job of cleaning up after some loops that degraded with the
      inliner changes.
      
      llvm-svn: 154287
      bed1abf9
    • Chandler Carruth's avatar
      Add a debug-only 'dump' method to the BlockChain structure to ease · 49158908
      Chandler Carruth authored
      debugging.
      
      llvm-svn: 154286
      49158908
    • Chandler Carruth's avatar
      Teach InstCombine to nuke a common alloca pattern -- an alloca which has · f82b0e2d
      Chandler Carruth authored
      GEPs, bit casts, and stores reaching it but no other instructions. These
      often show up during the iterative processing of the inliner, SROA, and
      DCE. Once we hit this point, we can completely remove the alloca. These
      were actually showing up in the final, fully optimized code in a bunch
      of inliner tests I've been working on, and notably they show up after
      LLVM finishes optimizing away all function calls involved in
      hash_combine(a, b).
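
      The pattern described above amounts to a walk over the users of the
      alloca. The sketch below uses a toy instruction graph, not LLVM's IR
      classes: Kind, Inst, and isOnlyWrittenTo are hypothetical, and the rule
      that a store must write through the pointer (rather than store the
      pointer itself) is an assumption about what "stores reaching it" means.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Alloca, GEP, BitCast, Store, Load, Call };

// Toy instruction: just enough structure to follow uses of a pointer.
struct Inst {
  Kind kind;
  std::vector<Inst *> users;  // instructions that use this one's result
  const Inst *storeAddress;   // for Store only: the pointer written through
  explicit Inst(Kind k, const Inst *addr = nullptr)
      : kind(k), storeAddress(addr) {}
};

// True if every transitive use of `ptr` is a GEP or bitcast of it, or a store
// that writes through it. Loads, calls, and stores that save the pointer
// somewhere else all keep the allocation alive.
bool isOnlyWrittenTo(const Inst &ptr) {
  for (const Inst *user : ptr.users) {
    switch (user->kind) {
    case Kind::GEP:
    case Kind::BitCast:
      if (!isOnlyWrittenTo(*user))
        return false;
      break;
    case Kind::Store:
      if (user->storeAddress != &ptr)
        return false;  // the pointer escapes as a stored value
      break;
    default:
      return false;
    }
  }
  return true;
}
```

      When the check succeeds, nothing ever reads the memory, so the alloca
      and everything hanging off it can be deleted.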
      
      llvm-svn: 154285
      f82b0e2d
    • Nadav Rotem's avatar
      AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6
      Nadav Rotem authored
      Previously we used three instructions to broadcast an immediate value into a
      vector register.
      On Sandybridge we continue to load the broadcasted value from the constant pool.
      
      llvm-svn: 154284
      82609df6
    • Bill Wendling's avatar
      Remove old 'grep' lines. · 8c783d41
      Bill Wendling authored
      llvm-svn: 154283
      8c783d41
    • Bill Wendling's avatar
      FileCheckize these testcases. · 57f8e5eb
      Bill Wendling authored
      llvm-svn: 154281
      57f8e5eb
    • Bill Wendling's avatar
      Remove the 'Parent' pointer from the MDNodeOperand class. · 5c0068f8
      Bill Wendling authored
      An MDNode has a list of MDNodeOperands allocated directly after it as part of
      its allocation. Therefore, the Parent of the MDNodeOperands can be found by
      walking back through the operands to the beginning of that list. Mark the first
      operand's value pointer as being the 'first' operand so that we know where the
      beginning of said list is.
      
      This saves a *lot* of space during LTO with -O0 -g flags.
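
      The layout trick can be sketched with a header that is co-allocated
      with its operand slots. This is a simplified illustration, not the
      MDNode implementation: Node, Operand, createNode, and destroyNode are
      hypothetical, and using the low bit of the stored pointer as the
      'first' marker assumes values are at least 2-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

struct Node;

// One operand slot. The low bit marks the first slot in the list; the
// remaining bits hold the value pointer.
struct Operand {
  std::uintptr_t bits;

  Operand(void *value, bool isFirst)
      : bits(reinterpret_cast<std::uintptr_t>(value) | (isFirst ? 1u : 0u)) {}

  bool isFirst() const { return (bits & 1u) != 0; }
  void *value() const {
    return reinterpret_cast<void *>(bits & ~std::uintptr_t(1));
  }

  // Walks back to the first slot; the owning Node sits right before it.
  Node *parent();
};

// Memory layout of one allocation: [Node][Operand 0][Operand 1]...
struct alignas(Operand) Node {
  std::size_t numOperands;
  Operand *operands() { return reinterpret_cast<Operand *>(this + 1); }
};

inline Node *Operand::parent() {
  Operand *cur = this;
  while (!cur->isFirst())
    --cur;
  return reinterpret_cast<Node *>(cur) - 1;
}

// Allocates a node and its operand list in a single block.
Node *createNode(void *const *values, std::size_t n) {
  assert(n > 0 && "this sketch requires at least one operand");
  void *mem = std::malloc(sizeof(Node) + n * sizeof(Operand));
  if (!mem)
    return nullptr;
  Node *node = new (mem) Node{n};
  for (std::size_t i = 0; i < n; ++i)
    new (node->operands() + i) Operand(values[i], i == 0);
  return node;
}

void destroyNode(Node *node) { std::free(node); }
```

      Each slot is one pointer wide instead of two, at the cost of a linear
      walk when the parent is needed.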
      
      llvm-svn: 154280
      5c0068f8
    • Bill Wendling's avatar
      Allow subclasses of the ValueHandleBase to store information as part of the · 9b2503a0
      Bill Wendling authored
      value pointer by making the value pointer into a pointer-int pair with 2 bits
      available for flags.
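
      A pointer-int pair of this kind can be sketched in a few lines. The
      class below is a minimal illustration written for this note, with the
      hypothetical name TaggedPtr; LLVM's own PointerIntPair has a richer
      interface and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Stores `Bits` flag bits in the low bits of a pointer whose pointee
// alignment guarantees those bits are otherwise zero.
template <typename T, unsigned Bits = 2>
class TaggedPtr {
  static_assert(alignof(T) >= (1u << Bits),
                "pointee alignment too small for the flag bits");
  static constexpr std::uintptr_t Mask = (std::uintptr_t(1) << Bits) - 1;
  std::uintptr_t word = 0;

public:
  TaggedPtr() = default;
  TaggedPtr(T *p, unsigned flags) { set(p, flags); }

  void set(T *p, unsigned flags) {
    auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & Mask) == 0 && "pointer is under-aligned");
    assert(flags <= Mask && "flags do not fit in the reserved bits");
    word = raw | flags;
  }

  T *pointer() const { return reinterpret_cast<T *>(word & ~Mask); }
  unsigned flags() const { return static_cast<unsigned>(word & Mask); }
};
```

      The pair occupies the same space as a bare pointer, which is why
      subclasses can carry flags without growing the handle.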
      
      llvm-svn: 154279
      9b2503a0
    • Craig Topper's avatar
      Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and... · d024cef2
      Craig Topper authored
      Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. The same was already done for avx1.
      
      llvm-svn: 154272
      d024cef2
  3. Apr 07, 2012
    • Craig Topper's avatar
      Move vinsertf128 patterns near the instruction definitions. Add... · aa9aab5a
      Craig Topper authored
      Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.
      
      llvm-svn: 154268
      aa9aab5a
    • Craig Topper's avatar
      Remove 'else' after 'if' that ends in return. · e09d1c5c
      Craig Topper authored
      llvm-svn: 154267
      e09d1c5c
    • Nadav Rotem's avatar
      1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5
      Nadav Rotem authored
         shuffle node because it could introduce new shuffle nodes that were not
         supported efficiently by the target.
      
      2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
         second shuffle reverses the transformation of the first shuffle.
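
      The restricted case in item 2 is a check that the two masks compose to
      the identity. The sketch below assumes single-input shuffles with equal
      mask lengths and treats negative entries as undef lanes that block the
      fold; secondUndoesFirst is a hypothetical helper, not the DAG combine
      itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-input shuffle semantics: result[i] = input[mask[i]].
// Returns true if applying `first` and then `second` gives back the original
// vector, i.e. first[second[i]] == i for every lane.
bool secondUndoesFirst(const std::vector<int> &first,
                       const std::vector<int> &second) {
  if (first.size() != second.size())
    return false;
  for (std::size_t i = 0; i < second.size(); ++i) {
    int j = second[i];
    if (j < 0 || static_cast<std::size_t>(j) >= first.size())
      return false;  // undef or out-of-range lane: do not fold
    if (first[j] != static_cast<int>(i))
      return false;
  }
  return true;
}
```

      When the check holds, both shuffles can be replaced by the original
      input, so no new shuffle node is created.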
      
      llvm-svn: 154266
      71d07ae5
    • Duncan Sands's avatar
      Convert floating point division by a constant into multiplication by the · 5f8397a9
      Duncan Sands authored
      reciprocal if converting to the reciprocal is exact.  With -ffast-math, do it
      even if inexact.  This substantially speeds up ac.f90 from the polyhedron
      benchmarks.
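
      The exactness condition can be illustrated outside the compiler. The
      sketch below treats a reciprocal as exact when the divisor is a power
      of two and the reciprocal is a normal double; hasExactReciprocal and
      divideByConstant are hypothetical helpers, and allowInexact stands in
      for the -ffast-math case.

```cpp
#include <cassert>
#include <cmath>

// Returns true when 1.0 / c is exactly representable as a normal double, so
// x / c and x * (1.0 / c) produce the same result for every x.
bool hasExactReciprocal(double c) {
  if (!std::isnormal(c))
    return false;  // rejects zero, NaN, infinity, and subnormals
  int exp = 0;
  double mant = std::frexp(c, &exp);  // c == mant * 2^exp, |mant| in [0.5, 1)
  if (std::fabs(mant) != 0.5)
    return false;  // not a power of two
  return std::isnormal(1.0 / c);  // conservatively reject subnormal results
}

// Divides by a constant, using a multiply only when that is exact or when
// the caller explicitly allows an inexact rewrite.
double divideByConstant(double x, double c, bool allowInexact) {
  if (allowInexact || hasExactReciprocal(c))
    return x * (1.0 / c);
  return x / c;
}
```

      Dividing by 4.0 can become a multiply by 0.25 with no change in the
      result, while dividing by 3.0 keeps the division unless inexact
      rewriting is allowed.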
      
      llvm-svn: 154265
      5f8397a9
    • Chandler Carruth's avatar
      Perform partial SROA on the helper hashing structure. I really wish the · 75a1cf32
      Chandler Carruth authored
      optimizers could do this for us, but expecting partial SROA of classes
      with template methods through cloning is probably expecting too much
      heroics. With this change, the begin/end pointer pairs which indicate
      the status of each loop iteration are actually passed directly into each
      layer of the combine_data calls, and the inliner has a chance to see
      when most of the combine_data function could be deleted by inlining.
      Similarly for 'length'.
      
      We have to be careful to limit the places where in/out reference
      parameters are used as those will also prevent the inliner / optimizers
      from properly propagating constants.
      
      With this change, LLVM is able to fully inline and unroll the hash
      computation of small sets of values, such as two or three pointers.
      These now decompose into essentially straight-line code with no loops or
      function calls.
      
      There is still one code quality problem to be solved with the hashing --
      LLVM is failing to nuke the alloca. It removes all loads from the
      alloca, leaving only lifetime intrinsics and dead(!!) stores to the
      alloca. =/ Very unfortunate.
      
      llvm-svn: 154264
      75a1cf32
    • Chandler Carruth's avatar
      Fix ValueTracking to conclude that debug intrinsics are safe to · 28192c93
      Chandler Carruth authored
      speculate. Without this, loop rotate (among many other places) would
      suddenly stop working in the presence of debug info. I found this
      looking at loop rotate, and have augmented its tests with a reduction
      out of a very hot loop in yacr2 where failing to do this rotation costs
      sometimes more than 10% in runtime performance, perturbing numerous
      downstream optimizations.
      
      This should have no impact on performance without debug info, but the
      change in performance when debug info is enabled can be extreme. As
      a consequence (and this is how I got to this yak) any profiling of
      performance problems should be treated with deep suspicion -- they may
      have been wildly inaccurate if debug info was enabled for profiling. =/
      Just a heads up.
      
      llvm-svn: 154263
      28192c93
    • Benjamin Kramer's avatar
      SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW. · e1f4ca1b
      Benjamin Kramer authored
      Found by inspection.
      
      llvm-svn: 154262
      e1f4ca1b
    • Bob Wilson's avatar
      Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543> · 6f9be7e2
      Bob Wilson authored
      The tLDRr instruction with the last register operand set to the zero register
      prints in assembly as if no register was specified, and the assembler encodes
      it as a tLDRi instruction with a zero immediate.  With the integrated assembler,
      that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
      is broken.  Emit the instruction as tLDRi with a zero immediate.  I don't
      know if there's a good way to write a testcase for this.  Suggestions welcome.
      
      Opportunities for follow-up work:
      1) The asm printer should complain if a non-optional register operand is set
         to the zero register, instead of silently dropping it.
      2) The integrated assembler should complain in the same situation, instead of
         silently emitting the operand as "r0".
      
      llvm-svn: 154261
      6f9be7e2
    • Hongbin Zheng's avatar
      Refactor: Use positive field names in VectorizeConfig. · 5758f495
      Hongbin Zheng authored
      llvm-svn: 154249
      5758f495
    • NAKAMURA Takumi's avatar
      Target/X86/MCTargetDesc/X86MCAsmInfo.cpp: Enable DwarfCFI (aka DW2) on Cygming. · b95f6413
      NAKAMURA Takumi authored
      Cygwin-1.7 supports dw2. Some recent mingw distros support it, too.
      I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin.
      
      llvm-svn: 154247
      b95f6413
    • Alexis Hunt's avatar
      Make the test for r154235 more platform-independent with a shorter · 78fce432
      Alexis Hunt authored
      string.
      
      llvm-svn: 154243
      78fce432
    • Alexis Hunt's avatar
      Output UTF-8-encoded characters as identifier characters into assembly · 0235f684
      Alexis Hunt authored
      by default.
      
      This is a behaviour configurable in the MCAsmInfo. I've decided to turn
      it on by default in (possibly optimistic) hopes that most assemblers are
      reasonably sane. If this proves a problem, switching the default seems
      reasonable.
      
      I'm not sure if this is the opportune place to test, but it seemed good
      to make sure it was tested somewhere.
      
      llvm-svn: 154235
      0235f684
    • Jim Grosbach's avatar
      Tidy up. 80 columns. · 0c509fa6
      Jim Grosbach authored
      llvm-svn: 154226
      0c509fa6
  4. Apr 06, 2012