  1. May 19, 2013
  2. May 18, 2013
    • David Majnemer's avatar
      isKnownToBeAPowerOfTwo: (X & Y) + Y is a power of 2 or zero if Y is also. · beab5678
      David Majnemer authored
      This is useful if something that looks like (x & (1 << y)) ? 64 : 32 is
      the divisor in a modulo operation.
      
      llvm-svn: 182200
      beab5678
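The identity in the commit title can be checked by brute force. The following is a minimal sketch, not the LLVM implementation: the helper names `isPowerOfTwoOrZero`, `andPlusY`, and `holdsForAll8BitValues` are hypothetical, and the check only covers 8-bit values.

```cpp
#include <cassert>
#include <cstdint>

// True if v is zero or has exactly one bit set.
bool isPowerOfTwoOrZero(uint32_t v) { return (v & (v - 1)) == 0; }

// Computes (x & y) + y; y is expected to be a power of two or zero.
uint32_t andPlusY(uint32_t x, uint32_t y) {
  assert(isPowerOfTwoOrZero(y));
  return (x & y) + y;
}

// Exhaustively checks the claim for all 8-bit x and every 8-bit y that is
// a power of two or zero. The result is truncated to 8 bits to model
// wrap-around in a fixed-width type.
bool holdsForAll8BitValues() {
  for (uint32_t y = 0; y < 256; ++y) {
    if (!isPowerOfTwoOrZero(y))
      continue;
    for (uint32_t x = 0; x < 256; ++x) {
      uint8_t r = static_cast<uint8_t>(andPlusY(x, y));
      if (!isPowerOfTwoOrZero(r))
        return false;
    }
  }
  return true;
}
```

The reasoning behind the check: when Y has at most one bit set, X & Y is either 0 or Y, so the sum is either Y or 2*Y, and 2*Y either has a single bit set or wraps to zero.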
    • Arnold Schwaighofer's avatar
      LoopVectorize: Handle single edge PHIs · 693a1ca6
      Arnold Schwaighofer authored
      We might encounter single edge PHIs - handle them with an identity select.
      
      Fixes PR15990.
      
      llvm-svn: 182199
      693a1ca6
    • Hal Finkel's avatar
      Check InlineAsm clobbers in PPCCTRLoops · 2f474f0e
      Hal Finkel authored
      We don't need to reject all inline asm as using the counter register (most does
      not). Only those that explicitly clobber the counter register need to prevent
      the transformation.
      
      llvm-svn: 182191
      2f474f0e
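The clobber check described above might look roughly like the sketch below. It is a standalone illustration, not the PPCCTRLoops code: the function name `clobbersRegister` is hypothetical, and it assumes clobbers appear in a comma-separated constraint string in the `~{reg}` form, with `ctr` as the counter register's name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if the comma-separated constraint string lists `reg` as a
// clobber, i.e. contains an item spelled exactly "~{reg}".
bool clobbersRegister(const std::string &constraints, const std::string &reg) {
  assert(!reg.empty());
  const std::string wanted = "~{" + reg + "}";
  std::istringstream in(constraints);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item == wanted)
      return true;
  }
  return false;
}
```

With this shape, only an asm statement whose constraints include `~{ctr}` would block the transformation; one that clobbers only `~{memory}` would not.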
    • Tim Northover's avatar
      AArch64: add CMake dependency to fix very parallel builds · fd2639f7
      Tim Northover authored
      llvm-svn: 182190
      fd2639f7
    • David Majnemer's avatar
      X86: Bad peephole interaction between adc, MOV32r0 · 5ba473af
      David Majnemer authored
      The peephole tries to reorder MOV32r0 instructions such that they are
      before the instruction that modifies EFLAGS.
      
      The problem is that the peephole does not consider the case where the
      instruction that modifies EFLAGS also depends on the previous state of
      EFLAGS.
      
      Instead, walk backwards until we find an instruction that has a def for
      EFLAGS but does not have a use.
      If we find such an instruction, insert the MOV32r0 before it.
      If we cannot find such an instruction, skip the optimization.
      
      llvm-svn: 182184
      5ba473af
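The backwards walk described above can be modeled on a toy instruction list. This is a sketch under simplifying assumptions, not the X86 peephole itself: `Instr` and `findInsertPoint` are hypothetical names, and each instruction is reduced to two flags saying whether it defines or reads EFLAGS.

```cpp
#include <cassert>
#include <vector>

// Toy instruction: records only how it interacts with the flags register.
struct Instr {
  bool definesFlags;
  bool usesFlags;
};

// Walks backwards from index `start` and returns the index of the nearest
// instruction that defines the flags without also reading them. Returns -1
// if no such instruction exists, in which case the caller should skip the
// optimization.
int findInsertPoint(const std::vector<Instr> &block, int start) {
  assert(start >= 0 && start < static_cast<int>(block.size()));
  for (int i = start; i >= 0; --i) {
    if (block[i].definesFlags && !block[i].usesFlags)
      return i;
  }
  return -1;
}
```

For a sequence like a compare followed by an adc, starting at the adc (which both reads and writes the flags) yields the index of the compare, so the flag-clobbering zeroing instruction would go before the compare rather than between the two.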
    • Matt Arsenault's avatar
      Remove duplicated comment · e858e960
      Matt Arsenault authored
      The same comment is already made in the header
      
      llvm-svn: 182181
      e858e960
    • Matt Arsenault's avatar
      Add LLVMContext argument to getSetCCResultType · 75865923
      Matt Arsenault authored
      llvm-svn: 182180
      75865923
    • JF Bastien's avatar
      Support unaligned load/store on more ARM targets · 97b08c40
      JF Bastien authored
      This patch matches GCC behavior. The code used to allow unaligned
      load/store on ARM only for v6+ Darwin; it now allows unaligned load/store
      for v6+ Darwin as well as for v7+ on Linux and NaCl.
      
      The distinction is made because v6 doesn't guarantee support (but LLVM
      assumes that Apple controls hardware+kernel and therefore has
      conformant v6 CPUs), whereas v7 does provide this guarantee (and
      Linux/NaCl behave sanely).
      
      The patch keeps the -arm-strict-align command line option, and adds
      -arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
      -mno-strict-align.
      
      I originally encountered this discrepancy in FastIsel tests which expect
      unaligned load/store generation. Overall this should slightly improve
      performance in most cases because of reduced I$ pressure.
      
      llvm-svn: 182175
      97b08c40
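The policy described in this commit can be summarized as a small decision function. The sketch below is illustrative only: the enum and function names are hypothetical, and it assumes the two command line options simply override the per-target default, which the commit message does not spell out.

```cpp
#include <cassert>

enum class OS { Darwin, Linux, NaCl, Other };

// Models -arm-strict-align / -arm-no-strict-align (assumed to override
// the target default when present).
enum class AlignOverride { None, Strict, NoStrict };

// Returns true if unaligned load/store may be generated.
bool allowsUnalignedAccess(int armVersion, OS os, AlignOverride flag) {
  assert(armVersion > 0);
  if (flag == AlignOverride::Strict)
    return false;
  if (flag == AlignOverride::NoStrict)
    return true;
  if (os == OS::Darwin)
    return armVersion >= 6; // v6+ on Darwin
  if (os == OS::Linux || os == OS::NaCl)
    return armVersion >= 7; // v7+ on Linux and NaCl
  return false;
}
```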
    • Rafael Espindola's avatar
      Convert obj2yaml to use yamlio. · f5bb53f1
      Rafael Espindola authored
      llvm-svn: 182169
      f5bb53f1
    • Rafael Espindola's avatar
      Fix the build in c++11 mode. · 5986ce0e
      Rafael Espindola authored
      The errors were:
      
      non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list
      
      and
      
      non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list
      
      llvm-svn: 182168
      5986ce0e
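For context, this class of error comes from C++11 forbidding narrowing conversions of non-constant expressions inside braced initializers. The sketch below shows one common way to resolve it, an explicit cast; the `Entry` struct and `makeEntry` function are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

struct Entry {
  uint32_t offset;
  uint32_t size;
};

// In C++11, `Entry e = {off, sz};` with int64_t arguments is ill-formed
// because a non-constant int64_t would be narrowed to uint32_t inside an
// initializer list. An explicit cast makes the conversion intentional.
Entry makeEntry(int64_t off, int64_t sz) {
  assert(off >= 0 && off <= UINT32_MAX);
  assert(sz >= 0 && sz <= UINT32_MAX);
  Entry e = {static_cast<uint32_t>(off), static_cast<uint32_t>(sz)};
  return e;
}
```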
  3. May 17, 2013
    • Matt Arsenault's avatar
      Replace redundant code · 04126234
      Matt Arsenault authored
      Use EVT::changeExtendedVectorElementTypeToInteger instead of doing the
      same thing that it does
      
      llvm-svn: 182165
      04126234
    • Matt Arsenault's avatar
      Add missing -*- C++ -*- to headers · 52ddb7bc
      Matt Arsenault authored
      llvm-svn: 182164
      52ddb7bc
    • Vincent Lejeune's avatar
      R600: Lower int_load_input to copyFromReg instead of Register node · d3fcb501
      Vincent Lejeune authored
      This fixes a bug uncovered by the Dot4 patch, where the register class of
      the int_load_input use was ignored.
      
      llvm-svn: 182130
      d3fcb501
    • Vincent Lejeune's avatar
      R600: Use bottom up scheduling algorithm · 3d5118ca
      Vincent Lejeune authored
      llvm-svn: 182129
      3d5118ca
    • Vincent Lejeune's avatar
      R600: Use depth first scheduling algorithm · 4c81d4da
      Vincent Lejeune authored
      It should increase PV substitution opportunities and lower GPR
      usage (pending computation paths are "flushed" sooner)
      
      llvm-svn: 182128
      4c81d4da
    • Vincent Lejeune's avatar
      e958c8e0
    • Vincent Lejeune's avatar
      R600: Relax some vector constraints on Dot4. · 519f21ee
      Vincent Lejeune authored
      Dot4 now uses 8 scalar operands instead of 2 vector ones, which allows the
      register coalescer to remove some unneeded COPYs.
      This patch also defines some structures/functions that can be used to handle
      all vector instructions (CUBE, Cayman special instructions...) in a similar
      fashion.
      
      llvm-svn: 182126
      519f21ee
    • Vincent Lejeune's avatar
      R600: Improve texture handling · d3eed66e
      Vincent Lejeune authored
      llvm-svn: 182125
      d3eed66e
    • Vincent Lejeune's avatar
      R600: Rename 128 bit registers. · 4ebef18a
      Vincent Lejeune authored
      Almost all instructions that take a 128-bit reg as input (fetch, export...)
      have the ability to swizzle their argument and output. Instead of printing
      the default swizzle for each 128-bit reg, rename T*.XYZW to T* and let
      instructions print potentially optimized swizzles themselves.
      
      llvm-svn: 182124
      4ebef18a
    • Vincent Lejeune's avatar
      R600: Some factorization · 0fca91d5
      Vincent Lejeune authored
      llvm-svn: 182123
      0fca91d5
    • Vincent Lejeune's avatar
      R600: Factorize Fetch size limit inside AMDGPUSubTarget · f9f4e1e7
      Vincent Lejeune authored
      llvm-svn: 182122
      f9f4e1e7
    • Vincent Lejeune's avatar
      R600: prettier dump of clamp · 709e0168
      Vincent Lejeune authored
      llvm-svn: 182121
      709e0168
    • Tom Stellard's avatar
      R600: Pass MCSubtargetInfo reference to R600CodeEmitter · edade94b
      Tom Stellard authored
      llvm-svn: 182112
      edade94b
    • Venkatraman Govindaraju's avatar
      [Sparc] Implements hasReservedCallFrame and hasFP. · 641b0b5a
      Venkatraman Govindaraju authored
      This is to generate correct frame setup code when the function
      has variable-sized allocas.
      
      llvm-svn: 182108
      641b0b5a
    • Benjamin Kramer's avatar
      X86: Make shuffle -> shift conversion more aggressive about undefs. · fc33e1d9
      Benjamin Kramer authored
      Shuffles that only move an element into position 0 of the vector are common in
      the output of the loop vectorizer and often generate suboptimal code when SSSE3
      is not available. Lower them to vector shifts if possible.
      
      We still prefer palignr over psrldq because it has higher throughput on
      sandybridge.
      
      llvm-svn: 182102
      fc33e1d9
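The equivalence this commit relies on can be illustrated on a 4-lane vector. This is a toy model, not the X86 lowering code: `Vec4`, `shiftLanesRight`, and `shuffleAsShift` are hypothetical names, -1 stands for an undef mask element, and lane 0 is taken to be the lowest element.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Shifts the whole vector towards lane 0 by `lanes` elements, filling the
// vacated high lanes with zero (a lane-granular model of a byte shift).
Vec4 shiftLanesRight(const Vec4 &v, unsigned lanes) {
  assert(lanes <= 4);
  Vec4 r{};
  for (unsigned i = 0; i + lanes < 4; ++i)
    r[i] = v[i + lanes];
  return r;
}

// If the mask only pins lane 0 and leaves every other lane undef (-1),
// returns the equivalent shift amount in lanes; otherwise returns -1.
int shuffleAsShift(const std::array<int, 4> &mask) {
  if (mask[0] < 0 || mask[0] >= 4)
    return -1;
  for (int i = 1; i < 4; ++i) {
    if (mask[i] != -1)
      return -1;
  }
  return mask[0];
}
```

Because the other lanes are undef, any value the shift leaves there is acceptable, which is what makes the more aggressive conversion legal.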
    • Benjamin Kramer's avatar
      LoopVectorize: Simplify code. No functionality change. · d84a6339
      Benjamin Kramer authored
      llvm-svn: 182100
      d84a6339
    • David Tweed's avatar
      r182085 introduced a change that triggered an assertion on ARM. This is an immediate fix which doesn't resolve the deeper problem. · 3285dc13
      David Tweed authored
      
      llvm-svn: 182098
      3285dc13
    • Ulrich Weigand's avatar
      [PowerPC] Fix hi/lo encoding in old-style code emitter · 2dbe06a9
      Ulrich Weigand authored
      
      This patch implements the equivalent change to r182091/r182092
      in the old-style code emitter.  Instead of having two separate
      16-bit immediate encoding routines depending on the instruction,
      this patch introduces a single encoder that checks the machine
      operand flags to decide whether the low or high half of a
      symbol address is required.
      
      Since now both encoders make no further distinction between
      "symbolLo" and "symbolHi", the .td operand can now use a
      single getS16ImmEncoding method.
      
      Tested by running the old-style JIT tests on 32-bit Linux.
      
      llvm-svn: 182097
      2dbe06a9
    • Ulrich Weigand's avatar
      [PowerPC] Merge/rename PPC fixup types · 6e23ac60
      Ulrich Weigand authored
      
      Now that fixup_ppc_ha16 and fixup_ppc_lo16 are being treated exactly
      the same everywhere, it no longer makes sense to have two fixup types.
      
      This patch merges them both into a single type fixup_ppc_half16,
      and renames fixup_ppc_lo16_ds to fixup_ppc_half16ds for consistency.
      (The half16 and half16ds names are taken from the description of
      relocation types in the PowerPC ABI.)
      
      No change in code generation expected.
      
      llvm-svn: 182092
      6e23ac60
    • Ulrich Weigand's avatar
      [PowerPC] Fix processing of ha16/lo16 fixups · 994f49ed
      Ulrich Weigand authored
      
      The current PowerPC MC back end distinguishes between fixup_ppc_ha16
      and fixup_ppc_lo16, which are determined by the instruction the fixup
      applies to, and uses this distinction to decide whether a fixup ought
      to resolve to the high or the low part of a symbol address.
      
      This isn't quite correct, however.  It is valid, if unusual, assembler
      to use, e.g.
        li 1, symbol@ha
      or
        lis 1, symbol@l
      Whether the high or the low part of the address is used depends solely
      on the @ suffix, not on the instruction.
      
      In addition, both
        li 1, symbol
      and
        lis 1, symbol
      are valid, assuming the symbol address fits into 16 bits; again, both
      will then refer to the actual symbol value (so li will load the value
      itself, while lis will load the value shifted by 16).
      
      
      To fix this, two places need to be adapted.  If the fixup cannot be
      resolved at assembler time, a relocation needs to be emitted via
      PPCELFObjectWriter::getRelocType.  This routine already looks at
      the VK_ type to determine the relocation.  The only problem is that
      it will reject any _LO modifier in a ha16 fixup and vice versa.  This
      is simply incorrect; any of those modifiers ought to be accepted
      for either fixup type.
      
      If the fixup *can* be resolved at assembler time, adjustFixupValue
      currently selects the high bits of the symbol value if the fixup
      type is ha16.  Again, this is incorrect; see the above example
        lis 1, symbol
      
      Now, in theory we'd have to respect a VK_ modifier here.  However,
      in fact common code never even attempts to resolve symbol references
      using any nontrivial VK_ modifier at assembler time; it will always
      fall back to emitting a reloc and letting the linker handle it.
      
      If this ever changes, presumably there'd have to be a target callback
      to resolve VK_ modifiers.  We'd then have to handle @ha etc. there.
      
      llvm-svn: 182091
      994f49ed
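As background for the @ha and @l suffixes discussed above, the sketch below shows how the two halves of a 32-bit address are conventionally computed and recombined. It is an illustration based on the usual PowerPC convention (the high-adjusted half compensates for sign extension of the low half), not code from this patch; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Low 16 bits of the address (the value selected by @l).
uint16_t lo16(uint32_t addr) { return static_cast<uint16_t>(addr & 0xFFFF); }

// High-adjusted 16 bits (the value selected by @ha): the high half, plus
// one if the low half will be sign-extended to a negative number.
uint16_t ha16(uint32_t addr) {
  return static_cast<uint16_t>(((addr + 0x8000u) >> 16) & 0xFFFF);
}

// Models a load of the high half shifted left by 16, followed by an add
// of the sign-extended low half, with 32-bit wrap-around.
uint32_t rebuild(uint16_t ha, uint16_t lo) {
  uint32_t high = static_cast<uint32_t>(ha) << 16;
  uint32_t low = static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int16_t>(lo)));
  return high + low;
}
```

The point made in the commit message is independent of this arithmetic: which half is selected depends on the suffix written in the assembly source, not on whether the instruction is li or lis.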