  1. Aug 30, 2012
    • Introduce 'UseSSEx' to force SSE legacy encoding · bbd10792
      Michael Liao authored
      - Add 'UseSSEx' to prevent SSE legacy insns from being selected when AVX
        is enabled.
      
        Because inter-mixing SSE and AVX instructions incurs a penalty, we need
        to prevent SSE legacy insns from being generated unless explicitly
        requested through intrinsics. For patterns supported by both SSE and
        AVX, we have so far forced the AVX insn to be tried first by relying on
        AddedComplexity or position in the td file. That is error-prone and can
        introduce bugs accidentally.
      
        'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
        by AVX, we need this predicate to force VEX encoding or SSE legacy
        encoding only.
      
        For insns not inherited by AVX, we still use the previous predicates,
        i.e. 'HasSSEx'. So far, these insns fall into the following
        categories:
        * SSE insns with MMX operands
        * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
          CRC, etc.)
        * SSE4A insns.
        * MMX insns.
        * x87 insns added by SSE.
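The difference between the two predicate families can be sketched in plain C++. This is a minimal model, not LLVM's actual subtarget class: `SSELevel`, `Subtarget`, `Encoding`, and `encodingForInheritedInsn` are hypothetical names chosen for illustration, and only the rule stated above ('UseSSEx' is disabled when AVX is turned on) is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical feature ladder; ordering matters for the >= comparisons below.
enum class SSELevel { NoSSE, SSE1, SSE2, SSE3, SSSE3, SSE41, SSE42, AVX, AVX2 };

// Hypothetical stand-in for a subtarget query object.
struct Subtarget {
  SSELevel level;

  // 'HasSSE2'-style predicate: true whenever SSE2 is available, AVX or not.
  bool hasSSE2() const { return level >= SSELevel::SSE2; }
  bool hasAVX() const { return level >= SSELevel::AVX; }

  // 'UseSSE2'-style predicate: SSE2 is available and AVX is NOT turned on.
  bool useSSE2() const { return hasSSE2() && !hasAVX(); }
};

enum class Encoding { None, LegacySSE, VEX };

// For an SSE2 insn that AVX inherits, exactly one encoding may be selected.
Encoding encodingForInheritedInsn(const Subtarget &st) {
  // The two guards are mutually exclusive by construction.
  assert(!(st.hasAVX() && st.useSSE2()));
  if (st.hasAVX())
    return Encoding::VEX;
  if (st.useSSE2())
    return Encoding::LegacySSE;
  return Encoding::None;
}
```

With a 'HasSSE2'-style guard, both the legacy and the VEX pattern would be eligible on an AVX subtarget and the choice would fall back to pattern ordering; with the 'UseSSE2'-style guard the legacy pattern is simply not eligible there.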
      
      2 test cases are modified:
      
       - test/CodeGen/X86/fast-isel-x86-64.ll
         AVX code generation differs from the SSE one. 'vcvtsi2sdq' cannot be
         selected by fast-isel due to its complicated pattern, so fast-isel
         falls back to materializing the value from the constant pool.
      
       - test/CodeGen/X86/widen_load-1.ll
         AVX code generation differs from the SSE one after fixing SSE/AVX
         inter-mixing. Exec-domain fixing prefers 'vmovapd' over
         'vmovaps'.
      
      llvm-svn: 162919
    • PPCISelLowering.cpp: Fix r162725. · ac49029f
      NAKAMURA Takumi authored
      [Tobias von Koch] What's happening here is that the CR6SET/CR6UNSET is breaking the chain of register copies glued to the function call (BL_SVR4 node). The scheduler then moves other instructions in between those and the function call, which isn't good!
      
      Right. That's the case where there is no chain of register copies before the call, so InFlag == 0... Attached is a new revision of the patch which should fix this for good.
      
      llvm-svn: 162916
    • PPCISelLowering.cpp: Whitespace. · 8ad54e04
      NAKAMURA Takumi authored
      llvm-svn: 162915
    • Add support for moving pure S-register to NEON pipeline if desired · ca9f384f
      Tim Northover authored
      llvm-svn: 162898
    • Only perform DAG combine on FMAs of legal types. · e39ad7b5
      Craig Topper authored
      llvm-svn: 162892
    • Fix PR13727 · 3c898064
      Michael Liao authored
      - The root cause is that target constant materialization in X86 fast-isel
        creates PC-relative addressing, which may overflow the 32-bit range in a
        non-Small code model if the .rodata section is allocated too far from
        the code segment in MCJIT, which so far uses the Large code model.
      - Follow the similar logic already used for non-Small code models in
        fast-isel: skip this materialization when the code model is not Small.
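The overflow condition can be illustrated with a small range check. This is a sketch under stated assumptions, not the code from the patch: `fitsInRel32` and its parameter names are hypothetical, and it models only the x86-64 rule that a PC-relative displacement is a signed 32-bit value measured from the address of the next instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if `target` is reachable from `nextInsnAddr` with a signed
// 32-bit PC-relative displacement (hypothetical helper, for illustration).
bool fitsInRel32(std::uint64_t target, std::uint64_t nextInsnAddr) {
  // Unsigned subtraction wraps; reinterpreting as signed yields the distance.
  const std::int64_t disp = static_cast<std::int64_t>(target - nextInsnAddr);
  return disp >= std::numeric_limits<std::int32_t>::min() &&
         disp <= std::numeric_limits<std::int32_t>::max();
}
```

When code and .rodata may be placed more than 2 GiB apart, as the message describes for MCJIT with the Large code model, this check can fail, which is why PC-relative materialization is only safe under the Small code model.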
      
      llvm-svn: 162881
  2. Aug 29, 2012
  3. Aug 28, 2012
    • The instruction DEXT may be transformed into DEXTU or DEXTM depending · cd6b0e13
      Jack Carter authored
      on the size of the extraction and its position in the 64 bit word.
      
      This patch allows support of the dext transformations with mips64 direct
      object output.
      
      DINS:  0 <= msb < 32, 0 <= lsb < 32, 0 <= pos < 32, 1 <= size <= 32
        The field is entirely contained in the right-most word of the doubleword.
      
      DINSM: 32 <= msb < 64, 0 <= lsb < 32, 0 <= pos < 32, 2 <= size <= 64
        The field straddles the words of the doubleword.
      
      DINSU: 32 <= msb < 64, 32 <= lsb < 64, 32 <= pos < 64, 1 <= size <= 32
        The field is entirely contained in the left-most word of the doubleword.
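The three rows of the table can be read as a selection rule on `pos` and `size`. The sketch below follows the table exactly as written in this message (which lists the DINS family); `InsVariant` and `selectInsertVariant` are hypothetical names, and the relations lsb = pos and msb = pos + size - 1 are an assumption consistent with the listed ranges.

```cpp
#include <cassert>

enum class InsVariant { DINS, DINSM, DINSU };

// Picks the variant whose row in the table matches the given field.
// Hypothetical helper; not the function used in the MIPS backend.
InsVariant selectInsertVariant(unsigned pos, unsigned size) {
  assert(size >= 1 && pos < 64 && pos + size <= 64);
  const unsigned lsb = pos;            // lowest bit of the field
  const unsigned msb = pos + size - 1; // highest bit of the field
  if (lsb >= 32)
    return InsVariant::DINSU; // field lies entirely in the left-most word
  if (msb >= 32)
    return InsVariant::DINSM; // field straddles both words
  return InsVariant::DINS;    // field lies entirely in the right-most word
}
```

For example, a 32-bit field at position 0 stays DINS, a 2-bit field at position 31 becomes DINSM, and any field starting at bit 32 or above becomes DINSU.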
      
      llvm-svn: 162782
    • Explicitly update the number of nodes to be traversed · 710e1a59
      Michael Liao authored
      llvm-svn: 162780
    • Some instructions are passed to the assembler to be · c20a21b8
      Jack Carter authored
      transformed to the final instruction variant. An
      example would be dsll, which is transformed into
      dsll32 if the shift value is 32 or greater.
      
      For direct object output we need to do this transformation
      in the codegen. If the instruction was inside a branch
      delay slot, it was being missed. This patch corrects this
      oversight.
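The kind of rewrite described here can be sketched as follows. This is an illustrative model only: `ShiftInsn` and `lowerDoublewordShift` are hypothetical names, and the sketch assumes the MIPS64 convention that the base doubleword shift encodes amounts 0 to 31 while the "32" variant encodes amounts 32 to 63 as the amount minus 32.

```cpp
#include <cassert>
#include <string>

// A shift instruction reduced to its mnemonic and encoded shift amount.
struct ShiftInsn {
  std::string mnemonic;
  unsigned shamt;
};

// Rewrites a doubleword shift into its "32" variant when the amount does not
// fit the 5-bit shift field (hypothetical helper, for illustration).
ShiftInsn lowerDoublewordShift(const std::string &mnemonic, unsigned shamt) {
  assert(shamt < 64);
  if (shamt >= 32)
    return {mnemonic + "32", shamt - 32};
  return {mnemonic, shamt};
}
```

For direct object output this step has to run on every emitted instruction, including one placed in a branch delay slot, since no assembler pass follows to do it.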
      
      llvm-svn: 162779
    • Emit word of zeroes after the last instruction as a start of the mandatory · 8c4b6a30
      Roman Divacky authored
      traceback table on PowerPC64. This helps gdb handle exceptions. The other
      mandatory fields are ignored by gdb and are harder to implement, so just
      add a FIXME there.
      
      Patch by Bill Schmidt. PR13641.
      
      llvm-svn: 162778
    • Follow-up patch to r162731. · 206cefe6
      Akira Hatanaka authored
      Fix a couple of bugs in mips' long branch pass.
      This patch was supposed to be committed along with r162731, so I don't have a
      new test case.
      
      llvm-svn: 162777
    • Add PPC Freescale e500mc and e5500 subtargets. · 742b535e
      Hal Finkel authored
      Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
      the PowerPC backend.
      
      Patch by Tobias von Koch.
      
      llvm-svn: 162764
    • The commutative flag is already correctly set within the multiclass. If we set · cc567180
      Bill Wendling authored
      it here, then a 'register-memory' version would wrongly get the commutative
      flag.
      <rdar://problem/12180135>
      
      llvm-svn: 162741
    • Craig Topper's avatar
      72f51c39
    • Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo. · bd509eea
      Craig Topper authored
      llvm-svn: 162738
    • Fix PR12312 · b7d85b63
      Michael Liao authored
      - Add a target-specific DAG optimization to recognize a PTEST-able pattern.
        Such a pattern is an OR'd tree with X86ISD::OR as the root node. When the
        X86ISD::OR node has only its flag result used as a boolean value and
        all its leaves are extracted from the same vector, it can be folded into
        an X86ISD::PTEST node.
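Why the fold is valid can be seen in a scalar model. The sketch below is not the DAG combine itself: `Vec128`, `orTreeIsNonZero`, `ptestZF`, and `nonZeroViaPtest` are hypothetical names, and a 128-bit vector is modeled as two 64-bit lanes. The only hardware fact assumed is that PTEST sets ZF when the bitwise AND of its two operands is all zeros.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec128 = std::array<std::uint64_t, 2>; // two 64-bit lanes

// The original pattern: extract every lane, OR them together, test the result.
bool orTreeIsNonZero(const Vec128 &v) { return (v[0] | v[1]) != 0; }

// Model of the ZF result of PTEST a, b: set when (a AND b) is all zeros.
bool ptestZF(const Vec128 &a, const Vec128 &b) {
  return ((a[0] & b[0]) | (a[1] & b[1])) == 0;
}

// The folded form: one PTEST of the vector against itself.
bool nonZeroViaPtest(const Vec128 &v) { return !ptestZF(v, v); }
```

Since `v AND v` equals `v`, ZF is clear exactly when some lane is non-zero, which is the same boolean the OR tree computes.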
      
      llvm-svn: 162735
    • Revert r162713: "Add ATOMIC_LDR* pseudo-instructions to model atomic_load on ARM." · b3de7b17
      Jakob Stoklund Olesen authored
      This wasn't the right way to enforce ordering of atomics.
      
      We are already setting the isVolatile bit on memory operands of atomic
      operations which is good enough to enforce the correct ordering.
      
      llvm-svn: 162732
    • Fix mips' long branch pass. · b5af7121
      Akira Hatanaka authored
      Instructions emitted to compute branch offsets now use immediate operands
      instead of symbolic labels. This change was needed because there were problems
      when R_MIPS_HI16/LO16 relocations were used to make shared objects.
      
      llvm-svn: 162731
    • Split several PPC instruction classes. · 679c73cb
      Hal Finkel authored
      Slight reorganisation of PPC instruction classes for scheduling. No
      functionality change for existing subtargets.
       - Clearly separate load/store-with-update instructions from regular loads and stores.
       - Split IntRotateD -> IntRotateD and IntRotateDI
       - Split out fsub and fadd from FPGeneral -> FPAddSub
       - Update existing itineraries
      
      Patch by Tobias von Koch.
      
      llvm-svn: 162729
    • Allow remat of LI on PPC. · 686f2ee2
      Hal Finkel authored
      Allow load-immediates to be rematerialised in the register coalescer for
      PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
      because it relies on a register move getting emitted. The immediate load is
      equivalent, so change this test case.
      
      Patch by Tobias von Koch.
      
      llvm-svn: 162727
    • Eliminate redundant CR moves on PPC32. · 5ab37803
      Hal Finkel authored
      The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
      unset if it doesn't. The solution up to now was to insert a MachineNode to
      set/unset the CR bit, which produces a CR vreg. This vreg was then copied
      into CR bit 6. When the register allocator saw a bunch of these in the same
      function, it allocated the set/unset CR bit in some random CR register (1
      extra instruction) and then emitted CR moves before every vararg function
      call, rather than just setting and unsetting CR bit 6 directly before every
      vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
      instruction, which is then matched by a dedicated instruction pattern.
      
      Patch by Tobias von Koch.
      
      llvm-svn: 162725
    • Optimize zext on PPC64. · e39526a7
      Hal Finkel authored
      The zeroextend IR instruction is lowered to an 'and' node with an immediate
      mask operand, which in turn gets legalised to a sequence of ori's & ands.
      This can be done more efficiently using the rldicl instruction.
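The equivalence between the mask and the single rotate instruction can be checked with a small model. This is a sketch, not backend code: `rldicl` below is a plain C++ emulation of the instruction's semantics (rotate left by `sh`, then clear the `mb` high-order bits), and `zext32` is a hypothetical helper showing the 32-to-64-bit case.

```cpp
#include <cassert>
#include <cstdint>

// Emulates "rotate left doubleword immediate then clear left":
// rotate rs left by sh bits, then zero the mb most significant bits.
std::uint64_t rldicl(std::uint64_t rs, unsigned sh, unsigned mb) {
  assert(sh < 64 && mb < 64);
  // Guard sh == 0 to avoid an undefined 64-bit shift by 64.
  const std::uint64_t rotated = sh == 0 ? rs : (rs << sh) | (rs >> (64 - sh));
  const std::uint64_t mask = ~std::uint64_t{0} >> mb; // ones in the low 64-mb bits
  return rotated & mask;
}

// Zero-extension of the low 32 bits: no rotation, clear the upper 32 bits.
std::uint64_t zext32(std::uint64_t x) { return rldicl(x, 0, 32); }
```

The result matches `x & 0xFFFFFFFF`, but the mask is expressed through the instruction's immediate fields rather than built in a register with a sequence of ori and and instructions.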
      
      Patch by Tobias von Koch.
      
      llvm-svn: 162724
    • More missing mayLoad flags on AVX multiclasses. · 89d6b29d
      Jakob Stoklund Olesen authored
      llvm-svn: 162714
    • Add ATOMIC_LDR* pseudo-instructions to model atomic_load on ARM. · b24cb8c5
      Jakob Stoklund Olesen authored
      It is not safe to use normal LDR instructions because they may be
      reordered by the scheduler. The ATOMIC_LDR pseudos have a mayStore flag
      that prevents reordering.
      
      Atomic loads are also prevented from participating in rematerialization
      and load folding.
      
      llvm-svn: 162713
  4. Aug 27, 2012