  1. Sep 25, 2012
  2. Sep 20, 2012
      Re-work X86 code generation of atomic ops with spin-loop · 3237662b
      Michael Liao authored
      - Rewrite/merge pseudo-atomic instruction emitters to address the
        following issues:
        * Remove one unnecessary load from the spin-loop
      
          Previously the spin-loop looked like
      
              thisMBB:
              newMBB:
                ld  t1 = [bitinstr.addr]
                op  t2 = t1, [bitinstr.val]
                not t3 = t2  (if Invert)
                mov EAX = t1
                lcs dest = [bitinstr.addr], t3  [EAX is implicit]
                bz  newMBB
                fallthrough -->nextMBB
      
          The 'ld' at the beginning of newMBB should be lifted out of the loop,
          as lcs (or CMPXCHG on x86) will load the current memory value into
          EAX. The loop is refined as:
      
              thisMBB:
                EAX = LOAD [MI.addr]
              mainMBB:
                t1 = OP [MI.val], EAX
                LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
                JNE mainMBB
              sinkMBB:
      
        * Remove immopc as, so far, all pseudo-atomic instructions have
          the all-register form only; there is no immediate operand.
      
        * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
          td
      
        * Fix issues in PR13458
      
      - Add comprehensive tests on atomic ops on various data types.
        NOTE: Some of them are turned off due to missing functionality.
      
      - Revise tests due to the new spin-loop generated.
      
      llvm-svn: 164281
      3237662b
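The refined spin-loop in 3237662b can be sketched in portable C++. This is an illustration, not the code the backend emits: `fetch_nand` is a hypothetical helper, and `std::atomic::compare_exchange_weak` stands in for LCMPXCHG. The point it shows is that a failed compare-exchange already returns the current memory value, so only one explicit load is needed, before the loop.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not from the commit): atomically stores
// ~(old & val) into `addr` and returns the old value.
inline std::uint32_t fetch_nand(std::atomic<std::uint32_t>& addr,
                                std::uint32_t val) {
    // thisMBB: the single explicit load, hoisted out of the loop.
    std::uint32_t expected = addr.load();

    // mainMBB: compute the new value from `expected` and try to swap it in.
    // On failure, compare_exchange_weak writes the value it observed back
    // into `expected` (much as CMPXCHG reloads EAX), so the loop body
    // never needs to load again.
    while (!addr.compare_exchange_weak(expected, ~(expected & val))) {
    }

    // sinkMBB: `expected` holds the value that was replaced.
    return expected;
}
```

Calling `fetch_nand(a, 0x3C)` on an atomic holding `0xF0` returns `0xF0` and leaves `~(0xF0 & 0x3C)` in `a`.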
  3. Sep 13, 2012
      Add wider vector/integer support for PR12312 · 137f8aed
      Michael Liao authored
      - Enhance the fix to PR12312 to support wider integers, such as 256-bit
        integers. If more than one fully evaluated vector is found, POR them
        together first, then emit the final PTEST.
      
      llvm-svn: 163832
      137f8aed
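The POR-then-PTEST idea in 137f8aed can be modelled in portable C++. This sketch assumes the pattern being matched is an all-bits-zero test on a wide integer, which the commit message does not state. `Vec128`, `por`, `ptest_all_zero`, and `is_zero_256` are hypothetical names that model the instructions rather than use real intrinsics.

```cpp
#include <array>
#include <cstdint>

// Stand-in for one 128-bit XMM register, viewed as two 64-bit lanes.
using Vec128 = std::array<std::uint64_t, 2>;

// Models POR: lane-wise bitwise OR of two 128-bit values.
inline Vec128 por(const Vec128& a, const Vec128& b) {
    return {a[0] | b[0], a[1] | b[1]};
}

// Models the ZF result of `PTEST x, x`: true when every bit of x is zero.
inline bool ptest_all_zero(const Vec128& x) {
    return (x[0] | x[1]) == 0;
}

// A 256-bit integer split into two 128-bit halves is zero exactly when
// the OR of its halves is zero, so one POR plus one PTEST is enough.
inline bool is_zero_256(const Vec128& lo, const Vec128& hi) {
    return ptest_all_zero(por(lo, hi));
}
```

The same reduction extends to more than two vectors by chaining `por` before the single final test.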
  4. Sep 11, 2012
  5. Aug 19, 2012
  6. Aug 17, 2012
  7. Aug 14, 2012
      fix PR11334 · 34107b91
      Michael Liao authored
      - FP_EXTEND only supports extending from vectors with a matching number
        of elements. This results in the scalarization of the extension from
        v2f32 to v2f64, because v2f32 is legalized to v4f32, which does not
        match v2f64.
      - Add X86-specific VFPEXT supporting extension from v4f32 to v2f64.
      - Add a BUILD_VECTOR lowering helper to recover the original
        extension from v4f32 to v2f64.
      - Enhance the test case to include different vector widths.
      
      llvm-svn: 161894
      34107b91
  8. Aug 13, 2012
  9. Aug 06, 2012
  10. Aug 03, 2012
      Fall back to selection DAG isel for calls to builtin functions. · 3e6fa462
      Bob Wilson authored
      Fast isel doesn't currently have support for translating builtin function
      calls to target instructions.  For embedded environments where the library
      functions are not available, this is a matter of correctness and not
      just optimization.  Most of this patch is just arranging to make the
      TargetLibraryInfo available in fast isel.  <rdar://problem/12008746>
      
      llvm-svn: 161232
      3e6fa462
  11. Aug 01, 2012
  12. Jul 19, 2012
  13. Jul 17, 2012
      This is another case where instcombine demanded bits optimization created · f579beca
      Evan Cheng authored
      large immediates. Add dag combine logic to recover in case the large
      immediates don't fit in the cmp immediate operand field.
      
      int foo(unsigned long l) {
        return (l >> 47) == 1;
      }
      
      we produce
      
        %shr.mask = and i64 %l, -140737488355328
        %cmp = icmp eq i64 %shr.mask, 140737488355328
        %conv = zext i1 %cmp to i32
        ret i32 %conv
      
      which codegens to
      
      movq    $0xffff800000000000,%rax
      andq    %rdi,%rax
      movq    $0x0000800000000000,%rcx
      cmpq    %rcx,%rax
      sete    %al
      movzbl    %al,%eax
      ret
      
      TargetLowering::SimplifySetCC would transform
      (X & -256) == 256 -> (X >> 8) == 1
      if the immediate fails the isLegalICmpImmediate() test. For x86,
      that means immediates that do not fit in a signed 32-bit field.
      
      Based on a patch by Eli Friedman.
      
      PR10328
      rdar://9758774
      
      llvm-svn: 160346
      f579beca
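The equivalence behind the SimplifySetCC transform in f579beca can be checked directly on the `foo` example. The two functions below are hypothetical names: one is the masked form that instcombine produced, the other is the shifted form the dag combine recovers. The constants are the ones from the commit message written in hex (-140737488355328 is 0xFFFF800000000000 as a 64-bit value).

```cpp
#include <cstdint>

// Masked form: both constants are too wide for a signed 32-bit
// immediate, so x86 has to materialize them with 64-bit moves.
constexpr bool masked_compare(std::uint64_t l) {
    return (l & 0xFFFF800000000000ULL) == 0x0000800000000000ULL;
}

// Shifted form: the comparison constant is now 1, which fits easily.
constexpr bool shifted_compare(std::uint64_t l) {
    return (l >> 47) == 1;
}

// Both forms ask the same question: are bits 47..63 equal to 1?
static_assert(masked_compare(1ULL << 47) && shifted_compare(1ULL << 47),
              "bit 47 set alone satisfies both forms");
static_assert(!masked_compare(3ULL << 47) && !shifted_compare(3ULL << 47),
              "bit 48 set as well fails both forms");
```

The low 47 bits never affect either result, which is why the mask and the shift are interchangeable here.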
  14. Jul 12, 2012
  15. Jun 09, 2012
  16. Jun 01, 2012
      Implement the local-dynamic TLS model for x86 (PR3985) · 789acfb6
      Hans Wennborg authored
      This implements codegen support for accesses to thread-local variables
      using the local-dynamic model, and adds a clean-up pass so that the base
      address for the TLS block can be re-used between local-dynamic accesses
      on an execution path.
      
      llvm-svn: 157818
      789acfb6
  17. May 25, 2012
  18. Apr 27, 2012
      X86: Don't emit conditional floating point moves when targeting pre-pentiumpro architectures. · 913da4b2
      Benjamin Kramer authored
      * Model FPSW (the FPU status word) as a register.
      * Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
      * During Legalize/Lowering, build a node sequence to transfer the comparison
      result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
      an implicit sub-register extraction (%ax -> %ah) which is handled later on by
      the instruction selector.
      
      Fixes PR6679. Patch by Christoph Erhardt!
      
      llvm-svn: 155704
      913da4b2
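The FPSW-to-EFLAGS transfer in 913da4b2 can be modelled as plain bit manipulation. The sketch assumes the usual FNSTSW AX followed by SAHF sequence. `Flags` and `fpsw_to_eflags` are hypothetical names, and only the three flags that the x87 condition codes C0, C2, and C3 land in are modelled.

```cpp
#include <cstdint>

// The three EFLAGS bits that receive x87 condition codes via SAHF.
struct Flags {
    bool cf;  // carry flag,  receives C0
    bool pf;  // parity flag, receives C2
    bool zf;  // zero flag,   receives C3
};

// Models FNSTSW AX (copy the status word into AX) followed by SAHF
// (copy AH into the low byte of EFLAGS).
inline Flags fpsw_to_eflags(std::uint16_t fpsw) {
    // The right-shift mentioned in the commit: taking the high byte of
    // the 16-bit status word is the %ax -> %ah sub-register extraction.
    const std::uint8_t ah = static_cast<std::uint8_t>(fpsw >> 8);

    Flags f;
    f.cf = (ah & 0x01) != 0;  // C0 is FPSW bit 8,  AH bit 0
    f.pf = (ah & 0x04) != 0;  // C2 is FPSW bit 10, AH bit 2
    f.zf = (ah & 0x40) != 0;  // C3 is FPSW bit 14, AH bit 6
    return f;
}
```

After an x87 compare, "equal" sets only C3 and therefore ZF, "less than" sets only C0 and therefore CF, and an unordered result sets all three.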
  19. Apr 16, 2012
  20. Apr 15, 2012
  21. Apr 14, 2012
  22. Apr 11, 2012
      Reapply 154396 after fixing a test. · 9bc178ac
      Nadav Rotem authored
      Original message:
      Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
      blendvXX uses a register for the selection while vblendXX uses an immediate.
      On Sandy Bridge they still have the same latency and execute on the same execution ports.
      
      llvm-svn: 154483
      9bc178ac
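What it means for a shuffle to be expressible as an immediate blend (9bc178ac) can be sketched as follows. This is an illustration of the idea, not the lowering code from the patch. `shuffle_to_blend_imm` is a hypothetical helper, and it assumes the convention that a set immediate bit selects the lane from the second operand.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: `mask` is a shuffle mask over two N-element
// vectors, where index i means lane i of the first operand and i + N
// means lane i of the second. Returns true and fills `imm` when every
// result lane keeps its position, i.e. the shuffle is a blend.
inline bool shuffle_to_blend_imm(const std::vector<int>& mask,
                                 std::uint8_t& imm) {
    const int n = static_cast<int>(mask.size());
    if (n == 0 || n > 8) return false;  // an 8-bit immediate covers 8 lanes

    std::uint8_t bits = 0;
    for (int i = 0; i < n; ++i) {
        if (mask[i] == i) continue;          // lane i from the first operand
        if (mask[i] != i + n) return false;  // lane moves: not a blend
        bits |= static_cast<std::uint8_t>(1u << i);  // lane i from the second
    }
    imm = bits;
    return true;
}
```

For the 4-lane mask {0, 5, 2, 7}, lanes 1 and 3 come from the second operand, so the immediate is 0x0A. A mask such as {1, 0, 2, 3} moves lanes and is rejected.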
  23. Apr 10, 2012
  24. Apr 09, 2012
  25. Apr 04, 2012
      Always compute all the bits in ComputeMaskedBits. · ba0a6cab
      Rafael Espindola authored
      This allows us to keep passing reduced masks to SimplifyDemandedBits, but
      know about all the bits if SimplifyDemandedBits fails. This allows instcombine
      to simplify cases like the one in the included testcase.
      
      llvm-svn: 154011
      ba0a6cab
  26. Feb 28, 2012
  27. Feb 25, 2012
      Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff. · bdf94879
      NAKAMURA Takumi authored
      [Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems:
      Clang raised a warning, and X86 LowerOperation would assert out for
      fptoui f64 to i32 because it improperly lowered to an illegal
      BUILD_PAIR. Here's a patch that addresses these issues. Let me know if
      any other changes are necessary. Thanks.
      
      llvm-svn: 151432
      bdf94879
  28. Feb 24, 2012
  29. Feb 22, 2012
  30. Feb 19, 2012
  31. Feb 05, 2012
  32. Feb 02, 2012
  33. Feb 01, 2012
  34. Jan 30, 2012
  35. Jan 23, 2012