  1. Jun 08, 2016
    • Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and... · 769219b1
      Dehao Chen authored
      Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently.
      
      Summary:
      Consider the following diamond CFG:
      
         A
        / \
       B   C
        \ /
         D
      
      Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
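      
      For illustration, here is a minimal standalone sketch of the threshold arithmetic above (plain C++, not the actual MachineBlockPlacement code):
      
      ```
      #include <cstdio>
      
      int main() {
        double freqBD = 81.0; // Freq(B->D): the hot path out of A (81%)
        double freqCD = 19.0; // Freq(C->D): the competing edge into D (19%)
        // Old check: place C between B and D if Freq(C->D) > Freq(B->D) * 20%.
        std::printf("20%% threshold -> %s\n",
                    freqCD > freqBD * 0.20 ? "ABCD" : "ABDC"); // 19 > 16.2
        // New check: with 25%, the hot edge B->D stays intact.
        std::printf("25%% threshold -> %s\n",
                    freqCD > freqBD * 0.25 ? "ABCD" : "ABDC"); // 19 < 20.25
        return 0;
      }
      ```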
      
      Reviewers: djasper, davidxl
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D20989
      
      llvm-svn: 272203
    • [AArch64][RegisterBankInfo] G_OR are fine on either GPR or FPR. · d1cd30b2
      Quentin Colombet authored
      Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
      FPR for 64-bit or 32-bit values.
      
      Add test cases demonstrating how this information is used to coalesce a
      computation on a single register bank.
      
      llvm-svn: 272170
    • [ARM] MSR instructions implicitly set CPSR · b3378e2f
      Oliver Stannard authored
      The MSR instructions can write to the CPSR, but we did not model this
      fact, so we could emit them in the middle of IT blocks, changing the
      condition flags for later instructions in the block.
      
      The tests use two calls to llvm.write_register.i32 because it is valid
      to use these instructions at the end of an IT block, which if-conversion
      does in some cases. With two calls, the first clobbers the flags, so a
      branch has to be used to make the second one conditional.
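      
      As a hedged illustration (not the actual test from the patch): clang's ACLE-style '__builtin_arm_wsr' builtin lowers to the llvm.write_register.i32 intrinsic, so a pattern like the following sketches the two-call shape. The 'apsr_nzcvq' register name, and the builtin accepting it, are assumptions here.
      
      ```
      #include <stdint.h>
      
      void clobber_flags_twice(uint32_t a, uint32_t b) {
        __builtin_arm_wsr("apsr_nzcvq", a); // first MSR clobbers the flags, so...
        __builtin_arm_wsr("apsr_nzcvq", b); // ...this one cannot be predicated in
                                            // the same IT block; a branch is
                                            // needed to make it conditional.
      }
      ```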
      
      Differential Revision: http://reviews.llvm.org/D21139
      
      llvm-svn: 272154
    • MIR: Fix parsing of stack object references in MachineMemOperands · 3ef7df9c
      Matthias Braun authored
      The MachineMemOperand parser lacked the code to handle %stack.X
      references (%fixed-stack.X was working).
      
      llvm-svn: 272082
  2. Jun 07, 2016
    • AMDGPU: Add amdgpu-ps-wqm-outputs function attributes · c00e03b8
      Nicolai Haehnle authored
      Summary:
      The presence of this attribute indicates that VGPR outputs should be computed
      in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
      that derivatives can be taken of shader inputs computed by the prolog, fixing
      a bug.
      
      The generated code could certainly be improved: if a prolog pixel shader is
      used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
      stipples, and forcing per-sample interpolation), Mesa will use this attribute
      unconditionally, because it has to be conservative. So WQM may be used in the
      prolog when it isn't really needed, and furthermore a silly back-and-forth
      switch is likely to happen at the boundary between prolog and main shader
      parts.
      
      Fixing this is a bit involved: we'd first have to add a mechanism by which
      LLVM writes the WQM-related input requirements to the main shader part binary,
      and then Mesa specializes the prolog part accordingly. At that point, we may
      as well just compile a monolithic shader...
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
      
      Reviewers: arsenm, tstellarAMD, mareko
      
      Subscribers: arsenm, llvm-commits, kzhuravl
      
      Differential Revision: http://reviews.llvm.org/D20839
      
      llvm-svn: 272063
    • [X86][SSE4A] Regenerated SSE4A intrinsics tests · 536434e8
      Simon Pilgrim authored
      There are no VEX-encoded versions of the SSE4A instructions, so make sure that AVX targets give the same output.
      
      llvm-svn: 272060
    • Revert "Differential Revision: http://reviews.llvm.org/D20557" · 538d09d0
      Eric Christopher authored
      Author: Wei Ding <wei.ding2@amd.com>
      Date:   Tue Jun 7 19:04:44 2016 +0000
      
          Differential Revision: http://reviews.llvm.org/D20557
      
          git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
          91177308-0d34-0410-b5e6-96231b3b80d8
      
      as it was breaking the bots.
      
      This reverts commit r272044.
      
      llvm-svn: 272056
    • [stack-protection] Add support for MSVC buffer security check · 22bfa832
      Etienne Bergeron authored
      Summary:
      This patch adds support for the MSVC buffer security check implementation.
      
      The buffer security check is turned on with the '/GS' compiler switch.
        * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
        * To be added to clang here: http://reviews.llvm.org/D20347
      
      Some overview of buffer security check feature and implementation:
        * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
        * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
        * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
      
      
      For the following example:
      ```
      int example(int offset, int index) {
        char buffer[10];
        memset(buffer, 0xCC, index);
        return buffer[index];
      }
      ```
      
      The MSVC compiler adds these instructions to perform the stack integrity check:
      ```
              push        ebp  
              mov         ebp,esp  
              sub         esp,50h  
        [1]   mov         eax,dword ptr [__security_cookie (01068024h)]  
        [2]   xor         eax,ebp  
        [3]   mov         dword ptr [ebp-4],eax  
              push        ebx  
              push        esi  
              push        edi  
              mov         eax,dword ptr [index]  
              push        eax  
              push        0CCh  
              lea         ecx,[buffer]  
              push        ecx  
              call        _memset (010610B9h)  
              add         esp,0Ch  
              mov         eax,dword ptr [index]  
              movsx       eax,byte ptr buffer[eax]  
              pop         edi  
              pop         esi  
              pop         ebx  
        [4]   mov         ecx,dword ptr [ebp-4]  
        [5]   xor         ecx,ebp  
        [6]   call        @__security_check_cookie@4 (01061276h)  
              mov         esp,ebp  
              pop         ebp  
              ret  
      ```
      
      The instrumentation above:
        * [1] loads the global security canary,
        * [3] stores the locally computed ([2]) canary to the guard slot,
        * [4] loads the guard slot and ([5]) re-computes the global canary,
        * [6] validates the resulting canary with '__security_check_cookie' and performs error handling.
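      
      As a rough C++ sketch of the scheme annotated above (illustrative only; the real logic lives in the compiler-inserted code and the MSVC runtime, and the reserved underscore-prefixed names are avoided here):
      
      ```
      #include <cstdint>
      #include <cstdlib>
      
      static uintptr_t security_cookie = 0xBB40E64E; // [1] process-wide canary
      
      static void check_cookie(uintptr_t slot, uintptr_t frame) {
        if ((slot ^ frame) != security_cookie) // [5] re-derive, [6] validate
          std::abort();                        // the CRT reports a GS failure
      }
      
      int guarded_example(uintptr_t frame_ptr) {
        uintptr_t guard_slot = security_cookie ^ frame_ptr; // [2] xor, [3] store
        // ... body that might overflow a local buffer ...
        check_cookie(guard_slot, frame_ptr);                // [4] reload and check
        return 0;
      }
      ```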
      
      Overview of the current stack-protection implementation:
        * lib/CodeGen/StackProtector.cpp
          * There is a default stack-protection implementation applied on the intermediate representation.
          * The target can overload the 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
          * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
          * Basic blocks are added to every instrumented function to receive the code for stack guard validation and error handling.
          * Guard manipulation and comparison are added directly to the intermediate representation.
      
        * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
        * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
          * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls).
            * See the long comment above the 'class StackProtectorDescriptor' declaration.
          * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation (note: 'getIRStackGuard' must return nullptr).
            * 'getSDagStackGuard' returns the appropriate stack guard (security cookie).
          * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
      
        * include/llvm/Target/TargetLowering.h
          * Contains functions to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and to provide the Guard 'Value'.
      
        * lib/Target/X86/X86ISelLowering.cpp
          * Contains the x86 specialisation; provides the Guard 'Value' used by the SelectionDAG algorithm.
      
      Function-based Instrumentation:
        * MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction.
        * To support function-based instrumentation, this patch is (see the sketch after this list)
          * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
            * if provided, the stack protection instrumentation won't be inlined, and a call to that function will be added to the epilogue,
          * modifying SelectionDAGISel.cpp to avoid producing the basic blocks used for inline instrumentation,
          * generating the function-based instrumentation during the ISel pass (SelectionDAGBuilder.cpp),
          * falling back, if FastISel is used (not SelectionDAG), to the same function-based check implemented over the intermediate representation (StackProtector.cpp).
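      
      A purely hypothetical sketch of the shape such a hook could take; the name 'getStackGuardCheckFunction' and its signature are assumptions for illustration, not the actual API added by this patch:
      
      ```
      namespace llvm { class Function; class Module; }
      
      class TargetLoweringSketch {
      public:
        virtual ~TargetLoweringSketch() = default;
        // Return the function to call for the epilogue check, or nullptr to
        // keep the default inlined guard comparison.
        virtual llvm::Function *getStackGuardCheckFunction(llvm::Module &M) const {
          return nullptr;
        }
      };
      // An MSVC-targeting X86 override would conceptually do something like:
      //   return M.getFunction("__security_check_cookie");
      ```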
      
      Modifications
        * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
        * adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
      
      Results
      
        * IR generated instrumentation:
      ```
      clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
      ```
      
      ```
      *** Final LLVM Code input to ISel ***
      
      ; Function Attrs: nounwind sspstrong
      define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
      entry:
        %StackGuardSlot = alloca i8*                                                  <<<-- Allocated guard slot
        %0 = call i8* @llvm.stackguard()                                              <<<-- Loading Stack Guard value
        call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)                  <<<-- Prologue intrinsic call (store to Guard slot)
        %index.addr = alloca i32, align 4
        %offset.addr = alloca i32, align 4
        %buffer = alloca [10 x i8], align 1
        store i32 %index, i32* %index.addr, align 4
        store i32 %offset, i32* %offset.addr, align 4
        %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
        %1 = load i32, i32* %index.addr, align 4
        call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
        %2 = load i32, i32* %index.addr, align 4
        %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
        %3 = load i8, i8* %arrayidx, align 1
        %conv = sext i8 %3 to i32
        %4 = load volatile i8*, i8** %StackGuardSlot                                  <<<-- Loading Guard slot
        call void @__security_check_cookie(i8* %4)                                    <<<-- Epilogue function-based check
        ret i32 %conv
      }
      ```
      
        * SelectionDAG generated instrumentation:
      
      ```
      clang-cl /GS test.cc /O1 /c /FA
      ```
      
      ```
      "?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
      # BB#0:                                 # %entry
              pushl   %esi
              subl    $16, %esp
              movl    ___security_cookie, %eax                                        <<<-- Loading Stack Guard value
              movl    28(%esp), %esi
              movl    %eax, 12(%esp)                                                  <<<-- Store to Guard slot
              leal    2(%esp), %eax
              pushl   %esi
              pushl   $204
              pushl   %eax
              calll   _memset
              addl    $12, %esp
              movsbl  2(%esp,%esi), %esi
              movl    12(%esp), %ecx                                                  <<<-- Loading Guard slot
              calll   @__security_check_cookie@4                                      <<<-- Epilogue function-based check
              movl    %esi, %eax
              addl    $16, %esp
              popl    %esi
              retl
      ```
      
      Reviewers: kcc, pcc, eugenis, rnk
      
      Subscribers: majnemer, llvm-commits, hans, thakis, rnk
      
      Differential Revision: http://reviews.llvm.org/D20346
      
      llvm-svn: 272053
    • Differential Revision: http://reviews.llvm.org/D20557 · a70216f1
      Wei Ding authored
      llvm-svn: 272044
    • Geoff Berry · 486f49cc
    • Revert "[MBP] Reduce code size by running tail merging in MBP." · 4fa9f3ae
      Haicheng Wu authored
      This reverts commits r271930, r271915, and r271923. They break a Thumb
      self-hosting bot.
      
      llvm-svn: 272017
    • [X86][AVX512] Added 512-bit integer vector non-temporal load tests · 15c6ab5f
      Simon Pilgrim authored
      llvm-svn: 272016
    • [X86][SSE] Add general lowering of nontemporal vector loads · 9a89623b
      Simon Pilgrim authored
      Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins.
      
      This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
      
      We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well), or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction.
      
      There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware, so currently a normal ymm load is still used on AVX1 targets.
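      
      As a hedged example of producing such a load from general IR: clang's '__builtin_nontemporal_load' emits an ordinary IR load tagged with !nontemporal metadata, which this lowering can now select to (V)MOVNTDQA (assuming clang and an SSE4.1-capable target; the function name is illustrative):
      
      ```
      #include <immintrin.h>
      
      // Compile with, e.g., clang -msse4.1: the tagged load below can now be
      // lowered to MOVNTDQA instead of requiring the movntdqa builtin.
      __m128i load_stream_128(__m128i *p) {
        return __builtin_nontemporal_load(p);
      }
      ```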
      
      Differential Revision: http://reviews.llvm.org/D20965
      
      llvm-svn: 272010
    • [Thumb-1] Add optimized constant materialization for integers [256..512) · b101383f
      James Molloy authored
      We can materialize these integers using a MOV; ADDi8 pair.
      
      llvm-svn: 272007
    • [AVX512] Fix load opcode for fast isel. · 61e62859
      Igor Breger authored
      Differential Revision: http://reviews.llvm.org/D21067
      
      llvm-svn: 272006
    • [PowerPC] Support multiple return values with fast isel · 6b0634b3
      Ulrich Weigand authored
      Using an LLVM IR aggregate return value type containing three
      or more integer values causes an abort in the fast isel pass.
      
      This patch adds two more registers to RetCC_PPC64_ELF_FIS to
      allow returning up to four integers with fast isel, just the
      same as is currently supported with regular isel (RetCC_PPC).
      
      This is needed for Swift and (possibly) other non-clang frontends.
      
      Fixes PR26190.
      
      llvm-svn: 272005
    • [X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly · ca1da1bf
      Simon Pilgrim authored
      We currently only combine to blend+zero if the target value type has 8 or fewer elements, but this was missing a lot of cases where the combined mask had been widened.
      
      This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.
      
      llvm-svn: 272003
    • [ARM] Shrink post-indexed LDR and STR to LDM/STM · 53298a18
      James Molloy authored
      A Thumb-2 post-indexed LDR instruction such as:
      
        ldr.w r0, [r1], #4
      
      Can be rewritten as:
      
        ldm.n r1!, {r0}
      
      LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode.
      
      llvm-svn: 272002
    • [ARM] Transform LDMs into writeback form to save code size · 75afc951
      James Molloy authored
      If we have an LDM that uses only low registers and doesn't write to its base register:
      
        ldm.w r0, {r1, r2, r3}
      
      And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding:
      
        ldm.n r0!, {r1, r2, r3}
      
      Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't.
      
      llvm-svn: 272000
    • ARM: correct TLS access on WoA · 532dcbc2
      Saleem Abdulrasool authored
      TLS access requires an offset from the TLS index.  The index itself is the
      section-relative distance of the symbol.  For ARM, the relevant relocation
      (IMAGE_REL_ARM_SECREL) is applied as a constant.  This means that the value may
      not be an immediate and must be lowered into a constant pool.  This offset will
      not be base relocated.  We were previously emitting the actual address of the
      symbol, which would be base relocated and would therefore be the value offset
      by the ImageBase + TLS Offset.
      
      llvm-svn: 271974
  3. Jun 06, 2016
  4. Jun 05, 2016
  5. Jun 04, 2016
  6. Jun 03, 2016
    • Chad Rosier · 9faa5bcf
    • [AArch64] Spot SBFX-compatible code expressed with sign_extend. · be879ea7
      Chad Rosier authored
      This is very similar to r271677, but for extracts from i32 with the SIGN_EXTEND
      acting on an arithmetic shift.
      
      llvm-svn: 271717
    • [WebAssembly] Emit type signatures for declared functions · 5859a9ed
      Derek Schuff authored
      Under emscripten, C code can take the address of a function implemented
      in JavaScript (which is exposed via an import in wasm). Because imports
      do not have a linear memory address in wasm, we need to generate a thunk
      to be the target of the indirect call; the thunk calls the import directly.
      
      To make this possible, LLVM needs to emit the type signatures for these
      functions, because they may not be called directly or referred to other
      than where the address is taken.
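      
      A hedged illustration of the scenario ('js_log' and its signature are assumptions):
      
      ```
      // 'js_log' is implemented in JavaScript and exposed to C/C++ as a wasm
      // import under emscripten.
      extern "C" void js_log(int value);
      
      // Taking the address forces an indirect-call target. Imports have no
      // address in linear memory, so the toolchain emits a thunk that calls
      // the import directly; LLVM must emit the import's type signature so
      // the thunk and call_indirect can be checked against it.
      void (*get_logger(void))(int) {
        return &js_log;
      }
      ```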
      
      This uses a new .s directive (.functype) which specifies the signature.
      
      Differential Revision: http://reviews.llvm.org/D20891
      
      Re-apply r271599 but instead of bailing with an error when a declared
      function has multiple returns, replace it with a pointer argument. Also
      add the test case I forgot to 'git add' last time around.
      
      llvm-svn: 271703