  1. Jun 08, 2016
      Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and... · 769219b1
      Dehao Chen authored
      Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently.
      
      Summary:
      Consider the following diamond CFG:
      
        A
       / \
      B   C
       \ /
        D
      
      Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
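      The threshold arithmetic above can be sketched as follows (a minimal illustration of the commit's reasoning; the function name and plain doubles are hypothetical, not LLVM's actual block-placement API):
      ```
      #include <stdbool.h>

      /* Sketch of the profitability test described above: C is laid out
       * immediately after B only if the edge C->D is hot enough relative
       * to B->D, i.e. Freq(C->D) > Freq(B->D) * threshold. */
      static bool lay_out_c_next(double freq_bd, double freq_cd, double threshold) {
          return freq_cd > freq_bd * threshold;
      }
      ```
      With Freq(A) = 100, the old 20% threshold gives 19 > 16.2 (layout ABCD), while the 25% threshold gives 19 < 20.25, producing the desired ABDC.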
      
      Reviewers: djasper, davidxl
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D20989
      
      llvm-svn: 272203
  2. Jun 07, 2016
      [X86][SSE4A] Regenerated SSE4A intrinsics tests · 536434e8
      Simon Pilgrim authored
      There are no VEX-encoded versions of SSE4A instructions, so make sure that AVX targets give the same output.
      
      llvm-svn: 272060
      [stack-protection] Add support for MSVC buffer security check · 22bfa832
      Etienne Bergeron authored
      Summary:
      This patch adds support for the MSVC buffer security check implementation.
      
      The buffer security check is turned on with the '/GS' compiler switch.
        * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
        * To be added to clang here: http://reviews.llvm.org/D20347
      
      Some overview of buffer security check feature and implementation:
        * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
        * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
        * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
      
      
      For the following example:
      ```
      #include <string.h>

      int example(int offset, int index) {
        char buffer[10];
        memset(buffer, 0xCC, index);
        return buffer[index];
      }
      ```
      
      The MSVC compiler adds these instructions to perform the stack integrity check:
      ```
              push        ebp  
              mov         ebp,esp  
              sub         esp,50h  
        [1]   mov         eax,dword ptr [__security_cookie (01068024h)]  
        [2]   xor         eax,ebp  
        [3]   mov         dword ptr [ebp-4],eax  
              push        ebx  
              push        esi  
              push        edi  
              mov         eax,dword ptr [index]  
              push        eax  
              push        0CCh  
              lea         ecx,[buffer]  
              push        ecx  
              call        _memset (010610B9h)  
              add         esp,0Ch  
              mov         eax,dword ptr [index]  
              movsx       eax,byte ptr buffer[eax]  
              pop         edi  
              pop         esi  
              pop         ebx  
        [4]   mov         ecx,dword ptr [ebp-4]  
        [5]   xor         ecx,ebp  
        [6]   call        @__security_check_cookie@4 (01061276h)  
              mov         esp,ebp  
              pop         ebp  
              ret  
      ```
      
      The instrumentation above:
        * [1] loads the global security canary,
        * [3] stores the locally computed ([2]) canary to the guard slot,
        * [4] loads the guard slot and ([5]) re-computes the global canary,
        * [6] validates the resulting canary with '__security_check_cookie' and performs error handling.
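      A minimal model of this XOR-cookie scheme (the names and the cookie constant are illustrative stand-ins; the real '__security_cookie' and '@__security_check_cookie@4' live in the MSVC runtime):
      ```
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical model of the /GS check: the global cookie is XORed
       * with the frame pointer on entry ([1]-[3]) and re-derived and
       * validated on exit ([4]-[6]).  Corrupting the guard slot makes
       * the check fail. */
      static const uintptr_t security_cookie = 0xBB40E64Eu;

      static uintptr_t store_guard_slot(uintptr_t frame_ptr) {
          return security_cookie ^ frame_ptr;           /* [1], [2], [3] */
      }

      static bool check_guard_slot(uintptr_t slot, uintptr_t frame_ptr) {
          return (slot ^ frame_ptr) == security_cookie; /* [4], [5], [6] */
      }
      ```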
      
      Overview of the current stack-protection implementation:
        * lib/CodeGen/StackProtector.cpp
          * There is a default stack-protection implementation applied on intermediate representation.
          * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
          * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
          * Basic blocks are added to every instrumented function to receive the code for stack guard validation and error handling.
          * Guard manipulation and comparison are added directly to the intermediate representation.
      
        * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
        * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
          * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls).
            * see long comment above 'class StackProtectorDescriptor' declaration.
          * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
            * 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
          * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
      
        * include/llvm/Target/TargetLowering.h
          * Contains a function to retrieve the default guard 'Value'; should be overridden by each target to select which implementation is used and provide the guard 'Value'.
      
        * lib/Target/X86/X86ISelLowering.cpp
          * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.
      
      Function-based Instrumentation:
        * MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction.
        * To support function-based instrumentation, this patch is
          * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
            * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the epilogue.
          * modifying SelectionDAGISel.cpp to avoid producing the basic blocks used for inline instrumentation,
          * generating the function-based instrumentation during the ISel pass (SelectionDAGBuilder.cpp),
          * if FastISel is used (not SelectionDAG), falling back to the same function-based check implemented over the intermediate representation (StackProtector.cpp).
      
      Modifications
        * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
        * adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
      
      Results
      
        * IR generated instrumentation:
      ```
      clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
      ```
      
      ```
      *** Final LLVM Code input to ISel ***
      
      ; Function Attrs: nounwind sspstrong
      define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
      entry:
        %StackGuardSlot = alloca i8*                                                  <<<-- Allocated guard slot
        %0 = call i8* @llvm.stackguard()                                              <<<-- Loading Stack Guard value
        call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)                  <<<-- Prologue intrinsic call (store to Guard slot)
        %index.addr = alloca i32, align 4
        %offset.addr = alloca i32, align 4
        %buffer = alloca [10 x i8], align 1
        store i32 %index, i32* %index.addr, align 4
        store i32 %offset, i32* %offset.addr, align 4
        %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
        %1 = load i32, i32* %index.addr, align 4
        call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
        %2 = load i32, i32* %index.addr, align 4
        %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
        %3 = load i8, i8* %arrayidx, align 1
        %conv = sext i8 %3 to i32
        %4 = load volatile i8*, i8** %StackGuardSlot                                  <<<-- Loading Guard slot
        call void @__security_check_cookie(i8* %4)                                    <<<-- Epilogue function-based check
        ret i32 %conv
      }
      ```
      
        * SelectionDAG generated instrumentation:
      
      ```
      clang-cl /GS test.cc /O1 /c /FA
      ```
      
      ```
      "?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
      # BB#0:                                 # %entry
              pushl   %esi
              subl    $16, %esp
              movl    ___security_cookie, %eax                                        <<<-- Loading Stack Guard value
              movl    28(%esp), %esi
              movl    %eax, 12(%esp)                                                  <<<-- Store to Guard slot
              leal    2(%esp), %eax
              pushl   %esi
              pushl   $204
              pushl   %eax
              calll   _memset
              addl    $12, %esp
              movsbl  2(%esp,%esi), %esi
              movl    12(%esp), %ecx                                                  <<<-- Loading Guard slot
              calll   @__security_check_cookie@4                                      <<<-- Epilogue function-based check
              movl    %esi, %eax
              addl    $16, %esp
              popl    %esi
              retl
      ```
      
      Reviewers: kcc, pcc, eugenis, rnk
      
      Subscribers: majnemer, llvm-commits, hans, thakis, rnk
      
      Differential Revision: http://reviews.llvm.org/D20346
      
      llvm-svn: 272053
      [X86][AVX512] Added 512-bit integer vector non-temporal load tests · 15c6ab5f
      Simon Pilgrim authored
      llvm-svn: 272016
      [X86][SSE] Add general lowering of nontemporal vector loads · 9a89623b
      Simon Pilgrim authored
      Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.
      
      This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
      
      We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well), or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction.
      
      There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.
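      The "general IR" route can be sketched in C: Clang's `__builtin_nontemporal_load` (assumed available, and guarded here for other compilers) attaches `!nontemporal` metadata to an ordinary load, which this lowering can now turn into (V)MOVNTDQA without going through the movntdqa builtin:
      ```
      /* Sketch: a nontemporal load expressed through general IR rather
       * than int_x86_sse41_movntdqa.  __builtin_nontemporal_load is a
       * Clang builtin that emits a load tagged !nontemporal; the fallback
       * keeps the sketch portable to other compilers. */
      #if defined(__clang__)
      #define NT_LOAD(p) __builtin_nontemporal_load(p)
      #else
      #define NT_LOAD(p) (*(p))
      #endif

      static int sum4_nontemporal(const int *p) {
          int s = 0;
          for (int i = 0; i < 4; ++i)
              s += NT_LOAD(p + i);
          return s;
      }
      ```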
      
      Differential Revision: http://reviews.llvm.org/D20965
      
      llvm-svn: 272010
      [AVX512] Fix load opcode for fast isel. · 61e62859
      Igor Breger authored
      Differential Revision: http://reviews.llvm.org/D21067
      
      llvm-svn: 272006
      [X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly · ca1da1bf
      Simon Pilgrim authored
      We currently only combine to blend+zero if the target value type has 8 elements or fewer, but this missed many cases where the combined mask had been widened.
      
      This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.
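      The widening involved can be sketched as follows (a hypothetical helper, not LLVM's actual shuffle-combining code): a 2n-element mask whose adjacent pairs either select consecutive even/odd lanes or are both zeroed is equivalent to an n-element mask on a type with wider elements, and that wider mask can drive the blend decision:
      ```
      #include <stdbool.h>

      /* Sketch: try to widen a 2n-element shuffle mask to n elements.
       * -1 marks a zeroed lane.  A pair (2k, 2k+1) widens to k; a pair
       * of -1s widens to -1; anything else cannot be widened. */
      static bool widen_shuffle_mask(const int *mask, int n2, int *widened) {
          for (int i = 0; i < n2 / 2; ++i) {
              int lo = mask[2 * i], hi = mask[2 * i + 1];
              if (lo < 0 && hi < 0)
                  widened[i] = -1;
              else if (lo >= 0 && lo % 2 == 0 && hi == lo + 1)
                  widened[i] = lo / 2;
              else
                  return false;
          }
          return true;
      }
      ```
      For example, the 8-element mask {0,1,-1,-1,4,5,2,3} widens to the 4-element mask {0,-1,2,1}, so the blend+zero combine can apply even when the original type has more than 8 elements.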
      
      llvm-svn: 272003