  1. Jan 22, 2018
    • Introduce the "retpoline" x86 mitigation technique for variant #2 of the... · c58f2166
      Chandler Carruth authored
      Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.
      
      Summary:
      First, we need to explain the core of the vulnerability. Note that this
      is a very incomplete description; please see the Project Zero blog post
      for details:
      https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
      
      The basis for branch target injection is to direct speculative execution
      of the processor to some "gadget" of executable code by poisoning the
      prediction of indirect branches with the address of that gadget. The
      gadget in turn contains an operation that provides a side channel for
      reading data. Most commonly, this will look like a load of secret data
      followed by a branch on the loaded value and then a load of some
      predictable cache line. The attacker then uses timing of the processor's
      cache to determine which direction the branch took *in the speculative
      execution*, and in turn what one bit of the loaded value was. Due to the
      nature of these timing side channels and the branch predictor on Intel
      processors, this allows an attacker to leak data only accessible to
      a privileged domain (like the kernel) back into an unprivileged domain.
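
      As a rough illustration (not code from this patch; `probe_array`, `sink`,
      and the 512-byte stride are invented for the example), a gadget of the
      shape described above might look like this:
      ```
      // Hypothetical gadget: a secret-dependent branch selects which cache
      // line of probe_array is loaded during speculative execution.
      unsigned char probe_array[2 * 512];
      volatile unsigned char sink;

      void gadget(const unsigned char *secret_ptr) {
        unsigned char bit = *secret_ptr & 1; // load of secret data
        if (bit)                             // branch on the loaded value
          sink = probe_array[512];           // load of a predictable cache line
        else
          sink = probe_array[0];
      }
      ```
      By timing later accesses to `probe_array[0]` and `probe_array[512]`, the
      attacker learns which line was cached during speculation, and hence the
      value of the secret bit.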
      
      The goal is simple: avoid generating code which contains an indirect
      branch that could have its prediction poisoned by an attacker. In many
      cases, the compiler can simply use directed conditional branches and
      a small search tree. LLVM already has support for lowering switches in
      this way and the first step of this patch is to disable jump-table
      lowering of switches and introduce a pass to rewrite explicit indirectbr
      sequences into a switch over integers.
      
      However, there is no fully general alternative to indirect calls. We
      introduce a new construct we call a "retpoline" to implement indirect
      calls in a non-speculatable way. It can be thought of loosely as
      a trampoline for indirect calls which uses the RET instruction on x86.
      Further, we arrange for a specific call->ret sequence which ensures the
      processor predicts the return to go to a controlled, known location. The
      retpoline then "smashes" the return address pushed onto the stack by the
      call with the desired target of the original indirect call. The result
      is a predicted return to the next instruction after a call (which can be
      used to trap speculative execution within an infinite loop) and an
      actual indirect branch to an arbitrary address.
      
      On 64-bit x86 ABIs, this is especially easy to do in the compiler by
      using a guaranteed scratch register to pass the target into this device.
      For 32-bit ABIs there isn't a guaranteed scratch register and so several
      different retpoline variants are introduced to use a scratch register if
      one is available in the calling convention and to otherwise use direct
      stack push/pop sequences to pass the target address.
      
      This "retpoline" mitigation is fully described in the following blog
      post: https://support.google.com/faqs/answer/7625886
      
      We also support a target feature that disables emission of the retpoline
      thunk by the compiler to allow for custom thunks if users want them.
      These are particularly useful in environments like kernels that
      routinely do hot-patching on boot and want to hot-patch their thunk to
      different code sequences. They can write this custom thunk and use
      `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
      case, on x86-64 the thunk name must be:
      ```
        __llvm_external_retpoline_r11
      ```
      or on 32-bit:
      ```
        __llvm_external_retpoline_eax
        __llvm_external_retpoline_ecx
        __llvm_external_retpoline_edx
        __llvm_external_retpoline_push
      ```
      The target of the retpoline is passed in the named register, or, in
      the case of the `push` suffix, on the top of the stack via a `pushl`
      instruction.
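
      For reference, a hand-written external thunk following the call->ret
      sequence described earlier might look roughly like the following minimal
      sketch (file-scope inline assembly, AT&T syntax, assuming the x86-64
      `r11` convention above; this is an illustration, not code from the patch):
      ```
      // Hypothetical hand-written thunk: the indirect branch target is
      // expected in %r11, matching the __llvm_external_retpoline_r11 name above.
      __asm__(
          ".globl __llvm_external_retpoline_r11\n"
          "__llvm_external_retpoline_r11:\n"
          "  call 1f\n"           // pushes the address of the loop below; the
                                  // return predictor pairs the ret with that address
          "2:\n"
          "  pause\n"             // speculative execution is trapped in this loop
          "  lfence\n"
          "  jmp 2b\n"
          "1:\n"
          "  mov %r11, (%rsp)\n"  // smash the pushed return address with the real target
          "  ret\n");             // architecturally branches to the address in %r11
      ```
      Architecturally the `ret` transfers control to the original indirect-call
      target in `%r11`, while any mispredicted return speculation is parked in
      the `pause`/`lfence` loop until the `ret` resolves.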
      
      There is one other important source of indirect branches in x86 ELF
      binaries: the PLT. These patches also include support for LLD to
      generate PLT entries that perform a retpoline-style indirection.
      
      The only other indirect branches remaining that we are aware of are from
      precompiled runtimes (such as crt0.o and similar). The ones we have
      found are not really attackable, and so we have not focused on them
      here, but eventually these runtimes should also be replicated for
      retpoline-ed configurations for completeness.
      
      For kernels or other freestanding or fully static executables, the
      compiler switch `-mretpoline` is sufficient to fully mitigate this
      particular attack. For dynamic executables, you must compile *all*
      libraries with `-mretpoline` and additionally link the dynamic
      executable and all shared libraries with LLD and pass `-z retpolineplt`
      (or use similar functionality from some other linker). We strongly
      recommend also using `-z now` as non-lazy binding allows the
      retpoline-mitigated PLT to be substantially smaller.
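
      For a dynamically linked program, the combination described above might
      look roughly like this (hypothetical file names; the flags are the ones
      named in this patch plus LLD's `-z` options):
      ```
      # Compile every object and shared library with retpolines...
      clang -O2 -mretpoline -fPIC -c libfoo.c -o libfoo.o
      clang -shared -mretpoline -fuse-ld=lld -Wl,-z,retpolineplt -Wl,-z,now libfoo.o -o libfoo.so
      # ...and link the executable the same way, with a retpoline-aware, non-lazy PLT.
      clang -O2 -mretpoline -c main.c -o main.o
      clang -mretpoline -fuse-ld=lld -Wl,-z,retpolineplt -Wl,-z,now main.o -L. -lfoo -o main
      ```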
      
      When manually applying transformations similar to `-mretpoline` to the
      Linux kernel, we observed very small performance hits to applications
      running typical workloads, and relatively minor hits (approximately 2%)
      even for extremely syscall-heavy applications. This is largely due to
      the small number of indirect branches that occur in performance
      sensitive paths of the kernel.
      
      When using these patches on statically linked applications, especially
      C++ applications, you should expect to see a much more dramatic
      performance hit. For microbenchmarks that are switch-, indirect-, or
      virtual-call heavy, we have seen overheads ranging from 10% to 50%.
      
      However, real-world workloads exhibit substantially lower performance
      impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
      the impact of hot indirect calls (by speculatively promoting them to
      direct calls) and allow optimized search trees to be used to lower
      switches. If you need to deploy these techniques in C++ applications, we
      *strongly* recommend that you ensure all hot call targets are statically
      linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
      tuned servers using all of these techniques saw 5% - 10% overhead from
      the use of retpoline.
      
      We will add detailed documentation covering these components in
      subsequent patches, but wanted to make the core functionality available
      as soon as possible. Happy for more code review, but we'd really like to
      get these patches landed and backported ASAP for obvious reasons. We're
      planning to backport this to both 6.0 and 5.0 release streams and get
      a 5.0 release with just this cherry picked ASAP for distros and vendors.
      
      This patch is the work of a number of people over the past month: Eric, Reid,
      Rui, and myself. I'm mailing it out as a single commit due to the time
      sensitive nature of landing this and the need to backport it. Huge thanks to
      everyone who helped out here, and everyone at Intel who helped out in
      discussions about how to craft this. Also, credit goes to Paul Turner (at
      Google, but not an LLVM contributor) for much of the underlying retpoline
      design.
      
      Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
      
      Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41723
      
      llvm-svn: 323155
    • Revert [SCEV] Fix isLoopEntryGuardedByCond usage · f38041dc
      Serguei Katkov authored
      It causes buildbot failures: the newly added assert fires.
      It seems that not all usages of isLoopEntryGuardedByCond have been fixed.
      
      llvm-svn: 323079
    • [SCEV] Fix isLoopEntryGuardedByCond usage · 50714a1c
      Serguei Katkov authored
      ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without checking
      that the SCEV is available at the entry point of the loop. This is incorrect and is fixed by this patch.
      
      Reviewers: sanjoy, mkazantsev, anna, dorit
      Reviewed By: mkazantsev
      Subscribers: llvm-commits
      Differential Revision: https://reviews.llvm.org/D42165
      
      llvm-svn: 323077
  2. Jan 20, 2018
    • [InstCombine] add baseline tests for (X << Y) / X -> 1 << Y; NFC · 43913218
      Sanjay Patel authored
      This fold is proposed in D42032.
      
      llvm-svn: 323043
    • [X86] Add support for passing 'prefer-vector-width' function attribute into... · 0d797a34
      Craig Topper authored
      [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.
      
      This will cause the vectorizers to limit the vector widths they create. This is not a strict limit; there are cases I know of where the loop vectorizer will still generate larger vectors.
      
      I've written this in such a way that the interface will only return a properly supported width (0/128/256/512) even if the attribute says something funny like 384 or 10.
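
      A rough sketch of one plausible clamping rule (rounding down to the
      nearest supported width; illustrative only, not the actual X86Subtarget
      code):
      ```
      // Hypothetical clamp: map an arbitrary attribute value such as 384 or 10
      // onto a properly supported register width.
      static unsigned clampPreferredVectorWidth(unsigned Requested) {
        if (Requested >= 512) return 512;
        if (Requested >= 256) return 256;
        if (Requested >= 128) return 128;
        return 0; // below 128 bits: no preferred vector register width
      }
      ```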
      
      This has been split from D41895 with the remainder in a follow up commit.
      
      llvm-svn: 323015
    • [ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call · 73ceb50d
      Akira Hatanaka authored
      to @objc_autorelease if its operand is a PHI and the PHI has an
      equivalent value that is used by a return instruction.
      
      For example, the ARC optimizer shouldn't replace the call in the following
      example, as doing so breaks the AutoreleaseRV/RetainRV optimization:
      
      bb1:
        %v1 = bitcast i32* %v0 to i8*
        br label %bb3
      bb2:
        %v3 = bitcast i32* %v2 to i8*
        br label %bb3
      bb3:
        %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
        %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
        %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
        ret i32* %retval
      
      Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
      operand uses with its value so that the call gets tail-called.
      
      rdar://problem/15894705
      
      llvm-svn: 323009
  3. Jan 19, 2018
    • [Dominators] Visit affected node candidates found at different root levels · d2e371f0
      Jakub Kuderski authored
      Summary:
      This patch attempts to fix the DomTree incremental insertion bug found in PR35969 (https://bugs.llvm.org/show_bug.cgi?id=35969).
      
      When performing an insertion into a piece of unreachable CFG, we may find the same node at different levels. When this happens, the node can turn out to be affected when we find it starting from a node with a lower level in the tree. The level at which we start visitation affects whether we consider a node affected or not.
      
      This patch tracks the lowest level at which each node was visited during insertion and allows a node to be visited multiple times, if doing so can cause it to be considered affected.
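
      A minimal sketch of the bookkeeping this implies (invented names; not the
      actual DomTreeBuilder code):
      ```
      #include <unordered_map>

      // Hypothetical tracker: remember the lowest level from which each node
      // was visited; revisit only when a strictly lower level is found, since
      // that is what can newly cause the node to be considered affected.
      struct VisitTracker {
        std::unordered_map<const void *, unsigned> LowestVisitedLevel;

        bool shouldVisit(const void *Node, unsigned LevelFoundAt) {
          auto It = LowestVisitedLevel.find(Node);
          if (It != LowestVisitedLevel.end() && It->second <= LevelFoundAt)
            return false; // already visited from this level or a lower one
          LowestVisitedLevel[Node] = LevelFoundAt;
          return true;
        }
      };
      ```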
      
      Reviewers: brzycki, davide, dberlin, grosser
      
      Reviewed By: brzycki
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D42231
      
      llvm-svn: 322993
    • Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) · 1e68724d
      Daniel Neilson authored
      Summary:
       This is a resurrection of work first proposed and discussed in Aug 2015:
         http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
      and initially landed (but then backed out) in Nov 2015:
         http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
      
       The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
      which is required to be a constant integer. It represents the alignment of the
      dest (and source), and so must be the minimum of the actual alignments of
      the two.
      
       This change is the first in a series that allows source and dest to each
      have their own alignments by using the alignment attribute on their arguments.
      
       In this change we:
      1) Remove the alignment argument.
      2) Add alignment attributes to the source & dest arguments. We, temporarily,
         require that the alignments for source & dest be equal.
      
       For example, code which used to read:
        call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
      will now read
        call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
      
       Downstream users may have to update their lit tests that check for
      @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
      may help with updating the majority of your tests, but it does not catch all possible
      patterns so some manual checking and updating will be required.
      
      s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
      s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
      s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
      s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
      s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g
      
       The remaining changes in the series will:
      Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
         source and dest alignments.
      Step 3) Update Clang to use the new IRBuilder API.
      Step 4) Update Polly to use the new IRBuilder API.
      Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
              and those that use MemIntrinsicInst::[get|set]Alignment() to use
              getDestAlignment() and getSourceAlignment() instead (see the sketch after this list).
      Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
              MemIntrinsicInst::[get|set]Alignment() methods.
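
      As a rough illustration of the Step 5 direction (a hypothetical helper,
      assuming only the getDestAlignment()/getSourceAlignment() accessors named
      above; not actual pass code):
      ```
      #include "llvm/IR/IntrinsicInst.h"
      #include <algorithm>

      // Hypothetical helper: a pass that still wants a single conservative
      // alignment for a memcpy/memmove combines the two sides explicitly
      // instead of calling the old single-value MemIntrinsicInst::getAlignment().
      static unsigned getConservativeCopyAlign(const llvm::MemTransferInst *MTI) {
        return std::min(MTI->getDestAlignment(), MTI->getSourceAlignment());
      }
      ```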
      
      Reviewers: pete, hfinkel, lhames, reames, bollu
      
      Reviewed By: reames
      
      Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41675
      
      llvm-svn: 322965
    • [SLP] Fix vectorization for tree with trunc to minimum required bit width. · fa80c47c
      Alexey Bataev authored
      Summary:
      If the vectorized tree has a truncate to the minimum required bit width and
      the vector type of the cast operation after the truncation is the same
      as the vector type of the cast operands, count the cost of the vector cast
      operation as 0, because this cast will be removed later.
      Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. That would just create extra copies of the same vector/scalar operations, which would then be removed by the instcombiner.
      
      Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41948
      
      llvm-svn: 322946
    • [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr · 2867bd72
      John Brawn authored
      Getelementptrs with three (or more) operands could plausibly also be handled,
      but handling only the two-operand case fits in easily with the existing
      BinaryOperator handling.
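
      To illustrate the two-operand case at the source level (an invented
      example, not from the patch), a select of two GEPs off the same base can
      become a single GEP of a selected index:
      ```
      // Hypothetical source-level shape: both arms are two-operand GEPs off the
      // same base, so InstCombine may sink the select into the index operand,
      // i.e. select(c, gep(p, i), gep(p, j)) -> gep(p, select(c, i, j)).
      int *pick(int *base, long i, long j, bool c) {
        return c ? &base[i] : &base[j];
      }
      ```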
      
      Differential Revision: https://reviews.llvm.org/D39958
      
      llvm-svn: 322930
    • [InstSimplify] regenerate checks and add tests for commutes; NFC · a19b748f
      Sanjay Patel authored
      llvm-svn: 322907