  1. Jul 10, 2019
  2. Jul 09, 2019
    • hwasan: Improve precision of checks using short granule tags. · 1366262b
      Peter Collingbourne authored
      A short granule is a granule of size between 1 and `TG-1` bytes. The size
      of a short granule is stored at the location in shadow memory where the
      granule's tag is normally stored, while the granule's actual tag is stored
      in the last byte of the granule. This means that in order to verify that a
      pointer tag matches a memory tag, HWASAN must check for two possibilities:
      
      * the pointer tag is equal to the memory tag in shadow memory, or
      * the shadow memory tag is actually a short granule size, the value being loaded
        is in bounds of the granule and the pointer tag is equal to the last byte of
        the granule.
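
The two-way check described above can be sketched in C. This is a minimal illustration, not HWASAN's implementation: the granule size `TG = 16`, the function name `tag_matches`, and passing the granule as a byte array are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TG = 16 }; /* assumed granule size in bytes */

/*
 * Hypothetical helper: decide whether an access of `size` bytes starting at
 * `offset` inside one granule is allowed for a pointer carrying `ptr_tag`.
 * `shadow_tag` is the byte stored in shadow memory for this granule.
 */
static bool tag_matches(uint8_t ptr_tag, uint8_t shadow_tag,
                        const uint8_t granule[TG],
                        size_t offset, size_t size) {
    /* Possibility 1: the pointer tag equals the shadow memory tag. */
    if (ptr_tag == shadow_tag)
        return true;

    /* Possibility 2 only applies if the shadow byte can be a short
       granule size, i.e. a value between 1 and TG-1. */
    if (shadow_tag == 0 || shadow_tag >= TG)
        return false;

    /* The access must stay within the short granule. */
    if (offset + size > shadow_tag)
        return false;

    /* The real tag lives in the last byte of the granule. */
    return ptr_tag == granule[TG - 1];
}
```

With a shadow byte of 5 and a last granule byte of 0xAB, a 4-byte access at offset 0 through a pointer tagged 0xAB passes, while a 2-byte access at offset 4 does not because it runs past the 5 valid bytes.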
      
      Pointer tags between 1 and `TG-1` are possible and are as likely as any other
      tag. This means that these tags in memory have two interpretations: the full
      tag interpretation (where the pointer tag is between 1 and `TG-1` and the
      last byte of the granule is ordinary data) and the short tag interpretation
      (where the pointer tag is stored in the granule).
      
      When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
      will show both the memory tag and the last byte of the granule. Currently,
      it is up to the user to disambiguate the two possibilities.
      
      Because this functionality obsoletes the right aligned heap feature of
      the HWASAN memory allocator (and because we can no longer easily test
      it), the feature is removed.
      
      Also update the documentation to cover both short granule tags and
      outlined checks.
      
      Differential Revision: https://reviews.llvm.org/D63908
      
      llvm-svn: 365551
    • [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into... · 84a1f073
      Craig Topper authored
      [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
      
      Basically the problem is that X86 doesn't set the Fast flag from
      allowsMemoryAccess on certain CPUs due to slow unaligned memory
      subtarget features. This prevents bitcasts from being folded into
      loads and stores. But all vector loads and stores of the same width
      are the same cost on X86.
      
      This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
      
      Differential Revision: https://reviews.llvm.org/D64295
      
      llvm-svn: 365549
    • [AMDGPU] gfx908 register file changes · 9e77d0c6
      Stanislav Mekhanoshin authored
      Differential Revision: https://reviews.llvm.org/D64438
      
      llvm-svn: 365546
    • Boilerplate for producing XCOFF object files from the PowerPC backend. · f09d54ed
      Sean Fertile authored
      Stubs out a number of the classes needed to produce a new object file format
      (XCOFF) for the powerpc-aix target. For testing, the input is an empty module,
      which produces an object file with just a file header.
      
      Differential Revision: https://reviews.llvm.org/D61694
      
      llvm-svn: 365541
    • [AMDGPU] gfx908 target · 22b2c3d6
      Stanislav Mekhanoshin authored
      Differential Revision: https://reviews.llvm.org/D64429
      
      llvm-svn: 365525
    • AMDGPU: Fix test failing since r365512 · 077df019
      Matt Arsenault authored
      llvm-svn: 365521
    • [AMDGPU] Created a sub-register class for the return address operand in the return instruction. · b2d24bd5
      Christudasan Devadasan authored
      Function return instruction lowering currently uses the fixed register pair s[30:31] to hold
      the return address, but it can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class
      that excludes the CSRs, and used this regclass while lowering the return instruction.
      
      Reviewed By: arsenm
      
      Differential Revision: https://reviews.llvm.org/D63924
      
      llvm-svn: 365512
    • [RISCV] Fix ICE in isDesirableToCommuteWithShift · 114d2db4
      Sam Elliott authored
      Summary:
      There was an error being thrown from isDesirableToCommuteWithShift in
      some tests. This was tracked down to the method being called before
      legalisation, with an extended value type, not a machine value type.
      
      In the case I diagnosed, the error was only hit with an instruction sequence
      involving `i24`s in the add and shift. `i24` is not a Machine ValueType, it is
      instead an Extended ValueType which was causing the issue.
      
      I have added a test to cover this case, and fixed the error in the callback.
      
      Reviewers: asb, luismarques
      
      Reviewed By: asb
      
      Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D64425
      
      llvm-svn: 365511
    • [AArch64][GlobalISel] Optimize conditional branches followed by unconditional branches · 6616e269
      Amara Emerson authored
      If we have an icmp->brcond->br sequence where the brcond just branches to the
      next block jumping over the br, while the br takes the false edge, then we can
      modify the conditional branch to jump to the br's target while inverting the
      condition of the incoming icmp. This means we can eliminate the br as an
      unconditional branch to the fallthrough block.
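
The rewrite can be modelled on a toy representation of a block's terminators. The sketch below is only an illustration in C: the `Pred` enum, the `BlockEnd` struct and `fold_branch_pair` are hypothetical names and do not correspond to the GlobalISel data structures.

```c
#include <stdbool.h>

/* Toy comparison predicates; each one has an exact inverse. */
typedef enum { PRED_EQ, PRED_NE, PRED_LT, PRED_GE } Pred;

static Pred invert_pred(Pred p) {
    switch (p) {
    case PRED_EQ: return PRED_NE;
    case PRED_NE: return PRED_EQ;
    case PRED_LT: return PRED_GE;
    case PRED_GE: return PRED_LT;
    }
    return p;
}

/* Hypothetical model of "icmp -> brcond -> br" at the end of a block. */
typedef struct {
    Pred pred;          /* predicate of the incoming icmp          */
    int  brcond_target; /* block taken when the condition is true  */
    int  br_target;     /* block taken by the unconditional br     */
    bool has_br;        /* false once the br has been eliminated   */
} BlockEnd;

/* If the brcond only jumps over the br into the next block, invert the
   condition, retarget the brcond to the br's destination and drop the br
   (execution now falls through to next_block). Returns true on change. */
static bool fold_branch_pair(BlockEnd *b, int next_block) {
    if (!b->has_br || b->brcond_target != next_block)
        return false;
    b->pred = invert_pred(b->pred);
    b->brcond_target = b->br_target;
    b->has_br = false;
    return true;
}
```

For a block ending in `icmp eq; brcond -> bb1; br -> bb2` with `bb1` laid out next, the fold leaves `icmp ne; brcond -> bb2` and a fallthrough into `bb1`.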
      
      Differential Revision: https://reviews.llvm.org/D64354
      
      llvm-svn: 365510
    • [mips] Show error in case of using FP64 mode on pre MIPS32R2 CPU · e3892d84
      Simon Atanasyan authored
      llvm-svn: 365508
    • [mips] Explicitly select `mips32r2` CPU for test cases require 64-bit FPU. NFC · 623282f0
      Simon Atanasyan authored
      Support for 64-bit coprocessors on a 32-bit architecture
      was added in `MIPS32 R2`.
      
      llvm-svn: 365507
    • [BPF] Support for compile once and run everywhere · d3d88d08
      Yonghong Song authored
      Introduction
      ============
      
      This patch adds initial support for compiling a bpf program once
      and running it everywhere (CO-RE).
      
      The main motivation is bpf programs that depend on kernel headers,
      which may vary between kernel versions.
      See https://lwn.net/Articles/773198/ for the initial discussion.
      
      Currently, a bpf program accesses kernel internal data structures
      through the bpf_probe_read() helper. The idea is to capture the
      kernel data structure accesses made through bpf_probe_read()
      and relocate them on different kernel versions.
      
      On each host, right before the bpf program is loaded, the bpf loader
      looks at the types of the native linux through vmlinux BTF,
      calculates the proper access offset and patches the instruction.
      
      To accommodate this, three intrinsic functions
         preserve_{array,union,struct}_access_index
      are introduced which in clang will preserve the base pointer,
      struct/union/array access_index and struct/union debuginfo type
      information. Later, bpf IR pass can reconstruct the whole gep
      access chains without looking at gep itself.
      
      This patch did the following:
        . An IR pass is added to convert preserve_*_access_index to a
          global variable whose name encodes the getelementptr
          access pattern. The global variable has metadata
          attached to describe the corresponding struct/union
          debuginfo type.
        . A SimplifyPatchable MachineInstruction pass is added
          to remove unnecessary loads.
        . The BTF output pass is enhanced to generate relocation
          records located in .BTF.ext section.
      
      Typical CO-RE also needs support for global variables that can
      be assigned different values on different hosts. For example,
      the kernel version can be used to guard different versions of code.
      This patch adds support for such patchable externals as well.
      
      Example
      =======
      
      The following is an example.
      
        struct pt_regs {
          long arg1;
          long arg2;
        };
        struct sk_buff {
          int i;
          struct net_device *dev;
        };
      
        #define _(x) (__builtin_preserve_access_index(x))
        static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) =
                (void *) 4;
        extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version;
        int bpf_prog(struct pt_regs *ctx) {
          struct net_device *dev = 0;
      
          // ctx->arg* does not need bpf_probe_read
          if (__kernel_version >= 41608)
            bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev));
          else
            bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev));
          return dev != 0;
        }
      
      In the above, we want to translate the third argument of
      bpf_probe_read() as relocations.
      
        -bash-4.4$ clang -target bpf -O2 -g -S trace.c
      
      The compiler will generate two new subsections in .BTF.ext,
      OffsetReloc and ExternReloc.
      OffsetReloc records the structure member offset operations,
      and ExternReloc records the external globals, where
      only u8, u16, u32 and u64 are supported.
      
         BPFOffsetReloc Size
         struct SecOffsetReloc for ELF section #1
         A number of struct BPFOffsetReloc for ELF section #1
         struct SecOffsetReloc for ELF section #2
         A number of struct BPFOffsetReloc for ELF section #2
         ...
         BPFExternReloc Size
         struct SecExternReloc for ELF section #1
         A number of struct BPFExternReloc for ELF section #1
         struct SecExternReloc for ELF section #2
         A number of struct BPFExternReloc for ELF section #2
      
        struct BPFOffsetReloc {
          uint32_t InsnOffset;    ///< Byte offset in this section
          uint32_t TypeID;        ///< TypeID for the relocation
          uint32_t OffsetNameOff; ///< The string to traverse types
        };
      
        struct BPFExternReloc {
          uint32_t InsnOffset;    ///< Byte offset in this section
          uint32_t ExternNameOff; ///< The string for external variable
        };
      
      Note that only externs with attribute section ".BPF.patchable_externs"
      are considered for Extern Reloc which will be patched by bpf loader
      right before the load.
      
      For the above test case, two offset records and one extern record
      will be generated:
        OffsetReloc records:
              .long   .Ltmp12                 # Insn Offset
              .long   7                       # TypeId
              .long   242                     # Type Decode String
              .long   .Ltmp18                 # Insn Offset
              .long   7                       # TypeId
              .long   242                     # Type Decode String
      
        ExternReloc record:
              .long   .Ltmp5                  # Insn Offset
              .long   165                     # External Variable
      
        In string table:
              .ascii  "0:1"                   # string offset=242
              .ascii  "__kernel_version"      # string offset=165
      
      The default member offset can be calculated as
          the 2nd member offset (0 representing the 1st member) of struct "sk_buff".
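
As a quick check of that default value, the offset of member index 1 of the example `struct sk_buff` can be computed with `offsetof`. This is a small sketch using the struct from the example above; the helper name `default_dev_offset` is made up for illustration.

```c
#include <stddef.h>

struct net_device; /* opaque, as in the example above */

struct sk_buff {
    int i;                  /* member index 0 */
    struct net_device *dev; /* member index 1 */
};

/* Hypothetical helper: the compile-time default offset that a loader
   would later replace if the host kernel's layout differs. */
static size_t default_dev_offset(void) {
    return offsetof(struct sk_buff, dev);
}
```

On a target with 4-byte `int` and 8-byte pointers, `dev` is padded to an 8-byte boundary, which is consistent with the `r2 = 8` instructions in the generated code.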
      
      The asm code:
          .Ltmp5:
          .Ltmp6:
                  r2 = 0
                  r3 = 41608
          .Ltmp7:
          .Ltmp8:
                  .loc    1 18 9 is_stmt 0        # t.c:18:9
          .Ltmp9:
                  if r3 > r2 goto LBB0_2
          .Ltmp10:
          .Ltmp11:
                  .loc    1 0 9                   # t.c:0:9
          .Ltmp12:
                  r2 = 8
          .Ltmp13:
                  .loc    1 19 66 is_stmt 1       # t.c:19:66
          .Ltmp14:
          .Ltmp15:
                  r3 = *(u64 *)(r1 + 0)
                  goto LBB0_3
          .Ltmp16:
          .Ltmp17:
          LBB0_2:
                  .loc    1 0 66 is_stmt 0        # t.c:0:66
          .Ltmp18:
                  r2 = 8
                  .loc    1 21 66 is_stmt 1       # t.c:21:66
          .Ltmp19:
                  r3 = *(u64 *)(r1 + 8)
          .Ltmp20:
          .Ltmp21:
          LBB0_3:
                  .loc    1 0 66 is_stmt 0        # t.c:0:66
                  r3 += r2
                  r1 = r10
          .Ltmp22:
          .Ltmp23:
          .Ltmp24:
                  r1 += -8
                  r2 = 8
                  call 4
      
      For instructions .Ltmp12 and .Ltmp18, "r2 = 8", the number
      8 is the structure offset based on the current BTF.
      The loader needs to adjust it if it changes on the host.
      
      For instruction .Ltmp5, "r2 = 0", the external variable
      gets a default value of 0; the loader needs to supply an appropriate
      value for the particular host.
      
      Compiling to generate object code and disassemble:
         0000000000000000 bpf_prog:
                 0:       b7 02 00 00 00 00 00 00         r2 = 0
                 1:       7b 2a f8 ff 00 00 00 00         *(u64 *)(r10 - 8) = r2
                 2:       b7 02 00 00 00 00 00 00         r2 = 0
                 3:       b7 03 00 00 88 a2 00 00         r3 = 41608
                 4:       2d 23 03 00 00 00 00 00         if r3 > r2 goto +3 <LBB0_2>
                 5:       b7 02 00 00 08 00 00 00         r2 = 8
                 6:       79 13 00 00 00 00 00 00         r3 = *(u64 *)(r1 + 0)
                 7:       05 00 02 00 00 00 00 00         goto +2 <LBB0_3>
      
          0000000000000040 LBB0_2:
                 8:       b7 02 00 00 08 00 00 00         r2 = 8
                 9:       79 13 08 00 00 00 00 00         r3 = *(u64 *)(r1 + 8)
      
          0000000000000050 LBB0_3:
                10:       0f 23 00 00 00 00 00 00         r3 += r2
                11:       bf a1 00 00 00 00 00 00         r1 = r10
                12:       07 01 00 00 f8 ff ff ff         r1 += -8
                13:       b7 02 00 00 08 00 00 00         r2 = 8
                14:       85 00 00 00 04 00 00 00         call 4
      
      Instructions #2, #5 and #8 need relocation resolutions from the loader.
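
One way a loader could resolve such a record is to overwrite the 32-bit immediate of the instruction at the recorded byte offset. The sketch below assumes the little-endian 8-byte encoding visible in the disassembly above (immediate in bytes 4..7); `patch_imm` is a hypothetical helper, not part of any existing loader.

```c
#include <stddef.h>
#include <stdint.h>

enum { BPF_INSN_SIZE = 8 }; /* each instruction in the listing is 8 bytes */

/*
 * Overwrite the immediate of the instruction at byte offset `insn_off`
 * (the InsnOffset field of a relocation record) with `imm`.
 * Returns 0 on success, -1 if the offset is misaligned or out of range.
 */
static int patch_imm(uint8_t *code, size_t code_len,
                     size_t insn_off, uint32_t imm) {
    if (insn_off % BPF_INSN_SIZE != 0 ||
        insn_off + BPF_INSN_SIZE > code_len)
        return -1;

    uint8_t *p = code + insn_off + 4; /* immediate occupies bytes 4..7 */
    p[0] = (uint8_t)(imm & 0xFF);
    p[1] = (uint8_t)((imm >> 8) & 0xFF);
    p[2] = (uint8_t)((imm >> 16) & 0xFF);
    p[3] = (uint8_t)((imm >> 24) & 0xFF);
    return 0;
}
```

For instruction #5 (`b7 02 00 00 08 00 00 00`, byte offset 40), patching the immediate to 16 would turn the encoding into `b7 02 00 00 10 00 00 00`, i.e. `r2 = 16`.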
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      
      Differential Revision: https://reviews.llvm.org/D61524
      
      llvm-svn: 365503
    • [ARM] Add test for MVE and no floats. NFC · 781e3aff
      David Green authored
      Adds a simple test that MVE with no floating point will be promoted correctly
      to software float calls.
      
      llvm-svn: 365496
    • [MIPS GlobalISel] Register bank select for G_PHI. Select i64 phi · be20e361
      Petar Avramovic authored
      Select gprb or fprb when def/use register operand of G_PHI is
      used/defined by either:
       copy to/from physical register or
       instruction with only one mapping available for that use/def operand.
      
      Integer s64 phi is handled with narrowScalar when mapping is applied,
      produced artifacts are combined away. Manually set gprb to all register
      operands of instructions created during narrowScalar.
      
      Differential Revision: https://reviews.llvm.org/D64351
      
      llvm-svn: 365494
    • AMDGPU/GlobalISel: Prepare some tests for store selection · fdd761af
      Matt Arsenault authored
      Mostly these would fail due to trying to use SI with a flat
      operation. Implementing global loads with MUBUF is more work than
      flat, so these won't be handled in the initial load selection.
      
      Others fail because store of s64 won't initially work, as the current
      set of patterns expect everything to be turned into v2i32.
      
      llvm-svn: 365493
    • [MIPS GlobalISel] Regbanks for G_SELECT. Select i64, f32 and f64 select · dbb6d01d
      Petar Avramovic authored
      Select gprb or fprb when def/use register operand of G_SELECT is
      used/defined by either:
       copy to/from physical register or
       instruction with only one mapping available for that use/def operand.
      
      Integer s64 select is handled with narrowScalar when mapping is applied,
      produced artifacts are combined away. Manually set gprb to all register
      operands of instructions created during narrowScalar.
      
      For selection of floating point s32 or s64 select it is enough to set
      fprb of appropriate size and selectImpl will do the rest.
      
      Differential Revision: https://reviews.llvm.org/D64350
      
      llvm-svn: 365492
    • AMDGPU/GlobalISel: Fix test · 85ad662d
      Matt Arsenault authored
      llvm-svn: 365491
    • AMDGPU/GlobalISel: Legalize more concat_vectors · 4dd5755d
      Matt Arsenault authored
      llvm-svn: 365488
    • AMDGPU/GlobalISel: Improve regbankselect for icmp s16 · 6bdb92d8
      Matt Arsenault authored
      Account for 64-bit scalar eq/ne when available.
      
      llvm-svn: 365487
    • AMDGPU/GlobalISel: Make s16 G_ICMP legal · 8b8eee59
      Matt Arsenault authored
      llvm-svn: 365486
    • AMDGPU/GlobalISel: Select G_SUB · e6d10f97
      Matt Arsenault authored
      llvm-svn: 365484
    • AMDGPU/GlobalISel: Select G_UNMERGE_VALUES · 872f38be
      Matt Arsenault authored
      llvm-svn: 365483
    • AMDGPU/GlobalISel: Select G_MERGE_VALUES · 9b7ffc4e
      Matt Arsenault authored
      llvm-svn: 365482
    • [LegalizeTypes] Fix saturation bug for smul.fix.sat · 59029017
      Bjorn Pettersson authored
      Summary:
      Make sure we use SETGE instead of SETGT when checking
      if the sign bit is zero at SMULFIXSAT expansion.
      
      The faulty expansion occurred when doing "expand" of
      SMULFIXSAT and the scale was exactly matching the
      size of the smaller type. For example doing
        i64 Z = SMULFIXSAT X, Y, 32
      and expanding X/Y/Z into using two i32 values.
      
      The problem was that we sometimes did not saturate
      to min when overflowing.
      
      Here is an example using Q3.4 numbers:
      
      Consider that we are multiplying X and Y.
        X = 0x80 (-8.0 as Q3.4)
        Y = 0x20 (2.0 as Q3.4)
      To avoid loss of precision we do a widening
      multiplication, getting a 16 bit result
        Z = 0xF000 (-16.0 as Q7.8)
      
      To detect negative overflow we should check if
      the five most significant bits in Z are less than -1.
      Assume that we name the 4 most significant bits
      as HH and the next 4 bits as HL. Then we can do the
      check by examining if
       (HH < -1) or (HH == -1 && "sign bit in HL is zero").
      
      The fault was that we have been doing the check as
       (HH < -1) or (HH == -1 && HL > 0)
      instead of
       (HH < -1) or (HH == -1 && HL >= 0).
      
      In our example HH is -1 and HL is 0, so the old
      code did not trigger saturation and simply truncated
      the result to 0x00 (0.0). With the bugfix we instead
      detect that we should saturate to min, and the result
      will be set to 0x80 (-8.0).
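
The Q3.4 example can be replayed in a few lines of C. This is a sketch of the check as described in this message, not the LegalizeTypes code: the helper names are invented, only the negative-overflow path is modelled, and the `fixed` flag switches between the old `HL > 0` test and the corrected `HL >= 0` test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interpret the low 4 bits of v as a signed 4-bit value. */
static int sext4(unsigned v) {
    v &= 0xF;
    return (v & 0x8) ? (int)v - 16 : (int)v;
}

/* Negative-overflow test on the 16-bit product z, with HH = bits 15..12
   and HL = bits 11..8, as in the description above. */
static bool neg_overflow(uint16_t z, bool fixed) {
    int hh = sext4(z >> 12);
    int hl = sext4(z >> 8);
    if (hh < -1)
        return true;
    if (hh != -1)
        return false;
    /* "sign bit in HL is zero" means HL >= 0; the old code used HL > 0. */
    return fixed ? (hl >= 0) : (hl > 0);
}

/* Q3.4 multiply with scale 4; saturates to min (0x80) on negative overflow.
   Positive overflow is deliberately left out of this sketch. */
static uint8_t smul_fix_sat_q3_4(uint8_t x, uint8_t y, bool fixed) {
    int prod = (int)(int8_t)x * (int)(int8_t)y; /* widening multiply */
    uint16_t z = (uint16_t)prod;                /* Q7.8 bit pattern  */
    if (neg_overflow(z, fixed))
        return 0x80;
    return (uint8_t)(z >> 4);                   /* drop 4 fraction bits */
}
```

With X = 0x80 and Y = 0x20 the product is 0xF000, so HH is -1 and HL is 0: the old predicate returns the truncated 0x00, while the corrected predicate saturates to 0x80.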
      
      Reviewers: leonardchan, bevinh
      
      Reviewed By: leonardchan
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D64331
      
      llvm-svn: 365455
    • Fixing @llvm.memcpy not honoring volatile. · 336f3e16
      Guillaume Chatelet authored
      This is explicitly not addressing target-specific code, or calls to memcpy.
      
      Summary: https://bugs.llvm.org/show_bug.cgi?id=42254
      
      Reviewers: courbet
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D63215
      
      llvm-svn: 365449
    • [NFC][PowerPC] Added a test to show current codegen of MachinePRE · 09329ce6
      Kai Luo authored
      llvm-svn: 365447