  1. Jun 17, 2020
    • [xray] Option to omit the function index · 7c7c8e0d
      Ian Levesque authored
      Summary:
      Add a flag to omit the xray_fn_idx to cut size overhead and relocations
      roughly in half, at the cost of reduced performance for single-function
      patching. Minor additions to compiler-rt support per-function patching
      without the index.
      
      Reviewers: dberris, MaskRay, johnislarry
      
      Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits
      
      Tags: #clang, #sanitizers, #llvm
      
      Differential Revision: https://reviews.llvm.org/D81995
    • [gicombiner] Allow disable-rule option to disable all-except-... · 778db887
      Daniel Sanders authored
      Summary:
      Adds two features to the generated rule disable option:
      - '*' - Disable all rules
      - '!<foo>' - Re-enable rule(s)
        - '!foo' - Enable rule named 'foo'
        - '!5' - Enable rule five
        - '!4-9' - Enable rule four to nine
        - '!foo-bar' - Enable rules from 'foo' to (and including) 'bar'
      (the '!' is recognized by the generated disable option but is not part of the underlying API; it determines whether to call setRuleDisabled() or setRuleEnabled())
      
      This is intended to support unit testing of combine rules so
      that you can do:
        GeneratedCfg.setRuleDisabled("*")
        GeneratedCfg.setRuleEnabled("foo")
      to ensure only a specific rule is in effect. The rule must still be
      included in a combiner, though.
      
      Also added --...-only-enable-rule=X,Y which is effectively an
      alias for --...-disable-rule=*,!X,!Y and as such interacts
      properly with disable-rule.
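
      A minimal C++ sketch of the unit-testing flow above, assuming a
      hypothetical generated config class name (setRuleDisabled() and
      setRuleEnabled() are the calls named in this commit):

          MyGICombinerConfig Cfg;    // hypothetical generated config type
          Cfg.setRuleDisabled("*");  // '*': disable every rule
          Cfg.setRuleEnabled("X");   // '!X': re-enable rule X
          Cfg.setRuleEnabled("Y");   // '!Y': re-enable rule Y
          // Equivalent to passing disable-rule=*,!X,!Y on the command line.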
      
      Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
      
      Subscribers: wdng, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D81889
  2. Jun 11, 2020
    • 432f20bc
      Fangrui Song authored
    • 12459ec9
      Eli Friedman authored
    • [GlobalISel] fix crash in IRTranslator, MachineIRBuilder when translating @llvm.dbg.value intrinsic and using -debug · f24e2e9e
      Dominik Montada authored
      Summary:
      Fix a crash when using -debug, caused by the GlobalISel observer trying
      to print an incomplete DBG_VALUE instruction. The MachineIRBuilder was
      using buildInstr, which immediately inserts the instruction (triggering
      the print), instead of using BuildMI to first build up the instruction
      and insertInstr to insert it once finished.
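
      A hedged C++ sketch of that before/after pattern (simplified, not the
      exact patch; buildInstr, BuildMI, and insertInstr are the APIs named
      above):

          // Before: buildInstr creates *and inserts* the instruction in one
          // step, so the observer can print it while it is still incomplete:
          //   auto MIB = buildInstr(TargetOpcode::DBG_VALUE);
          // After: build the instruction fully first, then insert it, so the
          // observer only ever sees a finished DBG_VALUE:
          auto MIB = BuildMI(getMF(), getDL(), getTII().get(TargetOpcode::DBG_VALUE));
          // ... add the value and expression operands here ...
          insertInstr(MIB);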
      
      Add a RUN line with the -debug flag set to the existing debug-insts.ll
      test to make sure no crash occurs.

      Also fixed a missing %s in the second RUN line of the same test.
      
      Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
      
      Reviewed By: arsenm
      
      Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76934
    • [CodeGen] Let computeKnownBits do something sensible for scalable vectors · bd97342a
      David Sherwood authored
      Until we have a real need for computing known bits for scalable
      vectors, I have simply changed the code to bail out for now and pretend
      we know nothing. I've also fixed up some simple callers of
      computeKnownBits.
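
      A minimal sketch, assuming the bail-out takes roughly this shape inside
      computeKnownBits (not necessarily the exact code in this patch):

          KnownBits Known(BitWidth);  // every bit starts out unknown
          // Scalable vectors: bail out and pretend we know nothing.
          if (Op.getValueType().isScalableVector())
            return Known;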
      
      Differential Revision: https://reviews.llvm.org/D80437
    • [AArch64] Introduce AArch64SLSHardeningPass, implementing hardening of RET and BR instructions. · 0ee176ed
      Kristof Beyls authored
      Some processors may speculatively execute the instructions immediately
      following RET (returns) and BR (indirect jumps), even though control
      flow should change unconditionally at these instructions. To keep a
      potential mis-speculatively executed gadget after these instructions
      from leaking secrets through side channels, this pass places a
      speculation barrier immediately after every RET and BR instruction.
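
      A conceptual C++ sketch of that insertion loop (not the actual pass;
      the insertSpeculationBarrier helper is hypothetical):

          for (MachineBasicBlock &MBB : MF)
            for (MachineInstr &MI : MBB)
              if (MI.isReturn() || MI.isIndirectBranch())
                // Hypothetical helper: place the barrier pseudo right after
                // the RET/BR; it later expands to SB, or to DSB SYS + ISB.
                insertSpeculationBarrier(MBB, std::next(MI.getIterator()));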
      
      Since these barriers are never on the correct, architectural execution
      path, performance overhead of this is expected to be low.
      
      On targets that implement the Armv8.0-SB Speculation Barrier extension,
      a single SB instruction is emitted that acts as a speculation barrier.
      On other targets, a DSB SYS followed by an ISB is emitted to act as a
      speculation barrier.
      
      These speculation barriers are implemented as pseudo instructions to
      keep later passes from analyzing them and potentially removing them.
      
      Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
      instructions, these are also mitigated by the pass and tested through a
      MIR test.
      
      The mitigation is off by default and can be enabled by the
      harden-sls-retbr subtarget feature.
      
      Differential Revision:  https://reviews.llvm.org/D81400
  3. Jun 09, 2020
    • Change FileCheck default to dump input on failure · d31c9e5a
      Mehdi Amini authored
      Having the input dumped on failure seems like a better default: I
      debugged FileCheck tests for a while without knowing about this option,
      which really helps in understanding failures.
      
      Remove `-dump-input-on-failure` and the environment variable
      FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.
      
      Differential Revision: https://reviews.llvm.org/D81422
    • [MachineScheduler] Update available queue on the first mop of a new cycle · 2fea3fe4
      David Green authored
      If a resource can be held for multiple cycles in the schedule model,
      then an instruction can be placed into the available queue while another
      instruction is scheduled, but the first will not be taken back out if
      the two instructions hazard. To fix this, make sure that we update the
      available queue even on the first MOp of a cycle, pushing available
      instructions back into the pending queue if they now conflict.
      
      This happens with some downstream schedules we have around MVE
      instruction scheduling, where we use ResourceCycles=[2] to show the
      instruction executing over two beats. Apparently the test changes here
      are OK too.
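
      A conceptual C++ sketch of the fix (not LLVM's actual code; Available,
      Pending, and checkHazard mirror names used in the scheduler's queue
      logic):

          // At the start of a new cycle, re-check the available queue and
          // defer anything that now conflicts with a held resource.
          SmallVector<SUnit *, 8> NowConflicting;
          for (SUnit *SU : Available)
            if (checkHazard(SU))           // resource still held: would stall
              NowConflicting.push_back(SU);
          for (SUnit *SU : NowConflicting) {
            Available.remove(SU);          // take it back out of 'available'
            Pending.push(SU);              // wait until the hazard clears
          }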
      
      Differential Revision: https://reviews.llvm.org/D76909
    • [AArch64][GlobalISel] Select trn1 and trn2 · cb2d8b30
      Jessica Paquette authored
      Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize
      G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
      
      - Add G_TRN1 and G_TRN2
      - Port mask matching code from AArch64ISelLowering
      - Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
      - Select via importer
      
      Add select-trn.mir to test selection.
      
      Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the
      existing arm64-trn test.
      
      Note that both of these tests contain things we currently don't legalize.
      
      I figured it would be easier to test these now rather than later, since once
      we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update
      the tests.
      
      Differential Revision: https://reviews.llvm.org/D81182
    • [DAGCombiner] allow more folding of fadd + fmul into fma · 702cf933
      Sanjay Patel authored
      If fmul and fadd are separated by an fma, we can fold them together
      to save an instruction:
      fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))
      
      The fold implemented here is actually a specialization - we should
      be able to peek through >1 fma to find this pattern. That's another
      patch if we want to try that enhancement though.
      
      This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
      so it was done for some in-tree targets like PowerPC, but not AArch64
      or x86. The hook is protecting against forming a potentially more
      expensive computation when fma takes longer to execute than a single
      fadd. That hook may be needed for other transforms, but in this case,
      we are replacing fmul+fadd with fma, and the fma should never take
      longer than the 2 individual instructions.
      
      'contract' FMF is all we need to allow this transform. That flag
      corresponds to -ffp-contract=fast in Clang, so we are allowed to form
      fma ops freely across expressions.
      
      Differential Revision: https://reviews.llvm.org/D80801
    • [AArch64][SVE] Implement structured load intrinsics · b82be5db
      Cullen Rhodes authored
      Summary:
      This patch adds initial support for the following intrinsics:
      
          * llvm.aarch64.sve.ld2
          * llvm.aarch64.sve.ld3
          * llvm.aarch64.sve.ld4
      
      For loading two, three, and four vectors' worth of data. Basic codegen
      is implemented, with reg+reg and reg+imm addressing modes being
      addressed in a later patch.
      
      The types returned by these intrinsics have a number of elements that is a
      multiple of the elements in a 128-bit vector for a given type and N, where N is
      the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements
      the types are:
      
          LD2 : <vscale x 8 x i32>
          LD3 : <vscale x 12 x i32>
          LD4 : <vscale x 16 x i32>
      
      This is implemented with target-specific intrinsics for each variant that take
      the same operands as the IR intrinsic but return N values, where the type of
      each value is a full vector, i.e. <vscale x 4 x i32> in the above example.
      These values are then concatenated using the standard concat_vector intrinsic
      to maintain type legality with the IR.
      
      These intrinsics are intended for use in the Arm C Language
      Extension (ACLE).
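
      A hedged illustration of the intended ACLE-level usage (C/C++ with
      arm_sve.h; svld2_s32 is the ACLE name, not something added by this
      patch), which conceptually lowers to the llvm.aarch64.sve.ld2
      intrinsic above:

          #include <arm_sve.h>

          // Load two vectors' worth of i32 data under predicate pg.
          svint32x2_t load_pair(svbool_t pg, const int32_t *base) {
            return svld2_s32(pg, base);
          }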
      
      Reviewed By: sdesmalen
      
      Differential Revision: https://reviews.llvm.org/D75751