  1. Dec 18, 2018
    • Alexandre Ganea's avatar
      [CMake] Default options for faster executables on MSVC · b5053199
      Alexandre Ganea authored
      - Disable incremental linking by default. /INCREMENTAL adds extra thunks in the EXE, which makes execution slower.
      - Set /MT (static CRT lib) by default instead of CMake's default /MD (DLL CRT lib). The previous default /MD causes all CRT functions (memcmp, memset, etc.) to be called through thunks, making execution slower.
      - Adds LLVM_ENABLE_INCREMENTAL_LINK which is set to OFF by default.
      
      Differential revision: https://reviews.llvm.org/D55056
      
      llvm-svn: 349517
      b5053199
    • Alexandre Ganea's avatar
      [llvm-symbolizer] Omit stderr output when symbolizing a crash · b67d91e0
      Alexandre Ganea authored
      Differential revision: https://reviews.llvm.org/D55723
      
      llvm-svn: 349516
      b67d91e0
    • Sanjay Patel's avatar
      [InstCombine] add tests for scalarization; NFC · e0afd278
      Sanjay Patel authored
      We miss pattern matching a splat constant if it has undef elements.
      
      llvm-svn: 349515
      e0afd278
    • Michael Berg's avatar
      Add FMF management to common fp intrinsics in GlobalIsel · c6a5245c
      Michael Berg authored
      Summary: This is the initial code change to facilitate managing FMF flags from Instructions to MI with respect to intrinsics in GlobalISel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
      
      Reviewers: aditya_nandakumar, bogner
      
      Reviewed By: bogner
      
      Subscribers: rovka, kristof.beyls, javed.absar
      
      Differential Revision: https://reviews.llvm.org/D55668
      
      llvm-svn: 349514
      c6a5245c
    • Michael Kruse's avatar
      [LoopVectorize] Rename pass options. NFC. · d4eb13c8
      Michael Kruse authored
      Rename:
      NoUnrolling to InterleaveOnlyWhenForced
      and
      AlwaysVectorize to !VectorizeOnlyWhenForced
      
      Contrary to what the name 'AlwaysVectorize' suggests, it does not
      unconditionally vectorize all loops, but applies a cost model to
      determine whether vectorization is profitable for each loop. Hence,
      passing false will disable the cost model, except when a loop is marked
      with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
      by @hfinkel in D55716) better matches this behavior.
      
      Similarly, 'NoUnrolling' disables the profitability cost model for
      interleaving (a term to distinguish it from unrolling by the
      LoopUnrollPass); rename it for consistency.
      
      Differential Revision: https://reviews.llvm.org/D55785
      
      llvm-svn: 349513
      d4eb13c8
    • Simon Pilgrim's avatar
      [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts · 14119174
      Simon Pilgrim authored
      Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.
      
      llvm-svn: 349510
      14119174
    • Michael Kruse's avatar
      [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops. · 3284775b
      Michael Kruse authored
      When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
      the LoopUnrollPass is not added to the (legacy) pass pipeline. This
      also means that it will not process any loop metadata such as
      llvm.loop.unroll.enable (which is generated by #pragma unroll), so
      WarnMissedTransformationsPass emits a warning that a forced
      transformation has not been applied (see
      https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
      Such explicit transformations should take precedence over disabling
      heuristics.
      
      This patch unconditionally adds LoopUnrollPass to the optimizing
      pipeline (that is, it is still not added with `-O0`), but passes a flag
      indicating whether automatic unrolling is dis-/enabled. This is the same
      approach as LoopVectorize uses.
      
      The new pass manager's pipeline builder has no option to disable
      unrolling, hence the problem does not apply.
      
      Differential Revision: https://reviews.llvm.org/D55716
      
      llvm-svn: 349509
      3284775b
    • Simon Pilgrim's avatar
      [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts · e9effe97
      Simon Pilgrim authored
      Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.
      
      llvm-svn: 349500
      e9effe97
    • Petar Avramovic's avatar
      [MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM · 0a5e4eb7
      Petar Avramovic authored
      Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
      and use integer type of correct size when creating arguments for
      CLI.lowerCall.
      Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
      on MIPS32.
      
      Differential Revision: https://reviews.llvm.org/D55651
      
      llvm-svn: 349499
      0a5e4eb7
    • Nico Weber's avatar
      [gn build] Add build file for llvm-pdbutil · 1164dab2
      Nico Weber authored
      Needed for check-lld.
      
      Differential Revision: https://reviews.llvm.org/D55826
      
      llvm-svn: 349490
      1164dab2
    • Nico Weber's avatar
      [gn build] Add build file for llvm-bcanalyzer · 6dc08550
      Nico Weber authored
      Needed for check-lld.
      
      Differential Revision: https://reviews.llvm.org/D55824
      
      llvm-svn: 349488
      6dc08550
    • Nico Weber's avatar
      [gn build] Add build files for llvm-ar, llvm-nm, llvm-objdump, llvm-readelf · 6b66308f
      Nico Weber authored
      Also add build files for deps DebugInfo/Symbolize, ToolDrivers/dll-tool.
      Also add gn/build/libs/xar (needed by llvm-objdump).
      
      Also delete an incorrect part of the symlink description in //BUILD.gn (it used
      to be true before I made the symlink step write a stamp file; now it's no
      longer true).
      
      These are all binaries needed by check-lld that need symlinks.
      
      Differential Revision: https://reviews.llvm.org/D55743
      
      llvm-svn: 349486
      6b66308f
    • Simon Pilgrim's avatar
      [X86][SSE] Add shift combine 'out of range' tests with UNDEFs · be0fbe67
      Simon Pilgrim authored
      Shows failure to simplify out of range shift amounts to UNDEF if any element is UNDEF.
      
      llvm-svn: 349483
      be0fbe67
    • Nikita Popov's avatar
      [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS · 665ab081
      Nikita Popov authored
      Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
      UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
      the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.
      
      This only replaces use in the X86 backend, and does not move any of
      the ADDUS/SUBUS X86 specific combines into generic codegen.
      
      Differential Revision: https://reviews.llvm.org/D55787
      
      llvm-svn: 349481
      665ab081
    • Nikita Popov's avatar
      [SelectionDAG][X86] Fix [US](ADD|SUB)SAT vector legalization, add tests · a7d2a235
      Nikita Popov authored
      Integer result promotion needs to use the scalar size, and we need
      support for result widening.
      
      This is in preparation for D55787.
      
      llvm-svn: 349480
      a7d2a235
    • Peter Smith's avatar
      [docs] Improve HowToCrossCompilerBuiltinsOnArm · d1328e1a
      Peter Smith authored
      Some recent experience on llvm-dev pointed out some errors in the document:
      - Assumption of ninja
      - Use of --march rather than -march
      - Problems with host include files when a multiarch setup was used
      - Insufficient target information passed to assembler
      - Instructions on using the cmake cache file BaremetalARM.cmake were
        incomplete
      
      There was also insufficient guidance on what to do when various stages
      failed due to misconfiguration or missing steps.
      
      Summary of changes:
      - Fixed problems above
      - Added a troubleshooting section with common errors.
      - Cleared up one "at time of writing" that is no longer a problem.
      
      Differential Revision: https://reviews.llvm.org/D55709
      
      llvm-svn: 349477
      d1328e1a
    • George Rimar's avatar
      [llvm-dwarfdump] - Do not error out on R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations. · 1ec49110
      George Rimar authored
      This is https://bugs.llvm.org/show_bug.cgi?id=39992.
      
      If we have the following code (test.cpp):
      
      thread_local int tdata = 24;
      
      and build an .o file with debug information:
      
      clang --target=x86_64-pc-linux -c test.cpp -g
      
      then the object produced may have R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations
      (clang emits R_X86_64_DTPOFF64 and gcc emits R_X86_64_DTPOFF32 for the code above for me).
      
      Currently, llvm-dwarfdump fails to compute this TLS relocation when dumping the
      object and reports an error:
      failed to compute relocation: R_X86_64_DTPOFF64, Invalid data was encountered while parsing the file
      
      This relocation represents the offset in the TLS block and is resolved by the linker,
      but this info is unavailable at the point when the object file is dumped by this tool.
      
      The patch adds a simple evaluation for such relocations to avoid emitting errors.
      The resulting behavior seems to be equal to GNU dwarfdump.
      
      Differential revision: https://reviews.llvm.org/D55762
      
      llvm-svn: 349476
      1ec49110
    • Petar Avramovic's avatar
      [MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR · 150fd430
      Petar Avramovic authored
      Add narrowScalar for G_AND and G_XOR.
      Legalize G_AND, G_OR and G_XOR for types other than s32
      with clampScalar on MIPS32.
      
      Differential Revision: https://reviews.llvm.org/D55362
      
      llvm-svn: 349475
      150fd430
    • Luke Cheeseman's avatar
      [AArch64] - Return address signing dwarf support · f57d7d82
      Luke Cheeseman authored
      - Reapply changes initially introduced in r343089
      - The architecture info is no longer loaded whenever a DWARFContext is created
      - The runtime libraries (sanitizers) make use of the dwarf context classes but
        do not initialise the target info
      - The architecture of the object can be obtained without loading the target info
      - Adding a method to the dwarf context to get this information and multiplex the
        string printing later on
      
      Differential Revision: https://reviews.llvm.org/D55774
      
      llvm-svn: 349472
      f57d7d82
    • Simon Pilgrim's avatar
      [X86][AVX] Add 256/512-bit vector funnel shift tests · ba8e84b3
      Simon Pilgrim authored
      Extra coverage for D55747
      
      llvm-svn: 349471
      ba8e84b3
    • Simon Pilgrim's avatar
      [X86][SSE] Add 128-bit vector funnel shift tests · 46b90e85
      Simon Pilgrim authored
      Extra coverage for D55747
      
      llvm-svn: 349470
      46b90e85
    • Dylan McKay's avatar
      [IPO][AVR] Create new Functions in the default address space specified in the data layout · f920da00
      Dylan McKay authored
      This modifies the IPO pass so that it respects any explicit function
      address space specified in the data layout.
      
      In targets with nonzero program address spaces, all functions should, by
      default, be placed into the default program address space.
      
      This is required for Harvard architectures like AVR. Without this, the
      functions will be marked as residing in data space, and thus not be
      callable.
      
      This has no effect on any in-tree backends, as none use an
      explicit program address space in their data layouts.
      
      Patch by Tim Neumann.
      
      llvm-svn: 349469
      f920da00
    • Matt Arsenault's avatar
      AMDGPU: Legalize/regbankselect frame_index · c94e26c7
      Matt Arsenault authored
      llvm-svn: 349468
      c94e26c7
    • Matt Arsenault's avatar
      AMDGPU: Legalize/regbankselect fma · c0ea2210
      Matt Arsenault authored
      llvm-svn: 349467
      c0ea2210
    • Simon Pilgrim's avatar
      [TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits · af6fbbf1
      Simon Pilgrim authored
      For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.
      
      llvm-svn: 349466
      af6fbbf1
    • Tim Northover's avatar
      SROA: preserve alignment tags on loads and stores. · 856628f7
      Tim Northover authored
      When splitting up an alloca's uses we were dropping any explicit
      alignment tags, which means they default to the ABI-required default
      alignment and this can cause miscompiles if the real value was smaller.
      
      Also refactor the TBAA metadata into a parent class since it's shared by
      both children anyway.
      
      llvm-svn: 349465
      856628f7
    • Matt Arsenault's avatar
      GlobalISel: Improve crash on invalid mapping · 1ac38ba7
      Matt Arsenault authored
      If NumBreakDowns is 0, BreakDown is null.
      This trades a null dereference for an assert somewhere
      else.
      
      llvm-svn: 349464
      1ac38ba7
    • Matt Arsenault's avatar
      AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub · e01e7c81
      Matt Arsenault authored
      llvm-svn: 349463
      e01e7c81
    • Simon Pilgrim's avatar
      [X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits · 8488a44c
      Simon Pilgrim authored
      (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1
      
      This works better as part of SimplifyDemandedBits than part of the general combine.
      
      llvm-svn: 349462
      8488a44c
    • Simon Pilgrim's avatar
      [X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. · 26c630f4
      Simon Pilgrim authored
      This fold was incredibly specific; replace it with a SimplifyDemandedBits fold that removes a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same).
      
      Test change is merely a rescheduling.
      
      llvm-svn: 349459
      26c630f4
    • Kristof Beyls's avatar
      Introduce control flow speculation tracking pass for AArch64 · e66bc1f7
      Kristof Beyls authored
      The pass implements tracking of control flow miss-speculation into a "taint"
      register. That taint register can then be used to mask off registers with
      sensitive data when executing under miss-speculation, a.k.a. "transient
      execution".
      This pass is aimed at mitigating SpectreV1-style vulnerabilities.
      
      At the moment, it implements the tracking of miss-speculation of control
      flow into a taint register, but doesn't implement a mechanism yet to then
      use that taint register to mask off vulnerable data in registers (something
      for a follow-on improvement). Possible strategies to mask out vulnerable
      data that can be implemented on top of this are:
      - speculative load hardening to automatically mask off data loaded
        in registers.
      - using intrinsics to mask off data in registers as indicated by the
        programmer (see https://lwn.net/Articles/759423/).
      
      For AArch64, the following implementation choices are made.
      Some of these are different than the implementation choices made in
      the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
      the instruction set characteristics result in different trade-offs.
      - The speculation hardening is done after register allocation. With a
        relative abundance of registers, one register is reserved (X16) to be
        the taint register. X16 is expected to not clash with other register
        reservation mechanisms with very high probability because:
        . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
        . The only way to request X16 to be used as a programmer is through
          inline assembly. In the rare case a function explicitly demands to
          use X16/W16, this pass falls back to hardening against speculation
          by inserting a DSB SYS/ISB barrier pair which will prevent control
          flow speculation.
      - It is easy to insert mask operations at this late stage as we have
        mask operations available that don't set flags.
      - The taint variable contains all-ones when no miss-speculation is detected,
        and contains all-zeros when miss-speculation is detected. Therefore, when
        masking, an AND instruction (which only changes the register to be masked,
        no other side effects) can easily be inserted anywhere that's needed.
      - The tracking of miss-speculation is done by using a data-flow conditional
        select instruction (CSEL) to evaluate the flags that were also used to
        make conditional branch direction decisions. Speculation of the CSEL
        instruction can be limited with a CSDB instruction - so the combination of
        CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
        aren't speculated. When conditional branch direction gets miss-speculated,
        the semantics of the inserted CSEL instruction is such that the taint
        register will contain all zero bits.
        One key requirement for this to work is that the conditional branch is
        followed by an execution of the CSEL instruction, where the CSEL
        instruction needs to use the same flags status as the conditional branch.
        This means that the conditional branches must not be implemented as one
        of the AArch64 conditional branches that do not use the flags as input
        (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
        selectors to not produce these instructions when speculation hardening
        is enabled. This pass will assert if it does encounter such an instruction.
      - On function call boundaries, the miss-speculation state is transferred from
        the taint register X16 to be encoded in the SP register as value 0.
      
      Future extensions/improvements could be:
      - Implement this functionality using full speculation barriers, akin to the
        x86-slh-lfence option. This may be more useful for the intrinsics-based
        approach than for the SLH approach to masking.
        Note that this pass already inserts the full speculation barriers if the
        function for some niche reason makes use of X16/W16.
      - No indirect branch misprediction gets protected/instrumented yet; this
        could be done for some indirect branches, such as switch jump tables.
      
      Differential Revision: https://reviews.llvm.org/D54896
      
      llvm-svn: 349456
      e66bc1f7
    • Martin Storsjö's avatar
      [AArch64] [MinGW] Allow enabling SEH exceptions · 8f0cb9c3
      Martin Storsjö authored
      The default still is dwarf, but SEH exceptions can now be enabled
      optionally for the MinGW target.
      
      Differential Revision: https://reviews.llvm.org/D55748
      
      llvm-svn: 349451
      8f0cb9c3
    • Craig Topper's avatar
[X86] Add test cases to show isel failing to match BMI blsmsk/blsi/blsr when the flag result is used. · 284d426f
      Craig Topper authored
      
      Similar things happen with TBM instructions, for which we already have tests.
      
      llvm-svn: 349450
      284d426f
    • Kewen Lin's avatar
      [PowerPC][NFC]Update vabsd cases with vselect test cases · bbb461f7
      Kewen Lin authored
      Power9 VABSDU* instructions can be exploited for some special vselect sequences.
      Check in the original test case here; the later exploitation patch will update it
      so reviewers can check the differences easily.
      
      llvm-svn: 349446
      bbb461f7
    • Kewen Lin's avatar
      [PowerPC] Exploit power9 new instruction setb · 44ace925
      Kewen Lin authored
      Check the expected patterns feeding SELECT_CC like:
         (select_cc lhs, rhs,  1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1)
         (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1)
         (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs,  1, -1, cc2), seteq)
         (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs, -1,  1, cc2), seteq)
      Further transform the sequence to comparison + setb if the patterns match.
      
      Differential Revision: https://reviews.llvm.org/D53275
      
      llvm-svn: 349445
      44ace925
    • QingShan Zhang's avatar
      [NFC] Add new test to cover the lhs scheduling issue for P9. · ecdab5bd
      QingShan Zhang authored
      llvm-svn: 349443
      ecdab5bd
    • Craig Topper's avatar
      [X86] Add test case for PR40060. NFC · 4adf9ca7
      Craig Topper authored
      llvm-svn: 349441
      4adf9ca7
    • Craig Topper's avatar
      [X86] Const correct some helper functions X86InstrInfo.cpp. NFC · 1ff7356f
      Craig Topper authored
      llvm-svn: 349440
      1ff7356f
    • QingShan Zhang's avatar
[NFC] Fix test case with wrong label check. · f5498125
      QingShan Zhang authored
      llvm-svn: 349439
      f5498125
    • Artur Pilipenko's avatar
      [CaptureTracking] Pass MaxUsesToExplore from wrappers to the actual implementation · 2a0146e0
      Artur Pilipenko authored
      This is a follow up for rL347910. In the original patch I somehow forgot to pass
      the limit from wrappers to the function which actually does the job.
      
      llvm-svn: 349438
      2a0146e0