  1. Jan 16, 2019
  2. Jan 15, 2019
    • Jonas Devlieghere's avatar
      [VFS] Add getter for mapping entries. · 7a168627
      Jonas Devlieghere authored
      When generating a reproducer in LLDB we build up the mapping but don't
      immediately copy over the files on the file system.
      
      Rather than keeping a separate data structure with real and virtual
      paths, we might as well reuse the entries already stored in the
      YAMLVFSWriter to lazily copy over the files when needed.
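
A minimal sketch of the idea, written in Python as a language-neutral model and not as the LLVM API: the class `MappingRecorder`, its methods, and `copy_recorded_files` are hypothetical names chosen for illustration. Mappings are only recorded at first; the files are copied later by walking the entries that the getter returns.

```python
import os
import shutil


class MappingRecorder:
    """Hypothetical stand-in for a writer that records virtual-to-real path pairs."""

    def __init__(self):
        self._mappings = []

    def add_file_mapping(self, virtual_path, real_path):
        # Only record the pair; nothing is copied at this point.
        self._mappings.append((virtual_path, real_path))

    def get_mappings(self):
        # Getter exposing the recorded entries (returns a copy of the list).
        return list(self._mappings)


def copy_recorded_files(recorder, dest_root):
    """Copy every recorded real file under dest_root, mirroring its virtual path."""
    copied = []
    for virtual_path, real_path in recorder.get_mappings():
        target = os.path.join(dest_root, virtual_path.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(real_path, target)
        copied.append(target)
    return copied
```

Reusing the recorded entries this way avoids keeping a second list of real and virtual paths alongside the writer.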
      
      llvm-svn: 351266
      7a168627
    • Jonas Devlieghere's avatar
      [VFS] Move RedirectingFileSystem interface into header (NFC) · 1a0ce65a
      Jonas Devlieghere authored
      This moves the RedirectingFileSystem into the header so it can be
      extended. This is needed in LLDB, where we need a way to obtain the external
      path to deal with FILE* and file descriptor APIs.
      
      Discussion on the mailing list:
      http://lists.llvm.org/pipermail/llvm-dev/2018-November/127755.html
      
      Differential revision: https://reviews.llvm.org/D54277
      
      llvm-svn: 351265
      1a0ce65a
    • Jordan Rupprecht's avatar
      [libObject] Tweak expected error output from llvm-ar · 20a817ea
      Jordan Rupprecht authored
      llvm-svn: 351259
      20a817ea
    • Jordan Rupprecht's avatar
      [llvm-ar] Resubmit recursive thin archive test with fix for full path names... · 904ce984
      Jordan Rupprecht authored
      [llvm-ar] Resubmit recursive thin archive test with fix for full path names and better error messages
      
      llvm-svn: 351256
      904ce984
    • Peter Collingbourne's avatar
      gn build: Add a resource_dir.gni file. · efe83db7
      Peter Collingbourne authored
      The path to the resource directory will end up being used in several
      more places once the support for running check-hwasan lands. This
      moves the definition to a central location so that it can be used
      from those places.
      
      Differential Revision: https://reviews.llvm.org/D56700
      
      llvm-svn: 351255
      efe83db7
    • Craig Topper's avatar
      [X86] Add the GCCBuiltin name back to the deprecated avx512 gather intrinsics... · b2729b14
      Craig Topper authored
      [X86] Add the GCCBuiltin name back to the deprecated avx512 gather intrinsics until the clang side patch for the new versions is approved.
      
      llvm-svn: 351254
      b2729b14
    • Roman Lebedev's avatar
      X86DAGToDAGISel::matchBitExtract() with truncation (PR36419) · fb4eed38
      Roman Lebedev authored
      Summary:
      Previously, in D54095, I added support for extracting `lshr` from `X` when we are about to produce `BEXTR`.
      That was good, but the fix was only partial; [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]] remained.
      
      That pattern can also appear, roughly, when you have a large (64-bit) storage and then consume bits from it.
      It would not be unexpected to do the further computations in 32-bit width.
      And then the current code breaks, as the tests show.
      
      The basic idea/pattern here is the following:
      1. We have `i64` input
      2. We perform `i64` right-shift on it.
      3. We `trunc`ate that shifted value
      4. We do all further work (masking) in `i32`
      
      Since we see `trunc`ation and not `lshr`, we give up, and stop trying to extract that right-shift.
      BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64`
      and truncate the result after masking: https://rise4fun.com/Alive/K4B
      ```
      Name: @bextr64_32_b1 -> @bextr64_32_b0
        %shiftedval = lshr i64 %val, %numskipbits
        %truncshiftedval = trunc i64 %shiftedval to i32
        %widenumlowbits1 = zext i8 %numlowbits to i32
        %notmask1 = shl nsw i32 -1, %widenumlowbits1
        %mask1 = xor i32 %notmask1, -1
        %res = and i32 %truncshiftedval, %mask1
      =>
        %shiftedval = lshr i64 %val, %numskipbits
        %widenumlowbits = zext i8 %numlowbits to i64
        %notmask = shl nsw i64 -1, %widenumlowbits
        %mask = xor i64 %notmask, -1
        %wideres = and i64 %shiftedval, %mask
        %res = trunc i64 %wideres to i32
      ```
      
      Thus, we are again able to extract that `lshr` into `BEXTR`'s control.
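
The equivalence of the two IR sequences can be spot-checked with a small model. This is a sketch in Python, not part of the patch: the function names are made up for illustration, and it assumes `numskipbits < 64` and `numlowbits < 32` so that none of the shifts in the IR would be poison.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def bextr64_32_narrow(val, numskipbits, numlowbits):
    """Source pattern: shift in i64, truncate to i32, then mask in i32."""
    shifted = (val & MASK64) >> numskipbits      # lshr i64
    truncated = shifted & MASK32                 # trunc i64 -> i32
    notmask = (MASK32 << numlowbits) & MASK32    # shl i32 -1, n
    mask = notmask ^ MASK32                      # xor i32 notmask, -1
    return truncated & mask                      # and i32


def bextr64_32_wide(val, numskipbits, numlowbits):
    """Target pattern: shift and mask in i64, truncate the result to i32."""
    shifted = (val & MASK64) >> numskipbits      # lshr i64
    notmask = (MASK64 << numlowbits) & MASK64    # shl i64 -1, n
    mask = notmask ^ MASK64                      # xor i64 notmask, -1
    wideres = shifted & mask                     # and i64
    return wideres & MASK32                      # trunc i64 -> i32


# Extract 8 bits starting at bit 36: both forms yield 0x67.
assert bextr64_32_narrow(0x123456789ABCDEF0, 36, 8) == 0x67
assert bextr64_32_wide(0x123456789ABCDEF0, 36, 8) == 0x67
```

Both forms agree because the mask keeps at most the low 31 bits, all of which survive the truncation regardless of whether it happens before or after the `and`.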
      
      Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a good idea:
      ```
      $ cat /tmp/old.s
      # bextr64_32_b1
      # LLVM-EXEGESIS-LIVEIN RSI
      # LLVM-EXEGESIS-LIVEIN EDX
      # LLVM-EXEGESIS-LIVEIN RDI
      movq %rsi, %rcx
      shrq %cl, %rdi
      shll $8, %edx
      bextrl %edx, %edi, %eax
      $ cat /tmp/old.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
      Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o
      ---
      mode:            latency
      key:
        instructions:
          - 'MOV64rr RCX RSI'
          - 'SHR64rCL RDI RDI'
          - 'SHL32ri EDX EDX i_0x8'
          - 'BEXTR32rr EAX EDI EDX'
        config:          ''
        register_initial_values: []
      cpu_name:        bdver2
      llvm_triple:     x86_64-unknown-linux-gnu
      num_repetitions: 10000
      measurements:
        - { key: latency, value: 0.6638, per_snippet_value: 2.6552 }
      error:           ''
      info:            ''
      assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
      ...
      $ cat /tmp/old.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
      Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o
      ---
      mode:            uops
      key:
        instructions:
          - 'MOV64rr RCX RSI'
          - 'SHR64rCL RDI RDI'
          - 'SHL32ri EDX EDX i_0x8'
          - 'BEXTR32rr EAX EDI EDX'
        config:          ''
        register_initial_values: []
      cpu_name:        bdver2
      llvm_triple:     x86_64-unknown-linux-gnu
      num_repetitions: 10000
      measurements:
        - { key: PdFPU0, value: 0, per_snippet_value: 0 }
        - { key: PdFPU1, value: 0, per_snippet_value: 0 }
        - { key: PdFPU2, value: 0, per_snippet_value: 0 }
        - { key: PdFPU3, value: 0, per_snippet_value: 0 }
        - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
      error:           ''
      info:            ''
      assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
      ...
      ```
      vs
      ```
      $ cat /tmp/new.s
      # bextr64_32_b1
      # LLVM-EXEGESIS-LIVEIN RDX
      # LLVM-EXEGESIS-LIVEIN SIL
      # LLVM-EXEGESIS-LIVEIN RDI
      shlq $8, %rdx
      movzbl %sil, %eax
      orq %rdx, %rax
      bextrq %rax, %rdi, %rax
      $ cat /tmp/new.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
      Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o
      ---
      mode:            latency
      key:
        instructions:
          - 'SHL64ri RDX RDX i_0x8'
          - 'MOVZX32rr8 EAX SIL'
          - 'OR64rr RAX RAX RDX'
          - 'BEXTR64rr RAX RDI RAX'
        config:          ''
        register_initial_values: []
      cpu_name:        bdver2
      llvm_triple:     x86_64-unknown-linux-gnu
      num_repetitions: 10000
      measurements:
        - { key: latency, value: 0.7454, per_snippet_value: 2.9816 }
      error:           ''
      info:            ''
      assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
      ...
      $ cat /tmp/new.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
      Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o
      ---
      mode:            uops
      key:
        instructions:
          - 'SHL64ri RDX RDX i_0x8'
          - 'MOVZX32rr8 EAX SIL'
          - 'OR64rr RAX RAX RDX'
          - 'BEXTR64rr RAX RDI RAX'
        config:          ''
        register_initial_values: []
      cpu_name:        bdver2
      llvm_triple:     x86_64-unknown-linux-gnu
      num_repetitions: 10000
      measurements:
        - { key: PdFPU0, value: 0, per_snippet_value: 0 }
        - { key: PdFPU1, value: 0, per_snippet_value: 0 }
        - { key: PdFPU2, value: 0, per_snippet_value: 0 }
        - { key: PdFPU3, value: 0, per_snippet_value: 0 }
        - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
      error:           ''
      info:            ''
      assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
      ...
      ```
      ^ latency increased (worse): the per-snippet latency goes from 2.6552 to 2.9816, while NumMicroOps stays at 5.0284.
      
      Except //maybe// not really.
      Like with all synthetic benchmarks, they //may// be misleading.
      
      Let's take a look at an actual real-world hot path.
      In this case it's 'my' [[ https://github.com/darktable-org/rawspeed | RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ https://github.com/darktable-org/rawspeed/blob/e3316dc85127c2c29baa40f998f198a7b278bf36/src/librawspeed/decompressors/VC5Decompressor.cpp#L814 | GoPro VC5 decompressor ]]:
      ```
      raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR
      RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM
      2018-12-22 21:23:03
      Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench
      Run on (8 X 4012.81 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 3.41, 2.41, 2.03
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      GOPR9172.GPR/threads:8/real_time_mean           40 ms         40 ms        128   0.322244          7.96974        12M       37.4457M        298.534M      3.12047       24.8778   0.040465
      GOPR9172.GPR/threads:8/real_time_median         39 ms         39 ms        128   0.312606          7.99155        12M        38.387M        306.788M      3.19891       25.5656   0.039115
      GOPR9172.GPR/threads:8/real_time_stddev          4 ms          3 ms        128  0.0271557         0.130575          0        2.4941M        21.3909M     0.207842       1.78257   3.81081m
      RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9
      2018-12-22 21:23:08
      Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
      Run on (8 X 4013.1 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 3.78, 2.50, 2.06
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      GOPR9172.GPR/threads:8/real_time_mean           39 ms         39 ms        128   0.311533          7.97323        12M       38.6828M        308.471M      3.22356        25.706  0.0390928
      GOPR9172.GPR/threads:8/real_time_median         38 ms         38 ms        128   0.304231          7.99005        12M       39.4437M        315.527M      3.28698        26.294  0.0380316
      GOPR9172.GPR/threads:8/real_time_stddev          3 ms          3 ms        128  0.0229149         0.133814          0       2.26225M        19.1421M     0.188521       1.59517   3.13671m
      Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
      Benchmark                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
      --------------------------------------------------------------------------------------------------------------------------------------
      GOPR9172.GPR/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
      GOPR9172.GPR/threads:8/real_time_mean                  -0.0339         -0.0316            40            39            40            39
      GOPR9172.GPR/threads:8/real_time_median                -0.0277         -0.0274            39            38            39            38
      GOPR9172.GPR/threads:8/real_time_stddev                -0.1769         -0.1267             4             3             3             3
      ```
      I.e. this results in //roughly// a 3% improvement in perf.
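
The relative differences in the comparison table can be reproduced from the `WallTime,s` columns of the two runs. The sketch below assumes the comparison reports `(new - old) / old`; the helper name `relative_change` is made up for illustration.

```python
def relative_change(old, new):
    """Relative difference (new - old) / old; negative means the new value is smaller."""
    return (new - old) / old


# WallTime,s aggregates copied from the two benchmark tables above: (old, new).
wall_time_s = {
    "mean": (0.040465, 0.0390928),
    "median": (0.039115, 0.0380316),
}

changes = {
    name: relative_change(old, new) for name, (old, new) in wall_time_s.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.4f}")
```

The computed values agree with the `Time` column of the comparison (about -0.034 for the mean and -0.028 for the median).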
      
      While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]], it won't address it fully.
      
      Reviewers: RKSimon, craig.topper, andreadb, spatel
      
      Reviewed By: craig.topper
      
      Subscribers: courbet, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56052
      
      llvm-svn: 351253
      fb4eed38
    • David Callahan's avatar
      treat invoke like call · d129d3e9
      David Callahan authored
      Summary:
      InvokeInst should be treated like CallInst and
      assigned a separate discriminator. This is particularly
      important when an Invoke is converted to a Call
      during compilation, which can invalidate sample profile
      data collected with different link-time optimizations.
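
A heavily simplified model of the intent, in Python rather than the pass's actual C++: every call-like instruction on the same source line gets its own discriminator, and `invoke` is counted as call-like. The function name and the tuple representation are assumptions made for illustration, not the real pass logic.

```python
CALL_LIKE = {"call", "invoke"}


def assign_call_discriminators(instructions):
    """instructions: list of (opcode, source_line) pairs.

    Returns one discriminator per instruction. Call-like instructions that
    share a source line are numbered 0, 1, 2, ...; everything else gets 0.
    """
    seen_per_line = {}
    result = []
    for opcode, line in instructions:
        if opcode in CALL_LIKE:
            index = seen_per_line.get(line, 0)
            result.append(index)
            seen_per_line[line] = index + 1
        else:
            result.append(0)
    return result
```

With this numbering, replacing an `invoke` by a `call` leaves the discriminators unchanged, which is the property the commit is after.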
      
      Reviewers: twoh, Kader, danielcdh, wmi
      
      Reviewed By: wmi
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56491
      
      llvm-svn: 351251
      d129d3e9
    • Peter Collingbourne's avatar
      gn build: Move target flags from toolchain to a .gni file. · e6b1a341
      Peter Collingbourne authored
      While here, add a use_lld flag and default it to true when using
      clang on non-mac.
      
      Differential Revision: https://reviews.llvm.org/D56710
      
      llvm-svn: 351248
      e6b1a341
    • Matt Morehouse's avatar
      [SanitizerCoverage] Don't create comdat for interposable functions. · 19ff35c4
      Matt Morehouse authored
      Summary:
      Comdat groups override weak symbol behavior, allowing the linker to keep
      the comdats for weak symbols in favor of comdats for strong symbols.
      
      Fixes the issue described in:
      https://bugs.chromium.org/p/chromium/issues/detail?id=918662
      
      Reviewers: eugenis, pcc, rnk
      
      Reviewed By: pcc, rnk
      
      Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56516
      
      llvm-svn: 351247
      19ff35c4
    • Peter Collingbourne's avatar
      gn build: Add build files for compiler-rt/lib/{hwasan,interception,sanitizer_common,ubsan}. · 4a22fb18
      Peter Collingbourne authored
      This allows the hwasan runtime to be built for Android aarch64.
      
      Differential Revision: https://reviews.llvm.org/D56628
      
      llvm-svn: 351246
      4a22fb18
    • Peter Collingbourne's avatar
      gn build: Merge r351216, r351228. · 907ea9f1
      Peter Collingbourne authored
      llvm-svn: 351242
      907ea9f1
    • Alexey Bataev's avatar
      [SLP] Added test for PR40310, NFC. · 9514b1c6
      Alexey Bataev authored
      llvm-svn: 351240
      9514b1c6
    • Michael Trent's avatar
      llvm-objdump -m -D should disassemble all text segments · 7e660211
      Michael Trent authored
      Summary:
      When running llvm-objdump with the -macho option, objdump will by default
      disassemble only the __TEXT,__text section (or __TEXT_EXEC,__text when
      disassembling MH_KEXT_BUNDLE files). The -disassemble-all option is
      treated no differently than -disassemble.
      
      This change updates llvm-objdump's MachO parsing code to disassemble all
      __text sections found in a file when -disassemble-all is specified. This
      is useful for disassembling files with more than one __text section, or
      when disassembling files whose __text section is not present in __TEXT.
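
The selection rule described above can be modelled as a small filter. This is a sketch in Python, not llvm-objdump's code: the function name, its parameters, and the `(segment, section)` tuples are hypothetical.

```python
def sections_to_disassemble(sections, disassemble_all=False, is_kext_bundle=False):
    """Pick the (segment, section) pairs to disassemble.

    By default only the __text section of the default text segment is chosen;
    with disassemble_all, every section named __text is chosen.
    """
    default_segment = "__TEXT_EXEC" if is_kext_bundle else "__TEXT"
    selected = []
    for segment, section in sections:
        if section != "__text":
            continue
        if disassemble_all or segment == default_segment:
            selected.append((segment, section))
    return selected
```

For a file whose only `__text` section lives outside `__TEXT`, the default mode selects nothing in this model, while the disassemble-all mode still finds it.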
      
      I added a lit test case that verifies "llvm-objdump -m -d" and 
      "llvm-objdump -m -D" produce the expected results on a reference binary. 
      I also updated the CommandGuide documentation for llvm-objdump.rst and
      verified it renders correctly as man and html.
      
      rdar://42899338
      
      Reviewers: ab, pete, lhames
      
      Reviewed By: lhames
      
      Subscribers: rupprecht, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56649
      
      llvm-svn: 351238
      7e660211
    • Craig Topper's avatar
      [X86] Add versions of the avx512 gather intrinsics that take the mask as a... · 82015b63
      Craig Topper authored
      [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
      
      In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.
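
To illustrate the difference between the two mask representations, here is a small Python model. It assumes lane 0 corresponds to the least significant bit of the scalar mask; the helper names are made up for illustration and are not intrinsic names.

```python
def scalar_mask_to_lanes(mask, num_lanes):
    """Expand an integer mask into one boolean per vector lane (lane 0 = bit 0)."""
    return [bool((mask >> lane) & 1) for lane in range(num_lanes)]


def lanes_to_scalar_mask(lanes):
    """Pack per-lane booleans back into an integer mask (lane 0 = bit 0)."""
    mask = 0
    for lane, active in enumerate(lanes):
        if active:
            mask |= 1 << lane
    return mask
```

With the vXi1 form, a mask produced by a vector compare can be passed to the gather directly, without first being packed into a scalar and unpacked again.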
      
      I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics.
      
      I'll work on scatter and gatherpf/scatterpf next.
      
      Differential Revision: https://reviews.llvm.org/D56527
      
      llvm-svn: 351234
      82015b63