  1. Feb 25, 2019
    • Ganesh Gopalasubramanian's avatar
      Test commit (remove a blank space) · f03939fc
      Ganesh Gopalasubramanian authored
      Change-Id: I69175571d3b1defeb85e96fdd87db5c3ccadcb63
      llvm-svn: 354775
      f03939fc
    • Simon Pilgrim's avatar
      [TTI] Add generic cost model for fixed point smul/umul · 9caf0f0d
      Simon Pilgrim authored
      Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline.
      
      Differential Revision: https://reviews.llvm.org/D57925
      
      llvm-svn: 354774
      9caf0f0d
    • Alexey Bader's avatar
      [SYCL] Add clang front-end option to enable SYCL device compilation flow. · 3f62fa69
      Alexey Bader authored
      Patch by Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
      
      llvm-svn: 354773
      3f62fa69
    • Simon Atanasyan's avatar
      [mips] Reduce number of tools invocations in the test. NFC · 478cd32b
      Simon Atanasyan authored
      llvm-svn: 354772
      478cd32b
    • Simon Pilgrim's avatar
      [X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483) · c61f1e8e
      Simon Pilgrim authored
      Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.
      
      Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.
      
      Differential Revision: https://reviews.llvm.org/D58597
      
      llvm-svn: 354771
      c61f1e8e
    • James Henderson's avatar
      [yaml2obj] Re-allow dynamic sections to have raw content · fd99780c
      James Henderson authored
      Recently, support was added to yaml2obj to allow dynamic sections to
      have a list of entries, to make it easier to write tests with dynamic
      sections. However, this change also removed the ability to provide
      custom contents to the dynamic section, making it hard to test
      malformed contents (e.g. because the section is not a valid size to
      contain an array of entries). This change reinstates this. An error is
      emitted if raw content and dynamic entries are both specified.
      
      Reviewed by: grimar, ruiu
      
      Differential Revision: https://reviews.llvm.org/D58543
      
      llvm-svn: 354770
      fd99780c
    • Peter Smith's avatar
      [ELF][ARM] Accept and ignore -p and -no-pipeline-knowledge · 777e1cfd
      Peter Smith authored
      The Linux kernel uses an old flag, -p/-no-pipeline-knowledge, that is
      accepted by bfd and gold but ignored by modern versions of them. The
      original option is very old and predates the ABI; it sometimes comes up in
      code bases that supported pre-ABI toolchains. The Linux kernel uses
      it in 3 places in the ARM-specific section.
      
      Differential Revision: https://reviews.llvm.org/D58540
      
      llvm-svn: 354769
      777e1cfd
    • Simon Tatham's avatar
      [ARM] Make fullfp16 instructions not conditionalisable. · b70fc0c5
      Simon Tatham authored
      More or less all the instructions defined in the v8.2a full-fp16
      extension are defined as UNPREDICTABLE if you put them in an IT block
      (Thumb) or use with any condition other than AL (ARM). LLVM didn't
      know that, and was happy to conditionalise them.
      
      In order to force these instructions to count as not predicable, I had
      to make a small TableGen change. The code generation back end mostly
      decides if an instruction is predicable by looking for something it
      can identify as a predicate operand; there's an isPredicable bit flag
      that overrides that check in the positive direction, but nothing that
      overrides it in the negative direction.
      
      (I considered the alternative approach of actually removing the
      predicate operand from those instructions, but thought that it would
      be more painful overall for instructions differing only in data type
      to have different shapes of operand list. This way, the only code that
      has to notice the difference is the if-converter.)
      
      So I've added an isUnpredicable bit alongside isPredicable, and set
      that bit on the right subset of FP16 instructions, and also on the
      VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
      unpredicable for all data types.
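
The override logic described above can be sketched as follows. This is a minimal illustration in Python, not the TableGen or C++ code from the patch: the `InstrDef` record, its field names, and the two example instruction names are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class InstrDef:
    """Hypothetical stand-in for an instruction record (not the real TableGen class)."""
    name: str
    has_predicate_operand: bool
    is_predicable: bool = False    # positive override described in the message
    is_unpredicable: bool = False  # negative override added by this change


def counts_as_predicable(instr: InstrDef) -> bool:
    # The negative override wins, so an instruction can keep its predicate
    # operand in the operand list and still be treated as not predicable.
    if instr.is_unpredicable:
        return False
    return instr.is_predicable or instr.has_predicate_operand


# Illustrative names only: two instructions with the same operand shape.
vadd_f32 = InstrDef("VADD.F32", has_predicate_operand=True)
vadd_f16 = InstrDef("VADD.F16", has_predicate_operand=True, is_unpredicable=True)

print(counts_as_predicable(vadd_f32), counts_as_predicable(vadd_f16))
```

Keeping the operand list identical and flipping only a flag mirrors the design choice in the message: instructions that differ only in data type keep the same shape.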
      
      I've included a couple of representative regression tests, both of
      which previously caused an fp16 instruction to be conditionalised in
      ARM state and (with -arm-no-restrict-it) to be put in an IT block in
      Thumb.
      
      Reviewers: SjoerdMeijer, t.p.northover, efriedma
      
      Reviewed By: efriedma
      
      Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D57823
      
      llvm-svn: 354768
      b70fc0c5
    • Roman Lebedev's avatar
      [llvm-exegesis] Split Epsilon param into two (PR40787) · 542e5d7b
      Roman Lebedev authored
      Summary:
      This eps param is used for two distinct things:
      * initial point clusterization
      * checking clusters against the llvm values
      
      What if one wants to only look at highly different clusters, without changing
      the clustering itself? In particular, this helps to weed out noisy measurements
      (since the clusterization epsilon is still small, so there is a better chance
      that noisy measurements from the same opcode will go into different clusters)
      
      By splitting it into two params it is now possible.
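
To illustrate why two separate thresholds are useful, here is a small Python sketch. It is not llvm-exegesis's clustering algorithm: the greedy 1-D grouping, the function names, and the parameter names `eps_cluster` and `eps_inconsistency` are all assumptions chosen for the example.

```python
def cluster_1d(values, eps_cluster):
    """Greedy 1-D grouping: sorted neighbours within eps_cluster share a cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= eps_cluster:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def inconsistent_clusters(clusters, expected, eps_inconsistency):
    """Keep clusters whose mean is further than eps_inconsistency from the expected value."""
    flagged = []
    for cluster in clusters:
        mean = sum(cluster) / len(cluster)
        if abs(mean - expected) > eps_inconsistency:
            flagged.append(cluster)
    return flagged


# Made-up latency measurements for one opcode; the model expects 1.0.
measured = [1.00, 1.02, 1.30, 1.31, 3.00]

# A small clustering epsilon keeps the groups tight ...
clusters = cluster_1d(measured, eps_cluster=0.05)
# ... while a larger inconsistency epsilon reports only the big outlier.
report = inconsistent_clusters(clusters, expected=1.0, eps_inconsistency=0.5)
print(clusters, report)
```

With a single shared epsilon, raising the reporting threshold would also have merged the tight clusters together.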
      
      This is nearly-free performance-wise:
      Old:
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 10099 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
      
                  390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                      12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                       0      cpu-migrations            #    0.000 K/sec
                    4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
              1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
               185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
               392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
              1839236666      instructions              #    1.18  insn per cycle
                                                        #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
               407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
                10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)
      
                0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
      ```
      ```
      $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 50572 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
      
                 6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
                     262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                       0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
                   13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
             27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
              1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
             16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
             17611143370      instructions              #    0.65  insn per cycle
                                                        #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
              3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
               116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)
      
                  6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
      ```
      New:
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 10099 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
      
                  400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                      12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                       0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
                    4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
              1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
               199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
               402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
              1847783963      instructions              #    1.15  insn per cycle
                                                        #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
               407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
                10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)
      
                 0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )
      
      ```
      ```
      $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 50572 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
      
                 6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
                     217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                       1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
                   13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
             27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
              1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
             16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
             17755545931      instructions              #    0.64  insn per cycle
                                                        #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
              3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
               117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)
      
                  6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
      ```
      
      I.e. it's a +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
      Within noise, I'd say.
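
The percentages follow directly from the elapsed times reported by `perf stat` above. A short Python check, with the four timings copied from those runs (the helper name is an assumption):

```python
# Elapsed seconds from the perf stat output above.
old_one_sweep, new_one_sweep = 0.390629, 0.40077   # 10099 benchmark points
old_five_sweeps, new_five_sweeps = 6.8118, 6.9507  # 50572 benchmark points


def slowdown_pct(old, new):
    """Relative slowdown of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


print(round(slowdown_pct(old_one_sweep, new_one_sweep), 1))
print(round(slowdown_pct(old_five_sweeps, new_five_sweeps), 1))
```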
      
      Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].
      
      Reviewers: courbet, gchatelet
      
      Reviewed By: courbet
      
      Subscribers: tschuett, RKSimon, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58476
      
      llvm-svn: 354767
      542e5d7b
    • Pavel Labath's avatar
      Finish revert of r354706 · ad96b0e6
      Pavel Labath authored
      The revert in r354711 wasn't complete. Finish the job.
      
      llvm-svn: 354766
      ad96b0e6
    • Kadir Cetinkaya's avatar
      [clangd] Add thread priority lowering for MacOS as well · f47177dd
      Kadir Cetinkaya authored
      Reviewers: ilya-biryukov
      
      Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D58492
      
      llvm-svn: 354765
      f47177dd
    • Roman Lebedev's avatar
      [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert" · 49b6f81a
      Roman Lebedev authored
      Summary:
      This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert
      
      Abstractions are great.
      Readable code is great.
      JSON support library is a *good* idea.
      
      However unfortunately, there is an internal detail that one needs
      to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`.
      So for **every** `llvm::json::Object`, even if you only store a single `int`
      entry there, you pay the whole price of `llvm::DenseMap`.
      
      Unfortunately, it matters for `llvm-xray`.
      
      I was trying to analyse the `llvm-exegesis` analysis mode performance,
      and for that I wanted to view the LLVM X-Ray log visualization in the Chrome
      trace viewer. But `llvm-xray convert` was sluggish, and sometimes
      even ended up being killed by OOM.
      
      `xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
      (compiled with ` -fxray-instruction-threshold=128`)
      analysis mode over `-benchmarks-file` with 10099 points (one full
      latency measurement set), with normal runtime of 0.387s.
      
      Timings:
      Old: (copied from D58580)
      ```
      $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
      
       Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
      
                21346.24 msec task-clock                #    1.000 CPUs utilized            ( +-  0.28% )
                     314      context-switches          #   14.701 M/sec                    ( +- 59.13% )
                       1      cpu-migrations            #    0.037 M/sec                    ( +-100.00% )
                 2181354      page-faults               # 102191.251 M/sec                  ( +-  0.02% )
             85477442102      cycles                    # 4004415.019 GHz                   ( +-  0.28% )  (83.33%)
             14526427066      stalled-cycles-frontend   #   16.99% frontend cycles idle     ( +-  0.70% )  (83.33%)
             32371533721      stalled-cycles-backend    #   37.87% backend cycles idle      ( +-  0.27% )  (33.34%)
             67896890228      instructions              #    0.79  insn per cycle
                                                        #    0.48  stalled cycles per insn  ( +-  0.03% )  (50.00%)
             14592654840      branches                  # 683631198.653 M/sec               ( +-  0.02% )  (66.67%)
               212207534      branch-misses             #    1.45% of all branches          ( +-  0.94% )  (83.34%)
      
                 21.3502 +- 0.0585 seconds time elapsed  ( +-  0.27% )
      ```
      New:
      ```
      $ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
      
       Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):
      
                 7178.38 msec task-clock                #    1.000 CPUs utilized            ( +-  0.26% )
                     182      context-switches          #   25.402 M/sec                    ( +- 28.84% )
                       0      cpu-migrations            #    0.046 M/sec                    ( +- 70.71% )
                   33701      page-faults               # 4694.994 M/sec                    ( +-  0.88% )
             28761053971      cycles                    # 4006833.933 GHz                   ( +-  0.26% )  (83.32%)
              2028297997      stalled-cycles-frontend   #    7.05% frontend cycles idle     ( +-  1.61% )  (83.32%)
             10773154901      stalled-cycles-backend    #   37.46% backend cycles idle      ( +-  0.38% )  (33.36%)
             36199132874      instructions              #    1.26  insn per cycle
                                                        #    0.30  stalled cycles per insn  ( +-  0.03% )  (50.02%)
              6434504227      branches                  # 896420204.421 M/sec               ( +-  0.03% )  (66.68%)
                73355176      branch-misses             #    1.14% of all branches          ( +-  1.46% )  (83.33%)
      
                  7.1807 +- 0.0190 seconds time elapsed  ( +-  0.26% )
      ```
      
      So using `llvm::json` nearly triples run-time on that test case.
      (That is a factor of about 3x, not a percentage.)
      
      Memory:
      Old:
      ```
      total runtime: 39.88s.
      bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
      calls to allocation functions: 33267816 (834135/s)
      temporary memory allocations: 5832298 (146235/s)
      peak heap memory consumption: 9.21GB
      peak RSS (including heaptrack overhead): 147.98GB
      total memory leaked: 1.09MB
      ```
      New:
      ```
      total runtime: 17.42s.
      bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
      calls to allocation functions: 21382982 (1227284/s)
      temporary memory allocations: 232858 (13364/s)
      peak heap memory consumption: 350.69MB
      peak RSS (including heaptrack overhead): 2.55GB
      total memory leaked: 79.95KB
      ```
      Diff:
      ```
      total runtime: -22.46s.
      bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
      calls to allocation functions: -11884834 (529155/s)
      temporary memory allocations: -5599440 (249307/s)
      peak heap memory consumption: -8.86GB
      peak RSS (including heaptrack overhead): 0B
      total memory leaked: -1.01MB
      ```
      So using `llvm::json` increases *peak* heap memory consumption on *this* testcase by roughly 27x,
      and the total bytes allocated by roughly 15x (the number of allocation calls grows by about 1.6x). These are factors, *not* percentages.
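
The factors can be recomputed from the heaptrack summaries above. This Python sketch copies the figures from the Old and New blocks; treating GB and MB as binary units (factor 1024) is an assumption, and decimal units would give roughly 26x for the peak instead.

```python
MB = 1024 ** 2  # assumption: binary units
GB = 1024 ** 3

old = {"peak_heap": 9.21 * GB, "bytes_allocated": 79.07 * GB,
       "alloc_calls": 33267816, "temp_allocs": 5832298}
new = {"peak_heap": 350.69 * MB, "bytes_allocated": 5.12 * GB,
       "alloc_calls": 21382982, "temp_allocs": 232858}

# Old-to-new ratio for every metric.
ratios = {key: old[key] / new[key] for key in old}

for key, value in ratios.items():
    print(f"{key}: {value:.2f}x")
```

Note that the number of allocation calls grows far less than the number of bytes allocated.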
      
      Note also that memory usage is clearly unbounded with `llvm::json`: it depends directly
      on the length of the log, so peak memory consumption grows with the input.
      This isn't so with the dumb code: there is no accumulating memory consumption, and
      peak memory consumption is fixed. Naturally, that means it will handle *much*
      larger logs without OOM'ing.
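
The difference between accumulating a document and writing it out as you go can be sketched in Python. This is an illustration of the general streaming technique, not the `llvm-xray` code; the function names and the fake log are assumptions, and the record fields follow the Chrome trace event format only loosely.

```python
import io
import json


def write_trace_streaming(records, out):
    """Emit {"traceEvents": [...]} one record at a time; no tree is kept in memory."""
    out.write('{"traceEvents":[')
    first = True
    for record in records:  # may be a generator over an arbitrarily long log
        if not first:
            out.write(",")
        out.write(json.dumps(record))
        first = False
    out.write("]}")


def fake_log(n):
    """Hypothetical log source that yields records lazily."""
    for i in range(n):
        yield {"name": f"f{i}", "ph": "B", "ts": i, "pid": 1, "tid": 1}


buf = io.StringIO()
write_trace_streaming(fake_log(3), buf)
parsed = json.loads(buf.getvalue())
print(len(parsed["traceEvents"]))
```

Only one record is alive at a time in the writer, so its memory use does not grow with the number of records.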
      
      Readability is good, but the price is simply unacceptable here.
      Too bad none of this analysis was done as part of the development/review of D50129 itself.
      
      Reviewers: dberris, kpw, sammccall
      
      Reviewed By: dberris
      
      Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58584
      
      llvm-svn: 354764
      49b6f81a
    • Craig Topper's avatar
      [SelectionDAG] Add a OPC_CheckChild2CondCode to SelectionDAGISel to remove a... · 8c9724ea
      Craig Topper authored
      [SelectionDAG] Add a OPC_CheckChild2CondCode to SelectionDAGISel to remove a MoveChild and MoveParent pair.
      
      OPC_CheckCondCode is always used as operand 2 of a setcc, and it's always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2.
      
      This saves ~3000 bytes in the X86 table.
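
The 4-to-2 byte saving can be illustrated with a toy matcher-table interpreter in Python. The opcode numbering, the condition-code value, and the node layout are assumptions made for the sketch; only the opcode names come from the message above.

```python
# Hypothetical one-byte opcode values; the real table encoding is not shown here.
MOVE_CHILD2, CHECK_COND_CODE, MOVE_PARENT, CHECK_CHILD2_COND_CODE = range(4)
SETEQ = 17  # made-up condition-code value

old_seq = bytes([MOVE_CHILD2, CHECK_COND_CODE, SETEQ, MOVE_PARENT])  # 4 bytes
new_seq = bytes([CHECK_CHILD2_COND_CODE, SETEQ])                     # 2 bytes


def matches(table, node):
    """Run a tiny matcher table against node = {"children": [...]}."""
    stack, cur, i = [], node, 0
    while i < len(table):
        op = table[i]
        if op == MOVE_CHILD2:
            stack.append(cur)
            cur = cur["children"][2]
            i += 1
        elif op == MOVE_PARENT:
            cur = stack.pop()
            i += 1
        elif op == CHECK_COND_CODE:
            if cur.get("cc") != table[i + 1]:
                return False
            i += 2
        elif op == CHECK_CHILD2_COND_CODE:
            # Fused form: inspect child 2 without moving the cursor.
            if cur["children"][2].get("cc") != table[i + 1]:
                return False
            i += 2
        else:
            raise ValueError(f"unknown opcode {op}")
    return True


setcc = {"children": [{}, {}, {"cc": SETEQ}]}
print(matches(old_seq, setcc), matches(new_seq, setcc), len(old_seq), len(new_seq))
```

Both sequences accept the same node; the fused opcode simply avoids the cursor moves.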
      
      llvm-svn: 354763
      8c9724ea
    • Kang Zhang's avatar
      [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction... · 4faa4090
      Kang Zhang authored
      [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts
      
      Summary:
      Fast selection of the LLVM fptoi and fptrunc instructions does not handle
      VSX instruction support well.
      We should use the VSX float-to-integer conversion instruction instead of the
      non-VSX one if the operand register class is VSSRC or VSFRC, because i32
      and i64 are mapped to VSSRC and VSFRC respectively when the VSX feature is
      enabled.
      For the float truncation instruction, we do similar work to try to use the
      VSX instruction.
      
      Reviewed By: jsji
      
      Differential Revision: https://reviews.llvm.org/D58430
      
      llvm-svn: 354762
      4faa4090
    • Marc-Andre Laperle's avatar
      [clangd] Enhance macro hover to see full definition · 25e69027
      Marc-Andre Laperle authored
      Summary: Signed-off-by: Marc-Andre Laperle <malaperle@gmail.com>
      
      Reviewers: simark, ilya-biryukov, sammccall, ioeric, hokein
      
      Reviewed By: ilya-biryukov
      
      Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D55250
      
      llvm-svn: 354761
      25e69027
  2. Feb 24, 2019
    • Nikita Popov's avatar
      [InstCombine] Add tests for PR40846; NFC · b7918f3c
      Nikita Popov authored
      The icmps are the same as the overflow result of the intrinsic.
      
      llvm-svn: 354760
      b7918f3c
    • Nikita Popov's avatar
      [InstCombine] Move with.overflow tests to separate file; NFC · bdefe478
      Nikita Popov authored
      And regenerate checks. I had to rename some variables, because
      update_test_checks can't deal with the same variable names used
      in lower and upper case. I've also dropped the result type aliases,
      as just using the type directly gives a cleaner result.
      
      llvm-svn: 354759
      bdefe478
    • Simon Pilgrim's avatar
      [X86] Add PR40483 test cases · f43c48cb
      Simon Pilgrim authored
      Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops
      
      llvm-svn: 354758
      f43c48cb
    • Simon Pilgrim's avatar
      [X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637) · cfaf663a
      Simon Pilgrim authored
      It's proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.
      
      llvm-svn: 354757
      cfaf663a
    • Craig Topper's avatar
      [X86] Fix tls variable lowering issue with large code model · 3fe4bd46
      Craig Topper authored
      Summary:
      The problem here is the lowering for tls variable. Below is the DAG for the code.
      SelectionDAG has 11 nodes:
      
      t0: ch = EntryToken
            t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
              t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
            t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
          t12: i64 = add t8, t11
        t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
      t6: ch = CopyToReg t0, Register:i32 %0, t4
      When the code model (mcmodel) is large, the instructions below cannot be folded:
      
        t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
      t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
      So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"
      
      When LLVM starts to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.
      
      The patch is to fold the load and X86ISD::WrapperRIP.
      
      Fixes PR26906
      
      Patch by LuoYuanke
      
      Reviewers: craig.topper, rnk, annita.zhang, wxiao3
      
      Reviewed By: rnk
      
      Subscribers: llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58336
      
      llvm-svn: 354756
      3fe4bd46
    • Craig Topper's avatar
      [X86][SSE] Use pblendw for v4i32/v2i64 during isel. · 5532a987
      Craig Topper authored
      Summary:
      
      Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was an integer. So use pblendw instead.
      
      Reviewers: spatel, RKSimon
      
      Reviewed By: RKSimon
      
      Subscribers: jdoerfert, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58574
      
      llvm-svn: 354755
      5532a987
    • Craig Topper's avatar
      [X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake. · ce2bd19c
      Craig Topper authored
      Summary:
      The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.
      
      The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.
      
      Reviewers: RKSimon, andreadb
      
      Reviewed By: RKSimon
      
      Subscribers: gbedwell, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58581
      
      llvm-svn: 354754
      ce2bd19c
    • Craig Topper's avatar
      [LegalizeTypes][AArch64][X86] Make type legalization of vector... · be334857
      Craig Topper authored
      [LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents
      
      Summary:
      When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.
      
      We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.
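
The conversion from a scalar boolean to a vector lane whose bits are all equal can be sketched in Python. This models the idea only and is not the SelectionDAG code; the function names and the 32-bit lane width are assumptions.

```python
def uaddo_scalar(a, b, bits):
    """Unsigned add with overflow on one lane: returns (wrapped sum, 0/1 overflow bit)."""
    mask = (1 << bits) - 1
    total = (a + b) & mask
    return total, int(total < a)


def bool_to_lane(overflow_bit, bits):
    """'Select with constants': all-ones for true, zero for false."""
    all_ones = (1 << bits) - 1
    return all_ones if overflow_bit else 0


def unrolled_uaddo(lhs, rhs, bits=32):
    """Apply the scalar op per lane, then widen each boolean to a full lane."""
    sums, overflows = [], []
    for a, b in zip(lhs, rhs):
        total, ov = uaddo_scalar(a, b, bits)
        sums.append(total)
        overflows.append(bool_to_lane(ov, bits))
    return sums, overflows


sums, overflows = unrolled_uaddo([0xFFFFFFFF, 1], [1, 1])
print(sums, [hex(x) for x in overflows])
```

Placing the raw 0/1 bit in the lane would instead leave only bit 0 set, which is the bug the message describes.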
      
      Reviewers: spatel, RKSimon, nikic
      
      Reviewed By: nikic
      
      Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58567
      
      llvm-svn: 354753
      be334857
    • Kristina Brooks's avatar
      Fix accidentally used hard tabs. NFC · 103799c0
      Kristina Brooks authored
      Big sorry. This undoes the indentation mess I made
      in r354751.
      
      llvm-svn: 354752
      103799c0
    • Kristina Brooks's avatar
      Wrap code for builtin_assume_aligned at 80 col. NFC · 716cbfb4
      Kristina Brooks authored
      Minor style fix to avoid going over 80 cols in handling
      of case for Builtin::BI__builtin_assume_aligned. NFC.
      
      llvm-svn: 354751
      716cbfb4
    • Sanjay Patel's avatar
      [InstCombine] add test for icmp+add fold; NFC · 26aa7024
      Sanjay Patel authored
      llvm-svn: 354750
      26aa7024
    • Simon Pilgrim's avatar
      [X86][AVX] Rename lowerShuffleByMerging128BitLanes to... · 4f4f9abd
      Simon Pilgrim authored
      [X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC.
      
      Name better matches the other similar 'lane permute' and 'repeated mask' functions we have.
      
      llvm-svn: 354749
      4f4f9abd
    • Sanjay Patel's avatar
      [InstCombine] canonicalize add/sub with bool · 9907d3c8
      Sanjay Patel authored
      add A, sext(B) --> sub A, zext(B)
      
      We have to choose 1 of these forms, so I'm opting for the
      zext because that's easier for value tracking.
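
The two forms are equivalent because a true i1 sign-extends to -1 and zero-extends to 1. A brute-force check over 8-bit values in Python (the bit width and helper names are assumptions chosen to keep the check small):

```python
BITS = 8
MASK = (1 << BITS) - 1


def sext_i1(b):
    """Sign-extend a 1-bit value: true becomes all ones (-1)."""
    return MASK if b else 0


def zext_i1(b):
    """Zero-extend a 1-bit value: true becomes 1."""
    return 1 if b else 0


def forms_agree():
    """Check add A, sext(B) == sub A, zext(B) for every 8-bit A and boolean B."""
    for a in range(1 << BITS):
        for b in (0, 1):
            if (a + sext_i1(b)) & MASK != (a - zext_i1(b)) & MASK:
                return False
    return True


print(forms_agree())
```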
      
      The backend should be prepared for this change after:
      D57401
      rL353433
      
      This is also a preliminary step towards reducing the amount
      of bit hackery that we do in IR to optimize icmp/select.
      That should wait until a later optimization stage.
      
      The seeming regression in the fuzzer test was discussed in:
      D58359
      
      We were only managing that fold in instcombine by luck, and
      other passes should be able to deal with that better anyway.
      
      llvm-svn: 354748
      9907d3c8
    • Sanjay Patel's avatar
      [InstCombine] regenerate checks; NFC · 986a024c
      Sanjay Patel authored
      llvm-svn: 354747
      986a024c
    • Sanjay Patel's avatar
      [CGP] add special-cases to form unsigned add with overflow (PR40486) · cb04ba03
      Sanjay Patel authored
      There's likely a missed IR canonicalization for at least 1 of these
      patterns. Otherwise, we wouldn't have needed the pattern-matching
      enhancement in D57516.
      
      Note that -- unlike usubo added with D57789 -- the TLI hook for
      this transform defaults to 'on'. So if there's any perf fallout
      from this, targets should look at how they're lowering the uaddo
      node in SDAG and/or override that hook.
      
      The x86 diffs suggest that there's some missing pattern-matching
      for forming inc/dec.
      
      This should fix the remaining known problems in:
      https://bugs.llvm.org/show_bug.cgi?id=40486
      https://bugs.llvm.org/show_bug.cgi?id=31754
      
      llvm-svn: 354746
      cb04ba03
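The commit message does not list the special-cased patterns. As background only, a familiar way to express an unsigned add overflow check is to compare the wrapped sum against one operand. The sketch below, with hypothetical helper names and an assumed default width of 32 bits, models a uaddo-style result and that compare idiom:

```python
def uaddo(a, b, width=32):
    """Model an unsigned add with overflow: return (wrapped sum, overflow flag)."""
    mask = (1 << width) - 1
    total = a + b
    return total & mask, total > mask


def overflow_by_compare(a, b, width=32):
    """Detect overflow with the compare idiom: the wrapped sum is below an operand."""
    mask = (1 << width) - 1
    return ((a + b) & mask) < a


def idioms_agree(width=4):
    """Exhaustively compare both overflow checks at a small width."""
    return all(
        uaddo(a, b, width)[1] == overflow_by_compare(a, b, width)
        for a in range(1 << width)
        for b in range(1 << width)
    )
```

The two checks agree because a wrapped sum is smaller than either operand exactly when the true sum exceeded the width.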
    • Heejin Ahn's avatar
      [WebAssembly] Rename a variable in CFGStackify (NFC) · 20cf0749
      Heejin Ahn authored
      llvm-svn: 354744
      20cf0749
    • Heejin Ahn's avatar
      [WebAssembly] Merge two identical switch case routines into one (NFC) · 25d924b4
      Heejin Ahn authored
      llvm-svn: 354743
      25d924b4
    • Michael Liao's avatar
      Typo: s/CHCCK/CHECK · 7faef3d1
      Michael Liao authored
      llvm-svn: 354742
      7faef3d1
    • Michael Liao's avatar
      [NFC] Minor coding style (indent) fix. · 8676f12a
      Michael Liao authored
      llvm-svn: 354741
      8676f12a
    • Philip Reames's avatar
      [Hexagon, SystemZ] Be super conservative about atomics · 33d7e49b
      Philip Reames authored
      As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in-tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
      
      Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.  
      
      llvm-svn: 354740
      33d7e49b
    • Duncan P. N. Exon Smith's avatar
      VFS: Avoid some unnecessary std::string copies · e7b94649
      Duncan P. N. Exon Smith authored
      Thread Twine a little deeper through the VFS to avoid unnecessarily
      constructing the same std::string twice in a parameter sequence:
      
          Twine -> std::string -> StringRef -> std::string
      
      Changing a few parameters from StringRef to Twine avoids the early call
      to `Twine::str()`.
      
      llvm-svn: 354739
      e7b94649
  3. Feb 23, 2019
    • Craig Topper's avatar
      [TwoAddressInstructionPass] After commuting an instruction and before trying... · dc185522
      Craig Topper authored
      [TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands.
      
      The new instruction might have fewer operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.
      
      X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.
      
      Really, this whole check for more commutable operands is a little fragile. It assumes that the new instruction's operands are in the same order and positions as the original's, except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a FIXME. Fixing this will likely require an interface change.
      
      llvm-svn: 354738
      dc185522
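The hazard described above, a loop bound that goes stale when the instruction shrinks, can be sketched outside LLVM. This is a toy model and not the pass's code: the instruction is a plain list of operand names, and `blend_to_movss` is a hypothetical stand-in for a commute that drops one operand.

```python
def blend_to_movss(instr, idx):
    """Hypothetical commute: at operand 1, shrink a 4-operand instruction to 3."""
    if idx == 1 and len(instr) == 4:
        instr.pop()            # the rewritten instruction has one operand fewer
        return True
    return False


def visit_operands(instr, commute):
    """Walk the operands, re-reading the operand count after every commute."""
    visited = []
    num_ops = len(instr)
    idx = 0
    while idx < num_ops:
        visited.append(instr[idx])
        if commute(instr, idx):
            num_ops = len(instr)   # resample; the old count may now be too large
        idx += 1
    return visited
```

Without the resampling line, the loop would still run to index 3 and read past the end of the shortened list.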
    • Craig Topper's avatar
      Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" · be9eeb55
      Craig Topper authored
      And its follow-ups r354511, r354640.
      
      A follow-up patch will fix the issue that caused it to be reverted.
      
      llvm-svn: 354737
      be9eeb55
    • Richard Smith's avatar
      Enable coroutines under -std=c++2a. · 10ab78e8
      Richard Smith authored
      llvm-svn: 354736
      10ab78e8