  1. Apr 12, 2013
  2. Apr 11, 2013
  3. Apr 10, 2013
  4. Apr 09, 2013
      Add support for bottom-up SLP vectorization infrastructure. · 2d9dec32
      Nadav Rotem authored
      This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
      The infrastructure has three potential users:
      
        1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
      
        2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
      
        3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
      
      This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:
      
      void SAXPY(int *x, int *y, int a, int i) {
        x[i]   = a * x[i]   + y[i];
        x[i+1] = a * x[i+1] + y[i+1];
        x[i+2] = a * x[i+2] + y[i+2];
        x[i+3] = a * x[i+3] + y[i+3];
      }
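The effect of the transformation can be sketched in plain C. This is an illustration of what the vectorizer aims for, not the actual generated code: the loop stands in for a single 4-wide vector operation, and both function names are hypothetical.

```c
/* Scalar form, as in the commit message. */
void saxpy_scalar(int *x, int *y, int a, int i) {
    x[i]   = a * x[i]   + y[i];
    x[i+1] = a * x[i+1] + y[i+1];
    x[i+2] = a * x[i+2] + y[i+2];
    x[i+3] = a * x[i+3] + y[i+3];
}

/* What bottom-up SLP conceptually produces: the four independent
 * scalar statements collapse into one 4-lane multiply-add, which a
 * compiler can map onto a single vector instruction. */
void saxpy_vectorized(int *x, int *y, int a, int i) {
    for (int lane = 0; lane < 4; ++lane)   /* one <4 x i32> operation */
        x[i + lane] = a * x[i + lane] + y[i + lane];
}
```

The key property the infrastructure must prove is that the four statements form a consecutive, independent chain, so lane order does not matter.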
      
      llvm-svn: 179117
  5. Apr 05, 2013
  6. Mar 14, 2013
  7. Mar 10, 2013
  8. Mar 09, 2013
  9. Mar 08, 2013
  10. Mar 02, 2013
      PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. · 739e37a0
      Nadav Rotem authored
      The LoopVectorizer often runs multiple times on the same function due to inlining.
      When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer puts metadata on scalar loops during vectorization and marks them as 'already vectorized', so that it knows to ignore them when it sees them a second time.
      
      PR14448.
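The idempotence guard can be sketched as follows. The real patch attaches LLVM loop metadata; this toy `struct Loop` and its flag are hypothetical stand-ins for that mechanism.

```c
#include <stdbool.h>

/* Hypothetical model of the "already vectorized" marker: a flag on a
 * toy loop struct plays the role of the metadata the patch attaches
 * to scalar loops. */
struct Loop {
    bool already_vectorized;
    int times_vectorized;
};

bool try_vectorize(struct Loop *l) {
    if (l->already_vectorized)    /* second run after inlining: skip */
        return false;
    l->times_vectorized++;        /* ...emit the vector loop here... */
    l->already_vectorized = true; /* mark so later runs ignore it */
    return true;
}
```

Running the pass twice over the same loop now vectorizes it exactly once.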
      
      llvm-svn: 176399
  11. Mar 01, 2013
  12. Feb 27, 2013
  13. Feb 21, 2013
      Allow GlobalValues to vectorize with AliasAnalysis · cf928cb5
      Renato Golin authored
We store the load/store instructions together with the values
and inspect them using Alias Analysis to make sure
they don't alias, since the GEP pointer operand doesn't
take the offset into account.
      
To avoid adding any extra cost to loads and stores
that don't overlap on global values, AA is *only* calculated
if all of the previous attempts failed.
      
We use the biggest vector register size as the stride for the
vectorization access, since we're being conservative and
the cost model (which calculates the real vectorization
factor) only runs after the legalization phase.
      
      We might re-think this relationship in the future, but
      for now, I'd rather be safe than sorry.
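The underlying overlap question can be sketched as a byte-interval check. This is a hypothetical simplification: two accesses to the same global share a base pointer, so comparing base pointers alone proves nothing, and it is the byte ranges actually touched that must be compared (which is what alias analysis decides in the real code).

```c
#include <stdbool.h>

/* Hypothetical sketch: two accesses at byte offsets off_a/off_b of
 * sizes size_a/size_b from the same base may alias exactly when
 * their half-open byte intervals overlap. The GEP base pointer does
 * not carry the offset, so this check cannot be skipped. */
bool intervals_may_alias(long off_a, long size_a,
                         long off_b, long size_b) {
    return off_a < off_b + size_b && off_b < off_a + size_a;
}
```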
      
      llvm-svn: 175818
  14. Feb 17, 2013
      BBVectorize: Fix an invalid reference bug · 76e65e45
      Hal Finkel authored
      This fixes PR15289. This bug was introduced (recently) in r175215; collecting
      all std::vector references for candidate pairs to delete at once is invalid
      because subsequent lookups in the owning DenseMap could invalidate the
      references.
      
      bugpoint was able to reduce a useful test case. Unfortunately, because whether
      or not this asserts depends on memory layout, this test case will sometimes
      appear to produce valid output. Nevertheless, running under valgrind will
      reveal the error.
      
      llvm-svn: 175397
  15. Feb 15, 2013
      BBVectorize: Call a DAG and DAG instead of a tree · 89909397
      Hal Finkel authored
      Several functions and variable names used the term 'tree' to refer
      to what is actually a DAG. Correcting this mistake will, hopefully,
      prevent confusion in the future.
      
      No functionality change intended.
      
      llvm-svn: 175278
    • Hal Finkel's avatar
      BBVectorize: Cap the number of candidate pairs in each instruction group · 283f4f0e
      Hal Finkel authored
      For some basic blocks, it is possible to generate many candidate pairs for
      relatively few pairable instructions. When many (tens of thousands) of these pairs
      are generated for a single instruction group, the time taken to generate and
      rank the different vectorization plans can become quite large. As a result, we now
      cap the number of candidate pairs within each instruction group. This is done by
      closing out the group once the threshold is reached (set now at 3000 pairs).
      
      Although this will limit the overall compile-time impact, this may not be the best
      way to achieve this result. It might be better, for example, to prune excessive
candidate pairs after the fact to prevent the generation of short, but highly-connected
      groups. We can experiment with this in the future.
      
      This change reduces the overall compile-time slowdown of the csa.ll test case in
      PR15222 to ~5x. If 5x is still considered too large, a lower limit can be
      used as the default.
      
      This represents a functionality change, but only for very large inputs
      (thus, there is no regression test).
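A minimal sketch of the capping logic; `struct Group` and `add_candidate_pair` are hypothetical names, and only the 3000-pair threshold comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical sketch: count candidate pairs as they are generated
 * and close the instruction group once the threshold is reached,
 * bounding the time spent ranking vectorization plans. */
#define MAX_CANDIDATE_PAIRS 3000  /* threshold set by the patch */

struct Group {
    int pairs;
    bool closed;
};

bool add_candidate_pair(struct Group *g) {
    if (g->closed)
        return false;             /* later pairs start a new group */
    if (++g->pairs >= MAX_CANDIDATE_PAIRS)
        g->closed = true;         /* close out this group */
    return true;
}
```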
      
      llvm-svn: 175251
  16. Feb 14, 2013
  17. Feb 13, 2013
  18. Feb 12, 2013
      BBVectorize: Don't over-search when building the dependency map · 6ae564b4
      Hal Finkel authored
      When building the pairable-instruction dependency map, don't search
      past the last pairable instruction. For large blocks that have been
      divided into multiple instruction groups, searching past the last
      instruction in each group is very wasteful. This gives a 32% speedup
      on the csa.ll test case from PR15222 (when using 50 instructions
      in each group).
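The bound computation can be sketched like this; the names are hypothetical, and the real code works over instruction iterators rather than flag arrays.

```c
#include <stdbool.h>

/* Hypothetical sketch: instead of scanning dependencies to the end
 * of the basic block, stop at the last pairable instruction in the
 * group -- nothing past it can be a member of a candidate pair. */
int dependency_search_bound(const bool *pairable, int block_len) {
    int last = -1;
    for (int i = 0; i < block_len; ++i)
        if (pairable[i])
            last = i;
    return last + 1;  /* scan [0, bound) instead of [0, block_len) */
}
```

For large blocks split into many groups, the saved tail scans add up to the reported 32% speedup.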
      
      No functionality change intended.
      
      llvm-svn: 174915
    • Hal Finkel's avatar
      BBVectorize: Omit unnecessary entries in PairableInstUsers · 39a95032
      Hal Finkel authored
      This map is queried only for instructions in pairs of pairable
      instructions; so make sure that only pairs of pairable
      instructions are added to the map. This gives a 3.5% speedup
      on the csa.ll test case from PR15222.
      
      No functionality change intended.
      
      llvm-svn: 174914
  19. Feb 11, 2013
  20. Feb 08, 2013
      BBVectorize: Use TTI->getAddressComputationCost · dd272184
      Hal Finkel authored
      This is a follow-up to the cost-model change in r174713 which splits
      the cost of a memory operation between the address computation and the
      actual memory access. In r174713, this cost is always added to the
      memory operation cost, and so BBVectorize will do the same.
      
      Currently, this new cost function is used only by ARM, and I don't
      have any ARM test cases for BBVectorize. Assistance in generating some
      good ARM test cases for BBVectorize would be greatly appreciated!
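The cost split introduced in r174713 and adopted here can be sketched as follows; `struct CostModel` and its fields are hypothetical, and the numbers in the test are illustrative rather than real target costs.

```c
/* Hypothetical sketch: a memory operation's cost is modeled as the
 * access itself plus a separately queried address-computation cost
 * (as with TTI->getAddressComputationCost). The address term is
 * always added to the memory operation cost. */
struct CostModel {
    int mem_access_cost;  /* cost of the load/store itself */
    int addr_comp_cost;   /* cost of computing the address */
};

int memory_op_cost(const struct CostModel *m) {
    return m->mem_access_cost + m->addr_comp_cost;
}
```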
      
      llvm-svn: 174743
    • Jakob Stoklund Olesen's avatar
      Typos. · 479e5a93
      Jakob Stoklund Olesen authored
      llvm-svn: 174723
    • Arnold Schwaighofer's avatar
      ARM cost model: Address computation in vector mem ops not free · 594fa2dc
      Arnold Schwaighofer authored
      Adds a function to target transform info to query for the cost of address
      computation. The cost model analysis pass now also queries this interface.
      The code in LoopVectorize adds the cost of address computation as part of the
      memory instruction cost calculation. Only there, we know whether the instruction
      will be scalarized or not.
Increase the penalty for inserting into D registers on Swift. This becomes
necessary because we now always assume that address computation has a cost, and
three is a value closer to the architecture.
      
      radar://13097204
      
      llvm-svn: 174713
    • Michael Kuperstein's avatar
      Test Commit · f63b77be
      Michael Kuperstein authored
      llvm-svn: 174709
  21. Feb 07, 2013
  22. Feb 05, 2013
      Loop Vectorizer: Refactor code to compute vectorized memory instruction cost · 3be40b56
      Arnold Schwaighofer authored
      Introduce a helper class that computes the cost of memory access instructions.
      No functionality change intended.
      
      llvm-svn: 174422
    • Arnold Schwaighofer's avatar
      Loop Vectorizer: Handle pointer stores/loads in getWidestType() · 22174f5d
      Arnold Schwaighofer authored
      In the loop vectorizer cost model, we used to ignore stores/loads of a pointer
      type when computing the widest type within a loop. This meant that if we had
only stores/loads of pointers in a loop we would return a widest type of 8 bits
      (instead of 32 or 64 bit) and therefore a vector factor that was too big.
      
      Now, if we see a consecutive store/load of pointers we use the size of a pointer
      (from data layout).
      
This problem occurred in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced
      test case is the first test in vector_ptr_load_store.ll).
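The fix can be sketched as follows, with hypothetical names; only the idea of substituting the data-layout pointer size for pointer-typed accesses comes from the commit.

```c
/* Hypothetical sketch of the getWidestType() fix: pointer-typed
 * loads/stores previously contributed nothing (leaving the 8-bit
 * default, hence an oversized vector factor); now they contribute
 * the pointer size taken from data layout. */
enum TyKind { TY_PTR = 0, TY_I8 = 8, TY_I32 = 32 };

int widest_type_bits(const enum TyKind *tys, int n,
                     int pointer_size_bits) {
    int widest = 8;  /* old default that caused the bug */
    for (int i = 0; i < n; ++i) {
        int bits = (tys[i] == TY_PTR) ? pointer_size_bits : (int)tys[i];
        if (bits > widest)
            widest = bits;
    }
    return widest;
}
```

With only pointer accesses in the loop and a 64-bit target, the widest type is now 64 bits instead of 8, shrinking the vector factor accordingly.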
      
      radar://13139343
      
      llvm-svn: 174377