  1. Mar 26, 2014
    • [X86] Add broadcast instructions to the table used by ExeDepsFix pass. · 6f12ae0d
      Quentin Colombet authored
      Add the broadcast instructions to the ReplaceableInstrsAVX2 table so that
      the ExeDepsFix pass can make better decisions when AVX2 broadcasts cross
      execution domains (int <-> float).
      
      In particular, prior to this patch we were generating:
        vpbroadcastd  LCPI1_0(%rip), %ymm2
        vpand %ymm2, %ymm0, %ymm0
        vmaxps  %ymm1, %ymm0, %ymm0 ## <- domain change penalty
      
      Now we generate the following sequence, where everything stays in the float
      domain:
        vbroadcastss  LCPI1_0(%rip), %ymm2
        vandps  %ymm2, %ymm0, %ymm0
        vmaxps  %ymm1, %ymm0, %ymm0
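
The table lookup behind this choice can be sketched as follows. This is a minimal illustration, not the actual ReplaceableInstrsAVX2 table: the `Op` enum, the `Row` layout, and the `inDomain` helper are hypothetical names, and only the two instruction pairs that appear in the sequences above are listed.

```cpp
// Two execution domains, enough for the int <-> float case described above.
enum class Domain { Float, Int };

// Hypothetical opcode enum standing in for the backend's real opcodes.
enum class Op { VBROADCASTSS, VPBROADCASTD, VANDPS, VPAND, VMAXPS };

// Each row lists instructions that compute the same bits, one per domain.
struct Row {
    Op floatOp;
    Op intOp;
};

// Assumed table contents: only the pairs visible in the example sequences.
constexpr Row kReplaceable[] = {
    {Op::VBROADCASTSS, Op::VPBROADCASTD},
    {Op::VANDPS, Op::VPAND},
};

// Returns the equivalent of `op` in `target`, or `op` itself when the table
// has no row for it (for example vmaxps, which exists only in the float domain).
constexpr Op inDomain(Op op, Domain target) {
    for (const Row &row : kReplaceable) {
        if (row.floatOp == op || row.intOp == op)
            return target == Domain::Float ? row.floatOp : row.intOp;
    }
    return op;
}
```

With both the broadcast and the AND present in the table, a pass that sees a float-only consumer such as vmaxps can move the whole chain into the float domain instead of paying a domain-crossing penalty.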
      
      <rdar://problem/16354675>
      
      llvm-svn: 204770
    • Create .symtab_shndxr only when needed. · 10be0837
      Rafael Espindola authored
      We need .symtab_shndxr if and only if a symbol references a section with an
      index >= 0xff00.
      
      The old code tried to figure out ahead of time whether the section was
      needed, which made it fairly dependent on the code that actually writes the
      table. It was also somewhat conservative and would create the section in
      cases where it was not needed.
      
      If I remember correctly, the old structure was there so that the sections were
      created in the same order gas creates them. That was valuable when MC's support
      for ELF was new and we tested with elf-dump.py.
      
      This patch refactors the symbol table creation to another class and makes it
      obvious that .symtab_shndxr is really only created when we are about to output
      a reference to a section index >= 0xff00.
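
The "only when needed" rule can be sketched as below. This is a minimal illustration under stated assumptions, not the code from the patch: the constant and function names are hypothetical, and the values 0xff00 and 0xffff are the standard ELF SHN_LORESERVE and SHN_XINDEX constants.

```cpp
#include <cstdint>

// Standard ELF values, given hypothetical names here:
// first reserved section index, and the escape value stored in st_shndx
// when the real index lives in the extended index section.
constexpr std::uint32_t kShnLoReserve = 0xff00;
constexpr std::uint16_t kShnXIndex = 0xffff;

struct SymbolIndex {
    std::uint16_t stShndx;  // value written to the 16-bit st_shndx field
    bool needsExtended;     // true when the real index goes to the extended table
};

// Decide, at the moment a symbol is written, how its section index is encoded.
// The extended section only has to exist once this returns needsExtended == true.
constexpr SymbolIndex encodeSectionIndex(std::uint32_t sectionIndex) {
    return sectionIndex >= kShnLoReserve
               ? SymbolIndex{kShnXIndex, true}
               : SymbolIndex{static_cast<std::uint16_t>(sectionIndex), false};
}
```

Making the decision per symbol, at write time, is what removes the need to predict the answer ahead of time.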
      
      While here, also improve the tests to use macros. One file is one section
      short of needing .symtab_shndxr; the second one has just the right number.
      
      llvm-svn: 204769
    • [PowerPC] Select between VSX A-type and M-type FMA instructions just before RA · 174e5909
      Hal Finkel authored
      The VSX instruction set has two types of FMA instructions: A-type (where the
      addend is taken from the output register) and M-type (where one of the product
      operands is taken from the output register). This adds a small pass that runs
      just after MI scheduling (and, thus, just before register allocation) that
      mutates A-type instructions (that are created during isel) into M-type
      instructions when:
      
       1. This will eliminate an otherwise-necessary copy of the addend
      
       2. One of the product operands is killed by the instruction
      
      The "right" moment to make this decision is in between scheduling and register
      allocation, because only there do we know whether or not one of the product
      operands is killed by any particular instruction. Unfortunately, this also
      makes the implementation somewhat complicated, because the MIs are not in SSA
      form and we need to preserve the LiveIntervals analysis.
      
      As a simple example, if we have:
      
        %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
        %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                              %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
        ...
        %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
                              %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
        ...
      
      We can eliminate the copy by changing from the A-type to the
      M-type instruction. This means:
      
        %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                              %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
      
      is replaced by:
      
        %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg17, %vreg9,
                              %RM<imp-use>; VSLRC:%vreg16,%vreg17,%vreg9
      
      and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
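
Why the two forms compute the same value can be checked with a small model. This is a sketch only: the function names are hypothetical, plain multiply-and-add stands in for the fused operation (a real FMA rounds once), and the parameter names mirror the virtual registers in the example.

```cpp
// A-type: the addend is taken from the output register.
//   target = a * b + target
constexpr double fmaAType(double target, double a, double b) {
    return a * b + target;
}

// M-type: one of the product operands is taken from the output register.
//   target = a * target + b
constexpr double fmaMType(double target, double a, double b) {
    return a * target + b;
}

// Original sequence: copy vreg9 into vreg5, then accumulate into vreg5.
constexpr double withCopy(double vreg9, double vreg16, double vreg17) {
    // vreg5 = COPY vreg9; vreg5 = vreg17 * vreg16 + vreg5
    return fmaAType(/*target=*/vreg9, vreg17, vreg16);
}

// Mutated sequence: no copy; vreg16 is overwritten, so it must be killed here.
constexpr double withoutCopy(double vreg9, double vreg16, double vreg17) {
    // vreg16 = vreg17 * vreg16 + vreg9
    return fmaMType(/*target=*/vreg16, vreg17, vreg9);
}
```

The M-type form destroys the old value of its product operand, which is why the mutation is only valid when that operand is killed by the instruction.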
      
      llvm-svn: 204768
    • [PGO] Add simplified branch weights for Objective-C for-collection loops. · 0ed74d96
      Bob Wilson authored
      Conceptually one of these loops is just a while-loop, but the actual code-gen
      is more complicated. We don't instrument all the different control flow edges
      to get accurate counts for each conditional branch, nor do I think it makes
      sense to do so. Instead, make the simplifying assumption that the loop behaves
      like a while-loop. Use the same branch weights for the first check for an
      empty collection as would be used for the back-edge of a while loop, and use
      that same weighting for the innermost loop, ignoring the possibility that there
      may be some extra code to go fetch more elements.
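
The simplifying assumption can be sketched as below. This is an illustrative model, not the code in the patch: the struct and function names are hypothetical, and it assumes a loop with no early exits, so the condition fails exactly once per entry and succeeds once per body execution.

```cpp
#include <cstdint>

// Weights for a conditional branch: how often it continues into the loop
// body versus how often it leaves the loop.
struct BranchWeights {
    std::uint64_t intoBody;
    std::uint64_t exitLoop;
};

// While-loop model (assumption: no break or return inside the body).
constexpr BranchWeights whileLoopWeights(std::uint64_t entryCount,
                                         std::uint64_t bodyCount) {
    return BranchWeights{bodyCount, entryCount};
}

// For-collection model: the initial empty-collection check and the innermost
// loop branch both reuse the while-loop weights, ignoring any extra code
// that fetches more elements.
struct ForCollectionWeights {
    BranchWeights emptyCheck;
    BranchWeights innerLoop;
};

constexpr ForCollectionWeights forCollectionWeights(std::uint64_t entryCount,
                                                    std::uint64_t bodyCount) {
    return ForCollectionWeights{whileLoopWeights(entryCount, bodyCount),
                                whileLoopWeights(entryCount, bodyCount)};
}
```

Only two counters are needed in this model, which matches the decision not to instrument every control flow edge.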
      
      llvm-svn: 204767
  2. Mar 25, 2014