Commit e7c5041c authored by Sanjay Patel

[CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion

The test diff for PowerPC shows we can better optimize if this case is one block.

For x86, there would be a substantial difference if CGP expansion was enabled because branches are assumed
cheap and SDAG can't optimize across blocks.

Instead of this:

_cmp_eq8:
  movq  (%rdi), %rax
  cmpq  (%rsi), %rax
  je  LBB23_1
## BB#2:                                ## %res_block
  movl  $1, %ecx
  jmp LBB23_3
LBB23_1:
  xorl  %ecx, %ecx
LBB23_3:                                ## %endblock
  xorl  %eax, %eax
  testl %ecx, %ecx
  sete  %al
  retq

We get this:

cmp_eq8:   
  movq  (%rdi), %rcx
  xorl  %eax, %eax
  cmpq  (%rsi), %rcx
  sete  %al
  retq

And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall().
If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable
CGP memcmp() expansion for x86. I.e., we'll bypass the power-of-2 special cases currently optimized in SDAG because we
can lower the IR produced here optimally.
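
For reference, here is a rough C sketch of the source pattern behind the cmp_eq8 listings above. The function signature and the hand-expanded variant (cmp_eq8_expanded, a hypothetical name) are assumptions for illustration, not code from this patch. It shows an 8-byte memcmp() whose result is only tested for equality with zero, next to the single-block load-and-compare form that the expansion aims for:

```c
#include <stdint.h>
#include <string.h>

/* Assumed source-level pattern: an 8-byte memcmp() whose result is
   only compared against zero. */
int cmp_eq8(const char *x, const char *y) {
    return memcmp(x, y, 8) == 0;
}

/* Hypothetical hand-written equivalent of the single-block form:
   load 8 bytes from each pointer and compare them directly, with
   no branch and no separate result block. memcpy() is used for the
   loads so that unaligned pointers are handled portably. */
int cmp_eq8_expanded(const char *x, const char *y) {
    uint64_t a, b;
    memcpy(&a, x, sizeof a);
    memcpy(&b, y, sizeof b);
    return a == b;
}
```

Both functions return 1 when the first 8 bytes match and 0 otherwise, which is why only an equality test is needed and the ordering result of memcmp() never has to be materialized.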

Differential Revision: https://reviews.llvm.org/D34005

llvm-svn: 304987
parent 8cb1d093