  1. Aug 04, 2014
    • [x86] Implement more aggressive use of PACKUS chains for lowering common · 06e6f1ca
      Chandler Carruth authored
      patterns of v16i8 shuffles.
      
      This implements one of the more important FIXMEs for the SSE2 support in
      the new shuffle lowering. We now generate the optimal shuffle sequence
      for truncate-derived shuffles which show up essentially everywhere.
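
      The truncate-derived patterns referred to here are easy to illustrate outside the compiler. Below is a minimal Python sketch of the PACKUSWB lane semantics (function names are mine, not LLVM's), showing why packing two masked v8i16 halves reproduces a lane truncation:

```python
def packuswb(a, b):
    # PACKUSWB: pack two 8-lane i16 vectors into one 16-lane u8 vector,
    # clamping each lane to 0..255 with unsigned saturation.
    sat = lambda x: max(0, min(255, x))
    return [sat(x) for x in a] + [sat(x) for x in b]

def truncate_lanes(v):
    # A truncating shuffle just keeps the low byte of each i16 lane.
    return [x & 0xFF for x in v]

v16i16 = [x * 100 for x in range(16)]
masked = [x & 0xFF for x in v16i16]  # mask first, so no lane saturates
assert packuswb(masked[:8], masked[8:]) == truncate_lanes(v16i16)
```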
      
      Unfortunately, this exposes a weakness in other parts of the shuffle
      logic -- we can no longer form PSHUFB here. I'll add the necessary
      support for that and other things in a subsequent commit.
      
      llvm-svn: 214702
    • [x86] Handle single input shuffles in the SSSE3 case more intelligently. · 37a18821
      Chandler Carruth authored
      I spent some time looking into a better or more principled way to handle
      this. For example, by detecting arbitrary "unneeded" ORs... But really,
      there wasn't any point. We just shouldn't build blatantly wrong code so
      late in the pipeline rather than adding more stages and logic later on
      to fix it. Avoiding this is just too simple.
      
      llvm-svn: 214680
    • [x86] Fix the test case added in r214670 and tweaked in r214674 further. · 7bbfd245
      Chandler Carruth authored
      Fundamentally, there isn't a really portable way to test the constant
      pool contents. Instead, pin this test to the bare-metal triple. This
      also makes it a 64-bit triple which allows us to only match a single
      constant pool rather than two. It can also just hard code the '.' prefix
      as the format should be stable now that it has a fixed triple. Finally,
      I've switched it to use CHECK-NEXT to be more precise in the instruction
      sequence expected and to use variables rather than hard coding decisions
      by the register allocator.
      
      llvm-svn: 214679
    • Account for possible leading '.' in label string. · 065cabf4
      Sanjay Patel authored
      llvm-svn: 214674
    • fix for PR20354 - Miscompile of fabs due to vectorization · 2ef67440
      Sanjay Patel authored
      This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
      
      This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
      
      There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
      
      llvm-svn: 214670
  2. Aug 02, 2014
    • [x86] Give this test a bare metal triple so it doesn't use the weird · bec57b40
      Chandler Carruth authored
      Darwin x86 asm comment prefix designed to work around GAS on that
      platform. That makes the comment-matching of the test much more stable.
      
      llvm-svn: 214629
    • [x86] Largely complete the use of PSHUFB in the new vector shuffle · 4c57955f
      Chandler Carruth authored
      lowering with a small addition to it and adding PSHUFB combining.
      
      There is one obvious place in the new vector shuffle lowering where we
      should form PSHUFBs directly: when without them we will unpack a vector
      of i8s across two different registers and do a potentially 4-way blend
      as i16s only to re-pack them into i8s afterward. This is the crazy
      expensive fallback path for i8 shuffles and we can just directly use
      pshufb here as it will always be cheaper (the unpack and pack are
      two instructions so even a single shuffle between them hits our
      three instruction limit for forming PSHUFB).
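
      For reference, the per-byte PSHUFB behavior the commit relies on can be modeled in a few lines of Python (a sketch of the instruction's semantics, not LLVM code):

```python
def pshufb(src, mask):
    # PSHUFB: each output byte is zero when its mask byte's high bit is
    # set; otherwise it selects src[mask & 0x0F] from the same register.
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]

src = list(range(10, 26))
assert pshufb(src, [0] * 16) == [10] * 16        # splat of byte 0
assert pshufb(src, [0x80] * 16) == [0] * 16      # zeroing mask
assert pshufb(src, list(range(15, -1, -1))) == src[::-1]
```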
      
      However, this doesn't generate very good code in many cases, and it
      leaves a bunch of common patterns not using PSHUFB. So this patch also
      adds support for extracting a shuffle mask from PSHUFB in the X86
      lowering code, and uses it to handle PSHUFBs in the recursive shuffle
      combining. This allows us to combine through them, combine multiple ones
      together, and generally produce sufficiently high quality code.
      
      Extracting the PSHUFB mask is annoyingly complex because it could be
      either pre-legalization or post-legalization. At least this doesn't have
      to deal with re-materialized constants. =] I've added decode routines to
      handle the different patterns that show up at this level and we dispatch
      through them as appropriate.
      
      The two primary test cases are updated. For the v16 test case there is
      still a lot of room for improvement. Since I was going through it
      systematically I left behind a bunch of FIXME lines that I'm hoping to
      turn into ALL lines by the end of this.
      
      llvm-svn: 214628
    • [x86] Teach the target shuffle mask extraction to recognize unary forms · 34f9a987
      Chandler Carruth authored
      of normally binary shuffle instructions like PUNPCKL and MOVLHPS.
      
      This detects cases where a single register is used for both operands
      making the shuffle behave in a unary way. We detect this and adjust the
      mask to use the unary form which allows the existing DAG combine for
      shuffle instructions to actually work at all.
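
      The mask adjustment is simple to state. Here is a hedged Python sketch (my own naming, not the LLVM API) of folding a two-input mask into its unary form when both operands turn out to be the same register:

```python
def to_unary_mask(mask, width, operands_equal):
    # A two-input shuffle indexes lanes 0..width-1 from the first input
    # and width..2*width-1 from the second. When both operands are the
    # same register, fold the second input's indices back into the first.
    if not operands_equal:
        return mask
    return [i - width if i >= width else i for i in mask]

# PUNPCKLBW on v16i8 interleaves the low bytes of its two inputs:
punpcklbw = [x for i in range(8) for x in (i, 16 + i)]
assert to_unary_mask(punpcklbw, 16, True) == \
    [x for i in range(8) for x in (i, i)]
```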
      
      As a consequence, this uncovered a number of obvious bugs in the
      existing DAG combine which are fixed. It also now canonicalizes several
      shuffles even with the existing lowering. These typically are trying to
      match the shuffle to the domain of the input where before we only really
      modeled them with the floating point variants. All of the cases which
      change to an integer shuffle here have something in the integer domain, so
      there are no more or fewer domain crosses here AFAICT. Technically, it
      might be better to go from a GPR directly to the floating point domain,
      but detecting floating point *outputs* despite integer inputs is a lot
      more code and seems unlikely to be worthwhile in practice. If folks are
      seeing domain-crossing regressions here though, let me know and I can
      hack something up to fix it.
      
      Also as a consequence, a bunch of missed opportunities to form pshufb
      now can be formed. Notably, splats of i8s now form pshufb.
      Interestingly, this improves the existing splat lowering too. We go from
      3 instructions to 1. Yes, we may tie up a register, but it seems very
      likely to be worth it, especially if splatting the 0th byte (the
      common case) as then we can use a zeroed register as the mask.
      
      llvm-svn: 214625
    • [x86] Make some questionable tests not spew assembly to stdout, which · 063f425e
      Chandler Carruth authored
      makes a mess of the lit output when they ultimately fail.
      
      The 2012-10-02-DAGCycle test is really frustrating because the *only*
      explanation for what it is testing is a rdar link. I would really rather
      that rdar links (which are not public or part of the open source
      project) were not committed to the source code. Regardless, the actual
      problem *must* be described as the rdar link is completely opaque. The
      fact that this test didn't check for any particular output further
      exacerbates the inability of any other developer to debug failures.
      
      The mem-promote-integers test has nice comments and *seems* to be
      a great test for our lowering... except that we don't actually check
      that any of the generated code is correct or matches some pattern. We
      just avoid crashing. It would be great to go back and populate this test
      with the actual expectations.
      
      llvm-svn: 214605
    • [X86] Simplify X87 stackifier pass. · 3516669a
      Akira Hatanaka authored
      Stop using ST registers for function returns and inline-asm instructions and use
      FP registers instead. This allows removing a large amount of code in the
      stackifier pass that was needed to track register liveness and handle copies
      between ST and FP registers and function calls returning floating point values.
      
      It also fixes a bug which manifests when an ST register defined by an
      inline-asm instruction was live across another inline-asm instruction, as shown
      in the following sequence of machine instructions:
      
      1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
      2. INLINEASM <es:fldcw $0>
      3. %FP0<def> = COPY %ST0
      
      <rdar://problem/16952634>
      
      llvm-svn: 214580
  3. Aug 01, 2014
    • MS inline asm: Hide symbol to attempt to fix test failure on darwin · 6a2de900
      Reid Kleckner authored
      If the symbol comes from an external DSO, it apparently requires
      indirection through a register.
      
      llvm-svn: 214571
    • MS inline asm: Use memory constraints for functions instead of registers · 5b37c181
      Reid Kleckner authored
      This is consistent with how we parse them in a standalone .s file, and
      inline assembly shouldn't differ.
      
      This fixes errors about requiring more registers than available in
      cases like this:
        void f();
        void __declspec(naked) g() {
          __asm pusha
          __asm call f
          __asm popa
          __asm ret
        }
      
      There are no registers available to pass the address of 'f' into the asm
      blob.  The asm should now directly call 'f'.
      
      Tests will land in Clang shortly.
      
      llvm-svn: 214550
    • Explicitly report runtime stack realignment in StackMap section · 87c2b605
      Philip Reames authored
      This change adds code to explicitly mark a function which requires runtime stack realignment as not having a fixed frame size in the StackMap section. As it happens, this is not actually a functional change. The size that would be reported without the check is also "-1", but as far as I can tell, that's an accident. The code change makes this explicit.
      
      Note: There's a separate bug in handling of stackmaps and patchpoints in functions which need dynamic frame realignment. The current code assumes that offsets can be calculated from RBP, but realigned frames must use RSP. (There's a variable gap between RBP and the spill slots.) This change set does not address that issue.
      
      Reviewers: atrick, ributzka
      
      Differential Revision: http://reviews.llvm.org/D4572
      
      llvm-svn: 214534
  4. Jul 31, 2014
  5. Jul 30, 2014
  6. Jul 29, 2014
  7. Jul 28, 2014
  8. Jul 27, 2014
    • [x86] Add a much more powerful framework for combining x86 shuffle · 80c5bfd8
      Chandler Carruth authored
      instructions in the legalized DAG, and leverage it to combine long
      sequences of instructions to PSHUFB.
      
      Eventually, the other x86-instruction-specific shuffle combines will
      probably all be driven out of this routine. But the real motivation is
      to detect after we have fully legalized and optimized a shuffle to the
      minimal number of x86 instructions whether it is profitable to replace
      the chain with a fully generic PSHUFB instruction even though doing so
      requires either a load from a constant pool or tying up a register with
      the mask.
      
      While the Intel manuals claim it should be used when it replaces 5 or
      more instructions (!!!!) my experience is that it is actually very fast
      on modern chips, and so I've gone with a much more aggressive model of
      replacing any sequence of 3 or more instructions.
      
      I've also taught it to do some basic canonicalization to special-purpose
      instructions which have smaller encodings than their generic
      counterparts.
      
      There are still quite a few FIXMEs here, and I've not yet implemented
      support for lowering blends with PSHUFB (where its power really shines
      due to being able to zero out lanes), but this starts implementing real
      PSHUFB support even when using the new, fancy shuffle lowering. =]
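
      One way to see why an arbitrary chain of byte shuffles can always collapse into a single PSHUFB is that byte-shuffle masks compose. A small Python model (assumed semantics sketch, not the LLVM implementation):

```python
def pshufb(src, mask):
    # Per-byte PSHUFB model: the high bit of a mask byte zeroes the lane.
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]

def compose(outer, inner):
    # Applying `inner` then `outer` equals one shuffle whose mask is
    # `inner` indexed by `outer`, with zeroing lanes propagated through.
    return [0x80 if (o & 0x80) or (inner[o & 0x0F] & 0x80)
            else inner[o & 0x0F] for o in outer]

src = list(range(100, 116))
inner = list(range(15, -1, -1))       # reverse all 16 bytes
outer = [i ^ 1 for i in range(16)]    # swap adjacent byte pairs
assert pshufb(pshufb(src, inner), outer) == pshufb(src, compose(outer, inner))
```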
      
      llvm-svn: 214042
  9. Jul 26, 2014
    • Fix the failing test 'vector-idiv.ll'. · ec981058
      Joey Gouly authored
      On Darwin the comment character is ##.
      
      llvm-svn: 214028
    • [SDAG] When performing post-legalize DAG combining, run the legalizer · 411fb407
      Chandler Carruth authored
      over each node in the worklist prior to combining.
      
      This allows the combiner to produce new nodes which need to go back
      through legalization. This is particularly useful when generating
      operands to target specific nodes in a post-legalize DAG combine where
      the operands are significantly easier to express as pre-legalized
      operations. My immediate use case will be PSHUFB formation where we need
      to build a constant shuffle mask with a build_vector node.
      
      This also refactors the relevant functionality in the legalizer to
      support this, and updates relevant tests. I've spoken to the R600 folks
      and these changes look like improvements to them. The avx512 change
      needs to be investigated, I suspect there is a disagreement between the
      legalizer and the DAG combiner there, but it seems a minor issue so
      leaving it to be re-evaluated after this patch.
      
      Differential Revision: http://reviews.llvm.org/D4564
      
      llvm-svn: 214020
    • llvm/test/CodeGen/X86/vector-idiv.ll: Fix for -Asserts. · f2df3f59
      NAKAMURA Takumi authored
      llvm-svn: 214015
    • [x86] Fix PR20355 (for real). There are many layers to this bug. · 5896698e
      Chandler Carruth authored
      The tale starts with r212808 which attempted to fix inversion of the low
      and high bits when lowering MUL_LOHI. Sadly, that commit did not include
      any positive test cases, and just removed some operations from a test
      case where the actual logic being changed isn't fully visible from the
      test.
      
      What this commit did was two things. First, it reversed the low and high
      results in the formation of the MERGE_VALUES node for the multiple
      results. This is entirely correct.
      
      Second it changed the shuffles for extracting the low and high
      components from the i64 results of the multiplies to extract them
      assuming a big-endian-style encoding of the multiply results. This
      second change is wrong. There is no big-endian encoding in x86, the
      results of the multiplies are normal v2i64s: when cast to v4i32, the low
      i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and 3.
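
      That layout claim is easy to check directly. A short Python demonstration of how a v2i64 reads back as a v4i32 on a little-endian target like x86:

```python
import struct

def v2i64_as_v4i32(v2i64):
    # Reinterpret two i64 lanes as four i32 lanes the way x86 does:
    # little-endian, so each i64's low half comes first in memory.
    raw = struct.pack("<2Q", *v2i64)
    return list(struct.unpack("<4I", raw))

lanes = v2i64_as_v4i32([0x1111111122222222, 0x3333333344444444])
# Low i32s land at offsets 0 and 2, high i32s at offsets 1 and 3.
assert lanes == [0x22222222, 0x11111111, 0x44444444, 0x33333333]
```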
      
      However, the first change wasn't enough to actually fix the bug, which
      is (I assume) why the second change was also made. There was another bug
      in the MERGE_VALUES formation: we weren't using a VTList, and so were
      getting a single result node! When grabbing the *second* result from the
      node, we got... well... could be anything. I think this *appeared* to
      invert things, but had to be causing other problems as well.
      
      Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should
      have been fine, right? NOOOPE! Because the core bug was never addressed,
      the test in vector-idiv failed when I fixed the MERGE_VALUES node.
      Because there are essentially no docs for this node, I had to guess at
      how to fix it and tried swapping the operands, restoring the order of
      the original code before r212808. While this "fixed" the test case (in
      that we produced the right instructions) we were still extracting the
      wrong elements of the i64s, and thus PR20355 was still broken.
      
      This commit essentially reverts the big-endian-style extraction part of
      r212808 and goes back to the original masks which were correct. Now that
      the MERGE_VALUES node formation is also correct, everything works. I've
      also included a more detailed test from PR20355 to make sure this stays
      fixed.
      
      llvm-svn: 214011
    • [x86] Finish switching from CHECK to ALL. This was mistakenly included · 591c16a9
      Chandler Carruth authored
      in r214007 and then reverted when I backed that (very misguided) patch
      out. This recovers the test case cleanup which was good.
      
      llvm-svn: 214010
    • [x86] Revert r214007: Fix PR20355 ... · f6406ac5
      Chandler Carruth authored
      The clever way to implement signed multiplication with unsigned *is
      already implemented* and tested and working correctly. The bug is
      somewhere else. Re-investigating.
      
      This will teach me to not scroll far enough to read the code that did
      what I thought needed to be done.
      
      llvm-svn: 214009
    • [x86] Fix PR20355 (and dups) by not using unsigned multiplication when · 1bf4d191
      Chandler Carruth authored
      signed multiplication is requested. While there is not a difference in
      the *low* half of the result, the *high* half (used specifically to
      implement the signed division by these constants) certainly is used. The
      test case I've nuked was actively asserting wrong code.
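
      The distinction the commit relies on, that signed and unsigned multiplication share a low half but not a high half, can be checked numerically. A Python model of the MULHS/MULHU semantics (names assumed, taken from the DAG node naming convention):

```python
MASK32 = (1 << 32) - 1

def mulhu(a, b):
    # High 32 bits of the unsigned 64-bit product.
    return ((a & MASK32) * (b & MASK32)) >> 32

def mulhs(a, b):
    # High 32 bits of the signed product: sign-extend the inputs first.
    sext = lambda x: (x & MASK32) - (1 << 32) if x & 0x80000000 else x & MASK32
    return ((sext(a) * sext(b)) >> 32) & MASK32

a, b = 0xFFFFFFFF, 7                 # `a` is -1 when treated as signed i32
assert ((a * b) & MASK32) == ((-1 * 7) & MASK32)  # low halves agree
assert mulhu(a, b) == 6                           # but the high halves
assert mulhs(a, b) == MASK32                      # differ (-1 vs 6)
```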
      
      There is a delightful solution to doing signed multiplication even when
      we don't have it that Richard Smith has crafted, but I'll add the
      machinery back and implement that in a follow-up patch. This at least
      restores correctness.
      
      llvm-svn: 214007
    • [x86] Add coverage for PMUL* instruction testing on SSE2 as well as · 80adc640
      Chandler Carruth authored
      SSE4.1.
      
      llvm-svn: 214001
    • [x86] More cleanup for this test -- simplify the command line. · 8709cb4a
      Chandler Carruth authored
      llvm-svn: 213991
    • [x86] FileCheck-ize this test. · 6da2d97a
      Chandler Carruth authored
      llvm-svn: 213988
  10. Jul 25, 2014
    • [stack protector] Fix a potential security bug in stack protector where the · e5b6e0d2
      Akira Hatanaka authored
      address of the stack guard was being spilled to the stack.
      
      Previously the address of the stack guard would get spilled to the stack if it
      was impossible to keep it in a register. This patch introduces a new target
      independent node and pseudo instruction which gets expanded post-RA to a
      sequence of instructions that load the stack guard value. The register
      allocator can now just remat the value when it can't keep it in a register.
      
      <rdar://problem/12475629>
      
      llvm-svn: 213967
    • Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for... · 2f040114
      David Blaikie authored
      Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information.
      
      Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson
      reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped
      provide a reduced reproduction, though the failure wasn't too hard to
      guess, and even easier with the example to confirm.
      
      The assertion that the subprogram metadata associated with an
      llvm::Function matches the scope data referenced by the DbgLocs on the
      instructions in that function is not valid under LTO. In LTO, a C++
      inline function might exist in multiple CUs and the subprogram metadata
      nodes will refer to the same llvm::Function. In this case, depending on
      the order of the CUs, the first instance of the subprogram metadata may
      not be the one referenced by the instructions in that function and the
      assertion will fail.
      
      A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the
      assertion removed and a comment added to explain this situation.
      
      This was then reverted again in r213581 as it caused PR20367. The root
      cause of this was the early exit in LiveDebugVariables meant that
      spurious DBG_VALUE intrinsics that referenced dead variables were not
      removed, causing an assertion/crash later on. The fix is to have
      LiveDebugVariables strip all DBG_VALUE intrinsics in functions without
      debug info as they're not needed anyway. Test case added to cover this
      situation (that occurs when a debug-having function is inlined into a
      nodebug function) in test/DebugInfo/X86/nodebug_with_debug_loc.ll
      
      Original commit message:
      
      If a function isn't actually in a CU's subprogram list in the debug info
      metadata, ignore all the DebugLocs and don't try to build scopes, track
      variables, etc.
      
      While this is possibly a minor optimization, it's also a correctness fix
      for an incoming patch that will add assertions to LexicalScopes and the
      debug info verifier to ensure that all scope chains lead to debug info
      for the current function.
      
      Fix up a few test cases that had broken/incomplete debug info that could
      violate this constraint.
      
      Add a test case where this occurs by design (inlining a
      debug-info-having function in an attribute nodebug function - we want
      this to work because /if/ the nodebug function is then inlined into a
      debug-info-having function, it should be fine (and will work fine - we
      just stitch the scopes up as usual), but should the inlining not happen
      we need to not assert fail either).
      
      llvm-svn: 213952
    • DebugInfo: Fix up some test cases to have more correct debug info metadata. · 48af9c35
      David Blaikie authored
      * Add CUs to the named CU node
      * Add missing DW_TAG_subprogram nodes
      * Add llvm::Functions to the DW_TAG_subprogram nodes
      
      This cleans up the tests so that they don't break under a
      soon-to-be-made change that is more strict about such things.
      
      llvm-svn: 213951
    • [X86] Add comments to clarify some non-obvious lines in the stackmap-nops.ll · 98c3c0f3
      Lang Hames authored
      testcases.
      
      Based on code review from Philip Reames. Thanks Philip!
      
      llvm-svn: 213923
    • [SDAG] Introduce a combined set to the DAG combiner which tracks nodes · 9f4530b9
      Chandler Carruth authored
      which have successfully round-tripped through the combine phase, and use
      this to ensure all operands to DAG nodes are visited by the combiner,
      even if they are only added during the combine phase.
      
      This is critical to have the combiner reach nodes that are *introduced*
      during combining. Previously these would sometimes be visited and
      sometimes not be visited based on whether they happened to end up on the
      worklist or not. Now we always run them through the combiner.
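
      The "combined set" idea can be sketched abstractly (a toy worklist over a dictionary graph, not the DAGCombiner API): nodes introduced while combining still get queued because membership in the set, not presence on the initial worklist, decides whether a node has been visited.

```python
def combine_all(roots, combine, operands):
    # Run `combine` over every node reachable from `roots`, including
    # operands that only come into existence during combining.
    worklist = list(roots)
    combined = set()
    while worklist:
        node = worklist.pop()
        if node in combined:
            continue
        combined.add(node)
        combine(node)
        for op in operands(node):
            if op not in combined:
                worklist.append(op)
    return combined

# Hypothetical graph where combining "a" introduces a new operand "new".
graph = {"a": ["b"], "b": [], "new": []}
def combine(n):
    if n == "a":
        graph["a"] = ["b", "new"]   # node added mid-combine
assert combine_all(["a"], combine, lambda n: graph[n]) == {"a", "b", "new"}
```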
      
      This fixes quite a few bad codegen test cases lurking in the suite while
      also being more principled. Among these, the TLS codegeneration is
      particularly exciting for programs that have this in the critical path
      like TSan-instrumented binaries (although I think they engineer to use
      a different TLS that is faster anyways).
      
      I've tried to check for compile-time regressions here by running llc
      over a merged (but not LTO-ed) clang bitcode file and observed at most
      a 3% slowdown in llc. Given that this is essentially a worst case (none
      of opt or clang are running at this phase) I think this is tolerable.
      The actual LTO case should be even less costly, and the cost in normal
      compilation should be negligible.
      
      With this combining logic, it is possible to re-legalize as we combine
      which is necessary to implement PSHUFB formation on x86 as
      a post-legalize DAG combine (my ultimate goal).
      
      Differential Revision: http://reviews.llvm.org/D4638
      
      llvm-svn: 213898
    • [x86] Make vector legalization of extloads work more like the "normal" · 80b86946
      Chandler Carruth authored
      vector operation legalization with support for custom target lowering
      and fallback to expand when it fails, and use this to implement sext and
      anyext load lowering for x86 in a more principled way.
      
      Previously, the x86 backend relied on a target DAG combine to "combine
      away" sextload and extload nodes prior to legalization, or would expand
      them during legalization with terrible code. This is particularly
      problematic because the DAG combine relies on running over non-canonical
      DAG nodes at just the right time to match several common and important
      patterns. It used a combine rather than lowering because we didn't have
      good lowering support, and to expose some tricks being employed to more
      combine phases.
      
      With this change it becomes a proper lowering operation, the backend
      marks that it can lower these nodes, and I've added support for handling
      the canonical forms that don't have direct legal representations such as
      sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases
      for this behavior continue to pass even after the DAG combiner begins
      running more systematically over every node.
      
      There is some noise caused by this in the test suite where we actually
      use vector extends instead of subregister extraction. This doesn't
      really seem like the right thing to do, but is unlikely to be a critical
      regression. We do regress in one case where by lowering to the
      target-specific patterns early we were able to combine away extraneous
      legal math nodes. However, this regression is completely addressed by
      switching to a widening based legalization which is what I'm working
      toward anyways, so I've just switched the test to that mode.
      
      Differential Revision: http://reviews.llvm.org/D4654
      
      llvm-svn: 213897
  11. Jul 24, 2014
    • [X86] Optimize stackmap shadows on X86. · f49bc3f1
      Lang Hames authored
      This patch minimizes the number of nops that must be emitted on X86 to satisfy
      stackmap shadow constraints.
      
      To minimize the number of nops inserted, the X86AsmPrinter now records the
      size of the most recent stackmap's shadow in the StackMapShadowTracker class,
      and tracks the number of instruction bytes emitted since that stackmap
      instruction was encountered. Padding is emitted (if it is required at all)
      immediately before the next stackmap/patchpoint instruction, or at the end of
      the basic block.
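
      The padding rule reduces to a small computation. A hedged sketch (parameter names mine) of how much padding is owed at the point where it must be emitted:

```python
def nops_needed(shadow_size, bytes_since_stackmap):
    # A stackmap's shadow must be covered by instruction bytes; only the
    # shortfall is padded with nops, and only at the last possible
    # moment (the next stackmap/patchpoint or the end of the block).
    return max(0, shadow_size - bytes_since_stackmap)

assert nops_needed(8, 3) == 5    # 3 real bytes emitted, pad 5
assert nops_needed(8, 12) == 0   # shadow already covered, no nops
```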
      
      This optimization should reduce code-size and improve performance for people
      using the llvm stackmap intrinsic on X86.
      
      <rdar://problem/14959522>
      
      llvm-svn: 213892