Commits · 1ec49110a7239a0be65bb0d4aea3f86fbdba48fa · Roger Ferrer / llvm-epi

Dec 18, 2018

[llvm-dwarfdump] - Do not error out on R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations. · 1ec49110

George Rimar authored Dec 18, 2018

This is https://bugs.llvm.org/show_bug.cgi?id=39992,

If we have the following code (test.cpp):

thread_local int tdata = 24;
and build an .o file with debug information:

clang --target=x86_64-pc-linux -c test.cpp -g

Then the object produced may have R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations.
(clang emits R_X86_64_DTPOFF64 and gcc emits R_X86_64_DTPOFF32 for the code above for me)

Currently, llvm-dwarfdump fails to compute this TLS relocation when dumping
the object and reports an error:
failed to compute relocation: R_X86_64_DTPOFF64, Invalid data was encountered while parsing the file

This relocation represents the offset in the TLS block and is resolved by the linker,
but this information is unavailable at the point when the object file is dumped by this tool.

The patch adds a simple evaluation for such relocations to avoid emitting errors.
The resulting behavior seems to match GNU dwarfdump.

Differential revision: https://reviews.llvm.org/D55762

llvm-svn: 349476

1ec49110

[MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR · 150fd430

Petar Avramovic authored Dec 18, 2018

Add narrowScalar for G_AND and G_XOR.
Legalize G_AND, G_OR and G_XOR for types other than s32
with clampScalar on MIPS32.
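
As a rough illustration of what narrowing a bitwise operation means, the sketch below models a 64-bit AND computed from two 32-bit ANDs in plain Python. The name `narrow_and64` is hypothetical and is not an LLVM API; it only mirrors the idea that each half of a bitwise operation can be computed independently and merged afterwards.

```python
MASK32 = 0xFFFFFFFF

def narrow_and64(a, b):
    """Compute a 64-bit AND using only 32-bit AND operations."""
    lo = (a & MASK32) & (b & MASK32)                  # low 32-bit piece
    hi = ((a >> 32) & MASK32) & ((b >> 32) & MASK32)  # high 32-bit piece
    return (hi << 32) | lo                            # merge the pieces back

a, b = 0x123456789ABCDEF0, 0xFF00FF00FF00FF00
print(hex(narrow_and64(a, b)))
```

The same split works for OR and XOR because none of these operations carries information between bit positions.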

Differential Revision: https://reviews.llvm.org/D55362

llvm-svn: 349475

150fd430

Fix typo in r349473 · 307839cc
Luke Cheeseman authored Dec 18, 2018
```
llvm-svn: 349474
```
307839cc

Update CallFrameString API to account for r349472 · 9f236d85

Luke Cheeseman authored Dec 18, 2018

- CallFrameString now takes an Arch parameter to account for multiplexing
  overlapping CFI directives

llvm-svn: 349473

9f236d85

[AArch64] - Return address signing dwarf support · f57d7d82

Luke Cheeseman authored Dec 18, 2018

- Reapply changes initially introduced in r343089
- The architecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (sanitizers) make use of the dwarf context classes but
  do not initialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
  string printing later on

Differential Revision: https://reviews.llvm.org/D55774

llvm-svn: 349472

f57d7d82

[X86][AVX] Add 256/512-bit vector funnel shift tests · ba8e84b3
Simon Pilgrim authored Dec 18, 2018
```
Extra coverage for D55747

llvm-svn: 349471
```
ba8e84b3
[X86][SSE] Add 128-bit vector funnel shift tests · 46b90e85
Simon Pilgrim authored Dec 18, 2018
```
Extra coverage for D55747

llvm-svn: 349470
```
46b90e85

[IPO][AVR] Create new Functions in the default address space specified in the data layout · f920da00

Dylan McKay authored Dec 18, 2018

This modifies the IPO pass so that it respects any explicit function
address space specified in the data layout.

In targets with nonzero program address spaces, all functions should, by
default, be placed into the default program address space.

This is required for Harvard architectures like AVR. Without this, the
functions will be marked as residing in data space, and thus not be
callable.

This has no effect on any in-tree official backends, as none use an
explicit program address space in their data layouts.

Patch by Tim Neumann.

llvm-svn: 349469

f920da00

AMDGPU: Legalize/regbankselect frame_index · c94e26c7
Matt Arsenault authored Dec 18, 2018
```
llvm-svn: 349468
```
c94e26c7
AMDGPU: Legalize/regbankselect fma · c0ea2210
Matt Arsenault authored Dec 18, 2018
```
llvm-svn: 349467
```
c0ea2210

[TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits · af6fbbf1

Simon Pilgrim authored Dec 18, 2018

For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.

llvm-svn: 349466

af6fbbf1

SROA: preserve alignment tags on loads and stores. · 856628f7

Tim Northover authored Dec 18, 2018

When splitting up an alloca's uses we were dropping any explicit
alignment tags, which means they default to the ABI-required default
alignment and this can cause miscompiles if the real value was smaller.

Also refactor the TBAA metadata into a parent class since it's shared by
both children anyway.

llvm-svn: 349465

856628f7

GlobalISel: Improve crash on invalid mapping · 1ac38ba7

Matt Arsenault authored Dec 18, 2018

If NumBreakDowns is 0, BreakDown is null.
This trades a null dereference for an assert somewhere
else.

llvm-svn: 349464

1ac38ba7

AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub · e01e7c81
Matt Arsenault authored Dec 18, 2018
```
llvm-svn: 349463
```
e01e7c81

[X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits · 8488a44c

Simon Pilgrim authored Dec 18, 2018

(VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1

This works better as part of SimplifyDemandedBits than part of the general combine.
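
The condition on the fold can be checked exhaustively on a small bit width. The sketch below is a plain-Python model, not LLVM code: `num_sign_bits` and `shl_then_sra` are hypothetical helper names, and an 8-bit width is assumed purely so that every value can be enumerated.

```python
BITS = 8  # small width so every value can be checked exhaustively

def to_signed(v):
    """Truncate to BITS and reinterpret as a two's complement value."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v >> (BITS - 1) else v

def num_sign_bits(x):
    """Count the leading bits that equal the sign bit (always at least 1)."""
    u = x & ((1 << BITS) - 1)
    sign = u >> (BITS - 1)
    count = 0
    for i in range(BITS - 1, -1, -1):
        if (u >> i) & 1 != sign:
            break
        count += 1
    return count

def shl_then_sra(x, c):
    """Shift left by c (truncating), then arithmetic shift right by c."""
    shifted = to_signed(x << c)
    return shifted >> c  # Python's >> on a negative int is arithmetic

def fold_is_sound():
    """True if (sra (shl x, c), c) == x whenever num_sign_bits(x) > c."""
    for x in range(-(1 << (BITS - 1)), 1 << (BITS - 1)):
        for c in range(1, BITS):
            if num_sign_bits(x) > c and shl_then_sra(x, c) != x:
                return False
    return True

print(fold_is_sound())
```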

llvm-svn: 349462

8488a44c

build.py: inherit environment in the gcc builder · 78453945

Pavel Labath authored Dec 18, 2018

Summary:
This should enable the compiler to find the system linker for the link
step.

Reviewers: stella.stamenova, zturner

Subscribers: lldb-commits

Differential Revision: https://reviews.llvm.org/D55736

llvm-svn: 349461

78453945

[Tests] fix non-determinism failure in testcase · cf80e72e
Joachim Protze authored Dec 18, 2018
```
llvm-svn: 349460
```
cf80e72e

[X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. · 26c630f4

Simon Pilgrim authored Dec 18, 2018

This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same).
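
A small model of why the sign bit is unaffected: an arithmetic shift right only replicates the sign bit, so extracting bit 31 afterwards gives the same answer as extracting it from the original value. The helpers `sra32` and `srl32` below are hypothetical stand-ins for the per-lane behavior of VSRAI and VSRLI on 32-bit lanes.

```python
MASK32 = 0xFFFFFFFF

def sra32(x, y):
    """Arithmetic shift right of a 32-bit value held as an unsigned int."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> y) & MASK32

def srl32(x, y):
    """Logical shift right of a 32-bit value."""
    return (x & MASK32) >> y

def sign_bit_survives(x, y):
    """Compare the original pattern with its simplified form."""
    return srl32(sra32(x, y), 31) == srl32(x, 31)

samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF]
print(all(sign_bit_survives(x, y) for x in samples for y in range(32)))
```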

Test change is merely a rescheduling.

llvm-svn: 349459

26c630f4

[OMPT] First chunk of final OMPT 5.0 interface updates · 0e0d6cdd

Joachim Protze authored Dec 18, 2018

This patch updates the implementation of the ompt_frame_t, ompt_wait_id_t
and ompt_state_t. The final version of the OpenMP 5.0 spec added the "t"
for these types.
Furthermore, the structure of ompt_frame_t changed and now allows specifying
that the reenter frame belongs to the runtime.

Patch partially prepared by Simon Convent

Reviewers: hbae
llvm-svn: 349458

0e0d6cdd

[OMPT] Add testcase for thread_num provided by implicit task events · 1f7d4aca
Joachim Protze authored Dec 18, 2018
```
llvm-svn: 349457
```
1f7d4aca

Introduce control flow speculation tracking pass for AArch64 · e66bc1f7

Kristof Beyls authored Dec 18, 2018

The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnerabilities.

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask off data loaded
  in registers.
- using intrinsics to mask off data in registers as indicated by the
  programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
  relative abundance of registers, one register is reserved (X16) to be
  the taint register. X16 is expected to not clash with other register
  reservation mechanisms with very high probability because:
  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  . The only way to request X16 to be used as a programmer is through
    inline assembly. In the rare case a function explicitly demands to
    use X16/W16, this pass falls back to hardening against speculation
    by inserting a DSB SYS/ISB barrier pair which will prevent control
    flow speculation.
- It is easy to insert mask operations at this late stage as we have
  mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
  and contains all-zeros when miss-speculation is detected. Therefore, when
  masking, an AND instruction (which only changes the register to be masked,
  no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
  select instruction (CSEL) to evaluate the flags that were also used to
  make conditional branch direction decisions. Speculation of the CSEL
  instruction can be limited with a CSDB instruction - so the combination of
  CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
  aren't speculated. When conditional branch direction gets miss-speculated,
  the semantics of the inserted CSEL instruction is such that the taint
  register will contain all zero bits.
  One key requirement for this to work is that the conditional branch is
  followed by an execution of the CSEL instruction, where the CSEL
  instruction needs to use the same flags status as the conditional branch.
  This means that the conditional branches must not be implemented as one
  of the AArch64 conditional branches that do not use the flags as input
  (CB(N)Z and TB(N)Z). This is implemented by ensuring that the instruction
  selectors do not produce these instructions when speculation hardening
  is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
  the taint register X16 to be encoded in the SP register as value 0.
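
The all-ones/all-zeros taint convention described above can be modelled in a few lines. This is only a conceptual sketch in Python, not the pass itself: `update_taint` stands in for the effect of the inserted CSEL, `mask` for the AND, and both names and the boolean parameters are assumptions made for illustration.

```python
ALL_ONES = 0xFFFFFFFFFFFFFFFF  # taint value while execution is on the correct path

def update_taint(taint, condition_holds, path_assumes_condition_holds):
    """Model of the CSEL-style update executed after a conditional branch.

    If the path being executed disagrees with the real condition, the branch
    was miss-speculated and the taint collapses to all-zeros. Once zero, it
    stays zero on later updates.
    """
    if condition_holds == path_assumes_condition_holds:
        return taint
    return 0

def mask(value, taint):
    """AND with the taint: a no-op when all-ones, zeroes the value otherwise."""
    return value & taint

secret = 0x1234
taint = update_taint(ALL_ONES, True, True)   # correctly predicted branch
print(hex(mask(secret, taint)))
taint = update_taint(taint, False, True)     # miss-speculated branch
print(hex(mask(secret, taint)))
```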

Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
  x86-slh-lfence option. This may be more useful for the intrinsics-based
  approach than for the SLH approach to masking.
  Note that this pass already inserts the full speculation barriers if the
  function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
  could be done for some indirect branches, such as switch jump tables.

Differential Revision: https://reviews.llvm.org/D54896

llvm-svn: 349456

e66bc1f7

Portable Python script across Python version · 3744de52

Serge Guelton authored Dec 18, 2018

In Python2, division between integers yields an integer, while it yields a float in Python3.
Use a combination of `from __future__ import division` and the `//` operator to get portable behavior.
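
A minimal sketch of the combination described above. The function names are made up for illustration and do not come from the patched scripts.

```python
from __future__ import division  # makes / true division on Python 2 as well

def midpoint_index(length):
    """Index arithmetic needs an integer, so use floor division explicitly."""
    return length // 2

def average(total, count):
    """With the __future__ import, / yields a float on both versions."""
    return total / count

print(midpoint_index(7))
print(average(7, 2))
```

Using `//` wherever an integer is required keeps the intent explicit, so the result no longer depends on which interpreter runs the script.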

Differential Revision: https://reviews.llvm.org/D55204

llvm-svn: 349455

3744de52

Portable Python script across Python version · c0ebe773

Serge Guelton authored Dec 18, 2018

Using `from __future__ import print_function`, it is possible to get compatible behavior of `print(...)` across Python versions.
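
For illustration, a short sketch of what the import enables. The `warn` helper is hypothetical; the point is that keyword arguments such as `sep=` and `file=` only work with the function form of `print`.

```python
from __future__ import print_function  # print becomes a function on Python 2 too

import sys

def warn(*items):
    """Write a diagnostic to stderr; sep= and file= need the function form."""
    print(*items, sep=": ", file=sys.stderr)

warn("warning", "something looks off")
print("done")
```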

Differential Revision: https://reviews.llvm.org/D55213

llvm-svn: 349454

c0ebe773

[unittests] Remove superfluous semicolon, fixing warnings with GCC. NFC. · 85833393
Martin Storsjö authored Dec 18, 2018
```
llvm-svn: 349453
```
85833393

[Driver] Automatically enable -munwind-tables if -fseh-exceptions is enabled · 56f9c81c

Martin Storsjö authored Dec 18, 2018

For targets where SEH exceptions are used by default (on MinGW,
only x86_64 so far), -munwind-tables are added automatically. If
-fseh-exceptions is enabled on a target where SEH exceptions are
available but not enabled by default yet (aarch64), we need to
pass -munwind-tables if -fseh-exceptions was specified.

Differential Revision: https://reviews.llvm.org/D55749

llvm-svn: 349452

56f9c81c

[AArch64] [MinGW] Allow enabling SEH exceptions · 8f0cb9c3

Martin Storsjö authored Dec 18, 2018

The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.

Differential Revision: https://reviews.llvm.org/D55748

llvm-svn: 349451

8f0cb9c3

[X86] Add test cases to show isel failing to match BMI blsmsk/blsi/blsr when... · 284d426f

Craig Topper authored Dec 18, 2018

[X86] Add test cases to show isel failing to match BMI blsmsk/blsi/blsr when the flag result is used.

Similar things happen to TBM instructions, which we already have tests for.

llvm-svn: 349450

284d426f

Portable Python script across Python version · 73cf752f

Serge Guelton authored Dec 18, 2018

The ConfigParser module has been renamed to configparser in Python3.
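
One common way to cope with the rename is a guarded import. Whether the patch uses exactly this idiom is an assumption; the section name, option name and `make_parser` helper below are invented for the example.

```python
try:
    import configparser                  # Python 3 name
except ImportError:
    import ConfigParser as configparser  # Python 2 fallback

def make_parser():
    """Build a parser in memory using calls available on both versions."""
    parser = configparser.RawConfigParser()
    parser.add_section("build")
    parser.set("build", "jobs", "4")
    return parser

print(make_parser().getint("build", "jobs"))
```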

Differential Revision: https://reviews.llvm.org/D55200

llvm-svn: 349449

73cf752f

Portable Python script across Python version · c5d97e3e

Serge Guelton authored Dec 18, 2018

Replace `xrange(...)` by either `range(...)` or `list(range(...))` depending on the context.
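
The two cases can be sketched as follows. The functions are illustrative only: plain `range(...)` is enough when the result is merely iterated, while `list(range(...))` is needed when the code relies on having a real, mutable list.

```python
import random

def squares(n):
    """Iteration only: range() behaves the same on both versions."""
    return [i * i for i in range(n)]

def shuffled_indices(n, rng):
    """A real list is required here because shuffle mutates it in place."""
    indices = list(range(n))
    rng.shuffle(indices)
    return indices

print(squares(4))
print(sorted(shuffled_indices(5, random.Random(0))))
```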

Differential Revision: https://reviews.llvm.org/D55193

llvm-svn: 349448

c5d97e3e

Portable Python script across Python version · 366c089b

Serge Guelton authored Dec 18, 2018

dicts no longer have the `has_key` method in Python3. Instead, one can
use the `in` keyword which already works in Python2.
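
A minimal before/after sketch, using a made-up word-counting helper:

```python
def count_words(words):
    """Count occurrences using the `in` test instead of dict.has_key()."""
    counts = {}
    for word in words:
        if word in counts:        # was: counts.has_key(word)
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

print(count_words(["a", "b", "a"]))
```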

llvm-svn: 349447

366c089b

[PowerPC][NFC]Update vabsd cases with vselect test cases · bbb461f7

Kewen Lin authored Dec 18, 2018

Power9 VABSDU* instructions can be exploited for some special vselect sequences.
Check in the original test case here, later the exploitation patch will update this
and reviewers can check the differences easily.

llvm-svn: 349446

bbb461f7

[PowerPC] Exploit power9 new instruction setb · 44ace925

Kewen Lin authored Dec 18, 2018

Check the expected patterns feeding to SELECT_CC like:
   (select_cc lhs, rhs,  1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs,  1, -1, cc2), seteq)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs, -1,  1, cc2), seteq)
Further transform the sequence to comparison + setb if one of them matches.
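
To see why these patterns amount to a three-way comparison, the sketch below models one instance of the first pattern in Python. The choice of cc1 = setgt and cc2 = setlt is an assumption made for the example, and `setb` here is a simplified model of the instruction acting on the LT and GT bits of a condition field.

```python
def setb(lt, gt):
    """Model of setb: -1 if LT is set, 1 if GT is set, otherwise 0."""
    if lt:
        return -1
    if gt:
        return 1
    return 0

def select_cc_form(lhs, rhs):
    """Assumed instance of the first pattern with cc1 = setgt, cc2 = setlt:
    (select_cc lhs, rhs, 1, (sext (setcc lhs, rhs, setlt)), setgt)
    """
    if lhs > rhs:
        return 1
    return -1 if lhs < rhs else 0  # sign-extending an i1 'true' gives -1

def compare_then_setb(lhs, rhs):
    """The replacement sequence: one comparison followed by setb."""
    return setb(lhs < rhs, lhs > rhs)

pairs = [(1, 2), (2, 1), (5, 5), (-3, 0), (0, -3)]
print(all(select_cc_form(a, b) == compare_then_setb(a, b) for a, b in pairs))
```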

Differential Revision: https://reviews.llvm.org/D53275

llvm-svn: 349445

44ace925

[ExprConstant] Handle compound assignment when LHS has integral type and RHS... · 9f935e87

Tan S. B. authored Dec 18, 2018

[ExprConstant] Handle compound assignment when LHS has integral type and RHS has floating point type

Fixes PR39858

Differential Revision: https://reviews.llvm.org/D55413

llvm-svn: 349444

9f935e87

[NFC] Add new test to cover the lhs scheduling issue for P9. · ecdab5bd
QingShan Zhang authored Dec 18, 2018
```
llvm-svn: 349443
```
ecdab5bd

Automatic variable initialization · 14daa20b

JF Bastien authored Dec 18, 2018

Summary:
Add an option to initialize automatic variables with either a pattern or with
zeroes. The default is still that automatic variables are uninitialized. Also
add attributes to request uninitialized on a per-variable basis, mainly to disable
initialization of large stack arrays when deemed too expensive.

This isn't meant to change the semantics of C and C++. Rather, it's meant to be
a last-resort when programmers inadvertently have some undefined behavior in
their code. This patch aims to make undefined behavior hurt less, which
security-minded people will be very happy about. Notably, this means that
there's no inadvertent information leak when:

- The compiler re-uses stack slots, and a value is used uninitialized.
- The compiler re-uses a register, and a value is used uninitialized.
- Stack structs / arrays / unions with padding are copied.

This patch only addresses stack and register information leaks. There's many
more infoleaks that we could address, and much more undefined behavior that
could be tamed. Let's keep this patch focused, and I'm happy to address related
issues elsewhere.

To keep the patch simple, only some `undef` is removed for now, see
`replaceUndef`. The padding-related infoleaks are therefore not all gone yet.
This will be addressed in a follow-up, mainly because addressing padding-related
leaks should be a stand-alone option which is implied by variable
initialization.

There are three options when it comes to automatic variable initialization:

0. Uninitialized

This is C and C++'s default. It's not changing. Depending on code
generation, a programmer who runs into undefined behavior by using an
uninitialized automatic variable may observe any previous value (including
program secrets), or any value which the compiler saw fit to materialize on
the stack or in a register (this could be to synthesize an immediate, to
refer to code or data locations, to generate cookies, etc).

1. Pattern initialization

This is the recommended initialization approach. Pattern initialization's
goal is to initialize automatic variables with values which will likely
transform logic bugs into crashes down the line, are easily recognizable in
a crash dump, without being values which programmers can rely on for useful
program semantics. At the same time, pattern initialization tries to
generate code which will optimize well. You'll find the following details in
`patternFor`:

- Integers are initialized with repeated 0xAA bytes (infinite scream).
- Vectors of integers are also initialized with infinite scream.
- Pointers are initialized with infinite scream on 64-bit platforms because
it's an unmappable pointer value on architectures I'm aware of. Pointers
are initialized to 0x000000AA (small scream) on 32-bit platforms because
32-bit platforms don't consistently offer unmappable pages. When they do
it's usually the zero page. As people try this out, I expect that we'll
want to allow different platforms to customize this, let's do so later.
- Vectors of pointers are initialized the same way pointers are.
- Floating point values and vectors are initialized with a negative quiet
NaN with repeated 0xFF payload (e.g. 0xffffffff and 0xffffffffffffffff).
NaNs are nice (here, anyway) because they propagate on arithmetic, making
it more likely that entire computations become NaN when a single
uninitialized value sneaks in.
- Arrays are initialized to their homogeneous elements' initialization
value, repeated. Stack-based Variable-Length Arrays (VLAs) are
runtime-initialized to the allocated size (no effort is made for negative
size, but zero-sized VLAs are untouched even if technically undefined).
- Structs are initialized to their heterogeneous element's initialization
values. Zero-size structs are initialized as 0xAA since they're allocated
a single byte.
- Unions are initialized using the initialization for the largest member of
the union.
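
The byte patterns listed above can be inspected with a short Python sketch. This is not the `patternFor` implementation; the helper names are hypothetical and the code only reproduces the values stated in the list to show how they look when reinterpreted.

```python
import math
import struct

def int_pattern(size_in_bytes):
    """Repeated 0xAA bytes for an integer of the given size."""
    return int.from_bytes(b"\xAA" * size_in_bytes, "little")

def pointer_pattern(pointer_bits):
    """Repeated 0xAA on 64-bit targets, 0x000000AA on 32-bit targets."""
    return int_pattern(8) if pointer_bits == 64 else 0x000000AA

def float_pattern():
    """Reinterpret 0xffffffff as an IEEE-754 single-precision value."""
    return struct.unpack("<f", b"\xFF" * 4)[0]

def double_pattern():
    """Reinterpret 0xffffffffffffffff as an IEEE-754 double-precision value."""
    return struct.unpack("<d", b"\xFF" * 8)[0]

print(hex(int_pattern(4)), hex(pointer_pattern(64)), hex(pointer_pattern(32)))
print(math.isnan(float_pattern()), math.isnan(double_pattern()))
print(math.isnan(double_pattern() + 1.0))  # NaN propagates through arithmetic
```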

Expect the values used for pattern initialization to change over time, as we
refine heuristics (both for performance and security). The goal is truly to
avoid injecting semantics into undefined behavior, and we should be
comfortable changing these values when there's a worthwhile point in doing
so.

Why so much infinite scream? Repeated byte patterns tend to be easy to
synthesize on most architectures, and otherwise memset is usually very
efficient. For values which aren't entirely repeated byte patterns, LLVM
will often generate code which does memset + a few stores.

2. Zero initialization

Zero initialize all values. This has the unfortunate side-effect of
providing semantics to otherwise undefined behavior, programs therefore
might start to rely on this behavior, and that's sad. However, some
programmers believe that pattern initialization is too expensive for them,
and data might show that they're right. The only way to make these
programmers wrong is to offer zero-initialization as an option, figure out
where they are right, and optimize the compiler into submission. Until the
compiler provides acceptable performance for all security-minded code, zero
initialization is a useful (if blunt) tool.

I've been asked for a fourth initialization option: user-provided byte value.
This might be useful, and can easily be added later.

Why is an out-of-band initialization mechanism desired? We could instead use
-Wuninitialized! Indeed we could, but then we're forcing the programmer to
provide semantics for something which doesn't actually have any (it's
uninitialized!). It's then unclear whether `int derp = 0;` lends meaning to `0`,
or whether it's just there to shut that warning up. It's also way easier to use
a compiler flag than it is to manually and intelligently initialize all values
in a program.

Why not just rely on static analysis? Because it cannot reason about all dynamic
code paths effectively, and it has false positives. It's a great tool, could get
even better, but it's simply incapable of catching all uses of uninitialized
values.

Why not just rely on memory sanitizer? Because it's not universally available,
has a 3x performance cost, and shouldn't be deployed in production. Again, it's
a great tool, it'll find the dynamic uses of uninitialized variables that your
test coverage hits, but it won't find the ones that you encounter in production.

What's the performance like? Not too bad! Previous publications [0] have cited
2.7 to 4.5% averages. We've committed a few patches over the last few months to
address specific regressions, both in code size and performance. In all cases,
the optimizations are generally useful, but variable initialization benefits
from them a lot more than regular code does. We've got a handful of other
optimizations in mind, but the code is in good enough shape and has found enough
latent issues that it's a good time to get the change reviewed, checked in, and
have others kick the tires. We'll continue reducing overheads as we try this out
on diverse codebases.

Is it a good idea? Security-minded folks think so, and apparently so does the
Microsoft Visual Studio team [1] who say "Between 2017 and mid 2018, this
feature would have killed 49 MSRC cases that involved uninitialized struct data
leaking across a trust boundary. It would have also mitigated a number of bugs
involving uninitialized struct data being used directly.". They seem to use pure
zero initialization, and claim to have taken the overheads down to within noise.
Don't just trust Microsoft though, here's another relevant person asking for
this [2]. It's been proposed for GCC [3] and LLVM [4] before.

What are the caveats? A few!

- Variables declared in unreachable code, and used later, aren't initialized.
This covers goto, Duff's device, and other objectionable uses of switch. This should
instead be a hard-error in any serious codebase.
- Volatile stack variables are still weird. That's pre-existing, it's really
the language's fault and this patch keeps it weird. We should deprecate
volatile [5].
- As noted above, padding isn't fully handled yet.

I don't think these caveats make the patch untenable because they can be
addressed separately.

Should this be on by default? Maybe, in some circumstances. It's a conversation
we can have when we've tried it out sufficiently, and we're confident that we've
eliminated enough of the overheads that most codebases would want to opt-in.
Let's keep our precious undefined behavior until that point in time.

How do I use it:

1. On the command-line:

-ftrivial-auto-var-init=uninitialized (the default)
-ftrivial-auto-var-init=pattern
-ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang

2. Using an attribute:

int dont_initialize_me __attribute((uninitialized));

[0]: https://users.elis.ugent.be/~jsartor/researchDocs/OOPSLA2011Zero-submit.pdf
[1]: https://twitter.com/JosephBialek/status/1062774315098112001
[2]: https://outflux.net/slides/2018/lss/danger.pdf
[3]: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00615.html
[4]: https://github.com/AndroidHardeningArchive/platform_external_clang/commit/776a0955ef6686d23a82d2e6a3cbd4a6a882c31c
[5]: http://wg21.link/p1152

I've also posted an RFC to cfe-dev: http://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html

<rdar://problem/39131435>

Reviewers: pcc, kcc, rsmith

Subscribers: JDevlieghere, jkorous, dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D54604

llvm-svn: 349442

14daa20b

[X86] Add test case for PR40060. NFC · 4adf9ca7
Craig Topper authored Dec 18, 2018
```
llvm-svn: 349441
```
4adf9ca7
[X86] Const correct some helper functions X86InstrInfo.cpp. NFC · 1ff7356f
Craig Topper authored Dec 18, 2018
```
llvm-svn: 349440
```
1ff7356f
[NFC] fix test case issue that with wrong label check. · f5498125
QingShan Zhang authored Dec 18, 2018
```
llvm-svn: 349439
```
f5498125

[CaptureTracking] Pass MaxUsesToExplore from wrappers to the actual implementation · 2a0146e0

Artur Pilipenko authored Dec 18, 2018

    
This is a follow-up for rL347910. In the original patch I somehow forgot to pass
the limit from wrappers to the function which actually does the job.

llvm-svn: 349438

2a0146e0

[PowerPC] Improve vec_abs on P9 · 3dac1252

Kewen Lin authored Dec 18, 2018

Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.
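
The fallback lowering can be read as computing abs(x) per lane as max(x, 0 - x). The sketch below models that reading in Python with 32-bit wraparound; the lane width and helper names are assumptions, and it does not cover the VABSD combination.

```python
def to_i32(v):
    """Wrap a Python int to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def vec_abs(lanes):
    """Per-lane abs computed as max(x, 0 - x) with 32-bit wraparound."""
    negated = [to_i32(0 - x) for x in lanes]            # the vsub (or vneg) step
    return [max(x, n) for x, n in zip(lanes, negated)]  # the vmax step

print(vec_abs([3, -4, 0, -2147483648]))
```

Note that the most negative value maps to itself, since its negation wraps around in 32 bits.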

Differential Revision: https://reviews.llvm.org/D54783

llvm-svn: 349437

3dac1252