- Sep 11, 2017
Kostya Kortchinsky authored
Summary: Some of glibc's own thread local data is destroyed after a user's thread local destructors are called, via __libc_thread_freeres. This might involve calling free, as is the case for strerror_thread_freeres. If there is no prior heap operation in the thread, this free would end up initializing some thread specific data that would never be destroyed properly (as the user's pthread destructors have already been called), while still being deallocated when the TLS goes away.

As a result, a program could SEGV, usually in __sanitizer::AllocatorGlobalStats::Unregister, where one of the doubly linked list links would refer to a now unmapped memory area.

To prevent this from happening, we will not do a full initialization from the deallocation path. This means that the fallback cache & quarantine will be used if no other heap operation has been called, and we effectively prevent the TSD from being initialized and never destroyed. The TSD will be fully initialized for all other paths.

In the event of a thread doing only frees and nothing else, a TSD would never be initialized for that thread, but this situation is unlikely and we can live with it.

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37697

llvm-svn: 312939
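For illustration, a minimal C++ sketch of the idea described in this summary, with hypothetical names (ThreadCache, FallbackCache, scudoAllocatePath, ...) rather than the actual Scudo internals: the allocation path performs the full lazy TSD initialization, while the deallocation path never does, falling back to a shared, lock-protected cache instead.

```cpp
// Sketch only, not the actual Scudo implementation.
#include <cstdlib>
#include <mutex>

struct ThreadCache {
  // A real per-thread cache / quarantine would live here.
  void deallocate(void *Ptr) { free(Ptr); }
};

static thread_local bool TSDInitialized = false;
static thread_local ThreadCache LocalCache;

static ThreadCache FallbackCache;  // shared by threads with no TSD
static std::mutex FallbackMutex;

void initThreadMaybe() {
  if (TSDInitialized)
    return;
  // Registering pthread destructors, per-thread stats, etc. would happen here.
  TSDInitialized = true;
}

void *scudoAllocatePath(size_t Size) {
  initThreadMaybe();            // full TSD init is fine on allocation
  return malloc(Size);
}

void scudoDeallocatePath(void *Ptr) {
  if (TSDInitialized) {
    LocalCache.deallocate(Ptr); // fast path: TSD already set up
    return;
  }
  // Do NOT initialize the TSD here: a free() issued from
  // __libc_thread_freeres after the pthread destructors have run would
  // create a TSD that is never torn down. Use the fallback cache instead.
  std::lock_guard<std::mutex> Guard(FallbackMutex);
  FallbackCache.deallocate(Ptr);
}
```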
- Jul 12, 2017
Kostya Kortchinsky authored
Summary: This follows the addition of `GetRandom` with D34412. We remove our `/dev/urandom` code and use the new function. Additionally, change the PRNG for a slightly faster version.

One of the issues with the old code is that we have 64 full bits of randomness per "next", using only 8 of those for the Salt and discarding the rest. So we add a cached u64 in the PRNG that can serve up to 8 u8 before having to call the "next" function again.

During some integration work, I also realized that some very early processes (like `init`) do not benefit from `/dev/urandom` yet. So if the `getrandom` syscall is not available either, we have to fall back to some other form of initialization of the PRNG.

Now a few words on why XoRoShiRo and not something else. I have played a while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy, meaning that to get a full 64-bit value, you would need to call them several times. The simple XorShift is fast and produces 32 bits but is mediocre with regard to PRNG test suites. PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and produces a full 64 bits.

%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: aemerson, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35221

llvm-svn: 307798
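For reference, a minimal sketch of the scheme described above: a xoroshiro128+ style generator whose full 64-bit output is cached so that eight 8-bit salts can be served per call to "next". The constants and the explicit seeding here are illustrative and may not match the in-tree Scudo PRNG.

```cpp
// Sketch of a xoroshiro128+ generator with a cached u64 serving 8 x u8.
#include <cstdint>

struct Xoroshiro128Plus {
  uint64_t State[2];
  uint64_t Cached;       // last 64-bit output, consumed 8 bits at a time
  uint8_t  CachedBytes;  // how many u8 values remain in Cached

  static uint64_t rotl(uint64_t X, int K) {
    return (X << K) | (X >> (64 - K));
  }

  void init(uint64_t Seed0, uint64_t Seed1) {
    State[0] = Seed0;
    State[1] = Seed1;
    Cached = 0;
    CachedBytes = 0;
  }

  // Produce 64 bits of randomness (xoroshiro128+).
  uint64_t next() {
    const uint64_t S0 = State[0];
    uint64_t S1 = State[1];
    const uint64_t Result = S0 + S1;
    S1 ^= S0;
    State[0] = rotl(S0, 55) ^ S1 ^ (S1 << 14);
    State[1] = rotl(S1, 36);
    return Result;
  }

  // Serve an 8-bit salt, refilling the 64-bit cache only every 8 calls.
  uint8_t getU8() {
    if (CachedBytes == 0) {
      Cached = next();
      CachedBytes = 8;
    }
    const uint8_t Salt = static_cast<uint8_t>(Cached & 0xff);
    Cached >>= 8;
    CachedBytes--;
    return Salt;
  }
};
```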
- May 05, 2017
Kostya Kortchinsky authored
Summary: This change adds Android support to the allocator (but doesn't yet enable it in the cmake config), and should be the last fragment of the rewritten change D31947.

Android has more memory constraints than other platforms, so the idea of a unique context per thread would not have worked. The alternative chosen is to allocate a set of contexts based on the number of cores on the machine, and share those contexts among the threads. Contexts can be dynamically reassigned to threads to prevent contention, based on a scheme suggested by @dvyukov in the initial review.

Additionally, given that Android doesn't support ELF TLS (only emutls for now), we use the TSan TLS slot to make things faster: Scudo is mutually exclusive with other sanitizers, so this shouldn't cause any problem.

An additional change made here is replacing `thread_local` with `THREADLOCAL` and using the initial-exec thread model in the non-Android version to prevent extraneous weak definitions and checks on the relevant variables.

Reviewers: kcc, dvyukov, alekseyshl

Reviewed By: alekseyshl

Subscribers: srhines, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D32649

llvm-svn: 302300
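A simplified sketch of the shared-context scheme described above, under the assumption of one context per core and opportunistic reassignment on contention. The names are stand-ins, and a plain `thread_local` pointer is used here instead of the TSan TLS slot, so this is not the actual Android code path.

```cpp
// Sketch: N contexts for N cores, shared and reassigned among threads.
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

struct ScudoThreadContext {
  std::mutex Mutex;
  // Allocator cache, quarantine cache and PRNG would live here.
  bool tryLock() { return Mutex.try_lock(); }
  void lock() { Mutex.lock(); }
  void unlock() { Mutex.unlock(); }
};

static std::vector<ScudoThreadContext> *Contexts;
static unsigned NumberOfContexts;

// The real Android code stashes this pointer in the TSan TLS slot.
static thread_local ScudoThreadContext *CurrentContext = nullptr;

void initContexts() {
  NumberOfContexts = std::max(1u, std::thread::hardware_concurrency());
  Contexts = new std::vector<ScudoThreadContext>(NumberOfContexts);
}

// Returns a locked context; the caller unlocks it when done.
ScudoThreadContext *getAndLockContext() {
  ScudoThreadContext *Ctx = CurrentContext;
  if (!Ctx)
    Ctx = CurrentContext = &(*Contexts)[0];  // real code assigns round-robin
  if (Ctx->tryLock())
    return Ctx;
  // Contention: try to steal a free context and reassign this thread to it.
  for (unsigned I = 0; I < NumberOfContexts; I++) {
    ScudoThreadContext *Other = &(*Contexts)[I];
    if (Other != Ctx && Other->tryLock()) {
      CurrentContext = Other;
      return Other;
    }
  }
  Ctx->lock();  // all contexts busy: wait on the originally assigned one
  return Ctx;
}
```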
- Apr 27, 2017
Kostya Kortchinsky authored
Summary: This change introduces scudo_tls.h & scudo_tls_linux.cpp, where we move the thread local variables used by the allocator, namely the cache, quarantine cache & PRNG. `ScudoThreadContext` will hold those.

This patch doesn't introduce any new platform support yet; that will be the object of a later patch. This also changes the PRNG so that the structure can be POD.

Reviewers: kcc, dvyukov, alekseyshl

Reviewed By: dvyukov, alekseyshl

Subscribers: llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D32440

llvm-svn: 301584
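A rough sketch of the resulting layout, with placeholder member types standing in for the real allocator cache, quarantine cache and PRNG. The point illustrated is that both the context and the PRNG are POD and are initialized explicitly rather than through constructors, so they can live directly in thread local storage.

```cpp
// Sketch only; member types are placeholders, not the real Scudo classes.
#include <cstdint>

struct AllocatorCachePlaceholder { uint8_t Storage[256]; };
struct QuarantineCachePlaceholder { uint8_t Storage[64]; };

struct ScudoPrng {            // POD: no constructor, seeded explicitly
  uint64_t State[2];
  void init(uint64_t S0, uint64_t S1) { State[0] = S0; State[1] = S1; }
};

struct ScudoThreadContext {   // also POD: no constructors run at TLS setup
  AllocatorCachePlaceholder Cache;
  QuarantineCachePlaceholder QuarantineCache;
  ScudoPrng Prng;
  void init() { Prng.init(/*S0=*/0x12345678u, /*S1=*/0x9abcdef0u); }
};

// scudo_tls_linux.cpp-style definition: one context per thread.
static thread_local ScudoThreadContext ThreadLocalContext;

ScudoThreadContext *getThreadContext() { return &ThreadLocalContext; }
```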