[ASan][libc++] std::basic_string annotations (#72677)
This commit introduces basic annotations for `std::basic_string`, mirroring the approach used in `std::vector` and `std::deque`. Initially, only long strings with the default allocator will be annotated. Short strings (_SSO - short string optimization_) and strings with non-default allocators will be annotated in the near future, with separate commits dedicated to enabling them. The process will be similar to the workflow employed for enabling annotations in `std::deque`. **Please note**: these annotations function effectively only when libc++ and libc++abi dylibs are instrumented (with ASan). This aligns with the prevailing behavior of Memory Sanitizer. To avoid breaking everything, this commit also appends `_LIBCPP_INSTRUMENTED_WITH_ASAN` to `__config_site` whenever libc++ is compiled with ASan. If this macro is not defined, string annotations are not enabled. However, linking a binary that does **not** annotate strings with a dynamic library that annotates strings, is not permitted. Originally proposed here: https://reviews.llvm.org/D132769 Related patches on Phabricator: - Turning on annotations for short strings: https://reviews.llvm.org/D147680 - Turning on annotations for all allocators: https://reviews.llvm.org/D146214 This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. This Pull Request introduces basic annotations for `std::basic_string`. Long strings exhibit structural similarities to `std::vector` and will be annotated accordingly. Short strings are already implemented, but will be turned on separately in a forthcoming commit. Look at [a comment](https://github.com/llvm/llvm-project/pull/72677#issuecomment-1850554465) below to read about SSO issues at current moment. Due to the functionality introduced in [D132522](https://github.com/llvm/llvm-project/commit/dd1b7b797a116eed588fd752fbe61d34deeb24e4), the `__sanitizer_annotate_contiguous_container` function now offers compatibility with all allocators. However, enabling this support will be done in a subsequent commit. For the time being, only strings with the default allocator will be annotated. If you have any questions, please email: - advenam.tacet@trailofbits.com - disconnect3d@trailofbits.com
Loading