[ARM][BFloat] Implement bf16 get/set_lane without casts to i16 vectors
Currently, in order to extract an element from a bf16 vector, we cast the vector to an i16 vector, perform the extraction, and cast the result to bfloat. This behavior was copied from the old fp16 implementation. The goal of this patch is to achieve optimal code generation for lane copying intrinsics in a subsequent patch (LLVM fails to fold certain combinations of bitcast, insertelement, extractelement and shufflevector instructions leading to the generation of suboptimal code). Differential Revision: https://reviews.llvm.org/D82206
Loading
Please sign in to comment