Skip to content
Unverified Commit 4263b2ec authored by peterbell10's avatar peterbell10 Committed by GitHub
Browse files

[NVPTX] Expand EXTLOAD for v8f16 and v8bf16 (#72672)

In openai/triton#2483 I've encountered a bug in the NVPTX codegen. Given
`load<8 x half>` followed by `fpext to <8 x float>` we get

```
ld.shared.v4.b16 	{%f1, %f2, %f3, %f4}, [%r15+8];
ld.shared.v4.b16 	{%f5, %f6, %f7, %f8}, [%r15];
```

Which loads float16 values into float registers without any conversion
and the result is simply garbage.

This PR brings `v8f16` and `v8bf16` into line with the other vector
types by expanding it to load + cvt.

cc @manman-ren @Artem-B @jlebar
parent bfbfd1ca
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment