Instead of the first load. Using the first load works when vectorizing contiguous loads, but not when vectorizing gathers. Fixes a miscompile introduced in fcad8d36.