[X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256 This is an improvement to 88dacbd4