This matches what I suspected after trying images with 4o: “natively fully multimodal” makes it sound like vision in 4 was tacked on in some way (which would make sense given the mistakes it makes). It makes me wonder how natively multimodal image handling is in Gemini, because I’ve found it makes the same