If you were interested in my cryptic posts on how to train Chameleon-like models up to 4x faster, check out our MoMa paper, which gives a detailed overview of most of our architectural improvements.
tl;dr: adaptive compute along three dimensions (modality, width, and depth).