Grok-2-mini just got a speed upgrade. Over the past few days, we have substantially improved our inference stack. These gains come from using custom algorithms for computation and communication kernels, along with more efficient batch scheduling and quantization.
Our inference