Recent advances in Large Language Models (LLMs) have produced significant gains on mathematical reasoning tasks. A comparatively small model, Llama-3 8B, reportedly scored 96.7% on the GSM8K math benchmark, surpassing GPT-4, Claude, and Gemini despite having roughly 200 times fewer parameters. The result is attributed to Monte Carlo Tree Search (MCTS) with value backpropagation, the same family of techniques Google DeepMind used in AlphaGo to master Go (a minimal sketch of the search loop follows below). Separately, vLLM now supports FP8 quantization, improving inference performance and efficiency (see the usage sketch after the MCTS example). Open-source LLMs such as Qwen 2 and Nemotron are advancing rapidly, with fine-tunes expected to match top proprietary models like Gemini and GPT-4 Turbo; Llama-3 70B was displaced as the leading open model within weeks of its release.
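
To make the MCTS idea concrete, here is a minimal sketch of the four-phase loop (selection, expansion, evaluation, backpropagation) applied to iterative answer refinement. This is an illustration under stated assumptions, not the method from any specific paper: `propose_refinement` and `score_answer` are hypothetical stand-ins for an LLM rewrite call and a verifier/reward model.

```python
# Minimal MCTS sketch for answer refinement. The LLM and the reward
# model are stubbed out; only the search loop itself is real.
import math
import random

class Node:
    def __init__(self, answer, parent=None):
        self.answer = answer          # candidate solution text
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def uct(self, c=1.4):
        # Upper Confidence Bound for Trees: prefer high-value children,
        # but keep exploring rarely visited ones.
        if self.visits == 0:
            return float("inf")
        return (self.value_sum / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def propose_refinement(answer):
    # Hypothetical stand-in for an LLM call that critiques and
    # rewrites the current answer.
    return answer + " (refined)"

def score_answer(answer):
    # Hypothetical stand-in for a verifier / reward model in [0, 1].
    return random.random()

def mcts(root_answer, iterations=50):
    root = Node(root_answer)
    for _ in range(iterations):
        # 1. Selection: descend via UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: add one refined child.
        child = Node(propose_refinement(node.answer), parent=node)
        node.children.append(child)
        # 3. Evaluation: score the new candidate.
        reward = score_answer(child.answer)
        # 4. Backpropagation: push the reward up to the root.
        while child is not None:
            child.visits += 1
            child.value_sum += reward
            child = child.parent
    best = max(root.children, key=lambda n: n.value_sum / n.visits)
    return best.answer

print(mcts("initial draft answer"))
```

The backpropagation step here is the tree-search sense of the word, i.e. backing accumulated rewards up from a leaf to the root, which is what steers later selections toward promising refinement branches; it is unrelated to gradient backpropagation in training.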
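For the vLLM FP8 feature, a minimal usage sketch follows, assuming a recent vLLM build with FP8 support and a GPU that provides it (e.g., Hopper- or Ada-class hardware); the model checkpoint and prompt are illustrative.

```python
from vllm import LLM, SamplingParams

# Quantize weights to FP8 on the fly at load time; this trades a small
# amount of accuracy for lower memory use and higher throughput.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["What is 17 * 24?"], params)
print(outputs[0].outputs[0].text)
```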