korshunov.ai — ML news

Daily ML news, summarized.

Reinforcement Learning Advances in LLM Reasoning

"The State of Reinforcement Learning for LLM Reasoning" explores how reinforcement learning is advancing reasoning capabilities in large language models. OpenAI's o3 reasoning model, trained with 10× more compute than o1, shows significant progress from RL methods. The article argues that scaling alone is insufficient and that targeted RL training is essential for improving model accuracy on complex tasks.

Ahead of AI · Source
Coding LLMs from the Ground Up: A Complete Course

A comprehensive 15-hour course teaches how to build large language models from scratch, covering topics from text data preparation to instruction fine-tuning. The content includes seven detailed videos and a bonus overview of LLM evolution from 2018 to 2025, originally developed as supplementary material for a book and now offered as standalone learning resources.

Ahead of AI · Source
LLM Research Papers: The 2025 List (January to June)

The author shares a curated list of LLM research papers from January to June 2025, organized by topics such as reasoning models, reinforcement learning, and multimodal systems. The list includes 30 papers, with a strong focus on training and inference-time reasoning strategies, and is released in twice-yearly updates to remain timely and digestible.

Ahead of AI · Source
Understanding and Coding the KV Cache in LLMs from Scratch

A KV cache stores key and value vectors from previous token generations to avoid redundant computations during inference, significantly speeding up text generation in large language models. The article provides a from-scratch, human-readable implementation of the KV cache, explaining its conceptual basis and code structure, while noting it increases memory usage and is not applicable during training.
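
As a rough illustration of the idea in this summary (not the article's own code), the sketch below compares single-head attention decoding with and without a cache. The `project` helper is a hypothetical stand-in for learned key/value projections, and all names are assumptions made for the example.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    # Single-head scaled dot-product attention for one query vector.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def project(token_vec, factor):
    # Hypothetical stand-in for a learned linear projection.
    return [factor * x for x in token_vec]


class KVCache:
    # Stores the key and value vectors of all tokens seen so far.
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def decode_with_cache(tokens):
    # Each step projects only the newest token and reuses cached keys/values.
    cache = KVCache()
    outputs = []
    for tok in tokens:
        cache.append(project(tok, 0.5), project(tok, 2.0))
        outputs.append(attend(tok, cache.keys, cache.values))
    return outputs


def decode_without_cache(tokens):
    # Each step re-projects every previous token: redundant work.
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(tok, 0.5) for tok in prefix]
        values = [project(tok, 2.0) for tok in prefix]
        outputs.append(attend(tokens[t], keys, values))
    return outputs
```

Both functions return the same outputs; the cached version trades memory (the stored keys and values grow with sequence length) for fewer projections per step.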

Ahead of AI · Source
Ten Open-Weight LLM Architectures Released Jan-Feb 2026

The article surveys ten major open-weight large language model releases from January to February 2026, highlighting architectural features such as Mixture-of-Experts, sliding window attention, and parameter efficiency. Key models include Arcee AI's Trinity Large (400B parameters, 13B active), Z.ai's GLM-5, and Cohere's Tiny Aya. The comparisons emphasize design similarities and innovations in attention mechanisms and scalability.

Ahead of AI · Source
Big LLM Architecture Comparison Highlights DeepSeek V3 Innovations

The article examines key architectural advancements in modern large language models, focusing on DeepSeek V3 and its subsequent reasoning model DeepSeek R1. It highlights two innovations: Multi-Head Latent Attention (MLA) and Mixture-of-Experts (MoE), which improve computational efficiency and distinguish DeepSeek V3 from other LLMs.
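
To make the efficiency argument for Mixture-of-Experts concrete, here is a minimal top-k routing sketch. It is a generic illustration, not DeepSeek V3's actual router: the softmax over only the selected logits is one common variant, and the function names and toy experts are assumptions.

```python
import math


def top_k_route(router_logits, k):
    # Pick the k experts with the highest router scores, then normalize
    # their scores with a softmax over just those k logits (an assumption;
    # real routers differ in how they normalize).
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=2):
    # Only the k selected experts run; the rest are skipped entirely,
    # which is why active parameters are far fewer than total parameters.
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def make_scaling_expert(factor):
    # Hypothetical toy expert: multiplies its input by a constant.
    return lambda x: [factor * v for v in x]
```

With four toy experts and `k=2`, only two expert functions are evaluated per token, regardless of how many experts exist.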

Ahead of AI · Source
Understanding and Implementing Qwen3 From Scratch

This article provides a hands-on, code-based implementation of Qwen3 models in pure PyTorch. It highlights Qwen3's popularity due to its developer-friendly Apache 2.0 license, strong performance on benchmarks like LMArena, and a wide range of model sizes from 0.6B to 480B parameters.

Ahead of AI · Source
OpenAI Releases gpt-oss-120b and gpt-oss-20b Models

OpenAI has released gpt-oss-120b and gpt-oss-20b, its first fully open-weight large language models since GPT-2 in 2019. The models include architectural optimizations, such as MXFP4 quantization, that enable local execution on GPUs: the 20B model runs on consumer hardware and the 120B model on an H100 or newer.

Ahead of AI · Source
The 4 Main Approaches to LLM Evaluation

The four main approaches to evaluating large language models are multiple-choice question answering, verifiers, leaderboards, and LLM judges. These methods fall into benchmark-based and judgment-based categories, with multiple-choice benchmarks like MMLU being historically prominent. The article provides a clear overview of each method and includes from-scratch code examples to illustrate their implementation and limitations.
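
A minimal sketch of the multiple-choice approach, assuming four answer options labeled A to D and free-form model outputs. The function names are hypothetical and the letter extraction is deliberately naive, which also shows one limitation of this style of benchmark: scores depend on how answers are parsed.

```python
import re


def extract_choice(model_output):
    # Return the first standalone capital letter A-D, or None if absent.
    # Naive on purpose: a sentence starting with the article "A" would be
    # misread as choosing option A.
    match = re.search(r"\b([ABCD])\b", model_output)
    return match.group(1) if match else None


def multiple_choice_accuracy(model_outputs, gold_letters):
    # Fraction of questions where the extracted letter matches the gold label.
    if not gold_letters or len(model_outputs) != len(gold_letters):
        raise ValueError("need equally sized, non-empty lists")
    correct = sum(extract_choice(out) == gold
                  for out, gold in zip(model_outputs, gold_letters))
    return correct / len(gold_letters)
```

Unparseable outputs count as wrong here; other conventions are possible, such as comparing the model's scores for each answer letter instead of parsing generated text.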

Ahead of AI · Source
Beyond Standard LLMs: Alternative Architectures Emerge

While most large language models remain autoregressive decoder-style transformers, recent years have seen the rise of alternatives such as text diffusion models and linear attention hybrids. These approaches aim to improve efficiency or modeling performance, with some like code world models targeting specific use cases. The article highlights a range of open-weight models and notes that non-traditional architectures deserve deeper exploration in future coverage.

Ahead of AI · Source
DeepSeek V3.2: Sparse Attention and RL Updates

DeepSeek released V3.2, an updated flagship model featuring sparse attention and reinforcement learning improvements. The model demonstrates strong performance comparable to GPT-5 and Gemini 3.0 Pro, and is available as an open-weight model, expanding access to advanced language capabilities.

Ahead of AI · Source
LLM Research Papers: The 2025 List (July to December)

This curated list of research papers on large language models from July to December 2025 is categorized by topics such as reasoning models, reinforcement learning, inference strategies, model releases, architectures, and multimodal systems. The author published it as a separate post to improve readability and accessibility, alongside the annual "State of LLMs 2025" review article.

Ahead of AI · Source
Categories of Inference-Time Scaling for Improved LLM Reasoning

The article categorizes inference-time scaling techniques to improve large language model reasoning without altering model weights. It covers methods like Chain-of-Thought Prompting, Self-Consistency, Best-of-N Ranking, Rejection Sampling, Self-Refinement, and Search Over Solution Paths, with a focus on practical implementations and recent academic work.
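
Of the methods listed, Self-Consistency is the simplest to sketch: sample several answers and keep the most frequent one. The example below is an illustration under assumptions; `sample_answer` is a hypothetical callable standing in for a sampled model call that returns a final answer string.

```python
from collections import Counter


def self_consistency(sample_answer, prompt, n_samples=5):
    # Draw several independent answers for the same prompt, then return the
    # majority answer together with its share of the votes.
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_samples
```

The model weights never change; the extra accuracy, when it appears, is paid for with `n_samples` times the inference cost.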

Ahead of AI · Source
DeepSeek R1 Introduces RLVR and GRPO for LLM Reasoning

DeepSeek R1, released in January 2025, introduced Reinforcement Learning with Verifiable Rewards (RLVR) using the GRPO algorithm to enable reasoning in large language models. The open-weight model performed comparably to top proprietary models and suggested training costs of around $5 million, significantly lower than previous estimates, though that figure does not account for research and development expenses.
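
The two ideas named here can be sketched briefly: a verifiable reward checks an answer against a known reference, and GRPO scores each sampled response relative to the other responses in its group. This is a simplified sketch, not DeepSeek's implementation; the exact-match reward, the population standard deviation, and the `eps` term are assumptions.

```python
import statistics


def verifiable_reward(answer, reference):
    # Assumed reward: 1.0 for an exact match with the reference, else 0.0.
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    # Normalize each reward by the mean and standard deviation of its group
    # of sampled responses; eps avoids division by zero when all rewards
    # in the group are equal.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Responses that beat the group average get a positive advantage and are reinforced; a group where every response earns the same reward yields all-zero advantages and no learning signal.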

Ahead of AI · Source
A Visual Guide to Attention Variants in Modern LLMs

The article presents a visual gallery of 45 LLM architectures with model cards, covering recent attention variants used in open-source models. It includes a poster version available on Redbubble, designed for print with clear visuals, though smaller text elements may be hard to read at reduced sizes.

Ahead of AI · Source
My Workflow for Understanding LLM Architectures

The author outlines a workflow for reverse-engineering LLM architectures by starting with technical papers and then using available config files and reference implementations from Hugging Face and the transformers library. This approach is effective for open-weight models, as working code provides concrete, verifiable details that papers often lack.

Ahead of AI · Source
Six Components of a Coding Agent

The article outlines six key components of a coding agent, emphasizing the role of agent harnesses in managing code context, tool use, memory, and iterative feedback. It distinguishes between LLMs, reasoning models, and agents, explaining that agents operate as control loops using models and tools, with coding harnesses being specialized for software engineering tasks.
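
The "control loop" description can be illustrated with a toy harness. Everything below is hypothetical: `model_step` stands in for a model call that returns the next action as a dictionary, and `tools` maps tool names to plain Python callables.

```python
def run_agent(model_step, tools, task, max_steps=5):
    # The history acts as the agent's memory: the task plus every tool
    # observation so far is passed back to the model at each step.
    history = [("task", task)]
    for _ in range(max_steps):
        action = model_step(history)
        if action["type"] == "finish":
            return action["result"], history
        # Run the requested tool and feed its output back as an observation.
        observation = tools[action["tool"]](action["input"])
        history.append((action["tool"], observation))
    # Step budget exhausted without a final answer.
    return None, history
```

A real coding harness adds context management, sandboxing, and error handling on top of this loop; the sketch only shows the model, tools, and feedback cycle.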

Ahead of AI · Source
Recent LLM Architectures Focus on Long-Context Efficiency

Newer large language model architectures emphasize long-context efficiency by reducing KV-cache size and attention costs. Key innovations include KV sharing in Gemma 4, layer-wise attention budgeting in Laguna XS.2, compressed convolutional attention in ZAYA1-8B, and mHC with compressed attention in DeepSeek V4.
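
For a sense of why KV-cache size matters at long context, the back-of-the-envelope sketch below uses the standard size estimate for a plain multi-head or grouped-query attention cache. The configuration numbers are made up for illustration and do not describe any of the models named above; modeling KV sharing as "fewer layers that store their own cache" is likewise a simplifying assumption.

```python
def kv_cache_bytes(n_layers_with_cache, n_kv_heads, head_dim,
                   seq_len, bytes_per_value=2):
    # Two tensors (keys and values) per caching layer, each holding
    # n_kv_heads * head_dim values per token.
    return (2 * n_layers_with_cache * n_kv_heads * head_dim
            * seq_len * bytes_per_value)


def to_mib(n_bytes):
    # Convert bytes to mebibytes.
    return n_bytes / (1024 ** 2)
```

With a hypothetical 32-layer model, 8 KV heads of dimension 128, a 4096-token context, and 16-bit values, `to_mib(kv_cache_bytes(32, 8, 128, 4096))` gives 512 MiB per sequence; if only half of the layers kept their own cache, the estimate would halve.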

Ahead of AI · Source
LLM Research Papers: The 2026 List (January to May)

A curated list of research papers from January to May 2026 focuses on reasoning models, reinforcement learning, agent systems, and long-context capabilities. It highlights hybrid architectures like Nemotron 3 and Qwen3.6, with emphasis on efficiency and practical applications in real-world agent systems.

Ahead of AI · Source
