Fathom-R1–14B is a 14-billion-parameter reasoning language model derived from DeepSeek-R1-Distill-Qwen-14B, fine-tuned for mathematical reasoning by Fractal.
The models and datasets are available on HuggingFace.
We begin by curating a high-quality mathematical corpus from the following open-source datasets:
- Open-R1 — default subset
- Numina — Olympiads & AOPS_forum (word problems, float-type answers)
After rigorous deduplication and decontamination, roughly 100K unique problems are consolidated, forming the initial corpus for all subsequent training stages.
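As a rough illustration of this curation step, the sketch below loads the two sources with the `datasets` library and applies exact-match deduplication on normalized problem text. The HuggingFace repository IDs, subset filters, and column names are assumptions, and decontamination against evaluation benchmarks is omitted for brevity.

```python
from datasets import load_dataset

# Illustrative corpus curation; repo IDs and column names are assumed, not confirmed.
open_r1 = load_dataset("open-r1/OpenR1-Math-220k", "default", split="train")
numina = load_dataset("AI-MO/NuminaMath-CoT", split="train")
numina = numina.filter(lambda ex: ex["source"] in {"olympiads", "aops_forum"})

def normalize(text: str) -> str:
    # Crude normalization: lowercase and collapse whitespace before comparing.
    return " ".join(text.lower().split())

seen, corpus = set(), []
for ds in (open_r1, numina):
    for ex in ds:
        key = normalize(ex["problem"])
        if key not in seen:  # exact-match deduplication on normalized problem text
            seen.add(key)
            corpus.append({"problem": ex["problem"], "answer": ex.get("answer")})
```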
Training Recipe for Fathom-R1–14B-v0.6
SFT on difficult questions and their reasoning chains has proven effective for improving reasoning ability. Building on this, this training stage aims to improve the model's performance on challenging mathematical problems using an iterative curriculum learning strategy, with a maximum sequence length of 16k. Curriculum learning (CL) is a well-established method for training LLMs, where the model is progressively exposed to increasingly difficult tasks. This approach helps scaffold more complex reasoning, improving generalization and reducing overfitting. In this case, CL is applied iteratively, meaning multiple rounds of CL are performed.
For dataset preparation, each question's difficulty is annotated using OpenAI's o3-mini model. Only questions rated above average difficulty are retained and further filtered to include those with solve rates between 0.2 and 0.7. This process results in the Iterative Curriculum Learning dataset, comprising 5K examples.
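A minimal sketch of that filtering logic is shown below, assuming each problem already carries an o3-mini difficulty rating and an empirically estimated solve rate; the field names are illustrative.

```python
def build_curriculum_subset(problems, min_rate=0.2, max_rate=0.7):
    """Keep above-average-difficulty problems whose solve rate lies in [min_rate, max_rate].

    `problems` is a list of dicts with assumed fields `difficulty` (o3-mini rating)
    and `solve_rate` (fraction of sampled attempts that were correct).
    """
    avg_difficulty = sum(p["difficulty"] for p in problems) / len(problems)
    return [
        p for p in problems
        if p["difficulty"] > avg_difficulty and min_rate <= p["solve_rate"] <= max_rate
    ]
```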
Training Recipe for Fathom-R1–14B-v0.4-RS
The strategy for creating this checkpoint involves a two-stage pipeline:
First Stage (Leveraging RL for efficient test-time thinking):
- Curate a seed dataset that ensures some minimal reward while leaving room for improvement, comprising questions with solve rates within a specific range, forming the 7.7K-question RL Compression dataset.
- Train the base model, DeepSeek-R1-Distill-Qwen-14B, using the GRPO algorithm with a 6k-token sequence length limit (a training sketch follows this list).
- The model learns to generate concise responses, showing improved performance at lower token limits.
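The sketch below shows what this RL stage could look like with TRL's GRPO trainer. The reward design (binary answer correctness), the group size, and the dataset handle are assumptions rather than the authors' exact setup; `rl_compression_dataset` is a placeholder for the 7.7K-question seed set.

```python
import re
from trl import GRPOConfig, GRPOTrainer

def extract_boxed(text: str):
    # Pull the last \boxed{...} answer out of a completion, if present.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def correctness_reward(completions, answer, **kwargs):
    # Binary reward: 1.0 if the extracted final answer matches the reference (assumed reward shape).
    return [1.0 if extract_boxed(c) == str(a) else 0.0 for c, a in zip(completions, answer)]

config = GRPOConfig(
    output_dir="fathom-rl-compression",
    max_completion_length=6144,   # enforce the ~6k-token generation budget
    num_generations=8,            # rollouts per prompt for group-relative advantages (assumed)
    per_device_train_batch_size=8,  # must be divisible by num_generations
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    reward_funcs=correctness_reward,
    args=config,
    train_dataset=rl_compression_dataset,  # placeholder; assumed "prompt" and "answer" columns
)
trainer.train()
```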
Second Stage (Leveraging SFT to improve reasoning efficiently at higher sequence lengths):
- Build upon the RL checkpoint and perform SFT with a 16K context window to strengthen detailed reasoning for complex problems.
- Curate a dataset of hard problems with lower solve rates, forming the 9.5K-example SFT Shortest Chains dataset.
- Supervised fine-tuning on this dataset stabilizes the model's reasoning at up to 16K sequence length (a sketch of this stage follows the list).
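A corresponding sketch of this SFT stage with TRL is given below; the checkpoint path, dataset handle, and hyperparameters are assumptions and do not reproduce the authors' exact recipe.

```python
from trl import SFTConfig, SFTTrainer

# Sketch of the second-stage SFT at a 16K context window, starting from the
# stage-one RL checkpoint (path is a placeholder).
config = SFTConfig(
    output_dir="fathom-sft-shortest-chains",
    max_seq_length=16384,            # 16K context window for long reasoning chains
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # illustrative values, not the authors' settings
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
)

trainer = SFTTrainer(
    model="path/to/stage1-rl-checkpoint",       # placeholder for the GRPO-trained model
    args=config,
    train_dataset=sft_shortest_chains_dataset,  # placeholder for the 9.5K-example set (assumed "text" column)
)
trainer.train()
```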
The resulting model, Fathom-R1–14B-v0.4-RS, is optimized for concise yet accurate mathematical reasoning.
Training Recipe for Fathom-R1–14B-v0.4
Given the performance improvement observed during the second fine-tuning stage of developing Fathom-R1–14B-v0.4-RS, and in an attempt to further reduce cost, an experiment was conducted by eliminating RL and directly performing the second-stage SFT on the DeepSeek-R1-Distill-Qwen-14B base model.
Model Merging
Since the v0.6 and v0.4 models were developed following different training methodologies, linear merging is performed to combine their strengths and obtain the final two checkpoints (a weight-averaging sketch follows this list):
- Fathom-R1–14B: obtained by merging Fathom-R1–14B-V0.6 (Iterative Curriculum SFT) and Fathom-R1–14B-V0.4 (SFT-Shortest-Chains)
- Fathom-R1–14B-RS: obtained by merging Fathom-R1–14B-V0.6 (Iterative Curriculum SFT) and Fathom-R1–14B-V0.4 (RL-compression + SFT-Shortest-Chains)
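A minimal weight-averaging sketch of linear merging is shown below; the checkpoint paths and the 50/50 interpolation weights are assumptions (the actual merge ratios are not stated here), and in practice a dedicated merging tool could be used instead.

```python
import torch
from transformers import AutoModelForCausalLM

# Linear (weight-space) merge of the two intermediate checkpoints.
# Checkpoint paths and the 0.5/0.5 weighting are placeholders/assumptions.
model_a = AutoModelForCausalLM.from_pretrained("path/to/Fathom-R1-14B-V0.6", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("path/to/Fathom-R1-14B-V0.4", torch_dtype=torch.bfloat16)

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged = {name: 0.5 * state_a[name] + 0.5 * state_b[name] for name in state_a}

model_a.load_state_dict(merged)
model_a.save_pretrained("Fathom-R1-14B-merged")  # write out the merged checkpoint
```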
Results
- Fathom-R1–14B demonstrates highly competitive performance across all datasets, improving over the original R1-distilled models while closely matching or surpassing other strong baselines in several settings.
- On both AIME 25 and HMMT 25, our model shows the best pass@1 as well as cons@64 scores among all the open-source models (including the bigger R1-Distilled-32B model), with R1-670B being the only exception.
- Fathom-R1–14B is superior to the first two generations of OpenAI's mini reasoning models, including o1-mini and o3-mini-low, and its performance closely matches that of the newly released o4-mini-low (self-consistency decoding; a sketch of the cons@k metric follows this list).
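For reference, pass@1 scores a single sampled solution, while cons@64 (self-consistency) takes the majority final answer over 64 samples. A minimal sketch of the consensus metric, assuming final answers have already been extracted and normalized:

```python
from collections import Counter

def cons_at_k(sampled_answers, reference):
    """cons@k: majority vote over k extracted final answers, scored against the reference."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return float(majority == reference)

# Example with illustrative values: 64 sampled answers for one AIME-style problem.
print(cons_at_k(["204"] * 40 + ["210"] * 24, "204"))  # -> 1.0
```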