Papers Explained 377: Fathom-R1. Fathom-R1–14B is a 14-billion-parameter… | by Ritvik Rastogi | May, 2025

Fathom-R1–14B is a 14-billion-parameter reasoning language model derived from DeepSeek-R1-Distill-Qwen-14B, fine-tuned for mathematical reasoning by Fractal.

The models and datasets are available on HuggingFace.

We start by curating a high-quality mathematical corpus from the following open-source datasets:

Open-R1: default subset
Numina: Olympiads & AOPS_forum (word problems, float-type answers)

After rigorous deduplication and decontamination, approximately 100K unique problems are consolidated, forming the initial corpus for all subsequent training runs.
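The article does not describe how deduplication and decontamination were carried out. As a rough illustration only, the sketch below assumes exact matching on a crudely normalized problem text; the `normalize` helper and the sample problems are hypothetical, and real pipelines often rely on fuzzy or n-gram matching instead.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into one space.

    This is a deliberately crude, illustrative choice: it also discards
    math operators, so it can merge problems that are genuinely different.
    """
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_and_decontaminate(problems, benchmark_problems):
    """Drop repeated problems and any problem that also occurs in an evaluation set."""
    contaminated = {normalize(p) for p in benchmark_problems}
    seen = set()
    kept = []
    for problem in problems:
        key = normalize(problem)
        if key in seen or key in contaminated:
            continue  # duplicate of an earlier problem, or leaked benchmark item
        seen.add(key)
        kept.append(problem)
    return kept


# Hypothetical toy data: one near-duplicate and one benchmark overlap.
corpus = ["What is 2 + 2?", "what is 2+2 ?", "Find x if 3x = 12.", "Compute 7 * 8."]
benchmark = ["Compute 7*8."]
clean = dedup_and_decontaminate(corpus, benchmark)
print(clean)
```

On this toy input, the second problem is dropped as a duplicate and the last one as a benchmark overlap.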

Training Recipe for Fathom-R1–14B-v0.6

SFT on difficult questions and their reasoning chains has proven effective for improving reasoning ability. Building on this, this training stage aims to improve the model’s performance on challenging mathematical problems using an iterative curriculum learning strategy, with a maximum sequence length of 16K. Curriculum learning (CL) is a well-established method for training LLMs, in which the model is progressively exposed to increasingly difficult tasks. This approach helps scaffold more complex reasoning, improving generalization and reducing overfitting. In this case, CL is applied iteratively: the easy-to-hard pass is repeated over multiple iterations.
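To make the idea of an iterative curriculum concrete, here is a minimal sketch of one possible schedule. The article does not specify the number of stages, the number of iterations, or how stages are formed, so `num_stages`, `num_iterations`, and the `difficulty` field are assumptions made purely for illustration.

```python
def curriculum_stages(examples, num_stages):
    """Sort examples by difficulty and split them into consecutive stages."""
    ordered = sorted(examples, key=lambda ex: ex["difficulty"])
    if not ordered:
        return []
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def iterative_curriculum(examples, num_stages, num_iterations):
    """Yield (iteration, stage, batch); every iteration replays the easy-to-hard pass."""
    for iteration in range(num_iterations):
        for stage, batch in enumerate(curriculum_stages(examples, num_stages)):
            yield iteration, stage, batch


# Hypothetical examples with made-up difficulty scores.
examples = [
    {"id": "a", "difficulty": 5},
    {"id": "b", "difficulty": 2},
    {"id": "c", "difficulty": 8},
    {"id": "d", "difficulty": 3},
]
schedule = [
    (iteration, stage, [ex["id"] for ex in batch])
    for iteration, stage, batch in iterative_curriculum(examples, num_stages=2, num_iterations=2)
]
print(schedule)
```

In an actual training run, each yielded batch would feed one SFT phase; here the schedule is only printed.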

For dataset preparation, each question’s difficulty is annotated using OpenAI’s o3-mini model. Only questions rated above average are retained, and these are further filtered to include those with solve rates between 0.2 and 0.7. This process results in the Iterative Curriculum Learning dataset, comprising 5K examples.
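The two filters can be sketched as follows. This is an illustration under stated assumptions: "above average" is read as above the mean difficulty score, the solve-rate bounds are treated as inclusive, and the `difficulty` and `solve_rate` fields (with the solve rate taken as the fraction of sampled attempts that were correct) are hypothetical names.

```python
from statistics import mean


def select_curriculum_examples(questions, low=0.2, high=0.7):
    """Keep questions that are harder than average and have a mid-range solve rate."""
    average_difficulty = mean(q["difficulty"] for q in questions)
    return [
        q
        for q in questions
        if q["difficulty"] > average_difficulty and low <= q["solve_rate"] <= high
    ]


# Hypothetical annotated questions.
questions = [
    {"id": "a", "difficulty": 3, "solve_rate": 0.5},  # too easy
    {"id": "b", "difficulty": 7, "solve_rate": 0.5},  # kept
    {"id": "c", "difficulty": 8, "solve_rate": 0.9},  # solved too often
    {"id": "d", "difficulty": 9, "solve_rate": 0.1},  # solved too rarely
    {"id": "e", "difficulty": 7, "solve_rate": 0.3},  # kept
]
selected = select_curriculum_examples(questions)
print([q["id"] for q in selected])  # ['b', 'e']
```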

Training Recipe for Fathom-R1–14B-v0.4-RS

The strategy for creating this checkpoint involves a two-stage pipeline:

First Stage (Leveraging RL for efficient test-time thinking):

Curate a seed dataset that guarantees the model a minimal reward while leaving room for improvement, comprising questions with solve rates within a specific range and forming a 7.7K-question RL Compression dataset.
Train the base model, DeepSeek-R1-Distill-Qwen-14B, using the GRPO algorithm with a 6K-token sequence length limit.
The model learns to generate concise responses, showing improved performance at lower token limits.
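GRPO scores each sampled response relative to the other responses drawn for the same question. The sketch below shows that group-relative normalization together with a toy reward that only credits correct answers produced within the token budget. The reward function, the `limit=6000` placeholder, and the use of the population standard deviation are assumptions for illustration; the article does not give the actual reward design.

```python
from statistics import mean, pstdev


def budgeted_reward(correct: bool, num_tokens: int, limit: int = 6000) -> float:
    """Hypothetical reward: 1.0 only if the answer is correct and fits the token limit."""
    return 1.0 if correct and num_tokens <= limit else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical samples for one question: (is_correct, response_length_in_tokens).
samples = [(True, 3000), (True, 7500), (False, 2000), (True, 5000)]
rewards = [budgeted_reward(correct, tokens) for correct, tokens in samples]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Under this scheme, a correct but over-long response is pushed down just like a wrong one, which is one way a length limit can encourage concise reasoning. Note also that a group whose rewards are all identical yields zero advantages and therefore no learning signal, which fits the choice of seed questions with intermediate solve rates.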

Second Stage (Leveraging SFT to improve reasoning efficiently at higher sequence length):

Build upon the RL checkpoint and perform SFT with a 16K context window to enhance detailed reasoning for complex problems.
Curate a dataset of hard problems with lower solve rates, forming a 9.5K-example SFT Shortest Chains dataset.
Supervised fine-tuning on this dataset stabilizes the model’s reasoning at up to 16K sequence length.
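The article does not explain how the SFT Shortest Chains dataset is built. One plausible reading of the name, shown here purely as an assumption, is that for each hard problem the shortest correct reasoning chain among several sampled responses is kept. The `max_solve_rate` threshold and all field names are hypothetical.

```python
def shortest_correct_chains(samples_by_problem, max_solve_rate=0.5):
    """For each sufficiently hard problem, keep its shortest correct response."""
    dataset = []
    for problem, samples in samples_by_problem.items():
        correct = [s for s in samples if s["correct"]]
        if not correct:
            continue  # nothing usable to imitate
        solve_rate = len(correct) / len(samples)
        if solve_rate > max_solve_rate:
            continue  # too easy for a hard-problem dataset
        best = min(correct, key=lambda s: s["num_tokens"])
        dataset.append({"problem": problem, "response": best["text"]})
    return dataset


# Hypothetical sampled responses.
samples_by_problem = {
    "p1": [
        {"correct": True, "num_tokens": 900, "text": "long correct chain"},
        {"correct": False, "num_tokens": 300, "text": "short wrong chain"},
        {"correct": True, "num_tokens": 500, "text": "short correct chain"},
        {"correct": False, "num_tokens": 700, "text": "wrong chain"},
    ],
    "p2": [{"correct": True, "num_tokens": 100, "text": "easy"}],
    "p3": [{"correct": False, "num_tokens": 400, "text": "unsolved"}],
}
sft_data = shortest_correct_chains(samples_by_problem)
print(sft_data)
```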

The resulting model, Fathom-R1–14B-v0.4-RS, is optimized for concise yet accurate mathematical reasoning.

Training Recipe for Fathom-R1–14B-v0.4

Given the performance improvement observed during the second fine-tuning stage of developing Fathom-R1–14B-v0.4-RS, and in an attempt to further reduce the cost, an experiment was conducted that eliminated RL and performed the second-stage SFT directly on the DeepSeek-R1-Distill-Qwen-14B base model.

Model Merging

Since the v0.6 and v0.4 models were developed following different training methodologies, linear merging is performed to combine their strengths and obtain the final two checkpoints.

Fathom-R1–14B: Obtained by merging Fathom-R1–14B-V0.6 (Iterative Curriculum SFT) and Fathom-R1–14B-V0.4 (SFT-Shortest-Chains)
Fathom-R1–14B-RS: Obtained by merging Fathom-R1–14B-V0.6 (Iterative Curriculum SFT) and Fathom-R1–14B-V0.4-RS (RL-compression + SFT-Shortest-Chains)
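Linear merging amounts to a weighted average of the two checkpoints, parameter by parameter. The sketch below uses plain Python lists in place of real tensors so that it runs without any deep learning library; the merge weights are not stated in the article, so the equal weighting (`weight_a=0.5`) and the toy parameter names are assumptions.

```python
def linear_merge(state_a, state_b, weight_a=0.5):
    """Average two checkpoints that share the same parameter names and shapes."""
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    weight_b = 1.0 - weight_a
    merged = {}
    for name, values_a in state_a.items():
        values_b = state_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [weight_a * a + weight_b * b for a, b in zip(values_a, values_b)]
    return merged


# Toy stand-ins for the two fine-tuned checkpoints (hypothetical values).
checkpoint_v06 = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
checkpoint_v04 = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = linear_merge(checkpoint_v06, checkpoint_v04)
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

Because both checkpoints descend from the same base model, their parameters line up one to one, which is what makes this kind of element-wise averaging possible.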

Fathom‑R1–14B demonstrates highly competitive performance across all datasets, improving over the original R1-distilled models while closely matching or surpassing other strong baselines in several settings.
On both AIME 25 and HMMT 25, the model achieves the highest pass@1 and cons@64 scores among all open-source models (including the larger R1-Distilled-32B model), with R1–670B being the only exception.
Fathom-R1–14B outperforms the first two generations of OpenAI’s mini reasoning models, including o1-mini and o3-mini-low, and its performance closely matches that of the newly released o4-mini-low (self-consistency decoding).
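The article does not spell out how the two metrics are computed. For readers unfamiliar with them, the sketch below follows the common convention: pass@1 is estimated as the fraction of sampled answers that are correct, and cons@k checks whether the majority-vote answer over k samples is correct. The sample answers are made up, and eight samples stand in for the 64 used in the reported scores.

```python
from collections import Counter


def pass_at_1(answers, reference):
    """Estimate pass@1 as the share of sampled answers that match the reference."""
    return sum(answer == reference for answer in answers) / len(answers)


def cons_at_k(answers, reference):
    """Return 1.0 if the most frequent answer among the samples matches the reference."""
    majority_answer, _ = Counter(answers).most_common(1)[0]
    return float(majority_answer == reference)


# Hypothetical final answers extracted from 8 sampled responses to one problem.
answers = ["42", "41", "42", "42", "17", "42", "41", "42"]
print(pass_at_1(answers, "42"))  # 0.625
print(cons_at_k(answers, "42"))  # 1.0
```

Benchmark-level scores are then averages of these per-problem values. The example also shows why cons@k can exceed pass@1: the majority vote is right even though only five of the eight samples are.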

Fathom-R1: $499 Training Recipe for Unlocking Math Reasoning at o4-mini level with just 14B parameters under 16K context
