Fathom-R1–14B is a 14-billion-parameter reasoning language model derived from DeepSeek-R1-Distill-Qwen-14B, fine-tuned for mathematical reasoning by Fractal.
The models and datasets are available on HuggingFace.
We begin by curating a high-quality mathematical corpus from the following open-source datasets:
- Open-R1 — default subset
- Numina — Olympiads & AOPS_forum (word problems, float-type answers)
After rigorous deduplication and decontamination, roughly 100K unique problems are consolidated, forming the initial corpus for all subsequent training stages.
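As a rough illustration of this curation step, the sketch below loads the two sources with the `datasets` library and applies exact-match deduplication on normalized problem text. The HuggingFace repository IDs, subset filters, and column names are assumptions, and decontamination against evaluation benchmarks is omitted for brevity.

```python
from datasets import load_dataset

# Illustrative corpus curation; repo IDs and column names are assumed, not confirmed.
open_r1 = load_dataset("open-r1/OpenR1-Math-220k", "default", split="train")
numina = load_dataset("AI-MO/NuminaMath-CoT", split="train")
numina = numina.filter(lambda ex: ex["source"] in {"olympiads", "aops_forum"})

def normalize(text: str) -> str:
    # Crude normalization: lowercase and collapse whitespace before comparing.
    return " ".join(text.lower().split())

seen, corpus = set(), []
for ds in (open_r1, numina):
    for ex in ds:
        key = normalize(ex["problem"])
        if key not in seen:  # exact-match deduplication on normalized problem text
            seen.add(key)
            corpus.append({"problem": ex["problem"], "answer": ex.get("answer")})
```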
Training Recipe for Fathom-R1–14B-v0.6
SFT on difficult questions and their reasoning chains has proven effective for improving reasoning ability. Building on this, this training stage aims to improve the model's performance on challenging mathematical problems using an iterative curriculum learning strategy, with a maximum sequence length of 16k. Curriculum learning (CL) is a well-established method for training LLMs, where the model is progressively exposed to increasingly difficult tasks. This approach helps scaffold more complex reasoning, improving generalization and reducing overfitting. In this case, CL is applied iteratively, meaning multiple rounds of CL are performed.
For dataset preparation, each question's difficulty is annotated using OpenAI's o3-mini model. Only questions rated above average difficulty are retained and further filtered to include those with solve rates between 0.2 and 0.7. This process results in the Iterative Curriculum Learning dataset, comprising 5K examples.
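A minimal sketch of that filtering logic is shown below, assuming each problem already carries an o3-mini difficulty rating and an empirically estimated solve rate; the field names are illustrative.

```python
def build_curriculum_subset(problems, min_rate=0.2, max_rate=0.7):
    """Keep above-average-difficulty problems whose solve rate lies in [min_rate, max_rate].

    `problems` is a list of dicts with assumed fields `difficulty` (o3-mini rating)
    and `solve_rate` (fraction of sampled attempts that were correct).
    """
    avg_difficulty = sum(p["difficulty"] for p in problems) / len(problems)
    return [
        p for p in problems
        if p["difficulty"] > avg_difficulty and min_rate <= p["solve_rate"] <= max_rate
    ]
```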
Training Recipe for Fathom-R1–14B-v0.4-RS
The strategy for creating this checkpoint involves a two-stage pipeline:
First Stage (Leveraging RL for efficient test-time thinking):
- Curate a seed dataset that ensures some minimal reward while leaving room for improvement, comprising questions with solve rates within a specific range, forming the 7.7K-question RL Compression dataset.
- Train the base model, DeepSeek-R1-Distill-Qwen-14B, using the GRPO algorithm with a 6k-token sequence length limit (a training sketch follows this list).
- The model learns to generate concise responses, showing improved performance at lower token limits.
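The sketch below shows what this RL stage could look like with TRL's GRPO trainer. The reward design (binary answer correctness), the group size, and the dataset handle are assumptions rather than the authors' exact setup; `rl_compression_dataset` is a placeholder for the 7.7K-question seed set.

```python
import re
from trl import GRPOConfig, GRPOTrainer

def extract_boxed(text: str):
    # Pull the last \boxed{...} answer out of a completion, if present.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def correctness_reward(completions, answer, **kwargs):
    # Binary reward: 1.0 if the extracted final answer matches the reference (assumed reward shape).
    return [1.0 if extract_boxed(c) == str(a) else 0.0 for c, a in zip(completions, answer)]

config = GRPOConfig(
    output_dir="fathom-rl-compression",
    max_completion_length=6144,   # enforce the ~6k-token generation budget
    num_generations=8,            # rollouts per prompt for group-relative advantages (assumed)
    per_device_train_batch_size=8,  # must be divisible by num_generations
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    reward_funcs=correctness_reward,
    args=config,
    train_dataset=rl_compression_dataset,  # placeholder; assumed "prompt" and "answer" columns
)
trainer.train()
```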
Second Stage (Leveraging SFT to improve reasoning efficiently at higher sequence lengths):
- Build upon the RL checkpoint and perform SFT with a 16K context window to strengthen detailed reasoning for complex problems.
- Curate a dataset of hard problems with lower solve rates, forming the 9.5K-example SFT Shortest Chains dataset.
- Supervised fine-tuning on this dataset stabilizes the model's reasoning at up to 16K sequence length (a sketch of this stage follows the list).
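A corresponding sketch of this SFT stage with TRL is given below; the checkpoint path, dataset handle, and hyperparameters are assumptions and do not reproduce the authors' exact recipe.

```python
from trl import SFTConfig, SFTTrainer

# Sketch of the second-stage SFT at a 16K context window, starting from the
# stage-one RL checkpoint (path is a placeholder).
config = SFTConfig(
    output_dir="fathom-sft-shortest-chains",
    max_seq_length=16384,            # 16K context window for long reasoning chains
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # illustrative values, not the authors' settings
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
)

trainer = SFTTrainer(
    model="path/to/stage1-rl-checkpoint",       # placeholder for the GRPO-trained model
    args=config,
    train_dataset=sft_shortest_chains_dataset,  # placeholder for the 9.5K-example set (assumed "text" column)
)
trainer.train()
```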
The resulting model, Fathom-R1–14B-v0.4-RS, is optimized for concise yet accurate mathematical reasoning.
Training Recipe for Fathom-R1–14B-v0.4
Given the performance improvement observed during the second fine-tuning stage of developing Fathom-R1–14B-v0.4-RS, and in an attempt to further reduce cost, an experiment was conducted by eliminating RL and directly performing the second-stage SFT on the DeepSeek-R1-Distill-Qwen-14B base model.
Model Merging
Since the v0.6 and v0.4 models were developed following different training methodologies, linear merging is performed to combine their strengths and obtain the final two checkpoints (a weight-averaging sketch follows this list):
- Fathom-R1–14B: obtained by merging Fathom-R1–14B-V0.6 (Iterative Curriculum SFT) and Fathom-R1–14B-V0.4 (SFT-Shortest-Chains)
- Fathom-R1–14B-RS: obtained by merging Fathom-R1–14B-V0.6 (Iterative Curriculum SFT) and Fathom-R1–14B-V0.4 (RL-compression + SFT-Shortest-Chains)
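A minimal weight-averaging sketch of linear merging is shown below; the checkpoint paths and the 50/50 interpolation weights are assumptions (the actual merge ratios are not stated here), and in practice a dedicated merging tool could be used instead.

```python
import torch
from transformers import AutoModelForCausalLM

# Linear (weight-space) merge of the two intermediate checkpoints.
# Checkpoint paths and the 0.5/0.5 weighting are placeholders/assumptions.
model_a = AutoModelForCausalLM.from_pretrained("path/to/Fathom-R1-14B-V0.6", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("path/to/Fathom-R1-14B-V0.4", torch_dtype=torch.bfloat16)

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged = {name: 0.5 * state_a[name] + 0.5 * state_b[name] for name in state_a}

model_a.load_state_dict(merged)
model_a.save_pretrained("Fathom-R1-14B-merged")  # write out the merged checkpoint
```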
Results
- Fathom-R1–14B demonstrates highly competitive performance across all datasets, improving over the original R1-distilled models while closely matching or surpassing other strong baselines in several settings.
- On both AIME 25 and HMMT 25, our model shows the best pass@1 as well as cons@64 scores among all the open-source models (including the bigger R1-Distilled-32B model), with R1-670B being the only exception.
- Fathom-R1–14B is superior to the first two generations of OpenAI's mini reasoning models, including o1-mini and o3-mini-low, and its performance closely matches that of the newly released o4-mini-low (self-consistency decoding; a sketch of the cons@k metric follows this list).
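For reference, pass@1 scores a single sampled solution, while cons@64 (self-consistency) takes the majority final answer over 64 samples. A minimal sketch of the consensus metric, assuming final answers have already been extracted and normalized:

```python
from collections import Counter

def cons_at_k(sampled_answers, reference):
    """cons@k: majority vote over k extracted final answers, scored against the reference."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return float(majority == reference)

# Example with illustrative values: 64 sampled answers for one AIME-style problem.
print(cons_at_k(["204"] * 40 + ["210"] * 24, "204"))  # -> 1.0
```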