Updated January 13, 2025

Overview of the $450 Open-Source Reasoning Model: Sky-T1

Model Name: Sky-T1-32B-Preview
Developed by: NovaSky, UC Berkeley
Cost to Train: Less than $450
Performance: Matches OpenAI’s o1-preview on various reasoning and coding benchmarks.
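The headline cost can be sanity-checked against the training run listed under Technical Specifications (19 hours on 8 H100 GPUs). The sketch below assumes the $450 figure covers only GPU rental for that single fine-tuning run, not data generation or evaluation. Because the stated cost is "less than $450", the implied price per GPU-hour is an upper bound.

```python
# Back-of-envelope check of the reported training cost.
# Assumption: the $450 figure covers only GPU rental for one fine-tuning run
# (8 GPUs x 19 hours), not data generation, evaluation, or failed runs.

def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours consumed by a run."""
    return num_gpus * wall_clock_hours

def implied_hourly_rate(total_cost_usd: float, num_gpus: int, wall_clock_hours: float) -> float:
    """Price per GPU-hour implied by a total cost."""
    return total_cost_usd / gpu_hours(num_gpus, wall_clock_hours)

if __name__ == "__main__":
    hours = gpu_hours(8, 19)
    rate = implied_hourly_rate(450, 8, 19)
    print(f"{hours:.0f} GPU-hours, about ${rate:.2f} per H100-hour")
```

The run uses 152 GPU-hours, so a budget under $450 corresponds to roughly $2.96 or less per H100-hour.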

Key Features and Capabilities

Open-Source Commitment

All code, model weights, and training data are fully open-source, allowing the academic and open-source communities to replicate and build upon the work.
The model is designed to perform well in both mathematical reasoning and coding tasks.

Training Data

The model was trained using a dataset of 17,000 examples, which includes 5,000 coding problems and 10,000 math problems, along with 1,000 science and puzzle problems.
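The training data was generated with QwQ-32B-Preview and filtered with rejection sampling: a generated solution was kept only if its final answer matched the ground truth (for math) or it passed the provided unit tests (for code). The retained reasoning traces were then reformatted with GPT-4o-mini to make them more consistent.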
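The rejection-sampling filter for math-style problems can be sketched as follows. This is a minimal illustration, not NovaSky's pipeline: the `Candidate` record, the `extract_final_answer` and `normalize` helpers, and the convention that answers appear in `\boxed{...}` are all hypothetical assumptions. A generated solution survives only if its final answer matches the reference answer.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    problem_id: str
    solution: str   # full generated reasoning trace
    reference: str  # ground-truth final answer

# Hypothetical convention: the final answer is written as \boxed{...}.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_final_answer(solution: str) -> Optional[str]:
    """Return the last \\boxed{...} answer in a solution, or None if absent."""
    matches = BOXED.findall(solution)
    return matches[-1].strip() if matches else None

def normalize(answer: str) -> str:
    """Very light normalization: drop spaces and a trailing period."""
    return answer.replace(" ", "").rstrip(".")

def rejection_sample(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates whose final answer matches the reference."""
    kept = []
    for cand in candidates:
        answer = extract_final_answer(cand.solution)
        if answer is not None and normalize(answer) == normalize(cand.reference):
            kept.append(cand)
    return kept

if __name__ == "__main__":
    pool = [
        Candidate("p1", r"... so the sum is \boxed{42}", "42"),
        Candidate("p1", r"... therefore \boxed{41}", "42"),
        Candidate("p2", "no boxed answer here", "7"),
    ]
    print([c.problem_id for c in rejection_sample(pool)])  # ['p1']
```

Exact string matching is deliberately strict; a real filter would likely need math-aware equivalence checks (for example `1/2` versus `0.5`), and coding problems would instead be checked by executing test cases.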

Technical Specifications

The model was fine-tuned from Qwen2.5-32B-Instruct, an open-weight instruction-tuned model that does not natively produce long chain-of-thought reasoning.
Training took 19 hours on 8 H100 GPUs, using DeepSpeed ZeRO-3 with offloading.
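ZeRO stage 3 partitions parameters, gradients, and optimizer states across GPUs, and the offload options move some of that state to CPU memory, which helps fit full fine-tuning of a 32B model on a single 8-GPU node. The sketch below builds a configuration as a Python dict mirroring DeepSpeed's JSON config format. The helper `zero3_offload_config`, the use of bf16, and the batch sizes in the example are illustrative assumptions, not the reported Sky-T1 hyperparameters.

```python
import json

def zero3_offload_config(global_batch: int, micro_batch_per_gpu: int, num_gpus: int) -> dict:
    """Build a DeepSpeed-style config dict for ZeRO stage 3 with CPU offload.

    DeepSpeed requires train_batch_size == micro batch * grad accumulation * GPUs,
    so gradient accumulation is derived from the other three values.
    """
    per_step = micro_batch_per_gpu * num_gpus
    if global_batch % per_step != 0:
        raise ValueError("global_batch must be divisible by micro_batch_per_gpu * num_gpus")
    return {
        "train_batch_size": global_batch,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": global_batch // per_step,
        "bf16": {"enabled": True},  # assumption: bf16 mixed precision on H100s
        "zero_optimization": {
            "stage": 3,  # partition parameters, gradients, and optimizer states
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
    }

if __name__ == "__main__":
    # 8 GPUs as reported; the batch sizes here are illustrative only.
    cfg = zero3_offload_config(global_batch=96, micro_batch_per_gpu=1, num_gpus=8)
    print(json.dumps(cfg, indent=2))
```

Deriving gradient accumulation inside the helper keeps the three batch-size fields consistent, which avoids a common source of DeepSpeed start-up errors.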

Benchmark Performance

Sky-T1-32B-Preview achieved competitive scores on various benchmarks:
- Math500: 82.4%
- AIME2024: 43.3%
- LiveCodeBench (Easy): 86.3%
- LiveCodeBench (Medium): 56.8%
- LiveCodeBench (Hard): 17.9%
- GPQA-Diamond: 56.8%

Reasoning Techniques

The model produces a long chain of thought before giving its final answer, a behavior learned by fine-tuning on long reasoning traces, similar to other reasoning models such as OpenAI’s o1 and Gemini 2.0 Flash Thinking.
The NovaSky team reports that model size and data mixture significantly affect performance: the 32B model gave clearly better results than smaller models, and the balance of math and coding data influenced how well the model performed in each domain.
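The point about data mixture can be made concrete with a minimal sketch that assembles a training set from per-domain pools at fixed counts and shuffles them together. The `build_mixture` helper, the domain names, and the toy pools are hypothetical; the target counts come from the Training Data section, and NovaSky's actual curation code is not shown in this overview.

```python
import random

def build_mixture(pools: dict[str, list], targets: dict[str, int], seed: int = 0) -> list:
    """Sample a fixed number of examples per domain, then shuffle them together.

    Returns (domain, example) pairs so the mixture stays inspectable.
    """
    rng = random.Random(seed)  # seeded for a reproducible mixture
    mixed = []
    for domain, count in targets.items():
        pool = pools[domain]
        if count > len(pool):
            raise ValueError(f"{domain}: requested {count}, only {len(pool)} available")
        mixed.extend((domain, ex) for ex in rng.sample(pool, count))
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    # Toy pools standing in for filtered, rejection-sampled examples.
    pools = {d: [f"{d}-{i}" for i in range(20_000)] for d in ("math", "coding", "science_puzzle")}
    targets = {"math": 10_000, "coding": 5_000, "science_puzzle": 1_000}
    mixed = build_mixture(pools, targets)
    counts = {d: sum(1 for dom, _ in mixed if dom == d) for d in targets}
    print(len(mixed), counts)
```

With the three category counts listed under Training Data, the sketch yields 16,000 examples; this overview does not break down the remainder of the stated 17,000 total.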

Future Directions

The NovaSky team plans to keep developing more efficient models that retain strong reasoning performance, and to explore techniques that further improve efficiency and accuracy at test time.

Conclusion

This research highlights the emergence of affordable, open-source AI models that can compete with established models like OpenAI’s o1, showcasing the potential for broader access to advanced AI technologies.