Overview of the $450 Open-Source Reasoning Model: Sky-T1
Model Name: Sky-T1-32B-Preview
Developed by: NovaSky, UC Berkeley
Cost to Train: Less than $450
Performance: Performs on par with OpenAI’s o1-preview on popular reasoning and coding benchmarks.
Key Features and Capabilities
Open-Source Commitment
- All code, model weights, and training data are fully open-source, allowing the academic and open-source communities to replicate and build upon the work.
- The model is designed to perform well in both mathematical reasoning and coding tasks.
Training Data
- The model was trained on a dataset of about 17,000 examples: roughly 5,000 coding problems, 10,000 math problems, and 1,000 science and puzzle problems.
- The training data was curated using a combination of existing models and a rejection sampling procedure to ensure high quality.
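The rejection sampling step above can be sketched as follows. This is a minimal illustration, not the NovaSky pipeline: it assumes each problem has a reference answer, and `generate`, `extract_answer`, `toy_generate`, and `toy_extract` are hypothetical helpers introduced here. A generated solution is kept only if its final answer matches the reference.

```python
from typing import Callable, Dict, List


def rejection_sample(
    problems: List[Dict[str, str]],
    generate: Callable[[str], List[str]],
    extract_answer: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Keep at most one generated solution per problem whose answer matches the reference."""
    kept = []
    for problem in problems:
        for candidate in generate(problem["question"]):
            if extract_answer(candidate) == problem["answer"]:
                kept.append({"question": problem["question"], "response": candidate})
                break  # one verified solution per problem is enough for this sketch
    return kept


# Toy stand-ins (hypothetical) so the sketch runs without a model.
def toy_generate(question: str) -> List[str]:
    canned = {
        "2 + 2": ["I think 5. Answer: 5", "2 + 2 = 4. Answer: 4"],
        "3 * 3": ["Maybe 6. Answer: 6"],
    }
    return canned.get(question, [])


def toy_extract(response: str) -> str:
    # Take whatever follows the last "Answer:" marker.
    return response.rsplit("Answer:", 1)[-1].strip()


problems = [
    {"question": "2 + 2", "answer": "4"},
    {"question": "3 * 3", "answer": "9"},
]
accepted = rejection_sample(problems, toy_generate, toy_extract)
print(accepted)
```

In this toy run the first problem contributes its second candidate, and the second problem is dropped because no candidate reaches the reference answer.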
Technical Specifications
- The model was fine-tuned from Qwen2.5-32B-Instruct, an instruction-tuned model without built-in reasoning capabilities.
- Training took 19 hours on 8 H100 GPUs, using DeepSpeed ZeRO-3 offload.
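The training figures above imply a simple cost bound, worked through below. The dictionary that follows is only an assumed illustration of what a DeepSpeed ZeRO-3 configuration with CPU offload can look like. The actual Sky-T1 training configuration is not given here, and the `bf16` setting in particular is a guess.

```python
import json

# Figures stated in the text.
HOURS = 19
NUM_GPUS = 8
BUDGET_USD = 450

gpu_hours = HOURS * NUM_GPUS        # 19 * 8 = 152
max_rate = BUDGET_USD / gpu_hours   # upper bound on USD per GPU-hour

print(f"{gpu_hours} GPU-hours, about ${max_rate:.2f} per GPU-hour at most")

# Assumed illustration of a ZeRO-3 config with CPU offload; this is not
# the configuration used to train Sky-T1.
zero3_offload_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
}
print(json.dumps(zero3_offload_config, indent=2))
```

Stage 3 partitions parameters, gradients, and optimizer state across the GPUs, and the offload entries move optimizer state and parameters to CPU memory, which is what lets a 32B model fit on a single 8-GPU node at the cost of some throughput.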
Benchmark Performance
- Sky-T1-32B-Preview achieved competitive scores on the following benchmarks:
  - Math500: 82.4%
  - AIME2024: 43.3%
  - LiveCodeBench (Easy): 86.3%
  - LiveCodeBench (Medium): 56.8%
  - LiveCodeBench (Hard): 17.9%
  - GPQA-Diamond: 56.8%
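To make the percentages above more concrete, the sketch below converts two of them into problem counts. It assumes each score is a plain fraction of problems solved in a single pass, and it uses the benchmark sizes of 500 problems for Math500 and 30 for AIME 2024 (two 15-question exams). The other benchmarks are left out because their sizes are not stated here.

```python
# Reported accuracy (percent) from the benchmark list.
reported = {"Math500": 82.4, "AIME2024": 43.3}

# Benchmark sizes (assumption: scores are simple single-pass fractions).
sizes = {"Math500": 500, "AIME2024": 30}

# Nearest whole number of solved problems for each benchmark.
solved = {name: round(reported[name] / 100 * sizes[name]) for name in reported}

for name, count in solved.items():
    recomputed = 100 * count / sizes[name]
    print(f"{name}: {count}/{sizes[name]} solved ({recomputed:.1f}%)")
```

Under these assumptions the reported scores correspond to 412 of 500 Math500 problems and 13 of 30 AIME 2024 problems, so a single AIME problem shifts that score by more than three percentage points.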
Reasoning Techniques
- Like OpenAI’s o1 and Gemini 2.0, the model produces a long internal chain of thought before giving its final answer.
- Model size and data mixture both have a significant effect on performance: the 32B model yields better results than smaller counterparts.
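When working with a model that emits a long chain of thought, it is often useful to separate the reasoning from the final answer. The sketch below shows one way to do that. The `<thought>` and `</thought>` delimiters are hypothetical placeholders chosen for illustration, not the markers Sky-T1 actually uses, and `split_response` is a helper introduced here.

```python
from typing import Tuple

# Hypothetical delimiters for illustration only; how Sky-T1 marks the
# boundary between reasoning and answer is not specified in this overview.
THOUGHT_START = "<thought>"
THOUGHT_END = "</thought>"


def split_response(response: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no end delimiter is found."""
    reasoning, sep, answer = response.partition(THOUGHT_END)
    if not sep:
        return "", response.strip()
    return reasoning.replace(THOUGHT_START, "", 1).strip(), answer.strip()


example = "<thought>7 * 6 = 42, double-check: 42 / 6 = 7.</thought> The answer is 42."
reasoning, answer = split_response(example)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two parts separate makes it possible to grade only the final answer while still logging the reasoning length, which grows quickly for harder problems.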
Future Directions
- The NovaSky team plans to continue developing more efficient models while maintaining strong reasoning performance and exploring advanced techniques to enhance accuracy.
Conclusion
This research highlights the emergence of affordable, open-source AI models that can compete with established models like OpenAI’s o1, showcasing the potential for broader access to advanced AI technologies.