- Core problem - Deploying post-training pipelines on open-weights models like Qwen3 needs SFT+RL across many different environments, which is expensive and annoying.
Solutions:
- DreamGym - train in "dreams" via a code world model
- Autoharness - RLMs, have the agents write their own testing environments
- Similar to Autoharness, but deploy preset GitHub environments instead
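
Rough sketch of what the three options have in common: whichever way an environment is produced (dreamed, agent-written, or preset), the RL loop only needs a shared reset/step interface and a registry to pull from. Everything here is a hypothetical illustration (`Env`, `EchoEnv`, `REGISTRY`, `rollout`, `evaluate` are made-up names), not the API of DreamGym, Autoharness, or NeMo-Gym.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol, Tuple


class Env(Protocol):
    """Hypothetical minimal interface shared by every environment source."""

    def reset(self) -> str: ...

    def step(self, action: str) -> Tuple[str, float, bool]: ...


@dataclass
class EchoEnv:
    """Toy stand-in environment: reward 1.0 when the action equals the target."""

    target: str = "hello"

    def reset(self) -> str:
        return f"Say: {self.target}"

    def step(self, action: str) -> Tuple[str, float, bool]:
        reward = 1.0 if action.strip() == self.target else 0.0
        return "done", reward, True  # single-step episode


# Name -> environment factory. The keys are illustrative only; a real setup
# would register generated and preset environments the same way.
REGISTRY: Dict[str, Callable[[], Env]] = {
    "preset/echo": EchoEnv,
    "generated/echo-bye": lambda: EchoEnv(target="bye"),
}


def rollout(env: Env, policy: Callable[[str], str], max_steps: int = 8) -> float:
    """Run one episode and return the summed reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


def evaluate(policy: Callable[[str], str]) -> Dict[str, float]:
    """Score one policy on every registered environment."""
    return {name: rollout(make(), policy) for name, make in REGISTRY.items()}
```

With this shape, adding an environment is one registry entry, so the cost question becomes how cheaply each option can produce entries.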
Benchmark to beat:
- NeMo-Gym - the standardized benchmark