Coding Model Env Post-Training

Core problem - Deploying post-training pipelines on open-weights models like Qwen3 needs SFT+RL across many different environments, which is expensive and annoying.

Solutions:

1. DreamGym - train in "dreams" via a code world model
2. Autoharness - RLMs, have the agents write their own testing environments
3. Similar to Autoharness, but deploy preset GitHub environments instead of agent-written ones
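
Whichever option supplies the environments, the RL side presumably needs each one to turn a model's code output into a scalar reward. A minimal sketch of that piece, assuming a hypothetical `CodeEnv` interface (the class, its fields, and the pass-fraction reward are illustrative only, not taken from DreamGym, Autoharness, or NeMo-Gym):

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CodeEnv:
    """Hypothetical environment: a task prompt plus a list of checks.

    The checks could come from a world model, an agent, or a preset repo;
    this sketch only assumes each check takes the executed namespace and
    returns True on success.
    """
    prompt: str
    checks: List[Callable[[dict], bool]]

    def reward(self, solution_src: str) -> float:
        """Return the fraction of checks the submitted code passes."""
        namespace: dict = {}
        try:
            # NOTE: unsandboxed exec, acceptable only for a sketch.
            exec(solution_src, namespace)
        except Exception:
            return 0.0  # code that does not even load earns nothing
        passed = 0
        for check in self.checks:
            try:
                if check(namespace):
                    passed += 1
            except Exception:
                pass  # a crashing check counts as a failure
        return passed / len(self.checks) if self.checks else 0.0


env = CodeEnv(
    prompt="Write add(a, b) that returns the sum.",
    checks=[
        lambda ns: ns["add"](1, 2) == 3,
        lambda ns: ns["add"](-1, 1) == 0,
    ],
)
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
print(env.reward(good), env.reward(bad))
```

A real setup would run submissions in a sandbox (container or subprocess with limits) rather than calling `exec` in-process.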

Benchmark to beat:
NeMo-Gym - the standardized benchmark