Projects

Mapox

JAX-native, multi-agent, partially-observable gridworld environments with a shared observation/action format.

Mapox Trainer Writeup

High-performance, JAX-based multi-agent PPO implementation with first-class support for transformers.

VALM Writeup

JAX-based framework for online RL with LLMs. Custom Qwen3 implementation with LoRA support.

Self-Play Tic-Tac-Toe Demo

Learning to match minimax in tic-tac-toe with less than a minute of self-play on a single GPU.

Yelp Language Model

Learning exercise: training a 109M-parameter transformer from scratch to generate Yelp reviews and predict ratings.