Projects
JAX-native, multi-agent, partially observable gridworld environments with a shared observation/action format.
High-performance, JAX-based multi-agent PPO implementation with first-class support for transformers.
JAX-based framework for online RL with LLMs. Custom Qwen3 implementation with LoRA support.
Learning to match minimax in tic-tac-toe with less than a minute of self-play on a single GPU.
Learning exercise: training a 109M-parameter transformer from scratch to generate Yelp reviews and predict ratings.