For those who don't know what "RL" means: it's reinforcement learning. The models I was showing before were trained to mimic player behavior as accurately as possible. This one has a second phase of training where it estimates which decisions were "good" and focuses on those actions.
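
If it helps to see the idea in code, here's a rough toy sketch of the difference between the two phases. To be clear, this is not the actual training code: the `(action, outcome)` log format, the function names, and the exponential weighting are all assumptions picked for illustration, and the real model may estimate "good" in a different way.

```python
import math
from collections import defaultdict

def imitation_policy(decisions):
    """Phase 1: mimic players. Action probabilities are the observed frequencies."""
    counts = defaultdict(float)
    for action, _outcome in decisions:
        counts[action] += 1.0
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def advantage_weighted_policy(decisions, temperature=1.0):
    """Phase 2 (assumed form): upweight decisions whose outcome beat the average."""
    # Baseline = average outcome across all logged decisions.
    baseline = sum(outcome for _action, outcome in decisions) / len(decisions)
    weights = defaultdict(float)
    for action, outcome in decisions:
        advantage = outcome - baseline  # > 0 means better than average
        weights[action] += math.exp(advantage / temperature)
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical log: (action, outcome), with outcome 1.0 = went well, 0.0 = went badly.
logged = [
    ("attack", 1.0), ("attack", 1.0), ("attack", 0.0),
    ("retreat", 0.0), ("retreat", 0.0), ("retreat", 1.0),
]

bc = imitation_policy(logged)           # pure mimicry
rl = advantage_weighted_policy(logged)  # mimicry tilted toward "good" decisions
print(bc)
print(rl)
```

Players in this made-up log pick "attack" and "retreat" equally often, so pure mimicry splits 50/50. The weighted version leans toward "attack" because those decisions worked out more often.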