hi, i made those ArcSys script modding tools, and the Xrd SAMMI/WebSockets integration. posting stuff i make here. @TopTwentyNotes on twitter Support my work: https://ko-fi.com/pangaea__

Pangaea

as a comparison, this is what the model was doing a couple months ago bsky.app/profile/topt...

i think the only thing i'm lacking now is RL to give the model the goal of winning, not just pure imitation of player habits. also more data, there is really no limit to the amount of replay data i would like to have, it's clearly still overfocusing on a few weird rounds (e.g. the backdash habit)

for those who don't know what "RL" means: it's reinforcement learning. the models i was showing before were trained to mimic player behavior as accurately as possible. this one has a second phase of training which tells it to estimate which decisions were "good" and focus on those actions
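a rough sketch of what "focus on the good decisions" could look like, assuming an advantage-weighted imitation loss (a common offline RL approach; the posts don't say which method is actually used). every name and parameter here is hypothetical:

```python
import math

def advantage_weights(returns, values, temperature=1.0, max_weight=20.0):
    """Weight each recorded action by exp(advantage / temperature), clipped.

    returns: observed outcome following each action (assumed to be given)
    values:  the model's own estimate of the expected outcome at that point
    """
    weights = []
    for ret, val in zip(returns, values):
        advantage = ret - val  # positive if things went better than expected
        weights.append(min(math.exp(advantage / temperature), max_weight))
    return weights

def weighted_imitation_loss(log_probs, weights):
    """Negative log-likelihood of the recorded actions, weighted per action."""
    total = sum(w * -lp for lp, w in zip(log_probs, weights))
    return total / len(log_probs)
```

with every weight equal to 1 this reduces to plain imitation, so the weights are the only thing the second phase adds in this sketch.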

working on figuring out the meter gain values for each character. i've created a table of each character's values. most of them are fairly obvious, just meter added and scaled by tension pulse somehow, but i can't figure out what "Guard Balance Tension" is supposed to be. any ideas?

things i've tried so far:
- getting hit/hitting an opponent with varying RISC values
- blocking/making an opponent block with varying RISC values

none of them have resulted in any change to the meter gained by either player. i'm quite confused: since it varies by character, i assume they use it

Xrd model now uses RL, this isn't a finished training run, and no online RL has been applied yet as i'm still working on the mod setup for that, but i'm very happy with the progress! here's the model playing against itself, i think it's a massive step up from the last footage i posted

switched to training on whole rounds instead of small context windows, i'm pretty shocked by how big the improvement is compared to what i had before. this also enables performance optimizations which make it way easier to run larger models in realtime. here's a Johnny vs Faust round as an example
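to illustrate the difference, here's a minimal sketch of the two ways of cutting replay data into training sequences. the function names and the idea of a per-frame round id are assumptions for the example, not the actual pipeline:

```python
def sliding_windows(frames, window):
    """Small fixed-size context windows: each sample sees only `window` frames."""
    return [frames[i:i + window] for i in range(len(frames) - window + 1)]

def whole_rounds(frames, round_ids):
    """One training sequence per round: each sample spans the full round."""
    rounds = {}
    for frame, rid in zip(frames, round_ids):
        # dicts keep insertion order, so rounds come out in replay order
        rounds.setdefault(rid, []).append(frame)
    return list(rounds.values())
```

in the windowed version neighbouring samples overlap almost entirely, while the whole-round version covers each frame exactly once.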
