//
sign in
Post
by @danabra.mov
PostEmbed
by @danabra.mov
Record
by @jimpick.com
Record
by @atsui.org
+ new component
Post
So far I used a simple a=a[a] pattern to test GPU memory latency, but that indexed addressing penalty always bothered me. I finally got around to making the compiler spit out a chain of dependent loads and nothing else. Good start on AMD. I save ~4 or ~12 ns for scalar and vector accesses
7mo
Clamchowder