clamtech.org?dest=intel_u... - Sharing some thoughts on Intel's possible unified core project. Basically, I think the easiest route is a Zen 4c/5c-style shrink of their P-Core. But of course, Intel has more options than that.
Sharing a piece I wrote a while ago on Zen 1. I mostly did this to test my site design, with plenty of pagination, tables, and captioned images. It's a pretty complete article by itself though, and I hope y'all find it a fun read! clamtech.org?dest=zen1
I wanted to bring in RDNA4 since I have an example of that card now, but never found the time. That's stuck on the back of a long todo list :/
Time for a little site with some multi-page support! I plan to write random thoughts on hardware there. To start, here's some commentary on drilling down into GPU cache latency using very funny OpenCL kernels: clamtech.org?dest=gpudire...
clamtech.org?dest=gpuwrite - Here's a look at GPU cache/memory write bandwidth across a variety of hardware.
Intel's desktop Arrow Lake always keeps the SNCU (die-to-die interface and some other parts of the uncore) at 2.6 GHz. On Meteor Lake, the SNCU clock goes up to 2.4 GHz but varies a lot, probably to save power.
Sharing another piece I wrote last year, comparing hardware AV1 encoding on Intel's Arc B580 and AMD's Hawk Point, at clamtech.org?dest=av1hwenc - it should be a good test of image handling on the site, with sliders for quality comparisons.
So far I've used a simple a=a[a] pattern to test GPU memory latency, but that indexed addressing penalty always bothered me. I finally got around to making the compiler spit out a chain of dependent loads and nothing else. Good start on AMD. I save ~4 ns or ~12 ns for scalar and vector accesses, respectively.
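For anyone unfamiliar with the a=a[a] idea, here's a minimal sketch of the access pattern in Python. This is not the OpenCL kernel from the post, and the names (`build_chain`, `chase`, `time_per_load_ns`) are made up for illustration. Each load's address depends on the previous load's result, so the loads can't overlap. In Python the interpreter overhead dominates, so the timing only shows the shape of the test and says nothing about real hardware latency.

```python
import random
import time


def build_chain(n, seed=0):
    """Build a list where following chain[i] visits all n slots in one cycle.

    A single random cycle keeps the walk from getting stuck in a short loop.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    chain = [0] * n
    for pos in range(n):
        # Each slot stores the index of the next slot in the shuffled order.
        chain[order[pos]] = order[(pos + 1) % n]
    return chain


def chase(chain, steps, start=0):
    """The a=a[a] pattern: every load depends on the result of the last one."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


def time_per_load_ns(chain, steps):
    """Average wall-clock time per dependent load (illustrative only)."""
    t0 = time.perf_counter_ns()
    chase(chain, steps)
    return (time.perf_counter_ns() - t0) / steps
```

On real hardware, the array size would be swept so the chain lands in different cache levels. The indexed-addressing cost mentioned above comes from turning each loaded index into an address before the next load can issue.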