clamtech.org?dest=intel_u... - Sharing some thoughts on Intel's possible unified core project. Basically, I think the easiest route is a Zen 4c/5c-style shrink of their P-core. But of course, Intel has more options than that.

Sharing a piece I wrote a while ago on Zen 1. I mostly did this to test my site design, with plenty of pagination, tables, and captioned images. It's a pretty complete article by itself, though, and I hope y'all find it a fun read! clamtech.org?dest=zen1

I wanted to bring in RDNA4 since I have an example of that card now, but never found the time. That's stuck on the back of a long todo list :/

Time for a little site with some multi-page support! I plan to write random thoughts on hardware there. To start, here's some commentary on drilling down into GPU cache latency using very funny OpenCL kernels: clamtech.org?dest=gpudire...

clamtech.org?dest=gpuwrite - Here's a look at GPU cache/memory write bandwidth across a variety of hardware.

Intel's desktop Arrow Lake always keeps the SNCU (the die-to-die interface and some other parts of the uncore) at 2.6 GHz. On Meteor Lake, it goes up to 2.4 GHz but varies a lot, probably to save power.

Sharing another piece I wrote last year, comparing hardware AV1 encoding on Intel's Arc B580 and AMD's Hawk Point, at clamtech.org?dest=av1hwenc. It should be a good test of image handling on the site, with sliders for quality comparisons.

So far I've used a simple a=a[a] pattern to test GPU memory latency, but the indexed addressing penalty (computing base + index * element size before every load) always bothered me. I finally got around to making the compiler spit out a chain of dependent loads and nothing else. It's a good start on AMD: I save ~4 ns on scalar accesses and ~12 ns on vector accesses.
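The difference between the two chain shapes can be sketched CPU-side in C. This is a minimal illustration under stated assumptions, not the OpenCL kernels behind the GPU numbers: the helper names (build_cycle, chase_index, build_pointers, chase_pointer, ns_per_load_*) are hypothetical, the random cycle comes from Sattolo's algorithm so the chase touches every slot, and timing uses the C standard clock(). In the index chase, each address is base + i * 4, so address math sits inside the dependency chain; in the pointer chase, the loaded value already is the next address, leaving only loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: fill next[] with a single random cycle over n slots
   (Sattolo's algorithm), so chasing from any slot visits all n slots. */
static void build_cycle(uint32_t *next, size_t n, unsigned seed) {
    assert(n >= 1);
    srand(seed);
    for (size_t i = 0; i < n; i++) next[i] = (uint32_t)i;
    for (size_t i = n - 1; i > 0; i--) {
        /* j in [0, i): never i itself, which guarantees one single cycle.
           rand() % i is slightly biased; fine for a sketch. */
        size_t j = (size_t)rand() % i;
        uint32_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Index chase, the a = a[a] shape: each load's address is base + i * 4,
   so the address arithmetic is part of the dependency chain. */
static uint32_t chase_index(const uint32_t *a, uint32_t start, size_t steps) {
    uint32_t i = start;
    for (size_t s = 0; s < steps; s++) i = a[i];
    return i;
}

/* Convert the index cycle into stored pointers: p[i] holds &p[next[i]]. */
static void **build_pointers(const uint32_t *next, size_t n) {
    void **p = malloc(n * sizeof *p);
    if (p == NULL) return NULL;
    for (size_t i = 0; i < n; i++) p[i] = &p[next[i]];
    return p;
}

/* Pointer chase: the loaded value is the next address, so the chain
   consists of nothing but dependent loads. */
static void **chase_pointer(void **p, size_t steps) {
    for (size_t s = 0; s < steps; s++) p = (void **)*p;
    return p;
}

/* Rough average nanoseconds per load. Storing the result to a volatile
   keeps the compiler from discarding the chase. */
static double ns_per_load_index(const uint32_t *a, size_t steps) {
    clock_t t0 = clock();
    volatile uint32_t sink = chase_index(a, 0, steps);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)steps;
}

static double ns_per_load_pointer(void **p, size_t steps) {
    clock_t t0 = clock();
    void **volatile sink = chase_pointer(p, steps);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)steps;
}
```

For a rough comparison, one could build a cycle large enough to spill out of the caches and run both timing helpers for a few million steps. On x86, a scaled-index load folds the address math into the load instruction itself, so the gap on a CPU may be small or absent; the sketch shows the two chain shapes rather than reproducing any GPU result.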