Servo is now passing 1.9M subtests at wpt.fyi: 1,903,187 / 2,131,264 (89.3%) 🎉
See wpt.fyi/results/?pro...
Life hack, this.
I wrote a new Agentic text-to-SQL benchmark and tested every local model I could against it: sql-benchmark.nicklothian.com
Thanks to DuckDB WASM you can try your own models from the browser.
Regretfully, the story about LLMs anti-polarizing people was not real.
This article has been making the rounds, but someone else pointed out that the "evidence" is based on LLM-simulated users. I have extremely low faith in the validity of these results, especially given the established stickiness of political beliefs and the known issues with in silico simulation.
Each year, nuclear adds only as much net global power capacity as renewables add every two days. Game over.
.. “Claaaaaude, can you please take a look at this pty?” - Claude looks at the screen, fixes a couple of snags, and gives it back to me. No copy-paste, just switching between the two windows. Super convenient, I am now launching tttt+claude pretty much with every terminal 😂
Now that i’ve mastered a live-reload of a compiled rust binary without losing the child processes underneath (or with restart of them in the case of Claude —restart), pondering if I wanna build a “bootstrap agent” whose only tool is “one-shot another tool, if it compiles, add and restart”.…
If you're reviewing ARR papers and want a tool to help you spot potential hallucinated references, I cooked this up for the ACL SACs and thought I would share it with the broader community github.com/davidjurgens...
Servo
Tttt was super useful today - first tell Claude “hey go read about this thing (netlab.tools) for me, and make a one-shot install script for my DGX Spark”, then go open a secondary terminal, launch the script there, hit an error, and…
Andrew Yourtchenko
LLM/agents experimenting, Rust, 3D-printing and active mobility. Release manager for VPP. Bits of code: GitHub.com/ayourtch ; all posts are entirely only mine. 🇪🇺