gpjt · 2025-10-06T14:54:01 1759762441

This is a great post on many levels, but what struck me as particularly clever was the use of lm_head to decode the outputs of earlier layers. That linear layer is only trained to decode the output of the last layer, so intuitively it might only be able to do that -- the embedding spaces used between earlier layers might be different and "incompatible". It's really interesting that that is not the case.
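A toy sketch of that idea, for anyone who wants to see the shape of it: one shared unembedding matrix is applied to the hidden state from every layer, not just the last. Everything here (the three-word vocabulary, the `LM_HEAD` weights, the per-layer hidden states, the `decode` helper) is made up for illustration and does not come from the post or from any real model. Real models usually also pass the hidden state through a final normalisation layer before `lm_head`, which this sketch skips.

```python
# Toy illustration: decode every layer's hidden state with the same lm_head.
# All values and names below are hypothetical, chosen only to show the idea.

VOCAB = ["cat", "dog", "fish"]

# Hypothetical lm_head weights: one row per vocabulary token,
# each row the same width as the hidden state (2 here).
LM_HEAD = [
    [1.0, 0.0],    # "cat"
    [0.0, 1.0],    # "dog"
    [-1.0, -1.0],  # "fish"
]


def decode(hidden):
    """Project a hidden state to logits and return the highest-scoring token."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in LM_HEAD]
    best = max(range(len(logits)), key=logits.__getitem__)
    return VOCAB[best]


# Hypothetical hidden state at the final position after each layer.
hidden_by_layer = [
    [-0.5, -0.2],  # after layer 1
    [0.1, 0.6],    # after layer 2
    [0.2, 2.0],    # after layer 3 (the only one lm_head was trained on)
]

predictions = [decode(h) for h in hidden_by_layer]

for layer, token in enumerate(predictions, start=1):
    print(f"layer {layer}: {token}")
```

In this made-up example the earlier layers still decode to a sensible token, which is the behaviour the comment finds surprising; nothing about the toy shows that a real model must behave that way.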

gpjt · 2025-09-06T23:12:49 1757200369

Post author here. I agree 100%! The post is the basic maths for people digging into how LLMs work under the hood -- I wrote a separate one for non-techies who just want to know what they are, at https://www.gilesthomas.com/2025/08/what-ai-chatbots-are-doi...

gpjt · 2025-09-06T23:09:42 1757200182

Check the first link in the parent comment, it's a link to the book.

gpjt · 2025-06-16T00:14:34 1750032874

Congrats! Amazing feeling, isn't it :-)

gpjt · 2025-05-24T13:38:12 1748093892

This, 100%. A full-stack engineer will likely have at least a solid understanding of HTTP, HTTPS, WebSockets, the interface layer between the frontend server and their chosen webdev stack, and so on. Then a more general understanding of networking protocols, TCP vs UDP, DNS, routing, etc. In general, you need to have a solid understanding of the layer below where you're working, some understanding of the layer below that, and so on, with less and less detail needed for each layer down.

(That's not to say that you shouldn't bother with learning more -- more knowledge is always good -- or that the OP specifically only knows that. It's more a sensible minimum.)

My own "curriculum" for that has been Jeremy Howard's Fast AI course and Sebastian Raschka's book "build an LLM from scratch". Still working through it, but once I'm done I think I'll be solid on your point 2 above. My guess is that I'll want to learn more, but that's out of interest more than because I think it's necessary.

gpjt · 2025-05-12T11:05:32 1747047932

As the author of the original post above, let me say that if that's word salad, it's a Michelin star salad. Just the right mix of lettuce and tomato, and the dressing is spot on :-)

Seriously, though, "differentiable hash tables" is an awesome way to look at them; I wish I'd heard it before.

gpjt · 2025-05-12T11:00:36 1747047636

Author of the post here -- I'm being careful not to do that. My posts are more about filling in the gaps; they're covering the things that aren't mentioned. The book's target audience is, I think, people with a bit more background knowledge about the inner workings of AI than I have, so I'm having to play catch-up a bit.

gpjt · 2025-05-12T03:05:45 1747019145

Author here: I endorse this comment ;-) That's definitely the route through the series that I've optimised for.

gpjt · 2025-05-05T13:17:10 1746451030

Another one leaving for Porkbun here.

gpjt · 2025-05-05T01:09:19 1746407359

Huh, I was thinking the same thing, and was wondering whether it was just moving to London. Could be both, I suppose.