Description
How to Build LLM from Scratch (34 Pages)
By limiting the use of libraries and focusing on the math and the code, I was able to build the architecture from a very basic starting point. I've also included a few handwritten notes along the way. (The handwriting is a bit rough!)
The GPT architecture looks complex, but studied properly it all comes down to:
👉 Matrices and tensors: You should be comfortable handling tensors and understanding their dimensions.
👉 Probability and statistics: Softmax, layer normalization, and the multinomial distribution all play important roles in building GPT (a rough sketch of these follows this list).
👉 Calculus: Training GPT means running backpropagation, and the chain rule is at its core (a toy example is also sketched below).
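To give a feel for these building blocks, here is a minimal sketch in PyTorch (PyTorch is my assumption here; the notes may use a different setup) showing tensor shapes, softmax, layer normalization, and multinomial sampling:

```python
import torch

# A batch of 2 sequences, 4 tokens each, 8-dimensional embeddings: shape (2, 4, 8)
x = torch.randn(2, 4, 8)
print(x.shape)  # torch.Size([2, 4, 8])

# Softmax turns raw scores (logits) into a probability distribution along a dimension
logits = torch.randn(2, 4, 10)           # e.g. scores over a 10-token vocabulary
probs = torch.softmax(logits, dim=-1)    # each row now sums to 1
print(probs.sum(dim=-1))                 # all ones

# Layer normalization rescales each token's features to roughly zero mean / unit variance
layer_norm = torch.nn.LayerNorm(8)
x_normed = layer_norm(x)
print(x_normed.mean(dim=-1))             # values close to 0 for every token

# Multinomial sampling picks the next token id from a predicted distribution
next_token = torch.multinomial(probs[0, -1], num_samples=1)
print(next_token)                        # a sampled token id in [0, 10)
```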
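And a toy example of the chain rule that backpropagation relies on, computed by hand and checked against autograd (again just an illustrative sketch, not code from the notes):

```python
import torch

# Toy function: y = (w * x + b) ** 2
# Chain rule: dy/dw = 2 * (w*x + b) * x,  dy/db = 2 * (w*x + b)
x = torch.tensor(3.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

y = (w * x + b) ** 2
y.backward()                                 # autograd applies the chain rule for us

inner = w.item() * x.item() + b.item()       # w*x + b = 7
print(w.grad.item(), 2 * inner * x.item())   # 42.0 and 42.0
print(b.grad.item(), 2 * inner)              # 14.0 and 14.0
```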
My notes walk through all the steps involved in building GPT-2 from scratch:
👉 Learning about LLMs (Large Language Models)
👉 Stages of building an LLM
👉 Data preprocessing
👉 Cleaning and tokenizing text
👉 Transformer architecture
👉 Attention mechanisms (multi-head attention; see the sketch after this list)
👉 Coding and training the model
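As a taste of the attention step, here is a compact multi-head self-attention sketch in PyTorch. It is a minimal illustration with my own naming, not the exact code from the notes:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention with a causal mask (GPT-style)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # project to queries, keys, values
        self.proj = nn.Linear(d_model, d_model)      # final output projection

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq_len, d_head)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # scaled dot-product attention scores
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        # causal mask: each token may only attend to itself and earlier tokens
        mask = torch.tril(torch.ones(T, T, device=x.device)).bool()
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        out = weights @ v                            # (batch, heads, seq_len, d_head)
        out = out.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

# Example: 2 sequences of 5 tokens with 32-dim embeddings and 4 heads
attn = MultiHeadSelfAttention(d_model=32, n_heads=4)
print(attn(torch.randn(2, 5, 32)).shape)  # torch.Size([2, 5, 32])
```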