
How Generative AI / LLM Actually Works (ChatGPT is Not Magic)

By Sachin Hiriyanna

Summary

Key Takeaways

  • **AI Isn't Magic: Just Math Functions**: Generative AI feels like magic, but under the hood it's massive math functions, weights, biases, and expensive trial and error. We're drawing the architecture to show exactly how computers learn to predict the future. [00:15], [00:32]
  • **Weights & Biases Tune Connections**: Every line connecting neurons has a weight that determines how important that connection is; a high weight passes the signal strongly, near zero ignores it. Bias is an extra number added to shift the activation point. [01:49], [02:12]
  • **Activation Functions Add Nonlinearity**: The neuron math wx + b is linear, like a straight line, but the real world is curved, so we wrap it in activation functions like ReLU or sigmoid. These ask whether the number is big enough to fire the neuron. [03:07], [03:27]
  • **Backpropagation Nudges Weights**: Backpropagation and gradient descent send error signals backward: decrease weights on the left, increase on the right, nudging them a tiny bit at a time. This repeats billions of times until the error reaches zero. [05:13], [05:21]
  • **Self-Attention Resolves Ambiguity**: Transformers use self-attention to calculate relationships between every word simultaneously, assigning scores, like realizing "tired" applies to the animal, not the street, in "the animal didn't cross because it was too tired." This holds context over thousands of words. [05:55], [06:14]
  • **Training Costs $10M-$100M**: Training frontier models like GPT-4 requires thousands of GPUs running for months at full capacity, costing $10 million to $100 million per run and as much electricity as a town uses in a year. [07:01], [07:42]

Topics Covered

  • Neural nets are tunable math knobs
  • AI learns by predicting next word
  • Backprop nudges weights downhill
  • Self-attention tracks long context
  • AI mirrors data, not truth

Full Transcript

Generative AI. It's the technology that lets machines create something new: text, images, code, even music. It often looks like something a human made. But how?

Most people think it's magic. It isn't.

Some people think it's copy-pasting from Google. It isn't that either. To understand this, we need to stop looking at the results and start looking at the engine. We need to talk about math functions, weights, biases, and the most expensive trial-and-error process in human history. In this video, we're going to draw the architecture of a brain, calculate the cost of training, and explain exactly how a computer learns to predict the future.

It starts here: the neural network. This is inspired by the human brain, but mathematically it's much simpler.

Think of this circle, this artificial neuron, as a holding container for a number. It receives inputs from the left, processes them, and then sends an output to the right.

When we stack these together, we get a network.

There are three important parts to this. The input layer: this is your data. If you're analyzing an image, these circles hold the pixel values, brightness for example. The hidden layers: this is where the magic happens. This is the deep learning. The output layer: this is the prediction. Is this a cat or a dog?

But here is a key concept: weights and biases. Every line connecting the circles has a weight, w. The weight determines how important that connection is. If the weight is high, the signal passes through strongly. If it's near zero, the signal is ignored.

The bias, b, is an extra number added to the neuron to shift the activation point up or down.

Deep tech takeaway: a neural network is essentially millions of knobs. Training an AI is just the process of tuning these knobs, the weights and biases, until the output is correct.

At its core, generative AI is just a massive function, y = f(x), where x is your prompt, something like "write me a poem."

y is the output, the poem.

f is the model, which is GPT, Claude, or Gemini.

If we zoom into a single neuron, the math looks like this: input times weight plus bias, that is wx + b. But this is linear. It's just a straight line. The real world is curved and complex. So we wrap this math in something called an activation function. Common functions are the rectified linear unit (ReLU) or sigmoid. Basically, this function asks: is this number big enough to matter? If yes, it fires the neuron. If no, it stays dark.
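
Here is a minimal sketch of that single-neuron math in Python (the language the video promises to code in). The inputs, weights, and bias are made-up numbers, not values from any real model:

```python
import numpy as np

def relu(z):
    # ReLU: keep the value if it is positive, otherwise output 0 (the neuron "stays dark")
    return np.maximum(0.0, z)

def neuron(x, w, b):
    # Input times weight plus bias (wx + b), wrapped in the activation function
    return relu(np.dot(w, x) + b)

# Made-up numbers, just to show the shape of the computation
x = np.array([0.5, 0.8, 0.1])    # inputs arriving from the left
w = np.array([0.9, -0.3, 0.0])   # weights: high means important, near zero means ignored
b = 0.1                          # bias shifts the activation point up or down

print(neuron(x, w, b))           # the single number this neuron sends to the right
```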

Now imagine this simple math equation happening billions of times simultaneously.

GPT-4, for example, is rumored to have over a trillion parameters. That means over a trillion weights (w) and biases (b) being calculated for every single token it generates.
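
To get a feel for "billions of times simultaneously," here is a toy two-layer network where each layer is computed as one matrix multiplication. The layer sizes and random numbers are purely illustrative; frontier models do the same thing with vastly bigger matrices:

```python
import numpy as np

# The same wx + b idea, computed for a whole layer of neurons at once as a matrix multiply.
rng = np.random.default_rng(1)
x = rng.normal(size=4)                                  # input layer: 4 values (e.g. pixel brightnesses)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)    # hidden layer: 8 neurons, each with 4 weights
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)    # output layer: 2 neurons (cat score, dog score)

hidden = np.maximum(0.0, W1 @ x + b1)   # every hidden neuron computes wx + b and fires (or not) in parallel
output = W2 @ hidden + b2               # the prediction: two raw scores, one per class

print(output)
```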

So how do we set these trillion weights?

We don't. The machine does it itself through training. This is the most computationally expensive part.

The model reads massive amounts of text: the internet, books, code, etc. It tries to predict the next word. Let's say the input is "the cat sat on the..." and the model guesses "car." We know the answer is "mat."

The model calculates how wrong it was. The calculation is called the loss function. It measures the mathematical distance between its guess, which is "car" here, and the truth, "mat."
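
As a sketch of what that measurement might look like, here is cross-entropy (one common choice of loss function) over a tiny made-up vocabulary with made-up probabilities:

```python
import numpy as np

# Toy vocabulary and a hypothetical probability the model assigns to each possible next word
vocab = ["car", "mat", "dog", "roof"]
predicted = np.array([0.70, 0.10, 0.15, 0.05])   # the model's guess leans strongly toward "car"
target = "mat"                                   # the truth from the training text

# Cross-entropy loss: a standard way to measure the distance between the guess and the truth
loss = -np.log(predicted[vocab.index(target)])
print(f"loss = {loss:.3f}")   # a big number means "very wrong"; it shrinks as the model improves
```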

Now for the most important algorithm in AI: backpropagation and gradient descent.

The model looks at the error, turns around, and sends a signal backward through the network. It says to the neurons, "Hey, we got it wrong. You guys on the left, decrease your weights. You guys on the right, increase your weights." It nudges the weights just a tiny bit. Then it tries again. It repeats this process billions of times.

Picture the error as a valley and the current weights as a ball rolling downhill. Eventually, the ball reaches the bottom of the valley. The error is zero. The model now knows "the cat sat on the mat."
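
Here is a toy version of that nudging process: gradient descent on a single weight. The numbers are arbitrary; real training does this for billions of weights at once, with the nudge directions delivered by backpropagation:

```python
w = 5.0              # start from a random weight
target = 2.0         # the value of w that would make the error zero
learning_rate = 0.1  # how big each nudge is

for step in range(100):
    error = (w - target) ** 2       # the loss: squared distance from the truth
    gradient = 2 * (w - target)     # the slope of the loss with respect to w
    w -= learning_rate * gradient   # nudge the weight a tiny bit downhill

print(w)  # ends up very close to 2.0, the bottom of the valley
```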

Before 2017, AI was bad at long sentences. It would forget the beginning of the paragraph by the time it reached the end. Then came the transformer architecture.

The breakthrough was a mechanism called self-attention.

In this sentence, "the animal didn't cross the street because it was too tired," the word "it" refers to what? The animal or the street?

The transformer calculates relationships between every single word in a sentence at the same time. It assigns a mathematical score to how related words are. It realizes that "tired" usually applies to animals, not streets. So, it pays more attention to the word "animal."

This allows the AI to hold context over thousands of words, writing code or essays that stay coherent.
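
Here is a minimal sketch of scaled dot-product self-attention, the math behind those scores. The word vectors are random placeholders; in a real transformer they are learned during training:

```python
import numpy as np

words = ["the", "animal", "was", "tired"]
d = 8                                    # tiny embedding size, just for illustration
rng = np.random.default_rng(0)
Q = rng.normal(size=(len(words), d))     # queries: what each word is looking for
K = rng.normal(size=(len(words), d))     # keys: what each word offers
V = rng.normal(size=(len(words), d))     # values: the information each word carries

scores = Q @ K.T / np.sqrt(d)                                         # a relationship score for every pair of words
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax: scores become attention weights
output = weights @ V                                                  # each word becomes a blend of the words it attends to

print(np.round(weights, 2))   # row i shows how much word i attends to every other word
```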

This math isn't free. To perform these matrix multiplications, we need GPUs, which are graphics processing units. These chips are designed to do thousands of tiny math problems in parallel. That's the key. Training a frontier model like GPT-4 or Llama 3 requires clusters of thousands of these GPUs running at 100% capacity for months.

The cost: depending on the model size, a single training run can cost between $10 million and $100 million. And the energy: training a large model consumes as much electricity as a single town uses in a year. This is why AI is constrained by hardware and energy, not just code.

So does it know anything?

Technically no. It doesn't have beliefs or feelings or truth.

It is a statistical engine. It is a mirror of the data it was trained on. It compresses the patterns of human knowledge into a file of weights and biases.

So even though it's just math, the result is a tool that amplifies our ability to create, build, and solve problems. Thanks for watching. If you want to see us code a neural network from scratch in Python, let me know in the comments.
