How do language models really work? A look behind the scenes of ChatGPT and co.
Daniel · October 26, 2024
- From a jumble of text to an "intelligent" answer: the training process
- Transformer architecture: The heart of an AI model
- How is an answer created? From token to language
- Why not a real database? Facts vs. probabilities
- Adaptation and improvement: fine-tuning with supervised learning
- Conclusion: Not a classic database, but a clever probability model
Artificial intelligence has developed rapidly in recent years, and generative AI models such as ChatGPT are at the forefront. But how do they actually work? How does a model manage to provide meaningful and sometimes surprisingly creative answers to our questions even though it does not have access to a traditional database? In this blog post, we take a look behind the scenes and explain how a language model works.
From a jumble of text to an "intelligent" answer: the training process
Before ChatGPT can give intelligent answers, it first has to learn. The starting point is a huge collection of texts from various sources: books, websites, articles, forums - anything that can teach the model about language. Unlike a database, however, a language model does not store facts; it recognises patterns and connections between words.
- Collecting and preparing data: The training data spans many languages and a wide variety of topics. However, a language model does not "read" texts like we do, but breaks them down into mathematical units. Each word (or part of a word) becomes a "token" that the model can process.
- Self-learning through prediction: The model learns by trying to predict the next word in a sentence. The question is always: what is the most likely next word? The model runs through countless such scenarios and develops a statistical sense of which words tend to follow one another (the sketch after this list shows both steps with a real, openly available model).
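To make these two steps concrete, here is a minimal Python sketch using the openly available GPT-2 model via the Hugging Face transformers library. GPT-2 is only a stand-in: ChatGPT's actual tokeniser and weights are different, but the mechanics of tokenisation and next-word prediction are the same in principle.

```python
# Minimal sketch: tokenisation and next-word prediction with GPT-2.
# Requires: pip install transformers torch
# GPT-2 is a small open model, used here as a stand-in for ChatGPT.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The cat chased the"
inputs = tokenizer(text, return_tensors="pt")

# Step 1: the text becomes tokens (the "mathematical units")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))
# e.g. ['The', 'Ġcat', 'Ġchased', 'Ġthe']  (Ġ marks a leading space)

# Step 2: the model assigns a probability to every possible next token
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (batch, seq_len, vocab)
probs = torch.softmax(logits[0, -1], dim=-1) # distribution for the next token

top_probs, top_ids = torch.topk(probs, k=5)
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(i))!r}: {float(p):.3f}")
```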
Transformer architecture: The heart of an AI model
This is where the "Transformer architecture" comes into play - the technological backbone of ChatGPT. What makes it special: Transformers process all the words in a sentence simultaneously and analyse the relationships between them.
- Self-attention mechanism: Imagine the model reads: "The cat chased the mouse." Thanks to self-attention, it understands that "cat" and "chased" belong together, regardless of how far apart they are in the sentence (a small numerical sketch follows after this list).
- Layer by layer to meaning: The model is organised in many layers that recognise the patterns in the data in ever greater detail. Each word is converted into a kind of meaning code - a bit like a mathematical "meaning coordinate" for each word. And the millions of weightings that the model learns during training? These are effectively the model's memory: all probabilities and correlations are stored there.
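To make self-attention a little more tangible, here is a toy numerical sketch in Python. The token vectors and weight matrices are randomly invented stand-ins for the learned "meaning coordinates" and weightings described above; a real Transformer uses far larger dimensions and many attention heads.

```python
# Toy sketch of scaled dot-product self-attention, the core Transformer
# operation. All numbers here are random stand-ins, not learned values.
import numpy as np

tokens = ["The", "cat", "chased", "the", "mouse"]
d = 4                                    # embedding dimension (toy size)
rng = np.random.default_rng(0)
X = rng.normal(size=(len(tokens), d))    # one "meaning coordinate" per token

# In a real Transformer, these projection matrices are learned in training.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Attention scores: how strongly each token "looks at" every other token
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax

# Each output is a weighted mix of all value vectors, so "cat" can pull
# in information from "chased" no matter how far apart they stand.
output = weights @ V
print(np.round(weights, 2))   # each row sums to 1: one attention pattern per token
```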
How is an answer created? From token to language
When you ask the model a question, your input is also broken down into tokens. The following then happens:
- Calculating probabilities: ChatGPT calculates which word or token is most likely to come next in the sentence. The selection is based on what the model "learnt" during training - it is a reconstruction of what fits best.
- The art of text generation: The model generates word by word, but not always with the same certainty. A parameter called "temperature" controls whether the answer turns out more "creative" or more "logically structured". Do you want a strictly rational answer? Then the temperature is set low. If you want something more creative, it is raised (see the sketch after this list).
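Here is a small sketch of what the temperature parameter does mathematically. The candidate words and raw scores are invented for illustration; a real model chooses from tens of thousands of tokens.

```python
# Sketch: how "temperature" reshapes next-token probabilities before
# sampling. The candidate words and logits below are made up.
import numpy as np

candidates = [" mouse", " ball", " dog", " shadow"]
logits = np.array([3.0, 1.5, 1.0, 0.2])    # raw model scores (invented)

def next_token_distribution(logits, temperature):
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())     # numerically stable softmax
    return exp / exp.sum()

for t in (0.2, 1.0, 2.0):
    probs = next_token_distribution(logits, t)
    print(f"T={t}:", {w: round(float(p), 2) for w, p in zip(candidates, probs)})
# Low temperature  -> almost always " mouse" (deterministic, "logical")
# High temperature -> flatter distribution, more surprising choices ("creative")
```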
Why not a real database? Facts vs. probabilities
A language model like ChatGPT doesn't rely on stored facts - it reconstructs text based on probabilities. It's as if, with enough practice, you got better and better at quickly finding the right words in conversation without having memorised everything verbatim. In some cases it therefore gives the correct answer, but in others it can "hallucinate" and simply construct something that merely sounds plausible.
Fun Fact: Since ChatGPT has no database, it cannot know who will win the European Football Championship in 2024 (unless the model is retrained after the event). It only works with patterns and probabilities, not a real knowledge base.
Adaptation and improvement: fine-tuning with supervised learning
Large AI models are also regularly fine-tuned to improve the user experience and provide useful answers to common questions. Through additional training methods such as Reinforcement Learning from Human Feedback (RLHF), the model can "learn" which of its answers were helpful and which were not. It thus learns from human reactions to respond more and more precisely to user input (a toy sketch of one RLHF building block follows below).
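RLHF itself involves several stages; one core building block is a reward model trained on human preference comparisons between two answers to the same prompt. The following toy sketch shows that pairwise loss idea - the function name and reward scores are hypothetical, purely for illustration; in practice the scores come from a neural network reading the full prompt and answer.

```python
# Toy sketch of the pairwise preference loss used to train a reward
# model (one building block of RLHF). Scores below are invented.
import math

def preference_loss(reward_chosen, reward_rejected):
    """Equivalent to -log(sigmoid(r_chosen - r_rejected)):
    small when the human-preferred answer scores higher."""
    return math.log1p(math.exp(-(reward_chosen - reward_rejected)))

# Human raters preferred answer A over answer B for the same prompt:
print(preference_loss(reward_chosen=2.1, reward_rejected=0.3))  # small loss
print(preference_loss(reward_chosen=0.3, reward_rejected=2.1))  # large loss
```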
Conclusion: Not a classic database, but a clever probability model
To summarise: Generative AI models such as ChatGPT do not work like a database that simply stores information and spits it out at the touch of a button. Instead, they are huge probability calculators that recognise patterns and correlations in texts and generate plausible answers on this basis. The artificial intelligence behind ChatGPT is therefore not an "encyclopaedia", but a clever mixture of statistics and language comprehension - fascinating and a little scary at the same time.