This is an issue that has plagued the machine learning field since long before this latest generative AI craze. Decision trees you can understand, SVMs and Naive Bayes too, but the moment you get into automatic feature extraction and RBF kernels and stuff like that, it becomes difficult to understand how the verdicts issued by the model relate to the real world. Having said that, I’m pretty sure GPTs are even more inscrutable and made the problem worse.
This may be a dumb question, but why can’t you set the debugger on and step thru the program to see why it branches the way it does?
Because it doesn’t have branches, it has neurons - and A LOT of them.
Each of them is tuned by the input data, which is a long and expensive process.
At the end, you hope your model has noticed patterns and is not doing stuff at random.
But all you see is just weights on countless neurons.
Not sure I’m describing it correctly though.
But surely the history of how this data is tuned/created is kept track of. If you want to know how a specific value is created you ideally should be able to reference the entire history of how it changed over time.
I’m not saying this would be easy, but you could have people whose entire job is to understand this and with unlimited amounts of time to do so if it is important enough. And it seems like it would be important enough and such people would be very valuable.
Now that AI is first taking off is exactly the time to establish the precedent that we do not let it escape the understanding and control of humans.
The issue is that the values of the parameters don’t correspond to traditional variables. Concepts in AI are not represented with discrete variables and quantities. A concept may be represented in a distributed way across thousands or millions of neurons. You can look at each individual neuron and say, oh, this neuron’s weight is 0.7142, and this neuron’s weight is 0.2193, etc., across all the billions of neurons in your model, but you’re not going to be able to connect a concept from the output back to the behavior of those individual parameters because they only work in aggregate.
You can only know that an AI system knows a concept based on its behavior and output, not from individual neurons. And AI systems are quite like humans in that regard. If your professor wants to know if you understand calculus, or if the DMV wants to know if you can safely drive a car, they give you a test: can you perform the desired output behavior (a correct answer, a safe drive) when prompted? Understanding how an idea is represented across billions of parameters in an AI system is no more feasible than your professor trying to confirm you understand calculus by scanning your brain to find the exact neuronal connections that represent that knowledge.
Well, the thing is that good AI models aren’t manually tuned. There’s not some poor intern turning a little knob and seeing if it’s slightly more accurate; it happens on its own. The more little knobs there are, the better the model is. This means you essentially have no idea how any knob ultimately affects every other knob, because there are thousands of them and any little change can completely change something else.
Look at a “simple” AI for playing something like Super Mario World: https://youtu.be/qv6UVOQ0F44. That shit’s already pretty complicated, and this thing is stupid. It’s only capable of playing the first level.
“Rumors claim that GPT-4 has 1.76 trillion parameters”
https://en.m.wikipedia.org/wiki/GPT-4
I’m not sure even unlimited time would help understand what’s really going on.
You could build another model to try to decipher the first, but how much could you trust it?
Here’s the summary for the wikipedia article you mentioned in your comment:
Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model created by OpenAI, and the fourth in its series of GPT foundation models. It was initially released on March 14, 2023, and has been made publicly available via the paid chatbot product ChatGPT Plus, and via OpenAI's API. As a transformer-based model, GPT-4 uses a paradigm where pre-training using both public data and "data licensed from third-party providers" is used to predict the next token. After this step, the model was then fine-tuned with reinforcement learning feedback from humans and AI for human alignment and policy compliance. Observers reported that the iteration of ChatGPT using GPT-4 was an improvement on the previous iteration based on GPT-3.5, with the caveat that GPT-4 retains some of the problems with earlier revisions. GPT-4 is also capable of taking images as input on ChatGPT. OpenAI has declined to reveal various technical details and statistics about GPT-4, such as the precise size of the model.
Imagine you have a simple equation:
ax + by + cz
The machine learning part finds values for the coefficients a, b and c.
Even if you stepped through the code, you would see the equation being evaluated just fine, but you still wouldn’t know why the coefficients are the way they are. Oh, and there are literally billions of coefficients.
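To make that concrete, here is a minimal sketch of "finding the coefficients" for ax + by + cz. Everything in it is an assumption for illustration: the training data is synthetic, the "true" coefficients are made up, and plain gradient descent on squared error stands in for whatever a real system would use.

```python
import random

def predict(coeffs, x, y, z):
    """Evaluate a*x + b*y + c*z for the current coefficients."""
    a, b, c = coeffs
    return a * x + b * y + c * z

def train(samples, steps=1000, lr=0.1):
    """Fit a, b, c by gradient descent on the mean squared error."""
    coeffs = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for (x, y, z), target in samples:
            error = predict(coeffs, x, y, z) - target
            for i, value in enumerate((x, y, z)):
                grads[i] += 2 * error * value / n
        # Nudge every coefficient a little in the direction that reduces error
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs

# Synthetic data generated from made-up "true" coefficients
rng = random.Random(0)
true_coeffs = (2.0, -1.0, 0.5)
samples = []
for _ in range(50):
    point = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
    samples.append((point, predict(true_coeffs, *point)))

learned = train(samples)
print(learned)  # three plain numbers; nothing in them says *why*
```

You can step through `predict` in a debugger all day. The only "reason" the coefficients have their values is the sum of many tiny nudges in `train`.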
An Artificial Neural Network isn’t exactly an algorithm. There are algorithms to “run” ANNs, but the ANN itself is really a big bundle of equations.
An ANN has an input layer of neurons and an output layer. Between them are one or more hidden layers. Each neuron in one layer is connected to each neuron in the next layer. Let’s do without hidden layers for a start. Let’s say we are interested in handwriting. We take a little grayscale image of a letter (say, 16*16 pixels) and want to determine if it shows an upper case “A”.
Your input layer would have 16*16= 256 neurons and your output layer just 1. Each input value is a single number representing how bright that pixel is. You take these 256 numbers, multiply each one by another number, representing the strength of the connection between each of the input neurons and the single output neuron. Then you add them up and that value represents the likelihood of the image showing an “A”.
I think that wouldn’t work well (or at all) without a hidden layer but IDK.
The numbers representing the strength of the connections are the parameters of the model, aka the weights. In this extremely simple case, they can be interpreted easily. If a parameter is large and positive, then that pixel being bright makes it more likely that we have an “A”. If it’s negative, then it’s less likely. Finding these numbers/parameters/weights is what training a model means.
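A rough sketch of that no-hidden-layer network, under some loud assumptions: the weights below are hand-made and hypothetical rather than trained, a diagonal stroke stands in for a real letter shape, and the sigmoid at the end is just one common way to squash the sum into a 0..1 "likelihood".

```python
import math

SIZE = 16  # 16*16 grayscale image, so 256 input neurons

def score(pixels, weights):
    """Multiply each pixel by its connection weight and add everything up."""
    assert len(pixels) == len(weights) == SIZE * SIZE
    return sum(p * w for p, w in zip(pixels, weights))

def likelihood(pixels, weights):
    """Squash the raw sum into 0..1 (the sigmoid is an assumption here)."""
    return 1.0 / (1.0 + math.exp(-score(pixels, weights)))

# Hypothetical, untrained weights: reward bright pixels on the diagonal,
# mildly penalize bright pixels everywhere else.
weights = [1.0 if r == c else -0.1 for r in range(SIZE) for c in range(SIZE)]

diagonal_image = [1.0 if r == c else 0.0 for r in range(SIZE) for c in range(SIZE)]
blank_image = [0.0] * (SIZE * SIZE)
full_image = [1.0] * (SIZE * SIZE)

print(likelihood(diagonal_image, weights))  # close to 1
print(likelihood(blank_image, weights))     # no evidence either way
print(likelihood(full_image, weights))      # close to 0
```

With one layer you can still read the weights like a template: positive where the pattern should be bright, negative where it should not.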
When you add a hidden layer, things get murky. You have an intermediate result and don’t know what it represents.
The impressive AI models take much more input, produce much more diverse output and have many hidden layers. The small ones, which you can run on a gaming PC, have several billion parameters. The big ones, like ChatGPT, have several hundred billion. Each of these numbers is potentially involved in creating the output.
I do exactly this kind of thing for my day job. In short: reading a syntactic description of an algorithm written in assembly language is not the equivalent of understanding what you’ve just read, which in turn is not the equivalent of having a concise and comprehensible logical representation of what you’ve just read, which in turn is not the equivalent of understanding the principles according to which the logical system thus described will behave when given various kinds of input.
Because of the aforementioned automatic feature extraction. In this case, the algorithm itself chooses which features are relevant when making decisions. The problem is that those features are almost impossible to decipher, since they are often just lists of numbers.
Can’t you determine how and why that choice is made?
What if you had a team of people whose only job was to understand this? After a while they would get better and better at it.
Here is a simple video that breaks down how neurons work in machine learning. It can give you an idea about how this works and why it would be so difficult for a human to reverse engineer. https://youtu.be/aircAruvnKk?si=RpX2ZVYeW6HV7dHv
They provide a simple example with fewer than a thousand neurons (about 13,000 weights and biases), and even then, we can’t easily tell what the network is doing, because the neurons do not produce any traditional computer code with logic that can be followed. They are just a collection of weights and biases (a bunch of numbers) which transform the input in whatever way training found would arrive at the solution. GPT-4 is rumored to have well over a trillion parameters, for comparison.
No. The training output is essentially a set of huge matrices, and using the model involves taking your input and those matrices and chaining a lot of matrix multiplications (how many and how big they are depends on the complexity of the model) to get your result. It is just simply not possible to understand that because none of the data has any sort of fixed responsibility or correspondence with specific features.
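Here is a small sketch of what "chaining matrix multiplications" looks like in code. The layer sizes and random weights are made up for illustration, and the ReLU between layers is an assumption (real models use some nonlinearity there, but not necessarily this one).

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def relu(vector):
    """Assumed nonlinearity: clamp negative values to zero."""
    return [max(0.0, v) for v in vector]

def forward(layers, inputs):
    """Run the input through every weight matrix in turn."""
    activations = inputs
    for i, matrix in enumerate(layers):
        activations = matvec(matrix, activations)
        if i < len(layers) - 1:  # no nonlinearity after the last layer
            activations = relu(activations)
    return activations

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
layer_sizes = [4, 8, 8, 2]  # hypothetical: 4 inputs, two hidden layers, 2 outputs
layers = [random_matrix(n_out, n_in, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

output = forward(layers, [0.5, -0.2, 0.1, 0.9])
param_count = sum(len(m) * len(m[0]) for m in layers)
print(output, param_count)
```

Notice there is no `if` that corresponds to a decision about the input. A debugger will show you every intermediate list of numbers, and none of them is labeled with what it means.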
This is probably not exactly how it works, I’m not an ML guy, just someone who watched some of those “training a model to play a computer game” videos years ago, but it should at the very least be a close enough analogy.
This is an oversimplification, but imagine you have an algebraic function where every word in English is assigned a number.
ax + by + cz = n, where x, y and z are the three words in a sentence, a, b and c are the coefficients, and n is the next predicted word based on the coefficients applied to the previous three.
Now imagine you have 10 trillion coefficients instead of 3. That’s an LLM, more or less. Except it’s done procedurally, and there are actually not that many input variables (the context window), just a lot of coefficients per input.
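Taking that analogy literally, a toy version might look like the sketch below. The five-word vocabulary, the word numbering and the three coefficients are all hypothetical and hand-picked so the example works. A real LLM does not compute a single number and round it, so treat this as the analogy in code and nothing more.

```python
# Hypothetical tiny vocabulary; each word's number is its position in the list
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {word: i for i, word in enumerate(vocab)}

def predict_next(words, coeffs):
    """Combine three word numbers with three coefficients, then map back to a word."""
    x, y, z = (word_to_id[w] for w in words)
    a, b, c = coeffs
    n = a * x + b * y + c * z
    # Clamp to the vocabulary so every result maps to some word
    index = min(max(round(n), 0), len(vocab) - 1)
    return vocab[index]

coeffs = (0.0, 1.0, 1.0)  # hand-picked for this example, not learned

print(predict_next(["the", "cat", "sat"], coeffs))
print(predict_next(["cat", "sat", "on"], coeffs))
```

Even here, the coefficients (0.0, 1.0, 1.0) explain nothing about English. They just happen to produce the right word for these inputs.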
It’s still inscrutable, but it makes more sense if you think of all these as arbitrary function approximation on higher dimension manifolds. The reason we can’t generate traditional numerical solvers for these problems is because the underlying analytical models fall apart when you over-parameterize them. Backprop is very robust at extreme parameter counts, and comes with much weaker assumptions compared to things like series decomposition, so it really just looks like a generic numerical method which can scale to absurd levels.