Testing different GPT models to check hallucinations

It is worth noting that the model most people use (gpt-3.5) is, I suspect, a cut-down gpt-4 rather than an older version of the model. I cannot prove this; it is only an intuition based on the results of this and other tests.

This test probes two things in a GPT model: its tendency to hallucinate and its ability to calculate. A simple way to do this is to ask the model to perform some arithmetic under an abstract rule (see the code above).

For example, let us use this prompt:

“What is the numerical value of 2 * ( 5 * 2 ) where ‘*’ is replaced by the exponential operator?”
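For reference, the value the prompt is asking for can be checked directly. This is only a minimal sketch: the variable names are my own, and Python writes the exponential operator as `**` where the model answers further down write `^`. It also computes the value you get if only the inner ‘*’ is replaced, which is a plausible way to misread the rule.

```python
# Intended reading: every '*' in 2 * ( 5 * 2 ) becomes the exponential operator.
intended = 2 ** (5 ** 2)

# Misreading: only the inner '*' is replaced, the outer one stays a product.
partial = 2 * (5 ** 2)

print(intended)  # 33554432
print(partial)   # 50
```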

As I said, we are checking two things here: whether the model understands the problem and applies the rule faithfully (this is where hallucinations come in), and whether it can produce an actual numerical value.

The results of the exercise are shown below. Only gpt-4 gives a complete and correct answer. gpt-3.5-turbo also understands the problem but does not carry out the calculation, and the dated snapshots (gpt-3.5-turbo-0301 and gpt-3.5-turbo-0613) hallucinate the answer: they replace only the inner ‘*’ and return 50.

It would also be interesting to see if gpt-4 uses something like Wolfram Alpha by default.

Model: gpt-4
Response: The expression given is 2 * ( 5 * 2 ), replacing ‘*’ with the exponential
operator would make it 2 ^ ( 5 ^ 2 ). So let’s simplify: First, calculate the
exponent in the parentheses: 5 ^ 2 = 25. Then use this result in the main
equation: 2 ^ 25 = 33554432. So, the numeric value of the expression 2 ^ ( 5 ^ 2
) is 33554432.

Model: gpt-3.5-turbo
Response: To replace “*” with the exponential operator, “^”, the expression 2 * (5 * 2)
would become 2 ^ (5 ^ 2).

Model: gpt-3.5-turbo-0301
Response: The exponential operator is ‘^’. So, 2 * ( 5 * 2 ) in exponential notation would
be: 2 * (5^2) which simplifies to: 2 * (25) and the final result is: 50

Model: gpt-3.5-turbo-0613
Response: The numeric value of 2 * ( 5 * 2 ) replacing ‘*’ by the exponential operator (^)
would be 50. The expression can be rewritten as 2 * (5^2), which simplifies to 2
* 25 = 50.
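The reading of these four responses could be automated with a rough check like the one below. This is a sketch under my own assumptions, not the code used for the test: the `classify` function, its three labels, and the heuristic (look at which integers appear in the reply) are all hypothetical, and the strings are short excerpts of the responses above rather than the full text.

```python
import re

EXPECTED = 2 ** (5 ** 2)   # value of 2 ^ (5 ^ 2)
PROMPT_NUMBERS = {2, 5}    # integers that already appear in the prompt

def classify(response: str) -> str:
    """Label a reply by the integers it contains (a crude heuristic)."""
    numbers = {int(n) for n in re.findall(r"\d+", response)}
    if EXPECTED in numbers:
        return "correct"
    if numbers <= PROMPT_NUMBERS:
        # Nothing beyond the prompt's own numbers: no value was computed.
        return "no calculation"
    return "wrong value"

# Excerpts of the responses listed above.
responses = {
    "gpt-4": "So, the numeric value of the expression 2 ^ ( 5 ^ 2 ) is 33554432.",
    "gpt-3.5-turbo": "the expression 2 * (5 * 2) would become 2 ^ (5 ^ 2).",
    "gpt-3.5-turbo-0301": "2 * (5^2) which simplifies to: 2 * (25) and the final result is: 50",
    "gpt-3.5-turbo-0613": "The expression can be rewritten as 2 * (5^2), which simplifies to 2 * 25 = 50.",
}

for model, text in responses.items():
    print(f"{model}: {classify(text)}")
```

A check this simple would be fooled by a reply that mentions the right number in passing, so it is only useful as a first filter before reading the answers.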


Luis G de la Fuente
