GPT-5 Disappoints

Aug 8, 2025

But the router can learn, and quality will rise as it does

15 Comments

Aug 9, 2025

This is brilliant work Maria! I’m not a computer scientist but I understood it all. I write about the impacts of gen AI on human-made media and the behaviour of the hi-techs. Would love to interview you at some stage.👏👏👏

Reply (1)

Maria Sukhareva

Aug 9, 2025

Thank you:) sure, drop me a message:)

Blake from WTF Over

Aug 9, 2025

Oh boy, can’t wait for my power bill to go up, yet again. I live in a county with a lot of data centers and power bills have already doubled.

Reply (1)

Maria Sukhareva

Aug 9, 2025

I am curious which country it is?

Reply (2)

Ken Kovar

Aug 9, 2025

Chatghanistan probably 🤨

Blake from WTF Over

Aug 10, 2025

https://www.12onyourside.com/2025/04/02/dominion-energy-proposes-increase-rates-starting-summer-2025/

Adam Kucharski

Aug 9, 2025

Thanks for sharing. I’ve also been wondering whether there’s a calibration issue with task management, i.e. the user-facing triage model is systematically overconfident about how easy a task is (and hence sends it to an underperforming tool/model).
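One way to make the overconfidence idea concrete is to compare how easy the triage model says tasks are with how often the fast path actually succeeds on them. The sketch below is only an illustration: the log format, the `calibration_gap` helper, and the numbers in `toy_log` are all hypothetical and say nothing about how the actual router works or is evaluated.

```python
from statistics import mean

def calibration_gap(records):
    """Mean predicted 'easy' probability minus the observed success rate.

    A clearly positive gap means the (hypothetical) triage model rates tasks
    as easier than the fast model's results justify, i.e. it is overconfident.
    """
    predicted = mean(p for p, _ in records)
    observed = mean(1.0 if ok else 0.0 for _, ok in records)
    return predicted - observed

# Made-up log entries, for illustration only:
# (triage confidence that the task is easy, did the fast model answer correctly?)
toy_log = [(0.9, True), (0.9, False), (0.8, True), (0.95, False), (0.85, True)]

gap = calibration_gap(toy_log)
print(f"calibration gap: {gap:.2f}")
```

On real logs one would bin by confidence rather than take a single average, but even this coarse gap would show whether the triage step is systematically too optimistic.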

Claus Wilke

Aug 9, 2025

Not sure what the correct answer for 7,11=3,555x is supposed to be. ChatGPT interprets 7,11 as 7.11 and correctly calculates 7.11/3,555 = 0.002. If you interpret 7,11 as 711 then the answer would be 0.2, but there is no way to solve the equation without making some assumption. Probably the best behavior would be to ask whether 7,11 was a typo.

Though I just tried it with ChatGPT (not sure which version, whichever is free) and it wrote "I’ll assume that commas are decimal separators (European style)" and then arrived at x=2. But I'll note that even in the UK you don't use commas as separators, so by using commas and writing the instruction in English you have created a mixed task that has no clear solution.
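The three readings discussed here can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the equation is 7,11 = 3,555·x; the `parse` and `solve` helpers are illustrative names, not anything ChatGPT does internally.

```python
from fractions import Fraction

def parse(text: str, comma: str) -> Fraction:
    """Parse a numeral, treating ',' as a decimal mark or a thousands separator."""
    if comma == "decimal":
        return Fraction(text.replace(",", "."))
    if comma == "thousands":
        return Fraction(text.replace(",", ""))
    raise ValueError(f"unknown comma convention: {comma}")

def solve(lhs: str, coeff: str, lhs_comma: str, coeff_comma: str) -> Fraction:
    """Solve lhs = coeff * x for x under the given comma conventions."""
    return parse(lhs, lhs_comma) / parse(coeff, coeff_comma)

readings = {
    "decimal on both sides": solve("7,11", "3,555", "decimal", "decimal"),
    "decimal left, thousands right": solve("7,11", "3,555", "decimal", "thousands"),
    "thousands on both sides": solve("7,11", "3,555", "thousands", "thousands"),
}

for label, x in readings.items():
    print(f"{label}: x = {float(x):g}")
# decimal on both sides: x = 2
# decimal left, thousands right: x = 0.002
# thousands on both sides: x = 0.2
```

Only the mixed reading, where the comma means something different on each side, gives 0.002.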

Reply (1)

Maria Sukhareva

Aug 9, 2025

In Germany, it’s used as a decimal. The thinking model does it alright. ChatGPT also knows I am in Germany, both from memory and my IP.

It’s interesting that you figured out how it ends up at that answer. It would never have occurred to me that it thinks the comma means different things on each side.

Reply (1)

Alistair Windsor

Aug 10, 2025 (Edited)

If you ask the question in German you get the expected answer. I asked

Bestimme x

7,11=3,555x

And got back x=2.

7,11 is not a well-formed number in the English-speaking world, but 3,555 is. It “fixed” 7,11.

Reply (1)

Richard Maunder

Aug 11, 2025

You might well be correct, but then there is an inconsistency: it is treating the comma as a decimal point for one number and as a thousands separator for the other within the same question. That again hints at a real lack of utility / 'intelligence'. If a math student did this, wouldn't you regard them as sloppy or badly taught?

Reply (1)

Alistair Windsor

Aug 12, 2025

I would not! In part because the first time I did the problem I got the GPT answer. I saw a . in place of the comma in 7,11 because that is what I expected there ;) However, I want my AI to at least be more attentive than me.

Richard Maunder

Aug 11, 2025

Any idea why there are only two distinct modes? Is this due to fundamental changes in the architecture, etc.? Obviously there is a mass of other floating-point parameters that can be, and are, modified on the model, so why is this a binary switch? That seems very suboptimal given the trade-offs you mention.

Ken Kovar

Aug 9, 2025

Good article! I think it helps to carefully read the developers' notes about what the real goals are for the release of their software. I think OpenAI is doing a pretty good job of developing this product and I hope people stop it with the hype about AGI being Real Soon Now 😁

Comment deleted

Aug 9, 2025

Comment deleted

Maria Sukhareva

Aug 9, 2025

GPT-5 feels even worse but GPT-5-Thinking intuitively feels much better, though it’s anecdotal so far.