15 Comments
Graham Lovelace

This is brilliant work, Maria! I’m not a computer scientist, but I understood it all. I write about the impacts of gen AI on human-made media and the behaviour of the big tech firms. I would love to interview you at some stage.

Maria Sukhareva

Thank you :) Sure, drop me a message :)

Blake from WTF Over

Oh boy, can’t wait for my power bill to go up yet again. I live in a county with a lot of data centers, and power bills have already doubled.

Adam Kucharski

Thanks for sharing. I've also been wondering whether there’s a calibration issue with task routing, i.e., the user-facing triage model is systematically overconfident about how easy a task is (and hence sends it to an underperforming tool or model).
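Adam's calibration question could in principle be measured. Assuming one had, for each request, the router's predicted probability that the fast model would succeed and whether it actually did (hypothetical quantities; nothing here says such logs are available), expected calibration error (ECE) and a simple overconfidence gap would quantify the effect he describes. A minimal sketch:

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Equal-width-bin ECE.

    confidences: hypothetical router probabilities that the fast path succeeds.
    outcomes:    1 if the fast path actually succeeded, else 0.
    Returns the sample-weighted mean |observed accuracy - mean confidence|.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # put c == 1.0 in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


def overconfidence(confidences, outcomes):
    """Mean confidence minus observed success rate; > 0 means overconfident."""
    return sum(confidences) / len(confidences) - sum(outcomes) / len(outcomes)
```

For example, if four requests were routed with confidence 0.9 and only two succeeded, both measures come out at about 0.4, i.e. the router is overconfident in the way Adam suspects.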

Claus Wilke

I’m not sure what the correct answer to 7,11 = 3,555x is supposed to be. ChatGPT interprets 7,11 as 7.11 and 3,555 as 3555, and correctly calculates 7.11/3555 = 0.002. If you interpret 7,11 as 711, the answer would be 0.2, but there is no way to solve the equation without making some assumption. Probably the best behavior would be to ask whether 7,11 was a typo.

That said, when I just tried it with ChatGPT (not sure which version; whichever is free), it wrote "I’ll assume that commas are decimal separators (European style)" and then arrived at x = 2. But I'll note that even in the UK you don't use commas as decimal separators, so by using commas and writing the instruction in English you have created a mixed task that has no clear solution.
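A short sketch makes the readings discussed above concrete. It solves 7,11 = 3,555x with exact decimal arithmetic under each comma convention. The helpers `parse` and `solve_for_x` are hypothetical, for illustration only.

```python
from decimal import Decimal


def parse(raw: str, comma_is_decimal: bool) -> Decimal:
    """Parse a number whose only separator is a comma.

    comma_is_decimal=True  -> German style, "7,11" means 7.11
    comma_is_decimal=False -> comma stripped, "3,555" means 3555
    (note "7,11" is not well-formed thousands grouping; stripping the
    comma gives 711, the reading mentioned in the comment)
    """
    if comma_is_decimal:
        return Decimal(raw.replace(",", "."))
    return Decimal(raw.replace(",", ""))


def solve_for_x(lhs: str, coeff: str, lhs_decimal: bool, coeff_decimal: bool) -> Decimal:
    """Solve lhs = coeff * x for x under the chosen comma conventions."""
    return parse(lhs, lhs_decimal) / parse(coeff, coeff_decimal)


readings = {
    "both decimal (German)": (True, True),            # 7.11 / 3.555
    "mixed (7.11 and 3555)": (True, False),           # 7.11 / 3555
    "both thousands (711 and 3555)": (False, False),  # 711 / 3555
}

for name, (lhs_dec, coeff_dec) in readings.items():
    x = solve_for_x("7,11", "3,555", lhs_dec, coeff_dec)
    print(f"{name}: x = {x.normalize()}")
```

This prints x = 2, x = 0.002 and x = 0.2 respectively: the German reading gives 2, while the 0.002 answer requires treating the comma differently in the two numbers.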

Maria Sukhareva

In Germany, the comma is used as the decimal separator. The thinking model handles it fine. ChatGPT also knows I am in Germany, both from memory and from my IP address.

It’s interesting that you figured out how it arrives at that answer. It would never have occurred to me that it treats the comma as meaning different things on each side of the equation.

Alistair Windsor

If you ask the question in German, you get the expected answer. I asked:

Bestimme x ("Determine x")

7,11=3,555x

And got back x=2.

7,11 is not a well-formed number in the English-speaking world, but 3,555 is. It “fixed” 7,11.
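The observation that 7,11 is malformed in English notation while 3,555 is fine can be checked mechanically: keep only the conventions under which every number in the question is well-formed. The regular expressions below are simplified, hypothetical rules (no negative numbers, and German digit grouping with periods is omitted), not any model's actual parsing logic.

```python
import re

# English style: comma groups thousands in blocks of 3, optional ".fraction".
THOUSANDS_COMMA = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?")
# German style (grouping omitted for brevity): optional ",fraction".
DECIMAL_COMMA = re.compile(r"\d+(?:,\d+)?")


def numbers_in(text: str) -> list[str]:
    """Extract numeric tokens made of digits, commas and periods."""
    return re.findall(r"\d[\d,.]*\d|\d", text)


def consistent_conventions(tokens: list[str]) -> list[str]:
    """Return every convention under which *all* tokens are well-formed."""
    conventions = []
    if all(THOUSANDS_COMMA.fullmatch(t) for t in tokens):
        conventions.append("thousands_comma")
    if all(DECIMAL_COMMA.fullmatch(t) for t in tokens):
        conventions.append("decimal_comma")
    return conventions


tokens = numbers_in("7,11=3,555x")
print(tokens, consistent_conventions(tokens))
```

Only the decimal-comma reading makes both numbers well-formed, which gives x = 2; applying one convention to the whole question cannot produce 0.002. When more than one convention survives (a lone "3,555" passes both checks), asking the user, as suggested earlier in the thread, is the safer fallback.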

Richard Maunder

You might well be correct, but then there is an inconsistency: it is treating the comma as a decimal point in one number and as a thousands separator in the other, within the same question. That again hints at a real lack of utility / 'intelligence'. If a math student did this, wouldn't you regard them as sloppy or badly taught?

Alistair Windsor

I would not! Partly because the first time I did the problem, I got the GPT answer: I saw a period in place of the comma in 7,11 because that is what I expected there ;) However, I want my AI to be at least more attentive than me.

Richard Maunder

Any idea why there are only two distinct modes? Is this due to fundamental differences in the architecture, or something else? Obviously there is a mass of other floating-point parameters that can be (and are) tuned on the model, so why is this a binary switch? That seems very suboptimal given the trade-offs you mention.

Ken Kovar

Good article! I think it helps to carefully read the developers' notes about the real goals of their software release. I think OpenAI is doing a pretty good job of developing this product, and I hope people stop it with the hype about AGI being Real Soon Now 😁

Comment deleted
Aug 9, 2025
Maria Sukhareva

GPT-5 feels even worse, but GPT-5-Thinking intuitively feels much better, though that’s anecdotal so far.