20 Comments
Graham Lovelace's avatar

This is brilliant work, Maria! I’m not a computer scientist but I understood it all. I write about the impacts of gen AI on human-made media and the behaviour of the hi-techs. Would love to interview you at some stage.

Maria Sukhareva's avatar

Thank you:) sure, drop me a message:)

Blake from WTF Over's avatar

Oh boy, can’t wait for my power bill to go up, yet again. I live in a county with a lot of data centers and power bills have already doubled.

Maria Sukhareva's avatar

I am curious which country it is?

Ken Kovar's avatar

Chatghanistan probably 🤨

Joe Doe's avatar

The country where your mother was whoring herself during her youth.

Adam Kucharski's avatar

Thanks for sharing. I've also been wondering if there’s a calibration issue with task routing, i.e. the user-facing triage model is systematically overconfident about how easy a task is (and hence sends it to an underperforming tool/model).

Claus Wilke's avatar

Not sure what the correct answer for 7,11=3,555x is supposed to be. ChatGPT interprets 7,11 as 7.11 and correctly calculates 7.11/3,555 = 0.002. If you interpret 7,11 as 711 then the answer would be 0.2, but there is no way to solve the equation without making some assumption. Probably the best behavior would be to ask whether 7,11 was a typo.

Though I just tried it with ChatGPT (not sure which version, whichever is free) and it wrote "I’ll assume that commas are decimal separators (European style)" and then arrived at x=2. But I'll note that even in the UK you don't use commas as decimal separators, so by using commas and writing the instruction in English you have created a mixed task that has no clear solution.
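For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three readings discussed above. The helper names `parse` and `solve` are made up for illustration, and the sketch only demonstrates the ambiguity; it says nothing about how ChatGPT actually parses the input.

```python
from fractions import Fraction


def parse(text: str, comma: str) -> Fraction:
    """Parse a number string, reading ',' as a 'decimal' or 'thousands' separator."""
    if comma == "decimal":
        return Fraction(text.replace(",", "."))
    if comma == "thousands":
        return Fraction(text.replace(",", ""))
    raise ValueError(f"unknown comma convention: {comma}")


def solve(lhs: str, coeff: str, lhs_comma: str, coeff_comma: str) -> Fraction:
    """Solve lhs = coeff * x for x under the given comma conventions."""
    return parse(lhs, lhs_comma) / parse(coeff, coeff_comma)


# Comma is a decimal separator on both sides (German convention).
print(float(solve("7,11", "3,555", "decimal", "decimal")))      # 2.0
# Mixed reading: 7,11 "fixed" to 7.11, while 3,555 is read as 3555.
print(float(solve("7,11", "3,555", "decimal", "thousands")))    # 0.002
# Comma is dropped on both sides: 711 and 3555.
print(float(solve("7,11", "3,555", "thousands", "thousands")))  # 0.2
```

Only the first reading applies one convention consistently and yields a well-formed number on both sides, which is why x=2 is the answer a German reader would expect.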

Maria Sukhareva's avatar

In Germany, the comma is used as a decimal separator. The thinking model handles it all right. ChatGPT also knows I am in Germany, both from memory and my IP.

Interesting that you figured out how it ends up at that answer. It would never have occurred to me that it thinks the comma means different things on each side.

Alistair Windsor's avatar

If you ask the question in German you get the expected answer. I asked

Bestimme x

7,11=3,555x

And got back x=2.

7,11 is not a well-formed number in the English-speaking world but 3,555 is. It “fixed” 7,11.

Richard Maunder's avatar

You might well be correct, but then there is an inconsistency: it is treating the comma as a decimal point for one number and as a thousands separator for the other within the same question. That again hints at a real lack of utility / 'intelligence'. If a math student did this, wouldn't you regard them as sloppy or badly taught?

Alistair Windsor's avatar

I would not! In part because the first time I did the problem I got the GPT answer. I saw a . in place of the comma in 7,11 because that is what I expected there ;) However, I want my AI to at least be more attentive than me.

Joe Doe's avatar

Because you don't have the ability to really think.

Richard Maunder's avatar

Any idea why there are only two distinct modes? Is this due to fundamental changes in the architecture? Obviously there are a mass of other floating-point parameters that can be, and are, modified on the model, so why is this a binary switch? That seems very suboptimal given the trade-offs you mention.

Ken Kovar's avatar

Good article! I think it helps to carefully read the developers' notes about what the real goals are for the release of their software. I think OpenAI is doing a pretty good job of developing this product and I hope people stop it with the hype about AGI being Real Soon Now 😁

Ignacy Skorupka's avatar

I still don't see a big difference between ChatGPT 4.1 and 5.

Maria Sukhareva's avatar

GPT-5 feels even worse but GPT-5-Thinking intuitively feels much better, though it’s anecdotal so far.

Joe Doe's avatar

As stupid as chat gpt is, it's still much smarter than you.

Ignacy Skorupka's avatar

True, that's quite crazy. I use ChatGPT for Formula 1 content creation, and I ran some tests about specific races from the 50s and 60s. It fails more than ChatGPT 4.5, and honestly Gemini has been far better at this. I just had big hopes for ChatGPT 5.
