This is brilliant work, Maria! I'm not a computer scientist but I understood it all. I write about the impacts of gen AI on human-made media and the behaviour of the hi-techs. Would love to interview you at some stage.
Thank you:) sure, drop me a message:)
Oh boy, can't wait for my power bill to go up, yet again. I live in a county with a lot of data centers and power bills have already doubled.
I am curious which country it is?
Chatghanistan probably
The country where your mother was whoring herself during her youth.
https://www.12onyourside.com/2025/04/02/dominion-energy-proposes-increase-rates-starting-summer-2025/
Thanks for sharing. I've also been wondering if there's a calibration issue with task management, i.e. the user-facing triage model is systematically overconfident about how easy a task is (and hence sends it to an underperforming tool/model).
Not sure what the correct answer for 7,11=3,555x is supposed to be. ChatGPT interprets 7,11 as 7.11 and correctly calculates 7.11/3,555 = 0.002. If you interpret 7,11 as 711 then the answer would be 0.2, but there is no way to solve the equation without making some assumption. Probably the best behavior would be to ask whether 7,11 was a typo.
Though I just tried it with ChatGPT (not sure which version, whichever is free) and it wrote "I'll assume that commas are decimal separators (European style)" and then arrived at x=2. But I'll note that even in the UK you don't use commas as decimal separators, so by using commas and writing the instruction in English you have created a mixed task that has no clear solution.
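For anyone who wants to check the arithmetic in the comments above, here is a minimal sketch of how each reading of the comma changes the answer. The helper names (`parse`, `solve`) are made up for illustration and say nothing about how ChatGPT actually parses numbers.

```python
from fractions import Fraction


def parse(text: str, comma: str) -> Fraction:
    """Parse a number, reading ',' as a 'decimal' or a 'thousands' separator.

    Hypothetical helper for illustration only.
    """
    if comma == "decimal":
        return Fraction(text.replace(",", "."))
    if comma == "thousands":
        return Fraction(text.replace(",", ""))
    raise ValueError(f"unknown comma reading: {comma}")


def solve(lhs: str, coeff: str, lhs_comma: str, coeff_comma: str) -> Fraction:
    """Solve lhs = coeff * x for x under the chosen comma readings."""
    return parse(lhs, lhs_comma) / parse(coeff, coeff_comma)


# Every combination of readings for 7,11 = 3,555x
for lhs_comma in ("decimal", "thousands"):
    for coeff_comma in ("decimal", "thousands"):
        x = solve("7,11", "3,555", lhs_comma, coeff_comma)
        print(f"7,11 as {lhs_comma}, 3,555 as {coeff_comma}: x = {float(x)}")
```

Both commas as decimals gives x = 2, both as thousands separators gives x = 0.2, and the mixed reading (7.11 divided by 3555) gives x = 0.002.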
In Germany, it's used as a decimal. The thinking model does it alright. ChatGPT also knows I am in Germany, both from memory and my IP.
Interesting that you figured out how it ends up at that answer. It would never have occurred to me that it thinks the comma means different things on each side.
If you ask the question in German you get the expected answer. I asked
Bestimme x
7,11=3,555x
And got back x=2.
7,11 is not a well-formed number in the English-speaking world but 3,555 is. It "fixed" 7,11.
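A rough sketch of that "well-formed" point, assuming English-style grouping means one to three leading digits followed by comma-separated groups of exactly three. The patterns and the `plausible_readings` helper are my own illustration, not anything ChatGPT is known to do.

```python
import re

# Assumed rule for English-style thousands grouping: 1-3 digits, then ",ddd" groups.
THOUSANDS = re.compile(r"\d{1,3}(,\d{3})+")
# Assumed rule for a European-style decimal: digits, one comma, digits.
DECIMAL = re.compile(r"\d+,\d+")


def plausible_readings(text: str) -> list[str]:
    """Return which comma readings the string is well-formed under."""
    readings = []
    if DECIMAL.fullmatch(text):
        readings.append("decimal")
    if THOUSANDS.fullmatch(text):
        readings.append("thousands")
    return readings


print(plausible_readings("7,11"))   # only the decimal reading is well-formed
print(plausible_readings("3,555"))  # both readings are well-formed
```

Under these rules 7,11 can only be a decimal, while 3,555 is ambiguous, which would explain a model "fixing" one side and leaving the other alone.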
You might well be correct, but then there is an inconsistency: it is treating "," as a decimal point for one number and as a thousands separator for the other within the same question, which again hints at a real lack of utility / 'intelligence'. If a math student did this, wouldn't you regard them as sloppy or badly taught?
I would not! In part because the first time I did the problem I got the GPT answer. I saw a . in place of the comma in 7,11 because that is what I expected there ;) However, I want my AI to at least be more attentive than me.
Because you don't have the ability to really think.
Any idea why there are only two distinct modes? Is this due to fundamental changes in the architecture etc.? Obviously there are a mass of other floating-point parameters that can be (and are) modified on the model, so why is this a binary switch? That seems very suboptimal given the trade-offs you mention.
Good article! I think it helps to carefully read the developers' notes about what the real goals are for the release of their software. I think OpenAI is doing a pretty good job of developing this product and I hope people stop it with the hype about AGI being Real Soon Now.
I still don't see a big difference between ChatGPT 4.1 and 5.
GPT-5 feels even worse but GPT-5-Thinking intuitively feels much better, though it's anecdotal so far.
As stupid as chat gpt is, it's still much smarter than you.
True, that's quite crazy. I use ChatGPT for Formula 1 content creation, and I ran some tests on specific races from the 50s and 60s. It fails more than ChatGPT 4.5, and honestly Gemini has been far better at this. I just had big hopes for ChatGPT 5.