5 Comments
Daniel Graetzer:

Maybe Morse code is the one true language of the future.

Joseph de Castelnau:

Maria, I think you are missing the point entirely. With reinforcement learning, our precious LLMs are starting to figure out that concise, precise answers are key. So em dashes go up (as they are a stylistic way to provide clarity). Be careful with endogeneity bias.

Maria Sukhareva:

LLMs don’t start “figuring anything out”. They don’t have cognition or common sense; what they do have is a loss function that sums up the probabilities during generation. The smaller the loss, the better. Em dashes make generation use fewer tokens, which means a better score. So now they put em dashes everywhere, where they belong and where they don’t. And the models by no means aim to provide short, concise answers; the so-called reasoning has made them enormous ramblers, but one of the training objectives is still to optimise token use.
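
Whether em dashes really cost fewer tokens than the alternatives is easy to check directly. Below is a minimal sketch of such a check, assuming Python with OpenAI’s tiktoken library and the cl100k_base encoding (the thread names no specific model or tokenizer); it prints the token count for the same clause joined by different punctuation, including the spaced and unspaced em dash variants raised further down the thread.

```python
# Compare how one tokenizer splits the same clause
# when it is joined by different punctuation.
# cl100k_base is an assumption; counts vary by encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

variants = {
    "em dash, no spaces": "clarity\u2014that is the goal",
    "em dash, spaced": "clarity \u2014 that is the goal",
    "comma": "clarity, that is the goal",
    "semicolon": "clarity; that is the goal",
}

for label, text in variants.items():
    token_ids = enc.encode(text)
    print(f"{label:20s} {len(token_ids):2d} tokens")
```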

Joseph de Castelnau:

I disagree. With a reinforcement learning loss function, we are heading towards clarity. Though you are correct that their primary token-based charging model will continue to have them rambling on to charge customers more.

Yong Zheng-Xin (Yong):

thanks for writing this piece. don’t the em dashes used by models lack connecting spaces? i don’t think the example you gave in the ‘cheapness’ argument uses the same em dash.
