A couple of weeks ago I made an assumption: the rise of em‑dashes in AI‑generated text happened because model providers started scanning older, pre‑Kindle books.
Yes, there is a variety of possible paraphrases depending on the context. You can see the data on GitHub. That’s also a reason why the model prefers them.
An em dash isn't just an alternative to ", and" though.
You can see all the paraphrases from the study here https://github.com/ktoetotam/mystery-em-dash/blob/main/analysis_results_substack/paraphrase_results_20250704_022337.json
I was going to look into this but you absolutely nailed it. Did we just become best friends? Unbelievable work.
https://open.substack.com/pub/ghostlotuz/p/understanding-em-dashes-vs-semicolons?r=nqkfv&utm_medium=ios
I wrote about why it is correct, and no one even knows basic grammar.