Custom GPTs: moderate value for users but a goldmine for OpenAI
Why OpenAI still runs the service
Custom GPTs are a questionable service to run. The clicky-clicky RAG bots have a lot of disadvantages:
They hallucinate like crazy. A paper by Yang et al. shows that only 63% of answers from such bots are accurate.
They have serious security issues. According to Ogundoyin et al., 95% of Custom GPTs lack adequate security protections: they leak system prompts, fall for phishing, etc.
Adoption seems quite limited. This blog shows that Custom GPTs account for only 2.7% of worldwide ChatGPT traffic.
Microsoft at some point decided that DIY RAG bots weren't even worth investing in and killed its own Custom GPT builder.
So why does OpenAI still keep this service running?
The likely answer: data collection. Custom GPTs provide access to two unique types of data that are hard to acquire elsewhere:
Niche and domain-specific documents uploaded into the vector store
Domain-specific user intents
Niche and domain-specific data collection
Ilya Sutskever made waves saying:
We have to deal with the data that we have. There's only one internet.
For proprietary LLM providers, access to proprietary data is a massive competitive advantage. Custom GPTs allow users to upload up to 20 files, each with a maximum size of 2 million tokens. Since launch, over 3 million Custom GPTs have been created. While not all include extensive knowledge bases, the volume of niche data OpenAI has likely absorbed is substantial.
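For a sense of scale, here's a naive back-of-the-envelope upper bound using the numbers above. It is only a ceiling, since most GPTs upload few or no files:

```python
# Naive upper bound on tokens uploaded to Custom GPT knowledge bases,
# using the limits quoted above. The real volume is far lower: most
# GPTs upload few or no files, and files rarely hit the token ceiling.
custom_gpts = 3_000_000          # Custom GPTs created since launch
max_files_per_gpt = 20           # upload limit per GPT
max_tokens_per_file = 2_000_000  # per-file token ceiling

upper_bound = custom_gpts * max_files_per_gpt * max_tokens_per_file
print(f"{upper_bound:.1e} tokens")  # 1.2e+14 -- a theoretical ceiling only
```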
People are uploading all kinds of materials that are nearly impossible to scrape from the public web. Think:
Internal training manuals from banks, hospitals, or manufacturing companies
Patent drafts, technical white papers, or confidential R&D reports
Private customer support logs with real-world issue descriptions and responses
Corporate strategy documents, market analyses, or M&A decks
Legal memos, case summaries, and regulatory compliance checklists
Unpublished academic work, thesis drafts, and peer review feedback
And much more. Many of these files are high-quality, up-to-date, and come directly from knowledge workers who explain how the content should be used. That’s the goldmine: not just the content itself, but the user annotations—system prompts, task descriptions, tool selections, and expected outcomes. It's labeled intent, attached to real-world domain-specific data. Which brings us to the next point:
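To make that concrete, here is a sketch of what one such labeled record could look like. The field names and values are invented for illustration, not OpenAI's actual schema:

```python
# Hypothetical shape of a single Custom GPT configuration record.
# Field names and contents are illustrative assumptions, not OpenAI's schema.
custom_gpt_record = {
    "system_prompt": (
        "You are a compliance assistant for a retail bank. "
        "Answer only from the attached AML policy manual."
    ),
    "uploaded_files": ["aml_policy_manual_2024.pdf", "kyc_checklist.docx"],
    "enabled_tools": ["file_search"],   # tool selection is itself a signal of intent
    "conversation_starters": [          # expected tasks, written by the creator
        "Summarize the enhanced due diligence requirements",
        "What documents do we need for a corporate account?",
    ],
}
# Each field is, in effect, a human-written label describing how
# the attached domain-specific data is meant to be used.
```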
Domain-specific intent collection
The secret behind OpenAI’s success isn’t just having a lot of data or inventing a novel algorithm. The real breakthrough was redefining the goal of language modeling—from generating the most likely next word to generating the most useful one (Ouyang et al.). That shift required a fundamental change in training data: instead of learning from passively consumed internet text, the model was trained on instructions—task-specific, goal-directed prompts with human-annotated completions.
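In training-data terms, that shift looks roughly like the move sketched below. Both examples are invented for illustration, and the format is a simplification of the instruction-tuning setup described in Ouyang et al.:

```python
# Pretraining data: passively consumed text, optimized for next-token prediction.
pretraining_example = "The insurance policy covers water damage caused by..."

# Instruction data: an explicit task plus a human-approved completion.
# Both examples here are invented for illustration.
instruction_example = {
    "prompt": "Summarize the water-damage exclusions in the attached policy "
              "for a customer with no legal background.",
    "completion": "Your policy does not cover damage from gradual leaks or "
                  "external flooding; sudden pipe bursts are covered. ...",
}
# The model is then optimized toward completions humans prefer,
# not merely toward the statistically likeliest continuation.
```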
The hardest part of working with instruction data isn’t writing or labeling it but ensuring the dataset is representative. That means the instruction set must reflect the diversity of tasks real users actually care about, across industries, professions, and cultural contexts.
If your dataset is dominated by a narrow slice (e.g. coding questions, creative writing tasks, or academic math problems), the model will learn to excel in those areas and fail to generalize.
This issue of representativeness was one of the biggest hurdles in training the original LLaMA model—and remains a bottleneck for every model trying to serve a general-purpose audience.
OpenAI collected one of the most representative instruction datasets in history through its website. However, these are largely general-purpose requests. What is still missing is a representative instruction dataset for domain-specific and niche tasks. Users are well aware that there’s little point in chatting with ChatGPT about their insurance contract if the model hasn’t seen the contract. But if Custom GPTs allow users to upload such documents, OpenAI can collect the user’s request, the relevant data needed to process that request, and the model’s output—all in one interaction. The same applies to other domain-specific tasks that can only be addressed when both the data and the context are provided.
When users upload their internal knowledge bases, they often accompany them with detailed system prompts—describing what the model should do with the data, how it should be processed, and what kind of output is expected. This combination of domain data and user intent provides an ideal foundation for training models to better align with user needs in specialized contexts.
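Put together, a single logged interaction could bundle everything needed for domain-specific alignment. Again, the record below is a hypothetical sketch, not a documented pipeline:

```python
# Hypothetical per-interaction log from a Custom GPT session.
# Everything here is an illustrative assumption about what such a
# record *could* contain, not a documented OpenAI pipeline.
interaction_log = {
    "gpt_system_prompt": "Explain clauses of the attached insurance contract "
                         "in plain language.",
    "user_request": "Am I covered if my basement floods?",
    "retrieved_context": [   # chunks pulled from the uploaded contract
        "Section 4.2: Water damage arising from external flooding is excluded "
        "unless the optional flood rider has been purchased...",
    ],
    "model_output": "Standard coverage excludes external flooding, but your "
                    "contract includes the flood rider, so...",
}
# Request + relevant data + output in one place: exactly the
# (instruction, context, completion) triple that instruction tuning needs.
```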
So what is happening next?
The adoption of Custom GPTs, while substantial, hasn’t matched the hype—likely because most users don’t want to spend time configuring bots, engineering system prompts, or selecting tools only to end up with something that has 63% accuracy. What people actually want is for the base model to “just work”: upload a document, and the model should already understand what to do with it.
This is where Custom GPTs become interesting. Their system prompts and interaction patterns could offer OpenAI insight into domain-specific behavior and user intent. While there’s no official confirmation that this data feeds directly into model training, it's a plausible direction.
Notably, OpenAI is now integrating connectors to enterprise platforms like SharePoint and OneDrive, and adopting the Model Context Protocol to pull from structured sources like GitHub, Slack, Salesforce, etc. But to handle such data meaningfully, the model needs a contextual understanding of what matters in a given domain. Whether or not Custom GPTs directly inform training, they've likely been a valuable testing ground for exactly that kind of contextualization, and potentially a training goldmine.
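For a sense of what those structured connectors look like in practice, here is a minimal sketch of an MCP server exposing a document source, using the open-source MCP Python SDK. The tool name and its contents are invented for illustration:

```python
# Minimal MCP server sketch using the official MCP Python SDK
# (pip install mcp). The tool and its contents are invented examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-connector")

@mcp.tool()
def search_policies(query: str) -> str:
    """Return policy passages matching the query (stubbed for illustration)."""
    # A real connector would query SharePoint, Slack, Salesforce, etc.
    fake_index = {
        "flood": "Section 4.2: external flooding is excluded unless...",
    }
    return "\n".join(text for key, text in fake_index.items()
                     if key in query.lower())

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for an MCP-capable client
```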