Why does ChatGPT use “Delve” so much? Mystery Solved.
ChatGPT overuses the word “Delve”. It has to do with the way ChatGPT was built.
The mystery of why ChatGPT so frequently uses the word “delve” (one of the 10 most common words in its outputs) is finally solved, and the answer is (quite literally) far from what you might expect.
While we may not regularly use words like “Delve” or “Tapestry” in our everyday conversations (I’m not even sure what “Tapestry” means), ChatGPT seems to favor them. You may have already noticed that it tends to overuse these words in its outputs.
All of a sudden, the word “delve” is skyrocketing in medical papers, as you can see in the chart above. The chart was published in March 2024, so 2023 is the only full year in which ChatGPT was available.
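As a rough illustration of how a chart like that gets produced, here is a toy Python sketch that counts how often “delve” appears per year in a small, made-up set of abstracts. The real analysis would run over a large corpus of paper abstracts; the data below is purely illustrative.

```python
from collections import Counter
import re

# Toy corpus of (year, abstract) pairs. A real analysis would use
# millions of abstracts; these four lines are made up for illustration.
abstracts = [
    (2021, "We examine the effects of the drug on patients."),
    (2022, "This study investigates long-term outcomes."),
    (2023, "In this paper we delve into the mechanisms of disease."),
    (2023, "We delve deeper into the clinical implications."),
]

# Count, per year, how many abstracts contain the word "delve".
counts = Counter(
    year
    for year, text in abstracts
    if re.search(r"\bdelve\b", text, flags=re.IGNORECASE)
)
print(dict(counts))  # e.g. {2023: 2}, usage jumps in the ChatGPT era
```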
“Delve”, along with many other phrases (“as an AI language model…”), has become part of ChatGPT’s pop culture, almost a telltale cue that a text was written by ChatGPT and not by a human.
But here is the question that has baffled me personally, along with ML experts: If ChatGPT is trained on human data, how did it all of a sudden get the idea of using “Delve” so much? Is this an emergent behavior? And why “Delve” specifically?
The Guardian article titled “How cheap, outsourced labour in Africa is shaping AI English” probably holds the answer to this question. The key to the mystery lies in Africa and in the way ChatGPT was built.
Why “Delve” so Much?
Let’s come back to the question: how did ChatGPT suddenly get the idea of using “delve” so much? If ChatGPT overuses “delve” compared to our daily language, and possibly compared to the internet data it was trained on, then its language must have been altered AFTER it was trained on the internet scrape.
After an LLM undergoes days and weeks of training on a mammoth corpus of data, measures have to be taken to make sure the AI doesn’t go off the rails. It has to be “aligned”. To accomplish this, an additional fine-tuning step, Reinforcement Learning from Human Feedback (RLHF), is used. In this step, human annotators evaluate the outputs of the language model, and their evaluations are then used to fine-tune it.
Here’s a short summary of the process (a toy code sketch follows the list):
A base Large Language Model (LLM) is trained on a data corpus. A Reward Model (RM) is also trained to learn what humans consider “good” and “aligned”.
The LLM generates multiple outputs, from which humans select the preferred one.
The RM learns from these human choices, converting preferences into rewards or scores.
The LLM gets this reward signal and modifies its behavior to obtain more rewards.
This process is repeated: the LLM generates outputs, humans provide feedback, and the models improve in an iterative loop.
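To make that loop concrete, here is a toy Python sketch of the feedback cycle. Everything here is illustrative: the generator, the simulated annotator, and the reward model are stand-ins for what are, in reality, large neural networks and real human raters, not anything resembling OpenAI’s actual training code.

```python
import random

def llm_generate(prompt: str, n: int = 4) -> list[str]:
    """Stand-in for the base LLM sampling several candidate completions."""
    return [f"{prompt} -> completion #{i}" for i in range(n)]

def human_preference(candidates: list[str]) -> int:
    """Stand-in for a human annotator picking the preferred completion."""
    return random.randrange(len(candidates))

class RewardModel:
    """Toy reward model that learns to score outputs the way annotators do."""
    def __init__(self):
        self.preferred: set[str] = set()

    def learn(self, chosen: str) -> None:
        # A real RM fits a ranking loss over preference pairs;
        # here we simply remember what was chosen.
        self.preferred.add(chosen)

    def score(self, output: str) -> float:
        return 1.0 if output in self.preferred else 0.0

reward_model = RewardModel()

for step in range(3):                       # repeated many times in practice
    candidates = llm_generate("Explain X")  # 1. LLM generates multiple outputs
    best = human_preference(candidates)     # 2. a human picks the preferred one
    reward_model.learn(candidates[best])    # 3. RM turns preferences into scores
    rewards = [reward_model.score(c) for c in candidates]
    # 4. the LLM would now be fine-tuned (e.g. with PPO) to chase higher rewards
    print(step, rewards)
```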
This is the general setup of the extra RLHF step. It’s essentially what keeps ChatGPT cautious and within bounds. However, doing this effectively requires human evaluation. This human evaluation isn’t necessarily just selecting the best output: it can be giving a thumbs up/down to an output, or crafting an ideal response that the LLM is expected to produce.
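To make those feedback formats concrete, here is a rough sketch of what the three kinds of records might look like in an annotation dataset. The field names and example texts are my own guesses for illustration, not any vendor’s actual schema.

```python
# Three common human-feedback formats, expressed as simple records.

pairwise_preference = {
    "prompt": "Summarize this paper.",
    "chosen": "The paper examines the effect of ...",
    "rejected": "As an AI language model, I ...",
}

thumbs_rating = {
    "prompt": "Summarize this paper.",
    "output": "The paper examines the effect of ...",
    "rating": +1,  # +1 = thumbs up, -1 = thumbs down
}

demonstration = {
    "prompt": "Summarize this paper.",
    "ideal_response": "A concise, human-written summary the model should imitate.",
}
```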
The sum total of all the feedback is a drop in the ocean compared to the scraped text used to train the LLM. But it’s expensive. Hundreds of thousands of hours of work goes into providing enough feedback to turn an LLM into a useful chatbot, and that means the large AI companies outsource the work to parts of the global south, where anglophonic knowledge workers are cheap to hire.
As the Guardian points out, this work, like many other annotation tasks in machine learning, is taken up by an army of cheap freelancers in Africa.
In Nigeria, “delve” is much more frequently used in business English than it is in England or the US. So the workers training their systems provided examples of input and output that used the same language, eventually ending up with an AI system that writes slightly like an African.
This is a case of poor sampling (selection bias), where the evaluators differ somewhat from the target users, introducing a slight bias in writing style. If the target users of an AI assistant are English speakers worldwide, a minimal measure to ensure better sampling is to recruit annotators from a range of nationalities and writing styles.
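One simple mitigation, for example, is to stratify the annotator pool so that no single region dominates the feedback. Here is a minimal sketch of that idea; the regions, pool sizes, and quotas below are entirely made up.

```python
from collections import Counter
import random

# Hypothetical annotator pool, each entry tagged with a region.
annotators = (
    [{"id": i, "region": "West Africa"} for i in range(600)]
    + [{"id": i, "region": "North America"} for i in range(600, 800)]
    + [{"id": i, "region": "South Asia"} for i in range(800, 1000)]
)

def stratified_sample(pool, per_region=50, seed=0):
    """Pick the same number of annotators from every region."""
    rng = random.Random(seed)
    by_region = {}
    for annotator in pool:
        by_region.setdefault(annotator["region"], []).append(annotator)
    sample = []
    for members in by_region.values():
        sample.extend(rng.sample(members, min(per_region, len(members))))
    return sample

sample = stratified_sample(annotators)
print(Counter(a["region"] for a in sample))  # balanced counts per region
```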
Whether or not this is the cause of the slight biases in ChatGPT’s writing style, the root cause most likely lies in the RLHF step and not in the initial training. ChatGPT’s writing style is already robotic and easy to detect, with or without “delve”. But there is a lesson to be learned here: by knowing what can go wrong at each step along the way, we can take measures not to repeat these errors in our own research and development.
Making ChatGPT More Human-Like?
So what is the way around this? The straightforward method to make ChatGPT sound more human and avoid clichés like “delve” is prompt engineering. There are multiple approaches to this (a code sketch follows the list):
Directly instruct ChatGPT to avoid using “delve” or other specific words.
Instruct ChatGPT to act as a [SPECIFIC ROLE].
Use few-shot prompting and hand it pieces of text whose style you want it to imitate, like your own blog posts.
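Here is a hedged sketch combining all three approaches with the official openai Python client. The model name, the role, the banned-word list, and the style sample are all placeholders you would swap for your own, and you need an API key set in the OPENAI_API_KEY environment variable.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. direct instruction + 2. a specific role + 3. a few-shot style sample
messages = [
    {
        "role": "system",
        "content": (
            "You are a tech blogger with a casual, direct voice. "
            "Avoid the words 'delve', 'tapestry', and 'landscape'."
        ),
    },
    {
        "role": "user",
        "content": (
            "Here is a sample of my writing style:\n"
            "'RLHF is the step that keeps chatbots on the rails. "
            "Simple idea, huge impact.'"
        ),
    },
    {"role": "user", "content": "Now explain RLHF in two sentences, in my style."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```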
The obvious downside of these methods is that they take too much time! I don’t want to prompt ChatGPT for 5 minutes and THEN ask it something. I’m looking for a quick method that integrates well with ChatGPT and is reliable, something like a Chrome extension.
So if you have solved this issue and know a reliable tool, share it in the comments. I’m sure this is a widespread problem.
🌟 If you want to join +1000 people learning about Python, ML/MLOps/AI, Data Science, and LLMs, follow me and check out my X/Twitter where I keep you updated daily: https://twitter.com/itsHesamSheikh
Thanks for reading,
— Hesam