GPT-3 + Data = Nonsense

Can we extend GPT’s impressive language skills with accurate data and real insights from multiple live sources?
Amir Zohrenejad

Everyone is talking about ChatGPT and how it will change the way people work, and the real estate industry is no exception. To recap: ChatGPT is a tool created by a company called OpenAI that can converse in English to generate text or answer a wide variety of questions. However, when it comes to creating data-heavy content, ChatGPT (and GPT-3, the model family on which it is built) tends not to work very well. Let me explain. In its own words, GPT-3 “is trained to understand and generate human language. It is not specifically designed to work with data in the sense of numerical or statistical analysis.”

The result: the technology can’t be used out of the box by content creators in data-intensive industries like real estate. For example, if a creator wanted to draft a long-form piece detailing the latest rental trends for a given zip code, ChatGPT would return a response similar to this:

As impressive as that looks at first glance, the numeric data cited in this output is nonsensical, making the content unusable by any serious real estate professional. Asking ChatGPT where it found those figures returns a response like this: “The information I provided on rental trends in Chicago's 60642 zip code was based on my general knowledge and understanding of the real estate market, but it is not specific numbers from any source.”

In a world where publications like The New York Times have taught readers to expect data embedded in the content they read, what GPT provides out of the box simply doesn’t cut it for data-heavy use cases like real estate. So the problem becomes: how can we extend GPT’s impressive language skills with accurate data and real insights from multiple live sources?

This is exactly the problem we are solving here at Dataherald. The solution works in the following steps:

  1. We extract the intent of the prompt entered by the user. To do this efficiently against the data in our warehouses, we fine-tune our own model on top of GPT.
  2. We query our data warehouses to get accurate numbers for the user's prompt.
  3. We prompt GPT with the retrieved data, and embed an interactive data visualization to accompany the final text.
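The three steps above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not Dataherald's implementation: a keyword-and-regex rule stands in for the fine-tuned intent model, an in-memory SQLite table stands in for the data warehouse, and both the call to GPT and the data visualization are omitted, so the pipeline stops at the grounded prompt that would be sent. All names (`Intent`, `extract_intent`, `query_warehouse`, `build_grounded_prompt`, the `rentals` table and its `median_rent` column) are hypothetical, and the rows in the usage example are placeholders, not real market figures.

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical structured intent produced by step 1.
@dataclass(frozen=True)
class Intent:
    metric: str    # warehouse column to query
    zip_code: str  # five-digit US zip code

# Hypothetical whitelist mapping prompt keywords to warehouse columns (assumed schema).
# Substring matching is crude: "rent" also matches words like "current".
METRIC_KEYWORDS = {"rent": "median_rent"}

def extract_intent(prompt: str) -> Intent:
    """Step 1 stand-in: a regex/keyword rule instead of a fine-tuned model."""
    zip_match = re.search(r"\b(\d{5})\b", prompt)
    if zip_match is None:
        raise ValueError("no five-digit zip code found in prompt")
    text = prompt.lower()
    metric = next((col for kw, col in METRIC_KEYWORDS.items() if kw in text), None)
    if metric is None:
        raise ValueError("no supported metric found in prompt")
    return Intent(metric=metric, zip_code=zip_match.group(1))

def query_warehouse(conn: sqlite3.Connection, intent: Intent) -> list[tuple[str, float]]:
    """Step 2: fetch the metric's monthly values for the zip code, oldest first."""
    # The column name comes from the whitelist above, never from raw user text,
    # so interpolating it is safe; the zip code is passed as a bound parameter.
    sql = f"SELECT month, {intent.metric} FROM rentals WHERE zip = ? ORDER BY month"
    return conn.execute(sql, (intent.zip_code,)).fetchall()

def build_grounded_prompt(user_prompt: str, intent: Intent,
                          rows: list[tuple[str, float]]) -> str:
    """Step 3: embed the retrieved figures in the prompt sent to the language model."""
    facts = "\n".join(f"- {month}: {value}" for month, value in rows)
    return (
        "Write a short article answering the request below. "
        "Use only the figures listed; do not invent numbers.\n"
        f"Request: {user_prompt}\n"
        f"Data ({intent.metric}, zip {intent.zip_code}):\n{facts}"
    )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rentals (zip TEXT, month TEXT, median_rent REAL)")
    # Placeholder rows for illustration only; these are not real market figures.
    conn.executemany(
        "INSERT INTO rentals VALUES (?, ?, ?)",
        [("00000", "2023-01", 100.0), ("00000", "2023-02", 101.0)],
    )
    request = "Describe the latest rental trends for zip code 00000"
    intent = extract_intent(request)
    rows = query_warehouse(conn, intent)
    # This string would then be sent to GPT; the API call is omitted here.
    print(build_grounded_prompt(request, intent, rows))
```

The design choice this illustrates is that the numbers never originate in the language model: every figure in the prompt comes from a query, and the model is only asked to write around them, so each value in the final text can be traced back to a source.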

