Advanced data analysis with ChatGPT-4o

In this article by Associate Software Engineer, Conor Martin, we discuss what ChatGPT-4o can do for data analysis.
Date posted
24 May 2024
Reading time
6 minutes
Conor Martin
Associate Software Engineer · Kainos

On 13 May 2024, OpenAI announced its newest flagship model, ChatGPT-4o. The model can respond with text, audio and images in real time. OpenAI claims that its response time is similar to a human's and that it runs at 50% of the cost of its predecessor via the API. OpenAI showcases a wide range of uses, including interview practice and maths tutoring. One particularly interesting use case is its newfound potential in advanced data analysis.

We believe that GPT-4o has the potential to play a pivotal role in enhancing data analytics across a broad range of use cases. Experts are free to choose how much control they hand over to the AI. Analysts who can identify their own weaknesses can use it to significantly enhance the quality of their work.


What can ChatGPT-4o do for data analysts? 

We tested the model extensively and found that ChatGPT-4o can do end-to-end data analysis. It:

  • Analyses, processes and understands the context of large datasets
  • Ideates on opportunities with the dataset
  • Develops a detailed plan for analysing a dataset with an end goal in mind
  • Executes the plan step by step, applying a wide range of data analysis techniques, visualising data in real time and generating high-quality Python code
  • Draws comprehensive conclusions and explains its findings.

Showcasing ChatGPT-4o in action 

One of our investigations used an open dataset of commodity prices in India between 1997 and 2015. Our end goal was to determine the impact of specific economic policies on commodity prices. Following GPT-4o’s suggested plan, we quickly had working code that carried out the following steps and visualised them in various charts and graphs:

  1. Data Cleaning and Preparation - A vital first step in data analysis is ensuring the dataset is ready. 
  2. Exploratory Data Analysis - GPT-4o summarised the dataset and created visualisations, including a line plot of price trends over time, a box plot to show the distribution and identify outliers, and a heat map to visualise regional price variation.
  3. Before and After - GPT-4o created a list of economic policies that may have affected the price of goods. It visualised the impact of one of these policies via a bar chart depicting the price of goods before and after the policy’s implementation. 
  4. Time-Series Analysis - GPT-4o analysed the long-term movement (trend) in the data over the period, along with regular and predictable fluctuations (seasonality) at specific intervals such as daily, monthly and yearly.
  5. Impact Assessment via Regression Analysis - GPT-4o quantified policy impact by modelling the relationship between the dependent variable, price, and independent variables including policy and region.

Code generation 

The newer versions of ChatGPT can run code themselves within a restricted environment: GPT-4o writes the code and runs it. Given a concise prompt, GPT-4o infers that we want to see the data analysis visualised. It displays in-line visualisations and prints the Python code required to produce them.

However, when using the GPT API on Azure, you will not get graphs and charts; you will have to run the returned Python code to generate them. Crucially, understanding your code is still important, and we strongly recommend taking the time to read and dissect code before executing it. This way, you will be prepared when errors inevitably appear.
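
As one possible way to support that review step, the sketch below pulls the Python code out of a model reply and inspects it without running it. It assumes the reply wraps code in fenced blocks tagged `python`, which is common but not guaranteed. The sample reply text and the helper names are hypothetical, and no API call is made.

```python
import ast
import re

# The fence marker is built programmatically so it never appears literally here.
FENCE = "`" * 3

# Matches fenced blocks tagged as Python and captures their bodies.
CODE_BLOCK = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(reply: str) -> list[str]:
    """Return the body of every Python-tagged fenced block in a model reply."""
    return [body.strip() for body in CODE_BLOCK.findall(reply)]


def imported_modules(source: str) -> set[str]:
    """Parse the code without executing it and list the top-level modules it imports."""
    tree = ast.parse(source)  # raises SyntaxError if the code is malformed
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


# Hypothetical reply text, standing in for what an API call might return.
SAMPLE_REPLY = (
    "Here is a line plot of the price trend:\n"
    + FENCE + "python\n"
    + "import pandas as pd\n"
    + "import matplotlib.pyplot as plt\n"
    + "df = pd.read_csv('prices.csv')\n"
    + "df.plot(x='date', y='price')\n"
    + "plt.show()\n"
    + FENCE + "\n"
)

if __name__ == "__main__":
    for block in extract_python_blocks(SAMPLE_REPLY):
        print("Modules to check before running:", sorted(imported_modules(block)))
        print(block)
```

Listing the imports up front shows which libraries the generated code depends on, but it is no substitute for reading the code itself.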

Data security

We understand the importance of data security and privacy for all customers. As such, it is important to avoid putting any confidential data into ChatGPT. Instead, you can use GPT-4o's API, which is available on Azure; as per Microsoft’s documentation, your data is not used to train the model further. As a bonus, GPT-4o runs at 50% of the cost of its predecessor!

The true value lies in knowing how to use it 

We believe that this is a first step in transforming the data analysis process. Experts will benefit most from tools like this. They have the deep experience to know what analysis they need and to understand their results. GPT-4o can significantly speed up their workflow as well as improve and build upon it.  

You might choose to use GPT-4o to help analyse datasets and produce general statistics describing them. Or you could use it to help generate ideas on what technique is best for a desired outcome. You could then use it to generate the code you need. Perhaps you will use it for all of the above and more. You only pay for what you use of the model, so make sure it is returning the maximum value! 

Try it yourself 

GPT-4o is a new model that is extremely capable of interpreting detailed analysis requests and executing them. We believe it is a great tool that will transform the process of data analysis. Not only will it enhance the quality of the analysis performed, it will also produce results at much greater speed.

This tool is available to all users via OpenAI (with some limitations for free users). With a vast number of open datasets available online, you can easily get started. Experiment and find out if it suits you! Alternatively, feel free to contact us for more insight and discussion.

 

About the author

Conor Martin
Associate Software Engineer · Kainos