How and Why to Use GPT To Write Code (if you currently use Excel/Tableau/Alteryx/etc.)
I’ve been a data scientist for about six years. I’ve written a lot of code that cleans and analyzes data, and I’ve made a lot of graphs using Python libraries. But until a month ago, I’d never written a dashboard purely with code. I wanted to, but I hadn’t gotten around to it, in part because I thought it was going to take a long time to learn.
This was my first Dash app, from a few weeks ago.
It’s not going to win any design awards, but it has the functionality that I wanted it to have:
It has a filter which is determined by the values in the underlying data, and which defaults to showing all of the data
It hides everything but the first 100 characters of each cell, but has hover text which shows the full strings
It’s displaying the Close Date in the date format I wanted, even though the underlying filter/processing has to handle the dates in a different format
It’s using text filters for each of the columns
That’s about it! But, also, it works. It’s a proof-of-concept that might get turned into something bigger. And it only took a couple of hours to build it with help from GPT-4.
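Two of the behaviors listed above, truncated cells with full hover text and a display date that differs from the stored date, come down to a few lines of plain Python. This is a minimal sketch rather than the code from my app: the column names (notes, close_date), the stored ISO date format, and the month/day/year display format are all assumptions for illustration.

```python
from datetime import date

MAX_CHARS = 100  # visible characters per cell, as in the app described above


def shorten(text: str, limit: int = MAX_CHARS) -> str:
    """Return text unchanged if it fits, else its first `limit` characters plus '...'."""
    return text if len(text) <= limit else text[:limit] + "..."


def display_row(row: dict) -> dict:
    """Build what one table row shows: short cell text, full hover text, formatted date."""
    return {
        "cell": shorten(row["notes"]),
        "hover": row["notes"],  # hover text keeps the full string
        # Stored as ISO (YYYY-MM-DD) for filtering, shown as MM/DD/YYYY (assumed format).
        "close_date": date.fromisoformat(row["close_date"]).strftime("%m/%d/%Y"),
    }


# Hypothetical row: a 250-character note and an ISO-formatted close date.
row = {"notes": "x" * 250, "close_date": "2023-04-15"}
shown = display_row(row)
print(len(shown["cell"]), shown["close_date"])
```

In a dashboard library, the "cell" value would feed the table and the "hover" value would feed whatever tooltip option the library provides.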
Below is my second app, built in Shiny for Python. It has significantly greater functionality and better design.
It has charts, it has different kinds of filters, it has a button that resets the filters, and another button where you can download the source data. I’m using CSS (cascading style sheets) to modify the formatting.
And GPT helped me build this as well. As far as GPT is concerned, Shiny for Python doesn’t exist yet, because it’s so recent that GPT wasn’t even trained with it. But I pasted in both documentation from the creators of Shiny for Python and existing code that used the functions I wanted to use, and GPT was able to use those to write working code for me using this library.
For instance, this is part of the documentation for the update_slider function in Shiny. I fed the whole page to GPT:
The results weren’t perfect. It got a lot wrong and I wound up doing more of the coding myself than I did with the Dash app. But I was still able to create this more quickly than I would have otherwise.
Why Bother With Code When You Can Use Excel/Tableau/Alteryx?
You might be wondering why you should bother with building your own applications at all. It’s already possible to build dashboards in lots of off-the-shelf products that work reasonably well. You can make your maps with ESRI and your graphs in Excel. Your company might already be paying for one or more of these products. If they’re easy to use, and work well, then what’s the problem?
Version Control
The first advantage of code is that it works with version control in a comprehensive way. Version control, using an open-source tool called git, is how you track different versions of the same documents without having to actually keep multiple copies with different names. But it’s so much more than that. It’s how you can go back to a previous version after you mess up. It’s how you can work on one version while your coworker works on another, and when you’re ready, you can merge them, go line-by-line to see every difference between the two versions, highlighted, and pick which line is the correct one. (Or, if you were working on totally separate parts of the code, git will figure that out and do the merge for you.)
You might think this is not much of an advantage. Doesn’t Excel have a version history? Why is this a point for code over another product? Simply put, version control works better for code than for anything you build in a graphical user interface. You may be able to revert to a previous version in your other tool, but you can’t get the full benefits of branching, merging, and seeing the line-by-line differences between two versions.
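To make "line-by-line differences" concrete, here is a small sketch using Python's built-in difflib module. This is not git itself, and the file name and settings are made up, but git diff prints changes in the same unified format: removed lines start with a minus sign and added lines start with a plus sign.

```python
import difflib

# Two versions of a tiny, hypothetical settings file, one string per line.
old = ["title = 'Applicants'\n", "color = 'blue'\n", "legend = 'top'\n"]
new = ["title = 'Applicants'\n", "color = 'green'\n", "legend = 'top'\n"]

# unified_diff yields header lines, then the changed lines with their context.
diff = list(
    difflib.unified_diff(
        old, new, fromfile="chart.py (before)", tofile="chart.py (after)"
    )
)
print("".join(diff))
```

Only the color line shows up as changed; the lines around it appear as unchanged context. That is the kind of comparison a spreadsheet's version history generally can't give you.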
Modularity and Ease of Changes
The next advantage of coding is that your code is modular and much easier to change. For instance, take my Shiny dashboard. There are four graphs in it with the same formatting. Those are all coming from one function, meaning that I can change that formatting on all four graphs with one line of code.
Even better, I can write a document in CSS which makes tons of formatting changes to the whole dashboard – and then I can use that CSS file any time I make a dashboard, and it will update the formatting to be just how I like it. Or, if I’m coding, and I realize I did something wrong in my first step, I can just go back and change it and re-run everything, whereas if I’m doing everything in Excel, I now need to start over.
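The "one function, four graphs" idea can be sketched without any plotting library at all. In this hypothetical example, plain dictionaries stand in for chart settings; the chart titles, color, and other values are invented for illustration and are not the ones in my dashboard.

```python
def chart_style(title: str) -> dict:
    """Return the shared formatting for one chart; edit here to restyle every chart."""
    return {
        "title": title,
        "bar_color": "#1f77b4",  # change this one line and all charts change
        "font_size": 12,
        "legend_position": "bottom",
    }


# Hypothetical titles for four charts that share one look.
CHART_TITLES = ["Applicants", "Seats offered", "Matches", "Waitlist length"]
charts = [chart_style(title) for title in CHART_TITLES]

for chart in charts:
    print(chart["title"], chart["bar_color"])
```

With a real plotting library, the function would build and return a figure instead of a dictionary, but the payoff is the same: the formatting lives in exactly one place.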
Documentation
Next, using text-based tools lets you take full advantage of documentation. You can write a readme file which explains exactly what your code does, and that goes in your git repository. And, within the code itself, you can write comments which explain what each function does. Someone else can read it and figure out what’s going on – and modify it, and build on it. I’ve tried to do that same thing with someone’s function-filled Excel workbook, and it’s much more difficult.
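In Python, the usual place for that in-code explanation is a docstring, the text in triple quotes right under the function definition. Here is a small sketch; the function and its argument names are hypothetical, loosely based on the lottery data discussed later in this post.

```python
def share_filled(seats_offered: int, seats_filled: int) -> float:
    """Return the fraction of lottery seats that were filled.

    Args:
        seats_offered: number of seats the school put in the lottery.
        seats_filled: number of those seats that ended up filled.

    Returns:
        The filled share as a float, or 0.0 when no seats were offered.
    """
    if seats_offered == 0:
        return 0.0  # avoid dividing by zero for schools with no lottery seats
    return seats_filled / seats_offered


# The docstring travels with the function, so anyone can read it back later.
print(share_filled.__doc__)
```

Calling help(share_filled) in a Python session shows the same text, so a reader never has to dig through the function body to learn what it does.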
Forcing You To Articulate Your Work
Another advantage is that code forces you to make decisions yourself, and to write those decisions down. That gives you (and anyone else looking at your code) more opportunities to figure out if you’re doing something wrong. When you outsource those choices to another tool, you’re less likely to notice what it’s doing, think about it, and realize you want to do something differently.
Escaping Vendor Lock-In
Finally, coding means you’re not stuck in this expensive, proprietary universe. You’re still paying someone to host your stuff, but you’re not paying for licenses just to build. And the model for drag-and-drop tools is that the costs of migrating out are very, very high. I once tried to make sense of the XML file for an Alteryx workflow – that’s a text representation of what someone built in Alteryx. Going from that workflow to Python or R code would be a really bad experience. By contrast, if I decide I want my Dash app to live somewhere besides my current Linux server, that’s going to be a breeze. If I want to spin up my own Shiny server and host my Shiny dashboards, I can do that. The code’s already written, and it’s a separate entity from whoever I’m paying to host my stuff.
How Should You Write Code? How Can GPT Help?
The way you should write code, in general, is this:
Come up with a broad plan. What are you trying to do/build? What are your inputs and outputs? For instance, before I made the Shiny dashboard above, I looked at an existing Tableau dashboard from DC. I noticed that the underlying data set was really interesting, but I had a lot of questions about it that the existing dashboard could not answer. All you can do with it is look at data from one school and one year, or one school and multiple years but only one grade. So I decided to build a dashboard that would answer the kinds of questions I had. These are things like: what schools have the most applicants? What pre-k programs are families getting into even if they don’t have any sibling or neighborhood connection to the school which gives them a leg up in admissions? What schools have the most seats they’re putting in the lottery, and are they getting filled? I had an underlying data set – my input – and my output was going to be a working dashboard online which people can use to answer these questions.
Break it into smaller steps. For example, I needed code that takes a data column and makes a bar chart, and I had specific formatting requirements as far as color, title, where the legend goes, etc. I needed a button to reset the filters, and another one to download the data. This definitely can change after you get started – you’re not committed to doing everything the way you planned to. I started out with slider bars for my filters and I switched to text input instead once I saw how they looked. But think about what the inputs and outputs of each step are. Each piece will be a function, class, or module which is responsible for a single step.
Code each of those underlying parts. It doesn’t have to be sequential – for instance, I’ve coded most of my dashboard even though I’m hoping to get a better data set, and if I do get that better data set, I’ll have to go back to that data cleaning step.
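Here is a rough sketch of what "each piece is a function responsible for a single step" can look like. The data, column names, and step names are all hypothetical; the point is only that each function has a clear input and output, so any one of them can be rewritten later (say, when a better data set arrives) without touching the others.

```python
def clean_rows(rows):
    """Step 1: drop rows with no applicant count and convert counts to int."""
    return [
        {"school": r["school"], "applicants": int(r["applicants"])}
        for r in rows
        if r["applicants"] not in ("", None)
    ]


def most_applicants(rows, n=3):
    """Step 2: return the n schools with the most applicants, largest first."""
    return sorted(rows, key=lambda r: r["applicants"], reverse=True)[:n]


def format_report(rows):
    """Step 3: turn the ranked rows into printable lines."""
    return [f"{r['school']}: {r['applicants']}" for r in rows]


# Made-up input; School B has a missing count and is dropped in step 1.
raw = [
    {"school": "School A", "applicants": "120"},
    {"school": "School B", "applicants": ""},
    {"school": "School C", "applicants": "310"},
    {"school": "School D", "applicants": "45"},
]
report = format_report(most_applicants(clean_rows(raw), n=2))
print("\n".join(report))
```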
This sounds intimidating! But GPT can help you with each of these steps. You can talk through your plan with it, including how to break it into steps. But let’s focus on the coding part. My conservative advice is that your process for coding should still be that same process, but with GPT helping at each stage.
Give GPT specific directions for each chunk of something you want to build. For example “I have a .csv file called my_csv_file.csv. I want you to write code that reads it in via Python. Explain every step of the code to me.”
Think in terms of inputs (your csv file) and outputs (the Python variable representing your csv). Or: “I read my data in. It’s now a pandas DataFrame. I want you to make a bar chart with ‘Number of applicants’ on the y-axis and ‘count of schools’ on the x-axis. I want the title to be ‘Count of applicants by number of schools.’” Let it write the code for you. Try the code out. Go one chunk at a time. You can add more features as you go. If you look at the graph and you realize the colors are wrong and the fonts are wrong and you want your data binned, then tell it that, and then try out what it gives you.
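For a sense of what you might get back from prompts like these, here is a hedged sketch. It is not GPT's actual output. To keep it dependency-free it reads the csv with Python's standard library instead of a DataFrame, the sample data and column names are invented, and the plotting function assumes matplotlib is installed (it is only imported if you call it). The axis labels and title are passed in as arguments, so the wording from your prompt can be used as-is.

```python
import csv
import io
from collections import Counter

# Stand-in for the contents of my_csv_file.csv; the columns are hypothetical.
SAMPLE_CSV = """school,applicants
School A,120
School B,35
School C,310
School D,45
School E,180
"""


def read_rows(handle):
    """Input: an open csv file. Output: a list of dicts, one per row."""
    return list(csv.DictReader(handle))


def bin_label(applicants: int, width: int = 100) -> str:
    """Map a count to the label of its bin, e.g. 120 -> '100-199'."""
    low = (applicants // width) * width
    return f"{low}-{low + width - 1}"


def schools_per_bin(rows, width: int = 100) -> dict:
    """Count how many schools fall into each applicant bin, ordered by bin start."""
    counts = Counter(bin_label(int(r["applicants"]), width) for r in rows)
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])))


def plot_bins(bins: dict, title: str, xlabel: str, ylabel: str):
    """Draw the bar chart; matplotlib is imported here so the rest runs without it."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(bins.keys()), list(bins.values()))
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig


# With a real file:
#     with open("my_csv_file.csv", newline="") as f:
#         rows = read_rows(f)
rows = read_rows(io.StringIO(SAMPLE_CSV))
bins = schools_per_bin(rows)
print(bins)
```

Each function here is one "chunk" in the sense described above: you can try it, look at the result, and ask for a change (a different bin width, say) before moving on to the next one.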
Pay attention to what it’s writing for you. Ask it to explain it to you. If you don’t understand what it’s saying, tell it you’re a really new programmer and to try again using more basic words. It’s not going to judge you or get frustrated.
If something breaks and there’s an error message, by all means give GPT the error message and see what it does. Give it all of your code and the error message. But if that doesn’t work – if it’s still broken – don’t just keep pasting errors back in; do some troubleshooting yourself. Ask it if it has ideas for what could be happening. Find some documentation for that function you’re trying to use and give it to GPT. Read your code. Think about the larger context.
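When you do paste an error in, the full traceback is much more useful than just the last line, because it shows which line of your code failed and how the program got there. Here is one way to capture it as text, sketched with Python's built-in traceback module; the average function is a made-up example that fails on an empty list.

```python
import traceback


def run_and_capture(func, *args):
    """Call func; return (result, None) on success or (None, traceback text) on failure."""
    try:
        return func(*args), None
    except Exception:
        # format_exc() returns the same text Python would print to the screen.
        return None, traceback.format_exc()


def average(values):
    """Hypothetical function with a bug: it divides by zero on an empty list."""
    return sum(values) / len(values)


result, error_text = run_and_capture(average, [])
if error_text:
    print(error_text)  # paste this, together with your code, into the chat
```

If you're running a script directly, you don't need any of this: just copy everything Python prints, starting from the line that begins with "Traceback".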
I also think that if you are building stuff and you find you like it, you should pick up a book or take a class or try to get some general grounding in coding. If you find yourself liking coding, you will also like learning a bit more about how it works.
Why I’m Writing About This
When I first got into Python, what was exciting about it was that all of the stuff I was doing in Excel or Stata I could now do so much faster and better in Python. All of these new things I hadn’t known were options suddenly were.
GPT has been like that for me again. I had all of these projects in the back of my head, these tools that I figured I could get to eventually. Now I can get meaningfully started on them in the time it would take me to watch a movie. Occasionally the time it would take me to watch a commercial, even!
If there are things you wanted to make, now you can make them. Quickly. With less frustration and better output. And if you’re already a data analyst, or you do data viz, or you make dashboards, then the things you know already and the concepts you’re familiar with just got way easier to translate into code.