If You Like Transparency and Good Coding Processes, GPT Has Much to Praise
A Data Scientist's Perspective on GPT-4
Getting Started in Python
During my PhD program, I wrote a lot of code in Stata, a proprietary statistical analysis package that a lot of economists use. Shortly after that, about seven years ago, I started working in Python. It was radically different working in a language where you can do everything in code, as opposed to partly by hand and partly in code, and where code is modular and reusable.
But while my code and coding practices have gotten more sophisticated since then, my processes for coding and learning stuff have looked pretty much the same. My tools have included API and library documentation, Google, Stack Overflow, and sometimes, a book or a class.
But with GPT-4, that's changed in ways that make me much more productive and are also just really cool. I don't know what coding is going to look like five years from now or even one year from now, but at this moment, at least from where I sit as a data scientist, coding with the help of this tool is amazing.
How GPT-4 is Changing the Way I Code
Here are a few ways GPT-4 has transformed my personal data science projects:
1) Going from Concept to Code Faster
The things I’ve asked GPT-4 to help me code in the last week include:
Pulling some markdown files down from a GitHub repo in a way that doesn’t pull them as HTML but rather as markdown
Submitting a series of prompts to the GPT API and putting the responses into a table
Downloading and unzipping data, then making a specific graph in R, a language in which I’ve worked only minimally
All of these were narrowly defined tasks with clear success criteria. I would have gotten there myself eventually, but I got there faster with GPT-4, and as a result, I could do more. And in none of them was I in any major danger of not understanding what the code was doing. Even in the case of coding in R, I was still troubleshooting and modifying the code myself.
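As an illustration of how small and checkable tasks like these are, here is a minimal Python sketch of the first two. It is not the code from the projects described above. The helper names (raw_markdown_url, responses_table, to_csv) and the repo details are hypothetical, and fake_ask is a stand-in for whatever function actually calls the GPT API. The one outside fact it relies on is that public GitHub files are served as plain text under raw.githubusercontent.com.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def raw_markdown_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw-content URL for a file in a public GitHub repo.

    Fetching this URL returns the markdown source as plain text,
    not the rendered HTML page.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def responses_table(
    prompts: Iterable[str], ask: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Send each prompt through `ask` and collect one row per prompt."""
    return [{"prompt": prompt, "response": ask(prompt)} for prompt in prompts]


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["prompt", "response"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_ask(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; it just echoes in uppercase."""
    return prompt.upper()


# Hypothetical owner, repo, branch, and path, for illustration only.
url = raw_markdown_url("some-user", "some-repo", "main", "docs/intro.md")

rows = responses_table(["hello", "define p-value"], fake_ask)
table = to_csv(rows)
print(url)
print(table)
```

Passing the API call in as a function keeps the table-building logic testable without a network connection or an API key, which is also what makes it easy to confirm that the code does what you asked for.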
2) One Place to Go for a Good Start
A few days ago, I made a mistake in how I set up a GitHub repo. I wrote a README first on one branch, and I had a Python file locally on a differently named branch. When I tried to merge them, I got an error message because the two branches had no overlapping files in them.
I just needed the one line that would resolve it, so I pasted my error message in, and GPT-4 told me how to fix it, which saved me from having to Google around. It's not always correct, but it's a great first place to look, and it's usually easy to see whether the thing works. Here, I could test it because I could see that my code was merging and getting pushed to the correct branch.
3) Doing the Things You Knew You Were Supposed to Do All Along
I had a project that needed an MVP. In a few hours, I got the code working, and I also worked with GPT-4 to add function-level documentation, a thorough README, and unit tests. As I added functionality, I refactored the code to make it simpler to read and build on, which made it much clearer when I shared it with someone else. Some of those are things I would have done eventually; others I might not have.
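For readers who haven't seen them side by side, here is a minimal sketch of what function-level documentation plus unit tests can look like in Python. The function share_of_total is a hypothetical example, not code from the MVP, and the tests use the standard library's unittest module.

```python
import unittest


def share_of_total(values):
    """Return each value as a fraction of the sum of all values.

    Args:
        values: A non-empty list of numbers whose sum is not zero.

    Returns:
        A list of floats in the same order as `values`.

    Raises:
        ValueError: If `values` is empty or sums to zero.
    """
    total = sum(values)
    if not values or total == 0:
        raise ValueError("values must be non-empty and sum to a nonzero number")
    return [value / total for value in values]


class ShareOfTotalTest(unittest.TestCase):
    def test_known_values(self):
        self.assertEqual(share_of_total([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_shares_sum_to_one(self):
        self.assertAlmostEqual(sum(share_of_total([1, 2, 3])), 1.0)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            share_of_total([])


# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShareOfTotalTest)
result = unittest.TextTestRunner().run(suite)
```

The docstring states what the function expects and what it raises, and each test pins down one of those promises, so someone picking up the code can see both the intent and the evidence that it holds.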
4) Learning Stuff I Always Wanted to Learn
There are always things on my list to learn, but there's a learning curve there. For instance, I’ve been wanting to use Docker, but it's not super high priority because, for the most part, a virtual environment meets my needs.
But GPT-4 can write me a list of instructions, and when I have questions (like "do I really need to register my Docker image"), it can explain to me the pros and cons, so I can get further down my list faster.
Addressing Technical Debt and Trade-offs
Technical debt is the cost of choosing quick, easy solutions now that will require more work to fix later. Sometimes, taking on substantial technical debt is necessary to get things done or test the feasibility of a project. Because there are costs associated with adhering to best practices, the 'right way' also varies significantly based on the project.
However, GPT-4 is changing the landscape of this trade-off by making it easier and faster to adopt better coding practices from the start. This means we can minimize the compromises we used to make when faced with tight deadlines or limited resources.
The Idealized System That Never Existed
I think a lot of complaints about GPT contrast it with an idealized system that, at least on the data analysis and data science side, never really existed.
You don't like black boxes? Me neither, but writing your own code doesn’t guarantee transparency, or even understanding on the part of the coder. And using tools with graphical user interfaces for analysis or visualization, like Excel or Tableau, can make transparency even more difficult to attain. (It’s possible to have rigorous testing and CI/CD processes with those, but a lot of people don't.) Data analysis and even research are not infrequently done inside someone’s impossible-to-follow spreadsheet.
Another common criticism of GPT is that it is not always accurate. But neither are Stack Overflow or that page you Googled! GPT often gives much better guidance than even a successful Google query, because it can explain the code to you, modify it for your specific context, and help troubleshoot the results.
Similarly, data scientists can already import scikit-learn and start producing results without conceptually understanding what the model is doing, or whether the training data is going to be useful for their real data. GPT can walk that person through different model types, explain trade-offs, and basically have a conversation with them that will leave them better-equipped to do useful, accurate work.
Using GPT-4 gets you to well-documented, organized code faster. It shortens the learning curve for good coding practices, letting you do things the right way that you might not have been doing before.
If you like adhering to good coding practices but you’re ever time-constrained in terms of either learning them or implementing them, GPT has a ton to offer.
Looking to the Future
Currently, GPT is a natural complement when iterating your way towards a goal. But you have to know what you’re trying to do. The better-defined and smaller the task you can give it, the more helpful it’s going to be. You still have to understand your goals and requirements at a detailed level, you have to pay a lot of attention to the output it’s giving you, and you should expect to be doing some modifications and troubleshooting that require knowledge of your own. It’s like going from coding without documentation, Google, or Stack Overflow to adding all of those things at once. It’s not like magic.
In the future, GPT may be able to do even more. Maybe soon it will complete these tasks from start to finish in ways that are less transparent. But at the moment, it’s benefiting me as a tool for improving my coding practices, making me more productive, and tackling the ever-present issue of technical debt. The future is uncertain, but the present is really cool.