Discussion about this post

Marcos H

I think there is a big difference between not knowing how a bubble sort works (a basic comp sci concept, but not useful to a data scientist), not understanding your graphing code (which might be OK, assuming you're sure the right data is being plotted the way you want and you can verify it just by looking at it), and not understanding the polars/pandas/SQL code that is doing the analysis.

For example, I have used an LLM to translate a plotly graph to Altair (so I could have shaded error bars, which plotly doesn't support), but it was a long back and forth, and part of me wishes I had just learned Altair rather than outsourcing it to an LLM.

I think the jury is very much out on the productivity gains from LLMs for coding, and there is evidence that using them inhibits learning and understanding:

https://www.anthropic.com/research/AI-assistance-coding-skills

LLMs can work for coding because you can always test the code, and I think you are right about the importance of tests. But I want to understand any actual analysis code so I can make sure the analysis is correct! The LLM doesn't understand anything; only humans know what our true intent is. If we divorce ourselves from understanding the analytic code, then the risk is high that subtle but believable errors will creep in.

I also disagree that we have no choice but to learn and use these tools and change how we work. Companies are already setting limits on tokens and realizing that maybe all this AI spend isn't worth the money. I use coding tools for what I think they are good at, and that's what every data scientist should do. There should be no pressure to use them because of FOMO.
