Very nicely communicated => “learning Python” increasingly means learning to think about architecture and workflows: how do you want data to flow, what should update automatically, how should pieces connect, what should you test? You no longer need to master syntax to work in these ecosystems.
It's fucking embarrassing that people still use any of those tools. Python is the modern choice. R has more advanced statistical packages than any of the proprietary systems. There's just no excuse.
It should become popular to pass state laws under which public university econ profs get fired if they use Stata and biostats/public health profs get fired if they use SAS.
This piece captures something that’s often missed in the “Python vs. Stata/SAS” debate: the real difference isn’t syntax or even cost, it’s infrastructure.
What you describe with the OPM data is exactly the gap between availability and usability. Releasing hundreds of files via one-by-one downloads technically satisfies openness, but it doesn’t support reproducibility, auditing, or longitudinal workflows. Once you move into a Python-centric ecosystem, you’re no longer just “analyzing data” — you’re building a pipeline that can be rerun, inspected, and shared.
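To make that concrete, here is roughly what the shift looks like: instead of hundreds of manual downloads, a short script that fetches, caches, and stacks the files so the whole thing can be rerun end to end. This is just a sketch; the base URL and file naming scheme are hypothetical stand-ins, not OPM's actual endpoints.

```python
# Minimal sketch: fetch many data files reproducibly instead of
# clicking through one-by-one downloads. The base URL and file
# naming scheme here are hypothetical placeholders, not OPM's.
from pathlib import Path

import pandas as pd
import requests

BASE_URL = "https://example.gov/opm-data"  # hypothetical endpoint
CACHE = Path("raw")
CACHE.mkdir(exist_ok=True)

def fetch(name: str) -> Path:
    """Download a file once and cache it locally so reruns are cheap."""
    dest = CACHE / name
    if not dest.exists():
        resp = requests.get(f"{BASE_URL}/{name}", timeout=30)
        resp.raise_for_status()
        dest.write_bytes(resp.content)
    return dest

# Pull every monthly extract and stack them into one longitudinal frame.
files = [f"status_{year}_{month:02d}.csv"
         for year in range(2015, 2025) for month in range(1, 13)]
panel = pd.concat((pd.read_csv(fetch(name)) for name in files),
                  ignore_index=True)
```

Once the download step is code, it can be inspected, audited, and rerun when the agency posts new files, which is exactly what one-by-one downloads can never give you.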
I also appreciate the emphasis on the broader lifecycle. In survey and official statistics work, the problem starts before analysis: questionnaires, skip logic, and metadata are still treated as static documents rather than versioned software artifacts. That’s where a lot of irreproducibility creeps in.
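A toy example of what that change means in practice: skip logic expressed as code that lives under version control and can be unit-tested, instead of a rule buried in a PDF. The question IDs and the rule itself are invented for illustration.

```python
# Toy illustration: skip logic as a testable, version-controlled artifact
# rather than a prose rule in a static questionnaire document.
# Question IDs and the routing rule are invented for the example.
def next_question(current: str, answers: dict) -> str | None:
    """Return the next question ID given answers so far, or None if done."""
    if current == "employed":
        # Skip the occupation block for respondents who are not working.
        return "occupation" if answers.get("employed") == "yes" else "income"
    if current == "occupation":
        return "income"
    return None

# Because the routing is code, it can be unit-tested like any other software.
assert next_question("employed", {"employed": "no"}) == "income"
assert next_question("employed", {"employed": "yes"}) == "occupation"
```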
Python makes it possible to treat the entire chain, from instrument design through data ingestion, validation, and analysis to publication, as one coherent system. That's the motivation behind projects like svy (https://svylab.com/docs): not to replace statistics, but to modernize the workflow around them.
The most convincing part of your argument, to me, is that none of this is hypothetical anymore. The tooling exists, the sharing infrastructure exists, and AI assistance has lowered the switching cost dramatically. At this point, sticking with closed, non-programmable ecosystems is more a historical artifact than a technical necessity.
Nobody is actually writing code anymore. So just have the AI write Python!
I appreciate your much shorter version of my post!