The Inclusivity of GitHub Profiles in Hiring

May 21, 2024

I began transitioning to data science with an infant, a two-year-old, and a full-time job. This was only possible because I could build data skills outside of both my job and a formal education program, and because employers valued those skills.

I got my first data science job quickly, but even after that, I was still putting in a lot of outside hours learning what I needed to know to be a capable data scientist. It was difficult, but it also let me transition my career while continuing to work full-time, with minimal educational expenses and on my own schedule. In many fields, particularly competitive fields involving knowledge work, that would be unheard of. There have also been periods since then when the code I was writing for personal projects was integral to my career development and next steps.

It was like this except I don’t have an Apple, my apartment was a lot messier, and I wasn’t smiling.

I think when we talk negatively about certain hiring processes, like valuing personal projects on GitHub, and how they disadvantage people with caretaking responsibilities (basically, people like me), we’re missing two points:

Nearly any way you could hire advantages people who are able to spend more time on work or work-adjacent tasks.
Considering candidates based on their outside-of-work skill development, including their GitHub projects, makes the hiring process more inclusive across several dimensions compared to many other strategies.

Methods Which Don’t Advantage Extra Hours

What would a hiring process that doesn’t advantage ‘extra hours’ on work or work-adjacent tasks look like?

Something completely random, like deciding based on a modulo of candidates’ social security numbers.
Something based on factors determined long before applying, like only hiring people with undergraduate STEM degrees from very competitive schools or using SAT scores.
Evaluating candidates based solely on skills or experiences gained at work. This is logistically nearly impossible in a field like data science. You also couldn’t send recruiters to meetups or conferences, hire from employees’ professional networks (except for former coworkers), or hire anyone self-taught. And even then, you’re still advantaging candidates based on career progress made because they worked longer hours at their actual jobs.

I think these not only get you less accurate hiring processes, but anything besides the total randomness option also suffers from its own bias or equity issues. You’re preserving an existing order: where candidates were at 17, or whether their current job exposed them to the skills you’re looking for. In competitive fields, preserving the work-life balance of certain candidates inherently comes at the expense of keeping out other candidates.

Skills-Based Hiring vs. Other ‘Extra Time’ Methods

I also think there’s a lot to be said for how skills-based hiring in general, and specifically letting candidates distinguish themselves via personal coding projects, is more inclusive than other hiring methods that advantage putting in extra hours:

Flexibility to work on projects on your own schedule, unlike in-person networking events often held in the evenings or requiring travel or registration fees.
No need to navigate or fit into the culture of any specific professional organization, or for such organizations to even exist in one's location.
Doesn’t require pricey certifications, additional degrees, or unpaid internships.

Every hiring method is about a combination of incentivizing certain behaviors and hiring one group of people over another. A skills-based approach that incentivizes learning and demonstrating things directly tied to your ability to be productive at a job both encourages useful behavior and lets people from different backgrounds, with various time and life constraints, get hired.

Bad Ways To Use GitHub

Of course there are terrible ways to use GitHub to hire people, like counting the number of recent public commits they’ve made.

But this is bad mainly because it’s ineffective at selecting better candidates, not because it disadvantages people with responsibilities outside of standard work hours.

First, a truly “counting the green squares” approach to hiring can be easily gamed in not much time. But second, for the roles where looking at recent public commits is the least productive — senior roles in which candidates can't demonstrate their skills via weekend coding projects because much of their value-add has to do with architecting, planning, or managing stakeholders — the time commitment that’s most directly competing with personal coding projects is the long hours they’re working for their actual job. That senior engineers don’t want to add trivial side projects to already-punishing workloads is legitimate for its own sake, but this isn’t being more inclusive of people with significant caregiving responsibilities, because they’re already mostly excluded from these hiring pipelines, just as they are from senior roles in many other fields*, regardless of how GitHub is used.

Conclusion

I think it’s incredibly cool, and I am extremely grateful, that data science has a path which emphasizes skills, regardless of where you learned them. Fields that a lot of people want to join or progress in are competitive (they have to be), but I don’t think that data science is competitive in worse ways than other types of knowledge work.

But while I don’t think it’s desirable or even possible to negate the career advantages associated with working additional hours to teach yourself more material or demonstrate what you’ve learned, I do think there’s a related issue: To the extent that this career path requires teaching yourself a set of skills that may change year-to-year, or networking in ways that aren’t obvious, or putting together a portfolio that even a data science graduate program may not actually teach you to make effectively, that presents a huge advantage to candidates who are already plugged into this professional world and know what the rules are.

I think it’s possible to fix those information issues to some extent, and the way we do that is by being as transparent as possible about both our own careers and about the ways we hire people.

*For a general discussion on the changing relationship between working long hours and salary/career opportunities and how this affects mothers in particular, I recommend this New York Times article.

Ali Ruth

May 21, 2024 (edited)

Great post! I went on the job market as I was finishing my PhD and applied for a mix of jobs, including some that required traditional academic job talks...

I ultimately was hired into a federal data science job through an SME-QA assignment (a mapping project; pretty fun) where one output was a linked GitHub repo. While there are thorny equity questions re: coding assignments and projects, I still found the skills assessment to be a huge relief compared to the traditional academic job talk. Time-wise, job talks take many hours of preparation and can feel a bit arbitrary/nerve-wracking. I was surprised to find that many data science hiring processes seemed to be a bit more democratic and flexible.

The Present of Coding
