A while back I was looking at the output from my GitHub profile generator and it seemed off. In particular, the language stats seemed off. The generator sums up how many bytes of code I’ve written for each language. and then calculates what percentage of my total output that represents.
Here’s what it showed, more or less:
|Past Two Years
|Perl: 76%, 9.5 MB
|Perl: 77%, 11.3 MB
|Rust: 21%, 2.7 MB
|Rust: 18%, 2.7 MB
|Go: 2%, 214.8 KB
|Go: 2%, 368 KB
This isn’t obviously wrong. I’ve written a lot of Perl and I’ve been doing a fair bit of Rust recently. But the Rust numbers seemed excessive. Had I written 2.7MB of Rust code in two years? That’s a lot of code!
So I filed a bug to remind myself to look at this later. Today was later.
I added some debugging output to my code to print out various bits of info as it went, focusing on each repo’s language stats. Eventually, I had it just print out bytes of Rust in each repo that had any Rust. That did the trick.
I realized that my Rust repos have huge amounts of generated code. For
project exists to
generate Rust code from Tailwind CSS. The repo contains an example of that
generated file is 613KB all by itself.
The fix was simple. GitHub uses Linguist
for its language detection and stats. You can set attributes in your
control how Linguist generates stats. Any file with a
attribute is excluded from Linguist’s stats collection. So I went through and
added this to my Rust repos.
My Rust stat went down to 2.1MB. I’d have expected it to go down more, but I think that maybe some of what I marked as generated was already being excluded somehow.
And then it occurred to me that I have the same issue with some Perl repos
contain ridiculous amounts of generated code. Apparently, I knew about this
Linguist thing before because
DateTime-Locale already had a
file. But there was none for
DateTime-TimeZone. Adding that removed about
6MB of Perl code from my stats.
So here are the new stats:
|Past Two Years
|Perl: 60%, 3.6 MB
|Perl: 66%, 5.4 MB
|Rust: 34%, 2.1 MB
|Rust: 26%, 2.1 MB
|Go: 3%, 214.8 KB
|Go: 4%, 368 KB
|HTML: 1%, 62.6 KB
That seems a bit more sensible. I’ve written a lot of Perl, but I haven’t worked on many of my Perl projects for a while.
I also noticed some weirdness with the count of PRs written and merged. When I run the profile generator locally I get a higher number than when it runs in GitHub Actions. That’s presumably because running it locally I run it with a GitHub API token that has access to private repos, so it sees private MongoDB repos.
But if I change the query to exclude private repos and run it locally, it gets a much lower number than it should. I’m not sure what’s going on here. Doing the query manually on the GitHub website I get numbers that match what the code gets in GitHub Actions, so I’m pretty sure that’s the right one. Confusing!
Just for good measure, I excluded all of my work-related orgs from the queries too. The point of the profile is to highlight my FOSS work, not my work work.
But even with this refinement I still get different results from GitHub Actions versus running it locally. If anyone has any ideas on why, I’d love to hear them!
GitHub is pretty slow to render this file. Be patient. ↩︎