Finding Non-Stale GitHub Repositories Despite the API
With the GitHub REST API, you can dump quite a lot of detail about a repository:
curl --silent \
-H 'Accept: application/vnd.github.preview' \
https://api.github.com/repos/aboul3la/Sublist3r
...which includes information such as the number of forks and stars, and the last time the repository was updated.
For an upcoming research project, I had to filter through a lot of repositories. I figured I could use a recency check to get the number down to a manageable size.
{
...
"name": "Sublist3r",
"full_name": "aboul3la/Sublist3r",
...
"created_at": "2015-12-15T00:55:25Z",
"updated_at": "2024-10-15T14:35:14Z",
"pushed_at": "2024-08-02T00:00:30Z",
...
}
Looking at the (truncated) API response of the call above, we see three keys with timestamp values:
created_at
updated_at
pushed_at
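To look at only those three keys rather than the whole response, a jq filter along these lines should do. This is a minimal sketch: repo_timestamps is a helper name made up for this post, and it assumes jq is installed and that the input is a single repository object like the one above.

```shell
#!/usr/bin/env sh
# repo_timestamps is a hypothetical helper, not part of any GitHub tooling.
# It reads one repository object on stdin and prints its three timestamps.
repo_timestamps() {
  jq -r '"created_at \(.created_at)", "updated_at \(.updated_at)", "pushed_at \(.pushed_at)"'
}

# Example against the live API (unauthenticated, so rate limits apply):
# curl --silent https://api.github.com/repos/aboul3la/Sublist3r | repo_timestamps
```

Piping the curl call from the top of the post into repo_timestamps prints one "key value" pair per line.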
created_at is intuitive and works as you would expect; it's the timestamp of when the repository was created on GitHub.
Not what I'm after, though.
Maybe the API documentation will give us a clue?
Let's look up updated_at and pushed_at, the keys that ultimately caused this post.
pushed_at has 23 search results, because pushed_at is a common key in API objects.
The same goes for updated_at.
And ultimately, no.
None of the keys are documented in the API docs as far as I can tell.
Google's AI answers have been fun to make fun of, and they are not helpful here either.
There is a relevant discussion from over a decade ago on Stack Overflow, and in the comments, Tomáš Hübelbauer seems to have an idea about the weird behaviour going on.
updated_at seems to be tracking any update that happens to the repository, even someone starring it.
I haven't verified this, but it's suspiciously recent for every repository I've checked, so I'll gladly accept that explanation.
From the same Stack Overflow question, the whole thing has been discovered by user Poonacha:
Strangely, I have noticed that the pushed_at flag gets updated even if anyone happens to open a pullrequestevent on any of the focal repository's branches (even without merging or closing it). The opened pull request can come from any remote fork. I am not sure why this is happening, since, as per my understanding, no commit is being made on any of the repos' branch in this particular case. – Commented May 30, 2018 at 9:38
Time to experiment! I'm writing this at 2024-10-15 13:44:00 (GMT). Let's run:
curl --silent \
'https://api.github.com/repos/aboul3la/Sublist3r' \
-H 'Accept: application/vnd.github.preview' | jq '.pushed_at'
Which gives us:
"2024-08-02T00:00:30Z"
The GitHub API believes that aboul3la/Sublist3r was last pushed to in August this year. However, the most recent commit is from four years ago.
Let's verify by sorting the repository's pull requests by most recently updated. The most recent pull request activity carries the same timestamp we see at pushed_at in the response to our original API query:
{
...
"name": "Sublist3r",
"full_name": "aboul3la/Sublist3r",
...
"created_at": "2015-12-15T00:55:25Z",
"updated_at": "2024-10-15T14:35:14Z",
"pushed_at": "2024-08-02T00:00:30Z",
...
}
Which... proves the point above but does not help us. None of the three timestamps helps if we just want to know whether this repository has been updated recently or not.
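The pull request check can also be done without the web UI. The sketch below uses the "list pull requests" endpoint of the REST API, sorted by last update; latest_pr_url and latest_pr_timestamps are names invented for this post, and it assumes curl and jq are available and that the repository has at least one pull request.

```shell
#!/usr/bin/env sh
# Both function names below are hypothetical helpers for this post.

# Build the "list pull requests" URL for owner/repo, most recently updated first.
latest_pr_url() {
  printf 'https://api.github.com/repos/%s/pulls?state=all&sort=updated&direction=desc&per_page=1\n' "$1"
}

# Print created_at and updated_at of the most recently updated pull request.
# Unauthenticated, so rate limits apply.
latest_pr_timestamps() {
  curl --silent "$(latest_pr_url "$1")" -H 'Accept: application/vnd.github+json' |
    jq -r '.[0] | "created_at \(.created_at)", "updated_at \(.updated_at)"'
}

# Example:
# latest_pr_timestamps aboul3la/Sublist3r
```

If the explanation from Stack Overflow holds, one of the two printed timestamps should line up with the pushed_at value of the repository.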
I just want to check if a GitHub repository has been updated recently?!
#!/usr/bin/env sh
# Usage: pass owner/repo as the first argument, e.g. be5invis/Iosevka to query https://github.com/be5invis/Iosevka
# Export your GitHub API key as GITHUB_TOKEN first (the variable name is this script's own choice)
set -eu
REPO="$1"
# Shared curl wrapper so the headers are only written once
gh_get() {
  curl --silent "https://api.github.com/repos/$1" -H 'Accept: application/vnd.github.preview' -H "Authorization: Bearer $GITHUB_TOKEN"
}
# Get the default branch name, use jq -r to not have "quotes"
DEFAULT_BRANCH=$(gh_get "$REPO" | jq -r '.default_branch')
# Get the most recent commit of the default branch
HASH=$(gh_get "$REPO/branches/$DEFAULT_BRANCH" | jq -r '.commit.sha')
# Get the timestamp of the most recent commit to the default branch
gh_get "$REPO/commits/$HASH" | jq -r '.commit.author.date'
Enjoy!
I had written the script above around three times before I decided to document the whole process in a single place. Hopefully, I've saved you a bit of time :)
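Since the goal was to cut a long list of repositories down by recency, the printed timestamp still has to be compared against a cutoff. Here is a minimal sketch of that last step, assuming every timestamp is UTC in the same Z-suffixed form as the API responses above; in that form, plain text ordering matches chronological ordering. is_recent and CUTOFF are made up for illustration, and the second sample timestamp is invented.

```shell
#!/usr/bin/env sh
# is_recent is a hypothetical helper: it succeeds if timestamp $1 is at or
# after cutoff $2. Both must be UTC strings like 2024-08-02T00:00:30Z.
is_recent() {
  oldest=$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)
  [ "$oldest" = "$2" ]
}

# CUTOFF is an arbitrary threshold chosen for this example.
CUTOFF='2023-01-01T00:00:00Z'
# The first timestamp is the pushed_at value from above, the second is made up.
for ts in '2024-08-02T00:00:30Z' '2020-05-01T12:00:00Z'; do
  if is_recent "$ts" "$CUTOFF"; then
    echo "$ts recent"
  else
    echo "$ts stale"
  fi
done
```

In practice, the output of the script above would be passed as the first argument instead of the hard-coded samples.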
Finally, I should note that the 'timestamp of last commit' is not a good measure on its own. Mature codebases, small scripts, etc. can go forever without any updates and they will work just fine! The metric I used here was required for a specific purpose, and not on its own either.