If you’ve ever assumed that your private or deleted GitHub repositories are secure and inaccessible, it’s time to reconsider. A recent discovery has shed light on a significant security concern: anyone who knows or can guess a commit hash can access data from deleted forks, deleted repositories, and, in some cases, private repositories on GitHub. What’s even more surprising is that this is not a flaw. It’s an intentional design by GitHub.
The Issue: Data Exposure in Deleted Forks
Let’s start with deleted forks. Imagine you have a fork of a public repository. You commit code to your fork, perhaps including sensitive information like an API key or password, and then you delete the fork, assuming the data is gone forever. Unfortunately, this is not the case. Even after deletion, the commits remain accessible through the original (upstream) repository to anyone who has their hash, and deleting the fork does nothing to remove them.
To demonstrate, let’s say you fork a repository, commit some changes, and then delete the fork. You might think the data is gone, but if you kept the commit’s hash, you can still open that commit through the upstream repository’s URL. Even if you don’t have the full commit hash, you can access the data using just the first few characters. This behavior means that sensitive data you thought was deleted remains publicly accessible, posing a significant security risk.
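As a rough illustration of the lookup described above, the sketch below builds the web URL for a commit from an upstream repository name and a full or abbreviated hash. The owner, repository, and hash values are hypothetical placeholders, and the function only constructs the URL; it does not contact GitHub.

```python
import re

# A commit hash prefix: 4 to 40 hexadecimal characters.
HEX_PREFIX = re.compile(r"[0-9a-f]{4,40}")


def commit_url(owner: str, repo: str, sha_prefix: str) -> str:
    """Return the web URL of a commit as seen through the given repository."""
    sha = sha_prefix.lower()
    if not HEX_PREFIX.fullmatch(sha):
        raise ValueError("expected 4 to 40 hexadecimal characters")
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


# Hypothetical values: the upstream repository, not the deleted fork.
print(commit_url("example-org", "example-repo", "07f01e8"))
```

The point of the sketch is that the URL names the upstream repository, so it keeps working after the fork that produced the commit is gone.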
Brute Forcing Commit Hashes
You might think that the lengthy, 40-character commit hashes provide some security, but that’s not entirely true. While the full commit hash is infeasible to guess, GitHub accepts an abbreviated hash as long as the prefix identifies a single commit. This makes it feasible for attackers to brute-force short hashes and expose sensitive information. The minimum length is four hexadecimal characters, which reduces the possible combinations to 16^4 = 65,536, well within the reach of brute-force attacks.
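The arithmetic behind the 65,536 figure is simple enough to check directly. The sketch below computes the number of possible hash prefixes for a few lengths and a back-of-the-envelope enumeration time; the request rate is an assumed number chosen for illustration, not a measured GitHub limit.

```python
HEX_DIGITS = 16  # each hash character is one hexadecimal digit


def keyspace(prefix_length: int) -> int:
    """Number of distinct hexadecimal prefixes of the given length."""
    return HEX_DIGITS ** prefix_length


def hours_to_enumerate(prefix_length: int, requests_per_second: float) -> float:
    """Hours needed to try every prefix at an assumed, constant request rate."""
    return keyspace(prefix_length) / requests_per_second / 3600


for length in (4, 5, 6, 7):
    # 10 requests per second is a hypothetical rate, used only for scale.
    print(length, keyspace(length), round(hours_to_enumerate(length, 10), 1))
```

Each extra character multiplies the search space by 16, which is why the shortest accepted prefix matters far more than the length of the full hash.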
GitHub Archive: A Goldmine for Attackers
Even more troubling is that commit hashes can be easily discovered through platforms like GitHub Archive, which records GitHub’s public event stream, including pushes. This means that the hashes for almost every commit in a public repository are available, so an attacker often has no need to guess them in order to find and exploit sensitive information that was supposed to be deleted.
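To make the idea concrete, here is a minimal sketch of pulling commit hashes out of archived events. It assumes the events are available as newline-delimited JSON in the shape of GitHub's public events (a `type` of `PushEvent`, a `repo.name`, and a `payload` with `before`, `head`, and possibly a `commits` list); those field names are assumptions about the archive format, and the sample data is made up.

```python
import json
from typing import Iterable, Set


def push_shas(event_lines: Iterable[str], repo_full_name: str) -> Set[str]:
    """Collect commit hashes mentioned in push events for one repository."""
    shas: Set[str] = set()
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "PushEvent":
            continue
        if event.get("repo", {}).get("name") != repo_full_name:
            continue
        payload = event.get("payload") or {}
        candidates = [payload.get("before"), payload.get("head")]
        # The "commits" list may be absent, so treat it as optional.
        candidates += [c.get("sha") for c in payload.get("commits") or []]
        for sha in candidates:
            # Skip missing values and the all-zero hash used for "no commit".
            if sha and set(sha) != {"0"}:
                shas.add(sha)
    return shas


# Made-up sample events standing in for lines read from an archive file.
sample = [
    json.dumps({
        "type": "PushEvent",
        "repo": {"name": "example-org/example-repo"},
        "payload": {"before": "a" * 40, "head": "b" * 40,
                    "commits": [{"sha": "b" * 40}]},
    }),
    json.dumps({"type": "WatchEvent",
                "repo": {"name": "example-org/example-repo"},
                "payload": {}}),
]
print(sorted(push_shas(sample, "example-org/example-repo")))
```

Because the function takes any iterable of lines, it could be fed from a decompressed archive file without changes; how the file is obtained is left out of the sketch.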
A case study by Joe Leon from Truffle Security highlights the severity of this issue. He found 40 valid API keys from deleted forks, demonstrating how users inadvertently expose sensitive information. This is especially common among new users who might fork a repository, add sensitive data to an example file, and then delete the fork, not realizing the data remains accessible.
Deleted Repository Data: A Bigger Problem
The problem extends beyond deleted forks. If a user forks your public repository and you later delete the original, every commit you pushed remains reachable through the fork, including commits you made after the fork was created. This means that as long as at least one fork exists, the data will remain publicly accessible, even if the original repository is deleted.
This isn’t just a hypothetical scenario. Recently, a major tech company accidentally committed a private key for an employee’s GitHub account that had significant access to their entire GitHub organization. The company quickly deleted the repository, but since it had been forked, the sensitive data remained accessible through the fork. This incident underscores the vulnerability of relying on repository deletion as a method of securing sensitive information.
Private Repository Data Exposure
The situation gets even worse when it comes to private repositories. A common workflow for developers is to create a private repository, fork it internally, and then later make the original repository public while keeping the fork private. However, any commits made to the private fork before the upstream repository was made public are still accessible through the public repository. This means that sensitive features or code in the private fork could inadvertently be exposed to the public.
GitHub’s Stance: It’s a Feature, Not a Bug
When this issue was reported to GitHub through their bug bounty program, the response was clear: “This is an intentional design decision and is working as expected.” GitHub’s documentation supports this, stating that commits in a fork network can be accessed from any repository in the network, including the upstream repository. Additionally, when a private repository is made public, all commits, including those from forks, become visible to everyone.
This design choice reflects GitHub’s origins as a platform for open collaboration, where transparency and accessibility were prioritized. However, in the context of modern security expectations, this approach seems outdated and potentially harmful.
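One way to picture the behavior described in this section is a toy model in which every repository in a fork network shares a single commit store. This is a conceptual sketch only, not GitHub's actual implementation; the class and method names are invented for illustration.

```python
class ForkNetwork:
    """Toy model: one commit store shared by every repository in the network."""

    def __init__(self) -> None:
        self.commits = {}   # sha -> content, shared across the whole network
        self.repos = set()  # names of repositories that currently exist

    def add_repo(self, name: str) -> None:
        self.repos.add(name)

    def push(self, repo: str, sha: str, content: str) -> None:
        if repo not in self.repos:
            raise KeyError(repo)
        self.commits[sha] = content

    def delete_repo(self, name: str) -> None:
        # Only the repository entry goes away; shared commits are untouched.
        self.repos.discard(name)

    def read(self, via_repo: str, sha_prefix: str) -> str:
        """Read a commit through any surviving repository in the network."""
        if via_repo not in self.repos:
            raise KeyError(via_repo)
        matches = [s for s in self.commits if s.startswith(sha_prefix)]
        if len(matches) != 1:
            raise LookupError("prefix is missing or ambiguous")
        return self.commits[matches[0]]


network = ForkNetwork()
network.add_repo("upstream")
network.add_repo("fork")
network.push("fork", "deadbeef01", "API_KEY=not-a-real-key")
network.delete_repo("fork")
# The commit pushed to the deleted fork is still readable via upstream.
print(network.read("upstream", "dead"))
```

In this model, data only becomes unreachable once no repository in the network remains, which mirrors the "at least one fork" condition discussed above.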
The Implications: What This Means for You
The main takeaway is that any commit made in a GitHub repository network is effectively permanent. Deleting a repository or fork does not erase the data; it will remain accessible as long as there is at least one fork in existence. This means that if you accidentally commit sensitive information, like an API key, simply deleting the commit or repository is not enough—you must rotate the key immediately.
For sensitive information like personal data, the situation is even more dire. Once committed, that data could be publicly accessible forever, posing a significant security and privacy risk.
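Since deletion cannot be relied on, it helps to catch secrets before they are committed at all. The sketch below scans text for strings that look like two common credential formats. The patterns are illustrative assumptions (an AWS-style access key ID and a GitHub-style personal access token), are far from exhaustive, and are no substitute for a dedicated secret scanner or for rotating a key that has already leaked.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners cover many more formats.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secret_like(text: str) -> List[Tuple[str, str]]:
    """Return (pattern name, matched string) pairs found in the text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


# AKIAIOSFODNN7EXAMPLE is a placeholder key ID, not a real credential.
example = "aws_key = AKIAIOSFODNN7EXAMPLE\nname = demo\n"
for name, value in find_secret_like(example):
    print(f"possible {name}: {value}")
```

A check like this could run as a pre-commit hook so that a match blocks the commit before the data ever reaches a repository network.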
Should GitHub Change This?
GitHub has a strong reputation for security, and they’ve invested heavily in ensuring that private data remains private. However, this issue highlights a significant gap in their security model. The separation between private and public repositories is seen by many users as a security boundary, and the expectation is that private data remains private. Unfortunately, as we’ve seen, that’s not always the case.
It’s time for GitHub to reconsider this design decision. While the platform was built for open collaboration, the security needs of its users have evolved. GitHub should prioritize user security and consider updating its repository network model to ensure that sensitive information can be fully and permanently deleted when needed.
Conclusion
The fact that deleted and private repository data on GitHub remains accessible is a serious security concern. It’s a reminder that we must be vigilant about what we commit to repositories and understand that deleting something from GitHub doesn’t necessarily mean it’s gone. GitHub’s current design reflects its roots in open source, but as the platform has grown and its user base has diversified, it’s clear that changes are needed to better protect sensitive information.
So, did you know GitHub worked this way? It’s a wake-up call for many, and it’s time to start asking for better protections for our data.