GitHub besieged by millions of malicious repositories in ongoing attack

Written By Sedoso Feb


GitHub is struggling to contain an ongoing attack that’s flooding the site with millions of code repositories. These repositories contain obfuscated malware that steals passwords and cryptocurrency from developer devices, researchers said.

The malicious repositories are clones of legitimate ones, making them hard to tell apart at a glance. An unknown party has automated a process that forks legitimate repositories, meaning the source code is copied so developers can use it in an independent project that builds on the original one. The result is millions of forks with names identical to the originals, each carrying an added payload wrapped in seven layers of obfuscation. To make matters worse, some people, unaware that the imitators are malicious, are forking the forks, which adds to the flood.


“Most of the forked repos are quickly removed by GitHub, which identifies the automation,” Matan Giladi and Gil David, researchers at security firm Apiiro, wrote Wednesday. “However, the automation detection seems to miss many repos, and the ones that were uploaded manually survive. Because the whole attack chain seems to be mostly automated on a large scale, the 1% that survive still amount to thousands of malicious repos.”

Given the constant churn of new repos being uploaded and GitHub removing them, it’s hard to estimate precisely how many there are of each. The researchers said the number of repos uploaded or forked before GitHub removes them is likely in the millions. They said the attack “impacts more than 100,000 GitHub repositories.”

GitHub officials didn’t dispute Apiiro’s estimates and didn’t answer other questions sent by email. Instead, they issued the following statement:

“GitHub hosts over 100M developers building across over 420M repositories, and is committed to providing a safe and secure platform for developers. We have teams dedicated to detecting, analyzing, and removing content and accounts that violate our Acceptable Use Policies. We employ manual reviews and at-scale detections that use machine learning and constantly evolve and adapt to adversarial tactics. We also encourage customers and community members to report abuse and spam.”

Supply-chain attacks that target users of developer platforms have existed since at least 2016, when a college student uploaded custom scripts to RubyGems, PyPI, and NPM. The scripts bore names similar to widely used legitimate packages, but otherwise had no connection to them. A phone-home feature in the student’s scripts showed that the imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half the time his code was given all-powerful administrative rights. Two of the affected domains ended in .mil, an indication that people inside the US military had run his script. This form of supply-chain attack is often referred to as typosquatting, because it relies on users making small errors when choosing the name of a package they want to use.
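
Typosquatting works because a mistyped name can still resolve to a real package. As a rough illustration of one way to catch such slips before installing, the sketch below compares a requested name against a short allowlist using Python's standard difflib module. The allowlist, the function name find_lookalikes, and the 0.8 similarity cutoff are all assumptions made for this example; they are not drawn from the 2016 research or from any registry's own defenses.

```python
import difflib

# Hypothetical allowlist of packages a team actually depends on.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "urllib3", "setuptools"]


def find_lookalikes(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return known package names that `name` resembles but does not equal."""
    candidate = name.lower()
    if candidate in known:
        # Exact match: the requested name is the real package.
        return []
    # Names scoring at or above the cutoff are likely typos of a known package.
    return difflib.get_close_matches(candidate, known, n=3, cutoff=cutoff)


if __name__ == "__main__":
    for requested in ["requests", "reqeusts", "flask"]:
        print(requested, find_lookalikes(requested))
```

A non-empty result means the name is close to, but not the same as, something on the allowlist, which is a reason to pause and check the spelling. An empty result says nothing about whether an unfamiliar package is safe.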

In 2021, a researcher used a similar technique to successfully execute counterfeit code on networks belonging to Apple, Microsoft, Tesla, and dozens of other companies. The technique—known as a dependency confusion or namespace confusion attack—started by placing malicious code packages in an official public repository and giving them the same name as dependency packages Apple and the other targeted companies use in their products. Automated scripts inside the package managers used by the companies then automatically downloaded and installed the counterfeit dependency code.

The technique observed by Apiiro is known as repo confusion.

“Similar to dependency confusion attacks, malicious actors get their target to download their malicious version instead of the real one,” Wednesday’s post explained. “But dependency confusion attacks take advantage of how package managers work, while repo confusion attacks simply rely on humans to mistakenly pick the malicious version over the real one, sometimes employing social engineering techniques as well.”

The flow of the campaign is simple:

  1. Cloning existing repos (for example: TwitterFollowBot, WhatsappBOT, discord-boost-tool, Twitch-Follow-Bot, and hundreds more)
  2. Infecting them with malware loaders
  3. Uploading them back to GitHub with identical names
  4. Automatically forking each thousands of times
  5. Covertly promoting them across the web via forums, Discord, etc.

Developers who use any of the malicious repos in the campaign unpack a payload buried under seven layers of obfuscation to receive malicious Python code and, later, an executable file. The code—mainly consisting of a modified version of the open source BlackCap-Grabber—then collects authentication cookies and login credentials from various apps and sends them to a server controlled by the attacker. The researchers said the malicious repo “performs a long series of additional malicious activities.”

This image demonstrates how the payload works:

GIF showing the flow of malicious payload inside repositories.

The campaign began in May 2023 and was ongoing at the time this post went live on Ars. Apiiro said there have been three main phases so far:

May 2023: As originally reported by Phylum, several malicious packages were uploaded to PyPI containing early parts of the current payload. These packages were spread by 'os.system("pip install package")' calls planted in forks of popular GitHub repos, such as 'chatgpt-api.'
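
For readers who want to look for the kind of planted call described in this first phase, here is a minimal sketch that walks a Python file's syntax tree and reports os.system calls whose literal argument contains "pip install". The function name find_pip_install_calls is hypothetical, and the check is deliberately narrow: it only sees plain string literals passed to a call written exactly as os.system(...), so it would not catch the obfuscated payloads of the later phases.

```python
import ast


def find_pip_install_calls(source):
    """Return (line, command) pairs for os.system("...pip install...") calls.

    Only calls written as os.system(<string literal>) are detected.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_os_system = (
            isinstance(func, ast.Attribute)
            and func.attr == "system"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        )
        if not is_os_system or not node.args:
            continue
        arg = node.args[0]
        # Only literal string arguments can be inspected statically.
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            if "pip install" in arg.value:
                hits.append((node.lineno, arg.value))
    return hits


if __name__ == "__main__":
    sample = 'import os\nos.system("pip install package")\nprint("hi")\n'
    print(find_pip_install_calls(sample))
```

The source is parsed, never executed, so the scan is safe to run on untrusted files. A hit is a prompt for manual review, not proof of malice, since legitimate scripts sometimes install their own dependencies this way.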

July–August 2023: Several malicious repos were uploaded to GitHub, this time delivering the payload directly instead of through importing PyPI packages. This came after PyPI removed the malicious packages, and the security community increased its focus there. Aliakbar Zahravi and Peter Girnus from Trend Micro published a great technical analysis of it.

November 2023–now: We have detected more than 100,000 repos containing similar malicious payloads, and the number keeps growing. This attack approach has several advantages:

  • GitHub is huge, therefore despite the large number of instances, their relative portion is still insignificant and thus hard to detect.
  • Package managers are not involved as before, therefore explicit malicious package names are not mentioned, so that’s one less indicator.
  • The targeted repos are in a small niche and have low popularity, making it easier for unsuspecting developers to make the mistake and clone their malicious impersonators.
Diagram showing the three phases of the campaign.

Wednesday’s post didn’t say how many downloads or installs, if any, the malicious repos in this campaign have received, and Apiiro representatives didn’t respond to an email seeking this and other details. Without this information, it’s hard to assess how much of a real-world threat the flood of malicious uploads to GitHub is. Given the sheer number of forks and the sustained duration of the campaign, developers would do well to be aware of the risk and ensure downloads come from legitimate sources.
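
One modest way to act on that advice is to check a repository's metadata before cloning it. The sketch below assumes the caller has already fetched the repository's JSON description from GitHub's REST API and parsed it into a dictionary. It relies on the "full_name", "fork", and "parent" fields of that response. The function name origin_warnings and the sample data are hypothetical. The check is also limited: it flags forks and owner mismatches, but a malicious copy uploaded as a fresh repository under a different owner is caught only if the expected owner is known in advance.

```python
def origin_warnings(repo, expected_full_name):
    """Return warnings about a repo's origin, given its parsed API metadata.

    `repo` is a dict shaped like GitHub's repository JSON;
    `expected_full_name` is the "owner/name" the caller intends to use.
    """
    warnings = []
    full_name = repo.get("full_name", "")
    if full_name.lower() != expected_full_name.lower():
        warnings.append(f"repository is {full_name}, expected {expected_full_name}")
    if repo.get("fork"):
        # The parent object is present in the API response only for forks.
        parent = (repo.get("parent") or {}).get("full_name", "unknown")
        warnings.append(f"repository is a fork of {parent}")
    return warnings


if __name__ == "__main__":
    # Hypothetical metadata for an impostor fork.
    suspicious = {
        "full_name": "someone/tool",
        "fork": True,
        "parent": {"full_name": "example-org/tool"},
    }
    for line in origin_warnings(suspicious, "example-org/tool"):
        print(line)
```

An empty list means only that the owner matches and the repository is not marked as a fork. It is not a verdict on the code itself.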

