Fifteen-year-old N-day Python tarfile module vulnerability puts software supply chain under the microscope.
Cybersecurity company Trellix announced Wednesday that a known Python vulnerability puts an estimated 350,000 open-source projects and the applications that use them at risk of device takeover or malicious code execution. All applications that use the Python tarfile module are potentially at risk.
The tarfile module is part of Python's standard library, so it is available by default in any project using Python. It is found extensively in frameworks created by Netflix, AWS, Intel, Facebook and Google, as well as in applications used for machine learning, automation and Docker containerization, Trellix said.
Hackers can take over devices by using this vulnerability
The vulnerability, CVE-2007-4559, was originally discovered in 2007 and given a medium-severity score of 6.8 out of 10. An attacker can exploit it by uploading a malicious archive, which takes only two or three lines of code to generate, to an application that extracts it with tarfile.extract without sanitizing member names or with tarfile.extractall using its built-in defaults. Once the vulnerability is exploited, attackers can execute arbitrary code or take control of the device, Trellix said.
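To show why such an archive takes so little effort to build, here is a minimal sketch using only the standard library. It writes an in-memory TAR archive with a single member whose stored name starts with `../`. The member name, the payload and the `build_traversal_archive` helper are hypothetical placeholders, and nothing is extracted or written to disk.

```python
import io
import tarfile


def build_traversal_archive(member_name: str = "../escaped.txt", payload: bytes = b"demo") -> bytes:
    """Return the bytes of an uncompressed TAR archive holding one member named `member_name`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)  # the name, including "..", is stored as given
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    archive = build_traversal_archive()
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        print(tar.getnames())  # ['../escaped.txt']
```

The archive only carries the name. Whether the file actually lands outside the destination depends on how the receiving application extracts it.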
It is unknown how many live applications use the tarfile module, and no exploitation of the vulnerability has been observed in the wild, said Doug McKee, a principal engineer and director of Vulnerability Research at Trellix. Nor is he aware of any scanners looking for the exploit.
“Due to a vulnerability that went unpatched 15 years ago in a main software supply chain, hundreds of thousands of pieces of software are vulnerable to an attack today, which can lead to complete system compromise,” McKee said. “Like the events of Log4j, every organization will need to determine if and how they are affected, which is why we are releasing a script to help with that discernment process.”
The script to check for vulnerable applications is available on GitHub.
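Trellix's script is the reference tool here. For readers who want to understand what such a check looks for, the following is a rough, hypothetical sketch, not Trellix's code: it parses Python files with the standard `ast` module and flags any `.extract()` or `.extractall()` method call that does not pass a `filter=` keyword argument. Because it matches method names only, it will also flag unrelated calls such as `zipfile.ZipFile.extractall`, and it cannot tell whether a caller already validates member names.

```python
import ast
import sys
from pathlib import Path

# Method names whose calls we want a human to review (heuristic: matched by name only).
RISKY_METHODS = {"extract", "extractall"}


def find_risky_calls(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, method) pairs for .extract()/.extractall() calls with no filter= keyword."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in RISKY_METHODS
            and not any(kw.arg == "filter" for kw in node.keywords)
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


def scan_tree(root: str) -> None:
    """Print every flagged call in .py files under root that mention tarfile."""
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        if "tarfile" not in text:
            continue  # cheap pre-filter: only parse files that mention tarfile
        try:
            calls = find_risky_calls(text, str(path))
        except (SyntaxError, ValueError):
            continue  # not valid Python for this interpreter
        for lineno, method in calls:
            print(f"{path}:{lineno}: {method}() called without a filter argument")


if __name__ == "__main__":
    scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Run it as `python find_tar_extract.py path/to/repo` (the filename is arbitrary). Each hit is a candidate for manual review rather than a confirmed vulnerability.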
How the CVE-2007-4559 vulnerability was re-discovered
Kasimir Schulz, a vulnerability research intern at the Trellix Advanced Research Center, helped find the issue while investigating an unrelated vulnerability.
“Initially we thought we had found a new zero-day vulnerability,” he said in a blog post. “As we dug into the issue, we realized this was in fact CVE-2007-4559.”
CVE-2007-4559 is a path traversal vulnerability in the tarfile module's extract and extractall functions. It allows an attacker to overwrite arbitrary files by adding the “..” sequence to file names in a TAR archive, so that extracted files are written outside the intended directory, Schulz said.
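The effect of a “..” sequence is easiest to see by computing where a naive extractor would write each member. The sketch below assumes POSIX-style paths and ignores symlinks; `naive_destination` is a hypothetical helper, and `/srv/app/uploads` is a made-up destination directory. It joins the member name onto the destination and collapses “..” segments lexically, which matches where the operating system would place the file when no symlinks are involved. As Python's own documentation warns, a member name that is an absolute path escapes the destination in the same way.

```python
import posixpath


def naive_destination(dest_dir: str, member_name: str) -> str:
    """Where a member would land if its name were simply joined onto dest_dir (symlinks ignored)."""
    # join() keeps ".." segments (and discards dest_dir entirely for an absolute name);
    # normpath() then collapses them lexically.
    return posixpath.normpath(posixpath.join(dest_dir, member_name))


if __name__ == "__main__":
    print(naive_destination("/srv/app/uploads", "reports/q3.csv"))           # /srv/app/uploads/reports/q3.csv
    print(naive_destination("/srv/app/uploads", "../../../etc/cron.d/job"))  # /etc/cron.d/job
    print(naive_destination("/srv/app/uploads", "/etc/passwd"))              # /etc/passwd
```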
Using standard GitHub access, Trellix researchers discovered that hundreds of thousands of GitHub repositories were vulnerable. Working with GitHub, they found 2.87 million open-source files that imported Python’s tarfile module in about 588,000 unique repositories. Of those repositories, an estimated 61%, or roughly 350,000, were vulnerable to attack through the tarfile module.
“This is the devastating power of CVE-2007-4559,” McKee said. “It’s in a programming language that is widely used, therefore affects a very wide range of end-user products.”
Although the vulnerability was known, it has been allowed to spread through tutorials that present unsafe uses of the tarfile module as secure. Even Python’s own documentation provides incorrect information, Trellix said.
What companies can do to avoid an attack
Fixing the vulnerability requires changes to the code of each application that uses the tarfile module, McKee said. To avoid being hacked, developers need to check the destination path of every file the tarfile module writes, ensuring that data is extracted only into the directory the developer intended.
Trellix is submitting patches to open-source projects as GitHub pull requests to protect them from the vulnerability. It currently has patches ready for 11,005 repositories, and each patch will be added to a forked copy of the repository.