Security Toolchain (3): Software Composition Analysis (SCA)
What is SCA?
Software Composition Analysis (SCA) is a technique used to identify, manage, and audit third-party components (especially open-source components) in software. It focuses on component dependency identification, known vulnerability scanning, and license compliance checks to ensure the security of the software supply chain.
SCA Core Competencies
1. Component Discovery and Inventory Generation
An SCA tool automatically identifies all direct and indirect dependencies by scanning source code, binaries, container images, or dependency management files, and builds a complete, structured Software Bill of Materials (SBOM) from the results. Mainstream SBOM formats include SPDX and CycloneDX, which provide a standardized vocabulary for exchanging component information.
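As a rough illustration of inventory generation, the sketch below parses a small dependency manifest and emits a CycloneDX-style document. The manifest content is made up, only pinned `name==version` lines are handled, and only a handful of CycloneDX fields are filled in; real tools also resolve transitive dependencies and add hashes, licenses, and other metadata.

```python
import json
import re

# Hypothetical manifest content in requirements.txt style (pinned versions only).
MANIFEST = """\
# web stack
requests==2.31.0
flask==3.0.0
urllib3==2.0.7
"""

PIN = re.compile(r"^([A-Za-z0-9._-]+)==([A-Za-z0-9.*+!_-]+)$")


def parse_manifest(text):
    """Return (name, version) pairs for every pinned requirement line."""
    components = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = PIN.match(line)
        if match:
            components.append((match.group(1).lower(), match.group(2)))
    return components


def build_sbom(components):
    """Build a minimal CycloneDX-style document (only a few fields are filled in)."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "library", "name": n, "version": v, "purl": f"pkg:pypi/{n}@{v}"}
            for n, v in components
        ],
    }


sbom = build_sbom(parse_manifest(MANIFEST))
print(json.dumps(sbom, indent=2))
```

The package URL (`purl`) gives each component an ecosystem-qualified identifier, which is what later matching steps typically key on.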
2. Vulnerability Matching
Once components have been identified, the SCA tool matches each component and version against well-known vulnerability sources (NVD, CISA KEV, CVE, etc.) to detect known security vulnerabilities in those components.
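A minimal sketch of version-range matching follows. The advisory records, component names, and identifiers are hypothetical, and versions are assumed to be plain dotted integers; real advisory feeds use richer range and version schemes.

```python
def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical advisory records: the affected range is [introduced, fixed).
ADVISORIES = [
    {"id": "EXAMPLE-0001", "component": "libfoo", "introduced": "1.0.0", "fixed": "1.4.2"},
    {"id": "EXAMPLE-0002", "component": "libbar", "introduced": "2.0.0", "fixed": "2.3.0"},
]


def match_vulnerabilities(components, advisories):
    """Return (name, version, advisory id, fixed version) for every affected component."""
    findings = []
    for name, version in components:
        current = parse_version(version)
        for adv in advisories:
            if adv["component"] != name:
                continue
            if parse_version(adv["introduced"]) <= current < parse_version(adv["fixed"]):
                findings.append((name, version, adv["id"], adv["fixed"]))
    return findings


inventory = [("libfoo", "1.3.9"), ("libbar", "2.3.0"), ("libbaz", "0.9.0")]
findings = match_vulnerabilities(inventory, ADVISORIES)
for name, version, adv_id, fixed in findings:
    print(f"{name} {version} is affected by {adv_id}; fixed in {fixed}")
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.9.0".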
3. License Compliance Management
Open source software is not free of obligations; its use is governed by the specific open source license attached to each component. The SCA tool can detect the license type of each open source component, which helps teams assess compliance.
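The sketch below shows one way detected licenses might be checked against an organizational policy. The policy table and component names are hypothetical; the keys are SPDX license identifiers, and the allow/review/deny assignments are placeholders rather than legal guidance.

```python
# Hypothetical policy: SPDX license identifier -> action.
# Real policies are defined by an organization's legal team.
POLICY = {
    "MIT": "allow",
    "Apache-2.0": "allow",
    "BSD-3-Clause": "allow",
    "LGPL-3.0-only": "review",
    "GPL-3.0-only": "review",
    "AGPL-3.0-only": "deny",
}


def classify_licenses(components, policy):
    """Group components by policy action; unknown licenses default to review."""
    report = {"allow": [], "review": [], "deny": []}
    for name, license_id in components:
        action = policy.get(license_id, "review")
        report[action].append(name)
    return report


detected = [("libfoo", "MIT"), ("libbar", "AGPL-3.0-only"), ("libbaz", "Custom-EULA")]
report = classify_licenses(detected, POLICY)
print(report)
```

Routing unrecognized licenses to manual review, rather than silently allowing them, is the conservative default in this sketch.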
4. Open Source Component Health Monitoring
The SCA tool can assess the "health indicators" of components, such as community activity, update frequency, number of unresolved issues, and the presence of a stable and continuous maintainer. This assists open source users in evaluating the health status of open source projects and eliminating potential risks.
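How such indicators are combined varies by tool. The following is only a toy heuristic: the three signals, their equal weighting, the two-year freshness window, and the cap at three maintainers are all assumptions made for illustration.

```python
from datetime import date


def health_score(last_release, open_issues, closed_issues, maintainers, today):
    """Return a 0..1 score from three equally weighted, hypothetical signals."""
    days_since_release = (today - last_release).days
    freshness = max(0.0, 1.0 - days_since_release / 730)  # reaches 0 after about two years
    total = open_issues + closed_issues
    issue_handling = closed_issues / total if total else 1.0
    maintainer_depth = min(maintainers, 3) / 3  # saturates at three maintainers
    return round((freshness + issue_handling + maintainer_depth) / 3, 2)


today = date(2025, 1, 1)
active = health_score(date(2024, 11, 1), 20, 180, 4, today)
abandoned = health_score(date(2021, 6, 1), 150, 50, 1, today)
print(active, abandoned)
```

A score like this is best read as a prompt for closer inspection, not as a verdict on a project.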
Principle of SCA Implementation
1. Source Code Scanning
Source code scanning identifies software components through multi-dimensional analysis. It first locates and matches source code directories and files. It then analyses the code to identify reusable fragments and automatically parses dependency management configuration files to obtain declared dependencies.
In addition, it can simulate or invoke the build system (e.g., by running the `mvn dependency:tree` command) to capture the complete set of transitive dependencies. By matching and correlating these multi-level features against the SCA feature knowledge base, the tool automatically generates a comprehensive and accurate Software Bill of Materials (SBOM).
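To make the distinction between declared and transitive dependencies concrete, here is a small sketch that walks an already-resolved dependency graph breadth-first. The graph and package names are hypothetical; in practice the graph would come from the build system or a lock file.

```python
from collections import deque

# Hypothetical resolved dependency graph: package -> packages it depends on.
GRAPH = {
    "app": ["web-framework", "json-lib"],
    "web-framework": ["http-core", "template-engine"],
    "http-core": ["tls-lib"],
    "template-engine": [],
    "json-lib": [],
    "tls-lib": [],
}


def resolve_dependencies(graph, root):
    """Breadth-first walk returning {package: depth}; depth 1 means a direct dependency."""
    depth = {}
    queue = deque((dep, 1) for dep in graph.get(root, []))
    while queue:
        package, level = queue.popleft()
        if package in depth:
            continue  # already reached by a shorter or equal path
        depth[package] = level
        queue.extend((dep, level + 1) for dep in graph.get(package, []))
    return depth


depths = resolve_dependencies(GRAPH, "app")
direct = sorted(p for p, d in depths.items() if d == 1)
transitive = sorted(p for p, d in depths.items() if d > 1)
print("direct:", direct)
print("transitive:", transitive)
```

Only the two direct dependencies appear in the application's own manifest; the other three are visible only after resolution, which is why parsing manifests alone undercounts components.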
Figure 1. SCA Source Code Scanning Principle
2. Binary Scanning
1) Feature Extraction
Binary scanning begins by unpacking and structurally parsing the binary file to calculate file-level features for the entire file or key sections. Metadata feature sets, such as string constants, are obtained through string extraction, symbol table analysis, and metadata parsing. The binary is then disassembled. Through function identification and instruction normalization, an Attributed Control Flow Graph (ACFG) and a Function Call Graph (FCG) are generated.
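The file-level and string-based features described above can be approximated in a few lines. This sketch covers only hashing and printable-string extraction; unpacking, symbol table parsing, disassembly, and ACFG/FCG construction need dedicated tooling and are out of scope here. The sample bytes are a stand-in, not a real binary.

```python
import hashlib
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")  # runs of 4+ printable ASCII bytes


def extract_features(blob):
    """Return a file-level hash plus the string constants found in the blob."""
    return {
        "sha256": hashlib.sha256(blob).hexdigest(),
        "strings": [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)],
    }


# Stand-in bytes; a real scanner would read and unpack an actual binary file.
sample = b"\x7fELF\x00\x01libexample 1.2.3\x00\x00\xffinflate\x00ab\x00"
features = extract_features(sample)
print(features["strings"])
```

The minimum run length of four is an arbitrary cutoff chosen to filter out short accidental byte sequences.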
Figure 2. SCA Binary Scan Feature Extraction Principle
2) Third-Party Library Identification
The system calculates the ACFG similarity between each function in the target binary and the functions in the knowledge base, and selects the function with the highest match as the initial anchor point. Starting from this anchor, it expands along the function call graph to adjacent nodes, forming a cluster of high-similarity functions.
Finally, the overall similarity of this cluster is compared with the function call graph structures of known third-party libraries in the knowledge base. When a predefined threshold is reached, the library is accurately identified.
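The anchor-and-expand idea can be sketched with heavy simplifications. Below, each function is reduced to a small numeric feature vector standing in for its ACFG, similarity is plain cosine similarity, and the final comparison is a simple coverage ratio rather than a structural comparison of call graphs. All names, vectors, and the 0.9 threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_library(target_vectors, target_calls, library_functions, threshold=0.9):
    # 1. Score every target function by its best match in the known library.
    scores = {
        name: max(cosine(vec, ref) for ref in library_functions.values())
        for name, vec in target_vectors.items()
    }
    # 2. Anchor: the best-matching target function.
    anchor = max(scores, key=scores.get)
    if scores[anchor] < threshold:
        return set(), 0.0
    # 3. Expand along call edges while neighbours stay above the threshold.
    cluster, frontier = {anchor}, [anchor]
    while frontier:
        current = frontier.pop()
        for neighbour in target_calls.get(current, []):
            if neighbour not in cluster and scores[neighbour] >= threshold:
                cluster.add(neighbour)
                frontier.append(neighbour)
    # 4. Crude overall score: share of library functions the cluster accounts for.
    return cluster, min(1.0, len(cluster) / len(library_functions))


LIBRARY = {"inflate": [5, 2, 9], "deflate": [7, 3, 4], "crc32": [1, 8, 2]}
TARGET = {  # stripped symbol names, as is common in release binaries
    "sub_401000": [5, 2, 9],
    "sub_401200": [7, 3, 5],
    "sub_401400": [1, 8, 2],
    "main": [1, 0, 0],
}
CALLS = {  # adjacency treated as undirected for simplicity
    "main": ["sub_401000"],
    "sub_401000": ["main", "sub_401200"],
    "sub_401200": ["sub_401000", "sub_401400"],
    "sub_401400": ["sub_401200"],
}

cluster, coverage = identify_library(TARGET, CALLS, LIBRARY)
print(sorted(cluster), coverage)
```

Expansion stops at `main` because it resembles none of the library functions closely enough, so application code stays outside the cluster.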
Figure 3. SCA Binary Scan Third-Party Library Identification Principle
3) Version Identification
Version identification begins by extracting string information from the target binary library as key clues, and simultaneously performing a deep comparison with the baseline version in the knowledge base to accurately identify reused original code blocks and newly added code regions. By analyzing this code inheritance and evolution pattern and matching it with the version feature library, the specific version or version range of the target library can be accurately determined.
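One simplified way to picture matching against a version feature library is set overlap. In this sketch each known release is described by a set of characteristic strings, and the target is assigned to the release or releases with the highest Jaccard similarity. The feature sets are hypothetical, and code-block comparison against a baseline version is not modelled.

```python
# Hypothetical version feature library: characteristic strings per release.
VERSION_FEATURES = {
    "1.0": {"init_ctx", "legacy_alloc"},
    "1.1": {"init_ctx", "legacy_alloc", "stream_reset"},
    "1.2": {"init_ctx", "stream_reset", "fast_path_v2"},
}


def identify_version(target_strings, version_features):
    """Return the best-matching version, or several versions when they tie."""
    known = set().union(*version_features.values())
    observed = set(target_strings) & known  # ignore strings no release is known to contain
    scores = {}
    for version, features in version_features.items():
        union = observed | features
        scores[version] = len(observed & features) / len(union) if union else 0.0
    best = max(scores.values())
    return sorted(v for v, s in scores.items() if s == best)


exact = identify_version({"init_ctx", "legacy_alloc", "stream_reset", "printf"}, VERSION_FEATURES)
ambiguous = identify_version({"init_ctx", "stream_reset"}, VERSION_FEATURES)
print(exact, ambiguous)
```

When the evidence cannot separate adjacent releases, the function returns all tied candidates, which corresponds to reporting a version range instead of a single version.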
Figure 4. SCA Binary Scan Version Identification Principle
Advantages and Limitations of SCA

Figure 5. Advantages and Limitations of SCA
Advantages:
1. Ease of Use: Compared to tools like DAST and IAST, which require a runtime environment and complex configuration, SCA is easier to adopt. Users typically only need to upload their code repository or build artifacts to start a deep scan, which greatly lowers the initial barrier to deploying security capabilities.
2. Depth of Dependency Insight: SCA can discover deeply hidden transitive dependencies, which are often "vulnerability blind spots" that are difficult to track manually.
3. Efficiency of Incident Response: Precise impact analysis based on the SBOM can shorten incident response time from days to hours or even minutes, replacing a panicked, organization-wide investigation with precise location and remediation.
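The incident response benefit comes down to a lookup across stored SBOMs. The sketch below assumes a hypothetical store that maps application names to (component, version) pairs, with versions written as plain dotted integers.

```python
# Hypothetical SBOM store: application -> list of (component, version).
SBOMS = {
    "billing-service": [("log-lib", "2.14.1"), ("json-lib", "1.9.0")],
    "web-portal": [("log-lib", "2.17.0"), ("template-engine", "3.1.0")],
    "batch-worker": [("log-lib", "2.15.0")],
}


def version_key(text):
    """Parse '2.14.1' into (2, 14, 1) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def affected_applications(sboms, component, fixed_version):
    """Return {application: deployed version} for apps running a pre-fix version."""
    hits = {}
    for app, components in sboms.items():
        for name, version in components:
            if name == component and version_key(version) < version_key(fixed_version):
                hits[app] = version
    return hits


impacted = affected_applications(SBOMS, "log-lib", "2.16.0")
print(impacted)
```

With inventories already on hand, answering "where are we exposed?" becomes a query over existing data instead of a fresh scan of every repository.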
Limitations:
1. False Positives and False Negatives: The detected SBOM list is not 100% accurate and may contain false positives and false negatives, requiring manual recalibration. Furthermore, vulnerability detection relies heavily on version matching; if the knowledge base is not updated in a timely manner, the risk of false negatives will increase significantly.
2. Insufficient Vulnerability Verification Capabilities: The vulnerability reachability analysis capabilities of mainstream SCA tools are still immature. While these tools can effectively inform users of a vulnerability, they cannot provide in-depth static analysis of whether the specific call path to that vulnerability is actually reachable within the application or whether the triggering conditions are satisfied.
3. Limitations of License Compliance Detection: In the field of open-source license compliance, SCA primarily provides automated detection and preliminary suggestions based on general rules. For complex and specific enterprise use cases, these tools struggle to provide clear and legally binding judgments. Ultimately, a compliance assessment by the legal team, tailored to the specific business scenario, is still required.
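For the second limitation, it may help to see what reachability analysis asks at its simplest: is there any call path from an application entry point to the vulnerable function? The sketch below answers that on a hypothetical, hand-written call graph. Building an accurate call graph for real code (dynamic dispatch, reflection, framework callbacks) and checking trigger conditions are the hard parts, and they are not addressed here.

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
CALL_GRAPH = {
    "main": ["handle_request"],
    "handle_request": ["parse_json"],
    "parse_json": ["jsonlib.loads"],
    "admin_tool": ["jsonlib.unsafe_eval"],
}


def find_call_path(call_graph, entry, target):
    """Return one call path from entry to target, or None if target is unreachable."""
    parents = {entry: None}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for callee in call_graph.get(current, []):
            if callee not in parents:
                parents[callee] = current
                queue.append(callee)
    return None


reachable = find_call_path(CALL_GRAPH, "main", "jsonlib.loads")
unreachable = find_call_path(CALL_GRAPH, "main", "jsonlib.unsafe_eval")
print(reachable, unreachable)
```

In this toy graph, a vulnerability in `jsonlib.unsafe_eval` would still be reported by version matching even though no path from `main` reaches it, which is exactly the gap described above.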
Future Trends of SCA
As technology advances and requirements change, SCA is moving towards greater intelligence, integration, and standardization:
1. Deep AI Empowerment: AI enhances code semantic understanding, enabling more accurate assessment of vulnerability exploitability. Generative AI assistants may be integrated to automatically analyze the impact of remediation solutions and even generate preliminary patch code or upgrade scripts.
2. Software Supply Chain Poisoning Attack Detection: Poisoning attacks targeting the open-source ecosystem are increasingly rampant. Attackers spread malicious code by releasing malicious packages with similar names or hijacking legitimate packages. Future SCA will enhance its ability to detect malicious packages, using behavioral analysis, reputation scoring, and other technologies to identify potential malicious behavior before components are introduced into projects.
3. Continuously Improving Component Version Detection Rate: Multi-technology integration and knowledge base expansion enable more accurate identification in complex scenarios. Combining dependency analysis, code fingerprinting, binary analysis, code snippet matching, and other techniques ensures that components can be identified whether they are declared through package managers, copied directly into the source tree, or embedded in binary files. Continuously maintaining and expanding the component version signature library, and broadening coverage of emerging ecosystems (such as Rust and specific domestic components), further improves the detection rate.
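As a small illustration of the similar-name attacks mentioned in item 2, the sketch below flags package names that sit within a small edit distance of a popular package. The popularity list, the candidate names, and the distance cutoff of 2 are assumptions; production detectors combine many more signals, such as behavioral analysis and reputation scoring.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance using a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


# Illustrative list of well-known package names to protect.
POPULAR = ["requests", "numpy", "pandas", "urllib3"]


def flag_lookalikes(candidates, popular, max_distance=2):
    """Map each suspicious candidate to the popular name it closely resembles."""
    flagged = {}
    for name in candidates:
        for known in popular:
            if 0 < edit_distance(name, known) <= max_distance:
                flagged[name] = known
    return flagged


suspects = flag_lookalikes(["requests", "reqeusts", "numpyy", "leftpad"], POPULAR)
print(suspects)
```

Exact matches are skipped on purpose, since a distance of zero means the candidate is the legitimate package itself.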
Next Issue Preview
In this article, we explored the concept, capabilities, principles, and limitations of Software Composition Analysis (SCA). We learned that SCA is like creating a sophisticated "composition manual" for complex software systems. Through automated scanning, it reveals and inventories all open-source dependencies in the software, accurately locating security vulnerabilities and license risks. This process establishes a critical line of defense for software supply chain security in enterprises.
In the next article, we will delve into another key technology in the field of application security testing: SAST. Whereas SCA reviews third-party components, SAST works like a sophisticated "code X-ray machine." It scans source code in depth without running the program, tracing data flow and control flow at the semantic level to identify security vulnerabilities introduced by improper coding, such as SQL injection, cross-site scripting, and buffer overflows. We will learn how SAST enables "shift-left" security, acting as a code review expert during the coding stage and nipping risks in the bud.
Stay tuned as we unveil the core capabilities and implementation principles of SAST, gaining a deeper understanding of how to build robust security barriers within the software supply chain!