Challenges of LLM-driven Multi-agent System like OpenClaw
- Mar 25
- 4 min read
Written by SMU PHD Haoyu Wang & AIDX TECH Yifan Jia
We highlight three fundamental challenges that arise from the design of LLM-driven multi-agent systems such as OpenClaw.
(1) Externalized Control Flow

A fundamental shift introduced by systems like OpenClaw lies in the externalization of control flow from deterministic program logic to the reasoning process of large language models (LLMs). In traditional software systems, control flow — the sequence and conditions under which functions are executed — is explicitly defined, statically analyzable, and subject to formal verification. However, in LLM-driven multi-agent systems, this control flow is no longer hard-coded but instead dynamically generated at runtime through natural language reasoning. Decisions such as which skill to invoke, in what order, and under what conditions are delegated to the model, effectively transforming execution into an emergent property of probabilistic inference rather than a predefined structure.
This paradigm introduces significant challenges for system reliability and security. First, the non-deterministic nature of LLM outputs means that identical inputs may yield different execution traces, undermining reproducibility and predictability. Second, because the control logic is encoded implicitly in model weights rather than explicitly in code, it becomes difficult — if not impossible — to statically analyze or formally verify all possible execution paths. This lack of transparency weakens traditional guarantees around correctness and safety. From a security perspective, externalized control flow creates opportunities for adversarial manipulation, such as prompt injection attacks that subtly alter the model's reasoning process and redirect execution toward unintended or unsafe actions. In essence, the system relinquishes direct control over its own behavior, relying instead on a generative model whose internal decision-making process is opaque and only partially aligned with system-level policies. This represents a profound departure from established software engineering principles, where control flow is a primary locus of enforcement and assurance.
(2) Attack Surface Explosion via Skill Ecosystems
The extensibility of OpenClaw through a diverse ecosystem of built-in and third-party skills significantly amplifies the system's attack surface. Each skill encapsulates a specific capability — such as sending emails, accessing files, or interacting with external APIs — and is typically developed independently, often without uniform security standards or rigorous vetting processes. While this modularity enhances flexibility and scalability, it also introduces a critical dependency on the correctness and trustworthiness of every individual component. In effect, the overall security of the system becomes bounded by the weakest skill in the ecosystem.
This phenomenon can be understood as an instance of compositional security failure. Even if the core orchestration framework enforces certain safeguards, a single poorly designed or malicious skill can bypass these protections by exposing unintended functionality, mishandling sensitive data, or executing unsafe operations. Moreover, the dynamic nature of skill discovery and invocation — often driven by semantic matching or LLM-based reasoning — means that the system may select and execute skills in contexts that were not anticipated by their developers. This increases the likelihood of misuse, especially when skills are composed in novel ways across different domains.
The problem is further exacerbated by the introduction of third-party marketplaces, where skills can be published and integrated with minimal friction. While this promotes innovation, it also mirrors the well-documented security challenges of plugin-based architectures in browsers and operating systems, where extensions can act as vectors for privilege escalation, data exfiltration, or persistent compromise. In the absence of strict isolation, sandboxing, or capability-based access control, skills effectively operate with broad privileges inherited from the agent. Consequently, the system lacks a robust mechanism to contain the impact of a compromised or adversarial component. The resulting attack surface is not only large but also highly heterogeneous and difficult to reason about, posing significant challenges for both developers and security analysts.
(3) Semantic-Layer Attacks
Unlike traditional software systems, where attacks typically exploit vulnerabilities at the code, memory, or protocol level, OpenClaw and similar LLM-driven architectures are susceptible to a new class of threats operating at the semantic layer. These attacks do not rely on low-level exploits but instead manipulate the meaning and interpretation of natural language inputs to influence system behavior. Prompt injection is a canonical example, where an adversary embeds malicious instructions within seemingly benign content, causing the model to deviate from its intended task and execute unauthorized actions. Because the system interprets all inputs through the same linguistic interface, it becomes inherently difficult to distinguish between legitimate instructions and adversarial manipulations.
The core issue is that the boundary between data and control is blurred. In conventional systems, user input is treated as data and is strictly separated from executable code, with well-established mechanisms such as input validation and sanitization. In contrast, LLMs treat all text as potential instructions, meaning that any input — whether from a user, a document, or an external API — can influence control flow. This creates a powerful attack vector for indirect manipulation, where malicious content embedded in emails, web pages, or documents can be ingested by the agent and subsequently alter its reasoning process.
Furthermore, semantic-layer attacks are inherently difficult to detect and mitigate using traditional techniques. They do not exhibit clear signatures, such as anomalous system calls or memory corruption, and often appear contextually plausible. For instance, an injected instruction might be framed as a high-priority task or a system override, exploiting the model's tendency to follow authoritative or urgent directives. As a result, defenses based solely on pattern matching or static rules are insufficient. Instead, addressing these threats requires a deeper integration of security mechanisms into the reasoning and execution pipeline, ensuring that all actions are subject to explicit policy enforcement regardless of how they are semantically framed. Ultimately, semantic-layer attacks highlight a fundamental limitation of current LLM-based systems: their inability to robustly separate intent from instruction in an adversarial environment.





Comments