From Blindly Tapping "Yes" to Verifying Before Signing: How Does Sigil Build a Shield for AI Agents?

Guest Columnist

2026-07-03 08:30

AI Agents can act on behalf of humans, but a single vague confirmation message should not be allowed to decide the rights over our assets and devices.

AI Summary

Key point: As AI Agents begin performing sensitive operations on behalf of users, such as on-chain transactions, the traditional "vague confirmation" mechanism carries risks: black-box approval, forgery, and a trust paradox. Sigil, a product launched by imToken, applies the principle "what you see is what you sign." It converts the real request into a clear confirmation card and requires the user to confirm independently via Passkey and biometric authentication, building a security shield between the Agent and the wallet and ensuring the user keeps the right to know and the final say.
Key elements:
1. New risks in AI Agent operations: user authorization suffers from a "black box" problem, in which the user approves a vague intent rather than the actual operation; chat replies carry no digital signature, and the confirmation screen can be forged by the Agent, leading to asset loss or data leaks.
2. Sigil's core mechanism: users can preset policies that let the Agent carry out low-risk operations automatically; for sensitive operations (such as fund transactions and on-chain signing), Sigil pauses the flow, converts the request into a clear confirmation card, sends it to Telegram, and has the user sign independently via Passkey and biometric authentication.
3. Three layers of protection: first, users can accurately see the real parameters (What you see is what you sign), such as the asset, amount, and recipient; second, the signer must be the user themselves (verified via Passkey); finally, the confirmation page is rendered by an independent module in a sandbox and cannot be forged or modified by the Agent.
4. Product positioning and use cases: Sigil is imToken's product exploration of the "Sign" theme. It applies not only to on-chain transactions but may in the future extend to data access, file modification, service purchases, and other delegated operations by AI Agents, with the goal of becoming infrastructure for users to manage smart identities and permissions.

Imagine a future where you just tell an AI Agent: "Use half of the available funds in my wallet to increase my ETH position."

The Agent immediately starts reading your balance, searching liquidity pools, comparing quotes, and constructing a transaction path. A few dozen seconds later, it sends you a message: "Found a suitable buying plan. Confirm?"

You reply with a "Yes."

But in that very moment, what exactly did you approve? Which trading pool did it choose, what is the estimated execution price and slippage, which protocol was called, which wallet and how many assets were used, and does it include token approval or other additional operations? You don't truly see any of this information; you are simply trusting the Agent's summary of the operation.

This is precisely the new type of risk that is gradually surfacing as AI Agents evolve from "answering questions" to "taking actions for humans": Agents can now browse the web, log into accounts, and even complete payments and on-chain signatures. However, the authorization interface a user ultimately faces often remains nothing more than a vague chat message and a confirmation option containing almost no valid information.

A single "Yes" begins to decide the fate of your funds, data, and devices.

Therefore, in imToken's latest brand upgrade, a fourth "S" – Sign – has emerged alongside Store, Send, and Stake. If the first three S's correspond to asset custody, value transfer, and network participation respectively, then Sign aims to solve the problem of how users can maintain their ultimate right to know, approve, and control when more and more software begins to act on their behalf.

Sigil is the first early proof-of-concept (POC) product under the 'Sign' proposition. Its core principle is simple: What you see is what you sign.

1. When Agents Start Acting, Why Does the Wallet Need to Re-understand Sign?

In the past, most signature risks faced by crypto wallets stemmed from users not understanding the content of the transaction.

An on-chain transaction might appear at the base level only as complex contract addresses, function parameters, and hexadecimal data. It's difficult for an average user to directly judge whether it means a transfer/swap or some more dangerous asset operation.

Therefore, wallets need to parse raw data into human-readable information, allowing users to see detailed information before signing (further reading: "Ethereum Pushes 'What You See Is What You Sign': Why Is Clear Signing a Necessary Capability Patch for the AI Era?"). Clear Signing was designed precisely to bridge the gap between machine data and user understanding.
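To make the idea of parsing raw data into human-readable fields concrete, here is a minimal sketch that decodes the calldata of a standard ERC-20 `transfer(address,uint256)` call. It is not taken from imToken or Sigil; the names `TransferSummary`, `decode_erc20_transfer`, and `format_amount` are hypothetical, and a real wallet would handle many more function types and look up the token's decimals rather than take them as an argument.

```python
from dataclasses import dataclass

# 4-byte selector of the standard ERC-20 transfer(address,uint256) function.
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


@dataclass(frozen=True)
class TransferSummary:
    """Hypothetical human-readable view of a token transfer."""
    recipient: str   # 0x-prefixed hex address
    raw_amount: int  # amount in the token's smallest unit


def decode_erc20_transfer(calldata_hex: str) -> TransferSummary:
    """Decode transfer(address,uint256) calldata into readable fields."""
    data = bytes.fromhex(calldata_hex.removeprefix("0x"))
    if len(data) != 4 + 32 + 32 or data[:4] != TRANSFER_SELECTOR:
        raise ValueError("not an ERC-20 transfer call")
    # Each argument fills one 32-byte word; an address is the low 20 bytes.
    recipient = "0x" + data[16:36].hex()
    raw_amount = int.from_bytes(data[36:68], "big")
    return TransferSummary(recipient, raw_amount)


def format_amount(raw_amount: int, decimals: int) -> str:
    """Render a raw integer amount using the token's decimals (assumed known)."""
    whole, frac = divmod(raw_amount, 10**decimals)
    return f"{whole}.{frac:0{decimals}d}".rstrip("0").rstrip(".")
```

Given this decoding, a wallet can show "send 1.5 of token X to 0x1111..." instead of a 68-byte hex string, which is the gap Clear Signing is meant to close.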

However, the problems introduced by AI Agents are far more complex.

What the user can't see is no longer just a single on-chain transaction, but potentially an entire operation chain automatically planned and executed by the Agent.

As mentioned, to achieve a goal like "use half of my available funds to increase my ETH position," an Agent might need to read wallet balances, search on-chain pools, call third-party tools, execute scripts, and complete transactions. In this process, the user can neither inspect every underlying request line by line nor avoid making a final decision before the assets are actually swapped.

The current authorization method adopted by many Agents is to send a brief description in a chat window, then wait for the user to reply "Yes," "Confirm," or click a generic button.

While this method seems to complete user authorization, it still has some obvious problems.

Firstly, it's a black box. The user knows they approved something but doesn't necessarily know the specific amount approved, the recipient, or what the Agent ultimately signed for them. The real operation parameters are hidden behind a highly generalized natural language sentence. What the user confirms is a vague intent, not the impending real action.

Secondly, a chat reply does not equal a digital signature. As long as someone has access to the logged-in device – whether by grabbing the phone, taking over the chat account, or simply operating it while next to the user – they could type a "Yes." The system can at most confirm the message came from an account, but cannot confirm it was actually authorized by the account owner.

More troubling is that the confirmation interface itself can be falsified. If an Agent can generate its own approval messages, then the party initiating the operation also controls the interface showing the operation's content to the user. It could easily omit key parameters, use vague language, or even display a seemingly harmless operation while submitting a different request in the background.

This creates an obvious trust paradox: We want to limit the Agent via the confirmation interface, yet we let the Agent itself decide what the user sees during confirmation.

When Agents only summarize articles or organize information, this opacity might just lead to wrong answers. But when they start accessing accounts, funds, file systems, and terminal environments, the consequences of a fuzzy approval can escalate from "inaccurate answers" to real asset loss, data breaches, or device risks (further reading: "Sign Is Not Just Signing: When an AI Agent Signs for You, Who Still Holds Control?").

Therefore, what the AI Agent era needs isn't more "Yes" buttons, but a signing mechanism that can prove "what the user saw, what the user approved, and what the system ultimately executed."

2. Sigil: The Signature Shield Between AI Agent and Wallet

This is precisely what imToken's newly launched Sigil aims to do – positioning itself as a security guardrail between the AI Agent and the wallet.

It doesn't try to stop the Agent from automatically executing all tasks. Instead, users can explicitly authorize the Agent during initial setup, defining which low-risk operations can be completed autonomously and which sensitive operations must be paused, awaiting an independent, clear, and verifiable approval from the user.

Within the set boundaries, the Agent can still act quickly.

But whenever an operation marked as sensitive by the user is involved – especially spending funds or signing transactions – Sigil pauses the flow, parses the real request into a clear confirmation card, and sends it to the user's Telegram. The user must complete the signing via Passkey and biometric authentication before the operation proceeds.

Overall, the entire process can be summarized in four steps:

Agent Initiates Operation: It can continue browsing the web, booking services, sending requests, or preparing a transaction – no different from a typical Agent's workflow.
Check Against Pre-set Security Policies: If it's a low-risk operation allowed for autonomous execution, the flow can continue. If it involves sensitive actions like sending messages, deleting files, running code, spending funds, or on-chain signing, Sigil pauses execution and parses the request.
User Explicitly Approves via Passkey: A clear confirmation card is sent to Telegram, directly showing the merchant, amount, recipient, and other key parameters. What the user sees isn't a sentence written by the Agent itself, but structured content parsed from the real operation.
Execution After Verification: The Agent can proceed only after Sigil's gateway verifies the user's signature. Without user approval, no funds move and nothing is signed.
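The four steps above can be sketched as a small gate function. This is an illustration under stated assumptions, not Sigil's implementation: `AgentRequest`, `render_card`, and `gate` are hypothetical names, the action kinds mirror the examples listed in the article, and the `ask_user` callback stands in for the Telegram card plus Passkey approval.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the sensitive actions the article lists.
SENSITIVE_KINDS = {
    "send_message", "delete_file", "run_code", "spend_funds", "sign_onchain",
}


@dataclass(frozen=True)
class AgentRequest:
    kind: str
    params: dict


def render_card(request: AgentRequest) -> str:
    """Build the card from the request itself, not from an Agent-written summary."""
    lines = [f"Action: {request.kind}"]
    lines += [f"{key}: {value}" for key, value in sorted(request.params.items())]
    return "\n".join(lines)


def gate(request: AgentRequest, ask_user, execute) -> str:
    """Run one request through the policy check and, if needed, user approval."""
    # Step 2: low-risk requests continue without interruption.
    if request.kind not in SENSITIVE_KINDS:
        execute(request)
        return "executed"
    # Step 3: pause and show the parsed request; ask_user returns True on approval.
    approved = ask_user(render_card(request))
    # Step 4: nothing runs unless the approval was given.
    if not approved:
        return "blocked"
    execute(request)
    return "executed"
```

The design point is that `render_card` reads only from the structured request, so the text the user approves cannot drift from what `execute` receives.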

The key to this mechanism isn't merely adding one more biometric check; it's re-establishing the relationship between display, signing, and execution: What is displayed is the actual request. What the user signs is the content displayed. What the system ultimately executes must be the exact request that was signed.

If these three are inconsistent, Sigil blocks the operation.
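One common way to enforce this kind of three-way consistency is to compare digests of a canonical encoding of the request. The sketch below assumes requests are plain JSON-compatible dictionaries and uses SHA-256; the article does not say how Sigil encodes or hashes requests, so `request_digest` and `consistent` are hypothetical.

```python
import hashlib
import json


def request_digest(request: dict) -> str:
    """Hash a canonical JSON encoding so equal requests give equal digests."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consistent(displayed: dict, signed_digest: str, to_execute: dict) -> bool:
    """True only if the displayed, signed, and executed requests are identical."""
    return request_digest(displayed) == signed_digest == request_digest(to_execute)
```

If an Agent swapped the recipient after the card was shown, the digest of the request to execute would no longer match the digest the user signed, and the check would fail.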

Ultimately, Sigil doesn't require the user to approve every single action of the Agent. Instead, through policy settings, it lets the user decide in advance which behaviors can be automated and which ones require personal approval. Users can directly choose different security levels like Relaxed, Balanced, or Strict, or enter Custom mode to set rules for each type of operation individually.

Take Balanced mode as an example: some low-risk behaviors need no additional approval, while higher-risk operations that bear on asset security, such as code execution or terminal commands, must go through Sigil confirmation.

As for spending funds and signing transactions, regardless of the chosen security strategy, personal approval is always required.

This is a boundary Sigil will not compromise on.
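The policy rules described in this section could look roughly like the sketch below. The level names come from the article; the contents of each preset are assumptions made for illustration (the article only says that Balanced covers code execution and terminal commands), and `needs_approval` is a hypothetical function name.

```python
# Spending funds and signing always need personal approval, whatever the level.
ALWAYS_APPROVE = frozenset({"spend_funds", "sign_transaction"})

# Assumed preset contents; the article does not enumerate them.
PRESETS = {
    "relaxed": frozenset(),
    "balanced": frozenset({"run_code", "terminal_command"}),
    "strict": frozenset(
        {"run_code", "terminal_command", "send_message", "delete_file"}
    ),
}


def needs_approval(kind: str, level: str = "balanced",
                   custom: frozenset = frozenset()) -> bool:
    """Decide whether an action kind must pause for user approval."""
    if kind in ALWAYS_APPROVE:
        return True  # the boundary that no policy level can relax
    if level == "custom":
        return kind in custom
    return kind in PRESETS[level]
```

Checking `ALWAYS_APPROVE` before any level lookup means even an empty Custom rule set cannot switch off approval for funds or signatures.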

3. From Crypto to AI Agents, What Does Sigil Aim to Protect?

Centered around "What you see is what you sign," Sigil provides three layers of protection.

First, users can accurately see what they are signing. For example, on Sigil's confirmation card, parameters like the protocol, amount, and recipient are parsed into clear fields. Users don't need to trust the Agent's summary or face incomprehensible raw data.

This card itself is the user's authorization content. Using the ETH transaction example from the beginning: what the user ultimately sees shouldn't just be a sentence "Buy ETH," but should include the actual asset and amount used, the transaction recipient, key transaction parameters, and other operational information the user needs to understand.

For real-world payment scenarios, it shouldn't just show "Confirm Payment" but should clearly list the merchant, amount, and payee. After all, the closer the displayed content is to the real operation, the more meaningful the user's authorization becomes.

Second, the only person who can truly sign is the user. Sigil uses Passkey as the secure entry point for approving operations and confirms the user's identity through device biometrics. Therefore, even if someone gains access to the logged-in Telegram device and can see the confirmation message, they cannot complete the approval just by typing text or clicking a regular button.

In other words, Passkey is bound to the user themselves, not to "the person currently holding the phone." It's worth noting that Sigil also adopts a seedless design. Users don't need to store or enter a new set of seed phrases, nor do they need to hand over their wallet's private key to the Agent. The true control over approval capability remains with the user's own Passkey and biometrics.

Finally, Sigil's confirmation page is not a regular message drawn on the fly by the Agent. It is an independent registered module whose content is fixed on-chain and rendered in a sandbox environment. This means the Agent cannot, after initiating a sensitive operation, replace the page itself, modify the display logic, or forge a look-alike confirmation interface to trick the user into signing.

The party making the request no longer simultaneously controls the interface showing the request. Combined with single-use signatures, short validity periods, and hashing request parameters, Sigil ensures that the content in the confirmation card corresponds to the final request waiting to be executed. This prevents signatures from being reused long-term and prevents request parameters from being silently swapped after user approval.

As long as the preview content doesn't match the actual request, the operation is blocked.
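The combination of single-use approvals, short validity, and hashed request parameters can be illustrated with a small sketch. It assumes the user's Passkey signature has already been verified elsewhere and only models the three checks named above; `Approval`, `ApprovalGate`, and the 120-second default are hypothetical, not values from Sigil.

```python
import hashlib
import json
import secrets
import time
from dataclasses import dataclass
from typing import Optional


def digest(request: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the request parameters."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    nonce: str          # unique per approval, so it can be spent only once
    request_hash: str   # binds the approval to exact request parameters
    expires_at: float   # Unix time after which the approval is void


class ApprovalGate:
    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl_seconds = ttl_seconds
        self._used = set()

    def issue(self, request: dict, now: Optional[float] = None) -> Approval:
        """Record an approval; assumes the user's signature was checked already."""
        now = time.time() if now is None else now
        return Approval(secrets.token_hex(16), digest(request),
                        now + self.ttl_seconds)

    def redeem(self, approval: Approval, request: dict,
               now: Optional[float] = None) -> bool:
        """Allow execution only for a fresh, unused, parameter-matching approval."""
        now = time.time() if now is None else now
        if approval.nonce in self._used:
            return False  # replay of an approval that was already spent
        if now > approval.expires_at:
            return False  # approval is too old
        if approval.request_hash != digest(request):
            return False  # parameters changed after the user approved
        self._used.add(approval.nonce)
        return True
```

Each check maps to one failure mode from the text: the nonce set stops reuse, the expiry stops long-lived approvals, and the hash comparison stops silent parameter swaps.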

Viewed in this context, Sigil is not just a new wallet feature but imToken's product exploration of the 'Sign' proposition. The focus is on a more fundamental question: When Agents start doing things, how can we ensure they still operate within the scope allowed by the user?

This need is particularly intuitive in the Crypto space. In the future, on-chain Agents could help users with recurring investments, yield management, fee payments, position adjustments, and risk monitoring, even autonomously executing operations across multiple protocols based on preset conditions. This makes it even more necessary to consider whether an Agent can be stopped immediately when its behavior deviates from user expectations.

At the same time, Sigil's significance is not limited to Crypto. Agents such as OpenClaw and Hermes, along with the many more that will run on personal devices and in the cloud, are gradually integrating with email, instant messaging, calendars, files, browsers, terminals, payment tools, and various online services.

Although these operations don't necessarily happen on the blockchain, their underlying relationship is essentially no different: the Agent is invoking a user's capability on their behalf. Therefore, Sigil may extend in the future from on-chain transactions to data access, identity usage, file modification, content publishing, service purchasing, and automated tasks.

This also explains why the capabilities accumulated by the wallet industry may gain new value in the AI Agent era – private key management, digital signatures, identity verification, permission confirmation, and asset security were primarily used for on-chain transactions before. However, the more fundamental problem they address has always been: how to prove that an action has received the genuine authorization of a specific entity.

When Agents start acting on behalf of humans on a large scale, this set of capabilities has the opportunity to extend from the Crypto world further, becoming infrastructure for users to manage smart identities, automated tasks, and machine permissions.

Therefore, as a joint exploration by imToken and OpenClaw, Sigil attempts to bring imToken's experience accumulated over the past decade in self-custody, wallets, and digital signatures into this new phase where autonomous Agents begin entering real execution environments.

It doesn't replace the Agent. It doesn't replace the wallet.

It stands between the two.

Final Thoughts

Overall, AI is making the capacity for action increasingly cheap.

Things that previously required the user to switch between multiple apps, involving searching, filling in forms, confirming, and paying, may in the future require just one natural language command for the Agent to automatically deconstruct and execute.

But "being able to act for the user" and "having obtained the user's valid authorization" are always two different things.

What truly determines whether an intelligent system is trustworthy is not just how many tasks it can complete, but whether the user can always understand it, limit it, and, when necessary, make it stop. From this perspective, Sign is not a redundant procedure hindering Agent efficiency. On the contrary, it is likely the most important layer of trust that must be in place before Agents can truly access assets and real-world services.

Store allows users to own assets. Send allows value to flow freely. Stake allows users to participate in open networks. And Sign aims to solve the problem of how users can retain their final say when more and more machines start acting on their behalf.

The value of Sigil lies precisely in taking this seemingly abstract proposition of control and, for the first time, pushing it towards a product that can be validated and continuously refined through a real demo.

Let's wait and see.
