Security Model
Nestor is designed with a security-first architecture. Every layer, from the Rust core to the Docker sandbox, is built to prevent AI agents from causing harm.
Security Layers
Nestor applies defense in depth with multiple security layers:
- Rust Security Core — Low-level protection via N-API bindings
- Docker Sandbox — Process isolation with minimal capabilities
- Guardrails — Configurable rules for tool access and approvals
- Circuit Breaker — Automatic protection against cascading failures
- StuckDetector — Detects and breaks agent loops
- Trust Scoring — Behavioral monitoring and grading
- Cost Budgets — Hard limits on spending per session/day
- Secret Redaction — Automatic detection and masking of sensitive data
Rust Security Core
The nestor-core crate is compiled to a native Node.js addon via N-API. It provides security primitives that are impossible to bypass from JavaScript:
SSRF Protection
All outgoing HTTP requests are validated against SSRF attacks:
- DNS pinning to prevent rebinding attacks
- Blocking of private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- Blocking of link-local and metadata endpoints (169.254.169.254)
- Protocol allowlisting (only http/https)
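The checks above can be sketched in a few lines. This is an illustrative Python sketch, not the nestor-core implementation: the names `check_url` and `is_blocked_ip` are hypothetical, only IPv4 is handled, and the caller is assumed to resolve the hostname once and connect to that same pinned address.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}

# Ranges listed above, plus the link-local block that contains 169.254.169.254.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",
    )
]


def is_blocked_ip(address: str) -> bool:
    """Return True if the IPv4 address falls inside a blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in BLOCKED_NETWORKS)


def check_url(url: str, resolved_ip: str) -> bool:
    """Return True if the request may proceed.

    resolved_ip is the address the caller resolved once and will connect to
    (the pinned address), so a later DNS answer cannot swap the target.
    """
    if urlsplit(url).scheme not in ALLOWED_SCHEMES:
        return False
    return not is_blocked_ip(resolved_ip)
```

Validating the resolved address rather than the hostname is what makes pinning useful: the address that was checked is the address that gets the connection.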
Path Traversal Prevention
File operations are validated to prevent escaping the working directory:
- Canonical path resolution before any file I/O
- Blocking of .. traversal sequences
- Symlink resolution and validation
- Configurable allowlist/blocklist for file patterns
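As a rough illustration of canonical resolution followed by a containment check, the sketch below resolves a requested path and rejects it if it lands outside the working directory. The function name `resolve_inside` is hypothetical and POSIX-style paths are assumed; the actual Rust implementation is not shown in this page.

```python
import os


def resolve_inside(workdir: str, requested: str) -> str:
    """Return the canonical path for requested, or raise if it escapes workdir."""
    root = os.path.realpath(workdir)
    # realpath collapses ".." segments and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(root, requested))
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError(f"path escapes working directory: {requested}")
    return candidate
```

Comparing canonical paths, rather than scanning the raw string for "..", also covers absolute paths and symlinks that point outside the working directory.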
Approval System
Sensitive operations require cryptographic approval tokens that cannot be forged by the agent.
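The token format is not documented here. One common way to build tokens an agent cannot forge is an HMAC over the exact operation, keyed with a secret the agent never sees. The sketch below assumes that design; `issue_token`, `verify_token`, and `APPROVAL_KEY` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Assumption: the key lives in the trusted core and is never exposed to the agent.
APPROVAL_KEY = secrets.token_bytes(32)


def issue_token(operation: str, key: bytes = APPROVAL_KEY) -> str:
    """Issue a token that binds an approval to one exact operation string."""
    return hmac.new(key, operation.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_token(operation: str, token: str, key: bytes = APPROVAL_KEY) -> bool:
    """Check the token in constant time; any change to the operation invalidates it."""
    expected = issue_token(operation, key)
    return hmac.compare_digest(expected, token)
```

Because the token covers the full operation string, an approval granted for one command cannot be replayed for a different one.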
Homoglyph Detection
New in v3.4.0, the Rust core detects Unicode homoglyph attacks, where visually similar characters (e.g., Cyrillic "а" (U+0430) vs Latin "a" (U+0061)) are used to disguise malicious URLs, filenames, or commands. All string inputs are normalized and validated before processing.
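A simple way to see the idea is a mixed-script check: flag any string whose letters come from more than one Unicode script. This Python sketch is only an approximation of what a homoglyph detector might do; the function names are hypothetical and the script is inferred from the first word of each character's Unicode name.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Collect the script prefix of each letter's Unicode name (LATIN, CYRILLIC, ...)."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # Names look like "CYRILLIC SMALL LETTER A"; the first word is the script.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def looks_mixed_script(text: str) -> bool:
    """Flag strings whose letters come from more than one script."""
    return len(scripts_used(text)) > 1
```

For example, `looks_mixed_script("p\u0430ypal.com")` flags the Cyrillic letter hidden among Latin ones, while the all-Latin `"paypal.com"` passes. Legitimate multilingual text would also be flagged, so a real detector likely needs more context than this.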
Skill Scanner
Before installing any skill, the Skill Scanner performs static analysis to detect potentially dangerous patterns:
- Shell injection vectors in tool definitions
- Unauthorized network access patterns
- File system escape attempts
- Obfuscated code or encoded payloads
Safe Regex
All user-provided regex patterns (custom redaction, guardrails) are validated against ReDoS (Regular Expression Denial of Service) attacks. The Rust core enforces maximum execution time and rejects catastrophic backtracking patterns.
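To illustrate what rejecting catastrophic backtracking can look like, the sketch below applies a conservative heuristic: refuse any pattern in which a quantified group itself contains a `+` or `*`. This is an assumption-laden Python sketch, not the Rust core's algorithm; it does not enforce an execution time limit, it misses other ReDoS shapes (such as overlapping alternations), and `MAX_PATTERN_LENGTH` is a hypothetical limit.

```python
import re

# Heuristic: a group that contains + or * and is itself followed by +, * or {.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")
MAX_PATTERN_LENGTH = 256  # hypothetical limit, not taken from Nestor


def is_probably_safe(pattern: str) -> bool:
    """Return False for patterns that are too long, invalid, or nest quantifiers."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        return False
    try:
        re.compile(pattern)
    except re.error:
        return False
    return NESTED_QUANTIFIER.search(pattern) is None
```

A pattern such as `(a+)+$` is rejected, while fixed-width patterns like `MYAPP-[A-Za-z0-9]{32}` pass.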
Docker Sandbox
When Docker is available, agent commands run inside an isolated container:
# Docker sandbox configuration
sandbox:
  enabled: true
  image: nestor-sandbox:latest
  capabilities:
    drop: [ALL]            # drop all Linux capabilities
  filesystem:
    read_only: true        # read-only root filesystem
    tmpfs: /tmp            # writable temp directory
    bind_mounts:
      - src: ./src
        dst: /workspace/src
        read_only: false   # agent can write to src/
  network: none            # no network access by default
  memory_limit: 512m
  cpu_limit: 1.0
  timeout: 300             # kill after 5 minutes
Important: The Docker sandbox is optional but strongly recommended for production use. Without it, agents execute commands directly on the host system with guardrails as the only protection.
Sandbox Modes
| Mode | Network | Filesystem | Use Case |
|---|---|---|---|
| strict | None | Read-only | Untrusted agents, security review |
| standard | None | Working dir writable | Code editing, file manipulation |
| relaxed | Allowed | Working dir writable | Web search, API calls |
Guardrails
Guardrails are configurable rules that constrain agent behavior:
Tool-Level Guardrails
guardrails:
  # Require human approval for these tools
  require_approval:
    - file_write
    - shell_exec
  # Block specific shell commands
  blocked_commands:
    - rm -rf
    - sudo
    - curl | sh
    - chmod 777
  # Restrict file write patterns
  file_restrictions:
    blocked_paths:
      - .env
      - .nestor/config.yaml
      - node_modules/
    blocked_extensions:
      - .exe
      - .sh
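The exact matching semantics of file_restrictions are not spelled out above. The Python sketch below shows one plausible reading, under these assumptions: paths are relative to the working directory and already canonicalized, a blocked path covers the entry itself and everything beneath it, and extensions are compared against the final suffix. The function name `is_write_blocked` is hypothetical.

```python
from pathlib import PurePosixPath

BLOCKED_PATHS = [".env", ".nestor/config.yaml", "node_modules/"]
BLOCKED_EXTENSIONS = {".exe", ".sh"}


def is_write_blocked(path: str) -> bool:
    """Return True if a write to path would violate the restrictions above."""
    p = PurePosixPath(path)
    if p.suffix in BLOCKED_EXTENSIONS:
        return True
    for blocked in BLOCKED_PATHS:
        b = PurePosixPath(blocked)
        # Block the entry itself and anything beneath it.
        if p == b or b in p.parents:
            return True
    return False
```

Under this reading, `node_modules/lodash/index.js` and `deploy.sh` are blocked, while `src/app.ts` is allowed. A nested `src/.env` is also allowed here, which may or may not match Nestor's behavior.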
Behavioral Guardrails
guardrails:
  # Limit the agent loop
  max_iterations: 25
  max_tokens_per_turn: 8192
  # Dry-run mode: preview all actions
  dry_run: false
  # Auto-approve safe operations
  auto_approve:
    - file_read
    - web_search
Guardrails CRUD API
Nestor v3.4.0 introduces a full CRUD API for managing guardrails at runtime, without restarting the server:
CLI Commands
# List all guardrails for an agent
npx nestor-sh guardrail list --agent coder
# Add a new guardrail rule
npx nestor-sh guardrail add --agent coder \
--type blocked_command --value "docker rm"
# Remove a guardrail rule
npx nestor-sh guardrail remove --agent coder \
--type blocked_command --value "docker rm"
# Update guardrail settings
npx nestor-sh guardrail set --agent coder \
--key max_iterations --value 50
Studio API
Guardrails can also be managed via the Studio dashboard REST API:
# GET /api/agents/:name/guardrails
# POST /api/agents/:name/guardrails
# PUT /api/agents/:name/guardrails/:id
# DELETE /api/agents/:name/guardrails/:id
Circuit Breaker
The circuit breaker protects against cascading failures when LLM providers are down or rate-limited:
How It Works
- Closed — Normal operation. Requests pass through to the LLM provider
- Open — Provider is failing. Requests are immediately rejected and fallback chain activates
- Half-Open — Testing recovery. A limited number of requests are allowed through
Configuration
# Circuit breaker settings
circuit_breaker:
  failure_threshold: 5   # open after 5 consecutive failures
  reset_timeout: 60000   # try again after 60 seconds
  half_open_max: 2       # allow 2 test requests in half-open
When a provider circuit opens, Nestor automatically routes to the next provider in the fallback chain (e.g., Claude fails, fall back to GPT-4o, then Gemini, then Ollama).
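The three states and the settings above map onto a small state machine. The Python sketch below is a minimal illustration under stated assumptions: the class and method names are hypothetical, reset_timeout is treated as milliseconds (matching the 60000 / 60 seconds comment), any failure during half-open reopens the circuit, and a single success closes it. Fallback routing is left out.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_ms=60000,
                 half_open_max=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout_ms / 1000.0  # seconds
        self.half_open_max = half_open_max
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.half_open_in_flight = 0

    def allow_request(self) -> bool:
        """Decide whether a request may be sent to the provider."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                return False  # still cooling down: reject immediately
            self.state = "half_open"
            self.half_open_in_flight = 0
        if self.state == "half_open":
            if self.half_open_in_flight >= self.half_open_max:
                return False  # test quota used up
            self.half_open_in_flight += 1
        return True

    def record_success(self) -> None:
        self.state = "closed"
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before each provider call and, when it returns False, move on to the next provider in the fallback chain.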
StuckDetector
The StuckDetector monitors agent behavior and intervenes when an agent enters a loop or makes no progress:
Detection Patterns
- Repetition loop — Agent repeats the same tool call 3+ times with identical parameters
- Error loop — Same error occurs on consecutive iterations
- No progress — No new findings, no new tool calls for N iterations
- Token spiral — Context grows without producing useful output
Recovery Actions
When stuck behavior is detected, Nestor can:
- Inject a system message asking the agent to change strategy
- Reset the conversation context to a previous checkpoint
- Switch to a different LLM provider
- Gracefully terminate with a partial report
# StuckDetector configuration
stuck_detector:
  enabled: true
  max_repeated_calls: 3   # detect after 3 identical calls
  max_error_streak: 5     # detect after 5 consecutive errors
  idle_iterations: 10     # detect after 10 iterations with no progress
  action: inject_hint     # inject_hint | reset | switch_llm | terminate
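The repetition-loop pattern is the easiest one to sketch. The Python example below fingerprints each tool call and reports a loop once the last max_repeated_calls calls are identical. It assumes the repeats must be consecutive, which the description above does not state explicitly, and `RepetitionDetector` is a hypothetical name.

```python
import json
from collections import deque


class RepetitionDetector:
    def __init__(self, max_repeated_calls: int = 3):
        self.max_repeated_calls = max_repeated_calls
        # Keep only the most recent N call fingerprints.
        self.recent = deque(maxlen=max_repeated_calls)

    def observe(self, tool: str, params: dict) -> bool:
        """Record a call; return True once the last N calls are identical."""
        # sort_keys makes the fingerprint independent of parameter order.
        fingerprint = (tool, json.dumps(params, sort_keys=True))
        self.recent.append(fingerprint)
        return (
            len(self.recent) == self.max_repeated_calls
            and len(set(self.recent)) == 1
        )
```

When `observe` returns True, the configured action (for example inject_hint) would be triggered.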
Trust Score System
The trust score is a composite metric (0-100, grade A-F) computed from an agent's execution history:
Score Components
| Component | Weight | What It Measures |
|---|---|---|
| Accuracy | 35% | Correctness of outputs and tool usage |
| Safety | 30% | Guardrail compliance, no blocked actions attempted |
| Efficiency | 20% | Token usage relative to task complexity |
| Reliability | 15% | Consistency across similar tasks |
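Given the weights in the table, the composite is a weighted sum of the four component scores. The Python sketch below assumes each component is already scored on a 0-100 scale; the grade cutoffs are hypothetical, since this page does not state where the A-F boundaries fall.

```python
WEIGHTS = {
    "accuracy": 0.35,
    "safety": 0.30,
    "efficiency": 0.20,
    "reliability": 0.15,
}

# Hypothetical grade cutoffs; the documentation does not specify them.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def trust_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each expected in the range 0-100."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the assumed cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"
```

For example, an agent scoring 90 on accuracy, 100 on safety, 70 on efficiency, and 80 on reliability gets 31.5 + 30 + 14 + 12 = 87.5, which is a B under the assumed cutoffs.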
Trust-Based Permissions
Agents can be granted or restricted permissions based on their trust score:
# Higher trust = more autonomy
trust_policies:
  A:
    auto_approve: [file_write, shell_exec]
    max_budget: 50.00
  B:
    auto_approve: [file_write]
    require_approval: [shell_exec]
    max_budget: 20.00
  C:
    require_approval: [file_write, shell_exec]
    max_budget: 5.00
  D:
    dry_run: true
    max_budget: 1.00
Secret Redaction
Nestor automatically detects and redacts secrets from agent outputs and logs. The Rust core includes 30+ patterns for:
- API keys (AWS, GCP, Azure, Anthropic, OpenAI, Stripe, etc.)
- OAuth tokens and JWT tokens
- Database connection strings
- SSH private keys
- Passwords in URLs
- Credit card numbers
- Custom patterns via regex configuration
# Custom redaction patterns
security:
  redaction:
    enabled: true
    custom_patterns:
      - name: internal_api_key
        pattern: "MYAPP-[A-Za-z0-9]{32}"
      - name: internal_token
        pattern: "tok_[a-f0-9]{40}"
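To show what the two custom patterns above would match, here is a short Python sketch that applies them to a string. The placeholder format `[REDACTED:<name>]` and the function name `redact` are assumptions for illustration; Nestor's actual masking format is not documented on this page.

```python
import re

# The two custom patterns from the configuration above.
CUSTOM_PATTERNS = {
    "internal_api_key": re.compile(r"MYAPP-[A-Za-z0-9]{32}"),
    "internal_token": re.compile(r"tok_[a-f0-9]{40}"),
}


def redact(text: str) -> str:
    """Replace every match of every pattern with a labeled placeholder."""
    for name, pattern in CUSTOM_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

Testing patterns this way against sample log lines before deploying them helps catch patterns that are too narrow or too greedy.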
Network Security
The server component includes multiple network security layers:
- Rate limiting — Configurable per-endpoint rate limits
- CORS — Strict origin validation
- Security headers — CSP, HSTS, X-Frame-Options, etc.
- Authentication — Token-based auth for the Studio API
- Input validation — Schema validation on all API inputs
Security Best Practices
- Always use the Docker sandbox in production environments
- Set cost budgets to prevent runaway spending
- Require approval for file_write and shell_exec on new agents
- Monitor trust scores and restrict low-trust agents
- Use dry-run mode when testing new skills or workflows
- Enable the circuit breaker for production deployments
- Configure the StuckDetector to prevent infinite loops
- Review agent outputs before deploying to production
- Keep Nestor updated to get the latest security patches
- Configure custom redaction patterns for your organization's secrets
v3.4.0 Security Additions
Release 3.4.0 (2026-04-17) resolves the findings of a full security audit (5 critical and 5 high severity) and hardens several layers of the defense-in-depth model. Key additions shipped in this release:
- Approval hardening — smart mode is now the default for chat and messaging bridges; dangerous shell commands are detected via regex before any execution reaches the approval checker.
- CLAUDE.md write-lock — writes targeting CLAUDE.md files are always gated through explicit approval, even when security.approvalMode is set to off.
- Extended sensitive paths — the protected path list now covers .credentials/, every .env variant, .ssh private keys, .aws, .kube, .docker, and .git/hooks/.
- OAuth cookies hardened — the OAuth state cookie for Google and GitHub flows is now emitted with secure: req.secure and sameSite: 'lax'.
- Webhook signatures — Telegram secret-token check (constant-time), Slack HMAC-SHA256 over v0:<ts>:<rawBody> with a 5-minute replay window, and Discord Ed25519 via tweetnacl.
- Audit framework hardening — fail-closed authentication, regex-based whitelist for audited commands, and argument-injection prevention across every evaluator entrypoint.
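The Slack check described above can be sketched as follows. This is an illustrative Python version, not Nestor's code: the function name is hypothetical, and the `v0=` prefix on the signature follows Slack's documented X-Slack-Signature header format.

```python
import hashlib
import hmac
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 5 * 60


def verify_slack_signature(signing_secret: bytes, timestamp: str,
                           raw_body: bytes, signature: str,
                           now: Optional[float] = None) -> bool:
    """Verify an HMAC-SHA256 signature over v0:<ts>:<rawBody>."""
    now = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    # Reject requests outside the 5-minute replay window.
    if abs(now - ts) > REPLAY_WINDOW_SECONDS:
        return False
    base = b"v0:" + timestamp.encode("utf-8") + b":" + raw_body
    expected = "v0=" + hmac.new(signing_secret, base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))
```

The signature must be computed over the raw request bytes; re-serializing parsed JSON can change whitespace or key order and make valid requests fail verification.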
Security Notice: AI agents can behave unpredictably. Never grant an agent access to production systems, financial accounts, or sensitive data without thorough testing and appropriate guardrails. Always apply the principle of least privilege.
Last updated 2026-04-26