Security Model

Nestor is designed with a security-first architecture. Every layer, from the Rust core to the Docker sandbox, is built to prevent AI agents from causing harm.

Security Layers

Nestor applies defense in depth with multiple security layers:

  1. Rust Security Core — Low-level protection via N-API bindings
  2. Docker Sandbox — Process isolation with minimal capabilities
  3. Guardrails — Configurable rules for tool access and approvals
  4. Circuit Breaker — Automatic protection against cascading failures
  5. StuckDetector — Detects and breaks agent loops
  6. Trust Scoring — Behavioral monitoring and grading
  7. Cost Budgets — Hard limits on spending per session/day
  8. Secret Redaction — Automatic detection and masking of sensitive data

Rust Security Core

The nestor-core crate is compiled to a native Node.js addon via N-API. It provides security primitives that are impossible to bypass from JavaScript:

SSRF Protection

All outgoing HTTP requests are validated against SSRF attacks.
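
To illustrate the kind of check involved, here is a minimal Python sketch of SSRF validation. It is not the nestor-core implementation (which is written in Rust); the function names, the allowed schemes, and the set of blocked address classes are all assumptions made for the example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web schemes are allowed


def is_blocked_address(ip: str) -> bool:
    """Return True for addresses that should never be reachable by an agent."""
    addr = ipaddress.ip_address(ip.split("%")[0])  # drop any IPv6 scope id
    return any((
        addr.is_private,      # 10.0.0.0/8, 192.168.0.0/16, ...
        addr.is_loopback,     # 127.0.0.0/8, ::1
        addr.is_link_local,   # 169.254.0.0/16 (cloud metadata endpoints)
        addr.is_multicast,
        addr.is_reserved,
        addr.is_unspecified,  # 0.0.0.0, ::
    ))


def validate_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Accept a URL only if every address it resolves to is public."""
    try:
        parts = urlsplit(url)
        host, port = parts.hostname, parts.port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    default_port = 443 if parts.scheme == "https" else 80
    try:
        infos = resolve(host, port or default_port, proto=socket.IPPROTO_TCP)
    except OSError:
        return False
    # Reject the URL if the lookup is empty or any resolved address is internal.
    return bool(infos) and all(not is_blocked_address(i[4][0]) for i in infos)
```

A check like this only helps if the request is then sent to the address that was validated; resolving the hostname a second time at connect time would reopen the door to DNS rebinding.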

Path Traversal Prevention

File operations are validated to prevent escaping the working directory.
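
The usual technique for this check is to resolve the requested path and confirm the result is still under the working directory. The Python sketch below shows that idea; the function name `is_inside` is hypothetical and the code is an illustration, not the Rust core's actual logic.

```python
from pathlib import Path


def is_inside(workdir: str, user_path: str) -> bool:
    """Return True if user_path, resolved against workdir, stays inside it."""
    root = Path(workdir).resolve()
    # resolve() collapses ".." segments and follows symlinks that exist,
    # so the comparison is made on the real target path.
    candidate = (root / user_path).resolve()
    return candidate == root or root in candidate.parents
```

Absolute paths are rejected as a side effect: joining `root` with an absolute path yields that absolute path, which is then outside `root` unless it genuinely points into the working directory.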

Approval System

Sensitive operations require cryptographic approval tokens that cannot be forged by the agent.
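
The page does not describe the token format, so the following Python sketch is only one plausible construction: an HMAC-SHA256 signature over the operation and an expiry time, using a key the agent never sees. The function names, the `|` separator, and the 60-second lifetime are assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def issue_token(key: bytes, operation: str, ttl_s: int = 60,
                now: Optional[float] = None) -> str:
    """Sign an approval for one operation, valid for ttl_s seconds."""
    expires = int((time.time() if now is None else now) + ttl_s)
    payload = f"{operation}|{expires}"
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify_token(key: bytes, token: str, operation: str,
                 now: Optional[float] = None) -> bool:
    """Accept the token only if it is unexpired and signed for this operation."""
    try:
        op, expires, sig = token.rsplit("|", 2)
        expires_at = int(expires)
    except ValueError:
        return False
    expected = hmac.new(key, f"{op}|{expires}".encode(), hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    # compare_digest avoids leaking the signature through timing differences.
    return (hmac.compare_digest(sig.encode(), expected.encode())
            and op == operation
            and current <= expires_at)
```

Because the signature covers the operation string, a token approved for one command cannot be replayed for a different one, and an agent that lacks the key cannot mint its own.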

Homoglyph Detection

New in v3.4.0, the Rust core detects Unicode homoglyph attacks, in which visually similar characters (e.g., Cyrillic "а", U+0430, vs Latin "a", U+0061) are used to disguise malicious URLs, filenames, or commands. All string inputs are normalized and validated before processing.
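
A simple way to see the idea is a mixed-script check: normalize the input, then flag strings whose letters come from more than one script. The Python sketch below does this with the standard `unicodedata` module. It is a rough illustration, not the Rust core's algorithm, and the function names are hypothetical.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Return the script names (e.g. "LATIN", "CYRILLIC") of the letters in text."""
    # NFKC folds compatibility forms such as fullwidth letters into their
    # ordinary equivalents before the scripts are compared.
    normalized = unicodedata.normalize("NFKC", text)
    scripts = set()
    for ch in normalized:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # The first word of a character name is its script,
                # e.g. "CYRILLIC SMALL LETTER A".
                scripts.add(name.split(" ")[0])
    return scripts


def looks_like_homoglyph_attack(text: str) -> bool:
    """Flag strings that mix letters from more than one script."""
    return len(scripts_used(text)) > 1
```

For example, `looks_like_homoglyph_attack("p\u0430ypal.com")` is true because the second letter is Cyrillic. This heuristic also flags legitimate mixed-script text and misses spoofs written entirely in one foreign script, so a production check would need a confusables table as well.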

Skill Scanner

Before installing any skill, the Skill Scanner performs static analysis to detect potentially dangerous patterns.

Safe Regex

All user-provided regex patterns (custom redaction, guardrails) are validated against ReDoS (Regular Expression Denial of Service) attacks. The Rust core enforces maximum execution time and rejects catastrophic backtracking patterns.

Docker Sandbox

When Docker is available, agent commands run inside an isolated container:

# Docker sandbox configuration
sandbox:
  enabled: true
  image: nestor-sandbox:latest
  capabilities:
    drop: [ALL]           # drop all Linux capabilities
  filesystem:
    read_only: true       # read-only root filesystem
    tmpfs: /tmp           # writable temp directory
    bind_mounts:
      - src: ./src
        dst: /workspace/src
        read_only: false  # agent can write to src/
  network: none           # no network access by default
  memory_limit: 512m
  cpu_limit: 1.0
  timeout: 300            # seconds: kill after 5 minutes

Important: The Docker sandbox is optional but strongly recommended for production use. Without it, agents execute commands directly on the host system with guardrails as the only protection.

Sandbox Modes

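| Mode     | Network | Filesystem           | Use Case                          |
|----------|---------|----------------------|-----------------------------------|
| strict   | None    | Read-only            | Untrusted agents, security review |
| standard | None    | Working dir writable | Code editing, file manipulation   |
| relaxed  | Allowed | Working dir writable | Web search, API calls             |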

Guardrails

Guardrails are configurable rules that constrain agent behavior:

Tool-Level Guardrails

guardrails:
  # Require human approval for these tools
  require_approval:
    - file_write
    - shell_exec

  # Block specific shell commands
  blocked_commands:
    - rm -rf
    - sudo
    - curl | sh
    - chmod 777

  # Restrict file write patterns
  file_restrictions:
    blocked_paths:
      - .env
      - .nestor/config.yaml
      - node_modules/
    blocked_extensions:
      - .exe
      - .sh
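
To make the effect of these rules concrete, the Python sketch below evaluates a shell command and a file write against the values from the configuration above. The matching semantics are assumptions (substring matching for commands, exact or directory matching for paths); Nestor's actual rule engine may match differently.

```python
from pathlib import PurePosixPath

# Values copied from the example configuration above.
BLOCKED_COMMANDS = ["rm -rf", "sudo", "curl | sh", "chmod 777"]
BLOCKED_PATHS = [".env", ".nestor/config.yaml", "node_modules/"]
BLOCKED_EXTENSIONS = [".exe", ".sh"]


def command_allowed(command: str) -> bool:
    """Assumed semantics: block if any rule appears as a substring."""
    normalized = " ".join(command.split())  # collapse runs of whitespace
    return not any(rule in normalized for rule in BLOCKED_COMMANDS)


def write_allowed(path: str) -> bool:
    """Assumed semantics: exact file match, or any path under a blocked directory."""
    p = PurePosixPath(path)
    if p.suffix in BLOCKED_EXTENSIONS:
        return False
    for rule in BLOCKED_PATHS:
        if rule.endswith("/"):
            # Directory rule: block anything with that directory in its path.
            if rule.rstrip("/") in p.parts:
                return False
        elif p == PurePosixPath(rule):
            return False
    return True
```

Substring matching is easy to evade (for instance, `curl https://example.com/x | sh` does not contain the literal `curl | sh`), which is one reason to pair blocklists with `require_approval` and the Docker sandbox instead of relying on them alone.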

Behavioral Guardrails

guardrails:
  # Limit the agent loop
  max_iterations: 25
  max_tokens_per_turn: 8192

  # Dry-run mode: preview all actions
  dry_run: false

  # Auto-approve safe operations
  auto_approve:
    - file_read
    - web_search

Guardrails CRUD API

Nestor v3.4.0 introduces a full CRUD API for managing guardrails at runtime, without restarting the server:

CLI Commands

# List all guardrails for an agent
npx nestor-sh guardrail list --agent coder

# Add a new guardrail rule
npx nestor-sh guardrail add --agent coder \
  --type blocked_command --value "docker rm"

# Remove a guardrail rule
npx nestor-sh guardrail remove --agent coder \
  --type blocked_command --value "docker rm"

# Update guardrail settings
npx nestor-sh guardrail set --agent coder \
  --key max_iterations --value 50

Studio API

Guardrails can also be managed via the Studio dashboard REST API:

# GET /api/agents/:name/guardrails
# POST /api/agents/:name/guardrails
# PUT /api/agents/:name/guardrails/:id
# DELETE /api/agents/:name/guardrails/:id

Circuit Breaker

The circuit breaker protects against cascading failures when LLM providers are down or rate-limited:

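How It Works

The breaker opens for a provider after a configured number of consecutive failures, waits for the reset timeout, and then allows a limited number of test requests while half-open. The thresholds are set in the configuration below.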

Configuration

# Circuit breaker settings
circuit_breaker:
  failure_threshold: 5     # open after 5 consecutive failures
  reset_timeout: 60000     # milliseconds: try again after 60 seconds
  half_open_max: 2         # allow 2 test requests in half-open

When a provider circuit opens, Nestor automatically routes to the next provider in the fallback chain (e.g., if Claude fails, it falls back to GPT-4o, then Gemini, then Ollama).
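
The three settings map onto the classic closed, open, and half-open state machine. The Python sketch below is a generic version of that pattern using the parameter names from the configuration; the class, its method names, and the rule that a single success closes the circuit are assumptions, not Nestor's implementation.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_ms=60000,
                 half_open_max=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_ms = reset_timeout_ms
        self.half_open_max = half_open_max
        self.clock = clock          # injectable so tests can control time
        self.state = "closed"
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = 0.0
        self.trials = 0             # test requests admitted while half-open

    def allow_request(self) -> bool:
        if self.state == "open":
            elapsed_ms = (self.clock() - self.opened_at) * 1000
            if elapsed_ms < self.reset_timeout_ms:
                return False        # still cooling down
            self.state = "half_open"
            self.trials = 0
        if self.state == "half_open":
            if self.trials >= self.half_open_max:
                return False        # test budget used up
            self.trials += 1
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        # A failed test request, or too many consecutive failures, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would keep one breaker per provider and try the next provider in the chain whenever `allow_request()` returns false.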

StuckDetector

The StuckDetector monitors agent behavior and intervenes when an agent enters a loop or makes no progress:

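Detection Patterns

The detector watches for three patterns: repeated identical calls, a streak of consecutive errors, and a run of iterations with no progress. The threshold for each is set in the configuration below.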

Recovery Actions

When stuck behavior is detected, Nestor can:

  1. Inject a system message asking the agent to change strategy
  2. Reset the conversation context to a previous checkpoint
  3. Switch to a different LLM provider
  4. Gracefully terminate with a partial report

# StuckDetector configuration
stuck_detector:
  enabled: true
  max_repeated_calls: 3    # detect after 3 identical calls
  max_error_streak: 5      # detect after 5 consecutive errors
  idle_iterations: 10      # detect after 10 iterations with no progress
  action: inject_hint      # inject_hint | reset | switch_llm | terminate
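
The three thresholds can be read as three independent counters. The Python sketch below shows one way to track them, using the setting names from the configuration; the class shape, the `observe` method, and the caller-supplied `made_progress` flag are hypothetical, and how Nestor decides that an iteration made progress is not specified here.

```python
from collections import deque
from typing import Optional


class StuckDetector:
    def __init__(self, max_repeated_calls=3, max_error_streak=5, idle_iterations=10):
        self.max_error_streak = max_error_streak
        self.idle_iterations = idle_iterations
        # Sliding window holding the last N tool calls.
        self.recent_calls = deque(maxlen=max_repeated_calls)
        self.error_streak = 0
        self.idle = 0

    def observe(self, call: str, ok: bool, made_progress: bool) -> Optional[str]:
        """Record one agent iteration; return the reason if the agent looks stuck."""
        self.recent_calls.append(call)
        self.error_streak = 0 if ok else self.error_streak + 1
        self.idle = 0 if made_progress else self.idle + 1

        window_full = len(self.recent_calls) == self.recent_calls.maxlen
        if window_full and len(set(self.recent_calls)) == 1:
            return "repeated_calls"
        if self.error_streak >= self.max_error_streak:
            return "error_streak"
        if self.idle >= self.idle_iterations:
            return "idle"
        return None
```

A non-`None` result would then trigger whichever recovery action is configured under `action`.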

Trust Score System

The trust score is a composite metric (0-100, grade A-F) computed from an agent's execution history:

Score Components

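| Component   | Weight | What It Measures                                   |
|-------------|--------|----------------------------------------------------|
| Accuracy    | 35%    | Correctness of outputs and tool usage              |
| Safety      | 30%    | Guardrail compliance, no blocked actions attempted |
| Efficiency  | 20%    | Token usage relative to task complexity            |
| Reliability | 15%    | Consistency across similar tasks                   |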
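
As a worked example of the weighting, the Python sketch below combines four component scores into a composite. The weights come from the table; the assumption that each component is itself scored 0-100, and the letter-grade cutoffs, are hypothetical, since the page does not state them.

```python
# Weights from the Score Components table.
WEIGHTS = {"accuracy": 0.35, "safety": 0.30, "efficiency": 0.20, "reliability": 0.15}

# Hypothetical cutoffs: the mapping from score to grade is not documented here.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def trust_score(components: dict) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(total, 2)


def grade(score: float) -> str:
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


example = {"accuracy": 90, "safety": 100, "efficiency": 70, "reliability": 80}
# 0.35*90 + 0.30*100 + 0.20*70 + 0.15*80 = 31.5 + 30 + 14 + 12 = 87.5
score = trust_score(example)
```

With these inputs the composite is 87.5, which the assumed cutoffs would grade as B.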

Trust-Based Permissions

Agents can be granted or restricted permissions based on their trust score:

# Higher trust = more autonomy
trust_policies:
  A:
    auto_approve: [file_write, shell_exec]
    max_budget: 50.00
  B:
    auto_approve: [file_write]
    require_approval: [shell_exec]
    max_budget: 20.00
  C:
    require_approval: [file_write, shell_exec]
    max_budget: 5.00
  D:
    dry_run: true
    max_budget: 1.00

Secret Redaction

Nestor automatically detects and redacts secrets from agent outputs and logs. The Rust core includes 30+ built-in patterns. Custom patterns can be added in the configuration:

# Custom redaction patterns
security:
  redaction:
    enabled: true
    custom_patterns:
      - name: internal_api_key
        pattern: "MYAPP-[A-Za-z0-9]{32}"
      - name: internal_token
        pattern: "tok_[a-f0-9]{40}"
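
To show what the two custom patterns above would match, here is a small Python sketch that applies them to a string. The `redact` function and the `[REDACTED:name]` placeholder format are assumptions for illustration; the mask Nestor actually emits is not specified on this page.

```python
import re

# The two custom patterns from the configuration above.
CUSTOM_PATTERNS = {
    "internal_api_key": r"MYAPP-[A-Za-z0-9]{32}",
    "internal_token": r"tok_[a-f0-9]{40}",
}


def redact(text: str, patterns: dict = CUSTOM_PATTERNS) -> str:
    """Replace every match of each pattern with a named placeholder."""
    for name, pattern in patterns.items():
        placeholder = f"[REDACTED:{name}]"
        # A function replacement keeps the placeholder literal (no escape handling).
        text = re.sub(pattern, lambda _m, p=placeholder: p, text)
    return text
```

Note that `{32}` and `{40}` are exact lengths, so a shorter string such as `MYAPP-short` is left untouched.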

Network Security

The server component includes multiple network security layers.

Security Best Practices

  1. Always use the Docker sandbox in production environments
  2. Set cost budgets to prevent runaway spending
  3. Require approval for file_write and shell_exec on new agents
  4. Monitor trust scores and restrict low-trust agents
  5. Use dry-run mode when testing new skills or workflows
  6. Enable the circuit breaker for production deployments
  7. Configure the StuckDetector to prevent infinite loops
  8. Review agent outputs before deploying to production
  9. Keep Nestor updated to get the latest security patches
  10. Configure custom redaction patterns for your organization's secrets

v3.4.0 Security Additions

Release 3.4.0 (2026-04-17) closes out a full security audit (5 CRITICAL and 5 HIGH findings) and hardens several layers of the defense-in-depth model.

Security Notice: AI agents can behave unpredictably. Never grant an agent access to production systems, financial accounts, or sensitive data without thorough testing and appropriate guardrails. Always apply the principle of least privilege.


Last updated 2026-04-26