## Performance Characteristics
### Latency Breakdown
| Stage | Typical Time | Notes |
|---|---|---|
| AST Parsing | 1-5ms | Scales with code size |
| AST Validation | 2-10ms | Depends on rule count |
| Code Transformation | 1-3ms | One-time per script |
| AI Scoring Gate | ~1ms | Rule-based (cached) |
| VM Execution | Variable | Depends on script complexity |
| Tool Calls | Variable | Network/database bound |
| Output Sanitization | 1-5ms | Scales with output size |
### Worker Pool Mode
When using the Worker Pool adapter for OS-level isolation, latency changes slightly:

| Metric | Standard VM | Worker Pool (4 workers) |
|---|---|---|
| Cold start | ~5ms | ~50ms (pool warm-up) |
| Warm execution | ~1ms | ~3ms (message passing) |
| Concurrent capacity | 1 | 4 (parallel isolation) |
| Memory per execution | Shared | Isolated per worker |
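As an illustration of the trade-off in the table, a worker pool might be configured like this. The option names below are assumptions for the sketch, not taken from the CodeCall API:

```typescript
// Hypothetical configuration sketch -- field names are illustrative, not the
// actual CodeCall options. The intent: a pool of 4 OS-isolated workers trades
// ~50ms of one-time warm-up for parallel, memory-isolated execution.
const workerPoolConfig = {
  adapter: "worker-pool", // vs. the default in-process VM adapter
  minWorkers: 4,          // pre-spawned at startup to absorb the cold start
  maxWorkers: 4,          // matches the "4 workers" column above
  memoryLimitMb: 256,     // per worker, isolated rather than shared
};
```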
### Throughput
| Configuration | Requests/sec | Notes |
|---|---|---|
| Single instance, TF-IDF | ~500 | Bottleneck: VM isolation |
| Single instance, Embeddings | ~200 | Bottleneck: Model inference |
| Single instance, Worker Pool (4 workers) | ~800 | Parallel isolation |
| Multi-instance (4 pods) | ~1,500+ | Near-linear scaling |
| Multi-instance + Worker Pool | ~3,000+ | Best isolation + throughput |
Throughput depends heavily on script complexity and tool call latency. These numbers assume simple scripts with 1-3 tool calls.
### Worker Pool Scaling Guidelines
| Workload | minWorkers | maxWorkers | memoryLimit | Use Case |
|---|---|---|---|---|
| Low volume | 1 | 2 | 128MB | <10 req/min |
| Medium volume | 2 | 4 | 256MB | 10-100 req/min |
| High volume | 4 | 8 | 256MB | 100-500 req/min |
| Burst traffic | 2 | 16 | 128MB | Spiky workloads |
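The table above maps directly to a sizing decision. As a plain-TypeScript sketch (the helper and its return shape are illustrative, not part of the CodeCall API):

```typescript
// Illustrative helper: pick worker pool sizing from the guidelines table
// based on expected requests per minute.
type PoolSizing = { minWorkers: number; maxWorkers: number; memoryLimitMb: number };

function poolSizingFor(reqPerMin: number, bursty = false): PoolSizing {
  if (bursty) return { minWorkers: 2, maxWorkers: 16, memoryLimitMb: 128 }; // spiky workloads
  if (reqPerMin < 10) return { minWorkers: 1, maxWorkers: 2, memoryLimitMb: 128 };   // low volume
  if (reqPerMin <= 100) return { minWorkers: 2, maxWorkers: 4, memoryLimitMb: 256 }; // medium
  return { minWorkers: 4, maxWorkers: 8, memoryLimitMb: 256 };                       // high volume
}
```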
### Performance Optimization
#### 1. Use TF-IDF for Most Cases
Unless you have 100+ tools with similar descriptions, TF-IDF provides excellent relevance with minimal overhead.

#### 2. Enable HNSW for Large Toolsets
For 1000+ tools with the embedding strategy, enable HNSW indexing.

#### 3. Warm the Search Index on Startup
Pre-index tools during server initialization so the first request does not pay the indexing cost.

#### 4. Use Direct Invoke for Simple Calls
Bypass VM overhead for single-tool operations.

#### 5. Cache Describe Results
Tool schemas rarely change, so enable caching for describe results.

## Multi-Instance Deployment
CodeCall is stateless and scales horizontally.

### Architecture

Each instance is independent, so you can run several behind a load balancer and share a cache for consistent search behavior.
### Shared Cache (Redis)
For consistent search results across instances, point every instance at the same cache.

### Kubernetes Deployment
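A minimal Deployment sketch, assuming a containerized CodeCall server. The image name, labels, and resource figures are placeholders; the sizing follows the "Medium" row of the table below:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: codecall-server
spec:
  replicas: 4                # multi-instance scaling is near-linear (see Throughput)
  selector:
    matchLabels:
      app: codecall-server
  template:
    metadata:
      labels:
        app: codecall-server
    spec:
      containers:
        - name: codecall-server
          image: your-registry/codecall-server:latest  # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 1Gi    # "Medium" workload tier
            limits:
              cpu: "1"
              memory: 1Gi
```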
### Resource Recommendations
| Workload | CPU | Memory | Instances |
|---|---|---|---|
| Light (<100 req/min) | 0.5 core | 512MB | 1-2 |
| Medium (100-500 req/min) | 1 core | 1GB | 2-4 |
| Heavy (500+ req/min) | 2 cores | 2GB | 4+ |
## Monitoring
### Metrics to Track
- **Execution Latency**: track p50, p95, and p99 of `codecall:execute` duration
- **Error Rate**: monitor validation errors, timeouts, and tool failures
- **Tool Call Count**: average tool calls per script execution
- **Search Latency**: track search response times for index health
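As a sketch of the latency tracking above, in plain TypeScript with no particular metrics library assumed, you can record `codecall:execute` durations and compute nearest-rank percentiles:

```typescript
// Minimal percentile tracker -- a stand-in for a real metrics backend
// (Prometheus, StatsD, etc.). Durations are in milliseconds.
const durations: number[] = [];

function recordDuration(ms: number): void {
  durations.push(ms);
}

// Nearest-rank percentile over all recorded durations (p = 50, 95, 99, ...).
function percentile(p: number): number {
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```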
### Logging
CodeCall emits structured logs for observability.

### Health Checks
Expose CodeCall health via your health endpoint.

### Alerting Recommendations
| Metric | Warning | Critical |
|---|---|---|
| Execute p99 latency | > 2s | > 5s |
| Error rate | > 5% | > 15% |
| Timeout rate | > 1% | > 5% |
| Security blocks | Any | High volume |
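The health endpoint mentioned above might be shaped like the following sketch. The individual checks are placeholders; this page does not define a health API:

```typescript
// Illustrative health payload for a /health endpoint. The check names
// (searchIndex, workerPool, cache) are assumptions, not a real CodeCall API.
type Health = { status: "ok" | "degraded"; checks: Record<string, boolean> };

function codecallHealth(checks: Record<string, boolean>): Health {
  const healthy = Object.values(checks).every(Boolean);
  return { status: healthy ? "ok" : "degraded", checks };
}

// Example: report the search index, worker pool, and shared cache.
const health = codecallHealth({ searchIndex: true, workerPool: true, cache: true });
```

Orchestrators can then gate traffic on `status`, while the per-check booleans aid debugging.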
## Cost Optimization
### Token Savings
CodeCall dramatically reduces token usage:

| Scenario | Without CodeCall | With CodeCall | Savings |
|---|---|---|---|
| 100 tools in context | ~25,000 tokens | ~3,000 tokens | 88% |
| Multi-tool workflow (5 calls) | ~50,000 tokens | ~5,000 tokens | 90% |
| Complex filtering | ~100,000 tokens | ~8,000 tokens | 92% |
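The savings column follows directly from the token counts in the other two columns:

```typescript
// Percentage saved = 1 - (with / without), rounded to the nearest percent.
function tokenSavings(without: number, withCodeCall: number): number {
  return Math.round((1 - withCodeCall / without) * 100);
}

tokenSavings(25_000, 3_000);  // 88 -- "100 tools in context" row
tokenSavings(50_000, 5_000);  // 90 -- multi-tool workflow row
tokenSavings(100_000, 8_000); // 92 -- complex filtering row
```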
### Compute Costs
| Factor | Impact | Optimization |
|---|---|---|
| VM isolation | ~10ms overhead | Use `codecall:invoke` for simple calls |
| Embedding inference | ~50ms/query | Use TF-IDF for fewer than 100 tools |
| Tool calls | Dominant cost | Optimize underlying tools |
### Cost vs. Performance Tradeoffs
#### Minimize Latency
- Use TF-IDF search
- Enable caching for describe/search
- Use direct invoke for simple calls
- Increase VM timeout for complex scripts
#### Minimize Compute
- Use the `locked_down` preset (shorter timeouts)
- Limit `maxToolCalls` aggressively
- Cache aggressively
- Use fewer instances with more resources
#### Minimize Tokens
- Use `codecall_only` mode
- Hide all tools from `list_tools`
- Return minimal data from tools
- Let scripts filter server-side
## Security in Production
### Checklist
1. **Use the `secure` or `locked_down` preset.** Never use `experimental` in production.
2. **Enable audit logging.** Log all script executions and security events.
3. **Configure rate limiting.** Prevent abuse via aggressive rate limits.
4. **Monitor security events.** Alert on validation failures and self-reference attempts.
5. **Run regular security reviews.** Review tool allowlists and filter rules quarterly.
### Rate Limiting
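This page does not prescribe a particular limiter, so the following is a generic token-bucket sketch you could place in front of `codecall:execute`:

```typescript
// Generic token bucket: allow `capacity` executions, refilled continuously at
// `refillPerSec`. Check tryRemove() before dispatching an execution; a false
// result means the caller should be rejected (e.g. HTTP 429).
class TokenBucket {
  private tokens: number;
  private last = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  tryRemove(): boolean {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec,
    );
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

In a multi-instance deployment you would back this with a shared store (e.g. Redis) rather than per-process state.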
## Audit Logging
CodeCall provides comprehensive audit logging for compliance and security monitoring.

### What Gets Logged
| Event | Data Captured | Purpose |
|---|---|---|
| Script execution start | executionId, scriptHash, timestamp | Track execution lifecycle |
| Tool calls | toolName, args (sanitized), duration, result | Audit trail of actions |
| Security events | blocked construct, rule triggered | Security monitoring |
| Scoring gate results | riskLevel, signals, score | Risk assessment audit |
| Script completion | status, duration, toolCallCount | Performance tracking |
### Enabling Audit Logging
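The configuration snippet is not reproduced on this page, so the field names below are assumptions for illustration; the intent is simply to turn audit logging on and route events to a sink of your choice:

```typescript
// Hypothetical shape -- these are NOT the actual CodeCall configuration keys.
const auditConfig = {
  audit: {
    enabled: true,
    // Receive each audit event as structured data and forward it anywhere.
    sink: (event: Record<string, unknown>) => console.log(JSON.stringify(event)),
  },
};
```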
### Audit Event Schema
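The exact schema is not reproduced here; based on the "What Gets Logged" table above, an audit event plausibly carries fields like these. Treat the interface as a sketch, not the real type:

```typescript
// Sketch assembled from the "What Gets Logged" table. Field names are
// illustrative; optional fields apply only to some event types.
interface AuditEvent {
  type:
    | "script_start"     // executionId, scriptHash, timestamp
    | "tool_call"        // toolName, sanitized args, duration, result
    | "security_event"   // blocked construct, rule triggered
    | "scoring_gate"     // riskLevel, signals, score
    | "script_complete"; // status, duration, toolCallCount
  executionId: string;
  timestamp: string; // ISO 8601
  scriptHash?: string;
  toolName?: string;
  args?: Record<string, unknown>; // sanitized before logging
  durationMs?: number;
  riskLevel?: "low" | "medium" | "high";
  status?: "success" | "error" | "timeout";
  toolCallCount?: number;
}
```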
### Integration Examples
Audit events can be shipped to:

- **Datadog**
- **AWS CloudWatch**
- **Database**
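The per-backend snippets are not reproduced here. As a neutral sketch, all three targets can be fed from one forwarding function, where `sendToBackend` is a placeholder for your Datadog, CloudWatch, or database client:

```typescript
// Generic fan-out: buffer audit events and flush them in batches to whatever
// backend you use. `sendToBackend` is a placeholder, not a real client API.
function createAuditForwarder(
  sendToBackend: (batch: object[]) => void,
  batchSize = 20,
) {
  let buffer: object[] = [];
  return {
    push(event: object): void {
      buffer.push(event);
      if (buffer.length >= batchSize) this.flush();
    },
    flush(): void {
      if (buffer.length === 0) return;
      const batch = buffer;
      buffer = [];
      sendToBackend(batch);
    },
  };
}
```

Batching keeps per-event overhead off the execution hot path; call `flush()` on shutdown so trailing events are not lost.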
## Multi-Tenancy Patterns
CodeCall supports multiple isolation strategies for multi-tenant deployments.

### Tenant Context
Pass tenant information via `codecallContext`.
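The original snippet is not reproduced here; a sketch of what passing tenant data might look like, where the surrounding request shape is an assumption and only the `codecallContext` name comes from this document:

```typescript
// Sketch: tenant identity travels with the execution request via
// codecallContext, so filtering, configuration, and quotas can key off it.
const executeRequest = {
  script: "const orders = await tools.listOrders(); return orders.length;",
  codecallContext: {
    tenantId: "tenant-123", // consumed by per-tenant filtering and quotas
    plan: "enterprise",     // illustrative extra field
  },
};
```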
### Per-Tenant Tool Filtering
Restrict which tools a tenant can call based on tenant context.

### Per-Tenant Configuration
Apply different security levels per tenant.

### Isolation Strategies
| Strategy | Isolation Level | Cost | Use Case |
|---|---|---|---|
| Shared pool | Low | $ | Dev/staging |
| Tenant-specific limits | Medium | $$ | SaaS standard |
| Separate worker pools | High | $$$ | Enterprise |
| Dedicated instances | Maximum | $$$$ | Compliance-heavy |
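The per-tenant filtering and configuration described above might look like the following. The function and its policy fields are illustrative, not the CodeCall API:

```typescript
// Illustrative only: derive a tool allowlist and security preset from the
// tenant carried in codecallContext. In practice the policy would come from
// your tenant database rather than a prefix check.
type TenantPolicy = { allowedTools: string[]; securityPreset: string };

function policyFor(tenantId: string): TenantPolicy {
  const enterprise = tenantId.startsWith("ent-"); // stand-in for a real lookup
  return {
    allowedTools: enterprise ? ["*"] : ["search", "describe"],
    securityPreset: enterprise ? "secure" : "locked_down",
  };
}
```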
### Per-Tenant Resource Quotas
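The quota mechanism itself is not shown on this page; a minimal in-memory sketch of the idea:

```typescript
// Per-tenant execution quota (in-memory sketch). A real deployment would back
// this with Redis or similar so limits hold across instances and restarts.
const used = new Map<string, number>();

function tryConsume(tenantId: string, limit: number): boolean {
  const n = used.get(tenantId) ?? 0;
  if (n >= limit) return false; // quota exhausted: reject the execution
  used.set(tenantId, n + 1);
  return true;
}
```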
### Audit Trail Separation
Separate audit logs by tenant so each tenant's trail can be queried and retained independently.

## Troubleshooting
### Common Issues
#### Scripts timing out

**Symptoms:** frequent `TIMEOUT` errors

**Causes:**
- Script too complex
- Tool calls too slow
- Timeout too aggressive

**Solutions:**
- Profile tool call latency
- Increase `vm.timeoutMs` if tools are slow
- Break complex scripts into smaller pieces
- Use `Promise.all()` for independent tool calls
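The last suggestion, running independent tool calls concurrently, looks like this inside a script. The tool names are stand-ins; in a real CodeCall script you would use the injected tools object:

```typescript
// Stand-in async "tools" so the example is self-contained.
const tools = {
  getUser: async (id: string) => ({ id, name: "Ada" }),
  getOrders: async (_id: string) => [{ id: "o1" }, { id: "o2" }],
};

async function fetchProfile(id: string) {
  // Independent calls run concurrently: total latency is the slower of the
  // two calls rather than their sum.
  const [user, orders] = await Promise.all([tools.getUser(id), tools.getOrders(id)]);
  return { user, orderCount: orders.length };
}
```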
#### Search returning irrelevant results

**Symptoms:** low relevance scores, wrong tools returned

**Causes:**
- Poor tool descriptions
- Threshold too low
- TF-IDF limitations

**Solutions:**
- Improve tool descriptions
- Increase `similarityThreshold`
- Switch to the embedding strategy for semantic matching
- Add more specific keywords to descriptions
#### High memory usage

**Symptoms:** OOM errors, pod restarts

**Causes:**
- Embedding model loaded
- Large tool index
- Scripts returning large data

**Solutions:**
- Use TF-IDF instead of embeddings
- Increase memory limits
- Configure output sanitization limits
- Enable HNSW for large indexes
#### Validation errors for valid code

**Symptoms:** scripts rejected that should work

**Causes:**
- Using blocked constructs
- Reserved prefix collision
- Unicode issues

**Solutions:**
- Check for `eval`, `Function`, etc.
- Avoid `__ag_` and `__safe_` prefixes
- Use ASCII identifiers
- Review AST Guard rules
## Migration & Rollback
### Gradual Rollout
1. **Phase 1: Deploy with `mode: 'metadata_driven'`**
   - All tools visible normally
   - Mark select tools for CodeCall
   - Monitor for issues
2. **Phase 2: Switch to `mode: 'codecall_opt_in'`**
   - Tools opt into CodeCall
   - Both access methods work
   - Measure token savings
3. **Phase 3: Move to `mode: 'codecall_only'`**
   - Hide tools from `list_tools`
   - Full CodeCall experience
   - Maximum token savings

