
Building a Production-Grade N8N Infrastructure on 16GB RAM

📅 December 2025 🏷️ N8N • Infrastructure 👤 Muhammad Nawaz

The Challenge

N8N is a powerful workflow automation platform, but scaling it to handle heavy production workloads while maintaining a reasonable infrastructure budget is challenging. This is the story of how I architected a high-performance N8N deployment that handles complex Python and JavaScript workloads within the constraints of a single 16GB server.

Architecture Overview

The infrastructure consists of multiple specialized components working together in a queue-based architecture:

┌──────────────────────────────────────────────────────────┐
│                    Client Requests                       │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
               ┌─────────────────┐
               │  Caddy (Proxy)  │
               │   SSL/TLS       │
               └────────┬────────┘
                        │
           ┌────────────┼────────────┐
           │            │            │
           ▼            ▼            ▼
     ┌─────────┐  ┌──────────┐  ┌──────────┐
     │ N8N Main│  │ Webhook  │  │ Webhook  │
     │ UI/API  │  │ Handler  │  │ Handler  │
     └────┬────┘  └──────────┘  └──────────┘
          │
          │ Publishes jobs to
          ▼
     ┌──────────────────┐
     │  Redis Queue     │
     │  (Bull Queue)    │
     └────────┬─────────┘
              │
        ┌─────┴─────┐
        │           │
        ▼           ▼
    ┌────────┐  ┌────────┐
    │Worker-1│  │Worker-2│
    │  15cc  │  │  15cc  │
    └───┬────┘  └───┬────┘
        │           │
        │ Delegates │
        ▼           ▼
    ┌────────┐  ┌────────┐
    │Task    │  │Task    │
    │Runners │  │Runners │
    │Python  │  │Python  │
    │& JS    │  │& JS    │
    └────────┘  └────────┘

Key Design Decisions

1. External Task Runners for Heavy Workloads

One of the most critical architectural decisions was implementing external task runners. N8N supports executing Python and JavaScript code, but running these in the main Node.js process poses security and performance risks.

The Solution: run user code in dedicated task-runner containers that connect back to each N8N process over the task broker port (5679), instead of executing it inside the main Node.js event loop.

Benefits:

- Process isolation: a crashing or runaway script cannot take down the N8N process itself
- Each runner gets its own memory limit and can be restarted independently
- Python dependencies (Pandas, yt-dlp, etc.) live in the runner image, keeping the N8N image lean

2. Dedicated Task Runners Per Worker

Instead of sharing a pool of task runners, each worker has dedicated task runners. This was a deliberate choice for heavy workload scenarios.

worker-1 (2.5GB)
  ├── task-runner-worker-1-1 (1.5GB)
  └── task-runner-worker-1-2 (1.5GB)

worker-2 (2.5GB)
  ├── task-runner-worker-2-1 (1.5GB)
  └── task-runner-worker-2-2 (1.5GB)

Why This Matters:

- No contention: one worker's heavy job cannot starve the other worker's runners
- Fault isolation: an OOM-killed runner affects only its own worker
- Easier debugging: logs and metrics map one-to-one from worker to runner

3. Queue-Based Architecture

The infrastructure uses Bull Queue (Redis-backed) for job distribution:

Manual Execution (UI) → Main Process → task-runner-main
Production Jobs → Redis Queue → Workers → task-runner-worker-X
Webhooks → Webhook Process → Redis Queue → Workers
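
This split can be sketched in docker-compose roughly as follows. Service and image names are illustrative; the environment variables are n8n's standard queue-mode settings, and `worker`/`webhook` are the commands the n8n image accepts for the respective process types:

```yaml
services:
  n8n-main:
    image: n8n-ffmpeg:latest           # custom image (illustrative name)
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis    # Bull queue backend
  worker-1:
    image: n8n-ffmpeg:latest
    command: worker                    # consumes jobs from the queue
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
  webhook-1:
    image: n8n-ffmpeg:latest
    command: webhook                   # accepts webhooks, enqueues jobs
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
  redis:
    image: redis:7-alpine
```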

This separation ensures:

- The UI stays responsive even when workers are saturated
- Webhooks are acknowledged quickly and queued rather than executed inline
- Workers can be scaled independently of the main process

Memory Optimization Strategy

Fitting a production-grade setup into 16GB required careful resource allocation:

Component               Count  Memory/Instance  Total   Strategy
N8N Main                1      4GB              4GB     UI/API - needs headroom for workflow editing
Workers                 2      2.5GB            5GB     Queue consumers - right-sized for 15 concurrent jobs
Task Runners (Main)     2      1.5GB            3GB     Manual executions - lower load
Task Runners (Workers)  4      1.5GB            6GB     Production - most critical
Webhooks                2      1GB              2GB     Stateless handlers
Caddy                   1      256MB            256MB   Lightweight proxy
Prometheus              1      512MB            512MB   Monitoring with 7-day retention
Total                   13     -                ~21GB   With soft limits & reservations

How it fits in 16GB:

- The ~21GB figure is the sum of limits, not reservations; components rarely peak at the same time
- Memory reservations guarantee each service a floor, while unused headroom stays available to others
- Idle task runners shut down automatically (see Lessons Learned), returning their memory to the pool
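
In compose, the limit/reservation pair for one worker might look like this. The 2.5GB limit comes from the allocation table; the 1GB reservation is an illustrative value:

```yaml
worker-1:
  deploy:
    resources:
      limits:
        memory: 2.5G     # hard ceiling - prevents host-wide OOM
      reservations:
        memory: 1G       # guaranteed floor (illustrative value)
```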

Docker Compose Configuration Highlights

Custom N8N Image with FFmpeg

FROM n8nio/n8n:latest

USER root
RUN apk add --no-cache ffmpeg
USER node

Simple and effective - adds video processing capabilities without bloat.

Task Runner Configuration

The task runners use a custom configuration file (n8n-task-runners.json) that defines:

{
  "task-runners": [
    {
      "runner-type": "javascript",
      "command": "/usr/local/bin/node",
      "args": ["--disallow-code-generation-from-strings"],
      "env-overrides": {
        "NODE_FUNCTION_ALLOW_BUILTIN": "",
        "NODE_FUNCTION_ALLOW_EXTERNAL": ""
      }
    },
    {
      "runner-type": "python",
      "command": "/opt/runners/task-runner-python/.venv/bin/python",
      "env-overrides": {
        "N8N_RUNNERS_EXTERNAL_ALLOW": "numpy,pandas,requests,yt-dlp"
      }
    }
  ]
}

This ensures:

- JavaScript runs with code generation from strings (eval, new Function) disabled, and with empty builtin/external module allow-lists by default
- Python executes in a dedicated virtualenv and can only import the explicitly allow-listed packages (numpy, pandas, requests, yt-dlp)

Worker Configuration with Task Runner Support

Critical environment variables that enable workers to use task runners:

worker-1:
  environment:
    - EXECUTIONS_MODE=queue
    - N8N_RUNNERS_ENABLED=true
    - N8N_RUNNERS_MODE=external
    - N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0
    - N8N_RUNNERS_MAX_CONCURRENCY=5

Without these, workers would fail when encountering Python/JS code nodes.
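
The matching runner side is a separate container pointed at the worker's broker. Service and image names below are illustrative; the broker URI and auth token variables are the ones used elsewhere in this setup:

```yaml
task-runner-worker-1-1:
  image: n8n-task-runner:latest                        # custom runner image (illustrative name)
  environment:
    - N8N_RUNNERS_TASK_BROKER_URI=http://worker-1:5679 # must match the worker's broker address
    - N8N_RUNNERS_AUTH_TOKEN=${N8N_RUNNERS_AUTH_TOKEN} # shared secret from .env
  depends_on:
    - worker-1
```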

Performance Characteristics

Throughput

With two workers each sized for 15 concurrent executions, the system can run up to 30 production jobs in parallel and processes thousands of workflow executions daily.

Load Distribution

Manual/UI Load → task-runner-main (sporadic, low volume)
Production Load → task-runner-worker-1 & worker-2 (continuous, high volume)
Webhook Traffic → webhook processes → Queue → Workers

Resource Utilization Under Load

Because limits exceed typical usage and component peaks are staggered, the ~21GB of nominal allocations coexist on 16GB of physical RAM; actual usage stays within bounds as long as heavy Pandas jobs and video downloads do not all spike simultaneously.

Monitoring and Observability

Prometheus Integration

Each N8N process exposes Prometheus metrics on its own endpoint, and giving every process a unique metric prefix keeps main, worker, and webhook series distinguishable in queries and dashboards.
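
A prometheus.yml scrape configuration along these lines collects all of them. Job names and targets are illustrative, and assume metrics are served on each process's main port:

```yaml
scrape_configs:
  - job_name: n8n-main
    static_configs:
      - targets: ['n8n-main:5678']
  - job_name: n8n-workers
    static_configs:
      - targets: ['worker-1:5678', 'worker-2:5678']
  - job_name: n8n-webhooks
    static_configs:
      - targets: ['webhook-1:5678', 'webhook-2:5678']
```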

Health Checks

Every critical component has health checks:

healthcheck:
  test: ["CMD", "wget", "--spider", "http://localhost:5678/healthz"]
  interval: 30s
  timeout: 5s
  retries: 3

This ensures:

- A failed component is detected within roughly 90 seconds (30s interval × 3 retries)
- Docker restarts unhealthy containers automatically
- Dependent services can gate their startup on healthy upstreams
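
Health checks also let dependent services wait for their upstream before starting; for example, a task runner can wait for its worker (sketch, service names illustrative):

```yaml
task-runner-worker-1-1:
  depends_on:
    worker-1:
      condition: service_healthy   # start only after the worker passes its healthcheck
```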

Common Pitfalls Avoided

1. Task Runner Connection Mismatch

Problem: With docker-compose replicas, containers come up as worker_1, worker_2, etc. behind a single round-robin service name, so a dedicated task runner cannot be pinned to one specific worker's broker. Defining worker-1 and worker-2 as explicit services gives each runner a stable hostname.

Wrong:

task-runner-worker:
  environment:
    - N8N_RUNNERS_TASK_BROKER_URI=http://worker:5679  # Fails!

Correct:

task-runner-worker-1:
  environment:
    - N8N_RUNNERS_TASK_BROKER_URI=http://worker-1:5679  # Works!

2. Workers Without Task Runner Configuration

Workers must explicitly enable task runners:

- N8N_RUNNERS_ENABLED=true
- N8N_RUNNERS_MODE=external

Without these, Python/JS nodes will fail silently.

3. Insufficient Memory for Task Runners

Initial allocation of 512MB per task runner caused OOM kills during Pandas operations. Increasing it to 1.5GB provides headroom for:

- Large Pandas DataFrames held in memory during transformations
- yt-dlp download and muxing buffers
- Python interpreter and imported-library overhead

Deployment and Scaling

Initial Deployment

# Build custom images
docker-compose build --no-cache

# Start infrastructure
docker-compose up -d

# Verify all services
docker-compose ps
docker stats

Horizontal Scaling (Future)

The architecture supports scaling to multiple servers:

  1. Move Redis to dedicated server
  2. Add worker servers pointing to central Redis
  3. Scale task runners based on worker load
  4. Use external load balancer instead of Caddy
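
Step 2 amounts to repointing each worker server's queue settings at the central Redis host. Hostname and password variable below are illustrative:

```yaml
worker-1:
  environment:
    - EXECUTIONS_MODE=queue
    - QUEUE_BULL_REDIS_HOST=redis.internal.example   # central Redis server (illustrative)
    - QUEUE_BULL_REDIS_PORT=6379
    - QUEUE_BULL_REDIS_PASSWORD=${REDIS_PASSWORD}    # secret kept in .env
```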

Vertical Scaling

If upgrading to 32GB RAM:

- Add a third and fourth worker, each with its own dedicated task runners
- Raise per-runner memory limits for heavier Pandas workloads
- Extend Prometheus retention beyond 7 days

Security Considerations

Network Isolation

All services communicate over internal Docker network. Only Caddy exposes ports externally.

Code Execution Sandboxing

User code never runs inside an N8N process. JavaScript runners start Node with --disallow-code-generation-from-strings and empty module allow-lists; Python runners are confined to a dedicated virtualenv with an explicit package allow-list. A misbehaving script is contained within its own memory-limited container.

Secrets Management

Sensitive values stored in .env file:

N8N_ENCRYPTION_KEY=your-encryption-key
N8N_RUNNERS_AUTH_TOKEN=your-auth-token

Never committed to version control.
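
In compose, the file is pulled in per service with env_file, and a .gitignore entry keeps it out of the repo (sketch):

```yaml
# docker-compose.yml (fragment)
n8n-main:
  env_file:
    - .env    # holds N8N_ENCRYPTION_KEY, N8N_RUNNERS_AUTH_TOKEN, etc.
```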

Lessons Learned

1. Memory Limits vs Reservations

Using both provides the best of both worlds:

- Reservations guarantee each service a minimum, so critical components never starve
- Limits cap runaway processes before they can trigger a host-wide OOM
- Unreserved memory stays available to whichever service needs it at the moment

2. Log Levels Matter

Changing from info to warn reduced disk I/O by ~40% and saved ~200MB memory.

3. Health Check Intervals

Initial 10s intervals created unnecessary overhead. 30s is sufficient for production.

4. Task Runner Auto-Shutdown

Setting N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT=300 allows idle runners to shut down, freeing memory.

Conclusion

Building a production-grade N8N infrastructure on limited resources requires careful architectural decisions and resource optimization. The key takeaways:

  1. External task runners are essential for heavy Python/JS workloads
  2. Dedicated task runners per worker provide better isolation and debugging
  3. Queue-based architecture enables scaling and resilience
  4. Memory optimization through limits, reservations, and right-sizing
  5. Comprehensive monitoring is critical for production stability

This infrastructure successfully handles production workloads processing thousands of workflow executions daily, with complex data transformations using Pandas, video downloads with yt-dlp, and custom JavaScript logic, all within a 16GB constraint.

The architecture is battle-tested, cost-effective, and ready to scale when needed.


Resources

Full configuration available on request.