2026-04-10 · architecture

Building an Ops Hub Under Resource Constraints

Fifteen systems. Two CPUs. Eight gigabytes of RAM. Ninety-six gigabytes of disk. The design of Nexus Ops, and the tradeoffs the constraints forced.

The Problem

Operating the platform meant keeping fifteen systems in my head at once: the existing Nexus CRM, n8n, the Groups app, the Slack workspace, a social media pipeline that lives inside n8n workflows, a lead research corpus of files and Obsidian notes, The Kiln (the multiplayer game at play.agency186.com), Ollama on the VPS host, Google Workspace APIs, proposal pages on shared hosting, Playwright scripts, the Eclipse campaign, Replicate and Perplexity as external APIs, the shared PostgreSQL container, and Traefik itself.

Checking the state of the business required opening n8n, the Groups app, Slack, the CRM, the shared host file manager, Google Drive, and multiple project directories. Each tab held a slice of the truth. Reassembling those slices into a coherent picture was the actual work, and I was doing it every day.

The solution is a single operations hub at ops.traveltamers.com. Its job is to aggregate the fifteen systems into one authenticated view — an activity stream, a health bar, module cards for each integration, and a small set of quick actions that can trigger work across the stack without switching tabs. Its job is explicitly not to replace any of the underlying apps. When a workflow needs deep interaction with n8n, the hub links out to n8n. The hub aggregates and commands. It does not rebuild.

The Governing Constraint

The VPS has two CPUs, eight gigabytes of RAM, and ninety-six gigabytes of disk, shared across every service on the box. Before anything new is added, the estimated memory footprint of what is already running looks like this:

Traefik                       ~50 MB
PostgreSQL (shared)          ~200 MB
n8n                          ~400 MB
Open WebUI                   ~300 MB
Ollama (when loaded)        ~2000 MB
Nexus CRM (5 containers)     ~800 MB
Groups App                   ~100 MB
                           ---------
Subtotal                    ~3.85 GB

That leaves roughly four gigabytes for the operating system, file buffers, and anything new. Every architectural choice for the ops hub has to be evaluated against that ceiling. Heavy middleware, dedicated cache clusters, compute-intensive workers — off the table before the first line of code.
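The headroom arithmetic is simple enough to check in a few lines. The numbers are copied from the table above; the script itself is just an illustration:

```typescript
// Estimated memory budget from the table above, in MB. Illustrative only.
const footprintMb: Record<string, number> = {
  traefik: 50,
  postgresShared: 200,
  n8n: 400,
  openWebUI: 300,
  ollamaLoaded: 2000,
  nexusCrm: 800,
  groupsApp: 100,
};

const totalRamMb = 8 * 1024; // 8 GB box
const subtotalMb = Object.values(footprintMb).reduce((a, b) => a + b, 0);
const headroomMb = totalRamMb - subtotalMb; // OS, file buffers, anything new

console.log(`subtotal ${subtotalMb} MB, headroom ${headroomMb} MB`);
// subtotal 3850 MB, headroom 4342 MB
```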

Copy the CRM, Don't Rewrite

The starting point is a copy of the existing Nexus CRM. It already has authentication, a React frontend with Zustand state management, a Fastify backend, Socket.IO for real-time push, BullMQ for background jobs, and roughly thirty pages of working UI — approximately forty percent of the way to a complete operations dashboard before anything new is written. Surgical expansion beats greenfield construction when the greenfield path means reimplementing auth, sockets, and a component library.

The ops hub keeps the original CRM running at shane.traveltamers.com untouched and deploys the modified copy at ops.traveltamers.com. Both stacks coexist on the same box. The two share the Traefik ingress and (after a consolidation pass) share the infrastructure postgres container, which saves roughly a hundred and fifty megabytes of RAM versus running a third PostgreSQL instance.

BFF, Not Gateway

The first architectural decision the constraints forced was Backend-for-Frontend instead of a general-purpose API gateway. An API gateway (Kong, Tyk, or a custom Fastify instance sitting in front of everything) introduces a new container, new operational surface, and memory overhead that the box cannot spare. And there is exactly one consumer — the Nexus Ops React frontend. There is no second team, no external partner, no mobile client. Cross-consumer rate limiting and protocol translation would be solving problems I do not have.

So Fastify is the gateway. The ops backend aggregates upstream calls through a set of thin client modules — n8nClient, slackClient, googleClient, ollamaClient, groupsDbClient, replicateClient, perplexityClient, systemClient. Each wraps authentication for its upstream service, enforces a timeout (five seconds default, thirty seconds for Ollama because CPU inference is slow), and normalizes errors. A shared proxy helper handles the one-retry-with-exponential-backoff policy across all of them.
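A minimal sketch of that shared helper and one thin client, assuming Node 18+ global fetch. The client shape, env var name, and defaults are illustrative, not the actual implementation:

```typescript
// One retry with exponential backoff, per-attempt timeout via AbortController.
// Sketch only: names and defaults are illustrative.
async function withRetry<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  { timeoutMs = 5_000, baseDelayMs = 500 } = {},
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < 2; attempt++) { // first try + one retry
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), timeoutMs);
    try {
      return await fn(ctrl.signal);
    } catch (err) {
      lastErr = err;
      if (attempt === 0) {
        // back off before the single retry
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastErr;
}

// A thin client wraps auth, the timeout policy, and error normalization:
const n8nClient = {
  executions: () =>
    withRetry((signal) =>
      fetch("http://n8n:5678/api/v1/executions", {
        headers: { "X-N8N-API-KEY": process.env.N8N_API_KEY ?? "" },
        signal,
      }).then((r) =>
        r.ok ? r.json() : Promise.reject(new Error(`n8n: HTTP ${r.status}`)),
      ),
    ),
};
```

The timeout lives inside the helper rather than in each client, so a client only overrides the number (as the Ollama client does) instead of re-implementing the abort plumbing.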

BullMQ Discipline

The CRM codebase brings BullMQ along for the ride — Redis-backed job queues with retries, delays, and concurrency limits. The temptation is to put everything in queues. The constraint says otherwise. Redis on this box is capped at 128 megabytes with LRU eviction, which means the queue cannot grow unbounded and cached data is expendable. The discipline that follows: only enqueue work that is legitimately asynchronous (webhooks from n8n, social post ingestion, lead imports), never enqueue work that can be done synchronously inside a request handler, and never treat Redis as durable storage. The Phase 1 deployment can defer the worker container entirely — BullMQ jobs are not critical for launch.
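One way to encode that discipline in code, sketched with illustrative task names. The options object follows BullMQ's `JobsOptions` shape (`attempts`, `backoff`, `removeOnComplete`, `removeOnFail`), but the values are assumptions:

```typescript
// Defaults for jobs that are legitimately asynchronous. Shaped like BullMQ's
// JobsOptions; the specific values are illustrative.
const asyncJobDefaults = {
  attempts: 3,
  backoff: { type: "exponential", delay: 2_000 },
  removeOnComplete: true, // Redis is a 128 MB LRU cache, not durable storage
  removeOnFail: 100,      // keep only the last 100 failures for debugging
};

// Only these task types go through the queue; everything else runs
// synchronously inside the request handler.
const QUEUEABLE = new Set(["n8n-webhook", "social-ingest", "lead-import"]);

function shouldEnqueue(task: string): boolean {
  return QUEUEABLE.has(task);
}
```

`removeOnComplete: true` is the load-bearing line: with LRU eviction, anything left sitting in Redis is a candidate for silent deletion, so completed jobs must not accumulate there.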

Ollama Local, Claude Remote

The ops hub talks to two different LLM tiers. Ollama runs as a host process on the VPS (not a container), binds to localhost:11434, and is reachable from inside Docker via the bridge gateway at 172.17.0.1:11434. It serves llama3.2 and qwen2.5 at roughly twelve tokens per second on CPU. That is too slow for conversational UX but acceptable for classification, pre-filtering, and draft generation, where latency tolerance is measured in seconds rather than milliseconds.
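A sketch of what a classification call looks like from inside a container, using Ollama's non-streaming `/api/generate` endpoint and the bridge-gateway address above. The prompt shape and function names are assumptions:

```typescript
const OLLAMA_URL = "http://172.17.0.1:11434"; // bridge gateway, from inside Docker

// Build the request separately so the shape is easy to inspect (and test).
function classifyRequest(text: string, labels: string[]) {
  return {
    url: `${OLLAMA_URL}/api/generate`,
    body: {
      model: "llama3.2",
      prompt: `Answer with exactly one of [${labels.join(", ")}].\n\n${text}`,
      stream: false, // one JSON response instead of a token stream
    },
  };
}

async function classify(text: string, labels: string[]): Promise<string> {
  const { url, body } = classifyRequest(text, labels);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(30_000), // CPU inference: 30 s, not the 5 s default
  });
  if (!res.ok) throw new Error(`ollama: HTTP ${res.status}`);
  const json = (await res.json()) as { response: string };
  return json.response.trim();
}
```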

Claude (via the Anthropic API) handles everything that requires strong reasoning, editorial judgment, or long-context synthesis — the chat concierge, story selection in the social pipeline, post writing, and agent swarm orchestration. The split is driven by two constraints at once: the CPU ceiling (Ollama cannot serve high-frequency conversational traffic from the same box that is running PostgreSQL and n8n) and the usage posture for remote APIs (every Claude call is a network round trip, so I only want to make one when the task justifies it). Ollama absorbs the cheap, high-volume, latency-tolerant work. Claude absorbs the work that needs to be good.
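The split reduces to a small dispatch table. Task names here are illustrative, taken from the examples above; the default-to-local rule is an assumption that follows from the remote-call posture:

```typescript
type Tier = "ollama" | "claude";

// Cheap, high-volume, latency-tolerant work stays local.
const LOCAL = new Set(["classify", "prefilter", "draft"]);
// Work that needs strong reasoning or long-context synthesis goes remote.
const REMOTE = new Set(["chat-concierge", "story-selection", "post-writing", "agent-swarm"]);

function routeTask(task: string): Tier {
  if (LOCAL.has(task)) return "ollama";
  if (REMOTE.has(task)) return "claude";
  // Unknown task: a remote call costs a network round trip, so default local
  // and promote to Claude only once local output proves inadequate.
  return "ollama";
}
```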

Why Not Iframes

Iframes were considered for the n8n workflow editor and the Groups app browser. Rejected on four counts. n8n uses cookie-based auth that would require double-login. The Groups app uses Express sessions with trust proxy and secure cookies, which break under cross-origin iframe embedding. An iframe does not let the parent read the child's DOM, so extracting data for dashboard KPIs is impossible. And every iframe is a full browser rendering context — memory that could be doing something useful on a resource-constrained VPS.

The Tradeoff Calculus

Every decision in this design started with "what does the box allow," and worked outward from there. Separate PostgreSQL instance for isolation? Too expensive in RAM, use the shared one. API gateway for clean separation of concerns? Too expensive in containers, use BFF. Micro-frontends for module independence? Too expensive in bundle size and build complexity, use lazy-loaded route modules. Live DB read of the Groups app instead of building it a public API? Correct — the Groups app has no external consumers besides the dashboard, and building an API just to proxy the same queries is wasted work.

The governing principle is that a resource constraint is an architectural forcing function. When you cannot afford to add a layer, you discover which layers you actually needed. In a year of operating this platform, the answer has almost always been "fewer than the default recommendation." The ops hub is the next expression of that answer — one backend, one frontend, one cache, and one real-time channel, stitching fifteen systems into a single coherent view from inside the ceiling I already own.
