I like my assistants like I like my homelab: fast, offline, and not quietly uploading my notes to someone else’s analytics. MCP is the piece that finally made my local setup feel coherent instead of duct‑taped.
Who this is for
If you’re a developer comfortable with terminals, Docker, and a bit of Node/Python, this is for you. We’ll build a local, privacy‑preserving assistant that talks to your filesystem, runs tools, and can plug into open‑source LLMs on your own hardware. The goal: production‑ish hygiene without enterprise ceremony.
What we’ll cover
- A quick tour of MCP concepts
- Local‑first reference architecture (Ollama/vLLM + MCP servers)
- Fast start with Claude Desktop and Cursor
- Building a custom MCP server (Node + TypeScript)
- Optional Python/FastMCP variant
- Docker Compose stack with sane defaults
- Security hardening and privacy guardrails that actually matter
- Troubleshooting and ops
MCP in 90 seconds
MCP (Model Context Protocol) is a standard that lets an AI client discover and call tools, read resources, and reuse prompts from one or more servers. Under the hood it speaks JSON‑RPC over either stdio (local processes) or Streamable HTTP (remote). Think "USB‑C for AI assistants": same plug, many devices.
Core primitives:
- Tools: functions the model may call with structured input to perform actions (e.g. run a Docker task, query a DB).
- Resources: read‑only items the client can load into context (files, API responses, config blobs).
- Prompts: reusable templates with arguments.
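On the wire, a tool invocation is just a JSON‑RPC request. Roughly (field values here are illustrative, and the tool name matches the server we build later):
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": { "name": "search_notes", "arguments": { "query": "docker" } }
}
The server replies with a result whose content array (text, images, resources) the client hands back to the model.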
Why it’s helpful for local assistants:
- Local stdio servers mean no network egress to use sensitive tools.
- You can mix local and remote servers behind the same client.
- It’s ecosystem‑friendly: lots of community servers exist already.
Reference architecture (local‑first)
flowchart LR
  subgraph Client
    A[Claude Desktop / Cursor]
  end
  subgraph Local
    O[Ollama / vLLM]:::svc
    F[Filesystem MCP]:::svc
    X["Custom MCP Server<br/>(Node/Python)"]:::svc
    V[("Vector DB<br/>Chroma/LanceDB")]:::svc
  end
  subgraph Remote["Optional Remote"]
    R["Remote MCP Servers<br/>(GitHub, Docker, etc.)"]:::svc
  end
  A <--> F
  A <--> X
  X <--> O
  X <--> V
  A <-->|optional| R
  classDef svc fill:#0ea5e9,stroke:#0369a1,color:#fff
Design notes:
- Keep sensitive tools local via stdio. Only expose remote HTTP where you need SaaS.
- Use an MCP server as your gateway to anything the model should touch. No direct shell access.
- Treat the assistant like any service: logs, versioned config, least‑privilege.
Hardware: you can start on a single GPU desktop or even CPU‑only for small models; scale to vLLM on a proper card later.
Fast start: wire up a local Filesystem MCP
Claude Desktop
- Install Claude Desktop, open Settings → Developer → Edit config.
- Add a minimal MCP entry. Update the path you want to expose:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/projects"],
      "env": { "MCP_VERBOSITY": "info" }
    }
  }
}
macOS config lives at:
~/Library/Application Support/Claude/claude_desktop_config.json
Windows (typical):
C:\\Users\\<you>\\AppData\\Roaming\\Claude\\claude_desktop_config.json
Restart Claude Desktop. In a new chat, ask it to "list files" and approve the tool call.
Cursor IDE (optional)
Create ~/.cursor/mcp.json:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/repo"]
    }
  }
}
Build a custom MCP server (TypeScript)
This server exposes two tools:
- ollama_chat: talk to a local model via Ollama’s HTTP API
- search_notes: quick grep‑style search across a read‑only notes directory
It requires Node 18+, @modelcontextprotocol/sdk, zod, and undici. We’ll run it over stdio for privacy.
Project setup
mkdir mcp-local-assistant && cd $_
npm init -y
npm i @modelcontextprotocol/sdk zod undici
npm i -D typescript tsx @types/node
npx tsc --init --rootDir src --outDir dist --module nodenext --target es2022
mkdir -p src
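Because src/index.ts uses ES module imports and top‑level await, mark the package as ESM and add a build script. A minimal package.json sketch (names and scripts are just reasonable defaults):
{
  "name": "mcp-local-assistant",
  "version": "0.1.0",
  "type": "module",
  "scripts": {
    "build": "tsc",
    "dev": "tsx src/index.ts",
    "start": "node dist/index.js"
  }
}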
src/index.ts
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { request } from "undici";
import { promises as fs } from "node:fs";
import path from "node:path";

const NAME = "mcp-local-assistant";
const VERSION = "0.1.0";
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://127.0.0.1:11434";
const OLLAMA_MODEL = process.env.OLLAMA_MODEL ?? "llama3.1";
const NOTES_ROOT = process.env.NOTES_ROOT ?? path.resolve(process.cwd(), "notes");

// Simple jail for file reads: resolve the path and refuse anything that escapes the root.
// The prefix check includes the separator so a sibling like "notes-private" can't slip past.
function safeJoin(root: string, p: string) {
  const base = path.resolve(root);
  const full = path.resolve(base, p);
  if (full !== base && !full.startsWith(base + path.sep)) throw new Error("path outside NOTES_ROOT");
  return full;
}
const server = new McpServer({ name: NAME, version: VERSION });

// Tool: talk to local Ollama
server.registerTool(
  "ollama_chat",
  {
    title: "Chat via local Ollama",
    description: "Send a message array to a local Ollama model and return the reply.",
    inputSchema: {
      messages: z.array(z.object({ role: z.enum(["user", "assistant", "system"]), content: z.string() }))
    }
  },
  async ({ messages }) => {
    const { body } = await request(`${OLLAMA_URL}/api/chat`, {
      method: "POST",
      body: JSON.stringify({ model: OLLAMA_MODEL, messages, stream: false }),
      headers: { "content-type": "application/json" }
    });
    const data = (await body.json()) as { message?: { content?: string } };
    const text = data.message?.content ?? JSON.stringify(data);
    return { content: [{ type: "text", text }] };
  }
);
// Tool: naive text search in notes
server.registerTool(
  "search_notes",
  {
    title: "Search local notes",
    description: "Return relative paths of Markdown files containing a query (case‑insensitive).",
    inputSchema: { query: z.string().min(2), maxFiles: z.number().int().min(1).max(50).default(10) }
  },
  async ({ query, maxFiles }) => {
    const hits: string[] = [];
    async function walk(dir: string) {
      if (hits.length >= maxFiles) return; // stop descending once we have enough matches
      const entries = await fs.readdir(dir, { withFileTypes: true });
      for (const e of entries) {
        if (hits.length >= maxFiles) return;
        const p = path.join(dir, e.name);
        if (e.isDirectory()) await walk(p);
        else if (e.isFile() && e.name.endsWith(".md")) {
          const text = await fs.readFile(p, "utf8");
          if (text.toLowerCase().includes(query.toLowerCase())) {
            hits.push(`- ${path.relative(NOTES_ROOT, p)}`);
          }
        }
      }
    }
    await walk(NOTES_ROOT);
    return { content: [{ type: "text", text: hits.length ? hits.join("\n") : "No matches" }] };
  }
);
// Resource: load a note file by URI
server.registerResource(
  "note",
  new ResourceTemplate("note://{path}", { list: undefined }),
  { title: "Note file", description: "Read a Markdown note by relative path" },
  async (uri, { path: rel }) => {
    const p = safeJoin(NOTES_ROOT, String(rel));
    const text = await fs.readFile(p, "utf8");
    return { contents: [{ uri: uri.href, text }] };
  }
);

// Start the stdio transport (stdin/stdout carry the protocol, so keep any logging on stderr)
const transport = new StdioServerTransport();
await server.connect(transport);
Run it:
NOTES_ROOT=$HOME/notes OLLAMA_MODEL=llama3.1 tsx src/index.ts
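To exercise the tools without a chat client attached, the MCP Inspector can drive the server over stdio (after a build):
npx tsc
npx @modelcontextprotocol/inspector node dist/index.js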
Wire it into Claude Desktop by adding:
{
  "mcpServers": {
    "local-assistant": {
      "command": "node",
      "args": ["/absolute/path/to/dist/index.js"],
      "env": {
        "OLLAMA_URL": "http://127.0.0.1:11434",
        "OLLAMA_MODEL": "llama3.1",
        "NOTES_ROOT": "/Users/you/notes"
      }
    }
  }
}
Tip: use stdio for this server. If you need remote access, add a separate Streamable HTTP deployment and protect it with OAuth and IP allow‑lists.
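If you do want the HTTP flavour later, the TypeScript SDK ships a Streamable HTTP transport. A rough stateless sketch along the lines of the SDK’s documented pattern (Express assumed; buildServer() is a hypothetical factory wrapping the registrations above; auth and CORS are deliberately omitted, so keep it behind the gateway):
import express from "express";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { buildServer } from "./server.js"; // hypothetical module exporting a configured McpServer factory

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  // Stateless mode: a fresh server + transport per request, no session IDs to track.
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on("close", () => { transport.close(); server.close(); });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000, "127.0.0.1"); // loopback only; publish via your zero‑trust gateway, never directly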
Python variant (FastMCP)
Prefer Python? Same idea in a few lines:
# server.py
from mcp.server.fastmcp import FastMCP
import httpx, os
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "llama3.1")
mcp = FastMCP("mcp-local-py")
@mcp.tool()
def ollama_chat(message: str) -> str:
    """Send a single message to local Ollama and return the reply text."""
    r = httpx.post(f"{OLLAMA_URL}/api/chat", json={
        "model": MODEL,
        "messages": [{"role": "user", "content": message}],
        "stream": False
    }, timeout=60)
    r.raise_for_status()
    data = r.json()
    return data.get("message", {}).get("content", str(data))

@mcp.resource("note://{name}")
def note(name: str) -> str:
    """Read a note by file name from NOTES_ROOT."""
    with open(os.path.join(os.environ.get("NOTES_ROOT", "."), name), "r", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run(transport="stdio")  # FastMCP defaults to stdio; no extra transport wiring needed
Run:
uv add "mcp[cli]" httpx
OLLAMA_MODEL=llama3.1 uv run python server.py
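To poke at it interactively, the mcp[cli] extra also bundles an Inspector launcher (command name assumed from recent SDK versions):
uv run mcp dev server.py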
Docker Compose: local stack with guardrails
The compose file below runs Ollama and our Node MCP server. It favours least‑privilege and predictable networking.
docker-compose.yml
version: "3.9"
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
restart: unless-stopped
volumes:
- ollama:/root/.ollama
ports:
- "11434:11434" # Only if you need host access; remove for container‑only
environment:
- OLLAMA_KEEP_ALIVE=30m
deploy:
resources:
limits:
cpus: "2"
security_opt:
- no-new-privileges:true
mcp:
build: ./mcp-local-assistant
container_name: mcp-local-assistant
restart: unless-stopped
environment:
- OLLAMA_URL=http://ollama:11434
- OLLAMA_MODEL=llama3.1
- NOTES_ROOT=/notes
volumes:
- ./notes:/notes:ro
depends_on:
- ollama
# Hardening
read_only: true
cap_drop: ["ALL"]
security_opt:
- no-new-privileges:true
networks: [internal]
networks:
internal:
driver: bridge
volumes:
ollama:
mcp-local-assistant/Dockerfile
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

FROM node:20-alpine AS build
WORKDIR /app
COPY . .
RUN npm ci && npm run build

FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
USER 1000:1000
ENTRYPOINT ["node", "dist/index.js"]
Bring it up:
docker compose up -d --build
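Ollama starts with an empty model store, so pull the model referenced by OLLAMA_MODEL into the named volume once:
docker compose exec ollama ollama pull llama3.1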
Claude Desktop spawns stdio servers as local processes, so for desktop use either run the Node server directly on the host, or point the config’s command at a docker run -i --rm … style wrapper around the image. For remote HTTP exposure, put it behind your zero‑trust gateway and OAuth.
Security and privacy that actually help
Threats to care about
- Prompt injection and confused‑deputy: any untrusted text the model reads may try to trigger dangerous tools.
- Identity fragmentation: credentials scattered across servers, shells, and env vars become a minefield.
- Over‑privileged servers: write access where read would do; broad network egress.
Practical guardrails
- Prefer stdio for local servers with sensitive powers. Only use HTTP when you need remote access and then require OAuth and device checks.
- Run servers as non‑root with cap_drop: ["ALL"], no-new-privileges, and read‑only mounts by default.
- Scope your MCP tools narrowly: one tool per capability with small, explicit schemas.
- Enforce allow‑lists: explicit directories for any filesystem access and explicit hosts for any HTTP client in your code (see the sketch after this list).
- No static long‑lived secrets. Use short‑lived tokens, rotate often, and store out of repo. For homelabs, environment files with tight file perms are still better than baking secrets into configs.
- Add an interactive confirmation layer on risky tools: require a human click for side‑effects like "delete", "transfer", "push".
- For remote access, put Streamable HTTP servers behind your zero‑trust gateway (SAML/OIDC, device posture, IP anchor). Keep an audit log.
- Log tool invocations with arguments and status. Keep logs local; scrub payloads on error.
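As a concrete (if minimal) version of the host allow‑list idea, gate every outbound call your server makes through one chokepoint. This sketch assumes the undici‑based server from earlier; ALLOWED_HOSTS and guardedRequest are names invented for illustration:
// Hosts this server may ever call; anything else is refused before a socket opens.
const ALLOWED_HOSTS = new Set(["127.0.0.1:11434", "localhost:11434"]);

async function guardedRequest(url: string, init?: Parameters<typeof request>[1]) {
  const { host } = new URL(url);
  if (!ALLOWED_HOSTS.has(host)) throw new Error(`Blocked egress to ${host}: not on the allow-list`);
  // Audit line goes to stderr: with stdio transports, stdout carries the protocol.
  console.error(JSON.stringify({ event: "egress", host, at: new Date().toISOString() }));
  return request(url, init);
}
Swap the direct request() calls in your tools for guardedRequest() and the egress policy lives in one place.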
Model safety
- Treat any content the model reads as untrusted. Prefer pulling in resources you control (local files) instead of arbitrary web pages. When you must browse, proxy through a sanitizer and redact high‑risk patterns.
Operating it day‑to‑day
- Observability: print structured logs from your server; tail them in a tmux pane or Loki/Grafana if you like pain.
- Testing: smoke tests that call tools with fixed payloads; it’s just JSON‑RPC over stdio/HTTP (see the client sketch after this list).
- Upgrades: pin SDK versions; track release notes monthly; regen your config when MCP spec bumps.
- Backups: your notes and configs, not your containers.
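A minimal smoke test with the TypeScript client SDK might look like this; the path to dist/index.js and the expected tool name are assumptions carried over from the server built earlier:
// smoke.ts — spawn the stdio server, list tools, call one, assert on the shape.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"],
  // Forward the parent environment so PATH and NOTES_ROOT reach the child process.
  env: Object.fromEntries(Object.entries(process.env).filter(([, v]) => v !== undefined)) as Record<string, string>,
});
const client = new Client({ name: "smoke-test", version: "0.0.1" });

await client.connect(transport);
const { tools } = await client.listTools();
if (!tools.some((t) => t.name === "search_notes")) throw new Error("search_notes tool missing");

const result = await client.callTool({ name: "search_notes", arguments: { query: "mcp", maxFiles: 5 } });
console.log(JSON.stringify(result.content, null, 2));
await client.close();
Run it with npx tsx smoke.ts; failures surface as thrown errors, which is enough for a scheduled canary.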
Troubleshooting
- Claude doesn’t see my server: usually a wrong path or Node version; on Windows make sure Desktop’s developer mode is enabled. Check that your command is reachable on your PATH.
- HTTP transport CORS: if you mount a Streamable HTTP server in a browser, configure CORS to only allow your client origin. Otherwise stick to stdio.
- Ollama timeouts: lower model size or ensure GPU is free; for CPU‑only, expect slow first tokens.
- File access denied: your read‑only mount is doing its job. Add a second, explicit rw mount for directories that truly need writes.
A: Example risky‑tool confirm wrapper (TS)
function withConfirmation<TShape extends z.ZodRawShape>(
  name: string,
  schema: TShape,
  description: string,
  fn: (args: Record<string, unknown>) => Promise<string>
) {
  server.registerTool(name, { title: name, description, inputSchema: schema }, async (args, ctx) => {
    // Ask the client to confirm explicitly. `confirm` is a placeholder hook here;
    // swap in whatever confirmation/elicitation mechanism your client actually supports.
    const confirm = await (ctx as any).confirm?.({
      title: `Confirm ${name}`,
      message: `Proceed with ${JSON.stringify(args)}`
    });
    if (!confirm) {
      return { content: [{ type: "text", text: "Cancelled." }] };
    }
    const text = await fn(args as Record<string, unknown>);
    return { content: [{ type: "text", text }] };
  });
}
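Usage might look like the following; delete_note, its deletion logic, and the reuse of safeJoin/NOTES_ROOT from the main server are illustrative assumptions:
// Hypothetical risky tool routed through the confirmation wrapper.
withConfirmation(
  "delete_note",
  { path: z.string().describe("Note path relative to NOTES_ROOT") },
  "Delete a Markdown note (asks for confirmation first)",
  async (args) => {
    const rel = String(args.path);
    await fs.rm(safeJoin(NOTES_ROOT, rel));
    return `Deleted ${rel}`;
  }
);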
B: Minimal vector search tool (TS)
import { ChromaClient } from "chromadb";

const chroma = new ChromaClient({ path: process.env.CHROMA_URL || "http://127.0.0.1:8000" });

server.registerTool(
  "search_corpus",
  { title: "Semantic search", description: "Query local Chroma", inputSchema: { q: z.string(), k: z.number().default(5) } },
  async ({ q, k }) => {
    // Assumes the "docs" collection has already been populated with embedded documents.
    const coll = await chroma.getOrCreateCollection({ name: "docs" });
    const res = await coll.query({ queryTexts: [q], nResults: k });
    const docs = (res?.documents?.[0] ?? []).filter((d): d is string => typeof d === "string");
    const lines = docs.map((d, i) => `${i + 1}. ${d.slice(0, 200)}…`);
    return { content: [{ type: "text", text: lines.join("\n") || "No results" }] };
  }
);
C: Example Cursor config with two servers
{
  "mcpServers": {
    "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"] },
    "local-assistant": { "command": "node", "args": ["/home/you/mcp-local-assistant/dist/index.js"], "env": { "NOTES_ROOT": "/home/you/notes" } }
  }
}
Final thoughts
The nice thing about MCP is you can start tiny and grow. One stdio server today, a couple of HTTP servers tomorrow, then a proper gateway if you ever need it. Keep your surface area narrow, your permissions tight, and your models local whenever you can. Your future self, and your future incident post‑mortems, will thank you.