Self-hosted search: moving beyond Google

August 18, 2024 / 10 min read

For years I defaulted to big commercial search because, well, muscle memory. Then it got noisy: ads on top of ads, AI summaries stapled over decent links, and filters that felt more like nudges. I wanted something I could tune, deploy, and trust — for myself and for my homelab. This post is a practical walkthrough of building a private, fast, and actually‑useful search stack with SearXNG. We’ll cover setup (Docker + Caddy), privacy levers that matter, how to improve result quality, and strategies to reduce dependence on any single company.

What you’ll get:

  • A production‑ready SearXNG deployment (TLS, rate limiting, image proxy, Valkey)
  • A sensible settings.yml you can drop in and adapt
  • Tips to avoid upstream CAPTCHAs and improve relevance
  • A workflow to rely less on monolithic engines without wrecking your day

Who this is for: developers, admins, and homelab nerds who can read YAML without crying. Difficulty: easy‑medium.


Why SearXNG

SearXNG is a metasearch engine: it queries multiple providers (general search, wikis, code, academic, etc.), merges results, and returns a single page without profiling you. You can self‑host it, choose engines per category, prefer or demote domains, and even run it behind Tor or proxies if you need extra anonymity. Crucially, you can keep the instance private, so the only person whose traffic policy you need to care about is you.

Two honest caveats:

  • It’s not a magic “beat‑every‑engine” button. You trade a bit of hand‑curation and setup to get back control and privacy.
  • Some upstreams rate‑limit or slap CAPTCHAs if you hammer them. We’ll mitigate that with SearXNG’s Limiter, Valkey, sane timeouts, and good engine hygiene.

Quickstart: production SearXNG with Docker, Caddy and Valkey

This gets you a secure deployment with automatic TLS and a built‑in rate‑limiter backend. Replace search.example.com with your domain and admin@example.com with an email you control.

File tree

searxng/
├─ docker-compose.yml
├─ Caddyfile
└─ searxng/
   └─ settings.yml
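
You can scaffold that layout in one go:

```shell
# Create the project skeleton shown above
mkdir -p searxng/searxng
touch searxng/docker-compose.yml searxng/Caddyfile searxng/searxng/settings.yml
```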

docker-compose.yml

services:
  valkey:
    image: valkey/valkey:8-alpine
    command: ["valkey-server", "--save", "", "--appendonly", "no"]
    restart: unless-stopped
    volumes:
      - valkey:/data
 
  searxng:
    image: searxng/searxng:latest
    restart: unless-stopped
    depends_on:
      - valkey
    environment:
      # Needed by SearXNG for crypto/session bits
      - SEARXNG_SECRET=${SEARXNG_SECRET}
      # Optional, helps with absolute URLs and OpenSearch
      - SEARXNG_BASE_URL=https://search.example.com/
      # Bind inside the container
      - BIND_ADDRESS=[::]:8080
    volumes:
      - ./searxng:/etc/searxng
    networks:
      - searxnet
 
  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    depends_on:
      - searxng
    networks:
      - searxnet
 
networks:
  searxnet:
    driver: bridge
 
volumes:
  valkey:
  caddy_data:
  caddy_config:

Caddyfile

search.example.com {
  encode zstd gzip
  tls admin@example.com
  header {
    Referrer-Policy "no-referrer"
    X-Content-Type-Options "nosniff"
    X-Download-Options "noopen"
    X-Robots-Tag "noindex, nofollow"
  }
  reverse_proxy searxng:8080
}
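
If you must expose the instance but want to keep it effectively private, you can gate it at the proxy. A sketch using Caddy's basic-auth directive (spelled basicauth before Caddy 2.8, basic_auth from 2.8 on; generate the hash with `caddy hash-password`):

```caddyfile
search.example.com {
  basic_auth {
    # username, then the hash printed by `caddy hash-password`
    alice <paste-hash-here>
  }
  reverse_proxy searxng:8080
}
```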

Generate a strong secret and start (keep it in an .env file next to docker-compose.yml so Compose reuses it on every restart)

export SEARXNG_SECRET=$(openssl rand -hex 32)
docker compose up -d

If DNS is correct, Caddy will fetch a cert and you’ll be live at https://search.example.com.
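
Before trusting the stack, confirm the secret actually has content — an empty SEARXNG_SECRET is a classic cause of the blank-page-on-first-run symptom covered in the troubleshooting section:

```shell
# Generate and sanity-check the secret (64 hex characters expected)
SEARXNG_SECRET=$(openssl rand -hex 32)
if [ "${#SEARXNG_SECRET}" -eq 64 ]; then
  echo "secret ok"
else
  echo "secret generation failed" >&2
  exit 1
fi
```

Drop the value into .env as `SEARXNG_SECRET=...` so Compose picks it up on every `up -d`.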

Prefer Nginx/Traefik? Same idea: terminate TLS, forward to searxng:8080, and don’t forget headers.


A sane settings.yml you can actually use

Save this as searxng/settings.yml and tweak to taste.

use_default_settings: true
 
general:
  instance_name: "SearXNG"
  enable_metrics: false            # keep your stats local unless you need them
  privacypolicy_url: "https://search.example.com/privacy"
 
server:
  base_url: "https://search.example.com/"
  method: "POST"                  # prefer POST for queries
  # secret_key is injected via the SEARXNG_SECRET env var in Compose;
  # note the ${...} syntax is NOT expanded inside settings.yml itself
  public_instance: false
  image_proxy: true                # proxy image thumbnails through your server
  limiter: true                    # enable SearXNG's built-in limiter
 
valkey:
  url: "valkey://valkey:6379/0"    # Valkey speaks the Redis protocol
 
outgoing:
  request_timeout: 2.0
  max_request_timeout: 10.0
  useragent_suffix: "admin@search.example.com"  # gives upstreams a contact
  pool_connections: 100
  pool_maxsize: 10
  enable_http2: true
  # If you want to chain via Tor: uncomment below and run a local Tor SOCKS
  # proxies:
  #   all://:
  #     - socks5h://tor:9050
  # using_tor_proxy: false
 
# Minimal, high-signal engines per tab. Start lean; add later.
engines:
  - name: brave
    engine: brave
    categories: [general, images, news, videos]
    # if you have a Brave Search API key, add:
    # api_key: "${BRAVE_API_KEY}"
 
  - name: wikipedia
    engine: wikipedia
    categories: [general]
 
  - name: wikidata
    engine: wikidata
    categories: [general]
 
  - name: stackoverflow
    engine: stackoverflow
    categories: [it]
 
  - name: github
    engine: github
    categories: [it]
 
  - name: arxiv
    engine: arxiv
    categories: [science]
 
  - name: pubmed
    engine: pubmed
    categories: [science]
 
  - name: mojeek
    engine: mojeek
    categories: [general]
 
categories_as_tabs:
  general:
  images:
  videos:
  news:
  it:
  science:
 
plugins:
  - searx.plugins.hostnames
 
hostnames:
  high_priority:
    - '(.*\.)?wikipedia\.org$'
    - '(.*\.)?stackexchange\.com$'
    - '(.*\.)?arxiv\.org$'
  low_priority:
    - '(.*\.)?pinterest\..*$'   # example of sites you might want lower

Notes

  • Keep the engines: list small at first. Fewer, better engines usually beat “enable everything.”
  • image_proxy: true stops your browser from ever touching third‑party CDNs for thumbnails.
  • limiter: true needs Valkey. We already added it in Compose.
  • useragent_suffix with a contact can reduce blocks.

Privacy levers that actually matter

  • POST queries + no logs: use server.method: POST and disable reverse‑proxy access logs if you don’t need them. If you deploy with uWSGI or Granian natively, disable app logs there too.
  • Image proxy on: thumbnails and favicons go through your box, not trackers.
  • Private instance: keep it off the public internet unless you intend to serve others. If you must expose it, pair the Limiter with reasonable per‑IP budgets and maybe require auth at the proxy.
  • Outgoing hygiene: short timeouts, minimal concurrency, and a clear useragent_suffix decrease the chance of triggering CAPTCHAs.
  • Optional Tor/proxies: if you need extra anonymity or geodiversity, wire Tor or proxies into outgoing:. Expect slower results; that’s normal.
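
For the no-logs lever with Nginx, the switch is one directive per vhost (Caddy writes no access log at all unless you add a log block, so it needs nothing):

```nginx
server {
  server_name search.example.com;
  access_log off;              # no per-request logging
  error_log /dev/null crit;    # keep only critical errors
  # ... rest of the vhost unchanged ...
}
```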

The Limiter: fewer CAPTCHAs, less pain

SearXNG’s Limiter watches client behaviour and rate‑limits abusive IPs. With Valkey backing it, you get sliding windows and dynamic lists. In practice this means upstream engines see steady, human‑like traffic from you rather than spikes.

If you run a private instance for just you or your household, the defaults are fine. For a small team behind a reverse proxy, tell the Limiter how many proxies sit in front of SearXNG and, if you like, enable the stricter link_token check:

# searxng/limiter.toml (optional explicit policy)
[real_ip]
x_for = 1                  # number of trusted reverse proxies in front

[botdetection.ip_limit]
link_token = true          # stricter bot detection; requires Valkey

Restart after adding/changing limiter policies.


Result quality: making SearXNG feel “first‑try right”

  1. Curate per‑tab engines.

    • General: Brave, Mojeek, Wikipedia/Wikidata.
    • Developer: GitHub, Stack Overflow, MDN.
    • Science: arXiv, PubMed, Crossref.
    • Images/News: keep a max of two upstreams each for speed.
  2. Bias trusted domains. The hostnames plugin lets you boost or demote hosts. I always prioritise Wikipedia for concept lookups and arXiv for papers.

  3. Use SearXNG’s search syntax. Punch !github flask to query just GitHub, or !! foo to auto‑open the first result. Language filters like :ro or :en help if you live bilingual like I do.

  4. API keys for paid upstreams. Some engines work best with an API key (e.g. Brave). If you can budget it, paid quotas tend to be faster and less ban‑happy.

  5. Tune timeouts. If pages feel slow, nudge outgoing.request_timeout to 3–4 seconds and keep max_request_timeout tight at ~8–10s.

  6. Avoid being noisy. Enabling every engine invites duplicate and flaky results. Start minimal; add only when you feel a consistent gap.
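
The timeout nudge from step 5 is a two-line change in settings.yml:

```yaml
outgoing:
  request_timeout: 3.0        # per-engine budget, up from 2.0
  max_request_timeout: 8.0    # hard ceiling for stragglers
```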


Browser integration

SearXNG advertises an OpenSearch descriptor, so most browsers can set it as default automatically. If your browser ignores autodiscovery, you can add a custom engine with a URL like:

https://search.example.com/search?q=%s
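
If your browser also asks for a suggestions URL, SearXNG exposes an autocomplete endpoint (set search.autocomplete to a backend of your choice in settings.yml first):

```
https://search.example.com/autocompleter?q=%s
```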

On mobile, I’ve had the best luck with Firefox and iOS browsers that support custom search engines.


Optional: Tor, proxies, and geodiversity

If you run Tor locally, point SearXNG at it by uncommenting the SOCKS proxy in outgoing:. For Docker, add a tor service and refer to it as socks5h://tor:9050. Add a small extra_proxy_timeout to keep UX decent.
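
A sketch of the Compose side — the image name below is a placeholder; any maintained Tor image that exposes a SOCKS port on 9050 will do:

```yaml
services:
  tor:
    image: your-preferred/tor-image   # placeholder — pick a maintained Tor image
    restart: unless-stopped
    networks:
      - searxnet
```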

Reality check: Tor will slow you down and some engines will still challenge you. Use it when you need it, not by default.


Troubleshooting playbook

  • “Everything times out.” Increase outgoing.request_timeout from 2.0 to 4.0 temporarily and check which engines lag. Disable the worst offenders.
  • CAPTCHAs from a single upstream. Open that engine in a normal browser from the same server IP and solve one challenge. It often unblocks you for a while. Also ensure useragent_suffix has a contact.
  • Images don’t load. If you enabled image_proxy, make sure your reverse proxy allows large responses and isn’t stripping headers.
  • Public instance gets hammered. Raise limiter thresholds carefully, and consider IP allow‑listing or basic auth in Caddy/Nginx.
  • Blank page on first run. Check that SEARXNG_SECRET was actually injected and that settings.yml was mounted to /etc/searxng.
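
For any of these, the usual first move is grepping the container logs for the failure keywords. Here a canned sample stands in for the output of docker compose logs searxng:

```shell
# Filter logs for the usual suspects: timeouts, CAPTCHAs, suspended engines
printf '%s\n' \
  'ERROR searx.engines.brave: engine timeout' \
  'INFO  searx.webapp: GET /search 200' \
  'WARNING searx.engines.github: CAPTCHA required' \
  | grep -i -E 'timeout|captcha|suspended'
```

On your host, pipe `docker compose logs --tail=200 searxng` into the same grep.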

Here’s the workflow that’s stuck for me:

  • Default everything through SearXNG. It’s the muscle memory breaker. I route browser, terminal helpers, and even my launchers to my instance.
  • Use niche independent crawlers on demand. When I’m exploring non‑commercial corners of the web or older technical posts, I’ll fire Marginalia or Wiby. For mainstream queries where an independent index helps, Mojeek adds variety.
  • Lean on vertical engines. For code and docs, GitHub/MDN/Stack Overflow engines beat general web. For papers, arXiv and PubMed are faster than trawling generic news.
  • Keep one paid upstream in reserve. If your work depends on solid web coverage, a small paid plan on a reputable engine via API stabilises things. You don’t have to love it; it’s an escape hatch.

None of this is absolutist. The goal is to make “use the default megasite” a choice, not a reflex.


Updating and maintaining

  • Upgrades: docker compose pull && docker compose up -d (pin an image tag instead of latest if you want to control when updates land)
  • Backups: version your settings.yml and any limiter.toml in a private repo.
  • Metrics: if you enable /stats, remember it’s anonymous counts. Don’t expose internals.

Personal notes from the homelab

I run SearXNG on a small Proxmox VM behind Caddy. Valkey has been rock‑solid as the Limiter backend. The two quality boosts that mattered most were: aggressively trimming engines, and boosting Wikipedia/Stack Exchange via the hostnames plugin. Tor is there as a toggle, but I use it sparingly because it drags. On the browser side, setting SearXNG as the system default finally broke my autopilot.

If you’re building this for a team, spend time on the limiter and on writing a tiny privacy policy page. People appreciate clarity.


A minimal Nginx reverse proxy (if you don't want Caddy)

server {
  listen 80;
  server_name search.example.com;
  return 301 https://$host$request_uri;
}
 
server {
  listen 443 ssl http2;
  server_name search.example.com;
 
  ssl_certificate     /etc/letsencrypt/live/search.example.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/search.example.com/privkey.pem;
 
  add_header Referrer-Policy "no-referrer" always;
  add_header X-Content-Type-Options "nosniff" always;
  add_header X-Download-Options "noopen" always;
  add_header X-Robots-Tag "noindex, nofollow" always;
 
  location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
  }
}
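
Related to the image-proxy troubleshooting note above: if thumbnails fail behind Nginx, these knobs inside the location block are the usual fix (values illustrative):

```nginx
location / {
  proxy_pass http://127.0.0.1:8080;
  proxy_buffering off;        # stream proxied thumbnails instead of spooling
  proxy_read_timeout 60s;     # allow slow upstream image fetches
}
```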

Final take

Self‑hosting search won’t fix the entire internet, but it puts you back in charge of how you query it. SearXNG is a sweet spot: pragmatic, fast enough, private by default, and genuinely customisable. Start small, add only what you need, and let your defaults work for you, not the other way round.