For years I defaulted to big commercial search because, well, muscle memory. Then it got noisy: ads on top of ads, AI summaries stapled over decent links, and filters that felt more like nudges. I wanted something I could tune, deploy, and trust — for myself and for my homelab. This post is a practical walkthrough of building a private, fast, and actually‑useful search stack with SearXNG. We’ll cover setup (Docker + Caddy), privacy levers that matter, how to improve result quality, and strategies to reduce dependence on any single company.
What you’ll get:
- A production‑ready SearXNG deployment (TLS, rate limiting, image proxy, Valkey)
- A sensible settings.yml you can drop in and adapt
- Tips to avoid upstream CAPTCHAs and improve relevance
- A workflow to rely less on monolithic engines without wrecking your day
Who this is for: developers, admins, and homelab nerds who can read YAML without crying. Difficulty: easy‑medium.
Why SearXNG
SearXNG is a metasearch engine: it queries multiple providers (general search, wikis, code, academic, etc.), merges results, and returns a single page without profiling you. You can self‑host it, choose engines per category, prefer or demote domains, and even run it behind Tor or proxies if you need extra anonymity. Crucially, you can keep the instance private, so the only person whose traffic policy you need to care about is you.
Two honest caveats:
- It’s not a magic “beat‑every‑engine” button. You trade a bit of hand‑curation and setup to get back control and privacy.
- Some upstreams rate‑limit or slap CAPTCHAs if you hammer them. We’ll mitigate that with SearXNG’s Limiter, Valkey, sane timeouts, and good engine hygiene.
Quickstart: production SearXNG with Docker, Caddy and Valkey
This gets you a secure deployment with automatic TLS and a built‑in rate‑limiter backend. Replace search.example.com with your own domain and admin@example.com with an email you control.
File tree
searxng/
├─ docker-compose.yml
├─ Caddyfile
└─ searxng/
└─ settings.yml
docker-compose.yml
services:
  valkey:
    image: valkey/valkey:8-alpine
    command: ["valkey-server", "--save", "", "--appendonly", "no"]
    restart: unless-stopped
    volumes:
      - valkey:/data
    networks:
      - searxnet   # must share a network with searxng, or the limiter can't reach it

  searxng:
    image: searxng/searxng:latest
    restart: unless-stopped
    depends_on:
      - valkey
    environment:
      # Needed by SearXNG for crypto/session bits
      - SEARXNG_SECRET=${SEARXNG_SECRET}
      # Optional, helps with absolute URLs and OpenSearch
      - SEARXNG_BASE_URL=https://search.example.com/
      # Bind inside the container
      - BIND_ADDRESS=[::]:8080
    volumes:
      - ./searxng:/etc/searxng
    networks:
      - searxnet

  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    depends_on:
      - searxng
    networks:
      - searxnet

networks:
  searxnet:
    driver: bridge

volumes:
  valkey:
  caddy_data:
  caddy_config:
Caddyfile
search.example.com {
    encode zstd gzip
    tls admin@example.com

    header {
        Referrer-Policy "no-referrer"
        X-Content-Type-Options "nosniff"
        X-Download-Options "noopen"
        X-Robots-Tag "noindex, nofollow"
    }

    reverse_proxy searxng:8080
}
Generate a strong secret and start
export SEARXNG_SECRET=$(openssl rand -hex 32)
docker compose up -d
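If you don’t want to export the secret in every shell, Compose also reads a .env file sitting next to docker-compose.yml. A one‑liner generates and stores it (keep .env out of any shared repo):

echo "SEARXNG_SECRET=$(openssl rand -hex 32)" > .env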
If DNS is correct, Caddy will fetch a cert and you’ll be live at https://search.example.com.
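Before pointing a browser at it, a quick sanity check from the host is worthwhile; these commands assume the service names from the Compose file above:

docker compose ps                                  # all three services should be running
docker compose logs --tail 20 searxng              # settings.yml errors surface here
curl -sI https://search.example.com | head -n 5    # expect a 200 and your security headers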
Prefer Nginx or Traefik? Same idea: terminate TLS, forward to searxng:8080, and don’t forget the headers (a full Nginx example is at the end of this post).
A sane settings.yml you can actually use
Save this as searxng/settings.yml and tweak to taste.
use_default_settings: true

general:
  instance_name: "SearXNG"
  enable_metrics: false  # keep your stats local unless you need them
  privacypolicy_url: "https://search.example.com/privacy"

server:
  base_url: "https://search.example.com/"
  method: "POST"  # prefer POST for queries
  secret_key: "${SEARXNG_SECRET}"  # the SEARXNG_SECRET env var from Compose takes precedence
  public_instance: false
  image_proxy: true  # proxy image thumbnails through your server
  limiter: true  # enable SearXNG's built-in limiter

valkey:
  url: "redis://valkey:6379/0"  # Valkey is Redis-compatible

outgoing:
  request_timeout: 2.0
  max_request_timeout: 10.0
  useragent_suffix: "admin@search.example.com"  # gives upstreams a contact
  pool_connections: 100
  pool_maxsize: 10
  enable_http2: true
  # If you want to chain via Tor: uncomment below and run a local Tor SOCKS proxy
  # proxies:
  #   all://:
  #     - socks5h://tor:9050
  # using_tor_proxy: false

# Minimal, high-signal engines per tab. Start lean; add later.
engines:
  - name: brave
    engine: brave
    categories: [general, images, news, videos]
    # if you have a Brave Search API key, add:
    # api_key: "${BRAVE_API_KEY}"
  - name: wikipedia
    engine: wikipedia
    categories: [general]
  - name: wikidata
    engine: wikidata
    categories: [general]
  - name: stackoverflow
    engine: stackoverflow
    categories: [it]
  - name: github
    engine: github
    categories: [it]
  - name: arxiv
    engine: arxiv
    categories: [science]
  - name: pubmed
    engine: pubmed
    categories: [science]
  - name: mojeek
    engine: mojeek
    categories: [general]

categories_as_tabs:
  general:
  images:
  videos:
  news:
  it:
  science:

plugins:
  - searx.plugins.hostnames

hostnames:
  high_priority:
    - '(.*\.)?wikipedia\.org$'
    - '(.*\.)?stackexchange\.com$'
    - '(.*\.)?arxiv\.org$'
  low_priority:
    - '(.*\.)?pinterest\..*$'  # example of sites you might want ranked lower
Notes
- Keep engines small at first. Fewer, better engines usually beat “enable everything.”
- image_proxy: true stops your browser from ever touching third‑party CDNs for thumbnails.
- limiter: true needs Valkey, which we already added in Compose.
- A useragent_suffix with a contact can reduce blocks.
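One habit that saves time: lint settings.yml before restarting, because a single mis‑indented line stops SearXNG from starting. A minimal check, assuming Python with PyYAML is available on the host:

python3 -c "import yaml; yaml.safe_load(open('searxng/settings.yml')); print('OK')"
docker compose restart searxng
docker compose logs --tail 50 searxng   # engine and settings errors show up here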
Privacy levers that actually matter
- POST queries + no logs: use server.method: POST and disable reverse‑proxy access logs if you don’t need them (see the snippet after this list). If you deploy with uWSGI or Granian natively, disable app logs there too.
- Image proxy on: thumbnails and favicons go through your box, not trackers.
- Private instance: keep it off the public internet unless you intend to serve others. If you must expose it, pair the Limiter with reasonable per‑IP budgets and maybe require auth at the proxy.
- Outgoing hygiene: short timeouts, minimal concurrency, and a clear useragent_suffix decrease the chance of triggering CAPTCHAs.
- Optional Tor/proxies: if you need extra anonymity or geodiversity, wire Tor or proxies into outgoing. Expect slower results; that’s normal.
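On the “no logs” lever: Caddy keeps access logs off unless you explicitly enable them, but Nginx logs every request by default. If you use the Nginx variant at the end of this post, two directives inside the server block silence it:

# inside server { } for search.example.com
access_log off;
error_log /var/log/nginx/error.log crit;   # keep only serious errors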
The Limiter: fewer CAPTCHAs, less pain
SearXNG’s Limiter watches client behaviour and rate‑limits abusive IPs. With Valkey backing it, you get sliding windows and dynamic lists. In practice this means upstream engines see steady, human‑like traffic from you rather than spikes.
If you run a private instance for just you or your household, the defaults are fine: the built‑in IP limits (roughly 15 requests per 20‑second burst, 150 per 10 minutes) already match normal human use. If you expose it more widely, add an explicit policy file:

# searxng/limiter.toml (optional)
[botdetection.ip_limit]
# also require the Valkey-backed link token; hardens the instance against bots
link_token = true

[botdetection.ip_lists]
# networks that bypass the limiter entirely, e.g. your LAN
pass_ip = ["192.168.0.0/16"]

Restart SearXNG after adding or changing limiter policies.
Result quality: making SearXNG feel “first‑try right”
- Curate per‑tab engines.
  - General: Brave, Mojeek, Wikipedia/Wikidata.
  - Developer: GitHub, Stack Overflow, MDN.
  - Science: arXiv, PubMed, Crossref.
  - Images/News: keep a maximum of two upstreams each for speed.
- Bias trusted domains. The hostnames plugin lets you boost or demote hosts. I always prioritise Wikipedia for concept lookups and arXiv for papers.
- Use SearXNG’s search syntax. Punch !github flask to query just GitHub, or !!foo to auto‑open the first result. Language filters like :ro or :en help if you live bilingual like I do. (A few concrete queries follow this list.)
- API keys for paid upstreams. Some engines work best with an API key (e.g. Brave). If you can budget it, paid quotas tend to be faster and less ban‑happy.
- Tune timeouts. If pages feel slow, nudge outgoing.request_timeout to 3–4 seconds and keep max_request_timeout tight at ~8–10 s.
- Avoid being noisy. Enabling every engine invites duplicate and flaky results. Start minimal; add only when you feel a consistent gap.
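To make the syntax bullet concrete, a few example queries; the bang targets only work for engines and categories you’ve actually enabled:

!github flask sqlalchemy    # query just the GitHub engine
!images aurora borealis     # a bang can also jump straight to a category tab
rust borrow checker :en     # language filter appended to a normal query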
Browser integration
SearXNG advertises an OpenSearch descriptor, so most browsers can set it as default automatically. If your browser ignores autodiscovery, you can add a custom engine with a URL like:
https://search.example.com/search?q=%s
On mobile, I’ve had the best luck with Firefox and iOS browsers that support custom search engines.
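You can confirm the descriptor is being served, and see the exact template URL your browser will use, with a quick request:

curl -s https://search.example.com/opensearch.xml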
Optional: Tor, proxies, and geodiversity
If you run Tor locally, point SearXNG at it by uncommenting the SOCKS proxy under outgoing. For Docker, add a tor service and refer to it as socks5h://tor:9050. Add a small extra_proxy_timeout to keep the UX decent; a sketch follows.
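A minimal sketch of the Docker side; the image name is a placeholder (pick and pin a Tor image you trust), and the settings values mirror the commented block in settings.yml above:

# docker-compose.yml — add under services:
tor:
  image: example/tor-socks:latest   # placeholder, substitute a Tor image you trust
  restart: unless-stopped
  networks:
    - searxnet

# searxng/settings.yml
outgoing:
  proxies:
    all://:
      - socks5h://tor:9050
  extra_proxy_timeout: 10.0   # extra seconds allowed when going through the proxy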
Reality check: Tor will slow you down and some engines will still challenge you. Use it when you need it, not by default.
Troubleshooting playbook
- “Everything times out.” Increase outgoing.request_timeout from 2.0 to 4.0 temporarily and check which engines lag. Disable the worst offenders.
- CAPTCHAs from a single upstream. Open that engine in a normal browser from the same server IP and solve one challenge. It often unblocks you for a while. Also ensure useragent_suffix has a contact.
- Images don’t load. If you enabled image_proxy, make sure your reverse proxy allows large responses and isn’t stripping headers.
- Public instance gets hammered. Raise limiter thresholds carefully, and consider IP allow‑listing or basic auth in Caddy/Nginx (a Caddy example follows this list).
- Blank page on first run. Check that SEARXNG_SECRET was actually injected and that settings.yml was mounted to /etc/searxng.
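For the basic‑auth option, Caddy makes it one extra block inside the site definition from the Quickstart. Note the directive is spelled basicauth on Caddy releases before 2.8 and basic_auth from 2.8 on; generate the hash with caddy hash-password:

search.example.com {
    # ...existing encode/tls/header lines...
    basic_auth {
        # replace the hash with the output of: caddy hash-password
        alice $2a$14$REPLACE_WITH_YOUR_OWN_HASH
    }
    reverse_proxy searxng:8080
}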
Reducing dependence on any one search
Here’s the workflow that’s stuck for me:
- Default everything through SearXNG. It’s the muscle memory breaker. I route browser, terminal helpers, and even my launchers to my instance.
- Use niche independent crawlers on demand. When I’m exploring non‑commercial corners of the web or older technical posts, I’ll fire Marginalia or Wiby. For mainstream queries where an independent index helps, Mojeek adds variety.
- Lean on vertical engines. For code and docs, GitHub/MDN/Stack Overflow engines beat general web. For papers, arXiv and PubMed are faster than trawling generic news.
- Keep one paid upstream in reserve. If your work depends on solid web coverage, a small paid plan on a reputable engine via API stabilises things. You don’t have to love it; it’s an escape hatch.
None of this is absolutist. The goal is to make “use the default megasite” a choice, not a reflex.
Updating and maintaining
- Upgrades: git pull && docker compose pull && docker compose up -d
- Backups: version your settings.yml and any limiter.toml in a private repo.
- Metrics: if you enable /stats, remember it’s anonymous counts. Don’t expose internals.
Personal notes from the homelab
I run SearXNG on a small Proxmox VM behind Caddy. Valkey has been rock‑solid as the Limiter backend. The two quality boosts that mattered most were aggressively trimming engines and boosting Wikipedia/Stack Exchange via the hostnames plugin. Tor is there as a toggle, but I use it sparingly because it drags. On the browser side, setting SearXNG as the system default finally broke my autopilot.
If you’re building this for a team, spend time on the limiter and on writing a tiny privacy policy page. People appreciate clarity.
Here's a minimal Nginx reverse proxy (if you don’t want Caddy)
# Assumes Nginx runs on the host and SearXNG's port is published locally,
# e.g. ports: ["127.0.0.1:8080:8080"] on the searxng service.
server {
    listen 80;
    server_name search.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name search.example.com;

    ssl_certificate     /etc/letsencrypt/live/search.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/search.example.com/privkey.pem;

    add_header Referrer-Policy "no-referrer" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Download-Options "noopen" always;
    add_header X-Robots-Tag "noindex, nofollow" always;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
Final take
Self‑hosting search won’t fix the entire internet, but it puts you back in charge of how you query it. SearXNG is a sweet spot: pragmatic, fast enough, private by default, and genuinely customisable. Start small, add only what you need, and let your defaults work for you, not the other way round.