Guide · HTTP probe

How to monitor a REST API the right way

curl -I https://api.example.com/health returns 200 OK and your monitor stays green for hours while the database is unreachable, the checkout endpoint times out, and the build that went out last night still says it's the previous version. Real REST API monitoring assumes the status code is the easiest thing to fake — and asserts on everything else too.

Published 2026-05-22 · ~11 min read · StatusPulse Team

Why status-code-only checks miss half the failures

The default uptime monitor — anybody's uptime monitor — does one thing: it sends a GET, reads the response code, and calls anything starting with a 2 healthy. That covers the case where the server is hard down. It also covers roughly half the real outages that page people in the middle of the night.

The other half look like this:

200 OK with broken JSON. A serializer change shipped, your API now returns {"status:"ok"} (missing quote) on the health endpoint. Status code is fine. Every consumer that tries to parse the body explodes.
200 OK in degraded mode. The health endpoint catches the database exception, logs it, and returns {"status":"degraded","db":"down"} with a 200 so the load balancer keeps the pod in rotation. Customers see 503s on every other endpoint. Your monitor doesn't.
Stale deploy. The rollout claimed to succeed, but one replica is still serving last week's build. Half your requests succeed, half fail with "unknown field". The health endpoint, served by both replicas, returns 200 either way.
Slow drift. p50 latency has crept from 80 ms to 1.4 s over a week because an index is missing. Nothing is "down". Customers churn anyway.
Partial outage. /health returns 200 because it doesn't touch the database. /api/orders times out because it does. The status-code-only probe will cheerfully report "all green" through the entire incident.

All five of these are everyday failures. None of them trip a status-code check. Treat the response code as the floor of what you assert on, not the ceiling.

The five layers of an HTTP check

Every HTTP request to your API moves through five layers before the response code even exists. Any one of them can fail independently, and the failure mode tells you what to look at.

DNS resolves. api.example.com turns into an IP. A misconfigured DNS record, an expired domain, a Route 53 outage, or a typo in a CNAME all stop the request before a single byte hits the wire.
TCP connects. The probe opens a socket to that IP on port 443. Firewalls, load-balancer health failures, kernel SYN_RECV queue overflows, and the classic "security group forgot to allow the new region" all live here.
TLS handshakes. Certificate validity, hostname match, chain trust, protocol version, cipher overlap. A cert that expired at midnight UTC takes down every probe worldwide at the same instant.
HTTP responds. The server returns a status line and headers. Anything in 5xx is a server failure; anything in 4xx is usually a client / contract failure; 408 and 504 are timeouts in disguise.
Body asserts. The body actually contains what you expect. This is the only layer that catches semantic failures — the 200 OK that lies.

A useful probe distinguishes between them. If your monitor just says "Down", you have to dig. If it says "TLS handshake failed: certificate expired 2 hours ago", you know exactly which engineer to wake up. StatusPulse stores the failure category per check so you can chart "DNS failures this week" separately from "5xx spikes this week" — they're usually different problems.
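One way to get that per-layer category out of a home-grown probe is to map the exception or response you got onto a layer name. The sketch below is an illustration under assumptions, not how StatusPulse does it internally: the function names and the category strings are made up for this example, and it relies only on the exception types in Python's standard library.

```python
import socket
import ssl


def classify_failure(exc: BaseException) -> str:
    """Map a connection-time exception to a layer name (hypothetical labels)."""
    # Order matters: gaierror and SSLError are both subclasses of OSError.
    if isinstance(exc, socket.gaierror):
        return "dns"  # name resolution failed
    if isinstance(exc, ssl.SSLError):
        return "tls"  # includes SSLCertVerificationError (expired cert, bad chain)
    if isinstance(exc, OSError):
        return "tcp"  # refused, reset, unreachable, timed out
    return "unknown"


def classify_response(status: int, body_ok: bool) -> str:
    """Classify a probe once an HTTP response has actually arrived."""
    if status >= 400:
        return "http"
    if not body_ok:
        return "body"  # the 200 OK that lies
    return "ok"
```

A timeout raised in the middle of the TLS handshake would be labelled "tcp" by this mapping, so a probe that tracks which step it was in when the error occurred will be more precise than one that looks only at the exception type.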

What to assert on the body

The body assertion is where status-code-only monitoring stops and real REST API monitoring starts. Four patterns cover most production needs.

JSON value matching

The default and most-used pattern: your health endpoint returns a small JSON document, and you assert on the value of a specific field.

$ curl -s https://api.example.com/health
{"status":"ok","db":"ok","cache":"ok","build":"v1.42.0"}

Assert that the body contains the literal substring "status":"ok". When the API switches to "status":"degraded" on a database failure — which is the right thing for the API to do, by the way, so the load balancer can keep serving cached reads — your probe flips Down. A status-code-only probe would not.
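A literal substring match is sensitive to whitespace: a serializer that starts emitting "status": "ok" with a space would fail it. If you run your own probe script, parsing the body and comparing the field is a sturdier variant, and it catches the broken-JSON case for free. A minimal sketch, assuming the health document is a flat JSON object (check_json_field is a name invented for this example):

```python
import json


def check_json_field(body: str, field: str, expected: str) -> tuple[bool, str]:
    """Return (passed, reason) for a single-field assertion on a JSON body."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError as exc:
        # Covers the 200 OK with an unparseable body.
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(doc, dict):
        return False, "body is not a JSON object"
    actual = doc.get(field)
    if actual != expected:
        return False, f"{field} is {actual!r}, expected {expected!r}"
    return True, "ok"
```

The reason string is there so the alert can say what was wrong with the body rather than just "assertion failed".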

Build hash for deploy verification

Embed the build hash in the health response and assert on it after a deploy. The classic use case: you cut a release, want to confirm every region is actually serving the new code, and don't want to manually curl seventeen edges.

$ curl -s https://api.example.com/health | jq .build
"sha-7c3a9f1"

Add the new SHA to the body assertion. Any region still serving the old build trips Down within one probe interval. Reverse the pattern for a rollback: assert on the previous hash until the rollback is confirmed everywhere.
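If you collect the health body from each region yourself, the same check is a few lines of code. This is a sketch under assumptions: the region names are placeholders, and the health document is assumed to carry the hash in a top-level build field, as in the example above.

```python
import json


def stale_regions(bodies_by_region: dict[str, str], expected_build: str) -> list[str]:
    """Return the regions whose health body does not report the expected build."""
    stale = []
    for region, body in sorted(bodies_by_region.items()):
        try:
            build = json.loads(body).get("build")
        except (json.JSONDecodeError, AttributeError):
            # Unparseable or non-object bodies count as "not the new build".
            build = None
        if build != expected_build:
            stale.append(region)
    return stale
```

For a rollback, pass the previous hash as expected_build and wait for the returned list to become empty.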

Version string for rollback detection

Same idea, lower resolution. "version":"1.42.0" in the body, assertion on the substring, alert fires the moment a replica drops back to 1.41.x. Useful when you want human-readable versions rather than commit SHAs on your status page.

Regex for warning markers

Some services don't switch status:ok to status:degraded — they add a warnings array. Treat the presence of any element in that array as a Degraded signal even if the overall status is still ok:

{"status":"ok","warnings":["queue_lag_high"]}

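A negative assertion catches the moment that array becomes non-empty, provided it targets a non-empty array rather than the bare key: a plain "body must not contain warnings" check would also fail on "warnings":[]. A regex such as "warnings"\s*:\s*\[\s*[^\]\s] matches only when the array has at least one element. The probe stays green when warnings clear and flips to Degraded the moment they appear.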
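To see why the pattern has to look inside the array, here is a small sketch of a warnings check that tolerates an empty array. The pattern and the function name are illustrative; adapt the regex syntax to whatever your probe accepts.

```python
import re

# Matches "warnings": [ followed by anything other than whitespace or the closing bracket,
# i.e. an array with at least one element.
NONEMPTY_WARNINGS = re.compile(r'"warnings"\s*:\s*\[\s*[^\]\s]')


def has_warnings(body: str) -> bool:
    """True when the body carries a non-empty warnings array."""
    return NONEMPTY_WARNINGS.search(body) is not None
```

Used as a negative assertion, the check passes when has_warnings is False, whether the key is absent or the array is empty.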

Whatever you assert on, pick a stable marker. Don't assert on timestamps, session IDs, CSRF tokens, or anything that varies per request. The HTTP probe documentation covers the body-size limit (64 KB) and the case-sensitivity rule — if your marker lives past 64 KB into a response, the probe never sees it.
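The effect of a scan budget is easy to reproduce locally. The sketch below assumes the limit is counted in bytes from the start of the body and that matching is exact and case-sensitive; both are assumptions for illustration, so check the probe documentation for the actual rules.

```python
SCAN_LIMIT = 64 * 1024  # assumed: bytes, counted from the start of the body


def marker_in_scan_window(body: bytes, marker: bytes, limit: int = SCAN_LIMIT) -> bool:
    """True if the marker appears entirely within the first `limit` bytes."""
    return marker in body[:limit]
```

If a marker only passes with a larger limit, move it to the top of the response or probe a smaller endpoint.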

Headers matter, both ways

Headers are where API monitoring gets opinionated. Four header categories deserve attention.

Auth tokens for protected endpoints

Plenty of teams expose /health publicly. Plenty of others — for very good reasons — don't. If your health endpoint is behind a bearer token, the probe needs to send it:

$ curl -H "Authorization: Bearer eyJhbGciOi…" \
       https://api.example.com/health

Use a long-lived, narrowly-scoped probe credential rather than a user JWT — the probe will replay this header every five minutes forever, and you don't want it tied to a person who might leave the company. Store the value in the probe's encrypted-headers field, never in the URL.

X-Probe for upstream identification

Send a constant header identifying the request as a monitor, not a real client:

X-Probe: statuspulse
X-Request-Source: monitoring

Your APM can then filter probe traffic out of latency percentiles. Your access logs let you grep for probe-only failures. Rate limiters can exempt the probe so a transient 429 doesn't page on-call. A one-line header is the cheapest distinguishing signal you will ever ship.

Correlation IDs for log search

Send a header like X-Probe-Run: $RUN_ID so the server can echo it into its logs. When a check fails, the probe's error message contains the ID; pasting it into your log search jumps straight to the request that broke. This is the difference between "the monitor went red at 03:14" and "the monitor went red at 03:14 because of this request".

Content negotiation

Send Accept: application/json explicitly if you assert on JSON. A server that defaults to HTML when no Accept is supplied (older Rails, default Spring) will happily return <!doctype html> to a probe that forgot to ask for JSON, and your "status":"ok" assertion will always fail, in a confusing way.
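Put together, the header set for a scripted probe might look like the sketch below. The header names come from this section; the function name and the choice of a random UUID as the run ID are assumptions for illustration. The token is passed in rather than hard-coded, so it can come from whatever secret store you use.

```python
import uuid
from typing import Optional


def build_probe_headers(token: Optional[str] = None,
                        run_id: Optional[str] = None) -> dict[str, str]:
    """Assemble the request headers for one probe run."""
    headers = {
        "Accept": "application/json",        # ask for JSON explicitly
        "X-Probe": "statuspulse",            # identify the request as a monitor
        "X-Request-Source": "monitoring",
        "X-Probe-Run": run_id or uuid.uuid4().hex,  # correlation ID for log search
    }
    if token:
        # Probe credential; never put this in the URL.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Log the X-Probe-Run value alongside every failure so the ID in the alert and the ID in the server logs are the same string.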

Pick the right method

Almost every probe should be a GET. The exceptions are interesting enough to know about.

GET for idempotent health. The default. Cheap, cacheable (which you usually want to disable — see pitfalls), safe to call from anywhere.
POST to exercise the write path. A read-only health check confirms the API is up; it doesn't confirm writes work. A POST /probe endpoint that runs a one-row insert into a probe table, then deletes it, will catch the read-only-replica-promoted-as-primary case a GET will not. Pair it with a body assertion on the inserted row's ID echo.
HEAD for cheap availability. When the body is large (download endpoints, file redirectors), HEAD gets you the status, the content-length, and the headers without the bandwidth. Useful for monitoring static asset reachability without downloading a gigabyte every five minutes.
OPTIONS for CORS sanity. If your frontend talks to api.example.com from app.example.com, a CORS preflight failure breaks the product even though direct curl works fine. An OPTIONS probe that sends Origin: https://app.example.com together with Access-Control-Request-Method, and asserts on the Access-Control-Allow-Origin header of the response, catches that.
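For the OPTIONS case, the assertion is on a response header rather than the body. A minimal sketch of both halves, assuming you already have the response headers as a dictionary (the function names are invented for this example):

```python
def preflight_request_headers(origin: str, method: str = "GET") -> dict[str, str]:
    """Headers a browser would send on a CORS preflight for `method`."""
    return {"Origin": origin, "Access-Control-Request-Method": method}


def preflight_allows(response_headers: dict[str, str], origin: str) -> bool:
    """True if the preflight response allows requests from `origin`."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v.strip() for k, v in response_headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    # "*" is accepted here; browsers reject it for credentialed requests.
    return allowed in (origin, "*")
```

If your frontend sends cookies or an Authorization header, tighten the check to require the exact origin, since the wildcard does not satisfy browsers for credentialed requests.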

For services that don't speak HTTP at all — gRPC backends in particular — an HTTP /healthz in front of the gRPC process lies systematically: the HTTP layer can be healthy while the gRPC service is rejecting RPCs. Use the gRPC Health probe guide instead. HTTP probes are for HTTP services, full stop.

Latency thresholds — not just up/down

Binary up/down monitoring misses the most common production failure mode: things slowing down. A REST API that took 120 ms last week and 1.8 s this week is in trouble, even if every response is technically 200 OK.

Three habits worth adopting.

Use a Degraded state, not just Down. StatusPulse fires Degraded when the response is fine but the latency exceeded your threshold. Treat Degraded as "look at this, don't wake anyone yet" and Down as "page someone". A 15-minute Degraded streak that doesn't recover is almost always the start of an incident.
Tolerate cold-start spikes. Serverless APIs and autoscaled fleets will hit cold paths occasionally. A single 4-second response after 10 minutes of idle isn't an outage. Use a consecutive-checks rule (StatusPulse's default: 2 consecutive bad checks before flipping state) so one cold start doesn't page anyone.
Watch p50 and p99 separately. p50 climbing from 80 ms to 200 ms means everyone is slower. p99 climbing from 400 ms to 4 s with stable p50 means a fraction of requests hit something pathological: a missing index on a rare code path, a noisy neighbour, or tail latency from a downstream dependency. Both matter. Both deserve their own threshold.
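The first two habits combine into a small state machine: classify each check as up, degraded, or down, and only change the published state after the same new result has been seen a set number of times in a row. The sketch below is an illustration, not StatusPulse's implementation; it assumes the consecutive-checks rule applies to recovery as well as to failure, and the latency threshold is whatever you configure.

```python
class CheckState:
    """Flip state only after `required` consecutive observations of a new state."""

    def __init__(self, required: int = 2) -> None:
        self.state = "up"
        self.required = required
        self._candidate = None  # state we might switch to
        self._streak = 0        # consecutive observations of the candidate

    def observe(self, result: str) -> str:
        if result == self.state:
            # Back to the current state: any pending flip is cancelled.
            self._candidate, self._streak = None, 0
        elif result == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = result, 1
        if self._candidate is not None and self._streak >= self.required:
            self.state = self._candidate
            self._candidate, self._streak = None, 0
        return self.state


def classify_check(status_ok: bool, latency_ms: float, degraded_after_ms: float) -> str:
    """Turn one probe result into "up", "degraded", or "down"."""
    if not status_ok:
        return "down"
    return "degraded" if latency_ms > degraded_after_ms else "up"
```

With required=2, a single slow cold start between two normal checks never changes the published state.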

A 1-minute probe interval gives you 1,440 samples per day per region, which leaves about 14 samples above the p99 line in a 24-hour window: enough for a reasonably stable p99. With longer intervals or shorter windows, percentiles get noisy.
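If you export raw latencies and want to compute the percentiles yourself, the nearest-rank method is the simplest to reason about. This is a sketch of that one method; monitoring tools often interpolate instead, so expect small differences from what a dashboard shows.

```python
import math


def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

With 98 samples at 80 ms and 2 at 4,000 ms, this reports a p50 of 80 and a p99 of 4,000: a stable median hiding a bad tail, which is why the two need separate thresholds.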

Common pitfalls that bite in production

Five mistakes that cost teams real incidents. Read them once and you'll avoid most of the embarrassment.

Probing through a CDN cache

If api.example.com sits behind Cloudflare or CloudFront and the health endpoint isn't explicitly excluded from caching, your probe is monitoring the CDN cache, not the origin. The origin can be on fire for 30 minutes and the cached 200 OK {"status":"ok"} keeps your monitor green until the TTL expires.

Two fixes. Either send Cache-Control: no-cache on the probe request and configure the CDN to respect it; or expose a separate /health/origin path with a CDN rule that bypasses the cache for that exact URL. The probe's job is to tell you about the origin. Don't let a middle layer silence it.
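You can often tell from the response headers whether a probe hit the cache. The sketch below checks three signals: the standard Age header, Cloudflare's CF-Cache-Status, and the X-Cache header CloudFront adds. It only treats an explicit hit as cached; both CDNs report other values as well, so treat this as a starting point rather than a complete classifier.

```python
def served_from_cache(response_headers: dict[str, str]) -> bool:
    """Best-effort guess at whether a CDN answered from its cache."""
    h = {k.lower(): v.strip().lower() for k, v in response_headers.items()}
    if h.get("cf-cache-status") == "hit":
        return True
    if h.get("x-cache", "").startswith("hit"):  # e.g. "Hit from cloudfront"
        return True
    # A non-zero Age means some cache held the response for that many seconds.
    age = h.get("age", "0")
    return age.isdigit() and int(age) > 0
```

A health probe that sees True here is a sign the cache-bypass rule is not working.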

Health endpoints that don't touch the database

A startling number of production /healthz endpoints return {"status":"ok"} from a hard-coded handler that doesn't read or write anything. The pod responds, so Kubernetes leaves it in rotation. The database is unreachable, so every real endpoint 500s. The monitor stays green.

A healthcheck should touch every dependency that, if broken, breaks the product. At minimum: a SELECT 1 against the primary database. Adding the same query against the cache, the queue, and any downstream API is worth the extra few milliseconds. The Postgres monitoring guide covers what to assert on the database side once you've got that wired up.
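The shape of such a handler is the same in any framework: run one cheap check per dependency and report each result. The sketch below uses sqlite3 purely as a stand-in so the example runs anywhere; swap in your real database, cache, and queue clients. The function names and the report format are assumptions modelled on the health bodies shown earlier.

```python
import sqlite3
from typing import Callable


def check_database(conn: sqlite3.Connection) -> None:
    """Raise if the connection cannot run a trivial query."""
    conn.execute("SELECT 1").fetchone()


def health(checks: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run every dependency check and build the health document."""
    report = {}
    for name, check in checks.items():
        try:
            check()
            report[name] = "ok"
        except Exception:
            # Any failure marks the dependency down; log details server-side.
            report[name] = "down"
    healthy = all(result == "ok" for result in report.values())
    report["status"] = "ok" if healthy else "degraded"
    return report
```

Give each check its own short timeout in a real handler, so one hung dependency cannot make the health endpoint itself time out.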

Not following redirects when you should

You probe http://example.com. The server redirects to https://example.com. Your probe records a 301, calls it healthy, and never notices when the HTTPS endpoint itself starts returning 500. Either probe the final HTTPS URL directly, or enable follow-redirects so the probe asserts on the actual destination. Don't measure the redirect step.

The mirror failure: probing the HTTPS URL but having a body assertion that matches the redirect page's body. The redirect keeps working forever, the assertion keeps passing forever, and you never notice the real endpoint is broken. Always assert on content from the page you actually care about.

Rate-limit collisions at high frequency

A 30-second probe interval across 4 regions is 11,520 hits per day. An API with a 10,000-request-per-day per-client rate limit will start returning 429s before the day is over; your monitor flips Down, and the on-call engineer wakes up to discover the outage is the probe itself.

Either raise the API's rate limit for the probe's source IPs, exempt the X-Probe header from the limiter entirely, or lengthen the probe interval to a sustainable number. Don't argue with the rate limiter at 03:00.
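The arithmetic is worth doing before you pick an interval. A small sketch, assuming every region counts against the same daily limit (the function names are invented for this example):

```python
import math

SECONDS_PER_DAY = 86_400


def daily_hits(interval_s: int, regions: int) -> int:
    """Probe requests per day for a given interval and region count."""
    return (SECONDS_PER_DAY // interval_s) * regions


def min_interval_s(daily_limit: int, regions: int) -> int:
    """Shortest whole-second interval that stays within the daily limit."""
    return math.ceil(SECONDS_PER_DAY * regions / daily_limit)
```

For the example above, daily_hits(30, 4) is 11,520 and min_interval_s(10_000, 4) is 35 seconds, which uses the entire budget; leave headroom for retries and for anything else sharing the limit.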

Probing the wrong layer when the user-visible failure is elsewhere

/health on the API is necessary but not sufficient. Add probes for the workflows users actually care about — login, checkout, the search endpoint. A single probe on /api/orders with a body assertion on the expected shape will tell you about the partial outages a /health probe is structurally blind to. StatusPulse's free plan includes five probes — that's enough to cover health, login, and three critical read endpoints on the same status page.

Wrap-up

Four things to take away:

Status code 200 is the easiest signal to fake, accidentally or otherwise. Assert on the body. Pick a stable marker — "status":"ok", a build hash, a version string, a warnings-absent regex — and let the body do the talking.
Every HTTP check has five layers: DNS, TCP, TLS, HTTP, body. Failure looks different at each one. A probe that distinguishes between them saves you the first ten minutes of every incident.
Pick the right method. GET for most things, POST when you want to exercise the write path, OPTIONS for CORS, HEAD for big payloads. Send headers that identify the probe and let your logs filter it out.
Don't probe through a cache. Don't probe a health endpoint that doesn't touch the database. Don't argue with the rate limiter. The pitfalls list is short, and every item on it has cost someone real downtime.

StatusPulse's HTTP probe ships with all of this baked in: five-layer failure categorisation, optional body assertions with 64 KB scan budget, custom headers stored encrypted at rest, Degraded latency thresholds, multi-region quorum, and a public status page in front of it. Free tier covers five probes, which is enough to monitor a small REST API the right way before you spend a dollar.

Try StatusPulse's HTTP probe

5 probes, 1 status page, forever. No credit card. US or EU host — you choose.