Manual link checking works for occasional audits, but modern web development demands automation. When you're deploying multiple times per day, managing a portfolio of client sites, or maintaining a content-heavy platform, you need programmatic broken link detection. An API-based approach integrates link checking directly into your development workflow—catching broken links before they reach production, alerting you when external resources disappear, and maintaining link health without manual intervention.

TL;DR

  • TinyUtils Dead Link Finder provides a JSON API for automation
  • Integrate into CI/CD pipelines to fail builds on broken links
  • Set up scheduled monitoring for production sites
  • Parse structured JSON responses for automated remediation
  • No API key required for reasonable usage

Why Use an API for Link Checking?

Automation Over Manual Clicks

Browser-based tools require human interaction—someone has to click, wait, review, and export. API-based checking runs unattended. Set it up once, and it works while you sleep. This isn't about convenience; it's about catching problems before users report them.

CI/CD Integration

Modern deployment pipelines run automated tests before releasing code. Link checking belongs in that pipeline. A broken link introduced in a content update should block deployment just like a failed unit test. The API enables this integration—your GitHub Action, GitLab CI, or Jenkins job calls the endpoint and acts on the results.

Continuous Monitoring

External links break without warning. The resource you linked to last month might be gone today. Scheduled API calls—daily, weekly, or hourly depending on your needs—catch these changes. Alert integrations (Slack, email, PagerDuty) notify your team immediately when links fail.

Programmatic Remediation

API responses are structured data, not visual reports. Parse the JSON, identify 404s, cross-reference with your CMS, and trigger automated fixes. Some teams automatically replace broken links with archived versions from the Wayback Machine. Others open tickets in their issue tracker. The API enables whatever workflow fits your needs.
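
For instance, a minimal Python sketch of Wayback-based remediation might look like this. It assumes the result row format described later in this article and uses the public Wayback Machine availability endpoint; how you apply the suggested replacements (CMS update, pull request, or ticket) is up to you.

import requests

WAYBACK_API = "https://archive.org/wayback/available"

def find_archived_copy(url):
    """Ask the Wayback Machine for the closest archived snapshot of a URL."""
    resp = requests.get(WAYBACK_API, params={"url": url}, timeout=30)
    snapshot = resp.json().get("archived_snapshots", {}).get("closest", {})
    return snapshot.get("url") if snapshot.get("available") else None

def suggest_replacements(rows):
    """Propose archived replacements for broken links (4xx/5xx result rows)."""
    suggestions = {}
    for row in rows:
        if row["status"] and row["status"] >= 400:
            archived = find_archived_copy(row["link"])
            if archived:
                suggestions[row["link"]] = archived  # feed this into your CMS or issue tracker
    return suggestions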

The Dead Link Finder API

The TinyUtils Dead Link Finder exposes a JSON API that mirrors the web interface's functionality. Everything you can do through the browser, you can do programmatically.

Request Format

POST /api/check
Content-Type: application/json

{
  "pageUrl": "https://example.com/page-to-check",
  "scope": "domain",
  "includeAssets": false,
  "httpFallback": false,
  "robots": "respect"
}

Parameters

  • pageUrl (string, required): the URL to crawl and check links from
  • scope (string): "domain" (same domain), "same-origin" (same scheme, host, and port), or "all" (include external links)
  • includeAssets (boolean): check images, scripts, and stylesheets in addition to links
  • httpFallback (boolean): try HTTP if HTTPS fails (not recommended for HSTS sites)
  • robots (string): "respect" (honor robots.txt) or "ignore"

Response Format

{
  "ok": true,
  "rows": [
    {
      "link": "https://example.com/missing-page",
      "status": 404,
      "statusText": "Not Found",
      "finalUrl": null,
      "redirectChain": [],
      "error": null
    },
    {
      "link": "https://example.com/moved-page",
      "status": 301,
      "statusText": "Moved Permanently",
      "finalUrl": "https://example.com/new-location",
      "redirectChain": ["https://example.com/moved-page"],
      "error": null
    }
  ],
  "meta": {
    "runTimestamp": "2024-01-15T10:30:00Z",
    "mode": "domain",
    "totals": {
      "checked": 47,
      "ok": 42,
      "broken": 3,
      "redirects": 2
    },
    "requestId": "abc-123-def"
  }
}

Error Response

{
  "ok": false,
  "message": "Invalid URL format",
  "code": "INVALID_URL",
  "requestId": "abc-123-def"
}

Integration Patterns

GitHub Actions

Add link checking to your GitHub workflow. This example runs on every push to main and on a daily schedule, and fails the job if any broken links are found:

name: Link Check
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * *'  # Daily at 6am UTC

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        run: |
          RESPONSE=$(curl -s -X POST https://tinyutils.com/api/check \
            -H "Content-Type: application/json" \
            -d '{"pageUrl":"https://your-site.com","scope":"domain"}')

          BROKEN=$(echo "$RESPONSE" | jq '.meta.totals.broken')
          if [ "$BROKEN" -gt 0 ]; then
            echo "Found $BROKEN broken links!"
            echo "$RESPONSE" | jq '.rows[] | select(.status >= 400)'
            exit 1
          fi

Node.js Script

For more complex logic, use a Node.js script that can process results and take actions:

const checkLinks = async (siteUrl) => {
  const response = await fetch('https://tinyutils.com/api/check', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      pageUrl: siteUrl,
      scope: 'all',
      includeAssets: true
    })
  });

  const data = await response.json();

  if (!data.ok) {
    throw new Error(data.message);
  }

  const broken = data.rows.filter(r => r.status >= 400);
  const redirects = data.rows.filter(r => r.status >= 300 && r.status < 400);

  return { broken, redirects, meta: data.meta };
};

Python Integration

import requests

def check_site_links(url, scope="domain"):
    response = requests.post(
        "https://tinyutils.com/api/check",
        json={"pageUrl": url, "scope": scope}
    )
    data = response.json()

    if not data["ok"]:
        raise Exception(data["message"])

    broken = [r for r in data["rows"] if r["status"] and r["status"] >= 400]
    return broken, data["meta"]["totals"]
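
A quick usage sketch (the site URL is a placeholder):

broken, totals = check_site_links("https://your-site.com")
print(f"Checked {totals['checked']} links, {totals['broken']} broken")
for row in broken:
    print(f"{row['status']} {row['link']}")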

Scheduled Monitoring with Cron

Set up a cron job to check your site regularly and send alerts:

# Run daily at midnight
0 0 * * * /path/to/link-check.sh | mail -s "Link Check Report" team@company.com
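
The report script itself is whatever fits your stack. Here is a rough Python sketch of what /path/to/link-check.sh might wrap; the site list is a placeholder, and output goes to stdout so the mail pipe above still works. A non-zero exit code also makes it easy to wire into other alerting.

#!/usr/bin/env python3
"""Hypothetical daily link-check report for one or more sites."""
import sys
import requests

SITES = ["https://your-site.com", "https://blog.your-site.com"]  # placeholders

def main():
    failed = False
    for site in SITES:
        resp = requests.post(
            "https://tinyutils.com/api/check",
            json={"pageUrl": site, "scope": "domain"},
            timeout=120,
        )
        data = resp.json()
        if not data["ok"]:
            print(f"{site}: check failed: {data['message']}")
            failed = True
            continue
        broken = [r for r in data["rows"] if r["status"] and r["status"] >= 400]
        print(f"{site}: {len(broken)} broken of {data['meta']['totals']['checked']} links checked")
        for row in broken:
            print(f"  {row['status']} {row['link']}")
        failed = failed or bool(broken)
    sys.exit(1 if failed else 0)

if __name__ == "__main__":
    main()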

Understanding Status Codes

  • 200-299 (success): link is working
  • 301, 308 (permanent redirect): update to the final URL
  • 302, 307 (temporary redirect): monitor, but keep the original URL
  • 400 (bad request): check the URL format
  • 401, 403 (authorization required): the link may require login
  • 404 (not found): the resource is gone; fix or remove the link
  • 500-599 (server error): often temporary; recheck later
  • 0 or null (connection failed): DNS failure or timeout
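
As a rough sketch, here is how those ranges might be translated into suggested actions when processing result rows in Python; the mapping simply mirrors the list above.

def triage(row):
    """Map a result row to a suggested action, following the ranges above."""
    status = row["status"]
    if not status:
        return "connection failed: check DNS or retry later"
    if 200 <= status < 300:
        return "link is working"
    if status in (301, 308):
        return f"permanent redirect: update link to {row['finalUrl']}"
    if status in (302, 307):
        return "temporary redirect: keep the original URL and monitor"
    if status == 404:
        return "not found: fix or remove the link"
    if status in (401, 403):
        return "may require login"
    if status >= 500:
        return "server error: recheck later"
    return "client error: check the URL format"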

Common Use Cases

Pre-Deploy Validation

Check staging environments before promoting to production. Catch content editor mistakes, broken CMS migrations, and misconfigured redirects before users see them.

Content Migration Audits

Moving to a new CMS or redesigning your site? Run link checks before and after to ensure no links were broken in the transition. Compare results to identify regressions.
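
As a sketch, comparing the rows arrays from the two runs might look like this in Python:

def find_regressions(before_rows, after_rows):
    """Links that are broken after the migration but were fine before it."""
    def broken(rows):
        return {r["link"] for r in rows if r["status"] and r["status"] >= 400}
    return sorted(broken(after_rows) - broken(before_rows))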

SEO Monitoring

Broken outbound links can hurt your search rankings. Monitor external links regularly—especially to high-value resources you cite frequently. When external sites restructure, you'll know immediately.

Client Site Management

Agencies managing multiple client sites need automated monitoring. Set up scheduled checks for each client, aggregate results into dashboards, and demonstrate proactive maintenance in client reports.

Documentation Freshness

Technical documentation links to APIs, libraries, and external resources that change frequently. Regular link checks ensure your docs stay accurate and useful.

Best Practices

Rate Limiting

The API includes built-in concurrency limits to respect target sites. For checking multiple pages, stagger your requests rather than firing them all simultaneously. A 1-2 second delay between calls is courteous.
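
For example, a staggered sweep over several pages might look like this Python sketch (the page list and two-second pause are illustrative):

import time
import requests

pages = [
    "https://your-site.com/",
    "https://your-site.com/about",
    "https://your-site.com/blog",
]

results = []
for page in pages:
    resp = requests.post(
        "https://tinyutils.com/api/check",
        json={"pageUrl": page, "scope": "domain"},
        timeout=120,
    )
    results.append(resp.json())
    time.sleep(2)  # courteous pause between calls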

Error Handling

Always check the ok field before processing results. Handle timeouts gracefully—some servers respond slowly. Implement retry logic with exponential backoff for 5xx errors.
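
A minimal Python sketch of retry with exponential backoff; the attempt count, delays, and timeout are illustrative.

import time
import requests

def check_with_retry(page_url, attempts=4):
    """Retry on 5xx responses or timeouts, doubling the wait each time."""
    delay = 2
    for _ in range(attempts):
        try:
            resp = requests.post(
                "https://tinyutils.com/api/check",
                json={"pageUrl": page_url, "scope": "domain"},
                timeout=60,
            )
            if resp.status_code < 500:
                return resp.json()  # success, or a 4xx error body worth inspecting
        except requests.Timeout:
            pass  # treat timeouts as transient
        time.sleep(delay)
        delay *= 2
    raise RuntimeError(f"Link check failed after {attempts} attempts")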

Result Caching

Don't check the same URLs repeatedly in short periods. Cache results for at least a few hours. This reduces load on both the API and the target sites you're checking.
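
A bare-bones in-memory cache sketch in Python; the four-hour window is just an example, and a real setup might use Redis or a file-based store instead.

import time

CACHE_TTL = 4 * 60 * 60  # seconds
_cache = {}

def cached_check(page_url, run_check):
    """Return a fresh cached result if available, otherwise run the check."""
    entry = _cache.get(page_url)
    if entry and time.time() - entry["at"] < CACHE_TTL:
        return entry["result"]
    result = run_check(page_url)  # e.g. the check_with_retry helper above
    _cache[page_url] = {"at": time.time(), "result": result}
    return result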

Scope Selection

Start with "domain" scope for internal link validation. Use "all" scope when you need to verify external links too—but be aware this takes longer and includes links you can't fix directly.

Frequently Asked Questions

Are there rate limits?

The API has built-in concurrency limits to be respectful to target sites. For high-volume needs, batch your requests and add delays between calls. Excessive usage may be throttled.

Can I check multiple pages in one request?

Each API call checks links from a single page. For site-wide audits, call the API for each page you want to check. Parallelize within reason, but respect rate limits.

What about authenticated pages?

The API checks publicly accessible pages. Content behind login, paywalls, or IP restrictions won't be accessible. For authenticated content, you'll need custom tooling that can handle your authentication mechanism.

How long do requests take?

Response time depends on the number of links and the speed of the target servers. A typical page with 50 links might take 5-15 seconds. Pages with many external links to slow servers take longer.

What's the difference from the web tool?

Same functionality, different interface. The web tool is for one-off manual checks. The API is for automation. Same backend, same accuracy, different access method.

Can I use this for competitive analysis?

You can check any publicly accessible URL. However, respect robots.txt and don't hammer competitor sites with excessive requests.

Why Use an Online API?

  • No infrastructure: No servers to maintain, no binaries to update
  • Consistent behavior: Same results from any client or platform
  • External perspective: Checks from outside your network, like real users
  • Always current: Latest detection logic without client updates
  • Simple integration: Standard REST API works with any language

Ready to Automate?

Start with the interactive Dead Link Finder to understand the output format, then integrate the API into your workflow. Catch broken links automatically, fail builds on errors, and maintain link health without manual effort.

For agency-scale monitoring, see our agency broken link workflow. For recovering already-broken links, check out fixing broken links with Archive.org.