Stats as of 2026-04-29 06:11 UTC — last scan: 2026-04-29

3 scan batches run

3,140 of 3,863 available pages scanned (81.3% coverage) 2,343 of 3,140 scanned pages were reachable (74.6%) 1,532 reachable pages loaded at least one third-party script (65.4% of reachable) 2,942 known third-party service loads identified 24 unique known services across 17 categories


Third-Party JavaScript by Country

Country Scanned Available Reachable URLs with 3rd-Party JS Known Service Loads Last Scan
Usa Edu Master 3,140 3,763 2,343 1,532 2,942 2026-04-29

Hover or focus any non-zero country-table count to preview matching pages. Activate the number to keep the preview open and download a CSV for that country and metric from Download machine-readable third-party tools data (JSON).


Top Third-Party Services

# Service Loads
1 cdnjs (Cloudflare CDN) 531
2 Google Analytics (GA4) 519
3 jsDelivr CDN 392
4 Google Tag Manager 389
5 jQuery 269
6 Font Awesome 203
7 Google Hosted Libraries 168
8 Google reCAPTCHA 134
9 unpkg CDN 105
10 Bootstrap 65
11 HubSpot 35
12 Sentry 30
13 Adobe Dynamic Tag Management / Launch 24
14 OneTrust 22
15 Facebook Pixel 17
16 Cookiebot 12
17 Cloudflare Turnstile / Challenge 8
18 Stripe 6
19 Zendesk 5
20 Usercentrics 3

Top Service Categories

# Category Loads
1 CDN 1,196
2 Analytics 565
3 JavaScript Library 437
4 Tag Manager 413
5 Icon Library 203
6 Security 142
7 CAPTCHA 134
8 UI Framework 65
9 Cookie Consent 37
10 CRM 35
11 Marketing 35
12 Error Tracking 30
13 Advertising 18
14 Payments 6
15 Customer Support 5

📥 Machine-readable results: Download machine-readable third-party tools data (JSON)


Overview

This scan identifies third-party JavaScript loaded by institution websites, including analytics tags, tag managers, cookie-consent tools, CDNs, customer support widgets, and other externally hosted scripts.

The goal is to make external dependencies across the current institution dataset easier to inspect. This helps answer questions like:

  • Which analytics or advertising vendors appear most often?
  • How common are third-party CDNs and consent managers?
  • Which seed groups lean more heavily on externally hosted web tooling?

The scanner looks at every <script src="..."> on a page, excludes same-origin scripts, and then tries to match known services such as Google Tag Manager, Google Analytics, Matomo Cloud, OneTrust, Cookiebot, Cloudflare, Microsoft Clarity, HubSpot, and more.


Why This Matters

Third-party JavaScript can affect:

  • Privacy: analytics, advertising, and tracking integrations may send data to external services.
  • Security: externally hosted libraries and widgets increase supply-chain risk.
  • Resilience: a page may depend on third-party infrastructure outside the control of the institution.
  • Performance: extra scripts often increase page weight and network cost.

This page gives a dataset-wide view of those dependencies.


Usage

Scan a single seed

python3 -m src.cli.scan_third_party_js --country USA_EDU_MASTER --rate-limit 1.0

Scan all seed files

python3 -m src.cli.scan_third_party_js --all --rate-limit 1.0

Scan all seed files with a runtime cap

python3 -m src.cli.scan_third_party_js --all --max-runtime 110 --rate-limit 1.0

Command-line options

Option Default Description
--country CODE Seed code to scan (for example USA_EDU_MASTER)
--all Scan all seed files in the TOON directory
--toon-dir PATH data/toon-seeds Directory with .toon seed files
--rate-limit N 1.0 Maximum HTTP requests per second
--max-runtime N 0 (no limit) Maximum runtime in minutes for graceful CI stops

GitHub Actions

The Scan Third-Party JavaScript workflow (.github/workflows/scan-third-party-js.yml) runs automatically every 6 hours and can also be triggered manually from the Actions tab.

Artifacts uploaded after each run:

Artifact Contents
3pjs-scan-<run_number> data/metadata.db, scan output log, annotated *_3pjs.toon files
validation-metadata data/metadata.db shared with the other scanners

Output

Annotated TOON file

Each page entry in the output *_3pjs.toon file gains a third_party_js field:

{
  "url": "https://example.gov/",
  "third_party_js": [
    {
      "src": "https://www.googletagmanager.com/gtm.js?id=GTM-XXXX",
      "host": "www.googletagmanager.com",
      "service_name": "Google Tag Manager",
      "version": "GTM-XXXX",
      "categories": ["Tag Manager"]
    }
  ]
}

If scanning failed for a URL, a third_party_js_error field is added instead.

Database table

Results are stored in the url_third_party_js_results table:

Column Type Description
url TEXT Page URL
country_code TEXT Legacy field name for seed identifier
scan_id TEXT Unique scan run ID
is_reachable INTEGER 1 = page fetched successfully
scripts TEXT JSON array of third-party script records
error_message TEXT Error message if the page fetch failed
scanned_at TEXT ISO-8601 timestamp