USA Higher-Education Website Scans

This project catalogs how United States higher-education institutions on .edu domains publish accessibility statements, keep their URLs reachable, and use web technologies and third-party JavaScript.

Current Scan Progress

Progress as of 2026-04-29 06:11 UTC

Scan Type | Pages Scanned | Coverage
Combined Reachability | 2,884 confirmed reachable | 74.7%
Social Media | 3,849 scanned (2,884 reachable) | 99.6%
Technology | 3,849 scanned | 99.6%
Lighthouse | 398 scanned | 10.3%
Accessibility Statements | 3,861 domains | 99.9%

Scan data: 2,884 of 3,863 available pages are confirmed reachable. See the Scan Progress Report for full details.
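The page-based coverage figures above are consistent with each count divided by the 3,863 available pages. That formula is inferred from the published numbers rather than documented here, and the `coverage` helper is a hypothetical name, so treat this as a minimal sketch for checking the arithmetic:

```python
TOTAL_PAGES = 3863  # "available pages" from the progress summary


def coverage(count: int, total: int = TOTAL_PAGES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Page counts copied from the progress table.
counts = {
    "Combined Reachability": 2884,
    "Social Media": 3849,
    "Technology": 3849,
    "Lighthouse": 398,
}

for name, n in counts.items():
    print(f"{name}: {coverage(n)}%")
```

The Accessibility Statements row is left out because it counts domains rather than pages, and the domain total is not stated in this summary.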

Latest Scan Results

  • Scan Progress Report — Overall coverage, scan status, and seed-level comparisons across the project.
  • Social Media — Institutional use of social platforms, with evidence behind the published counts.
  • Accessibility Statements — Evidence showing which pages do and do not publish accessibility statements.
  • Technology Scanning — Detected CMSs, frameworks, analytics tools, and other software found on institution websites.
  • Third-Party JavaScript — External scripts, services, and hosted dependencies loaded by scanned pages.
  • Lighthouse Scanning — Google Lighthouse methodology, workflow details, and page-level quality scores as they are collected.
  • Institution Domains — The tracked source dataset: institution domains and page URLs used as scan inputs.

What We Track

Social Media Presence

We check institution pages for links to social platforms, then classify what was found at page and seed level.

See Social Media for platform coverage, tier definitions, and downloadable evidence.
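As a rough illustration of the page-level check, the sketch below collects anchor links from a page's HTML and maps their hostnames to platform names. The `SOCIAL_HOSTS` table and the function names are assumptions made for this example; the project's actual platform list and tier logic are described on the Social Media page.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical platform table; the project's real list may differ.
SOCIAL_HOSTS = {
    "facebook.com": "Facebook",
    "instagram.com": "Instagram",
    "linkedin.com": "LinkedIn",
    "youtube.com": "YouTube",
}


class SocialLinkParser(HTMLParser):
    """Collects the social platforms linked from <a href> tags."""

    def __init__(self):
        super().__init__()
        self.platforms = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.facebook.com like facebook.com
        if host in SOCIAL_HOSTS:
            self.platforms.add(SOCIAL_HOSTS[host])


def find_social_platforms(html: str) -> set:
    """Return the set of platform names linked from one page."""
    parser = SocialLinkParser()
    parser.feed(html)
    return parser.platforms
```

Relative links such as `/about` have no hostname and are ignored, so only absolute links to known platforms are counted.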

URL Validation

We validate tracked URLs, follow redirects, and monitor persistent failures so the source dataset stays current.

See Scan Progress Report for current validation coverage and seed-level results.
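Redirect following can be sketched as a bounded loop. In this example the network call is passed in as a hypothetical `fetch(url)` callable returning `(status_code, location_header_or_None)`, so the logic can be exercised without real requests. The outcome labels and the hop limit are assumptions, not the project's actual values.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def validate(url, fetch, max_hops=5):
    """Follow redirects from url and return (final_url, outcome).

    outcome is "ok", "redirect_loop", "too_many_redirects", or
    "http_<status>" for any other final status code.
    """
    seen = [url]
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            if url in seen:
                return url, "redirect_loop"
            seen.append(url)
        elif 200 <= status < 300:
            return url, "ok"
        else:
            return url, f"http_{status}"
    return url, "too_many_redirects"


# Usage with a canned response table standing in for the network.
responses = {
    "http://example.edu/": (301, "https://www.example.edu/"),
    "https://www.example.edu/": (200, None),
}
print(validate("http://example.edu/", responses.__getitem__))
```

A URL that repeatedly ends in an `http_<status>` or loop outcome across runs would be a candidate for the persistent-failure list.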

Technology Detection

We detect the CMS, framework, analytics, hosting, and other technologies used by institution websites.

See Technology Scanning for the detected technologies and seed-level tables.
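One common way to detect technologies is to match known markers in a page's HTML. The sketch below uses a small, hypothetical signature table built from widely known markers (for example, the `/wp-content/` path used by WordPress). It illustrates the general approach and is not the detector this project runs.

```python
import re

# Hypothetical signature table; a real detector uses far more rules.
SIGNATURES = {
    "WordPress": re.compile(
        r"/wp-content/|<meta[^>]+generator[^>]+WordPress", re.I
    ),
    "Drupal": re.compile(
        r"/sites/default/files/|<meta[^>]+generator[^>]+Drupal", re.I
    ),
    "Google Analytics": re.compile(
        r"googletagmanager\.com/gtag/js|google-analytics\.com/analytics\.js",
        re.I,
    ),
}


def detect_technologies(html: str) -> list:
    """Return the sorted names of all signatures that match the HTML."""
    return sorted(
        name for name, pattern in SIGNATURES.items() if pattern.search(html)
    )
```

Markup-only matching misses technologies that leave no trace in the HTML, such as hosting providers, which usually need response headers or DNS data.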

Third-Party JavaScript

We track externally hosted scripts and services such as analytics tags, consent tools, CDNs, shared JavaScript libraries, and support widgets.

See Third-Party JavaScript for the current breakdown and evidence exports.
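To illustrate what "externally hosted" means in practice, the sketch below lists the hostnames of scripts that load from somewhere other than the page's own host. It assumes a simplification: only an exact hostname match counts as first-party, whereas a fuller implementation would likely compare registrable domains. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_hosts(page_url: str, html: str) -> set:
    """Return hostnames of scripts served from a host other than the page's."""
    page_host = urlparse(page_url).hostname or ""
    collector = ScriptCollector()
    collector.feed(html)
    hosts = set()
    for src in collector.sources:
        # urljoin resolves relative and protocol-relative (//host/x.js) URLs.
        host = urlparse(urljoin(page_url, src)).hostname or ""
        if host and host != page_host:
            hosts.add(host)
    return hosts
```

Inline scripts have no `src` attribute and are skipped, so this only captures scripts fetched by URL.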

Lighthouse Audits

We run Google Lighthouse on each scanned page and record five quality scores, each on a 0–100 scale: performance, accessibility, best practices, SEO, and PWA compliance.

See Lighthouse Scanning for full details.
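Lighthouse's JSON report stores each category score as a number between 0 and 1 (or null when it could not be computed) under a top-level `categories` object. A minimal sketch of converting those values to the 0–100 scale follows. The function name is hypothetical, and because the available categories depend on the Lighthouse version, the sketch reads whichever keys are present.

```python
import json


def category_scores(report_json: str) -> dict:
    """Map each Lighthouse category key to a 0-100 score, or None."""
    report = json.loads(report_json)
    scores = {}
    for key, category in report.get("categories", {}).items():
        raw = category.get("score")
        # Scores are 0-1 floats in the report; None means "not computed".
        scores[key] = None if raw is None else round(raw * 100)
    return scores


# Usage with a trimmed, made-up report.
sample = json.dumps({
    "categories": {
        "performance": {"score": 0.5},
        "accessibility": {"score": 1},
        "seo": {"score": None},
    }
})
print(category_scores(sample))
```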

Coverage Scope

The dataset currently targets United States higher-education institutions that use .edu domains.

See Institution Domains for the full source domain and page URL list.

How the Scans Work

Scans run automatically on a schedule via GitHub Actions:

Scan | Schedule | Priority
Social Media | Every 2 hours | Highest (confirms reachability and collects social-link data in one pass)
Accessibility Statements | Every 4 hours | High (checks whether each institution publishes an accessibility statement)
Technology Detection | Every 4 hours | Medium
Third-Party JavaScript | Every 6 hours | Medium
URL Validation | Every 2 hours | Low (lightweight redirect/404 checks; skipped for recently validated URLs)
Lighthouse Audits | Daily | Medium (slower per URL, about 25 s, so scanned progressively)
Scan Progress Report | Daily | Automatically triggers an extra run for the most-lagging scan
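How the progress report decides which scan is "most-lagging" is not spelled out here. One plausible reading, assumed for this sketch, is the scan with the lowest coverage percentage. The `most_lagging` name is hypothetical, and the figures are the coverage values from the progress summary.

```python
def most_lagging(coverage_by_scan: dict) -> str:
    """Return the scan name with the lowest coverage percentage."""
    return min(coverage_by_scan, key=coverage_by_scan.get)


# Coverage figures from the progress summary (percent).
current = {
    "Social Media": 99.6,
    "Technology": 99.6,
    "Lighthouse": 10.3,
    "Accessibility Statements": 99.9,
}
print(most_lagging(current))
```

With the current figures this selects Lighthouse, which fits its slower, progressive scanning.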

After each scan run, this site is automatically updated with the latest results.

Accessing Scan Artifacts

Each GitHub Actions scan run uploads its results as a downloadable artifact:

  1. Go to GitHub Actions
  2. Click the relevant workflow
  3. Open a completed run and scroll to the Artifacts section
  4. Download the artifact to inspect the database, annotated TOON files, and scan logs

The Scan Progress Report is regenerated automatically, so most visitors should not need the raw artifacts unless they want to inspect the source outputs directly.

Source Code & Data


Scan data is collected by automated workflows and stored as GitHub Actions artifacts. The progress report is regenerated after every scan and committed directly to this site.