This project publishes machine-readable JSON files via GitHub Pages so that other tools and researchers can query the scan data programmatically. No authentication is needed for the files served from GitHub Pages; each one is a plain HTTPS GET request.


Base URL

https://mgifford.github.io/eu-plus-government-scans/

Available Endpoints

The table below lists every data file the project produces. Files marked committed are small enough to live in the repository and are always available via HTTPS from the base URL above. Files marked artifact-only are too large to commit but can be downloaded from the GitHub Actions workflow run that produced them (see Accessing Artifact Files below).

Endpoint | Availability | Description
technology-index.json | committed | Compact cross-reference: technology → page count, categories, per-country page counts. Ideal for finding which countries use a given technology.
technology-license-data.json | committed | License and policy-classification metadata for the current top detected technologies, including OSI-approval status and DPGA Registry status.
third-party-tools-data.json | committed | Third-party JavaScript scan summary: top services, categories, per-country stats, and unknown host review queue.
technology-data.json | artifact-only | Full technology scan data with per-country page-level drilldowns.
social-media-data.json | artifact-only | Full social-media scan data with per-URL evidence for every country.
accessibility-data.json | artifact-only | Full accessibility-statement scan data with per-URL evidence.
lighthouse-data.json | artifact-only | Full Lighthouse audit data with per-URL scores.
lighthouse-data.csv | artifact-only | Same Lighthouse data as a flat CSV (UTF-8 BOM, opens in Excel).

technology-index.json

URL: https://mgifford.github.io/eu-plus-government-scans/technology-index.json

A compact cross-reference index generated from the technology scan database. Every detected technology is listed with its total page count, detected categories, and a per-country page count breakdown. No per-URL data is included; this keeps the file small enough to commit and serve directly.

Use this endpoint to answer questions like:

  • How many government pages use WordPress, and in which countries?
  • Which countries have the most Drupal deployments?
  • Which technologies appear under the “CMS” category?

Schema

{
  "generated_at": "2026-05-21 00:00 UTC",   // ISO-style UTC timestamp
  "base_url": "https://mgifford.github.io/eu-plus-government-scans/",
  "note": "...",

  // by_technology — keyed by technology name, sorted by page count descending
  "by_technology": {
    "WordPress": {
      "pages": 8762,                // total pages where this technology was detected
      "categories": ["Blogs", "CMS"],  // sorted list of Wappalyzer categories
      "by_country": {              // page count per country code, sorted alphabetically
        "FRANCE": 1234,
        "GERMANY": 567
      }
    }
    // … one entry per detected technology
  },

  // by_category — keyed by category name, sorted by page count descending
  "by_category": {
    "JavaScript libraries": {
      "pages": 54158,              // total pages with at least one tech in this category
      "technologies": ["Bootstrap", "jQuery", "jQuery Migrate"]  // sorted alphabetically
    }
    // … one entry per detected category
  }
}

Example: find all countries using Drupal

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-index.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
drupal = data['by_technology'].get('Drupal', {})
print(f'Total pages: {drupal.get(\"pages\", 0)}')
for country, count in sorted(drupal.get('by_country', {}).items(),
                              key=lambda x: -x[1]):
    print(f'  {country}: {count}')
"

Example: list all CMS technologies

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-index.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
cms = data['by_category'].get('CMS', {})
print('CMS technologies:', cms.get('technologies', []))
print('Total pages:', cms.get('pages', 0))
"

JavaScript example

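// Top-level await requires an ES module (e.g. <script type="module"> or a .mjs file).
const BASE = 'https://mgifford.github.io/eu-plus-government-scans/';

const res = await fetch(BASE + 'technology-index.json');
if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
const data = await res.json();

// Which countries use jQuery most? (Yields an empty list if jQuery is not in the index.)
const jquery = data.by_technology?.['jQuery'];
const ranked = Object.entries(jquery?.by_country ?? {})
  .sort(([, a], [, b]) => b - a)
  .slice(0, 5);
console.log('Top 5 jQuery countries:', ranked);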

// All CMS technologies
const cmsTechs = data.by_category['CMS']?.technologies ?? [];
console.log('CMS technologies:', cmsTechs);
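
The two halves of the index can also be combined. The following is a minimal Python sketch, assuming the file matches the schema shown above: it looks up the technologies listed under one category in by_category and ranks them by their pages value from by_technology. The helper names load_index and rank_category and the sample values are illustrative only and are not part of the project.

```python
import json
from urllib.request import urlopen

BASE = "https://mgifford.github.io/eu-plus-government-scans/"


def load_index(url=BASE + "technology-index.json"):
    """Download and parse the index (performs a network request when called)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def rank_category(index, category):
    """Return [(technology, pages)] for one category, largest page count first."""
    names = index.get("by_category", {}).get(category, {}).get("technologies", [])
    techs = index.get("by_technology", {})
    pairs = [(name, techs.get(name, {}).get("pages", 0)) for name in names]
    # Sort by page count descending, then by name for a stable order.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample shaped like the schema above (the numbers are made up).
sample = {
    "by_technology": {
        "WordPress": {"pages": 8762, "categories": ["Blogs", "CMS"], "by_country": {}},
        "Drupal": {"pages": 9000, "categories": ["CMS"], "by_country": {}},
    },
    "by_category": {
        "CMS": {"pages": 17000, "technologies": ["Drupal", "WordPress"]},
    },
}

print(rank_category(sample, "CMS"))
```

To run it against live data, replace sample with the result of load_index().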

technology-license-data.json

URL: https://mgifford.github.io/eu-plus-government-scans/technology-license-data.json

Policy-focused metadata for the technologies currently listed in the Technology Scanning page’s Top Technologies table.

Use this endpoint to answer questions like:

  • Which detected top technologies have OSI-approved licenses?
  • Which detected top technologies are listed in the DPGA Registry?
  • Which detections map to proprietary or mixed licensing models?

Schema

{
  "generated_at": "2026-05-21 21:40 UTC",
  "scope_note": "...",
  "dpga_registry_source": "...",
  "records": [
    {
      "technology": "Drupal",
      "license": "GPL-2.0-or-later",
      "osi_approved": "yes",        // one of: yes | no | partial
      "dpga_registry": "listed"     // one of: listed | not_listed
    }
    // ... one entry per top technology
  ]
}
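
For a quick overview of the whole file, the records can be tallied by their osi_approved value. This is a small Python sketch that assumes the records layout shown above; the helper name osi_breakdown and the second sample record are hypothetical.

```python
from collections import Counter


def osi_breakdown(license_data):
    """Count records per osi_approved value (yes / no / partial)."""
    records = license_data.get("records", [])
    # Records without the field are grouped under "unknown" (an assumption).
    return Counter(r.get("osi_approved", "unknown") for r in records)


# Hypothetical sample shaped like the schema above.
sample = {
    "records": [
        {"technology": "Drupal", "license": "GPL-2.0-or-later",
         "osi_approved": "yes", "dpga_registry": "listed"},
        {"technology": "ExampleTool", "license": "Proprietary",
         "osi_approved": "no", "dpga_registry": "not_listed"},
    ]
}

for status, count in sorted(osi_breakdown(sample).items()):
    print(f"{status}: {count}")
```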

Example: list technologies with OSI-approved licenses

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-license-data.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('records', []):
    if r.get('osi_approved') == 'yes':
        print(r.get('technology'))
"

Example: list DPGA Registry technologies

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-license-data.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('records', []):
    if r.get('dpga_registry') == 'listed':
        print(f\"{r.get('technology')}: {r.get('license')}\")
"

third-party-tools-data.json

URL: https://mgifford.github.io/eu-plus-government-scans/third-party-tools-data.json

Summary of the third-party JavaScript scan: which external scripts and services government pages load, how often, and from which countries.

Schema (top-level keys)

{
  "generated_at": "...",
  "summary": {
    "total_batches": 225,
    "total_scanned": 14043,
    "total_reachable": 13119,
    "urls_with_scripts": 5991,
    "total_available": 82714,
    "identified_service_loads": 7788,
    "unique_services": 24,
    "unique_categories": 16,
    "first_scan": "...",
    "last_scan": "..."
  },
  "top_services": [
    {
      "name": "cdnjs (Cloudflare CDN)",
      "loads": 1926,
      "reachable_pages": 773,
      "prevalence_pct": 5.89
    }
    // …
  ],
  "top_categories": [
    { "name": "CDN", "loads": 4626 }
    // …
  ],
  "by_country": [
    {
      "country_code": "AUSTRIA",
      "total_scanned": 821,
      "total_reachable": 790,
      "urls_with_scripts": 266,
      "identified_service_loads": 44,
      "last_scan": "2026-05-16"
    }
    // …
  ],
  "unknown_hosts": [
    {
      "host": "ajax.aspnetcdn.com",
      "loads": 582,
      "reachable_pages": 287
    }
    // …
  ],
  "country_drilldowns": {
    "AUSTRIA": {
      "with_scripts": [
        {
          "page_url": "https://...",
          "scripts": [
            {
              "src": "https://cdnjs.cloudflare.com/...",
              "host": "cdnjs.cloudflare.com",
              "service_name": "cdnjs (Cloudflare CDN)",
              "categories": ["CDN"]
            }
          ]
        }
      ]
    }
  }
}

JavaScript example

const BASE = 'https://mgifford.github.io/eu-plus-government-scans/';

const res = await fetch(BASE + 'third-party-tools-data.json');
const data = await res.json();

// Top 10 external services
data.top_services.slice(0, 10).forEach(s => {
  console.log(`${s.name}: ${s.loads} loads on ${s.reachable_pages} pages`);
});

// Countries with the most identified third-party service loads
// (totals across all services; by_country in the schema above has no per-service breakdown)
const topCountries = data.by_country
  .filter(c => c.identified_service_loads > 0)
  .sort((a, b) => b.identified_service_loads - a.identified_service_loads)
  .slice(0, 5);
console.log('Top service-load countries:', topCountries.map(c => c.country_code));
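
The by_country entries also make it possible to derive a per-country share of reachable pages that load external scripts. The Python sketch below assumes the field names from the schema above (total_reachable and urls_with_scripts); the helper name script_prevalence is illustrative, and the sample reuses the AUSTRIA figures shown in the schema.

```python
def script_prevalence(data):
    """Return [(country_code, percent)] sorted by percent, highest first.

    percent = urls_with_scripts / total_reachable * 100, rounded to 2 places.
    Countries with no reachable pages are skipped to avoid division by zero.
    """
    rows = []
    for country in data.get("by_country", []):
        reachable = country.get("total_reachable", 0)
        if not reachable:
            continue
        pct = round(100 * country.get("urls_with_scripts", 0) / reachable, 2)
        rows.append((country["country_code"], pct))
    return sorted(rows, key=lambda row: -row[1])


# Sample entry copied from the schema above, plus one hypothetical empty country.
sample = {
    "by_country": [
        {"country_code": "AUSTRIA", "total_scanned": 821, "total_reachable": 790,
         "urls_with_scripts": 266, "identified_service_loads": 44},
        {"country_code": "EXAMPLELAND", "total_scanned": 0, "total_reachable": 0,
         "urls_with_scripts": 0, "identified_service_loads": 0},
    ]
}

print(script_prevalence(sample))
```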

technology-data.json

Availability: GitHub Actions artifact (scan-progress-report-*)

Full technology scan data including per-country page-level drilldowns (one entry per scanned URL). This file can exceed GitHub’s 100 MB file-size limit as scan coverage grows, so it is not committed to the repository.

Schema (top-level keys)

{
  "generated_at": "...",
  "summary": { ... },          // scan-wide summary object
  "top_technologies": [ ... ], // [{name, pages, categories}]
  "top_categories": [ ... ],   // [{name, pages}]
  "by_country": [ ... ],       // per-country totals
  "country_drilldowns": {
    "GERMANY": {
      "scanned": [
        {
          "page_url": "https://...",
          "technologies": [
            { "name": "Nginx", "categories": ["Web servers"], "versions": ["1.24"] }
          ],
          "technology_names": ["Nginx"],
          "error_message": "",
          "last_scanned": "2026-05-19T..."
        }
      ],
      "detected": [ ... ]    // subset where at least one technology was found
    }
  }
}

Download from GitHub Actions → Generate Scan Progress Report → latest completed run → Artifacts → scan-progress-report-* → docs/technology-data.json.
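
Once the artifact has been downloaded and unzipped, the drilldowns can be searched locally. The following Python sketch assumes the country_drilldowns layout shown above; the helper name pages_using, the local path in the comment, and the sample URLs are hypothetical.

```python
import json


def pages_using(data, country, technology):
    """Return the page URLs in one country where a technology was detected."""
    scanned = data.get("country_drilldowns", {}).get(country, {}).get("scanned", [])
    return [
        page["page_url"]
        for page in scanned
        if technology in page.get("technology_names", [])
    ]


# With a real download (path is an assumption about where you unzipped it):
#   with open("scan-data/docs/technology-data.json", encoding="utf-8") as fh:
#       data = json.load(fh)

# Hypothetical sample shaped like the schema above.
sample = {
    "country_drilldowns": {
        "GERMANY": {
            "scanned": [
                {"page_url": "https://example.gov/a", "technology_names": ["Nginx"]},
                {"page_url": "https://example.gov/b", "technology_names": []},
            ]
        }
    }
}

print(pages_using(sample, "GERMANY", "Nginx"))
```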


social-media-data.json

Availability: GitHub Actions artifact (scan-progress-report-*)

Full social-media scan data. Per-URL evidence showing which social platforms (Twitter/X, Bluesky, Mastodon, etc.) each government page links to.

See docs/social-media.md for an overview of the scan and field definitions.


accessibility-data.json

Availability: GitHub Actions artifact (scan-progress-report-*)

Full accessibility-statement scan data. Per-URL evidence showing whether each page has an accessibility statement and where it was found.

See docs/accessibility-statements.md for field definitions and methodology.


lighthouse-data.json

Availability: GitHub Actions artifact (scan-progress-report-*)

Full Google Lighthouse audit results. One entry per scanned URL with performance, accessibility, best-practices, SEO, and PWA scores (0–100 scale).

See docs/lighthouse-results.md for methodology.


lighthouse-data.csv

Availability: GitHub Actions artifact (scan-progress-report-*)

Same Lighthouse data as a flat CSV (UTF-8 BOM for correct Excel import). Columns: url, country_code, performance, accessibility, best_practices, seo, pwa, scanned_at.
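
Because the file starts with a UTF-8 BOM, plain UTF-8 decoding leaves an invisible character in front of the first column name. The Python sketch below decodes with the utf-8-sig codec, which strips the BOM, and averages one score column per country. It assumes the column names listed above; the helper name mean_score_by_country and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_score_by_country(csv_file, column="performance"):
    """Average one score column per country_code, skipping blank or invalid cells."""
    scores = defaultdict(list)
    for row in csv.DictReader(csv_file):
        try:
            score = float(row.get(column) or "")
        except ValueError:
            continue  # blank or non-numeric score
        scores[row["country_code"]].append(score)
    return {country: round(mean(values), 1) for country, values in scores.items()}


# Hypothetical sample bytes, including the BOM that the real file carries.
raw = (
    "\ufeffurl,country_code,performance,accessibility,best_practices,seo,pwa,scanned_at\n"
    "https://example.gov/a,AUSTRIA,90,95,100,90,30,2026-05-19\n"
    "https://example.gov/b,AUSTRIA,80,85,90,80,30,2026-05-19\n"
    "https://example.gov/c,GERMANY,70,75,80,70,30,2026-05-19\n"
    "https://example.gov/d,GERMANY,,75,80,70,30,2026-05-19\n"
).encode("utf-8")

# utf-8-sig removes the BOM so the first header is "url", not "\ufeffurl".
text = raw.decode("utf-8-sig")
averages = mean_score_by_country(io.StringIO(text))
print(averages)
```

For a file on disk, open it with open(path, encoding="utf-8-sig", newline="") and pass the file object to the same function.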


Accessing Artifact Files

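Files marked artifact-only are uploaded after each Generate Scan Progress Report workflow run and retained for 30 days. Unlike the files on GitHub Pages, artifacts can only be downloaded by a signed-in GitHub user or an authenticated API client.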

To download an artifact:

  1. Go to Actions → Generate Scan Progress Report
  2. Click the most recent completed run
  3. Scroll to the Artifacts section at the bottom of the page
  4. Download scan-progress-report-* — it contains all JSON and CSV data files

To download via the GitHub API:

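# Find the ID of the newest non-expired scan-progress-report-* artifact.
# The API's name= query parameter matches the full artifact name only,
# so the prefix is matched in jq instead.
ARTIFACT_ID=$(gh api --paginate \
  "/repos/mgifford/eu-plus-government-scans/actions/artifacts?per_page=100" \
  --jq '.artifacts[] | select(.expired == false and (.name | startswith("scan-progress-report"))) | "\(.created_at) \(.id)"' \
  | sort -rk1 | head -1 | awk '{print $2}')

# Download and unzip (requires an authenticated gh session)
gh api "/repos/mgifford/eu-plus-government-scans/actions/artifacts/${ARTIFACT_ID}/zip" \
  > scan-data.zip
unzip scan-data.zip -d scan-data/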
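
The same selection logic can be expressed in Python for scripts that already hold the parsed API response. This sketch assumes the artifacts list returned by the GitHub REST API, using its name, id, expired, and created_at fields; the helper name newest_artifact_id and the sample entries are hypothetical.

```python
def newest_artifact_id(artifacts, prefix="scan-progress-report"):
    """Return the id of the newest non-expired artifact whose name starts with prefix.

    created_at values are ISO 8601 UTC strings, so comparing them as strings
    orders them chronologically. Returns None when nothing matches.
    """
    live = [
        a for a in artifacts
        if not a.get("expired") and a.get("name", "").startswith(prefix)
    ]
    if not live:
        return None
    return max(live, key=lambda a: a["created_at"])["id"]


# Hypothetical entries shaped like the "artifacts" array of the API response.
sample = [
    {"id": 1, "name": "scan-progress-report-1", "expired": False,
     "created_at": "2026-05-20T05:20:00Z"},
    {"id": 2, "name": "scan-progress-report-2", "expired": False,
     "created_at": "2026-05-21T05:20:00Z"},
    {"id": 3, "name": "scan-progress-report-3", "expired": True,
     "created_at": "2026-05-22T05:20:00Z"},
    {"id": 4, "name": "other-artifact", "expired": False,
     "created_at": "2026-05-23T05:20:00Z"},
]

print(newest_artifact_id(sample))
```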

Update Frequency

All JSON files are regenerated by the Generate Scan Progress Report workflow, which runs daily at 05:15 UTC and after every major scan run.

File | Committed to repo | Regenerated
technology-index.json | ✅ Yes | Daily
technology-license-data.json | ✅ Yes | Daily
third-party-tools-data.json | ✅ Yes | Daily
technology-data.json | ❌ Artifact only | Daily
social-media-data.json | ❌ Artifact only | Daily
accessibility-data.json | ❌ Artifact only | Daily
lighthouse-data.json | ❌ Artifact only | Daily
lighthouse-data.csv | ❌ Artifact only | Daily

CORS and Caching

GitHub Pages sets permissive CORS headers (Access-Control-Allow-Origin: *), so all committed JSON files can be fetched directly from client-side JavaScript running on any origin.

Responses are cached by GitHub’s CDN. To get the freshest data, add a cache-busting query string or use conditional If-None-Match / If-Modified-Since headers.
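
Both approaches can be sketched in Python with the standard library. The helpers below are illustrative, not part of the project: cache_busted appends a throwaway query parameter (the name "_" is an arbitrary choice), and fetch_if_changed sends If-None-Match and treats an HTTP 304 response as "nothing new". Whether a given response carries an ETag header is an assumption to verify against the live server.

```python
import time
from urllib.error import HTTPError
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen

BASE = "https://mgifford.github.io/eu-plus-government-scans/"


def cache_busted(url, token=None):
    """Append a unique query parameter so caches treat the URL as new."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query.append(("_", str(token if token is not None else int(time.time()))))
    return urlunsplit(parts._replace(query=urlencode(query)))


def conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying whichever validators are available."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return Request(url, headers=headers)


def fetch_if_changed(url, etag=None):
    """Return (body_bytes, new_etag), or (None, etag) if the server answers 304."""
    try:
        with urlopen(conditional_request(url, etag), timeout=30) as resp:
            return resp.read(), resp.headers.get("ETag")
    except HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return None, etag
        raise


print(cache_busted(BASE + "technology-index.json", token="20260521"))
```

A caller would store the ETag returned by the first fetch_if_changed call and pass it back on the next poll, re-parsing the JSON only when a body is returned.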