Data API

This project publishes machine-readable JSON files via GitHub Pages so that other tools and researchers can query the scan data programmatically. No authentication is needed; every endpoint is a plain HTTPS GET request.

Base URL

https://mgifford.github.io/eu-plus-government-scans/
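
Example: a small Python fetch helper

Because every committed endpoint is a plain HTTPS GET, a script can build the URL from the base URL and parse the response with the standard library alone. The sketch below is one possible way to do that; `endpoint_url` and `fetch_json` are illustrative names and are not part of this project.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://mgifford.github.io/eu-plus-government-scans/"


def endpoint_url(filename: str) -> str:
    """Build the full URL for one published file (hypothetical helper)."""
    return urljoin(BASE_URL, filename)


def fetch_json(filename: str, timeout: float = 30.0) -> dict:
    """GET one committed endpoint and parse the body as JSON.

    urlopen raises urllib.error.HTTPError on a 4xx/5xx response, so a
    missing file fails loudly instead of returning an empty result.
    """
    with urlopen(endpoint_url(filename), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json("technology-index.json")` should then return the parsed index as a dictionary. Artifact-only files are not expected to resolve this way.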

Available Endpoints

The table below lists the project's data files. Files marked committed are small enough to live in the repository and are always available over HTTPS from the base URL above. Files marked artifact-only are too large to commit; download them instead from the GitHub Actions workflow run that produced them (see Accessing Artifact Files below).

Endpoint	Availability	Description
`technology-index.json`	committed	Compact cross-reference: technology → page count, categories, per-country page counts. Ideal for finding which countries use a given technology.
`technology-license-data.json`	committed	License and policy-classification metadata for the current top detected technologies, including OSI-approval status and DPGA Registry status.
`third-party-tools-data.json`	committed	Third-party JavaScript scan summary: top services, categories, per-country stats, and unknown host review queue.
`technology-data.json`	artifact-only	Full technology scan data with per-country page-level drilldowns.
`social-media-data.json`	artifact-only	Full social-media scan data with per-URL evidence for every country.
`accessibility-data.json`	artifact-only	Full accessibility-statement scan data with per-URL evidence.
`lighthouse-data.json`	artifact-only	Full Lighthouse audit data with per-URL scores.
`lighthouse-data.csv`	artifact-only	Same Lighthouse data as a flat CSV (UTF-8 BOM, opens in Excel).

technology-index.json

URL: https://mgifford.github.io/eu-plus-government-scans/technology-index.json

A compact cross-reference index generated from the technology scan database. Every detected technology is listed with its total page count, detected categories, and a per-country page count breakdown. No per-URL data is included; this keeps the file small enough to commit and serve directly.

Use this endpoint to answer questions like:

How many government pages use WordPress, and in which countries?
Which countries have the most Drupal deployments?
Which technologies appear under the “CMS” category?

Schema

{
  "generated_at": "2026-05-21 00:00 UTC",   // UTC timestamp, "YYYY-MM-DD HH:MM UTC" (not strict ISO 8601)
  "base_url": "https://mgifford.github.io/eu-plus-government-scans/",
  "note": "...",

  // by_technology — keyed by technology name, sorted by page count descending
  "by_technology": {
    "WordPress": {
      "pages": 8762,                // total pages where this technology was detected
      "categories": ["Blogs", "CMS"],  // sorted list of Wappalyzer categories
      "by_country": {              // page count per country code (uppercase name such as "FRANCE"), sorted alphabetically
        "FRANCE": 1234,
        "GERMANY": 567
      }
    }
    // … one entry per detected technology
  },

  // by_category — keyed by category name, sorted by page count descending
  "by_category": {
    "JavaScript libraries": {
      "pages": 54158,              // total pages with at least one tech in this category
      "technologies": ["Bootstrap", "jQuery", "jQuery Migrate"]  // sorted alphabetically
    }
    // … one entry per detected category
  }
}

Example: find all countries using Drupal

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-index.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
drupal = data['by_technology'].get('Drupal', {})
print(f'Total pages: {drupal.get(\"pages\", 0)}')
for country, count in sorted(drupal.get('by_country', {}).items(),
                              key=lambda x: -x[1]):
    print(f'  {country}: {count}')
"

Example: list all CMS technologies

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-index.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
cms = data['by_category'].get('CMS', {})
print('CMS technologies:', cms.get('technologies', []))
print('Total pages:', cms.get('pages', 0))
"

JavaScript example

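// Uses top-level await: run as an ES module or wrap in an async function.
const BASE = 'https://mgifford.github.io/eu-plus-government-scans/';

const res = await fetch(BASE + 'technology-index.json');
if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
const data = await res.json();

// Which countries use jQuery most?
// Optional chaining avoids a TypeError if jQuery is absent from the index.
const jquery = data.by_technology?.['jQuery'];
const ranked = Object.entries(jquery?.by_country ?? {})
  .sort(([, a], [, b]) => b - a)
  .slice(0, 5);
console.log('Top 5 jQuery countries:', ranked);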

// All CMS technologies
const cmsTechs = data.by_category['CMS']?.technologies ?? [];
console.log('CMS technologies:', cmsTechs);
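
Example: top technologies for one country

The index is keyed by technology, so the reverse question ("what does a given country use most?") needs a pass over every entry. The sketch below shows one way to do that on an already-parsed index. The function name and the sample numbers are illustrative assumptions; only the field names come from the schema above.

```python
from __future__ import annotations


def top_technologies_for_country(
    index: dict, country: str, limit: int = 5
) -> list[tuple[str, int]]:
    """Return (technology, pages) pairs for one country, largest first."""
    counts = [
        (name, entry["by_country"][country])
        for name, entry in index.get("by_technology", {}).items()
        if country in entry.get("by_country", {})
    ]
    # Sort by page count descending, then by name so ties are stable.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:limit]


# Hand-written sample in the documented shape; the numbers are made up.
sample_index = {
    "by_technology": {
        "WordPress": {"pages": 1801, "categories": ["Blogs", "CMS"],
                      "by_country": {"FRANCE": 1234, "GERMANY": 567}},
        "Drupal": {"pages": 900, "categories": ["CMS"],
                   "by_country": {"FRANCE": 400, "GERMANY": 500}},
        "jQuery": {"pages": 2000, "categories": ["JavaScript libraries"],
                   "by_country": {"FRANCE": 2000}},
    }
}

print(top_technologies_for_country(sample_index, "GERMANY"))
# → [('WordPress', 567), ('Drupal', 500)]
```

With real data, pass the parsed contents of technology-index.json in place of `sample_index`.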

technology-license-data.json

URL: https://mgifford.github.io/eu-plus-government-scans/technology-license-data.json

Policy-focused metadata for the technologies currently listed in the Technology Scanning page’s Top Technologies table.

Use this endpoint to answer questions like:

Which detected top technologies have OSI-approved licenses?
Which detected top technologies are listed in the DPGA Registry?
Which detections map to proprietary or mixed licensing models?

Schema

{
  "generated_at": "2026-05-21 21:40 UTC",
  "scope_note": "...",
  "dpga_registry_source": "...",
  "records": [
    {
      "technology": "Drupal",
      "license": "GPL-2.0-or-later",
      "osi_approved": "yes",        // one of: yes | no | partial
      "dpga_registry": "listed"     // one of: listed | not_listed
    }
    // ... one entry per top technology
  ]
}
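
Example: summarise records by license status

Since each record carries both an `osi_approved` and a `dpga_registry` value, a quick cross-tabulation gives an overview before filtering individual technologies. This is a minimal sketch on parsed data. The function name, the `"unknown"` fallback label, and every sample record except Drupal are assumptions made for illustration.

```python
from collections import Counter


def license_summary(data: dict) -> Counter:
    """Count records per (osi_approved, dpga_registry) pair."""
    return Counter(
        # "unknown" is a placeholder of this sketch for a missing field,
        # not a value documented by the schema.
        (r.get("osi_approved", "unknown"), r.get("dpga_registry", "unknown"))
        for r in data.get("records", [])
    )


# Sample in the documented shape; the two "Example..." records are hypothetical.
sample = {
    "records": [
        {"technology": "Drupal", "license": "GPL-2.0-or-later",
         "osi_approved": "yes", "dpga_registry": "listed"},
        {"technology": "ExampleOpenTool", "license": "MIT",
         "osi_approved": "yes", "dpga_registry": "not_listed"},
        {"technology": "ExampleClosedTool", "license": "Proprietary",
         "osi_approved": "no", "dpga_registry": "not_listed"},
    ]
}

for (osi, dpga), count in sorted(license_summary(sample).items()):
    print(f"osi_approved={osi:<8} dpga_registry={dpga:<11} records={count}")
```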

Example: list technologies with OSI-approved licenses

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-license-data.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('records', []):
    if r.get('osi_approved') == 'yes':
        print(r.get('technology'))
"

Example: list DPGA Registry technologies

curl -s https://mgifford.github.io/eu-plus-government-scans/technology-license-data.json \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('records', []):
    if r.get('dpga_registry') == 'listed':
        print(f\"{r.get('technology')}: {r.get('license')}\")
"

third-party-tools-data.json

URL: https://mgifford.github.io/eu-plus-government-scans/third-party-tools-data.json

Summary of the third-party JavaScript scan: which external scripts and services government pages load, how often, and from which countries.

Schema (top-level keys)

{
  "generated_at": "...",
  "summary": {
    "total_batches": 225,
    "total_scanned": 14043,
    "total_reachable": 13119,
    "urls_with_scripts": 5991,
    "total_available": 82714,
    "identified_service_loads": 7788,
    "unique_services": 24,
    "unique_categories": 16,
    "first_scan": "...",
    "last_scan": "..."
  },
  "top_services": [
    {
      "name": "cdnjs (Cloudflare CDN)",
      "loads": 1926,
      "reachable_pages": 773,
      "prevalence_pct": 5.89
    }
    // …
  ],
  "top_categories": [
    { "name": "CDN", "loads": 4626 }
    // …
  ],
  "by_country": [
    {
      "country_code": "AUSTRIA",
      "total_scanned": 821,
      "total_reachable": 790,
      "urls_with_scripts": 266,
      "identified_service_loads": 44,
      "last_scan": "2026-05-16"
    }
    // …
  ],
  "unknown_hosts": [
    {
      "host": "ajax.aspnetcdn.com",
      "loads": 582,
      "reachable_pages": 287
    }
    // …
  ],
  "country_drilldowns": {
    "AUSTRIA": {
      "with_scripts": [
        {
          "page_url": "https://...",
          "scripts": [
            {
              "src": "https://cdnjs.cloudflare.com/...",
              "host": "cdnjs.cloudflare.com",
              "service_name": "cdnjs (Cloudflare CDN)",
              "categories": ["CDN"]
            }
          ]
        }
      ]
    }
  }
}
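
Example: prevalence of unknown hosts and script share per country

Entries in `unknown_hosts` carry no `prevalence_pct`, but a comparable figure can be derived from `summary.total_reachable`. The sketch below assumes that prevalence means reachable pages divided by total reachable pages. The sample values shown for cdnjs above are consistent with that reading (773 of 13119 is about 5.89%), but the formula is inferred rather than documented. Function names are illustrative, and the sample data reuses only the values shown in the schema.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals; 0.0 when the denominator is zero."""
    return round(100 * part / whole, 2) if whole else 0.0


def unknown_host_prevalence(data: dict) -> list[tuple[str, float]]:
    """Rank unknown hosts by assumed prevalence across all reachable pages."""
    total = data.get("summary", {}).get("total_reachable", 0)
    rows = [
        (host["host"], pct(host["reachable_pages"], total))
        for host in data.get("unknown_hosts", [])
    ]
    return sorted(rows, key=lambda row: -row[1])


def script_share_by_country(data: dict) -> dict[str, float]:
    """Share of reachable URLs per country that load at least one script."""
    return {
        c["country_code"]: pct(c["urls_with_scripts"], c["total_reachable"])
        for c in data.get("by_country", [])
    }


# Sample restricted to values that appear in the schema above.
sample = {
    "summary": {"total_reachable": 13119},
    "unknown_hosts": [
        {"host": "ajax.aspnetcdn.com", "loads": 582, "reachable_pages": 287}
    ],
    "by_country": [
        {"country_code": "AUSTRIA", "total_scanned": 821, "total_reachable": 790,
         "urls_with_scripts": 266, "identified_service_loads": 44}
    ],
}

print(unknown_host_prevalence(sample))   # → [('ajax.aspnetcdn.com', 2.19)]
print(script_share_by_country(sample))   # → {'AUSTRIA': 33.67}
```

Hosts that rank high here are natural candidates for the unknown host review queue mentioned in the endpoint table.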