Technology Scanning

Stats as of 2026-04-09 02:54 UTC — last scan: 2026-04-08

10 scan batches run

10,376 of 82,714 available pages scanned (12.5% coverage)
9,925 pages with technology detections (95.7% of scanned)
270 unique technologies identified


Technology Scan by Country

Country URLs Scanned Pages with Detections Available Last Scan
AUSTRIA 821 787 821 2026-04-07
BELGIUM 1,309 1,225 1,309 2026-04-07
BULGARIA 291 268 291 2026-04-07
CROATIA 233 230 233 2026-04-07
CZECHIA 843 798 843 2026-04-07
DENMARK 415 412 1,521 2026-04-07
ESTONIA 396 388 396 2026-04-08
FINLAND 180 172 180 2026-04-08
FRANCE 2,457 2,282 10,007 2026-04-08
GERMANY 3,431 3,363 6,555 2026-04-08



Top Technologies

# Technology Pages Categories
1 jQuery 4,674 JavaScript libraries
2 PHP 3,349 Programming languages
3 Apache 2,524 Web servers
4 Bootstrap 2,113 UI frameworks
5 Font Awesome 2,104 Font scripts
6 Nginx 1,895 Reverse proxies, Web servers
7 Google Font API 1,252 Font scripts
8 jQuery UI 1,184 JavaScript libraries
9 Drupal 1,166 CMS
10 MySQL 1,030 Databases
11 WordPress 1,019 Blogs, CMS
12 jQuery Migrate 1,008 JavaScript libraries
13 Windows Server 833 Operating systems
14 IIS 816 Web servers
15 Slick 812 JavaScript libraries
16 TYPO3 CMS 801 CMS
17 jsDelivr 783 CDN
18 Lightbox 760 JavaScript libraries
19 Microsoft ASP.NET 695 Web frameworks
20 Varnish 618 Caching

Top Technology Categories

# Category Pages
1 JavaScript libraries 10,413
2 Web servers 5,523
3 Programming languages 3,994
4 Font scripts 3,439
5 CMS 3,281
6 UI frameworks 2,368
7 Reverse proxies 1,961
8 CDN 1,453
9 Databases 1,081
10 Operating systems 1,076
11 Web frameworks 1,056
12 Blogs 1,021
13 JavaScript frameworks 785
14 Caching 759
15 Maps 451

📥 Machine-readable results: technology-data.json


Overview

The technology scanner fetches each government page and uses python-Wappalyzer to identify technologies from the HTTP response headers and HTML content. Detected technologies (CMS, web server, JavaScript frameworks, analytics, etc.) and their versions are stored in the metadata database and written back to an annotated *_tech.toon file.

Scans run automatically every 6 hours via GitHub Actions so that the full set of URLs across all countries can be covered gradually without overloading government servers.


Usage

Scan a single country

python3 -m src.cli.scan_technology --country ICELAND --rate-limit 2

Scan all countries

python3 -m src.cli.scan_technology --all --rate-limit 2
python3 -m src.cli.scan_technology --all --max-runtime 110 --rate-limit 2.0

Command-line options

Option Default Description
--country CODE (none) Country code to scan (e.g. FRANCE, ICELAND)
--all (off) Scan all countries in the TOON directory
--toon-dir PATH data/toon-seeds/countries Directory with .toon seed files
--rate-limit N 2.0 Maximum HTTP requests per second
--max-runtime N 0 (no limit) Maximum runtime in minutes. The scanner stops gracefully before this limit so that partial results can be saved. Set to ~10 minutes less than the GitHub Actions timeout-minutes value.
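The --rate-limit behaviour amounts to spacing requests out over time. A minimal sketch of such a limiter (the class name is illustrative, not the scanner's actual code):

```python
import time

class RateLimiter:
    """Enforce a maximum number of calls per second by sleeping
    between calls. A sketch, not the scanner's real implementation."""

    def __init__(self, per_second: float):
        self.min_interval = 1.0 / per_second
        self.last_call = 0.0

    def wait(self) -> None:
        # Sleep just long enough that consecutive calls are at least
        # min_interval seconds apart; the first call never sleeps.
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(per_second=2.0)  # matches the default --rate-limit 2.0
```

Calling `limiter.wait()` before each HTTP request caps throughput at two requests per second, which is the default used against government servers.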

GitHub Actions

The Scan Technology Stack workflow (.github/workflows/scan-technology.yml) runs automatically every 6 hours and can also be triggered manually from the Actions tab:

  1. Go to Actions → Scan Technology Stack → Run workflow
  2. Optionally enter a country code (leave blank to scan all countries)
  3. Optionally adjust the rate limit

Artifacts uploaded after each run:

Artifact Contents
tech-scan-<run_number> data/metadata.db, scan output log, annotated *_tech.toon files
validation-metadata data/metadata.db (shared with URL validation and social media scans)

Output

Annotated TOON file

Each page entry in the output *_tech.toon file gains a technologies field:

{
  "url": "https://example.gov/",
  "is_root_page": true,
  "technologies": {
    "Nginx": { "versions": ["1.24"], "categories": ["Web servers"] },
    "WordPress": { "versions": ["6.2"], "categories": ["CMS", "Blogs"] }
  }
}

If detection failed for a URL, a tech_error field is added instead:

{
  "url": "https://unreachable.gov/",
  "tech_error": "Connection error: ..."
}
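Annotating page entries amounts to a small merge step: a successful scan adds technologies, a failed one adds tech_error. The helper below is hypothetical, not the project's actual function:

```python
from typing import Optional

def annotate_page(page: dict, technologies: Optional[dict],
                  error: Optional[str]) -> dict:
    """Return a copy of a TOON page entry with either a `technologies`
    field (successful detection) or a `tech_error` field (failure),
    mirroring the output format shown above. Hypothetical helper."""
    annotated = dict(page)
    if error is not None:
        annotated["tech_error"] = error
    else:
        annotated["technologies"] = technologies or {}
    return annotated

ok = annotate_page(
    {"url": "https://example.gov/", "is_root_page": True},
    {"Nginx": {"versions": ["1.24"], "categories": ["Web servers"]}},
    None,
)
failed = annotate_page({"url": "https://unreachable.gov/"},
                       None, "Connection error: ...")
```

Note that the two fields are mutually exclusive: a page entry carries either technologies or tech_error, never both.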

Database table

Results are stored in the url_tech_results table:

Column Type Description
url TEXT Page URL
country_code TEXT Country identifier
scan_id TEXT Unique scan run ID
technologies TEXT JSON object of detected technologies
error_message TEXT Error message (if detection failed)
scanned_at TEXT ISO-8601 timestamp

Query example:

SELECT url, technologies
FROM url_tech_results
WHERE country_code = 'ICELAND'
ORDER BY scanned_at DESC;
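The table and the query above can be exercised end-to-end with Python's built-in sqlite3 module. The schema below is inferred from the column list and is a sketch, not the project's actual migration code:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # the real scanner uses data/metadata.db
conn.execute("""
    CREATE TABLE IF NOT EXISTS url_tech_results (
        url TEXT,
        country_code TEXT,
        scan_id TEXT,
        technologies TEXT,   -- JSON object of detected technologies
        error_message TEXT,  -- set when detection failed
        scanned_at TEXT      -- ISO-8601 timestamp
    )
""")

# Incremental, per-URL insert: each result is committed as soon as it
# is available, so a timed-out job still leaves partial results behind.
conn.execute(
    "INSERT INTO url_tech_results VALUES (?, ?, ?, ?, ?, ?)",
    (
        "https://example.gov/",
        "ICELAND",
        "scan-0001",
        json.dumps({"Nginx": {"versions": ["1.24"],
                              "categories": ["Web servers"]}}),
        None,
        "2026-04-08T12:00:00Z",
    ),
)
conn.commit()

rows = conn.execute(
    "SELECT url, technologies FROM url_tech_results "
    "WHERE country_code = ? ORDER BY scanned_at DESC",
    ("ICELAND",),
).fetchall()
```

The technologies column stores the detection dict serialized as JSON, so consumers decode it with json.loads after querying.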

Architecture

scan-technology.yml (GitHub Actions — every 6 hours)
    ↓
scan_technology.py (CLI)
    ↓
TechScanner.scan_country()
    ↓
TechDetector.detect_urls_batch()
    ↓
For each URL:
    httpx.get()  →  HTML + headers
    Wappalyzer.analyze_with_versions_and_categories()
    ↓
Save to url_tech_results table (incremental, per URL)
    ↓
Write *_tech.toon output file
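The pipeline above can be condensed into a per-URL loop. In this sketch, fetch, analyze, and save are passed in as callables so the example stays library-agnostic (the real code uses httpx and Wappalyzer); the function is a hypothetical stand-in for TechDetector.detect_urls_batch():

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

def detect_urls_batch(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[str, Dict[str, str]]],     # url -> (html, headers)
    analyze: Callable[[str, str, Dict[str, str]], Dict],    # -> detections dict
    save: Callable[[str, Optional[Dict], Optional[str]], None],
) -> None:
    """Sketch of the per-URL pipeline: fetch each page, run detection,
    and persist one result at a time so partial progress survives a
    job timeout. Not the project's actual implementation."""
    for url in urls:
        try:
            html, headers = fetch(url)
            detections = analyze(url, html, headers)
            save(url, detections, None)       # success -> technologies field
        except Exception as exc:              # record the error and keep going
            save(url, None, f"Connection error: {exc}")
```

Because save is called once per URL rather than once at the end, a run cut short by --max-runtime or the GitHub Actions timeout still leaves every completed URL in the database.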

Notes

  • Rate limiting is applied between requests to avoid overloading government servers. The default is 2 requests per second.
  • Technology fingerprinting is best-effort; some sites may return no detections if they use custom or obfuscated stacks.
  • Unlike the URL validator, failed tech scans do not mark a URL for removal — errors are recorded but the URL is kept in future scan cycles.
  • Results are persisted incrementally (one URL at a time) so that partial results are preserved even if the GitHub Actions job times out.
  • The *_tech.toon output files are excluded from version control (see .gitignore).