Technology Scanning
Stats as of 2026-04-09 02:54 UTC — last scan: 2026-04-08
10 scan batches run
10,376 of 82,714 available pages scanned (12.5% coverage)
9,925 pages with technology detections (95.7% of scanned)
270 unique technologies identified
Technology Scan by Country
| Country | URLs Scanned | Pages with Detections | Available | Last Scan |
|---|---|---|---|---|
| AUSTRIA | 821 | 787 | 821 | 2026-04-07 |
| BELGIUM | 1,309 | 1,225 | 1,309 | 2026-04-07 |
| BULGARIA | 291 | 268 | 291 | 2026-04-07 |
| CROATIA | 233 | 230 | 233 | 2026-04-07 |
| CZECHIA | 843 | 798 | 843 | 2026-04-07 |
| DENMARK | 415 | 412 | 1,521 | 2026-04-07 |
| ESTONIA | 396 | 388 | 396 | 2026-04-08 |
| FINLAND | 180 | 172 | 180 | 2026-04-08 |
| FRANCE | 2,457 | 2,282 | 10,007 | 2026-04-08 |
| GERMANY | 3,431 | 3,363 | 6,555 | 2026-04-08 |
Top Technologies
| # | Technology | Pages | Categories |
|---|---|---|---|
| 1 | jQuery | 4,674 | JavaScript libraries |
| 2 | PHP | 3,349 | Programming languages |
| 3 | Apache | 2,524 | Web servers |
| 4 | Bootstrap | 2,113 | UI frameworks |
| 5 | Font Awesome | 2,104 | Font scripts |
| 6 | Nginx | 1,895 | Reverse proxies, Web servers |
| 7 | Google Font API | 1,252 | Font scripts |
| 8 | jQuery UI | 1,184 | JavaScript libraries |
| 9 | Drupal | 1,166 | CMS |
| 10 | MySQL | 1,030 | Databases |
| 11 | WordPress | 1,019 | Blogs, CMS |
| 12 | jQuery Migrate | 1,008 | JavaScript libraries |
| 13 | Windows Server | 833 | Operating systems |
| 14 | IIS | 816 | Web servers |
| 15 | Slick | 812 | JavaScript libraries |
| 16 | TYPO3 CMS | 801 | CMS |
| 17 | jsDelivr | 783 | CDN |
| 18 | Lightbox | 760 | JavaScript libraries |
| 19 | Microsoft ASP.NET | 695 | Web frameworks |
| 20 | Varnish | 618 | Caching |
Top Technology Categories
| # | Category | Pages |
|---|---|---|
| 1 | JavaScript libraries | 10,413 |
| 2 | Web servers | 5,523 |
| 3 | Programming languages | 3,994 |
| 4 | Font scripts | 3,439 |
| 5 | CMS | 3,281 |
| 6 | UI frameworks | 2,368 |
| 7 | Reverse proxies | 1,961 |
| 8 | CDN | 1,453 |
| 9 | Databases | 1,081 |
| 10 | Operating systems | 1,076 |
| 11 | Web frameworks | 1,056 |
| 12 | Blogs | 1,021 |
| 13 | JavaScript frameworks | 785 |
| 14 | Caching | 759 |
| 15 | Maps | 451 |
📥 Machine-readable results: technology-data.json
Overview
The technology scanner fetches each government page and uses
python-Wappalyzer to identify
technologies from HTTP response headers and HTML content. Detected
technologies (CMS, web server, JavaScript frameworks, analytics, etc.) and
their versions are stored in the metadata database and written back into an
annotated *_tech.toon TOON file.
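The real detection work is delegated to python-Wappalyzer, but the idea behind header-based fingerprinting can be sketched in a few lines. The rules below are a tiny hypothetical subset for illustration only (the names `HEADER_RULES` and `detect_from_headers` are not part of the scanner):

```python
# Simplified illustration of header-based technology fingerprinting.
# The real scanner delegates this to python-Wappalyzer; these rules
# are a made-up subset for demonstration.
import re

# Hypothetical rules: (header name, pattern, technology, category)
HEADER_RULES = [
    ("server", re.compile(r"nginx(?:/([\d.]+))?", re.I), "Nginx", "Web servers"),
    ("server", re.compile(r"apache(?:/([\d.]+))?", re.I), "Apache", "Web servers"),
    ("x-powered-by", re.compile(r"php(?:/([\d.]+))?", re.I), "PHP", "Programming languages"),
]

def detect_from_headers(headers: dict) -> dict:
    """Return {technology: {"versions": [...], "categories": [...]}}."""
    found = {}
    lowered = {k.lower(): v for k, v in headers.items()}
    for header, pattern, tech, category in HEADER_RULES:
        value = lowered.get(header)
        if value:
            m = pattern.search(value)
            if m:
                versions = [m.group(1)] if m.group(1) else []
                found[tech] = {"versions": versions, "categories": [category]}
    return found

result = detect_from_headers({"Server": "nginx/1.24.0", "X-Powered-By": "PHP/8.2"})
```

The returned mapping mirrors the `technologies` structure written to the annotated output files.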
Scans run automatically every 6 hours via GitHub Actions so that the full set of URLs across all countries can be covered gradually without overloading government servers.
Usage
Scan a single country
```bash
python3 -m src.cli.scan_technology --country ICELAND --rate-limit 2
```
Scan all countries
```bash
python3 -m src.cli.scan_technology --all --rate-limit 2
```
Scan all countries with a runtime cap (recommended for CI)
```bash
python3 -m src.cli.scan_technology --all --max-runtime 110 --rate-limit 2.0
```
Command-line options
| Option | Default | Description |
|---|---|---|
| `--country CODE` | — | Country code to scan (e.g. FRANCE, ICELAND) |
| `--all` | — | Scan all countries in the TOON directory |
| `--toon-dir PATH` | `data/toon-seeds/countries` | Directory with .toon seed files |
| `--rate-limit N` | `2.0` | Maximum HTTP requests per second |
| `--max-runtime N` | `0` (no limit) | Maximum runtime in minutes. The scanner stops gracefully before this limit so that partial results can be saved. Set to ~10 minutes less than the GitHub Actions timeout-minutes value. |
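The graceful stop behind `--max-runtime` can be sketched as a deadline check before each URL, so that per-URL results already saved are never lost mid-batch. This is an assumed shape of the logic, not the actual `TechScanner` code; `scan_batch` and `scan_one` are illustrative names:

```python
# Sketch of a --max-runtime style graceful stop (assumed behaviour,
# not the actual TechScanner implementation).
import time

def scan_batch(urls, max_runtime_minutes=0, scan_one=lambda u: u):
    """Scan URLs until done or until the runtime cap is reached."""
    deadline = (time.monotonic() + max_runtime_minutes * 60
                if max_runtime_minutes else None)
    done = []
    for url in urls:
        # Check the deadline *before* starting each URL, never mid-request.
        if deadline is not None and time.monotonic() >= deadline:
            break  # stop gracefully; remaining URLs are picked up next run
        done.append(scan_one(url))
    return done
```

Stopping between URLs rather than mid-request is what lets partial results survive a CI timeout.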
GitHub Actions
The Scan Technology Stack workflow (.github/workflows/scan-technology.yml)
runs automatically every 6 hours and can also be triggered manually from the
Actions tab:
- Go to Actions → Scan Technology Stack → Run workflow
- Optionally enter a country code (leave blank to scan all countries)
- Optionally adjust the rate limit
Artifacts uploaded after each run:
| Artifact | Contents |
|---|---|
| `tech-scan-<run_number>` | data/metadata.db, scan output log, annotated *_tech.toon files |
| `validation-metadata` | data/metadata.db (shared with URL validation and social media scans) |
Output
Annotated TOON file
Each page entry in the output *_tech.toon file gains a technologies field:
```json
{
  "url": "https://example.gov/",
  "is_root_page": true,
  "technologies": {
    "Nginx": { "versions": ["1.24"], "categories": ["Web servers"] },
    "WordPress": { "versions": ["6.2"], "categories": ["CMS", "Blogs"] }
  }
}
```
If detection failed for a URL, a tech_error field is added instead:
```json
{
  "url": "https://unreachable.gov/",
  "tech_error": "Connection error: ..."
}
```
Database table
Results are stored in the url_tech_results table:
| Column | Type | Description |
|---|---|---|
| `url` | TEXT | Page URL |
| `country_code` | TEXT | Country identifier |
| `scan_id` | TEXT | Unique scan run ID |
| `technologies` | TEXT | JSON object of detected technologies |
| `error_message` | TEXT | Error message (if detection failed) |
| `scanned_at` | TEXT | ISO-8601 timestamp |
Query example:
```sql
SELECT url, technologies
FROM url_tech_results
WHERE country_code = 'ICELAND'
ORDER BY scanned_at DESC;
```
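The same query can be run from Python with the standard-library `sqlite3` module. This sketch recreates the `url_tech_results` schema in memory with one made-up row, so it is self-contained; against the real data/metadata.db you would only need the `connect` and `execute` calls:

```python
# Sketch: recreate the url_tech_results schema in-memory and run the
# documented query. Column names follow the table above; the row is made up.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE url_tech_results (
        url TEXT, country_code TEXT, scan_id TEXT,
        technologies TEXT, error_message TEXT, scanned_at TEXT
    )
""")
conn.execute(
    "INSERT INTO url_tech_results VALUES (?, ?, ?, ?, ?, ?)",
    ("https://example.is/", "ICELAND", "scan-001",
     json.dumps({"Nginx": {"versions": ["1.24"], "categories": ["Web servers"]}}),
     None, "2026-04-08T12:00:00Z"),
)
rows = conn.execute(
    "SELECT url, technologies FROM url_tech_results "
    "WHERE country_code = 'ICELAND' ORDER BY scanned_at DESC"
).fetchall()
```

Note that `technologies` is stored as serialized JSON, so each value needs a `json.loads` before use.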
Architecture
```
scan-technology.yml (GitHub Actions — every 6 hours)
    ↓
scan_technology.py (CLI)
    ↓
TechScanner.scan_country()
    ↓
TechDetector.detect_urls_batch()
    ↓
For each URL:
    httpx.get() → HTML + headers
    Wappalyzer.analyze_with_versions_and_categories()
    ↓
Save to url_tech_results table (incremental, per URL)
    ↓
Write *_tech.toon output file
```
Notes
- Rate limiting is applied between requests to avoid overloading government servers. The default is 2 requests per second.
- Technology fingerprinting is best-effort; some sites may return no detections if they use custom or obfuscated stacks.
- Unlike the URL validator, failed tech scans do not mark a URL for removal — errors are recorded but the URL is kept in future scan cycles.
- Results are persisted incrementally (one URL at a time) so that partial results are preserved even if the GitHub Actions job times out.
- The *_tech.toon output files are excluded from version control (see .gitignore).
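The inter-request rate limiting described in the notes can be sketched as a small helper that sleeps just long enough to keep requests at or below the configured rate. This is an assumption about how the scanner spaces requests; the actual implementation may differ:

```python
# Sketch of simple inter-request rate limiting (an assumed mechanism,
# not necessarily the scanner's actual implementation).
import time

class RateLimiter:
    """Allow at most `rate` calls per second by sleeping between calls."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate  # e.g. rate=2.0 -> 0.5 s between requests
        self._last = 0.0

    def wait(self):
        now = time.monotonic()
        elapsed = now - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

limiter = RateLimiter(rate=2.0)  # matches the documented default of 2 req/s
```

Calling `limiter.wait()` before each `httpx.get()` keeps the request stream at the configured pace regardless of how fast individual responses come back.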