Certificate Transparency scan, purpose assessment, and DNS correlation tooling
.gitignore Initial public release 2026-03-29 11:40:06 +02:00
ct_dns_utils.py Initial public release 2026-03-29 11:40:06 +02:00
ct_master_report.py Initial public release 2026-03-29 11:40:06 +02:00
ct_monograph_report.py Clarify dual EKU chapter structure 2026-03-29 12:26:22 +02:00
ct_scan.py Initial public release 2026-03-29 11:40:06 +02:00
ct_usage_assessment.py Initial public release 2026-03-29 11:40:06 +02:00
domains.example.txt Initial public release 2026-03-29 11:40:06 +02:00
Makefile Improve monograph UX and document operator workflow 2026-03-29 12:12:14 +02:00
README.md Improve monograph UX and document operator workflow 2026-03-29 12:12:14 +02:00
requirements.txt Initial public release 2026-03-29 11:40:06 +02:00

Certificate Transparency Search

This project builds a publication-grade report set from Certificate Transparency and public DNS:

  • it finds currently valid leaf certificates whose SAN values contain configured search terms
  • it verifies locally that the certificates are real leaf certificates rather than CA certificates or precertificates
  • it assesses intended usage from EKU and KeyUsage
  • it scans the DNS names exposed by the SAN corpus
  • it produces readable Markdown, LaTeX, and PDF outputs
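
The first three steps above can be sketched as follows. This is a minimal illustration, not the code in ct_scan.py or ct_usage_assessment.py: the dictionary shape, the field names (`is_ca`, `extension_oids`), and the function names are all assumptions made for the example. Only the OIDs are standard values (the RFC 6962 precertificate poison extension and the serverAuth and clientAuth EKU identifiers).

```python
# Illustrative sketch only. The dict shape and all function names here are
# assumptions, not the data model used by the project's scripts.

PRECERT_POISON_OID = "1.3.6.1.4.1.11129.2.4.3"  # RFC 6962 poison extension
EKU_SERVER_AUTH = "1.3.6.1.5.5.7.3.1"           # id-kp-serverAuth
EKU_CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"           # id-kp-clientAuth


def san_matches(san_names, terms):
    """Return the SAN names that contain any search term (case-insensitive)."""
    lowered = [term.lower() for term in terms]
    return [name for name in san_names
            if any(term in name.lower() for term in lowered)]


def is_real_leaf(cert):
    """True when the certificate is neither a CA certificate nor a precertificate."""
    if cert.get("is_ca", False):
        return False
    return PRECERT_POISON_OID not in cert.get("extension_oids", [])


def classify_usage(eku_oids):
    """Map a list of EKU OIDs to a coarse usage label."""
    server = EKU_SERVER_AUTH in eku_oids
    client = EKU_CLIENT_AUTH in eku_oids
    if server and client:
        return "dual (server and client auth)"
    if server:
        return "server auth only"
    if client:
        return "client auth only"
    return "other or unspecified"
```

A real assessment would also look at KeyUsage bits, which this sketch leaves out for brevity.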

The project is designed for public source control:

  • real search terms live only in domains.local.txt
  • generated artefacts live only in output/
  • caches live only in .cache/

None of those paths should be committed.

What You Need On A Fresh Machine

Required software

  • git
  • python3
  • make
  • dig
  • xelatex

What each dependency is for

  • python3: runs the scanners and report generators
  • make: gives you short repeatable commands instead of long manual command lines
  • dig: performs the live DNS scan
  • xelatex: compiles the PDF reports

If xelatex is missing, the Markdown and LaTeX outputs can still be generated, but the PDF targets will fail.
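
If you want to check a fresh machine before running anything, a small preflight script along these lines may help. It is a sketch that is not part of the repository; the function name `missing_tools` and the split into required and optional tools are assumptions based on the list above.

```python
import shutil

# Tool names taken from the dependency list above.
REQUIRED_TOOLS = ["git", "python3", "make", "dig"]
OPTIONAL_TOOLS = ["xelatex"]  # only needed for the PDF targets


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing required tool: {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing optional tool: {tool} (PDF targets will fail)")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.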

Fresh Install On Another Computer

Clone the repository from your chosen remote and enter the directory:

git clone <repository-url>
cd CertTransparencySearch

Create the local Python environment and install dependencies:

make bootstrap

Create the local-only search-term file:

make init-config

Then edit domains.local.txt and replace the placeholder values with the real search terms you want to scan.

Local Search Terms

The tracked file is:

  • domains.example.txt

The local-only file is:

  • domains.local.txt

Rules:

  • keep real search terms only in domains.local.txt
  • do not rename that file unless you also pass DOMAINS=... to make
  • do not commit it
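
For orientation, reading such a file could look roughly like the sketch below. The file format is an assumption (one search term per line, blank lines and lines starting with `#` ignored); check domains.example.txt for the format the scanners actually expect. The function names are hypothetical.

```python
from pathlib import Path


def parse_search_terms(text):
    """Parse search terms from text: one per line (assumed format).

    Blank lines and lines starting with '#' are skipped, and terms are
    lower-cased so later matching can be case-insensitive.
    """
    terms = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        terms.append(line.lower())
    return terms


def load_search_terms(path="domains.local.txt"):
    """Read and parse the local-only search-term file."""
    return parse_search_terms(Path(path).read_text(encoding="utf-8"))
```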

One-Command Runs

Main publication

This is the publication-grade monograph with appendices:

make monograph

Outputs:

  • output/corpus/monograph.md
  • output/corpus/monograph.tex
  • output/corpus/monograph.pdf
  • output/corpus/appendix-inventory.md
  • output/corpus/appendix-inventory.tex
  • output/corpus/appendix-inventory.pdf

Supporting purpose assessment

make purpose

Outputs:

  • output/corpus/certificate-purpose-assessment.md
  • output/corpus/certificate-purpose-assessment.json

Shorter executive report

make consolidated

Outputs:

  • output/corpus/consolidated-corpus-report.md
  • output/corpus/consolidated-corpus-report.tex
  • output/corpus/consolidated-corpus-report.pdf

Full operator run

This creates the local config if missing, then runs the purpose assessment and the full monograph:

make all

Reproducibility And Run Behaviour

The default Makefile values are:

  • DOMAINS=domains.local.txt
  • CACHE_TTL=0
  • DNS_CACHE_TTL=86400
  • MAX_CANDIDATES=10000

This means:

  • CACHE_TTL=0: Certificate Transparency is refreshed live on every normal run.
  • DNS_CACHE_TTL=86400: DNS results are reused for up to one day (86400 seconds) unless you override the DNS cache TTL.
  • MAX_CANDIDATES=10000: the cap is high enough for the current corpus. If the live raw match count exceeds the cap, the scanner refuses to run.
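
The effect of the two TTL values can be illustrated with a small freshness check. This is a sketch of the general technique, assuming the cache records the time each entry was stored; the function name and the timestamp representation are hypothetical, not taken from the project's cache code.

```python
import time


def cache_is_fresh(stored_at, ttl_seconds, now=None):
    """Return True if an entry stored at `stored_at` (epoch seconds) may be reused.

    A TTL of 0 (or less) never counts as fresh, which forces a live refresh.
    """
    if ttl_seconds <= 0:
        return False
    if now is None:
        now = time.time()
    return (now - stored_at) < ttl_seconds
```

Under this reading, a TTL of 0 always triggers a refetch, while a TTL of 86400 reuses any entry younger than one day.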

If you want to override values:

make monograph CACHE_TTL=86400 DNS_CACHE_TTL=86400

Or:

make monograph DOMAINS=/path/to/other.local.txt

Manual Commands

If you do not want to use make, the equivalent commands are:

Inventory appendix source

.venv/bin/python ct_scan.py \
  --domains-file domains.local.txt \
  --cache-ttl-seconds 0 \
  --max-candidates-per-domain 10000 \
  --output output/corpus/current-valid-certificates.md \
  --latex-output output/corpus/current-valid-certificates.tex \
  --pdf-output output/corpus/current-valid-certificates.pdf

Purpose assessment

.venv/bin/python ct_usage_assessment.py \
  --domains-file domains.local.txt \
  --cache-ttl-seconds 0 \
  --max-candidates 10000 \
  --markdown-output output/corpus/certificate-purpose-assessment.md \
  --json-output output/corpus/certificate-purpose-assessment.json

Consolidated report

.venv/bin/python ct_master_report.py \
  --domains-file domains.local.txt \
  --cache-ttl-seconds 0 \
  --dns-cache-ttl-seconds 86400 \
  --max-candidates-per-domain 10000 \
  --markdown-output output/corpus/consolidated-corpus-report.md \
  --latex-output output/corpus/consolidated-corpus-report.tex \
  --pdf-output output/corpus/consolidated-corpus-report.pdf

Full monograph

.venv/bin/python ct_monograph_report.py \
  --domains-file domains.local.txt \
  --cache-ttl-seconds 0 \
  --dns-cache-ttl-seconds 86400 \
  --max-candidates-per-domain 10000 \
  --markdown-output output/corpus/monograph.md \
  --latex-output output/corpus/monograph.tex \
  --pdf-output output/corpus/monograph.pdf \
  --appendix-markdown-output output/corpus/appendix-inventory.md \
  --appendix-latex-output output/corpus/appendix-inventory.tex \
  --appendix-pdf-output output/corpus/appendix-inventory.pdf

Project Structure

  • ct_scan.py: core CT scan, leaf verification, grouping, and detailed inventory report
  • ct_usage_assessment.py: EKU and KeyUsage assessment
  • ct_dns_utils.py: DNS scanning and provider-signature logic
  • ct_master_report.py: shorter consolidated report
  • ct_monograph_report.py: publication-grade monograph with appendices
  • Makefile: reproducible operator workflow

Safety Against Silent Undercounts

The scanner checks the live raw identity-row count before it executes the capped query. If the configured cap is too low, it stops with an error instead of silently returning an incomplete corpus.
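
The count-then-query guard described above can be sketched like this. The two callables stand in for whatever the scanner uses to count and fetch rows; their names, the exception class, and the error message are assumptions made for the example.

```python
class CandidateCapExceeded(RuntimeError):
    """Raised when the live match count is larger than the configured cap."""


def fetch_with_cap(count_rows, fetch_rows, max_candidates):
    """Fetch rows only if the uncapped count fits under `max_candidates`.

    `count_rows()` returns the live raw match count; `fetch_rows(limit)`
    runs the capped query. Both are hypothetical stand-ins.
    """
    raw_count = count_rows()
    if raw_count > max_candidates:
        # Fail loudly instead of returning a truncated corpus.
        raise CandidateCapExceeded(
            f"{raw_count} raw matches exceed the cap of {max_candidates}; "
            "raise the cap and rerun"
        )
    return fetch_rows(max_candidates)
```

Checking the count first costs one extra query, but it turns a silent truncation into an explicit error the operator can act on.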

Public Repo Rules

  • keep domains.local.txt local only
  • never commit output/
  • never commit .cache/
  • if you need a sample config in git, update domains.example.txt, not domains.local.txt