Certificate Transparency Search
This project builds a publication-grade monograph from Certificate Transparency and public DNS:
- it finds currently valid leaf certificates whose SAN values contain configured search terms
- it verifies locally that the certificates are real leaf certificates rather than CA certificates or precertificates
- it assesses intended usage from EKU and KeyUsage
- it scans the DNS names exposed by the SAN corpus
- it produces one primary readable output set: a monograph in Markdown, LaTeX, and PDF
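The selection and classification steps above can be sketched roughly as follows. This is a minimal illustration, not the code in ct_scan.py or ct_usage_assessment.py: the dict-based certificate record and the function names (san_matches, is_real_leaf, intended_usage) are hypothetical, and only the OIDs are standard values.

```python
# Hypothetical record layout: each certificate is a plain dict. This is an
# illustrative stand-in, not the structure the project's scripts actually use.
SERVER_AUTH = "1.3.6.1.5.5.7.3.1"      # id-kp-serverAuth (RFC 5280)
CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"      # id-kp-clientAuth (RFC 5280)
CT_POISON = "1.3.6.1.4.1.11129.2.4.3"  # precertificate poison extension (RFC 6962)


def san_matches(san_values, terms):
    """Return True if any SAN value contains any search term (case-insensitive)."""
    lowered = [san.lower() for san in san_values]
    return any(term.lower() in san for term in terms for san in lowered)


def is_real_leaf(cert):
    """Reject CA certificates and precertificates; accept everything else."""
    if cert.get("is_ca", False):
        return False
    return CT_POISON not in cert.get("extension_oids", [])


def intended_usage(eku_oids):
    """Classify intended usage from the EKU OIDs present on the certificate."""
    has_server = SERVER_AUTH in eku_oids
    has_client = CLIENT_AUTH in eku_oids
    if has_server and has_client:
        return "server+client"
    if has_server:
        return "server"
    if has_client:
        return "client"
    return "other"
```

A caller would typically keep a record only when both san_matches and is_real_leaf return True, then attach the intended_usage label to it.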
The project is designed for public source control:
- real search terms live only in domains.local.txt
- generated artefacts live only in output/
- caches live only in .cache/
None of those paths should be committed.
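One way to guard against an accidental commit is a small check over the list of staged paths. The sketch below is an assumption-laden illustration: the function name local_only_paths is hypothetical and the repository does not necessarily ship such a check; the tracked .gitignore remains the primary safeguard.

```python
from pathlib import PurePosixPath

# The three local-only locations named in this README.
LOCAL_ONLY_FILES = {"domains.local.txt"}
LOCAL_ONLY_DIRS = {"output", ".cache"}


def local_only_paths(staged_paths):
    """Return the staged paths that fall under a local-only location."""
    flagged = []
    for raw in staged_paths:
        path = PurePosixPath(raw)
        if not path.parts:
            continue  # ignore empty entries
        if str(path) in LOCAL_ONLY_FILES or path.parts[0] in LOCAL_ONLY_DIRS:
            flagged.append(raw)
    return flagged
```

The input could come from something like the output of git diff --cached --name-only, split into lines; a non-empty result means the commit should be stopped.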
What You Need On A Fresh Machine
Required software
- git
- python3
- make
- dig
- xelatex
What each dependency is for
- python3: runs the scanners and report generators
- make: gives you short repeatable commands instead of long manual command lines
- dig: performs the live DNS scan
- xelatex: compiles the PDF reports
If xelatex is missing, the Markdown and LaTeX outputs can still be generated, but the PDF targets will fail.
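Before the first run, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch using the standard library; the helper name missing_tools is hypothetical and is not part of the repository.

```python
import shutil

# Tools listed under "Required software" in this README.
REQUIRED_TOOLS = ("git", "python3", "make", "dig", "xelatex")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH.

    The `which` parameter exists only so the lookup can be swapped out in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If only xelatex is reported missing, the Markdown and LaTeX outputs can still be produced, as noted above.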
Fresh Install On Another Computer
Clone the repository from your chosen remote and enter the directory:
git clone <repository-url>
cd CertTransparencySearch
Create the local Python environment and install dependencies:
make bootstrap
Create the local-only search-term file:
make init-config
Then edit domains.local.txt and replace the placeholder values with the real search terms you want to scan.
Local Search Terms
The tracked file is:
domains.example.txt
The local-only file is:
domains.local.txt
Rules:
- keep real search terms only in domains.local.txt
- do not rename that file unless you also pass DOMAINS=... to make
- do not commit it
One-Command Runs
Main publication
This is the single canonical publication. The appendices are embedded into the same monograph, so you do not need to manage separate visible appendix artefacts:
make monograph
Outputs:
- output/corpus/monograph.md
- output/corpus/monograph.tex
- output/corpus/monograph.pdf
Internal helper artefacts used during PDF assembly are written only under .cache/monograph-temp/.
Supporting purpose assessment
This is optional. Its findings are already woven into the monograph, but the standalone output can still be useful during development:
make purpose
Outputs:
- output/corpus/certificate-purpose-assessment.md
- output/corpus/certificate-purpose-assessment.json
Historical lineage analysis
This is optional. Its findings are already woven into the monograph, but the standalone output can still be useful during development.
The report extends the analysis across current and expired certificates to study:
- repeated issuance under the same Subject CN
- Subject CN with different Subject DN over time
- Subject CN with different issuing CA or vendor over time
- Subject CN with different SAN profiles over time
- issuance bursts and step-change start dates
make lineage
Outputs:
- output/corpus/certificate-lineage-report.md
- output/corpus/certificate-lineage-report.tex
- output/corpus/certificate-lineage-report.pdf
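As an illustration of one lineage question listed above (the same Subject CN issued by different CAs over time), the grouping can be sketched like this. The tuple layout and the function name issuer_changes are hypothetical, and the sketch assumes ISO 8601 date strings so that plain sorting gives chronological order; ct_lineage_report.py may work quite differently.

```python
from collections import defaultdict


def issuer_changes(records):
    """Map each Subject CN to its chronological sequence of distinct issuers.

    `records` is an iterable of (subject_cn, not_before, issuer) tuples, where
    not_before is an ISO 8601 date string. Only Subject CNs whose issuer
    changed at least once are returned.
    """
    by_cn = defaultdict(list)
    for subject_cn, not_before, issuer in records:
        by_cn[subject_cn].append((not_before, issuer))

    changes = {}
    for subject_cn, entries in by_cn.items():
        sequence = []
        for _, issuer in sorted(entries):
            # Record an issuer only when it differs from the previous one.
            if not sequence or sequence[-1] != issuer:
                sequence.append(issuer)
        if len(sequence) > 1:
            changes[subject_cn] = sequence
    return changes
```

The same group-then-compare pattern would apply to Subject DN or SAN profile drift by swapping the third tuple field.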
Shorter executive report
make consolidated
Outputs:
- output/corpus/consolidated-corpus-report.md
- output/corpus/consolidated-corpus-report.tex
- output/corpus/consolidated-corpus-report.pdf
Full operator run
This creates the local config if missing, then builds the full monograph:
make all
Reproducibility And Run Behaviour
The default Makefile values are:
- DOMAINS=domains.local.txt
- CACHE_TTL=0
- DNS_CACHE_TTL=86400
- MAX_CANDIDATES=10000
This means:
- Certificate Transparency is refreshed live on every normal run.
- DNS results are reused for up to one day unless you override the DNS cache TTL.
- The query cap is high enough for the current corpus and the scanner will refuse to run if the live raw match count exceeds the cap.
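The TTL behaviour described above can be read as a simple freshness rule. The sketch below assumes that a TTL of 0 means "never reuse the cache" and that an entry expires once its age reaches the TTL; the function name cache_is_fresh is hypothetical and the scripts' actual comparison may differ at the boundary.

```python
def cache_is_fresh(age_seconds, ttl_seconds):
    """Return True if a cached entry may be reused.

    Assumption: ttl_seconds == 0 disables reuse entirely, which matches
    CACHE_TTL=0 causing a live Certificate Transparency refresh on every run.
    """
    return ttl_seconds > 0 and age_seconds < ttl_seconds
```

With the defaults, a Certificate Transparency entry is never fresh (TTL 0), while a DNS entry that is one hour old is reused (TTL 86400).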
If you want to override values:
make monograph CACHE_TTL=86400 DNS_CACHE_TTL=86400
Or:
make monograph DOMAINS=/path/to/other.local.txt
Manual Commands
If you do not want to use make, the equivalent commands are:
Inventory appendix source
This is only needed if you want the raw family inventory outside the monograph:
.venv/bin/python ct_scan.py \
--domains-file domains.local.txt \
--cache-ttl-seconds 0 \
--max-candidates-per-domain 10000 \
--output output/corpus/current-valid-certificates.md \
--latex-output output/corpus/current-valid-certificates.tex \
--pdf-output output/corpus/current-valid-certificates.pdf
Purpose assessment
.venv/bin/python ct_usage_assessment.py \
--domains-file domains.local.txt \
--cache-ttl-seconds 0 \
--max-candidates 10000 \
--markdown-output output/corpus/certificate-purpose-assessment.md \
--json-output output/corpus/certificate-purpose-assessment.json
Consolidated report
.venv/bin/python ct_master_report.py \
--domains-file domains.local.txt \
--cache-ttl-seconds 0 \
--dns-cache-ttl-seconds 86400 \
--max-candidates-per-domain 10000 \
--markdown-output output/corpus/consolidated-corpus-report.md \
--latex-output output/corpus/consolidated-corpus-report.tex \
--pdf-output output/corpus/consolidated-corpus-report.pdf
Historical lineage report
.venv/bin/python ct_lineage_report.py \
--domains-file domains.local.txt \
--cache-ttl-seconds 0 \
--max-candidates-per-domain 10000 \
--markdown-output output/corpus/certificate-lineage-report.md \
--latex-output output/corpus/certificate-lineage-report.tex \
--pdf-output output/corpus/certificate-lineage-report.pdf
Full monograph
.venv/bin/python ct_monograph_report.py \
--domains-file domains.local.txt \
--cache-ttl-seconds 0 \
--dns-cache-ttl-seconds 86400 \
--max-candidates-per-domain 10000 \
--markdown-output output/corpus/monograph.md \
--latex-output output/corpus/monograph.tex \
--pdf-output output/corpus/monograph.pdf
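If you prefer to drive the monograph build from Python rather than from a shell, the command above can be assembled as an argument list. This is a minimal sketch: the helper name monograph_command is hypothetical, while the flags and default values are the ones shown in this README.

```python
def monograph_command(
    domains="domains.local.txt",
    cache_ttl=0,
    dns_cache_ttl=86400,
    max_candidates=10000,
    python=".venv/bin/python",
):
    """Build the argv list for the full monograph run shown above."""
    return [
        python, "ct_monograph_report.py",
        "--domains-file", domains,
        "--cache-ttl-seconds", str(cache_ttl),
        "--dns-cache-ttl-seconds", str(dns_cache_ttl),
        "--max-candidates-per-domain", str(max_candidates),
        "--markdown-output", "output/corpus/monograph.md",
        "--latex-output", "output/corpus/monograph.tex",
        "--pdf-output", "output/corpus/monograph.pdf",
    ]
```

The list can be passed to subprocess.run(monograph_command(), check=True) from the repository root; passing a list rather than a shell string avoids quoting problems if the domains path contains spaces.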
Project Structure
- ct_scan.py: core CT scan, leaf verification, grouping, and detailed inventory report
- ct_usage_assessment.py: EKU and KeyUsage assessment
- ct_lineage_report.py: historical Subject CN, Subject DN, issuer, SAN, and issuance-burst analysis
- ct_dns_utils.py: DNS scanning and provider-signature logic
- ct_master_report.py: shorter consolidated report
- ct_monograph_report.py: publication-grade monograph with embedded appendices
- Makefile: reproducible operator workflow
Safety Against Silent Undercounts
The scanner checks the live raw identity-row count before it executes the capped query. If the configured cap is too low, it stops with an error instead of silently returning an incomplete corpus.
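The guard described above amounts to a count-before-query check. A minimal sketch follows, with hypothetical names (CandidateCapExceeded, enforce_candidate_cap); how the scanner obtains the raw count and what error it actually raises are not shown in this README.

```python
class CandidateCapExceeded(RuntimeError):
    """Raised when the live raw match count is larger than the configured cap."""


def enforce_candidate_cap(raw_match_count, max_candidates):
    """Fail loudly instead of letting a capped query truncate the corpus."""
    if raw_match_count > max_candidates:
        raise CandidateCapExceeded(
            f"raw match count {raw_match_count} exceeds cap {max_candidates}; "
            "raise the cap and rerun"
        )
    return raw_match_count
```

Stopping before the capped query runs means an incomplete corpus cannot be mistaken for a complete one; the fix is to rerun with a larger MAX_CANDIDATES.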
Public Repo Rules
- keep domains.local.txt local only
- never commit output/
- never commit .cache/
- if you need a sample config in git, update domains.example.txt, not domains.local.txt