---
name: offensive-osint
description: "Operational arsenal for external red-team and bug-bounty reconnaissance. Concrete wordlists (27 Swagger paths, 13 GraphQL paths, 44 high-risk ports, 6 missing-header findings, 15 always-on HTTP checks, 5 SAML paths, cloud bucket permutations, JS guess-paths, vendor product fingerprints for Citrix/F5/Pulse/Fortinet/Cisco/PaloAlto/VMware/Exchange, cloud-native service fingerprints, container/K8s exposure paths, CI/CD platform paths, documentation/wiki leak paths, WHOIS/RDAP, DNS record catalog, Wayback CDX recipes), 43+-pattern secret-regex catalog (incl. modern AI API keys: Anthropic/OpenAI/HuggingFace/Cloudflare/DigitalOcean/npm/PyPI/Docker Hub/Atlassian/DataDog/Sentry/ngrok), 80+ dork corpus across 9 categories, GitHub code-search dorks, copy-paste curl/httpie probes for every check, post-discovery enumeration workflows (AWS/GitHub/Slack/JWT/PMAK/Anthropic/OpenAI), endpoint interest scoring rubric (0–100), mobile app ownership confidence, identity-fabric endpoints (Entra/Okta/ADFS/Google/SAML/M365 Teams+SharePoint+OneDrive+OAuth + user-enum), GraphQL field-suggestion enumeration when introspection disabled, 9 read-only secret validators (Postman/AWS/GitHub/Slack/Anthropic/OpenAI/npm/Atlassian/DataDog), Postman workspace search (verified endpoint), Stack Exchange sweep, public SaaS dorks, email security analysis (SPF/DMARC/DKIM/BIMI/MTA-STS/DNSSEC), origin-discovery / CDN bypass techniques, TLS deep audit (sslyze/testssl.sh/JA3/JA4), reverse-DNS sweep + IPv6 enum, vulnerability prioritization data sources (NVD/EPSS/CISA KEV/ExploitDB/Metasploit), 27 attack-path hint templates, 80+ severity-matrix examples, LinkedIn employee enumeration, job posting tech-stack analysis, Slack/Discord workspace discovery, package registry leak hunting (npm/PyPI/Docker Hub/Quay/GHCR), sat imagery for physical recon, tooling quick-install one-liners, sector-specific recon notes (healthcare/finance/ICS-SCADA/IoT/government), runnable stdlib-only secret_scan.py helper, plus the existing tool references for username/email/phone/people/social/breach/infrastructure/crypto/media/geospatial/AI/archiving/automation. Use when you need concrete probe paths, regexes, payloads, scoring rules, curl one-liners, and tool URLs for an authorized external recon engagement."
version: 2.1.1
triggers:
  - external recon
  - external red team
  - red team external
  - attack surface management
  - ASM
  - bug bounty recon
  - bug bounty
  - reconnaissance
  - footprinting
  - asset discovery
  - swagger discovery
  - openapi discovery
  - graphql introspection
  - graphql discovery
  - subdomain enumeration
  - subdomain takeover
  - cloud bucket enumeration
  - bucket enum
  - S3 enum
  - GCS enum
  - Azure blob enum
  - identity fabric
  - SSO discovery
  - IdP fingerprinting
  - tenant fingerprinting
  - okta enum
  - entra enum
  - azure AD enum
  - ADFS enum
  - SAML metadata
  - mobile recon
  - APK analysis
  - mobile attack surface
  - secret scanning
  - secret leak
  - leaked credential
  - github dorking
  - google dorking
  - bing dorking
  - DDG dorking
  - postman workspace
  - stack exchange OSINT
  - breach lookup
  - have I been pwned
  - HudsonRock cavalier
  - infostealer
  - dehashed
  - intelx
  - shodan recon
  - censys recon
  - certificate transparency
  - crt.sh
  - JARM
  - favicon mmh3
  - JS endpoint extraction
  - sourcemap leak
  - copy paste probes
  - curl one-liner
  - email security analysis
  - SPF DMARC DKIM
  - origin discovery
  - CDN bypass
  - WAF bypass
  - vendor product fingerprints
  - Citrix Netscaler
  - F5 BIG-IP
  - Pulse Secure
  - FortiGate
  - PaloAlto GlobalProtect
  - Cisco AnyConnect
  - VMware vCenter
  - cloud native fingerprint
  - Lambda function URL
  - Cloud Run
  - kubernetes exposure
  - kubelet
  - etcd
  - CI CD exposure
  - Jenkins recon
  - GitLab self-hosted
  - GitHub Actions secrets
  - documentation leak
  - Notion public
  - Confluence anonymous
  - Trello board
  - WHOIS RDAP
  - DNS record catalog
  - Wayback CDX
  - LinkedIn enumeration
  - job posting tech stack
  - Slack workspace discovery
  - Discord server discovery
  - npm token leak
  - PyPI token leak
  - Docker Hub leak
  - sat imagery physical recon
  - TLS deep audit
  - JA3 JA4
  - reverse DNS sweep
  - IPv6 enumeration
  - CVE prioritization
  - EPSS scoring
  - CISA KEV
  - vulnerability prioritization
  - tooling install
  - sector specific recon
  - healthcare DICOM
  - finance SWIFT
  - ICS SCADA
  - Modbus
  - BACnet
  - post discovery workflow
  - JWT triage
  - AWS key triage
  - GraphQL field suggestion
  - Anthropic API key
  - OpenAI API key
  - Microsoft 365 deep
  - Teams federation
  - SharePoint enum
  - OneDrive enum
  - hackerone reference
  - h1 hacktivity
  - disclosed reports
  - community bug reports
  - prior disclosures
  - bug bounty reference
---

# Offensive OSINT — External Red-Team Arsenal

> Companion skill: `osint-methodology` (the "how to think" skill). This skill is the "what to reach for." Use them together.

## 0. When to use / When NOT

**Use this skill when:**

- You need concrete probe paths, wordlists, regexes, payloads, scoring rules, or tool URLs.
- You're executing reconnaissance and need the actual technical reference (vs. methodology).
- You're building a recon automation and need specific lists to seed it.

**Do NOT use this skill when:**

- The user is asking for active exploitation, post-exploitation, or anything past reconnaissance.
- The user is asking for defensive / blue-team detections.
- The target's authorization isn't established — see §1.

---

## 1. Authorization & Legal Posture

For assets the operator owns or has written authorization to assess. Run a soft scope check before acting against an unverified third-party target — see methodology skill §1 for the full posture.

---

## 2. Confidence Levels

- **TENTATIVE** — plausible based on indirect evidence (snippet-only dork match, single-source asset, inferred email pattern).
- **FIRM** — directly observed (subdomain resolves, HEAD-confirmed bucket exists, banner returned).
- **CONFIRMED** — verified via independent corroboration OR direct verification (live PMAK validation, multiple sources agree, listable bucket with object retrieval).

---

## 3. Output Format Conventions

Findings should carry: `id`, `module`, `asset_key`, `category`, `severity` (info/low/medium/high/critical), `confidence`, `title`, `description`, `evidence` (url + UTC timestamp + sha256 + raw ≤ 2 KiB), `references`, `remediation`. UTC timestamps everywhere.

---

## 4. Source Hygiene & Citations

URL + UTC timestamp + SHA-256 + tool version + run_id for every artifact. PNG screenshots, JSONL run logs, raw HTTP captures capped at 2 KiB body.

---

## 5. Do NOT

- Don't paste creds/PII/session tokens into cloud LLMs.
- Don't run destructive probes outside DEEP/`--aggressive`.
- Don't use validated credentials for anything except a read-only liveness check.
- Don't attribute from a single source.
- Don't assume vendor labels are ground truth.

---

## 6. General OSINT (curated tool refs)

- [OSINT Bookmarks](https://tools.myosint.training/) — comprehensive bookmarks.
- [OSINT Framework](https://osintframework.com/) — tool/resource directory.
- [IntelTechniques Tools](https://inteltechniques.com/tools/) — investigative suite.
- [Bellingcat Toolkit](https://www.bellingcat.com/resources/2024/09/24/bellingcat-online-investigations-toolkit/) — investigative journalism.
- [CyberSudo OSINT Toolkit](https://docs.google.com/spreadsheets/d/1EC0sKA_W9znzsxUt0wye9UYtyATXw5m8) — OSINT websites list.
- [Google Dorks](https://dorksearch.com/) — efficient Google searching.
- [Distributed Denial of Secrets](https://ddosecrets.com/) — leaked datasets.
- [Country-Specific Resources](https://digitaldigging.org/osint/) — country-targeted OSINT.

## 7. Search Engines

| Tool | Notes |
|------|-------|
| [Carrot2](https://search.carrot2.org/#/search/web) | Clusters results by topic |
| [etools](https://www.etools.ch/) | Metasearch |
| [Kagi](https://kagi.com/) | Privacy-first, non-personalized |
| [Brave Search](https://search.brave.com/) | Independent index; Goggles for custom ranking |
| [PDF Search](https://www.pdfsearch.io/) | PDF + table of contents |
| [Google Fact Check Explorer](https://toolbox.google.com/factcheck/explorer) | Cross-site fact-check |

---

## 8. Username & Email Investigation

| Tool | Purpose |
|------|---------|
| [Sherlock](https://github.com/sherlock-project/sherlock) | Username search across social networks |
| [Maigret](https://github.com/soxoj/maigret) | Profile collector by username |
| [What's My Name](https://whatsmyname.app/) | Username search |
| [Holehe](https://github.com/megadose/holehe) | Email registration check |
| [Epieos](https://epieos.com/) | Email pivots and metadata |
| [OSINT Industries](https://osint.industries/) | Email/username/phone lookups |
| [Hunter.io](https://hunter.io/) | Domain → emails |
| [EmailRep](https://emailrep.io/) | Email reputation |
| [Emailable](https://emailable.com/) | Email verification |
| [Mugetsu](https://mugetsu.io/) | X/Twitter username history |
| [RocketReach](https://rocketreach.co/) / [Apollo](https://www.apollo.io/) | Email enrichment + pattern guessing |
| [PhoneInfoga](https://github.com/sundowndev/phoneinfoga) | Phone number intelligence |

Browser extensions: [GetProspect](https://chromewebstore.google.com/detail/email-finder-getprospect/bhbcbkonalnjkflmdkdodieehnmmeknp), [SignalHire](https://chrome.google.com/webstore/detail/signalhire-find-email-or/aeidadjdhppdffggfgjpanbafaedankd).

---

## 9. People Search

- [TruePeopleSearch](https://www.truepeoplesearch.com/) — free U.S. people search.
- [WhitePages](https://www.whitepages.com/), [Spokeo](https://www.spokeo.com/), [Webmii](https://webmii.com/), [Pipl](https://pipl.com/) (paid).
- [Clearbit](https://clearbit.com/) — company/individual data enrichment.
- [FaceCheck](https://facecheck.id/) / [FaceSeek](https://faceseek.online/) — reverse face search.

---

## 10. Phone Number OSINT

- [TrueCaller](https://www.truecaller.com/) — caller ID + spam blocking.
- [ThatsThem](https://thatsthem.com/) — reverse phone search.
- [Infobel](https://infobel.com/) — non-USA phone search.
- [FreeCarrierLookup](https://freecarrierlookup.com/) — carrier/type (US).
- [NumlookupAPI](https://numlookupapi.com/) [Freemium] — programmatic carrier checks.
- [CallerIDTest](https://calleridtest.com/), [Advanced Background Checks](https://www.advancedbackgroundchecks.com/).

---

## 11. Email-Pattern Inference (TENTATIVE candidates)

Given a `(first_name, last_name, domain)`, generate these 8 candidate addresses for breach pre-hits, phishing list curation, and downstream enrichment. Mark as **TENTATIVE** confidence until corroborated.

```
{first}.{last}@{domain}        # john.doe@example.com
{first}{last}@{domain}         # johndoe@example.com
{first}@{domain}               # john@example.com
{first[0]}{last}@{domain}      # jdoe@example.com
{first}.{last[0]}@{domain}     # john.d@example.com
{last}@{domain}                # doe@example.com
{first}_{last}@{domain}        # john_doe@example.com
{first}-{last}@{domain}        # john-doe@example.com
```

Lowercase before lookup. Strip diacritics for ASCII fallback. If the org uses a known pattern (e.g., Hunter.io shows `{first}.{last}` is dominant), prioritize that one and mark FIRM.

---

## 12. Email-Harvest Source Stack

Six parallel sources, dedup at the end:

1. **IntelX phonebook API** — 2-step search + poll. Largest single source for breach-era addresses.
2. **Hunter.io** — domain-search endpoint. ~25 free/month. Returns verified emails + roles.
3. **crt.sh** — extract X.509 SAN extensions. Many certs include admin/contact emails.
4. **DuckDuckGo SERP scrape** — HTML scrape of `"@{target-domain}"` results.
5. **Bing SERP scrape** — same query, complementary index.
6. **Wayback CDX** — historic snapshots of the target's homepage / contact / about pages often contain emails removed from the live site.

**Email regex:**

```regex
\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b
```

**Noise filter (reject numeric-only locals):**

```regex
^[0-9]+$
```

(Discards garbage like `12345@example.com` from random tokens.)

---

## 13. Social Media

| Platform | Tool |
|----------|------|
| Instagram | [Picuki](https://www.picuki.com/) — profile view without account |
| X/Twitter | [snscrape](https://github.com/snscrape/snscrape) — preferred CLI scraper; Twint as fallback |
| Facebook | [Graph Search](https://inteltechniques.com/tools/Facebook.html), [sowsearch.info](https://sowsearch.info/), [lookup-id.com](https://lookup-id.com/), [whopostedwhat.com](https://whopostedwhat.com/) |
| Facebook (research) | [Meta Content Library](https://transparency.meta.com/researcher) — CrowdTangle successor (researcher-gated) |
| YouTube/Twitch | [Social Blade](https://socialblade.com/) — analytics |
| TikTok | [Tokboard](https://tokboard.com/) — trends + profile analytics |
| Reddit | [Reveddit](https://www.reveddit.com/) — removed content; [RedTrack.social](https://redtrack.social/) — user history |
| Bluesky | [Firesky](https://firesky.tv/) — real-time firehose; [SkyView](https://bsky.jazco.dev/) — follower graphs |
| Mastodon | [FediSearch](https://fedisearch.skorpil.cz/) — cross-instance search; [Fedifinder](https://fedifinder.glitch.me/) — find Twitter users on Mastodon |
| Faces | [Search4Faces](https://search4faces.com/) |

---

## 14. Public Records & Company Information

- [OpenCorporates](https://opencorporates.com/) — world's largest open company DB.
- [SEC EDGAR](https://www.sec.gov/edgar.shtml) — U.S. company filings.
- [OpenOwnership Register](https://register.openownership.org/) — beneficial ownership.
- [MuckRock](https://www.muckrock.com/) — FOIA repository + request tracking.
- [EU Tenders (TED)](https://ted.europa.eu/) — EU procurement notices.
- [World Bank Projects](https://projects.worldbank.org/) — project + procurement records.
- [UK Companies House](https://find-and-update.company-information.service.gov.uk/) — UK companies + officers + filings.

### 14.1 RU registries

[Rusprofile](https://www.rusprofile.ru/), [Kontur.Focus](https://focus.kontur.ru/) (freemium), [zakupki.gov.ru](https://zakupki.gov.ru/) (procurement), EGRUL/EGRIP (official, captcha-gated).

### 14.2 CN registries + USCC + ICP

- **GSXT** — [gsxt.gov.cn](https://www.gsxt.gov.cn/) National Enterprise Credit Info; cross-check with Tianyancha / Qichacha.
- **USCC (Unified Social Credit Code)** — 18-character entity ID assigned to all CN legal entities. Format: ``. Useful for joining GSXT records to ICP filings.
- **ICP Beian** — [beian.miit.gov.cn](https://beian.miit.gov.cn/) — every domain serving traffic in mainland CN must register an ICP filing; the filing links the domain to a USCC, which links to the legal entity in GSXT.
- Workflow: `target.cn` domain → ICP lookup → USCC → GSXT → entity name + officers + adjacent registered entities.

### 14.3 Sanctions & Compliance

- [OFAC SDN List](https://sanctionssearch.ofac.treas.gov/), [EU Sanctions Map](https://www.sanctionsmap.eu/).
- [OpenSanctions](https://www.opensanctions.org/) — aggregated.
- [OCCRP Aleph](https://aleph.occrp.org/) — investigative documents, leaks, company records.

---

## 15. Breach & Leak Data

- [Have I Been Pwned](https://haveibeenpwned.com/) — breach lookup; Pwned Passwords API (k-anonymity).
- [Dehashed](https://dehashed.com/) — credential search (paid).
- [IntelX](https://intelx.io/) — data intelligence.
- [LeakCheck](https://leakcheck.io/), [Snusbase](https://snusbase.com/), [BreachDirectory](https://breachdirectory.org/), [Scattered Secrets](https://scatteredsecrets.com/), [Phonebook](https://phonebook.cz/), [LeakPeek](https://leakpeek.com/).
- [Cavalier (Hudson Rock)](https://cavalier.hudsonrock.com/) — **infostealer log lookups; FREE; highest single-source ROI for finding compromised employee credentials in corporate SSO**.

### 15.0.1 HudsonRock Cavalier — direct API recipe

The web UI wraps a **public, unauthenticated JSON API**. Hit it directly:

```bash
# By domain (canonical first call)
curl -sk -m 30 "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-domain?domain=target.com" | jq .

# By email (single-account check)
curl -sk -m 30 "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-email?email=alice@target.com" | jq .

# By URL (when target's app is the breach victim)
curl -sk -m 30 "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-url?url=https://app.target.com" | jq .
```

PowerShell:

```powershell
# $D holds the in-scope target domain (placeholder value shown)
$D = "target.com"
$hr = Invoke-RestMethod -Uri "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-domain?domain=$D" -TimeoutSec 30
"Employees: $($hr.employees) | Users: $($hr.users) | Third-party: $($hr.third_parties) | Total: $($hr.total)"
$hr.data.employees_urls | Sort-Object -Property occurrence -Descending | Select-Object -First 20
$hr.data.clients_urls | Sort-Object -Property occurrence -Descending | Select-Object -First 15
```

**Top-level JSON fields:**

- `total` — total stealer entries touching this domain.
- `totalStealers` — global stealer-log corpus size (context only).
- `employees` — count of `<*>@{domain}` accounts found.
- `users` — count of accounts where the domain appeared as a *visited* URL (customers/vendors).
- `third_parties` — accounts touching adjacent domains in the org.
- `data.employees_urls[]` — `{occurrence, type, url}` — internal apps where employees were logging in when stolen.
  **Subdomain hits here = recon gold.**
- `data.clients_urls[]` — same shape; user-facing apps (often reveals undocumented public portals).
- `data.stealer_families[]` — `{_key, _value}` → which stealer (RedLine / Lumma / StealC / Vidar / Raccoon).
- `data.dates_compromised[]` — `{_key, _value}` → temporal distribution.

**Free-tier caveats (CRITICAL to know):**

- Subdomain hostnames in `data.*_urls[]` past the first few are **redacted with asterisks** (`*****.target.com`). Pivot to paid Cavalier tier or other sources for unredacted.
- Free endpoint returns counts + sample URLs only. Cleartext passwords + emails are **never** in the free response.
- Rate limit ~1 req/sec/IP; 429 on burst. Sleep 1s between calls.
- For unredacted creds + bulk enumeration → paid Cavalier portal.

**Severity mapping (per §15.1 + §15.2):** `employees ≥ 10` → CRITICAL, **regardless of whether the breached service is still online** (legacy Lotus Domino / on-prem mail decommissioned + cloud SSO migration → employees almost always reuse passwords → SSO_EXPOSURE escalates CRITICAL).

### 15.1 Domain-Level Breach Severity Mapping

When you query a breach corpus by domain, map the result to severity like so:

| Stat | Severity |
|---|---|
| ≥ 10 employees compromised | **CRITICAL** |
| 1–9 employees compromised | **HIGH** |
| ≥ 1 end-user (non-employee) compromised | **MEDIUM** |
| Domain seen in breach with 0 named accounts | **INFO** |

**Employees vs end-users distinction:** an employee account is `{user}@{domain}` (the breach victim is the target's own staff). An end-user account is the target's customer who reused a password — useful for credential-stuffing risk awareness but not directly compromising the target's identity fabric.

### 15.2 SSO_EXPOSURE finding

When a discovered SSO tenant (Entra GUID / Okta slug / Google Workspace domain) intersects with the breach corpus on its domain → `SSO_EXPOSURE` finding, severity **CRITICAL**.
Evidence: tenant ID + product + employee count + per-account source attribution.

**Legacy-mail-decommissioned pattern (high-value variant):** If `mail.{domain}` / `webmail.{domain}` returns **NXDOMAIN today** but HudsonRock/HIBP corpus still has historical employee credentials against it AND `autodiscover.{domain}` resolves to Microsoft IPs (M365) or `aspmx.l.google.com` MX (Workspace), the org migrated from on-prem to cloud — and the stolen passwords almost certainly survived the migration via password reuse. **Escalate to CRITICAL `SSO_EXPOSURE`** even when the legacy host is dead.

Concrete triggers (all three together):

1. `Resolve-DnsName mail.{domain} -Type A` → NXDOMAIN (legacy gone)
2. HudsonRock corpus has employee URLs against the *old* host (e.g. `mail.{domain}/names.nsf` for Lotus Domino, `mail.{domain}/owa/` for Exchange, `mail.{domain}/iwaredir.nsf` for iNotes, `mail.{domain}/zimbra/` for Zimbra)
3. Current MX → M365 / Google Workspace / Zoho cloud (DNS confirms migration)

Evidence pack: tenant GUID + breach count + 3+ legacy URLs from corpus + autodiscover Microsoft IPs + current MX. Recommend forced password rotation + MFA audit + Conditional Access review.

---

## 16. Pre-built Wordlists & Probe Paths

Copy-pasteable arsenals, severity-annotated where relevant.

### 16.1 Swagger / OpenAPI discovery — 27 paths

Probe each path on every alive webapp. GET (or HEAD if rate-limited).

```
swagger.json
swagger.yaml
swagger/v1/swagger.json
swagger/v2/swagger.json
swagger-ui.html
swagger-ui/
swagger-resources
api-docs
api-docs.json
api/swagger
api/swagger.json
api/swagger-ui.html
api/v1/swagger.json
api/v2/swagger.json
api/v3/api-docs
v2/api-docs
v3/api-docs
openapi.json
openapi.yaml
openapi/v1
openapi/v3
docs
redoc
rapidoc
api/docs
api/documentation
.well-known/openapi
```

**Severity:**

- Reachable Swagger/OpenAPI spec without auth → **HIGH** `LEAKY_API_SPEC` (full endpoint enumeration leaks; often reveals undocumented internal APIs).
- Behind auth but accessible to any authenticated user → MEDIUM (still discloses internal API surface).

### 16.2 GraphQL discovery — 13 paths

```
graphql
graphiql
api/graphql
v1/graphql
v2/graphql
query
api/query
gql
altair
playground
subscriptions
graphql/console
api/v1/graphql
```

**Standard introspection POST body:**

```json
{
  "operationName": "IntrospectionQuery",
  "query": "query IntrospectionQuery { __schema { types { name kind fields { name type { name kind } } } queryType { name } mutationType { name } subscriptionType { name } } }"
}
```

**Severity:**

- Introspection returns schema without auth → **HIGH** `OPEN_GRAPHQL_API`.
- Field-suggestion enumeration possible (server returns "did you mean" for typo'd field names) → **MEDIUM** (re-derive partial schema even when introspection is disabled).
- `/graphql` accepts batched queries (`[...]` request body) → MEDIUM (rate-limit bypass surface; auth bypass via mixed batches).

UI markers (lower severity but still discoverable):

- HTML response contains `graphiql`, `playground`, `apollo studio`, `altair` → GraphiQL UI exposed (often shipped accidentally on prod).

### 16.3 High-risk ports — 44 services

For each open port, emit a finding with the severity and "why an attacker cares" below. Source for the open-port observation: Shodan InternetDB (free, 1 req/sec) is the recommended starting point.

| Port | Service | Severity | Why it matters |
|---|---|---|---|
| 21 | FTP | HIGH | Anonymous read often enabled; cleartext creds. |
| 22 | SSH | LOW | Banner discloses version; brute-force surface. |
| 23 | Telnet | HIGH | Cleartext protocol; should never be exposed. |
| 25 | SMTP | LOW | Open relay risk; version banner. |
| 53 | DNS | LOW | Recursion = DDoS amplifier; AXFR opportunism. |
| 80 | HTTP | INFO | Standard. |
| 110 | POP3 | LOW | Cleartext if no STARTTLS. |
| 111 | rpcbind | MEDIUM | NFS exports enumeration. |
| 135 | MS RPC | HIGH | Enum via Impacket. |
| 139 | NetBIOS-SSN | HIGH | File/printer enum. |
| 143 | IMAP | LOW | Cleartext if no STARTTLS. |
| 161 | SNMP | HIGH | Community strings often `public`/`private`; full device enum. |
| 389 | LDAP | HIGH | Anonymous bind = full directory dump. |
| 443 | HTTPS | INFO | Standard. |
| 445 | SMB | **CRITICAL** | EternalBlue, SMB relay, anonymous shares. |
| 465 | SMTPS | LOW | Banner. |
| 514 | rsyslog | MEDIUM | Log injection / DoS. |
| 587 | SMTP-MSA | LOW | Banner. |
| 631 | IPP/CUPS | MEDIUM | Print server enum / RCE in old CUPS. |
| 873 | rsync | HIGH | Modules often listable; backup data exposure. |
| 1433 | MSSQL | HIGH | Brute-force; xp_cmdshell. |
| 1521 | Oracle TNS | HIGH | Brute-force; SID enum. |
| 2049 | NFS | HIGH | World-readable exports. |
| 2375 | Docker API (unencrypted) | **CRITICAL** | Unauthenticated container/host takeover. |
| 2376 | Docker API (TLS) | HIGH | Cert validation bypass risk. |
| 3000 | Common dev / Grafana | MEDIUM | Often Grafana / Express dev with default creds. |
| 3306 | MySQL | HIGH | Brute-force; default `root:""`. |
| 3389 | RDP | **CRITICAL** | BlueKeep / DejaBlue / NLA bypass. |
| 5432 | PostgreSQL | HIGH | Brute-force; default `postgres:postgres`. |
| 5601 | Kibana | HIGH | Often unauthenticated; Elasticsearch pivot. |
| 5900 | VNC | HIGH | Often unauthenticated or weak password. |
| 5984 | CouchDB | HIGH | Default no auth; admin party. |
| 6379 | Redis | **CRITICAL** | No auth default; write `authorized_keys` for SSH. |
| 7001 | WebLogic | HIGH | Frequent CVEs (CVE-2020-14882, etc.). |
| 8000 | Common dev | MEDIUM | Django, common dev servers. |
| 8080 | HTTP-alt | MEDIUM | Tomcat, Jenkins, common proxy. |
| 8443 | HTTPS-alt | MEDIUM | Same as 8080. |
| 8888 | Common dev / Jupyter | HIGH | Jupyter often exposes interactive shell. |
| 9090 | Cockpit / Prometheus | HIGH | Server admin UI / metrics scraping. |
| 9200 | Elasticsearch | **CRITICAL** | Typically no auth. |
| 9300 | Elasticsearch transport | HIGH | Cluster join + RCE. |
| 11211 | memcached | MEDIUM | UDP DDoS amp; data dump. |
| 27017 | MongoDB | **CRITICAL** | No auth by default. |
| 50070 | Hadoop NameNode | HIGH | HDFS browse. |

When Shodan InternetDB returns `vulns[]` for a port, escalate the finding severity by one tier and include the CVE list in evidence.

### 16.4 Missing security headers — 6 findings

For every alive webapp, audit response headers. Each missing header below = one finding.

| Header | Severity (default) | Severity (sensitive path) | Notes |
|---|---|---|---|
| `Strict-Transport-Security` | MEDIUM | **HIGH** | Sensitive paths: `/login`, `/signin`, `/sso`, `/admin`, `/auth`. |
| `Content-Security-Policy` | MEDIUM | MEDIUM | XSS impact mitigation gone. |
| `X-Frame-Options` | LOW | LOW | Clickjacking. (CSP `frame-ancestors` is the modern replacement.) |
| `X-Content-Type-Options` | LOW | LOW | MIME-sniff XSS. |
| `Referrer-Policy` | INFO | INFO | Outbound link leakage. |
| `Permissions-Policy` | INFO | INFO | Feature-policy hardening. |

### 16.5 Always-on HTTP checks — 15 paths

Run these against every alive webapp regardless of Nuclei availability. Cheap; high signal.
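
The match-logic column for this section exists to filter out soft-404 pages that return 200 for every path. A minimal offline sketch of that idea, applied to a response that has already been fetched, might look like the following. The `MATCHERS` table and `classify` function are hypothetical names, only four of the rows are covered, and the choice that any single listed marker counts as a match is an assumption rather than something this document specifies.

```python
import re

# Hypothetical matcher table covering four rows of this section.
# Assumption: where a row lists several markers, any one of them is a match.
MATCHERS = {
    "/.git/config": (
        "Exposed .git repo", "CRITICAL",
        lambda body: any(m in body for m in (b"[core]", b"[remote", b"repositoryformatversion")),
    ),
    "/.git/HEAD": (
        "Exposed .git/HEAD", "HIGH",
        lambda body: re.match(rb"ref:\s", body) is not None,
    ),
    "/.env": (
        "Exposed .env", "CRITICAL",
        lambda body: re.search(rb"^\s*[A-Z_][A-Z0-9_]*\s*=", body, re.MULTILINE) is not None,
    ),
    "/.DS_Store": (
        "Exposed .DS_Store", "LOW",
        lambda body: body.startswith(b"\x00\x00\x00\x01Bud1"),
    ),
}


def classify(path, status, body):
    """Return a finding dict if an already-fetched response matches, else None."""
    entry = MATCHERS.get(path)
    if entry is None or status != 200:
        return None
    title, severity, matches = entry
    if not matches(body):
        return None  # likely a soft-404 or an unrelated page served at this path
    return {"path": path, "title": title, "severity": severity}
```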

| Path | Finding | Severity | Match logic |
|---|---|---|---|
| `/.git/config` | Exposed `.git` repo | **CRITICAL** | Body contains `[core]`, `[remote`, `repositoryformatversion` |
| `/.git/HEAD` | Exposed `.git/HEAD` | HIGH | Body matches `^ref:\s` |
| `/.env` | Exposed `.env` | **CRITICAL** | Multiline regex `^\s*[A-Z_][A-Z0-9_]*\s*=` |
| `/server-status` | Apache server-status | MEDIUM | Body contains `Apache Server Status` or matching title |
| `/server-info` | Apache mod_info | MEDIUM | Body contains `Apache Server Information` |
| `/.DS_Store` | Exposed `.DS_Store` | LOW | Byte signature `\x00\x00\x00\x01Bud1` |
| `/phpinfo.php` | phpinfo() leak | HIGH | Body contains `phpinfo()`, `PHP Version`, or matching title |
| `/info.php` | phpinfo() (alt path) | HIGH | Same as above |
| `/actuator/env` | Spring Boot `/actuator/env` | **CRITICAL** | Body contains `"propertySources"`, `systemProperties`, `systemEnvironment` |
| `/actuator/heapdump` | Spring Boot heapdump | **CRITICAL** | HPROF magic bytes / large binary download |
| `/_cat/indices` | Elasticsearch open | HIGH | Returns index list |
| `/console` | Jenkins script console | HIGH | Body contains `Jenkins`/`Script Console` |
| `/manager/html` | Tomcat Manager | HIGH | Body contains `Tomcat Web Application Manager` |
| `/wp-admin/install.php` | Orphaned WP install | LOW | Body contains `WordPress Installation` |
| `/.well-known/security.txt` | Disclosure policy info | INFO | Parse contact + policy fields |

Plus parse `/robots.txt` for `Disallow:` paths — those become the next-tier wordlist for that target.

### 16.6 SAML metadata — 5 paths

```
/saml/metadata
/FederationMetadata/2007-06/FederationMetadata.xml
/federationmetadata/2007-06/federationmetadata.xml
/simplesaml/saml2/idp/metadata.php
/auth/saml2/metadata
```

Reachable SAML metadata XML reveals: `EntityID`, signing certs (often pinned → cert-reuse pivot), `SingleSignOnService` URL, `NameIDFormat`.
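
Those fields can be pulled out of a retrieved metadata document with the standard library alone. The sketch below is illustrative: `summarize_saml_metadata` is a hypothetical helper name, it assumes the root element is a single SAML 2.0 `EntityDescriptor` (an `EntitiesDescriptor` wrapper would yield `entity_id` of `None`), and `xml.etree.ElementTree` is not hardened against hostile XML, so a library such as `defusedxml` may be a safer choice for untrusted input.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for SAML 2.0 metadata and XML-DSig.
MD = "urn:oasis:names:tc:SAML:2.0:metadata"
DS = "http://www.w3.org/2000/09/xmldsig#"


def summarize_saml_metadata(xml_text):
    """Extract the recon-relevant fields from a SAML metadata document.

    Hypothetical helper: assumes the root element is an EntityDescriptor.
    """
    root = ET.fromstring(xml_text)
    return {
        # entityID is an attribute on the root EntityDescriptor element
        "entity_id": root.get("entityID"),
        # every SingleSignOnService endpoint, regardless of binding
        "sso_urls": [e.get("Location") for e in root.iter(f"{{{MD}}}SingleSignOnService")],
        "name_id_formats": [(e.text or "").strip() for e in root.iter(f"{{{MD}}}NameIDFormat")],
        # number of embedded X.509 certificates (signing and/or encryption)
        "cert_count": sum(1 for _ in root.iter(f"{{{DS}}}X509Certificate")),
    }


# Made-up sample document for illustration only.
SAMPLE = (
    '<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" '
    'xmlns:ds="http://www.w3.org/2000/09/xmldsig#" '
    'entityID="https://idp.example.com/metadata">'
    '<md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">'
    '<md:KeyDescriptor use="signing"><ds:KeyInfo><ds:X509Data>'
    '<ds:X509Certificate>MIIB</ds:X509Certificate>'
    '</ds:X509Data></ds:KeyInfo></md:KeyDescriptor>'
    '<md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</md:NameIDFormat>'
    '<md:SingleSignOnService '
    'Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" '
    'Location="https://idp.example.com/sso"/>'
    '</md:IDPSSODescriptor></md:EntityDescriptor>'
)

summary = summarize_saml_metadata(SAMPLE)
```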
Mark as `MISCONFIG` (LOW severity unless metadata leaks internal hostnames or non-public certs, then MEDIUM).

### 16.7 SSO subdomain prefixes — 8 prefixes

Probe each against root domain + every sibling brand domain:

```
auth.{domain}
login.{domain}
sso.{domain}
idp.{domain}
iam.{domain}
identity.{domain}
accounts.{domain}
oauth.{domain}
```

Plus probe `/.well-known/openid-configuration` on every alive subdomain (regardless of prefix).

### 16.8 Cloud bucket permutation arsenal

**6 prefixes:**

```
""        # bare candidate
backup-
assets-
static-
dev-
prod-
```

**15 suffixes:**

```
""        # bare candidate
-backup
-assets
-static
-media
-data
-uploads
-dev
-prod
-staging
-logs
-private
-public
-dump
-archive
```

**47 generic stems** (filter unless combined with target-identifying token):

```
www, mail, email, app, apps, web, webmail, ftp, cdn, static, assets, media,
img, images, videos, download, downloads, upload, uploads, data, files, docs,
support, help, kb, blog, news, dev, test, staging, stg, qa, uat, sandbox,
preprod, preview, vpn, mx, smtp, imap, pop, dns, ns, ns1, ns2, mx1, mx2
```

**Provider URL templates:**

S3:

```
https://{candidate}.s3.amazonaws.com/
https://{candidate}.s3-{region}.amazonaws.com/   # try us-east-1, us-west-2, eu-west-1, ap-southeast-1 first
https://s3.{region}.amazonaws.com/{candidate}/
```

GCS:

```
https://{candidate}.storage.googleapis.com/
https://storage.googleapis.com/{candidate}/
```

Azure Blob:

```
https://{candidate}.blob.core.windows.net/
```

**Probe technique:** HEAD first → 200/301 = exists, 403 = exists private, 404 = skip. On exists, GET root → if XML/JSON object listing returns, **CRITICAL** `PUBLIC_CLOUD_BUCKET`. Direct-URL object reads but not listable → **HIGH** `PUBLIC_CLOUD_BUCKET_OBJECT_READ`.

### 16.9 JS guess-paths for endpoint discovery

Probe these paths on every alive webapp (in addition to scraped `