IOC Extraction
TI Mindmap HUB extracts Indicators of Compromise (IOCs) from each processed report using a combination of regex pattern matching and LLM-based analysis.
Supported IOC Types
| Type | Method | Validation |
|---|---|---|
| IPv4 / IPv6 | Regex + LLM | Format validation, private range exclusion |
| Domains | Regex + LLM | TLD validation, whitelist filtering |
| URLs | Regex + LLM | Format validation |
| File Hashes (MD5, SHA-1, SHA-256) | Regex | Length and character validation |
| CVE IDs | Regex | Format validation (CVE-YYYY-NNNN, four or more sequence digits) |
| Email Addresses | Regex + LLM | Format validation |
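
As a rough illustration of the regex and validation columns above, the sketch below pairs one pattern per type with a format check. The pattern set, `is_valid_public_ipv4`, and `extract_candidates` are hypothetical names chosen for this example, not the patterns TI Mindmap HUB actually uses; the private range exclusion is approximated with the standard library's `ipaddress` module.

```python
import ipaddress
import re

# Hypothetical patterns, one per regex-extracted type (assumption, not the real rule set).
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def is_valid_public_ipv4(value: str) -> bool:
    """Format validation plus private range exclusion."""
    try:
        addr = ipaddress.IPv4Address(value)
    except ipaddress.AddressValueError:
        return False  # e.g. an octet above 255
    # is_private covers RFC 1918 and other special-purpose ranges
    # that the ipaddress module flags as private.
    return not addr.is_private


def extract_candidates(text: str) -> list[tuple[str, str]]:
    """Return (type, value) pairs that match a pattern and pass validation."""
    found = []
    for ioc_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            if ioc_type == "ipv4" and not is_valid_public_ipv4(value):
                continue
            found.append((ioc_type, value))
    return found
```

The hash patterns rely on word boundaries and exact lengths, so a 64-character SHA-256 value is not also reported as an MD5 or SHA-1 match.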
Extraction Pipeline
- Pattern matching identifies candidate indicators using regex
- LLM analysis provides context-aware extraction for ambiguous cases
- Validation checks format correctness and filters known false positives
- Deduplication removes duplicate indicators within the same report
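The four stages above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: every function name is hypothetical, the patterns cover only CVE IDs and domains, and the LLM stage is a placeholder callable (`no_llm`) because the real model call is not described here.

```python
import re

Indicator = tuple[str, str]  # (type, value)

# Hypothetical patterns for two indicator types only.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def match_patterns(text: str) -> list[Indicator]:
    """Stage 1: collect regex candidates."""
    hits = [("cve", m.group(0)) for m in CVE_RE.finditer(text)]
    hits += [("domain", m.group(0)) for m in DOMAIN_RE.finditer(text)]
    return hits


def no_llm(text: str) -> list[Indicator]:
    """Stage 2 placeholder: a real LLM call would return extra indicators."""
    return []


def validate(indicator: Indicator) -> bool:
    """Stage 3: drop known false positives (tiny illustrative list)."""
    ioc_type, value = indicator
    if ioc_type == "domain":
        return value.lower() not in {"google.com", "microsoft.com"}
    return True


def deduplicate(indicators: list[Indicator]) -> list[Indicator]:
    """Stage 4: keep the first occurrence, comparing values case-insensitively."""
    seen = set()
    unique = []
    for ioc_type, value in indicators:
        key = (ioc_type, value.lower())
        if key not in seen:
            seen.add(key)
            unique.append((ioc_type, value))
    return unique


def run_pipeline(text: str, llm_extract=no_llm) -> list[Indicator]:
    candidates = match_patterns(text) + llm_extract(text)
    return deduplicate([c for c in candidates if validate(c)])
```

Passing the LLM stage in as a callable keeps the regex, validation, and deduplication stages testable without a model.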
Whitelisting
Common benign indicators are excluded automatically:
- Well-known domains (e.g., google.com, microsoft.com)
- Cloud provider infrastructure ranges
- Known false-positive patterns
- RFC 5737 documentation IP ranges (192.0.2.x, 198.51.100.x, 203.0.113.x)
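A whitelist check along these lines might look like the sketch below. The domain set and function names are assumptions for illustration; only the three RFC 5737 networks are taken from the list above. Treating subdomains of a listed domain as whitelisted is also an assumption, not documented behavior.

```python
import ipaddress

# Illustrative entries only; the real whitelist is not reproduced here.
BENIGN_DOMAINS = {"google.com", "microsoft.com"}

# RFC 5737 documentation ranges (TEST-NET-1, TEST-NET-2, TEST-NET-3).
DOCUMENTATION_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_whitelisted_domain(domain: str) -> bool:
    """True for a listed domain or any of its subdomains (assumed policy)."""
    name = domain.lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in BENIGN_DOMAINS)


def is_whitelisted_ip(value: str) -> bool:
    """True when the address falls inside a documentation range."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False  # not an IP address at all
    return any(addr in net for net in DOCUMENTATION_NETS)
```

Matching on a leading dot (`"." + d`) avoids whitelisting lookalikes such as `notgoogle.com`.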
Example Output
{
  "iocs": [
    {
      "type": "ipv4",
      "value": "198.51.100.42",
      "context": "Command and control server"
    },
    {
      "type": "domain",
      "value": "malicious-example-domain.com",
      "context": "Phishing infrastructure"
    },
    {
      "type": "sha256",
      "value": "a1b2c3d4e5f67890abcdef1234567890a1b2c3d4e5f67890abcdef1234567890",
      "context": "Malware payload hash"
    }
  ]
}
Note: All IOCs shown in this documentation use sanitized or RFC-reserved values.
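For downstream use, output in the shape shown above can be grouped by indicator type with a few lines of Python. The sketch assumes only the `iocs`, `type`, and `value` keys from the example; `group_by_type` is a hypothetical helper, not part of the product.

```python
import json
from collections import defaultdict


def group_by_type(raw: str) -> dict[str, list[str]]:
    """Map each IOC type to the list of values found in the JSON output."""
    grouped = defaultdict(list)
    for entry in json.loads(raw)["iocs"]:
        grouped[entry["type"]].append(entry["value"])
    return dict(grouped)
```

Grouping by type makes it straightforward to route IP addresses, domains, and hashes to different blocklists or lookups.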
Known Limitations
- False positives: Benign indicators may be extracted (e.g., vendor domains mentioned in context)
- False negatives: Obfuscated, image-embedded, or non-standard IOCs may be missed
- Defanged indicators: `hxxp://` and `[.]` notation is not always recognized
See Known Limitations for the full list.
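One way to work around the defanging limitation is to normalize report text before submitting it. The helper below is a hypothetical preprocessing sketch, not a feature of TI Mindmap HUB, and it handles only the two notations named above (`hxxp://` or `hxxps://`, and `[.]`).

```python
import re


def refang(text: str) -> str:
    """Rewrite common defanged notation back to its standard form."""
    # hxxp:// and hxxps:// become http:// and https:// (case-insensitive match).
    text = re.sub(r"\bhxxp(s?)://", r"http\1://", text, flags=re.IGNORECASE)
    # [.] becomes a plain dot.
    return text.replace("[.]", ".")
```

Refanged text contains live-looking indicators, so it is best kept out of anything that renders clickable links.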