Why does find() return None even when the tag exists?

This usually comes down to namespaces or an incorrect search path. ElementTree may store namespaced tags in an expanded form that doesn’t match what you see in the raw XML. Printing a sample parsed tag often reveals what’s actually happening.

When should I use iterparse() instead of parse()?

Use iterparse() when the XML file is large enough to cause memory issues. It lets you process one element at a time instead of loading the entire document into memory. For small files, parse() is still simpler and easier to work with.

Can Python handle messy or inconsistent XML?

Strict parsers expect well-formed XML and will fail when the structure is broken. If you need best-effort extraction, lxml with recovery enabled or BeautifulSoup in XML mode can help, as long as you accept looser parsing rules.

Is XML parsing safe by default in Python?

Parsing trusted XML is generally safe, but untrusted input can cause performance or resource problems. Using safer defaults like defusedxml, streaming large files, and restricting parser features helps keep things predictable.

How to parse XML in Python?

02.03.2026 · 10 minutes

XML is commonly used in RSS feeds, sitemaps, configuration files, and some older APIs. Even if it’s not part of your daily workflow, Python developers often run into XML when they need to parse structured data from these sources.

This guide shows how to parse XML in Python, the practical way. We’ll cover reading XML from files, strings, and URLs, handling namespaces correctly, and working with large XML files without memory issues. By the end, you’ll know exactly which tools to use and why.

What XML Parsing Means In Python

In Python, XML parsing means reading an XML document and converting it into a structure your code can work with directly. Instead of handling raw text, Python represents the XML as a tree of elements that you can navigate and query.

Each tag in the XML becomes an element in that tree. Nested tags form parent and child relationships, while attributes are stored as simple key-value pairs. Once the document is parsed, you can move through this structure, extract values, and transform the data into formats like lists, dictionaries, or CSV files.
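As a minimal sketch of that mapping, the snippet below parses a small inline document and turns each `<book>` into a dictionary. The `library`/`book` structure and the `id` attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration
xml_data = """
<library>
    <book id="b1">
        <title>Clean Code</title>
        <author>Robert C. Martin</author>
    </book>
    <book id="b2">
        <title>Refactoring</title>
        <author>Martin Fowler</author>
    </book>
</library>
"""

root = ET.fromstring(xml_data)

books = []
for book in root.findall("book"):          # direct <book> children of the root
    books.append({
        "id": book.attrib.get("id"),       # attributes behave like a dict
        "title": book.findtext("title", default=""),
        "author": book.findtext("author", default=""),
    })

print(books)
```

From here, the list of dictionaries can be handed to `csv.DictWriter`, `json.dumps`, or whatever the next step needs.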

Parsing does not change the XML itself. It simply gives you access to its contents in a controlled and predictable way. What you do with that data next depends on your use case, whether that’s analysis, storage, or further processing.

Choose The Right XML Parser

XML parsing in Python isn’t one fixed workflow. The best approach depends on what the XML looks like and what you’re trying to do with it. The key is starting with a reliable default, then switching only when the input or requirements push you there.

For most well-formed XML, a tree-based parser is the cleanest option because it mirrors the document’s structure. That’s why ElementTree is the usual starting point. It’s built in, easy to work with, and covers the common cases where you’re loading XML and extracting a handful of values.

If the file is large, iterparse() lets you stream elements instead of loading the whole tree. If you need full XPath or schema validation, lxml is the upgrade that saves you from awkward workarounds. If the XML is inconsistent but still well-formed, BeautifulSoup can be a convenient extraction API. If the input is truly not well-formed XML, use lxml with recovery enabled or parse it as HTML (best-effort), depending on what you need.

What You Need

You don’t need much setup to work through the examples in this guide. Python’s standard library already covers most XML parsing use cases.

All examples use Python 3 and assume you have an XML source to work with, whether that’s a local file, a string, or XML fetched from a web request.

ElementTree is the primary parser used throughout the guide and is included with Python by default. Other libraries show up only in specific scenarios. requests is used when fetching XML over HTTP, lxml appears when full XPath or schema validation is needed, and BeautifulSoup is used for best-effort extraction when the XML itself is inconsistent.

That’s it. You can start with the standard library and pull in extra tools only if the input or requirements call for them.
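If you are unsure which of the optional libraries are already installed, a quick check along these lines can tell you without importing them. The module list is an assumption based on the packages this guide mentions; `bs4` is the import name for BeautifulSoup.

```python
import importlib.util

# Import names of the optional third-party packages mentioned in this guide
optional_modules = ["requests", "lxml", "bs4", "defusedxml"]

# find_spec() returns None when a top-level module cannot be found
available = {
    name: importlib.util.find_spec(name) is not None
    for name in optional_modules
}

for name, installed in available.items():
    print(f"{name}: {'installed' if installed else 'not installed'}")
```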

Parse XML With ElementTree

ElementTree is usually the right starting point for parsing XML in Python. It’s part of the standard library, easy to reason about, and matches how XML is structured.

When you parse XML with ElementTree, Python turns it into a tree of elements. Each tag becomes an element object, with access to its text, attributes, and children. Once you have that tree, the rest of the work is just navigation and extraction.

Parse XML From A File

When the XML lives on disk, ElementTree can load it directly.

import xml.etree.ElementTree as ET
tree = ET.parse("example.xml")
root = tree.getroot()
print(root.tag)

ET.parse() reads the file and builds the tree, while getroot() gives you the top-level element. If this fails, it’s usually because the file path is wrong (FileNotFoundError) or the XML itself is malformed (ET.ParseError).

A quick way to understand the structure is to inspect the first level of child tags:

print([child.tag for child in root])

This confirms the file loaded correctly and gives you a sense of where the data sits.

Parse XML From A String

XML often comes from API responses, test data, or embedded content instead of a file. In those cases, fromstring() works better.

import xml.etree.ElementTree as ET
xml_data = """
<library>
    <book>
       <title>Clean Code</title>
        <author>Robert C. Martin</author>
    </book>
</library>
"""

root = ET.fromstring(xml_data)
print(root.tag)

Once parsed, the resulting tree behaves the same way as one loaded from a file. You use the same methods to navigate and extract values.

Parse XML From A URL

When fetching XML from an API or feed, you typically grab the response first and then parse it.

One thing that matters here is timeouts. The requests library does not set one by default, so it’s better to be explicit.

import requests
import xml.etree.ElementTree as ET

url = "https://example.com/feed.xml"

try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    # Use bytes so encoding declarations are handled correctly
    root = ET.fromstring(response.content)
    print(root.tag)

except requests.exceptions.Timeout:
    print("Request timed out while fetching XML")

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

except ET.ParseError as e:
    print(f"XML parsing error: {e}")

Using response.content instead of response.text avoids subtle encoding issues when the XML declares its own charset.
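To illustrate why bytes are the safer input, here is a small sketch that needs no network access. The payload is hypothetical: a document that declares ISO-8859-1 and contains a non-ASCII character.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: a document that declares and uses ISO-8859-1
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Málaga</city>'.encode("iso-8859-1")

# Parsing the bytes lets the parser honor the declared encoding
root = ET.fromstring(raw)
print(root.text)

# Decoding with the wrong charset first damages the text before parsing starts
mis_decoded = raw.decode("utf-8", errors="replace")
print("Málaga" in mis_decoded)
```

The second half mimics what can happen when an HTTP client guesses the charset incorrectly: the damage is done before the XML parser ever sees the declaration.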

Explore The Element Tree

After parsing, it helps to quickly explore the structure before writing queries.

To see immediate children of the root element:

for child in root:
    print(child.tag)

This prints the immediate child elements of the root, which is useful for quickly understanding the document structure.

If the document is deeper, iter() lets you scan tags across the tree:

for elem in root.iter():
    print(elem.tag)
    break  # stop after the first tag; remove this line to list every tag

You rarely need to inspect everything. Even a small sample usually tells you where the data lives.

Extract Tags And Text Safely

ElementTree’s find() returns None when no element matches, and .text can be None or empty even when the tag is present. Defensive checks make your parsing more predictable.

title_element = root.find(".//title")
if title_element is not None and title_element.text:
    print(title_element.text.strip())
else:
    print("Title not found")

This pattern avoids common AttributeError issues and keeps your code resilient when the XML isn’t exactly what you expect.

Parse Large XML Files Without Blowing Up Memory (iterparse)

parse() is perfect for small and medium XML files. But once the file gets big, loading the entire tree into memory can slow everything down or crash your script. That’s when iterparse() starts to matter.

Instead of building the full document upfront, iterparse() walks through the file and hands you elements as they finish parsing. You grab what you need, clear the element, and keep moving.

Why iterparse() Works Better For Big Files

With iterparse(), the "end" event fires once an element is fully built, meaning its children and text are ready to use. That’s usually the event you want, because it lets you safely extract data without half-parsed edge cases.
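The ordering is easier to see on a tiny input. This sketch records both "start" and "end" events for a hypothetical two-level document held in memory, so no file is needed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical nested document, wrapped in BytesIO so no file is needed
data = io.BytesIO(b"<library><book><title>Clean Code</title></book></library>")

# Collect (event, tag) pairs in the order iterparse() reports them
events = [
    (event, elem.tag)
    for event, elem in ET.iterparse(data, events=("start", "end"))
]

for event, tag in events:
    print(event, tag)
```

Each "end" arrives only after the "end" events of all its children, which is why an element is safe to read at that point.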

Calling elem.clear() after each record is what keeps memory stable: it drops the element’s children, text, and attributes once you’ve read them. If memory still grows, check that you’re processing a repeating unit (like <record>), clearing each one as soon as you’re done with it, and not keeping references to processed elements in a list or dictionary. Without clear(), ElementTree keeps every parsed element attached to the tree, and memory climbs much as it would with parse().

A Simple iterparse() Pattern

This pattern processes repeated <record> elements and clears each one after it’s handled.

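import xml.etree.ElementTree as ET

for event, elem in ET.iterparse("large_file.xml", events=("end",)):
    # Strip any "{namespace}" prefix before comparing tag names
    tag = elem.tag.split("}")[-1]
    if tag == "record":
        print(elem.findtext("id", default=""))
        # Clear only the finished record. Clearing every element would wipe
        # <id> on its own "end" event, before the record is read.
        elem.clear()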

What’s happening here is straightforward. You wait until the record is complete, extract only what you need, then call elem.clear() so memory doesn’t creep up as the file streams.

Picking The Right Element To Stream

iterparse() works best when the XML has a repeating “unit” you can process one at a time, like <item>, <record>, or <entry>.

Aim for the element that represents one complete chunk of data. If you stream tiny child tags instead, the code gets harder to follow and you usually don’t gain anything.
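One way to package this idea is a small generator that yields one dictionary per repeating element. This is a sketch under assumptions: `iter_records` is a hypothetical helper, the records are flat (children carry only text), and the sample data lives in memory instead of a file.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, record_tag="record"):
    """Yield one dict per repeating element, clearing each after use."""
    for _, elem in ET.iterparse(source, events=("end",)):
        # Drop any "{namespace}" prefix so the comparison works either way
        if elem.tag.split("}")[-1] != record_tag:
            continue
        # Collect direct children as {local tag name: stripped text}
        record = {
            child.tag.split("}")[-1]: (child.text or "").strip()
            for child in elem
        }
        yield record
        # Clear only the finished record; clearing its children on their
        # own "end" events would erase their text before it is read here
        elem.clear()

# Hypothetical in-memory sample standing in for a large file
sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><name>Ada</name></record>"
    b"<record><id>2</id><name>Linus</name></record>"
    b"</records>"
)

rows = list(iter_records(sample))
print(rows)
```

With a real file you would pass its path instead of the BytesIO object and consume the generator in a loop, so only one record is held at a time.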

When You Should Use iterparse()

Use it when the XML file is too large to load comfortably, you can process one record at a time, and you don’t need random access to the whole document. If the file is small, parse() is still simpler and easier to maintain.

Quick Way To Confirm It’s Working

If memory still climbs while your script runs, the usual cause is forgetting to clear elements. A simple sanity test is to process a limited number of records first and check that memory stays steady.

import xml.etree.ElementTree as ET
count = 0

for event, elem in ET.iterparse("large_file.xml", events=("end",)):
    tag = elem.tag.split("}")[-1]
    if tag == "record":
        count += 1
        elem.clear()

        if count == 1000:
            break

If memory stays stable while the count goes up, you’re using iterparse() the way it’s meant to be used.

Use lxml When You Need Full XPath Or Validation (Optional Upgrade)

ElementTree works well for simple XML parsing tasks like finding elements and reading values. If you need full XPath support, schema validation, or better handling of complex XML, lxml is usually the better choice.

It’s a third-party library built on libxml2, and the biggest win is full XPath support. You can often replace loops and nested find() calls with one XPath expression.

When lxml Is Worth Using

You’ll usually reach for lxml when:

You need full XPath (not just ElementTree’s limited subset)
You want to validate XML against an XSD schema
You’re dealing with heavy namespaces or complex structures
You want better error messages, or faster parsing on larger inputs
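For comparison, here is roughly what ElementTree’s XPath subset can do on its own, using a made-up document. Queries that go beyond this, such as selecting text() nodes or calling functions like contains(), are where lxml’s xpath() becomes necessary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
root = ET.fromstring("""
<library>
    <book id="b1"><title>Clean Code</title><year>2008</year></book>
    <book id="b2"><title>Refactoring</title><year>1999</year></book>
</library>
""")

# Attribute predicate
by_id = root.find(".//book[@id='b2']/title")

# Child-text predicate
by_year = root.findall(".//book[year='2008']")

# Positional predicate (1-based)
first = root.find("./book[1]/title")

print(by_id.text, [b.get("id") for b in by_year], first.text)
```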

Install lxml

pip install lxml

Parse XML With lxml

The basic workflow looks very similar to ElementTree.

from lxml import etree
tree = etree.parse("example.xml")
root = tree.getroot()

print(root.tag)

Use XPath For Cleaner Queries

With full XPath, you can target elements and extract text in one step.

from lxml import etree
tree = etree.parse("example.xml")
root = tree.getroot()

results = root.xpath("//book/title/text()")
print(results)

This returns a list of book title strings from every <title> under <book>.

Validate XML With An XSD Schema

If you have an XSD schema file, lxml makes validation pretty direct.

from lxml import etree
schema_doc = etree.parse("schema.xsd")
schema = etree.XMLSchema(schema_doc)

parser = etree.XMLParser(schema=schema)

try:
    etree.parse("example.xml", parser)
    print("XML is valid.")
except etree.XMLSyntaxError as e:
    print(f"XML failed validation: {e}")

Stream Large XML With lxml iterparse()

You can stream large files with lxml too, and filtering by tag keeps things faster when you only care about one repeating element.

from lxml import etree

for event, elem in etree.iterparse("large_file.xml", events=("end",), tag="record"):
    record_id = elem.findtext("id", default="").strip()
    if record_id:
        print(record_id)

    elem.clear()

Use BeautifulSoup When XML Is Messy (And You Need Tolerance)

Most XML guides assume clean, well-formed input. Real feeds are often anything but. Tags go missing, structures change between records, and strict parsers fail fast.

That’s where BeautifulSoup can help. It’s useful when the goal is extraction, not correctness, and you’re okay with best-effort results.

When BeautifulSoup Makes Sense

BeautifulSoup is a reasonable choice when:

The XML structure isn’t consistent across records
You want quick searching with find() and find_all()
You’re dealing with scraped data or vendor exports that are messy in practice

One thing to be clear about: if you want BeautifulSoup to parse XML properly, use "xml" mode and have lxml installed. That combination gives you the most predictable behavior.

Parse XML With BeautifulSoup

from bs4 import BeautifulSoup

with open("feed.xml", "rb") as f:
    soup = BeautifulSoup(f.read(), "xml")

for item in soup.find_all("item"):
    title_tag = item.find("title")
    title = title_tag.get_text(strip=True) if title_tag else ""
    print(title)

This stays simple on purpose. Missing tags don’t raise exceptions, and you can extract what’s available without defensive boilerplate everywhere.

If The XML Is Actually Broken

If the input isn’t well-formed XML at all, BeautifulSoup’s XML mode can still fail. XML is strict by design.

At that point, your realistic options are:

Use lxml with a parser configured for recovery
Parse as HTML if you only care about extraction and can accept HTML-style rules

The key is to choose the behavior you want up front, instead of expecting every “messy” file to behave like clean XML.
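To see what “strict by design” looks like in practice, this sketch feeds one well-formed and one broken document to the standard-library parser. `try_parse` is a hypothetical helper, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text):
    """Return the root element, or None if the input is not well-formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as e:
        # e.position is a (line, column) tuple pointing at the problem
        line, column = e.position
        print(f"Not well-formed at line {line}, column {column}: {e}")
        return None

good = try_parse("<feed><item>ok</item></feed>")
broken = try_parse("<feed><item>missing close</feed>")  # <item> is never closed

print(good is not None, broken is None)
```

Catching the error at this point gives you one clear place to decide whether to reject the input or hand it to a more tolerant parser.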

Secure XML Parsing And Safety Checks

Most XML issues come down to malformed input, missing tags, or unexpectedly large files. When parsing untrusted XML (uploads, user input, third-party feeds), there’s also a risk of abuse, such as inputs designed to exhaust CPU or memory or trigger unsafe parser features.

What Can Go Wrong With Untrusted XML

The common risks are:

Entity expansion attacks (like “Billion Laughs”) that balloon CPU and memory
Deep nesting that slows parsing or hits limits
Huge documents that overload RAM
XXE-style payloads if your parser is configured to allow external entity resolution / DTD features

With Python’s standard library, the biggest practical concern is usually denial of service (time/memory), even if you aren’t doing anything fancy.
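Two of the risks above, oversized documents and deep nesting, can be bounded with plain standard-library code. The sketch below is one possible approach, not a complete defense: `parse_with_limits` and both limits are hypothetical, and it does not address entity expansion or external entities.

```python
import io
import xml.etree.ElementTree as ET

MAX_BYTES = 1_000_000   # hypothetical size limit, tune for your input
MAX_DEPTH = 50          # hypothetical nesting limit

def parse_with_limits(data: bytes):
    """Parse XML bytes, rejecting oversized or too deeply nested input."""
    if len(data) > MAX_BYTES:
        raise ValueError("XML document too large")

    depth = 0
    root = None
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem          # the first "start" event is the root
            depth += 1
            if depth > MAX_DEPTH:
                raise ValueError("XML nesting too deep")
        else:
            depth -= 1
    return root

shallow = parse_with_limits(b"<a><b/></a>")
print(shallow.tag)

# 60 nested elements exceed the hypothetical MAX_DEPTH of 50
deep_doc = b"<x>" * 60 + b"</x>" * 60
try:
    parse_with_limits(deep_doc)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```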

Use defusedxml For Untrusted Input

If you want a safer default that still feels close to ElementTree, defusedxml is the usual recommendation for server-side parsing when the XML isn’t fully under your control.

Install:

pip install defusedxml

Use it like this:

from defusedxml.ElementTree import parse
tree = parse("untrusted.xml")
root = tree.getroot()
print(root.tag)

Note: defusedxml modules aren’t full drop-in replacements for everything in the stdlib XML modules. They mainly cover parsing/loading safely. For other utilities, keep using the standard library.

If You Use lxml, Lock It Down

lxml is powerful, but you should be explicit when parsing untrusted XML. In particular, lxml’s parser can resolve entities unless you turn that off.

A safer baseline looks like this:

from lxml import etree
import requests

parser = etree.XMLParser(
    resolve_entities=False,
    no_network=True,
    load_dtd=False
)

response = requests.get("https://example.com/feed.xml", timeout=10)
response.raise_for_status()

root = etree.fromstring(response.content, parser=parser)
print(type(root))
print(root.tag)
print(root.nsmap)

If you’re parsing from a file instead of bytes:

from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
tree = etree.parse("untrusted.xml", parser)
root = tree.getroot()

That’s usually enough to avoid the classic “entity/DTD surprise” problems while still letting you use XPath and validation when you need it.

Best Practices, Mistakes, And Troubleshooting

By now, parsing XML itself shouldn’t be the hard part. What usually causes issues is assuming the XML will always look the same. It won’t. Tags go missing, values come back empty, and namespaces quietly break your queries.

This section focuses on keeping your code stable when that happens.

Parse Defensively

Never assume a tag exists or contains text. Calling .text on the result of find() raises AttributeError whenever the tag is missing, and calling .strip() on .text fails whenever the element is empty.

title_element = root.find(".//title")
title_text = title_element.text.strip() if title_element is not None and title_element.text else ""

That single check avoids the most common ElementTree error.

Clean Text Before Using It

Whitespace is common in XML and easy to miss. Newlines and extra spaces can break comparisons or downstream logic.

Stripping text as you extract it is a safe default:

value = element.text.strip() if element is not None and element.text else ""
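Both checks can be folded into one small helper so they don’t have to be repeated at every call site. `text_of` is a hypothetical name, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

def text_of(parent, path, default=""):
    """Return stripped text for the first match of path, or default."""
    elem = parent.find(path)
    if elem is None or elem.text is None:
        return default
    return elem.text.strip()

# Hypothetical sample: padded text, an empty tag, and a missing tag
root = ET.fromstring("<book><title>  Clean Code \n</title><isbn/></book>")

title = text_of(root, "title")
isbn = text_of(root, "isbn")
publisher = text_of(root, "publisher", default="unknown")

print(repr(title), repr(isbn), repr(publisher))
```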

When find() Suddenly Stops Working

If find() returns None even though the tag exists, namespaces are usually the reason. ElementTree stores namespaced tags with the full namespace URI in braces, in the form {uri}tagname, so a search for the bare tag name doesn’t match.

Printing a sample tag quickly shows what’s going on:

for elem in root.iter():
    print(elem.tag)
    break

If you see a namespace wrapped in braces, your queries need to account for it. The simplest pattern is to define a namespace map and use it in your XPath/find calls:

ns = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = root.findall(".//ns:url", ns)
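Here is a self-contained version of the same idea, using a made-up two-entry sitemap so the difference between the failing and working queries is visible. The `{*}` wildcard at the end is available in Python 3.8 and later.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content for illustration
sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://example.com/</loc></url>
    <url><loc>https://example.com/about</loc></url>
</urlset>
"""

root = ET.fromstring(sitemap)
print(root.tag)  # shows the {namespace}tag form

ns = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}

without_ns = root.findall(".//url")        # bare name: matches nothing
with_ns = root.findall(".//ns:url", ns)    # prefix mapped to the URI
locs = [u.findtext("ns:loc", default="", namespaces=ns) for u in with_ns]

# Python 3.8+: "{*}" matches the tag in any namespace
wildcard = root.findall(".//{*}url")

print(len(without_ns), len(with_ns), locs, len(wildcard))
```

The prefix in the map (`ns` here) is arbitrary; only the URI has to match what the document declares.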

Quick Troubleshooting Checklist

find() returns None: wrong path or namespaces
AttributeError: 'NoneType' object has no attribute 'text': missing tag, add checks
Values look right but fail comparisons: strip whitespace
Memory usage keeps climbing: switch to iterparse() and clear processed elements
Strict parsing keeps failing: try lxml or BeautifulSoup