
Python wget and CLI: Full Guide to Download Files and Pages


Trying to download a file or web page using Python? With so many libraries capable of handling downloads, choosing one gets more complicated than expected. Python’s wget is a great choice if you want to download content without dealing with sessions, headers, or a complex setup.

It works over HTTP, HTTPS, or FTP and gives you two options: a lightweight Python package or the wget command-line tool. This guide walks you through both. We cover advanced download techniques, including bulk and recursive downloads, and compare both approaches with tools like requests and urllib so you can choose what fits best.

Setting Up wget in a Python Environment

Before you proceed with Python wget, decide what you actually want to work with. Do you install a Python package? Use a system command? Or both? The two are separate approaches, and they are often confused.

The python-wget package is installed with pip and used inside your scripts. The wget CLI, on the other hand, is a system utility with more features, installed via apt, brew, or Windows binaries. Here’s how to install both and quickly check if everything is ready to use.

Installing the python-wget Package

1. Download Python from the Python downloads page and install it. Once done, open a terminal or command prompt and verify that Python is installed on your machine using the command: python --version


2. If the Python version is displayed, proceed with installing the wget package using: pip install wget


3. Once done, you can import it into your script with the import wget command.

Installing the wget CLI Tool

1. Download and install wget from the official GNU wget site, or grab a prebuilt Windows executable. Here are the commands for macOS and Ubuntu-based systems.

macOS: brew install wget
Ubuntu: sudo apt install wget

2. After installation, run wget --help to verify that it works.
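
If you plan to call the CLI from Python later in this guide, here’s a minimal sketch to confirm the wget binary is actually on your PATH. It uses the standard-library shutil.which; the check itself is our addition, not part of the wget installation.

import shutil
import subprocess

# Locate the wget binary on the current PATH; returns None if it's missing
wget_path = shutil.which("wget")
if wget_path:
    # Print the installed version as a quick sanity check
    subprocess.run(["wget", "--version"])
else:
    print("wget is not installed or not on PATH")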


Downloading Files and Pages with python-wget

The easiest way to download a file or page using Python is through the python-wget package. The examples below cover quick downloads, saving files to specific locations, and basic error handling.

Basic File and HTML Downloads

The python-wget package lets you download any direct file link that is HTTP, HTTPS, or FTP based. You get to download PDFs, ZIPs, images, and even static HTML pages. However, it falls short if the sites require authentication, JavaScript rendering, or dynamic redirects.

import wget
url = "https://examplewebsite.com/"
filename = wget.download(url)

This downloads the content at the given URL and saves it in the current working directory. If the URL points to a file, it’s saved as-is. If it’s a page, it saves the HTML.

Saving to a Custom Path

By default, wget.download() saves the file in the current working directory with the same name as in the URL. While handy, this becomes a limitation when you’re organizing downloads or automating batch tasks.

The out parameter is the way around this: it lets you save the download to a specific folder or rename it.

import wget
url = "https://examplewebsite.com/sample.csv"
output_path = "/downloads/data.csv"
filename = wget.download(url, out=output_path)

Here, the file is downloaded and saved with a custom name in a specific directory. This helps keep downloads organized when automating tasks.
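
Keep in mind that python-wget won’t create a missing folder for you, so if the target directory might not exist yet, it’s safer to create it first. Here’s a minimal sketch of that idea, using the standard-library os module and a hypothetical relative downloads folder:

import os
import wget

url = "https://examplewebsite.com/sample.csv"
output_dir = "downloads"  # hypothetical folder for this example

# Create the folder up front so the save doesn't fail on a missing path
os.makedirs(output_dir, exist_ok=True)
filename = wget.download(url, out=os.path.join(output_dir, "data.csv"))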

Error Handling and Limitations

If you’re facing unfinished or broken downloads, your request has likely failed. Since Python wget doesn’t support custom headers, authentication, or timeouts, failures are even more likely. Hence, we recommend sticking to static pages and direct file links.

The good news is that you can still catch basic errors with a try-except block. Here’s an example.

import wget
url = "https://examplewebsite.com/file.zip"
try:
    filename = wget.download(url)
except Exception as e:
    print("Download failed:", e)

This wraps the download in a try-except block. It helps catch issues like broken URLs or network errors, though the error details python-wget surfaces are often minimal.
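
Since python-wget has no built-in retry support, a simple workaround is to wrap the call in a loop. Below is a minimal sketch of that idea; the retry count and delay are arbitrary choices for illustration, not library defaults.

import time
import wget

url = "https://examplewebsite.com/file.zip"

for attempt in range(3):  # try up to three times
    try:
        filename = wget.download(url)
        break  # success, stop retrying
    except Exception as e:
        print(f"Attempt {attempt + 1} failed:", e)
        time.sleep(5)  # short pause before the next attempt
else:
    print("All attempts failed")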

Using wget CLI Inside Python Scripts

If you’ve been using python-wget, you may have already run into its limitations. The wget CLI gives you full control over how downloads are executed. Simply run wget commands from your Python script to make your downloads more flexible and reliable.

Running wget with subprocess

The easiest way to use wget inside Python is by calling it directly with the subprocess module. This lets you run CLI commands from a script and handle the output, errors, and exit codes just like any other process.

import subprocess
url = "https://examplewebsite.com/"
subprocess.run(["wget", url])

In the code above, Python uses subprocess.run() to call wget as a system command. The URL is passed as an item in the argument list, which lets subprocess execute the command safely without manual string formatting or shell quoting.
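
subprocess.run() also returns a CompletedProcess object, so you can tell whether wget actually succeeded. Here’s a short sketch that treats a non-zero exit code as a failure by passing check=True:

import subprocess

url = "https://examplewebsite.com/"
try:
    # check=True raises CalledProcessError if wget exits with a non-zero code
    subprocess.run(["wget", url], check=True)
except subprocess.CalledProcessError as e:
    print("wget failed with exit code:", e.returncode)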

Building a Custom wget_download() Function

With the wget call working, turn it into a reusable function. This way, you can use it to automate file downloads, manage different URLs and destinations, and even scale your scripts without repeating code.

import subprocess
def wget_download(url, output_path):
    command = ["wget", url, "-O", output_path]
    subprocess.run(command)

A custom function is created to take a URL and an output path. It uses the -O flag to tell wget where to save the file, making it easier to reuse in different scripts.
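
Calling the helper is then a one-liner; the URL and filename below are placeholders:

# Reuse the same function for any URL/destination pair
wget_download("https://examplewebsite.com/report.pdf", "report.pdf")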

Controlling Output: Retries and Quiet Modes

You don’t want to restart the whole script every time a download fails. And if you’re downloading in a loop, it’s even worse: the terminal gets flooded with progress logs. That’s where retries and quiet mode come in.

import subprocess
url = "https://examplewebsite.com/"
command = ["wget", url, "--tries=3", "--quiet"]
subprocess.run(command)

Notice that wget is run with two flags: --tries=3 allows up to three attempts if the download fails, while --quiet hides progress messages and status output for a cleaner terminal.
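
Since --quiet suppresses wget’s own messages, you can rely on the exit code to know whether the download worked. A minimal sketch:

import subprocess

url = "https://examplewebsite.com/"
result = subprocess.run(["wget", url, "--tries=3", "--quiet"])
# A non-zero return code means all attempts failed
if result.returncode != 0:
    print("Download failed after 3 attempts")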

Advanced Python wget CLI Download Techniques

If you’re wondering whether the wget CLI really adds anything extra, this is where it pays off. With just a few extra flags, and barely any additional code, you gain fine-grained control over your downloads.

Resuming Incomplete Downloads

You shouldn’t have to start from scratch every time your connection drops or your script fails mid-download. With the -c or --continue flag, wget picks up right where the download left off instead of fetching everything again.

import subprocess
url = "https://examplewebsite.com/"
subprocess.run(["wget", "--continue", url])

Timestamp-Based Conditional Downloads

When you're downloading files often, you don’t want to grab the same version again and again. The --timestamping flag checks if the server version is newer than what you already have, and skips the download if not, saving time and resources.

import subprocess
url = "https://examplewebsite.com/"
subprocess.run(["wget", "--timestamping", url])

Custom Filenames and Download Paths

By default, wget saves the file with its original name in your current folder. If you’d rather rename it, use -O or --output-document. To save it in a specific directory under its original name, use -P or --directory-prefix.

Note: The best part of the wget -P flag is that it even creates the folder if it doesn’t exist. Keep in mind, though, that -O ignores -P, so to rename a file and place it in a folder at the same time, include the folder in the -O path (that folder must already exist).

import subprocess
url = "https://examplewebsite.com/"
subprocess.run(["wget", "-P", "downloads", url])  # keep the original name inside downloads/
subprocess.run(["wget", "-O", "downloads/custom_name.zip", url])  # rename; downloads/ must already exist

In this example, the first call saves the file under its original name inside the downloads folder, while the second saves it as custom_name.zip at the exact path given to -O.

Automating Bulk Downloads with Python wget CLI

Writing a separate line of code for every URL takes forever. Instead, you can use Python and the wget CLI together to automate downloads. This is efficient: it saves time and keeps the code clean.

Looping Over Multiple URLs

If you already have a list of URLs in Python, you can loop over them with a for loop and run wget for each one. The best part? You get the flexibility to filter, log, or modify URLs before downloading without leaving the script.

import subprocess
urls = [
    "https://examplewebsite.com/file1.zip",
    "https://examplewebsite.com/file2.zip",
    "https://examplewebsite.com/file3.zip"
]
for url in urls:
    subprocess.run(["wget", url])
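
A slightly more defensive version of the same loop records which URLs failed so you can retry them later. This is just an illustrative sketch:

import subprocess

urls = [
    "https://examplewebsite.com/file1.zip",
    "https://examplewebsite.com/file2.zip",
]

failed = []
for url in urls:
    # A non-zero exit code means wget couldn't fetch this URL
    result = subprocess.run(["wget", url])
    if result.returncode != 0:
        failed.append(url)

if failed:
    print("Failed downloads:", failed)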

Using URL List Files

While keeping a list in Python works well for downloading files one by one, putting the links in a .txt file is even cleaner. It keeps your code tidy and lets wget handle everything with a single command using the -i or --input-file flag.

import subprocess
subprocess.run(["wget", "-i", "urls.txt"])

In this example, wget reads URLs from the urls.txt file using the -i flag and downloads them in order.
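
If your URLs start out as a Python list, you can write them to the file first and then hand the whole batch to wget. A quick sketch:

import subprocess

urls = [
    "https://examplewebsite.com/file1.zip",
    "https://examplewebsite.com/file2.zip",
]

# wget -i expects one URL per line
with open("urls.txt", "w") as f:
    f.write("\n".join(urls))

subprocess.run(["wget", "-i", "urls.txt"])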

Adding Delays Between Downloads

If you're downloading many files in a row, hitting a server too quickly can trigger blocks or errors. The --wait flag adds a pause between each download, keeping things smooth and server-friendly.

import subprocess
subprocess.run(["wget", "-i", "urls.txt", "--wait=5"])

In the code above, wget processes each URL in urls.txt and uses the --wait=5 flag to pause for 5 seconds between downloads.

Recursive Downloads and Website Mirroring

Downloads don’t stop at a single page. You might want to download a full blog post along with its images, stylesheets, or even subpages. Don’t go through the hassle of building a crawler, as wget already has the tools.

  • --recursive or -r: Follows links on the page and fetches the resources they lead to
  • --level=N or -l: Controls how deep wget goes while following links (e.g., 2 means it’ll follow links two levels deep)
  • --convert-links or -k: Rewrites links in the downloaded pages so they work correctly when viewed offline

Here’s an example that puts it all into action:

import subprocess
url = "https://examplewebsite.com/"
subprocess.run([
    "wget",
    "--recursive",
    "--level=2",
    "--convert-links",
    url
])

In the code above, wget downloads the target page and follows its internal links up to two levels deep. As it does, it rewrites those links so the saved pages stay connected and usable when opened locally.

Using Proxies with wget in Python

wget makes downloads easy, but large requests, repeated access, or scraping can lead to blocks, bans, or timeouts. The simple fix? Use proxies. They route your requests through alternate IPs, helping you stay anonymous and avoid getting blocked.

Setting Up Proxies Using Environment Variables and .wgetrc

Before using a proxy with wget, you need to tell wget where to find it. You can do this temporarily using environment variables, or make it persistent by configuring the .wgetrc file. Both methods define how wget connects through proxies, whether it’s HTTP, HTTPS, or FTP.

Note: wget expects proxy addresses to follow this structure: <PROXY_PROTOCOL>://<PROXY_IP_ADDRESS>:<PROXY_PORT>. If authentication is needed, add username:password@ before the IP.

If multiple proxy methods are defined, wget prioritizes command-line options first, then settings from its config files (.wgetrc), and finally environment variables.

Using Environment Variables (Temporary Method)

Environment variables let you quickly set up proxy support for wget, but the change only lasts for the current terminal session. If you close the terminal or reboot, these settings will be lost unless you add them to a shell config like .bashrc or .zshrc.

export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
export ftp_proxy=http://proxy.example.com:8080
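
If you’re driving wget from Python, you can set these variables for a single call instead of exporting them in the shell, by passing an env mapping to subprocess.run(). A sketch with a placeholder proxy address:

import os
import subprocess

url = "https://examplewebsite.com/"

# Copy the current environment and add proxy variables for this call only
env = os.environ.copy()
env["http_proxy"] = "http://proxy.example.com:8080"   # placeholder proxy
env["https_proxy"] = "http://proxy.example.com:8080"

subprocess.run(["wget", url], env=env)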

Using .wgetrc Configuration File (Persistent Method)

Unlike environment variables, .wgetrc lets you set proxy rules that stay active across reboots or terminal restarts. This is a wget-specific config file typically located at ~/.wgetrc for a single user, or /etc/wgetrc for system-wide changes.

To configure proxies, add the following lines to your .wgetrc file:

use_proxy = on
http_proxy = http://proxy.example.com:8080
https_proxy = http://proxy.example.com:8080
ftp_proxy = http://proxy.example.com:8080

use_proxy = on tells wget to use the proxy settings you've defined. Without it, wget might skip using the proxy even if it's set. You can also pass a custom .wgetrc using wget --config ./path/to/.wgetrc if you don’t want to rely on the default file.

Using Proxy Per Command and Skipping Specific Domains

If you don’t want to modify environment variables or config files, the -e flag lets you set proxy settings directly in the command. Why is it useful? You can test or temporarily switch proxies without changing system-wide settings.

wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:8080 https://examplewebsite.com/
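
The same per-command approach works from Python by passing each -e option as its own list item. A minimal sketch with placeholder values:

import subprocess

url = "https://examplewebsite.com/"
subprocess.run([
    "wget",
    "-e", "use_proxy=yes",
    "-e", "http_proxy=http://proxy.example.com:8080",  # placeholder proxy
    url
])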

What if you want a certain domain to bypass the proxy? This is where the no_proxy setting comes in. It lets you define domains or IP addresses for which wget should skip the proxy. Here it’s set as an environment variable with export, though the same option can also be configured in .wgetrc.

export no_proxy=localhost,127.0.0.1,internal.example.com

In this example, wget will not use the proxy for localhost, the loopback IP, or any subdomain of internal.example.com.

Using Authentication with Proxy Servers

Some proxies require a username and password before letting you connect. wget supports both inline credentials in the proxy URL and separate flags for clarity.

Option 1: Using --proxy-user and --proxy-password

wget --proxy-user=<yourUsername> --proxy-password=<yourPassword> https://examplewebsite.com/

Option 2: Using export with credentials in the proxy URL

export http_proxy=http://yourUsername:yourPassword@proxy.example.com:8080
export https_proxy=https://yourUsername:yourPassword@proxy.example.com:8080

Note: Wondering why we haven’t included SOCKS proxies? That’s because wget doesn’t support SOCKS. If your use case involves SOCKS proxies, you’ll need to switch to a tool like curl or use a proxy tunneling tool (e.g., tsocks, proxychains, or ssh -D) that can route wget traffic through a SOCKS proxy indirectly.

What’s Better for Downloads? python-wget vs requests vs urllib

Python wget is often the first choice for quick downloads. The wget CLI adds more control for things like retries and recursive downloads. But as the need for customization, error handling, or API-level logic increases, requests and urllib start to shine.

Not sure what to choose? Here’s a quick comparison to clear things up.


python-wget vs requests vs urllib

| Factor | python-wget | wget CLI via subprocess | requests | urllib |
| --- | --- | --- | --- | --- |
| Ease of Use | Easy (mostly one-liners) | Moderate (requires command building and the subprocess module) | Easy (clean syntax, good for beginners) | Hard (verbose, lower-level syntax) |
| Request Customization | None | CLI flags allow user-agent and headers | Full control (headers, sessions, auth) | Supports headers, cookies, timeouts |
| Downloading Pages and Files | Good (handles single file downloads reliably) | Excellent with recursive flags (optimized for large and batch downloads) | Excellent (supports streaming and chunked downloads for large files) | Moderate (can handle files, but requires manual stream management) |
| Error Handling | Minimal | Requires parsing stdout/stderr | Try/except, status code access | Standard Python exceptions |
| Redirect Handling | No automatic redirect support | CLI follows redirects by default | Follows redirects | With config |
| Extensibility | Not extensible | Hard to extend, you pass fixed flags | Highly extensible | Limited via low-level modules |
| Performance | Fine for small tasks | Optimized for batch or mirrored downloads | Efficient, especially with sessions | Fast, but more manual setup |
| Best Use Case | One-off file or page download | Bulk or recursive downloads, retries, automation | Web scraping, APIs, custom requests | Low-level control or limited environments |

Wrapping Up

When downloading files or pages using Python, the tool you choose can make a real difference. The Python wget package is quick to use and great for basic tasks, while the CLI brings more power and flexibility for advanced downloads.

That said, no matter which method you pick, it’s important to be mindful of how you use it. Always check a site’s robots.txt to know what’s allowed, and be careful with how frequently you send requests. Respecting these rules helps you avoid IP blocks and keeps your scraping or downloading safe and responsible.
