How to Download Files from a URL Directory Using Python?

Introduction

This tutorial shows how to list and download all files from a public URL directory — for example, when someone shares a web directory like https://******/pub/ containing multiple files.

We’ll focus on downloading all files that end with .nc (NetCDF files), but you can adapt the method for any file type.

Downloading Files from a URL Directory

Imagine a colleague sends you a link to a web directory — for example:

https://******/pub/

When you open it in your browser, you see a list of files (see image below).
Your goal is to automatically list and download all the files ending with .nc using Python.

Listing and downloading all .nc files from a URL directory using Python.

Step 1 — List All Files Under the URL Directory

The first step is to retrieve the HTML content of the directory page. You can do this easily with the requests library:

import requests

url = 'https://******/pub/'
page = requests.get(url).text

Next, you need to extract all file links from the page.
A simple and efficient way to do that is by using BeautifulSoup:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page, 'html.parser')

# Filter only links ending with ".nc" (skip <a> tags without an href)
links = [url + node.get('href') for node in soup.find_all('a') if node.get('href', '').endswith('.nc')]

print("Found files:")
for link in links:
    print(link)

This will give you a list of complete URLs for all .nc files in the directory.
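If you would rather avoid the BeautifulSoup dependency, the standard library can do the same job. The sketch below uses `html.parser` plus `urllib.parse.urljoin`, which resolves both relative and absolute hrefs against the directory URL; the URL and HTML snippet here are placeholder stand-ins for the real page:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href attributes from every <a> tag on the page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)

# Placeholder URL and HTML standing in for the real directory listing
url = 'https://example.com/pub/'
page = '<a href="a.nc">a.nc</a> <a href="notes.txt">notes</a>'

parser = LinkCollector()
parser.feed(page)
# urljoin resolves each href (relative or absolute) against the base URL
links = [urljoin(url, h) for h in parser.hrefs if h.endswith('.nc')]
print(links)
```

Unlike plain string concatenation, `urljoin` also behaves correctly when the directory listing uses absolute paths or full URLs in its hrefs.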

Step 2 — Download All Files Locally

Once you have the list of links, you can download each file using urllib.request.urlretrieve:

import urllib.request

for link in links:
    filename = link.split('/')[-1]
    print(f"Downloading: {filename}")
    urllib.request.urlretrieve(link, filename)

print("✅ Download completed!")

This script will:

  1. Extract the filename from each URL.
  2. Download each file and save it locally in the current directory.
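The two steps can also be combined into one script. This is a sketch, not the article's exact code: the `filename_from_url` and `download_directory` names are my own, the URL passed at the bottom is a placeholder, and `requests` plus `beautifulsoup4` must be installed:

```python
import urllib.request

import requests
from bs4 import BeautifulSoup

def filename_from_url(link):
    """Derive a local filename from the last path segment of a URL."""
    return link.rstrip('/').split('/')[-1]

def download_directory(url, extension='.nc'):
    """List all links under `url` ending in `extension` and download them."""
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    links = [url + node.get('href') for node in soup.find_all('a')
             if node.get('href', '').endswith(extension)]
    for link in links:
        filename = filename_from_url(link)
        print(f"Downloading: {filename}")
        urllib.request.urlretrieve(link, filename)

# Example usage (placeholder URL):
# download_directory('https://example.com/pub/')
```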

Optional Improvements

Add a progress bar with tqdm:

from tqdm import tqdm

for link in tqdm(links, desc="Downloading files"):
    filename = link.split('/')[-1]
    urllib.request.urlretrieve(link, filename)

Handle errors gracefully:

try:
    urllib.request.urlretrieve(link, filename)
except Exception as e:
    print(f"❌ Failed to download {filename}: {e}")

Change output directory:

import os

output_dir = "downloads"
os.makedirs(output_dir, exist_ok=True)

for link in links:
    filename = os.path.join(output_dir, link.split('/')[-1])
    urllib.request.urlretrieve(link, filename)
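The error handling and output-directory improvements can be folded into one helper. A minimal standard-library sketch (the `download_files` function name and its return value are my own additions, not from the original):

```python
import os
import urllib.request

def download_files(links, output_dir="downloads"):
    """Download each URL into output_dir; report and skip failures.

    Returns the list of paths that were saved successfully.
    """
    os.makedirs(output_dir, exist_ok=True)
    saved = []
    for link in links:
        filename = os.path.join(output_dir, link.split('/')[-1])
        try:
            urllib.request.urlretrieve(link, filename)
            saved.append(filename)
        except Exception as e:
            # One bad link no longer aborts the whole batch
            print(f"❌ Failed to download {link}: {e}")
    return saved
```

Returning the saved paths makes it easy to verify afterwards how many of the listed files actually arrived.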
