How to Get All File or Directory Names in a Specific AWS S3 Bucket Directory Using Python?

Introduction

When working with publicly accessible data on AWS S3, such as NOAA environmental satellite products, it's often useful to programmatically list either the files or the subdirectories under a specific path (also called a "prefix") in a bucket.

In this guide, we show how to use Python and the boto3 library to retrieve:

  • All file names under a specified prefix
  • All "folder" names (i.e., common prefixes) under a given path

As an example, we'll use NOAA's public noaa-nesdis-snpp-pds bucket, which contains a wide variety of satellite products.

Installation

Install boto3 if you haven't already:

pip install boto3

List All Files in a Given Directory

Use the S3 resource interface and filter by a prefix to retrieve all file keys under a "directory":

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Initialize anonymous S3 resource
s3 = boto3.resource('s3', config=Config(signature_version=UNSIGNED))

bucket_name = "noaa-nesdis-snpp-pds"
folder = "VIIRS-IMG-GEO-TC/2023/05/30/"  # Example prefix

s3_bucket = s3.Bucket(bucket_name)

# List all .h5 files under the prefix
files_in_s3 = [
    obj.key for obj in s3_bucket.objects.filter(Prefix=folder)
    if obj.key.endswith(".h5")
]

print(f"Found {len(files_in_s3)} files")
print("First 10 files:")
for f in files_in_s3[:10]:
    print(f)

List All Subdirectories (Folders)

If you'd rather get a list of all top-level folders (or subfolders under a given prefix), use the S3 client interface with the Delimiter parameter:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous S3 client
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED))

bucket_name = "noaa-nesdis-snpp-pds"
prefix = ""  # Root level; change to e.g., 'VIIRS-IMG-GEO-TC/2023/' to go deeper

response = s3_client.list_objects_v2(
    Bucket=bucket_name,
    Prefix=prefix,
    Delimiter="/"  # Important: groups by "folders"
)

# Extract folder names
folders = [cp['Prefix'] for cp in response.get('CommonPrefixes', [])]

print("Folders found:")
for folder in folders:
    print(folder)

Example Output

Running the above folder-listing code on the root of the noaa-nesdis-snpp-pds bucket gives:

Folders found:
ATMS-SCIENCE-RDR/
ATMS-SDR-GEO/
ATMS-SDR/
ATMS-SFR/
ATMS-TDR/
ATMS_BUFR/
CRIS-SCIENCE-RDR/
CrIS-FS-SDR/
CrIS-SDR-GEO/
GRIDDED_VIIRS_LSA_DLY/
...
VIIRS-MOD-GEO-TC/
VIIRS-NCC-EDR/
VIIRS_SurfaceReflectance_EDR/
VIIRS_VFM_MWS_MOSAIC/
VI_BWKL_GLB/

These are the "subdirectories" or data product categories under the top-level prefix.

Notes:

  • Delimiter="/" tells S3 to group keys by "directory level".
  • CommonPrefixes in the response contains the "subfolder" names under your prefix.
  • If there are more folders than fit in one response, paginate using ContinuationToken (boto3's built-in paginators handle this for you).

Tips and Troubleshooting

  • Public Buckets: Ensure you're using Config(signature_version=UNSIGNED) for public, unauthenticated access.
  • Prefix Format: Always include a trailing slash (/) in prefixes when simulating folders.
  • Pagination: If there are more than 1000 objects or folders, use ContinuationToken to handle pagination (boto3's paginator can help).
  • Check Bucket Access: If access is denied, double-check that you're accessing a public bucket or that your AWS credentials are set up.
