Introduction
If you're a developer or data scientist, chances are you've worked with large datasets that contain multiple compressed files. And if you're working with these files in Python, it can be quite tedious and time-consuming to manually unzip each file one by one.
Python's standard library includes several modules for working with compressed data, such as zipfile, tarfile, gzip, bz2 and lzma. This article uses the zipfile module, which can read, create and extract ZIP archives.
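Before extracting anything, it can help to see what an archive contains. The sketch below builds a small archive in memory with io.BytesIO and lists its members with ZipFile.namelist(); the member names and contents are hypothetical and exist only for illustration.

```python
import io
import zipfile

# Build a small archive in memory (hypothetical member names, for illustration only)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/readme.txt", "hello")
    zf.writestr("data/values.csv", "a,b\n1,2\n")

# Rewind and reopen the same bytes for reading, then list the archive's members
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()

print(names)  # ['data/readme.txt', 'data/values.csv']
```

Because ZipFile does not close a file object it was given, the same BytesIO buffer can be reopened for reading after the writer is closed.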
Unzip one file
To begin, import the zipfile module:
import zipfile
myZipFile = 'archive.zip'
Next, create an instance of the ZipFile class to represent the archive, and extract its contents with the extractall() method. Using a with statement ensures the archive is closed when the block ends:
destination_folder = '.'
with zipfile.ZipFile(myZipFile, 'r') as zip_ref:
    zip_ref.extractall(destination_folder)
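The extractall() method takes an optional path argument: the directory where the files should be extracted. If path is omitted, the files are extracted into the current working directory, and if the directory does not exist, extractall() creates it. Two further optional arguments, members and pwd, restrict extraction to a subset of the archive's members and supply the password for an encrypted archive.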
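The sketch below checks two properties of extractall() directly: it creates a missing destination directory, and its members argument limits extraction to the listed names. It works in a throwaway temporary directory, and the archive and file names are hypothetical.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "archive.zip")

    # Create a small test archive with two members (hypothetical contents)
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.txt", "first")
        zf.writestr("b.txt", "second")

    # This nested destination does not exist yet; extractall() creates it
    destination = os.path.join(tmp, "out", "nested")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination, members=["a.txt"])  # extract only a.txt

    extracted = sorted(os.listdir(destination))

print(extracted)  # ['a.txt']
```

The zipfile documentation also advises against extracting archives from untrusted sources without inspecting them first, so listing members before calling extractall() is a reasonable habit.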
Unzip multiple files
Example 1
We can use the glob() function from Python's built-in glob module to search for all zip files in a folder. It returns a list of paths to all matching files:
import glob
root = '/Volumes/HD2/Datasets/'
zip_files = glob.glob('{}*.zip'.format(root))
print(len(zip_files))
Example output (the count depends on what the folder contains):
156
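The '*.zip' pattern only matches archives directly inside root. If archives are spread across subfolders, glob.glob() also accepts a '**' path component together with recursive=True. A minimal sketch on a hypothetical temporary folder layout:

```python
import glob
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one archive at the top level, one inside a subfolder
    os.makedirs(os.path.join(root, "2019"))
    for rel in ["top.zip", os.path.join("2019", "inner.zip")]:
        with zipfile.ZipFile(os.path.join(root, rel), "w") as zf:
            zf.writestr("x.txt", "x")

    # Non-recursive: only archives directly inside root
    flat = glob.glob(os.path.join(root, "*.zip"))
    # "**" matches zero or more subdirectories only when recursive=True
    deep = glob.glob(os.path.join(root, "**", "*.zip"), recursive=True)

print(len(flat), len(deep))  # 1 2
```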
For each zip file in zip_files, create a separate folder named after the archive and extract its contents into it:
import os
import zipfile

for myzipfile in zip_files:
    # splitext() strips only the final ".zip", so dots elsewhere in the path are kept
    destination_folder = os.path.splitext(myzipfile)[0]
    os.makedirs(destination_folder, exist_ok=True)
    with zipfile.ZipFile(myzipfile, 'r') as zip_ref:
        zip_ref.extractall(destination_folder)
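Wrapped as a reusable function, the loop might look like the sketch below. unzip_all is a hypothetical helper name, not part of zipfile. It uses os.path.splitext() so that a dot elsewhere in the path (here, in the folder name "v1.2") does not truncate the destination, which would happen with split('.')[0].

```python
import glob
import os
import tempfile
import zipfile


def unzip_all(folder):
    """Extract every *.zip in `folder` into a sibling folder named after the archive.

    Hypothetical helper for illustration; returns the list of destination folders.
    """
    destinations = []
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        # splitext removes only the final extension: ".../v1.2/data.zip" -> ".../v1.2/data"
        destination = os.path.splitext(zip_path)[0]
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(destination)  # extractall() creates `destination` if needed
        destinations.append(destination)
    return destinations


# Usage on a temporary folder whose name contains a dot
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "v1.2")
    os.makedirs(folder)
    with zipfile.ZipFile(os.path.join(folder, "data.zip"), "w") as zf:
        zf.writestr("rows.csv", "a,b\n")

    result = unzip_all(folder)
    names = [os.path.basename(d) for d in result]
    extracted_ok = os.path.isfile(os.path.join(result[0], "rows.csv"))

print(names)  # ['data']
```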
Example 2
An alternative approach is to list every file in a folder and check each one with zipfile.is_zipfile(), which inspects the file's contents rather than its extension, before extracting it:
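import os
import zipfile

path_to_zipped_files = '/Users/bmarchant/Desktop/2019_new'

for file in os.listdir(path_to_zipped_files):
    file_path = os.path.join(path_to_zipped_files, file)
    # is_zipfile() returns False for directories and non-zip files
    if zipfile.is_zipfile(file_path):
        with zipfile.ZipFile(file_path) as item:
            item.extractall(path_to_zipped_files)  # extract next to the archive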
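Because is_zipfile() looks at the file's contents, its answer can differ from what the file extension suggests. This sketch (hypothetical file names, temporary folder) checks a real archive saved without a .zip extension, a plain text file named fake.zip, and a subdirectory:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as folder:
    # A real archive without a .zip extension
    with zipfile.ZipFile(os.path.join(folder, "backup.dat"), "w") as zf:
        zf.writestr("notes.txt", "hello")
    # A plain text file that only pretends to be an archive
    with open(os.path.join(folder, "fake.zip"), "w") as f:
        f.write("this is plain text, not a zip archive")
    # A subdirectory, which is_zipfile() reports as not a zip file
    os.makedirs(os.path.join(folder, "subdir"))

    checks = {
        name: zipfile.is_zipfile(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
    }

print(checks)  # {'backup.dat': True, 'fake.zip': False, 'subdir': False}
```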
References
| Link | Site |
|---|---|
| zipfile: Work with ZIP archives | docs.python.org |