Download Dataset from Kaggle using API and Python

Kaggle hosts many public datasets that can be downloaded. Each dataset has an identifier that consists of the owner and the dataset name. For example, the Iris dataset is provided by UCI Machine Learning as uciml/iris. A dataset can contain multiple files: uciml/iris is available as a CSV file (Iris.csv) and as an SQLite database file (database.sqlite).
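To make the identifier format concrete, here is a minimal sketch that splits an identifier into its two parts. The helper split_dataset_id is hypothetical and not part of the Kaggle API client; it only assumes the two-part owner/dataset_name form described above.

```python
def split_dataset_id(dataset: str) -> tuple[str, str]:
    """Split 'owner/dataset_name' into (owner, dataset_name).

    Hypothetical helper for illustration; not part of the Kaggle client.
    """
    owner, sep, name = dataset.partition("/")
    # Reject identifiers with a missing part or more than one slash.
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/dataset_name', got {dataset!r}")
    return owner, name


owner, name = split_dataset_id("uciml/iris")
print(owner, name)
```

Checking the identifier up front can give a clearer error message than a failed download request.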

The Kaggle API client provides the dataset_download_files method, which downloads all files of a dataset as a single ZIP archive. There is also the dataset_download_file method, which downloads a specific file from a dataset. Both methods accept the dataset identifier and the directory path where the download is saved; dataset_download_file additionally takes the name of the file.

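import os

# Set the credentials before the import below,
# because the kaggle package authenticates when it is imported.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

dataset = 'uciml/iris'   # dataset identifier: owner/dataset_name
path = 'datasets/iris'   # directory where downloads are saved

api = KaggleApi()
api.authenticate()

# Download all files of the dataset as a single ZIP archive.
api.dataset_download_files(dataset, path)

# Download individual files of the dataset.
api.dataset_download_file(dataset, 'Iris.csv', path)
api.dataset_download_file(dataset, 'database.sqlite', path)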
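Because dataset_download_files saves a ZIP archive, the files still have to be extracted before use. The following is a minimal sketch that does this with the standard zipfile module. The helper extract_archive is hypothetical, and the archive location in the usage note is an assumption.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return their names.

    Hypothetical helper for illustration; not part of the Kaggle client.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Assuming the archive was saved as datasets/iris/iris.zip (named after the dataset), calling extract_archive('datasets/iris/iris.zip', 'datasets/iris') would place the dataset files next to it.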

dataset_download_files parameters:

| No | Parameter | Default value | Description |
|----|-----------|---------------|-------------|
| 1. | dataset | | Dataset identifier in format owner/dataset_name. |
| 2. | path | None | Directory path where to save the downloaded file. By default, current working directory. |
| 3. | force | False | Force to download the file even if the file already exists. |
| 4. | quiet | True | Controls whether verbose output is suppressed. |
| 5. | unzip | False | Unzip files when download has been finished. ZIP file will be deleted when process completed. |
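As an illustration of these parameters, the sketch below passes force, quiet, and unzip as keyword arguments. The wrapper fetch_dataset is a hypothetical name; it expects an already authenticated KaggleApi instance as its first argument.

```python
import os


def fetch_dataset(api, dataset, path):
    """Download all files of a dataset, extract them, and list the results.

    Hypothetical wrapper; `api` is expected to be an authenticated
    KaggleApi instance.
    """
    os.makedirs(path, exist_ok=True)
    api.dataset_download_files(
        dataset,
        path=path,
        force=True,   # download again even if the archive already exists
        quiet=False,  # do not suppress verbose output
        unzip=True,   # extract the archive; the ZIP file is deleted afterwards
    )
    return sorted(os.listdir(path))
```

With the api object from the example above, fetch_dataset(api, 'uciml/iris', 'datasets/iris') should return the names of the extracted dataset files.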

dataset_download_file parameters:

| No | Parameter | Default value | Description |
|----|-----------|---------------|-------------|
| 1. | dataset | | Dataset identifier in format owner/dataset_name. |
| 2. | file_name | | Dataset file name. |
| 3. | path | None | Directory path where to save the downloaded file. By default, current working directory. |
| 4. | force | False | Force to download the file even if the file already exists. |
| 5. | quiet | True | Controls whether verbose output is suppressed. |
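A single downloaded CSV file can then be inspected with the standard csv module. This is a minimal sketch: the wrapper download_and_preview is a hypothetical name, it expects an authenticated KaggleApi instance, and it assumes the file is stored under its original name in the target directory.

```python
import csv
import os


def download_and_preview(api, dataset, file_name, path, rows=5):
    """Download one CSV file from a dataset and return its first rows.

    Hypothetical wrapper; `api` is expected to be an authenticated
    KaggleApi instance.
    """
    os.makedirs(path, exist_ok=True)
    # force keeps its default (False), so an existing file is not forced
    # to download again.
    api.dataset_download_file(dataset, file_name, path=path, quiet=False)
    # Assumption: the file is stored under its original name inside `path`.
    local_file = os.path.join(path, file_name)
    with open(local_file, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        # zip stops after `rows` rows or at the end of the file.
        return [row for _, row in zip(range(rows), reader)]
```

For example, download_and_preview(api, 'uciml/iris', 'Iris.csv', 'datasets/iris') would return the header row followed by the first four data rows.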
