Get List of Dataset Files from Kaggle using API and Python

A dataset can be provided in various file formats. Kaggle supports the CSV, JSON, BigQuery and SQLite database file formats. Files can be compressed using ZIP or another common archive format.

The Kaggle API client provides the datasets_list_files method to get a list of dataset files. This method returns the results as a Python dictionary.

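import os
from pprint import pprint

# Set the credentials first so the client can read them when it authenticates.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

# The dataset is identified by its owner and its name (here: uciml/iris).
owner = 'uciml'
dataset_name = 'iris'

api = KaggleApi()
api.authenticate()

# Request the list of files that belong to the dataset.
files = api.datasets_list_files(owner, dataset_name)
pprint(files)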

Part of the output:

{'datasetFiles': [{'columns': [.............],
                   'creationDate': '2016-09-27T07:38:05.44Z',
                   'datasetRef': 'uciml/iris',
                   'description': 'SQLite database containing the same data as '
                                  'Iris.csv',
                   'fileType': '.csv',
                   'name': 'Iris.csv',
                   ........................
                  {'columns': [],
                   'creationDate': '2016-09-27T07:38:05.44Z',
                   'datasetRef': 'uciml/iris',
                   'description': 'SQLite database containing the same data as '
                                  'Iris.csv',
                   'fileType': '.sqlite',
                   'name': 'database.sqlite',
                   ........................
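
Because the result is a plain dictionary, the file names can be pulled out with ordinary Python. The following is a minimal sketch, assuming the response has the shape shown in the output above: a 'datasetFiles' list whose entries carry 'name' and 'fileType' keys. The sample dictionary and the list_file_names helper are illustrative only and are not part of the Kaggle API client.

```python
# Sample shaped like the partial output above; only the keys used here are included.
sample_response = {
    'datasetFiles': [
        {'name': 'Iris.csv', 'fileType': '.csv', 'datasetRef': 'uciml/iris'},
        {'name': 'database.sqlite', 'fileType': '.sqlite', 'datasetRef': 'uciml/iris'},
    ]
}


def list_file_names(response):
    """Return (name, fileType) pairs for each file in the response (hypothetical helper)."""
    # .get() guards against a response that has no 'datasetFiles' key.
    return [(f.get('name'), f.get('fileType'))
            for f in response.get('datasetFiles', [])]


for name, file_type in list_file_names(sample_response):
    print(name, file_type)
```

In a real script, the dictionary returned by datasets_list_files would be passed in place of sample_response, provided it has the same structure.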

datasets_list_files method parameters:

| No | Parameter    | Default value | Description    |
|----|--------------|---------------|----------------|
| 1. | owner_slug   |               | Dataset owner. |
| 2. | dataset_slug |               | Dataset name.  |
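
The output above identifies the dataset with a combined reference such as 'uciml/iris', while the method takes the owner and the dataset name as two separate arguments. The sketch below shows one possible way to split such a reference. It assumes the reference always has the form 'owner/dataset', and the split_dataset_ref helper is hypothetical, not a function of the Kaggle API client.

```python
def split_dataset_ref(dataset_ref):
    """Split an 'owner/dataset' reference into (owner_slug, dataset_slug)."""
    owner_slug, sep, dataset_slug = dataset_ref.partition('/')
    # Reject references with no slash, an empty part, or more than one slash.
    if not sep or not owner_slug or not dataset_slug or '/' in dataset_slug:
        raise ValueError("Expected 'owner/dataset', got: %r" % dataset_ref)
    return owner_slug, dataset_slug


owner_slug, dataset_slug = split_dataset_ref('uciml/iris')
print(owner_slug, dataset_slug)
```

The two values could then be passed to datasets_list_files in the same order as in the table.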
