Get List of Dataset Files from Kaggle using API and Python

A dataset can be provided in various file formats. Kaggle supports the CSV, JSON, BigQuery, and SQLite database file formats. Files can be compressed using ZIP or another common archive format.

The Kaggle API client provides the datasets_list_files method to get a list of the files in a dataset. This method returns the results as a Python dictionary.

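import os
from pprint import pprint

# Set the credentials before importing the Kaggle client.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

# Owner and name of the dataset whose files will be listed.
owner = 'uciml'
dataset_name = 'iris'

api = KaggleApi()
api.authenticate()

# The result is a dictionary describing each file in the dataset.
files = api.datasets_list_files(owner, dataset_name)
pprint(files)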

Part of the output:

{'datasetFiles': [{'columns': [.............],
                   'creationDate': '2016-09-27T07:38:05.44Z',
                   'datasetRef': 'uciml/iris',
                   'description': 'SQLite database containing the same data as '
                                  'Iris.csv',
                   'fileType': '.csv',
                   'name': 'Iris.csv',
                   ........................
                  {'columns': [],
                   'creationDate': '2016-09-27T07:38:05.44Z',
                   'datasetRef': 'uciml/iris',
                   'description': 'SQLite database containing the same data as '
                                  'Iris.csv',
                   'fileType': '.sqlite',
                   'name': 'database.sqlite',
                   ........................
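
Individual fields can be read from the result like from any other dictionary. The following is a minimal sketch, assuming the response has the shape shown in the output above: a 'datasetFiles' list whose entries carry 'name' and 'fileType' keys. The sample dictionary is a hand-trimmed stand-in rather than a live API response, and file_names_by_type is a hypothetical helper, not part of the Kaggle client.

```python
from collections import defaultdict

# Hand-trimmed stand-in for the dictionary returned by datasets_list_files;
# only the keys used below are kept.
sample_response = {
    'datasetFiles': [
        {'name': 'Iris.csv', 'fileType': '.csv'},
        {'name': 'database.sqlite', 'fileType': '.sqlite'},
    ]
}


def file_names_by_type(response):
    """Group file names by their 'fileType' value (hypothetical helper)."""
    grouped = defaultdict(list)
    # .get() guards against a response without a 'datasetFiles' key.
    for entry in response.get('datasetFiles', []):
        grouped[entry.get('fileType', '')].append(entry['name'])
    return dict(grouped)


names_by_type = file_names_by_type(sample_response)
print(names_by_type)
```

With a real response, the dictionary returned by the API call could be passed in place of sample_response, provided it follows the same layout.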

datasets_list_files method parameters:

No  Parameter     Default value  Description
1.  owner_slug    -              Dataset owner.
2.  dataset_slug  -              Dataset name.
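
The 'datasetRef' value in the output ('uciml/iris') combines both parameters in a single string. The sketch below shows one way to split such a reference into the two arguments. It assumes the reference always has the form owner/dataset, and split_dataset_ref is a hypothetical helper rather than a function of the Kaggle client.

```python
def split_dataset_ref(dataset_ref):
    """Split 'owner/dataset' into (owner_slug, dataset_slug)."""
    owner_slug, sep, dataset_slug = dataset_ref.partition('/')
    # Reject references with a missing part or more than one slash.
    if not sep or not owner_slug or not dataset_slug or '/' in dataset_slug:
        raise ValueError(f"expected 'owner/dataset', got {dataset_ref!r}")
    return owner_slug, dataset_slug


owner_slug, dataset_slug = split_dataset_ref('uciml/iris')
print(owner_slug, dataset_slug)
```

The two values can then be passed to datasets_list_files in that order.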
