Create New Dataset in Kaggle using API and Python

Kaggle lets you create a custom dataset and upload it to the platform. Each dataset must have an associated metadata file that specifies additional information about the dataset.

The Kaggle API client provides the dataset_initialize method to generate the metadata file. The method accepts the directory in which the file will be saved.

init.py

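import os

# Set the credentials before importing kaggle: the package reads them on import.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

datasetDir = 'custom_dataset'

api = KaggleApi()
api.authenticate()

# Create the dataset directory if it does not exist yet.
os.makedirs(datasetDir, exist_ok=True)

# Write dataset-metadata.json into the directory.
api.dataset_initialize(datasetDir)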

The dataset-metadata.json file is created in the specified directory. It contains the minimal information required to upload a dataset to the Kaggle platform. Edit it to provide a title and an identifier, which consists of your username and the dataset name. The Kaggle API follows the Data Package specification to define metadata. All available parameters are listed in the GitHub repository of the Kaggle API.

custom_dataset/dataset-metadata.json

{
  "title": "Custom dataset",
  "id": "YOUR_USERNAME/custom-dataset",
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ]
}
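The title and identifier can also be filled in from Python rather than by hand. The following is a minimal sketch, assuming dataset-metadata.json already exists in the dataset directory; the helper name set_dataset_metadata and its arguments are hypothetical and are not part of the Kaggle API.

```python
import json
import os


def set_dataset_metadata(folder, username, slug, title):
    """Set the title and id fields in dataset-metadata.json (hypothetical helper)."""
    path = os.path.join(folder, 'dataset-metadata.json')

    # Load the existing metadata so that other fields, such as licenses, are kept.
    with open(path, encoding='utf-8') as f:
        metadata = json.load(f)

    # The identifier has the form "<username>/<dataset-name>".
    metadata['title'] = title
    metadata['id'] = f'{username}/{slug}'

    with open(path, 'w', encoding='utf-8') as f:
        json.dump(metadata, f, indent=2)

    return metadata
```

For the example in this post, the call would be set_dataset_metadata('custom_dataset', 'YOUR_USERNAME', 'custom-dataset', 'Custom dataset').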

Now put your data file into the specified directory. For example, we use the following CSV file, which contains values of x and y.

custom_dataset/data.csv

x,y
1,10
2,20
3,30
4,40
5,50
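If you prefer to generate the sample file rather than type it, a small sketch using the standard csv module could look like this. The function name write_sample_csv is hypothetical; the rows reproduce the sample values shown above.

```python
import csv
import os


def write_sample_csv(folder, filename='data.csv'):
    """Write the x,y sample rows to a CSV file and return its path (hypothetical helper)."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)

    # y is 10 times x, matching the sample values above.
    rows = [(x, x * 10) for x in range(1, 6)]

    # newline='' lets the csv module control the line endings.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['x', 'y'])
        writer.writerows(rows)

    return path
```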

The Kaggle API client provides the dataset_create_new method, which creates a new dataset and uploads its files to Kaggle. When it finishes, you can go to the datasets page https://www.kaggle.com/<username>/datasets to see all the datasets you have created.

A dataset can be deleted on its settings page https://www.kaggle.com/<username>/custom-dataset/settings by pressing the ‘Delete Dataset’ button.

main.py

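import os

# Set the credentials before importing kaggle: the package reads them on import.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

# Directory that holds dataset-metadata.json and the data files.
datasetDir = 'custom_dataset'

api = KaggleApi()
api.authenticate()

# Create the dataset on Kaggle and upload the files from the directory.
api.dataset_create_new(datasetDir)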
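Before uploading, it can help to confirm that the directory is complete. The sketch below is a hypothetical local check, not something the Kaggle API provides: it assumes the metadata file is named dataset-metadata.json and that the id has the form username/dataset-name, as described above.

```python
import json
import os


def check_dataset_folder(folder):
    """Return a list of problems found in the dataset folder (hypothetical helper)."""
    metadata_name = 'dataset-metadata.json'
    metadata_path = os.path.join(folder, metadata_name)

    if not os.path.isfile(metadata_path):
        return [f'{metadata_name} is missing']

    with open(metadata_path, encoding='utf-8') as f:
        metadata = json.load(f)

    problems = []
    if not metadata.get('title'):
        problems.append('title is empty')

    # The id is expected to look like "<username>/<dataset-name>".
    owner, _, slug = str(metadata.get('id') or '').partition('/')
    if not owner or not slug:
        problems.append('id must have the form <username>/<dataset-name>')

    # At least one file besides the metadata file should be present.
    data_files = [
        name for name in os.listdir(folder)
        if name != metadata_name and os.path.isfile(os.path.join(folder, name))
    ]
    if not data_files:
        problems.append('no data files found')

    return problems
```

An empty list means the check found nothing wrong; it does not guarantee that Kaggle will accept the upload.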

dataset_initialize parameters:

| No | Parameter | Default value | Description |
|----|-----------|---------------|-------------|
| 1. | folder | | Directory path where to initialize the metadata file. |

dataset_create_new parameters:

| No | Parameter | Default value | Description |
|----|-----------|---------------|-------------|
| 1. | folder | | Directory path which contains dataset files and metadata file. |
| 2. | public | False | Defines whether dataset will be available publicly. |
| 3. | quiet | False | Controls whether verbose output is suppressed. |
| 4. | convert_to_csv | True | If true, convert data to CSV format. |
| 5. | dir_mode | skip | If main directory contains subdirectories, what to do with them: skip (ignore), zip (compress and upload), tar (uncompressed upload). |
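To illustrate how these parameters can be passed, here is a small sketch of a wrapper that calls dataset_create_new with explicit keyword arguments. The wrapper name upload_dataset is hypothetical; api is assumed to be an authenticated KaggleApi instance, and the keyword names are the ones listed above.

```python
def upload_dataset(api, folder, public=False, dir_mode='skip'):
    """Call dataset_create_new with explicit keyword arguments (hypothetical wrapper)."""
    # skip, zip and tar are the dir_mode values listed above.
    if dir_mode not in ('skip', 'zip', 'tar'):
        raise ValueError(f'unsupported dir_mode: {dir_mode!r}')

    return api.dataset_create_new(
        folder,
        public=public,
        quiet=False,
        convert_to_csv=True,
        dir_mode=dir_mode,
    )
```

For example, upload_dataset(api, 'custom_dataset', public=True, dir_mode='zip') would request a public dataset and upload subdirectories as zip archives.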
