Create New Dataset in Kaggle using API and Python

Kaggle lets you create a custom dataset and upload it to the platform. Each dataset needs an associated metadata file that describes it, for example its title, identifier, and license.

The Kaggle API client provides the dataset_initialize method, which generates the metadata file. The method accepts the directory in which the file will be saved.

init.py

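import os

# Set the credentials before importing the Kaggle client,
# so they are available when the client authenticates.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

dataset_dir = 'custom_dataset'

api = KaggleApi()
api.authenticate()

# Create the dataset directory if it does not exist yet.
os.makedirs(dataset_dir, exist_ok=True)

# Generate dataset-metadata.json inside the directory.
api.dataset_initialize(dataset_dir)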

The dataset-metadata.json file is created in the specified directory. It contains the minimum information required to upload the dataset to Kaggle. Set the title and the identifier, which has the form username/dataset-name. The Kaggle API follows the Data Package specification for metadata. The full list of available fields is documented in the Kaggle API repository on GitHub.

custom_dataset/dataset-metadata.json

{
  "title": "Custom dataset",
  "id": "YOUR_USERNAME/custom-dataset",
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ]
}
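The title and id fields can be edited by hand or with a short script. Below is a minimal sketch that uses only the Python standard library. The helper name set_dataset_identity is hypothetical and is not part of the Kaggle API. It assumes dataset-metadata.json already exists in the directory.

```python
import json
import os


def set_dataset_identity(dataset_dir, username, slug, title):
    """Write title and id into dataset-metadata.json (hypothetical helper)."""
    path = os.path.join(dataset_dir, 'dataset-metadata.json')

    # Load the metadata file generated earlier.
    with open(path, encoding='utf-8') as f:
        metadata = json.load(f)

    # The identifier has the form username/dataset-name.
    metadata['title'] = title
    metadata['id'] = f'{username}/{slug}'

    # Write the file back, keeping all other fields (e.g. licenses).
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(metadata, f, indent=2)
    return metadata
```

For example, set_dataset_identity('custom_dataset', 'YOUR_USERNAME', 'custom-dataset', 'Custom dataset') would produce a file like the one shown above.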

Next, put your data files in the same directory. For example, we use the following CSV file, which contains values of x and y.

custom_dataset/data.csv

x,y
1,10
2,20
3,30
4,40
5,50
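If you prefer to generate the sample file rather than type it, something like the following would work. This is only a sketch: the function name write_sample_csv is hypothetical, and it assumes y is always 10 times x, as in the rows above.

```python
import csv
import os


def write_sample_csv(dataset_dir, n=5):
    """Write data.csv with columns x and y, where y = 10 * x (hypothetical helper)."""
    os.makedirs(dataset_dir, exist_ok=True)
    path = os.path.join(dataset_dir, 'data.csv')

    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['x', 'y'])
        for x in range(1, n + 1):
            writer.writerow([x, x * 10])
    return path
```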

The Kaggle API client provides the dataset_create_new method, which creates a new dataset and uploads its files to Kaggle. When it finishes, you can open the datasets page at https://www.kaggle.com/<username>/datasets to see all the datasets you have created.

A dataset can be deleted from its settings page, https://www.kaggle.com/<username>/custom-dataset/settings, by pressing the 'Delete Dataset' button.

main.py

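import os

# Set the credentials before importing the Kaggle client,
# so they are available when the client authenticates.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

# Directory that holds dataset-metadata.json and the data files.
dataset_dir = 'custom_dataset'

api = KaggleApi()
api.authenticate()

# Create the dataset on Kaggle and upload the files.
api.dataset_create_new(dataset_dir)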

dataset_initialize parameters:

| No | Parameter | Default value | Description |
|----|-----------|---------------|-------------|
| 1. | folder | - | Directory path where to initialize the metadata file. |

dataset_create_new parameters:

| No | Parameter | Default value | Description |
|----|-----------|---------------|-------------|
| 1. | folder | - | Directory path which contains dataset files and metadata file. |
| 2. | public | False | Defines whether dataset will be available publicly. |
| 3. | quiet | False | Controls whether verbose output is suppressed. |
| 4. | convert_to_csv | True | If true, convert data to CSV format. |
| 5. | dir_mode | skip | If main directory contains subdirectories, what to do with them: skip - ignore, zip - compress and upload, tar - uncompressed upload. |
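The optional parameters can be passed as keyword arguments. The sketch below wraps the call in a hypothetical helper, upload_dataset, which first checks that the metadata file is present and that dir_mode is one of the three listed values. The helper and the VALID_DIR_MODES constant are illustrative assumptions, not part of the Kaggle API. Only dataset_create_new and its parameters come from the client.

```python
import os

# The dir_mode values listed in the parameter table.
VALID_DIR_MODES = ('skip', 'zip', 'tar')


def upload_dataset(api, folder, public=False, dir_mode='skip'):
    """Validate the folder, then call api.dataset_create_new (hypothetical wrapper)."""
    metadata_path = os.path.join(folder, 'dataset-metadata.json')
    if not os.path.isfile(metadata_path):
        raise FileNotFoundError(f'missing metadata file: {metadata_path}')
    if dir_mode not in VALID_DIR_MODES:
        raise ValueError(f'dir_mode must be one of {VALID_DIR_MODES}')

    # api is expected to be an authenticated KaggleApi instance.
    return api.dataset_create_new(
        folder,
        public=public,
        quiet=False,
        convert_to_csv=True,
        dir_mode=dir_mode,
    )
```

With an authenticated client, upload_dataset(api, 'custom_dataset', dir_mode='zip') would upload the files and compress any subdirectories, while keeping the dataset private.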
