Kaggle lets you create a custom dataset and upload it to the platform. A dataset must have an associated metadata file that specifies additional information about the dataset.
The Kaggle API client provides the dataset_initialize method to create the metadata file. This method accepts the directory where the file will be saved.
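import os
# Set the credentials first so the client can read them from the environment.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'
from kaggle.api.kaggle_api_extended import KaggleApi
datasetDir = 'custom_dataset'
api = KaggleApi()
api.authenticate()
# Create the dataset directory if it does not exist yet.
if not os.path.exists(datasetDir):
    os.makedirs(datasetDir)
# Write a template metadata file into the directory.
api.dataset_initialize(datasetDir)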
The dataset-metadata.json file is created in the specified directory. This file contains the minimal information required to upload the dataset to the Kaggle platform. Provide a title and an identifier, which consists of your username and the dataset name. The Kaggle API follows the Data Package specification to define metadata. All available parameters are listed in the GitHub repository of the Kaggle API.
{
  "title": "Custom dataset",
  "id": "YOUR_USERNAME/custom-dataset",
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ]
}
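If you prefer to fill in the metadata from a script rather than by hand, the same file can be written with the standard json module. This is only a minimal sketch: the helper name write_metadata and its parameters are hypothetical and are not part of the Kaggle API client. It produces the three fields shown above and nothing more.

```python
import json
from pathlib import Path


def write_metadata(folder, username, slug, title, license_name="CC0-1.0"):
    """Write a minimal dataset-metadata.json into `folder` and return its path.

    Hypothetical helper: it only mirrors the fields shown in the article.
    """
    metadata = {
        "title": title,
        # The identifier is the username and the dataset name joined by a slash.
        "id": f"{username}/{slug}",
        "licenses": [{"name": license_name}],
    }
    path = Path(folder) / "dataset-metadata.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

For example, write_metadata('custom_dataset', 'YOUR_USERNAME', 'custom-dataset', 'Custom dataset') would reproduce the file above.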
Now put your data file into the specified directory. For example, we use the following CSV file, which contains values of x and y.
x,y
1,10
2,20
3,30
4,40
5,50
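The sample file can also be generated with the standard csv module. The sketch below assumes the file is called data.csv, which is an illustrative name only (the article does not name the file), and the helper write_sample_csv is hypothetical.

```python
import csv
from pathlib import Path


def write_sample_csv(folder, filename="data.csv"):
    """Write the five-row x,y sample into `folder` and return the file path.

    The default filename is an assumption made for this example.
    """
    rows = [(x, x * 10) for x in range(1, 6)]  # (1, 10) through (5, 50)
    path = Path(folder) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerows(rows)
    return path
```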
The Kaggle API client provides the dataset_create_new method, which creates a new dataset and uploads its files to Kaggle. When it finishes, you can go to the datasets page https://www.kaggle.com/<username>/datasets to see all the datasets you have created.
A dataset can be deleted on its settings page https://www.kaggle.com/<username>/custom-dataset/settings by pressing the 'Delete Dataset' button.
import os
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'
from kaggle.api.kaggle_api_extended import KaggleApi
datasetDir = 'custom_dataset'
api = KaggleApi()
api.authenticate()
api.dataset_create_new(datasetDir)
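Before uploading, it can help to check locally that the directory is ready. The following sketch uses only the standard library and is not part of the Kaggle API client. The helper name check_dataset_folder and the specific checks are assumptions based on the requirements described in this article (a metadata file with a title and a USERNAME/dataset-name identifier, plus at least one data file).

```python
import json
from pathlib import Path


def check_dataset_folder(folder):
    """Return a list of problems found in `folder`; an empty list means none."""
    folder = Path(folder)
    meta_path = folder / "dataset-metadata.json"
    if not meta_path.is_file():
        return ["dataset-metadata.json is missing"]
    try:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"dataset-metadata.json is not valid JSON: {exc}"]
    if not isinstance(meta, dict):
        return ["dataset-metadata.json must contain a JSON object"]

    problems = []
    if not meta.get("title"):
        problems.append("title is empty")
    # The identifier should have the form USERNAME/dataset-name.
    owner, _, slug = str(meta.get("id", "")).partition("/")
    if not owner or not slug:
        problems.append("id must look like USERNAME/dataset-name")
    # Any regular file other than the metadata file counts as data.
    data_files = [p for p in folder.iterdir() if p.is_file() and p != meta_path]
    if not data_files:
        problems.append("no data files next to the metadata file")
    return problems
```

Calling check_dataset_folder('custom_dataset') and uploading only when the result is an empty list avoids a failed request caused by a missing file.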
dataset_initialize parameters:
No | Parameter | Default value | Description |
---|---|---|---|
1. | folder | - | Directory path where to initialize the metadata file. |
dataset_create_new parameters:
No | Parameter | Default value | Description |
---|---|---|---|
1. | folder | - | Directory path which contains dataset files and metadata file. |
2. | public | False | Defines whether dataset will be available publicly. |
3. | quiet | False | Controls whether verbose output is suppressed. |
4. | convert_to_csv | True | If true, convert data to CSV format. |
5. | dir_mode | skip | How to handle subdirectories of the main directory: skip ignores them, zip uploads them compressed, tar uploads them uncompressed. |
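The optional parameters in the table can be collected in one place before the call. The sketch below is illustrative only: upload_options is a hypothetical helper, and the parameter names, defaults, and allowed dir_mode values are taken from the table above rather than checked against a particular version of the client.

```python
# Allowed values for dir_mode, as listed in the table above.
VALID_DIR_MODES = ("skip", "zip", "tar")


def upload_options(public=False, quiet=False, convert_to_csv=True, dir_mode="skip"):
    """Collect keyword arguments for dataset_create_new, using the table's defaults."""
    if dir_mode not in VALID_DIR_MODES:
        raise ValueError(f"dir_mode must be one of {VALID_DIR_MODES}, got {dir_mode!r}")
    return {
        "public": public,
        "quiet": quiet,
        "convert_to_csv": convert_to_csv,
        "dir_mode": dir_mode,
    }
```

With an authenticated client, the result could then be passed as api.dataset_create_new(datasetDir, **upload_options(dir_mode='zip')), assuming the installed version accepts these keyword names.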