Suppose you don't know the exact dataset identifier. Kaggle allows searching for datasets via the API by keyword, tags, file type, license, or owner. The API returns paginated results; to get the next page, pass a page number.
The Kaggle API client provides the dataset_list method for searching datasets. The following code searches for datasets by the keyword iris and requests the results sorted by votes.
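import os
from pprint import pprint

# Set the credentials before importing the client: the kaggle package
# reads them when it is imported.
os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

# Search by keyword and sort the results by votes.
datasets = api.dataset_list(search='iris', sort_by='votes')
for dataset in datasets:
    # Print the dataset identifier, then every attribute of the object.
    print('---- ' + dataset.ref + ' ----')
    pprint(vars(dataset))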
The dataset_list method returns a list of Dataset objects. We print the dataset identifier and all attributes of each object. Part of the output:
---- uciml/iris ----
{'creatorName': 'Kaggle Team',
'creatorUrl': 'kaggleteam',
'currentVersionNumber': 2,
'description': None,
'downloadCount': 196903,
'files': [],
'id': 19,
'isFeatured': False,
'isPrivate': False,
'isReviewed': True,
'kernelCount': 5239,
'lastUpdated': datetime.datetime(2016, 9, 27, 7, 38, 5),
'licenseName': 'CC0: Public Domain',
'ownerName': 'UCI Machine Learning',
'ownerRef': 'uciml',
'ref': 'uciml/iris',
'size': '4KB',
.......................
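Printing every attribute is noisy when only a few fields matter. The following is a minimal sketch of picking out selected attributes; summarize is a hypothetical helper, not part of the Kaggle client, and the attribute names are taken from the output above, so they may differ in other versions of the library. A SimpleNamespace stands in for a real Dataset object so the example runs without credentials.

```python
from types import SimpleNamespace


def summarize(dataset, fields=('ref', 'size', 'downloadCount', 'lastUpdated')):
    """Return a dict of the selected attributes; missing ones map to None."""
    return {name: getattr(dataset, name, None) for name in fields}


# Stand-in for a Dataset object returned by dataset_list (assumption:
# the real object exposes the same attribute names as the output above).
sample = SimpleNamespace(ref='uciml/iris', size='4KB', downloadCount=196903)
print(summarize(sample))
# {'ref': 'uciml/iris', 'size': '4KB', 'downloadCount': 196903, 'lastUpdated': None}
```

With a real result list, the same helper could be applied as summarize(dataset) inside the loop over datasets.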
Parameters of the dataset_list method:
No | Parameter | Default value | Description |
---|---|---|---|
1. | sort_by | None | Sort order of the results. Available options: hottest (default), votes, updated, active, published. |
2. | file_type | None | Search for datasets by file type. Available options: all (default), csv, sqlite, json, bigQuery. |
3. | license_name | None | Search for datasets by license. Available options: all (default), cc, gpl, odb, other. |
4. | tag_ids | None | Search for datasets by tags. Separate multiple tags with commas. |
5. | search | None | Search for datasets by keyword. |
6. | user | None | Search for datasets by owner. |
7. | mine | False | Return only datasets owned by the currently logged-in user. |
8. | page | 1 | Page number. |
9. | max_size | None | Maximum size of the dataset. |
10. | min_size | None | Minimum size of the dataset. |
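
Because each call returns a single page, collecting more results means calling dataset_list repeatedly with an increasing page value. The sketch below shows one way to do that while combining several of the parameters from the table. The names iter_all_pages and search_iris_csv are hypothetical helpers, and the stop condition assumes that a page past the last one comes back empty, which is worth verifying against your version of the client.

```python
from typing import Callable, Iterator, List


def iter_all_pages(fetch_page: Callable[[int], List], max_pages: int = 5) -> Iterator:
    """Yield items page by page until an empty page or max_pages is reached."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            # Assumption: an empty (or None) result means there are no more pages.
            break
        yield from items


def search_iris_csv(api, max_pages: int = 3) -> List:
    """Collect CSV datasets matching 'iris', sorted by votes.

    api is expected to be an authenticated KaggleApi instance.
    """
    def fetch(page: int) -> List:
        return api.dataset_list(search='iris', file_type='csv',
                                sort_by='votes', page=page)

    return list(iter_all_pages(fetch, max_pages=max_pages))
```

With the api object created earlier, search_iris_csv(api) would return up to three pages of results in a single list. The max_pages cap keeps the number of requests bounded for broad keywords.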