Product Update: Create and Manage Datasets from the Command Line using the Official Kaggle API


Source: www.kaggle.com

Kaggle Datasets API Tutorial

Have you used Kaggle’s beta API to download data or make a competition submission? We’re pleased to announce version 1.1 of the API, which includes new features for easily managing your datasets on Kaggle from the command line.

Read on to learn how to use the API to create and update datasets, or check out the detailed documentation on our GitHub page.

Create new datasets

After you follow the installation instructions, it’s simple to create a new dataset on Kaggle from files on your local machine:

  1. Create a folder containing the files you want to upload
  2. Run kaggle datasets init -p /path/to/dataset to generate a metadata file
  3. Add your dataset’s metadata to the generated file, datapackage.json
  4. Run kaggle datasets create -p /path/to/dataset to create the dataset

Your dataset will be private by default. To make it public, add the -u flag when you create it, or navigate to “Settings” > “Sharing” from your dataset’s page, where you can also share it with collaborators.
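
The steps above can be wrapped in a small shell script. This is only a sketch: the function names init_dataset and create_dataset, the KAGGLE variable, and the ./my-dataset folder are hypothetical names chosen for illustration, while the kaggle commands and the -u flag are the ones described above. Setting KAGGLE="echo kaggle" prints each command instead of running it, which is handy for a dry run.

```shell
#!/bin/sh
# Sketch of the create workflow. KAGGLE defaults to the real CLI;
# set KAGGLE="echo kaggle" to print the commands instead of running them.
KAGGLE="${KAGGLE:-kaggle}"

# Step 2: generate the metadata file inside the dataset folder ($1).
init_dataset() {
    $KAGGLE datasets init -p "$1"
}

# Step 4: create the dataset from the folder ($1).
# Pass "public" as the second argument to add the -u flag.
create_dataset() {
    if [ "${2:-}" = "public" ]; then
        $KAGGLE datasets create -p "$1" -u
    else
        $KAGGLE datasets create -p "$1"
    fi
}

# Example usage (hypothetical folder and file names):
#   mkdir -p ./my-dataset && cp data.csv ./my-dataset/   # step 1
#   init_dataset ./my-dataset                            # step 2
#   (edit ./my-dataset/datapackage.json by hand)         # step 3
#   create_dataset ./my-dataset                          # step 4, private
#   create_dataset ./my-dataset public                   # step 4, public
```

Step 3, filling in the metadata, is left as a manual edit here because the fields you set depend on your dataset.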

Update datasets

You can also create new versions of existing datasets, which lets you programmatically keep a dataset fresh with the latest data.

  1. Run kaggle datasets init -p /path/to/dataset to generate a metadata file if you don’t already have one
  2. Make sure the id field in datapackage.json points to your dataset
  3. Run kaggle datasets version -p /path/to/dataset -m "Your message here"
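
A minimal sketch of these update steps as a shell function follows. The function name version_dataset and the KAGGLE variable are hypothetical; the kaggle commands are the ones listed above. The id check only confirms that an "id" key appears in datapackage.json. Verifying that it actually points to your dataset remains a manual step.

```shell
#!/bin/sh
# Sketch of the update workflow. KAGGLE defaults to the real CLI;
# set KAGGLE="echo kaggle" to print the commands instead of running them.
KAGGLE="${KAGGLE:-kaggle}"

# Publish a new version of the dataset in folder $1 with message $2.
version_dataset() {
    dir="$1"
    message="$2"

    # Step 1: generate the metadata file if it is missing, then stop so
    # the file can be edited before anything is uploaded.
    if [ ! -f "$dir/datapackage.json" ]; then
        $KAGGLE datasets init -p "$dir"
        echo "Edit $dir/datapackage.json, then run this again." >&2
        return 1
    fi

    # Step 2: check only that an "id" key is present in the file.
    if ! grep -q '"id"' "$dir/datapackage.json"; then
        echo "No id field found in $dir/datapackage.json." >&2
        return 1
    fi

    # Step 3: upload the folder contents as a new version.
    $KAGGLE datasets version -p "$dir" -m "$message"
}

# Example usage (hypothetical folder name):
#   version_dataset ./my-dataset "Nightly refresh"
```

Calling a function like this from a scheduled job is one way to keep a dataset current without manual uploads.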

For more tips on maintaining your dataset, check out our guide to data publishing. Have questions or feedback for us? We’d love to hear from you on our Product Feedback forum. We also plan to share more tutorials showing you how to use these features. And if you use the API in a data project, let us know in the comments!
