Generating a Dataset with ChatGPT

Data mining, machine learning, and deep learning all depend on datasets, whatever the application domain. Obtaining a suitable dataset can be difficult: the data may be very large, rare, or subject to strict access permissions. This post shows how to use ChatGPT to create a dataset.

Alright, let’s try a Natural Language Processing (NLP) case in which a dataset is needed for text classification (sometimes called text categorization). Take COVID-19 as an example and focus on four of its variants: Alpha, Beta, Gamma, and Delta. The system must detect which of these variants a news article discusses. Therefore, five classes are needed: alpha, beta, gamma, delta, and others.
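Before generating any text, it can help to fix the label set in code. The following is a minimal Python sketch, assuming your training code expects integer class ids; the names `LABELS`, `LABEL_TO_ID`, and `ID_TO_LABEL` are illustrative and not part of any library.

```python
# The five target classes described above; "others" covers news that
# discusses none of the four variants.
LABELS = ["alpha", "beta", "gamma", "delta", "others"]

# Map each class name to an integer id (and back), as many training
# libraries expect numeric labels.
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}
ID_TO_LABEL = {i: label for label, i in LABEL_TO_ID.items()}

print(LABEL_TO_ID)  # {'alpha': 0, 'beta': 1, 'gamma': 2, 'delta': 3, 'others': 4}
```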

Just ask ChatGPT: Can you give me a news paragraph about the “Alpha” COVID-19 variant? ChatGPT will reply with a paragraph of information. Then type: Can you add 9 similar paragraphs? This gives 10 paragraphs for the “Alpha” class. Repeat the same steps for each of the other variants.
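If you repeat these two questions for every variant, a little scripting keeps the prompts consistent and turns a pasted reply back into separate paragraphs. The following is a minimal Python sketch; the function names are hypothetical, and it assumes ChatGPT separates paragraphs with blank lines and may prefix them with numbers such as "1.". The post does not describe a prompt for the "others" class, so these templates are meant for the four variants only.

```python
import re


def first_prompt(variant: str) -> str:
    """Build the opening question for one variant (hypothetical helper)."""
    return f'Can you give me a news paragraph about the "{variant}" COVID-19 variant?'


def follow_up_prompt(total: int) -> str:
    """Ask for enough extra paragraphs to reach `total` for the class."""
    if total < 2:
        raise ValueError("total must be at least 2")
    return f"Can you add {total - 1} similar paragraphs?"


def split_paragraphs(reply: str) -> list[str]:
    """Split a pasted ChatGPT reply into paragraphs on blank lines.

    Assumes paragraphs are separated by an empty line. A leading list
    number such as "1." or "2)" is removed only when followed by
    whitespace, so text like "1.5 million" is left intact.
    """
    paragraphs = []
    for chunk in reply.replace("\r\n", "\n").split("\n\n"):
        text = re.sub(r"^\d+[.)]\s+", "", chunk.strip())
        if text:
            paragraphs.append(text)
    return paragraphs


print(first_prompt("Alpha"))
print(follow_up_prompt(10))  # Can you add 9 similar paragraphs?
print(split_paragraphs("1. First paragraph.\n\n2. Second paragraph."))
# ['First paragraph.', 'Second paragraph.']
```

Splitting on blank lines is a deliberately simple choice; if a reply uses a different layout, adjust `split_paragraphs` or copy the paragraphs by hand.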

Next, transfer the generated paragraphs to a CSV file for the training process. You can, of course, generate more paragraphs per class to enlarge the dataset. That’s all; I hope it’s useful.
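Saving the paragraphs to CSV can be done with Python’s standard `csv` module. This is a minimal sketch, assuming one row per paragraph with the hypothetical column names `text` and `label`; the strings in the demo are placeholders for the paragraphs ChatGPT returns, and the demo writes to a temporary directory so it leaves nothing behind. Point `out_path` at your own file in practice.

```python
import csv
import os
import tempfile
from collections import Counter


def write_dataset(path, paragraphs_by_label):
    """Write one CSV row per paragraph with a header of text,label.

    The csv module quotes fields that contain commas or newlines, so
    multi-sentence paragraphs are stored safely.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        for label, paragraphs in paragraphs_by_label.items():
            for text in paragraphs:
                writer.writerow([text, label])


def count_per_label(path):
    """Count rows per class, useful for checking that classes are balanced."""
    with open(path, newline="", encoding="utf-8") as f:
        return dict(Counter(row["label"] for row in csv.DictReader(f)))


# Demo with placeholder strings; replace them with ChatGPT's paragraphs.
out_path = os.path.join(tempfile.mkdtemp(), "covid_variants.csv")
write_dataset(out_path, {
    "alpha": ["<alpha paragraph 1>", "<alpha paragraph 2, with a comma>"],
    "beta": ["<beta paragraph 1>"],
})
print(count_per_label(out_path))  # {'alpha': 2, 'beta': 1}
```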