"In the following, we look at different representations of text and at methods for extracting features from text. We use the Yelp Dataset (https://www.kaggle.com/yelp-dataset/yelp-dataset): \"This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the most recent dataset you'll find information about businesses across 8 metropolitan areas in the USA and Canada.\" This example is adapted from the FeatEng book."
]
},
{
"cell_type": "markdown",
"id": "78930164-4e89-426f-b153-efdc4a1cf7ef",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<b>Note:</b> The dataset is too large to store on GitLab. Please download it directly from the link above, unzip it, and store it in the data directory.</div>"