"In the following, we look at different representations of text and at methods for extracting features from it. We use the Yelp Dataset (https://www.kaggle.com/yelp-dataset/yelp-dataset): \"This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the most recent dataset you'll find information about businesses across 8 metropolitan areas in the USA and Canada.\" This example is adapted from the FeatEng book."
]
]
},
},
{
"cell_type": "markdown",
"id": "78930164-4e89-426f-b153-efdc4a1cf7ef",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<b>Note:</b> The dataset is too large to store on GitLab. Please download it directly using the link above, unzip it, and store it in the data directory.</div>"