Format | Price |
---|---|
PDF Download | $6.95 |
Printed Black & White Copy | $7.25 |
This case describes a coffee shop owner considering how to take advantage of a fictional Yelp promotion that lets businesses select the top three customer reviews they wish to highlight on their Yelp pages. As she weighs strategies such as choosing the reviews voted most useful or the most recent reviews, she recognizes that her analysis is affected by the same cold start problem Yelp faces when deciding whether a brand-new review should be shown at the top of a review list: a new review has not yet had time to collect votes. The owner treats this challenge as an opportunity to apply a pretrained language model to historical Yelp reviews that others found useful, in the hope of identifying the three reviews that best showcase her shop.

The case lets students consider how data science and machine learning can be used to analyze natural language in the form of text. Using Python starter code in three Jupyter notebooks, students process text data (Yelp reviews) stored as JSON files, create document embeddings from the text, and then use those embeddings as features in a model that predicts an outcome. Students should be familiar with regression techniques, out-of-sample testing, and building models with categorical variables. The Jupyter notebooks are provided as instructor supplements to this case and require a Jupyter environment to run.
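The two baseline strategies the owner considers, and the cold start problem they run into, can be illustrated with a minimal sketch. It is not taken from the case notebooks: the sample records are invented, and the field names (`review_id`, `useful`, `date`, `text`) and the helper functions `load_reviews` and `top_k` are assumptions made for illustration.

```python
import json

# Hypothetical sample records, one JSON object per line. The field names
# are assumptions for this sketch, not taken from the case files.
RAW = """\
{"review_id": "r1", "useful": 12, "date": "2019-03-01", "text": "Great espresso and friendly staff."}
{"review_id": "r2", "useful": 0, "date": "2023-06-15", "text": "New seasonal latte is excellent."}
{"review_id": "r3", "useful": 5, "date": "2021-11-20", "text": "Cozy spot, slow wifi."}
{"review_id": "r4", "useful": 1, "date": "2023-06-10", "text": "Pastries were stale."}
"""


def load_reviews(raw):
    """Parse JSON-lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def top_k(reviews, key, k=3):
    """Return the ids of the k reviews with the largest value of `key`."""
    ranked = sorted(reviews, key=lambda r: r[key], reverse=True)
    return [r["review_id"] for r in ranked[:k]]


reviews = load_reviews(RAW)
most_useful = top_k(reviews, "useful")
most_recent = top_k(reviews, "date")  # ISO-format dates sort correctly as strings

print("Most useful:", most_useful)
print("Most recent:", most_recent)
```

In this toy data the newest review (`r2`) has no votes yet, so ranking by usefulness can never surface it, however good its text is. That gap is what motivates predicting usefulness from the review text itself.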
- To practice working with JSON-formatted data
- To introduce preprocessing steps for text like tokenization, lemmatization, sentence segmentation, and phrase modeling
- To introduce the concept of embeddings through constructing word embeddings
- To explore building document embeddings to capture meaning
- To construct predictive models using document embeddings as features
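As a rough illustration of how these objectives fit together, the sketch below tokenizes text, averages word vectors into a document embedding, and fits a regression on those embeddings. It is a toy under stated assumptions, not the method in the case notebooks: the two-dimensional word vectors are hard-coded stand-ins for learned or pretrained embeddings, the training texts and vote counts are invented, and every function name is hypothetical.

```python
import re

# Toy 2-dimensional word vectors, invented for illustration. A real workflow
# would learn or load embeddings rather than hard-code them.
WORD_VECS = {
    "great": [1.0, 0.2],
    "friendly": [0.8, 0.1],
    "espresso": [0.1, 0.9],
    "latte": [0.2, 1.0],
    "stale": [-1.0, 0.1],
    "slow": [-0.7, 0.0],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def doc_embedding(text, dim=2):
    """Average the vectors of known tokens; all zeros if none are known."""
    vecs = [WORD_VECS[t] for t in tokenize(text) if t in WORD_VECS]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (A^T A) w = A^T y.

    Assumes A^T A is invertible; returns [intercept, w1, w2, ...].
    """
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    d = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(d)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(d):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


def predict(coef, x):
    """Apply the fitted intercept and weights to one feature vector."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


# Hypothetical historical reviews paired with invented "useful" vote counts.
train = [
    ("Great espresso, friendly staff", 10),
    ("Stale latte and slow service", 1),
    ("Great latte", 8),
    ("Slow espresso", 2),
]
coef = fit_least_squares([doc_embedding(t) for t, _ in train],
                         [votes for _, votes in train])

# New reviews with no votes yet, ranked by predicted usefulness.
candidates = {"c1": "Friendly staff, great latte", "c2": "Stale pastries, slow wifi"}
ranked = sorted(candidates,
                key=lambda k: predict(coef, doc_embedding(candidates[k])),
                reverse=True)
print(ranked)
```

Averaging word vectors is only one simple way to build a document embedding, and four training points say nothing about out-of-sample performance. The point of the sketch is the shape of the pipeline: text, then tokens, then vectors, then features for a regression.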