Week 3 Outline
- Discuss experience summaries
- Review quiz 1
- Review analysis with term frequency vectors
- Text Compare2 code
- nltk library for tokenizing
- Term Frequency / Inverse Document Frequency (tf-idf)
- Next steps
- Term frequency analysis on course descriptions (code download)
- Using k-nearest neighbor for (supervised) predictions
- Discussion items:
- Does IDF weighting help? Better weighting?
- Does k matter?
- Strategies for summarizing the best k
- Evaluating on training instances: what are the alternatives?
- Ideas for experience reports
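
A minimal sketch of the tf-idf and k-nearest neighbor items above, to anchor the discussion. It is not the course's downloadable code: the function names, the whitespace tokenizer (a stand-in for an nltk tokenizer), the idf formula log(N / df), cosine similarity, majority voting, and the toy course descriptions are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for an nltk tokenizer.
    return text.lower().split()

def tf_idf_vectors(docs):
    """Return (sparse tf-idf vectors, idf table) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumed idf variant: log(N / df); a term in every document gets weight 0.
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf

def vectorize(text, idf):
    # Terms never seen in training have no idf and are dropped.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(query_vec, train_vecs, labels, k=3):
    # Rank training documents by similarity, then take a majority vote
    # among the k most similar ones.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up course descriptions with department labels.
docs = [
    "intro to programming in python",
    "advanced programming with data structures",
    "organic chemistry lab",
    "physical chemistry and thermodynamics",
]
labels = ["cs", "cs", "chem", "chem"]

vectors, idf = tf_idf_vectors(docs)
prediction = knn_predict(vectorize("programming languages", idf),
                         vectors, labels, k=3)
print(prediction)
```

Rerunning with different values of k, or replacing the idf table with all ones, is one quick way to probe the "Does k matter?" and "Does IDF weighting help?" questions on a small example.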