Week 3 Outline

  • Discuss experience summaries
  • Review quiz 1
  • Review analysis with term frequency vectors
  • Text Compare2 code
  • nltk library for tokenizing
  • Term Frequency / Inverse Document Frequency (tf-idf)
  • Next steps
  • Term frequency analysis on course descriptions (code download)
  • Using k-nearest neighbor for (supervised) predictions
  • Discussion items:
    • Does IDF weighting help? Better weighting?
    • Does k matter?
    • Strategies for summarizing best k
    • Evaluating on training instances --- alternatives?
  • Ideas for experience reports
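
As a starting point for the discussion items, here is a minimal sketch of tf-idf weighting combined with a k-nearest-neighbor vote. It is not the course's download code: the function names, the regex tokenizer (standing in for nltk), the unsmoothed idf = log(N / df) variant, and the four toy "course descriptions" are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Simple stand-in for an nltk tokenizer (assumption): lowercase letter runs only.
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    # idf(t) = log(N / df(t)), one common unsmoothed variant.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(len(docs) / count) for term, count in df.items()}

def tf_idf(text, idf):
    # Raw term counts times idf; terms never seen in training are dropped.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(query_vec, train_vecs, train_labels, k):
    # Rank training instances by similarity, then take a majority vote over the top k.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data, not the real course descriptions.
docs = [
    "introduction to programming with python",
    "data structures and algorithms in python",
    "survey of american literature and poetry",
    "creative writing and poetry workshop",
]
labels = ["cs", "cs", "english", "english"]

idf = fit_idf(docs)
train_vecs = [tf_idf(d, idf) for d in docs]

for k in (1, 3):
    query = tf_idf("python algorithms", idf)
    print(k, knn_predict(query, train_vecs, labels, k))
```

Swapping tf_idf for plain term counts, or changing k, gives a quick way to probe the questions above. Note that the queries here are new text; predicting labels for the training descriptions themselves would let each instance match itself, which is the concern behind the "evaluating on training instances" item.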