Assignment 8
Text analysis with dictionaries, sets and loops
Submit before 11:30 PM Sunday November 12
Overview
You will expand on the functionality of the text analysis code from the last assignment. The enhancements will allow you to compare and recommend texts based on their similarity scores.
Python Functions and Scripts
This assignment requires use of code in the zipped a8 folder. It includes updates to the functions from the last assignment, plus some additional new functions that demonstrate reading files and comparing their similarities.
Complete the following steps, answering any questions:
- Study the function clean_word. Test it on a variety of words with different endings and capitalizations. Explain what it does.
- Revisit the display_counts function from the last assignment. Try it out with different files in the files list (e.g. display_counts(files[0])). Note: if you work on a Windows computer, you might need to modify the strings in the files list so that they use backslashes (\) instead of forward slashes.
- Try out the compare_text function with two files from the files list (e.g. compare_text(files[0], files[1])). Identify a pair of files with a high similarity score and a pair with a low similarity score. Report them.
- Write a function called recommend_text that takes a file name (string) as a parameter. It should return the file name (string) from the files list that refers to the text most similar to the text in the given file. This function should invoke the compare_text function.
- Write a function called best_pair. It should consider all non-repeating pairs of files in the files list and return the pair of text files with the highest similarity score. This function should invoke the compare_text function.
- Modify the count_word function so that it doesn't include words in the stop_list set. Rerun your tests and review the results. Do you notice any changes?
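On the Windows note in the display_counts step: rather than editing each string by hand, a path can be built with the standard library's os.path.join, which inserts the separator for the current operating system. The folder and file names below are hypothetical placeholders, not the actual contents of files. If you do type backslashes into a string literal, use a raw string (r"texts\notes.txt") or doubled backslashes, because sequences such as \n and \t are otherwise read as escape characters.

```python
import os

# HYPOTHETICAL folder and file names; substitute the entries from the a8 files list.
folder = "texts"
names = ["book1.txt", "book2.txt"]

# os.path.join uses the current OS separator: "/" on macOS/Linux, "\" on Windows.
portable_files = [os.path.join(folder, name) for name in names]
print(portable_files)
```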
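The recommend_text and best_pair steps both reduce to "call compare_text on candidates and keep the highest score". The sketch below is one possible approach, not the expected solution. Because the real files list and compare_text live in the a8 folder, it defines HYPOTHETICAL stand-ins: a small in-memory TEXTS dictionary and a compare_text based on Jaccard similarity of word sets, which may differ from the a8 scoring. It also assumes recommend_text should not return the input file itself.

```python
from itertools import combinations

# HYPOTHETICAL stand-ins for the a8 code: the real files list holds paths to
# text files, and the real compare_text reads them. Here each "file name"
# maps to a short in-memory text, and compare_text uses Jaccard similarity
# (shared words / all distinct words) purely for illustration.
TEXTS = {
    "texts/a.txt": "the cat sat on the mat",
    "texts/b.txt": "the cat lay on the mat",
    "texts/c.txt": "stocks fell sharply today",
}
files = list(TEXTS)


def compare_text(file1, file2):
    """Stand-in similarity score between 0.0 and 1.0."""
    words1 = set(TEXTS[file1].split())
    words2 = set(TEXTS[file2].split())
    return len(words1 & words2) / len(words1 | words2)


def recommend_text(filename):
    """Return the file in files most similar to filename.

    Assumes filename itself should not be recommended, so it is skipped.
    """
    others = [f for f in files if f != filename]
    # max calls compare_text once per candidate and keeps the highest score
    return max(others, key=lambda other: compare_text(filename, other))


def best_pair():
    """Return the pair of files with the highest compare_text score."""
    # combinations(files, 2) yields each unordered pair exactly once and
    # never pairs a file with itself, i.e. the "non-repeating pairs"
    return max(combinations(files, 2), key=lambda pair: compare_text(*pair))


print(recommend_text("texts/a.txt"))  # texts/b.txt
print(best_pair())                    # ('texts/a.txt', 'texts/b.txt')
```

To adapt this to the a8 code, drop the stand-ins (TEXTS, files, compare_text) and keep only the two functions. For n files, best_pair makes n(n-1)/2 calls to compare_text, so it can be slow if compare_text re-reads both files on every call.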
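The stop-list change in the last step comes down to a set membership test before a word is counted. The exact signature of count_word in the a8 code is not reproduced in this handout, so the sketch below uses a HYPOTHETICAL helper, count_words_excluding_stops, and a HYPOTHETICAL stop_list. It assumes the words have already been normalized (for example by clean_word) so that they match the lowercase entries in the set.

```python
# HYPOTHETICAL stop_list; the real one is defined in the a8 code.
stop_list = {"the", "a", "an", "and", "of", "to", "on"}


def count_words_excluding_stops(words):
    """Count each word, skipping any word found in stop_list (hypothetical helper)."""
    counts = {}
    for word in words:
        if word in stop_list:
            continue  # set membership is a fast average-case O(1) lookup
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words_excluding_stops("the cat sat on the mat".split()))
# {'cat': 1, 'sat': 1, 'mat': 1}
```

Because very common words such as "the" and "and" no longer dominate the counts, expect the similarity scores from the earlier steps to shift once compare_text relies on the filtered counts.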
Deliverables
Create a text file called assn8.txt that contains the following:
- A statement that summarizes your completion of the assignment. It should contain the following information:
  - Any help or resources that you used in completing the assignment.
  - A summary of your experience, possibly including your approach, difficulties and the time you spent.
- For each of your Python scripts or functions:
  - A listing of the code
  - Running examples that demonstrate that your code works correctly
Submit your assn8.txt file to D2L.