{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# IMDb guide" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Learning to think in code is not easy if you've never programmed before. If you need a nudge along the way while solving the IMDb problems presented in class, here follows some pseudocode/advice for each task. There will be no answers in here, just tips and advice in how to solve the problems." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find the number of movies per genre" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. A good starting point here is to use the code from yesterdays last IMDb exercise as a base.\n", "2. What you want to do here is count the number of occurrences of each genre in the file, dictionaries would be a good option here. Initialize an empty dictionary outside the loop.\n", "3. Loop over the genre list that you retrieved for each movie, and use an `if/else` statement to check if the genre is present as a key in the dictionary. If it is not, add it as a key with the value 1 (as this is the first occurrence of the genre). Remember to think about the upper/lower case problem, and make everything lowercase before comparing or adding to the dictionary.\n", "4. If the genre is already present as a key in the dictionary, just add 1 to the value associated with that key.\n", "5. After the loop, print the dictionary and the numbers for each genre should be stated." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is the average length of the movies (hours and minutes) in each genre?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. This question is similar to the previous, but instead of number of movies in each genre we want to find out the average length of the movies in each genre. So a good starting point is to use the same dictionary that we used in the previous exercise.\n", "2. Locate the column where the runtime is listed (you can manually inspect the file) and save it to a variable.\n", "3. Instead of just counting the number of occurrences for each genre, we here want to save all the runtimes for movies in a specific genre. So instead of having an integer as a value in our dictionary, we can have a list containing the runtimes instead. If the genre does not exist, add it to the list as a value, containing the runtime. If the genre exists already, append the runtime to the already existing list.\n", "4. Once you have a dictionary with the runtimes for all the movies listed, we need to loop over this dictionary to calculate the average. But first, print your dictionary just to visually inspect how it looks.\n", "5. For each genre in the dictionary, calculate the mean runtime by summing up all runtimes in the list, and divide by the number of entries.\n", "6. As this gives you the runtime in seconds, we need to convert this to hours first, so divide the number with 3600, and save it as an `integer` to get the number of whole hours.\n", "7. Calculate how many minutes are left by `runtime-(hours*3600)/60`. As we don't care about the seconds, you can just round the minutes.\n", "8. Print genre, hours, and minutes for each iteration in the loop.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Re-structure and write the output to a new file as below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Drawing\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a tricky one!\n", "\n", "1. Use `sys.argv` and use for defining input- and output files on the command line.\n", "1. Find out what information needs to be saved, and how to save it in a smart way. In this case it's still sorted according to genres, which means we can re-use the genre dictionary we have used before. We also want to save information from every movie, just like with the runtime, so the previous code still works. The difference here is that we want to save even more information, so instead of just appending a value to the list, as we did with the runtime, we have to append more. In this case a list in a list is a good option. It could for example in the end have the structure as below:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`{'drama': [[8.5, 'Movie1', 1987, '1h42min'], [7.3, 'Movie2', 1999, '2h12min']].....}`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Save all the relevant information in new variables (rating, year, runtime, movie, genre, etc) when looping over the file.\n", "4. Calculate the runtime for the movie using the code from before. You can just do it in the loop directly, or break it out and make it into a function if you want.\n", "5. Once you have saved all the important information into your dictionary and looped over the file, it's time to reformat it and write it to the output file. Start by opening the output file in `write` mode.\n", "6. Loop over the dictionary and write the output to file. Here you have to loop twice, first `for genre in dictionary`, where you write the genre to file. Then `for movie in genre` to write the information for each movie under the correct genre." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }