{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## IMDb Day 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The format of the IMDb file is: \n", "- Line by line \n", "- Columns separated by the \\| character\n", "- Header starting with #\n", "\n", "<img src=\"../../lectures/img/header_imdb.png\" alt=\"Drawing\" style=\"width: 1200px;\"/> " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Find the number of unique genres\n", "\n", "Watch out for upper/lower case!\n", "\n", "The correct answer is 22" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Find the number of movies per genre\n", "\n", "Correct answers:\n", "\n", "<img src=\"../../lectures/img/movie_dict.png\" alt=\"Drawing\" style=\"width: 500px;\"/> " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. (Optional/Extra) What is the average length of the movies (hours and minutes) in each genre?\n", "\n", "Here you have to loop twice!\n", "\n", "Correct answers:\n", "\n", "<img src=\"../../lectures/img/average_length.png\" alt=\"Drawing\" style=\"width: 500px;\"/> " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. (Advanced) Re-structure and write the output to a new file as below\n", "\n", "<img src=\"../../lectures/img/re-structured.png\" alt=\"Drawing\" style=\"width: 400px;\"/> \n", "\n", "Note:\n", "- Use a text editor, not a notebook, for this\n", "- Use functions as much as possible\n", "- Use `sys.argv` for input/output\n", "\n", "<br><br><br><br><br><br><br>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tips if you're unsure how to start\n", "\n", "As always in coding, there are many different ways of writing code that achieve the same end result. Below is one way of thinking about these problems; there are of course many others.\n", "\n", "### 1. Find the number of unique genres\n", "\n", "1. 
Create an empty list outside the loop where you will collect all the different genres\n", "2. Start by reading the file and splitting up the columns, just as you did in yesterday's exercise\n", "3. Identify the column where all the genres for a movie are listed, and split this column into a list\n", "4. Loop over this list of genres and add each genre to your empty list from step 1, UNLESS IT IS ALREADY THERE (remember that the same genre can appear with different upper/lower case)\n", "5. After looping over all lines, check the length of your list from step 1\n", "\n", "### 2. Find the number of movies per genre\n", "\n", "1. Use the code from above, but instead of creating an empty list before starting to loop over the file, create an empty dictionary\n", "2. When looping over the genres, check whether each genre is already in the dictionary. If it is not, add it and assign it the value 1. If it is already present, increase its value by 1.\n", "\n", "### 3. What is the average length of the movies (hours and minutes) in each genre?\n", "\n", "1. Use the code above, but instead of assigning the value 1 to each genre initially, add the runtime of the movie as a list item\n", "2. For each new movie, append the runtime to the existing list, so that by the end of the loop you have, for each genre, a list of the runtimes for all movies in that genre\n", "3. Loop over the dictionary and calculate the average of each list\n", "4. Convert the average (which is in seconds) to hours and minutes by dividing appropriately\n", "5. Print the results, or save them to a variable or file\n", "\n", "### 4. Re-structure and write the output to a new file as below\n", "\n", "1. Use the code above, but instead of just adding the runtime as a list element to each genre, add a list (or tuple) of items (rating, movie, year, runtime) to the list. In the end you will have, for each genre, a list of lists (or tuples) containing all the relevant information for each movie\n", "2. 
Loop over the dictionary and write the content of the dictionary to a new file with the correct formatting" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" } }, "nbformat": 4, "nbformat_minor": 4 }