{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Recommend me, Senpai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Team: GIT -REKT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Team\n", "* Nicolas Botello\n", "* Yang Yang\n", "* Austin Tang\n", "* Connor Flatt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview & Motivation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Have you ever wondered if a picture could say a thousand words? Or have you ever wondered if a picture has any influence on how we make decisions? \n", "These are questions the GIT -REKT team pondered at the beginning of the “recommend me, senpai” project. The team received a dataset from “myanimelist.com” from a fellow researcher. The dataset holds a large amount of data, including user reviews and anime image links. In addition, we found a library called illustration2vec that analyzes anime images and returns labels for the images sent to it. With this data and illustration2vec, we could look for answers to the questions we asked at the beginning of our analysis. \n", "Since we had anime pictures and a library to analyze them with, we decided to pose a more focused question than our initial ones. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Related Work\n", "We found content-based movie recommendation systems interesting and wanted to predict whether someone will enjoy something based on an image. We also found [illustration2vec](http://illustration2vec.net/papers/illustration2vec-main.pdf) and their paper, which seemed well suited to extracting data from anime pictures. Google Photos tag search was a good example of extracting metadata tags in order to identify pictures. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The main question we decided to answer was: can we judge a video by its cover?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Initial Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data loaded below was extracted by following the scripts described in the READ.md file, which filter the full 'myanimelist.com' database down to anime only. We then extracted just the cover photo for each anime and used that. After our first analyses we felt we needed more pictures, so we scraped 'myanimelist.com' and each anime's page to retrieve more images and information. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load In Information" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the label data, keeping only labels that have a higher than 50% chance of appearing in the picture" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import json, os \n", "base_path=''\n", "json_data_path= os.path.join(base_path,'labels-min.json')\n", "json2_data_path= os.path.join(base_path,'labels2-min.json')\n", "rows = []\n", "label_freq= {}\n", "numOfLabelsPerAnime =[]\n", "total_labels=0\n", "for data_path in [json_data_path, json2_data_path]:\n", "    with open(data_path,'r') as labels_file:\n", "        for line in labels_file:\n", "            labels = {}\n", "            pic=json.loads(line)\n", "            labels[\"id\"]=pic[\"id\"]\n", "            labels_dict = pic[\"labels\"][0]\n", "            #only the 'general' group is used; the other groups are not needed:\n", "            #copyright : gives anime names\n", "            #character : gives the name of the character in the picture\n", "            #rating : says whether the image is safe or not\n", "            #general : gives general labels (used below)\n", "            for item in labels_dict[\"general\"]:\n", "                if item[1] > .5:\n", "                    total_labels+=1\n", "                    labels[item[0]]=item[1]\n", "                    #count occurrences of each label\n", "                    if item[0] in label_freq:\n", "                        label_freq[item[0]]= label_freq[item[0]] +1\n", "                        #print label_freq[item[0]]\n", "                    else:\n", "                        label_freq[item[0]]=1\n", "            numOfLabelsPerAnime.append(len(labels)-1)\n", "            rows.append(labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Number of picture records that have labels" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13084\n" ] } ], "source": [ "print len(rows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Average, maximum, and minimum number of labels per picture" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average: 4\n", "Max: 32\n", "Min: 1\n" ] } ], "source": [ "print \"Average: \" + str(sum(numOfLabelsPerAnime)/len(numOfLabelsPerAnime))\n", "print \"Max: \" + str(max(numOfLabelsPerAnime))\n", "print \"Min: \" + str(min(numOfLabelsPerAnime))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute each label's share of all label occurrences in the complete dataset" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#convert raw occurrence counts to fractions of all label occurrences\n", "\n", "for key,value in label_freq.iteritems():\n", "    label_freq[key]= float(value)/ total_labels" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame(rows) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example of the data: \n", "Each column is a label, and each value is the confidence (between 0 and 1) that the label appears in the picture" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
|   | 1boy | 1girl | 2boys | 2girls | 3boys | 3girls | 4girls | 5girls | 6+girls | :3 | ... | white legwear | white panties | wings | witch hat | wolf ears | wrist cuffs | yellow eyes | younger | yuri | zettai ryouiki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.000000 | 0.55814 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.575337 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

5 rows × 309 columns
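
To make the loading step behind this table easier to follow, here is a minimal sketch that runs the same idea on a single made-up record. The record layout (one JSON object per line, with `pic["labels"][0]["general"]` holding `[label, confidence]` pairs) is taken from the loading cell; the sample values and the helper names `parse_labels` and `label_shares` are our own illustrations and not part of the original notebook.

```python
import json
from collections import Counter

# Hypothetical record in the layout the loading cell reads.
# Only the "general" group is shown; the other groups are omitted.
sample_line = json.dumps({
    "id": 1,
    "labels": [{
        "general": [["1girl", 0.91], ["smile", 0.62], ["hat", 0.31]],
    }],
})

THRESHOLD = 0.5  # same cutoff as the loading cell


def parse_labels(line, threshold=THRESHOLD):
    """Return {"id": ..., label: confidence} for labels above the threshold."""
    pic = json.loads(line)
    row = {"id": pic["id"]}
    for name, score in pic["labels"][0]["general"]:
        if score > threshold:
            row[name] = score
    return row


def label_shares(rows):
    """Fraction of all kept label occurrences that each label accounts for."""
    counts = Counter(name for row in rows for name in row if name != "id")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / float(total) for name, count in counts.items()}


row = parse_labels(sample_line)
shares = label_shares([row])
print(sorted(row.keys()))
print(shares)
```

Here "hat" is dropped because its confidence is below the cutoff, and the two remaining labels each account for half of the kept occurrences.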

|   | id | image | title |
|---|---|---|---|
| 0 | 890.0 | http://cdn.myanimelist.net/images/anime/12/211... | Yuusha-Ou GaoGaiGar |
| 1 | 2151.0 | http://cdn.myanimelist.net/images/anime/1/2296... | Nils no Fushigi na Tabi |
| 2 | 22507.0 | http://cdn.myanimelist.net/images/anime/3/6483... | Initial D Final Stage |
| 3 | 23421.0 | http://cdn.myanimelist.net/images/anime/3/7564... | Re:␣Hamatora |
| 4 | 1167.0 | http://cdn.myanimelist.net/images/anime/11/379... | Samurai Gun |

|   | 1boy | 1girl | 2boys | 2girls | 3boys | 3girls | 4girls | 5girls | 6+girls | :3 | ... | white legwear | white panties | wings | witch hat | wolf ears | wrist cuffs | yellow eyes | younger | yuri | zettai ryouiki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.000000 | 0.55814 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.575337 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

5 rows × 308 columns

|   | 1boy | 1girl | 2boys | 2girls | 3boys | 3girls | 4girls | 5girls | 6+girls | :3 | ... | white legwear | white panties | wings | witch hat | wolf ears | wrist cuffs | yellow eyes | younger | yuri | zettai ryouiki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

5 rows × 308 columns
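
The table above looks like a 0/1 version of the confidence table (308 label columns, without `id`). The cell that produced it is not shown here, so the following is only a rough sketch of one way to arrive at that shape, assuming the missing labels were filled with 0 and the `id` column was dropped first. The sample rows are made up.

```python
import pandas as pd

# Made-up rows shaped like the output of the loading cell:
# every stored confidence is already above the 0.5 cutoff.
rows = [
    {"id": 890, "1girl": 0.56},
    {"id": 2151, "1boy": 0.58, "hat": 0.71},
]

df = pd.DataFrame(rows).fillna(0)    # labels absent from a picture become 0
features = df.drop("id", axis=1)     # keep only the label columns
binary = (features > 0).astype(int)  # 1 = label present, 0 = label absent
print(binary)
```

Because only labels above the cutoff were kept when loading, testing for a non-zero value is enough to mark a label as present.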

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 890.0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2151.0 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 22507.0 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1167.0 |
| 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5226.0 |

5 rows × 41 columns