{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "04-Recommand.ipynb",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"metadata": {
"id": "b3vIWjxfy4uB",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"# **Chapter 4 | Recommender System**\n",
"Reference: https://www.machinelearningplus.com/nlp/cosine-similarity/\n",
"## **1 Loading the Data**"
]
},
{
"metadata": {
"id": "_-fO8fCqy4uD",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# ! apt-get update\n",
"# ! apt-get install g++ openjdk-8-jdk \n",
"# ! pip3 install nltk konlpy matplotlib gensim \n",
"\n",
"# ! apt-get install fonts-nanum-eco\n",
"# ! apt-get install fontconfig\n",
"# ! fc-cache -fv\n",
"# ! cp /usr/share/fonts/truetype/nanum/Nanum* /usr/local/lib/python3.6/dist-packages/matplotlib/mpl-data/fonts/ttf/\n",
"# ! rm -rf /content/.cache/matplotlib/*"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "nBagMLpZzEyD",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 72
},
"outputId": "57224cf2-c10e-4148-e7d7-3f5bf539d896"
},
"cell_type": "code",
"source": [
"import nltk\n",
"nltk.download('wordnet')\n",
"\n",
"import pandas as pd\n",
"import io, requests\n",
"\n",
"# Download the movie metadata CSV and keep only the columns used below.\n",
"url = \"https://raw.githubusercontent.com/YongBeomKim/nltk_basic/master/data/movies_metadata.csv\"\n",
"response = requests.get(url).content\n",
"movies = pd.read_csv(io.StringIO(response.decode('utf-8')),\n",
"                     usecols=['original_title', 'overview', 'title'], low_memory=False)\n",
"\n",
"# Drop rows with a missing value in any of the three columns.\n",
"movies = movies.dropna(axis=0)\n",
"movies.shape"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": [
"[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
"[nltk_data] Package wordnet is already up-to-date!\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(44506, 3)"
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"metadata": {
"id": "anzxC1Kxy4uH",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 138
},
"outputId": "c9d4aee2-a4eb-4e43-bae1-7abe04e80f8f"
},
"cell_type": "code",
"source": [
"# Plot summaries and titles as pandas Series that share the DataFrame index.\n",
"movie_plot_li = movies['overview']\n",
"movie_info_li = movies['title']\n",
"movies.head(3)"
],
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>original_title</th><th>overview</th><th>title</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Toy Story</td><td>Led by Woody, Andy's toys live happily in his ...</td><td>Toy Story</td></tr>\n",
"<tr><th>1</th><td>Jumanji</td><td>When siblings Judy and Peter discover an encha...</td><td>Jumanji</td></tr>\n",
"<tr><th>2</th><td>Grumpier Old Men</td><td>A family wedding reignites the ancient feud be...</td><td>Grumpier Old Men</td></tr>\n",
"</tbody>\n",
"</table>\n",
"