{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "\n", "***\n", "***\n", "# Data Scraping\n", " > # NetEase Cloud Music\n", "\n", "***\n", "***\n", "\n", "王成军 \n", "\n", "wangchengjun@nju.edu.cn\n", "\n", "计算传播网 http://computational-communication.com\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "https://github.com/RitterHou/music-163\n", "\n", "Scrape the comment count of every song on NetEase Cloud Music. The main steps are:\n", "\n", "- Scrape all artist information (artists.py);\n", "- Use the artists collected in the previous step to scrape all album information (album_by_artist.py);\n", "- Use the album information to scrape all song information (music_by_album.py);\n", "- Use the song information to scrape each song's comment count (comments_by_music.py)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Scraping all artist information (artists.py)" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-05-10T03:06:06.662665Z", "start_time": "2018-05-10T03:06:06.655299Z" }, "slideshow": { "slide_type": "slide" } }, "source": [ "Inspect the HTML structure of the NetEase Cloud Music website\n", "\n", "http://music.163.com/" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-05-10T05:06:03.732955Z", "start_time": "2018-05-10T05:06:03.727816Z" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "\n", "http://music.163.com/#/discover/artist/cat" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "http://music.163.com/#/discover/artist/cat?id=4003&initial=0" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:46:31.341755Z", "start_time": "2019-06-08T06:46:31.101827Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import requests\n", "from bs4 import BeautifulSoup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:46:36.971775Z", "start_time": "2019-06-08T06:46:36.965137Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], 
"source": [ "headers = {\n", " 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n", " 'Accept-Encoding': 'gzip, deflate, sdch',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',\n", " 'Cache-Control': 'no-cache',\n", " 'Connection': 'keep-alive',\n", " 'Cookie': '_ntes_nnid=7eced19b27ffae35dad3f8f2bf5885cd,1476521011210; _ntes_nuid=7eced19b27ffae35dad3f8f2bf5885cd; usertrack=c+5+hlgB7TgnsAmACnXtAg==; Province=025; City=025; NTES_PASSPORT=6n9ihXhbWKPi8yAqG.i2kETSCRa.ug06Txh8EMrrRsliVQXFV_orx5HffqhQjuGHkNQrLOIRLLotGohL9s10wcYSPiQfI2wiPacKlJ3nYAXgM; P_INFO=hourui93@163.com|1476523293|1|study|11&12|jis&1476511733&mail163#jis&320100#10#0#0|151889&0|g37_client_check&mailsettings&mail163&study&blog|hourui93@163.com; NTES_SESS=Fa2uk.YZsGoj59AgD6tRjTXGaJ8_1_4YvGfXUkS7C1NwtMe.tG1Vzr255TXM6yj2mKqTZzqFtoEKQrgewi9ZK60ylIqq5puaG6QIaNQ7EK5MTcRgHLOhqttDHfaI_vsBzB4bibfamzx1.fhlpqZh_FcnXUYQFw5F5KIBUmGJg7xdasvGf_EgfICWV; S_INFO=1476597594|1|0&80##|hourui93; NETEASE_AUTH_SOURCE=space; NETEASE_AUTH_USERNAME=hourui93; _ga=GA1.2.1405085820.1476521280; JSESSIONID-WYYY=cbd082d2ce2cffbcd5c085d8bf565a95aee3173ddbbb00bfa270950f93f1d8bb4cb55a56a4049fa8c828373f630c78f4a43d6c3d252c4c44f44b098a9434a7d8fc110670a6e1e9af992c78092936b1e19351435ecff76a181993780035547fa5241a5afb96e8c665182d0d5b911663281967d675ff2658015887a94b3ee1575fa1956a5a%3A1476607977016; _iuqxldmzr_=25; __utma=94650624.1038096298.1476521011.1476595468.1476606177.8; __utmb=94650624.20.10.1476606177; __utmc=94650624; __utmz=94650624.1476521011.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", " 'DNT': '1',\n", " 'Host': 'music.163.com',\n", " 'Pragma': 'no-cache',\n", " 'Referer': 'http://music.163.com/',\n", " 'Upgrade-Insecure-Requests': '1',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'\n", "}" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": 
"2019-06-08T06:48:24.064616Z", "start_time": "2019-06-08T06:48:23.830515Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "group_id = 1001\n", "initial = 67\n", "params = {'id': group_id, 'initial': initial}\n", "r = requests.get('http://music.163.com/discover/artist/cat', params=params, headers=headers)\n", "\n", "# 网页解析\n", "soup = BeautifulSoup(r.content.decode(), 'html.parser')\n", "body = soup.body" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:48:24.943027Z", "start_time": "2019-06-08T06:48:24.917495Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "hotartist_dic = {}\n", "hot_artists = body.find_all('a', attrs={'class': 'msk'})\n", "for artist in hot_artists:\n", " artist_id = artist['href'].replace('/artist?id=', '').strip()\n", " artist_name = artist['title'].replace('的音乐', '')\n", " try:\n", " hotartist_dic[artist_id] = artist_name\n", " except Exception as e:\n", " # 打印错误日志\n", " print(e)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:48:26.078504Z", "start_time": "2019-06-08T06:48:26.052773Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "artist_dic = {}\n", "artists = body.find_all('a', attrs={'class': 'nm nm-icn f-thide s-fc0'})\n", "for artist in artists:\n", " artist_id = artist['href'].replace('/artist?id=', '').strip()\n", " artist_name = artist['title'].replace('的音乐', '')\n", " try:\n", " artist_dic[artist_id] = artist_name\n", " except Exception as e:\n", " # 打印错误日志\n", " print(e)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:48:27.132271Z", "start_time": "2019-06-08T06:48:27.125570Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'1007135': '陈小虎',\n", " '1014024': '茶季杨',\n", " '1028199': '菜小宝丶',\n", " '1041027': 'Chace',\n", " 
'1043290': '传琦SAMA',\n", " '1043304': '曹思义',\n", " '1044054': 'R7CKY',\n", " '1046063': '蔡方萌',\n", " '1049979': '陈建斌',\n", " '1051032': '陈坤',\n", " '1058228': '陈鸿宇',\n", " '1060096': '大彻',\n", " '1081291': 'CloudINJoke',\n", " '1084178': 'Candy_Wind',\n", " '1124141': '残泪杰',\n", " '1132979': '蔡照',\n", " '1149014': '厨房仔',\n", " '1152025': '陈知&游园惊梦',\n", " '12010027': '陈柏霖',\n", " '12023426': 'CARTA',\n", " '12079044': '陈硕子',\n", " '12091078': '崔航',\n", " '12095299': '陈柯宇',\n", " '12131566': 'CLOUDWANG 王云',\n", " '12132471': '陈信喆',\n", " '12301031': '陈圣夫',\n", " '12312092': '曹方瑞',\n", " '12474254': '陈飞宇',\n", " '12488046': '陈玮镔',\n", " '12634049': '丛铭君',\n", " '13056440': '陈立农',\n", " '13057490': '陈斯琪',\n", " '13059431': '陈名豪',\n", " '13228454': '蔡威泽',\n", " '14100493': '崔伟立',\n", " '14471208': '陈丙',\n", " '14713124': '陈彦希REGI',\n", " '168021': '陈俊彤',\n", " '168042': '曹寅',\n", " '189379': '陈鹏杰',\n", " '2110': '曹格',\n", " '2111': '崔健',\n", " '2112': '陈小春',\n", " '2113': '车继铃',\n", " '2115': '陈百强',\n", " '2116': '陈奕迅',\n", " '2117': '侧田',\n", " '2118': '成龙',\n", " '2119': '蔡国庆',\n", " '2121': '蔡旻佑',\n", " '2122': '陈冠希',\n", " '2124': '陈楚生',\n", " '2125': '陈晓东',\n", " '2127': '陈柏宇',\n", " '2130': '蔡国权',\n", " '2131': '川子',\n", " '2133': '陈奂仁',\n", " '2134': '陈坤',\n", " '2135': '陈勋奇',\n", " '2138': '陈浩民',\n", " '2141': '陈雷',\n", " '2147': '陈旭',\n", " '2150': '陈冠蒲',\n", " '2153': '常石磊',\n", " '2159': '曹轩宾',\n", " '2164': '陈伟霆',\n", " '2174': '陈底里',\n", " '2201': '陈势安',\n", " '2202': '陈其钢',\n", " '2204': '陈辉权',\n", " '2209': '陈少华',\n", " '2213': '苍茫',\n", " '2230': '陈雅森',\n", " '224360': '陈亮',\n", " '2276': '崔京浩',\n", " '2294': '陈光荣',\n", " '2312': 'C AllStar',\n", " '2330': '晨熙',\n", " '2331': '陈翔',\n", " '2336': '程池',\n", " '2338': '船长',\n", " '2375': '蔡志展',\n", " '2414': '曹越',\n", " '2428': '陈劭康KOMIC',\n", " '2440': '陈伟伦',\n", " '2448': '曹秦',\n", " '31055': 'Cee',\n", " '6521': '曾航生',\n", " '6557': '曾一鸣',\n", " '6608': '曾志伟',\n", " '6862': '曾志豪',\n", " '727005': 
'陈赫',\n", " '826307': '陈晓',\n", " '900139': '曾经艺也',\n", " '941023': '曹槽',\n", " '958173': '才让东珠',\n", " '963409': '陈学冬',\n", " '963431': '陈致逸',\n", " '964450': '陈奕夫',\n", " '991003': '崔跃文'}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artist_dic" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:48:56.865253Z", "start_time": "2019-06-08T06:48:56.821547Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def save_artist(group_id, initial, hot_artist_dic, artist_dic):\n", "    params = {'id': group_id, 'initial': initial}\n", "    r = requests.get('http://music.163.com/discover/artist/cat', params=params)\n", "\n", "    # Parse the page\n", "    soup = BeautifulSoup(r.content.decode(), 'html.parser')\n", "    body = soup.body\n", "\n", "    hot_artists = body.find_all('a', attrs={'class': 'msk'})\n", "    artists = body.find_all('a', attrs={'class': 'nm nm-icn f-thide s-fc0'})\n", "    for artist in hot_artists:\n", "        artist_id = artist['href'].replace('/artist?id=', '').strip()\n", "        artist_name = artist['title'].replace('的音乐', '')\n", "        try:\n", "            hot_artist_dic[artist_id] = artist_name\n", "        except Exception as e:\n", "            # Print the error\n", "            print(e)\n", "\n", "    for artist in artists:\n", "        artist_id = artist['href'].replace('/artist?id=', '').strip()\n", "        artist_name = artist['title'].replace('的音乐', '')\n", "        try:\n", "            artist_dic[artist_id] = artist_name\n", "        except Exception as e:\n", "            # Print the error\n", "            print(e)\n", "    # return artist_dic, hot_artist_dic\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:49:26.692104Z", "start_time": "2019-06-08T06:49:25.980456Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "gg = 1001\n", "initial = 67\n", "artist_dic = {}\n", "hot_artist_dic = {}\n", "save_artist(gg, initial, hot_artist_dic, artist_dic)" ] }, { "cell_type": "code", 
"execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:49:27.218844Z", "start_time": "2019-06-08T06:49:27.211969Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'1007135': '陈小虎',\n", " '1014024': '茶季杨',\n", " '1028199': '菜小宝丶',\n", " '1041027': 'Chace',\n", " '1043290': '传琦SAMA',\n", " '1043304': '曹思义',\n", " '1044054': 'R7CKY',\n", " '1046063': '蔡方萌',\n", " '1049979': '陈建斌',\n", " '1051032': '陈坤',\n", " '1058228': '陈鸿宇',\n", " '1060096': '大彻',\n", " '1081291': 'CloudINJoke',\n", " '1084178': 'Candy_Wind',\n", " '1124141': '残泪杰',\n", " '1132979': '蔡照',\n", " '1149014': '厨房仔',\n", " '1152025': '陈知&游园惊梦',\n", " '12010027': '陈柏霖',\n", " '12023426': 'CARTA',\n", " '12079044': '陈硕子',\n", " '12091078': '崔航',\n", " '12095299': '陈柯宇',\n", " '12131566': 'CLOUDWANG 王云',\n", " '12132471': '陈信喆',\n", " '12301031': '陈圣夫',\n", " '12312092': '曹方瑞',\n", " '12474254': '陈飞宇',\n", " '12488046': '陈玮镔',\n", " '12634049': '丛铭君',\n", " '13056440': '陈立农',\n", " '13057490': '陈斯琪',\n", " '13059431': '陈名豪',\n", " '13228454': '蔡威泽',\n", " '14100493': '崔伟立',\n", " '14471208': '陈丙',\n", " '14713124': '陈彦希REGI',\n", " '168021': '陈俊彤',\n", " '168042': '曹寅',\n", " '189379': '陈鹏杰',\n", " '2110': '曹格',\n", " '2111': '崔健',\n", " '2112': '陈小春',\n", " '2113': '车继铃',\n", " '2115': '陈百强',\n", " '2116': '陈奕迅',\n", " '2117': '侧田',\n", " '2118': '成龙',\n", " '2119': '蔡国庆',\n", " '2121': '蔡旻佑',\n", " '2122': '陈冠希',\n", " '2124': '陈楚生',\n", " '2125': '陈晓东',\n", " '2127': '陈柏宇',\n", " '2130': '蔡国权',\n", " '2131': '川子',\n", " '2133': '陈奂仁',\n", " '2134': '陈坤',\n", " '2135': '陈勋奇',\n", " '2138': '陈浩民',\n", " '2141': '陈雷',\n", " '2147': '陈旭',\n", " '2150': '陈冠蒲',\n", " '2153': '常石磊',\n", " '2159': '曹轩宾',\n", " '2164': '陈伟霆',\n", " '2174': '陈底里',\n", " '2201': '陈势安',\n", " '2202': '陈其钢',\n", " '2204': '陈辉权',\n", " '2209': '陈少华',\n", " '2213': '苍茫',\n", " '2230': '陈雅森',\n", " '224360': '陈亮',\n", " '2276': '崔京浩',\n", " '2294': '陈光荣',\n", " '2312': 
'C AllStar',\n", " '2330': '晨熙',\n", " '2331': '陈翔',\n", " '2336': '程池',\n", " '2338': '船长',\n", " '2375': '蔡志展',\n", " '2414': '曹越',\n", " '2428': '陈劭康KOMIC',\n", " '2440': '陈伟伦',\n", " '2448': '曹秦',\n", " '31055': 'Cee',\n", " '6521': '曾航生',\n", " '6557': '曾一鸣',\n", " '6608': '曾志伟',\n", " '6862': '曾志豪',\n", " '727005': '陈赫',\n", " '826307': '陈晓',\n", " '900139': '曾经艺也',\n", " '941023': '曹槽',\n", " '958173': '才让东珠',\n", " '963409': '陈学冬',\n", " '963431': '陈致逸',\n", " '964450': '陈奕夫',\n", " '991003': '崔跃文'}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artist_dic" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:50:30.406054Z", "start_time": "2019-06-08T06:50:20.699560Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "65\n", "66\n", "67\n", "68\n", "69\n", "70\n", "71\n", "72\n", "73\n", "74\n", "75\n", "76\n", "77\n", "78\n", "79\n", "80\n", "81\n", "82\n", "83\n", "84\n", "85\n", "86\n", "87\n", "88\n", "89\n", "90\n" ] } ], "source": [ "artist_dic = {}\n", "hot_artist_dic = {} \n", "for i in range(65, 91):\n", " print(i)\n", " save_artist(gg, i, hot_artist_dic, artist_dic )" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:50:34.393601Z", "start_time": "2019-06-08T06:50:34.389092Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "260" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(hot_artist_dic)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:50:36.174734Z", "start_time": "2019-06-08T06:50:36.170101Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2329" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "len(artist_dic)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 爬取所有的专辑信息(album_by _artist.py)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:15:13.091212Z", "start_time": "2018-05-09T10:15:13.086629Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'89659'" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(hot_artist_dic.keys())[0]" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:11:23.224190Z", "start_time": "2018-05-09T10:11:23.220284Z" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "http://music.163.com/#/artist/album?id=89659&limit=400" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "headers = {\n", " 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n", " 'Accept-Encoding': 'gzip, deflate, sdch',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',\n", " 'Cache-Control': 'no-cache',\n", " 'Connection': 'keep-alive',\n", " 'Cookie': '_ntes_nnid=7eced19b27ffae35dad3f8f2bf5885cd,1476521011210; _ntes_nuid=7eced19b27ffae35dad3f8f2bf5885cd; usertrack=c+5+hlgB7TgnsAmACnXtAg==; Province=025; City=025; _ga=GA1.2.1405085820.1476521280; NTES_PASSPORT=6n9ihXhbWKPi8yAqG.i2kETSCRa.ug06Txh8EMrrRsliVQXFV_orx5HffqhQjuGHkNQrLOIRLLotGohL9s10wcYSPiQfI2wiPacKlJ3nYAXgM; P_INFO=hourui93@163.com|1476523293|1|study|11&12|jis&1476511733&mail163#jis&320100#10#0#0|151889&0|g37_client_check&mailsettings&mail163&study&blog|hourui93@163.com; 
JSESSIONID-WYYY=189f31767098c3bd9d03d9b968c065daf43cbd4c1596732e4dcb471beafe2bf0605b85e969f92600064a977e0b64a24f0af7894ca898b696bd58ad5f39c8fce821ec2f81f826ea967215de4d10469e9bd672e75d25f116a9d309d360582a79620b250625859bc039161c78ab125a1e9bf5d291f6d4e4da30574ccd6bbab70b710e3f358f%3A1476594130342; _iuqxldmzr_=25; __utma=94650624.1038096298.1476521011.1476588849.1476592408.6; __utmb=94650624.11.10.1476592408; __utmc=94650624; __utmz=94650624.1476521011.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", " 'DNT': '1',\n", " 'Host': 'music.163.com',\n", " 'Pragma': 'no-cache',\n", " 'Referer': 'http://music.163.com/',\n", " 'Upgrade-Insecure-Requests': '1',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'\n", "}" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:51:57.760465Z", "start_time": "2019-06-08T06:51:57.732887Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def save_albums(artist_id, albume_dic):\n", "    params = {'id': artist_id, 'limit': '200'}\n", "    # Fetch the artist's album page\n", "    r = requests.get('http://music.163.com/artist/album', headers=headers, params=params)\n", "\n", "    # Parse the page\n", "    soup = BeautifulSoup(r.content.decode(), 'html.parser')\n", "    body = soup.body\n", "\n", "    albums = body.find_all('a', attrs={'class': 'tit s-fc0'})  # Find all album links\n", "\n", "    for album in albums:\n", "        albume_id = album['href'].replace('/album?id=', '')\n", "        albume_dic[albume_id] = artist_id" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:51:59.852580Z", "start_time": "2019-06-08T06:51:59.156091Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "albume_dic = {}\n", "save_albums('2116', albume_dic)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:52:04.139520Z", 
"start_time": "2019-06-08T06:52:04.132673Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'2261058': '2116',\n", " '2261091': '2116',\n", " '2261147': '2116',\n", " '2302128': '2116',\n", " '2332713': '2116',\n", " '2336647': '2116',\n", " '2339617': '2116',\n", " '2374009': '2116',\n", " '2374010': '2116',\n", " '2374011': '2116',\n", " '2374012': '2116',\n", " '2374013': '2116',\n", " '2374014': '2116',\n", " '2465020': '2116',\n", " '2518003': '2116',\n", " '2532179': '2116',\n", " '2621232': '2116',\n", " '2692238': '2116',\n", " '2692239': '2116',\n", " '2692242': '2116',\n", " '2732645': '2116',\n", " '2767144': '2116',\n", " '2786670': '2116',\n", " '2793003': '2116',\n", " '2801259': '2116',\n", " '3070638': '2116',\n", " '3102567': '2116',\n", " '3109376': '2116',\n", " '3170625': '2116',\n", " '3184340': '2116',\n", " '3211014': '2116',\n", " '3279543': '2116',\n", " '3279818': '2116',\n", " '3319407': '2116',\n", " '3391071': '2116',\n", " '3404003': '2116',\n", " '34611604': '2116',\n", " '34735139': '2116',\n", " '34881554': '2116',\n", " '34923261': '2116',\n", " '34961173': '2116',\n", " '35398900': '2116',\n", " '35406784': '2116',\n", " '35411774': '2116',\n", " '35520072': '2116',\n", " '35643233': '2116',\n", " '35663692': '2116',\n", " '35835294': '2116',\n", " '36304576': '2116',\n", " '38296010': '2116',\n", " '6335': '2116',\n", " '6338': '2116',\n", " '6339': '2116',\n", " '6341': '2116',\n", " '6343': '2116',\n", " '6355': '2116',\n", " '6360': '2116',\n", " '6362': '2116',\n", " '6365': '2116',\n", " '6375': '2116',\n", " '6378': '2116',\n", " '6388': '2116',\n", " '6394': '2116',\n", " '6404': '2116',\n", " '6410': '2116',\n", " '6423': '2116',\n", " '6429': '2116',\n", " '6434': '2116',\n", " '6437': '2116',\n", " '6451': '2116',\n", " '6452': '2116',\n", " '6454': '2116',\n", " '6462': '2116',\n", " '6475': '2116',\n", " '6479': '2116',\n", " '6483': '2116',\n", " '6491': '2116',\n", " 
'6498': '2116',\n", " '6510': '2116',\n", " '6522': '2116',\n", " '6530': '2116',\n", " '6543': '2116',\n", " '6548': '2116',\n", " '6551': '2116',\n", " '6555': '2116',\n", " '6559': '2116',\n", " '6562': '2116',\n", " '6565': '2116',\n", " '6567': '2116',\n", " '6572': '2116',\n", " '6581': '2116',\n", " '6584': '2116',\n", " '6590': '2116',\n", " '6595': '2116',\n", " '6599': '2116',\n", " '6604': '2116',\n", " '6607': '2116',\n", " '6612': '2116',\n", " '6620': '2116',\n", " '6624': '2116',\n", " '74268947': '2116'}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "albume_dic" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Scraping all song information from the album information (music_by_album.py)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:53:21.908344Z", "start_time": "2019-06-08T06:53:21.875821Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def save_music(album_id, music_dic):\n", "    params = {'id': album_id}\n", "    # Fetch the album page\n", "    r = requests.get('http://music.163.com/album', headers=headers, params=params)\n", "    # Parse the page\n", "    soup = BeautifulSoup(r.content.decode(), 'html.parser')\n", "    body = soup.body\n", "    musics = body.find('ul', attrs={'class': 'f-hide'}).find_all('li')  # Find all songs in the album\n", "    for music in musics:\n", "        music = music.find('a')\n", "        music_id = music['href'].replace('/song?id=', '')\n", "        music_name = music.getText()\n", "        music_dic[music_id] = [music_name, album_id]" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:17:59.692873Z", "start_time": "2018-05-09T10:17:59.688003Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'37110871'" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(albume_dic.keys())[0]" ] }, { "cell_type": "code", 
"execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:54:02.277256Z", "start_time": "2019-06-08T06:54:01.729400Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "music_dic = {}\n", "save_music('6423', music_dic)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:54:05.031891Z", "start_time": "2019-06-08T06:54:05.024717Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'65321': ['兄妹', '6423'],\n", " '65326': ['十年', '6423'],\n", " '65334': ['你的背包', '6423'],\n", " '65337': ['K歌之王', '6423'],\n", " '65342': ['Shall We Talk', '6423'],\n", " '65347': ['低等动物', '6423'],\n", " '65350': ['寂寞让你更快乐', '6423'],\n", " '65355': ['圣诞结', '6423'],\n", " '65360': ['想哭', '6423'],\n", " '65365': ['不如这样', '6423'],\n", " '65369': ['你会不会', '6423'],\n", " '65373': ['Last Order', '6423'],\n", " '65377': ['冤家', '6423'],\n", " '65381': ['全世界失眠', '6423'],\n", " '65385': ['我们都寂寞', '6423'],\n", " '65389': ['阿怪', '6423'],\n", " '65393': ['谢谢侬', '6423'],\n", " '65397': ['爱是怀疑', '6423'],\n", " '65400': [\"Because You're Good To Me\", '6423'],\n", " '65403': ['Good Times', '6423'],\n", " '65406': ['要你的', '6423'],\n", " '65410': ['像一句广告', '6423'],\n", " '65414': ['我也不会那样做', '6423'],\n", " '65418': ['人造卫星', '6423'],\n", " '65421': ['狂人日记', '6423'],\n", " '65425': ['没有手机的日子', '6423'],\n", " '65429': ['跳蚤市场', '6423'],\n", " '65433': ['故事', '6423'],\n", " '65437': ['男人的错', '6423'],\n", " '65441': ['没有你', '6423']}" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "music_dic" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 根据歌曲信息爬取其评论条数(comments_by _music.py" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "http://music.163.com/#/song?id=516997458\n", "\n", "\n", "很遗憾的是评论数虽然也在详情页内,但是网易云音乐做了防爬处理,\n", "- 
The comment data is filled in by an AJAX call to a comment API;\n", "- because the call is asynchronous, the comment count is empty in the page we fetch.\n", "\n", "So let's look for that API. Inspecting the XHR requests shows that it is the one below.\n", "\n", "The response is rich and contains all of the comment data. On closer inspection the API's request parameters are encrypted, but that can be worked around; see the links below.\n", "\n", "https://blog.csdn.net/python233/article/details/72825003\n", "\n", "https://www.zhihu.com/question/36081767\n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T12:34:21.926934Z", "start_time": "2019-06-08T12:34:21.919976Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "params = {\n", " 'csrf_token': ''\n", "}\n", "\n", "data = {\n", " 'params': '5L+s/X1qDy33tb2sjT6to2T4oxv89Fjg1aYRkjgzpNPR6hgCpp0YVjNoTLQAwWu9VYvKROPZQj6qTpBK+sUeJovyNHsnU9/StEfZwCOcKfECFFtAvoNIpulj1TDOtBir',\n", " 'encSecKey': '59079f3e07d6e240410018dc871bf9364f122b720c0735837d7916ac78d48a79ec06c6307e6a0e576605d6228bd0b377a96e1a7fc7c7ddc8f6a3dc6cc50746933352d4ec5cbe7bddd6dcb94de085a3b408d895ebfdf2f43a7c72fc783512b3c9efb860679a88ef21ccec5ff13592be450a1edebf981c0bf779b122ddbd825492'\n", " \n", "}" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:51:34.796512Z", "start_time": "2018-05-10T04:51:34.793490Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=0\n" ] } ], "source": [ "print(url)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:56:46.506510Z", "start_time": "2019-06-08T06:56:46.384303Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['moreHot', 'userId', 'comments', 'topComments', 'more', 'isMusician', 'code', 'hotComments', 'total'])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 0\n", "music_id = '65337'\n", "url = 
'http://music.163.com/api/v1/resource/comments/R_SO_4_'+ music_id + '?limit=20&offset=' + str(offset)\n", "response = requests.post(url, headers=headers, data=data)\n", "cj = response.json()\n", "cj.keys()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:56:51.381315Z", "start_time": "2019-06-08T06:56:51.376314Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "(19987, 20, 15, 0)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cj['total'],len(cj['comments']), len(cj['hotComments']), len(cj['topComments'])" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T06:57:27.015220Z", "start_time": "2019-06-08T06:57:27.009824Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1446073367,\n", " 'commentLocationType': 0,\n", " 'content': '没人评论了?',\n", " 'decoration': {},\n", " 'expressionUrl': None,\n", " 'liked': False,\n", " 'likedCount': 35,\n", " 'parentCommentId': 0,\n", " 'pendantData': None,\n", " 'repliedMark': False,\n", " 'showFloorComment': None,\n", " 'status': 0,\n", " 'time': 1554742251254,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p2.music.126.net/57TAbI-npfKuVhTFn8k-eQ==/109951163040243157.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'liveInfo': None,\n", " 'locationInfo': None,\n", " 'nickname': 'booxs',\n", " 'remarkName': None,\n", " 'userId': 111179639,\n", " 'userType': 0,\n", " 'vipRights': {'associator': None,\n", " 'musicPackage': {'rights': True, 'vipCode': 220},\n", " 'redVipAnnualCount': -1},\n", " 'vipType': 10}}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cj['comments'][0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Implementing pagination\n", 
"\n", "limit是一页的数量,offset往后的偏移。\n", "- 比如limit是20,offset是40,就展示第三页的\n", "\n", "http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=0\n", "\n", "http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=20\n", "\n", "http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=40" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 另外一种方法" ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:38:38.262593Z", "start_time": "2018-05-10T04:38:38.133110Z" } }, "outputs": [], "source": [ "from Crypto.Cipher import AES\n", "import base64\n", "import requests\n", "import json\n", "import time\n", "\n", "# headers\n", "headers = {\n", " 'Host': 'music.163.com',\n", " 'Connection': 'keep-alive',\n", " 'Content-Length': '484',\n", " 'Cache-Control': 'max-age=0',\n", " 'Origin': 'http://music.163.com',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',\n", " 'Content-Type': 'application/x-www-form-urlencoded',\n", " 'Accept': '*/*',\n", " 'DNT': '1',\n", " 'Accept-Encoding': 'gzip, deflate',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4',\n", " 'Cookie': 'JSESSIONID-WYYY=b66d89ed74ae9e94ead89b16e475556e763dd34f95e6ca357d06830a210abc7b685e82318b9d1d5b52ac4f4b9a55024c7a34024fddaee852404ed410933db994dcc0e398f61e670bfeea81105cbe098294e39ac566e1d5aa7232df741870ba1fe96e5cede8372ca587275d35c1a5d1b23a11e274a4c249afba03e20fa2dafb7a16eebdf6%3A1476373826753; _iuqxldmzr_=25; _ntes_nnid=7fa73e96706f26f3ada99abba6c4a6b2,1476372027128; _ntes_nuid=7fa73e96706f26f3ada99abba6c4a6b2; __utma=94650624.748605760.1476372027.1476372027.1476372027.1; __utmb=94650624.4.10.1476372027; __utmc=94650624; __utmz=94650624.1476372027.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", "}\n", "\n", "\n", "\n", "#获取params\n", "def 
get_params(first_param, forth_param):\n", "    iv = \"0102030405060708\"\n", "    first_key = forth_param\n", "    second_key = 16 * 'F'\n", "    h_encText = AES_encrypt(first_param, first_key.encode(), iv.encode())\n", "    h_encText = AES_encrypt(h_encText.decode(), second_key.encode(), iv.encode())\n", "    return h_encText.decode()\n", "\n", "\n", "# Return the encSecKey field (a hard-coded value)\n", "def get_encSecKey():\n", "    encSecKey = \"257348aecb5e556c066de214e531faadd1c55d814f9be95fd06d6bff9f4c7a41f831f6394d5a3fd2e3881736d94a02ca919d952872e7d0a50ebfa1769a7a62d512f5f1ca21aec60bc3819a9c3ffca5eca9a0dba6d6f7249b06f5965ecfff3695b54e1c28f3f624750ed39e7de08fc8493242e26dbc4484a01c76f739e135637c\"\n", "    return encSecKey\n", "\n", "\n", "# AES encryption: pad the text to a multiple of 16, encrypt in CBC mode, then base64-encode\n", "def AES_encrypt(text, key, iv):\n", "    pad = 16 - len(text) % 16\n", "    text = text + pad * chr(pad)\n", "    encryptor = AES.new(key, AES.MODE_CBC, iv)\n", "    encrypt_text = encryptor.encrypt(text.encode())\n", "    encrypt_text = base64.b64encode(encrypt_text)\n", "    return encrypt_text\n", "\n", "\n", "# POST the form data and return the raw response body\n", "def get_json(url, data):\n", "    response = requests.post(url, headers=headers, data=data)\n", "    return response.content\n", "\n", "\n", "# Build the request URL and the encrypted POST data\n", "def crypt_api(id, offset):\n", "    url = \"http://music.163.com/weapi/v1/resource/comments/R_SO_4_%s/?csrf_token=\" % id\n", "    first_param = \"{rid:\\\"\\\", offset:\\\"%s\\\", total:\\\"true\\\", limit:\\\"20\\\", csrf_token:\\\"\\\"}\" % offset\n", "    forth_param = \"0CoJUm6Qyw8W8jud\"\n", "    params = get_params(first_param, forth_param)\n", "    encSecKey = get_encSecKey()\n", "    data = {\n", "        \"params\": params,\n", "        \"encSecKey\": encSecKey\n", "    }\n", "    return url, data\n", "\n" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:41:55.356484Z", "start_time": "2018-05-10T04:41:55.251451Z" } }, "outputs": [ { "data": { "text/plain": [ "8054" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 0\n", "id = 
'516997458'\n", "url, data = crypt_api(id, offset)\n", "json_text = get_json(url, data)\n", "json_dict = json.loads(json_text.decode(\"utf-8\"))\n", "comments_sum = json_dict['total']\n", "comments_sum" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:41:56.302874Z", "start_time": "2018-05-10T04:41:56.298243Z" } }, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(json_dict['comments'])" ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:41:57.399920Z", "start_time": "2018-05-10T04:41:57.394647Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1112523641,\n", " 'content': '喜欢双笙,喜欢这首歌',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 1,\n", " 'pendantData': None,\n", " 'time': 1525904882188,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/Eklu6D8QoR1Hb5UhLhCzPw==/109951163288324813.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '狂妄嘻嘻',\n", " 'remarkName': None,\n", " 'userId': 1451756393,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_dict['comments'][0]" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:42:02.080722Z", "start_time": "2018-05-10T04:42:02.074947Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [{'content': '我们历史老师是一个年轻的小伙子。那是个阳光明媚的中午,他拖堂拖了很久,喇叭里响起了学校广播“校园之声”的开场白,接着就是这首歌。老师听到这首歌前奏后,自以为是地说一定是播音员自己唱的。我们都在下面反驳他,说人家歌就是这样的。。\\n而现在,距中考只有58天了,毕业后,就回不去了。',\n", " 'status': 0,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/gm976KYbWTvYvExzjBNeaw==/109951163217371336.jpg',\n", " 'expertTags': None,\n", " 
'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '惴洛',\n", " 'remarkName': None,\n", " 'userId': 1325932231,\n", " 'userType': 0,\n", " 'vipType': 0}}],\n", " 'commentId': 1112261542,\n", " 'content': '还有不到一个月了高一学姐祝你考试加油哦',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 0,\n", " 'pendantData': None,\n", " 'time': 1525876023865,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/kAuCCkW-fcC7yu4wix9z5Q==/109951163144186242.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '土园yy',\n", " 'remarkName': None,\n", " 'userId': 275653796,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_dict['comments'][4]" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:40:42.433766Z", "start_time": "2018-05-10T04:40:42.284057Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1107837178,\n", " 'content': '冥月声音好听好温柔[爱心]表白',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 3,\n", " 'pendantData': None,\n", " 'time': 1525515089450,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/suhvzXk2pEUOaeHUPU0aQQ==/109951163173870029.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '黴祇',\n", " 'remarkName': None,\n", " 'userId': 619018018,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 20\n", "id = '516997458'\n", "url, data = crypt_api(id, offset)\n", "json_text = get_json(url, data)\n", "json_dict = json.loads(json_text.decode(\"utf-8\"))\n", "comments_sum = json_dict['total']\n", "json_dict['comments'][0]" ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "ExecuteTime": { 
"end_time": "2018-05-10T04:41:02.854104Z", "start_time": "2018-05-10T04:41:02.771941Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1102303635,\n", " 'content': '找这首歌找了好久了!!无厘头的找,今天无意居然听到了(*^▽^)/★*☆',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 1,\n", " 'pendantData': None,\n", " 'time': 1525072647936,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/fU8tvMVN2f5WkSUZehQ21Q==/3274345636764863.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '黎诺0',\n", " 'remarkName': None,\n", " 'userId': 129375977,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 40\n", "id = '516997458'\n", "url, data = crypt_api(id, offset)\n", "json_text = get_json(url, data)\n", "json_dict = json.loads(json_text.decode(\"utf-8\"))\n", "comments_sum = json_dict['total']\n", "json_dict['comments'][0]" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T06:52:10.629887Z", "start_time": "2018-05-10T06:52:10.623599Z" } }, "outputs": [ { "data": { "text/plain": [ "361.49312377210214" ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "800/1018*460 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python [conda env:anaconda]", "language": "python", "name": "conda-env-anaconda-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", 
"cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }