{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "https://github.com/RitterHou/music-163\n", "\n", "Crawl the comment counts of all songs on NetEase Cloud Music. The main steps are:\n", "\n", "- Crawl all artist information (artists.py);\n", "- Use the artist information from the previous step to crawl all album information (album_by_artist.py);\n", "- Use the album information to crawl all song information (music_by_album.py);\n", "- Use the song information to crawl each song's comment count (comments_by_music.py).\n", "\n", "All database-related statements are kept in sql.py." ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-05-10T03:06:06.662665Z", "start_time": "2018-05-10T03:06:06.655299Z" } }, "source": [ "Inspect the HTML structure of the NetEase Cloud Music website:\n", "- Home page: http://music.163.com/\n", "- Playlist category page: http://music.163.com/discover/playlist\n", "- Playlist page: http://music.163.com/playlist?id=499518394\n", "- Song detail page: http://music.163.com/song?id=109998\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-01-29T03:24:39.823122Z", "start_time": "2019-01-29T03:24:39.635100Z" } }, "outputs": [], "source": [ "import requests\n", "from bs4 import BeautifulSoup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-01-29T03:24:40.356617Z", "start_time": "2019-01-29T03:24:40.350443Z" } }, "outputs": [], "source": [ "headers = {\n", " 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n", " 'Accept-Encoding': 'gzip, deflate, sdch',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',\n", " 'Cache-Control': 'no-cache',\n", " 'Connection': 'keep-alive',\n", " 'Cookie': '_ntes_nnid=7eced19b27ffae35dad3f8f2bf5885cd,1476521011210; _ntes_nuid=7eced19b27ffae35dad3f8f2bf5885cd; usertrack=c+5+hlgB7TgnsAmACnXtAg==; Province=025; City=025; NTES_PASSPORT=6n9ihXhbWKPi8yAqG.i2kETSCRa.ug06Txh8EMrrRsliVQXFV_orx5HffqhQjuGHkNQrLOIRLLotGohL9s10wcYSPiQfI2wiPacKlJ3nYAXgM; P_INFO=hourui93@163.com|1476523293|1|study|11&12|jis&1476511733&mail163#jis&320100#10#0#0|151889&0|g37_client_check&mailsettings&mail163&study&blog|hourui93@163.com; 
NTES_SESS=Fa2uk.YZsGoj59AgD6tRjTXGaJ8_1_4YvGfXUkS7C1NwtMe.tG1Vzr255TXM6yj2mKqTZzqFtoEKQrgewi9ZK60ylIqq5puaG6QIaNQ7EK5MTcRgHLOhqttDHfaI_vsBzB4bibfamzx1.fhlpqZh_FcnXUYQFw5F5KIBUmGJg7xdasvGf_EgfICWV; S_INFO=1476597594|1|0&80##|hourui93; NETEASE_AUTH_SOURCE=space; NETEASE_AUTH_USERNAME=hourui93; _ga=GA1.2.1405085820.1476521280; JSESSIONID-WYYY=cbd082d2ce2cffbcd5c085d8bf565a95aee3173ddbbb00bfa270950f93f1d8bb4cb55a56a4049fa8c828373f630c78f4a43d6c3d252c4c44f44b098a9434a7d8fc110670a6e1e9af992c78092936b1e19351435ecff76a181993780035547fa5241a5afb96e8c665182d0d5b911663281967d675ff2658015887a94b3ee1575fa1956a5a%3A1476607977016; _iuqxldmzr_=25; __utma=94650624.1038096298.1476521011.1476595468.1476606177.8; __utmb=94650624.20.10.1476606177; __utmc=94650624; __utmz=94650624.1476521011.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", " 'DNT': '1',\n", " 'Host': 'music.163.com',\n", " 'Pragma': 'no-cache',\n", " 'Referer': 'http://music.163.com/',\n", " 'Upgrade-Insecure-Requests': '1',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Crawl all artist information (artists.py)" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-05-10T05:06:03.732955Z", "start_time": "2018-05-10T05:06:03.727816Z" } }, "source": [ "\n", "\n", "http://music.163.com/#/discover/artist/cat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "http://music.163.com/#/discover/artist/cat?id=4003&initial=0" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-01-29T03:25:32.796945Z", "start_time": "2019-01-29T03:25:32.445561Z" } }, "outputs": [], "source": [ "group_id = 4003\n", "initial = 0\n", "params = {'id': group_id, 'initial': initial}\n", "r = requests.get('http://music.163.com/discover/artist/cat', params=params, headers=headers)\n", "\n", "# Parse the HTML\n", "soup = 
BeautifulSoup(r.content.decode(), 'html.parser')\n", "body = soup.body" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-01-29T03:25:33.555438Z", "start_time": "2019-01-29T03:25:33.530110Z" } }, "outputs": [], "source": [ "hotartist_dic = {}\n", "hot_artists = body.find_all('a', attrs={'class': 'msk'})\n", "for artist in hot_artists:\n", " artist_id = artist['href'].replace('/artist?id=', '').strip()\n", " artist_name = artist['title'].replace('的音乐', '')\n", " try:\n", " hotartist_dic[artist_id] = artist_name\n", " except Exception as e:\n", " # Log the error\n", " print(e)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-01-29T03:25:34.261876Z", "start_time": "2019-01-29T03:25:34.236074Z" } }, "outputs": [], "source": [ "artist_dic = {}\n", "artists = body.find_all('a', attrs={'class': 'nm nm-icn f-thide s-fc0'})\n", "for artist in artists:\n", " artist_id = artist['href'].replace('/artist?id=', '').strip()\n", " artist_name = artist['title'].replace('的音乐', '')\n", " try:\n", " artist_dic[artist_id] = artist_name\n", " except Exception as e:\n", " # Log the error\n", " print(e)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-01-29T03:25:34.889001Z", "start_time": "2019-01-29T03:25:34.882058Z" } }, "outputs": [ { "data": { "text/plain": [ "{'1046093': 'Laxmikant-Pyarelal',\n", " '1049361': 'ГРУППА ПИЦЦА',\n", " '106666': 'Дрыгва',\n", " '106719': 'คาราบาว',\n", " '106733': 'ДДТ',\n", " '106998': '25 hours',\n", " '1078416': 'บีไฟว์',\n", " '1083129': '2nd Room',\n", " '1142003': 'Cutie*',\n", " '1143051': 'SkyLights',\n", " '1159115': '-Sunny-Youth-',\n", " '1184013': 'ForceZL',\n", " '1194100': 'GOMAR\xa0STUDIO',\n", " '1194110': 'ELYAR-PULAT',\n", " '12025447': 'Wukong Defunct',\n", " '12032215': '欧洲音厨',\n", " '12051100': '抛抛',\n", " '12070076': '惊奇海洋(Marvel ocean)',\n", " '1207047': 'eigenTunes亦听',\n", " '12071037': 
'Weed',\n", " '1209010': '一个大G',\n", " '1211067': 'Quartz',\n", " '12111066': 'Husan',\n", " '12119141': '巅藏说唱团',\n", " '12122134': '365 DaBand',\n", " '12139012': 'Do\\xa0Shit',\n", " '12147367': 'Near\\xa0Death\\xa0Experience/病亟',\n", " '12147390': 'WEON MASHUP',\n", " '12172418': 'ARnPRo',\n", " '12172468': 'MelodyGarden',\n", " '12173248': 'DJ宋雨飞',\n", " '12174281': 'PSYKHON',\n", " '12185319': 'RTG',\n", " '12185381': '漏音器\\xa0The\\xa0Lowincher',\n", " '12194273': '六甲番',\n", " '12194760': '菩提集团',\n", " '12194976': 'BNC$BrandNewCohort',\n", " '12200872': 'SIH SHANDIIN HOOLAI',\n", " '12200958': 'NiXaN说唱组合',\n", " '12204600': '9596乐队',\n", " '12236432': 'Mzoce',\n", " '12237434': 'Wiener Sängerknaben',\n", " '12258195': '向洋乐团',\n", " '12259787': '4oot',\n", " '12270840': '准噶尔乐队',\n", " '12271466': '魔幻之声口琴重奏团',\n", " '12275027': 'LightCould',\n", " '12276318': 'Cheetah\\xa0Mobile Games',\n", " '12281709': 'RED',\n", " '12287113': 'GANGSAMOSA',\n", " '12288929': 'Elecrystal\\xa0Sound\\xa0Team',\n", " '12291554': 'H.M.Funk',\n", " '12317490': '廢兔Itsuki',\n", " '12324675': 'GumNam',\n", " '12357295': 'DAVID BORING',\n", " '12359012': 'ART\\xa0RAP',\n", " '12359215': '野狼乐队',\n", " '12373475': 'Goodbye\\xa0Honey\\xa0Boy',\n", " '12392015': '木野',\n", " '12394484': '布日德组合',\n", " '12420908': 'INVADE',\n", " '12424041': 'İmera',\n", " '12497408': 'Controls',\n", " '12511077': 'Dahlia\\xa0Rosea(玫瑰博士)',\n", " '12538438': 'Checkit',\n", " '12568487': '新疆Kelkvn说唱团体',\n", " '12580237': '增城捌贰陆大哥大保健娱乐有限公司',\n", " '12600335': '林如韵',\n", " '12610587': '千夜',\n", " '12641213': 'Poorman',\n", " '12641527': 'UMI NOISE',\n", " '12642917': 'AMTwo',\n", " '12648648': 'K.P.R.',\n", " '12768267': 'Waif\\xa0&\\xa0Zilch',\n", " '12814141': 'Unicorn_独角兽',\n", " '12897296': '444+222',\n", " '12924615': 'ShahMat',\n", " '12924892': 'K-AN',\n", " '12967340': 'The\\xa0Messy梅西合唱团',\n", " '13022785': '988 DJs',\n", " '13037523': 'Liberation',\n", " '13110785': 'Jangaa组合',\n", " '13288090': 
'范世坤吉他私塾',\n", " '13416134': '骏景小学合唱团',\n", " '13612241': 'designco',\n", " '13612402': '清华大学键盘队',\n", " '13683852': 'ZebraZebra',\n", " '13701254': 'SHIRAQ',\n", " '13793310': '贝叔',\n", " '13793901': '王雪山',\n", " '13908376': 'Anuradha',\n", " '14119844': 'HASG\xa0POP',\n", " '14804020': 'Bana-X',\n", " '28186665': 'T&T',\n", " '30126361': '$uper旧街口',\n", " '30380343': 'Blast Way Z',\n", " '30644313': 'bltchigga',\n", " '818191': 'Мельница',\n", " '823055': '3.2.1',\n", " '942020': ' Johnyboy'}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artist_dic" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T09:46:38.820519Z", "start_time": "2018-05-09T09:46:38.775710Z" } }, "outputs": [], "source": [ "def save_artist(group_id, initial, hot_artist_dic, artist_dic):\n", " params = {'id': group_id, 'initial': initial}\n", " r = requests.get('http://music.163.com/discover/artist/cat', params=params)\n", "\n", " # Parse the HTML\n", " soup = BeautifulSoup(r.content.decode(), 'html.parser')\n", " body = soup.body\n", "\n", " hot_artists = body.find_all('a', attrs={'class': 'msk'})\n", " artists = body.find_all('a', attrs={'class': 'nm nm-icn f-thide s-fc0'})\n", " for artist in hot_artists:\n", " artist_id = artist['href'].replace('/artist?id=', '').strip()\n", " artist_name = artist['title'].replace('的音乐', '')\n", " try:\n", " hot_artist_dic[artist_id] = artist_name\n", " except Exception as e:\n", " # Log the error\n", " print(e)\n", "\n", " for artist in artists:\n", " artist_id = artist['href'].replace('/artist?id=', '').strip()\n", " artist_name = artist['title'].replace('的音乐', '')\n", " try:\n", " artist_dic[artist_id] = artist_name\n", " except Exception as e:\n", " # Log the error\n", " print(e)\n", " #return artist_dic, hot_artist_dic\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T09:47:35.944863Z", "start_time": 
"2018-05-09T09:47:35.536703Z" } }, "outputs": [], "source": [ "\n", "gg = 4003\n", "initial = 0\n", "artist_dic = {}\n", "hot_artist_dic = {} \n", "save_artist(gg, initial, hot_artist_dic, artist_dic )" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T09:47:41.542860Z", "start_time": "2018-05-09T09:47:41.536318Z" } }, "outputs": [ { "data": { "text/plain": [ "{'1046093': 'Laxmikant-Pyarelal',\n", " '1049361': 'ГРУППА ПИЦЦА',\n", " '1050038': 'Фактор-2',\n", " '106666': 'Дрыгва',\n", " '106672': 'Камаедзiца',\n", " '106719': 'คาราบาว',\n", " '106733': 'ДДТ',\n", " '106985': 'Харизма',\n", " '106997': '17:28',\n", " '106998': '25 hours',\n", " '1078416': 'บีไฟว์',\n", " '1083129': '2nd Room',\n", " '1143051': 'SkyLights',\n", " '1158110': 'ChinaDJRadio',\n", " '1160013': '一个人的宇宙',\n", " '12082153': 'Самое большое простое число ',\n", " '12139012': 'Do\\xa0Shit',\n", " '12191177': 'รวมศิลปิน Luster',\n", " '12194976': 'BNC$BrandNewCohort',\n", " '12200872': 'SIH SHANDIIN HOOLAI',\n", " '12204600': '9596乐队',\n", " '12215016': 'Groovy LIVE SYSU',\n", " '12424041': 'İmera',\n", " '12509248': 'Rabbit工作室',\n", " '12639444': 'NECROSADISTIC PUNISHMENT',\n", " '12641336': '南之南',\n", " '12699532': 'PDWN',\n", " '12788512': '拾旬乐',\n", " '12797604': '亥門',\n", " '12814141': 'Unicorn_official',\n", " '12867140': 'Xiao·Xin',\n", " '12897296': '444+222',\n", " '12897309': '蓝色妮可',\n", " '12924892': 'K-AN',\n", " '12943092': '堆填区',\n", " '12955345': 'HAKAN',\n", " '12967340': 'The\\xa0Messy梅西合唱团',\n", " '12974253': \"นา'กา\",\n", " '12975240': 'ไมโคร',\n", " '12977209': 'เจนนี่ VS คอรี่',\n", " '13006273': '演绎',\n", " '13011233': '5 สาวฝุ่นตลบ',\n", " '13012236': 'อินคา',\n", " '13022208': 'เบิร์ด & เสก',\n", " '13023092': 'บางแก้ว',\n", " '13037921': 'MANGOMusic',\n", " '13109480': 'Memetjan_Alim',\n", " '13110785': 'Jangaa组合',\n", " '13151437': 'Karmashsa Ansambli',\n", " '13222504': 'Линник',\n", " '13222529': 'Улиткас',\n", 
" '13226421': 'てん、',\n", " '13226728': 'สักวา',\n", " '13227224': \"'לוקץ\",\n", " '13284125': 'The\\xa0Singers\\xa0of\\xa0Lights',\n", " '13284192': 'Buzz Fridge',\n", " '13285024': 'NamelessTag無名標籤',\n", " '13429324': 'Олег Пунгин',\n", " '13430117': 'Конец фильма',\n", " '13430153': 'ГАЛЯЦЭНАГЕН',\n", " '13462069': 'Христина Соловій',\n", " '13464081': '#Холостячки',\n", " '13465021': 'שמעון ולוי',\n", " '13465093': 'שלמה ארצי',\n", " '13484124': 'จ่อย รวมมิตร',\n", " '13484209': 'Вика Курзова',\n", " '13484253': 'Фибры',\n", " '13485181': 'Леонид Руденко',\n", " '13485216': 'Шампанского Пожалуйста!',\n", " '13485217': 'Саша Ветер',\n", " '13485332': 'Сањар и Сам Я',\n", " '13486037': '\\u200bPura Mashankura',\n", " '13486232': 'מור ברנשטיין',\n", " '13497013': '2 Of Us',\n", " '13497129': 'Артём Угловский',\n", " '13498112': 'เซ็กซ์ ตลาดแตก',\n", " '13498157': 'עילי בוטנר',\n", " '13500036': 'อ้อมใจ มหาหิงค์',\n", " '13500833': 'האחיות כרקוקלי',\n", " '13683852': 'ZebraZebra',\n", " '13707392': 'מועדון הקצב של אביהו פנחסוב',\n", " '13790097': 'Major\\xa0Fifth',\n", " '13793613': 'NIIGELL\\xa0QGQII',\n", " '13793901': '王雪山',\n", " '13908376': 'Anuradha',\n", " '13911469': 'Yellow Duck!',\n", " '14100185': 'โอ๊ะโอ',\n", " '14101200': 'สี่โพดำ',\n", " '711224': \" C'mon Lennon\",\n", " '713076': ' Dewa Budjana',\n", " '753041': 'ידידיה וגבריאל בלחסן',\n", " '763093': ' Пионерлагерь Пыльная Радуга',\n", " '784048': 'Чайф',\n", " '794269': 'ОбщежитиЕ ',\n", " '818191': 'Мельница',\n", " '823055': '3.2.1',\n", " '827392': 'Джинсовые мальчики',\n", " '908212': 'Психея ',\n", " '942020': ' Johnyboy',\n", " '957151': '██████'}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artist_dic" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T09:48:23.190957Z", "start_time": "2018-05-09T09:48:12.070714Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"65\n", "66\n", "67\n", "68\n", "69\n", "70\n", "71\n", "72\n", "73\n", "74\n", "75\n", "76\n", "77\n", "78\n", "79\n", "80\n", "81\n", "82\n", "83\n", "84\n", "85\n", "86\n", "87\n", "88\n", "89\n", "90\n" ] } ], "source": [ "artist_dic = {}\n", "hot_artist_dic = {} \n", "for i in range(65, 91):\n", " print(i)\n", " save_artist(gg, i, hot_artist_dic, artist_dic )" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T09:49:30.888189Z", "start_time": "2018-05-09T09:49:30.884325Z" } }, "outputs": [ { "data": { "text/plain": [ "254" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(hot_artist_dic)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T09:49:37.647699Z", "start_time": "2018-05-09T09:49:37.644057Z" } }, "outputs": [ { "data": { "text/plain": [ "1608" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(artist_dic)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Crawl all album information (album_by_artist.py)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:15:13.091212Z", "start_time": "2018-05-09T10:15:13.086629Z" } }, "outputs": [ { "data": { "text/plain": [ "'89659'" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(hot_artist_dic.keys())[0]" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:11:23.224190Z", "start_time": "2018-05-09T10:11:23.220284Z" } }, "source": [ "http://music.163.com/#/artist/album?id=89659&limit=400" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:15:11.922060Z", "start_time": "2018-05-09T10:15:11.895374Z" } }, "outputs": [], "source": [ "headers = {\n", " 'Accept': 
'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n", " 'Accept-Encoding': 'gzip, deflate, sdch',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',\n", " 'Cache-Control': 'no-cache',\n", " 'Connection': 'keep-alive',\n", " 'Cookie': '_ntes_nnid=7eced19b27ffae35dad3f8f2bf5885cd,1476521011210; _ntes_nuid=7eced19b27ffae35dad3f8f2bf5885cd; usertrack=c+5+hlgB7TgnsAmACnXtAg==; Province=025; City=025; _ga=GA1.2.1405085820.1476521280; NTES_PASSPORT=6n9ihXhbWKPi8yAqG.i2kETSCRa.ug06Txh8EMrrRsliVQXFV_orx5HffqhQjuGHkNQrLOIRLLotGohL9s10wcYSPiQfI2wiPacKlJ3nYAXgM; P_INFO=hourui93@163.com|1476523293|1|study|11&12|jis&1476511733&mail163#jis&320100#10#0#0|151889&0|g37_client_check&mailsettings&mail163&study&blog|hourui93@163.com; JSESSIONID-WYYY=189f31767098c3bd9d03d9b968c065daf43cbd4c1596732e4dcb471beafe2bf0605b85e969f92600064a977e0b64a24f0af7894ca898b696bd58ad5f39c8fce821ec2f81f826ea967215de4d10469e9bd672e75d25f116a9d309d360582a79620b250625859bc039161c78ab125a1e9bf5d291f6d4e4da30574ccd6bbab70b710e3f358f%3A1476594130342; _iuqxldmzr_=25; __utma=94650624.1038096298.1476521011.1476588849.1476592408.6; __utmb=94650624.11.10.1476592408; __utmc=94650624; __utmz=94650624.1476521011.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", " 'DNT': '1',\n", " 'Host': 'music.163.com',\n", " 'Pragma': 'no-cache',\n", " 'Referer': 'http://music.163.com/',\n", " 'Upgrade-Insecure-Requests': '1',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'\n", "}\n", "\n", "def save_albums(artist_id, albume_dic):\n", " params = {'id': artist_id, 'limit': '200'}\n", " # Fetch the artist's album page\n", " r = requests.get('http://music.163.com/artist/album', headers=headers, params=params)\n", "\n", " # Parse the HTML\n", " soup = BeautifulSoup(r.content.decode(), 'html.parser')\n", " body = soup.body\n", "\n", " albums = body.find_all('a', attrs={'class': 'tit s-fc0'}) # Find all album links\n", "\n", " for album in albums:\n", " 
albume_id = album['href'].replace('/album?id=', '')\n", " albume_dic[albume_id] = artist_id" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:15:14.315285Z", "start_time": "2018-05-09T10:15:13.854338Z" } }, "outputs": [], "source": [ "albume_dic = {}\n", "save_albums('89659', albume_dic)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:15:14.874253Z", "start_time": "2018-05-09T10:15:14.869741Z" } }, "outputs": [ { "data": { "text/plain": [ "{'2903111': '89659',\n", " '37104113': '89659',\n", " '37104284': '89659',\n", " '37104348': '89659',\n", " '37104857': '89659',\n", " '37110077': '89659',\n", " '37110141': '89659',\n", " '37110256': '89659',\n", " '37110395': '89659',\n", " '37110462': '89659',\n", " '37110655': '89659',\n", " '37110751': '89659',\n", " '37110871': '89659',\n", " '37934081': '89659',\n", " '37934219': '89659'}" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "albume_dic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Crawl all song information from the albums (music_by_album.py)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:17:40.515322Z", "start_time": "2018-05-09T10:17:40.484723Z" } }, "outputs": [], "source": [ "headers = {\n", " 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n", " 'Accept-Encoding': 'gzip, deflate, sdch',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',\n", " 'Cache-Control': 'no-cache',\n", " 'Connection': 'keep-alive',\n", " 'Cookie': '_ntes_nnid=7eced19b27ffae35dad3f8f2bf5885cd,1476521011210; _ntes_nuid=7eced19b27ffae35dad3f8f2bf5885cd; usertrack=c+5+hlgB7TgnsAmACnXtAg==; Province=025; City=025; NTES_PASSPORT=6n9ihXhbWKPi8yAqG.i2kETSCRa.ug06Txh8EMrrRsliVQXFV_orx5HffqhQjuGHkNQrLOIRLLotGohL9s10wcYSPiQfI2wiPacKlJ3nYAXgM; 
P_INFO=hourui93@163.com|1476523293|1|study|11&12|jis&1476511733&mail163#jis&320100#10#0#0|151889&0|g37_client_check&mailsettings&mail163&study&blog|hourui93@163.com; _ga=GA1.2.1405085820.1476521280; JSESSIONID-WYYY=fb5288e1c5f667324f1636d020704cab2f27ee915622b114f89027cbf60c38be2af6b9cbef2223c1f2581e3502f11b86efd60891d6f61b6f783c0d55114f8269fa801df7352f5cc4c8259876e563a6bd0212b504a8997723a0593b21d5b3d9076d4fa38c098be68e3c5d36d342e4a8e40c1f73378cec0b5851bd8a628886edbdd23a7093%3A1476623819662; _iuqxldmzr_=25; __utma=94650624.1038096298.1476521011.1476610320.1476622020.10; __utmb=94650624.14.10.1476622020; __utmc=94650624; __utmz=94650624.1476521011.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", " 'DNT': '1',\n", " 'Host': 'music.163.com',\n", " 'Pragma': 'no-cache',\n", " 'Referer': 'http://music.163.com/',\n", " 'Upgrade-Insecure-Requests': '1',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'\n", " }\n", "\n", "def save_music(album_id, music_dic):\n", " params = {'id': album_id}\n", " # Fetch the album page\n", " r = requests.get('http://music.163.com/album', headers=headers, params=params)\n", "\n", " # Parse the HTML\n", " soup = BeautifulSoup(r.content.decode(), 'html.parser')\n", " body = soup.body\n", "\n", " musics = body.find('ul', attrs={'class': 'f-hide'}).find_all('li') # Get all songs in the album\n", "\n", " for music in musics:\n", " music = music.find('a')\n", " music_id = music['href'].replace('/song?id=', '')\n", " music_name = music.getText()\n", " music_dic[music_id] = [music_name, album_id]" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:17:59.692873Z", "start_time": "2018-05-09T10:17:59.688003Z" } }, "outputs": [ { "data": { "text/plain": [ "'37110871'" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(albume_dic.keys())[0]" ] }, { "cell_type": "code", "execution_count": 74, 
"metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:18:27.259075Z", "start_time": "2018-05-09T10:18:26.472857Z" } }, "outputs": [], "source": [ "music_dic = {}\n", "save_music('37110871', music_dic)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T10:18:31.313504Z", "start_time": "2018-05-09T10:18:31.308368Z" } }, "outputs": [ { "data": { "text/plain": [ "{'527013176': ['อย่ารักใครข้างเดียว', '37110871'],\n", " '527013177': ['จะไม่รับปาก', '37110871'],\n", " '527013178': ['เจ้าหญิงนิทรา', '37110871'],\n", " '527013179': ['หุ่นกระป๋อง', '37110871'],\n", " '527013180': ['เธอจะอยู่กับฉันตลอดไป', '37110871'],\n", " '527013181': ['เมืองคนเหล็ก', '37110871'],\n", " '527013182': ['เพลงผีเสื้อ', '37110871'],\n", " '527013183': ['วังวน', '37110871'],\n", " '527013184': ['ปฏิเสธรัก', '37110871'],\n", " '527013185': ['ชีวิต มิตรภาพ ความรัก', '37110871'],\n", " '527013186': ['ปฏิเสธไม่ได้ว่ารักเธอ Feat. แบงค์ แคลช', '37110871'],\n", " '527013187': ['เพลงผีเสื้อ', '37110871'],\n", " '527013188': ['ชีวิต มิตรภาพ ความรัก Concert', '37110871']}" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "music_dic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Crawl the comment count for each song (comments_by_music.py)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "http://music.163.com/#/song?id=516997458\n", "\n", "\n", "Unfortunately, although the comment count appears on the song detail page, NetEase Cloud Music has anti-crawling measures in place:\n", "- the comment data is filled in by an AJAX call to a comments API;\n", "- because that call is asynchronous, the comment count is empty in the HTML we fetch.\n", "\n", "So we look for that API instead. Inspecting the page's XHR requests shows that it is the request below.\n", "\n", "The response is rich and contains all of the comment data. The request parameters for this API are encrypted, but that can be worked around; see:\n", "\n", "https://blog.csdn.net/python233/article/details/72825003\n", "\n", "https://www.zhihu.com/question/36081767\n" ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:51:47.675804Z", "start_time": "2018-05-10T04:51:47.665466Z" } }, "outputs": [], "source": [ "headers = {\n", " 'Host': 
'music.163.com',\n", " 'Connection': 'keep-alive',\n", " 'Content-Length': '484',\n", " 'Cache-Control': 'max-age=0',\n", " 'Origin': 'http://music.163.com',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',\n", " 'Content-Type': 'application/x-www-form-urlencoded',\n", " 'Accept': '*/*',\n", " 'DNT': '1',\n", " 'Accept-Encoding': 'gzip, deflate',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4',\n", " 'Cookie': 'JSESSIONID-WYYY=b66d89ed74ae9e94ead89b16e475556e763dd34f95e6ca357d06830a210abc7b685e82318b9d1d5b52ac4f4b9a55024c7a34024fddaee852404ed410933db994dcc0e398f61e670bfeea81105cbe098294e39ac566e1d5aa7232df741870ba1fe96e5cede8372ca587275d35c1a5d1b23a11e274a4c249afba03e20fa2dafb7a16eebdf6%3A1476373826753; _iuqxldmzr_=25; _ntes_nnid=7fa73e96706f26f3ada99abba6c4a6b2,1476372027128; _ntes_nuid=7fa73e96706f26f3ada99abba6c4a6b2; __utma=94650624.748605760.1476372027.1476372027.1476372027.1; __utmb=94650624.4.10.1476372027; __utmc=94650624; __utmz=94650624.1476372027.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", "}\n", "\n", "params = {\n", " 'csrf_token': ''\n", "}\n", "\n", "data = {\n", " 'params': '5L+s/X1qDy33tb2sjT6to2T4oxv89Fjg1aYRkjgzpNPR6hgCpp0YVjNoTLQAwWu9VYvKROPZQj6qTpBK+sUeJovyNHsnU9/StEfZwCOcKfECFFtAvoNIpulj1TDOtBir',\n", " 'encSecKey': '59079f3e07d6e240410018dc871bf9364f122b720c0735837d7916ac78d48a79ec06c6307e6a0e576605d6228bd0b377a96e1a7fc7c7ddc8f6a3dc6cc50746933352d4ec5cbe7bddd6dcb94de085a3b408d895ebfdf2f43a7c72fc783512b3c9efb860679a88ef21ccec5ff13592be450a1edebf981c0bf779b122ddbd825492'\n", " \n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementing pagination\n", "\n", "`limit` is the number of comments per page and `offset` is the number of comments to skip.\n", "- For example, with limit=20 and offset=40 the API returns the third page.\n", "\n", "http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=0\n", "\n", "http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=20\n", "\n", 
"http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=40" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:51:34.796512Z", "start_time": "2018-05-10T04:51:34.793490Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://music.163.com/api/v1/resource/comments/R_SO_4_516997458?limit=20&offset=0\n" ] } ], "source": [ "print(url)" ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:53:14.394201Z", "start_time": "2018-05-10T04:53:14.212965Z" } }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['total', 'topComments', 'hotComments', 'moreHot', 'more', 'userId', 'code', 'isMusician', 'comments'])" ] }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 0\n", "music_id = '516997458'\n", "url = 'http://music.163.com/api/v1/resource/comments/R_SO_4_'+ music_id + '?limit=20&offset=' + str(offset)\n", "response = requests.post(url, headers=headers, data=data)\n", "cj = response.json()\n", "cj.keys()" ] }, { "cell_type": "code", "execution_count": 164, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:53:15.538029Z", "start_time": "2018-05-10T04:53:15.533106Z" } }, "outputs": [ { "data": { "text/plain": [ "(8054, 20, 15, 0)" ] }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cj['total'],len(cj['comments']), len(cj['hotComments']), len(cj['topComments'])" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:53:22.211876Z", "start_time": "2018-05-10T04:53:22.207477Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1112523641,\n", " 'content': '喜欢双笙,喜欢这首歌',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 1,\n", " 'pendantData': None,\n", " 'time': 1525904882188,\n", " 'user': 
{'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/Eklu6D8QoR1Hb5UhLhCzPw==/109951163288324813.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '狂妄嘻嘻',\n", " 'remarkName': None,\n", " 'userId': 1451756393,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cj['comments'][0]" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:52:27.602284Z", "start_time": "2018-05-10T04:52:27.495409Z" } }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['total', 'topComments', 'more', 'userId', 'isMusician', 'code', 'comments'])" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 20\n", "music_id = '516997458'\n", "url = 'http://music.163.com/api/v1/resource/comments/R_SO_4_'+ music_id + '?limit=20&offset=' + str(offset)\n", "response = requests.post(url, headers=headers, data=data)\n", "cj = response.json()\n", "cj.keys()" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:52:36.121937Z", "start_time": "2018-05-10T04:52:36.117906Z" } }, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 161, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(cj['comments'])" ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:52:51.454697Z", "start_time": "2018-05-10T04:52:51.448988Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1107837178,\n", " 'content': '冥月声音好听好温柔[爱心]表白',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 3,\n", " 'pendantData': None,\n", " 'time': 1525515089450,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/suhvzXk2pEUOaeHUPU0aQQ==/109951163173870029.jpg',\n", " 
'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '黴祇',\n", " 'remarkName': None,\n", " 'userId': 619018018,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cj['comments'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## An alternative approach" ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:38:38.262593Z", "start_time": "2018-05-10T04:38:38.133110Z" } }, "outputs": [], "source": [ "from Crypto.Cipher import AES\n", "import base64\n", "import requests\n", "import json\n", "import time\n", "\n", "# headers\n", "headers = {\n", " 'Host': 'music.163.com',\n", " 'Connection': 'keep-alive',\n", " 'Content-Length': '484',\n", " 'Cache-Control': 'max-age=0',\n", " 'Origin': 'http://music.163.com',\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',\n", " 'Content-Type': 'application/x-www-form-urlencoded',\n", " 'Accept': '*/*',\n", " 'DNT': '1',\n", " 'Accept-Encoding': 'gzip, deflate',\n", " 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4',\n", " 'Cookie': 'JSESSIONID-WYYY=b66d89ed74ae9e94ead89b16e475556e763dd34f95e6ca357d06830a210abc7b685e82318b9d1d5b52ac4f4b9a55024c7a34024fddaee852404ed410933db994dcc0e398f61e670bfeea81105cbe098294e39ac566e1d5aa7232df741870ba1fe96e5cede8372ca587275d35c1a5d1b23a11e274a4c249afba03e20fa2dafb7a16eebdf6%3A1476373826753; _iuqxldmzr_=25; _ntes_nnid=7fa73e96706f26f3ada99abba6c4a6b2,1476372027128; _ntes_nuid=7fa73e96706f26f3ada99abba6c4a6b2; __utma=94650624.748605760.1476372027.1476372027.1476372027.1; __utmb=94650624.4.10.1476372027; __utmc=94650624; __utmz=94650624.1476372027.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',\n", "}\n", "\n", "\n", "\n", "# Build the encrypted params field (two rounds of AES)\n", "def get_params(first_param, forth_param):\n", " iv = \"0102030405060708\"\n", " first_key = 
forth_param\n", " second_key = 16 * 'F'\n", " h_encText = AES_encrypt(first_param, first_key.encode(), iv.encode())\n", " h_encText = AES_encrypt(h_encText.decode(), second_key.encode(), iv.encode())\n", " return h_encText.decode()\n", "\n", "\n", "# Return the encSecKey (a fixed, precomputed value)\n", "def get_encSecKey():\n", " encSecKey = \"257348aecb5e556c066de214e531faadd1c55d814f9be95fd06d6bff9f4c7a41f831f6394d5a3fd2e3881736d94a02ca919d952872e7d0a50ebfa1769a7a62d512f5f1ca21aec60bc3819a9c3ffca5eca9a0dba6d6f7249b06f5965ecfff3695b54e1c28f3f624750ed39e7de08fc8493242e26dbc4484a01c76f739e135637c\"\n", " return encSecKey\n", "\n", "\n", "# AES encryption (CBC mode) with PKCS#7-style padding; returns Base64-encoded bytes\n", "def AES_encrypt(text, key, iv):\n", " pad = 16 - len(text) % 16\n", " text = text + pad * chr(pad)\n", " encryptor = AES.new(key, AES.MODE_CBC, iv)\n", " encrypt_text = encryptor.encrypt(text.encode())\n", " encrypt_text = base64.b64encode(encrypt_text)\n", " return encrypt_text\n", "\n", "\n", "# POST the request and return the raw response body\n", "def get_json(url, data):\n", " response = requests.post(url, headers=headers, data=data)\n", " return response.content\n", "\n", "\n", "# Build the URL and the encrypted POST data for a song's comments\n", "def crypt_api(id, offset):\n", " url = \"http://music.163.com/weapi/v1/resource/comments/R_SO_4_%s/?csrf_token=\" % id\n", " first_param = \"{rid:\\\"\\\", offset:\\\"%s\\\", total:\\\"true\\\", limit:\\\"20\\\", csrf_token:\\\"\\\"}\" % offset\n", " forth_param = \"0CoJUm6Qyw8W8jud\"\n", " params = get_params(first_param, forth_param)\n", " encSecKey = get_encSecKey()\n", " data = {\n", " \"params\": params,\n", " \"encSecKey\": encSecKey\n", " }\n", " return url, data\n", "\n" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:41:55.356484Z", "start_time": "2018-05-10T04:41:55.251451Z" } }, "outputs": [ { "data": { "text/plain": [ "8054" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 0\n", "id = '516997458'\n", "url, data = crypt_api(id, offset)\n", "json_text = get_json(url, 
data)\n", "json_dict = json.loads(json_text.decode(\"utf-8\"))\n", "comments_sum = json_dict['total']\n", "comments_sum" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:41:56.302874Z", "start_time": "2018-05-10T04:41:56.298243Z" } }, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(json_dict['comments'])" ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:41:57.399920Z", "start_time": "2018-05-10T04:41:57.394647Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1112523641,\n", " 'content': '喜欢双笙,喜欢这首歌',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 1,\n", " 'pendantData': None,\n", " 'time': 1525904882188,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/Eklu6D8QoR1Hb5UhLhCzPw==/109951163288324813.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '狂妄嘻嘻',\n", " 'remarkName': None,\n", " 'userId': 1451756393,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_dict['comments'][0]" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:42:02.080722Z", "start_time": "2018-05-10T04:42:02.074947Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [{'content': '我们历史老师是一个年轻的小伙子。那是个阳光明媚的中午,他拖堂拖了很久,喇叭里响起了学校广播“校园之声”的开场白,接着就是这首歌。老师听到这首歌前奏后,自以为是地说一定是播音员自己唱的。我们都在下面反驳他,说人家歌就是这样的。。\\n而现在,距中考只有58天了,毕业后,就回不去了。',\n", " 'status': 0,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/gm976KYbWTvYvExzjBNeaw==/109951163217371336.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '惴洛',\n", " 'remarkName': 
None,\n", " 'userId': 1325932231,\n", " 'userType': 0,\n", " 'vipType': 0}}],\n", " 'commentId': 1112261542,\n", " 'content': '还有不到一个月了高一学姐祝你考试加油哦',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 0,\n", " 'pendantData': None,\n", " 'time': 1525876023865,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/kAuCCkW-fcC7yu4wix9z5Q==/109951163144186242.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '土园yy',\n", " 'remarkName': None,\n", " 'userId': 275653796,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_dict['comments'][4]" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:40:42.433766Z", "start_time": "2018-05-10T04:40:42.284057Z" } }, "outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1107837178,\n", " 'content': '冥月声音好听好温柔[爱心]表白',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 3,\n", " 'pendantData': None,\n", " 'time': 1525515089450,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/suhvzXk2pEUOaeHUPU0aQQ==/109951163173870029.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '黴祇',\n", " 'remarkName': None,\n", " 'userId': 619018018,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 20\n", "id = '516997458'\n", "url, data = crypt_api(id, offset)\n", "json_text = get_json(url, data)\n", "json_dict = json.loads(json_text.decode(\"utf-8\"))\n", "comments_sum = json_dict['total']\n", "json_dict['comments'][0]" ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T04:41:02.854104Z", "start_time": "2018-05-10T04:41:02.771941Z" } }, 
"outputs": [ { "data": { "text/plain": [ "{'beReplied': [],\n", " 'commentId': 1102303635,\n", " 'content': '找这首歌找了好久了!!无厘头的找,今天无意居然听到了(*^▽^)/★*☆',\n", " 'isRemoveHotComment': False,\n", " 'liked': False,\n", " 'likedCount': 1,\n", " 'pendantData': None,\n", " 'time': 1525072647936,\n", " 'user': {'authStatus': 0,\n", " 'avatarUrl': 'http://p1.music.126.net/fU8tvMVN2f5WkSUZehQ21Q==/3274345636764863.jpg',\n", " 'expertTags': None,\n", " 'experts': None,\n", " 'locationInfo': None,\n", " 'nickname': '黎诺0',\n", " 'remarkName': None,\n", " 'userId': 129375977,\n", " 'userType': 0,\n", " 'vipType': 0}}" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "offset = 40\n", "id = '516997458'\n", "url, data = crypt_api(id, offset)\n", "json_text = get_json(url, data)\n", "json_dict = json.loads(json_text.decode(\"utf-8\"))\n", "comments_sum = json_dict['total']\n", "json_dict['comments'][0]" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "ExecuteTime": { "end_time": "2018-05-10T06:52:10.629887Z", "start_time": "2018-05-10T06:52:10.623599Z" } }, "outputs": [ { "data": { "text/plain": [ "361.49312377210214" ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "800/1018*460 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, 
"labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }