{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Profitable App Profiles for the App Store and Google Play Markets\n", "\n", "* We're developing apps that are free to download and install, so our main source of revenue will be in-app advertising.\n", "* Since our goal is to maximize revenue from in-app advertising, we will analyse market trends and the most profitable free apps across various genres in both stores.\n", "\n", "> **Data Source & Documentation:**
\n", ">[PlayStore](https://www.kaggle.com/lava18/google-play-store-apps/home)
\n", ">[AppStore](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction to the Datasets and Converting Them into Lists of Lists" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from csv import reader\n", "\n", "# Open both files as UTF-8 so that non-ASCII app names are decoded correctly\n", "open_a_file = open('AppleStore.csv', encoding='utf8')\n", "open_p_file = open('googleplaystore.csv', encoding='utf8')\n", "\n", "read_a_file = reader(open_a_file)\n", "read_p_file = reader(open_p_file)\n", "\n", "dataset_a = list(read_a_file)\n", "dataset_p = list(read_p_file)\n", "\n", "# Separate the header row from the data rows\n", "ios_header = dataset_a[0]\n", "ios_data = dataset_a[1:]\n", "\n", "android_header = dataset_p[0]\n", "android_data = dataset_p[1:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defining a Function for Easy Exploration of the Datasets" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Apple Store\n", "\n", "['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']\n", "\n", "\n", "['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']\n", "\n", "\n", "['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']\n", "\n", "\n", "['4', '282614216', 'eBay: Best App to Buy, Sell, Save! 
Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']\n", "\n", "\n", "['5', '282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']\n", "\n", "\n", "Number of rows: 7197\n", "Number of columns: 17\n" ] } ], "source": [ "def explore_data(dataset, start, end, rows_and_columns=False):\n", "    dataset_slice = dataset[start:end]\n", "    for row in dataset_slice:\n", "        print(row)\n", "        print('\\n') # adds a new (empty) line after each row\n", "\n", "    if rows_and_columns:\n", "        print('Number of rows:', len(dataset))\n", "        print('Number of columns:', len(dataset[0]))\n", "\n", "print('Apple Store\\n')\n", "explore_data(ios_data, 0, 5, rows_and_columns=True)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Google Play Store\n", "\n", "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n", "\n", "\n", "['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']\n", "\n", "\n", "['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']\n", "\n", "\n", "['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']\n", "\n", "\n", "['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']\n", "\n", "\n", "Number of rows: 10841\n", "Number of 
columns: 13\n" ] } ], "source": [ "print('Google Play Store\\n')\n", "explore_data(android_data, 0, 5, rows_and_columns=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Column Names" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ios header \n", " ['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] \n", "\n", "android header \n", " ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']\n" ] } ], "source": [ "print('ios header','\\n', ios_header, '\\n')\n", "# print('\\n')\n", "print('android header', '\\n', android_header)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deleting Wrong Data\n", "- According to the discussion forum at [Kaggle](https://www.kaggle.com/lava18/google-play-store-apps/home), one of the rows (10472) is missing a column value, so it has 12 fields instead of 13. We will remove it." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']\n" ] }, { "data": { "text/plain": [ "12" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(android_data[10472])\n", "len(android_data[10472])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of rows: 10840\n", "Number of columns: 13\n" ] } ], "source": [ "# Run this cell only once: running it again would delete a valid row\n", "del android_data[10472]\n", "explore_data(android_data, 0, 0, rows_and_columns=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Removing Duplicate Entries\n", "- Several apps have duplicate entries in the Play Store data, for example Instagram, as shown below. We need to remove them." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n", "['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n", "['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n", "['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n" ] } ], "source": [ "for app in android_data:\n", "    name = app[0]\n", "    if name == 'Instagram':\n", "        print(app)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Now we have to delete the duplicate entries. Instead of removing them randomly, we can use a better criterion.\n", "- The rows differ mainly in the fourth position, which holds the number of reviews. The different numbers show that the data was collected at different times.\n", "- The higher the number of reviews, the more recent the data should be. So for any given app, we'll keep only the row with the highest number of reviews and remove the other entries." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Step I - Get the names of all apps that have duplicate entries*" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1181\n", "['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']\n" ] } ], "source": [ "unique_apps = []\n", "duplicate_apps = []\n", "\n", "for app in android_data:\n", "    name = app[0]\n", "    if name in unique_apps:\n", "        duplicate_apps.append(name)\n", "    else:\n", "        unique_apps.append(name)\n", "\n", "print(len(duplicate_apps))\n", "print(duplicate_apps[:10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Step II - Build a dictionary that maps each app name to its highest number of reviews.*" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9659" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews_max = {}\n", "\n", "for app in android_data:\n", "    name = app[0]\n", "    n_reviews = float(app[3])\n", "    if (name in reviews_max) and (reviews_max[name] < n_reviews):\n", "        reviews_max[name] = n_reviews\n", "    if name not in reviews_max:\n", "        reviews_max[name] = n_reviews\n", "\n", "len(reviews_max)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9659\n" ] } ], "source": [ "print(len(reviews_max))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Step III - Use the dictionary created above to remove duplicate rows and collect the cleaned data in the `android_clean` list*" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', 
'0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n", "\n", "\n", "['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']\n", "\n", "\n", "['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']\n", "\n", "\n", "Number of rows: 9659\n", "Number of columns: 13\n" ] } ], "source": [ "android_clean = []\n", "already_added = []\n", "\n", "for app in android_data:\n", "    name = app[0]\n", "    n_reviews = float(app[3])\n", "    # Keep the first row that has the maximum number of reviews for this app\n", "    if (name not in already_added) and (n_reviews == reviews_max[name]):\n", "        android_clean.append(app)\n", "        already_added.append(name)\n", "\n", "explore_data(android_clean, 0, 3, rows_and_columns=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Removing Non-English Apps\n", "- The data also contains a few non-English apps, which we don't want. We will remove them as well." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Step I - Define a function that flags non-English strings (three or more non-ASCII characters)*" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "False\n", "True\n", "True\n" ] } ], "source": [ "def english_or_not(a_string):\n", "    count = 0\n", "    for i in a_string:\n", "        if (ord(i) > 127):\n", "            count += 1\n", "    # Three or more non-ASCII characters: treat the string as non-English\n", "    if count >= 3:\n", "        return False\n", "\n", "    return True\n", "\n", "print(english_or_not('Instagram'))\n", "print(english_or_not('爱奇艺PPS -《欢乐颂2》电视剧热播'))\n", "print(english_or_not('Docs To Go™ Free Office Suite'))\n", "print(english_or_not('Instachat 😜'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Step II - Remove non-English apps using the above function*" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "english_a_apps = []\n", "english_ios_apps = []\n", "\n", "for app in android_clean:\n", "    name = app[0]\n", "    if english_or_not(name):\n", "        english_a_apps.append(app)\n", "\n", "for app in ios_data:\n", "    name = app[0]\n", "    if english_or_not(name):\n", "        english_ios_apps.append(app)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**English Android Apps**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n", "\n", "\n", "['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']\n", "\n", "\n", "['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies 
with device', '4.2 and up']\n", "\n", "\n", "Number of rows: 9597\n", "Number of columns: 13\n" ] } ], "source": [ "explore_data(english_a_apps, 0, 3, rows_and_columns=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**English iOS Apps**" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']\n", "\n", "\n", "['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']\n", "\n", "\n", "['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']\n", "\n", "\n", "Number of rows: 7197\n", "Number of columns: 17\n" ] } ], "source": [ "explore_data(english_ios_apps, 0, 3, rows_and_columns=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Isolating the Free Apps\n", "- Now we will isolate the free apps in both the Play Store and the App Store, using the same approach as above." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "free_a_apps = []\n", "free_ios_apps = []\n", "\n", "for app in english_a_apps:\n", "    price = app[7]\n", "    if price == '0':\n", "        free_a_apps.append(app)\n", "\n", "for app in english_ios_apps:\n", "    price = app[5]\n", "    if price == '0':\n", "        free_ios_apps.append(app)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Free English Android Apps**" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n", "\n", "\n", "['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']\n", "\n", "\n", "['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']\n", "\n", "\n", "Number of rows: 8848\n", "Number of columns: 13\n" ] } ], "source": [ "explore_data(free_a_apps, 0, 3, rows_and_columns=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Free English iOS Apps**" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']\n", "\n", "\n", "['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']\n", "\n", "\n", "['4', '282614216', 'eBay: Best App to Buy, Sell, Save! 
Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']\n", "\n", "\n", "Number of rows: 4056\n", "Number of columns: 17\n" ] } ], "source": [ "explore_data(free_ios_apps, 0, 3, rows_and_columns=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Most Common Apps by Genre\n", "\n", "Our aim is to determine the kinds of apps that are likely to attract more users so that we can leverage the use of in-app advertising.\n", "\n", "To minimize risks and overhead, our validation strategy for an app idea consists of three steps:\n", "\n", "1. Build a minimal Android version of the app, and add it to Google Play.\n", "2. If the app has a good response from users, we then develop it further.\n", "3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.\n", "\n", "We want to find app profiles that are successful in both stores, so we begin our analysis by getting a sense of the most common genres in each market.\n", "\n", "> **Play Store** - Two columns (`Category` and `Genres`) give us an idea of the genres.
\n", "> **App Store** - One column (`prime_genre`) gives us an idea of the genres." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding Most Common Genres" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def freq_table(apps_list, index):\n", "    genre_dict = {}\n", "    genre_per = {}\n", "    sorted_gen_per = []\n", "\n", "    # Generating the frequency distribution\n", "    for app in apps_list:\n", "        genre = app[index]\n", "\n", "        if genre in genre_dict:\n", "            genre_dict[genre] += 1\n", "        else:\n", "            genre_dict[genre] = 1\n", "\n", "    # Sum of all values\n", "    sum_dict = sum(genre_dict.values())\n", "\n", "    # Generating the frequency percentage distribution\n", "    for i in genre_dict:\n", "        genre_per[i] = (genre_dict[i]/sum_dict)*100\n", "\n", "    # Sorting the frequency percentages in descending order\n", "    for w in sorted(genre_per, key = genre_per.get, reverse=True):\n", "        sorted_gen_per.append((w, genre_per[w]))\n", "\n", "    return sorted_gen_per\n", "\n", "genre_ios_apps = freq_table(free_ios_apps, -5)  # prime_genre\n", "category_a_apps = freq_table(free_a_apps, 1)  # Category\n", "genre_a_apps = freq_table(free_a_apps, -4)  # Genres" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Most Common Genres - iOS**" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Games', 55.64595660749507),\n", " ('Entertainment', 8.234714003944774),\n", " ('Photo & Video', 4.117357001972387),\n", " ('Social Networking', 3.5256410256410255),\n", " ('Education', 3.2544378698224854),\n", " ('Shopping', 2.983234714003945),\n", " ('Utilities', 2.687376725838264),\n", " ('Lifestyle', 2.3175542406311638),\n", " ('Finance', 2.0710059171597637),\n", " ('Sports', 1.947731755424063),\n", " ('Health & Fitness', 1.8737672583826428),\n", " ('Music', 1.6518737672583828),\n", " ('Book', 1.6272189349112427),\n", " ('Productivity', 1.5285996055226825),\n", " ('News', 1.4299802761341223),\n", " ('Travel', 
1.3806706114398422),\n", " ('Food & Drink', 1.0601577909270217),\n", " ('Weather', 0.7642998027613412),\n", " ('Reference', 0.4930966469428008),\n", " ('Business', 0.4930966469428008),\n", " ('Navigation', 0.4930966469428008),\n", " ('Catalogs', 0.22189349112426035),\n", " ('Medical', 0.19723865877712032)]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genre_ios_apps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With more than half of the apps (about 56%), Games is the most common genre.
The next few top genres also fall in the fun segment: Entertainment, Photo & Video, and Social Networking." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Most Common Genres - Android**
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Tools', 8.44258589511754),\n", " ('Entertainment', 6.080470162748644),\n", " ('Education', 5.357142857142857),\n", " ('Business', 4.599909584086799),\n", " ('Productivity', 3.899186256781193),\n", " ('Lifestyle', 3.8765822784810124),\n", " ('Finance', 3.7070524412296564),\n", " ('Medical', 3.5375226039783),\n", " ('Sports', 3.4584086799276674),\n", " ('Personalization', 3.322784810126582),\n", " ('Communication', 3.2323688969258586),\n", " ('Action', 3.096745027124774),\n", " ('Health & Fitness', 3.0854430379746836),\n", " ('Photography', 2.949819168173599),\n", " ('News & Magazines', 2.802893309222423),\n", " ('Social', 2.667269439421338),\n", " ('Travel & Local', 2.328209764918626),\n", " ('Shopping', 2.2490958408679926),\n", " ('Books & Reference', 2.1360759493670884),\n", " ('Simulation', 2.0456600361663653),\n", " ('Dating', 1.8648282097649187),\n", " ('Arcade', 1.842224231464738),\n", " ('Video Players & Editors', 1.7744122965641953),\n", " ('Casual', 1.763110307414105),\n", " ('Maps & Navigation', 1.3901446654611211),\n", " ('Food & Drink', 1.2432188065099457),\n", " ('Puzzle', 1.1301989150090417),\n", " ('Racing', 0.9945750452079566),\n", " ('Libraries & Demo', 0.9380650994575045),\n", " ('Role Playing', 0.9380650994575045),\n", " ('Auto & Vehicles', 0.9267631103074141),\n", " ('Strategy', 0.9154611211573236),\n", " ('House & Home', 0.8024412296564195),\n", " ('Weather', 0.7911392405063291),\n", " ('Events', 0.7120253164556962),\n", " ('Adventure', 0.6668173598553345),\n", " ('Art & Design', 0.599005424954792),\n", " ('Beauty', 0.599005424954792),\n", " ('Comics', 0.599005424954792),\n", " ('Parenting', 0.4972875226039783),\n", " ('Card', 0.45207956600361665),\n", " ('Trivia', 0.4181735985533454),\n", " ('Casino', 0.4181735985533454),\n", " ('Educational;Education', 0.39556962025316456),\n", " ('Board', 0.3842676311030741),\n", " ('Educational', 
0.3729656419529837),\n", " ('Education;Education', 0.33905967450271246),\n", " ('Word', 0.25994575045207957),\n", " ('Casual;Pretend Play', 0.23734177215189875),\n", " ('Music', 0.2034358047016275),\n", " ('Entertainment;Music & Video', 0.16952983725135623),\n", " ('Puzzle;Brain Games', 0.16952983725135623),\n", " ('Racing;Action & Adventure', 0.16952983725135623),\n", " ('Casual;Brain Games', 0.13562386980108498),\n", " ('Casual;Action & Adventure', 0.13562386980108498),\n", " ('Arcade;Action & Adventure', 0.12432188065099457),\n", " ('Action;Action & Adventure', 0.10171790235081375),\n", " ('Educational;Pretend Play', 0.09041591320072333),\n", " ('Entertainment;Brain Games', 0.07911392405063292),\n", " ('Simulation;Action & Adventure', 0.07911392405063292),\n", " ('Board;Brain Games', 0.07911392405063292),\n", " ('Parenting;Education', 0.07911392405063292),\n", " ('Art & Design;Creativity', 0.06781193490054249),\n", " ('Educational;Brain Games', 0.06781193490054249),\n", " ('Casual;Creativity', 0.06781193490054249),\n", " ('Parenting;Music & Video', 0.06781193490054249),\n", " ('Education;Pretend Play', 0.05650994575045208),\n", " ('Education;Creativity', 0.045207956600361664),\n", " ('Role Playing;Pretend Play', 0.045207956600361664),\n", " ('Education;Brain Games', 0.033905967450271246),\n", " ('Entertainment;Creativity', 0.033905967450271246),\n", " ('Educational;Creativity', 0.033905967450271246),\n", " ('Adventure;Action & Adventure', 0.033905967450271246),\n", " ('Role Playing;Action & Adventure', 0.033905967450271246),\n", " ('Educational;Action & Adventure', 0.033905967450271246),\n", " ('Entertainment;Action & Adventure', 0.033905967450271246),\n", " ('Puzzle;Action & Adventure', 0.033905967450271246),\n", " ('Education;Action & Adventure', 0.033905967450271246),\n", " ('Education;Music & Video', 0.033905967450271246),\n", " ('Casual;Education', 0.022603978300180832),\n", " ('Music;Music & Video', 0.022603978300180832),\n", " ('Simulation;Pretend Play', 
0.022603978300180832),\n", " ('Puzzle;Creativity', 0.022603978300180832),\n", " ('Sports;Action & Adventure', 0.022603978300180832),\n", " ('Board;Action & Adventure', 0.022603978300180832),\n", " ('Entertainment;Pretend Play', 0.022603978300180832),\n", " ('Video Players & Editors;Music & Video', 0.022603978300180832),\n", " ('Comics;Creativity', 0.011301989150090416),\n", " ('Lifestyle;Pretend Play', 0.011301989150090416),\n", " ('Art & Design;Pretend Play', 0.011301989150090416),\n", " ('Entertainment;Education', 0.011301989150090416),\n", " ('Arcade;Pretend Play', 0.011301989150090416),\n", " ('Art & Design;Action & Adventure', 0.011301989150090416),\n", " ('Strategy;Action & Adventure', 0.011301989150090416),\n", " ('Music & Audio;Music & Video', 0.011301989150090416),\n", " ('Health & Fitness;Education', 0.011301989150090416),\n", " ('Casual;Music & Video', 0.011301989150090416),\n", " ('Travel & Local;Action & Adventure', 0.011301989150090416),\n", " ('Tools;Education', 0.011301989150090416),\n", " ('Parenting;Brain Games', 0.011301989150090416),\n", " ('Video Players & Editors;Creativity', 0.011301989150090416),\n", " ('Health & Fitness;Action & Adventure', 0.011301989150090416),\n", " ('Trivia;Education', 0.011301989150090416),\n", " ('Lifestyle;Education', 0.011301989150090416),\n", " ('Card;Action & Adventure', 0.011301989150090416),\n", " ('Books & Reference;Education', 0.011301989150090416),\n", " ('Simulation;Education', 0.011301989150090416),\n", " ('Puzzle;Education', 0.011301989150090416),\n", " ('Adventure;Education', 0.011301989150090416),\n", " ('Role Playing;Brain Games', 0.011301989150090416),\n", " ('Strategy;Education', 0.011301989150090416),\n", " ('Racing;Pretend Play', 0.011301989150090416),\n", " ('Communication;Creativity', 0.011301989150090416),\n", " ('Strategy;Creativity', 0.011301989150090416)]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genre_a_apps" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "**Most Common Categories - Android**" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('FAMILY', 18.942133815551536),\n", " ('GAME', 9.697106690777577),\n", " ('TOOLS', 8.453887884267631),\n", " ('BUSINESS', 4.599909584086799),\n", " ('PRODUCTIVITY', 3.899186256781193),\n", " ('LIFESTYLE', 3.887884267631103),\n", " ('FINANCE', 3.7070524412296564),\n", " ('MEDICAL', 3.5375226039783),\n", " ('SPORTS', 3.390596745027125),\n", " ('PERSONALIZATION', 3.322784810126582),\n", " ('COMMUNICATION', 3.2323688969258586),\n", " ('HEALTH_AND_FITNESS', 3.0854430379746836),\n", " ('PHOTOGRAPHY', 2.949819168173599),\n", " ('NEWS_AND_MAGAZINES', 2.802893309222423),\n", " ('SOCIAL', 2.667269439421338),\n", " ('TRAVEL_AND_LOCAL', 2.3395117540687163),\n", " ('SHOPPING', 2.2490958408679926),\n", " ('BOOKS_AND_REFERENCE', 2.1360759493670884),\n", " ('DATING', 1.8648282097649187),\n", " ('VIDEO_PLAYERS', 1.7970162748643763),\n", " ('MAPS_AND_NAVIGATION', 1.3901446654611211),\n", " ('FOOD_AND_DRINK', 1.2432188065099457),\n", " ('EDUCATION', 1.164104882459313),\n", " ('ENTERTAINMENT', 0.9606690777576853),\n", " ('LIBRARIES_AND_DEMO', 0.9380650994575045),\n", " ('AUTO_AND_VEHICLES', 0.9267631103074141),\n", " ('HOUSE_AND_HOME', 0.8024412296564195),\n", " ('WEATHER', 0.7911392405063291),\n", " ('EVENTS', 0.7120253164556962),\n", " ('PARENTING', 0.6555153707052441),\n", " ('ART_AND_DESIGN', 0.6442133815551537),\n", " ('COMICS', 0.6103074141048824),\n", " ('BEAUTY', 0.599005424954792)]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "category_a_apps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Play Store the distribution seems to be more balanced.
Although Games is one of the top categories, it doesn't dominate the distribution. Instead, apps with practical purposes, such as `Tools`, `Education`, `Business`, and `Productivity`, take many of the top spots.

\n", "**We may also notice that the `Genres` column is far more granular (it has more categories) than the `Category` column. Since we want the overall picture, we will use the `Category` column from now on.**\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Most Popular Apps on App Store and Play Store by Genre\n", "\n", "Being the most common doesn't equate to being the most popular; there may be more supply than demand. So here, we compare the average number of users (installs) per genre.\n", "> **Play Store**: the `Installs` column gives us an idea of the number of installs.
\n", "> **App Store**: there is no column for installs, but the `rating_count_tot` column (total number of user ratings) gives us a proxy for the installs in each genre." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Most Popular Genres - iOS**" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Reference', 67447.9),\n", " ('Music', 56482.02985074627),\n", " ('Social Networking', 53078.195804195806),\n", " ('Weather', 47220.93548387097),\n", " ('Photo & Video', 27249.892215568863),\n", " ('Navigation', 25972.05),\n", " ('Travel', 20216.01785714286),\n", " ('Food & Drink', 20179.093023255813),\n", " ('Sports', 20128.974683544304),\n", " ('Health & Fitness', 19952.315789473683),\n", " ('Productivity', 19053.887096774193),\n", " ('Games', 18924.68896765618),\n", " ('Shopping', 18746.677685950413),\n", " ('News', 15892.724137931034),\n", " ('Utilities', 14010.100917431193),\n", " ('Finance', 13522.261904761905),\n", " ('Entertainment', 10822.961077844311),\n", " ('Lifestyle', 8978.308510638299),\n", " ('Book', 8498.333333333334),\n", " ('Business', 6367.8),\n", " ('Education', 6266.333333333333),\n", " ('Catalogs', 1779.5555555555557),\n", " ('Medical', 459.75)]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "categories_ios_list = []\n", "pop_ios_apps_dict = {}\n", "pop_ios_apps_list = []\n", "\n", "# Getting the list of genres in the App Store\n", "for app in genre_ios_apps:\n", "    categories_ios_list.append(app[0])\n", "\n", "# Average number of user ratings per genre in the App Store (a proxy for installs)\n", "for cat in categories_ios_list:\n", "    lngth = 0\n", "    total = 0\n", "    for app in free_ios_apps:\n", "        if cat == app[-5]:\n", "            n_ratings = float(app[6])\n", "            lngth += 1\n", "            total += n_ratings\n", "    avg_rating = total/lngth\n", "    pop_ios_apps_dict[cat] = avg_rating\n", "\n", "# Sorting in descending order to display the most popular genres\n", "for i in sorted(pop_ios_apps_dict,\n", "               key = pop_ios_apps_dict.get, reverse=True):\n", "    pop_ios_apps_list.append((i, pop_ios_apps_dict[i]))\n", "\n", "pop_ios_apps_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Most Popular Categories - Android**" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('COMMUNICATION', 38590581.08741259),\n", " ('VIDEO_PLAYERS', 24727872.452830188),\n", " ('SOCIAL', 23253652.127118643),\n", " ('PHOTOGRAPHY', 17840110.40229885),\n", " ('PRODUCTIVITY', 16787331.344927534),\n", " ('GAME', 15544014.51048951),\n", " ('TRAVEL_AND_LOCAL', 13984077.710144928),\n", " ('ENTERTAINMENT', 11640705.88235294),\n", " ('TOOLS', 10830251.970588235),\n", " ('NEWS_AND_MAGAZINES', 9549178.467741935),\n", " ('BOOKS_AND_REFERENCE', 8814199.78835979),\n", " ('SHOPPING', 7036877.311557789),\n", " ('PERSONALIZATION', 5201482.6122448975),\n", " ('WEATHER', 5145550.285714285),\n", " ('HEALTH_AND_FITNESS', 4188821.9853479853),\n", " ('MAPS_AND_NAVIGATION', 4049274.6341463416),\n", " ('FAMILY', 3695641.8198090694),\n", " ('SPORTS', 3650602.276666667),\n", " ('ART_AND_DESIGN', 1986335.0877192982),\n", " ('FOOD_AND_DRINK', 1924897.7363636363),\n", " ('EDUCATION', 1833495.145631068),\n", " ('BUSINESS', 1712290.1474201474),\n", " ('LIFESTYLE', 1446158.2238372094),\n", " ('FINANCE', 1387692.475609756),\n", " ('HOUSE_AND_HOME', 1360598.042253521),\n", " ('DATING', 854028.8303030303),\n", " ('COMICS', 832613.8888888889),\n", " ('AUTO_AND_VEHICLES', 647317.8170731707),\n", " ('LIBRARIES_AND_DEMO', 
638503.734939759),\n", " ('PARENTING', 542603.6206896552),\n", " ('BEAUTY', 513151.88679245283),\n", " ('EVENTS', 253542.22222222222),\n", " ('MEDICAL', 120550.61980830671)]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "categories_a_list = []\n", "pop_a_apps_dict = {}\n", "pop_a_apps_list = []\n", "\n", "# Getting the list of categories in the Play Store\n", "for app in category_a_apps:\n", "    categories_a_list.append(app[0])\n", "\n", "# Average number of installs per category in the Play Store\n", "for i in categories_a_list:\n", "    total = 0\n", "    lngth = 0\n", "    for app in free_a_apps:\n", "        if app[1] == i:\n", "            # Strip ',' and '+' so that values like '10,000+' convert to a number\n", "            installs = app[5]\n", "            installs = installs.replace(',', '')\n", "            installs = float(installs.replace('+', ''))\n", "            total += installs\n", "            lngth += 1\n", "    avg_installs = total/lngth\n", "    pop_a_apps_dict[i] = avg_installs\n", "\n", "# Sorting in descending order to display the most popular categories\n", "for i in sorted(pop_a_apps_dict,\n", "               key = pop_a_apps_dict.get, reverse=True):\n", "    pop_a_apps_list.append((i, pop_a_apps_dict[i]))\n", "\n", "pop_a_apps_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Here, we can see several categories that are popular in both stores.
Most of the installs and popularity come from a few apps, such as Facebook for Social and YouTube for Music or Video Players. We are also not interested in building a communication app like WhatsApp or Skype. But one category is popular in both stores: Photo (`Photo & Video` in the App Store, `PHOTOGRAPHY` in the Play Store).**
\n", "\n", "### *Conclusion:*\n", "\n", "***We can recommend developing an app in the Photography category.***" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }