# LoL Churn Predictor [Part 1 - Data Collection]

**David Skarbrevik - 2018**

We will get all of our data from Riot's API. Given a Summoner's name (a player's username), the API lets us query that player's match history from the last 3 years. Ultimately, our goal is to build a model that predicts yes/no (1/0): will a player stop playing the game (based on some earlier data) or not?

<a id="toc"></a>

<br>
<hr style="background-color: black; padding: 1px;">
<br>

<h2>Table of Contents</h2>

<br>

<ol>
    <h3><li><a href="#section1">Planning</a></li></h3>
    <br>
    <h3><li><a href="#section2">Getting Familiar with Riot's API</a></li></h3>
    <br>
    <h3><li><a href="#section3">Building functions to get our data</a></li></h3>
    <br>
    <h3><li><a href="#section4">Actually gathering the data</a></li></h3>          
</ol>

<br>
<hr style="background-color: black; padding: 1px;">
<br>

<a id='section1'></a>

## Step 1) Planning 

Let's first make sure we have a clear vision of what kind of data we can (and want to) get.

### 1) What kind of data can we collect from a match?

Individual player data: how many kills and assists they got (and at what time and location), which champion they chose, how long the match lasted, etc.

### 2) Is there any other non-match data we can collect about a player to help with our prediction?

Not really at the moment. I might think more about this in the future.

### 3) What players should we be targeting? 

Building a database of randomly chosen players may be difficult unless I can figure out how to get a list of random summoner IDs... **see "Step 3" of this notebook for more info.**


### 4) How many players and matches do we need to collect to make an adequate test/train dataset?

I think 1,000 players would be a good start. I'll just take the first match of each player to start, but if this doesn't seem to be enough to build an accurate model, I'll consider taking more. I think the biggest challenge for data collection right now will be finding enough players that fit my specifications (see "Step 3" of this notebook).

### 5) How should our data be organized to feed it into a model?

I'm planning to use either logistic regression or a simple feed-forward neural network. For these models I want each row of my dataset to represent all the data for a single player. By looking at just the first match for each player, it should be easy to make sure the number of features in each row is the same. Later, if I wanted a more general churn model that works for any player with any number of matches, an RNN might be worth looking into, since in that case each input would be a sequence of matches whose length varies from player to player.
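
To make the "same number of features in each row" idea concrete, here is a minimal plain-Python sketch (not the notebook's own code). It turns per-player dicts whose keys may differ, for example because a short match has no late-game timeline entries, into equal-length rows. Missing features are filled with a default value. Building a pandas `DataFrame` from a list of dicts does something similar but fills the gaps with `NaN`. The function name `to_feature_rows` and the sample values are illustrative.

```python
def to_feature_rows(player_dicts, fill_value=0):
    """Return (columns, rows) where every row has exactly one value per column."""
    # union of all keys, sorted so the column order is deterministic
    columns = sorted({key for d in player_dicts for key in d})
    # missing features get fill_value so every row has the same length
    rows = [[d.get(col, fill_value) for col in columns] for d in player_dicts]
    return columns, rows


players = [
    {"kills": 9, "deaths": 11, "goldPerMinDeltas_0-10": 467.3},
    {"kills": 2, "deaths": 4},  # e.g. a player record with no timeline entry
]
columns, rows = to_feature_rows(players)
print(columns)  # ['deaths', 'goldPerMinDeltas_0-10', 'kills']
print(rows)     # [[11, 467.3, 9], [4, 0, 2]]
```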


***

<div align="right">
    <a href="#toc">back to top</a>
</div>
<a id='section2'></a>

## Step 2) Getting familiar with Riot's API (using the Python wrapper Cassiopeia)

Advantages of using Cassiopeia instead of making the API calls ourselves:
* Useful pre-built functions for gathering data (e.g. `MatchParticipant.stats.to_dict()` gets a LOT of features at once)
* Rate limiting is taken care of for us (sooo helpful since we want to make a lot of calls to build our database)
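
For a sense of what "rate limiting is taken care of for us" saves us from writing, here is a minimal sliding-window limiter sketch. It is not Cassiopeia's implementation. The class name and the injectable `clock`/`sleep` parameters are my own assumptions, added so the example can run against a simulated clock. As far as I know, Riot development keys are limited to roughly 20 requests per second and 100 requests per 2 minutes, so a real client would stack two limiters like this one.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self.clock()
            # forget calls that have fallen out of the rolling window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # window is full: sleep until the oldest call expires, then re-check
            self.sleep(self.period - (now - self.calls[0]))


# Demo with a simulated clock so no real time passes
simulated = {"t": 0.0}

def fake_clock():
    return simulated["t"]

def fake_sleep(seconds):
    simulated["t"] += seconds

per_second = SlidingWindowLimiter(2, 1.0, clock=fake_clock, sleep=fake_sleep)
for _ in range(5):
    per_second.wait()  # in real code: wait, then make one API request
print(simulated["t"])  # 2.0: five calls at 2 per second need two 1-second waits
```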

**Import needed libraries**

In [1]:
import numpy as np
import re
import json
import pandas as pd
from collections import defaultdict, Counter
import arrow
import ipywidgets as widgets # for progress bar

import jupyternotify # gives desktop notifications when cell blocks complete
ip = get_ipython()
ip.register_magics(jupyternotify.JupyterNotifyMagics(ip, require_interaction=True))

import cassiopeia as cass # Riot API wrapper
from cassiopeia import Summoner


**Set global parameters for Riot API**

In [2]:
no_print_calls = cass.get_default_config()
no_print_calls['logging']['print_calls'] = False

cass.apply_settings(no_print_calls)

In [3]:
cass.set_riot_api_key("RGAPI-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX") # placeholder: use the API key you get from your Riot developer account
cass.set_default_region("NA")

**Let's try to get the summoner ids of some players (the best players)**

In [4]:
challenger_league = cass.get_challenger_league(queue=cass.Queue.ranked_solo_fives)

for challenger in challenger_league:
    print(challenger.summoner.id)

Making call: https://na1.api.riotgames.com/lol/league/v3/challengerleagues/by-queue/RANKED_SOLO_5x5
34402943
24705131
21103810
47212115
20598223
93809082
20823651
21124626
21623482
20429469
20202108
29557073
24449785
28725693
24671130
30645632
70132282
21371454
42133105
41233933
34225884
49899264
35097344
22670229
45186258
19285521
48193601
19839806
401063
56901374
30266347
37532228
19804632
27251616
79249642
52371378
22478427
83339123
60469878
21542029
30972130
40661961
93269094
36119978
19920510
51580106
77269277
20947579
30255016
19787999
20476893
77520973
21652056
50877721
90331209
82962549
32252808
40279866
36782900
45079499
19587365
24823450
20636732
51949451
32705024
77720407
71229345
91419117
22638316
20389591
52559520
51075422
88109144
37481045
20962399
76609200
19577112
45038172
35592112
83089575
43839117
94819093
20130821
22403576
71715
35570736
77489438
40710913
65389099
77211704
24326160
32556358
21276764
19338315
43079579
20248040
26065445
19698094
91636261
21260950
20414

That's cool, but these are the ids of the best of the best... We want the newest of the new (unranked players), so we'll have to work a bit harder.

**How to access a single user**

In [4]:
summoner = Summoner(name="Msendak", region="NA")
print("{name} ({id}) is a level {level} summoner on the {region} server.".format(name=summoner.name,
                                                                          level=summoner.level,
                                                                          id=summoner.id,
                                                                          region=summoner.region))

Msendak (37709821) is a level 55 summoner on the Region.north_america server.


**We can also access the user with their summoner id**

It is my understanding that summoner name changes can be bought but that summoner ids are more permanent, so for the sake of stability we'll identify players by their id when gathering data later.

In [35]:
summoner = Summoner(id=37709821, region="NA")
print("{name} ({id}) is a level {level} summoner on the {region} server.".format(name=summoner.name,
                                                                          level=summoner.level,
                                                                          id=summoner.id,
                                                                          region=summoner.region))

MSendak (37709821) is a level 49 summoner on the Region.north_america server.


**Get all the matches that user played**

In [15]:
matches = summoner.match_history

In [16]:
print("Total matches = {}".format(len(matches)))

Total matches = 2779


**Time of most recent and oldest match (in the last three years)**

In [17]:
first_match = matches[-1]
latest_match = matches[0]

time_first_match = re.search(r"\d+-\d+-\d+", str(first_match.creation))[0] # raw string avoids invalid-escape warnings
time_latest_match = re.search(r"\d+-\d+-\d+", str(latest_match.creation))[0]

print("{name}'s Oldest Match stored in the API: {date}".format(name=summoner.name, date=time_first_match))
print("{name}'s Most Recent Match: {date}".format(name=summoner.name, date=time_latest_match))

MSendak's Oldest Match stored in the API: 2015-05-01
MSendak's Most Recent Match: 2018-04-22


**Get stats from any player in that specific match**

In [20]:
players = first_match.participants
player = players[1] # selecting a specific player

In [21]:
player_stats = player.stats.to_dict()

player_stats

{'assists': 19,
 'champLevel': 14,
 'combatPlayerScore': 0,
 'damageDealtToObjectives': 0,
 'damageDealtToTurrets': 0,
 'damageSelfMitigated': 0,
 'deaths': 11,
 'doubleKills': 3,
 'firstBloodAssist': False,
 'firstBloodKill': False,
 'firstInhibitorAssist': True,
 'firstInhibitorKill': False,
 'firstTowerAssist': False,
 'firstTowerKill': False,
 'goldEarned': 8496,
 'goldSpent': 7050,
 'inhibitorKills': 0,
 'item0': 3087,
 'item1': 3006,
 'item2': 2003,
 'item3': 3153,
 'item4': 0,
 'item5': 0,
 'item6': 2052,
 'killingSprees': 3,
 'kills': 9,
 'largestCriticalStrike': 388,
 'largestKillingSpree': 3,
 'largestMultiKill': 3,
 'longestTimeSpentLiving': 136,
 'magicDamageDealt': 9021,
 'magicDamageDealtToChampions': 3439,
 'magicalDamageTaken': 1794,
 'neutralMinionsKilled': 0,
 'objectivePlayerScore': 0,
 'participantId': 2,
 'pentaKills': 0,
 'physicalDamageDealt': 20675,
 'physicalDamageDealtToChampions': 8278,
 'physicalDamageTaken': 12276,
 'playerScore0': 0,
 'playerScore1': 0,
 ...}

In [22]:
player_timeline = player.timeline.to_dict()

player_timeline

{'creepsPerMinDeltas': {'0-10': 1.5},
 'csDiffPerMinDeltas': {'0-10': 0.1599999999999998},
 'damageTakenDiffPerMinDeltas': {'0-10': -53.019999999999925},
 'damageTakenPerMinDeltas': {'0-10': 964.6999999999999},
 'goldPerMinDeltas': {'0-10': 467.29999999999995},
 'id': 2,
 'lane': 'MIDDLE',
 'role': 'NONE',
 'xpDiffPerMinDeltas': {'0-10': 36.0800000000001},
 'xpPerMinDeltas': {'0-10': 672.8}}

The "stats" and "timeline" attributes we see here are where the bulk of a match's data comes from.
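
The timeline dict nests its values by game interval, which doesn't fit a one-row-per-player table. Here is a minimal sketch of flattening it, pulled out as a pure function so it can be tried without any API calls. It mirrors the flattening `get_player_game_data` does in Step 3, and the interval names are the ones that function assumes (`'0-10'`, `'10-20'`, `'20-30'`, `'30-end'`). The name `flatten_timeline` is illustrative.

```python
def flatten_timeline(timeline, intervals=("0-10", "10-20", "20-30", "30-end")):
    """Flatten nested per-interval deltas into one flat dict of features."""
    flat = {}
    for key, val in timeline.items():
        if isinstance(val, dict):
            # one column per interval; intervals the match never reached become 0
            for interval in intervals:
                flat["{0}_{1}".format(key, interval)] = val.get(interval, 0)
        else:
            flat[key] = val
    return flat


timeline = {"creepsPerMinDeltas": {"0-10": 1.5}, "id": 2, "lane": "MIDDLE"}
flat = flatten_timeline(timeline)
print(flat["creepsPerMinDeltas_0-10"], flat["creepsPerMinDeltas_30-end"], flat["lane"])  # 1.5 0 MIDDLE
```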

***

<div align="right">
    <a href="#toc">back to top</a>
</div>
<a id='section3'></a>

## Step 3) Building functions to get our data

Now that we're familiar with how the Cassiopeia library works, let's get the data we really want.

I specifically want players who are new to the game (low level / few matches), but there is no way to directly access a list of players that meet those criteria through the API (at least not that I know of). So instead, I made a new account (summoner: 'OldTimeCandyBowl') and played a match. I then queried the API about that match to get the summoner ids of my teammates in it (knowing they'd likely be low-level players). My plan is to branch out from them: take the players from their other matches, then the players _those_ players have played with, and so on, until I've gathered enough low-level summoners. Initially **I'll aim for 1,000 summoners**.
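
The branching plan above is essentially a breadth-first "snowball" search over the who-played-with-whom graph. Here is a minimal sketch on a toy graph. `neighbors` and `is_candidate` are hypothetical stand-ins for "players from this summoner's matches" and `is_candidate_player`, and the ids and levels are made up. As in the plan, only players that pass the filter are used to branch further.

```python
from collections import deque


def snowball_sample(seeds, neighbors, is_candidate, target, max_depth=3):
    """Breadth-first search outward from seed players, collecting candidates."""
    seen = set(seeds)                     # never look at the same player twice
    found = []
    queue = deque((s, 0) for s in seeds)  # (player id, hops away from a seed)
    while queue and len(found) < target:
        player, depth = queue.popleft()
        if depth >= max_depth:
            continue                      # branched deep enough along this path
        for other in neighbors(player):
            if other in seen:
                continue
            seen.add(other)
            if is_candidate(other):
                found.append(other)
                queue.append((other, depth + 1))  # candidates become new seeds
                if len(found) >= target:
                    break
    return found


# Toy data: who appeared in a match with whom, and each player's level
played_with = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6], 4: [2], 5: [2, 7], 6: [3], 7: [5]}
level = {1: 3, 2: 4, 3: 30, 4: 2, 5: 5, 6: 1, 7: 3}

found = snowball_sample([1], lambda p: played_with.get(p, []), lambda p: level[p] <= 5, target=10)
print(found)  # [2, 4, 5, 7]; player 6 is never reached because player 3 fails the filter
```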

**Targeted summoner for database I'm building:**

* first match must be in 2018
* first match must be at least 1 month ago (from today)
* player level must not be higher than 5

**Data to collect from target summoner:**

* gameplay data from first match

**Possible prediction tasks:**

_(all predictions based on summoner's gameplay data from first match)_
* Will the summoner get to level 3 or higher within the first month of play?
* Did the summoner play more than 1 match?
* Did the summoner play at least X matches?
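
Each of these tasks reduces to a 0/1 label computed from a per-player record. Here is a minimal sketch using the `summoner_level` and `total_matches` fields that `get_player_game_data` (below) stores. The `make_labels` name and the default X = 5 are hypothetical. Note that the current level is only an approximation for the first task, since it says nothing about *when* a level was reached.

```python
def make_labels(player, min_level=3, x_matches=5):
    """Derive 0/1 targets for the candidate prediction tasks from one player record."""
    return {
        # approximation: uses the level at collection time, not the level after exactly one month
        "reached_level_3": int(player["summoner_level"] >= min_level),
        "played_more_than_1_match": int(player["total_matches"] > 1),
        "played_at_least_x_matches": int(player["total_matches"] >= x_matches),
    }


labels = make_labels({"summoner_level": 2, "total_matches": 4})
print(labels)
# {'reached_level_3': 0, 'played_more_than_1_match': 1, 'played_at_least_x_matches': 0}
```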

**We are interested in a very specific type of player so this function filters out players that don't fit our specifications.**

In [5]:
# limits the type of player we want in our database
def is_candidate_player(summoner, first_match, level_cap):
    
    # NOTE: (int,int,int) that is output with True/False is a counter check to understand how many players are rejected from db.
    
    entry_game_types = ['BOT_5X5_INTRO', 'BOT_5X5_BEGINNER', 'BOT_5X5_INTERMEDIATE']
    time_now = arrow.now()

    # level cap restriction
    if summoner.level > level_cap:
        return False, (1,0,0)
      
    # must not be newer than this
    if first_match.creation.shift(months=1) > time_now:
        return False, (0,1,0)
    
    # must not be older than this
    if first_match.creation.year < 2018:
        return False, (0,0,1)
    
    return True, (0,0,0)
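
Because `is_candidate_player` returns at the first failed check, each rejected player is counted under exactly one reason. Here is a minimal sketch of how those `(level_cap, too_young, too_old)` tuples can be summed into totals, as `get_new_players` does with its counters. `tally_rejections` is a hypothetical helper and the results list is made up.

```python
def tally_rejections(results):
    """Sum (is_candidate, (level_cap, too_young, too_old)) results element-wise."""
    accepted = 0
    totals = (0, 0, 0)
    for is_candidate, checks in results:
        accepted += int(is_candidate)
        totals = tuple(t + c for t, c in zip(totals, checks))
    return accepted, totals


results = [(True, (0, 0, 0)), (False, (1, 0, 0)), (False, (0, 0, 1)), (False, (1, 0, 0))]
accepted, (denied_level_cap, denied_too_young, denied_too_old) = tally_rejections(results)
print(accepted, denied_level_cap, denied_too_young, denied_too_old)  # 1 2 0 1
```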

**This function actually gathers the game data from a player's first match.**

In [6]:
# gather all player data from a single match
def get_player_game_data(summoner, player_history):

    first_match = player_history[-1]   # oldest match in the history
    recent_match = player_history[0]   # most recent match
    first_match_participant = first_match.participants[summoner]

    # start with empty dicts so a failed lookup below doesn't cause a NameError at the return
    timeline_dict, stats_dict, summoner_dict, extra_tid_bits = {}, {}, {}, {}

    try:
        # we need to un-nest the timeline dict and standardize the format of this data
        for key, val in first_match_participant.timeline.to_dict().items():
            if isinstance(val, dict):
                # one column per interval; intervals the match never reached are filled with 0
                for interval in ['0-10', '10-20', '20-30', '30-end']:
                    timeline_dict['{0}_{1}'.format(key, interval)] = val.get(interval, 0)
            else:
                timeline_dict[key] = val
    except Exception:
        print("Problem getting timeline data")

    try:
        # stats data is already in good shape!
        stats_dict = first_match_participant.stats.to_dict()
    except Exception:
        print("Problem getting stats data")

    try:
        # get basic summoner info
        summoner_dict = {"summoner_id": summoner.id, "summoner_name": summoner.name, "summoner_level": summoner.level}
    except Exception:
        print("Problem getting summoner info.")

    try:
        # there is some other info we want that's not captured by 'stats' or 'timeline'
        extra_tid_bits = {'first_match_id': first_match.id, 'first_match_duration': first_match.duration,
                          'first_match_time': first_match.creation, 'latest_match_time': recent_match.creation,
                          'total_matches': len(player_history)}
    except Exception:
        print("Problem getting extra info about player match.")

    return {**extra_tid_bits, **timeline_dict, **stats_dict, **summoner_dict}
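
One thing to keep in mind about the final `{**extra_tid_bits, **timeline_dict, **stats_dict, **summoner_dict}`: if two of those dicts ever share a key, the right-most one silently wins. Here is a hypothetical `merge_checked` helper (a sketch, not part of the notebook) that fails loudly on duplicate feature names instead.

```python
def merge_checked(*dicts):
    """Merge dicts left to right, raising if any key appears in more than one."""
    merged = {}
    for d in dicts:
        clash = merged.keys() & d.keys()  # dict key views support set intersection
        if clash:
            raise KeyError("duplicate feature names: {}".format(sorted(clash)))
        merged.update(d)
    return merged


print({**{"kills": 9}, **{"kills": 0}})                    # {'kills': 0}: the last dict wins
print(merge_checked({"kills": 9}, {"summoner_level": 3}))  # {'kills': 9, 'summoner_level': 3}
```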

**This function takes in a list of summoner ids and produces a dataset of first match data from it.**

Only really necessary if I'm in a situation where I have built a good list of summoner ids but I've lost my dataset for some reason.

In [7]:
##############################
# WE WILL CALL THIS DIRECTLY #
##############################  
# given a list of summoner ids, will return dataframe with game data for each player
# this is a backup, "just in case" function that I would only use if I lost my database but still had my list of summoners.
def db_from_ids(summoner_ids: list):

    master_game_data_list = []
    issue_ids = []

    for i, summoner_id in enumerate(summoner_ids):

        if i % 10 == 0:
            print("Currently processing player {}".format(i+1))

        try:
            summoner = Summoner(id=summoner_id, region="NA")
            summoner_history = summoner.match_history
        except Exception as inst:
            issue_ids.append(summoner_id)
            print("Problem getting player {}".format(summoner_id))
            print(type(inst))
            print(inst)
            continue # no match history to work with, so skip this player

        try:
            game_data = get_player_game_data(summoner, summoner_history)
            master_game_data_list.append(game_data)
        except Exception as inst:
            issue_ids.append(summoner_id)
            print("Problem getting game_data")
            print(type(inst))
            print(inst)

    if issue_ids:
        print("Could not collect data on these players:")
        print(issue_ids)

    master_df = pd.DataFrame(master_game_data_list)

    return master_df

**After each round of data collection this saves lists and data to files.**

In [13]:
##############################
# WE WILL CALL THIS DIRECTLY #
##############################  
# this ties the data from the end of a round to all the data collected to that point and saves it to disk
def end_of_round_housekeeping(master_db, round_db: list, master_id_list: list, round_list: list):

    # add this round's rows (a list of dicts) to the running dataframe;
    # pd.concat is used because DataFrame.append was removed in pandas 2.0
    if isinstance(master_db, pd.DataFrame):
        master_db = pd.concat([master_db, pd.DataFrame(round_db)], ignore_index=True)
    else:
        master_db = pd.DataFrame(round_db) # first round: nothing to append to yet

    master_id_list += round_list # running list of all ids already in database (extended in place)

    # save the current seed list
    with open("./data/current_seed_list.txt", "w") as file1:
        for summoner in round_list:
            file1.write(str(summoner)+"\n")

    # save all summoner ids that have been added to database already
    with open("./data/summoners_in_db.txt", "a") as file2:
        for summoner in round_list:
            file2.write(str(summoner)+"\n")

    # save dataset to csv
    master_db.to_csv("./data/riot_master_df2.csv", index=False)

    return master_db, master_id_list

**If we need to restart the notebook or the kernel crashes for some reason, this will read our lists/data from disk.**

In [14]:
##############################
# WE WILL CALL THIS DIRECTLY #
##############################
# if kernel crashes, call this to get most current lists and dataframe back
def recover_lists():
    current_seed_list = []
    summoners_in_db = []

    with open('./data/current_seed_list.txt', 'r') as file1:
        for line in file1:
            if line.strip(): # skip blank lines
                current_seed_list.append(int(line))

    with open('./data/summoners_in_db.txt', 'r') as file2:
        for line in file2:
            if line.strip():
                summoners_in_db.append(int(line))

    # read the same file that end_of_round_housekeeping() writes
    data_df = pd.read_csv("./data/riot_master_df2.csv", encoding="ISO-8859-1")

    print("Summoners in current seed list = {}".format(len(current_seed_list)))
    print("All summoners gathered so far = {}".format(len(summoners_in_db)))

    return data_df, current_seed_list, summoners_in_db
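
Since `end_of_round_housekeeping` appends to `summoners_in_db.txt` (mode `"a"`), re-running a round can write the same id twice. Here is a minimal sketch of reading such a file while dropping blank lines and duplicates, keeping first-seen order. `read_unique_ids` and the temporary file are illustrative only.

```python
import os
import tempfile


def read_unique_ids(path):
    """Read one integer summoner id per line, skipping blanks and repeated ids."""
    seen = set()
    ids = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            summoner_id = int(line)
            if summoner_id not in seen:  # keep only the first occurrence
                seen.add(summoner_id)
                ids.append(summoner_id)
    return ids


# Demo on a throwaway file containing a duplicate and a blank line
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("94643101\n93799945\n\n94643101\n")
ids = read_unique_ids(tmp.name)
os.remove(tmp.name)
print(ids)  # [94643101, 93799945]
```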

**This will be the conductor of the orchestra. We'll call this to get new ids and data.**

In [10]:
##############################
# WE WILL CALL THIS DIRECTLY #
##############################
# collect gameplay data from players
def get_new_players(seed_list_of_summoners, summoners_already_in_db = None, cut_off = None, level_cap = 10):

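    # PARAMETER(S) #
    # seed_list_of_summoners  : (list) of summoner ids to branch from to find new players
    # summoners_already_in_db : (list) of summoner ids to skip; newly added ids are appended to it
    # cut_off                 : (int) stops data mining if cut_off number of players get added to db
    # level_cap               : (int) highest summoner level allowed into the db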
    
    
    # initialization for data we want to collect
    gameplay_database = [] # list of dictionaries containing player match data
    new_seeders = [] # new list of seeders for next call to this function
    back_up_seeders = [] # if there are less than 5 good candidates then return all players considered instead
    
    # tracking to troubleshoot issues and help validate data collection
    player_added_count = 0 # track how many players added to database
    considered_count = 0 # track how many players entered "is_candidate_player"
    denied_level_cap = 0 # track how many players denied b/c level too high
    denied_too_young = 0 # track how many players denied b/c account too new
    denied_too_old = 0 # track how many players denied b/c account too old
    
    
    if cut_off is not None:
        if not isinstance(cut_off, int):
            cut_off = None
            print("Warning: cut_off must be type 'int'... cut_off was reset to None.")

    # avoid unnecessary duplicate data mining
    # (copy the seed list so appending new ids below doesn't extend the list we're looping over)
    if summoners_already_in_db is None:
        summoners_already_in_db = list(seed_list_of_summoners)
    
    total_seeders = len(seed_list_of_summoners)
    
    # progress bar to track... progress
    interval = 100/total_seeders # for incrementing progress bar
    progress = widgets.FloatProgress(value=0.0, min=0.0, max=100.0, description='0%', bar_style='info')
    display(progress)
    
    # cycle through ids in "seeder list" to get new summoner ids and data
    for player in seed_list_of_summoners:
        
        try: 
            seed_summoner = Summoner(id=player, region="NA")
            seeder_history = seed_summoner.match_history
            seeder_total_matches = len(seeder_history)
            
            for i in range(seeder_total_matches):
                
                total_players = len(seeder_history[i].participants)
                
                for j in range(total_players):
                    
                    player = seeder_history[i].participants[j]
                    
                    # avoid bots and duplicates
                    try:
                        if player.is_bot == True:
                            continue
                    except:
                        pass # not great practice but this occurs frequently and is meaningless for me

                    try:
                        if player.summoner.id in summoners_already_in_db:
                            continue
                    except Exception as inst:
                        print("Issue checking if summoner in seed_list")
                        print(type(inst))   
                        print(inst)  
             
                    try:
                        player_history = player.summoner.match_history
                        first_match = player_history[-1]
                        put_in_db, denied_checks = is_candidate_player(player.summoner, first_match, level_cap)
                    except Exception as inst:
                        print("Problem checking if candidate player.")
                        print(type(inst))
                        print(inst)
                        continue # skip this player rather than reusing the previous player's results

                    considered_count += 1
                    denied_level_cap += denied_checks[0]
                    denied_too_young += denied_checks[1]
                    denied_too_old += denied_checks[2]
                    
                    back_up_seeders.append(player.summoner.id)
                    
                    if put_in_db:

                        try:
                            game_data = get_player_game_data(player.summoner, player_history)
                        except Exception:
                            print("Problem getting game_data")
                            continue # don't add a player whose data couldn't be collected

                        # throw our new valuable data into the treasure chest!
                        gameplay_database.append(game_data)
                        new_seeders.append(player.summoner.id)
                        summoners_already_in_db.append(player.summoner.id)

                        player_added_count += 1
                        
                        if player_added_count % 100 == 0:
                            print("Currently {} players in database.".format(player_added_count))
                        
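                        if cut_off and player_added_count >= cut_off:
                            # return the same (data, seeds) pair as the normal exit so callers can unpack it
                            return gameplay_database, new_seeders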
                            
            # update progress bar (end of seeder player history)               
            progress.value += interval
            progress.description = "{}%".format(round(progress.value,1))
            
        except Exception as inst:
            print("ERROR while processing player: {0} ({1}).".format(seed_summoner.name, seed_summoner.id))
            print(type(inst))   
            print(inst)
    
    
    # made it to the end!!! Let's summarize what happened during collection.
    progress.bar_style = 'success'
    print("DONE! \n")
    print("Players that were considered for database = {}".format(considered_count))
    print("Players that were added to database = {}".format(player_added_count))
    print("Players denied b/c of level cap = {}".format(denied_level_cap))
    print("Players denied b/c account too young = {}".format(denied_too_young))
    print("Players denied b/c account too old = {}".format(denied_too_old))
    
    if len(new_seeders) >= 5:
        return gameplay_database, new_seeders
    else:
        print("\033[1;31m" + "WARNING:" + "\033[0m" + " There weren't many good candidate players :( so look carefully at seed list that was returned.")
        return gameplay_database, back_up_seeders

***

<div align="right">
    <a href="#toc">back to top</a>
</div>
<a id='section4'></a>

## Step 4) Actually gathering our data

**Note about Step 4:** You'll notice as you look through Step 4 that I gather data in "Rounds". I'm essentially looping manually over calls to my function above until I have enough data. I'm doing this manually for now so that I can validate the data I get out of each loop. If I want to use this more in the future, I'll probably write a loop into the functions above.
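
If the manual rounds were folded into a loop, it might look like this minimal sketch. `run_rounds` is hypothetical, and `collect_round` is assumed to behave like `get_new_players`: it takes a seed list and the list of ids already seen, and returns `(rows, new_seed_list)`. The demo uses a fake collector so it runs without API calls.

```python
def run_rounds(collect_round, seed_list, target, max_rounds=10):
    """Repeat collection rounds until `target` rows are gathered or seeds run out."""
    all_rows = []
    seen_ids = list(seed_list)
    for round_number in range(1, max_rounds + 1):
        rows, seed_list = collect_round(seed_list, seen_ids)
        all_rows.extend(rows)
        print("Round {}: +{} players ({} total)".format(round_number, len(rows), len(all_rows)))
        if len(all_rows) >= target or not seed_list:
            break
    return all_rows


def fake_round(seeds, seen):
    """Stand-in collector: every seed 'finds' two new players."""
    new_ids = [s * 10 + k for s in seeds for k in range(2)]
    seen.extend(new_ids)
    return [{"summoner_id": i} for i in new_ids], new_ids


rows = run_rounds(fake_round, [1], target=10)  # 2, then 6, then 14 players in total
```

With the real function, `collect_round` could be `lambda seeds, seen: get_new_players(seeds, summoners_already_in_db=seen, level_cap=5)`, and the per-round prints still leave room to validate each round.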

**Now that we have our functions ready, we just need a seed list of summoner ids to start from.**

We'll use this new account, which I've played just a few games on, to try to get new player ids.

In [5]:
summoner = Summoner(name="OldTimeCandyBowl", region="NA")
print(summoner.name, summoner.level)

Making call: https://na1.api.riotgames.com/lol/summoner/v3/summoners/by-name/OldTimeCandyBowl
OldTimeCandyBowl 3


In [6]:
match_history = summoner.match_history
seed_match = match_history[0] # most recent match on this account

players = seed_match.participants

seed_list = []

# keep players under level 10 whose first match was in 2018 or later
for participant in players:
    participant_summoner = participant.summoner

    if participant_summoner.level < 10 and participant_summoner.match_history[-1].creation.year > 2017:
        seed_list.append(participant_summoner.id)

    print("{0} ({1}) is level {2}".format(participant_summoner.name,
                                          participant_summoner.id,
                                          participant_summoner.level))

    


Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/247157416?beginIndex=0&endIndex=100
Making call: https://ddragon.leagueoflegends.com/realms/na.json
Making call: https://na1.api.riotgames.com/lol/match/v3/matches/2768433833
Making call: https://na1.api.riotgames.com/lol/summoner/v3/summoners/by-account/247157416
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/247157416?beginIndex=0&endIndex=100
OldTimeCandyBowl (94643101) is level 3
[94643101]
Making call: https://na1.api.riotgames.com/lol/summoner/v3/summoners/76772327
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/233648039?beginIndex=0&endIndex=100
FrozenVettel (76772327) is level 9
[94643101]
Making call: https://na1.api.riotgames.com/lol/summoner/v3/summoners/93799945
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246057408?beginIndex=0&endIndex=100
INFO: Unexpected service rate limit, backing off for 35 seconds

In [22]:
print("Seed list of summoner ids: {}".format(seed_list))

Seed list of summoner ids: [94643101, 93799945, 94533010, 95079317, 94859763]


### Starting Round 1 of summoner id mining

Let's use our seed list to try and get more new players.

In [None]:
game_data_dict, new_seed_list = get_new_players(seed_list)

**Awesome! We found some new ids that fit our specifications! Let's verify that they seem right for our database.**

In [20]:
for id in new_seed_list:
    summoner = Summoner(id=id, region="NA")
    print("{0} ({1}) is level {2} with {3} matches. First match was {4}".format(summoner.name, 
                                                                                summoner.id, 
                                                                                summoner.level, 
                                                                                len(summoner.match_history),
                                                                                summoner.match_history[-1].creation.format("MM-DD-YYYY")))


Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244707853?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244707853?beginIndex=0&endIndex=100
Juanochi (92091071) is level 7 with 8 matches. First match was 01-16-2018
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246050296?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246050296?beginIndex=0&endIndex=100
BigDaddyMiles (93659918) is level 3 with 4 matches. First match was 03-10-2018
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244444992?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244444992?beginIndex=0&endIndex=100
ritowhy1 (93589089) is level 8 with 16 matches. First match was 03-09-2018
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246049176?

**It looks like only two of the ten players are under level 5.** 

Since we want to have a lot of examples of players that didn't get to level 3 (or maybe at least not to level 5), we're going to just use the lowest level players from our new list to try and get more new players.

### Starting Round 2 of data collection

In [None]:
game_data_dict2, new_seed_list2 = get_new_players(new_seed_list, level_cap=5) # specifying level cap to be lower (default = 10)

**Yes! Looks like we may have found 21 players that are new (but at least a month old still) and low level!**

Let's double check to see their info:

In [None]:
match_count = 0
for id in new_seed_list2:
    summoner = Summoner(id=id, region="NA")
    print("{0} ({1}) is level {2} with {3} matches. First match was {4}".format(summoner.name, 
                                                                                summoner.id, 
                                                                                summoner.level, 
                                                                                len(summoner.match_history),
                                                                                summoner.match_history[-1].creation.format("MM-DD-YYYY")))
    match_count += len(summoner.match_history)

print("\n")
print("TOTAL MATCHES BETWEEN ALL NEW PLAYERS = {}".format(match_count))

So we have a total of 21 people in our dataset, and with their combined 98 matches to search from, that's close to 1,000 possible new summoner ids we can get. Ultimately we'll want at least 1,000 players, so we'll need to rerun our data-gathering functions at least once more (probably twice).
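
Here is the back-of-the-envelope estimate written out. In a 5v5 match each seed player shares the game with at most 9 other players, so 98 matches give at most 98 × 9 = 882 new ids (980 if you count all 10 slots). Bots, repeat teammates, and players already collected only lower that number, so treat it as an upper bound. `max_new_ids` is a hypothetical helper.

```python
def max_new_ids(total_matches, players_per_match=10):
    """Upper bound on new summoner ids reachable from a set of matches."""
    # each match contributes at most (players_per_match - 1) players besides the seed player
    return total_matches * (players_per_match - 1)


print(max_new_ids(98))  # 882
```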

Also note that with our level cap of 5, most of the players in our dataset _are_ level 4 or 5... but very few are level 1, 2, or 3. Because one possible prediction task is to decide whether a player will get to level 3 (out of tutorial mode) or not, we'd like more cases where a player did not get out (still level 1 or 2). To accomplish this, we could collect a lot more than 1,000 players or lower our level_cap to 3 or 4. I'm going to opt to just collect more data for now and see how it goes.

### Starting Round 3 of data collection

This may take a while, so I'm going to use a cell magic command to notify me when this finishes.

In [None]:
%%notify
game_data_dict23, new_seed_list3 = get_new_players(new_seed_list2, level_cap=5) # specifying level cap to be lower (default = 10)

**Another successful run! We considered 564 players (not 1,000 like I estimated, but still good!), and this time 48 summoners met our criteria. Our momentum is building!!**

Let's take a quick look at some info from these players:

In [None]:
# some quick checks
match_count = 0
below_level_3_count = 0
level_3_count = 0
above_level_3_count = 0

for id in new_seed_list3:
    summoner = Summoner(id=id, region="NA")
    print("{0} ({1}) is level {2} with {3} matches. First match was {4}".format(summoner.name, 
                                                                                summoner.id, 
                                                                                summoner.level, 
                                                                                len(summoner.match_history),
                                                                                summoner.match_history[-1].creation.format("MM-DD-YYYY")))
    match_count += len(summoner.match_history)
    if summoner.level < 3:
        below_level_3_count += 1
    elif summoner.level == 3:
        level_3_count += 1
    elif summoner.level > 3:
        above_level_3_count += 1

print("\n")
print("TOTAL MATCHES BETWEEN ALL NEW PLAYERS = {}".format(match_count))
print("NUMBER OF PLAYERS BELOW LEVEL 3 = {}".format(below_level_3_count))
print("NUMBER OF PLAYERS AT LEVEL 3 = {}".format(level_3_count))
print("NUMBER OF PLAYERS ABOVE LEVEL 3 = {}".format(above_level_3_count))

We're starting to find players who didn't make it past level 3, but they are a relatively small percentage of the players we're collecting. This makes sense: low-level players play fewer matches, so they are harder for us to stumble upon.
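The same below/at/above tally is repeated after every round. Below is a minimal sketch of a reusable helper. It assumes the levels have already been fetched into a plain iterable of integers, so the sketch itself makes no API calls. `level_breakdown` is a hypothetical name, not one of the notebook's functions.

```python
from collections import Counter
from typing import Dict, Iterable


def level_breakdown(levels: Iterable[int], threshold: int = 3) -> Dict[str, int]:
    """Count how many levels fall below, at, or above `threshold`."""
    counts = Counter()
    for level in levels:
        if level < threshold:
            counts["below"] += 1
        elif level == threshold:
            counts["at"] += 1
        else:
            counts["above"] += 1
    # Always report all three buckets, even when a bucket is empty
    return {key: counts[key] for key in ("below", "at", "above")}


# Hypothetical levels standing in for real Summoner(...).level lookups
example_levels = [1, 2, 3, 4, 5, 5, 4, 3]
print(level_breakdown(example_levels))  # {'below': 2, 'at': 2, 'above': 4}
```

In the notebook, this could be fed a generator such as `(Summoner(id=i, region="NA").level for i in new_seed_list3)` in place of the three hand-maintained counters.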

**Before we start Round 4, let's get together all the summoners already in our database:**

In [57]:
print("Round 2 seed list had {} ids".format(len(new_seed_list2)))
print("Round 3 seed list had {} ids".format(len(new_seed_list3)))

summoners_in_db = new_seed_list2 + new_seed_list3

seed_list3 = new_seed_list3 # just want to change naming convention here

Round 2 seed list had 21 ids
Round 3 seed list had 48 ids


**Also, because we've worked so hard to mine these summoner ids, let's save them to a file after each round, just in case:**

In [65]:
with open("current_seed_lists.txt", "w") as file1:
    for summoner in seed_list3:
        file1.write(str(summoner)+"\n")
    
with open("summoners_in_db.txt", "w") as file2:
    for summoner in summoners_in_db:
        file2.write(str(summoner)+"\n")
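These files are the only backup if the kernel dies, so it may be worth making the write itself crash-tolerant. Below is a minimal sketch; the helper name `save_ids` is hypothetical. It writes to a temporary file in the same directory, then swaps that file into place with `os.replace`. The Python docs describe a successful `os.replace` as atomic on POSIX systems. As a result, an interrupted save should leave the previous complete file in place rather than a half-written one.

```python
import os
import tempfile
from typing import Iterable


def save_ids(path: str, ids: Iterable[int]) -> None:
    """Write one id per line to `path` without leaving a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            for summoner_id in ids:
                tmp_file.write(f"{summoner_id}\n")
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        os.remove(tmp_path)  # discard the partial temp file, then re-raise
        raise
```

With that helper, the two backups above would become `save_ids("current_seed_lists.txt", seed_list3)` and `save_ids("summoners_in_db.txt", summoners_in_db)`.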

Let's also combine our databases for convenience:

In [55]:
game_data_list_master = game_data_dict2 + game_data_dict23 # round 3's data was accidentally named "game_data_dict23" instead of "game_data_dict3"

### Starting Round 4

**Note:** In this new round we're passing a `summoners_already_in_db` argument so that we skip summoners we've already seen. Now that we're actually building momentum, this should save us a lot of time otherwise spent on needless API calls.
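`get_new_players` itself is defined in Step 3 and isn't reproduced here. As a rough illustration only, one round of this kind of snowball sampling can be sketched with stubbed lookups in place of the Riot API. Everything in the sketch is hypothetical:

- `co_players` stands in for "players found in a seed summoner's match history".
- `levels` stands in for a summoner-level lookup.

Converting the already-seen ids to a `set` makes each membership check O(1) on average, instead of scanning a list that grows every round.

```python
from typing import Dict, Iterable, List, Sequence


def snowball_round(
    seeds: Sequence[int],
    co_players: Dict[int, List[int]],  # hypothetical: seed id -> ids seen in its matches
    levels: Dict[int, int],            # hypothetical: summoner id -> summoner level
    already_in_db: Iterable[int] = (),
    level_cap: int = 5,
) -> List[int]:
    """Return newly found summoner ids (in discovery order) whose level is <= level_cap."""
    seen = set(already_in_db)  # O(1) average membership checks, unlike a list scan
    seen.update(seeds)         # never re-add the seeds themselves
    new_ids: List[int] = []
    for seed in seeds:
        for candidate in co_players.get(seed, []):
            if candidate in seen:
                continue       # already stored, or already evaluated this round
            seen.add(candidate)
            if levels[candidate] <= level_cap:
                new_ids.append(candidate)
    return new_ids


# Toy data standing in for match-history lookups (hypothetical ids and levels)
co_players = {1: [10, 11, 12], 2: [11, 13, 99]}
levels = {10: 3, 11: 30, 12: 5, 13: 1, 99: 2}
print(snowball_round([1, 2], co_players, levels, already_in_db=[99]))  # [10, 12, 13]
```

The ids returned by one round become the seeds for the next, which is why each round can find more players than the last.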

In [None]:
%%notify -m "Round 4 of Riot data collection complete!"
game_data_list4, seed_list4 = get_new_players(seed_list3, summoners_already_in_db = summoners_in_db, level_cap=5) 

**Another successful run!** Again let's take a look at the specific level breakdowns for these new players:

In [71]:
# some quick checks
match_count = 0
below_level_3_count = 0
level_3_count = 0
above_level_3_count = 0

for id in seed_list4:
    summoner = Summoner(id=id, region="NA")
    
    match_count += len(summoner.match_history)
    if summoner.level < 3:
        below_level_3_count += 1
    elif summoner.level == 3:
        level_3_count += 1
    elif summoner.level > 3:
        above_level_3_count += 1

print("\n")
print("TOTAL MATCHES BETWEEN ALL NEW PLAYERS = {}".format(match_count))
print("NUMBER OF PLAYERS BELOW LEVEL 3 = {}".format(below_level_3_count))
print("NUMBER OF PLAYERS AT LEVEL 3 = {}".format(level_3_count))
print("NUMBER OF PLAYERS ABOVE LEVEL 3 = {}".format(above_level_3_count))

Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244731542?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244704257?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244709001?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244708495?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244695549?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244676229?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244536171?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244676333?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244717994?

**Great! There are actually quite a few new players below level 3.** But I'm aiming for a dataset of at least 1,000 players, with at least 500 examples of players below level 3, so this is about 0.5% of the data I need. Let's hope the next round is even better!

**Again, we'll do a bit of housekeeping before we're ready to move on to the next round of data collection:**

In [72]:
game_data_list_master += game_data_list4 # current dataset... slowly building

summoners_in_db += seed_list4 # running list of all ids already in database

# save the current seed list
with open("current_seed_lists.txt", "w") as file1:
    for summoner in seed_list4:
        file1.write(str(summoner)+"\n")

# save all summoner ids that have been added to database already
with open("summoners_in_db.txt", "w") as file2:
    for summoner in summoners_in_db:
        file2.write(str(summoner)+"\n")

### Starting Round 5 of data collection

In [None]:
%%notify -m "Round 5 of Riot data collection complete!"
game_data_list5, seed_list5 = get_new_players(seed_list4, summoners_already_in_db = summoners_in_db, level_cap=5) 

**Good News! It's still working!** We are still gaining momentum in building our database. Let's do another in-depth check to see how this is breaking down:

In [79]:
# some quick checks
match_count = 0
below_level_3_count = 0
level_3_count = 0
above_level_3_count = 0

for id in seed_list5:
    summoner = Summoner(id=id, region="NA")
    
    match_count += len(summoner.match_history)
    if summoner.level < 3:
        below_level_3_count += 1
    elif summoner.level == 3:
        level_3_count += 1
    elif summoner.level > 3:
        above_level_3_count += 1

print("\n")
print("TOTAL MATCHES BETWEEN ALL NEW PLAYERS = {}".format(match_count))
print("NUMBER OF PLAYERS BELOW LEVEL 3 = {}".format(below_level_3_count))
print("NUMBER OF PLAYERS AT LEVEL 3 = {}".format(level_3_count))
print("NUMBER OF PLAYERS ABOVE LEVEL 3 = {}".format(above_level_3_count))

Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244730607?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244459338?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244603216?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/43573836?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244701400?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244704305?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244696435?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244714535?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246380132?b

Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246122362?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245891544?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245143305?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246266881?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246258616?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246252665?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/241647014?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245998237?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246051784?

We see similar results to the last round: most players are level 4 or 5, but there is still a good number of lower-level players in this set. We've roughly doubled the number of new ids compared with the last round, bringing our total to around 300. I should also note that this round took 2-3 hours to complete. **At this rate I expect to do 2-3 more rounds of data collection.** Each round should take significantly longer than the last (perhaps doubling each time?), so the final round might take the better part of a day. The Riot API keys I'm getting are only valid for a day. I'll need to check when my key will expire before starting another round (perhaps another good reason not to automate these rounds of collection).
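The "perhaps doubling each time?" guess can be turned into a quick back-of-the-envelope check. Below is a minimal sketch built on three assumptions, all of which are worth adjusting:

- Round 5 took 2.5 hours, the midpoint of the 2-3 hours observed.
- Each round takes exactly twice as long as the previous one.
- An API key lasts 24 hours.

The helper name is hypothetical.

```python
from typing import List


def projected_round_hours(first_round_hours: float, factor: float, n_rounds: int) -> List[float]:
    """Projected duration of each round if every round takes `factor` times the previous one."""
    return [first_round_hours * factor ** i for i in range(n_rounds)]


KEY_LIFETIME_HOURS = 24  # assumed lifetime of one Riot API key

# Start from Round 5 (~2.5 h, assumed) and project through Round 8
for round_number, hours in enumerate(projected_round_hours(2.5, 2.0, 4), start=5):
    verdict = "fits" if hours < KEY_LIFETIME_HOURS else "does NOT fit"
    print(f"Round {round_number}: ~{hours:g} h ({verdict} within one key's lifetime)")
```

Under these assumptions, Round 8 comes out at about 20 hours. That still fits inside a 24-hour key, but only if it starts right after the key is issued.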

**Once again we have a few housekeeping steps.** The housekeeping process now seems streamlined enough, and we may need a few more rounds of data collection, so this time let's package it into a function to make things easier. _The function will be put up in "Step 3"_.

In [77]:
import json

def end_of_round_housekeeping(master_db: list, round_db: list, master_id_list: list, round_list: list):
    """Merge this round's data and ids into the running totals and back everything up to disk."""
    master_db += round_db # current dataset... slowly building

    master_id_list += round_list # running list of all ids already in database

    # save the current seed list (overwritten every round)
    with open("current_seed_lists.txt", "w") as file1:
        for summoner in round_list:
            file1.write(str(summoner)+"\n")

    # append this round's ids to the file of all ids already in the database
    with open("summoners_in_db.txt", "a") as file2:
        for summoner in round_list:
            file2.write(str(summoner)+"\n")

    # append this round's data as JSON; each call adds a separate JSON document,
    # so keep one round per line (the file as a whole is not one JSON value)
    with open("master_db.json", 'a') as file3:
        json.dump(round_db, file3)
        file3.write("\n")

    return master_db, master_id_list
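One caveat with appending a `json.dump` of each round to `master_db.json`: the file ends up holding several JSON documents, which `json.load` cannot read back as a whole. Below is a minimal sketch of an alternative that writes one JSON record per line (the JSON Lines convention), so rounds can be appended safely and read back together. It assumes each round's data is a list of JSON-serializable dicts. The helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def append_rows_jsonl(path: str, rows: Iterable[Dict]) -> None:
    """Append each row as its own line of JSON (JSON Lines format)."""
    with open(path, "a") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


def load_rows_jsonl(path: str) -> List[Dict]:
    """Read back every row appended by append_rows_jsonl, across all rounds."""
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

At the end of a round this would be `append_rows_jsonl("master_db.jsonl", game_data_list5)`. Later, `pd.DataFrame(load_rows_jsonl("master_db.jsonl"))` or `pd.read_json("master_db.jsonl", lines=True)` would rebuild the full table.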

In [78]:
game_data_list_master, summoners_in_db = end_of_round_housekeeping(master_db = game_data_list_master,
                                                                   round_db = game_data_list5,
                                                                   master_id_list = summoners_in_db,
                                                                   round_list = seed_list5)

### Starting Round 6 of data collection 

In [None]:
%%notify -m "Round 6 of Riot data collection complete!"
game_data_list6, seed_list6 = get_new_players(seed_list_of_summoners = seed_list5, 
                                              summoners_already_in_db = summoners_in_db, 
                                              level_cap=5) 

**This round went smoothly, let's check out the breakdown:**

In [85]:
# some quick checks
match_count = 0
below_level_3_count = 0
level_3_count = 0
above_level_3_count = 0

for id in seed_list6:
    summoner = Summoner(id=id, region="NA")
    
    match_count += len(summoner.match_history)
    if summoner.level < 3:
        below_level_3_count += 1
    elif summoner.level == 3:
        level_3_count += 1
    elif summoner.level > 3:
        above_level_3_count += 1

print("\n")
print("TOTAL MATCHES BETWEEN ALL NEW PLAYERS = {}".format(match_count))
print("NUMBER OF PLAYERS BELOW LEVEL 3 = {}".format(below_level_3_count))
print("NUMBER OF PLAYERS AT LEVEL 3 = {}".format(level_3_count))
print("NUMBER OF PLAYERS ABOVE LEVEL 3 = {}".format(above_level_3_count))

Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244626375?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244733460?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244464208?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244696963?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244704126?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244694447?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244661686?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244649651?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/244707576?

Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245803463?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245825894?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245493143?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245414477?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245490646?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245411930?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245539968?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245527418?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245222406?

Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246075111?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246075137?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246072477?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246014729?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246037095?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245432969?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/245561240?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/246057496?beginIndex=0&endIndex=100
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/228114156?

**Looks good, let's do some housekeeping and move on to round 7.**

In [None]:
game_data_list_master, summoners_in_db = end_of_round_housekeeping(master_db = game_data_list_master,
                                                                   round_db = game_data_list6,
                                                                   master_id_list = summoners_in_db,
                                                                   round_list = seed_list6)

## Oh no, the kernel crashed! :(

_Quick update:_

I had an issue that forced me to restart the kernel (this is why we've been backing up our data!). So we're going to quickly read in the current seed list and the complete list of summoners in the database before starting round 7.

ALSO, note that from this point on there is no printed output when a call is made to the Riot API. With so many calls being made and printed, the notebook was becoming visually sluggish (scrolling and typing responded more slowly). I'm not sure whether this affected the kernel's actual computational speed, but either way, we won't show those messages anymore. Instead, print statements built into my functions will give us a sense of how the data collection is progressing.
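A minimal sketch of the same idea using Python's standard `logging` module. Per-call messages go at `DEBUG` level and can be switched off without editing the collection functions, while periodic progress is reported at `INFO`. The logger name and the `process_players` loop are hypothetical stand-ins, not the notebook's code.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger("riot_collection")  # hypothetical logger name


def process_players(player_ids: Iterable[int], report_every: int = 10) -> List[int]:
    """Stand-in collection loop: per-call detail at DEBUG, periodic progress at INFO."""
    collected: List[int] = []
    for player_id in player_ids:
        logger.debug("Making call for account %s", player_id)  # hidden unless DEBUG is enabled
        collected.append(player_id)  # placeholder for the real API work
        if len(collected) % report_every == 0:
            logger.info("Currently %d players in database.", len(collected))
    return collected


# INFO shows only the progress lines; switch to logging.DEBUG to see every call
logging.basicConfig(level=logging.INFO, format="%(message)s")
process_players(range(25))
```

One caveat: `logging.basicConfig` does nothing if the root logger already has handlers. In that case, `force=True` (Python 3.8+) reconfigures it.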

In [8]:
current_seed_list = []
summoners_in_db = []

# read back the seed list for the next round (one id per line)
with open('current_seed_lists.txt', 'r') as file1:
    for line in file1:
        current_seed_list.append(int(line.rstrip('\n')))

# read back every summoner id already in the database
with open('summoners_in_db.txt', 'r') as file2:
    for line in file2:
        summoners_in_db.append(int(line.rstrip('\n')))

print("Summoners in current seed list = {}".format(len(current_seed_list)))
print("All summoners gathered so far = {}".format(len(summoners_in_db)))

Summoners in current seed list = 194
All summoners gathered so far = 506


OK great, we're back on track and ready for round 7.

### Starting Round 7 of data collection

In [None]:
%%notify -m "Round 7 of Riot data collection complete!"

game_data_list7, seed_list7 = get_new_players(seed_list_of_summoners = current_seed_list, 
                                              summoners_already_in_db = summoners_in_db, 
                                              level_cap=5) 

Great, another round done! And it looks like we're still building momentum, which is awesome. Let's make sure:

In [15]:
# some quick checks
below_level_3_count = 0
level_3_count = 0
above_level_3_count = 0

for id in seed_list7:
    summoner = Summoner(id=id, region="NA")
    
    if summoner.level < 3:
        below_level_3_count += 1
    elif summoner.level == 3:
        level_3_count += 1
    elif summoner.level > 3:
        above_level_3_count += 1

print("NUMBER OF PLAYERS BELOW LEVEL 3 = {}".format(below_level_3_count))
print("NUMBER OF PLAYERS AT LEVEL 3 = {}".format(level_3_count))
print("NUMBER OF PLAYERS ABOVE LEVEL 3 = {}".format(above_level_3_count))

NUMBER OF PLAYERS BELOW LEVEL 3 = 81
NUMBER OF PLAYERS AT LEVEL 3 = 76
NUMBER OF PLAYERS ABOVE LEVEL 3 = 224


Perfect, we're still getting a good amount of all types of levels... let's do our housekeeping (saving and appending lists/data) before we move on.

In [47]:
master_db, summoners_in_db = end_of_round_housekeeping(master_db = master_db,
                                                                   round_db = game_data_list7,
                                                                   master_id_list = summoners_in_db,
                                                                   round_list = seed_list7)

Now we're ready for round 8, but before that let's just see how big our database is right now:

In [49]:
print("{} rows in our database with {} columns each.".format(master_db.shape[0],master_db.shape[1]))

887 rows in our database with 116 columns each.
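The earlier target was at least 500 examples of players below level 3. This round's mix (81 of 381 new players below level 3) gives a quick back-of-the-envelope estimate of how many players that target implies. The estimate assumes future rounds keep roughly the same proportion, and the helper name is hypothetical.

```python
def players_needed(target_minority: int, minority_seen: int, total_seen: int) -> int:
    """Smallest total expected to contain `target_minority` minority examples,
    assuming the observed ratio minority_seen / total_seen holds (ceiling division)."""
    return -(-target_minority * total_seen // minority_seen)


print(players_needed(500, 81, 381))  # 2352
```

Integer ceiling division avoids floating-point rounding surprises. At Round 7's rate, 500 below-level-3 players means collecting somewhere around 2,350 players in total.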


So far it's looking like smooth sailing!... well... except for the setbacks... and the slow, hackish building of our data collection functions... but otherwise compleeetely smooth! Let's just start round 8.

### Starting Round 8 of data collection (possibly the final round)

Just kidding, slight hiccup... I had to restart the kernel. So I put another function, `recover_lists()`, up in Step 3 so that I can quickly get back the most recent lists if the kernel ever crashes again.

In [38]:
master_df, current_seed_list, summoners_in_db = recover_lists()

Summoners in current seed list = 381
All summoners gathered so far = 887
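The `recover_lists()` helper itself lives in Step 3 and isn't shown here. Below is a hypothetical illustration of the list-recovery part only; it is not the author's function and does not rebuild `master_df`. It reads the two backup files back and runs two cheap sanity checks:

- No id should appear twice in the full list.
- Every id in the current seed list should also appear in the full list, since the housekeeping step adds each round's ids to both files.

```python
from typing import List, Tuple


def read_id_file(path: str) -> List[int]:
    """Read one integer summoner id per line, skipping blank lines."""
    with open(path) as handle:
        return [int(line) for line in handle if line.strip()]


def recover_id_lists(seed_path: str = "current_seed_lists.txt",
                     db_path: str = "summoners_in_db.txt") -> Tuple[List[int], List[int]]:
    """Reload the seed list and all stored ids, warning about obvious inconsistencies."""
    seed_ids = read_id_file(seed_path)
    db_ids = read_id_file(db_path)
    if len(set(db_ids)) != len(db_ids):
        print("Warning: duplicate ids in {}".format(db_path))
    missing = set(seed_ids) - set(db_ids)
    if missing:
        print("Warning: {} seed ids are not in {}".format(len(missing), db_path))
    print("Summoners in current seed list = {}".format(len(seed_ids)))
    print("All summoners gathered so far = {}".format(len(db_ids)))
    return seed_ids, db_ids
```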


Now we're ready!

In [None]:
%%notify -m "Round 8 of Riot data collection complete!"
%%time

game_data_list8, seed_list8 = get_new_players(seed_list_of_summoners = current_seed_list, 
                                              summoners_already_in_db = summoners_in_db, 
                                              level_cap=5) 

**We did it!** Over 8,000 players were considered in this round. Unfortunately, the vast majority of them were rejected for being too high level, but we still got over 700. Let's see the level breakdown:

In [40]:
# some quick checks
below_level_3_count = 0
level_3_count = 0
above_level_3_count = 0

for id in seed_list8:
    summoner = Summoner(id=id, region="NA")
    
    if summoner.level < 3:
        below_level_3_count += 1
    elif summoner.level == 3:
        level_3_count += 1
    elif summoner.level > 3:
        above_level_3_count += 1

print("NUMBER OF PLAYERS BELOW LEVEL 3 = {}".format(below_level_3_count))
print("NUMBER OF PLAYERS AT LEVEL 3 = {}".format(level_3_count))
print("NUMBER OF PLAYERS ABOVE LEVEL 3 = {}".format(above_level_3_count))

NUMBER OF PLAYERS BELOW LEVEL 3 = 189
NUMBER OF PLAYERS AT LEVEL 3 = 142
NUMBER OF PLAYERS ABOVE LEVEL 3 = 380


**Last housekeeping step:**

In [44]:
master_df, summoners_in_db = end_of_round_housekeeping(master_db = master_df,
                                                                   round_db = game_data_list8,
                                                                   master_id_list = summoners_in_db,
                                                                   round_list = seed_list8)

**Quick peek at what we collected:**

In [3]:
master_df.head()

| Unnamed: 0 | assists | champLevel | combatPlayerScore | creepsPerMinDeltas_0-10 | creepsPerMinDeltas_10-20 | creepsPerMinDeltas_20-30 | creepsPerMinDeltas_30-end | csDiffPerMinDeltas_0-10 | csDiffPerMinDeltas_10-20 | csDiffPerMinDeltas_20-30 | ... | wardsPlaced | win | xpDiffPerMinDeltas_0-10 | xpDiffPerMinDeltas_10-20 | xpDiffPerMinDeltas_20-30 | xpDiffPerMinDeltas_30-end | xpPerMinDeltas_0-10 | xpPerMinDeltas_10-20 | xpPerMinDeltas_20-30 | xpPerMinDeltas_30-end |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.0 | 12.0 | 0.0 | 5.1 | 0.0 | 0.0 | 0.0 | 4.8 | 0.0 | 0.0 | ... | 0.0 | True | 371.2 | 0.0 | 0.0 | 0.0 | 583.9 | 0.0 | 0.0 | 0.0 |
| 1 | 4.0 | 11.0 | 0.0 | 3.5 | 0.0 | 0.0 | 0.0 | 1.72 | 0.0 | 0.0 | ... | 3.0 | True | 84.66 | 0.0 | 0.0 | 0.0 | 363.9 | 0.0 | 0.0 | 0.0 |
| 2 | 10.0 | 13.0 | 0.0 | 0.8 | 3.6 | 0.0 | 0.0 | NaN | NaN | NaN | ... | 0.0 | True | NaN | NaN | NaN | NaN | 331.8 | 464.2 | 0.0 | 0.0 |
| 3 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | ... | 0.0 | True | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 7.0 | 17.0 | 0.0 | 4.1 | 4.0 | 3.1 | 0.0 | 4.1 | 4.0 | 3.0 | ... | 1.0 | True | 538.6 | 429.1 | 783.3 | 0.0 | 538.6 | 429.1 | 806.5 | 0.0 |


In [11]:
print("Number of summoners in dataset = {}".format(master_df.shape[0]))
print("Number of features from each summoner = {}".format(master_df.shape[1]))
print("Average summoner level in dataset = {}".format(round(np.average(master_df['summoner_level']),1)))

Number of summoners in dataset = 1598
Number of features from each summoner = 116
Average summoner level in dataset = 3.5
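One candidate prediction task is whether a player gets out of the tutorial levels (reaches level 3). A binary target for that task can be derived from the `summoner_level` column. Below is a minimal sketch on a toy DataFrame; the column name `reached_level_3` is hypothetical. In the notebook, the same line would be applied to `master_df`.

```python
import pandas as pd

# Toy stand-in for master_df; only the summoner_level column matters here
toy_df = pd.DataFrame({"summoner_level": [1, 2, 3, 4, 5, 3]})

# 1 if the player reached level 3 (left the tutorial levels), else 0
toy_df["reached_level_3"] = (toy_df["summoner_level"] >= 3).astype(int)

print(toy_df["reached_level_3"].sum(), "of", len(toy_df), "players reached level 3")  # 4 of 6 ...
```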


## BACK FROM THE GRAVE! (for round 9 of data collection)...

After cleaning the data, performing analysis on the dataset, and building models with the dataset... I decided I want more data :)

So here we are, back at the source to take a stab at round 9. Right now we have a little over 1,000 players in the dataset; let's see how much we can gain in round 9!

In [15]:
master_df, current_seed_list, summoners_in_db = recover_lists()

Summoners in current seed list = 711
All summoners gathered so far = 1598


In [16]:
%%notify -m "Round 9 of Riot data collection complete!"
%%time

game_data_list9, seed_list9 = get_new_players(seed_list_of_summoners = current_seed_list, 
                                              summoners_already_in_db = summoners_in_db, 
                                              level_cap=5) 


Currently 10 players in database.
Currently 20 players in database.
Currently 30 players in database.
Currently 40 players in database.
Currently 50 players in database.
ERROR while processing player: saiuheadasirl (92681756).
<class 'merakicommons.container.SearchError'>
-1
Currently 60 players in database.
Currently 70 players in database.
Currently 80 players in database.
Currently 90 players in database.
Currently 100 players in database.
Currently 110 players in database.
Currently 120 players in database.
Currently 130 players in database.
Currently 140 players in database.
ERROR while processing player: Evanisugly (92300802).
<class 'merakicommons.container.SearchError'>
-1
Currently 150 players in database.
Currently 160 players in database.
Currently 170 players in database.
Currently 180 players in database.
Currently 190 players in database.
Currently 200 players in database.
Currently 210 players in database.
Currently 220 players in database.
Currently 230 players in datab

Currently 1110 players in database.
ERROR while processing player: TaraNoodlez (93839565).
<class 'merakicommons.container.SearchError'>
-1
Currently 1120 players in database.
Currently 1130 players in database.
Currently 1140 players in database.
Currently 1150 players in database.
Currently 1160 players in database.
Currently 1170 players in database.
Currently 1180 players in database.
Currently 1190 players in database.
ERROR while processing player: BenDurDonDat (93060028).
<class 'merakicommons.container.SearchError'>
-1
Currently 1200 players in database.
Currently 1210 players in database.
Currently 1220 players in database.
ERROR while processing player: BigDaddyMeats (92110852).
<class 'merakicommons.container.SearchError'>
-1
Currently 1230 players in database.
Currently 1240 players in database.
Currently 1250 players in database.
ERROR while processing player: sindysus (93350379).
<class 'cassiopeia.datastores.riotapi.common.APIError'>
The Riot API experienced an internal 

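A few players failed with `SearchError` or a Riot `APIError` (an internal server error) and were simply skipped. Transient server errors are often worth retrying before giving up. Below is a minimal, library-agnostic sketch of retrying with exponential backoff. The helper `call_with_retries` is hypothetical. Which exception types count as retryable is left as a parameter rather than assuming anything about Cassiopeia's exception hierarchy.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T],
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      max_attempts: int = 3,
                      base_delay: float = 1.0) -> T:
    """Call func(); on a retryable error, wait base_delay * 2**attempt seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller log and skip this player
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable: the loop always returns or raises")


# Simulated flaky lookup that fails twice, then succeeds
attempts = {"count": 0}


def flaky_lookup() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky_lookup, retry_on=(ConnectionError,), base_delay=0.01))  # ok
```

Only retrying errors that are plausibly transient, such as server-side failures, keeps a permanent problem like a deleted account from burning extra API calls.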

In [18]:
# some quick checks
below_level_3_count = 0
level_3_count = 0
above_level_3_count = 0

for id in seed_list9:
    summoner = Summoner(id=id, region="NA")
    
    if summoner.level < 3:
        below_level_3_count += 1
    elif summoner.level == 3:
        level_3_count += 1
    elif summoner.level > 3:
        above_level_3_count += 1

print("NUMBER OF PLAYERS BELOW LEVEL 3 = {}".format(below_level_3_count))
print("NUMBER OF PLAYERS AT LEVEL 3 = {}".format(level_3_count))
print("NUMBER OF PLAYERS ABOVE LEVEL 3 = {}".format(above_level_3_count))

NUMBER OF PLAYERS BELOW LEVEL 3 = 335
NUMBER OF PLAYERS AT LEVEL 3 = 253
NUMBER OF PLAYERS ABOVE LEVEL 3 = 677


In [17]:
master_df, summoners_in_db = end_of_round_housekeeping(master_db = master_df,
                                                                   round_db = game_data_list9,
                                                                   master_id_list = summoners_in_db,
                                                                   round_list = seed_list9)

Terrific! With this new round of data collection we've almost doubled the size of our previous dataset. Now we'll go send it through cleaning and see how it does on our models.

***

<div align="right">
    <a href="#toc">back to top</a>
</div>

## End of Part 1

### In the <a href="https://nbviewer.jupyter.org/github/dskarbrevik/League-of-Legends-Churn-Prediction/blob/master/LoL%20Churn%20Predictor%20%5BPart%202%20-%20Data%20Cleaning%20and%20EDA%5D.ipynb">next notebook</a>, we'll make sure the data is clean and do a little exploratory analysis.