# Tutorial 5: Persistence of editor actions over time

The [WikiWho Edit Persistence](https://www.wikiwho.net/en/edit_persistence/v1.0.0-beta/) service tracks the persistence of all actions done on a page, or all actions done by an editor, by providing monthly sums of the actions (insertions, deletions and reinsertions) per page.

## 1. Edit Persistence of a page

The method `edit_persistence` queries the persistence of changes on a page.

In [None]:
from wikiwho_wrapper import WikiWho
ww = WikiWho(lng='en')

df = ww.dv.edit_persistence(page_id=6187)
df.head()

The dataframe accumulates several actions by month (`year_month` column), and editor (`editor_id`):
 
- **`adds`**: number of tokens inserted for the first time
- **`adds_surv_48h`**: number of tokens inserted for the first time that survived at least 48 hours
- **`adds_persistent`**: number of tokens inserted for the first time that survived until, at least, the end of the month
- **`adds_stopword_count`**: number of tokens inserted that were stop words
- **`dels`**: number of tokens deleted
- **`dels_surv_48h`**: number of tokens deleted that were not reinserted in the next 48 hours
- **`dels_persistent`**: number of tokens deleted that were not reinserted until, at least, the end of the month
- **`dels_stopword_count`**: number of tokens deleted that were stop words
- **`reins`**: number of tokens reinserted 
- **`reins_surv_48h`**: number of tokens reinserted that survived at least 48 hours
- **`reins_persistent`**: number of tokens reinserted that survived until the end of the month
- **`reins_stopword_count`**: number of tokens reinserted that were stop words
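
The columns above pair naturally into survival ratios. The following is a minimal sketch on a toy dataframe: the values (and the `year_month` string format) are made up for illustration, and only the column names follow the list above. The `*_ratio` columns are hypothetical names introduced here, not part of the service's output.

```python
import pandas as pd

# Toy data with made-up values; only the column names follow the list above.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-01", "2019-02"],
    "editor_id": [1, 2, 1],
    "adds": [10, 0, 4],
    "adds_surv_48h": [8, 0, 1],
    "adds_persistent": [5, 0, 1],
})

# Rows with no insertions get NaN as denominator, so the ratio is NaN
# instead of a misleading 0 or an infinite value.
denominator = toy["adds"].where(toy["adds"] > 0)

# Share of inserted tokens that survived 48 hours / until the end of the month.
toy["adds_surv_48h_ratio"] = toy["adds_surv_48h"] / denominator
toy["adds_persistent_ratio"] = toy["adds_persistent"] / denominator
```

The same pattern should apply to the `dels_*` and `reins_*` columns.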

### 1.1. Calculating total actions

The dataframe presents the persistence of fine-grained actions, i.e. insertions, reinsertions and deletions, so a common operation is to add them up to get the big picture of the persistence.

In [None]:
# Calculating total actions (regardless of persistence)
df['total_actions'] = df['adds'] + df['dels'] + df['reins']

# Calculating total actions that survived 48 hours
df['total_actions_48h'] = df['adds_surv_48h'] + df['dels_surv_48h'] + df['reins_surv_48h']

# Calculating total persistent actions
df['total_persistent'] = df['adds_persistent'] + df['dels_persistent'] + df['reins_persistent']

# Calculating total stopword counts
df['total_stopword_count'] = df['adds_stopword_count'] + df['dels_stopword_count'] + df['reins_stopword_count']

#display
df[['year_month', 'editor_id', 'total_actions', 'total_actions_48h', 'total_persistent', 'total_stopword_count']].head()

### 1.2. Edit persistence of a page (without editor)

The dataframe presents the data by month and by editor. If the goal is to track only the changes to the page (regardless of the editor), a simple groupby is sufficient.

In [None]:
df.drop(columns=['editor_id']).groupby(['year_month', 'page_id']).sum().head()
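
To make the effect of this groupby concrete, here is a small sketch on a toy dataframe with made-up values, in which two hypothetical editors are active in the same month. It only assumes the column names used in this tutorial.

```python
import pandas as pd

# Toy data with made-up values: editors 1 and 2 both edit the page in 2019-01.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-01", "2019-02"],
    "page_id": [6187, 6187, 6187],
    "editor_id": [1, 2, 1],
    "adds": [10, 3, 4],
    "dels": [2, 1, 0],
})

# Dropping editor_id first keeps the ids from being summed as if they were counts.
monthly = toy.drop(columns=["editor_id"]).groupby(["year_month", "page_id"]).sum()
```

The two rows for 2019-01 collapse into a single row whose `adds` and `dels` are the sums over both editors.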

### 1.3. Total actions per editor (without month)

Another possibility is to calculate the total number of actions per editor, regardless of time.

In [None]:
df.drop(columns=['page_id']).groupby('editor_id').sum().head()

### 1.4. Number of active months per editor

We can also check the number of months in which each editor has made at least one contribution.

In [None]:
df.groupby('editor_id').size().sort_values(ascending=False).head()
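
Note that `size()` counts rows, which matches the number of months only if each editor appears at most once per month, as the description of the dataframe suggests. A sketch of a more explicit alternative, on a toy dataframe with made-up values, counts distinct `year_month` values instead:

```python
import pandas as pd

# Toy data with made-up values: editor 1 is active in two months, editor 2 in one.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-02", "2019-01"],
    "editor_id": [1, 1, 2],
    "adds": [10, 4, 3],
})

# Row count per editor (what size() returns).
rows_per_editor = toy.groupby("editor_id").size()

# Number of distinct months per editor, robust to repeated (editor, month) rows.
months_per_editor = (
    toy.groupby("editor_id")["year_month"].nunique().sort_values(ascending=False)
)
```

On data with one row per editor and month, both results agree.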

## 2. Edit Persistence of an editor (across all pages)

The most valuable service that the `edit_persistence` function provides is tracking all actions of an editor across all pages. Let's start with the editor id `2092791`, taken from the previous section.

In [None]:
df = ww.dv.edit_persistence(editor_id=2092791)
df.head()

The dataframe accumulates several actions by month (`year_month` column), and page (`page_id`):
 
- **`adds`**: number of tokens inserted for the first time
- **`adds_surv_48h`**: number of tokens inserted for the first time that survived at least 48 hours
- **`adds_persistent`**: number of tokens inserted for the first time that survived until, at least, the end of the month
- **`adds_stopword_count`**: number of tokens inserted that were stop words
- **`dels`**: number of tokens deleted
- **`dels_surv_48h`**: number of tokens deleted that were not reinserted in the next 48 hours
- **`dels_persistent`**: number of tokens deleted that were not reinserted until, at least, the end of the month
- **`dels_stopword_count`**: number of tokens deleted that were stop words
- **`reins`**: number of tokens reinserted 
- **`reins_surv_48h`**: number of tokens reinserted that survived at least 48 hours
- **`reins_persistent`**: number of tokens reinserted that survived until the end of the month
- **`reins_stopword_count`**: number of tokens reinserted that were stop words

### 2.1. Calculating total actions 

The dataframe presents the persistence of fine-grained actions, i.e. insertions, reinsertions and deletions, so a common operation is to add them up to get the big picture of the persistence.

In [None]:
# Calculating total actions (regardless of persistence)
df['total_actions'] = df['adds'] + df['dels'] + df['reins']

# Calculating total actions that survived 48 hours
df['total_actions_48h'] = df['adds_surv_48h'] + df['dels_surv_48h'] + df['reins_surv_48h']

# Calculating total persistent actions
df['total_persistent'] = df['adds_persistent'] + df['dels_persistent'] + df['reins_persistent']

# Calculating total stopword counts
df['total_stopword_count'] = df['adds_stopword_count'] + df['dels_stopword_count'] + df['reins_stopword_count']

#display
df[['year_month', 'page_id', 'editor_id', 'total_actions', 'total_actions_48h', 'total_persistent', 'total_stopword_count']].head()
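
Since the same four totals are computed for both the page query and the editor query, they can be wrapped in a small helper. The sketch below is one possible way to do it; `add_totals` is a hypothetical name introduced here and is not part of `wikiwho_wrapper`, and the toy dataframe holds made-up values.

```python
import pandas as pd

ACTIONS = ("adds", "dels", "reins")

# Maps each total column to the suffix of the three columns it sums.
TOTALS = {
    "total_actions": "",
    "total_actions_48h": "_surv_48h",
    "total_persistent": "_persistent",
    "total_stopword_count": "_stopword_count",
}

def add_totals(frame):
    """Return a copy of `frame` with the four total columns added."""
    out = frame.copy()
    for total, suffix in TOTALS.items():
        out[total] = sum(out[f"{action}{suffix}"] for action in ACTIONS)
    return out

# Toy frame with made-up values: every action column holds [1, 2].
columns = [f"{action}{suffix}" for action in ACTIONS for suffix in TOTALS.values()]
toy = pd.DataFrame({column: [1, 2] for column in columns})
toy_totals = add_totals(toy)
```

Because the helper works on a copy, the dataframe returned by the query is left untouched.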

### 2.2. Edit persistence of an editor (across all pages)

The dataframe presents the data by month and by page. If the goal is to track only the actions of the editor (regardless of the page), a simple groupby is sufficient.

In [None]:
df.drop(columns=['page_id']).groupby(['year_month', 'editor_id']).sum().head()

### 2.3. Total actions of the editor (without month)

Another possibility is to calculate the editor's total number of actions on each page, regardless of time.

In [None]:
df.drop(columns=['editor_id']).groupby(['page_id']).sum().head()

Or we can get the number of actions across all pages.

In [None]:
df.drop(columns=['page_id']).groupby('editor_id').sum().head()

### 2.4. Number of active months per page

We can also check, for each page, the number of months in which the editor has made at least one contribution.

In [None]:
df.groupby('page_id').size().sort_values(ascending=False).head()

Notice that the above also lists all the pages to which the editor has contributed.
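
If only the list of pages is needed, the distinct `page_id` values can be extracted directly. A minimal sketch on a toy dataframe with made-up values (page id `42` is hypothetical):

```python
import pandas as pd

# Toy data with made-up values: one editor contributing to two pages.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-02", "2019-01"],
    "page_id": [6187, 6187, 42],
    "editor_id": [2092791, 2092791, 2092791],
    "adds": [10, 4, 3],
})

# Distinct pages the editor has touched, as a sorted list of plain integers.
pages = sorted(toy["page_id"].unique().tolist())
```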

In [None]:
from utils.notebooks import get_next_notebook
from IPython.display import HTML
try:
 display(HTML(f'Go to next workbook'))
except:
 HTML('Go to next workbook')
