{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Stack Overflow 2019 Survey results analysis and insights extraction about OSS contributions\n", "This project follows the [CRISP-DM](https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome) process to answer the following questions:\n", " - How often do developers contribute to OSS?\n", " - Do hobbyist developers contribute more often to OSS?\n", " - Does the perceived quality of OSS bias developers towards contributing to OSS?\n", " - Do experienced developers contribute more frequently to OSS?\n", " - Do developers who contribute to OSS have a higher income?" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import utils" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Data understanding" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(88883, 85)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"../data/so_survey_2019/survey_results_public.csv\")\n", "schema = pd.read_csv(\"../data/so_survey_2019/survey_results_schema.csv\")\n", "eu_countries = pd.read_csv(\"../data/listofeucountries.csv\")\n", "df.shape" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Column | QuestionText |
|---|---|---|
| 0 | Respondent | Randomized respondent ID number (not in order ... |
| 1 | MainBranch | Which of the following options best describes ... |
| 2 | Hobbyist | Do you code as a hobby? |
| 3 | OpenSourcer | How often do you contribute to open source? |
| 4 | OpenSource | How do you feel about the quality of open sour... |
| | Respondent | MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | ... | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | I am a student who is learning to code | Yes | Never | The quality of OSS and closed source software ... | Not employed, and not looking for work | United Kingdom | No | Primary/elementary school | NaN | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 14.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
| 1 | 2 | I am a student who is learning to code | No | Less than once per year | The quality of OSS and closed source software ... | Not employed, but looking for work | Bosnia and Herzegovina | Yes, full-time | Secondary school (e.g. American high school, G... | NaN | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 19.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
| 2 | 3 | I am not primarily a developer, but I write co... | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | Thailand | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Web development or web design | ... | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 28.0 | Man | No | Straight / Heterosexual | NaN | Yes | Appropriate in length | Neither easy nor difficult |
| 3 | 4 | I am a developer by profession | No | Never | The quality of OSS and closed source software ... | Employed full-time | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 22.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
| 4 | 5 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Employed full-time | Ukraine | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 30.0 | Man | No | Straight / Heterosexual | White or of European descent;Multiracial | No | Appropriate in length | Easy |

5 rows × 85 columns

| OpenSourcer | Hobbyist | Respondent |
|---|---|---|
| Less than once a month\\nbut more than once per year | No | 2346 |
| | Yes | 18215 |
| Less than once per year | No | 4586 |
| | Yes | 20386 |
| Never | No | 9564 |
| | Yes | 22731 |
| Once a month or more often | No | 1130 |
| | Yes | 9925 |

| | OpenSourcer | Hobbyist | Respondent |
|---|---|---|---|
| 0 | Less than once a month\\nbut more than once per... | No | 2346 |
| 1 | Less than once a month\\nbut more than once per... | Yes | 18215 |
| 2 | Less than once per year | No | 4586 |
| 3 | Less than once per year | Yes | 20386 |
| 4 | Never | No | 9564 |
| 5 | Never | Yes | 22731 |
| 6 | Once a month or more often | No | 1130 |
| 7 | Once a month or more often | Yes | 9925 |
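
Raw counts are hard to compare because hobbyists far outnumber non-hobbyists. The sketch below is not the notebook's original code; it only re-enters the counts from the table above and normalizes them within each `Hobbyist` group, so that the two groups can be compared as proportions. The variable names `counts`, `table`, and `shares` are illustrative.

```python
import pandas as pd

# Counts copied from the table above (OpenSourcer x Hobbyist).
frequencies = [
    "Less than once a month but more than once per year",
    "Less than once per year",
    "Never",
    "Once a month or more often",
]
counts = pd.DataFrame(
    {
        "OpenSourcer": [f for f in frequencies for _ in range(2)],
        "Hobbyist": ["No", "Yes"] * 4,
        "Respondent": [2346, 18215, 4586, 20386, 9564, 22731, 1130, 9925],
    }
)

# One row per contribution frequency, one column per Hobbyist answer.
table = counts.pivot(index="OpenSourcer", columns="Hobbyist", values="Respondent")

# Divide each column by its total so every Hobbyist group sums to 1.
shares = table / table.sum(axis=0)
print(shares.round(3))
```

Read this way, each column shows how one group splits across the four contribution frequencies, which is the comparison the hobbyist question calls for.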

| OpenSourcer | OpenSource | Respondent |
|---|---|---|
| Less than once a month\\nbut more than once per year | OSS is, on average, of HIGHER quality than proprietary / closed source software | 8355 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 1505 |
| | The quality of OSS and closed source software is about the same | 8136 |
| Less than once per year | OSS is, on average, of HIGHER quality than proprietary / closed source software | 8437 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 1867 |
| | The quality of OSS and closed source software is about the same | 9776 |
| Never | OSS is, on average, of HIGHER quality than proprietary / closed source software | 8259 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 2542 |
| | The quality of OSS and closed source software is about the same | 11115 |
| Once a month or more often | OSS is, on average, of HIGHER quality than proprietary / closed source software | 5272 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 727 |
| | The quality of OSS and closed source software is about the same | 3782 |
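
To relate quality perception to contribution frequency, it can help to look at how each frequency group splits across the three answers. The following is a minimal sketch that re-enters the counts from the table above; the short column labels `HIGHER`, `LOWER`, and `SAME` are abbreviations introduced here for readability and are not column values in the survey.

```python
import pandas as pd

# Counts copied from the table above; columns abbreviate the OpenSource answers.
perception = pd.DataFrame(
    {
        "HIGHER": [8355, 8437, 8259, 5272],
        "LOWER": [1505, 1867, 2542, 727],
        "SAME": [8136, 9776, 11115, 3782],
    },
    index=[
        "Less than once a month but more than once per year",
        "Less than once per year",
        "Never",
        "Once a month or more often",
    ],
)

# Divide each row by its total so every contribution group sums to 1.
row_shares = perception.div(perception.sum(axis=1), axis=0)
print(row_shares.round(3))
```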

| OpenSourcer | OpenSource | Respondent |
|---|---|---|
| Less than once a month\\nbut more than once per year | OSS is, on average, of HIGHER quality than proprietary / closed source software | 1000 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 252 |
| | The quality of OSS and closed source software is about the same | 1045 |
| Less than once per year | OSS is, on average, of HIGHER quality than proprietary / closed source software | 1692 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 500 |
| | The quality of OSS and closed source software is about the same | 2300 |
| Never | OSS is, on average, of HIGHER quality than proprietary / closed source software | 3026 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 1217 |
| | The quality of OSS and closed source software is about the same | 4931 |
| Once a month or more often | OSS is, on average, of HIGHER quality than proprietary / closed source software | 515 |
| | OSS is, on average, of LOWER quality than proprietary / closed source software | 149 |
| | The quality of OSS and closed source software is about the same | 442 |

| Years of experience | OpenSourcer | Respondent |
|---|---|---|
| 0 - 5 | Less than once a month\\nbut more than once per year | 6428 |
| | Less than once per year | 8329 |
| | Never | 12508 |
| | Once a month or more often | 3239 |
| 10 - 20 | Less than once a month\\nbut more than once per year | 4327 |
| | Less than once per year | 5090 |
| | Never | 4660 |
| | Once a month or more often | 2186 |
| 20 - 40 | Less than once a month\\nbut more than once per year | 1749 |
| | Less than once per year | 2247 |
| | Never | 2183 |
| | Once a month or more often | 1018 |
| 5 - 10 | Less than once a month\\nbut more than once per year | 5331 |
| | Less than once per year | 6045 |
| | Never | 6194 |
| | Once a month or more often | 2454 |
| >40 | Less than once a month\\nbut more than once per year | 64 |
| | Less than once per year | 109 |
| | Never | 122 |
| | Once a month or more often | 48 |
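
The experience buckets above appear in lexicographic order ("5 - 10" after "20 - 40"), which suggests the labels are plain strings. One way to build such buckets with a natural ordering is `pd.cut`, sketched below on made-up values. The sample data, the bin edges, and the choice to close each interval on the right are assumptions made for illustration, not the notebook's actual binning logic.

```python
import pandas as pd

# Hypothetical years-of-experience values, for illustration only.
years = pd.Series([0.5, 3, 5, 7, 15, 25, 45])

# Assumed bin edges matching the labels in the table above.
bins = [0, 5, 10, 20, 40, float("inf")]
labels = ["0 - 5", "5 - 10", "10 - 20", "20 - 40", ">40"]

# right=True closes each interval on the right, so 5 falls into "0 - 5";
# include_lowest=True keeps a value of exactly 0 in the first bucket.
experience = pd.cut(years, bins=bins, labels=labels, right=True, include_lowest=True)
print(experience.tolist())
```

Because `pd.cut` returns an ordered categorical, a later `groupby` on this column sorts the buckets by experience rather than alphabetically.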

| | OpenSourcer | CompTotal | mean_salary_formatted |
|---|---|---|---|
| 0 | Less than once a month\\nbut more than once per... | 105824.852658 | 105,824.85 |
| 1 | Less than once per year | 99803.203081 | 99,803.20 |
| 2 | Never | 93806.636620 | 93,806.64 |
| 3 | Once a month or more often | 115656.535993 | 115,656.54 |
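
The `mean_salary_formatted` column is consistent with a thousands-separated, two-decimal rendering of `CompTotal`. The sketch below reproduces that formatting from the mean values shown in the table; it is an illustration of one way to obtain the column, not necessarily how the notebook computed it.

```python
import pandas as pd

# Mean CompTotal values copied from the table above.
salary = pd.DataFrame(
    {
        "OpenSourcer": [
            "Less than once a month but more than once per year",
            "Less than once per year",
            "Never",
            "Once a month or more often",
        ],
        "CompTotal": [105824.852658, 99803.203081, 93806.636620, 115656.535993],
    }
)

# Format each mean with a thousands separator and two decimals.
salary["mean_salary_formatted"] = salary["CompTotal"].map("{:,.2f}".format)
print(salary)
```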