{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This analysis and its charts are meant to help aspiring data professionals make smarter career decisions. The data was collected from the Glassdoor website.\n",
"\n",
"The data has already been cleaned and transformed, so it is ready for analysis."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": true
}
}
}
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import plotly.graph_objects as go\n",
"from plotly.subplots import make_subplots\n",
"import plotly.express as px\n",
"import plotly.io as pio\n",
"pio.renderers.default = \"notebook\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": true
}
}
}
}
},
"outputs": [],
"source": [
"data = pd.read_csv(\"data_scientist_jobinfo.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
{
"data": {
"text/plain": [
" job_title Location Sector Python R Scala Spark AWS \\\n",
"0 Engineer Winnipeg Information Technology 1 0 0 0 0 \n",
"1 Scientist Toronto Information Technology 1 0 0 1 1 \n",
"2 Scientist Toronto Business Services 1 0 1 1 1 \n",
"3 Scientist Vancouver Information Technology 1 0 1 0 1 \n",
"4 Analyst Waterloo -1 1 0 0 0 1 \n",
"\n",
" SQL Excel PowerBI Tableau Tensorflow Pytorch Keras Company_Size \\\n",
"0 1 1 1 1 0 0 0 Medium \n",
"1 1 0 0 0 0 0 0 Small \n",
"2 1 0 0 0 0 0 0 Medium \n",
"3 0 1 0 0 0 0 0 Medium \n",
"4 1 1 0 0 0 0 0 Small \n",
"\n",
" Company_Age \n",
"0 34 \n",
"1 7 \n",
"2 28 \n",
"3 10 \n",
"4 -1 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 532 entries, 0 to 531\n",
"Data columns (total 17 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 job_title 532 non-null object\n",
" 1 Location 532 non-null object\n",
" 2 Sector 532 non-null object\n",
" 3 Python 532 non-null int64 \n",
" 4 R 532 non-null int64 \n",
" 5 Scala 532 non-null int64 \n",
" 6 Spark 532 non-null int64 \n",
" 7 AWS 532 non-null int64 \n",
" 8 SQL 532 non-null int64 \n",
" 9 Excel 532 non-null int64 \n",
" 10 PowerBI 532 non-null int64 \n",
" 11 Tableau 532 non-null int64 \n",
" 12 Tensorflow 532 non-null int64 \n",
" 13 Pytorch 532 non-null int64 \n",
" 14 Keras 532 non-null int64 \n",
" 15 Company_Size 532 non-null object\n",
" 16 Company_Age 532 non-null int64 \n",
"dtypes: int64(13), object(4)\n",
"memory usage: 70.8+ KB\n"
]
}
],
"source": [
"data.info()"
]
},
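{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataset has 532 rows and 17 columns, and `info()` reports no null values. Unknown values are instead encoded as -1 in some columns, for example `Sector` and `Company_Age`, so they are filtered out where needed below."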
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
],
"source": [
"fig = px.pie(data, names='job_title', title='Job Title', color_discrete_sequence=px.colors.sequential.haline)\n",
"fig.update_traces(textposition='inside', textinfo='percent+label+value', pull=[0, 0.2, 0, 0, 0, 0],\n",
" marker=dict(line=dict(color='#000000', width=2)))\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"Based on the pie chart, roughly 38.2% of the posted data jobs are Data Scientist roles. Data Analyst comes second with 26.3%, and Data Engineer third with 18.4%. Other roles, such as Research Scientist, Machine Learning Engineer (MLE) and Director, each account for less than 10%. Part of this gap is likely due to overlapping job roles: some companies include MLE tasks in the Data Scientist role. Even so, the chart suggests that Data Scientists are in high demand."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Top 10 Sectors which have the most Jobs: \n",
"\n",
"Information Technology 131\n",
"Business Services 55\n",
"Finance 52\n",
"Biotech & Pharmaceuticals 36\n",
"Retail 29\n",
"Media 22\n",
"Manufacturing 18\n",
"Insurance 13\n",
"Telecommunications 11\n",
"Healthcare 10\n",
"Name: Sector, dtype: int64\n"
]
}
],
"source": [
"print(\"Top 10 Sectors which have the most Jobs: \\n\")\n",
"\n",
"sector_data = data[data['Sector']!='-1']\n",
"print(sector_data['Sector'].value_counts()[:10])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
],
"source": [
"# Count postings per sector (sector_data already excludes the unknown '-1' sector).\n",
"sector_wise = sector_data.groupby(by=['Sector'])['job_title'].count()\n",
"fig = go.Figure(data=[go.Bar(x=sector_wise.index, y=sector_wise.values)])\n",
"\n",
"fig.update_traces(marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)',\n",
"                  marker_line_width=1.5, opacity=0.8)\n",
"\n",
"fig.update_layout(xaxis={'categoryorder':'total descending'},\n",
"                  title=\"Sector wise Total Jobs\",\n",
"                  xaxis_title=\"Sectors\",\n",
"                  # Label with the number of rows actually plotted, not the full 532.\n",
"                  yaxis_title=f\"Total Jobs ({len(sector_data)})\")\n",
"\n",
"fig.update_xaxes(tickangle=45, tickfont=dict(family='Rockwell', color='crimson', size=14))\n",
"fig.update_yaxes(tickfont=dict(family='Rockwell', color='darkblue', size=14))\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"The Information Technology sector has more postings than any other: 131, compared with 55 for Business Services in second place. Finance (52), Biotech & Pharmaceuticals (36) and Retail (29) follow. Aspiring data scientists can use this ranking to decide which sectors to target."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
{
"data": {
"text/plain": [
" Job Count\n",
"Sector job_title \n",
"Information Technology Scientist 41\n",
" Engineer 37\n",
" Analyst 31\n",
"Business Services Analyst 23\n",
"Biotech & Pharmaceuticals Scientist 22\n",
"Finance Scientist 19\n",
"Retail Scientist 18\n",
"Business Services Scientist 16\n",
"Finance Analyst 15\n",
" Engineer 11\n",
"Information Technology MLE 10\n",
"Retail Analyst 10\n",
"Insurance Scientist 9\n",
"Business Services Engineer 8\n",
"Media Scientist 8\n",
" Analyst 7\n",
"Information Technology Researcher 7\n",
"Biotech & Pharmaceuticals Researcher 6\n",
"Manufacturing Analyst 6\n",
" Engineer 5"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
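],
"source": [
"# Drop rows whose sector is unknown (encoded as the string '-1').\n",
"pivot_data = data[data['Sector'] != '-1']\n",
"\n",
"# Show every row of the result instead of a truncated view.\n",
"pd.set_option('display.max_rows', None)\n",
"\n",
"# Count postings per (Sector, job_title). Company_Age has no nulls, so counting it counts rows.\n",
"pd.pivot_table(pivot_data, index=['Sector', 'job_title'], values='Company_Age', aggfunc='count').sort_values(\n",
"    'Company_Age', ascending=False).rename(columns={'Company_Age': 'Job Count'})[:20]"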
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"The table above shows which job roles are most wanted in each sector. For instance, Business Services posts more Analyst roles (23) than Scientist roles (16), which makes sense since the sector focuses on making smarter decisions by analysing data rather than on building models."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Small 227\n",
"Medium 178\n",
"Large 127\n",
"Name: Company_Size, dtype: int64\n"
]
},
{
"data": {
"text/plain": [
" Job Count\n",
"Company_Size job_title \n",
"Small Analyst 41\n",
" Scientist 34\n",
" Engineer 25\n",
" MLE 8\n",
" Researcher 8\n",
" Director 3\n",
"Medium Scientist 65\n",
" Analyst 47\n",
" Engineer 31\n",
" Researcher 18\n",
" Director 7\n",
" MLE 4\n",
"Large Scientist 58\n",
" Engineer 24\n",
" Analyst 22\n",
" Researcher 10\n",
" MLE 6\n",
" Director 5"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(data['Company_Size'].value_counts())\n",
"\n",
"pd.pivot_table(pivot_data, index =['Company_Size','job_title'],values='Company_Age', aggfunc='count').sort_values(\n",
" ['Company_Size','Company_Age'], ascending = False).rename(columns={'Company_Age':'Job Count'})[:20]"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
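},
"source": [
"The counts above show that it is not only big companies that make use of data. Small companies account for the largest share of postings (227 of 532), ahead of medium (178) and large (127) companies, which suggests that smaller companies are also starting to realize how data can help them. Note that the pivot table is built from `pivot_data`, which excludes postings with an unknown sector, so its per-size totals are lower than the overall counts printed above it."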
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
],
"source": [
"fig = px.histogram(data[data['Company_Age']>0], x=\"Company_Age\",\n",
" opacity=.8, labels={'Company_Age':'Company Age'},\n",
" title='Histogram of Company\\'s Age',\n",
" color_discrete_sequence=['rgb(0, 100, 100)'])\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"This histogram shows that even newer companies are hiring data professionals to make smarter decisions for their businesses. It also suggests that a company does not need a huge amount of data to drive more profit: what matters is how the available data is used to solve business problems. Companies with an unknown age (encoded as -1) are excluded from the plot."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
{
"data": {
"text/plain": [
" Job_Count\n",
"Location \n",
"Toronto 152\n",
"Vancouver 72\n",
"Montreal 68\n",
"Mississauga 29\n",
"Ottawa 25\n",
"Brampton 21\n",
"Calgary 15\n",
"Canada 9\n",
"Waterloo 8\n",
"Victoria 8"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.pivot_table(data, index =['Location'],values='Company_Age', aggfunc='count').sort_values(\n",
" 'Company_Age', ascending = False).rename(columns={'Company_Age':'Job_Count'})[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
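},
"source": [
"The table above shows that most jobs are concentrated in the bigger cities: Toronto (152), Vancouver (72) and Montreal (68) together account for 292 of the 532 postings, more than half. Note that 9 postings list only `Canada` as their location."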
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
},
"scrolled": false
},
"outputs": [
],
"source": [
"specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]]\n",
"\n",
"fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=['Python', 'R', 'SQL', 'Scala'])\n",
"\n",
"# Reindex each count to a fixed [1, 0] order so the 'Yes'/'No' labels always match\n",
"# their values, instead of relying on value_counts() sorting by frequency.\n",
"labels = ['Yes', 'No']\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Python'].value_counts().reindex([1, 0], fill_value=0),\n",
"                     name='Python', marker_colors=['#00FFFF','#550000']), 1, 1)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['R'].value_counts().reindex([1, 0], fill_value=0), name='R'), 1, 2)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['SQL'].value_counts().reindex([1, 0], fill_value=0), name='SQL'), 2, 1)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Scala'].value_counts().reindex([1, 0], fill_value=0), name='Scala'), 2, 2)\n",
"\n",
"fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n",
"                  marker=dict(line=dict(color='#000000', width=2)))\n",
"\n",
"fig.update_layout(title_text='Language Requirements', showlegend=True,\n",
"                  autosize=False, width=700, height=700)\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"The pie charts above indicate that Python and SQL are the must-have languages for data professionals. Demand for the other languages depends on each company's requirements. Scala also appears in some postings, most likely because of its use with Apache Spark."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
],
"source": [
"specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'},{'type':'domain'}]]\n",
"\n",
"fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=['Tensorflow', 'Pytorch', 'Keras'])\n",
"\n",
"# Reindex each count to a fixed [1, 0] order so the 'Yes'/'No' labels always match\n",
"# their values, instead of relying on value_counts() sorting by frequency.\n",
"labels = ['Yes', 'No']\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Tensorflow'].value_counts().reindex([1, 0], fill_value=0),\n",
"                     name='Tensorflow', marker_colors=['#00FFFF','#550000']), 1, 1)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Pytorch'].value_counts().reindex([1, 0], fill_value=0), name='Pytorch'), 1, 2)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Keras'].value_counts().reindex([1, 0], fill_value=0), name='Keras'), 2, 1)\n",
"\n",
"fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n",
"                  marker=dict(line=dict(color='#000000', width=2)))\n",
"\n",
"fig.update_layout(title_text='DL Framework Requirements', showlegend=True,\n",
"                  autosize=False, width=800, height=800)\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"When a posting asks for a deep learning framework, it is most often TensorFlow and its higher-level API, Keras. TensorFlow is more popular than PyTorch, likely because of its deployment functionality. Nevertheless, PyTorch is also popular for its ease of use."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
],
"source": [
"specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'},{'type':'domain'}]]\n",
"\n",
"fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=['Excel', 'Tableau', 'PowerBI'])\n",
"\n",
"# Reindex each count to a fixed [1, 0] order so the 'Yes'/'No' labels always match\n",
"# their values, instead of relying on value_counts() sorting by frequency.\n",
"labels = ['Yes', 'No']\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Excel'].value_counts().reindex([1, 0], fill_value=0),\n",
"                     name='Excel', marker_colors=['#00FFFF','#550000']), 1, 1)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Tableau'].value_counts().reindex([1, 0], fill_value=0), name='Tableau'), 1, 2)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['PowerBI'].value_counts().reindex([1, 0], fill_value=0), name='PowerBI'), 2, 1)\n",
"\n",
"fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n",
"                  marker=dict(line=dict(color='#000000', width=2)))\n",
"\n",
"fig.update_layout(title_text='BI Tool Requirements', showlegend=True,\n",
"                  autosize=False, width=800, height=800)\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"Among visualization tools, Excel is still popular, but Tableau is a more powerful tool that is very easy to use and does not require any coding skills."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"outputs": [
],
"source": [
"specs = [[{'type':'domain'}, {'type':'domain'}]]\n",
"\n",
"fig = make_subplots(rows=1, cols=2, specs=specs, subplot_titles=['AWS', 'Spark'])\n",
"\n",
"# Reindex each count to a fixed [1, 0] order so the 'Yes'/'No' labels always match\n",
"# their values, instead of relying on value_counts() sorting by frequency.\n",
"labels = ['Yes', 'No']\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['AWS'].value_counts().reindex([1, 0], fill_value=0),\n",
"                     name='AWS', marker_colors=['#00FFFF','#550000']), 1, 1)\n",
"\n",
"fig.add_trace(go.Pie(labels=labels, values=data['Spark'].value_counts().reindex([1, 0], fill_value=0), name='Spark'), 1, 2)\n",
"\n",
"fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n",
"                  marker=dict(line=dict(color='#000000', width=2)))\n",
"\n",
"fig.update_layout(title_text='AWS & Spark Requirements', showlegend=True)\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": false
}
}
}
}
},
"source": [
"AWS and Spark are among the most important technologies to know for better job prospects, particularly at larger companies."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": true
}
}
}
}
},
"outputs": [
],
"source": [
"columns = ['Python', 'R', 'AWS', 'Scala', 'Excel', 'Tableau', 'PowerBI', 'Spark', 'SQL', 'Pytorch', 'Tensorflow', 'Keras']\n",
"\n",
"# Each skill column is a 0/1 flag, so counting the rows equal to 1 gives the\n",
"# number of postings that mention the skill.\n",
"count = [int((data[col] == 1).sum()) for col in columns]\n",
"\n",
"\n",
"fig = go.Figure(data=[go.Bar(x=columns, y=count)])\n",
"\n",
"fig.update_traces(marker_color='darkblue', marker_line_color='rgb(0,255,255)',\n",
" marker_line_width=1.5, opacity=.8)\n",
"\n",
"fig.update_layout(xaxis={'categoryorder':'total descending'},\n",
" title=\"Number of times Tool & Technologies Mentioned in Job Descriptions\",\n",
" xaxis_title=\"Tools & Technologies\",\n",
" yaxis_title=\"Count(532)\")\n",
"\n",
"fig.update_xaxes(tickfont=dict(family='Rockwell', color='crimson', size=14))\n",
"fig.update_yaxes(tickfont=dict(family='Rockwell', color='darkblue', size=14))\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"extensions": {
"jupyter_dashboards": {
"version": 1,
"views": {
"grid_default": {},
"report_default": {
"hidden": true
}
}
}
}
},
"source": [
"This bar graph shows which tools are mentioned most often and are therefore worth prioritizing when learning. One caveat: Keras comes last here, but that does not mean it is not required. Many companies may leave it out of the job description because they expect candidates to already know such basic tools for model development."
]
}
],
"metadata": {
"extensions": {
"jupyter_dashboards": {
"activeView": "grid_default",
"version": 1,
"views": {
"grid_default": {
"cellMargin": 10,
"defaultCellHeight": 20,
"maxColumns": 12,
"name": "grid",
"type": "grid"
},
"report_default": {
"name": "report",
"type": "report"
}
}
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}