{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This analysis and its charts are meant to help aspiring data professionals make smarter decisions. The data was collected from the Glassdoor website.\n", "
The data has already been cleaned and transformed for analysis." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": true } } } } }, "outputs": [], "source": [ "import pandas as pd\n", "import plotly.graph_objects as go\n", "from plotly.subplots import make_subplots\n", "import plotly.express as px\n", "import plotly.io as pio\n", "pio.renderers.default = \"notebook\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": true } } } } }, "outputs": [], "source": [ "data = pd.read_csv(\"data_scientist_jobinfo.csv\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " job_title Location Sector Python R Scala Spark AWS \\\n", "0 Engineer Winnipeg Information Technology 1 0 0 0 0 \n", "1 Scientist Toronto Information Technology 1 0 0 1 1 \n", "2 Scientist Toronto Business Services 1 0 1 1 1 \n", "3 Scientist Vancouver Information Technology 1 0 1 0 1 \n", "4 Analyst Waterloo -1 1 0 0 0 1 \n", "\n", " SQL Excel PowerBI Tableau Tensorflow Pytorch Keras Company_Size \\\n", "0 1 1 1 1 0 0 0 Medium \n", "1 1 0 0 0 0 0 0 Small \n", "2 1 0 0 0 0 0 0 Medium \n", "3 0 1 0 0 0 0 0 Medium \n", "4 1 1 0 0 0 0 0 Small \n", "\n", " Company_Age \n", "0 34 \n", "1 7 \n", "2 28 \n", "3 10 \n", "4 -1 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 532 entries, 0 to 531\n", "Data columns (total 17 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 job_title 532 non-null object\n", " 1 Location 532 non-null object\n", " 2 Sector 532 non-null object\n", " 3 Python 532 non-null int64 \n", " 4 R 532 non-null int64 \n", " 5 Scala 532 non-null int64 \n", " 6 Spark 532 non-null int64 \n", " 7 AWS 532 non-null int64 \n", " 8 SQL 532 non-null int64 \n", " 9 Excel 532 non-null int64 \n", " 10 PowerBI 532 non-null int64 \n", " 11 Tableau 532 non-null int64 \n", " 12 Tensorflow 532 non-null int64 \n", " 13 Pytorch 532 non-null int64 \n", " 14 Keras 532 non-null int64 \n", " 15 Company_Size 532 non-null object\n", " 16 Company_Age 532 non-null int64 \n", "dtypes: int64(13), object(4)\n", "memory usage: 70.8+ KB\n" ] } ], "source": [ "data.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset has 532 rows and 17 columns. A value of -1 (as in the Sector and Company_Age of row 4) marks missing information; the cells below filter out such rows where it matters." ] }, { "cell_type": 
"code", "execution_count": 5, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = px.pie(data, names='job_title', title='Job Title', color_discrete_sequence=px.colors.sequential.haline)\n", "fig.update_traces(textposition='inside', textinfo='percent+label+value', pull=[0, 0.2, 0, 0, 0, 0],\n", " marker=dict(line=dict(color='#000000', width=2)))\n", "\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "Based on the pie chart, roughly 38.2% of the posted data jobs are Data Scientist roles. Data Analyst comes second with 26.3%, and Data Engineer third with 18.4%. The other roles (Research Scientist, Machine Learning Engineer, and Director) are each under 10%. Part of this gap comes from overlapping job roles: some companies fold Machine Learning Engineer (MLE) tasks into the Data Scientist role. Even so, the chart clearly shows that Data Scientists are in demand." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Top 10 Sectors which have the most Jobs: \n", "\n", "Information Technology 131\n", "Business Services 55\n", "Finance 52\n", "Biotech & Pharmaceuticals 36\n", "Retail 29\n", "Media 22\n", "Manufacturing 18\n", "Insurance 13\n", "Telecommunications 11\n", "Healthcare 10\n", "Name: Sector, dtype: int64\n" ] } ], "source": [ "print(\"Top 10 Sectors which have the most Jobs: \\n\")\n", "\n", "# Exclude postings with an unknown sector (encoded as '-1')\n", "sector_data = data[data['Sector']!='-1']\n", "print(sector_data['Sector'].value_counts()[:10])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Number of postings per sector (unknown sectors already excluded)\n", "sector_wise = sector_data.groupby(by=['Sector'])['job_title'].count()\n", "fig = go.Figure(data=[go.Bar(x=sector_wise.index, y=sector_wise.values)])\n", "\n", "fig.update_traces(marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)',\n", " marker_line_width=1.5, opacity=0.8)\n", "\n", "fig.update_layout(xaxis={'categoryorder':'total descending'},\n", " title=\"Sector wise Total Jobs\",\n", " xaxis_title=\"Sectors\",\n", " yaxis_title=\"Total Jobs\")\n", "\n", "fig.update_xaxes(tickangle=45, tickfont=dict(family='Rockwell', color='crimson', size=14))\n", "fig.update_yaxes(tickfont=dict(family='Rockwell', color='darkblue', size=14))\n", "\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "The Information Technology sector has more postings than any other: 131, compared with 55 for Business Services in second place. Finance (52), Biotech & Pharmaceuticals (36), and Retail (29) follow. Aspiring data scientists can use this ranking to choose which sectors to target." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Job Count\n", "Sector job_title \n", "Information Technology Scientist 41\n", " Engineer 37\n", " Analyst 31\n", "Business Services Analyst 23\n", "Biotech & Pharmaceuticals Scientist 22\n", "Finance Scientist 19\n", "Retail Scientist 18\n", "Business Services Scientist 16\n", "Finance Analyst 15\n", " Engineer 11\n", "Information Technology MLE 10\n", "Retail Analyst 10\n", "Insurance Scientist 9\n", "Business Services Engineer 8\n", "Media Scientist 8\n", " Analyst 7\n", "Information Technology Researcher 7\n", "Biotech & Pharmaceuticals Researcher 6\n", "Manufacturing Analyst 6\n", " Engineer 5" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Keep only postings with a known sector\n", "pivot_data = data[data['Sector']!='-1']\n", "\n", "# Show every row of the resulting table\n", "pd.set_option('display.max_rows', None)\n", "pd.pivot_table(pivot_data, index =['Sector','job_title'],values='Company_Age', aggfunc='count').sort_values(\n", " 'Company_Age', ascending = False).rename(columns={'Company_Age':'Job Count'})[:20]" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "The table above shows which job roles each sector wants most. For instance, Business Services posts more Analyst roles (23) than Scientist roles (16), which makes sense since those companies focus on making smarter decisions by analysing data rather than on building models." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Small 227\n", "Medium 178\n", "Large 127\n", "Name: Company_Size, dtype: int64\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Job Count\n", "Company_Size job_title \n", "Small Analyst 41\n", " Scientist 34\n", " Engineer 25\n", " MLE 8\n", " Researcher 8\n", " Director 3\n", "Medium Scientist 65\n", " Analyst 47\n", " Engineer 31\n", " Researcher 18\n", " Director 7\n", " MLE 4\n", "Large Scientist 58\n", " Engineer 24\n", " Analyst 22\n", " Researcher 10\n", " MLE 6\n", " Director 5" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(data['Company_Size'].value_counts())\n", "\n", "pd.pivot_table(pivot_data, index =['Company_Size','job_title'],values='Company_Age', aggfunc='count').sort_values(\n", " ['Company_Size','Company_Age'], ascending = False).rename(columns={'Company_Age':'Job Count'})[:20]" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "The counts above tell us that it is not only big companies that make use of data. Smaller companies are also starting to realize the power of data and how it can help them, and across all 532 postings they are the ones hiring the most (227 small versus 178 medium and 127 large). Note that the pivot table covers only postings with a known sector, which is why its per-size totals are lower." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Company_Age of -1 means the age is unknown, so keep only positive ages\n", "fig = px.histogram(data[data['Company_Age']>0], x=\"Company_Age\",\n", " opacity=.8, labels={'Company_Age':'Company Age'},\n", " title='Histogram of Company\\'s Age',\n", " color_discrete_sequence=['rgb(0, 100, 100)'])\n", "\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "This histogram shows that even newer companies are hiring data professionals to make smarter decisions for their businesses. It suggests that you don't need a huge amount of data to drive business profits; what matters is how you use what you have to solve business problems." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Job_Count\n", "Location \n", "Toronto 152\n", "Vancouver 72\n", "Montreal 68\n", "Mississauga 29\n", "Ottawa 25\n", "Brampton 21\n", "Calgary 15\n", "Canada 9\n", "Waterloo 8\n", "Victoria 8" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(data, index =['Location'],values='Company_Age', aggfunc='count').sort_values(\n", " 'Company_Age', ascending = False).rename(columns={'Company_Age':'Job_Count'})[:10]" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "The table above shows that most postings are in the bigger cities: Toronto (152), Vancouver (72), and Montreal (68) together account for more than half of the 532 postings." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } }, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]]\n", "\n", "fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=['Python', 'R', 'SQL', 'Scala'])\n", "\n", "# value_counts() orders by frequency, so reindex pins each count to its label (0 = No, 1 = Yes)\n", "fig.add_trace(go.Pie(labels=['Yes','No'], values=data['Python'].value_counts().reindex([1, 0], fill_value=0), name='Python',\n", " marker_colors=['#00FFFF','#550000']), 1, 1)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['R'].value_counts().reindex([0, 1], fill_value=0), name='R'), 1, 2)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['SQL'].value_counts().reindex([0, 1], fill_value=0), name='SQL'), 2, 1)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['Scala'].value_counts().reindex([0, 1], fill_value=0), name='Scala'), 2, 2)\n", "\n", "fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n", " marker=dict(line=dict(color='#000000', width=2)))\n", "\n", "fig.update(layout_title_text='Languages Requirements',\n", " layout_showlegend=True)\n", "\n", "fig.update_layout(\n", " autosize=False,\n", " width=700,\n", " height=700)\n", "\n", "fig = go.Figure(fig)\n", "\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "The pie charts above illustrate that Python and SQL are must-have languages for any data professional. Demand for the other languages depends on the company's requirements. Scala is also gaining popularity because of Apache Spark." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'},{'type':'domain'}]]\n", "\n", "fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=['Tensorflow', 'Pytorch', 'Keras'])\n", "\n", "# value_counts() orders by frequency, so reindex pins each count to its label (0 = No, 1 = Yes)\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['Tensorflow'].value_counts().reindex([0, 1], fill_value=0),\n", " name='Tensorflow', marker_colors=['#550000','#00FFFF']), 1, 1)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['Pytorch'].value_counts().reindex([0, 1], fill_value=0), name='Pytorch'), 1, 2)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['Keras'].value_counts().reindex([0, 1], fill_value=0), name='Keras'), 2, 1)\n", "\n", "fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n", " marker=dict(line=dict(color='#000000', width=2)))\n", "\n", "fig.update(layout_title_text='DL Framework Requirements',\n", " layout_showlegend=True)\n", "\n", "fig.update_layout(autosize=False,\n", " width=800,\n", " height=800)\n", "\n", "fig = go.Figure(fig)\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "Companies that ask for a deep learning framework most often expect you to know TensorFlow and its higher-level API, Keras. TensorFlow is more popular than PyTorch, likely because of its deployment tooling. Nevertheless, PyTorch is also popular for its ease of use." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'},{'type':'domain'}]]\n", "\n", "fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=['Excel', 'Tableau', 'PowerBI'])\n", "\n", "# value_counts() orders by frequency, so reindex pins each count to its label (0 = No, 1 = Yes)\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['Excel'].value_counts().reindex([0, 1], fill_value=0),\n", " name='Excel', marker_colors=['#550000','#00FFFF']), 1, 1)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['Tableau'].value_counts().reindex([0, 1], fill_value=0), name='Tableau'), 1, 2)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['PowerBI'].value_counts().reindex([0, 1], fill_value=0),\n", " name='PowerBI'), 2, 1)\n", "\n", "fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n", " marker=dict(line=dict(color='#000000', width=2)))\n", "\n", "fig.update(layout_title_text='BI Tool Requirements',\n", " layout_showlegend=True)\n", "\n", "fig.update_layout(autosize=False,\n", " width=800,\n", " height=800)\n", "\n", "fig = go.Figure(fig)\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "Among visualization tools, Excel is still popular, but Tableau is a more powerful tool that is very easy to use and doesn't require any coding skills." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "specs = [[{'type':'domain'}, {'type':'domain'}]]\n", "\n", "fig = make_subplots(rows=1, cols=2, specs=specs, subplot_titles=['AWS', 'Spark'])\n", "\n", "# value_counts() orders by frequency, so reindex pins each count to its label (0 = No, 1 = Yes)\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['AWS'].value_counts().reindex([0, 1], fill_value=0),\n", " name='AWS', marker_colors=['#550000','#00FFFF']), 1, 1)\n", "\n", "fig.add_trace(go.Pie(labels=['No','Yes'], values=data['Spark'].value_counts().reindex([0, 1], fill_value=0), name='Spark'), 1, 2)\n", "\n", "fig.update_traces(textposition='inside', textinfo='percent+label+value', hole=.3,\n", " marker=dict(line=dict(color='#000000', width=2)))\n", "\n", "fig.update(layout_title_text='AWS & Spark Requirements',\n", " layout_showlegend=True)\n", "\n", "fig = go.Figure(fig)\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "source": [ "AWS and Spark are important technologies to know for better job prospects at larger companies." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": true } } } } }, "outputs": [ { "data": { "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "columns = ['Python', 'R', 'AWS', 'Scala', 'Excel', 'Tableau', 'PowerBI', 'Spark', 'SQL', 'Pytorch', 'Tensorflow', 'Keras']\n", "\n", "# Count the postings that mention each tool (flag == 1)\n", "count = [int((data[col] == 1).sum()) for col in columns]\n", "\n", "fig = go.Figure(data=[go.Bar(x=columns, y=count)])\n", "\n", "fig.update_traces(marker_color='darkblue', marker_line_color='rgb(0,255,255)',\n", " marker_line_width=1.5, opacity=.8)\n", "\n", "fig.update_layout(xaxis={'categoryorder':'total descending'},\n", " title=\"Number of Times Tools & Technologies Are Mentioned in Job Descriptions\",\n", " xaxis_title=\"Tools & Technologies\",\n", " yaxis_title=\"Count(532)\")\n", "\n", "fig.update_xaxes(tickfont=dict(family='Rockwell', color='crimson', size=14))\n", "fig.update_yaxes(tickfont=dict(family='Rockwell', color='darkblue', size=14))\n", "\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": true } } } } }, "source": [ "This bar graph shows which tools you should focus on learning first. One more thing: Keras comes last here, but that doesn't mean it isn't required. Many companies likely leave it out of the job description because they expect you to already know such basic tools for easy model development." 
] } ], "metadata": { "extensions": { "jupyter_dashboards": { "activeView": "grid_default", "version": 1, "views": { "grid_default": { "cellMargin": 10, "defaultCellHeight": 20, "maxColumns": 12, "name": "grid", "type": "grid" }, "report_default": { "name": "report", "type": "report" } } } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }