Edit: "next level" being context-specific here, whether your field is NLP, customer analytics, etc. Just wanted to see what everyone had to say without pinning responses to one field.

Data Analysis Using Regression and Multilevel/Hierarchical Models, by Gelman & Hill. Such a great introduction to modeling and regression. It helps you think through structuring your models and what to do about different types of problems. Even if you aren't interested in multilevel models (which I think you should be, they are great), you can learn a lot about building, examining, and summarizing models. The actual quality of the code examples isn't as good as ISLR or some others, and the book assumes you know some basic R, but I would definitely recommend people check it out.

Beat me to it. Can't speak highly enough of this book.

ISLR is now a classic of machine learning. If you are way into math, then get ESL. I also really got a lot out of Practical Machine Learning in Python.

That is Introduction to Statistical Learning, right? I'd have to know R already, I assume. What level of math do I need for Elements of Statistical Learning?
ESL is hardcore, so if you have to ask about the math requirements, I'd avoid it and stick with ISLR. If you don't know how to program in R, there is a GitHub repository where someone did the whole of ISLR in Python. Search around on GitHub and it should be easy to find.

If you are in industry, it's easily Data Science for Business.

Statistical Rethinking by Richard McElreath. It's really the book you need to develop some Bayesian intuition before facing the big boss: Bayesian Data Analysis by Gelman :)

Kevin Murphy's Machine Learning: A Probabilistic Perspective.

Honestly, it was Hands-On Machine Learning with Python and TensorFlow, though I think it's outdated now. Learning TensorFlow really well has let me focus on the actual science and less on the programmatic engineering problem of developing a deep learning pipeline.

This one was very useful when I was doing my first logistic regression model. That, and the great way the introduction and end-to-end project are described, makes this a must-own.

Yup, that end-to-end project was great. Took me like 2 weeks to complete and digest it all.

Why would you think it's outdated now? It only came out a few months ago, and the notebooks are updated on GitHub.

TensorFlow has had some major updates since it's been out. I didn't know they had updated the notebooks since.

Bayesian Reasoning and Machine Learning.

What's the next level? What are you trying to do? NLP? Image recognition? Speech recognition? Customer analytics? Improving your coding skills? Really vague question.

It is a vague question, but you haven't read any good books lately?

I was searching for data science skills and I stumbled onto this: http://nirvacana.com/thoughts/wp-content/uploads/2013/07/roadtodatascientist1.png I think it's great work! Anything you redditors see that is missing?
Note: I work as a consultant data scientist, but I get stuck in end-to-end projects where for several months I have to fill many roles (IT business analyst, sysadmin, DB admin), so my purpose is to always be prepared on the core data science skills in case more interesting jobs or projects kick in :)

I like how linear regression is like one little blip on the map, but in reality you could spend years thinking about the nuances of linear regression.

As an economist: is this really all the stats you'd need in data science? I have graduate-level econometrics, so I am more than confident with all these techniques, but I couldn't implement a Kalman filter to save my life.

This is definitely aiming to be more conclusive. No, of course you don't need all of them. So much depends on the purpose, domain, etc. No need for text mining if all your data is numeric.

I think you misunderstood my point. My question is: of the stats a data scientist uses, is this about it? My reason for asking being that I'm an economist myself and am surprised that there isn't "more to it".

Oh, I see. Sure, I'd say there's definitely more, especially having been in academia and read textbooks and research. Definitely more out there, and in development under research.

I thought as much, thank you!

What's so special about this? It's just a bunch of techniques, tools, and methods in no particular order. I would argue their supposed order makes it even worse, because it seems like they are implying there is some type of order to this, but it doesn't make any sense. The branches seem to be arbitrary. Like, why does programming branch out like that?
I guess it's pure design, to get the subway style. For the beginner stuff it may be easy to order, but for the more advanced part it would be difficult and somewhat subjective.

Yeah, then I would argue it is worse, where the designer is forgoing functionality for form. Sorry to be shitty about this whole thing. I am not blaming you, but I got into data science to learn good graphical techniques, and this is not it.

I agree. Maybe a Venn diagram would be a better graphical representation if the ordering doesn't make any sense. Maybe that specific roadmap must be changed and priorities revisited, but I still have the feeling that with no priority things will get more confused.

I'd be very interested to see an attempt with spatial arrangement representing something important, like relationships between the techniques, or commonalities in what they're used for, or order of learning, or something. It sounds like a daunting but cool task.

I'm just imagining Will Ferrell reading off all of this in the Prestige Worldwide presentation: https://www.youtube.com/watch?v=cis914madl8 "Dataframes!" "Regression analysis!" "ggplot2!" "Research and development!"

Forget about D3.js. In your daily work you will not be able to spend a week on creating a visualization. Add all the DevOps stuff (Docker, Jenkins, Oozie, and so on) and classic project management topics like Scrum. Add everything about NN and DL. Nearly all the use cases I am working on now are in this area. Not that "classical" ML is not important anymore, but it is already so easy to run standard ML that it is done by classical statisticians, developers, standard workers, and students, so I, as a data scientist, just get called in to validate the approach and the results. The rest of the roadmap is the absolute basics.

I may be misreading your comment, but are you saying don't use D3 because it takes "a week creating a visualization"?
Because this is far from true. I use D3 routinely, learned it fairly quickly, and can write usable D3 code in only a couple of hours, depending on the desired visualization.

Thanks! DevOps and project management are nice additions (even though I'm not a fan of getting involved in DevOps stuff). I totally agree on D3.js. For NN and DL, well, I would like to explore those topics more, but my company and our clients prefer the "classical ML" approach because they have the feeling that they "know what's going on" when I show them the weights or rules of a decision tree in some fancy PowerPoint.

Fundamentals should include some statistics knowledge and Python basics, cuz, for example, NoSQL and JSON can't be among the fundamentals, since some data scientists don't have them, which doesn't mean that they lack fundamentals. The order and the hierarchy of this map have to be revised.

Term document matrix appears twice on the "orange" NLP line.

Is the statistics section really an accurate depiction of the amount of statistics a data scientist should know?
I'm finishing up my second graduate course in statistics and am already familiar with everything on there. This doesn't seem right.

I'm a PhD physics candidate with about two years left before I graduate. I've decided not to pursue a career in academia, and I'm interested in moving into data science after I graduate. My research is in particle physics, and my day-to-day work is programming and data analysis. This sounds like a pretty good background for an aspiring data scientist, but I feel like my skills and experience are so specific to particle physics that it would be difficult to sell myself for a data science job. I'm essentially a self-taught programmer and data analyst who's learnt just enough programming and data analysis to enable me to do my research, and nothing more.

I realised this probably wasn't going to get me a job, so I've recently started learning some new skills and tools and incorporating them into my work where possible. I'm teaching myself Python, and I switched my analysis code from C++ to Python. I'm going to try switching my histogramming from ROOT to matplotlib, and I might be able to find some excuses to use pandas for some number crunching.

Does anyone have any other suggestions? In particular, are there any ex-particle physicists here who can advise me on how to avoid pigeonholing myself as someone who can only do ROOT-based cut-and-count analyses?

Hey there!
Seems like you're me two years ago, so my advice should be pretty relevant. I was a particle physics grad student who graduated and transitioned to DS.

Hands down the most useful thing I did was to take the Coursera ML course (Andrew Ng) and complete the homeworks in Python with matplotlib and Jupyter notebooks. This isn't exactly "incorporating" something into your research, though, but learning something new in your spare time.

The second most useful thing I did was switch from ROOT and C++ to Python. I did all my hardcore algorithm computations in ROOT and C++ (since I was sharing code with collaborators who used ROOT), but I always took the resulting TTree in the ROOT file and loaded it into a pandas DataFrame with root_numpy. All the analysis from there (histogramming, fitting, plotting, yadda yadda) was done exclusively in Jupyter notebooks with pandas, numpy, matplotlib, and scipy. This made the transition to my DS job super smooth, because everyone I worked with was using Jupyter notebooks and these same Python packages.

Feel free to PM me if you have more questions. I fully support the physics -> DS transition! The real question is: what experiment?!? I'll show you mine if you show me yours! (link) (link) Remember me when I come asking for a job!!!

Hi, thanks so much for your reply. This is exactly the kind of advice I was looking for! I have a couple of questions, if that's okay:

My understanding (please correct me if I'm wrong) is that Jupyter notebooks are used for interactive analysis on a locally stored dataset (say, on my laptop). My problem here is that the datasets I process are pretty big even after skimming (several hundred GB), so they're stored on a server somewhere, and in order to analyse them I have to run jobs on an HPC cluster. It seems like Jupyter notebooks aren't really suited to this kind of working style. I guess I could test out code locally using Jupyter on a small subset of my data, then, once I'm happy with it, run it on the cluster. Is that what you're suggesting?
Or am I completely misunderstanding? I can't be the only particle physicist who has to deal with large amounts of data!

One of the things ROOT is really good at is accessing large amounts of out-of-memory data. When you're dealing with hundreds of GB of data, this is essential, because there's no way to hold all that data in memory at the same time. So with ROOT I can, for example, plot some quantity from my TChain entry by entry as my code loops over the entries, without having to load the entire TChain into memory. My understanding (again, please correct me if I'm wrong) is that this is not possible using Python and pandas, and I'd instead have to load the entire TChain into memory, loop over the entries, and store everything I want to plot in a pandas DataFrame; then, once the loop is finished, I can plot it. How do you avoid running out of memory? Is it simply a case of splitting up your analysis into a sufficiently large number of jobs, such that each job uses only a small amount of memory?

(1) Yeah, Jupyter isn't so good for large-scale analysis. You could definitely use it to test your code before productionizing it on a large dataset. However, the way I used it was to first run the C++ reconstruction tasks on the huge datasets (usually parallelized across a big cluster) and have that output much smaller ROOT files containing only TTrees or ntuples. Then I would transfer those ROOT files to my local machine and do the actual physics analysis in a notebook.

(2) Yeah, pandas isn't as good at incremental memory access as ROOT. What I mentioned above in (1) is what I would suggest: run the basic reconstruction in C++ and store the actual variables you're interested in in a TTree, then load that TTree into a notebook. I regularly ran C++ jobs over ~1 TB of data and condensed the useful bits all down into ~10 MB of ROOT files.

You can use PyROOT. You get the ROOT I/O advantages and you can use almost any Python packages you want. There are many nice things about using PyROOT over normal ROOT, though
there are a few quirks (be very careful about scoping). You generally want to complete any looping over momentum vectors before you load into pandas, because that is not a really well supported use case. Histogramming is also not super well supported in matplotlib, so if you're only plotting, you should make the histograms in ROOT and then extract the values. There should be lots of people around who can show you how to run PyROOT on the grid instead of your usual executables.

I'm already using PyROOT (having switched from regular ROOT), so that's not the issue. I guess my main problem is this:

"You generally want to complete any looping over momentum vectors before you load into pandas, because that is not a really well supported use case. Histogramming is also not super well supported in matplotlib, so if you're only plotting, you should make the histograms in ROOT."

My analyses seem mostly to consist of looping over millions of events, doing calculations (including lots of specialist particle physics stuff like relativistic four-vectors), and then plotting things each loop. This is fine in (Py)ROOT but not really possible using pandas and matplotlib. So I've transitioned from ROOT to PyROOT, which is definitely a positive step, but I feel trapped in a situation where I'm effectively doing a regular ROOT analysis, just written in Python, rather than using more marketable tools like pandas, matplotlib, Jupyter, etc.

u/n3utrino mentioned he "regularly ran C++ jobs over ~1 TB of data and condensed the useful bits all down into ~10 MB of ROOT files". I'm more like 10 TB -> 300 GB, which, ignoring all the other problems, is still much too large to do an interactive Jupyter analysis on my laptop. Either my analyses just aren't suited to small file sizes, or I need to get better at throwing out useless data!
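For readers following the memory question in this thread, here is a minimal sketch of the "fill as you loop" pattern done in plain Python: the histogram bins are fixed up front, events arrive in chunks, and only the bin counts stay in memory. Everything here is an assumption for illustration: `event_chunks` is a hypothetical stand-in for whatever reads your ROOT files chunk by chunk, and the Gaussian "mass" values are synthetic. It is not how ROOT or pandas do it internally.

```python
import random
from typing import Iterable, Iterator, List


def event_chunks(n_events: int, chunk_size: int, seed: int = 0) -> Iterator[List[float]]:
    """Hypothetical stand-in for a chunked file reader.

    Yields lists of synthetic 'mass' values so that only one chunk
    is held in memory at any time.
    """
    rng = random.Random(seed)
    remaining = n_events
    while remaining > 0:
        size = min(chunk_size, remaining)
        yield [rng.gauss(91.0, 5.0) for _ in range(size)]
        remaining -= size


def fill_histogram(chunks: Iterable[Iterable[float]],
                   n_bins: int, lo: float, hi: float) -> List[int]:
    """Accumulate fixed-width bin counts over a stream of chunks.

    Values outside [lo, hi) are dropped. Memory use depends on n_bins
    and the chunk size, not on the total number of events.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for chunk in chunks:
        for value in chunk:
            if lo <= value < hi:
                # Guard against floating-point rounding at the upper edge.
                index = min(int((value - lo) / width), n_bins - 1)
                counts[index] += 1
    return counts


# 100,000 synthetic events processed 10,000 at a time.
counts = fill_histogram(event_chunks(100_000, 10_000), n_bins=40, lo=71.0, hi=111.0)
print(len(counts), sum(counts))
```

The resulting list of counts is tiny, so it can be written to disk on the cluster and plotted later in a notebook. With numpy available, the inner loop could be replaced by calling `numpy.histogram` on each chunk with the same `bins` and `range` and summing the returned counts.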
Without knowing what your analysis is, it's really hard to judge, but look at it this way: matplotlib is for making plots. It's hard to imagine that your plot-making step requires 300 GB of input. You are producing histograms and then you're plotting them. If producing them and plotting them are done in the same step, you should seriously think about refactoring your analysis code. After you make your histograms, nothing stops you from reading them into a notebook and doing the actual plotting in matplotlib.

I honestly don't think pandas is a super good fit for particle physics analysis, but I think you could use it if you really wanted to. It's again hard to imagine needing a few kilobytes per event at the final stage of your analysis. That would mean you're putting ~100M events into your plots. If that's your use case, then pandas is not going to work for you.

I would dig into machine learning. Try to use some deep neural network for nonlinear cut optimization or jet analysis. Take a look at these papers:

Enhanced Higgs to τ+τ− Searches with Deep Learning
Parameterized Machine Learning for High-Energy Physics
A Convolutional Neural Network Neutrino Event Classifier
Classifiers for Centrality Determination in Proton-Nucleus and Nucleus-Nucleus Collisions
Stacking Machine Learning Classifiers to Identify Higgs Bosons at the LHC
Decorrelated Jet Substructure Tagging Using Adversarial Neural Networks
Jet Constituents for Deep Neural Network Based Top Quark Tagging
Efficient Antihydrogen Detection in Antimatter Physics by Deep Learning

So it essentially boils down to your field of interest. If you are into heavy flavour, dig into cut optimization in order to reduce background. If you are on the jet side, read the papers above and try doing something like that. If you are in detectors, I've heard that LHCb is doing some detector readouts using advanced ML techniques. My suggestion would also be to read some ML theory. For starters, I took Andrew Ng's full CS229 from Stanford, and at this moment
I'm reading the Deep Learning book by Goodfellow, Bengio, and Courville. Hope this helps, but take it with a grain of salt, as I'm still an undergraduate who has a lot to learn.

Put a website together that displays your research in an appealing way.

I'm a data science beginner with a software engineering background who's just trying to learn some of the basic data science skills. The Johns Hopkins specialization on Coursera seems like a great place to
start, but I have concerns about the fact that it focuses on R instead of Python, which is a language that I'm already familiar with. I don't want this to turn into a Python vs. R flame war. My question focuses more on the content of the courses. Do the Coursera courses do a good enough job teaching transferable concepts and skills, as opposed to just teaching you how to use a particular language? For those who have taken it, do you feel like you learned more about data science, or just more about R?

I'm a data scientist and I've done the Coursera certification. My opinion is that it is definitely not enough to be a data scientist. It's way too simplistic. It's enough for someone from a different field to become a data analyst (e.g. someone underneath a data scientist), but not enough for you to become a data scientist (e.g. machine learning and prediction model design).

Hoping for a few hundred hours of online coursework to make you a data scientist isn't a realistic goal anyway. FWIW, I have taken a few of the classes. I also went in proficient in Python and mostly ignorant of R, and I've enjoyed learning a new language. I'm even growing to like it, and IMO it still has some strong advantages over Python for working with data.

Agreed. If you treat it as the only training you need to become a data scientist, you will be disappointed. However, if you take it for what it is, a decent foray into data analysis, you can get a lot out of it.

I'm currently doing a BSc in maths and stats and have started their data science track as an introduction to the field, but what would you recommend after this?

Okay, follow-up question for you: say I have two engineering degrees that included classes like stats, probability, regression, DOE, and even quality control. Despite not using advanced stats in a while, would the Coursera certificate plus some additional resources get me to the point where I can start doing prediction modeling?
I have done some of their R courses in the past and recently did their stats course to refresh my basic stats knowledge. It's not like you just learn how to do things in R and that's that. When I did the stats course, they covered various probability distributions and showed how to use that knowledge in R (e.g. get quantiles from a particular distribution). If you're not familiar with statistics, it's a good way to get introduced to it. Ideally you would learn how to implement it in Python, but I can't imagine that being a particularly big hurdle. I am sure the other topics are the same.

I found the courses very helpful and was able to get a new job partly because I took the courses and applied what I learned to my work. Fortunately, I was in a great position to do that. If you're looking to get into data science, the Coursera courses are a great start. I would also take Andrew Ng's course on machine learning. That being said, you'll probably need more than just these courses. There are many entry points into data science that don't require a PhD. Everyone needs data wrangling. Everyone needs programmers who understand basic statistics. Take the courses and continue expanding your knowledge, then look for entry points into the field that you qualify for.

You will probably need both R and Python during your career as a data scientist. It teaches a mix of statistics and a program, so R with statistics. For these classes, most of the homework and classwork gets done in Python by small groups of students who host the scripts on GitHub.

Using R is very intuitive for getting started in data science, in my opinion. I used it a lot last year at work, and now I am taking the skills and concepts I learnt and converting the code to Python and pandas so it is more maintainable by my other team members, and I don't regret starting with R at all.

In my experience, data analysis and data science as I learned them at uni and in MOOCs are very different from what I practice. It very much depends on how data-driven your
organization is. I myself find business acumen and presentation skills equally as important as, if not more important than, analytical know-how.

What do you guys say about these reviews from class-central.com? https://www.class-central.com/mooc/1713/coursera-r-programming These classes over at Johns Hopkins have low ratings, between 2 and 3 stars each. Would those who have taken them agree with the reviews?

I didn't see your post when I was writing mine (link). If you look at this, you can find resources for both R and Python. Your question is good and it would help me as a beginner as well. Will wait for some Coursera alumni to answer this. Good luck :)

Good luck.

Of course, different companies need different types of data scientists with different skill sets, but could you recommend some sample projects (that could be done in a reasonable time) that highlight different aspects of data science or differing skill sets (e.g. gathering and cleaning data, statistical analyses, predictive modeling)?

Here's what you should do:

1. Forget about boiling the ocean.
2. Take a hyper-targeted "magnifying glass fire starting" approach.
3. Find 5-10 data science jobs you'd take if offered to you.
4. Figure out what skill sets and job responsibilities you would have. Find the common ones (NLP, recommendation, classification, etc.).
5. Figure out the tool sets that the jobs require. Find the common tools (R, Python, Scala, Hadoop, etc.). If it's not a tool or tools you know, then learn them as part of the portfolio work.
6. Do 3 projects that cover the common job responsibilities for the jobs you are interested in, using the common tools for those jobs.
7. Do a structured writeup for each of the three projects.

This will force you to construct a portfolio that will appeal directly to the exact data science jobs that you want and would actually take.

Seems like a well-structured approach, thanks! Any ideas for specific projects?
Not some simple analysis, but something that would make employers go "hrm".

NP. I've tried to boil the ocean many times and I finally stopped doing it :) As for a specific project, it really depends on the job postings that you are looking at. I wrote my thoughts down here a few days ago (http://www.getadatasciencejob.com/advice/how-to-choose-a-data-science-project-for-your-data-science-portfolio) if you want to check out a longer version. The shorter version is the following:

1. Find a job posting.
2. From that job posting, extract the programming languages they want.
3. From that job posting, extract the data they use.
4. From that job posting, extract the things they care about (visualization, modeling, prediction, classification, dimensionality reduction, regression, clustering, etc.).
5. From that job posting, see what other words pop up (in the article linked, the job posting talks about NOAA weather data).
6. Put together a project that uses the data from #2, #3, #4, and #5 above.

In the link above, I looked at a Capital One data science job posting. They mentioned NOAA weather data. They mentioned financial transaction data. They mentioned using APIs and external data sources. They mentioned R, Python, and Java. They mentioned predictive analytics. So one project that I suggested was using NOAA weather data in combination with a proxy for financial transaction data to do some predictive modeling and data exploration. The proxy for financial transaction data is the hashtag "#bought" on Twitter. So you could do some API work with Twitter, some data munging work with NOAA weather data, and then do some modeling. Does that help?

Awesome, thanks, that's exactly what I was looking for! Of course, I wish others had posted some examples like you did (that was the point of my OP), but your example helps me to frame things in a way that should help me come up with another example or two on my own. Thanks again!

You could build a scraper in Python and host it on Heroku through an API in Flask?
You can do whatever predictive stuff you want with that.

Ah, cool. I do most work in Django, but I've seen Flask come up once or twice. One Django project already gave me difficulties when I pushed it to Heroku, so I'll definitely give Flask a shot. Do you know, is Flask a good working environment for Python? Does it have any limitations, say, compared to Django?

The only reason I mentioned Heroku is because you don't have to worry about web server stuff. If you know this stuff, maybe you'd rather use DigitalOcean or something. Django is batteries included; Flask is not. This means that things like users, logins, and forms are plugins, not part of the bare-bones framework. On the bright side, Flask is very minimal, which makes getting Python code on the web quite easy, i.e. https://flask-restful.readthedocs.org/en/0.3.1/quickstart.html#a-minimal-api

Flask seems to be the de facto back-end web framework for internal data science products. LinkedIn is a company that I know uses it. There aren't any (touch wood) data science tutorials that use Flask that I've seen, apart from Russell Jurney's book, but it's easy to apply data science to it. I have found making applications in Flask really fun, and I hope you enjoy it.

Thanks, that was a really informative and greatly motivating response ;) Thanks!

Best of luck! And don't feel left out because you aren't using framework X or language Y. A lot of "data scientists" are extremely elitist.

Write a supervised (email) spam classifier. This is simple enough that it is often asked during interviews.

Background: I have a bachelor's in math and a minor in physics.
Situation: I have a job as a software engineer and "data scientist".
Issue: I want to be a real data scientist (what does that even mean?).
I'm working on learning Spark right now. I have to learn some machine learning and natural language processing, recap some basic stats concepts, and probably a lot more that I don't know about. My background is strong, and my company is willing to expense an online course or set of courses for me. What's the best place for someone like me to start? I'm looking at The Data Scientist's Toolbox (Coursera), but I'm sure there are better programs. I would like to learn to use Spark in the process, because we are currently transitioning to it. Any advice would be of great help. Thank you!

Nobody really knows what "real data science" is. Don't worry about it. Spark is a great place to start. Definitely learn about Python or Scala. A lot of the Coursera-type content is a bit basic if you're already a developer. If you're comfortable learning on the run, playing with Kaggle data is a fun place to start.

That Coursera specialization is great for introducing a new data analyst to R, but for your purposes it will not really achieve any of your goals. For machine learning, I suggest Andrew Ng's Machine Learning course on Coursera to get you started. It's basic, but I'm taking it and it seems like fantastic "first steps" into ML. You will also need to know either Scala or Python if your team is looking to transition to Spark. If you don't have much prior experience in Java, then Python will probably be a better fit and easier to learn for your background. As far as an intro to Spark goes, it's a relatively new tech, so I don't know of any great online courses (although I'm sure they exist). The book "Learning Spark", published by O'Reilly, is a gentle introduction to Spark and distributed computing, so I suggest that for getting you started if you're new to both. In terms of brushing up on stats, "All of Statistics" might be good for the basics and the other stats you'll need to know for machine learning.

All of Statistics: http://www.amazon.co.uk/all-statistics-statistical-inference-springer/dp/0387402721

I thought you were joking for a sec, but there it is.

Be aware that you can get the PDF for free if you are interested: http://read.pudn.com/downloads158/ebook/702714/larry%20wasserman_all%20of%20statistics.pdf

Ace, cheers.

Udacity also has some really good courses that might be a bit more practical and hands-on, suited to a developer's background.

Hi all. I'm not looking to break into the data science profession, but rather want to gain some of these skills to supplement what I do. I work in product development for a small insurance company and want to help us make more data-driven decisions (based on needs, sentiment analysis, claims data mining, customer clustering, etc.) as they relate to products. I know this is a bit vague, but are there any courses, professional exams, etc. you'd recommend? I'm currently reading Data Smart by John Foreman and really enjoying it. Not to focus on the actual tool so much, but I currently just use Excel with Solver, and Access. I'm certainly open to learning new tools also.

http://www-bcf.usc.edu/~gareth/isl/islr%20fourth%20printing.pdf (opens a large PDF)

This book gets recommended very often as a starter in machine learning (data science). It gives really good intuition about the methods but still gives sort of enough of the math behind them. All the code is in R, and that's probably the best language for you to learn for data analysis. The other choice would be Python. The book also has an introduction to R. Excel will not be useful in these kinds of analyses.

Downloaded, thank you!
Agreed re: Excel. It's mostly useless for this stuff, but the Solver add-in allows for some optimization calculations I've found useful.

Solver is a great tool, especially for linear programming problems. Pivot tables are also really nice for some stuff I'm doing. Transferring your data between R, Python, and Excel can sometimes be a bit annoying, but it's worth it once you start getting results from R or Python. Also, anybody can start using Excel almost immediately, but programming languages usually take a little bit of time to get used to.

If you like videos, the authors of the book have 15 hours of videos from their Stanford online course, which uses this book as the course book, and all the videos are on YouTube. The teaching is top notch: http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos

They also have the "bigger brother" of ISLR, which covers topics in more depth and with more rigorous math, plus some extra topics (e.g. undirected graphical models and neural networks). The Elements of Statistical Learning is also almost double the size of ISLR: http://www-stat.stanford.edu/~tibs/elemstatlearn/download.html (even bigger PDF). But just by reading ISLR you'll be able to fully utilise and understand all the covered methods.

One more related question: according to my research online, R is the best for data mining and analysis for a variety of reasons, but it comes with a steeper learning curve than Python, which, although it doesn't excel as much at data science, obviously has broader utility in terms of automating tasks (which could benefit me in other areas). In other words, would Python be "good enough"?
i feel like it will be more useful for me in the long term as i'm not trying to get an actual data scientist role r is a functional programming language specifically made for data analysis (statistics) it's also often referred as "language from statisticians to statisticians" python is general purpose programming language there isn't a big differences in their usability (personal opinion) for me python excels in preprocessing data python also allows very easy scripting (automation) as you said python has really solid stack on machine learning and statistics libraries so you wouldn't be missing from them also and being general purpose language python has a lot more support for other programming areas for me the python is the language to learn if you want to become data scientist vs r which for me is the language to choose if you just want to do data analysis r excels in all analysis and closer to statistics your analysis are the easier it gets (compared to other languages) one big problem for some people with r is that they have learned python java or other language which isn't functional programming language (object-oriented or something else) and i feel that the problems rising from not being accustomed to the paradigm often exaggerates "the steeper learning curve" you seem to be just hopping into data science so it doesn't really matter which you choose for your first one (r vs python) if you learn the theory behind the models it doesn't take too long to jump into other language i personally use r python java scala and perl at work to summarise: at this point it doesn't really matter which one you choose more important is to get hands dirty with data and then see what kind of problems are arising and then make decision if this language is better or should i change also for the beginner it's really important to have good materials and in that regard r beats python by a quite large margin unfortunately my suggestion is to go through the book with r as it is the 
language they use to teach then play around with your own datasets using r and after that try python with the same problems and see which feels more comfortable and do you need the python to automate stuff r is not able to do r is actually quite powerful scripter for all the data analysis automation after all the more i develop a data science database analysis skillset the more curious i am about how to give back to the community (or other organizations) what opportunities to volunteer your time effort have you found? what sorts of organizations might be in need of those with data backgrounds? i felt like i could have used data analysis in order to help fuel a friends councilman campaign i did see an article or analysis somewhere online where he tore apart the communities crime data salary data financial data etc and as a hungry recent grad i offered to help him do it for free and maybe bring some information to light and offer the community new insight into what is going on statistically in the area he declined and stated he didn't see the real value in it this felt pretty absurd for him to turn down tbh and he is a younger more open minded guy; maybe i didn't advertise myself properly either way who would turn down a data analyst doing free analysis? 
lol either way i plan on doing something small in the near future for shits and giggles either way in this space (being a small local community) i felt like i had to find a problem pitch an idea and then advertise it and myself to people who would use it since i am younger and still trying to save up money and i am more financially motivated at this time in my life i felt like with the same effort i should just try and consult instead if you are not financially motivated since you want to volunteer i would try and get city data do some analysis maybe offer some excel solutions for automation something like that to the municipal center police administrative services people or public schools welcome to r datascience a place to discuss data data science becoming a data scientist data munging and more!
i've recently become interested in data science but have discovered that as a relatively new discipline there are few degree programs specifically for it a local university has a program in bioinformatics which sounds like data science but geared toward genetics here is a glance at the curriculum! for a little background i started undergrad as an engineer ended up majoring in humanities but now want to go back for something a little more jobby (and makes use of the interest i already have in technology) thanks for any input! i think the curriculum lacks enough machine learning exposure to really sculpt you into a data scientist have you considered a degree in machine learning? it might be tough however if you lack enough cs background why did you switch from engineering? be honest with yourself don't fall into the trap of more education (debt?) because you think the 2nd time around the math will somehow get easier good luck! thanks for the input! out of curiosity what specific mathematical tools does a data scientist use? if i knew i might be able to examine that branch and determine if in fact i'm barking up the wrong tree edit: also if you had to name three critical cs courses toward becoming a data scientist what would they be? (you can make them up just trying to get a specific idea of where the specialization is) so dont learn how to code or scale up models just be the "data science' guy and rake in all da prophetz!?
i fight this same garbage with customers and non-programmers who've built a tableau vis at a certain point (usually very soon if you're not doing trivial work) you need to know how the systems work not just how to use them lol i'm a data analyst and do some predictive modeling i know vba some r python etc but am by no means a data scientist all the 'data science' jobs that require modeling skills want people who are good coders in addition to being a true scientist guess i'll always be a gopher since i'm no math phd but these articles are rubbish hi i put a post up a week or so ago about how i hire some junior data scientists - i was actually struggling because i usually hire more senior positions i got some great feedback - and thank you to everyone who commented at the time though i put up a comment saying that i felt that this subreddit and others like the ml one while great at covering some of the areas in data science left gaps in other areas that really matter in real world scenarios i said i would write something about it i wrote an obscenely long post about it and then it didn't post properly (operator error) so rather than re-type that essay i thought i would do something at a higher level and then answer questions let's set some context first i work in private industry - a big(ish) uk financial services company i do a mix of internal r&d type work - stuff our own teams ask for - and stuff that clients ask for so - everything that follows is in that context - it is a bit different if you're working for a start up company it is very different if you're working in academia it's very different if you're working for government agencies keep that in mind i think this forum is awesome - i lurk every day however there is stuff that makes up the majority of my life and my guys' lives that doesn't get discussed here which - as there are so many people posting about moving into this world and looking for jobs in this world - i think is an issue here are some things which i think
need to be discussed here more also - if you can show me this stuff on a cv or in an interview it will jump you straight to the top of the pile 1) you are ridiculously expensive - show me how you will add value there is a team in every private company that all other departments fear and dread they are called "finance" and they are the bane of every managers life they apply basic mathematics in bizarre ways and they will constantly demand that managers either spend more or less money than they are the managers will never win when it comes to head count it boils down to profit margin lets say i am recruiting for a senior data scientist and will pay then $100 000 ( really - thats a bit on the low side but it makes a simple calculation) lets say my company runs at a 20% profit margin in the world of finance this means that that person needs to add $500 000 of value- not $100 000 - before they break even you may think this is crazy - but that is because you are a mere mortal and do not know finance maths you don't have to agree with it - you just have to live with it what does that mean for you the data guy? you need to get stuff done you probably aren't going to be getting your own sales leads and doing your own deals - but you need to add value and that really means being pragmatic some work needs to be absolutely perfect these are the places where you spend the extra week tweaking your model for that last 1% of accuracy it's where you are expected to go read papers to find a new clustering algorithm that will reduce the over-fit by 5% and you get a month to try it and deal with it but - a lot of stuff doesn't need perfection if you need to join two sets of data as a one off task then it doesn't matter if you use sas a lump of perl bookmarks in textpad excel python no one cares - you just need to get it done if you need to know whether two elements of data correlate then often a basic regression is "good enough" and will save you a couple of hours what does this mean? 
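The finance arithmetic in point 1 can be written down in a few lines. This is only a sketch: it assumes "value" means revenue and that break-even is simply cost divided by profit margin, which is how the $100 000 salary and 20% margin above turn into $500 000. The function name and the whole-number percentage argument are illustrative choices, not anything from the post.

```python
def break_even_value(annual_cost: int, margin_pct: int) -> float:
    """Revenue needed so that profit at margin_pct covers annual_cost.

    Assumption: break-even is cost / margin, as in the post's example.
    """
    if margin_pct <= 0:
        raise ValueError("margin_pct must be positive")
    # Work in whole-number percent to keep the arithmetic exact.
    return annual_cost * 100 / margin_pct


# The figures quoted in the post: $100 000 salary, 20% profit margin.
required = break_even_value(100_000, 20)
print(f"value needed to break even: ${required:,.0f}")
```

With the post's numbers this prints a break-even value of $500,000. Halving the margin doubles the target, which is why head count is such a hard argument to win.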
you'll know which hat you need to wear - but when you're wearing your "just get it done" hat - which will be more often than your "get it perfect" hat - you need to a toolbag full of quick work arounds and practical methods if something takes 100 lines of sas 10 lines of python or 2 lines of perl don't go the sas route if you need to eyeball and juggle 10 000 records then you could drop it out as a set of tables with r or you could do it in excel i know it's not cool - but finance don't care - so your manager doesn't - so you don't get good at this stuff be pragmatic know when to have a "good enough" mentality and show it 2) learn to deal with junk real world data is usually rubbish you need to be really good at dealing with rubbish examples - i have about 2 petabytes of data coming from about 8 000 sources the absolute best raw data set has a 2% error rate the worst has a 75% error rate those figures are better than a lot of other groups are dealing with you don't get to complain or get someone else to clean it up - you need to be good at adapting to it really really really good that data comes in to me in perhaps 1500 schemas and formats no provider - ever - sticks to a schema ever ever! 
so i need to be able to join data that arrived in ebcdic to stuff that turns up in weirdly compressed avro (tip here - learn to love csv - it's a perfect intermediate - as is an sqlite table) looking across my data sets i can see a minimum of 21 different data structures for date:time whatever you're going to do you're going to use dates and times so - that's something you need to be slick with remember point 1) - this is "get it done" stuff also - a lot of data science is speculative - you're going to have 10 ideas for every 1 actual piece of solid work you do for those ideas you're usually going to need to crash a data sample together give it an eyeballing patch it up a bit do some basic work and see if it's practical that means 9 out of 10 of those tasks you do will be disposable - so just get it done all of this is probably best described as "data monkeying" - you're not doing science - you're monkeying with data realistically over the course of a year you will probably spend 50% of your time doing data monkey work rather than real data science what does that mean?
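As a concrete instance of the date:time problem described above, here is a minimal "get it done" sketch in Python that tries a list of known formats in turn. The three formats are hypothetical examples, not the 21 structures the poster mentions, and the day-first reading of slash dates is an assumption that would need checking against each source.

```python
from datetime import datetime
from typing import Optional

# Hypothetical formats for illustration; a real feed needs its own list.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",  # 2015-03-01 14:30:00
    "%d/%m/%Y %H:%M",     # 01/03/2015 14:30 (assumes day comes first)
    "%Y%m%dT%H%M%S",      # 20150301T143000
)


def parse_timestamp(raw: str, formats=CANDIDATE_FORMATS) -> Optional[datetime]:
    """Return the first successful parse, or None if no format matches."""
    text = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format, try the next one
    return None


samples = ["2015-03-01 14:30:00", " 01/03/2015 14:30", "20150301T143000", "not a date"]
parsed = [parse_timestamp(s) for s in samples]
```

Returning None instead of raising keeps the bad rows visible, so they can be counted and eyeballed rather than silently dropped.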
when i recruit a data scientist they absolutely completely and totally must be damn good data monkeys i'm counting on you being able to do the data monkeying in 50% of your day not 90% of your day so that the other 50% of your day you can do the "data science" bit and actually add value - cos the finance team are watching it's not cool you don't get a conference speech out of it and it doesn't get you a bonus but unless you are dealing with a single source of data a good deal of your life is going to be spent dealing with this mess you need to 1) get good at it and 2) not take too long dealing with it if i had god like powers over this sub i would make it so that 50% or more of the posts are people trading tips cookbooks idea's and lots of practice data sets so they are getting good at data monkeying rather than data science definitely less cool - but will make the biggest impact to your working lives some examples of data monkeying: flicking between data structures and schemas recasting data parsing data changing time series - compressing and interpolation of time events -spliting data joining data dealing with common types of tricky data - like names address structures dates time series blah blah blah fastest way to get your cv to the top of the pile - make sure that i can see your data monkeying as well as your data science skills 3) learn to tell a story and not be scary your going to work with all sorts of people - sales it operations and lots of managers and you will intimidate every single one of them whether you are or not actually scary when you walk into a room they will automatically assume that you are the brightest person in that room and that your going to baffle them some people - a minority - will try and get close to you and learn from you the vast majority will react to their intimidation by either not listening to you at all ( many managers ) or feeling annoyed by you ( most sales people) it's not anyone's fault - it's just human nature if you 
break out the big words the jargon the acronyms and present them with a 19 page excel spreadsheet you do nothing but reinforce those pre-conceptions downside for you is that it's harder to rapidly climb the career ladder downside for your boss is that it's harder for you to show 5x or 20x your salary as value - which means more discussions with finance (shudder) two easy fixes and one sneaky fix: fix 1 - learn to tell a story seriously - when you tell people about your work give it a beginning a middle and an end "i was asked x i did a b c and d it looks like the answer is y" you might not need to do this for people who read this sub but this is humanising you another thing - put it in context i e "a client has x as a problem i did a b c and d it looks like the answer is y because it helps the client due to blah blah blah " fix 2 - present in the right way for the audience some people can deal with lots of data some people insist on it some people are intimidated by it some people genuinely see it as you trying to hide behind a snow of nonsense for example - if you're doing something for a finance group or a bunch of actuaries - you need the 19 page spreadsheet and you'd better be damn sure every single cell is correct if you were presenting to a senior sales manager then you want a few pages of powerpoint with big diagrams and a few bullets per page maximum that's not because the sales guy is less clever - it's just what they need to consume information you don't need to be a graphic designer - but you do need an acceptable grasp of displaying data read flowingdata read blogs practice learn to make an acceptable spreadsheet learn to make an acceptable powerpoint play with matplotlib sas-graph plotly again - you don't need to be amazing - you don't need to be a master data visualisation expert - "good enough" - but that still needs practice sneaky fix: remember how you intimidate people because they think you're a genius?
ask them a question about something they know - "what do you think the client will do with this" or "how will hr use this data to plan the company party?" give them a set of options for something even if you make them up doesn't matter what it is - just ask one so they can contribute practice doing it subtly i think i'm going to run out of words soon - more in the next comment 4) show adaptability and continuous learning - but also balance this isn't really something you can change - it's either in your nature or it's not pretty much everyone who applies for any form of data science job can be banded into one of two categories you have the enthusiasts and the 9-5ers being incredibly sweeping - the 9-5ers are usually people with a stats degree they have 5 or 6 methodologies that they are comfortable with and which they are really good with they take a cook book approach to everything - they will take the same steps with the same methods on pretty much any project be that a re-modelling of an actuarial table or processing of twitter data these people will be aware of the world changing but will either be neutral or mildly negative about it the enthusiasts are the people who are self learning self motivated and think that playing with data is awesome rather than just a way to pay the bills these people want all the latest toys want to use all the latest methods and always want to learn the reality of the world is that the world is changing so fast that if your nose isn't bleeding you don't fully understand it this is just a phase - it will settle down in a couple of years when "data science" drops off the hype curve a lot of the methods and technologies that look amazing right now will go back to being little niche things but we'll be left with some common standards ( incidentally - my bets are: spark will over-take hadoop for data science - hadoop will become a pretty standard it platform kafka and hive will become de-facto standards r will over-take sas - but you'll
still need sas on your cv to get a job and python will displace perl as the data science swiss army knife - although i'm not sure that's a great thing) different teams will consider one of these groups "good" and the other "bad" the 9-5er is not going to fit well in a start-up or a telco or anywhere where there is a competitive demand to get good with data fast equally - the enthusiast is never going to fit very well into something like a bank's mortgage analytics team or a basel group - you will scare the shit out of them and annoy everyone around you you either won't be taken on in the first place or you'll be pushed out pretty quick be self aware enough about what you are and show that to recruiters - if you are an enthusiast - show it if you're a 9-5er - show it there's a middle ground - for example - there is more wage security in a bank than in a start-up and probably a better salary if you're an enthusiast but you have a young family you maybe would be better at the bank - but you may also need to tone down your cv and hold your tongue a good deal in the office that's ok - but again - be aware of it i hire enthusiasts but finding enthusiasts is hard - they're obviously out there but it's tricky to get through on the cv (incidentally i don't actually care very much about your education at all - one of the two best hires i have ever made is a 17 year old drop out who was entirely self taught the other has 2 phds in quantum physics - they are equally as good at both data science and data monkeying and both are massive data geeks i do care about what you have done though - past positions hobbies interests all of those i weight equally ) if you're trying to get into an industry that hires enthusiasts - show enthusiasm coursera courses open source world data monkeying for open journalism groups blog posts personal projects analysing fitbit data finding errors in national statistics data systems which predict the colour of the next train for the local train spotting
society prove your an enthusiast - i don't think it matters now 5) understand - don't just learn - some computer science and some physics a lot of what you want to do is extract interesting signals from noisy data sets and there are lots of cool ways of doing this - discussed on this forum and on the machine learning forum and there are a lot of dull ways of doing things and you can go onto stackoverflow and find code fragments to do these things thats what about 90% of people in this world do thats because they don't have a truly deep understanding of what they do - they're following the instructions putting together ikea furniture lets take error as a simple example error will be in every data set you ever use so you need to be good at dealing with it but what is "error"? there is random error there is systematic error there are ways of detecting the two and separating them you can make the systematic error go away ( btw - if you can see a way of doing this it almost always will save someone some money - so a client will pay for it - so tell people) but you can't reduce random error that means the dataset has a noise floor - so maybe think of it in nyquist terms when you do that it firstly gives you some new tools to use - secondly it gives you a whole new way of looking at how you take samples third it tells you now not to take samples from that specific data set and fourth you can now compare different sets of data with a new set of metrics - error rates and noise floors - and maybe get more value out of them for find a reason for the difference - because usually when things are different a sales guy can make a sale out of something none of that is either hard or needs special qualifications or magic powers to do - it's just about thinking things through a 1 2 step further than most other people and asking "why" more pretty much all forms of physics and engineering and big swathes of biology chemistry computer science etc is about extracting signals from noise 
- you can flick between them however it suits you when you have a basic understanding of the fundamentals you don't need to be an expert - but a grounding in things like error ( go look at mandelbrot's papers from bell) linear and non-linear systems (you usually can't do data science of any value on a non-linear system even if you think you can) the limitations of your basic tools of the trade ( for example - most regressions are very poor at working with rare events - so don't use them for modelling rare events like whatever tyres exploding or vending machines falling on people - they'll give you an answer but it'll be meaningless) get a grip of the noodly stuff around huge data sets - like benford birthday problems littlewood's law etc - all the stuff which will catch you out if you assume too much or just use cookie cutter methods 99% of the time having this foundation of understanding doesn't matter a jot 1% of the time it either helps you make a big jump or saves you from screwing up r will over-take sas - but you'll still need sas on your cv to get a job and python will displace perl as the data science swiss army knife - although i'm not sure that's a great thing although i find your post very sincere and of great quality this bit amused me =) this shows that data science is indeed a vast world i would never have assumed many data scientists use sas or perl for most data scientists the debate would rather be which of python or julia will replace r i guess there are some industry specific trends as well it is good to know thanks for the great post =) sas is very common in government finance and healthcare much less so at tech companies and other 'enthusiast' places yes i guess it depends on the sector i work in a tech company and nobody works with sas it's all about r or python but it's good to know it's not like this everywhere yeah any sector that existed before the nineties is hopelessly outdated i work with mainframe cobol sas even microsoft biztalk and
sharepoint for f sake on a daily basis and they won't change it thing is design is ugly but the functionality is still there silly to throw money at a new system for a sexier interface when the old one still works yes it makes sense it just makes it harder for employees to jump to a new sector by the way if your company mostly uses outdated technologies how do they manage to hire junior profiles? few of them would have an idea of what sas spss cobol or fotran are i am a junior myself short story: they only vaguely mentioned it during the hiring "you will be trained in our technologies which are industry standard " so i was oh yeah cool little did i know they just sent us to some training consultancy center for a couple of weeks i came to appreciate the old robust steady environment though although the occasional python script that comes along is really fun :) aha so they have built a whole strategy to trick young people into using sas good to know ;-) especially cobol damn i hate cobol i don't want to do data mining in cobol but it's the only way to get my data from the mainframe in a performant way (i know there must be something better but try getting access rights ) sas is a welcoming environment compared to cobol the brutal truth is that mainframes are going nowhere - at least for the next 20 years no matter how old and crusty and creaky you think they are those things work and the iops on them are insane - a baby mainframe will beat the shit out of damn big hadoop cluster all day long for many data intensive tasks - and you can literally smash them with a sledgehammer and they keep working remember that there is only one metric which a cio is scared of - "has the company got good uptime" - and mainframes do as for working with cobol - you're not actually going to be working with cobol what ever data you want you're going to write some jcl to drop it out to vsam and then process that outside of the mainframe jcl in a batch is a very cheap process so just go 
pull every thing you think you might need if you need more - grow the jcl job - don't mess about in the mainframe itself also you know it's probably good to use sas in healthcare and finance fda and irs aren't too fond of open source and using third party packages also it's probably good for the companies in those sectors b c sas will take full responsibility and cya if things go bad you're talking specifically for the us though in the rest of the world people mostly use what is best without being regulated sas is dying in our company in the sense that only currently existing applications use sas newer modules are written in whatever is best and use sockets to a central orchestrator in a common dataformat (which also takes care of the ascii ebcdic nightmare mainframe users encounter) perl is blazing fast for cleaning up string data a perl one-liner or two have saved me hours in r python gives you more flexibility than r but its packages are less well documented and it isn't as fast as perl for cleaning up filthy text data totally agree on this that's the pragmatism op talked about: use the tool that will do the job whether it is sexy or not on a personal note i no longer use perl although i have a decent knowledge of it i think that on the long term awk or sed will replace it as perl is not often taught nowadays and unix based systems gain more users i'm one of those young people who are comfortable with sed awk and python but don't know a shred of perl should i bother with perl at all? 
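For anyone weighing the perl one-liner against python for cleaning up filthy text, this is roughly what the python side looks like using only the standard `re` module. It is a sketch under assumptions: the cleaning rules (drop control characters, collapse whitespace) and the name `clean_line` are illustrative, not anything a poster above described.

```python
import re

# Control characters other than tab, newline and carriage return.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
# Any run of whitespace, including tabs and newlines.
_SPACES = re.compile(r"\s+")


def clean_line(raw: str) -> str:
    """Strip control characters, collapse whitespace runs, trim the ends."""
    no_control = _CONTROL.sub("", raw)
    return _SPACES.sub(" ", no_control).strip()


dirty = ["  ACME\t\tLtd.\x00  \n", "smith,\x07  john  ", ""]
cleaned = [clean_line(line) for line in dirty]
```

Compiling the patterns once at module level matters when the same rules run over millions of lines. Whether that gets close to perl's speed is something to measure on your own data rather than assume.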
i think you can skip it you will be able to do the same things with sed awk or python sometimes it might be slightly quicker in perl but that might not justify spending days learning the language perl is basically built around regex which makes it very efficient to process modify files you can sometimes do powerful file processing in one or two lines (but so does awk) it is not the most elegant language and it is hard to code scripts which are at the same time long and clean from these two links (1 2) it seems perl in on decline overall (incidently i don't actually care very much at all about your education at all - one of the two best hires i have ever made is a 17 year old drop out who was entitely self taught the other has 2 phd's in quantum physics - they are equally as good at both data science and data monkeying and both are massive data geeks i do care about what you have done though - past positions hobbies interests all of those i weight equally ) awesome thanks for being someone who gives a damn about performance rather than sometimes wildly arbitrary metrics of someone's worth to society i say this as someone who is college educated -whatever the hell that actually means - and loves learning (to the point where i'm going back to school just for fun next semester) some of the worst people i've ever worked with were "highly educated " one of my biggest "problem child" employees when i was a manager was a naval academy graduate and one of my best employees barely had highschool done incidentally the "big boss" loved the naval academy guy and hated the other guy largely because of their educations and shit on the other guy until he quit - then i was stuck babysitting a problem child with a massive ego and inflated sense of self-worth keep on doing what you're doing! 
but you'll still need sas on your cv to get a job i don't think this is true agree sas is for people who can't code total nonsense sas is exceptionally good at what it does if your doing a lot of statistics sas is massively far ahead of any competition the company is spending $1billion per year on r&d sas is easy to use badly - it's very very forgiving so you can write shit and it'll still run i have seen code which was executing in 17 days be tuned to run in under 15 seconds - that shows you how forgiving it is it's easy to write shit sas code but it's exactly as easy to write shit python or r it's hard to write good sas code but just as hard to write good r or python the big difference is that r and python is pretty much free ( although most companies will be using one of the supported version) but - 100 seats of r might cost you $30 000 where as 100 seats of sas can very easily cost you $10m or even $20m if your a start-up - thats why you don't have sas if you a tier 1 bank where money isn't an issue then it's a different scenario [deleted] yes - but only if someone else is paying for it sas is massively important in the financial services industries - banks etc no matter how much bitching and moaning happens on this thread if you're a bank where all your analyitical models are built in sas and all your data is in sas index files - your going to keep with sas the newer and or faster moving companies don't use sas very much - it's r or python the thing with banks though - they may be dull and boring (they really really are) but they are much more stable than a fly-by-night start-up might not matter to you coming out of university but dullness and job security is valuable at times - like when your starting a family [deleted] the brutal truth is that grades only really matter to get your first ever position after that it matters vastly more about whether you can do the job in front of yourself whether you stretch yourself and what your management chain and your 
clients (internal or external) think of you. After you've been working for a few years, they pretty much stop mattering entirely. So when I look at resumes I'm using qualifications as a filter rather than a selector. An MA shows me that you know book knowledge (good) and can probably explain at least some of it (more important). An MPhil shows me you can't handle independent work and self-manage. A PhD shows me you can handle an independent project, and that you will probably be a pain to manage for the first 6 months. I'll probably look at your grades, but it wouldn't massively change my opinion, although saying that, I'd probably rank someone a little higher if they fought through a tough viva or had to do it a second time. If you're worried about Belgian grades, then make sure you add a couple of lines about them and what they mean on your resume. Grades seem to matter a bit more to American recruiters than European ones, and the name of the university seems to matter more as well. However, your resume carries the sub-text of "someone thought enough about this person for them to fly 1/3 of the way around the world and do all sorts of paperwork to get them this qualification". That's worth a lot in its own right. It won't get you a job, but it might get you added to a list of phone interviews just to make sure you're not something special.

It depends on the industry. You will find that banks are pretty much a SAS-only zone. The rest of the financial services industry is usually "SAS + something else", usually "SAS + R". As you move more towards the tech sector you see SAS dropping away and R increasing.

So then you don't need SAS on your resume to get a job.

Well, it depends. If you want to stay in the tech industry all your life, you can probably live without it, but there's not a lot of stability or job security in that world. If you have a young family you might well want something like a pretty solid assurance of a pay check, at the expense of some dullness and stolidness. And the big majority of those
sorts of companies who will offer that will either expect SAS or are going to be positive, or at worst neutral, about it. SAS on a CV will never stop you getting a job. Having Julia, Python etc. on your CV can stop you getting a job (see previous comments).

But now you're saying something pretty different. I just doubted the claim that no one without SAS on their resume gets hired.

Incredibly thoughtful and generous, OP. So valuable to the community; thank you for it. But I believe you'd be well served to acknowledge the set of biases that are apparent throughout your post and comments, as evidenced in part by: (1) SAS over R, unequivocally (simple to debunk this with a 5 second Indeed query); (2) the tech industry not having "job security" (possibly the most secure industry in the foreseeable future; see the steady growth of tech companies in the S&P 500 over time, which is not a fluke); (3) the idea that mainframes aren't going anywhere in "at least 20 years" (Moore's law would like to have a word with you; think about where we were 20 years ago). These all point to your background at a long established, more traditional company. It'd be a shame if people took too many of your points as gospel, especially with regard to the areas that you're admittedly not an expert in, specifically the tech industry.

You get 0.5 out of three correct. I personally much prefer R over SAS, both for performance and cost reasons (although Python is my number 1 preference by a long way for "maths", and Perl for "data monkeying", if only because I came up through the Java world and Python is a better mental fit than R's Lisp-yness; and with Perl, if I spend another 10 years on it I will maybe get good enough to understand someone else's code on the first attempt without swearing. I hear that if you can do that, Larry Wall gives you a small medal). The point I am trying to get across is that many companies (banks, insurers, health companies etc.) have got 100+ to 1,000+ people teams using SAS exclusively, with 15 years of legacy
models written in SAS, and the team managers came up through the ranks as SAS developers and believe they're still damn good at it. R can be as whizzy as you like, but in those environments SAS isn't going anywhere until the point where younger guys come through, become managers, and make a lot of noise. That's years away, not weeks or months. I don't think a SAS credit on the resume is a bad thing to have: it opens a lot of doors and probably doesn't close many. Certainly there is a far wider range of jobs available for someone with good SAS and good R/Python. If I lived in a big tech hub I wouldn't especially care. If I didn't, and I wanted a DS job, I would.

The tech industry really does have a higher turnover rate than the likes of a bank, the reason being that expectation is higher and average talent per employee is higher, and those two drive higher expectations. A big slow beast like a legacy bank is different. If you're on, for example, an IFRS 9 programme (which is soaking up huge amounts of people at the moment) then you're typically in a programme which is going to last 3-5 years for essentially a single project. You might have 100 data guys, 5 layers of non-technical management, and a BPMO that is bigger than my entire team. It's a far slower paced role, both by design (you may do almost identical work 3, 4 or even 5 times, because the regulator explicitly demands it) and also because these types of companies are far slower by nature: way more meetings, way more reports, way more chains of command and matrix org charts. What that means is that it's much easier to hide away and take it slow for a few days if the new baby is keeping you awake, and it's easier to carve out a special niche, and niches are what keep people safer (note: safer, not safe) at redundancy times. In short, it's easier to get lost in the system and keep under the radar. That's not great for career advancement, but there are times when career advancement isn't top of the agenda.

My background very much is
the tech industry, both small start-ups and some very, very big tier 1 companies, and almost all of that was focused on "IT with data" rather than "data with IT", if that makes sense. Got the ACM articles and the O'Reilly book credits to prove it. In those roles I've faced into lots of client companies, from huge big banks to little agile web companies, with the twilight zone that is retail and logistics in between (do not enter and expect to keep your sanity). Today I work for a big(ish) financial services company, firstly because it's a genuinely interesting role, but also because I have a young(ish) family and the economy is still on a roller-coaster here in the UK. It's a pragmatic decision. I like where I am, but not enough that I won't be back in the tech sector in the next couple of years. There is no such thing as a job for life, but I don't think that's a bad thing.

Mainframes: people don't buy mainframes because of chip speed. When you spend your $10m to buy a machine with 4 cores each running at about 1.5GHz, it's obviously not on the top of the list. You do buy them for sheer IO throughput and backplane width. The IO throughput of a mainframe is insane; a small, low end, 10-year-old mainframe will comfortably beat a fibre optic connected cluster running local SSDs all day long. But that's not the reason mainframes aren't going anywhere either. The reason is dead people and CIOs' wives.

You have a lot of big businesses (banks, insurers, ATM companies, aerospace parts tracking companies etc.) where critical code was written 20, 30 or even 40 years ago: perhaps 2 or 3 million lines of undocumented "secret sauce" code. That code is a mixture of COBOL (think of it as mainframe application code, easy to live with), TSO (mainframe GUIs, pain in the arse) and skeletonised JCL (mainframe "glue code", utter fking nightmare). And if you have ultra performance critical code, it's going to be written in LLASM, low level assembly. No fancy shmancy MOVs and JMPs for the likes of a
mainframe. Wrapped around this you have countless systems, all tightly coupled at both the data and connectivity layers. In any given big financial services company that might be 20m to 50m lines of code of all sorts of languages in maybe 50 or 100 different applications, many themselves mission critical, all connecting at the raw sockets level, all with custom EBCDIC converters, parsers, blah blah blah. If you want to get rid of the mainframe you need to point the rest of the stack at something else. Mainframes don't do web services, they don't do ASCII, they don't even have a TCP/IP stack in most of them (it's an alternative system called SNA; all those tightly coupled applications usually have custom SNA adaptors in, etc. etc.). So you're not just getting rid of a mainframe; you're ripping out or massively refactoring a load of other systems as well.

But with enough will and enough cash you could do it, apart from the fact that the "secret sauce" code is undocumented and the guys that wrote it are retired or often dead. Seriously: if they were 40 when they wrote the code they can be 80 now. The reason mainframe developers charge such crazy day rates is because there are fewer of them each year due to their age. It works because it's had 20 years of tuning, every single failure mode covered, and it runs on a mainframe, which never ever breaks, so it just sits there ticking away. The dead guys can't tell you how to move it.

So you have a high powered CIO with a Porsche (a weird number of CIOs have Porsches), a nice house, and a wife who likes nice shoes. He goes to his CEO: "Hey, I want to get rid of this mainframe. It's costing us millions a year. Screw IBM/Fujitsu." And the CEO says "Sure, that sounds good, but if you break it on the way you're fired." (Remember, a typical CIO only has a 4 year life span anyway.) So he can get rid of the undocumented, highly coupled mainframe whose developers are dead, and if his guys blow it he's fired and his wife is p**sed off, or he
can kick it down the road to the next guy in a few years' time and spend his time doing fancy shiny projects which make him look good. So that's what he does. It's what everybody does, and will keep doing until the risk of not doing so is higher than the risk of doing it. And as long as you keep paying IBM your $5m/year they'll keep supporting it.

That may sound both trivialized and overly cynical, but you need to remember that a CIO cares about something which you as a data scientist do not. A good CIO can survive essentially any storm; if you are the guy with the biggest budget on the board of a big company then you are damn good at politics. But a CIO absolutely, totally and completely cannot survive uptime issues. That's a universal truth across all industries. A CIO will consciously use "what will this do to uptime" as a filter for any decision they make. It's what they're paid to do, and their ops managers get paid to always lay out the worst case scenarios to make sure the CIO is fully informed. There are entire frameworks like ITIL to ensure just that.

Don't worry, I haven't even heard of SAS before. It must be either a territory or business area thing. In my area it's all SPSS, R, even sometimes still MATLAB, besides the common coding languages and some Microsoft products + various BI software.

Where do you live that you've never heard of SAS before?!
Or how young are you? I used SAS very briefly in one of my classes in college and know a few old-timer statisticians at my company that use it. Otherwise we are starting to migrate towards R.

Germany.

"Spark will over-take Hadoop for data science"

I think you mean "Spark will over-take MapReduce for data science." Spark is a YARN application in the Hadoop ecosystem; it isn't a substitute for Hadoop. Great post though.

I was glad to see this correction. It confused me for a minute.

I really wish you would spell "you're" correctly. Sure, you're a hiring manager and maybe you think you can get away with spelling errors, but I sure don't want to work for someone who thinks they can get away with spelling errors while expecting perfection from their employees.

Apologies. I genuinely have pretty bad dyslexia. Getting the letters in the right order is a challenge; the punctuation is pretty much dealt with by random key presses.

You are missing a big group of people: geophysicists. With the downturn in oil prices many of them are unemployed, and these people are masters at noise reduction and signal processing: separating the very small amount of real information from an enormous amount of random noise. I'd suggest you go find some and see if they fit your needs. Try r/geologycareers; lots of bright geeks looking for work there.

Agreed with everything, but why specifically physics?

Maybe 'cause physics uses a sh*tload of advanced mathematics on real-world data. Just guessing.

Would you mind linking a useful paper from Mandelbrot about error?
I would go wider initially. Start by reading Chaos by James Gleick and then work from there. It's a good intro to a lot of critical concepts which the vast majority of data scientists don't understand. There are whole swathes of systems which emit data which, for all sorts of reasons, cannot be modelled. By that I mean that while they produce data and you can execute algorithms against them, the results, while looking very pretty, are actually meaningless. Mandelbrot, Shannon and all the key concepts are covered at a high level, and there are a lot of citations to the key papers you should have a grip on.

Thanks!

You forgot one: be skeptical of your results. Data analysts should always be questioning their results: Does this make sense? Is this believable? Does it align with business expectations? Are the results intuitive? Other questions I'm asking: Is my analysis wrong? Is my data shitty? Where have I introduced bias? Basically, the worst thing to do is just accept your results as-is without any sort of critical thinking.

Your data will always be shitty. Always. If you ask yourself this question then you don't understand your data or you're wasting your time, so not getting stuff done. All the other questions are good ones, but they get discussed on this forum already so I didn't call them out.

I think this is where experience with data really makes a difference. If you have dealt with many projects involving different sources of data and used several analytical methods, you know things will never go perfectly at first. I tend to always try to find what could explain my results apart from the hypothesis I am trying to test. Any of the steps you took could have introduced a bias or an error.

My co-workers (read: industry) do this and it annoys the shit out of me.

I'm gonna disagree very strongly with some of this. If I had to ingest data from 1000 sources with different schemas, I would not give that to a data scientist to hack together using SQLite and CSV files. I'm gonna give that to the data
engineering team to build robust data pipelines and ETL processes with alarms, archival, SLAs, documentation, code review, change management etc., then merge everything into a data warehouse cluster (Redshift) that the data science team can use. Let the engineers engineer shit and let the data scientists analyze data.

Well, again, horses for courses. Obviously there are very rigorous pipelines into the production platforms, and all the ITIL rigour that comes with that. But someone needs to task the engineers with what needs joining and structuring and in what ways, which is a research task (remember I am R&D), so there will be all sorts of proxies and straw men of the process around.

Second, I work in financial services, and one of the areas I deal with is fraud detection. For all sorts of reasons related to my point 5, fraud detection should be done on the rawest of the raw data. You will very, very carefully take the data from outside the pipelines on purpose, and so when I do fraud-y stuff I am consciously dealing with the data as it arrived. If the engineers sort it out for me with the standard systems we have, I lose between 40% and 75% of the edge cases I'm supposed to be looking at, which loses money, which makes it a bad thing.

Thirdly, data usually costs money (either directly or in resources) to get hold of, so there is always an up-front analysis task needed when you're thinking about getting new data sources. And there are as many great datasets in crappy EBCDIC as there are in highly structured protocol buffers (actually a lot more), so you need to be adaptable.

Remember, I don't want people doing data monkeying; monkeying is dead money. But if they can't do it, or need the engineering teams to get changes done, then I'm also losing money because work isn't getting done. It's about pragmatism. If they are regularly (i.e. more than 4 times in a year) going to be working on a standard data set, then this is absolutely the place for getting ETL tasks and big lumps of
Hive into production.

Data warehouses: warehouses and marts and the lovely systems that spin them and load them are fabulous if your data fits into them (I mean this; I get super geeky about DW structures and technology). But firstly you need to think storage (many datasets can't use anything in a public cloud), and a few petabytes of raw data and another few of metadata balloon out rapidly in OLAP structures, and it's not cheap at that scale when you're buying your own storage. And also, and this is a far bigger problem, you're limited by truth. Warehouses and marts work if you can define some form of truth. Critically, it needs to be a single truth, cos of, well, lots of the stuff in my section 5. It doesn't have to be golden (although that makes it easier for sure) but it must be singular. The industry I work in doesn't have a single golden truth. It has multiple silver truths. It's not possible to dimension a warehouse in a way that works well for all of them. So no matter how geeky I am about them personally, warehouses are not a tool I or my company or our competitors can use. It's subtle and has caught a lot of companies out, and will become a bigger and bigger issue over the next few years as it gets more understood and, more likely, as businesses change over time.

As an example, there is a UK bank that is just about to write off a high 8 figure investment in their warehouse for exactly this reason. They have changed their business over the last few years and have gone from "single golden truth" to "one gold, two silver and a dirty truth", and they are going to a whole new paradigm which isn't warehouse based. It was the engineers who built the warehouse, and did it well, but the data scientists who found the issues as the business pivoted (and had to break the news to the finance team; they should have got medals). There are just a whole load of industries where warehouses either don't work well (like the bank which has changed its business focus in this case) or don't
work at all (like mine).

Geeky note: what these companies and industries need is the technology and data model that is the warehouse equivalent of a graph database. I.e. you have relational databases (OLTP) and data warehouses (OLAP), both based on Codd but different. As graph-based systems evolve, a warehouse equivalent of them will solve all sorts of problems which are only just emerging.

If you are losing 70% of the data in an engineered ETL process, you need a better process. No offense, but that sounds like a terrible place to work as a data scientist/analyst. I work in the US tech industry, not financial services, but we work with huge amounts of unstructured data. I don't want my statisticians and economists working on data ingestion. They can all write queries and manipulate CSVs, but I don't want them doing that. Let the experts be experts at what they are experts at.

Yup. OP's gonna have a hard time squeezing out that 5x return on salary for the dreaded finance team if they task people who specialize in one thing with doing a different thing. Better to have a team with broad expertise across the team that can act cohesively than trying to find people who do everything. Pair the data engineers with the data scientists and make 'em both more productive.

I have a very wide spectrum of skills in my group(s) and always attempt to recruit to spread the skill base further. But there are not unlimited staff, and there is far more demand for the team's efforts than there is supply, so at times people need to get their heads down and do things that they are either not expert in or that fill gaps. Is it ideal?
Nope. But it needs doing.

As for covering my costs: I have (now) 15 heads in this specific team. In US dollars my salary bill is about $2.7m (although not all of that goes to the staff, due to the wonders of "fully loaded" costing), and they are delivering back somewhere in the region of $22 million in direct revenue (i.e. specific chargeable work, usually specific targeted work for individual clients), and their R&D underlies perhaps 30% of our company's revenue streams, which is maybe 10x the direct revenue figure in terms of scale. It's not a stand-out year in cost/revenue terms, but it's not terrible either.

That may (or may not) seem like a large amount of money for a relatively small team. The reason for the figure is that a lot of the work we do for clients is about using data science cunning-ness to either make them money or save them money. Making money is meh: you can usually get about 1% of the revenue lift as a fee, i.e. you can charge perhaps $10,000 for every extra $1m you make for a client. For any company, saving money is always much more valuable (as a very rough guide, $1 saved is usually worth about $3 of new revenue), so you can charge more for it. We are typically averaging about 2.2% of the cost save, i.e. we are billing maybe $22k for every $1m saved, which could be better, but that's what you get for sales guys trying to give away the house for buttons.

Compared to similar teams in our industry competitors, we're delivering about the same level of revenue per team member at a slightly lower cost rate, so that's OK. The finance team are always going to want more than that, but they're getting at least 15x salary back so they can frankly piss off.

You and I are saying the same things in different sub-threads, kindasorta. Feel free just to respond to steamtrade if you prefer.

Sorry for the delay in response; it's been a busy few weeks. I know it's annoying to some people who like binary answers, but this is another nuanced answer: the pipeline processes in my company
(and the equivalents in our competitors and clients) are set up to meet certain specific regulatory and legal requirements. For any data scientist in the EU an intimate knowledge of the Data Protection Act is critical anyway, but in my industry I have 6 or more sets of different regulatory frameworks and consumer protection frameworks. As the vast majority of the work that my company and our ecosystem does is supported and controlled by those frameworks, the pipelines we have are linked to them. It's not a case that I'm "losing" data; it's that for the main use-cases that the pipelines were built for, at the cost of multi-millions, the regulators say "thou shalt not consume that record". So when I'm in the edge cases, I can either use sanitised data, which has a lot of the interesting stuff suppressed in it, or I can use very raw data. Some of the edge cases I work with, the really, really interesting cases, look remarkably similar to data errors, so we take the absolute rawest unprocessed data we can. The other critical context is that my team is an R&D team. If something looks valuable then it will be productionalised, at which point it stops being R&D.

Can you stop pushing for the data monkeying? From what I hear, it seems like you are just referring to data engineering. How do you deal with a transition period, new data sources etc., where the data is just not available yet in the format you need?
In probably the majority of companies that are maybe experimenting or investing in a data pipeline, this will be true. You can't just flip a switch and have everything available in the perfect warehouse structure. To abuse the term, those companies will need the equivalent of a "full-stack" data scientist. Some may argue that is a data analyst, but it is possible that people just fit in between the roles. Probably not going to be the best data scientist, but they may be more useful for companies that aren't yet all in.

[deleted]

I added some more content which may help, but I don't think you need to be subtle about it: "I regularly work with complex and malformed raw datasets and know full well that that is the norm, not the exception, and so have become an expert at actions such as parsing, interpolation, blah blah blah. A few neat examples of how good I am at this would be A, B, C."

[deleted]

Two sides of the same coin, I would say; they are both emerging roles. I would say a data engineer is more like an IT guy who can work with data and has a great empathy with it, maybe has a solid maths background, but day to day is focusing more on stuff like ingestion issues, tuning the nuts off Spark jobs, getting the data scientists' jobs which are taking 1 hour to run down to 1 second. But they still need to data monkey. They still need to be able to cobble together a model. They need to be able to look at a dataset and see oddness. The data scientist is the other side: a great person with data who can work with systems. So the scientist may make a model, but it could be cobbled together out of a messy lump of Pig, Hive and R. They are not going to spend 10 hours refactoring a nested sort out, but I would expect them to not be helpless. If they need to refactor it, they need to be competent enough to hit Stack Overflow and at least give it a crack. Really, right now it's two ends of a spectrum rather than two discrete worlds. Over time the roles will either merge together more or
become more separate. A lot of it depends on how the tooling and languages evolve and whether Hadoop/Spark moves to be more or less abstracted.

Apart from in a few highly regulated roles, where you start usually has little or nothing to do with where you end up. If you want a leadership role, show leadership. Show personal diligence. It doesn't matter if you're an MBA or work in the post room; it's exactly the same for everybody. Don't ask for leadership; just do it and grow. Fill the niches other people don't want, and expand from the niches.

[deleted]

Data science boils down to money. There is lots of money in the healthcare world, and lots of data, so there will be plenty of opportunity for positions, and for a long time. If you're in the EU, I think the EU data protection act in a couple of years will cause some spasms (it'll affect pretty much all data jobs in 2017), but when everyone settles down with it the money train will start again, which will start the recruitment train again.

"You are a mere mortal and do not know finance maths. You don't have to agree with it; you just have to live with it."

If the firm's objective is to increase its profit margin, then I agree that your "finance maths" is right. If the firm's objective is to maximize profits, then the firm should be willing to hire any worker that would increase the firm's revenues by more than $100,000. This confusion between averages and marginals is an elementary mistake that too many people in finance and accounting make (which is why people like you can get away with stating it as if it's a principle that us mere mortals just do not understand). Let me give you an example. Suppose I can produce headphones for $4. I face a downward-sloping demand curve for my headphones and I can price discriminate. Suppose the first customer is willing to pay $5 for a pair of headphones, the second is willing to pay $4.50 for a pair of headphones, and all other potential customers are not willing to pay any positive amount for a pair of
headphones. By selling one pair of headphones for $5, my profit margin is ($5.00 - $4.00) / ($5.00) = 0.2, or 20%, and my overall profits are $1.00. By selling a second pair of headphones at $4.50, my profit margin is ($9.50 - $8.00) / ($9.50) ≈ 0.16, or about 16%, and my overall profits are $1.50. According to your "finance maths," I should only sell one pair of headphones, since selling two would result in a lower profit margin. But that would mean that you are leaving 50 cents worth of profits on the table because for some reason you care about profit margins rather than overall profits. Multiply everything in this example by 40,000 and replace "pair of headphones" with "data scientist," and you'll see that you are using the same flawed logic.

"Then the firm should be willing to hire any worker that would increase the firm's revenues by more than $100,000."

You also have to factor in the opportunity cost of what you would be doing with that 100k if you were not hiring that worker.

A listed firm cares about revenue and profit. Due to the craziness of the stock market it cares about revenue far more than profit (which personally drives me crazy, as it leads to vast numbers of crazy decisions all over the world every single day). If a firm is listed and isn't focused on these two items, and on revenue as number 1, the board are going to jail; it's against the law to run a listed company and not focus on revenue and profit. But that doesn't happen, because they'll be fired by the non-execs well before that happens. But I don't get to change the way the world works. "Finance maths" is about maximisation of revenue and profit within any given quarter. This leads to decisions being made such as "a client offers me $1m for work delivered on 31st December and $2m for the same work delivered 1st January; which do you take?", and you end up taking the $1m route. Because finance. It sucks, but it is what it is. Is the logic flawed? Yes. Can I fix it?
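As an aside on the headphones example earlier in this exchange, the margin-versus-profit arithmetic can be checked with a short Python sketch. The numbers ($4 unit cost, customers willing to pay $5.00 and $4.50) are the ones given in the comment; the function and variable names are made up for illustration.

```python
# Figures from the headphones example; names are illustrative only.
UNIT_COST = 4.00
WILLINGNESS_TO_PAY = [5.00, 4.50]  # first and second customer


def margin_and_profit(prices, unit_cost):
    """Return (profit margin, total profit) for selling one unit at each price."""
    revenue = sum(prices)
    cost = unit_cost * len(prices)
    profit = revenue - cost
    return profit / revenue, profit


previous_profit = 0.0
for n in (1, 2):
    margin, profit = margin_and_profit(WILLINGNESS_TO_PAY[:n], UNIT_COST)
    marginal = profit - previous_profit  # extra profit from the n-th sale
    print(f"sell {n}: margin={margin:.1%}, profit=${profit:.2f}, "
          f"marginal profit=${marginal:.2f}")
    previous_profit = profit
```

The second sale lowers the average margin (from 20% to a little under 16%) but still adds $0.50 of profit, which is the average-versus-marginal distinction that comment is making.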
Nope.

Are there any sample generic resumes (no personal identifiable details) that are good that you can show us?

Thanks for this post; I think this perspective is great to have in this sub. I have a few follow up questions. How does one show they are a good data monkey? It's not something that I can write down ("good with dealing with messy data" doesn't seem convincing) and it's not something that is often shown off, so how should that come across on a resume? The pragmatic part makes a lot of sense, but if I come from an academic background (PhD) how do I show that I know what that means? Most of my work is going to be in the form of publications, which are the opposite of pragmatic. How do I convince people to get past this "PhD stigma"?

Yes, you can, because if you do you're already ahead of 70% of the CVs on my desk. Show me some examples. It doesn't matter if they come from an uber-cool Silicon Valley job you had or just that you took some stuff you got from FlowingData and tried it with a different dataset and had to clean up a load of crap. Being brutal, no one actually totally believes what's on a CV. When you read a CV the only thing you're looking for is "I assume this person is lying about at least some of the contents of this CV; is there enough here that interests me to make it worth a phone call?" (fucked up the formatting)

The PhD is a weird one. Some people flat out won't hire PhDs. It's a genuine problem, i.e. I wouldn't ever let my son go for a PhD as it causes so many employment issues in the future. Flip side is that personally I hire a number of post-docs, and they cost me a lot less than an equivalent non-PhD because they get turned down so often. Anywho, first, again, don't be subtle. Say the words, something like "there were a whole bunch of ways I could have gone with my work; I very purposefully took a pragmatic approach". Again, lots of people don't say it, so just adding the sentence puts you ahead. Second, recruitment isn't decided by a CV. The CV is
just to get you the interview. Recruitment is decided by the interview. Show the interviewer that you live in the real world: your raw data was a mess, you had to choose sets of options based on pragmatic realities, you are at least aware of the concept of time constraints and budgets. I would also suggest that you actively raise it with them; don't wait for them to ask. If you raise it, it shows self awareness: more brownie points.

"Some people flat out won't hire PhDs. It's a genuine problem, i.e. I wouldn't ever let my son go for a PhD as it causes so many employment issues in the future."

Might be a cultural difference again, but certainly not a problem in the US. In fact, in a lot of companies the vast majority of data scientists have PhDs.

"But certainly not a problem in the US."

That's not true outside the data-science bubble. My PhD in stats was a very real obstacle trying to find a job in the Midwest.

You can't even get a job as a data scientist without a master's, and a PhD is usually preferred.

"Being brutal, no one actually totally believes what's on a CV. When you read a CV the only thing you're looking for is 'I assume this person is lying about at least some of the contents of this CV; is there enough here that interests me to make it worth a phone call?'"

This kind of makes you a dick, to be totally brutal. I don't lie on my resume or cover letter. If you're finding that you're hiring people who have previously lied on their resume, you've been hiring the wrong people. Maybe it's because I'm in a different field, but lying on your resume in my field is pretty much instant grounds for termination, as well it should be.

[deleted]

You should ask lots of screening questions in an interview.

I mean, to be honest, interviews are a poor test of the viability of an employee. Realistically they mean almost nothing. They usually don't provide any sort of reliable metrics to gauge how they will be at the job, and mostly they simply serve to piss people off. When I was a manager and looking to hire guys,
you know what i did? i made them perform some of the functions they would be required to perform during training and on the job that separated out the wheat from the chaff pretty quick now granted i'm in aviation so it's a totally different skillset than a data scientist (i just subscribe to this subreddit because datascience is totally badass and underappreciated in my field) but putting a guy in a simulator and spending not just one session with the guy but a few sessions with the guy over 3 or 4 days got me wayyyyyyy better employees than a simple sit down interview a quick one-time sit-down interview selects for people who are good at interviews not necessarily the people you want if you want to find good candidates make them work for it a little bit give them a short project in aviation i had a little formula i used i had a meeting with the guy not really an interview it was mostly just a bullshit session to see if the guy was comfortable with small talk and friendly and so he could see our operation and see if he was ok doing the work we do then a session or two in the simulator to see if the guy could actually fly well or was adaptable enough to learn how to do things differently then i'd get lunch with the guy or gal a couple three times then if we were still interested and he was still interested we'd throw him on as a passenger on a few trips to see if he was alright going the places that we went (we went into some weird and challenging places) this whole process took about a week and we had great success finding employees that were a good match one week is worth the effort at a small company at a larger company where you may have 50 people to interview at once this may be more problematic but there are work arounds the "data" doesn't lie - we know that first impressions are often inaccurate so if you want the best employees you need to get a better picture of them before you hire them if i were going to hire any sort of "knowledge worker" (that is to say 
a data scientist or a programmer a technical writer or whatever) i'd give them a project i'd say "yeah let's do an interview " but give that person a project to work on nothing too crazy but a simple project that you can use to evaluate whether or not they know what they're talking about giving them a project lets you evaluate three things: one - it lets you know if they simply know enough to complete the task you need to tailor these projects to each individual candidate you can't have a "blanket evaluation" or you'll end up with people sharing all this information on the internet and you'll largely have people regurgitating what you want to see and hear instead of actually getting evaluated (if you want an example of this check out some of the stuff out there about airline interviews) two - a simple project has a deadline and deadlines are crucial in measuring performance if you give out a project to an applicant and they can't complete it by the time your interview is here for a job then you get to really see what kind of person and worker they are are they full of excuses and bullshit or did they honestly not have enough time to complete the task? 
personally i don't care if they didn't get it done because they didn't have enough time but i need to see how honest they are about that sort of thing the ideal candidate would have called or texted me before showing up to the interview to tell me "look this project was more than i was able to handle right now i need more time " at which point i'd say "no problem bring what you have we'll talk about it when you get here " three you need to see how they take criticism of their work i'm not saying abuse them (that is to say don't be a dick) i'm saying give them some constructive criticism of the project you just had them do if they can't handle it in the interview it doesn't matter how many letters they have after their name they are going to be difficult to deal with any time you need to change their performance this is r/datascience not r/makehastydecisionsbasedonnotenoughevidence build a bigger dataset when you hire and you'll find you get better people think fabricating positions or degrees is uncommon this is the modus operandi of indian it consulting and staffing firms thousands of companies fall for it since these guys are so good at prepping for interviews lol err'data scientist hates excel index match ftw you're* indeed after you have to re-read a few sentences because of that the rest of the post starts to lose value and don't forget having to read "loosing" instead of "losing" repeatedly let's say i am recruiting for a senior data scientist and will pay them $100,000 man salaries must be way different in the uk most genuinely skilled data scientists here in the bay area would laugh you out of the room if you offered $100k yeah $100k is for experienced web analysts in nyc not true multi-talented data scientists i tidied up the example - and called out it is an example $100k ( £60k uk ) is probably an ok-ish intermediate grade salary there are a lot of people on triple that in more senior roles there are a lot of people on 1/2 that in less senior roles (edit
- i just looked at my payroll - i pay nearly 5x that at the high end to 75% of that at the low end and this is the north of the uk where it's never sunny and we all live in caves ) also remember cost of living - wages in the north of england are a fraction of bay area - but cost of living is also a fraction of the price as well - salaries are set by supply and demand you'll also often trade salary for job security - doing data science for a big insurance company may not pay as high but as long as you don't make an arse of yourself or your work then it's reasonable to assume you will still have a job in 10+ years if you want to stay put you'll also often trade salary for job security tell me about it i work in a government lab comprising federal employees and contractors the feds take a 20% pay cut relative to contractors but pretty much have a guaranteed job for life i would take it it evens out over time if you factor in risk i have a friend who made the jump he literally sat down and estimated the expected utility of the decision like a good scientist i haven't decided whether i want to spend the rest of my career in this specific field but will probably do the same thing when another fed slot opens up (if i'm still here) all in all i think the pendulum is going to start swinging the other way and more young techies are going to get drawn to giant ibm-type corporations where you wear a tie to work every day and know that you'll spend your whole career there i guess google & al are kinda becoming that yeah you have no idea i get about 50k usd in one of the most expensive cities in the world there's a reason green cards are so sought after even from europe really great stuff but if you're building a team then i think it makes sense to specialize - i definitely do not spend 50% of my time "monkeying with data" because we hire sql jockeys that don't require 6 figure salaries to do that stuff it's different strokes for different folks - and different positions my
expectation would be if you want a 6 figure salary you are a master at data monkeying already i don't want you data monkeying because i want you doing something else more valuable - but i expect you to be able to do it in case the "sql jockeys" aren't around or it's 2am and something needs doing now actually - thinking more about this - i wouldn't give someone a job let alone 6 figures - if they had an expectation of getting clean structured data served up to them on a plate because 1) it's going to be based on someone else's idea of "clean" - and if you're not finding the issues with the data you're losing your company money either in lost sales or increased costs - and secondly it bakes in a data structure meaning you're limited in your exploration routes there are a million jobs in the world where that's ok but this is data "science" - science is based on defeating problems like i say - different strokes for different folks a lot of the big banks and insurance companies work like you state - it's not wrong it's just different it's 2am and something needs doing now what kind of model do you need trained at 2 a.m.? maybe it was a marketing emergency?
:) he works in fraud i can see how emergencies could happen i don't want you data monkeying because i want you doing something else more valuable - but i expect you to be able to do it in case the "sql jockeys" aren't around or it's 2am and something needs doing now i was just saying it's a waste of resources to have a ds spending 50% of his/her time monkeying - you seem to agree completely agree that all ds must have a high level of etl/munging ability for when they do have to "get their hands dirty" and to inform their conversations with data engineers/managers "actually - thinking more about this - i wouldn't give someone a job let alone 6 figures - if they had an expectation of getting clean structured data served up to them on a plate because 1) it's going to be based on someone else's idea of "clean" - and if you're not finding the issues with the data you're losing your company money either in lost sales or increased costs - and secondly it bakes in a data structure meaning you're limited in your exploration routes " i hear you but this comment illustrates naivete about how teams of data scientists who are ml (or whatever) focused work with teams of data engineers i don't want anyone providing me with data unless i understand (and typically unless i've had input about) how it was gathered transformed etc still i understand this paradigm doesn't work for small organizations who can't afford specialization thanks so much for the post as somebody that is in their last year of a phd program it was really helpful to know what kinds of things a hiring manager would look for i do have a question if you don't mind i am finishing up a phd in experimental physics i haven't had to do much serious data science during it but i instead have been teaching myself programming and the concepts of data science in my spare time part of this is undertaking pet projects in my spare time to learn about x y or z technique how would i show a recruiter this kind of self-taught knowledge on
a resume? i believe i would be able to handle a technical interview well enough with what i have learned over the years but i am most concerned that i just wouldn't be able to get to the interview stage given my past formal education experience (and lack of data science in it) attach a portfolio to your application and refer to it in your cv i recently switched from academia to a data scientist position during my 8 years in academia i did a lot of data analysis stats a bit of machine learning programming scripting in another context we would have called it data science but i officially was a computational biologist to be sure the recruiters understood what my skills were i added a couple of simple analyses or data visualisations i did in my spare time each was answering a simple question (not biology related "real life" questions) and showed how i got the data treated it and reported it i put everything in a file which i mentioned in my cover letter and cv it apparently worked quite well as i was offered an interview for each application i sent also don't forget that in data science there is science having a phd is clearly a plus when it comes to selling your analytical thinking and your capacity to efficiently find a way to answer an initial question or hypothesis this should not be overlooked this is probably one of the most helpful posts i've read on this forum for a while thank you very much your "data monkeying" analogy reminds me of this classic nyt article about the importance of clean data thanks for the insights! http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html as someone who hires data science people this couldn't be any better post u nailed it! thank you for the insights i'm eagerly awaiting more :) hello!
so i just got hired as a data analyst and i wasn't really grilled on a lot of that stuff however i'm not nearly making anything close to 100k in salary i also only have a b.s. with that said i want to move up quickly and get that coveted title of data scientist as soon as possible hell i kind of ultimately want to be a cto would you have any advice for someone like me i think the bit about data monkeying and telling a story is really important most of the people in management don't seem too technically minded but they are all of the gatekeepers i have a teaching background so i'm ok disseminating things down to people and i try not to talk down to them what else would you suggest i do to move up the ladder? [deleted] i just posted about this in another thread feel free to comment and ask questions would you mind posting your portfolio? the one thing that got me into my current position was networking i met my first internship guy and the company that hired me in person before i sent them my resume i'm also lucky in that here in dc there are a lot of companies that are looking for talent and not much to go around this is why i moved here try to find communities around data science tech or startups in your area getting mentors helps too [deleted] imo i'd suggest just to include methods/processes you used as well as the final gist in 1 or 2 nice pictures (graphs) experienced eyes will catch the gist in no time and they'll appreciate the time u saved even before being hired can you code in anything other than sas? python and r are currently in wide use right now i highly suggest looking into those also don't stop working on projects there is always a new kaggle to work on and the open source community in data science is very strong [deleted] what kind of degree do you have?
experience can also make up for a degree you can always create your own experience also every place is different and has different requirements i am sure that my place hired me because there isn't as much comparable talent in the area (dc) and they are a growing startup with lots of opportunities therefore they are more open to junior entry level people btw if you get rejected automatically then you might be having your resume automatically screened there are ways of beating that be good at excel and sql srsly i am in consulting and playing with ds and ml mostly for myself a lot of what you say is really the same for my field also speak business and adapt to your client: no sql or python for cfo and such go technical for it and know when to give up some people just don't get data and go politics that sums up pretty much why i don't like doing data science fortunately i don't work in corporate environment in my opinion: you don't need data scientists for most of the tasks you described above most of the data cleansing and joining could be easily and effectively performed by dqm/etl guys they have necessary skills and tools to address hundreds of dirty and messy data sources ds team members should not spend their expensive work hours on tasks which could be done by dedicated it specialists the description of some aspects of the work process in your company demonstrates extremely ineffective pipeline and poor management good post! thank you so much for sharing your insight great post! did not think that this world even existed thank you the definition of value mentioned herein is really important! it says to me that getting close to some answer for the question asked is better than not answering it also the answers are time sensitive costly and sometimes incomplete juggling is important too i think what he is saying is if you want it to find the answer it may change half way through adapt and have the power to get to the answer is there a better job?
example: like wave surfing people that love surfing are out in the line up for a reason to catch a wave they love it they had to paddle out pick a board learn to catch and adapt balance personal style and effort with failure it's rewarding to be inside mother nature and get the "answer " some people just stay on the beach a lot of this is irrelevant or wrong for people looking for data science jobs in the bay area you don't need to 'show' how you will add value at most companies most of these companies have several or even dozens of open data roles you don't need to justify that you will provide a certain amount of value you just need to meet the hiring bar and demonstrate that you have the relevant skills (which are of course not necessarily purely technical ones; domain knowledge matters so does how you carry yourself and communicate) real-world data can be messy but unless your company works in a consulting-type fashion and/or completely lacks competent data engineers your data should not be that awful you definitely do not need to have sas on your resume to land a job nor is it really a positive given the current data stacks most top companies are working on having a phd is definitely not a negative learning how to deal with junk is probably the advice i can't upvote enough most of the software jobs in the real world have vast amounts of technical debt as a data scientist you have to deal with most of it and in some places and depending on your role (e.g. fte vs consulting scientist) you may have to overcome some established resistance to change as well but let's not spoil the fun part of it =) you are ridiculously expensive honestly find me people who can actually do everything asked for and then try to pull them away from high paying tech jobs how many people know probability statistics machine learning cs can code databases architecture have intuition for analytics can communicate well and put together a good presentation and can keep up with all the modern
tech? i've interviewed hundreds going for the gold here worth every fucking penny when you find these rare breeds who can do all of the above (often enthusiasts) more practically hire that team with complementary skill sets otherwise really excellent write up will be sharing thanks for taking the time to write all this information [deleted] right - deep breath first - know this - if you want to talk to me about anything pm me - i will respond within a couple of hours second - topping yourself is probably a bit extreme if that is hyperbole then i get it if it's not then talk to someone who isn't on reddit about it talk to people on reddit - whatever works for you please do not top yourself though it's hard to give you some steer because i don't know what you are passionate about - so i may shoot wide on a few topics - bear with me first - you have a phd in engineering assuming it's not for something incredibly niche then you certainly have a career track into the engineering world it may not be as hip and trendy as the data science world - and the prevalence of hipster mustaches may be lower but a phd gives you a route into a high demand stable industry that may not be appealing but it's a damn safe "plan b" - so let's say that this is your fall back if you want to do "engineering with data science elements" and maybe leave the door open for a future move then ga's are finally gaining traction in this space - they are (finally) coming at it from a "here is a way to lower construction costs" as opposed to "the boffins are goofing off again" angle and it's becoming more acceptable - cos everyone likes saving money that plays to a lot of your skills and interests next - your location how come you are in silicon valley?
this is a genuine question you've said you applied for a bunch of jobs at start-ups - which may or may not be a good idea - i'll talk about that in a minute - but "silicon valley startups != all of data science" based on what you have said you might be a bit underqualified for the specific jobs you are going for but you are well over-qualified for many other industries they are not the cool start-ups they are not in the valley but skills like yours are massively in demand in places like atlanta new york charlotte washington and it would seem dallas as well ( less sure about the last one) not so many start-ups there although atlanta is flooded with them at the moment - but lots of big established companies doing banking insurance healthcare oil etc all of these are recruiting like crazy and you have as good or better a skillset than they are currently taking on if you are living on a couch in the valley then in all seriousness you don't have an enormous amount of "roots" to stop you moving if you want to stay in the valley and live that lifestyle then i get that if you want to get your head down and work in a less trendy area i get that as well i don't know a lot of places that are recruiting in the valley at the moment - not been out in a couple of months - but i do know other companies in other cities - if you're interested pm me your cv while we're talking about cvs - let's have a very frank chat about cvs i said before and i will say again - phds on a cv are a bit off-putting - it's great that you have a good qualification but the average recruit who is straight from their phd is a massive pain in the arse for the first 6 months and it takes the management team a while to get them sorted out calmed down and for the rest of the team to stop being pissed off with them i know nothing about you personally - but i know lots of new phds - no matter what you are like you get tarred with the same brush when you're doing your "soft skills" bit make sure your cv trumpets
loudly and clearly that you understand the concept of teamwork and humility and your place in the world but you say you have an engineering degree - engineers are typically less of an arse than others so you get some brownie points there in a different sentence you say " that i learned while i was pursuing my phd" maybe i'm mis-reading that but did you complete the phd or did you mphil/dphil? if you got either of those leave them off your cv! put something else "extensive three year post-masters course involving lab work and tuition" - whatever leave out the mphil more cv stuff make sure you don't say "i had 4-5 patent ideas about x" on your cv a patent costs about $25k to submit ( although way more to defend) so it reads as "i had some ideas but none of them were good enough for someone to give me $25k" but if you wrote "developed 4 streams of intellectual property with a focus on future patent submission in the area of x for university y" then you are telling me you're a smart chap and your university was crazy for not giving you the cash yes - it totally sucks that changing the phrasing matters - but it does what you must realise is that these aren't some special rules put in place to piss you off - they apply to everyone let me explain when i need to recruit the first thing i have to do is jump through a bunch of hoops with finance and hr that's annoying when i finally get the go-ahead the first thing i'll do is ring people i trust and see if they recommend anyone on a good day they'll link me up with someone good and i'll have an easy life if i can't immediately land on someone then i post external adverts and i get drowned in cvs that always happens just as a project goes sideways ( it's a universal rule of recruitment ) and so now i need to read a bunch of cvs while i'm annoyed or stressed and the hr team who spent 8 weeks dragging their heels about me being allowed to recruit are now demanding it's all wrapped up within a few days so i'm at my desk in an
arse with a great big pile of cvs - many of which say very very similar things i'm grumpy and i have to be picky - i just can't spend a man week on interviews so i'm going to nit-pick on little details for you as the person putting in the cv it seems unjust i guess it is unjust but that's just the way it is the thing to do is make yourself shine show me you are a real person show me you have valuable skills more importantly - show me you understand how the skills fit into a wider context - how they add value how they make or save money next - you seem to be getting caught out with pandas pandas is this year's "toy" everything in the computing world goes through hype curves - for a year or 18 months everyone wants to play with the new favorite toy - then a different tool becomes cool and everyone moves on pandas is this year's toy for "fucking about with data" if people need to mess about with data they will tell you that pandas is the only way to do this three years ago i was pulled into a whole series of meetings where the lead developers of the company were stating that the only way to develop web applications was with ruby and if they weren't allowed to use ruby they would quit but now not so much the reality is that this kind of thinking is patently bollocks i frigging love pandas and have been an avid user of it for perhaps 4 years but there is nothing i can do with pandas that the guy sitting on the next desk to me right now can't do just as well with his favourite perl toolchain - and usually he's faster because he's got 30 years of experience in perl and there is nothing either of us can do that the woman sitting opposite him right now can't do with her favourite toolchain of r and some lumps of jni'd java ( yeah - i know it's a weird mix but she's good at her job and glares at us if we mock her so we don't) what you need pandas for is "data monkeying" - get the right data into the right shape accessible in the right way and getting all the basic metrics and
stats out of it so that you can start doing science with it with higher end tooling that makes sense if you are going for junior roles - a lot of your initial work is going to be data monkeying much more than data science data monkeying sucks - it does not get you celebrity girlfriends or fields medals but you need to be good at it - if you are slow with it then you are not doing the real ds work that makes the company the money if people care about pandas for the jobs you are going for - get good at pandas "python for data analysis" is a very good place to start - i give that book to all my guys practise practise practise don't read the book - practise like i said when i first started this thread - if i had a magic wand i would make this sub far more about practicing stuff about data monkeying as much as about the pure science - just because it makes you much more employable don't take any of this personally - you are just stuck in the system every time i hire someone my team is going to take a dip in performance for a number of weeks while we get the new person up to speed - that happens with any hire in any job if my existing team have to data monkey for the new guy that is more of an impact still i will always choose the good data monkey over the less good one other things being equal - it's not about that person - it's about the productivity for the other guys i manage - and ultimately - how it affects performance and money next - interview performance lets cover the basics first are you walking in with an ego? 
there is an ego sweet spot - as a recruiter i want to see an ego of between 4 out of 10 and 6 out of 10 because you have a phd you automatically get marked down 1/2 a point for the reasons i gave above if you're under-confident then you might be a challenge to manage or you might not - so you lose points but not too many if you are over confident - 7/10 or more - then you will definitely be a challenge to manage if you are absolutely a rock star in your specific area i'll tolerate it but if you are a regular person then all you'll do is piss my team off and piss me off because they are pissed off that would be ok if you came in "fully formed" - but if you are straight out of university then i'm going to have to spend between $80k and $150k on extra training and expenses to get you up to speed in the first year - so i have to pay more and be pissed off if you have a giant ego that's ok - just don't show it in the interview i will give you a real example of this the perl guy i talked about above - he's 57 and has got grey hair i always make sure he attends at least one of the interviews of anyone i am interested in if they talk to him like he's a bit thick or too dumb to be in the room then they are immediately out and you would be amazed at the number of people that simply can't help themselves but try and show that they know much more than the old guy in the room - it's like a red rag to a bull for a lot of people - perhaps 50% or a little more he's been doing this sort of work for 30 years - way before it was considered trendy he's made every mistake fixed them all and is incredibly productive - which is why he's still doing it after 30 years he typically provokes a stronger than average reaction in those people but those same people are going to be the people that cause issues for the rest of my team as well as a manager my number one concern in life is how my team are doing and what i can do to make them better happier more productive more engaged etc etc etc - so
that's my number one concern in interviews as well next - when you get asked a question - do you give straightforward answers? if so - stop it as an interviewer i am going to assume that you'll give me the correct answer - what i care about is how your mind works how you react to stress what happens when i give you a poke or a sideways question treat all interview questions like a 14 year old treats maths homework - show your working - your working gets you way more points than just giving the right answer so - if i ask a question tell me what you think the question itself means tell me how you are thinking about the answer if you have more than one option tell me why you went for the option you ended up choosing remember - hiring someone new is the start of a long spending process for the recruiter - a lot of what is going through our heads is not "is this person in front of me now the person we want in our team?" but "will this person become what we want in our team" you need to show your personality your intelligence your adaptability and your social ability as well as your book smarts and coding skills then all the usual stuff - wash brush hair shave clean teeth wear a suit that fits ( it simply does not matter at all about the brand or the price - it really matters that you demonstrate the self awareness to be able to dress yourself properly - if you don't know go check out the sidebar on r/mensfashionadvice ) if you are female - wear a watch if you're a male - wear a belt and make sure it matches the colour of your shoes why?
who knows - but it's a rule that women wear a watch to interviews and men wear a belt which matches their shoes perhaps less so in the valley but in the real world it's still "a thing" - even if it doesn't make a lot of sense shake hands saying hello ask two or three interesting and challenging questions at the end of the interview shake hands on the way out one thing that might make your current position easier to accept this is probably the only time in your life it will be this hard once you have got onto the ladder the next job is easier to get easier to find and easier to interview for i promise hi kindasortadata thank you for your kind email i tried to apply for jobs in my field for 6 months but nothing happened data science seemed to be a good option (challenging and kind of like a continuation of research) then i started refreshing my stats skills and learning python nltk other data science skills i am not a us citizen i came here to pursue higher studies i have a lot of loan to return (almost $35000) i do not come from a wealthy family h1b visa deadline is approaching last year i missed it because i was not able to find a full time job this year up to now i am not able to find a full time job i have applied to companies in sfo nyc la (no i do not live in sfo) i can not even apply for my own green card (phd from us universities can apply without sponsorship) because i come from a country of high immigration to us so us government restricts immigration via stupid priority date also i do not have many research papers (just have 3-4 conference proceedings) so my green card application would not be strong i have 4-5 ideas that are patent worthy and it costs a lot to file for patent i wanted to file for patents to make my application stronger there are a few engineering jobs in my field but i can not get hired because those jobs require citizenship so i do not bother to apply there i have good resume for ds jobs i got interview calls from google ibm and few
startups but i was not able to convert those to full time jobs i work 20-25 hours week internship and i barely make enough to survive (i rent a couch) rest of the time i update and learn ds skills i think i am a good person since 2009 i have been donating monthly to unicef children fund recently i started donating to animal rescue organization even today i donate what ever i can i do not try to intentionally hurt other people i just mind my business i am very good and kind to my friends many of them tell me that i am their best friend i don't want to hurt them by killing myself i was very fat i lost more than 110 lbs during my phd i am 35 and single i have not even kissed a girl i don't know what i am living for…i do not see any hope…i have lived in the us legally for more than 10 years…i have very close friends here…i don't have many friends back in my country…just immediate family…dad brother and a sister…when my mom died i could not even go back…if i do not get visa this time then i will be kicked out of the country…i will have to start again…when will i be able to return loan? when will i find someone? 
what will i do with my life?…i am tired of this pain…i regret the choices i have made in my life…i think i am a very selfish person…you are a stranger to me so it is easier to tell you what i have been going through…i can not tell this to my friends…i don't want to tell them…someone online was telling me that i should find an american girl and marry her…i don't want to marry someone to get a green card…i would marry someone for love… but i have always been unsuccessful in love…my childhood obesity ruined my confidence…i think i am ok looking…i used to wear 41 waist jeans and now i wear 32 super skinny jeans eat healthy exercise…sometimes i have seen girls staring at me or smiling at me (may be they were staring at someone else or smiling at someone else and my mind was telling me otherwise)…i don't know what i am doing in this world i don't know what to do with my life…if i had a job i would not feel like this…if i don't find a job i think i will quit everything buy a bike and go for a south american bike trip and see things and eventually end my life wow right a whole heap of different things here and i have to say i'm not really sure how best to respond to it i know how my dad would have responded to you and that would have been to have given you a firm kick up the arse perhaps thats what i should do maybe you need someone to be nice to you? 
maybe you need practical advice or perhaps emotional jesus right i am going to offer a series of suggestions none of these is definitely going to fix your problem but they probably won't hurt you either in the spirit of my dad's approach to this i will say a few hard points and then move on 1) being nice doesn't get you a career it might get you a job but a career is made by kicking and biting and clawing your way to the top in the management world of mba's and fancy suits it's all about politics and back stabbing in the engineering world - which is really what ds is - it's about going further faster and better than those around you being lovely doesn't make a career 2) you don't get paid to have a personal life one of the hardest lessons that anyone learns is that if you want to make a career rather than have a job you have to have a huge wall between "out of work" and "in work" you have to learn - force yourself - to do this for all the interviews as well you can not take any level of insecurity or lack of confidence into the interviews you can't take a feeling of injustice or sadness you need to walk in and put on a show it doesn't matter if you are the saddest clown in the circus - when you go into the interview you must force yourself to be outgoing pragmatic and quietly confident if you have a phd you went through a viva and you passed it there is no interview which is more important more intimidating or more stressful than your viva you survived that and so you will survive the interview do not tell the interviewer that you are a nice person who is sad and you deserve the job all of those points may be true but the interviewer doesn't want to hear it go in show you know your subject show you know how to be a team player show you understand that you'll get your head down ask three questions at the end leave the room that is all just doing that will get you ahead of 50% of the other interviewees money having lots of money doesn't make you happy but it sure 
as hell is shit if you have none so here is what you are going to do: go get a job yes - you have an internship - which is a great start - but if it's 25 hours a week you have at least 10 other hours where you can get more cash so here is my first real suggestion - go get a part time job in a shop - working on the till a coffee shop a cafe whatever you have no need to put it on your cv if you don't want to this will give you more money it will mean you are staring at the walls less and it will mean you talk to more people all of those are good things all three of these things will make you happier by getting a job in a coffee shop you get three lots of good things and no down sides studying stuff like pandas can happen in the evenings and weekends - if you are living like a hermit then it doesn't matter and if you want to have a long term career in data science or in fact pretty much any job in it you need to get used to spending your evenings and weekends learning new things anyway this brings us onto step 2: get another job but this time a data science job you are going to be spending your evenings and weekends doing data science "stuff" you need practise and you need new challenges in order to advance the best thing you could do is get a full time ds job - but we know that's tricky but there are a shit load of part time data science jobs that are way easy to get the thing is - they are not called "data science" so no one looks for them there are a bunch of websites offering "pay by the hour" type work in the uk we have peopleperhour fiverr guru etc same will be true in the us there are all sorts of people on those sites who need work done with crap datasets they aren't fancy valley start-ups - they are accountants in ohio compost makers in nebraska online retailers of widgets in alaska they are looking for help with "web analytics" they want "data analysis" or "data mining" they need help "sorting out my crm system" nowhere do they say "data science" but it is 
data science it's pages and pages and pages of data monkeying work they will pay you to practise your skills you get cash for doing what you were going to do for free by yourself anyway - and better still you get a great grounding in all sorts of different mess so - what do we have now? you get even more money you get all sorts of data sets to work with you will find at least some of them interesting and so you will become more motivated and - and this is the best bit - you get a shit load of new things to put on your cv now instead of being a phd going up against 10 other phd's you're a phd with a whole load of customer experience a whole load of hard won knowledge a lot more practise under your belt and every question the interviewer asks you you are answering with a real example not a hypothetical answer that pushes you well ahead here's another tip - when you do this - even if it kills you make damn sure you get good reviews put your reviews onto your cv visas: there is no magic fix for this from an employment point of view if i as the recruiter have a choice between a good person and a mostly good person but the good person needs visa sponsorship - then it means a very large amount of paperwork for me as the manager and i have to talk to finance and hr - which is worse than doing paperwork the only way you can beat this is by being better than the people you are competing against there are no short cuts to this hmm actually -- like i said before -- you may be slightly underskilled for silicon valley type jobs but you are well above the typical candidate to a bank insurer etc if you focused on these companies then just because the other candidates are a bit lower in caliber it makes you look better so you would probably have an easier time in those positions emotional stuff: i really wasn't expecting to post anything like this in a data science reddit but fuck it first things first brush hair shave wash clean teeth have a hair cut if you have any strange 
affectations like wearing a dog collar or only ever wearing green shoes then take yourself to one side have a firm word with yourself and stop it immediately then - talk to people seriously - it's not magic - it's statistics and you say you want a data science role let's have a scale of "best case" to "worst case" scenarios let's say you talk to a lady in a shop the absolutely best case is that you fall madly in love get married and win the lottery the absolutely worst case is "nothing" if she doesn't fall into mad passionate lust with you the world does not end you don't die no one points and laughs absolutely no one in the entire world cares so - let's say you say "hello" and smile at ten women maybe nothing happens at all with all 10 but the consequence of that is 10 x fuck all - which is still fuck all maybe just maybe something happens with one of them and if it doesn't - say hello to 10 more after you say hello - then you need the super secret knowledge which most men are missing this is phd grade stuff but perhaps you are ready for it: be nice don't be creepy or weird ask questions listen to what is said and do not use the space when they are talking to work out what you are going to say next don't put people up on pedestals it's not rocket science but for some reason a lot of people forget it a job in a coffee shop would make this easier for you - you would be forced to speak to a lot of people about 50% of which would be female it sounds like you are having a tough time and i really do offer my sympathy to you sorry for responding to your message…i had applied to a company in sfo the company flew me to sfo and i did very well in the interview today i get this rejection message: thanks for the follow up! 
i did speak with the team again they enjoyed speaking with you yet at this time they'd like to pursue another candidate let's be sure to keep in touch in the case things change as the team had very positive feelings about your candidacy regarding receipts feel free to scan and email or take a picture with your phone and email everything to us thank you! i replied by letting them know that i am still interested and i will learn from this experience do better at other onsite interviews i thanked the hr i get this response then: thank you! what kind words the team really liked you as well they think you are incredibly smart and if there is an opportunity to consider you in the future they would let's stay in touch! i'd like to ensure you get where you'd like to be asap want to touch base after your next round of interviews or before? it is not like i am not getting interviews i have been interviewing since january so far 12-15 hr interviews 3 data challenges that i converted to tech interviews…and one tech interview converted to onsite… the last step is stopping me…visa deadline april 1 is coming up…i don't know what i will do with my life…even if i get a job and company sponsors visa…there will be a lottery…all the hard work of 10 years will be decided by a random draw by a computer if your company is setting a ridiculously expensive data scientist to the task of data monkeying you're wasting money a fresh-out-of-undergrad can do the monkeying the data scientist tells that person how actually - thats not true at all i can't move for cv's at the moment saying that alice or bob is a genius with data science algorithms but finding someone who is productive is fucking impossible i spend way too much time teaching graduates - i e post-docs docs and masters students - the very basics of data monkeying it's not their fault - they aren't taught it enough ( or at all ) in school - which was exactly the reason i made this post in the first place let me give you a real example 
from last wednesday i have two new grads in the team i set them a task - write a tool to take a series of a dozen client data sets - each of about 5m to 15million records in each dataset- which will be sent to us in different layouts but will all be csv and all contain errors generate some landing tables with column names from the csv bulk load the raw data into the landing tables parse and lex them to automatically identify a series of key data attributes - i e address structures name structures etc and then rip those into a set of working tables and then generate a load of basic metrics for each step of the process one guy went the etl route and he's still working on it today - it looks like his method will work ok - but thats 4 days of work the lady worked faster and got it coded in a day and a bit but took a lot of "best practise" code from stackoverflow and it took 7 hours to do the data load itself and she got her first experience of being screamed at by a dba also - the lexing is pretty poor so she's going to re-do it that task for an experienced member of my team would be around 45 minutes to 1 hour - for building the jobs and the data loads to be complete and the metrics to be created why? 
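for anyone wondering what the first half of that task looks like in practice here is a minimal sketch - landing table named from the csv header then a bulk load then a couple of basic metrics it is not the toolkit described above - sqlite3 the table name `landing_client_a` the sample columns and the helper names `safe_name` `load_landing_table` and `basic_metrics` are all assumptions made up for illustration and the parse and lex step for names and addresses is left out entirely

```python
import csv
import io
import re
import sqlite3


def safe_name(raw):
    """Turn an arbitrary CSV header cell into a plain SQL identifier."""
    name = re.sub(r"\W+", "_", raw.strip().lower()).strip("_")
    return name or "col"


def load_landing_table(conn, table, csv_file):
    """Create a landing table from the CSV header and bulk load the raw rows.

    Every column is TEXT on purpose: the landing table keeps the data as it
    arrived, errors included. Typing and cleaning belong in later tables.
    """
    reader = csv.reader(csv_file)
    columns = [safe_name(h) for h in next(reader)]
    col_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')

    width = len(columns)
    placeholders = ", ".join("?" for _ in columns)
    # Pad short rows with NULL and cut long ones so one bad line
    # does not abort the whole load.
    rows = ((row + [None] * width)[:width] for row in reader)
    with conn:  # one transaction for the whole file
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return columns


def basic_metrics(conn, table, columns):
    """Row count plus the number of NULL or blank values in each column."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    blanks = {}
    for c in columns:
        query = (
            f'SELECT COUNT(*) FROM "{table}" '
            f'WHERE "{c}" IS NULL OR TRIM("{c}") = \'\''
        )
        blanks[c] = conn.execute(query).fetchone()[0]
    return {"rows": total, "blank_per_column": blanks}


if __name__ == "__main__":
    # Tiny made-up file standing in for one messy client data set.
    sample = io.StringIO("Full Name,Post Code\nAlice,AB1 2CD\nBob\n,\n")
    conn = sqlite3.connect(":memory:")
    cols = load_landing_table(conn, "landing_client_a", sample)
    print(cols)
    print(basic_metrics(conn, "landing_client_a", cols))
```

the design choice worth noting is that nothing gets cleaned on the way in - the raw rows land as text inside a single transaction and the metrics run on the landed data so you can see how bad each file is before deciding what to fix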
because the good monkeys have been around the block a lot they know what does and doesn't work they have cookbooks full of functional methods to do all sorts of monkeying they have reams of well tested well proven code elements they can pull together quickly - monkeying is two things - it's a mindset - which they have developed and a toolkit of bits - which they have built up over time all of my guys have a different toolkit and most of them have slightly different mindsets about how to best get stuff done quickly - and they'll happily bicker about why their way is better than the person's next to them - but they are all highly productive - and that's the big difference - the grads and docs i get don't have either the toolkit or the mindset remember - the entire point of being good at the monkeying is so you can get it out of the way as rapidly as possible - no one makes money from monkeying so it needs to be done 1) quickly and 2) accurately so that they can then move on to higher value work now - the question is - "why have the data scientists monkey the data" - and there are a few answers - the first is that i don't have unlimited supplies of people and we are drowning in work - so everyone needs to get their heads down and push through the second is that the guys who i consider my senior staff - and most of my juniors as well now - would not consider working on a dataset without giving it a serious dose of eyeballing - spending half an hour to an hour monkeying a lump of messy data is the fastest most efficient way of finding its eccentricities and also seeing if there is anything unusual about it the human eye is always always always better at identifying "weird" than a pattern matching algorithm is let me emphasise that - all of my best guys will monkey the data even if they don't need to - they do it to learn the data if we go back to the client and say "here is the work you asked for" we get $x - and that's nice but if we go back and say "here is the work - in 
the process we found some anomalies which are costing you $lots and we can fix for $y" then we get paid $x + $y - which is better pro-tip: the plural of 'idea' is 'ideas ' data science training programs with data scientists in high demand lots of training programs have started up to help people learn the skills necessary to enter the profession i’m guessing many of you may have made a new year’s resolution to learn or improve your data science skills and thought a post identifying the numerous programs would be helpful i spent a fair amount of time in 2017 researching the various data science training programs available and have categorized these programs by price payment structure (flat fee subscription per class) and curriculum (structured vs a la carte) please keep in mind i am not ranking these training programs just categorizing them data science training programs gateway drugs: free a la carte courses these programs allow you to select and take free data science courses kaggle’s kernels are very useful for picking up specific data science skills whereas data 36 provides more generalized courses both provide an easy way to get a basic familiarization with data science skills and kaggle is great for picking up specific skills udacity actually uses their courses as a gateway to their paid nanodegree programs and you can take quite a bit of the content for free enter the dojo: free training programs with structured curriculum two variations of the free program with structured curriculum are available the first i’ll cover is general introduction to data science programs these include alison future learn edx and cognitive class all provide a structured sequence of courses you can take to learn various aspects of statistical analysis programming future learn and edx charge for certifying completion of their programs but otherwise are free the second type is programs like insight and data incubator which target science ph d s transitioning from their fields into 
data science both offer highly intensive data science boot camps to build on the statistical skills acquired in a ph d program both programs then serve as recruiters placing their students with companies they make their money on the program from the recruiter fees (remember if it's free you’re the product) cool win win idea for recovering academics training buffets: pay per course several programs allow you to select data science training courses and pay per course these programs often have a large catalog of courses and are reasonably priced ($7 - $10 per course) data origami udemy and some udacity content are available on a per course basis good way to try a few classes to see if data science is right for you or pick up some specific skills dataversity also follows this model but at a much higher price point ($79 - $129 per course) district data labs also offers a premium offering focused on corporate training ($25 per participant) subscribe now! structured curriculum with paid subscription these data science training programs provide a structured curriculum with multiple learning paths for a low monthly rate coursera data camp dataquest lynda and o’reilly all follow this model and offer monthly subscriptions from $25 to $50 a month these programs incentivize you to hustle since you can learn at your own pace and the faster you move through the content the less you pay data camp and dataquest appear to be some of the more popular in the data science community old school: structured curriculum with upfront tuition not surprisingly this old school model is the most prevalent way to offer data science training udacity is perhaps the king of this space offering the best value with good content in a structured course sequence ($499 - $699) the price goes up from there from $699 (simplilearn data science) to $8500 (thinkful) brain station general assembly k2 data science springboard and the institute for statistics education all fall somewhere between these two price 
points these programs typically differentiate themselves from the subscribe now! programs by offering one-on-one mentors that can help you through the program if the one-on-one matters to you this is the way to go otherwise i did not see a lot of difference between the tuition and subscription based models in fact springboard actually uses data camp’s content i do not have university courses identified here if you’ve got plenty of time and money to burn check out this dataset listing all university data science programs for the complete list with basic details and pricing of each program view the dataset nice list! for people in the uk there’s also the asi data science fellowship which follows a similar model to insight and data incubator i believe great job compiling all this info in one post question: what are you thoughts on bootcamps like trilogy education services? i was checking out a data science boot camp at my local university and it turns out it’s run by trilogy and that trilogy runs lots of boot camps nationwide under each university’s name it costs $10k for a 6month part time program seemed interesting but i’m being put off by the “admissions advisor ” he’s very pushy and giving off a strong salesman vibe lots of letting tricks and phrases and little lies to get me to finish my application asap anyone have experience with their bootcamps? worthwhile? 
i don’t know for sure but if there’s a $10k fee and a pushy recruiter for this boot camp i would be extremely cautious many of the prestigious boot camps i know don’t charge the participants they get their money from the partner companies and from recruitment fees they can do that because their program actually leads to people getting jobs and companies value the skills of the people that go through the program there are some exploitative companies out there offering boot camps and training courses for high fees that are cashing in on the hype around data science and machine learning but not really adding much value with what they teach wow wish i knew of programs here in the us that got their money from partner companies rather than the students when you see companies paying for the program then you know that there’s real value being added both to the students and companies in general 50% goes to the company university doing the marketing and the rest to the company university providing content companies then pay the instructor $3k to $6k per class universities have a larger instructor pay range from $4k to $12k depending on prestige so if they get 10 of you ($100k) $50k goes to marketing $3k to $12k to instructor and rest to content provider stick to the online training programs you are my hero <3 i've been spending so much time trying to find universities that have data science programs thank you!! 
hey r datascience as the title implies i've got a background in stats and data science but my ~2 5 years of professional experience thus far has been entirely software development from the few interviews i've taken on the side this has actually seemed to be beneficial as a lot of data scientists with strictly data science education may not have great hands on programming skills (which are a big part of data science) employers seem to take kindly to the breadth that comes with an education in stats ds but with applied software expertise my title is currently "data scientist" which at least gets me plenty of linkedin messages i'm at the point where i'm starting to feel like i am losing grasp on all that i learned in my stats ds studies and i'm just wondering if anyone has any ideas to stay relevant stay fresh and keep my data science skills honed additionally how can i prove this? they're going to ask for examples of my work and i'm going to have to say "welp i spent 2 5 years writing software" i'm scared my prospects are slipping i 100% want my next job to be data science but i feel like every day i spend at my current job is moving me further from that even though i'm developing a lot of valuable skills thoughts? tl;dr: how can i get a job in data science with no ds experience but an ms in ds? or if my job doesn't provide me with relevant ds experience where else can i get it? im confused title is data scientist ms in data science no experience ?? 
the last bullet should say "no data science experience" and then you're spot on i was hired as a "data scientist" straight out of college and the job spec was very much data science focused but the work since has not been data science at all strictly java development the only reason i have stayed is because i've convinced them to pay me more and more money every 6 months i'm making above median (and average) for ds and software devs but the work is just not fulfilling whatsoever if they're paying you to stay i'll take that as a sign that they actually like you have you spoken to your team at all about taking on more data work? who's doing the ds work if you aren't? i have actually but our customer just doesn't need it done no one in my shop is doing ds work really they said if i can come up with something compelling we can pitch it but i just don't know that there's much applicable work to be done with the stuff we're doing i'm just wondering if anyone has any ideas to stay relevant stay fresh and keep my data science skills honed additionally how can i prove this? 
they're going to ask for examples of my work and i'm going to have to say "welp i spent 2 5 years writing software" sounds like you need to spend a few weekends building a portfolio you have less to prove than most entering the field i imagine a couple projects will put you over the top by "proving" your knowledge give you talking points in interviews employers seem to take kindly to the breadth that comes with an education in stats ds but with applied software expertise this is good to hear the skepticism toward data science degrees is rooted in many institutions trying to make a quick buck with unfocused coursework and bad degree candidates you're effectively shirking that stereotype +1 for you i don't mean to shanghai the conversation but i want to talk about this further i personally think data science degrees are valuable for the right people their weakness is also their strength the intelligent candidate will take the courses that challenge them and build career capital sounds like you're one of those candidates sounds like you need to spend a few weekends building a portfolio you have less to prove than most entering the field i imagine a couple projects will put you over the top by yeah i agree and this is sort of the heart of my question how do i build a portfolio that matters? just work on interesting side projects? are there any sort of "open source" problems that are being worked on that need help that you know of? 
i agree the data science buzz has created an interesting landscape for both employers and job seekers (and students seeking degrees) i was privileged to attend a pretty good public state university which then opened their data science program the year i was graduating and my advisor was the one running it and mentioned it to me it's probably not the university for data science but it is a well regarded institution and i think that has earned me a few points as well i'm mostly worried about the interview process i feel like i won't be able to answer any data science questions similar to how i wouldn't be able to answer any algorithm questions in a software interview i'm kind of caught between both worlds with a really wide knowledgebase but not a very deep one honestly very few companies know how to interview data science candidates well technically if you're gunning for a job at facebook or amazon then of course those interviews will be technical and hard but if you're gunning for a job at any-company-in-city-of-choice they probably don't have a resident data scientist who can interview you well and you just have to sound technical and smart enough to regular dev people the bar is much lower here and i'm sure you could do that i guess i'm less worried about retaining a job and more worried that i won't be able to get a data science job (which is what i want to do) like i said the bar to getting a job at any-company-in-city is much much lower than at the big shots if they don't have a resident senior data scientist they don't know how to interview you if they don't know how to interview you they can't judge if what you're saying is good or bs :) it's a comforting thought but i want to be a data scientist in reality not on paper : my love you'll never know how "real" your work is until you start working there regardless of the company and its reputation yeah that's the problem i'm currently finding myself in and now we're back to square one how can i do data science 
without a job in data science? not a rhetorical question i am genuinely asking r datascience if they have any suggestions e g side projects challenges competitions open source projects etc i strongly recommend starting a homelab with a bit of learning about the systems end of things you could easily have some decent infrastructure for some ds projects that suit your interests are easily within your budget and can make a useful contribution to public knowledge or form the base of a future commercial open source project if you haven't been there yet you might check out r homelab when i was first starting mine they were an incredible resource for sourcing equipment software and understanding my infrastructure needs for my projects my current expansion is for working with several datasets in the 700 gb - 2 tb range homelabs are a great way to maintain build your skill set in demonstrable ways they do have the downside of bringing your work home to a degree but if you can work around that they're a great tool for your situation wow this is great advice and exactly the kind of answer i was hoping for thank you! i will definitely be checking this out you're welcome! it's also worth mentioning that aws is a platform option as well the huge caveat there though is that when you're working with any kind of significant ds-type processing you can run up a bill in a day that costs more than the cost of an entire homelab so user beware =d that said i hope you have a great time exploring! good luck! appreciate it! have you heard of anyone using gaming gpu's with cuda? i think i could rationalize the home lab if i can game in my spare time and it would also give me an excuse to get a 1080 ti o:) this happens a lot where data scientists end up working as devs let me guess: are you at a startup? 
kaggle competitions are a good way to create a portfolio but really anything interesting could look good - it's easy to scrape data off wikipedia etc but you should do these with a clear goal in mind: to get yourself a good job asap where you are actually doing data science another possible route if you want to stay at the current company is to learn a ton about machine learning and tweak a bunch of models that run in production if you can show you're a ml guru then you'll have no trouble getting a data science job so brief bit of context - my company is starting a new data science team and is aggressively investing in the recruiting process internally we have very few individuals that know use machine learning methods me being one of them the vp overseeing the construction of this team is very interested in recruiting me for the team as he knows i have data science skills and i've also worked with him directly using machine learning to solve a business problem part of my concern is i'm not sure i'm ready i was encouraged to apply to a manager position they opened with the team which would likely operate as an individual contributor for some time i'm confident i can use r to handle most problem sets that come up but am still working my way through learning python additionally i've never used spark hadoop aws (their data structure) and my sql is pretty rusty on top of this i'd really like to get more statistical methods under my belt so i'm not limited or at least know where to look to learn more to solve certain officer problems the upside is i'm a very quick study and the position itself will likely not require highly advanced techniques for a while there are some nlp problems that will come up and i don't have any experience with those but i understand the approach and how the methods work conceptually i'm sure this all reads as a bit like imposter syndrome but i feel conflicted that i might get this position and fail because i still have areas i would like to 
grow in before i'd feel very confident calling myself a "data scientist" can anyone else relate or have similar stories? any thoughts advice or words of encouragement? edit: title should say "aiming" too high never underestimate yourself otherwise you’ll always be in a shitty position nothing is too big for you to accomplish if you won’t listen to a stranger on reddit then listen to the vp he asked you so he must certainly know you’re ready “if somebody offers you an amazing opportunity but you are not sure you can do it say yes – then learn how to do it later!” - richard branson amazingly right! i think although you may not have all the skills you want you have demonstrated to the vp that you are competent in data science also do not forget that you have domain knowledge and you have more experience with the data the team would work with than any outside hire imposter syndrome will exist until data science masters programs become more mainstream so do not concern yourself with it i think you should trust the vp and your current boss who obviously think this is something you'd be qualified for every new role should have some growing pains you can handle it a common misunderstanding is that "failing" is a bad thing actually "failing" means you had the guts to push yourself beyond your comfort zone most learning occurs through "failure" in fact if you "succeeded" all the time that means you never really challenged yourself you never pushed yourself to grow i see the prospect of "failure" as an opportunity for growth think about it this way when people go out to run marathons they don't say "oh i've never run a marathon before so i'm not ready" what they do is train as much as they can then on marathon day they push themselves through it i had the same worries as you for my first ds job but i have experienced a lot of growth (in all senses of the term due to all the late night pizza-fueled extra studying sessions) from being thrown into the deep pool to add to that it was a 
start-up funded by my ceo so there was no room for fucking up good luck and get to work i feel as though many data scientists are "nimble" in the sense that we just pick up what's necessary in order to do the job most people don't come into an entry level ds job knowing everything you just sorta figure it out for example: i've never in my life thought about what it means to put something in production and now i have a sense of it i've also learned a great deal of etl to the point where i can help out with the basic stuff now again just sorta figured it out because i had to i expect to continue learning a lot more in my job so your idea of being ready is to use all the stuff you are mentioning not on real messy interesting projects but on toy projects by yourself? trust me the fact you're wondering if you're ready and knowing what you don't know means you're ready the majority of spark hadoop aws use cases in data science involve either libraries (sparklyr for example) or sql application layers so if you can read documentation you've got the library parts covered and if you brush up on your sql which is a pretty easy language to begin with you'll be set for the latter part also in my experience the most effective data scientists come from the business side of things the combination of soft skills with technical know-how is no joke hey this is great! i just finished a coding bootcamp where one of our projects was this exact problem prompt i'd love to see your github and compare code! 
a fun next step would be to predict the job title (e g data scientist vs data analyst) from the skills listed in the job description (done in this post) haha my code isn't too great just a brute force string search through job postings i used selenium and beautiful soup i attach the link to my github at the end of the blog post there are a dozen projects like yours and most of them found the results to be very sensitive to location you could probably go down that road if you're willing to further your project i'd probably have to add one or two lines of code to do that so that could definitely be an extension thanks! i wish i knew how to parallelize this so i could do 10000 posts instead of 1000 but i'm a newb good luck report back if you do it! look into the threading module or concurrent.futures for that it's actually quite easy i know others said this but thanks for the write up and summary good to know i mean searching for jobs on your own you tend to notice trends but seeing a sample from obviously more postings than i could go through in a few weeks months definitely gives a nice idea of prevalence cheers! 
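for anyone wondering what the "parallelize it with threads" suggestion above could look like here is a minimal sketch using python's standard concurrent.futures module the names fetch_posting and fetch_all and the example.com urls are hypothetical placeholders and not taken from the op's repo a real version would put the download and parse step inside fetch_posting and should respect the target site's terms of service and rate limits also note that a selenium browser session is generally not meant to be shared across threads so each worker would probably need its own session (that part is an assumption about how the op's script is set up)

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_posting(url):
    """Hypothetical placeholder for the per-posting work.

    A real version would download the page and return its text;
    here it only returns a dummy string so the sketch runs offline.
    """
    return f"posting text for {url}"


def fetch_all(urls, fetch=fetch_posting, max_workers=8):
    """Run `fetch` over every url using a pool of worker threads.

    Threads help here because the work is I/O-bound (waiting on the
    network). pool.map returns results in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Hypothetical urls, used only to show the call pattern.
    urls = [f"https://example.com/job/{i}" for i in range(5)]
    for text in fetch_all(urls):
        print(text)
```

max_workers is the knob to turn: a small number keeps the load on the site polite while still overlapping the waiting time of several requests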
no problem my dude i plan to post more blog articles like these cuz i think they're important given this new era of aspiring data science practitioners takeaway: learn python and spark not surprised kaggle isn't there the scripts feature killed any usefulness for employers by driving up the false positive rate for kaggle folks so much that it devalued it as a signal i didn't get the second sentence he's saying that having a bunch of reasonable finishes across a variety of problems carried signal that a kaggler was a competent predictive modeler before the introduction of scripts now you can go around auto-running other peoples' work and get the same results with zero competence pretty much i imagine after a few completely incompetent people were interviewed after kaggle scripts was introduced a company would have decided it was a worthless indicator most of the regulars hated the introduction of scripts are you still on there? has sentiment changed over time? i've not been on in over two years now edit - kaggle results are still useful if you value high finishes over a volume of mediocre ones most of the regulars hated the introduction of scripts are you still on there? nope a lot of people have left over time i would rather do driven data if i was to do it why do free work for kaggle for a corporation since most of their competitions are now commercialized look at the zillow one looking to boost one of their core competencies through crowdsourcing what would you personally advise for newer folks? does being on the leaderboard mean much nowadays? 
if you win a prize ie top 5 but the vast majority of those people are already employed totally not a good return on investment for jobs all the recommendations are tough like writing a blog because you would need to know more than you would need to get a job but also on top of that build enough of an audience for it to matter the easiest answer is just to network i hear you makes sense in a way i'd expect an employed data scientist interviewer to know about kaggle maybe it's just hr who doesn't except kaggle never really has been a major hiring criterion for data science the only scenario i can think of where participation in kaggle competitions could help is if the company you're applying to is one whose main field of activity is data science services then the people hiring you will know about kaggle and will be able to appreciate the competitions at this point i list kaggle competitions as data science projects on my cv rather than a "kaggle competition" i think it depends in fintech kaggle is huge exactly any field where algorithmic improvement directly translates into $$$ and or makes previously untenable business models possible is going to be interested in kaggle(ers) i disagree i've been hired in two data science positions because i put my kaggle profile link at the top of my resume both analytics managers had the opportunity to see that i can actually code that i know machine learning and that i'm actively learning keeping up to date with new analytical methods i've been told verbally that it put me at an advantage over other applicants it's a data science specific version of github what more could anyone want? 
i feel as if it's not mentioned enough i had machine learning interns come in at my company for the summer most did not know what kaggle was i'd figure it would've been mentioned in universities via clubs etc same it's a shame because arguably it has the potential to be a better learning platform for statistical programming and machine learning than grad school does (in my own opinion) grad school is pretty much reading a bunch of theoretical papers on non-robust algorithms that are too difficult to implement and when they do improve performance it's by 0.00001% on a specified dataset at least that's what it was for me in one pattern recognition class i took yeah that's along the lines of what i figured i got a bs in a mathematics based field went into the workforce and found a lower level data analyst job then taught myself all the data science related skills that i could through kaggle and rpubs by attempting to recreate what other people have programmed from there on i got my first data science job within 6 months of being in the workforce i interviewed a few applicants for data scientist positions and they only knew about low level algorithms like logistic regression decision trees and maybe svm none of them had ever heard of xgboost lightgbm or any new competitive algorithms this was always super puzzling to me because it showed that they didn't keep up to date with the field that they're trying to get a job in where are you located? i have fairly good projects (and no formal work experience) and i struggle to get even an interview on the back of these projects ditto i've gotten multiple job offers that i wouldn't have if not for kaggle except kaggle never really has been a major hiring criterion for data science agree this is why i didn't say it was a major factor in hiring criteria however it was a signal for a short time early on if you did reasonably well instead of needing to do absurdly well like you currently need to do thanks for the write up! 
no problem i hope to have more up in few days great info thank you what is the implied usage of java and similar languages in these postings? very interested in this considering that java seems to be at the level of around 40-60% of python and r is there a specific demand for the skillset combo compared to two separate people for software engineering and data work? java is an enterprise wide programming language while python and r serve as direct data analytic languages java is the "everything else" meaning building applications servers etc and don't forget java - hadoop it's in your best interest to be well rounded meaning you can see data scientist as maybe a subspecialty of a software engineer but this doesn't mean you need to know html and javascript and what not know your standard data science tools and languages and throw on either java or c++ since they're widely used in applications confused by hadoop being that high are there really that many companies that really need it? i have my doubts why are you confused ? 
hadoop related technologies are cheap alternatives to many expensive technologies such as teradata (not a complete substitute but good enough for many use cases and better in many others) i work for a consulting company and we delivered a production platform based on cloudera to a major corporation in my country and just by having hadoop in their stack they got more than 40% off the price they were supposed to pay for new teradata and this 40% cost reduction was 3x as much as the total cost of implementing the cloudera platform (hw+licenses+mandays for all related processes and by all i mean everything related to using the platform in a corporate environment and workshops for employees) woah that's pretty insightful i'm more of an ml guy than a big data guy but it is on my todo list so it's nice knowing how effective hadoop can be i meant more about needing it due to the amount of data and not price i see but hadoop is an ok solution even for tbs of data and very often companies are forced to use tapes for cold storage or even discard data completely but that does not have to happen with hadoop hadoop provides a relatively cheap alternative that runs on commodity hardware and is horizontally scalable and is also very durable thanks to hdfs i'm not sure how hr goes about making job postings maybe they do generic research see hadoop and slap it on there maybe they ask real engineers in their firm it was over 1000 job postings maybe if i do it over 10k maybe it'll be different? i also did not include excel but i'd expect it to appear a lot over 'data analytic' queries thanks op! :) thanks for doing that have an internet point! out of curiosity can we have more detail on the location and dates? 
i'd love to see if this data changes a year from now there was a stackoverflow article discussing python being up on the charts and dominating the data science industry but seeing your data set makes me curious if academia preferred to use r and industry preferred to use python i wouldn't trust the r count as much since i'm brute forcing the string search so if an irrelevant standalone 'r' is sitting there then it could be counted accidentally nonetheless it is still really popular there are actually data scientists in my company that only know r and not python but i highly advise you to gear towards python learn r if you want to go beyond your already built python stack python it is!!! i did start off learning r but i told my professor it felt silly doing r when automation data mining and parsing is so much easier with python she basically went "well learn both!!" i've since forgotten r though i think the data scientists at my company mostly do power bi and python but mostly use a bunch of tooling i think the data scientists at my company mostly do power bi those are more than likely bi analysts r is really powerful i started with python using pandas mostly and have since moved to r also this was done over one weekend so i didn't really make it sophisticated it was my first time using selenium and beautifulsoup so if i come back to it i'll definitely make an updated post! thanks no problem! yeah i totally get you on the "it's a bit rough around the edges" the entire thing was great to see though personally i don't get selenium did you find a tutorial or guide to follow? 
i'd love to give selenium a try i hardly know selenium i just needed something to load up a browser and go through url links so the selenium code is minimal in my script feel free to look at my github page:) it's under the indeedml repo [deleted] nope never heard of that lol and you basically acknowledged in a public forum that you violated indeed's terms of service you are not permitted to use indeed’s site or its content other than for non-commercial purposes use of any automated system or software whether operated by a third party or otherwise to extract data from the site (such as screen scraping or crawling) is prohibited indeed reserves the right to take such action as it considers necessary including issuing legal proceedings without further notice in relation to any unauthorized use of the site whoops i'm a mechanical engineer and i've been at my current role for 1.5 years i was at my previous job for ~2 years i applied for a data science job at a local startup and will be receiving an offer soon but i might have to leave this area after a year or so (my partner might be changing jobs then) on the one hand this would help me get my foot in the door of data science but on the other hand i might look like a job hopper (2 years 1.5 years 1 year etc) my two options are to take the new job and leave after a year or stay at my current mech e job strengthen my data science skills somehow and apply to another job when we move from the area any advice? i'd make the leap it's pretty convoluted to say you might move in a year is it sure that your partner will change jobs? why does that involve relocating? another way to look at it is the following: in a year you move and you're looking for a data science job do you think you're better off with 1 year of experience and "maybe looking like a job hopper" or without 1 year of experience in data science? 
i think the answer is pretty clear that's a nice way of putting it haha 1 year is a long time take the job you can always explain why you're job hopping also a lot can change in a year maybe you don't need to move maybe your partner doesn't get the new job maybe you change partners? who knows take the leap a year's experience in the field will trump any preparation in the eyes of someone hiring also moving because of family is not uncommon and just be upfront with it if and when you need to move in a year even if you move it's possible after a year you can talk the employer into letting you work remote take it if you like it i'd also take the job even if you do move you'll have good explanations for switching jobs changing industries is not job hopping in my opinion i get it though i'm staying in my current role longer than i normally would because i want to show i can be dedicated to something might might might you might not like the job your partner might not change you might get a great offer to go elsewhere trade on knowns i agree with what everyone else has said also i think the stigma of a 'job hopper' is changing especially within data science i find a lot within the field look to move when they have learnt what they can in their current role so it's not unusual to see people looking to leave in 12-18 months' time (obviously this is not always the case but it is definitely becoming more accepted) hi all i would really like some suggestions on where to focus my time and energy at the moment because the overwhelming amount of options has me at somewhat of a standstill i studied economics (undergrad) and have a good (but definitely could be better) knowledge of statistics and econometrics i've only learned the in-depth mathematical properties of ols regression and the assumptions that need to be monitored with it i'm currently working my way through introduction to statistical learning which is a good overview of some of the topics in machine learning but definitely doesn't 
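on the worry above about a standalone 'r' being counted by accident here is a minimal sketch of counting skills with regex boundaries instead of plain substring search the skill list the function name count_skills and the sample postings are all made up for illustration and are not the op's code the lookarounds treat letters digits underscore plus and hash as "part of a word" so 'r' does not match inside "server" and 'java' does not match inside "javascript" it is still only a heuristic: something like "r&d" would still be counted as r

```python
import re
from collections import Counter

# Hypothetical skill list, for illustration only.
SKILLS = ["python", "r", "sql", "spark", "hadoop", "java"]


def count_skills(postings, skills=SKILLS):
    """Count how many postings mention each skill as a standalone token."""
    patterns = {
        # Reject matches glued to a word character, '+' or '#', so that
        # short names like "r" or "c" are not found inside longer tokens.
        skill: re.compile(
            r"(?<![\w+#])" + re.escape(skill) + r"(?![\w+#])",
            re.IGNORECASE,
        )
        for skill in skills
    }
    counts = Counter()
    for text in postings:
        for skill, pattern in patterns.items():
            if pattern.search(text):
                counts[skill] += 1  # one count per posting, not per hit
    return counts


if __name__ == "__main__":
    # Made-up postings, not real data.
    postings = [
        "experience with r and python required",
        "strong sql and java; spark a plus",
        "senior developer for our server team",
    ]
    print(count_skills(postings))
    # Plain substring search flags every posting for "r":
    print(sum("r" in text for text in postings))
```

counting once per posting rather than once per match keeps the numbers comparable to "percent of postings that mention x"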
add to mathematical knowledge i am fairly proficient in r and stata using them almost daily for my job can also use tableau but haven't used it in a while i have completed the codecademy course in python but wouldn't consider it a skill before i invest a good bit more time in using it for data analysis i'm currently between applying to grad school or focusing more data-science skills and applications what are your thoughts on where to focus? build a portfolio of data analysis? work through more textbooks? focus on learning python? i'm unsure apologies for adding to the clutter of "help me" topics on this subreddit any help is appreciated grad school is needed in order to get into this field for like 80% of people phds are pretty standard at many top companies not because they necessarily make better data scientist but because they can a masters will get you into the field faster a phd(depending on focus) arguably gives you more opportunity do not go to grad school unless you need it to be who you've always imagined yourself to be do not go unless you simply can't imagine any other way learn python buy books be careful of doing too much analysis and getting pigeonholed as an analyst rather than a scientist the only debate in my mind is between a phd and a masters what do you mean by: be careful of doing too much analysis and gritting pigeonholed as an analyst rather than a scientist i meant getting not gritting there is a difference between science and analysis roles don't fall into the trap of the latter i am torn on this one i have b s in applied math and physics and have been working in start ups and data driven companies for 5 ish years i have worked my way from a bi developer to data scientist i am doing well and have some great ds wins under my belt but i worry about what not having that ms check box will mean for the future of my career how much will it restrict my mobility due to dealing with junk hr? 
i see what you're saying you need hr to get your first job and it's easier and perhaps better to get the rest by recommendation referral etc at that point your credentials especially in a space of smaller companies are largely irrelevant if you want to jump to a different job that you have no contacts for maybe the masters would help but even then it's unclear because you already have experience and results hello all i have been working in tech for at least 10 years during this period i started from building and maintaining computers and laptops (hardware) to now doing data analytics for major companies over the last 6 years i am able to successfully build data analytics dashboards that analyse data from various data sources by building data models and user interfaces using appropriate data visualisations i am able to elicit requirements negotiate and work with business stakeholders and do technical things like the above paragraph how can i migrate from a data analyst with coding skills to data science field? thank you in advance a great introductory book with an applied approach is hands-on machine learning with scikit-learn and tensorflow by aurélien géron (2017) incidentally all the code in the book is up on github in the form of jupyter notebooks what online ml courses would you recommend? 
unfortunately i am not familiar with online courses that have a very applied approach similar to the one taken in the book but i will list some courses that i found to be quite useful from a theory perspective (by which i do not necessarily mean they are very difficult or anything but just that the lectures do not go over practical coding aspects in python r at least as far as i can remember) also please note that i have not watched all the videos however the ones that i have actually watched i found to be quite informative and well explained and i hope it generalizes to the remaining lectures in the series anyway here they are in no particular order (although prof winston is the funniest):
cs 229 machine learning - andrew ng - stanford
cs 156 machine learning - yaser abu-mostafa - caltech
6.034 artificial intelligence - patrick winston - mit
videos linked by u unnamedn00b (title; channel; published; duration; likes; total views):
lecture 1 | machine learning (stanford); stanford; 2008-07-23; 1:08:40; 8 897+ (98%); 1 794 880
lecture 01 - the learning problem; caltech; 2012-08-28; 1:21:28; 4 216+ (98%); 673 278
1 introduction and scope; mit opencourseware; 2014-01-10; 0:47:19; 5 110+ (98%); 750 093
if it's close enough consider "data engineering" which is essentially managing data for data science; and see what additional skills you'd need if you're strongest with one database system (e g oracle) look at their stats ml data-mining tools first what kind of models can you build? hi there i can build star snowflake schemas link tables can work with normalised and de-normalised schemas he means statistical models such as logistic regression random forest etc not database models what kind of data analysis models can you build? 
in other words how much experience do you have with univariate regression models multivariate regression models time series models neural nets machine learning etc what models can you build to analyze data once it's been pushed to a star snowflake schema database? the models you listed are primarily for data warehousing i e how data fits into tables within a relational database like oracle that makes retrieval intuitive and efficient edit: words data analysis models none from your description as i don't have to do that in my current role i build the described data models and then aggregate it in various ways for users to analyse kpis and metrics this is why i am looking to learn more about the items you describe and transition into a data science role if possible welp grab a stats book and pick up r or python and start tinkering you can find a free pdf for machine learning with x language if you can do databases you can scrape some data store it then apply some of the techniques from the book it grows from there i highly recommend courses or something do some projects learn git get comfy in the terminal or command line looking at kaggle is fun read blogs about other people's projects data cleaning feature selection or reduction there's a bunch of skills you have to work on to get the whole picture math and imagination in how to apply it for solving problems with a piece of code is the ultimate goal have fun! it's not a race it's about building knowledge thank you hey maybe you can also look into data engineering since you have a database background already it is not going to be long until a large chunk of data analysis will be automated so no models? 
that’s all database stuff i came from an sql background myself as mentioned thus far learn some stats (and then some more) learn python or r (be proficient) and learn some machine learning ('git gud' with a library tensorflow sklearn) you will take more to one than the others and if you eventually get good at all of them (to an extent) and can complete your own analysis and modeling and defend it you're on your way to a data science position if you can't hack it in stats ml and programming you're only marginally a value add to a data science team the python data science handbook is a great way to get familiar with a lot of common data science topics including common models it's also free and comes with jupyter notebooks that you can interact with to facilitate learning also start considering the strengths and weaknesses of various models and the different approaches to take when your models have high bias or high variance doing this will inevitably help you get more familiar with how different models operate under the hood and can save you if something goes wrong in production i would also start to explore deep neural nets and their variations as they continually destroy benchmarks left and right for various predictive tasks the para ‘i am able ’ tells me you will do great as a product manager? have you thought about this? if not why not? i have considered this however do product managers normally jump there straight from my position? i don’t have a live example however if i’m building a predictive analytics product i’d consider you as a great option since you have first-hand experience etc question is which one would you choose ds or pm? 
i’ll ask why as well since i want to see if i’m missing anything sorry to hijack your topic but i am looking to get skilled in what you are currently doing as a data analyst coming from a financial background can you help me understand what tools languages etc you are using currently to build your models and translating them into visual representations thanks! it's mainly sql and then a viz software such as my bi or tableau the way to be most effective is to make sure everything is on the same grain check out dimensional modeling - it normally covers the basics i’m in a similar position to op (although it seems a little more skilled with statistical modeling) learning sql would be huge along with developing an understanding of data structures particularly how data moves from normalized systems to dimensional data warehouses and marts you need to have the required tool knowledge and skills and experience of the new position considering data science is in demand and somewhat of a new job most people come to it from a non pure data science background some are statisticians some are computer scientists and some are data analysts but the common point of all of them is that they learned to use the data science toolbox and then applied that knowledge on a previous job i would also strongly suggest getting a msc at least all job openings i see mention "msc or phd in stats data science computer science or a related quantitative field" people here have mentioned the tools you must learn and prove proficiency with either through a degree or at your current position edit: downvotes on that subreddit are just weird sometimes just do it start doing projects that require the skills you want do them in your spare time volunteer to help other groups with projects that require these skills do them for hackathons do your job the way you're expected to then add in a bit of data science bonus on top the best way to learn is by doing the best way to get a job is by showing that you're 
already doing that job don't wait for anyone to give you a stamp of approval to say "you have passed data science certification level 1"- it doesn't exist i think this subreddit has a lot of great content on short-term data science advice: tools technologies whether or not to take the job go to grad school etc however i don't see enough discussions regarding how short term decisions are impacting your long term (think 10-15 year) trajectory "where do you see yourself 5 years from now" is a really cliche kinda worthless question to ask however for a data scientist asking "where do i see myself 15 years from now" is actually a very very important one to answer this would be my guidance: align yourself to one of three broad tracks: - principal scientist - vp of data science - vp of (insert traditional business unit e g marketing supply chain sales) these are your three major tracks they move in (mostly all other things equal) increasing order of responsibility soft skills and salary and decreasing order of hands-on data science work technical proficiency and job security obviously there are career paths that don't fall in this track namely starting your own company but i think that for most people this is an accurate set of end-state options becoming a principal scientist means you have likely reached a level where you are the only person alive that knows how certain things work you are one of two people that can make changes to it and even the ceo of your company is afraid of what's going to happen when you leave there may or may not be an emergency plan if you get hit by a bus as a result you get to call a lot of your own shots in terms of scope type of work on the flip side there is a ceiling to how much you can be paid because you are not an executive and executives decide how much money people make (i'm being a bit tongue in cheek here but only a little bit) you most likely have a phd you are most likely incredibly smart and you most likely don't care too much 
for people who are not as smart as you becoming the vp of data science means that you now oversee a whole department of data scientists and yet you will likely never touch a line of code again your time is spent making sure that what your department produces is appropriately valued by the company and that you can connect company needs and goals with the broad direction your team is taking you excel at taking complex scientific language and making it palatable to regular people you are able to convey value to stakeholders you are able to sell your product and you are able to do that while also understanding everything your team does at worst one notch below the person who did the work at some point you were likely a very good data scientist but what set you apart was not your data science skills - it was standing out as someone who could steal job from your boss by going to meetings talking to customers making tough calls etc you may or may not have a phd you are incredibly sharp you get along well with most people but deep down you judge the crap out of everyone who doesn't have a great understanding of data you make very good money - but there is always pressure for you to push your team harder and get more results lots of travel lots of meetings the biggest drawback? 
the odds that you'll get promoted to ceo are low because the traditional functions are always the most likely candidate for that becoming a vp of any other function means that data is no longer your main tool - it's now your secret weapon you spent the last 10-15 years moving up the ranks and learning how to leverage data to make better business decisions than everyone else when someone started talking hypotheticals and opinions you started talking data and facts and your boss noticed and your boss' peers noticed and you started getting invited to more meetings more projects and more decision-making events now you're managing an entire division of the company and you're doing it with data on your side - the only way you know how you can talk to a wall you can sit in the room with the heavyweights and you feel perfectly at home there you know how to play politics and you kinda like it you have built a strong team of supporters on the way up - you're good at what you do so it wasn't that hard which one of these three sounds like the most attractive should certainly shape which decisions you make more importantly: you should have a very honest conversation with yourself as to whether or not your strengths align with each specific path and if not what you need to do to strengthen your weaknesses "you are most likely incredibly smart and you most likely don't care too much for people who are not as smart as you" this type of thinking will limit what you can achieve it will but honestly some people are just built for that these are people who have no interest (and or skill) in workplace politics i've met a couple in my lifetime: they are great people to work with but you can tell they have no interest in managing an entire department and having to give a crap about budgets or hitting targets or other crap that people on the managerial track need to worry about listen as someone with a degree in computational physics from a department rated in the top 20 world wide 
both overall and in physics i think i've met my fair share of "incredibly smart people" my colleagues were amazingly inclusive interested in whatever anyone had to say remember getting a phd often requires teaching freshmen: the premed major who just needs to get through intro physics because they want to get into medical school how little a freshman in college knows compared to someone in their 6th year of graduate school is amazing and every single person in my department knew that there was something to learn from anyone all scientists have been the young one with their first talk or poster at the professional conference the best principal data scientists will listen to everyone on their team from market strategists lower level analysts and their colleagues that doesn't mean they will agree with them if your boss thinks this way run don't walk couple of things: i think you're under the impression that i want to become the principal scientist i am describing i do not i think you're under the impression that i am trying to say that everyone who is a principal scientist should be or is that way i am not but in my experience this is a much more acceptable behavior at the principal scientist level than it is at any other level i think you're under the impression that i consider "not liking other people" to be a good thing i don't having said all of that: giving people credit for including "the premed major who just needs to get through intro physics because they want to get into medical school" is a really low standard so they were very inclusive of other really smart people that just weren't smart in the very specific area in which you are smart? 
in my experience academia is the least inclusive group out of any place i've been in maybe your specific department wasn't but out of everyone that i know that did a phd and then went on to work in industry the overwhelming consensus is that academia is incredibly closed-minded about the value that non-research work brings and that looking back that brain-washing was actually quite damaging but i think that for most people this is an accurate set of end-state options given the ratio of junior mid-level data scientists to people in these positions i'd say not obviously thinking about the future can be helpful but deciding where you want to be in 15 years is kinda ridiculous especially with evolving positions within the world especially tech also people switch careers more often than ever so the traditional career path is less common so that part of creating this plan for 15 years ahead is strange in that regard too also i bet at least a handful of people here are interested in academia or government work among other things follow-up question: if you're still early in your career and don't yet know which path you want to take how do you suggest figuring it out? well it probably merits some attention i don't think it's probably as important or straightforward as the op is suggesting it's likely that you won't have a clear cut answer from the start but what really helps is to start asking yourself the question early and often why? 
because it allows you to focus on other people that are at that level and figure out if you see yourself evolving in that path or not early in my career (literally first job out of grad school) i was in a group that had both a vp of science and two principal scientists prior to that if you had asked me where my future was i would have said principal scientist because i really like the technical side of things being at that job showed me two things: i am likely not smart enough to be a successful principal scientist i have well above average people skills for a data scientist which will probably help me going a more managerial route i actually enjoy being the link between people that understand data science and the people that don't realizing things like that can help you start shaping your future i don't think anyone has the final answer early in their career but you can at least start ruling certain things out and you can also start trying to find mentors role models that can help you chart out where you want to go i'm not convinced that data science as we know it today will exist in 15 years don't choose a career for the end goal as long as you can live inside your means find a career where you enjoy the day-to-day work and don't be afraid to change careers as your interests and opportunities evolve what makes you think that? 
well the obvious things: the future is hard to predict and data science is hard to define less obvious: right now there is a lot of demand for the data science skill-set but that may not be true in 15 years the supply will increase as more people are trained and demand may drop like the "ai winters" of the past or a possible tech bubble bursting software may also eat data science careers: right now software engineers can use friendly apis like sklearn and traditional analysts can use tools like tableau it's possible the next generation of tools will eat away at the statisticians-who-code from one side or the other there will always be good careers for hard working quantitatively minded people who can learn new skills what those careers will look like in 15 years? no idea i personally think it's crazy to plan 15 years ahead in a field that's only been around for 6-8 years? be ambitious do what you enjoy and take opportunities as they arise hi all for some background on myself i have undergrad degrees in economics and in business admin (finance concentration) and i recently received a graduate degree in statistics (data science concentration) from what is considered a very good graduate school with that said their data science concentration was not great in my opinion as it was focused more on engineering mathematics statistical theory than it was on applicable data science skills as a result while i am fairly good at statistical computing ml r coding i have gaps elsewhere in my knowledge (cloud computing database management etc ) as i work part-time i'm trying to fill the holes in my data science background i figure getting one of the aws certifications (certified solutions architect associate is the certification i'd be leaning towards) couldn't hurt but i'm also not sure if it's worth my time i would love to hear from any full-time data scientists what their thoughts are on a certification like this or on having a strong background in aws in general thanks! 
can't speak to the certifications but my team uses aws on a daily basis for our team it isn't a deal breaker if someone doesn't have those skills but if they do then they have a very strong chance of getting hired (assuming they have a strong stats modeling background) thanks for the insight that's good to know i know aws isn't a core data science skill to have in the same way that statistics ml programming in r and python is but it's good to hear that aws could be considered a useful skill as a data scientist i was looking for exactly something like this i was in the middle of my aws csa prep and suddenly wondered whether it is worth spending that much time hey for what it's worth - i never got the certificate however i'm pretty good with aws and have it on my resume now i'm able to use aws for what i need as a data scientist (spin up an ec2 instance quickly use it for running scripts daily hosting apis hosting my r shiny web apps and save my preferred instance setup as an ami for myself) i don't feel that i'm missing a ton by not having the certificate but then again i'm not in the job market right now also i wonder if employers would care that i have "intermediate aws" as a skill rather than "aws certified" not sure we use aws a lot and simply having the related skill sets listed would help a candidate we would probably ask about it in the interview and how well you did there is what would matter having the skill is useful i haven't seen companies put a high value on certifications though (i work in tech so ymmv) i think you generally see higher roi on project work that you can show than certifications- so you might consider just adding an aws component into your next project (eg deploy with aws) we use aws heavily and when considering candidates for data science roles it looks like this:

| | strong ml | weak ml |
|---|---|---|
| strong aws | y | n |
| weak aws | y | n |

the data cloud engineers do need to know a lot more about aws though since they build all the infrastructure we use since graduating a couple 
years ago i have been relatively unsure about what i want to do (big surprise whoop whoop) anyway for the past couple months i have been super into the thought of pursuing data science jobs but there seems to be so much out there about the best way to build up data science skills i also come from a non-data background so i am starting from the very beginning here i think this new article on datahero's blog (i'll include the link in the comments in case anyone wants to read) gives a pretty good overview of different options especially because i am also interested in marketing but it's still hard to know where to start are there any schools or sites out there to get me started? this is an ok place to start i think my first question would be what is your background you say non data this could mean anything from someone with a degree in women's studies to someone who has a degree in theoretical mathematics yeah it's definitely on the liberal arts side i studied history with a focus in women's history i recommend working through the tutorials at datacamp specifically for r they give simple clear explanations of both the code and the concept they also have simpler help files than cran amazing! thank you! it sounds like i'm about your age i work as a data scientist i'm happy to have a conversation if you would like reach out and we can set up a call or something i too am curious about a data science position or something similar is it okay if i also pm you? basically i just want to know what i can do with the skills that i have and where to go from here i'm a recent undergraduate with a degree in sociology and psychology (however research has always been a strong suit of mine plus i love it!) but i have a knack for business hey! 
i'm absolutely happy to help anyone out please reach out and we can absolutely have a conversation here's the article if anyone is interested: https: datahero com blog 2017 05 24 how-to-become-a-data-scientist hello r datascience i've developed an interest in data science but i need some guidance i know that you guys see these threads a lot but i assure you that my situation is somewhat uncommon my situation: i have an undergrad stats background i was admitted to a phd program in biostatistics in a strong school which i will start this fall i only recently learned about data science several months after i accepted my offer of admission i do not want to leave my program because having a phd really helps a statistician in biotechnology and at the moment i'm still very interested in that field 1) suppose that i earn my ms and decide that i want to pursue a career in data science assuming that i have done due diligence by then and honed my data science skills (coursera kaggle github and other self-learning) is it better to continue to my phd if i also enjoy my research or would it be better to stop and get a ds job? i've read varying opinions (pros and cons) on the worth of a phd for a data science career but those answers are usually for somebody who is considering applying to a phd program not for somebody who was already admitted note that continuing to a phd would give me time to take cs classes in algorithms and machine learning 2) what's the best way to just get a feel for whether i like data science without sacrificing a lot of time or money? when i ask this i am not referring to long term self-learning rather i just want a free way to "explore" data science without paying a considerable sum for a coursera sequence if i enjoy data science based on this i would then proceed to more in-depth self-learning that would involve coursera github projects etcetra [deleted] great thanks for the insight! 
regarding the second question i understand that i won't truly know how it feels to be a data scientist until i get a job in the field myself i'm moreso wondering how i should approach figuring out if i like the technical content to rephrase my question what is the very first step that you would recommend someone take if they are interested in data science? is there a free tutorial that you would recommend? a website or repository of information? i'm not asking about a whole curriculum just a single resource that i can look at before anything else there are tons of basic tutorials for data science in python and machine learning google for some of those take a look at some ebooks but it shouldn't require any formal "classes " assuming you can pick up the coding part once you have a bit of a foundation go on kaggle and take a look at some of the challenges as a beginner some of the more advanced problems will be pretty hard to tackle that's fine try and use what you know to translate the data into what the challenge asks for do exploratory analysis data cleanup feature extraction and then fit some models this is basically how most ds projects go and you could see if you find this "scientific exploration" interesting or if it was a slog there's a lot more to data science than that but it should give you at least some insight into what your life would be like great thanks! i'll look into some of these tutorials if you go into the phd program i really really strongly encourage you to do a data-science internship with a solid company while you are in grad school that's the best way to get a feel for whether you actually enjoy it as a profession and it would dovetail pretty well with your phd program how long would it take you to get the ms? 
honestly if it's just one year and you do a solid internship after that first year you might be in the perfect position to decide which path you want to take two years for an ms unfortunately but i'll look into internships is it better to continue to my phd if i also enjoy my research or would it be better to stop and get a ds job? if you want to make more money probably stop and get a job if you want to open the set of jobs that you can have in the future probably a phd i say probably because plenty of people have gone on to become principal data scientists without a phd and plenty of phds have gone on to become executive leaders of functions that have nothing to do with data science in general the advantage of a phd is not nearly as big on the starting salary side when you account for missing out on 4+ years of grown-up salary but it does expand the universe of jobs that you are qualified to grow into to me the key question for people is whether they want to climb up the ladder as quickly as possible or if they'd like to eventually be able to settle into a job where they are doing pure data science with complete freedom an ms is your best path to the former a phd to the latter but neither is a guarantee what's the best way to just get a feel for whether i like data science without sacrificing a lot of time or money? 
depends on your definition of "a lot of time" there are a ton of free tutorials out there on how to implement data science methods in python and r going through those tutorials and then finding small personal projects to apply those to is a good gauge on how you'll feel about data science the example i always give people: when i got interested in python the first time it was because i wanted to do some basic analysis on fantasy football data i ended up spending a good amount of time learning about python just because of that and that grew into me getting more interested in functionality like pandas and sci-kit learn find something like that and see if you can leverage data science for something that you find fun i'll look into applying basic methods to small personal projects thanks i see that you have a phd if you got a chance for a do-over would you have gotten the phd still? hi r datascience i have been following this subreddit for a while and i now want to start learning about data science and its uses some information about myself: 1 i am a college sophomore with marketing major i have zero (and i really mean "zero") background on coding or any programming language the reason i want to study data science is that i want to work in the field of market research and client service in a marketing agency in which i know data science skills would make me a desirable candidate i don't want to become a hard-core data analyst but i think some knowledge on manipulating and visualizing data would come in handy (please correct me if i'm wrong) i love reading news articles that use data (and data visualization) to back up its arguments i enjoy reading fivethirtyeight and some other websites focusing on infographics i am currently learning r programming using the book called "r programming for data science" written by roger peng and tableau using its tutorial videos i want to ask you guys especially who work in the field of marketing two questions about the use of data 
science what are some other skills that you think are necessary to have to do what i want to do in the future (market research client service and hopefully marketing consulting in the long run)? do you have any step-by-step learning process (what should i learn first which books or websites) to learn about data science in general do you think this looks good and attainable? i always recommend that beginners start with the analytics edge and then move on to statistical learning both free and excellent moocs data science is the generalist skill if you want to go deep into marketing you need to pick up the domain knowledge as well marketing is immensely hard imo because of all the data you can track right now you should understand how to do time series analysis to see the impact of a marketing campaign you should understand how attribution works: first point of contact last click time decay etc also marketing will deal with a lot of qualitative data learn some nonparametric stats one of my biggest pet peeves is when someone does a survey and takes the average of "how satisfied are you?" type of questions this a lot of courses don't provide the big picture and it's easy to get overwhelmed i'm also reading data smart (which works with excel to demonstrate ds concepts) and practical data science with r (it would be nice to have some r experience beforehand) i believe these books give you more practical insights into how ds can be applied to marketing among other related fields thank you! thank you so much i just wonder if this is the statistical learning class you are talking about! 
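for anyone curious what the attribution models mentioned above (first point of contact, last click, time decay) look like side by side, here is a minimal python sketch. everything in it is an assumption for illustration: the channel names, the example journey and the 7-day half-life are made up, and real attribution tools may weight touches differently.

```python
from collections import defaultdict

# A customer journey is a list of (channel, days_before_conversion) pairs,
# ordered from the oldest touch to the most recent one.
# Channel names and timings below are hypothetical example data.
journey = [("display_ad", 14), ("email", 7), ("paid_search", 0)]


def first_touch(touches):
    """Give all of the credit to the first channel the customer saw."""
    return {touches[0][0]: 1.0}


def last_click(touches):
    """Give all of the credit to the final channel before conversion."""
    return {touches[-1][0]: 1.0}


def time_decay(touches, half_life_days=7.0):
    """Split credit so that a touch loses half its weight every half-life.

    half_life_days is an assumed tuning parameter, not a standard value.
    """
    weights = [0.5 ** (days / half_life_days) for _, days in touches]
    total = sum(weights)
    credit = defaultdict(float)
    for (channel, _), weight in zip(touches, weights):
        credit[channel] += weight / total  # normalise so credit sums to 1
    return dict(credit)


if __name__ == "__main__":
    for model in (first_touch, last_click, time_decay):
        print(model.__name__, model(journey))
```

the same journey gets credited very differently depending on the model, which is why it is worth knowing which one a given report is using before comparing channels.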
it is i'd recommend taking a more beginner course like the analytics edge first statistical learning is more intermediate-level and might be too challenging for beginners it was the first mooc i took and i had to revisit it after taking other courses to finally get it well if you want to learn python i am a data science masters student at ucsd studying data science here is all my coursework for my masters program https: github com mgalarnyk dse210_probability_statistics_python https: github com mgalarnyk dse200_python_for_data_analysis https: github com mgalarnyk dse220_machine_learning https: www academia edu 23734578 k-means_pca_and_dendrogram_on_the_animals_with_attributes_dataset r is a good start be sure to learn basics of dplyr and ggplot2: do your "data janitor work" like a boss with dplyr data wrangling with dplyr and tidyr cheat sheet quick introduction to ggplot2b - edwin chen graphics ggplot2 getting started with r: kaggle's titanic competition if you like data visualization maybe you can go another route javascript from eloquent javascript then d3 js some links that may be useful for the latter: http: p migdal pl 2016 02 09 d3js-icm-kfnrd html for some general data sci: a visual introduction to machine learning i think r with dplyr for data manipulation and ggplot2 for data visualization is a good choice for your needs especially since you're already reading roger peng's book datacamp has specific tutorials on ggplot and dplyr also check out this book by hadley wickham on doing data science in r here it's not finished yet but it looks promising thank you! if you want to work in digital marketing (and most probably you will) then knowledge of google analytics is a must you can find some courses on the web but i find it better to work on real data and real problems maybe some of your friends have a website and use google analytics? 
also this course might be helpful: https: www open2study com courses online-advertising for data science i would recommend learning path i've created: http: studiy co path data-science i'm in my last year of a chemistry phd and looking at jobs i want to continue developing analytical methodology and laboratory techniques instrumentation but would like to have a more versatile background in data analysis especially in the case that my chosen line of work (which has very little competition but also very few positions available across the board) is not hiring in the next year fellowships like insight and data incubator both contain stipulations that you cannot seek employment outside of their company sponsors and must commit to a pure data science career (getting a phd in a science you love just to abandon it for a tech company seems a bit strange) if i head into industry first i basically disqualify myself from ever getting a fellowship but if i go the fellowship route (assuming i get accepted of course) i would forfeit an r&d job for a computer desk job is it a better idea to enter my chosen field and develop my data science skills in my spare time? or better yet within the company through sponsored courses or simply from a colleague? any suggestions as to what i do? or am i oversimplifying data science here? 
galvanize zipfian doesn't have those limitations but isn't a free program not sure if you've checked it out as an option i have and i might as well pay tuition for an in-state master's program at that rate (16k tuition at intensive paid data science bootcamps) the main draw of a free fellowship (aside from being free) is that it's corporate sponsored tied meaning it almost guarantees a job after the program i've decided that i will try to move forward with my current career plan and see if data science is something i really need to do my job well before diving headfirst into it while abandoning all other career options you might also look to see if somewhere like samsi (statistics institute at nc state) is hiring postdocs if you can show some data analytic skills they might be interested in the diversity have you looked for jobs at national labs? pnnl seems to be snapping up a ton of people in data science and might have use for your chem skills as well getting a phd in a science you love just to abandon it for a tech company seems a bit strange i know people in academia who have made hard pivots in their fields even within academia many folks who get their science phds just love the challenge of diving deep into a question and building models to extract insights relating to those questions that is their motivation it's not really that strange full disclosure: i despised my field by the time i was done with my phd and have not been keeping up with the literature since leaving perhaps since your work is more experimental this may not be true for you but i found most phd students (especially ones with phds that would be interesting to data scientist positions) have jobs where the majority of time is spent at your desk in front of your computer honestly it seems like you're not really interested in data science in industry and are treating it as a backup career option in case the tenure track or national lab route doesn't work out i would urge you to analyze your motivations 
and give your academic dream a go before you decide to get out (because as you said you will basically disqualify yourself from an r&d job forever but you can always go the other way later) you seem to have misread interpreted my post i have absolutely no designs on being in academia (people seem to forget how hard it is to get a tenure track position and how hard work alone will never beat talent which i do not have) i want an industry job however i want it to be in r&d and data science seems to complement many research scientists' skillsets unfortunately these fellowships do not allow you to take any jobs that aren't with their corporate partners and do not take candidates who have been in industry either way thanks for your input but i'm clear on what i want; i was just hoping to find a way to take advantage of a data science training opportunity in case i can't find an industry job in chemistry i was a galvanize student and you do not need to take a job at a partner company you can go to whatever company you want however having an in at a company is pretty nice to get your foot in the door happy to answer questions about the program indeed i do seem to have misinterpreted your post so from my perspective of having transitioned from a phd student to data scientist my colleagues and i don't think very highly of the paid programs (someone else mentioned galvanize and zipfian) insight and data incubator on the other hand are very selective and are worthwhile but as you mentioned they're for folks committed to making a switch wtf why did you hate your field? 
sometimes after immersing yourself in an area for a long time you start to see all the warts instead of the big picture i barely made it through an undergraduate degree in psychology before deciding that 95% of the papers were probably bullshit because of the way people manipulated experimental design data collection and statistical analysis across the field it's not hard to become cynical when you see the underbelly of academia or if you got into the wrong field (for you) in the first place and decided to persevere anyways because of sunk costs or whatever 95% of the papers were probably bullshit because of the way people manipulated experimental design data collection and statistical analysis across the field ding-ding-ding funnily enough my phd was in cognitive neuroscience (my bachelor's and master's were in math comp sci) see the underbelly of academia or if you got into the wrong field (for you) in the first place and decided to persevere anyways because of sunk costs or whatever also ding-ding-ding on both counts but this true for all soft sciences i was working in a "computational" program that sold itself as doing serious rigorous work turned out to be bs also a lot of tech companies work with what would be considered "soft science" data (social networks were around and studied in sociology well before myspace and facebook came into existence) yet they manage to avoid many of the manipulations of experimental design data collection and statistical analysis also i have witnessed enough of these manipulations in physics and other hard sciences that has more to do with the perverse incentive structures in academia damn what are your opinions on completing a coursework ms in machine learning then? 
if the goal is to get an industry position then it really depends on the university i would definitely do a significant ml project (either as a master's thesis or personal project) to show your ability to practically apply the models you learned in your classwork otoh we live in a time when resources for self learning are more available than ever so given that master's are usually unfunded (in the us) it makes sense to save the money the main benefit might be to get your foot in the door if you get it from a university that the tech companies love (cmu mit stanford berkeley waterloo) good comment masters' is only worth it from a top school i would think of the fellowship programs as opportunities to help you land a data science job academic jobs are competitive but so is the rest of the job market (and so are data science fellowship programs) once you get an industry job why would you want to join one of the fellowship programs? sure you could learn a lot in them but you'll learn a lot on the job too and you'll already have achieved the primary function of the fellowships -- a full time job why not just take a few free online classes first to decide if this data science stuff is really what you want to learn about? 
that way you can learn on your own time and doesn't cost you money or relocation to wherever the data science hack school is these fellowships things are not a small commitment hey all first time poster here i'm a soon-to-be chemistry graduate from the university of east anglia in the uk and i can confidently say that i don't want to pursue a career in chemistry during my course i've done modules in analytical and forensic chemistry and i really enjoyed the analysis of data given to me and communicating my findings through technical reports and presentations i started applying for graduate schemes as an analyst this year and although i wasn't successful i was getting feedback that my cv and applications were good i started applying for job ads as an analyst in various different sectors and i'm getting rejected straight away i think it's because i'm not trained in data science skills things like r python sql so i've applied to do a masters conversion course in computing science at uea (link below) and i've been offered a place it looks to me like a good course for me to get a grounding in those skills the department also sounds as if it has a very good research group involved in data science and machine learning so my dissertation could be quite interesting i could also do my research project in collaboration with a company as a paid placement my question is how valuable is a masters degree in computing science for data science providers in the uk it's a career path i definitely want but i want to be sure that i'm going to stand a good chance of getting a good job at the end of it apologies if this is just a us-based subreddit i'm not sure! 
msc in computing science at uea: https: www2 uea ac uk study postgraduate taught-degree detail msc-computing-science in my honest and humble opinion masters degrees are usually not worthwhile - although your situation may be one of the slightly more suited to a masters conversion due to the technical aspects it won't do your cv any harm however what you could learn in a year in industry would likely dwarf what you will take from a year-long masters course - and you will be no less light in the pocket seen as you had good feedback on applications you could consider aiming for different roles in similar firms for a year (internship or grad scheme) whilst requesting training and expressing an interest in moving across to analyst areas can also teach yourself tech skills like sql for free online (doesn't make it any less valid if you can legitimately do it and prove it!) best of luck! hello and thank you for responding i really appreciate the perspective of someone who has some knowledge of the industry! 
you see my primary concern right now is that i'm as green as i could possibly be and i believe that pursuing a masters conversion will at least give me some concentrated experience in the field the possibility to conduct some research as well perhaps even in collaboration with a private company is kind of an additional icing on the cake i think having the set profile of learning is another useful thing for me but at the end of the day the most important thing to me right now is being able to secure good employment at the end of it i want to know if doing a masters conversion is perceived by employers as being a credible and attractive proposition i have obviously acquired certain skills in my bachelors degree in chemistry too so i'd like to think i should do alright happy to help i guess you need to look closely at the profiles of the roles you want and try to gauge how competitive they are and if they bonus factor of a masters is worthwhile (most are quite competitive i would imagine?) i would say it will probably increase your chances of getting to interview stage but it is no guarantee of employment and because most times top 100 do not 'require' a msc it always leaves me questioning whether they are worth the investment i do believe that all employers value work experience very highly now so you definitely need to consider how you will get industry experience insight during the course and if you can supplement it with an internship to build on and prep for the competency & strengths-based assessments at assessment centres? 
as i said though your case is more valid and you can justify it more where i think some students study a masters without good reason your university likely has a final grad careers fair coming up in early summer maybe even one specific to stem i'd recommend going and getting the opinions of the tech sector recruiters there - they tend to be very helpful and may be able to give more info in the meantime may as well keep applying and seeing where you get to you raise some good points with regards to work experience the course profile suggests you may be able to undertake a project in collaboration with a private company i'm going to pursue that straight away in the hope of securing that at the very least the other additional thing is that the school has a prominent research group involved in data mining and machine learning so perhaps conducting my research dissertation with that group would offer me some good experience of solving real world problems using stuff i've learned from my course additionally i've noticed another course offered by uea (msc knowledge discovery and data mining) which involves the exact same course profile that i would take in my msc computing science except that it has citp and (partial) ceng accreditation which would obviously help set me aside as well i've touched base with the school to determine if i can get onto that course with a chemistry degree we'll wait and see when i look at data science videos and communal websites all i see is math algorithms and computer code and matrices and more mathy shit why aren’t critical thinking or reasoning skills emphasized valued in this field? 
the way i see it data science can’t be without being amalgamated with the practice of inference from epistemic probability (that's a technical term for critical thinking) in my philosophy book it talks about using statistics as a justification for argument and it brings up a problem where someone is both a republican and a resident in a certain neighborhood the statistics for how likely republicans are to vote is one number but the statistics for how likely residents of that neighborhood are to vote is another number which means you don't know how likely that person is to vote based on the information alone and you need to do some critical thinking to figure out what is going on surely this is a sort of a problem that will arise when data scientists try to gather together numbers to make predictions now imagine you are assigned to predict--- which political candidate will win an election the data that can be collected include-- race physical attractiveness how much people love him how much people hate him how many times he has shown up on tumblr what his age is what country he comes from how high pitched his voice is how many times he has worn a tie what his favorite food is if he has a rubber ducky in his bathtub or not---- thousands upon thousands of factors i fail to see how solving problems with data science can be done without being adept at the practice of inference from epistemic probability (which again is a technical term for critical thinking) can anyone explain this? why when i go to a data science website or a data science video on youtube i all of a sudden just start breathing math? tl;dr mathematics is the highest and most generalizable form of critical thinking well just for the example you mentioned we can calculate the probability of republicans in that area voting one way or the other you thought it required critical thinking and it does! 
but the critical thinking has been codified into mathematics; the best way to solve many of the common problems in data science has been turned into the minimization of some "loss function" data science is all about falling back to best practices such that the need for ad hoc "critical thinking" has been minimized as much as possible because the best ways of approaching the data have been outlined and their benefits and drawbacks (because there are indeed drawbacks to any choice of model or data manipulation!) have been quantified because they are quantified we can use the language of mathematics to explain them clearly for example if i built a model to predict the amount of money someone will spend on my product in the next year (which i just did) and it tends to predict high numbers for high rolling customers but not as high as they really spent and if it tends to predict small numbers for less frequent customers but not as small as what they truly spent the model is not achieving the purpose i built it to do in an acceptable manner! what then? 
well that's just a calibration problem there is a mathematical definition of calibration which i am too lazy to type on my phone but the intuition of the concept is exactly what i wrote above; predictions are larger or smaller than they should be even though they are large when they are meant to be large and small when they are meant to be small miscalibrated models have been studied in detail and methods to adjust them have been outlined in papers we can read because our problem can now be defined in the precise terms of mathematics we can speak a common language and make sure everyone is on the same page this does not mean that there is no "critical thinking" taking place in fact we can see creative and critical thinking in nearly every model we study and every solution to every data science problem miscalibrated models being one of them that is not to say ad hoc "critical thinking" does not take place these past few days i have been figuring out why certain fields had missing values because they were clearly not missing at random (another math concept just crept in!) but i had structured my pipeline such that i did not expect it was possible they were missing the answer lay in the actual workings of the business this is not really something we are interested in talking with each other about because the problem is not generalizable (another math concept of sorts) my problem was specific to my use case and beyond the "missing-at-random" mathematical framework which i used to identify a systemic problem of missing values it was up to me to use "critical thinking" of the ad hoc manner to solve thank you for your response i really appreciate you taking the time to share your experiences i have a few more questions if its too long to read the bolded words summarize it well just for the example you mentioned we can calculate the probability of republicans in that area voting one way or the other is there a way to do so given only the two statistics? 
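The calibration problem described above (predictions that rank customers correctly but are not extreme enough) can be checked with a simple binned comparison. This is only a minimal sketch: the function name `calibration_table` and the spending numbers are made up for illustration, and it is not the formal definition of calibration the commenter alludes to.

```python
from statistics import mean


def calibration_table(predicted, actual, n_bins=2):
    """Sort by prediction, split into roughly equal bins, and return
    (mean predicted, mean actual) for each bin."""
    pairs = sorted(zip(predicted, actual))
    size = len(pairs) // n_bins
    rows = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any leftover rows.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        rows.append((mean(p for p, _ in chunk), mean(a for _, a in chunk)))
    return rows


# Hypothetical yearly spend: the model orders customers correctly but
# predicts too high for small spenders and too low for big spenders.
predicted = [40, 50, 60, 300, 350, 400]
actual = [10, 20, 30, 600, 700, 800]

for mean_pred, mean_act in calibration_table(predicted, actual):
    print(f"mean predicted = {mean_pred:.0f}, mean actual = {mean_act:.0f}")
```

If the two columns diverge in opposite directions at the low and high ends, as they do with these toy numbers, the model is ranking well but is miscalibrated.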
let's say that we know that (1) 80 percent of republicans voted (2) 20 percent of people in wolfwhistle neighborhood voted (3) joe is a republican who lives in wolfwhistle would it be possible for a data scientist to figure out based on this information alone-- how likely it is that joe voted say using a particular formula? we can see creative and critical thinking in nearly every model we study and every solution to every data science problem miscalibrated models being one of them it seems to me here that you are referring to critical thinking creativity in the realm of crafting mathematical models i was referring to critical thinking in a different realm---that of inference from epistemic probability inference from epistemic probability is a fancy ass term but it should be quite intuitive to most people once they are able to access that concept in their intuition so here is a blurb to give an intuitive sense of it: its epistemic probability vs statistical probability statistical probability is reasoning that something will behave in a certain way because of the number of times it has done it before epistemic probability is making assessments on things based on their properties an example of applying epistemic probability would be the sharp-mindedness of what spies or war strategists do to achieve their goals basically reasoning based not on numbers whereas applying statistical probability looks more like collecting data on a notebook working out the statistics manually and then using that information to make assessments if that gives you a good enough sense of what inference from epistemic probability is i would like to ask you as a data scientist do you find that this kind of critical thinking is just as necessary and important as being adept in math? 
i imagine that it would as it seems it would be important that the nature of all factors of concern be properly assessed first if only to speed up the process of finally being able to draw the best line that accommodates all the factors lol did that make any sense at all? my problem was specific to my use case and beyond the "missing-at-random" mathematical framework which i used to identify a systemic problem of missing values it was up to me to use "critical thinking" of the ad hoc manner to solve haha i hardly understood a word of that paragraph but can you further describe what the “ad hoc critical thinking” looked like? tl;dr there is almost always a way especially in data science to insert math into your critical thinking process when possible it should be done is there a way to do so given only the two statistics? assuming that there is nothing special differentiating republicans living in wolfwhistle from republicans we can estimate joe's probability of voting to be 80% * 20% = 16% this assumption is called independence and may or may not be valid depending on the circumstances (btw it probably isn't as different geographic regions are very stratified politically - but a statistical model would not know that a priori; this is where a data scientist's "domain knowledge" or ability to apply epistemic probability (as i read it) comes in) if you know independence does not hold then there is indeed not enough information to answer the question for your next question i think there is a false dichotomy you are making about critical thinking with or without math math is just a tool to make thinking very clear and unambiguous and should always be used when it can without it simple things can become extremely complicated as an illustrative example try describing the outcomes of this physical activity but without using any numbers: person a and person b are of different strengths and are pulling a rope in opposite directions with different percents of their strength 
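The 80% * 20% = 16% estimate for joe is worth spelling out. Multiplying two rates gives the chance that two independent events both happen, which is not quite the same quantity as the chance that joe voted given both facts about him. One common alternative is a naive Bayes style combination, which assumes the two facts are independent given whether the person voted and also needs an overall voting rate. That base rate is not in the thread, so the 50% used below is purely an assumption, and the function names are hypothetical.

```python
def naive_product(p_vote_given_rep, p_vote_given_hood):
    """The straight product of the two rates, as in the reply above."""
    return p_vote_given_rep * p_vote_given_hood


def naive_bayes_vote(p_vote_given_rep, p_vote_given_hood, base_rate):
    """P(voted | republican, lives in the neighborhood), assuming the two
    facts are conditionally independent given voted / did not vote.
    base_rate is the overall share of people who voted (an assumption)."""
    voted = p_vote_given_rep * p_vote_given_hood / base_rate
    stayed_home = (1 - p_vote_given_rep) * (1 - p_vote_given_hood) / (1 - base_rate)
    return voted / (voted + stayed_home)


print(f"straight product:        {naive_product(0.8, 0.2):.2f}")
print(f"naive Bayes, 50% base:   {naive_bayes_vote(0.8, 0.2, 0.5):.2f}")
print(f"naive Bayes, 60% base:   {naive_bayes_vote(0.8, 0.2, 0.6):.2f}")
```

The answer moves with the assumed base rate, which supports the point made in the reply: the two statistics alone do not pin the number down, and some outside knowledge has to be brought in.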
which person will win this tug-of-war game? not easy is it? you know that when person a and person b are of the same strength but person b is using a higher strength percentage then person b will win likewise you know that if person a and person b are trying equally hard but person b is naturally stronger then person b will win but what about when they are both different? what if person a is stronger but person b is trying harder? how do you say that using only the tools of english? in math this is very simple: s1 and s2 are the strengths of person a and person b respectively and p1 and p2 are the percents of the strength they are using then b will win if s1 * p1 < s2 * p2 give me any s1 s2 p1 p2 i have the general solution right here for every choice! it even works when there are negative numbers the problem is solved and any problem which can be translated to that problem is also solved in math when we can translate between problems we call the problems "isomorphic" - they are for all intents and purposes the same thus when it is possible to use math and when it is possible to define things in mathematical terms then it is the best practice to do so in data science we are lucky that most problems we come across are isomorphic to some mathematical definition which can be studied with all the machinery math provides we can see that in disciplines where using math is not possible (or highly difficult) people just get stuck for example in the law people discuss the tradeoff between privacy and security and which is best for society over the last 300 years this discussion has evolved to become enormously complicated factoring in the effects of privacy of companies having all your credit reports terrorism and google tracking your data because we are unable to put a number on this tradeoff no conclusion has been reached for the last 300 years no framework to reach a general solution has been or can be defined they're stuck and always will be! 
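The tug-of-war rule is short enough to write down as code. This is a minimal sketch that reads the rule as "b wins when a's effective pull is smaller than b's"; the function name `b_wins` and the example numbers are made up for illustration.

```python
def b_wins(s1, p1, s2, p2):
    """s1, p1: person a's strength and the fraction of it being used.
    s2, p2: the same for person b.
    Returns True when b's effective pull exceeds a's."""
    return s1 * p1 < s2 * p2


# a is stronger (100 vs 80) but only uses half; b uses three quarters.
print(b_wins(100, 0.5, 80, 0.75))

# Same strengths, but now a tries much harder than before.
print(b_wins(100, 0.9, 80, 0.75))
```

The hard-to-say-in-english case (a is stronger, b is trying harder) reduces to comparing two products, which is the point of the example.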
meanwhile the people who were thinking about tug-of-war 300 years ago can now tell you how much fuel is needed to hurl the lunk of metal they just engineered into space i am not trying to make an us v them mentality here - i am merely illustrating the consequences of not using math when we cannot define things in math every new complication forces us to reexamine the problem anew for example how will we address privacy when the government can decipher encrypted data? examples of "epistemic probability" pop up in machine learning quite frequently most of the ideas from there do not in fact stem from some mathematical derivation (though strong mathematical maturity is required to understand the idea) but come from some crazy intuition which just happens to work this is why machine learning has been likened to "a poorly understood pile of linear algebra" and "alchemy"; without math we do not have the clear grasp of what we are doing that we in the field have come to enjoy and expect i've made my point but just for fun i will use your war strategist as a further example of how critical thinking can use instinct or non-numeric logic or how it can use logic and then back it up with math an experienced and ingenious general in ancient times takes to the field with his army chased by forces superior in numbers but inferior in speed he comes across a wild marsh he knows the marsh can't possibly be good for his horses; it's too slippery! 
but this is the type of man who sees opportunity in adversity he orders his infantry to stay and lead the enemy on a wild goose chase where all sides are bogged down in the marsh while his cavalry loops around the marsh to deal a devastating strike on the enemy supply lines he knows his cavalry is up for the job; since they've been riding alongside the infantry they've been rested well and the enemy forces must be spread out across a wide area since they've been chasing him for so long the faster ones must be in front and the slower ones lag behind in the back it's a success! when asked how he did such a brilliant move (look i'm not a genius general) he says "the enemy general thought my retreat was a rout and in his overconfidence he let his forces get spread too thin rookie mistake!" now another experienced general takes to the field but this time he cites data again he comes across the marsh "i recorded the average speed of horses when they ride in this sort of mud " he says shaking his head "they are 30% slower and the injury rate on the horses is 3 times higher due to the slipperiness of the mud they can't possibly ride in here!" he starts to think about where instead to send them "well given the fact that we have been chased for so long the enemy forces must be spread out across a wide area in fact this must be true because according to my reports the number of enemies my men in the back face each time they sortie have slightly but steadily decreased each day for the last four days the horsemen could do an excellent job riding past the enemy vanguard and slaughter their way through the disorganized back line " he knows his horsemen still have the energy for the job because they have had three good meals a day since the battle began and he has measured the horses and know that they can ride at speed for up to 16 hours a day for 3 days before they fatigue it's a success! 
when asked how he came up with such a brilliant move he says "the enemy general let his forces get spread too thin in all the battles i've experienced and in the battles i've studied not once did that turn out well " so what is the difference between these two generals? one assessed the situation the other assessed the situation but he cited quantitative evidence can it be said that if he cited numbers he is "not critically thinking"? of course not! both generals used the exact same reasoning each time! the second just tacked on some numbers and can it be said that because the first general did not cite numbers he was not thinking mathematically? strong arguments could be made that he was doing so unconsciously people in machine learning might even say he has trained his mind to approximate a mathematical decision-making surface which determines his choices in the world of data science we are lucky that such translations are always possible and because they are it is a best practice to cite numbers when we can this way we do not have to rely on "instinct" like the first general and can speak a clear and common language (moreover we no longer have to be a genius like the first general; someone like me can read the logic of the second general and appropriate it for ourself most importantly we know when we should use their logic its strengths and weaknesses because we know exactly how they reached it!) [deleted] thanks and done! 
i enjoyed my philosophy class because it focused on a very normative topic - the law the practice of philosophy as a whole does its best work in those areas rather than when it ventures into discussing how things physically are and a true discussion of why 1+1=2 is probably only possible with a discussion of the philosophy of the axioms of math which is totally out of the scope of an introductory philosophy course i would venture that most data science websites and videos are focused on teaching skills and technique whereas critical thinking depends heavily on domain knowledge and to some degree is not something that can be readily taught of course there is a discussion to be had about the importance of interpretability in making inferences using data when machines are involved i think there is a degree of signalling bias when it comes to essential skills for data science because all things being equal a fancy machine learning approach to a problem is more likely to be published and covered than a simpler more empirical approach to solving the same problem in short i guess critical thinking is more specific to a problem or domain and math should be the same for everything so you could talk about what datapoints are relevant to estimate the chance for trump winning the presidential election but this would more or less be only helpful in general for elections and maybe some thoughts are only relevant for this case so math is more evenly applied and problems faced by critical thinking are specific to the case maybe if you look into other subreddits that have domain knowledge a discussion there could take place okay i am going to restate what you just said just so i can make sure i got all your points it seems to me like you are saying that (1) critical thinking skills are in fact heavily embedded in the practice of data science however (2) its not like you can put an inventory of critical thinking skills algorithms online but for math you can which is why communal 
websites for data scientists are just a stock full inventory of math algorithms https: gallery cortanaintelligence com are these points correct? and if so may i ask do you reach these conclusions by speculation or do you draw from experience from being a data scientist yourself? i'm currently researching in the field of neural networks i worked on some data sets in the past for personal amusement so i might not have a great enough picture but with my current experience i'd say that critical thinking is mostly tied to domain knowledge one could compile a record of abstract methodologies for critical analysis of data but that would only be somewhat helpful i guess for example if you say a data set should be cleaned and nonrelevant parts can be discarded the decision to tell relevant from irrelevant parts is a highly individual problem that can't be solved by a tutorial that is generally on data science edit: another funny example would be if you have a data set of math test scores from children you may find that shoesize positively correlates with the test score but correlation does not mean causation on second look you see that the age of the children varies but the test is for everyone the same so younger children have a clear disadvantage compared to older ones whenever you conclude something from data you have to question whether this makes sense you can list tutorials on mathematical topics however those lists usually don’t go very deep into the topic unless it is from a textbook or why it is relevant to your business question there’s always a deeper interpretation of the mathematics which could help you understand the data or even how to solve the business problem i e our principal components are different than our non negative factorization components; what does that mean and what will each mean when showing difference similarities between two points with high dimensions? 
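The shoe size and test score example above can be reproduced with a few lines of toy data. This is only an illustrative sketch: the children, shoe sizes, and scores are invented so that age drives both variables, and `pearson` is a hand-written helper rather than a library call.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: (age, shoe size, test score). Both shoe size and score
# grow with age; within one age group they are unrelated by construction.
children = [
    (age, age + shoe_offset, 10 * age + score_noise)
    for age in range(6, 11)
    for shoe_offset, score_noise in [(-1, 1), (0, -2), (1, 1)]
]

shoes = [shoe for _, shoe, _ in children]
scores = [score for _, _, score in children]
print(f"all ages together: r = {pearson(shoes, scores):.2f}")

# Holding age fixed removes the apparent relationship.
eight_year_olds = [(shoe, score) for age, shoe, score in children if age == 8]
r_within = pearson([s for s, _ in eight_year_olds], [t for _, t in eight_year_olds])
print(f"eight-year-olds only: r = {r_within:.2f}")
```

The strong overall correlation comes entirely from age, which is why the question "does this make sense?" has to be asked before drawing a conclusion from the number.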
mathematics statistics is critical thinking data scientists will have to take complicated sometimes rigorous mathematics and translate how that is relevant for the question at hand while also being able to convince other people with many different levels of mathematical understanding (woops i replied with wrong account for that this is op haha) the trend you've described is a symptom of the pseudo-bubble that data science is experiencing with being one of the three most recent buzzwords and having all the moocs and online masters programs taking off the field has been inundated with underqualified incompetent "talent" a sizeable portion of which come from non-stem backgrounds critical thinking is a skill which needs to be taught learned practiced but is significantly more subjective than many other skills required in the field it's much easier to write down or explain some math to a math-light reader or teach basic programming syntax than start at the first principles of logic critical thinking and the scientific method for data science "content creators" the easy pickings are in the 5-minute read webpages or 5-10 minute youtube videos teaching critical thinking is much more suited to a conversational approach rather than a presentation lecture or piece-to-camera this doesn't translate well to web-based resources sure you can casually mention critical thinking but you're never even gonna break the surface in a short video additionally you'll probably find that a large proportion of the content creators in the ds arena are early-stage themselves and likely haven't yet honed in on the most relevant parts of effective data science but rather are sharing their journey with you fwiw the higher-ups in the data science world have top notch critical thinking (and particularly lateral thinking) ability - indeed this is where they're likely to spend most of their time think smarter implement better code less (quantity - more quality) a lot of entry level early stage "data 
scientists" are really data analysts and are still at the "code monkey" (or even "glorified excel monkey") stage where sure a bit of critical thinking will make your work more robust but if you're doing what a senior ds is telling you to do they're the ones doing the heavy brain work not you there's another post on the front page about supply and demand of talent and it being hard to find competent data scientists this is very true there are so many people in my area who're calling themselves data scientists because they did a mooc and actively blogging vlogging drawing others in to the same position - hot ticket $$$ etc realistically it's another couple of years before non-technical ctos stop hiring data scientists because it's "the next big thing" and realise it's only useful if you've the structure resources and ability to both garner actionable insights and actually act on them 75%+ of the dses i talked to at the last conference i was at expressed irritation at their company for not implementing their suggestions or not providing them with quality data or sticking them in the back office or xyz they've really only got their job because someone (cto or chief hr person) caught a whiff of the data science buzz and said "i'll get me one of those" not because the company needs one anecdotally entry level pay here is really good vs the effort a digital native needs to complete a mooc (compared with a similar time investment of say mortgage advisors) and it'll be a good 12-18 months before anyone at their company could realise they don't add any realistic value to the business this is gonna burst eventually then those types of people will move on to the next big thing and the data scientists left around will be earning their paychecks by thinking hard (and well) about their problems before jumping in to the code thank you so much for your answer! 
this is good insight and information hello all i'm looking for sources and tips to improve my skills on stuff like using docker working in linux shell scripting aws cli git etc that are not directly used for modelling and stuff but are still operational needs would be nice if there are tutorials specifically for data science purposes but i would still appreciate it if it's not ps: don't know whether "devops" is the right umbrella term for these but felt right enough thanks! the dataquest blog has a few introductory posts that you may like? https: www dataquest io blog docker-data-science https: www dataquest io blog digitalocean-docker-data-science looks good! thanks! the devops engineer linked this on our slack and i thought it was super helpful i'm trying to do the same thing as you it's definitely one of my weaker areas this looks great thanks! ss64 is my favorite reference for unix shell commands i usually go there first when i have to look something up for a command first things first 2 or 3? chris here: 3 ftw! excellent! i see you're also going to be using pandas which is nice ♥ pandas! for those not in on this joke explain it? there’s a big team behind this university of michigan coursera specialization and we want to share with you what we’re doing to bring applied data science and python skills to everyone! from pedagogy and technology through to curricular design and content please feel free to ask us anything! want to know why we think python is great for data science? or what it takes to put a mooc together? 
christopher brooks is faculty in the university of michigan school of information and does research in learning analytics and educational technologies such as predictive models of student success kevyn collins-thompson is faculty in the university of michigan school of information and does research in information retrieval and text analysis daniel romero is faculty in the university of michigan school of information and does research in networks and complex systems v g vinod vydiswaran is faculty in the university of michigan medical school and the school of information and does research in text mining and natural language processing such as mining health information from patient records and social media in addition to the faculty we are joined by our coordinators stephanie haley and course tutorial assistant filip jankovic! here’s the course we are planning to teach! coursera org specializations data-science-python here's our proof! http: i imgur com dxaa0f2 jpg where do you see this falling within the spectrum of data science education? do you look at this as an alternative post-bs data science? a more casual introduction to the data science skills? (we're all chiming in on this one!) our target learner is one who has a bit of a background in programming and statistics - maybe a keen senior undergraduate student the focus is on skills not so much theory though of course enough is dealt with to cement knowledge of concepts someone pursuing a graduate degree in for example psychology or economics and some introductory programming skills would find this challenging but attainable someone with a cs and stats background might depending on the details of their background - lots of undergraduate students even in fields like computer science can finish without machine learning network analysis and text mining so there's something in here for them too! 
and we're hoping to support learners who did do cs stats in the past and are looking to change focus in jobs what do you believe is a reasonable expectation for roi for a paid data science specialization on coursera? how does this compare to the expectations for other certification paths (e g post-graduate certificates)? today has been filled with insightful conversations around data science and python - thanks to all who participated in this ama through posing questions and sharing their thoughts! if we haven’t gotten to your question yet we apologize and will try to circle back on it soon for those interested in learning more about our work check out our applied data science with python specialization on coursera: https: www coursera org specializations data-science-python today has been filled with insightful conversations around data science and pytho you literally answered one question here i don't see how your conclusion follows with any legitimacy at all hi chris here sorry we messed up a bit with the cross posting in iama we answered many more and this sign off post was supposed to reference that discussion too! here's a link to the rest of the discussion: https: www reddit com r iama comments 4zch0a we_are_a_faculty_team_aiming_to_bring_applied ?st=is8xph1r&sh=259f9116 i guess the first question should have been are you going to bail at the last minute? 
pretty disappointing especially to only find out at 8pm the night the first course was supposed to start it doesn't come off as a very professional move at the very least hi guys long time lurker first time poster here i just got my first full time job and it's as a crm analyst basically what i do is i'm the admin of a lot of tools the company has to find whitespace and enrichment of data read: i've always got to be clever in the way i pull lists i'm also tasked with filling in whitespace on my own and let's just say it's a lot to do by hand i love my job and they didn't throw me a shit assignment to be mean they knew i could innovate i was wondering if there are skills i need to be brushing up on at home we use qlik view at work and obviously excel and i'm the only one with some r and python experience qlik is a pain in the ass but i found some udemy courses that will help i turned my hobby into a job and now i'm in a rut of what i need to be doing to stay sharp and above the fray anyway all help is appreciated tldr; got my first data science job finding white space i'm tired of going home with nothing to do help me out please? i was wondering if there are skills i need to be brushing up on at home netflixing i'm half-serious don't burn yourself out oh man there is so much cool stuff to learn under the umbrella of data science you could fill the time forever! 
probably the biggest thing is to get better at programming especially python the more coding projects you do on the side the better until you get really proficient at it after that machine learning andrew ng's course is really good from what i hear after those two core things there is a whole world of other less important topics natural language processing is big neural nets are a hot topic right now if you're considering getting really hardcore about machine learning time series analysis is under-appreciated personally i'm really big on probability i wrote [a book on data science](amazon com data-science-handbook-field-cady dp 1119092949) that covers a lot of topics - it's free to just look at the table of contents if you want to use it for ideas thanks man do you think using nlp or some python web scraping stuff will be useful in my line of work? my guess is that you won't use either in this particular job but they're used in a lot of different data science jobs and in any case they are interesting and would help you build other skills mmm i'm going to assume you posted to know what you should be learning to make you better at your job (rather than unrelated concepts "for fun") and go against the grain here and say it's probably currently a bad use of your time to be learning more advanced topics like machine learning (supervised and unsupervised) time-series analysis and the like these concepts are more or less useful for big data science and demand some knowledge of statistics and math that you may or may not have to not only understand but to apply appropriately (a few years of stats including bayesian and inferential stats math through calculus and linear algebra if not more) you can certainly apply these techniques to your data but whether or not you've satisfied their assumptions and how to interpret their results is predicated on your knowledge on when or how they are to be used and so you may get some results but whether or not they are significant 
valid and actionable are suspect unless you have the background (and it sounds like no one is there with enough knowledge to tell you otherwise) when you're given a hammer (statistical technique) everything starts looking like a nail just because you can do something doesn't mean you should here's what i think you should learn: 1 python packages appropriate for the data you're working with: pandas instead of excel for analysis reportlab for generating automated pdf reports numpy for general data use etc 2 the ins and outs of the current programs you're using with regards to their apis 3 good coding practices more advanced oop documentation and version control like git 4 more advanced topics in python programming depending on your current skill level automate the boring stuff is good for beginners to intermediate and is great for automating office work fluent python is great if you're comfortable at coding these four topics will not only make you more useful in your current position they will help you be an attractive candidate for future data science positions i will once again caution against learning things like machine learning and other statistical techniques unless you are willing to develop those ideas rigorously (possibly through a masters degree in stats or math instead of a few coursera classes) https: www youtube com watch?v=_oemnp6hgx4 this is what i needed thank you very very much looking for some advice on potential weak areas that i have so i can address them early background on me: meng mechanical engineering graduate from top 10 uk university significant experience with matlab for numerical simulations experience with java matlab python vba for last 2 years at work mainly for simple post-processing of engineering analysis recently completed most of the content on codeacademy decided that i was interested in data science so i began this course on udemy in my spare time 
my current thoughts are that the coding side will be fine i'm very comfortable with python so far but i think some of the advanced statistics knowledge and 'business' thinking will need to be gained from other sources is there anything else that will benefit my learning eventual job application? if you are anything like me you are really good at understanding the underlying math behind most algorithms work on your stats that is super important statistical modelling distributions hypothesis testing etc etc maybe learn these skills concurrently with r how is your database knowledge? business skills are developed on the job mostly i suppose you could take a few business courses but that won't give you the full suite of skills some things you have to gain from experience from the way you worded that it seems you have a similar background? yeah i can quite quickly work out what existing code does need some practice at creating my own algorithms etc though can you recommend any courses resources for the statistics side of things? the only database knowledge i have is the codeacademy course on sql i'm planning to do some of the 'competitions' on kaggle when i'm a bit further along i guess this will give me the experience on which variables to correlate what's useful for building ml models etc many thanks for your reply i did my masters in applied math so it is similar in so far as math is concerned there are a lot of good stats books out there pick one and read it or watch brian caffo's intro to bio stats sql is probably the most widely used db code academy is a little sparse on sql tho maybe try sql zoo it's encouraging to hear from someone that took a similar path! 
i'll definitely give sql zoo a look after i'm done with my python course i agree that the sql in codeacademy is very basic thanks again stats - all of statistics wasserman prob - probability and random processes grimmet ml theory - prml bishop or brml barber throw in a good review of linear algebra and multivariate calculus and your golden so called 'business skills' are terribly defined and are literally the least important aspect imo any company worth their salt will want competent people and will happily help them transition into the domain if you can solve an svm by hand or code backprop from scratch there is practically no domain you couldnt pick up so called 'business skills' are terribly defined and are literally the least important aspect imo any company worth their salt will want competent people and will happily help them transition into the domain pretty much no one means domain knowledge when they say 'business skills' they (largely) mean being able to translate business problems into ones they can use their skill set to solve being able to communicate findings and being able to think strategically as well as tactically if you can solve an svm by hand or code backprop from scratch there is practically no domain you couldnt pick up the last thing i would ask a potential hire is 'can you code backprop from scratch' well i think domain knowledge encompasses the skills and ability you mention surely for most people the barrier is not understanding the dynamics of the underlying problem - because its formulated by non tech people etc all of this requires experience and knowledge re the second point we obviously sit at different ends of the data science spectrum my interview for my current firm asked things like deriving the update rules for factored distributions in variational bayes and doing backprop on a basic neural net by hand well i think domain knowledge encompasses the skills and ability you mention if they're applied across all domains then it seems 
odd to say they're domain knowledge skills their specific application may have domain specific nuance though so i guess we're playing a semantic game re the second point we obviously sit at different ends of the data science spectrum my interview for my current firm asked things like deriving the update rules for factored distributions in variational bayes and doing backprop on a basic neural net by hand fair enough thanks for the suggestions and it's good to hear your take on the business skills what the most important factors are i've come across svm in some reading on kaggle but not backprop it's still early days on my python data science machine learning course so when i do come across them properly i'll be sure to try and deconstruct them instead of just using pre-built functions hi all i'm looking for a good online program to take to develop my analytical and data science skills does anyone know of any good programs? thanks! uwash berkley harvard stanford north western northwestern - ms in predictive analytics program i'm in my 4th quarter and i'm loving it it builds a solid foundation in statistics machine learning predictive modeling etc you'll learn python r and sas for building predictive models http: sps northwestern edu program-areas graduate predictive-analytics www google com sorry but you really should at least do that before posting here anyways berkeley and smu both have online masters programs in data science read papers start a github learn python learn r learn sql compete in kaggle something something rinse repeat learn python and sql don't waste your time with r unless you're already familiar with it - python is better and more popular i strongly recommend reading a book on it also to give you a broad overview i will put in a plug for the one i wrote but in truth there's a lot of good ones out there good luck!! 
it's harder to get in data science without at least a master's degree possible but more difficult to get in independently you really just need to brush up on your statistics and programming skills also some tertiary skills like github github flow agile development (check out jira) etc you blend the skills of a statistician or machine learning expert and a software developer to prove you can do the work you need to start building out a portfolio of analyses using data sets you can find or are offered e g on kaggle there is a lot of information you can find on this topic i'd check kdnuggets polls to figure out what is used keep in mind that different people use different things according to what they want to do thanks! i’ll check that out googling perhaps well i know but i just wanted to make sure if my google search matches what other ppl are actually working on :) i'm a web engineer and sometimes do work with our bi team this team provides reporting to managers and marketing teams about web analytics this involves normalizing data from many sources visualizations spreadsheets and so on the architecture is spread across multiple databases and data is shuffled and updated between systems routinely we have enough data volume to use hadoop and we also use graph databases for things like predictive analytics where is the line drawn between bi and data science? at what point does one 'graduate' from bi to data scientist? how does pay differ between the two roles? 
really the biggest difference is on the math and the direction you are looking bi is all about describing a problem and reporting on it using (at most) descriptive statistics for example designing a report that tells how many calls were taken by a specific call center on a given day while not simple it is usually math that a reasonably sharp business person can understand also bi is typically backward looking it is reporting on what happened yesterday or last week there are a lot of well defined tools and certifications for bi data science is typically forward looking with more prediction involved the math is much more complex involving statistics probability simulation and calculus an example might be to predict how many calls will come in to multiple call centers so that you can have the optimal staffing levels the math is often things that you have to explain to people without formal training it is building models that can accurately predict what will happen tomorrow or next week this is an emergent field with much newer tools and very few certifications i don't know that you graduate from bi to data science without a lot of work it isn't impossible but it isn't something that can be done with a few weeks of study it really takes the same shift in problem solving that is needed to move from using simple descriptive statistics to mastering things like regression analysis pay is much better in data science i feel primarily because there are so few people who do it a good senior bi person should make in the ball park of a typical data scientist but there will be many many more applicants for that job thanks for the clear and concise response to this question as my previous training (finance) and current career (accounting) are not too math intensive i've been trying to break into the bi route i prefer self study rather than extra schooling considering my already abundant student debt can you please briefly list some of the more important well defined 
certifications for bi i should pursue to break into the field? thanks! i would say that you might be surprised at how appropriate your background is for a career in bi in large part bi is doing reporting around business operations and profitability an area where you probably have strong domain expertise i have seen people who work in bi and focus on general ledger operations do quite well especially if they have the respect of their executive team in terms of skills the most wide-ranging areas to focus on are sql and excel don't get me wrong there are a ton of different bi tools out there however most corporate data used for reporting lives in databases and looks to be there for a long time sql proficiency is a must you can download a fully functioning sql database from a vendor or an open source project i would say that oracle and sql server are probably the most popular databases to have on your resume for bi you can learn many of the same things with open source databases but you don't see them in industry nearly as much for bi specific tools the really hot ones right now seem to be tableau and qlikview however tools like sap business objects and microstrategy have been around a really long time and have large following also oracle financials are a near standard which makes obiee worth considering there are also the open-source routes with tools like pentaho and talend where you can download and run it on your desktop at no cost excel may be poo-poohed upon by techies but it is in wide use around the industry with many small companies using it as their primary analytical tool as for certifications i find that they have not been of great value to me in the technology field in 2005 the programmer with all the java certs wearing the java hat and the denim java shirt drinking coffee from their java mug was typically not the one who survived layoffs each bi stack has its own certifications you would probably get the most from oracle or sql server related certs also 
if you want to go into data science learning sas bi might be an option although it is very expensive to do and the software is hard to come by there are some vendor agnostic certs like cbip but i haven't seen them make a big difference in people's careers if you are really concerned about the education i might consider an inexpensive continuing education certificate from a real university (like this one these should not be more than $1000 usd unless they are something really special the thing to understand about hiring is that you have to be able to convince the person that you can do the job not just get the job the more you know about the company hiring manager job and expectations the less certifications matter i have found it was much easier to communicate my value when working with friends or friends of friends or startups staffed with real humans working with large companies can be really difficult blindly sending out resumes is a real challenge as well people are typically evaluated on a combination of technical aptitude domain experience and upside potential to the employer you will be a stronger candidate for accounting or finance-oriented job or doing consulting that spanned both bi and finance in a nutshell i think you have a strong background for bi especially in a finance-oriented capacity the industry has a lot of technically trained people but that shouldn't stop you you can learn bi on your own build your skills network and break into the industry there simply aren't degrees in 'data' (at least not until very recently) so everyone who is working in the field started somewhere else and had to learn from the ground up cheers! 
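to make the bi vs data science distinction from this thread a bit more concrete, here is a rough sketch (not anyone's production setup) using only python's built-in sqlite3 module. the table name `calls`, its columns, and every row are made up for illustration. the sql query is the backward-looking "how many calls did each call center take on a given day" report; the `naive_forecast` helper is a deliberately crude stand-in for the forward-looking side and is nowhere near a real staffing model.

```python
import sqlite3

# Hypothetical call log: (call_center, call_date). All rows are invented.
rows = [
    ("north", "2024-01-01"), ("north", "2024-01-01"), ("south", "2024-01-01"),
    ("north", "2024-01-02"), ("south", "2024-01-02"), ("south", "2024-01-02"),
    ("south", "2024-01-02"), ("north", "2024-01-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_center TEXT, call_date TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", rows)


def calls_per_day(conn):
    """Descriptive, backward-looking report: calls per center per day."""
    query = """
        SELECT call_center, call_date, COUNT(*) AS n_calls
        FROM calls
        GROUP BY call_center, call_date
        ORDER BY call_center, call_date
    """
    return conn.execute(query).fetchall()


def naive_forecast(daily_counts, window=2):
    """Crude forward-looking guess: mean of the last `window` daily counts."""
    recent = daily_counts[-window:]
    return sum(recent) / len(recent)


report = calls_per_day(conn)
for center, day, n_calls in report:
    print(center, day, n_calls)

# Daily counts for one center, in date order, fed to the toy forecast.
south_counts = [n for center, _, n in report if center == "south"]
print("naive next-day guess for south:", naive_forecast(south_counts))
```

the report part is plain sql plus a group by, which is why sql keeps coming up as the first thing to learn for bi. the forecast part is where regression, time series and the rest of the stats background would replace the moving average.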
business intelligence tools: business intelligence tools are a type of application software designed to retrieve analyze and report data for business intelligence the tools generally read data that have been previously stored often though not necessarily in a data warehouse or data mart you are awesome thank you for going beyond my question and linking examples as well as explaining more about what is valuable what might not be beneficial and showing me a better picture of how i may fit into the bi world before i left banking i got a glimpse of obiee (there was an intranet portal dedicated to teaching some basics) and always work in excel my short term goals were to learn more vba and work toward sql certification to break into more of a bi role you have given me a better framework in which to focus and it's much appreciated yes thank you for the detailed response this is very helpful! business intelligence is about building reports data science is about building models i like this answer! we have enough data volume to use hadoop which is how much? and we also use graph databases for things like predictive analytics like teradata? neo4j? where is the line drawn between bi and data science? at what point does one 'graduate' from bi to data scientist? how does pay differ between the two roles? 
any answer to this will be mostly made up as no one really agrees the difference is the title 'data science' is a catch all i would say data science captures bi in the same way bi is captured in 'business analysis' people who act like predictive bi = data science and descriptive data science = bi don't know what they are talking about in the same light: hadoop doesn't mean data science and excel doesn't mean bi ba people who act like data science is precisely defined have absolutely no idea what they are talking about there are two schools of thought on that i think the first essentially is difference in tools bi teams work with tools that are optimised for small to medium data sets (qlikview tableau or similar) while data scientists are working with tools for big data sets (hadoop spark or similar) the second is the idea of a data pipeline whereby the data engineers capture and process the data the data scientists verify and aggregate the data and the business analyst (bi) then interprets the data in the context of the business this is how we handle it in my company as well as companies like king or facebook afaik not sure about pay difference all depends on skill set and experience you're getting some down votes but you're certainly not wrong i have seen this work flow before where the end user is the bi analyst thanks yeah sometimes people on r datascience take the very narrow view of data science (it's only about predictions!) or a very broad one (it's machine learning + engineering + modelling + analysis + everything) but the actual work in most companies looks very different this is also a good answer thank you! hi sheldon!!!! yay internet! 
i'm a first year phd student in big data stream analysis it's now time to submit my proposal however i have not yet decided on what to focus my undergrad was in mathematics so i'm pretty confident in that area and my masters was in computer science - just basic programming skills along with some cool introductory machine learning modules i know i want to work in industry as a data scientist as soon as i graduate and for this reason i want to know what kind of skills i need to develop during my phd from what i've seen so far the most popular are deep learning (mainly theano) spark hadoop rapidminer and python so you guys are more experienced than me what kind of skills would you like your future starter co-worker to have? given your background best thing you can do is get experience working on problems and communicating results communication of insights and communication to fully understand the needs of the people you're working with are the biggest challenges i've seen facing aspiring data scientists while it's fun to learn new tools at the end of the day were in the business of helping people solve problems if you don't have the skills to do that all the tools in the world are worthless [deleted] i want to clarify that when you say "business logic" you mean in the normal-person meaning how things make sense for the business and not the software analyst meaning in which "business logic" is part of a software design specification thank you for your reply i appreciate it what you're saying makes sense to me although i've never worked in a big company and that's indeed an area i must improve on but what about the actual theoretical knowledge and programming frameworks and such? are they not so important since i can always learn whatever i need? 
i'm asking because i don't want to waste 3 years of my time learning tools and methodologies that won't be useful to me once i graduate tools and methodologies are of course important most people you meet will assume you learned the technical skills during your classes the research portion of your phd should also teach you how to well do research a good researcher is able to explain results to someone outside the field like a business manager [deleted] i see your point maybe i'm overthinking this i just wanted to be sure i'm not doing what i'm doing in vain because i'm sure i won't be staying in academia many thanks! sounds like a blast if you're dead keen on getting a job in industry i would say you should be looking at doing a phd project funded cosupervised by an industry partner -- you'll get hands-on experience working with production data you'll be solving real problems (and making real money) for a company you'll probably be paid more than if you were doing your phd just at a uni and you'll practically be guaranteed a job after here's the first thing that came up when i googled "data science industry phd scholarships" there's tons available and if you're degree is already funded this will open up your options even more just pick a company or industry you want to work in on a side note i'm really sad that things are going this way it's practically impossible to get a research job in pure research (my phd was in pure maths) unless you're basically a fields medalist well it's too late now i'm already studying towards a phd and it's not funded by a company to my defense i did try to get into a phd in the university of amsterdam which was funded by philips but i got rejected which was surprising really because i had the qualifications and a solid background in neural networks while the theme of the project was deep learning well i can't have it all! 
in any case thanks for your input a skill expertise that's relatively difficult to find is being able to rapidly make sense of filthy real world data that you didn't create yourself this doesn't look worth it to me when there are so many mooc's and online resources i wouldn't pay for a data science course i've heard that most employers require some kind of advanced degree otherwise they won't even consider you that sounds true but i wouldn't do the advanced degree in data science the field is still relatively new and i think a traditional masters or doctoral degree in fields like applied math comp sci or stats would do you best furthermore these degrees are a fuck ton cheaper than $25k year but who am i to say i've been wondering about this sort of question myself yes there are mooc's out there and a person might be able to recreate the same learning without attending college but how do you prove your knowledge to employers? i'm frustrated with the academic system after many semesters in college i feel like most of what i've learned has been through burying my head in a textbook for hours and very little from the "instruction " (especially in technical classes the humanities have usually had helpful instruction in my experience) i feel like if you have a relevent undergrad degree then the masters isn't as important however in my case i will be graduating with a ba in economics now i know that i can learn the material on my own but i feel like the networking and the fact that i have a degree on my resume may be worth the price tag as far as masters programs go this one is pretty cheap and from a decent school i'm really hoping to find someone with some experience in the field to give some advice i will be getting my phd in the research evaluation measurement and statistics (rems) program within my school's educational psychology department most of my classes are statistics-related and all of my research surrounds statistical learning but am i out of luck because of 
what my degree will be in? thank you for your help! i think the answer depends my concern with your degree is that you might be to focused on linear models and spss what analysis tools do you use? what programming languages do you use? i use r daily for data cleaning basic and advanced statistics data mining the degree itself is essentially a quantitative psychology degree some of the other skills i've seen recommended (e g python and sql) i don't really have experience with but am planning on learning soon our school offers a data science masters and i'm planning on taking some of those classes to fill in gaps of knowledge r is good definitely take the classes in data science yes but it's difficult i'm in this position and it's hard to get people to listen to me most companies ignore me they will ignore you too many data science positions are currently filled by people from the physical sciences because they do so much empirical analysis however i've heard discussions from managers hiring about not liking certain candidates because while they were very strong in stats they were not able to do the cs side of things data science is not all about statistical analysis there's a lot of data wrangling you may need to implement some ml algorithms or stats models for specific problems if you don't have the cs background to do that you'll probably have a harder time getting a job i'm likely not the best example as i have leadership experience at fortune 100 companies but my phd was in industrial organizational psychology it's very heavy quantitative focus (due to hiring surveys must be unbiased to title vii or else companies can be sued) so i took statistics classes in my department education department and statistics department i rapidly went from a data scientist role (recruited right out of grad school on linkedin) to a ton of sr data science offers to a director of analytics offer so yes it's feasible but there's a ton of factors out there it's up to you to have 
acquired the full skill set and then convincing recruiters hr that you know your stuff i did i quit halfway through a phd in ecology to move to industry i focused on the healthcare space worked for a pharmaceutical contractor and now work for a medicare anti-fraud firm be prepared to tell a story at your interviews and really work to tie your skill sets to what they need i would not expect so - which is to say your application won't be thrown out because of your degree and it may very well be a positive lots of folks i've met who work as data scientists have degrees in fields more distant from statistics than yours - physics chemistry engineering etc fields where statistics is needed but is not the theoretical focus - and people with those degrees routinely get jobs as data scientists i would be ready to explain how your skillset is geared towards data science to any prospective employers as they may be unfamiliar with the courses you took as long as you can pitch your skills to them and pass the technical questions on interviews i would think that it won't be an issue for you hi i'd love to write a blog post for anyone interested in getting into data science and wanted to ask fellow data scientists how did you get into this career? where did you start right after graduation? what did you study? did you take any additional courses? and most importantly - which skills do you think are the most important for someone to have or gain if they want to be a data scientist? 
thanks angelina angelina you might take a look at http: www datascience university skills to start maybe also http: www datascience university people good luck and best wishes with the project oodles and oodles of subject matter expertise and the curiosity to learn about new technologies plus enough understanding of logic and math to actually implement those technologies to me the most important thing to remember as a data scientist or really anybody concerned with probabilities is that most things are unexceptional extraordinary claims require extraordinary evidence keeping this in mind helps you stay objective and on target to actually answer the business question at hand ml some stats class and a couple of other moocs on coursera and it was enough to pass an interview back then i had 3+ years of experience as a java developer how long ago was the interview? slightly less than 2 years thanks everyone! that's great to hear i really want to show people that a lot of people already have the necessary skills to go into data science and that data science isn't all the same - that it varies greatly depending on the type of industry and job and that different skills are needed for different industries! was data science harder than biology or marketing? referred here by the computer science subreddit because it turns out my interest was more in data science than computer science short background: near graduation with nothing but research as primary skill but i'm done short by data processing tools rather than moping around trying to find what works with my failures i'd like to learn programming to bolster my only skill what are some programming skills (languages frameworks) that would help me with big data tools (jaspersoft bi suite talend open studio skytree server tableau splunk etc) and traditional data tools (hadoop mongodb cassandra redis riak couchdb etc)? 
my goal is to get into the data science career since i'm nearly graduating i've nothing to do but work and learn new skills thank you for your time! r python java julia thank you! i've never heard of julia though what is it about? it is a high level functional programming language you can read more about it here http: julialang org yes i think some job posts ask for it just to appear more appealing but if you don't know it it doesn't rule you out of any position analytic skills math skills understanding of causality - it's more important than any particular tool i would start with: 1 sql - it's a standard 2 bash python (or perl) very often you need to do batch work so some programming skills are always handy and you will be more flexible in case you change your mind in the future so i'm a grad with limited knowledge of stats coding and data analysis (hard science major) was recently employed as a data analyst i have lots of motivation for getting into data science mining what would be the best (fastest) route into data science? would it be to try and find internships that match the skills i want as closely as possible? would it be wiser to try and find a software engineering role grad scheme and start learning stats data mining techniques in my own time? it's easier for me to get back into a data analyst role so perhaps that would be a good idea? but i question whether that would give me enough technical skill for data science here is an article that analyzes the main paths for becoming a data scientist today: a master's degree a bootcamp or taking online courses on your own time it goes over the differences in experience cost time advantages and disadvantages http: datascopeanalytics com what-we-think 2014 08 04 how-do-i-become-a-data-scientist-an-evaluation-of-3-alternatives i hope that helps you make up your mind interesting link but are you suggesting that job hopping to acquire the different skills is not a good idea? 
try to leverage yourself as a data analyst who can code by positioning yourself closer to your software team (i e munging logs automating data analysis with python scripts) then write a business case for data science for your business and present it to the highest position in the company you can if that doesn't work i'm sure their competitors would be interested interesting thanks for the response but then what's stopping them from hiring someone more suitable instead of gambling on me learning whilst i work? hiring people costs a lot and you are essentially doing their job for them! as someone who has worked in science shouldn't there be something like error bars somewhere i mean how do you know hadoop being the highest percentage skill in chicago means anything ? how big is the sample ? how big are the fluctuations? error bars aren't perfect but they do beat nothing https: imgflip com i jjgxi 
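on the error bars question raised in this thread (does "hadoop is the top skill in chicago" mean anything without knowing the sample size?), here is a minimal sketch of one common way to put an interval around a percentage: the wilson score interval for a binomial proportion. the counts below are invented purely for illustration and are not taken from the chart being discussed, and `wilson_interval` is just a helper name chosen here. the assumption is that each job posting is an independent yes/no observation, which real scraped postings may well violate.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: postings that mention a skill out of all postings sampled.
for mentions, total in [(3, 10), (30, 100), (300, 1000)]:
    low, high = wilson_interval(mentions, total)
    print(f"{mentions}/{total}: {low:.3f} to {high:.3f}")
```

all three made-up samples show the same 30 percent, but the interval narrows a lot as the sample grows, which is the whole point of asking how big the sample is before reading anything into a ranking.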
i cannot agree more with this post i’m currently wrestling with integrating my work into an agile iconix environment especially into the daily stand-ups you’re also on point that there needs to be a better divide generally between the “programming” a data scientist does to automate analyses and real software engineering required for real-time analytics and creating customer-facing data products for example i was stumped when i recently read a blog post on hiring data scientists where the author states that they turn down anyone who doesn’t understand duck-typing in python would love to hear more of your thoughts on this and especially on the importance of serious 
people skills – something that isn’t talked about or understood enough and takes a far far second place to just hunting for phds in math or cs for these positions the more you get away from ds the less you have these issues as a consultant people client and soft skills outweigh technical capability sevenfold (from india ) i graduated last year and have been working in the it industry for a year i'm a comp sci grad i work with an mnc and recently i've shown interest in chatting about analytics and data science related stuff with some senior management of the business intelligence domain in my organization now that a new project in data science is coming up for the org they've chosen me to work on that project although i feel pretty happy and pleased about that i sometimes also feel overwhelmed by the stuff i need to learn in data science i was told to improve my statistics skills and other data science related skills on an urgent basis so it would be really helpful if you can provide me with a list (that doesn't scare off a newbie) of skills that i need to develop in order to become good at data science note: although i'm a comp sci grad my interest has never been towards programming and algorithms i like database related stuff like sql etc so please also suggest what specific programming and algo skills i might need now thank you http: www reddit com r datascience comments 2vomul so_you_want_to_be_a_data_scientist hey that's pretty helpful thnx :)
i'm getting really tired and frustrated of hearing some folks (especially big-name data people on twitter) who try to reassure people like me that it is possible to get a job in data science without a grad degree i graduated from a boot camp two years ago and i've applied to hundreds of entry-level data scientist and data analyst jobs with no luck i currently work three different contract jobs in one of my jobs i work as an online mentor which means i'm qualified to teach the stuff but not qualified to work in this field i made 28k last year and because i only do 1099 work i'm facing a huge tax bill in a couple months yeah i know i should've been paying estimated 
taxes along the way but that's hard to do when you have $400 month boot camp loan payment i've done the networking thing to death i've hit up my friend's sister's boyfriends and my friend's tinder date for coffee and nothing has come from that i'm constantly working on improving my skills and publishing work but nothing is coming from that i've had professional data scientists tell me they're stumped as to why i can't get work so i guess this means that i need to go grad school and go further into debt let's say i choose to do a masters how do i know the job market won't be stacked in favor of phds? i honestly don't know what to do at this point my motivation to do my current jobs is eroding and it's affecting my performance at work at this point all i want is an internship but that is pretty much impossible because all those positions require you to be in school maybe i'm just bitter and i need to rant but if there are pieces of wisdom you can spare i would greatly appreciate them edit: i honestly did not expect this to receive as many responses as it did i greatly appreciate the quantity and quality of feedback including the more critical ones as well i'm going to try to keep responding to more of your messages hate to tell you this but when you finish that degree and have all that debt you'll still be filling out hundreds of applications with little or no response are you able to consider relocation? i'm willing to bet that most of the people that you feel are doing a disservice aren't familiar with your local job market that totally depends on the program you go into there are data science analytics programs that are a near guaranteed high quality job upon graduation [deleted] i did along with all of my classmates and every graduating class since is that a trend that you think can go on forever? 
because honestly "i know a guy who every single one of his classmates got a job right out of college" sounds like evidence that the jobs are filling up pretty quick top ranked mba’s have been getting jobs for years do you think there are harvard mba’s struggling to find work? i don’t see why top ranked analytics graduates would be any different unless it’s a skill that is no longer desired why are you comparing op to a top ranked harvard mba? or was it you that graduated from harvard with honors? sorry i'm having a ton of trouble finding out why "top ranked mbas" (there's no need for an apostrophe there since it's not showing ownership i'm amazed that i have to correct someone as edumacated as you) is all the sudden being talked about because it is a degree that has longevity and continues to have value your idea that all the jobs are going to be taken up and then the top programs are going to be unable to get their graduates jobs is foolish the world is absolutely saturated with mba graduates and yet those from good programs are highly sought out and in demand the same is true for analytics professionals do i need to hold your hand and walk you through this some more? where would i relocate? do i just choose a city that i think wouldn't have a competitive data science job market? do i move first and then apply or apply first and then move? i understand your frustration but dude google is your friend i've done the networking thing to death i've hit up my friend's sister's boyfriends and my friend's tinder date for coffee and nothing has come from that i would hardly consider this "networking" go to meetups hack nights etc what are some of your open source published works? maybe you could get feedback from those what is your undergrad degree in? 
also linkedin working online forums writing essays (i was close to crossing this off my list as "too 90s" except the "difficult to please" hiring manager specifically mentioned some of my articles during the interview) think of it this way - people are molecules bouncing around in a fluid every time one person bounces off another there's a small chance they'll "stick " so what you want to do is to bounce off as many people as possible increasing the chances for a "stick " i call it "increasing your social surface area " (i give the same advice for dating fwiw) networking means more people are aware of you writing essays means more people are aware of you meetups means more people are aware of you what you want is that when one of these people bumps into another and their friend says "man i simply cannot find someone to do our data analysis work " that the person that knows you says "i know this one guy - let me send you his resume " this is a result excellent response these types of jobs are pretty high profile i'd bet many companies would consider applicants from other parts of the us or world i moved over 1k miles each time for my first two jobs out of grad school ok this answer shows me that you need some help approaching the hiring process realistically - if you are able and willing to relocate you apply first make clear as part of your application that you would love to move to their city and take the best job you're able to get out of that process if you can afford to go as far as to say that you will not need help relocating - because that may open up some jobs things are different if you're married or dating someone but if single you should relocate to the first and or best job offer you get definately don't move before applying the biggest cities for data science are sf nyc dc la and austin apply to jobs there if one of these is somewhat close you could also try going to a meetup don't forget seattle no he should do the opposite don't apply for jobs there 
because since those are the hot areas for ds then they will be full of people with advanced degrees and he won't be competitive go to tier-two cities: nashville atlanta madison cleveland etc speaking from experience at least in dc it isn't hard to set yourself apart from the crowd i also think that your goal should be to obtain the experience and skills to become top tier instead of selling yourself short obviously everyone is aiming for top tier but realistically op isn't there yet which is why op is experiencing this drought nothing wrong with going to a tier-two company city for a bit to learn then move up all about being realistic la's data science scene is very meh compared to dc sf or seattle i've seen the same few bad job openings posted for 5-6 months now that's interesting u working_from_home_ has a point there are many routes you can take but i've heard many stories of people getting jobs by networking at meetups if you have a friend at a company with an open slot and your friend knows you know your shit guess what he's going to want to work with a friend than a stranger and give a good recommendation this is something i wish i did well my friend's friend who's "not the brightest" has a nice position at a prestigious firm just because she knew someone my undergrad classmate with a finance undergrad job got a quant position working along graduates and phds with stem degrees because he was good friends with the professor i've personally met multiple people with decent paying data scientist roles (not six figure level but still) that came out of bootcamps and don't have graduate degrees in fact i met one guy who is a data scientist who as a b a in psychology i don't think your lack of education is what is holding you back here a little but late to the party and not op but if i was willing to consider relocation how do i search out the jobs? would you recommend focusing on one city first or do a blanket search on the whole country? 
this i'm getting really tired and frustrated of hearing some folks (especially big-name data people on twitter) mistake one: taking "big-name data people on twitter" even half-seriously yeah i know i should've been paying estimated taxes along the way but that's hard to do when you have $400 month boot camp loan payment mistake two: paying for a bootcamp that isn't "get your money back with no job" or "charge you via 1st year salary" since bootcamps (for the most part) are basically money machines that try to ride the hype train especially when you could have just gotten a masters if you'll pay up anyway i've done the networking thing to death i've hit up my friend's sister's boyfriends and my friend's tinder date for coffee and nothing has come from that mistake three: thinking networking is over tinder etc and not from meetups from others seriously interested in professional relationships p s most of the "you can get a job without a grad degree" do not have people who are in school as the target audience; instead it is people who have graduated some time ago before data science became the "hype " who have work experience in related fields that are the target audience for the message mistake one: taking "big-name data people on twitter" even half-seriously care to elaborate? 
not sure if i'm missing something or if this is just a casual statement mistake two: paying for a bootcamp that isn't "get your money back with no job" or "charge you via 1st year salary" since bootcamps (for the most part) are basically money machines that try to ride the hype train especially when you could have just gotten a masters if you'll pay up anyway agreed i've been careful about these details myself mistake three: thinking networking is over tinder etc and not from meetups from others seriously interested in professional relationships i kind of wonder if his type of networking would be more effective in other fields of work it might be a case of simply never being told what to actually do i have no memory of how i first stumbled into a makerspace several years ago but i'm almost positive it wasn't a conscious decision to better my career p s most of the "you can get a job without a grad degree" do not have people who are in school as the target audience; instead it is people who have graduated some time ago before data science became the "hype " who have work experience in related fields that are the target audience for the message this is something i hate about advice giving there's this meme going through your culture to do a thing and you don't know what context that meme is supposed to have lol look up survival bias and big-name data people on twitter that is indeed a thing but there are multiple ways to interpret his statement so i was just curious about what he actually meant two comments: getting a job in data science eventually vs immediately becoming a data scientist are different things when people talk about getting a data science job without a grad degree i think the general thought is that you can eventually become a data scientist but you'll need to gain some experience first you should really examine your resume phone interview and on-site interview skills the one disadvantage that people who just finished undergrad have is that they 
haven't had the opportunity to gather as much experience on how to brand themselves solid reply agree on both items 100% agree with this perhaps you need to shoot for an analyst position first i don't see people becoming data scientists out of school-- it takes experience as well also imo it's all about selling your self in interviews and communicating you're skills experience effectively check out ramit sethi he has a lot of good videos articles on this subject is an analyst position more realistic for people with just an undergraduate degree? i think so unless your undergrad is in math statistics or computer science agreed! 100% i took a paycut to become a project coordinator- that was 6 months ago- i'm now a business analyst data analyst and had zero experience in it good luck! and keep at it! i'm currently a project coordinator and looking to become a business analyst data analyst! do you mind sharing your journey? i have a bachelors degree in marketing (conveniently received at the height of the recession) ended up in the restaurant industry a year later i was bar manager of a corporate restaurant 4 years later i realized i didn’t want work 60+hours a week and have no life for not enough money- i went and got my scrum master certification and my capm (project management certification (one week of class 3 months of studying and a 3 5 hour exam later and recruiters came out of the woodwork! 
i was determined to get a position in it (the work life balance and the culture appealed to me) got a position at hospitals pmo as a project coordinator for $45 000 yr i was already prepared for the huge pay cut in advance so i sucked it up and did the work(which was mostly admin work) during this time i was i offered another position through another consultancy to become an associate project manager (basically a project coordinator) with a fortune 500 company and jumped at the chance it was still a project coordinator position but it was with a larger company and the work sounded more exciting and hands on than my previous position i worked hard and showed them i was career driven willing to learn and just was extremely enthusiastic about the opportunity i had been given we started a new team and i was offered to learn how to become an analyst and work towards using different data analytics tools not sure where i’m going next i just know i want to learn more! :) i definitely agree with this advice! it is not an overnight process developing your professional interpersonal skills is a lifelong journey bootcamps are a way to accelerate this process however there are mixed reviews of these programs so you have to select the right one i have a friend who just recently graduated from principal analytics prep a new analytics bootcamp in nyc and she can’t stop raving about it you should look into that one to see if it fits your need what kind of work are you interested in and applying to? do you have a quantitative bachelors? do you get interviews or not? 
i don't know why this comment looks to be downvoted as i feel these are important questions before we can fully condemn those who claim there are positions available without the need for a grad degree without a quantitative bachelors (or a bachelors degree at all) i could see how a boot camp could potentially not be sufficient whether or not op is making it to interviews makes a big difference as to advice as well as understanding what might be going wrong is it a resume issue which could make sense due to the need for real work experience and the possible negative perception of bootcamps in their ability to develop capable data scientists further i'd add a question about what kind of side projects op has done sometimes i think moreso when transitioning to a new career and especially data science side projects can make or break your application it's a chance to show you could be immediately valuable to the company and that you have the capabilities needed to be successful in the role without direct job experience this is a very important (imo) aspect of the job hunt anyways i feel for you op and i really hope your luck changes getting through a boot camp is not a walk in the park as i understand it and i wish you the best! i sometimes question whether side projects are the right thing we should be telling people to do it's like you expect me to spend all my free time doing this stuff that i want you to pay me to do? i have to love it so much i shouldn't ever do anything else? 
my side projects are like i made a short film on what modeling (like computer modeling) is i'm trying to build a rain noise generator (whatever i like rain) i made an arcade controller to play nba jam i like mst3k and bicycling side projects should be hobbies that tell you about a person i think the thing that is like ten times more powerful than side projects is having a github that you actually use i've had multiple job offers (from real people not the recruiters that harass you on github) where in the interview people are like "looked at your github you know how to do things" and i would say my github is a mess of nonsense i've scraped together over the years (like a hot dog template for visual studio) i guess my point is side projects aren't as important as evidence that you know how to do what you say you know and you know how to sell that to people based on their post op needs to know how to give an elevator pitch know how to network know how to write a cover letter op needs more soft skills i dunno i don't think side projects are inherently bad just that what we expect them to tell us about is a person might be the wrong thing also i think i'm jealous of the people who have really cool side projects i feel like the avoidance of side projects is an odd opinion personally i think it is similar to the opinions i see quite a bit on the cscareerquestions subreddit which is relevant because it's a similar tech field about doing take home projects and other "extra" work to get a job i'm not personally in computer science but i am working towards getting into the world of data science and i feel like side projects are a must have for someone transitioning careers i come from a mech eng background and i have utilized side projects to get multiple jobs when you are up against a great deal of competition you need something to set you apart sure you don't really want to invest 40 hours of work to solve a problem for a company that may not give you a position but this 
"unpaid consulting" is 100% worth it if it gets you a job it's not unlike sales which is the field i am currently in lots of people will tell you to never perform unpaid consulting as you are giving away value for free without any need for commitment on the end of the customer while this is true i am torn on the issue as providing value to a customer is important in order to win a sale there are levels sure and excessive assistance is wasting your time however it again sets you apart from the competition and puts you in a better position with the customer the thing about data science is there are so many different types of data scientists and so many different types of roles that all fall under the same umbrella if we take away the competition argument for a minute a company still needs to know that you can provide value to them before pulling the trigger on a decent salary and committing to you unless you have substantial work experience with clear metrics on the value you provided a company (in which case this conversation wouldn't matter because you probably have offers sitting in your inbox) you need something that will highlight your skill set coming from a different field or from a bootcamp starts you behind the ball a little bit why should i hire you? "oh well i scrubbed data from various websites and developed a model that led to a conclusion about solving traffic problems " it's especially great to have something relevant to the company "i know your team struggles to set sales numbers evident by your public release of your q4 performance last year where you missed your goal by 50% i looked into your industry and analyzed available public data to develop some action items for your sales team" how fast do you think that guy is hired? 
granted to your point github could be used as a replacement for this though i personally am unsure if it would carry the same weight regardless you need to convince a company that their investment in you is justified and working on projects is the best way (imo) to do so which is the entire reason most bootcamps focus on projects rather than just lectures i apologize if i came off combative or argumentative i suppose i have some strong feelings on the subject and could be arguing in support of side projects because it is my personal plan to get in the industry in which case it would be disappointing to hear i'm wasting my time if you don't have a job all you have is side projects my point is if you have a job you shouldn't be doing side projects honestly i've never worked or sought jobs in an industry where i would be responsible for producing models that increase sales or whatever it's pretty boring if you are just searching for kpis related to revenue or profit so maybe i don't know anything at all do you have any open source data science projects you could point me to? perhaps on github? 
interested and curious i have a bachelors in economics im interested in an entry level data analysis or data science work i get a call back for about every 12-15 applications that's awesome that said i'm skeptical that an econ bachelor's degree would be the reason for your callbacks (unless you went to a top-tier school) you must be impressive in other aspects as well the reason i asked that is because "data analysis or data science" is vague and covers an enormous range of work i know this is just a rant post and not how you present yourself to companies but it sounds like you are applying to everything with the title "data analyst" or "data scientist" you might need to think more specifically about the kinds of things you're most qualified for and would enjoy then apply only for the positions that meet those criteria and also apply for positions that meet those criteria even if they don't have the word data in the title if you get a job building r models or sql scripts or whatever it's not going to matter whether your job title is "data analyst" or "bi analyst" or whatever these days i only apply to jobs with the loosest requirements so if its a job that says something like: python r sql machine learning skills required then i apply to it do you have an active github with proof that you have the skills you claim to have? are your resume linkedin github clear that you have useful skills and are actively learning? 
yes and yes seems to me you’ve got to change something i was in your exact position a little over 3 months ago and landed a data science position so stay positive and believe in yourself that’s probably the hardest part after getting rejected so much had people telling me the same thing and that though my skills were impressive their data scientist candidates were all phds i’d recommend meetups some aren’t great but others are enlightening i went to ~1-2 a week for the 4 months i was looking and don’t be shy it’s better to be awkward as f and walk into strangers' conversations and introduce yourself than to go listen and leave also these don’t necessarily feel productive in the moment but they do show interest and give you topics to bring up in interviews ds also requires skills outside of bootcamp curriculum shit are you familiar with linux? logging into servers? aws? sql? writing code outside of a jupyter notebook? have you coded any algos from scratch? do you have a blog? curious what your side projects have been and how much time you’ve spent on them it’s overwhelming sometimes but if you crank out serious 8-10hr days for 5 days in a row you can get a lot of stuff done good luck keep your chin up can i ask about your background when you started? i’m hoping to get into data science but i’ve been told it’s very difficult without domain knowledge by way of grad degree or experience i received a degree in astrophysics but no higher degree learning python sql and django at this moment what are my chances of becoming a data scientist if i learn machine learning and other ds material? 
honestly i’m of the mindset that as long as someone is genuinely interested and willing to become obsessed with ml and ds for a period of months until they land a job then it just takes commitment effort and time and if you’re lucky like me you’ll find a manager that feels the same i ended up joining a small team of 3 everyone else has a phd and i’ve got a bachelors means i’m legit lost in the math convos even though i studied electrical engineering but i’m capable of conceptually understanding all the concepts and taking direction when i need to build things not saying it’s easy but if you’re someone who can bounce back from rejection and have genuine enthusiasm to dig deeper as opposed to turning around until you land a job then it’s very doable get your python going work on understanding the classic titanic problem and make sure you just read a ton of inspiring projects and view cool visualizations that get you motivated ted talks too once i had convinced myself how cool ml and dl was and how it was the future i had the drive to work thru the rest of the muck when it got hard and slow i’m sorry that sounds pretty rough i know a couple people who have done boot camps and they all have jobs they really enjoy in data science it sounds like you’re doing everything right so don’t give up or change your tune where do you live? 
i’m in the bay area and there are tons of meetups and conferences all the time where recruiters are looking for people to hire and based on who i’ve talked to data science is quite competitive but if the hard work and talent is there then a job will be found so i would recommend going to meetups ([meetup com](www meetup com) ) if you haven’t been also i can say that when you’re interviewing or talking to recruiters the best thing you can do is have an impressive project to talk about may be obvious but may not be they want to hear you talk in detail about the projects you’ve done so they can understand quickly and easily the impact you’d have at a company have you gotten many interviews or is that difficult as well see if you can get feedback from people who know you about if you might be faltering at a particular stage of job searching i do disagree with your title because it is absolutely possible good luck! i haven't really done meetups i do go to the occassional data science events but i haven't used meetup com i work two nights a week so that presents an obstacle for me well i'd highly recommend it if you only have two nights a week that you work then you've got five nights a week free! it looks like you're in sf i'd check out the sfpython meetup also a big fan of data science for sustainability though mostly because i geek out on clean tech only two nights a week? you have 5 other nights to go to meetups two nights a week!?!? 
i don't want to sound overly critical but given this answer and some of your other responses it seems that you're a bit lazy for a career in data science i could be totally wrong (and i probably am) but you should do some serious introspection about why you don't have any ds offers a few helpful tips: get your resume in front of a professional: a recruiter personal development professional from your alma mater etc these people will be able to take your resume to the next level given my three jobs i’m working about 50-60 hours a week and that doesn’t include the 10-20 hours i try to spend learning and improving skills and working on projects i'm sure one or two of your side projects would make an interesting talk and get your name out there a bit more lot easier to get a job when people know who you are you aren't entitled to a job you are busting your ass? congrats so is everybody else how do you define these data science jobs? let's not muddle the system maker with the system runner do you want to write the algorithm or use an already-popular algorithm? the latter is not what gets you hired at a big company unless they need peons i guess there is a difference between engineering and a trade and i think programming is a deceptive subject you might go in thinking that you can be the next alan turing but you come out being a plumber i'm using hyperbole here but i think the point stands i'm in the situation of being interested in data science as a career and i'm currently looking at my options for getting there i'm realizing it's more complicated than the title 'data scientist' i'm also holding out hope that i can be just as good and knowledgeable as anyone despite not spending time in the 'guilds' to get my phd it's two-fold can i learn the same amount without that phd (but in a more efficient manner saving time and money)? also can i convince others of this? 
a portfolio is nice but portfolios are for artists and tradespeople people that need assurance their system is based on solid theory which is essentially what the term 'engineering' is held to as a standard may not look at your portfolio as enough merely using programming tools and frameworks and calling yourself an engineer might be akin to building a home out of legos sure with enough practice you learned how to build one but you don't have a way to calculate its load-bearing capacity or things of that sort you don't have precision i'm obviously fond of self-teaching of subjects but i'm trying to escape that bias looking at this question honestly i don't know that you can be a data scientist without a serious degree the difference between a data scientist and a programmer is that most programmers have much more fault tolerance built into their role "moving fast and breaking things" is okay now r&d departments and machine learning startups with employed data scientists might move fast and break things but they do so with theoretical knowledge guiding their best guesses a programmer hacker will often break things to just learn how they work (even the fundamental theory behind them) it's a murky distinction sometimes but i think it exists in some capacity no offence but can i ask what your background is? you seem to have quite a few strong opinions for someone who says they aren't a data scientist - do you work elsewhere in the field? 
i'm a programmer there're a couple points where i shouldn't have come off as strong in my opinion but the gist is what i would stand by it's possible you will need work experience beggars can't be choosers either take what you can get and work your way up get your foot in the door at a good company that's what i did your dream job isn't going to fall into your lap a few strategic moves later and you may all of a sudden find yourself in a good position you have to be willing to put in the time you are competing with masters and phd students from all branches of science and social science start a master's part time employers like that it shows initiative and potential first some introspection is always good are you really sure your work and skills are up to par? see dunning-kruger effect always good to be humble in the end you are doing something wrong did you get any interviews? if not inform yourself about how to improve your cv and application letter obviously your applications always gets filtered out early if you got interviews you somehow failed them work and that skill in this case besides boot-camp what is your other skill? 
if you would hire someone i would also be skeptical about boot-camps there is a huge difference between boot-camp and a bachelors degree in my opinion at least on paper also data scientists often have data science as second education maybe they used to be a programmer or data analyst maybe they have a bachelor masters in chemistry or economics and working experience in their field so they already understand the data and concepts meaning just having a boot-camp isn't much networking and social skills are important but you also need to be good at it and people need to like you else it is actually counter-productive but impossible to tell what the issue is cast a wider net that includes possible relocation dunning–kruger effect in the field of psychology the dunning–kruger effect is a cognitive bias wherein people of low ability suffer from illusory superiority mistakenly assessing their cognitive ability as greater than it is the cognitive bias of illusory superiority derives from the metacognitive inability of low-ability persons to recognize their own ineptitude; without the self-awareness of metacognition low-ability people cannot objectively evaluate their actual competence or incompetence conversely highly competent individuals may erroneously presume that tasks easy for them to perform are also easy for other people to perform or that other people will have a similar understanding of subjects that they themselves are well-versed in [deleted] references? 
[deleted] [deleted] that sucks it is very competitive data science roles command a very high salary you need to convince a hiring manager that you'd be a net positive roi on top of doing your job search and networking it's just as important to keep doing projects to build up your portfolio i know of one such guy who writes about his path to data science with a bs on quora he put a huge effort into kaggle and had several projects in the top 10% rankings that's when he said companies started to get interested in him keep doing projects to demonstrate your value don't expect boot camps degrees and certificates to prove anything employers want to hear about what you've actually accomplished that directly demonstrates what you could do for them your bootcamp doesn't show that the bootcamp is meant to give you the skills to complete projects in order to demonstrate that i'm going to address one (and only one) specific point you brought up - not knowing if the market will be stacked in favor of phds if you get a masters yes the market will always be stacked in favor of phds always but that does not mean you can't land a data scientist job with just a masters having just completed one of those masters within the past year i can speak from experience about the effectiveness of these degrees several of my classmates had zero data science or even analyst experience prior to enrolling in the program and they are all now employed as data scientists let that sink in; they are all now employed as data scientists every single person from my cohort do you have a good ba? 
a good ba in pure applied math physics or engineering looks good go for an analyst title and pivot to data science oriented tasks - it's not that different and the barrier to entry is lower you're going to be having a tough time getting into a role of a data scientist which is a very senior position with a bootcamp under your belt even people with an msc degree aren't ready for a ds role imo it takes time and trust from a company - something that is hard to say when your resume says x-weeks bootcamp so either your cv is not conveying that they think you can do the job and you need to fix that or you need to put other items on your cv that indicate you know your shit get a position in the field and pivot to your interest you need trust and experience under your belt my 2 cents on the matter that's my current path i have a ms in stats and got a data analyst position at first 1.5 yr later i got promoted senior with high pay (90) i'm currently working hard to get the data science projects of my department my current work is 50:50 between data analysis and data science i also was able to prove myself and made contacts with other people becoming data scientists elsewhere feels like things are opening up for when i feel it's time to move or for an internal promotion i also spend my free time on coursera or kaggle in the evening my company is also not in tech and from what i've seen many industries are really not as advanced as tech and data science for hire companies this makes it a bit more easy to prove yourself without being surrounded by machine learning phds on the other hand you don't benefit from strong mentorship so it's not the best long term and it requires a proactive mindset because you'll be the one to bring the ideas forward rather than being handed projects i'm confused why anybody would ever pay for a boot camp with the amount of free material out there sure it takes a bit more work but literally everything a boot camp offers you can get for free if i'm going to pay for 
something then i would go get a ms or phd from an actual institution in response to your personal problem i highly recommend going on linkedin and connecting with anyone remotely related to data science or analytics also list on your profile all your experience projects etc once i surpassed 1000 connections and did the profile i literally couldn't stop getting inmail interview requests now (most of these) are not high quality jobs but better than your current situation also post your details on hacker news "who wants to be hired" i got an interview from uber just posting on that i think the real disservice is that you went into debt to get a degree you sound frustrated and desperate and that is no good i got a data science job without a grad degree i did this by going to 4 hackathons and networking like crazy and also living in dc which has a shortage of tech workers what are your actual skills? do you have a github? and how hard have you job searched? i saw that you don't go to meetups? welp that is a clear thing you could try those worked super well for me well i'm a data scientist with no degrees that actually got hired a couple months after doing a coding bootcamp lol i got lucky and got hired from within the company (got hired as an etl developer) but still don't let a degree hold you back if you really dedicate yourself to data science and are patient with the job process you will be able to get one anything is possible with hard work and passion :) shoot me a pm if you need any advice! edit - for the people downvoting this can you explain why? 
data science became so popular that we are facing an oversupply of data scientists the industry will obviously pick the ones with highest degrees that’s what i’m afraid of that i fell for the hype and now the surplus demand has turned to surplus hype i think you're confusing what people call data scientists with kids who know how to throw ml at everything and hope for the best true data science jobs are begging for people the problem is almost all of them require a phd even more accurate for true ceos there's such a huge shortage of true ceos that the few who are qualified to do the work (e.g. marissa mayer) end up getting paid crazy salaries what bootcamp did you attend? and where are you an online mentor? do you have an undergraduate degree? yes in economics ok so that’s definitely worth something cs or statistics would probably be worth more but yeah most of the ds job postings i’ve seen require a masters with some requiring a phd personally i’ve always viewed boot camps as shortcuts why would i want to hire someone who has only completed a bootcamp vs someone who has a graduate degree? not to mention bootcamps could be downright terrible there is no level of accreditation on which to base the legitimacy of a bootcamp what bootcamp did you attend and where? 
there are certainly some that are better than others just as an aside if you have a stem bs degree and are american you should be able to go into a phd program that will pay you a salary and offer you benefits if you have an ms degree you should be able to go to any phd program in the world in stem and receive a salary you should not be going into debt to go to school to study something in stem i had facebook email me for an interview request and i never submitted 1 application to them

tl;dr: learned a buncha shit in 20 months with no prior anything-related experience got job as data scientist edit: seems like this was removed from r/learnprogramming trying to direct all the pms to come here first i want to thank the entire reddit community because without this place i wouldn’t have gone down the rabbit hole that is self-learning job searching and negotiation second just to list out my background so people know where i started and how i got here: i graduated in 2013 with a bachelor’s in civil engineering (useless in this case) and again in 2015 with a master’s in operations research (much more useful namewise at least) both from the same top school the name of the school and the operations research degree opened up quite a few doors in the beginning of my (2-year) career and definitely was a factor in getting an interview but had nothing to do directly with what was needed for the data science job this is because that offer was contingent on a programming skillset and specific data science problem-solving abilities of which i had none right after graduation the most useful advice to keep in mind: keep trying keep learning don’t be afraid to switch jobs when you’re bored or it’s not what you want continuously look for new opportunities and always negotiate i went from a 47k job where i lasted only 4 months to a 65k job where i lasted just under a year to a 90k job where i stayed 10 months to my new job at 115k all in under 2 and a half years strap yourself in this 
will be long!

step 1: get your first real job out of college realize how much you loathe it feel entitled because they’re not paying you for your amazing theoretical prowess that isn’t really useful realize that you were meant to do much more cool shit and convince yourself that you need a higher paying job my first job out of grad school lasted 4 months it was an analyst title which i thought was awesome because i had no idea what analysts do but it was mostly bitchwork and data entry the one upside was that my boss mentioned a pivot table once and i googled it so i finally learned what it was but i still figured i was too smart for this shit so i looked for other jobs because i needed something to challenge me congrats you now have the drive to get your ass to a better role!

step 2: i got into the adtech industry after my 4-month stint they liked me because of that pivot table thing i learned to do this is where the data science itch began but i knew i wouldn’t be satisfied in the long run as pompous as it is to keep saying i was too smart for this shit i was i just needed the tools to show that the amount of data that lives in the industry is insane and it’s always good to mention how much data you’ve worked with this place is where you earn your sql excel and tableau medals you edit some dashboards you pivot and slice data you don’t necessarily write your own complex queries from scratch but you know what they look like and know what joins do by no means was i going to do any advanced stuff at work so i needed to start doing it on my own if i wanted to grow in my time at this job (after work but also during work use your down time wisely!) i took mit’s intro to comp sci with python edx’s analytics edge and andrew ng’s machine learning this set up the foundation but since they were all intro courses i couldn’t apply the knowledge there were still a bunch of missing pieces but! 
at least i got started towards the end of my time there i found rmotr.com through reddit i finished the advanced python programming course which was incredibly difficult for me at the time because of the knowledge density and intensity i highly recommend it if you want to learn more advanced python methodologies and applications and also if you’re leaning towards the development side

step 3: i left my last company of a few thousand people where everything was essentially fully established and moved to a smaller company of 100ish people there was more opportunity to build and own projects here and it’s where i earned my dev analytics and machine learning medals this is where classes will continue to aid in your learning but where google and stackoverflow will help you actually build cool shit you will have thousands of questions the classes won’t be able to answer so your searching skills will greatly improve in this time during my time here i completed coursera umichigan’s intro to data science with python i completed it relatively quickly and from what i recall it wasn’t too challenging after that course i stumbled on udemy and completed jose portilla’s python for data science and machine learning bootcamp which was a turning point from knowledge to application this class is a must it’s how i learned to neatly organize my data frames manipulate them very easily and thanks to google and stackoverflow how to get all that data into csv and excel sheets so i can send them to people it doesn’t sound like much but data organization and manipulation was the #1 worthwhile skill i learned it’s also where i learned to implement all machine learning algorithms using scikit-learn and a bit of deep learning there wasn’t much theory behind it which was perfectly fine because i was going for 100% application this is also where i took advantage of the training reimbursement at work- i kept buying courses and it was free! 
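a rough sketch of the organize-then-export step described above, for anyone who wants to see what it looks like in practice. note the assumptions: the course taught this with pandas (where it would be roughly a `groupby` followed by `to_csv`), while this version sticks to the python standard library so it runs anywhere, and the rows, column names and helper names are all made up for illustration.

```python
import csv
import io
from collections import defaultdict

# hypothetical rows standing in for a report pulled from some internal tool
rows = [
    {"account": "acme", "day": "2018-01-01", "revenue": 120.0},
    {"account": "acme", "day": "2018-01-02", "revenue": 80.0},
    {"account": "globex", "day": "2018-01-01", "revenue": 50.0},
]


def total_by_account(records):
    """Sum revenue per account and return one summary row per account."""
    totals = defaultdict(float)
    for record in records:
        totals[record["account"]] += record["revenue"]
    # sort by account name so the output order is stable
    return [{"account": name, "revenue": total} for name, total in sorted(totals.items())]


def to_csv_text(records, fieldnames):
    """Serialize rows to CSV text; write this string to a file to share it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


summary = total_by_account(rows)
csv_text = to_csv_text(summary, ["account", "revenue"])
print(csv_text)
```

the same two steps (aggregate, then write out) are what most "send this to an account manager" requests boil down to, whichever library does the work.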
during this time i also completed stanford’s statistical learning course on their lagunita platform (good for knowledge base) the first three courses of andrew ng’s deep learning specialization on coursera (it was a breeze because it was in python and i had a deep understanding of dataframes by this time also very good for knowledge base and algorithm implementation from scratch) and another udemy class from jose salvatierra called the complete postgresql and python developer course- also a game changer it was the first course i had on clean python code for software development the way he thinks is outstanding and i highly recommend it

step 4: resume building and linkedin there are articles out there that can explain this a lot better than i can but here were my steps to have my resume and linkedin ready: resume kept the resume to one page had it look more modern sleek and fresh (even had dark grey and blue colors) under my name listed my email number github and linkedin across the entire width of the page recent work experience on top descriptions included what technology i used (python impala etc.) to do something (built multiple scrapers python notebooks automated reporting etc.) and the effect (saved hours of manual work for account managers increased revenue day over day by x etc.) this can be easily remembered by saying i used x to do y with the z results note: not all of my descriptions had results my last listed job on my resume only had the support work i did- i supported accounts totaling x revenue monthly partook in meetings with clients etc not every task has a quantifiable outcome but it’s nice to throw some numbers in there when you can i read in some places that no one would care about this but i did it anyway and listed all courses and bootcamps i had finished by that time which was around 8 while i had some projects i had done at work i could speak to i wanted them to know that i was really dedicated to learning everything i could about the field 
and it worked! below that was my education- both degrees listed without gpas and lastly active interests maybe old-school corporations don’t care for things like this but for start-uppy tech companies that are in a growth stage i figured they’d like to see what i do on the side i’ve been competitively dancing for almost a decade and weightlifting for more than that so if being a dancing weightlifting engineering-background guy makes me seem more unique i’m going for it whatever makes you stick out! linkedin professional-looking photo doesn’t have to be professional just professional-looking fill out everything linkedin asks you to fill out so you can be an all-star and appear in more searches the summary should include a shitload of keywords that relate to what you’ve done and what you want to do automation analytics machine learning python sql nosql ms-sql throw all that shit in there i only filled out the description for my most recent job because that’s where i actually did cool shit i put a lot more detail here in linkedin than i did on my resume then i listed the 3-4 jobs i had before that no description put all my certifications from the courses i took with links put my education obvs the rest…eh doesn’t really matter

step 5: job search so you have your nice and shiny resume ready and your linkedin set to go this is where the entirety of your hard work will be rewarded how badly do you want this job? 
i stopped using indeed monster etc a long while ago the single tool i used was and still is glassdoor download a pdf copy of your resume to your phone or a cloud drive search on glassdoor on the daily keep saved searches ready to go- “junior data scientist” “data scientist” “senior analytics” “senior data analyst” “junior machine learning” “entry data science” and so on when you’re on the bus or laundromat or in bed late at night and can’t sleep look for openings filter by the rating you’re willing to take on and apply like mad i got dozens of applications done just from waiting at the laundromat all the calls i had after were 100% from glassdoor applications

step 6: the initial call i’ve had 3 total initial calls from the probably 50 or so applications i sent over the summer (very few openings that didn’t require 5+ years of java and machine learning product dev etc etc and largely distributed blah blah where i live) here were most of the things i was asked:
• what tools i used at work
• how have i made processes more efficient at work
• anything i’ve automated
• largest amount of data i worked with and what was the project and result
• why the shift from the current job
• how much i know about their company and how i’d describe the company to someone else (do your research!)
i had 100% success on my initial calls each time i mentioned some sort of python automated scripts (simply by using windows task scheduler and batch file- thanks to google search!) and a data manipulation project (highest i’ve had is a few million rows) and i was good to go

step 7: the data exercise from those 3 initial calls i had 2 exercises sent via email and one via codility the first exercise was sql and visualization heavy i was given a sqlite database to work from and had to alter tables to feed into other tables to aggregate other metrics and so on once that was done i had to use the resulting tables to do some visualizations and inference did i know how to do most of what they asked? 
hell no i had google and stackoverflow open for every little detail i didn’t know how to do off the top of my head the entire thing took about 20-25 hours spread across the week and even when i submitted it didn’t feel complete i couldn’t afford not to put all my free time into this exercise the end result: the hiring manager and team were impressed with the code but they didn’t vibe with the presentation style of my jupyter notebook and it was very apparent that i lacked the domain knowledge required (this was for a health tech company and i have no health anything experience) it actually prompted them to re-post with an altered job description requiring domain knowledge woo? regardless this served as a huge source of validation for me- these senior level members thought my code was good the second exercise was from the company i ultimately accepted it was 3-4 hours in total to assess business intelligence skills (sql and visualization) they liked it and i moved on to the in-person which i’ll go into in the next step the last exercise was codility- and while my code “worked” there were likely some test cases i didn’t account for either that or the company got irritated when i said i received an offer and asked if they could speed up the process they didn’t follow through

step 8: the in-person interview so you got to this stage! congrats! 
and you’ll be interviewing with 3 vps 2 c-level execs and 2 data scientists jesus fuck you’ve never met this many executives in your whole life no need to freak out this simply validates your hard work you’ll be meeting with very important people for a very important job and they think you might be good at it even if i hadn’t made it past this i tasted victory i did something that may not be recommended by most people: i didn’t prepare for questions they’d ask me but rather prepared for all the questions i’d ask them this did two things: i didn’t obsess about what they’d ask me so i was relaxed and it gave me a lot of chances to show i knew my shit when i asked them a bunch of stuff besides for a data science job i figured they’d ask questions about how i’d solve some problems they currently have as opposed to some common questions and that’s exactly what they did not something you can really prepare for the night before since it’s a way of thinking you’d have to grasp through all the classes and projects and problems you solved at your current job important note: i am not advocating ignoring prepping for questions i did about 30-35 interviews phone and in person before my current job so i had a lot of learning experience i already had a more natural-feeling response for most questions and if you really were into your projects at your current job you’ll know what you did inside out so it’s easier to talk about it on the spot but by all means if you don’t have much interview experience prepare and practice! here are my notes from after the interviews including what was asked and how i answered and what i asked: vp of data science • notice any hiccup in your exercise? 
i debated with him on the accuracy of a single statement in the exercise assuring him that since i used a hadoop-based query engine and they used aws my method worked every time i used it i never checked whether he or i was right because afterwards i started thinking he was right and didn’t want to feel like an idiot but we moved on rather quickly
• how would you implement typo detection? i gave a convoluted response but put simply some distance index between words as in how many changes would it take to get to the word we may want he liked the answer because it’s what he was thinking too
• how’s your style of explaining things to people? very logical step-by-step process with the goal of weaning people off needing me i’d explain it to them completely then next time leave a few steps missing and ask if they’d remember then eventually just give them a step or two
• what’s something you want to be better at? being more personable when explaining technical terms to non-tech people then i went crazy with a ton of questions about what projects they’re working on what’s the first thing i’d be working on the challenges they have currently how do they interact with the sales team and so on
vp tech
• so data! tell me about it i told him that i love it i’m excited by it and i wanna get better at it
• what was a process you made more efficient at work? created an automated process using a batch file to run python script via task scheduler it scrapes an internal web tool and creates reporting that otherwise doesn’t exist which saves hours for the account managers weekly
• so you aimed towards a process that would essentially take something that’s not working too well fix it and productionalize it? why yes yes indeed
• so that kind of sounds like a software development mentality absolutely and eventually after i have a lot of exposure to the research side of data science i’d like to get more into a machine learning engineering role to build everything out
• cool man! 
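side note on the typo detection answer above: "how many changes would it take to get to the word we may want" is usually called edit distance (levenshtein distance). a minimal sketch follows, assuming a small known vocabulary and a cutoff of 2 edits. the function names, the vocabulary and the cutoff are illustrative assumptions, not anything stated in the interview.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning i chars of a into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (free when the chars match)
            ))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word within max_distance edits, else None."""
    if not vocabulary:
        return None
    best = min(vocabulary, key=lambda candidate: edit_distance(word, candidate))
    return best if edit_distance(word, best) <= max_distance else None


menu_words = ["pizza", "pasta", "salad"]  # hypothetical vocabulary
print(suggest("piza", menu_words))
```

this compares against every word in the vocabulary, which is fine for a short list but would want an index or a prefilter for a real dictionary.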
he probably liked that i wasn’t purely analytics but also built tools to solve problems not related to data science
coo president
• what areas do you think you need development in? being more on the business side of things as i tend to like delving deep into my code to make things work i sometimes get delayed info of the overall business health
• do you have any entrepreneurial experience? i said nope to which he responded with “nothing? not even selling lemonade?” then it jogged my memory of when i tried to sell yugioh and pokemon cards at the pool when i was young with my binder of sheets with prices too high so no one would buy he had a laugh and said it was a good answer because the simple experience in learning the prices were too high was a lesson
• what are you looking for? something challenging where i won’t be just a sql monkey (this term was thrown around by a lot of the team so i kept repeating it and made references to who mentioned it to show that i’m paying attention) where there will be big issues to solve across the company and a place where i’d be doing something meaningful in this case it was helping local businesses thrive and i’m all for that i’m coming from an adtech background so the emphasis was very clear on the “finding meaning” part
• if that's the case why this company? i liked that they were very fast with their interview process i told him that and that it shows a lot about the company and how much they care to get things done
• what was your proudest moment? 
told him about the first time i built a tool that helped the business which was at my current company the year or so of effort learning python and databases and manipulating dataframes led to a really cool scraping project that now seems rather novice but i couldn’t contain my excitement when i accomplished it
data scientists sit and chat
i asked them questions about how they like it there what projects they worked on etc very laid back
vp marketing (first form)
this was the one guy who really grilled me with problem solving questions
• why did google decide to build out their own browser? this is where my background in adtech helped i listed almost everything i could about user data selling to advertisers tracking users etc he thought those were good answers but it wasn’t what he was looking for he asked me the next leading question
• what was so good about chrome compared to ie? i stumbled on this since i never could really compare it fully to internet explorer since i never used ie i just knew people said it sucked with some guidance i answered correctly: faster load times
• and what does that mean? i took a few seconds of thought and answered correctly that google wants their search pages to load faster from there he pulled some stats about google cpc and rates from another country and asked me how much would google make in capturing a certain percent of the internet explorer user market my process was correct but the multiplication was off in the end a bit embarrassing but at least i owned it and made some jokes about division by hand got the correct answer after
that concluded the first in-person interview got called for another in-person and i was shitting myself because i thought maybe they didn’t get enough information i was much more nervous for this one but once the interviews started i was calm and confident
cmo
• what are some areas that you need development in? 
same as i said before: business side of things.
• why the short tenure in your old jobs (4 months, 12 months, 9 months)? this is where you have to show yourself as the ever-growing, constant-learning autodidact with an insatiable appetite to learn. i told him i learn on my own outside of work, i apply that knowledge to build cool shit, and that i outgrow my positions very quickly, so i needed something more challenging. i backed it up with the projects i completed.
• what'll be the biggest challenge you'll face here? data science team structure: sprints, prioritizing the right projects, etc. haven’t experienced it before, so i’d have to learn how to operate within that structure.
• what would your current boss say about you? i explained that i have sort of two bosses, one tech and one nontech. the tech one would say i can take an idea and run with it to build a tool. the nontech would say i’m very helpful and available asap when he needs me.
• what would they say you need improvement on? nontech boss: business side of things. tech boss: get more into the details of adtech, like which scripts are executed on the page, how it relates to different servers, etc.
• what would your last boss say about you? always learning on the job.
• what's one example of when you thought outside the box? gave the example of how the data engineering team was backed up and couldn’t ingest some third party data, so i used python to ingest the data 6-8 weeks before they could do it. i also explained that while the process was essentially the same (extract, transform, load), i thought outside the box by not relying on the team assigned with the task and figured out my own way to do it. he thought that was an excellent example.
• what was your proudest moment? same answer as before.
• why the move?
current company is pivoting and has been for 8 months, but there’s not much to show for it. a lot of senior leadership is exiting and i’m not confident in the direction it’s taking, so i figured this would be a great time to make a change.
• how would you describe your old bosses? last job: she was first a coworker who was then promoted to my boss. she was very kind and still figuring out how to manage, but never lost sight of being compassionate and fighting for her team. wonderful overall. current job: nontech boss is very hands off since he doesn’t know the details of what i do, but gives good overall ideas. with tech boss, we work together constantly on data tasks or ideas for new tools to build. very logical and unemotional at work, similar to me.
afterwards i asked about what success looks like in the role and what the biggest challenges facing his department were.
vp marketing (final form)
here he was again, back with more questions to grill me! i really liked the guy because he did his due diligence, and it was fun because the questions made my brain’s gears go into overdrive.
• how would you go about seeing if users ordering from more than one location is profitable? i responded with a very convoluted explanation of an a/b test, which he said was good. he then asked how to do it without the ability to run an a/b test, using data we already have. i was eventually able to tell him something along the lines of a time series analysis involving control groups.
• walk me through how you'll implement an a/b test. told him the basics, but that i haven’t done it in practice. i couldn’t answer his question about how long it should run for, so i told him straight up and he was okay with it.
• how would you go about determining the optimal number of recommendations to show on the app for each geographical type? basic group-bys by geo, and the success rate for each number of recommendations shown.
• what is logistic regression?
at this point i had just finished one of andrew ng’s deep learning courses, where you code a logistic regression from scratch, so i did a little showboating here with how much i knew =d
• take me through the process of how you got into machine learning. i told him basically what i’ve described here: that i felt useless after my master’s, needed to not be left behind in the machine learning revolution, went crazy from day one, and here i am.
i asked him:
• what are the projects i'll work on in the first month?
• you worked at other huge and established companies, so why here, and what makes you come back every day?
and! i give you the absolute best question to ask:
• “you’ve had the most opportunity to get to know me and my skillset. i’d like to know if you had any reservations about my qualifications as a candidate so we can discuss and take care of any concerns.”
boom! and just like that i knew how impressed he was, and that the only reservation was my short experience, but that i more than made up for it with my passion and drive. he almost didn’t want to say my lack of experience was a concern and looked very hesitant, i guess in fear of me being like “peace!” and that was that!
step 9: wait forever and get paranoid
title says it all. it’s hard to wait and wait, especially when you felt like you did really well, and especially when the interviewing process took 3 weeks but the decision process takes another 3 weeks. my advice is simply: keep applying to other places, don’t take your foot off the pedal, and continue learning and building things. i managed to finish another 2 courses from the time of the first interview to the offer, and even built my own small personal website. don’t let up!
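a side note on the logistic regression question above, for anyone who hasn't coded one "from scratch": the idea fits in a few lines. what follows is only a minimal sketch in plain python (no numpy), not the course's implementation. the toy data, the function names, the learning rate and the epoch count are all made up for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)
            error = p - yi  # gradient of the log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return 0/1 labels by thresholding the predicted probability."""
    return [
        int(sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= threshold)
        for xi in X
    ]


def make_toy_data(n=200, seed=0):
    """Hypothetical data: two Gaussian blobs, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
weights, bias = train_logistic_regression(X, y)
preds = predict(X, weights, bias)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

in practice you would reach for a library implementation; writing it out once mostly helps with explaining the sigmoid, the loss and the gradient step in an interview.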
step 10: negotiate
i’ll leave it to you to gather more advice on negotiating and how to go about it, but my general advice is to always negotiate. whether the market value is higher than the offer (i’m not a fan of this explanation, but i’ve never had to use it), or you suddenly feel that the responsibilities are worth more, or, as in my case, you realize they don’t offer benefits you thought would be offered, then negotiate. it can be by phone or email. just do it. it’s uncomfortable. you’ll question your decision every second of the day for what seems like forever. you’ll think they’ll rescind the offer and get someone cheaper. just relax. it’s business. it’s part of showing your skills by not leaving money on the table. with a role as specialized as this, where there is a lot of demand, you have the upper hand if you’ve already proved yourself. i got a nice bump at my current job and at the new data science job by asking for more. i’ll leave you this fantastic link that helped change my mindset: http://www.kalzumeus.com/2012/01/23/salary-negotiation
and that’s a wrap! a quick summary of the most important lessons i learned in this journey:
you don’t have to get an expensive data science degree or go to an expensive bootcamp. everything is literally available for free somewhere online, and more structured resources are available at very low cost (udemy and their $10 specials!).
glassdoor is the most important app in this process. download it, keep a fresh copy of your resume on your phone, and send out apps during your commute, at the laundromat, while in bed on a lazy saturday, etc. it’s almost effortless.
absorb everything you can. a lot of it won’t stick, but a lot of it will.
learning demands consistency. 10 hours of study spread across 2 weeks is much better than 10 hours you did that one weekend 2 weeks ago.
use what you learn somehow. if you picked up python, google how to scrape the web, or how to automate sending files via email, or how to connect to a certain database. make a project out of it, even a mini-project, that you can speak about later. google will show you the way!
optimizing processes is sexy, and it was the most frequently asked question in this job search.
in case you couldn’t tell, google and stackoverflow were lifesavers.
talk is cheap. a lot of people i know talk about taking classes and how excited they are. a year later they’re in the same place. learn it, use it, and continue learning. spend less time talking about how you’re gonna do something and work towards getting it done.
you’ll stumble through a lot of material, and that’s okay. not everything is connected in the beginning, and a lot of it will feel like wasted effort. keep going! you’ll reach the “aha!” moment when everything clicks and you “get it”. it might take a year and a half, but think about what would have happened if you started a year and a half ago?
adding to the last point, it’s hard to know where to start and where to go. i’ll summarize a cheap quick start guide for data science below if you’re lost!
get ready to make sacrifices. on average it was 3-4 hours daily, every day, before or after work, and sometimes 6 hours on each of the weekend days. and this isn’t counting the coding i did during work to make things more efficient, which is at least another 3-4 hours per workday. i did take about 6-8 weeks off in total throughout the whole process, though. you’ll burn out sometimes, and that’s okay! if you’re as driven and passionate as i was, you’ll come back to it weeks later, maybe even a month.
lastly, reddit is a place of vast knowledge of the field. use it. go to r/learnprogramming or r/datascience or r/jobs or r/personalfinance. there will be questions and topics covering a lot of what i covered here.
quick start guide for data science (in no particular order):
• introduction to computer science with python from edx.org
• either:
o andrew ng’s machine learning via coursera (not in python, but teaches you the matrix manipulation fundamentals)
o statistical learning via stanford lagunita (more theory than programming understanding, but covers similar concepts and introduces r, which is also a good tool)
• python data science and machine learning bootcamp via udemy
again, this is just to get started. google and stackoverflow will take you to the next level, and other courses will fill the knowledge gaps.
full list of courses i’ve completed:
• complete python web course from udemy
• complete python and postgresql developer course from udemy
• deeplearning.ai's specialization from coursera
• statistical learning from stanford lagunita
• python for data science and machine learning from udemy
• introduction to data science in python from coursera
• introduction to computer science and programming using python from edx
• analytics edge from edx
• machine learning from coursera
thanks for reading!
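to make the "use what you learn" advice above a bit more concrete, here is a rough sketch of one of the mini-projects mentioned (automating a report email), using only the python standard library. everything specific in it is hypothetical: the addresses, the campaign rows, the smtp host you would pass to `send_email`, and the function names. the example builds the message but does not actually send it.

```python
import csv
import io
import smtplib
from email.message import EmailMessage


def build_report_csv(rows):
    """Serialize a list of dicts to CSV text (header taken from the first row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_report_email(sender, recipients, subject, body, csv_text,
                       filename="report.csv"):
    """Create an email with the CSV text attached as a file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    msg.add_attachment(csv_text, subtype="csv", filename=filename)
    return msg


def send_email(msg, host, port=587, username=None, password=None):
    """Send the message over SMTP with STARTTLS (not called in this sketch)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if username and password:
            server.login(username, password)
        server.send_message(msg)


# Hypothetical report data; in a real project this would come from a query.
rows = [
    {"campaign": "spring_sale", "impressions": 1200, "clicks": 84},
    {"campaign": "newsletter", "impressions": 560, "clicks": 31},
]
msg = build_report_email(
    sender="analyst@example.com",
    recipients=["team@example.com"],
    subject="daily campaign report",
    body="today's numbers are attached.",
    csv_text=build_report_csv(rows),
)
print(msg["Subject"], "->", msg["To"])
```

scheduling this with cron or a task scheduler, and swapping the hard-coded rows for a database query, is what turns it into the kind of small tool you can talk about in an interview.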
wishing you the best in your data science journey i hope it’s as rewarding exciting and fruitful as it was for me you greatly underestimate the value of your master's degree in operations research mostly because the program was curved like crazy i never took the time to actually study until i almost failed and almost had to retake a required course i left feeling like a fraud and had to take pieces from other resources after i graduated to learn basic probstats i hear that's how a lot of engineering programs are curved like crazy because they're just "so hard" but it made me feel like i didn't have to take anything seriously so i graduated but not proudly and not feeling like i deserved to from the inside it didn't seem very valuable to me for the money from the outside and a couple years later incredibly valuable and worth the price tag but definitely won't do it again exactly or is probably one of the best degrees you can have to get into data science along with cs and stats ignore my ignorance but what's operational research about? 
first time i have heard of it it is basically the math of optimization the field was originally created by the military in wwii to efficiently manage large supply lines and troop vehicle movements it is very heavy on statistics big data and computer science making operations researchers obvious candidates for data science roles yeah me too the name of the school and the operations research degree opened up quite a few doors in the beginning of my (2-year) career and definitely was a factor in getting an interview but had nothing to do directly with what was needed for the data science job this is because that offer was contingent on a programming skillset and specific data science problem-solving abilities of which i had none right after graduation the offer may also have been contingent on your education background you just had that already unfortunately industry trusts grad degree holders more for these roles operations research is going to flag your cv as coming from a candidate that has an optimization and statistics background the grad degree flags you as someone that can learn more-or-less self directed what?! 20 months! but i thought i could get a data scientist job by spending 20 minutes this afternoon learning about data science on coursera! only if you upgrade to the super specialization for only $50 month more! if you're like me and like finishing courses quickly their new model works out for you excellent post!i too am a civil engineering graduate with almost 2 and a half years of experience in the field of data analytics worked on r sql with a little bit of predictive modelling and reporting learning python now would you say a post graduate is important in getting a job with a better pay? 
absolutely having the m s despite the lack of useful stuff from it gave me confidence (except at my first and second jobs where i was just happy to actually have a job) to negotiate for more it also forced me to negotiate for more so i could pay off the crazy loans from it negotiating for more allowed my next negotiation to be easier as i had a higher base to start from a lot of data science positions like operations research backgrounds so that's definitely a plus but if you have the skills already have done awesome projects that brought value to someone i'm telling you now there's nothing worthwhile you'll learn from a 60-70k degree if you want to get deeper into the theory and nuts and bolts of data science save yourself that money and take full legit courses from stanford or mit both of which offer free online courses on their platforms but if it's for the confidence and to get more eyes on your resume- then it's up to you to decide if it's worth the debt thank you! would you mind giving me a small brief about what operations research actually is and why data science positions prefer it? it involves a lot of statistics like stochastic and deterministic models my program combined it with finance and entrepreneurship and since i didn't know anything about my career at the time i took a lot of bullshit classes that didn't do much the ideal case is to have classes on data structure databases and some coding class then that's a valuable degree to have i had none of that but degree opened up the doors for me to prove myself not only debt - opportunity cost of the salary of a data scientist for two years found the economist great post perfect 5 7! my only real push is why in holy hell were you using glassdoor if you graduated from a top school? you have no alumni database network to utilize? 
holy shit you just made me realize i never once looked into the alumni portal for job postings for data science from what i remember last time i looked a couple of years ago like 90% jobs were all catered to finance so i stopped using it as for networking i hated it i was terribly unresourceful during both undergrad and grad and never took advantage of any career development stuff i asked a few former classmates about a couple of job postings but it was all the same- i didn't have the experience there was also a sense of pride like "i'm gonna do this on my own and i'll figure it out" i laughed so hard at 5 7-please tell me you're referencing the meme i think you are i am sir you're the dark knight of posters here on datascience you mean "dark night" this was along the same lines that i progressed as well graduated with a bs in a math related field from a state school went into the work force it's been about 15 months since i graduated my path since graduation (going from company to company) has been •0-2 months: job searching learning to code took a coursera •2-8 months: data analyst (taught myself data science and ml in this time) •8-13 months: jr data scientist (learned more about ds more field centric experience) •13 months - present: data scientist everyone wants someone else to give them data science jobs but literally every resource you need to know to become a great data scientist can be found by keeping on top of and practicing on kaggle rpubs (if you use r) data science related subreddits and data science websites nice! i read "president" instead of "present" and was about to bombard you with questions lol congrats! thank you! same goes to you! were your three jobs all at different companies? yep! all were different fortune 500 companies do you mind going into some detail? that is a lot of job hopping did you get any push back about that? how did you find them? were you just constantly looking or were poached by recruiters? 
no problem i actually never intended to do a lot of job hopping for context: data analyst job was at company a jr data scientist job was at company b data scientist job was at company c i applied for the jr data scientist job because my job as a data analyst didn't really do much more than basic data manipulation i also knew that i wouldn't stay at company a as a data analyst for an extended period of time because they didn't have any data scientists i really enjoyed being at company b as a jr data scientist the team was awesome and i was learning a lot but someone (from company c) reached out to me and talked me into coming in for an interview (company b and company c are in the same city) i never intended on going to company c moreover i just took the interview because i wanted more exposure to see what other companies were doing in the field however company c offered me an exorbitant amount of money a better position title also was more aligned with machine learning methodology that i wanted to do and had a lot more added perks that came with working for that company (think along the lines of free flights when you work for an airline kind of thing) i never intended to job hop so much but after i landed the jr data scientist job i was hounded after by recruiters because everyone wants a data scientist i never received push back for hopping around all that much especially because i was moving up in positions and not moving laterally i do plan to stay my current company for quite a while though wow! i'm literally in your data analyst point and the moment and want to follow your track i'm having the same issue with my current role where i'm only ever doing simple manipulation and have got the most i could possibly get out of this role i'm glad you posted this and i read it as i have clarification that i am on the correct path and fee like it is more likely that i can follow the path i want to follow thank you brah! best of luck! keep improving your skills everyday! 
how did you find a data analyst job so quickly?
beefed up linkedin, coursera courses, had a bunch of coding examples on github. to be honest, it was the only call back interview that i received after applying for hundreds of analyst positions. it was pure luck that i even got in for the interview.
what do you mean by practicing rpubs?
looking at other people's code, understanding what they're doing, and attempting to replicate it on a different data set.
is there a way you specifically find rpubs worth any attention? from the homepage you just see a bunch of homework and lab assignments.
haha, that is a good question. i do wish that rpubs had a better way of organizing its website. if i wanted to learn a specific designation of machine learning or coding process or whatever, i'd usually just google '[insert topic here] rpubs'. for example, if i wanted to look at how people did support vector machines in r, i'd do https://www.google.com/search?q=svm+rpubs. this guy is my favorite rpubs contributor, though. he does all kinds of statistical and machine learning processes: https://rpubs.com/ledongnhatnam
that post was great. i too came from a non technical background: international relations major in undergrad, did a mini mba at duke, worked in fraud analytics for two years working with sql and making powerpoints for ecommerce clients on how our software could help them. i slaved away for two years learning all my math and programming at local community college and coursera, and now i'm at northwestern, about to graduate in december and looking for full time jobs. i can totally relate to this post.
hi! you have a very similar background to me - what program are you in at northwestern? i'm literally about to enroll in local cc classes with the same path in mind.
i am in the master of science in analytics program here.
that's awesome man! congrats!
takes a while, but once you're there it's a beautiful thing.
this post gives so much hope.
wow, i'm currently doing jose's udemy course and it's really great. it's the first course that i've really stuck to. i have no programming experience at all and i tried a bunch of other open courses (coursera, edx), but i think this one is so far the best.
absolutely! it's an amazing course and it focuses on application, whereas the others are sometimes bogged down too much with theory, which can make it hard to get excited. you reminded me: i should probably change the order, because i forgot that jose has a whole intro to python section that i skipped, so it shouldn't necessarily come after the first two but could be in tandem. thanks!
how would you compare this course to what you actually do at work? i know it's not enough, but from the research i did on the internet, every website, every blog, everyone says something different. i'm kind of overwhelmed. there is quite a lot to learn, especially for a beginner like me.
it's absolutely overwhelming for sure. i almost quit like 8 times in the process. i even took a couple of weeks off from jose's course because my brain was overloaded with everything i was trying to do. the thing is, the function of my current role doesn't need python at all, just a lot of sql and tableau (which i don't like). after this course, my function became 100% of what i did in this course. i ditched using dbvisualizer for queries (except for a few cases) and put them right in my script. i ditched tableau because i could get a lot more detailed information: i pulled all the data, created dataframes, filtered, and visualized it with seaborn. look up how to scrape the web, and all of a sudden you can access data probably no one at your company would think of before. look up how to send automated emails, and all of a sudden your entire company has access to reports no one's had before. just like that, you're a super valuable employee. the key skill here is data manipulation. once it's in the form you want, you can run your models, your visualizations, blah blah, but first you have to get it how you want it. data manipulation was absolutely the best skill i received from the course, and i use it every day to make everyone's lives easier. the seaborn functions and machine learning models are layers on top of that finely-structured sexy ass dataframe.
how did you manage your time with completing so many courses? what was your average week like when going through these?
i didn't do much socializing and only saw my friends a couple of times every month. i spent a lot of time with my gf, since she had work to get done too. on the weekends we would set up shop and cram it out. on weekdays i'd just stay in the office or head to the apt and code. on average it was one course every 2 months, so definitely doable. in the beginning it took a lot of effort and 4-6 weeks of 5+ hours every day after work. towards the end it eased up to a couple of hours daily. for more advanced stuff i still had days of 6-8 hours when i really wanted to learn something. i'm not much of a drinker or happy hour person, so that frees up my 6 hours after work. now that i made it to where i want to be, i have more time for all that socializing =d
this entire guide post is so helpful, not only for aspiring data scientists but also for someone like me who is stuck in a really bad contractor job and badly wants to get out into a solid paying job with benefits. i struggle most with interviews, so i'm saving this post to use as a guide to get better at the interview process and also to start learning different programs (in my case, as a finance major: tableau, sql, sfdc and other analytical tools that are commonly used within a sales operations finance department). thank you op.
you're very welcome! glad it's useful =d
nice write up! it was really interesting to read about the variety in questions you got asked. did you ever get any push back or negative response on the job hopping?
nope just answered honestly about needing something more challenging but not just saying that because anyone can say that i backed it up with the courses i took and projects i built this almost exactly what i did i built my chops in ad tech and learned python on the side ad tech is great because its a growing segment and from my experience most people that work in advertising are proud to understand excel when programmatic buying is much more than that i opted for a developer position in healthcare bi so i could see what went into producing enterprise grade data systems yeah i never learned about adtech until i got into it great place to be for data related anything wow this is a lot of detail a huge motivational boost as well thank you! why is there patrick mckenzie's photo on this post? because of the link to his blog (kalzumeus) that confused the hell outta me too [deleted] those $10 sales are too addicting man lemme tell you i have just finished a course in data analytics using excel and moved on to deep learning by andrew ng on coursera i am not really doing it make career switch but just cos i'm curious about this field unfortunately i have no programming experience or background in maths so things have been challenging i finished a course on python on code school some months back and been hitting up random stuff on stats to supplement these courses i was really questioning what i'm doing with my life and why i am spending so much time learning something that i probably will not even use at work but your post was so inspiring and i think i'll keep on trucking thanks a ton!! this is a pretty good field guide and something which is very realistic unlike other fancier click baity guides thank you for this! 
thanks for sharing your story helps me move forward with directed confidence i'm currently in the earlier stages six months into working as an "analyst" after a masters in stats and working about 30 hours a week on the outside on projects study job applications looking for position where i can better up my skills i haven't ruled out a phd but like you my motivation spiked when i left school and so i'm thinking best to concentrate effort on what's working well i think i'd prefer to go without it but i'll make the choice in another year according to what i learn during that time best of luck! this is awesome - shows what hard work can do; you've clearly got a bunch of talent too - company is very lucky to have you congratulations! thanks!! this is awesome i am also a civil grad! with a mse in structural materials engineering i've been posting recently and have received so much feedback and i can't stress enough how great this post is! thanks for sharing great post! thanks for sharing congratulations on your hard work paying off your post is something i needed to hear as a senior in college with a degree in the humanities who found out that data analysis is awesome great to hear! in the company i'm currently at a lot of account managers have a humanities background they work with the tech team and data science team often for things like sql queries and excel stuff there's only one of them who actively tries to learn he asks questions about why things work in sql the way they do asks for one-on-one guidance etc it's a relatively easy path to an analyst role if the direct path isn't viable plus if you're in direct contact with clients on the business side of things that's awesome to have on your resume if you want to get into data analysis so that's a potential way to go while you're still in college take a probstats course if you haven't already digest it understand it make love to it because it will the most useful thing you'll know later on good luck! 
well my minor is actually data science so i've already taken statistic courses and programming courses so it's not a completely new area for me but it makes me feel better knowing that people started from scratch and we're able to pursue the data science jobs good luck in your field oh snap dope! had no idea ignore my last reply then lol hopefully someone else might find it useful =p best of luck! i'm sure you'll kill it thanks man and your advice is still good lol your original post has sorta kick started my current drive to learn text mining and i'm currently in the process of working on a project with a prof of mine in text mining so thanks lol your post made me reflect on my journey and realize that i'm 18 months in myself and the cool thing is that i'm like 90 percent there just in the final stretch of landing the "one" we started at almost the same time! i already have a top-notch ds internship from a well-known tech firm that is getting me interviews anywhere but i have the ~8 months until i graduate to go until then i'm just studying my ass off for ds interviews and tuning up personal projects just wanted to humblebrag a bit and holler at you to let you know that there's someone else out there who is following the same path! congrats! you'll do great great work! where are you located? ny nj metro area great post and loving the humor you put in each of your steps i swear we basically walked the same path i also got my masters in or back in 2013 but everything in the curriculum just did not prepare me for the current world of data science i think your post gives a lot of hope for those who are stuck in steps 1-3 congrats on the career change man wish you all the best i love talking shit about how little i learned from that degree but then realize i had 1 1000th of the motivation and drive to learn as i have now so i have to take some (aka most) of the blame lol thanks! 
wish you the best too link to your blog please or else we need to think it's jose portillas's account for promotion :p thank you lol i don't have one this is all you get for now and you'll have to take my word that i'm not jose no matter how bad i'd want to have his brain yeah i probably would not have gone with "ballsfor11days" for an alternate promo account lol it's tough to be that level prodigy u jmportilla great job awesome post! i feel like i am in a similar situation to where you were at your job most of what i do is more time consuming work that challenges me through time constraints more stress than challenge haha i've been considering getting a masters degree (recently took gre and did decent for not studying) but don't want to make a move until i am nearly positive it will be worth it i'd probably say i am around step 2ish and feel like i may be able to stay at my current company for step 3 as it progresses all i know is that your post is very motivating and excellent guidance i just enrolled for the python for data science and machine learning bootcamp thanks again! great! 
it's a fantastic course take and use every bit of info you get from there i really appreciate this post i'm trying to find a job in the next few months in data analytics hopefully i can find something faster then 20 i went to university for computer engineering so that should help speed things up a bit i've also worked in operations though for a small company if i had the money i would just get a masters but i cannot afford it right now most of the courses you mentioned are one's i have on my list so that is a good thing i'm working on sql right now and then moving to more python i'm trying to come up with a good project for my portfolio now computer engineering will be more than enough plus the experience in ops at company should solidify your qualifications depending how long it was don't worry about the master's until you're near getting a full-fledged data scientist role and even then with your current degree it might not be necessary as long as you've had the fundamental statistics courses focus on getting your programming skills down and taking machine learning courses if you're currently working now build projects that help solve problems there you should definitely find something before 20 months if you're going into analytics but not as a data scientist you can find different roles like revenue analyst yield analyst optimization analyst and data analyst that could lead you into a data scientist role if you want it good luck! 
that's the hope i'm looking to find something in 6 months or so i'm not working now so i am focusing on mooc's my main focus is on python and its applications (ml stats data visualization) enough r to be able to read and understand the code as well as sql and ssis for basic database management i'll be doing uber to pay the bills for a short time might try to come up with a data analytics project with that as well as i'm really interested in the world of finance so my goal is to find something in that field i'm hoping to apply ml to some financial problems thanks i'm just hoping to find a job i really enjoy extremely thorough and extensive post you give me hope as a pure math masters graduate great hustle mate - well done! will definitely check out the courses you've mentioned in your post how was the postgresql class? i learned a bit of postgresql admin in a different udemy class but wondering if a deeper dive would be beneficial for reference i know the psycopg2 library pretty well and am probably intermediate-advanced in sql generally it was great because it wasn't just querying from a database it was building an app that interacted with the database i already knew a ton of sql not postgres specifically but i did like that i can put that particular flavor on my resume did you go to columbia? why was there no coding in your ops research degree? was the 47k degree job before or after your masters? if before why so low? yes! 
there's coding in the undergrad program but the master's didn't require coding or data structures or databases or anything at least when i was there and because i had no idea what i was doing and came from civil engineering i didn't know all that was the important stuff i didn't learn anything from most classes but that's because i didn't force myself to learn until i almost failed out of a required class so as much as i want to blame the program for taking 60k and providing no applicable knowledge it was my fault too for not actually applying myself when i had the chance on top of that i was terribly unresourceful while i was there buuutttt some of the classes were also questionable and i'll leave it at that i imagine sometimes what i may have been able to accomplish with the drive i have now either way i'm happy =) and the 47k job was after master's it was so low because it wasn't a well-known company and the "analyst" title was more of a cover-up for data entry and bitchwork plus i was afraid of negotiating it was my first full time job i felt like a complete fraud feeling like i didn't learn anything useful and thought i should be grateful for even getting that much that mentality now that i know i actually have applicable skills and not just random theory is gone and replaced with a much more confident stance rl [deleted] i'm loving it! absolutely worth it i feel like i actually belong here i'm here like 11 hours a day (by choice!) 
and 9 of those are actually spent researching and scripting the others are spent eating and taking some breathers and going to the gym at first i was very intimidated and unsure especially because i looked at the data science team's github on my first day and psyched myself out but some conversations about expectations with my boss calmed me down in just the past 2 weeks i've done nlp on a bunch of text data for categorization and sentiment analysis automated bayesian a b test reporting and some other less cool but still cool things and there's more coming down the pipe it's awesome and because it's awesome and i love it i'm more committed than i've ever been haven't even thought about leaving yet! for me that says a lot =d this is such a beautiful post i'm on the process of learning machine learning algorithms and obviously python never in my life i have encountered so many concepts in such a short time nevertheless it's very interesting i just want to look back and pat myself in the back because all this struggle will be worth it thank you getting exposed to a lot in such a short time means you're doing it right the first year of cramming in all that information is the hardest good luck! thanks for the inspiration sir can you share with me what is your work stations setup like? are you using a laptop or desktop? specs? number of monitors? i'm just curious if i need to invest in a good workstation to be a good data scientist or ask my company to get me better equipment :) inspiring post - thank you! do you think you could have done this without a master's degree? my undergrad is in information systems and i'm a "sql monkey" at work trying to get into the data science field i'm currently spending ~4 hours a day teaching myself python after work i don't want to invest in 30-60k for a master's degree when i can self-teach myself on the other hand i feel like employers won't look at my resume if i don't have a master's degree thoughts? 
i think it definitely helps with getting the resume looked at no matter where you apply but i'm guessing prestigious established places likely exclusively look at only phd's or master's newer growth-stage workplaces are probably more lenient because they'll need someone fast to solve urgent business issues and if your resume is full of projects you've worked on that might get you in meaning the newer places might not need masters-level machine learning stuff just yet just someone to build dashboards for the exec team do basic analysis some scripting lifetime value calculations a b testing etc surprisingly all the above is a good amount of the job for the data science team at my company analytics-wise we have a lot of holes to plug there are talks of machine learning projects but we don't have the bandwidth to tackle them yet might as well try applying without a master's for now and see how far it gets you? maybe take a look at relatively unknown companies? and if at some point you feel it's necessary for a master's: http: www omscs gatech edu home georgia tech offers an online master's in cs with a machine learning concentration for about 8k or so if you want or need a master's from a great institution that's probably as cheap as you can go i'll likely apply for next fall just because =d hope that was helpful! very helpful thank you just curious why gt’s omscs program over the omsa program? ah definitely because it was the first link i stumbled on lol but yeah that's a great alternative too and looks more focused on business application as opposed to theory though more expensive than the cs degree now apparently: http: www bursar gatech edu student tuition spring_2018 spring18-all_fees pdf they also have their micromaster's on edx org which is supposedly the 3 foundational courses of the omsa program that could be a first step since i think the certificate can transfer over as credit for the actual omsa program what about the math? 
i was trying to study linear algebra but not sure if i had to go back to earlier algebra did you do all of this without calculus? i did a lot of stats in grad school but never understood the matrix algebra much though i did know what data meant after analyzing not sure if i should plow through the courses as you mentioned and see if i get stuck because of lack of math or just find where the math beginning point is and do a little at a time as we go the meat of the machine learning courses was linear algebra the only calculus i remember being used was in explaining gradient descent and its various optimizations which all just deal with the slopes of a given point aka partial derivatives the mathy derivations were optional to watch i think the classes gave a really good intuitive understanding about that part though which helps visualize what's going on without having to completely understand the calculus i did have a linear algebra class in undergrad from which i retained very little but all i've encountered extensively so far is matrix manipulation aka if you know your matrix sizes and how they're supposed to add subtract multiply you're pretty much all set andrew ng's machine learning course gives an overview of all the linear algebra you'll need i think he included it in his deep learning specialization too if you're stuck understanding those topics practice programming with real data in matrix form if you haven't already i sucked at understanding linear algebra when it was taught and condensed into variables but when i was actually manipulating real datasets then it made a lot more sense to me thank you man wow how long did it take to finish those courses? i'm planning on doing the same i've stumbled upon this post extremely late but i'm wondering if you completed the courses listed in the order you listed or do you recommend going about them a certain way? hey there! 
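To make the "know your matrix sizes" and "gradient descent is just slopes" points above concrete, here is a minimal sketch in NumPy. It is not taken from any of the courses mentioned; the function name `gradient_descent`, the toy data, and the learning rate are all made up for illustration. The shape of every array is noted in the comments, since that is most of the linear algebra involved.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=500):
    """Fit linear regression weights by batch gradient descent (illustrative only)."""
    n_samples, n_features = X.shape           # X is (n_samples, n_features)
    w = np.zeros(n_features)                  # w is (n_features,)
    for _ in range(steps):
        residual = X @ w - y                  # (n_samples,): prediction minus target
        # partial derivatives of the halved mean squared error w.r.t. each weight
        grad = X.T @ residual / n_samples     # (n_features,)
        w -= lr * grad                        # step downhill along the slope
    return w

# toy data (hypothetical): y = 1 + 2*x, with a column of ones for the intercept
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])     # (50, 2)
y = 1.0 + 2.0 * x                             # (50,)
w = gradient_descent(X, y, lr=0.5, steps=5000)
print(w)  # should land close to the true intercept 1 and slope 2
```

If the shapes in `X @ w` or `X.T @ residual` do not line up, NumPy raises an error right away, which is a quick way to practice the "matrix sizes" habit on real data.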
the list at the end of the post is in reverse-chronological order so started from the bottom it depends what you want to focus on and what you already know but i'd say start w either statistical learning or machine learning machine learning is my favorite of the two because it's not a black box and you'll have to code the algorithms in octave or matlab thank you!! i am in a similar situation that you were in civil engineering undergrad and masters in structural engineering and i'm working in that field now about a year or so since graduating my experience in coding is next to nothing i've used matlab for my thesis but that's about it and it was mostly for producing nice graphs my main problem is that i don't know where to begin i find myself taking a longer time searching for "the perfect" course for beginners than actually doing a course or classes even taking it a step back data science interests me but i feel like i don't know what more there is in the cs field as other career paths or even within data science what possible paths there are within that edit: responses not just limited to nyc! 
because the industry gives the "data scientist" title to people with differing levels of data-related skillsets everything from generating reports to building machine learning models comparing data scientist roles just by title or years of experience isn't enough--getting down to the comparison of skills and role is more useful i've run into this while trying to collect more data about a data scientist's salary to use in future negotiations and understand my market value my employer like most i imagine is pretty tight lipped about what others in my role make and opaque about how they arrive at compensation numbers so i hope this post helps us all a little i'd like to share that information with the community in the hopes that others will do the same giving us all a bunch of data points where we can say to our employers "i heard of data scientist x making y doing z in nyc" here it is and thanks for everyone's participation! location: nyc title: data scientist education level: bachelors in computer science industry vertical: software-as-a-service tech company company size: 500-1000 majority of time spent using (tools): python pandas spark jupyter notebook sql majority of time spent doing (role): writing etls data analysis reporting for a b tests communication to influence the product building dashboards for monitoring metrics compensation: $104k salary some options no bonus edit: before making this post i had already used paysa glassdoor stackoverflow's salary tool h1bdata com and o'reilly's model report to compare my salary but i'm always looking for more data points and this is a different kind of data that'll be helpful to the community! 
i second what many have said here most of these roles are not data science this is why the salaries are within the $80k to $100k range i am a data engineer and these job descriptions and salaries match mine to a t with that said the line is being blurred our team is increasingly being asked to perform data science tasks using machine learning our team does not fully understand the math behind the algorithms but we are able to get our error within a few percentage points of the models built by true data scientists in our company simply through brute force (e g loop through parameters) this is going to become more commonplace employers are going to have to ask themselves is it worth the marginal accuracy if you can pay someone 30 to 50 percent less? agreed not everyone is into creating new methods some want to just answer business questions yeah it seems the purists in this thread basically think data science is computational statistics? at what point if your definition of "data scientists" excludes like 80+% of people with this job title do you start reconsidering your definition? at what point if your definition of "data scientists" excludes like 80+% of people with this job title do you start reconsidering your definition? at what point do people making 2-3x less doing a job that already had a name (bi analyst etc) consider that their title may be inflated? i would imagine that even the people who are deemed to "not be data scientists" by whoever in this thread are not doing literally the exact same work as the data engineers within their firms even if "data scientist" just translated to "more technical bi analyst" then that would be a meaningful distinction and i would wonder what this sort of gatekeeping is supposed to achieve beyond trying to raise the status of its insiders maybe these people should just call themselves computational statisticians? or maybe they're keenly concerned with who should be labeled as a "scientist"? 
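For anyone wondering what the "brute force (e g loop through parameters)" approach mentioned above looks like in practice, here is a rough sketch. The commenter never says which models or tools their team uses, so everything here is an assumption: the toy data, the ridge regression model, the helper names `fit_ridge` and `val_error`, and the parameter grids are all hypothetical. The idea is just to try every combination and keep whichever scores best on held-out data.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# toy regression data (made up for illustration): a noisy cubic
x = rng.uniform(-1, 1, size=200)
y = x**3 - 0.5 * x + rng.normal(scale=0.1, size=x.size)

# hold out the last 50 points for validation
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def fit_ridge(x, y, degree, alpha):
    """Closed-form ridge regression on polynomial features (penalizes all weights)."""
    X = np.vander(x, degree + 1)                 # (n_samples, degree + 1)
    A = X.T @ X + alpha * np.eye(degree + 1)     # (degree + 1, degree + 1)
    return np.linalg.solve(A, X.T @ y)

def val_error(w, x, y, degree):
    """Mean squared error of weights w on the points (x, y)."""
    pred = np.vander(x, degree + 1) @ w
    return float(np.mean((pred - y) ** 2))

# brute force: fit every (degree, alpha) pair and record its validation error
degrees = [1, 2, 3, 5, 8]
alphas = [1e-3, 1e-2, 1e-1, 1.0]
results = []
for degree, alpha in itertools.product(degrees, alphas):
    w = fit_ridge(x_train, y_train, degree, alpha)
    results.append((val_error(w, x_val, y_val, degree), degree, alpha))

best_error, best_degree, best_alpha = min(results)
print(f"best: degree={best_degree} alpha={best_alpha} val_mse={best_error:.4f}")
```

Note that no understanding of the underlying math is needed to run this loop, which is the commenter's point. In scikit-learn the same pattern is packaged as `GridSearchCV` in `sklearn.model_selection`, which adds cross-validation on top of the loop.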
i would wonder what this sort of gatekeeping is supposed to achieve beyond trying to raise the status of its insiders no need to wonder that's exactly the purpose; though you're looking at it backwards in time in both of your posts imo it's retaining the status it conferred before 'title creep' to be completely honest it's disconcerting that my brother (who isn't involved in tech or ds at all) is recommending i request a title change because "data scientist" doesn't mean anything beyond entry level analytics at 3m i'm not going to fight someone over it but surely you can understand the annoyance i'm in my mid thirties this is a senior role that i (we've) grown into ultimately it shouldn't have any effect on my productivity and growth so i don't consider myself 'salty' - just annoyed it's retaining the status it conferred before 'title creep' i suppose that's a fair point but this argument also comes with a lot of implicit assumptions about what these data scientists "should" be earning if a regular data engineer could not do my job well and i certainly could not do a real engineer's job well then some sort of title differentiation (with implications for compensation) is justified i guess i just bristle at potentially being labeled as a "data engineer" but i wouldn't care if you called me a machine learning engineer or whatever completely understand wow thats scary what are the senior level analysts called there? 
title in our field doesnt matter imo you can tell from a brief conversation if a data scientist is a data scientist he may have been using hyperbole but the sentiment is there a lot of people with master's level degrees don't "understand" the math behind the algorithms maybe they could work through it if they had to but most people just learn the assumptions you have to make and the process you go through to get to a result i find it interesting that your example is something obscure rather complex im a researcher in industry and have never needed to know that example you give below this is the kind of absolute rubbish phd grads echo round each other to make themselves feel good it should be a fucking meme alongside "have you tried reinforcement learning" two months ago it was "you need a phd or else you dont really understand the scientific method" i would be suspicious of somebody who describes themselves as a data scientist in practically the same breath as making sweeping statements about what people do or do not understand based upon qualifications in my experience 99% of phds offer little to no directly relevant experience to machine learning data science even guys i know with somewhat related stats or maths orientated phds say they only learnt a small handful of things of actual relevance and certainly not enough to compensate for 4 years of no pay and most people have phds that are basically unrelated sure you may be able to fool some hr worker that your phd is related but surprisingly few can sit down with somebody who is technically mature and give any concrete examples of non trivial overlaps somebody with a good degree in maths physics engineering has ample background to pick up the details of practically any recent paper i don't think what i said is quite that controversial i'm not trying to circle jerk i think there is a pretty big difference between approaches used in industry versus approaches used in academia i work in academia and i think having 
a strong understanding of things like the example that i cited is important - it is an assumption of ols after all on the other hand i have also met recent phd statistics graduates that have literally never worked with data and have no clue how to do anything practical i just find that economics training does one pretty well in terms understanding the math and being able to implement it my point is that this is not 'off the top of your head' knowledge that is useful in anything you do you propose a model and you see how that model holds up empirically plenty of seasoned academics would need to go to a reference to quantify the specific effects or implications when some assumption doesn't hold the implication that you have to be able to derive the result mathematically in order to fully understand is patently false the example you give could be fully understood from a bayesian perspective with no derivation i'd venture to guess that even most people with phd's don't fully "understand" the math behind the algorithms doing a phd doesn't magically give you full understanding of everything mathematical most phd data scientists had topics quite different than machine learning so it's not like they have spent months studying every single algorithm out there i don't talk to phd statisticians frequently but i'm not quite sure as i said the master's level people know the assumptions of the different techniques but they frequently don't know why you make such assumptions for instance they might know that heteroskedasticity inflates your standard errors but not really know why that happens mathematically but i see your point that a lot of data science people come from physics and such and probably don't have the background in statistical theory that i'm used to i would bet that even most phd economists wouldn't be able to explain your example off the top of their head hmm i didn't go to a top school or anything but i think most of my professors would be able to explain why 
that happens maybe not! i guess there are also a lot of phd economists that don't stay in academia who might not stay as in tune with the math phd requires doing novel research even if they didn't study the specific math behind a certain algorithm there are skills related to general evaluation and research that can apply it isn't that the phd is an expert in everything right away it is that they have proven they can be an expert in something our team does not fully understand the math behind the algorithms but we are able to get our error within a few percentage points of the models built by true data scientists in our company simply through brute force (e g loop through parameters) the math understanding what you're trying to accomplish and being able to deliver the biggest differences between de and ds isn't algorithm tuning so i don't think your marginal accuracy comment is super relevant for predicting future utilization just a few differences off the top of my head: ds don't just use 'standard' metrics - they use or create what makes sense for the specific problem ds are better at thinking through statistical aspects of prediction problems - will our model generalize? 
ds are less prone to falling into popular bad practices - like defaulting to pca for dimensionality reduction with little thought to whether or not that makes sense tools questions asked location: nj title: ceo and co-founder education level: ph d industry vertical: healthcare company size: 0 employees majority of time spent using: r rails react majority of time spent doing: developing machine learning algorithms front end back end fund raising writing product requirement doc and mvp reqs meeting with cto on a regular basis talking to potential customers meeting with academic collaborators talking to attorneys submitting government grants meeting with potential advisers trying to figure out how to write a pitch compensation: $0 salary a whole pile of equity which doesn't really help with rent :-o edit: somehow this reply reached the top of the comments section even though it is the least informative sorry about that! good luck thanks! good luck! feel free to post if there's anything we (the subreddit readers) can help with he said he wants memes of economists with things they'd say drunk those things are mostly obscene though thanks :-) i've been on r datascience for a while now (even did an ama a few years ago) so i'll definitely post once anything comes up what are you doing on reddit!?! :) debugging trying to figure out why my analytics pipeline keeps crashing i solved it and was monitoring an hour-long run and waiting to see if it crashes it didn't! 
[deleted] laughing given that the company has no revenue (nor employees since both co-founders are unpaid) i'm afraid we don't have openings for interns i will definitely post on reddit once that changes i think you'd be surprised at what people are willing to do for free i for one have worked for free just to get experience honestly i wouldn't feel comfortable taking a person who gets nothing in return for their labor i hope that in a few months we will be able to afford a part-timer experience is worth more than money to some but i hope you do as well best of luck :) really? no python? :o laughing i actually use python for some consulting work need to pay the bills somehow though calling it "python" is an overstatement i just work with pandas numpy scipy gotcha any reason why you don't use it at work? i assume with the phd you're just more comfortable with r? i'm working on biological data (single-cell data to be more specific) all of the leading packages are in r [deleted] unfortunately that's the reality of starting a business you need a product to make money and until that product exists there is no revenue for salaries which means i'm technically unemployed hopefully that changes within a few weeks and i can actually pay myself i didn't know you had your own business gl mate hope you'll be successful! thank you :-) location: nyc title: researcher data scientist education level: phd 1 year data science experience industry vertical: streaming music company company size: 3000+ majority of time spent using (tools): python scala spark jupyter notebook sql majority of time spent doing (role): writing etls data analysis communication to influence the product researching new products compensation: $161k salary options but company is not public yet so value is not set in stone though right now worth around $180 000 mind asking what your phd is in? 
physics i only did a bit of ml (nothing in my research just graduate ml courses) i know a few people there if i'm assuming correctly as well given what i've heard it sounds like a great place to work! i'm teaching myself bayesian stats causality and refreshing knowledge of practicing ml trying to integrate it as part of my work even though the bulk of my work doesn't require those things what are high leverage areas i should study up on know how to do to better prep for any future data science interviews? depends on the type of role you want but those sound great for a lot of roles i would essentially study work on stuff you really enjoy and look for roles that need those skills for example i do about 50% engineering and 50% research as our pure researchers have trouble understanding how to scale our products and our pure engineers have trouble understanding some of the more subtle points of some of the models i found that i enjoyed the engineering a lot more than i expected (at my previous company they did not have good engineers so i had to learn it myself and found i enjoyed it) so here i am! you guys have a sick office if i’m assuming correctly you are probably assuming right and while it's loud it's pretty sweet! why not post total compensation? im on the west coast but have you though about checking paysa or http: h1bdata info index php?em=&job=data+scientist&city=new+york&year=all+years tbh the value of my options even given a generous revenue multiple is not that great and it's all just paper gains anyways yep i've already used paysa glassdoor stackoverflow's salary tool h1bdata and o'reilly's model to compare my salary but always looking for more data points! would love to hear your stats! [deleted] out of curiosity what did you get your masters in and what sector are you working in in the la area now working on my masters in stats interested in ds thanks :) love this idea! 
location: nyc title: data scientist education level: bachelors in chemistry (plus a data science bootcamp) industry vertical: consulting company size: 20-100 majority of time spent using (tools): python pandas jupyter notebook sql majority of time spent doing (role): getting cleaning investigating shaping analyzing data also communicating findings compensation: 85k salary ~10k bonus which bootcamp? and how much of an impact did it have in hiring you? jeez i work in research and i use a lot of the same tools (python through jupyter pandas stata and sql) i do statistical modeling visualization (for publication and eda) and build a lot of metrics to describe raw data i wouldn't call myself a data scientist and i make less than $50k (in ohio) this guy lives in nyc tbh he might be underpaid dang i knew the cost of living was more there but i didn't realize it was that much more bankrate com says i would need to make $103k in manhattan or about $80k in brooklyn 😭 yeah definitely way below market that's roughly what entry level analysts make in good consulting firms might be time to ask for a raise! in fairness i'm nine months into my first data science job and i work 40 hours week (compared with entry level analysts 70-80) at my year mark i'm definitely going to ask for a raise though! thanks for replying! wishing you the best of luck in your career have you compared yourself on glassdoor? yep! i've already used paysa glassdoor stackoverflow's salary tool h1bdata and o'reilly's model to compare my salary but always looking for more data points! would love to hear your stats! sorry currently in my senior year of undergrad! studying computer science with a concentration in data analytics though will likely end up in the minneapolis st paul area after school what do you mean by "o'reilly's model" is there some app or something like so's? 
location: raleigh nc title: data scientist education: masters of computing (data track) bachelor's of physics bachelor's of math industry: predictive models for businesses company size: 1 000+ tools: jupyter notebook python keras scikit-learn role: building predictive models based on unbalanced data to increase the reliability and usefulness of hires compensation: $8334 per month did you do the georgia tech master's program? it sounded like it looking at the way you phrased it nope another school out west went to do my phd but my advisor wanted me to work 10 hours a day 7 days a week and i burned out after 2 years with a master's location: amsterdam title: junior data scientist education level: masters degree in mathematics industry vertical: banking company size: 15 000 majority of time spent using (tools): python r hive spark sql jupyter notebooks rmarkdown linux etc majority of time spent doing (role): making machine learning classification models monitoring performance of said models doing presentations about models to stakeholders creating dashboards doing ad-hoc analyses aligning with data engineer squads and business helping decide priorities for team and department etc compensation: 52k no bonus no options is that euros or usd? been wanting to move to europe and your country is one of the easier to get to jesus that is double the salary of an average data scientist in ireland literally double cost of living is very high though at least here in nyc cost of living is pretty high in dublin right now too probably not terribly near nyc but the level of wages is actually lower in dublin than the rest of ireland weirdly how much is avg monthly rent for 1 room? 
for a studio apartment you are looking about 800 that makes sense then nyc rent is almost double well studio apartments are pretty rare in ireland you are looking at 2 bed minimum usually those start around 1 4k for anything that isn't in a super rough area 2 bed minimum in new york in a not super rough area is probably 3k if you're lucky most likely much more especially if you're in manhattan and the uk and the rest of the continent old world doesn't pay so well it seems location: another big city in the us :-) title: principal data scientist education level: phd (phd in psychology + insight data science bootcamp) industry vertical: government public sector majority of time spent using: python (50%) r (50%) sql (like every day and mostly in sql server and postgres) majority of time spent doing: high and low level analytics ranging from complex statistics to basic descriptives some machine learning lots of visualization (using combinations of flask rshiny plotly bokeh d3 js etc) some basic etl work random adhoc stuff (e g webscraping for data) compensation: 88 9k generous retirement benefits and pto union membership default pay raises annually 35 hr work weeks i gripe to my peers that i could be making more (paysa says somewhere in the range of 110-120k) but i can't really argue with the work life balance and diversity of work that i'm involved in can i ask what sort of psych you do? i'm currently getting my phd and haven't seen many examples of people going from psych to data science what was the transition like and how was insight (i'm planning on going this route too)? certainly! 
i got my phd in cognitive psychology and did a post-doc in cognitive neuroscience most of my work was behavioral rather than physiological (e g imaging) i think the transition from the social sciences is a natural one especially if you are used to doing a lot of statistics and have taken some number of stats courses (which i did) i would say the following about my transition into data science: i lagged behind on some programming fundamentals compared to my more stem-y peers who had more experience (usually matlab c++ or python) however i've gotten significantly better over time where i feel very comfortable writing in r and python (and whatever else language i need to pick up) i didn't know as much in terms of the mathematical mechanics of machine learning though having taken a zillion stats courses i was able to conceptually understand many of the models that are used regularly i had colleagues at insight who did physics ask me what a t-test was and to explain the mechanics of that i'm still not the most quantitatively oriented but i can learn with enough time spent in books someone from a social science background might benefit from spending formal time learning linear algebra all of my linear algebra i've picked up by way of learning statistics and reading a lot insight is great though they have a tendency to oversell the opportunities you get at minimum you get a great network also it's a nice bridge between academic work and non-academic work the mindset i have now is less "digging esoterically deep" about a particular project and rather determining what are the most essential aspects i need to get done what are the big questions i can answer and when is enough work good enough for me to stop? 
a lot of learning that mindset came from insight i would say the transition was smooth-ish i went through a brief unemployment period looking for a position (which i found after about 4-5 months) and i wouldn't look back at this point awesome thanks so much you are basically where i want to be in 3 years so it's encouraging to hear the transition was relatively smooth i think my programming ability is decent to good but i have a lot of work to do in linear algebra so i will keep that in mind thanks! 35 hr work weeks wow was this something you fought for during the hiring process or was it advertised from the start? it's advertised! perk of having a government job that's part of a union so do you come in at 8 and leave at 4? or do you eat lunch at your desk while working so you can leave at 3? when i started i didn't really take advantage of the 35 hr work week tbh my probation period is 6 months (which i'm still in now) so i was working 8:30-8:45 to 6 on most days i have since eased up a bit and work 9-5:30 on most days most of my team is gone at 5 sharp it also helps that the rest of my department is gone by 5 too (some 4:30) i like having the flexibility but i partly just really enjoy the work want to put my time in and want to show that i'm willing to work hard because i'm really invested in my job i think once my probation is over i'll ease up a bit more it's sort of hard for an employee to get fired after their probation (at least for behavioral infractions or basically not doing any work) i don't see myself getting jaded anytime soon though the work we do is important (i actually work for the city govt) and i want to see our departments do well and our constituency benefit from our work to answer your other question: i usually take the hour i get to eat lunch i like having a break in the middle of the day it helps me stay focused in the afternoon i’m not in nys or nyc but i go by various titles from researcher to analyst dc liberal arts 
undergrad mba in finance and accounting cloud based computing “” 1000 sql and sas i have fluency and can use python and r but don’t majority of time is inferring risk and cost trying to find the next big interesting thing and trying to stay away from implementation 140k inclusive of bonus which i have never seen so like 130 did your degrees help you land the job or was it your skills? the mba opened me up to working for first a health system and then an insurance giant my lack of skills is what pushed me to learn as much as i could can i began with sas writing code to build data for simple summary outputs with some manipulation to adjust for time etc i graduated to using sql to build data and sas purely for statistics and manipulation i’m getting much more interested in branching and doing more to fit linear regressions i now run hopefully that makes sense i just woke up and i work long days i think i knowing the business and knowing how to use programming statistics to gain the insights i want are the key i work with a lot of phds who don’t have that sales business edge thanks for replying! just curious what do you mean by trying to stay away from implementation? 
i don’t want to do operationalisation so if i figure out a way to do something i would rather oversee it’s implementation of when the people implementing it get more praise or money than me i'm not on the east coast but my salary is in line with yours and i thought i would give you some data points location: low coi city in southwest between 1 and 2 million people title: data scientist consultant education level: bachelors in philosophy and economics from top a 10 liberal arts college masters in business admin from a top 50 public school experience: 6 years of post grad experience but not necessarily all in analytics data engineering data science industry vertical: consulting company size: 50 - 200 majority of time spent using (tools): ms and postgres sql python pandas jupyter notebook scikit learn alteryx dataiku exasol aws some javascript visual analytics tools (power bi spotfire qlik tableau) majority of time spent doing (role): writing etl elts data analysis sql database optimization dashboard building building predictive models with python libraries creating data pipelines training end users on analytic software tools (alteryx tableau dataiku etc ) compensation: $95k salary no options 10 - 15% bonus summary: i'm a generalist and i realize more of a data engineer than data scientist (i've read the comments ) thanks for contributing! interesting stuff is your "consulting" employer a data analytics consulting firm or more of one like bcg mckinsey etc management consulting? we are more niche and specialized in data; whether it be analysis management strategy engineering or governance we don't have the breadth of a bcg or others im not sure you are working as data scientist sounds more like data enginier reporting people has to stop calling data science any thing done with python why was this comment down-voted? 
u danielfm123 makes a good point those roles sound more like data engineering i hope “data engineering” doesn’t become a thing i feel like that’s just a title to pay people less what? amazon google facebook etc etc all have data engineer roles in addition to data science and machine learning scientist it comes down to a very gatekeeping comment about python can’t be a data scientist tool the way the comment is worded provides little to the conversation and only serves as a distraction edit: ok i went back and reread the comment i put some inferences there that were not there i understand the point now and have no issue thanks for the civil discussion what? in no way does u danielfm123 state that python is not a data science tool the fact that you would glean that interpretation from an otherwise well-worded comment tells me you dont know much about the current state of ds and therefore are incorrect for classifying his her comment as a shitcomment people has to stop calling data science any thing done with python i have no issue with the first part of the comment it is a valid point it is the second part u danielfm123 completly dismisses any work done in python as not data science that is the way i read it what was your take away? 
i don't know what else to say other than your interpretation of his comment was incorrect i went back reread the comment a couple times i understand the point now not sure where i went off track i made an edit to my initial comment data science python but python does not data science thanks i read that the entirely wrong way i understand and agree with your point because it is trendy to be a data scientist well op's company calls them a data scientist so there's that well company doesn't know about the roles all of them are important on a data science team but a data scientist not doing inference is just an overpaid analyst (not my words came from everys consultant) http: www burtchworks com big-data-analyst-salary big-data-career-tips data scientist or predictive analytics take your pick based on your description i would call you a data engineer myself i hate to break it to you but this role is a data engineer not a data scientist unless you are hand-writing machine learning algorithms with vanilla python "python pandas spark jupyter notebook sql" is a classic data engineer stack "building dashboards for monitoring metrics" by simply writing sql queries is data engineering not data science also your pay is pretty low (at least by west coast standards) is this your first job out of college? source: have been data engineer currently am senior data scientist doing 100% machine deep learning i hate to break it to you but this role is a data engineer not a data scientist unless you are hand-writing machine learning algorithms with vanilla python where do you get this very narrow definition from? are you really suggesting everyone using any tools developed by someone else can not be a data scientist? 
if that were true i imagine the need for data scientists in most industries should be very low considering how much has already been developed and is readily available i think this also contradicts the common definition of a 'scientist' scientists in any academic discipline use all the tools that are useful and available to them only when existing tools are no longer sufficient they start developing their own [deleted] not trying to start an argument but source? this is literally the first time i've heard of this and if google and friends have actually said this i'd love to know thanks a bunch for the feedback! always great to get a reality check how much would you say is a reasonable salary (from your west coast perspective) for what i do? it is not my first job out of college but i'm not too far removed from college: was a software engineer for a few years now trying to break into data science could learn a lot from someone like you who's made it to 100% machine learning deep learning! id say at least $130k if youre doing meaningful work with spark are you in the bay area? i think that's a bit high for los angeles i live in la and work remotely with a company based in nyc making $160k yr any openings at this remote based company in data science? not that i know of i was brought in as a subject matter expert how'd you manage that? he's probably really fucking good at what he does obviously but the question is how to get to that point then you should probably ask him that i took your question as "how did you get those terms on that job" where the answer would be he negotiated for those terms because he's really good fair enough it's just that from what i read online companies would rather pay entry level jr dev salaries for fresh grads and mediocre developers than to bank a bunch on a 10x engineer or w e recruiter agree hi op i'm about to graduate from college and trying to enter a role similar to yours would it be okay to pm you with some questions i have? 
yep feel free to message me! anyone use splunk as part of their job? i've worked with data scientists that made dashboards in splunk that being said its not really a bi tool they just used it because it was available their ml toolkit is pretty bare and basic but its extensible i've used the native stuff and it has a lot of the "classic" ml algos i haven't used the toolkit to import anything from github yet but i attended a session that demonstrated that they can do it pretty easily in your definition what is a bi tool? wow splunk has a ml toolkit? last time i used it ~2 years ago they basically had a sql connector to stream 'events' from a sql query that's about it my definition of 'bi tool' is: an end user-accessible system that has analytics products in it (dashboards plots data tables notebooks etc ) yes splunk does did that maybe they have more features available now i know they were trying to move in that direction in my experience a purpose-built bi tool (looker etc) will have some kind of mapping or configuration layer so you don't have to maintain hundreds of sql queries as your data mapping layer just my 02 use whatever works for you :) splunk is trying to be more to a larger audience but they are still trying to win over the excel and sql crowd their itsi solution is pretty cool too basically take an architectural diagram and put sparkline charts over the logic blocks splunkbase is their app ecosystem too a lot of quality connectors in there i don't want to sound like a splunk shill (a "spill"?) but i think the sw is pretty great the data engineers are supposed to do the etl data scientists do the dashboards and the analysts use them i guess that makes it a bi platform [deleted] how so? 
it's a di bi platform one of the bigger ones if your company uses them it's usually to collect your work pc's logs to do ids ips and other things (cyber security) so if you are doing bad things on the work network then i guess you should be suspicious isn't this one of the worst ways to gather this kind of information? how so? also what other was would you recommend? always open to new and potentially better ideas! a salary survey from glass door or payscale? thanks for the suggestions! i haven't tried payscale yet but as stated in some of my comments above i've already looked at glassdoor paysa o'reilly's model report etc always looking for more data points and thought this would benefit the community as well would love to hear your stats! i apologize if i came off as rude i just think that with reddit's demographics skewing young you'd get a lot of people just starting their careers(assuming they're even being honest) my stats wouldn't be helpful since i'm not a data scientist no need to apologize! 
appreciate the thoughts discussion nonetheless a little bit of background about 7 months ago i decided i needed a career change due to a number of reasons working as a mechanical engineer was not fulfilling anymore and i started to hear more about data science and machine learning i've always had a passion for performing data analyses but never really considered it more than a responsibility of my various positions after looking more into the field of data science i was immediately hooked and wanted to learn everything i could in this field i started with a subscription to datacamp com and worked my way through the data scientist track with r path in addition i took some courses on coursera (machine learning with andrew ng) and edx (microsoft professional program for data science) i exposed myself to some kaggle competitions and went to a few local data science meetups these were great opportunities to network and speak with other data scientists analysts to see what they were working on and brush up on skills that would look great on my resume ultimately i focused on python (numpy pandas scikit-learn matplotlib and seaborn) r and sql the above courses are great for getting your toes wet and introducing you to the basics also keep looking at r datascience and reading of others success stories they have some great roadmaps! with my skill-set quickly growing i began applying to jobs about 3 months ago anyone who's been through the application process before will know that this is a painstakingly slow process so whatever you do don't give up and keep applying! 
i applied anywhere and everywhere that had a position that sounded interesting most importantly i was applying for positions i thought i could eventually do not just the ones that i could do now all the while still working on my skills through any free courses i could find up until about 3 weeks ago i was getting very few responses to my applications with even less interviews although discouraging i didn't give up i have to throw a big thanks to my friends and most particularly my roommates for keeping me going through this process the single biggest change through this process occurred after my roommate suggested i update my resume not just the content but the format and look too i reluctantly took this advice and spent a few hours crafting multiple revisions until i got something that i was happy with the result after updating my resume and applying to a few new jobs my responses skyrocketed i couldn't believe it (i fully intend to create a time series of this whole process btw) i went from next to no interviews to multiple phone calls in a single day and multiple in-person interviews in a week full disclosure it takes more than a pretty resume to extend the interview process so obviously it helps to understand the content i quickly managed to talk my way into a few interviews and i now landed my first job offer with another two interviews finishing up shortly! my biggest problem now is deciding what to do if i receive multiple offers but that is definitely a good problem to have i would have never thought i would be in this situation but it helps to know that it can happen to anyone so don't give up chasing your passions! i hope my story can serve as a little bit of inspiration to anyone else out there looking to make a career change it's never too late! tl;dr bored of my current job wanted a career change took some online courses to learn python r and sql created pretty resume applied to every job under the sun got a job offer! hey congratulations! 
on the same boat now can you share what’s the format of your resume? since most of you are requesting the format i've linked a few things below to get you guys started i would share the file itself but it's a little difficult to do so without sharing too much of my personal information i started with this template https: templates office com en-us resume-professional-tm16392793 found this for inspiration https: i pinimg com 736x 8f a9 19 8fa919e69e63468557459c877f818928--cv-template-resume-templates jpg and morphed that template to this keep in mind this is not the color pattern i went with i simply wanted to show you that you can get pretty close to that inspiration through some manipulation in microsoft word with shapes and modifying the colors the template is a great way to get a lot of information onto one page which is a very good thing to strive for more than 2 pages is just messy take some time to personalize it and make it unique to you my resume: https: imgur com a ukeig also it's very important to have a solid cover letter once i settled on the resume format and color patterns i basically created my cover letter to match it i used this template for that https: templates office com en-us cover-letter-professional-tm16402168 do you add your photo to your resume for real? i have been told not to do that by a lot of people 🙄 i put my photo too i use the same on every social media (fb linkedin etc) or website (github kaggle) i plan on not changing for a few years my theory is that it helps the recruiter that check your profile remember you and it makes me appear more human ps: it's a pic from my fraternity composite (suit red tie) i dont think it's a good idea to put a photo if it's not professional i did but it's completely up to you i happened to have a headshot taken from a friend who is a professional photographer which helped i'm a little taken aback do you have your old resume template to compare ? 
the colours you chose and the font are not going very well together i mean you got a job so it worked but it's clearly different from what i expected ! my old resume was a simple word document with only text listing my education and work experience from top to bottom very boring as i said those are not the colors i used at all check out the design tab in word and play with the themes and colors to see what looks best to you and go from there good luck! as you said "my resume" i thought you took the yellow and purple colours for your pattern but ok ! i worked a little in infographism so i made mine with latex and added a tiny creative touch (looks a little more like the second resume you displayed from the designer but a little more formal) do you send it in pdf or doc ? i had some people tell me that a doc is better so the company can run it through software that preselects candidates but frankly i'm not totally sure about that info i've been submitting it as a pdf it also helps to have a "boring" formatted resume for auto-populating information in an application the format i used didn't always work so well for that aspect and when you're applying to 10+ places a day any way to save time helps thanks a lot for your answers ! :) awesome story! what's your opinion of datacamp's r for data science track? did it help the most with learning r itself data cleaning and importing or actual analysis or all of the above? i have a free subscription from my school's club so i was interested in its value congrats again! 
it's great for exposing you to the different packages that you will utilize in r for any kind of data analysis it's decent for giving examples on how to clean data and import it as far as actual analysis it's a decent overview but your best practice in this area will be working on a data-set i highly suggest checking out kaggle specifically the titanic data-set and reviewing other users exploratory data analyses this is a great way to understand what are important things to look for it will give you an even better idea of what visualizations you can use to present your findings in a clear and concise way this is very important because it's essentially all you'll be doing in the real world you need to quickly be able to convince your audience that what you are presenting is important and valuable so they can make some kind of business decision off of your information @lanzman1769 so you’d recommend datacamp over other mooc such as as udacity udemy etc? i ask cuz i jut signed up for datacamps month service and just want a hear your opinion on it i can't really speak to udacity or udemy because i haven't actually used them but i really enjoyed the structure from datacamp udacity locks you in and doesn't allow you to move at your own pace and udemy was less structured than datacamp from what i saw i'm sure they have value i picked datacamp over them because of the structure and ease of use how knowledgeable you have to be to start working on the titanic data set? the titanic data set is designed to learn on so you can really start from nothing check out the other users exploratory data analysis' and see what they started with and how they got to their final prediction thanks for sharing this! 
it's great that you were able to benefit from inspiration here and that you're continuing the trend :) i'm a swe and have been passively interested in ds recently and have started looking into transitioning it would be great to hear more about your interviewing experiences and especially what you mentioned about the update to your resume congrats and thanks for sharing your inspirational story did you quit your previous job and focus fully on learning python and r? or did you also stay working? it's always rough working and then coming home to do more coursework congrats again man i did not quit my job i devoted about 2-3 hours per day working on the courses for me it was a lot of fun learning but definitely got tough towards the end as you went over some of the more redundant courses fortunately i never really got hung up on them so i was constantly advancing which helped good for you! can you say what city you applied (most) in? i'm in the philadelphia area which has a pretty decent tech scene lots of marketing analytics opportunities with larger companies outside of the city it definitely helps being in an urban area for the amount of opportunities available congratulations bro! i am in the same boat now can you provide more details on the changes that you had made to the resume and if possible also share a copy with us that's awesome! can you share what were the improvements in your resume besides content? 
sure see my comments here https: www reddit com r datascience comments 71lsm1 after_7_months_ive_finally_made_a_career_change dncphbe just to be clear content was critical as well with me not having some kind of cs background i needed to translate my career achievements into something that would resonate with hr at a tech company and the hiring managers if you're confident you know the content the hardest part will be getting past hr who screen the resumes before they even get to the hiring manager that's where the formatting helps (a little more background on myself i have a masters degree in mechanical engineering from a well-known private university in the us i tooted that horn as much as possible to help get my foot in the door i also had a substantial amount of programming experience with vba and matlab which helped me breeze through learning python and r ) i had to convert all of my resume content to contain common buzz words seen in a job listing (ie: python kpi's machine learning etc ) these were all things i had experience in they just had different terminology in the engineering world also writing down your achievements in this kind of format helps big time "improved x by y through z" congrats i was pretty much in your shoes few months ago and now i am here as well 2 years ago i finished with a useless physics degree :( randomly started coding and then got a m s in software where i stumbled upon ml at the very end of my studies usually to get an m s in software some schools require some professional experience did your school require that? also some people say getting further advanced degrees right after your bachelors may hinder you did you feel that at all? 
most schools actually don't require professional experience to do an m s maybe it is different outside of the usa in my case i had no choice but to get a m s because my bachelors was in physics which was useless for me without my m s i would have a hard time getting employed though i taught myself machine learning and data science no im over in minnesota my state university offers software as masters but they require some industry experience so it's more geared to working professionals rather than recently graduated folks like me i've been thinking of pursuing a biostats or bioinformatics masters to build my credentials in data science stats and programming right now i'm using my biochem degree and work at a mid size company doing regulatory affairs work but i would love to do what you're doing that's a great and very inspiring story i'm currently moving to the field as well and i can imagine the hardship and the amount of work you had to bear (i've been struggling with a data science course of johns hopkins on coursera as well) you're getting your rewards and deserve it would you mind commenting on datacamp? i also want to join in that but quite reluctant because of the fee sure see my comments here! 
https: www reddit com r datascience comments 71lsm1 after_7_months_ive_finally_made_a_career_change dncnveo if you're already in the john hopkins course on coursera i would probably stick with that from my experience with coursera they are very demanding due to the time constraints the datacamp site is great because you work at your own pace also in my experience the estimated time requirements they post for datacamp are much longer than they actually took me that being said i would caution you to join now if you're already consumed by the coursera courses it's completely up to you how many courses you want to take at once but you might get more out of it doing one thing at a time especially if you're paying for them keep an eye out for when they offer discounts for the year membership at datacamp that's what i did congrats and thanks for your story were you working as a mechanical engineer while you were studying data science? i'm currently looking to do something similar but work is pretty intense right and with a lot of travel so i try and squeeze in some courses on flights and any other time i can yes i was fortunately things were settled down enough at work where i wasn't working crazy hours so i had enough time at home to go through the courses thanks for posting this! very helpful for people like me who are also looking to make the transition i worked in finance and investment before the city i live in just isn't a good market for investment unfortunately my family is here and my home is here and i prefer to stay rather than move to new york or san fran hoping to get into data science as a lot of the stuff i did in the finance investment realm was similar; lot of data analysis and presentation i've been taking courses on udemy and edx primarily sounds like i need to start networking more like you did as well great to hear as someone who is in the same position what changes did you make to your resume that you think helped getting interviews? 
do you mind looking at my resume? i have a master's in the field but i have only gotten 5 interviews out of over 100 applications i don't think a master's entitles you to interviews you have to show extracurricular interest congrats! i'm in the same boat working towards my first job in ml too curious what types of jobs ended up being the ones with the offers? were they different from others you were applying to? thanks! i toned down my search for more analyst type positions with exposure to data science i found that most companies aren't going to shell out six figure salaries to someone who completed an online course i noticed that most of the responses that i got were from more market seo type of companies rather than big tech companies but at the end of the day i just applied to everything i could some cool easy to edit resume templates here! https: www sharelatex com templates cv-or-resume hey man share with us your resume so we can see this miraculous before and after (remove the personal information if you want) i'm really happy for you! i wish you success! oh my i would love to see your resume i currently have no answers in my job search and i'm not sure it's because of my resume or not (i find it pretty clear and nice to look at but ultimately i'm not a hiring manager ) if you could share your resume that would be really solid bruh congratulations! mind sharing more insights about your resume? thanks in advance :) hi y'all! 
i play with a group of volleyball players every weekend typically we'll get a few dozen players with various skill levels and position preferences to play together each player has a name skill rating (integer 0-120) and two position preferences (enumerated: outside hitter "oh" setter "s" middle hitter "m" opposite hitter "opp") i'd like to be able to create an automated process that creates "ideal" teams from a player pool where each element (player) has those four defined attributes for my mvp i figure we can define a volleyball team as having the following: 6 players 2 oh (outside hitters) 2 m (middle hitters) 1 opp (opposite hitter) 1 s (setter) further the primary goal is to create balanced teams for the sake of this model i'm trying to minimize the spread of average skill ratings among however many teams are created here is a rudimentary model of my problem in excel circumstances are ideal: there are exactly 24 players with exactly enough players of each position to fill four teams skills levels were randomly assigned i used the evolutionary solving method to find minimize skill spread as i didn't satisfy linearity requirements with some of my formulas i'm just looking for general advice and direction should i use a programming language to model this? if so which? maybe an off-the-shelf program? 
edit: to be clear i have very little data science background i'm getting more into the field and see this as a learning opportunity that coincidentally could be a huge boon to my volleyball league so this is a somewhat restated knapsack problem where you're trying to fill n bags as equally as possible what u patrickswayzenu suggested is random search depending on how code intense you want to get and how important it is to get the true minimum you could also use uniform cost search (dijkstra's algorithm) this algorithm will get you the optimal cost solution given enough time and would work for your problem of minimizing cost (e g max skill - min skill team) the way you'd want to use it is to initially assign every player to a new team setup (e g n teams with 1 team having 1 player and the other 5 teams empty) and then let the search do its thing (it'll search the minimum cost scenario it has reached so far before exploring others) was going to reply that ! iirc part2 of the mit python course on edx has a lecture on that awesome! i'll look into all of that sounds promising thanks for the response to be clear you want to minimize the variance of skill rating constrained by having 2 oh 2 m 1 opp and 1 s yes? how large is the pool from which you can draw players? correct -- the ideal is to balance the average skill levels the teams it's fine for a low-skill and high-skill player to be on the same team the pool can realistically be anywhere from 10-54 players i figure for the sake of this discussion we can focus on pool sizes of multiples of 6 in the real world we often field teams of 5 if there aren't enough players if the pool is small enough you could probably just brute force it i know that isn't a "cool" method but it would be easy to implement what do you mean by "brute force" in this case? 
you can forget this since you've clarified you want balance between teams rather than within teams this is a bad suggestion i immediately thought brute force - what am i missing that makes it a bad idea? oh maybe it isn't such a bad idea i feel like 54 players is pretty big but since players have a preferred position i guess that makes the space to search a little smaller u trabaledo is the data in the picture you linked to an example? i've got an afternoon to kill and this might be fun yeah the data in the screenshot is an idealized example of the data i would expect to get in the real-world scenario i just used mockaroo to generate 24 random names skill levels (int 0-120) and then manually assigned positional preferences real data would be messier as far as positional preferences go if you'd like i can generate a "messy" set of data to test with correct -- the ideal is to balance the average skill levels the teams it's fine for a low-skill and high-skill player to be on the same team then the teams would have high variance so do you want all teams to have a similar average skill or do you want each team to have a small variance of skill between players? do you see what i am asking here? one refers to the variance of skill between teams one refers to the variance of skill within teams yep i guess i wasn't clear after all sorry about that i'm looking for teams to have a similar average skill everything looks like a "nail" to me so i'd brute force a solution by iterating over all options that meet the rule set unless the run time was totally unreasonable the run time will almost certainly be longer than the time to code it up what if i told you i'm a masochist eager to learn? 
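For anyone wondering what "brute force" means here, below is a minimal sketch of it, not a full solution. It assumes only two teams, a made-up 12-player pool, and one fixed position per player (the real problem has two preferences per player and more teams). The names `POOL`, `SLOTS` and `brute_force_two_teams` are hypothetical, and every skill value is invented for illustration. It tries every legal way to fill team A and keeps the split with the smallest gap in average skill.

```python
from itertools import combinations, product
from statistics import mean

# Hypothetical pool: (name, skill 0-120, position). All values are made up.
POOL = [
    ("p1", 90, "OH"), ("p2", 60, "OH"), ("p3", 75, "OH"), ("p4", 45, "OH"),
    ("p5", 80, "M"), ("p6", 50, "M"), ("p7", 70, "M"), ("p8", 40, "M"),
    ("p9", 100, "OPP"), ("p10", 30, "OPP"),
    ("p11", 85, "S"), ("p12", 55, "S"),
]

# Team composition from the original post: 2 OH, 2 M, 1 OPP, 1 S.
SLOTS = {"OH": 2, "M": 2, "OPP": 1, "S": 1}


def brute_force_two_teams(pool):
    """Enumerate every valid team A; team B is whoever is left over."""
    by_pos = {pos: [p for p in pool if p[2] == pos] for pos in SLOTS}
    # For each position, list every way to pick that position's players for team A.
    choices = [list(combinations(by_pos[pos], k)) for pos, k in SLOTS.items()]
    best = None
    for picks in product(*choices):
        team_a = [p for group in picks for p in group]
        team_b = [p for p in pool if p not in team_a]
        gap = abs(mean(p[1] for p in team_a) - mean(p[1] for p in team_b))
        if best is None or gap < best[0]:
            best = (gap, team_a, team_b)
    return best


if __name__ == "__main__":
    gap, team_a, team_b = brute_force_two_teams(POOL)
    print("gap in average skill:", gap)
    print("team A:", [p[0] for p in team_a])
    print("team B:", [p[0] for p in team_b])
```

With two teams this is only a few hundred combinations. The count grows quickly with more teams and with flexible position preferences, which is where the run-time worry in this thread comes from.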
i don't have much of a data science background this falls into optimization operations research i'm not as familiar with those topics as i am with stats i am going to spend an evening researching this because i find it interesting i think it's super interesting too i have an industrial engineering background and spent a reasonable amount of time studying linear programming as a result but that's about it in pseudo code and as a super naive approach: assign an integer to each player draw randomly from the integer list and assign to position 1 on team 1 remove the selected member from the available list randomly select from the member list for position 2 for team 1 continue above until team 1 is filled and then continue on until each team is filled calculate your metric and save the member team positions if it's the best repeat for some high number of iterations - e g 100k runs for such low numbers meh just bruteforce it the search space is just over 3 million filter it down to unique ones define an optimization function and find out which combo minimizes the function done huh that's a neat way to approach this i'll screw around in python and see if i can get something super janky working thanks! 
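The naive random-search pseudo code above could look roughly like this in Python. It is a sketch under some assumptions: the pool has exactly enough players per position, each player has one fixed position (instead of two preferences), and the metric is max team average minus min team average. The function names and the generated pool are hypothetical, not anyone's actual data.

```python
import random
from statistics import mean

# Team composition from the original post: 2 OH, 2 M, 1 OPP, 1 S.
SLOTS = {"OH": 2, "M": 2, "OPP": 1, "S": 1}


def make_pool(n_teams, rng):
    """Build a made-up pool with exactly enough players of each position."""
    pool = []
    for pos, k in SLOTS.items():
        for i in range(n_teams * k):
            pool.append((f"{pos}{i}", rng.randint(0, 120), pos))
    return pool


def random_assignment(pool, n_teams, rng):
    """Shuffle each position group and deal its players out across the teams."""
    teams = [[] for _ in range(n_teams)]
    for pos in SLOTS:
        group = [p for p in pool if p[2] == pos]
        rng.shuffle(group)
        for i, player in enumerate(group):
            teams[i % n_teams].append(player)
    return teams


def spread(teams):
    """Metric to minimize: highest team average minus lowest team average."""
    averages = [mean(p[1] for p in team) for team in teams]
    return max(averages) - min(averages)


def random_search(pool, n_teams, iterations=100_000, seed=0):
    """Try many random assignments and keep the best one seen."""
    rng = random.Random(seed)
    best_teams, best_spread = None, float("inf")
    for _ in range(iterations):
        teams = random_assignment(pool, n_teams, rng)
        s = spread(teams)
        if s < best_spread:
            best_teams, best_spread = teams, s
    return best_teams, best_spread


if __name__ == "__main__":
    pool = make_pool(4, random.Random(42))
    teams, s = random_search(pool, 4, iterations=10_000)
    print("best spread found:", s)
```

Random search gives no guarantee of finding the true minimum, but for a weekend league a spread of a point or two is probably good enough.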
as others suggest you can probably do an exhaustive search given the relatively small size of the example more generally though you could do a greedy search variant: 1 choose team at random 2 choose best available player from the pool who fills one of the vacant positions on the chosen team 3 repeat code in r: https: github com raymondben volleyselect or to get a little more variation in the set of results replace step 2 with "choose randomly from the best 5 (or some other number) available players that could fill a vacant slot on the team" run this a bunch of times and then choose the best result hello r datascience a couple of months ago i asked dataanalyst for orientation on how to become a data scientist since then i have: learned r done a data science specialization on coursera done [sqlzoo](https:sqlzoo net) and codeacademy's tutorials created a portfolio (so far 1 nlp and data cleaning project) the thing is that sql seems to heavily be required by a lot of job opportunities so i would like to show my sql skills by tackling a "real life" problem sql project on my portfolio but to be able to do that i need your help with a few questions: 1 why sql is so important for data science? 2 what sort of tool should i use? don't get me wrong i loved sqlzoo's tutorial but they only teach how to use queries apart from the sqldf package on r i don't have any tools to work with sql which is the most common used? 3 finally what kind of projects would you recommend? i want to pick something that shows that i may not be a specialist in sql but i know what is important or necessary for instance would you recommend a data cleaning project with sql? is usual in the market? tldr: i want to add a sql project in my data science portfolio i have done sql tutorials what tools would you recommend to use and which kind of project should i do? 
sql is the bread and butter of your work your company data exists in databases you want to be able to get as far as possible with the data processing in sql (make use of that infrastructure your data engineering team provides you with) and do the last steps in whatever tools you use you don't want to rely on someone to get you the data you need i use the aws console to run my queries fast to see they're getting the info i need before i port and re-work the code in python to make it take parameters and make it work across projects and dates just work in sql make use of functions provided in the sql dialect you use i'd say 60-70% of my work is sitting in sql to pull and fix-up data to the point where i can run the last stretch in python r show you're able to set up tables partition those tables index etc etc source: gaming industry top tier company 2yr xp thanks! how do you run queries in the aws console? are you guys using athena? yeah that or internal pipelines but as they're used for bigger jobs i prefer building my queries against athena as it's quite fast and once i'm happy port them over in my notebooks if you're going for the startup world learn postgres between postgres and t-sql (transact-sql ms sql) you should be set most of the important stuff is exactly the same there are some esoteric functions that are somewhat different focus on window functions and summarizing data you can get quite far with pure sql also if you are learning postgres set aside some time to learn redshift-specific functions same with azure dw-specific functions for t-sql many companies now have cloud-hosted data warehouses and these two are the biggest ones aside from bigquery bigquery however uses a sql-like syntax which is more work excellent work i think others have better addressed some of your questions so i will limit this to other recommendations that would be useful learn spark especially if you like r learn hadoop learn python improve your functional programming abilities thank you very 
much! can you clarify the last point? sure thing it's easy to understand programming's value to data science however most people think that just means object-oriented programming like python however functional programming has a lot of ways it can make your life easier here is a decent article explaining its value tl;dr: functional programming can be faster for big data things like spark are built on a functional programming foundation (scala) fp has a lot of capability and it would be useful to know it if you'd like to see how things are different between oop and functional check out this article one point i'm not seeing made here is that sql can be used for a lot more than just extraction or i should say relational databases can there are a lot of advanced functions in most of the established relational database systems and they can be used for a lot of pre-processing of the data before it leaves the database it's a big advantage to do some of your heavy lifting inside an optimized relational environment rather than outside sounds like you're getting a lot of good suggestions but i'll throw in my bit: the short answer relational databases are extremely popular so you'll have to use them but they truly are fantastic ways to store and use data there's a ton of them personally i usually use the database's command line for quick queries and run queries through python for more complicated stuff where i want everything to be reproducible the reality is that for data science sql-like work is usually one small part of a large project but rarely a project unto itself i recommend something like that there are many standard databases (like government census data) that you can download and then there is a one-touch script to install all of the tables in a database running locally do that and then make a project where you pull the data out of that database maybe i'll try your project suggestion! ty why sql is so important for data science? 
before the nosql craze all companies stored their data in a relational database (relational database implies sql) mysql postgresql oracle microsoft sql all use sql to store data and to query (ask questions of and retrieve) data from the database so that's why it's important to learn it also i believe most companies should be using a relational database the nosql is overblown what sort of tool should i use? in my limited experience not a dba was a fullstack dev: an editor to save your sql script i guess learn a database and use its console to run sql (postgresql and mysql have console) finally what kind of projects would you recommend? if you can get data from somewhere i usually scrape data but an example is scraping from imdb com and getting the raw data from there i design a database schema and start storing it into the database after you get the data into your database you can start performing queries what's the highest grossing movie for tom hanks? who does steven spielberg work the most with? queries like that your numbers 1 & 2 have already been addressed so i'd like to offer some tips on #3 don't think of sql as some project to add to a portfolio or a line item on your resume think of it as the language you'll need to know to get data things i would focus on: understanding set theory built in functions (don't try to reinvent wheels that are already available and fast) subqueries the difference between union and union all (blows me away how a lot of seasoned pros mess that up) indexing and index hints bonus: making use of table partitions if i were interviewing someone and they could demonstrate those basic ideas i'd move on to other topics in the interview i'd feel ok letting that person query a production database thanks great hint! 
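A small sketch of two ideas from the comments above (the movie-style queries and the union vs union all difference), using Python's built-in `sqlite3` so nothing needs to be installed. The table names, column names and all rows are placeholders invented for this example; a real imdb-style schema would look different.

```python
import sqlite3

# In-memory database with hypothetical tables and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, gross INTEGER);
CREATE TABLE cast_members (movie_id INTEGER, actor TEXT);
INSERT INTO movies VALUES (1, 'Movie A', 100), (2, 'Movie B', 250), (3, 'Movie C', 80);
INSERT INTO cast_members VALUES
    (1, 'Actor X'), (2, 'Actor X'), (2, 'Actor Y'), (3, 'Actor Y');
""")

# "What's the highest grossing movie for actor X?" as a join plus ORDER BY.
top = conn.execute(
    """
    SELECT m.title, m.gross
    FROM movies AS m
    JOIN cast_members AS c ON c.movie_id = m.id
    WHERE c.actor = ?
    ORDER BY m.gross DESC
    LIMIT 1
    """,
    ("Actor X",),
).fetchone()

# UNION removes duplicate rows from the combined result ...
union_rows = conn.execute(
    """
    SELECT actor FROM cast_members WHERE movie_id = 1
    UNION
    SELECT actor FROM cast_members WHERE movie_id = 2
    """
).fetchall()

# ... while UNION ALL keeps every row, duplicates included.
union_all_rows = conn.execute(
    """
    SELECT actor FROM cast_members WHERE movie_id = 1
    UNION ALL
    SELECT actor FROM cast_members WHERE movie_id = 2
    """
).fetchall()

print(top)
print(len(union_rows), len(union_all_rows))
```

Actor X appears in both movies, so the two row counts differ. UNION ALL also skips the de-duplication step, which is why it tends to be the cheaper choice when duplicates are impossible or don't matter.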
i found it really hard to learn sql without an enterprise database to work with it is really easy to learn if you have the infrastructure to practice on even once i knew all the basics of transact sql i still struggled to set up and maintain my own databases this post is about the importance of understanding the data engineering workflow as a data scientist: https: medium com @rchang a-beginners-guide-to-data-engineering-part-i-4227c5c457d7?source=linkshare-5e0684b70d38-1515468344 think it does a really good job at explaining why sql is so foundational for a data scientist i'll try and keep this short ish sql is your tool for getting data from relational databases these are very popular so you will come across them professionally hence the need for sql just get microsoft's sql server there's a developer edition and some free resources on how to use it provided by microsoft (you probably want to get the management studio as well which is also free ) https: www microsoft com en-gb sql-server sql-server-downloads projects are always the hard bit like yourself my sql experience is limited i have mostly used it as an extraction tool but it is certainly not limited to that i would grab a free data source (https: infogram com blog free-data-sources ) as a csv load it into sql server process it in some way and export the results for visualisation using those new r skills you can then build in joins and what have you once the basics are done got longer than i wanted hopefully it's useful edit: the numbering is correct when i edit the post but is displaying 1 instead of 3 for projects when i view it no idea why i am very curious why you would recommend microsoft's sql server? 
two things really: it's used a lot professionally sql server is fairly accessible via the management studio especially for a new learner it all depends on the local job market but where i live pretty much every company is on either mssql or oracle of those mssql is by far the better option i live in london what do people use around here? (not the guy you were talking to but ) sql server is the most common throughout the uk as far as i have seen however many big financial service institutions use oracle mysql is also a fine choice whichever is more convenient for you to learn is probably the best one to go with i will say that of the 3 rdbmss that i've installed at home (oracle mysql postgres) oracle was by far the fiddliest i like postgresql but to be honest i haven't seen it listed in job postings anywhere near as often as the others hey thank you for your answers very complete and helpful! from what i'm getting from the answers it seems like sql is more of an extraction tool only do you think using sql queries inside rnotebook (using the sql package) would be enough for recruiters? 
exactly the opposite; sql is for storing and processing relational data as many others have pointed out most businesses live and die by their relational databases and sql is the language needed to access and manipulate this data since the data is already structured and usually optimized to some degree it makes the most sense to manipulate the data within sql whenever possible there are also sql engines for non-relational data (like apache drill and snowflake) which turn sql into an almost universal language from personal experience when i was interviewing for my current data scientist position the number one language that companies wanted was sql well above python r or sas yeah when you have to parse 100k+ rows of data with 20+ columns sql is a lifesaver hello i am a computer science major and stats minor student currently i am looking for an entry level data scientist data analyst full time job around east coast usa i know how to handle r python and sql which are known as the three most required tools and here my question arises: how proficient do companies expect a just-out-of-college student to be? 
i was told that once you get hired the company is going to train you at least 3 months because what you've learned from college is not cutting edge or ready to be used stuffs in that sense i kinda thought "then perhaps companies would take something else than technical skills into account when they are hiring for entry level job " i have learned r from my university and currently using datacamp to improve my r skill will get into python and sql which will make me learn pandas as well i cannot stop thinking that my skill is not there yet and that is what makes me nervous i am a senior and i only have one semester left before i get into the real world and this is another question is it a game changer to speak more than two languages (i speak three including korean and mandarine) [deleted] that's one way of becoming a data scientist but data science is a large interdisciplinary field with a range of specializations no one will know everything people will likely have a niche that they are strong in and they complement an overall team or fill various specific roles within data science for some employers accomplishments with projects would speak more clearly don't be afraid to get your hands dirty doing will help you gain the intuition on what you need to learn more deeply you don't need to have a phd in math before you could start data science not me but here is an account of a guy's journey to data science by doing *registers for the critical thinking class that specializes in scientific thinking thanks! 
[deleted] it's the masters course that follows after sarcasm and internet trolling 101 you'll find course codes for these in any decent university's brochure in my experience looking for jobs for an entry level ds position you need the skill of someone doing data science for 1-5 years as someone who just started an interview data science course i'm so glad to hear this this is as accurate as it can get this process is infinitely easier with an internship first i've been looking for an entry level data analyst position since i've graduated with my masters and i've been having trouble getting called back for both internships and full-time jobs any tips? do companies check kaggle profile? some do i've been asked for kaggle and github before i can't answer these questions directly but i can come at it from another angle i'm speaking as a data engineer with an undergrad in economics as someone who has taken almost five years to teach themselves programming dba and etl work that time would have been better spent getting a masters of all the prereqs for data science a masters in a technical field is at the top this is because it is the clearest of all the signals of aptitude after completing a masters it's understood that you have lots of experience applying the scientific method to solve problems as someone without a masters it's incredibly hard to prove that i can speak stats and science it's also going to be hard to advance your career in data science data analytics data engineering without a masters the alternative which i've done to eventually find the backdoor into the industry is to simply do projects lots of them and make them public as well in hindsight making my projects more public by building an online platform presence is something i should have done more of finally in terms of popularity python and r are tied at around 50 50 usage so knowing one is sufficient however they come in at 3rd for most used 'tools' of the data scientist first being sql and 
second being unfortunately excel so brush up on your sql if you aren't already functionally knowing basic sql commands and using variables and stored procedures is sufficient to pipe your data in python r aside from this it sounds like you've got some competitive skills don't be disheartened if it takes some time to break into the industry there is always a luck factor edit: because there have been comments as to the ranking of tool usage i've provided sources it can be seen that sql has remained the number one most commonly used tool with excel mostly in second - including the second most commonly used tool in 2016 - and never outside of fifth over the past five years o'reilly 2016 o'reilly 2014 matthew renze - data science: the big picture o'reilly 2012 13 i dunno about excel although i firmly believe csv is the future i rarely ever use excel sql a ton and python and r and maybe excel like 1% of the time updated comment to include sources your route sounds similar to mine; i have been doing neuroscience for years and as a hobby learned and developed all kinds of 'fun' websites analytical programs and the like using sql regex php python etc so strong and formal background in 'science' with informal self taught 'data' i beat out a hundred formally trained data scientists for the position this isn't correct as an entry level data scientist i spend about 50% of my time in hive (sql) 25% in spark (scala) pyspark and the rest doing other odds and ends in python or similar 5% of my time is in excel i've edited the original comment to include sources as much as i don't really care for excel either as of last year it is tied for 1st with sql as the most commonly used tool among data scientists it seems you are confusing the individual usage rate of the tool with the overall usage rate of the tool? so you believe a master's in management information systems qualifies? 
research based masters are better than applied i don't know about your program specifically but if you can demonstrate your ability to think scientifically then you're well off and having a masters is better than not having one in the majority of cases one of the hardest problems of my job involved very little programming and not much machine learning we had to ingest a major amount of pdf pptx data and extract specific things from those documents the data came in many formats sometimes included the data and sometimes didn't it would have been impossible to do it by hand given the amount of time we had to do it we spent hours studying patterns finding documents of similar format (there was no proper naming convention) these are the kind of things that i'd expect from an entry level position you won't be a good data scientist on your first job - unless you are there a while most people aren't but good news is that most people who already have data scientist jobs aren't that amazing either just for some context: i started my job a few months ago after completing my phd and was expected to hit the ground running as far as the data science was concerned i work in fraud which i didn't know much about so they spent time walking us through those sorts of things however it was very much the expectation that when they hire a data scientist they're hiring to bring data science skills into the company i don't think i could have done that without a phd (or several years' experience working in analytics data science) and i almost certainly wouldn't have gotten the job without it i would aim for an analyst position (which is probably what you mean by "entry-level" data scientist because there isn't really such a thing) with great access to mentors and training the 3-month training thing might be true for entry-level software developers at large companies but it's not true in data science i work in fraud which i didn't know much about i see what you did there this question is a little 
off topic but i see that you have a phd in neuroscience and now do data science i recently graduated with my phd in vision science (retinal development neurophysiology focus) and realize i don't want to stay in academia and i think i'd really love data science how did you get into the field as a fellow bio-related phd? i did a phd in computational systems neuroscience - ephys and modelling of v1 neurons loads of computational modelling programming and data analysis which helps i also did some machine learning work for my phd and spent a lot of time doing courses etc i got an internship in the final year of my phd for a company that did work on anti-money laundering technology and i applied (and got) a job in fraud analytics with a medium-sized bank after my phd i think finance banking is an okay way to get into data science they seem to be recruiting a lot at the moment (like everyone else i suppose) and they're pretty behind on tech adoption my employer is only now looking at adopting tools like git version control code reviews agile ways of working etc this has substantial downsides of course but pretty par for the course in financial services do 3 kaggle competitions why 3? 
1 seems too simple 2 is a little simple but should gain a good amount of experience 3 should be a good amount to learn new things and techniques and should show some dedication to employers then i'm not sure how many current kaggle competitions are image deep learning related so i figured 3 would be a good start and good experience to have to put on his resume not sure why i'm getting downvoted lol how well you did how many kaggle competitions you participated in to be honest as soon as an applicant tells me they've done stuff on kaggle but don't share or remember how well they did i lose interest immediately i recently graduated with a degree in computer science and i'm about to start my masters in advanced computing with data mining and machine learning these are topics i really enjoyed when i was in my final year at university and i would love to start a career in data science my questions are how did you get started? what self projects would you recommend to build skills? general career advice: to be great at anything wake up hours earlier than you think you need to get some studying in advice that i've figured out on my own: avoid reading data science blogs on a daily basis especially those that talk about tools read more papers on machine learning statistical analysis data visualization storytelling presentation and the domain you're interested in or working at yes times 100 to the avoidance of blogs wasted too much time reading them and the same basic high-level stuff over and over without ever understanding how it worked they can be good for inspiration for side projects though look for more project and or analysis-based blog posts there are some really good blogs tho i totally agree i personally just struggled with only reading blogs and not getting a deeper understanding of any topics or putting what i read into practice since all my time was spent on the blogs just a mistake on my part mind telling me about some? 
http: ruder io #open to be great at anything wake up hours earlier than you think you need to get some studying in addendum to that: get enough sleep i'd rather sleep 8 hours and be productive for the whole day than sleep 5 hours and have to spend extra time thinking about simple concepts for blogs i'd give an addendum: it's good to read blogs that talk about the philosophy behind data science because they'll turn you into a better analyst i read andrew gelman's blog daily because he's great at pointing out flaws in analysis and what people should have done to fix them the only tool blogs i follow are rstudio and r-bloggers mostly because hadley wickham packages make my life much easier and because sometimes i'll find an r package that significantly simplifies something i do a lot i collect articles throughout the week to read on the weekend via the app called pocket i think that is a more healthy approach r-bloggers is a fantastic website about one specific tool that everyone who uses r should always stay up to date with my issues lye more with websites like data science central kd nuggets and some others due to making you feel like you have to learn every tool related to data science it's incredibly paralyzing and distracting yeah and data science central is complete shit vincent granville invents meaningless techniques like "jackknife regression" but doesn't get them peer reviewed clearly because reviewers are just biased he once said that statistics doesn't teach people mle or time series and he has fake female bloggers and acted like a total douchebag when he was called on it he also bribed people to give good reviews to his book on amazon everything that comes from granville or data science central is the equivalent of a snake oil salesman telling you "what the doctors don't want you to know" i'd support a blanket-wide ban on them across every data subreddit there are some fantastic blogs out there but data science central is not one of them it helps to have 
a good bullshitometer when trying to find good blogs preach it took me some time to smell the bs from the site hopefully other newbs do quickly glad i'm not the only one to feel that way i even went as far as reading his book when i was trying to get started in ds two hours of my life i'll never get back i appreciate reading this advice! as someone who is actively looking for work post-undergrad and putting hours into studying from online resources everyday it's nice to get some sort of validation that this is a reasonable strategy don't do this unless you still get 8 enough hours of sleep the "you need eight hours of sleep" rhetoric is a myth that being said it's probably a good idea to not wake up after less than seven hours of sleep everyday some days (sundays tuesdays thursdays) i'll go to sleep at 9:30pm to wake up at 4am it's also a great way to get your exercises done without sacrificing personal time i just did the 930pm-4am today though not voluntarily stupid insomnia lol i wonder if a phone has something to do with that 😉 this is how i got in the field without previous formal training and exactly how i keep improving my knowledge anyways try to find the sweet spot between theory and practice for example if you can't wrangle data efficiently all that theory is going to be useless how many hours do you spend studying per day? my hours are so long i feel like i need to go into superhero mode to learn anything off the job i quit my job i started my studies months ago when i was working woke up everyday at 4am and stopped at 6:15am distractions slowed me down now i begin somewhere between till 6am-9am it really depends on the person and how badly you want something i'm one of the lucky ones who has a wife partner who supports me at this moment i'm working on a shiny dashboard i started at 5pm and it's 9:50pm as of typing this started doing ggplot2 stuff at 9am i love the advice to get up earlier to get studying in! 
do you suggest any site or journal in particular to find those papers? rpubs com is great the machine learning subreddit is great for data visualization and storytelling i'd say first purchase a book or a video course on how to visualize data with your preferred tool of choice (if you use r get 'ggplot2 elegant graphics for data analysis second edition') then read one of tufte's books and 'storytelling with data' by cole nussbaumer knaflic actually read 'storytelling with data' first for presentation read 'the new articulate executive' by granville toogood on top of looking up youtube videos on presentation thanks! 'storytelling with data' is already on my shelf among a ton of others i need to start going through if you're not used to reading a research paper (like i am) they can be a bit daunting or downright intimidating i'm guessing only practice with it can improve that same here rpubs has tons of amateur friendly stats and ml papers i like to google bing yahoo askjeeves for case studies on statistical analysis business analytics data science and machine learning the best i've come across for data science was the book 'data science for business' i've yet to finish it but the first two chapters alone opened my eyes i just started reading that too! 
glad i got some good books looking at rpubs and it does have a lot of interesting stuff there find a subject matter expert in the problem area you're working on and work with them until you know the business discipline inside out don't try and figure things out yourself it will save you time and help you understand results better for example i used to build trading algos i'd sit with the trader and he'd say you need to be able to understand this to the point you can sit in my seat if i'm ever out of the office good times [deleted] i got a ds job at a large tech company straight out of undergrad i had internship experience with coding and data analysis and was a research assistant while in college it's definitely possible get up to speed on algorithms the more algorithms you have under your belt for analysis regression estimation machine learning and so on the better-equipped you'll be for example i recently put together an edge-detection algorithm by stitching together bits of linear transform theory statistics and computer vision theoretically separate fields often share compatible mathematics finding and leveraging those connections gets big value i really recommend having some side projects with presentations made even if you haven't presented them it gets teams excited when you can show them things you've worked on in the past since usually proprietary stuff is off limits it also demonstrates some degree of storytelling and possibly domain expertise edit: i just reread the question as far as what projects to do it really is open if you know what domain you want to be involved in then go for something related but honestly just having something that is relatable is almost as valuable there is great free data all over the place now here is where i got my first side project data from https: snap stanford edu data (i used the amazon review data to predict whether a review was going to be highly rated from the text using the number of stars as the "truthed" data set) 
decent data sometimes shows up on kaggle if you have no idea where to get started politics is my main area of interest for my dissertation i produced a data visualisation to visualise the change in sentiment of politicians over time by performing sentiment analysis on their tweets you are on the right track to be a really good data scientist i would say you need to build: statistical mindset statistics knowledge business sense you can learn about statistics read books develop a statistical mindset start looking at the world and think about how chance plays a role and where it will really help you in your profession business sense here just take a problem you are passionate about and try to tackle it it can be anything ask yourself a question and try to answer it with any data you can get your hands on hope it helps unlike engineering great data science projects are more like open-ended puzzles you may have a general goal in mind (e.g. improve x solve y) but you get there through exploration curiosity and effort so you just start with whatever curiosity or puzzle intrigues you at the moment then you start asking questions looking for patterns and things out of place think outside the box about other approaches challenge assumptions follow leads without giving in to frustration and then bring it all together if this is a personal project then i would just first ask you what interests you personally? politics video games reddit civil war history space travel beer music etc? in terms of my own interests i would mainly focus on politics the only issue is finding the specific data r/datasets can be a good resource to start out there is even a post on election results by county that could be useful https://www.reddit.com/r/datasets/comments/61sl5h/2012_2016_primaries_and_general_elections_by/?ref=share&ref_source=link that looks pretty handy the only issue is that i don't live in the us so i don't know a lot about their politics apologies for my ethnocentrism!
then it's likely a good time to learn some web scraping (there are plenty of good beautifulsoup tutorials on youtube and blogs) to target your country of interest finding data can be one of the most important parts of data science off the top of my head you could try to find polling data election histories politician voting records twitter feeds of politicians demographic information about voters video audio of speeches etc for inspiration check out some of what fivethirtyeight blog does with trying to uncover answers to complex political questions in the us this is certainly the main problem i have found implementing an algorithm is the easy part my issue is that i think too far ahead and become too demanding of the data i want i want a new unique challenge not just a kaggle challenge which lots of people are working on to get the same results what tips can you offer for not giving into frustration? working with large data sets or learning a new language lately has been leading me directly into those vicious mental blocks i am a month out from completing my msc so i'm sure some of this comes from burn-out my biggest "frustration" will usually be when i will have explored some data set or method deeply only to discover hours in that something is wrong with it maybe it has duplicate entries maybe there was an assumption about two disparate sets having a 1:1 relationship (i e could be joined) or maybe something is simply missing part of dealing with that frustration was to start writing more reusable code and even occasionally stop and clean optimize my code while doing exploratory analysis that way each pass through the code improves it and even if the analysis ends up being pointless you have produced some value from it also the act of stopping to focus on something else (code optimization) can help you step back from the immediate frustration thank you for the response i will have to keep this in the back of my mind as i begin my career for self-projects start 
your own open-source package that solves an actual problem by building on top of the tools you already use or become a contributor on those tools go to kaggle and pick a challenge that you're interested in and try to rank high in it definitely pick one deep learning platform maybe keras or pytorch (or both) and become proficient in it make sure you're a jupyter power user and make yourself very comfortable reading other people's python code blogs are mostly useful for example code so you can get started with new things rapidly for reading it's better first to avoid reading altogether just start doing things (your own open source project kaggle etc) and read just enough to not get stuck later when you can be more critical about what you read you could do it more most importantly always have fun when it stops feeling like fun it's a sign that you're doing something wrong if you can find a couple of senior guys who will patiently answer your questions over the coming years that will go a really long way just to add best of luck to your journey may it be auspicious!
:) there aren't shortcuts to being great at something don't get hung up on learning the "right" things just keep working personal opinion-- if you read theoretical papers that's good for keeping up to date on what's pushing the boundaries but might be less useful for keeping up to date with what might help you in some domain-specific applied problem if you also read some applied ml papers in other fields it might give a better sense on what kinds of tools and methods are mature enough that scientists in other fields can use them some of these tools are definitely mature enough for industry use as well for example: tensorflow spacy gensim keras seaborn one other thing to keep in mind is that scientific storytelling and business storytelling are very very different because what's intuitive in science is not intuitive to a businessperson being a data scientist ain't a tough job unless you are strong in mathematical models statistics and some other tools like r python scala so yes you can always refer to many tutorials out there just to keep yourself updated but end of the day a certification or proper course taken through an instructor or live training is a requisite one such blog i read long back is how data science is changing the world so also as many suggested blogs ain't helpful ones but i disagree it gives you a heads-up for sure cheers you are on the right path as you have taken the decision already thanks for your encouragement and advice i'm a senior sysadmin doing a lateral transition to data science and while i'm not done yet my advice is the same as i give other sysadmins and devs: learn to live on and love the terminal pick a real ide like emacs or vim and force yourself to learn it and use it constantly use terminal based apps and start learning text manipulation progs like sed awk grep and at the higher level perl python start writing more bash scripts if you are using windows just stop then focus on your knowledge of critical thinking processes and the
scientific method and you have a good base to explore data intensive paths in data science sysadmin and dev i've tried using vim a lot for my projects but i just don't have the patience to learn more of it just too much of a learning curve when i could use something like atom or sublime easy route i know i ordered my mac at my job 2 months ago still on windows 7 oneday soon t_t pick a real ide like emacs or vim i love these as much as the next guy but they're not ides they're text editors yeah you can add plugins and all but they aren't near the same thing as something like pycharm (nor should they be) i also disagree that windows is that much of a hindrance now that it has a decent terminal in the linux subsystem (in my experience) it hasn't caused me issues it's far from perfect but when i prefer basically everything else on windows it's a tradeoff you are the best kind of correct you are technically correct that they are text editors with extensibility built into their nature allowing them to easily be turned into ide's with all the same functionality as any gui such as pycharm visual studio etc instead of being dependent on an gui ide usually made for a specific language which creates licensing and lock in along with ip issues that are expensive to lawyer out the correct way(not that most companies do violations everywhere) allows emacs vim to bypass most of these issues being an open system focused on good editing of text and extensibility to make it do what you want it to do suddenly instead of fighting gui ide's which often bring their own set of bugs and learning curve time you can focus on what ide's are supposed to really do in the end; which is edit text windows is very much a hindrance i know this is not a popular opinion among many but having used admin'd supported and abused *bsd gnu+linux solaris and windows dos os for many years both my professional and personal opinion is that richard stallman was right in the first place and was simply a man ahead 
of his time there are certain fundamental principles and truths behind computing that people (usually young whippersnappers) seem to have forgotten or perhaps never learned for example as a truism in the end either the program controls the user or the user controls the program both macos and windows are user controlling programs they do not allow the freedom of thought to fix look under the hood and own the changes you make (and increasingly like to lock hardware to that os (tpm efi uefi etc) a major violation of first-sale doctrine among other rights) a good example of this kind of dangerous lock-in is the john deere tractor issue good luck telling a farmer he can't work on his own machine good luck telling me i can't own root on my own devices the proprietary software and os market has done nothing but shaft customers by increasing prices reducing functionality increasing bloat removing power from the user (win10 broke the camel's back) surveilling the user without consent installing backdoors at most levels of the subsystems (nsa prism etc) and generally violating the privacy of the user by using third party workarounds to allow out of control government to sidestep constitutional requirements to obtain data along with making shit-tons of money selling the data to random advertisers so yes i do find that windows and other such proprietary systems (mean eyes at apple and google) are a hindrance because they restrict the user from fixing their own problems limit the user's choice and inculcate a lazy gui dependent mind-set awash in habit formed inefficiencies surrounded by security issues also i wish the windows people would stop giving me this hogwash about the new linux subsystem in windows being a good thing it essentially replaces the kernel the part that is gplv2'd and those of us who remember the microsoft of the 90's know better than to trust them again embrace extend (we are here) extinguish i prefer basically everything else on windows look i'm all for user freedom
of choice even if unfortunately that means the user chooses a beautiful prison just don't try to tell me about how free you are in prison that's all freedom does require sacrifice including for me for example windows specific games to me though the freedom i have is worth that sacrifice the way you compute is an ideological choice and it's time people realized that i would really appreciate an answer from someone at your level of knowledge and experience on these questions: • why emacs or vim over atom etc ? • "if using windows just stop" -- my work may force me to use windows for data applications upon arrival this fall i want to use macos but don't have a good reason for them other than that i'm used to it and find it subjectively better (i have worked with both) what are some good reasons for the "no windows" approach?? thanks very much! ignoring the philosophical reasons ("open source" or "free" being "better" because you prefer to) the downsides to windows over a system such as linux or macos to me (as a windows user) are the file structure is annoying (\ instead of / for instance) the command line options suck i think the true difference comes down to how much of a power user you are if you are familiar with emacs vim or linux they offer the advantage of tweaking your operating system to fit your needs (you can't do this with windows) or tweaking your editor (as redteam mentioned you can use extensions or even make your own to turn emacs vim into pycharm - or better) to do so basically these platforms being open source give you a huge amount of control to make your workspace "yours" - suited to your wants and needs the problem is to do this you have to know what you're doing (i.e. be a "power user") the rest of us simply have to use what other people created; in that sense there's little difference over atom etc other than the latter comes out of the box with less setup needed plus there's a steeper learning curve maybe you'll get the same functionality but you
won't get the big advantages from what i see and hear ultimately your job (data scientist engineer analyst) is your job not designing a unique workstation which is simply a tool to help you do the former imo if you already have the background knowledge of these tools (redteam mentioned being a sysadmin so naturally he would know unix very well) then of course these tools will be better - you know what you need other people do not for the rest of us there's probably more important things to learn to enter or advance in data science yea i think you're bang-on: learn the "power-user" specifics when needed and or when it's interesting to do so otherwise use more out of the box tools to facilitate higher level work quicker and easier (but with less flexibility) my position is a split between data scientist and design engineer so i'm looking forward to seeing how boxed in i'll be when it comes to hardware and software choices thanks for your advice! • why emacs or vim over atom etc ? in two words power and freedom first you are able to focus on the data and text instead of tools around a certain format application etc you aren't adding layers of abstraction in other words you are stripping away layers of abstraction (which simplifies and reveals) personally i support emacs and for me the gpl nature of gnu tools is a major point in my decision to use tools because i have made a conscious decision to use tools that offer the most freedom over time for the user this may seem purely ideological but it is also pragmatic because when formats are bare and open it is easy to pick up and fix a thing written 10 years ago whether that be a config file or a patch to the editor itself you can't do that with proprietary programs that being said atom would be a good example it is mit licensed which while being considered "gpl compatible" still allows removing freedom from future users aka tivoization (eg playstation uses bsd but doesn't upstream positive changes back to public nor
release source to public!) so the mit license for me is a deal breaker in most cases and that's before we even get to the technical capability point of view for example often working on systems including mainframes and servers you are headless and need to be comfortable working in a terminal to fix a problem emacs and vim ( slaps nano) (and m-x butterflies) can flip that bit on the broken headless server in budapest while you are vpn'd ssh'd in on your phone from the london metro while on vacation good luck doing that with atom what are some good reasons for the "no windows" approach? many i have listed but first take a minute to read this that being said i will give you some more practical reasons windows is licensing heavy from cal's on servers for high cpu counts (datacenter edition! + $$$) to rdp servers to potential mandatory audits (you thought your volume license covered that? evil laugh) the bigger fact is there is no transparency in licensing costs you can call 5 different ms vendors msp's etc and get 10 different answers as a user you don't see those issues but they are still there gnu+linux is free software no orwellian licenses no fees it forces you to know your system i'm going to just say it windows and often osx users are just too gui dependent they are used to too many layers of abstractions between the data files and their representation while it may be a pain in the ass at first and the learning curve not for all for those truly wanting to understand computing data science programming etc deserve to understand the os they are working on as much as possible because it enables them to accomplish whatever their goals are i'm not saying to have to be a kernel dev but understanding the full-stack as much as possible allows the scientist dev to make better decisions on tools and their use to give you an example when i am working in an all debian environment i make sure to use a debian based distro daily to ensure my working knowledge of the server os is 
up to date and functional now that may not be desktop use but it's not hard to have an suse deb devuan and centos terminal open to a vps or to a local virtual server you control (i like proxmox for example because it is powerful but it is also agpl and has a webgui that t1 t2 admins can use) bottom line it makes you technically more capable which is good for you and or your employer it's a privacy thing in this era of privacy erosion and growing surveillance dystopia windows et al are major violators of user privacy you don't know what all data is being sent to whom or for what purposes the applications that deliver for the platform(s) are often just as bad as the platform itself abusing the closed source and proprietary nature to betray your private information to others a great example of this would be app permissions on ios and android for example i run lineage os with all the google removed yeah that means i have to manually install apk's but that also means i get control other users don't such as a youtube apk that has all advertising removed blocked advertising and privacy violations walk hand in hand in this modern age of the data market security i know nsa has plenty of linux 0-days and backdoors as well as windows ones ( mean looks at systemd) and i understand the relationship between os marketshare and malware probability but that said it is so much easier to lock down and secure your systems in this day and age for example corporate espionage sometimes done by foreign governments (often at borders) requires that any modern laptop with sensitive information should have full disk encryption and other similar forms of protection given my previous statements and the knowledge of cooperation between the intel agencies around the world and ms among other issues i wouldn't (and don't) trust bitlocker to keep my valuable corporate ip safe for example i could go on but i think that's enough to ruminate about i previously have supported all operating systems for
pragmatic reasons despite my positions (separating the business need is important) but windows 10 really got me fed up with the status quo and i vowed to finally dump windows and osx for good i did and while it hasn't been easy it has been worth it besides a few games (i don't use wine) i haven't found a windows program i haven't replaced blender for 3d modeling video editing wings3d for 3d modeling fast ue4 works on linux (watch that licensing though!) krita gimp for image editing bleachbit for data cleanup wiping deluge transmission for (cli) torrents obs for streaming mumble for voice emacs-erc irssi for irc chat firefox icecat emacs-eww uzbl and gnome web as browsers depending on situation mps-youtube for cli youtube with vlc as media player emacs dired and thunar as file management emacs as editor and sometimes libreoffice pandoc for document conversion tesseract for ocr scanned documents asciidoc and emacs org-mode for documentation note taking time-tracking gnus mutt and thunderbird for email depending on situation emacs org-mode as data science portal by calling r python c octave programs from within usually outputting reports to latex pdf via org-mode export bash is my shell gnu is my os linux is my kernel manjaro arch is my current distro awesomewm xfce are my wm de oh and i can recommend getting work to buy you the mac hardware in sales and exec office it is often a status symbol (bullshit bureaucratic games i know but this is business right) and it's super easy to either wipe and use linux native (might have to distro-hop a bit to find the one that most easily matches the hardware) or if you put a bit of effort into understanding apple's jacked up efi implementation you can dual boot so you have both environments if you need if you felt sadomasochistic you could even triple boot windows which i did for a while but it felt dirty if you really need windows put it in a vm if you really need a windows app without vm perf impacts try it in wine i try not to do
either personally if i had a laptop choice from a business these days i'd go for one of the msi aluminum laptops (stealth pro or raider series) which beat macbook perf wise and are 2/3 the cost for data science work though i would want a powerful desktop over a laptop work from home via corporate vpn then ssh to your box without eating your home cpu cycles wow so much solid info thanks so much i have a lot of learning to do stoked to get after it! i got started by targeting the city where i most wanted to work most of my education didn't come from my undergrad or master's but from reading wikipedia articles on techniques i was interested in and by reading a lot of blogs to see what interesting techniques people work with it's been the best way i've expanded my statistical and technological toolboxes find some side-projects that interest you that show what you can do play with real data not just the data sets everyone plays with figure out what are the important questions to ask explore the data in a way to try to answer those questions and then figure out how to tell the stories to a wide non-technical audience don't ever start with "i want to use x method algorithm" let the data tell you which way to go finally do worry about whether the data is representative of the questions you want to answer since "big data" became a thing basic statistical notions of sampling and design of experiment have been neglected for quantity and ease of access but that is a form of convenience sampling and convenience sampling will never yield reliable results it's a reason why you hear about "racist algorithms" these days back when data was expensive and hard to get a lot of time was spent figuring out the best way to collect good data that was representative of the target population these notions are still important and essentially making a comeback now that some realize that quantity does not imply quality speaking as someone with a phd in a quantitative field who'd like to become
a data scientist (at least i think i'd like to do that) it has been a challenge to figure out what the data science career is all about and especially how to get into it i think i'm a pretty smart discerning person but the things i've read and heard about this career are as a whole so confusing and so contradictory it's hard to know what to think several months ago after doing some research i thought this was a viable path for me and that i had a plan to get into the field today i don't know whether any of the things i believed about this career months ago are true whether it's realistic for me to get a data science position in the near future or whether this is a career i would even want i thought i'd list paraphrased version of some of the confusing and often contradictory things i've read and heard about this career for your amusement if anyone has thoughts by all means please comment although at this point i don't think i'll believe anything anyone says about this stuff without experiencing it firsthand data science is a viable career path for people without a strong technical background who are willing to take a couple online courses it's very straightforward to get a data scientist position if you have an advanced degree in any quantitative field especially if you can code it's almost impossible to get a data scientist position right out of (grad) school no matter what field your degree was in you probably need to spend a couple years working as a data analyst or bi person to get some domain expertise data science is a field with a serious talent shortage; companies are desperate for anyone who's at all qualified the data science job market is flooded with applicants; any open job is going to get dozens if not hundreds of reasonably qualified applicants it's really hard to get a job unless you seriously stand out the technical skills are the least important aspect it's more important to work well with people and be a good communicator you can easily learn the 
technical aspects on the job a data scientist just needs to know more statistics than a programmer and more coding than a statistician a data scientist needs to know all about optimization numerical analysis algorithms and data structures and lots of advanced machine learning and statistics the only things you really have to know are the fundamentals: be able to write some code in python and r and know about confidence intervals and linear regression in order to appeal to employers it's crucial to have experience with the exact technologies the employer uses if they use hadoop you must know hadoop etc there's no way to be a data scientist without knowing some pretty advanced math math isn't really relevant for data science in practice and some of the most successful data scientists i know don't even know calculus data scientists spend their time working on intellectually challenging problems that are just as interesting as those in academia data scientists spend most of their time on boring data munging issues usually with an ultimate goal of selling more shoes or some other horrible marketing thing the data science field is really hot and will surely continue to grow for many years to come data science has hit its peak; it will continue to be useful in its areas of legitimate application but companies are realizing it was oversold and they will be cutting back there is an amazing amount of value to be had in data science most data scientists contribute nothing worthwhile to their employers you must have a portfolio of projects to show employers i don't know any data scientists with a portfolio of projects; my coworkers and i think that's a waste of time here's another discouraging thing: looking at job ads for data scientist positions there is such a crazy variety of skills and experience desired for each position while i know that "required skills" aren't always really required it's pretty bewildering when: one position says you must have nlp experience; one says 
you must have cybersecurity experience; one says you must have javascript and web programming experience (?!); one says you must have experience in the health care industry; one says you must have retail marketing experience; one says you must have bioinformatics experience; one says you must have experience in the intelligence community; one says you must have django experience (?!); one says you must have experience with the microsoft products ssms ssas and ssrs; one says you must know sas; one says you must have experience in financial analysis in the above list when i listed specific technologies it was because they seem somewhat outside the standard data science toolkit as far as i know but even when they mention "mainstream" technologies they vary so much between the different positions and there is such a massive list different ones want aws spark hadoop hive pig postgresql mysql d3 tableau mapreduce flink mahout mongodb cassandra and many more judging by the ads there is probably hardly anyone in the world who is qualified for more than one of these positions and yet they are all "data scientist" positions actually i think half these employers just copy & pasted a bunch of buzzwords into their ads but that's depressing for another reason pure r&d roles where you can just explore data like an academician and where your minions will collect process organize and store that data for you are fairly rare at least in the corporate world i would probably call that role a "pure data scientist" this is where the academic skills of advanced statistics and computer science come in handy most people working in this field have to do some to a lot of data wrangling collecting organizing storing processing cleaning and managing the data you might work with this is where hands on experience working with data systems and programming languages become necessary most people doing this work have to be deeply plugged into their business and have a strong understanding of the 
business vision and strategic goals as well as how specific tactics might be deployed to further that vision this is where experience in a specific industry can really help every business and every industry has a huge glossary of strange terms acronyms and lingo that are unique to them often these same words can mean very different things in other industries and businesses you need to know them well finally once you have a good sense of what sorts of things will be of value to the business (and therefore justify your salary) and you are neck deep in the data with it all over your fingers then you can pick up some of those fancy ml and ai and tensor and symbolic algebra tools and put it all together to do "data science" some data science problems once solved still require a transition to a production system rarely is the data static and the question only answered once usually you will be involved in the project to "productionalize" the techniques and processes so they can be used in an ongoing basis in the organization this is where some engineering and operations skills come in handy as well as a sense of data architecture systems architecture and formal process management are very helpful you may get to do all of the above you may only do a piece of it you may find any part and or all of it carrying the title of "data scientist" you may also find some of it with other titles "data engineer" "data architect" "data administrator" "data analyst" and these all overlap and intersect there is probably some sort of n-dimensional venn diagram that would visualize how it all fits together but it isn't really necessary the trick is to find the role and industry and coworkers and boss that suits you best and soak it up while you can right graycube describes what's known as the data-science pipeline at one end is 'ingestion' - bringing in data from various sources ingestion roles tend to be dba & operations centric especially in larger companies then you have data cleanup 
(aka data engineering) transformation and basic reporting usually involves a bit more programming and sw dev skills this can lead into sw pipeline development - high-performance systems to support various kinds of analytics further along the pipeline you have data analysis which can vary from basic reporting and sql querying all the way to model building for predictive analytics & machine learning esp at the higher end the analytics role tends to require higher-level math skills each sector of the pipeline calls for different skillsets and it's unlikely a person would use skills from the full range in their job unless they work in a small company in the startup and small company world we often wear many hats sometimes we leverage outside consultants or cloud services particularly for very specialized technical skills but money is always lean so we do as much ourselves as we can this is probably the best answer here of all very sad to see it hasn't got the votes it deserves (yet?) if anyone has thoughts by all means please comment although at this point i don't think i'll believe anything anyone says about this stuff without experiencing it firsthand credential-wise i've got a ms in a ds field - i've worked as the principal prediction focused (ml) data scientist at two separate companies now for the last 3.25 years data science is a viable career path for people without a strong technical background who are willing to take a couple online courses wat if one believes that a couple of online courses is all a non-technical person needs to get a senior level title with six figures then they're optimistic to the point of stupidity it's very straightforward to get a data scientist position if you have an advanced degree in any quantitative field especially if you can code why would this be the case?
it's almost impossible to get a data scientist position right out of (grad) school no matter what field your degree was in you probably need to spend a couple years working as a data analyst or bi person to get some domain expertise mostly true for ms folks less true for phds who are interested in ds jobs that lean heavily on study design data science is a field with a serious talent shortage; companies are desperate for anyone who's at all qualified there is definitely a shortage of qualified folks compared to the average field but i find that companies are willing to wait rather than hire a dunce (my last company had been looking for 1 5 years before i came on) the data science job market is flooded with applicants; any open job is going to get dozens if not hundreds of reasonably qualified applicants it's really hard to get a job unless you seriously stand out much of this is the classic hr problem the people doing the resume screenings have no fucking clue what a good applicant looks like lean on networking (online and in-person) and recruiters are surprisingly useful in this field also the technical skills are the least important aspect it's more important to work well with people and be a good communicator to say technical skills aren't important in a very technical field is asinine soft skills are important everywhere but they aren't trumping tech skills here you can easily learn the technical aspects on the job why tf would i hire a good communicator that has no idea how to actually do the job? 
a data scientist just needs to know more statistics than a programmer and more coding than a statistician this is an old (and pretty good) general definition but i'm not sure why you (someone) has changed it to "just needs" - we aren't trying to fit some minimum requirement threshold here a data scientist needs to know all about optimization numerical analysis algorithms and data structures and lots of advanced machine learning and statistics you need to have a foundation in all of these things you will not be an expert in every ds area the only things you really have to know are the fundamentals: be able to write some code in python and r and know about confidence intervals and linear regression whoever said this was a jackass in order to appeal to employers it's crucial to have experience with the exact technologies the employer uses if they use hadoop you must know hadoop etc change "employers" to "human resources" and you've got it right there's no way to be a data scientist without knowing some pretty advanced math some pretty advanced math does not equal full courses of calc 1 calc 2 as some folks have said here i took them in college the concepts i actually use from these courses can be summarized in a week of learnings math isn't really relevant for data science in practice and some of the most successful data scientists i know don't even know calculus you need to understand derivatives to 'get' gradient descent knowing that the derivative of x squared is 2x is not "knowing calculus" understanding probability and stats (especially concepts related to bias) are 100 times more important unless you're a ml researcher data scientists spend their time working on intellectually challenging problems that are just as interesting as those in academia true imo data scientists spend most of their time on boring data munging issues usually with an ultimate goal of selling more shoes or some other horrible marketing thing i have no doubt that some ds have to spend a 
bunch of time on data munging but jesus their company is either extremely small or dumb as shit (invest in data management teams ffs) or they're in a niche that requires them to constantly be taking in new (unstructured largely) data types the data science field is really hot and will surely continue to grow for many years to come hope so we'll see data science has hit its peak; it will continue to be useful in its areas of legitimate application but companies are realizing it was oversold and they will be cutting back hope not we'll see there is an amazing amount of value to be had in data science i'm paid a pittance of the easily quantified value i've brought to my employer most data scientists contribute nothing worthwhile to their employers definitely true for some sometimes it's someone pretending they're a ds sometimes it's the company's fault you must have a portfolio of projects to show employers nah but you need to be able to talk about projects i don't know any data scientists with a portfolio of projects; my coworkers and i think that's a waste of time thanks a lot for your reply it's very straightforward to get a data scientist position if you have an advanced degree in any quantitative field especially if you can code why would this be the case? well that's what i was told at one point see this page on leaving academia for instance i was under the impression that having a quantitative phd and coding background it was just a matter of brushing up on my stats and reading a machine learning textbook i mean this guy did it and that's literally what he says right? a data scientist needs to know all about optimization numerical analysis algorithms and data structures and lots of advanced machine learning and statistics you need to have a foundation in all of these things you will not be an expert in every ds area but do you really need even a foundation in all of these things? honestly do most data scientists know the first thing about numerical analysis?
there's no way to be a data scientist without knowing some pretty advanced math some pretty advanced math does not equal full courses of calc 1 calc 2 as some folks have said here i took them in college the concepts i actually use from these courses can be summarized in a week of learnings this is such a strange perspective to me i think you're saying "yes you need pretty advanced math but less than calc 2 " in other words less than what most reasonably talented high school students take who in the world considers this advanced math in any way? and why are math and physics phds considered good candidates for ds positions when they've spent almost their entire education learning very difficult things that are apparently completely useless for working data scientists and very little time learning the things that are actually relevant? well that's what i was told at one point see this page on leaving academia for instance i was under the impression that having a quantitative phd and coding background it was just a matter of brushing up on my stats and reading a machine learning textbook i mean this guy did it and that's literally what he says right? sample size of one implies that it's not impossible but says nothing about how 'straightforward' it is imo i didn't read the entire post but i'd say he's luckier than he realizes consulting is prob easier to break into given the nature of how that business model operates but do you really need even a foundation in all of these things? honestly do most data scientists know the first thing about numerical analysis? eh numerical analysis is probably not a 'core skill' unless you're in a ds niche my bad on overlooking that this is such a strange perspective to me i think you're saying "yes you need pretty advanced math but less than calc 2 " in other words less than what most reasonably talented high school students take who in the world considers this advanced math in any way? 
i guess people who never took more than algebra ii in high school? and why are math and physics phds considered good candidates for ds positions when they've spent almost their entire education learning very difficult things that are apparently completely useless for working data scientists and very little time learning the things that are actually relevant? most books in this field are written heavily in math notation so it definitely helps you if you don't go cross-eyed when you see formulas presented rather than concepts (i still prefer conceptual presentation) physics people spend more time than your average major with high performance computing and quite a few who are coming over as data scientists have signal processing backgrounds i beg to differ - you did not overlook numerical analysis it is probably the core skill if you want to be more than that statistician analyst bi guy linear algebra optimizations and sketching are at the heart of what i'd consider true data science but yeah you always can get by without like that lucky guy otherwise i'd agree to about all you wrote curious how are you required to use linear algebra in your job (other than simply reading matrix transpose multiplication notation when reading about nn)? how are you required to use optimization in your job? are you creating custom cost functions and writing your own gradient descent? is the issue here simply one of semantics? 
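for reference, "writing your own gradient descent" can be as small as the sketch below. it assumes a toy one-dimensional cost f(x) = x squared, whose derivative is 2x; the function name, learning rate and step count are made up for illustration and are not anyone's production code.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimise a 1-D function given its derivative `grad` (hypothetical helper)."""
    x = x0
    for _ in range(steps):
        # step against the slope; the step shrinks as the slope flattens
        x -= learning_rate * grad(x)
    return x


# toy cost f(x) = x**2, derivative 2x, minimum at x = 0
x_min = gradient_descent(lambda x: 2 * x, x0=5.0)
print(x_min)
```

in practice you would hand the cost function to a library optimizer rather than maintain a loop like this; the sketch only shows where the derivative enters.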
there are many applications of linear algebra in data science including neural networks vector space models of words and any data scientist working with accelerometer gyroscope data such as from movement trackers you can apply optimization as a data scientist without writing your own solver for example price optimization where you have a model for how likely a customer is to purchase a product and you set the prices offered to customers to maximize expected revenue for example i work on recreating published deep learning methods in my work on text mining and nlp that means understanding all those pieces you mentioned and correctly "putting them together" i certainly wouldn't waste my time developing my own optimizer or most numerical methods indeed that is correct but i need to be able to read and understand other people's numerical code decide if it's correct and maintainable in production if it will maximally use the processor's floating point optimizations and just generally to know what numerical methods i need where and why most of this can't be done if you don't at least understand the relevant math in other words it's the difference between someone who just followed recipes and blogs and someone who can actually "debug" and "refactor" a machine learning system and you don't consider this a ds niche? (which i said in the post you originally responded to) if you believe that being able to code a nn from scratch is a core ds competency then that's what you believe nlp is changing faster than traditional regression and classification due to both updating encoding methods and dl progress so i can understand how to stay on top of things requires you to "peek under the hood" a bit further than other ds professionals w r t the tools used we all have limited time - "refactoring" boosted trees doesn't make sense in my business context i could rewrite it myself in c if need be but why would i?
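the price optimization example above can be sketched in a few lines. this assumes a hypothetical logistic model for purchase probability with invented coefficients, and a plain search over candidate prices instead of a real solver; a model fitted on your own data would replace purchase_probability.

```python
import math


def purchase_probability(price, intercept=4.0, slope=-0.1):
    """Hypothetical demand model: P(customer buys | price). Coefficients are invented."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * price)))


def expected_revenue(price):
    """Expected revenue per customer offered this price."""
    return price * purchase_probability(price)


def best_price(candidate_prices):
    """Return the candidate price with the highest expected revenue."""
    return max(candidate_prices, key=expected_revenue)


price = best_price(range(1, 101))
print(price, round(expected_revenue(price), 2))
```

a low price sells often but earns little per sale and a high price rarely sells, so the maximum sits somewhere in between; no custom solver is needed to find it on a grid this small.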
turning business problems into ml ones feature engineering productionization (and all the worms that go with that) are better use of focus in my specific environment i guess this points to the actual problem: to me a data scientist is someone who works on machine learning problems possibly even (actual) ai-related issues for many others it's just the new name for people doing bi or analysts statisticians maybe a bit more applied than "yesterday" that probably is much more defining our "definition borders" than anything we've discussed here ps: i also wouldn't refactor xgboost but i hope by now it's clear i wasn't referring to that i understand your sentiment there's a big difference between you and someone who just read a text mining book and points some fields in the direction of a few canned packages math and physics (and stats and cs and engineering and economics and ) phds rarely need phd-level understanding to do the jobs they do in industry you're lucky if you find a job that does require that depth of technical understanding more than a couple times a year great info as an ms ee student looking to pivot to ds do you have any tips? thinking of doing the kaggle nanodegree program i'm finishing a probabilistic machine learning course now but it's extremely high depth with little breadth so i feel like i need more practical experience with large data sets and industry tools that nanodegree program looks interesting and has great reviews i recommend: 1 program every single weekday prob should go ahead and choose r or python don't neglect statistics (especially stats concepts above memorizing this test vs that test) what's selection bias? what's confirmation bias? what's optimization bias?
etc learn your way around databases and how to pull info at least learn basic design and be able to work your way through online sql courses you don't necessarily have to be a visualization expert but you're going to be in trouble in this field if you flat out can't present information solidly stephen few's 'now you see it' is a solid read but you'll want to practice if this area isn't natural for you kaggle nanodegree program? do you mean udacity? yeah well co-created by kaggle anyway i have no doubt that some ds have to spend a bunch of time on data munging but jesus their company is either extremely small or dumb as shit (invest in data management teams ffs) or they're in a niche that requires them to constantly be taking in new (unstructured largely) data types this hurts to read i work at a billion dollar private company with extremely low fixed costs and the lack of investment in data infrastructure is a constant pain i think it's more common than you realize though aspiring ds here thank you for your response it definitely cleared up some questions i had! where did you get your ms in ds from?
just this morning i was looking at a data science oriented internship you're supposed to be just a college student a bachelor graduate at most but still they wanted you to have solid experience with sql python java hadoop r sas spss mapreduce hive etl and olap i don't even have any idea what some of these are i think hr people have no clue what they're looking for in a candidate which is sadly common with tech related subjects at least in my country as a student myself i am currently studying python and i'm probably going to pick up r as well in order to start applying to data science junior positions as soon as i graduate you're supposed to be just a college student a bachelor graduate at most but still they wanted you to have solid experience with sql python java hadoop r sas spss mapreduce hive etl and olap the fucking struggle man i'm in the middle of it right now i look through the internship credentials and ask myself "do people really know all of this at my age?" how old are you? i've been doing sql for about 3 years python for like 6 and recently have added hadoop r mapreduce (to a high level only not operational) hive (reasonably decent knowledge) etl and olap to my skill set no intention of learning sas spss or java i'm 21 i've been doing java for about 6 years r and sql for 3 years python for 2 years matlab for 1 year and i've dabbled a little in tableau and spss i've never used sas mapreduce hive etl or olap i would argue that it's the focus you've had to date have you been 'trying' to do data science? and really pushing yourself on the tech front?
fyi etl = extract transform load it isn't a technology it's a 'process' olap = online analytical processing data warehouse typically an mpp database hive and mapreduce are very hadoop things i have sort of been trying to do data science i only recently discovered how interested i am in it because of the way my school stupidly structures classes so i haven't taken too many classes on it yet but i'm trying to get better with certain aspects like machine learning r etc on my own as well i was pretty lax about applying to jobs last year but now i am trying to buckle down and get serious about my future and thank you i've never heard of etl and i've heard of olap but don't know the first thing about it olap is just a use case (and subsequently a design optimisation) for a data warehouse for example redshift is a classic olap db it's mpp so instead of storing rows together it stores columns and hashmaps this means that if you want to sum over a large number of rows on a hard drive level it is doing sequential i o which is very important makes it very fast for certain use cases i'm 21 (undergrad junior) and just landed a data science internship at a prominent west-coast tech firm the role was originally for grad students only but i was able to impress them in the way that i completed their data science assessment as far as programming i only know r python and sql and i know tableau very well and believe it or not i only started teaching myself r this time last year that said i went in big time for the whole year but still the way i see it is that every role is so different that you just gotta build up your skill profile and then apply to as much as you can to see what sticks r is popular in economics which is what this data science role has an emphasis on so naturally i had a much better chance at it than if it was a more pythonic role my point is to not worry about technologies and stuff at our age level just choosing one and trying our best to get better and better at it
is the way to go in a way i see it as less of a 'struggle' and more of an opportunity! we're in the amazingly unique position of getting into a booming field while we have all the time in the world to build up our skills no kids no job etc feel free to message me if you have any questions congrats on the job and yeah i definitely agree with you it is an opportunity not a struggle we have the free time right and now it should be something we enjoy thanks for the response! out of curiosity - when did you apply and interview for that internship? my favourite was a job posting recently for a junior data scientist £24 000 per annum with hadoop hive sql python and ml experience uhhh if they can tick off those boxes they're gonna be clocking 40-45k minimum yep the staffing people don't necessarily understand the nature of the work the hiring manager therefore puts in a bunch of skills that would be useful in order to help the staffing people screen applications and find viable candidates the result is that job listings reflect idealized candidates not actual job requirements because no one knows what "data science" is it's a new evolving field i think the best explanation is this video from 2011: https: www youtube com watch?v=0tueenl61hm a couple of things: develop your skills in all 5 areas that john rauser talks about: math stats programming curiosity writing and skepticism try to have some evidence that you can tackle the above: blog portfolio talks papers dissertation whatever don't look at job requirements as "required" skills those are wish lists you ought to look at the commonalities: r python math stats background understanding how to make visualization database knowledge (sql some nosql) etc you don't even know if you like being a data scientist you could move into a more "traditional" role like statistician programmer or analyst once you get a better idea you can move onto data science you don't even know if you like being a data scientist you could move into a 
more "traditional" role like statistician programmer or analyst once you get a better idea you can move onto data science some sources say that a data analyst is essentially a junior data scientist is that correct? how much easier would it be to break into that role? sometimes that's true sometimes they're exactly the same sometimes they're completely different it's probably not "no one" rather there are very diverging views i still stick to the old definition i learned the term first as a job description for machine learning and ai experts but many today define it as a "modernized" statistician analyst - and those voices despite being newer are louder explaining the confusion can i just say thank you for this post i've seen the same thing judging by the ads there is probably hardly anyone in the world who is qualified for more than one of these positions and yet they are all "data scientist" positions that is the basic gist of the whole data science movement no one person is going to be able to fulfill all these requirements as someone with a phd background you need to think about how you can use your experience in the context of the problem at hand there are going to be data scientist positions where not knowing math is not a big deal and there are going to be data scientist positions that are actually statistician positions the market covers a wide umbrella of people remember during the early days when you could become a programmer in any language but now it's all specializations i have a colleague with a data scientist position title on my team who only does data pulls and munging for outside hospital researchers as the industry gets more standardized the requirements for these positions will also be standardized you can get a job as a data scientist without domain expertise but to become an effective data scientist you will have to learn the domain the type of problems that data scientists work on are also quite diverse the complexity of the problems is not
directly proportional to their "cool" factor i sort of understand your frustrations too because after my phd i struggled to identify a proper domain for my data science career i went to a few interviews that were purely etl type of jobs and some that were more business analyst type of jobs as you attend more interviews you will get a clearer picture of the landscape as for technologies i would recommend the following: 1) at least 1 database scripting language to pull structured data most companies use sql server or oracle if you know one the others can be picked up on the fly you should be at a minimum be able to pull aggregate and pivot the data 2) one programming or scripting language that has good statistical packages python r are the preferred tools but i have seen people use c++ and java too additionally you need to know the language well enough to do simple tasks like histograms conf-intervals linear regression in a fast and efficient manner this will help you do a lot of testing on the fly it is a bonus if you know a specialized package relevant to the job or you can write one yourself optional: 3) a visualization software library like tableau or d3 these are simple enough that you can learn them on the fly ideally you would not be asked to do the more complicated stuff like designing ui i would recommend that you start exploring what domains could actually benefit from your expertise and how you can add value to a company if you have a clear understanding of your value to the job it will be easier to convince the interviewers thanks for your reply i know sql i know python and use scipy and numpy and sklearn i also have some experience with a lot of other languages in a way your comment adds to my confusion because it seems like you are dramatically underselling the level of experience and skill required relative to what i see elsewhere you're telling me i need to know python well enough to create histograms efficiently?
uh i've been writing python for over 15 years and know it well enough to implement poorly described previously theoretical-only massively complicated numerical algorithms; i know about the cutesy features like decorators and coroutines and metaclasses; i've contributed to some of the big ds-related python libraries but i cannot convince anyone i might be a good person to hire i don't know what to make of the mismatch in expectations here you sound qualified enough for who we hire for "entry-level" (meaning for us without commercial ds experience but pretty smart otherwise) scientists: usually good quantitative phds who can program we've taken lots of physicists applied math neuroscience engineering cs statistics phds when we have an in-house interview the candidates usually talk about their phd or postdoc research if you have the programming experience and a phd what do you feel is stopping you from pursuing this path? are there industries related to your phd that could benefit from your knowledge? are there job postings where even if you don't have the domain knowledge your research and programming skills could be useful for example this and this are two different data scientist jobs i would not expect one person to be qualified for both these positions the first one requires someone with good technical skills while the second one requires people with experience in emr the question would then be where do you see your skill set as being more useful? also keep in mind that data analyst scientist engineer research scientist are all currently quite interchangeable in the industry so you don't have to necessarily look for a position with a data scientist tag if you have the programming experience and a phd what do you feel is stopping you from pursuing this path? what's stopping me is that no one will call me back are there industries related to your phd that could benefit from your knowledge? 
there are no industries related to my phd i feel that i can sell my academic background as helpful in the sense of having given me quantitative skills a skeptical mind the ability to work hard and relentlessly on a technical problem etc (all the things people say are the crucial factors) but as far as the direct relevance of my research topic to an industry or particular job? no i've got a phd in physics my research was experimental particle physics i'm in about the same boat you are i feel like i have all the big skills but no one wants to take a chance on me because my background is academic instead of industrial i've gone to conferences conventions expos and when talking to people in person i can wow them with my knowledge but then when the subject of my background comes up they inevitably switch off i can see it in their face and body language some will even tell me directly there was even one person whose booth had a big sign saying "we're hiring" that told me his company isn't good at hiring smart people and letting them do their thing they only hire people with specific experience i'm seriously at the end of my rope i've looked for all sorts of jobs not just data scientist positions analyst data engineer software dev software qa etc etc etc i get very few interviews and not a single offer so if no one is calling back have you looked at other factors? how's your cv and cover letter writing skills? are you tailoring your application to each position? how many people have you asked for feedback on those applications prior to submission? yeah i'm working on these things i'm talking with people right now for advice about my resume some of the advice is difficult like the advice to tailor my application to each position it's not like i have 50 major data analysis projects under my belt and can choose to highlight a different subset of them depending on what might appeal to a given employer where do you live? i might be hiring soon thanks so much for this post!
i'm considering transitioning into data science after i finish up my phd (cognitive neuroscience) and thus far there has been lots of contradictory information in my search speaking as an ex-physicist working as a data scientist in london; 5 years of experience (1 in software engineering + 4 in ds); having worked for 5 companies (mostly contracts); maybe 30 job interviews from now on i will say 'data scientist do this or that' but i will only mean 'typical data scientists in london as far as i can tell from my limited experience' short version is: if you are truly interested in data science and you have the patience then you will be able to do data science data science is a viable career path for people without a strong technical background who are willing to take a couple online courses i don't know any data scientists without some quantitative degree the typical path seems to be - get a phd in organic chemistry or some such get disenchanted with academia take a few online courses and rebrand as data scientist that doesn't mean that all these people actually need their quantitative backgrounds to do their jobs correlation is not causation the type of person who would be interested in data science is also the type of person who would get a quantitative degree you don't see many art history graduates doing data science not because they tried and failed but simply because they never tried it's very straightforward to get a data scientist position if you have an advanced degree in any quantitative field especially if you can code it may be in london at least i can give many examples of recent phds (including my wife) who found jobs in data science within a month then again all the ones who didn't get any offers and gave up on data science - i didn't get to meet them so i can't know the true success rate but at least it seems like all those people get hired based on seeming smart and knowing the basics of python and machine learning it's almost impossible to get a data
scientist position right out of (grad) school no matter what field your degree was in you probably need to spend a couple years working as a data analyst or bi person to get some domain expertise please don't! this may be just my unrepresentative experience and personal bias but afaict data analyst and bi are dead ends where data science is concerned you definitely don't need years of domain experience to do data science you do need software engineering skills (a little at least but the more the better) general grasp of machine learning and adjacent fields like nlp and good quantitative intuitions none of which you will learn as a data analyst you would be much better off taking a job as some kind of junior software engineer in a company that is doing something datasciencey in my experience engineers have a lot of freedom in choosing what they want to work on within a company this is a given in tech startups but i've seen it in corporations too if you express interest in the company's recommendation engine or ab tests or routing algorithm or whatnot i'm sure they would let you work on it and then - hey you're a data scientist (kind of) data science is a field with a serious talent shortage; companies are desperate for anyone who's at all qualified hard to tell ds salaries in london are still rising so there must be some truth to it it's confounded by some really good data scientists getting sucked into engineering or management and changing job titles the data science job market is flooded with applicants; any open job is going to get dozens if not hundreds of reasonably qualified applicants it's really hard to get a job unless you seriously stand out last time i was interviewing people i interviewed a dozen and half of them were terrible and the other half accepted other offers so we didn't hire any i'm sure deepmind has its pick of the most brilliant grads but a typical company really doesn't unfortunately a typical company that doesn't already have a good data
scientist on board will often be unable to distinguish between great and mediocre the technical skills are the least important aspect it's more important to work well with people and be a good communicator you can easily learn the technical aspects on the job this is just weird it's sort of true i do believe that you can learn everything on the job - that is if you're the type of person who easily picks up this stuff and has an interest in it but if you are this kind of person then how come you haven't picked it up already? people skills are important in every office job a data science job also requires data science skills a data scientist needs to know all about optimization numerical analysis algorithms and data structures and lots of advanced machine learning and statistics certainly not in an entry-level job interview on the job it depends being the nerd that i am i always look for excuses to use some cool algorithmic shit or new machine learning technique at work and i rarely find one if you work for a let's say delivery company then optimization may be all you'll be doing for years if you work on targeted ads then you may end up spending all your time doing advanced machine learning the only thing you mentioned that you will definitely be using is 'algorithms and data structures' but 'intro to algorithms' level is (sadly?) 
more than enough for most people the only things you really have to know are the fundamentals: be able to write some code in python and r and know about confidence intervals and linear regression i would throw in a few other ml algorithms into the mix + sql but yes i would say that to land an entry-level job this is about the level you need to be at in order to appeal to employers it's crucial to have experience with the exact technologies the employer uses if they use hadoop you must know hadoop etc only really dumb employers would really require this from a data scientist you will run into dumb companies but i want to believe they're the minority usually companies put those buzzwords in job ads to attract people who like cool tech not to filter out people who haven't used some particular gizmo if you're a decent programmer you'll pick up the new technology in no time and they know it i once got a contract to do a project in r despite never having used r before (which i didn't try to hide) there's no way to be a data scientist without knowing some pretty advanced math this is simply empirically false math isn't really relevant for data science in practice and some of the most successful data scientists i know don't even know calculus i think the confusion stems from the fact that for many people mathematical intuitions become so ingrained that they no longer think of them as math something as simple as the notion of a local minimum of a function where your optimisation process might get stuck - this is the kind of thing you have to think about occasionally as a data scientist i would barely call this type of thinking 'math' but it would be very alien to someone who never got any mathematical education (or even to ancient mathematicians!) 
- so i guess it really is math this guy explains better what i'm talking about https://arxiv.org/pdf/math/9404236.pdf more substantial math than this happens very rarely and never in a job interview data scientists spend their time working on intellectually challenging problems that are just as interesting as those in academia yes sometimes depends some of us would say that the problems are more intellectually challenging than the ones in academia - which is why they quit academia in the first place data scientists spend most of their time on boring data munging issues or if it's not data munging then it's fighting with hadoop or with r or oracle or some other boring technical issue that must be cleared before your brilliant idea can be implemented this is completely true and in no way different from what happens in academia maybe philosophy and some branches of mathematics consist of pure thought (and even then you have to write papers and grade exams and so on) - the rest of science consists of repetitive experiments calculations writing code handling equipment conducting surveys usually with an ultimate goal of selling more shoes or some other horrible marketing thing that is accurate if you don't enjoy the craft for its own sake you might be disappointed but don't knock it until you try it you might be misjudging what will actually give you satisfaction 'steve jobs started out passionate about zen buddhism he got into technology as a way to make some quick cash but as he became successful his passion grew until he became the most famous advocate of “doing what you love” ' https://80000hours.org/articles/dont-follow-your-passion you must have a portfolio of projects to show employers you don't have to have anything but sure it helps especially at first before you have relevant work experience it's a way of showing that you're serious about ds and able to get something done on your own doesn't have to be anything super fancy i've seen a cv of a very senior head of 
data science with decades of experience where he listed a project that i recognised as one of the homework assignments from a coursera 'intro to data science' course thanks for your reply it's really helpful the only things you really have to know are the fundamentals: be able to write some code in python and r and know about confidence intervals and linear regression i would throw in a few other ml algorithms into the mix + sql but yes i would say that to land an entry-level job this is about the level you need to be at does this not seem odd to you? if the bar is really that low what's with all the graduate degrees and high salary and buzz around data science? the way you describe it it sounds like my 19 year old self could have spent a few months learning about a few machine learning algorithms and been completely qualified for a data science position 19 year old self could have spent a few months learning about a few machine learning algorithms yes this is exactly what i'm saying i wish i had done just that when i was 19 instead of wasting the best years of my life on quantum gravity of course the 19 year old would need a few years of experience to get good but so would the 27 year old with a phd in some obscure field and been completely qualified for a data science position qualified - yes employable - not right away you'd probably have to start with some engineering job or an internship and turn that into a data science position let me be clear that i don't think every 19 y o could do that but the kind of 19 y o who can study quantum field theory can skip the qft and study some ml instead someone is bound to comment "unless you know theory x which i was taught in grad school you have no right to call yourself a data scientist i'm using it all the time on the job and consider it the essence of data science skillset" well good for you snobby mcstrawman i don't doubt that there are specific positions that require specific expertise - someone must be making those 
self driving cars and i wouldn't trust my 19 year old self with this task without some proper training but the generic run-of-the-mill data scientist generalist most companies are looking for really doesn't need that much theory i have compiled a long list of questions i got asked in ds job interviews - http://nadbordrozd.github.io/interviews so you can judge for yourself i have interviewed with consultancies and retailers startups and corporations tech giants and hedge funds i believe this is a fairly representative sample if the bar is really that low it does seem low and yet precious few people can clear it the set of people who can both code binary search and find the probability of having a disease conditional on a positive test result (given base rate and false positive rate) - is vanishingly small i know this is hard to believe but it's true go to any random company and start quizzing the technical people you'll see maybe it's different in the bay area but it's definitely the case in london does this not seem odd to you yes it does 'where have all the good men gone and where are all the gods?' if it's that easy why aren't all jobs already taken? i believe the reason is that data science is only now slowly becoming a recognised and socially acceptable career choice a smart quantitatively minded 20 year old 10 years ago had a choice of becoming a scientist an engineer or a quant data science wasn't a thing yet few people had heard of it so the 20 year old becomes a scientist or an engineer or a quant and if he's successful he never looks back only if he fails at his initial career choice does he consider data science and so data science becomes populated by failures and dropouts so where are the good ones? 
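For reference, the two interview questions mentioned above (coding binary search, and finding the probability of disease given a positive test) can be sketched in a few lines of Python. This is only a minimal illustration, not anything from the linked interview list. The function names are made up for this sketch, and because the comment mentions only a base rate and a false positive rate, the test's sensitivity is an added assumption that defaults to 1.0 (the test never misses a true case).

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence items, or None if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return None


def p_disease_given_positive(
    base_rate: float,
    false_positive_rate: float,
    sensitivity: float = 1.0,  # assumption: not given in the thread
) -> float:
    """Bayes' rule: P(disease | positive test)."""
    true_positives = sensitivity * base_rate
    false_positives = false_positive_rate * (1.0 - base_rate)
    total_positives = true_positives + false_positives
    if total_positives == 0:
        raise ValueError("a positive test is impossible with these rates")
    return true_positives / total_positives


if __name__ == "__main__":
    print(binary_search([1, 3, 5, 7, 9], 7))
    print(p_disease_given_positive(base_rate=0.01, false_positive_rate=0.05))
```

With a 1% base rate and a 5% false positive rate, the posterior is 0.01 / 0.0595, so only about one positive result in six comes from someone who actually has the disease. That counterintuitive answer is the reason the question gets asked.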
the good ones are getting tenure leading software teams and milking the stock market i'm only half kidding of course with academic careers being the way they are it doesn't mean you're deficient if you drop out one might argue that there is something wrong with you if you decide to stay but software engineers have pretty good lives so successful software engineers have little incentive to make the switch that's why you see fewer engineers-turned-data-scientist than academic-turned-data-scientists i personally know several extremely talented software engineers with medals from international math competitions despite 'only' having an msc in cs each of them could run circles around an average math phd in any quantitative test they sometimes use machine learning if this is what the job requires but they are intellectually fulfilled and well remunerated being mostly engineers once data science becomes something smart 20 y o aspire to - as opposed to something smart 27 y o slide into - we're going to see tougher competition in the job market what's with all the graduate degrees and high salary and buzz around data science how can this vaguely quantitative person with a random degree and basic understanding of stats machine learning and programming possibly create enough value to justify the hype? 
simply put - there is a lot of low hanging fruit in the last decade or two: businesses started gathering unprecedented amounts of data new machine learning techniques emerged and or were packaged in easy to use libraries which made them available to the masses platforms like aws and technologies like hadoop and spark made it feasible to process large amounts of data without massive engineering effort and expense companies became aware of the possibilities and it became acceptable to be data-driven this opened up new opportunities for leverage that never existed before it's not this lone data scientist who transforms a company it's all the engineering groundwork the company must have completed before the new technologies the shift in management attitudes the data scientist only adds the final crucial brick on top of this structure and gets all the credit let me illustrate with a story you get hired by your favourite shoe retailer they have been selling shoes for 20 years they have an algorithm in sql that predicts sales promotional uplift customer churn based on historical data the algorithm is 'take the average of all past cases' and it's been like this since year 1 you explain to the management that this is a classic regression problem and you can do much better by making use of all the sales data that they gather and not use then you spend 3 months preparing features and testing different algorithms eventually you settle on random forests you show your bosses how much better your algorithm is doing on historic benchmarks compared to the old one you have made sure to avoid time-travel in your cross validation you get a green light to implement it in production this takes you another 2 months with the help of some engineers algorithm is now in production the company is saving money and everyone thinks you're a genius this was a fairly typical data science project you have not calculated a single integral you have not explicitly used any probability distributions you 
haven't even used deep learning you tried but the engineers said they don't want to maintain neural networks in production you made a mental note to hire better engineers when you're the boss you have used sql python sklearn maybe spark if you're lucky after the project is finished you realise that if you had known where all the pieces of data were and how to make the databases talk to each other etc you would've spent 3 weeks on the whole affair not 5 months you wonder why haven't any of the engineers at the company done any of that before it's not like your phd in astronomy gave you some magical ability to use random forests that they don't have one reason is obvious - the people who have worked at the company for a decade simply missed the data science revolution and are not even aware of the possibilities the other reason is more interesting no one has done that before because it wasn't anyone's job everyone knew that the algorithm they were using was stupid but the people who were competent enough to do something about it simply had other stuff to do tickets to close features to build you are the data scientist and you have the mandate to go around and tinker with algorithms so that is what you do and often it turns out to be not that hard not all projects are this routine but many are and someone needs to do them after having finished a few of those you feel confident enough to start looking for something more ambitious and you have enough on your cv for employers to hire you for one of those more ambitious projects the phd (or masters + 2 years or bachelors + 4 years) is what it takes to be employable the basic skills can be entirely self taught--although probably with closer to a year's work this is actually a nice summary i'd like to make a few loosely related points data science is not a monolith just as medicine is not a monolith what a trauma surgeon does all day is vastly different than what a psychiatrist does yet both are medical doctors likewise 
people who specialize in visualization have vastly different skills than an expert in neural networks frankly if someone just wants to work in data science to crank out d3.js plots we should not expect them to be able to describe the fundamental theorem of calculus or know what a garbage collection algorithm is many data science roles 10 years ago would have just had the title statistician or programmer while even a very good data scientist could not replace both a true statistician and true programmer they may be sufficiently talented to replace them for the needs of a given business -- representing a real cash saving in the eyes of the business their ads are often a reflection of this reasoning i once listened to an interview on the talking machines podcast with a guy from renaissance technologies (hedge fund) he said something very telling essentially he said this: the most important tool in our work is simple linear regression with just one predictor -- which can be done by a 10th grader so why do they look for people who have phds in topology or particle physics? 
because it is very hard to find people with minds that are sharp and careful enough to know what that one predictor should be a phd is a great way to create such a mind and by extension a great way to produce a strong data scientist lastly data science should be used for more than serving up ads i agree the nice thing is that as you move into fields that are much more dry like machine learning in medicine the amount of hype and buzz drops off a lot there is also less money to be made (in the short term) which attracts a different kind of person so if you don't like the hype try getting into a dryer area of data science lastly data science should be used for more than serving up ads "the best minds of my generation are thinking about how to make people click ads that sucks " - jeff hammerbacher buddy i understand perfectly what you're saying i was lucky enough to find the job i wanted but in the end what i'm doing in the last 7 months is actually setting up all the work needed to be able to do some actual "data science" (in the broad sense) the fact is this: very few people can actually understand what a "data scientist" should do everybody is jumping on the wagon without really knowing what they want so they end up listing useless and pointless stuff (i even saw some time ago someone asking for 5+ years experience with spark ) what i can tell you is that when i check resumes i rarely even look at the degree (i'm neither a computer scientist nor a math major) what i check is past experiences and i would like to see some projects even very tiny ones are ok for me so roll up your sleeves get some work done and be confident: you'll find someone who's really looking for talent and not for fulfilling some executives latest buzz what i check is past experiences and i would like to see some projects even very tiny ones are ok for me so roll up your sleeves get some work done and be confident this makes me super happy to see i have a double degree in econ and international 
affairs and am currently a peace corps volunteer i started doing udacity courses last spring after being dazzled by hans rosling's work i didn't really feel all that motivated to practice what i learned on old kaggle datasets or the preloaded datasets that come with libraries like scikit-learn so i just started talking to everyone i met about what i was learning and asking for datasets i quickly got a local municipality to let me poke at a workplace satisfaction survey they put out to their employees while i'm sure it's not a 'professional' grade report the graphs and explanations of some simple stats really impressed them after that i did the same with test scores that the peace corps is gathering for our esl program and then tried out the different ml algorithms i was learning about with udacity in the end all i ended up with were some comically ineffective classifiers but it gave me enough to have a productive conversation with a local ngo that deals with open government data and a startup based in the capital both of which i'm having a blast working with do i feel like i'm two steps away from a position in silicon valley? nope do i see myself as a data scientist? 
hell no just some guy in a village with an internet connection but do i feel like i'm on the right track yep there's a ton of data floating around out there not all of it is useful but there's enough to create a few projects to talk about and give any amateur a better idea of what the work may actually be like aka data wrangling [deleted] nobody gets a real ds position straight out of college that's a pipe dream these job descriptions are written for people with a bachelors' + 4-6 years work experience or a master's + 2-4 years or a phd plus lots of transferable work during research a few years' work experience something like the insight program a ds internship you're not a slow learner you're just jumping the gun by like four years get a data analyst position on a team with strong mentors working on interesting problems spend a few years learning learning learning on the job and then apply for ds roles i went from a phd research computational physicist to data science with relative ease i became one with little experience and have learned on the job also i've been taking online courses to help accelerate the transition a lot of what you've listed there seems to stem from a generally disorganized it shop and the business hoping a "data scientist" will be some sort of wizard who can magically fix what is at root a chronic management problem these are actually kind of risky jobs to take (if you manage to convince them you're the right colour unicorn) you will be spending most of your time managing expectations and doing remedial it engineering some people actually like this kind of thing but understand what you're signing up for when i look for data scientists i'm looking for someone who can express their ideas effectively in both code and to people quantitative skills matter although what i'm looking for is people who understand that what is written down might not actually reflect reality and how to quantify and manage that slippage the relative maturity of my area 
(government) is such that my team does a lot of educating and sometimes reengineering it's not that we like or want to do these things but they need to be done and understood by our colleagues before we can actually get to the interesting stuff tldr: employers can be fixer-upper projects too when i am hiring people for ds positions i am most interested about the attitude depth of knowledge (be it in your hobbies if nothing else) programming skills (in what ever language) and commitment all this pending phd in physics or some lesser science if the above holds it is easy to teach business knowledge to the required level also sql is for monkeys i take it granted that anyone with a phd will learn required skills in no time spark is openmpi for dummies so any hpc experience overrides this often i have interviewed people with stellar ds cvs with many clients who don't even understand the basics of the most trivial algorithms like svms is it really difficult for someone without a phd to get into data science? i hear a lot of conflicting info i'm graduating with an ms in electrical engineering and taking some machine learning courses i'm a data scientist with only a bachelors the steps were: do a lot of graduate-student-ish things while in undergrad join a lab conduct original research read the literature not textbooks spend a few years as a data analyst at a big tech firm learn how to write production quality code develop instincts for data smells expand the breadth of your technical repertoire—databases sql distributed computing scraping visualization kitbashing webapps together convince a startup that has no idea what the fuck it's doing to give you a job with the title "data scientist " learn on the job make contacts focus on developing depth and expertise in 2 or 3 things apply for "real" ds positions curious in the differences in your data analyst role(s) vs your current ds role like you built reports and now you build models? 
my first role was in advertising fraud i used an internal tool which had some similarities to spark - in-memory analytics with a sql-like interface - to slice and dice various signals indicative of malfeasance; built mapreduce pipelines to generate new signals; and built internal tools with web-based uis to kill publishers en-masse and let other analysts turn their intuitions into automation rules very little statistical or ml modeling really tons and tons of data munging querying feature engineering (though we called it "building signals") debugging obscure mapreduce crashes staring at histograms fiddling with javascript and django browbeating petulant salespeople at the startup in my first data scientist-titled role there was zero data infrastructure when i arrived so job 0 was to build an entire analytics stack (ingestion storage processing augmentation and forwarding of client events then loading into spark clusters for rapid analysis) after that was done lots of clustering models to understand the business causal analysis to find latent factors explaining user behavior feature engineering and classifiers collab filtering to power application features these days i'm more of an ml engineer - computer vision neural networks i'm on the same path as you having only a bachelor's and starting out doing "data engineer" work (spark hadoop sql python) with the hope of becoming a data scientist in the future could you elaborate more on your transition to ml position? should i read books do kaggle? is self-learning sufficient? 
i'm learning those stats machine learning in free time but i feel like i'm not competent enough and no one is gonna hire me for doing statistics machine learning without a master's or phd tl;dr read papers implement things make connections i had a solid background in machine learning theory and practice from undergrad then for the last two-ish years i've tried to read a handful of papers every week on both cutting edge stuff (deep learning etc) and foundational work applications relevant to my interests twitter and r/machinelearning are great sources of new interesting papers use http://www.arxiv-sanity.com when i see something that really sparks my interest i try to implement it shit's not working? correspond with the author read through the original implementation blogs and tutorials are often more helpful than the original paper; these days authors often make a blog post concurrent with publication whenever i have any excuse to do so i've tried to apply more advanced techniques at work so i can walk into interviews and say "here's how i've used a novel deep learning architecture in production " but i'll admit it was pretty hard to find places that were actually interested in novel applications of modern methods in ml and not just the standard moar feature engineering for the classifier god grind that lots of data science seems to be these days still feel like i lucked out seems like talent and dedication aren't enough on their own; you need contacts too and to be talking to the right people at the right time so it goes so it goes which is i guess the point of this thread! spark is openmpi for dummies so any hpc experience overrides this so this there are way too many people that don't realize hpc programming is conceptually a superset of hadoop and spark or how gpu computing works why is my cuda code slower in this small data case with a lot of interactions requiring synchronization or getting bad results from race conditions what exactly is trivial about svms? 
pardon my flaming but plainly i think your attitude is wrong (re-reading i guess you were being ironic please ignore :-)) bad wording in my behalf svms are often among the first algorithms you meet in ds but some people who seem experienced have only very shallow understanding of these (and it is just finding the hyperplane :) ) here's another discouraging thing: looking at job ads for data scientist positions there is such a crazy variety of skills and experience desired for each position while i know that "required skills" aren't always really required it's pretty bewildering when: one position says you must have nlp experience; one says you must have cybersecurity experience; one says you must have javascript and web programming experience (?!); one says you must have experience in the health care industry; one says you must have retail marketing experience; one says you must have bioinformatics experience; one says you must have experience in the intelligence community; one says you must have django experience (?!); one says you must have experience with the microsoft products ssms ssas and ssrs; one says you must know sas; one says you must have experience in financial analysis data scientists have sufficient depth in a bunch of technical areas but tend to be experts in one or two beyond that there is also domain specialization given those two things the range of required skills above doesn't surprise me having digested the discussion here i think it's the same old problem: to me people like hinton norvig schmidhuber or (even?) kurzweil are the data scientists (including especially! 
their very capable scientific offspring [and many others - bengio collobert manning mccallum and so forth - and for example koller as my list is getting very male-heavy - sorry :-(((]) in many cases however a data scientist is the guy who can apply statistical methods from any kind of glm to gradient boosting to dense well-defined problems the problem is i have no idea which definition will prevail (maybe even the "other" not "mine" is beginning to stick!) and you will have to understand from the context which definition the job description discussion article referred to i consider people focusing on ml mainly to be ml practitioners engineers but in the end it's just semantics i'm a data scientist and kaggle master i currently work at a hedge fund and have also worked for 3 ds based startups too many not enough ds: there are tons of low quality ds candidates only a small percentage of ds folk do less harm than good harmful people include: people who are slow bad coders people with no practical (only academic) experience people who make mistakes avoidable by someone who understands the math you need to be none of these types of bad a good ds can recognize warning signs during the interview process so hires are few and far between there is a shortage of good ds and a plethora of bad ds what to know: as much as possible; as i said previously - if you were great at math but an inexperienced coder you'd be a waste of space it absolutely needs to be both i do believe that all that stuff about soft skills is only important if you're the interface between the ds department and the rest of the company "value added" by ds: much of the time ds does not add value probably because of 1 of 2 reasons: insufficient data infrastructure to support good ds or an "ineffective" ds an ineffective ds is someone who doesn't understand (or even know they have to understand) business needs requirements is overly caught up in the fun of ml itself to make headway on important or business problems 
or is just incompetent it might be the case that ds isn't worth the expenditure for most places; for others (like a hedge fund) it is an indispensable advantage what to know: there are a ton of jobs out there clearly there are varied or "conflicting" requirements just learn what interests you if you like visualization learn that if you like nlp learn that try to keep getting better at programming don't worry too much about what it is just continually increase your ability to manipulate data and build good models; to extract actionable insight try to get any job you can which involves coding and data even if it's not a "data scientist" role different people want different things and half the people on your team will have no idea what you do i just got interviewed by someone who had zero idea of how my skillset would apply to the company it was a round 1 interview but it was frustrating to say the least imo it's a lot of marketing also a concerted effort by big education and the data science consortiums to promote this and gain revenue from people trying to get training dollars out of people in the real world i suspect a handful of people are amazing at data science** and the rest are just trying to figure it out throw in the fact that not every employer knows what data science is but because they have this great brochure from tableau or data science world they've bought into the marketing and now have to hire ten of these guys but don't know what they want -- true story seen it first hand you have to come to terms with the fact that "data science" is not one career it's three or four or fourteen here is a bunch written about this ambiguity: https://www.quora.com/what-is-the-data-scientist-profession-how-does-it-differ-from-related-professions i think the confusion arises because ds is treated as a career or well defined job spec it is not ds is a vast catch all term from some db engineering to stats to oop to business impact to apps to visualisation maybe half of 
the jobs that are called ds today existed before ds as a term was coined you need to look at the job at hand and forget about job titles do you have the skills? will you enjoy it? will you learn? is it innovative? ds cannot be taught in a couple of courses it requires a lifelong approach of connecting data with ideas data science is a viable career path for people without a strong technical background who are willing to take a couple online courses idiots and people buying a bootcamp pitch where they are the customer it's very straightforward to get a data scientist position if you have an advanced degree in any quantitative field especially if you can code see next one for this person a few months later assuming he she doesn't have some great job networking connection it's almost impossible to get a data scientist position right out of (grad) school no matter what field your degree was in you probably need to spend a couple years working as a data analyst or bi person to get some domain expertise ideal but not true you should try to get an internship this is usually started by phd grads who thought they would have to make no adjustments (learn new skills) and get handed 100k+ jobs data science is a field with a serious talent shortage; companies are desperate for anyone who's at all qualified yes and no people who are immediately trustable ie have data science job experience have a shortage the guys in the next bullet point do not the data science job market is flooded with applicants; any open job is going to get dozens if not hundreds of reasonably qualified applicants it's really hard to get a job unless you seriously stand out applicants yes reasonably qualified probably not the technical skills are the least important aspect it's more important to work well with people and be a good communicator you can easily learn the technical aspects on the job not true you need a foundation communication is a crucial add on a data scientist just needs to know more 
statistics than a programmer and more coding than a statistician well yeah a data scientist needs to know all about optimization numerical analysis algorithms and data structures and lots of advanced machine learning and statistics yes but not all and you could get away by being aware of all but not an expert in a few the only things you really have to know are the fundamentals: be able to write some code in python and r and know about confidence intervals and linear regression this is for junior data scientist positions that used to be called data analyst in order to appeal to employers it's crucial to have experience with the exact technologies the employer uses if they use hadoop you must know hadoop etc well if you want to have an edge yes there's no way to be a data scientist without knowing some pretty advanced math some people consider calculus and linear algebra pretty advanced math what is advanced math is subjective calculus and linear algebra should be known math isn't really relevant for data science in practice and some of the most successful data scientists i know don't even know calculus data science managers probably if you are a manger you can just manage good luck putting that guy as the sole data person in a startup data scientists spend their time working on intellectually challenging problems that are just as interesting as those in academia interesting is subjective data scientists spend most of their time on boring data munging issues usually with an ultimate goal of selling more shoes or some other horrible marketing thing subjective and job dependant the data science field is really hot and will surely continue to grow for many years to come who knows data science has hit its peak; it will continue to be useful in its areas of legitimate application but companies are realizing it was oversold and they will be cutting back who knows there is an amazing amount of value to be had in data science not generalizable most data scientists 
contribute nothing worthwhile to their employers not generalizeable but if you believe that reconsider your hiring practices you must have a portfolio of projects to show employers well if you dont have an internship or work experience how else do people know you didnt just watch youtube videos? i don't know any data scientists with a portfolio of projects; my coworkers and i think that's a waste of time employed data scientist speak for i networked my way into a job or got in early and now people dont worry if i can do my job because i have relevant experience but it feels good to say the previous there's no way to be a data scientist without knowing some pretty advanced math hey great post! thanks for sharing i have a bs in electrical engineering wondering if you would be able to recommend a math book! i am currently looking in to changing careers possibly or at the very least developing new skill sets thanks! i'm glad you liked the post i can recommend math books (i have a phd in mathematics) but here are some caveats: you realize this post was about me not understanding how to become a data scientist right? 
if you're asking this question in order to help yourself become a data scientist i may be the wrong one to ask according to many actual data scientists that bit you quoted (that you have to know some "pretty advanced math") is actually not true see for instance u patrickswayzenu 's comment in this thread where he basically says that barely knowing calculus is sufficient you'll need to specify what specifically in math you want to learn mathematics is a huge field; even if you just restrict to those parts with a tangible connection to data science it's very big the math books i know and like are probably more theoretical and advanced than what an ee major learning data science needs or wants even as an undergrad i pretty much skipped straight to the theoretical stuff so if you aren't prepared to read and write a bunch of proofs and learn a lot of theory i may be of limited help if those caveats don't completely turn you off let me know what you want to learn and what kind of mathematics background you have and i'll see if i can help i think i have enough experience studying and applying ds to comment on this reading that list confuses me and posting it is a disservice to everyone it's a conglomerate of sayings from different perspectives inside and around ds where the majority of items listed are clearly opinions with no reservations you need to interpret them holistically and always consider exceptions why there are so many divergent views in academia and industry is that ds is an infinite skill-ceiling subject where professionals may share a core foundation but have different specializations and backgrounds about job ads: why is it surprising to demand domain knowledge in addition to an understanding of ds paradigms? can a salesman just say 'i am good at sales' without knowing about the product? 
about technologies: there is a huge overlap between these technologies that a surface read of their wikipedia pages will reveal how they are connected conclusion: stop making lists and focus on what constitutes the core of ds which is the body of knowledge that operates the pipeline of data and the leveraging of data assets in every institution (and you can add that to your list) academically it is the advancement of this body reading that list confuses me and posting it is a disservice to everyone wow there's no opinion so bizarre that someone on reddit won't come along and say it please tell me exactly how my posting this list harms anyone in any way all i've done is collected some of the things i've been told and i've read about data science which anyone learning about the field will encounter and will likely find just as confusing as i did (as you can see by the comments here) if you consider them false or misleading or biased or overly opionated don't complain to me; complain to the people who told them to me and continue to spread them around the internet these are people who by the way did not present their assertions as one-off possibly misleading opinions but instead presented them as the truth about the ds field that everyone should know about job ads: why is it surprising to demand domain knowledge in addition to an understanding of ds paradigms? 
it's surprising because it's different from what i was told to expect if people tell you "here's the way it works in this field and what you need to do to succeed " and you believe them and then you find out it's not true that's surprising it's also surprising because it's yet another way in which this field is so incredibly fragmented if data scientists really are divided into health care people marketing people bioinformatics people cybersecurity people etc and it's difficult or impossible to transition from one of these fields to another and then on top of that you've got divisions based on different technologies and roles it's hard to see how anyone outside of san francisco has enough job opportunities to have a career about technologies: there is a huge overlap between these technologies that a surface read of their wikipedia pages will reveal how they are connected please enlighten me about the huge overlap between javascript sas spark ssms tableau and mongodb not to mention the literally dozens of other acronyms i see in these ads focus on what constitutes the core of ds which is the body of knowledge that operates the pipeline of data and the leveraging of data assets in every institution (and you can add that to your list) this has got to be the vaguest most meaningless and weaselly statement i have encountered yet about what ds is and yeah it definitely belongs on the list okay now i know just what it takes to be a data scientist and i will go focus on "the body of knowledge that operates the pipeline of data and the leveraging of data assets " good grief you might as well have said "the core of ds is lots of stuff about data" for all the meaning that sentence conveyed reading that list confuses me and posting it is a disservice to everyone how so?
he is trying to verify whats true i love threads like these op claims they are well qualified for a data science position and then the only evidence they give is anecdotal nothing empirical if you really think that's what's going on in this thread you are a deeply confused person i don't know whether i'm qualified for a data science position because i've gotten lots of different contradictory messages about exactly what it takes according to some of those messages i am basically the perfect candidate according to other of those messages it would take years more education and or work experience before i had a hope of getting a position and now i see your other posts are all about cartoons nevermind yup i like cartoons not sure what that has to do with what i am saying here saying stuff like 'i saw this job post that says you need skill x' is not a data-driven argument for the 'datascience' subreddit that is the irony i am pointing out a beginner’s guide to data engineering — part i has been making the rounds in the data science blogosphere lately it's written by a data scientist at airbnb the more experienced i become as a data scientist the more convinced i am that data engineering is one of the most critical and foundational skills in any data scientist’s toolkit i find this to be true for both evaluating project or job opportunities and scaling one’s work on the job in an earlier post i pointed out that a data scientist’s capability to convert data into value is largely correlated with the stage of her company’s data infrastructure as well as how mature its data warehouse is this means that a data scientist should know enough about data engineering to carefully evaluate how her skills are aligned with the stage and need of the company furthermore many of the great data scientists i know are not only strong in data science but are also strategic in leveraging data engineering as an adjacent discipline to take on larger and more ambitious projects that are 
otherwise not reachable despite its importance education in data engineering has been limited given its nascency in many ways the only feasible path to get training in data engineering is to learn on the job and it can sometimes be too late i am very fortunate to have worked with data engineers who patiently taught me this subject but not everyone has the same opportunity as a result i have written up this beginner’s guide to summarize what i learned to help bridge the gap i think many of us in the field feel the same way are there any good books or courses that cover data engineering as defined by the article above? which technologies are most widely used in industry? hey there i'm involved with dataquest and we're building out a data engineering track - https: www dataquest io path data-engineer we're working on courses for luigi and airflow at the moment and then we plan to work on cloud infrastructure content we have a full-time content author for data engineering who worked on data engineering at uber feel free to dm me if you have any questions! fyi the links is broken :( i'm really glad you're building out the data engineering track as i've been wanting to take it for quite some time but does it make sense to double-up on work when it comes to teaching etl in python as luigi and airflow kinda solve the same problem? also can you in any way share what other things are being planned for the track? oops looks like we have a bug (investigating)! our public content roadmap for data engineering is still a work in progress unfortunately feel free to suggest topics! 
https: trello com b upwicfqf dataquest-content-roadmap our approach is very learn by doing even when we teach data structures (you'll build pipelines still and work with data) with regards to teaching both luigi and airflow we want to highlight the architectural differences (and we don't plan on going deep re-introducing the core concepts each time) :) fixed now :) yes the trello board hasn't been populated with the data engineering topics yet - it's also mentioned somewhere in the comments there it'd be great if you could share what else you have planned and ideally what the timelines are i suggest you should definitely cover dask (a k a pandas for big data) spark internals (from an infrastructure point of view) tuning spark clusters containerization (docker) provisioning (ansible) deployment of containers (kubernetes) and ideally cover one non-relational database (say cassandra) maybe also an intro to a statically-typed language such as scala interesting it's been awhile since i looked at dataquest (according to my emails since 2015) gonna take a look through these courses to see what is relevant to gaps in my knowledge o'reilly 'designing data-intensive applications' seems like a pretty good tool agnostic overview but i have only 'flipped though ' my other recommendation is learn how all of aws fits together focus on glue and the machine learning products they offer thank you for the book recommendation i have seen this book recommended elsewhere too why would you focus on glue? 
that offers nothing as it is a crawler that looks at data at rest in s3 and finds out format and metadata as a current data engineer i would focus on difference between sql dbs and nosql dbs advantages of certain data types over others how to properly use s3 with versioning how data is replicated across regions the difference between consistency and accessibility and which matters more to you different dbs and what they are useful for how to monitor and create triggers idempotent procedures and their benefits and vice versa etc an excellent starting point is the aws solutions architect exam guidebook gives a high level overview of all the services the most important thing to be a successful data engineer is to always think long term and what will be the state of the system in a couple of years and plan for all failures currently as you can oh pedantry it is a hell of a drug you are not wrong with a long term plan to be a fully rounded data engineer but there is a real value in the 'find a place to get started and solving problems your friends and colleagues are facing today ' figuring out what to do with piles of data at rest in s3 is a challenge i see a lot of people facing crawling it and automating some etl to summarize something and move something useful out of it seems like a great place to dip ones toe into aws in general learn by doing and it is a good place to get doing while i agree about learn by doing you don’t learn much from glue as it does most of the doing for you glue is a great tool but if you’re looking to learn i would disagree that it is the tool that should be used also you could do worse than listen to the data engineering podcast - done by the same dude that does podcast init: https: www dataengineeringpodcast com hi guys i’m looking for some advice my background: i’m majoring in statistics & economics at a major university with the stats major being added only in the past year i have one more year of undergrad left and am hoping to pursue an 
m s in stats after graduation my current dilemma: i’m searching for summer internships with any title similar to “data analyst” “data science intern” but i haven’t had much luck so far i have some experience with r python and sql (mostly from online courses) but not a very impressive resume for these kinds of positions at this point i am involved in a few interesting projects and will have a better resume after graduation but right now it’s pretty below-average would spending the summer improving programming skills taking relevant mooc’s and working on personal projects competitions provide similar benefit to the average summer analyst internship? i’m thinking i could do this and then go for internships during grad school but i’d love to hear some other opinions from people with careers in the field i was in your position about 8 or so months ago and the way i got a relevant internship was by way of looking at what i want to do almost think of an ideal job and look backwards from there in what you should do to get there look at full time postings to see what you need to develop in your skillset projects on your resume are probably the best bet to show things also have a blog post about datasets you worked with having a project shows you can think of a problem and apply a solution or look to develop a solution in summary having projects and even just emailing people helps to look at potential places intern supply is a good place to look at companies that may be hiring thanks for the advice nailing down a specific interest is definitely a goal of mine even just going through the process of applying has been useful you learn a ton about relevant skills by searching and that’s what i was thinking with projects! how many applications did you send in for the internship? and what kind of experience skills were you offering at the time? 
it’s difficult to remember exactly but i’d say around 100 or so 50 from my school portal and 50 externally at the time the relevant skills i knew were python sql and statistics (was taking it during the school term) if you wanna see my resume from the time pm me and i can show you i am an economics and applied math major and am in a similar situation to the op i would like to look at your resume if you don't mind i got an internship in business intelligence at a reputable company some of the projects they said interns worked on seem like data science (not hardcore modeling but gaining actionable insights from data) despite not being in the data science department (which does exist at the company) do you think an internship like this would look good or is it almost completely dependent on how the project and company are displayed on my resume and how i talk about it i was told business intel positions are a good step into data science for sure depends on how you project it but i’d assume most business intel positions use tools that data scientists also use that's what i thought definitely makes me feel better to hear someone else say that the thing that competitions and most personal projects generally don't provide you is practice with the business-side of data science (picking good problems translating a vague business goal into a specific data science question effective salesmanship of your work etc etc ) which is imo the most crucial part of being an effective data scientist this is something that tends to be very difficult to replicate outside of working within some organization imo that's why internships and certain personal projects are the things that stand out most to me on a resume when i'm looking to hire new grad data scientists if you can't secure an internship you might try to supplement your independent studies work with some reasonably data science-y volunteer work for some organizations that need some random analyses done that said not having an
internship isn't the end of the world especially since you're thinking about grad school next you'll definitely have more opportunities to lock down a solid internship then so don't stress! thanks for the reply! one of the projects i’m working on involves marketing for a local nhl team so i’m hoping projects like that can provide the kind of experience you were talking about do you have any advice for finding people non-profits or whoever who would need that kind of data-y help? it seems like anything would be beneficial whether it’s informal or not it's very important it's like a trailer of your work before your prospective employers see the movie i e hire you yeah i understand the importance i haven’t had much success yet though so i’m looking for other ways to have a productive summer if i can't lock down work somewhere i feel like boosting skills and working on projects could yield similar benefits no? have you spoken to your professors? 100% of my students get internships through me i am generally able to redirect them to appropriate organizations with whom i have contacts they don't go through the usual interview process but more what an ideal data science interview process should be - a talk and an actual coding task for me it was really important i was in college just as the title data scientist was blowing up i did 3 data science internships (sophomore to senior) and those gave me hands on experience with technologies like spark hive the hadoop ecosystem d3 js and other things that made interviewing after graduation substantially easier i ended up coming on full time at my last internship and i've since transitioned to another ds position i don't think being where i am right now with only a bs would have been possible without my internship experience an important thing to consider about going to grad school is recommendation letters most us universities require a minimum of 3 and while the professors at your university might give you a recommendation a dwic (did well in
course) recommendation has less weightage so if you want to go to grad school immediately after graduation either go for an internship or take up projects with 1-2 professors at your university so that when you ask them for a reco they can write about more than just your performance in the class hope this helps! i was in your position once my suggestion would be to talk to professors you know and ask them about opportunities they know about in industry even if they aren't statistics professors did you take a cool geology gen ed? ask the professor if he or she knows about any opportunities for an aspiring data analyst in geology you're an econ major so you have a leg up on a lot of statistics majors i'd look at some econ opportunities in industry but also be willing to look at academia are any hedge funds or consulting companies hiring economics interns? can you do econometrics work with a professor? your economics background gives you a lot more opportunities with a specialization to boot i know this question comes up quite often should i go to "xxxx" bootcamp or go to this graduate program i am currently in my first semester of grad school for data science and this is my take on the question before i started grad school i took some of the online stuff to get myself ready i went through khan academy for stats i did some courses through udacity datacamp and even some lynda prior to this i went through a boot camp for security plus my experience is the bootcamps and other online courses are great for teaching the "how" how do you run a stepwise regression in r or how do you import a csv from the security plus it was memorization of answers and test-taking strategies these are worthwhile but for me grad school seems to be focused on the "why" for example yesterday in my stats class we spent 5 minutes on how to format forward backward or stepwise regression we spent the next 2 hours discussing when why benefits dangers of these models why do some people like them why
do some people say to avoid them it was all about the "why" in another class we focus on building models once again about half the class is focused on how to build the model in the application we are learning the rest of the time is focused on how to use the model what types of problems really work well on this model and why when should you avoid this model? once again "why" was the centerpiece of the discussion i am not saying to forsake all bootcamps and online training in fact this summer i plan on taking a couple to learn some more technical skills so i can apply some of the things i learned this semester i think each has their place based on your goals and your expectations i think they both work really well if you are in a position to do both i've been going through datacamp's python career track these past few months and i'd agree they're good courses to get the general understanding of how to use different techniques and technologies but applying them yourself is a different challenge basically you have to make the most out of them sure you can bang out the courses quick and say you now have the skills but in reality it's working on your own projects and using these courses for reference where the learning really happens i've considered going to a graduate program and figured going through datacamp would be a good start to see if i want to continue learning i think that is a great way to work out if this is a field you are even interested in i am seeing people do want to get into data science because of all the talk about how much you can make i want in because i really like looking at data and extrapolating meaning i have been having a great deal of fun playing r different databases and making models i know i am a sick individual yeah doing your own projects is consistently the best way to learn for most people basically - this is why companies value job experience the most above all interesting topic in my country i studied law before being transferred to 
the us where i currently live here i started an undergrad in software engineering and as i’m progressing through the classes i’ve been focusing on what to do after i get my degree (mid next year) i’ve been very interested in various master's programs (so far i talked to uh and uc berkeley) as well as in going to a bootcamp (so far i talked to the guys at woz u) so your input as well as that of those who went to boot camps is very welcome! what's your take on online data science master's programs? worth it? i think it comes down to the reputation of the college if you are taking it from phoenix university then i would say avoid it i saw one from the university of wisconsin that looked promising but before paying anyone money i would really take a look at what they are teaching and the requirements online by itself is not good or bad but you have to evaluate the provider same with in person if i get a data science degree from the university of american samoa i could be missing out as opposed to someone who got a degree from a known research university once again these are just my opinions i found a school i am interested in they use python r sql etc it's a small private non profit school but looks good enough for me i am not an engineer by any means in fact i am an accountant (sitting for cpa this year as well) but i really want an accounting and data science combo cpa here looking into data science myself what school did you find?
i’m signed up for general assembly but i know it’s not the final answer sorry for late reply - bellevue university tbh university won't necessarily teach you either it's the work you put in outside of class that makes the difference it's the extended time you do at university and the fact that you're lead to the information that's deemed relevant that makes all the difference edit: but you still kind of need a university degree so you might as well that said as well you do pay for it and you have a bunch of other clones just like you in terms of what you get it's probably still worth it but still (it's worth noting that udacity is industry-recognised that's not the same as taking a course on udemy ) it's give and take really you have to play with the hand you've drawn no matter where you are hello everyone! i am a data scientist at satalia i would like to share a data science workflow that blends exploration and production this workflow is an attempt to bridge the gap between exploration in data science and productionisation in software development what motivated me to create this workflow is the lack of a specific training in software development that data scientists may have a lack of software development skills typically leads to scripted and untidy code that is not modular in turn software developers (and sometimes data scientists) have a hard time productionising such code that's why i created a data science workflow with the end-product in mind the workflow is an adaptation of methods mainly from software engineering with additional new ideas i have tested the workflow with colleagues and friends but i am aware that there are things to improve so it would be nice to have some feedback from you thanks! 
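to make the "scripted versus modular" point above a bit more concrete, here is a rough python sketch. it is not taken from the satalia workflow; the function names (clean_prices, average_price) and the toy price data are made up purely for illustration. the idea is only that small functions with explicit inputs and outputs can be tested on their own, which is much harder with one long exploratory script.

```python
from statistics import mean


def clean_prices(raw_prices):
    """Drop missing or non-positive prices and convert the rest to float."""
    cleaned = []
    for value in raw_prices:
        if value is None:
            continue
        price = float(value)
        if price > 0:
            cleaned.append(price)
    return cleaned


def average_price(raw_prices):
    """Return the mean of the cleaned prices, or None if nothing is left."""
    prices = clean_prices(raw_prices)
    return mean(prices) if prices else None


def test_average_price():
    # Hypothetical unit test: each step is checked in isolation,
    # which a single notebook-style script would not allow.
    assert clean_prices([10, None, "5.5", -3]) == [10.0, 5.5]
    assert average_price([10, None, "5.5", -3]) == 7.75
    assert average_price([None, 0]) is None


if __name__ == "__main__":
    test_average_price()
    print(average_price([10, None, "5.5", -3]))
```

in an exploratory notebook the same logic would usually live in a few unnamed cells; pulling it into functions like these is one possible way to let a software developer import and test it without rerunning the whole analysis.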
here is the link to the production data science workflow: https: github com satalia production-data-science cool there's also microsoft's team data science process https: github com msftimagine microsoft-datascience-process i contribute to an open source project called quilt that allows teams to version share and discover datasets and models as part of the workflow https: github com quiltdata quilt quilt is pretty cool i came across it some time ago when looking at safe ways to handle data that is updated on a quarterly basis i haven't used quilt personally but i should definitely give it a try how well does it integrate into the google cloud infrastructure? i wasn't aware of microsoft's team data science process i skimmed through it and it seems really useful i will have to read it carefully the part that should be adapted to satalia the company i work at is roles and tasks which reflects microsoft's hierarchical structure the reason is that satalia is a decentralized and self-organised company aiming to have a flat hierarchy where decisions are supported by data and consultations with peers so we wouldn't have a group manager or a team lead as with microsoft the decentralized nature requires peers to support each other and this behaviour is reflected in the production data science workflow the quilt registry is open source with some pretty minimal dependencies on s3 so as long as you have access to containers in gcp you can run your own registry we're looking at https: www minio io to abstract away s3 as an object store so that you can run quilt in any docker compatible cloud of course if you don't need data "on prem" then the public registry will work for you feedback and issues on gh are welcome the google s3 equivalent has basically the exact same api as s3 so it’s pretty straightforward hi thank you tons i've been looking for something like this for a while 99 9% of the tutorials and classes i've seen in this space do not include this aspect of the data
science process i'm glad you find this workflow useful let me know if you have any comment on something you would do differently or improve thanks this is good stuff the testing part is what i needed thanks! this is really good and useful plenty of details and what i was looking for lately thanks! part b is useful for all people enthusiasts and professionals very well documented workflow thank you about me i am currently a candidate for a msc in data science i started the program today and it is a two-year program this past summer i was hired as a data analyst intern at a startup although i was not officially data scientist intern i did do some machine learning while on the job and implemented algorithms at various points i don't think there is much difference between a data analyst internship and a data scientist internship within a tech startup i do not know how data analyst interns operate in accounting firms banks etc the interview after a phone screening i was invited to interview at the office the first part was a behavioral interview with the data team lead the second part was a technical interview with the data scientist q1: explain one of your projects on your resume to me as if i am a client a: the best way to answer this question is to be as vague as possible clients non-engineering members don't care about the algorithm you used or the results of the confusion matrix what they do care about is how these results will affect their profits or their company in general don't talk about the decision tree implementation talk about how the significant factors you found that led to the classification of the predicted value q2: design a retail book grocery store relational database a2: basically you want a table of all the transactions the items and their prices customers each with keys that can join to the other table i was also asked to limit the tables join the tables order a table and group by a table we ran out of time but the next part was ctes the 
internship as an intern i was asked to do a variety of things however my most used skill while at my internship was web-scraping i developed scripts to scrape websites for a variety of information needed to make the company's current information more useful if you do not know how to web scrape i highly suggest you learn it i personally use beautifulsoup the data team was very small so i had the opportunity to work on all types of projects including working with pdf and free text data therefore i also learned how to work with unstructured data this is a pretty daunting task and if you ever end up having to work with this form of data i highly recommend keeping your bosses in the loop i didn't and finished it with only a day to spare which was not ideal i also worked with the data scientist to productionalize and augment the current models while the data scientist worked on more pressing matters i was testing different algorithms and examining outputs in forms such as confusion matrices lift charts auc curves etc conclusion overall i found this experience enjoyable although working at big 4 companies is ideal i think working at startups in big cities like new york boston silicon valley etc is also super valuable being on a small team gave me more responsibility than an intern would usually get and although none of the work i did was critical it still provided value and left me on good terms with the employer if there was anything anyone else wanted to know that won't remove the anonymity of the company please ask i saw some questions from people asking about data science internships and i wanted to give my two cents about my own "data science" internship hey kudos to successfully getting on the right track my question might be kinda general so i'm a first year ms cse (computational sci eng) student at gt i'm overwhelmed by the theoretical courses with little time to do data science projects the career fair is approaching yet i don't have a presentable resume moreover my
friends keep telling me how hard it is to get the recruiters to talk to you if you say you want to do a data science internship so i want some tips on how to be more efficient in finding the right intern positions as an entry level data scientist? what do you mean finding the right intern positions as an entry level data scientist? does that mean you want to be a data science intern or you want to find the right internships to become a data scientist? full disclosure: i am a first year msc candidate in a data science program so i don't have any real work experience yet so i found my internship because this company was specifically looking for a data analyst intern for the summer i guess if you're trying to beat around the bush ask the recruiter about their analytics team a lot of companies will have a different name for this team i e knowledge team analytics team data team etc look it up and then press the recruiter about it ask about what type of work the team does show that you're interested in what they do hopefully the recruiter gives you his her business card and you can email him more about the data team hopefully the team gets wind of this info and you can get an interview again be sure to brush up on sql and do some projects i know school is tough but you're gonna have to do some self learning until you can take a machine learning course also honestly the basics of data science are not hard to pick up hope this answered your question thanks! 
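for anyone brushing up on sql, here is a minimal sketch of the retail database interview question from the original post (transactions, items with prices, and customers joined by keys, then join, group by, order, limit and a cte). the table names, column names and sample rows are made up for illustration, and it uses python's built-in sqlite3 module rather than whatever database the company actually ran.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    price   REAL NOT NULL
);
-- Each transaction row links one customer to one item via foreign keys.
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customers(customer_id),
    item_id        INTEGER NOT NULL REFERENCES items(item_id),
    quantity       INTEGER NOT NULL
);
INSERT INTO customers VALUES (1, 'ana'), (2, 'ben');
INSERT INTO items VALUES (1, 'novel', 10.0), (2, 'atlas', 25.0);
INSERT INTO transactions VALUES (1, 1, 1, 2), (2, 1, 2, 1), (3, 2, 1, 1);
""")

# The CTE computes total spend per customer (join + group by);
# the outer query orders the result and limits it to the top rows.
query = """
WITH spend AS (
    SELECT c.name AS name, SUM(i.price * t.quantity) AS total
    FROM transactions AS t
    JOIN customers AS c ON c.customer_id = t.customer_id
    JOIN items AS i ON i.item_id = t.item_id
    GROUP BY c.customer_id, c.name
)
SELECT name, total FROM spend ORDER BY total DESC LIMIT 5;
"""
top_spenders = conn.execute(query).fetchall()
print(top_spenders)
```

keeping prices on the items table and quantities on the transactions table is only one possible design; a real store would probably also record the price at the time of sale.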
i mean i want to find the right internships to become a data scientist for machine learning i have taken the andrew ng course and watched about half of the caltech ml course for sql i've taken an intro to database course on campus i also have experience using numpy pandas scikit-learn however these skills don't seem to satisfy the practical skills part of the job descriptions :( you get practical experience from doing projects it's like riding a bicycle you get that rotating the pedals gets it to move but you won't actually know how to rotate the pedals till you try i think i've always known projects are the only things that are gonna take me out of this situation but i always find excuses from schoolwork to postpone doing "side projects" i really need to change that thank you! [deleted] i had a minor in computer science and had one year of experience using python i also had a couple months experience using r in terms of my day to day work while at the internship i had taken a data science class at my university where i learned most if not all of the information i used at my internship in this class i learned about pandas dataframes unsupervised learning supervised learning regressions mapreduce etc that class also taught me how to self learn since i was essentially thrown into a graduate level course without much technical background before that the best piece of advice i can give is know how to use stack overflow and read documentation what are the demographics of your msc cohort? i just applied for an ms in data science on friday but i have to wait till jan to find out if i'm in hoping to get a sense of what sort of folks i'll be around my program is roughly 50 people and ~40 are from india ~5 are international not from india and ~5 are u s graduates don't let that deter you!
from my experience most technical masters are filled with international students and they're generally really nice people ah what i meant to ask was about specifics such as age range gender ratio types of backgrounds likability factor relative to other social groups that sort of thing basically are my hopes of meeting a top-notch woman a pipe dream? haha uh so in terms of age i guess everyone's in their 20s a lot of them have technical backgrounds (engineering it computer science etc ) i can't really tell you about the other stuff since i'm three days into the program thanks for sharing this i recently found a job at a consulting firm just emerging in the ds realm and my experience echoes yours tremendously you brought up some topics that are rarely touched on here but have a significant effect on my day to day namely: the relative importance of business impacts to technical feats (your q1) the difficulty and frustration of working with unstructured or very messy data the benefits of flexibility and more general computer knowledge (knowing how to web scrape web dev) good luck with your degree! yea i found that a lot of people didn't have much direction in becoming a data scientist so i want to share my experiences with the sub especially with ds gaining more traction my experience will be different than a lot of senior data scientists that's awesome though! i think i want to work at a consulting firm as a data scientist to see what industry really suits me hi thanks a lot for posting this what type of projects have you done before getting the interview? i have a masters in financial engineering with a strong theoretical background and i am now doing kaggle kernels to practice and show off my skills do you think it's enough to get an internship in ds?
doing kaggle is awesome to generate good experience especially since you'll be comparing yourself to other people i personally have not done a kaggle competition though i do recommend them i did however use datasets from kaggle and ran analysis on them ive looked at the mental health in tech dataset and the 2015 american football play by play dataset there are definitely others on there worth looking at i think what you're doing is great! be sure you can explain your project to someone and think of abstract situations where that data could be used in a business setting i'm a recent graduate i have started working on machine learning skills and data analysis with python i used kaggle and i can work with all the basics whatever is given there like titanic and house regression analysis what i find daunting is how to start a project where to look for projects that could interest potential employers and how to begin? can you give any insights? and also do you think it's necessary to nail all the maths behind all the algorithms? 
projects: honestly i prepare my projects through questions if i'm sitting in my room and wonder about something i go and gather the data i need to answer the problem i don't really worry about what employers want to see or not see especially at the internship level brandon rohrer a data scientist at facebook once told me at a q&a "i'm not really concerned about the project itself but the passion the person had for the project " if you care about your work you're going to do your best to get that work done and develop quality skills this is also why i think web scraping is vital you can gather data even though it may not be available in a json or csv and do analysis on it i don't think getting all the math behind the algorithms down to a t is vital especially for an internship if you know how it works and are math-literate you'll be fine in industry it's rare you see someone making their own machine learning algorithms thanks for the reply also what do you recommend learning for web scraping? i use beautifulsoup4 for python! is it in india or us? us what was your bachelor's in?
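since web scraping keeps coming up in this thread, here is a rough sketch of the basic idea: pull the link text and urls out of a page so they can be analysed later. the thread recommends beautifulsoup4; this sketch deliberately swaps in python's standard library html.parser so it runs without installing anything, and the sample html and the LinkCollector / extract_links names are made up for illustration. with beautifulsoup the core step would be roughly a call to soup.find_all("a").

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from <a> tags. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (text, href) tuples found in the html string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="/datasets/mental-health">mental health in tech</a></li>
  <li><a href="/datasets/football">play by play</a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

in a real scraper the html would come from an http request, and it is worth checking a site's terms and robots.txt before collecting anything.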
neuroscience thanks for sharing your experiences very helpful i wonder what it's like to work with colleagues in other departments (e g product team or it) i ask this because i'm curious about what good inter-department collaboration looks like when you're still a data science intern so i mainly spoke to the data team lead who would generally answer my questions for me i did not sit in on meetings with product sales etc i was simply told what the teams discussed in those meetings if it pertained to my work i would present my work (if needed) to the other teams but that was it looking for some advice to get myself into the data science analytics industry i've been unemployed for 4 months since graduating from a 3-month data science bootcamp haven't gotten any traction in any of the ~70 applications beyond the first interview during the interviews i have had the lack of any real experience always arises "only a bachelors?" is something i've heard from every interviewer i also have had no internships long-term contracts or any previous technical experience i worked restaurant and coffee shop gigs to get my way through my degree the first 2 months i only applied to companies in my state but now i'm applying to anywhere in the continental us; using glassdoor to find jobs that match my qualifications data analyst jr data scientist bi analyst or anything that sounds like it's in the industry is what i apply for every cover letter is crafted on the spot and i rarely reuse a template to avoid sounding like a robot here's an anonymous version of my resume i feel as though my plethora of relevant skills well crafted cover letters and fleshed out portfolio should land me something but here i am still without a lead on any type of job been doing a combination of living in my car and staying in my sister's basement for the past few months but they're moving at the end of the year and i'll be forced to go back into homelessness i'm pretty desperate to get something
going before 2018 and i'm willing to relocate anywhere so can the more experienced folks here be exceedingly blunt with what's wrong with my resume and job search tactics? is my gpa of 3 48 too low? did i miss the data scientist train? what jobs am i actually qualified for? much appreciation for any words of advice one thing i want to mention about your resume is that you probably put too much on it in terms of skills like seriously it just looks like you're just putting a bunch of keywords on there i really do hope you delete them to tailor for each individual job posting otherwise i really like your cv (your gpa may or may not be an issue but if it is then you can try just not stating it? also i hear that some sort of a personal statement might be useful at the very least some bullet points mentioning what you've done what you want to achieve and what your specialities are i agree the skills section is really overwhelming i think someone looking at that resume would think you just listed anything the person ever tried rather than things you really are competent with i guess following on from your point what op should do is to choose the skills which he wants to develop further and keep those that way he gets hired to do something he enjoys (another way of looking at this as well op is that if you give people overwhelming choice then they become less happy with their selection this might not be a complete 1:1 match but you should see what i mean this would be a real turn off for employers imo ) you might have to do some grunt work and backdoor your way into things like i did look for business systems analyst jobs and build from there they for sure don't pay as well as the other jobs you've mentioned but it gets you in there with the data from there the world is yours btw where are you located currently? for business systems analyst jobs would you suggest learning something like tableau? 
i've been considering it lately i bounce around between denver and colorado springs sure tableau and power bi are both used in a lot of companies practical experience is going to help a lot no one will be saying "only a bachelors" when you've got 2-5 years behind you there's a regular meeting in denver of tableau users maybe you could do some networking there go to networking events and meet with people it's one way of introducing your skills and qualifications to employees of companies that may want to hire you also you can reach out to data scientists at companies to bypass the hr filter i've done a bit of reaching out to other data scientists via linkedin not a lot of success since i feel like i'm reinventing the wheel whenever i try to talk to them any suggestions on breaking the ice with professionals i've never met? thanks for your suggestion by the way i think they mean to go to conferences conventions and things like that linkedin is important but apparently conferences are the shit if you're in boulder there are absolutely "young professionals" things on meetup and the like you should also be going to your school's career center and finding out about any kind of job fairs they have if they know of any networking opportunities etc you need to embrace the reality that you need to grind things out you're giving the impression that you're kind of taking the easiest possible path - sending messages on linkedin is kind of the laziest and least effective way to network for example i just spent 3 seconds googling "data science professional organization boulder" and got multiple hits for data science meetup groups in boulder and denver why aren't you attending those physical events as opposed to sending messages on linkedin? why haven't you joined the digital analytics association or informs or any of the other numerous data science-y professional organizations? 
if you are seriously trying to develop a professional network you need to get off of a computer and start shaking hands hey no problem i think you can go to meetup events within your city there normally should be a decent amount of data scientists you can talk to from my experience many dss either don’t respond nor check their linkedin after beginning employment make sure to know something about the general topics within data science so you can ask intelligent questions about their research and show that you can be a potential contributor it's been suggested but go to meetups there are tons of them in the denver area - seriously you should be able to attend several each month - network enjoy the free beer and pizza also it may be worthwhile to search for local slack channels - there are several you can always try temp consulting gigs to get more experience - e g teksystems in denver or something similar is a great place to start building out your resume given your level of experience i would play down data science in favor of something more along the lines of a data analyst (consider ba or qa if you don’t have the applications experience to sell yourself in bi) thanks! i'll try out teksystems applied there 3 months ago but i should have a better shot now your portfolio section needs a lot of work i want to see the outcome rather than what you did especially about the small business you worked with what were the actionable insights you gave them? haven't gotten any traction in any of the ~70 applications beyond the first interview your resume and cv are fine - they got you interviews how did your interviews go? also where are you located? 
if you're in boulder there should be reasonable opportunities around but it isn't sf or nyc so temper your expectations appropriately generally speaking i would strongly suggest you learn web development in the meantime when interviewing for ds jobs those projects can be "ability to build dashboards to share insights with the team" but it also opens the door to get a web dev gig not as sexy maybe but pays great and there are many more web jobs than ds jobs your resume and cv are fine - they got you interviews thanks! good to know in the web development field i'm assuming you're talking flask? i was considering making myself a tiny webpage to display my portfolio more professionally https://insights.stackoverflow.com/survey/2017#technology-most-popular-languages-by-occupation flask is probably fine in an ideal world i'd probably look at django if you're python-focused in an ideal world i'd consider ruby on rails however you slice it javascript is basically mandatory (editorial: which is a shame because it's a garbage language) i've tried getting into js a couple of times hate the shit out of it i don't work in programming and what little ds i do it's r + python but the more i read the more it looks like it's inevitable to have to get into javascript i got a course waiting for me on udemy to at least get me to basics but i'm like king procrastination about it check out the edx course by pennx on javascript https://www.edx.org/course/programming-web-javascript-pennx-sd4x also to answer your question about the interviews they go well until we start discussing my background at which point it's hard to say that i'm a "data scientist" without ever having anything close to the title in my work history because you aren't a data scientist scientists have published peer reviewed contributions to human knowledge and self directed large projects for years with limited (or no) supervision you're a keen undergraduate who wants to start out at a more senior position
you can get there by having experience or by backdooring somehow or by going to grad school take the advice from others in this thread that's because you aren't and never have been a data scientist nor are you qualified to be one put simply as an undergraduate you have neither the training nor experience to even properly parse research to translate it into practical applications let alone conduct your own research that doesn't mean that data science can't be in your future but that's a future with either significantly more education or significantly more experience and very probably both you are a college graduate looking for an entry-level position working with data what tool did you use to create your resume? canvas but i'd redo it in latex if i had time i think the 'skills' section takes up too much space in your resume in fact i believe that the laundry list of skills is unnecessary in any resume i recommend allocating more space for your projects for each project allow enough space for 3 bullet points preferably including measurable deliverables e g 'project accounted for x% increased bookings' i'm not working in the us so i'm unsure if this is standard practice there keen to hear what the rest of the sub thinks regarding 'skills' also press on and don't give up!
do some data science in your resume first get all your data put some sentences about what you did in your jobs and projects all courses you took more is better look at the company requirement find the connection between your data and the requirement put it nicely in your resume apply in different areas of the country especially those with emerging data science markets: florida georgia etc if you're willing to backdoor your way in through grunt work check out the fdm group so you probably aren't getting past an initial screen most of the time your unusual resume format while interesting to look at isn't helping you 1 of 2 things happens when you submit a resume: your resume is automagically converted into plain text and entered into an hr management system or an actual person spends around 30 seconds scanning your resume to see if you meet basic requirements if there is a program parsing your resume for automagic conversion to plain text the unusual format is likely causing problems and errors likely resulting in a mishmash of blank fields and inaccurately converted information in turn resulting in your getting passed over because no one is going to take the time to sort that out if someone is actually looking at your resume they are likely spending a handful of seconds on it and the unusual format is making it more difficult for them to identify key information an hr screener is not interested in how unique of a butterfly you are they are going through dozens or hundreds of resumes and anything you do such as utilizing an unusual format reduces the odds you make it past that initial screen so stop using your visually interesting but practically damaging format use a traditional format you need to make it clearer how you have utilized the many skills you list it should be clear to anyone reviewing your resume how your skills have been utilized to accomplish something in other words if you are listing a skill you should be making an explicit reference to that skill
when detailing your experience if you list "python" as a skill then you'd best be backing up your claim to having that skill with some kind of example in your experience to that end your description of projects should less be a high level thematic overview and much more of a "i utilized the scikit-learn package in python to perform a logistic regression analysis on selected reddit comments gathered via a self-constructed scraping program written using beautifulsoup4 " phrases like "using nlp techniques with machine learning algorithms " mean absolutely nothing what nlp techniques exactly were used? what machine learning algorithms? you need to be aware that no one is going to be looking at your github projects until they're trying to decide between hiring you or one or two other people simply put you aren't important enough for anyone to waste their time going through that until they're already seriously considering offering you a job that isn't going to happen until at least a first interview so you need to be focusing on getting past the hr screen and getting past the first interview if you list experience you need to have more than just a title and dates you need to be using that space to demonstrate how that experience is relevant concisely explain what you actually did (don't just list job responsibilities) and ideally be able to show how your work had an impact on something (preferably quantified) right now your resume reads more like you just tried to come up with every data science buzzword and put it on your resume as opposed to a summary of demonstrable abilities that's bad it makes you come off as full of shit i know it feels like you have no relevant experience and that might be panic-inducing but here's the thing employers aren't stupid they realize and understand that for an entry-level position (the only types of positions you should be spending time applying for not data scientist positions or senior positions or anything that asks for more than a
year of experience doing anything) good applicants might be fresh out of college with little to no experience and they're ok with that don't be afraid to embrace your lack of experience as being a tabula rasa emphasize your willingness and ability to learn and improve and grow as a final note you need to get a job it doesn't have to be something you put on your resume but it can make some employers hesitant if an applicant has refused or appears to have refused to do work that they consider "beneath" them to many employers it speaks to certain negative character traits that they likely find undesirable and i say that as someone who has been on the hiring side no employer is going to look down on you for making sandwiches at subway while you're applying for a "real" job but many employers will look down on you for not being willing to grind it out while you are looking for the type of position you really want so to recap: 1) your resume format while visually appealing is likely interfering with systems that convert to plaintext and likely making a manual reviewer's job slightly more time consuming both of these lead to reduced odds of making it past an initial hr screen use a standard format for your resume 2) you need to be able to justify most if not all of the skills you list as having with some actual use case this goes hand-in-hand with not expecting anyone to actually read a github project page until you're being seriously considered you need to be sure that if you list something as a skill you have that your resume justifies your claim to having that skill and your description of your personal projects should be less "here's the concept of my project" and more "these are the specific and explicitly stated tools and skills i utilized this is how i used them and this was the outcome " 3) find some kind of work even if it's not something you're ever going to put on your resume an employer is never going to think poorly of you for working a job to pay the bills 
but very well might think you're lazy entitled if you choose to be homeless instead of flipping burgers i don't know us standards but your cv template is pretty bad what is your success rate with it? for any business related job use standard template work experience should go first and be detailed skills section is a joke without a proof of proficiency i could go on but cv as whole is bad regarding why u cannot get a job ask always for feedback don’t know why you were downvoted this resume format is really bad regarding why u cannot get a job ask always for feedback this is not very common in the us fyi there is virtually no benefit to the company and it opens them up to potential litigation if they say the wrong thing as a data scientist perhaps you might take the implications of the observed empirical evidence seriously even if they do not agree with your desired interpretation go to the best graduate university program you can that's exactly what i'm investigating by having this post here but thanks? 
i've been hunting for an entry-level data science job for a couple of months now and i've noticed a dozen reoccurring archetypes that pop up when you're perusing job boards feel free to correct my bewildered first impressions or offer your own sage advice the rockstar job posting has more acronyms than real words says it's entry level but you've never met anyone with that many skills the spreadsheet master excel vba access sas lists "microsoft office" as a required skill making you suspicious the statistician six sigma k-means a b testing you aren't confident enough in your math skills to bite the reporter google analytics d3 tableau web dev in charge of pretty pictures and weekly meetings with the suits the database manager sql and sql accessories most likely to deliver a "you're overqualified" rejection letter the data analyst python r sql etl linux scripting like a software engineer but with fewer code reviews the business financial analyst requires a degree in accounting not really what you're looking for but keeps getting mixed in with your job search results somehow the proprietary job uses a bunch of strange tech specific to a niche field gives off a "non-transferrable skills" vibe the government job requires a security clearance polygraph test and seven years of references probably in washington dc the healthcare job sas ssis weird filetypes not really sure what the client wants to know about the data the enigma lists no technical skills only a series of company values and a "data" title the data entry monkey requires a high school diploma says "analyst" but lies the researcher better have a phd before you even think about it and a dozen published papers in research journals that no one really reads but not arxiv because that doesn't count you missed the researcher - must have phd is machine learning or related don't apply unless you have one or more published papers in icml nips etc done and done what's wrong with arxiv? 
submissions to arxiv are not peer reviewed i don't know the screening process but i think they only reject 'inappropriate' submissions number seven gave me a good chuckle seen lots of those i still read the ones with that sort of name though as a promising role with the title financial trading analyst came up recently also like a software engineer but with fewer code reviews :) data analyst made me chuckle feel like our code reviews are too often: can it auto merge? cool review passed i'm in healthcare i did not know that ssis was our thing probably just an aberrant pattern then or a great and terrible curse that has been lain upon your people 'bout 50 50 i'd guess it's a curse that's one thing i'm sure off hah spot on this would be fun as a comic love it! i laughed so hard i'm hunting for jobs right now after my masters and i needed this this might be somewhat premature but: so i'm a junior in college and i was lucky enough to land a data science internship this summer it's at a mid-size (100-ish employees) tech company with a team of five data scientists where i'm the only intern on that team i want to know how i can crush it in my internship i'd love a return offer but also want to use this opportunity to develop work habits and skills that'll get me started on the right foot in my career i've only had one internship before (in software engineering) and that was at a tiny startup with literally 2 employees and me i've never had to navigate a company with actual office politics routines etc how do i do a good job? 
i'm mainly worried about my people skills and communication although i am also interested in technical advice if any of you are employees who've worked with interns i'd love your thoughts exhaust your resources (internal documentation stack overflow wikipedia) before asking questions this shows you’re self-sufficient and a value add not a time sink no employer will convert a time sink into fte when you’re unsure or stuck and asking for help show you’ve done your due diligence (ex offer suggestions and opinions backed up by research and personal experiments) mentors want critical thinkers and go-getters document everything and follow your team’s style conventions it’s common for mentors to design intern projects that take ftes 1-4 weeks to complete if your work is up to standard and re-usable everyone will remember you because you just saved them time this is great advice i had similar advice from my first mentor "never bother your manager with problems if you have to come with as much of a solution as you can" i know this is kind of obvious but a good idea might be asking them beforehand about technologies programming languages etc they use so that you can brush up on your knowledge before you start there (you can just email the person who'll be supervising you with "hey as you know i'll be doing this internship this summer and was wondering if there's anything i can do to make sure i'll be ready to dive right in when i start - maybe you can let me know more about the programming languages and technologies you use and if you already know about projects i might be working on so that i can prepare a bit and make sure my knowledge is up to scratch?"
(please keep in mind that i'm from germany so this advice works well for german companies but due to cultural differences i'm not sure whether it's applicable wherever you might be) also while it's definitely true that you should show that you're self-sufficient you should also not wait too long to ask a question like if a thing you think should be simple takes you a day that's too long and you're probably wasting a lot of time and your supervisors might be wondering why you're not getting anything done so set yourself a time limit for figuring out stuff and if you're still stuck when it's up just go and tell someone you're stuck and make sure you write down the answer so that you can look it up if you run into the same problem again and pay attention to how they worked out what to do also make sure you find a way to document things for yourself - it's annoying to have to look up the same thing again and again and it will get annoying for others if you ask the same questions again and again try to stay out of office politics if you can about getting to know your coworkers: what worked for me at my current place of work was honestly just going to the cafeteria one or two times a day to get coffee tea lunch because that seemed to be where everyone was hanging out and where we had some brief conversations (but it's difficult to figure out how much time to spend talking to people - you also don't want to come across as someone who's more interested in gossiping than work ) but in general if there's no cafeteria or something like that at your place maybe just ask people who go out for lunch what's a good place to go to for lunch and maybe they'll invite you to join them? 
i think i have an experience that is very similar to yours im a junior in college who picked up a part-time data science internship last semester at a company of a similar size to yours with a data science team (including data engineers) similar in size to yours i've been working with them since then and will be interning with them this summer; so far my work has been to build fraud and customer lifetime value models and dashboards prior to that i worked in cnn research with an extremely small startup where i was essentially siloed to my own project and had little contact with the company beyond my manager as a result i also walked in with little understanding of how companies really worked i wish i had the foresight to make this reddit post last semester the single best thing that would've helped me prepare for the internship would've been a strong understanding of sql my company stores everything in amazon redshift and im almost certain yours does too i thought that i could wing it with no knowledge beyond select group by and filter and i was dead wrong oftentimes when pulling together data for modeling i had to use window functions multiple subqueries etc which is not at all hard to learn during free time but was so frustrating when i couldn't do it during work since you have experience in software engineering this would be easier for you another basic thing which was a bit hard for me to understand but would've been really easy to learn and helpful was understanding scrum before i came in apart from that i think it's really important to keep developing a deeper understanding of the business real world data is incredibly messy and it takes domain expertise to parse through it all; there have been many times where i made an assumption from the data which turned out to be unfounded causing me to waste time getting to know my fellow workers really helped me there because i felt comfortable asking them questions all my projects have required frequent discussion with 
company personnel on various quirks of the business or the business data and i would not have gotten anywhere without drawing on the business understanding of the people who had been there for much longer than me moreover i would not have been able to design models or dashboards that were suited to the use case if i didn't have a clear understanding of everything that's going on that said a big mistake i made during the internship was not being mindful of my manager's time if it's not urgent it's best to ask a question using a nonintrusive medium such as slack or email and if you think a lower-level employee knows the answer to your question it's best to go to them instead i think that as long as you are clear in your communications show your manager that you are adopting a scientific mindset and make efforts to identify and quantify the value you add the company will appreciate you a lot! the single best thing that would've helped me prepare for the internship would've been a strong understanding of sql interesting i know almost nothing about sql and while i did study up on a bit for my interviews it never came up i guess i'll try to quickly learn about the company's stack when i get there but from what i gather they're a python shop which should be no problem for me what's the learning curve like for big data technologies like redshift? i'm currently struggling to learn google cloud because i got some credits and am wondering if it's worth even worth my time keep developing a deeper understanding of the business yep totally with you on that getting to know my fellow workers really helped me dumb question: how do i actually do this? should i ask them out for lunch coffee? should i just try to be around them as much as possible? big mistake i made during the internship was not being mindful of my manager's time good call especially with lower-level employees no sql? 
that's interesting i guess they assumed you'll be able to pick it up on the side what's the learning curve like for big data technologies like redshift? redshift is just a typical database it's not really in the realm of big data where you have to learn additional technologies like spark how do i actually do this? should i ask them out for lunch coffee? should i just try to be around them as much as possible? that's what i did what was your job search process like? i’m looking for a data science internship myself i was looking for data science positions at startups quantitative trading positions at proprietary trading shops and data analyst positions at big very reputable companies i consider these all data science so i'll take each case separately for explicit data science positions i had to do a lot of assignments they were usually testing whether i had the ability to process data using their data infrastructure (e g sql test one company used xml (??)) and then took a look at my ability to do an open-ended data analysis bigger companies tended to look more at my ability to process data using python i was asked to write programs that take a data input munge it and return a data output quantitative trading firms seemed to only care about my probability and statistics knowledge so their tests were the most academic and mathematical hope that helped yes extremely helpful! 
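for anyone wondering what "window functions multiple subqueries etc" looks like in practice, here is a minimal sketch using python's built-in sqlite3 module. the orders table, its columns and the numbers are all made up for illustration, and sqlite stands in for redshift here (the syntax for this particular query should be close but i have not checked it against redshift). window functions need sqlite 3.25 or newer.

```python
import sqlite3

# hypothetical orders table; names and values are invented for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("a", 1, 10.0), ("a", 2, 5.0), ("b", 1, 7.0), ("a", 3, 2.5), ("b", 4, 3.0)],
)

# window functions: running total and order rank per customer,
# computed without collapsing rows the way GROUP BY would
rows = conn.execute(
    """
    SELECT customer,
           order_day,
           amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY order_day) AS running_total,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_day) AS order_rank
    FROM orders
    ORDER BY customer, order_day
    """
).fetchall()

# nested subqueries: customers whose total spend is above the
# average per-customer total
big_spenders = conn.execute(
    """
    SELECT customer, total
    FROM (SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer)
    WHERE total > (SELECT AVG(total)
                   FROM (SELECT SUM(amount) AS total FROM orders GROUP BY customer))
    """
).fetchall()

for row in rows:
    print(row)
print(big_spenders)
```

the point of the window version is that every order row survives and just gains the per-customer running total and rank, which is usually what you want when building modeling features.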
thanks hi everyone i browse this subreddit quite a bit actually it is the first thing i do when i get to work in the morning i see a lot of posts about data analyst positions and i thought i would share my current experience one month in a little about myself: i graduated from a very good small liberal arts school not super well known for its science and math programs but i graduated with a degree in physics and a minor in math my math minor which included probability models applied statistics in r mathematical statistics (proofs of t-test and other things you never want to prove like the rao-blackwell theorem) and bayesian statistics with r i chose work over grad-school because i was unsure exactly what field i wanted to go into and did not want to spend money on a degree that would not further my career i think the biggest thing that set me apart when i was interviewing was my demonstration that i can learn on my own personal projects were something that interviewers were very curious about and i was happy to show them a web-scrapper i built or a website with mysql developed in django a little bit about my job: i work as a marketing analyst for a natural gas company trying to bring insight into different strategies and predict a variety of different things this means i do all the sexy things like cleaning data cleaning data and cleaning data i currently use python and r for a lot of my data manipulation our company actual uses powerbi for visualization and reports and it is horribly slow but very free coming out of undergrad i was excited to dive right into applied machine learning algorithms and tried to learn as much as i can about that but i think the most important thing is curiosity and wanting to learn about the actual business itself it is impossible to be a good data analyst if you don't know why the data is being presented to you the way it is in order to answer interesting questions you first need to be able to ask them and that all starts with playing 
with the data and visualizations (eda!!) not just throwing a random forest regression svm or naive bayes algorithm from scikit-learn overall it has been a great experience my math training has been the most useful to me so far sql skills have been slowly improving and i have yet to use any bayesian tricks which i thought was the coolest thing i have every learned except for solid state physics please pm with questions i am happy to share my resume i just recently went through this process and would be happy to talk about my interview experience and all the times i had to fail in order to get the job i have today learn how to make reports and try to enjoy it because it is going to be a big part of being a data analyst i am going to work on some data pipeline stuff using luigi in python so i would love to hear if anybody has suggestions or things to watch out for let me know! just wanted to say thank you all on this subreddit i never learned about training imbalanced classifiers in school and i was able to create and properly train my model thanks to y'all for those wanting to know how to say this concisely: but i think the most important thing is curiosity and wanting to learn about the actual business itself it is impossible to be a good data analyst if you don't know why the data is being presented to you the way it is i believe the term is "domain experience" thanks for the inspiring post! domain knowledge sometimes referred to as subject matter experience as well the passion is a really big deal too when watching for talent that's like a big red beacon saying 'pick me!' 
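on the imbalanced classifiers point mentioned above: one common trick is to weight each class inversely to how often it appears so the rare class is not ignored during training. the sketch below is only an illustration, not what the poster actually did. the function name and the 95 / 5 label split are made up, and the formula is the same heuristic scikit-learn applies when you pass class_weight="balanced".

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# hypothetical labels: 95 negatives and 5 positives
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

with this split each rare positive example counts 19 times as much as a negative one, and the weights can be handed to most classifiers as per-class or per-sample weights.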
this means i do all the sexy things like cleaning data cleaning data and cleaning data nailed it i am going to work on some data pipeline stuff using luigi in python unless your company is already using luigi i'd strongly recommend trying out airflow instead it is also python based and imo much easier to extend pipelines and reuse pipeline components in the last year airflow has become an apache incubator project and by the numbers has a much more active development community than luigi +1 for airflow seems to have good momentum behind it even if the documentation is still a bit limited congratulations you seem extremely levelheaded and inquisitive i'm sure you will do fantastic in your endeavors! hey i'm 7 months into my job vaguely classed as "reporting analyst" but really it just means i handle coercing a bunch of unrelated reports into more user friendly packages i'm just beginning to learn r and start to fiddle with actual data manipulation and data analysis but it's been slow going what resources would you recommend to have a clearer understanding of what to do with data in r? 
i assume we are looking at similar information despite being in different industries (i'm healthcare) my department doesn't really understand statistics which makes things more complicated but i'm honestly so out of practice with stats data science myself that i'm not much better i mainly deal with personnel production rtam and call center data cheers i am also a data analyst in the healthcare space and use r on a day to day basis to learn more about data manipulation in r i would focus on learning dplyr and the related "tidyverse" family packages there are lots of resources online but the free ebook "data science with r" is a good place to start love the enthusiasm :) one suggestion - when prioritizing work ask yourself about each thing if doing it would help you or someone else make a meaningful decision sadly most of the real world jobs are just sql monkey :( i'm a computer engineering undergraduate and i'm starting research in big data data science and data analysis also i'm learning python and i'll start to learn sql and hadoop my question: do you know tools for this field and what languages should i learn? and good (free especially) courses and certifications in this area? hey man awesome post! my question isn't necessarily about the job itself but rather the interview process what was it like? do you remember the number of rounds and what were the technical questions asked (obviously not what they actually were but the nature of them ) another question i have is are there data scientists engineers where you work how is the work you do different than what they do? how is it similar? thanks i'm glad more people are opening up about their experiences in the industry! very insightful post thank you! 
as a former communication major having shifted onto statistics i have been taught more complex methods (e g from anovas and multiple regression up to structural equation modeling and correspondence analysis) but without really ever digging into the math behind it could you list the specific math concepts you'd recommend to follow online? good luck on your side! hi fellow data science enthusiasts i am finishing university in a related field in a few weeks and am now looking for work in data science my issue is that i do feel like i only know data mining and i don't have extensive experience my goal is to work in marketing or at least with massive datasets therefore i do believe that my data visualization and distributed computing skills need to be strong but i didn't really apply these topics so far i will need to relocate for work which will make it hard for me to be hired if i'm just average any idea on which skills i should prioritize and on how to do it efficiently? thank you for your advice ! put the word "blockchain" on your resume thanks ! is a high level knowledge enough or should i dig a bit deeper? 
it's a joke referencing this: this company added the word ‘blockchain’ to its name and saw its shares surge 394% afaik it's a database which combines the size of a temporal ledger with the security of having it in more than one place you forgot this: s i think that the best way is making projects the project should be at least a little unique (the more the better) it better be interactive and it should show that you have various practical skills for example i did the following project: https: digits-draw-recognize herokuapp com this is a site where users can draw a digit and it will be recognized while working on this project i did the following: created a simple site with a passable interface; collected the data by myself; trained two models: fnn on numpy and cnn on tensorflow; integrated them with the site so that they could predict the drawn digits; made integration with amazon cloud to save the drawn pictures; made it so that models are continuously trained on new data; made a flask app from all of this and hosted on heroku; from the aforementioned things i knew only how to train models so i learned a lot of things and the project impressed a lot of people as it showed that i can generate an idea make a plan collect data and deliver a completed product i really like your project did you do more? and may i ask where the data was from? thanks ! thanks! i didn't have time for more projects but i want to do them but i have a github portfolio though it needs restructuring: https: erlemar github io i "made" the data by myself: at first i created a basic site where i could only draw a digit give it a label and save this way i drew 1000 images i see from your flair here that your masters is in econ and you (presumably) work in finance any high level bullet points on your background and path and level of instructed learning that you could share (vs autodidact)? 
my path wasn't really good for a data scientist i have graduated from faculty of economics of msu (russia) and had no idea what i wanted to do had zero programming skills and didn't really like math (econometrics was an exception); then worked for ~4 years as an it-analyst in erp-system implementation in consulting companies after some time i realized that i didn't like the work - overworking changes of requirements a lot of testing of developments i have had enough and after several months of thinking decided to change my career here is my comment about my path to the first job: https: www reddit com r datascience comments 6jcvfl for_selflearners_what_learning_curriculum_has djdhwkd that was my work in finance: building a model predicting the probability of client activating credit card but there were two problems: i had to work in an open space with 100+ people and i worked alone as there were no other people knowing ml in my department (except my bosses) in the end i was able to successfully finish the project in python and then i was supposed to make it work in sas by that time i was already looking for a new job; i have built the aforementioned project during my free time while workng in this bank; i was lucky and got a new job with much better salary; but the job itself is questionable - there is no certainty in the future but i hope it will be ok; there was one thing which gave me a huge boost - in april i joined russian slack team (called ods) which is extremely helpful and advanced i got a lot of knowledge and some connections from it; also recommend this approach i have a side project where i scraped nfl data and make predictions on the lines got more questions about that project listed on my resume than any other i generally feel like the biggest edge you can give yourself is better computer science cred learn more software tools learn them better brush up on your algorithms and do difficult side projects another one that seems very popular is deep 
learning - i discuss it in the data science book i wrote but decided that it'll get a whole chapter in the next edition thanks so far i'm not too bad with python and its libraries what do you think would be worth it next? hadoop? scala? boy there's so much stuff and always more to learn about what you already know maybe best to pick whatever interests you most so you're likely to really dive into it but a few of the big ones would be deep learning spark (don't worry about hadoop itself) natural language processing and more about databases i know this is what is scary ! i feel like i know so little yet i think i'm good with what i know i'm looking to relocate in a very competitive place so i'm trying to stand out maybe try deep learning i think it's less useful than other stuff but it's very hot right now ( we will see how long that lasts) my book gives an overview of it (enough to have a broad overview of the subject a few sample scripts etc) but there are better resources available if you really want to learn it great advice sometimes i think software comp sci skills are underestimated in ds one thing to keep in mind and this may be a bit different is the idea of 80 20 put another way - 80% of the value is driven by 20% of the effort when using data science to solve business problems in marketing it is important to keep the overall problem in scope if you can solve the problem with a simple model greater accuracy from additional complexity doesn't provide any incremental value so true my first ds project i wanted to use the lastest and greatest neural network random forrest generating boot-strapped xgboosted bayesian regularized decision tree cluster a simple glm did the trick haha i can definitely relate i think it is super easy to get excited about the latest and greatest technologies and to get caught up trying to use them you'll earn a lot of credibility with your business counterparts if you can resist the complexity in favor of a simpler perhaps less cool 
approach that still yields the right result question: what is your degree in? you might only have to weed out the crap jobs like business intelligence i am currently in my 5th year working as a registered nurse i am very unhappy at the thought of spending my working life in nursing and am looking for alternative career options i’m very interested in technical areas like data science but currently have no technical background i have two bachelors degrees one in nursing and the other in economics prior to nursing i worked for several years in a typical non technical 9-5 office job for a consulting firm first as a business analyst and then a project manager i’m not super keen on going back for a 3rd bachelors or even a masters or expensive boot camp as degrees do not guarantee breaking into a field with the plethora of online resources available is it possible for somebody like me who already has a college education but not in a technical field to pick up the necessary skills to get hired into data science? i’m also in my mid 30s (likely i’ll be in my late 30s when i am ready to enter the field) so that may be a negative factor as well the k2 bootcamp seems like a thorough program and not insanely expensive ($7k) but i would need to complete their prerequisite recommended course work i’m also considering doing some of the relevant computer science coursework in the ossu (open source society university) curriculum to prepare for something like k2 would this preparation be enough for somebody with my non technical background to be able to land a job in data science (being in my late 30s most likely when applying)? i want to get an understanding of how realistic (or unrealistic) this is before i get started any and all advice is much appreciated! thanks! 
go into healthcare analytics i would seriously spend time working on healthcare data in your personal time then slowly build solutions for your hospital i would focus first on automation then creating better reports for nursing leadership i second this if you can leverage your current career then you'll be in a much better position a nurse who started developing interesting and practical technical solutions to problems in their workplace is a story that writes itself and i would definitely give that person an interview a nurse who changed careers and went to a data science bootcamp less clear about that one how do you feel about boot camps as an employer? i work in financial services so it's definitely a bit different than most tech companies we generally hire people with proven problem solving skills and good numeracy a big chunk of the analytics department i work in have phds and most have master's degrees an applicant with bsc+bootcamp will generally not win out over someone with a phd or bsc+experience due to hr constraints we have to do competency-based interviews where we ask about times you have solved problems and so on doing a phd will give you loads of anecdotes to help your case much less so for inexperienced people that being said i have worked with people with phds that aren't very good and people without a master's that are very talented given the choice i would do technical screens and weight credentials degrees much less heavily but that's not how it works awesome thank you for the answer it definitely gives me some insight i recently decided against taking a bootcamp very last minute i figured i'd save the $12k; it seems like you're mostly paying for networking hopefully i can teach myself and come out and even better analyst in the end i have a masters in marine science that was very heavy in the experimental process and data analysis so i'm hoping that will work in my favor 95% of the utility of bootcamps is networking sure you might learn some 
stuff but what makes it potentially worth $12k is the face time you get with prospective employers if you treat it as such it might pay off definitely don't do it for skills development side thanks that makes me feel a bit better about my decision i’m very interested in technical areas like data science but currently have no technical background my first question would be where this interest comes from? it's hard but i also believe it's possible i'd start with several free low cost online courses - this way you can gauge how interested you remain before dropping thousands of dollars improbable given current market conditions (some people with quantitative degrees and masters phds are having a difficult time) your best bet is to get your hands dirty with data in your field apply some techniques you have learned in mooc’s some people from my bootcamp were from non-technical backgrounds and they were landing interviews so it's not impossible however your lack of a technical background sets you behind other people in terms of starting position and consideration of your resume by hr therefore it may be more of a uphill struggle for you the difficulty of finding a data scientist analyst job is very location dependent so also consider carefully the location for example here in the bay area it's not easy even for ph ds to land the first data scientist job (which is what i'm trying to do now) for example here in the bay area it's not easy even for ph ds to land the first data scientist job (which is what i'm trying to do now) just three to four years ago it was easy to find a ds job in the bay area market has gotten saturated out of curiosity do you mean the market for entry-level ds jobs (as it seems u jargon59 mentioned) or ds jobs at all? 
i'd say it's across the board the entry level market is tough it's just the result of so much interest in the area the market for experienced ds folks is semi tough there are lots of experienced people in other roles (software engineers statisticians ops research etc) and phd's leaving academia who are moving into ds so it's tightened as well i think both although the bar is higher for entry level people because of the inexperience note that nobody really wants an entry level ds if they can avoid it everybody wants someone who can hit the ground running check out microsoft professional series on data science through edx org try to land a job in your healthcare company's service desk get some tech experience and work your way into your it career field by networking with the plethora of online resources available is it possible for somebody like me who already has a college education but not in a technical field to pick up the necessary skills to get hired into data science? if you just had the nursing background i'd say you have a very uphill battle however that degree in economics has given you a huge leg up if i were in your shoes i would do the following: review all my old econ textbooks with an emphasis on renewing your knowledge on the analytical mathematical modeling aspects simultaneously learn the basics of a coding language (python has a very low barrier to entry and has extensive inexpensive and open-source learning resources) and then start pursuing real-life healthcare analytics economics problems with the goal of using those problems as a framework to develop from a basic knowledge of coding to being able to use your chosen programming language to solve real-world problems at some point during or after that step you should be able to put some examples of your entire process of defining a problem gathering data cleaning data performing analysis and interpreting results into action-oriented conclusions on github or something similar it's 
going to be a lot of work as a nurse i suspect that you already deal with long hours if this is something you want to pursue you need to be prepared to accept that those hours are going to get longer you are going to have to commit to several days a week at least a couple hours at a time to spend on learning new information as well as reviewing old information to bring it back to the forefront and to integrate all of that together in other words it's a very doable transition but it's going to require consistent significant commitment from you to make a reality there is a whole world of healthcare economics and healthcare analytics that you are really well positioned for as you have the knowledge and experience to contextualize both the problems and answers in both those areas but i want to make this clear: with no technical background you are going to have to grind for at least a year probably closer to 2 or 3 before you are in a position to take advantage of your rather unique qualifications if you aren't willing or aren't able to commit to that grind it's likely to be a waste of your time now a quick aside about bootcamps: you aren't ready for one bootcamps are designed to take someone with some knowledge and really teach them how to best utilize what they already know in terms of the technical aspects you don't know anything to there's nothing for you to utilize i would suggest not considering a bootcamp until you've completed at least your first full-fledged data project (which means asking a question gathering and prepping data performing an analysis and interpreting the results) just because you won't get nearly as much out of it unless you have that experience and context be prepared for the reality that real data science might be a decade or more down the road and that you will need to prove yourself first you are going to be changing careers and you are going to be starting off pretty low on the data science ladder your background in nursing and economics 
will make you highly competitive for any kind of healthcare economics or analytics positions but there is very little chance you will receieve serious consideration in a data science role until you have a work history that proves you have the chops and it may be necessary at some point in the future to get an advanced degree to make a transition from a data analyst-type role to a data-scientist type role nursing health informatics seems to be the most logical and feasible step for you get the experience then move into a more data-intensive role in healthcare the problem with data science is that everyone wants to do it but most of those people don't understand what it is they see only the glamor and glitz and don't understand that it is bloody hard work even for the smartest first ask yourself: why do you want to get into it? what interests you about it? answer that then go for the next step the first poster's advice is sound: leverage what you know about your current industry produce sample analyses of healthcare data on weekends build small programs that solve practical problems in your workplace show them to people post them on github people will notice at some point i would advise you not to give up your day job until you get a clearer idea of whether you really want to go down this road and what you would have to do to get the kind of position you want under no circumstances should you jump blindly into it i recently met a person who had given up a good managerial job at a good company to pursue data science he was good with databases and spreadsheets but did not understand that that was not enough to guarantee him a job with salary and benefits comparable to those he previously enjoyed the people being hired for the top jobs are usually people with advanced scientific degrees (ms at least phd often) who are already thoroughly trained in the coding and data analysis skills practiced in scientific disciplines and which have ready applicability to data science i 
remind you you are in direct competition with these people and even they can have trouble--a phd is not enough to be hired you have to show value which is not always easy to do at some point there will be a consolidation of the field and the entry-level jobs will dry up data scientists are automating many of the routine tasks that are now the province of entry-level employees so i would be very cautious about leaving your current job if you keep your nursing job but build on what you know you might have a shot at getting hired by a firm which works in the health care industry once you're in you can learn from other more senior data scientists and branch out otherwise it's tricky just curious why you're becoming disillusioned with nursing? signed a male nursing student also have you considered being some kind of research nurse? i've been looking into the possibility of going into data science for a while and i've seen a lot of conflicting opinions about whether it would matter at all be beneficial or be detrimental if i were to quit at this point i have a ba and a master's in geology (paleontology) and if i quit my program now i would get a second master's (in health science) as a consolation while i've been thinking about this in the abstract for a while this has come to a head now because my fiancee left me i've been taking some time to think about things what i want out of life and i think i'm coming to the conclusion that i don't want to stay in academia this is a big decision and i know i need to think about it more before i have entirely decided but recently i've been leaning heavily towards leaving my program and figuring out what else to do data science is the first option that i would like to pursue and while i am proficient in r and know the basics of python (and have a good statistics foundation) i've made an account on kaggle to bone up on machine learning and whatever other useful skills i may not yet have my big question is if i try to transition into 
data science now will i be able to make it work or will quitting the phd be a big black mark on my resume? i'm kind of stressed out about the idea of continuing the phd right now but if i take some time and change my perspective and decide it's the best way forward to go into data science i think i could probably push through the remaining 2-3 years but if i could successfully transition now i think that those 2-3 years would probably be better spent getting experience in industry sorry for the rambling tldr: fiancee left me doing some soul searching don't think i want to deal with academia anymore if i quit my phd now will i be able to successfully transition or do i need to suck it up and push through the next few years and finish the degree before i try to switch? there is a demand for top data scientists that exceeds the supply but it's kind of the other way around for entry-level data scientists: there are far more recent bootcamp/mooc graduates than companies willing to hire them if you want to leave academia do it now because in 2-3 years you will be competing against younger people for the same junior positions you can make an opportunity of the situation with your fiancee (sorry though) to start a new life burn the ships :) i think a quant masters is enough for a good career in ds i wish i’d quit with a masters the last years of my phd were miserable and i’m never getting them back but now you can add those three letters to your name :) i could not quit my phd (without paying back the scholarship that is) i wish i had done just the bare minimum and spent more time networking or on these "job market preparation" workshops that the doctoral college offered haha yeah me too! 
it doesn't sound like op has a quant masters though geology and health science are never going to get a recruiter's attention i dunno if their r skills are good i would definitely interview them a couple of really good ds’s i’ve known come from an experimental psychology background that's a big if what i consider proficient in r is unlikely to be what that degree considers proficient in r i have a bs in mechanical engineering and an ms in applied mathematics it will be a hard sell for sure op should really make a good case and look for startups where his background is suitable i'm sure there are healthcare startups out there that would be more interested in someone with domain knowledge doing basic regression exploratory analysis and etl than in a bleeding edge machine learning expert i don't see a geologist with a health background ending up in data science to be frank unless you're special your resume is going to read like "doesn't know what he wants to do" well what experience do you have working with data in an academic or professional setting? 
your post doesn't tell me if you're qualified a masters or even a phd in any random subject (yes geology and health science are random for ds) doesn't mean you'll get interviews just like that i think figure out what you want to do while you're finishing your phd and then you'll have a solid certification (and prove you can do research) when you want to move into data science hey all as per one user's great advice from a post about two weeks ago i began my journey into ml and data science i completed andrew ng's course on ml and found it extremely interesting i loved every bit of it i was on coursera every day and completed everything in that course it was very cool to go on kaggle read some tutorial kernels and just find myself noting what the provider should have done differently as per prof ng's advice i feel like i have a solid understanding of the fundamentals of some of the most basic and widely used ml algorithms today and how to use them properly i'd now like to contribute on kaggle but i really do not have the skills to do ml (or really any data science) in python/r though i probably could mash up some code from some popular kernels i really wouldn't know what i was doing and so that would be pointless i've discovered two courses (specializations) that focus on deep learning / general data science using python that seem pretty good at this point i'd like to learn python over r has anyone taken these courses? does anyone have an opinion on what are some good ways to learn python with data science? sometimes i think i could be overcomplicating this but i really don't think it's wise to jump into kaggle only to possibly burn myself out because i don't know python perhaps someone has been in a similar situation and can help guide me? again i could just jump into the above two courses but if anyone can help optimize my solution so that i start in a better direction that would be huge! thank you everyone! 
as it stands my game plan is to get on kaggle build up a portfolio and use that to help me land a job in the ml realm i've actually found some interesting jobs that combine both my collegiate background with ml pretty neat watch "data school" videos on youtube there's a series on pandas for data science (btw i fucking love the guy who makes the videos) then find a dataset on kaggle and make a kernel yourself i did the titanic one i was in your situation and i could easily read and understand the tutorials on beginner challenges but actually doing it from scratch is completely different and way harder get your hands dirty and play with the dataset you will learn a lot you can do it directly on kaggle so you don't have to install python and the packages hey thanks! i'll check the videos out yeah i just learned about their in-browser ide that should help out a lot in-browser i cannot download python at work can i run it in a browser? you can also create free private or public online jupyter notebooks through microsoft azure i am not 100% certain if you can import your own files but there is a browser instance of a python environment on kaggle! 
sign up and then you will have access i started off with the coursera course as well and have since moved onto dataquest (dataquest io) i'm not sure if you're interested in full data science or simply machine learning but the dataquest curriculum covers a lot of bases and i've learned a ton from it thank you i think i saw that link on kaggle i'll check it out after i burn through the above coursera courses actually i may just jump into dataquest if these coursera courses are not working out thank you so much for your response - hugely appreciated quick note about the payments too you can complete the first two modules on the free plan to ensure you like it from there essentially all modules up to machine learning are on the "data analyst" path meaning you can pay $30/month while you complete these after that you can upgrade to the "data scientist" path and complete the rest based on how fast you completed coursera i imagine it can be done in 2-3 months total how good is datacamp? i'm not sure if you meant dataquest but it's great imo i like it because you don't have to put effort into what to learn next they expose you to many of the aspects needed to become a data scientist in a way that is easy to learn (coding in the browser) they also vary the activity types and help you get started on some "portfolio" pieces that you can customize and potentially show to employers all in all i'm very happy with the program sure thanks for the inputs but i was referring to www datacamp com have you come across their courses? i looked at them briefly but chose dataquest instead so i can't say too much about the details of their offerings alright thanks! 
maybe look into some of the math statistics regarding linear regression logistic regression and neural nets btw if you want to step into deep learning you can start prof andrew ng's course on deep learning in coursera itself which has just recently started it's a specialization of 5 courses the first three deal with all the prerequisite knowledge to get into deep learning and the next two are on cnn and rnns i have finished the first three and waiting for the next two courses to begin plus it familiarizes you with python numpy and some deep learning frameworks as well yes! i actually linked that course in my post i am really looking forward to that prof ng is a cool dude i think he almost cried at the end of his ml class he's such a sweet dude did you take the first ml class? of course! :) i think that's the starting point for almost every self learner for ml have you tried edx org lots of python data science classes as well i like uc san diego microsoft harvard & columbia classes i believe there are more but i haven't tried them edit: i learned python from the mit class at edx i actually kept forgetting about edx! the first person that responded to me in my last post mentioned this site i will certainly check out the mit python class thank you so much! there is also an ml series through the university of washington on coursera that is pretty good it's a bit more math heavy than ng's course though https: www coursera org specializations machine-learning awesome! and it's in python 👍 thank you! i'll have to make a decision between this one and the other two specializations there's a big debate about what a data scientist should be called i like the term "data scientist" but have to admit that it is a bit vague i'm doing a lot of software data engineering work as well as data science work (modeling visualizations presentations to business crowds) should i change my title to "full stack data scientist"? is it more descriptive of what i do? 
to me it's less vague than "data scientist" because it implies a heavy software/data engineering skill set i'm also the only member of my team who takes projects through the entire pipeline (data clean up to modeling to web app integration) what do you guys think? what are the advantages? disadvantages? will recruiters or future employers see me as a jack of all trades but master of none? just want to make sure i'm sending the right signal and not hurting myself throw "devops" in there for good measure /s i think we just need to start using gsm so full stack devops data scientist? :) can't tell if you're being sarcastic /s means sarcastic no /s means he means it /s ahh got it just stop data farmer data janitor janitorial analytics "it's 'custodian ' dick!" hahahaha data custodian? i haven't heard that term outside of web/software development so i would be confused by that title if you feel like you have enough experience i would put both data scientist / web developer so people know you do both and didn't make something up if i were hiring for a position and i saw that title at best i'd probably ignore the "full stack" modifier at worst i'd wonder why it was there yes "data scientist" is vague but it will be apparent from your resume and cover letter what your specialty (or lack thereof) is i don't think you'd gain much from adding it as an official title that said if you referred to yourself as a full stack data scientist in your cover letter at worst i would ignore it but more likely i'd consider it a positive web data ninja no--keep it simple data engineer might be more appropriate agreed i got down voted below but the new title to describe the exact discipline is "ml engineer" rather than "data engineer" which is slightly generic seems like machine learning engineer may be the best title titles don't matter all that much the story your resume and cover letter tell matters more a software engineer who learned a lot of modeling is a bit of a different story than a modeling 
scientist who learned a lot of data engineering then throw in the trajectory from industries areas and types of projects you've worked on i generally advocate for applying for a few positions focusing on a few companies and tweaking/tuning resumes and cover letters for those (over the spray and pray applying strategy) whenever you're applying for a position or trying to get a recruiter's / data scientist hiring manager's attention always be thinking about why you'd be interesting to that person the doubts that person would have when they look at your 1 page resume and how you can eliminate them etc at a glance if i saw the title full stack data scientist i'd assume the applicant is a bser who created his own job title just to sound more grandiose the point of a job title is to quickly convey what you do and help others interpret it and understand what your skill sets are making one up defeats the whole point ml engineer is a good shout as a few people have posted below or just go with data scientist but in your bullet points make it clear you're a bit more engineering too i like it thanks! i go with software engineer specializing in machine learning i go with software engineer specializing in machine learning -reiinakano amazing tempted to put this in my linkedin profile now good bot 
data looker : finding data is what i use ml engineer i absolutely think you should people always scoff at long and descriptive titles at parties but it is the best way to fully show what you are doing to recruiters and hiring managers i think full stack implies that you are doing it all i honestly also only think it is a matter of time before people like you are the unicorns and data science jobs will be broken up more into separate functions like architecture modeling visualization etc more and more i'm already seeing exactly this at my job some team members are data engineers some statisticians some visualization gurus and we each gravitate towards those areas thanks for the advice! idk i'd lean toward not changing it i don't know whether full stack has an understood definition in this context my first impression would be a full stack web developer rebranding themselves as a data scientist hey reddit trying to work through this and figured i would see what others thought i work at a federally qualified health center as a data analyst coming up on 3 years now and my boss has asked me to suggest a new title this is part of a restructure of the team that currently only has 2 of 4 positions staffed and difficulty hiring due to low salaries as a small-medium sized non-profit the difficulty in coming up with a new job title is in the variety of the work i do (warehouse architecture development requirements gathering data analysis report writing quality council work infrastructure setup and maintenance etc) a little bit about me and my contributions i had no experience in data when i started but have had several big 'wins' i built a data warehouse from a single t-sql script with select into statements to a concurrently extracting high performance dimensional data warehouse i performed 85-95% of the work and vision on two projects that were recognized by our federal auditors as fqhc best practices i also have discovered security vulnerabilities in operational systems that 
triggered projects to replace them with more secured systems education wise i have a bs in computer science and i completed the stanford ai-class com course in the top 99% of students my long term goals are driving towards data science but i still lack the practical machine learning and predictive analytics skills and practice while i feel like i could suggest almost any title within reason and have it considered i'm hesitant to go with something that might set me up for failure so soon i was pretty partial to going with a data engineer title but a consultant i highly respect suggested something like senior data analyst since it would be more in line with data scientist in the future and the idea is growing on me the biggest requirement that was handed down to me is that it has to be market review-able so nothing too out of the mainstream any thoughts advice? in no particular order: - chief data bser (probably somewhat accurate use of your degree title) - data master (vaguely ghost busters) - matrix engineer (red pill - hey there's another one: red pill administrator) - the one who knows (probably accurate) - data (for star trek fans) - chief information officer executive scientist - chief data scientist random thoughts: - senior says you've been there longer; chief says you're in charge of it - analyst says you think about things for someone else; engineer says you perform disciplined creation i agree with the consultant hehe top 99 % (aka anywhere above bottom 1 %) ah crap i mangled it it was top 5% with a 99% score for the class guess i'm not getting a promotion after all! 
greatest god in the universe the consultant was trying to get you to aim high with title i've helped design IT orgs and believe me: aim high if i was looking at a resume from a "senior data scientist" from a small non-profit i'd expect solid technical skills a wide breadth of experience lots of experience working with non-technical people and relatively little depth in terms of fancy ml engineering this seems to be you this is very common for "data scientists" in small immature orgs (despite being a very different job from data science at say facebook) in contrast if i was looking at a "senior data analyst" i'd expect general excel/r work building reports supporting research outreach etc -- but would be pleasantly surprised to see you'd built a data warehouse have a cs degree etc if i were you i'd either go for a senior data scientist or lead data analyst both accurately signal the type of work you do and position you well to be in level 2 type da/ds roles in large mature orgs if you so choose i keep trying to change my job title to 'data janitor' but no one will let me based on what you've done i'd aim for a 2 level bump in title so whatever is above senior data analyst in your field that's an issue this restructure is partially trying to solve we don't have levels for any of the positions part of the department restructure is to define those levels our cio was thinking of da i ii iii etc but i think i convinced him a more modern scheme with senior and lead type roles would be better based on feedback co-workers and leaders have given me i would agree with a two level bump i'm just not sure if that's senior da or lead da or possibly something else thanks i'll do further research cocksmith [deleted] nope it's pretty clear i'm not there yet just asking for the best title that would help me the most to get there eventually [deleted] i don’t think title matters all that much though it’s about what you know and how you apply it if only lol [deleted] absolutely - but 
let's be real at most organizations that resume has to get through either a recruiter ats or hr person tasked to the role before anybody qualified to look is going to see it those are the people the title is a big boon for getting past hello tl;dr: i'm interested in how you transitioned from an analyst role that mainly uses more gui based technologies such as excel to a role in the data science field more specifically: given your current role as {{title}} what specific skill do you think benefited you the most? was it more programming side such as knowledge of python pandas etc? was it more computer science side such as algorithms and data structures? or was it data science specific such as machine learning deep learning etc? what resources have you used to learn the skill and what would you recommend / not recommend? for those who came from a role that mainly uses excel or other similar tools what role did you transition into? what skill specifically helped you transition from one role to the next? finally a bit of background i work for a small consumer packaged goods company (20 people-ish) my day to day work mainly focuses on excel in the past 6 months however i've begun learning python currently i'm using python to scrape our internal reports (written in php and output as various html tables) then uploading them to a postgres database in aws and i wrote a flask web app that connects to the database in order to generate new html tables i've hosted the web app on a digital ocean vps along the way i've learned some python flask multiple flask packages postgres nginx gunicorn and quite a bit about pandas the issue i'm struggling with right now is what path should i explore further? given that i eventually want to work in the data science world i'm not sure if i should dive deeper into learning python (algorithms data structures) or i should go ahead and start learning about machine learning and such? any feedback would be greatly appreciated! 
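for anyone trying to picture the scrape-and-load step described in the post above, here is a minimal sketch of that kind of pipeline, not the poster's actual code. assumptions: the report markup, the `sales` table and the `load_report` helper are all hypothetical, and sqlite from the standard library stands in for postgres so the example runs without a server. it only uses python's built-in `html.parser` and `sqlite3`.

```python
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the cell text of every <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text chunks of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # text outside of a cell (e.g. whitespace between rows) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def load_report(html, conn):
    """Parse one report page and insert its data rows into the sales table."""
    parser = TableParser()
    parser.feed(html)
    header, *data = parser.rows  # first row is assumed to be the header
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO sales (product, units) VALUES (?, ?)",
        [(product, int(units)) for product, units in data],
    )
    conn.commit()
    return len(data)


# hypothetical report markup standing in for the php-generated page
REPORT_HTML = """
<table>
  <tr><th>product</th><th>units</th></tr>
  <tr><td>granola</td><td>120</td></tr>
  <tr><td>oat bars</td><td>75</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")  # sqlite stands in for postgres here
inserted = load_report(REPORT_HTML, conn)
total_units = conn.execute("SELECT SUM(units) FROM sales").fetchone()[0]
print(inserted, total_units)
```

in a real setup the html would come from an http request and the connection from a postgres driver, but the parse, insert, query shape stays the same, and a flask view would just run the final query and render the result.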
thanks in advance i learned to write t-sql code and threw excel to the curb then went back to school and earned a masters in data science with a heavy exposure to statistics business and learning programming (java python r etc ) having the credentials of the degree is the single most important thing to getting my foot in the door as a data scientist same regarding t-sql and excel although excel still functions as an easy to use etl tool which i can share with less technical colleagues in different departments i'm currently studying for a degree while working as a junior data scientist although it's probably more of a data engineer role at present also learning java python and r pluralsight has taught me a lot the book "learning python" by mark lutz was great too hey i have that book too lol i'm curious when you say junior data scientist / data engineer could you maybe describe a task that you've done recently that either stood out to you or maybe a task that you would say is representative of the work that you're doing in the role? and thanks for the reply! sure thing recently i migrated some data from an old legacy platform into a newly developed big data platform it would have been nice to be able to reprocess the old data using spark which is what we use for any new data coming in but to ensure consistency the old data was transformed to fit this is the boring part once this transition is done we can start doing some funky stuff which brings on the need for java python r the company i work for has a lot of data coming in from different devices which has been collected over several decades it's primarily been stored in sql and distributed in excel reports we've hired an experienced data scientist to guide us through i report directly to him and i'm pretty much soaking up as much as possible pretty cool thank you very similar story here how's the market? 
i have a similar background and i'm finishing my ms soon i'm a little bit concerned about availability of jobs and whether my qualification is enough since a lot of jobs require experience in the field i had some experience in the field already so may not be the best measure but the dfw market is pretty warm not quite hot as a lot of the companies here are trying to figure out how they want to use data scientists but there are quite a few job openings apparently this is a common story but i'll share mine: as an excel analyst i learned a bit of python/r to automate some of the tasks i was doing on a daily basis and because i was interested in them then went and got my master's degree and learned more about the statistics in the meantime i got a lot better at coding i will say that for whatever skills i had just having the degree was a big one for getting people to talk to me i could have learned 90% of what i learned in my degree program online for very little money (the 10% because i had a really great professor that could explain complicated ideas in a way that i was able to grasp them much more quickly than i could have on my own) but just having the piece of paper made people say "ok this is much better than the online 10 hour bootcamp on these other resumes" if i had the option to just purchase the degree without actually learning anything i still think the roi would've been good in other words though many people make the change in a more organic way strapping on a backpack and heading to school is a really good option awesome thank you for the reply would you mind sharing a bit more about how you learned to code while working on your master program? did you use any specific sites? how did you go about finding projects to work on in order to improve your coding? i totally agree that getting an ms is a huge boost to your career but at the same time it is also a luxury anyway kudos to you for completing the degree and thanks for sharing! sure! 
i got a lot better at coding by working on projects but i also used a lot of books and online resources datacamp com is one i mention in a lot of my comments short easy to digest lessons (maybe too short even) it's around $30/month but occasionally they have a $150/year deal around holidays i kind of still use it on and off so don't get the year but i do think it's valuable i mostly code in r now - r for data science is a good resource to work through to start and that one is free i have mixed feelings about udacity but that's another good online resource my school was really focused on using sas though they allowed me to use whatever i wanted whenever i had some extra time i took my work in sas and replicated it in r so i was getting familiar with both languages for projects finding a good data set can be annoying but only because there are so many i'd literally just google things like "data sets for data science projects" or "famous data sets" and find stuff there kaggle occasionally has interesting data sets too like anything else a lot of practice practice practice run into a problem then spend 5 hours figuring out how to solve the problem so the next time you know how to do it in 5 minutes edit: one more comment - i listed a few resources whatever resource you choose to learn to code pick one and stick with it i did a lot of jumping around to find the "perfect" resource that would teach me everything it doesn't really work that way if you find you hate a textbook or site then absolutely move on otherwise just work your way through page by page i am currently in high school and i am trying to figure out the career path to go down (i have to choose my major) i am 80% sure i want to do something cs or tech related but someone recommended i check out data science i think the field has a great outlook and matches up with most of my skills i did some research but i still can't find the answers to some of my questions how many hours does an average data scientist work 
in a week and is the job stressful? thank you! :) this depends a lot on the place you’ll work in i'll second this when i started as a data scientist i was working for a fortune 100 company and the hours (especially during busy season) could be beyond ridiculous nowadays i work for a midcap company that is much more focused on work life balance and employee health and happiness i get paid for 36 hours a week i haven't done the exact count but i reckon i do very close to 36 hours a week (+ unpaid lunch where i usually do other stuff) that being said i do a lot of extra stuff in my spare time - write papers do courses blog etc but i do that from home (i've got a wife and a one year old) and usually after my boy has gone to bed my setup works well for me but what you consider a good work-life balance obviously depends on you many people in my office work from home once or twice a week (mostly once) and some work part-time edit: and it's not really any more stressful than any other job we have targets and deadlines to meet but as long as you manage expectations it should rarely be stressful thank you for the response! :) i also get paid for 36 hours a week working 4 days a week and that's about what i work it's not particularly stressful overall and the work-life balance is nice this is actually very close to what i was going to post (down to waiting for my 7 month old to go to bed) i want to add that for me that extra stuff doesn't really add to the stress as all those courses and research are so interesting! 
depends on the job you take i specifically applied only to jobs that had good work life balance because it's very important to me right now in general startups are more stressful / more hours than bigger companies but there are tons of exceptions to that at my job (fairly large tech company that isn't facebook google amazon) i work 9-5 and staying late / working longer hours is almost frowned upon within my team i'm exploring new technologies and doing r&d so there are few strict stressful deadlines agreed with startups but a startup environment can be great if you work with older people i work in a startupy environment and a lot of my coworkers have families and teenage kids so they really push work life balance people don’t care how much you work as long as you get everything done depends where you work consulting and investment banking tend to have longer hours i work at a consulting firm that also develops software the software development side has a great work - life balance and developers rarely work more than 45 hours a week consulting work is much more dynamic and requires the company to take advantage of opportunities as they come up as well as execute work relatively quickly the amount of hours an average consultant works per week varies wildly and ranges from 40 hours to a hard upper limit of 70 hours the data science team is sandwiched between both sides of this business typically the work load of a data scientist depends on the project we are assigned to for example - i worked on a consulting project last fall the client wanted to know if we could develop a software solution to the problem now that it was better characterized and now i'm working with a development team on the software extension of that project so half a year of shitty hours - to normal hours my colleague was assigned to a development project from the get-go and has yet to experience the consulting side of a project i work 10-12 hours a day (lunch included) really depends on the company though 
the place i was before it was more like a 7-8 hour day less stressful than a back up production dba i was thinking of making a website listing all skills required for someone to become a data scientist: minimum mid level and expert based on the data science community's professional opinion i'm talking about from the ground up listing the skill or knowledge set and the reason for needing it and its importance level people will get to put in skills they think are important and it will be up or down voted by the community example: partial derivatives - importance level 7 - usage: xyz you should be able to :abc123 the purpose behind this is to come closer to defining the necessary skills for data scientists and make it easier for those who want to excel in the field to have a clear path with no doubts i think it will have many more benefits go or no go? this is fine if your goal is to focus just on a compendium of common techniques in data scientists' toolbox but i think that actually accounts for a fraction of the primary skills that make someone a data scientist especially at a higher level after all how do you quantify abstract concepts like: curiosity ability to learn proactive and self-motivated communication storytelling problem solving-oriented multitasking parallel thinking those are far more important than any set of quantifiable technical skills you can come up with this all said the role of a data scientist is quite broad so i think you are going to have a very hard time reconciling both the breadth and depth of technical skills into a single list if that were the case then we shouldn't advise anyone in any field of study the skills you mentioned are more experience-based which i do agree are important but let's be realistic without knowing the basic hard skills then they mean nothing there are many people who want to know what they should study and focus on and how it will help them grow in the field all the other stuff comes with experience and practice you 
posted this saying "go or no go?" and someone is giving you advice as to why it's a "no" but now you're being defensive about it do you really want feedback or not? yes i fundamentally don't believe it is possible to become a data scientist without experience this recent blog post sums it up very nicely that data science is a practice not a particular skill set as for basic skills obviously you need some basic level of math and programming ability (though sometimes not much) but out of the 100s of data scientists at my company most have a very distinct educational background there are perhaps 3-4 who shared a significant amount of the same technical skills i did when they joined the only commonalities that i can think of is that probably all of us had at least one course in probability and statistics and had written at least a few lines of code in some language but even that i can think of a few people who did not what is common among a pure math phd a stats phd a molecular biology phd a physics phd a cs phd and a systems science phd are the skills i listed above are you for real? 
:) i wouldn't want to work for your company for sure :p on a more serious note i think you are missing the bigger picture to the idea im sure others can share their opinions as well i'm not missing the bigger picture i just think your premise is flawed for multiple reasons like i said even if you ignored my issue with what a data scientist is your list is still going to have a massive problem handling breadth and depth for example take something like clustering (i e cluster analysis) off the top of my head i can think of at least ten distinct sub-areas within that area (e g hierarchical centroid density graph fuzzy etc) each which has decades of associated research and hundreds (or thousands) of books dedicated to the topic you could likely write a very large compendium of only clustering skills and have it become a massive endeavor yet still not fully cover the topic despite all of that there are data scientists who won't know almost any clustering methods since their background is focused in a different area based on your argument all the masters phd's and other courses offering ds education are flawed so do you have a solution to bring things closer to a workable solution ? 
there is nothing that can be perfect but think about what we can do to make things a bit easier for people trying to get in the field and those already in but want to become stronger to some extent personal development in data science is going to be a pretty individualized thing now if your focus is purely on something like "machine learning" "statistics" or "programming computer science" i think you can quantify those to some extent just by looking around at the coursework for degrees in each of those as someone who has to hire from the pool of master's and phd students - yes those courses offering data science are massively flawed if you think the role is as simple as a checklist i suggest you spend some more time in the industry "as someone who has to hire from the pool of master's and phd students " can you please further elaborate? thanks https://www.udemy.com/data-science-beginners-guide-to-the-command-line not even close to what i am suggesting plus this is not community based so there is a large bias example: partial derivatives - importance level 7 - usage: xyz let's do that for writing letter "a" is pretty important i give it a 10 but "m" is just a 9 and "x" a 3 do i need all of them or can the community guide me to learn only the letters that are most important /s you can't learn ml without learning all the basic concepts just take a nice intro course and learn them all what will you do when you don't understand the bias variance tradeoff the difference between regression and classification boosting vs bagging regularization and why it's necessary etc does a data scientist have to know machine learning? you can try to use pure statistics but why? 
my point was more about whether machine learning is actually a prerequisite skill that must be learned to become a data scientist i think we are looking at this from a point of view of someone who already has a strong understanding of the field we are looking at it from the perspective of a beginner to mid level data scientist for this stage i think the best thing to do is andrew ng's machine learning course i know something changed after i finished that course i gained an insight and confidence that i didn't have (and a lot of knowledge) andrew has a gift explaining too bad it's only a short course and there are other good ones but this one remains special job description is as follows we need: -an excel connoisseur know excel backwards and forwards and be comfortable with using advanced techniques to collect display and interpret data -sql expertise proficiency in sql is required r python java and or c++ are a bonus -nosql tools have familiarity with at least one nosql tool; hadoop hive pig spark h2o etc -a data miner be able to extract patterns from large sets of data in order to deliver business insights from data basic qualifications -currently enrolled in a college or university and pursuing an associates or bachelor's degree in computer science statistics or related major -strong applied experience -creative thinker who knows how to create real-world products -curate acquire & maintain datasets -create and manage data in traditional and non-traditional ways -develop processes using java scala sql shell etc -share engineering support release & on call -estimate plan & rollout changes -key skills: -sql java python nosql tool ie: hadoop pig spark h2o etc is there a set of stuff that i should do to prepare for all of that? 
i'm relatively new to data science did an internship at a large company last year but didn't get too much experience from it because i was unprepared going in would love to be prepared for this one i know some of the basics of data science from taking some classes on udemy etc seems more like a data analyst position look at datacamp and dataquest for handholding skill learning kaggle for more practical projects if you can do all that you probably don't need the internship experience hope it's at least paid is it just me or is that a really weird job description? know excel inside out plus sql java and hadoop?? plus none of those listed are actually nosql -- hbase cassandra mongodb i'd buy the others are just big data programming environments edx has some good big data courses that will get you the hadoop side mfw mongodb marklogic aren't listed but hadoop is smh i don't understand all of those are clearly nosql even hive is nosql sql has specific requirements that none of those environments enforce it feeling like sql doesn't make it sql your definition is pretty non-standard then https://en.m.wikipedia.org/wiki/NoSQL i don't see hadoop hive pig spark or h2o referred to anywhere in there nor in 6 years as a big data consultant have i heard them referred to as "nosql tools" most default hadoop distributions only have hbase that would fit in that category in most standard definitions key value document or object graph storage mechanisms that do not enforce acid relational paradigm would be part of the definition which none of those tools are that first sentence tells me the employer has no idea what data science is or how it should be used general rule of thumb: if any data science job description mentions excel run away are you kidding? 
it's an internship not a posting for a sr data scientist also excel is a fundamental tool that anyone working in data should know well enough 100% serious if it can be done in excel it is not data science excel is a decent tool for data summary extracts but that is the extent of its use in data science if the internship is really for a data scientist position even the hr people should have a basic understanding of what will be involved could just be an hr person mix mashing a template if true even more reason to run seems like you need to learn excel learn that sometimes it's easier better to just do things in excel than importing the data to a different platform if your interest is programming consider learning vba? i have 4 months till the internship begins are you in school? if so they clearly think your interview showed that you are strong enough keep up what you know strengthen what they want that you don't since you've already been accepted consider reaching out to your future supervisor for advice? i am currently a sophomore in college my interview went well but it did last year for another company as well i went in with prior knowledge and came out feeling like i didn't learn too much since that didn't work out as well i really want to be prepared to go above and beyond for this internship foreword: this is geared as a question more towards data analysts rather than data scientists the difference i was told was that data science is much more programming heavy and involving of statistics and data analyst was more indicative of working with or manipulating that data so this is just to clarify that this is geared more towards the analysis side so i’m curious to what extent have you been comfortable with python in your current or previous jobs? 
i’ve met a lot of “analysts“ at corporations that have absolutely no skills in python or even r admittedly many of them do not have the title data analyst so i don’t think they are really good source to ask about so that’s why i’m here! i’m wondering what level of familiarity data analysts have with python i’ve seen posts here indicating that some people have just a working knowledge and that they are not masters of it while others have absolutely no knowledge and get by just fine i’m genuinely curious not expert but enough to slop my way through googling solutions and making my little py scripts work are my scripts good and lightweight? nah are they written like expert py programmers? no way can i sit down and code my ass off and not reference anything? nope can i go more than 4 basic lines of importing packages and my datasets without googling what to do next? no way do they work? hell yes have they made us literally tons of money ($1 000 000+ usd) from pattern recognition etc yep [deleted] basically we collect loads of data on people their searches on our website and then personal data when they make a purchase all of this data goes into a massive csv sql table some where that we can then extract with specific parameters for example: give me all of the data from people that purchased from 1-1-17 to 9-30-17 it dumps that little query into a csv then i usually write some little script in r looking for any funky correlations kind of a blood trail then once i have that i usually cut down some of the junk columns so it’s a little more compressed sometimes if it is still too large to mess with i have to run it through aws because my work issued lenovo laptop is a turd and i can query it even faster then i can easily manipulate it in py the hard numbers data i usually leave in r just seems easier especially if i do not need to throw it up to aws now if i do have to throw it up to aws or i am text mining py is my go to text mining is wayyyy easier in py i find numbers 
mining is easier in r one other thing i do if i am on the bloodtrail stage is pull the csv in to tableau sometimes it’s easier to see it visually with graphs but back to your py — so i have little scripts that run correlations decision trees etc the usual stat suspects i also built some py webscrapers using beautifulsoup that was slick ideally i’d like to get into webpage dev with django it’s pythons website framework-ish but here’s what i’ve found- with py you kinda build a one off tool for your project you may build it for a project and never use it again but these elite py programmers that build webpages build something that is obviously reusable but isn’t big data worthy in fact these “elite py programmers” can’t even tell you shit about numbers and stats some are ok but most can’t i hope this is what you’re looking for! i can’t be more descriptive because i work in big data (which is probably small data compared to most) for a fortune 500 and they are super uptight about what i can share with others if you have more ?s respond back to this post and i’ll do my best to answer sorry for any typos i am on my iphone best! 
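A rough sketch of the "blood trail" step described above: take a date-filtered extract as CSV, then rank the numeric columns by how strongly they correlate with a target column. Everything named here (`purchase_date`, `searches`, `pages_viewed`, `order_total`, and the rows themselves) is invented for illustration; the original poster did this in R, and this is only one way the same idea might look in plain Python.

```python
import csv
import io
import math

# Hypothetical extract: column names and values are made up for illustration.
RAW = """purchase_date,searches,pages_viewed,order_total
2017-01-15,3,10,120.0
2017-03-02,5,14,180.0
2017-06-20,2,9,90.0
2017-09-30,8,20,260.0
2017-11-11,4,30,150.0
"""

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_by_correlation(rows, target, start, end):
    """Keep rows with start <= purchase_date <= end, then rank the other
    numeric columns by absolute correlation with `target`."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    kept = [r for r in rows if start <= r["purchase_date"] <= end]
    y = [float(r[target]) for r in kept]
    scores = {}
    for col in kept[0]:
        if col in (target, "purchase_date"):
            continue
        x = [float(r[col]) for r in kept]
        scores[col] = pearson(x, y)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

rows = list(csv.DictReader(io.StringIO(RAW)))
ranking = rank_by_correlation(rows, "order_total", "2017-01-01", "2017-09-30")
for col, r in ranking:
    print(f"{col}: r = {r:+.3f}")
```

Correlation only picks up linear relationships, so this is a first pass for deciding which columns to keep, not a substitute for the decision trees and other models mentioned above.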
it really depends on the organization's needs the analytics field is new to a lot of companies and they truly don't understand what skills are really needed i just recently entered into a data analyst role but my skills are probably different than others i am about average on most tools used for that role i can write basic python can build dashboards in tableau and powerbi write sql build workflows in alteryx and knime and i understand database architecture i suck at vba and i'm above average in excel but no expert my point is having the ability to gather data from multiple sources and analyze that data is the key to that role it doesn't really matter what tools you use to get there as long as it's accurate a data analyst has to gather and interpret raw data then massage it to the point that non-technical people can understand it and present those findings this here matches with my experience and that of my friends colleagues in my experience strong sql skills curiosity and analytical instincts are the most important things every business has different needs and their own unique software stack heck if you work at a large enough company these things might even vary between or within teams knowing python in particular shouldn't be necessary for most analyst jobs but having familiarity with it and or programming concepts are very good if you like python and want to specialize then by all means do it! 
then search out positions that want that skill i don’t think anyone is good at vba my friend i almost cringed when i saw that :p you can ask me to learn anything just don’t ask me to learn that vba isn't as sexy as python r and the like but its damn useful everyone has office and everyone has spreadsheets writing quick vba scripts to clean or prep your data for analysis has saved me loads of time not saying it's required but as is the theme here it's another feather in the hat of one's capabilities many a data analyst has fallen victim to relying on vba only to realize that it is more or less incredibly limited it takes more time to customize and set it up to be used across different workbooks than it does to just do it yourself repeatedly example: selecting all data creating a pivot table repeatedly on different workbooks even when the data is more or less the same size and shape it's tough because the macro recorder outputs the vba specifically for that exact workbook which is just plain silly so you have to then go in and tamper with the vba and troubleshoot it extensively just for a simple task jeeze i'll take my chances doing it myself :p i hear you good points we use a bit of everything and are exploring as many new solutions as we can a touch of vba has served us well for years completing routine tasks with static data sources it's workable with dynamic sources as well but as you say that brings more coding perhaps it will ultimately go the way of the dodo but for now it continues to assist us daily and probably won't go anywhere soon without a considerable effort to flip everything over into py or other old habits die hard i have some python experience at my current job it serves as a nice 'glue' to put a lot of things together for example i am automating document generation for a specific report we do for geographical areas where all of the data that needs to be changed can be stored in a csv so i use python (and jinja) to interface with latex and can 
generate hundreds of these things very quickly for any statistical analysis i use r but i am definitely not a data scientist i am usually manipulating and reporting on the data rather than managing it gathering it or mining it directly this may help also — tools i use regularly— excel ms access basic basic sql queries to csv py r studio — love the summary function! tableau amazon web services azure — basically aws but i think it is easier to use and is cheaper for my company i like jupyter for running my py scripts openrefine — super easy to use scrubs shit data great tool i work in adtech and my job requires no python or r whatsoever i've gotten pretty good at sql over the last couple years but i do all my data manipulation in excel i am learning python to advance my career prospects but that's out of personal desire not a work requirement your last paragraph is literally me… a lot of employers specify they like people with python but not a requirement only strong excel skills but i have a lot of fun with python so i do it for fun but if any employer needs those skills all the better you know? also sql seems incredibly easy to me i feel like if you try something ridiculously hard like python for long enough when you get to something a lot easier like sql it’s just a breeze for you you know? people think writing sql statement is a nightmare but when you are writing function calls and unit tests for python on a daily basis some sql is easy people think writing sql statement is a nightmare respect the join respect the null grasp cardinality it takes some time to get used to many junior programmers find it creepy they all had it at school and many did not have fond memories nor good grades for the subject try something ridiculously hard like python hard? 
please define easy or give an example (computing language) oop is a dime a dozen these days java c c++ the list goes on python is a game changer especially when doing machine learning or something like neural networks many things you’ve never even done something remotely close to it’s fun but hard java more or less is super easy but clunky naming conventions are super strict and it’s aged so it’s been standardized more i do all my data manipulation in excel i'll take "nightmares i hope to never have" for 300 trebek yeah my python of pretty rudimentary but i'm starting to realize how much time it's going to save me i worked in an insurance company and an isp as a data analyst never needed python sql is only thing i needed to know very little python or r - - most sql in access excel and tableau but im still learning python and r on the side because you never know when you might need it :) i am data analyst for large bank here in cali i didn't know python and could handle most regular demands (except for forecasting) if you know sql and a cube tool (such as excel pivot tableau ) you can get a long way but i was frustrated when trying to introduce algorithms for clustering or routing problems in these cases i extract from the relevant sources then transform it in excel into whatever form required then fed the result to a tool that offered the specific functionality i needed (heuristics lab heal lkh-solver ) then use this output transform it again and upload to the system the main problem with this approach was the need to switch between interfaces and port data between the tools: doable but cumbersome also if more often than not the dev-team they say: "man we can't do that unless you can arrange for some months in the planning " so i picked up python as a glue and use it as a way to shame the devs into expanding their horizons it's one thing if you offer a proof of concept consisting of a rag-tag of tools or to offer a solution written in 1 interface language if they 
can't rebuild it they can always use build a small application process interface it soon also became obvious it was almost impossible (at my level of expertise) to find functionalities that require different tools looking back at it i used excel as a wrapper then i discovered python excelled at this so what would you say is the most important to know as a data analyst in regards to python? what will the brunt of your work in python be related to provided that you know it? so i don’t go studying some wild ridiculous topic that has no relevance i'd start with regular expressions it makes the task of cleaning data much easier (those messy user input fields that you tend to avoid can actually be used) then move on to pandas then look into matplotlib also try jupyter as a working environment but to be honest i like to do wild stuff to be more precise: use other people ridiculously brilliant approaches (like how you gps plots a route) you could give it 100-200 hours if it's not your candy drop it if it is you might be able to solve problems most analyst avoid because they don't know how to program and that many developers avoid because they prefer to stick to +- * hahaha that's funny i've been working in reverse of everything you said! i started with pandas then learned jupyter and i have regex on my list i never really understood the applicability of regex though :-\ i'm not good at learning something arbitrary out of context but if i learn it while doing something useful then it sticks like glue sql like is nice but i want a like in with all bells and whistles possible this is regex can you give any super cool examples? 
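For what it's worth, here is a small sketch of the two regex uses mentioned above: cleaning a messy user-input field, and a "like in" style match where one pattern stands in for several `LIKE '%...%'` clauses. The sample strings, the phone format, and the city names are all made up for illustration; only Python's standard `re` module is used.

```python
import re

# Hypothetical messy free-text field, invented for illustration.
raw_phones = [
    "(555) 123-4567",
    "555.123.4567",
    " 555 123 4567 ",
    "call me maybe",
    "5551234567 ext 9",
]

# Three digits, three digits, four digits, with optional brackets/separators.
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")

def normalize_phone(text):
    """Return the first 10-digit number in `text` as 555-123-4567, or None."""
    m = PHONE.search(text)
    return "-".join(m.groups()) if m else None

cleaned = [normalize_phone(p) for p in raw_phones]

# "LIKE IN": one pattern covering several spellings and several cities.
CITY = re.compile(r"\b(st\.?|saint)\s+louis\b|\bkansas\s+city\b", re.IGNORECASE)
notes = [
    "Flying out of St. Louis",
    "saint  louis trip",
    "Kansas City weekend",
    "Chicago layover",
]
matches = [n for n in notes if CITY.search(n)]

print(cleaned)
print(matches)
```

The same patterns can be handed to pandas string methods once you get that far, which is roughly why regex first and pandas second is a reasonable order.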
being at a wedding currently i have to decline would take some time to provide clear examples just imagine: a like in operator (rough sql equivalent is join on like sad performance though) why are you on reddit at someone’s wedding to begin with ಠ_ಠ are you one of those people who is inescapably attached to their phone and uncomfortably checks it when in awkward social situations? the party was hosted at the place i live i never a smartphone for forums just good old pc i have to admit don't always enjoy large social gatherings where there are few people i know depends on my mood and the public if there's talk about poetry philosophy or maths i can have lots of fun if there's talk about telepathy the moon influence on plant growth and auras py forecasting is shady when i want stuff forecasted i plug it into tableau sounds shady but my results are damn good like +- 1 5% for monthly forecasts i’ll take that all day like +- 1 5% for monthly forecasts i’ll take that all day if the value of the deviation is low compared to your cost agree 100% but can we do better if the value is higher? yes you could is this a sensible choice? depends on your appetite py forecasting is shady what models does tableau use? 
i doubt the mathematics used are different i know a long shade is cast by the fact that in py you'll spend hours or days rather than minutes learning the specific application functionality but there is also a small light: the sheer breadth of options available here’s the way i look at it (and perhaps my response was harsh) — we spent many man hours perfecting a py model the total man hours cost was +$3000 tableaus cost was $750 for a license and it took use 1 man hour to get the result total = $820ish and the model was + 1 5% you have to understand why we went that route and our biz is seasonal so just because we grow from sept to oct by 2% this doesn’t mean i’ll grow in shit months like jan or feb understand that i am in the midwest travel biz i hope that makes sense tableau picked up on that seasonality [deleted] oh look an analytic prick shows up how nice tableau's visuals are fantastic and take about 30 seconds to do and they have spent tons of $$ perfecting their seasonalities within their forecasting modules op here is another fun fact about the py job field: it is filled with know-it-all pricks with 0% social skills on an offline hello there my partner and i both lead data scientist teams in the online travel sector we often hire grads and phds who have the technical skills and theoretical knowledge and then help them develop the additional competencies needed to become competent data scientists after doing this for many times we developed a pretty effective approach with a decent success rate recently we spent some time here on r datascience and noticed that many people here face a similar challenge: they have finished a data science book course or even online degree yet still felt that there is a gap between the technical stuff they learn and the skill they need to become competent data scientists? 
if you are one of them we would love to have 5 minutes of your time to answer a few questions here: https: goo gl forms wnlup0kxrdza5v312 we are considering to develop a course to bridge this gap your feedback will be super valuable for us thanks for your time in advance! would you keep me up to date about the course? thank you for the interest psylekin if you leave you email in the form we will keep you posted when we start testing the course if we fill out the survey will you use that for some form of newsletter? would be nice to get updates hello andy_d0 the answers we gather from the survey will influence the format and content of the course since we can't fit everything into a 3-6 month course we need to prioritize the topics and see what is the best way to get them through if you leave your email in the form we will keep you informed when we start testing the course or have other updates rest assure we won't spam you ;-) i'm very interested in all aspects of this form submitted thanks! thank you for the feedback too ib33 we will keep you informed :) sounds like a good idea filled out the form and would like to be informed of any developments hi everyone; i'm (supposedly) a data engineer who found himself doing mostly sql work and some data science work(feature engineering) the project i got assigned to is basically developing a churn model so first we've got to come up with and calculate some features(which is what i'm working on; selecting features from various datasets and suggesting new ones) i have some questions: is it normal that the only code i've written so far is various dialects of non-procedural sql? 
i came with an expectation that my main work will be using python scala to build data pipelines and have already expressed this desire the take-home test was exactly that - parse this json file schedule the load into database query it it was fun but at my current project the moment the fun part - gathering and parsing - has already been done by another team(data governance) and it seems my sole responsibility for the foreseeable future is to query it i sure hope that this is going to change; the job requirements listed python experience and the take-hoem test as well as interview tasks all reflected that i seriously don't want to have my coding skills to atrophy so i'm concerned i wouldn't consider writing sql to be necessarily far afield of data engineering are you being asked to provide periodic refreshes of the feature data? if so maybe they're expecting you to automate that process so here's the exact situation: i have to extend a data mart(a very large table with dozens hundreds of columns) where each column is a candidate feature(variable) for a yet-to-be-developed churn model to accomplish this i have to write a shitload of queries get lots of data from teradata and sas and use it to extend the main data mart in hive with that data i can also suggest new candidate features and do so sometimes sounds like a de task to me a little disappointing that the ds team isn't familiar enough to do that themselves but this doesn't seem explicitly out of scope i think it's definitely fair game to mention to your manager that this may not be an efficient use of your time - it's also fair game for you to figure out a way to automate this if your title is a data engineer i would expect you to be doing plumbing not writing queries i suggest approaching your line manager to discuss your responsibilities if what you are doing now is very different to what is in your job description what was advertised then you can express this: "based on the job description and the 
interviewing process i was under the impression that i would be doing x most of my time is currently spent doing y i fear i am not being challenged enough and would like to do more work like x will there be any opportunities for doing x in the near future?" but what does plumbing really entail? i have lots of data in teradata and sas; i must use it to enrich a data mart in hive; is this a real data engineering task? in my experience writing sql is definately a data engineering task (you can call them ddl's if it makes you feel better) analysts might be able to write ad-hoc queries but your job as an engineer is to set up infrastucture around ongoing processes e g unit tests on your code monitoring alarms partitions indexes etc you can look at using an orm like sqlalchemy if you're keen to work in python if you want you can also familiarise yourself with the code that's doing the etl that way when you need to bring in some new data in the future you can make the change to the code yourself and have it reviewed by the team that maintains that code thanks for the perspective in light of a recent thread i've spent some time compiling a list of data science and interview questions that aren't necessarily related to technical skills like programming but more in line with problem solving and a general understanding of the field some of these are questions i have personally asked data analyst and data science applicants that have come through my company and others are ones i have taken from various sites or borrowed from others the goal of this post is to promote thoughtful discussion around how to answer some of these questions while providing some sense of what hiring managers may look for when interviewing some will have a right and wrong answer others won't and some will be easier than others have fun and enjoy!! :) explain the difference between l1 and l2 regularization methods? tell me the difference between an inner join left join right join and union? 
estimate the number of 'happy birthday' posts that are logged on facebook everyday you have a data set containing 100k rows and 100 columns with one of those columns being our dependent variable for a problem we'd like to solve how can we quickly identify which columns will be helpful in predicting the dependent variable identify two techniques and explain them to me as though i were 5 years old what is the central limit theorem and why is it important in data science? what are your 3 favorite data visualization techniques? how do you handle missing data? explain the 80 20 rule and tell me about it's importance in model validation in your opinion which is more important when designing a machine learning model: model performance? or model accuracy? what is one way that you would handle an imbalanced data set that's being used for prediction? (i e vastly more negative classes than positive classes ) explain the following parts of a linear regression to me: p-value coefficient r-squared value what is the significance of each of these components and what assumptions do we hold when creating a linear regression? given tweets and facebook statuses surrounding a new movie that was recently released how will you determine the public's reaction to the movie? explain like i'm 5: k-means clustering algorithm i have two models of comparable accuracy and computational performance which one should i choose for production and why? what is your favorite thing about data science? if this post gets enough attention i am happy to provide more enjoy! edit: word i would love to know how one can explain the k means algorithm to me like i'm five sounds like a really interesting question! 
you have three bowls of chips at a children's party the children are as lazy as they are hungry your job is to place the three chip bowls in the party room first you foolishly place them randomly around the room as you begin to walk away the children who are far from thechip bowl that's closest to them begin to whine "i don't want to walk that far to get my chips!" you being the spineless jellyfish you are move the bowl closer to the whining children unfortunately for you this causes the children who were close to the bowl to whine so you move the second bowl closer to these new whining children but that causes a third group of children now further from their party snack bowl to start whining you start to see a pattern every time you move a bowl some children reduce their whining while others start whining more your solution is to continually move the snack bowls toward the larger groups of children while not getting too far from the smaller groups of children eventually you get to a point where moving the snack bowls in any direction will cause an increase in the amount of whining you'll have to hear so you declare your job finished then curse yourself for volunteering as a chaperone eventually you get to a point where moving the snack bowls in any direction will cause an increase in the amount of whining you'll have to hear so you declare your job finished then curse yourself for volunteering as a chaperone how do you know that this configuration of 3 chip bowls isn't a local minimum? maybe theres an even global minimum configuration out there that will have even less whining? 
you don't know, but the k-means algorithm doesn't always give the global minimum.

this is the clever interviewer follow-up. duh duh duuuhhhhh

imagine you and your friends spilled cheetos on the floor and you want to divvy up the work of picking up the cheetos among your friends. you're all lazy, so you decide that each person should have a small area from which to pick up the cheetos, but you don't really care if some people pick up more cheetos than others. the important thing is that they don't have to walk around much when doing so. first, each person throws a ball in the room randomly. you then assign each person the cheetos that are closest to their ball. then you find the average position of all their cheetos and move their ball to that average position. reassign the cheetos according to the new ball positions. continue to recalculate each friend's average cheeto position, move the balls, and reassign the cheetos until everyone's cheetos stop changing ownership from iteration to iteration. you have successfully implemented k-means.

okay, so how many 5-yos are going to understand iterations and averages?
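for anyone who would rather see the cheeto version and the local-minimum follow-up as code, here is a minimal sketch in plain python. it is not anyone's production implementation: the function names, the `init` parameter and the four toy points are all made up for illustration. balls are `centers`, cheetos are `points`, and "whining" is the summed squared distance.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, init=None, seed=0, max_iters=100):
    """Plain Lloyd's algorithm. `init` (hypothetical parameter) lets you
    fix the starting centers; otherwise k random points are used."""
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    assignment = None
    for _ in range(max_iters):
        # Step 1: give every point to its closest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed ownership, so we have converged
        assignment = new_assignment
        # Step 2: move each center to the average of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    cost = sum(squared_distance(p, centers[a]) for p, a in zip(points, assignment))
    return centers, assignment, cost


# Toy data (made up): two tight pairs that are far apart from each other.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

# Both starting centers in the same pair: converges, but to a poor local minimum.
stuck = kmeans(points, 2, init=[(0, 0), (0, 1)])
# One starting center in each pair: finds the obvious grouping.
good = kmeans(points, 2, init=[(0, 0), (10, 1)])
print(stuck[2], good[2])  # 100.0 1.0

# Common workaround: several random restarts, keep the lowest-cost run.
restarts = [kmeans(points, 2, seed=s) for s in range(10)]
best = min(restarts, key=lambda run: run[2])
```

restarts don't guarantee the global minimum either, they just make a bad starting throw less likely to be the one you keep.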
touché. however, the eli5 sidebar does say "friendly, simplified and layman-accessible explanations - not responses aimed at literal five-year-olds."

just a shot: pick a number, let's make it three. put down three different colored beads on a table where there are a lot of other uncolored beads. now you have to color all the beads based on which colored bead they are closest to. once you are done giving all of the beads colors, take the originally colored beads and put them at the center of all of the beads you made that color. keep assigning colors to beads and moving the colored beads until the beads stop changing colors.

even though i fully understand k-means, this confused me.

yeah, fair enough :p i wouldn't be thrilled if i got that question on a phone interview.

you have a bunch of kids with marbles and you ask them to throw them directly in the air. without actually seeing how many kids there are, you have to form a best guess. so first you start out with 1 kid, but it's obvious there are several "pockets" of marbles since they threw them directly in the air, and even though some marbles seem really scattered, for the most part you can see groups of marbles on the carpet. you increase the number of children in your guess until all your pockets are accounted for.

these are really great, but 110% nothing like the ones i got when interviewing, which seemed to mostly focus on problem solving and my familiarity with certain kinds of methods. but fwiw i work at a relatively "mature" ds company.

totally! every company will interview differently. these are just some of the questions i ask interviewees. they help determine skill level and understanding of key concepts beyond technical requirements; from there we can break into methods and more detailed problem solving. what were some of the questions you were asked in your interviewing process?

lemme think. some more technical ones, like: what do you do to avoid doing the most likely thing every time? what is an interaction?
how would you quantify this particular problem? how do you deal with sparsity? trying to remember more. at my place of employment, though, most questions just required a knowledge of mle and an awareness of hlms. oh, one question that gets asked a lot is: how would you quantify and then make decisions under uncertainty?

"what is an interaction?" damn, depending on what level of detail they wanted, my intro to research methods undergrads could answer that.

well, i was then asked about how to calculate the degrees of freedom of an interaction. forgot about that part :p

still not seeing the hard part?

interview questions don't have to be hard. it's not something everyone remembers.

i took a statistical inference class on coursera and i could answer that.

"how would you quantify and then make decisions under uncertainty?" isn't that, like, all of inferential statistics? i'm not sure i'd know where to begin with answering that. like, are they looking for some connection between uncertainty and entropy? like, be hesitant about deciding on a hypothesis when you have a great deal of entropy in a distribution of beliefs or hypotheses?

no, i think the idea is you have n observations, and the question is what do you do when n is small?
first, how do you quantify uncertainty personally, and then, since there are lots of ways of dealing with that (e.g. priors, confidence intervals, etc.), pick one strategy and explain why. but really it's a vague question, so you can say something frequentist-y or bayesian-y; just explain why you like that approach.

they also give you a good understanding of the interviewee's social skills and ability to "layman down" technical details. this, although often overlooked, is a very important skill.

mmmm, i'm pretty sure i know where you work, and i also interviewed for an internship there last year, and some of these questions aren't too different. i also got asked about model validation, feature importance, and an open-ended question like that fb bday posts one. of course i'm not denying your experience; just wanted to throw out that knowing questions like the above still might help get a job at places like the one we have in common.

oh, i totally agree, but i think the character was a little bit less fun and also more practical, if i had to say.

i really like these, but sadly i can only answer 2-3 of them without some quick googling. also, you forgot the "you" in the movie question.

can anyone answer these two?
i have two models of comparable accuracy and computational performance. which one should i choose for production, and why?
what is one way that you would handle an imbalanced data set that's being used for prediction (i.e. vastly more negative classes than positive classes)?
articles, videos, books are fine too. i'm new to data science and these questions got me curious.

i have two models of comparable accuracy and computational performance. which one should i choose for production, and why?
verify that comparable accuracy means comparable precision, recall, etc. then go for the more interpretable one, or for the more established algorithm, i.e. if logistic regression does as well as an svm with a custom kernel, go for logistic regression. depending on the model, try to look into the logic using something like lime and choose the one that's consistent with your intuition. look at where each model is wrong and see if the cost of the error is the same. example: both models are wrong 10% of the time, but one is wrong about customers worth 100k and the other is wrong about customers worth 10k; choose the second.

what is one way that you would handle an imbalanced data set that's being used for prediction (i.e. vastly more negative classes than positive classes)?
many classifier algorithms allow you to weight errors, so use a higher weight for the rarer class. if you have enough data, you can undersample the negative class. alternatively, you could oversample your positive class.

you can reduce entropy using various techniques. i use this library: http://contrib.scikit-learn.org/imbalanced-learn/index.html

that's a very nice list! here are a few more that i sometimes ask:
what is your favorite machine learning / data science algorithm?
take a few minutes and explain it as if i were a layman (using a whiteboard to draw whatever you want).
write code to output fizzbuzz for integers between 1 and 100 (inclusive). (yes, i've had candidates fail this.)
(for folks who are reasonably comfortable with sql) suppose we have a dataset with users and items, and we have a specific table that determines which users have which items, e.g.

user_id  item_id
1        1
1        2
1        3
2        2
2        3
2        4
3        2
3        3
3        5

for simplicity's sake, assume that we only care about whether or not a user has a given item (and not how many of an item) and that each (user_id, item_id) pair appears exactly once. write a query that outputs distinct pairs of item_ids and the number of users that have the given pair of items, where each such pair of item_ids appears exactly once. for the input above, the output would be:

item_id_1  item_id_2  num_users
1          2          1
1          3          1
2          3          3
2          4          1
2          5          1
3          4          1
3          5          1

suppose you have a dataset and you're using it to build a classifier of some kind (regression, decision trees, svm, etc.). what does it mean to say that the classifier is overfitting the data? why is that bad? what can be done to help avoid overfitting?

no offense, but these sound very much like the questions i've been asked by hiring managers who are systems people and are trying to put together a data team.

can someone help with the kind of answers expected for these two questions:
estimate the number of 'happy birthday' posts that are logged on facebook every day.
given tweets and facebook statuses surrounding a new movie that was recently released, how will you determine the public's reaction to the movie?
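on the sql question a few comments up (pairs of items and how many users hold both), one possible answer is a self-join on user_id with `a.item_id < b.item_id`, so each pair is produced once per user and never in both orders. this is just a sketch, not the asker's reference answer; the table name `user_items` is made up, and it runs against an in-memory sqlite database from python so it can be tried without any setup.

```python
import sqlite3

# The example rows from the question: (user_id, item_id).
rows = [(1, 1), (1, 2), (1, 3),
        (2, 2), (2, 3), (2, 4),
        (3, 2), (3, 3), (3, 5)]

conn = sqlite3.connect(":memory:")
# `user_items` is a hypothetical table name; the question does not give one.
conn.execute("CREATE TABLE user_items (user_id INTEGER, item_id INTEGER)")
conn.executemany("INSERT INTO user_items VALUES (?, ?)", rows)

query = """
SELECT a.item_id AS item_id_1,
       b.item_id AS item_id_2,
       COUNT(*)  AS num_users
FROM user_items AS a
JOIN user_items AS b
  ON a.user_id = b.user_id      -- same user holds both items
 AND a.item_id < b.item_id      -- keep each unordered pair once
GROUP BY a.item_id, b.item_id
ORDER BY item_id_1, item_id_2
"""

pairs = conn.execute(query).fetchall()
for item_id_1, item_id_2, num_users in pairs:
    print(item_id_1, item_id_2, num_users)
```

`COUNT(*)` is enough here only because the question guarantees each (user_id, item_id) pair appears exactly once; without that guarantee you would want to deduplicate first.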
(number of daily facebook users / 365) * (fraction of friends that post hbd on the timeline * average number of friends per user). in my experience, talking about the assumptions of such a rough estimation is more important than getting in the ballpark.

yes, when i'm interviewing i'm less concerned with the number itself and more looking for how they approach the problem.

the first is a guesstimate question meant for the interviewer to gauge how you would work through an ambiguous problem. no right or wrong solution, but meant to test logical thinking and some probability. the second is another open-ended ask regarding an ambiguous solution. there is no right or wrong solution to this either, but it's meant to see how you would work through the problem and measure your creativity with applications of data science, nlp, ml, etc.

explain the following parts of a linear regression to me: p-value, coefficient, r-squared value. what is the significance of each of these components, and what assumptions do we hold when creating a linear regression?
and why is R² so horrible for determining the quality of a model, and name at least two better metrics.

the context for your application of linear regression is important, so simply saying r-squared is a horrible way to determine the quality of a model seems like a broad generalization. if you want to model the linear relationship between two variables, r-squared is perfectly acceptable. if you are using it to assess a model's performance, look elsewhere.

if all i care about is the correlation between two variables, r is sufficient. in the context of generating coefficients and statistical tests of significance for the coefficients in predicting an outcome variable, i can manipulate R² too easily for it to be a reliable indicator of the quality of a model.

"if you want to model the linear relationship between two variables, r-squared is perfectly acceptable." in this case you aren't using R² (the coefficient of determination), you are using r² (the squared correlation coefficient), which just happens to equal R² in simple linear regression. in the context of multiple linear regression they aren't equal. it seems like nitpicking, but r² is always the measure of correlation, which would be used in your first example, and R² is always the measure of variance explained by a model, and they just happen to be equivalent in some scenarios. i agree with op: R² is generally terrible for model quality in anything other than an extremely barebones quick and dirty assessment.

do you have some more information on why R² is horrible? maybe an article somewhere? cheers

sorry, there seems to be overlap of questions from '120 data science interview questions'. is it just a coincidence?
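to make the R² vs r² point from the comments above concrete, here is a small plain-python sketch (toy numbers made up for illustration, function names are my own). it fits a one-predictor least-squares line, computes R² from the residuals and r from the correlation formula, and shows they agree in the simple-regression case. it also includes adjusted R², one commonly named alternative, which penalizes the number of predictors; in-sample R² on its own can never go down when you add predictors to an ordinary least-squares fit, which is one way it gets "manipulated".

```python
from math import sqrt


def simple_ols(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Toy data, made up purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = simple_ols(x, y)
y_hat = [intercept + slope * a for a in x]

model_r2 = r_squared(y, y_hat)
corr_r = pearson_r(x, y)
adj_r2 = adjusted_r_squared(model_r2, n=len(x), p=1)

# In simple linear regression these two agree up to floating-point error.
print(model_r2, corr_r ** 2, adj_r2)
```

for judging predictive quality, an error metric such as rmse on held-out data is another option people often reach for alongside adjusted R².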
not at all i mention in the post that some of these are not my own original questions although i'm not familiar with this resource so i'll check it out ok here's the dilemma: i'm a harvard-trained phd scientist working in biotech i manage a team of scientists and deal with large data sets (~30 gb set of genetic information) and write my own python scripts to analyze them (although the scripts are fairly basic) currently i live in boston and need to escape this city i'm sure it's great for some people but i cannot stand living here any longer i'd much prefer to live in say montreal problem is in the places where i want to live there is almost no biotech for that industry options are basically boston or sf i want to live in neither solution: migrate to an industry that is thriving in montreal and other cities where i want to live that means data science my question is: does my management experience science chops (phd and postdoc from fancy schools in genetics) limited coding skills and comfort with large data sets qualify me for anything decent in data science? or do i need to retrain? additional question: is data science employment in a bubble or will there be lasting demand? 
the reason i ask is that bioinformatics (data science for genetics basically) is in a huge bubble the reason is that many bioinformaticians are being paid to do very basic work that is quickly being replaced with off the shelf software thanks for your replies they are appreciated calling u omega037 i've responded trying my best not to doxx myself too easily people with your background certainly can migrate into data science roles especially if you have strong experience in statistics and or programming we have several data scientists who have backgrounds in things like genetics genomics computational biology or just biology sometimes these people are hired directly (i e a job posting for genomics data scientist) but often they begin in a role on an analytics team that is peripheral to a data scientist and then make the transition either naturally (if they already have the requisite skills) or by developing their programming statistics skills while in their role as an aside while there is certainly more concentration of employers in areas like boston and san francisco there are biotech data science jobs all over the place my company is in the midwest and employs hundreds of data scientists my company is in the midwest and employs hundreds of data scientists that's good to know i'll certainly keep my eyes open for those kinds of opportunities interesting username it's a commentary on society's view regarding 'toxic masculinity' interested in hearing your evidence for bioinformatics being in a bubble i think there's a big difference between phd-level bioinformaticians developing tools for analysis of ngs data and phd-level bench scientists pushing the right buttons on those tools to identify differentially expressed genes or whatever and then calling themselves bioinformaticians interested in hearing your evidence for bioinformatics being in a bubble sure for context i work at a company with a biotech core and the people running the machines and doing the analysis 
agree with this assessment i'll say this right up front: i am not saying that there are not bioinformatics problems that are outside the reach of people not specifically trained and experience in the field just that those problems are not what's the bread and butter that many bioinformaticians are being hired and retained for much of the actual tricky cs is now routinely done with packages that are easy to work with eg bowtie for alignments now that i've made that disclaimer there's a few factors at play: 1) the companies selling the hardware have not released software capable of performing a lot of standard bioinformatics analysis this is because they want you to pay them for that service 2) the result has been that each core needs dedicated bioinformaticians unless they want to pay illumina or whoever the service fees 3) much of bioinformatics exists in the following category: easy enough for a motivated person to teach themselves in a few weeks using code-academy and other online resources but complex enough to turn most people off from doing so 4) off the shelf third party software solutions are becoming more common there are examples of cloud pipelines that are very similar to the kind of pipeline that you or i might pipe together in a unix environment so basically a lot of what bioinformatics are being hired for is nothing near the level that you need a dedicated cs grad for this is why so many bioinformaticians have merely migrated from other fields with a little on-the-job training hell there are companies who will hire you with not much more ability than being able to do basic string manipulation in python that feels like a bubble to me more and more biologists are doing their own nga i do my own as i'm tired of waiting for the core to get around to my files they and i do pretty much the same thing and get the same results meanwhile i trained myself to do it in about 2 weeks including learning python as an anecdote my company now refuses to hire biologists 
who make use of ngs but who cannot do their own nga the ability to do your own nga is becoming a requirement we will no longer hire a dedicated bioinformatician for a lot of tasks that we used to it feels like the bubble is bursting i'm another data-literate fancy-school phd making the transition to data science i'm currently enrolled at insight data science it's a free program exclusively for phds i'm five weeks in of seven and love it you can check it out here: http: insightdatascience com thanks do you feel that this kind of program is necessary to transition? certainly interested in the boston program probably not i like to think of it like this: if you spent 7 weeks doing your own job search you'd probably get a few in-person interviews i'm doing 7 in-person interviews for companies that are specifically paying money to hire insight fellows you get practice with the interview process (i couldn't whiteboard coding interviews well) etc all good points will look into it it seems like the golden handcuffs may keep me from running away today but we're expecting an ipo soon at that point hell yeah for this sort of thing following if you are interested in a director of data science roll at a fintech start up let me know i'd love to set up some time to chat on skype fintech start up i've done some financial modelling recently very interested heading to work but will send a pm tonight *role - though your version sounds delicious i work for a large fortune 100 company on the bi reporting team i work with large datasets on a daily basis i mostly use sql with oracle and teradata i have my bachelor degree in information systems the information systems degree i have required a good amount of calculus and stats i’m pretty damn good at my job i’m looking to make the next step in my career and add to my skill set i find it hard to justify 60k from a school like usc for a m s in business analytics or even 30k for the same degree from csu cal poly my question is what do you 
think about the microsoft professional program for data science? it’s cheap and it looks like it can help get my feet wet with python r etc thoughts? great question! i'm now five years into a data analyst career track and i know how tough this particular transition can be (developing new skillsets) taking some kind of education program is ok and is better than doing nothing since you likely won't learn those tools as your current spot the alternative - and this is what i've chosen to do in order to develop skill sets - is to find a new position at a new company where you'll be stretched and challenged by using new tools my first job let me learn sql excel very well my next job introduced me to r i changed again and now am in python and pyspark (for communicating with amazon's distributed database system) each job hop i made allowed me to learn a new tool because the new job required that tool (edit: each also gave me a raise of 30-40% which was nice too) it's way easier to get better at a tool that you have to use day to day - i find i struggle learning stuff that doesn't apply to what i am doing this isn't much more helpful than saying "find a new job" but i really believe that's true new jobs is the method i've used to advance my career any time i sense i am stuck and feel like my potential is being artificially capped - which is what it sounds like you're running into now i 100% agree with the above statement because it is what i did too to level up my first job as an “analyst” i was fortunate enough to be doing r gis tableau matlab python for a non-profit research group however i wasn’t learning from any mentors who were more skilled than me because of the size and resources of the non-profit i then switched to a medium finance startup and did more modeling in r more sql i ran into the same problem at this medium startup of lacking a really smart mentor or even a team of smart data scientists who wanted to go above tableau and sql after 2 years at each of the 
prior companies as more of a bi analyst who does statistical modeling on small data sets (millions of rows) i made the switch to data scientist at a publicly listed company with an endless amount of streaming data i am getting up to speed on best practices and small time saving tricks (vim tmux jupiter notebooks juno lab docker pyspark impala) each time i moved jobs after 2 + years i left on a good note and i received a significant pay raise (30-50%) that i couldn’t get through negotiation or showing the value added of my skill set above a bi analyst more importantly i’m learning from people now who are much smarter and experienced than i am they are in the pattern of building their own tools for making their lives easier which is something i never saw in my bi role where we looked to vendors to fill that need this is why jumping ship can be more favorable to say negotiating a tiny raise and a title change i find employers thought of me as a fixed expense and even asking making a case for a 20%-30% raise after proving i was delivering value beyond my role was very time consuming especially when you throw hr performance review processing into the mix if i had stayed at that first non-profit and settled for a tiny raise i would be making 1 2 as much have fewer transferable skill sets fewer connections with the relatively same work life balance that’s why when my uncle might scoff at me at thanksgiving for finding a new employer again after 2 5 years in my head i’m thinking these aren’t the days of pension plans and the company looking out for you my skill sets and experience are my only true asset i’m finding that in the technical field of data science that meetup connections or the “network” aren’t as valuable as my ability and skill sets which is good what would you say is the best way to show data scientist potential from a data analyst position ? 
i already have learnt a lot online and with projects during free time i also try my best to fit the techniques at my office (introduced python to the team made a clustering project for customers and a forecast module of some of our kpi) but its not always easy because of the normal tasks taking a lot of time (sql coding dashboards and reports) do you feel there is an essential skill or experience that really telegraphs: that guy is ready for the real thing ? applying advice: i would say get creative with sliding in forecasting demographics prediction into your work (even after hours) so you can put on your cv that you got paid to do forecasting which goes much farther than listing coursera certificates it seems like you’ve already done some of that and i was there not too long ago some non-traditional approaches i tried with success but later said no to an offer when you look for jobs on indeed etc search for forecasting or statistical modeling and after impressing them negotiate that title you want that matches your skillsets at no additional charge to them (the con being they probably won’t be a legit ds team why i said no) just having 2-3 years with that title gets you on to your next more desirable data scientist job and passed the mindless recruiters skimming cv’s and not reading between the lines i think getting the interview was the real obstacle calling yourself a statistical analyst or a forecast modeler in my opinion is fair game if you find yourself doing that as part of your day to day work again sneak that extra stuff in on your weekends for work talk to your employer and let them know how much that title or progression means to you be persistent: my logic was i am young don’t have kids and have a thick skin i applied to so many opportunities and got ignored or rejected many times i cold emailed startups asking for a lower paying data science role i negotiated with my current work to give me the “associate data scientist” title at no extra charge to 
them (hr is quite forgetful + doesn’t understand basic human incentives which is why i left) i tried a lot of off the beaten track methods to go from analyst to data scientist and after at least a hundred applications over two years time i got the opportunity through a standard interview i did embellish my skills a little within comfort (i knew i could learn some niche skill set i was familiar with within the two weeks before starting and i spent 10 hours a day before starting doing just that) and my studying and experience prepared me for “tell me a cool project you did” or explain how you would setup an ab testing experiment why be so hungry for a title? answer) snowball effect the longer you wait for someone to hand you that title the less years of experience you have with that title it sounds shallow but titles do matter because hr recruiting are the gate keepers on who gets that first interview if i’m looking at jacob with 3 yrs as bi analyst vs fred with 2 yrs as associate data scientist who do you think i’ll interview for the full data scientist role fred might have cold emailed and asked a no name startup for 40k to do the job of a fully fledged data scientist just for a title they’ve never heard of but i don’t know that because i’m in hr and have 100 applications to get through i don’t even know what a data scientist does is it hadoop or was it pandas? 
also, fred didn’t spend 30-50k for an online ms in data analytics. fred got paid 40k for hustling, has a list of paid projects and kpi’s ready, and after two years of cheap labor will get that first legit ds role sooner than the analyst who took two years to finish their online ms before even applying. i stress the snowball effect because a few years with an associate or jr data scientist title at dirt pay will get you to ds land faster and more cost effectively than a 2 year online masters while standing still in your role. that being said, i have a graduate degree in an unrelated field, so i’m probably a hypocrite on this point. way ahead: interview advice: don’t tell them your current salary. i don’t care if it’s part of their process; let them make the first move on deciding what you’re worth. at this point you’ve earned it!

excellent advice, thanks for the long write up. of course my situation is a bit different, but i really like the idea of trying to negotiate a title where i'm at. my manager would have no problem with it, but hr is more annoying, and since it's a corporate business there are rules. i'm stuck in my city for now, so it decreases opportunities i guess, but i have the right background with a msc in stats, and i'm trying my best to go above my job description and use ds techniques. i'll wait a bit before applying, as i still have a lot to learn and i see opportunities for projects where i'm at and room for growth. salary is also high, so no urgency there. another option i want to push for is to work with our in-house data scientist.

i would say make sure your boss really has the power to create titles, and perhaps get to personally know someone in hr who knows your story. i’ve been burned twice because of bureaucracy. best of luck.

thank you for sharing your experience. i agree with this and hope to follow in your footsteps. at my current role i have no exposure to r or python. how do i get a position with exposure to these when i have no experience in them?
thank you for the insight i think you are right and the best way to move up in skills is to move to other roles positions i'm super proficient in sql vba and ssis how do i move on to a role where i can learn python and r with no experience in those languages? this comes down to the interviews you want to hear those words when interviewing - if they don't say them you probably won't use them or at least you should be trying to figure out what data you will have access to and see if it will justify python or r use as for experience in those languages i've found no one cares if you can learn other ones you can figure out python or r or pyspark or anything else i've never had an interviewer have an issue with me not knowing a language the bigger gate you have to pass is convincing them you can solve their problems once you have any new technologies nailed down languages are just a means to an end and your end is making them more money perfect thank you my follow up question is how do i get a job that requires r or python if i have no experience in it as you mentioned i guess i can look out for jobs that have exposure to those languages and it will help with the learning curve also how do i know what data sets can justify python or r use? i currently work with data at a large health insurance company but it's mostly for reporting and analytics as you mentioned i guess i can look out for jobs that have exposure to those languages and it will help with the learning curve yep exactly this fwiw the smaller the company the more freedom you have to learn i know you're at a fortune 100 company but i have found working at smaller companies ( 1000 employees) to give me the most freedom to experiment and learn but that comes down to personal preference at some point also how do i know what data sets can justify python or r use? 
i currently work with data at a large health insurance company, but it's mostly for reporting and analytics.

firstly, if you're on the fence about which to learn, i highly recommend python. it can do everything r can do and then a ton more. it's a full-fledged object-oriented language, as opposed to r where that's just tacked on. as for those languages, you can use python today to practice retrieving and cleaning data; that's the easiest way to introduce yourself to the language. there are statistical packages you can use if you know statistics. i don't yet, so i haven't explored those parts of python as deeply.

thank you for the help. i started incorporating python for report automation this week. it's a lot easier to use than java, which i'm already familiar with. in regards to making the transition to data science, do you think i need a ms degree to make the transition?

"my question is what do you think about the microsoft professional program for data science? it’s cheap and it looks like it can help get my feet wet with python, r, etc. thoughts?" you can get your feet wet for free. having that on your resume won't add anything (personal opinion). "i find it hard to justify 60k from a school like usc for a m s in business analytics or even 30k for the same degree from csu cal poly." depends on where you want to go from here, but ds jobs mostly have ms degrees as table stakes.

what kind of ms degrees do they have? computer science, math? or is a data science specific degree adequate? i've been reading good things about georgia tech's online degree for ms in analytics. thoughts?

stem in general. i work with a guy doing gt's program and i've heard good things from other people as well. plus it's absurdly cheap.

i'm using my company's pluralsight account to learn python. i feel like it's a good format to learn python. has anyone used this?
https: medium com @rchang advice-for-new-and-junior-data-scientists-2ab02396cf5b two years ago i shared my experience on doing data science in the industry the writing was originally meant to be a private reflection for myself to celebrate my two year twitterversary at twitter but i instead published it on medium because i believe it could be very useful for many aspiring data scientists fast forward to 2017 i have been working at airbnb for a little bit less than two years and have recently become a senior data scientist — an industry title used to signal that someone has acquired enough technical skills to lead projects as i reflect on my journey so far and imagine what’s next to come i once again wrote down a few lessons that i wish i had known in the early days of my career if the intended audience of my previous post was for aspiring data scientists and people who are completely new to the field then this article is for people who are already in the field but are just starting out my goal is to not only use this post as a reminder to myself about the important things that i have learned but also to inspire others as they embark onto their ds careers! feedback comments and suggestions are welcome! thank you for reading :) "we all have skills that we would like to develop and intellectual interests that we would love to pursue it’s important to evaluate how well our aspirations align with the critical path of the environment we are in find projects teams and companies whose critical path best aligned with yours " this is some really sound advice nice! 
this is the single most important advice. there are a lot of places where ‘data scientist’ may mean simple a/b testing experiments, excel work, etc. (which are useful in their own right but not that difficult to implement). if you want to progress in your learning, you need to find a place that can help foster deeper understanding of distributed computing, pipeline building, cloud computing, etc.
this is an excellent post and wonderfully written. i came into it expecting to skim through it but instead got out my notepad and clicked on every link. nice work.
thank you for the words of encouragement!
the airbnb internship application is pretty unnerving. there is a part in the form where you choose your university; it listed 3-4 top universities and a box of "other" where you can type in your university. yeeeeaaaah. https://www.airbnb.com/careers/apply2/847330?gh_src=
i hear you, and i understand it can look intimidating. i didn't design the 2018 summer intern application, but i would guess that the reason only 4 schools are listed is that these are the schools that we actually visited in the past few weeks. the rationale is probably that because we had a presence at these schools, more applicants from these four schools would apply. that said, please do apply if you are interested; we do not only consider people from these schools!
we take a very holistic approach for our recruiting, and many of us on the team review the take-home challenge before we can even see where the applicant is coming from, to reduce bias in the process. so i hope you would consider airbnb :) cheers
"reduce bias in the process": it's funny how i went up to my professor a couple of days back to understand the bias-variance trade-off coming up in every machine learning lecture. makes so much more sense now, haha.
thanks robert, those were useful points and i appreciated the format with key takeaways.
i know you commented on this post, but to others looking for more on the subject, here is a thread from a while back that is similar in nature: what habits or best practices do you wish you had known at the beginning of your data science career?
here's my background. i recently went back to school and got my degree from a prestigious university in astrophysics. i'm in my late 30s now. i have work experience in the real estate field, which obviously doesn't have much to do with my degree, especially since i got my degree a couple of years ago. i was originally an astrophysics major, so i had taken all the science ap classes, which is to say that i've been science-minded all my life. i'm at a point where i want to utilize my skills in a field that's growing and is on top of whatever is going on. i also don't want to end up working at a tech job as an entry-level employee, trying to work with young 20-year-olds fresh out of college starting their careers, to move up in the company. i don't want to spend the next 10 years trying to move up in the company to be in management. i've been studying python and django so i can work as a dev in a remote job. however, recently i've come across data science and am very excited & motivated. i've started taking the udemy data science and machine learning with python course and have been studying hard for it. i've also bought a book on data science from scratch and linear algebra to refresh my skills on math. i'm currently contemplating joining a 16 week
bootcamp where they teach everything needed to be a data scientist and also do project-based learning. they also help set up a portfolio to prepare for interviews. their syllabus includes machine learning, natural language processing & sentiment analysis, recommender systems, big data with spark & splunk, deep learning, time series, and computer vision with opencv. their project-based learning includes applying the cross-industry standard process for data mining (crisp-dm) and end-to-end development, where students take a new project from start to finish. it says it will allow students to demonstrate their skills in data acquisition, data cleaning, data enrichment, modeling, evaluation and deployment. i've taken a look at some of the machine learning material and, to be honest, i didn't find the sample material too difficult in terms of math. i've had times in upper-division physics where i just had no idea what i just read or saw and just felt dumb. however, with this material, it doesn't seem mathematically hard but perhaps logically hard. i want to know: with this kind of background, is it realistic for me to get a job within this year as a data scientist? i've seen a lot of indeed job postings that require a masters or phd, which kinda makes me feel discouraged, especially when they're looking for years of experience on top of that. with an astrophysics background and learning all the skills that hopefully are used by data scientists in real life, is it feasible for me to get a 100k data scientist job within this year? or would i be relegated to some type of junior analyst or junior data scientist (if there is one) with pay much less than what i'm looking for?
if it's a junior data scientist position where i can gain experience and move up to a senior position, i'll be up for it to get experience. i know this is a long post, sorry, but i wanted to give the exact situation i'm in so i can get some honest feedback or advice. anything would be appreciated. i'm a plane with fuel, ready to take off and looking for a destination worth going to.
tldr: went back to school to get my astrophysics degree a couple of years ago. was in real estate before that, but i was science-oriented all my life. learning python and looking to be in a bootcamp for data scientists. i want to know if it's realistic for me to get a job as a data scientist this year without an advanced degree and years of experience, if i learn everything they do?
"i want to know with this kind of background is it realistic for me to get a job within this year as a data scientist?" i'm assuming you have a b.sc., yes? if you are mathematically literate and can code, you have the bare minimum for data science. as with most things, the bare minimum is not enough.
"with an astrophysics background and learning all the skills that hopefully are used by data scientists in real life is it feasible for me to get a 100k data scientist job within this year?" no.
"or would i be relegated to some type of junior analyst or junior data scientist (if there is one) with pay much less than what i'm looking for?"
yes. you should also note that salary is related to location, demand and supply. i live in canada, and it is not hard to find a data scientist role for 80-100k; that is likely not the case in other places.
"if it's a junior data scientist position where i can gain experience and move up to senior position i'll be up for it to get experience" this is most likely the case. this isn't as bad as you think: within 3 months my inbox was exploding with recruiters offering to chat about new positions.
ok, thank you for the response. i guess i might have to look for an alternative path for now.
i currently live in ny and am moving back to canada soon to re-tool and switch careers into data science. any recommendations for great bootcamps, degrees, etc. in canada? what makes you think there's greater demand in canada over other places as well?
uhhh, very unlikely. i have a quant phd and still don't think i can get ready in 6 months. but please prove me wrong. can you please come back to this post after you get the job and let us know how long it took you?
"is it feasible for me to get a 100k data scientist job within this year?" no. it is not easy, nor is it likely that your profile will attract a $100k data scientist job in 2018. if you want a job in tech, i think the boot camp is your best bet. a good boot camp will probably land you some worthwhile opportunities; few of them will be data scientist titles, and fewer still will offer a $100k starting salary.
a bootcamp might be an expensive option considering the fact that you already got your formal education. you may get by with a simple data science education to enhance your knowledge; then you can give it a try.
"or would i be relegated to some type of junior analyst or junior data scientist" this is the most likely scenario, but i wouldn't say "relegated". given your lack of experience, this would be a necessary step.
"is it feasible for me to get a 100k data scientist job within this year?"
no. even most masters grads don't find this type of gig for a year or two.
"i've seen a lot of indeed job postings that require masters or phd which kinda makes me feel discouraged" this is the reality. unless you have a solid resume with lots of side projects and experience, you likely won't get a good data science job without a masters/phd. i suggest you build your portfolio with some realistic, interesting projects.
ok, thanks for your response. i guess my expectation was wildly out of touch with reality.
does your astro experience include big data analytics? i know that some topics in astronomy (not necessarily astrophysics) crunch through huge image files, do a ton of filtering, smoothing, etc. if so, that would help get you an analytics-type job, though it'd be a stretch to land a data science job with only that experience. you're right that ml theory isn't like nightmarish graduate physics; that certainly would be a plus for me. however, experience with ml is critical. have you done a kaggle competition? edit: be careful with boot camps. if they don't motivate the content with real data and mimic the experience you'd get irl, that education could be superficial -- 1 mile wide, 1 ft deep. take your time. unless something really, really big happens, i think ds will remain a desirable, competitive career for decades.
it’s called devmasters, which is offered in california.
what is the name of the boot camp you are looking at?
he commented which boot camp here.
hi all, i'm currently a web analyst without any r or python skills. i'm looking to dive deep into one of them to start, and am looking for efficient tracks or curricula without having to attend a bootcamp. so far i'm just following datacamp's python - data analyst track, but from your experience, what's a recommended next step after datacamp? should i check out another course via udemy, udacity or edx, get my hands dirty with something like kaggle, or perhaps check out a full data science course from harvard or the sort? would appreciate any advice!
cheers
i worked as an it analyst in erp system implementation for 4 years, then i realized i couldn't do it anymore, left the job and spent 6-8 months on self-study:
learned python enough for data science;
went through andrew ng's course on coursera (googled python solutions for it);
tried several kaggle competitions;
learned a bit of everything by googling, playing with examples, reading books, etc.: classification, regression, clustering, recommender systems, nlp, neural networks. tried to renew my knowledge of statistics, probability theory, linear algebra, calculus;
made a little portfolio: https://github.com/erlemar/erlemar.github.io ;
sent my resume to dozens of companies, went to 15-20+ interviews, did 5+ test tasks without success;
at last got lucky and found a company which hired me. currently i work in a bank and am trying to create a model predicting the probability of a client activating a credit card, by myself;
one of the main things is practice. you can go through many courses, but doing a project by yourself is the real thing.
by yourself? did you assign that project to yourself, or was it something that came from above?
i'm currently looking for new jobs, but all i see is agencies and companies talking a big game when all they do is set up google analytics all day. if you happen to document any of your work, i'd love to read about it.
it was from above and isn't really a great thing. the company was looking for a person specifically for this project, but there is almost no one else in my department who knows machine learning. only my two bosses know something about it, but only one of them knows python and he has little time to help. so i do the project by myself, and make mistakes, of course. the project itself isn't difficult (especially considering that this is a bank). i have already built the machine learning part; now i spend most of the time on getting new variables and analysing them. if you are interested, i can describe how i worked (and work) on this project in more detail.
yes, i'd love to hear it, even if it's just a simple classifier. i think it's good for you too. i've started posting my work on my personal site and sharing it on social media (linkedin and twitter mostly). people notice this stuff, and i think word gets around that you know your stuff. my hope is that it helps establish some sort of "personal branding" (however cliché that is). looking for jobs becomes easier when you're known. i've seen people make a career out of sharing their work.
the main problem with sharing such things is the non-disclosure agreement, so the description of the work could be something like this: in my current company i have worked on this cool project concerning [cut due to nda]. i used model [cut due to nda] on data [cut due to nda] and achieved accuracy of [cut due to nda]! with this, the metric [cut due to nda] improved by [cut due to nda]!
but seriously, yes, then i'll write my story. i just need some time for that.
yeah, the nda, of course. what i normally do is focus on the problem analysis. i simulate the data; never use the og data you used at work, of course. but you can write a post like "clustering customers", then create a fake dataset and mostly explain the analysis. i think most people will get the business angle anyway.
and here it is.
great!! thx
hey man, got thinking about restructuring my self-learning again. i was wondering how you planned yours? did you read books cover to cover on all the different topics (i.e. calculus, finite math, probability, machine learning, set theory, etc.) or did you take classes? how was your schedule, and what were your techniques (did you take notes, do exercises, etc.)? i would love to spend time reading 500-page books on this stuff, but it's not feasible on a full-time job schedule. i'm thinking of devoting 2 hrs of each working day to self-education, but even at that rate there must be a time-efficient way of learning all of that content and, more importantly, remembering it. what was (is) your method?
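a minimal sketch of the "simulate the data" suggestion made earlier in this thread: invent a fake customer dataset, cluster it, and write about the analysis instead of the nda-covered original. everything here is an assumption made for illustration: the two features, the segment centres, the function names `make_fake_customers` and `kmeans`, and the choice of plain k-means itself. it uses only the python standard library.

```python
import random


def make_fake_customers(n_per_segment=50, seed=0):
    """Simulate (monthly_spend, visits_per_month) pairs for two invented segments."""
    rng = random.Random(seed)
    # Hypothetical segment centres: (mean spend, mean visits). Not real data.
    segments = [(20.0, 2.0), (200.0, 12.0)]
    customers = []
    for spend_mu, visits_mu in segments:
        for _ in range(n_per_segment):
            customers.append((rng.gauss(spend_mu, 5.0), rng.gauss(visits_mu, 1.0)))
    return customers


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k=2, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centre, then recompute centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centre for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centres


customers = make_fake_customers()
labels, centres = kmeans(customers)
for spend, visits in sorted(centres):
    print(f"cluster centre: spend={spend:.1f}, visits={visits:.1f}")
```

in a real write-up you would probably scale the features first (here spend dominates the distance) and reach for a library implementation. the point is only that the data is generated, so nothing confidential is shared.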
i didn't really plan my curriculum; i just had several targets and tried to complete them step by step. also, i want to point out that i had left my previous job so that i could study more. for ~6 months i spent 4-6 hours on average every day studying machine learning and other things.
at first i wanted to have enough sources of information. for this i subscribed to several linkedin groups, subreddits, facebook groups, twitter accounts and so on;
i needed to learn programming. i chose python, but it could be some other language (though r and python are the most common for beginners). i took one book (automate the boring stuff with python), read it and completed most of the exercises. then i spent some time training my skills on various sites. all of this took ~1.5 months;
then i spent several weeks on andrew ng's course, because i wanted to study machine learning basics and the course is still one of the best. i didn't want to learn octave/matlab and my python skills were lacking, so i found several github repos with solutions and studied them;
after that i wanted to get some "real" practice. i went on kaggle and tried several training challenges, such as titanic. it was really useful, but the information was overwhelming sometimes. i took notebooks of other kagglers and "dismantled" them: i tried to modify every cell so that i was sure how it worked; after some time i was able to create solutions by myself;
then i wanted to know the basics of various spheres of machine learning: clustering, nlp, recommender systems, deep learning. i googled a lot, read a lot of articles and created a number of notebooks for my github portfolio. now i think that most of that wasn't really necessary;
as for math, statistics and so on: at first i started doing courses on khanacademy, but soon realized that i didn't need it. so i read theory only when i couldn't understand what was going on, for example reading some calculus when i was writing backpropagation;
now i can say that one of the most important
things is projects. it could be a simple analysis with visualization which generated some insights, or a machine learning project where you predict something and explain how you did it, or a good kernel on kaggle, or something else. but it should tell a story: why did you decide to do it, how did you do it, what are the results and how can they be used. or maybe you created some new package which can be used by other people. you need to show that you can create new value;
also, it is very important to understand that there are many types of work in data science. i hoped to work on a project with sophisticated machine learning, for example, but the reality wasn't so nice. my project at the bank leaves little freedom in machine learning and needs a lot of statistics: correlation, multicorrelation, significance and so on. so you can't be ready for everything. a good idea would be to choose what you want to do and learn the things necessary for this.
most of my curriculum has been improvised as time passed. i can't link to any specific one because on any given day i would read at least 15 different things about 30 different subjects. in hindsight, i would suggest you start by picking one language (i chose r) and find a youtube playlist to learn the basics. i used marin stats (dude has very concise videos). you should start by learning how r works as a programming language before thinking about stats. also check david langer's lectures on data wrangling. learn about the tidyverse; i mostly use those libraries in r. work through some examples and get a grasp of the language. when you feel you get it, then learn statistics. i would start by finding cheat sheets; that's how i normally compress my learning. i find a high-level cheat sheet and break down the topics from there, watching videos to understand what the subject is all about. then i get into details. i think of stats as a series of tests to validate what you believe is happening in your data, so all you need to do is to know when and how to implement them and
interpret them. for things to stick, you have to use them. i write r scripts to train on a subject and post them on my github.
what resource did you use to get a grasp of the r language? i have a stats background already but zero programming background, so when i approach a problem i just learn specific tasks related to the problem but have no higher-order understanding of the language. therefore i always run into problems because i haven't "learned" that specific task. the programming side and syntax is what i find the most difficult; the stats are easy compared to the work required to do the data manipulation and analyses i want.
as i said, youtube is your best friend. it took me maybe a couple of months to "get" what r does on a theoretical level. until then you'll have to slog through mechanically solving tasks. thing is, when you don't know how to program, it really is like learning a new language: you know some words and sentences here and there, but you're not really being creative with it. but that comes with time. it's an advantage that you already know the stats part. as i said in the previous comment, david langer's playlist on yt on data wrangling really helped me grasp the high-level thinking behind data manipulation. it's just about merging datasets, creating new variables, etc., but what's cool is that he makes you understand the logic behind it. also read david robinson's blog, variance explained, and julia silge's blog. reading data scientists' work helped me tremendously. don't focus too much on the code initially; functions can always be remembered (i still look up functions most of the time, but now i have a feeling for what i want to do). instead, focus on understanding the reasoning behind the analyses they do. looking at other people's work will help you understand how to think about data problems and projects in r. here's the link to langer's series: data science with dplyr.
thanks a lot for your reply and resources, will have a look at them all.
i just read sklearn's documentation. you'd
be surprised how much information is in there.
https://github.com/open-source-society/data-science there is a set curriculum by oss (open source society). it looks pretty comprehensive.
http://www-bcf.usc.edu/~gareth/ISL
try datacamp. they offer data-science-oriented programming courses for r and python. i have the dc subscription and i can say it's pretty great. if you have the discipline to do all their classes, you can learn so much.
for myself, i've used mostly coursera. there, after andrew ng's course (i'd recommend that for anyone who hasn't taken it), i took the statistics with r and intro to ds in python specializations. reasons being:
they don't try to sell "do this and you're instantly a data scientist", which i don't think is true of any mooc;
the "data science" specialization uses python, unlike jhu's, which uses r. personally, i think python will be a better investment (though many will disagree!);
they teach you enough to begin applying your knowledge immediately to other datasets (this is huge);
they teach you basics (simple) that are quite important (not a waste of time).
after taking the classes, i then try to apply what i learned to at least one (if not more) independent project. they can be competition (kaggle etc.) datasets, personal projects, or past projects done for coursework. that helps solidify what you learn as well as build a portfolio; a win-win for me.
if the subject has a textbook, that's usually a good place to start; anywhere where it's a guided thing rather than you piecing things together, especially if you have no background knowledge in the subject. but honestly, i found that once i get the very basic concepts down, reading about the same topic from multiple sources gets me past the hurdles pretty well. lastly, do a project, something you actually want to do, using your newfound superpowers, so the lessons stick and you get something that you actually want.
safaribooksonline.com: literally the best resource to learn anything cs-related. right now i'm going through
introduction to machine learning with python by sarah guido and andreas c. müller, which was recommended by the data scientist at my company as a good intro. there are a lot of tutorials on kaggle too, but safaribooks has so many technical books and video courses; i haven't been able to come across a better resource yet.
maybe a claraficatory follow-up question from me: would you first concentrate on one language, or also learn another one as soon as you get the basics of your first language? what's your experience, motivation- and learning-wise?
claraficatory - come on, that isn't a real word, is it? i mean, it's valid insofar as it conveyed your point, but is it in widespread usage? genuinely asking :)
i'm not a native english speaker. i know it mainly from questions from the audience after academic presentations. but to be honest, here i suppose it was just a way to justify sneaking in a question of my own :)
it's all good, brother - it makes etymological sense, so i was genuinely unsure as to its legitimacy as a 'real' word. but hey - you wrote it and i understood it, so who needs some stuffy academics to validate it? these are the same guys that include 'twerking' and 'web 2.0' in their official dictionaries, so if your word isn't in, then it damn well should be! thanks for responding and also for not attacking me!
x self-learners self-teachers
i'm wanting to attend an online grad school program for a number of reasons. i have degrees in finance, i.t. and math, with minors in computer science and economics. i feel i am a strong coder with a breadth of languages. data science is the direction i want to head: it plays very well to my skill set, and it's a career which i think i'll actually enjoy. last summer i applied to 5 programs (some local, some online). my top 2 choices were berkeley (ms in data science) and tamu (ms in statistics). i got into tamu but not berkeley - i am scheduled for a feedback phone call with berkeley to see what went wrong. i have a feeling it was my gre scores, which weren't amazing - i have my reasons, but it is what it is. suffice it to say, reapplication would likely mean a gre retake. my gpa was stellar (i graduated with triple honors) and i think my essays were on point. here's the rub: i'm starting to think it may have been a good thing not getting into berkeley. this has nothing to do with the quality of berkeley's program (which i'm certain is excellent) but more with what data science programs cover and what the market is doing. here are some bullet points from my perspective:
data science pros:
- the berkeley brand power is immense; this is a driving factor for me
- the program curriculum looks very complete despite its youth
- program access for life
- cutting edge and very buzzy
data science cons:
- much more targeted degree; almost vocational
- the portability of this degree is questionable
- the hireability of this degree is questionable if we were to see a recession
- claims from employers that data scientists are coming out of programs "half-baked," where they know how to push buttons but can't apply (or even state) the theorems they're using properly (edit: i want to be clear - this bullet point is not aimed at the berkeley program but ds as a field of study)
statistics pros:
- tamu brand is formidable
- program is well established
- very portable degree in a variety of jobs
- the
hireability of this degree is far less questionable during a downturn
- several success stories of individuals completing this degree and supplementing it with inexpensive data science courseware to achieve a data science-related job
statistics cons:
- program is far more theoretical, less codey
- perceived as "boring" by some employers; doesn't maintain the buzz factor
so here's the question: do i retake the gre (which is 3 months of study on top of job and school) to try for berkeley again, or do i gun it for tamu and take some ds online courses along the way? assume cost isn't a factor. thanks in advance! edit: i finally received my feedback call - details in the comments.
"perceived as 'boring' by some employers; doesn't maintain the buzz factor" this isn't really a bad thing. people are skeptical of data science programs. that said, berkeley is probably an exception. i think you can't really go wrong with an ms in stats; the only concern is opportunity cost.
indeed! i totally agree - berkeley does appear to be the exception with employer consideration, which is one of the bigger reasons i'm struggling with this decision.
yea - as i was typing that out i kinda realized i wasn't adding much to the discussion lol, sorry.
at this point, i'd appreciate any input at all. you're reconfirming my thought process and there's some comfort in that.
i've interviewed 3 people from the berkeley program. none of them got past my technical screener. i've had people from bootcamps get past it. ymmv.
understood - that's definitely my biggest concern with the program. thank you.
do you know how long these two programs last? also, do the tuition fees matter?
great questions - they both can be completed in about the same amount of time, and the cost isn't a factor (i've edited above to include the bit about cost).
then i guess it comes down to whether you want to spend the three months of difference studying for the gre or working toward getting more experience in data sci after the end of your degree. also, if you orient your thesis (if you have one) toward an ml subject, the difference would be a lot smaller.
interesting - the program at tamu is thesis-less, i believe. however, i was considering creating an online portfolio of ml and ds projects that i could show to prospective employers in lieu of getting a ds degree.
[deleted]
excellent! thank you, and congratulations! really, the only thing i'm a bit worried about is the qualifying exam. have you taken it yet?
thanks! no, not yet. i plan to this year though. they give you a lot of older tests, so you have a general idea of what they quiz you on, but i have heard it's a tough test. however, i think it gives the program more legitimacy, having an exam like that.
what was your gre score?
several points below the statistical average of admittance. i could give you the sob story of why i basically had to take the thing blind, but it's irrelevant. the question is how to proceed.
it could also be work experience after bachelors. i am in the berkeley program and the typical student has around 5+ years of experience in it or other technical fields. i personally had 8 years as a data analyst before starting the program. some folks have 20+ years, including a couple of c-level executives. pm me if you have any questions. also, try asking the admission representative what you could do to improve your profile.
very interesting indeed - i hadn't even considered the experience aspect. it's a bit downplayed in the applicant material, but it makes a lot of sense. thank you for this!
i'm expecting my call from the review committee in early january - i'll post if this was a deciding factor. just received my feedback call: it was indeed work exp.
did you have any (relevant or otherwise) work experience? if yes, do you mind me asking how much, and why they felt it wasn’t enough? my husband just applied to this program and on paper he would be exactly what they’re looking for, except his work experience is not very applicable. also, do you remember how long it took you to hear from admissions?
did you look at smu? their msds was actually cheaper than berkeley, at least when i compared them a couple of years ago.
i did. cost is not a concern (i saved much of my g.i. bill). while i'm certain smu offers an excellent program, i'm not seeing the same level of brand recognition among employers as i am with tamu or berkeley.
what sources are you using for the brand recognition analysis? from personal experience in the dfw market, smu was highly recognized, but that may be due to local exposure.
it was a mix of about 40 professors and potential employers. granted, they were all in my area; since i am not willing to move at this time, i found it to be a relevant sample. again, i am certain smu offers a great program (especially based on the online reviews). i just have yet to see evidence that would support smu receiving the same brand recognition as berkeley or tamu among universities or employers.
how does active recruitment at smu msds programs from at&t, toyota, fossil, siriusxm, just to name a few, sound?
that sounds lovely!
but it doesn't necessarily address the brand argument. there's a local msds program here that sees active recruitment from amazon, jp morgan, excel and comcast. the big-name employers are great, but there are so many more prospects of interest to consider, among small- to medium-sized employers in particular (which i find fantastically interesting for my own reasons). i think we would find few candidates who hold the same brand weight as a berkeley grad.
again, that is a regional thing. in texas, and dallas in particular, smu is on par with or just below ivy league schools. in louisiana, tulane is the same way. in both places uc berkeley is no better than any state school, which are all below private schools. unless the school is super elite, like ivy league or oxford, everything is regional preference.
[deleted]
because i am one of the first 100 people to graduate from smu with a masters in data science, and smu and ucb were the first two schools to offer an online masters in data science. i know first hand how difficult the smu program is and how many doors have opened because of that degree. if none of those companies interest you, then i've wasted my time explaining things to you, because they are all leading employers and excellent companies. well, maybe not at&t, but they do employ a lot of people.
i finally received my feedback call from berkeley - as u2fan implies above, work experience was the detrimental factor that killed my application. most applicants have a minimum of 3-5 years of related work experience. while my gre scores were not awesome (and below the statistical average), they were within the acceptable range of entry. another interesting point in my feedback interview: they liked my statement of purpose, but it lacked the direction the committee normally sees in applicants who've worked in industry for a few years. long and short: i'm glad to know it wasn't my gre that killed my app xd
in an effort to stay up to date and relevant, i've dabbled in a number of data science courses since
graduating with my master's in 2016 i've learned a lot of syntax and or theory but i haven't really worked on anything that has produced output that i can add to a github page and show potential interviewers what i've accomplished i know some of the coursera specializations have capstone projects has anyone found any that are particularly worthwhile? any other courses to recommend where i could work on some sort of task project that would showcase my skills once complete? thanks! edited to add: i know kaggle serves this purpose but i'm really looking for an actual course where i can spend time learning and then applying that specific knowledge to a project along the way note that it can be a lot less impressive in resume when you implement a project which is pre-defined as a success by a course i e you learn everything you need for this project and the project is designed by course authors to showcase what you learned even worse if parts of the project are already implemented and you just need to fill the slots doing such projects is a great way to learn because they give you some hands-on experience but they are not the best way to show your skills and creativity in a real world you won't have such pre-staged projects real world is messy cvs of junior data scientists with all these capstone projects look the same good kaggle results (not yet another titanic analysis) well-written scientific papers (even if they are just on arxiv org or maybe even in a blog) useful open source contributions creative projects you do on your own (like @krhyyyme suggested) - this is what make you stand out is it too much? 
probably yes but that's the way to go if you want to be noticed a reasonable junior data scientist position may receive 100+ applications in a week or two; some companies may do a lot of interviews others might talk only to top candidates a capstone project is unlikely to make you stand out so do it mainly for learning not for boosting your cv take it as a personal opinion though :) to be honest - if you already have basic working theoretical knowledge i would brainstorm an idea and try to get it working most project-based courses will have you do this anyway here is a cool list of public data sets to help with brainstorming: https://github.com/awesomedata/awesome-public-datasets most master's programs let alone moocs tend to have the same recycled projects that's not to say the coursera capstones aren't worth doing for a personal learning exercise; nor is it to say you can't produce something cool as a result (i came up with at least one semi-interesting visualization from an assignment) but don't expect the project to be a tell-all or "wow" any interviewers http://cs109.github.io/2015 this is the best resource i've found at the homepage there is a reference to old homeworks+labs another one is going to kaggle learn and following their tutorials and seeing if you can improve some of them saved!
you could take up a business case at analyttica treasurehunt https: learn analyttica com in the courses you can work on real-life data sets you can actually make use of the data sets and inbuilt algorithms you could as well create your project with your own data and showcase it to anyone i'm a statistician by trade my entire academic background is in statistics including an advanced qualification i have done no statistics at my current job and i'm at the end of my rope i've been at my current job since may 2017 it's my second job directly out of college i was laid off from my first job during the "golden handcuffs period" (i owed them money if i quit before a year) because they didn't have any data sciencey work for me to do i was essentially a dba there even though i’d keep suggesting reasonable analyses and easy ways to make their infrastructure more data driven instead of heuristic i brought up my concerns about wanting to actually do data analysis when i took my current job which is at a small team in a satellite office of a main office my boss is a very smart person with strong statistical credentials he's also annoyed at the lack of the statistical work but is staying around because he expects it to show up this company is pretty early in the process of building serious data infrastructure the only statistics i've done are one quick demo analysis in the beginning and another that took a few months but was cancelled due to internal factors (i had no results at that point) at both jobs people are aware that i know advanced statistics but i've always been stuck doing other work i have been doing literally nothing but demoware and basic dashboarding since september i've done some coding tasks because "oh you're the data guy and i'm a senior guy do this vaguely related data thing that isn't statistics " i've been making mockups for pms to show at conferences the work i do is very high visibility but has no statistics involved which is why i'm annoyed i do the work 
but i've been miserable because it has very little relation to what i want to be doing unlike my boss who has a considerable resume i have little experience actually doing statistics i had an internship as an undergrad where i had no mentorship and they didn’t know what to do with me so they kept moving me from random task to random task and a statistics reu where i was literally one of two students who was assigned a non-statistical project it's particularly frustrating because i like literally everything else about this company my boss is awesome my coworkers are super nice everyone is super positive and i can work remotely one day a week i'm well paid (honestly more than i should be given the work i've been doing) and have great benefits i have a performance review next month i'm expecting it to be fantastic given that i've repeatedly gotten compliments on my work there's a machine learning team that i talk to a lot (outside the context of work) that's doing a lot of work i'm interested in like voice and image recognition i would like to leverage my positive performance review into being able to do more machine learning work because i like this company otherwise i will quit after i hit the one-year mark and find a job that will actually allow me to build models rather than being a glorified dashboard and demo guy how should i best ask to move to machine learning when the dashboarding work i'm doing is so vital? if they don't let me transfer teams how can i leverage my current experience into a new job where i'm doing actual data science rather than making and maintaining dashboards for other people? 
i've done some coding here and have good sql skills but i've found that my background in those areas has made me a ripe candidate for "code things that manage data and do basic bi stuff" roles rather than the more technical advanced statistical roles i'm interested in my current thought is to emphasize my statistical background make it clear that i want to be doing analysis work and to point out that i was able to handle high visibility high impact projects that require talking to stakeholders in the past i think it's important to remember that applied statistics is just as much about data engineering and data viz (aka "dashboards") as p-values anova whatever else you learned in school particularly relevant is tukey's 1962 definition (quoted in the also very relevant https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734) for a long time i have thought i was a statistician interested in inferences from the particular to the general but as i have watched mathematical statistics evolve i have had cause to wonder and to doubt …all in all i have come to feel that my central interest is in data analysis which i take to include among other things: procedures for analyzing data techniques for interpreting the results of such procedures ways of planning the gathering of data to make its analysis easier more precise or more accurate that said for your particular career circumstances it sounds like you want to be on the ml team i would strongly suggest not bringing this up with your current manager; he wants you to keep doing what you're doing since it's generating a lot of value for the business and making him look good (don't underestimate your salary value btw good bi analysts are actually very hard to find) also you probably won't be able to change teams until they have an official opening (either by getting budget for a new hire or from someone leaving) so you need to just wait until a new position is announced advertised on the ml team as soon as this
happens talk to the manager on the ml team about applying they should strongly prefer you to an external candidate since it will be faster cheaper for them and you've already proved yourself to be a good culture fit this will probably also be easier than finding a ml position at a new company as it's hard to land these roles without work experience in the mean time some more advanced statistical techniques you can use in your current role might include: multilevel bayesian models (for putting error bars on segmentation plots) cart conditional inference trees for automated segmentation drill-down analyses bayesian experimental frameworks (if you're doing a b tests) i think it's important to remember that applied statistics is just as much about data engineering and data viz (aka "dashboards") as p-values anova whatever else you learned in school particularly relevant is tukey's 1962 definition (quoted in the also very relevant https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734) thanks i’m aware but these things are 100% of my job i have no problem cleaning data doing a little engineering and making nice visualizations as long as i can do more intense analysis i do no statistical analysis none in my career trajectory it’s been over a year since i last performed and interpreted a logistic regression i’m not even doing “here’s a pretty graph here’s the story” types of analysis that said for your particular career circumstances it sounds like you want to be on the ml team i would strongly suggest not bringing this up with your current manager; he wants you to keep doing what you're doing since it's generating a lot of value for the business and making him look good (don't underestimate your salary value btw good bi analysts are actually very hard to find) also you probably won't be able to change teams until they have an official opening (either by getting budget for a new hire or from someone leaving) thanks the way this team works is we’re an
autonomous business unit we kind of hire as we find people that are smart and have them work on internal ml problems it’s not like a “we have six positions to hire put the posting out” situation i was initially hired because someone contacted me to ask if i was interested in an unlisted position i’m also doing no actual business analysis i’m literally just making reports that pms ask for so they can do the analysis it’s basically “oh this is doing well this isn’t doing well” work and nothing beyond that so you need to just wait until a new position is announced advertised on the ml team as soon as this happens talk to the manager on the ml team about applying they should strongly prefer you to an external candidate since it will be faster cheaper for them and you've already proved yourself to be a good culture fit this will probably also be easier than finding a ml position at a new company as it's hard to land these roles without work experience that’s a good point thanks! i figure once my good performance review is in i can ask about it i know the ml manager quite well and my boss reports to him i just worry that my boss will be mad at me if i go over his head and i’ll see him all the time i don’t think he’ll be mad if i don’t want to do dashboarding all day in the mean time some more advanced statistical techniques you can use in your current role might include: multilevel bayesian models (for putting error bars on segmentation plots) cart conditional inference trees for automated segmentation drill-down analyses bayesian experimental frameworks (if you're doing a b tests) we’re not even at this stage the only software i can use for this dashboarding is tableau because it’s what stakeholders want it’s extremely limited in what it can do i’ve also been told that “boxplots are too confusing for mbas” and to stick to nice bar charts i’ve never run an a b test here and even if i did my boss would ask me to stick to more traditional techniques here's an update i had 
my performance review today and it was glowing i was complimented on all of the work i've been doing and told that it's vital to the company my boss asked if i had anything i wanted to bring up i said i wanted to take on some machine learning work because i find it interesting and want to get more analysis experience he responded by shitting all over the ml team saying that they aren't doing anything valuable and that my dashboarding is bringing business value and i'll be able to analyze that data in the future i asked again mentioning that they have a ton of projects that are interesting and that were sold to me when i signed on and that it wouldn't be all of my work he basically told me tough shit and that i have to keep doing exactly what i've been doing i'm going to start hunting in may once i have a year's worth of experience here there's no reason for me to continue working here if i'm not going to get to work i find interesting o dam thanks for the update unfortunately not surprised there's a guy on our team who's in almost exactly the same position as you do keep trying to go around your manager directly to the ml team manager if you can (unfortunate problem is that most ml teams don't want a junior that they're going to have to train mentor) in my experience "time split" setups don't really work well in practice in the mean time i would encourage you to try something like rshiny for new dashboards and to try and sneak some real stats in there (bayesian modelling in particular is a pretty in-demand skill right now) but yeah sounds like your best bet might be to move on thanks i don’t usually do side projects because my other hobbies occupy my free time but i’ll try to come up with one i can do in the next few months i could go over my boss’s head and do this but he’s in close contact with this other manager (whose work he apparently thinks is useless) and i’d have to see him all the time i don’t see any way i can transfer without burning a bridge and i’d be 
fine coming back in a few years once their infrastructure is more developed i’ve begun hitting up my friends at other companies and basically have them on standby until may when my year is up i’ve decided to swear off small companies entirely for my next position i’ve been at two and both times i’ve gotten roped into doing work that hinders my development and that bores me because “hey i’m an upper management dude and you do data stuff right? do this thing that involves data and no analysis or ml for me ” i somewhat disagree with the other comment i think your performance review is a great time to bring up your career i wouldn't necessarily phrase it like 'i want to move to ml team' but rather just lay out your 2-3 5 and 10 year goals if your boss is 'in your corner' then they will work with you to achieve your goals if not then obviously look for another job also depending on the company and your relationship with your boss & the boss of the ml team i'd try to slowly do an 80 20 type deal (or even 100 20) essentially do your current role and try and get your foot in the door with the ml team even if that means just validation cleaning de work or oddball projects that they don't have the bandwidth for it really depends on the work environment i know at my company they are ok with movement as long as you've worked in your current role for a year or so thanks!
i feel like 100 20 is going to be what i have to do because of this other work i just hope the business pms don’t get mad because i’ll be dedicating time to other work hi guys i have 6+ years of data experience working in traditional data warehouses as an etl developer data analyst and a data architect i've worked on 3nf data warehouses and built and enhanced many kimball style dimensional data marts past couple years i have been doing more solution design and data modeling than etl development here is the current technology i am experienced in: * relational databases : can write complex sql as well as plsql scripts stored procedures database triggers etc most of my work has been with oracle databases but i have also worked with other vendors throughout my career informatica: this has been the etl tool i am the most familiar with i no longer do any hands on etl development but i work closely with developers as i give them the specifications on what to build i also get in there and debug from time to time business
objects: i have never been a report developer and have worked mainly on the database side however bo has been the reporting tool that all the teams i have worked on have used so i am familiar with it erwin: this is the data modeling tool i use now that you know where i am coming from maybe you can give me some advice on how best to pick up "modern" data engineering skills i am aware of what is in demand in the market and have been proactive about self study creating a list of technology to pick up this is what i have picked up so far as well as plan to learn in the near future but because i don't use these on a daily basis i consider my level to be that of a beginner: python: i know the syntax and the fundamentals of the language i've also spent time with some popular libraries like pandas and numpy what should i really focus on specific to data engineering and maybe data science? any resources you can point me to? aws: i've signed up for the free tier and have gone through the major services (s3 dynamodb ec2 etc ) what services should i really focus on here and can you point me to some good resources? i already use acloudguru nosql: i've done a course and played around with dynamodb so i understand key value and document oriented databases should i spend time learning mongodb or do the concepts transfer over easily? any other tips you can share? i also plan to play with a column oriented database spark: i have no experience with spark this is next on my list i plan to do some courses that leverage databricks any tips or resources i can use? redshift: i haven't yet played with it but intend to any tips? i assume most of learning this is conceptual coming from row oriented databases and i should be able to pick this up fairly fast any thoughts?
tableau: i've messed around with the community edition this is really low on my list so what do you think of my approach to pick some of these skills up i'm using whatever resources i find online (udemy pluralsight etc) to learn this tech and practicing on my local machine or in the cloud when possible in my experience i think self learning can only go so far and there is no substitute for working on actual deliverables with a team my aim is to get to a point where i won't be deadweight and then just apply for jobs that utilize these newer skill sets so i would say you're setting yourself up pretty well for a data engineering job but not a data science job is that your aim? data engineering? if so i would look at: spark as a priority hadoop minimum hive and kafka if you want to do 'cloud data engineering' rather than 'data engineering' i would be looking at aws' glue lambdas emr kinesis services from a professional accomplishment perspective doing the aws solutions architect associate and then big data speciality wouldn't hurt i did post this q in the dataengineering sub but given the user base of that sub i don't think anyone will respond this seemed like the next best place i think given my current experience modern data engineering is lower hanging fruit than data science i have done a few intro to data science classes but my stats skill isn't there to transition to a pure data science position (although i think the work sounds awesome!) thanks for your advice i agree that spark is a priority if you know of a resource that teaches how to use it in a production type environment it would be helpful thanks! sadly not - but if you find one please let me know? i need to teach myself more about data engineering too! 
also find out what a lambda architecture is :-) that one i always use when interviewing dengs you're definitely on the right track here are the things i directly have experience with that i think you should focus on python - invest some time learning how to write idiomatic python this will get you closer to knowing python in a way people will respect in addition to numpy and pandas spend time with tensorflow in order to dip your toes into modern ml data engineering redshift - a huge number of smart companies are starting to migrate to redshift-centric elt-based (instead of etl) systems i would definitely spend time learning it spin up a cluster find a fat open source data set and start exploring it with redshift learn the basics of how redshift data partitioning works and how to get data into out of the database tableau - a lot of places still use tableau but many are moving to cloud-based platforms there are tons of them these systems are honestly not that hard to figure out so i wouldn't prioritize any time on them reading - buy "designing data-intensive applications" by martin kleppmann published by o'reilly right now just do it that book alone will do more to modernize you as a data engineer than learning any specific technology it will help frame the knowledge you've gained so far into the fundamental problems that need to be solved for data applications it's easily the best programming book i've picked up in the last 5 years tbf on redshift i would say there are 3 things to understand: what is an mpp database and what does the blocksize mean what are interleaved keys how do you optimally load data in all great points i would add that understanding how a columnar database works is just as important to start gaining an intuition for what problems it solves well agreed but i figured that would be captured by mpp :-) thanks for the book recommendation looks great thanks for this i did pick up the book and reading the first few chapters told me it was gold any 
resources to help with idiomatic python? what do you mean by etl vs etl-based systems? etl vs elt - extract transform load vs extract load and transform hello i really want to become a data scientist and would appreciate some guidance on the best strategy i should take i really enjoy statistics data analysis computer science and solving real world problems i know i have the drive and passion to do it however there is a lot of work to become competent in this field and i want to know what i can do to get ahead about my background i am graduating in this upcoming spring semester from the university of illinois with a bachelors in engineering physics my degree is flexible as it is not pure physics so i was able to take a couple 400 level stats probability classes along with a few cs courses one of which being data structures which is regarded as the school’s flagship course for cs (i am taking it this spring as i just finished all the prerequisites) i know how to code but would by no means consider myself an expert i have been spending a lot of my time trying to sharpen my skills in python but would only call myself intermediate at best since i am not a cs major i feel behind in this aspect compared to other students pursuing data science any advice on specific areas i should learn would be really helpful i started a little project over thanksgiving to scrape ebay and get sales data of certain collectible cards because i noticed there was a lot of arbitrage and i am knowledgeable on the topic i bought and sold 1 item already and made a ~25% roi it was cool to me and inspiring to keep building the program will be designed to scrape ebay’s sold listings page and take all the relevant data on the item and put it into a multi-dimensional array from here i want to figure out how to constantly store and update the lists to have a rolling list so i can have data reaching back further than the 3 months ebay gives you then i can find price margins to flip the cards and see how the
price moves over time since there are literally hundreds of viable cards to trade this program will really help me sort through all the data and gain valuable knowledge it is still in its early stages but it currently can scrape the data from the sold page and put the right data into an array now that finals are over i will spend more time progressing it this project is the only thing i can use to separate me from the competition i have no internships in data science and i had a rough first year when i transferred here so my gpa is not the best although it has risen significantly this last year once i graduate i think the best thing i can do is to get an internship to get more experience and train up my skills before i think i am qualified enough for a real job i am going to spend this winter break filling out applications and working on my project and hoping that someone will take me i have filled out some apps already and have a coding problem for akuna’s internship position due this week i don’t think i am a strong enough candidate to get the position but i want to attempt the coding challenge for the experience and to learn what areas i need to study more to get up to par in terms of python each test interview application teaches me something to make myself a better candidate anyways this was very long but helped me to lay everything out if you happened to take the time to read all this and have any advice as to what i can do to make myself a better internship candidate or advice towards getting a job i would greatly appreciate it also if there are any specific things you think are essential to know in terms of python that will help me in my future i would love to hear them i also am highly considering going to grad school to get a masters it would be in computer science financial engineering or some other related field i would only explore this option after i take a year off and practice coding create projects and hopefully intern if i feel i still am not
qualified to get a job i will go on to get my masters or phd in something relevant but since my grades are not stellar i am not sure what kind of programs i can get into but i am sure if i work hard on my own i can prove my worth thanks a lot for your time honestly your best chance to break into data science is grad school you have a few huge disadvantages and grad school will either solve them or buy you time to address them undergrads have a hard time in this job market the burtch works 2017 salary survey estimated only 10% of all data scientists have an undergrad about 50% have a master's there are few ds positions in the midwest unless some big names are coming to uiuc in the spring you'll be limited by your geography your competition has stronger resumes than you at this stage internships projects and more relevant coursework for example my goal is to put your resume in context i'm not trying to discourage you you have a stem undergrad that should get you accepted to a reasonable grad program a grad degree will qualify you for more jobs it will give you time to network and find good companies and it will give you time to build your resume all good things if you do grad school i think you should start asap there's no sense in delaying the inevitable for one year here comes some unsolicited advice i gather from your post history that you only recently discovered data science (you posted about becoming an entry level trader a few months ago) this is not a field one can just jump into the competition has been working at this for years and they are struggling if you're panicking to find a career and you aren't fully set on data science i suggest looking for data analyst positions instead and if i'm correct that you aren't set on data science then disregard my grad school advice above - i wrote it with data science in mind i want to say a few words about your ebay project it's interesting it has the potential to be really great nonetheless it's incomplete; it's
a proof of concept at best if you can develop it i think it can be a huge selling point in the future as it stands it's good-not-great you aren't fully set on data science i suggest looking for data analyst positions instead what's the difference? i'm an econ bs student with 2 years work experience as a data analyst at my company 1 of 2 in my 350 person company i'm now looking at getting a masters but i'm limiting myself (for now) to online programs eventually i want to be a know-it-all-do-it-all data analyst consultant i want to take up programming not to become a programmer but to be able to understand and manage them can i ask for whatever advice you're willing to give based on what i've said above? thank you what's the difference? u jannemg12 gave a tight response in this thread 5 months ago there are many other explanations throughout the internet too can i ask for whatever advice you're willing to give based on what i've said above? are you asking for career advice? as in "how do i become a 'know-it-all-do-it-all data analyst consultant'?" u jannemg12 gave a tight response in this thread 5 months ago that response is perfect! i'm a data analyst and i want to become a data scientist under that definition are you asking for career advice? as in "how do i become a 'know-it-all-do-it-all data analyst consultant'?" 
exactly i'll give more details below but you can ignore it if you choose i'm a data analyst with two years work experience bs in economics good at excel and quickly finding trends patterns my value-added at my company is my lack of industry experience (hoteliers and travel agents) so i'm "new school" and "outside the box" as well being quick and accurate with gathering (1 or 2 sources) and analyzing data in my current company i'm asking for more work with deeper levels of analysis required i also have started adding what i call "fun facts" to team-internal emails they tell me if i'm right or wrong but so far i've received good feedback right now i play in microstrategy and apply my own analysis to questions and problems i'm working on getting back into sql (did it briefly at another company last year) so that i can have more freedom and do more work i'm also working with ms excel olap cubes but that's a different team so i won't be given much more access i'm also going to find some moocs or other free cheap courses to deepen my understanding of stats and start learning python my goal is by this time next year to be using sql and python (and our new bi tool that is being developed in-house) to do deeper analysis and work i am also slowly researching master's programs or even certificates my gpa at graduation was a flat 3 0 from my main school i also attended two other community college systems in order to graduate early and i don't know what my gpas there are i like my company and i'm sure they would let me work remotely to attend school but i'm in dallas so my salary is matched to our cost of living makes online programs much more attractive especially since my local option is pretty much only southern methodist (which is where i may do the certificate) career goals i want to become a data scientist if you want to break into data science you'll want to get more focused about your end goal eventually i want to be a know-it-all-do-it-all data analyst consultant i 
assume "know-it-all-do-it-all data analyst" means data scientist the "know-it-all-do-it-all" data scientists are called unicorns for a reason; they are so rare they may as well not exist there are maybe 10 people alive who can fully sub in for anyone on a big data analytics team i want to take up programming not to become a programmer but to be able to understand and manage them i can't tell if you want to be a staff data scientist or a manager if you aren't sure i recommend reading u drhorn's comment on deciding between staff and management the answer should affect the grad degree you pursue if you want to be a staff data scientist then it sounds like you're on the right track otherwise i'm not sure i'm the best person to advise your journey into management it's tuesday afternoon so you could probably get a number of worthwhile responses with a separate post on this subreddit grad school for the staff data scientist i think it's smart to pursue grad school whichever career path you choose but it's almost a necessity if you're going the staff data scientist route it's true that you are already a data analyst and that you will have a leg up on new grads; however grad school is required for many data scientist positions you'll have an easier time breaking into the field with a grad degree you may even get a bump while you're taking classes i know you know it but i'm going to say it anyway: your 3 0 gpa will hurt you in my experience a 3 0 is the cutoff for a lot of grad programs definitely try to bump your gpa on paper if your community college credits justify it (but definitely don't lower the gpa from a 3 0 even if it's technically correct to lower it) finally i don't think it's likely you'll be able to jump into a reasonable stats or applied math program unless you took a lot of math during your bs econ you'll need linear algebra at a minimum to jump into these programs (although some may let you take lin alg after you're admitted) all that said i'm confident you 
can still find a respectable grad program you are clearly demonstrating your chops at work despite your report card flaunt your work secure two strong letters of rec and you should find success i can recommend a few online programs i know from second-hand experience that northwestern and depaul have worthwhile programs i've read ut's online m s computer science degree is alright and the name is respectable when you're in grad school use the projects to build your resume your projects will be talking points in interviews the "know-it-all-do-it-all" data scientists are called unicorns for a reason; they are so rare they may as well not exist there are maybe 10 people alive who can fully sub in for anyone on a big data analytics team agreed i'd like to have working knowledge of "everything" in data (if even possible) for the purpose of working in a data team and eventually managing a team either at a company or in a consultancy i can't tell if you want to be a staff data scientist or a manager if you aren't sure i recommend reading u drhorn 's comment on deciding between staff and management the answer should affect the grad degree you pursue that is a great question! 
i went through and answered that post seems i lean more towards management but i think i aim more for gaining experience and naturally becoming a manager rather than a straight shot at manager finally i don't think it's likely you'll be able to jump into a reasonable stats or applied math program unless you took a lot of math during your bs econ you'll need linear algebra at a minimum to jump into these programs (although some may let you take lin alg after you're admitted) math is actually my weak point that's something i really need to work on i just had the thought to see where my overall gpa is (alma mater + community colleges) and then take math courses again at the community colleges to boost my overall gpa all that said i'm confident you can still find a respectable grad program you are clearly demonstrating your chops at work despite your report card flaunt your work secure two strong letters of rec and you should find success i appreciate that i'm a lazy student to say the least but i like to think that my work shows i'm competent and skilled (and the feedback at my company seems to reflect that) i'm that kid who floated in high school ap classes thank you so much for the advice i'm happy that you didn't scoff at online programs i'm nervous about trying that route since there appears to be a disdain for those programs edit addendum: what do you make of certificates rather than a masters? this is the one i'm considering and i've heard good things from a friend who's finishing up now i'm considering this certificate and a ba certificate so rather than mba or data analytics masters i get a little bit of both happy to help! 
i'm happy that you didn't scoff at online programs i'm nervous about trying that route since there appears to be a disdain for those programs the perfect path into data science is graduating from stanford at 24 with two phds in computer engineering and statistics and 6 years full-time experience in the bay area the rest of us have to make the best with what we have in your case online programs are built for people just like you: working professionals who can't justify giving up two years of earnings to get a degree we live in the 21st century and university of phoenix isn't the only online option edit addendum: what do you make of certificates rather than a masters? this is the one i'm considering and i've heard good things from a friend who's finishing up now i'm considering this certificate and a ba certificate so rather than mba or data analytics masters i get a little bit of both i can't definitively speak to the value of certificates the purpose of a grad degree in part is to check the box that requires a grad degree on most data scientist job applications personally i don't think a certificate would check that box personally i don't think a certificate would check that box that's what i thought i think the only real benefit would be making me look better for master's programs still doesn't solve the problem of a low bachelor's gpa though thanks a lot for the advice i really appreciate it i do really want to get into data science as a career while i am still trying to figure out my life path i have basically narrowed it down to data science or quant trading and from my understanding the skill sets for the two aren't very far apart from each other i appreciate the honesty though i would much rather hear the truth and try and fix it than to be oblivious do you think i have a chance to land an internship for this summer or am i too far outclassed? 
i really want to go to grad school if that is my best route but i have not yet taken any gre's or applied and i think it might be getting too late i will look into it regardless but in the case where i have to take a year off would the best course of action for me be to try and get a data science internship ( assuming im qualified enough) practice coding and do projects on my own then start grad school the following year? or should i try to apply for jobs as a data analyst and study in my spare time to bolster my resume? i just feel like i am running out of time and there is a lot of pressure on me to get a career path going i know it will take a lot of time and energy to become competent enough but i am willing to do it if it will work i know i am capable to do this its just a matter of figuring the best plan and fully committing to it again i really appreciate the advice this is probably the biggest decision for me at this point in my life and any insight or knowledge really helps happy to help! do you think i have a chance to land an internship for this summer or am i too far outclassed? 
i think you're outclassed but you do have a chance chances get better if that ebay project develops into something more i really want to go to grad school if that is my best route but i have not yet taken any gre's or applied and i think it might be getting too late i'm going to do some shameless advertising here consider michigan state university's ms business analytics program i didn't graduate from this program but i know the program director and many students who've graduated from it; this is a quality program it's a 12-month program that starts in january of each year the curriculum includes corporate projects in the spring and fall and an internship in the summer the average graduate's salary figure of $85 200 is not inflated from second-hand experience they accept a lot of students with backgrounds like yours it's expensive for out-of-staters (tuition is $30k-$40k i think) but again the salary figures are not inflated i think most graduates stay in the midwest some of my colleagues are graduates from the program who are data scientists today i know i am capable to do this its just a matter of figuring the best plan and fully committing to it that mindset will serve you well going forward thanks for the recommendations i think at this stage i just need to bolster my gpa as much as i can to get into a grad program and hopefully land some sort of internship or job and network with professionals to get letters of rec i think i am just too far behind to get into grad school right now with my gpa and no good letters of rec would i be qualified enough to get some sort of entry level data analytics job this summer? 
i graduated from a data science masters program within the last year and i agree with vogt4nick that your best chance to become a data scientist is through grad school vogt4nick's answer is very comprehensive and i thought i'd just add some of my experience as a recent data science masters graduate regarding internships it will be very difficult to land one without enrollment in an advanced degree or a person in the company who can vouch for you the competition for internships is very tough (even for masters students) and a lot of companies specify that an advanced degree is required or highly preferred even the internships that don't have that requirement will likely have lots of masters and phd candidates apply in addition to data science challenges tests given by companies i would recommend studying topics such as statistics algorithms data structures and programming be prepared to whiteboard and show that you have some general business knowledge and can articulate well i would also advise you to start applying to grad school as soon as possible you'll have to take either the gmat or gre if you haven't done so already i'm not sure what the technical programming requirements are for computer science or financial engineering master's programs but for a data science program you just need to demonstrate that you have basic knowledge of programming and you can improve during the program through the classwork projects and extracurricular projects data science masters programs also look for applicants who have some basic understanding of statistics which you also already have regarding getting a full-time data scientist job in most cases the education will land you the interview and the project(s) will set you apart from other candidates unfortunately in today's competition a full-time job will be hard to land on just projects alone i also want to add that bootcamps are worth considering depending on your timeline what your ultimate career goals are and your current 
gpa while bootcamps aren't as comprehensive as masters programs they cut out the theoretical stuff and teach you the basics of what you need to know fast they also increasingly have stronger alumni networks and have connections at well-known companies from what i understand the average job placement rate isn't as quick but if you enroll in a reputable bootcamp you should be able to land a full-time data scientist job within a reasonable time frame hope that helps! best of luck and feel free to respond if you have any follow up questions thanks a lot for taking the time to give me advice it really helps put everything in perspective i will definitely look into some grad programs i dont mind getting more educated and it buys me time to practice my skills while still producing something i am going to look into attending as soon as i can you mentioned you specifically have a masters in data science is that a common program? what kinds of programs should i be looking into? what comes to my mind would be statistics computer science financial engineering i need to do something this summer regarding an internship job what kinds of positions should i be targeting in your opinion? 
i think a data science internship would be the best but i feel like i am too outclassed to be competitive would i be able to get a job in data analytics and would that be worthwhile for me to pursue in the meantime i just am afraid it's too late to go to grad school next fall and that i won't be able to find a job or be able to do anything for a whole year a masters in data science (sometimes also used interchangeably with "analytics") is not as common as computer science or statistics but is becoming increasingly popular as more companies seek data scientists if your goal is to become a data scientist a masters in data science analytics is your best bet most data science analytics curricula contain a good balance of statistics machine learning data engineering programming and possibly some business impact class a graduate statistics degree or computer science degree can also be suitable but i think you'll have to put more effort into taking classes that might not be part of your degree program to get a good proficiency of the main pillars of data science i'm not sure about computer science but a statistics graduate degree is going to have a greater focus on theory while a data science analytics program will have some theory but a greater focus on application to solving problems some other graduate degrees i've seen data scientists hold are industrial engineering math and physics i think it would be good for your resume application if you can find a data analyst job pick an industry you're interested in and find a job internship that will let you work with data manipulation visualization and maybe some modeling depending on the company if you perform your assigned job well you may also have the opportunity to take on more data science-y responsibilities any data analyst job internship will only help and not hurt you and most data analyst jobs internships do not require graduate degrees i saw on another post that you wrote that you're from the chicagoland area there 
are a lot of chicago companies looking for data analysts both interns and full-time :) good luck! thanks a lot :) i am from chicagoland area i will spend my break researching and applying for data analysts positions and hopefully something works out! very informative posts thanks again good luck to you presenting an alternative- i'm a data scientist who hasn't gone to graduate school (yet) if there are things you want or need to learn to do the kind of work you want to do you should go but if you're like me and enjoy learning on the job or on your own it's not always necessary i do think you'll have to leave urbana-champaign i'd leverage your alumni network in chicago and other cities hard there are a lot of uiuc graduates (especially outside the midwest) who'd be willing to give a current undergrad career advice also check out uptake in chicago- they do a lot of interesting ds work edit: just saw you didn't specify which university of illinois campus but advice applies to any :) yeah i do go to uiuc i live in the south suburbs and would be returning home this summer i do like to learn on my own and on the job i basically teach myself everything in the library anyways so it does not matter much to me if im getting a certificate for it or not i am just not sure how else i can get a job or internship without going to grad school is data analytics something i should look into in the mean time or should i just self study until i am good enough for an internship? data analytics and internships are worth looking into don't rule yourself out before you try i'd also try to talk to some people in person about career advice (informational interviews) try to talk to people with a mix of education levels are you opposed to being a data analyst? 
i was in a similar situation (last year) i was a pure physics undergrad and took high level statistics and taught myself how to code currently as a data analyst i get to build models using machine learning techniques forecast customer behavior and i also end up doing a lot of survival analysis although i am not a data scientist it is a good start before grad school i too plan to get my masters to become a more qualified candidate but my company is going to pay for my masters which is a huge plus my real advice is apply to everything that remotely interests you not all jobs will be perfect but even if your title doesn't say junior data scientist this does not mean you can't find cool work and yes i write a lot of sql queries but i also get to use stan and other fun tools yeah that is the route i am thinking of taking how competitive are data analyst positions? would i be able to get a job right after graduation or am i better off aiming at internships? has anyone read through the book 'python data science handbook' by jake vanderplas? i've finished chapters 1-4 from that book and it kinda lost steam at the end with quite a few of the code examples not properly explained and some code was understandably deprecated (things like pd datetools timedelta) i'm planning to read the last chapter (machine learning) it's more than 200 pages so i don't know if i should power through it or i should pick up a better and more up-to-date machine learning book instead has anyone finished that chapter and can you give me some feedback? also do you have any recommendations for any machine learning book that is short up-to-date and more applied? 
i'm still shoring up my math and stats skills and just want to read up on some practical application of machine learning in the meantime i'm thinking of buying geron's book but i've heard from some comments that it is already somewhat outdated (which kinda lowers my confidence with vanderplas' book further since it was released in late 2016) if you are interested in a thorough practical guide (without too much theory) on machine learning applications on specifically scikit-learn then i would recommend reading the documentation all of it this will not qualify as short but you will learn a lot you want to know how people write books on a certain topic? well they read the documentation from end to end which is always up to date chris albon is coming out with a shortish machine learning cookbook on scikit-learn in march of 2018 i have a short (and incomplete) jupyter notebook that covers the fundamentals of scikit-learn the first half will give you a solid intro on how scikit-learn works i got my start in data science and ml thanks to jake i watched many of his talks on youtube before reading his book sure things went over my head but consuming material in different formats will eventually lead to a foundation and thus the ability to follow along word for word it takes time i often would start a book tutorial class only to fizzle out i would try a different one and then found that i could come back once i heard the concept described in different ways by different people andrew ngs youtube videos are a good example of the above i actually got a lot out of this book as an introduction to ipython numpy pandas matplotlib i found it very helpful regarding chapter 5 i would recommend reading the first 50 pages as they are a good overview of the machine learning process however after that the chapter is just a series of brief examples of many different types of algorithms which for someone new to machine learning wasn't particularly useful after reading the python data 
science handbook i got hands-on machine learning with scikit-learn & tensorflow by aurelien geron which offers much more in-depth coverage of machine learning since that is the sole focus of the book i would recommend both books to anyone new to data science i know nothing about ds only know python sql played with ipython numpy can i just dive into geron's book ? (as a starting point) yes he doesn't assume any background in data science if you have a solid grasp of general-purpose python then you should be fine that being said he really does focus heavily on machine learning and you'll need a good grounding in pandas and matplotlib in order to be effective in data science that's where vanderplas is really helpful islr is my favorite intro ml book - http: www-bcf usc edu ~gareth isl i have been watching jakes pyconn 2015 talk on sci-kit learn which is 3 hours long but i have found answers to a lot of the questions i had and also filled in a few blanks i would certainly recommend this and other of his videos on youtube most authors of such books publish the source code on github where they update it according to new updates etc so my first tip to you is to check the authors github i'm a freshman majoring in the newly established data science program at ucsd i am looking to add a math minor to strengthen my math skills but am not completely sure what math courses will benefit me the most in this career can someone look through this catalog and tell me what i should take? 
http: www ucsd edu catalog courses math html upper divs only thank you based on a very brief overview of the courses i made a list of the topics i've used professionally math 10b calculus ii math 10c calculus iii math 11 calculus-based introductory probability and statistics math 15a introduction to discrete mathematics math 18 linear algebra math 31ah honors linear algebra math 20d introduction to differential equations math 102 applied linear algebra math 111a mathematical modeling i math 111b mathematical modeling ii math 170a introduction to numerical analysis: linear algebra math 170b introduction to numerical analysis: approximation and nonlinear equations math 171a introduction to numerical optimization: linear programming math 173a optimization methods for data science i math 173b optimization methods for data science ii math 180a introduction to probability math 181a introduction to mathematical statistics i math 181b introduction to mathematical statistics ii math 181e mathematical statistics—time series math 183 statistical methods math 185 introduction to computational statistics math 186 probability and statistics for bioinformatics math 189 exploratory data analysis and inference math 270a numerical linear algebra math 270b numerical approximation and nonlinear equations math 271a-b-c numerical optimization math 274 numerical methods for physical modeling math 281a mathematical statistics math 281b mathematical statistics math 281c mathematical statistics math 282a applied statistics i math 282b applied statistics ii math 283 statistical methods in bioinformatics math 289c exploratory data analysis and inference how important do you consider math modeling? 
i get the feeling it's the equivalent of algorithms & structures in cs many of the courses seem to be specific types of it maybe i'm partial because of my background in hard sciences but (and take this with a grain of salt since i don't know the class personally) i used monte carlo simulations all the time for testing type 2 errors in matching prediction algorithms there are other useful-to-know topics as well in classes like that: regression sketching compression algorithms streaming regression tail bounding reducing dimensionality regression etc so whatever class gives you exposure to those topics is the class i'm recommending would any class with emphasis in machine learning be useful? what about discrete math and graph theory if i want to get into social networks? or take an upper level course that actually teaches you some network science algorithmic game theory but that may be hard to find at every institution would you recommend discrete math and ai courses? i would say that they are practically mandatory all my students take discrete math and i teach ai isn't ai a separate domain entirely? also do you think in the future more universities will adopt a "data science" degree and evolve from just a bunch of cs and stats courses strung together? 
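To make the monte carlo comment above concrete: the sketch below estimates a type 2 error rate (failing to reject a false null) by brute-force simulation. Everything specific in it is an assumption for illustration, not the commenter's setup: a one-sided z-test of "mean = 0" with known standard deviation 1, a 5% significance level (critical value 1.645), and the hypothetical function name `type2_error_rate`.

```python
import math
import random


def type2_error_rate(effect, n=30, trials=2000, z_crit=1.645, seed=0):
    """Estimate P(fail to reject H0 | true mean = effect) by simulation.

    Assumed setup (illustrative only): H0 says the mean is 0, the data are
    normal with known sigma = 1, and we reject when z exceeds z_crit.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    misses = 0
    for _ in range(trials):
        # Draw one sample of size n from the "true" distribution.
        sample_mean = sum(rng.gauss(effect, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)  # z-statistic with sigma = 1
        if z <= z_crit:  # test fails to reject even though H0 is false
            misses += 1
    return misses / trials


if __name__ == "__main__":
    for effect in (0.1, 0.5, 1.0):
        print(effect, type2_error_rate(effect))
```

The same loop works for a prediction or matching algorithm: swap the z-test for whatever decision rule you are evaluating and count how often it misses a real effect that you planted in the simulated data.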
universities have already started responding to industry efforts and have created data science undergraduate degrees for instance i teach a 3 course data science sequence in my department it's neither a fully cs nor a wholly statistics course ai means many things to many people most recently ai = ml which i disagree with i think ai courses should focus on modeling to build intelligent systems given that we are light years away from classical ai [deleted] also linear algebra for machine and deep learning i don't really wanna read through your catalog but you definitely want a strong calculus and linear algebra foundation discrete math and optimization wouldn't hurt you should have enough calculus before the upper divs (vector calculus) in order of importance for upper div: 1) statistics sequence and computational statistics 2) probability sequence 3) numerical analysis sequence 4) numerical optimization your data science classes might cover 3 and 4 then if you got room you could add some discrete math combinatorics or cryptography i probably need some probability theory but are stochastic processes that important? and thanks for your time! 
for getting a minor in math it's probably too much to be more precise you should get linear algebra and vector calculus (20ef) for lower division then lots of statistics classes (181ab 185) and probability 180a and one more of what you think you need or want stochastic processes and understanding more complex probability models could help you but are not needed there is stochastic gradient descent which is an application a math modeling course might help with some solid coverage in calculus stats don't major in data science computer science is the way to go i'm not so sure data is the future majoring in data science is overfitting people can’t define what exactly data science is so imagine doing a degree in it do either computer science statistics mathematics or physics while that may be the popular narrative in some circles scholars have mostly come to a consensus on what data science is in recent years i wouldn't put too much stock in the name of an undergraduate degree but rather your performance and what you've actually done any sources? 
data science is very different across business lines and different types of data a financial institution will want data scientists who know statistical modeling well because they want to explain the model to regulators (will rarely use neural networks) whereas a tech firm will want a data scientist who knows distributed computing calling api’s and image recognition (will mostly use neural networks) just look at the job requirements of data science in both these types of firms and you can see more emphasis in certain majors sure this is reflected in undergraduate academic programs in data science that have recently been created some good examples are cornell purdue and marquette in addition i talk to other researchers in the mainstream academic conferences (icml www icwsm) all the time and this is what we talk about everyday in talks presentations and panels a data scientist has to know everything therefore a data science major usually consists of more credits to constitute the major than say a cs or statistics major our bright data science majors are hired by financial institutions tech companies and unsurprisingly by many organizations one would traditionally consider "blue collar" or traditional because they are taught a breadth of necessities i just sent an advisee to a major finance corporation and another to a major tech company they are all graduates (to be in may) of our data science program interesting took a look at it and seems like a pretty solid foundation of courses so great job to you but my criticism is that the program really just looks like a statistics and computer science double major with less statistics and computer science courses than it would have if it was a double major in those fields the business still defines the term “data science” in the end and unless there is an accurate consensus among a true representative of businesses on what data science is (sampling problem) then i am still confident in saying data science is a broad term like 
data analyst with no concrete definition as of yet in some businesses a data scientist is just some reporting analyst but with more advanced metrics or a data wrangler or it model production assistant i have a nice information technology and big data degree i'd like to sell you joking aside how established the major program and the school department alumni networks etc are the most important things in choosing a program don't get tricked by the newest buzzwords or some universities rebranded big data whatever program that's a very blanket statement we regularly get industry partners (all fortune 500 companies) and our industry advisory board asking us to build a data science undergraduate major clearly there is demand and intention that happened with it too it didn't work out that great for a lot of the people who got their major in it vs cs plenty of data science programs launching out of business schools etc meaning that it is a lot less clear what someone will bring to the table with a new data science degree vs a known cs degree also the number of people getting "data scientist" positions without at least a masters is really low you should take linear algebra statistics any machine learning course and mathematical modeling also learn python or r disclaimer: i have no background in data science other than a bit of an interest in it if i am missing basic concepts it's not because i'm ignorant i simply haven't been exposed to them yet so please teach me! 
i've been working on a program that combines input from various experts on mma to predict the probability of each fighter in a match-up winning (output as american betting odds with the eventual goal of picking bets) as i've expanded to include more sources (i'll have anywhere from 10-30 picks depending on the significance of the fight) i've noticed that some predictors seem to be below 50% accurate; this is just from eyeballing the data and evaluating small samples i have not yet gone back and extensively evaluated individual sources assuming that my hunch is right and some are picking wrong more than right does this affect how or even if i should use their input? as it exists there is no differentiation between the sources; every source is weighted equally and combined to reach the final output on one hand this makes me feel like i'm diluting my best sources and therefore making the implied odds less accurate i am only taking their input on who would win nothing else so 50% accuracy almost makes their input feel detrimental in some ways output based on only those that meet a certain standard of accuracy seems more useful on the other hand picking mma fights is hard there isn't a system like in most sports where each team plays a large mostly-randomized selection of other teams leading to occasional games that even a novice could predict the outcome of mma matches are each selected by an experienced matchmaker and need to be agreed on by both fighters essentially a fight that's easy to predict is a poorly made match and quite rare in addition because of the large variety of skills used in mma and the virtually infinite number of ways they can interact even a severe underdog can find a path to victory because of this the number of predictors who maintain even 55% accuracy are relatively rare so my sample size would be quite small also even if someone is wrong 60% of the time doesn't that mean that a consensus between them and the high-quality predictors is even 
more significant and implies a less even matchup? these are the solutions i see: a) continue combining each source equally b) remove any sources below a certain threshold of accuracy c) sort the sources into tiers based on their accuracy and output different odds based on the different combinations for example there would be one set of odds from only the most accurate of predictors another set including mid-tier and up and another set using everybody's input d) evaluate each source individually and weight their sway on the outputted odds according to their accuracy this option is far from applicable as my program works currently and modifying it to work that way would be very time-consuming not just initially but also to maintain however as my coding knowledge expands this may become more viable and even in its current state there may be something i can learn from your responses to this option or some aspect of this option i can incorporate just y'know explain your reasoning or pick a second option how would you approach this? tl;dr but to answer the question in the title if someone is wrong more than they are right just do the opposite of what they say and you will be right more than you are wrong which might be the wrong thing to do if someone is wrong about a lot of little things but always right about the big things then there is a lot of value in what they say as with everything context is important analogous to minimizing mse vs mae i think 538 addresses this pretty well nate silver talks about "house effect"--some polls lean left some lean right as long as they are consistent you can adjust for it if a judge is really worse than just guessing randomly give his predictions negative weight if there is some factor that explains his failings (e.g. he gives too much credence to a fighter's past record regardless of the quality of the opponents) you can work around this i love 538! 
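for what it's worth, option d) and the negative-weight suggestion can be sketched in a few lines of python. this is only a rough sketch under assumptions that are not in the thread: every source is treated as independent (real pundits are correlated, so the combined odds will tend to come out overconfident), each source's accuracy is estimated from its past record and shrunk toward 50% so small samples don't dominate, and every function name plus the `prior_strength` value is made up for illustration.

```python
import math


def smoothed_accuracy(correct, total, prior_strength=20):
    """Estimate a source's accuracy, shrunk toward 50%.

    prior_strength is a made-up tuning knob: it acts like that many
    imaginary past picks at exactly 50% accuracy.
    """
    return (correct + 0.5 * prior_strength) / (total + prior_strength)


def accuracy_to_weight(accuracy, eps=1e-6):
    """Log-odds weight: 0 at 50%, positive above it, negative below it."""
    a = min(max(accuracy, eps), 1.0 - eps)
    return math.log(a / (1.0 - a))


def combine_picks(picks, accuracies):
    """Return the implied probability that fighter A wins.

    picks: source name -> +1 (picked fighter A) or -1 (picked fighter B)
    accuracies: source name -> estimated accuracy in (0, 1)
    Assumes independent sources and no prior lean toward either fighter.
    """
    score = sum(accuracy_to_weight(accuracies[name]) * pick
                for name, pick in picks.items())
    return 1.0 / (1.0 + math.exp(-score))


def prob_to_american_odds(p, eps=1e-6):
    """Convert a win probability to fair American odds (no bookmaker margin)."""
    p = min(max(p, eps), 1.0 - eps)
    if p >= 0.5:
        return -round(100.0 * p / (1.0 - p))
    return round(100.0 * (1.0 - p) / p)


if __name__ == "__main__":
    # Hypothetical records: (correct picks, total picks) per source.
    records = {"sharp": (66, 100), "average": (52, 100), "contrarian": (40, 100)}
    accuracies = {name: smoothed_accuracy(c, t) for name, (c, t) in records.items()}
    picks = {"sharp": +1, "average": +1, "contrarian": -1}
    p_a = combine_picks(picks, accuracies)
    print(f"P(A wins) = {p_a:.3f}, fair odds = {prob_to_american_odds(p_a)}")
```

with this weighting a source sitting at exactly 50% contributes nothing, so option b) mostly falls out of option d) on its own, and a below-50% source picking fighter B nudges the odds toward fighter A. whether that "do the opposite" nudge is trustworthy still depends on how many past picks the accuracy estimate is based on.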
if for example a predictor was friends with a certain gym and therefore biased towards an otherwise-random set of fighters causing them to be wrong more than they're right but for a consistent reason is it still more effective to take the opposite of their predictions? correcting for their specific bias would seemingly be the most effective approach but if i didn't know the predictor well enough to know the source of the bias would i still be better off taking the opposite of their pick or is this effect somewhat cancelled when they have a specific consistent bias rather than just being generally bad? great question i don't know unfortunately the tricky part is that you're competing with other modelers so if it takes a lot of work to figure out each predictor's bias a modeler with more resources could do that and beat your negative weight model the way to sort the specific bias is going to be tough you'd need a lot more data than you currently have and you might not even have access to the data but you could list as many factors as you can account for (height race number of fights record weight military experience favored fighting style age coach gym) and make a model to see if there are any inherent biases to the fight picker's choices make the negative model and see how well it bears out before you embark on this highly specific model it may be just as effective i agree with your conclusion that the specific model is likely not worth it it is a fun idea however especially with the wealth of data stored on athletes; in addition to everything you mentioned i could find out who's biased towards high-volume strikers who's more influenced by submissions who's biased towards fighters with long careers et cetera it's very interesting to think about i think applying a weight is a good approach until it hits a certain point where you may want to stop considering the source i think of it like my relationships with people all input is valuable at first as time goes on if 
someone is consistently wrong with their input i tend to lean towards my employees who are right more often than not at some point if it gets bad enough someone is labeled "full of sh*t" and their opinion gets very very little attention now i'm also limited on resources as i only have x employees on the team but assuming i could pull in an infinite number of sources i'd be dropping those who aren't correct much more frequently and likely at a higher threshold of "incorrectness" so depending on your number of sources i'd maybe weight things heavier on the correct side and continue to do so if the other side remains incorrect and slides down that scale side note: your project sounds fun are you including things like the prize amount for a fight? while that may not necessarily affect the predictions people make it could affect the results (how hard fighters train or perhaps if the purse is big enough that the fighter doesn't train much because it's a huge pay day either way etc) i haven't thought of that no the prize amount specifically could be hard to include because they tend to be set per contract so for a few fights rather than one fight and they make twice their contract amount if they win (i think that's a little barbaric but that's irrelevant) so even if they didn't care about their record and ranking they still have a large incentive to win however certain fights certainly do mean more; for example some fights are set up to decide who'll get a title shot (pay skyrockets once you hold a title along with your reputation and negotiating power) so if a fighter were noticed to tend to not take some fights as seriously as others you could possibly expect them to perform above their average if they're trying to gain a title shot or are actually fighting for the title of course all kinds of things factor in to a title fight like nerves and the extra rounds (most fights are 3 rounds main events and title fights are 5) so accounting for all of that might be better left 
to the predictors than a formula still it's interesting to think about true if you are analyzing their predictions it could be left out if you were trying to create your own predictions then i think it would be something to consider at any rate it sounds like an interesting project good luck! it really depends on when they are right or wrong this is a basic precision/recall question if a model is wrong most of the time but right during the most significant cases it isn't necessarily a bad model but if it's wrong most of the time and right sometimes when other models are also right it probably doesn't need to be used interesting point; maybe some people are especially good at picking upsets which while annoying to sort into its own category would be very valuable for playing gambling odds http://halfsigma.typepad.com/half_sigma/2007/03/why_a_career_in.html ^ is a data science career dependent on "temporary knowledge capital" too since it is heavily dependent on programming skills? young man's game my ass that's a seriously shit article how? it's just unsubstantiated nonsense written by a doofus with a bone to pick the 60 year old "5 year" programmer has an immeasurable advantage over the 27 year old i'm 48 i've been writing software professionally for 30 years and was fucking around with doing it on my own for literally 10 more than that programming is a fucking blast i still love it if having a "prestige position" is a career goal then that person's got serious insecurity issues the author provides no references at all for his claims; it's not even anecdotal horseshit i've worked with all kinds of programmers in several industries i've worked in dot-coms huge r&d positions wall street and all other things being equal i'll take programmers in their 40s over programmers in their 20s every time the whole blog post is a disaster but doesn't programming require knowledge that does have a high turnover rate? 
sure you could learn it but so too could a 27 year old who is probably mentally faster and def cheaper the only time what you are saying is true is if they had the same experience background i e they both code for 4 years and even then it's more dependent on various other aspects i e liking to code be able to quickly understand what is happening etc only in the most superficial way people dramatically overweight the cost of mentally adapting to a new language or framework the accrued knowledge of having learned "outdated" technology almost all applies perfectly well in learning a new technology the great lie of programming is that today's technology is different in any meaningful way than yesterday's yeah the syntax is different sure functional programming differs in structure and philosophy from procedural and old school object-oriented programming and there are different domains: front end development web development systems programming embedded systems programming database work but toolkits aside those don't change nearly as much as the current crop of trendy technophiliacs would have you believe you get to a point in your career where switching languages is just a damn detail a nuisance then you learn some neat stuff that language (or toolkit or framework or design pattern or database) brings to the table and you fold it in with the rest of your learning shrug say "neat" and you're better at what you do older people have to compete for the same wages the fact of the matter is most programming languages or stacks aren't that different from one another a 50 year old person educated in cs isn't incapable of learning new stacks or languages--they have the foundation necessary already i work with plenty of incredibly gifted data scientist in their late 20’s early 30’s (my age group) i also work with a number of individuals in their late 40’s and 50’s that are equally capable (usually more so due to vastly superior domain knowledge and additional real world 
experience) i like having a mix of both to be honest the younger guys are usually the ones who are pushing the newest latest greatest tech algorithms while the seasoned vets will gladly trade a % point in terms of accuracy to have a significantly simpler model and tend to deliver a solution much quicker we end up playing off each other’s strengths and it’s been incredibly productive ageism in tech is bullshit as is this article in my mind real data science comes down to a strong grasp of mathematics and theory with just enough programming and big-data skills as are needed to apply one's thoughts and ideas to real data however especially in a smaller company team a data scientist role in the real world may involve a significant amount of what is now being called "data engineering" data engineering is a lot more like software development in terms of its dependence on programming skills and facility with specific technologies and the two sets of responsibilities are not always separate where manpower is limited edit: i did not read the above article my response is simply addressing the question of dependence on ever-changing technologies and does not endorse the ideas presented in the article real data science is the application of the scientific method to business questions mathematics and programming are just tools this is the best argument i was thinking this too: that the only way data science could be an occupation that is much less dependent on temporary knowledge capital is if it requires that data scientists are skilled in math stats and that the programming skills needed for the profession are less complex and have a much lower turnover rate but what i hear most often is that computer science is more important in data science and that most cs majors could do it without further stats training what do u think? i mean computer science skills are much more needed in this profession which is what i hear is that true? 
and how common are these "data engineering" roles? don't even big companies want to hire people who can do both ; jack of all trades and computer science is more important than stats ? if you feel like learning new technologies constantly is a downside to this career path and not a huge upside it's the wrong career path for you! there do seem to be a lot of polyglot data scientists out there (including myself) but i know plenty who basically never go beyond basic r or sas scripts on their local machine that then get emailed to someone in it to productionize yes of course i was mostly responding to the op article that postulated that having to learn new things was somehow a negative even if you are using existing languages packages change standard workflows change things get better more efficient it's a boon there will be of course be organizations that are comfortable with 'stagnation' - sort of like how you can still land a job doing development for the as 400 - so even if you don't learn new skills you could have a prosperous career but if you aren't open to constant learning you are likely to limit your potential for professional growth imo how could it be upside much less huge? i got into software development data science because learning new and more efficient tools and techniques gets me excited working with the same tools for the rest of my career sounds terribly bland i think it's less a "young man's game" and more a "motivated man's game" what do you do for a job? you said you do software development data science does your job require you to analyze data? 
it sounds more like you are building software instead of using software to find what the data means i do a whole mix of things my role evolved from my previous role as a data analyst where i was exclusively providing insight making key business decisions at the time we were doing all our analytics in excel which was painful and repetitive given the size of our organization moving into a software role i still analyze data and provide insight using the skills that i learned as a data analyst but on a grander scale i tend to focus on trends in the business as a whole rather than getting too granular over time i spend more time providing insight than building software but since we had no etl analytics infrastructure when i started in this role there was certainly a huge bias towards the software side of things that may be shifting as we have (finally) added official 'data scientist' and 'data engineer' roles to our organization in our next fiscal year i see that's awesome what is your educational background? do you have degrees in cs and or stats? 
no degrees - i did start a cs degree out of high school but i didn't complete it as i started working in an unrelated field i think the classes i took certainly helped me in terms of allowing me to hit the ground running when i started down this career path i fill in the gaps with moocs and independent study as best i can there's no replacing having room in your life for a formal education so i am sure that it holds me back a bit to be studying these things in my spare time (what little i have with an infant son) some days i feel extremely underprepared for the things that i'm working on (or would like to be working on) but for the most part i am very confident in my work despite my lack of a degree now that being said because i came from an analyst background within our organization i think there's a slight bias towards success for me because i already know our business and the sorts of questions we need to be asking if you dropped me into a new org and i had to spin up business knowledge alongside learning new skills i'm sure it would be overwhelming ahhh interesting when did you start working at your current company? did you start off as a business analyst at the company or in another position? 
"young man's game" um no it requires brain power not muscle but ofc i'd rather hire someone young and cheap to train up you're contradicting yourself yeah i'm a bimbo that entire article is shit the author makes big bold statements all throughout without backing them up with evidence i didn't become stupid once i passed 50 i'm now on my way to 60 and i have more development and analytical experience than most 25 year olds the article is not saying you become stupid but that your employers think you become stupid or at least that you aren't worth the higher salary than younger folks at least my employer doesn't think so i make over 6 figures and i am the go to person for solutions is that guy from google branching out into ds? quite a few of us are women and aren't all that young what a time capsule this was written over 10 years ago before the recession took wind out of the sails of law and finance or most of us had heard the term "data science" with ivy leaguers increasingly taking computer science classes and flocking to careers in tech and startups i would not say it holds up well why not? are programming language trends not still explosive? check out some of the author's more recent posts and ask yourself how reliable you consider their empirical analyses that article was written in 2007 most of what he wrote isn't relevant anymore why not? "computer programming is a low prestige profession" you think that holds true? 
anyone who believes this needs a reality check the guy also talks about visual basic and .net programming visual basic is for building guis c# is still useful depending on where you work (though probably not for data science) overall the guy just has a really bitter outlook on being a programmer let me just stress here that programmer != data scientist data science may involve programming but the two strictly speaking have different goals and thus the "temporary knowledge capital" is going to be very different for both fields statistics math and calculus are never going to change they may grow but you're not going to have a scenario where the calculus you learned 10 years ago is suddenly obsolete and replaced by calculus 2.0 math is math the fundamentals of the "science" in "data science" aren't going anywhere how will the knowledge turnover rate differ for the two professions? isn't data science heavily dependent on programming often even more so than on stats? i mean many if not most data scientists have only degrees in cs not math or stats aren't the stats learned in uni and above overkill since they won't get applied in industry? 
it depends on the specific role the company wants a company can be filling a position with a "data scientist" title but actually want someone with a cs degree who can build software this can be more akin to either a data engineer or a software engineer if that's the case then it would be important to be up to date with the most relevant languages and tools if the company is looking for a data scientist in its strictest definition then it doesn't necessarily matter what tools they use as long as they have the expertise to build statistically sound models in this case their value is in their knowledge of math and stats the bottom line is it really does not take a cs degree to learn how to code i think if you work at a very high level in industry research the theoretical stats you learn in school won't be overkill at all thanks! but intuition tells me that the latter kind (the strict definition of data science) is very rare in industry is that true? and won't very high level roles in industry research want phds instead of master's? 
i think there is something alluring to hiring managers about the young scientist who knows a lot more than his age as a young scientist (mid 20s) i took my job because there were older scientists who could help me learn how to go from being a young data scientist into a great one there are very few young data scientists that are great most of the people we look up to are in their late 30s early 40s vladimir n vapnik inventor of the support vector machine algorithm is employed by facebook he's in his 80s if there was ever a picture of great living data scientists (similar to the modern physics conferences at the start of the 20th century) he would be sitting front and center with everyone's eyes looking at him in their peripheral vision there is a bias in tech against older people and women this is stupid but this largely stems from the arcane hiring process fuck that hiring process no it's about money women want to have babies and can't work during pregnancy and stuff i'm giving you the benefit of the doubt but you probably want to edit your post i understand your point (and i would include that in my definition of arcane because it's hard to understand why anyone would discriminate at all especially in a field where hires are so hard to find) but you might want to elaborate on your point a bit so it doesn't seem like you are advocating for something completely misogynist i'm female not misogynist just find procreation disgusting what an arrogant act to impose life onto a neurologically vulnerable being when there is so little certainty about their welfare! 
maybe you aren't misogynist i'm not really saying you are your two posts have demeaning implications about the female workforce i again suggest you change that i have recently become interested in potentially pursuing a career in data science i have a non-traditional background in that i don’t have an academic history in mathematics or computer science i have a bachelor’s degree in biology with a strong knowledge of evolutionary theory the only related coursework i have taken is calculus i and elementary statistics of course before i apply to any master’s program i will have to do a lot of self-teaching as well as take pre-requisite courses at a community college however my question is this: if the goal is to become a data scientist would it be better to pursue a master’s degree in computer science or data science? the generalist in me is saying that computer science will give me more flexibility in my career should i choose to one day pursue software engineering say (although currently i don’t have much interest in software engineering as a career ) on the other hand the data science master’s will provide me with the specific skill set needed to succeed as a data scientist any input is appreciated i think a computer science master would be more well rounded but then again that's just my opinion maybe a data science master is more pragmatic anecdotal from friends of mine with ms cs - do a stats ms instead wouldn't that be too hard without background? some people with stats ms would say the other way around really depends naturally though i'd can't fathom a stats ms saying there is literally nothing from their program that they now use maybe cs doesn't age well? 
he's 42 cs degree programs change a lot over the years and some of the things learned are not useful in today's world so i would agree in stats even understanding traditional linear models or hypothesis testing helps me understand neural networks better (neural networks stack many linear models together in many different architectures and apply nonlinear link (activation) functions) markov chains from statistical stochastic processes show up as markov decision processes in reinforcement learning although i am biased i feel that newer algorithms and ml models all use old school stats but in different ways eh i think it depends on the master's program some masters in statistics are very classical traditional and don't cover much machine learning or other new techniques and tools i find that in general cs programs are more up to date but again it completely depends on the school generally: if you do a cs masters also take a lot of stats electives especially in linear modeling and generalized linear modeling mixed models are good too if you do a stats masters also take a lot of cs electives especially those in machine learning (most stats departments don't offer these yet) if you do a data science masters take a lot of both stats and cs electives honestly the more the better to get that statistical thinking and programming implementation skills i say cs not just for the content but also the soft benefits meaning networking with similarly interested or employed people easier to get internships etc if the computer science masters has a thesis (i'm from physics so i don't know how cs masters work) you could try to do a cs masters and focus on something related to machine learning for your thesis also if you do either program try to get a data science internship since having real world experience really helps with getting a job anecdotal from a very unemployed me with a stats ms - do the ms cs (with a focus on machine learning or data science (if offered)) does your experience 
reflect good programming skills or just theory i don't know about good programming skills exactly but my program was about 60% actual data analysis coding and 40% theory the bulk of our work was with datasets with most lectures covering theory so i have an upcoming opportunity where i would be a strong candidate to join a new data science team my company is forming internally and i'm hoping someone can help guide me to the best resources for meeting the job requirements rather quickly i have worked on projects with the leaders creating this team and networked with some of the folks already hired to the team so i'll be a recognizable face through those discussions it seems like there are a couple gaps i would need to cover to meet some of the technical requirements: knowledge of bayesian methods knowledge of deep learning methods able to write tensorflow code (or h2o) these were already in my learning pipeline but to be sure the opportunity doesn't pass me by i want to accelerate my learning specifically in these areas some things i'm already doing: working through dataquest io exercises to brush up my python command line and spark sql skills watching andrew ng's machine learning course lectures to get additional mathematical depth on foundational ds methods listening to informative educational podcasts to get different perspectives and analogies on ds methods (data skeptic twiml etc) what i really need are some crash courses to get me ramped up as quickly as possible on the gaps i have for the jobs that will be posted in a few months (read march or april) some barriers i am facing currently include my current full time job being fairly demanding with not much free time day-to-day i have a two year old at home so i spend a good portion of my evenings and most of the day on weekends spending time with her (except nap times and after bed time) i understand this is probably not all achievable but i want to prioritize what i can and at minimum show a good faith effort 
that i can close these gaps quickly thoughts advice and resource recommendations welcome feel free to be as brutally honest as possible so i can temper my expectations p s in case it's relevant i already have the o'reilly data science book series (november humble bundle) sidelined in my desk drawer just haven't had time to pick them up yet part of that includes a text on h2o several on python for data science and a bayesian methods text with python being the code used within edit: per u/stochastic_response i will include a bit more about my experience background my education is a bit unconventional with an ms in industrial-organizational psychology (has a heavy emphasis on statistical and research methods) working experience has been hr and people analytics focused where i spend a lot of time automating processes and incorporating statistical analyses for deeper insights i recently partnered cross functionally to produce an economic analysis for my company looking at macroeconomic drivers of company performance i've spent time learning new statistical and ml methods via the islr andrew ng's ml course and some other texts on these topics i have a pretty shallow background in computer science admittedly as for free time it's pretty minimal could probably commit 30 minutes to 1 hour at work max per day and 1 to 2 hours max per day at home for my company i'll keep it brief and just say it's a large company creating a data science team for the first time they have very few internal folks who understand statistics and fewer who use it in their role while the ds team is just getting started they are aggressively investing in these types of resources topics i've been told would be used: deep learning nlp bayesian methods run of the mill stuff (regression clustering decision trees) edit 2: in case anyone is following this i'll be starting off with u/nickmiz's recommendation to hammer through the udemy courses: python for data science and machine learning boot camp machine 
learning a-z deep learning a-z complete guide to tensorflow for deep learning with python this is because the pacing and specificity of the courses seem appropriate given my timeframe and free time to learn i'll likely follow up these courses with andrew ng's deep learning course to get more depth on the mathematical mechanics of dl and tack on some spark aws stuff as well you're casting an extremely wide net for your timeline and free time you didn't say a ton about your experience but here is my two cents: unless you know explicitly that the team plans on using deep learning i wouldn't spend most of your time on it if they do i'd probably just do andrew ng's deeplearning.ai stuff and then try and incorporate bayesian optimization into training (because that is what i assume they mean by bayesian methods) really make sure you understand what he's doing and build off of it if you're not sure about deep learning usages focus on understanding the simple and fast stuff (regression classification clusters and really just stats) many of the projects of a new ds team can be classified as 'low hanging fruit' so more simple stuff that can be done to demonstrate value (which is rarely using dl) thanks for the advice given my op being somewhat vague i edited to add some more info i agree with a lot of what you said here for context the person i talked to that cited the skills i mentioned in the op is a data scientist working directly with the strategy leader of the ds team they're forming so he has some concept of what they're going to do but they're still shaping their direction additionally i think a lot of their initial short term projects won't use deep learning as they will likely seek out accomplishing some of the fundamentals (customer segmentation better recommender systems etc) that said i might take your advice and double-down on some of the preliminary methods to deep learning (topics in andrew ng's course the islr and bayesian methods) follow up to your post: do you think 
bayesian optimization is all i really need for industry applications in ds or should i go ahead and work through the most typical bayesian methods? sorry i wasn't clear in my post either bayesian stats is great i definitely think you should invest time learning it if the team is focusing primarily on deep learning then bayesian optimization would be the main use case for bayes methods but if they were focusing on other things then really bayes can be applied in many different forms i also have a background in i o (phd) and transitioned from selection to a data scientist role within the consumer research demand area of our company here are a few udemy courses that you may find useful python for data science and machine learning bootcamp machine learning a-z deep learning a-z complete guide to tensorflow for deep learning with python these courses tend to be fairly inexpensive ~$10-$15 i found them very useful for a primer on many topics that didn't get covered in my methods and stats courses (support vector machine nlp neural networks etc) glad to see another i o trying to transition into data science! thanks for the advice and +1 for i o to data science relevance! i actually downloaded a few udemy courses a while back so i'll definitely add these on top can you give me an idea of how long they took you to work through? any other pain points you had transitioning from i o to data science? 
not very long at all i finished them in a couple of weeks i'd say the most difficult part of the transition for me has been version control when it comes to putting projects into production i'm used to running eda and building models on my computer just recently started using virtual machines for this we have attempted to move some of them to production in spark and it turns out they are running python 2 and i built everything in python 3 the pickled objects don't work etc our company just built a machine learning cloud platform though so i'm thinking it will help with version control as each project will automatically load the same requirements on everyone's instance so mostly the software engineering stuff i'm not sure how connected to siop you still are but they are actually running a machine learning kaggle like competition this year which is a really good way imo to get more i os more familiar with the difference in language and methods cs231n is a really top-notch course from stanford i highly recommend at least watching the lectures (https: www youtube com playlist?list=plljy-ebtnft6eumxfyrinrs07mcwn5uia) even though it's computer vision and not nlp you will certainly get loads of knowledge i have done the 2016 version with andrej karpathy (best lecturer ever) although stanford has released last year's lectures recently the field is moving incredibly fast so a newer course might have much fresher data there is no karpathy there though so as of writing this i left my job to study and pursue a career in data science currently studying calculus 2 - 3 linear algebra experimental design mongodb bayes and some other stuff i want to keep my sql skills proficient and to a lesser extent r but i have no idea how to with r i figured i can just grab some public csv data sets screw some values up a bit via excel then clean said screw ups in r then run some stats and data visualizations automate as much as possible via user created functions with sql all i can come up with is to 
log some sort of data manually multiple times a day each day what data i have no clue do some joins then some basic aggregating by factor the only problem is this leaves out many other aspects a data analyst scientist does with a typical mysql database points for any similar suggestions for nosql mongodb i'll share my solution here once i've figured something out i think you should pace yourself rather than trying to do a lot at the same time what do you mean? is what i'm studying not mandatory for a junior data scientist? it's not the content it is the amount doing too many things at the same time is the fastest way to burn out yeah you're right so as of writing this i left my job to study and pursue a career in data science these things can be learned during off hours huh ok what would you suggest the focus be on then? i'm suggesting not quitting your job to study +1 i'm a software engineer and simultaneously in georgia tech's omscs program it's tough but it's the path to success did you do the gre or the gmat before applying i'd like to get into the data analysis program i did do the gre but only because i was also applying to physical location schools (ex: got into uva) the omscs program doesn't require or even accept gre scores though what's your career plan after getting the ms? so actually i just applied for an internal data scientist position and am told i'll get an offer i'll finish my ms in another year and a half so by then i'm sure i'll have new goals awesome! i was deciding between their cs and analytics online masters as well have you had any issues with their online platform or anything? overall good experience? 
no issues it's been an excellent experience i've done a lot of schooling - ba philosophy ms finance post-bacc cs and now ms in cs this is my favorite one so far i actually interact with other students more thanks to the forums and slack chat than i ever did in the on campus degrees it works well for me i gave up the chance to do research on campus for the opportunity to work full time but i'm ok with that the economics of getting paid a real salary plus only paying ~$7k for the whole degree make it an easy choice nice i'm getting back into schooling myself got a bs in finance in '08 and through a series of twists ended up doing risk analytics and data management for a large fi got the time and money now to tack on that masters do you think a bscs is required to succeed at the ms? i'm a coder but largely self-taught with no formal training i think you'll have a hell of a tough time if you go into it without knowing data structures and undergrad algorithms inside and out depending on how well you program you might be fine on data structures but it's good to know things like problem complexity i knew a little python before doing the post-bacc and i don't think i would have succeeded in the ms without the undergrad classes i have a 4.0 gpa now but it's still really really challenging it'll also depend what specialization you choose i can't speak to the os security side (more low level programming) but the ml one can be really tough taking 2 classes a semester i often spend 30 hours a week on school and sometimes more during projects i can't imagine having to first teach myself the basics go on the omscs website and look up some classes they have a list of pre-reqs do you know them? 
if not you need to study them here's an example for the machine learning class: https: www omscs gatech edu sites default files images cs-7641-prerequisites-test-readiness-questions pdf did you do the gre or the gmat before applying i'd like to get into the data analysis program i would argue you do neither and focus on skills that will actually help you get a job sure you will need to know sql and a coding language but sql is easy and if you aren't comfortable enough with a language to do basic manipulations without daily practice you definitely put the cart before the horse by quitting your current job yes you will need to show you can use tools in your interviews but you won't get an interview if you can't show statistical proficiency (not plugging into scikitlearn and getting results but demonstrating you know why you are using method x y or z or why you might use one in a certain situation) you will need to show that you have the curious mind to come up with a question figure out a way to address the question and execute on that plan if you don't have the science skills the tools skills don't matter so focus on those -- learn try new methods ask questions and write up results bonus side effect: you get tool exposure along the way all that said: from your description you are going to have a rough road in getting a ds job without a lot of high quality work in a portfolio to overcome your lack of background education (i'm assuming here from what you are studying that you don't have advanced training in a stem field -- forgive me if i misunderstood) i think some assumptions were made i am currently proficient in sql and r maybe i made the mistake of not mentioning it but i am studying (beginner - advanced) stats and probability this is r datascience after all my github portfolio isn't going to save me from having a hard time with this career transition my education is in business and background is in data entry databases and business i'd say those with my background are 
well prepared to ask the right questions more so than someone with a traditional stem background nonetheless thanks for the information still can't figure out what people see in mongodb almost everything you might want to do with mongodb you can probably do better with elastic and or postgresql for both r and sql find some unstructured text like pdfs or word files rip out the text classify it then map keywords and topics against taxonomies and synonyms stored in sql cluster related terms by string distance that is a great idea for me sql is about getting the data out the serious stuff then happens in your platform of choice be it r python knime second point: forget mongodb and nosql simply said it's crap and real businesses working with important data in 99.9% of cases use a sql database while it helps to know the underlying math for standard data wrangling it's not at all required so i would first focus on getting data (sql) and working with it (r python ) this is practical and gives a potential employer value linear algebra does not in other comments you mention background in business what industry were you in? i'm an mis (was still part of the business division at my uni) undergrad with an mba so i am curious about how our plights are similar and different also i haven't even started a github how do you like it? i don't know if this is applicable but you can do crowdsource data science projects on kaggle (or crowdanalytics) they give datasets and a lot of objectives that you can aim for you can win prizes and get ranked but that's probably not important for you i think this is the best way to train r and python but maybe you can work in some sql somehow good luck! what exactly is the question here? you don't want to forget r and sql? 
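a rough sketch of the "cluster related terms by string distance" suggestion from earlier in the thread might look like the block below it is only an illustration under stated assumptions: it uses python's standard library difflib as the similarity measure (the commenter did not say which distance they had in mind) and the helper names similarity and cluster_terms the 0.8 threshold and the sample terms are all made up for the example

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two strings (1.0 means identical)."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_terms(terms, threshold=0.8):
    """Greedy single-pass clustering (hypothetical helper, not from the thread).

    Each term joins the first existing cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for term in terms:
        for cluster in clusters:
            if similarity(term, cluster[0]) >= threshold:
                cluster.append(term)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([term])
    return clusters


# Made-up sample terms, including two misspellings.
terms = ["regression", "regresion", "clustering", "clusterin", "postgresql"]
clusters = cluster_terms(terms)
for cluster in clusters:
    print(cluster)
```

the greedy pass is order dependent and compares only against the first member of each cluster which keeps it short; for real text a proper edit distance and a hierarchical clustering method would probably serve better and the resulting clusters could then be stored in a sql table to practice joins against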
even if you forget the details of sql syntax you're unlikely to lose your mental model of relational algebra and you'll pick it back up again quickly similarly with r you might forget some syntax or maybe even some best practices but you'll know the concepts of when to use certain types of models or algorithms good point i just worry about freezing at a job interview when tasked with an sql test you can try some of the sql puzzles on hackerrank to keep practicing: https: www hackerrank com domains sql select thanks a million for the link this is exactly what i needed :) those making the transition into data science often ask what classes to take and what concepts to learn but what’s more important is what mindset you have don’t get me wrong there are important strategies and tactics but with a mindset you can make decisions for yourself and your unique situations i propose that the most important mindset is: escaping theory hell and heading towards practice heaven theory hell is: learning the machine learning model intricacies before you start a project studying linear algebra before you learn how to code delving into abstract topics before you find a real world problem you want to solve learning data science approaches without context or without understanding their use practice heaven is: being excited about interesting problems and learning theory as you need it learning to code and then appreciating the math behind it as you learn building the skill of defining why’s before you learn any how retaining more as you learn because you understand the context of the theory data science came from academia but it doesn’t mean that the academic approach is the best way to learn it it can actually be a great disability in this changing landscape so if you are on the data science journey focus on doing to learn not learning to do and you will be ahead of most others have questions? 
reach out at fernandodata com about just because learning to "code" randomforest(y ~ x) on a titanic dataset is easy doesn't mean it's simple i don't think anyone who doesn't even know linear algebra will get anything out of "appreciating the theory" as they learn this is definitely true but for folks starting out in data science it will be infinitely more effective to start thinking about problems rather than theory i've seen many folks stuck before they even begin this is completely true if you can't be bothered to learn linear algebra you don't have any business working with data i have the total opposite approach to things seeing the theory work in action is probably the most rewarding part of my job it's true once you're working in the field but when starting out it can end your career before it starts hey_compactsupport i'm curious for folks starting out what advice would you give them? what i see is that the high churn rate from courses and among people studying data science comes from getting discouraged by learning theory without answering the why this changes as one learns and appreciates the math behind it but when one is just starting it's a different story would love to hear your take on the difference in learning approaches for absolute beginners to advanced folks -thanks! 
my advice would be to do what works i'm theory driven i find that more rewarding and interesting than just running algorithms if you do what works for you then you have a better chance of sticking with the material rather than churning "data science came from academia but it doesn’t mean that the academic approach is the best way to learn it " yes that's why we have hacks with no theoretical practical or ethical basis doing data work it's a real problem when you go to school you might learn the limitations of methods and data if you just jump in thinking that you can learn everything on the job you are going to miss a lot linear algebra is not hard not sure why anyone would whine about it just do it you'll be better off for it in the end "learning data science approaches without context or without understanding their use" this is nonsense if your mentors aren't sending you to applied projects along the way then they are doing you a disservice "it can actually be a great disability in this changing landscape " the "changing landscape" appears to be to let untrained hacks work with data without understanding it this is to all of our detriment while it is exciting and fun to work with data if you have no theoretical basis to understand the limitations of the methods you are working with you are going to do a pretty horrible job interpreting the results just sayin hey midmidmidmoon! 
thanks for your feedback i actually agree with the lack of theoretical knowledge of some data scientists in the field what i also believe is that for folks just starting out reading ml books or taking andrew ng's course can be a great way to end a data science career before it starts when i started learning i spent months learning theories which i promptly forgot because it didn't map to anything in the world however when i started to just focus on practice then even the theory started to stick it answered the why behind the work this is applied to folks starting out and not to be a learning technique for a whole career because if you don't learn the theory there is a ceiling to what you can do! would love to hear your thoughts! my opinion is that people should take the time to get educated t-up linear algebra is pretty much definitions and matrix multiplication never understand why people act like it's hard useful to take a day or 2 to learn basics if you care about methods linear algebra was one of the easiest math courses i took in university if you really give it a go you can learn a full course's worth of material in less than a month https: youtu be pi0mryufixu this presents a false dichotomy between theory and empiricism i like to think of it more as a spectrum "data science" didn't come from academia it is merely a "reimagining" of industrial research to make it appear sexier the term definitely originated as an industry buzzword particularly by sociotech startups in the silicon alley area of manhattan hey anctheblack thanks for the feedback i totally agree with what you're saying the point i'm making is that for absolute beginners focusing on the abstract can end up discouraging them from continuing since it is not put into a context i think this bias towards machine learning and stats theory is because most higher level data scientists with masters and phds were taught that way what would be your advice to a person new to the field? 
i'm sure there is a more nuanced way to approach it thanks! my first advice to any person new in the field is to learn the theory and concepts empiricism doesn't exist without theory and vice versa big data has been a huge buzzword (uh term) for quite a number of years now how many companies actually have 'big data' though? i recently attended a data science bootcamp with people from companies around the country and while nearly everyone thought that they worked with 'big data' the company that i work for was the only one with data that even surpassed a few million rows (and even i only use teradata -- the hadoop environment is only for people with highly specialized skills) the instructors claimed that there are only a handful of companies out there that truly have 'big data' problems in that they work daily with so much data that distributed systems are absolutely necessary to what degree do you agree or disagree? volume velocity variety (and sometimes validity veracity) are the 3 (4) v's conventionally used to define "big data" volume is easy to have as is variety not that many places have the velocity part though it depends on how the data is used in many cases while i wouldn't say it's only a handful i do agree that most data heavy companies don't do all that much of the 3 vs of big data simultaneously maybe i don't understand something but how can "velocity" define data? velocity is the performance of processing data but not a term that defines the data itself am i wrong? 
processing gigabytes of data per second can be as complex an exercise as processing petabytes daily probably more complex tbh i know but depending on what technologies you're using you can process different amount of data in different time for example person a with technologies used x can process 100 gb of data for 1 second while person b using technologies named y can process 1 tb of data for the same 1 second so velocity itself can't define how big the data is or am i wrong with the definition of velocity? i think velocity volume etc are intended as metrics that define the practical limits of traditional single machine approaches to storing retrieving and analyzing data so a data stream with a velocity of 100gb sec would probably pose difficulty storing data using conventional tools thus it is "big" big in big data just means it can't be handled with conventional tools its not necessarily a measure of size velocity isnt a measure of data's size but it is a measurement that may qualify a data pipeline as "big" some who refer to big data projects are talking about dealing with a static offline data set i see velocity to define the rate at which new data is arriving that must be processed before it is useful so you're correct that velocity is partially determined by the performance of processing the data but that's not all i guess i'd say the processing performance determines the latency along with communication latencies but the rate at which the data is produced determines the velocity ex with iot sensor data being collected every 10ms and millions of sensors there is a much higher velocity of data being generated than a database receiving 100 orders per day if they're processed later in a 30day batch report the iot sensor scenario still has a higher velocity but they both have the same latency velocity can mean how fast it is processed and how fast it is acquired in the first place but my main point is say even if some central server at mcdonald's acquired all 
orders at once in continuous time the company probably wouldn't need to do much with that data immediately the "real time analysis" of streaming data is a less often encountered part of big data hence few (in relative terms) companies actually deal with that i work at a company where we'll get dozens of gb of data in an hour but that's the raw stuff what gets to me much later is dozens of mb from which i get insights that will eventually turn into some real time analysis features so while my company does deal with big data in the 3v sense a lot of the work does not as the others have said the velocity of data is generally considered the rate at which new data is coming in so it is not really about any single datum but rather the data set as a whole a common example of the velocity of data is twitter (and more broadly data stream mining) the speed at which new tweets are created makes doing almost anything with it impossible certainly no super-linear algorithms ( o(n log(n)) or anything like that) are able to keep up and often even sub-linear algorithms in o(log(n)) are necessary if you want to have any live analyses a lot of pretty standard approaches just no longer work and break when you have that even counting things can be challenging so when velocity is used to define big data it is this axis of a data set (rather than the individual datums themselves) that is generally being considered if we take the big data stance of "it's the data that makes everything break" then when you have a data set increasing in size so fast that they had to change their entire database infrastructure just to manage id creation (https: blog twitter com engineering en_us a 2010 announcing-snowflake html) you start to get an idea of why velocity is added to the 3v's of big data i work for a company in the fitness & wellness space and i would say we have "kinda big data" our sql-based data warehouse started to exceed the daily slas for having executive reports ready because it was 
just getting too big (most fact tables in the 100m+ rows) we moved from sql server informatica to s3 redshift spark in the meantime we started to ingest some large data sets (web & app tracking data realtime data from about 10 000 stationary bikes & treadmills around the world all emitting messages 10 times per second (so at 1kb messages that's about a terabyte every three hours) realtime beacon data) as the first "data scientist" (put it in quotes because even though i have 18 years of engineering analytic ml experience i only have a bachelor's degree so i doubt i'd get taken seriously calling myself a data scientist) the advantage of replatforming to me was that i could start to build smart systems like recommendation engines chatbots and data products on top of a more flexible open-source environment other than trying to detect malfunctioning hardware in our fitness centers there's nothing that i would say requires something distributed like spark (do people seriously put themselves through the hassle of using hadoop any more now that spark is so gangsta?) but our data engineers love it it's several orders of magnitude cheaper than anything else (because we use emr so we only pay for compute time we use) so our vp loves it and i have the ability to make some pretty boss data science pipelines for stuff like personalization & modelling isn't that a shame that the "data science" brigade would be so dismissive of you with just a bachelors but nearly two decades of experience? 
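as a quick sanity check on the "about a terabyte every three hours" figure quoted above for the bikes and treadmills the arithmetic can be sketched as follows this is only a back-of-the-envelope calculation and it assumes exactly 10 000 machines exactly 10 messages per second each and "1kb" meaning 1000 bytes (decimal units) none of which the commenter stated precisely; the variable names are made up for the example

```python
# Figures taken from the comment; the exact values and units are assumptions.
machines = 10_000            # stationary bikes and treadmills
messages_per_second = 10     # per machine
message_bytes = 1_000        # "1kb" treated as 1000 bytes

# Aggregate ingest rate across the whole fleet.
bytes_per_second = machines * messages_per_second * message_bytes

# Volume accumulated over a three hour window.
seconds_in_three_hours = 3 * 60 * 60
bytes_in_three_hours = bytes_per_second * seconds_in_three_hours
terabytes_in_three_hours = bytes_in_three_hours / 1e12

print(f"ingest rate: {bytes_per_second / 1e6:.0f} MB/s")
print(f"three hours: {terabytes_in_three_hours:.2f} TB")
```

under those assumptions the fleet produces roughly 100 mb per second which works out to a little over one terabyte in three hours so the estimate in the comment holds up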
the worst thing tech has ever done for the meritocratic culture imo was to give academics such a foothold into the industry worse not just a foothold but actually fetishize them it used to be people were applauded if they were so good they never even finished school now everyone wants three advanced degrees and 20 authorships i blame recruiters somewhat for this as data scientist is a high demand field but also a fairly new term it's hard to find people with experience as a data scientist because they were probably called something else before so the recruiter searches for people with a phd in a scientific field because they tend to have all the core requisites and are easy to find next thing you know all of the data scientists people have worked with have had phds so that starts to define what a ds is fwiw this may be partially driven out of other data intensive and interdisciplinary fields such as chem bio informatics in these fields having a handle on all the necessary sub-disciplines requires long years of study and enough experience playing with data and programming to be competent enough to handle a dataset in some areas where results are taken very seriously you don't need a phd to be a data scientist in fact a lot of the phds i've worked with have made poor data scientists because their solutions are too academic and aren't practically applicable data scientists come in a bunch of different types so don't discount yourself because you didn't get a phd only going to speak generally since the other posts covered more specific definitions i am always skeptical of the people i know who say they work with "big data" but are running processing locally on a laptop in my field even a 16 node cluster will get bottlenecked easily at my last job we had a 400 node cluster and petabytes of storage that was backed up daily what were you using a 400 node cluster for? 
(industry vertical if you don't want to get too specific) computational anatomy primarily with biomedical imaging data (mri ct all kinds) but others also did things with genetics data i found that a lot of people who say "big data" can still run a handful of mysql tables on a single machine hell my first ds job was with a company whose entire dataset fit comfortably in memory so i could play around with it in python more and more though companies do have datasets that require hadoop etc logging has just gotten easier storage cheaper and the tools to query faster "big data" can be defined a couple different ways: storing a lot of data (tables with millions billions of rows) processing a lot of data many companies fit into the second definition not as many fit into the first i'd say most companies have millions of rows of data but i wouldn't call that big data agreed i think your parent comment fell into the classic trap of thinking their data is bigger than it is if your entire corpus fits in memory it's not big data i work for one of the core vendors in this space from what i see the main problem in your question is that the definition of big data keeps getting bigger the biggest data by volume at the moment is machine data usually measured in low 100's of tbs per day but easily going into petabytes per day this still runs into the problem of what is economically storable think oil rig sensors or all the network monitoring data at a small telco big data can also be difficult data for example some parts of the insurance industry only deal in low tbs per day but it's all very messy xml that needs to be shredded and analysed a hundred ways very quickly velocity is a problem when you have billions of tiny messages that need to be analysed against other data to do things like detect fraud in banking transactions within a few hundred milliseconds - it's not big by volume but the geographic distribution of the inbound messages and the window for response puts it in a sister 
category so perhaps it would be more useful to say that 'big data' is any data processing problem that doesn't fit in a single database server or laptop - what makes it bigger than those stalwarts is not always the same however i'd imagine that pretty much any large company corporation especially older ones has at least 10s 100s of gigabytes if not terabytes worth of data if they have any efforts in an area like imagery this can be vastly more that said how much of that data is accessible or intended for any use beyond record keeping may be far less their use cases will determine the systems databases they need and usually those systems aren't really "big data" we have particular apps tools and analytics pipelines that do require pulling and processing data at scale and in those cases we do use things like hadoop spark cassandra etc more like petabytes except all of it is virtually inaccessible in the enterprise industrial iot space it's quite easy even for a simple poc to get beyond a few million rows of data when you have sensors & gateway devices publishing frequently we've worked with several companies who have thousands of devices in the field reporting data back but it's more likely that they're just getting started so in our space i guess i'd say about 20 or 30% of companies i've seen already have what might be termed "big data" problems while the others aren't more than a few years behind and want to avoid issues as they roll out their solutions there are many things we've done with our iot data management platform (http: brightwolf com platform ) to deal with the unique challenge of iot data and its use in applications reports ml models etc but honestly i cringe at the improper use of 'big data' for iot and those who hadoop-up unnecessarily a lot of them have big data but very few have questions you can answer with big data for example i worked at a company with a billion-row redshift database i did a study where i looked at accuracy rates for record 
matching between peoples' names and phone numbers i couldn't use the billion rows because the only way to determine if a record was created accurately is to actually call the number and ask (or use voicemail) to figure it out millions of rows but 30000 phone calls it became a "small data" problem similarly i was working on a team with a new product over time it'll probably become big data but when i left it was only about 500 000 rows because we didn't have that many subscribers i'd say very few companies have access to big data sets (100s of billions of rows) for practical use outside of google sap etc i have a double masters in applied mathematics and operations management i am in my fourth year of phd at a pretty good business school (studying operations management) but with a shitty advisor i still have 2 or more years to finish subject to publication pipeline i feel like i have learnt many technical skills (no thanks to my advisor) while being a phd student; but over these final years the learning is moving towards learning the social skills of academia -- negotiating the publication process handling the job-market -- which i find to be pretty esoteric uninteresting and meaningless i am finding myself more and more inclined towards taking up a data-scientist job even if i were to finish my phd so my question is: given that i want to go the data-science route is there value in finishing my phd? 2 years seems like a very long time given my inclination perhaps i can take up a data-scientist position right away and spend the next 2 years gaining directly relevant experience? what do you folks think? have you considered looking at internships as a compromise between the two? 
you'd be able to get out and try something new see if you like it and get a pretty valuable bullet on your resume i just joined a hedge fund for the next 6 months they wanted me full time immediately after the conclusion of my phd and to sweeten the deal they offered me their full-time base salary as an intern the phd scheduling gives us a lot of flexibility for that kind of negotiation to happen because we're not beholden to only being interns during the summer i'm in a kind of data science software engineering hybrid role maybe someone more experienced can speak on how valuable the phd stamp has been after having their first job i was more interested in the engineering side so it wasn't as big a deal for me and i may just stay at the hedge fund rather than go back i'm working in a data science department and involved in recruiting i'd say keep going if you're telling me in a job interview that you didn't finish your phd i'm asking myself the following questions: how good is actually your work? do you lack stamina perseverance? if i hire you will you just quit in a year because you're bored? it's partly unfair but i'd take someone with just a master's degree over someone with a master's degree and half a phd the balance of good and bad reasons in each case means that from an ex post bayesian perspective i would assume you had good reasons to not start a phd but i would assume you had bad reasons to start and not finish a phd also fwiw i got hired in my first job about nine months before my expected thesis defense date; i took a 6-month break from my thesis to work full time then worked part-time for another 6 months to get my phd work to a barely defensible state it was not by far a memorable thesis nor even the best i could have done but hey i got the 3 letters after my name and they served me well in the rest of my career! 
the balance of good and bad reasons in each case means that from an ex post bayesian perspective i would assume you had good reasons to not start a phd but i would assume you had bad reasons to start and not finish a phd this hits on a huge point about hiring that you only truly understand when you're hiring someone yourself: hiring is for the most part an exercise in managing risk for every role you will see many many candidates who should be able to contribute in that role what you spend the most time figuring out is whether or not the flaws this person has could prevent them from being successful the top reasons why people are not successful in technical roles like data scientist? they are assholes they are entitled they are lazy if you tell me you quit your phd halfway through i am going to have a hard time ruling out the possibility that you are either difficult to get along with someone who thinks they are entitled to special treatment or someone who doesn't have the drive to finish things the only answer i have heard to that question that put my mind at ease was "i had a medical situation arise in my family and i was forced to leave the program in order to start making money" other reasons normally will sound bad "i didn't get along with my advisor" "i got tired of dealing with red tape" "i did not want to work the politics of academia" all bad answers nice set of comments social skills of academia -- negotiating the publication process handling the job-market -- which i find to be pretty esoteric uninteresting and meaningless this is a huge red flag for me these skillsets are the foundations of getting buy-in and working with people how to navigate the job market effectively is a significant challenge i don't think most people appreciate being able to sell yourself and your skillset is undervalued and therefore people aren't good at it for every person who thinks they're smart person there's probably a smarter person with more experience out there competing 
against them i'd expect someone who quit their phd 66% of the way through to have to convince me why going out and getting a job was a better trade-off than completing it that'd be a much harder sell than "the phd was hard and social skills are hard" as someone who left a phd program i'd second this advice having completed 3 years of a phd program before leaving definitely made my resume look worse my job search would have been a lot easier if i had spent the next 2 years getting a phd i'm not in the field so i can't speak to that angle but i've been in academia itself part of the answer will depend on why you wanted the phd in the first place if it was "just" to get a datascience job for instance that's not nearly as strong of a reason for doing it phds and the academic process are long frustrating things and you need to have strong enough reasons and motivations to help you get through them what's going to help you finish your dissertation? when the writing is tough and you're struggling to finish and the whole thing just feels like crap what is going to help you keep going? 
maybe it's a desire and joy in the research maybe the "terminal" degree is a personal goal jobs certainly don't hurt either or maybe you just went for the phd because you didn't know what else to do you could definitely earn some more money by switching to a job instead but i think your ability to do that would depend in large part on the technical skills that you said you've been learning i do think there is some value in learning the "social skills of academia " it's frustrating but an important skill that phd students often underestimate technical skills are important but i found that a lot of companies and interviewers actually care about what you do and how you handle those situations i'd recommend finishing up the phd but definitely keep up your ds technical skillset (perhaps kaggle and classes) please don't leave your phd program i don't know if it's possible but if you can amend the situation with the advisor you would be much better off would it be possible to switch advisors? i think you have much more to gain by finishing the phd than you would by leaving now i would just focus on improving your experience since it sounds like you are somewhat discouraged and demoralized now [deleted] i assume you left the phd for a data science job ? in terms of your career did you wish you had finished the phd? is the lack of phd holding you back somehow? don't let the 4 years go to waste man! don't get attached to past investments evaluate your options considering only future gains yes most data scientist positions require a phd in stem just apply to jobs and if you get an offer you like over finishing then you can consider leaving leaving without prospects is too risky hey everyone! 
so for a brief background i've got a bs in bioengineering ms in a bio science mba six sigma black belt (asq) pmp and an it mgmt cert from a univ i say "somewhat-related" to data science in the sense that they are all technical areas heavy in statistics business and it i've been doing operational-related process improvement and data analytics for a large distribution manufacturer now for the past 6 years i've always had a massive interest and passion for machine learning visualization and telling stories from data i was one of the pioneers in the efforts at my firm to get our legacy system cobol data into sql and taught myself sql vba visual studio and sharepoint because i saw the need since then i'm now a company expert on all of these things and i'm the go-to guy from senior execs if they want a complex dashboard or to understand any of the big data in the company i'm looking to pivot from my current food distribution company into the tech industry as a data scientist i've reached out to old colleagues in the tech industry and they confirmed my own research which is that "data scientist" may mean very different things to very different firms that said it looks like it would be wise to study hadoop tableau aws and r from the ground up as for python they said i should strongly build my skills with scikit (numpy scipy pandas matplotlib) which is something i've had superficial exposure to one colleague said i should definitely emphasize the "self taught" bit on my resume as well they said bootcamps are an option (though they seem incredibly expensive) and i've discovered pluralsight which i signed up for as well which seems more affordable i've heard very mixed things about udemy for people in the field and those familiar with hiring expectations do you have any advice on where to start? is it realistic to pivot into this role given my background? are bootcamps necessary and if so are there any that aren't $1k+? 
are other certifications wise i e from microsoft or some other certification body? thanks so much in advance for the insight after lurking this sub i see that many of you are where i want to be in my career in the next few years [deleted] good insight thanks! and yeah i definitely come from science pluralsight is what i just started using and it seems pretty legit for cheap self-study as is i've applied to many top companies but all have been thanks but no thanks automated responses you might be a good data science analyst engineer for some bio related fields domain knowledge is important in any data related field so you have an advantage keep on working on your stats maths and computer science knowledge though i appreciate the advice definitely have to keep working and learning no calls back yet from the big tech guys but i am targeting them and learning more requirements it's nice to be happily employed while searching though to be honest the big tech guys would be more interested in people who have a masters preferably in computer science or someone who already has industry experience so perhaps i might be better suited focusing on startups or mid sized tech firms rather than the big guys to get direct industry experience? yes and also a masters will help a lot keep in mind people with masters in statistics mathematics and computer science (basically top data science majors) are having a hard time landing positions in the big tech world that's interesting because most of what i've read claims that there's a huge demand and not enough talent for data scientists is this just flat out misleading or perhaps only relevant to the overall industry with the big guys just able to be more selective than mid sized or startups? 
there’s a huge demand for data scientists who have both a masters in a relevant degree and relevant industry experience look at data science job descriptions and look at the job requirements almost all will require masters and some experience the ones that don’t require those things might have you be doing business intelligence or reporting with the title of data scientist (hence title inflation) or it is at a smaller company i would look at coursera there is a great introductory course by andrew ng there but i would look at r courses as well cool thanks! by adding coursera to the list it looks like pluralsight coursera and udemy are the best economical choices for self study i've just started to learn data science by myself with dataquest have you tried it? do you think it's a good one? you have strong experience and can use many tools and languages and still can't get the job you want i am almost starting from scratch i wonder what the chances are for me i haven't heard of dataquest but i'll check it out i'm pivoting from supply chain distribution into tech so i can't really speak on your chances on an interview but i can attest to the fact that hard work and a thirst for knowledge can take you a long way in whatever path you choose i just met a senior data scientist at a well known company who didn't know r or python just vba power bi and a bit of sql if he can do it you can do it i have an undergrad degree in finance and have been working at an asset management firm for over 6 years now while the experience has been awesome i realized my passion lies elsewhere and want to get into data science as a potential career change preferably to something closer to business intelligence or business analytics i’ve been looking at graduate degrees in business analytics like the one from nyu stern to help me transition i know this is not purely “data science” but i think it uses a lot of related skill sets i was hoping to get some advice from people here on whether there is a place for me in 
the analytics field especially coming from a non tech engineering background? would firms value my less traditional background from a data science standpoint? yes there is definitely a place for people with educational background in finance finance teaches you a universal business domain all companies need to quantify the value of their options finance including asset management is a set of skills tools for modeling risk and reward decisions data science involves much of the same skills but often different tools instead of using excel to create models we write our data reshaping and modeling logic down in code instead of guess and check methods we use grid search through hyperparameters instead of relying on a market to give us feedback on the success of our models we use the scientific method (hypothesize design experiment collect data analyze data refine hypothesis) i have been a data analyst professionally since 2011 and a data scientist since 2015 i have one of those new fangled masters of data science save your money you can learn all you need to get a job from free online courses and practice on side projects if you are worried about how much you don't know and how much there is to learn i can relate data science is a large not well defined field however to be a data scientist that has no problem getting hired at great companies the learning path is pretty straightforward learn to code (choose python or r to start) learn to use databases of different kinds (sql like mysql or postgresql nosql document stores like mongodb columnar stores like redshift or cassandra key-value stores like redis) learn the art of data transformation (aka data wrangling munging cleaning reshaping etc ) this probably includes learning why mapreduce took off and why spark is so popular now though you don't necessarily need to learn those tools learn the fundamentals of statistics (summary descriptive statistics and probability distributions why a statistic like mean average can 
misrepresent the data regression to the mean and central limit theorem) this can probably be done through one or two online courses through edx or coursera (no need for certification) learn a dozen machine learning algorithms (linear regression logistic regression decision trees random forest gradient boosted trees neural nets svm knn kmeans dbscan hdbscan collaborative filters are my recommendations but feel free to pick whatever you think is interesting) the goal here is not to code them from scratch or master the ins and outs of the math but rather to understand what the algorithms do from a logical perspective and what the trade-offs between them are (bias vs variance assumptions made computing cost etc ) also learn how to get data into the format needed for these algorithms and how to use your programming language to train models using them learn how to evaluate predictive models -- accuracy sensitivity vs specificity f1 score confusion matrix auc and roc cross validation and train test split learn how to tell a story with data don't make reports dashboards just because that is what's asked to be a data scientist you need to think like a scientist do research present your findings don't write dull academic papers write interesting blog posts but for your business stakeholders convince them why the research you did matters what you learned what you can do now that you know that information why you should or should not continue on that project iterate and communicate don't overexert yourself on projects with little to no reward people who aren't programmers often have no idea how to judge how long code takes to write people who are not data scientists often have no idea how hard it is just to find out if the data you have is useful even if you do know making good estimates is hard spend a few hours exploring but always be willing to say it's not worth pursuing further understand the difference between good and good enough you aren't aiming to get an a on every project you aren't 
aiming to make a positive return (though you should use that as a metric for success) you are aiming to collect and analyze data to make the best decisions possible sometimes the best decisions are negative roi because doing something is less expensive than doing nothing (this can only be temporary or the business will fail but can be a valid outcome) a simple model that takes you from 50 50 guess to 80 20 guess is typically a much better project than one that takes you from 90 10 to 95 5 try this for at least 6 months (in serious effort) if you don't feel you are learning enough or making progress then consider graduate school at worst you'll have a huge head start on your fellow students and will probably be more likely to learn more from the classes since you're not also trying to learn to code and basic statistics at the same time good luck i should have specified you don't need to be amazing at all of those aspects you should try to be great at one or two but comfort competence is enough to get you hired fantastic reply!! this is all great advice i recently made a similar career switch and while i haven’t progressed as far as this post details it’s all very good advice thank you for the advice! i’ve been using sites like udemy and datacamp to learn basic and intermediate r programming it’s definitely something i need to try to apply on my day job to really learn though again great advice thank you thanks for this post xubu i'm in a similar boat as op i'm a former investment portfolio manager in spite of having an excellent track record the job market for my skills is not as good as i would like (particularly as passive investing starts to dominate) this is particularly true given where i live in the country (new york and san fran are better for investment than my city) so i'm taking online courses for data science programming and such pretty close to following your list of recommendations actually why does moving to ds mean leaving am? 
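for anyone working through the learning path upthread: here is a minimal sketch of the evaluation metrics it lists (accuracy, sensitivity vs specificity, f1) computed by hand from a confusion matrix. the labels below are made up purely for illustration and the function names are my own, not from any library. in practice you would probably reach for a library such as scikit-learn, but writing it out once makes the trade-offs easier to see.

```python
# Toy binary labels, invented for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Avoid ZeroDivisionError when a class never appears."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred):
    """Compute the basic metrics that fall out of the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

on imbalanced data accuracy alone can look good while sensitivity is poor, which is one reason the list mentions several metrics rather than one.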
i run a data science function in an am (formerly ran a quant team) am is severely lacking people who take a non traditional approach to analytics and research it may be worth looking at internal opportunities as they would also pay for your training yup this is another option i am considering beside the quant shops and hedge funds many of the more traditional investment firms are only just starting to invest in big data analytic capabilities and are trailing other industries nyu has a financial engineering masters program fyi im an experienced analyst and am looking for an online masters program to help me advance my analytics skills employment outlook etc i’m personally interested in something along the lines of predictive analytics machine learning etc i’m considering a masters in data science but i’m also strongly considering a masters in math or statistics i’ve seen threads here where people strongly recommend math statistics over data science for various reasons so i want to at least keep my options open assuming i want to do math or statistics vs data science would it be better to do a ms in statistics or a ms in applied math? i like applied math more purely for the interest of it but i think statistics is more relevant and more practical if i want to go the data science route is that correct? applied math seems a bit broad too broad and it doesn’t focus my expertise as much plus will hiring managers appreciate a masters in statistics or a masters in math? thoughts? 
"applied math" degree programs often focus very heavily on differential equations and numerical methods specifically as opposed to other areas of math that stuff would not help you as a data scientist check the coursework carefully not always i did applied mathematics in my masters but focused on information theory which was mostly linear algebra i have a degree in applied math and did the differential equations side and statistics since i was well funded i have a masters in applied math lots of numerical computing and differential equations but for most ds work you want more statistical modeling if i were to redo my education i would definitely go stats with whatever cs i could find time for what is the coursework? applied math can be very abstract and not so applied stats will give you more education in the stats way of thinking and fewer opportunities to build novel models but it’s a great exposure to probability at a fundamental level generally speaking "applied math" in universities mean applications of math on physical or engineering problems for example how do you approximate a solution to a differential equation describing the temperature of a moving fluid at a certain time and point? it involves a lot of differential equations and numerical analysis math applied to problems of decision making in business government military etc is called operations research so for example how does an airline schedule flights from jfk to lax in december and what should the ticket prices be to maximize profits? statistics is not quite the same as math many people would say it's a science of its own statistics usually deals with randomness and analysis interpretation of information collected from a data set it sounds a lot like data science right? 
nate silver once told a crowd of statisticians at a 2013 joint statistical meeting: i think data-scientist is a sexed up term for a statistician personally i agree some may differ in opinion but i think "data science" is just 21st century statistics many universities don't make the distinction between the fields i laid out into separate departments others do but generally speaking at universities these are the "applications" of math i don't really know what applied math entails but in my experience for data science most statisticians do not know enough computer science and most computer scientists do not know enough statistics as an experienced analyst you almost surely know sql and some programming or scripting languages if you get a solid grounding in statistics you'll be much better positioned for data science machine learning and statistical learning are essentially synonyms being able to perform detailed mathematical derivations of optimization procedures like gradient descent is good to have but a good statistics program will give you a solid mathematical foundation that will serve you well in data science most statistics programs will have a wide selection of electives available to you if you don't already have a strong foundation in calculus linear algebra and discrete methods you will benefit from them in both statistics and data science can you link the coursework? 
people are speaking very broadly here about what applied math means it's pretty program dependent you can basically expect a year long sequence on general numerical methods and from there it's program dependent masters in applied math (since you like that stuff) with focus on advanced matrix algebra statistics and probability convexity optimization and scientific computing numerical analysis in vancouver most of data scientist who work with big banks start ups samsung few government companies they all from sfu u university they don't have a single math course i wonder if you really need an online course for it i'm looking to enter the master's program in data science with university of wisconsin i have a b s in physics through them and am working currently as a chemist at a pharmaceutical company and frankly i hate it so i'm looking to go in a different direction with my career i was told by the advisor there that i would need three courses in stats programming and databases just to show i'm not blowing smoke up their asses with my claimed skills in those areas due to the evil of pharma i work a 12-hour rotating shift meaning i cannot take classes in person they have to be online and i've never done online before so i'm kind of at a loss as to what's considered 'decent' while still being affordable i've found several places to take an online stats course but i'm having trouble finding a decent database course and an even harder time finding a generic programming course that isn't html i think they need to come from a legit and accredited school so i don't think devry will count anyone in a similar situation have some answers? are there some kind souls out there who can point me in the right direction? 
appreciate the help i was happy with courses from both aleks for math (like stats) and straighterline for a bunch of other stuff i don't think that aleks is accepted for college credit any more but sl definitely is and may have courses that will work for you if your school has a pla program something i've seen fairly commonly is for people to take a free or inexpensive mooc and use that experience as the basis for their pla challenge to a given course part of the reason it needs to be accredited is that i need a b in the courses as well and they seem pretty strict on that rule so an open course or something i don't think will count i'll look into straighter line and bring it up with them and see if that will be acceptable thanks straighterline doesn't issue grades just pass fail though you can view your scores on their website most of the online coursework sources are similar so if an actual grade is required it's going to be a lot harder to find something you may have to start looking around at other universities with online programs that have the courses you need but i fear this will get expensive pretty quickly tesu is regionally accredited and offers most of their course catalog online but the out of state non-matriculated cost is around $500 cr if memory serves so that's almost five grand to cover the three courses you need good luck crap i started getting my hopes up lol but yeah that's the problem i'm running into with all of this is that i might have to go to three separate schools and hope i can get into a database course without having to slog through a bunch of prereqs really appreciate your help though head to degreeforum com sign up and ask there it's mostly focused towards earning degrees entirely online but there are some very knowledgeable people there that can probably offer more suggestions than you'll find here good news! the advisor accepted the straighterline courses for the prereqs! 
now i just have to find a database course and i'm all set to start my master's thanks a lot you really helped me out! cool glad to hear it good luck check community colleges in wisconsin they will be both cheap and reputable most universities will accept transfer credits from ccs within their same state thanks i dug around there first and they said the offered stats course is too lightweight and uwm has an online database class but it's $1200 i'm seeing if they'll accept foothill university they said they didn't care where it was from as long as they thought it was ok hey there i'm involved with a startup called dataquest - dataquest io - and you're literally our target learner to avoid being too self-promotional i'm happy to chat over dm if you'd like (or about anything else re: data science i certainly don't claim we cover everything at dataquest and there's tons of other great resources i'd be happy to point you to!) take udacity nanodegree program for data science https: github com mikesprague udacity-nanodegrees or coursera data science path https: www coursera org specializations jhu-data-science thanks but i'm looking for an online stats and database course that i need as a prereq for the data science program not a data science program in that case take a course on statistical inference https: www coursera org learn statistical-inference and an introductory course on database management https: www coursera org learn database-management after learning about sql go to hackerrank and solve their sql problems to have a good grasp on effectively using sql to fetch the data you want from a database consider learning a programming language also python highly recommended thanks again but they need to be accredited with a letter grade i already know c++ c# and python and have been messing around with r i've been programming for years and have plenty of statistics in my background i need specifically what i asked for in my op the rest of what you're saying is useful but i 
think i may have been unclear with i'm asking for ohhh my bad didn't read the complete post :d no worries any help is good help tl;dr looking for some advice about getting into data science analysis at the highest level given my skills and timeline i'm currently in the process of changing career from academic philosopher to data scientist having done a phd and published in philosophy i feel that i have excellent 'soft skills' and the ability to learn new concepts very quickly as well as ask the 'right' questions techwise most of my coding experience comes from writing google docs add-ons (e g one called cross reference) but what i have done is fairly advanced for that limited medium and has involved solving (relatively) difficult coding problems i am completing the 'data science with python' course on datacamp and finding python relatively easy (not that different from javascript) i am also learning statistics from textbooks and tableau gretl from a udemy course on datascience my hope is to apply for a job by the beginning of june 2018 so my question is: how would my time best be spent between now and then to help me get into the best role that i can? should i dedicate more time to stats or coding? i have been told to take on a data science project and document it online but should i focus on some unexplored data that need a good clean up or should i take an established dataset and do something analogous to other people's projects? how should i document a project such that an employer might see it? can i put such 'hobby' projects on my cv? is it realistic to expect to get anything above the entry level data analyst roles? if you made it this far thank you! 
any advice is greatly appreciated so that i can put together a solid plan for the coming year i do not think you will get interviews for data analyst roles companies don't want someone who is going to get bored in the first month of their job and leave - you absolutely will have you thought about doing udacity's machine learning nano degree ? the only reason i suggest this is because of your timeline that's pretty quick and you aren't really a candidate for an entry level job it's relatively affordable and is project based i have never done it but doing that + learning how to program + a really strong capstone might net you a data scientist role given you have a phd i'd certainly at least interview you ideally you should have some projects that deliver data driven products understanding --- evidence of strong coding on github --- evidence of strong writing from your publications expect your technical interviews to be tough and weighted high because you do not have a math cs or physics degree edit: i applied to entry level jobs right outside of my phd and some mid-senior level roles and didn't get any interviews for entry level but some for the mid-senior roles i entered a bootcamp to help me with the technical interviews etc (since i was just getting interviews based off my physics programming background) was able to get multiple offers after the bootcamp so while i learned some (i actually spent most of my time doing other data science related learning) that "some" was the stuff i needed to actually get through an interview to me that would have been incredibly hard to find self assess in a timeline of 3 months so for me it was money well spent if i did it all over i probably would have done udacity thanks for this this is the problem for me i'm overqualified (in the wrong area) for entry positions and underqualified (in the right area) for more advanced ones! 
one plan was to build a website that acts as a portfolio where i can present the findings of a datascience project projects in a neat interactive way -- and also document how i arrived at that point that way i can demonstrate skills from data wrangling data analysis dynamic interactive presentation of data and technical writing i'll definitely check out udacity's ml course as well what kind of topics did you cover in the bootcamp? could you possibly suggest a textbook or another source that you could recommend? first do you want a role as "data scientist?" why are you so sure this is what you want to do? data science is a broad term but the core is statistics and machine learning most of the technical skills are needed so that you can effectively perform the above tasks if it is 100% about money focus on programming pure tech as there are lots of well paying jobs in those areas outside of data if you are extra sure you want data science then the focus should be on statistics and machine learning i don't know what your phd topic was but you could try to brand yourself as a [most relevant topic here] data scientist [cultural data scientist] [logic data scientist] etc (these are terrible but the point is to use your background as a strength ) cross reference is a great show piece have you ever tried doing analysis of your users? 
from a real quick look i'd say you might want to focus specifically on some natural language processing work lots of companies have huge sets of natural text or transcriptions etc and they have no idea what to do with them things like affect detection (microsoft cognitive whatever) are pretty popular show me cross reference and an example nlp project (pull down some reddit data or twitter data ) and i'd be likely to believe that you are a "nlp data scientist " imagine you were able to build a system that could help track perceptions of "marketing ethics" based on social media data suddenly your philosophy research isn't just some "unrelated thing" but something that maybe only you can do anyway just my quick thoughts main point being put your previous work and skills up front as items that make you valuable rather than "how can i get into ds " try to find a way to say "how can people be doing ds without my knowledge " also wrap your projects in an llc you can play the "startup" game a bit thank you for replying i thought about the language analysis route definitely lends itself to a philosopher's skill set (phil of language among other things) i thought about looking into sentiment analysis which i think fits with the kinds of things you're describing at this point i am wondering whether to just go the tech route to be honest pure coding is something i really enjoy i'm still getting my head around all of the roles that fall under terms like 'data scientist' 'analyst' 'developer' etc natural language processing is more about computation mathematics and statistics than actual linguistics what an interesting background! 
please post later and let us know how it turns out to be brutally honest your philosophy background does not contribute to data science skills (barely even the soft skills except for the rudiments of how to communicate clearly and act professional) and nobody will believe that they do so you have a very hard road ahead of you i would say that: probably the most important thing for you to learn is coding you need to be able to chug out several-hundred-line programs efficiently it's hard for me to gauge from your description whether you are used to "algorithmic thinking" - it's perhaps the most important but hardest to quantify skill for a data scientist you need some data science projects online to prove to people that you can do this stuff i would only put it on a cv under an "example projects" section that's unusual in a cv but it might make sense in your case since you will have to prove yourself read a good book on data science to give you an overview of the field for obvious reasons i always suggest the book i wrote but there are other good options too i hope i'm not sounding too bleak here if you are able to acquire ds skills and get an interview then you should be fine; your phil background will then be a cool talking point as a pure data scientist i don't think you could get hire above entry level you might want to consider a hybrid role one that's like half ds and half businessy stuff where you could get a more solid ds background if you can learn quickly this might leapfrog you up the professional development ladder as you could then take a higher level ds job after a year your phd does not help you with ds tech skills but it might allow you to do a lateral transfer into data science as a pure data scientist i don't think you could get hire above entry level you might want to consider a hybrid role one that's like half ds and half businessy stuff where you could get a more solid ds background this doesn't sound bleak to me my aim was always to start at the 
bottom even if that means starting below pure data science and working up i'm just concerned that phd will make me look like someone who'll jump ship if the job is not stimulating enough my library has your book! i'll be checking it out (literally) next week thanks gotcha if you are ok starting at the bottom then that's an unconventional career trajectory but i don't think the phd would hurt you out of curiosity what did you study in philosophy? my wife is really into political philosophy hope you enjoy the book! please review on amazon if you have a chance :) i'll start as high as i can but i'm realistic! i studied nietzsche for my phd which has some relation to political philosophy i subsequently worked on consumer ethics with limited success your wife might enjoy a paper i published called nietzsche's cultural elitism which deals with rawls among others back in school i took a couple quantum computing courses at the time it had been geared mainly towards the theory since the practical application had not yet been fully developed but in the first of those classes i had a philosophy student take the class with me because he was interested in the course material something about quantum physics plays heavily in the philosophy of choice which is what his research was in (or something like that) this other student didn't last 4 weeks in the class he did not have the requisite math skills to follow along which in the class' formulation amounted to elementary linear algebra now i imagine that my interaction with a philosophy phd is pretty unique but it isn't a long stretch of the imagination that a hiring manager will not think highly of your skills in mathematics computer science or even whatever industry in which they happen to conducting business i think your goal of june 2018 is unrealistic i would expect a physicist employed as a software engineer or a mathematician employed as an informaticist to take 6 months to pull together a decent expansion of their 
already-acquired skills into a data scientist it can be done; i was a shoe salesman before changing careers but it will take both time and effort so let me try to answer your questions you should spend more time to stats and coding more time than you planned at least a year where you have a full time mentor or collaboration group to help guide you 2 years if you're going alone from a book or class my analogy is golf anyone can hit a golf ball with a club it takes a lot of hours of practice to do it effectively yes you should create a portfolio so that you can prove your worth to a hiring manager your phd isn't going to do it create a github account check out the competitions on kaggle learn shiny and jupyter document everything and keep your projects even the ones that aren't elegant or have failed after a series of recognized successes begin to build a portfolio the only people we hire for data scientist roles that aren't entry level are already data scientists thanks for replying so quickly! i have an a-level in maths which included some calculus (and linear algebra) i have also really enjoyed looking into the maths underlying statistical tests (albeit not far into it!) i also have way better it skills and programming ability than any philosophers i know but all that being said you're probably right to say i need more time that's actually not a huge problem the issue is really using that time in the most efficient way i'm also in your position (unrelated phd strong stats and quant methods background looking to transition into data science because i'm passionate about data and its applications specifically) and from what i've heard it's hard to get anything entry level with a phd on your cv resume i'm doing the coursera data science program and am also taking courses on python and machine learning from edx i hope it will be enough but i'm not sure maybe a similar path would work for you? what i wouldn't give for a strong stats background! 
it's so annoying that i'll struggle get in at a lower level and work up because i'm 'overqualified' i’ve noticed a bunch of posts here that are of the form “i’m a x major and want to be a data scientist what do i do?” so i thought i would chime in and give my opinion i hire data scientists so i thought i would tell you what i look for when i’m hiring newbies experienced people are another story first you need to be able to code and i’m more interested in hacking skills more than formal systems development if you need a formal requirements document and 6 agile sprints to complete a data science engagement you are taking too long besides 50%-70% of your work will be taking the crappy data you are given and putting it in a form that can be analyzed for that you’ll need perl python vb etc after you have done it you probably don’t need to do it again in the exact same way so you need to hack that mother it would be awesome if you had minions to do the shit work for you but let’s face it if you are just coming out of school you are the minion i knew people in grad school that got straight a’s but couldn’t program their way out of a paper bag i don’t want those people so convince me that you can write code you need to understand data this seems to be under represented in “how to be a data scientist” posts but it is very important but you need to have a good feel for data representation and modeling sooner or later you will need to impose structure on data or the data that is given to you will be highly structured being able to put your data in the correct form can make all the difference in the world when it comes to accuracy and speed of analysis you need to be curious if you see a dead squirrel on the top floor of a parking garage and your first thought is “ewwwww a dead squirrel” i don’t want you if you first thought is “hmmm why did that squirrel come up here if there are no trees nearby? there isn’t any squirrel food around to draw him in how did he die? 
i assume that he was hit by a car but is that really the case?” and you go into full csi:rodent mode then you are someone i want to talk to if something slightly odd pops up during your analysis it could be your mistake bad data or an interesting discovery all of these are important but if you don’t notice and follow up on the initial oddity then we’ll never know will we? i understand if you keep that side of you under wraps around normal people i do as well my friends and close relatives all think i’m strange when i start wondering out loud about things like that if you haven’t read all the sherlock holmes stories then maybe you should however that being said don’t take it too far i’ve seen stats people go down rabbit holes for weeks and come back with a set of technically correct but completely useless results this is especially true with big data systems there are an infinite number of cool but useless things you can find your curiosity should always be filtered through the sieve of practicality just keep asking yourself if what you are investigating will ultimately be useful to the client you need imagination there are multiple ways to approach any analytic problem and you need to be able to see most of those more importantly you need to be able to imagine entirely brand new problems that can be solved with the data and tools that you have if you only find what you are told to find you are a failure convince me that you can see things that others can’t unless what you see are hallucinations then please keep those to yourself tell me a story you need to be able to convert your brilliant analysis into something that normal people can understand if you use the term “p-value” while explaining results to a client i’ll dock your pay unless you are lucky enough to have a client that understands all that stuff then by all means geek out you need to have good communications skills you need to explain your results why they are significant and why someone should trust them 
in a way that civilians understand you will also need to interact with clients on a continual basis since you may understand analytic techniques but you most likely won’t understand the domain you are working in a big part of data science is learning about the business domain you are operating in if you don’t you run the risk of embarrassing yourself by giving clients results that are obviously wrong or trivial my first results that i delivered to a client many years ago were horribly wrong because of some fundamentally incorrect assumptions i made about the completeness of the data that i would have known about had i asked and they were completely obtuse my work product sucked and i paid the price finally you need an area of technical analytic expertise this is where your background in stats machine learning natural language processing etc comes in there isn’t much to really say here if you are coming out of school this is what your degree should be in or you need to have shown a significant project or two in these areas you should be able to describe how to solve a sample problem that i throw at you using your preferred technology i expect you to be technically competent in one area and conversant in the rest hmmm this was way longer than i thought it would be hopefully it helps of course additions comments and vicious flames welcome as a fellow data scientist hirer everything you wrote made me so happy then i thought of every interview where somebody violated one of these points and i started drinking again so long as it is high end liquor that is perfectly acceptable and the candidates seem much more qualified after a few shots i feel the obvious step at that point of the interview is to thrust the partially consumed bottle of woodford reserve at them and say "analyse this" in your best action-movie-one-liner voice besides 50%-70% of your work will be taking the crappy data you are given and putting it in a form that can be analyzed read this far and realized 
you deserve a medal everything else is icing on the cake x months to clean the data 1 day for the ba to analyze it all in excel how do you sell that to a regular civilian? because many people do not understand and simply want to hear "this variable has increased by 1% last month" what do you mean exactly? how do i justify my existence or how do i communicate my results? maybe i use different data sets than you'd expect but if i said (to the relevant person) that kpi x has increased by 1% everyone would understand that sorry for not being clear you wrote "x months to clean the data 1 day for the ba to analyze it all in excel " people usually do not understand that getting the data in the right format takes a long time and they think that working with data is more like the ba part how do you explain to someone who is waiting for results that you can not really say anything until your data is clean? i think it's just built into people's perceptions i don't know what similar businesses are like but they must be pretty poor on reporting getting the changes put into the production db takes a few months by itself that being said i emailed a ba 2 months ago with a question about a dataset and didn't get a response i guess everyone is busy i read this in r in action and it's proven itself true time and time again if you use the term “p-value” while explaining results to a client i’ll dock your pay i like you i laughed at this way harder than i should have as a statistician who recently transitioned into a more prototypical data scientist role all i can say is that you really have to know your audience sometimes when i'm presenting results i'm in a room full of phd's with excellent statistical background who will grill the shit out of me if they think for even a second that i don't know exactly what's going on with the data and the significance of the findings other times i'm in a room full of highly successful but data illiterate individuals who don't care how it works or why 
they just want to know how all of those fancy looking charts on the screen turns into $$$$$ [deleted] that would be nice but i'm not going to hold my breath until then? i just have to make the best of the resources at hand can i come work for you? thanks for the helpful post i'm currently in grad school and i hope to become a data scientist ideally once i graduate this summer (but more realistically it'll be an "eventual" goal to work towards) i think the main issue now especially for us wannabe data scientists is whether a company would be willing to take us math stats compsci etc students and recent grads and take a chance on us clearly there are very rigorous requirements for a proper data scientist much of which cannot be taught in a classroom so it seems like the best way to actually become a data scientist is to gain some experience leaving us in a catch-22 situation clearly there are very rigorous requirements for a proper data scientist maybe my experience is wildly atypical but this kind of made me laugh :) my route was decidedly different phd in 2010 worked in finance for 2 years got laid off moved to austin and started working at startups at the end of the day i'm not sure that i'm a prototypical data scientist but i think that few people who do "data science" (whatever that means) are when i hire data scientists i want to know that they can answer questions you don't have to be proficient in everything because even i spend a fair amount of time on stack overflow but you do have to be able to look at data and tell a story i also get the sense that people confuse "data science" (broadly speaking) with some platform or specific technology (which is what i assume you mean with "rigorous requirements") i saw paco nathan (look him up) at a conference here in austin and he said something like "chemistry isn't about test tubes" chemistry is about understanding the world at distances from an angstrom to a micron (ish) that sometimes requires test tubes more 
often it requires intuition and curiosity--that's what makes a good chemist not being really good at using test tubes if you have to learn how to use a test tube and you probably will then do it so if you find yourself in a catch-22 like this i think you need to try doing something that convinces people that you can solve problems and tell stories we hired a guy who started a data science blog go fork some repo and commit to it (i know for a fact that twitter hires data scientists from people who commit to certain repos) make sure you have some analytics on your resume and that you can do stuff with python and or r and that you know some sql one final note: most companies (especially places like facebook google and linkedin) do strong culture screens we typically turn people down not because they can't hack the job but because they don't fit with the company culture--and we're a small company this is an easy fix--you need to really understand the company where you're applying and you need to be excited about that opportunity and if you don't fit in you don't fit in---that's ok i guess what i meant by "rigorous requirements" was: a deep understanding of statistics many people say they know statistics when they really just know some hypothesis testing steps and get confused by basic concepts (which to be fair can be unintuitive) proficiency in programming i don't think the language usually matters but sometimes you need someone who knows a compiled language like java or c++ well which is something even the most hardened statistician can trip up on as for the actual programming itself while i consider myself a proficient "implementer" (as in i'll take some mathematical algorithm and implement it in code that'll be both efficient and generalizable) i've never taken an actual computer science course and still have trouble with more lower level computer science concepts outside of algorithm implementation i find that a lot of other people have an inverse problem--they 
have degrees in computer science but they have trouble writing say an optimization algorithm from scratch hacking and troubleshooting i know a lot of developers and engineers high up on the corporate ladder who still can't diagnose and fix simple computer issues and that's okay in some cases--they're specialists who are geared towards performing a small set of tasks accurately and efficiently but i have a feeling that this won't fly in data science a bunch of other stuff i'm neglecting in the end i just feel that there's such a massive volume of things to know (or at least be familiar with) that unless you've either spent some time as a data scientist or working in a large variety of technical fields you'll always come up short i've heard that there's no such thing as an "entry level data scientist " which makes me think that maybe despite my degree and internship i'll end up back in it or something once i graduate ahh i see it all depends on what job you're applying for--you can't expect to get a senior level position out of school i feel like you are reading a very thin slice of job postings to have this view and you're definitely wrong about "entry level" data scientists--look for job postings which require 0-3 years of experience that's the definition of entry level--here's one from apple on linkedin typically the requirements will say "ms phd in analytic field or commensurate industry experience" for you entry level think of a job posting as a three-year old's letter to santa they want a pony a diamond ring a time machine and a dinosaur that'd be nice but they'll probably be happy with $30 worth of plastic and possibly the boxes in which it was packaged in particular: deep understanding of statistics depends on the type of job but typically means that you know enough about things like linear regression to be conversant and understand their limitations depending on where you're applying proficiency in a programming language like c++ or java shouldn't be necessary 
learn python and that should get you pretty far also "proficiency" doesn't mean you're going to have to write 1000 lines of production ready code a day it means you understand the general patterns and language structures i took a (single) computer science course once cs 1301 then i forgot everything i learned and had to reteach myself c++ in grad school that's how it typically works hacking troubleshooting = solving problems as they arise this is probably 80% of what you spend your time doing in grad school i think a lot of people decide "i want to do data science" without understanding what it really means and then get confused discouraged because their window into the field is a bunch of job postings that some hr person tagged on monster or (worse) some article written by someone who majored in english because they hated math if you can tick most of the bullet points in the op you're probably as well-off as any other working data scientist when they were first employed econ finance major checking in can you provide an example of of an "optimization algorithm" and how it was built? i'm trying to build some intuition to do better research (googling) i am narrowly familiar with lagrange optimization and with python if that provides some context just in case it wasn't clear i was referring to mathematical optimization--finding a minimum or maximum (edit: my bad it's clear by your comment that that's what you meant) the most obvious algorithm would be gradient descent before today i hadn't even heard of that algorithm just spent the last 45min reading articles and watching videos on it does a library of these "common" algorithms exist? 
or can you send a few more to play around with the article i found had python code building a gradient descent to 'find' a linear regression i thought that was way too cool and it helped me better understand even though i still hardly understand it you might want to check this out many of these are covered in conventional undergraduate numerical analysis courses more important than anything is being business savvy and actually knowing what your stakeholders want and how to achieve this a lot of people need to get off the whole machine learning hype train and do what is best for the business using what tools are best for the business you need to understand data this seems to be under represented in “how to be a data scientist” posts but it is very important but you need to have a good feel for data representation and modeling sooner or later you will need to impose structure on data or the data that is given to you will be highly structured being able to put your data in the correct form can make all the difference in the world when it comes to accuracy and speed of analysis adding to this you need to know what methods are appropriate for the data you need to have some understanding of theory-building and how it informs the research design process this is a big problem i'm running into frequently with other analysts data scientists in our company i'm currently in the middle of untangling a giant mess of a project that was poorly designed -- the initial team seemed to just throw methods at the data to see what stuck well now the models are unstable because they never actually fit you can code all you want and rattle off all the sexy methods that are out there -- but do you know when to use them? and when not to? 
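Picking up the gradient descent question from a few comments back: here is a minimal sketch of using it to 'find' a linear regression. It is not the code from the article mentioned above (that article isn't linked here); the function name `fit_line`, the learning rate, the step count and the toy data are all made up for illustration. It uses only plain Python so the update rule stays visible.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ slope * x + intercept by gradient descent on mean squared error.

    lr and steps are illustrative defaults, not tuned values.
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Error of the current line at every data point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step downhill, against the gradient.
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Toy data that lies exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

On real data you would normally reach for a library routine rather than hand-rolling this; if the learning rate is too large for the scale of your x values the updates diverge instead of settling, which is worth trying once just to see it happen.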
now create a bot to comment this on every one of the "i want to be a data scientist" posts that come up no we need to come up with an nlp algorithm that analyzes and finds the commonalities between them so we can feed to to r for some statistical analysis then we can wait where was i? shouldn't be that hard the very basic nlp stuff should work - count the questionmarks the "how" and "what" the "data science scientist" and you'll probably hit around 80-85% accuracy where do you hire by the way? and what's the difference between a newbie and an experienced person? what you described is actually a pretty wide skill set most of which can only be acquired by practice i'm in the washington dc area for the purposes of this post i'm considering "newbies" to be coming straight out of college maybe with only a year or two experience but if you have worked for a living the criteria changes a bit people that want to change careers are tougher to evaluate my expectations for new college grads is low they don't cost as much and i know they don't have all the skills i just want them to listen and be trainable would it be possible for you to post which company you're hiring for or other companies you know looking for newbies? i'm curious what the salary range for one of your college grads would be? i've been doing an analyst type role for few years along with building out tools to automate the pulling analyzing and reporting of the data so other people can do it next time i'm working on improving my skillset so i can move up the ladder and i'm wondering what entry level pay is up there the squirrel hypothetical is my new go to interview question! csi: rodent brilliant! 
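For the "very basic nlp stuff" idea above (count the question marks, the "how" and "what", the "data science / scientist"), a rough sketch could look like the following. The cue list, the threshold of 3 and the function names are assumptions picked for illustration, and the 80-85% accuracy guess from the comment is untested here.

```python
import re

# Hypothetical cue phrases, taken loosely from the comment above.
CUES = ("how", "what", "data scientist", "data science")


def cue_score(text):
    """Count question marks plus whole-word occurrences of each cue."""
    lowered = text.lower()
    score = lowered.count("?")
    for cue in CUES:
        # \b keeps "what" from matching inside "whatever".
        score += len(re.findall(r"\b" + re.escape(cue) + r"\b", lowered))
    return score


def looks_like_career_question(text, threshold=3):
    """Flag a post when its cue score reaches an arbitrary threshold."""
    return cue_score(text) >= threshold


post = "i'm a philosophy major and want to be a data scientist what do i do? how do i start?"
print(cue_score(post), looks_like_career_question(post))
```

To know whether something this crude really separates these posts from everything else, you would need a hand-labelled sample of the subreddit to score it against.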
that goes with my other interview question: "you're in a desert walking along in the sand when all of a sudden you look down and see a tortoise it's crawling toward you you reach down and you flip the tortoise over on its back " and then build a model to estimate the response time for peta creating a new billboard ad based on previous turtle (er excuse me tortoise ) flipping incidents? bonus binary variable: pamela anderson is is not in the advertisement double bonus: will there be a sarah mclachlan song playing during the ad? p(sm|peta) = 1.0 yaaayyyy! i just wanted to celebrate that i'm deep enough in the johns hopkins coursera data science certification that i understood what this means! i think i'd fail that question my response would be something like "what? why would i do that? poor thing was already out of its element - there's no need to guarantee it dies i would follow its track though hoping it came from some kind of water though and how'd it even get here in the first place?" sorry that is the nerd test you can't come into the data science clubhouse unless you know what movie it came from without google oh blade runner apparently that's one of those movies sitting on my backlog that i haven't bothered watching partially because last time i decided to watch it i got confused about the multiple different versions and wasn't sure which one was the definitive one is this stickied? vb? visual basic? seriously? ;) i totally get your point about hacking the data though i don't see how this is a job description fitting of a 'data scientist' vs say a normal statistician though? quick quiz what application(s) that can store and display data is almost guaranteed to be on every clients desktop and they all know how to use? 
right excel and sometimes access many times i wished i had a vb programmer so i could make what i delivered a lot slicker than a dump of shit into a spreadsheet and data scientists have to work with what data they are given and usually what they are given is crap so either they make the data better or they don't do anything i recently forced myself to pick up vb for somewhat similar reason my time costs the company a lot of money and thus they want to maximize their roi who could blame them what this often led to was one of the higher ups coming to me and saying "i need you on this project asap but we can't let the other project fall behind so can you use this afternoon to teach a few of the ba's how to do whatever it is that you've been doing?" trying to explain why i couldn't teach a ba with limited stats knowledge and zero programming experience r python java c+ ++ stats machine learning etc in a few hours was about as successful as my attempts to explain to my 2 year old why he can't have m&m's at 6:30am even though i loathe vb (personal opinion truly don't care if anyone disagrees thinks i wrong) writing a macro and being able to say "all you have to do is click this button ok?" has been great for my mental well being plus it lets me get back to the stuff i enjoy sooner trying to explain why i couldn't teach a ba with limited stats knowledge and zero programming experience r python java c+ ++ stats machine learning etc in a few hours was about as successful as my attempts to explain to my 2 year old why he can't have m&m's at 6:30am your writing style is very amusing congrats for that write a macro and everything's ok? that thought gives me chills don't get me wrong vb has saved my butt a number of times but the best thing i've done is treat it as a fragile "solution" and made sure everyone else knows this too however it sure as heck gets the monkey off your back and that alone can be worth the trouble! 
more of write a macro and give yourself some breathing room to address some more pressing concerns the macro is just a patchwork solution until those involved can get caught up on what they need to know to do the actual analysis without significant hand holding [deleted] if i create a web interface to present my results then i have to maintain it and answer questions about it and listen to people whine that the color hurts their eyes and get emails that say "hey could you make it do x like excel does" if i have tabular data as output i can export as csv load into excel email it and its not my problem any more given the popularity of mac and linux in industry i've worked for two gigantic banks and it's windows windows everywhere i've known of 1 mac that a senior chief whatever used and a bunch of linux in the server rooms anyway to further what ddttox said with excel you can create a file email it out and be done with it the people you're sending it too have some excel skills so they can tweak it if they'd like or copy data charts whatever into other programs it also avoids the 6 month process of "hey it services i'd like to get some room on some web & sql servers and i'm not real great at building websites so i'll need some help with that too " the bible: http: www amazon com exec obidos isbn=0470475358 ref=nosim jwalkassociatea that just makes me feel dirty ah that totally makes sense i can see why vb would be useful here best description of a curious creative get-it-done mindset i've seen thank you so much for posting i've been doing this a while and i want to come work for you how much and how well do self-directed side projects substitute for work experience? i'll be graduating in a few years and i'll have three solid projects via academic research but only an internship or two of work experience thanks for the great post! 
i'll answer for me but would encourage the other senior data scientists to comment as well go back to the part of the post that says "tell me a story" it isn't up to me to decide if your projects and research are significant its up to you to convince me that they are one of the things my dad taught me back in high school and my manager mentor taught me on my first job is that everyone is in sales regardless of what you do brilliant work and ideas mean nothing unless you can infect other people with them excellent! +1 as a person who thinks is a bit too old to became a ds (yeah i'm complaining but i have my reasons) i found this post really refreshing maybe a bit hopeful still dreaming about this :) thanks if you use the term “p-value” while explaining results to a client i’ll dock your pay keep being a poser http: errorstatistics com 2015 03 16 stephen-senn-the-pathetic-p-value-guest-post read all of errorstatistics or mayo's books thankyou! i'm sick of all these shitty ass "iama ___ am i qualified???" 
posts granted i may have posted one a while ago i find that the very strong abundance of these posts repel anybody with any legitimate information to post in here meh they don't bother me too much data science is pretty ill defined so asking the "what do i really need" question isn't that outrageous its not like asking "how do i become a ruby on rails programmer" where the answer is obvious data science requires a wide skill set and some of those skills are not easily quantifiable well i do notice there to be common (and uncommon) themes of the following regression statistics and linear algebra machine learning and artificial intelligence datamining and databases data representation visualization layman's interpretation implementing the above using statistical libraries such as r or scipy but the primary theme seems to be simply a ton of experience with analysis which is where masters and phd's really seem to flourish since most would correctly expect that research experience involves heavy analysis considering how easily accessible it is to get ahold of free datasets online with a simple google search i would figure that would be enough for many people to really sink their teeth into some analysis and sandbox around then to boost skills google search free data mining and web scraping books and apply them to really start rounding off a portfolio the lack of security and self sufficiency in this sub has recently mad me sad over the past few months =( i'll spare my life story but in brief i'm a lab scientist interested in transitioning to data science i have a ton of python experience and in general my math skills are good but i have basically zero machine learning experience beyond a coursera course (so i have textbook-type knowledge of this stuff but haven't applied it) and my statistics are not super high-level either (condensed matter physics makes surprisingly little use of real stats) i've been offered admission to an insight session i have seen the criticism 
online that people who can get admitted to insight could just go and get a data science job straight-out and it's making me wonder if i could pull off the same given enough time (i have been rejected from a few positions that i applied to on my own in the past but admittedly i have not applied to all that many prior to doing insight) i also read that salary offers are lower to fellows because of the premium paid to finance the fellowship (essentially a recruiter fee) is there any truth to this? what does r datascience think of insight these days? a lot of existing info seems to be from ~ =1 year ago and things are evolving quickly in this field so i really appreciate if you all could share your opinions on this issue so i'm an insight alum (albeit from more than a year ago so your older-info issue still stands :p ) i definitely found the program to be useful -- that said people who can get admitted to insight could just go and get a data science job straight-out holds some water in terms of technical skills yeah everyone coming in could hack it -- this is actually my criticism of most other bootcamps things like ga tend to be more or less a crash course in pandas and sklearn which are (a) readily self-taught and (b) frankly the easiest part of the job the value i got from the program was more on the soft skills side - coming from academia my working mode was basically lock myself in a room for a year and turn the crank until i can come out with the perfect paper that 12 people will read (i was in a small field ) the "hi now go do a project" approach was actually pretty powerful because that's a lot of how it goes in the workplace -- i need to come up with a concept and take it through to a minimum viable product on a very quick turnaround in order to get buy-in to actually pursue the idea further there's not really such a thing as a "minimum viable paper" so this was a shift then there's the matter of being able to pitch the idea to people from nontechnical backgrounds - 
not many non-physicists go to physics conferences but the best insight projects are one that immediately present themselves as an approach to a problem that someone can you know give a shit about maybe it's a little cynical to say that you need to learn to package and sell yourself idk but certainly being able to teach explain stuff nontechnically is a plus i've actually probably gotten more value out of being an alum than from the fellowship itself -- you get exposed to people from a lot of different backgrounds (again came from a small insular field) which is good for picking up new ideas i generally found that no one comes in knowing everything but for whatever problem you encounter someone in the room will know it (except for nlp in my session which of course my project was on ) the alumni network is really active too so alums that are out at companies now can post up problems they have and kick around ideas advice solutions on a regular basis tl;dr yeah i could've gotten a job without the fellowship had offers to do just that but i think putting off the salary for three months was worth it to do the program also a former fellow i would echo the idea that the network is the most valuable part of this fellowship not only do you get the "back door" into a ton of highly-reputable companies during the fellowship but you also get that gold star on your resume that lots of companies will take as a sign that you're high quality regardless of what else is on your resume in this and many other fields the path to solid future opportunities isn't always your skills but who you know apply if you get in and want to take the salary hit for 3 months then i don't think you could go wrong to do it but if it's not for you then that's fine and there are plenty of other paths into data science these days except for nlp in my session which of course my project was on same in my session! 
there were only two of us doing nlp and we were both learning it on the fly yep three in mine -- team social media careening uncontrollably to victory :p not a fellow myself but work with several i hear mixed reviews from them while i can't speak for how much the program taught i can say that every fellow i work with is a great colleague the biggest con i've heard is that the program is essentially "hi now go do a project " that's not the right environment for everyone (but it works great for others) yeah there is very little instruction before the project phase and that doesn't always work well for people who have very little ds exposure before they enter the program however you get a lot of time after the project phase to hone those skills and most everyone comes out pretty well prepared by the end of it i'm normally very against boot camps but insight is the only exception because of the quality of their students i'd recommend it as a leg up i did a talk by one of their alumni who's at stitchfix and publishing ml papers and he was doing a lot of cool stuff i work for a company that interviewed people from insight firstly salary and recruiter fees are different things coming from budgets given to different departments (hr and ds) so it is not true that salaries are lower if you don't go via a headhunter or insight you might occasionally get a higher sign-on bonus that's it for us insight was simply one of the headhunting programs we would use they do a good job in pre-screening academics people so if we wanted that kind of people they would save us time you should look at them as a headhunter because that's what they are they would send us all these resumes from people and we would select the best ones if you have contacts in the industry who can get you a referral then there won't be much advantage in using them you are more likely to get an interview via a referral than through insight but if you don't know anyone in companies then insight is a way to have the 
hiring manager look at your resume the project thing is just marketing it doesn't matter it is much harder to get a ds job today than 3 yrs ago so i would assume that their placement rate is much lower today thanks for the reply the added competitiveness of the market was actually what made me consider insight more seriously (after low-to-no interest on the applications i did submit previously) unfortunately i don't really know anyone personally in the industry who could get me a referral at the moment; most of my contacts are still academic not a pro or con re: insight but there are plenty of companies that are totally fine with hiring your "lab scientist + programming" background - half the data scientists at my job had that plus maybe an intro coursera course's worth of modeling thanks for the reply i keep wondering the same thing - could i do it without insight? it seems like the requirements are more stringent than they used to be so the people getting hired in the past may have had an easier time of it i also did insight more than a year ago and i still echo what others have said that the network you develop is a great benefit i know people spread out in diff data science companies all over the place which is great for future job prospects as well as having sounding boards for ideas also the ratio of application to interview is much higher with insight than applying with your resume by yourself i'm sure it's trending that way but i still hire primarily people straight out of grad school that's probably more true for larger companies with a more well-established data science department; there's less need for people to be able to hit the ground running so we can help them fill in the gaps i've made several posts about my own experience but i don't regret doing insight i didn't really have the soft skills or the network to hack it for an entry level ds job prior to insight i learned a lot from my peers and i thrived quite a bit in the "hi do a project" 
atmosphere i was able to polish up my cs and ml fundamentals enough where i could get my foot in the door for ds jobs if you are able to go without salary for 2-3 months i'd say do it if not for the network we've interviewed a lot of people from bootcamps and haven't made any offers yet the opinions i've developed: 1) none of the bootcamps teach any useful skills you'll learn the skills you need for a successful professional career starting in your first job so it's important to get there as quickly as possible this means that if you can get your first position without them you should go that route but this isn't viable for all candidates 2) insight is the best of the bunch and the only one worth considering this is because (i) it has an extensive network and (ii) it's free if you're not in the position i alluded to in (1) this means your opportunity cost is low you'd have limited downside and high upside hope this helps is the data incubator (which is also free and has a network) worth considering? the candidate quality we saw from there was a step or two lower than insight and from what i've seen the network isn't as extensive especially in the startup space (which make up the best first jobs because they generally have higher quality teams to learn from and better opportunities for learning to drive business value) thank you! is there a big difference in value between the different insight programs (data science data engineering health data ai)? unfortunately i don't know much about the differences in their programs themselves sorry! i have a friend who found a data scientist job after this program and this is the only thing that i know about insight :) more data never hurts ;-) thanks for your response! which location did you apply for? 
i'm curious because i'm waiting to hear back after interviewing i've heard mixed things from people who work in industry it has a good reputation but some people say you're better off going through the referral process if you have contacts at tech companies (because it will save you time and energy not because the program is bad) wondering the same thing i applied for the ny location haven't heard anything back hi! hacktoberfest digitalocean com is soon starting! this is a festival to honor open source code where participants contribute to github repositories in order to help the community i'm a young data scientist working with machine learning in python but i don't have a great experience in general programming i wonder how i and other people with similar skills (i e science computing maths stats ml numerical analysis etc) can participate in this festival? thanks! you can contribute to pandas scikit-learn given your experience in python and ml there are plenty of ways by which one can contribute to these libraries without programming too you can submit doc pull requests for instance i would suggest you start with issues marked 'easy' and try to progress from there this link: http: pandas-docs github io pandas-docs-travis contributing html will be the best way to start with pandas for instance great advice! i will look into this! thanks:) i'm a front-end developer for operation code - (repo here) we're really looking forward to hacktoberfest and i wanted to make a bid for data scientists to consider working with us for the month! 
you can read more about our non-profit here but the tl;dr is that we're an organization comprised of tech industry professionals helping american veterans get into the tech industry as they transition to civilian life to help retiring service members transition into technical careers we need to understand their needs and the options available to them for instance how many active or recently separated service members are trying to get into tech? what fields are they interested in and what fields offer the best chances? what online courses or other resources are they using? which software companies have veteran-friendly hiring programs? additionally these answers could prove invaluable in working with the american government in best providing resources to prior service members some of this data may be available from the department of defense or other government agencies or from universities or nonprofits the first task is to figure out what data is available and where it can be found depending on what we find the next steps will be to collect the data produce reports and automate the data collection and reporting so that we can stay on top of it it's a loose deliverable that can be expanded upon more in our slack team (simply register on our website to join - the channel is #data-science) essentially if you'd like to help answer challenging questions that will eventually be visualized and presented to lawmakers for the chance to help thousands of america's finest step right up! we don't have machine learning projects in play but documentation is a perfectly acceptable form of pr for hacktoberfest we'll be making a repository for analytics which you can submit prs of markdown or pdf files you may potentially have ideas that can further help in our mission and there's many other places to contribute if you so choose i wanted to make a bid for data scientists to consider working with us for the month! 
i'd be interested in checking it out i have some python experience where would be the best place to learn more about what i can contribute? i saw this but it's all javascript and ruby like i said it's a bit of a loose deliverable in terms of data science we're essentially seeking research data sources and statistically-backed answers to the previous questions we have one in-house data analyst who utilizes our slack user data to create visualizations for our c-level execs to help back their points on the hill here are some example questions we need answered with data: - how many active or recently separated service members are trying to get into tech? what fields are they interested in and what fields offer the best chances? what online courses or other resources are they using? which software companies have veteran-friendly hiring programs? if you can find reliable sources of information and points to collect data that help answer the preceding questions then you can feel free to visualize the data as well - with whatever framework language you feel comfortable in all of that can be merged in as pdfs spreadsheets jupyter notebooks or even markdown files as prs to a repository we're going to create tomorrow called data-science [deleted] we'd love to connect! we've partnered with amazon apex github docker o'reilly hackerrank and oracle if you'd like to be a bigger part of our organization in terms of hiring veterans please email us at staff@operationcode org thank you for your service i may be interested! 
dm meh hi all (hope this is the right place for this) i have an ms in chemistry and have been teaching various chemistry labs at a university level for about 10 years been looking into relocating to a place that doesn't have many chemistry teaching jobs (vermont) and i am finding my skillset to be woefully inadequate for basically any other job in the pay range i'm looking for even though i am very good at my current job so i'm trying to update my skill set and bought some udemy classes in python and data science machine learning i do have some experience with programming so i am confident that i am capable of learning python and being able to use it effectively a couple of questions: -are positions common for someone who is capable but doesn't have a formal degree? what could i do to improve myself in this area taking formal classes is out of the question both in terms of time and money (have 3 kids and full-time job) -could my background in chemistry be seen as a boon for someone getting into data science? any help advice you could provide would be very helpful! [deleted] thank you for taking the time to make this detailed informative well-researched response i am glad that there appears to be some opportunities that may be available for someone in my position provided i work to get to the point where i am hireable do you know anyone working in the field who has taken any of the available classes on udemy on data science or machine learning? curious where they would fit in the context of preparing for a job or if they are only useful for updating skills for someone already with a job in this field what kind of phd in maths are we talking about? some specific fields or is it more like "well he's got a phd in maths so he can surely solve anything we throw at him"? 
well since you are trying to get a job i would heavily recommend making a portfolio and have a lot of stuff on your github i came from nanoengineering (chemistry) and found the process to be a grind not to be self promotional but here is an interview on how i got my first data science job https: coolpythoncodes com first-data-science-job thanks for the heads up definitely going to watch some of your videos i liked reading your interview good perspective thanks! i would appreciate that don't worry about not having some sort of degree it really is about the skills if you can find the time to teach yourself some data science you should be golden i have an msc in chemistry as well i can't speak for people's perceptions but if you can understand nmr you've got more math chops than most folks who've done a data science boot camp maxwell equations and partial derivatives are also potentially useful for understanding models lastly understanding how to propagate uncertainty is also very useful hey that's good news been teaching thermodynamics and quantum mechanics for six years so i've got that stuff down thanks for the perspective! 
yeah i grew up in vermont and ended up leaving primarily due to the job market (and to pursue a graduate degree) i live in nyc now and as much as i'd love to move back to vt the only realistic way i see that happening is in a remote position it isn't a hub for high tech companies but hopefully you can help change that i'm currently in the midst of interviewing jr candidates and i'd say beyond being able to discuss the data science process and specific ml models the biggest things i'm looking for are logical reasoning skills ability to communicate and some knowledge about what's actually going on under the hood of these models while udemy is a good resource (i like jose portilla's courses) i'd also suggest finding some independent project to work on and put a bunch of code and output on github (maybe check out kaggle for interesting datasets) finally it doesn't hurt to find a text book on probability stats linear algebra etc and just read the whole thing and do all of the problems time consuming but you'll get more out of it than watching videos and copying code best of luck to you! the job market will definitely be challenging in vermont just something to consider maybe you can find a job where you can work remotely most of the time yeah vermont is a tough nut to crack there are some positions available right now; more than chemistry teaching anyway -are positions common for someone who is capable but doesn't have a formal degree? you have a graduate degree in a hard science you're going to have a leg up on a lot of people you'll likely be competing as a top-tier candidate at least for entry level or junior positions what could i do to improve myself in this area it never hurts to have a github with some projects on it that demonstrate your ability to write functional code -could my background in chemistry be seen as a boon for someone getting into data science? 
definitely specifically if you are looking for data analyst or junior data science positions for positions where your education and experience are relevant in other words if you apply to a financial services firm it's not going to help you if you apply to a position with an outfit that does work with genome sequencing of any of the -omics things like that where your education is going to be relevant to the topics it can be a huge help i have degrees in applied math and statistics but ever since i enrolled in the machine learning course on coursera by andrew ng the data science machine learning route is calling my name i've always liked the idea of building models to make predictions but the ml course i'm taking puts it into a much larger perspective which i've really come to appreciate what else should i look into? i've learned a decent chunk of python in my spare time and plan on trying to pick up some sql soon and touch up my sas skills yeah i know probably could've googled this but i find redditors tend to be a far better source of info haha edit: way to come through guys! phd physics did the andrew ng course did some kaggle competitions read islr (read it!) solved a bunch of problems in r and posted them on my github (along with kaggle stuff) to show command of the language and problem-solving skills dabbled in enough sql to get through any interview questions then i focused on making my resume look spectacularly suited for data science half of the interviews i got cold by applying or through a recruiter the other half (and where i landed a job) i got through my network your network is the #1 thing to get you into the field in my opinion farm that for whatever it's worth i know a couple of people who have the title of data scientist with just a bachelor's degree because they had a great foot in the door through their network though they started as an analyst how did you build a network? 
i am also doing a phd in physics and trying to get into data science but my phd topic is in a branch of theoretical physics that has nothing to do with data science so i haven't encountered anyone doing it at all good question building a network is really hard when so many of your peers are pursuing an academic track i think i was fortunate in that 4 of my colleagues graduated around the same time i did and had success getting placed in a job - no surprise but they also got in through their networks with the exception of one person who ended up getting a job for the person who ended up getting me a job i also met a person at a funeral of all places through a family friend and got an offer eventually that way so if you're still in your phd program you're in excellent shape since your school probably already has data science meetups find a mailing list and determine where those meetups are and make some friends casually mention your interest to your phd colleagues they may know some experimentalists - they're the ticket in fact go mingle with the experimentalists at graduate student events in the department (our department had gatherings once a month to drink beer and chat) ask everyone what they plan to do when they finish you'd be surprised at how many people want to ditch academia a great group among experimentalists are high energy people working on analyses for the lhc i am next to certain that some of them plan to defect to data science so use your time wisely i didn't leverage my network until i had already finished but i had many friends and i got lucky start now! do you have a good company that hired you? and how many interviews did you have to go through? 
i am in a similar situation and just failed interviews at both the bcg and mckinsey data analytics teams and wonder where else to start i feel that it's rather simple to find a company but difficult to get a good one without revealing too much i would say the company i'm with is quite good but it's ultimately a stepping stone the data science team is still small since it's more of a start-up environment with all the pros and cons that go along with that i went through 6 interviews including one at microsoft which i didn't get at the time that team was ultimately looking more for a data engineer software engineer than a scientist and i didn't have the cs chops that they needed even though i crushed the tech screen which was math probability theory heavy they said in 6 months to a year the team will be more mature and that i should re-apply but i'm not sure if that's genuine or just what they say to everyone i know someone on the team so i can read the pulse when i come to that bridge down the line another interview was one i didn't really take seriously since it was really just practice for microsoft i didn't get an offer from them and i'm still not sure why entirely they were very happy with my data analysis sample they requested so maybe they sensed that i wasn't really that into what they were working on in all cases i made it past the tech screen but of the 6 interviews i went on only two wound up with offers and i consider that fortunate both were jobs i lined up through my network (2 out of 3) despite the lateness of my reply i've actually put this information to use already i'm 3 chapters into islr (and this is after a week of reading it) it's super informative and as always it doesn't hurt to recap over some things i'm going to network more finish islr probably start reading esli once i'm done with islr do some kaggle stuff later in the month here i'm happy the information helped looks at previous answers i'm going to be an outlier here i just kind of 
stumbled into it no degree or anything like that (at all) i was a single dad out of the military so college really wasn't an option and it'd be kind of silly to go now (unless i want access to their labs to get behind paywalls or something) instead: i'm generally fearless with computers and new things i make mistakes and i make them quickly but i'd learn from them (free xp!) and increase my skills while racing stumbling ahead of everyone into little-explored terrain i'm curious about how things work and ask a lot of questions on pretty much any topic i have poor memorization skills so i'm forced to get down to fundamental building blocks and create models and 'objects' in my head my google-fu is strong i've developed an aggressive 'get useful things done' attitude with all of that i bounced around in tech-related jobs that required a lot of business interaction i eventually picked up a healthcare related job back in 2006 or so and shifted quickly to analytic work because there was so much to discover and i got to do things that really helped people which was a little addicting from there it was just me charging into the soft spots and there's a lot of opportunity for data analytics and visualization in general i'm always discovering new things and get to be surprised by a lot of them which is kind of fun as for the daily workload it's generally a combination of: swimming in and cleaning messy data using whatever machine learning tool i get to work with (rapid miner sas spss) to add insights to my data running around interrogating people about how the business works making data pretty and interactive (visualizations primarily tableau) great answer happy to hear there is a diversity of experience! that's a very solid answer! thanks for your input! anytime! i swear i'm not the only one :) we un-degreed types are a rarity but we can be difference makers too (when you've got five engineers stuck on a problem ask a nurse right?) 
finished masters in mathematical physics saw this data science thing was really picking up in 2014 i had done a lot of research in physics with real data so i understood the methodologies with some self learning i landed a job with 0 experience as a junior data scientist at a very well known financial company i didn't like the environment there so started consulting for a bit i am running a large global team which aims to create proper data applications not just small projects in r or python some important stuff: my research background was the best training for this get at least a masters that is the bare minimum if you want actual mobility phd is not necessary keep learning i have a lynda com premium account develop your soft skills like documenting your work speaking to non-technical audience etc real data scientists are few and far between i see a lot of people who work on some basic stats in r thinking they are data scientists they are data analysts company you interview with interview them in return most organizations have no freaking idea what they are doing get away from that windows and learn to use bash properly in linux unix you need to be sharp with your computer skills most important: understand the damn business data science is not a real science it is a business process of extracting actionable information from data a data scientist at a company works on image recognition systems for product improvement a data scientist working on image recognition at a university is just a computer scientist doing research in computer vision that is the difference and the paycheck too they are different as well stuff i do on daily basis: a whole bunch of teaching stuff to my team overseeing visualization of products in development how you visualize the information is far more important than the awesomeness of your algorithms going from ideas which pop up from within the company or client to a well-documented project plan at first then a poc then a final product 
generally this cycle is 1 month to 4 months a lot of code debugging with my team hands on stuff i do in sql r python scala bash etc edit: formatting edit2: one more thing: skills can be learned attitude cannot be if i interview someone who i can see works hard and has the willingness to learn i will hire that person over someone with all the skills in the world but a bad attitude what courses would you recommend on lynda com for someone who's new to machine learning data science? i also have a premium account but not sure where to start for these topics i would say learning technical things like their bash for unix users are quite nice i find their content to be less academic but more business oriented i have looked at the coursera classes too i find some of them to be far too shallow to be worth my time i prefer to learn my mathematical theory from the academic papers not the easiest way but it is the most thorough for training people in my team i usually have them go through python and bash on lynda first and then r from coursera i generally add my own teaching style to those courses as well what if you know python bash r linux fortran statistics etc from an academic computational science background but are looking for a data science job? how do you go about telling others what you know in a manner that is digestible and usable to the broader audience? that is a very good skill set the most important thing i look for in candidates when i interview them is their ability to tell a cohesive story i generally ask questions about mindset of a candidate one particular question i was asked at my first data science interview (which is also the job i landed after 6 more interviews with the same company) "what was a difficult problem you had to solve what was difficult about it how did you solve it?" 
i see if a candidate can tell a story beyond "i did this equation" i like to see how you interpret things with that said this is much more to do with your personality i would advise that you figure out what kind of a person you are talking to don't go too deep into the weeds with business people don't touch egos of technical people i'd say that advice goes far beyond the interview the general concept of understanding your audience and talking to that level is key with communicating i've also had quite a bit of experience with telling a story about difficult concepts my podcast about bitcoin has been gaining ground (shameless plug www thebitcoinpodcast com) thanks for the advice it makes me feel more confident about my skillset and that my personality will only help me i would say personality is far more important the baseline skill set we all need that is a given you have more than the baseline it seems there are many people who can do math but few have the personality which can make them into story tellers considering you're a data scientist or hire them would you happen to have any good reading material on the philosophy of data? i'm quite fascinated with what it is and its implications to how we interact with the world i'm a computational physics phd so my interaction with data has always been evidence of nature and the story it tells is that of natural law other data on the other hand is the story of what the data is and how it's created i don't know of much literature that addresses this edit: spelling from autocorrect unfortunately i don't have an answer for you i have been fascinated by the world in general ever since i was around 9 i started tutoring and teaching my peers and even people older than me when i was 12 over the years i have developed my own philosophy on science nature and many other things in the world in general ask yourself how does your eye work? 
Photons traveling at a certain speed and frequency register in a photosensitive biomaterial and get converted into another type of signal, this time electric, which travels into the brain and invokes the concepts of "red," "truck," "road," etc. Think of your mind solving countless classification problems all at once. There was ETL involved here at some level: a type of data (characteristics of a photon) was converted to another type, which led to a result from some type of algorithm. Thinking about the entire world itself in n-dimensional data manifolds makes life easier and harder in some ways. Just don't think about that while talking to girls :P Or do it, I don't know. I play this game sometimes when I am out where I try to think about dating as an analytical art.

That is how I think. I have never posted this online anywhere. Not sure if this is crazy or not! I tend to think of the decision process as a multivariate optimization problem, and making a decision is knowing the boundary and convergence criterion at both a global and local level. It's incredibly fascinating (if we view things this way) the level of complexity our minds are capable of without us even objectively knowing it. In terms of the data side, I see it generally as an expression of matter, purely a broadcast of communication. Our attempt to find relation and meaning behind this broadcast is an attempt to understand the intended expression. If we take your light example (I do spectroscopy), the light we perceive is the transition of matter from a higher energy level to a lower one, either through nuclear or electronic movement. The matter is "exhaling," and we perceive that as fucking color (within that limited range, because humans). Since all matter is different in composition, to the given resolution of our perception it has its unique fingerprint of "exhalation." This idea can be applied to almost all forms of data, which is fun to do and extrapolate ad absurdum. TL;DR: you are not alone in this.

Degrees in CS and
biotech, with a PhD. Became an independent freelancer after a few postdocs because my network wanted to pay me $$$$$ outside of academia. A decent academic background and lots of networking were crucial to get to where I am now. I have a strong scientific background but also trained very hard to acquire business insights. No use bringing in data expertise if you are clueless about the business side of things. Also important is that you have to deliver a successful tool or product at the end of the day. No time for useless experimentation in most companies that want to invest in data. Your time will be limited if your insights are limited. If your technical or business skills are half baked, you're not going to like the job. I usually work on bootstrapping machine learning and data science projects in companies that want to create unique selling propositions for their investors and clients. Currently working as Big Brother at festivals (realtime tracking of behavior, business intelligence, and improving festival KPIs with data, wanting to improve the experience for the crowd) and pushing to keep privacy safe in the event world.

"No use bringing in data expertise if you are clueless about the business side of things." So much this. And as you start climbing up the chain, office politics starts to shift the skill dynamic. Charm, likability, and delivery become disproportionately important, because otherwise you can't get or keep anybody's attention higher up the chain.

Your last paragraph, now that sounds interesting. As for understanding the business component, I'll need to be sure to do that. That's something that I feel can be easily overlooked. As far as networking is concerned, I am definitely improving on that front as well.

I decided that was what I wanted. I got a PhD in experimental psychology and had a base in stats (I use that word very loosely - experimental psychology does not do much with stats). I got a job at a startup. I learned R. I did mostly data munging and data vis. I learned SQL
while there. I worked there for two years while I finished my degree. After I graduated I worked there for another 4 months. They were paying $33,000 CDN with no shares and nothing. There was no real 'big data', no infrastructure to build my skills, no need for ML, no compensation, so I left. I got a job as an analyst at another company to do their surveys, data gophering (i.e. SQL), and some research. I started using ML because I wanted to. They liked that and sort of let me choose what I wanted to do. They hired two other people to handle the grunt work that was originally meant for me, and then data science. That process was fairly quick - it took a few months. Their research manager has been battling for a data-driven culture for years and recognized what I was after. They hired me. I should note that this whole time I was buying books on calc, algebra, stats, and ML (Introduction to Statistical Learning, Elements of Statistical Learning). I was taking Coursera courses (ML with Andrew Ng, NLP at Stanford, Python for Everybody, R Programming, Introduction to Statistical Learning, now Functional Programming with Scala). It was a longer road for me because of my educational background. I don't regret psychology (I am very good at research, I know a lot about how people work, what people actually are), but I wish I had known I was going to do this earlier on. I'd have taken more statistics, some compsci.

SAS is passé. Learn R instead. Cheekiness aside, social skills are paramount in getting/keeping a job regardless of your education (or lack thereof).

Check, check, and check. Most of the data analysis and modeling courses I did for my master's used R. I know a handful of SAS but won't put more effort into learning more until I brush up on R and learn more Python.

Studied meteorology, got a PhD, applied as a software developer, got hired a month ago as a data scientist by a smaller company that does quite important stuff for huge international companies. They said it's fine if I have to learn some stuff as long as I
have fun working with data, and I totally do. Coming from meteorology, I already have a fairly good grasp of handling data, and they give me time and freedom. My company is only starting to get into big data analysis and I'm the first and only person they hired for this job, so right now I'm mostly learning various stuff, which is mostly just reading books and tutorials on the internet. Also, I'm doing basic analysis on the data using Python pandas on my local machine (actually it's a virtual one) while we're working on setting up a database on a cluster. Right now the only thing I can say is that my PhD really did help, and knowing Python as well. BTW, SQL syntax is easy to get into. I haven't looked into machine learning yet. It will probably come at some point, but there is no real pressure. I have the freedom to proceed at my pace and I'm having fun with it.

Yeah, I wish I had taken a week or so during one of my winter breaks to learn SQL. It's super easy syntax! I'm doing it now though. I figured having some Python experience would be helpful in landing a job (which, yours sounds pretty nice!). I'm going to take the Coursera data science specialization (w/ Python) here soon. Thanks for the insight!

Another PhD in physics here. I jumped ship recently and became a data scientist with 0 experience and 0 self-learning. My company was willing to let me learn as I work. There is a pretty good course going on at edX that teaches machine learning with Spark. I am taking it now. This is the first course in the series.

That's pretty nice!
Maybe I can find a position like that. Only time will tell. I'll need to put a pin in that course for later. I kind of already went course crazy with the MOOCs, haha.

Getting a job depends on two things: 1) increasing your odds of getting a random job by improving your resume & skills, and 2) increasing the number of jobs you apply to. People seem to forget about and underestimate #2. Once you have a "good enough" resume, it's a lot easier and more beneficial to apply to more jobs than it is to keep trying to improve your resume.

I am slowly finding that out. I pumped out some 30 job applications at once earlier this summer and have been contacted by 4 or 5 of them for phone screens. Good to know I'm on the right track!

[deleted]

Could you tell us about how you transitioned from boot camps to hired? I've read about Galvanize and have considered it. Do they have job fair kind of opportunities, or did you get your position through other avenues? Edit: and is it really worth 16k?

There is huge demand for the field right now, but companies are by no means handing out jobs. They will keep a position open for months and interview lots of candidates if necessary. The problem they are running into is that virtually every experienced data scientist out there is already employed. It was worth the 16k for me. Out of my cohort of 12, I'm aware of 2 classmates who have not been able to find a job and 3 I am not sure about. Our cohort ended at the beginning of May. You need to be prepared to pound the job market on your own and budget for a multi-month search. Galvanize will host a career day with a room full of employers and will host other events that are good networking opportunities, but at the end of the day you are on your own. Keep in mind they will coach you to ask for an average data scientist salary, but unless you already have experience no one will offer you that, and it will turn off certain employers. You will be given career advice by people in their early 30s who never hired anyone in their life.
Listen to what they say, but seek outside opinions as to how to structure your resume, etc.

Got my master's in electrical engineering and financial math. Now working for a unicorn company in the Bay as a data scientist. Some things I found useful:

Andrew Ng's course on Coursera (which you already did). Also read the lecture notes for his graduate course at Stanford (more detailed than the Coursera one).

Kaggle competitions and/or some side projects. You can download many interesting datasets from websites like the UCI ML Repository and do side projects with the data (and put them on your GitHub/CV).

I mainly use Python (not good at R). However, I find matplotlib very difficult to use. I am now learning Tableau for visualization.

If you have time, read the textbooks written by Kevin Murphy (Machine Learning: A Probabilistic Perspective) and/or Trevor Hastie et al. (The Elements of Statistical Learning), both of which will greatly strengthen your understanding of ML algorithms.

Don't learn SAS unless you want to work for pharmaceutical companies. The syntax of SAS is quite different from other languages. SAS is not so popular among tech companies.

Lmao, unicorn company.

Just finished that course! I'll need to check out his grad course notes. I could see those being useful down the road. I've been looking into doing something like that. Kaggle competitions, anyways. I'm sure I can find something creative to do. I am starting to use more Python. Actually going to do the data science specialization (w/ Python) via Coursera starting soon. I have a lot of R experience but need to brush up. I will definitely check those out! I downloaded the ISLI text already. That's true. I only know a handful of SAS anyways. Not going to invest time in it unless a kick-ass job comes along and wants me to have it in my toolbox. Thanks for the help!
One year into my role as a data scientist, fresh out of college (bachelor's in industrial engineering from NUS), in a tech firm in Singapore. Here are my two cents:

When starting: a decent portfolio goes a long way. Whether that consists of Kaggle competitions, some applied research work you did in school, or a capstone project, it always gives the interviewer evidence that you're capable of delivering on the job, especially if you list the tools that you've used. If you have a link to share containing some of the visualizations or the summarized reports, all the better, as it's something that recruiters, business people, and technical people can all appreciate. My college degree had very little to do with what I'm doing currently, and if not for the fact that I started doing MOOCs and priming my school projects to have a 'data science' component, I wouldn't have been able to land my current role. In fact, by virtue of my portfolio, I wasn't even asked to do any coding tests; my boss could infer the requisite skills that would produce the respective quality of work in my portfolio.

When you're in the job: diversify your learning. Whether it's a MOOC on Spark via an edX course, experimenting with the latest ImageNet winners' deep learning models, setting up your own Linux server, or attending meetups where people share their domain/technical expertise, there's always something to learn. Given that data science is an intersection of software engineering, math/stats, and business reasoning skills, hone your skills in each of those areas. I've seen people who were content to just attend meetups but never got around to implementing anything, and consequently weren't as valuable to the team. On the other hand, there are those who turn into code monkeys and seldom question the business value of the task given to them, or pause to appreciate the inner workings of XGBoost despite using it in multiple Kaggle competitions.

Random observations: I think machine learning developers are quite distinct from data
scientists. If you were to go down the machine learning track, it necessarily involves more theoretical grounding and engineering work, while the data science track involves more mining of insights. Separately, there are also jobs which look for 'data scientists' but are looking for little else than someone who can automate data cleaning in R. True story. Hope this helps, and all the best!

Well, I am glad to know I am on the right track! I recently finished that machine learning course and am probably going to take on a data science specialization in Python once the courses start up in September via Coursera. This was very helpful.

What are the best educational backgrounds and skill sets to have for data science/analytics/stats analysis in the public sector?

You could just be a traditional scientist. There are lots of jobs in the public sector for traditional scientists. Lots of national labs have data scientist positions open, with difficulty hiring and keeping people because industry salaries are so high in comparison.

This is true, though I will say that national labs over the long haul (including benefits) tend to offer just as good compensation for most people. It's just hard to get someone to commit 20-30 years of their life to one place.

People think of it that way, but it doesn't have to be that way. A national lab job can be a stepping stone to a faculty position, higher up the food chain at the national lab, or something else entirely. In fact, I've been to a number of faculty development talks and workshops and whatever, and they usually say that if you are good, tenure isn't important; once you acquire it, either through a faculty position or a permanent position at a national lab, you can carry it with you as you look for a new position.

Good point, but what do you mean by traditional scientists?
He means a natural or life scientist, probably. I'll say, though, that jobs at national labs are very competitive. Not quite as competitive as academia. Again, to be a "traditional scientist" you probably need a PhD.

People who value physical understanding of a phenomenon and not just statistical understanding. I don't know. "Data scientist" is such a murkily defined term that it could be anything. A traditional scientist is a pretty well defined term, it seems, and since oftentimes the qualifications are very similar, why not just go be a postdoc?

The public sector always wants statisticians, although be warned that you will make far more money in industry. The government likes PhDs in statistics.

That's a great point. Why would someone want to work for the public sector?

I've been asked that question too, by my professor during grad school. My response is that someone has to try, and why not me! The pay isn't bad but definitely not close to anything private has to offer if you're talented. Also, in the public sector you're behind in technology and the projects you get to work on, so you have to keep learning on your own to keep up. I mean, if you like it, you like it, but a lot of people come in without realizing things like low pay and bureaucracy. So as long as you know what you're doing, go for it!

Thanks! In which departments do they want statisticians? And is it just the federal gov that wants them?

Demand will not be as explosive. The public sector will almost be a step back in terms of adopting new technologies and ideas, though I believe that is a good thing since data science is smack in a hype cycle.

A PhD in stats, policy (with quant background), math, computer science, perhaps poli sci.

That's a lot of PhDs!!
1 will suffice, those are just kinda the options. DS in the public sector (which I'm basically in) tend to have PhDs because most of them are at FFRDCs in the United States.

"DS in the public sector (which I'm basically in) tend to have PhDs because most of them are at FFRDCs in the United States." So they don't want master's in statistics?

I'm just saying what I see. I think it's certainly possible with a bachelor's, but the same is true of any bigger company.

I work in the public sector (at the city/county level). We're still very young and new at this, so what we do is quite basic. There are some areas that are much more mature, like health-related work. There will be growth in general, and I can see that happening more with internal services as well (HR, purchasing, finance). We've been working to try to understand homelessness and build a model to predict wait times and what we can do to decrease wait times, as an example. I've also done one where we build profiles based on usage of applications for IT. We have a combination of PhDs and masters level with different types. The degrees in the group I'm in include statistics, economics, GIS, business, developer, and primary research.

Thanks for sharing. "We have a combination of PhDs and masters level with different types." So it's not true that the gov only wants PhDs in statistics instead of master's? In which subjects do you and most of your colleagues hold degrees?
We have a few PhDs on the health services side, but the majority do not. We have no PhDs on the team that I'm on. We're the corporate BI & analytics group. Most of us have masters level only, and some bachelor's. In terms of background, in this team there's one with geography, I'm tech and business, one statistician, one economist, one primary research focused, and the others have more general degrees, nothing specific. We have to compete against the private sector, and realistically the good PhDs, and just good candidates in general, we probably can't afford, or they would not even consider working here in the first place, so we have to have realistic expectations. Maybe at the federal/state level, but I work at the lower level, where we actually deal with the residents, and data science is fairly new to us. Edit: what areas/branches of the government are you interested in? This would help answer the question a bit.

Thanks for informing me. I'm interested primarily in working for the county/city. Does working for the state require relocation to the state's capital? I'm interested in all statistical analysis work, but mostly business intelligence. I'm also interested in security, but that's just computer science work instead of stats, huh?

I'm actually in Canada, so I'll speak to that perspective. There generally is province (state) wide work that is located everywhere, not just the capital, usually in major cities of the province (state). What flavour of statistics in general? If you want pure statistics, there are groups, usually at all levels, that only look at statistics like the census (demographic data, growth, etc.) and how it applies to them, gather stats, and do other purely stats work with minor analysis. Then there are groups that look at those statistics and apply the financial view to them (outlook of the city in the next 5 years to get a AAA rating, for example). Or you can do public-facing work based on the census data: what are the main trends? Where should we build a school?
What's the best place to put a paramedic station? So much more, and they're all just flavours of statistics (although some more than others).

Well, they're just biostatistics and bioinformatics in the FDA, just saying. As for high demand, I'm not sure. For me, I feel like government is slow on the uptake on things, especially technology. The HackLA civic meetup is state funded, but that's more programmers doing IT projects for free? I went there. No real demand; it's mostly programmers. I know the City of LA has teamed up with USC and LACC in the past for data science projects.

Hi Reddit data scientists, I'm considering a career change and would value your input. I'm in my early 30s with a tenure-track professorship in physics. For a variety of reasons (mobility, pay, and dislike of teaching undergraduate classes among them) I'm considering making a move to data science. My research, analysis, and statistics skills are strong. I've used Python, R, and SQL in my research (I'm comfortable getting things done in these languages but am not an expert developer). I think where I fall short is that (1) I lack experience in software development, particularly in a team environment (I have written plenty of software for my own research and have published some of my code for general use, but have not worked on collaborative codes), (2) I have not done much web development, and (3) my breadth of experience with database languages and tools other than basic SQL is non-existent. What steps should I take to get my foot in the door? How hireable am I at the moment without a laundry list of industry-specific skills, and what are the minimum steps I need to take to become hireable? I was considering creating a data blog where I work on small projects that interest me, with the code available on GitHub or similar. I could use this as a way to build new skills and showcase my work to prospective employers. Thoughts? Or is there a better use of my time?
If you do think this is a good idea, what types of projects/analyses will be most impressive to prospective employers? Thanks!

Do you have a sabbatical coming up where you could do the Insight fellowship? That, I think, is the best way. That being said, you might be able to apply to a place like Google; they tend to hire people like you because you are smart and can learn on the job. How hirable you are depends on a lot. Finance-type places will like you. Places that always hire smart data people will like you because they don't necessarily have an immediate need. It would be worth trying to contact some recruiters/headhunters to get their opinion. Also, it would be worth looking at places like national labs or federally funded research places to see if you would be a good fit there (either in physics or data science). IMO you will be very competitive because getting a TT position is very hard.

Thanks for the pep talk :) Unfortunately I'm only a couple of years in, so I don't have a sabbatical on the horizon. The university I work at is a teaching-focussed liberal arts college, not an R1 institution (still competitive to get a TT position, but less impressive overall). I guess there's no harm in starting to send out resumes/feelers. What kind of risk is there of a prospective employer contacting my current employer without asking me first? That would put me in a very uncomfortable position!
+1 for Insight. I wouldn't worry too much about having a prospective employer contact your current employer. You don't need to provide references until someone asks. In Silicon Valley I don't think it's very common to check references at all (mileage may differ elsewhere). Big companies won't give references, so they mostly don't ask for them. There's also literature that suggests references from a current or most recent place of employment are worthless as an employment signal (too much incentive for current employers to game). Also, there are probably people you know from grad school/postdoc who are now in data science or other industries and can give you specific advice based on their personal knowledge of your experience. Some of these people can help you get your foot in the door. You don't need to do projects to showcase your skills to prospective employers. Your research serves that role. I doubt you will have any problem getting interviews. Your best use of time is to figure out what kind of companies you want to work for, figure out what questions you need to answer to pass the technical interview, and study/practice those things.

My best advice is to pick an industry that you want to be a data scientist in and learn what the data science applications are for that field. If you feel like you need to, learn some more stuff (maybe Spark, the Jupyter ecosystem, deep learning) and build a project or two. I know you probably don't have much spare time (I was a physics major at a small liberal arts school). For me (and perhaps for you), the biggest question employers had was about taking business data, building a scientific experiment around it, and then using that experiment to drive action that gave them some monetary value. If you could build a project around that, I think people will be very willing to forgive that you don't have the "industry" software engineering experience. Learning how to use Git, Docker, and the Jupyter ecosystem is not that hard. It just might take you some time to do these
things because you of course have a day job that might leak into the night. I don't know if they would contact your current employer (probably only if they were going to hire you). However, I'll leave someone with a bit more experience to answer that.

I think places with large coding/data science teams are probably the best fit for you (Amazon, Google, Facebook). They struggle to get people with a strong understanding of the scientific method (and in some cases mathematics) but have no shortage of people who can code. They are also well suited/calibrated to handle people with your background. The other route you might go is the FFRDC route, if you don't mind working in defense. They also value people with your background.

Great advice. Now I just need to figure out where to start in terms of industry - I guess I have to think about it and do some research. I'd definitely be happier in a job that has a significant research component rather than an emphasis on software development, but beyond that I'm passionate about solving a variety of interesting problems (and not being confined to the same problem for years and years, as often happens in academia).

Another thought - do you facilitate any undergraduate research? I wonder if you can get any of your students to use some machine learning in your projects. Then you could point to a project that you "led" that also involves data science. My fondest memories are of being a research assistant when I was at my SLAC!

I do have research students and work on my own research as well. I could probably cook up a student project, but it is a significant time investment to train an undergraduate!
Just depends on whether I spend my time digging in here or making an escape.

Several SLACs have started data science programs in the recent past. Perhaps advocating for developing such a program is a good way to teach yourself skills and still maintain an excuse to do something and contribute to service expectations. There tends to be a flow from more research-focused institutions to industry. Establishing yourself as an expert in a particular domain might be a good strategy.

This is a great thought, and if I were able to get some teaching relief to develop new curriculum it could be an option.

Perhaps this might serve as some inspiration. I majored in physics at a small liberal arts college, got my Ph.D. in physics, and then transitioned to a role as a data scientist shortly after my defense. I definitely felt (and still feel) the same way that you do about where I fall short, as I had pretty much exactly the same experience with software development. I felt like I wasn't hireable in my current state either, and some places passed because I didn't have knowledge about their particular arena (NLP, Hadoop, etc.). However, I was able to find a position. In order to succeed in that position, I've focused on what the company needs and what's the simplest way to improve what they're currently doing. Most business problems don't need Google's AI team to solve. If you are capable enough to get where you have in your career, you can solve the types of problems you'll most commonly encounter. Moreover, being able to communicate those results to people without any technical knowledge or inclination is more important to success in most cases. Finally, remember that data science is a lot like physics. You will never know everything in your specialty and likely won't know a ton about things outside of it. It doesn't mean you're a bad physicist. As long as your work is methodologically sound and answers a question that someone is interested in, you're doing it right.

Thanks for sharing your experience. How long
were you on the market before finding a job, and what was the transition like? What type of company did you end up with? (I would probably love to work on Google's AI, but good to know there are more attainable jobs!)

It was almost exactly one month from when I sent out my first resume to my first offer. I ended up having two offers and started work about 7 weeks after sending out the first resume. In the time since I started, I also received interview requests from other places that took longer to go through resumes. Given that you had a more successful academic career, I'd imagine it would be even easier for you to find a decent position.

You would make a very convincing candidate for many positions. Your lack of pure development skills can be an issue for some positions which are more about dev/deployment, but there are many positions which are more about solving a business issue analytically, and with your background any decent employer won't doubt you should be able to do this. Just make sure you know what the expectations are for the position (some are more about finding a code monkey). At the company I work at (in Europe) we have hired a couple of people in a similar situation (e.g. a university professor in mathematics) and it went rather smoothly. In terms of tools, you already know the basics. A smart person like you won't have trouble learning a new environment such as Hadoop/AWS. It is quicker to learn this than advanced statistics/the scientific method. The main doubt some companies might have would be how well you would adapt to a business environment. The main differences are: having to work to strict deadlines, understanding what business value an analysis will bring, and focusing on what will be actionable next rather than doing something for the beauty of it. Try to prepare some examples of things you've done (not necessarily in a professional context) that show you have the right mindset. Having some side projects to show is a good idea. It is something I did personally, as I thought recruiters would
better understand my skillset from analyses of real-world data than from a cryptic paper on genetic interactions. The projects don't really matter; just show that you can identify a good question people can relate to and answer it using data. As suggested by other members, the best way to know if you are a convincing candidate is either to send a couple of CVs and see the response you get, or to contact recruiters through LinkedIn, share your CV, and see if they think you would be a good candidate for some positions they are working on (they'll make money if they place you, so they won't hesitate to help someone with a strong CV). I also switched from academia to industry (after 2+ years of postdoc) and it was easier than expected (it took me about a month to get an offer). If you quickly get a first offer, that means you can wait a bit and see if better offers come in if you're not convinced at first. The main challenges are tailoring your CV for the job (focus on soft and hard skills, ditch your paper list) and preparing answers to common questions such as: "Why do you want to leave academia?" "How is your experience relevant?" "How well will you adapt to a new context?" If you need some help on those or regarding the CV, I am happy to give a hand.

Thanks for the suggestions and for sharing your experience. Do you think it is better for me to submit my CV or resume?
i have both ready to go but the cv has pages of info about grants and publications and overall a different focus than my resume (this is significantly shorter and focuses on the key analyses algorithms and data sources i've used) from the descriptions you gave it seems your resume would do a better job recruiters won't care about your publications and grants obtained unless you are applying to positions very close to what you worked on before in my cv i just added an "achievement" section which contained something like "published x papers in international peer reviewed journals accumulating y citations" and "got the prestigious x grant (y% success rate)" i have the same situation at the moment i am physicist doing my phd for some reasons which are partially the same as yours i decided to move to data science my first step was just to play around with problems on kaggle this can teach you some basics then i had few courses on sql (codecademy or any other interactive sources would be enough for the entry-level) and then i was quite lucky to get the internship at the company after practicing there for around 4 months i got a job offer from another big company i believe your background speaks for yourself learning sql is not so hard especially if you will apply for the junior position just keep going try to find some projects internship and you will succeed i am pretty sure it is not so important to have some particular technical skills knowing languages but rather to be able for analytical thinking and finding smart solutions this job is not very different from the academic research in general sense it requires mostly being smart and working hard the rest you will learn automatically being involved in real projects the most part of employers understand that thanks for the reply and encouragement the long list of specific requirements on job advertisements can be really intimidating i'm actually looking forward to taking some online courses - compared to most of my 
late nights prepping lectures and grading coding academy sounds like a ton of fun i was thinking i might brush up on my sql and then work on machine learning i posted this in cscq but i figured i'd also post here even if this sub would be more biased i'm trying to decide which would be better for me between a masters in computer science vs data science i'm currently enrolled in an ms in computer science and my school is opening a new program for data science i do not have a bachelors in computer science which means i had to take some prerequisite courses to get into the masters program i come from a strong statistical background (actuarial science) and really want to combine my math and statistic skills with programming in my shoes would you switch? would an ms in data science pigeon hole me too much in the job market? would a degree in data science increase my value at all over a degree in computer science? thanks in advance for your answers in my shoes would you switch? would an ms in data science pigeon hole me too much in the job market? would a degree in data science increase my value at all over a degree in computer science? short answer i generally hear (and have) is no degrees in data science today can be seen as all hype no substance computer science is a solid field that is well established learn the principles come with the statistics and you'll show you're more able to take on industry challenges dang i'm honestly surprised to hear this answer in this sub thank you for your honesty! 
i have a master's in cs and work as a data scientist i had a lot to learn about stats but i was very prepared for the algorithms and software engineering part of the job take a look at the two standard jobs for those degrees (software engineer and data scientist respectively) and see which appeals to you more honestly it sounds like data science appeals more to me but from the other responses i'm getting it sounds like cs would be the "safer" way to go this question is posted every week please do a search probably depends on the coursework curriculum don't get caught up in labels too much look at the curriculum for each degree and figure out which aligns more with your career or education goals i could probably take the majority of the data science related courses in the cs program and the data science program is still working if you are motivated and curious enough i'd say this doesn't matter i come from a master in ds my friend from a master in cs we end up doing the same things in different companies (even though i'm a little better when it comes to maths and he is when it comes to programming skills but not a big difference) thank you for your answer! i'm not sure i would bet on the quality of the new curriculum have you been able to review what the course offerings will be? so far not very good because they're still figuring out how to teach it i see an inconsistent number of answers to this question all over the internet some say you need to be an expert in calculus and linear algebra while others say all the complex algorithms are already done for you in sklearn so you don't need that level of expertise with sklearn i know you need some math and statistics skills in order to understand what the algorithm is telling you but do you really need an intense understanding of calculus and linear algebra to do data science? 
some argue it is needed while others say it's a scare tactic to keep people out of the field and keep demand high it almost feels like saying "hey you need to know c++ in order to write sql because sql is written and based on c++" which is obviously not true because you don't need c++ to write sql code i just want to clarify i know you need math (pretty solid background in stats but not a phd level) to do data science i'm not disputing that but my question is do you need to know in depth linear algebra calculus etc in order to do data science thanks it really doesn't hurt if you have a strong level of math you can venture out past pre-packaged algorithms and explore a problem by inventing your own methods i think you'd have a better appreciation for an algorithm's assumptions where they fail how you can modify the assumptions by changing the underlying math the limitations of your new assumptions and so on if i just told you that this supervised algorithm x is assuming some features have some independent gamma distributions and you find out that assumption is not valid because there is some frechet behavior with dependencies but you still want to use the idea behind algorithm x well how can you go about this problem better if you didn't have the prior knowledge of understanding the important but subtle difference a gamma vs frechet assumption could have? if another algorithm y requires a finite variance parameter but you have an infinite variance with finite mad how can you reformulate this? another example in what situations do robust statistics break down i e when does using the median instead of the mean have the effect of increasing your tail risk? 
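The "infinite variance with finite MAD" case can be made concrete with a small simulation. The sketch below assumes a Student-t distribution with 1.5 degrees of freedom as a stand-in for such a fat-tailed variable: its mean and mean absolute deviation exist, but its variance does not. The distribution, the seed, the sample sizes and the helper name `mean_abs_deviation` are all illustrative assumptions, not something taken from the thread.

```python
import numpy as np


def mean_abs_deviation(x):
    """Mean absolute deviation around the sample mean."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.abs(x - np.mean(x))))


# Assumption: Student-t with df=1.5 (finite mean, infinite variance).
rng = np.random.default_rng(0)
rows = []
for n in (10**3, 10**4, 10**5, 10**6):
    sample = rng.standard_t(df=1.5, size=n)
    # The sample standard deviation estimates a quantity that does not exist
    # here, so it has nothing to converge to; the MAD does have a finite target.
    rows.append((n, float(np.std(sample)), mean_abs_deviation(sample)))

for n, sd, mad in rows:
    print(f"n={n:>8}  sample std={sd:10.3f}  sample MAD={mad:8.3f}")
```

In runs like this the sample standard deviation tends to jump around and grow with the sample size, while the MAD stays in a narrow range, which is one reason a method that needs a variance parameter may have to be reformulated around a different scale measure.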
if you are interested in data science for anything regarding fat tails extreme values sub-exponentials failing cramer conditions insurance finance hedging ruin bankruptcy shock terrorism crime environmental events random processes then you are probably going to do more harm to the world than good without a really solid background in math you will get inconsistent answers because the job field is inconsistent right now i only have a bachelor's degree in a math related field that was extremely heavy on doing proofs for statistical theory and i struggled with the very difficult classes on the subject i never thought that i would have to utilize those skills ever again after graduation as a data scientist i do find that when i'm having to implement newer libraries or methodologies sometimes the only way to understand what's actually going on or how a methodology works (i e t-sne deep learners recommender systems lightgbm) is working through the equations and dusting off my old stat theory books overall: if you get hired without a math background or without an understanding of advanced mathematics you could probably be a decent data scientist that performs basic stuff (i e classification with a decision tree or logistic regression); however if you want to be able to progress in your career and develop methodologies and engineer new ways to solve problems i do think that having a strong background in mathematics is a great tool edit: useful classes: calc i-iii statistical theory linear algebra time series & other upper level stats classes even with decision tree and logistic regression you do need to understand what’s going on which means you need to understand the math behind those algorithms let me put this in terms that matter to an aspiring data scientist you don't need calculus and linear algebra to get hired with the title "data scientist " that said you cannot understand distributions and statistical models without calculus and linear algebra you can definitely understand
distributions without calculus and linear algebra and there is very little calculus in most statistical models data science is more about learning and picking up things fast and accurately than any particular university major with rigorous math you pick up intuitions and techniques very fast and accurately and generally could have an easier learning curve than most that being said data science is math however you can be a data scientist without formal training in math statistics or computer science i just think that it would be tougher for those data scientists to pick up the new algorithmic techniques and tools when the field changes hence why most data science jobs require masters in one of those fields above given the choice i will always be preferential to working with people who know the maths it is possible to be a functional data scientist without being a mathematical wizard but my experience is that without a certain level of mathematical literacy you just struggle to be an effective practitioner (this is not just a problem with machine learning but just thinking about stuff mathematically) my suggestion is that you should ideally have done a course in linear algebra and a course or two in calculus you should also probably brush up on it every few years you can get a job without it of course - i work with people who don't have that background but they struggle with a lot of concepts and are nowhere near as effective as those with strong mathematical backgrounds the more math the merrier interesting comments i appreciate it i understand the reasons why knowing all the math is important instead of just knowing and using an algorithm my background is mostly in programming with some math i thought about data science but maybe now it might be best to go back and learn some linear algebra and other types of math before diving in imo there are 2 types of data scientists those with a strong background in programming but not math then those with a strong background in math but
not programming (if you know both you're golden!) i'm a junior data scientist right now that came from a programming background i can build models at ease with all of the available machine learning packages however i don't really know the math behind it but at a high level i know how certain parts are working in my daily work i don't usually do too much math (i don't do tons of eda work mostly just building models) but now that i can do the modeling i've been trying to learn the math behind it because i know i'll need it to grow from here i think it all really depends what exactly you're doing as a data scientist and the company you work at i was lucky to get in a marketing company because there's not a lot of complex math i need to do for marketing data if i was working at google or something i'm sure i would need to know way more math than i do now my advice to you is to just approach it from the programming side because it's easier to get your foot in the door then try to learn the math as you go one of the biggest things my mentor told me that has helped me a lot is to become an expert in one machine learning algorithm and go from there i chose gradient boosted trees so when i'm not creating models i'm usually reading up on gbms and trying to understand everything that's really going on i mean it's kind of a silly question of course the way it's asked the answer is none it's like asking how fast do you have to be to be a runner or how tall do you have to be to play center in basketball? 
you can obviously do 'data science' whatever that means to you without math and you can probably get a less visible data science job without math as well but without math (which is simply a logical system) you're as limited as a 5' dude trying to hold down the block or maybe a football player without shoes sports analogies aside if high school freshman sophomore level math is an effective 'scare tactic' should the people that are scared of that really be working in a quantitative field? how would that even keep demand high? calculus is offered in high school and i took linear algebra as a freshman and so do 10s of thousands of students every year hi i am an applied math graduate student mostly looking at careers with cs i have cs background from my undergrad and am always trying to learn new things i just took a data science course on udemy and am currently taking a machine learning course on coursera i often play around with kaggle data sets i am only taking one class this semester- advanced probability which led to the discovery of markov chains and how they're used in predictive modelling google's pagerank etc i have to do a master's project with substantial math in it as a final presentation for my last degree requirement i was going to do a data compression related presentation but i have become suddenly really interested in markov chains and in a career as a data scientist ml engineer how can i use my newly acquired knowledge about markov chains logistic and linear regression neural networks and all these things i have learned to make a kick-ass masters project that also serves as an impressive talking point during job interviews? 
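One concrete way Markov chains can feed a predictive model, sketched here under stated assumptions: fit a separate first-order transition matrix for each class of discrete state sequences, then score a new sequence by the difference of its log-likelihoods under the two matrices and use that number as a feature. The state encoding, the add-one smoothing, the toy sequences and every function name below are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_transition_matrix(sequences, n_states, alpha=1.0):
    """Estimate a first-order transition matrix with add-alpha smoothing."""
    counts = np.full((n_states, n_states), alpha, dtype=float)
    for seq in sequences:
        for current, nxt in zip(seq[:-1], seq[1:]):
            counts[current, nxt] += 1.0
    # Normalise each row so it is a probability distribution over next states.
    return counts / counts.sum(axis=1, keepdims=True)


def sequence_log_likelihood(seq, transition):
    """Sum of log transition probabilities (the initial state is ignored)."""
    return float(sum(np.log(transition[a, b]) for a, b in zip(seq[:-1], seq[1:])))


def llr_feature(seq, p_pos, p_neg):
    """Log-likelihood ratio: positive values favour the positive-class chain."""
    return sequence_log_likelihood(seq, p_pos) - sequence_log_likelihood(seq, p_neg)


# Hypothetical states: 0 = browse, 1 = add to cart, 2 = purchase.
returning = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 2, 1, 2]]
churned = [[0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0]]

p_pos = fit_transition_matrix(returning, n_states=3)
p_neg = fit_transition_matrix(churned, n_states=3)

buyer_like = llr_feature([0, 1, 2, 1, 2], p_pos, p_neg)
browser_like = llr_feature([0, 0, 0, 0], p_pos, p_neg)
print(buyer_like, browser_like)
```

The resulting ratio is a single numeric column that can sit next to other features in a logistic regression or a tree model. Smoothing matters here because an unseen transition would otherwise give a log of zero.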
i know i could do a lot and i have all the tools and skills but i need to lock down a project-i don't know what though i feel i have this information overload and am having a hard time putting it all together to showcase my skills and hard work that i put in outside of the classroom in my free time i have used them in a feature for a classification model i had discrete time sequenced data and wanted to predict a binary outcome after observing the sequence for example given the shopping history of a customer are they likely to return in the next 30 days i estimated two transition matrices with data of the respective classes and then used them to compute the log likelihood of a given sequence being generated by each then computed a ratio as my feature that sounds really interesting! is it possible for you to share your research results presentation with me? unfortunately i can’t due to it being proprietary information sorry! i would love to see that too ! i've always wanted to explore the idea of using markov chains for clustering if you can build a similarity graph on your data you can use markov probabilities to identify interconnected clusters and perhaps even learn things about the clusters (e g degree of connectivity to other clusters) if there's a time dimension involved i wonder if you could model entities transitioning between clusters with mc clusters in this case can be markov blankets however finding the right markov blankets and probability of transitioning from one to another is another topic we can then compare this new approach with mcmc based sampling using markov blankets so i've been noticing that a ton of the data science jobs out there usually require a masters but preferably ph d for the full-fledged data scientist title because it's a very hot field naturally various boot camps and moocs have popped up all offering courses in ds ml ai but you might ask yourself "will the courses teach me what i need or do i really need to spend 5 - 9 years at some
university?" well here are my thoughts (fwiw i have a masters in applied physics - machine learning): 1 even though many of the techniques are fairly old the data science field itself is pretty young because of this not many (educational) institutes have been offering any official or streamlined program towards ds hence why so many applicants come from slightly-related fields where the same techniques are applied because of the above many schools do not offer ds ml ai before graduate school - and even then you're just scratching the surface you learn the bare basics and won't really get to touch the cool or recent stuff before you start on a ph d this in effect creates a very large pool of applicants that hold a masters or ph d 2 the field is expanding rapidly and some techniques or research areas (deep learning for example) are quite recent because of the pace you really do want applicants that can a) fluently read research papers b) convert ideas from research papers into working code c) potentially own research because research papers are written by researchers for researchers they are also quite dense to read in short you need to know the language (at times advanced math and engineering) and you need to know it quite fluently if you're a ph d it's a good indicator that you're excellent at that scientists read research papers literature all day notes with that said ds ml are hot buzzwords and many companies deal with quite trivial problems many of these jobs can absolutely be done without problem just by doing short-term self-study and knowing a bit of this and a bit of that there are many "abc" study guides floating around here listing up all the fundamentals however if you want to work as a scientist researcher and want be on the forefront of things cutting edge i would advise you to seek a higher education whichever way you slice it you need rigorous experience the tl;dr not many universities offer stand-alone ds ml programs and many of the ds ml programs are 
part of a masters degree specialization and then a ph d program research group because of that many applicants and candidates hold a masters or ph d - so you get a kind of inflation in the degree level research in some areas move fast so you need to have some researcher skills to keep up - like reading research literature quite seamlessly academic (or other) research background will help you at this end notes i predict that in 10-15 years there will be more undergrad "specialized" jobs in ds or ml simply because more schools will be offering said undergrad programs i know that schools all over are doing everything they can to open programs like that because of the high demand from both industry and students we tend to restrict to phd primarily because the pool has become flooded with lower quality applicants in response to the high demand and salaries we may pay a premium for it and miss out on the occasional wiz kid who only has a bachelor's degree but it is the easiest way to get a decent pool of candidates who are likely to be able to perform well additionally obtaining a phd means that the person has a natural curiosity and has demonstrated the ability to do novel independent work those are qualities that are very hard to otherwise assess from a resume or even a phone live interview it makes sense to me that biotech would require a candidate to have a phd for data scientists i'd imagine that data science with respect to biotech is a bit more difficult (subject matter) than other fields in your experience what other fields besides biotech typically require a candidate have a phd? 
based on my experience pretty much any large research organization will lean towards phds yep a lot of linguistic data science jobs also seem to require a phd in my experience (source: my phd is in linguistics; am linguistic data scientist ) same situation where i work to put it in ds terms requiring a phd gives you a high false negative rate but a low false positive rate :) not necessarily the fairest system but makes sense for a lot of companies since their time and resources for finding and interviewing candidates is limited interesting i'm doing an msc in data science and hope to have it done by next september my undergrad was in psychology and as part of that i did a dissertation which was presented at a national conference i've submitted a more focused and concise paper to an international conference just this week additionally i'll be doing a dissertation with the masters and received a scholarship to study it also i'm 28 so i didn't just do this stuff from high school as someone who did a primarily research driven undergrad doing the msc would i fit the bill? i was offered a phd but thought i'd get more benefit out of the masters sooner plus the social sciences need people who understand more than classical test theory (still the most common methodology) i'd eventually like to go back and do it but i have some other stuff (start a family travel) that i wanna do before that would you consider someone with my background? 
sorry if i jumped on you a bit here i just don't often get the opportunity to speak with folks like yourself :) the way our hiring process works is this: hr does an initial screen of applicants against very basic requirements and then provides a stack of resumes to the team's manager that manager does an initial pass to create a somewhat smaller pile and then passes them around the team a bit based on who the manager team is interested in the manager arranges a phone screen with them those candidates who still seem good after the phone screen are brought in for a technical and behavioral interview with 2-3 team members unfortunately your background would basically get screened out at the first step one of those simple requirements to hr is that the candidate have a phd they may make an exception for master's + 10 years experience or something but probably not that said we do have a few people with only a master's on our teams generally they are either people who initially came to the company in a different role and made the transition or they are people who knew someone on the team (we can basically tell hr to ignore the screen) now don't get too discouraged because this is just specific to my company we are a large biotech company that does a massive amount of scientific research so we have thousands of phds that likely also plays a role in a desire for the phd pedigree i also figure that data science is not as saturated as standard it or software engineer yet meaning a decade from now we'll have bachelor degrees in data science and companies will have data science work tailored to their level (entry-level work some analysis) i understand thanks for your response and telling me a little more about it it's odd i'm doing well in the masters so far and i thought i'd have a lot more trouble given my background but it's going ok so far thankfully who knows what i'll end up doing! any advice for someone in my position? 
network as much as you can not only does it potentially generate leads when later looking for a job but in general it can be helpful for advice and just keeping yourself in the know about the field everyone could use a better network so that's solid advice thank you might pop back to this sub when i know a bit more just focusing on getting the basics right for now (stochastic modelling glms etc) there's time to get into harder stuff but my take on this area is that if someone doesn't know the fundamentals like the back of their hand they might struggle a lot later on thanks for taking the time to chat! i know generally master's is considered "standard " but many biotech and health companies i hear can be quite stubborn in terms of requirements - not only a phd but a biostatistics similar healthcare related phd can be a requirement as well for the rest of us there's lots of different types of firms though! data scientist is a senior title imagine if you asked to be senior consultant straight from bachelor's? 
ps lot of faux teams and buzzword oriented companies have started to call their business intelligence and data analysts people as data scientists which is not what this field was originally if that's your prediction for 10-15 years i predict that in 30-40 years any subject matter expert worth their salt will need to know the methodology data scientists put together and use today and the tools available will be much more developed and kept up to date domain knowledge is the last 10% of value add that is essential at the top and statisticians will go the way of network engineers building tools as platforms it has to do with training and experience on how to conduct research undergraduate degrees simply do not provide someone with the education or experience on how to structure research or even how to properly parse research to translate theory to practice virtually no one with an undergraduate degree actually knows how to properly parse research regardless of their field regardless of changes advances in "off-the-shelf" solutions that can be utilized by someone with only an undergraduate education the need for the whole array of skills that go into conducting and reviewing research won't go away and those skills still won't be available to someone who doesn't have a graduate degree in other words you don't need any particular background to learn how to code or to use an "off-the-shelf" ml ds tool however you do need that advanced education in order to learn about conducting and interpreting research because you just can't get that experience outside of that environment i agree there's no good way to get those skills now at the vast majority of companies but developing "junior" or "entry level" positions that focus on using tools to do grunt work while learning the research skills (i e non-academic grad school) would be another way this requires a pretty big shift in view (since there are people with the training already why take the hit at training in house) and 
money time investment so it wouldn't be an easy change the position i accepted is basically this i'll be graduating with only a bachelors in cs and a minor in math it looks like they'll put me through around a year of doing grunt work for real data scientists while taking classes half the day then at the end of that year if my superiors like my work i can apply to move into a real data science position myself edit: corrected some grammar mistakes yes which is also what i've observed my gripe with the "boot camp" culture is that they are designed to be an intense quick fix short-cut to the various industries for some jobs they work just fine but for other jobs like data science they're too superficial and short i'm gonna go ahead and say that the vast majority of people can't learn to conduct rigorous research in the span of 6-12 months it takes far longer than that with universities and ph d holders research training has been a very gradual thing baby steps during undergrad and industry-level research while finishing up the ph d not to mention that you touch (and go deep) on a huge variety of topics from your first year to your last year trying to cram 5 to 8-9 years of topics into 6 months is impossible i do however think that boot-camps and such are great for people with already advanced degrees and that have done some relevant work for example if you have a masters or ph d in biology a high-quality course in general data science topics will probably add more value than if someone with an undergrad (or no degree at all) took the same if your goal is to be an industry-level data scientist this is actually the same feel i have about many of these coding boot camps they are excellent for people that already can code but may have been out of the game or haven't followed the trends you already have the fundamentals down but need to either learn or freshen up on some topics this is somewhat my experience i was doing a phd in computational geophysics - which was
approximating solutions to physical equations instead of stochastic but because my mathematics programming skill was advanced i was able use a bootcamp to leverage myself into a good data science job however i will say that people without advanced degrees stem degrees who worked very hard in the 3 month bootcamp did get good data jobs and i think in a few years if they continue learning they will be a positive force in the industry they aren't senior data scientists but they are good people to work with who understand enough to know when they are doing something wrong and ask for help i personally have time to deal with people who know what they are having problems with and can get ahead of the problem before they mess up but its the opposite (mess up and not know it) that really costs $$$ and time hi i'm following a master's in data science and i have the opportunity to take one of the 2 following courses: big data the course will touch the following subjects: novel programming frameworks for big data processing: theory and practice algorithmic solutions with focus on large inputs mapreduce apache spark clustering primitives graph analysis primitives data stream processing probably we'll be using python rather than java scala human data analytics core subjects are: dimensionality reduction with pca clustering: k-means - som - gng modeling times series: hmm neural networks: feed forward convolutional the applications would be mainly in modeling ecg signals speech face recognition inertial signals i've always associated mapreduce and related skills more to a big data engineer position (which i'm not really interested in) rather than a data scientist one do you have any suggestion on which course would be the most beneficial? 
unluckily i won't be able to attend both of them so the goal would be to follow the one that would give me greater benefit when i'll be looking for a job thanks for your help edit: if this may help ideally i'd like to work in the finance real estate field but this hugely depends on how things will go after the master's spark is used by data scientists too where i work we use the databricks notebook platform which is built on spark by the guys who work on the core open source project additionally tools like spark are made for scenarios where you have "big data" more data than fits on one machine most places don't have that much data yet so focusing your skill set on that pushes your career toward institutions that have that niche of problems the second class is a bit more generalizable as you are learning core concepts you can use in many frameworks including spark (via sparkml) so based on your interests i'd say the latter if you get a job or internship in data science and they use spark you'll learn on the job that's how i learned it so take that anecdote for what it's worth thanks a lot i was also considering to attend the course which could teach me things that would be otherwise difficult to learn on the job so your feedback is indeed valuable thanks again the human data analytics course seems best but that's just my preference the first seems more like a de route while the second is much more ds focused also the second is harder to learn (i think) so you could just learn the concepts from the big data course on your own like others have said the first is more about engineering while the second are solutions to classify data (e g given housing info predict how much it will sell for) both are likely equally able to be learned on your own but as others have said the latter seems to be more difficult so you may benefit from teaching support in that case also the latter is likely much more interesting hello reddit i'm looking for some advice and there is no shortage
of that on the internet i hear i hate reading long posts so i'll try to keep mine short i'll just say who i am and what my situation is and what i want to do looking for advice on how to get there who i am: i have an ms in statistics i currently work as a data scientist at a medium-size consulting firm i'm very early in my career (i'm 26 about a year out of grad school) my work involves me coding in r pretty much all day (for some projects i am unfortunately forced to work in stata) i've hit a wall though i consider myself pretty much an expert in r (everyone defines "expert" differently what i mean is there isn't much functionality left in r that i have left to learn or care about ) my work also involves running my scripts in parallel on an aws server while my company does not use it i am also pretty familiar with python as a data science tool what i'm looking to do: i'm constantly trying to expand my knowledge and increase my skills to make me a more well-rounded "data scientist" (i love this highfalutin term but when i started studying statistics it didn't even exist yet) while my company does not need or use this skill very much (yet) i have my sights set on learning the intricacies of sql my company will also pay for any certification exams i want so i began gearing towards getting oracle sql certified (exam 1z0-071) i started looking into oracle because it seems to be the gold standard in dba however a coworker recently told me that microsoft also offers their own forms of sql certification i'm not entirely sure what the differences are in terms of exam content or certification prestige upon doing some cursory research it also looks like microsoft's t-sql has some sort of integration with r (my mother tongue!) which really appeals to me integration with r would also be something that would make my company more enthusiastic to let me pursue this skill so: what should i do? and where should i start? 
any information or advice would be greatly appreciated thanks in advance reddit! with rodbc you can use any version of sql you learn r does work with microsoft products now that they've adopted the language but you don't have to use t-sql for it to run sql is also very similar across the different versions so if you learn one type (pl sql or t sql) then you've essentially learned all of them it's never going to hurt your career to know sql if you can get into it then i think you should rodbc is really cool and useful - but not for what i'm looking at here rodbc is for taking databases and bringing them into r as a data frame from what i understand i don't see a point in using sql commands amidst r code since the tidyverse family already does a sufficient job of making data munging easy (at least for what i do) that being said - microsoft's t-sql looks to integrate r code within sql code i'm unsure of the utility of this but that's just because i haven't done any research yet (hence why i'm posting here) i'm not sure about the details but i see great benefits in the r code executing within the database server that way you don't need to read the data over the network first in order to do the processing i intuited the same thing my db guy tells me that the r on server execution by ms is too new and (for now) poorly implemented so we're sticking with rodbc for now microsoft offers a similar database management system called sql server it used to be the nemesis of the big dog oracle back in the early 90s sql server has made enormous leaps and has become a very popular and cost effective dbms you will see almost every company running a sql server instance or cluster for storing relational data today the recent sql server 2016 version has brought r integration natively to it! which means you can write your program in an r package and deploy it to sql server 2016 natively and use it in your sql queries as if it were another table or view neat! 
(more details here: https: docs microsoft com en-us sql advanced-analytics r getting-started-with-sql-server-r-services) sql server is much much more than that and i'd suggest that you absolutely take on a microsoft sql server certification there a few versions based on what type of work you'd like to do within sql server space since you like writing code i'd recommend to pick one that is more developer oriented than say a dba oriented just look up sql server certifications in google and use their guided wizard to find out one that best suits you based on your interests and needs oh and most of the concepts you'll learn will apply to oracle as well in case you'd like to add an oracle developer certification to your learnings as well good luck! if your targeted end customer is fortune 500 companies - get the oracle certifications if your targeted end customer is small to mid size enterprises get sql server certifications if your targeted end customer is tech-enabled start-ups then go for postgresql certifications and get familiar with mysql most of what you learn will be applicable on the other platforms - or close enough that you can easily look up the specific syntax when needed i believe where the bigger differences come into play is in things that more at the architect or dba level how to shard the database and maximize for different kinds of performance for instance this is great! i suppose my problem is i have no targeted end customer this is essentially for my resume which my company uses to advertise their employees' skills to get more clients so i guess is there one of those 3 which suits those needs? or does it not matter do you think? 
in that case i would look at current company clients and ask what database do they most commonly have installed in enterprise systems then go get the certifications for that one so probably either oracle or sql server start-ups are much less likely to use a consulting firm what about for those not working for consulting firms and self-taught would your criteria for which certification to get still be fitting? if you are looking for full-time work then ask yourself which kind of company do i want to work in? big enterprise small to medium enterprise or start-ups? the stuff i build on my own i use open source - so mysql but it could easily be postgresql a certification doesn't do me any good it is my product on the other hand my consulting clients - i have been learning sql server because the back-end of various enterprise systems like great plains and retail management system is often sql server so when i write a web app to integrate with those systems and pull data out for reporting i am needing to work with sql server if i wanted to be an employee rather than a consultant i would go get certifications in sql server sql server 2016 has r integration previous versions do not as with anything microsoft the first version will probably not work that well just a heads up honestly don't waste your time with sql server certifications no one cares experience certification every time i've never heard of anyone looking for a data scientist with certifications in sql there is definitely some need for a data scientist who knows hadoop though! 
i agree 100% for my job being well versed in sql was a prerequisite but they never asked me if i was certified getting hadoop certified on the other hand people will actually pay attention to (and the certification test is actually pretty easy) as with anything microsoft the first version will probably not work that well just a heads up my dba confirms this i wanted to be able to quit using rodbc in favor of doing it all on the server he said it's a bad idea until the implementation gets better w3schools com has a fantastic free tutorial on sql that covers oracle and sql server versions of the code you could finish the entire walkthrough in about 3 hours if you wanted to good luck tbh i somewhat don't recommend devoting your resources towards getting a sql cert i would try to get your company to pay for almost any other certificate as u jackmaney somewhat alluded to you may find yourself saddled with taking care of databases as your day-to-day if you aren't careful you may find yourself taking everyone else's orders and building and maintaining their databases all while your statistics and analytic skills are wasted i'm not saying that you shouldn't learn sql but i'd be wary of getting a giant flashing sign on yourself that says "i'm a dba " if you are really wanting to get some kind of data training i'd say hadoop spark or hive maybe some mongodb or go crazy with like a graph database oracle sql server mysql etc each implement some version of the ansi sql standard so if you understand the basics of sql then you can use any specific rdbms[1] does your firm primarily use any particular type of database? 
if so i'd probably go for certification in that since you'll probably have coworkers who understand the ins and outs of that particular rdbms (and it should go without saying but if it weren't for your employer footing the bill getting any kind of sql certification wouldn't be remotely worth it ) if you're looking for a resource i recommend head first sql it's geared for folks with no sql experience at all--and quite possibly no programming experience either so you may or may not find it a bit too basic being fluent in r will be a major advantage in that you're already used to manipulating and aggregating data that is arranged in nice neat little tables one way that sql is different from most other languages however is that when you issue a query or a command to manipulate data you aren't telling the database how to go about getting what you want instead you're just telling the database what you want and leaving it to do its thing the five-dollar phrase to describe this is that sql is a declarative language i've found sql to be an invaluable skill that i use regularly and if nothing else sql comes in handy when using spark--which may or may not be what you use when you "run r scripts in parallel on aws" but if not i'd check it out it even has an r frontend [1]: relational database management system (eg oracle mssql etc) edited to add: whichever certification you go for you shouldn't go for it because "it's a gold standard in dba " dba and data scientist are two very very different jobs it's pretty unlikely (at best) that you'll end up in a position with the title of data scientist but where you'll be maintaining tweaking and configuring databases thanks! 
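
To make the "declarative" point above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the numbers are made up purely for illustration, and SQLite is only a convenient stand-in for whichever RDBMS you end up using. The loop spells out how to compute the per-region totals; the query only states what result is wanted and leaves the how to the database.

```python
import sqlite3

# Hypothetical toy data; table name, columns and values are illustrative only.
rows = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 30.0),
    ("south", 20.0),
    ("west", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Imperative style: we spell out *how* to walk the rows and accumulate totals.
totals_imperative = {}
for region, amount in rows:
    totals_imperative[region] = totals_imperative.get(region, 0.0) + amount

# Declarative style: we state *what* we want; the engine decides how to get it.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
totals_declarative = dict(conn.execute(query).fetchall())

print(totals_imperative == totals_declarative)
conn.close()
```

If you are coming from R, the query plays roughly the same role as a `group_by()` plus `summarise()` pipeline in dplyr, except that the work happens inside the database.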
this is very good advice to be honest this is a solution looking for a problem my company currently uses no database tools storing every single excel csv etc we get on our local server we recognize this is unsustainable and our company is growing fast enough such that needing a dba might indeed be in our future as for learning sql itself - i probably should have mentioned that i know the sql commands for querying data that's the simple part select commands inserting updating data is cake what i don't know how to do is set up a sql server add data en masse to it or basically do anything with sql that makes it more useful than the r skills i already have and we actually parallelize with the r package parallel which really doesn't do anything fancy (it's just implementing parallelization in apply statements) coincidentally another one of my "2017 goals" is to learn spark (for python) and as for your edit: yes i agree there are just so many options i don't know where to start i want to be the best "data scientist" i can be and sql just seems to be the next rung in the ladder my managers agree it is an excellent skill and so encourage me to go for it you're welcome! 
if you're already familiar with using select statements etc then bulk loading data is fairly simple (assuming that the data is in the form you want which is of course the not-always-so-simple part) there are commands for it: eg copy in postgresql and bulk insert in sql server of course if feasible you can always read a csv in row by row parse it and then use insert statements to shove data in one row at a time (slower than bulk loads but sometimes useful for one-off analysis) unless you want to branch off into dba and or general sysadmin devops work i wouldn't worry about setting up and configuring a sql server instance (unless of course this is the only way that a database will be set up at all which would present a red flag ) summary: ms biology sep 2016 i'm 27 years old (almost 28) working a low pay temp job since sep 2016 in a medical device company (no prospects on permanent status + job search is terrible) i'm quite disillusioned with the scientist life and career outlook the very little i know about statistics and data analysis (basically just t-tests anova) i enjoy i like to incorporate quantitative things everywhere i can in my work i am taking trainings at work to get a handle on minitab and learn how to do industry manufacturing analyses like gauge r&r i have begun to learn python code please: help me by giving me advice on how to proceed from here i'm totally lost and while i'd like to start looking for data analyst jobs i have no experience or skills yet what do i do? do you recommend a code academy bootcamp? certificate in data science or ms? i'm not opposed to another ms i'd just have to take some introductory stats and programming and update my gre what can i do to start (yes i understand i might have to be in the biotech scientist job for some time until i really transition) thank you if you don't do a lot with statistics how are you certain you want to be in data science? 
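
Going back to the bulk-loading advice in the SQL thread above, here is a rough sketch of the "read a csv row by row and insert" route, next to a batched load. It uses Python's built-in `csv` and `sqlite3` modules. The `items` table and the CSV contents are hypothetical, and `executemany` here is only a stand-in for a real bulk loader such as `COPY` in PostgreSQL or `BULK INSERT` in SQL Server, which SQLite does not have.

```python
import csv
import io
import sqlite3

# Hypothetical CSV contents standing in for a file on disk.
CSV_TEXT = "id,price\n1,9.5\n2,12.0\n3,7.25\n"


def parse(text):
    """Parse CSV text into (id, price) tuples with proper types."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(r["id"]), float(r["price"])) for r in reader]


def load_row_by_row(conn, records):
    """Issue one INSERT per row: simple, but slow for large files."""
    for rec in records:
        conn.execute("INSERT INTO items (id, price) VALUES (?, ?)", rec)
    conn.commit()


def load_batched(conn, records):
    """Hand all rows over in one call; a stand-in for a real bulk loader."""
    conn.executemany("INSERT INTO items (id, price) VALUES (?, ?)", records)
    conn.commit()


def count_rows(conn):
    """Return the number of rows currently in the items table."""
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


records = parse(CSV_TEXT)

conn_a = sqlite3.connect(":memory:")
conn_a.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
load_row_by_row(conn_a, records)

conn_b = sqlite3.connect(":memory:")
conn_b.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
load_batched(conn_b, records)

print(count_rows(conn_a), count_rows(conn_b))
```

Both routes end with the same table contents. The difference only starts to matter once the file is large, which is when a server-side bulk command is worth looking up for your particular database.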
i have been exposed to simple statistics and by no means was it formal education i learned stats that i needed when i did my thesis projects and at my work now analyzing my experimental data i want to know more and i am willing to learn it formally in order to get a better handle on it hey i’m also a scientist trying to transition my way out of biotech i had just competed a 12 week long bootcamp costing around $15k i would say you should consider the time factor: the market demand for data scientists analysts by the time you graduate to decide what type of program is best for you for a data analyst it may be possible to get the position with an online certification or a few classes but not a data scientist position at least here in the bay area the bar has been raised extremely high for them i'm willing to do what is needed for data scientist (ie an ms) but i know i would be obviously starting with the analyst position what bootcamp did you do and what was it? for me personally a bootcamp at 15 is a lot i'd pay more and spend more time to get a degree beats doing a science phd and still getting screwed in the workplace i went to metis which was a great experience for me yes it's a lot but i don't have the luxury of time at my hand i already have a ph d and it doesn't make sense for me to get a masters going for another masters probably is a decent choice for you as long as you're okay with the sacrifice in time effort energy there are loads of free or very low-cost resources out there including both textbooks and online courses many have been suggested on this subreddit i would suggest starting out by working your way through an 'intro to data science' or 'intro to machine learning' course or textbook to make sure you actually like the work and not just the hype then try to apply the skills you've learned to one of many public datasets out there both to get more familiar and to start building up a portfolio for your applications a bootcamp or master's will 
accelerate this and or keep you honest with your progress and time commitment but you're looking at a significant time or monetary investment for both there's a vast amount of free resources so if you're serious about a potential career change then at least get a first formal look i would suggest starting out by working your way through an 'intro to data science' or 'intro to machine learning' course or textbook to make this is wise advice thank you for offering this because i understand i do not want to head down a road that i don't enjoy when you say "build a portfolio" do you mean like github? ie since i don't have any experience in this field from an employment aspect i could apply to jobs and show them some projects that i did ? you can learn web-scraping using beautifulsoup or scrapy and obtain the data from the internet or you can download the data from kaggle and apply some basic data science concepts to extract insights or to create a predictive model yes if you have some meaningful side projects to show with clean code and nice visualizations it can go a long way towards getting that first job plus working on data yourself rather than just following classroom examples will expose you to the inevitable roadblocks or new avenues that require you to go on stack overflow and learn some new tools and syntax great i've started with just making a very simple python program for bioinformatics purposes but i need to really increase my pace my goal is to get more comfortable in coding with python and then venture into r you think its ok to jump into actual data wrangling material when i am still learning the python code? because i've seen a few seminars where they say "intro to data science and visualization no coding necessary" i just question - how can that be? 
you don't need another degree if you have an ms in biology the best course of action for data science would be either self-study (textbooks such as intro to stat learning moocs kaggle) -with- a personal project portfolio or a bootcamp program for data analyst you might need slightly less (i e a few online classes smaller portfolio no need for a paid bootcamp) like mentioned a data scientist is a stats-heavy position so you should ideally enjoy statistics too would be either self-study (textbooks such as intro to stat learning moocs kaggle) -with- a personal project portfolio or a bootcamp program would a relatively low cost certificate be enough? for intro to stat learning - the text by robert tibshirani and trevor hastie? thanks certificates aren't usually seen as a big deal intro to stat learning is by james witten hastie and tibshirani - don't confuse it with elements of stat learning which is a great but much harder book hey all a little background about me i have worked as a data analyst at ibm for over 2 years now recently i’ve been looking at potential paths for career growth and it seems like any path forward in data science is very strict on having a math or similar degree this is just based on my experience with seeking an advanced role and speaking to recruiters can anyone here share a bit of advice? do i stand a chance advancing in this field as a candidate with no degree? 
i’m at a point where i plan to aggressively self learn a new skill and am weighing my options between furthering my data skills and completely pivoting to something else thanks data science is very strict on having a math or similar degree you're going to be severely handicapped without at least a stem bachelors if you're hoping to go beyond senior analyst - though i imagine bi related managerial positions in small companies are still on the table as a general ceiling bi managerial positions in large companies too source: my own situation i think you need to consider the following: 1) are you going to be able to pick up a rigorous mathematical education without a classroom and 2) will you be able to convince your prospective employer to choose you over someone with a master's in cs or a science phd? i think most people will struggle with the first and almost everyone will struggle with the second data science is different to other fields like cs in that you really do need a certain level of mastery of "advanced" statistical concepts to be an effective practitioner (i say "advanced" because it's all undergraduate-level mathematics) i echo the sentiment that you will really struggle to make serious progress in this field without a degree my company won’t even hire a data scientist without at minimum a masters 3-4 have phds better off teaching yourself to write code without a formal education imho i was in the same boat school wasn’t my thing tinkered with computers starting at a young age i started building websites for people i knew that in turn led me to building giant corporate sites and eventually having a decent sized dev team i oversaw and managed that led to another job where i got to really learn a ton and now i build large distributed systems that do really amazing things my best advice is to never stop learning never become stale always be curious read a lot network a ton contribute to open source projects i agree with your last sentiment to never stop 
learning and that is what i’ll continue to do it doesn’t look like becoming a data scientist will be a clear path without a degree regardless instead i’ll just take my data foundation and pivot to a field that can benefit from it they still use us regular engineer types to actually build the stuff to support business needs so learn that if coding interests you been doing this for 10 years now and i still feel like i get paid to tinker around on a computer all day in my opinion: 1) if you're asking if it's possible the answer is yes the ceo and cfo of my company are both from no-name bottom tier colleges they probably would've asked a similar question on reddit 30 years ago 2) it's possible but it will obviously be harder some companies will judge you right off the bat but there will certainly be some (and stuff like networking will be immensely helpful here to get your foot in the door) willing to give you an interview if you have a strong resume demonstrate strong experience on your resume once you have an interview it's fair game and you just have to impress them and show them that you know your stuff you will need to build your resume and progress a bit slower jump from data analyst to senior build a resume with projects and demonstrate a solid grasp of a variety of skills 3) put a lot of time into studying as much as you can - coding statistics etc 4) i would eventually try to get some degree go to your local college doesn't have to be harvard most local state schools can be quite cheap from there you can even do some type of online ms in stats math analytics cs since you work at ibm are there any possibilities they will pay for you to get a degree in cs or so? as a way for you to grow your career within the company? 
i’ve looked into this at various companies including ibm my experience is you have restrictions and hoops you’d have to jump through usually only to get denied then your team can only have so many seniors and manager roles the only route seems to be to a different company def not hi all i'm kind of new here i have been a software developer for some years and i'm progressively getting excited by ai and data science i have always thought that at some point in life i would be able to run my own small medium company and i was wondering if this would be a viable option as a data scientist the question may sound stupid (or it actually is help me) but the reason why i am asking this is because i feel that as a medium experienced software developer it would be somehow easy to "just have the right idea" and mock a prototype software web app mobile app to showcase to investors on the other side i see data science as a key department in medium big companies but i find it difficult to think of a medium experienced data scientist starting off his company by at least initially leveraging only his own skills i'd rather think of a senior data scientist running a company based on some cutting edge technology i hope this all makes sense i'm just trying to balance my current desire to get into data science with what i think would be my ultimate goal of running my own small company i would love to hear your feedback oh and if it makes any difference i'm 28 thanks my ceo (small company) is of the opinion that if you want to raise venture capital one of the easiest ways of doing that is creating a good pitch which uses ml he has basically transitioned the company into an ai fintech company without actually knowing all that much about ml the major difference between us and startups in this space is that we have a lot of goodwill with our clients since we have worked with them for many years on other technologies (i say we but i only joined after the ml stuff) a huge determinant of your 
success is likely to be the human factor - can you develop good relationships with your clients get them to share potentially sensitive data and so on thanks for your feedback i mostly agree with that as far as i know they key thing most of the times is a match between a good idea and the ability to communicate its potential to investors would you mind sharing some tips on how to get people to share that sensitive data? i'm currently involved in a project with machine learning but the major hurdles so far seem getting the data rather than the technical aspect we've got a few entrepreneurs convinced so far but every new one is (rightfully) reserved it's a question that basically breaks down to: how do you build trust? it's not an easy question to answer for us though we've worked with our clients for a long time and have consistently delivered a quality service but even then when we actually want access to their sensitive data we have to physically go to their offices to run our processing pipeline and we're not allowed to take data off-site even the models we build which potentially contain sensitive information about the data it was built on has to be left there so i think it's about understanding the risk tolerance of the client and then mutually developing a solution which satisfies that risk tolerance maybe you have to invest in it (and physical) security infrastructure maybe you have to agree to only bring an anonymised and heavily processed version of the dataset out maybe you have to do both i agree with the above another important factor which is missing is the bureaucracy level of the client as someone who is working with healthcare providers i can tell you that no amount of trust will cut through a sturdy admin assistant who is following protocol you need to be more creative there :) blackmail what kind of company? 
it's important to realize as early as possible that "small medium company" has different definitions to different people would you like to go down the startup route? or build a more traditional model? the former is insanely difficult the latter is just crazy difficult what is your definition? i currently live in europe and i don't think starting your own company is either crazy or insanely difficult you just need the right mix of people who believe in you and an idea that solves some kind of problem obviously hard work determination etc are part of the mix too i think this is very related to the fact that our model of startup is quite different from the one in the us honestly i am not thinking about the kind of company i think it will very much depend on the problem it will try to solve and thus the resources it will need the thing that worries me atm is that for example creating a startup (mobile web app) off a service that delivers food from restaurants to your home is pretty easy once you have the idea (which is again pretty simple) while i feel like an idea that leverages specific data science expertise will have to deal with some super-techy-advanced stuff which is probably much more difficult to come up with don't you agree? 
i can't understand if i'm simply unbiased because of my previous working experience you just need the right mix of people who believe in you and an idea that solves some kind of problem * i think the key is to have an idea that solves some kind of problem that no one else is already solving better there are a boatload of ml-based companies out there in just about every niche you have to figure out your place in that jungle tl;dr tons of opportunities but you're still constrained by the standard sales marketing business challenges you would be in any other industry except here you have the data gathering constraint layered on as well there are a ton of new ways that ml could be applied to a given industry that would create a ton of value i can't think of a single industry that would say definitively that they've already found all the ways that ml can make money in their industry so in terms of unexplored opportunities yes there are myriad ways you can use ml to make money in terms of actually starting a business to take advantage of those opportunities you've got the problems people are outlining above gathering the data is required to build the ml models and most of the best data exists in company's internal databases getting companies to trust and use and pay for your product even if you have the data is still the standard entrepreneurial sales marketing challenge it is in any other industry first you should think about your datasets before doing data science projects what your data scientists (or you) will do without data? 
traditional approach: 1) find a problem which affects a lot of users 2) build a product and start collecting data 3) (hire data scientists and) optimize your product or business processes by your dataset using ml not traditional approach (i'd love to see examples): 1) find a great dataset which no one monetized yet 2) hire data scientists and develop a product based on the dataset hi r datascience i currently work as a data analyst at a large tech firm i graduated 2 5 years ago with a bs in applied mathematics (with a few extra statistics courses) and a minor in computer science i want to increase my skill set (in particular handling big data sets and application of statistical modeling) to become a data scientist since 88% of all data scientists have at least a masters degree (and 46% have a phd)1 i know this will be necessary to achieve my goals i want practical hands-on experience so i can jump into a role and immediately apply learned methods i am not particularly interested in too much theory or research at this point so i am interested in shorter more practical applied statistics masters degrees over the typical ms in statistics the masters of applied statistics program at colorado state university has caught my eye it has been around for five years (compared to ucla's in its first year) and i would plan to attend full-time on-campus for the 11-month program i reached out to the director of the program for a list of careers graduates pursue and her list of 10ish jobs were almost all statistician at various companies here is the curriculum for the lazy here is the list of classes: three-week review course: mathematical tools for statistics three-week review course: computing tools for statistics probability with applications mathematical statistics with applications experimental design mixed models regression models and applications generalized regression models applied bayesian statistics methods in multivariate analysis non-parametric methods analysis of 
time series survey statistics statistical learning and data mining computational and graphical statistics methods in simulation and computation quantitative reasoning topics in industrial and organizational statistics end with a six-week group consulting project do these classes seem like they would prepare me for applying to data scientist jobs? i am concerned that there is no machine learning class listed it looks like they offer a graduate-level computer science course that covers it i may try to audit that class or if anyone has online resources for self-teaching i'd love to hear them! my plan after finishing the program is to either stay in colorado or move to los angeles where my family lives either way i would like to find a job that pays at least $100k so does this seem like a good plan? should i go for a ms in statistics instead? help a fellow math lover out :) thanks for reading! the machine learning course in the list is “statistical learning and data mining” statistical learning is another term for machine learning [deleted] yes this is the summary from the catalog i didn't think it quite qualified as the ml i would need [deleted] you would probably learn logistic regression and decision tree based classifiers k means clustering perhaps k medoids clustering and pca for unsupervised learning “machine learning is a manifestation of statistical learning techniques to help learn about the data better ” https: www google com amp s www techinasia com talk data-science-simplified-statistical-learning amp ml isn't just about the tools it's also about the goal which is the point u __compactsupport__ is making people who are only introduced to the "explanatory approach" tend to be handicapped in a prediction focused environment - overly concerned with multicollinearity reliance on metrics that aren't tied directly to the outcome of interest (aic r-squared) overly concerned with p-values breiman discussed this concept 16 years ago https: projecteuclid org euclid 
ss 1009213726 i understand that statistics optimizes metrics for explanatory and inferential power (estimation) whereas a computational model would optimize for prediction on future values two very different things indeed what i am saying is that statistical learning is just machine learning through a statistical lenses; we get insight on why our predictions are good if statistical learning was 100% inferential then why is "elements of statistical learning" by hastie a common reference textbook technique for machine learning what i am saying is that statistical learning is just machine learning through a statistical lenses; we get insight on why our predictions are good i think that's a healthy view point i do not think it's a view point commonly taught ymmv i spent months i wish i had back debating old guard stats guys about this on linked in if statistical learning was 100% inferential it doesn't mean that to me personally but when a class is titled "statistical learning" then more often than not it's an inferentially focused class (in my experience) if cross-validation tree ensembles etc appear in the syllabus then it's safe to say they are covering prediction as well then why is "elements of statistical learning" by hastie a common reference textbook technique for machine learning because professors have poor taste in learning material otherwise they'd use kuhn's applied predictive modeling :p “statistical learning” is a framework for machine learning and had success in applications such as computer vision speech recognition and bioinformatics (from the wikipedia page) so to me it is mostly for prediction and the terms “machine learning” and “statistical learning” are interchangeable almost i think you are thinking of statistical modeling which tends to always be for inference and estimation of parameters in linear models for example fixed and random effects estimation what i am saying is that statistical learning is just machine learning through a statistical 
lenses; we get insight on why our predictions are good when i read this i think "explanatory" but then you referenced "statistical learning is a framework for machine learning and had success in applications such as computer vision speech recognition and bioinformatics " i can't connect those two concepts because i don't see vision and speech recognition solutions as "explanatory" at all [deleted] thanks for your response! you are correct that the program focuses on r and sas i will focus on ml and python in the time between now and the start of the program there seems to be some interest here to hear from hiring managers about the employment side of the equation i have literally ( about 5 minutes ago ) taken on two new junior positions - i thought i would cover off some interviews for you guys - i saw a lot of themes some background: role is for a junior role on lowish pay - my expectation is that this person will progress over the next 2-3 years to eventually be on 4x - 5x their salary role is a mixture of coding and true data science with a preference for someone with a machine learning background - although wasn't considered mandatory i have a strict policy of specifically not recruiting anyone with a msc or phd in statistics ( and i'm not alone in this - lots of managers of data science teams use stats as a specific exclusion - flip side is that some consider it mandatory ) stats people are very bright but their professors in general let them down very badly - they aren't taught flexibility and not taught a lot of valuable skills and so always seem to struggle in the real world for context - the majority of people who were applying had not had a private sector job before - most were either straight out from their courses or had been doing grad assistant type work for a while resumes i have 47 unique resumes from 3 different recruitment agents the majority where grads and post-docs with stem related degrees who said in their intro that they wanted to do "data 
science and stuff" these got whittled down from 47 to 10 i will be totally straight with you - that process took me less than 1 2 hour at most - probably more like 20 minutes common issues i saw: nothing but education most people who applied for this role had very deep educational background - but most showed no personal interest in this type of work nothing in their hobbies and interests no work for charities or groups doing "stuff" with data not even a "i play about with a arduino" - judging by their cv's these people have done their degree masters phd with no level of personal interest in their subject some recruiters won't mind this - personally it's a big issue for me - this is an r&d role and so it needs enthusiasts "extensive knowledge of big data with excel" my team probably have 20+ platforms and systems available to them and i generally don't care what they work in ( unusual by the way - most places either prefer sas or prefer r as a default with other platforms as secondary choices) i don't think that a single one of them has gone more than 48 hours without using excel i am very pro- excel i do however expect more than just excel i also expect more than excel and vba vba is frankly frigging amazing at what it does - i have made my previous employers a lot of money using vba but if i want you to deal with 10 000 000 records - which i will - then you need more than this matlab i had a whole bunch of matlab only experts as well thats ok but if you want me to have to spend $100k year to give you your tool of choice - you either need to dazzle me on why your amazing or have a 2nd and third language in your arsenal as well and be damn sure to put a specific comment in your cv along the lines of "i know matlab is a bit niche i am a legend in python r sas perl as well" "i know sql" what does this actually mean? have you noodled with a bit of postgresql? are you a leading expert in pl-sql and the bees knees at working with clobs and blobs? 
are you an intermediate user of t-sql but only if you have all the ms widgetry available and turned up to 11? do you use toad for looking at pictures of databases or tuning them? for this specific role of mine it's wasn't hugely important - but some knowledge of sql is - for any role in this industry ( cos of group by and hive if nothing else) and if you have enough knowledge to add value my expectation is that you know that "i know sql" is like saying "i can do maths" - its so expansive a statement it makes you look bad by putting it the generic cv it sucks writing cv's - i personally hate doing it - so i can understand why other people do as well but - you need to suck it up at least 1 2 the cv's i got were totally generic - it was entirely obvious that the same cv was being used to apply for a data science job in the financial services and an internship in a laboratory you only get 2-3 pages to show yourself off in a cv and so all any recruiter gets is a fraction of you if you use a generic cv you don't need to create a custom one for every single position but you really should have a few "flavours" of cv's on hand - perhaps one for university positions one for your phd speciality in private industry one for a data science specialty in private industry they are a pain in the arse to write but you only have to do it once and it makes a huge amount of difference also - don't assume i am a specialist in your specialist area - unless the rest of your cv is earth shattering i am not going to go looking at wikipedia - because i have 46 other cv's in front of me a whole bunch of work a whole load of meetings and frankly probably 10 or more of the 47 cv's i have will turn out to be as good as you this sounds brutal - and it actually may be is but it's also realistic most people who applied are educationally similar - and so that means that the education is nothing special you need to show me either "valuable" "different" or "interesting" to get a phone interview the way 
to think about it is that you have spent 17 years and god knows how much money to get your education it doesn't take more than an extra 2-3 hours and no extra money in front of the laptop to make you shine - so it's probably a good investment anyway - from the 47 cv's i got i whittled it down to 8 good solid grads and post-docs - people i was interested in with good cv's one horrible cv with little glimpses of genius in it that felt worth a 1 2 hour phone call and one frankly weird cv from a 20 year old kid which broke every single rule above with not even a higher education background - but that brings me onto the last rule sometimes it's about who you know this kid is frankly a ruffian if i met him on a dark night i would cross the street to keep away from him he has essentially no education and does have a criminal background however - he's also known by someone that i respect a lot and who knew i was looking for heads he would never ever have made it onto my radar through usual channels - but he had impressed people who had impressed me - so he skipped the queue again - perhaps unfair - but i don't know anything about the 46 other people and a guy i think of as a genius says that this kid is top talent - so he automatically wins lesson here - reap your rolodex professors people in industry people you met at events collect phone numbers make an impression and be open enough to ask them for help will post the interviews in the next message how exactly do you expect someone to go to 4-5x salary in a few years?
either your starting salary is ridiculously low (which could explain lack of quality) or the 4-5x is impossible the minimum i would expect for a data scientist is like $60k a year (and that had better be a very junior right out of college person in a low col area) it is ridiculous that you go from making $60k to $300k not saying it never happens but for you to expect it if that $300k number is right please tell me what industry area so i can look to get into it hah 50- 200k is certainly possible if you are talking about someone without much of a formal education someone in that position could be more or less self taught but without being able to get the attention of hiring managers that being said the kid must be pretty exceptional 200k in 3 years starting with that background would be tough though but if there is any field to do it in it's data science no one gets hired to do data science - or any job - for any other reason than money - they make or save x and they get paid a percentage of x and the company keeps the rest as profit if i have really good people then they are going to get poached i can offer them very interesting work but my company doesn't have the big fancy name that others do which for a lot of people matters a lot - so basically if i want them to stay i will pay them more $300k is still vastly less than the top members of my team make for my organisation every year so it's good business sense i do like them as people - but you don't pay people because they are nice chaps - you pay them to protect your revenue streams as for industry - i work in a bit of a niche in financial services those kind of salaries are in no way uncommon you have the quants - which is in the same world - earning much much more than this from what i can see - certainly in the uk and if you set aside the quants in the city then the big money is where most people don't look - retail storecards supply chain optimisations edges of the financial services it's not google or
twitter so it's not glamourous but data experts can make dramatic impacts to companies bottom lines - which makes them money i have a problem with your first statement never been in it for the money i wouldn't hire anyone who is "i have a strict policy of specifically not recruiting anyone with a msc or phd in statistics " " they aren't taught flexibility" alrighty then i really would like to see op elaborate more here a statement like "all x majors are y" seems incredibly myopic not to mention the irony of complaining about a lack of flexibility when you are showing literally no flexibility of your own it's really strange for me to hear that it's true that many stats majors have no real world job skills but the best data scientists i know have maths stats backgrounds and have picked up comp sci on the way i think they're worth an interview at the very least furthermore they're likely the ones who have the foundations and understand ml as something more than a black box and use tools judiciously is there any data scientists from econometrics backgrounds? or are they strictly preferred in consultant type positions? yes i have 1x economist and 2x econometric-ists (i probably invented a word here ) all three are majority self taught we have put time and training into all of them in all three cases to bring their coding up from "good hobbyist" to "doesn't drive the operation team demented" - i like how their minds work - they work very differently to the physics and maths guys - totally different some amazing work gets done when they pair up - they fill each others gaps actually - thats probably not right it's probably better to say that they have very different mental models of the world - great results come when then mental models overlap econometrics ds reporting haven't seen many others though did you learn stuff like nearest neighbors or random forests by yourself? 
yep and i'm still learning daily i would suggest - and this is only my opinion and this definitely doesn't apply to everybody who recruits - that the "science" part of data science isn't just about understanding the intricacies of the algorithm but about being able to apply it's results to the real world understanding the algorithm absolutely gives an extra string in the bow but it's not the be-all and end-all a counter-example would be someone working for a regulator checking other companies work those people absolutely and utterly must be masters of the algorithms and know all the edge cases and gotcha's i won't put words in op's mouth but i have noticed a resistance to hiring phds in data science roles because they usually have a tough time communicating with non-technical supervisors and moving away from the researcher mindset (the things that seem most interesting to try with your data aren't necessarily the same things that will make an employer money) yeah idk i see plenty of phds being hired sure some of them don't have great communication skills or a business mindset but that's what interviews are for plus op was specifically talking about stats students which just seems like a kid of ridiculous generalization there are a lot of employers who will actively recruit msc's but specifically not phd's or who will not think about them any differently personally i don't care - but there is a fair degree of prejudice about phd's in a number of companies - a mixture of reasons really - a lot boils down to an assumption that this is a person who is so clever that they are a nightmare to manage or can't communicate well - or especially that they are so used to the pace of academia that they can't be pushed to hard tight deadlines my experience is that some can initially be poor communicators but if they are the right person that's nothing that can't be fixed with mentoring and practice some don't like being pushed - but that applies to people from any background 
about equally one of my absolute best guys is a nightmare as soon as there is a time pressure - he just buckles but the rest of this work is phenomenal so we work around and accommodate it i will say that more phd's come across as arrogant in the first 5-10 minutes of an interview than msc's so which i don't fully understand - lets be clear though it's not like all of them do - it's a percentage more of them and i guess that this gets peoples back up - first impressions count etc etc - but really it's not an enormous issue however - the general prejudice - at least in the uk - against phd's is real it's not crippling but it's there i wouldn't let my kid do a phd unless they wanted a life in academia as my perception is that it's mildly harmful to getting your first couple of jobs ( which is sad but seems to be the case) good summary and lines up with what i have heard from employers and phd friends as well phds are useful and valued in a lot of fields but they're definitely at a disadvantage when applying for client-facing roles or roles where they have to communicate their research findings to someone from a non-technical background in a few years when companies can afford to grow data science departments and there is enough talent available for them to do so it will likely make more sense to hire phds alongside data science communicators who distill the work phds have done to something that's more approachable for those with a business background he also puts down the "ms widgetry" yeah sorry i don't have the entire python standard library memorized plus pandas plus scipy plus numpy plus matplotlib etc intellisense often illuminates new ways to solve a problem or at the very least makes coding happen faster and if op is talking about widgetry in sql server like cubes or rollup (oracle has this too btw) or fucking all of r right there in the db (to be fair only in sql server 2016) then he needs to check his privilege i don't care at all about using widgets as a 
productivity booster - i am all in favor of it and do it myself as often as i can but - there is a big big difference between using them as a productivity enhancer and not being able to function without them example: if you're working with postgresql there is a huge value in doing a lot of your work within pgadmin iii rather than from the command line however some people can only work from pgadmin iii - which is an issue if they suddenly can't access it - perhaps because of a computer rebuild or working at a client site or needing to work in a restricted environment where the tools aren't available - all of which are common enough problems as i said in my initial post - it's not life or death if that's the position you are in just don't fall into the "i know sql" trap not to mention the irony of complaining about a lack of flexibility when you are showing literally no flexibility of your own u nico-suave & u cjf4 it isn't irony when it is based on past experience and you have a stack of 100 cvs and only 1-2 hours to go through them all it's time management that said i know plenty of recruiters and data science managers in silicon valley who are wary of msc and phd types they're usually used to very clean data and making very clear decisions whereas real-world data (esp at businesses) are dirty and biased as hell and these guys tend to be very uncomfortable with using such data and helping the business make a decision additionally there's communication and being pedantic i had a phd consultant - who i was interviewing for a position - and he went on this huge rant about how he disliked his previous client because they kept on interchanging confusing "a b testing" and "multivariate testing " he obviously had a huge problem understanding how to communicate manage upwards and has bad expectations on working with non-techies for me i think the above goes with not being used to working in an office environment where you have to manage-upwards with non-techies if i see a phd
who has worked well in an office space of course i'll take a close look but i'd be reluctant to be that person's testing ground - unless there's some other variables at play too "i specifically recruit people who are completely unqualified for the job title i'm hiring for " ftfy op i don't think this is the case the reality is that "data scientist" has joined "big data" at the top of the hype curve there are at least 50 different very different roles you could end up doing with that sort of title my team's job is to extract significant levels of new different interesting and profitable information from noisy datasets using whatever methods allow for the best signal extraction for this my company gets a good deal of money we have discovered that finding the edge cases of these signals - finding the noodly oddness that suggests there is something not fully understood in the more basic models - is a way to reap far higher levels of impact from the clients which the sales guys can use to turn into better revenue so - i care about signal extraction from noise within huge datasets and i'm especially interested in the edge cases which means i need a team who can use large numbers of tools and technologies and algorithms without having a bias towards any of them i think i hire about the right type of people - their results are certainly ok and i consider these guys data scientists flip side is that i have used the example a few times of a mortgage analytical group in a big bank they would consider the guys i hire as anathema they typically have very fixed processes and procedures and there is a commonly understood place to "start" and "stop" they also use data to extract value and ( some at least ) are using ml techniques spark etc they also refer to their people as data scientists they are different people - very different - but at the moment at least they fall under the same banner you talk the talk but you explicitly filter out people who understand statistics people who
understand how to extract signal from noise and to recognize meaningful signals from randomness that can be confused for signal regarding the different roles you should read this and watch this no there are a vast number of methods to extract signals from noise stats has some chemistry has others physics has a whole set of families and sub-families that focus on these types of problems engineering has some wonderful methods which don't even need the slightest tweak to be able to be used straight away in this world no group is more or less good at using their own set of methods - and there is value at looking at many of these problems with many of these methods but i have been doing this job and running teams like my current one for more than 10 years - before "data science" was a thing i have hired many statisticians and programmers and engineers and scientists literally hundreds of them the scientists and the engineers flat out do better it's not a personal attack on the statisticians - they have had a different mindset hammered into them the stats guys who i have hired do great work they make plenty of money for my company but the scientists and engineers just make more given a limited budget to run my team on and a limited head count - which should i hire?
again - the type of team i run doesn't fit into all scenarios i get offers every few weeks from people who would like my entire team to move to their company but equally i am other team managers worst nightmare it's about the job it's not personal it's just money - and in my world the stats guys make my company less having said that when i saw your initial comment i did reach out to some other recruiters who i know well - i e will trade text messages with and go for a beer with four of them specifically reject stats grads 2 of them are ambivalent and 2 have a minor preference for them this is obviously a tiny sample so take it for what it's worth i think perhaps you didn't read my other post if i hire a stats grad then they are utterly wonderful at stats you want a curve fitting to some data - these are your kiddies and thats ok for a lot of companies all these people have jobs waiting for them in the big banks but where their professors let them down is that they are being taught stats and not science a typical stats grad will chew through the work you put in front of them - probably better than the stem guys that i usually here - certainly quicker for the first year or so however stats guys are taught - for many many many years to find an answer "find the right answer - then move onto the next question" someone from a physics chemistry engineering etc background is not quite as good at the maths itself - although they will learn however they have had years and years of being told that they need to assume they are wrong and to keep worrying at the bone until they can't worry any more again - i hire people because they make the company i work for money there is - in any industry more money in the errors the problems and the niggles solving the deep underlying problem rather than the surface problem will always save a client more money and so they will pay the data scientist more to find it trivial example: a client wants some analysis done on the dates of birth of 
their own customers first thing anyone is going to do is give the data an eyeball check - so rip off the years and plot the day month combos you get a nice gently sloped straight line - like you should - except in this case there's a hump at 04 04 a stats guy ( and i know this i use this as a recruitment test where it's been done by perhaps 200 people) will see a hump and accept it as noise move on to the next part of the problem someone from a physics background will go digging ( well - some of them will - who are the people i'll be interested in) - why is there a hump? there's no good reason for a hump - it's almost certainly not natural dig dig dig i won't give away the answer - because maybe one day i will interview someone on this page at a high level - it's an indicator that there is a very specific programming error within this client's systems (you can probably work it out though) and when you see what that specific error is the next question - the science part - is "what else does this affect?"
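a rough sketch of the eyeball check described above, for anyone who wants to try it: strip the year from each date of birth, count customers per day month combo and flag any combo that towers over the typical count. everything here is an assumption made for illustration - the data is synthetic with an artificial pile-up on 04 04, the function names `day_month_counts` and `find_humps` are made up, and the 3x-median threshold is arbitrary rather than a proper test. it only finds the hump, it says nothing about what caused it.

```python
import random
from collections import Counter
from datetime import date, timedelta
from statistics import median


def day_month_counts(dates_of_birth):
    """Drop the year and count how many customers share each (month, day)."""
    return Counter((d.month, d.day) for d in dates_of_birth)


def find_humps(counts, ratio=3.0):
    """Return (month, day) keys whose count exceeds `ratio` times the median.

    The ratio threshold is an arbitrary illustration, not a statistical test.
    """
    typical = median(counts.values())
    return sorted(key for key, n in counts.items() if n > ratio * typical)


# Synthetic data (made up for illustration): roughly uniform birthdays
# spread over 50 years, plus an artificial pile-up on 4 April.
rng = random.Random(0)
start = date(1950, 1, 1)
dobs = [start + timedelta(days=rng.randrange(365 * 50)) for _ in range(20_000)]
dobs += [date(rng.randrange(1950, 2000), 4, 4) for _ in range(500)]

counts = day_month_counts(dobs)
humps = find_humps(counts)
print(humps)  # on this synthetic data the only flagged key is (4, 4)
```

on real customer data you would plot `counts` rather than just print the flagged keys, and the interesting work starts after the hump is found: working out why it is there and what else the same cause touches.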
- and then you go back and look at some other data and you find a huge problem which is awesome because huge problems = big money this is a real world example from a few years ago- a stats guy did look at it initially and passed it over a physics guy ( actually a specialist in optics ) went digging for it and they turned a £40 000 piece of work into a £2m piece of work it is not the student's fault - it's the way they are educated mindsets are a lot harder to alter than skillsets because mindsets take years to develop and so years to change it is not the student's fault - it's the way they are educated i agree i see the same thing in engineering where there is more curve-fitting using sophisticated mathematics e g manifold theory than physics-based understanding of the problem a specific example is control engineering where different controllers for varying conditions are curve-fitted together (gain scheduling lpv etc ) rather than developing a better underlying physics model (perhaps augmented by additional observers or data streams) that not only smoothly interpolates the data but enables extrapolation beyond the current envelope (where data is available) granted the underlying physics model may require sophisticated mathematics as well additional information and more time to study model the dynamics but the gains are significant over a purely mathematical statistical solution good lord i very heartily agree with this there are an enormous number of problems in the financial services "big data data science" world considering a "system" from the point of control theory - the concepts of damping feedback closed and open circuits etc - is both a genuine paradigm shift and also generally blows people's socks off - it turns out that a bunch of people have a subconscious niggle that their pet lump of data is part of some form of feedback system and don't know the words for it like i have said a few times - this is where a data scientist can make the difference between
£000's and £000 000's of pounds of revenue there are some places where you can get all wonkish and need an outright expert - btw pid experts are a weird breed of geniuses with the most complicated spreadsheets you have ever seen - but it's reasonable to want to hire someone - be it an economist an engineer or a scientist who can make the two jumps of "there seems to be more here than i thought - let me go dig" and "this feels like some sort of system with a feedback loop" literally anything which involves money likely involves a system which can be modeled as either dynamic or dynamical and many of them involve multiple systems colliding many (most?) of the signals that you see in the data feeds are consequences of interacting dynamic and or dynamical systems i don't expect anyone who works for me to be an expert but i would at least expect to be able to hold a discussion about the difference between a "static" and a "dynamic" system and to be able to spot one if they see it and i would expect swearing and frowning when discussing "dynamical" systems some of the recommendations here feel trollish tech hiring managers seem to have an antagonistic attitude toward applicants maybe all corporate jobs are like that? it does seem a little worse in data science though i'd guess it's because the job requirements are ill-defined so hiring is hard and managers are already salty about giving a lot of money to people who are often smarter or better-educated than they are dude we're trading my time for your money it's a transaction no need to act like either party is doing the other a favor i think a major problem is that hiring managers are overloaded with marginally-qualified applicants if op received 8 really good applications i bet he she would have spent more time evaluating each one but 47 is a stack and no one wants a stack of work (except maybe people who implement basic data structures ha!
good one moving on ) but mostly i want to agree with you about the job postings themselves i'm trying to get into the field and i really know my interests and specialty so i only apply for jobs that are looking for precisely that so many jobs postings are written that have extremely vague bullet points ("4+ years of related work experience is required specifically in these two areas: proven experience analyzing data performing statistical analysis creating data models forecasting and making actionable recommendation" to grab one off indeed com) that don't tell me if my areas of expertise are what they're looking for so tons of people think "i could do that" and apply is optimizing a logistic regression in python (or r if i had to) helpful in that position? sometimes i think companies don't really know what they're looking for and want someone to tell them which is fine that's why they'd hire someone like me but instead of being overly vague or listing literally a dozen different software tools that the ideal candidate should have 5 years of experience with they should describe some of the business problems that their dream person should be able to solve then i'd know within five minutes of thinking about the problem whether that's something i can really help with also i wish they'd list the talent pool they already have how big is the team? what are they great at? there's nothing like bouncing ideas off of people who can come at a problem from a different angle is that the case or is the person they hire expected to be a robust expert and build everything themselves? maybe there should be a sub just to teach hiring managers how to find the data scientists they're looking for they should describe some of the business problems that their dream person should be able to solve i love this sounds like you should make that sub or at least write a standalone post on it i agree! 
u vincentveritas you should definitely go for it if you can make the time to write something up common issues i saw: nothing but education oh come on so many people receive advice to not go into their hobbies and interests what if their hobbies are completely different to their daytime work? i am a flight test instrumentation engineer who sometimes analyzes very small amounts of data (a few gb at a time at most) when i'm not at work i enjoy playing with my jetson tk1 development kit among other things why do i need to mention what i do outside of work and why do you expect it to be similar to what i do at work? you say you want flexible people and then you look for people who quite frankly sound freaking boring because their personal lives and their work lives look the same on the flip side what if their hobbies are their daytime work as in the do daytime work for fun not during the daytime? how does one represent that in a cv? why do people care about hobbies on cvs? what i mean is what i do on my offtime is my business why do you (generic you) as a hiring manager care? this whole business of "we want passionate people who really give their all" is nice but you don't get to have expectations of what i do when you don't pay me i am absolutely passionate about my job and i put everything i have into it while i'm at work when i'm not at work i want to do different things to recharge and stay interested in life now some people do have the same hobbies as they do at work and more power to them but you shouldn't expect that [deleted] yes exactly my last job tried to pull this nonsense "the guys who will get ahead are the guys who will work 80 to 90 hours a week!" me: "but you only pay us for 40 hours a week " management: "but eventually you'll get a pay raise!" me: "that doesn't help me now i don't work for free " then they started putting all kinds of pressure on me to work more instead i got the hell out bro ever heard of korean japanese work culture? 
80 90 is just the mark for reliability :d i live and work in japan a lot of those hours are spent not working sleeping at your desk getting drunk with your boss at mandatory after work parties etc are reasonable activities to include in those 80-90 hours a week yes i have heard of that work culture and i refuse to have anything to do with working that much without pay i have had 100 hour weeks in the past (on deployment) but i got paid 1 5x for each hour over 40 and 2x for hours on weekends i like to go rock climbing should i list this on my cv or will it be evidence im not actually passionate about some job i spend the majority of my time doing while awake? depends on if you're trying to get a job for u kindasortadata or not i kinda sorta don't want one after reading their post neither do i the post reeked of the kind of management i do my absolute best to stay away from put it in genuinely there is a weirdly high number of it and data science types who do either rock climbing or bouldering it's not going to hurt and gives you something to discuss cycling as well in all the companies i have been at there always seems to be many more it and data guys who are into cycling than the general population and not "going for a toddle down to the shop" - i mean "yeah i did a quick 90 miles before lunch yesterday " sort of riding i'm going to go look into this more tomorrow - maybe there will be a nice graph in it somewhere golf not so much i respect that you have taken the time to respond to like almost every comment here that's pretty cool of you i always harp on personal projects because it bothers me someone wants me to do something related to something i spend the majority of my time doing i happen to have "personal projects" but i don't think of them as something to do that is like work i think of them as something to do instead of work and i'm not going to hate a guy who after spending all day at work wants to go home and vegetate on the couch watching friends reruns 
(okay i do this sometimes and i don't care if people judge) i don't expect anything of you no recruiter does and i mean this literally - until i interview them all these people are is names and bullet points on a few pages of paper many of the cv's will be very similar the people won't be in reality but the cv's make them look like they are i am not going to interview all 47 of them so how do i make a distinction? one thing that i need for this job is a passion for problem solving and critical thinking i want people who can be given a task and do it without being spoon fed every step of the way i really do want people who are especially interested in playing with data if you don't show me that on a cv and someone else who is fairly similar to you in terms of listed skills does then i may well short-list them over you just to save some time am i going to give you a job because you list some cool hobbies - nope but it won't hurt and maybe it'll give you a small edge to get a phone interview okay now the tone of this sounds much more reasonable and less hostile (not at all hostile) than your original post that makes sense your cv should show why you aren't just another schmuck off the street as long as you do that i don't think that a single one of them has gone more than 48 hours without using excel uh wat excel has its place and it's definitely useful but once i'm past the very initial data exploration i'm probably not touching excel i have a strict policy of specifically not recruiting anyone with a msc or phd in statistics lol you have no idea what you are hiring for a data scientist is basically an applied statistician also ditto everything u thelordb said about salary for small to medium sized data sets - say under 1 2million records - there are some types of work where excel - no matter how uncool it is - is just about perfect lets say your chopping a file layout around - or more commonly - your just eyeballing it and want to drop in some graphs to do something 
like a time series analysis to look for trends there are a million and one tools that specialise in elements of this better than excel - but excel is often "good enough" for rough work excel textpad and v viewer are like the swiss army knives of a data person's world ugly always less than optimal but often "good enough" as for a data scientist being an applied statistician personally i don't believe that to be true there are a large number of applied statisticians just in my industry they do good quality work they add a lot of value but i don't believe it is data science what i personally believe doesn't especially matter but their own companies also don't believe it and so hire people like my team to do things differently not better differently my expectation is that this person will progress over the next 2-3 years to eventually be on 4x - 5x their salary interesting do you mean the person will stay in house or leave within the next two years for that 4x salary? what about github profiles or portfolios? 
i'm going to pay what it costs to keep the people who work well typically my team will average somewhere in the region of an 8% - 15% yoy pay rise and a decent ( lets say 15% ) bonus if they stay in their initial role if they step up in seniority then they will make a big jump in pay if they get good enough to get offers from other places and they do good work then i will counter-offer to keep them in the north of the uk there is far more demand than there is supply so ending up on 4x to 5x starting salary is not in any way for reference - 1 2 of my team is paid a good deal more than i am even though i manage them like i said in a previous comment - this doesn't cover the world of quants - those guys can earn insane amounts - it's a different world because i have 46 other cv's in front of me a whole bunch of work a whole load of meetings and frankly probably 10 or more of the 47 cv's i have will turn out to be as good as you you should omit that because it goes against the "data scientist shortage narrative" companies are lobbying for i've always hear the data science shortage was in qualified candidates not necessarily candidates overall? 
i've heard it's senior candidates that are in short supply more so than qualified same in a lot of industries (i'm new to ds previously worked in two different industries and hear the same complaints) people with maybe 5-10 years experience so let's say 7 as an average of what they're really looking for if you think back to events around 7 or 8 years ago i'm sure you can guess how many fresh junior recruits were being hired at that time which i reckon has resulted in a shortfall down the line (present day) at the moment ( and this will change over the next couple of years as the dedicated data science courses get good and spit out their 3rd and 4th year of candidates - which is 2016 2017 ) there are not an amazing amount of full blooded data scientists in the north of england - actually there isn't a lot in a lot of places which want them so - at the moment companies are taking people with great potential and moulding them into the shape they need lots of those cv's had 70% - 80% of what was needed so it was a case of taking those out of the pool that looked a little bit different and seeing what they are like as people be right back as i get more elaborate about my sql knowledge thanks for the helpful insight what is your take on when to be applying? for example i'll be defending in may is now too early or too late to be applying? 
depends on if you are uk us or other in the us - jobs don't seem to wait 2-4 weeks and they're gone in the uk - 1 month - 3 months is practical ( a lot of people applying for these types of roles who are in the world of it are on 3 month notice periods so it's accepted to some degree) other - unknown but:::: like i said - learn to reap your rolodex - make contracts get your name known link up with recruiters now build your linkedin network go to conferences and talk to people at the bar now is exactly the time to be doing all of that hey i'm just a guy who got a job post-academia not a manager generally when companies are hiring it's because they needed somebody three weeks ago how flexible is your program? are you on something that can fund you a bit into the summer while you look for a job (say between defense and deposit)? if so you can just start your search as soon as you defend yeah my funding ends in august but i'm planning to defend in may it seems like some places are hiring right now for may and some are much more short term it's tough to make decisions with that sort of stratification! i would strongly recommend defending before you start tempting yourself with jobs that will want to hire you with false urgency from someone who accidentally found herself working full time before defending it's a bad idea oh yeah i had a few friends who did that too do not do that glad you got through it u shannonoh! haven't yet nearing the end writing code and learning new things to get the last bit done for a dissertation in the wee hours of the morning outside of family and work time is not very efficient oof good luck! thank you much! 
definitely a few friends who did ph d - industry were able to do an internship during the summer after grad school so you may want to apply to a few of those (which will probably mostly open in december but some might start soliciting soon) just to give yourself lots of options the internship often turns into a full time position in these cases but you'd be able to apply for other positions while you did that with bonus "industry experience" on your cv what will your topic be and what is your major job conference? jsm has happened but assa aea is still coming up very helpful for me trying to get into the data science field i personally want to utilize what i learn in the non profit field and hope to work on related projects to boost my resume my question is what do you think about gaps in my resume or that i worked for target? currently in a media analyst position but these were in the past this is assuming that i have more than just tertiary expertise in r sql and etc gaps in the cv: time matters the most recent gaps are bigger issues than gaps a while ago if they look like you got fired rather than you had a lot of other stuff on that may raise a flag as well employers like target - i would think that most recruiters wouldn't have an issue with it - it shows you can get up on time do a job and not get fired or arrested - which puts you ahead of some cv's perhaps maybe you might find some employers would turn their nose up - for example the private wealth arms of some banks can be weirdly niggly about peoples backgrounds and whether daddy had money and a bentley- but fuck them - it's a small minority gaps due to working on classified projects are a right pain it means your going to spend your whole interview saying "i can do that but i can't tell you about it" and also - i always want to ask for the gritty details which is very poor form if you've been on a classified position previously the hr team at your previous usually have a set of guidance on what you can 
and can't discuss - get hold of them and add as much information as you can within the strict limits of what you're told you can say best thing to do is explain the gaps you don't have to beg for forgiveness about them or worry about them - most people have some sort of gap - just lay it out coolly and professionally don't use more than 2 lines of text would you recommend talking about the gaps in the cv/resume or just be prepared to explain the gaps in the possible interview? i'm definitely ready to explain the gap (was taking time off to be with my family back home) maybe a bit late to the party here but since you mention you were looking for low level junior guys what do you consider to be the minimum they should be capable of? a lot of the job adverts i've seen mention something vague like "skills: python r" but i don't know quite what level they expect (similar to your "i know maths" example) what do you expect your new juniors to be able to do? it depends on the specific role really - which is often controlled by the industry for example - for my junior role i would expect someone to have good practical python skills first - and the assumption that they need more teaching in both the more complex mathematical elements and also how to write "enterprise grade" python - i e graceful failure modes well optimised gentle on the discs etc by "practical" skills at a junior level i mean being able to put together something like a data restructuring tool ( i e generate metadata hash tabling recasting of elements spin the data back out in a different structure based on the meta ) within a day and ideally quicker than that building something like a first iteration of a clustering model ( i e before tweaking and tuning and refactoring) i would expect in a few hours something like a visualisation - i would be looking for something like generating an ugly time series with a high level library like matplotlib in 1/2 hour or so flip side is that making a pretty time series 
visualization of the same data may take an expert a week and a junior a month equally a highly performant very tightly fitting but not over-fitted cluster model with a decent set of documentation and a confusion matrix might be a solid month or more of work for a junior - so they would probably really be 6 months of calendar of picking at it as they learn new skills and continue to refine it whereas a senior grade would be doing that sort of work in a few days or a week at the other industry extreme - if you were doing something like a basel model in a large bank it might be ok to spend a month or even a lot more finding an extra 0.1% of precision of fit to an existing model which may actually be a morning's work giving a bump to a logistic regression and then 7 5 weeks checking re-checking and triple checking for every single possible edge case and still not being trusted when you say you're done in no way is that because the bank work is easier or they hire different people - it's because the level of precision and accuracy needed between a r&d team and a multi-billion dollar regulatory bank model is very different - and it takes different types of people to fill the roles - i e a bank would probably not consider 3/4 of my team - even the most senior ones for even an entry level junior role but they are damn good data scientists and return millions of pounds of revenue each every year equally - the kind of person who has the mindset to find that last 100th of 1% in a risk model probably isn't the right kind of person for my team and i wouldn't take them on either i guess what i'm trying to say is that "context matters" but when you're reading job adverts you really need a bit of industry context your name says you like stats: 40% of new mothers in the usa are unmarried - this has been the case for the last 6 years this is perfect thank you i'll add the details to my little sidebar notes that keeps me on track i suspected i'm not up to scratch with my practical 
programming and it seems so from your post so something i need to work on (i've fallen into the habit of just using crappy for loops for everything etc) i posted about this before and got flamed for it but will say again the vast majority of the time it doesn't really matter how you do something as long as it gets done there is a guy sitting next to me who writes wonderful perl and python the lady sitting opposite me writes ugly sas ugly selcopy and messes about with ugly rust both of them do perhaps £5m of work a year each if you can do good maths and good science fairly quickly using a for loop then that's ok much better that than you writing lovely code which doesn't do much i'm curious about experimenting with data and being able to manipulate them and present useful things from them so i've been looking at suitable ways to get started i've found a few options notably some free courses on edx coursera udemy and udacity and the specialist sites dataquest and datacamp aside from some bits on statistics i haven't found any other mention of any maths being taught as part of these courses nor any mention of pre-requisites for maths knowledge is it that maths isn't actually a big part of data science? or is it that a grounding in maths for data science is meant to be acquired elsewhere and if so what is considered sufficient? the last time i did any maths work with any regularity was almost twenty years ago right now i probably wouldn't be comfortable with anything more than functional maths skills and introductory algebra so although i'd like to brush up on maths i don't want to get invested in learning data science only to find a few weeks or months in i'm hit with something that's significantly beyond my abilities whilst i'm here as i'm totally unsure of which course to start with could anybody offer their recommendation? 
as i said my maths knowledge is mostly functional; i don't have any programming knowledge but i'm comfortable with an editor and command line i think i'm set on python because of its wide application i like the idea of being able to 'do' stuff with data online (charts maps - fun stuff that would impress a 34-year old child) in my shoes what might you do? data science has a lot of components to it there isn't a ton of math in say visualization but there is in statistics and machine learning if you are looking for advice on where to get help with math i suggest you start by searching this sub for similar questions it's not so much help with maths - more trying to understand what the maths pre-requisites are in the context of doing any of these entry-level courses as someone who has just started taking courses online in data science i think linear matrix algebra could be tremendously useful knowing how to use matrices to represent large sets of equations sounds like something you should find use for in studying relations between variables any thoughts? thanks for your input i've only ever heard the word 'matrices' - i don't know anything about them the gist i've picked up so far is that the ideal 'minimum' maths knowledge for grasping the foundations of data science (e g 101) involves statistics probability and algebra do matrices also figure in to that or are they at a more advanced level of data science application? again i’m still learning and have no data science work experience yet so i can’t say much about what skills are actually used in the field but i’ve heard people say familiarity with matrices is very helpful here’s how i like to explain it simple linear equations are easy to solve with basic algebra skills but what if you’ve got 50 of them? 
you could use a matrix to keep track of all the coefficients in that ‘system’ of equations further there are several techniques that allow you to find solutions to the system without having to do the algebra for each equation individually hope that helps! i'm afraid my maths knowledge is too rusty to fully comprehend that explanation : but it doesn't sound like it's a level of knowledge that's out of reach for me i'll crack on and if when i hit a brick maths wall i guess i'll figure it out when i get there haha okay good luck! machine learning specifically cost functions here are the steps that i would recommend to you - learn basics of programming with python (maybe add in databases here as well) learn pandas learn scikit learn (start with basic algorithms ) from here you will be able to apply machine learning algorithms via python packages then if you want to dive into the math behind it you can go as deep as you want you will most likely need to know some stats probability topics to validate the data used in the model and model evaluation you don't really need to know linear algebra calculus to apply algorithms this is the path that i went i can apply any algorithm that has a python package because i came from a programming background i don't know all of the math that's going on behind the scenes but i do understand it at a high level i think it's way easier and more practical learning data science this way opposed to learning the math then programming thanks i don't know why you've been down-voted: your answer is helpful and addresses some aspects of my post tbh whilst i'm not particularly fond of maths it's something i'd like to work on i think i might prefer it if some courses included it in the learning for the practical reasons you pointed out quick context: i have been involved in higher education research through internships graduate school and working at two community colleges over the last 10 years in that time i conducted a logistic regression as part 
of a program evaluation for my master's thesis helped determine significant factors of enrollment using logistic regression used multiple regression to create residuals that would be used in part of a process to predict enrollment for the fiscal year and employed multiple regressions as part of my path analysis in my dissertation examining factors that may impact online community college students' decision to remain enrolled in college however when one of the interviewers asked me yesterday if i had ever tested any of these models i did not really have an answer i was always focused on r-squared values for model fit as oftentimes i did not have other data available to test out the results other than what i used to create the model i am concerned that i overstated my ability to do predictive analytics and am wondering if i should remove it from my resume what do you expect someone who lists predictive analytics as an area of expertise to be able to demonstrate? one of the problems they would expect me to answer on this job would be to look at factors that predict college enrollment and to be able to test the model is my knowledge of how to conduct regression analyses going to be enough or would i need a full data science background through extended training? my passion is to use data to help tell a story to make students successful and i thought that enrollment management analyst would be a great step in my career but after the interview yesterday i am wondering if i am promoting someone i am not but i may also be overreacting as well thanks for any advice you can provide tldr: do i demonstrate the skills one would expect for predictive analytics or what do i need to develop? 
you did a lot of the work associated with predictive modeling but the one piece you're missing conceptually is validation - it is what makes your model meaningful the answer in your case would be to split your dataset into a test set and a training set (30/70 20/80 whatever) - build the model on the training set and then try to run it on the test set you can get the error rate by dividing false positives & negatives over the total number of observations you predicted in the test set it's a relatively simple way to determine how your model would perform on "new" data and an important part to do to make sure you could deploy your model a slightly more complex method is cross validation (splitting your data into like 10 chunks and building models with each of them then testing each model against the rows that it wasn't built with averaging the errors at the end) and loocv understand those methods and look up the bias-variance tradeoff and that is a solid conceptual foundation for predictive modeling don't you predict all observations in a model? how does one know how many they predicted in a regression? i get that there is an overlap between type 1 and 2 error and this is what the error rate is how is this different from looking at the difference in r-squared between two models in other words you would want no change in r squared if your hypothesis is model 1 predicts the same way as model 2 do you have any paper tutorial or r package that calculates non-predicted cases? 
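The hold-out split and the "10 chunks" cross-validation described above can be sketched in plain Python. This is only an illustration under loud assumptions: the data is synthetic, `fit_cutoff` is a made-up toy classifier standing in for a real model such as a logistic regression, and none of the function names come from the thread or from any library.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_cutoff(train):
    """Toy stand-in for a real model (hypothetical): predict class 1 when x is
    above the midpoint between the two class means seen in the training rows."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    cutoff = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > cutoff else 0

def error_rate(model, rows):
    """Wrong predictions divided by the number of rows the model was asked to predict."""
    wrong = sum(1 for x, y in rows if model(x) != y)
    return wrong / len(rows)

def k_fold_error(rows, k=10, seed=0):
    """Hold out each of k chunks once, train on the rest, and average the errors."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors.append(error_rate(fit_cutoff(train), held_out))
    return sum(errors) / k

# Synthetic data: 50 rows of class 0 centred at 0 and 50 rows of class 1 centred at 2.
rng = random.Random(42)
rows = [(rng.gauss(0, 1), 0) for _ in range(50)] + [(rng.gauss(2, 1), 1) for _ in range(50)]

train, test = train_test_split(rows, test_fraction=0.2)
holdout_error = error_rate(fit_cutoff(train), test)  # error on rows the model never saw
cv_error = k_fold_error(rows, k=10)                  # average error over 10 held-out chunks
print(len(train), len(test), holdout_error, cv_error)
```

The point of the design is that the model is never scored on rows that went into fitting it. Leave-one-out (loocv) is the same loop with `k` set to the number of rows.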
i'm a little confused by this but interested edit: it's the caret package sorry my phrasing probably didn't help make my point clear you predict all observations you give a model to predict but you don't want to give it observations to predict which went into building the model itself that's why you set some aside when you're building it then apply your model to them to see if it's classifying those observations correctly on its own the ones it got incorrect divided by the total number of observations it was given to predict can be considered your error rate e g i have 100 rows i use 80 to build a model then run it on the other 20 to see how it does on "new" data after i've built it it gets 15 correct so the error rate would be 5/20 (25%) cross-validation is just using multiple different 80/20 combinations within the dataset for testing and averaging the error to get a more accurate idea of how it's performing i could build two models on the same data and then compare the error rates that they get on predicting new data to help me select "the best" one i can see this being useful for classification cluster analysis also when the data is not normal what do you mean by correct? do you set a threshold of what is in or out? is this threshold whatever alpha is? 
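One way to make "correct" concrete for the 20 held-out rows in the example above: assume the model outputs a probability for each row, and a cutoff (0.5 here, which is a common default and not a significance level like alpha) turns it into a yes/no call. The probabilities below are invented for illustration, chosen so that 15 of 20 calls are right at a cutoff of 0.5. The function names are hypothetical.

```python
def confusion_counts(y_true, probs, threshold=0.5):
    """Count true/false positives and negatives at a given probability cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def error_rate(counts):
    """False positives plus false negatives over all predicted rows."""
    return (counts["fp"] + counts["fn"]) / sum(counts.values())

# Made-up hold-out set: 8 real positives followed by 12 real negatives.
y_true = [1] * 8 + [0] * 12
probs = [0.91, 0.85, 0.77, 0.70, 0.66, 0.62, 0.45, 0.30,
         0.05, 0.10, 0.12, 0.18, 0.22, 0.27, 0.33, 0.38, 0.41, 0.52, 0.56, 0.59]

at_half = confusion_counts(y_true, probs, threshold=0.5)
at_sixty = confusion_counts(y_true, probs, threshold=0.6)
print(at_half, error_rate(at_half))    # 15 of 20 correct, so 5/20 = 0.25
print(at_sixty, error_rate(at_sixty))  # the three false positives disappear
```

Moving the cutoff changes which kind of mistake the model makes, so which value is best depends on whether a false positive or a false negative costs more in the application.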
i'll have to think about this some more it seems to be a method when you don't know how you got the sample in front of you and need to check how much predictive power you have he referenced logistic regression which is a classification method your model spits out a value between 0 and 1 that is effectively the 'probability' of the observation being the 'goal' class (in business - this is usually applied to be the likelihood of an event happening like someone's likelihood to purchase or shed a product or default on a loan or something) in interpreting your model's predictions you can change your threshold to something other than 0.5 and see how your false positives/negatives change (e g you're trying to predict people who will default on a loan you see there were a lot of false positives - where the model classified real non-defaulters as defaulters with a default likelihood of 0.59 so you can change your threshold to 0.6 instead of 0.50 and see what that does to your false positives/negatives) in some applications false positives have more impact than false negatives so it's good to have some business intuition in the process gotcha now makes sense set a cut value based on the inputs to determine "yes" or "no" yes and no on the one hand it is clear that you have domain expertise that is huge and i think is a big plus in your favor (granted we don't know if other candidates have this experience) on the other hand it is clear that you didn't do the greatest job validating your model some ways to improve using a train/test split as u packofwildcorgis mentions it is so critical to create your model with one set of data and validate on a different set of data see https://sebastianraschka.com/blog/2016/model-evaluation-selection-part1.html for more details translating the statistical performance of a model into some actionable idea this is something like "i looked at the top 10% of student by predicted (from my model) and found that they are like 8x more likely to drop-out than the 
average student" that is something that is more meaningful to a decision than saying "my r-squared is 0.6" this comes with the caveat that anything you report should be on data that was not used to fit your model (see point #1) i'll echo what other people said don't remove predictive analytics but get better at it i'm actually really happy to read these replies i'm writing a journal article on a logistic regression i did (analytics haven't really been utilized in this field for this purpose so it's a novel approach) and i've been terrified that when i submit it for review the reviewers are going to basically go "what the fuck is this guy thinking!? he has no idea what he's doing!!!!" but i actually performed model evaluation (mcfadden's pseudo-r2 auc and 5-fold cv) and the results seemed good (for the most part the kappa on my cv was really low which i'm not happy about but everything else seems good) so i'm way more confident now lol i am concerned that i overstated my ability to do predictive analytics and am wondering if i should remove it from my resume yes you did whether you should remove it is mostly a philosophical question i was always focused on r-squared values for model fit you'd have to go out of your way specific to a problem at hand to find a worse metric for predictive modeling it's not tied to an outcome and it gives you garbage feedback i am not being facetious - go take one of your models check the r-squared add 10 variables of random noise and re-check your r-squared it got "better" i'm not trying to be overly harsh here but you don't even understand the fundamentals of the most basic model used in predictive analytics - what you'd learn your first month of your first class even tangentially related to pa edit - saying you do predictive analytics without knowing how to validate is like saying you work in finance but don't know what interest is don't take it down take a 10h course on machine learning in r or python or do some focused tutorials to 
catch up on other algorithms like random forests svm and maybe some easy keras examples and you are good to go with your existing knowledge it will be rather easy to get into these topics yes this the difference between a data scientist and a statistician is that a data scientist has an unrelated degree and a 10 hour data scientist mooc a statistician has a degree in statistics hiring managers mostly don't know the difference so take the mooc and profit christ i know man it sucks but right now the industry loves hiring 24 year old guys with social science degrees and some ds moocs as data scientists they spin up amazon instances run up huge bills create unvalidated models and generally run amok but it sure beats spending the money and effort to get a stats degree that stuff is hard i haven't found that to be the case personally most data scientists being hired at the big boys that know what they're doing have phds or ms at minimum usually in stem of some sort they experienced the problem you're talking about or at least are aware of it and now it's hard for even data science veterans to find a role due to how careful people are when selecting candidates not that you won't find work as it's pretty easy if you put in some time but you need to cast a wider net because you might get some specific question wrong in the phone screen or not impress them immediately on the specific language stack they're using or something i've been pretty successful at my job for four years at least judging by feedback (i don't mean to sound arrogant i am not if you get to know me) but even i have problems when looking for new jobs i don't have many examples of that i know of personally but the one data scientist i know that has an economics degree and only a bs actually knows his stats pretty well i wouldn't write him off just because he majored in a soft science and doesn't have a graduate education in this case he made up for that by showing off what he can do that being said i have 
noticed some consulting startups call their staff "data scientists" but they actually are more on the data hacker side of things perhaps using bootcamps and the like sometimes there is title inflation in that sort of field since they look to impress potential clients where do you live? this is definitely not the case in the south (us) from my experience the data scientists here with social science degrees are heavy on matching (propensity cem) experimental design and submitting papers for peer review; which i think is completely appropriate if they have a solid stats background the only peers i meet that are doing any significant ml all have hard science or math degrees we hire lots of ss people as we do a lot of path analysis causal diagrams matching etc for 'causal inference' with observational data it's definitely more stats work but everyone has the job title of data scientist the core ds folks tend to be phd's in physics stats or cs also don't forget to bring a macbook on the interview don't expect anybody to take you seriously without a macbook he said he wants to put "predictive analytics" on his cv not work as a data scientist which incorporates far more than just predictive analytics with your existing knowledge it will be rather easy to get into these topics he doesn't understand cross validation he doesn't understand basic pa metrics this is one of the worst responses i've seen on this forum anyone upvoting this is doing so because they're in a similar spot and are suffering from wishful thinking that they can be a predictive analytics person after a 10 hr course but hey you're welcome to "fake it til you make it" if you're comfortable with that ethically yes you overstated your ability "i used logistic regression and multiple regression and know what an r-squared is" is very very far from a predictive analytics expert this is hs level stats also: r-squared is not an appropriate metric for evaluating your model in a prediction setting read up on the 
topics of "overfitting" and "cross-validation" did i overpromote my abilities by listing predictive analytics as an area of expertise? only time will tell /s i think the disconnect between your skill statement and the definition of 'predictive analytics' would be easy enough to bridge if you spend some time learning how you'd go from logistic as a technique to better more nuanced classification techniques such as svm that's the usual flow when it comes to working in such engagements the tools could be r python or even alteryx so i'd say you are okay granted you spend some time on those techniques and some basic ml shouldn't you already know? i expect anyone who puts "predictive modelling" on their resume to have at least the knowledge covered in the book "introduction to statistical learning" i'd add applied predictive modeling to the list it's far more focused on "here's a problem here's what we tried and here's what worked and how we know it did " by far the best pa book imo
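The "add 10 variables of random noise and re-check your r-squared" experiment suggested earlier in the thread is easy to try. Below is a rough sketch that assumes an ordinary least squares fit with an intercept on synthetic data. The helper names are made up for illustration, and the small hand-rolled solver only exists to keep the example free of third-party libraries.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(features, y):
    """In-sample r-squared of a least squares fit, scored on the rows used to fit it."""
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(1)
n = 60
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]  # one real predictor plus noise

base = [[xi] for xi in x]
noisy = [[xi] + [rng.gauss(0, 1) for _ in range(10)] for xi in x]  # add 10 junk columns

r2_base = r_squared(base, y)
r2_noisy = r_squared(noisy, y)
print(r2_base, r2_noisy)  # the second value does not come out lower than the first
```

In-sample r-squared cannot go down when columns are added to a least squares model, which is why it says little about how the model will do on new data. Scoring on held-out rows is the fairer check.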