{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Final Project: Big Data Science" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Project Overview" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In the final project, we hope you will think of yourselves as real-world data scientists. Your goal is to come up with some interesting questions, find the right datasets, and implement a data-science pipeline to answer those questions. To achieve this, please complete the following steps:\n", "\n", "1. Form a data-science team of **3-5** people (the same group as for your blog post). \n", "2. Pick a project idea and write a proposal\n", "3. Give a 5-minute milestone presentation\n", "4. Present your project in the poster session\n", "5. Submit your code, video, and report \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## To-Do List" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following table summarizes the to-do list for the final project. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "ID | What | When | Where\n", "-- | ------------------------ | ------------------------- | ------------------\n", "1 | Proposal | Thu 02/16 at 11:59 PM | Submit the completed [form](https://1sfu-my.sharepoint.com/:w:/r/personal/sbergner_sfu_ca/Documents/Teaching/cmpt-733/733-sp23/Proposal.docx?d=w1c8198e98ded4d14aae57b65d170cc37&csf=1&web=1&e=EvHgCC) to CourSys\n", "2 | Milestone | Thu 03/09 at 9:30 AM | Submit your slides and presentation video to CourSys\n", "3 | Poster Presentation | Tue 04/11 at 8:00 AM<br>Tue 04/11 at 10:45 AM | Submit your poster to CourSys<br>Present your project in ASB 10900\n", "4 | Final Report | Wed 04/12 at 11:59 PM | Submit your video and report to CourSys\n", " \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Project Ideas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To judge whether your project idea is a good one, ask yourself the following three questions:\n", "\n", "1. Is it important? (i.e., what impact can your project make?)\n", "2. Is it challenging? (i.e., would a naive solution fall short?)\n", "3. What can I learn by doing the project? (e.g., new tools, new techniques, new domain knowledge, new methodologies) \n", "\n", "A good project is important and challenging, and it pushes you to learn something you did not know before. \n", "\n", "Note that you need to conduct a *deep analysis* of the data. By deep analysis, I mean that you think carefully about your analysis results and report **insightful and reliable findings**. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a list of projects from previous years. I do not recommend choosing from this list, since these projects have already been done by former students. \n", "\n", "0. [Machine learning based surveillance system using transfer learning for rare diseases (2021)](./Machine-learning-based-surveillance-system-using-transfer-learning-for-rare-diseases.pdf)\n", "1. [Automatic Hierarchical Time-Series Forecast at Different Aggregation Levels for Fashion Products (2020)](./Auto_Forecast_Hierarchical.pdf)\n", "2. [Elevator Anomaly Detection System (2020)](./Elevator-Anomaly-Detection-System.pdf)\n", "3. [Explain Data and Interpretable Machine Learning (2020)](./Explain-Data-and-Interpretable-Machine-Learning.pdf)\n", "4. [Explore the Impact of Weather on Short-time Demand Forecast for Fashion Retailers (2020)](./Weather_Enriched_Short_Term_Fashion_Forecast.pdf)\n", "5. [Incident Social Listening (2020)](./Incident-Social-Listening.pdf)\n", "6. 
[Machine Learning Applied to Web Scraping (2020)](./Machine-Learning-Applied-to-Web-Scraping.pdf)\n", "7. [Medical Language Understanding (2020)](./Medical-Language-Understanding.pdf)\n", "8. [Model Fairness & Transparency (2020)](./Model-Fairness-&-Transparency.pdf)\n", "9. [Object Detection in X-Ray Images (2020)](./Object-Detection-in-X-Ray-Images.pdf)\n", "\n", "10. [Analyzing Social media user interaction (2019)](./Analyzing-Social-media-user-interaction.pdf)\n", "11. [LTF - Big Data Financial Analysis (2019)](./LTF-Big-Data-Financial-Analysis.pdf)\n", "12. [Measuring Observable Influence and Impact of Scientific Research Beyond Academia (2019)](./Measuring-observable-influence-and-impact-of-scientific-research-beyond-academia.pdf)\n", "13. [A prototype Canadian Natural Hazards Database (2018)](./A-prototype-Canadian-Natural-Hazards-Database.pdf)\n", "14. [Automated Feature Detection of Aerial Imagery from South Pacific (2018)](./Automated-Feature-Detection-of-Aerial-Imagery-from-South-Pacific.pdf)\n", "15. [Fall Detection using wearable sensor data (2018)](./Fall-Detection-using-wearable-sensor-data.pdf)\n", "16. [Machine learning to detect misstated financial statements (2018)](./Machine-learning-to-detect-misstated-financial-statements.pdf)\n", "17. [Predicting Soccer games and tournaments (2018)](./Predicting-Soccer-games-and-tournaments.pdf)\n", "18. [Predictive Maintenance on IOT devices (2018)](./Predictive-Maintenance-on-IOT-devices.pdf)\n", "19. [Property value prediction with market data (2018)](./Property-value-prediction-with-market-data.pdf)\n", "20. [Topic modeling and visualization of news comments (2018)](./Topic-modeling-and-visualization-of-news-comments.pdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step-by-step Instructions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 
Proposal (5 points)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "* Download the Initial Plan [form template](https://1sfu-my.sharepoint.com/:w:/r/personal/sbergner_sfu_ca/Documents/Teaching/cmpt-733/733-sp23/Proposal.docx?d=w1c8198e98ded4d14aae57b65d170cc37&csf=1&web=1&e=EvHgCC) \n", "* Submit the completed form to CourSys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Milestone Presentation (10 points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Communication skills are very important for data scientists. Please use this opportunity to practice them.\n", "\n", "You can think of this presentation as a midterm report on your project. Your presentation should consist of three parts:\n", "\n", "1. `Motivation (2)`\n", " * Why is it an important project? (1)\n", " * Why is it challenging? (1)\n", " \n", "2. `Progress Report (2)` \n", " * What have you done so far? (1) Provide evidence of your progress (e.g., screenshots of your repo and commits, an initial demo).\n", " * Is it on schedule? (1) Show the entire project schedule and point out where you are. \n", " \n", "3. `Future Work (2)` \n", " * What do you plan to do next? (1) Show a detailed schedule for the remaining part of the project.\n", " * How will you mitigate risks? (1) Discuss any risks to completing the project on time. \n", "\n", "\n", "Imagine that your manager, who knows little about the technical side of data science, is sitting in the audience. You need to explain your complex project to them in a simple way and get them excited about it. \n", "\n", "* `Did you convey complex information in a simple way? (2)`\n", "* `Did you excite and motivate the audience? (2)`\n", "\n", "\n", "Search \"how to give a good talk\" on Google; you will find a lot of good advice. Use it to improve your presentation. 
\n", "\n", "**Submission**\n", "\n", "* Record a video of your presentation (in a format similar to [Example 1](https://www.youtube.com/watch?v=slVJ6vCAlEI) and [Example 2](https://www.youtube.com/watch?v=WHZEmDe2IAE)). \n", "\n", "* The video should be between 4 and 5 minutes long. \n", "\n", "* Submit your slides (PPT) and video (YouTube URL) to CourSys. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Poster Session (20 points)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is showtime! Make a poster to present your data product. Please make your poster look as professional as possible. Here are a few things you can put on the poster (10 points):\n", "\n", "* Why did you do this project?\n", "\n", "* What questions are you trying to answer? \n", "\n", "* What methodology did you use to get the answers? \n", "\n", "* What datasets/tools did you use? \n", "\n", "* What does your data-science pipeline look like?\n", "\n", "* Why is your solution good? Why do your results make sense?\n", "\n", "* What is your data product?\n", "\n", "* What did you learn through the project? \n", "\n", "* What would you do if you had more time? \n", "\n", "Design tips:\n", "* Guide to effective academic posters: https://www.brightcarbon.com/blog/effective-academic-posters-powerpoint/\n", "* Use high-quality images that help draw attention and convey important information.\n", "* Keep it simple and uncluttered: use white space, limit text and images, and avoid distracting design elements.\n", "* Use a clear and concise title to convey the topic of your project.\n", "* Organize your content logically: your poster should be easy to follow and understand. 
Use headings, subheadings, and bullet points to guide the reader.\n", "* Use a hierarchy of font sizes and a font that is easy to read from a distance; sans-serif works well.\n", "\n", "During the poster session (10 points), you will have 5 minutes to present your poster, after which TAs and instructors will ask a few questions.\n", "\n", "**Submission**\n", "\n", "The poster session is scheduled for Tuesday, April 11, 2023. Please upload your poster to CourSys before 8:00 AM." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Code, Report, and Video (30 points)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Code (10 points)**\n", "\n", "As in CMPT 732, you must use a Git repository for your project. The department's [GitLab](https://coursys.sfu.ca/2019fa-cmpt-732-g1/pages/GitLab) server is a good way to get one (see the instructions at that link). Group members must commit their own contributions to the repo. You are encouraged to publicize and open-source your work on [GitHub](https://github.com/) or a similar platform. \n", "\n", "In your repository, please include a README.txt (or README.md, if you prefer) file explaining how we can test your project, along with notes on anything else we should look for. If you created a web frontend, please include its URL in the README as well." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Report (10 points)**\n", "\n", "You need to submit a report giving an overview of your project. The report should be at least 2,500 words long and follow this structure:\n", "\n", "* `Project Title:` Come up with an attractive project title (see this [page](https://blog.hubspot.com/marketing/a-simple-formula-for-writing-kick-ass-titles-ht#sm.0000072w4nzrqmfn2vwd6y30afmyo) for some tips).\n", "* `Motivation and Background:` Who cares about this project? Is there any related work? \n", "* `Problem Statement:` What questions do you want to answer? 
Why are they challenging? \n", "* `Data Science Pipeline:` What does your data-science pipeline look like? Describe each component in detail.\n", "* `Methodology:` What tools or analysis methods did you use? Why did you choose them? How did you apply them to tackle each problem? \n", "* `Evaluation:` Why is your solution good? Why do your results make sense? \n", "* `Data Product:` What is your data product? Please demonstrate how it works.\n", "* `Lessons Learnt:` What did you learn from this project? \n", "* `Summary:` A high-level summary of your project. It should be self-contained and cover all the important aspects of your project.\n", "\n", "Please choose A or B:\n", "\n", "* [A]. Write your report on Medium and publish it in the [SFU Big Data Science Publication](https://medium.com/sfu-cspmp/tagged/big-data)\n", "* [B]. Write your report as a PDF and submit it to CourSys" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Video (10 points)**\n", "\n", "Please make an attractive video to introduce your project. Here are some requirements:\n", "\n", "1. The video should ideally be about 3 minutes long to hold viewers' attention, but it can be up to 5 minutes.\n", "2. Explain why this is an important project.\n", "3. List the questions you want to answer and the datasets you collected.\n", "4. Give a high-level overview of how you used data science skills to answer those questions. \n", "5. Present the conclusions of your project.\n", "6. Put your contact information at the end of the video.\n", "\n", "You can draw some inspiration from [KDD 2017 Promotional Videos](https://www.youtube.com/watch?v=wSMaDSQjCG4&list=PLliTSxmRFGVPkvUZb3Q-DzvOgs20LOinA), [KDD 2018 Promotional Videos](https://www.youtube.com/watch?v=7hOxFSxdS3k&list=PLQKvj098sSkomub2IQ78cq8AHU692mIK-), [2018 Project Showcase](https://sfu-db.github.io/bigdata-cmpt733/final-project-sp18.html), and [2019 Project Showcase](https://sfu-db.github.io/bigdata-cmpt733/final-project-sp19.html)." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Submission**\n", "\n", "We will create a web page on the course website to showcase your projects. For each project, the page will list the project title, a project summary, and three URLs linking to your codebase, video, and final report. Please submit your project title, project summary, final report (Medium URL or PDF), code (GitHub/GitLab URL), and video (YouTube URL) to CourSys.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0 (main, Nov 15 2022, 05:43:36) [Clang 14.0.0 (clang-1400.0.29.202)]" }, "vscode": { "interpreter": { "hash": "1a1af0ee75eeea9e2e1ee996c87e7a2b11a0bebd85af04bb136d915cefc0abce" } } }, "nbformat": 4, "nbformat_minor": 1 }