# Data processing in Python In this assignment, we will ask you to elaborate on the code we have started developing in these [challenges](https://ocademy-ai.github.io/machine-learning/data-science/working-with-data/numpy.html#your-turn). The assignment consists of two parts: ## COVID-19 spread modeling - [ ] Plot $R_t$ graphs for 5-6 different countries on one plot for comparison, or using several plots side-by-side - [ ] See how the number of deaths and recoveries correlate with a number of infected cases. - [ ] Find out how long a typical disease lasts by visually correlating infection rate and death rate and looking for some anomalies. You may need to look at different countries to find that out. - [ ] Calculate the fatality rate and how it changes over time. *You may want to take into account the length of the disease in days to shift one time series before doing calculations* ## COVID-19 papers analysis - [ ] Build a co-occurrence matrix of different medications, and see which medications often occur together (i.e. mentioned in one abstract). You can modify the code for building a co-occurrence matrix for medications and diagnoses. - [ ] Visualize this matrix using a heatmap. - [ ] As a stretch goal, visualize the co-occurrence of medications using the [chord diagram](https://en.wikipedia.org/wiki/Chord_diagram). [This library](https://pypi.org/project/chord/) may help you draw a chord diagram. - [ ] As another stretch goal, extract dosages of different medications (such as **400mg** in *taking 400mg of chloroquine daily*) using regular expressions and build dataframe that shows different dosages for different medications. **Note**: consider numeric values that are in close textual vicinity of the medicine name. ## Rubric Exemplary | Adequate | Needs Improvement --- | --- | -- | All tasks are complete, graphically illustrated and explained, including at least one of two stretch goals | More than 5 tasks are complete, no stretch goals are attempted, or the results are not clear | Less than 5 (but more than 3) tasks are complete, visualizations do not help to demonstrate the point ## Acknowledgments Thanks to Microsoft for creating the open-source course [Data Science for Beginners](https://github.com/microsoft/Data-Science-For-Beginners). It inspires the majority of the content in this chapter.