# Project Write-up

## Project Overview:

The United States has been shaped by immigration ever since the Pilgrims arrived on the East Coast in the 17th century, and place names such as city names record which cultures influenced which areas. The aim of this project is to uncover the patterns behind US city names in terms of the language origin of each name, and to connect those patterns with the immigration history of the cultures behind those languages, to see how American history is reflected in city names. The data comes from a [world cities dataset](https://www.kaggle.com/datasets/viswanathanc/world-cities-datasets/data) that lists cities across the world. For simplicity, we transform it to include only US cities that share a name with a city in another country (see details [here](https://github.com/fjiang316/dsc106-project3/blob/main/project3_dataset_notebook.ipynb)).

## Graph Decision:

Since we are comparing the geographical patterns of how cultures with different languages influenced US city names, the most effective visualization is a geographical map that shows the positional trend with dots. For clarity, we included checkboxes for some popular cultures on the right side of the graph, so that readers can first view the pattern of one language influence at a time and see its trend clearly. After that, they can select multiple checkboxes to compare patterns. Whenever a checkbox is checked, points for cities whose names do not originate from that culture disappear from the graph, which reduces noise that would otherwise obscure the pattern.
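
The checkbox behavior described above comes down to a simple selection rule, sketched here in Python purely to illustrate the logic; it is not the site's actual code. The `CityPoint` type, the `origin` field, the `visible_points` function, and the sample rows are hypothetical, and the sketch assumes that every point stays visible when no box is checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityPoint:
    name: str
    state: str
    origin: str  # hypothetical label for the language origin of the name


def visible_points(points, checked_origins):
    """Return the points that stay on the map for the checked boxes."""
    # Assumption: with no box checked, nothing is filtered out.
    if not checked_origins:
        return list(points)
    # Otherwise keep only points whose origin matches a checked box.
    return [p for p in points if p.origin in checked_origins]


# Made-up sample rows, for illustration only.
points = [
    CityPoint("Paris", "Texas", "French"),
    CityPoint("Toledo", "Ohio", "Spanish"),
    CityPoint("Berlin", "New Hampshire", "German"),
]

only_french = visible_points(points, {"French"})
french_and_german = visible_points(points, {"French", "German"})
```

Checking a single box isolates one pattern, and checking several boxes takes the union of their points, which is what makes the side-by-side comparison possible.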

For easier interpretation, we also provide descriptions at the bottom: whenever a box is checked, we show some context about the selected culture to help readers interpret the pattern they see. For a more detailed view of each city, we added a tooltip interaction: when readers hover over a point, the city name, the state, and the names of foreign countries that have a city of the same name appear in a rectangular box, which adds credibility to the plot and gives the reader more insight. Since there are many points, the hovered point also turns green, making it clear which city the reader is currently inspecting. Because many points are clustered in the East and West, we also added zooming, so that readers can zoom in on any part of the map and drag to see the cities in detail.

## Other Alternatives Considered:

We also considered a bar chart with one bar per language origin showing the number of city names from that origin, but that would lose the geographical pattern of the data, which is an important point the graph tries to convey. A geographical map with a dot at each city's location preserves both the counts (as clusters of points) and the geographical trend, which says more about the direction in which certain cultures spread, and it lets us include details about each city. For the map, we also thought about using a world map to highlight both the foreign countries and the US cities, but that would make readers lose focus, and the cities would appear too small to see on the full plot, so focusing on the US map alone seemed to be the best choice.
## Development Process:

During development, both members contributed ideas about the dataset and about ways to present the visualization with helpful elements. Specifically, Evelyn contributed the map loading and zoom effect, the dots representing cities with their hover text, and the description text for the languages. Feiyang contributed the dataset transformation, the checkbox effect on the color and visibility of the dots, and the deployment of the website with the write-up included. Each member spent about 10 hours on the entire development process. Development started with a rough idea for a checkbox interaction plot and gradually gained more interaction elements to better serve the purpose of the plot. The most difficult part was adding the interactions, which also took up the most time in the entire process.
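
For reference, the dataset transformation mentioned in this write-up (keeping only US cities that share a name with a city in another country) can be sketched in Python as follows. This is a minimal illustration and not the notebook's actual code: the keys `city`, `state`, and `country`, the `"United States"` label, the function name, and the sample rows are all assumptions.

```python
from collections import defaultdict


def us_cities_with_foreign_namesakes(rows, us_label="United States"):
    """Keep US cities whose name also appears in at least one other country.

    `rows` is an iterable of dicts with the hypothetical keys
    "city", "state", and "country".
    """
    rows = list(rows)

    # Map each city name to the set of non-US countries that use it.
    foreign = defaultdict(set)
    for row in rows:
        if row["country"] != us_label:
            foreign[row["city"]].add(row["country"])

    # Keep US rows with at least one foreign namesake, and attach the
    # countries so they can be shown in a tooltip.
    result = []
    for row in rows:
        if row["country"] == us_label and row["city"] in foreign:
            result.append({
                "city": row["city"],
                "state": row["state"],
                "foreign_countries": sorted(foreign[row["city"]]),
            })
    return result


# Made-up sample rows, for illustration only.
sample_rows = [
    {"city": "Paris", "state": "Ile-de-France", "country": "France"},
    {"city": "Paris", "state": "Texas", "country": "United States"},
    {"city": "Toledo", "state": "Castilla-La Mancha", "country": "Spain"},
    {"city": "Toledo", "state": "Cebu", "country": "Philippines"},
    {"city": "Toledo", "state": "Ohio", "country": "United States"},
    {"city": "Boise", "state": "Idaho", "country": "United States"},
]

shared = us_cities_with_foreign_namesakes(sample_rows)
```

In this sample, Paris and Toledo are kept because other countries have cities with the same names, while Boise is dropped because no foreign row matches it. Each kept row carries the fields that the tooltip displays: city name, state, and the foreign countries with a same-named city.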