{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring 67 years of LEGO\n", "> The Rebrickable database includes data on every LEGO set that ever been sold; the names of the sets, what bricks they contain, what color the bricks are, etc. It might be small bricks, but this is big data! In this project, you will get to explore the Rebrickable database. To do this you need to know your way around pandas dataframes This is the Result of Project \"Exploring 67 years of LEGO\", via datacamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Data_Science]\n", "- image: images/lego-bricks.jpeg" ] }, { "cell_type": "markdown", "metadata": { "dc": { "key": "1d0b086e6c" }, "deletable": false, "run_control": { "frozen": true }, "tags": [ "context" ] }, "source": [ "## 1. Introduction\n", "

Everyone loves Lego (unless you ever stepped on one). Did you know by the way that \"Lego\" was derived from the Danish phrase leg godt, which means \"play well\"? Unless you speak Danish, probably not.

\n", "

In this project, we will analyze a fascinating dataset on every single lego block that has ever been built!

\n", "\n", "![lego](image/lego-bricks.jpeg)" ] }, { "cell_type": "markdown", "metadata": { "dc": { "key": "044b2cef41" }, "deletable": false, "run_control": { "frozen": true }, "tags": [ "context" ] }, "source": [ "## 2. Reading Data\n", "

A comprehensive database of lego blocks is provided by Rebrickable. The data is available as csv files and the schema is shown below.

\n", "\n", "![schema](image/downloads_schema.png)\n", "\n", "

Let us start by reading in the colors data to get a sense of the diversity of lego sets!

" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "dc": { "key": "044b2cef41" }, "tags": [ "sample_code" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnamergbis_trans
0-1Unknown0033B2f
10Black05131Df
21Blue0055BFf
32Green237841f
43Dark Turquoise008F9Bf
\n", "
" ], "text/plain": [ " id name rgb is_trans\n", "0 -1 Unknown 0033B2 f\n", "1 0 Black 05131D f\n", "2 1 Blue 0055BF f\n", "3 2 Green 237841 f\n", "4 3 Dark Turquoise 008F9B f" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import modules\n", "import pandas as pd\n", "\n", "# Read colors data\n", "colors = pd.read_csv('dataset/colors.csv')\n", "\n", "# Print the first few rows\n", "colors.head()" ] }, { "cell_type": "markdown", "metadata": { "dc": { "key": "15c1e2ce38" }, "deletable": false, "editable": false, "run_control": { "frozen": true }, "tags": [ "context" ] }, "source": [ "## 3. Exploring Colors\n", "

Now that we have read the colors data, we can start exploring it! Let us start by understanding the number of colors available.

" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "dc": { "key": "15c1e2ce38" }, "tags": [ "sample_code" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "135\n" ] } ], "source": [ "# How many distinct colors are available?\n", "num_colors = len(colors.name.unique())\n", "print(num_colors)" ] }, { "cell_type": "markdown", "metadata": { "dc": { "key": "a5723ae5c2" }, "deletable": false, "editable": false, "run_control": { "frozen": true }, "tags": [ "context" ] }, "source": [ "## 4. Transparent Colors in Lego Sets\n", "

The colors data has a column named is_trans that indicates whether a color is transparent or not. It would be interesting to explore the distribution of transparent vs. non-transparent colors.

" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "dc": { "key": "a5723ae5c2" }, "tags": [ "sample_code" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnamergb
is_trans
f107107107
t282828
\n", "
" ], "text/plain": [ " id name rgb\n", "is_trans \n", "f 107 107 107\n", "t 28 28 28" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# colors_summary: Distribution of colors based on transparency\n", "colors_summary = colors.groupby('is_trans').count()\n", "colors_summary" ] }, { "cell_type": "markdown", "metadata": { "dc": { "key": "c9d0e58653" }, "deletable": false, "run_control": { "frozen": true }, "tags": [ "context" ] }, "source": [ "## 5. Explore Lego Sets\n", "

Another interesting dataset available in this database is the sets data. It contains a comprehensive list of sets over the years and the number of parts that each of these sets contained.

\n", "\n", "![sample](image/1k4PoXs.png)\n", "\n", "

Let us use this data to explore how the average number of parts in Lego sets has varied over the years.

" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "dc": { "key": "c9d0e58653" }, "tags": [ "sample_code" ] }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD4CAYAAADrRI2NAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAARnklEQVR4nO3deZBlZX3G8e8joKAS0UyrFNAOWEgklAtpjRW3CGoQVDRlEqiYELdJuZvE6BhSav6wCo17aamjTsANF8QlIUZwpVKl4IAgICKoEx0hzhBKcQXRX/64Z7Rtu3tuN3Pumdvv91PV1ee89/R9f++cmWfePvcsqSokSe24zdAFSJImy+CXpMYY/JLUGINfkhpj8EtSY/YeuoBxrFu3rtavXz90GZI0VS666KLrq2pmYftUBP/69evZsmXL0GVI0lRJ8j+LtXuoR5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDWmt+BPsjnJ9iSXL2h/bpKrklyR5FV99S9JWlyfM/7TgePmNyR5BHAicJ+q+n3g1T32L0laRG/BX1XnAzcsaH4mcFpV3dRts72v/iVJi5v0lbv3Ah6a5BXAz4AXVtWXFtswyQZgA8Ds7OzkKtSqrd94zmB9bz3thMH6lqbNpD/c3Ru4M/Ag4B+BDybJYhtW1aaqmququZmZ37rVhCRplSYd/NuAs2vkQuCXwLoJ1yBJTZt08H8UOAYgyb2A2wLXT7gGSWpab8f4k5wJ/DGwLsk24GXAZmBzd4rnzcAp5dPeJWmiegv+qjp5iZee3FefkqRd88pdSWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1Jjegv+JJuTbO+etrXwtRcmqSQ+b1eSJqzPGf/pwHELG5McAjwK+HaPfUuSltBb8FfV+cANi7z0OuBFgM/alaQBTPQYf5LHA9+tqksn2a8k6dd6e9j6QkluD5wKPHrM7TcAGwBmZ2d7rEyS2jLJGf89gUOBS5NsBQ4GLk5y98U2rqpNVTVXVXMzMzMTLFOS1raJzfir6jLgrjvXu/Cfq6rrJ1WDJKnf0znPBL4AHJFkW5Kn9dWXJGl8vc34q+rkXby+vq++JUlL88pdSWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5Jakyfj17cnGR7ksvntf1rkq8l+UqSjyQ5oK/+JUmL63PGfzpw3IK284Cjquo+wNeBl/TYvyRpEb0Ff1WdD9ywoO3cqrqlW/0icHBf/UuSFjfkMf6nAp9Y6sUkG5JsSbJlx44dEyxLkta2QYI/yanALcB7l9qmqjZV1VxVzc3MzEyuOEla4/aedIdJTgEeCxxbVTXp/iWpdRMN/iTHAS8GHl5VP5lk35KkkT5P5zwT+AJwRJJtSZ4GvAnYHzgvySVJ3tpX/5KkxfU246+qkxdpfmdf/UmSxuOVu5LUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNWbit2yQ+rB+4zmD9Lv1tBMG6Ve6NZzxS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSY8YK/iRHrfSNk2xOsj3J5fPa7pLkvCRXd9/vvNL3lSTdOuPO+N+a5MIkz0pywJg/czpw3IK2jcCnq+pw4NPduiRpgsYK/qp6CPCXwCHAliTvS/KoXfzM+cANC5pPBM7ols8AnrCyciVJt9bYx/ir6mrgn4EXAw8H3pjka0n+dAX93a2qruve7zrgrkttmGRDki1JtuzYsWMFXUiSljPuMf77JHkdcCVwDPC4qrp3t/y6Pgqrqk1VNVdVczMzM310IUlNGnfG/ybgYuC+VfXsqroYoKquZfRbwLi+l+RAgO779pUUK0m69cYN/uOB91XVTwGS3CbJ7QGq6t0r6O/jwCnd8inAx1bws5Kk3WDc4P8UsN+89dt3bUtKcibwBeCIJNuSPA04DXhUkquBR3XrkqQJGvcJXPtW1Y92rlTVj3bO+JdSVScv8dKx4xYnSdr9xp3x/zjJ0TtXkvwB8NN+SpIk9WncGf8LgA8lubZbPxD4i35KkiT1aazgr6ovJfk94AggwNeq6ue9ViZJ6sW4M36ABwDru5+5fxKq6l29VCVJ6s1YwZ/k3cA9gUuAX3TNBRj8kjRlxp3xzwFHVlX1WYwkqX/jntVzOXD3PguRJE3GuDP+dcBXk1wI3LSzsaoe30tVkqTejBv8L++zCEnS5Ix7Oufnk9wDOLyqPtVdtbtXv6VJkvow7m2ZnwGcBbytazoI+GhfRUmS+jPuh7vPBh4M3Ai/eijLkg9RkSTtucYN/puq6uadK0n2ZnQevyRpyowb/J9P8k/Aft2zdj8E/Ht/ZUmS+jJu8G8EdgCXAX8L/Ccre/KWJGkPMe5ZPb8E3t59SZKm2Lj36vkWixzTr6rDdntFkqRereRePTvtC/wZcJfVdprk74CnM/rP5DLgKVX1s9W+nyRpfGMd46+q/5v39d2qej1wzGo6THIQ8DxgrqqOYnQh2EmreS9J0sqNe6jn6Hmrt2H0G8D+t7Lf/ZL8nNGD26/dxfaSpN1k3EM9r5m3fAuwFfjz1XRYVd9N8mrg24ye23tuVZ27cLskG4ANALOzs6vpqlnrN54zdAnNGPLPeutpJwzWt6bbuGf1PGJ3dZjkzsCJwKHA9xk9y/fJVfWeBX1uAjYBzM3NebGYJO0m4x7q+fvlXq+q166gz0cC36qqHd17nw38EfCeZX9KkrRbrOSsngcAH+/WHwecD3xnFX1+G3hQd4fPnwLHAltW8T6SpFVYyYNYjq6qHwIkeTnwoap6+ko7rKoLkpwFXMzo84Iv0x3SkST1b9zgnwVunrd+M7B+tZ1W1cuAl6325yVJqzdu8L8buDDJRxhddPVE4F29VSVJ6s24Z/W8IskngId2TU+pqi/3V5YkqS/j3p0TRhda3VhVbwC2JTm0p5okST0a99GLLwNeDLyka9oHT7+UpKk07oz/icDjgR8DVNW13LpbNkiSBjJu8N9cVUV3a+Ykd+ivJElSn8YN/g8meRtwQJJnAJ/Ch7JI0lQa96yeV3fP2r0ROAJ4aVWd12tlu4k30ZKk37TL4E+yF/DJqnokMBVhL0la2i4P9VTVL4CfJLnTBOqRJPVs3Ct3fwZcluQ8ujN7AKrqeb1UJUnqzbjBf073JUmacssGf5LZqvp2VZ0xqYIkSf3a1TH+j+5cSPLhnmuRJE3AroI/85YP67MQSdJk7Cr4a4llSdKU2tWHu/dNciOjmf9+3TLdelXV7/RanSRpt1s2+Ktqrz46TXIA8A7gKEa/STy1qr7QR1+SpN807umcu9sbgP+qqicluS2je/1LkiZg4sGf5HeAhwF/A1BVN/Obz/OVJPVoiBn/YcAO4N+S3Be4CHh+Vf14/kZJNgAbAGZnZydepLSnG+oGhN58cPqt5NGLu8vewNHAW6rq/oxuAbFx4UZVtamq5qpqbmZmZtI1StKaNUTwbwO2VdUF3fpZjP4jkCRNwMSDv6r+F/hOkiO6pmOBr066Dklq1VBn9TwXeG93Rs83gacMVIckNWeQ4K+qS4C5IfqWpNYNcYxfkjQgg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1ZqgLuJow1E20JGk5zvglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxgwV/kr2SfDnJfwxVgyS1aMgZ//OBKwfsX5KaNEjwJzkYOAF4xxD9S1LLhprxvx54EfDLpTZIsiHJliRbduzYMbnKJGmNm3jwJ3kssL2qLlpuu6raVFVzVTU3MzMzoeokae0bYsb/YODxSbYC7weOSfKeAeqQpCZNPPir6iVVdXBVrQdOAj5TVU+edB2S1CrP45ekxgz6BK6q+hzwuSFrkKTWOOOXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxkw8+JMckuSzSa5MckWS50+6Bklq2RCPXrwF+IequjjJ/sBFSc6rqq8OUIskNWfiM/6quq6qLu6WfwhcCRw06TokqVWDPmw9yXrg/sAFi7y2AdgAMDs7O9G6JC1t/cZzhi5h4raedsLQJexWg324m+SOwIeBF1TVjQtfr6pNVTVXVXMzMzOTL1CS1qhBgj/JPoxC/71VdfYQNUhSq4Y4qyfAO4Erq+q1k+5fklo3xIz/wcBfAcckuaT7On6AOiSpSRP/cLeq/hvIpPuVJI145a4kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0Z9CZtkjQNhrwxXR83iHPGL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjRnqYevHJbkqyTVJNg5RgyS1aoiHre8FvBl4DHAkcHKSIyddhyS1aogZ/wOBa6rqm1V1M/B+4MQB6pCkJg1xk7aDgO/MW98G/OHCjZJsADZ0qz9KctWCTdYB1/dS4TAcz55vrY1prY0H1t6Y1uWVt2o891iscYjgzyJt9VsNVZuATUu+SbKlquZ2Z2FDcjx7vrU2prU2Hlh7Y+prPEMc6tkGHDJv/WDg2gHqkKQmDRH8XwIOT3JoktsCJwEfH6AOSWrSxA/1VNUtSZ4DfBLYC9hcVVes4q2WPAw0pRzPnm+tjWmtjQfW3ph6GU+qfuvwuiRpDfPKXUlqjMEvSY2ZuuBfK7d7SLI1yWVJLkmypWu7S5Lzklzdfb/z0HUuJcnmJNuTXD6vbdH6M/LGbp99JcnRw1W+tCXG9PIk3+320yVJjp/32ku6MV2V5E+GqXppSQ5J8tkkVya5Isnzu/ap3E/LjGea99G+SS5Mcmk3pn/p2g9NckG3jz7QnQhDktt169d0r69fVcdVNTVfjD4M/gZwGHBb4FLgyKHrWuVYtgLrFrS9CtjYLW8EXjl0ncvU/zDgaODyXdUPHA98gtE1HA8CLhi6/hWM6eXACxfZ9sju79/tgEO7v5d7DT2GBTUeCBzdLe8PfL2reyr30zLjmeZ9FOCO3fI+wAXdn/0HgZO69rcCz+yWnwW8tVs+CfjAavqdthn/Wr/dw4nAGd3yGcATBqxlWVV1PnDDgual6j8ReFeNfBE4IMmBk6l0fEuMaSknAu+vqpuq6lvANYz+fu4xquq6qrq4W/4hcCWjK+encj8tM56lTMM+qqr6Ube6T/dVwDHAWV37wn20c9+dBRybZLGLYpc1bcG/2O0eltvxe7ICzk1yUXd7CoC7VdV1MPpLDtx1sOpWZ6n6p32/Pac79LF53uG3qRpTd0jg/oxmlFO/nxaMB6Z4HyXZK8klwHbgPEa/mXy/qm7pNplf96/G1L3+A+B3V9rntAX/WLd7mBIPrqqjGd2l9NlJHjZ0QT2a5v32FuCewP2A64DXdO1TM6YkdwQ+DLygqm5cbtNF2va4MS0ynqneR1X1i6q6H6O7GDwQuPdim3Xfd8uYpi3418ztHqrq2u77duAjjHb493b+at193z5chauyVP1Tu9+q6nvdP8xfAm/n14cKpmJMSfZhFJLvraqzu+ap3U+LjWfa99FOVfV94HOMjvEfkGTnBbbz6/7VmLrX78T4hyd/ZdqCf03c7iHJHZLsv3MZeDRwOaOxnNJtdgrwsWEqXLWl6v848NfdWSMPAn6w81DDnm7BMe4nMtpPMBrTSd1ZFocChwMXTrq+5XTHft8JXFlVr5330lTup6XGM+X7aCbJAd3yfsAjGX128VngSd1mC/fRzn33JOAz1X3SuyJDf6q9ik/Bj2f0af43gFOHrmeVYziM0dkGlwJX7BwHo2N1nwau7r7fZehalxnDmYx+rf45o1nI05aqn9Gvp2/u9tllwNzQ9a9gTO/uav5K94/uwHnbn9qN6SrgMUPXv8h4HsLoMMBXgEu6r+OndT8tM55p3kf3Ab7c1X458NKu/TBG/0ldA3wIuF3Xvm+3fk33+mGr6ddbNkhSY6btUI8k6VYy+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1Jj/h8Rov7+Kaj/YgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "# Read sets data as `sets`\n", "sets = pd.read_csv('./dataset/sets.csv')\n", "\n", "# Create a summary of average number of parts by year: `parts_by_year`\n", "parts_by_year = sets.groupby('year').num_parts.mean()\n", "\n", "# Plot trends in average number of parts by year\n", "parts_by_year.plot(kind='hist');\n" ] }, { "cell_type": "markdown", "metadata": { "dc": { "key": "266a3f390c" }, "deletable": false, "editable": false, "run_control": { "frozen": true }, "tags": [ "context" ] }, "source": [ "## 6. Lego Themes Over Years\n", "

Lego blocks ship under multiple themes. Let us try to get a sense of how the number of themes shipped has varied over the years.

" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "dc": { "key": "266a3f390c" }, "tags": [ "sample_code" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yeartheme_id
019502
119531
219542
319554
419563
\n", "
" ], "text/plain": [ " year theme_id\n", "0 1950 2\n", "1 1953 1\n", "2 1954 2\n", "3 1955 4\n", "4 1956 3" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# themes_by_year: Number of themes shipped by year\n", "themes_by_year = sets[['year', 'theme_id']].\\\n", " groupby('year', as_index=False).agg({'theme_id': pd.Series.nunique})\n", "\n", "themes_by_year.head()" ] }, { "cell_type": "markdown", "metadata": { "dc": { "key": "a293e5076e" }, "deletable": false, "editable": false, "run_control": { "frozen": true }, "tags": [ "context" ] }, "source": [ "## 7. Wrapping It All Up!\n", "

Lego blocks offer an unlimited amount of fun across ages. We explored some interesting trends around colors, parts, and themes.

" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }