{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Data Science 100 Knocks (Structured Data Processing) - Python\n# for Azure Notebooks"
},
{
"metadata": {},
"cell_type": "markdown",
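"source": "## [Note] Changes from the original version\n1. Azure Notebooks cannot run Docker, so instead of loading the data from PostgreSQL, this notebook uses the CSV files found in 100knocks-preprocess/docker/work/data as of 2020.06.18.\n2. In the original CSV data, the 'latitude' column name in geocode.csv began with a space, so the space was removed.\n\n Original (100knocks-preprocess ver.1.0): ' latitude' --> 'latitude'\n \n \n3. Several of the original sample answers raised errors in this environment, so the code was corrected while staying as close to the original scripts as possible.\n4. The first cell was modified so that libraries the original sample answers do not need are not imported, and the libraries they do need are installed on Azure Notebooks.\n5. The first cell under 'Introduction' was also modified to load the data from the CSV files above rather than via SQL."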
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## [Azure Notebooks] Before you run\n1. This script was tested on Python 3.6. From the menu, choose Kernel --> Change Kernel and select Python 3.6. With the Python 3 kernel, installing the libraries raises an error."
},
{
"metadata": {},
"cell_type": "markdown",
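"source": "## Introduction\n- Run the cell below first.\n- It imports the required libraries and loads the data from ~~the database (PostgreSQL)~~ the CSV files in 100knocks-preprocess/docker/work/data. Because geocode.csv was modified, and because running git clone would undermine the convenience of Azure Notebooks, the files are read directly from GitHub: geocode.csv from the author's (noguhiro2002) repository and the remaining files from the original repository.\n- Libraries you are expected to use, such as pandas, are imported in the cell below.\n- ~~If you want to use other libraries, install them as needed (you can also install with \"!pip install library-name\").~~\n- The libraries required by the original sample answers are installed with pip.\n- You may split the processing into multiple steps.\n- Names, addresses, and similar values are dummy data and do not refer to real people or places."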
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Install the libraries required by the original sample answers with pip\n!pip install --upgrade pip\n!pip install -U pandas numpy scikit-learn\n!pip install imbalanced-learn\n\n\n# Import the libraries required by the original sample answers\nimport os\nimport pandas as pd\nimport numpy as np\nfrom datetime import datetime, date\nfrom dateutil.relativedelta import relativedelta\nimport math\nfrom sklearn import preprocessing\nfrom sklearn.model_selection import train_test_split\nfrom imblearn.under_sampling import RandomUnderSampler\n\n\n# Read the CSV files from GitHub into DataFrames\n# (geocode.csv comes from noguhiro2002's repository, the other files from the original repository)\ndf_customer = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/customer.csv')\ndf_category = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/category.csv')\ndf_product = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/product.csv')\ndf_receipt = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/receipt.csv')\ndf_store = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/store.csv')\ndf_geocode = pd.read_csv('https://raw.githubusercontent.com/noguhiro2002/100knocks-preprocess_ForColab-AzureNotebook/master/data/geocode.csv')",
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": "Requirement already up-to-date: pip in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (20.1.1)\nRequirement already up-to-date: pandas in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (1.0.4)\nRequirement already up-to-date: numpy in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (1.18.5)\nRequirement already up-to-date: scikit-learn in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.23.1)\nRequirement already satisfied, skipping upgrade: python-dateutil>=2.6.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from pandas) (2.8.1)\nRequirement already satisfied, skipping upgrade: pytz>=2017.2 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from pandas) (2019.3)\nRequirement already satisfied, skipping upgrade: threadpoolctl>=2.0.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn) (2.1.0)\nRequirement already satisfied, skipping upgrade: scipy>=0.19.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn) (1.1.0)\nRequirement already satisfied, skipping upgrade: joblib>=0.11 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn) (0.14.0)\nRequirement already satisfied, skipping upgrade: six>=1.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.11.0)\nRequirement already satisfied: imbalanced-learn in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.7.0)\nRequirement already satisfied: scikit-learn>=0.23 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (0.23.1)\nRequirement already satisfied: scipy>=0.19.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (1.1.0)\nRequirement already satisfied: numpy>=1.13.3 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (1.18.5)\nRequirement already satisfied: joblib>=0.11 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (0.14.0)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn>=0.23->imbalanced-learn) (2.1.0)\n",
"name": "stdout"
},
{
"output_type": "stream",
"text": "/home/nbuser/anaconda3_501/lib/python3.6/site-packages/IPython/core/interactiveshell.py:3020: DtypeWarning: Columns (4) have mixed types.Specify dtype option on import or set low_memory=False.\n interactivity=interactivity, compiler=compiler, result=result)\n",
"name": "stderr"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Exercises"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-001: Display the first 10 rows of all columns of the receipt detail data frame (df_receipt) and visually check what kind of data it holds."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.head(10)",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 2,
"data": {
"text/plain": " sales_ymd sales_epoch store_cd receipt_no receipt_sub_no \\\n0 20181103 1257206400 S14006 112 1 \n1 20181118 1258502400 S13008 1132 2 \n2 20170712 1215820800 S14028 1102 1 \n3 20190205 1265328000 S14042 1132 1 \n4 20180821 1250812800 S14025 1102 2 \n5 20190605 1275696000 S13003 1112 1 \n6 20181205 1259971200 S14024 1102 2 \n7 20190922 1285113600 S14040 1102 1 \n8 20170504 1209859200 S13020 1112 2 \n9 20191010 1286668800 S14027 1102 1 \n\n customer_id product_cd quantity amount \n0 CS006214000001 P070305012 1 158 \n1 CS008415000097 P070701017 1 81 \n2 CS028414000014 P060101005 1 170 \n3 ZZ000000000000 P050301001 1 25 \n4 CS025415000050 P060102007 1 90 \n5 CS003515000195 P050102002 1 138 \n6 CS024514000042 P080101005 1 30 \n7 CS040415000178 P070501004 1 128 \n8 ZZ000000000000 P071302010 1 770 \n9 CS027514000015 P071101003 1 680 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-002: From the receipt detail data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and display 10 rows."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].head(10)",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": " sales_ymd customer_id product_cd amount\n0 20181103 CS006214000001 P070305012 158\n1 20181118 CS008415000097 P070701017 81\n2 20170712 CS028414000014 P060101005 170\n3 20190205 ZZ000000000000 P050301001 25\n4 20180821 CS025415000050 P060102007 90\n5 20190605 CS003515000195 P050102002 138\n6 20181205 CS024514000042 P080101005 30\n7 20190922 CS040415000178 P070501004 128\n8 20170504 ZZ000000000000 P071302010 770\n9 20191010 CS027514000015 P071101003 680"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
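"source": "---\n> P-003: From the receipt detail data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and display 10 rows. Rename sales_ymd to sales_date as part of the extraction."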
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].rename(columns={'sales_ymd': 'sales_date'}).head(10)",
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 4,
"data": {
"text/plain": " sales_date customer_id product_cd amount\n0 20181103 CS006214000001 P070305012 158\n1 20181118 CS008415000097 P070701017 81\n2 20170712 CS028414000014 P060101005 170\n3 20190205 ZZ000000000000 P050301001 25\n4 20180821 CS025415000050 P060102007 90\n5 20190605 CS003515000195 P050102002 138\n6 20181205 CS024514000042 P080101005 30\n7 20190922 CS040415000178 P070501004 128\n8 20170504 ZZ000000000000 P071302010 770\n9 20191010 CS027514000015 P071101003 680"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
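"source": "---\n> P-004: From the receipt detail data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy the following condition.\n> - Customer ID (customer_id) is \"CS018205000001\""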
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].query('customer_id == \"CS018205000001\"')",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": " sales_ymd customer_id product_cd amount\n36 20180911 CS018205000001 P071401012 2200\n9843 20180414 CS018205000001 P060104007 600\n21110 20170614 CS018205000001 P050206001 990\n27673 20170614 CS018205000001 P060702015 108\n27840 20190216 CS018205000001 P071005024 102\n28757 20180414 CS018205000001 P071101002 278\n39256 20190226 CS018205000001 P070902035 168\n58121 20190924 CS018205000001 P060805001 495\n68117 20190226 CS018205000001 P071401020 2200\n72254 20180911 CS018205000001 P071401005 1100\n88508 20190216 CS018205000001 P040101002 218\n91525 20190924 CS018205000001 P091503001 280"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
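"source": "---\n> P-005: From the receipt detail data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy all of the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Sales amount (amount) is 1,000 or more"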
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \\\n .query('customer_id == \"CS018205000001\" & amount >= 1000')",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 6,
"data": {
"text/plain": " sales_ymd customer_id product_cd amount\n36 20180911 CS018205000001 P071401012 2200\n68117 20190226 CS018205000001 P071401020 2200\n72254 20180911 CS018205000001 P071401005 1100"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
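"source": "---\n> P-006: From the receipt detail data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), sales quantity (quantity), and sales amount (amount) in that order, and extract the rows that satisfy all of the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Sales amount (amount) is 1,000 or more, or sales quantity (quantity) is 5 or more"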
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'quantity', 'amount']] \\\n .query('customer_id == \"CS018205000001\" & (amount >= 1000 | quantity >= 5)')",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/plain": " sales_ymd customer_id product_cd quantity amount\n36 20180911 CS018205000001 P071401012 1 2200\n9843 20180414 CS018205000001 P060104007 6 600\n21110 20170614 CS018205000001 P050206001 5 990\n68117 20190226 CS018205000001 P071401020 1 2200\n72254 20180911 CS018205000001 P071401005 1 1100"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
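"source": "---\n> P-007: From the receipt detail data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy all of the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Sales amount (amount) is between 1,000 and 2,000 inclusive"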
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \\\n .query('customer_id == \"CS018205000001\" & 1000 <= amount <= 2000')",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": " sales_ymd customer_id product_cd amount\n72254 20180911 CS018205000001 P071401005 1100"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
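"source": "---\n> P-008: From the receipt detail data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy all of the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Product code (product_cd) is anything other than \"P071401019\""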
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \\\n .query('customer_id == \"CS018205000001\" & product_cd != \"P071401019\"')",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": " sales_ymd customer_id product_cd amount\n36 20180911 CS018205000001 P071401012 2200\n9843 20180414 CS018205000001 P060104007 600\n21110 20170614 CS018205000001 P050206001 990\n27673 20170614 CS018205000001 P060702015 108\n27840 20190216 CS018205000001 P071005024 102\n28757 20180414 CS018205000001 P071101002 278\n39256 20190226 CS018205000001 P070902035 168\n58121 20190924 CS018205000001 P060805001 495\n68117 20190226 CS018205000001 P071401020 2200\n72254 20180911 CS018205000001 P071401005 1100\n88508 20190216 CS018205000001 P040101002 218\n91525 20190924 CS018205000001 P091503001 280"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-009: Rewrite the following expression, replacing OR with AND without changing the output.\n\n`df_store.query('not(prefecture_cd == \"13\" | floor_area > 900)')`"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_store.query('prefecture_cd != \"13\" & floor_area <= 900')",
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/plain": " store_cd store_name prefecture_cd prefecture address \\\n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n38 S12013 習志野店 12 千葉県 千葉県習志野市芝園一丁目 \n\n address_kana tel_no longitude latitude \\\n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n38 チバケンナラシノシシバゾノイッチョウメ 047-123-4002 140.0220 35.66122 \n\n floor_area \n18 831.0 \n20 890.0 \n38 808.0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-010: From the store data frame (df_store), extract all columns for the rows whose store code (store_cd) starts with \"S14\", and display only 10 rows."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_store.query(\"store_cd.str.startswith('S14')\", engine='python').head(10)",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": " store_cd store_name prefecture_cd prefecture address \\\n2 S14010 菊名店 14 神奈川県 神奈川県横浜市港北区菊名一丁目 \n3 S14033 阿久和店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n4 S14036 相模原中央店 14 神奈川県 神奈川県相模原市中央二丁目 \n7 S14040 長津田店 14 神奈川県 神奈川県横浜市緑区長津田みなみ台五丁目 \n9 S14050 阿久和西店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n12 S14028 二ツ橋店 14 神奈川県 神奈川県横浜市瀬谷区二ツ橋町 \n16 S14012 本牧和田店 14 神奈川県 神奈川県横浜市中区本牧和田 \n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n19 S14022 逗子店 14 神奈川県 神奈川県逗子市逗子一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n\n address_kana tel_no longitude latitude \\\n2 カナガワケンヨコハマシコウホククキクナイッチョウメ 045-123-4032 139.6326 35.50049 \n3 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4043 139.4961 35.45918 \n4 カナガワケンサガミハラシチュウオウニチョウメ 042-123-4045 139.3716 35.57327 \n7 カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ 045-123-4046 139.4994 35.52398 \n9 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4053 139.4961 35.45918 \n12 カナガワケンヨコハマシセヤクフタツバシチョウ 045-123-4042 139.4963 35.46304 \n16 カナガワケンヨコハマシナカクホンモクワダ 045-123-4034 139.6582 35.42156 \n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n19 カナガワケンズシシズシイッチョウメ 046-123-4036 139.5789 35.29642 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n\n floor_area \n2 1732.0 \n3 1495.0 \n4 1679.0 \n7 1548.0 \n9 1830.0 \n12 1574.0 \n16 1341.0 \n18 831.0 \n19 1838.0 \n20 890.0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-011: From the customer data frame (df_customer), extract all columns for the rows whose customer ID (customer_id) ends with 1, and display only 10 rows."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer.query(\"customer_id.str.endswith('1')\", engine='python').head(10)",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n1 CS037613000071 六角 雅彦 9 不明 1952-04-01 66 \n3 CS028811000001 堀井 かおり 1 女性 1933-03-27 86 \n14 CS040412000191 川井 郁恵 1 女性 1977-01-05 42 \n31 CS028314000011 小菅 あおい 1 女性 1983-11-26 35 \n56 CS039212000051 藤島 恵梨香 1 女性 1997-02-03 22 \n59 CS015412000111 松居 奈月 1 女性 1972-10-04 46 \n63 CS004702000041 野島 洋 0 男性 1943-08-24 75 \n74 CS041515000001 栗田 千夏 1 女性 1967-01-02 52 \n85 CS029313000221 北条 ひかり 1 女性 1987-06-19 31 \n102 CS034312000071 望月 奈央 1 女性 1980-09-20 38 \n\n postal_cd address application_store_cd application_date \\\n1 136-0076 東京都江東区南砂********** S13037 20150414 \n3 245-0016 神奈川県横浜市泉区和泉町********** S14028 20160115 \n14 226-0021 神奈川県横浜市緑区北八朔町********** S14040 20151101 \n31 246-0038 神奈川県横浜市瀬谷区宮沢********** S14028 20151123 \n56 166-0001 東京都杉並区阿佐谷北********** S13039 20171121 \n59 136-0071 東京都江東区亀戸********** S13015 20150629 \n63 176-0022 東京都練馬区向山********** S13004 20170218 \n74 206-0001 東京都多摩市和田********** S13041 20160422 \n85 279-0011 千葉県浦安市美浜********** S12029 20180810 \n102 213-0026 神奈川県川崎市高津区久末********** S14034 20160106 \n\n status_cd \n1 0-00000000-0 \n3 0-00000000-0 \n14 1-20091025-4 \n31 1-20080426-5 \n56 1-20100215-4 \n59 0-00000000-0 \n63 0-00000000-0 \n74 E-20100803-F \n85 0-00000000-0 \n102 0-00000000-0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-012: From the store data frame (df_store), display all columns for the stores located in Yokohama (横浜市) only."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_store.query(\"address.str.contains('横浜市')\", engine='python')",
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 13,
"data": {
"text/plain": " store_cd store_name prefecture_cd prefecture address \\\n2 S14010 菊名店 14 神奈川県 神奈川県横浜市港北区菊名一丁目 \n3 S14033 阿久和店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n7 S14040 長津田店 14 神奈川県 神奈川県横浜市緑区長津田みなみ台五丁目 \n9 S14050 阿久和西店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n12 S14028 二ツ橋店 14 神奈川県 神奈川県横浜市瀬谷区二ツ橋町 \n16 S14012 本牧和田店 14 神奈川県 神奈川県横浜市中区本牧和田 \n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n26 S14048 中川中央店 14 神奈川県 神奈川県横浜市都筑区中川中央二丁目 \n40 S14042 新山下店 14 神奈川県 神奈川県横浜市中区新山下二丁目 \n52 S14006 葛が谷店 14 神奈川県 神奈川県横浜市都筑区葛が谷 \n\n address_kana tel_no longitude latitude \\\n2 カナガワケンヨコハマシコウホククキクナイッチョウメ 045-123-4032 139.6326 35.50049 \n3 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4043 139.4961 35.45918 \n7 カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ 045-123-4046 139.4994 35.52398 \n9 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4053 139.4961 35.45918 \n12 カナガワケンヨコハマシセヤクフタツバシチョウ 045-123-4042 139.4963 35.46304 \n16 カナガワケンヨコハマシナカクホンモクワダ 045-123-4034 139.6582 35.42156 \n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n26 カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ 045-123-4051 139.5758 35.54912 \n40 カナガワケンヨコハマシナカクシンヤマシタニチョウメ 045-123-4047 139.6593 35.43894 \n52 カナガワケンヨコハマシツヅキククズガヤ 045-123-4031 139.5633 35.53573 \n\n floor_area \n2 1732.0 \n3 1495.0 \n7 1548.0 \n9 1830.0 \n12 1574.0 \n16 1341.0 \n18 831.0 \n20 890.0 \n26 1657.0 \n40 1044.0 \n52 1886.0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-013: 顧客データフレーム(df_customer)から、ステータスコード(status_cd)の先頭がアルファベットのA〜Fで始まるデータを全項目抽出し、10件だけ表示せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer.query(\"status_cd.str.contains('^[A-F]', regex=True)\", engine='python').head(10)",
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 14,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n
\n \n \n \n 2 | \n CS031415000172 | \n 宇多田 貴美子 | \n 1 | \n 女性 | \n 1976-10-04 | \n 42 | \n 151-0053 | \n 東京都渋谷区代々木********** | \n S13031 | \n 20150529 | \n D-20100325-C | \n
\n \n 6 | \n CS015414000103 | \n 奥野 陽子 | \n 1 | \n 女性 | \n 1977-08-09 | \n 41 | \n 136-0073 | \n 東京都江東区北砂********** | \n S13015 | \n 20150722 | \n B-20100609-B | \n
\n \n 12 | \n CS011215000048 | \n 芦田 沙耶 | \n 1 | \n 女性 | \n 1992-02-01 | \n 27 | \n 223-0062 | \n 神奈川県横浜市港北区日吉本町********** | \n S14011 | \n 20150228 | \n C-20100421-9 | \n
\n \n 15 | \n CS029415000023 | \n 梅田 里穂 | \n 1 | \n 女性 | \n 1976-01-17 | \n 43 | \n 279-0043 | \n 千葉県浦安市富士見********** | \n S12029 | \n 20150610 | \n D-20100918-E | \n
\n \n 21 | \n CS035415000029 | \n 寺沢 真希 | \n 9 | \n 不明 | \n 1977-09-27 | \n 41 | \n 158-0096 | \n 東京都世田谷区玉川台********** | \n S13035 | \n 20141220 | \n F-20101029-F | \n
\n \n 32 | \n CS031415000106 | \n 宇野 由美子 | \n 1 | \n 女性 | \n 1970-02-26 | \n 49 | \n 151-0053 | \n 東京都渋谷区代々木********** | \n S13031 | \n 20150201 | \n F-20100511-E | \n
\n \n 33 | \n CS029215000025 | \n 石倉 美帆 | \n 1 | \n 女性 | \n 1993-09-28 | \n 25 | \n 279-0022 | \n 千葉県浦安市今川********** | \n S12029 | \n 20150708 | \n B-20100820-C | \n
\n \n 40 | \n CS033605000005 | \n 猪股 雄太 | \n 0 | \n 男性 | \n 1955-12-05 | \n 63 | \n 246-0031 | \n 神奈川県横浜市瀬谷区瀬谷********** | \n S14033 | \n 20150425 | \n F-20100917-E | \n
\n \n 44 | \n CS033415000229 | \n 板垣 菜々美 | \n 1 | \n 女性 | \n 1977-11-07 | \n 41 | \n 246-0021 | \n 神奈川県横浜市瀬谷区二ツ橋町********** | \n S14033 | \n 20150712 | \n F-20100326-E | \n
\n \n 53 | \n CS008415000145 | \n 黒谷 麻緒 | \n 1 | \n 女性 | \n 1977-06-27 | \n 41 | \n 157-0067 | \n 東京都世田谷区喜多見********** | \n S13008 | \n 20150829 | \n F-20100622-F | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age postal_cd \\\n2 CS031415000172 宇多田 貴美子 1 女性 1976-10-04 42 151-0053 \n6 CS015414000103 奥野 陽子 1 女性 1977-08-09 41 136-0073 \n12 CS011215000048 芦田 沙耶 1 女性 1992-02-01 27 223-0062 \n15 CS029415000023 梅田 里穂 1 女性 1976-01-17 43 279-0043 \n21 CS035415000029 寺沢 真希 9 不明 1977-09-27 41 158-0096 \n32 CS031415000106 宇野 由美子 1 女性 1970-02-26 49 151-0053 \n33 CS029215000025 石倉 美帆 1 女性 1993-09-28 25 279-0022 \n40 CS033605000005 猪股 雄太 0 男性 1955-12-05 63 246-0031 \n44 CS033415000229 板垣 菜々美 1 女性 1977-11-07 41 246-0021 \n53 CS008415000145 黒谷 麻緒 1 女性 1977-06-27 41 157-0067 \n\n address application_store_cd application_date \\\n2 東京都渋谷区代々木********** S13031 20150529 \n6 東京都江東区北砂********** S13015 20150722 \n12 神奈川県横浜市港北区日吉本町********** S14011 20150228 \n15 千葉県浦安市富士見********** S12029 20150610 \n21 東京都世田谷区玉川台********** S13035 20141220 \n32 東京都渋谷区代々木********** S13031 20150201 \n33 千葉県浦安市今川********** S12029 20150708 \n40 神奈川県横浜市瀬谷区瀬谷********** S14033 20150425 \n44 神奈川県横浜市瀬谷区二ツ橋町********** S14033 20150712 \n53 東京都世田谷区喜多見********** S13008 20150829 \n\n status_cd \n2 D-20100325-C \n6 B-20100609-B \n12 C-20100421-9 \n15 D-20100918-E \n21 F-20101029-F \n32 F-20100511-E \n33 B-20100820-C \n40 F-20100917-E \n44 F-20100326-E \n53 F-20100622-F "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-014: 顧客データフレーム(df_customer)から、ステータスコード(status_cd)の末尾が数字の1〜9で終わるデータを全項目抽出し、10件だけ表示せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer.query(\"status_cd.str.contains('[1-9]$', regex=True)\", engine='python').head(10)",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n
\n \n \n \n 4 | \n CS001215000145 | \n 田崎 美紀 | \n 1 | \n 女性 | \n 1995-03-29 | \n 24 | \n 144-0055 | \n 東京都大田区仲六郷********** | \n S13001 | \n 20170605 | \n 6-20090929-2 | \n
\n \n 9 | \n CS033513000180 | \n 安斎 遥 | \n 1 | \n 女性 | \n 1962-07-11 | \n 56 | \n 241-0823 | \n 神奈川県横浜市旭区善部町********** | \n S14033 | \n 20150728 | \n 6-20080506-5 | \n
\n \n 12 | \n CS011215000048 | \n 芦田 沙耶 | \n 1 | \n 女性 | \n 1992-02-01 | \n 27 | \n 223-0062 | \n 神奈川県横浜市港北区日吉本町********** | \n S14011 | \n 20150228 | \n C-20100421-9 | \n
\n \n 14 | \n CS040412000191 | \n 川井 郁恵 | \n 1 | \n 女性 | \n 1977-01-05 | \n 42 | \n 226-0021 | \n 神奈川県横浜市緑区北八朔町********** | \n S14040 | \n 20151101 | \n 1-20091025-4 | \n
\n \n 16 | \n CS009315000023 | \n 皆川 文世 | \n 1 | \n 女性 | \n 1980-04-15 | \n 38 | \n 154-0012 | \n 東京都世田谷区駒沢********** | \n S13009 | \n 20150319 | \n 5-20080322-1 | \n
\n \n 22 | \n CS015315000033 | \n 福士 璃奈子 | \n 1 | \n 女性 | \n 1983-03-17 | \n 36 | \n 135-0043 | \n 東京都江東区塩浜********** | \n S13015 | \n 20141024 | \n 4-20080219-3 | \n
\n \n 23 | \n CS023513000066 | \n 神戸 そら | \n 1 | \n 女性 | \n 1961-12-17 | \n 57 | \n 210-0005 | \n 神奈川県川崎市川崎区東田町********** | \n S14023 | \n 20150915 | \n 5-20100524-9 | \n
\n \n 24 | \n CS035513000134 | \n 市川 美帆 | \n 1 | \n 女性 | \n 1960-03-27 | \n 59 | \n 156-0053 | \n 東京都世田谷区桜********** | \n S13035 | \n 20150227 | \n 8-20100711-9 | \n
\n \n 27 | \n CS001515000263 | \n 高松 夏空 | \n 1 | \n 女性 | \n 1962-11-09 | \n 56 | \n 144-0051 | \n 東京都大田区西蒲田********** | \n S13001 | \n 20160812 | \n 1-20100804-1 | \n
\n \n 28 | \n CS040314000027 | \n 鶴田 きみまろ | \n 9 | \n 不明 | \n 1986-03-26 | \n 33 | \n 226-0027 | \n 神奈川県横浜市緑区長津田********** | \n S14040 | \n 20150122 | \n 2-20080426-4 | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age postal_cd \\\n4 CS001215000145 田崎 美紀 1 女性 1995-03-29 24 144-0055 \n9 CS033513000180 安斎 遥 1 女性 1962-07-11 56 241-0823 \n12 CS011215000048 芦田 沙耶 1 女性 1992-02-01 27 223-0062 \n14 CS040412000191 川井 郁恵 1 女性 1977-01-05 42 226-0021 \n16 CS009315000023 皆川 文世 1 女性 1980-04-15 38 154-0012 \n22 CS015315000033 福士 璃奈子 1 女性 1983-03-17 36 135-0043 \n23 CS023513000066 神戸 そら 1 女性 1961-12-17 57 210-0005 \n24 CS035513000134 市川 美帆 1 女性 1960-03-27 59 156-0053 \n27 CS001515000263 高松 夏空 1 女性 1962-11-09 56 144-0051 \n28 CS040314000027 鶴田 きみまろ 9 不明 1986-03-26 33 226-0027 \n\n address application_store_cd application_date \\\n4 東京都大田区仲六郷********** S13001 20170605 \n9 神奈川県横浜市旭区善部町********** S14033 20150728 \n12 神奈川県横浜市港北区日吉本町********** S14011 20150228 \n14 神奈川県横浜市緑区北八朔町********** S14040 20151101 \n16 東京都世田谷区駒沢********** S13009 20150319 \n22 東京都江東区塩浜********** S13015 20141024 \n23 神奈川県川崎市川崎区東田町********** S14023 20150915 \n24 東京都世田谷区桜********** S13035 20150227 \n27 東京都大田区西蒲田********** S13001 20160812 \n28 神奈川県横浜市緑区長津田********** S14040 20150122 \n\n status_cd \n4 6-20090929-2 \n9 6-20080506-5 \n12 C-20100421-9 \n14 1-20091025-4 \n16 5-20080322-1 \n22 4-20080219-3 \n23 5-20100524-9 \n24 8-20100711-9 \n27 1-20100804-1 \n28 2-20080426-4 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-015: 顧客データフレーム(df_customer)から、ステータスコード(status_cd)の先頭がアルファベットのA〜Fで始まり、末尾が数字の1〜9で終わるデータを全項目抽出し、10件だけ表示せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer.query(\"status_cd.str.contains('^[A-F].*[1-9]$', regex=True)\", engine='python').head(10)",
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 16,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n
\n \n \n \n 12 | \n CS011215000048 | \n 芦田 沙耶 | \n 1 | \n 女性 | \n 1992-02-01 | \n 27 | \n 223-0062 | \n 神奈川県横浜市港北区日吉本町********** | \n S14011 | \n 20150228 | \n C-20100421-9 | \n
\n \n 68 | \n CS022513000105 | \n 島村 貴美子 | \n 1 | \n 女性 | \n 1962-03-12 | \n 57 | \n 249-0002 | \n 神奈川県逗子市山の根********** | \n S14022 | \n 20150320 | \n A-20091115-7 | \n
\n \n 71 | \n CS001515000096 | \n 水野 陽子 | \n 9 | \n 不明 | \n 1960-11-29 | \n 58 | \n 144-0053 | \n 東京都大田区蒲田本町********** | \n S13001 | \n 20150614 | \n A-20100724-7 | \n
\n \n 122 | \n CS013615000053 | \n 西脇 季衣 | \n 1 | \n 女性 | \n 1953-10-18 | \n 65 | \n 261-0026 | \n 千葉県千葉市美浜区幕張西********** | \n S12013 | \n 20150128 | \n B-20100329-6 | \n
\n \n 144 | \n CS020412000161 | \n 小宮 薫 | \n 1 | \n 女性 | \n 1974-05-21 | \n 44 | \n 174-0042 | \n 東京都板橋区東坂下********** | \n S13020 | \n 20150822 | \n B-20081021-3 | \n
\n \n 178 | \n CS001215000097 | \n 竹中 あさみ | \n 1 | \n 女性 | \n 1990-07-25 | \n 28 | \n 146-0095 | \n 東京都大田区多摩川********** | \n S13001 | \n 20170315 | \n A-20100211-2 | \n
\n \n 252 | \n CS035212000007 | \n 内村 恵梨香 | \n 1 | \n 女性 | \n 1990-12-04 | \n 28 | \n 152-0023 | \n 東京都目黒区八雲********** | \n S13035 | \n 20151013 | \n B-20101018-6 | \n
\n \n 259 | \n CS002515000386 | \n 野田 コウ | \n 1 | \n 女性 | \n 1963-05-30 | \n 55 | \n 185-0013 | \n 東京都国分寺市西恋ケ窪********** | \n S13002 | \n 20160410 | \n C-20100127-8 | \n
\n \n 293 | \n CS001615000372 | \n 稲垣 寿々花 | \n 1 | \n 女性 | \n 1956-10-29 | \n 62 | \n 144-0035 | \n 東京都大田区南蒲田********** | \n S13001 | \n 20170403 | \n A-20100104-1 | \n
\n \n 297 | \n CS032512000121 | \n 松井 知世 | \n 1 | \n 女性 | \n 1962-09-04 | \n 56 | \n 210-0011 | \n 神奈川県川崎市川崎区富士見********** | \n S13032 | \n 20150727 | \n A-20100103-5 | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n12 CS011215000048 芦田 沙耶 1 女性 1992-02-01 27 \n68 CS022513000105 島村 貴美子 1 女性 1962-03-12 57 \n71 CS001515000096 水野 陽子 9 不明 1960-11-29 58 \n122 CS013615000053 西脇 季衣 1 女性 1953-10-18 65 \n144 CS020412000161 小宮 薫 1 女性 1974-05-21 44 \n178 CS001215000097 竹中 あさみ 1 女性 1990-07-25 28 \n252 CS035212000007 内村 恵梨香 1 女性 1990-12-04 28 \n259 CS002515000386 野田 コウ 1 女性 1963-05-30 55 \n293 CS001615000372 稲垣 寿々花 1 女性 1956-10-29 62 \n297 CS032512000121 松井 知世 1 女性 1962-09-04 56 \n\n postal_cd address application_store_cd \\\n12 223-0062 神奈川県横浜市港北区日吉本町********** S14011 \n68 249-0002 神奈川県逗子市山の根********** S14022 \n71 144-0053 東京都大田区蒲田本町********** S13001 \n122 261-0026 千葉県千葉市美浜区幕張西********** S12013 \n144 174-0042 東京都板橋区東坂下********** S13020 \n178 146-0095 東京都大田区多摩川********** S13001 \n252 152-0023 東京都目黒区八雲********** S13035 \n259 185-0013 東京都国分寺市西恋ケ窪********** S13002 \n293 144-0035 東京都大田区南蒲田********** S13001 \n297 210-0011 神奈川県川崎市川崎区富士見********** S13032 \n\n application_date status_cd \n12 20150228 C-20100421-9 \n68 20150320 A-20091115-7 \n71 20150614 A-20100724-7 \n122 20150128 B-20100329-6 \n144 20150822 B-20081021-3 \n178 20170315 A-20100211-2 \n252 20151013 B-20101018-6 \n259 20160410 C-20100127-8 \n293 20170403 A-20100104-1 \n297 20150727 A-20100103-5 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-016: 店舗データフレーム(df_store)から、電話番号(tel_no)が3桁-3桁-4桁のデータを全項目表示せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_store.query(\"tel_no.str.contains('[0-9]{3}-[0-9]{3}-[0-9]{4}', regex=True)\", engine='python')",
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 17,
"data": {
"text/html": "\n\n
\n \n \n | \n store_cd | \n store_name | \n prefecture_cd | \n prefecture | \n address | \n address_kana | \n tel_no | \n longitude | \n latitude | \n floor_area | \n
\n \n \n \n 0 | \n S12014 | \n 千草台店 | \n 12 | \n 千葉県 | \n 千葉県千葉市稲毛区千草台一丁目 | \n チバケンチバシイナゲクチグサダイイッチョウメ | \n 043-123-4003 | \n 140.1180 | \n 35.63559 | \n 1698.0 | \n
\n \n 1 | \n S13002 | \n 国分寺店 | \n 13 | \n 東京都 | \n 東京都国分寺市本多二丁目 | \n トウキョウトコクブンジシホンダニチョウメ | \n 042-123-4008 | \n 139.4802 | \n 35.70566 | \n 1735.0 | \n
\n \n 2 | \n S14010 | \n 菊名店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市港北区菊名一丁目 | \n カナガワケンヨコハマシコウホククキクナイッチョウメ | \n 045-123-4032 | \n 139.6326 | \n 35.50049 | \n 1732.0 | \n
\n \n 3 | \n S14033 | \n 阿久和店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市瀬谷区阿久和西一丁目 | \n カナガワケンヨコハマシセヤクアクワニシイッチョウメ | \n 045-123-4043 | \n 139.4961 | \n 35.45918 | \n 1495.0 | \n
\n \n 4 | \n S14036 | \n 相模原中央店 | \n 14 | \n 神奈川県 | \n 神奈川県相模原市中央二丁目 | \n カナガワケンサガミハラシチュウオウニチョウメ | \n 042-123-4045 | \n 139.3716 | \n 35.57327 | \n 1679.0 | \n
\n \n 7 | \n S14040 | \n 長津田店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市緑区長津田みなみ台五丁目 | \n カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ | \n 045-123-4046 | \n 139.4994 | \n 35.52398 | \n 1548.0 | \n
\n \n 9 | \n S14050 | \n 阿久和西店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市瀬谷区阿久和西一丁目 | \n カナガワケンヨコハマシセヤクアクワニシイッチョウメ | \n 045-123-4053 | \n 139.4961 | \n 35.45918 | \n 1830.0 | \n
\n \n 11 | \n S13052 | \n 森野店 | \n 13 | \n 東京都 | \n 東京都町田市森野三丁目 | \n トウキョウトマチダシモリノサンチョウメ | \n 042-123-4030 | \n 139.4383 | \n 35.55293 | \n 1087.0 | \n
\n \n 12 | \n S14028 | \n 二ツ橋店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市瀬谷区二ツ橋町 | \n カナガワケンヨコハマシセヤクフタツバシチョウ | \n 045-123-4042 | \n 139.4963 | \n 35.46304 | \n 1574.0 | \n
\n \n 16 | \n S14012 | \n 本牧和田店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市中区本牧和田 | \n カナガワケンヨコハマシナカクホンモクワダ | \n 045-123-4034 | \n 139.6582 | \n 35.42156 | \n 1341.0 | \n
\n \n 18 | \n S14046 | \n 北山田店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市都筑区北山田一丁目 | \n カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ | \n 045-123-4049 | \n 139.5916 | \n 35.56189 | \n 831.0 | \n
\n \n 19 | \n S14022 | \n 逗子店 | \n 14 | \n 神奈川県 | \n 神奈川県逗子市逗子一丁目 | \n カナガワケンズシシズシイッチョウメ | \n 046-123-4036 | \n 139.5789 | \n 35.29642 | \n 1838.0 | \n
\n \n 20 | \n S14011 | \n 日吉本町店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市港北区日吉本町四丁目 | \n カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ | \n 045-123-4033 | \n 139.6316 | \n 35.54655 | \n 890.0 | \n
\n \n 21 | \n S13016 | \n 小金井店 | \n 13 | \n 東京都 | \n 東京都小金井市本町一丁目 | \n トウキョウトコガネイシホンチョウイッチョウメ | \n 042-123-4015 | \n 139.5094 | \n 35.70018 | \n 1399.0 | \n
\n \n 22 | \n S14034 | \n 川崎野川店 | \n 14 | \n 神奈川県 | \n 神奈川県川崎市宮前区野川 | \n カナガワケンカワサキシミヤマエクノガワ | \n 044-123-4044 | \n 139.5998 | \n 35.57693 | \n 1318.0 | \n
\n \n 26 | \n S14048 | \n 中川中央店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市都筑区中川中央二丁目 | \n カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ | \n 045-123-4051 | \n 139.5758 | \n 35.54912 | \n 1657.0 | \n
\n \n 27 | \n S12007 | \n 佐倉店 | \n 12 | \n 千葉県 | \n 千葉県佐倉市上志津 | \n チバケンサクラシカミシヅ | \n 043-123-4001 | \n 140.1452 | \n 35.71872 | \n 1895.0 | \n
\n \n 28 | \n S14026 | \n 辻堂西海岸店 | \n 14 | \n 神奈川県 | \n 神奈川県藤沢市辻堂西海岸二丁目 | \n カナガワケンフジサワシツジドウニシカイガンニチョウメ | \n 046-123-4040 | \n 139.4466 | \n 35.32464 | \n 1732.0 | \n
\n \n 29 | \n S13041 | \n 八王子店 | \n 13 | \n 東京都 | \n 東京都八王子市大塚 | \n トウキョウトハチオウジシオオツカ | \n 042-123-4026 | \n 139.4235 | \n 35.63787 | \n 810.0 | \n
\n \n 31 | \n S14049 | \n 川崎大師店 | \n 14 | \n 神奈川県 | \n 神奈川県川崎市川崎区中瀬三丁目 | \n カナガワケンカワサキシカワサキクナカゼサンチョウメ | \n 044-123-4052 | \n 139.7327 | \n 35.53759 | \n 962.0 | \n
\n \n 32 | \n S14023 | \n 川崎店 | \n 14 | \n 神奈川県 | \n 神奈川県川崎市川崎区本町二丁目 | \n カナガワケンカワサキシカワサキクホンチョウニチョウメ | \n 044-123-4037 | \n 139.7028 | \n 35.53599 | \n 1804.0 | \n
\n \n 33 | \n S13018 | \n 清瀬店 | \n 13 | \n 東京都 | \n 東京都清瀬市松山一丁目 | \n トウキョウトキヨセシマツヤマイッチョウメ | \n 042-123-4017 | \n 139.5178 | \n 35.76885 | \n 1220.0 | \n
\n \n 35 | \n S14027 | \n 南藤沢店 | \n 14 | \n 神奈川県 | \n 神奈川県藤沢市南藤沢 | \n カナガワケンフジサワシミナミフジサワ | \n 046-123-4041 | \n 139.4896 | \n 35.33762 | \n 1521.0 | \n
\n \n 36 | \n S14021 | \n 伊勢原店 | \n 14 | \n 神奈川県 | \n 神奈川県伊勢原市伊勢原四丁目 | \n カナガワケンイセハラシイセハラヨンチョウメ | \n 046-123-4035 | \n 139.3129 | \n 35.40169 | \n 962.0 | \n
\n \n 37 | \n S14047 | \n 相模原店 | \n 14 | \n 神奈川県 | \n 神奈川県相模原市千代田六丁目 | \n カナガワケンサガミハラシチヨダロクチョウメ | \n 042-123-4050 | \n 139.3748 | \n 35.55959 | \n 1047.0 | \n
\n \n 38 | \n S12013 | \n 習志野店 | \n 12 | \n 千葉県 | \n 千葉県習志野市芝園一丁目 | \n チバケンナラシノシシバゾノイッチョウメ | \n 047-123-4002 | \n 140.0220 | \n 35.66122 | \n 808.0 | \n
\n \n 40 | \n S14042 | \n 新山下店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市中区新山下二丁目 | \n カナガワケンヨコハマシナカクシンヤマシタニチョウメ | \n 045-123-4047 | \n 139.6593 | \n 35.43894 | \n 1044.0 | \n
\n \n 42 | \n S12030 | \n 八幡店 | \n 12 | \n 千葉県 | \n 千葉県市川市八幡三丁目 | \n チバケンイチカワシヤワタサンチョウメ | \n 047-123-4005 | \n 139.9240 | \n 35.72318 | \n 1162.0 | \n
\n \n 44 | \n S14025 | \n 大和店 | \n 14 | \n 神奈川県 | \n 神奈川県大和市下和田 | \n カナガワケンヤマトシシモワダ | \n 046-123-4039 | \n 139.4680 | \n 35.43414 | \n 1011.0 | \n
\n \n 45 | \n S14045 | \n 厚木店 | \n 14 | \n 神奈川県 | \n 神奈川県厚木市中町二丁目 | \n カナガワケンアツギシナカチョウニチョウメ | \n 046-123-4048 | \n 139.3651 | \n 35.44182 | \n 980.0 | \n
\n \n 47 | \n S12029 | \n 東野店 | \n 12 | \n 千葉県 | \n 千葉県浦安市東野一丁目 | \n チバケンウラヤスシヒガシノイッチョウメ | \n 047-123-4004 | \n 139.8968 | \n 35.65086 | \n 1101.0 | \n
\n \n 49 | \n S12053 | \n 高洲店 | \n 12 | \n 千葉県 | \n 千葉県浦安市高洲五丁目 | \n チバケンウラヤスシタカスゴチョウメ | \n 047-123-4006 | \n 139.9176 | \n 35.63755 | \n 1555.0 | \n
\n \n 51 | \n S14024 | \n 三田店 | \n 14 | \n 神奈川県 | \n 神奈川県川崎市多摩区三田四丁目 | \n カナガワケンカワサキシタマクミタヨンチョウメ | \n 044-123-4038 | \n 139.5424 | \n 35.60770 | \n 972.0 | \n
\n \n 52 | \n S14006 | \n 葛が谷店 | \n 14 | \n 神奈川県 | \n 神奈川県横浜市都筑区葛が谷 | \n カナガワケンヨコハマシツヅキククズガヤ | \n 045-123-4031 | \n 139.5633 | \n 35.53573 | \n 1886.0 | \n
\n \n
\n
",
"text/plain": " store_cd store_name prefecture_cd prefecture address \\\n0 S12014 千草台店 12 千葉県 千葉県千葉市稲毛区千草台一丁目 \n1 S13002 国分寺店 13 東京都 東京都国分寺市本多二丁目 \n2 S14010 菊名店 14 神奈川県 神奈川県横浜市港北区菊名一丁目 \n3 S14033 阿久和店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n4 S14036 相模原中央店 14 神奈川県 神奈川県相模原市中央二丁目 \n7 S14040 長津田店 14 神奈川県 神奈川県横浜市緑区長津田みなみ台五丁目 \n9 S14050 阿久和西店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n11 S13052 森野店 13 東京都 東京都町田市森野三丁目 \n12 S14028 二ツ橋店 14 神奈川県 神奈川県横浜市瀬谷区二ツ橋町 \n16 S14012 本牧和田店 14 神奈川県 神奈川県横浜市中区本牧和田 \n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n19 S14022 逗子店 14 神奈川県 神奈川県逗子市逗子一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n21 S13016 小金井店 13 東京都 東京都小金井市本町一丁目 \n22 S14034 川崎野川店 14 神奈川県 神奈川県川崎市宮前区野川 \n26 S14048 中川中央店 14 神奈川県 神奈川県横浜市都筑区中川中央二丁目 \n27 S12007 佐倉店 12 千葉県 千葉県佐倉市上志津 \n28 S14026 辻堂西海岸店 14 神奈川県 神奈川県藤沢市辻堂西海岸二丁目 \n29 S13041 八王子店 13 東京都 東京都八王子市大塚 \n31 S14049 川崎大師店 14 神奈川県 神奈川県川崎市川崎区中瀬三丁目 \n32 S14023 川崎店 14 神奈川県 神奈川県川崎市川崎区本町二丁目 \n33 S13018 清瀬店 13 東京都 東京都清瀬市松山一丁目 \n35 S14027 南藤沢店 14 神奈川県 神奈川県藤沢市南藤沢 \n36 S14021 伊勢原店 14 神奈川県 神奈川県伊勢原市伊勢原四丁目 \n37 S14047 相模原店 14 神奈川県 神奈川県相模原市千代田六丁目 \n38 S12013 習志野店 12 千葉県 千葉県習志野市芝園一丁目 \n40 S14042 新山下店 14 神奈川県 神奈川県横浜市中区新山下二丁目 \n42 S12030 八幡店 12 千葉県 千葉県市川市八幡三丁目 \n44 S14025 大和店 14 神奈川県 神奈川県大和市下和田 \n45 S14045 厚木店 14 神奈川県 神奈川県厚木市中町二丁目 \n47 S12029 東野店 12 千葉県 千葉県浦安市東野一丁目 \n49 S12053 高洲店 12 千葉県 千葉県浦安市高洲五丁目 \n51 S14024 三田店 14 神奈川県 神奈川県川崎市多摩区三田四丁目 \n52 S14006 葛が谷店 14 神奈川県 神奈川県横浜市都筑区葛が谷 \n\n address_kana tel_no longitude latitude \\\n0 チバケンチバシイナゲクチグサダイイッチョウメ 043-123-4003 140.1180 35.63559 \n1 トウキョウトコクブンジシホンダニチョウメ 042-123-4008 139.4802 35.70566 \n2 カナガワケンヨコハマシコウホククキクナイッチョウメ 045-123-4032 139.6326 35.50049 \n3 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4043 139.4961 35.45918 \n4 カナガワケンサガミハラシチュウオウニチョウメ 042-123-4045 139.3716 35.57327 \n7 カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ 045-123-4046 139.4994 35.52398 \n9 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4053 139.4961 35.45918 \n11 トウキョウトマチダシモリノサンチョウメ 042-123-4030 139.4383 35.55293 \n12 カナガワケンヨコハマシセヤクフタツバシチョウ 045-123-4042 139.4963 35.46304 \n16 カナガワケンヨコハマシナカクホンモクワダ 
045-123-4034 139.6582 35.42156 \n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n19 カナガワケンズシシズシイッチョウメ 046-123-4036 139.5789 35.29642 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n21 トウキョウトコガネイシホンチョウイッチョウメ 042-123-4015 139.5094 35.70018 \n22 カナガワケンカワサキシミヤマエクノガワ 044-123-4044 139.5998 35.57693 \n26 カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ 045-123-4051 139.5758 35.54912 \n27 チバケンサクラシカミシヅ 043-123-4001 140.1452 35.71872 \n28 カナガワケンフジサワシツジドウニシカイガンニチョウメ 046-123-4040 139.4466 35.32464 \n29 トウキョウトハチオウジシオオツカ 042-123-4026 139.4235 35.63787 \n31 カナガワケンカワサキシカワサキクナカゼサンチョウメ 044-123-4052 139.7327 35.53759 \n32 カナガワケンカワサキシカワサキクホンチョウニチョウメ 044-123-4037 139.7028 35.53599 \n33 トウキョウトキヨセシマツヤマイッチョウメ 042-123-4017 139.5178 35.76885 \n35 カナガワケンフジサワシミナミフジサワ 046-123-4041 139.4896 35.33762 \n36 カナガワケンイセハラシイセハラヨンチョウメ 046-123-4035 139.3129 35.40169 \n37 カナガワケンサガミハラシチヨダロクチョウメ 042-123-4050 139.3748 35.55959 \n38 チバケンナラシノシシバゾノイッチョウメ 047-123-4002 140.0220 35.66122 \n40 カナガワケンヨコハマシナカクシンヤマシタニチョウメ 045-123-4047 139.6593 35.43894 \n42 チバケンイチカワシヤワタサンチョウメ 047-123-4005 139.9240 35.72318 \n44 カナガワケンヤマトシシモワダ 046-123-4039 139.4680 35.43414 \n45 カナガワケンアツギシナカチョウニチョウメ 046-123-4048 139.3651 35.44182 \n47 チバケンウラヤスシヒガシノイッチョウメ 047-123-4004 139.8968 35.65086 \n49 チバケンウラヤスシタカスゴチョウメ 047-123-4006 139.9176 35.63755 \n51 カナガワケンカワサキシタマクミタヨンチョウメ 044-123-4038 139.5424 35.60770 \n52 カナガワケンヨコハマシツヅキククズガヤ 045-123-4031 139.5633 35.53573 \n\n floor_area \n0 1698.0 \n1 1735.0 \n2 1732.0 \n3 1495.0 \n4 1679.0 \n7 1548.0 \n9 1830.0 \n11 1087.0 \n12 1574.0 \n16 1341.0 \n18 831.0 \n19 1838.0 \n20 890.0 \n21 1399.0 \n22 1318.0 \n26 1657.0 \n27 1895.0 \n28 1732.0 \n29 810.0 \n31 962.0 \n32 1804.0 \n33 1220.0 \n35 1521.0 \n36 962.0 \n37 1047.0 \n38 808.0 \n40 1044.0 \n42 1162.0 \n44 1011.0 \n45 980.0 \n47 1101.0 \n49 1555.0 \n51 972.0 \n52 1886.0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-17: 顧客データフレーム(df_customer)を生年月日(birth_day)で高齢順にソートし、先頭10件を全項目表示せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer.sort_values('birth_day', ascending=True).head(10)",
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 18,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n
\n \n \n \n 18817 | \n CS003813000014 | \n 村山 菜々美 | \n 1 | \n 女性 | \n 1928-11-26 | \n 90 | \n 182-0007 | \n 東京都調布市菊野台********** | \n S13003 | \n 20160214 | \n 0-00000000-0 | \n
\n \n 12328 | \n CS026813000004 | \n 吉村 朝陽 | \n 1 | \n 女性 | \n 1928-12-14 | \n 90 | \n 251-0043 | \n 神奈川県藤沢市辻堂元町********** | \n S14026 | \n 20150723 | \n 0-00000000-0 | \n
\n \n 15682 | \n CS018811000003 | \n 熊沢 美里 | \n 1 | \n 女性 | \n 1929-01-07 | \n 90 | \n 204-0004 | \n 東京都清瀬市野塩********** | \n S13018 | \n 20150403 | \n 0-00000000-0 | \n
\n \n 15302 | \n CS027803000004 | \n 内村 拓郎 | \n 0 | \n 男性 | \n 1929-01-12 | \n 90 | \n 251-0031 | \n 神奈川県藤沢市鵠沼藤が谷********** | \n S14027 | \n 20151227 | \n 0-00000000-0 | \n
\n \n 1681 | \n CS013801000003 | \n 天野 拓郎 | \n 0 | \n 男性 | \n 1929-01-15 | \n 90 | \n 274-0824 | \n 千葉県船橋市前原東********** | \n S12013 | \n 20160120 | \n 0-00000000-0 | \n
\n \n 7511 | \n CS001814000022 | \n 鶴田 里穂 | \n 1 | \n 女性 | \n 1929-01-28 | \n 90 | \n 144-0045 | \n 東京都大田区南六郷********** | \n S13001 | \n 20161012 | \n A-20090415-7 | \n
\n \n 2378 | \n CS016815000002 | \n 山元 美紀 | \n 1 | \n 女性 | \n 1929-02-22 | \n 90 | \n 184-0005 | \n 東京都小金井市桜町********** | \n S13016 | \n 20150629 | \n C-20090923-C | \n
\n \n 4680 | \n CS009815000003 | \n 中田 里穂 | \n 1 | \n 女性 | \n 1929-04-08 | \n 89 | \n 154-0014 | \n 東京都世田谷区新町********** | \n S13009 | \n 20150421 | \n D-20091021-E | \n
\n \n 16070 | \n CS005813000015 | \n 金谷 恵梨香 | \n 1 | \n 女性 | \n 1929-04-09 | \n 89 | \n 165-0032 | \n 東京都中野区鷺宮********** | \n S13005 | \n 20150506 | \n 0-00000000-0 | \n
\n \n 6305 | \n CS012813000013 | \n 宇野 南朋 | \n 1 | \n 女性 | \n 1929-04-09 | \n 89 | \n 231-0806 | \n 神奈川県横浜市中区本牧町********** | \n S14012 | \n 20150712 | \n 0-00000000-0 | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n18817 CS003813000014 村山 菜々美 1 女性 1928-11-26 90 \n12328 CS026813000004 吉村 朝陽 1 女性 1928-12-14 90 \n15682 CS018811000003 熊沢 美里 1 女性 1929-01-07 90 \n15302 CS027803000004 内村 拓郎 0 男性 1929-01-12 90 \n1681 CS013801000003 天野 拓郎 0 男性 1929-01-15 90 \n7511 CS001814000022 鶴田 里穂 1 女性 1929-01-28 90 \n2378 CS016815000002 山元 美紀 1 女性 1929-02-22 90 \n4680 CS009815000003 中田 里穂 1 女性 1929-04-08 89 \n16070 CS005813000015 金谷 恵梨香 1 女性 1929-04-09 89 \n6305 CS012813000013 宇野 南朋 1 女性 1929-04-09 89 \n\n postal_cd address application_store_cd \\\n18817 182-0007 東京都調布市菊野台********** S13003 \n12328 251-0043 神奈川県藤沢市辻堂元町********** S14026 \n15682 204-0004 東京都清瀬市野塩********** S13018 \n15302 251-0031 神奈川県藤沢市鵠沼藤が谷********** S14027 \n1681 274-0824 千葉県船橋市前原東********** S12013 \n7511 144-0045 東京都大田区南六郷********** S13001 \n2378 184-0005 東京都小金井市桜町********** S13016 \n4680 154-0014 東京都世田谷区新町********** S13009 \n16070 165-0032 東京都中野区鷺宮********** S13005 \n6305 231-0806 神奈川県横浜市中区本牧町********** S14012 \n\n application_date status_cd \n18817 20160214 0-00000000-0 \n12328 20150723 0-00000000-0 \n15682 20150403 0-00000000-0 \n15302 20151227 0-00000000-0 \n1681 20160120 0-00000000-0 \n7511 20161012 A-20090415-7 \n2378 20150629 C-20090923-C \n4680 20150421 D-20091021-E \n16070 20150506 0-00000000-0 \n6305 20150712 0-00000000-0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-18: 顧客データフレーム(df_customer)を生年月日(birth_day)で若い順にソートし、先頭10件を全項目表示せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer.sort_values('birth_day', ascending=False).head(10)",
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 19,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n
\n \n \n \n 15639 | \n CS035114000004 | \n 大村 美里 | \n 1 | \n 女性 | \n 2007-11-25 | \n 11 | \n 156-0053 | \n 東京都世田谷区桜********** | \n S13035 | \n 20150619 | \n 6-20091205-6 | \n
\n \n 7468 | \n CS022103000002 | \n 福山 はじめ | \n 9 | \n 不明 | \n 2007-10-02 | \n 11 | \n 249-0006 | \n 神奈川県逗子市逗子********** | \n S14022 | \n 20160909 | \n 0-00000000-0 | \n
\n \n 10745 | \n CS002113000009 | \n 柴田 真悠子 | \n 1 | \n 女性 | \n 2007-09-17 | \n 11 | \n 184-0014 | \n 東京都小金井市貫井南町********** | \n S13002 | \n 20160304 | \n 0-00000000-0 | \n
\n \n 19811 | \n CS004115000014 | \n 松井 京子 | \n 1 | \n 女性 | \n 2007-08-09 | \n 11 | \n 165-0031 | \n 東京都中野区上鷺宮********** | \n S13004 | \n 20161120 | \n 1-20081231-1 | \n
\n \n 7039 | \n CS002114000010 | \n 山内 遥 | \n 1 | \n 女性 | \n 2007-06-03 | \n 11 | \n 184-0015 | \n 東京都小金井市貫井北町********** | \n S13002 | \n 20160920 | \n 6-20100510-1 | \n
\n \n 3670 | \n CS025115000002 | \n 小柳 夏希 | \n 1 | \n 女性 | \n 2007-04-18 | \n 11 | \n 245-0018 | \n 神奈川県横浜市泉区上飯田町********** | \n S14025 | \n 20160116 | \n D-20100913-D | \n
\n \n 12493 | \n CS002113000025 | \n 広末 まなみ | \n 1 | \n 女性 | \n 2007-03-30 | \n 12 | \n 184-0015 | \n 東京都小金井市貫井北町********** | \n S13002 | \n 20171030 | \n 0-00000000-0 | \n
\n \n 15977 | \n CS033112000003 | \n 長野 美紀 | \n 1 | \n 女性 | \n 2007-03-22 | \n 12 | \n 245-0051 | \n 神奈川県横浜市戸塚区名瀬町********** | \n S14033 | \n 20150606 | \n 0-00000000-0 | \n
\n \n 5716 | \n CS007115000006 | \n 福岡 瞬 | \n 1 | \n 女性 | \n 2007-03-10 | \n 12 | \n 285-0845 | \n 千葉県佐倉市西志津********** | \n S12007 | \n 20151118 | \n F-20101016-F | \n
\n \n 15097 | \n CS014113000008 | \n 矢口 莉緒 | \n 1 | \n 女性 | \n 2007-03-05 | \n 12 | \n 260-0041 | \n 千葉県千葉市中央区東千葉********** | \n S12014 | \n 20150622 | \n 3-20091108-6 | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n15639 CS035114000004 大村 美里 1 女性 2007-11-25 11 \n7468 CS022103000002 福山 はじめ 9 不明 2007-10-02 11 \n10745 CS002113000009 柴田 真悠子 1 女性 2007-09-17 11 \n19811 CS004115000014 松井 京子 1 女性 2007-08-09 11 \n7039 CS002114000010 山内 遥 1 女性 2007-06-03 11 \n3670 CS025115000002 小柳 夏希 1 女性 2007-04-18 11 \n12493 CS002113000025 広末 まなみ 1 女性 2007-03-30 12 \n15977 CS033112000003 長野 美紀 1 女性 2007-03-22 12 \n5716 CS007115000006 福岡 瞬 1 女性 2007-03-10 12 \n15097 CS014113000008 矢口 莉緒 1 女性 2007-03-05 12 \n\n postal_cd address application_store_cd \\\n15639 156-0053 東京都世田谷区桜********** S13035 \n7468 249-0006 神奈川県逗子市逗子********** S14022 \n10745 184-0014 東京都小金井市貫井南町********** S13002 \n19811 165-0031 東京都中野区上鷺宮********** S13004 \n7039 184-0015 東京都小金井市貫井北町********** S13002 \n3670 245-0018 神奈川県横浜市泉区上飯田町********** S14025 \n12493 184-0015 東京都小金井市貫井北町********** S13002 \n15977 245-0051 神奈川県横浜市戸塚区名瀬町********** S14033 \n5716 285-0845 千葉県佐倉市西志津********** S12007 \n15097 260-0041 千葉県千葉市中央区東千葉********** S12014 \n\n application_date status_cd \n15639 20150619 6-20091205-6 \n7468 20160909 0-00000000-0 \n10745 20160304 0-00000000-0 \n19811 20161120 1-20081231-1 \n7039 20160920 6-20100510-1 \n3670 20160116 D-20100913-D \n12493 20171030 0-00000000-0 \n15977 20150606 0-00000000-0 \n5716 20151118 F-20101016-F \n15097 20150622 3-20091108-6 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-19: レシート明細データフレーム(df_receipt)に対し、1件あたりの売上金額(amount)が高い順にランクを付与し、先頭10件を抽出せよ。項目は顧客ID(customer_id)、売上金額(amount)、付与したランクを表示させること。なお、売上金額(amount)が等しい場合は同一順位を付与するものとする。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.concat([df_receipt[['customer_id', 'amount']] \n ,df_receipt['amount'].rank(method='min', ascending=False)], axis=1)\ndf_tmp.columns = ['customer_id', 'amount', 'ranking']\ndf_tmp.sort_values('ranking', ascending=True).head(10)",
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 20,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n ranking | \n
\n \n \n \n 1202 | \n CS011415000006 | \n 10925 | \n 1.0 | \n
\n \n 62317 | \n ZZ000000000000 | \n 6800 | \n 2.0 | \n
\n \n 54095 | \n CS028605000002 | \n 5780 | \n 3.0 | \n
\n \n 4632 | \n CS015515000034 | \n 5480 | \n 4.0 | \n
\n \n 72747 | \n ZZ000000000000 | \n 5480 | \n 4.0 | \n
\n \n 10320 | \n ZZ000000000000 | \n 5480 | \n 4.0 | \n
\n \n 97294 | \n CS021515000089 | \n 5440 | \n 7.0 | \n
\n \n 28304 | \n ZZ000000000000 | \n 5440 | \n 7.0 | \n
\n \n 92246 | \n CS009415000038 | \n 5280 | \n 9.0 | \n
\n \n 68553 | \n CS040415000200 | \n 5280 | \n 9.0 | \n
\n \n
\n
",
"text/plain": " customer_id amount ranking\n1202 CS011415000006 10925 1.0\n62317 ZZ000000000000 6800 2.0\n54095 CS028605000002 5780 3.0\n4632 CS015515000034 5480 4.0\n72747 ZZ000000000000 5480 4.0\n10320 ZZ000000000000 5480 4.0\n97294 CS021515000089 5440 7.0\n28304 ZZ000000000000 5440 7.0\n92246 CS009415000038 5280 9.0\n68553 CS040415000200 5280 9.0"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-020: レシート明細データフレーム(df_receipt)に対し、1件あたりの売上金額(amount)が高い順にランクを付与し、先頭10件を抽出せよ。項目は顧客ID(customer_id)、売上金額(amount)、付与したランクを表示させること。なお、売上金額(amount)が等しい場合でも別順位を付与すること。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.concat([df_receipt[['customer_id', 'amount']] \n ,df_receipt['amount'].rank(method='first', ascending=False)], axis=1)\ndf_tmp.columns = ['customer_id', 'amount', 'ranking']\ndf_tmp.sort_values('ranking', ascending=True).head(10)",
"execution_count": 21,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 21,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n ranking | \n
\n \n \n \n 1202 | \n CS011415000006 | \n 10925 | \n 1.0 | \n
\n \n 62317 | \n ZZ000000000000 | \n 6800 | \n 2.0 | \n
\n \n 54095 | \n CS028605000002 | \n 5780 | \n 3.0 | \n
\n \n 4632 | \n CS015515000034 | \n 5480 | \n 4.0 | \n
\n \n 10320 | \n ZZ000000000000 | \n 5480 | \n 5.0 | \n
\n \n 72747 | \n ZZ000000000000 | \n 5480 | \n 6.0 | \n
\n \n 28304 | \n ZZ000000000000 | \n 5440 | \n 7.0 | \n
\n \n 97294 | \n CS021515000089 | \n 5440 | \n 8.0 | \n
\n \n 596 | \n CS015515000083 | \n 5280 | \n 9.0 | \n
\n \n 11275 | \n CS017414000114 | \n 5280 | \n 10.0 | \n
\n \n
\n
",
"text/plain": " customer_id amount ranking\n1202 CS011415000006 10925 1.0\n62317 ZZ000000000000 6800 2.0\n54095 CS028605000002 5780 3.0\n4632 CS015515000034 5480 4.0\n10320 ZZ000000000000 5480 5.0\n72747 ZZ000000000000 5480 6.0\n28304 ZZ000000000000 5440 7.0\n97294 CS021515000089 5440 8.0\n596 CS015515000083 5280 9.0\n11275 CS017414000114 5280 10.0"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-021: レシート明細データフレーム(df_receipt)に対し、件数をカウントせよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "len(df_receipt)",
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 22,
"data": {
"text/plain": "104681"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-022: Count the number of unique customer IDs (customer_id) in the receipt detail dataframe (df_receipt)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "len(df_receipt['customer_id'].unique())",
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 23,
"data": {
"text/plain": "8307"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-023: For the receipt detail dataframe (df_receipt), sum the sales amount (amount) and the sales quantity (quantity) for each store code (store_cd)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('store_cd').agg({'amount':'sum', 'quantity':'sum'}).reset_index()",
"execution_count": 24,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 24,
"data": {
"text/plain": " store_cd amount quantity\n0 S12007 638761 2099\n1 S12013 787513 2425\n2 S12014 725167 2358\n3 S12029 794741 2555\n4 S12030 684402 2403\n5 S13001 811936 2347\n6 S13002 727821 2340\n7 S13003 764294 2197\n8 S13004 779373 2390\n9 S13005 629876 2004\n10 S13008 809288 2491\n11 S13009 808870 2486\n12 S13015 780873 2248\n13 S13016 793773 2432\n14 S13017 748221 2376\n15 S13018 790535 2562\n16 S13019 827833 2541\n17 S13020 796383 2383\n18 S13031 705968 2336\n19 S13032 790501 2491\n20 S13035 715869 2219\n21 S13037 693087 2344\n22 S13038 708884 2337\n23 S13039 611888 1981\n24 S13041 728266 2233\n25 S13043 587895 1881\n26 S13044 520764 1729\n27 S13051 107452 354\n28 S13052 100314 250\n29 S14006 712839 2284\n30 S14010 790361 2290\n31 S14011 805724 2434\n32 S14012 720600 2412\n33 S14021 699511 2231\n34 S14022 651328 2047\n35 S14023 727630 2258\n36 S14024 736323 2417\n37 S14025 755581 2394\n38 S14026 824537 2503\n39 S14027 714550 2303\n40 S14028 786145 2458\n41 S14033 725318 2282\n42 S14034 653681 2024\n43 S14036 203694 635\n44 S14040 701858 2233\n45 S14042 534689 1935\n46 S14045 458484 1398\n47 S14046 412646 1354\n48 S14047 338329 1041\n49 S14048 234276 769\n50 S14049 230808 788\n51 S14050 167090 580"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-024: For the receipt detail dataframe (df_receipt), find the most recent sales date (sales_ymd) for each customer ID (customer_id) and display 10 rows."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('customer_id').sales_ymd.max().reset_index().head(10)",
"execution_count": 25,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 25,
"data": {
"text/plain": " customer_id sales_ymd\n0 CS001113000004 20190308\n1 CS001114000005 20190731\n2 CS001115000010 20190405\n3 CS001205000004 20190625\n4 CS001205000006 20190224\n5 CS001211000025 20190322\n6 CS001212000027 20170127\n7 CS001212000031 20180906\n8 CS001212000046 20170811\n9 CS001212000070 20191018"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-025: For the receipt detail dataframe (df_receipt), find the oldest sales date (sales_ymd) for each customer ID (customer_id) and display 10 rows."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('customer_id').agg({'sales_ymd':'min'}).head(10)",
"execution_count": 26,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 26,
"data": {
"text/plain": " sales_ymd\ncustomer_id \nCS001113000004 20190308\nCS001114000005 20180503\nCS001115000010 20171228\nCS001205000004 20170914\nCS001205000006 20180207\nCS001211000025 20190322\nCS001212000027 20170127\nCS001212000031 20180906\nCS001212000046 20170811\nCS001212000070 20191018"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-026: For the receipt detail dataframe (df_receipt), find both the most recent and the oldest sales date (sales_ymd) for each customer ID (customer_id), and display 10 rows where the two dates differ."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()\ndf_tmp.columns = [\"_\".join(pair) for pair in df_tmp.columns]\ndf_tmp.query('sales_ymd_max != sales_ymd_min').head(10)",
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 27,
"data": {
"text/plain": " customer_id_ sales_ymd_max sales_ymd_min\n1 CS001114000005 20190731 20180503\n2 CS001115000010 20190405 20171228\n3 CS001205000004 20190625 20170914\n4 CS001205000006 20190224 20180207\n13 CS001214000009 20190902 20170306\n14 CS001214000017 20191006 20180828\n16 CS001214000048 20190929 20171109\n17 CS001214000052 20190617 20180208\n20 CS001215000005 20181021 20170206\n21 CS001215000040 20171022 20170214"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-027: For the receipt detail dataframe (df_receipt), compute the mean sales amount (amount) for each store code (store_cd) and display the top 5 in descending order."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('store_cd').agg({'amount':'mean'}).reset_index().sort_values('amount', ascending=False).head(5)",
"execution_count": 28,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 28,
"data": {
"text/plain": " store_cd amount\n28 S13052 402.867470\n12 S13015 351.111960\n7 S13003 350.915519\n30 S14010 348.791262\n5 S13001 348.470386"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-028: For the receipt detail dataframe (df_receipt), compute the median sales amount (amount) for each store code (store_cd) and display the top 5 in descending order."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('store_cd').agg({'amount':'median'}).reset_index().sort_values('amount', ascending=False).head(5)",
"execution_count": 29,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 29,
"data": {
"text/plain": " store_cd amount\n28 S13052 190\n30 S14010 188\n51 S14050 185\n44 S14040 180\n7 S13003 180"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-029: For the receipt detail dataframe (df_receipt), find the mode of the product code (product_cd) for each store code (store_cd)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('store_cd').product_cd.apply(lambda x: x.mode()).reset_index()",
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 30,
"data": {
"text/plain": " store_cd level_1 product_cd\n0 S12007 0 P060303001\n1 S12013 0 P060303001\n2 S12014 0 P060303001\n3 S12029 0 P060303001\n4 S12030 0 P060303001\n5 S13001 0 P060303001\n6 S13002 0 P060303001\n7 S13003 0 P071401001\n8 S13004 0 P060303001\n9 S13005 0 P040503001\n10 S13008 0 P060303001\n11 S13009 0 P060303001\n12 S13015 0 P071401001\n13 S13016 0 P071102001\n14 S13017 0 P060101002\n15 S13018 0 P071401001\n16 S13019 0 P071401001\n17 S13020 0 P071401001\n18 S13031 0 P060303001\n19 S13032 0 P060303001\n20 S13035 0 P040503001\n21 S13037 0 P060303001\n22 S13038 0 P060303001\n23 S13039 0 P071401001\n24 S13041 0 P071401001\n25 S13043 0 P060303001\n26 S13044 0 P060303001\n27 S13051 0 P050102001\n28 S13051 1 P071003001\n29 S13051 2 P080804001\n30 S13052 0 P050101001\n31 S14006 0 P060303001\n32 S14010 0 P060303001\n33 S14011 0 P060101001\n34 S14012 0 P060303001\n35 S14021 0 P060101001\n36 S14022 0 P060303001\n37 S14023 0 P071401001\n38 S14024 0 P060303001\n39 S14025 0 P060303001\n40 S14026 0 P071401001\n41 S14027 0 P060303001\n42 S14028 0 P060303001\n43 S14033 0 P071401001\n44 S14034 0 P060303001\n45 S14036 0 P040503001\n46 S14036 1 P060101001\n47 S14040 0 P060303001\n48 S14042 0 P050101001\n49 S14045 0 P060303001\n50 S14046 0 P060303001\n51 S14047 0 P060303001\n52 S14048 0 P050101001\n53 S14049 0 P060303001\n54 S14050 0 P060303001"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
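"source": "---\n> P-030: For the receipt detail dataframe (df_receipt), compute the variance of the sales amount (amount) for each store code (store_cd), using divisor n (ddof=0) rather than the unbiased n-1 estimator, and display the top 5 in descending order."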
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('store_cd').amount.var(ddof=0).reset_index().sort_values('amount', ascending=False).head(5)",
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 31,
"data": {
"text/plain": " store_cd amount\n28 S13052 440088.701311\n31 S14011 306314.558164\n42 S14034 296920.081011\n5 S13001 295431.993329\n12 S13015 295294.361116"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
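"source": "---\n> P-031: For the receipt detail dataframe (df_receipt), compute the standard deviation of the sales amount (amount) for each store code (store_cd), using divisor n (ddof=0) rather than the unbiased n-1 estimator, and display the top 5 in descending order."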
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('store_cd').amount.std(ddof=0).reset_index().sort_values('amount', ascending=False).head(5)",
"execution_count": 32,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 32,
"data": {
"text/plain": " store_cd amount\n28 S13052 663.391816\n31 S14011 553.456916\n42 S14034 544.903736\n5 S13001 543.536561\n12 S13015 543.409938"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-032: For the sales amount (amount) in the receipt detail dataframe (df_receipt), compute the percentile values at 25% intervals."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Code example 1\nnp.percentile(df_receipt['amount'], q=[25, 50, 75, 100])",
"execution_count": 33,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 33,
"data": {
"text/plain": "array([ 102., 170., 288., 10925.])"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Code example 2\ndf_receipt.amount.quantile(q=np.arange(5)/4)",
"execution_count": 34,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 34,
"data": {
"text/plain": "0.00 10.0\n0.25 102.0\n0.50 170.0\n0.75 288.0\n1.00 10925.0\nName: amount, dtype: float64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-033: For the receipt detail dataframe (df_receipt), compute the mean sales amount (amount) for each store code (store_cd) and extract the stores whose mean is 330 or more."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_receipt.groupby('store_cd').amount.mean().reset_index().query('amount >= 330')",
"execution_count": 35,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 35,
"data": {
"text/plain": " store_cd amount\n1 S12013 330.194130\n5 S13001 348.470386\n7 S13003 350.915519\n8 S13004 330.943949\n12 S13015 351.111960\n16 S13019 330.208616\n17 S13020 337.879932\n28 S13052 402.867470\n30 S14010 348.791262\n31 S14011 335.718333\n38 S14026 332.340588\n46 S14045 330.082073\n48 S14047 330.077073"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
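"source": "---\n> P-034: For the receipt detail dataframe (df_receipt), sum the sales amount (amount) for each customer ID (customer_id) and compute the mean of those totals across all customers. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation.\n"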
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Version without query\ndf_receipt[~df_receipt['customer_id'].str.startswith(\"Z\")].groupby('customer_id').amount.sum().mean()",
"execution_count": 36,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 36,
"data": {
"text/plain": "2547.742234529256"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Version with query\ndf_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python').groupby('customer_id').amount.sum().mean()",
"execution_count": 37,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 37,
"data": {
"text/plain": "2547.742234529256"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
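"source": "---\n> P-035: For the receipt detail dataframe (df_receipt), sum the sales amount (amount) for each customer ID (customer_id), compute the mean across all customers, and extract the customers whose total is at or above that mean. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation. Displaying 10 rows is sufficient."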
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Exclude non-members (IDs starting with \"Z\") before both the mean and the extraction\ndf_amount_sum = df_receipt[~df_receipt['customer_id'].str.startswith(\"Z\")].groupby('customer_id').amount.sum().reset_index()\namount_mean = df_amount_sum['amount'].mean()\ndf_amount_sum[df_amount_sum['amount'] >= amount_mean].head(10)",
"execution_count": 38,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 38,
"data": {
"text/plain": " customer_id amount\n2 CS001115000010 3044\n4 CS001205000006 3337\n13 CS001214000009 4685\n14 CS001214000017 4132\n17 CS001214000052 5639\n21 CS001215000040 3496\n30 CS001304000006 3726\n32 CS001305000005 3485\n33 CS001305000011 4370\n53 CS001315000180 3300"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
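"source": "---\n> P-036: Inner-join the receipt detail dataframe (df_receipt) with the store dataframe (df_store), and display 10 rows containing all columns of the receipt detail dataframe plus the store name (store_name) from the store dataframe."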
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.merge(df_receipt, df_store[['store_cd','store_name']], how='inner', on='store_cd').head(10)",
"execution_count": 39,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 39,
"data": {
"text/plain": " sales_ymd sales_epoch store_cd receipt_no receipt_sub_no \\\n0 20181103 1257206400 S14006 112 1 \n1 20181116 1258329600 S14006 112 2 \n2 20170118 1200614400 S14006 1162 1 \n3 20190524 1274659200 S14006 1192 1 \n4 20190419 1271635200 S14006 112 2 \n5 20181119 1258588800 S14006 1152 2 \n6 20171211 1228953600 S14006 1132 2 \n7 20191021 1287619200 S14006 1112 2 \n8 20170710 1215648000 S14006 1132 2 \n9 20190805 1280966400 S14006 112 1 \n\n customer_id product_cd quantity amount store_name \n0 CS006214000001 P070305012 1 158 葛が谷店 \n1 ZZ000000000000 P080401001 1 48 葛が谷店 \n2 CS006815000006 P050406035 1 220 葛が谷店 \n3 CS006514000034 P060104003 1 80 葛が谷店 \n4 ZZ000000000000 P060501002 1 148 葛が谷店 \n5 ZZ000000000000 P050701001 1 88 葛が谷店 \n6 CS006515000175 P090903001 1 80 葛が谷店 \n7 CS006415000221 P040602001 1 405 葛が谷店 \n8 CS006411000036 P090301051 1 330 葛が谷店 \n9 CS006211000012 P050104001 1 115 葛が谷店 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
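"source": "---\n> P-037: Inner-join the product dataframe (df_product) with the category dataframe (df_category), and display 10 rows containing all columns of the product dataframe plus the small category name (category_small_name) from the category dataframe."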
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.merge(df_product\n , df_category[['category_major_cd', 'category_medium_cd','category_small_cd','category_small_name']]\n , how='inner', on=['category_major_cd', 'category_medium_cd','category_small_cd']).head(10)",
"execution_count": 40,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 40,
"data": {
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost category_small_name \n0 198.0 149.0 弁当類 \n1 218.0 164.0 弁当類 \n2 230.0 173.0 弁当類 \n3 248.0 186.0 弁当類 \n4 268.0 201.0 弁当類 \n5 298.0 224.0 弁当類 \n6 338.0 254.0 弁当類 \n7 420.0 315.0 弁当類 \n8 498.0 374.0 弁当類 \n9 580.0 435.0 弁当類 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
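"source": "---\n> P-038: From the customer dataframe (df_customer) and the receipt detail dataframe (df_receipt), compute the total sales amount for each customer. Customers with no purchase history should be shown with a sales amount of 0. Include only customers whose gender code (gender_cd) is female (1), and exclude non-members (customer IDs starting with 'Z'). Displaying 10 rows is sufficient."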
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_amount_sum = df_receipt.groupby('customer_id').amount.sum().reset_index()\ndf_tmp = df_customer.query('gender_cd == \"1\" and not customer_id.str.startswith(\"Z\")', engine='python')\npd.merge(df_tmp['customer_id'], df_amount_sum, how='left', on='customer_id').fillna(0).head(10)",
"execution_count": 41,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 41,
"data": {
"text/plain": " customer_id amount\n0 CS021313000114 0.0\n1 CS031415000172 5088.0\n2 CS028811000001 0.0\n3 CS001215000145 875.0\n4 CS015414000103 3122.0\n5 CS033513000180 868.0\n6 CS035614000014 0.0\n7 CS011215000048 3444.0\n8 CS009413000079 0.0\n9 CS040412000191 210.0"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
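"source": "---\n> P-039: From the receipt detail dataframe (df_receipt), extract the top 20 customers by number of distinct sales days and the top 20 customers by total sales amount, then combine the two with a full outer join. Exclude non-members (customer IDs starting with 'Z')."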
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_sum = df_receipt.groupby('customer_id').amount.sum().reset_index()\ndf_sum = df_sum.query('not customer_id.str.startswith(\"Z\")', engine='python')\ndf_sum = df_sum.sort_values('amount', ascending=False).head(20)\n\ndf_cnt = df_receipt[~df_receipt.duplicated(subset=['customer_id', 'sales_ymd'])]\ndf_cnt = df_cnt.query('not customer_id.str.startswith(\"Z\")', engine='python')\ndf_cnt = df_cnt.groupby('customer_id').sales_ymd.count().reset_index()\ndf_cnt = df_cnt.sort_values('sales_ymd', ascending=False).head(20)\n\npd.merge(df_sum, df_cnt, how='outer', on='customer_id')",
"execution_count": 42,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 42,
"data": {
"text/plain": " customer_id amount sales_ymd\n0 CS017415000097 23086.0 20.0\n1 CS015415000185 20153.0 22.0\n2 CS031414000051 19202.0 19.0\n3 CS028415000007 19127.0 21.0\n4 CS001605000009 18925.0 NaN\n5 CS010214000010 18585.0 22.0\n6 CS016415000141 18372.0 20.0\n7 CS006515000023 18372.0 NaN\n8 CS011414000106 18338.0 NaN\n9 CS038415000104 17847.0 NaN\n10 CS035414000024 17615.0 NaN\n11 CS021515000089 17580.0 NaN\n12 CS032414000072 16563.0 NaN\n13 CS016415000101 16348.0 NaN\n14 CS011415000006 16094.0 NaN\n15 CS034415000047 16083.0 NaN\n16 CS007514000094 15735.0 NaN\n17 CS009414000059 15492.0 NaN\n18 CS030415000034 15468.0 NaN\n19 CS015515000034 15300.0 NaN\n20 CS040214000008 NaN 23.0\n21 CS010214000002 NaN 21.0\n22 CS014214000023 NaN 19.0\n23 CS022515000226 NaN 19.0\n24 CS021515000172 NaN 19.0\n25 CS039414000052 NaN 19.0\n26 CS021514000045 NaN 19.0\n27 CS022515000028 NaN 18.0\n28 CS030214000008 NaN 18.0\n29 CS021515000056 NaN 18.0\n30 CS014415000077 NaN 18.0\n31 CS021515000211 NaN 18.0\n32 CS032415000209 NaN 18.0\n33 CS031414000073 NaN 18.0"
},
"metadata": {}
}
]
},
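{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-040: We want to find out how many records result from combining every store with every product. Compute the number of rows in the Cartesian product of the stores (df_store) and the products (df_product)."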
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_store_tmp = df_store.copy()\ndf_product_tmp = df_product.copy()\n\ndf_store_tmp['key'] = 0\ndf_product_tmp['key'] = 0\nlen(pd.merge(df_store_tmp, df_product_tmp, how='outer', on='key'))",
"execution_count": 43,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 43,
"data": {
"text/plain": "531590"
},
"metadata": {}
}
]
},
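{
"metadata": {},
"cell_type": "markdown",
"source": "As an alternative sketch, pd.merge accepts how='cross' in pandas 1.2 and later, which builds the Cartesian product without a dummy key column. This assumes a newer pandas than the Python 3.6 kernel used for this notebook may provide, and the two small frames are made-up stand-ins for df_store and df_product, not the real data.

```python
import pandas as pd

# Hypothetical miniature tables standing in for df_store and df_product
stores = pd.DataFrame({'store_cd': ['S001', 'S002', 'S003']})
products = pd.DataFrame({'product_cd': ['P01', 'P02']})

# how='cross' pairs every store row with every product row (pandas >= 1.2)
cross = pd.merge(stores, products, how='cross')
n_rows = len(cross)
print(n_rows)
```

The row count equals len(stores) * len(products), which gives a quick sanity check for the dummy-key approach as well."
},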
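{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-041: Aggregate the sales amount (amount) in the receipt detail dataframe (df_receipt) by date (sales_ymd) and compute the change in sales amount from the previous day. Displaying 10 rows of the result is sufficient."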
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_sales_amount_by_date = df_receipt[['sales_ymd', 'amount']].groupby('sales_ymd').sum().reset_index()\ndf_sales_amount_by_date = pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift()], axis=1)\ndf_sales_amount_by_date.columns = ['sales_ymd','amount','lag_ymd','lag_amount']\ndf_sales_amount_by_date['diff_amount'] = df_sales_amount_by_date['amount'] - df_sales_amount_by_date['lag_amount']\ndf_sales_amount_by_date.head(10)",
"execution_count": 44,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 44,
"data": {
"text/plain": " sales_ymd amount lag_ymd lag_amount diff_amount\n0 20170101 33723 NaN NaN NaN\n1 20170102 24165 20170101.0 33723.0 -9558.0\n2 20170103 27503 20170102.0 24165.0 3338.0\n3 20170104 36165 20170103.0 27503.0 8662.0\n4 20170105 37830 20170104.0 36165.0 1665.0\n5 20170106 32387 20170105.0 37830.0 -5443.0\n6 20170107 23415 20170106.0 32387.0 -8972.0\n7 20170108 24737 20170107.0 23415.0 1322.0\n8 20170109 26718 20170108.0 24737.0 1981.0\n9 20170110 20143 20170109.0 26718.0 -6575.0"
},
"metadata": {}
}
]
},
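{
"metadata": {},
"cell_type": "markdown",
"source": "The same day-over-day change can also be sketched with Series.diff(), which subtracts the previous row in one step. The frame below is a hypothetical stand-in for the aggregated data; its three totals are the first three daily amounts from the recorded result for this problem.

```python
import pandas as pd

# Hypothetical daily totals standing in for the aggregated df_receipt
daily = pd.DataFrame({
    'sales_ymd': [20170101, 20170102, 20170103],
    'amount': [33723, 24165, 27503],
})

# diff() subtracts the previous row, so the first row has no predecessor (NaN)
daily['diff_amount'] = daily['amount'].diff()
print(daily)
```

Both shift() and diff() compare against the previous row, not the previous calendar day, so the two only coincide when no dates are missing from the data."
},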
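{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-042: Aggregate the sales amount (amount) in the receipt detail dataframe (df_receipt) by date (sales_ymd) and, for each date, join the data from 1 day, 2 days, and 3 days earlier. Displaying 10 rows of the result is sufficient."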
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Example 1: long (vertical) format\ndf_sales_amount_by_date = df_receipt[['sales_ymd', 'amount']].groupby('sales_ymd').sum().reset_index()\nfor i in range(1, 4):\n    if i == 1:\n        df_lag = pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift(i)], axis=1)\n    else:\n        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0\n        df_lag = pd.concat([df_lag, pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift(i)], axis=1)])\ndf_lag.columns = ['sales_ymd', 'amount', 'lag_ymd', 'lag_amount']\ndf_lag.dropna().sort_values(['sales_ymd','lag_ymd']).head(10)",
"execution_count": 45,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 45,
"data": {
"text/plain": " sales_ymd amount lag_ymd lag_amount\n1 20170102 24165 20170101.0 33723.0\n2 20170103 27503 20170101.0 33723.0\n2 20170103 27503 20170102.0 24165.0\n3 20170104 36165 20170101.0 33723.0\n3 20170104 36165 20170102.0 24165.0\n3 20170104 36165 20170103.0 27503.0\n4 20170105 37830 20170102.0 24165.0\n4 20170105 37830 20170103.0 27503.0\n4 20170105 37830 20170104.0 36165.0\n5 20170106 32387 20170103.0 27503.0"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Example 2: wide (horizontal) format\ndf_sales_amount_by_date = df_receipt[['sales_ymd', 'amount']].groupby('sales_ymd').sum().reset_index()\nfor i in range(1, 4):\n    if i == 1:\n        df_lag = pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift(i)], axis=1)\n    else:\n        df_lag = pd.concat([df_lag, df_sales_amount_by_date.shift(i)], axis=1)\ndf_lag.columns = ['sales_ymd', 'amount', 'lag_ymd_1', 'lag_amount_1', 'lag_ymd_2', 'lag_amount_2', 'lag_ymd_3', 'lag_amount_3']\ndf_lag.dropna().sort_values(['sales_ymd']).head(10)",
"execution_count": 46,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 46,
"data": {
"text/plain": " sales_ymd amount lag_ymd_1 lag_amount_1 lag_ymd_2 lag_amount_2 \\\n3 20170104 36165 20170103.0 27503.0 20170102.0 24165.0 \n4 20170105 37830 20170104.0 36165.0 20170103.0 27503.0 \n5 20170106 32387 20170105.0 37830.0 20170104.0 36165.0 \n6 20170107 23415 20170106.0 32387.0 20170105.0 37830.0 \n7 20170108 24737 20170107.0 23415.0 20170106.0 32387.0 \n8 20170109 26718 20170108.0 24737.0 20170107.0 23415.0 \n9 20170110 20143 20170109.0 26718.0 20170108.0 24737.0 \n10 20170111 24287 20170110.0 20143.0 20170109.0 26718.0 \n11 20170112 23526 20170111.0 24287.0 20170110.0 20143.0 \n12 20170113 28004 20170112.0 23526.0 20170111.0 24287.0 \n\n lag_ymd_3 lag_amount_3 \n3 20170101.0 33723.0 \n4 20170102.0 24165.0 \n5 20170103.0 27503.0 \n6 20170104.0 36165.0 \n7 20170105.0 37830.0 \n8 20170106.0 32387.0 \n9 20170107.0 23415.0 \n10 20170108.0 24737.0 \n11 20170109.0 26718.0 \n12 20170110.0 20143.0 "
},
"metadata": {}
}
]
},
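{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-043: Join the receipt detail dataframe (df_receipt) with the customer dataframe (df_customer) and create a sales summary dataframe (df_sales_summary) that totals the sales amount (amount) by gender (gender) and age group (computed from age). Gender is coded as 0 for male, 1 for female, and 9 for unknown.\n>\n> The result must have four columns: age group, sales amount for women, sales amount for men, and sales amount for unknown gender (a cross tabulation with age group on the rows and gender on the columns). Age groups must be 10-year bins."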
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.merge(df_receipt, df_customer, how ='inner', on=\"customer_id\")\ndf_tmp['era'] = df_tmp['age'].apply(lambda x: math.floor(x / 10) * 10)\ndf_sales_summary = pd.pivot_table(df_tmp, index='era', columns='gender_cd', values='amount', aggfunc='sum').reset_index()\ndf_sales_summary.columns = ['era', 'male', 'female', 'unknown']\ndf_sales_summary",
"execution_count": 47,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 47,
"data": {
"text/plain": " era male female unknown\n0 10 1591.0 149836.0 4317.0\n1 20 72940.0 1363724.0 44328.0\n2 30 177322.0 693047.0 50441.0\n3 40 19355.0 9320791.0 483512.0\n4 50 54320.0 6685192.0 342923.0\n5 60 272469.0 987741.0 71418.0\n6 70 13435.0 29764.0 2427.0\n7 80 46360.0 262923.0 5111.0\n8 90 NaN 6260.0 NaN"
},
"metadata": {}
}
]
},
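{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-044: The sales summary dataframe (df_sales_summary) created in the previous problem holds the sales by gender in wide format. Convert it to long format with three columns: age group, gender code, and sales amount. Use the gender codes '00' for male, '01' for female, and '99' for unknown."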
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_sales_summary = df_sales_summary.set_index('era'). \\\n stack().reset_index().replace({'female':'01',\n 'male':'00',\n 'unknown':'99'}).rename(columns={'level_1':'gender_cd', 0: 'amount'})",
"execution_count": 48,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_sales_summary",
"execution_count": 49,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 49,
"data": {
"text/plain": " era gender_cd amount\n0 10 00 1591.0\n1 10 01 149836.0\n2 10 99 4317.0\n3 20 00 72940.0\n4 20 01 1363724.0\n5 20 99 44328.0\n6 30 00 177322.0\n7 30 01 693047.0\n8 30 99 50441.0\n9 40 00 19355.0\n10 40 01 9320791.0\n11 40 99 483512.0\n12 50 00 54320.0\n13 50 01 6685192.0\n14 50 99 342923.0\n15 60 00 272469.0\n16 60 01 987741.0\n17 60 99 71418.0\n18 70 00 13435.0\n19 70 01 29764.0\n20 70 99 2427.0\n21 80 00 46360.0\n22 80 01 262923.0\n23 80 99 5111.0\n24 90 01 6260.0"
},
"metadata": {}
}
]
},
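{
"metadata": {},
"cell_type": "markdown",
"source": "A wide-to-long conversion like this can also be sketched with DataFrame.melt(). The two-row frame below is a hypothetical excerpt in the same layout as df_sales_summary, using the amounts recorded for the 10s and 20s age groups.

```python
import pandas as pd

# Hypothetical two-row summary in the same wide layout as df_sales_summary
wide = pd.DataFrame({
    'era': [10, 20],
    'male': [1591.0, 72940.0],
    'female': [149836.0, 1363724.0],
    'unknown': [4317.0, 44328.0],
})

# melt() turns the three gender columns into (gender_cd, amount) pairs
long_df = wide.melt(id_vars='era', var_name='gender_cd', value_name='amount')

# Map the former column names to the codes required by the problem
long_df['gender_cd'] = long_df['gender_cd'].map({'male': '00', 'female': '01', 'unknown': '99'})
long_df = long_df.sort_values(['era', 'gender_cd']).reset_index(drop=True)
print(long_df)
```

One difference to keep in mind: melt() keeps rows whose amount is NaN, while stack() has traditionally dropped them by default, so a dropna() may be needed to reproduce the stacked result exactly."
},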
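{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-045: The date of birth (birth_day) in the customer dataframe (df_customer) is stored as a date type (Date). Convert it to a string in YYYYMMDD format and extract it together with the customer ID (customer_id). Extracting 10 rows is sufficient."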
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df_customer['customer_id'],\n pd.to_datetime(df_customer['birth_day']).dt.strftime('%Y%m%d')],\n axis = 1).head(10)",
"execution_count": 50,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 50,
"data": {
"text/plain": " customer_id birth_day\n0 CS021313000114 19810429\n1 CS037613000071 19520401\n2 CS031415000172 19761004\n3 CS028811000001 19330327\n4 CS001215000145 19950329\n5 CS020401000016 19740915\n6 CS015414000103 19770809\n7 CS029403000008 19730817\n8 CS015804000004 19310502\n9 CS033513000180 19620711"
},
"metadata": {}
}
]
},
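{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-046: The application date (application_date) in the customer dataframe (df_customer) is stored as a string in YYYYMMDD format. Convert it to a date type (date or datetime) and extract it together with the customer ID (customer_id). Extracting 10 rows is sufficient."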
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# application_date is read from the CSV as an integer, so cast it to str first;\n# otherwise to_datetime treats the value as nanoseconds since 1970-01-01\npd.concat([df_customer['customer_id'], pd.to_datetime(df_customer['application_date'].astype('str'))], axis=1).head(10)",
"execution_count": 51,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 51,
"data": {
"text/plain": " customer_id application_date\n0 CS021313000114 2015-09-05\n1 CS037613000071 2015-04-14\n2 CS031415000172 2015-05-29\n3 CS028811000001 2016-01-15\n4 CS001215000145 2017-06-05\n5 CS020401000016 2015-02-25\n6 CS015414000103 2015-07-22\n7 CS029403000008 2015-05-15\n8 CS015804000004 2015-06-07\n9 CS033513000180 2015-07-28"
},
"metadata": {}
}
]
},
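{
"metadata": {},
"cell_type": "markdown",
"source": "When YYYYMMDD values arrive as integers, which is what read_csv infers for an all-digit column, pd.to_datetime interprets them as nanoseconds since 1970-01-01 unless told otherwise. The sketch below contrasts the two behaviours on a hypothetical two-element Series; passing an explicit format is an added precaution here, not something the original answer does.

```python
import pandas as pd

# Hypothetical application dates stored as integers, as read_csv would infer
raw = pd.Series([20150905, 20150414])

# Without a format, integers are interpreted as nanoseconds since 1970-01-01
as_epoch = pd.to_datetime(raw)

# Casting to str and naming the format parses the digits as year, month, day
as_date = pd.to_datetime(raw.astype(str), format='%Y%m%d')
print(as_epoch.dt.year.tolist())
print(as_date.dt.strftime('%Y-%m-%d').tolist())
```

A result full of 1970-01-01 timestamps is therefore a sign that the column was numeric when it was converted."
},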
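{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-047: The sales date (sales_ymd) in the receipt detail dataframe (df_receipt) is stored as a number in YYYYMMDD format. Convert it to a date type (date or datetime) and extract it together with the receipt number (receipt_no) and the receipt sub-number (receipt_sub_no). Extracting 10 rows is sufficient."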
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_ymd'].astype('str'))],axis=1).head(10)",
"execution_count": 52,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 52,
"data": {
"text/plain": " receipt_no receipt_sub_no sales_ymd\n0 112 1 2018-11-03\n1 1132 2 2018-11-18\n2 1102 1 2017-07-12\n3 1132 1 2019-02-05\n4 1102 2 2018-08-21\n5 1112 1 2019-06-05\n6 1102 2 2018-12-05\n7 1102 1 2019-09-22\n8 1112 2 2017-05-04\n9 1102 1 2019-10-10"
},
"metadata": {}
}
]
},
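{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-048: The sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) are stored as numeric UNIX seconds. Convert them to a date type (date or datetime) and extract them together with the receipt number (receipt_no) and the receipt sub-number (receipt_sub_no). Extracting 10 rows is sufficient."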
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_epoch'], unit='s')],axis=1).head(10)",
"execution_count": 53,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 53,
"data": {
"text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 2009-11-03\n1 1132 2 2009-11-18\n2 1102 1 2008-07-12\n3 1132 1 2010-02-05\n4 1102 2 2009-08-21\n5 1112 1 2010-06-05\n6 1102 2 2009-12-05\n7 1102 1 2010-09-22\n8 1112 2 2008-05-04\n9 1102 1 2010-10-10"
},
"metadata": {}
}
]
},
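{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-049: Convert the sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) to a date type (timestamp), take only the \"year\", and extract it together with the receipt number (receipt_no) and the receipt sub-number (receipt_sub_no). Extracting 10 rows is sufficient."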
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_epoch'], unit='s').dt.year],axis=1).head(10)",
"execution_count": 54,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 54,
"data": {
"text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 2009\n1 1132 2 2009\n2 1102 1 2008\n3 1132 1 2010\n4 1102 2 2009\n5 1112 1 2010\n6 1102 2 2009\n7 1102 1 2010\n8 1112 2 2008\n9 1102 1 2010"
},
"metadata": {}
}
]
},
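{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-050: Convert the sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) to a date type (timestamp), take only the \"month\", and extract it together with the receipt number (receipt_no) and the receipt sub-number (receipt_sub_no). The \"month\" must be zero-padded to two digits. Extracting 10 rows is sufficient."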
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# dt.month also returns the month, but strftime is used here to get a zero-padded two-digit string\npd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n           pd.to_datetime(df_receipt['sales_epoch'], unit='s').dt.strftime('%m')],axis=1).head(10)",
"execution_count": 55,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 55,
"data": {
"text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 11\n1 1132 2 11\n2 1102 1 07\n3 1132 1 02\n4 1102 2 08\n5 1112 1 06\n6 1102 2 12\n7 1102 1 09\n8 1112 2 05\n9 1102 1 10"
},
"metadata": {}
}
]
},
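{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-051: Convert the sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) to a date type (timestamp), take only the \"day\", and extract it together with the receipt number (receipt_no) and the receipt sub-number (receipt_sub_no). The \"day\" must be zero-padded to two digits. Extracting 10 rows is sufficient."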
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# dt.day also returns the day, but strftime is used here to get a zero-padded two-digit string\npd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n           pd.to_datetime(df_receipt['sales_epoch'], unit='s').dt.strftime('%d')],axis=1).head(10)",
"execution_count": 56,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 56,
"data": {
"text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 03\n1 1132 2 18\n2 1102 1 12\n3 1132 1 05\n4 1102 2 21\n5 1112 1 05\n6 1102 2 05\n7 1102 1 22\n8 1112 2 04\n9 1102 1 10"
},
"metadata": {}
}
]
},
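{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-052: Total the sales amount (amount) in the receipt detail dataframe (df_receipt) by customer ID (customer_id), then binarize each total as 0 if it is 2,000 yen or less and 1 if it exceeds 2,000 yen, and display 10 rows together with the customer ID and the total sales amount. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation."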
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python')\ndf_sales_amount = df_sales_amount[['customer_id', 'amount']].groupby('customer_id').sum().reset_index()\ndf_sales_amount['sales_flg'] = df_sales_amount['amount'].apply(lambda x: 1 if x > 2000 else 0)\ndf_sales_amount.head(10)",
"execution_count": 57,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 57,
"data": {
"text/plain": " customer_id amount sales_flg\n0 CS001113000004 1298 0\n1 CS001114000005 626 0\n2 CS001115000010 3044 1\n3 CS001205000004 1988 0\n4 CS001205000006 3337 1\n5 CS001211000025 456 0\n6 CS001212000027 448 0\n7 CS001212000031 296 0\n8 CS001212000046 228 0\n9 CS001212000070 456 0"
},
"metadata": {}
}
]
},
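{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-053: Binarize the postal code (postal_cd) in the customer dataframe (df_customer) as 1 for Tokyo (first three digits from 100 to 209) and 0 otherwise. Then join the result with the receipt detail dataframe (df_receipt) and count, for each of the two values, the number of customers with at least one purchase over the entire period."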
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = df_customer[['customer_id', 'postal_cd']].copy()\ndf_tmp['postal_flg'] = df_tmp['postal_cd'].apply(lambda x: 1 if 100 <= int(x[0:3]) <= 209 else 0)\n\npd.merge(df_tmp, df_receipt, how='inner', on='customer_id'). \\\n groupby('postal_flg').agg({'customer_id':'nunique'})\n\n",
"execution_count": 58,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 58,
"data": {
"text/plain": " customer_id\npostal_flg \n0 3906\n1 4400"
},
"metadata": {}
}
]
},
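{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-054: Every address (address) in the customer dataframe (df_customer) is in Saitama (埼玉県), Chiba (千葉県), Tokyo (東京都), or Kanagawa (神奈川県). Create a code value for each prefecture and extract it together with the customer ID and the address. Use 11 for Saitama, 12 for Chiba, 13 for Tokyo, and 14 for Kanagawa. Displaying 10 rows of the result is sufficient."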
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df_customer[['customer_id', 'address']], df_customer['address'].str[0:3].map({'埼玉県': '11',\n '千葉県':'12', \n '東京都':'13', \n '神奈川':'14'})], axis=1).head(10)",
"execution_count": 59,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 59,
"data": {
"text/plain": " customer_id address address\n0 CS021313000114 神奈川県伊勢原市粟窪********** 14\n1 CS037613000071 東京都江東区南砂********** 13\n2 CS031415000172 東京都渋谷区代々木********** 13\n3 CS028811000001 神奈川県横浜市泉区和泉町********** 14\n4 CS001215000145 東京都大田区仲六郷********** 13\n5 CS020401000016 東京都板橋区若木********** 13\n6 CS015414000103 東京都江東区北砂********** 13\n7 CS029403000008 千葉県浦安市海楽********** 12\n8 CS015804000004 東京都江東区北砂********** 13\n9 CS033513000180 神奈川県横浜市旭区善部町********** 14"
},
"metadata": {}
}
]
},
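{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-055: Total the sales amount (amount) in the receipt detail dataframe (df_receipt) by customer ID (customer_id) and find the quartiles of those totals. Then assign a category value to each customer's total according to the criteria below and display it together with the customer ID and the total sales amount. The category values are 1 to 4, in the order listed. Displaying 10 rows of the result is sufficient.\n>\n> - At least the minimum and less than the first quartile\n> - At least the first quartile and less than the second quartile\n> - At least the second quartile and less than the third quartile\n> - At least the third quartile"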
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Example 1\ndf_sales_amount = df_receipt[['customer_id', 'amount']].groupby('customer_id').sum().reset_index()\npct25 = np.quantile(df_sales_amount['amount'], 0.25)\npct50 = np.quantile(df_sales_amount['amount'], 0.5)\npct75 = np.quantile(df_sales_amount['amount'], 0.75)\n\ndef pct_group(x):\n    # Each group is closed on the left: [lower quartile, upper quartile)\n    if x < pct25:\n        return 1\n    elif pct25 <= x < pct50:\n        return 2\n    elif pct50 <= x < pct75:\n        return 3\n    elif pct75 <= x:\n        return 4\n\ndf_sales_amount['pct_group'] = df_sales_amount['amount'].apply(pct_group)\ndf_sales_amount.head(10)",
"execution_count": 60,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 60,
"data": {
"text/plain": " customer_id amount pct_group\n0 CS001113000004 1298 2\n1 CS001114000005 626 2\n2 CS001115000010 3044 3\n3 CS001205000004 1988 3\n4 CS001205000006 3337 3\n5 CS001211000025 456 1\n6 CS001212000027 448 1\n7 CS001212000031 296 1\n8 CS001212000046 228 1\n9 CS001212000070 456 1"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Check the quartile values\nprint('pct25:', pct25)\nprint('pct50:', pct50)\nprint('pct75:', pct75)",
"execution_count": 61,
"outputs": [
{
"output_type": "stream",
"text": "pct25: 548.5\npct50: 1478.0\npct75: 3651.0\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Example 2 (note: qcut bins are closed on the right, unlike the criteria in the problem)\ndf_temp = df_receipt.groupby('customer_id')[['amount']].sum()\ndf_temp['quantile'], bins = pd.qcut(df_receipt.groupby('customer_id')['amount'].sum(), 4, retbins=True)\ndisplay(df_temp.head())\nprint('quantiles:', bins)",
"execution_count": 62,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": " amount quantile\ncustomer_id \nCS001113000004 1298 (548.5, 1478.0]\nCS001114000005 626 (548.5, 1478.0]\nCS001115000010 3044 (1478.0, 3651.0]\nCS001205000004 1988 (1478.0, 3651.0]\nCS001205000006 3337 (1478.0, 3651.0]"
},
"metadata": {}
},
{
"output_type": "stream",
"text": "quantiles: [7.0000000e+01 5.4850000e+02 1.4780000e+03 3.6510000e+03 1.2395003e+07]\n",
"name": "stdout"
}
]
},
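{
"metadata": {},
"cell_type": "markdown",
"source": "pd.qcut produces bins that are closed on the right, such as (548.5, 1478.0], whereas the problem asks for groups that include their lower bound and exclude their upper bound. A minimal sketch of one way to get left-closed groups is pd.cut with right=False and the quartiles as explicit bin edges; the five totals below are hypothetical and chosen so that three of them fall exactly on a quartile.

```python
import pandas as pd

# Hypothetical customer totals; 2, 3 and 4 fall exactly on the quartiles
totals = pd.Series([1, 2, 3, 4, 5])
q25, q50, q75 = totals.quantile([0.25, 0.5, 0.75]).tolist()

# right=False makes each bin closed on the left: [min, q25), [q25, q50), ...
bins = [totals.min(), q25, q50, q75, float('inf')]
groups = pd.cut(totals, bins=bins, right=False, labels=[1, 2, 3, 4]).astype(int)

# qcut uses right-closed bins, so boundary values land one group lower
qcut_groups = pd.qcut(totals, 4, labels=[1, 2, 3, 4]).astype(int)
print(groups.tolist())
print(qcut_groups.tolist())
```

The two approaches only disagree for totals that sit exactly on a quartile, so on real data the difference is usually confined to a few rows."
},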
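{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-056: Compute the age group in 10-year bins from the age (age) in the customer dataframe (df_customer) and extract it together with the customer ID (customer_id) and the date of birth (birth_day). Customers aged 60 or older must all be placed in the 60s group. The category names for the age groups are arbitrary. Displaying the first 10 rows is sufficient."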
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Example 1\ndf_customer_era = pd.concat([df_customer[['customer_id', 'birth_day']],\n                             df_customer['age'].apply(lambda x: min(math.floor(x / 10) * 10, 60))],\n                            axis=1)\n\ndf_customer_era.head(10)",
"execution_count": 63,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 63,
"data": {
"text/plain": " customer_id birth_day age\n0 CS021313000114 1981-04-29 30\n1 CS037613000071 1952-04-01 60\n2 CS031415000172 1976-10-04 40\n3 CS028811000001 1933-03-27 60\n4 CS001215000145 1995-03-29 20\n5 CS020401000016 1974-09-15 40\n6 CS015414000103 1977-08-09 40\n7 CS029403000008 1973-08-17 40\n8 CS015804000004 1931-05-02 60\n9 CS033513000180 1962-07-11 50"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Example 2\ndf_customer['age_group'] = pd.cut(df_customer['age'], bins=[0, 10, 20, 30, 40, 50, 60, np.inf], right=False)\ndf_customer[['customer_id', 'birth_day', 'age_group']].head(10)",
"execution_count": 64,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 64,
"data": {
"text/plain": " customer_id birth_day age_group\n0 CS021313000114 1981-04-29 [30.0, 40.0)\n1 CS037613000071 1952-04-01 [60.0, inf)\n2 CS031415000172 1976-10-04 [40.0, 50.0)\n3 CS028811000001 1933-03-27 [60.0, inf)\n4 CS001215000145 1995-03-29 [20.0, 30.0)\n5 CS020401000016 1974-09-15 [40.0, 50.0)\n6 CS015414000103 1977-08-09 [40.0, 50.0)\n7 CS029403000008 1973-08-17 [40.0, 50.0)\n8 CS015804000004 1931-05-02 [60.0, inf)\n9 CS033513000180 1962-07-11 [50.0, 60.0)"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-057: 前問題の抽出結果と性別(gender)を組み合わせ、新たに性別×年代の組み合わせを表すカテゴリデータを作成せよ。組み合わせを表すカテゴリの値は任意とする。先頭10件を表示させればよい。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer_era['era_gender'] = df_customer['gender_cd'].astype('str') + df_customer_era['age'].astype('str')\ndf_customer_era.head(10)",
"execution_count": 65,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 65,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n birth_day | \n age | \n era_gender | \n
\n \n \n \n 0 | \n CS021313000114 | \n 1981-04-29 | \n 30 | \n 130 | \n
\n \n 1 | \n CS037613000071 | \n 1952-04-01 | \n 60 | \n 960 | \n
\n \n 2 | \n CS031415000172 | \n 1976-10-04 | \n 40 | \n 140 | \n
\n \n 3 | \n CS028811000001 | \n 1933-03-27 | \n 60 | \n 160 | \n
\n \n 4 | \n CS001215000145 | \n 1995-03-29 | \n 20 | \n 120 | \n
\n \n 5 | \n CS020401000016 | \n 1974-09-15 | \n 40 | \n 040 | \n
\n \n 6 | \n CS015414000103 | \n 1977-08-09 | \n 40 | \n 140 | \n
\n \n 7 | \n CS029403000008 | \n 1973-08-17 | \n 40 | \n 040 | \n
\n \n 8 | \n CS015804000004 | \n 1931-05-02 | \n 60 | \n 060 | \n
\n \n 9 | \n CS033513000180 | \n 1962-07-11 | \n 50 | \n 150 | \n
\n \n
\n
",
"text/plain": " customer_id birth_day age era_gender\n0 CS021313000114 1981-04-29 30 130\n1 CS037613000071 1952-04-01 60 960\n2 CS031415000172 1976-10-04 40 140\n3 CS028811000001 1933-03-27 60 160\n4 CS001215000145 1995-03-29 20 120\n5 CS020401000016 1974-09-15 40 040\n6 CS015414000103 1977-08-09 40 140\n7 CS029403000008 1973-08-17 40 040\n8 CS015804000004 1931-05-02 60 060\n9 CS033513000180 1962-07-11 50 150"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-058: 顧客データフレーム(df_customer)の性別コード(gender_cd)をダミー変数化し、顧客ID(customer_id)とともに抽出せよ。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.get_dummies(df_customer[['customer_id', 'gender_cd']], columns=['gender_cd']).head(10)",
"execution_count": 66,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 66,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n gender_cd_0 | \n gender_cd_1 | \n gender_cd_9 | \n
\n \n \n \n 0 | \n CS021313000114 | \n 0 | \n 1 | \n 0 | \n
\n \n 1 | \n CS037613000071 | \n 0 | \n 0 | \n 1 | \n
\n \n 2 | \n CS031415000172 | \n 0 | \n 1 | \n 0 | \n
\n \n 3 | \n CS028811000001 | \n 0 | \n 1 | \n 0 | \n
\n \n 4 | \n CS001215000145 | \n 0 | \n 1 | \n 0 | \n
\n \n 5 | \n CS020401000016 | \n 1 | \n 0 | \n 0 | \n
\n \n 6 | \n CS015414000103 | \n 0 | \n 1 | \n 0 | \n
\n \n 7 | \n CS029403000008 | \n 1 | \n 0 | \n 0 | \n
\n \n 8 | \n CS015804000004 | \n 1 | \n 0 | \n 0 | \n
\n \n 9 | \n CS033513000180 | \n 0 | \n 1 | \n 0 | \n
\n \n
\n
",
"text/plain": " customer_id gender_cd_0 gender_cd_1 gender_cd_9\n0 CS021313000114 0 1 0\n1 CS037613000071 0 0 1\n2 CS031415000172 0 1 0\n3 CS028811000001 0 1 0\n4 CS001215000145 0 1 0\n5 CS020401000016 1 0 0\n6 CS015414000103 0 1 0\n7 CS029403000008 1 0 0\n8 CS015804000004 1 0 0\n9 CS033513000180 0 1 0"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-059: レシート明細データフレーム(df_receipt)の売上金額(amount)を顧客ID(customer_id)ごとに合計し、合計した売上金額を平均0、標準偏差1に標準化して顧客ID、売上金額合計とともに表示せよ。標準化に使用する標準偏差は、不偏標準偏差と標本標準偏差のどちらでも良いものとする。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# skleanのpreprocessing.scaleを利用するため、標本標準偏差で計算されている\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_ss'] = preprocessing.scale(df_sales_amount['amount'])\ndf_sales_amount.head(10)",
"execution_count": 67,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 67,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n amount_ss | \n
\n \n \n \n 0 | \n CS001113000004 | \n 1298 | \n -0.459378 | \n
\n \n 1 | \n CS001114000005 | \n 626 | \n -0.706390 | \n
\n \n 2 | \n CS001115000010 | \n 3044 | \n 0.182413 | \n
\n \n 3 | \n CS001205000004 | \n 1988 | \n -0.205749 | \n
\n \n 4 | \n CS001205000006 | \n 3337 | \n 0.290114 | \n
\n \n 5 | \n CS001211000025 | \n 456 | \n -0.768879 | \n
\n \n 6 | \n CS001212000027 | \n 448 | \n -0.771819 | \n
\n \n 7 | \n CS001212000031 | \n 296 | \n -0.827691 | \n
\n \n 8 | \n CS001212000046 | \n 228 | \n -0.852686 | \n
\n \n 9 | \n CS001212000070 | \n 456 | \n -0.768879 | \n
\n \n
\n
",
"text/plain": " customer_id amount amount_ss\n0 CS001113000004 1298 -0.459378\n1 CS001114000005 626 -0.706390\n2 CS001115000010 3044 0.182413\n3 CS001205000004 1988 -0.205749\n4 CS001205000006 3337 0.290114\n5 CS001211000025 456 -0.768879\n6 CS001212000027 448 -0.771819\n7 CS001212000031 296 -0.827691\n8 CS001212000046 228 -0.852686\n9 CS001212000070 456 -0.768879"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-060: レシート明細データフレーム(df_receipt)の売上金額(amount)を顧客ID(customer_id)ごとに合計し、合計した売上金額を最小値0、最大値1に正規化して顧客ID、売上金額合計とともに表示せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# skleanのpreprocessing.scaleを利用するため、標本標準偏差で計算されている\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_mm'] = preprocessing.minmax_scale(df_sales_amount['amount'])\ndf_sales_amount.head(10)",
"execution_count": 68,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 68,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n amount_mm | \n
\n \n \n \n 0 | \n CS001113000004 | \n 1298 | \n 0.053354 | \n
\n \n 1 | \n CS001114000005 | \n 626 | \n 0.024157 | \n
\n \n 2 | \n CS001115000010 | \n 3044 | \n 0.129214 | \n
\n \n 3 | \n CS001205000004 | \n 1988 | \n 0.083333 | \n
\n \n 4 | \n CS001205000006 | \n 3337 | \n 0.141945 | \n
\n \n 5 | \n CS001211000025 | \n 456 | \n 0.016771 | \n
\n \n 6 | \n CS001212000027 | \n 448 | \n 0.016423 | \n
\n \n 7 | \n CS001212000031 | \n 296 | \n 0.009819 | \n
\n \n 8 | \n CS001212000046 | \n 228 | \n 0.006865 | \n
\n \n 9 | \n CS001212000070 | \n 456 | \n 0.016771 | \n
\n \n
\n
",
"text/plain": " customer_id amount amount_mm\n0 CS001113000004 1298 0.053354\n1 CS001114000005 626 0.024157\n2 CS001115000010 3044 0.129214\n3 CS001205000004 1988 0.083333\n4 CS001205000006 3337 0.141945\n5 CS001211000025 456 0.016771\n6 CS001212000027 448 0.016423\n7 CS001212000031 296 0.009819\n8 CS001212000046 228 0.006865\n9 CS001212000070 456 0.016771"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-061: レシート明細データフレーム(df_receipt)の売上金額(amount)を顧客ID(customer_id)ごとに合計し、合計した売上金額を常用対数化(底=10)して顧客ID、売上金額合計とともに表示せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# skleanのpreprocessing.scaleを利用するため、標本標準偏差で計算されている\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_log10'] = np.log10(df_sales_amount['amount'] + 1)\ndf_sales_amount.head(10)",
"execution_count": 69,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 69,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n amount_log10 | \n
\n \n \n \n 0 | \n CS001113000004 | \n 1298 | \n 3.113609 | \n
\n \n 1 | \n CS001114000005 | \n 626 | \n 2.797268 | \n
\n \n 2 | \n CS001115000010 | \n 3044 | \n 3.483587 | \n
\n \n 3 | \n CS001205000004 | \n 1988 | \n 3.298635 | \n
\n \n 4 | \n CS001205000006 | \n 3337 | \n 3.523486 | \n
\n \n 5 | \n CS001211000025 | \n 456 | \n 2.659916 | \n
\n \n 6 | \n CS001212000027 | \n 448 | \n 2.652246 | \n
\n \n 7 | \n CS001212000031 | \n 296 | \n 2.472756 | \n
\n \n 8 | \n CS001212000046 | \n 228 | \n 2.359835 | \n
\n \n 9 | \n CS001212000070 | \n 456 | \n 2.659916 | \n
\n \n
\n
",
"text/plain": " customer_id amount amount_log10\n0 CS001113000004 1298 3.113609\n1 CS001114000005 626 2.797268\n2 CS001115000010 3044 3.483587\n3 CS001205000004 1988 3.298635\n4 CS001205000006 3337 3.523486\n5 CS001211000025 456 2.659916\n6 CS001212000027 448 2.652246\n7 CS001212000031 296 2.472756\n8 CS001212000046 228 2.359835\n9 CS001212000070 456 2.659916"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-062: レシート明細データフレーム(df_receipt)の売上金額(amount)を顧客ID(customer_id)ごとに合計し、合計した売上金額を自然対数化(底=e)して顧客ID、売上金額合計とともに表示せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# skleanのpreprocessing.scaleを利用するため、標本標準偏差で計算されている\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_loge'] = np.log(df_sales_amount['amount'] + 1)\ndf_sales_amount.head(10)",
"execution_count": 70,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 70,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n amount_loge | \n
\n \n \n \n 0 | \n CS001113000004 | \n 1298 | \n 7.169350 | \n
\n \n 1 | \n CS001114000005 | \n 626 | \n 6.440947 | \n
\n \n 2 | \n CS001115000010 | \n 3044 | \n 8.021256 | \n
\n \n 3 | \n CS001205000004 | \n 1988 | \n 7.595387 | \n
\n \n 4 | \n CS001205000006 | \n 3337 | \n 8.113127 | \n
\n \n 5 | \n CS001211000025 | \n 456 | \n 6.124683 | \n
\n \n 6 | \n CS001212000027 | \n 448 | \n 6.107023 | \n
\n \n 7 | \n CS001212000031 | \n 296 | \n 5.693732 | \n
\n \n 8 | \n CS001212000046 | \n 228 | \n 5.433722 | \n
\n \n 9 | \n CS001212000070 | \n 456 | \n 6.124683 | \n
\n \n
\n
",
"text/plain": " customer_id amount amount_loge\n0 CS001113000004 1298 7.169350\n1 CS001114000005 626 6.440947\n2 CS001115000010 3044 8.021256\n3 CS001205000004 1988 7.595387\n4 CS001205000006 3337 8.113127\n5 CS001211000025 456 6.124683\n6 CS001212000027 448 6.107023\n7 CS001212000031 296 5.693732\n8 CS001212000046 228 5.433722\n9 CS001212000070 456 6.124683"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-063: 商品データフレーム(df_product)の単価(unit_price)と原価(unit_cost)から、各商品の利益額を算出せよ。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = df_product.copy()\ndf_tmp['unit_profit'] = df_tmp['unit_price'] - df_tmp['unit_cost']\ndf_tmp.head(10)",
"execution_count": 71,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 71,
"data": {
"text/html": "\n\n
\n \n \n | \n product_cd | \n category_major_cd | \n category_medium_cd | \n category_small_cd | \n unit_price | \n unit_cost | \n unit_profit | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 49.0 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 54.0 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 57.0 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 62.0 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 67.0 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 74.0 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 84.0 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 105.0 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 124.0 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 145.0 | \n
\n \n
\n
",
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost unit_profit \n0 198.0 149.0 49.0 \n1 218.0 164.0 54.0 \n2 230.0 173.0 57.0 \n3 248.0 186.0 62.0 \n4 268.0 201.0 67.0 \n5 298.0 224.0 74.0 \n6 338.0 254.0 84.0 \n7 420.0 315.0 105.0 \n8 498.0 374.0 124.0 \n9 580.0 435.0 145.0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-064: 商品データフレーム(df_product)の単価(unit_price)と原価(unit_cost)から、各商品の利益率の全体平均を算出せよ。\nただし、単価と原価にはNULLが存在することに注意せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = df_product.copy()\ndf_tmp['unit_profit_rate'] = (df_tmp['unit_price'] - df_tmp['unit_cost']) / df_tmp['unit_price']\ndf_tmp['unit_profit_rate'].mean(skipna=True)",
"execution_count": 72,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 72,
"data": {
"text/plain": "0.24911389885176904"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-065: 商品データフレーム(df_product)の各商品について、利益率が30%となる新たな単価を求めよ。ただし、1円未満は切り捨てること。そして結果を10件表示させ、利益率がおよそ30%付近であることを確認せよ。ただし、単価(unit_price)と原価(unit_cost)にはNULLが存在することに注意せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# math.floorはNaNでエラーとなるが、numpy.floorはエラーとならない\ndf_tmp = df_product.copy()\ndf_tmp['new_price'] = df_tmp['unit_cost'].apply(lambda x: np.floor(x / 0.7))\ndf_tmp['new_profit_rate'] = (df_tmp['new_price'] - df_tmp['unit_cost']) / df_tmp['new_price']\ndf_tmp.head(10)",
"execution_count": 73,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 73,
"data": {
"text/html": "\n\n
\n \n \n | \n product_cd | \n category_major_cd | \n category_medium_cd | \n category_small_cd | \n unit_price | \n unit_cost | \n new_price | \n new_profit_rate | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 212.0 | \n 0.297170 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 234.0 | \n 0.299145 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 247.0 | \n 0.299595 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 265.0 | \n 0.298113 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 287.0 | \n 0.299652 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 320.0 | \n 0.300000 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 362.0 | \n 0.298343 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 450.0 | \n 0.300000 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 534.0 | \n 0.299625 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 621.0 | \n 0.299517 | \n
\n \n
\n
",
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost new_price new_profit_rate \n0 198.0 149.0 212.0 0.297170 \n1 218.0 164.0 234.0 0.299145 \n2 230.0 173.0 247.0 0.299595 \n3 248.0 186.0 265.0 0.298113 \n4 268.0 201.0 287.0 0.299652 \n5 298.0 224.0 320.0 0.300000 \n6 338.0 254.0 362.0 0.298343 \n7 420.0 315.0 450.0 0.300000 \n8 498.0 374.0 534.0 0.299625 \n9 580.0 435.0 621.0 0.299517 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-066: 商品データフレーム(df_product)の各商品について、利益率が30%となる新たな単価を求めよ。今回は、1円未満を四捨五入すること(0.5については偶数方向の丸めで良い)。そして結果を10件表示させ、利益率がおよそ30%付近であることを確認せよ。ただし、単価(unit_price)と原価(unit_cost)にはNULLが存在することに注意せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# 組み込みのroundはNaNでエラーとなるが、numpy.roundはエラーとならない\ndf_tmp = df_product.copy()\ndf_tmp['new_price'] = df_tmp['unit_cost'].apply(lambda x: np.round(x / 0.7))\ndf_tmp['new_profit_rate'] = (df_tmp['new_price'] - df_tmp['unit_cost']) / df_tmp['new_price']\ndf_tmp.head(10)",
"execution_count": 74,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 74,
"data": {
"text/html": "\n\n
\n \n \n | \n product_cd | \n category_major_cd | \n category_medium_cd | \n category_small_cd | \n unit_price | \n unit_cost | \n new_price | \n new_profit_rate | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 213.0 | \n 0.300469 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 234.0 | \n 0.299145 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 247.0 | \n 0.299595 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 266.0 | \n 0.300752 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 287.0 | \n 0.299652 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 320.0 | \n 0.300000 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 363.0 | \n 0.300275 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 450.0 | \n 0.300000 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 534.0 | \n 0.299625 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 621.0 | \n 0.299517 | \n
\n \n
\n
",
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost new_price new_profit_rate \n0 198.0 149.0 213.0 0.300469 \n1 218.0 164.0 234.0 0.299145 \n2 230.0 173.0 247.0 0.299595 \n3 248.0 186.0 266.0 0.300752 \n4 268.0 201.0 287.0 0.299652 \n5 298.0 224.0 320.0 0.300000 \n6 338.0 254.0 363.0 0.300275 \n7 420.0 315.0 450.0 0.300000 \n8 498.0 374.0 534.0 0.299625 \n9 580.0 435.0 621.0 0.299517 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-067: 商品データフレーム(df_product)の各商品について、利益率が30%となる新たな単価を求めよ。今回は、1円未満を切り上げること。そして結果を10件表示させ、利益率がおよそ30%付近であることを確認せよ。ただし、単価(unit_price)と原価(unit_cost)にはNULLが存在することに注意せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# math.ceilはNaNでエラーとなるが、numpy.ceilはエラーとならない\ndf_tmp = df_product.copy()\ndf_tmp['new_price'] = df_tmp['unit_cost'].apply(lambda x: np.ceil(x / 0.7))\ndf_tmp['new_profit_rate'] = (df_tmp['new_price'] - df_tmp['unit_cost']) / df_tmp['new_price']\ndf_tmp.head(10)",
"execution_count": 75,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 75,
"data": {
"text/html": "\n\n
\n \n \n | \n product_cd | \n category_major_cd | \n category_medium_cd | \n category_small_cd | \n unit_price | \n unit_cost | \n new_price | \n new_profit_rate | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 213.0 | \n 0.300469 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 235.0 | \n 0.302128 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 248.0 | \n 0.302419 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 266.0 | \n 0.300752 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 288.0 | \n 0.302083 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 320.0 | \n 0.300000 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 363.0 | \n 0.300275 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 451.0 | \n 0.301552 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 535.0 | \n 0.300935 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 622.0 | \n 0.300643 | \n
\n \n
\n
",
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost new_price new_profit_rate \n0 198.0 149.0 213.0 0.300469 \n1 218.0 164.0 235.0 0.302128 \n2 230.0 173.0 248.0 0.302419 \n3 248.0 186.0 266.0 0.300752 \n4 268.0 201.0 288.0 0.302083 \n5 298.0 224.0 320.0 0.300000 \n6 338.0 254.0 363.0 0.300275 \n7 420.0 315.0 451.0 0.301552 \n8 498.0 374.0 535.0 0.300935 \n9 580.0 435.0 622.0 0.300643 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-068: 商品データフレーム(df_product)の各商品について、消費税率10%の税込み金額を求めよ。 1円未満の端数は切り捨てとし、結果は10件表示すれば良い。ただし、単価(unit_price)にはNULLが存在することに注意せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# math.floorはNaNでエラーとなるが、numpy.floorはエラーとならない\ndf_tmp = df_product.copy()\ndf_tmp['price_tax'] = df_tmp['unit_price'].apply(lambda x: np.floor(x * 1.1))\ndf_tmp.head(10)",
"execution_count": 76,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 76,
"data": {
"text/html": "\n\n
\n \n \n | \n product_cd | \n category_major_cd | \n category_medium_cd | \n category_small_cd | \n unit_price | \n unit_cost | \n price_tax | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 217.0 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 239.0 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 253.0 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 272.0 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 294.0 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 327.0 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 371.0 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 462.0 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 547.0 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 638.0 | \n
\n \n
\n
",
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost price_tax \n0 198.0 149.0 217.0 \n1 218.0 164.0 239.0 \n2 230.0 173.0 253.0 \n3 248.0 186.0 272.0 \n4 268.0 201.0 294.0 \n5 298.0 224.0 327.0 \n6 338.0 254.0 371.0 \n7 420.0 315.0 462.0 \n8 498.0 374.0 547.0 \n9 580.0 435.0 638.0 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-069: レシート明細データフレーム(df_receipt)と商品データフレーム(df_product)を結合し、顧客毎に全商品の売上金額合計と、カテゴリ大区分(category_major_cd)が\"07\"(瓶詰缶詰)の売上金額合計を計算の上、両者の比率を求めよ。抽出対象はカテゴリ大区分\"07\"(瓶詰缶詰)の購入実績がある顧客のみとし、結果は10件表示させればよい。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# コード例1\ndf_tmp_1 = pd.merge(df_receipt, df_product, \n how='inner', on='product_cd').groupby('customer_id').agg({'amount':'sum'}).reset_index()\n\ndf_tmp_2 = pd.merge(df_receipt, df_product.query('category_major_cd == \"07\"'), \n how='inner', on='product_cd').groupby('customer_id').agg({'amount':'sum'}).reset_index()\n\ndf_tmp_3 = pd.merge(df_tmp_1, df_tmp_2, how='inner', on='customer_id')\ndf_tmp_3['rate_07'] = df_tmp_3['amount_y'] / df_tmp_3['amount_x']\ndf_tmp_3.head(10)",
"execution_count": 77,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 77,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount_x | \n amount_y | \n rate_07 | \n
\n \n \n \n 0 | \n CS001113000004 | \n 1298 | \n 1298 | \n 1.000000 | \n
\n \n 1 | \n CS001114000005 | \n 626 | \n 486 | \n 0.776358 | \n
\n \n 2 | \n CS001115000010 | \n 3044 | \n 2694 | \n 0.885020 | \n
\n \n 3 | \n CS001205000004 | \n 1988 | \n 346 | \n 0.174044 | \n
\n \n 4 | \n CS001205000006 | \n 3337 | \n 2004 | \n 0.600539 | \n
\n \n 5 | \n CS001212000027 | \n 448 | \n 200 | \n 0.446429 | \n
\n \n 6 | \n CS001212000031 | \n 296 | \n 296 | \n 1.000000 | \n
\n \n 7 | \n CS001212000046 | \n 228 | \n 108 | \n 0.473684 | \n
\n \n 8 | \n CS001212000070 | \n 456 | \n 308 | \n 0.675439 | \n
\n \n 9 | \n CS001213000018 | \n 243 | \n 145 | \n 0.596708 | \n
\n \n
\n
",
"text/plain": " customer_id amount_x amount_y rate_07\n0 CS001113000004 1298 1298 1.000000\n1 CS001114000005 626 486 0.776358\n2 CS001115000010 3044 2694 0.885020\n3 CS001205000004 1988 346 0.174044\n4 CS001205000006 3337 2004 0.600539\n5 CS001212000027 448 200 0.446429\n6 CS001212000031 296 296 1.000000\n7 CS001212000046 228 108 0.473684\n8 CS001212000070 456 308 0.675439\n9 CS001213000018 243 145 0.596708"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# コード例2\ndf_temp = df_receipt.merge(df_product, how='left', on='product_cd').groupby(['customer_id', 'category_major_cd'])['amount'].sum().unstack()\ndf_temp = df_temp[df_temp[7] > 0]\ndf_temp['sum'] = df_temp.sum(axis=1)\ndf_temp['07_rate'] = df_temp[7] / df_temp['sum']\ndf_temp.head(10)",
"execution_count": 78,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 78,
"data": {
"text/html": "\n\n
\n \n \n category_major_cd | \n 4 | \n 5 | \n 6 | \n 7 | \n 8 | \n 9 | \n sum | \n 07_rate | \n
\n \n customer_id | \n | \n | \n | \n | \n | \n | \n | \n | \n
\n \n \n \n CS001113000004 | \n NaN | \n NaN | \n NaN | \n 1298.0 | \n NaN | \n NaN | \n 1298.0 | \n 1.000000 | \n
\n \n CS001114000005 | \n NaN | \n 40.0 | \n NaN | \n 486.0 | \n 100.0 | \n NaN | \n 626.0 | \n 0.776358 | \n
\n \n CS001115000010 | \n NaN | \n NaN | \n NaN | \n 2694.0 | \n NaN | \n 350.0 | \n 3044.0 | \n 0.885020 | \n
\n \n CS001205000004 | \n 100.0 | \n 128.0 | \n 286.0 | \n 346.0 | \n 368.0 | \n 760.0 | \n 1988.0 | \n 0.174044 | \n
\n \n CS001205000006 | \n 635.0 | \n 60.0 | \n 198.0 | \n 2004.0 | \n 80.0 | \n 360.0 | \n 3337.0 | \n 0.600539 | \n
\n \n CS001212000027 | \n 248.0 | \n NaN | \n NaN | \n 200.0 | \n NaN | \n NaN | \n 448.0 | \n 0.446429 | \n
\n \n CS001212000031 | \n NaN | \n NaN | \n NaN | \n 296.0 | \n NaN | \n NaN | \n 296.0 | \n 1.000000 | \n
\n \n CS001212000046 | \n NaN | \n NaN | \n NaN | \n 108.0 | \n NaN | \n 120.0 | \n 228.0 | \n 0.473684 | \n
\n \n CS001212000070 | \n NaN | \n NaN | \n 148.0 | \n 308.0 | \n NaN | \n NaN | \n 456.0 | \n 0.675439 | \n
\n \n CS001213000018 | \n NaN | \n NaN | \n NaN | \n 145.0 | \n 98.0 | \n NaN | \n 243.0 | \n 0.596708 | \n
\n \n
\n
",
"text/plain": "category_major_cd 4 5 6 7 8 9 sum 07_rate\ncustomer_id \nCS001113000004 NaN NaN NaN 1298.0 NaN NaN 1298.0 1.000000\nCS001114000005 NaN 40.0 NaN 486.0 100.0 NaN 626.0 0.776358\nCS001115000010 NaN NaN NaN 2694.0 NaN 350.0 3044.0 0.885020\nCS001205000004 100.0 128.0 286.0 346.0 368.0 760.0 1988.0 0.174044\nCS001205000006 635.0 60.0 198.0 2004.0 80.0 360.0 3337.0 0.600539\nCS001212000027 248.0 NaN NaN 200.0 NaN NaN 448.0 0.446429\nCS001212000031 NaN NaN NaN 296.0 NaN NaN 296.0 1.000000\nCS001212000046 NaN NaN NaN 108.0 NaN 120.0 228.0 0.473684\nCS001212000070 NaN NaN 148.0 308.0 NaN NaN 456.0 0.675439\nCS001213000018 NaN NaN NaN 145.0 98.0 NaN 243.0 0.596708"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-070: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からの経過日数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp = df_tmp.drop_duplicates()\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\ndf_tmp['elapsed_date'] = df_tmp['sales_ymd'] - df_tmp['application_date']\ndf_tmp.head(10)",
"execution_count": 79,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 79,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n sales_ymd | \n application_date | \n elapsed_date | \n
\n \n \n \n 0 | \n CS006214000001 | \n 2018-11-03 | \n 1970-01-01 00:00:00.020150201 | \n 17837 days 23:59:59.979849 | \n
\n \n 1 | \n CS006214000001 | \n 2017-05-09 | \n 1970-01-01 00:00:00.020150201 | \n 17294 days 23:59:59.979849 | \n
\n \n 2 | \n CS006214000001 | \n 2017-06-08 | \n 1970-01-01 00:00:00.020150201 | \n 17324 days 23:59:59.979849 | \n
\n \n 4 | \n CS006214000001 | \n 2018-10-28 | \n 1970-01-01 00:00:00.020150201 | \n 17831 days 23:59:59.979849 | \n
\n \n 7 | \n CS006214000001 | \n 2019-09-08 | \n 1970-01-01 00:00:00.020150201 | \n 18146 days 23:59:59.979849 | \n
\n \n 8 | \n CS006214000001 | \n 2018-01-31 | \n 1970-01-01 00:00:00.020150201 | \n 17561 days 23:59:59.979849 | \n
\n \n 9 | \n CS006214000001 | \n 2017-07-05 | \n 1970-01-01 00:00:00.020150201 | \n 17351 days 23:59:59.979849 | \n
\n \n 10 | \n CS006214000001 | \n 2018-11-10 | \n 1970-01-01 00:00:00.020150201 | \n 17844 days 23:59:59.979849 | \n
\n \n 12 | \n CS006214000001 | \n 2019-04-10 | \n 1970-01-01 00:00:00.020150201 | \n 17995 days 23:59:59.979849 | \n
\n \n 15 | \n CS006214000001 | \n 2019-06-01 | \n 1970-01-01 00:00:00.020150201 | \n 18047 days 23:59:59.979849 | \n
\n \n
\n
",
"text/plain": " customer_id sales_ymd application_date \\\n0 CS006214000001 2018-11-03 1970-01-01 00:00:00.020150201 \n1 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 \n2 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 \n4 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 \n7 CS006214000001 2019-09-08 1970-01-01 00:00:00.020150201 \n8 CS006214000001 2018-01-31 1970-01-01 00:00:00.020150201 \n9 CS006214000001 2017-07-05 1970-01-01 00:00:00.020150201 \n10 CS006214000001 2018-11-10 1970-01-01 00:00:00.020150201 \n12 CS006214000001 2019-04-10 1970-01-01 00:00:00.020150201 \n15 CS006214000001 2019-06-01 1970-01-01 00:00:00.020150201 \n\n elapsed_date \n0 17837 days 23:59:59.979849 \n1 17294 days 23:59:59.979849 \n2 17324 days 23:59:59.979849 \n4 17831 days 23:59:59.979849 \n7 18146 days 23:59:59.979849 \n8 17561 days 23:59:59.979849 \n9 17351 days 23:59:59.979849 \n10 17844 days 23:59:59.979849 \n12 17995 days 23:59:59.979849 \n15 18047 days 23:59:59.979849 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-071: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からの経過月数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。1ヶ月未満は切り捨てること。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp = df_tmp.drop_duplicates()\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\n\ndf_tmp['elapsed_date'] = df_tmp[['sales_ymd', 'application_date']].apply(lambda x: \n relativedelta(x[0], x[1]).years * 12 + relativedelta(x[0], x[1]).months, axis=1)\ndf_tmp.sort_values('customer_id').head(10)",
"execution_count": 80,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 80,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n sales_ymd | \n application_date | \n elapsed_date | \n
\n \n \n \n 60376 | \n CS001113000004 | \n 2019-03-08 | \n 1970-01-01 00:00:00.020151105 | \n 590 | \n
\n \n 20158 | \n CS001114000005 | \n 2019-07-31 | \n 1970-01-01 00:00:00.020160412 | \n 594 | \n
\n \n 20156 | \n CS001114000005 | \n 2018-05-03 | \n 1970-01-01 00:00:00.020160412 | \n 580 | \n
\n \n 29140 | \n CS001115000010 | \n 2019-04-05 | \n 1970-01-01 00:00:00.020150417 | \n 591 | \n
\n \n 29141 | \n CS001115000010 | \n 2018-07-01 | \n 1970-01-01 00:00:00.020150417 | \n 581 | \n
\n \n 29142 | \n CS001115000010 | \n 2017-12-28 | \n 1970-01-01 00:00:00.020150417 | \n 575 | \n
\n \n 6526 | \n CS001205000004 | \n 2019-06-25 | \n 1970-01-01 00:00:00.020160615 | \n 593 | \n
\n \n 6521 | \n CS001205000004 | \n 2019-03-12 | \n 1970-01-01 00:00:00.020160615 | \n 590 | \n
\n \n 6518 | \n CS001205000004 | \n 2018-08-21 | \n 1970-01-01 00:00:00.020160615 | \n 583 | \n
\n \n 6520 | \n CS001205000004 | \n 2017-09-14 | \n 1970-01-01 00:00:00.020160615 | \n 572 | \n
\n \n
\n
",
"text/plain": " customer_id sales_ymd application_date elapsed_date\n60376 CS001113000004 2019-03-08 1970-01-01 00:00:00.020151105 590\n20158 CS001114000005 2019-07-31 1970-01-01 00:00:00.020160412 594\n20156 CS001114000005 2018-05-03 1970-01-01 00:00:00.020160412 580\n29140 CS001115000010 2019-04-05 1970-01-01 00:00:00.020150417 591\n29141 CS001115000010 2018-07-01 1970-01-01 00:00:00.020150417 581\n29142 CS001115000010 2017-12-28 1970-01-01 00:00:00.020150417 575\n6526 CS001205000004 2019-06-25 1970-01-01 00:00:00.020160615 593\n6521 CS001205000004 2019-03-12 1970-01-01 00:00:00.020160615 590\n6518 CS001205000004 2018-08-21 1970-01-01 00:00:00.020160615 583\n6520 CS001205000004 2017-09-14 1970-01-01 00:00:00.020160615 572"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-072: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からの経過年数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い。(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。1年未満は切り捨てること。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\n\ndf_tmp['elapsed_date'] = df_tmp[['sales_ymd', 'application_date']].apply(lambda x: \n relativedelta(x[0], x[1]).years, axis=1)\ndf_tmp.head(10)",
"execution_count": 81,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 81,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n sales_ymd | \n application_date | \n elapsed_date | \n
\n \n \n \n 0 | \n CS006214000001 | \n 2018-11-03 | \n 1970-01-01 00:00:00.020150201 | \n 48 | \n
\n \n 1 | \n CS006214000001 | \n 2017-05-09 | \n 1970-01-01 00:00:00.020150201 | \n 47 | \n
\n \n 2 | \n CS006214000001 | \n 2017-06-08 | \n 1970-01-01 00:00:00.020150201 | \n 47 | \n
\n \n 3 | \n CS006214000001 | \n 2017-06-08 | \n 1970-01-01 00:00:00.020150201 | \n 47 | \n
\n \n 4 | \n CS006214000001 | \n 2018-10-28 | \n 1970-01-01 00:00:00.020150201 | \n 48 | \n
\n \n 5 | \n CS006214000001 | \n 2018-10-28 | \n 1970-01-01 00:00:00.020150201 | \n 48 | \n
\n \n 6 | \n CS006214000001 | \n 2017-05-09 | \n 1970-01-01 00:00:00.020150201 | \n 47 | \n
\n \n 7 | \n CS006214000001 | \n 2019-09-08 | \n 1970-01-01 00:00:00.020150201 | \n 49 | \n
\n \n 8 | \n CS006214000001 | \n 2018-01-31 | \n 1970-01-01 00:00:00.020150201 | \n 48 | \n
\n \n 9 | \n CS006214000001 | \n 2017-07-05 | \n 1970-01-01 00:00:00.020150201 | \n 47 | \n
\n \n
\n
",
"text/plain": " customer_id sales_ymd application_date elapsed_date\n0 CS006214000001 2018-11-03 1970-01-01 00:00:00.020150201 48\n1 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 47\n2 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 47\n3 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 47\n4 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 48\n5 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 48\n6 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 47\n7 CS006214000001 2019-09-08 1970-01-01 00:00:00.020150201 49\n8 CS006214000001 2018-01-31 1970-01-01 00:00:00.020150201 48\n9 CS006214000001 2017-07-05 1970-01-01 00:00:00.020150201 47"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-073: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からのエポック秒による経過時間を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。なお、時間情報は保有していないため各日付は0時0分0秒を表すものとする。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp = df_tmp.drop_duplicates()\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\n\ndf_tmp['elapsed_date'] = (df_tmp['sales_ymd'].astype(np.int64) / 10**9) - (df_tmp['application_date'].astype(np.int64) / 10**9)\ndf_tmp.head(10)",
"execution_count": 82,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 82,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n sales_ymd | \n application_date | \n elapsed_date | \n
\n \n \n \n 0 | \n CS006214000001 | \n 2018-11-03 | \n 1970-01-01 00:00:00.020150201 | \n 1.541203e+09 | \n
\n \n 1 | \n CS006214000001 | \n 2017-05-09 | \n 1970-01-01 00:00:00.020150201 | \n 1.494288e+09 | \n
\n \n 2 | \n CS006214000001 | \n 2017-06-08 | \n 1970-01-01 00:00:00.020150201 | \n 1.496880e+09 | \n
\n \n 4 | \n CS006214000001 | \n 2018-10-28 | \n 1970-01-01 00:00:00.020150201 | \n 1.540685e+09 | \n
\n \n 7 | \n CS006214000001 | \n 2019-09-08 | \n 1970-01-01 00:00:00.020150201 | \n 1.567901e+09 | \n
\n \n 8 | \n CS006214000001 | \n 2018-01-31 | \n 1970-01-01 00:00:00.020150201 | \n 1.517357e+09 | \n
\n \n 9 | \n CS006214000001 | \n 2017-07-05 | \n 1970-01-01 00:00:00.020150201 | \n 1.499213e+09 | \n
\n \n 10 | \n CS006214000001 | \n 2018-11-10 | \n 1970-01-01 00:00:00.020150201 | \n 1.541808e+09 | \n
\n \n 12 | \n CS006214000001 | \n 2019-04-10 | \n 1970-01-01 00:00:00.020150201 | \n 1.554854e+09 | \n
\n \n 15 | \n CS006214000001 | \n 2019-06-01 | \n 1970-01-01 00:00:00.020150201 | \n 1.559347e+09 | \n
\n \n
\n
",
"text/plain": " customer_id sales_ymd application_date elapsed_date\n0 CS006214000001 2018-11-03 1970-01-01 00:00:00.020150201 1.541203e+09\n1 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 1.494288e+09\n2 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 1.496880e+09\n4 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 1.540685e+09\n7 CS006214000001 2019-09-08 1970-01-01 00:00:00.020150201 1.567901e+09\n8 CS006214000001 2018-01-31 1970-01-01 00:00:00.020150201 1.517357e+09\n9 CS006214000001 2017-07-05 1970-01-01 00:00:00.020150201 1.499213e+09\n10 CS006214000001 2018-11-10 1970-01-01 00:00:00.020150201 1.541808e+09\n12 CS006214000001 2019-04-10 1970-01-01 00:00:00.020150201 1.554854e+09\n15 CS006214000001 2019-06-01 1970-01-01 00:00:00.020150201 1.559347e+09"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-074: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、当該週の月曜日からの経過日数を計算し、売上日、当該週の月曜日付とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値でデータを保持している点に注意)。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = df_receipt[['customer_id', 'sales_ymd']]\ndf_tmp = df_tmp.drop_duplicates()\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['monday'] = df_tmp['sales_ymd'].apply(lambda x: x - relativedelta(days=x.weekday()))\ndf_tmp['elapsed_weekday'] = df_tmp['sales_ymd'] - df_tmp['monday']\ndf_tmp.head(10)",
"execution_count": 83,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 83,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n sales_ymd | \n monday | \n elapsed_weekday | \n
\n \n \n \n 0 | \n CS006214000001 | \n 2018-11-03 | \n 2018-10-29 | \n 5 days | \n
\n \n 1 | \n CS008415000097 | \n 2018-11-18 | \n 2018-11-12 | \n 6 days | \n
\n \n 2 | \n CS028414000014 | \n 2017-07-12 | \n 2017-07-10 | \n 2 days | \n
\n \n 3 | \n ZZ000000000000 | \n 2019-02-05 | \n 2019-02-04 | \n 1 days | \n
\n \n 4 | \n CS025415000050 | \n 2018-08-21 | \n 2018-08-20 | \n 1 days | \n
\n \n 5 | \n CS003515000195 | \n 2019-06-05 | \n 2019-06-03 | \n 2 days | \n
\n \n 6 | \n CS024514000042 | \n 2018-12-05 | \n 2018-12-03 | \n 2 days | \n
\n \n 7 | \n CS040415000178 | \n 2019-09-22 | \n 2019-09-16 | \n 6 days | \n
\n \n 8 | \n ZZ000000000000 | \n 2017-05-04 | \n 2017-05-01 | \n 3 days | \n
\n \n 9 | \n CS027514000015 | \n 2019-10-10 | \n 2019-10-07 | \n 3 days | \n
\n \n
\n
",
"text/plain": " customer_id sales_ymd monday elapsed_weekday\n0 CS006214000001 2018-11-03 2018-10-29 5 days\n1 CS008415000097 2018-11-18 2018-11-12 6 days\n2 CS028414000014 2017-07-12 2017-07-10 2 days\n3 ZZ000000000000 2019-02-05 2019-02-04 1 days\n4 CS025415000050 2018-08-21 2018-08-20 1 days\n5 CS003515000195 2019-06-05 2019-06-03 2 days\n6 CS024514000042 2018-12-05 2018-12-03 2 days\n7 CS040415000178 2019-09-22 2019-09-16 6 days\n8 ZZ000000000000 2017-05-04 2017-05-01 3 days\n9 CS027514000015 2019-10-10 2019-10-07 3 days"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-075: 顧客データフレーム(df_customer)からランダムに1%のデータを抽出し、先頭から10件データを抽出せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer.sample(frac=0.01).head(10)",
"execution_count": 84,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 84,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n age_group | \n
\n \n \n \n 18772 | \n CS005515000369 | \n 春日 翔子 | \n 1 | \n 女性 | \n 1964-10-03 | \n 54 | \n 167-0031 | \n 東京都杉並区本天沼********** | \n S13005 | \n 20180211 | \n 3-20100517-2 | \n [50.0, 60.0) | \n
\n \n 11534 | \n CS029603000027 | \n 田村 正義 | \n 0 | \n 男性 | \n 1948-11-01 | \n 70 | \n 279-0011 | \n 千葉県浦安市美浜********** | \n S12029 | \n 20150128 | \n 0-00000000-0 | \n [60.0, inf) | \n
\n \n 992 | \n CS018715000013 | \n 古賀 杏 | \n 1 | \n 女性 | \n 1941-02-27 | \n 78 | \n 204-0012 | \n 東京都清瀬市中清戸********** | \n S13018 | \n 20150801 | \n 0-00000000-0 | \n [60.0, inf) | \n
\n \n 13280 | \n CS016312000113 | \n 亀山 れいな | \n 1 | \n 女性 | \n 1987-04-20 | \n 31 | \n 187-0011 | \n 東京都小平市鈴木町********** | \n S13016 | \n 20150525 | \n 0-00000000-0 | \n [30.0, 40.0) | \n
\n \n 12097 | \n CS015212000045 | \n 鶴田 奈央 | \n 1 | \n 女性 | \n 1992-01-06 | \n 27 | \n 136-0073 | \n 東京都江東区北砂********** | \n S13015 | \n 20150722 | \n 6-20090316-3 | \n [20.0, 30.0) | \n
\n \n 9797 | \n CS019313000127 | \n 杉本 あさみ | \n 1 | \n 女性 | \n 1987-10-21 | \n 31 | \n 173-0036 | \n 東京都板橋区向原********** | \n S13019 | \n 20150324 | \n 0-00000000-0 | \n [30.0, 40.0) | \n
\n \n 17813 | \n CS007615000110 | \n 岡本 美幸 | \n 1 | \n 女性 | \n 1954-05-13 | \n 64 | \n 285-0845 | \n 千葉県佐倉市西志津********** | \n S12007 | \n 20141019 | \n 8-20101018-C | \n [60.0, inf) | \n
\n \n 19506 | \n CS037511000006 | \n 柴田 陽子 | \n 1 | \n 女性 | \n 1966-01-29 | \n 53 | \n 136-0071 | \n 東京都江東区亀戸********** | \n S13037 | \n 20150630 | \n A-20081209-4 | \n [50.0, 60.0) | \n
\n \n 4257 | \n CS006312000096 | \n 柳川 真帆 | \n 1 | \n 女性 | \n 1979-02-03 | \n 40 | \n 224-0041 | \n 神奈川県横浜市都筑区仲町台********** | \n S14006 | \n 20150129 | \n 0-00000000-0 | \n [40.0, 50.0) | \n
\n \n 2686 | \n CS028513000046 | \n 生瀬 さやか | \n 1 | \n 女性 | \n 1961-09-12 | \n 57 | \n 246-0031 | \n 神奈川県横浜市瀬谷区瀬谷********** | \n S14028 | \n 20150313 | \n B-20100927-C | \n [50.0, 60.0) | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n18772 CS005515000369 春日 翔子 1 女性 1964-10-03 54 \n11534 CS029603000027 田村 正義 0 男性 1948-11-01 70 \n992 CS018715000013 古賀 杏 1 女性 1941-02-27 78 \n13280 CS016312000113 亀山 れいな 1 女性 1987-04-20 31 \n12097 CS015212000045 鶴田 奈央 1 女性 1992-01-06 27 \n9797 CS019313000127 杉本 あさみ 1 女性 1987-10-21 31 \n17813 CS007615000110 岡本 美幸 1 女性 1954-05-13 64 \n19506 CS037511000006 柴田 陽子 1 女性 1966-01-29 53 \n4257 CS006312000096 柳川 真帆 1 女性 1979-02-03 40 \n2686 CS028513000046 生瀬 さやか 1 女性 1961-09-12 57 \n\n postal_cd address application_store_cd \\\n18772 167-0031 東京都杉並区本天沼********** S13005 \n11534 279-0011 千葉県浦安市美浜********** S12029 \n992 204-0012 東京都清瀬市中清戸********** S13018 \n13280 187-0011 東京都小平市鈴木町********** S13016 \n12097 136-0073 東京都江東区北砂********** S13015 \n9797 173-0036 東京都板橋区向原********** S13019 \n17813 285-0845 千葉県佐倉市西志津********** S12007 \n19506 136-0071 東京都江東区亀戸********** S13037 \n4257 224-0041 神奈川県横浜市都筑区仲町台********** S14006 \n2686 246-0031 神奈川県横浜市瀬谷区瀬谷********** S14028 \n\n application_date status_cd age_group \n18772 20180211 3-20100517-2 [50.0, 60.0) \n11534 20150128 0-00000000-0 [60.0, inf) \n992 20150801 0-00000000-0 [60.0, inf) \n13280 20150525 0-00000000-0 [30.0, 40.0) \n12097 20150722 6-20090316-3 [20.0, 30.0) \n9797 20150324 0-00000000-0 [30.0, 40.0) \n17813 20141019 8-20101018-C [60.0, inf) \n19506 20150630 A-20081209-4 [50.0, 60.0) \n4257 20150129 0-00000000-0 [40.0, 50.0) \n2686 20150313 B-20100927-C [50.0, 60.0) "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-076: 顧客データフレーム(df_customer)から性別(gender_cd)の割合に基づきランダムに10%のデータを層化抽出データし、性別ごとに件数を集計せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# sklearn.model_selection.train_test_splitを使用した例\n_, df_tmp = train_test_split(df_customer, test_size=0.1, stratify=df_customer['gender'])\ndf_tmp.groupby('gender_cd').agg({'customer_id' : 'count'})",
"execution_count": 85,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 85,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n
\n \n gender_cd | \n | \n
\n \n \n \n 0 | \n 298 | \n
\n \n 1 | \n 1793 | \n
\n \n 9 | \n 107 | \n
\n \n
\n
",
"text/plain": " customer_id\ngender_cd \n0 298\n1 1793\n9 107"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp.head(10)",
"execution_count": 86,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 86,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n age_group | \n
\n \n \n \n 20954 | \n CS002413000302 | \n 細谷 希 | \n 1 | \n 女性 | \n 1974-11-22 | \n 44 | \n 185-0023 | \n 東京都国分寺市西元町********** | \n S13002 | \n 20170201 | \n 0-00000000-0 | \n [40.0, 50.0) | \n
\n \n 11786 | \n CS001312000436 | \n 上原 怜奈 | \n 1 | \n 女性 | \n 1986-04-26 | \n 32 | \n 212-0004 | \n 神奈川県川崎市幸区小向西町********** | \n S13001 | \n 20161230 | \n 0-00000000-0 | \n [30.0, 40.0) | \n
\n \n 21437 | \n CS014312000076 | \n 戸塚 礼子 | \n 1 | \n 女性 | \n 1983-03-29 | \n 36 | \n 263-0013 | \n 千葉県千葉市稲毛区千草台********** | \n S12014 | \n 20151101 | \n 0-00000000-0 | \n [30.0, 40.0) | \n
\n \n 8791 | \n CS003113000013 | \n 日下部 美咲 | \n 1 | \n 女性 | \n 2003-07-07 | \n 15 | \n 182-0022 | \n 東京都調布市国領町********** | \n S13003 | \n 20160201 | \n 0-00000000-0 | \n [10.0, 20.0) | \n
\n \n 15810 | \n CS039515000204 | \n 井田 季衣 | \n 1 | \n 女性 | \n 1968-03-11 | \n 51 | \n 168-0081 | \n 東京都杉並区宮前********** | \n S13039 | \n 20150102 | \n D-20100724-E | \n [50.0, 60.0) | \n
\n \n 3702 | \n CS001713000172 | \n 中井 愛 | \n 1 | \n 女性 | \n 1945-10-16 | \n 73 | \n 144-0035 | \n 東京都大田区南蒲田********** | \n S13001 | \n 20161221 | \n 0-00000000-0 | \n [60.0, inf) | \n
\n \n 11297 | \n CS032414000003 | \n 藤村 みき | \n 1 | \n 女性 | \n 1976-03-04 | \n 43 | \n 144-0056 | \n 東京都大田区西六郷********** | \n S13032 | \n 20150418 | \n F-20101027-E | \n [40.0, 50.0) | \n
\n \n 15839 | \n CS005713000139 | \n 五十嵐 結衣 | \n 1 | \n 女性 | \n 1946-11-01 | \n 72 | \n 167-0021 | \n 東京都杉並区井草********** | \n S13005 | \n 20170221 | \n 0-00000000-0 | \n [60.0, inf) | \n
\n \n 3082 | \n CS020414000070 | \n 小笠原 千夏 | \n 1 | \n 女性 | \n 1970-09-11 | \n 48 | \n 114-0001 | \n 東京都北区東十条********** | \n S13020 | \n 20150322 | \n D-20100901-E | \n [40.0, 50.0) | \n
\n \n 14682 | \n CS019613000074 | \n 尾崎 華子 | \n 1 | \n 女性 | \n 1955-07-10 | \n 63 | \n 176-0002 | \n 東京都練馬区桜台********** | \n S13019 | \n 20150918 | \n 0-00000000-0 | \n [60.0, inf) | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n20954 CS002413000302 細谷 希 1 女性 1974-11-22 44 \n11786 CS001312000436 上原 怜奈 1 女性 1986-04-26 32 \n21437 CS014312000076 戸塚 礼子 1 女性 1983-03-29 36 \n8791 CS003113000013 日下部 美咲 1 女性 2003-07-07 15 \n15810 CS039515000204 井田 季衣 1 女性 1968-03-11 51 \n3702 CS001713000172 中井 愛 1 女性 1945-10-16 73 \n11297 CS032414000003 藤村 みき 1 女性 1976-03-04 43 \n15839 CS005713000139 五十嵐 結衣 1 女性 1946-11-01 72 \n3082 CS020414000070 小笠原 千夏 1 女性 1970-09-11 48 \n14682 CS019613000074 尾崎 華子 1 女性 1955-07-10 63 \n\n postal_cd address application_store_cd \\\n20954 185-0023 東京都国分寺市西元町********** S13002 \n11786 212-0004 神奈川県川崎市幸区小向西町********** S13001 \n21437 263-0013 千葉県千葉市稲毛区千草台********** S12014 \n8791 182-0022 東京都調布市国領町********** S13003 \n15810 168-0081 東京都杉並区宮前********** S13039 \n3702 144-0035 東京都大田区南蒲田********** S13001 \n11297 144-0056 東京都大田区西六郷********** S13032 \n15839 167-0021 東京都杉並区井草********** S13005 \n3082 114-0001 東京都北区東十条********** S13020 \n14682 176-0002 東京都練馬区桜台********** S13019 \n\n application_date status_cd age_group \n20954 20170201 0-00000000-0 [40.0, 50.0) \n11786 20161230 0-00000000-0 [30.0, 40.0) \n21437 20151101 0-00000000-0 [30.0, 40.0) \n8791 20160201 0-00000000-0 [10.0, 20.0) \n15810 20150102 D-20100724-E [50.0, 60.0) \n3702 20161221 0-00000000-0 [60.0, inf) \n11297 20150418 F-20101027-E [40.0, 50.0) \n15839 20170221 0-00000000-0 [60.0, inf) \n3082 20150322 D-20100901-E [40.0, 50.0) \n14682 20150918 0-00000000-0 [60.0, inf) "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-077: レシート明細データフレーム(df_receipt)の売上金額(amount)を顧客単位に合計し、合計した売上金額の外れ値を抽出せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。なお、ここでは外れ値を平均から3σ以上離れたものとする。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# skleanのpreprocessing.scaleを利用するため、標本標準偏差で計算されている\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_ss'] = preprocessing.scale(df_sales_amount['amount'])\ndf_sales_amount.query('abs(amount_ss) >= 3').head(10)",
"execution_count": 87,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 87,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n amount_ss | \n
\n \n \n \n 332 | \n CS001605000009 | \n 18925 | \n 6.019921 | \n
\n \n 1755 | \n CS006415000147 | \n 12723 | \n 3.740202 | \n
\n \n 1817 | \n CS006515000023 | \n 18372 | \n 5.816651 | \n
\n \n 1833 | \n CS006515000125 | \n 12575 | \n 3.685800 | \n
\n \n 1841 | \n CS006515000209 | \n 11373 | \n 3.243972 | \n
\n \n 1870 | \n CS007115000006 | \n 11528 | \n 3.300946 | \n
\n \n 1941 | \n CS007514000056 | \n 13293 | \n 3.949721 | \n
\n \n 1943 | \n CS007514000094 | \n 15735 | \n 4.847347 | \n
\n \n 1951 | \n CS007515000107 | \n 11188 | \n 3.175970 | \n
\n \n 1997 | \n CS007615000026 | \n 11959 | \n 3.459372 | \n
\n \n
\n
",
"text/plain": " customer_id amount amount_ss\n332 CS001605000009 18925 6.019921\n1755 CS006415000147 12723 3.740202\n1817 CS006515000023 18372 5.816651\n1833 CS006515000125 12575 3.685800\n1841 CS006515000209 11373 3.243972\n1870 CS007115000006 11528 3.300946\n1941 CS007514000056 13293 3.949721\n1943 CS007514000094 15735 4.847347\n1951 CS007515000107 11188 3.175970\n1997 CS007615000026 11959 3.459372"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-078: レシート明細データフレーム(df_receipt)の売上金額(amount)を顧客単位に合計し、合計した売上金額の外れ値を抽出せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。なお、ここでは外れ値を第一四分位と第三四分位の差であるIQRを用いて、「第一四分位数-1.5×IQR」よりも下回るもの、または「第三四分位数+1.5×IQR」を超えるものとする。結果は10件表示させれば良い。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# skleanのpreprocessing.scaleを利用するため、標本標準偏差で計算されている\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\n\npct75 = np.percentile(df_sales_amount['amount'], q=75)\npct25 = np.percentile(df_sales_amount['amount'], q=25)\niqr = pct75 - pct25\namount_low = pct25 - (iqr * 1.5)\namount_hight = pct75 + (iqr * 1.5)\ndf_sales_amount.query('amount < @amount_low or @amount_hight < amount').head(10)",
"execution_count": 88,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 88,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount | \n
\n \n \n \n 98 | \n CS001414000048 | \n 8584 | \n
\n \n 332 | \n CS001605000009 | \n 18925 | \n
\n \n 549 | \n CS002415000594 | \n 9568 | \n
\n \n 1180 | \n CS004414000181 | \n 9584 | \n
\n \n 1558 | \n CS005415000137 | \n 8734 | \n
\n \n 1733 | \n CS006414000001 | \n 9156 | \n
\n \n 1736 | \n CS006414000029 | \n 9179 | \n
\n \n 1752 | \n CS006415000105 | \n 10042 | \n
\n \n 1755 | \n CS006415000147 | \n 12723 | \n
\n \n 1757 | \n CS006415000157 | \n 10648 | \n
\n \n
\n
",
"text/plain": " customer_id amount\n98 CS001414000048 8584\n332 CS001605000009 18925\n549 CS002415000594 9568\n1180 CS004414000181 9584\n1558 CS005415000137 8734\n1733 CS006414000001 9156\n1736 CS006414000029 9179\n1752 CS006415000105 10042\n1755 CS006415000147 12723\n1757 CS006415000157 10648"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-079: 商品データフレーム(df_product)の各項目に対し、欠損数を確認せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product.isnull().sum()",
"execution_count": 89,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 89,
"data": {
"text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 7\nunit_cost 7\ndtype: int64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-080: 商品データフレーム(df_product)のいずれかの項目に欠損が発生しているレコードを全て削除した新たなdf_product_1を作成せよ。なお、削除前後の件数を表示させ、前設問で確認した件数だけ減少していることも確認すること。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_1 = df_product.copy()\nprint('削除前:', len(df_product_1))\ndf_product_1.dropna(inplace=True)\nprint('削除後:', len(df_product_1))",
"execution_count": 90,
"outputs": [
{
"output_type": "stream",
"text": "削除前: 10030\n削除後: 10023\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-081: 単価(unit_price)と原価(unit_cost)の欠損値について、それぞれの平均値で補完した新たなdf_product_2を作成せよ。なお、平均値について1円未満は四捨五入とし、0.5については偶数寄せでかまわない。補完実施後、各項目について欠損が生じていないことも確認すること。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_2 = df_product.fillna({'unit_price':np.round(np.nanmean(df_product['unit_price'])), \n 'unit_cost':np.round(np.nanmean(df_product['unit_cost']))})\ndf_product_2.isnull().sum()",
"execution_count": 91,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 91,
"data": {
"text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 0\nunit_cost 0\ndtype: int64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-082: 単価(unit_price)と原価(unit_cost)の欠損値について、それぞれの中央値で補完した新たなdf_product_3を作成せよ。なお、中央値について1円未満は四捨五入とし、0.5については偶数寄せでかまわない。補完実施後、各項目について欠損が生じていないことも確認すること。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_3 = df_product.fillna({'unit_price':np.round(np.nanmedian(df_product['unit_price'])), \n 'unit_cost':np.round(np.nanmedian(df_product['unit_cost']))})\ndf_product_3.isnull().sum()",
"execution_count": 92,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 92,
"data": {
"text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 0\nunit_cost 0\ndtype: int64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-083: 単価(unit_price)と原価(unit_cost)の欠損値について、各商品の小区分(category_small_cd)ごとに算出した中央値で補完した新たなdf_product_4を作成せよ。なお、中央値について1円未満は四捨五入とし、0.5については偶数寄せでかまわない。補完実施後、各項目について欠損が生じていないことも確認すること。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = df_product.groupby('category_small_cd').agg({'unit_price':'median', 'unit_cost':'median'}).reset_index()\ndf_tmp.columns = ['category_small_cd', 'median_price', 'median_cost']\n\ndf_product_4 = pd.merge(df_product, df_tmp, how='inner', on='category_small_cd')\n\ndf_product_4['unit_price'] = df_product_4[['unit_price', 'median_price']]. \\\n apply(lambda x: np.round(x[1]) if np.isnan(x[0]) else x[0], axis=1)\ndf_product_4['unit_cost'] = df_product_4[['unit_cost', 'median_cost']]. \\\n apply(lambda x: np.round(x[1]) if np.isnan(x[0]) else x[0], axis=1)\n\ndf_product_4.isnull().sum()",
"execution_count": 93,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 93,
"data": {
"text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 0\nunit_cost 0\nmedian_price 0\nmedian_cost 0\ndtype: int64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-084: 顧客データフレーム(df_customer)の全顧客に対し、全期間の売上金額に占める2019年売上金額の割合を計算せよ。ただし、販売実績のない場合は0として扱うこと。そして計算した割合が0超のものを抽出せよ。 結果は10件表示させれば良い。また、作成したデータにNAやNANが存在しないことを確認せよ。"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp_1 = df_receipt.query('20190101 <= sales_ymd <= 20191231')\ndf_tmp_1 = pd.merge(df_customer['customer_id'], df_tmp_1[['customer_id', 'amount']], how='left', on='customer_id'). \\\n groupby('customer_id').sum().reset_index().rename(columns={'amount':'amount_2019'})\n\ndf_tmp_2 = pd.merge(df_customer['customer_id'], df_receipt[['customer_id', 'amount']], how='left', on='customer_id'). \\\n groupby('customer_id').sum().reset_index()\n\ndf_tmp = pd.merge(df_tmp_1, df_tmp_2, how='inner', on='customer_id')\ndf_tmp['amount_rate'] = df_tmp['amount_2019'] / df_tmp['amount']",
"execution_count": 94,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp.query('amount_rate > 0').head(10)",
"execution_count": 95,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 95,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n amount_2019 | \n amount | \n amount_rate | \n
\n \n \n \n 8 | \n CS001113000004 | \n 1298.0 | \n 1298.0 | \n 1.000000 | \n
\n \n 10 | \n CS001114000005 | \n 188.0 | \n 626.0 | \n 0.300319 | \n
\n \n 12 | \n CS001115000010 | \n 578.0 | \n 3044.0 | \n 0.189882 | \n
\n \n 17 | \n CS001205000004 | \n 702.0 | \n 1988.0 | \n 0.353119 | \n
\n \n 18 | \n CS001205000006 | \n 486.0 | \n 3337.0 | \n 0.145640 | \n
\n \n 23 | \n CS001211000025 | \n 456.0 | \n 456.0 | \n 1.000000 | \n
\n \n 30 | \n CS001212000070 | \n 456.0 | \n 456.0 | \n 1.000000 | \n
\n \n 57 | \n CS001214000009 | \n 664.0 | \n 4685.0 | \n 0.141729 | \n
\n \n 59 | \n CS001214000017 | \n 2962.0 | \n 4132.0 | \n 0.716844 | \n
\n \n 61 | \n CS001214000048 | \n 1889.0 | \n 2374.0 | \n 0.795703 | \n
\n \n
\n
",
"text/plain": " customer_id amount_2019 amount amount_rate\n8 CS001113000004 1298.0 1298.0 1.000000\n10 CS001114000005 188.0 626.0 0.300319\n12 CS001115000010 578.0 3044.0 0.189882\n17 CS001205000004 702.0 1988.0 0.353119\n18 CS001205000006 486.0 3337.0 0.145640\n23 CS001211000025 456.0 456.0 1.000000\n30 CS001212000070 456.0 456.0 1.000000\n57 CS001214000009 664.0 4685.0 0.141729\n59 CS001214000017 2962.0 4132.0 0.716844\n61 CS001214000048 1889.0 2374.0 0.795703"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_4.isnull().sum()",
"execution_count": 96,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 96,
"data": {
"text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 0\nunit_cost 0\nmedian_price 0\nmedian_cost 0\ndtype: int64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-085: 顧客データフレーム(df_customer)の全顧客に対し、郵便番号(postal_cd)を用いて経度緯度変換用データフレーム(df_geocode)を紐付け、新たなdf_customer_1を作成せよ。ただし、複数紐づく場合は経度(longitude)、緯度(latitude)それぞれ平均を算出すること。\n"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_customer_1 = pd.merge(df_customer[['customer_id', 'postal_cd']],\n df_geocode[['postal_cd', 'longitude' ,'latitude']],\n how='inner', on='postal_cd')\ndf_customer_1 = df_customer_1.groupby('customer_id'). \\\n agg({'longitude':'mean', 'latitude':'mean'}).reset_index(). \\\n rename(columns={'longitude':'m_longitude', 'latitude':'m_latitude'})\n\ndf_customer_1 = pd.merge(df_customer, df_customer_1, how='inner', on='customer_id')\ndf_customer_1.head(3)",
"execution_count": 97,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 97,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n customer_name | \n gender_cd | \n gender | \n birth_day | \n age | \n postal_cd | \n address | \n application_store_cd | \n application_date | \n status_cd | \n age_group | \n m_longitude | \n m_latitude | \n
\n \n \n \n 0 | \n CS021313000114 | \n 大野 あや子 | \n 1 | \n 女性 | \n 1981-04-29 | \n 37 | \n 259-1113 | \n 神奈川県伊勢原市粟窪********** | \n S14021 | \n 20150905 | \n 0-00000000-0 | \n [30.0, 40.0) | \n 139.31779 | \n 35.41358 | \n
\n \n 1 | \n CS037613000071 | \n 六角 雅彦 | \n 9 | \n 不明 | \n 1952-04-01 | \n 66 | \n 136-0076 | \n 東京都江東区南砂********** | \n S13037 | \n 20150414 | \n 0-00000000-0 | \n [60.0, inf) | \n 139.83502 | \n 35.67193 | \n
\n \n 2 | \n CS031415000172 | \n 宇多田 貴美子 | \n 1 | \n 女性 | \n 1976-10-04 | \n 42 | \n 151-0053 | \n 東京都渋谷区代々木********** | \n S13031 | \n 20150529 | \n D-20100325-C | \n [40.0, 50.0) | \n 139.68965 | \n 35.67374 | \n
\n \n
\n
",
"text/plain": " customer_id customer_name gender_cd gender birth_day age postal_cd \\\n0 CS021313000114 大野 あや子 1 女性 1981-04-29 37 259-1113 \n1 CS037613000071 六角 雅彦 9 不明 1952-04-01 66 136-0076 \n2 CS031415000172 宇多田 貴美子 1 女性 1976-10-04 42 151-0053 \n\n address application_store_cd application_date status_cd \\\n0 神奈川県伊勢原市粟窪********** S14021 20150905 0-00000000-0 \n1 東京都江東区南砂********** S13037 20150414 0-00000000-0 \n2 東京都渋谷区代々木********** S13031 20150529 D-20100325-C \n\n age_group m_longitude m_latitude \n0 [30.0, 40.0) 139.31779 35.41358 \n1 [60.0, inf) 139.83502 35.67193 \n2 [40.0, 50.0) 139.68965 35.67374 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-086: 前設問で作成した緯度経度つき顧客データフレーム(df_customer_1)に対し、申込み店舗コード(application_store_cd)をキーに店舗データフレーム(df_store)と結合せよ。そして申込み店舗の緯度(latitude)・経度情報(longitude)と顧客の緯度・経度を用いて距離(km)を求め、顧客ID(customer_id)、顧客住所(address)、店舗住所(address)とともに表示せよ。計算式は簡易式で良いものとするが、その他精度の高い方式を利用したライブラリを利用してもかまわない。結果は10件表示すれば良い。"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "$$\n緯度(ラジアン):\\phi \\\\\n経度(ラジアン):\\lambda \\\\\n距離L = 6371 * arccos(sin \\phi_1 * sin \\phi_2\n+ cos \\phi_1 * cos \\phi_2 * cos(\\lambda_1 − \\lambda_2))\n$$"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def calc_distance(x1, y1, x2, y2):\n    # Spherical law of cosines; x = longitude, y = latitude (degrees), result in km\n    distance = 6371 * math.acos(math.sin(math.radians(y1)) * math.sin(math.radians(y2))\n                                + math.cos(math.radians(y1)) * math.cos(math.radians(y2))\n                                * math.cos(math.radians(x1) - math.radians(x2)))\n    return distance\n\n# Attach each customer's application store\ndf_tmp = pd.merge(df_customer_1, df_store, how='inner', left_on='application_store_cd', right_on='store_cd')\n\n# Access coordinates by column label rather than by position\ndf_tmp['distance'] = df_tmp[['m_longitude', 'm_latitude', 'longitude', 'latitude']].apply(\n    lambda row: calc_distance(row['m_longitude'], row['m_latitude'], row['longitude'], row['latitude']), axis=1)\n\n# address_x is the customer address, address_y the store address\ndf_tmp[['customer_id', 'address_x', 'address_y', 'distance']].head(10)",
"execution_count": 98,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 98,
"data": {
"text/html": "\n\n
\n \n \n | \n customer_id | \n address_x | \n address_y | \n distance | \n
\n \n \n \n 0 | \n CS021313000114 | \n 神奈川県伊勢原市粟窪********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 1.394409 | \n
\n \n 1 | \n CS021313000025 | \n 神奈川県伊勢原市伊勢原********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 0.474282 | \n
\n \n 2 | \n CS021411000096 | \n 神奈川県伊勢原市高森********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 2.480155 | \n
\n \n 3 | \n CS021415000150 | \n 神奈川県伊勢原市上粕屋********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 2.734723 | \n
\n \n 4 | \n CS021313000046 | \n 神奈川県伊勢原市池端********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 1.111911 | \n
\n \n 5 | \n CS021103000002 | \n 神奈川県伊勢原市西富岡********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 2.384941 | \n
\n \n 6 | \n CS021214000028 | \n 神奈川県伊勢原市桜台********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 1.399344 | \n
\n \n 7 | \n CS021512000095 | \n 神奈川県伊勢原市沼目********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 1.993991 | \n
\n \n 8 | \n CS021613000002 | \n 神奈川県伊勢原市桜台********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 1.399344 | \n
\n \n 9 | \n CS021412000147 | \n 神奈川県伊勢原市三ノ宮********** | \n 神奈川県伊勢原市伊勢原四丁目 | \n 3.507680 | \n
\n \n
\n
",
"text/plain": " customer_id address_x address_y distance\n0 CS021313000114 神奈川県伊勢原市粟窪********** 神奈川県伊勢原市伊勢原四丁目 1.394409\n1 CS021313000025 神奈川県伊勢原市伊勢原********** 神奈川県伊勢原市伊勢原四丁目 0.474282\n2 CS021411000096 神奈川県伊勢原市高森********** 神奈川県伊勢原市伊勢原四丁目 2.480155\n3 CS021415000150 神奈川県伊勢原市上粕屋********** 神奈川県伊勢原市伊勢原四丁目 2.734723\n4 CS021313000046 神奈川県伊勢原市池端********** 神奈川県伊勢原市伊勢原四丁目 1.111911\n5 CS021103000002 神奈川県伊勢原市西富岡********** 神奈川県伊勢原市伊勢原四丁目 2.384941\n6 CS021214000028 神奈川県伊勢原市桜台********** 神奈川県伊勢原市伊勢原四丁目 1.399344\n7 CS021512000095 神奈川県伊勢原市沼目********** 神奈川県伊勢原市伊勢原四丁目 1.993991\n8 CS021613000002 神奈川県伊勢原市桜台********** 神奈川県伊勢原市伊勢原四丁目 1.399344\n9 CS021412000147 神奈川県伊勢原市三ノ宮********** 神奈川県伊勢原市伊勢原四丁目 3.507680"
},
"metadata": {}
}
]
},
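{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-087: In the customer data frame (df_customer), the same customer may be registered more than once, for example because they applied at different stores. Treat customers with the same name (customer_name) and postal code (postal_cd) as the same customer, and create a deduplicated customer data frame (df_customer_u) with one record per customer. For each duplicated customer, keep the record with the highest total sales amount; where the totals are equal or the customer has no sales, keep the record with the smallest customer ID (customer_id)."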
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Total sales amount per customer\ndf_tmp = df_receipt.groupby('customer_id').agg({'amount': 'sum'}).reset_index()\n# Highest total first; ties and customers without sales fall back to the smallest customer_id\ndf_customer_u = pd.merge(df_customer, df_tmp, how='left', on='customer_id').sort_values(\n    ['amount', 'customer_id'], ascending=[False, True])\ndf_customer_u.drop_duplicates(subset=['customer_name', 'postal_cd'], keep='first', inplace=True)\n\nprint('Records removed:', len(df_customer) - len(df_customer_u))",
"execution_count": 99,
"outputs": [
{
"output_type": "stream",
"text": "Records removed: 30\n",
"name": "stdout"
}
]
},
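{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-088: Based on the data created in the previous exercise, create a data frame (df_customer_n) that adds an integrated deduplication ID to the customer data frame. Assign the integrated ID according to the following rules.\n>\n> - Customers without duplicates: set the customer ID (customer_id)\n> - Customers with duplicates: set the customer ID of the record extracted in the previous exercise"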
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Map every customer record to the surviving record that shares its name and postal code\ndf_customer_n = pd.merge(df_customer, df_customer_u[['customer_name', 'postal_cd', 'customer_id']],\n                         how='inner', on=['customer_name', 'postal_cd'])\ndf_customer_n.rename(columns={'customer_id_x': 'customer_id', 'customer_id_y': 'integration_id'}, inplace=True)\n\nprint('Difference in ID count:', df_customer_n['customer_id'].nunique() - df_customer_n['integration_id'].nunique())",
"execution_count": 100,
"outputs": [
{
"output_type": "stream",
"text": "Difference in ID count: 30\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-Aside: df_customer_1 and df_customer_n are no longer needed, so delete them."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "del df_customer_1\ndel df_customer_n",
"execution_count": 101,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-089: For customers with sales records, split the data into training data and test data in order to build a predictive model. Split the data randomly at a ratio of 8:2."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Keep one row per customer with sales so no customer lands in both splits\ndf_sales = df_receipt[['customer_id']].drop_duplicates()\ndf_tmp = pd.merge(df_customer, df_sales, how='inner', on='customer_id')\ndf_train, df_test = train_test_split(df_tmp, test_size=0.2, random_state=71)\nprint(f'Training data ratio: {len(df_train) / len(df_tmp):.4f}')\nprint(f'Test data ratio: {len(df_test) / len(df_tmp):.4f}')",
"execution_count": 102,
"outputs": [
{
"output_type": "stream",
"text": "Training data ratio: 0.7999\nTest data ratio: 0.2001\n",
"name": "stdout"
}
]
},
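{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-090: The receipt detail data frame (df_receipt) holds data from January 1, 2017 through October 31, 2019. Aggregate the sales amount (amount) by month and create three sets of model-building data, each with 12 months for training and 6 months for testing."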
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = df_receipt[['sales_ymd', 'amount']].copy()\ndf_tmp['sales_ym'] = df_tmp['sales_ymd'].astype('str').str[0:6]\ndf_tmp = df_tmp.groupby('sales_ym').agg({'amount':'sum'}).reset_index()\n\n# Wrapping the split in a function lets many data sets over a longer period be generated in a loop\ndef split_data(df, train_size, test_size, slide_window, start_point):\n    train_start = start_point * slide_window\n    test_start = train_start + train_size\n    return df[train_start : test_start], df[test_start : test_start + test_size]\n\ndf_train_1, df_test_1 = split_data(df_tmp, train_size=12, test_size=6, slide_window=6, start_point=0)\ndf_train_2, df_test_2 = split_data(df_tmp, train_size=12, test_size=6, slide_window=6, start_point=1)\ndf_train_3, df_test_3 = split_data(df_tmp, train_size=12, test_size=6, slide_window=6, start_point=2)",
"execution_count": 103,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_train_1",
"execution_count": 104,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 104,
"data": {
"text/html": "\n\n
\n \n \n | \n sales_ym | \n amount | \n
\n \n \n \n 0 | \n 201701 | \n 902056 | \n
\n \n 1 | \n 201702 | \n 764413 | \n
\n \n 2 | \n 201703 | \n 962945 | \n
\n \n 3 | \n 201704 | \n 847566 | \n
\n \n 4 | \n 201705 | \n 884010 | \n
\n \n 5 | \n 201706 | \n 894242 | \n
\n \n 6 | \n 201707 | \n 959205 | \n
\n \n 7 | \n 201708 | \n 954836 | \n
\n \n 8 | \n 201709 | \n 902037 | \n
\n \n 9 | \n 201710 | \n 905739 | \n
\n \n 10 | \n 201711 | \n 932157 | \n
\n \n 11 | \n 201712 | \n 939654 | \n
\n \n
\n
",
"text/plain": " sales_ym amount\n0 201701 902056\n1 201702 764413\n2 201703 962945\n3 201704 847566\n4 201705 884010\n5 201706 894242\n6 201707 959205\n7 201708 954836\n8 201709 902037\n9 201710 905739\n10 201711 932157\n11 201712 939654"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_test_1",
"execution_count": 105,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 105,
"data": {
"text/html": "\n\n
\n \n \n | \n sales_ym | \n amount | \n
\n \n \n \n 12 | \n 201801 | \n 944509 | \n
\n \n 13 | \n 201802 | \n 864128 | \n
\n \n 14 | \n 201803 | \n 946588 | \n
\n \n 15 | \n 201804 | \n 937099 | \n
\n \n 16 | \n 201805 | \n 1004438 | \n
\n \n 17 | \n 201806 | \n 1012329 | \n
\n \n
\n
",
"text/plain": " sales_ym amount\n12 201801 944509\n13 201802 864128\n14 201803 946588\n15 201804 937099\n16 201805 1004438\n17 201806 1012329"
},
"metadata": {}
}
]
},
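{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-091: From the customers in the customer data frame (df_customer), extract a sample by undersampling so that the number of customers with sales records and the number of customers without sales records is 1:1."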
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Example using RandomUnderSampler from imbalanced-learn\ndf_tmp = df_receipt.groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_tmp = pd.merge(df_customer, df_tmp, how='left', on='customer_id')\n# 1 if the customer has any sales, 0 otherwise\ndf_tmp['buy_flg'] = df_tmp['amount'].apply(lambda x: 0 if np.isnan(x) else 1)\n\nprint('Count of 0:', len(df_tmp.query('buy_flg == 0')))\nprint('Count of 1:', len(df_tmp.query('buy_flg == 1')))\n\nrs = RandomUnderSampler(random_state=71)\n\n# fit_sample was deprecated and later removed from imbalanced-learn; fit_resample is its replacement\ndf_sample, _ = rs.fit_resample(df_tmp, df_tmp.buy_flg)\n\nprint('Count of 0:', len(df_sample.query('buy_flg == 0')))\nprint('Count of 1:', len(df_sample.query('buy_flg == 1')))",
"execution_count": 106,
"outputs": [
{
"output_type": "stream",
"text": "Count of 0: 13665\nCount of 1: 8306\nCount of 0: 8306\nCount of 1: 8306\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-092: In the customer data frame (df_customer), gender information is held in a denormalized state. Normalize it to third normal form."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_gender = df_customer[['gender_cd', 'gender']].drop_duplicates()\ndf_customer_s = df_customer.drop(columns='gender')",
"execution_count": 107,
"outputs": []
},
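{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-093: The product data frame (df_product) holds only the code values of each category, not the category names. Combine it with the category data frame (df_category) to denormalize it, and create a new product data frame that includes the category names."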
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_full = pd.merge(df_product, df_category[['category_small_cd', \n 'category_major_name',\n 'category_medium_name',\n 'category_small_name']], \n how = 'inner', on = 'category_small_cd')",
"execution_count": 108,
"outputs": []
},
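{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-094: Write the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: CSV (comma-separated)\n> - With header\n> - Character encoding: UTF-8"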
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "os.makedirs('./data', exist_ok=True)  # do not fail if the directory already exists\ndf_product_full.to_csv('./data/P_df_product_full_UTF-8_header.csv', encoding='UTF-8', index=False)",
"execution_count": 109,
"outputs": []
},
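{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-095: Write the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: CSV (comma-separated)\n> - With header\n> - Character encoding: CP932"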
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_full.to_csv('./data/P_df_product_full_CP932_header.csv', encoding='CP932', index=False)",
"execution_count": 110,
"outputs": []
},
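{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-096: Write the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: CSV (comma-separated)\n> - No header\n> - Character encoding: UTF-8"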
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_full.to_csv('./data/P_df_product_full_UTF-8_noh.csv', header=False, encoding='UTF-8', index=False)",
"execution_count": 111,
"outputs": []
},
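{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-097: Read the file created earlier in the following format and create a data frame. Then display the first 10 rows and confirm that it was loaded correctly.\n>\n> - File format: CSV (comma-separated)\n> - With header\n> - Character encoding: UTF-8"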
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.read_csv('./data/P_df_product_full_UTF-8_header.csv')\ndf_tmp.head(10)",
"execution_count": 112,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 112,
"data": {
"text/html": "\n\n
\n \n \n | \n product_cd | \n category_major_cd | \n category_medium_cd | \n category_small_cd | \n unit_price | \n unit_cost | \n category_major_name | \n category_medium_name | \n category_small_name | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n
\n
",
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost category_major_name category_medium_name \\\n0 198.0 149.0 惣菜 御飯類 \n1 218.0 164.0 惣菜 御飯類 \n2 230.0 173.0 惣菜 御飯類 \n3 248.0 186.0 惣菜 御飯類 \n4 268.0 201.0 惣菜 御飯類 \n5 298.0 224.0 惣菜 御飯類 \n6 338.0 254.0 惣菜 御飯類 \n7 420.0 315.0 惣菜 御飯類 \n8 498.0 374.0 惣菜 御飯類 \n9 580.0 435.0 惣菜 御飯類 \n\n category_small_name \n0 弁当類 \n1 弁当類 \n2 弁当類 \n3 弁当類 \n4 弁当類 \n5 弁当類 \n6 弁当類 \n7 弁当類 \n8 弁当類 \n9 弁当類 "
},
"metadata": {}
}
]
},
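{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-098: Read the file created earlier in the following format and create a data frame. Then display the first 10 rows and confirm that it was loaded correctly.\n>\n> - File format: CSV (comma-separated)\n> - No header\n> - Character encoding: UTF-8"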
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.read_csv('./data/P_df_product_full_UTF-8_noh.csv', header=None)\ndf_tmp.head(10)",
"execution_count": 113,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 113,
"data": {
"text/html": "\n\n
\n \n \n | \n 0 | \n 1 | \n 2 | \n 3 | \n 4 | \n 5 | \n 6 | \n 7 | \n 8 | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n
\n
",
"text/plain": " 0 1 2 3 4 5 6 7 8\n0 P040101001 4 401 40101 198.0 149.0 惣菜 御飯類 弁当類\n1 P040101002 4 401 40101 218.0 164.0 惣菜 御飯類 弁当類\n2 P040101003 4 401 40101 230.0 173.0 惣菜 御飯類 弁当類\n3 P040101004 4 401 40101 248.0 186.0 惣菜 御飯類 弁当類\n4 P040101005 4 401 40101 268.0 201.0 惣菜 御飯類 弁当類\n5 P040101006 4 401 40101 298.0 224.0 惣菜 御飯類 弁当類\n6 P040101007 4 401 40101 338.0 254.0 惣菜 御飯類 弁当類\n7 P040101008 4 401 40101 420.0 315.0 惣菜 御飯類 弁当類\n8 P040101009 4 401 40101 498.0 374.0 惣菜 御飯類 弁当類\n9 P040101010 4 401 40101 580.0 435.0 惣菜 御飯類 弁当類"
},
"metadata": {}
}
]
},
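{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-099: Write the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: TSV (tab-separated)\n> - With header\n> - Character encoding: UTF-8"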
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_product_full.to_csv('./data/P_df_product_full_UTF-8_header.tsv', sep='\\t', encoding='UTF-8', index=False)",
"execution_count": 114,
"outputs": []
},
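{
"metadata": {},
"cell_type": "markdown",
"source": "---\n> P-100: Read the file created earlier in the following format and create a data frame. Then display the first 10 rows and confirm that it was loaded correctly.\n>\n> - File format: TSV (tab-separated)\n> - With header\n> - Character encoding: UTF-8"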
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_tmp = pd.read_table('./data/P_df_product_full_UTF-8_header.tsv', encoding='UTF-8')\ndf_tmp.head(10)",
"execution_count": 115,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 115,
"data": {
"text/html": "\n\n
\n \n \n | \n product_cd | \n category_major_cd | \n category_medium_cd | \n category_small_cd | \n unit_price | \n unit_cost | \n category_major_name | \n category_medium_name | \n category_small_name | \n
\n \n \n \n 0 | \n P040101001 | \n 4 | \n 401 | \n 40101 | \n 198.0 | \n 149.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 1 | \n P040101002 | \n 4 | \n 401 | \n 40101 | \n 218.0 | \n 164.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 2 | \n P040101003 | \n 4 | \n 401 | \n 40101 | \n 230.0 | \n 173.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 3 | \n P040101004 | \n 4 | \n 401 | \n 40101 | \n 248.0 | \n 186.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 4 | \n P040101005 | \n 4 | \n 401 | \n 40101 | \n 268.0 | \n 201.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 5 | \n P040101006 | \n 4 | \n 401 | \n 40101 | \n 298.0 | \n 224.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 6 | \n P040101007 | \n 4 | \n 401 | \n 40101 | \n 338.0 | \n 254.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 7 | \n P040101008 | \n 4 | \n 401 | \n 40101 | \n 420.0 | \n 315.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 8 | \n P040101009 | \n 4 | \n 401 | \n 40101 | \n 498.0 | \n 374.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n 9 | \n P040101010 | \n 4 | \n 401 | \n 40101 | \n 580.0 | \n 435.0 | \n 惣菜 | \n 御飯類 | \n 弁当類 | \n
\n \n
\n
",
"text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost category_major_name category_medium_name \\\n0 198.0 149.0 惣菜 御飯類 \n1 218.0 164.0 惣菜 御飯類 \n2 230.0 173.0 惣菜 御飯類 \n3 248.0 186.0 惣菜 御飯類 \n4 268.0 201.0 惣菜 御飯類 \n5 298.0 224.0 惣菜 御飯類 \n6 338.0 254.0 惣菜 御飯類 \n7 420.0 315.0 惣菜 御飯類 \n8 498.0 374.0 惣菜 御飯類 \n9 580.0 435.0 惣菜 御飯類 \n\n category_small_name \n0 弁当類 \n1 弁当類 \n2 弁当類 \n3 弁当類 \n4 弁当類 \n5 弁当類 \n6 弁当類 \n7 弁当類 \n8 弁当類 \n9 弁当類 "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# That completes all 100 exercises. Well done!"
}
],
"metadata": {
"kernelspec": {
"name": "python36",
"display_name": "Python 3.6",
"language": "python"
},
"language_info": {
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"name": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6",
"file_extension": ".py",
"codemirror_mode": {
"version": 3,
"name": "ipython"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}