{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": "# Data Science 100 Knocks (Structured Data Processing) - Python\n# for AzureNotebook" }, { "metadata": {}, "cell_type": "markdown", "source": "## [Note] Changes from the original version\n1. Docker is not available in Azure Notebook, so instead of fetching the data from PostgreSQL, this notebook uses the CSV files found in 100knocks-preprocess/docker/work/data as of 2020.06.18.\n2. In the original CSV data, the 'latitude' column name in geocode.csv began with a space, which has been removed.\n\n Original (100knocks-preprocess ver.1.0): ' latitude' --> 'latitude'\n \n \n3. Some of the original sample answers raised errors in this environment, so the code was corrected while staying as close to the original scripts as possible.\n4. The first cell was modified so that it does not import libraries the original sample answers do not need, and so that it installs the required libraries in AzureNotebook.\n5. The first cell of 'Getting started' was also modified to load the data from the CSV files above instead of through SQL." }, { "metadata": {}, "cell_type": "markdown", "source": "## [Azure Notebook] Before running\n1. This script was tested with Python 3.6. From the menu, choose Kernel --> Change Kernel and select Python 3.6. With the Python3 kernel, an error occurs when installing the libraries." }, { "metadata": {}, "cell_type": "markdown", "source": "## Getting started\n- Run the cell below first.\n- It imports the required libraries and loads the data from ~~the database (PostgreSQL)~~ the CSV files in 100knocks-preprocess/docker/work/data. The files are read directly from GitHub, because cloning the repository with git would undermine the convenience of Azure Notebooks; geocode.csv, which was modified, is read from the author's (noguhiro2002) GitHub repository.\n- Libraries that are expected to be used, such as pandas, are imported in the cell below.\n- ~~Install any other libraries you want to use as needed (you can also install them with \"!pip install library-name\").~~\n- The libraries required by the original sample answers are installed with pip.\n- You may split the processing into multiple steps.\n- Names, addresses, and similar fields are dummy data and do not refer to anything real." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Install the libraries required by the original sample answers with pip\n!pip install --upgrade pip\n!pip install -U pandas numpy scikit-learn\n!pip install imbalanced-learn\n\n\n# Import the libraries required by the original sample answers\nimport os\nimport pandas as pd\nimport numpy as np\nfrom datetime import datetime, date\nfrom dateutil.relativedelta import relativedelta\nimport math\nfrom sklearn import preprocessing\nfrom sklearn.model_selection import train_test_split\nfrom imblearn.under_sampling import RandomUnderSampler\n\n\n# Read the CSV data from GitHub into DataFrames (geocode.csv comes from the noguhiro2002 repository, which has the corrected column name)
\ndf_customer = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/customer.csv')\ndf_category = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/category.csv')\ndf_product = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/product.csv')\ndf_receipt = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/receipt.csv')\ndf_store = pd.read_csv('https://raw.githubusercontent.com/The-Japan-DataScientist-Society/100knocks-preprocess/master/docker/work/data/store.csv')\ndf_geocode = pd.read_csv('https://raw.githubusercontent.com/noguhiro2002/100knocks-preprocess_ForColab-AzureNotebook/master/data/geocode.csv')", "execution_count": 1, "outputs": [ { "output_type": "stream", "text": "Requirement already up-to-date: pip in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (20.1.1)\nRequirement already up-to-date: pandas in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (1.0.4)\nRequirement already up-to-date: numpy in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (1.18.5)\nRequirement already up-to-date: scikit-learn in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.23.1)\nRequirement already satisfied, skipping upgrade: python-dateutil>=2.6.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from pandas) (2.8.1)\nRequirement already satisfied, skipping upgrade: pytz>=2017.2 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from pandas) (2019.3)\nRequirement already satisfied, skipping upgrade: threadpoolctl>=2.0.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn) (2.1.0)\nRequirement already satisfied, skipping upgrade: scipy>=0.19.1 in 
/home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn) (1.1.0)\nRequirement already satisfied, skipping upgrade: joblib>=0.11 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn) (0.14.0)\nRequirement already satisfied, skipping upgrade: six>=1.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.11.0)\nRequirement already satisfied: imbalanced-learn in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.7.0)\nRequirement already satisfied: scikit-learn>=0.23 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (0.23.1)\nRequirement already satisfied: scipy>=0.19.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (1.1.0)\nRequirement already satisfied: numpy>=1.13.3 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (1.18.5)\nRequirement already satisfied: joblib>=0.11 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from imbalanced-learn) (0.14.0)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from scikit-learn>=0.23->imbalanced-learn) (2.1.0)\n", "name": "stdout" }, { "output_type": "stream", "text": "/home/nbuser/anaconda3_501/lib/python3.6/site-packages/IPython/core/interactiveshell.py:3020: DtypeWarning: Columns (4) have mixed types.Specify dtype option on import or set low_memory=False.\n interactivity=interactivity, compiler=compiler, result=result)\n", "name": "stderr" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "# Exercises" }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-001: Display the first 10 rows of all columns from the receipt details DataFrame (df_receipt) and visually check what kind of data it holds." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.head(10)", "execution_count": 2, "outputs": [ { "output_type": "execute_result", "execution_count": 2, "data": { "text/html": "
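<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_ymd</th><th>sales_epoch</th><th>store_cd</th><th>receipt_no</th><th>receipt_sub_no</th><th>customer_id</th><th>product_cd</th><th>quantity</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>0</th><td>20181103</td><td>1257206400</td><td>S14006</td><td>112</td><td>1</td><td>CS006214000001</td><td>P070305012</td><td>1</td><td>158</td></tr>\n<tr><th>1</th><td>20181118</td><td>1258502400</td><td>S13008</td><td>1132</td><td>2</td><td>CS008415000097</td><td>P070701017</td><td>1</td><td>81</td></tr>\n<tr><th>2</th><td>20170712</td><td>1215820800</td><td>S14028</td><td>1102</td><td>1</td><td>CS028414000014</td><td>P060101005</td><td>1</td><td>170</td></tr>\n<tr><th>3</th><td>20190205</td><td>1265328000</td><td>S14042</td><td>1132</td><td>1</td><td>ZZ000000000000</td><td>P050301001</td><td>1</td><td>25</td></tr>\n<tr><th>4</th><td>20180821</td><td>1250812800</td><td>S14025</td><td>1102</td><td>2</td><td>CS025415000050</td><td>P060102007</td><td>1</td><td>90</td></tr>\n<tr><th>5</th><td>20190605</td><td>1275696000</td><td>S13003</td><td>1112</td><td>1</td><td>CS003515000195</td><td>P050102002</td><td>1</td><td>138</td></tr>\n<tr><th>6</th><td>20181205</td><td>1259971200</td><td>S14024</td><td>1102</td><td>2</td><td>CS024514000042</td><td>P080101005</td><td>1</td><td>30</td></tr>\n<tr><th>7</th><td>20190922</td><td>1285113600</td><td>S14040</td><td>1102</td><td>1</td><td>CS040415000178</td><td>P070501004</td><td>1</td><td>128</td></tr>\n<tr><th>8</th><td>20170504</td><td>1209859200</td><td>S13020</td><td>1112</td><td>2</td><td>ZZ000000000000</td><td>P071302010</td><td>1</td><td>770</td></tr>\n<tr><th>9</th><td>20191010</td><td>1286668800</td><td>S14027</td><td>1102</td><td>1</td><td>CS027514000015</td><td>P071101003</td><td>1</td><td>680</td></tr>\n</tbody>\n</table>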
\n
", "text/plain": " sales_ymd sales_epoch store_cd receipt_no receipt_sub_no \\\n0 20181103 1257206400 S14006 112 1 \n1 20181118 1258502400 S13008 1132 2 \n2 20170712 1215820800 S14028 1102 1 \n3 20190205 1265328000 S14042 1132 1 \n4 20180821 1250812800 S14025 1102 2 \n5 20190605 1275696000 S13003 1112 1 \n6 20181205 1259971200 S14024 1102 2 \n7 20190922 1285113600 S14040 1102 1 \n8 20170504 1209859200 S13020 1112 2 \n9 20191010 1286668800 S14027 1102 1 \n\n customer_id product_cd quantity amount \n0 CS006214000001 P070305012 1 158 \n1 CS008415000097 P070701017 1 81 \n2 CS028414000014 P060101005 1 170 \n3 ZZ000000000000 P050301001 1 25 \n4 CS025415000050 P060102007 1 90 \n5 CS003515000195 P050102002 1 138 \n6 CS024514000042 P080101005 1 30 \n7 CS040415000178 P070501004 1 128 \n8 ZZ000000000000 P071302010 1 770 \n9 CS027514000015 P071101003 1 680 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-002: From the receipt details DataFrame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and display 10 rows." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].head(10)", "execution_count": 3, "outputs": [ { "output_type": "execute_result", "execution_count": 3, "data": { "text/html": "
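<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_ymd</th><th>customer_id</th><th>product_cd</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>0</th><td>20181103</td><td>CS006214000001</td><td>P070305012</td><td>158</td></tr>\n<tr><th>1</th><td>20181118</td><td>CS008415000097</td><td>P070701017</td><td>81</td></tr>\n<tr><th>2</th><td>20170712</td><td>CS028414000014</td><td>P060101005</td><td>170</td></tr>\n<tr><th>3</th><td>20190205</td><td>ZZ000000000000</td><td>P050301001</td><td>25</td></tr>\n<tr><th>4</th><td>20180821</td><td>CS025415000050</td><td>P060102007</td><td>90</td></tr>\n<tr><th>5</th><td>20190605</td><td>CS003515000195</td><td>P050102002</td><td>138</td></tr>\n<tr><th>6</th><td>20181205</td><td>CS024514000042</td><td>P080101005</td><td>30</td></tr>\n<tr><th>7</th><td>20190922</td><td>CS040415000178</td><td>P070501004</td><td>128</td></tr>\n<tr><th>8</th><td>20170504</td><td>ZZ000000000000</td><td>P071302010</td><td>770</td></tr>\n<tr><th>9</th><td>20191010</td><td>CS027514000015</td><td>P071101003</td><td>680</td></tr>\n</tbody>\n</table>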
\n
", "text/plain": " sales_ymd customer_id product_cd amount\n0 20181103 CS006214000001 P070305012 158\n1 20181118 CS008415000097 P070701017 81\n2 20170712 CS028414000014 P060101005 170\n3 20190205 ZZ000000000000 P050301001 25\n4 20180821 CS025415000050 P060102007 90\n5 20190605 CS003515000195 P050102002 138\n6 20181205 CS024514000042 P080101005 30\n7 20190922 CS040415000178 P070501004 128\n8 20170504 ZZ000000000000 P071302010 770\n9 20191010 CS027514000015 P071101003 680" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-003: From the receipt details DataFrame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and display 10 rows. Rename sales_ymd to sales_date as part of the extraction." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].rename(columns={'sales_ymd': 'sales_date'}).head(10)", "execution_count": 4, "outputs": [ { "output_type": "execute_result", "execution_count": 4, "data": { "text/html": "
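<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_date</th><th>customer_id</th><th>product_cd</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>0</th><td>20181103</td><td>CS006214000001</td><td>P070305012</td><td>158</td></tr>\n<tr><th>1</th><td>20181118</td><td>CS008415000097</td><td>P070701017</td><td>81</td></tr>\n<tr><th>2</th><td>20170712</td><td>CS028414000014</td><td>P060101005</td><td>170</td></tr>\n<tr><th>3</th><td>20190205</td><td>ZZ000000000000</td><td>P050301001</td><td>25</td></tr>\n<tr><th>4</th><td>20180821</td><td>CS025415000050</td><td>P060102007</td><td>90</td></tr>\n<tr><th>5</th><td>20190605</td><td>CS003515000195</td><td>P050102002</td><td>138</td></tr>\n<tr><th>6</th><td>20181205</td><td>CS024514000042</td><td>P080101005</td><td>30</td></tr>\n<tr><th>7</th><td>20190922</td><td>CS040415000178</td><td>P070501004</td><td>128</td></tr>\n<tr><th>8</th><td>20170504</td><td>ZZ000000000000</td><td>P071302010</td><td>770</td></tr>\n<tr><th>9</th><td>20191010</td><td>CS027514000015</td><td>P071101003</td><td>680</td></tr>\n</tbody>\n</table>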
\n
", "text/plain": " sales_date customer_id product_cd amount\n0 20181103 CS006214000001 P070305012 158\n1 20181118 CS008415000097 P070701017 81\n2 20170712 CS028414000014 P060101005 170\n3 20190205 ZZ000000000000 P050301001 25\n4 20180821 CS025415000050 P060102007 90\n5 20190605 CS003515000195 P050102002 138\n6 20181205 CS024514000042 P080101005 30\n7 20190922 CS040415000178 P070501004 128\n8 20170504 ZZ000000000000 P071302010 770\n9 20191010 CS027514000015 P071101003 680" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-004: From the receipt details DataFrame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy the following condition.\n> - Customer ID (customer_id) is \"CS018205000001\"" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].query('customer_id == \"CS018205000001\"')", "execution_count": 5, "outputs": [ { "output_type": "execute_result", "execution_count": 5, "data": { "text/html": "
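<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_ymd</th><th>customer_id</th><th>product_cd</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>36</th><td>20180911</td><td>CS018205000001</td><td>P071401012</td><td>2200</td></tr>\n<tr><th>9843</th><td>20180414</td><td>CS018205000001</td><td>P060104007</td><td>600</td></tr>\n<tr><th>21110</th><td>20170614</td><td>CS018205000001</td><td>P050206001</td><td>990</td></tr>\n<tr><th>27673</th><td>20170614</td><td>CS018205000001</td><td>P060702015</td><td>108</td></tr>\n<tr><th>27840</th><td>20190216</td><td>CS018205000001</td><td>P071005024</td><td>102</td></tr>\n<tr><th>28757</th><td>20180414</td><td>CS018205000001</td><td>P071101002</td><td>278</td></tr>\n<tr><th>39256</th><td>20190226</td><td>CS018205000001</td><td>P070902035</td><td>168</td></tr>\n<tr><th>58121</th><td>20190924</td><td>CS018205000001</td><td>P060805001</td><td>495</td></tr>\n<tr><th>68117</th><td>20190226</td><td>CS018205000001</td><td>P071401020</td><td>2200</td></tr>\n<tr><th>72254</th><td>20180911</td><td>CS018205000001</td><td>P071401005</td><td>1100</td></tr>\n<tr><th>88508</th><td>20190216</td><td>CS018205000001</td><td>P040101002</td><td>218</td></tr>\n<tr><th>91525</th><td>20190924</td><td>CS018205000001</td><td>P091503001</td><td>280</td></tr>\n</tbody>\n</table>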
\n
", "text/plain": " sales_ymd customer_id product_cd amount\n36 20180911 CS018205000001 P071401012 2200\n9843 20180414 CS018205000001 P060104007 600\n21110 20170614 CS018205000001 P050206001 990\n27673 20170614 CS018205000001 P060702015 108\n27840 20190216 CS018205000001 P071005024 102\n28757 20180414 CS018205000001 P071101002 278\n39256 20190226 CS018205000001 P070902035 168\n58121 20190924 CS018205000001 P060805001 495\n68117 20190226 CS018205000001 P071401020 2200\n72254 20180911 CS018205000001 P071401005 1100\n88508 20190216 CS018205000001 P040101002 218\n91525 20190924 CS018205000001 P091503001 280" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-005: From the receipt details DataFrame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Sales amount (amount) is 1,000 or more" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \\\n .query('customer_id == \"CS018205000001\" & amount >= 1000')", "execution_count": 6, "outputs": [ { "output_type": "execute_result", "execution_count": 6, "data": { "text/html": "
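<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_ymd</th><th>customer_id</th><th>product_cd</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>36</th><td>20180911</td><td>CS018205000001</td><td>P071401012</td><td>2200</td></tr>\n<tr><th>68117</th><td>20190226</td><td>CS018205000001</td><td>P071401020</td><td>2200</td></tr>\n<tr><th>72254</th><td>20180911</td><td>CS018205000001</td><td>P071401005</td><td>1100</td></tr>\n</tbody>\n</table>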
\n
", "text/plain": " sales_ymd customer_id product_cd amount\n36 20180911 CS018205000001 P071401012 2200\n68117 20190226 CS018205000001 P071401020 2200\n72254 20180911 CS018205000001 P071401005 1100" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-006: From the receipt details DataFrame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), sales quantity (quantity), and sales amount (amount) in that order, and extract the rows that satisfy the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Sales amount (amount) is 1,000 or more, or sales quantity (quantity) is 5 or more" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'quantity', 'amount']].query('customer_id == \"CS018205000001\" & (amount >= 1000 | quantity >=5)')", "execution_count": 7, "outputs": [ { "output_type": "execute_result", "execution_count": 7, "data": { "text/html": "
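<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_ymd</th><th>customer_id</th><th>product_cd</th><th>quantity</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>36</th><td>20180911</td><td>CS018205000001</td><td>P071401012</td><td>1</td><td>2200</td></tr>\n<tr><th>9843</th><td>20180414</td><td>CS018205000001</td><td>P060104007</td><td>6</td><td>600</td></tr>\n<tr><th>21110</th><td>20170614</td><td>CS018205000001</td><td>P050206001</td><td>5</td><td>990</td></tr>\n<tr><th>68117</th><td>20190226</td><td>CS018205000001</td><td>P071401020</td><td>1</td><td>2200</td></tr>\n<tr><th>72254</th><td>20180911</td><td>CS018205000001</td><td>P071401005</td><td>1</td><td>1100</td></tr>\n</tbody>\n</table>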
\n
", "text/plain": " sales_ymd customer_id product_cd quantity amount\n36 20180911 CS018205000001 P071401012 1 2200\n9843 20180414 CS018205000001 P060104007 6 600\n21110 20170614 CS018205000001 P050206001 5 990\n68117 20190226 CS018205000001 P071401020 1 2200\n72254 20180911 CS018205000001 P071401005 1 1100" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-007: From the receipt details DataFrame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Sales amount (amount) is between 1,000 and 2,000 inclusive" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \\\n .query('customer_id == \"CS018205000001\" & 1000 <= amount <= 2000')", "execution_count": 8, "outputs": [ { "output_type": "execute_result", "execution_count": 8, "data": { "text/html": "
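<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_ymd</th><th>customer_id</th><th>product_cd</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>72254</th><td>20180911</td><td>CS018205000001</td><td>P071401005</td><td>1100</td></tr>\n</tbody>\n</table>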
\n
", "text/plain": " sales_ymd customer_id product_cd amount\n72254 20180911 CS018205000001 P071401005 1100" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-008: From the receipt details DataFrame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount) in that order, and extract the rows that satisfy the following conditions.\n> - Customer ID (customer_id) is \"CS018205000001\"\n> - Product code (product_cd) is anything other than \"P071401019\"" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \\\n .query('customer_id == \"CS018205000001\" & product_cd != \"P071401019\"')", "execution_count": 9, "outputs": [ { "output_type": "execute_result", "execution_count": 9, "data": { "text/html": "
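<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>sales_ymd</th><th>customer_id</th><th>product_cd</th><th>amount</th></tr>\n</thead>\n<tbody>\n<tr><th>36</th><td>20180911</td><td>CS018205000001</td><td>P071401012</td><td>2200</td></tr>\n<tr><th>9843</th><td>20180414</td><td>CS018205000001</td><td>P060104007</td><td>600</td></tr>\n<tr><th>21110</th><td>20170614</td><td>CS018205000001</td><td>P050206001</td><td>990</td></tr>\n<tr><th>27673</th><td>20170614</td><td>CS018205000001</td><td>P060702015</td><td>108</td></tr>\n<tr><th>27840</th><td>20190216</td><td>CS018205000001</td><td>P071005024</td><td>102</td></tr>\n<tr><th>28757</th><td>20180414</td><td>CS018205000001</td><td>P071101002</td><td>278</td></tr>\n<tr><th>39256</th><td>20190226</td><td>CS018205000001</td><td>P070902035</td><td>168</td></tr>\n<tr><th>58121</th><td>20190924</td><td>CS018205000001</td><td>P060805001</td><td>495</td></tr>\n<tr><th>68117</th><td>20190226</td><td>CS018205000001</td><td>P071401020</td><td>2200</td></tr>\n<tr><th>72254</th><td>20180911</td><td>CS018205000001</td><td>P071401005</td><td>1100</td></tr>\n<tr><th>88508</th><td>20190216</td><td>CS018205000001</td><td>P040101002</td><td>218</td></tr>\n<tr><th>91525</th><td>20190924</td><td>CS018205000001</td><td>P091503001</td><td>280</td></tr>\n</tbody>\n</table>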
\n
", "text/plain": " sales_ymd customer_id product_cd amount\n36 20180911 CS018205000001 P071401012 2200\n9843 20180414 CS018205000001 P060104007 600\n21110 20170614 CS018205000001 P050206001 990\n27673 20170614 CS018205000001 P060702015 108\n27840 20190216 CS018205000001 P071005024 102\n28757 20180414 CS018205000001 P071101002 278\n39256 20190226 CS018205000001 P070902035 168\n58121 20190924 CS018205000001 P060805001 495\n68117 20190226 CS018205000001 P071401020 2200\n72254 20180911 CS018205000001 P071401005 1100\n88508 20190216 CS018205000001 P040101002 218\n91525 20190924 CS018205000001 P091503001 280" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-009: Rewrite the following expression using AND instead of OR, without changing its output.\n\n`df_store.query('not(prefecture_cd == \"13\" | floor_area > 900)')`" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_store.query('prefecture_cd != \"13\" & floor_area <= 900')", "execution_count": 10, "outputs": [ { "output_type": "execute_result", "execution_count": 10, "data": { "text/html": "
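<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>store_cd</th><th>store_name</th><th>prefecture_cd</th><th>prefecture</th><th>address</th><th>address_kana</th><th>tel_no</th><th>longitude</th><th>latitude</th><th>floor_area</th></tr>\n</thead>\n<tbody>\n<tr><th>18</th><td>S14046</td><td>北山田店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市都筑区北山田一丁目</td><td>カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ</td><td>045-123-4049</td><td>139.5916</td><td>35.56189</td><td>831.0</td></tr>\n<tr><th>20</th><td>S14011</td><td>日吉本町店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市港北区日吉本町四丁目</td><td>カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ</td><td>045-123-4033</td><td>139.6316</td><td>35.54655</td><td>890.0</td></tr>\n<tr><th>38</th><td>S12013</td><td>習志野店</td><td>12</td><td>千葉県</td><td>千葉県習志野市芝園一丁目</td><td>チバケンナラシノシシバゾノイッチョウメ</td><td>047-123-4002</td><td>140.0220</td><td>35.66122</td><td>808.0</td></tr>\n</tbody>\n</table>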
\n
", "text/plain": " store_cd store_name prefecture_cd prefecture address \\\n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n38 S12013 習志野店 12 千葉県 千葉県習志野市芝園一丁目 \n\n address_kana tel_no longitude latitude \\\n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n38 チバケンナラシノシシバゾノイッチョウメ 047-123-4002 140.0220 35.66122 \n\n floor_area \n18 831.0 \n20 890.0 \n38 808.0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-010: From the store DataFrame (df_store), extract all columns for stores whose store code (store_cd) starts with \"S14\", and display only 10 rows." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_store.query(\"store_cd.str.startswith('S14')\", engine='python').head(10)", "execution_count": 11, "outputs": [ { "output_type": "execute_result", "execution_count": 11, "data": { "text/html": "
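<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>store_cd</th><th>store_name</th><th>prefecture_cd</th><th>prefecture</th><th>address</th><th>address_kana</th><th>tel_no</th><th>longitude</th><th>latitude</th><th>floor_area</th></tr>\n</thead>\n<tbody>\n<tr><th>2</th><td>S14010</td><td>菊名店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市港北区菊名一丁目</td><td>カナガワケンヨコハマシコウホククキクナイッチョウメ</td><td>045-123-4032</td><td>139.6326</td><td>35.50049</td><td>1732.0</td></tr>\n<tr><th>3</th><td>S14033</td><td>阿久和店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市瀬谷区阿久和西一丁目</td><td>カナガワケンヨコハマシセヤクアクワニシイッチョウメ</td><td>045-123-4043</td><td>139.4961</td><td>35.45918</td><td>1495.0</td></tr>\n<tr><th>4</th><td>S14036</td><td>相模原中央店</td><td>14</td><td>神奈川県</td><td>神奈川県相模原市中央二丁目</td><td>カナガワケンサガミハラシチュウオウニチョウメ</td><td>042-123-4045</td><td>139.3716</td><td>35.57327</td><td>1679.0</td></tr>\n<tr><th>7</th><td>S14040</td><td>長津田店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市緑区長津田みなみ台五丁目</td><td>カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ</td><td>045-123-4046</td><td>139.4994</td><td>35.52398</td><td>1548.0</td></tr>\n<tr><th>9</th><td>S14050</td><td>阿久和西店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市瀬谷区阿久和西一丁目</td><td>カナガワケンヨコハマシセヤクアクワニシイッチョウメ</td><td>045-123-4053</td><td>139.4961</td><td>35.45918</td><td>1830.0</td></tr>\n<tr><th>12</th><td>S14028</td><td>二ツ橋店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市瀬谷区二ツ橋町</td><td>カナガワケンヨコハマシセヤクフタツバシチョウ</td><td>045-123-4042</td><td>139.4963</td><td>35.46304</td><td>1574.0</td></tr>\n<tr><th>16</th><td>S14012</td><td>本牧和田店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市中区本牧和田</td><td>カナガワケンヨコハマシナカクホンモクワダ</td><td>045-123-4034</td><td>139.6582</td><td>35.42156</td><td>1341.0</td></tr>\n<tr><th>18</th><td>S14046</td><td>北山田店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市都筑区北山田一丁目</td><td>カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ</td><td>045-123-4049</td><td>139.5916</td><td>35.56189</td><td>831.0</td></tr>\n<tr><th>19</th><td>S14022</td><td>逗子店</td><td>14</td><td>神奈川県</td><td>神奈川県逗子市逗子一丁目</td><td>カナガワケンズシシズシイッチョウメ</td><td>046-123-4036</td><td>139.5789</td><td>35.29642</td><td>1838.0</td></tr>\n<tr><th>20</th><td>S14011</td><td>日吉本町店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市港北区日吉本町四丁目</td><td>カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ</td><td>045-123-4033</td><td>139.6316</td><td>35.54655</td><td>890.0</td></tr>\n</tbody>\n</table>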
\n
", "text/plain": " store_cd store_name prefecture_cd prefecture address \\\n2 S14010 菊名店 14 神奈川県 神奈川県横浜市港北区菊名一丁目 \n3 S14033 阿久和店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n4 S14036 相模原中央店 14 神奈川県 神奈川県相模原市中央二丁目 \n7 S14040 長津田店 14 神奈川県 神奈川県横浜市緑区長津田みなみ台五丁目 \n9 S14050 阿久和西店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n12 S14028 二ツ橋店 14 神奈川県 神奈川県横浜市瀬谷区二ツ橋町 \n16 S14012 本牧和田店 14 神奈川県 神奈川県横浜市中区本牧和田 \n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n19 S14022 逗子店 14 神奈川県 神奈川県逗子市逗子一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n\n address_kana tel_no longitude latitude \\\n2 カナガワケンヨコハマシコウホククキクナイッチョウメ 045-123-4032 139.6326 35.50049 \n3 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4043 139.4961 35.45918 \n4 カナガワケンサガミハラシチュウオウニチョウメ 042-123-4045 139.3716 35.57327 \n7 カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ 045-123-4046 139.4994 35.52398 \n9 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4053 139.4961 35.45918 \n12 カナガワケンヨコハマシセヤクフタツバシチョウ 045-123-4042 139.4963 35.46304 \n16 カナガワケンヨコハマシナカクホンモクワダ 045-123-4034 139.6582 35.42156 \n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n19 カナガワケンズシシズシイッチョウメ 046-123-4036 139.5789 35.29642 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n\n floor_area \n2 1732.0 \n3 1495.0 \n4 1679.0 \n7 1548.0 \n9 1830.0 \n12 1574.0 \n16 1341.0 \n18 831.0 \n19 1838.0 \n20 890.0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-011: From the customer DataFrame (df_customer), extract all columns for customers whose customer ID (customer_id) ends with 1, and display only 10 rows." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer.query(\"customer_id.str.endswith('1')\", engine='python').head(10)", "execution_count": 12, "outputs": [ { "output_type": "execute_result", "execution_count": 12, "data": { "text/html": "
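<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>customer_id</th><th>customer_name</th><th>gender_cd</th><th>gender</th><th>birth_day</th><th>age</th><th>postal_cd</th><th>address</th><th>application_store_cd</th><th>application_date</th><th>status_cd</th></tr>\n</thead>\n<tbody>\n<tr><th>1</th><td>CS037613000071</td><td>六角 雅彦</td><td>9</td><td>不明</td><td>1952-04-01</td><td>66</td><td>136-0076</td><td>東京都江東区南砂**********</td><td>S13037</td><td>20150414</td><td>0-00000000-0</td></tr>\n<tr><th>3</th><td>CS028811000001</td><td>堀井 かおり</td><td>1</td><td>女性</td><td>1933-03-27</td><td>86</td><td>245-0016</td><td>神奈川県横浜市泉区和泉町**********</td><td>S14028</td><td>20160115</td><td>0-00000000-0</td></tr>\n<tr><th>14</th><td>CS040412000191</td><td>川井 郁恵</td><td>1</td><td>女性</td><td>1977-01-05</td><td>42</td><td>226-0021</td><td>神奈川県横浜市緑区北八朔町**********</td><td>S14040</td><td>20151101</td><td>1-20091025-4</td></tr>\n<tr><th>31</th><td>CS028314000011</td><td>小菅 あおい</td><td>1</td><td>女性</td><td>1983-11-26</td><td>35</td><td>246-0038</td><td>神奈川県横浜市瀬谷区宮沢**********</td><td>S14028</td><td>20151123</td><td>1-20080426-5</td></tr>\n<tr><th>56</th><td>CS039212000051</td><td>藤島 恵梨香</td><td>1</td><td>女性</td><td>1997-02-03</td><td>22</td><td>166-0001</td><td>東京都杉並区阿佐谷北**********</td><td>S13039</td><td>20171121</td><td>1-20100215-4</td></tr>\n<tr><th>59</th><td>CS015412000111</td><td>松居 奈月</td><td>1</td><td>女性</td><td>1972-10-04</td><td>46</td><td>136-0071</td><td>東京都江東区亀戸**********</td><td>S13015</td><td>20150629</td><td>0-00000000-0</td></tr>\n<tr><th>63</th><td>CS004702000041</td><td>野島 洋</td><td>0</td><td>男性</td><td>1943-08-24</td><td>75</td><td>176-0022</td><td>東京都練馬区向山**********</td><td>S13004</td><td>20170218</td><td>0-00000000-0</td></tr>\n<tr><th>74</th><td>CS041515000001</td><td>栗田 千夏</td><td>1</td><td>女性</td><td>1967-01-02</td><td>52</td><td>206-0001</td><td>東京都多摩市和田**********</td><td>S13041</td><td>20160422</td><td>E-20100803-F</td></tr>\n<tr><th>85</th><td>CS029313000221</td><td>北条 ひかり</td><td>1</td><td>女性</td><td>1987-06-19</td><td>31</td><td>279-0011</td><td>千葉県浦安市美浜**********</td><td>S12029</td><td>20180810</td><td>0-00000000-0</td></tr>\n<tr><th>102</th><td>CS034312000071</td><td>望月 奈央</td><td>1</td><td>女性</td><td>1980-09-20</td><td>38</td><td>213-0026</td><td>神奈川県川崎市高津区久末**********</td><td>S14034</td><td>20160106</td><td>0-00000000-0</td></tr>\n</tbody>\n</table>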
\n
", "text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n1 CS037613000071 六角 雅彦 9 不明 1952-04-01 66 \n3 CS028811000001 堀井 かおり 1 女性 1933-03-27 86 \n14 CS040412000191 川井 郁恵 1 女性 1977-01-05 42 \n31 CS028314000011 小菅 あおい 1 女性 1983-11-26 35 \n56 CS039212000051 藤島 恵梨香 1 女性 1997-02-03 22 \n59 CS015412000111 松居 奈月 1 女性 1972-10-04 46 \n63 CS004702000041 野島 洋 0 男性 1943-08-24 75 \n74 CS041515000001 栗田 千夏 1 女性 1967-01-02 52 \n85 CS029313000221 北条 ひかり 1 女性 1987-06-19 31 \n102 CS034312000071 望月 奈央 1 女性 1980-09-20 38 \n\n postal_cd address application_store_cd application_date \\\n1 136-0076 東京都江東区南砂********** S13037 20150414 \n3 245-0016 神奈川県横浜市泉区和泉町********** S14028 20160115 \n14 226-0021 神奈川県横浜市緑区北八朔町********** S14040 20151101 \n31 246-0038 神奈川県横浜市瀬谷区宮沢********** S14028 20151123 \n56 166-0001 東京都杉並区阿佐谷北********** S13039 20171121 \n59 136-0071 東京都江東区亀戸********** S13015 20150629 \n63 176-0022 東京都練馬区向山********** S13004 20170218 \n74 206-0001 東京都多摩市和田********** S13041 20160422 \n85 279-0011 千葉県浦安市美浜********** S12029 20180810 \n102 213-0026 神奈川県川崎市高津区久末********** S14034 20160106 \n\n status_cd \n1 0-00000000-0 \n3 0-00000000-0 \n14 1-20091025-4 \n31 1-20080426-5 \n56 1-20100215-4 \n59 0-00000000-0 \n63 0-00000000-0 \n74 E-20100803-F \n85 0-00000000-0 \n102 0-00000000-0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-012: From the store DataFrame (df_store), display all columns for the stores located in Yokohama (横浜市) only." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_store.query(\"address.str.contains('横浜市')\", engine='python')", "execution_count": 13, "outputs": [ { "output_type": "execute_result", "execution_count": 13, "data": { "text/html": "
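<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>store_cd</th><th>store_name</th><th>prefecture_cd</th><th>prefecture</th><th>address</th><th>address_kana</th><th>tel_no</th><th>longitude</th><th>latitude</th><th>floor_area</th></tr>\n</thead>\n<tbody>\n<tr><th>2</th><td>S14010</td><td>菊名店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市港北区菊名一丁目</td><td>カナガワケンヨコハマシコウホククキクナイッチョウメ</td><td>045-123-4032</td><td>139.6326</td><td>35.50049</td><td>1732.0</td></tr>\n<tr><th>3</th><td>S14033</td><td>阿久和店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市瀬谷区阿久和西一丁目</td><td>カナガワケンヨコハマシセヤクアクワニシイッチョウメ</td><td>045-123-4043</td><td>139.4961</td><td>35.45918</td><td>1495.0</td></tr>\n<tr><th>7</th><td>S14040</td><td>長津田店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市緑区長津田みなみ台五丁目</td><td>カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ</td><td>045-123-4046</td><td>139.4994</td><td>35.52398</td><td>1548.0</td></tr>\n<tr><th>9</th><td>S14050</td><td>阿久和西店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市瀬谷区阿久和西一丁目</td><td>カナガワケンヨコハマシセヤクアクワニシイッチョウメ</td><td>045-123-4053</td><td>139.4961</td><td>35.45918</td><td>1830.0</td></tr>\n<tr><th>12</th><td>S14028</td><td>二ツ橋店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市瀬谷区二ツ橋町</td><td>カナガワケンヨコハマシセヤクフタツバシチョウ</td><td>045-123-4042</td><td>139.4963</td><td>35.46304</td><td>1574.0</td></tr>\n<tr><th>16</th><td>S14012</td><td>本牧和田店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市中区本牧和田</td><td>カナガワケンヨコハマシナカクホンモクワダ</td><td>045-123-4034</td><td>139.6582</td><td>35.42156</td><td>1341.0</td></tr>\n<tr><th>18</th><td>S14046</td><td>北山田店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市都筑区北山田一丁目</td><td>カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ</td><td>045-123-4049</td><td>139.5916</td><td>35.56189</td><td>831.0</td></tr>\n<tr><th>20</th><td>S14011</td><td>日吉本町店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市港北区日吉本町四丁目</td><td>カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ</td><td>045-123-4033</td><td>139.6316</td><td>35.54655</td><td>890.0</td></tr>\n<tr><th>26</th><td>S14048</td><td>中川中央店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市都筑区中川中央二丁目</td><td>カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ</td><td>045-123-4051</td><td>139.5758</td><td>35.54912</td><td>1657.0</td></tr>\n<tr><th>40</th><td>S14042</td><td>新山下店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市中区新山下二丁目</td><td>カナガワケンヨコハマシナカクシンヤマシタニチョウメ</td><td>045-123-4047</td><td>139.6593</td><td>35.43894</td><td>1044.0</td></tr>\n<tr><th>52</th><td>S14006</td><td>葛が谷店</td><td>14</td><td>神奈川県</td><td>神奈川県横浜市都筑区葛が谷</td><td>カナガワケンヨコハマシツヅキククズガヤ</td><td>045-123-4031</td><td>139.5633</td><td>35.53573</td><td>1886.0</td></tr>\n</tbody>\n</table>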
\n
", "text/plain": " store_cd store_name prefecture_cd prefecture address \\\n2 S14010 菊名店 14 神奈川県 神奈川県横浜市港北区菊名一丁目 \n3 S14033 阿久和店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n7 S14040 長津田店 14 神奈川県 神奈川県横浜市緑区長津田みなみ台五丁目 \n9 S14050 阿久和西店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n12 S14028 二ツ橋店 14 神奈川県 神奈川県横浜市瀬谷区二ツ橋町 \n16 S14012 本牧和田店 14 神奈川県 神奈川県横浜市中区本牧和田 \n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n26 S14048 中川中央店 14 神奈川県 神奈川県横浜市都筑区中川中央二丁目 \n40 S14042 新山下店 14 神奈川県 神奈川県横浜市中区新山下二丁目 \n52 S14006 葛が谷店 14 神奈川県 神奈川県横浜市都筑区葛が谷 \n\n address_kana tel_no longitude latitude \\\n2 カナガワケンヨコハマシコウホククキクナイッチョウメ 045-123-4032 139.6326 35.50049 \n3 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4043 139.4961 35.45918 \n7 カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ 045-123-4046 139.4994 35.52398 \n9 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4053 139.4961 35.45918 \n12 カナガワケンヨコハマシセヤクフタツバシチョウ 045-123-4042 139.4963 35.46304 \n16 カナガワケンヨコハマシナカクホンモクワダ 045-123-4034 139.6582 35.42156 \n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n26 カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ 045-123-4051 139.5758 35.54912 \n40 カナガワケンヨコハマシナカクシンヤマシタニチョウメ 045-123-4047 139.6593 35.43894 \n52 カナガワケンヨコハマシツヅキククズガヤ 045-123-4031 139.5633 35.53573 \n\n floor_area \n2 1732.0 \n3 1495.0 \n7 1548.0 \n9 1830.0 \n12 1574.0 \n16 1341.0 \n18 831.0 \n20 890.0 \n26 1657.0 \n40 1044.0 \n52 1886.0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-013: From the customer DataFrame (df_customer), extract all columns for rows whose status code (status_cd) starts with a letter from A to F, and display only 10 rows." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer.query(\"status_cd.str.contains('^[A-F]', regex=True)\", engine='python').head(10)", "execution_count": 14, "outputs": [ { "output_type": "execute_result", "execution_count": 14, "data": { "text/html": "
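<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>customer_id</th><th>customer_name</th><th>gender_cd</th><th>gender</th><th>birth_day</th><th>age</th><th>postal_cd</th><th>address</th><th>application_store_cd</th><th>application_date</th><th>status_cd</th></tr>\n</thead>\n<tbody>\n<tr><th>2</th><td>CS031415000172</td><td>宇多田 貴美子</td><td>1</td><td>女性</td><td>1976-10-04</td><td>42</td><td>151-0053</td><td>東京都渋谷区代々木**********</td><td>S13031</td><td>20150529</td><td>D-20100325-C</td></tr>\n<tr><th>6</th><td>CS015414000103</td><td>奥野 陽子</td><td>1</td><td>女性</td><td>1977-08-09</td><td>41</td><td>136-0073</td><td>東京都江東区北砂**********</td><td>S13015</td><td>20150722</td><td>B-20100609-B</td></tr>\n<tr><th>12</th><td>CS011215000048</td><td>芦田 沙耶</td><td>1</td><td>女性</td><td>1992-02-01</td><td>27</td><td>223-0062</td><td>神奈川県横浜市港北区日吉本町**********</td><td>S14011</td><td>20150228</td><td>C-20100421-9</td></tr>\n<tr><th>15</th><td>CS029415000023</td><td>梅田 里穂</td><td>1</td><td>女性</td><td>1976-01-17</td><td>43</td><td>279-0043</td><td>千葉県浦安市富士見**********</td><td>S12029</td><td>20150610</td><td>D-20100918-E</td></tr>\n<tr><th>21</th><td>CS035415000029</td><td>寺沢 真希</td><td>9</td><td>不明</td><td>1977-09-27</td><td>41</td><td>158-0096</td><td>東京都世田谷区玉川台**********</td><td>S13035</td><td>20141220</td><td>F-20101029-F</td></tr>\n<tr><th>32</th><td>CS031415000106</td><td>宇野 由美子</td><td>1</td><td>女性</td><td>1970-02-26</td><td>49</td><td>151-0053</td><td>東京都渋谷区代々木**********</td><td>S13031</td><td>20150201</td><td>F-20100511-E</td></tr>\n<tr><th>33</th><td>CS029215000025</td><td>石倉 美帆</td><td>1</td><td>女性</td><td>1993-09-28</td><td>25</td><td>279-0022</td><td>千葉県浦安市今川**********</td><td>S12029</td><td>20150708</td><td>B-20100820-C</td></tr>\n<tr><th>40</th><td>CS033605000005</td><td>猪股 雄太</td><td>0</td><td>男性</td><td>1955-12-05</td><td>63</td><td>246-0031</td><td>神奈川県横浜市瀬谷区瀬谷**********</td><td>S14033</td><td>20150425</td><td>F-20100917-E</td></tr>\n<tr><th>44</th><td>CS033415000229</td><td>板垣 菜々美</td><td>1</td><td>女性</td><td>1977-11-07</td><td>41</td><td>246-0021</td><td>神奈川県横浜市瀬谷区二ツ橋町**********</td><td>S14033</td><td>20150712</td><td>F-20100326-E</td></tr>\n<tr><th>53</th><td>CS008415000145</td><td>黒谷 麻緒</td><td>1</td><td>女性</td><td>1977-06-27</td><td>41</td><td>157-0067</td><td>東京都世田谷区喜多見**********</td><td>S13008</td><td>20150829</td><td>F-20100622-F</td></tr>\n</tbody>\n</table>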
\n
", "text/plain": " customer_id customer_name gender_cd gender birth_day age postal_cd \\\n2 CS031415000172 宇多田 貴美子 1 女性 1976-10-04 42 151-0053 \n6 CS015414000103 奥野 陽子 1 女性 1977-08-09 41 136-0073 \n12 CS011215000048 芦田 沙耶 1 女性 1992-02-01 27 223-0062 \n15 CS029415000023 梅田 里穂 1 女性 1976-01-17 43 279-0043 \n21 CS035415000029 寺沢 真希 9 不明 1977-09-27 41 158-0096 \n32 CS031415000106 宇野 由美子 1 女性 1970-02-26 49 151-0053 \n33 CS029215000025 石倉 美帆 1 女性 1993-09-28 25 279-0022 \n40 CS033605000005 猪股 雄太 0 男性 1955-12-05 63 246-0031 \n44 CS033415000229 板垣 菜々美 1 女性 1977-11-07 41 246-0021 \n53 CS008415000145 黒谷 麻緒 1 女性 1977-06-27 41 157-0067 \n\n address application_store_cd application_date \\\n2 東京都渋谷区代々木********** S13031 20150529 \n6 東京都江東区北砂********** S13015 20150722 \n12 神奈川県横浜市港北区日吉本町********** S14011 20150228 \n15 千葉県浦安市富士見********** S12029 20150610 \n21 東京都世田谷区玉川台********** S13035 20141220 \n32 東京都渋谷区代々木********** S13031 20150201 \n33 千葉県浦安市今川********** S12029 20150708 \n40 神奈川県横浜市瀬谷区瀬谷********** S14033 20150425 \n44 神奈川県横浜市瀬谷区二ツ橋町********** S14033 20150712 \n53 東京都世田谷区喜多見********** S13008 20150829 \n\n status_cd \n2 D-20100325-C \n6 B-20100609-B \n12 C-20100421-9 \n15 D-20100918-E \n21 F-20101029-F \n32 F-20100511-E \n33 B-20100820-C \n40 F-20100917-E \n44 F-20100326-E \n53 F-20100622-F " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-014: From the customer DataFrame (df_customer), extract all columns for rows whose status code (status_cd) ends with a digit from 1 to 9, and display only 10 rows." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer.query(\"status_cd.str.contains('[1-9]$', regex=True)\", engine='python').head(10)", "execution_count": 15, "outputs": [ { "output_type": "execute_result", "execution_count": 15, "data": { "text/html": "
", "text/plain": " customer_id customer_name gender_cd gender birth_day age postal_cd \\\n4 CS001215000145 田崎 美紀 1 女性 1995-03-29 24 144-0055 \n9 CS033513000180 安斎 遥 1 女性 1962-07-11 56 241-0823 \n12 CS011215000048 芦田 沙耶 1 女性 1992-02-01 27 223-0062 \n14 CS040412000191 川井 郁恵 1 女性 1977-01-05 42 226-0021 \n16 CS009315000023 皆川 文世 1 女性 1980-04-15 38 154-0012 \n22 CS015315000033 福士 璃奈子 1 女性 1983-03-17 36 135-0043 \n23 CS023513000066 神戸 そら 1 女性 1961-12-17 57 210-0005 \n24 CS035513000134 市川 美帆 1 女性 1960-03-27 59 156-0053 \n27 CS001515000263 高松 夏空 1 女性 1962-11-09 56 144-0051 \n28 CS040314000027 鶴田 きみまろ 9 不明 1986-03-26 33 226-0027 \n\n address application_store_cd application_date \\\n4 東京都大田区仲六郷********** S13001 20170605 \n9 神奈川県横浜市旭区善部町********** S14033 20150728 \n12 神奈川県横浜市港北区日吉本町********** S14011 20150228 \n14 神奈川県横浜市緑区北八朔町********** S14040 20151101 \n16 東京都世田谷区駒沢********** S13009 20150319 \n22 東京都江東区塩浜********** S13015 20141024 \n23 神奈川県川崎市川崎区東田町********** S14023 20150915 \n24 東京都世田谷区桜********** S13035 20150227 \n27 東京都大田区西蒲田********** S13001 20160812 \n28 神奈川県横浜市緑区長津田********** S14040 20150122 \n\n status_cd \n4 6-20090929-2 \n9 6-20080506-5 \n12 C-20100421-9 \n14 1-20091025-4 \n16 5-20080322-1 \n22 4-20080219-3 \n23 5-20100524-9 \n24 8-20100711-9 \n27 1-20100804-1 \n28 2-20080426-4 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-015: 顧客データフレーム(df_customer)から、ステータスコード(status_cd)の先頭がアルファベットのA〜Fで始まり、末尾が数字の1〜9で終わるデータを全項目抽出し、10件だけ表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer.query(\"status_cd.str.contains('^[A-F].*[1-9]$', regex=True)\", engine='python').head(10)", "execution_count": 16, "outputs": [ { "output_type": "execute_result", "execution_count": 16, "data": { "text/html": "
", "text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n12 CS011215000048 芦田 沙耶 1 女性 1992-02-01 27 \n68 CS022513000105 島村 貴美子 1 女性 1962-03-12 57 \n71 CS001515000096 水野 陽子 9 不明 1960-11-29 58 \n122 CS013615000053 西脇 季衣 1 女性 1953-10-18 65 \n144 CS020412000161 小宮 薫 1 女性 1974-05-21 44 \n178 CS001215000097 竹中 あさみ 1 女性 1990-07-25 28 \n252 CS035212000007 内村 恵梨香 1 女性 1990-12-04 28 \n259 CS002515000386 野田 コウ 1 女性 1963-05-30 55 \n293 CS001615000372 稲垣 寿々花 1 女性 1956-10-29 62 \n297 CS032512000121 松井 知世 1 女性 1962-09-04 56 \n\n postal_cd address application_store_cd \\\n12 223-0062 神奈川県横浜市港北区日吉本町********** S14011 \n68 249-0002 神奈川県逗子市山の根********** S14022 \n71 144-0053 東京都大田区蒲田本町********** S13001 \n122 261-0026 千葉県千葉市美浜区幕張西********** S12013 \n144 174-0042 東京都板橋区東坂下********** S13020 \n178 146-0095 東京都大田区多摩川********** S13001 \n252 152-0023 東京都目黒区八雲********** S13035 \n259 185-0013 東京都国分寺市西恋ケ窪********** S13002 \n293 144-0035 東京都大田区南蒲田********** S13001 \n297 210-0011 神奈川県川崎市川崎区富士見********** S13032 \n\n application_date status_cd \n12 20150228 C-20100421-9 \n68 20150320 A-20091115-7 \n71 20150614 A-20100724-7 \n122 20150128 B-20100329-6 \n144 20150822 B-20081021-3 \n178 20170315 A-20100211-2 \n252 20151013 B-20101018-6 \n259 20160410 C-20100127-8 \n293 20170403 A-20100104-1 \n297 20150727 A-20100103-5 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-016: 店舗データフレーム(df_store)から、電話番号(tel_no)が3桁-3桁-4桁のデータを全項目表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_store.query(\"tel_no.str.contains('[0-9]{3}-[0-9]{3}-[0-9]{4}', regex=True)\", engine='python')", "execution_count": 17, "outputs": [ { "output_type": "execute_result", "execution_count": 17, "data": { "text/html": "
", "text/plain": " store_cd store_name prefecture_cd prefecture address \\\n0 S12014 千草台店 12 千葉県 千葉県千葉市稲毛区千草台一丁目 \n1 S13002 国分寺店 13 東京都 東京都国分寺市本多二丁目 \n2 S14010 菊名店 14 神奈川県 神奈川県横浜市港北区菊名一丁目 \n3 S14033 阿久和店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n4 S14036 相模原中央店 14 神奈川県 神奈川県相模原市中央二丁目 \n7 S14040 長津田店 14 神奈川県 神奈川県横浜市緑区長津田みなみ台五丁目 \n9 S14050 阿久和西店 14 神奈川県 神奈川県横浜市瀬谷区阿久和西一丁目 \n11 S13052 森野店 13 東京都 東京都町田市森野三丁目 \n12 S14028 二ツ橋店 14 神奈川県 神奈川県横浜市瀬谷区二ツ橋町 \n16 S14012 本牧和田店 14 神奈川県 神奈川県横浜市中区本牧和田 \n18 S14046 北山田店 14 神奈川県 神奈川県横浜市都筑区北山田一丁目 \n19 S14022 逗子店 14 神奈川県 神奈川県逗子市逗子一丁目 \n20 S14011 日吉本町店 14 神奈川県 神奈川県横浜市港北区日吉本町四丁目 \n21 S13016 小金井店 13 東京都 東京都小金井市本町一丁目 \n22 S14034 川崎野川店 14 神奈川県 神奈川県川崎市宮前区野川 \n26 S14048 中川中央店 14 神奈川県 神奈川県横浜市都筑区中川中央二丁目 \n27 S12007 佐倉店 12 千葉県 千葉県佐倉市上志津 \n28 S14026 辻堂西海岸店 14 神奈川県 神奈川県藤沢市辻堂西海岸二丁目 \n29 S13041 八王子店 13 東京都 東京都八王子市大塚 \n31 S14049 川崎大師店 14 神奈川県 神奈川県川崎市川崎区中瀬三丁目 \n32 S14023 川崎店 14 神奈川県 神奈川県川崎市川崎区本町二丁目 \n33 S13018 清瀬店 13 東京都 東京都清瀬市松山一丁目 \n35 S14027 南藤沢店 14 神奈川県 神奈川県藤沢市南藤沢 \n36 S14021 伊勢原店 14 神奈川県 神奈川県伊勢原市伊勢原四丁目 \n37 S14047 相模原店 14 神奈川県 神奈川県相模原市千代田六丁目 \n38 S12013 習志野店 12 千葉県 千葉県習志野市芝園一丁目 \n40 S14042 新山下店 14 神奈川県 神奈川県横浜市中区新山下二丁目 \n42 S12030 八幡店 12 千葉県 千葉県市川市八幡三丁目 \n44 S14025 大和店 14 神奈川県 神奈川県大和市下和田 \n45 S14045 厚木店 14 神奈川県 神奈川県厚木市中町二丁目 \n47 S12029 東野店 12 千葉県 千葉県浦安市東野一丁目 \n49 S12053 高洲店 12 千葉県 千葉県浦安市高洲五丁目 \n51 S14024 三田店 14 神奈川県 神奈川県川崎市多摩区三田四丁目 \n52 S14006 葛が谷店 14 神奈川県 神奈川県横浜市都筑区葛が谷 \n\n address_kana tel_no longitude latitude \\\n0 チバケンチバシイナゲクチグサダイイッチョウメ 043-123-4003 140.1180 35.63559 \n1 トウキョウトコクブンジシホンダニチョウメ 042-123-4008 139.4802 35.70566 \n2 カナガワケンヨコハマシコウホククキクナイッチョウメ 045-123-4032 139.6326 35.50049 \n3 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4043 139.4961 35.45918 \n4 カナガワケンサガミハラシチュウオウニチョウメ 042-123-4045 139.3716 35.57327 \n7 カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ 045-123-4046 139.4994 35.52398 \n9 カナガワケンヨコハマシセヤクアクワニシイッチョウメ 045-123-4053 139.4961 35.45918 \n11 トウキョウトマチダシモリノサンチョウメ 042-123-4030 139.4383 35.55293 \n12 カナガワケンヨコハマシセヤクフタツバシチョウ 045-123-4042 139.4963 35.46304 \n16 カナガワケンヨコハマシナカクホンモクワダ 
045-123-4034 139.6582 35.42156 \n18 カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ 045-123-4049 139.5916 35.56189 \n19 カナガワケンズシシズシイッチョウメ 046-123-4036 139.5789 35.29642 \n20 カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ 045-123-4033 139.6316 35.54655 \n21 トウキョウトコガネイシホンチョウイッチョウメ 042-123-4015 139.5094 35.70018 \n22 カナガワケンカワサキシミヤマエクノガワ 044-123-4044 139.5998 35.57693 \n26 カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ 045-123-4051 139.5758 35.54912 \n27 チバケンサクラシカミシヅ 043-123-4001 140.1452 35.71872 \n28 カナガワケンフジサワシツジドウニシカイガンニチョウメ 046-123-4040 139.4466 35.32464 \n29 トウキョウトハチオウジシオオツカ 042-123-4026 139.4235 35.63787 \n31 カナガワケンカワサキシカワサキクナカゼサンチョウメ 044-123-4052 139.7327 35.53759 \n32 カナガワケンカワサキシカワサキクホンチョウニチョウメ 044-123-4037 139.7028 35.53599 \n33 トウキョウトキヨセシマツヤマイッチョウメ 042-123-4017 139.5178 35.76885 \n35 カナガワケンフジサワシミナミフジサワ 046-123-4041 139.4896 35.33762 \n36 カナガワケンイセハラシイセハラヨンチョウメ 046-123-4035 139.3129 35.40169 \n37 カナガワケンサガミハラシチヨダロクチョウメ 042-123-4050 139.3748 35.55959 \n38 チバケンナラシノシシバゾノイッチョウメ 047-123-4002 140.0220 35.66122 \n40 カナガワケンヨコハマシナカクシンヤマシタニチョウメ 045-123-4047 139.6593 35.43894 \n42 チバケンイチカワシヤワタサンチョウメ 047-123-4005 139.9240 35.72318 \n44 カナガワケンヤマトシシモワダ 046-123-4039 139.4680 35.43414 \n45 カナガワケンアツギシナカチョウニチョウメ 046-123-4048 139.3651 35.44182 \n47 チバケンウラヤスシヒガシノイッチョウメ 047-123-4004 139.8968 35.65086 \n49 チバケンウラヤスシタカスゴチョウメ 047-123-4006 139.9176 35.63755 \n51 カナガワケンカワサキシタマクミタヨンチョウメ 044-123-4038 139.5424 35.60770 \n52 カナガワケンヨコハマシツヅキククズガヤ 045-123-4031 139.5633 35.53573 \n\n floor_area \n0 1698.0 \n1 1735.0 \n2 1732.0 \n3 1495.0 \n4 1679.0 \n7 1548.0 \n9 1830.0 \n11 1087.0 \n12 1574.0 \n16 1341.0 \n18 831.0 \n19 1838.0 \n20 890.0 \n21 1399.0 \n22 1318.0 \n26 1657.0 \n27 1895.0 \n28 1732.0 \n29 810.0 \n31 962.0 \n32 1804.0 \n33 1220.0 \n35 1521.0 \n36 962.0 \n37 1047.0 \n38 808.0 \n40 1044.0 \n42 1162.0 \n44 1011.0 \n45 980.0 \n47 1101.0 \n49 1555.0 \n51 972.0 \n52 1886.0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-017: Sort the customer data frame (df_customer) by date of birth (birth_day) from oldest to youngest, and display all columns for the first 10 rows." }, { 
"metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer.sort_values('birth_day', ascending=True).head(10)", "execution_count": 18, "outputs": [ { "output_type": "execute_result", "execution_count": 18, "data": { "text/html": "
", "text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n18817 CS003813000014 村山 菜々美 1 女性 1928-11-26 90 \n12328 CS026813000004 吉村 朝陽 1 女性 1928-12-14 90 \n15682 CS018811000003 熊沢 美里 1 女性 1929-01-07 90 \n15302 CS027803000004 内村 拓郎 0 男性 1929-01-12 90 \n1681 CS013801000003 天野 拓郎 0 男性 1929-01-15 90 \n7511 CS001814000022 鶴田 里穂 1 女性 1929-01-28 90 \n2378 CS016815000002 山元 美紀 1 女性 1929-02-22 90 \n4680 CS009815000003 中田 里穂 1 女性 1929-04-08 89 \n16070 CS005813000015 金谷 恵梨香 1 女性 1929-04-09 89 \n6305 CS012813000013 宇野 南朋 1 女性 1929-04-09 89 \n\n postal_cd address application_store_cd \\\n18817 182-0007 東京都調布市菊野台********** S13003 \n12328 251-0043 神奈川県藤沢市辻堂元町********** S14026 \n15682 204-0004 東京都清瀬市野塩********** S13018 \n15302 251-0031 神奈川県藤沢市鵠沼藤が谷********** S14027 \n1681 274-0824 千葉県船橋市前原東********** S12013 \n7511 144-0045 東京都大田区南六郷********** S13001 \n2378 184-0005 東京都小金井市桜町********** S13016 \n4680 154-0014 東京都世田谷区新町********** S13009 \n16070 165-0032 東京都中野区鷺宮********** S13005 \n6305 231-0806 神奈川県横浜市中区本牧町********** S14012 \n\n application_date status_cd \n18817 20160214 0-00000000-0 \n12328 20150723 0-00000000-0 \n15682 20150403 0-00000000-0 \n15302 20151227 0-00000000-0 \n1681 20160120 0-00000000-0 \n7511 20161012 A-20090415-7 \n2378 20150629 C-20090923-C \n4680 20150421 D-20091021-E \n16070 20150506 0-00000000-0 \n6305 20150712 0-00000000-0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-18: 顧客データフレーム(df_customer)を生年月日(birth_day)で若い順にソートし、先頭10件を全項目表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer.sort_values('birth_day', ascending=False).head(10)", "execution_count": 19, "outputs": [ { "output_type": "execute_result", "execution_count": 19, "data": { "text/html": "
", "text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n15639 CS035114000004 大村 美里 1 女性 2007-11-25 11 \n7468 CS022103000002 福山 はじめ 9 不明 2007-10-02 11 \n10745 CS002113000009 柴田 真悠子 1 女性 2007-09-17 11 \n19811 CS004115000014 松井 京子 1 女性 2007-08-09 11 \n7039 CS002114000010 山内 遥 1 女性 2007-06-03 11 \n3670 CS025115000002 小柳 夏希 1 女性 2007-04-18 11 \n12493 CS002113000025 広末 まなみ 1 女性 2007-03-30 12 \n15977 CS033112000003 長野 美紀 1 女性 2007-03-22 12 \n5716 CS007115000006 福岡 瞬 1 女性 2007-03-10 12 \n15097 CS014113000008 矢口 莉緒 1 女性 2007-03-05 12 \n\n postal_cd address application_store_cd \\\n15639 156-0053 東京都世田谷区桜********** S13035 \n7468 249-0006 神奈川県逗子市逗子********** S14022 \n10745 184-0014 東京都小金井市貫井南町********** S13002 \n19811 165-0031 東京都中野区上鷺宮********** S13004 \n7039 184-0015 東京都小金井市貫井北町********** S13002 \n3670 245-0018 神奈川県横浜市泉区上飯田町********** S14025 \n12493 184-0015 東京都小金井市貫井北町********** S13002 \n15977 245-0051 神奈川県横浜市戸塚区名瀬町********** S14033 \n5716 285-0845 千葉県佐倉市西志津********** S12007 \n15097 260-0041 千葉県千葉市中央区東千葉********** S12014 \n\n application_date status_cd \n15639 20150619 6-20091205-6 \n7468 20160909 0-00000000-0 \n10745 20160304 0-00000000-0 \n19811 20161120 1-20081231-1 \n7039 20160920 6-20100510-1 \n3670 20160116 D-20100913-D \n12493 20171030 0-00000000-0 \n15977 20150606 0-00000000-0 \n5716 20151118 F-20101016-F \n15097 20150622 3-20091108-6 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-19: レシート明細データフレーム(df_receipt)に対し、1件あたりの売上金額(amount)が高い順にランクを付与し、先頭10件を抽出せよ。項目は顧客ID(customer_id)、売上金額(amount)、付与したランクを表示させること。なお、売上金額(amount)が等しい場合は同一順位を付与するものとする。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.concat([df_receipt[['customer_id', 'amount']] \n ,df_receipt['amount'].rank(method='min', ascending=False)], axis=1)\ndf_tmp.columns = ['customer_id', 'amount', 'ranking']\ndf_tmp.sort_values('ranking', ascending=True).head(10)", "execution_count": 20, "outputs": [ { "output_type": 
"execute_result", "execution_count": 20, "data": { "text/html": "
", "text/plain": " customer_id amount ranking\n1202 CS011415000006 10925 1.0\n62317 ZZ000000000000 6800 2.0\n54095 CS028605000002 5780 3.0\n4632 CS015515000034 5480 4.0\n72747 ZZ000000000000 5480 4.0\n10320 ZZ000000000000 5480 4.0\n97294 CS021515000089 5440 7.0\n28304 ZZ000000000000 5440 7.0\n92246 CS009415000038 5280 9.0\n68553 CS040415000200 5280 9.0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-020: レシート明細データフレーム(df_receipt)に対し、1件あたりの売上金額(amount)が高い順にランクを付与し、先頭10件を抽出せよ。項目は顧客ID(customer_id)、売上金額(amount)、付与したランクを表示させること。なお、売上金額(amount)が等しい場合でも別順位を付与すること。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.concat([df_receipt[['customer_id', 'amount']] \n ,df_receipt['amount'].rank(method='first', ascending=False)], axis=1)\ndf_tmp.columns = ['customer_id', 'amount', 'ranking']\ndf_tmp.sort_values('ranking', ascending=True).head(10)", "execution_count": 21, "outputs": [ { "output_type": "execute_result", "execution_count": 21, "data": { "text/html": "
", "text/plain": " customer_id amount ranking\n1202 CS011415000006 10925 1.0\n62317 ZZ000000000000 6800 2.0\n54095 CS028605000002 5780 3.0\n4632 CS015515000034 5480 4.0\n10320 ZZ000000000000 5480 5.0\n72747 ZZ000000000000 5480 6.0\n28304 ZZ000000000000 5440 7.0\n97294 CS021515000089 5440 8.0\n596 CS015515000083 5280 9.0\n11275 CS017414000114 5280 10.0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-021: レシート明細データフレーム(df_receipt)に対し、件数をカウントせよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "len(df_receipt)", "execution_count": 22, "outputs": [ { "output_type": "execute_result", "execution_count": 22, "data": { "text/plain": "104681" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-022: レシート明細データフレーム(df_receipt)の顧客ID(customer_id)に対し、ユニーク件数をカウントせよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "len(df_receipt['customer_id'].unique())", "execution_count": 23, "outputs": [ { "output_type": "execute_result", "execution_count": 23, "data": { "text/plain": "8307" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-023: レシート明細データフレーム(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)と売上数量(quantity)を合計せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('store_cd').agg({'amount':'sum', 'quantity':'sum'}).reset_index()", "execution_count": 24, "outputs": [ { "output_type": "execute_result", "execution_count": 24, "data": { "text/html": "
", "text/plain": " store_cd amount quantity\n0 S12007 638761 2099\n1 S12013 787513 2425\n2 S12014 725167 2358\n3 S12029 794741 2555\n4 S12030 684402 2403\n5 S13001 811936 2347\n6 S13002 727821 2340\n7 S13003 764294 2197\n8 S13004 779373 2390\n9 S13005 629876 2004\n10 S13008 809288 2491\n11 S13009 808870 2486\n12 S13015 780873 2248\n13 S13016 793773 2432\n14 S13017 748221 2376\n15 S13018 790535 2562\n16 S13019 827833 2541\n17 S13020 796383 2383\n18 S13031 705968 2336\n19 S13032 790501 2491\n20 S13035 715869 2219\n21 S13037 693087 2344\n22 S13038 708884 2337\n23 S13039 611888 1981\n24 S13041 728266 2233\n25 S13043 587895 1881\n26 S13044 520764 1729\n27 S13051 107452 354\n28 S13052 100314 250\n29 S14006 712839 2284\n30 S14010 790361 2290\n31 S14011 805724 2434\n32 S14012 720600 2412\n33 S14021 699511 2231\n34 S14022 651328 2047\n35 S14023 727630 2258\n36 S14024 736323 2417\n37 S14025 755581 2394\n38 S14026 824537 2503\n39 S14027 714550 2303\n40 S14028 786145 2458\n41 S14033 725318 2282\n42 S14034 653681 2024\n43 S14036 203694 635\n44 S14040 701858 2233\n45 S14042 534689 1935\n46 S14045 458484 1398\n47 S14046 412646 1354\n48 S14047 338329 1041\n49 S14048 234276 769\n50 S14049 230808 788\n51 S14050 167090 580" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-024: レシート明細データフレーム(df_receipt)に対し、顧客ID(customer_id)ごとに最も新しい売上日(sales_ymd)を求め、10件表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('customer_id').sales_ymd.max().reset_index().head(10)", "execution_count": 25, "outputs": [ { "output_type": "execute_result", "execution_count": 25, "data": { "text/html": "
", "text/plain": " customer_id sales_ymd\n0 CS001113000004 20190308\n1 CS001114000005 20190731\n2 CS001115000010 20190405\n3 CS001205000004 20190625\n4 CS001205000006 20190224\n5 CS001211000025 20190322\n6 CS001212000027 20170127\n7 CS001212000031 20180906\n8 CS001212000046 20170811\n9 CS001212000070 20191018" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-025: レシート明細データフレーム(df_receipt)に対し、顧客ID(customer_id)ごとに最も古い売上日(sales_ymd)を求め、10件表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('customer_id').agg({'sales_ymd':'min'}).head(10)", "execution_count": 26, "outputs": [ { "output_type": "execute_result", "execution_count": 26, "data": { "text/html": "
", "text/plain": " sales_ymd\ncustomer_id \nCS001113000004 20190308\nCS001114000005 20180503\nCS001115000010 20171228\nCS001205000004 20170914\nCS001205000006 20180207\nCS001211000025 20190322\nCS001212000027 20170127\nCS001212000031 20180906\nCS001212000046 20170811\nCS001212000070 20191018" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-026: レシート明細データフレーム(df_receipt)に対し、顧客ID(customer_id)ごとに最も新しい売上日(sales_ymd)と古い売上日を求め、両者が異なるデータを10件表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()\ndf_tmp.columns = [\"_\".join(pair) for pair in df_tmp.columns]\ndf_tmp.query('sales_ymd_max != sales_ymd_min').head(10)", "execution_count": 27, "outputs": [ { "output_type": "execute_result", "execution_count": 27, "data": { "text/html": "
", "text/plain": " customer_id_ sales_ymd_max sales_ymd_min\n1 CS001114000005 20190731 20180503\n2 CS001115000010 20190405 20171228\n3 CS001205000004 20190625 20170914\n4 CS001205000006 20190224 20180207\n13 CS001214000009 20190902 20170306\n14 CS001214000017 20191006 20180828\n16 CS001214000048 20190929 20171109\n17 CS001214000052 20190617 20180208\n20 CS001215000005 20181021 20170206\n21 CS001215000040 20171022 20170214" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-027: レシート明細データフレーム(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の平均を計算し、降順でTOP5を表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('store_cd').agg({'amount':'mean'}).reset_index().sort_values('amount', ascending=False).head(5)", "execution_count": 28, "outputs": [ { "output_type": "execute_result", "execution_count": 28, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
store_cdamount
28S13052402.867470
12S13015351.111960
7S13003350.915519
30S14010348.791262
5S13001348.470386
\n
", "text/plain": " store_cd amount\n28 S13052 402.867470\n12 S13015 351.111960\n7 S13003 350.915519\n30 S14010 348.791262\n5 S13001 348.470386" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-028: レシート明細データフレーム(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の中央値を計算し、降順でTOP5を表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('store_cd').agg({'amount':'median'}).reset_index().sort_values('amount', ascending=False).head(5)", "execution_count": 29, "outputs": [ { "output_type": "execute_result", "execution_count": 29, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
store_cdamount
28S13052190
30S14010188
51S14050185
44S14040180
7S13003180
\n
", "text/plain": " store_cd amount\n28 S13052 190\n30 S14010 188\n51 S14050 185\n44 S14040 180\n7 S13003 180" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-029: レシート明細データフレーム(df_receipt)に対し、店舗コード(store_cd)ごとに商品コード(product_cd)の最頻値を求めよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('store_cd').product_cd.apply(lambda x: x.mode()).reset_index()", "execution_count": 30, "outputs": [ { "output_type": "execute_result", "execution_count": 30, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
store_cdlevel_1product_cd
0S120070P060303001
1S120130P060303001
2S120140P060303001
3S120290P060303001
4S120300P060303001
5S130010P060303001
6S130020P060303001
7S130030P071401001
8S130040P060303001
9S130050P040503001
10S130080P060303001
11S130090P060303001
12S130150P071401001
13S130160P071102001
14S130170P060101002
15S130180P071401001
16S130190P071401001
17S130200P071401001
18S130310P060303001
19S130320P060303001
20S130350P040503001
21S130370P060303001
22S130380P060303001
23S130390P071401001
24S130410P071401001
25S130430P060303001
26S130440P060303001
27S130510P050102001
28S130511P071003001
29S130512P080804001
30S130520P050101001
31S140060P060303001
32S140100P060303001
33S140110P060101001
34S140120P060303001
35S140210P060101001
36S140220P060303001
37S140230P071401001
38S140240P060303001
39S140250P060303001
40S140260P071401001
41S140270P060303001
42S140280P060303001
43S140330P071401001
44S140340P060303001
45S140360P040503001
46S140361P060101001
47S140400P060303001
48S140420P050101001
49S140450P060303001
50S140460P060303001
51S140470P060303001
52S140480P050101001
53S140490P060303001
54S140500P060303001
\n
", "text/plain": " store_cd level_1 product_cd\n0 S12007 0 P060303001\n1 S12013 0 P060303001\n2 S12014 0 P060303001\n3 S12029 0 P060303001\n4 S12030 0 P060303001\n5 S13001 0 P060303001\n6 S13002 0 P060303001\n7 S13003 0 P071401001\n8 S13004 0 P060303001\n9 S13005 0 P040503001\n10 S13008 0 P060303001\n11 S13009 0 P060303001\n12 S13015 0 P071401001\n13 S13016 0 P071102001\n14 S13017 0 P060101002\n15 S13018 0 P071401001\n16 S13019 0 P071401001\n17 S13020 0 P071401001\n18 S13031 0 P060303001\n19 S13032 0 P060303001\n20 S13035 0 P040503001\n21 S13037 0 P060303001\n22 S13038 0 P060303001\n23 S13039 0 P071401001\n24 S13041 0 P071401001\n25 S13043 0 P060303001\n26 S13044 0 P060303001\n27 S13051 0 P050102001\n28 S13051 1 P071003001\n29 S13051 2 P080804001\n30 S13052 0 P050101001\n31 S14006 0 P060303001\n32 S14010 0 P060303001\n33 S14011 0 P060101001\n34 S14012 0 P060303001\n35 S14021 0 P060101001\n36 S14022 0 P060303001\n37 S14023 0 P071401001\n38 S14024 0 P060303001\n39 S14025 0 P060303001\n40 S14026 0 P071401001\n41 S14027 0 P060303001\n42 S14028 0 P060303001\n43 S14033 0 P071401001\n44 S14034 0 P060303001\n45 S14036 0 P040503001\n46 S14036 1 P060101001\n47 S14040 0 P060303001\n48 S14042 0 P050101001\n49 S14045 0 P060303001\n50 S14046 0 P060303001\n51 S14047 0 P060303001\n52 S14048 0 P050101001\n53 S14049 0 P060303001\n54 S14050 0 P060303001" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-030: レシート明細データフレーム(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の標本分散を計算し、降順でTOP5を表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('store_cd').amount.var(ddof=0).reset_index().sort_values('amount', ascending=False).head(5)", "execution_count": 31, "outputs": [ { "output_type": "execute_result", "execution_count": 31, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
store_cdamount
28S13052440088.701311
31S14011306314.558164
42S14034296920.081011
5S13001295431.993329
12S13015295294.361116
\n
", "text/plain": " store_cd amount\n28 S13052 440088.701311\n31 S14011 306314.558164\n42 S14034 296920.081011\n5 S13001 295431.993329\n12 S13015 295294.361116" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-031: レシート明細データフレーム(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の標本標準偏差を計算し、降順でTOP5を表示せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('store_cd').amount.std(ddof=0).reset_index().sort_values('amount', ascending=False).head(5)", "execution_count": 32, "outputs": [ { "output_type": "execute_result", "execution_count": 32, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
store_cdamount
28S13052663.391816
31S14011553.456916
42S14034544.903736
5S13001543.536561
12S13015543.409938
\n
", "text/plain": " store_cd amount\n28 S13052 663.391816\n31 S14011 553.456916\n42 S14034 544.903736\n5 S13001 543.536561\n12 S13015 543.409938" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-032: レシート明細データフレーム(df_receipt)の売上金額(amount)について、25%刻みでパーセンタイル値を求めよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# コード例1\nnp.percentile(df_receipt['amount'], q=[25, 50, 75,100])", "execution_count": 33, "outputs": [ { "output_type": "execute_result", "execution_count": 33, "data": { "text/plain": "array([ 102., 170., 288., 10925.])" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# コード例2\ndf_receipt.amount.quantile(q=np.arange(5)/4)", "execution_count": 34, "outputs": [ { "output_type": "execute_result", "execution_count": 34, "data": { "text/plain": "0.00 10.0\n0.25 102.0\n0.50 170.0\n0.75 288.0\n1.00 10925.0\nName: amount, dtype: float64" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-033: レシート明細データフレーム(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の平均を計算し、330以上のものを抽出せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_receipt.groupby('store_cd').amount.mean().reset_index().query('amount >= 330')", "execution_count": 35, "outputs": [ { "output_type": "execute_result", "execution_count": 35, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
store_cdamount
1S12013330.194130
5S13001348.470386
7S13003350.915519
8S13004330.943949
12S13015351.111960
16S13019330.208616
17S13020337.879932
28S13052402.867470
30S14010348.791262
31S14011335.718333
38S14026332.340588
46S14045330.082073
48S14047330.077073
\n
", "text/plain": " store_cd amount\n1 S12013 330.194130\n5 S13001 348.470386\n7 S13003 350.915519\n8 S13004 330.943949\n12 S13015 351.111960\n16 S13019 330.208616\n17 S13020 337.879932\n28 S13052 402.867470\n30 S14010 348.791262\n31 S14011 335.718333\n38 S14026 332.340588\n46 S14045 330.082073\n48 S14047 330.077073" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-034: レシート明細データフレーム(df_receipt)に対し、顧客ID(customer_id)ごとに売上金額(amount)を合計して全顧客の平均を求めよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。\n" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# queryを使わない書き方\ndf_receipt[~df_receipt['customer_id'].str.startswith(\"Z\")].groupby('customer_id').amount.sum().mean()", "execution_count": 36, "outputs": [ { "output_type": "execute_result", "execution_count": 36, "data": { "text/plain": "2547.742234529256" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# queryを使う書き方\ndf_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python').groupby('customer_id').amount.sum().mean()", "execution_count": 37, "outputs": [ { "output_type": "execute_result", "execution_count": 37, "data": { "text/plain": "2547.742234529256" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-035: レシート明細データフレーム(df_receipt)に対し、顧客ID(customer_id)ごとに売上金額(amount)を合計して全顧客の平均を求め、平均以上に買い物をしている顧客を抽出せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。なお、データは10件だけ表示させれば良い。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "amount_mean = df_receipt[~df_receipt['customer_id'].str.startswith(\"Z\")].groupby('customer_id').amount.sum().mean()\ndf_amount_sum = df_receipt.groupby('customer_id').amount.sum().reset_index()\ndf_amount_sum[df_amount_sum['amount'] >= amount_mean].head(10)", "execution_count": 38, "outputs": [ { "output_type": "execute_result", "execution_count": 38, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
customer_idamount
2CS0011150000103044
4CS0012050000063337
13CS0012140000094685
14CS0012140000174132
17CS0012140000525639
21CS0012150000403496
30CS0013040000063726
32CS0013050000053485
33CS0013050000114370
53CS0013150001803300
\n
", "text/plain": " customer_id amount\n2 CS001115000010 3044\n4 CS001205000006 3337\n13 CS001214000009 4685\n14 CS001214000017 4132\n17 CS001214000052 5639\n21 CS001215000040 3496\n30 CS001304000006 3726\n32 CS001305000005 3485\n33 CS001305000011 4370\n53 CS001315000180 3300" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-036: レシート明細データフレーム(df_receipt)と店舗データフレーム(df_store)を内部結合し、レシート明細データフレームの全項目と店舗データフレームの店舗名(store_name)を10件表示させよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.merge(df_receipt, df_store[['store_cd','store_name']], how='inner', on='store_cd').head(10)", "execution_count": 39, "outputs": [ { "output_type": "execute_result", "execution_count": 39, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
sales_ymdsales_epochstore_cdreceipt_noreceipt_sub_nocustomer_idproduct_cdquantityamountstore_name
0201811031257206400S140061121CS006214000001P0703050121158葛が谷店
1201811161258329600S140061122ZZ000000000000P080401001148葛が谷店
2201701181200614400S1400611621CS006815000006P0504060351220葛が谷店
3201905241274659200S1400611921CS006514000034P060104003180葛が谷店
4201904191271635200S140061122ZZ000000000000P0605010021148葛が谷店
5201811191258588800S1400611522ZZ000000000000P050701001188葛が谷店
6201712111228953600S1400611322CS006515000175P090903001180葛が谷店
7201910211287619200S1400611122CS006415000221P0406020011405葛が谷店
8201707101215648000S1400611322CS006411000036P0903010511330葛が谷店
9201908051280966400S140061121CS006211000012P0501040011115葛が谷店
\n
", "text/plain": " sales_ymd sales_epoch store_cd receipt_no receipt_sub_no \\\n0 20181103 1257206400 S14006 112 1 \n1 20181116 1258329600 S14006 112 2 \n2 20170118 1200614400 S14006 1162 1 \n3 20190524 1274659200 S14006 1192 1 \n4 20190419 1271635200 S14006 112 2 \n5 20181119 1258588800 S14006 1152 2 \n6 20171211 1228953600 S14006 1132 2 \n7 20191021 1287619200 S14006 1112 2 \n8 20170710 1215648000 S14006 1132 2 \n9 20190805 1280966400 S14006 112 1 \n\n customer_id product_cd quantity amount store_name \n0 CS006214000001 P070305012 1 158 葛が谷店 \n1 ZZ000000000000 P080401001 1 48 葛が谷店 \n2 CS006815000006 P050406035 1 220 葛が谷店 \n3 CS006514000034 P060104003 1 80 葛が谷店 \n4 ZZ000000000000 P060501002 1 148 葛が谷店 \n5 ZZ000000000000 P050701001 1 88 葛が谷店 \n6 CS006515000175 P090903001 1 80 葛が谷店 \n7 CS006415000221 P040602001 1 405 葛が谷店 \n8 CS006411000036 P090301051 1 330 葛が谷店 \n9 CS006211000012 P050104001 1 115 葛が谷店 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-037: 商品データフレーム(df_product)とカテゴリデータフレーム(df_category)を内部結合し、商品データフレームの全項目とカテゴリデータフレームの小区分名(category_small_name)を10件表示させよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.merge(df_product\n , df_category[['category_major_cd', 'category_medium_cd','category_small_cd','category_small_name']]\n , how='inner', on=['category_major_cd', 'category_medium_cd','category_small_cd']).head(10)", "execution_count": 40, "outputs": [ { "output_type": "execute_result", "execution_count": 40, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
product_cdcategory_major_cdcategory_medium_cdcategory_small_cdunit_priceunit_costcategory_small_name
0P040101001440140101198.0149.0弁当類
1P040101002440140101218.0164.0弁当類
2P040101003440140101230.0173.0弁当類
3P040101004440140101248.0186.0弁当類
4P040101005440140101268.0201.0弁当類
5P040101006440140101298.0224.0弁当類
6P040101007440140101338.0254.0弁当類
7P040101008440140101420.0315.0弁当類
8P040101009440140101498.0374.0弁当類
9P040101010440140101580.0435.0弁当類
\n
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost category_small_name \n0 198.0 149.0 弁当類 \n1 218.0 164.0 弁当類 \n2 230.0 173.0 弁当類 \n3 248.0 186.0 弁当類 \n4 268.0 201.0 弁当類 \n5 298.0 224.0 弁当類 \n6 338.0 254.0 弁当類 \n7 420.0 315.0 弁当類 \n8 498.0 374.0 弁当類 \n9 580.0 435.0 弁当類 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-038: 顧客データフレーム(df_customer)とレシート明細データフレーム(df_receipt)から、各顧客ごとの売上金額合計を求めよ。ただし、買い物の実績がない顧客については売上金額を0として表示させること。また、顧客は性別コード(gender_cd)が女性(1)であるものを対象とし、非会員(顧客IDが'Z'から始まるもの)は除外すること。なお、結果は10件だけ表示させれば良い。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_amount_sum = df_receipt.groupby('customer_id').amount.sum().reset_index()\ndf_tmp = df_customer.query('gender_cd == \"1\" and not customer_id.str.startswith(\"Z\")', engine='python')\npd.merge(df_tmp['customer_id'], df_amount_sum, how='left', on='customer_id').fillna(0).head(10)", "execution_count": 41, "outputs": [ { "output_type": "execute_result", "execution_count": 41, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
customer_idamount
0CS0213130001140.0
1CS0314150001725088.0
2CS0288110000010.0
3CS001215000145875.0
4CS0154140001033122.0
5CS033513000180868.0
6CS0356140000140.0
7CS0112150000483444.0
8CS0094130000790.0
9CS040412000191210.0
\n
", "text/plain": " customer_id amount\n0 CS021313000114 0.0\n1 CS031415000172 5088.0\n2 CS028811000001 0.0\n3 CS001215000145 875.0\n4 CS015414000103 3122.0\n5 CS033513000180 868.0\n6 CS035614000014 0.0\n7 CS011215000048 3444.0\n8 CS009413000079 0.0\n9 CS040412000191 210.0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-039: レシート明細データフレーム(df_receipt)から売上日数の多い顧客の上位20件と、売上金額合計の多い顧客の上位20件を抽出し、完全外部結合せよ。ただし、非会員(顧客IDが'Z'から始まるもの)は除外すること。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_sum = df_receipt.groupby('customer_id').amount.sum().reset_index()\ndf_sum = df_sum.query('not customer_id.str.startswith(\"Z\")', engine='python')\ndf_sum = df_sum.sort_values('amount', ascending=False).head(20)\n\ndf_cnt = df_receipt[~df_receipt.duplicated(subset=['customer_id', 'sales_ymd'])]\ndf_cnt = df_cnt.query('not customer_id.str.startswith(\"Z\")', engine='python')\ndf_cnt = df_cnt.groupby('customer_id').sales_ymd.count().reset_index()\ndf_cnt = df_cnt.sort_values('sales_ymd', ascending=False).head(20)\n\npd.merge(df_sum, df_cnt, how='outer', on='customer_id')", "execution_count": 42, "outputs": [ { "output_type": "execute_result", "execution_count": 42, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
customer_idamountsales_ymd
0CS01741500009723086.020.0
1CS01541500018520153.022.0
2CS03141400005119202.019.0
3CS02841500000719127.021.0
4CS00160500000918925.0NaN
5CS01021400001018585.022.0
6CS01641500014118372.020.0
7CS00651500002318372.0NaN
8CS01141400010618338.0NaN
9CS03841500010417847.0NaN
10CS03541400002417615.0NaN
11CS02151500008917580.0NaN
12CS03241400007216563.0NaN
13CS01641500010116348.0NaN
14CS01141500000616094.0NaN
15CS03441500004716083.0NaN
16CS00751400009415735.0NaN
17CS00941400005915492.0NaN
18CS03041500003415468.0NaN
19CS01551500003415300.0NaN
20CS040214000008NaN23.0
21CS010214000002NaN21.0
22CS014214000023NaN19.0
23CS022515000226NaN19.0
24CS021515000172NaN19.0
25CS039414000052NaN19.0
26CS021514000045NaN19.0
27CS022515000028NaN18.0
28CS030214000008NaN18.0
29CS021515000056NaN18.0
30CS014415000077NaN18.0
31CS021515000211NaN18.0
32CS032415000209NaN18.0
33CS031414000073NaN18.0
\n
", "text/plain": " customer_id amount sales_ymd\n0 CS017415000097 23086.0 20.0\n1 CS015415000185 20153.0 22.0\n2 CS031414000051 19202.0 19.0\n3 CS028415000007 19127.0 21.0\n4 CS001605000009 18925.0 NaN\n5 CS010214000010 18585.0 22.0\n6 CS016415000141 18372.0 20.0\n7 CS006515000023 18372.0 NaN\n8 CS011414000106 18338.0 NaN\n9 CS038415000104 17847.0 NaN\n10 CS035414000024 17615.0 NaN\n11 CS021515000089 17580.0 NaN\n12 CS032414000072 16563.0 NaN\n13 CS016415000101 16348.0 NaN\n14 CS011415000006 16094.0 NaN\n15 CS034415000047 16083.0 NaN\n16 CS007514000094 15735.0 NaN\n17 CS009414000059 15492.0 NaN\n18 CS030415000034 15468.0 NaN\n19 CS015515000034 15300.0 NaN\n20 CS040214000008 NaN 23.0\n21 CS010214000002 NaN 21.0\n22 CS014214000023 NaN 19.0\n23 CS022515000226 NaN 19.0\n24 CS021515000172 NaN 19.0\n25 CS039414000052 NaN 19.0\n26 CS021514000045 NaN 19.0\n27 CS022515000028 NaN 18.0\n28 CS030214000008 NaN 18.0\n29 CS021515000056 NaN 18.0\n30 CS014415000077 NaN 18.0\n31 CS021515000211 NaN 18.0\n32 CS032415000209 NaN 18.0\n33 CS031414000073 NaN 18.0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-040: 全ての店舗と全ての商品を組み合わせると何件のデータとなるか調査したい。店舗(df_store)と商品(df_product)を直積した件数を計算せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_store_tmp = df_store.copy()\ndf_product_tmp = df_product.copy()\n\ndf_store_tmp['key'] = 0\ndf_product_tmp['key'] = 0\nlen(pd.merge(df_store_tmp, df_product_tmp, how='outer', on='key'))", "execution_count": 43, "outputs": [ { "output_type": "execute_result", "execution_count": 43, "data": { "text/plain": "531590" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-041: レシート明細データフレーム(df_receipt)の売上金額(amount)を日付(sales_ymd)ごとに集計し、前日からの売上金額増減を計算せよ。なお、計算結果は10件表示すればよい。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_sales_amount_by_date = df_receipt[['sales_ymd', 
'amount']].groupby('sales_ymd').sum().reset_index()\ndf_sales_amount_by_date = pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift()], axis=1)\ndf_sales_amount_by_date.columns = ['sales_ymd','amount','lag_ymd','lag_amount']\ndf_sales_amount_by_date['diff_amount'] = df_sales_amount_by_date['amount'] - df_sales_amount_by_date['lag_amount']\ndf_sales_amount_by_date.head(10)", "execution_count": 44, "outputs": [ { "output_type": "execute_result", "execution_count": 44, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
sales_ymdamountlag_ymdlag_amountdiff_amount
02017010133723NaNNaNNaN
1201701022416520170101.033723.0-9558.0
2201701032750320170102.024165.03338.0
3201701043616520170103.027503.08662.0
4201701053783020170104.036165.01665.0
5201701063238720170105.037830.0-5443.0
6201701072341520170106.032387.0-8972.0
7201701082473720170107.023415.01322.0
8201701092671820170108.024737.01981.0
9201701102014320170109.026718.0-6575.0
\n
", "text/plain": " sales_ymd amount lag_ymd lag_amount diff_amount\n0 20170101 33723 NaN NaN NaN\n1 20170102 24165 20170101.0 33723.0 -9558.0\n2 20170103 27503 20170102.0 24165.0 3338.0\n3 20170104 36165 20170103.0 27503.0 8662.0\n4 20170105 37830 20170104.0 36165.0 1665.0\n5 20170106 32387 20170105.0 37830.0 -5443.0\n6 20170107 23415 20170106.0 32387.0 -8972.0\n7 20170108 24737 20170107.0 23415.0 1322.0\n8 20170109 26718 20170108.0 24737.0 1981.0\n9 20170110 20143 20170109.0 26718.0 -6575.0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-042: レシート明細データフレーム(df_receipt)の売上金額(amount)を日付(sales_ymd)ごとに集計し、各日付のデータに対し、1日前、2日前、3日前のデータを結合せよ。結果は10件表示すればよい。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# コード例1:縦持ちケース\ndf_sales_amount_by_date = df_receipt[['sales_ymd', 'amount']].groupby('sales_ymd').sum().reset_index()\nfor i in range(1, 4):\n if i == 1:\n df_lag = pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift(i)],axis=1)\n else:\n df_lag = df_lag.append(pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift(i)],axis=1))\ndf_lag.columns = ['sales_ymd', 'amount', 'lag_ymd', 'lag_amount']\ndf_lag.dropna().sort_values(['sales_ymd','lag_ymd']).head(10)", "execution_count": 45, "outputs": [ { "output_type": "execute_result", "execution_count": 45, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
sales_ymdamountlag_ymdlag_amount
1201701022416520170101.033723.0
2201701032750320170101.033723.0
2201701032750320170102.024165.0
3201701043616520170101.033723.0
3201701043616520170102.024165.0
3201701043616520170103.027503.0
4201701053783020170102.024165.0
4201701053783020170103.027503.0
4201701053783020170104.036165.0
5201701063238720170103.027503.0
\n
", "text/plain": " sales_ymd amount lag_ymd lag_amount\n1 20170102 24165 20170101.0 33723.0\n2 20170103 27503 20170101.0 33723.0\n2 20170103 27503 20170102.0 24165.0\n3 20170104 36165 20170101.0 33723.0\n3 20170104 36165 20170102.0 24165.0\n3 20170104 36165 20170103.0 27503.0\n4 20170105 37830 20170102.0 24165.0\n4 20170105 37830 20170103.0 27503.0\n4 20170105 37830 20170104.0 36165.0\n5 20170106 32387 20170103.0 27503.0" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# コード例2:横持ちケース\ndf_sales_amount_by_date = df_receipt[['sales_ymd', 'amount']].groupby('sales_ymd').sum().reset_index()\nfor i in range(1, 4):\n if i == 1:\n df_lag = pd.concat([df_sales_amount_by_date, df_sales_amount_by_date.shift(i)],axis=1)\n else:\n df_lag = pd.concat([df_lag, df_sales_amount_by_date.shift(i)],axis=1)\ndf_lag.columns = ['sales_ymd', 'amount', 'lag_ymd_1', 'lag_amount_1', 'lag_ymd_2', 'lag_amount_2', 'lag_ymd_3', 'lag_amount_3']\ndf_lag.dropna().sort_values(['sales_ymd']).head(10)", "execution_count": 46, "outputs": [ { "output_type": "execute_result", "execution_count": 46, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
sales_ymdamountlag_ymd_1lag_amount_1lag_ymd_2lag_amount_2lag_ymd_3lag_amount_3
3201701043616520170103.027503.020170102.024165.020170101.033723.0
4201701053783020170104.036165.020170103.027503.020170102.024165.0
5201701063238720170105.037830.020170104.036165.020170103.027503.0
6201701072341520170106.032387.020170105.037830.020170104.036165.0
7201701082473720170107.023415.020170106.032387.020170105.037830.0
8201701092671820170108.024737.020170107.023415.020170106.032387.0
9201701102014320170109.026718.020170108.024737.020170107.023415.0
10201701112428720170110.020143.020170109.026718.020170108.024737.0
11201701122352620170111.024287.020170110.020143.020170109.026718.0
12201701132800420170112.023526.020170111.024287.020170110.020143.0
\n
", "text/plain": " sales_ymd amount lag_ymd_1 lag_amount_1 lag_ymd_2 lag_amount_2 \\\n3 20170104 36165 20170103.0 27503.0 20170102.0 24165.0 \n4 20170105 37830 20170104.0 36165.0 20170103.0 27503.0 \n5 20170106 32387 20170105.0 37830.0 20170104.0 36165.0 \n6 20170107 23415 20170106.0 32387.0 20170105.0 37830.0 \n7 20170108 24737 20170107.0 23415.0 20170106.0 32387.0 \n8 20170109 26718 20170108.0 24737.0 20170107.0 23415.0 \n9 20170110 20143 20170109.0 26718.0 20170108.0 24737.0 \n10 20170111 24287 20170110.0 20143.0 20170109.0 26718.0 \n11 20170112 23526 20170111.0 24287.0 20170110.0 20143.0 \n12 20170113 28004 20170112.0 23526.0 20170111.0 24287.0 \n\n lag_ymd_3 lag_amount_3 \n3 20170101.0 33723.0 \n4 20170102.0 24165.0 \n5 20170103.0 27503.0 \n6 20170104.0 36165.0 \n7 20170105.0 37830.0 \n8 20170106.0 32387.0 \n9 20170107.0 23415.0 \n10 20170108.0 24737.0 \n11 20170109.0 26718.0 \n12 20170110.0 20143.0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-043: レシート明細データフレーム(df_receipt)と顧客データフレーム(df_customer)を結合し、性別(gender)と年代(ageから計算)ごとに売上金額(amount)を合計した売上サマリデータフレーム(df_sales_summary)を作成せよ。性別は0が男性、1が女性、9が不明を表すものとする。\n>\n> ただし、項目構成は年代、女性の売上金額、男性の売上金額、性別不明の売上金額の4項目とすること(縦に年代、横に性別のクロス集計)。また、年代は10歳ごとの階級とすること。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.merge(df_receipt, df_customer, how ='inner', on=\"customer_id\")\ndf_tmp['era'] = df_tmp['age'].apply(lambda x: math.floor(x / 10) * 10)\ndf_sales_summary = pd.pivot_table(df_tmp, index='era', columns='gender_cd', values='amount', aggfunc='sum').reset_index()\ndf_sales_summary.columns = ['era', 'male', 'female', 'unknown']\ndf_sales_summary", "execution_count": 47, "outputs": [ { "output_type": "execute_result", "execution_count": 47, "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
eramalefemaleunknown
0101591.0149836.04317.0
12072940.01363724.044328.0
230177322.0693047.050441.0
34019355.09320791.0483512.0
45054320.06685192.0342923.0
560272469.0987741.071418.0
67013435.029764.02427.0
78046360.0262923.05111.0
890NaN6260.0NaN
\n
", "text/plain": " era male female unknown\n0 10 1591.0 149836.0 4317.0\n1 20 72940.0 1363724.0 44328.0\n2 30 177322.0 693047.0 50441.0\n3 40 19355.0 9320791.0 483512.0\n4 50 54320.0 6685192.0 342923.0\n5 60 272469.0 987741.0 71418.0\n6 70 13435.0 29764.0 2427.0\n7 80 46360.0 262923.0 5111.0\n8 90 NaN 6260.0 NaN" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-044: 前設問で作成した売上サマリデータフレーム(df_sales_summary)は性別の売上を横持ちさせたものであった。このデータフレームから性別を縦持ちさせ、年代、性別コード、売上金額の3項目に変換せよ。ただし、性別コードは男性を'00'、女性を'01'、不明を'99'とする。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_sales_summary = df_sales_summary.set_index('era'). \\\n stack().reset_index().replace({'female':'01',\n 'male':'00',\n 'unknown':'99'}).rename(columns={'level_1':'gender_cd', 0: 'amount'})", "execution_count": 48, "outputs": [] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_sales_summary", "execution_count": 49, "outputs": [ { "output_type": "execute_result", "execution_count": 49, "data": { "text/html": "
\n
", "text/plain": " era gender_cd amount\n0 10 00 1591.0\n1 10 01 149836.0\n2 10 99 4317.0\n3 20 00 72940.0\n4 20 01 1363724.0\n5 20 99 44328.0\n6 30 00 177322.0\n7 30 01 693047.0\n8 30 99 50441.0\n9 40 00 19355.0\n10 40 01 9320791.0\n11 40 99 483512.0\n12 50 00 54320.0\n13 50 01 6685192.0\n14 50 99 342923.0\n15 60 00 272469.0\n16 60 01 987741.0\n17 60 99 71418.0\n18 70 00 13435.0\n19 70 01 29764.0\n20 70 99 2427.0\n21 80 00 46360.0\n22 80 01 262923.0\n23 80 99 5111.0\n24 90 01 6260.0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-045: The birth date (birth_day) in the customer dataframe (df_customer) is stored as a date type. Convert it to a string in YYYYMMDD format and extract it together with the customer ID (customer_id). Extracting 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.concat([df_customer['customer_id'],\n pd.to_datetime(df_customer['birth_day']).dt.strftime('%Y%m%d')],\n axis = 1).head(10)", "execution_count": 50, "outputs": [ { "output_type": "execute_result", "execution_count": 50, "data": { "text/html": "
\n
", "text/plain": " customer_id birth_day\n0 CS021313000114 19810429\n1 CS037613000071 19520401\n2 CS031415000172 19761004\n3 CS028811000001 19330327\n4 CS001215000145 19950329\n5 CS020401000016 19740915\n6 CS015414000103 19770809\n7 CS029403000008 19730817\n8 CS015804000004 19310502\n9 CS033513000180 19620711" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-046: The application date (application_date) in the customer dataframe (df_customer) is stored as a string in YYYYMMDD format. Convert it to a date type (date or datetime) and extract it together with the customer ID (customer_id). Extracting 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# application_date is read from the CSV as an integer, so it is cast to str first;\n# otherwise to_datetime would interpret it as nanoseconds since the UNIX epoch\npd.concat([df_customer['customer_id'],\n pd.to_datetime(df_customer['application_date'].astype('str'))], axis=1).head(10)", "execution_count": 51, "outputs": [ { "output_type": "execute_result", "execution_count": 51, "data": { "text/html": "
\n
", "text/plain": " customer_id application_date\n0 CS021313000114 2015-09-05\n1 CS037613000071 2015-04-14\n2 CS031415000172 2015-05-29\n3 CS028811000001 2016-01-15\n4 CS001215000145 2017-06-05\n5 CS020401000016 2015-02-25\n6 CS015414000103 2015-07-22\n7 CS029403000008 2015-05-15\n8 CS015804000004 2015-06-07\n9 CS033513000180 2015-07-28" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-047: The sales date (sales_ymd) in the receipt detail dataframe (df_receipt) is stored as a number in YYYYMMDD format. Convert it to a date type (date or datetime) and extract it together with the receipt number (receipt_no) and receipt sub-number (receipt_sub_no). Extracting 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_ymd'].astype('str'))],axis=1).head(10)", "execution_count": 52, "outputs": [ { "output_type": "execute_result", "execution_count": 52, "data": { "text/html": "
\n
", "text/plain": " receipt_no receipt_sub_no sales_ymd\n0 112 1 2018-11-03\n1 1132 2 2018-11-18\n2 1102 1 2017-07-12\n3 1132 1 2019-02-05\n4 1102 2 2018-08-21\n5 1112 1 2019-06-05\n6 1102 2 2018-12-05\n7 1102 1 2019-09-22\n8 1112 2 2017-05-04\n9 1102 1 2019-10-10" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-048: The sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) are stored as numeric UNIX seconds. Convert them to a date type (date or datetime) and extract them together with the receipt number (receipt_no) and receipt sub-number (receipt_sub_no). Extracting 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_epoch'], unit='s')],axis=1).head(10)", "execution_count": 53, "outputs": [ { "output_type": "execute_result", "execution_count": 53, "data": { "text/html": "
\n
", "text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 2009-11-03\n1 1132 2 2009-11-18\n2 1102 1 2008-07-12\n3 1132 1 2010-02-05\n4 1102 2 2009-08-21\n5 1112 1 2010-06-05\n6 1102 2 2009-12-05\n7 1102 1 2010-09-22\n8 1112 2 2008-05-04\n9 1102 1 2010-10-10" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-049: Convert the sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) to a date type (timestamp), take only the \"year\", and extract it together with the receipt number (receipt_no) and receipt sub-number (receipt_sub_no). Extracting 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_epoch'], unit='s').dt.year],axis=1).head(10)", "execution_count": 54, "outputs": [ { "output_type": "execute_result", "execution_count": 54, "data": { "text/html": "
\n
", "text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 2009\n1 1132 2 2009\n2 1102 1 2008\n3 1132 1 2010\n4 1102 2 2009\n5 1112 1 2010\n6 1102 2 2009\n7 1102 1 2010\n8 1112 2 2008\n9 1102 1 2010" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-050: Convert the sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) to a date type (timestamp), take only the \"month\", and extract it together with the receipt number (receipt_no) and receipt sub-number (receipt_sub_no). The \"month\" must be zero-padded to 2 digits. Extracting 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# dt.month also returns the month, but strftime is used here to get a zero-padded 2-digit value\npd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_epoch'], unit='s').dt.strftime('%m')],axis=1).head(10)", "execution_count": 55, "outputs": [ { "output_type": "execute_result", "execution_count": 55, "data": { "text/html": "
\n
", "text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 11\n1 1132 2 11\n2 1102 1 07\n3 1132 1 02\n4 1102 2 08\n5 1112 1 06\n6 1102 2 12\n7 1102 1 09\n8 1112 2 05\n9 1102 1 10" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-051: Convert the sales epoch seconds (sales_epoch) in the receipt detail dataframe (df_receipt) to a date type (timestamp), take only the \"day\", and extract it together with the receipt number (receipt_no) and receipt sub-number (receipt_sub_no). The \"day\" must be zero-padded to 2 digits. Extracting 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# dt.day also returns the day, but strftime is used here to get a zero-padded 2-digit value\npd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],\n pd.to_datetime(df_receipt['sales_epoch'], unit='s').dt.strftime('%d')],axis=1).head(10)", "execution_count": 56, "outputs": [ { "output_type": "execute_result", "execution_count": 56, "data": { "text/html": "
\n
", "text/plain": " receipt_no receipt_sub_no sales_epoch\n0 112 1 03\n1 1132 2 18\n2 1102 1 12\n3 1132 1 05\n4 1102 2 21\n5 1112 1 05\n6 1102 2 05\n7 1102 1 22\n8 1112 2 04\n9 1102 1 10" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-052: Sum the sales amount (amount) in the receipt detail dataframe (df_receipt) for each customer ID (customer_id), then binarize the total: 0 if it is 2,000 yen or less, 1 if it exceeds 2,000 yen. Display 10 rows together with the customer ID and the total sales amount. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python')\ndf_sales_amount = df_sales_amount[['customer_id', 'amount']].groupby('customer_id').sum().reset_index()\ndf_sales_amount['sales_flg'] = df_sales_amount['amount'].apply(lambda x: 1 if x > 2000 else 0)\ndf_sales_amount.head(10)", "execution_count": 57, "outputs": [ { "output_type": "execute_result", "execution_count": 57, "data": { "text/html": "
\n
", "text/plain": " customer_id amount sales_flg\n0 CS001113000004 1298 0\n1 CS001114000005 626 0\n2 CS001115000010 3044 1\n3 CS001205000004 1988 0\n4 CS001205000006 3337 1\n5 CS001211000025 456 0\n6 CS001212000027 448 0\n7 CS001212000031 296 0\n8 CS001212000046 228 0\n9 CS001212000070 456 0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-053: Binarize the postal code (postal_cd) in the customer dataframe (df_customer): 1 for Tokyo (first three digits from 100 to 209) and 0 otherwise. Then join the result with the receipt detail dataframe (df_receipt) and count, for each of the two values, the number of customers with a purchase record over the entire period." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_customer[['customer_id', 'postal_cd']].copy()\ndf_tmp['postal_flg'] = df_tmp['postal_cd'].apply(lambda x: 1 if 100 <= int(x[0:3]) <= 209 else 0)\n\npd.merge(df_tmp, df_receipt, how='inner', on='customer_id'). \\\n groupby('postal_flg').agg({'customer_id':'nunique'})\n\n", "execution_count": 58, "outputs": [ { "output_type": "execute_result", "execution_count": 58, "data": { "text/html": "
\n
", "text/plain": " customer_id\npostal_flg \n0 3906\n1 4400" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-054: Every address (address) in the customer dataframe (df_customer) is in Saitama, Chiba, Tokyo, or Kanagawa prefecture. Create a code value for each prefecture and extract it together with the customer ID and address. Use 11 for Saitama (埼玉県), 12 for Chiba (千葉県), 13 for Tokyo (東京都), and 14 for Kanagawa (神奈川県). Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.concat([df_customer[['customer_id', 'address']], df_customer['address'].str[0:3].map({'埼玉県': '11',\n '千葉県':'12', \n '東京都':'13', \n '神奈川':'14'})], axis=1).head(10)", "execution_count": 59, "outputs": [ { "output_type": "execute_result", "execution_count": 59, "data": { "text/html": "
\n
", "text/plain": " customer_id address address\n0 CS021313000114 神奈川県伊勢原市粟窪********** 14\n1 CS037613000071 東京都江東区南砂********** 13\n2 CS031415000172 東京都渋谷区代々木********** 13\n3 CS028811000001 神奈川県横浜市泉区和泉町********** 14\n4 CS001215000145 東京都大田区仲六郷********** 13\n5 CS020401000016 東京都板橋区若木********** 13\n6 CS015414000103 東京都江東区北砂********** 13\n7 CS029403000008 千葉県浦安市海楽********** 12\n8 CS015804000004 東京都江東区北砂********** 13\n9 CS033513000180 神奈川県横浜市旭区善部町********** 14" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-055: Sum the sales amount (amount) in the receipt detail dataframe (df_receipt) for each customer ID (customer_id) and find the quartiles of the totals. Then assign a category value to each customer's total according to the criteria below, and display it together with the customer ID and the total sales amount. The category values are 1 to 4, in the order listed. Displaying 10 rows is sufficient.\n>\n> - From the minimum up to (but excluding) the first quartile\n> - From the first quartile up to (but excluding) the second quartile\n> - From the second quartile up to (but excluding) the third quartile\n> - The third quartile or above" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Code example 1\ndf_sales_amount = df_receipt[['customer_id', 'amount']].groupby('customer_id').sum().reset_index()\npct25 = np.quantile(df_sales_amount['amount'], 0.25)\npct50 = np.quantile(df_sales_amount['amount'], 0.5)\npct75 = np.quantile(df_sales_amount['amount'], 0.75)\n\ndef pct_group(x):\n    if x < pct25:\n        return 1\n    elif pct25 <= x < pct50:\n        return 2\n    elif pct50 <= x < pct75:\n        return 3\n    elif pct75 <= x:\n        return 4\n\ndf_sales_amount['pct_group'] = df_sales_amount['amount'].apply(lambda x: pct_group(x))\ndf_sales_amount.head(10)", "execution_count": 60, "outputs": [ { "output_type": "execute_result", "execution_count": 60, "data": { "text/html": "
\n
", "text/plain": " customer_id amount pct_group\n0 CS001113000004 1298 2\n1 CS001114000005 626 2\n2 CS001115000010 3044 3\n3 CS001205000004 1988 3\n4 CS001205000006 3337 3\n5 CS001211000025 456 1\n6 CS001212000027 448 1\n7 CS001212000031 296 1\n8 CS001212000046 228 1\n9 CS001212000070 456 1" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# For verification\nprint('pct25:', pct25)\nprint('pct50:', pct50)\nprint('pct75:', pct75)", "execution_count": 61, "outputs": [ { "output_type": "stream", "text": "pct25: 548.5\npct50: 1478.0\npct75: 3651.0\n", "name": "stdout" } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Code example 2\ndf_temp = df_receipt.groupby('customer_id')[['amount']].sum()\ndf_temp['quantile'], bins = pd.qcut(df_receipt.groupby('customer_id')['amount'].sum(), 4, retbins=True)\ndisplay(df_temp.head())\nprint('quantiles:', bins)", "execution_count": 62, "outputs": [ { "output_type": "display_data", "data": { "text/html": "
\n
", "text/plain": " amount quantile\ncustomer_id \nCS001113000004 1298 (548.5, 1478.0]\nCS001114000005 626 (548.5, 1478.0]\nCS001115000010 3044 (1478.0, 3651.0]\nCS001205000004 1988 (1478.0, 3651.0]\nCS001205000006 3337 (1478.0, 3651.0]" }, "metadata": {} }, { "output_type": "stream", "text": "quantiles: [7.0000000e+01 5.4850000e+02 1.4780000e+03 3.6510000e+03 1.2395003e+07]\n", "name": "stdout" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-056: Compute the era (age bracket in 10-year steps) from the age (age) in the customer dataframe (df_customer) and extract it together with the customer ID (customer_id) and birth date (birth_day). Treat everyone aged 60 or over as being in their 60s. The category names for the era may be chosen freely. Displaying the first 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Code example 1\ndf_customer_era = pd.concat([df_customer[['customer_id', 'birth_day']],\n df_customer['age'].apply(lambda x: min(math.floor(x / 10) * 10, 60))],\n axis=1)\n\ndf_customer_era.head(10)", "execution_count": 63, "outputs": [ { "output_type": "execute_result", "execution_count": 63, "data": { "text/html": "
\n
", "text/plain": " customer_id birth_day age\n0 CS021313000114 1981-04-29 30\n1 CS037613000071 1952-04-01 60\n2 CS031415000172 1976-10-04 40\n3 CS028811000001 1933-03-27 60\n4 CS001215000145 1995-03-29 20\n5 CS020401000016 1974-09-15 40\n6 CS015414000103 1977-08-09 40\n7 CS029403000008 1973-08-17 40\n8 CS015804000004 1931-05-02 60\n9 CS033513000180 1962-07-11 50" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Code example 2\ndf_customer['age_group'] = pd.cut(df_customer['age'], bins=[0, 10, 20, 30, 40, 50, 60, np.inf], right=False)\ndf_customer[['customer_id', 'birth_day', 'age_group']].head(10)", "execution_count": 64, "outputs": [ { "output_type": "execute_result", "execution_count": 64, "data": { "text/html": "
\n
", "text/plain": " customer_id birth_day age_group\n0 CS021313000114 1981-04-29 [30.0, 40.0)\n1 CS037613000071 1952-04-01 [60.0, inf)\n2 CS031415000172 1976-10-04 [40.0, 50.0)\n3 CS028811000001 1933-03-27 [60.0, inf)\n4 CS001215000145 1995-03-29 [20.0, 30.0)\n5 CS020401000016 1974-09-15 [40.0, 50.0)\n6 CS015414000103 1977-08-09 [40.0, 50.0)\n7 CS029403000008 1973-08-17 [40.0, 50.0)\n8 CS015804000004 1931-05-02 [60.0, inf)\n9 CS033513000180 1962-07-11 [50.0, 60.0)" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-057: Combine the result of the previous question with gender (gender) to create new categorical data representing each gender × era combination. The values of the combined category may be chosen freely. Displaying the first 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer_era['era_gender'] = df_customer['gender_cd'].astype('str') + df_customer_era['age'].astype('str')\ndf_customer_era.head(10)", "execution_count": 65, "outputs": [ { "output_type": "execute_result", "execution_count": 65, "data": { "text/html": "
\n
", "text/plain": " customer_id birth_day age era_gender\n0 CS021313000114 1981-04-29 30 130\n1 CS037613000071 1952-04-01 60 960\n2 CS031415000172 1976-10-04 40 140\n3 CS028811000001 1933-03-27 60 160\n4 CS001215000145 1995-03-29 20 120\n5 CS020401000016 1974-09-15 40 040\n6 CS015414000103 1977-08-09 40 140\n7 CS029403000008 1973-08-17 40 040\n8 CS015804000004 1931-05-02 60 060\n9 CS033513000180 1962-07-11 50 150" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-058: Convert the gender code (gender_cd) in the customer dataframe (df_customer) into dummy variables and extract them together with the customer ID (customer_id). Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "pd.get_dummies(df_customer[['customer_id', 'gender_cd']], columns=['gender_cd']).head(10)", "execution_count": 66, "outputs": [ { "output_type": "execute_result", "execution_count": 66, "data": { "text/html": "
\n
", "text/plain": " customer_id gender_cd_0 gender_cd_1 gender_cd_9\n0 CS021313000114 0 1 0\n1 CS037613000071 0 0 1\n2 CS031415000172 0 1 0\n3 CS028811000001 0 1 0\n4 CS001215000145 0 1 0\n5 CS020401000016 1 0 0\n6 CS015414000103 0 1 0\n7 CS029403000008 1 0 0\n8 CS015804000004 1 0 0\n9 CS033513000180 0 1 0" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-059: Sum the sales amount (amount) in the receipt detail dataframe (df_receipt) for each customer ID (customer_id), standardize the totals to mean 0 and standard deviation 1, and display them together with the customer ID and the total sales amount. Either the unbiased standard deviation or the uncorrected sample standard deviation may be used for standardization. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation. Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# sklearn's preprocessing.scale is used, so the standard deviation is the uncorrected one (ddof=0)\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_ss'] = preprocessing.scale(df_sales_amount['amount'])\ndf_sales_amount.head(10)", "execution_count": 67, "outputs": [ { "output_type": "execute_result", "execution_count": 67, "data": { "text/html": "
\n
", "text/plain": " customer_id amount amount_ss\n0 CS001113000004 1298 -0.459378\n1 CS001114000005 626 -0.706390\n2 CS001115000010 3044 0.182413\n3 CS001205000004 1988 -0.205749\n4 CS001205000006 3337 0.290114\n5 CS001211000025 456 -0.768879\n6 CS001212000027 448 -0.771819\n7 CS001212000031 296 -0.827691\n8 CS001212000046 228 -0.852686\n9 CS001212000070 456 -0.768879" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-060: Sum the sales amount (amount) in the receipt detail dataframe (df_receipt) for each customer ID (customer_id), normalize the totals to a minimum of 0 and a maximum of 1, and display them together with the customer ID and the total sales amount. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation. Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# sklearn's preprocessing.minmax_scale rescales the totals to the range [0, 1]\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_mm'] = preprocessing.minmax_scale(df_sales_amount['amount'])\ndf_sales_amount.head(10)", "execution_count": 68, "outputs": [ { "output_type": "execute_result", "execution_count": 68, "data": { "text/html": "
\n
", "text/plain": " customer_id amount amount_mm\n0 CS001113000004 1298 0.053354\n1 CS001114000005 626 0.024157\n2 CS001115000010 3044 0.129214\n3 CS001205000004 1988 0.083333\n4 CS001205000006 3337 0.141945\n5 CS001211000025 456 0.016771\n6 CS001212000027 448 0.016423\n7 CS001212000031 296 0.009819\n8 CS001212000046 228 0.006865\n9 CS001212000070 456 0.016771" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-061: Sum the sales amount (amount) in the receipt detail dataframe (df_receipt) for each customer ID (customer_id), take the common logarithm (base 10) of the totals, and display them together with the customer ID and the total sales amount. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation. Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# 1 is added to the total before taking the logarithm, which keeps log10(0) from occurring\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_log10'] = np.log10(df_sales_amount['amount'] + 1)\ndf_sales_amount.head(10)", "execution_count": 69, "outputs": [ { "output_type": "execute_result", "execution_count": 69, "data": { "text/html": "
\n
", "text/plain": " customer_id amount amount_log10\n0 CS001113000004 1298 3.113609\n1 CS001114000005 626 2.797268\n2 CS001115000010 3044 3.483587\n3 CS001205000004 1988 3.298635\n4 CS001205000006 3337 3.523486\n5 CS001211000025 456 2.659916\n6 CS001212000027 448 2.652246\n7 CS001212000031 296 2.472756\n8 CS001212000046 228 2.359835\n9 CS001212000070 456 2.659916" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-062: Sum the sales amount (amount) in the receipt detail dataframe (df_receipt) for each customer ID (customer_id), take the natural logarithm (base e) of the totals, and display them together with the customer ID and the total sales amount. Customer IDs starting with \"Z\" denote non-members and must be excluded from the calculation. Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# 1 is added to the total before taking the logarithm, which keeps log(0) from occurring\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_loge'] = np.log(df_sales_amount['amount'] + 1)\ndf_sales_amount.head(10)", "execution_count": 70, "outputs": [ { "output_type": "execute_result", "execution_count": 70, "data": { "text/html": "
\n
", "text/plain": " customer_id amount amount_loge\n0 CS001113000004 1298 7.169350\n1 CS001114000005 626 6.440947\n2 CS001115000010 3044 8.021256\n3 CS001205000004 1988 7.595387\n4 CS001205000006 3337 8.113127\n5 CS001211000025 456 6.124683\n6 CS001212000027 448 6.107023\n7 CS001212000031 296 5.693732\n8 CS001212000046 228 5.433722\n9 CS001212000070 456 6.124683" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-063: From the unit price (unit_price) and unit cost (unit_cost) in the product dataframe (df_product), compute the profit for each product. Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_product.copy()\ndf_tmp['unit_profit'] = df_tmp['unit_price'] - df_tmp['unit_cost']\ndf_tmp.head(10)", "execution_count": 71, "outputs": [ { "output_type": "execute_result", "execution_count": 71, "data": { "text/html": "
\n
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost unit_profit \n0 198.0 149.0 49.0 \n1 218.0 164.0 54.0 \n2 230.0 173.0 57.0 \n3 248.0 186.0 62.0 \n4 268.0 201.0 67.0 \n5 298.0 224.0 74.0 \n6 338.0 254.0 84.0 \n7 420.0 315.0 105.0 \n8 498.0 374.0 124.0 \n9 580.0 435.0 145.0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-064: From the unit price (unit_price) and unit cost (unit_cost) in the product dataframe (df_product), compute the overall average of the profit margin across products.\nNote that unit price and unit cost contain NULL values." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_product.copy()\ndf_tmp['unit_profit_rate'] = (df_tmp['unit_price'] - df_tmp['unit_cost']) / df_tmp['unit_price']\ndf_tmp['unit_profit_rate'].mean(skipna=True)", "execution_count": 72, "outputs": [ { "output_type": "execute_result", "execution_count": 72, "data": { "text/plain": "0.24911389885176904" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-065: For each product in the product dataframe (df_product), find a new unit price that gives a profit margin of 30%. Round down fractions of less than 1 yen. Display 10 rows and confirm that the profit margin is approximately 30%. Note that unit price (unit_price) and unit cost (unit_cost) contain NULL values." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# math.floor raises an error on NaN, whereas numpy.floor does not\ndf_tmp = df_product.copy()\ndf_tmp['new_price'] = df_tmp['unit_cost'].apply(lambda x: np.floor(x / 0.7))\ndf_tmp['new_profit_rate'] = (df_tmp['new_price'] - df_tmp['unit_cost']) / df_tmp['new_price']\ndf_tmp.head(10)", "execution_count": 73, "outputs": [ { "output_type": "execute_result", "execution_count": 73, "data": { "text/html": "
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost new_price new_profit_rate \n0 198.0 149.0 212.0 0.297170 \n1 218.0 164.0 234.0 0.299145 \n2 230.0 173.0 247.0 0.299595 \n3 248.0 186.0 265.0 0.298113 \n4 268.0 201.0 287.0 0.299652 \n5 298.0 224.0 320.0 0.300000 \n6 338.0 254.0 362.0 0.298343 \n7 420.0 315.0 450.0 0.300000 \n8 498.0 374.0 534.0 0.299625 \n9 580.0 435.0 621.0 0.299517 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-066: 商品データフレーム(df_product)の各商品について、利益率が30%となる新たな単価を求めよ。今回は、1円未満を四捨五入すること(0.5については偶数方向の丸めで良い)。そして結果を10件表示させ、利益率がおよそ30%付近であることを確認せよ。ただし、単価(unit_price)と原価(unit_cost)にはNULLが存在することに注意せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# 組み込みのroundはNaNでエラーとなるが、numpy.roundはエラーとならない\ndf_tmp = df_product.copy()\ndf_tmp['new_price'] = df_tmp['unit_cost'].apply(lambda x: np.round(x / 0.7))\ndf_tmp['new_profit_rate'] = (df_tmp['new_price'] - df_tmp['unit_cost']) / df_tmp['new_price']\ndf_tmp.head(10)", "execution_count": 74, "outputs": [ { "output_type": "execute_result", "execution_count": 74, "data": { "text/html": "
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost new_price new_profit_rate \n0 198.0 149.0 213.0 0.300469 \n1 218.0 164.0 234.0 0.299145 \n2 230.0 173.0 247.0 0.299595 \n3 248.0 186.0 266.0 0.300752 \n4 268.0 201.0 287.0 0.299652 \n5 298.0 224.0 320.0 0.300000 \n6 338.0 254.0 363.0 0.300275 \n7 420.0 315.0 450.0 0.300000 \n8 498.0 374.0 534.0 0.299625 \n9 580.0 435.0 621.0 0.299517 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-067: 商品データフレーム(df_product)の各商品について、利益率が30%となる新たな単価を求めよ。今回は、1円未満を切り上げること。そして結果を10件表示させ、利益率がおよそ30%付近であることを確認せよ。ただし、単価(unit_price)と原価(unit_cost)にはNULLが存在することに注意せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# math.ceilはNaNでエラーとなるが、numpy.ceilはエラーとならない\ndf_tmp = df_product.copy()\ndf_tmp['new_price'] = df_tmp['unit_cost'].apply(lambda x: np.ceil(x / 0.7))\ndf_tmp['new_profit_rate'] = (df_tmp['new_price'] - df_tmp['unit_cost']) / df_tmp['new_price']\ndf_tmp.head(10)", "execution_count": 75, "outputs": [ { "output_type": "execute_result", "execution_count": 75, "data": { "text/html": "
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost new_price new_profit_rate \n0 198.0 149.0 213.0 0.300469 \n1 218.0 164.0 235.0 0.302128 \n2 230.0 173.0 248.0 0.302419 \n3 248.0 186.0 266.0 0.300752 \n4 268.0 201.0 288.0 0.302083 \n5 298.0 224.0 320.0 0.300000 \n6 338.0 254.0 363.0 0.300275 \n7 420.0 315.0 451.0 0.301552 \n8 498.0 374.0 535.0 0.300935 \n9 580.0 435.0 622.0 0.300643 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-068: 商品データフレーム(df_product)の各商品について、消費税率10%の税込み金額を求めよ。 1円未満の端数は切り捨てとし、結果は10件表示すれば良い。ただし、単価(unit_price)にはNULLが存在することに注意せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# math.floorはNaNでエラーとなるが、numpy.floorはエラーとならない\ndf_tmp = df_product.copy()\ndf_tmp['price_tax'] = df_tmp['unit_price'].apply(lambda x: np.floor(x * 1.1))\ndf_tmp.head(10)", "execution_count": 76, "outputs": [ { "output_type": "execute_result", "execution_count": 76, "data": { "text/html": "
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost price_tax \n0 198.0 149.0 217.0 \n1 218.0 164.0 239.0 \n2 230.0 173.0 253.0 \n3 248.0 186.0 272.0 \n4 268.0 201.0 294.0 \n5 298.0 224.0 327.0 \n6 338.0 254.0 371.0 \n7 420.0 315.0 462.0 \n8 498.0 374.0 547.0 \n9 580.0 435.0 638.0 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-069: レシート明細データフレーム(df_receipt)と商品データフレーム(df_product)を結合し、顧客毎に全商品の売上金額合計と、カテゴリ大区分(category_major_cd)が\"07\"(瓶詰缶詰)の売上金額合計を計算の上、両者の比率を求めよ。抽出対象はカテゴリ大区分\"07\"(瓶詰缶詰)の購入実績がある顧客のみとし、結果は10件表示させればよい。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# コード例1\ndf_tmp_1 = pd.merge(df_receipt, df_product, \n how='inner', on='product_cd').groupby('customer_id').agg({'amount':'sum'}).reset_index()\n\ndf_tmp_2 = pd.merge(df_receipt, df_product.query('category_major_cd == \"07\"'), \n how='inner', on='product_cd').groupby('customer_id').agg({'amount':'sum'}).reset_index()\n\ndf_tmp_3 = pd.merge(df_tmp_1, df_tmp_2, how='inner', on='customer_id')\ndf_tmp_3['rate_07'] = df_tmp_3['amount_y'] / df_tmp_3['amount_x']\ndf_tmp_3.head(10)", "execution_count": 77, "outputs": [ { "output_type": "execute_result", "execution_count": 77, "data": { "text/html": "
", "text/plain": " customer_id amount_x amount_y rate_07\n0 CS001113000004 1298 1298 1.000000\n1 CS001114000005 626 486 0.776358\n2 CS001115000010 3044 2694 0.885020\n3 CS001205000004 1988 346 0.174044\n4 CS001205000006 3337 2004 0.600539\n5 CS001212000027 448 200 0.446429\n6 CS001212000031 296 296 1.000000\n7 CS001212000046 228 108 0.473684\n8 CS001212000070 456 308 0.675439\n9 CS001213000018 243 145 0.596708" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# コード例2\ndf_temp = df_receipt.merge(df_product, how='left', on='product_cd').groupby(['customer_id', 'category_major_cd'])['amount'].sum().unstack()\ndf_temp = df_temp[df_temp[7] > 0]\ndf_temp['sum'] = df_temp.sum(axis=1)\ndf_temp['07_rate'] = df_temp[7] / df_temp['sum']\ndf_temp.head(10)", "execution_count": 78, "outputs": [ { "output_type": "execute_result", "execution_count": 78, "data": { "text/html": "
", "text/plain": "category_major_cd 4 5 6 7 8 9 sum 07_rate\ncustomer_id \nCS001113000004 NaN NaN NaN 1298.0 NaN NaN 1298.0 1.000000\nCS001114000005 NaN 40.0 NaN 486.0 100.0 NaN 626.0 0.776358\nCS001115000010 NaN NaN NaN 2694.0 NaN 350.0 3044.0 0.885020\nCS001205000004 100.0 128.0 286.0 346.0 368.0 760.0 1988.0 0.174044\nCS001205000006 635.0 60.0 198.0 2004.0 80.0 360.0 3337.0 0.600539\nCS001212000027 248.0 NaN NaN 200.0 NaN NaN 448.0 0.446429\nCS001212000031 NaN NaN NaN 296.0 NaN NaN 296.0 1.000000\nCS001212000046 NaN NaN NaN 108.0 NaN 120.0 228.0 0.473684\nCS001212000070 NaN NaN 148.0 308.0 NaN NaN 456.0 0.675439\nCS001213000018 NaN NaN NaN 145.0 98.0 NaN 243.0 0.596708" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-070: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からの経過日数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp = df_tmp.drop_duplicates()\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\ndf_tmp['elapsed_date'] = df_tmp['sales_ymd'] - df_tmp['application_date']\ndf_tmp.head(10)", "execution_count": 79, "outputs": [ { "output_type": "execute_result", "execution_count": 79, "data": { "text/html": "
", "text/plain": " customer_id sales_ymd application_date \\\n0 CS006214000001 2018-11-03 1970-01-01 00:00:00.020150201 \n1 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 \n2 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 \n4 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 \n7 CS006214000001 2019-09-08 1970-01-01 00:00:00.020150201 \n8 CS006214000001 2018-01-31 1970-01-01 00:00:00.020150201 \n9 CS006214000001 2017-07-05 1970-01-01 00:00:00.020150201 \n10 CS006214000001 2018-11-10 1970-01-01 00:00:00.020150201 \n12 CS006214000001 2019-04-10 1970-01-01 00:00:00.020150201 \n15 CS006214000001 2019-06-01 1970-01-01 00:00:00.020150201 \n\n elapsed_date \n0 17837 days 23:59:59.979849 \n1 17294 days 23:59:59.979849 \n2 17324 days 23:59:59.979849 \n4 17831 days 23:59:59.979849 \n7 18146 days 23:59:59.979849 \n8 17561 days 23:59:59.979849 \n9 17351 days 23:59:59.979849 \n10 17844 days 23:59:59.979849 \n12 17995 days 23:59:59.979849 \n15 18047 days 23:59:59.979849 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-071: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からの経過月数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。1ヶ月未満は切り捨てること。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp = df_tmp.drop_duplicates()\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\n\ndf_tmp['elapsed_date'] = df_tmp[['sales_ymd', 'application_date']].apply(lambda x: \n relativedelta(x[0], x[1]).years * 12 + relativedelta(x[0], x[1]).months, axis=1)\ndf_tmp.sort_values('customer_id').head(10)", "execution_count": 80, "outputs": [ { "output_type": "execute_result", 
"execution_count": 80, "data": { "text/html": "
", "text/plain": " customer_id sales_ymd application_date elapsed_date\n60376 CS001113000004 2019-03-08 1970-01-01 00:00:00.020151105 590\n20158 CS001114000005 2019-07-31 1970-01-01 00:00:00.020160412 594\n20156 CS001114000005 2018-05-03 1970-01-01 00:00:00.020160412 580\n29140 CS001115000010 2019-04-05 1970-01-01 00:00:00.020150417 591\n29141 CS001115000010 2018-07-01 1970-01-01 00:00:00.020150417 581\n29142 CS001115000010 2017-12-28 1970-01-01 00:00:00.020150417 575\n6526 CS001205000004 2019-06-25 1970-01-01 00:00:00.020160615 593\n6521 CS001205000004 2019-03-12 1970-01-01 00:00:00.020160615 590\n6518 CS001205000004 2018-08-21 1970-01-01 00:00:00.020160615 583\n6520 CS001205000004 2017-09-14 1970-01-01 00:00:00.020160615 572" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-072: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からの経過年数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い。(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。1年未満は切り捨てること。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\n\ndf_tmp['elapsed_date'] = df_tmp[['sales_ymd', 'application_date']].apply(lambda x: \n relativedelta(x[0], x[1]).years, axis=1)\ndf_tmp.head(10)", "execution_count": 81, "outputs": [ { "output_type": "execute_result", "execution_count": 81, "data": { "text/html": "
", "text/plain": " customer_id sales_ymd application_date elapsed_date\n0 CS006214000001 2018-11-03 1970-01-01 00:00:00.020150201 48\n1 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 47\n2 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 47\n3 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 47\n4 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 48\n5 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 48\n6 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 47\n7 CS006214000001 2019-09-08 1970-01-01 00:00:00.020150201 49\n8 CS006214000001 2018-01-31 1970-01-01 00:00:00.020150201 48\n9 CS006214000001 2017-07-05 1970-01-01 00:00:00.020150201 47" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-073: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、顧客データフレーム(df_customer)の会員申込日(application_date)からのエポック秒による経過時間を計算し、顧客ID(customer_id)、売上日、会員申込日とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。なお、時間情報は保有していないため各日付は0時0分0秒を表すものとする。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.merge(df_receipt[['customer_id', 'sales_ymd']], df_customer[['customer_id', 'application_date']],\n how='inner', on='customer_id')\n\ndf_tmp = df_tmp.drop_duplicates()\n\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['application_date'] = pd.to_datetime(df_tmp['application_date'])\n\ndf_tmp['elapsed_date'] = (df_tmp['sales_ymd'].astype(np.int64) / 10**9) - (df_tmp['application_date'].astype(np.int64) / 10**9)\ndf_tmp.head(10)", "execution_count": 82, "outputs": [ { "output_type": "execute_result", "execution_count": 82, "data": { "text/html": "
", "text/plain": " customer_id sales_ymd application_date elapsed_date\n0 CS006214000001 2018-11-03 1970-01-01 00:00:00.020150201 1.541203e+09\n1 CS006214000001 2017-05-09 1970-01-01 00:00:00.020150201 1.494288e+09\n2 CS006214000001 2017-06-08 1970-01-01 00:00:00.020150201 1.496880e+09\n4 CS006214000001 2018-10-28 1970-01-01 00:00:00.020150201 1.540685e+09\n7 CS006214000001 2019-09-08 1970-01-01 00:00:00.020150201 1.567901e+09\n8 CS006214000001 2018-01-31 1970-01-01 00:00:00.020150201 1.517357e+09\n9 CS006214000001 2017-07-05 1970-01-01 00:00:00.020150201 1.499213e+09\n10 CS006214000001 2018-11-10 1970-01-01 00:00:00.020150201 1.541808e+09\n12 CS006214000001 2019-04-10 1970-01-01 00:00:00.020150201 1.554854e+09\n15 CS006214000001 2019-06-01 1970-01-01 00:00:00.020150201 1.559347e+09" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-074: レシート明細データフレーム(df_receipt)の売上日(sales_ymd)に対し、当該週の月曜日からの経過日数を計算し、売上日、当該週の月曜日付とともに表示せよ。結果は10件表示させれば良い(なお、sales_ymdは数値でデータを保持している点に注意)。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_receipt[['customer_id', 'sales_ymd']]\ndf_tmp = df_tmp.drop_duplicates()\ndf_tmp['sales_ymd'] = pd.to_datetime(df_tmp['sales_ymd'].astype('str'))\ndf_tmp['monday'] = df_tmp['sales_ymd'].apply(lambda x: x - relativedelta(days=x.weekday()))\ndf_tmp['elapsed_weekday'] = df_tmp['sales_ymd'] - df_tmp['monday']\ndf_tmp.head(10)", "execution_count": 83, "outputs": [ { "output_type": "execute_result", "execution_count": 83, "data": { "text/html": "
", "text/plain": " customer_id sales_ymd monday elapsed_weekday\n0 CS006214000001 2018-11-03 2018-10-29 5 days\n1 CS008415000097 2018-11-18 2018-11-12 6 days\n2 CS028414000014 2017-07-12 2017-07-10 2 days\n3 ZZ000000000000 2019-02-05 2019-02-04 1 days\n4 CS025415000050 2018-08-21 2018-08-20 1 days\n5 CS003515000195 2019-06-05 2019-06-03 2 days\n6 CS024514000042 2018-12-05 2018-12-03 2 days\n7 CS040415000178 2019-09-22 2019-09-16 6 days\n8 ZZ000000000000 2017-05-04 2017-05-01 3 days\n9 CS027514000015 2019-10-10 2019-10-07 3 days" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-075: 顧客データフレーム(df_customer)からランダムに1%のデータを抽出し、先頭から10件データを抽出せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer.sample(frac=0.01).head(10)", "execution_count": 84, "outputs": [ { "output_type": "execute_result", "execution_count": 84, "data": { "text/html": "
", "text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n18772 CS005515000369 春日 翔子 1 女性 1964-10-03 54 \n11534 CS029603000027 田村 正義 0 男性 1948-11-01 70 \n992 CS018715000013 古賀 杏 1 女性 1941-02-27 78 \n13280 CS016312000113 亀山 れいな 1 女性 1987-04-20 31 \n12097 CS015212000045 鶴田 奈央 1 女性 1992-01-06 27 \n9797 CS019313000127 杉本 あさみ 1 女性 1987-10-21 31 \n17813 CS007615000110 岡本 美幸 1 女性 1954-05-13 64 \n19506 CS037511000006 柴田 陽子 1 女性 1966-01-29 53 \n4257 CS006312000096 柳川 真帆 1 女性 1979-02-03 40 \n2686 CS028513000046 生瀬 さやか 1 女性 1961-09-12 57 \n\n postal_cd address application_store_cd \\\n18772 167-0031 東京都杉並区本天沼********** S13005 \n11534 279-0011 千葉県浦安市美浜********** S12029 \n992 204-0012 東京都清瀬市中清戸********** S13018 \n13280 187-0011 東京都小平市鈴木町********** S13016 \n12097 136-0073 東京都江東区北砂********** S13015 \n9797 173-0036 東京都板橋区向原********** S13019 \n17813 285-0845 千葉県佐倉市西志津********** S12007 \n19506 136-0071 東京都江東区亀戸********** S13037 \n4257 224-0041 神奈川県横浜市都筑区仲町台********** S14006 \n2686 246-0031 神奈川県横浜市瀬谷区瀬谷********** S14028 \n\n application_date status_cd age_group \n18772 20180211 3-20100517-2 [50.0, 60.0) \n11534 20150128 0-00000000-0 [60.0, inf) \n992 20150801 0-00000000-0 [60.0, inf) \n13280 20150525 0-00000000-0 [30.0, 40.0) \n12097 20150722 6-20090316-3 [20.0, 30.0) \n9797 20150324 0-00000000-0 [30.0, 40.0) \n17813 20141019 8-20101018-C [60.0, inf) \n19506 20150630 A-20081209-4 [50.0, 60.0) \n4257 20150129 0-00000000-0 [40.0, 50.0) \n2686 20150313 B-20100927-C [50.0, 60.0) " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-076: 顧客データフレーム(df_customer)から性別(gender_cd)の割合に基づきランダムに10%のデータを層化抽出データし、性別ごとに件数を集計せよ。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# sklearn.model_selection.train_test_splitを使用した例\n_, df_tmp = train_test_split(df_customer, test_size=0.1, stratify=df_customer['gender'])\ndf_tmp.groupby('gender_cd').agg({'customer_id' : 'count'})", "execution_count": 85, "outputs": [ { 
"output_type": "execute_result", "execution_count": 85, "data": { "text/html": "
", "text/plain": " customer_id\ngender_cd \n0 298\n1 1793\n9 107" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp.head(10)", "execution_count": 86, "outputs": [ { "output_type": "execute_result", "execution_count": 86, "data": { "text/html": "
", "text/plain": " customer_id customer_name gender_cd gender birth_day age \\\n20954 CS002413000302 細谷 希 1 女性 1974-11-22 44 \n11786 CS001312000436 上原 怜奈 1 女性 1986-04-26 32 \n21437 CS014312000076 戸塚 礼子 1 女性 1983-03-29 36 \n8791 CS003113000013 日下部 美咲 1 女性 2003-07-07 15 \n15810 CS039515000204 井田 季衣 1 女性 1968-03-11 51 \n3702 CS001713000172 中井 愛 1 女性 1945-10-16 73 \n11297 CS032414000003 藤村 みき 1 女性 1976-03-04 43 \n15839 CS005713000139 五十嵐 結衣 1 女性 1946-11-01 72 \n3082 CS020414000070 小笠原 千夏 1 女性 1970-09-11 48 \n14682 CS019613000074 尾崎 華子 1 女性 1955-07-10 63 \n\n postal_cd address application_store_cd \\\n20954 185-0023 東京都国分寺市西元町********** S13002 \n11786 212-0004 神奈川県川崎市幸区小向西町********** S13001 \n21437 263-0013 千葉県千葉市稲毛区千草台********** S12014 \n8791 182-0022 東京都調布市国領町********** S13003 \n15810 168-0081 東京都杉並区宮前********** S13039 \n3702 144-0035 東京都大田区南蒲田********** S13001 \n11297 144-0056 東京都大田区西六郷********** S13032 \n15839 167-0021 東京都杉並区井草********** S13005 \n3082 114-0001 東京都北区東十条********** S13020 \n14682 176-0002 東京都練馬区桜台********** S13019 \n\n application_date status_cd age_group \n20954 20170201 0-00000000-0 [40.0, 50.0) \n11786 20161230 0-00000000-0 [30.0, 40.0) \n21437 20151101 0-00000000-0 [30.0, 40.0) \n8791 20160201 0-00000000-0 [10.0, 20.0) \n15810 20150102 D-20100724-E [50.0, 60.0) \n3702 20161221 0-00000000-0 [60.0, inf) \n11297 20150418 F-20101027-E [40.0, 50.0) \n15839 20170221 0-00000000-0 [60.0, inf) \n3082 20150322 D-20100901-E [40.0, 50.0) \n14682 20150918 0-00000000-0 [60.0, inf) " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-077: レシート明細データフレーム(df_receipt)の売上金額(amount)を顧客単位に合計し、合計した売上金額の外れ値を抽出せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。なお、ここでは外れ値を平均から3σ以上離れたものとする。結果は10件表示させれば良い。" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# skleanのpreprocessing.scaleを利用するため、標本標準偏差で計算されている\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). 
\\\n groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_sales_amount['amount_ss'] = preprocessing.scale(df_sales_amount['amount'])\ndf_sales_amount.query('abs(amount_ss) >= 3').head(10)", "execution_count": 87, "outputs": [ { "output_type": "execute_result", "execution_count": 87, "data": { "text/html": "
", "text/plain": " customer_id amount amount_ss\n332 CS001605000009 18925 6.019921\n1755 CS006415000147 12723 3.740202\n1817 CS006515000023 18372 5.816651\n1833 CS006515000125 12575 3.685800\n1841 CS006515000209 11373 3.243972\n1870 CS007115000006 11528 3.300946\n1941 CS007514000056 13293 3.949721\n1943 CS007514000094 15735 4.847347\n1951 CS007515000107 11188 3.175970\n1997 CS007615000026 11959 3.459372" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-078: Sum the sales amount (amount) in the receipt detail dataframe (df_receipt) per customer and extract the outliers of the summed sales amount. Customer IDs starting with \"Z\" represent non-members and must be excluded from the calculation. Here, using the IQR (the difference between the first and third quartiles), an outlier is a value below \"first quartile - 1.5 × IQR\" or above \"third quartile + 1.5 × IQR\". Displaying 10 rows is sufficient." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Outliers are identified with the IQR rule, so no standardization is needed\ndf_sales_amount = df_receipt.query('not customer_id.str.startswith(\"Z\")', engine='python'). \\\n    groupby('customer_id').agg({'amount':'sum'}).reset_index()\n\npct75 = np.percentile(df_sales_amount['amount'], q=75)\npct25 = np.percentile(df_sales_amount['amount'], q=25)\niqr = pct75 - pct25\namount_low = pct25 - (iqr * 1.5)\namount_high = pct75 + (iqr * 1.5)\ndf_sales_amount.query('amount < @amount_low or @amount_high < amount').head(10)", "execution_count": 88, "outputs": [ { "output_type": "execute_result", "execution_count": 88, "data": { "text/html": "
", "text/plain": " customer_id amount\n98 CS001414000048 8584\n332 CS001605000009 18925\n549 CS002415000594 9568\n1180 CS004414000181 9584\n1558 CS005415000137 8734\n1733 CS006414000001 9156\n1736 CS006414000029 9179\n1752 CS006415000105 10042\n1755 CS006415000147 12723\n1757 CS006415000157 10648" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-079: Check the number of missing values in each column of the product dataframe (df_product)." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product.isnull().sum()", "execution_count": 89, "outputs": [ { "output_type": "execute_result", "execution_count": 89, "data": { "text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 7\nunit_cost 7\ndtype: int64" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-080: Create a new df_product_1 by deleting every record of the product dataframe (df_product) that has a missing value in any column. Display the record counts before and after deletion, and confirm that the count decreased by the number found in the previous question." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product_1 = df_product.copy()\nprint('削除前:', len(df_product_1))\ndf_product_1.dropna(inplace=True)\nprint('削除後:', len(df_product_1))", "execution_count": 90, "outputs": [ { "output_type": "stream", "text": "削除前: 10030\n削除後: 10023\n", "name": "stdout" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-081: Create a new df_product_2 by filling the missing values of unit price (unit_price) and unit cost (unit_cost) with their respective means. Round the mean to the nearest yen; rounding 0.5 to the nearest even number is acceptable. After filling, confirm that no column has missing values." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product_2 = df_product.fillna({'unit_price':np.round(np.nanmean(df_product['unit_price'])), \n 'unit_cost':np.round(np.nanmean(df_product['unit_cost']))})\ndf_product_2.isnull().sum()", "execution_count": 91, "outputs": [ { "output_type": "execute_result", "execution_count": 91, "data": { "text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 
0\nunit_cost 0\ndtype: int64" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-082: Create a new df_product_3 by filling the missing values of unit price (unit_price) and unit cost (unit_cost) with their respective medians. Round the median to the nearest yen; rounding 0.5 to the nearest even number is acceptable. After filling, confirm that no column has missing values." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product_3 = df_product.fillna({'unit_price':np.round(np.nanmedian(df_product['unit_price'])), \n 'unit_cost':np.round(np.nanmedian(df_product['unit_cost']))})\ndf_product_3.isnull().sum()", "execution_count": 92, "outputs": [ { "output_type": "execute_result", "execution_count": 92, "data": { "text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 0\nunit_cost 0\ndtype: int64" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-083: Create a new df_product_4 by filling the missing values of unit price (unit_price) and unit cost (unit_cost) with the median calculated for each product's small category (category_small_cd). Round the median to the nearest yen; rounding 0.5 to the nearest even number is acceptable. After filling, confirm that no column has missing values." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_product.groupby('category_small_cd').agg({'unit_price':'median', 'unit_cost':'median'}).reset_index()\ndf_tmp.columns = ['category_small_cd', 'median_price', 'median_cost']\n\ndf_product_4 = pd.merge(df_product, df_tmp, how='inner', on='category_small_cd')\n\ndf_product_4['unit_price'] = df_product_4[['unit_price', 'median_price']]. \\\n apply(lambda x: np.round(x[1]) if np.isnan(x[0]) else x[0], axis=1)\ndf_product_4['unit_cost'] = df_product_4[['unit_cost', 'median_cost']]. 
\\\n apply(lambda x: np.round(x[1]) if np.isnan(x[0]) else x[0], axis=1)\n\ndf_product_4.isnull().sum()", "execution_count": 93, "outputs": [ { "output_type": "execute_result", "execution_count": 93, "data": { "text/plain": "product_cd 0\ncategory_major_cd 0\ncategory_medium_cd 0\ncategory_small_cd 0\nunit_price 0\nunit_cost 0\nmedian_price 0\nmedian_cost 0\ndtype: int64" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-084: For all customers in the customer dataframe (df_customer), calculate the ratio of 2019 sales to total sales over the entire period. Treat customers with no sales as 0. Then extract the records whose ratio is greater than 0. Displaying 10 rows is sufficient. Also confirm that the created data contains no NA or NaN." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp_1 = df_receipt.query('20190101 <= sales_ymd <= 20191231')\ndf_tmp_1 = pd.merge(df_customer['customer_id'], df_tmp_1[['customer_id', 'amount']], how='left', on='customer_id'). \\\n    groupby('customer_id').sum().reset_index().rename(columns={'amount':'amount_2019'})\n\ndf_tmp_2 = pd.merge(df_customer['customer_id'], df_receipt[['customer_id', 'amount']], how='left', on='customer_id'). \\\n    groupby('customer_id').sum().reset_index()\n\ndf_tmp = pd.merge(df_tmp_1, df_tmp_2, how='inner', on='customer_id')\ndf_tmp['amount_rate'] = df_tmp['amount_2019'] / df_tmp['amount']\n# Customers with no sales yield 0 / 0 = NaN, so treat their rate as 0\ndf_tmp['amount_rate'] = df_tmp['amount_rate'].fillna(0)", "execution_count": 94, "outputs": [] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp.query('amount_rate > 0').head(10)", "execution_count": 95, "outputs": [ { "output_type": "execute_result", "execution_count": 95, "data": { "text/html": "
", "text/plain": " customer_id amount_2019 amount amount_rate\n8 CS001113000004 1298.0 1298.0 1.000000\n10 CS001114000005 188.0 626.0 0.300319\n12 CS001115000010 578.0 3044.0 0.189882\n17 CS001205000004 702.0 1988.0 0.353119\n18 CS001205000006 486.0 3337.0 0.145640\n23 CS001211000025 456.0 456.0 1.000000\n30 CS001212000070 456.0 456.0 1.000000\n57 CS001214000009 664.0 4685.0 0.141729\n59 CS001214000017 2962.0 4132.0 0.716844\n61 CS001214000048 1889.0 2374.0 0.795703" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Check the dataframe created for this question (df_tmp) for NA/NaN\ndf_tmp.isnull().sum()", "execution_count": null, "outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-085: For all customers in the customer dataframe (df_customer), join the longitude/latitude conversion dataframe (df_geocode) on postal code (postal_cd) to create a new df_customer_1. When multiple rows match, use the mean of the longitude (longitude) and the mean of the latitude (latitude), respectively.\n" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer_1 = pd.merge(df_customer[['customer_id', 'postal_cd']],\n                         df_geocode[['postal_cd', 'longitude' ,'latitude']],\n                         how='inner', on='postal_cd')\ndf_customer_1 = df_customer_1.groupby('customer_id'). \\\n    agg({'longitude':'mean', 'latitude':'mean'}).reset_index(). \\\n    rename(columns={'longitude':'m_longitude', 'latitude':'m_latitude'})\n\ndf_customer_1 = pd.merge(df_customer, df_customer_1, how='inner', on='customer_id')\ndf_customer_1.head(3)", "execution_count": 97, "outputs": [ { "output_type": "execute_result", "execution_count": 97, "data": { "text/html": "
", "text/plain": " customer_id customer_name gender_cd gender birth_day age postal_cd \\\n0 CS021313000114 大野 あや子 1 女性 1981-04-29 37 259-1113 \n1 CS037613000071 六角 雅彦 9 不明 1952-04-01 66 136-0076 \n2 CS031415000172 宇多田 貴美子 1 女性 1976-10-04 42 151-0053 \n\n address application_store_cd application_date status_cd \\\n0 神奈川県伊勢原市粟窪********** S14021 20150905 0-00000000-0 \n1 東京都江東区南砂********** S13037 20150414 0-00000000-0 \n2 東京都渋谷区代々木********** S13031 20150529 D-20100325-C \n\n age_group m_longitude m_latitude \n0 [30.0, 40.0) 139.31779 35.41358 \n1 [60.0, inf) 139.83502 35.67193 \n2 [40.0, 50.0) 139.68965 35.67374 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-086: For the customer dataframe with latitude and longitude created in the previous question (df_customer_1), join the store dataframe (df_store) using the application store code (application_store_cd) as the key. Then compute the distance (km) from the latitude (latitude) and longitude (longitude) of the application store and those of the customer, and display it together with the customer ID (customer_id), customer address (address), and store address (address). The simplified formula below is sufficient, though a library that uses a more accurate method may also be used. Displaying 10 rows is sufficient." }, { "metadata": {}, "cell_type": "markdown", "source": "$$\n\\text{Latitude (radians)}: \\phi \\\\\n\\text{Longitude (radians)}: \\lambda \\\\\n\\text{Distance } L = 6371 \\times \\arccos(\\sin \\phi_1 \\sin \\phi_2\n+ \\cos \\phi_1 \\cos \\phi_2 \\cos(\\lambda_1 - \\lambda_2))\n$$" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# x: longitude, y: latitude (both in degrees); returns the great-circle distance in km\ndef calc_distance(x1, y1, x2, y2):\n    distance = 6371 * math.acos(math.sin(math.radians(y1)) * math.sin(math.radians(y2)) \n                                + math.cos(math.radians(y1)) * math.cos(math.radians(y2)) \n                                * math.cos(math.radians(x1) - math.radians(x2)))\n    return distance\n\ndf_tmp = pd.merge(df_customer_1, df_store, how='inner', left_on='application_store_cd', right_on='store_cd') \n\ndf_tmp['distance'] = df_tmp[['m_longitude', 'm_latitude','longitude', 'latitude']]. \\\n    apply(lambda x: calc_distance(x[0], x[1], x[2], x[3]), axis=1)\n\ndf_tmp[['customer_id', 'address_x', 'address_y', 'distance']].head(10)", "execution_count": 98, "outputs": [ { "output_type": "execute_result", "execution_count": 98, "data": { "text/html": "
", "text/plain": " customer_id address_x address_y distance\n0 CS021313000114 神奈川県伊勢原市粟窪********** 神奈川県伊勢原市伊勢原四丁目 1.394409\n1 CS021313000025 神奈川県伊勢原市伊勢原********** 神奈川県伊勢原市伊勢原四丁目 0.474282\n2 CS021411000096 神奈川県伊勢原市高森********** 神奈川県伊勢原市伊勢原四丁目 2.480155\n3 CS021415000150 神奈川県伊勢原市上粕屋********** 神奈川県伊勢原市伊勢原四丁目 2.734723\n4 CS021313000046 神奈川県伊勢原市池端********** 神奈川県伊勢原市伊勢原四丁目 1.111911\n5 CS021103000002 神奈川県伊勢原市西富岡********** 神奈川県伊勢原市伊勢原四丁目 2.384941\n6 CS021214000028 神奈川県伊勢原市桜台********** 神奈川県伊勢原市伊勢原四丁目 1.399344\n7 CS021512000095 神奈川県伊勢原市沼目********** 神奈川県伊勢原市伊勢原四丁目 1.993991\n8 CS021613000002 神奈川県伊勢原市桜台********** 神奈川県伊勢原市伊勢原四丁目 1.399344\n9 CS021412000147 神奈川県伊勢原市三ノ宮********** 神奈川県伊勢原市伊勢原四丁目 3.507680" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-087: In the customer dataframe (df_customer), the same customer is registered multiple times, for example because of applications at different stores. Treat customers with the same name (customer_name) and postal code (postal_cd) as the same customer, and create a deduplicated customer dataframe (df_customer_u) with one record per customer. Within each set of duplicates, keep the record with the highest total sales; if the total sales are equal or the customer has no sales, keep the record with the smallest customer ID (customer_id)." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_receipt.groupby('customer_id').agg({'amount':'sum'}).reset_index()\n# Sort by total sales (descending), then customer ID (ascending), so the first row of each duplicate group is the one to keep\ndf_customer_u = pd.merge(df_customer, df_tmp, how='left', on='customer_id').sort_values(['amount', 'customer_id'],\n    ascending=[False, True])\ndf_customer_u.drop_duplicates(subset=['customer_name', 'postal_cd'], keep='first', inplace=True)\n\nprint('減少数: ', len(df_customer) - len(df_customer_u))", "execution_count": 99, "outputs": [ { "output_type": "stream", "text": "減少数: 30\n", "name": "stdout" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-088: Based on the data created in the previous question, create a dataframe (df_customer_n) that adds an integrated deduplication ID to the customer dataframe. Assign the integrated deduplication ID as follows.\n>\n> - Non-duplicated customers: set the customer ID (customer_id)\n> - Duplicated customers: set the customer ID of the record extracted in the previous question" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_customer_n = pd.merge(df_customer, df_customer_u[['customer_name', 'postal_cd', 
'customer_id']],\n how='inner', on =['customer_name', 'postal_cd'])\ndf_customer_n.rename(columns={'customer_id_x':'customer_id', 'customer_id_y':'integration_id'}, inplace=True)\n\nprint('ID数の差', len(df_customer_n['customer_id'].unique()) - len(df_customer_n['integration_id'].unique()))", "execution_count": 100, "outputs": [ { "output_type": "stream", "text": "ID数の差 30\n", "name": "stdout" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-Aside: df_customer_1 and df_customer_n are no longer used, so delete them." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "del df_customer_1\ndel df_customer_n", "execution_count": 101, "outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-089: For customers with sales records, split the data into training data and test data for building a prediction model. Split the data randomly at a ratio of 8:2." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.merge(df_customer, df_receipt['customer_id'], how='inner', on='customer_id')\ndf_train, df_test = train_test_split(df_tmp, test_size=0.2, random_state=71)\nprint('学習データ割合: ', len(df_train) / len(df_tmp))\nprint('テストデータ割合: ', len(df_test) / len(df_tmp))", "execution_count": 102, "outputs": [ { "output_type": "stream", "text": "学習データ割合: 0.7999908650771901\nテストデータ割合: 0.2000091349228099\n", "name": "stdout" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-090: The receipt detail dataframe (df_receipt) holds data from January 1, 2017 to October 31, 2019. Aggregate the sales amount (amount) by month and create 3 sets of model-building data, each with 12 months for training and 6 months for testing." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = df_receipt[['sales_ymd', 'amount']].copy()\ndf_tmp['sales_ym'] = df_tmp['sales_ymd'].astype('str').str[0:6]\ndf_tmp = df_tmp.groupby('sales_ym').agg({'amount':'sum'}).reset_index()\n\n# Wrapping the split in a function lets many datasets over a long period be generated in a loop\ndef split_data(df, train_size, test_size, slide_window, start_point):\n    train_start = start_point * slide_window\n    test_start = train_start + train_size\n    return df[train_start : test_start], df[test_start 
: test_start + test_size]\n\ndf_train_1, df_test_1 = split_data(df_tmp, train_size=12, test_size=6, slide_window=6, start_point=0)\ndf_train_2, df_test_2 = split_data(df_tmp, train_size=12, test_size=6, slide_window=6, start_point=1)\ndf_train_3, df_test_3 = split_data(df_tmp, train_size=12, test_size=6, slide_window=6, start_point=2)", "execution_count": 103, "outputs": [] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_train_1", "execution_count": 104, "outputs": [ { "output_type": "execute_result", "execution_count": 104, "data": { "text/html": "
", "text/plain": " sales_ym amount\n0 201701 902056\n1 201702 764413\n2 201703 962945\n3 201704 847566\n4 201705 884010\n5 201706 894242\n6 201707 959205\n7 201708 954836\n8 201709 902037\n9 201710 905739\n10 201711 932157\n11 201712 939654" }, "metadata": {} } ] }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_test_1", "execution_count": 105, "outputs": [ { "output_type": "execute_result", "execution_count": 105, "data": { "text/html": "
", "text/plain": " sales_ym amount\n12 201801 944509\n13 201802 864128\n14 201803 946588\n15 201804 937099\n16 201805 1004438\n17 201806 1012329" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-091: For the customers in the customer dataframe (df_customer), undersample so that the number of customers with sales records and the number of customers without sales records are in a 1:1 ratio." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# Example using RandomUnderSampler from imbalanced-learn\ndf_tmp = df_receipt.groupby('customer_id').agg({'amount':'sum'}).reset_index()\ndf_tmp = pd.merge(df_customer, df_tmp, how='left', on='customer_id')\ndf_tmp['buy_flg'] = df_tmp['amount'].apply(lambda x: 0 if np.isnan(x) else 1)\n\nprint('0の件数', len(df_tmp.query('buy_flg == 0')))\nprint('1の件数', len(df_tmp.query('buy_flg == 1')))\n\npositive_count = len(df_tmp.query('buy_flg == 1'))\n\nrs = RandomUnderSampler(random_state=71)\n\n# fit_resample is the current name of the deprecated fit_sample\ndf_sample, _ = rs.fit_resample(df_tmp, df_tmp.buy_flg)\n\nprint('0の件数', len(df_sample.query('buy_flg == 0')))\nprint('1の件数', len(df_sample.query('buy_flg == 1')))", "execution_count": 106, "outputs": [ { "output_type": "stream", "text": "0の件数 13665\n1の件数 8306\n0の件数 8306\n1の件数 8306\n", "name": "stdout" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-092: In the customer dataframe (df_customer), gender information is held in a denormalized state. Normalize it to third normal form." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_gender = df_customer[['gender_cd', 'gender']].drop_duplicates()\ndf_customer_s = df_customer.drop(columns='gender')", "execution_count": 107, "outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-093: The product dataframe (df_product) holds only the code values of each category, not the category names. Combine it with the category dataframe (df_category) to denormalize it, and create a new product dataframe that includes the category names." }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product_full = pd.merge(df_product, df_category[['category_small_cd', \n                                                    'category_major_name',\n                                                    'category_medium_name',\n                                                    'category_small_name']], \n                           how = 'inner', on = 'category_small_cd')", "execution_count": 108, 
"outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-094: Output the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: CSV (comma-separated)\n> - With header\n> - Character encoding: UTF-8" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "# exist_ok=True avoids an error when the cell is re-run and ./data already exists\nos.makedirs('./data', exist_ok=True)\ndf_product_full.to_csv('./data/P_df_product_full_UTF-8_header.csv', encoding='UTF-8', index=False)", "execution_count": 109, "outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-095: Output the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: CSV (comma-separated)\n> - With header\n> - Character encoding: CP932" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product_full.to_csv('./data/P_df_product_full_CP932_header.csv', encoding='CP932', index=False)", "execution_count": 110, "outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-096: Output the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: CSV (comma-separated)\n> - Without header\n> - Character encoding: UTF-8" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product_full.to_csv('./data/P_df_product_full_UTF-8_noh.csv', header=False ,encoding='UTF-8', index=False)", "execution_count": 111, "outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-097: Read the file created earlier in the following format and create a dataframe. Also display the first 10 rows and confirm that it was loaded correctly.\n>\n> - File format: CSV (comma-separated)\n> - With header\n> - Character encoding: UTF-8" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.read_csv('./data/P_df_product_full_UTF-8_header.csv')\ndf_tmp.head(10)", "execution_count": 112, "outputs": [ { "output_type": "execute_result", "execution_count": 112, "data": { "text/html": "
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost category_major_name category_medium_name \\\n0 198.0 149.0 惣菜 御飯類 \n1 218.0 164.0 惣菜 御飯類 \n2 230.0 173.0 惣菜 御飯類 \n3 248.0 186.0 惣菜 御飯類 \n4 268.0 201.0 惣菜 御飯類 \n5 298.0 224.0 惣菜 御飯類 \n6 338.0 254.0 惣菜 御飯類 \n7 420.0 315.0 惣菜 御飯類 \n8 498.0 374.0 惣菜 御飯類 \n9 580.0 435.0 惣菜 御飯類 \n\n category_small_name \n0 弁当類 \n1 弁当類 \n2 弁当類 \n3 弁当類 \n4 弁当類 \n5 弁当類 \n6 弁当類 \n7 弁当類 \n8 弁当類 \n9 弁当類 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-098: Read the file created earlier in the following format and create a dataframe. Also display the first 10 rows and confirm that it was loaded correctly.\n>\n> - File format: CSV (comma-separated)\n> - Without header\n> - Character encoding: UTF-8" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.read_csv('./data/P_df_product_full_UTF-8_noh.csv', header=None)\ndf_tmp.head(10)", "execution_count": 113, "outputs": [ { "output_type": "execute_result", "execution_count": 113, "data": { "text/html": "
", "text/plain": " 0 1 2 3 4 5 6 7 8\n0 P040101001 4 401 40101 198.0 149.0 惣菜 御飯類 弁当類\n1 P040101002 4 401 40101 218.0 164.0 惣菜 御飯類 弁当類\n2 P040101003 4 401 40101 230.0 173.0 惣菜 御飯類 弁当類\n3 P040101004 4 401 40101 248.0 186.0 惣菜 御飯類 弁当類\n4 P040101005 4 401 40101 268.0 201.0 惣菜 御飯類 弁当類\n5 P040101006 4 401 40101 298.0 224.0 惣菜 御飯類 弁当類\n6 P040101007 4 401 40101 338.0 254.0 惣菜 御飯類 弁当類\n7 P040101008 4 401 40101 420.0 315.0 惣菜 御飯類 弁当類\n8 P040101009 4 401 40101 498.0 374.0 惣菜 御飯類 弁当類\n9 P040101010 4 401 40101 580.0 435.0 惣菜 御飯類 弁当類" }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-099: Output the product data with category names created earlier to a file with the following specifications. The output path is under data.\n>\n> - File format: TSV (tab-separated)\n> - With header\n> - Character encoding: UTF-8" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_product_full.to_csv('./data/P_df_product_full_UTF-8_header.tsv', sep='\\t', encoding='UTF-8', index=False)", "execution_count": 114, "outputs": [] }, { "metadata": {}, "cell_type": "markdown", "source": "---\n> P-100: Read the file created earlier in the following format and create a dataframe. Also display the first 10 rows and confirm that it was loaded correctly.\n>\n> - File format: TSV (tab-separated)\n> - With header\n> - Character encoding: UTF-8" }, { "metadata": { "trusted": true }, "cell_type": "code", "source": "df_tmp = pd.read_table('./data/P_df_product_full_UTF-8_header.tsv', encoding='UTF-8')\ndf_tmp.head(10)", "execution_count": 115, "outputs": [ { "output_type": "execute_result", "execution_count": 115, "data": { "text/html": "
", "text/plain": " product_cd category_major_cd category_medium_cd category_small_cd \\\n0 P040101001 4 401 40101 \n1 P040101002 4 401 40101 \n2 P040101003 4 401 40101 \n3 P040101004 4 401 40101 \n4 P040101005 4 401 40101 \n5 P040101006 4 401 40101 \n6 P040101007 4 401 40101 \n7 P040101008 4 401 40101 \n8 P040101009 4 401 40101 \n9 P040101010 4 401 40101 \n\n unit_price unit_cost category_major_name category_medium_name \\\n0 198.0 149.0 惣菜 御飯類 \n1 218.0 164.0 惣菜 御飯類 \n2 230.0 173.0 惣菜 御飯類 \n3 248.0 186.0 惣菜 御飯類 \n4 268.0 201.0 惣菜 御飯類 \n5 298.0 224.0 惣菜 御飯類 \n6 338.0 254.0 惣菜 御飯類 \n7 420.0 315.0 惣菜 御飯類 \n8 498.0 374.0 惣菜 御飯類 \n9 580.0 435.0 惣菜 御飯類 \n\n category_small_name \n0 弁当類 \n1 弁当類 \n2 弁当類 \n3 弁当類 \n4 弁当類 \n5 弁当類 \n6 弁当類 \n7 弁当類 \n8 弁当類 \n9 弁当類 " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "# That completes all 100 knocks. Well done!" } ], "metadata": { "kernelspec": { "name": "python36", "display_name": "Python 3.6", "language": "python" }, "language_info": { "mimetype": "text/x-python", "nbconvert_exporter": "python", "name": "python", "pygments_lexer": "ipython3", "version": "3.6.6", "file_extension": ".py", "codemirror_mode": { "version": 3, "name": "ipython" } } }, "nbformat": 4, "nbformat_minor": 4 }