"
],
"text/plain": [
" price surface\n",
"0 400000.0 150\n",
"1 350000.0 200\n",
"2 400000.0 120\n",
"3 150000.0 150"
]
},
"metadata": {
"tags": []
},
"execution_count": 281
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pBs1LmXgcgk5",
"colab_type": "text"
},
"source": [
"## Exploration statistique du dataframe\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "czsPJk69ihKw",
"colab_type": "text"
},
"source": [
"Avant même de corriger le dataframe, il peut être intéressant de réaliser une première exploration statistique, qui permet parfois de se rendre compte directement des données à corriger.\n",
"\n",
"df.describe() permet de calculer différents aggrégats (des statistiques de base) pour toutes les colonnes numériques."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ipSGQZLvcOIX",
"colab_type": "code",
"outputId": "2e82c102-6872-44f8-de69-0877c6d7a870",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
}
},
"source": [
"df.describe()"
],
"execution_count": 282,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
\n",
" \n",
" \n",
"
\n",
"
count
\n",
"
17.000000
\n",
"
17.000000
\n",
"
18.000000
\n",
"
\n",
"
\n",
"
mean
\n",
"
308817.647059
\n",
"
3.588235
\n",
"
166.388889
\n",
"
\n",
"
\n",
"
std
\n",
"
166402.795331
\n",
"
1.325652
\n",
"
50.695329
\n",
"
\n",
"
\n",
"
min
\n",
"
-100.000000
\n",
"
0.000000
\n",
"
100.000000
\n",
"
\n",
"
\n",
"
25%
\n",
"
230000.000000
\n",
"
3.000000
\n",
"
142.500000
\n",
"
\n",
"
\n",
"
50%
\n",
"
330000.000000
\n",
"
4.000000
\n",
"
165.000000
\n",
"
\n",
"
\n",
"
75%
\n",
"
400000.000000
\n",
"
5.000000
\n",
"
178.750000
\n",
"
\n",
"
\n",
"
max
\n",
"
700000.000000
\n",
"
5.000000
\n",
"
320.000000
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price rooms surface\n",
"count 17.000000 17.000000 18.000000\n",
"mean 308817.647059 3.588235 166.388889\n",
"std 166402.795331 1.325652 50.695329\n",
"min -100.000000 0.000000 100.000000\n",
"25% 230000.000000 3.000000 142.500000\n",
"50% 330000.000000 4.000000 165.000000\n",
"75% 400000.000000 5.000000 178.750000\n",
"max 700000.000000 5.000000 320.000000"
]
},
"metadata": {
"tags": []
},
"execution_count": 282
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bd8RQ1OpEI8q",
"colab_type": "text"
},
"source": [
"* On remarque directement des données incohérentes: le prix minimum est négatif, et le nombre de pièces minimum est 0. Il faut donc au moins corriger ces deux caractéristiques.\n",
"* Il manque des valeurs pour les colonnes price et rooms (17 au lieu de 18). (Note: df.dropna() permet d'éliminer ces lignes invalides)\n",
"* Les valeurs moyennes et maximales semble cohérentes."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gygRr631emVQ",
"colab_type": "text"
},
"source": [
"La méthode \"groupby\" permet de grouper les données selon leur adresse (on obtient un groupe par addresse unique), et la méthode \"count\" permet ensuite de compter les valeurs pour chaque groupe. Ceci permet de directement identifier les doublons, et les données non-valides.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "E6h4M8_4dHJ5",
"colab_type": "code",
"outputId": "354633c4-4998-4730-dfda-aafa7c77af84",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 572
}
},
"source": [
"df.groupby(\"address\").count()"
],
"execution_count": 283,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
"
\n",
"
address
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
NaN
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
2
\n",
"
2
\n",
"
2
\n",
"
2
\n",
"
\n",
"
\n",
"
Rue de Fer 25, 5000 Namur
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de Fer 27, Namur
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de Fer 28, Namur
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de Fer 29, Namur
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
2
\n",
"
2
\n",
"
2
\n",
"
2
\n",
"
\n",
"
\n",
"
Rue de la Loi 50, Bruxelles
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de la Loi 52, Bruxelles
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue de la Loi 53, Bruxelles
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
1
\n",
"
\n",
"
\n",
"
Rue du Luxembourg 15, 1000 Bruxelles
\n",
"
1
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price rooms surface website\n",
"address \n",
"NaN 0 1 1 1\n",
"Porte de Namur 25, Bruxelles 1 1 1 1\n",
"Rue Saint-ghislain 30, 6224 Fleurus 1 1 1 1\n",
"Rue de Bruxelles 42, 5000 Namur 2 2 2 2\n",
"Rue de Fer 25, 5000 Namur 1 1 1 1\n",
"Rue de Fer 26, 5000 Namur 1 1 1 1\n",
"Rue de Fer 27, Namur 1 1 1 1\n",
"Rue de Fer 28, Namur 1 1 1 1\n",
"Rue de Fer 29, Namur 1 1 1 1\n",
"Rue de L'Eglise 42, Charleroi 1 1 1 1\n",
"Rue de la Closière 20, Fleurus 2 2 2 2\n",
"Rue de la Loi 50, Bruxelles 1 1 1 1\n",
"Rue de la Loi 51, Bruxelles 1 1 1 1\n",
"Rue de la Loi 52, Bruxelles 1 1 1 1\n",
"Rue de la Loi 53, Bruxelles 1 1 1 1\n",
"Rue du Luxembourg 15, 1000 Bruxelles 1 0 1 1"
]
},
"metadata": {
"tags": []
},
"execution_count": 283
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Gi120EbAEMYS",
"colab_type": "text"
},
"source": [
"\n",
"* On distingue 2 doublons: Rue de Bruxelles 42, 5000 Namur et Rue de la Closière 20, Fleurus\n",
"\n",
"* On remarque aussi des données invalides pour deux adresses: il manque le prix pour \"NaN\", de le nombre de pièces pour \"Rue du Luxembourg 15, 1000 Bruxelles\"\n",
"* On aperçoit une adresse invalide: \"NaN\", qu'on peut enlever manuellement\n",
"\n",
"Note: df.dropna() permet d'éliminer directement ces données invalides (des NaN)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ri8VqV6lhD7t",
"colab_type": "text"
},
"source": [
"On peut réaliser le même processus pour l'autre colonne textuelle: website\n",
"\n",
"* On remarque en l'occurrence que plus de données viennent d'immoweb, et plus particulièrement que les données invalides viennent toutes de immovlan"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Asdqxln6g7Mj",
"colab_type": "code",
"outputId": "52d07f13-2559-42eb-8aca-0b31fab824dd",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 141
}
},
"source": [
"df.groupby(\"website\").count()"
],
"execution_count": 284,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
\n",
"
\n",
"
website
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
immovlan
\n",
"
8
\n",
"
7
\n",
"
7
\n",
"
8
\n",
"
\n",
"
\n",
"
immoweb
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface\n",
"website \n",
"immovlan 8 7 7 8\n",
"immoweb 10 10 10 10"
]
},
"metadata": {
"tags": []
},
"execution_count": 284
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "grHijEqKVR2e",
"colab_type": "text"
},
"source": [
"## Correction du dataframe en une seule instruction\n",
"* Retire les données invalides:\n",
" * string à \"NaN\",\"\" ou \" \"\n",
" * nombre négatif\n",
" * nombre défini à NaN\n",
"* Retire les doublons"
]
},
{
"cell_type": "code",
"metadata": {
"id": "RSYtHkwfUSIr",
"colab_type": "code",
"outputId": "f41f67c7-edbc-433b-e7e6-ce889e9d34c2",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 449
}
},
"source": [
"#Correction du dataframe en une seule instruction: retire toutes les données invalides et les doublons\n",
"df = df[~df.address.isin([\"NaN\",\"\",\" \"]) & \\\n",
" (df.price > 0) & (df.rooms > 0)]\\\n",
" .dropna()\\\n",
" .drop_duplicates(\"address\")\n",
"\n",
"df\n"
],
"execution_count": 285,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
1
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
200
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
2
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
400000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
3
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
150000.0
\n",
"
5.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
4
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
330000.0
\n",
"
5.0
\n",
"
320
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
5
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
2.0
\n",
"
175
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
11
\n",
"
Rue de la Loi 50, Bruxelles
\n",
"
700000.0
\n",
"
3.0
\n",
"
220
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
12
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
280000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
13
\n",
"
Rue de la Loi 52, Bruxelles
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
14
\n",
"
Rue de la Loi 53, Bruxelles
\n",
"
480000.0
\n",
"
5.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
15
\n",
"
Rue de Fer 27, Namur
\n",
"
280000.0
\n",
"
3.0
\n",
"
140
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
16
\n",
"
Rue de Fer 28, Namur
\n",
"
320000.0
\n",
"
4.0
\n",
"
160
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
17
\n",
"
Rue de Fer 29, Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
180
\n",
"
immovlan
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface website\n",
"0 Rue de Fer 26, 5000 Namur 400000.0 4.0 150 immoweb\n",
"1 Rue de Bruxelles 42, 5000 Namur 350000.0 5.0 200 immoweb\n",
"2 Porte de Namur 25, Bruxelles 400000.0 3.0 120 immoweb\n",
"3 Rue de L'Eglise 42, Charleroi 150000.0 5.0 150 immoweb\n",
"4 Rue Saint-ghislain 30, 6224 Fleurus 330000.0 5.0 320 immoweb\n",
"5 Rue de la Closière 20, Fleurus 230000.0 2.0 175 immoweb\n",
"11 Rue de la Loi 50, Bruxelles 700000.0 3.0 220 immovlan\n",
"12 Rue de la Loi 51, Bruxelles 280000.0 3.0 120 immoweb\n",
"13 Rue de la Loi 52, Bruxelles 400000.0 4.0 150 immovlan\n",
"14 Rue de la Loi 53, Bruxelles 480000.0 5.0 170 immovlan\n",
"15 Rue de Fer 27, Namur 280000.0 3.0 140 immoweb\n",
"16 Rue de Fer 28, Namur 320000.0 4.0 160 immoweb\n",
"17 Rue de Fer 29, Namur 350000.0 5.0 180 immovlan"
]
},
"metadata": {
"tags": []
},
"execution_count": 285
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4JgQN1tuVOuo",
"colab_type": "text"
},
"source": [
"## Correction étape par étape"
]
},
{
"cell_type": "code",
"metadata": {
"id": "t6hD3BYPbd4o",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 603
},
"outputId": "182cc9af-5ad2-4a09-f743-90b68a57d6c9"
},
"source": [
"#On reprend le dataframe non-corrigé pour comparer\n",
"df = df1\n",
"df"
],
"execution_count": 286,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
1
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
200
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
2
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
400000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
3
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
150000.0
\n",
"
5.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
4
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
330000.0
\n",
"
5.0
\n",
"
320
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
5
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
2.0
\n",
"
175
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
6
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
3.0
\n",
"
170
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
7
\n",
"
Rue de Fer 25, 5000 Namur
\n",
"
0.0
\n",
"
3.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
8
\n",
"
Rue du Luxembourg 15, 1000 Bruxelles
\n",
"
-100.0
\n",
"
NaN
\n",
"
100
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
9
\n",
"
NaN
\n",
"
NaN
\n",
"
0.0
\n",
"
100
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
10
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
4.0
\n",
"
200
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
11
\n",
"
Rue de la Loi 50, Bruxelles
\n",
"
700000.0
\n",
"
3.0
\n",
"
220
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
12
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
280000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
13
\n",
"
Rue de la Loi 52, Bruxelles
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
14
\n",
"
Rue de la Loi 53, Bruxelles
\n",
"
480000.0
\n",
"
5.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
15
\n",
"
Rue de Fer 27, Namur
\n",
"
280000.0
\n",
"
3.0
\n",
"
140
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
16
\n",
"
Rue de Fer 28, Namur
\n",
"
320000.0
\n",
"
4.0
\n",
"
160
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
17
\n",
"
Rue de Fer 29, Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
180
\n",
"
immovlan
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface website\n",
"0 Rue de Fer 26, 5000 Namur 400000.0 4.0 150 immoweb\n",
"1 Rue de Bruxelles 42, 5000 Namur 350000.0 5.0 200 immoweb\n",
"2 Porte de Namur 25, Bruxelles 400000.0 3.0 120 immoweb\n",
"3 Rue de L'Eglise 42, Charleroi 150000.0 5.0 150 immoweb\n",
"4 Rue Saint-ghislain 30, 6224 Fleurus 330000.0 5.0 320 immoweb\n",
"5 Rue de la Closière 20, Fleurus 230000.0 2.0 175 immoweb\n",
"6 Rue de la Closière 20, Fleurus 230000.0 3.0 170 immoweb\n",
"7 Rue de Fer 25, 5000 Namur 0.0 3.0 170 immovlan\n",
"8 Rue du Luxembourg 15, 1000 Bruxelles -100.0 NaN 100 immovlan\n",
"9 NaN NaN 0.0 100 immovlan\n",
"10 Rue de Bruxelles 42, 5000 Namur 350000.0 4.0 200 immovlan\n",
"11 Rue de la Loi 50, Bruxelles 700000.0 3.0 220 immovlan\n",
"12 Rue de la Loi 51, Bruxelles 280000.0 3.0 120 immoweb\n",
"13 Rue de la Loi 52, Bruxelles 400000.0 4.0 150 immovlan\n",
"14 Rue de la Loi 53, Bruxelles 480000.0 5.0 170 immovlan\n",
"15 Rue de Fer 27, Namur 280000.0 3.0 140 immoweb\n",
"16 Rue de Fer 28, Namur 320000.0 4.0 160 immoweb\n",
"17 Rue de Fer 29, Namur 350000.0 5.0 180 immovlan"
]
},
"metadata": {
"tags": []
},
"execution_count": 286
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "xBZMkmqlOva3",
"colab_type": "code",
"outputId": "d0ccc9cc-4d8f-4773-cf79-4f974b6c2d9e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 572
}
},
"source": [
"#Pour les colonnes de texte, retirer les valeurs invalides (NaN ou \"\" ou \" \")\n",
"\n",
"#df = df[df.address != \"NaN\"]\n",
"#Ou pus complet\n",
"df = df[~df.address.isin([\"NaN\",\"\",\" \"])]\n",
"df"
],
"execution_count": 287,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
1
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
200
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
2
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
400000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
3
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
150000.0
\n",
"
5.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
4
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
330000.0
\n",
"
5.0
\n",
"
320
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
5
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
2.0
\n",
"
175
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
6
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
3.0
\n",
"
170
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
7
\n",
"
Rue de Fer 25, 5000 Namur
\n",
"
0.0
\n",
"
3.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
8
\n",
"
Rue du Luxembourg 15, 1000 Bruxelles
\n",
"
-100.0
\n",
"
NaN
\n",
"
100
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
10
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
4.0
\n",
"
200
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
11
\n",
"
Rue de la Loi 50, Bruxelles
\n",
"
700000.0
\n",
"
3.0
\n",
"
220
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
12
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
280000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
13
\n",
"
Rue de la Loi 52, Bruxelles
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
14
\n",
"
Rue de la Loi 53, Bruxelles
\n",
"
480000.0
\n",
"
5.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
15
\n",
"
Rue de Fer 27, Namur
\n",
"
280000.0
\n",
"
3.0
\n",
"
140
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
16
\n",
"
Rue de Fer 28, Namur
\n",
"
320000.0
\n",
"
4.0
\n",
"
160
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
17
\n",
"
Rue de Fer 29, Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
180
\n",
"
immovlan
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface website\n",
"0 Rue de Fer 26, 5000 Namur 400000.0 4.0 150 immoweb\n",
"1 Rue de Bruxelles 42, 5000 Namur 350000.0 5.0 200 immoweb\n",
"2 Porte de Namur 25, Bruxelles 400000.0 3.0 120 immoweb\n",
"3 Rue de L'Eglise 42, Charleroi 150000.0 5.0 150 immoweb\n",
"4 Rue Saint-ghislain 30, 6224 Fleurus 330000.0 5.0 320 immoweb\n",
"5 Rue de la Closière 20, Fleurus 230000.0 2.0 175 immoweb\n",
"6 Rue de la Closière 20, Fleurus 230000.0 3.0 170 immoweb\n",
"7 Rue de Fer 25, 5000 Namur 0.0 3.0 170 immovlan\n",
"8 Rue du Luxembourg 15, 1000 Bruxelles -100.0 NaN 100 immovlan\n",
"10 Rue de Bruxelles 42, 5000 Namur 350000.0 4.0 200 immovlan\n",
"11 Rue de la Loi 50, Bruxelles 700000.0 3.0 220 immovlan\n",
"12 Rue de la Loi 51, Bruxelles 280000.0 3.0 120 immoweb\n",
"13 Rue de la Loi 52, Bruxelles 400000.0 4.0 150 immovlan\n",
"14 Rue de la Loi 53, Bruxelles 480000.0 5.0 170 immovlan\n",
"15 Rue de Fer 27, Namur 280000.0 3.0 140 immoweb\n",
"16 Rue de Fer 28, Namur 320000.0 4.0 160 immoweb\n",
"17 Rue de Fer 29, Namur 350000.0 5.0 180 immovlan"
]
},
"metadata": {
"tags": []
},
"execution_count": 287
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "b9hJbbkVQkpI",
"colab_type": "code",
"outputId": "3557893c-9a30-419a-a16e-ab547305b5a7",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 541
}
},
"source": [
"#Elimine les lignes contenant des NaN\n",
"df = df.dropna()\n",
"df"
],
"execution_count": 288,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
1
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
200
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
2
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
400000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
3
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
150000.0
\n",
"
5.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
4
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
330000.0
\n",
"
5.0
\n",
"
320
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
5
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
2.0
\n",
"
175
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
6
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
3.0
\n",
"
170
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
7
\n",
"
Rue de Fer 25, 5000 Namur
\n",
"
0.0
\n",
"
3.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
10
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
4.0
\n",
"
200
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
11
\n",
"
Rue de la Loi 50, Bruxelles
\n",
"
700000.0
\n",
"
3.0
\n",
"
220
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
12
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
280000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
13
\n",
"
Rue de la Loi 52, Bruxelles
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
14
\n",
"
Rue de la Loi 53, Bruxelles
\n",
"
480000.0
\n",
"
5.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
15
\n",
"
Rue de Fer 27, Namur
\n",
"
280000.0
\n",
"
3.0
\n",
"
140
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
16
\n",
"
Rue de Fer 28, Namur
\n",
"
320000.0
\n",
"
4.0
\n",
"
160
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
17
\n",
"
Rue de Fer 29, Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
180
\n",
"
immovlan
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface website\n",
"0 Rue de Fer 26, 5000 Namur 400000.0 4.0 150 immoweb\n",
"1 Rue de Bruxelles 42, 5000 Namur 350000.0 5.0 200 immoweb\n",
"2 Porte de Namur 25, Bruxelles 400000.0 3.0 120 immoweb\n",
"3 Rue de L'Eglise 42, Charleroi 150000.0 5.0 150 immoweb\n",
"4 Rue Saint-ghislain 30, 6224 Fleurus 330000.0 5.0 320 immoweb\n",
"5 Rue de la Closière 20, Fleurus 230000.0 2.0 175 immoweb\n",
"6 Rue de la Closière 20, Fleurus 230000.0 3.0 170 immoweb\n",
"7 Rue de Fer 25, 5000 Namur 0.0 3.0 170 immovlan\n",
"10 Rue de Bruxelles 42, 5000 Namur 350000.0 4.0 200 immovlan\n",
"11 Rue de la Loi 50, Bruxelles 700000.0 3.0 220 immovlan\n",
"12 Rue de la Loi 51, Bruxelles 280000.0 3.0 120 immoweb\n",
"13 Rue de la Loi 52, Bruxelles 400000.0 4.0 150 immovlan\n",
"14 Rue de la Loi 53, Bruxelles 480000.0 5.0 170 immovlan\n",
"15 Rue de Fer 27, Namur 280000.0 3.0 140 immoweb\n",
"16 Rue de Fer 28, Namur 320000.0 4.0 160 immoweb\n",
"17 Rue de Fer 29, Namur 350000.0 5.0 180 immovlan"
]
},
"metadata": {
"tags": []
},
"execution_count": 288
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "29AAOyEPQ034",
"colab_type": "code",
"outputId": "b7394d86-9de6-4b66-c6f2-4ca2af4f81a7",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 511
}
},
"source": [
"#Retirer les valeurs numériques incohérentes\n",
"df = df[(df.price > 0) & (df.rooms > 0)]\n",
"df"
],
"execution_count": 289,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
1
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
200
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
2
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
400000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
3
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
150000.0
\n",
"
5.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
4
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
330000.0
\n",
"
5.0
\n",
"
320
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
5
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
2.0
\n",
"
175
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
6
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
3.0
\n",
"
170
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
10
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
4.0
\n",
"
200
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
11
\n",
"
Rue de la Loi 50, Bruxelles
\n",
"
700000.0
\n",
"
3.0
\n",
"
220
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
12
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
280000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
13
\n",
"
Rue de la Loi 52, Bruxelles
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
14
\n",
"
Rue de la Loi 53, Bruxelles
\n",
"
480000.0
\n",
"
5.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
15
\n",
"
Rue de Fer 27, Namur
\n",
"
280000.0
\n",
"
3.0
\n",
"
140
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
16
\n",
"
Rue de Fer 28, Namur
\n",
"
320000.0
\n",
"
4.0
\n",
"
160
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
17
\n",
"
Rue de Fer 29, Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
180
\n",
"
immovlan
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface website\n",
"0 Rue de Fer 26, 5000 Namur 400000.0 4.0 150 immoweb\n",
"1 Rue de Bruxelles 42, 5000 Namur 350000.0 5.0 200 immoweb\n",
"2 Porte de Namur 25, Bruxelles 400000.0 3.0 120 immoweb\n",
"3 Rue de L'Eglise 42, Charleroi 150000.0 5.0 150 immoweb\n",
"4 Rue Saint-ghislain 30, 6224 Fleurus 330000.0 5.0 320 immoweb\n",
"5 Rue de la Closière 20, Fleurus 230000.0 2.0 175 immoweb\n",
"6 Rue de la Closière 20, Fleurus 230000.0 3.0 170 immoweb\n",
"10 Rue de Bruxelles 42, 5000 Namur 350000.0 4.0 200 immovlan\n",
"11 Rue de la Loi 50, Bruxelles 700000.0 3.0 220 immovlan\n",
"12 Rue de la Loi 51, Bruxelles 280000.0 3.0 120 immoweb\n",
"13 Rue de la Loi 52, Bruxelles 400000.0 4.0 150 immovlan\n",
"14 Rue de la Loi 53, Bruxelles 480000.0 5.0 170 immovlan\n",
"15 Rue de Fer 27, Namur 280000.0 3.0 140 immoweb\n",
"16 Rue de Fer 28, Namur 320000.0 4.0 160 immoweb\n",
"17 Rue de Fer 29, Namur 350000.0 5.0 180 immovlan"
]
},
"metadata": {
"tags": []
},
"execution_count": 289
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Op6hoKP4kxgg",
"colab_type": "code",
"colab": {}
},
"source": [
"#On stocke la version avec doublons pour plus tard\n",
"dfdup = df"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Fw5esCh7R5dO",
"colab_type": "code",
"outputId": "b325226b-5958-45d6-f315-724f3dfcb43a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 449
}
},
"source": [
"#Retirer les doublons\n",
"\n",
"#df =df.drop_duplicates(\"address\")\n",
"#Par défaut, le premier élément est gardé. On peut aussi garder le dernier (ce qui gardera notamment plus d'éléments d'immovlan)\n",
"df = dfdup.drop_duplicates(\"address\", keep = \"last\")\n",
"df"
],
"execution_count": 291,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
2
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
400000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
3
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
150000.0
\n",
"
5.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
4
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
330000.0
\n",
"
5.0
\n",
"
320
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
6
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
3.0
\n",
"
170
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
10
\n",
"
Rue de Bruxelles 42, 5000 Namur
\n",
"
350000.0
\n",
"
4.0
\n",
"
200
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
11
\n",
"
Rue de la Loi 50, Bruxelles
\n",
"
700000.0
\n",
"
3.0
\n",
"
220
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
12
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
280000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
13
\n",
"
Rue de la Loi 52, Bruxelles
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
14
\n",
"
Rue de la Loi 53, Bruxelles
\n",
"
480000.0
\n",
"
5.0
\n",
"
170
\n",
"
immovlan
\n",
"
\n",
"
\n",
"
15
\n",
"
Rue de Fer 27, Namur
\n",
"
280000.0
\n",
"
3.0
\n",
"
140
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
16
\n",
"
Rue de Fer 28, Namur
\n",
"
320000.0
\n",
"
4.0
\n",
"
160
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
17
\n",
"
Rue de Fer 29, Namur
\n",
"
350000.0
\n",
"
5.0
\n",
"
180
\n",
"
immovlan
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface website\n",
"0 Rue de Fer 26, 5000 Namur 400000.0 4.0 150 immoweb\n",
"2 Porte de Namur 25, Bruxelles 400000.0 3.0 120 immoweb\n",
"3 Rue de L'Eglise 42, Charleroi 150000.0 5.0 150 immoweb\n",
"4 Rue Saint-ghislain 30, 6224 Fleurus 330000.0 5.0 320 immoweb\n",
"6 Rue de la Closière 20, Fleurus 230000.0 3.0 170 immoweb\n",
"10 Rue de Bruxelles 42, 5000 Namur 350000.0 4.0 200 immovlan\n",
"11 Rue de la Loi 50, Bruxelles 700000.0 3.0 220 immovlan\n",
"12 Rue de la Loi 51, Bruxelles 280000.0 3.0 120 immoweb\n",
"13 Rue de la Loi 52, Bruxelles 400000.0 4.0 150 immovlan\n",
"14 Rue de la Loi 53, Bruxelles 480000.0 5.0 170 immovlan\n",
"15 Rue de Fer 27, Namur 280000.0 3.0 140 immoweb\n",
"16 Rue de Fer 28, Namur 320000.0 4.0 160 immoweb\n",
"17 Rue de Fer 29, Namur 350000.0 5.0 180 immovlan"
]
},
"metadata": {
"tags": []
},
"execution_count": 291
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "YbhbjCpPLkyP",
"colab_type": "code",
"outputId": "1375f9a7-f63d-44c0-9c26-01bddb605991",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
}
},
"source": [
"#filtrer les données d'immoweb\n",
"df[df.website == \"immoweb\"]"
],
"execution_count": 292,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
address
\n",
"
price
\n",
"
rooms
\n",
"
surface
\n",
"
website
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Rue de Fer 26, 5000 Namur
\n",
"
400000.0
\n",
"
4.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
2
\n",
"
Porte de Namur 25, Bruxelles
\n",
"
400000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
3
\n",
"
Rue de L'Eglise 42, Charleroi
\n",
"
150000.0
\n",
"
5.0
\n",
"
150
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
4
\n",
"
Rue Saint-ghislain 30, 6224 Fleurus
\n",
"
330000.0
\n",
"
5.0
\n",
"
320
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
6
\n",
"
Rue de la Closière 20, Fleurus
\n",
"
230000.0
\n",
"
3.0
\n",
"
170
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
12
\n",
"
Rue de la Loi 51, Bruxelles
\n",
"
280000.0
\n",
"
3.0
\n",
"
120
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
15
\n",
"
Rue de Fer 27, Namur
\n",
"
280000.0
\n",
"
3.0
\n",
"
140
\n",
"
immoweb
\n",
"
\n",
"
\n",
"
16
\n",
"
Rue de Fer 28, Namur
\n",
"
320000.0
\n",
"
4.0
\n",
"
160
\n",
"
immoweb
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" address price rooms surface website\n",
"0 Rue de Fer 26, 5000 Namur 400000.0 4.0 150 immoweb\n",
"2 Porte de Namur 25, Bruxelles 400000.0 3.0 120 immoweb\n",
"3 Rue de L'Eglise 42, Charleroi 150000.0 5.0 150 immoweb\n",
"4 Rue Saint-ghislain 30, 6224 Fleurus 330000.0 5.0 320 immoweb\n",
"6 Rue de la Closière 20, Fleurus 230000.0 3.0 170 immoweb\n",
"12 Rue de la Loi 51, Bruxelles 280000.0 3.0 120 immoweb\n",
"15 Rue de Fer 27, Namur 280000.0 3.0 140 immoweb\n",
"16 Rue de Fer 28, Namur 320000.0 4.0 160 immoweb"
]
},
"metadata": {
"tags": []
},
"execution_count": 292
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4mRu-QHMcJZ9",
"colab_type": "text"
},
"source": [
"## Aggrégats"
]
},
{
"cell_type": "code",
"metadata": {
"id": "UrEN2kqJcKq1",
"colab_type": "code",
"outputId": "72dadfd4-d7e6-4189-8abb-8ad5bde331f5",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 87
}
},
"source": [
"df.mean()"
],
"execution_count": 293,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"price 359230.769231\n",
"rooms 3.923077\n",
"surface 173.076923\n",
"dtype: float64"
]
},
"metadata": {
"tags": []
},
"execution_count": 293
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "lXghl-S6i5yQ",
"colab_type": "code",
"outputId": "27ffe1fe-ce47-4d99-d59a-d7aad1456abb",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 87
}
},
"source": [
"df[df.website == \"immoweb\"].mean()\n",
"#On remarque que les maisons d'immoweb sont en moyenne moins chère que la moyenne totale"
],
"execution_count": 294,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"price 298750.00\n",
"rooms 3.75\n",
"surface 166.25\n",
"dtype: float64"
]
},
"metadata": {
"tags": []
},
"execution_count": 294
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "343kdifbcMlm",
"colab_type": "code",
"outputId": "3d81ff5d-7f4a-4991-ddd3-9f2e6cf69376",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 123
}
},
"source": [
"df.count()"
],
"execution_count": 295,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"address 13\n",
"price 13\n",
"rooms 13\n",
"surface 13\n",
"website 13\n",
"dtype: int64"
]
},
"metadata": {
"tags": []
},
"execution_count": 295
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "my_jdhJGb85j",
"colab_type": "text"
},
"source": [
"## Comparer deux groupes (aggrégats groupés)\n",
"Avec Pandas, il est très facile de comparer différents groupes d'échantillons, grâce à la méthode `pandas.DataFrame.groupby`, qui permet de faire des aggrégats groupés.\n",
"\n",
"Remarque: la méthode `pandas.DataFrame.groupby` renvoie en fait un objet d'un nouveau type (`GroupBy`), qui permet de gérer des groupements de données, et dispose de ses propres méthodes dont la liste complète peut se trouver ici: https://pandas.pydata.org/pandas-docs/stable/reference/groupby.html"
]
},
{
"cell_type": "code",
"metadata": {
"id": "lhGeXuXtMnye",
"colab_type": "code",
"outputId": "88b891e7-a494-4804-ef24-c80d65ace36d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 265
}
},
"source": [
"#Comparer les deux plateformes: c'est direct avec la méthode \"groupby\"\n",
"display(df.groupby(\"website\").count())\n",
"display(df.groupby(\"website\").mean())"
],
"execution_count": 296,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"