{ "cells": [ { "cell_type": "markdown", "id": "c6bfcf58-b95d-45b4-a395-6e851c347f7f", "metadata": { "tags": [] }, "source": [ "# About" ] }, { "cell_type": "markdown", "id": "5d083f9e-dafd-4ca6-89a9-c76e9513c587", "metadata": {}, "source": [ "This notebook explores the conversion dataset.\n", "\n", "The `conversions.tsv` dataset has one row per search conversion. \n", "\n", "The dataset tells you which photo was downloaded for a search, the country of origin, and an anonymous identifier that distinguishes unique users. \n", "\n", "[Source](https://github.com/unsplash/datasets/blob/master/DOCS.md)\n", "\n", "\n", "We will use this dataset to understand the types of queries that users on the platform search for." ] }, { "cell_type": "markdown", "id": "6020f52f-145b-45ce-8b21-c05f060ad301", "metadata": { "tags": [] }, "source": [ "# Exploring the data" ] }, { "cell_type": "code", "execution_count": 1, "id": "34f2eede-56da-4319-8df5-5cf45336484f", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.133017Z", "iopub.status.busy": "2023-04-25T16:30:53.132603Z", "iopub.status.idle": "2023-04-25T16:30:53.215007Z", "shell.execute_reply": "2023-04-25T16:30:53.213149Z", "shell.execute_reply.started": "2023-04-25T16:30:53.132981Z" } }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "91428786-d229-4748-a91e-2829d834a674", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.216683Z", "iopub.status.busy": "2023-04-25T16:30:53.216378Z", "iopub.status.idle": "2023-04-25T16:30:53.308980Z", "shell.execute_reply": "2023-04-25T16:30:53.308050Z", "shell.execute_reply.started": "2023-04-25T16:30:53.216661Z" } }, "outputs": [], "source": [ "pd.set_option('display.max_rows', 100)\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "9d2b5364-acf5-47c4-a765-1a691435e139", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.310150Z", "iopub.status.busy": 
"2023-04-25T16:30:53.309865Z", "iopub.status.idle": "2023-04-25T16:30:53.319703Z", "shell.execute_reply": "2023-04-25T16:30:53.318898Z", "shell.execute_reply.started": "2023-04-25T16:30:53.310123Z" } }, "outputs": [], "source": [ "path = \"../data/raw/conversions.tsv000\"\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "dc7046a8-6a05-41ce-811e-c5df1223a226", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.321113Z", "iopub.status.busy": "2023-04-25T16:30:53.320603Z", "iopub.status.idle": "2023-04-25T16:31:18.524555Z", "shell.execute_reply": "2023-04-25T16:31:18.523773Z", "shell.execute_reply.started": "2023-04-25T16:30:53.321087Z" } }, "outputs": [], "source": [ "df = pd.read_csv(path, sep=\"\\t\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "a156cb80-e608-40d2-81ba-cdd0e874b819", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:18.527688Z", "iopub.status.busy": "2023-04-25T16:31:18.527041Z", "iopub.status.idle": "2023-04-25T16:31:18.533732Z", "shell.execute_reply": "2023-04-25T16:31:18.532900Z", "shell.execute_reply.started": "2023-04-25T16:31:18.527652Z" } }, "outputs": [ { "data": { "text/plain": [ "12166088" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df)" ] }, { "cell_type": "markdown", "id": "e9aa396d-c335-47cd-b01d-ffb3f7b743ab", "metadata": {}, "source": [ "Sample view of the data" ] }, { "cell_type": "code", "execution_count": 6, "id": "745cd38a-a41e-44c4-a836-8601ebfd043f", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:18.535011Z", "iopub.status.busy": "2023-04-25T16:31:18.534779Z", "iopub.status.idle": "2023-04-25T16:31:18.627594Z", "shell.execute_reply": "2023-04-25T16:31:18.626599Z", "shell.execute_reply.started": "2023-04-25T16:31:18.534991Z" } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " 
.dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>converted_at</th>\n", " <th>conversion_type</th>\n", " <th>keyword</th>\n", " <th>photo_id</th>\n", " <th>anonymous_user_id</th>\n", " <th>conversion_country</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>2020-07-29 00:08:04.221</td>\n", " <td>download</td>\n", " <td>clouds</td>\n", " <td>ABmygVJcYgY</td>\n", " <td>dd01ebdd-7691-4518-ab19-b2105782ae8b</td>\n", " <td>VE</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2020-07-29 00:25:23.426</td>\n", " <td>download</td>\n", " <td>shark</td>\n", " <td>fB2jl6Rb3l4</td>\n", " <td>c48ba6e0-c6a7-4a92-b569-fe57808a8a2c</td>\n", " <td>QA</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2020-07-29 00:26:13.122</td>\n", " <td>download</td>\n", " <td>dogs</td>\n", " <td>k1hbfag2na0</td>\n", " <td>62c4f043-579c-438f-8815-eb8ba3c54d34</td>\n", " <td>KR</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>2020-07-29 00:37:03.308</td>\n", " <td>download</td>\n", " <td>astronaut</td>\n", " <td>-SyUjRlHauQ</td>\n", " <td>7ad6dc18-a02e-4ba2-b93c-fd7ea2e551d8</td>\n", " <td>JP</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>2020-07-29 00:54:28.942</td>\n", " <td>download</td>\n", " <td>red roses</td>\n", " <td>A0iTJUhK4es</td>\n", " <td>f03a5708-32e4-4fae-8210-3c5d2632cbfb</td>\n", " <td>NZ</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " converted_at conversion_type keyword photo_id \\\n", "0 2020-07-29 00:08:04.221 download clouds ABmygVJcYgY \n", "1 2020-07-29 00:25:23.426 download shark fB2jl6Rb3l4 \n", "2 2020-07-29 00:26:13.122 download dogs k1hbfag2na0 \n", "3 2020-07-29 00:37:03.308 download astronaut -SyUjRlHauQ \n", "4 2020-07-29 00:54:28.942 download red roses A0iTJUhK4es 
\n", "\n", " anonymous_user_id conversion_country \n", "0 dd01ebdd-7691-4518-ab19-b2105782ae8b VE \n", "1 c48ba6e0-c6a7-4a92-b569-fe57808a8a2c QA \n", "2 62c4f043-579c-438f-8815-eb8ba3c54d34 KR \n", "3 7ad6dc18-a02e-4ba2-b93c-fd7ea2e551d8 JP \n", "4 f03a5708-32e4-4fae-8210-3c5d2632cbfb NZ " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "3cd9509a-2cc9-452a-ad92-4c4ea5fd7df0", "metadata": {}, "source": [ "Get top queries" ] }, { "cell_type": "code", "execution_count": 7, "id": "6738ac94-950f-45ef-89a9-f558e83ea151", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:18.629372Z", "iopub.status.busy": "2023-04-25T16:31:18.628998Z", "iopub.status.idle": "2023-04-25T16:31:20.872575Z", "shell.execute_reply": "2023-04-25T16:31:20.871772Z", "shell.execute_reply.started": "2023-04-25T16:31:18.629345Z" } }, "outputs": [], "source": [ "df_res = df.groupby([\"keyword\"], as_index=False)\\\n", " .size()\\\n", " .sort_values(\"size\", ascending=False)\\\n", " .rename(columns={'size':'num_searches'})" ] }, { "cell_type": "code", "execution_count": 8, "id": "6500ce65-b8f2-4abf-b796-e80a554c92b4", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:20.874034Z", "iopub.status.busy": "2023-04-25T16:31:20.873707Z", "iopub.status.idle": "2023-04-25T16:31:20.879505Z", "shell.execute_reply": "2023-04-25T16:31:20.878517Z", "shell.execute_reply.started": "2023-04-25T16:31:20.873970Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of unique queries: 569996 \n" ] } ], "source": [ "print (f\"Number of unique queries: {len(df_res)} \")" ] }, { "cell_type": "code", "execution_count": 9, "id": "98918b39-a639-42ea-a25c-e17bc0c51c8d", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:20.881180Z", "iopub.status.busy": "2023-04-25T16:31:20.880529Z", "iopub.status.idle": "2023-04-25T16:31:20.894394Z", 
"shell.execute_reply": "2023-04-25T16:31:20.893375Z", "shell.execute_reply.started": "2023-04-25T16:31:20.881155Z" } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>keyword</th>\n", " <th>num_searches</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>334943</th>\n", " <td>nature</td>\n", " <td>381173</td>\n", " </tr>\n", " <tr>\n", " <th>445718</th>\n", " <td>sky</td>\n", " <td>239848</td>\n", " </tr>\n", " <tr>\n", " <th>193034</th>\n", " <td>flowers</td>\n", " <td>202391</td>\n", " </tr>\n", " <tr>\n", " <th>333735</th>\n", " <td>natural</td>\n", " <td>196189</td>\n", " </tr>\n", " <tr>\n", " <th>189492</th>\n", " <td>flower</td>\n", " <td>175126</td>\n", " </tr>\n", " <tr>\n", " <th>431887</th>\n", " <td>sea</td>\n", " <td>165744</td>\n", " </tr>\n", " <tr>\n", " <th>325200</th>\n", " <td>mountain</td>\n", " <td>161816</td>\n", " </tr>\n", " <tr>\n", " <th>198609</th>\n", " <td>forest</td>\n", " <td>153677</td>\n", " </tr>\n", " <tr>\n", " <th>350461</th>\n", " <td>ocean</td>\n", " <td>145435</td>\n", " </tr>\n", " <tr>\n", " <th>45100</th>\n", " <td>beach</td>\n", " <td>136862</td>\n", " </tr>\n", " <tr>\n", " <th>460237</th>\n", " <td>space</td>\n", " <td>120184</td>\n", " </tr>\n", " <tr>\n", " <th>146484</th>\n", " <td>dog</td>\n", " <td>112637</td>\n", " </tr>\n", " <tr>\n", " <th>328262</th>\n", " <td>mountains</td>\n", " <td>111005</td>\n", " </tr>\n", " <tr>\n", " <th>533443</th>\n", " <td>water</td>\n", " <td>109987</td>\n", " </tr>\n", " <tr>\n", " <th>320914</th>\n", " <td>moon</td>\n", " <td>106111</td>\n", " </tr>\n", " <tr>\n", " <th>550361</th>\n", " 
<td>winter</td>\n", " <td>89541</td>\n", " </tr>\n", " <tr>\n", " <th>90851</th>\n", " <td>cat</td>\n", " <td>87984</td>\n", " </tr>\n", " <tr>\n", " <th>528686</th>\n", " <td>wallpaper</td>\n", " <td>87079</td>\n", " </tr>\n", " <tr>\n", " <th>19880</th>\n", " <td>animal</td>\n", " <td>79378</td>\n", " </tr>\n", " <tr>\n", " <th>507852</th>\n", " <td>tree</td>\n", " <td>78697</td>\n", " </tr>\n", " <tr>\n", " <th>345303</th>\n", " <td>night sky</td>\n", " <td>77892</td>\n", " </tr>\n", " <tr>\n", " <th>482869</th>\n", " <td>sunset</td>\n", " <td>75404</td>\n", " </tr>\n", " <tr>\n", " <th>343674</th>\n", " <td>night</td>\n", " <td>74551</td>\n", " </tr>\n", " <tr>\n", " <th>278629</th>\n", " <td>landscape</td>\n", " <td>72824</td>\n", " </tr>\n", " <tr>\n", " <th>481555</th>\n", " <td>sunrise</td>\n", " <td>72290</td>\n", " </tr>\n", " <tr>\n", " <th>161638</th>\n", " <td>earth</td>\n", " <td>70303</td>\n", " </tr>\n", " <tr>\n", " <th>21578</th>\n", " <td>animals</td>\n", " <td>68001</td>\n", " </tr>\n", " <tr>\n", " <th>518127</th>\n", " <td>universe</td>\n", " <td>66711</td>\n", " </tr>\n", " <tr>\n", " <th>475829</th>\n", " <td>summer</td>\n", " <td>66019</td>\n", " </tr>\n", " <tr>\n", " <th>387151</th>\n", " <td>plant</td>\n", " <td>64406</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " keyword num_searches\n", "334943 nature 381173\n", "445718 sky 239848\n", "193034 flowers 202391\n", "333735 natural 196189\n", "189492 flower 175126\n", "431887 sea 165744\n", "325200 mountain 161816\n", "198609 forest 153677\n", "350461 ocean 145435\n", "45100 beach 136862\n", "460237 space 120184\n", "146484 dog 112637\n", "328262 mountains 111005\n", "533443 water 109987\n", "320914 moon 106111\n", "550361 winter 89541\n", "90851 cat 87984\n", "528686 wallpaper 87079\n", "19880 animal 79378\n", "507852 tree 78697\n", "345303 night sky 77892\n", "482869 sunset 75404\n", "343674 night 74551\n", "278629 landscape 72824\n", "481555 sunrise 
72290\n", "161638 earth 70303\n", "21578 animals 68001\n", "518127 universe 66711\n", "475829 summer 66019\n", "387151 plant 64406" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_res.head(30)" ] }, { "cell_type": "markdown", "id": "c8349ec7-0112-4b7a-b1c5-a52e4b33a221", "metadata": { "tags": [] }, "source": [ "## What can we say about the typical queries?" ] }, { "cell_type": "markdown", "id": "0548f316-6452-4eab-9740-838176ea683b", "metadata": {}, "source": [ "- Most of the top queries are short, with fewer than 3 words.\n", "- Users on the platform are mostly interested in nature.\n", "- No normalization is applied to the queries: for example, animal vs. animals and mountain vs. mountains are counted separately." ] }, { "cell_type": "code", "execution_count": null, "id": "bf35d26c-9a62-4e67-b810-1d552dc9e822", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0ab13652-0433-40c0-9044-e21fc7ac22c6", "metadata": {}, "source": [ "Queries like the ones above, with \"broad\" intent, are not that useful for comparing results." ] }, { "cell_type": "code", "execution_count": null, "id": "ba1ef62a-fdab-4a33-8f89-e20e0c6d2fbc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6e8e5731-c4f5-45c8-9a15-76c8758e1845", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "fc047f35-f156-4a2c-a0e3-d629ddaca3ff", "metadata": {}, "source": [ "## Exploring Longer Queries" ] }, { "cell_type": "code", "execution_count": 10, "id": "a5f9bcd7-6c07-4b87-8910-81df7affaea4", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:20.895617Z", "iopub.status.busy": "2023-04-25T16:31:20.895394Z", "iopub.status.idle": "2023-04-25T16:31:21.322740Z", "shell.execute_reply": "2023-04-25T16:31:21.321881Z", "shell.execute_reply.started": "2023-04-25T16:31:20.895597Z" } }, "outputs": [], "source": [ "df_res[\"num_keywords\"] = df_res[\"keyword\"].apply(lambda x: len(x.split(\" \")))" ] }, { 
"cell_type": "code", "execution_count": 11, "id": "049b1fe8-cc95-4e1a-956f-81564bd75c16", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:21.324017Z", "iopub.status.busy": "2023-04-25T16:31:21.323750Z", "iopub.status.idle": "2023-04-25T16:31:21.369322Z", "shell.execute_reply": "2023-04-25T16:31:21.368403Z", "shell.execute_reply.started": "2023-04-25T16:31:21.323993Z" } }, "outputs": [], "source": [ "df_long_queries = df_res[(df_res[\"num_keywords\"] > 1) ]" ] }, { "cell_type": "code", "execution_count": 12, "id": "c06f5c52-1240-432b-aa0a-07b6fb6427ce", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:21.370917Z", "iopub.status.busy": "2023-04-25T16:31:21.370534Z", "iopub.status.idle": "2023-04-25T16:31:21.385868Z", "shell.execute_reply": "2023-04-25T16:31:21.385104Z", "shell.execute_reply.started": "2023-04-25T16:31:21.370890Z" } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>keyword</th>\n", " <th>num_searches</th>\n", " <th>num_keywords</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>327457</th>\n", " <td>mountain star landscape night sky</td>\n", " <td>779</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>287590</th>\n", " <td>light at the end of the tunnel</td>\n", " <td>308</td>\n", " <td>7</td>\n", " </tr>\n", " <tr>\n", " <th>499894</th>\n", " <td>there is no planet b</td>\n", " <td>242</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>276561</th>\n", " <td>lago di braies, braies, italy</td>\n", " <td>118</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>534678</th>\n", " <td>water droplets on a 
leaf</td>\n", " <td>106</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>224699</th>\n", " <td>great sand dunes national park</td>\n", " <td>94</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>258115</th>\n", " <td>image of a man in a desert</td>\n", " <td>82</td>\n", " <td>7</td>\n", " </tr>\n", " <tr>\n", " <th>274846</th>\n", " <td>konkan beach resort, ratnagiri, india</td>\n", " <td>73</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>335652</th>\n", " <td>nature backgrounds water ripple reflection</td>\n", " <td>67</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>459722</th>\n", " <td>south georgia and the south sandwich islands</td>\n", " <td>54</td>\n", " <td>7</td>\n", " </tr>\n", " <tr>\n", " <th>426421</th>\n", " <td>samsung note 10 lite wallpaper</td>\n", " <td>52</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>37977</th>\n", " <td>background image for google doc</td>\n", " <td>51</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>141140</th>\n", " <td>desert sunset nature landscape sky</td>\n", " <td>49</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>61505</th>\n", " <td>black grapes with wood plates</td>\n", " <td>44</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>454677</th>\n", " <td>snow mountain clear blue sky</td>\n", " <td>44</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>364313</th>\n", " <td>palm trees at the beach</td>\n", " <td>43</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>349030</th>\n", " <td>nova scotia duck tolling retriever</td>\n", " <td>43</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>499403</th>\n", " <td>the surface of the moon</td>\n", " <td>37</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>499623</th>\n", " <td>the waves of the sea</td>\n", " <td>37</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>263540</th>\n", " <td>iphone 11 pro max wallpaper</td>\n", " <td>36</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>27378</th>\n", " <td>art 
of table potted flower</td>\n", " <td>35</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>337991</th>\n", " <td>nature photos light colours</td>\n", " <td>35</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>504377</th>\n", " <td>torres del paine national park</td>\n", " <td>32</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>134706</th>\n", " <td>dark side of the moon</td>\n", " <td>31</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>67340</th>\n", " <td>blue sky and white clouds</td>\n", " <td>31</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>375214</th>\n", " <td>person on top of mountain</td>\n", " <td>31</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>277772</th>\n", " <td>lake with lotus and lilies photos</td>\n", " <td>29</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>50421</th>\n", " <td>beauitful wallpaper nature 8k</td>\n", " <td>29</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>287582</th>\n", " <td>light at end of tunnel</td>\n", " <td>28</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>26499</th>\n", " <td>ariel view of the ocean</td>\n", " <td>28</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>177964</th>\n", " <td>farmhouse rustic yellow and pink</td>\n", " <td>28</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>161094</th>\n", " <td>eagle flying in the sky</td>\n", " <td>27</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>569329</th>\n", " <td>沙漠青蛙 沙漠青蛙 desert frog</td>\n", " <td>26</td>\n", " <td>14</td>\n", " </tr>\n", " <tr>\n", " <th>415209</th>\n", " <td>ripley's aquarium of canada, toronto, canada</td>\n", " <td>26</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>425941</th>\n", " <td>salar de uyuni uyuni bolivia</td>\n", " <td>25</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>143650</th>\n", " <td>dew drops on a grass</td>\n", " <td>25</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>123267</th>\n", " <td>couple romdik love photo in 
tamil</td>\n", " <td>25</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>304156</th>\n", " <td>man on top of mountain</td>\n", " <td>25</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>193247</th>\n", " <td>flowers and plants and trees</td>\n", " <td>24</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>497287</th>\n", " <td>the butterfly atrium at hershey gardens</td>\n", " <td>24</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>277773</th>\n", " <td>lake with lotus and lily</td>\n", " <td>24</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>479245</th>\n", " <td>sun rise on a mountain</td>\n", " <td>24</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>203350</th>\n", " <td>free hd luminious backgrounds for photos</td>\n", " <td>24</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>298593</th>\n", " <td>lower antelope canyon, page, united states</td>\n", " <td>24</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>393216</th>\n", " <td>por do sol no mar</td>\n", " <td>23</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>142678</th>\n", " <td>desktop wallpapers 1920 x 1080</td>\n", " <td>22</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>534769</th>\n", " <td>water drops on the rose</td>\n", " <td>22</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>437160</th>\n", " <td>seven wonders of the world</td>\n", " <td>20</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>66424</th>\n", " <td>blue lake and green shore</td>\n", " <td>19</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>295937</th>\n", " <td>looking up to the sky</td>\n", " <td>19</td>\n", " <td>5</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " keyword num_searches \\\n", "327457 mountain star landscape night sky 779 \n", "287590 light at the end of the tunnel 308 \n", "499894 there is no planet b 242 \n", "276561 lago di braies, braies, italy 118 \n", "534678 water droplets on a leaf 106 \n", "224699 great sand 
dunes national park 94 \n", "258115 image of a man in a desert 82 \n", "274846 konkan beach resort, ratnagiri, india 73 \n", "335652 nature backgrounds water ripple reflection 67 \n", "459722 south georgia and the south sandwich islands 54 \n", "426421 samsung note 10 lite wallpaper 52 \n", "37977 background image for google doc 51 \n", "141140 desert sunset nature landscape sky 49 \n", "61505 black grapes with wood plates 44 \n", "454677 snow mountain clear blue sky 44 \n", "364313 palm trees at the beach 43 \n", "349030 nova scotia duck tolling retriever 43 \n", "499403 the surface of the moon 37 \n", "499623 the waves of the sea 37 \n", "263540 iphone 11 pro max wallpaper 36 \n", "27378 art of table potted flower 35 \n", "337991 nature photos light colours 35 \n", "504377 torres del paine national park 32 \n", "134706 dark side of the moon 31 \n", "67340 blue sky and white clouds 31 \n", "375214 person on top of mountain 31 \n", "277772 lake with lotus and lilies photos 29 \n", "50421 beauitful wallpaper nature 8k 29 \n", "287582 light at end of tunnel 28 \n", "26499 ariel view of the ocean 28 \n", "177964 farmhouse rustic yellow and pink 28 \n", "161094 eagle flying in the sky 27 \n", "569329 沙漠青蛙 沙漠青蛙 desert frog 26 \n", "415209 ripley's aquarium of canada, toronto, canada 26 \n", "425941 salar de uyuni uyuni bolivia 25 \n", "143650 dew drops on a grass 25 \n", "123267 couple romdik love photo in tamil 25 \n", "304156 man on top of mountain 25 \n", "193247 flowers and plants and trees 24 \n", "497287 the butterfly atrium at hershey gardens 24 \n", "277773 lake with lotus and lily 24 \n", "479245 sun rise on a mountain 24 \n", "203350 free hd luminious backgrounds for photos 24 \n", "298593 lower antelope canyon, page, united states 24 \n", "393216 por do sol no mar 23 \n", "142678 desktop wallpapers 1920 x 1080 22 \n", "534769 water drops on the rose 22 \n", "437160 seven wonders of the world 20 \n", "66424 blue lake and green shore 19 \n", "295937 looking up 
to the sky 19 \n", "\n", " num_keywords \n", "327457 5 \n", "287590 7 \n", "499894 5 \n", "276561 5 \n", "534678 5 \n", "224699 5 \n", "258115 7 \n", "274846 5 \n", "335652 5 \n", "459722 7 \n", "426421 5 \n", "37977 5 \n", "141140 5 \n", "61505 5 \n", "454677 5 \n", "364313 5 \n", "349030 5 \n", "499403 5 \n", "499623 5 \n", "263540 5 \n", "27378 5 \n", "337991 5 \n", "504377 5 \n", "134706 5 \n", "67340 5 \n", "375214 5 \n", "277772 6 \n", "50421 5 \n", "287582 5 \n", "26499 5 \n", "177964 5 \n", "161094 5 \n", "569329 14 \n", "415209 6 \n", "425941 5 \n", "143650 5 \n", "123267 6 \n", "304156 5 \n", "193247 5 \n", "497287 6 \n", "277773 5 \n", "479245 5 \n", "203350 6 \n", "298593 6 \n", "393216 5 \n", "142678 5 \n", "534769 5 \n", "437160 5 \n", "66424 5 \n", "295937 5 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_long_queries[df_long_queries.num_keywords > 4].head(50)" ] }, { "cell_type": "markdown", "id": "7c4d9566-4f47-4e50-8dde-5790c6466c4d", "metadata": {}, "source": [ "## Interesting Queries" ] }, { "cell_type": "markdown", "id": "49f0b698-ffde-4da4-9975-2ea8ea63c6b8", "metadata": {}, "source": [ "Detailed Intent:\n", "- water droplets on a leaf\n", "- image of a man in a desert\n", "- person on top of mountain\n", "\n", "\n", "\n", "Location:\n", "- ripley's aquarium of canada, toronto, canada\n", "- the butterfly atrium at hershey gardens\n", "\n", "Non-English Queries:\n", "- salar de uyuni uyuni bolivia\n", "- 沙漠青蛙 沙漠青蛙 desert frog\n", "- por do sol no mar\n", "- conhece te a ti mesmo (Portuguese for \"know thyself\")\n", "\n", "\n", "Metaphors / Slogans:\n", "- light at the end of the tunnel\n", "- there is no planet b\n", "\n", "Multiple Candidates:\n", "- seven wonders of the world\n", "\n", "Long Query / Single Intent:\n", "- nova scotia duck tolling retriever (dog breed)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7f9f81d3-ebfa-4489-920c-d81f0f9e98f3", 
"metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "66c63f2d-219b-4ee8-ac61-3ca43a3b79bf", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1d184a7c-87d2-4003-8082-6e1a38bcdf53", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8b57c441-ae37-47c4-b0b6-bf07ecb16ca4", "metadata": {}, "source": [ "Infrequently searched queries" ] }, { "cell_type": "code", "execution_count": 13, "id": "5e2d990b-b36f-4274-8b55-e8b71db3ef43", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:21.387111Z", "iopub.status.busy": "2023-04-25T16:31:21.386821Z", "iopub.status.idle": "2023-04-25T16:31:21.401449Z", "shell.execute_reply": "2023-04-25T16:31:21.400704Z", "shell.execute_reply.started": "2023-04-25T16:31:21.387073Z" } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>keyword</th>\n", " <th>num_searches</th>\n", " <th>num_keywords</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>313105</th>\n", " <td>mid night star picture for youtube thumbnail</td>\n", " <td>1</td>\n", " <td>7</td>\n", " </tr>\n", " <tr>\n", " <th>119583</th>\n", " <td>cool gamer pics for free</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313060</th>\n", " <td>mid century gothic style rose painting</td>\n", " <td>1</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>313079</th>\n", " <td>mid century modern interior design</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313077</th>\n", " <td>mid century modern home 
interior</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313076</th>\n", " <td>mid century modern home decor</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313160</th>\n", " <td>middle aged women beauty</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313148</th>\n", " <td>middle age is an age of many colors.</td>\n", " <td>1</td>\n", " <td>8</td>\n", " </tr>\n", " <tr>\n", " <th>313185</th>\n", " <td>middle east night in the desert</td>\n", " <td>1</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>313694</th>\n", " <td>milky way at the sea</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313709</th>\n", " <td>milky way by the nasa</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>119302</th>\n", " <td>cool adventurous places one can visit with a b...</td>\n", " <td>1</td>\n", " <td>16</td>\n", " </tr>\n", " <tr>\n", " <th>119308</th>\n", " <td>cool and colorful wallpapers</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>119310</th>\n", " <td>cool and fun pictures of animals</td>\n", " <td>1</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>313799</th>\n", " <td>milky way moon 3000x3000</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>119239</th>\n", " <td>cooking over a flame</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313744</th>\n", " <td>milky way galaxy and man</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313765</th>\n", " <td>milky way galaxy with people</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>119390</th>\n", " <td>cool beach romance for familly</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>119374</th>\n", " <td>cool backgrounds with cool wolves</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>313519</th>\n", " <td>miles pond, vt chamber of 
commerce</td>\n", " <td>1</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>315756</th>\n", " <td>minimalist autumn wallpaper for mac</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118529</th>\n", " <td>constantia, cape town, south africa</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118506</th>\n", " <td>conserve energy hd images</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>315504</th>\n", " <td>minimal windows 10 wallpaper plants</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>315497</th>\n", " <td>minimal white pot with green leaves</td>\n", " <td>1</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>316032</th>\n", " <td>minimalist nature black and white</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>316000</th>\n", " <td>minimalist lotus whitte background flower</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>316020</th>\n", " <td>minimalist motivation work hard beach</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>316101</th>\n", " <td>minimalist qoutes for travel wallpaper</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118232</th>\n", " <td>conhece te a ti mesmo</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>315931</th>\n", " <td>minimalist gentle monochrome simple macro text...</td>\n", " <td>1</td>\n", " <td>7</td>\n", " </tr>\n", " <tr>\n", " <th>315959</th>\n", " <td>minimalist home decor with plants</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>315905</th>\n", " <td>minimalist flower black and white</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>314828</th>\n", " <td>minimal black and white background</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118809</th>\n", " <td>contaminated and counterfeited bottled water</td>\n", " <td>1</td>\n", " 
<td>5</td>\n", " </tr>\n", " <tr>\n", " <th>314994</th>\n", " <td>minimal flower on white background</td>\n", " <td>1</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>314905</th>\n", " <td>minimal colorful art on white background</td>\n", " <td>1</td>\n", " <td>6</td>\n", " </tr>\n", " <tr>\n", " <th>314803</th>\n", " <td>minimal background texture nature plants</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>314801</th>\n", " <td>minimal background stacks of magazines</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>314765</th>\n", " <td>minimal background dark double screen</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118848</th>\n", " <td>contemporary architecture made from wood</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>314790</th>\n", " <td>minimal background nature soft brown</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118850</th>\n", " <td>contemporary art gallery at night</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>314729</th>\n", " <td>minimal art black and white</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>315333</th>\n", " <td>minimal scene with geometric forms.</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118589</th>\n", " <td>constellations in the night sky</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>315010</th>\n", " <td>minimal food flat lay background</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118714</th>\n", " <td>construction worker at the beach</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " <tr>\n", " <th>118716</th>\n", " <td>construction worker in the winter</td>\n", " <td>1</td>\n", " <td>5</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " keyword num_searches \\\n", "313105 mid night star picture for youtube thumbnail 1 \n", "119583 
cool gamer pics for free 1 \n", "313060 mid century gothic style rose painting 1 \n", "313079 mid century modern interior design 1 \n", "313077 mid century modern home interior 1 \n", "313076 mid century modern home decor 1 \n", "313160 middle aged women beauty 1 \n", "313148 middle age is an age of many colors. 1 \n", "313185 middle east night in the desert 1 \n", "313694 milky way at the sea 1 \n", "313709 milky way by the nasa 1 \n", "119302 cool adventurous places one can visit with a b... 1 \n", "119308 cool and colorful wallpapers 1 \n", "119310 cool and fun pictures of animals 1 \n", "313799 milky way moon 3000x3000 1 \n", "119239 cooking over a flame 1 \n", "313744 milky way galaxy and man 1 \n", "313765 milky way galaxy with people 1 \n", "119390 cool beach romance for familly 1 \n", "119374 cool backgrounds with cool wolves 1 \n", "313519 miles pond, vt chamber of commerce 1 \n", "315756 minimalist autumn wallpaper for mac 1 \n", "118529 constantia, cape town, south africa 1 \n", "118506 conserve energy hd images 1 \n", "315504 minimal windows 10 wallpaper plants 1 \n", "315497 minimal white pot with green leaves 1 \n", "316032 minimalist nature black and white 1 \n", "316000 minimalist lotus whitte background flower 1 \n", "316020 minimalist motivation work hard beach 1 \n", "316101 minimalist qoutes for travel wallpaper 1 \n", "118232 conhece te a ti mesmo 1 \n", "315931 minimalist gentle monochrome simple macro text... 
1 \n", "315959 minimalist home decor with plants 1 \n", "315905 minimalist flower black and white 1 \n", "314828 minimal black and white background 1 \n", "118809 contaminated and counterfeited bottled water 1 \n", "314994 minimal flower on white background 1 \n", "314905 minimal colorful art on white background 1 \n", "314803 minimal background texture nature plants 1 \n", "314801 minimal background stacks of magazines 1 \n", "314765 minimal background dark double screen 1 \n", "118848 contemporary architecture made from wood 1 \n", "314790 minimal background nature soft brown 1 \n", "118850 contemporary art gallery at night 1 \n", "314729 minimal art black and white 1 \n", "315333 minimal scene with geometric forms. 1 \n", "118589 constellations in the night sky 1 \n", "315010 minimal food flat lay background 1 \n", "118714 construction worker at the beach 1 \n", "118716 construction worker in the winter 1 \n", "\n", " num_keywords \n", "313105 7 \n", "119583 5 \n", "313060 6 \n", "313079 5 \n", "313077 5 \n", "313076 5 \n", "313160 5 \n", "313148 8 \n", "313185 6 \n", "313694 5 \n", "313709 5 \n", "119302 16 \n", "119308 5 \n", "119310 6 \n", "313799 5 \n", "119239 5 \n", "313744 5 \n", "313765 5 \n", "119390 5 \n", "119374 5 \n", "313519 6 \n", "315756 5 \n", "118529 5 \n", "118506 5 \n", "315504 5 \n", "315497 6 \n", "316032 5 \n", "316000 5 \n", "316020 5 \n", "316101 5 \n", "118232 5 \n", "315931 7 \n", "315959 5 \n", "315905 5 \n", "314828 5 \n", "118809 5 \n", "314994 6 \n", "314905 6 \n", "314803 5 \n", "314801 5 \n", "314765 5 \n", "118848 5 \n", "314790 5 \n", "118850 5 \n", "314729 5 \n", "315333 5 \n", "118589 5 \n", "315010 5 \n", "118714 5 \n", "118716 5 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_long_queries[df_long_queries.num_keywords > 4].tail(50)" ] }, { "cell_type": "markdown", "id": "0fdf13c0-a913-44f4-8786-b223cad1d0a7", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", 
"execution_count": null, "id": "49f898d8-5fd1-4475-a8f3-68c2e3d7f7b9", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "environment": { "kernel": "python3", "name": "pytorch-gpu.1-13.m107", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/pytorch-gpu.1-13:m107" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }