{ "cells": [ { "cell_type": "markdown", "id": "c6bfcf58-b95d-45b4-a395-6e851c347f7f", "metadata": { "tags": [] }, "source": [ "# About" ] }, { "cell_type": "markdown", "id": "5d083f9e-dafd-4ca6-89a9-c76e9513c587", "metadata": {}, "source": [ "This notebook explores the conversions dataset.\n", "\n", "The `conversions.tsv` dataset has one row per search conversion.\n", "\n", "The dataset records which photo was downloaded for a search, the country of origin, and an anonymous identifier that distinguishes unique users.\n", "\n", "[Source](https://github.com/unsplash/datasets/blob/master/DOCS.md)\n", "\n", "\n", "We will use this dataset to understand the types of queries that users on the platform search for." ] }, { "cell_type": "markdown", "id": "6020f52f-145b-45ce-8b21-c05f060ad301", "metadata": { "tags": [] }, "source": [ "# Exploring the data" ] }, { "cell_type": "code", "execution_count": 1, "id": "34f2eede-56da-4319-8df5-5cf45336484f", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.133017Z", "iopub.status.busy": "2023-04-25T16:30:53.132603Z", "iopub.status.idle": "2023-04-25T16:30:53.215007Z", "shell.execute_reply": "2023-04-25T16:30:53.213149Z", "shell.execute_reply.started": "2023-04-25T16:30:53.132981Z" } }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "91428786-d229-4748-a91e-2829d834a674", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.216683Z", "iopub.status.busy": "2023-04-25T16:30:53.216378Z", "iopub.status.idle": "2023-04-25T16:30:53.308980Z", "shell.execute_reply": "2023-04-25T16:30:53.308050Z", "shell.execute_reply.started": "2023-04-25T16:30:53.216661Z" } }, "outputs": [], "source": [ "pd.set_option('display.max_rows', 100)\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "9d2b5364-acf5-47c4-a765-1a691435e139", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.310150Z", "iopub.status.busy": 
"2023-04-25T16:30:53.309865Z", "iopub.status.idle": "2023-04-25T16:30:53.319703Z", "shell.execute_reply": "2023-04-25T16:30:53.318898Z", "shell.execute_reply.started": "2023-04-25T16:30:53.310123Z" } }, "outputs": [], "source": [ "path = \"../data/raw/conversions.tsv000\"\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "dc7046a8-6a05-41ce-811e-c5df1223a226", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:30:53.321113Z", "iopub.status.busy": "2023-04-25T16:30:53.320603Z", "iopub.status.idle": "2023-04-25T16:31:18.524555Z", "shell.execute_reply": "2023-04-25T16:31:18.523773Z", "shell.execute_reply.started": "2023-04-25T16:30:53.321087Z" } }, "outputs": [], "source": [ "# Load the tab-separated conversions file\n", "df = pd.read_csv(path, sep=\"\\t\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "a156cb80-e608-40d2-81ba-cdd0e874b819", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:18.527688Z", "iopub.status.busy": "2023-04-25T16:31:18.527041Z", "iopub.status.idle": "2023-04-25T16:31:18.533732Z", "shell.execute_reply": "2023-04-25T16:31:18.532900Z", "shell.execute_reply.started": "2023-04-25T16:31:18.527652Z" } }, "outputs": [ { "data": { "text/plain": [ "12166088" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df)" ] }, { "cell_type": "markdown", "id": "e9aa396d-c335-47cd-b01d-ffb3f7b743ab", "metadata": {}, "source": [ "Sample view of the data" ] }, { "cell_type": "code", "execution_count": 6, "id": "745cd38a-a41e-44c4-a836-8601ebfd043f", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:18.535011Z", "iopub.status.busy": "2023-04-25T16:31:18.534779Z", "iopub.status.idle": "2023-04-25T16:31:18.627594Z", "shell.execute_reply": "2023-04-25T16:31:18.626599Z", "shell.execute_reply.started": "2023-04-25T16:31:18.534991Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
converted_atconversion_typekeywordphoto_idanonymous_user_idconversion_country
02020-07-29 00:08:04.221downloadcloudsABmygVJcYgYdd01ebdd-7691-4518-ab19-b2105782ae8bVE
12020-07-29 00:25:23.426downloadsharkfB2jl6Rb3l4c48ba6e0-c6a7-4a92-b569-fe57808a8a2cQA
22020-07-29 00:26:13.122downloaddogsk1hbfag2na062c4f043-579c-438f-8815-eb8ba3c54d34KR
32020-07-29 00:37:03.308downloadastronaut-SyUjRlHauQ7ad6dc18-a02e-4ba2-b93c-fd7ea2e551d8JP
42020-07-29 00:54:28.942downloadred rosesA0iTJUhK4esf03a5708-32e4-4fae-8210-3c5d2632cbfbNZ
\n", "
" ], "text/plain": [ " converted_at conversion_type keyword photo_id \\\n", "0 2020-07-29 00:08:04.221 download clouds ABmygVJcYgY \n", "1 2020-07-29 00:25:23.426 download shark fB2jl6Rb3l4 \n", "2 2020-07-29 00:26:13.122 download dogs k1hbfag2na0 \n", "3 2020-07-29 00:37:03.308 download astronaut -SyUjRlHauQ \n", "4 2020-07-29 00:54:28.942 download red roses A0iTJUhK4es \n", "\n", " anonymous_user_id conversion_country \n", "0 dd01ebdd-7691-4518-ab19-b2105782ae8b VE \n", "1 c48ba6e0-c6a7-4a92-b569-fe57808a8a2c QA \n", "2 62c4f043-579c-438f-8815-eb8ba3c54d34 KR \n", "3 7ad6dc18-a02e-4ba2-b93c-fd7ea2e551d8 JP \n", "4 f03a5708-32e4-4fae-8210-3c5d2632cbfb NZ " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "3cd9509a-2cc9-452a-ad92-4c4ea5fd7df0", "metadata": {}, "source": [ "Get top queries" ] }, { "cell_type": "code", "execution_count": 7, "id": "6738ac94-950f-45ef-89a9-f558e83ea151", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:18.629372Z", "iopub.status.busy": "2023-04-25T16:31:18.628998Z", "iopub.status.idle": "2023-04-25T16:31:20.872575Z", "shell.execute_reply": "2023-04-25T16:31:20.871772Z", "shell.execute_reply.started": "2023-04-25T16:31:18.629345Z" } }, "outputs": [], "source": [ "df_res = df.groupby([\"keyword\"], as_index=False)\\\n", " .size()\\\n", " .sort_values(\"size\", ascending=False)\\\n", " .rename(columns={'size':'num_searches'})" ] }, { "cell_type": "code", "execution_count": 8, "id": "6500ce65-b8f2-4abf-b796-e80a554c92b4", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:20.874034Z", "iopub.status.busy": "2023-04-25T16:31:20.873707Z", "iopub.status.idle": "2023-04-25T16:31:20.879505Z", "shell.execute_reply": "2023-04-25T16:31:20.878517Z", "shell.execute_reply.started": "2023-04-25T16:31:20.873970Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of unique queries: 569996 \n" 
] } ], "source": [ "print(f\"Number of unique queries: {len(df_res)} \")" ] }, { "cell_type": "code", "execution_count": 9, "id": "98918b39-a639-42ea-a25c-e17bc0c51c8d", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:20.881180Z", "iopub.status.busy": "2023-04-25T16:31:20.880529Z", "iopub.status.idle": "2023-04-25T16:31:20.894394Z", "shell.execute_reply": "2023-04-25T16:31:20.893375Z", "shell.execute_reply.started": "2023-04-25T16:31:20.881155Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keywordnum_searches
334943nature381173
445718sky239848
193034flowers202391
333735natural196189
189492flower175126
431887sea165744
325200mountain161816
198609forest153677
350461ocean145435
45100beach136862
460237space120184
146484dog112637
328262mountains111005
533443water109987
320914moon106111
550361winter89541
90851cat87984
528686wallpaper87079
19880animal79378
507852tree78697
345303night sky77892
482869sunset75404
343674night74551
278629landscape72824
481555sunrise72290
161638earth70303
21578animals68001
518127universe66711
475829summer66019
387151plant64406
\n", "
" ], "text/plain": [ " keyword num_searches\n", "334943 nature 381173\n", "445718 sky 239848\n", "193034 flowers 202391\n", "333735 natural 196189\n", "189492 flower 175126\n", "431887 sea 165744\n", "325200 mountain 161816\n", "198609 forest 153677\n", "350461 ocean 145435\n", "45100 beach 136862\n", "460237 space 120184\n", "146484 dog 112637\n", "328262 mountains 111005\n", "533443 water 109987\n", "320914 moon 106111\n", "550361 winter 89541\n", "90851 cat 87984\n", "528686 wallpaper 87079\n", "19880 animal 79378\n", "507852 tree 78697\n", "345303 night sky 77892\n", "482869 sunset 75404\n", "343674 night 74551\n", "278629 landscape 72824\n", "481555 sunrise 72290\n", "161638 earth 70303\n", "21578 animals 68001\n", "518127 universe 66711\n", "475829 summer 66019\n", "387151 plant 64406" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_res.head(30)" ] }, { "cell_type": "markdown", "id": "c8349ec7-0112-4b7a-b1c5-a52e4b33a221", "metadata": { "tags": [] }, "source": [ "## What can we say about the typical queries ?" 
] }, { "cell_type": "markdown", "id": "0548f316-6452-4eab-9740-838176ea683b", "metadata": {}, "source": [ "- Most of the top queries are fewer than 3 keywords long.\n", "- Users on the platform are mostly interested in nature.\n", "- No normalization is applied to the queries: `animal` vs. `animals` and `mountain` vs. `mountains` are counted separately." ] }, { "cell_type": "code", "execution_count": null, "id": "bf35d26c-9a62-4e67-b810-1d552dc9e822", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0ab13652-0433-40c0-9044-e21fc7ac22c6", "metadata": {}, "source": [ "Broad-intent queries like those above are not very useful for comparing search results." ] }, { "cell_type": "code", "execution_count": null, "id": "ba1ef62a-fdab-4a33-8f89-e20e0c6d2fbc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6e8e5731-c4f5-45c8-9a15-76c8758e1845", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "fc047f35-f156-4a2c-a0e3-d629ddaca3ff", "metadata": {}, "source": [ "## Exploring Longer Queries" ] }, { "cell_type": "code", "execution_count": 10, "id": "a5f9bcd7-6c07-4b87-8910-81df7affaea4", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:20.895617Z", "iopub.status.busy": "2023-04-25T16:31:20.895394Z", "iopub.status.idle": "2023-04-25T16:31:21.322740Z", "shell.execute_reply": "2023-04-25T16:31:21.321881Z", "shell.execute_reply.started": "2023-04-25T16:31:20.895597Z" } }, "outputs": [], "source": [ "# Count space-separated tokens; consecutive spaces yield empty tokens that are counted too\n", "df_res[\"num_keywords\"] = df_res[\"keyword\"].apply(lambda x: len(x.split(\" \")))" ] }, { "cell_type": "code", "execution_count": 11, "id": "049b1fe8-cc95-4e1a-956f-81564bd75c16", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:21.324017Z", "iopub.status.busy": "2023-04-25T16:31:21.323750Z", "iopub.status.idle": "2023-04-25T16:31:21.369322Z", "shell.execute_reply": "2023-04-25T16:31:21.368403Z", "shell.execute_reply.started": "2023-04-25T16:31:21.323993Z" } }, "outputs": [], 
"source": [ "df_long_queries = df_res[(df_res[\"num_keywords\"] > 1) ]" ] }, { "cell_type": "code", "execution_count": 12, "id": "c06f5c52-1240-432b-aa0a-07b6fb6427ce", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:21.370917Z", "iopub.status.busy": "2023-04-25T16:31:21.370534Z", "iopub.status.idle": "2023-04-25T16:31:21.385868Z", "shell.execute_reply": "2023-04-25T16:31:21.385104Z", "shell.execute_reply.started": "2023-04-25T16:31:21.370890Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keywordnum_searchesnum_keywords
327457mountain star landscape night sky7795
287590light at the end of the tunnel3087
499894there is no planet b2425
276561lago di braies, braies, italy1185
534678water droplets on a leaf1065
224699great sand dunes national park945
258115image of a man in a desert827
274846konkan beach resort, ratnagiri, india735
335652nature backgrounds water ripple reflection675
459722south georgia and the south sandwich islands547
426421samsung note 10 lite wallpaper525
37977background image for google doc515
141140desert sunset nature landscape sky495
61505black grapes with wood plates445
454677snow mountain clear blue sky445
364313palm trees at the beach435
349030nova scotia duck tolling retriever435
499403the surface of the moon375
499623the waves of the sea375
263540iphone 11 pro max wallpaper365
27378art of table potted flower355
337991nature photos light colours355
504377torres del paine national park325
134706dark side of the moon315
67340blue sky and white clouds315
375214person on top of mountain315
277772lake with lotus and lilies photos296
50421beauitful wallpaper nature 8k295
287582light at end of tunnel285
26499ariel view of the ocean285
177964farmhouse rustic yellow and pink285
161094eagle flying in the sky275
569329沙漠青蛙 沙漠青蛙 desert frog2614
415209ripley's aquarium of canada, toronto, canada266
425941salar de uyuni uyuni bolivia255
143650dew drops on a grass255
123267couple romdik love photo in tamil256
304156man on top of mountain255
193247flowers and plants and trees245
497287the butterfly atrium at hershey gardens246
277773lake with lotus and lily245
479245sun rise on a mountain245
203350free hd luminious backgrounds for photos246
298593lower antelope canyon, page, united states246
393216por do sol no mar235
142678desktop wallpapers 1920 x 1080225
534769water drops on the rose225
437160seven wonders of the world205
66424blue lake and green shore195
295937looking up to the sky195
\n", "
" ], "text/plain": [ " keyword num_searches \\\n", "327457 mountain star landscape night sky 779 \n", "287590 light at the end of the tunnel 308 \n", "499894 there is no planet b 242 \n", "276561 lago di braies, braies, italy 118 \n", "534678 water droplets on a leaf 106 \n", "224699 great sand dunes national park 94 \n", "258115 image of a man in a desert 82 \n", "274846 konkan beach resort, ratnagiri, india 73 \n", "335652 nature backgrounds water ripple reflection 67 \n", "459722 south georgia and the south sandwich islands 54 \n", "426421 samsung note 10 lite wallpaper 52 \n", "37977 background image for google doc 51 \n", "141140 desert sunset nature landscape sky 49 \n", "61505 black grapes with wood plates 44 \n", "454677 snow mountain clear blue sky 44 \n", "364313 palm trees at the beach 43 \n", "349030 nova scotia duck tolling retriever 43 \n", "499403 the surface of the moon 37 \n", "499623 the waves of the sea 37 \n", "263540 iphone 11 pro max wallpaper 36 \n", "27378 art of table potted flower 35 \n", "337991 nature photos light colours 35 \n", "504377 torres del paine national park 32 \n", "134706 dark side of the moon 31 \n", "67340 blue sky and white clouds 31 \n", "375214 person on top of mountain 31 \n", "277772 lake with lotus and lilies photos 29 \n", "50421 beauitful wallpaper nature 8k 29 \n", "287582 light at end of tunnel 28 \n", "26499 ariel view of the ocean 28 \n", "177964 farmhouse rustic yellow and pink 28 \n", "161094 eagle flying in the sky 27 \n", "569329 沙漠青蛙 沙漠青蛙 desert frog 26 \n", "415209 ripley's aquarium of canada, toronto, canada 26 \n", "425941 salar de uyuni uyuni bolivia 25 \n", "143650 dew drops on a grass 25 \n", "123267 couple romdik love photo in tamil 25 \n", "304156 man on top of mountain 25 \n", "193247 flowers and plants and trees 24 \n", "497287 the butterfly atrium at hershey gardens 24 \n", "277773 lake with lotus and lily 24 \n", "479245 sun rise on a mountain 24 \n", "203350 free hd luminious backgrounds for 
photos 24 \n", "298593 lower antelope canyon, page, united states 24 \n", "393216 por do sol no mar 23 \n", "142678 desktop wallpapers 1920 x 1080 22 \n", "534769 water drops on the rose 22 \n", "437160 seven wonders of the world 20 \n", "66424 blue lake and green shore 19 \n", "295937 looking up to the sky 19 \n", "\n", " num_keywords \n", "327457 5 \n", "287590 7 \n", "499894 5 \n", "276561 5 \n", "534678 5 \n", "224699 5 \n", "258115 7 \n", "274846 5 \n", "335652 5 \n", "459722 7 \n", "426421 5 \n", "37977 5 \n", "141140 5 \n", "61505 5 \n", "454677 5 \n", "364313 5 \n", "349030 5 \n", "499403 5 \n", "499623 5 \n", "263540 5 \n", "27378 5 \n", "337991 5 \n", "504377 5 \n", "134706 5 \n", "67340 5 \n", "375214 5 \n", "277772 6 \n", "50421 5 \n", "287582 5 \n", "26499 5 \n", "177964 5 \n", "161094 5 \n", "569329 14 \n", "415209 6 \n", "425941 5 \n", "143650 5 \n", "123267 6 \n", "304156 5 \n", "193247 5 \n", "497287 6 \n", "277773 5 \n", "479245 5 \n", "203350 6 \n", "298593 6 \n", "393216 5 \n", "142678 5 \n", "534769 5 \n", "437160 5 \n", "66424 5 \n", "295937 5 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_long_queries[df_long_queries.num_keywords > 4].head(50)" ] }, { "cell_type": "markdown", "id": "7c4d9566-4f47-4e50-8dde-5790c6466c4d", "metadata": {}, "source": [ "## Interesting Queries" ] }, { "cell_type": "markdown", "id": "49f0b698-ffde-4da4-9975-2ea8ea63c6b8", "metadata": {}, "source": [ "Detailed Intent\n", "- water droplets on a leaf\t\n", "- image of a man in a desert\t\n", "- person on top of mountain\t\n", "\n", "\n", "\n", "Location:\n", "- ripley's aquarium of canada, toronto, canada\t\n", "- the butterfly atrium at hershey gardens\t\n", "\n", "Non-English Queries\n", "- salar de uyuni uyuni bolivia\t\n", "- 沙漠青蛙 沙漠青蛙 desert frog\t\n", "- por do sol no mar\t\n", "- conhece te a ti mesmo\t (Portuguese for \"know thyself\")\n", "\n", "\n", "Metaphors / Slogan:\n", "- light at the end of the tunnel\t\n", 
"- there is no planet b\t\n", "\n", "Multiple Candidates\n", "- seven wonders of the world\t\n", "\n", "Long Query / Single Intent\n", "- nova scotia duck tolling retriever\t (dog breed)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7f9f81d3-ebfa-4489-920c-d81f0f9e98f3", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "66c63f2d-219b-4ee8-ac61-3ca43a3b79bf", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1d184a7c-87d2-4003-8082-6e1a38bcdf53", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8b57c441-ae37-47c4-b0b6-bf07ecb16ca4", "metadata": {}, "source": [ "Infrequently searched queries" ] }, { "cell_type": "code", "execution_count": 13, "id": "5e2d990b-b36f-4274-8b55-e8b71db3ef43", "metadata": { "execution": { "iopub.execute_input": "2023-04-25T16:31:21.387111Z", "iopub.status.busy": "2023-04-25T16:31:21.386821Z", "iopub.status.idle": "2023-04-25T16:31:21.401449Z", "shell.execute_reply": "2023-04-25T16:31:21.400704Z", "shell.execute_reply.started": "2023-04-25T16:31:21.387073Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keywordnum_searchesnum_keywords
313105mid night star picture for youtube thumbnail17
119583cool gamer pics for free15
313060mid century gothic style rose painting16
313079mid century modern interior design15
313077mid century modern home interior15
313076mid century modern home decor15
313160middle aged women beauty15
313148middle age is an age of many colors.18
313185middle east night in the desert16
313694milky way at the sea15
313709milky way by the nasa15
119302cool adventurous places one can visit with a b...116
119308cool and colorful wallpapers15
119310cool and fun pictures of animals16
313799milky way moon 3000x300015
119239cooking over a flame15
313744milky way galaxy and man15
313765milky way galaxy with people15
119390cool beach romance for familly15
119374cool backgrounds with cool wolves15
313519miles pond, vt chamber of commerce16
315756minimalist autumn wallpaper for mac15
118529constantia, cape town, south africa15
118506conserve energy hd images15
315504minimal windows 10 wallpaper plants15
315497minimal white pot with green leaves16
316032minimalist nature black and white15
316000minimalist lotus whitte background flower15
316020minimalist motivation work hard beach15
316101minimalist qoutes for travel wallpaper15
118232conhece te a ti mesmo15
315931minimalist gentle monochrome simple macro text...17
315959minimalist home decor with plants15
315905minimalist flower black and white15
314828minimal black and white background15
118809contaminated and counterfeited bottled water15
314994minimal flower on white background16
314905minimal colorful art on white background16
314803minimal background texture nature plants15
314801minimal background stacks of magazines15
314765minimal background dark double screen15
118848contemporary architecture made from wood15
314790minimal background nature soft brown15
118850contemporary art gallery at night15
314729minimal art black and white15
315333minimal scene with geometric forms.15
118589constellations in the night sky15
315010minimal food flat lay background15
118714construction worker at the beach15
118716construction worker in the winter15
\n", "
" ], "text/plain": [ " keyword num_searches \\\n", "313105 mid night star picture for youtube thumbnail 1 \n", "119583 cool gamer pics for free 1 \n", "313060 mid century gothic style rose painting 1 \n", "313079 mid century modern interior design 1 \n", "313077 mid century modern home interior 1 \n", "313076 mid century modern home decor 1 \n", "313160 middle aged women beauty 1 \n", "313148 middle age is an age of many colors. 1 \n", "313185 middle east night in the desert 1 \n", "313694 milky way at the sea 1 \n", "313709 milky way by the nasa 1 \n", "119302 cool adventurous places one can visit with a b... 1 \n", "119308 cool and colorful wallpapers 1 \n", "119310 cool and fun pictures of animals 1 \n", "313799 milky way moon 3000x3000 1 \n", "119239 cooking over a flame 1 \n", "313744 milky way galaxy and man 1 \n", "313765 milky way galaxy with people 1 \n", "119390 cool beach romance for familly 1 \n", "119374 cool backgrounds with cool wolves 1 \n", "313519 miles pond, vt chamber of commerce 1 \n", "315756 minimalist autumn wallpaper for mac 1 \n", "118529 constantia, cape town, south africa 1 \n", "118506 conserve energy hd images 1 \n", "315504 minimal windows 10 wallpaper plants 1 \n", "315497 minimal white pot with green leaves 1 \n", "316032 minimalist nature black and white 1 \n", "316000 minimalist lotus whitte background flower 1 \n", "316020 minimalist motivation work hard beach 1 \n", "316101 minimalist qoutes for travel wallpaper 1 \n", "118232 conhece te a ti mesmo 1 \n", "315931 minimalist gentle monochrome simple macro text... 
1 \n", "315959 minimalist home decor with plants 1 \n", "315905 minimalist flower black and white 1 \n", "314828 minimal black and white background 1 \n", "118809 contaminated and counterfeited bottled water 1 \n", "314994 minimal flower on white background 1 \n", "314905 minimal colorful art on white background 1 \n", "314803 minimal background texture nature plants 1 \n", "314801 minimal background stacks of magazines 1 \n", "314765 minimal background dark double screen 1 \n", "118848 contemporary architecture made from wood 1 \n", "314790 minimal background nature soft brown 1 \n", "118850 contemporary art gallery at night 1 \n", "314729 minimal art black and white 1 \n", "315333 minimal scene with geometric forms. 1 \n", "118589 constellations in the night sky 1 \n", "315010 minimal food flat lay background 1 \n", "118714 construction worker at the beach 1 \n", "118716 construction worker in the winter 1 \n", "\n", " num_keywords \n", "313105 7 \n", "119583 5 \n", "313060 6 \n", "313079 5 \n", "313077 5 \n", "313076 5 \n", "313160 5 \n", "313148 8 \n", "313185 6 \n", "313694 5 \n", "313709 5 \n", "119302 16 \n", "119308 5 \n", "119310 6 \n", "313799 5 \n", "119239 5 \n", "313744 5 \n", "313765 5 \n", "119390 5 \n", "119374 5 \n", "313519 6 \n", "315756 5 \n", "118529 5 \n", "118506 5 \n", "315504 5 \n", "315497 6 \n", "316032 5 \n", "316000 5 \n", "316020 5 \n", "316101 5 \n", "118232 5 \n", "315931 7 \n", "315959 5 \n", "315905 5 \n", "314828 5 \n", "118809 5 \n", "314994 6 \n", "314905 6 \n", "314803 5 \n", "314801 5 \n", "314765 5 \n", "118848 5 \n", "314790 5 \n", "118850 5 \n", "314729 5 \n", "315333 5 \n", "118589 5 \n", "315010 5 \n", "118714 5 \n", "118716 5 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_long_queries[df_long_queries.num_keywords > 4].tail(50)" ] }, { "cell_type": "markdown", "id": "0fdf13c0-a913-44f4-8786-b223cad1d0a7", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", 
"execution_count": null, "id": "49f898d8-5fd1-4475-a8f3-68c2e3d7f7b9", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "environment": { "kernel": "python3", "name": "pytorch-gpu.1-13.m107", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/pytorch-gpu.1-13:m107" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }