"# Recommender with Redis and Milvus\n", "> Storing the pre-calculated user and items vectors of movielens dataset into redis in-memory database and then indexing into milvus for efficient large-scale retrieval\n", "\n", "- toc: true\n", "- badges: true\n", "- comments: true\n", "- categories: [retrieval, redis, milvus, movie]\n", "- image:" Supported CPU instruction sets: avx2, sse4_2\n", "FAISS hook AVX2\n", "Milvus server started successfully! Redis must be restarted after THP is disabled.\n", "42581:M 23 Jun 2021 16:02:32.642 * Ready to accept connections\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "4K0PQx_Qpk-d" }, "source": [ "!pip install -U grpcio" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ryH_eaIEpcuF", "outputId": "32f9e0db-636f-4698-e1c8-8dee54961fcd" }, "source": [ "%cd /content" ], "execution_count": 3, "outputs": [ { "output_type": "stream", "text": [ "/content\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "cac14f13" }, "source": [ "### Downloading Pretrained Models\n", "\n", "This PaddlePaddle model is used to transform user information into vectors." 
] }, { "cell_type": "code", "metadata": { "scrolled": true, "id": "b52f00c3", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "c20670c6-1a1d-4ec8-edf5-4981c8499a0f" }, "source": [ "!wget https://paddlerec.bj.bcebos.com/aistudio/user_vector.tar.gz --no-check-certificate\n", "!mkdir -p movie_recommender/user_vector_model\n", "!tar xf user_vector.tar.gz -C movie_recommender/user_vector_model/\n", "!rm user_vector.tar.gz" ], "execution_count": 4, "outputs": [ { "output_type": "stream", "text": [ "--2021-06-23 16:13:35-- https://paddlerec.bj.bcebos.com/aistudio/user_vector.tar.gz\n", "Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)..., 2409:8c00:6c21:10ad:0:ff:b00e:67d\n", "Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)||:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 33650924 (32M) [application/x-gzip]\n", "Saving to: ‘user_vector.tar.gz’\n", "\n", "user_vector.tar.gz 100%[===================>] 32.09M 7.94MB/s in 5.4s \n", "\n", "2021-06-23 16:13:42 (5.95 MB/s) - ‘user_vector.tar.gz’ saved [33650924/33650924]\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "1ab3a252" }, "source": [ "### Downloading Data" ] }, { "cell_type": "code", "metadata": { "id": "39a7facb", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "86a36475-e468-447b-d572-9b5d8eb6c53f" }, "source": [ "# Download movie information\n", "!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movies.dat --no-check-certificate\n", "# Download movie vectors\n", "!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movie_vectors.txt --no-check-certificate" ], "execution_count": 5, "outputs": [ { "output_type": "stream", "text": [ "--2021-06-23 16:13:43-- https://paddlerec.bj.bcebos.com/aistudio/movies.dat\n", "Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)..., 2409:8c00:6c21:10ad:0:ff:b00e:67d\n", "Connecting to paddlerec.bj.bcebos.com 
(paddlerec.bj.bcebos.com)||:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 171308 (167K) [application/octet-stream]\n", "Saving to: ‘movie_recommender/movies.dat’\n", "\n", "movies.dat 100%[===================>] 167.29K 167KB/s in 1.0s \n", "\n", "2021-06-23 16:13:46 (167 KB/s) - ‘movie_recommender/movies.dat’ saved [171308/171308]\n", "\n", "--2021-06-23 16:13:46-- https://paddlerec.bj.bcebos.com/aistudio/movie_vectors.txt\n", "Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)..., 2409:8c00:6c21:10ad:0:ff:b00e:67d\n", "Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)||:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 1095505 (1.0M) [text/plain]\n", "Saving to: ‘movie_recommender/movie_vectors.txt’\n", "\n", "movie_vectors.txt 100%[===================>] 1.04M 648KB/s in 1.7s \n", "\n", "2021-06-23 16:13:49 (648 KB/s) - ‘movie_recommender/movie_vectors.txt’ saved [1095505/1095505]\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "e994eb1e-aa76-446b-98c6-02c74f050ba5" }, "source": [ "### Importing Movies into Milvus" ] }, { "cell_type": "markdown", "metadata": { "id": "3a999eeb-bcc6-4800-9039-f9c57ea399f1" }, "source": [ "#### 1. Connecting to Milvus and Redis" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fQnASRYRno5g", "outputId": "59d20c18-64d4-40b8-dc0e-50ee4eac09db" }, "source": [ "! 
lsof -i -P -n | grep -E 'milvus|redis'" ], "execution_count": 28, "outputs": [ { "output_type": "stream", "text": [ "milvus_se 42433 root 17u IPv4 478871 0t0 TCP *:19121 (LISTEN)\n", "milvus_se 42433 root 20u IPv4 479283 0t0 TCP *:19530 (LISTEN)\n", "redis-ser 42581 root 6u IPv6 507112 0t0 TCP *:6379 (LISTEN)\n", "redis-ser 42581 root 7u IPv4 507113 0t0 TCP *:6379 (LISTEN)\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "d8de5e40" }, "source": [ "from milvus import Milvus, IndexType, MetricType, Status\n", "import redis\n", "\n", "milv = Milvus(host = '', port = 19530)\n", "r = redis.StrictRedis(host=\"\", port=6379) " ], "execution_count": 2, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "KGwZgi9Hqdhs", "outputId": "c3e09d0e-d937-4daf-ab4d-6c9d605b684f" }, "source": [ "milv.client_version()" ], "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" }, "text/plain": [ "'1.1.0'" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "cell_type": "markdown", "metadata": { "id": "a3c114a7" }, "source": [ "#### 2. Loading Movies into Redis\n", "We begin by loading the movie metadata from `movies.dat` into Redis. 
" ] }, { "cell_type": "code", "metadata": { "id": "f56cf19c" }, "source": [ "import json\n", "import codecs\n", "\n", "#1::Toy Story (1995)::Animation|Children's|Comedy\n", "def process_movie(lines, redis_cli):\n", " for line in lines:\n", " if len(line.strip()) == 0:\n", " continue\n", " tmp = line.strip().split(\"::\")\n", " movie_id = tmp[0]\n", " title = tmp[1]\n", " genre_group = tmp[2]\n", " tmp = genre_group.strip().split(\"|\")\n", " genre = tmp\n", " movie_info = {\"movie_id\" : movie_id,\n", " \"title\" : title,\n", " \"genre\" : genre\n", " }\n", " redis_cli.set(\"{}##movie_info\".format(movie_id), json.dumps(movie_info))\n", " \n", "with codecs.open(\"movie_recommender/movies.dat\", \"r\",encoding='utf-8',errors='ignore') as f:\n", " lines = f.readlines()\n", " process_movie(lines, r)" ], "execution_count": 4, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "a54a6046" }, "source": [ "#### 3. Creating Partition and Collection in Milvus" ] }, { "cell_type": "code", "metadata": { "id": "ef3ef1f7", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "2f9f5133-bfc1-47b8-faf3-66984ea71774" }, "source": [ "COLLECTION_NAME = 'demo_films'\n", "PARTITION_NAME = 'Movie'\n", "\n", "#Dropping collection for clean slate run\n", "milv.drop_collection(COLLECTION_NAME)\n", "\n", "\n", "param = {'collection_name':COLLECTION_NAME, \n", " 'dimension':32, \n", " 'index_file_size':2048, \n", " 'metric_type':MetricType.L2\n", " }\n", "\n", "milv.create_collection(param)\n", "# milv.create_partition(COLLECTION_NAME, PARTITION_NAME)" ], "execution_count": 5, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Status(code=0, message='Create collection successfully!')" ] }, "metadata": { "tags": [] }, "execution_count": 5 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "399Cxz4cqhZ4", "outputId": "5ac70704-2877-4f59-fa0a-3df4db2ded6b" }, "source": [ 
"milv.get_collection_info(COLLECTION_NAME)" ], "execution_count": 7, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(Status(code=0, message='Describe collection successfully!'),\n", " CollectionSchema(collection_name='demo_films', dimension=32, index_file_size=2048, metric_type=))" ] }, "metadata": { "tags": [] }, "execution_count": 7 } ] }, { "cell_type": "markdown", "metadata": { "id": "d298372e" }, "source": [ "#### 4. Getting Embeddings and IDs\n", "The vectors in `movie_vectors.txt` are obtained from the `user_vector_model` downloaded above. So we can directly get the vectors and the IDs by reading the file." ] }, { "cell_type": "code", "metadata": { "id": "1aaee36b" }, "source": [ "def get_vectors():\n", " with codecs.open(\"movie_recommender/movie_vectors.txt\", \"r\", encoding='utf-8', errors='ignore') as f:\n", " lines = f.readlines()\n", " ids = [int(line.split(\":\")[0]) for line in lines]\n", " embeddings = []\n", " for line in lines:\n", " line = line.strip().split(\":\")[1][1:-1]\n", " str_nums = line.split(\",\")\n", " emb = [float(x) for x in str_nums]\n", " embeddings.append(emb)\n", " return ids, embeddings\n", "\n", "ids, embeddings = get_vectors()" ], "execution_count": 8, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "3a6140b1" }, "source": [ "#### 4. Importing Vectors into Milvus\n", "Import vectors into the partition **Movie** under the collection **demo_films**." 
] }, { "cell_type": "code", "metadata": { "id": "4ac4cfff", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "8aff463d-ec4b-4ecc-e076-2d626d54d536" }, "source": [ "# status = milv.insert(collection_name=COLLECTION_NAME, records=embeddings, ids=ids, partition_tag=PARTITION_NAME)\n", "status = milv.insert(collection_name=COLLECTION_NAME, records=embeddings, ids=ids)\n", "status[0]" ], "execution_count": 9, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Status(code=0, message='Add vectors successfully!')" ] }, "metadata": { "tags": [] }, "execution_count": 9 } ] }, { "cell_type": "markdown", "metadata": { "id": "e93feb30" }, "source": [ "### Recalling Vectors in Milvus\n", "#### 1. Generating User Embeddings\n", "Pass in the gender, age, and occupation of the user we want to recommend movies to. The **user_vector_model** will generate the corresponding user vector.\n", "Occupation is chosen from the following choices:\n", "* 0: \"other\" or not specified\n", "* 1: \"academic/educator\"\n", "* 2: \"artist\"\n", "* 3: \"clerical/admin\"\n", "* 4: \"college/grad student\"\n", "* 5: \"customer service\"\n", "* 6: \"doctor/health care\"\n", "* 7: \"executive/managerial\"\n", "* 8: \"farmer\"\n", "* 9: \"homemaker\"\n", "* 10: \"K-12 student\"\n", "* 11: \"lawyer\"\n", "* 12: \"programmer\"\n", "* 13: \"retired\"\n", "* 14: \"sales/marketing\"\n", "* 15: \"scientist\"\n", "* 16: \"self-employed\"\n", "* 17: \"technician/engineer\"\n", "* 18: \"tradesman/craftsman\"\n", "* 19: \"unemployed\"\n", "* 20: \"writer\"" ] }, { "cell_type": "code", "metadata": { "tags": [], "id": "1a35a9d4", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "6eaf7dbd-95e8-43a6-c43a-9e2c57fb5c81" }, "source": [ "import numpy as np\n", "from paddle_serving_app.local_predict import LocalPredictor\n", "\n", "class RecallServerServicer(object):\n", "    def __init__(self):\n", "        self.uv_client = LocalPredictor()\n", 
self.uv_client.load_model_config(\"movie_recommender/user_vector_model/serving_server_dir\") \n", " \n", " def hash2(self, a):\n", " return hash(a) % 1000000\n", "\n", " def get_user_vector(self):\n", " dic = {\"userid\": [], \"gender\": [], \"age\": [], \"occupation\": []}\n", " lod = [0]\n", " dic[\"userid\"].append(self.hash2('0'))\n", " dic[\"gender\"].append(self.hash2('M'))\n", " dic[\"age\"].append(self.hash2('23'))\n", " dic[\"occupation\"].append(self.hash2('6'))\n", " lod.append(1)\n", "\n", " dic[\"userid.lod\"] = lod\n", " dic[\"gender.lod\"] = lod\n", " dic[\"age.lod\"] = lod\n", " dic[\"occupation.lod\"] = lod\n", " for key in dic:\n", " dic[key] = np.array(dic[key]).astype(np.int64).reshape(len(dic[key]),1)\n", " fetch_map = self.uv_client.predict(feed=dic, fetch=[\"save_infer_model/scale_0.tmp_1\"], batch=True)\n", " return fetch_map[\"save_infer_model/scale_0.tmp_1\"].tolist()[0]\n", "\n", "recall = RecallServerServicer()\n", "user_vector = recall.get_user_vector()" ], "execution_count": 10, "outputs": [ { "output_type": "stream", "text": [ "2021-06-23 16:29:24,262 - INFO - LocalPredictor load_model_config params: model_path:movie_recommender/user_vector_model/serving_server_dir, use_gpu:False, gpu_id:0, use_profile:False, thread_num:1, mem_optim:True, ir_optim:False, use_trt:False, use_lite:False, use_xpu: False, use_feed_fetch_ops:False\n" ], "name": "stderr" } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vfzaCguwtLgL", "outputId": "78506dc2-aa85-497b-f412-cdc31fb38a31" }, "source": [ "user_vector" ], "execution_count": 11, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[0.0,\n", " 4.911433696746826,\n", " 4.132595062255859,\n", " 3.2255895137786865,\n", " 0.0,\n", " 4.944108963012695,\n", " 0.0,\n", " 0.0,\n", " 1.27165687084198,\n", " 3.1072912216186523,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 
0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 1.9184402227401733,\n", " 0.0,\n", " 0.0,\n", " 0.0,\n", " 4.42396354675293,\n", " 2.0686450004577637,\n", " 0.0]" ] }, "metadata": { "tags": [] }, "execution_count": 11 } ] }, { "cell_type": "markdown", "metadata": { "tags": [], "id": "e15ea6e8" }, "source": [ "#### 2. Searching\n", "Pass in the user vector and search the previously imported collection for the `TOP_K` most similar movie vectors." ] }, { "cell_type": "code", "metadata": { "id": "e4d91d02" }, "source": [ "TOP_K = 20\n", "SEARCH_PARAM = {'nprobe': 20}\n", "status, results = milv.search(collection_name=COLLECTION_NAME, query_records=[user_vector], top_k=TOP_K, params=SEARCH_PARAM)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "9c847608" }, "source": [ "#### 3. Returning Information by IDs" ] }, { "cell_type": "code", "metadata": { "id": "90a56325", "outputId": "bffcbddb-2b25-4320-b8c9-1af5b0b6a75f" }, "source": [ "recall_results = []\n", "for x in results[0]:\n", "    recall_results.append(r.get(\"{}##movie_info\".format(x.id)).decode('utf-8'))\n", "recall_results" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['{\"movie_id\": \"760\", \"title\": \"Stalingrad (1993)\", \"genre\": [\"War\"]}',\n", " '{\"movie_id\": \"1350\", \"title\": \"Omen, The (1976)\", \"genre\": [\"Horror\"]}',\n", " '{\"movie_id\": \"1258\", \"title\": \"Shining, The (1980)\", \"genre\": [\"Horror\"]}',\n", " '{\"movie_id\": \"632\", \"title\": \"Land and Freedom (Tierra y libertad) (1995)\", \"genre\": [\"War\"]}',\n", " '{\"movie_id\": \"3007\", \"title\": \"American Movie (1999)\", \"genre\": [\"Documentary\"]}',\n", " '{\"movie_id\": \"2086\", \"title\": \"One Magic Christmas (1985)\", \"genre\": [\"Drama\", \"Fantasy\"]}',\n", " '{\"movie_id\": \"1051\", \"title\": \"Trees Lounge (1996)\", \"genre\": [\"Drama\"]}',\n", " '{\"movie_id\": \"3920\", \"title\": 
\"Faraway, So Close (In Weiter Ferne, So Nah!) (1993)\", \"genre\": [\"Drama\", \"Fantasy\"]}',\n", " '{\"movie_id\": \"1303\", \"title\": \"Man Who Would Be King, The (1975)\", \"genre\": [\"Adventure\"]}',\n", " '{\"movie_id\": \"652\", \"title\": \"301, 302 (1995)\", \"genre\": [\"Mystery\"]}',\n", " '{\"movie_id\": \"1605\", \"title\": \"Excess Baggage (1997)\", \"genre\": [\"Adventure\", \"Romance\"]}',\n", " '{\"movie_id\": \"1275\", \"title\": \"Highlander (1986)\", \"genre\": [\"Action\", \"Adventure\"]}',\n", " '{\"movie_id\": \"1126\", \"title\": \"Drop Dead Fred (1991)\", \"genre\": [\"Comedy\", \"Fantasy\"]}',\n", " '{\"movie_id\": \"792\", \"title\": \"Hungarian Fairy Tale, A (1987)\", \"genre\": [\"Fantasy\"]}',\n", " '{\"movie_id\": \"2228\", \"title\": \"Mountain Eagle, The (1926)\", \"genre\": [\"Drama\"]}',\n", " '{\"movie_id\": \"2659\", \"title\": \"It Came from Hollywood (1982)\", \"genre\": [\"Comedy\", \"Documentary\"]}',\n", " '{\"movie_id\": \"2545\", \"title\": \"Relax... It\\'s Just Sex (1998)\", \"genre\": [\"Comedy\"]}',\n", " '{\"movie_id\": \"1289\", \"title\": \"Koyaanisqatsi (1983)\", \"genre\": [\"Documentary\", \"War\"]}',\n", " '{\"movie_id\": \"2537\", \"title\": \"Beyond the Poseidon Adventure (1979)\", \"genre\": [\"Adventure\"]}',\n", " '{\"movie_id\": \"2864\", \"title\": \"Splendor (1999)\", \"genre\": [\"Comedy\"]}']" ] }, "metadata": { "tags": [] }, "execution_count": 23 } ] }, { "cell_type": "markdown", "metadata": { "id": "d4f7e3c5" }, "source": [ "### Conclusion" ] }, { "cell_type": "markdown", "metadata": { "id": "843120ee" }, "source": [ "After completing the recall service, the results can be further sorted using the **movie_recommender** model, and then the movies with high similarity scores can be recommended to users. You can try this deployable recommendation system using this [quick start](QUICK_START.md)." ] } ] }