# Recommender with Redis and Milvus
> Storing MovieLens movie metadata in the Redis in-memory database and indexing the pre-calculated movie vectors in Milvus for efficient large-scale retrieval

- toc: true
- badges: true
- comments: true
- categories: [retrieval, redis, milvus, movie]
- image:

| Packages | Servers |
| --------------- | -------------- |
| pymilvus | milvus-1.1.0 |
| redis | redis |
| paddle_serving_app | |
| paddlepaddle | |

In [None]:
!pip install pymilvus==1.1.0
!pip install paddle_serving_app==0.3.1
!pip install paddlepaddle
!pip install redis

### Install and run Milvus server

> Warning: It will take ~40 minutes to install!

In [None]:
!git clone -b 1.1 https://github.com/milvus-io/milvus.git
%cd /content/milvus/core
!./ubuntu_build_deps.sh
!./build.sh -t Release
# !./build.sh -t Release -g

%cd /content/milvus/core/milvus
!echo $LD_LIBRARY_PATH
import os
# Append the Milvus libraries; .get() avoids a KeyError if the variable is unset
os.environ['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '') + ":/content/milvus/core/milvus/lib"
!echo $LD_LIBRARY_PATH
%cd scripts
!nohup ./start_server.sh &
!cat nohup.out

In [8]:
!cat nohup.out


 __ _________ _ ____ ______ 
 / |/ / _/ /| | / / / / / __/ 
 / /|_/ // // /_| |/ / /_/ /\ \ 
 /_/ /_/___/____/___/\____/___/ 

Welcome to use Milvus!
Milvus Release version: v1.1.1, built at 2021-06-23 14:11.42, with OpenBLAS library.
You are using Milvus CPU edition
Last commit id: 3fc81236452d8060fe7adc1793ad1d69f3d8423c

Loading configuration from: ../conf/server_config.yaml
NOTICE: You are using SQLite as the meta data management. We recommend change it to MySQL.
Supported CPU instruction sets: avx2, sse4_2
FAISS hook AVX2
Milvus server started successfully!
Milvus server is going to shutdown ...
Milvus server exit...


We use Redis as the metadata storage service that maps each embedding ID back to the corresponding movie record. The code can easily be modified to use a Python dictionary instead, but a dictionary is neither persistent nor shared across processes, so it rarely works outside of quick examples.
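As a rough sketch of the dictionary alternative mentioned above, a small class can mimic the two Redis calls this notebook relies on, `set` and `get`. The class name `DictStore` is hypothetical, and returning `bytes` from `get` is an assumption made so that code written against the default redis-py client (which returns bytes) keeps working.

```python
class DictStore:
    """Hypothetical in-process stand-in for the Redis client used in this notebook."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # redis-py returns bytes by default, so store values as bytes for parity
        if isinstance(value, str):
            value = value.encode("utf-8")
        self._data[key] = value
        return True

    def get(self, key):
        # Like Redis, return None for a missing key
        return self._data.get(key)


store = DictStore()
store.set("1##movie_info", '{"title": "Toy Story (1995)"}')
print(store.get("1##movie_info").decode("utf-8"))
```

Everything held this way disappears when the Python process exits, which is the main reason the notebook uses Redis instead.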

### Install and run Redis server

In [3]:
#hide-output
!wget http://download.redis.io/releases/redis-stable.tar.gz --no-check-certificate
!tar -xf redis-stable.tar.gz && cd redis-stable/src && make

--2021-06-23 14:52:56-- http://download.redis.io/releases/redis-stable.tar.gz
Resolving download.redis.io (download.redis.io)... 45.60.121.1
Connecting to download.redis.io (download.redis.io)|45.60.121.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2261060 (2.2M) [application/octet-stream]
Saving to: ‘redis-stable.tar.gz’


2021-06-23 14:52:57 (21.0 MB/s) - ‘redis-stable.tar.gz’ saved [2261060/2261060]

    CC Makefile.dep
rm -rf redis-server redis-sentinel redis-cli redis-benchmark redis-check-rdb redis-check-aof *.o *.gcda *.gcno *.gcov redis.info lcov-html Makefile.dep dict-benchmark
rm -f adlist.d quicklist.d ae.d anet.d dict.d server.d sds.d zmalloc.d lzf_c.d lzf_d.d pqsort.d zipmap.d sha1.d ziplist.d release.d networking.d util.d object.d db.d replication.d rdb.d t_string.d t_list.d t_set.d t_zset.d t_hash.d config.d aof.d pubsub.d multi.d debug.d sort.d intset.d syncio.d cluster.d crc16.d endianconv.d slowlog.d scripting.d bio.d rio.d 

In [22]:
! nohup ./redis-stable/src/redis-server > redis_nohup.out &
! cat redis_nohup.out

nohup: redirecting stderr to stdout
42581:C 23 Jun 2021 16:02:32.639 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
42581:C 23 Jun 2021 16:02:32.639 # Redis version=6.0.5, bits=64, commit=3fc81236, modified=0, pid=42581, just started
42581:M 23 Jun 2021 16:02:32.641 * Running mode=standalone, port=6379.
42581:M 23 Jun 2021 16:02:32.641 # Server initialized
42581:M 23 Jun 2021 16:02:32.642 * Ready to accept connections


In [None]:
!pip install -U grpcio

In [3]:
%cd /content

/content


### Downloading Pretrained Models

This PaddlePaddle model is used to transform user information into vectors.

In [4]:
!wget https://paddlerec.bj.bcebos.com/aistudio/user_vector.tar.gz --no-check-certificate
!mkdir -p movie_recommender/user_vector_model
!tar xf user_vector.tar.gz -C movie_recommender/user_vector_model/
!rm user_vector.tar.gz

--2021-06-23 16:13:35-- https://paddlerec.bj.bcebos.com/aistudio/user_vector.tar.gz
Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)... 103.235.46.61, 2409:8c00:6c21:10ad:0:ff:b00e:67d
Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33650924 (32M) [application/x-gzip]
Saving to: ‘user_vector.tar.gz’


2021-06-23 16:13:42 (5.95 MB/s) - ‘user_vector.tar.gz’ saved [33650924/33650924]



### Downloading Data

In [5]:
# Download movie information
!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movies.dat --no-check-certificate
# Download movie vectors
!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movie_vectors.txt --no-check-certificate

--2021-06-23 16:13:43-- https://paddlerec.bj.bcebos.com/aistudio/movies.dat
Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)... 103.235.46.61, 2409:8c00:6c21:10ad:0:ff:b00e:67d
Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 171308 (167K) [application/octet-stream]
Saving to: ‘movie_recommender/movies.dat’


2021-06-23 16:13:46 (167 KB/s) - ‘movie_recommender/movies.dat’ saved [171308/171308]

--2021-06-23 16:13:46-- https://paddlerec.bj.bcebos.com/aistudio/movie_vectors.txt
Resolving paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)... 103.235.46.61, 2409:8c00:6c21:10ad:0:ff:b00e:67d
Connecting to paddlerec.bj.bcebos.com (paddlerec.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1095505 (1.0M) [text/plain]
Saving to: ‘movie_recommender/movie_vectors.txt’


2021-06-23 16:13:49 (648 KB/s) - ‘movie_recommender/movie_vect

### Importing Movies into Milvus

#### 1. Connecting to Milvus and Redis

In [28]:
! lsof -i -P -n | grep -E 'milvus|redis'

milvus_se 42433 root 17u IPv4 478871 0t0 TCP *:19121 (LISTEN)
milvus_se 42433 root 20u IPv4 479283 0t0 TCP *:19530 (LISTEN)
redis-ser 42581 root 6u IPv6 507112 0t0 TCP *:6379 (LISTEN)
redis-ser 42581 root 7u IPv4 507113 0t0 TCP *:6379 (LISTEN)


In [2]:
from milvus import Milvus, IndexType, MetricType, Status
import redis

milv = Milvus(host = '127.0.0.1', port = 19530)
r = redis.StrictRedis(host="127.0.0.1", port=6379) 

In [3]:
milv.client_version()

'1.1.0'

#### 2. Loading Movies into Redis
We begin by loading every movie record from `movies.dat` into Redis, stored as JSON under the key `<movie_id>##movie_info`.

In [4]:
import json
import codecs

# Example input line: 1::Toy Story (1995)::Animation|Children's|Comedy
def process_movie(lines, redis_cli):
    for line in lines:
        if len(line.strip()) == 0:
            continue
        # Fields are separated by "::": movie_id, title, genres
        tmp = line.strip().split("::")
        movie_id = tmp[0]
        title = tmp[1]
        genre_group = tmp[2]
        # Genres are separated by "|"
        genre = genre_group.strip().split("|")
        movie_info = {"movie_id": movie_id,
                      "title": title,
                      "genre": genre
                      }
        # Store the record as JSON under the key "<movie_id>##movie_info"
        redis_cli.set("{}##movie_info".format(movie_id), json.dumps(movie_info))

with codecs.open("movie_recommender/movies.dat", "r", encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()
    process_movie(lines, r)
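To make the stored format concrete, here is a small sketch that applies the same parsing to the sample line from the comment above and shows the resulting key and JSON value. `parse_movie_line` is a hypothetical helper introduced only for illustration; it does not touch a Redis server.

```python
import json

def parse_movie_line(line):
    """Split a 'movie_id::title::genres' line into a (redis_key, json_value) pair."""
    movie_id, title, genre_group = line.strip().split("::")
    movie_info = {
        "movie_id": movie_id,
        "title": title,
        "genre": genre_group.split("|"),
    }
    return "{}##movie_info".format(movie_id), json.dumps(movie_info)

key, value = parse_movie_line("1::Toy Story (1995)::Animation|Children's|Comedy")
print(key)    # 1##movie_info
print(value)
```

Note that `movie_id` stays a string in the stored record, which matches the output shown later in the notebook.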

#### 3. Creating Partition and Collection in Milvus

In [5]:
COLLECTION_NAME = 'demo_films'
PARTITION_NAME = 'Movie'

# Drop any existing collection so the run starts from a clean slate
milv.drop_collection(COLLECTION_NAME)

param = {'collection_name': COLLECTION_NAME,
         'dimension': 32,
         'index_file_size': 2048,
         'metric_type': MetricType.L2
         }

milv.create_collection(param)
# milv.create_partition(COLLECTION_NAME, PARTITION_NAME)

Status(code=0, message='Create collection successfully!')

In [7]:
milv.get_collection_info(COLLECTION_NAME)

(Status(code=0, message='Describe collection successfully!'),
 CollectionSchema(collection_name='demo_films', dimension=32, index_file_size=2048, metric_type=<MetricType: L2>))

#### 4. Getting Embeddings and IDs
The vectors in `movie_vectors.txt` are obtained from the `user_vector_model` downloaded above, so we can get the vectors and their IDs directly by reading the file. Each line holds a movie ID, a colon, and a delimited, comma-separated list of floats.

In [8]:
def get_vectors():
    with codecs.open("movie_recommender/movie_vectors.txt", "r", encoding='utf-8', errors='ignore') as f:
        lines = f.readlines()
    # The movie ID is everything before the first ":"
    ids = [int(line.split(":")[0]) for line in lines]
    embeddings = []
    for line in lines:
        # Drop the first and last character (the list delimiters), then parse the floats
        line = line.strip().split(":")[1][1:-1]
        str_nums = line.split(",")
        emb = [float(x) for x in str_nums]
        embeddings.append(emb)
    return ids, embeddings

ids, embeddings = get_vectors()
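Before inserting, it can help to confirm that every parsed vector matches the collection's dimension of 32. The following is a minimal sketch; `check_embeddings` and the toy inputs are hypothetical, and in the notebook you would pass the `ids` and `embeddings` returned by `get_vectors()`.

```python
def check_embeddings(ids, embeddings, dimension=32):
    """Raise ValueError if IDs and vectors are misaligned or a vector has the wrong length."""
    if len(ids) != len(embeddings):
        raise ValueError("got {} ids but {} vectors".format(len(ids), len(embeddings)))
    for movie_id, emb in zip(ids, embeddings):
        if len(emb) != dimension:
            raise ValueError("movie {} has {} dimensions, expected {}".format(
                movie_id, len(emb), dimension))
    return len(ids)

# Toy data standing in for the real file contents
toy_ids = [1, 2]
toy_embeddings = [[0.0] * 32, [1.0] * 32]
print(check_embeddings(toy_ids, toy_embeddings))  # 2
```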

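#### 5. Importing Vectors into Milvus
Import the vectors into the collection **demo_films**. The partition call is commented out, so the vectors go into the collection's default partition rather than a **Movie** partition.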

In [9]:
# status = milv.insert(collection_name=COLLECTION_NAME, records=embeddings, ids=ids, partition_tag=PARTITION_NAME)
status = milv.insert(collection_name=COLLECTION_NAME, records=embeddings, ids=ids)
status[0]

Status(code=0, message='Add vectors successfully!')

### Recalling Vectors in Milvus
#### 1. Generating User Embeddings
Pass in the gender, age, and occupation of the user who should receive recommendations. The **user_vector_model** model generates the corresponding user vector.
Occupation is chosen from the following codes:
* 0: "other" or not specified
* 1: "academic/educator"
* 2: "artist"
* 3: "clerical/admin"
* 4: "college/grad student"
* 5: "customer service"
* 6: "doctor/health care"
* 7: "executive/managerial"
* 8: "farmer"
* 9: "homemaker"
* 10: "K-12 student"
* 11: "lawyer"
* 12: "programmer"
* 13: "retired"
* 14: "sales/marketing"
* 15: "scientist"
* 16: "self-employed"
* 17: "technician/engineer"
* 18: "tradesman/craftsman"
* 19: "unemployed"
* 20: "writer"
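For readability, the codes listed above can be kept in a lookup table so that callers pass a label rather than a bare number. This is only a sketch: `OCCUPATIONS` and `occupation_code` are hypothetical names, and the model still expects the numeric code as a string (for example `'6'`), as in the cell below.

```python
OCCUPATIONS = {
    0: "other",
    1: "academic/educator",
    2: "artist",
    3: "clerical/admin",
    4: "college/grad student",
    5: "customer service",
    6: "doctor/health care",
    7: "executive/managerial",
    8: "farmer",
    9: "homemaker",
    10: "K-12 student",
    11: "lawyer",
    12: "programmer",
    13: "retired",
    14: "sales/marketing",
    15: "scientist",
    16: "self-employed",
    17: "technician/engineer",
    18: "tradesman/craftsman",
    19: "unemployed",
    20: "writer",
}

def occupation_code(label):
    """Return the numeric code for an occupation label, as a string."""
    for code, name in OCCUPATIONS.items():
        if name == label:
            return str(code)
    raise KeyError("unknown occupation: {}".format(label))

print(occupation_code("doctor/health care"))  # 6
```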

In [10]:
import numpy as np
from paddle_serving_app.local_predict import LocalPredictor

class RecallServerServicer(object):
    def __init__(self):
        self.uv_client = LocalPredictor()
        self.uv_client.load_model_config("movie_recommender/user_vector_model/serving_server_dir")

    def hash2(self, a):
        # Note: Python 3 randomizes str hashes per process unless PYTHONHASHSEED
        # is set, so these feature IDs can differ between runs
        return hash(a) % 1000000

    def get_user_vector(self):
        dic = {"userid": [], "gender": [], "age": [], "occupation": []}
        lod = [0]
        # Hard-coded example user: ID '0', male, age 23, occupation 6 (doctor/health care)
        dic["userid"].append(self.hash2('0'))
        dic["gender"].append(self.hash2('M'))
        dic["age"].append(self.hash2('23'))
        dic["occupation"].append(self.hash2('6'))
        lod.append(1)

        dic["userid.lod"] = lod
        dic["gender.lod"] = lod
        dic["age.lod"] = lod
        dic["occupation.lod"] = lod
        # Convert every feed entry to an int64 column vector
        for key in dic:
            dic[key] = np.array(dic[key]).astype(np.int64).reshape(len(dic[key]), 1)
        fetch_map = self.uv_client.predict(feed=dic, fetch=["save_infer_model/scale_0.tmp_1"], batch=True)
        return fetch_map["save_infer_model/scale_0.tmp_1"].tolist()[0]

recall = RecallServerServicer()
user_vector = recall.get_user_vector()

2021-06-23 16:29:24,262 - INFO - LocalPredictor load_model_config params: model_path:movie_recommender/user_vector_model/serving_server_dir, use_gpu:False, gpu_id:0, use_profile:False, thread_num:1, mem_optim:True, ir_optim:False, use_trt:False, use_lite:False, use_xpu: False, use_feed_fetch_ops:False


In [11]:
user_vector

[0.0,
 4.911433696746826,
 4.132595062255859,
 3.2255895137786865,
 0.0,
 4.944108963012695,
 0.0,
 0.0,
 1.27165687084198,
 3.1072912216186523,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 1.9184402227401733,
 0.0,
 0.0,
 0.0,
 4.42396354675293,
 2.0686450004577637,
 0.0]

#### 2. Searching
Pass in the user vector to recall the `TOP_K` nearest movie vectors from the collection populated above.

In [None]:
TOP_K = 20
SEARCH_PARAM = {'nprobe': 20}
status, results = milv.search(collection_name=COLLECTION_NAME, query_records=[user_vector], top_k=TOP_K, params=SEARCH_PARAM)
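Conceptually, a search under the L2 metric ranks the stored vectors by Euclidean distance to the query and keeps the `top_k` closest. The brute-force sketch below illustrates that idea on toy data in plain Python. It says nothing about how Milvus executes the search internally, and `brute_force_search` is a hypothetical name.

```python
import heapq
import math

def brute_force_search(query, ids, vectors, top_k):
    """Return the top_k (id, distance) pairs closest to query by Euclidean distance."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scored = ((movie_id, l2(query, vec)) for movie_id, vec in zip(ids, vectors))
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])

toy_ids = [10, 20, 30]
toy_vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
print(brute_force_search([0.0, 0.0], toy_ids, toy_vectors, top_k=2))
# [(10, 0.0), (30, 1.0)]
```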

#### 3. Returning Information by IDs

In [None]:
recall_results = []
for x in results[0]:
    recall_results.append(r.get("{}##movie_info".format(x.id)).decode('utf-8'))
recall_results

['{"movie_id": "760", "title": "Stalingrad (1993)", "genre": ["War"]}',
 '{"movie_id": "1350", "title": "Omen, The (1976)", "genre": ["Horror"]}',
 '{"movie_id": "1258", "title": "Shining, The (1980)", "genre": ["Horror"]}',
 '{"movie_id": "632", "title": "Land and Freedom (Tierra y libertad) (1995)", "genre": ["War"]}',
 '{"movie_id": "3007", "title": "American Movie (1999)", "genre": ["Documentary"]}',
 '{"movie_id": "2086", "title": "One Magic Christmas (1985)", "genre": ["Drama", "Fantasy"]}',
 '{"movie_id": "1051", "title": "Trees Lounge (1996)", "genre": ["Drama"]}',
 '{"movie_id": "3920", "title": "Faraway, So Close (In Weiter Ferne, So Nah!) (1993)", "genre": ["Drama", "Fantasy"]}',
 '{"movie_id": "1303", "title": "Man Who Would Be King, The (1975)", "genre": ["Adventure"]}',
 '{"movie_id": "652", "title": "301, 302 (1995)", "genre": ["Mystery"]}',
 '{"movie_id": "1605", "title": "Excess Baggage (1997)", "genre": ["Adventure", "Romance"]}',
 '{"movie_id": "1275", "title": "High
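The strings above are raw JSON. A sketch of turning them into dictionaries, while skipping any ID that has no record in the store (for which `get` returns `None`), might look like this. `fetch_movie_info` and the `FakeStore` stand-in are hypothetical; in the notebook you would pass the Redis client `r` and the IDs taken from `results[0]`.

```python
import json

def fetch_movie_info(store, movie_ids):
    """Look up each ID and return the decoded records, skipping missing keys."""
    records = []
    for movie_id in movie_ids:
        raw = store.get("{}##movie_info".format(movie_id))
        if raw is None:
            continue  # no metadata stored for this ID
        records.append(json.loads(raw.decode("utf-8")))
    return records

class FakeStore:
    """Stand-in that mimics only the get() call of the Redis client."""
    def __init__(self, data):
        self._data = data

    def get(self, key):
        return self._data.get(key)

store = FakeStore({
    "760##movie_info": b'{"movie_id": "760", "title": "Stalingrad (1993)", "genre": ["War"]}',
})
for record in fetch_movie_info(store, [760, 99999]):
    print(record["title"], record["genre"])
```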

### Conclusion

Once the recall stage is complete, its results can be further sorted using the **movie_recommender** model, and the movies with the highest similarity scores can then be recommended to users. You can try this deployable recommendation system using this [quick start](QUICK_START.md).