Recommender Systems

("Réalisez une application de recommandation de contenu")

[This project is part of the AI Engineer class on OpenClassrooms]

We are provided with a dataset from globo.com containing the metadata of 364,047 articles and 2,988,181 user interactions with these articles.

The purpose of this project is to explore various aspects of recommender systems:

  • Collaborative filtering
  • Content Based filtering
  • Hybrid filtering
  • Serverless deployment (Azure function)
  1. At first, we will conduct an EDA (01_EDA.ipynb) in order to better understand the dataset and prepare some pre-processed datasets.
  2. Then we will search for a baseline model. (02_Recommender_systems.ipynb)
  3. After that, we will try various approaches to both Collaborative and Content-based filtering (a minimal Content-based sketch is given right after this list). (02_Recommender_systems.ipynb)
  4. Next, we will build a Hybrid model based on the best Collaborative and Content-based models. (02_Recommender_systems.ipynb)
  5. Then, we will develop and deploy an Azure Function to expose the hybrid model. (02_Recommender_systems.ipynb)
  6. Finally, we create a Streamlit app to test the model. (03_Streamlit.py)
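
To give a flavour of the Content-based part, here is a minimal sketch (not the notebook's actual code; it only assumes that articles_embeddings.pickle, listed in the setup section below, holds a NumPy array with one embedding row per article) that recommends the articles closest to a given one using cosine similarity:

import pickle
import numpy as np

# Embeddings provided with the dataset: assumed to be a NumPy array with one row per article.
with open("data/news-portal-user-interactions-by-globocom/articles_embeddings.pickle", "rb") as f:
    embeddings = pickle.load(f)

def similar_articles(article_id, n_reco=5):
    """Return the n_reco articles whose embeddings are closest (cosine similarity) to the given article."""
    target = embeddings[article_id]
    scores = embeddings @ target / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(target) + 1e-12)
    scores[article_id] = -np.inf  # never recommend the article itself
    return np.argsort(scores)[::-1][:n_reco].tolist()

print(similar_articles(1234))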

Running the notebooks online

As the notebooks are sometimes too large to be displayed on GitHub (and because the hyperlinks used for navigation don't work on GitHub), note that they are also available on nbviewer.org and dagshub.com for convenience.

Setting up the project / Running the notebooks locally

In order to use this project locally, you will need to have Python and Jupyter notebook installed. Once done, we can set up the environment using the following commands:

First,

let's clone the project's GitHub repository

>>> git clone https://github.com/Valkea/OC_AI_09
>>> cd OC_AI_09

Secondly,

let's download the dataset and unzip it in the 'data' folder:

  • data/news-portal-user-interactions-by-globocom/articles_metadata.csv

  • data/news-portal-user-interactions-by-globocom/clicks/clicks_hour_XXX.csv

  • data/news-portal-user-interactions-by-globocom/articles_embeddings.pickle

and let's pull the large files with DVC (you need to install DVC prior to using the following commands):

>>> dvc remote add origin https://dagshub.com/Valkea/OC_AI_09.dvc
>>> dvc pull -r origin
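
Once the files are in place (and the Python environment below is set up), a quick sanity check can confirm that everything is readable; this is just an illustrative snippet, not part of the project code:

from pathlib import Path
import pickle
import pandas as pd

data_dir = Path("data/news-portal-user-interactions-by-globocom")

articles = pd.read_csv(data_dir / "articles_metadata.csv")  # one row per article
clicks = pd.concat(pd.read_csv(p) for p in sorted((data_dir / "clicks").glob("clicks_hour_*.csv")))  # one CSV per hour
with open(data_dir / "articles_embeddings.pickle", "rb") as f:
    embeddings = pickle.load(f)  # pre-computed article embeddings

print(articles.shape, clicks.shape, getattr(embeddings, "shape", type(embeddings)))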

Thirdly,

let's create a virtual environment and install the required Python libraries

(Linux or Mac)

>>> python3 -m venv venvP9
>>> source venvP9/bin/activate
>>> pip install -r requirements.txt

(Windows):

>>> py -m venv venvP9
>>> .\venvP9\Scripts\activate
>>> py -m pip install -r requirements.txt

Finally,

let's configure and run the virtual environment for Jupyter notebook

Install a Jupyter kernel for the virtual environment using the following commands:
>>> pip install ipykernel
>>> python -m ipykernel install --user --name=venvP9
Select the installed kernel

In order to run the various notebooks, you will need to use the virtual environment created above. So once the notebooks are opened (see below), select the venvP9 kernel before running them.

Run the jupyter notebooks

To see the notebooks, run:

>>> jupyter lab
  • 01_EDA.ipynb shows the Exploratory Data Analysis of the available files
  • 02_Recommender_systems.ipynb shows the search for a baseline model and the Collaborative, Content-based and Hybrid filtering experiments

Running the API server locally

The hybrid recommender system is deployed using an Azure function, and if I shared the secrets.txt file containing the FUNCTION_KEY with you, you can simply jump to the Streamlit test.
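
In practice, the app targets the online function when secrets.txt is available and falls back to the local server otherwise. A minimal sketch of that kind of switch is shown below; the URL pattern and the secrets.txt format are assumptions for illustration, not necessarily what 03_Streamlit.py does:

from pathlib import Path

# Use the deployed function when its key is available, the local server otherwise.
if Path("secrets.txt").exists():
    FUNCTION_KEY = Path("secrets.txt").read_text().strip()
    BASE_URL = "https://globo-reco.azurewebsites.net/api"  # standard <APP_NAME>.azurewebsites.net pattern
else:
    FUNCTION_KEY = None
    BASE_URL = "http://localhost:5000/api"  # local `func host start --port 5000` server (see below)

print("Calling:", BASE_URL)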

However, in case I didn't share the secrets.txt file with you, you can still start a local instance of the very same Azure function with the following steps:

1. Install the Azure CLI and the Azure Functions Core Tools

2. Move to the azure_function folder

>>> cd azure_function

3. Create a new virtual environment for the azure function & install the required libraries

(Linux or Mac)

>>> python3 -m venv venvP9azure
>>> source venvP9azure/bin/activate
>>> pip install -r requirements.txt

(Windows):

>>> py -m venv venvP9azure
>>> .\venvP9azure\Scripts\activate
>>> py -m pip install -r requirements.txt

4. Start the local Azure function

(venvP9azure) >>> func host start --port 5000

Once the tests are done (they are run from another terminal, see the Streamlit section below), stop the local Azure function server with CTRL+C.

Testing the model with a Streamlit app

Once you have access to the Azure function (either locally or in the cloud with the secret key), you can test some recommendations using the Streamlit user interface (from another terminal if you are already running the local Azure function server, and with the venvP9 virtual environment):

(venvP9) >>> streamlit run 03_Streamlit.py

Set the number of recommendations you want to receive, then click the button next to a user_id to get recommendations (only a tiny fraction of all users are displayed).

Once the tests are done, stop the Streamlit server with CTRL+C.

Cloud deployment

I used Azure Function to deploy this project in the cloud. So let's recall the deployment steps...

1. Initialize the folder you're working in

>>> func init FOLDER_NAME
or
>>> func init FOLDER_NAME --python

2. Set up the bare bones Azure Function

>>> cd FOLDER_NAME
>>> func new

then select HTTP trigger
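
For reference, the scaffolded HTTP trigger looks roughly like the following (Python programming model v1, i.e. an __init__.py next to a function.json; the real recommendation logic in azure_function/ is of course more involved):

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Minimal HTTP-triggered function: echo a query parameter back to the caller.
    user_id = req.params.get("user_id")
    if user_id is None:
        return func.HttpResponse("Please pass a user_id on the query string", status_code=400)
    return func.HttpResponse(f"Recommendations requested for user {user_id}")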

3. Add the required libs to the FOLDER_NAME/requirements.txt file

4. Create a virtual environment & install libs

>>> python -m venv VENV_NAME
>>> source VENV_NAME/bin/activate
>>> pip install -r requirements.txt

5. Run local Azure instance

>>> func host start
or 
>>> func host start --port 5000

6. Create & configure an Azure Function APP_NAME on the Azure Portal

7. Deploy to Azure (you need the APP_NAME Azure Function created on the portal)

>>> az login
>>> func azure functionapp publish APP_NAME --build remote

8. Grab the function URL from the top right of the function page for remote calls (Postman, cURL, Streamlit app...)

In this project I used the following parameters:

  • FOLDER_NAME: azure_function
  • VENV_NAME: venvP9azure
  • APP_NAME: globo-reco
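
With those values, a remote call to the deployed function looks roughly like this (the route and query parameters are illustrative assumptions; the function key comes from secrets.txt or from the Azure Portal):

import requests

FUNCTION_KEY = "..."  # key shared via secrets.txt, or copied from the Azure Portal

response = requests.get(
    "https://globo-reco.azurewebsites.net/api/recommend",  # hypothetical route
    params={"code": FUNCTION_KEY, "user_id": 42, "n_reco": 5},
)
print(response.status_code, response.json())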

Uninstalling the venv kernel

Once done with the project, the kernel can be listed and removed using the following commands:

>>> jupyter kernelspec list
>>> jupyter kernelspec uninstall venvp9