[This project is part of the AI Engineer curriculum on OpenClassrooms]
We are provided with a dataset from Yelp containing 6,990,280 reviews written by 1,987,897 users about 150,346 businesses of any kind. We can also use 200,100 pictures of food, drinks, interiors, exteriors and menus taken by the users.
The purpose of this project is to practice on the following three features:
- detect topics of dissatisfaction in the comments published on the platform.
- automatically label the photos posted on the platform.
- collect new reviews using the Yelp API.
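As a rough illustration of the first feature, the sketch below keeps only the low-rated reviews, which could then be fed to a topic model. It assumes the Yelp review file is in JSON Lines format (one JSON object per line) with `stars` and `text` fields; the function name and the `max_stars` threshold are illustrative choices, not taken from the notebooks.

```python
import json
from typing import Iterable, Iterator


def iter_negative_reviews(lines: Iterable[str], max_stars: float = 2.0) -> Iterator[str]:
    """Yield the text of reviews rated `max_stars` or lower (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        review = json.loads(line)
        stars = review.get("stars")
        # Reviews without a rating are skipped rather than guessed at.
        if stars is not None and stars <= max_stars:
            yield review.get("text", "")
```

A file handle can be passed directly, for example `with open("data/yelp_dataset/yelp_academic_dataset_review.json", encoding="utf-8") as f: texts = iter_negative_reviews(f)`, so the whole file never has to be loaded in memory.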
As the notebooks use hyperlinks for navigation, and because these don't work on GitHub, the notebooks are also available on nbviewer.org for convenience.
In order to use this project locally, you will need to have Python and Jupyter notebook installed. Once done, we can set up the environment by using the following commands:
Let's clone the project's GitHub repository:
>>> git clone https://github.com/Valkea/OC_AI_06
>>> cd OC_AI_06
Let's download the datasets and put them in the 'data' folder:
data/yelp_dataset/yelp_academic_dataset_business.json
data/yelp_dataset/yelp_academic_dataset_review.json
data/yelp_photos/photos.json
data/yelp_photos/photos/[the photos].jpg
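The expected layout can be checked before opening the notebooks. The following is a minimal sketch that only uses the paths listed above; the function name is hypothetical and it assumes the script is run from the repository root.

```python
from pathlib import Path
from typing import List

# Files the README asks us to place under the 'data' folder.
EXPECTED_FILES = [
    "yelp_dataset/yelp_academic_dataset_business.json",
    "yelp_dataset/yelp_academic_dataset_review.json",
    "yelp_photos/photos.json",
]
# Folder that should contain the .jpg pictures.
PHOTOS_DIR = "yelp_photos/photos"


def missing_dataset_entries(data_dir: Path = Path("data")) -> List[str]:
    """Return the expected dataset entries that are absent from `data_dir`."""
    missing = [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
    if not (data_dir / PHOTOS_DIR).is_dir():
        missing.append(PHOTOS_DIR + "/")
    return missing


if __name__ == "__main__":
    absent = missing_dataset_entries()
    print("All dataset entries found." if not absent else f"Missing: {absent}")
```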
Let's create a virtual environment and install the required Python libraries:
(Linux or Mac)
>>> python3 -m venv venvP6
>>> source venvP6/bin/activate
>>> pip install -r requirements.txt
(Windows):
>>> py -m venv venvP6
>>> .\venvP6\Scripts\activate
>>> py -m pip install -r requirements.txt
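To confirm that the interpreter in use really is the one from the virtual environment, a small check such as the one below can help. It relies on the standard `sys.prefix` / `sys.base_prefix` comparison, which covers environments created with `python -m venv`; the function name is an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points to the environment while
    # sys.base_prefix still points to the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```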
Let's register the virtual environment as a kernel for Jupyter notebook:
>>> pip install ipykernel
>>> python -m ipykernel install --user --name=venvP6
REQUIRED: let's install the spaCy model used in this project:
>>> python -m spacy download en_core_web_sm
OPTIONAL: if you encounter a problem after the Streamlit installation, the following line might solve it:
>>> echo fs.inotify.max_user_watches=524288 | sudo tee /etc/sysctl.d/40-max-user-watches.conf && sudo sysctl --system
OPTIONAL: if TF-Lite is not correctly installed, try the following line:
>>> pip install --extra-index-url https://google-coral.github.io/py-repo/ tflite_runtime
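Before launching the notebooks, it can be useful to check which of the packages mentioned above are importable from the active environment. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name and the list of modules to check are assumptions made for illustration (`tflite_runtime` is only needed when the optional step applies).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the install commands in this README.
    wanted = ["spacy", "en_core_web_sm", "streamlit", "tflite_runtime"]
    absent = missing_modules(wanted)
    print("Everything is importable." if not absent else f"Not importable: {absent}")
```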
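In order to run the notebooks, you will need to use the virtual environment created above. Once a notebook is opened with one of the commands below, switch to the venvP6 kernel registered earlier before running it (in Jupyter notebook this is usually done from the Kernel > Change kernel menu).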
>>> jupyter notebook 01_Topic_Modelling.ipynb
>>> jupyter notebook 02_Image_Classification.ipynb
>>> streamlit run 03_Streamlit.py
⚠️ Unfortunately, to run the Streamlit app with all its features, we need the models, and as they are quite heavy they aren't pushed to GitHub.
So in order to run this app, there are three options:
Clone the models from DagsHub using DVC (you need to install DVC prior to using the following command line):
>>> dvc remote add origin https://dagshub.com/Valkea/OC_AI_06.dvc
>>> dvc pull -r origin
or
Run both notebooks to export the models (they are exported automatically), then convert the VGG16 model they produce to a TF-Lite model:
>>> cd models
>>> python model_convert_to_TFLite.py vgg16_clf1.h5 vgg16_clf1.epochXX-categorical_accuracyX.XX.hdf5 # replace XX with the best produced model values
>>> mv vgg16_clf1.tflite vgg_15_clf.tflite
>>> cd ..
or
Run the Streamlit app without the models; the app will work, but some features will be missing.
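For the second option above, the general idea of a Keras to TF-Lite conversion can be sketched as follows. This is not the content of `model_convert_to_TFLite.py`, whose arguments and internals are not shown here; it is a minimal example that assumes TensorFlow 2 is installed and that the file passed in is a complete saved Keras model. Both function names are hypothetical.

```python
from pathlib import Path


def tflite_path_for(keras_path: str) -> Path:
    """Derive the output name, e.g. vgg16_clf1.h5 -> vgg16_clf1.tflite."""
    return Path(keras_path).with_suffix(".tflite")


def convert_keras_to_tflite(keras_path: str) -> Path:
    """Load a saved Keras model and write its TF-Lite version next to it."""
    # Imported lazily so the path helper stays usable without TensorFlow.
    import tensorflow as tf

    model = tf.keras.models.load_model(keras_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    output_path = tflite_path_for(keras_path)
    output_path.write_bytes(converter.convert())
    return output_path
```

The resulting file would still need to be renamed to the name expected by the app, as done by the `mv` command in the second option.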
To collect new reviews using the Yelp API, run the scraping script:
>>> python 02_API_Scrapping.py # this will gather "200" new reviews from "France" and save them to "data/api_export.csv"
or
>>> python 02_API_Scrapping.py --help # this will return the full list of available parameters
or
>>> python 02_API_Scrapping.py -n=100 -l=Chicago -p=data/my_file.csv -v=3 # This is an example
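A command-line interface accepting the options used above could look like the following `argparse` sketch. Only the short flags (`-n`, `-l`, `-p`, `-v`) and the defaults for the number of reviews, the location and the output path come from the commands above; the long option names, the help texts and the default verbosity are assumptions, and the actual script may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the README (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Collect new reviews through the Yelp API (sketch)."
    )
    # Long option names below are hypothetical.
    parser.add_argument("-n", "--number", type=int, default=200,
                        help="number of reviews to gather")
    parser.add_argument("-l", "--location", default="France",
                        help="location to search businesses in")
    parser.add_argument("-p", "--path", default="data/api_export.csv",
                        help="CSV file the reviews are saved to")
    # The default verbosity is an assumption.
    parser.add_argument("-v", "--verbose", type=int, default=1,
                        help="verbosity level")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would gather {args.number} reviews from {args.location} into {args.path}")
```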
Once done with the project, the kernel can be listed and removed using the following commands:
>>> jupyter kernelspec list
>>> jupyter kernelspec uninstall venvp6
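The same listing can be obtained from Python, which is handy for checking that the kernel is really gone. The sketch below calls `jupyter kernelspec list --json` and reads the `kernelspecs` mapping from its output; the function names are illustrative. Jupyter appears to store kernel names in lower case, which would explain why the kernel registered as `venvP6` is removed as `venvp6`.

```python
import json
import subprocess
from typing import List


def kernel_names(kernelspec_json: str) -> List[str]:
    """Extract the sorted kernel names from `jupyter kernelspec list --json` output."""
    return sorted(json.loads(kernelspec_json).get("kernelspecs", {}))


def installed_kernels() -> List[str]:
    """Ask Jupyter for the registered kernels (requires the `jupyter` command)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return kernel_names(result.stdout)


if __name__ == "__main__":
    print("venvp6 still registered:", "venvp6" in installed_kernels())
```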