hw_cookiecutter

Google Store app rating prediction

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Project based on the cookiecutter data science project template. #cookiecutterdatascience

Deploy MLflow tracking server of scenario 4 in Docker

1. Storing environment variables

These variables are stored in the .env file, which must be added to .gitignore for safety reasons, while .env.example is tracked under VCS. The .env.example file stores only the variable keys, not their values!
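
As a sketch, a .env.example might list the keys that appear later in this guide, with empty values (the exact set of keys depends on your services):

```shell
MINIO_ROOT_USER=
MINIO_ROOT_PASSWORD=
MINIO_S3_BUCKET=
POSTGRES_DB=
POSTGRES_USER=
POSTGRES_PASSWORD=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
```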

2. Source docker files

Create a directory, for example Docker, under the project root.

$ mkdir Docker

3. Services

The core of the deployed app corresponds to the MLflow tracking server of scenario 4.

Thus, the app will consist of the following services:

  • S3 storage (minio) host;
  • Nginx proxy server of S3 host for load balance;
  • MLflow tracking server;
  • PostgreSQL database;
  • PgAdmin server for postgresql db.
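
A minimal docker-compose.yaml skeleton for these services might look like the following sketch (service and image names are illustrative; the full configuration is filled in below):

```yaml
version: "3.7"

services:
  minio:       # S3-compatible object storage
    image: minio/minio
  nginx:       # proxy / load balancer in front of minio
    image: nginx
  mlflow:      # tracking server, built from a local Dockerfile
    build: ./Docker/mlflow_image
  postgres:    # backend store for MLflow metadata
    image: postgres
  pgadmin:     # web UI for the database
    image: dpage/pgadmin4
```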

4. docker-compose.yaml

This file defines all services and the images that back them, so the whole stack can be started with a single command.

Create docker-compose.yaml file and describe all services:

$ vim docker-compose.yaml

Note: docker-compose can use environment variables. For this, the .env file must be located in the same directory as docker-compose.yaml.
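
For instance, values from .env can be interpolated into a service definition with the ${...} syntax (a sketch, assuming the POSTGRES_* keys from the .env file):

```yaml
services:
  postgres:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
```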

To check docker-compose.yaml file run:

$ docker-compose config

Minio

Command example of running standalone MinIO on Docker:

$ docker run -p 9000:9000 -p 9001:9001 \
  -e "MINIO_ROOT_USER=AKIAIOSFODNN7EXAMPLE" \
  -e "MINIO_ROOT_PASSWORD=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  quay.io/minio/minio server /data --console-address ":9001"

If the above command is called without setting the root user and root password, they default to minioadmin:minioadmin respectively.

Important: after starting the minio service, it is necessary to create an S3 bucket whose name equals the environment variable MINIO_S3_BUCKET.

Note: the variables MINIO_ROOT_USER and MINIO_ROOT_PASSWORD are equivalent to MINIO_ACCESS_KEY and MINIO_SECRET_KEY respectively. MinIO's warning recommends using the first pair.

Note: the official documentation uses the minio image quay.io/minio/minio, but starting docker-compose results in downloading two images related to the service: quay.io/minio/minio and minio/minio. Using the minio/minio image alone is probably sufficient to start the service.
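
Put together, the compose entry for minio might look like this sketch (the data volume path is an assumption; credentials come from .env as described above):

```yaml
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD}
    volumes:
      - ./Docker/minio/data:/data   # persist objects on the host
```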

Nginx

MinIO's official documentation uses an nginx server as a proxy. Example configuration files can be found at: https://docs.min.io/docs/deploy-minio-on-docker-compose.

Create an nginx.conf file in the ./Docker directory and fill it with the required settings (an example can be found at the link mentioned above).

Note: the :ro suffix in the nginx volume mapping ./Docker/nginx.conf:/etc/nginx/nginx.conf:ro mounts the configuration file read-only, so the container cannot modify it.
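
A minimal nginx.conf sketch that proxies requests to the minio service could look like this (the upstream name and the minio:9000 address are assumptions based on the compose service name and MinIO's default port):

```nginx
events {}

http {
    upstream minio_s3 {
        server minio:9000;   # compose service name resolves via the docker network
    }

    server {
        listen 80;
        location / {
            proxy_set_header Host $http_host;
            proxy_pass http://minio_s3;
        }
    }
}
```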

PostgreSQL and PGadmin

The configuration can be seen in docker-compose.yaml; it is pretty standard.

  1. Important: before starting, the mounted host volume MUST belong to user id 5050 and group id 5050, which is the id of the pgadmin user/group inside the container:

    $ mkdir ./Docker/pgadmin
    $ sudo chown -R 5050:5050 ./Docker/pgadmin
    
  2. After starting the containers, add a new server in PGadmin, which is accessible at 127.0.0.1:5050 as defined in docker-compose.yaml:

    • set the Host name/address field to the IP address of the postgresql database. To find the required IP, first look up the container id of the postgresql service:

      $ docker ps
      

      Then use the docker inspect command to list the container's configuration. The required IP address can be found in the Networks section:

      $ docker inspect 3c982a5b3e76
      
    • the fields Maintenance database, Username, and Password must equal the environment values of the postgresql service: POSTGRES_DB, POSTGRES_USER, and POSTGRES_PASSWORD respectively.

Note: PGadmin can also be proxied through Nginx, like Minio (see the official documentation).
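
A compose sketch for the pgadmin service, consistent with the 127.0.0.1:5050 address and the chown step above, might look like this (the PGADMIN_* keys are assumed to be defined in .env):

```yaml
services:
  pgadmin:
    image: dpage/pgadmin4
    ports:
      - "5050:80"               # host 5050 -> pgadmin's internal port 80
    environment:
      PGADMIN_DEFAULT_EMAIL: ${PGADMIN_DEFAULT_EMAIL}
      PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_DEFAULT_PASSWORD}
    volumes:
      - ./Docker/pgadmin:/var/lib/pgadmin   # must be owned by uid/gid 5050
```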

MLflow tracking server

  1. Create a Dockerfile in ./Docker/mlflow_image/ that is based on the python 3.9 image and installs the mlflow, boto3 and psycopg2 Python packages.

  2. Include in docker-compose.yaml a build directive that builds the image from this Dockerfile.

Note: MLflow obtains credentials to access S3 from ~/.aws/credentials or from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, depending on which are available.
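
A minimal Dockerfile sketch for such an image follows. Note that psycopg2-binary is substituted here for psycopg2 to avoid compiling against libpq inside the image; versions are left unpinned, though pinning them is advisable for reproducibility:

```dockerfile
FROM python:3.9

# tracking server, S3 client, and PostgreSQL driver
RUN pip install --no-cache-dir mlflow boto3 psycopg2-binary
```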

Built-in deployment MLflow models

Local environment

To run MLflow server for model serving type:

$ mlflow models serve --no-conda -m path_to_model -h 0.0.0.0 -p 8001

The argument path_to_model can be obtained from the MLflow UI, in the Full Path field under a specific experiment run.

Important: before running the server, the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_S3_BUCKET_NAME need to be set to proper values:

$ export AWS_ACCESS_KEY_ID=minioadmin
$ export AWS_SECRET_ACCESS_KEY=minioadmin
$ export AWS_S3_BUCKET_NAME=app-ratings

While the server is running, it is possible to send a POST request to http://127.0.0.1:8001/invocations with a JSON body that has "columns" and "data" properties whose values are arrays.
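
As a sketch, the request body can be built as follows (the column names and values are hypothetical and must match the input schema of your trained model):

```python
import json

# Hypothetical feature columns -- replace with your model's input schema.
payload = {
    "columns": ["Category", "Reviews", "Size", "Installs"],
    "data": [["GAME", 15000, 11000000, 1000000]],
}
body = json.dumps(payload)

# The resulting JSON string can then be POSTed to the serving endpoint, e.g.:
#   curl -X POST http://127.0.0.1:8001/invocations \
#        -H "Content-Type: application/json" -d "$BODY"
print(body)
```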

Docker environment

$ mlflow models build-docker -m path_to_model -n image_name
$ docker run -p p_out:p_in image_name

Note that -p must come before the image name: anything placed after the image name is passed to the container as its command, not to docker run.

External tools for deployment MLflow models

FastAPI

  1. Create a Dockerfile in Docker/fastapi_model_service.

  2. Create directory app in ./Docker/fastapi_model_service. Place application code in this directory.

Note: Docker cannot copy files from outside the build context (the directory containing the Dockerfile).

  3. Include the application in docker-compose.yaml.
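
The compose entry for such a service might look like this sketch (the service name, host port and build path are assumptions based on the directory layout above):

```yaml
services:
  fastapi_model_service:
    build: ./Docker/fastapi_model_service
    ports:
      - "8002:80"   # hypothetical host port -> app port inside the container
    environment:
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}         # S3 credentials for loading the model
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
```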

General concepts of deployment

Some of the important concepts are:

  • Security - HTTPS
  • Running on startup
  • Restarts
  • Replication (the number of processes running)
  • Memory
  • Previous steps before starting

For details read https://fastapi.tiangolo.com/deployment/concepts/
