The idea is to build multiple models and ensemble them into a single, more stable and stronger model.
In the end, your pipeline will have the following structure:
                     +-------------------+
                     | data/iris.csv.dvc |
                     +-------------------+
                               |
                         +-----------+
                         | split.dvc |
                         +-----------+
                               |
                       +---------------+
                       | featurize.dvc |-------------------------+
                       +---------------+                         |
                               |                                 |
           +-------------------+-------------------+             |
           |                   |                   |             |
+--------------------+ +---------------+ +-------------------+   |
| train_logistic.dvc | | train_svc.dvc | | train_forrest.dvc |   |
+--------------------+ +---------------+ +-------------------+   |
           |                   |                   |             |
           +-------------------+-------------------+             |
                               |                                 |
                    +--------------------+                       |
                    | train_ensemble.dvc |-----------------------+
                    +--------------------+
The stages run in this order: data -> train/test split -> feature extraction -> three models -> ensemble model (the ensemble stage also reads the extracted features directly). And that's it!
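DVC works out the execution order from the dependencies (-d) and outputs (-o, -O, -M) each stage declares, rather than from the order in which the stages were defined. As an illustration of the idea only (not DVC's own implementation), the stage graph above can be written as a map from each stage to the stages it depends on and sorted topologically with Python's standard graphlib module (Python 3.9+). The STAGES dictionary and execution_order function are hypothetical names.

```python
# Illustrative sketch of dependency ordering; not how DVC is implemented internally.
from graphlib import TopologicalSorter

# Each stage maps to the set of stages whose outputs it consumes,
# mirroring the -d/-o relationships in the dvc run commands below.
STAGES = {
    "split.dvc": {"data/iris.csv.dvc"},
    "featurize.dvc": {"split.dvc"},
    "train_logistic.dvc": {"featurize.dvc"},
    "train_svc.dvc": {"featurize.dvc"},
    "train_forrest.dvc": {"featurize.dvc"},
    "train_ensemble.dvc": {
        "featurize.dvc",
        "train_logistic.dvc",
        "train_svc.dvc",
        "train_forrest.dvc",
    },
}


def execution_order(stages):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(stages).static_order())


if __name__ == "__main__":
    for stage in execution_order(STAGES):
        print(stage)
```

Nothing constrains the three train_* stages relative to one another, so any ordering among them is valid; the only hard constraints are that featurize.dvc runs before them and train_ensemble.dvc runs last.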
With data/iris.csv versioned in DVC, you can start building your first pipeline.
Define Stage 1 => split.dvc, i.e. the train/test split:
dvc run -f split.dvc \
    -d data/iris.csv \
    -d src/train_test_split.py \
    -o data/split \
    python src/train_test_split.py -i "data/iris.csv" -o "data/split/"
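The contents of src/train_test_split.py are not shown. A hypothetical sketch of such a stage script follows: it reads the CSV named by -i, shuffles the rows with a fixed seed so that re-running the stage yields the same split, and writes train.csv and test.csv into the directory named by -o. The 20% test fraction, the seed, the header-row assumption, and the output file names are all assumptions.

```python
# Hypothetical sketch of a split stage; the real src/train_test_split.py may differ.
import argparse
import csv
import os
import random


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical train/test split stage")
    parser.add_argument("-i", "--input", required=True, help="input CSV, e.g. data/iris.csv")
    parser.add_argument("-o", "--output", required=True, help="output directory, e.g. data/split/")
    args = parser.parse_args(argv)

    with open(args.input, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the CSV has a header row
        rows = list(reader)

    train, test = split_rows(rows)
    os.makedirs(args.output, exist_ok=True)
    for name, part in (("train.csv", train), ("test.csv", test)):
        with open(os.path.join(args.output, name), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)

# In a real script, call main() under an `if __name__ == "__main__":` guard.
```

Run as in the dvc run command above, this would populate data/split/ with train.csv and test.csv. The fixed seed matters: if the split changed on every run, every downstream stage would see different data even when nothing else had changed.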
Define Stage 2 => featurize.dvc, i.e. feature engineering:
dvc run -f featurize.dvc \
    -d data/split \
    -d src/feature_engineering.py \
    -o data/features \
    -o data/models/pca/model.gz \
    -O data/models/pca/params.yml \
    -M data/models/pca/metrics.csv \
    python src/feature_engineering.py -i "data/split/" -o "data/features/" -o "data/models/pca/"
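This stage persists a PCA model (data/models/pca/model.gz) alongside the features, which suggests the features are projections onto principal components. The actual code in src/feature_engineering.py is not shown; the dependency-free sketch below is an assumption, not the project's implementation. It fits PCA on training rows by power iteration with deflation and projects rows onto the fitted components. In practice a library implementation such as scikit-learn's PCA would normally be used.

```python
# Hypothetical, dependency-free PCA sketch; not the project's src/feature_engineering.py.
import math
import random


def _mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _covariance(rows, mean):
    """Sample covariance matrix of rows (a list of equal-length numeric lists)."""
    n, d = len(rows), len(mean)
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def _matvec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def fit_pca(rows, n_components, iters=500, seed=0):
    """Return (mean, components), finding each component by power iteration."""
    mean = _mean(rows)
    cov = _covariance(rows, mean)
    d = len(mean)
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Repeatedly applying the covariance matrix to a random positive start vector
        # typically converges to the eigenvector with the largest eigenvalue.
        v = _normalize([rng.random() + 0.1 for _ in range(d)])
        for _ in range(iters):
            v = _normalize(_matvec(cov, v))
        eigval = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components


def transform(rows, mean, components):
    """Project centred rows onto the fitted components."""
    return [[sum((x - m) * c for x, m, c in zip(row, mean, comp)) for comp in components]
            for row in rows]
```

Fitting on the training split only and reusing the same (mean, components) pair for the test split keeps test data from leaking into the features; that fitted pair is the kind of object such a stage would save as model.gz.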
Define Stage 3.a => train_logistic.dvc, i.e. fit a logistic regression model:
dvc run -f train_logistic.dvc \
    -d src/logistic_regression.py \
    -d data/features \
    -o data/models/logistic/model.gz \
    -O data/models/logistic/params.yml \
    -M data/models/logistic/metrics.csv \
    python src/logistic_regression.py -i "data/features/" -o "data/models/logistic/"
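src/logistic_regression.py is not shown. For illustration, here is a hypothetical, dependency-free multinomial (softmax) logistic regression trained with full-batch gradient descent on the cross-entropy loss, which handles the three iris classes directly. Labels are assumed to be integers 0..n_classes-1; function names, the learning rate, and the epoch count are assumptions. A real project would more typically rely on a library implementation such as scikit-learn's LogisticRegression.

```python
# Hypothetical sketch; the real src/logistic_regression.py is not shown and may differ.
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def _scores(W, x):
    xb = list(x) + [1.0]  # append a constant 1 so the last weight acts as the bias
    return [sum(w * v for w, v in zip(row, xb)) for row in W]


def fit_logistic(X, y, n_classes, lr=0.1, epochs=500):
    """Multinomial logistic regression via full-batch gradient descent on cross-entropy.

    Returns W with one row per class and len(X[0]) + 1 columns (last column = bias).
    """
    d, n = len(X[0]), len(X)
    W = [[0.0] * (d + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            p = softmax(_scores(W, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)  # d(loss)/d(score_k)
                for j in range(d + 1):
                    grad[k][j] += err * xb[j] / n
        W = [[w - lr * g for w, g in zip(Wk, gk)] for Wk, gk in zip(W, grad)]
    return W


def predict_logistic(W, X):
    """Return the index of the highest-scoring class for each row."""
    return [max(range(len(W)), key=_scores(W, x).__getitem__) for x in X]
```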
Define Stage 3.b => train_svc.dvc, i.e. fit a linear SVC model:
dvc run -f train_svc.dvc \
    -d src/linear_svc.py \
    -d data/features \
    -o data/models/svc/model.gz \
    -O data/models/svc/params.yml \
    -M data/models/svc/metrics.csv \
    python src/linear_svc.py -i "data/features/" -o "data/models/svc/"
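src/linear_svc.py is not shown either; its name suggests a linear support vector classifier. The hypothetical sketch below trains one binary linear SVM per class (one-vs-rest) by subgradient descent on the L2-regularised hinge loss and predicts the class with the highest decision score. This is a swapped-in, simplified technique: scikit-learn's LinearSVC, for example, uses the liblinear solver and the squared hinge loss by default. All names and hyperparameters are assumptions.

```python
# Hypothetical sketch; the real src/linear_svc.py is not shown and may differ.


def _score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_binary_svm(X, t, lam=0.01, lr=0.1, epochs=500):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss; t holds labels in {-1, +1}."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for x, tk in zip(X, t):
            if tk * _score(w, b, x) < 1:  # hinge term is active (margin below 1)
                for j in range(d):
                    gw[j] -= tk * x[j] / n
                gb -= tk / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def fit_linear_svc(X, y, n_classes, **kwargs):
    """One-vs-rest: train one binary SVM per class (that class = +1, all others = -1)."""
    return [fit_binary_svm(X, [1 if label == k else -1 for label in y], **kwargs)
            for k in range(n_classes)]


def predict_linear_svc(models, X):
    """Predict the class whose binary SVM gives the highest decision score."""
    return [max(range(len(models)), key=lambda k: _score(*models[k], x)) for x in X]
```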
Define Stage 3.c => train_forrest.dvc, i.e. fit a random forest model:
dvc run -f train_forrest.dvc \
    -d src/random_forrest.py \
    -d data/features \
    -o data/models/r_forrest/model.gz \
    -O data/models/r_forrest/params.yml \
    -M data/models/r_forrest/metrics.csv \
    python src/random_forrest.py -i "data/features/" -o "data/models/r_forrest/"
Define Stage 4 => train_ensemble.dvc, i.e. create an ensemble model:
dvc run -f train_ensemble.dvc \
    -d src/ensemble.py \
    -d data/features \
    -d data/models/logistic/model.gz \
    -d data/models/svc/model.gz \
    -d data/models/r_forrest/model.gz \
    -o data/models/ensemble/model.gz \
    -O data/models/ensemble/params.yml \
    -M data/models/ensemble/metrics.csv \
    python src/ensemble.py -i "data/features/" -m "data/models/" -o "data/models/ensemble/"
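How src/ensemble.py combines the three base models is not shown. One simple, common choice is hard (majority) voting over their predicted labels; the hypothetical sketch below assumes that approach and adds an accuracy helper of the kind a stage might report in its metrics file. Stacking (training a meta-model on the base models' outputs) would be another option, and could also explain why this stage depends on data/features. Function names and the tie-breaking rule are assumptions.

```python
# Hypothetical sketch; the real src/ensemble.py may combine models differently.
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine equal-length label lists (one per model) by hard majority voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break: take the label of the earliest model whose label has the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy predictions for illustration only, not real model output.
    logistic = [0, 1, 2, 2]
    svc = [0, 1, 1, 2]
    forest = [1, 1, 2, 0]
    ensemble = majority_vote([logistic, svc, forest])
    print(ensemble)  # [0, 1, 2, 2]
    print(accuracy([0, 1, 2, 2], ensemble))  # 1.0
```

With three base models and three classes, a three-way disagreement is possible; here it goes to the first model in the list, an arbitrary but deterministic choice.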