Mayo_Stroke_Blood_Clot_Origin

The goal of this project is to classify the blood clot origins in ischemic stroke. Using whole slide digital pathology images, you'll build a model that differentiates between the two major acute ischemic stroke (AIS) etiology subtypes: cardiac and large artery atherosclerosis.

The model is trained on Azure, reading the data directly from this repo via the DagsHub streaming client.

This repo contains the data and code for the Kaggle competition https://www.kaggle.com/competitions/mayo-clinic-strip-ai.

Tools Used

  • Dagshub Streaming Client
  • Azure ML-SDK
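
The streaming client lets the training code read files from this repo on demand instead of downloading everything up front. A minimal sketch, assuming the `dagshub` Python package is installed; the repo URL below is a placeholder to be replaced with this repo's actual URL:

```python
# Sketch: mount the repo so that ordinary file reads fetch data on demand.
# The repo URL is a placeholder -- substitute this repository's real URL.
from dagshub.streaming import install_hooks

install_hooks(repo_url="https://dagshub.com/<user>/Mayo_Stroke_Blood_Clot_Origin")

# After install_hooks(), plain open()/pandas calls transparently stream
# DVC-tracked files from DagsHub instead of requiring a full `dvc pull`.
import pandas as pd

labels = pd.read_csv("data/raw/train.csv")
```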

About the Data

The data originally provided for the challenge is in TIFF format and totals 356 GB. For this analysis, we downscale the images and store them in PNG format. The downscaling code lives in the data/raw/train/ folder of this repo, and the downscaled images can be found here. To map each image to its class, the competition provides a CSV file with information about the data, which can be found in data/raw/train.csv.

In addition, this repo contains the pretrained EfficientNet models, which can be found in the data/raw/Pretrained_Efficient_Models/ folder.
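
The repo's actual downscaling code is in data/raw/train/; as a rough illustration of the idea, here is a minimal Pillow sketch (function names are hypothetical, and a 1024 px longest side is an assumed target):

```python
from pathlib import Path

def target_size(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return (w, h) scaled so the longer side is at most max_side, keeping aspect ratio."""
    scale = max_side / max(width, height)
    if scale >= 1:
        return width, height  # already small enough
    return max(1, round(width * scale)), max(1, round(height * scale))

def downscale_tiff_to_png(src: Path, dst_dir: Path, max_side: int = 1024) -> Path:
    """Downscale one whole-slide TIFF and save it as a PNG."""
    from PIL import Image  # Pillow; imported lazily so the size math above stands alone
    Image.MAX_IMAGE_PIXELS = None  # whole-slide images exceed PIL's default size check
    with Image.open(src) as im:
        im = im.convert("RGB")
        im.thumbnail(target_size(*im.size, max_side=max_side))
        dst_dir.mkdir(parents=True, exist_ok=True)
        out = dst_dir / (src.stem + ".png")
        im.save(out)
    return out
```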

Setting Up the Environment

  1. Create a virtual environment with Python 3.9.13.
conda create -n azure_ml python=3.9.13
  2. Install the following libraries in the environment.
conda activate azure_ml  # enter the conda environment
pip install azureml-sdk[notebooks,tensorboard,interpret]

  3. Clone the repo.

  4. The code can be found in notebooks/01-aiswarya-ramachandran-connecting-to-azure-ml-sdk.ipynb. This notebook contains the complete code and internally creates the scripts needed to train the model on Azure.

Steps Involved

  1. Install the DagsHub client
  2. Create a Workspace on Azure
  3. Create a GPU compute target on which to run the training
  4. Create the training scripts and put them into a folder so they can be pushed to Azure. The training scripts read the data through the DagsHub streaming client, since the data itself is not pushed to Azure; to enable this, the scripts must include a step that clones the Git repo on Azure.
  5. Create an Environment listing the conda and pip dependencies that need to be installed on Azure
  6. Create an experiment and a run config that carries all the information needed for the training to run on Azure
  7. Monitor the metrics
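
The steps above can be sketched with the v1 azureml-sdk roughly as follows; compute, environment, and script names here are illustrative assumptions, not the notebook's exact values:

```python
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
from azureml.core.compute import AmlCompute, ComputeTarget

# Step 2: connect to the workspace (reads config.json downloaded from the portal)
ws = Workspace.from_config()

# Step 3: provision a GPU compute target
compute = ComputeTarget.create(
    ws, "gpu-cluster",
    AmlCompute.provisioning_configuration(vm_size="Standard_NC6", max_nodes=1),
)
compute.wait_for_completion(show_output=True)

# Step 5: environment with the pip dependencies to install on Azure
env = Environment.from_pip_requirements("train-env", "requirements.txt")

# Step 6: experiment + run config pointing at the folder of training scripts
run_cfg = ScriptRunConfig(
    source_directory="train_scripts",
    script="train.py",
    compute_target=compute,
    environment=env,
)
run = Experiment(ws, "stroke-clot-classification").submit(run_cfg)

# Step 7: stream logs and wait for the run to finish
run.wait_for_completion(show_output=True)
```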

An article describing the entire process can be found at https://hackernoon.com/image-classification-on-azure-with-dagshub-direct-data-access
