Mayo_Stroke_Blood_Clot_Origin

The goal of this project is to classify the blood clot origins in ischemic stroke. Using whole slide digital pathology images, you'll build a model that differentiates between the two major acute ischemic stroke (AIS) etiology subtypes: cardiac and large artery atherosclerosis.

The model is trained on Azure, reading data directly from this repo via the DagsHub streaming client.

This repo contains the data and code for the Kaggle competition https://www.kaggle.com/competitions/mayo-clinic-strip-ai.

Tools Used

  • DagsHub Streaming Client (see the sketch below)
  • Azure ML SDK
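
A minimal sketch of how a training script can read repo files through the streaming client (it assumes the repo has already been cloned and that the script runs from the repo root; the image filename is illustrative):

    # Sketch: stream repo data on demand instead of downloading the whole dataset.
    import pandas as pd
    from PIL import Image
    from dagshub.streaming import install_hooks

    install_hooks()  # patch file access so repo files are fetched on demand

    # After install_hooks(), ordinary reads pull the files from DagsHub storage.
    labels = pd.read_csv("data/raw/train.csv")         # image id -> etiology label
    img = Image.open("data/raw/train/some_image.png")  # illustrative filename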

About the Data

The original competition data is in TIFF format and totals 356 GB. For this analysis, we downscale the images and store them in PNG format. The code for this downscaling can be found in the data/raw/train/ folder of this repo, and the downscaled images can be found here. To map each image to its class, the competition provides a CSV file with metadata about the data, which can be found at data/raw/train.csv.
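
The sketch below shows the same downscaling idea with pyvips, which streams tiles so the full-size TIFFs never need to fit in memory; the library choice, target width, and file names are assumptions, not necessarily what the repo's script uses:

    # Sketch: shrink one whole-slide TIFF to a small PNG.
    import pyvips

    def downscale(tiff_path: str, png_path: str, width: int = 1024) -> None:
        thumb = pyvips.Image.thumbnail(tiff_path, width)  # keeps aspect ratio
        thumb.write_to_file(png_path)

    downscale("data/raw/train/some_image.tif", "some_image.png")  # illustrative paths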

In addition, this repo contains pretrained EfficientNet models, which can be found in the data/raw/Pretrained_Efficient_Models/ folder.
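
These weights can be loaded offline on the Azure compute, for example with the efficientnet_pytorch package; the package choice and the checkpoint filename below are assumptions, so check the folder for the actual files:

    # Sketch: build an EfficientNet without downloading weights, then load a local checkpoint.
    import torch
    from efficientnet_pytorch import EfficientNet

    model = EfficientNet.from_name("efficientnet-b0")  # architecture only, no download
    state_dict = torch.load(
        "data/raw/Pretrained_Efficient_Models/efficientnet-b0.pth",  # hypothetical filename
        map_location="cpu",
    )
    model.load_state_dict(state_dict)
    model.eval()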

Setting Up the Environment

  1. Create a virtual environment with Python 3.9.13.
conda create -n azure_ml python=3.9.13
  2. Install the required libraries in the environment:
conda activate azure_ml  # activate the conda environment
pip install "azureml-sdk[notebooks,tensorboard,interpret]"

  3. Clone the repo.

  4. The code can be found in notebooks/01-aiswarya-ramachandran-connecting-to-azure-ml-sdk.ipynb. This notebook contains the complete code and internally creates the scripts needed to train the model on Azure (one way to do this is sketched below).
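
For reference, a common way for a notebook to write out such training scripts is Jupyter's %%writefile cell magic; the folder and file names below are placeholders, not necessarily the ones the notebook uses:

    %%writefile training_scripts/train.py
    # Placeholder training entry point; the real script created by the notebook
    # clones the repo, sets up the DagsHub streaming client, and trains the model.
    print("training...")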

Steps Involved

  1. Install the DagsHub client.
  2. Create a Workspace on Azure.
  3. Create a GPU compute target on which to run the training.
  4. Create the training scripts and put them all in one folder so they can be pushed to Azure. The scripts read the data through the DagsHub streaming client, since the data itself is not pushed to Azure; to make this work, they must include a step that clones the git repo on the Azure compute.
  5. Create an Environment listing the conda and pip dependencies that need to be installed on Azure.
  6. Create an Experiment and a run configuration that bundles everything needed to run the training on Azure (a sketch of these steps follows this list).
  7. Monitor the metrics.
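
A condensed sketch of steps 2 to 7 using the azureml-sdk (v1) Python API; the workspace lookup, cluster name, VM size, package list, script folder, and experiment name are placeholders rather than the exact values used in the notebook:

    # Sketch: provision compute, define an environment, and submit a training run.
    from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.conda_dependencies import CondaDependencies

    # Step 2: connect to a workspace (config.json downloaded from the Azure portal).
    ws = Workspace.from_config()

    # Step 3: provision a GPU compute target.
    gpu_config = AmlCompute.provisioning_configuration(vm_size="Standard_NC6", max_nodes=1)
    gpu_cluster = ComputeTarget.create(ws, "gpu-cluster", gpu_config)
    gpu_cluster.wait_for_completion(show_output=True)

    # Step 5: list the conda/pip dependencies to install on Azure.
    env = Environment("strip-ai-env")
    env.python.conda_dependencies = CondaDependencies.create(
        pip_packages=["torch", "torchvision", "efficientnet_pytorch", "dagshub", "pandas"]
    )

    # Step 6: bundle scripts, compute, and environment into a run config and submit.
    # train.py is expected to clone the repo and call the streaming client (step 4)
    # before reading any data.
    src = ScriptRunConfig(
        source_directory="training_scripts",
        script="train.py",
        compute_target=gpu_cluster,
        environment=env,
    )
    run = Experiment(workspace=ws, name="mayo-strip-ai").submit(src)

    # Step 7: monitor metrics while the run executes.
    run.wait_for_completion(show_output=True)
    print(run.get_metrics())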

An article describing the entire process can be found at https://hackernoon.com/image-classification-on-azure-with-dagshub-direct-data-access
