Brain-Tumor-Classification

This repository contains the code and configuration files for a Brain Tumor Detection MLOps project. The project includes data versioning with DVC, an Airflow data pipeline, and Docker for containerization.

Introduction

Brain tumors are a significant health challenge, with approximately 24,810 adults in the United States diagnosed in 2023. The complexity and variability of brain tumors make accurate diagnosis difficult, especially in regions lacking skilled medical professionals. This project leverages machine learning to develop an end-to-end ML pipeline for automated brain tumor detection, aiming to provide scalable, reliable, and timely diagnostic support.

Dataset Information

The dataset combines MRI images from three sources: figshare, SARTAJ, and Br35H. It includes 7023 JPEG images of human brains, categorized into four classes: glioma, meningioma, no tumor, and pituitary.

  • Dataset Name: Brain Tumor MRI Images
  • Size: 7023 images
  • Format: JPEG
  • Classes: Glioma, Meningioma, No Tumor, Pituitary
  • Sources: figshare, SARTAJ, Br35H

All data used are sourced from publicly available datasets with proper usage permissions.
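
For a quick sanity check after pulling the data, here is a minimal sketch that counts images per class. It assumes the data/Training and data/Testing layout shown in the project structure below (one subfolder per class) and a .jpg extension; both are assumptions based on the dataset description.

from pathlib import Path

# Count JPEG images per class; assumes data/<split>/<class>/ layout.
for split in ("Training", "Testing"):
    split_dir = Path("data", split)
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        n = len(list(class_dir.glob("*.jpg")))
        print(f"{split}/{class_dir.name}: {n} images")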

Project Workflow

[Project workflow diagram]

Prerequisites

Before you begin, ensure you have the following installed on your machine:

  • Git
  • Docker
  • Airflow
  • DVC (Data Version Control)
  • Python 3.x
  • Pip (Python package installer)
  • Google Cloud Platform (GCP) Account

Getting Started

To get started with the project, follow these steps:

1. Clone the Repository

Clone the repository using the following command:

git clone https://github.com/Omii2899/Brain-Tumor-Classification.git
cd Brain-Tumor-Classification

2. Create a Python Virtual Environment

Create a virtual environment to manage project dependencies:

python -m venv <virtual_environment_name>
source <virtual_environment_name>/bin/activate

3. Install the Dependencies

Install the necessary dependencies using the requirements.txt file:

pip install -r requirements.txt

4. Get the Data from the Remote Source

Pull the data from the remote source using DVC:

dvc pull

5. Add the Key File

You need to add the key file to the src/keys folder. For security purposes, this file is not included in the repository. To obtain it, please contact Aadarsh.
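
For reference, here is a minimal sketch of how such a service-account key is typically consumed, assuming the pipeline authenticates to Google Cloud Storage with it; the connectivity check and the use of google-cloud-storage are assumptions, not the project's confirmed setup.

from google.cloud import storage

# Authenticate with the service-account key (path from the project tree below).
client = storage.Client.from_service_account_json("src/keys/keyfile.json")

# List buckets visible to the service account as a simple connectivity check.
for bucket in client.list_buckets():
    print(bucket.name)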

Description of Files and Folders

Project Structure:

├── .dvc
│   ├── config
│   └── .gitignore
├── data
│   ├── Testing
│   │   └── ...
│   └── Training
│       └── ...
├── src
│   ├── dags
│   │   ├── scripts
│   │   │   ├── logger.py
│   │   │   ├── preprocessing.py
│   │   │   └── statistics.py
│   │   └── datapipeline.py
│   └── keys
│       └── keyfile.json
├── .dvcignore
├── .gitignore
├── data.dvc
├── dockerfile
├── entrypoint.sh
└── requirements.txt

Source code files:

  1. Data Version Control (DVC)
    • config: DVC configuration file for setting up data versioning.
    • .gitignore: Specifies which DVC files should be ignored by Git.

  2. Data
    • Testing: Contains subfolders with testing images.
    • Training: Contains subfolders with training images.

  3. Source
    • datapipeline.py: Orchestrates the data pipeline process, integrating various preprocessing and processing scripts.
    • scripts:
      • logger.py: Sets up and configures logging for the project. It creates a logger instance with a specific format and log level.
      • preprocessing.py: Handles image preprocessing for both the training and testing phases using TensorFlow's ImageDataGenerator. It contains two primary functions: preprocessing_for_training, which applies augmentation techniques such as normalization, rotation, width and height shifts, shearing, zooming, and horizontal flipping to enrich the training dataset, and preprocessing_for_testing_inference, which only normalizes the data (see the sketch after this list).
      • statistics.py: Captures statistics of each image in each class, generates histograms, and validates images against these histograms using OpenCV.

  4. Configuration and Ignore Files
    • .dvcignore: Specifies files and directories that DVC should ignore.
    • .gitignore: Specifies files and directories that Git should ignore.
    • data.dvc: DVC data tracking file.

  5. Docker Configuration
    • Dockerfile: Contains instructions to build a Docker image for the project.

  6. Entry Point
    • entrypoint.sh: Shell script to set up the environment and run the application inside Docker.

  7. Dependencies
    • requirements.txt: Lists the Python dependencies needed for the project.
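
To make the descriptions above concrete, here is a minimal sketch of what logger.py and the two preprocessing.py functions might look like. The log format, target size, batch size, and augmentation ranges are illustrative assumptions, not the project's actual values.

import logging

import tensorflow as tf

def get_logger(name: str = "brain_tumor_pipeline") -> logging.Logger:
    # Mirrors scripts/logger.py: one logger with a fixed format and level.
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

def preprocessing_for_training(data_dir: str = "data/Training"):
    # Augmentation as described above: normalization, rotation, shifts,
    # shearing, zooming, and horizontal flipping.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255,
        rotation_range=20,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True,
    )
    return datagen.flow_from_directory(
        data_dir, target_size=(224, 224), batch_size=32,
        class_mode="categorical")

def preprocessing_for_testing_inference(data_dir: str = "data/Testing"):
    # Normalization only; no augmentation at test/inference time.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(
        data_dir, target_size=(224, 224), batch_size=32,
        class_mode="categorical", shuffle=False)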

Running the Data Pipeline

To run the pipeline, you can use Docker for containerization.

  1. Build the Docker image
docker build -t image-name:tag-name .
  2. Verify the image
docker images
  3. Run the built image
docker run -it --rm -p 8080:8080 image-name:tag-name

The application should now be running and accessible at http://localhost:8080.

Use the following credentials: User: mlopsproject, Password: admin

Note: If the commands fail to execute, ensure that virtualization is enabled in your BIOS settings. Additionally, if you encounter permission-related issues, try executing the commands by prefixing them with sudo.

  4. Trigger the Airflow DAG
python src/dags/datapipeline.py

DAG:

[DAG graph screenshots]
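
For orientation, here is a minimal sketch of the shape src/dags/datapipeline.py might take, assuming a standard Airflow 2.x pipeline built from PythonOperators. The DAG id, task ids, placeholder callables, and schedule are all assumptions for illustration.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess_images():
    # Placeholder: would call preprocessing_for_training from scripts/preprocessing.py.
    ...

def capture_statistics():
    # Placeholder: would call the histogram/validation logic from scripts/statistics.py.
    ...

default_args = {"owner": "mlopsproject", "retries": 1}

with DAG(
    dag_id="brain_tumor_datapipeline",  # assumed id
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # run on manual trigger
    catchup=False,
) as dag:
    preprocess = PythonOperator(
        task_id="preprocess_images",
        python_callable=preprocess_images,
    )
    statistics = PythonOperator(
        task_id="capture_statistics",
        python_callable=capture_statistics,
    )
    preprocess >> statistics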

Contributors

Aadrash Siddha
Akshita Singh
Praneith Ranganath
Shaun Kirtan
Yashasvi Sharma
