Integrations: DVC, Git, GitHub
Last commit: ed37ab5f89, "README.mb updated", by Hamed Mehrabi (1 year ago)

Project: NLP Binary Classification using Microservices Architecture for StackOverflow Tag Prediction with DVC Integration.

✨ Project information:

This project tackles a natural language processing (NLP) binary classification problem: predicting tags for a given StackOverflow question. For example, one classifier predicts whether a post is about the R language, in which case the post is tagged R. The project uses DVC (Data Version Control) to manage data. It is an end-to-end project built on a microservices architecture. The dataset can be downloaded from this link.
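One way to frame the R example as a binary problem is one-vs-rest labelling: every question whose tags include the target tag becomes class 1, and every other question becomes class 0. This is a hypothetical illustration, not the project's code. It assumes tags are stored as a single string in the angle-bracket style of the Stack Exchange data dump (for example `<r><ggplot2>`), and the helper names `parse_tags` and `binary_label` are invented here.

```python
import re

# Assumption: tags arrive as one string in data-dump style, e.g. "<r><ggplot2>".
# Hypothetical helpers for illustration; not taken from the project's source.
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def parse_tags(tag_string: str) -> list[str]:
    """Split a tag string such as '<r><ggplot2>' into ['r', 'ggplot2']."""
    return TAG_PATTERN.findall(tag_string)


def binary_label(tag_string: str, target_tag: str = "r") -> int:
    """Return 1 if the post carries target_tag exactly, else 0 (one-vs-rest)."""
    return int(target_tag in parse_tags(tag_string))


if __name__ == "__main__":
    posts = [
        ("How do I pivot a data frame?", "<r><dataframe>"),
        ("What does yield do?", "<python><generator>"),
    ]
    for title, tags in posts:
        # Prints the 0/1 label followed by the question title.
        print(binary_label(tags), title)
```

Parsing the tags before comparing matters: a plain substring test for `"r"` would also match tags such as `rust` or `regex` and mislabel those posts.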

📚 Libraries used:

  • scikit-learn
  • pandas
  • NumPy
  • DVC

🚀 Project structure:

[Figures: project workflow diagrams]

🐨 DagsHub Data Pipeline

[Figure: DagsHub data pipeline workflow]

The complete data pipeline for the project is available on DagsHub: DagsHub Data Pipeline.
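DVC pipelines commonly end with an evaluation stage that writes its scores to a JSON file declared under the `metrics` key of `dvc.yaml`, so that `dvc metrics show` can display them. As a minimal sketch under those assumptions (0/1 labels and a hypothetical output path `scores.json`, neither confirmed by this README), such a stage could compute the standard binary-classification metrics with the standard library alone:

```python
import json
from pathlib import Path


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard every ratio against division by zero, returning 0.0 instead.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def write_metrics(metrics, path="scores.json"):
    """Write metrics as JSON so a DVC stage can declare the file under `metrics`."""
    out = Path(path)  # hypothetical default path; adjust to the real stage output
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=4))
    return out
```

For example, `binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])` gives an accuracy of 0.6 and a precision, recall and F1 of 2/3 each. Since scikit-learn is already a listed dependency, `sklearn.metrics` offers equivalent functions if the stage already depends on it.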

🔥 Technologies Used:

1. Python
2. Shell scripting
3. AWS (cloud provider)
4. DVC

🔌 Infrastructure:

1. AWS S3
2. GitHub
3. DagsHub

👷 Initial Setup:

conda create --prefix ./env python=3.9
conda activate ./env 
pip install -r requirements.txt
dvc init

Conclusion

This project is production-ready for similar use cases and provides automated, orchestrated production pipelines (training and serving).

Thanks for taking a look at this project. If you find it valuable, kindly rate it by clicking the star icon. Your support is highly appreciated! 😊🙏⭐

📃 License

MIT License © My Website
Let's connect on LinkedIn


About

Project-DVC-NLP-StackOverflow
