Latest commit: Shambhavi Mishra, 54a1ed0b55, "Adding app link to README", 3 years ago

README.md

Semantic_Similarity

Calculates the semantic similarity between a search query and the entries in a database, then retrieves the most similar entries and ranks them by similarity.
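
To make the ranking idea concrete, here is a minimal sketch. It is not the code in src/main.py: the `embed` function is a hypothetical bag-of-words stand-in for the BERT or FastText embeddings the project uses, and `rank_by_similarity` is an illustrative name. Only the cosine-similarity ranking step is meant to carry over.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding (assumption): word counts.

    A real run would use a BERT or FastText vector here instead.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (document, score) pairs, most similar to the query first."""
    query_vec = embed(query)
    scored = [(doc, cosine_similarity(query_vec, embed(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up example titles, not taken from the project's dataset.
    titles = [
        "How to sort a list in Python",
        "Centering a div with CSS",
        "Reverse a list in Python",
    ]
    for title, score in rank_by_similarity("sort a list in python", titles):
        print(f"{score:.3f}  {title}")
```

Swapping `embed` for a dense embedding model changes how the vectors are produced, but the scoring and sorting steps would stay the same in spirit.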

Prerequisites

  • Python 3.8+
  • Transformers
  • All the requirements listed in requirements.txt.

Usage

  1. Clone this repository.
  2. Install the dependencies with pip install -r requirements.txt.
  3. Use DVC to pull the files stored on the DAGsHub remote storage by running dvc pull.
  4. To run the code for the BERT-based model, use the command:
python src/main.py --model_name 'BERT' --search_criteria './data/search.txt' --query_file './data/data.csv' --column_name 'Title'
  5. To run the code for the FastText-based model, use the command:
python src/main.py --model_name 'fasttext' --search_criteria './data/search.txt' --query_file './data/data.csv' --column_name 'Title'
  6. The search criteria can be either a file or a text string.
  7. Other arguments can be customised to suit your data.
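
The command-line flags above, and the rule that the search criteria may be a file or a plain string, could be handled roughly as follows. This is a sketch under assumptions, not the contents of src/main.py: the flag names come from the commands above, while `build_parser`, `resolve_search_criteria`, the help texts, and the 'Title' default are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser for the four flags shown in the usage commands."""
    parser = argparse.ArgumentParser(
        description="Rank database entries by similarity to a search query."
    )
    parser.add_argument("--model_name", required=True,
                        help="embedding model, e.g. 'BERT' or 'fasttext'")
    parser.add_argument("--search_criteria", required=True,
                        help="path to a text file, or the query text itself")
    parser.add_argument("--query_file", required=True,
                        help="CSV file holding the entries to search")
    # Defaulting to 'Title' is an assumption made for this sketch.
    parser.add_argument("--column_name", default="Title",
                        help="CSV column to compare against the query")
    return parser


def resolve_search_criteria(value):
    """Return the query text: file contents if value is a file, else value."""
    path = Path(value)
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    return value.strip()


if __name__ == "__main__":
    args = build_parser().parse_args()
    query = resolve_search_criteria(args.search_criteria)
    print(f"model={args.model_name} column={args.column_name} query={query!r}")
```

Checking `Path.is_file()` first means an existing file always wins, so a literal query that happens to match a file name would be read as a file; a separate flag for literal text would avoid that ambiguity.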

Running the Streamlit App

To run the Streamlit app:

streamlit run app.py

Dataset

The dataset has been extracted from Kaggle and is based on questions from Stack Overflow.

Check Out the App on AWS EC2

This project has been deployed to EC2; feel free to check it out here.
