Tools used in this project

hydra: Manage configuration files - article
pdoc: Automatically create an API documentation for your project
pre-commit plugins: Automate code review and formatting {% if cookiecutter.dependency_manager == "poetry" %}
Poetry: Dependency management - article {% endif %}

Project Structure

.
├── config                      
│   ├── main.yaml                   # Main configuration file
│   ├── model                       # Configurations for training model
│   │   ├── model1.yaml             # First variation of parameters to train model
│   │   └── model2.yaml             # Second variation of parameters to train model
│   └── process                     # Configurations for processing data
│       ├── process1.yaml           # First variation of parameters to process data
│       └── process2.yaml           # Second variation of parameters to process data
├── data            
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   └── raw                         # raw data
├── docs                            # documentation for your project
├── .gitignore                      # ignore files that should not be committed to Git
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
{% if cookiecutter.dependency_manager == "pip" -%}
├── pyproject.toml                  # Configure black
{% elif cookiecutter.dependency_manager == "poetry" -%}
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies for poetry
{%- endif %}
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python package
│   ├── process.py                  # process data before training model
│   ├── train_model.py              # train model
│   └── utils.py                    # store helper functions
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python package
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py

Set up the environment

{% if cookiecutter.dependency_manager == "poetry" %}

Install Poetry
Activate the virtual environment:

poetry shell

Install dependencies:

To install all dependencies from pyproject.toml, run:

poetry install

To install only production dependencies, run:

poetry install --only main

To install a new package, run:

poetry add <package-name>

{% else %}

Create the virtual environment:

python3 -m venv venv

Activate the virtual environment:

For Linux/MacOS:

source venv/bin/activate

For Windows Command Prompt:

.\venv\Scripts\activate

Install dependencies:

To install all dependencies, run:

pip install -r requirements-dev.txt

To install only production dependencies, run:

pip install -r requirements.txt

To install a new package, run:

pip install <package-name>

{% endif %}

View and alter configurations

To view the configurations associated with a Python script, run the following command:

python src/process.py --help

Output:

process is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)

model: model1, model2
process: process1, process2


== Config ==
Override anything in the config (foo.bar=value)

process:
  use_columns:
  - col1
  - col2
model:
  name: model1
data:
  raw: data/raw/sample.csv
  processed: data/processed/processed.csv
  final: data/final/final.csv

To alter the configurations associated with a Python script from the command line, run the following:

python src/process.py data.raw=sample2.csv
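To make the effect of a `group.key=value` override concrete, here is a minimal standard-library sketch of how a dotted override maps onto the nested configuration shown in the `--help` output. It is an illustration only, not how Hydra is implemented; `BASE_CONFIG` and `apply_override` are hypothetical names introduced for this example.

```python
from copy import deepcopy

# Values copied from the `--help` output above.
BASE_CONFIG = {
    "process": {"use_columns": ["col1", "col2"]},
    "model": {"name": "model1"},
    "data": {
        "raw": "data/raw/sample.csv",
        "processed": "data/processed/processed.csv",
        "final": "data/final/final.csv",
    },
}


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    updated = deepcopy(config)  # leave the base configuration untouched
    node = updated
    for name in parents:
        node = node[name]  # raises KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(f"unknown configuration key: {dotted_key}")
    node[leaf] = value
    return updated


new_config = apply_override(BASE_CONFIG, "data.raw=sample2.csv")
print(new_config["data"]["raw"])  # sample2.csv
```

The override replaces the whole value of `data.raw`, so the script would look for `sample2.csv` relative to the working directory; to read a file inside `data/raw`, pass the full relative path, for example `data.raw=data/raw/sample2.csv`.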

Auto-generate API documentation

To auto-generate API documentation for your project, run:

make docs
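Documentation generators such as pdoc build their pages from the docstrings in your modules, so the generated output is only as useful as those docstrings. The sketch below shows the kind of documented function that would appear in the API docs; `select_columns` is a hypothetical helper written for this example and is not part of the template.

```python
import inspect


def select_columns(rows: list[dict], use_columns: list[str]) -> list[dict]:
    """Keep only the configured columns in each row.

    Args:
        rows: Records loaded from the raw data file, one dict per row.
        use_columns: Column names to keep, for example the
            `process.use_columns` value from the configuration.

    Returns:
        New row dicts containing only the requested columns.
    """
    return [{name: row[name] for name in use_columns} for row in rows]


# inspect.getdoc returns the cleaned docstring text; print its summary line.
print(inspect.getdoc(select_columns).splitlines()[0])
```

Keeping a one-line summary at the top of each docstring gives every function a readable entry in the generated pages.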

khuyentran1401 / data-science-template connected to https://github.com/khuyentran1401/data-science-template.git
