YSpecies

This repository was created to prototype DVC-based ML pipelines for the crosspecies project. All dependencies are listed in the conda environment.yaml file; DVC and Jupyter Lab are installed from it as well.

Project structure

The data folder holds input, interim and output data.

Before you run anything, do not forget to dvc pull the data, and after committing do not forget to dvc push it!

The pipeline is run as DVC stages (see stages folder).

Most of the analysis is written in Jupyter notebooks in the notebooks folder. Each stage runs the corresponding notebooks with papermill, keeps their inputs and outputs under source control, and stores the executed notebooks in data/notebooks.
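As a rough illustration of what such a stage does, the sketch below runs one notebook through papermill and writes the executed copy under data/notebooks. The helper names (executed_path, run_notebook) and the exact output layout are assumptions made for this example and are not taken from the repository; only papermill.execute_notebook is papermill's own API.

```python
from pathlib import Path


def executed_path(notebook: str, out_dir: str = "data/notebooks") -> Path:
    """Return where the executed copy of a notebook is stored (assumed layout)."""
    return Path(out_dir) / Path(notebook).name


def run_notebook(notebook: str, parameters: dict) -> Path:
    """Execute a notebook with papermill and return the path of the executed copy."""
    import papermill as pm  # imported lazily so executed_path works without papermill

    target = executed_path(notebook)
    target.parent.mkdir(parents=True, exist_ok=True)
    # papermill injects `parameters` into the notebook's cell tagged "parameters"
    pm.execute_notebook(notebook, str(target), parameters=parameters)
    return target
```

A DVC stage would then call something like run_notebook with the notebook path and its parameters as the stage command, so that DVC can track the notebook as a dependency and the executed copy as an output.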

Temporarily, some classes are copy-pasted from the xspecies repository to make the notebooks work.

yspecies package

The code in the yspecies folder is a conda package that is used inside the notebooks. The package can be installed from conda: https://anaconda.org/antonkulaga/yspecies

conda install -c antonkulaga yspecies

Running stages

DVC stages are defined in the dvc.yaml file. To run a stage, use dvc repro <stage_name>:

dvc repro

Most of the stages also produce executed notebooks alongside their output files.

Yspecies classes

Indexing

One of the key classes is the ExpressionDataset class:

e = ExpressionDataset("5_tissues", expressions, genes, samples)
e

It allows indexing by genes:

e[["ENSG00000073921", "ENSG00000139687"]]
#or
e.by_genes[["ENSG00000073921", "ENSG00000139687"]]

By samples:

e.by_samples[["SRR2308103","SRR1981979"]]

Both:

e[["ENSG00000073921", "ENSG00000139687"],["SRR2308103","SRR1981979"]]
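To make the indexing convention concrete, here is a toy sketch of how a single __getitem__ can accept either a list of genes or a (genes, samples) pair. ToyExpressions is a hypothetical stand-in written for this example, not the yspecies implementation, and the expression values are made up.

```python
class ToyExpressions:
    """Toy stand-in, NOT the real ExpressionDataset: values[sample][gene] -> expression."""

    def __init__(self, values: dict):
        self.values = values

    def __getitem__(self, key):
        # toy[genes] arrives as a list; toy[genes, samples] arrives as a tuple of two lists
        if isinstance(key, tuple):
            genes, samples = key
        else:
            genes, samples = key, list(self.values)
        return ToyExpressions(
            {s: {g: self.values[s][g] for g in genes} for s in samples}
        )


# made-up numbers, used only to show the selection logic
toy = ToyExpressions({
    "SRR2308103": {"ENSG00000073921": 1.5, "ENSG00000139687": 0.2},
    "SRR1981979": {"ENSG00000073921": 3.0, "ENSG00000139687": 0.7},
})
by_gene = toy[["ENSG00000073921"]]
both = toy[["ENSG00000139687"], ["SRR1981979"]]
```

The double brackets matter: the inner list is the selection itself, and a comma between two lists makes Python pass a tuple, which is how the two-axis form is told apart from the gene-only form.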

Filtering

The ExpressionDataset class has by_genes and by_samples properties which allow indexing and filtering. For instance, to keep only blood tissue samples:

e.by_samples.filter(lambda s: s["tissue"]=="Blood")
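The filter call takes a predicate over one sample's metadata and keeps the samples for which it returns true. A minimal sketch of that idea, assuming each sample is a mapping with a "tissue" field; SampleView is a hypothetical class for illustration and the tissue labels are invented:

```python
class SampleView:
    """Toy stand-in for a by_samples accessor: sample id -> metadata dict."""

    def __init__(self, samples: dict):
        self.samples = samples

    def filter(self, predicate):
        # keep only the samples whose metadata satisfies the predicate
        return SampleView({k: m for k, m in self.samples.items() if predicate(m)})


# invented metadata, used only to exercise the filter
view = SampleView({
    "SRR2308103": {"tissue": "Blood"},
    "SRR1981979": {"tissue": "Liver"},
})
blood = view.filter(lambda s: s["tissue"] == "Blood")
```

Returning a new view instead of mutating the original keeps filters chainable and leaves the source dataset untouched.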

The class is also Jupyter-friendly, with the _repr_html_() method implemented.
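For readers unfamiliar with the mechanism: when a cell's last expression is an object that defines a method named _repr_html_, Jupyter renders the returned HTML instead of the plain text repr. The sketch below shows the hook on a hypothetical TinyTable class; it does not reproduce what ExpressionDataset actually renders.

```python
from html import escape


class TinyTable:
    """Hypothetical example class that renders itself as an HTML table in Jupyter."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows  # list of (label, value) pairs

    def _repr_html_(self):
        # escape every value so arbitrary text cannot break the markup
        body = "".join(
            f"<tr><td>{escape(str(k))}</td><td>{escape(str(v))}</td></tr>"
            for k, v in self.rows
        )
        return f"<table><caption>{escape(self.name)}</caption>{body}</table>"


html = TinyTable("5_tissues", [("genes", 2), ("samples", 2)])._repr_html_()
```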
