prep-25k-vocab-corpus.sh 541 B

#!/bin/bash
set -euo pipefail

# Remove the temporary virtualenv when the script exits, on success or failure.
trap 'rm -rf .venv' EXIT

if [[ "$OSTYPE" == "darwin"* ]]; then
    echo "This stage cannot be run on macOS" >&2
    exit 2
fi

# Pin codeprep to the version recorded in the stage parameters.
codeprep_version=$(yq -r .codeprep_version params/25k-vocab-corpus.yml)

virtualenv .venv
source .venv/bin/activate
pip install "codeprep==$codeprep_version"

# Quote the $(pwd) expansions so paths containing spaces do not get word-split.
export XDG_CONFIG_HOME="$(pwd)/data/25k-vocab-corpus-prepped"
codeprep nosplit -p "$(pwd)/data/25k-vocab-corpus" --no-unicode --no-str --no-com --no-spaces --calc-vocab --verbose --output-path="$(pwd)/data/25k-vocab-corpus-prepped"