
Updates dictionary generation pipeline

data/raw/UN Data now contains DVC files, and the pipeline was reading
every file in that directory, including the new DVC files, which are not
CSV data. This fix restricts the files read to those ending in .csv.
Marcel Ribeiro-Dantas 5 months ago
parent
commit
b855a2711c
2 changed files with 4 additions and 4 deletions
  1. generate_dictionary_file.dvc (+3 -3)
  2. scripts/generate_raw_dictionary_file.R (+1 -1)

generate_dictionary_file.dvc (+3 -3)

@@ -1,9 +1,9 @@
-md5: 487e5b95396687ae5b3024972551dd0e
+md5: 2d20124c81ecc49c77b5ecb69f8a2de8
 cmd: Rscript scripts/generate_raw_dictionary_file.R
 deps:
-- md5: dd335ac87bf5b82f3a5e47a51539b49b.dir
+- md5: 5f84db4d7730d79d0138f60207922e0c.dir
   path: data/raw/UN Data
-- md5: 2a960606c7c14454578094c15fc7a726
+- md5: e8e1fc04472465234dbf608ebc81a33b
   path: scripts/generate_raw_dictionary_file.R
 outs:
 - md5: c899f92797887198a102e2038a6b2cf1

scripts/generate_raw_dictionary_file.R (+1 -1)

@@ -7,7 +7,7 @@ library(WriteXLS)
 
 # Read all raw datasets from UN into one dataframe ---------
 
-df <- fs::dir_ls('data/raw/UN Data/') %>%
+df <- fs::dir_ls('data/raw/UN Data/', glob='*.csv') %>%
   map_dfr(read_delim, delim = ',', skip = 1, .id = "filepath")
 
 # Keep the variables required to create the data dictionary
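
The effect of the `glob` argument in the change above can be checked in isolation. The following is a minimal sketch, not part of the commit: it assumes the `fs` package is installed, and the scratch directory and file names (`population.csv`, `population.csv.dvc`) are hypothetical stand-ins for the real contents of `data/raw/UN Data/`.

```r
library(fs)

# Hypothetical scratch directory standing in for 'data/raw/UN Data/'
raw_dir <- path(tempdir(), "UN Data")
dir_create(raw_dir)

# One data file plus a DVC file sitting next to it (both names are made up)
file_create(path(raw_dir, c("population.csv", "population.csv.dvc")))

# Without a filter, dir_ls() returns every visible file, including the .dvc one
all_files <- dir_ls(raw_dir)

# With glob = "*.csv", only paths that end in .csv are returned
csv_files <- dir_ls(raw_dir, glob = "*.csv")

print(path_file(all_files))
print(path_file(csv_files))
```

Because the glob has to match the whole path, a name such as `population.csv.dvc` is excluded even though it contains `.csv`, so only the CSV files reach `map_dfr(read_delim, ...)`.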