This repository contains the code to import and integrate the book and rating data that we work with.
- A PostgreSQL server with pg_prewarm (from the PostgreSQL Contrib package) installed.
- The psql executable on the machine where the import scripts will run.

The environment-linux-x64.yml file defines an Anaconda environment that contains all the required packages, with the exception of the PostgreSQL server and client executables.
All scripts read database connection info from the standard PostgreSQL client environment variables:
PGDATABASE
PGHOST
PGUSER
PGPASSWORD
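As a rough illustration of what reading these settings can look like, here is a minimal Python sketch. It uses the standard libpq variable names (PGDATABASE, PGHOST, PGUSER, PGPASSWORD). The helper names `connection_settings` and `describe` are hypothetical and do not come from this repository.

```python
import os

# Standard PostgreSQL client variables; psql and other libpq clients read them directly.
PG_VARS = ("PGDATABASE", "PGHOST", "PGUSER", "PGPASSWORD")


def connection_settings(environ=None):
    """Return the PostgreSQL client variables that are set (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in PG_VARS if name in env}


def describe(settings):
    """Summarize the settings for logging, masking the password."""
    shown = dict(settings)
    if "PGPASSWORD" in shown:
        shown["PGPASSWORD"] = "***"
    return ", ".join(f"{key}={value}" for key, value in sorted(shown.items()))
```

Calling `describe(connection_settings())` before an import is one way to check which database the scripts would connect to without echoing the password.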
This imports the following data sets:
- … data/LOC directory.
- … data/LOC.
- … data directory.
- … data.
- … data.
- … data.
The import process is scripted with invoke. The first tasks to run are the import tasks:
invoke loc.import
invoke viaf.import
invoke openlib.import-authors openlib.import-works openlib.import-editions
invoke goodreads.import
invoke ratings.import-az
invoke ratings.import-bx
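If you prefer to run the import commands unattended, a small wrapper along the following lines may help. This is a sketch, not part of the repository: `run_all` is a hypothetical helper, and it assumes that `invoke` is on the PATH and exits with a nonzero status when a task fails.

```python
import subprocess

# Commands copied from the list above, in the order given.
IMPORT_COMMANDS = [
    ["invoke", "loc.import"],
    ["invoke", "viaf.import"],
    ["invoke", "openlib.import-authors", "openlib.import-works", "openlib.import-editions"],
    ["invoke", "goodreads.import"],
    ["invoke", "ratings.import-az"],
    ["invoke", "ratings.import-bx"],
]


def run_all(commands, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    Returns the list of commands that completed successfully.
    """
    done = []
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            break  # do not start later imports after a failure
        done.append(cmd)
    return done
```

`run_all(IMPORT_COMMANDS)` runs the six commands in sequence. The `runner` parameter exists only so the loop can be exercised without launching real imports.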
Once all the data is imported, you can begin to run the indexing and linking tasks:
invoke viaf.index
invoke loc.index
invoke openlib.index
invoke goodreads.index-books
invoke analyze.cluster --scope loc
invoke analyze.cluster --scope ol
invoke analyze.cluster --scope gr
invoke analyze.cluster
invoke ratings.index
invoke goodreads.index-ratings
invoke analyze.authors
The tasks keep track of the import status in an import_status table, and will keep you from running tasks in the wrong order.
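To make the idea of status-based ordering concrete, here is a minimal sketch of how a status table can gate a task. Everything in it is an assumption for illustration: the column names (`step`, `finished`), the `REQUIRES` map, and the helper functions are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example is self-contained. The repository's actual table layout and checks may differ.

```python
import sqlite3

# Hypothetical prerequisites: each indexing task needs its import task first.
REQUIRES = {
    "loc.index": ["loc.import"],
    "viaf.index": ["viaf.import"],
}


def open_status_db():
    """Create an in-memory stand-in for the import_status table (assumed columns)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE import_status (step TEXT PRIMARY KEY, finished INTEGER NOT NULL)"
    )
    return db


def mark_finished(db, step):
    """Record that a step completed."""
    db.execute(
        "INSERT OR REPLACE INTO import_status (step, finished) VALUES (?, 1)", (step,)
    )


def check_prerequisites(db, step):
    """Raise RuntimeError if any prerequisite of `step` has not finished."""
    for dep in REQUIRES.get(step, []):
        row = db.execute(
            "SELECT finished FROM import_status WHERE step = ?", (dep,)
        ).fetchone()
        if row is None or not row[0]:
            raise RuntimeError(f"{step} requires {dep}, which has not finished")
```

In this sketch a task would call `check_prerequisites` before doing any work and `mark_finished` once it succeeds.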
The -schema files contain the base schemas for the data:
- common-schema.sql: common tables
- loc-schema.sql: Library of Congress catalog tables
- ol-schema.sql: OpenLibrary book data
- viaf-schema.sql: VIAF tables
- az-schema.sql: Amazon rating schema
- bx-schema.sql: BookCrossing rating data schema
- gr-schema.sql: GoodReads data schema
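Should you need to apply the schema files by hand, one possible approach is to feed each file to psql. The sketch below only builds the commands. `schema_commands` is a hypothetical helper, and loading common-schema.sql before the others is an assumption, not something this README states. The `-f` and `-v ON_ERROR_STOP=1` options are standard psql options.

```python
from pathlib import Path


def schema_commands(directory="."):
    """Build one psql command per *-schema.sql file.

    common-schema.sql is placed first (assumed order); the rest follow alphabetically.
    """
    files = sorted(
        Path(directory).glob("*-schema.sql"),
        key=lambda p: (p.name != "common-schema.sql", p.name),
    )
    # ON_ERROR_STOP makes psql exit on the first SQL error instead of continuing.
    return [["psql", "-v", "ON_ERROR_STOP=1", "-f", str(f)] for f in files]
```

Each command could then be passed to `subprocess.run`, with the connection details taken from the PostgreSQL client environment variables.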