huxtable-ocr

These scripts were used to run OCR on a corpus of ~1,500 articles by the architectural critic Ada Louise Huxtable.

Getting Started

This code uses the Google Cloud Vision API and Google Cloud Storage.

Before starting, make sure that your input list and your filenames do not include apostrophes or commas.
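
A quick pre-flight check for the filename rule above can be sketched as follows. This is only an illustration: the helper name `find_bad_names` and the sample filenames are hypothetical and are not part of the scripts in this repository.

```python
# Characters that the note above says must not appear in filenames or the input list.
FORBIDDEN = ("'", ",")


def find_bad_names(names):
    """Return the names that contain an apostrophe or a comma, in input order."""
    return [name for name in names if any(ch in name for ch in FORBIDDEN)]


if __name__ == "__main__":
    # Hypothetical sample filenames, for illustration only.
    sample = ["review_1963.pdf", "the_city's_skyline.pdf", "parks,_plazas.pdf"]
    for bad in find_bad_names(sample):
        print(f"Rename before running OCR: {bad}")
```

Running a check like this over the input list before uploading makes it easier to catch problem names early.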

google_pdf_ocr.py
Runs OCR on PDF files stored in Google Cloud Storage and writes the JSON output back to Cloud Storage.
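
For orientation, an asynchronous PDF OCR request with the `google-cloud-vision` Python client might look roughly like the sketch below. It is not the code in `google_pdf_ocr.py`: the function names, the `batch_size` of 2, the timeout, and the bucket paths are all assumptions made for illustration, and running it requires the client library and Google Cloud credentials.

```python
def gcs_uri(bucket: str, path: str) -> str:
    """Build a gs:// URI from a bucket name and an object path."""
    return f"gs://{bucket.strip('/')}/{path.lstrip('/')}"


def ocr_pdf(source_uri: str, destination_uri: str, timeout: int = 420) -> None:
    """Request OCR for one PDF in Cloud Storage; JSON results go to destination_uri."""
    # Imported here so gcs_uri() can be used without the client library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)
    input_config = vision.InputConfig(
        gcs_source=vision.GcsSource(uri=source_uri),
        mime_type="application/pdf",
    )
    output_config = vision.OutputConfig(
        gcs_destination=vision.GcsDestination(uri=destination_uri),
        batch_size=2,  # assumed value: pages per output JSON file
    )
    request = vision.AsyncAnnotateFileRequest(
        features=[feature],
        input_config=input_config,
        output_config=output_config,
    )
    # The call returns a long-running operation; block until it finishes.
    operation = client.async_batch_annotate_files(requests=[request])
    operation.result(timeout=timeout)


if __name__ == "__main__":
    # Hypothetical bucket and object names.
    ocr_pdf(
        gcs_uri("my-bucket", "pdfs/article.pdf"),
        gcs_uri("my-bucket", "ocr-output/article/"),
    )
```

The destination URI acts as a prefix, so each input PDF can produce several JSON files under it.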
Download the output with the Google Cloud CLI in a terminal:

    ./google-cloud-sdk/bin/gcloud init
    gsutil cp -r [GOOGLE FOLDER] [OUTPUT FOLDER]

json_to_csv_rename.py
Writes all output (filename, detected text) to a single CSV file.
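
The JSON-to-CSV step can be sketched as below. This is a minimal illustration, not the contents of `json_to_csv_rename.py`: it assumes each downloaded JSON file has the usual Vision output shape (a top-level `responses` list whose items carry `fullTextAnnotation.text`), that the files sit in one flat folder, and the function names and CSV column headers are hypothetical.

```python
import csv
import json
from pathlib import Path


def extract_text(result: dict) -> str:
    """Join the detected text of every page response in one output file."""
    pages = []
    for response in result.get("responses", []):
        annotation = response.get("fullTextAnnotation", {})
        pages.append(annotation.get("text", ""))
    return "\n".join(pages)


def json_dir_to_csv(json_dir: str, csv_path: str) -> int:
    """Write one (filename, text) row per JSON file; return the row count."""
    rows = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            rows.append((path.name, extract_text(json.load(f))))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "text"])  # assumed header names
        writer.writerows(rows)
    return len(rows)
```

A call such as `json_dir_to_csv("ocr-output", "huxtable_text.csv")` (both paths hypothetical) would then produce a single CSV. The `csv` module quotes fields that contain commas or line breaks, so multi-page text stays in one cell.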
