BioDBTracker

This tool is designed to track and log versions and details of bioinformatics databases. It provides a streamlined way to monitor changes, with results stored in an SQLite database or directly added to a Google Sheet for easy sharing and collaboration.

Features

- Track and log versions of bioinformatics databases.
- Store results in an SQLite database for local access.
- Directly integrate with Google Sheets for sharing and collaboration.
- Monitor and document changes efficiently.

Requirements

Ensure you have Python installed. We recommend using Conda for managing dependencies. See the included environment.yml file for setting up the environment.

Installation

Clone the repository:

```
git clone https://github.com/cidgoh/BioDBTracker
cd BioDBTracker
```

Set up the Conda environment:

```
conda env create -f environment.yml
conda activate BioDBTracker
```

Usage

Run the script with the following command:

```
python BioDBTracker.py -i dir1 dir2 -db database_name -gs -gsn google_sheet -gss google_sheet_name -gsc database-cidgoh-1dbcf7c8bf62.json
```

Command-Line Arguments

- -i : One or more input directories to scan (e.g., dir1 dir2).
- -db : Name or path of the SQLite database that stores the results.
- -gs : Enable Google Sheets integration (a switch; takes no value).
- -gsn : Name of the Google Sheet (the spreadsheet).
- -gss : Name of the worksheet (tab) within the Google Sheet.
- -gsc : Path to the Google Sheets credentials JSON file.
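
To make the flag semantics concrete, here is a minimal sketch of how these options could be declared with Python's argparse. It is not taken from BioDBTracker.py; the function name build_parser and the dest names are assumptions chosen for illustration, and the real script may define defaults or validation differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (illustrative sketch, not the project's code)."""
    parser = argparse.ArgumentParser(prog="BioDBTracker.py")
    # -i accepts one or more input directories.
    parser.add_argument("-i", nargs="+", dest="input_dirs")
    # -db names the SQLite database that receives the results.
    parser.add_argument("-db", dest="database")
    # -gs is a switch: present means Google Sheets output is enabled.
    parser.add_argument("-gs", action="store_true", dest="use_sheets")
    # The remaining Google Sheets options each take a single value.
    parser.add_argument("-gsn", dest="sheet_name")
    parser.add_argument("-gss", dest="worksheet_name")
    parser.add_argument("-gsc", dest="credentials")
    return parser


# Parse the same arguments as the example command in this README.
args = build_parser().parse_args([
    "-i", "samples/input1", "samples/input2",
    "-db", "sample_db",
    "-gs", "-gsn", "MyGoogleSheet", "-gss", "Sheet1",
    "-gsc", "my-google-credentials.json",
])
print(args.input_dirs, args.database, args.use_sheets)
```

Because -gs is declared as a switch, leaving it out simply sets use_sheets to False, and the other Google Sheets options can then be ignored.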

Example

Here's an example command:

```
python BioDBTracker.py -i samples/input1 samples/input2 -db sample_db -gs -gsn MyGoogleSheet -gss Sheet1 -gsc my-google-credentials.json
```

How to Generate the Google Sheets Authentication File

To enable Google Sheets integration, you need to generate a credentials file (JSON) from the Google Cloud Console. Follow these steps:

Go to the Google Cloud Console:
- Visit Google Cloud Console.
Create a New Project:
- If you don’t already have a project, create one by clicking on Select a Project > New Project.
Enable the Google Sheets API:
- Navigate to APIs & Services > Library.
- Search for "Google Sheets API" and click Enable.
Enable the Google Drive API:
- Similarly, search for "Google Drive API" and click Enable.
Create Credentials:
- Go to APIs & Services > Credentials.
- Click Create Credentials > Service Account.
- Fill in the required details and click Done.
Generate the JSON File:
- Select the created service account.
- Navigate to the Keys tab and click Add Key > Create New Key.
- Select JSON and download the file. This is your Google Sheets authentication file.
Share the Google Sheet with the Service Account:
- Open the Google Sheet you want to use.
- Share it with the email address of the service account (the client_email field in the JSON file) and give it Editor access so the tool can write to the sheet.
Use the JSON File:
- Place the JSON file in your project directory.
- Provide its path using the -gsc argument when running the script.
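
If you are unsure which address to share the sheet with, the downloaded key file can be inspected directly. The sketch below uses only the standard library; the function name service_account_email is hypothetical, and it relies on the "type" and "client_email" fields that Google service-account key files contain.

```python
import json
from pathlib import Path


def service_account_email(credentials_path: str) -> str:
    """Return the client_email stored in a service-account key file."""
    key = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
    # Service-account keys declare their type; other credential kinds
    # (for example OAuth client secrets) are rejected here.
    if key.get("type") != "service_account":
        raise ValueError(f"{credentials_path} is not a service-account key file")
    return key["client_email"]
```

Calling service_account_email("my-google-credentials.json") returns the address to enter in the Google Sheet's Share dialog.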

Docker Usage

Prerequisites

Install Docker on your system. Follow the instructions from the Docker website.

Running the Docker Image

The BioDBTracker Docker image provides a streamlined way to run the tool without manually setting up dependencies.

Pull the Docker Image
```
docker pull cidgoh/biodbtracker:v0.1
```
Run the Docker Container
Use the following command to run the container with your local database directory mounted:
```
docker run -it \
  -v /path/to/local/db:/path/to/local/db \
  cidgoh/biodbtracker:v0.1 \
  -i /path/to/local/db \
  -db /path/to/local/db/sqlite.db
```
Replace /path/to/local/db with the absolute path to your local database directory.
Explanation of the Command
- -v /path/to/local/db:/path/to/local/db: Mounts your local database directory into the container.
- -i /path/to/local/db: Specifies the input directory.
- -db /path/to/local/db/sqlite.db: Specifies the database to use.

Example Command
Assuming your local database path is /home/user/BioDBTracker/db:

```
docker run -it \
  -v /home/user/BioDBTracker/db:/home/user/BioDBTracker/db \
  cidgoh/biodbtracker:v0.1 \
  -i /home/user/BioDBTracker/db \
  -db /home/user/BioDBTracker/db/sqlite.db
```
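
The command mounts the directory at the same path inside the container, so the paths given to -i and -db are valid on both sides. When launching the container from a script, the argument list can be assembled programmatically. The helper below is a sketch: the name docker_run_command is hypothetical, and placing sqlite.db inside the mounted directory simply mirrors the example above.

```python
import shlex
from pathlib import PurePosixPath


def docker_run_command(db_dir: str, image: str = "cidgoh/biodbtracker:v0.1") -> list[str]:
    """Build the docker run argument list for a database directory."""
    db_path = PurePosixPath(db_dir)
    # Docker bind mounts need an absolute host path.
    if not db_path.is_absolute():
        raise ValueError("db_dir must be an absolute path")
    return [
        "docker", "run", "-it",
        # Same path on host and in container, as in the example command.
        "-v", f"{db_path}:{db_path}",
        image,
        "-i", str(db_path),
        "-db", str(db_path / "sqlite.db"),
    ]


# Print the command as a single shell-safe line.
print(shlex.join(docker_run_command("/home/user/BioDBTracker/db")))
```

The returned list can be passed to subprocess.run without shell quoting concerns.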

Results
The tool will process the specified databases and display the output in the terminal or save it to the designated location (e.g., SQLite or Google Sheets).
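
To check what was written to the SQLite file, you can inspect it with Python's built-in sqlite3 module. This README does not document the table layout, so the sketch below makes no assumption about table names and instead lists whatever tables exist along with their row counts. The function name table_row_counts is hypothetical.

```python
import sqlite3
from contextlib import closing


def table_row_counts(db_file: str) -> dict[str, int]:
    """Map each table in the SQLite file to its number of rows."""
    with closing(sqlite3.connect(db_file)) as conn:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
```

For example, table_row_counts("/home/user/BioDBTracker/db/sqlite.db") returns a dictionary with one entry per table.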

Database Versioning and Information Management

This folder contains various databases and their respective version information. Each database directory includes a version.yml file to track key details such as version number, source, and additional notes. This helps ensure transparency and consistency when sharing or updating databases.

Folder Structure

The folder structure is designed to organize databases by name and version. For example:

DatabaseName/
   ├── v1/
   │    ├── version.yml
   │    └── [other files or subdirectories]
   ├── v2/
   │    ├── version.yml
   │    └── [other files or subdirectories]
   └── custom/
        ├── version.yml
        └── [other files or subdirectories]
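
Given this layout, every version.yml sits exactly two levels below the root directory. A minimal sketch of how the files could be discovered with pathlib follows; it is not the project's own scanning code, and the function name find_version_files is an assumption.

```python
from pathlib import Path


def find_version_files(root: str) -> dict[str, list[str]]:
    """Map each database name to the version folders that contain a version.yml."""
    found: dict[str, list[str]] = {}
    # Layout assumed from the tree above: <root>/<DatabaseName>/<version>/version.yml
    for yml in sorted(Path(root).glob("*/*/version.yml")):
        database = yml.parent.parent.name
        version = yml.parent.name
        found.setdefault(database, []).append(version)
    return found
```

A version folder that lacks a version.yml does not appear in the result, which makes missing files easy to spot.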

Instructions for Adding a Database

Create a Folder:
- Create a folder for the database (e.g., DatabaseName).
- Inside this folder, create subfolders for each version (e.g., v1, v2, etc.).
Add version.yml:
- In each version's folder, add a version.yml file to describe the database details.
- Use the provided template for consistency.

Template for version.yml:

database_info:
  name: [Database Name]
  version: [Database Version]
  date: [Date of Creation/Update]
  downloaded_from: [Source or DOI]
  downloaded_by: [Your Initials]
  tested_by: [Tester Initials, if applicable]
  note: [Any additional information or notes]

An example template, db_info_example.yml, is provided in the repository. Copy it into the version folder as version.yml and replace the placeholder values.
For custom databases, assign a custom version number and describe the source of the data in the note field.
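
A quick completeness check can catch a forgotten field before a database is shared. The sketch below works on an already-parsed file (for example, the dictionary returned by PyYAML's yaml.safe_load) and reports fields that are missing or empty. The function name and the choice of required fields are assumptions; tested_by and note are treated as optional here.

```python
# Fields treated as mandatory in this sketch (an assumption, not a project rule).
REQUIRED_FIELDS = ("name", "version", "date", "downloaded_from", "downloaded_by")


def missing_fields(document: dict) -> list[str]:
    """Return the required database_info fields that are absent or blank."""
    info = document.get("database_info") or {}
    missing = []
    for field in REQUIRED_FIELDS:
        value = info.get(field)
        # None covers both a missing key and an empty YAML value.
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


example = {
    "database_info": {
        "name": "k2_refSeq-March-01-2024_mouse_50G",
        "version": "v1.0-custom",
        "date": "",
        "downloaded_from": "RefSeq",
    }
}
print(missing_fields(example))
```

In this example the date is blank and downloaded_by is absent, so both are reported.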

Example for a Custom Database:

database_info:
  name: k2_refSeq-March-01-2024_mouse_50G
  version: v1.0-custom
  date: "2024-03-01"
  downloaded_from: "RefSeq"
  downloaded_by: "JD"
  tested_by: "MT"
  note: "This database was built using Kraken 2 version 2.1.2. It includes genomes from RefSeq release 221, covering the following: archaea, bacteria, human, plasmid, UniVec_Core, and viral genomes."

Zipped Files:
- Zipped files can remain zipped within the version folder.
- Ensure the version.yml file exists alongside the zipped files.
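
The second rule is easy to verify automatically. The following sketch lists folders that hold archives but no version.yml. The function name and the set of archive suffixes are assumptions for illustration; adjust the suffixes to the formats you actually store.

```python
from pathlib import Path

# Suffixes treated as "zipped" in this sketch (an assumed, non-exhaustive set).
ARCHIVE_SUFFIXES = {".zip", ".gz", ".tgz", ".bz2", ".xz"}


def archives_without_version_file(root: str) -> list[str]:
    """Return folders (relative to root) that hold archives but no version.yml."""
    missing = set()
    for path in Path(root).rglob("*"):
        # Only the final suffix is checked, so data.tar.gz matches ".gz".
        if path.is_file() and path.suffix.lower() in ARCHIVE_SUFFIXES:
            if not (path.parent / "version.yml").is_file():
                missing.add(path.parent.relative_to(root).as_posix())
    return sorted(missing)
```

An empty list means every folder containing an archive also has its version.yml.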

License

MIT License.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any proposed changes.

Contact

For questions or suggestions, reach out to [email protected].
