skytells-research/COVID-19-XRay-Dataset

COVID 19 X-Ray Dataset

We are building a database of COVID-19 cases with chest X-ray or CT images. We are looking for COVID-19 cases as well as MERS, SARS, and ARDS.

All images and data will be released publicly in this GitHub repo. For now, we are building the database from images in publications, since those images are already available.

Background

The 2019 novel coronavirus (COVID-19) presents several unique features. While the diagnosis is confirmed using polymerase chain reaction (PCR), infected patients with pneumonia may present on chest X-ray and computed tomography (CT) images with a pattern that is only moderately characteristic to the human eye (Ng, 2020). COVID-19’s rate of transmission depends on our capacity to reliably identify infected patients with a low rate of false negatives. In addition, a low rate of false positives is required to avoid further burdening the healthcare system by quarantining patients unnecessarily. Along with proper infection control, timely detection of the disease would enable all the supportive care required by patients affected by COVID-19.
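To make the two error rates mentioned above concrete, here is a minimal sketch using the standard definitions: the false negative rate is the share of infected patients a test misses, and the false positive rate is the share of uninfected patients it flags. The function name and the counts are invented for illustration and are not results from this dataset.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false_negative_rate, false_positive_rate) from confusion-matrix counts.

    A rate is None when its denominator is zero (no patients in that group).
    """
    infected = tp + fn      # patients who actually have the disease
    uninfected = tn + fp    # patients who do not
    fnr = fn / infected if infected else None
    fpr = fp / uninfected if uninfected else None
    return fnr, fpr


# Invented counts: 100 infected patients (10 missed), 200 uninfected (20 flagged).
fnr, fpr = error_rates(tp=90, fp=20, tn=180, fn=10)
print(f"false negative rate = {fnr:.2f}, false positive rate = {fpr:.2f}")
```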

In late January 2020, a Chinese team published a paper detailing the clinical and paraclinical features of COVID-19. They reported that patients present abnormalities in chest CT images, with most having bilateral involvement (Huang, 2020). Bilateral multiple lobular and subsegmental areas of consolidation are the typical findings in chest CT images of intensive care unit (ICU) patients on admission (Huang, 2020). In comparison, non-ICU patients show bilateral ground-glass opacity and subsegmental areas of consolidation in their chest CT images (Huang, 2020). In these patients, later chest CT images display bilateral ground-glass opacity with resolved consolidation (Huang, 2020).

COVID-19 is possibly better diagnosed using radiological imaging (Fang, 2020; Ai, 2020).

Motivation

While PCR tests offer many advantages, they are physical kits, so either the test or the sample has to be shipped. X-ray machines can screen patients as long as they have electricity.

Imagine a future where we run out of tests and then the majority of radiologists get sick. AI tools can help general practitioners to triage and treat patients.

Companies are developing AI tools and deploying them at hospitals (Wired, 2020). We should have an open database so that free tools providing similar assistance can be developed.

Goal

Our goal is to use these images to develop AI-based approaches to predict and understand the infection. The prediction tasks take a chest X-ray or CT image (X-ray preferred) as input.

Metadata

Here is a list of each metadata field, with explanations where relevant.

| Attribute | Description |
|---|---|
| patientid | Internal identifier |
| offset | Number of days since the start of symptoms or hospitalization for each image. If a report indicates "after a few days", then 5 days is assumed. This is very important to have when there are multiple images for the same patient to track progression. |
| sex | Male (M), Female (F), or blank |
| age | Age of the patient in years |
| finding | Type of pneumonia |
| survival | Yes (Y) or no (N) or blank if unknown |
| intubated | Yes (Y) if the patient was intubated (or ventilated) at any point during this illness or No (N) or blank if unknown |
| went_icu | Yes (Y) if the patient was in the ICU (intensive care unit) or CCU (critical care unit) at any point during this illness or No (N) or blank if unknown |
| needed_supplemental_O2 | Yes (Y) if the patient required supplemental oxygen at any point during this illness or No (N) or blank if unknown |
| extubated | Yes (Y) if the patient was successfully extubated or No (N) or blank if unknown |
| temperature | Temperature of the patient in Celsius at the time of the image |
| pO2 saturation | Partial pressure of oxygen saturation in % at the time of the image |
| wbc count | White blood cell count in units of 10^3/uL at the time of the image |
| neutrophil count | Neutrophil cell count in units of 10^3/uL at the time of the image |
| lymphocyte count | Lymphocyte cell count in units of 10^3/uL at the time of the image |
| view | Posteroanterior (PA), Anteroposterior (AP), AP Supine (APS), or Lateral (L) for X-rays; Axial or Coronal for CT scans |
| modality | CT, X-ray, or something else |
| date | Date on which the image was acquired |
| location | Hospital name, city, state, country |
| filename | Name with extension |
| doi | Digital object identifier (DOI) of the research article |
| url | URL of the paper or website where the image came from |
| license | License of the image such as CC BY-NC-SA. Blank if unknown |
| clinical notes | Clinical notes about the image and/or the patient |
| other notes | e.g. credit |
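
As a minimal sketch of how these fields might be used, the Python below parses a few made-up rows (not real records from the dataset) and orders one patient's X-rays by `offset` to follow progression. The CSV layout, the column spellings, and the helper names are assumptions based on the attribute names above; the actual metadata file may differ.

```python
import csv
import io

# Made-up rows for illustration only; these are not records from the dataset.
SAMPLE_CSV = """patientid,offset,sex,age,finding,survival,view,modality,filename
p1,7,M,65,COVID-19,Y,PA,X-ray,p1_day7.png
p1,0,M,65,COVID-19,Y,PA,X-ray,p1_day0.png
p2,3,F,,SARS,,AP,X-ray,p2_day3.jpg
p1,4,M,65,COVID-19,Y,Axial,CT,p1_day4.nii.gz
"""


def yes_no_blank(value):
    """Map 'Y'/'N'/blank to True/False/None (blank means unknown)."""
    value = value.strip().upper()
    if value == "Y":
        return True
    if value == "N":
        return False
    return None


def load_rows(text):
    """Parse CSV text into dicts, converting numeric and Y/N fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["offset"] = int(row["offset"]) if row["offset"].strip() else None
        row["age"] = int(row["age"]) if row["age"].strip() else None
        row["survival"] = yes_no_blank(row["survival"])
        rows.append(row)
    return rows


def progression(rows, patientid, modality="X-ray"):
    """Return one patient's images of a given modality, earliest offset first."""
    selected = [
        r for r in rows
        if r["patientid"] == patientid
        and r["modality"] == modality
        and r["offset"] is not None
    ]
    return sorted(selected, key=lambda r: r["offset"])


rows = load_rows(SAMPLE_CSV)
timeline = [(r["offset"], r["filename"]) for r in progression(rows, "p1")]
print(timeline)
```

Blank fields are kept as `None` rather than guessed, which matches the "blank if unknown" convention in the table.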

Sources

Initial results

Citation

  • Skytells AI Research
    • Analyzing and Improving the Image Quality
    • Reconfirming Classifications
    • X-Ray Images implementation
  • Zhang Lab
    • X-Ray Images
  • Joseph Paul Cohen and Paul Morrison and Lan Dao
    • Data Collection, arXiv:2003.11597, 2020

Contribute

  • We can extract images from publications. Help identify publications which are not already included using a GitHub issue (DOIs we have are listed in the metadata file). There is a searchable database of COVID-19 papers here, and a non-searchable one (requires download) here.

  • Submit data to these sites (we can scrape the data from them):

  • Provide bounding box/masks for the detection of problematic regions in images already collected.

  • See SCHEMA.md for more information on the metadata schema.
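
Before opening an issue for a new publication, it may help to check whether its DOI is already listed in the metadata. The following is a rough sketch, assuming the metadata rows have already been read into dictionaries with a `doi` key. The function names are hypothetical and the DOIs are placeholders, not entries from the dataset.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Lowercase a DOI and drop a leading URL or 'doi:' prefix so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def is_already_included(candidate, rows):
    """Return True if the candidate DOI matches any non-blank 'doi' value in rows."""
    known = {normalize_doi(r["doi"]) for r in rows if r.get("doi")}
    return normalize_doi(candidate) in known


# Placeholder DOIs for illustration only.
known_rows = [
    {"doi": "10.1000/example.001"},
    {"doi": ""},
    {"doi": "https://doi.org/10.1000/Example.002"},
]
print(is_already_included("doi:10.1000/EXAMPLE.002", known_rows))
print(is_already_included("10.1000/example.999", known_rows))
```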

Formats: For chest X-rays, dcm, jpg, or png files are preferred. For CT scans, nifti (in gzip format) is preferred, but dcm files are also accepted. Please contact us with any questions.
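
A contributor could check filenames against these preferences before submitting. The sketch below is one way to do that. The extension sets are an interpretation of the format list above (for example, `.jpeg` is left out because only jpg is named), and the function names are hypothetical.

```python
from pathlib import Path

# Interpretation of the preferred formats listed above.
XRAY_EXTENSIONS = {".dcm", ".jpg", ".png"}
CT_EXTENSIONS = {".nii.gz", ".dcm"}


def full_extension(filename):
    """Return the lowercase extension, treating '.nii.gz' as a single unit."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    if suffixes[-2:] == [".nii", ".gz"]:
        return ".nii.gz"
    return suffixes[-1] if suffixes else ""


def is_preferred_format(filename, modality):
    """Check a filename against the preferred formats for 'X-ray' or 'CT'."""
    ext = full_extension(filename)
    if modality == "X-ray":
        return ext in XRAY_EXTENSIONS
    if modality == "CT":
        return ext in CT_EXTENSIONS
    return False


print(is_preferred_format("chest.PNG", "X-ray"))
print(is_preferred_format("scan.nii.gz", "CT"))
print(is_preferred_format("scan.nii.gz", "X-ray"))
```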

MIT LICENSE

Copyright (c) 2020 Skytells, Inc

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.