DATASETS

To request access to any of these datasets, contact us at bivl2ab@saber.uis.edu.co. The datasets may not be used for commercial purposes; their use is restricted to research and education. If you use any of them in your experiments, please cite the reference listed for that dataset.
COVID-19 dataset

Details

The volumes have an average of 45 slices, each with spatial dimensions of 512x512 pixels. Each study is complemented with clinical information, comorbidities, findings and their distribution, and a measure of disease severity. The evaluation was carried out by two radiologists. COVID-19 severity was assigned based on the percentage of involvement of each lung lobe, with the following distribution over 168 studies: 33 mild, 52 moderate, 46 severe, and 37 critical. The local X-ray (RX) dataset contains 688 radiographs, which have been anonymized and preprocessed for classification and segmentation tasks. It also includes 300 radiologist annotations of the possible findings of each condition.
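The severity counts above add up to 168 studies, and the four classes are moderately imbalanced. A minimal sketch, assuming the per-study severity label is used as a classification target, of computing class proportions and inverse-frequency class weights from those counts. The dictionary and function names are hypothetical, and inverse-frequency weighting is a common choice shown for illustration, not a scheme prescribed by the dataset authors.

```python
# Severity counts as reported in the dataset description (hypothetical variable name).
SEVERITY_COUNTS = {"mild": 33, "moderate": 52, "severe": 46, "critical": 37}


def class_proportions(counts):
    """Return each class's share of the total number of studies."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count).

    Rarer classes receive larger weights; a perfectly balanced
    dataset would give every class a weight of 1.0.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    print("total studies:", sum(SEVERITY_COUNTS.values()))
    for label, p in class_proportions(SEVERITY_COUNTS).items():
        print(f"{label:>8}: {p:.1%}")
    for label, w in inverse_frequency_weights(SEVERITY_COUNTS).items():
        print(f"{label:>8}: weight {w:.3f}")
```

Such weights could be passed to a weighted loss; resampling the minority classes is an alternative with the same goal.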

Reference

No reference.

Parkinson's disease dataset

Details

The dataset consists of videos captured with an RGB camera of 22 subjects: 11 control subjects and 11 patients diagnosed with Parkinson's disease (PD). The PD patients were between stages 2.5 and 4.0 of the disease on the Hoehn and Yahr scale (values estimated by a physiotherapist). Each subject was recorded 8 times during a natural, markerless walk, 4 times walking to the right and 4 times to the left, for a total of 176 video sequences. The subjects are 12 men and 10 women, distributed as follows: 9 men and 2 women in the PD group, and 3 men and 8 women in the control group. The average length of the videos is 4 seconds.
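Because each subject contributes 8 sequences, splitting the 176 videos at random would place walks of the same person in both the training and test sets. A minimal sketch of a leave-one-subject-out split, assuming hypothetical subject IDs (`pd00`, `ctl00`, ...) and an in-memory index of (subject, direction, repetition) tuples; the dataset's actual file naming is not described here.

```python
# Hypothetical subject IDs: 11 PD patients and 11 controls, as described above.
SUBJECTS = [f"pd{i:02d}" for i in range(11)] + [f"ctl{i:02d}" for i in range(11)]
DIRECTIONS = ("right", "left")
WALKS_PER_DIRECTION = 4


def build_index(subjects):
    """List one (subject, direction, repetition) tuple per video sequence."""
    return [
        (subject, direction, rep)
        for subject in subjects
        for direction in DIRECTIONS
        for rep in range(WALKS_PER_DIRECTION)
    ]


def leave_one_subject_out(index):
    """Yield (held_out_subject, train, test) so no subject is on both sides."""
    subjects = sorted({subject for subject, _, _ in index})
    for held_out in subjects:
        test = [video for video in index if video[0] == held_out]
        train = [video for video in index if video[0] != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    index = build_index(SUBJECTS)
    print("sequences:", len(index))  # 22 subjects x 8 walks
    print("folds:", len(list(leave_one_subject_out(index))))
```

With one subject held out per fold, each fold tests on 8 sequences and trains on the remaining 168.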

Reference

Salazar, I., Pertuz, S., Contreras, W., & Martínez, F. (2020). A convolutional oculomotor representation to model parkinsonian fixational patterns from magnified videos. Pattern Analysis and Applications.

Sign language dataset

Details

The CoL-SLTD (Colombian Sign Language Translation Dataset) contains 39 sentences: 24 affirmative, 4 negative, and 11 interrogative. Each sentence has 3 repetitions, and the dataset contains a total of 1020 recorded sentences, which captures the variability in how signs are performed in specific or particular expressions. The sentences were performed by 13 participants (between 21 and 80 years old), and each sentence contains between two and nine signs. All videos were resized to a spatial resolution of 448x448 pixels and have temporal resolutions of 30 and 60 FPS.
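Since the videos come at two frame rates (30 and 60 FPS), a model that expects a fixed rate may need the 60 FPS clips subsampled. A minimal sketch under that assumption, using simple frame skipping to pick source frame indices; the function name is hypothetical and the dataset itself does not prescribe this preprocessing step.

```python
def resample_indices(n_frames, src_fps, dst_fps):
    """Return source frame indices that approximate the clip at dst_fps.

    Picks the nearest preceding source frame for each output frame.
    Only downsampling (dst_fps <= src_fps) is supported in this sketch.
    """
    if dst_fps > src_fps:
        raise ValueError("upsampling is not supported in this sketch")
    # Number of output frames that fit within the clip's duration.
    n_out = n_frames * dst_fps // src_fps
    # Integer arithmetic keeps every index inside [0, n_frames).
    return [i * src_fps // dst_fps for i in range(n_out)]


if __name__ == "__main__":
    # A 240-frame clip recorded at 60 FPS, brought down to 30 FPS.
    idx = resample_indices(240, 60, 30)
    print(len(idx), idx[:5])
```

For a 240-frame clip at 60 FPS, this keeps every other frame (120 indices starting 0, 2, 4, ...); clips already at 30 FPS pass through unchanged.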

Reference

Rodríguez J. et al. (2021) Understanding Motion in Sign Language: A New Structured Translation Dataset. In: Ishikawa H., Liu CL., Pajdla T., Shi J. (eds) Computer Vision - ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12627. Springer, Cham.

APIS dataset

Details

The dataset consists of 96 patient studies collected from two clinical centers between 2021 and 2022. It includes 86 ischemic stroke (IS) cases and 10 control studies that diversify the tissue samples. For each patient, the dataset provides a triage NCCT (Non-Contrast Computed Tomography) scan and a subsequent ADC (Apparent Diffusion Coefficient) map, both skull-stripped and co-registered. Two neuro-interventional radiologists, each with over 5 years of experience, manually delineated the lesions on the ADC sequences. The dataset reflects high variability in lesion shape and size. It also includes comprehensive demographic and clinical data, such as age, sex, symptoms, medical history (hypertension, diabetes), and the time elapsed between symptom onset and imaging.
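Because the lesions were delineated on co-registered ADC maps, a predicted lesion mask in the same space can be compared voxel-wise against those delineations. A minimal sketch of the Dice coefficient, a common overlap metric for lesion segmentation, assuming both masks are flattened to equal-length binary sequences; it is shown for illustration and is not necessarily the evaluation protocol used in the APIS paper. The function name is hypothetical.

```python
def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) over flattened binary masks.

    Returns 1.0 when both masks are empty (for example, a control
    study with no lesion and a prediction with no lesion voxels).
    """
    pred = [bool(p) for p in pred]
    target = [bool(t) for t in target]
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as lesion in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0
    return 2 * intersection / total


if __name__ == "__main__":
    ground_truth = [0, 1, 1, 1, 0, 0]
    prediction = [0, 1, 1, 0, 0, 1]
    print(round(dice_coefficient(prediction, ground_truth), 3))
```

Treating two empty masks as a perfect match is a convention choice; it matters here because the 10 control studies contain no lesion.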

Reference

Gómez, S., Rangel, E., Mantilla, D. et al. APIS: a paired CT-MRI dataset for ischemic stroke segmentation - methods and challenges. Sci Rep 14, 20543 (2024). https://doi.org/10.1038/s41598-024-71273-x

COLON dataset

Details

COLON, the largest COlonoscopy LONg sequence dataset, has around 30 thousand polyp-labeled frames and 400 thousand background frames. The videos have a spatial resolution of 480x720 pixels and a temporal resolution of 30 fps. Each video has an average of 16 thousand frames, and the observations are imbalanced, since most frames show only the intestinal tract. The dataset contains 30 long sequences with polyps and 10 long sequences without polyps; at least three sequences contain two polyps. The polyps are highly variable in size, NICE classification, morphology (sessile or pedunculated), and biopsy result (adenoma or hyperplastic). An expert gastroenterologist with more than 13 years of experience labeled these findings. Demographic variables such as sex and age are also included.
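With roughly 30 thousand polyp frames against 400 thousand background frames, polyps make up only about 7% of the labeled frames. A minimal sketch, assuming frames are addressed by hypothetical integer IDs, of two common ways to counter this imbalance: a positive-class weight for a binary loss and class-balanced undersampling of background frames. Neither is described as part of the COLON protocol, and the frame counts are the approximate figures given above.

```python
import random

# Approximate frame counts from the dataset description.
N_POLYP_FRAMES = 30_000
N_BACKGROUND_FRAMES = 400_000


def positive_fraction(n_pos, n_neg):
    """Share of frames that contain a polyp."""
    return n_pos / (n_pos + n_neg)


def positive_class_weight(n_pos, n_neg):
    """Ratio of negatives to positives, a common weight for the positive class."""
    return n_neg / n_pos


def balanced_sample(pos_ids, neg_ids, k, seed=0):
    """Draw up to k frame IDs from each class without replacement.

    Background frames are undersampled so both classes contribute equally.
    """
    rng = random.Random(seed)
    k = min(k, len(pos_ids), len(neg_ids))
    return rng.sample(pos_ids, k) + rng.sample(neg_ids, k)


if __name__ == "__main__":
    print(f"{positive_fraction(N_POLYP_FRAMES, N_BACKGROUND_FRAMES):.2%}")
    print(f"{positive_class_weight(N_POLYP_FRAMES, N_BACKGROUND_FRAMES):.2f}")
    batch = balanced_sample(list(range(10)), list(range(100, 200)), k=5)
    print(len(batch))
```

Since most frames within each long sequence are background, any train/test split should also keep all frames of a sequence on the same side to avoid leakage.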

Reference

Ruiz, L., Sierra-Jerez, F., Ruiz, J., & Martinez, F. (2024). COLON: The largest COlonoscopy LONg sequence public database. arXiv preprint arXiv:2403.00663.