Project Overview
During my Ph.D. at a government-funded research institute, I participated in 15 projects focused on AI- and ML-driven medical imaging innovations. I worked on real-world clinical applications, solving complex challenges in data preprocessing, model optimization, and large-scale learning. This section highlights a selection of projects where my contributions were most impactful.
Machine Learning & AI: Medical imaging, pattern recognition, large-scale learning algorithms
Deep Learning: Foundation models, LLMs, Generative AI (GenAI)
Computer Vision: Image processing, segmentation, multi-modal data integration
Real-World Data Processing: Preprocessing heterogeneous clinical datasets, optimizing workflows
First, I will showcase the medical image analysis skills I have learned and applied through coursework and assignments during my Ph.D. program and Fast Campus training. Then, I will introduce the real-world projects I have contributed to while working at research institutes. For the projects I led independently, I have specified the institution and the duration of the project to provide clear context.
Radiology & Pathology Image Preprocessing
X-ray, CT, MRI, Pathology
MONAI, PyTorch, TensorFlow, DICOM, PyDicom, Nibabel, SimpleITK, etc.
Medical Image Classification
Medical Image Detection
Medical Image Segmentation
Medical Image Synthesis
Medical Image Scoring & Regression
Evaluation: Pearson Correlation Coefficient (PCC), Spearman’s Rank Correlation, Kendall’s Tau, Confusion Matrix, PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index Measure), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), and R² (Coefficient of Determination)
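As a quick illustration, several of the regression and scoring metrics above can be computed in a few lines. This is a minimal sketch using NumPy/SciPy; the actual evaluation pipelines varied per project:

```python
import numpy as np
from scipy import stats

def regression_metrics(y_true, y_pred):
    """Compute a handful of the scoring metrics listed above (illustrative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "PCC": stats.pearsonr(y_true, y_pred)[0],
        "Spearman": stats.spearmanr(y_true, y_pred)[0],
        "R2": 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
    }

scores = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```

PSNR and SSIM (for image-to-image tasks) follow the same pattern but operate on whole images rather than score vectors.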
From 2022 to 2024, I conducted research under the theme "Invisible to Visible," focusing on enhancing the visibility of soft tissues in CT imaging. This study leverages advanced AI-based image translation techniques to reconstruct and highlight anatomical structures that are traditionally challenging to visualize in CT scans. By integrating multi-modal imaging data, particularly MR-guided segmentation and CT-to-MR translation, this research aims to bridge the gap between imaging modalities, improving diagnostic accuracy and clinical applicability. The findings of this study are currently under review in a top-tier journal.
***While detailed content cannot be shared at this stage, I will update this section once the paper is published online***
This collaborative research with a Malaysian medical university focused on analyzing cerebral small vessel disease (CSVD) through AI-driven image analysis. The project brought together a clinical research team, led by a physician PI in Malaysia, and our computational research lab, creating a unique synergy between clinical expertise and AI-based medical imaging techniques.
During this collaboration, a Malaysian postdoctoral researcher visited our lab, where I provided computational expertise, including object detection, medical image preprocessing, and AI-based diagnostic techniques. This experience reinforced the critical role of data preprocessing in medical imaging, as even minor variations significantly impacted diagnostic outcomes. Additionally, I introduced VR-based 3D tracing, enabling direct visualization of cerebral vasculature and brain structures, which enhanced the understanding of CSVD progression.
Our interdisciplinary approach led to a joint conference presentation and the publication of our research in Frontiers in Cardiovascular Medicine in 2021.
Conference: The 7th Asian Oceanian Congress on Clinical Neurophysiology (AOCCN), January 2021
Title: Non-Invasive 3D Image Analysis of Microcirculation Vascular Integrity in Asymptomatic Cerebral Small Vessel Disease
Collaboration: Dr. Che Mohd Nasril (Postdoc) & Professor Muzaimi Mustapha (Wet Lab)
My Role: Computational analysis using imaging informatics techniques (Dry Lab)
Summary
In this study, I applied neuTube, a neural tracing software, to detect cerebral small vessel disease (CSVD) using object detection-based AI techniques. By leveraging AI-driven image analysis, our method enables rapid and accurate identification of CSVD, facilitating early diagnosis and potential intervention.
Key Highlights
Background: CSVD is a neurological disorder affecting small brain vessels, often linked to stroke, dementia, Alzheimer’s disease (AD), and depression.
Objective: Investigate microcirculation vascular integrity in individuals with and without CSVD through advanced image analysis.
Methods:
Utilized neuTube for 3D reconstruction of cerebral vasculature from non-invasive 3D-time-of-flight MRA images.
Conducted quantitative analysis to assess microcirculation integrity.
Results:
neuTube successfully reconstructed fine details of the cerebral blood vessel network.
Analysis of ten human MRA images provided insights into vascular differences in CSVD patients.
Further quantitative studies will enhance clinical applications.
Conclusion: This study demonstrates a non-invasive AI-driven approach to assess cerebral small vessel integrity, offering a promising tool for early diagnosis and monitoring of CSVD.
Objective: Improve the detection and segmentation of densely packed nuclei in medical images using a Mask R-CNN-based approach.
Challenge: Traditional methods rely on fully annotated training images, which are time-intensive to create. To address this, the project explored learning from partially labeled exemplars to enhance efficiency.
Proposed Approach:
Designed a centerness score and bounding box prediction task using partially labeled ground truth.
Implemented predictions at FPN1, the first (highest-resolution) feature level of the feature pyramid, to capture most nuclei in the image.
Integrated standard Mask R-CNN components, including Backbone, RPN module, RCNN head, and Mask head.
Added a decomposed self-attention (SA) module to improve feature extraction.
My Role: Assisted my PI’s research, contributing to data preparation, model evaluation, and optimization of the Mask R-CNN framework.
Key Contribution: Enhanced segmentation performance by leveraging similarities among nuclei to improve prediction with partially labeled data.
Outcome: Improved segmentation accuracy while reducing dependency on fully annotated datasets. The source code is publicly available on GitHub.
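The centerness score mentioned above follows the general idea popularized by FCOS: locations near a nucleus center receive a target close to 1, locations near the box edge close to 0. A minimal sketch of that target (the exact formulation in our Mask R-CNN variant may differ):

```python
import numpy as np

def centerness_target(l, t, r, b):
    """FCOS-style centerness target for a location inside a box.

    l, t, r, b are the distances from the location to the left, top,
    right, and bottom box edges. The value is 1.0 at the exact center
    and decays toward 0 at the box boundary.
    """
    return np.sqrt(
        (np.minimum(l, r) / np.maximum(l, r))
        * (np.minimum(t, b) / np.maximum(t, b))
    )
```

Because the target depends only on relative position within a box, it can be supervised from the partially labeled exemplars without needing every nucleus annotated.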
Developed an AI-powered defect detection system for cardiovascular stents, automating quality control and reducing human inspection by 80%.
Implemented a camera-based defect detection system, significantly improving inspection accuracy and operational efficiency.
Extended internship twice for exceptional performance, completing three consecutive terms.
To gain access to high-quality coursework, I participated in exchange programs at KAIST and Korea University, where I took advanced AI and deep learning courses. These courses had a competitive grading system, and my scores were converted to a perfect 100 under the relative grading scale.
In the deep learning course, each project required end-to-end implementation, including coding, model development, report writing, and a final presentation, all within just one week per project. Over the span of four projects, I consistently achieved top scores, demonstrating strong technical and analytical skills. Additionally, I submitted Kaggle code contributions as part of my coursework.
The following section showcases the projects I completed, highlighting my approach and methodologies. 🚀
These projects were part of my training and research during my Ph.D. studies, which I began in September 2018 and completed ahead of schedule in September 2021. 🚀
For the first deep learning assignment, I worked with the BraTS 2020 dataset, which is widely used for multimodal brain tumor segmentation. The dataset consists of pre-operative multi-institutional MRI scans from 19 different institutions, including glioblastoma (GBM/HGG) and lower-grade glioma (LGG) cases, with expert-annotated tumor regions. The imaging data is provided in NIfTI format (.nii.gz) and includes four MRI modalities:
T1-weighted (T1): Native MRI scan
T1Gd (T1-post contrast): Post-contrast enhanced T1-weighted scan
T2-weighted (T2): Fluid-sensitive scan
T2-FLAIR (Fluid Attenuated Inversion Recovery): Highlights edema and tumor regions
Each scan is manually annotated by board-certified neuroradiologists, with labeled tumor subregions:
Enhancing Tumor (ET - label 4)
Peritumoral Edema (ED - label 2)
Necrotic/Non-Enhancing Tumor Core (NCR/NET - label 1)
Additionally, survival data, including overall survival (OS) in days, patient age, and resection status, is provided for survival prediction tasks.
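A typical way to prepare one BraTS case is to stack the four modalities into a single multi-channel volume, normalizing each modality over its brain (nonzero) voxels only. This is a sketch under assumptions: the t1/t1ce/t2/flair file-name suffixes are the standard BraTS ones, but the paths and helper names are illustrative:

```python
import numpy as np

def zscore_nonzero(vol):
    """Z-score a volume using only its nonzero (brain) voxels."""
    brain = vol[vol > 0]
    return (vol - brain.mean()) / (brain.std() + 1e-8)

def load_case(case_dir, case_id):
    """Stack the four BraTS modalities into one (4, H, W, D) array.

    Paths follow the standard BraTS naming convention; this loader is
    illustrative, not the exact assignment code.
    """
    import nibabel as nib  # deferred import: only needed when loading files
    channels = []
    for m in ("t1", "t1ce", "t2", "flair"):
        vol = nib.load(f"{case_dir}/{case_id}_{m}.nii.gz").get_fdata(dtype=np.float32)
        channels.append(zscore_nonzero(vol))
    seg = nib.load(f"{case_dir}/{case_id}_seg.nii.gz").get_fdata(dtype=np.float32)
    return np.stack(channels, axis=0), seg
```

Per-modality normalization matters here because the four sequences have very different intensity ranges.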
This was my first assignment in the deep learning course, and a major challenge was the large size of the dataset, which significantly impacted training time. With only one week to complete the assignment, I needed to efficiently manage data preprocessing, model training, and submission to a Kaggle competition.
Handling Large-Scale Data
The dataset required significant preprocessing, including normalization, skull-stripping, and data augmentation.
Given the high computational cost, I optimized batch sizes and memory usage to ensure smooth training within the given time.
Model Development & Implementation
I implemented a deep learning-based segmentation model trained on the multimodal MRI data.
The U-Net architecture was used for tumor segmentation, leveraging multi-channel MRI inputs for feature extraction.
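The multi-channel U-Net setup can be sketched at miniature scale: 4 input channels (one per MRI modality) and 4 output classes (background plus the three tumor subregions). This two-level toy version only illustrates the encoder/skip-connection/decoder pattern; the actual assignment model was deeper:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Minimal 2-level U-Net: 4 MRI modalities in, 4 class maps out."""
    def __init__(self, in_ch=4, n_classes=4):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)  # 32 upsampled channels + 32 from skip
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                              # (B, 32, H, W)
        e2 = self.enc2(self.pool(e1))                  # (B, 64, H/2, W/2)
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d1)                           # (B, n_classes, H, W)

logits = TinyUNet()(torch.randn(1, 4, 64, 64))
```

The skip connection is what lets the decoder recover fine tumor boundaries that the pooled features alone would blur.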
Results & Optimization
I successfully submitted my model results within the deadline, adhering to the competition timeline.
Later, with extended training time, I fine-tuned my model to improve segmentation accuracy and fully analyze model performance.
First hands-on experience with large-scale medical image datasets and deep learning segmentation models.
Gained insights into the computational challenges of training on large medical datasets within time constraints.
Successfully applied U-Net for brain tumor segmentation, refining the model post-submission for improved accuracy.
In this project, the task was to predict patient survival based on demographic, clinical, and time-to-event data provided in an Excel dataset. However, a major challenge was the high proportion of missing data, making it difficult to train an accurate predictive model. Instead of discarding incomplete data, I focused on mathematically resolving this issue through an imputation strategy, which ultimately contributed to a higher evaluation score.
Approach & Methodology
Handling Missing Data with GAN-Based Imputation
Traditional imputation methods, such as mean imputation, kNN imputation, or multiple imputation (MICE), often fail to capture the underlying complex relationships in medical data, leading to potential bias or information loss.
To address this, I applied Generative Adversarial Networks (GANs) for missing data imputation. GANs are capable of learning the latent distribution of the data, generating plausible values that better reflect the original data structure.
This approach was particularly beneficial in this study, as patient survival data often involves nonlinear interactions between clinical variables, which GANs are well-suited to model.
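A compact sketch of the GAN-based imputation idea, loosely following GAIN (Yoon et al., 2018): the generator fills in missing entries, and the discriminator tries to tell observed from generated values given a partial hint about the mask. The feature count, layer sizes, and the assumption that features are scaled to [0, 1] are all illustrative, not the exact model I used:

```python
import torch
import torch.nn as nn

d = 8  # number of clinical features (placeholder)
G = nn.Sequential(nn.Linear(2 * d, 32), nn.ReLU(), nn.Linear(32, d), nn.Sigmoid())
D = nn.Sequential(nn.Linear(2 * d, 32), nn.ReLU(), nn.Linear(32, d), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def impute_step(x, m, hint_rate=0.9):
    """One training step. x: data with missing slots zeroed; m: mask (1 = observed)."""
    z = torch.rand_like(x)                       # noise seeds the missing slots
    x_hat = G(torch.cat([m * x + (1 - m) * z, m], dim=1))
    x_imp = m * x + (1 - m) * x_hat              # observed values pass through
    b = (torch.rand_like(m) < hint_rate).float()
    hint = b * m + 0.5 * (1 - b)                 # reveal most, but not all, of the mask
    # Discriminator: classify each entry as observed (1) vs. generated (0)
    d_prob = D(torch.cat([x_imp.detach(), hint], dim=1))
    loss_d = bce(d_prob, m)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool D on missing entries, reconstruct observed ones
    d_prob = D(torch.cat([x_imp, hint], dim=1))
    loss_g = bce(d_prob * (1 - m) + m, torch.ones_like(m)) \
        + 10.0 * ((m * (x - x_hat)) ** 2).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return x_imp.detach()
```

The key property is that observed entries are never altered; the adversarial game only shapes the distribution of the imputed ones.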
Survival Prediction using Deep Learning
After data imputation, I trained a deep learning model with a regression-style output layer to classify survival outcomes.
The goal was to determine how well machine learning methods could classify patient survival based on available features.
Kaplan-Meier Survival Estimation
To further validate the deep learning predictions, I conducted a Kaplan-Meier survival analysis, a widely used non-parametric statistical method for estimating survival functions.
This provided an intuitive visualization of survival probabilities over time and allowed for comparison with the deep learning classification results.
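The Kaplan-Meier estimator itself is simple enough to sketch directly: at each observed event time t_i, the survival curve is multiplied by (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of subjects still at risk. A minimal NumPy version (illustrative; the project used standard statistical tooling):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: time-to-event or time-to-censoring per subject.
    events: 1 = event observed, 0 = censored.
    Returns a list of (event_time, survival_probability) pairs.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv, s = [], 1.0
    for t in np.unique(times[events == 1]):
        n_at_risk = np.sum(times >= t)                    # still in the study at t
        d = np.sum((times == t) & (events == 1))          # events exactly at t
        s *= 1.0 - d / n_at_risk
        surv.append((t, s))
    return surv

curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored subjects (events == 0) never trigger a drop in the curve, but they do count in the at-risk denominator until their censoring time, which is what distinguishes this from a naive survival fraction.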
Comparison of ML-based and Statistical Approaches
Finally, I compared the Kaplan-Meier estimates with the deep learning model predictions, assessing their relative effectiveness in survival prediction.
Key Takeaways
GAN-based imputation effectively handled missing data, offering a more sophisticated alternative to traditional imputation methods by capturing the underlying data distribution.
Deep learning-based classification demonstrated strong predictive capabilities but required further validation.
Kaplan-Meier analysis provided a statistical benchmark, offering interpretability to complement ML-based predictions.
This project reinforced the importance of handling missing data in medical datasets and showcased the potential of AI-driven approaches in survival analysis.
For this project, the goal was to segment the lateral ventricles in brain CT images using a deep learning-based approach. The dataset consisted of NIfTI files, including:
Brain CT Slices (input images)
Lateral Ventricle Segmentation Slices (Ground Truth labels)
Test Slices (for model evaluation)
Model Selection & Training
Implemented a single-channel CNN trained on Brain CT slices with corresponding segmentation labels.
Used cross-validation to improve generalization and fine-tune model performance.
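Cross-validation here simply means rotating which slices are held out for validation. A minimal sketch of a k-fold index split (sample count and seed are placeholders; in the assignment the indices pointed at NIfTI CT slices):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=42):
    """Split shuffled sample indices into k (train, val) pairs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    return [
        (np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
        for i in range(k)
    ]

splits = kfold_indices(200, k=5)
```

Each slice appears in exactly one validation fold, so the averaged fold scores estimate generalization without touching the held-out test slices.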
Evaluation & Testing
Evaluated the trained model using 100 test slices provided in the dataset.
Measured segmentation accuracy using common performance metrics for medical image segmentation.
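The most common of those segmentation metrics is the Dice coefficient; a minimal sketch for binary masks (illustrative, not the exact evaluation script):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-8):
    """Dice = 2 * |P ∩ G| / (|P| + |G|) for binary masks.

    eps guards against division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

score = dice_coefficient([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
```

Dice is preferred over plain pixel accuracy for structures like the lateral ventricles, which occupy only a small fraction of each CT slice.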
Applied CNN-based segmentation for brain structure analysis using single-channel medical imaging data.
Gained experience in training deep learning models on NIfTI-structured datasets.
Learned how to evaluate model performance effectively using test data.
This project reinforced my understanding of deep learning for medical image segmentation and provided insights into the challenges of anatomical structure identification in CT scans. 🚀
Upon entering graduate school in 2018, my primary goal was to make the invisible visible. Genes are not directly observable to the human eye, yet their expression patterns hold crucial insights into biological processes and disease progression. I wanted to leverage artificial intelligence to decode these patterns—predicting when and how gene expression occurs—and further visualize the likelihood and pathways of recurrence and metastasis.
To pursue this vision, I undertook a project utilizing Google Cloud Platform to analyze RNA sequencing and exome sequencing data. By applying AI-driven models, I aimed to identify meaningful genetic patterns and translate them into actionable insights. This project, which I initiated immediately upon starting graduate school, reinforced my commitment to integrating AI with genomics to uncover hidden biological mechanisms.
• LAB Projects: "Construction and Application of Bioinformatics Service System for Big Whole-Genome Data" and "Prediction and Application of Pathogenicity of Genome Variants by Machine Learning"
• Studied RNA-Seq analysis of human diseases, whole-exome analysis of rare diseases, genomic prediction and marker selection, and GWAS. Primarily, I performed whole-exome sequencing and RNA sequencing analysis on Autism Spectrum Disorder datasets.
A significant portion of my work involved analyzing Whole Exome Sequencing and RNA Sequencing data from Autism Spectrum Disorder (ASD) datasets, leveraging Linux and Google Cloud-based computing (Google Genomics). The flexibility of Google Cloud GPUs played a critical role in enabling large-scale genomic analysis, as they provided high-performance computing power that could be accessed anytime, anywhere—offering a significant advantage for researchers handling big genomic data.
I have been fortunate to explore a wide range of research topics, and this would not have been possible without the guidance and support of the senior researchers I had the privilege to learn from. Their mentorship shaped my growth, and I deeply respect and appreciate the time, effort, and wisdom they shared with me. I am profoundly grateful to each of them for their generosity and guidance. I could never have come this far alone.👏
My PhD journey was unconventional, as I pursued my degree while working as a researcher at a government-funded institute. Frequent transitions of senior researchers, many of whom moved to university positions, led me to adapt to multiple research topics. While focusing on government projects limited my publications, it allowed me to explore diverse fields. I truly enjoyed my PhD journey, as I love learning new things and appreciated the opportunity to gain knowledge across disciplines. I am grateful for these experiences and look forward to continuing my research with the same curiosity and enthusiasm.🙂