  • Stars: 6
  • Rank: 2,539,965 (Top 51%)
  • Language: Jupyter Notebook
  • Created over 3 years ago
  • Updated over 1 year ago

Repository Details

Bioinformatics project that tries to identify the most influential metabolomic biomarkers (out of 158) for predicting lung cancer, and to make some predictions about the disease.

Most Dominant Metabolomic Biomarkers Identification for Lung Cancer

License: MIT · Contributions welcome

Metabolomic biomarkers play a vital role in the early identification and prediction of cancer. Numerous lives could be saved if biomarkers were used to help medical providers diagnose their patients faster, and many researchers have been trying to identify the biomarkers that are crucial for early diagnosis. This paper presents a two-phase procedure for determining the most important metabolomic biomarkers in the blood for lung cancer prediction, using Plasma and Serum samples. In the first phase, we used the Shapiro–Wilk Test, Bartlett’s Test, Levene’s Test, Student’s t-Test, and Kruskal–Wallis Test to determine the potential biomarkers. In the second phase, Recursive Feature Elimination with Random Forest was used to identify the final, most dominant metabolomic biomarkers. Lastly, we used a Ridge Classifier and an XGBoost Classifier to assess the consistency of our approaches. Despite the large reduction in the number of metabolites, our prediction accuracy was 100% for Plasma samples and 90.91% for Serum samples, which is higher than the state-of-the-art method. Finally, we carried out some analyses using the most dominant metabolites, which can serve as a source of inspiration for our work.
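The two phases described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from the notebooks: the significance level, the rule for choosing between the t-test and the Kruskal–Wallis test, the number of features kept, and the helper names `screen_metabolites` and `select_dominant` are all assumptions. XGBoost is left out to keep the dependencies to NumPy, SciPy, and scikit-learn.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

ALPHA = 0.05  # assumed significance level, not taken from the paper


def screen_metabolites(X, y, alpha=ALPHA):
    """Phase 1 (assumed logic): keep columns that differ between the two classes."""
    selected = []
    for j in range(X.shape[1]):
        a, b = X[y == 0, j], X[y == 1, j]
        # Shapiro-Wilk: is each group plausibly normal?
        normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
        # Bartlett assumes normality; Levene is the more robust variance test.
        variance_test = stats.bartlett if normal else stats.levene
        equal_var = variance_test(a, b).pvalue > alpha
        if normal and equal_var:
            p = stats.ttest_ind(a, b).pvalue  # Student's t-test
        else:
            p = stats.kruskal(a, b).pvalue  # Kruskal-Wallis test
        if p < alpha:
            selected.append(j)
    return selected


def select_dominant(X, y, candidates, n_keep=2, seed=0):
    """Phase 2: Recursive Feature Elimination with a Random Forest."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    rfe = RFE(forest, n_features_to_select=n_keep).fit(X[:, candidates], y)
    # Map the RFE mask back to the original column indices.
    return [candidates[i] for i in np.flatnonzero(rfe.support_)]


# Synthetic stand-in for the metabolite table: 40 samples, 12 "metabolites",
# where only the first two columns carry a class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 12))
X[y == 1, :2] += 3.0

screened = screen_metabolites(X, y)
dominant = select_dominant(X, y, screened)

# Consistency check with a Ridge Classifier on the selected columns only.
accuracy = cross_val_score(RidgeClassifier(), X[:, dominant], y, cv=5).mean()
print(screened, dominant, round(accuracy, 3))
```

With real data, `X` would hold the metabolite concentrations and `y` the case/control labels; the accuracies reported above come from the paper, not from this sketch.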

Set up the project after git clone.

  1. Open the directory: ~/final and best approach/. It contains the two main approaches I applied.
  2. Start with Approach 1.
  3. Read and run these notebooks one after another, in this order:
    • plasma_test_final.ipynb
    • serum_test_final.ipynb
    • specific_metabolics_accuracy_final.ipynb
  4. Now it's time for Approach 2.
  5. Read and run this notebook: exploratory_analysis.ipynb.
  6. For the mixed-up approach, which merges Approaches 1 and 2, simply run the notebook: mixed_up.ipynb.
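If you prefer to execute the notebooks from a script rather than opening them one by one, the order above could be automated along these lines. This is only a sketch: it assumes Jupyter with nbconvert is installed and that all five notebooks sit directly in the directory you pass in, which may not match the actual layout of the repository. `build_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Notebook order taken from the steps above.
NOTEBOOKS = [
    "plasma_test_final.ipynb",
    "serum_test_final.ipynb",
    "specific_metabolics_accuracy_final.ipynb",
    "exploratory_analysis.ipynb",
    "mixed_up.ipynb",
]


def build_commands(notebook_dir):
    """Return one `jupyter nbconvert` command per notebook, in run order."""
    base = Path(notebook_dir)
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(base / name)]
        for name in NOTEBOOKS
    ]


def run_all(notebook_dir):
    """Execute the notebooks in order, stopping at the first failure."""
    for command in build_commands(notebook_dir):
        subprocess.run(command, check=True)


# Assumed location: adjust to wherever the notebooks actually live.
commands = build_commands("final and best approach")
```

Calling `run_all("final and best approach")` from the repository root would then execute each notebook in place. Reading the notebooks interactively is still the better way to follow the analysis.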

Thank you. Please let us know if you find any mistakes or ways to improve this repo. Cheers!

Read our published journal paper based on this repository. Please cite it if it helps your work:

    @article{ghosh2022most,
    title={Most dominant metabolomic biomarkers identification for lung cancer},
    author={Ghosh, Utshab Kumar and Al Abir, Fuad and Rifaat, Nahian and Shovan, SM and Sayeed, Abu and Hasan, Md Al Mehedi},
    journal={Informatics in Medicine Unlocked},
    volume={28},
    pages={100824},
    year={2022},
    publisher={Elsevier}
    }
