  • Stars
    3
  • Rank 3,963,521 (Top 79 %)
  • Language
    Jupyter Notebook
  • Created almost 4 years ago
  • Updated almost 4 years ago

Repository Details

Studying gene expression in cells and tissues is one of the major routes to discovery in medicine. The main challenges of gene expression data are high input dimensionality, heterogeneity, and a very low sample size. To overcome these, gene subset selection (feature selection) has become a crucial step. This project approaches it as follows:

A. Apply three filter methods, (1) Mutual Info [f1], (2) F Classif [f2], and (3) T-Test [f3], to each of the three datasets to find important features.
B. Select the most important N/3 features from each of the three filter methods (f1, f2, f3) and take their union: F = {f1 ∪ f2 ∪ f3}.
C. Apply feature selection in a cascaded manner:
   a. F1 (N features) → F2 (2N/3 of the features selected by F1) → F3 (N/3 of the features selected by F2)
   b. F2 → F3 → F1
   c. F3 → F1 → F2
D. Classify the test data using wrapper methods (Sequential Forward Search and Sequential Backward Search) with N features, using KNN and SVM for classification, and report accuracy, F-score, and the confusion matrix.
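
The union step (B) and the cascaded step (C) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes scikit-learn's `mutual_info_classif` and `f_classif` stand in for "Mutual Info" and "F Classif", uses an absolute Welch t-statistic from SciPy for "T-Test" (which assumes exactly two classes), and runs on synthetic data. The helper names `top_k`, `union_of_filters`, and `cascade` are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import f_classif, mutual_info_classif


def mutual_info_score_fn(X, y):
    # f1: estimated mutual information between each feature and the label.
    return mutual_info_classif(X, y, random_state=0)


def f_classif_score_fn(X, y):
    # f2: ANOVA F-statistic per feature (first element of the returned pair).
    return f_classif(X, y)[0]


def t_test_score_fn(X, y):
    # f3: absolute Welch t-statistic per feature; assumes exactly two classes.
    classes = np.unique(y)
    t, _ = ttest_ind(X[y == classes[0]], X[y == classes[1]], equal_var=False)
    return np.abs(t)


def top_k(score_fn, X, y, candidates, k):
    """Return the k candidate column indices with the highest scores."""
    scores = np.nan_to_num(score_fn(X[:, candidates], y))
    order = np.argsort(scores)[::-1][:k]
    return candidates[order]


def union_of_filters(X, y, filters, n):
    """Step B: union of the top n/3 features chosen by each filter."""
    everything = np.arange(X.shape[1])
    picks = [top_k(f, X, y, everything, n // 3) for f in filters]
    return np.unique(np.concatenate(picks))


def cascade(X, y, filters, n):
    """Step C: apply three filters in order, keeping n, 2n/3, then n/3 features."""
    selected = np.arange(X.shape[1])
    for score_fn, k in zip(filters, (n, 2 * n // 3, n // 3)):
        selected = top_k(score_fn, X, y, selected, k)
    return selected


if __name__ == "__main__":
    # Synthetic stand-in for a gene expression matrix: 60 samples, 30 features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 30))
    y = np.repeat([0, 1], 30)
    X[y == 1, :3] += 4.0  # make the first three features informative

    f1, f2, f3 = mutual_info_score_fn, f_classif_score_fn, t_test_score_fn
    print("union:", union_of_filters(X, y, (f1, f2, f3), n=9))
    for name, order in (("a", (f1, f2, f3)), ("b", (f2, f3, f1)), ("c", (f3, f1, f2))):
        print("cascade", name, sorted(cascade(X, y, order, n=9).tolist()))
```

The returned column indices could then be passed to a wrapper such as scikit-learn's `SequentialFeatureSelector` with a KNN or SVM estimator for step D; that part is left out here to keep the sketch short.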