• Stars: 1
• License: Academic Free Lic...
• Created about 3 years ago
• Updated about 3 years ago

Repository Details

Background. Discovering relevant biomedical interactions in the literature is crucial for advancing biology research. It plays an essential role in studying the reported processes and interactions that affect biological processes (e.g., genome, metabolome, and transcriptome). The objective of this work is therefore twofold: to reduce the manual effort required to curate and review the biochemical interactions reported in the gluten-related bibliome, and to propose a novel deep learning approach to relation extraction that assists in a real curation task by learning from the curators' previous decisions.

Methods. Compared to previous works, the main contribution of this work is a deep learning model that incorporates a novel vector space combining (i) high-level lexical and syntactic inference features such as WordNets and health-related domain ontologies, (ii) unsupervised domain syntactic and semantic resources such as word embeddings, (iii) semantic and sentence-structure knowledge (e.g., part of speech, negation information, verb information), (iv) abbreviation resolution support, (v) several state-of-the-art named-entity recognition methods, and (vi) different feature construction and optimization techniques to support a semi-automatic curation workflow.

Results. Applying the semi-automatic curation workflow to a classified set of 2,451 relevant gluten-related documents produced a total of 8,349 relevant relations and 471,813 irrelevant relations across the following relation categories: (i) Related health issue, (ii) Improve, (iii) Aggravate, (iv) Stimulation, (v) Inhibition, (vi) Activation, (vii) Deactivation, (viii) Downregulation, (ix) Upregulation, (x) Increase symptoms, (xi) Decrease symptoms, (xii) Weak relation, and (xiii) No effect.
The mean F-score across the relation categories was 0.731, with the lowest F-score at 0.47 (with 200 positive identified relations) and the highest at 0.929 (with 2,129 positive identified relations). The experimental results indicate that the presented workflow is an effective approach for a semi-automatic relation extraction (RE) task: it obtained satisfactory results in the early stages of a real-world curation task and saved manual annotation effort by learning from the decisions made by manual curators. In addition, the presented sentence vector space can be integrated into several state-of-the-art machine learning models to recognize relevant relations with satisfactory results. Finally, this work highlights the benefit of using domain knowledge, such as ontologies and entity recognizers, to improve the automatic recognition of health-related interactions in the literature.
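
To make the reported metrics concrete, the following is a minimal sketch of how a per-category F-score and an unweighted mean over categories could be computed. It assumes the reported mean is a plain (macro) average of per-category F1 values, which the text above does not state explicitly. The names `Counts`, `f1`, and `macro_f1`, as well as the per-category counts in `toy`, are hypothetical and are not taken from the repository; only the totals of 8,349 relevant and 471,813 irrelevant relations come from the description.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    """Confusion counts for one relation category (hypothetical helper)."""
    tp: int  # relations correctly predicted as this category
    fp: int  # relations wrongly predicted as this category
    fn: int  # relations of this category that were missed


def f1(c: Counts) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def macro_f1(per_category: Dict[str, Counts]) -> float:
    """Unweighted mean of per-category F1 (assumed averaging scheme)."""
    return sum(f1(c) for c in per_category.values()) / len(per_category)


# Totals reported in the description above.
RELEVANT = 8_349
IRRELEVANT = 471_813
relevant_share = RELEVANT / (RELEVANT + IRRELEVANT)

# Illustrative counts only; these are NOT results from the work.
toy = {
    "Improve": Counts(tp=80, fp=10, fn=10),
    "Inhibition": Counts(tp=30, fp=20, fn=40),
}

if __name__ == "__main__":
    print(f"share of relevant relations: {relevant_share:.4f}")
    for name, counts in toy.items():
        print(f"{name}: F1 = {f1(counts):.3f}")
    print(f"macro F1 (toy data): {macro_f1(toy):.3f}")
```

The share of relevant relations is small (under 2% of all candidate relations), which is one reason an F-score per category is more informative here than raw accuracy.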