Comparison Log
2025-03-23 07:26:24.914206
mwtab Python Library Version: 1.2.5
Source:      https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN005891/mwtab/...
Study ID:    ST003587
Analysis ID: AN005891
Status:      Inconsistent

Sections "PROJECT" contain missmatched items: {('PROJECT_SUMMARY', "Machine learning (ML), with advancements in algorithms and computations, is seeing an increased presence in life science research. This study investigated several ML models' efficacy in predicting preterm birth using untargeted metabolomics from serum collected during the third trimester of gestation. Samples from 48 preterm and 102 term delivery mothers (1:2 ratio) from the All Our Families Cohort (Calgary, AB) were examined. Selected ML applications were used to examine the small-scale clinical dataset for both model performance and metabolite interpretation. Model performance was evaluated based on confusion matrices, receiver operating characteristic curves, and feature importance rankings. Conventional linear models, like Partial Least Squares Discriminant Analysis (PLS-DA) and linear logistic regression, showed moderate predictive potential with AUC-ROC around 0.60. Non-linear models, including Extreme Gradient Boosting (XGBoost) and artificial neural networks, had marginally improved predictive accuracy and strength. Resampling by bootstrapping was also examined. Among all MLs, bootstrap resampling enhanced XGBoost's performance the most, improving AUC-ROC (0.85, 95% CI:0.574-0.995, p<0.001) for the best fitted model. Feature importance analysis by Shapley Additive Explanations analysis consistently identified acylcarnitines and amino acid derivatives as significant metabolites. Findings underscored the complexity of modeling preterm birth prediction, suggesting a trial-and-error approach for model selection."), ('PROJECT_SUMMARY', "Machine learning (ML), with advancements in algorithms and computations, is seeing an increased presence in life science research. This study investigated several ML models'' efficacy in predicting preterm birth using untargeted metabolomics from serum collected during the third trimester of gestation. 
Samples from 48 preterm and 102 term delivery mothers (1:2 ratio) from the All Our Families Cohort (Calgary, AB) were examined. Selected ML applications were used to examine the small-scale clinical dataset for both model performance and metabolite interpretation. Model performance was evaluated based on confusion matrices, receiver operating characteristic curves, and feature importance rankings. Conventional linear models, like Partial Least Squares Discriminant Analysis (PLS-DA) and linear logistic regression, showed moderate predictive potential with AUC-ROC around 0.60. Non-linear models, including Extreme Gradient Boosting (XGBoost) and artificial neural networks, had marginally improved predictive accuracy and strength. Resampling by bootstrapping was also examined. Among all MLs, bootstrap resampling enhanced XGBoost''s performance the most, improving AUC-ROC (0.85, 95% CI:0.574-0.995, p<0.001) for the best fitted model. Feature importance analysis by Shapley Additive Explanations analysis consistently identified acylcarnitines and amino acid derivatives as significant metabolites. Findings underscored the complexity of modeling preterm birth prediction, suggesting a trial-and-error approach for model selection.")}
Sections "STUDY" contain missmatched items: {('STUDY_SUMMARY', "Machine learning (ML), with advancements in algorithms and computations, is seeing an increased presence in life science research. This study investigated several ML models' efficacy in predicting preterm birth using untargeted metabolomics from serum collected during the third trimester of gestation. Samples from 48 preterm and 102 term delivery mothers (1:2 ratio) from the All Our Families Cohort (Calgary, AB) were examined. Selected ML applications were used to examine the small-scale clinical dataset for both model performance and metabolite interpretation. Model performance was evaluated based on confusion matrices, receiver operating characteristic curves, and feature importance rankings. Conventional linear models, like Partial Least Squares Discriminant Analysis (PLS-DA) and linear logistic regression, showed moderate predictive potential with AUC-ROC around 0.60. Non-linear models, including Extreme Gradient Boosting (XGBoost) and artificial neural networks, had marginally improved predictive accuracy and strength. Resampling by bootstrapping was also examined. Among all MLs, bootstrap resampling enhanced XGBoost's performance the most, improving AUC-ROC (0.85, 95% CI:0.574-0.995, p<0.001) for the best fitted model. Feature importance analysis by Shapley Additive Explanations analysis consistently identified acylcarnitines and amino acid derivatives as significant metabolites. Findings underscored the complexity of modeling preterm birth prediction, suggesting a trial-and-error approach for model selection."), ('STUDY_SUMMARY', "Machine learning (ML), with advancements in algorithms and computations, is seeing an increased presence in life science research. This study investigated several ML models'' efficacy in predicting preterm birth using untargeted metabolomics from serum collected during the third trimester of gestation. 
Samples from 48 preterm and 102 term delivery mothers (1:2 ratio) from the All Our Families Cohort (Calgary, AB) were examined. Selected ML applications were used to examine the small-scale clinical dataset for both model performance and metabolite interpretation. Model performance was evaluated based on confusion matrices, receiver operating characteristic curves, and feature importance rankings. Conventional linear models, like Partial Least Squares Discriminant Analysis (PLS-DA) and linear logistic regression, showed moderate predictive potential with AUC-ROC around 0.60. Non-linear models, including Extreme Gradient Boosting (XGBoost) and artificial neural networks, had marginally improved predictive accuracy and strength. Resampling by bootstrapping was also examined. Among all MLs, bootstrap resampling enhanced XGBoost''s performance the most, improving AUC-ROC (0.85, 95% CI:0.574-0.995, p<0.001) for the best fitted model. Feature importance analysis by Shapley Additive Explanations analysis consistently identified acylcarnitines and amino acid derivatives as significant metabolites. Findings underscored the complexity of modeling preterm birth prediction, suggesting a trial-and-error approach for model selection.")}
Sections "COLLECTION" contain missmatched items: {('COLLECTION_SUMMARY', "Serum samples were collected during the third trimester between 28 and 32 weeks'' of gestation. The collected serum was centrifuged and stored at -80 storage until the day of sample assay."), ('COLLECTION_SUMMARY', "Serum samples were collected during the third trimester between 28 and 32 weeks' of gestation. The collected serum was centrifuged and stored at -80 storage until the day of sample assay.")}
'Metabolites' section of 'MS_METABOLITE_DATA' block do not match.
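
Note: the COLLECTION_SUMMARY values reported above appear to differ only in apostrophe escaping (' versus ''), and the PROJECT_SUMMARY and STUDY_SUMMARY values appear to differ in the same way plus an embedded line break. The sketch below is one way to check whether a reported mismatch is purely such an escaping or whitespace artifact. It is a diagnostic aid only: `normalize` is a hypothetical helper, not part of the mwtab library, and collapsing every '' to ' is an assumption that would be wrong for text where a doubled apostrophe is intentional.

```python
import re


def normalize(value: str) -> str:
    """Hypothetical helper: undo doubled apostrophes and collapse whitespace runs."""
    value = value.replace("''", "'")            # assumption: '' is an escaped '
    return re.sub(r"\s+", " ", value).strip()   # line breaks and repeated spaces become one space


# The two COLLECTION_SUMMARY values exactly as reported in the log above.
first = (
    "Serum samples were collected during the third trimester between 28 and 32 "
    "weeks'' of gestation. The collected serum was centrifuged and stored at -80 "
    "storage until the day of sample assay."
)
second = (
    "Serum samples were collected during the third trimester between 28 and 32 "
    "weeks' of gestation. The collected serum was centrifuged and stored at -80 "
    "storage until the day of sample assay."
)

raw_equal = first == second
normalized_equal = normalize(first) == normalize(second)

print("raw values equal:       ", raw_equal)         # False
print("normalized values equal:", normalized_equal)  # True
```

If the normalized values are equal, the mismatch is most likely a serialization difference between the two file formats rather than a difference in the deposited text. The 'Metabolites' mismatch on the last line is not covered by this check and needs a separate look at the MS_METABOLITE_DATA block.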