The primary goal of BioMapAI is to connect high-dimensional biology data,
To run BioMapAI and DeepMECFS efficiently, we recommend the following hardware and software setup.
- CPU: Minimum Intel Core i7 / AMD Ryzen 7, Recommended Intel Xeon / AMD Threadripper
- RAM: Minimum 16GB, Recommended 32GB+ for large datasets
- GPU: No GPU required
- Storage: SSD recommended for faster I/O
Version:
- Python Version: 3.9.13
- TensorFlow Version: 2.12.0
- Pandas Version: 1.5.0
- NumPy Version: 1.24.0
OS: Linux/macOS/Windows (Linux recommended for best performance)
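To confirm that an environment matches the versions listed above, a small check like the following may help. It is only a sketch: the PyPI distribution names (`tensorflow`, `pandas`, `numpy`) are assumed, and `check_versions` is a hypothetical helper that is not part of this repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the list above; distribution names are assumed.
EXPECTED = {
    "tensorflow": "2.12.0",
    "pandas": "1.5.0",
    "numpy": "1.24.0",
}


def check_versions(expected):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


print("python", sys.version.split()[0], "(listed: 3.9.13)")
for name, (wanted, found, ok) in check_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: listed {wanted}, installed {found} ({status})")
```

A mismatch is not necessarily a problem; the listed versions are simply the ones this project reports using.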
- Training BioMapAI Model: total execution time of 23.61 seconds on an x86_64 CPU with 810.20 GB total RAM
- Inference with Pretrained DeepMECFS Model: total execution time of 10.36 seconds on an x86_64 CPU with 810.20 GB total RAM
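The timings above do not say how they were measured. One way to produce a comparable report on your own machine, using only the standard library, is sketched below. `timed` and `total_ram_gb` are hypothetical helpers, and the RAM query relies on `os.sysconf`, which is not available on every platform.

```python
import os
import platform
import time


def total_ram_gb():
    """Total physical RAM in GiB, or None where os.sysconf cannot report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, where os.sysconf does not exist
    return pages * page_size / 1024**3


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; replace with your own training or inference call.
_, elapsed = timed(sum, range(1_000_000))
ram = total_ram_gb()
ram_text = f"{ram:.2f} GB" if ram is not None else "unknown"
print(f"Total execution time: {elapsed:.2f} seconds "
      f"on CPU: {platform.machine()} with Total RAM: {ram_text}")
```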
BioMapAI is a deep learning framework for multi-stage modeling of biological data. It first predicts intermediate targets (omics scores) and then maps them into a final outcome or classification label.
- OmicScoreModel: Train a model to predict intermediate omic scores (Y).
- ScoreLayer: Build a simple layer (or sub-model) that converts omic scores (Y) into the final target (y0).
- ScoreYModel: Combine the trained omic score model with the ScoreLayer to get final predictions and metrics.
- WeightsAdjust (Optional): Fine-tune the relationship between Y and y0 for better performance.
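The two-stage data flow described above (features X to intermediate scores Y, then Y to the final target y0) can be illustrated with a toy NumPy sketch. This is not the BioMapAI implementation: the linear maps `W1` and `w2` are hypothetical stand-ins for the trained OmicScoreModel and ScoreLayer, and the logistic output is an assumption for the classification case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_scores = 8, 20, 5

X = rng.normal(size=(n_samples, n_features))   # omics features
W1 = rng.normal(size=(n_features, n_scores))   # stand-in for OmicScoreModel
w2 = rng.normal(size=n_scores)                 # stand-in for ScoreLayer


def predict_scores(X, W1):
    """Stage 1: map features X to intermediate omic scores Y."""
    return X @ W1


def score_to_outcome(Y, w2, bias=0.0):
    """Stage 2: map scores Y to a value in [0, 1] via a logistic function."""
    return 1.0 / (1.0 + np.exp(-(Y @ w2 + bias)))


Y = predict_scores(X, W1)     # shape (n_samples, n_scores)
y0 = score_to_outcome(Y, w2)  # shape (n_samples,)
print(Y.shape, y0.shape)
```

Keeping the second stage separate is what allows the Y to y0 relationship to be adjusted later (the optional WeightsAdjust step) without retraining the first stage.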
- Install Dependencies
  pip install numpy pandas tensorflow
- Clone or Download This Repository
  This repository should include:
  - BioMapAI.py: Contains the classes and methods (OmicScoreModel, ScoreLayer, etc.).
  - example_data/: Folder containing train_data.csv and test_data.csv.
  - BioMapAI_Training_Tutorial.ipynb: Detailed notebook tutorial.
- Run the Tutorial Notebook
  - Open OmicScoreModel_Tutorial.ipynb in Jupyter Notebook or JupyterLab.
  - Follow the cells step-by-step to:
    - Load training and test data.
    - Train the OmicScoreModel to predict intermediate scores (Y).
    - Build a ScoreLayer to convert those scores into final predictions (y0).
    - Evaluate the model performance on a test set.
    - (Optional) Adjust weights to improve performance.
- Customize or Extend
  - Tune hyperparameters (epochs, optimizer, batch size, etc.).
  - Add or remove features in the data CSV files.
  - Modify BioMapAI.py to create custom network architectures or loss functions.
  - Integrate advanced data preprocessing or feature engineering techniques.
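For the hyperparameter tuning mentioned above, a plain grid search is one simple option. The sketch below only enumerates configurations and picks the best one; the grid values are illustrative, and `evaluate` is a hypothetical callback that you would implement to train a model with a given configuration and return a validation loss.

```python
from itertools import product

# Illustrative values only; choose ranges that suit your data.
GRID = {
    "epochs": [50, 100],
    "optimizer": ["adam", "sgd"],
    "batch_size": [16, 32],
}


def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def best_config(grid, evaluate):
    """Return (config, score) with the lowest score from evaluate(config)."""
    scored = ((cfg, evaluate(cfg)) for cfg in iter_configs(grid))
    return min(scored, key=lambda pair: pair[1])


# Toy scoring function standing in for "train, then return validation loss".
def toy_loss(cfg):
    return abs(cfg["epochs"] - 100) + cfg["batch_size"]


print(best_config(GRID, toy_loss))
```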
For an in-depth guide, check out the BioMapAI_Training_Tutorial.ipynb. It covers:
- Data loading and organization
- Model instantiation and training procedures
- How to evaluate intermediate and final predictions
- Strategies for adjusting the model to improve performance
We have used BioMapAI to build pretrained models for ME/CFS omics data, called DeepMECFS. We trained BioMapAI on gut microbiome data (species abundance and KEGG gene abundance), plasma metabolome, high-throughput immune flow cytometry data, Quest lab measurements, and a combined omics file containing key features from all datasets. These models are located in the folder pretrained_model_DeepMECFS/ and can be applied directly to new ME/CFS datasets. Here we use a public metabolome dataset as an example to walk through how to load and use our pretrained models.
- DeepMECFS_metabolome/: Directory containing the trained TensorFlow model.
- Y2y_metabolome/: Secondary model for converting intermediate features (Y) into final ME/CFS classification.
- metabolome_feature_metadata.csv: Required features and metadata for alignment with your dataset.
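Loading the two model directories listed above might look like the sketch below. Several things are assumed rather than confirmed: that all three items sit directly under pretrained_model_DeepMECFS/, that the directories are in a format `tf.keras.models.load_model` can read, and that no custom objects are needed. `pretrained_paths` and `load_pretrained` are hypothetical helpers; the tutorial notebook remains the authoritative reference.

```python
from pathlib import Path


def pretrained_paths(root="pretrained_model_DeepMECFS"):
    """Build the expected paths (the directory layout is an assumption)."""
    root = Path(root)
    return {
        "omics_model": root / "DeepMECFS_metabolome",
        "y2y_model": root / "Y2y_metabolome",
        "feature_metadata": root / "metabolome_feature_metadata.csv",
    }


def load_pretrained(root="pretrained_model_DeepMECFS"):
    """Load the omics model and the Y-to-y0 model from disk."""
    paths = pretrained_paths(root)
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing pretrained files: {missing}")

    # Imported here so the path helper works without TensorFlow installed.
    import tensorflow as tf

    omics_model = tf.keras.models.load_model(str(paths["omics_model"]))
    y2y_model = tf.keras.models.load_model(str(paths["y2y_model"]))
    return omics_model, y2y_model
```

If loading fails because of custom layers or losses, `load_model` accepts a `custom_objects` argument; which objects would be needed, if any, depends on how the models were saved.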
- Install/Clone the repository containing the pretrained_model_DeepMECFS/ folder.
- Prepare Your Data:
  - Ensure your metabolomics data columns match the names (or COMP_IDs) in metabolome_feature_metadata.csv.
  - Scale or normalize your data consistently (e.g., via StandardScaler).
- Run the Tutorial:
  - Open DeepMECFS_Tutorial.ipynb (or equivalent notebook/script).
  - Follow each step to:
    - Load the pretrained models (DeepMECFS_metabolome/ and Y2y_metabolome/).
    - Align your dataset columns to the model's expected features.
    - Generate predictions (ME/CFS vs. Control).
    - Evaluate performance metrics (accuracy, AUC, precision, etc.).
- Interpret Results:
  - The model outputs a probability (0 to 1) for ME/CFS classification.
  - You can threshold this probability (e.g., 0.5) to get a binary label (CFS vs. Control).
- Explore Further:
- You can experiment with different preprocessing or consider re-training parts of the pipeline if your data differs significantly from the original study.
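The data preparation step above (matching columns to the expected features, then scaling) can be sketched with pandas as follows. The helpers are hypothetical, filling absent features with 0 is an assumption, and the z-score here is computed from your own data rather than from the original training set. How to read the feature names out of metabolome_feature_metadata.csv depends on its column layout, so the sketch takes them as a plain list.

```python
import pandas as pd


def align_features(data, feature_names, fill_value=0.0):
    """Reorder columns to feature_names; add absent ones, drop extras."""
    missing = [c for c in feature_names if c not in data.columns]
    aligned = data.reindex(columns=feature_names, fill_value=fill_value)
    return aligned, missing


def zscore(data):
    """Column-wise standardization; constant columns become all zeros."""
    std = data.std(ddof=0).replace(0, 1.0)  # avoid division by zero
    return (data - data.mean()) / std


# Toy data: one expected feature ("c") is absent, one column is extra.
raw = pd.DataFrame({"b": [1.0, 3.0], "a": [2.0, 2.0], "extra": [9.0, 9.0]})
aligned, missing = align_features(raw, ["a", "b", "c"])
scaled = zscore(aligned)
print(missing)
print(scaled)
```

Reporting the missing features, rather than filling them silently, makes it easier to judge whether a dataset overlaps enough with the expected feature set for the predictions to be meaningful.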
The metabolomics data used to train DeepMECFS is described in:
Arnaud Germain, et al. "Plasma metabolomics reveals disrupted response and recovery following maximal exercise in myalgic encephalomyelitis/chronic fatigue syndrome." JCI Insight. 2022;7(9):e157621.
DOI: 10.1172/jci.insight.157621
For detailed instructions, see the Pretrained_DeepMECFS_Tutorial.ipynb. It includes code snippets for loading the data, aligning it to the model's features, and running inference.
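Once inference has produced probabilities, the thresholding and evaluation steps described earlier can be written in a few lines of plain Python. This is a minimal sketch, not the notebook's code: the helper names are hypothetical, and treating a probability exactly equal to the threshold as CFS is an assumption.

```python
def to_labels(probabilities, threshold=0.5, positive="CFS", negative="Control"):
    """Convert probabilities into binary labels at the given threshold."""
    return [positive if p >= threshold else negative for p in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive="CFS"):
    """True positives divided by all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, scores, positive="CFS"):
    """AUC as the chance a positive sample outscores a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up values.
y_true = ["CFS", "CFS", "Control", "Control"]
probs = [0.9, 0.4, 0.6, 0.1]
y_pred = to_labels(probs)
print(y_pred, accuracy(y_true, y_pred), precision(y_true, y_pred), roc_auc(y_true, probs))
```

AUC is computed from the raw probabilities, so it does not depend on the threshold, whereas accuracy and precision change if you move it away from 0.5.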
This project is provided under the MIT License.