TERMinator

Structure-Based Protein Design with Tertiary Repeating Motifs (Prof. Amy Keating, MIT)

TERMinator Model Architecture. The TERM Information Condenser extracts information from structural matches to TERMs in the target protein to construct node and edge embeddings. The GNN Potts Model Encoder takes in TERM data and coordinate features and outputs a Potts model over the amino acid label at each position. We use MCMC simulated annealing to find low-energy (optimal) sequences under this Potts model.
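To make the Potts model and the annealing step concrete, here is a minimal sketch. It is not TERMinator's code. It assumes a few things: lower energy means a better sequence; the model has self energies h[i][a] and sparse pair energies J[(i, j)][a][b] with i < j, so E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j); and annealing uses single-site Metropolis moves with a geometric cooling schedule. The function names (potts_energy, delta_energy, build_neighbors, simulated_annealing) and the default parameters are hypothetical.

```python
import math
import random

# Hypothetical label encoding: the 20 canonical amino acids as integers 0..19.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def potts_energy(seq, h, J):
    """E(s) = sum_i h[i][s_i] + sum over (i, j) in J of J[(i, j)][s_i][s_j].

    h: L lists of length q (self energies, assumed convention: lower is better).
    J: dict mapping (i, j) with i < j to a q x q table of pair energies;
       pairs missing from J are treated as non-interacting.
    """
    energy = sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), table in J.items():
        energy += table[seq[i]][seq[j]]
    return energy


def build_neighbors(J, length):
    """Adjacency lists of the interaction graph implied by the keys of J."""
    neighbors = [[] for _ in range(length)]
    for i, j in J:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return neighbors


def delta_energy(seq, i, new_aa, h, J, neighbors):
    """Energy change from mutating position i to new_aa (only i's edges change)."""
    old_aa = seq[i]
    d = h[i][new_aa] - h[i][old_aa]
    for j in neighbors[i]:
        if i < j:
            table = J[(i, j)]
            d += table[new_aa][seq[j]] - table[old_aa][seq[j]]
        else:
            table = J[(j, i)]
            d += table[seq[j]][new_aa] - table[seq[j]][old_aa]
    return d


def simulated_annealing(h, J, n_steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Return the lowest-energy sequence visited and its energy."""
    rng = random.Random(seed)
    length, q = len(h), len(h[0])
    neighbors = build_neighbors(J, length)
    seq = [rng.randrange(q) for _ in range(length)]
    energy = potts_energy(seq, h, J)
    best_seq, best_energy = list(seq), energy
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        i = rng.randrange(length)
        new_aa = rng.randrange(q)
        if new_aa == seq[i]:
            continue
        d = delta_energy(seq, i, new_aa, h, J, neighbors)
        # Metropolis rule: always accept downhill moves, accept uphill ones with prob exp(-d/t).
        if d <= 0 or rng.random() < math.exp(-d / t):
            seq[i] = new_aa
            energy += d
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
    return best_seq, best_energy


if __name__ == "__main__":
    # Toy random Potts model on a chain of 8 positions (illustrative data only).
    rng = random.Random(1)
    L, q = 8, len(ALPHABET)
    h = [[rng.uniform(-1, 1) for _ in range(q)] for _ in range(L)]
    J = {(i, i + 1): [[rng.uniform(-0.5, 0.5) for _ in range(q)] for _ in range(q)]
         for i in range(L - 1)}
    best_seq, best_energy = simulated_annealing(h, J)
    print("".join(ALPHABET[a] for a in best_seq), round(best_energy, 3))
```

The energy difference for a single-site mutation only involves that site's self term and its edges. This keeps each annealing step cheap even for long proteins. That is the main reason to use delta_energy instead of recomputing potts_energy at every step.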

As part of my first PhD rotation, I helped develop TERMinator, a neural network designed to predict protein sequence from a given structure. It used TERMs (tertiary repeating motifs), structural motifs found in the PDB, to generate a Potts model for a given protein that could then be optimized to produce a sequence. We showed that using TERMs and Potts models gave small improvements over previously available methods. My main contribution was helping with benchmarking and ablation studies of TERMinator. Our work was presented at a NeurIPS workshop (Li et al., 2021) and published in Protein Science (Li et al., 2022). TERMinator is publicly available.

References

2022

  1. Neural Network-Derived Potts Models for Structure-Based Protein Design using Backbone Atomic Coordinates and Tertiary Motifs
    Alex Li, Mindren Lu, Israel Desta, and 3 more authors
    Protein Science, 2022

2021

  1. TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs
    Alex Li, Vikram Sundar, Gevorg Grigoryan, and 1 more author
    In NeurIPS Workshop: Machine Learning and Structural Biology, 2021