Datasets & Models

We believe in open science: every project from the lab is released under the MIT license.

Model

AMPLIFY

A family of efficient protein language models pre-trained on large-scale protein sequence data.

Protein LM, ESM-compatible
Coming soon
Model

SaAMPLIFY

Structure-aligned variants of AMPLIFY, enriched with protein structural knowledge through a lightweight post-training stage.

Protein LM, ESM-compatible
Coming soon
Model

AMPLIFY-C

Coming soon.

Protein LM, ESM-compatible
Coming soon
Dataset

AMPLIFY Dataset

A curated, large-scale protein sequence dataset built from UniProt, SCOP, and OAS and used to pre-train the AMPLIFY family of models.

Proteins, Sequences, Pre-training
Coming soon
Dataset

Metagenomic Dataset

Coming soon.

Proteins, Sequences, Pre-training
Coming soon
Code

FLAIR Codebase

A unified codebase for pre-training, fine-tuning, and evaluating the FLAIR Lab's protein language models.

Python, PyTorch, HuggingFace
Coming soon