This article was prepared by Alex Gurbych, Ph.D. in AI, CEO at Blackthorn.ai, and a Life Sciences expert. Alex has over 15 years of experience in AI, machine learning, and software engineering. He led AI teams in healthcare & drug discovery, with deep expertise in computer vision, NLP, and data science.
There is a vast number of natural proteins, but potentially many more could be created to address current challenges in biotechnology. The aim of protein engineering is to design and produce proteins with defined properties and functions.
Traditional methods of protein engineering:
- Directed evolution: creating a library of randomly mutated proteins and selecting those with desired traits (labor- and time-intensive);
- Rational design: modifications guided by structural and functional data (limited by the quality of, and access to, structural information).
AI makes this process more efficient as it shifts from trial-and-error to a predictive, data-driven strategy.
In 2024, the protein engineering market was valued at USD 4.35 billion; it is projected to grow from USD 5.09 billion in 2025 to approximately USD 20.86 billion by 2034, a CAGR of 16.97% (Precedence Research).
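As a quick sanity check, that 2034 figure follows directly from the standard compound-growth formula (the input values are from the source; the code is just arithmetic):

```python
# Verify the projected market size implied by the stated CAGR (Precedence Research figures).
base_2025 = 5.09          # USD billions, 2025 estimate
cagr = 0.1697             # 16.97% per year
years = 2034 - 2025       # 9 years of compounding
projection = base_2025 * (1 + cagr) ** years
print(f"2034 projection: USD {projection:.2f}B")  # prints ~20.86
```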

What Is Protein Engineering with AI?
Artificial intelligence can be useful in both protein engineering strategies:
- In directed evolution: AI proposes mutations and predicts function from sequence, substantially reducing the number of experimental cycles.
- In rational design: AI can predict structure from sequence at near-experimental accuracy and enables de novo protein design, even without a pre-existing template or natural protein as a reference, relying only on biophysical and biochemical principles (Koh et al., 2025).

AI-driven protein design roadmap (Koh et al., 2025)
Let’s look closer at how AI tools can overcome key limitations and achieve better results.
Improving accuracy in protein structure prediction:
- Extract coevolutionary patterns from homologous sequences > improved residue–residue contact maps and backbone geometry;
- Refine structural hypotheses iteratively and combine MSA, pairwise-distance, and 3D-coordinate data > atomic-level prediction with experimental quality;
- Protein language models (PLMs) remove the requirement for MSAs > faster, single-sequence structure inference, particularly valuable for orphan or rapidly evolving genes and synthetic sequences (see the sketch after this list);
- Combine neural network outputs with physics-based simulations > improved handling of multi-domain proteins and better generalization to complex topologies;
- Integrate coevolution, structural embeddings, and cross-modal constraints > accurate prediction of large protein complexes, assemblies, and biomolecular interactions with binding affinity estimation;
- Capture alternative conformations and flexible states > realistic representations of proteins in solution (Zhang et al., 2025).
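To make single-sequence inference concrete, here is a minimal sketch using ESMFold through the public fair-esm package (installed with pip install "fair-esm[esmfold]"); the entry points follow that repository's README, and the example sequence is arbitrary:

```python
# Minimal single-sequence (MSA-free) structure inference with ESMFold.
import torch
import esm

model = esm.pretrained.esmfold_v1()  # downloads weights on first use
model = model.eval()

# Arbitrary example sequence; replace with your protein of interest.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQV"

with torch.no_grad():
    # Returns PDB text; per-residue pLDDT is stored in the B-factor column.
    pdb_string = model.infer_pdb(sequence)

with open("prediction.pdb", "w") as handle:
    handle.write(pdb_string)
```

No alignment step is needed, which is exactly what makes PLM-based inference fast for orphan or synthetic sequences.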
Expanding the diversity of generated proteins:
- Overcome the fixed-topology limitations > identification of novel and diverse fold architectures (including non-natural), and potential functional innovations beyond known PDB entries;
- Design sets of new protein scaffolds for a protein family > avoids brute-force experimental library screening;
- Keep the core protein fold stable while optimizing loop regions > functionally meaningful diversity without random destabilization (Koh et al., 2025; Zhang et al., 2025).
Accelerating design cycles:
- Use iterative loops: generative models propose novel sequences/folds > predictive tools evaluate foldability and binding precision > experimental results feed back into the models (a schematic loop is sketched after this list);
- GPU-accelerated, memory-efficient inference > fast and cheap predictions at scale;
- Data-driven mutation suggestions and prioritization of candidates with a higher chance of success;
- Scoring of catalytic efficiency, binding affinity, stability, solubility, and immune-response properties > fewer experimental assays (Zhang et al., 2025).
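Schematically, such a closed design loop looks like the sketch below. The propose/score functions are hypothetical stand-ins for a generative model and a predictive oracle, not any specific published API:

```python
# Toy design-evaluate-feedback loop; propose_variants and score_variant
# are hypothetical placeholders for a generative model and a predictor.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose_variants(parent: str, n: int = 32) -> list[str]:
    """Stand-in generative step: random single point mutations of the parent."""
    variants = []
    for _ in range(n):
        pos = random.randrange(len(parent))
        variants.append(parent[:pos] + random.choice(AMINO_ACIDS) + parent[pos + 1:])
    return variants

def score_variant(seq: str) -> float:
    """Stand-in predictive step: swap in foldability/affinity predictions here."""
    return -abs(seq.count("W") - 3)  # arbitrary toy objective

def design_loop(seed: str, rounds: int = 5) -> str:
    best = seed
    for _ in range(rounds):
        candidates = propose_variants(best)                 # generative proposal
        best = max(candidates + [best], key=score_variant)  # in-silico triage
        # In practice, top candidates go to the wet lab, and assay results
        # are fed back to retrain or condition both models.
    return best

print(design_loop("MKTAYIAKQRQISFVKSHFSRQ"))
```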
AI-powered protein engineering already has remarkable value in synthetic biology and biotechnology:
- Enzymes catalyzing non-natural reactions: high efficiency and stereoselectivity, stability in concentrated organic solvents (up to 70% ethanol), and thermal resistance above 90 °C;
- Target/epitope-specific binders: compact proteins and peptides with high affinity, stability, and efficacy (researchers engineered miniproteins that neutralize snake-venom toxins: 100% survival in affected mice, thermal stability above 95 °C);
- De novo design of enzymes: the top Kemp eliminase candidate is 60x better than the initial design, and a promising share of generated candidates shows the desired activity (serine hydrolase – 20%, carbonic anhydrase – 35%, lactate dehydrogenase – 70%);
- Better understanding of complex cellular processes: engineered intracellular Ras–GTP activity sensors and proximity-labeling modules enabled analysis of resistance mechanisms to Ras-G12C inhibitors (Zhang et al., 2025).
AlphaFold3 Overview
Millions of researchers worldwide have used AlphaFold 2, and its scientific impact has been recognized with numerous prizes.
In 2024, Google DeepMind and Isomorphic Labs released an improved version, AlphaFold 3. They also launched AlphaFold Server to provide open access to the model, alongside a free database of 200 million protein structures (Google).

Architecture of AF3 (Malhotra et al., 2025)
Main advancements:
- GDT (global distance test) scores up to 90.1;
- A diffusion-based architecture predicts raw atomic coordinates directly, denoising from random noise and capturing structural detail both locally and globally > better prediction of complex structures (a simplified denoising sketch follows this list);
- 50% higher precision than leading traditional methods on the PoseBusters benchmark;
Outperforms:
- physics-based tools in predicting biomolecular structures (even without template structures),
- traditional docking techniques in predicting protein–ligand interactions,
- nucleic-acid-specific predictors in protein–nucleic acid interaction accuracy,
- AF2, Rosetta, I-TASSER, and Phyre2 across key parameters (RMSD, TM-score, pLDDT confidence, and computational time);
- Forecasts chemical modifications > deeper understanding of cellular processes and their disease connections;
- Uses only 4 MSA-module blocks with pair-weighted averaging > saves compute time;
- Doesn't require excessive specialization for different molecule types;
- Approximates complex molecular interactions with angstrom accuracy;
- Reveals functional insights: predictions correlated strongly with experimental data on protein stability and ligand-binding affinities affected by disease-associated mutations (r = 0.89, p < 0.001);
- Combined with GANs, it enables de novo protein design with specific functional properties; a series of artificial enzymes with the desired catalytic activity has already been generated (Abramson et al., 2024; Malhotra et al., 2025).
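As a rough intuition for the diffusion idea above, the sketch below iteratively denoises random 3D coordinates through a tiny network. The noise schedule and the denoiser are illustrative assumptions, not AF3's actual module:

```python
# Simplified diffusion-style coordinate denoising, loosely inspired by
# AF3's diffusion head; the schedule and network here are toy assumptions.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Predicts a coordinate update for noisy 3D atom positions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)

denoiser = ToyDenoiser()             # untrained here; the real model trains end-to-end
coords = torch.randn(128, 3) * 10.0  # inference starts from pure noise

for sigma in torch.linspace(10.0, 0.1, steps=50):   # decreasing noise scale
    with torch.no_grad():
        coords = coords + sigma * denoiser(coords)  # iterative denoising step

print(coords.shape)  # (128, 3): final "atomic" coordinates
```

The real model conditions every denoising step on sequence and pairwise representations; the iterative loop structure is the part this sketch is meant to convey.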

The predicted structure coloured by pLDDT (a per-residue confidence estimate): orange, 0–50; yellow, 50–70; cyan, 70–90; blue, 90–100 (Abramson et al., 2024)
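For readers post-processing predictions, mapping per-residue pLDDT values to these confidence bands is a one-function job; the band edges below come from the caption, the rest is plain Python:

```python
# Map per-residue pLDDT scores to the confidence bands from the figure caption.
def plddt_band(score: float) -> str:
    if score < 50:
        return "orange (very low)"
    if score < 70:
        return "yellow (low)"
    if score < 90:
        return "cyan (confident)"
    return "blue (very high)"

for s in (34.2, 63.8, 88.1, 96.5):
    print(f"pLDDT {s} -> {plddt_band(s)}")
```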
Remaining challenges:
- Struggles with predicting how proteins behave dynamically;
- Inaccurate structures for disordered regions, orphan proteins, highly dynamic proteins, and proteins with significant conformational changes upon ligand binding (for example, some enzymes adopt a closed conformation only when ligand-bound, but AF3 predicts that conformation for the ligand-free state too);
- Sometimes produces incorrect chirality, even when it is correct in the provided reference structures (4.4% violation rate);
- Produces overlapping atoms (most often in protein–nucleic acid complexes with more than 100 nucleotides or 2,000 residues in total);
- Improving accuracy requires generating a large number of predictions and ranking them > increased computational costs;
- Predicts a single structure per sequence (variability can be increased by modifying the MSA and using multiple seeds, but this does not always help) (Abramson et al., 2024; Malhotra et al., 2025).
Boltz-2 Overview
Accurately modeling biomolecular interactions is a critical capability, yet until recently no tool has handled it properly.
The Boltz team presented Boltz-2, which exhibits strong performance for both structure and affinity prediction. It’s freely available with open access to model weights, inference pipeline, and training code.

Boltz-2 architecture (Passaro et al., 2025)
Main innovations in architecture:
- Mixed-precision (bfloat16) and the trifast-4 kernel for triangle attention reduce runtime and memory use, enabling training with crop sizes up to 768 tokens.
- Boltz-2x adds Boltz-steering, an inference-time method that applies physics-based potentials to improve physical plausibility (fixing steric clashes and incorrect stereochemistry).
- Broader user controllability by integrating (see the usage sketch after this list):
  - structure prediction method conditioning,
  - template conditioning and steering (integrates related complex structures or multimeric templates without retraining),
  - contact and pocket conditioning (allows specific distance or pocket constraints).
- Specialized PairFormer refinement of protein–ligand contacts, with a dual prediction head (one for binding likelihood, the other for continuous affinity) trained on heterogeneous affinity labels (Passaro et al., 2025).
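In practice, Boltz-2 is driven by a YAML input plus a single CLI call. The sketch below follows the input schema and flags documented in the public Boltz repository at the time of writing; the sequence and SMILES are arbitrary placeholders, so check the repository for the current format:

```python
# Write a Boltz-2 input file and launch prediction via its CLI.
# Schema and flags follow the public Boltz repo docs; inputs are placeholders.
import subprocess
import textwrap

input_yaml = textwrap.dedent("""\
    version: 1
    sequences:
      - protein:
          id: A
          sequence: MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQV
      - ligand:
          id: B
          smiles: "CC(=O)Oc1ccccc1C(=O)O"
    properties:
      - affinity:
          binder: B
""")

with open("input.yaml", "w") as handle:
    handle.write(input_yaml)

# --use_msa_server fetches MSAs from a public server instead of a local database.
subprocess.run(["boltz", "predict", "input.yaml", "--use_msa_server"], check=True)
```

The `properties`/`affinity` block is what triggers the dual-head affinity prediction alongside the structure.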
Performance
Boltz-2 outperformed Haiping, GAT, VincDeep, and other methods in binding affinity prediction across 140 complexes. In hit discovery, it achieves double the average precision of ML and docking baselines, and it shows better RMSF and lDDT scores than Boltz-1, BioEmu, and AlphaFlow in capturing local protein dynamics (Passaro et al., 2025).

Boltz-2’s performance (Boltz team)
In addition to the above, there are other strengths:
- The first AI model to approach the performance of FEP methods in estimating small-molecule–protein binding affinity (Pearson r of 0.62, comparable to OpenFE) while being 1000x more computationally efficient (a toy illustration of the metric follows this list);
- Data curation and representation learning > overcomes the performance/compute-time trade-off;
- Training data including experimental and molecular-dynamics ensembles, expanded distillation datasets across diverse modalities, and enhanced user control > improved binding affinity prediction (Passaro et al., 2025).
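For context, the Pearson r reported above is the standard linear-correlation statistic between predicted and experimental affinities; the snippet below computes it on invented numbers purely to show the calculation:

```python
# Pearson correlation between experimental and predicted affinities.
# The six data points are made up for illustration only.
from scipy.stats import pearsonr

experimental = [-9.1, -7.4, -8.2, -6.5, -7.9, -5.8]  # e.g. log-scale affinities
predicted    = [-8.6, -7.0, -8.5, -6.9, -7.1, -6.2]

r, p_value = pearsonr(experimental, predicted)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```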
Limitations:
- Limited molecular-dynamics modeling (small dataset, minor architecture tweaks, limited multi-conformation handling);
- Trained on data similar to its predecessors';
- Struggles with large complexes and with cofactors (ions, water, or multimeric partners); may misplace parts of the ligand or generate chemically unrealistic conformations > requires additional help (a template of the alternate conformation or a refinement step);
- A limited affinity crop may truncate long-range interactions or miss relevant pockets (orthosteric/allosteric);
- As a comparatively new tool, its performance varies across assays for reasons that remain unclear (possibly structure errors, poor generalization to new protein families, or low robustness to out-of-distribution molecules) > it needs further testing (deepmirror; Passaro et al., 2025).
OpenFold3
OpenFold3 is the result of a collaboration between Novo Nordisk, Columbia University, the OpenFold Consortium, and AWS (it has not been publicly released yet).
One of the key current problems is that experimental datasets are insufficient, so scientists build distillation datasets, aiming to transfer knowledge from a large teacher model to a smaller student model (Zhao et al., 2025); a generic version of this training loop is sketched below.
For example, OpenFold2 produces billions of predicted structures (imperfect, but far more numerous and diverse than experimental data). OpenFold3 is trained on this huge synthetic dataset, which helps the model generalize better to rare folds, disordered proteins, and RNA, improves co-folding capabilities, reduces noise, and smooths out individual prediction errors.
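The sketch below shows the general shape of such structure distillation: a student regresses onto teacher-predicted coordinates. The modules, features, and loss are illustrative assumptions, not OpenFold's actual training code:

```python
# Generic knowledge-distillation loop: the student learns to reproduce
# teacher-predicted coordinates ("silver" labels). All modules are toy stand-ins.
import torch
import torch.nn as nn

embed_dim, n_res = 32, 64
teacher = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 3))
student = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    seq_repr = torch.randn(8, n_res, embed_dim)  # stand-in for sequence features
    with torch.no_grad():
        silver_coords = teacher(seq_repr)        # teacher predictions as labels
    loss = nn.functional.mse_loss(student(seq_repr), silver_coords)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")
```

At OpenFold3's scale, the same idea is applied with billions of teacher predictions mixed with experimental structures, which is what lets the student generalize beyond the PDB.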

OpenFold3 distillation workflow (Samee Ullah)
Training datasets of OpenFold3:
- 200 thousand monomer structures (Protein Data Bank, experimental);
- 41 million monomer proteins (OpenFold2, predictions);
- 65 thousand RNA structures (Rfam, predictions);
- 25 thousand disordered proteins (OpenFold2-Multimer, predictions);
- 16 thousand DNA–protein complexes (JASPAR, predictions) (Samee Ullah).
Among the details already available: OpenFold3 is being trained on the El Capitan supercomputer, powered by Apheris' federated learning (which avoids compromising data privacy), and achieves a throughput of 2,400 structures per second on 43,200 AMD APUs.
Boltz-2 vs AlphaFold 3: Head-to-Head Comparison
These models are both very successful and competitive approaches. Their strengths and weaknesses have already been mentioned, but there are more differences to consider.
The first noticeable advantage of Boltz-2 is substantially faster structure generation: its runtime scales with protein length, whereas AlphaFold's runtime behavior is harder to explain due to the model's complexity (Haley et al., 2025).
Another important point: although both models claim open access, the actual levels of openness differ considerably.
DeepMind released AlphaFold 3 under a Creative Commons license:
- prohibits commercial use without a separate commercial license;
- model weights and the full training pipeline/data are under restricted access;
- some functionalities (ligand binding, certain kinds of modifications) aren’t fully available in the public version.
On the other hand, Boltz-2 was released under a permissive MIT license:
- unrestricted modification and use by academic and commercial organizations;
- transparency (disclosed details of datasets, model code/weights);
- users can freely experiment, identify and fix bugs, and contribute their own enhancements (Falk Hoffmann).
When comparing accuracy, it is difficult to determine which model is superior, as performance varies between tasks and modalities.
On the graphs below, you can see the evaluation of structure prediction performance across different complexes that were unseen in their training sets.

Structure prediction Boltz-2 vs AlphaFold 3 (error bars indicate 95% confidence intervals)
Overall, AF3 performed better, especially on antibody–antigen structures. However, Boltz-2 shows significantly better physicality metrics, both for small-molecule conformations and for steric clashes at interfaces (Passaro et al., 2025).
Another comparison used ligands bound to the SARS-CoV-2 and MERS-CoV main proteases: Boltz-2 achieved success comparable to the AlphaFold3 models, even though the latter were fine-tuned or used additional physics-based relaxation (Passaro et al., 2025).

Boltz-2 matches the success rate without any fine-tuning or physics relaxation (error bars indicate 95% confidence intervals) (Passaro et al., 2025)
A short summary of the main advancements of each model:

| Boltz-2 | AlphaFold 3 |
| --- | --- |
| predicts both structure and binding affinities; captures local dynamics; better ligand–protein interaction modeling; user controllability and flexibility; speed/efficiency (approaches FEP accuracy while being 1000x faster); better physical validity; accessibility (disclosed weights/code/training data, fully open to academic and commercial usage) | higher accuracy in structure prediction (especially for large complexes and multimeric assemblies); broader coverage of biomolecules (proteins, nucleic acids, ligands, post-translational modifications, and more); more validated, with less variable performance across tasks |
Conclusion
Both Boltz-2 and AlphaFold 3 are transformative steps in AI-driven protein engineering. However, they have different capabilities and focus:
- Boltz-2 prioritizes speed, user control, accessibility, and physical accuracy. It predicts both structures and binding affinities, models local dynamics, and approaches FEP-level accuracy.
- AlphaFold 3, on the other hand, focuses on accuracy, generalization, and consistency across diverse biomolecules, though it has more restrictive licensing and less flexible functionality.
Thus, there is no single answer to which model is better; users need to decide which one is more useful for a particular task. Both approaches demonstrate that AI has already become an essential part of modern protein engineering: without such tools, the field remains constrained by trial-and-error, inefficiency, and limited precision. Nevertheless, limitations remain in model performance and adoption (particularly within industry), and future progress depends not only on improving individual models but also on integrating their complementary strengths into more accessible, versatile platforms for broader use.