Dr Sergei Krivov
- Position: Lecturer in Longitudinal AI
- Areas of expertise: Protein folding; longitudinal big data analysis; free energy landscape analysis; biomarkers from longitudinal data; reaction coordinates; molecular dynamics simulations
- Email: S.Krivov@leeds.ac.uk
- Phone: +44(0)113 343 3141
- Location: 8.109 Astbury
- Website: Astbury | Google Scholar | ResearchGate
Research interests
My research interests lie at the intersection of statistical physics, stochastic processes, longitudinal data analysis, machine learning (ML), artificial intelligence (AI), biophysics, and computer simulations of biomolecules. I am developing rigorous and accurate methods to analyze complex stochastic dynamics and apply them across various domains. These applications include state-of-the-art protein folding simulations, patient and disease dynamics from clinical data, and even the dynamics of the game of chess.
Free energy landscapes as a function of optimal reaction coordinates describe complex dynamics in a simple and accurate way: protein folding (left), a chess game (middle), and patient recovery after kidney transplant (right). For the latter, the red line and blue segments show the likelihood of successful recovery (P) estimated from the landscape and from the patients' data.
Optimal Reaction Coordinates
In various fields, a single variable or coordinate is often used to simplify the description of complex processes. For example, the Dow-Jones index describes the stock market, GDP represents economic performance, and reaction coordinates describe chemical reactions like protein folding. These coordinates are typically chosen for their ease of computation or experimental observation. However, using just one coordinate can result in the loss of significant information about the original dynamics.
I proposed using the best possible coordinate for describing the dynamics of interest, known as the optimal reaction coordinate. Diffusion models, based on these coordinates, can describe key properties of dynamics exactly, regardless of the complexity of the original dynamics. I have developed a methodological framework to define what makes a coordinate optimal in different contexts. This framework explains which properties of the dynamics can be described exactly, how to identify such coordinates in practice, and how to validate that a particular coordinate is indeed optimal.
For more details about the framework, see the collection of Jupyter Notebooks at https://github.com/krivovsv/CFEPs/blob/main/RC_FEP.ipynb
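As a toy illustration of what an "optimal" coordinate can look like, the sketch below computes the committor, one widely used notion of an optimal reaction coordinate: the probability of reaching a target state B before a starting state A. This is an assumption made for illustration only; the text above does not fix a specific definition, and the code is not the non-parametric method described on this page. The function name `committor` and the six-state random walk are hypothetical.

```python
import numpy as np

def committor(P, A, B):
    """Committor q for a Markov chain with transition matrix P.

    q[i] is the probability of hitting set B before set A when starting
    from state i. It satisfies q = P q on the intermediate states, with
    boundary conditions q = 0 on A and q = 1 on B.
    """
    n = P.shape[0]
    boundary = set(A) | set(B)
    inner = [i for i in range(n) if i not in boundary]
    # Linear system on intermediate states: (I - P_II) q_I = P_IB 1
    lhs = np.eye(len(inner)) - P[np.ix_(inner, inner)]
    rhs = P[np.ix_(inner, list(B))].sum(axis=1)
    q = np.zeros(n)
    q[list(B)] = 1.0
    q[inner] = np.linalg.solve(lhs, rhs)
    return q

# Hypothetical toy model: unbiased random walk on states 0..5
n = 6
P = np.zeros((n, n))
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[n - 1, n - 1] = 1.0  # boundary rows are not used by the solver

q = committor(P, A=[0], B=[n - 1])
print(q)
```

For this unbiased walk the committor grows linearly from state 0 to state 5, which is the textbook result for a symmetric random walk between two absorbing ends.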
Non-Parametric Approach for Reaction Coordinate Optimization
To determine optimal coordinates for describing stochastic dynamics, traditional ML methods often use complex functions like artificial neural networks (ANNs), which require extensive expertise. I have developed an alternative, non-parametric approach that offers several advantages. This method does not require extensive expertise in selecting the functional form, is more robust and accurate, and is easier to use. It is specifically designed for rare events, such as protein folding or the onset of a disease, and works seamlessly with irregularly sampled longitudinal data, like clinical records.
In particular, this approach accurately computes optimal reaction coordinates (RCs) for realistic protein folding simulations, providing state-of-the-art results and outperforming alternative methods, including those based on AI/ML.
Data-Driven Disease Dynamics Models (D4M)
Most medical AI models developed for healthcare are static in nature and are primarily used for disease classification. These models determine whether a patient has a disease or not, without monitoring the dynamics of the process of becoming unwell. If the patient has the disease, it is often too late for prevention, while if the patient doesn’t, it is too early to act, leaving doctors without enough information. It is crucial to catch disease development at the right time to enable effective intervention.
In contrast, models of disease dynamics offer a fundamentally superior approach by modelling disease progression as a stochastic process, potentially capturing the full informational content embedded within longitudinal clinical datasets. Ideally, such models should describe the entire trajectory of disease progression, from healthy to diseased states, with particular emphasis on the critical transition phase, which often holds the key to understanding disease onset.
We are developing a novel methodology to construct such holistic disease dynamics models that describe the dynamics of several diseases simultaneously. This approach is based on the framework of optimal reaction coordinates and is specifically designed to handle sparse, irregular, and infrequent clinical data, which is challenging for alternative approaches. Additionally, it features a highly sensitive validation criterion, much more robust than the commonly used AUC. As a proof of concept, in collaboration with Leeds Teaching Hospitals NHS Trust (LTHT), we are currently developing such a model to predict the development of acute kidney injury (AKI).
Protein folding and atomistic simulations
The field of computer simulation of dynamics, such as protein folding, conformational changes of biomolecules, or crystallization, faces three main challenges: the accuracy of force-fields, the timescale gap, and the accurate analysis of simulations.
Molecular dynamics (MD) simulations increasingly produce massive trajectories, which are difficult for humans to analyze and interpret. The development of novel state-of-the-art analysis methods is timely, and their absence is widely recognized as a fundamental bottleneck in the application of atomistic simulations. Free energy landscapes, as a function of optimal reaction coordinates, are a fundamental approach to describe stochastic dynamics. They can be used to determine key determinants of these dynamics, such as the free energy barrier and the pre-exponential factor, in a direct manner, which is impossible in an experiment. Using the developed non-parametric approach, we have rigorously determined these quantities for the first time for a state-of-the-art protein folding trajectory.
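To make the idea of a free energy profile concrete, here is a minimal sketch that uses the conventional histogram relation F(x) = -kT ln p(x) along a coordinate that is already given. It is an assumption-laden illustration, not the procedure used in the work described above: the data are synthetic (two Gaussian "states"), the function name `free_energy_profile` is hypothetical, and a plain histogram says nothing about the pre-exponential factor.

```python
import numpy as np

def free_energy_profile(x, bins=50, kT=1.0):
    """Histogram estimate of F(x) = -kT ln p(x), shifted so that min(F) = 0."""
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = density > 0  # skip empty bins, where the logarithm is undefined
    F = -kT * np.log(density[mask])
    return centers[mask], F - F.min()

rng = np.random.default_rng(0)
# Synthetic two-state sample (hypothetical data): basins centred at -1 and +1
x = np.concatenate([
    rng.normal(-1.0, 0.3, 50_000),
    rng.normal(1.0, 0.3, 50_000),
])

centers, F = free_energy_profile(x)
# Barrier estimate: profile value at the bin closest to x = 0,
# measured from the deepest minimum
barrier = F[np.argmin(np.abs(centers))]
print(f"estimated barrier: {barrier:.2f} kT")
```

The quality of such a profile depends entirely on the coordinate chosen: along a poor coordinate the apparent barrier is typically underestimated, which is one motivation for optimizing the coordinate first.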
Simulating the dynamics of biological systems for biologically relevant timescales, from milliseconds to seconds and longer, is very time-consuming. A promising approach in the forthcoming era of exascale computing is to simulate many short trajectories and then stitch them together. Doing this correctly is a very difficult problem. The non-parametric approach has been extended to accurately and rigorously compute optimal reaction coordinates and the corresponding free energy profile from non-equilibrium simulations. This method is particularly useful for stitching together many short trajectories to simulate long timescales.
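The simplest version of "stitching" can be sketched by pooling transition counts from many short, discrete-state trajectories into a single transition matrix, as in a Markov state model. This is a swapped-in textbook technique shown for orientation only; it is not the non-parametric approach described above, and the function name `transition_matrix` and the example trajectories are hypothetical.

```python
import numpy as np

def transition_matrix(trajectories, n_states):
    """Row-normalised transition matrix pooled over many short trajectories."""
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        # Count every observed one-step transition a -> b
        for a, b in zip(traj[:-1], traj[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical short trajectories over three discrete states
short_trajs = [[0, 0, 1], [1, 2, 2], [2, 1, 0], [1, 1, 2]]
T = transition_matrix(short_trajs, n_states=3)
print(T)
```

No single trajectory here visits all transitions, yet the pooled matrix describes movement between all three states. The hard part in practice, which this sketch ignores, is that short trajectories started from arbitrary points are not sampled from equilibrium.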
Stochastic Dynamics in AI/ML
The fields of data analysis, machine learning (ML), and artificial intelligence (AI) are shifting from static data analysis to the study of stochastic dynamics. For instance, large language models (LLMs) learn the stochastic behavior of language by constructing a Markov chain model. These models demonstrate that by understanding the stochastic patterns of language, without needing to comprehend the actual meaning, we can still produce highly useful responses. Similarly, diffusion-based generative AI models, such as Stable Diffusion and RFdiffusion, learn the stochastic dynamics by which noise distorts objects such as images or biomolecules. They then generate new objects by reversing this distortion process, starting from noise. We are using the developed framework of optimal reaction coordinates to analyze these stochastic dynamics.
Research projects
Some research projects I'm currently working on, or have worked on, will be listed below. Our list of all research projects (https://biologicalsciences.leeds.ac.uk/dir/research-projects) allows you to view and search the full list of projects in the faculty.
Qualifications
- PhD Physics, Novosibirsk, Russia
- MS Physics of Non-Equilibrium Processes, Novosibirsk, Russia
Student education
I currently teach bioinformatics and computational biology to second-year biology students; bioinformatics to Biochemistry and Biological Sciences master's students; programming to master's students in Precision Medicine; statistics to first- and second-year Biochemistry and Biological Sciences students; career and professional development to first- and second-year biology students; and an Advanced Topic Unit to final-year students. My laboratory hosts students for their final-year research projects (Biochemistry and Biological Sciences) on both the BSc and MBiol schemes, as well as on the NatSci program. I developed and manage the Analytical Skills module for the master's in Precision Medicine. I am involved in developing programming and data analysis/machine learning modules for the NatSci program.
Undergraduate project topics:
Practical projects
Machine Learning (ML), Artificial Intelligence (AI), and Big Data analytics are becoming increasingly important across various fields, from science to industry. These tools are also starting to have a significant impact on our daily lives, as seen with applications like DALL-E and ChatGPT. Additionally, skills in these areas are becoming crucial for future careers.
In our group, we explore how these tools can be used to analyze complex stochastic dynamics across various domains. These include state-of-the-art protein folding simulations, protein aggregation, patient and disease dynamics from clinical data, and even the dynamics of the game of chess. Our group has developed original approaches that outperform many alternative methods, including those based on ML.
Practical projects in our group always involve computers, but no previous programming experience is required. You will learn how to use Python and Jupyter Notebooks, and depending on your project's requirements, you may use cloud computing platforms like Google Colab or install Linux on your laptop. You may also have access to a mini supercomputer with multiple cores or large computational facilities. Projects involve a training period where you will learn about Linux, Python, Jupyter notebooks, and analysis tools.
Your project will be tailored to your interests and skills and may involve the development and testing of new code for novel research. This could include using standard ML tools or developing a new method to analyze clinical data to identify biomarkers that predict the likelihood of kidney transplant rejection. Alternatively, you might apply developed approaches to address specific biochemical questions, such as determining the folding free energy landscape of the lambda repressor protein. If you are interested in programming, you can use your skills to develop a parallel version of existing code or to extend and modify ML algorithms to better describe dynamical processes.
Literature projects
We offer a broad range of literature projects tailored to your interests, all of which will involve some computational aspects. We are particularly interested in literature reviews on the application of Artificial Intelligence, Machine Learning, and Big Data analysis to biological systems or longitudinal clinical datasets. For example, you could explore the biological questions that can be addressed with these tools, the types of data that can be used, or how ChatGPT and other AI tools can improve and impact research and education in general.
See also:
- Faculty Graduate School
- FindaPhD Project details:
Research groups and institutes
- Structural Biology
- Cancer