New Progress Achieved by Professor Mo Fanyang's Team in AI for Chromatography at Our Institute

Time: Jan 20, 2025

Recently, a collaborative team led by Professor Mo Fanyang from our institute and Professor Zhang Dongxiao from East China University of Technology has, for the first time, clearly revealed the quantitative relationship between thin-layer chromatography (TLC) and column chromatography (CC) by combining statistical and machine learning methods. The study proposes a knowledge discovery technique, establishes an interpretable formula, and turns "chemist's experience" into "AI experience", providing theoretical support for determining and optimizing the experimental conditions of chromatographic separation. The results were published in Nature Communications.

Figure 1: The related work was published in Nature Communications on January 19

Both thin-layer chromatography and column chromatography are separation methods based on chromatographic principles, and both are widely used in synthetic chemistry laboratories. Before running a column, chemists usually perform a thin-layer chromatography analysis first and use the retention factor (Rf value) to evaluate the relative polarity between the components of the mixture and the mobile phase. In practice, the composition and proportions of the mobile phase are adjusted until the Rf value of the target compound falls roughly between 0.2 and 0.3. This experience-based method is highly effective, but its underlying principles have not been fully clarified, so chemists know that it works without knowing why. This hinders researchers' in-depth understanding of the chemistry behind chromatographic separation.
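As a small illustration of the rule of thumb described above, the following Python sketch computes a retention factor from the standard textbook definition (distance travelled by the spot divided by distance travelled by the solvent front) and checks it against the 0.2 to 0.3 window. The function names and the example distances are hypothetical and are not taken from the paper.

```python
def retention_factor(spot_distance_mm: float, solvent_front_mm: float) -> float:
    """Rf = distance travelled by the spot / distance travelled by the solvent front."""
    if solvent_front_mm <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance_mm <= solvent_front_mm:
        raise ValueError("spot distance must lie between 0 and the solvent front")
    return spot_distance_mm / solvent_front_mm


def in_target_window(rf: float, low: float = 0.2, high: float = 0.3) -> bool:
    """Check the empirical window quoted in the article (bounds are adjustable)."""
    return low <= rf <= high


# Hypothetical TLC plate: the spot moved 15 mm, the solvent front moved 60 mm.
rf = retention_factor(15.0, 60.0)
print(rf, in_target_window(rf))
```

If the value falls outside the window, the usual response is to change the mobile phase composition and run the plate again, which is exactly the trial-and-error step the study aims to put on a quantitative footing.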

Figure 2: Quantification of the relationship between expert experience in chromatographic separation and data-driven approaches

To address this scientific question, the research team took a data-centric approach: identify the coupling between thin-layer chromatography and column chromatography directly from a large body of experimental data and express it as a concise equation. To this end, the team developed an automated column chromatography platform and systematically collected the column chromatography retention volumes of 192 compounds under different experimental conditions, obtaining a total of 5984 data entries. On this basis, the team used machine learning methods to analyze the relationship between the thin-layer chromatography retention factor (Rf value) and the column chromatography retention volume, and derived an explicit mathematical formula through symbolic regression.
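To give a feel for what "deriving an explicit formula from data" means, here is a deliberately simplified Python sketch. It is not the team's method and does not reproduce the published formula: instead of a real symbolic-regression search, it fits a line to a handful of hand-picked candidate transforms of Rf and keeps the one with the smallest squared error. The candidate list, the function names, and the demonstration data (synthetic, noise-free, generated from an arbitrary relation V = 2/Rf + 1) are all assumptions made for illustration.

```python
import math

# Candidate transforms of Rf. A real symbolic-regression engine searches a far
# larger space of expressions; this fixed list is an illustrative assumption.
CANDIDATES = {
    "Rf": lambda rf: rf,
    "1/Rf": lambda rf: 1.0 / rf,
    "log(Rf)": lambda rf: math.log(rf),
    "Rf^2": lambda rf: rf * rf,
}


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_form(rf_values, volumes):
    """Fit V = a*f(Rf) + b for every candidate f and return the best-fitting one."""
    fits = {}
    for name, transform in CANDIDATES.items():
        xs = [transform(rf) for rf in rf_values]
        fits[name] = fit_line(xs, volumes)
    best = min(fits, key=lambda k: fits[k][2])
    return best, fits[best]


# Synthetic demonstration data only (arbitrary units), NOT measurements from the study.
rf_demo = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
v_demo = [2.0 / rf + 1.0 for rf in rf_demo]

name, (a, b, sse) = best_form(rf_demo, v_demo)
print(name, round(a, 3), round(b, 3))
```

On this synthetic input the search recovers the generating form, which shows the appeal of the approach: the output is a readable expression rather than an opaque model. With real, noisy measurements one would also need to penalize formula complexity and validate on held-out compounds.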

Figure 3: Formula identification and predictive performance for the relationship between column chromatography retention time and thin-layer chromatography Rf value

The study revealed an explicit relationship between the distribution range of compound retention volumes in column chromatography and their Rf values. In addition, through transfer learning, the formula can be generalized to chromatographic columns of different specifications. By combining machine learning methods with AI's ability to identify patterns and relationships in scientific datasets, this research deciphers the "black box" of chemical experience. It provides important theoretical support for the principles of chromatographic separation in experimental chemistry, makes it easier to determine separation conditions, and is expected to bring more efficient solutions to related research.

Associate Professor Mo Fanyang (tenured) from Peking University and Professor Zhang Dongxiao (Member of the US National Academy of Engineering) from East China University of Technology are the co-corresponding authors of this paper. The research was supported by projects such as the National Natural Science Foundation of China, the Postdoctoral Science Foundation, and the AI4S Interdisciplinary Special Program of Peking University Shenzhen Graduate School.

Source of this issue: Mo Fanyang's research group

Editing and proofreading: Lilly
