the cognistx blog

Cognistx’s MoleculeAI Platform Revolutionizes Drug Discovery for Psychedelic Compounds

October 4, 2022
By Roshan Bhave and Justin Waltrip

Imagine having access to accurate drug candidate properties instantly without always needing to perform assay testing. Because of advances in the field of artificial intelligence and in-silico molecular testing, you no longer have to imagine. Using only the structure of a molecule, it is now possible to generate high-accuracy and near-instantaneous property predictions for a wide variety of small molecule drug candidates.

At Cognistx, we specialize in building advanced applied machine learning products that provide business value in real-world scenarios. Given the tremendous potential for artificial intelligence in drug discovery, we have developed MoleculeAI - a centralized platform for small molecule lead generation and optimization with AI tools for molecular property prediction, compound clustering, and automatic compound generation.

In this article, we first introduce the drug development pipeline and identify the primary bottlenecks that AI can help address. Next, we walk through a case study in which the MoleculeAI platform was used to predict the binding affinity of small molecule drug candidates to serotonin receptors. Finally, we highlight the return-on-investment opportunities that MoleculeAI creates for the business.

Before a drug reaches the market, it must pass through several key stages of pre-clinical drug development, including target identification, compound screening, lead identification, and lead optimization. Overall, the drug development process can take 12-15 years and cost over $1 billion [1]. As illustrated in Figure 1, AI can save valuable time and help reduce the risk associated with lead candidates.

Figure 1: How in-silico testing reduces the number of compounds that must be tested and saves precious time in developing a new drug.

Lead optimization is the primary bottleneck of the drug discovery phase before preclinical trials can begin. It requires scientists to rank their candidates by how well they are expected to perform in preclinical trials, a judgment traditionally drawn from cheminformatics and biomolecular expertise. With so much of cheminformatics and biology now translated into computational methods, and with large pharmacological and biomolecular datasets available, many of the inferences scientists have made by hand for years can be streamlined through computational calculations and AI-based prediction. Scientists can now estimate the potential success of their candidates this way, an approach we call “in-silico testing.”

Traditional psychedelics such as LSD and psilocybin have been shown to have high binding affinity for the serotonin (5-HT) receptor. With this in mind, our client Enveric wanted to investigate how well the binding affinity of a given compound to the 5-HT receptor could be predicted using available datasets and machine learning methods.

Our work focused primarily on a subset of the BindingDB database [2, 3], an online public data source containing information curated from patents and journals, as well as binding measurements made by scientists. Specifically, we extracted a subset of BindingDB containing binding affinity measurements for small molecule drug candidates against various serotonin receptors.

To train models, we used Cognistx’s AutoMol high-performance training pipeline. AutoMol is a centralized resource for fully automated model training, hyperparameter tuning of model architectures, ensembling, and deployment for molecular property prediction. Machine learning in industry settings is often slow and expensive: data scientists spend much of their time running experiments and analyzing results before handing their models to software engineers, who build the infrastructure for deployment. Because AutoMol automates the training process, we can spend more of our time working directly with the client to make sure model predictions align with their goals.

Once training is complete, AutoMol automatically deploys the highest-performing models to our MoleculeAI dashboard. Clients can then view model predictions for their current molecular library and generate accurate predictions for new compounds in real time. MoleculeAI also gives clients access to historical model predictions, so they can easily measure progress over time and compare results.
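To make the BindingDB extraction step a little more concrete, the sketch below filters BindingDB-style rows to serotonin (5-HT) receptor targets, drops missing or censored Ki values (entries such as ">10000"), converts Ki to pKi, and collapses replicate measurements with a median. It is a minimal sketch under stated assumptions: the field names, the toy rows, the "5-HT" prefix match, and the choice of pKi with median aggregation are all hypothetical, and this is not AutoMol's actual preprocessing.

```python
import math
import statistics
from collections import defaultdict

# Toy, made-up rows in a BindingDB-like shape. The field names are assumptions,
# not the exact BindingDB export schema, and the values are NOT real data.
TOY_ROWS = [
    {"compound": "CPD-1", "target": "5-HT2A", "ki_nm": "12"},
    {"compound": "CPD-1", "target": "5-HT2A", "ki_nm": "8"},        # replicate
    {"compound": "CPD-2", "target": "5-HT2A", "ki_nm": ">10000"},   # censored
    {"compound": "CPD-3", "target": "5-HT1A", "ki_nm": "150"},
    {"compound": "CPD-4", "target": "D2", "ki_nm": "3"},            # not a 5-HT target
    {"compound": "CPD-5", "target": "5-HT2A", "ki_nm": ""},         # missing value
]


def parse_ki_nm(raw):
    """Return Ki in nM as a float, or None if the value is missing or censored."""
    raw = raw.strip()
    if not raw or raw[0] in "<>":
        return None
    return float(raw)


def ki_nm_to_pki(ki_nm):
    """pKi = -log10(Ki in mol/L), which equals 9 - log10(Ki in nM)."""
    return 9.0 - math.log10(ki_nm)


def serotonin_subset(rows, target_prefix="5-HT"):
    """Keep rows for serotonin receptor targets, convert Ki to pKi, and take the
    median pKi per (compound, target) pair to collapse replicate measurements."""
    grouped = defaultdict(list)
    for row in rows:
        if not row["target"].startswith(target_prefix):
            continue
        ki = parse_ki_nm(row["ki_nm"])
        if ki is None or ki <= 0:
            continue
        grouped[(row["compound"], row["target"])].append(ki_nm_to_pki(ki))
    return {key: statistics.median(values) for key, values in grouped.items()}


if __name__ == "__main__":
    for (compound, target), pki in sorted(serotonin_subset(TOY_ROWS).items()):
        print(f"{compound}\t{target}\tpKi = {pki:.2f}")
```

Working in pKi rather than raw Ki puts affinities on a log scale, where higher values mean tighter binding, which is usually a friendlier target for regression models.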

Figure 2: AutoMol’s structured AI approach to developing custom models to meet customer needs.


Enveric’s goal for this engagement was to quickly produce accurate binding affinity predictions for small molecule drug candidates against the serotonin receptors of interest.

  • Once we had defined the initial dataset, AutoMol automatically ran many different experiments to build an ensemble of the highest-performing regression models and deployed that ensemble to our MoleculeAI dashboard. This entire process, which would take most companies several weeks or even months, can be completed within a single day.
  • After our initial models were deployed, Enveric tested them on the dashboard and quickly found that, while the models performed well across a wide range of compound types, they underperformed on Enveric’s specific compounds of interest. Based on this feedback, Cognistx worked with Enveric’s scientists to filter our initial dataset by the ketanserin assay type. Starting from this filtered dataset, we trained and deployed new models using the same automated training process as before.
  • Within 24 hours, Enveric could check the dashboard and see that the new models were more accurate than before, especially on their proprietary compounds. However, Enveric decided that, rather than a specific binding affinity value for each compound, it would be more useful to classify compounds into binding affinity strength categories such as “high,” “medium,” and “low.” Using this feedback, we applied thresholds to the filtered dataset to turn it into a classification task.
  • Instead of regression models, the same training pipeline automatically produced a high-performing ensemble of classification models. Again, Enveric used the MoleculeAI dashboard and found that these classification predictions were more useful in their decision-making process than the previous versions.
  • Finally, in addition to these strength classifications, Enveric wanted to compare the binding affinity of their lead candidates to that of serotonin itself. Working with their scientists, we further thresholded the dataset into a binary classification task, using serotonin’s binding affinity value as the cutoff. After another iteration of model training, Enveric can now view results from all relevant models in the MoleculeAI dashboard, along with each model’s performance metrics, to help them judge how much confidence to place in a prediction when making important business decisions.
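The label variants described in the list above (a regression target, a ketanserin-filtered subset, three strength classes, and a binary comparison against serotonin) can all be derived from one continuous affinity column. Here is a minimal sketch, assuming affinities are expressed as pKi (higher means tighter binding) and that each record carries a free-text assay description we can match on. The `Measurement` type, the keyword match, the class boundaries, and the serotonin reference value are all hypothetical; the cutoffs actually chosen with Enveric's scientists are not given in this post.

```python
from dataclasses import dataclass


# Hypothetical record produced by the extraction step; the "assay" field
# and its contents are assumptions for illustration.
@dataclass
class Measurement:
    compound: str
    assay: str
    pki: float  # higher pKi means tighter binding


# Assumed class boundaries in pKi units (placeholders, not the real cutoffs).
HIGH_MIN_PKI = 8.0
MEDIUM_MIN_PKI = 6.0


def filter_by_assay(rows, keyword="ketanserin"):
    """Keep only measurements whose assay description mentions the keyword."""
    return [r for r in rows if keyword in r.assay.lower()]


def strength_class(pki, high_min=HIGH_MIN_PKI, medium_min=MEDIUM_MIN_PKI):
    """Bin a continuous pKi into 'high', 'medium', or 'low'."""
    if pki >= high_min:
        return "high"
    if pki >= medium_min:
        return "medium"
    return "low"


def binds_tighter_than_reference(pki, reference_pki):
    """Binary label: True if the compound binds more tightly than the
    reference ligand (e.g. serotonin) measured under the same conditions."""
    return pki > reference_pki


if __name__ == "__main__":
    # Toy, made-up measurements for illustration only.
    rows = [
        Measurement("CPD-1", "[3H]ketanserin displacement", 8.4),
        Measurement("CPD-2", "[3H]ketanserin displacement", 6.7),
        Measurement("CPD-3", "other radioligand displacement", 7.9),
        Measurement("CPD-4", "[3H]ketanserin displacement", 5.2),
    ]
    serotonin_pki = 7.0  # placeholder reference value, not a real measurement
    for r in filter_by_assay(rows):
        print(r.compound, strength_class(r.pki),
              binds_tighter_than_reference(r.pki, serotonin_pki))
```

Because each task is just a different labeling of the same filtered data, the same training pipeline can be rerun for each one without changing anything upstream.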

From a business standpoint, automatically predicting serotonin receptor binding affinity, with the appropriate dataset configuration for each model, allows Enveric scientists to rank new candidates almost instantaneously. In a manual review, by contrast, each candidate would need to be tediously compared to compounds whose binding affinity to the serotonin receptor has already been validated.

The time saved on compound validation, together with the cost avoided by not running assays on compounds predicted to have very low binding affinity, reinforces the business value of these automatic model-based metrics. Beyond these time and cost savings, the advanced analytic capabilities of MoleculeAI give users access to significantly more data than before, so they can make more informed decisions and reduce the risk of moving certain compounds forward.
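The ranking and assay triage described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: each trained model is treated as a callable returning a predicted pKi, the ensemble is a plain average of the k models with the lowest validation error, and a single cutoff separates compounds worth assaying from those to deprioritize. The function names, the averaging rule, and the cutoff are hypothetical; AutoMol's actual ensembling strategy is not described in this post.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical: a "model" is any callable mapping a compound ID to a predicted pKi.
Model = Callable[[str], float]
Scored = List[Tuple[str, float]]


def select_top_k(models: Dict[str, Model], val_error: Dict[str, float], k: int) -> List[Model]:
    """Keep the k models with the lowest validation error (for example, RMSE)."""
    best_names = sorted(val_error, key=val_error.get)[:k]
    return [models[name] for name in best_names]


def ensemble_predict(models: Sequence[Model], compound: str) -> float:
    """Plain average of the selected models' predictions."""
    predictions = [model(compound) for model in models]
    return sum(predictions) / len(predictions)


def rank_and_triage(compounds: Iterable[str], models: Sequence[Model],
                    skip_below: float) -> Tuple[Scored, Scored]:
    """Rank compounds by ensemble prediction (tightest binders first) and split off
    those predicted below `skip_below` as candidates to deprioritize for assays."""
    scored = sorted(((c, ensemble_predict(models, c)) for c in compounds),
                    key=lambda pair: pair[1], reverse=True)
    keep = [(c, s) for c, s in scored if s >= skip_below]
    deprioritize = [(c, s) for c, s in scored if s < skip_below]
    return keep, deprioritize


if __name__ == "__main__":
    # Toy stand-in models built from made-up scores, for illustration only.
    toy_scores = {"A": 8.2, "B": 6.9, "C": 5.1}
    models = {
        "model_1": lambda c: toy_scores[c] + 0.1,
        "model_2": lambda c: toy_scores[c] - 0.1,
        "model_3": lambda c: toy_scores[c] - 1.5,
    }
    val_rmse = {"model_1": 0.62, "model_2": 0.58, "model_3": 0.95}
    chosen = select_top_k(models, val_rmse, k=2)
    keep, skip = rank_and_triage(["C", "A", "B"], chosen, skip_below=6.0)
    print("ranked:", [(c, round(s, 2)) for c, s in keep])
    print("deprioritize:", [(c, round(s, 2)) for c, s in skip])
```

Averaging only the top-k models, rather than every configuration tried during a broad search, keeps weak configurations from diluting the ensemble.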

Over time, as more assay data is collected, the training set for each predictive model grows, leading to predictions that are better tuned to the client’s proprietary compounds. In the future, our models may include features from docking tools such as AutoDock Vina, which capture the binding profile between the serotonin receptor protein itself and a given ligand.

AutoMol is highly scalable and customizable in both featurization and model architecture, so the sophistication of its predictions can keep growing as the knowledge base of computational chemistry and biology expands. Because this process is much faster than traditional machine learning workflows while retaining the same performance, we can spend more time on data analysis and on exploring new, creative ways to incorporate AI models into your workflow.

For more information on MoleculeAI, please contact Jagriti@Cognistx.com 


References

  1. Hughes, J. P., Rees, S., Kalindjian, S. B., & Philpott, K. L. (2011). Principles of early drug discovery. British Journal of Pharmacology, 162(6), 1239–1249. https://doi.org/10.1111/j.1476-5381.2010.01127.x
  2. Chen, X., Liu, M., & Gilson, M. K. (2001). BindingDB: A web-accessible molecular recognition database. Combinatorial Chemistry & High Throughput Screening, 4(8), 719–725.
  3. Chen, X., Lin, Y., Liu, M., & Gilson, M. K. (2002). The Binding Database: Data management and interface design. Bioinformatics, 18(1), 130–139.
