PROFIS: Design of Target-Focused Libraries by Probing Continuous Fingerprint Space with Recurrent Neural Networks

Bibliographic Details
Main Author: Hubert Rybka (21190469) (author)
Other Authors: Tomasz Danel (15875368) (author), Sabina Podlewska (3750124) (author)
Published: 2025
dc.date.none.fl_str_mv 2025-04-28T11:37:37Z
dc.identifier.none.fl_str_mv 10.1021/acs.jcim.5c00698.s005
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/PROFIS_Design_of_Target-Focused_Libraries_by_Probing_Continuous_Fingerprint_Space_with_Recurrent_Neural_Networks/28882124
dc.rights.none.fl_str_mv CC BY-NC 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biochemistry
Molecular Biology
Pharmacology
Biotechnology
Science Policy
Biological Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
Information Systems not elsewhere classified
recurrent neural networks
generate candidate ligands
designing ligands outside
bayesian optimization algorithm
generate diverse libraries
recurrent neural network
focused compound libraries
study introduces profis
biological activity predictor
given drug target
focused libraries
profis network
drug molecule
biological target
activity subspaces
worth noting
widespread use
structurally novel
selfies strings
scripts shared
paper demonstrates
output notation
model relies
latent representations
decode deepsmiles
also emphasizes
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description This study introduces PROFIS, a new generative model for designing structurally novel, target-focused compound libraries. The model relies on a recurrent neural network trained to decode embedded molecular fingerprints into SMILES strings. To identify potential novel ligands, a biological activity predictor is first trained on the low-dimensional fingerprint embedding space, enabling the identification of high-activity subspaces for a given drug target. A Bayesian optimization algorithm then searches for latent representations that are expected to yield active structures when decoded to SMILES. We present the rationale for using SMILES as the output notation of the recurrent neural network and compare its performance with models trained to decode DeepSMILES and SELFIES strings. The paper demonstrates the application of this protocol to generating candidate ligands of the dopamine D2 receptor. It also emphasizes the effectiveness of our approach in scaffold hopping, which is valuable for designing ligands outside the already explored chemical space. We show how passing engineered molecular fingerprints through the PROFIS network can be used to generate diverse libraries of analogs for a drug molecule of choice. The protocol is versatile and can be applied to any biological target for which a dataset of known ligands is available. Scripts shared by the authors on GitHub support the widespread use of PROFIS.
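The search step described above, maximizing a learned activity score over a continuous latent space with Bayesian optimization, can be illustrated with a minimal, standard-library-only Python sketch. It is not the PROFIS code, and every name and setting in it is a hypothetical assumption: a 2-D latent space bounded to [-1, 1], a toy Gaussian-bump function (`toy_activity`) standing in for the trained activity predictor, a zero-mean Gaussian-process surrogate with an RBF kernel, expected improvement as the acquisition function, and random candidate sampling in place of a proper acquisition optimizer. Decoding the selected latent vector to SMILES with the recurrent neural network is omitted.

```python
import math
import random

# HYPOTHETICAL stand-in for the trained activity predictor: a smooth bump over a
# 2-D latent space that peaks at TARGET. PROFIS trains a real predictor on embedded
# fingerprints of known ligands; this function only mimics a "high-activity subspace".
TARGET = (0.5, -0.3)


def toy_activity(z):
    return math.exp(-sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET)) / 0.1)


def rbf(a, b, length=0.3):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))


def cholesky(m):
    """Lower-triangular L with m = L L^T (m must be symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, noise=1e-4):
    """Zero-mean Gaussian-process surrogate; returns predict(z) -> (mean, std)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # alpha = K^-1 y

    def predict(z):
        k = [rbf(x, z) for x in X]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean, std, best, xi=0.01):
    """Expected-improvement acquisition for maximization."""
    u = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bayesian_optimize(objective, dim=2, n_init=5, n_iter=15, n_candidates=200, seed=0):
    """Search the box [-1, 1]^dim for latent vectors that maximize `objective`."""
    rng = random.Random(seed)

    def sample():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    X = [sample() for _ in range(n_init)]
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        predict = fit_gp(X, y)
        best = max(y)
        # Maximize EI over random candidates rather than with a gradient optimizer.
        candidates = [sample() for _ in range(n_candidates)]
        z_next = max(candidates, key=lambda c: expected_improvement(*predict(c), best))
        X.append(z_next)
        y.append(objective(z_next))
    i_best = max(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]


if __name__ == "__main__":
    z_best, score = bayesian_optimize(toy_activity)
    # In PROFIS, a selected latent vector would next be decoded to SMILES (not shown).
    print(z_best, score)
```

The loop shows the surrogate-plus-acquisition pattern: each iteration refits the Gaussian process to all scored latent points and evaluates the candidate with the largest expected improvement over the current best score, balancing high predicted activity against regions where the surrogate is still uncertain.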
id Manara_c7969a072c835caba2d2df509ea1033a
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/28882124