EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit

Functional groups are widely used in organic chemistry, because they provide a rationale to analyze physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand–biomacromolecule interactions. Ertl’s algorithm is an approach to extract functional groups i...


Bibliographic details
Main author: Gonzalo Colmenarejo (650249) (author)
Published: 2025
_version_ 1852023196870508544
author Gonzalo Colmenarejo (650249)
author_facet Gonzalo Colmenarejo (650249)
author_role author
dc.creator.none.fl_str_mv Gonzalo Colmenarejo (650249)
dc.date.none.fl_str_mv 2025-01-29T06:13:28Z
dc.identifier.none.fl_str_mv 10.1021/acs.jcim.4c02268.s001
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/EFGs_A_Complete_and_Accurate_Implementation_of_Ertl_s_Functional_Group_Detection_Algorithm_in_RDKit/28301540
dc.rights.none.fl_str_mv CC BY-NC 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biophysics
Biochemistry
Molecular Biology
Physiology
Pharmacology
Biotechnology
Cancer
Biological Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
smiles canonicalized strings
png binary string
highlighted functional groups
full functional groups
extract functional groups
arbitrary organic molecules
full functional group
rdkit functional groups
rdkit contrib directory
functional groups
functional group
organic chemistry
new rdkit
widely used
set corresponding
reactivity properties
python implementation
predefined libraries
medicinal chemistry
freely available
atom indices
analyze physicochemical
dc.title.none.fl_str_mv EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description Functional groups are widely used in organic chemistry because they provide a rationale for analyzing physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand–biomacromolecule interactions. Ertl’s algorithm extracts functional groups from arbitrary organic molecules without depending on predefined libraries of functional groups. However, the widely used RDKit cheminformatics toolkit lacks a complete and accurate implementation of Ertl’s algorithm. This paper describes a new RDKit/Python implementation of the algorithm that is both accurate and complete. For an RDKit molecule, it provides (i) a PNG binary string with an image of the molecule with color-highlighted functional groups; (ii) a list of sets of atom indices (idx), each set corresponding to a functional group; (iii) a list of canonicalized pseudo-SMILES strings for the full functional groups; and (iv) a list of labeled RDKit mol objects, one for each full functional group. The code is freely available at https://github.com/bbu-imdea/efgs and is part of the RDKit Contrib directory (https://github.com/rdkit/rdkit/tree/master/Contrib/efgs).
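To make output (ii) more concrete, the following is a minimal sketch of how a "list of sets of atom indices" can arise from a mark-and-merge procedure loosely modeled on Ertl's published rules. It is not the EFGs code and does not use its API: the function name `find_group_atoms` and the plain-list input format are hypothetical, chosen so the example runs without RDKit. The sketch assumes a heavy-atom graph with aromatic bonds given as order 1.5, and it deliberately omits the three-membered heterocycle rule and the extraction of each group's environment.

```python
from collections import defaultdict


def find_group_atoms(symbols, bonds):
    """Return a list of sets of atom indices, one set per detected group.

    symbols: element symbols of the heavy atoms, indexed by position.
    bonds:   (i, j, order) tuples; aromatic bonds are assumed to carry 1.5.
    Hypothetical helper for illustration only.
    """
    neighbors = defaultdict(list)
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    # Step 1: mark every heteroatom (anything that is not C or H).
    marked = {i for i, s in enumerate(symbols) if s not in ("C", "H")}

    # Step 2a: mark both ends of any non-aromatic double or triple bond
    # (C=O, C#N, C=C, ...). Aromatic bonds (order 1.5) are skipped.
    for i, j, order in bonds:
        if order >= 2:
            marked.update((i, j))

    # Step 2b: mark acetal-like carbons, i.e. carbons with only single
    # bonds and at least two O, N, or S neighbors.
    for i, s in enumerate(symbols):
        if s != "C" or i in marked:
            continue
        only_single = all(order == 1 for _, order in neighbors[i])
        hetero_count = sum(symbols[j] in ("O", "N", "S") for j, _ in neighbors[i])
        if only_single and hetero_count >= 2:
            marked.add(i)

    # Step 3: merge marked atoms that are bonded to each other.
    groups, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom in group:
                continue
            group.add(atom)
            stack.extend(j for j, _ in neighbors[atom]
                         if j in marked and j not in group)
        seen |= group
        groups.append(group)
    return groups


# Ethyl acetate, CC(=O)OCC: the ester carbon and both oxygens form one group.
ester = find_group_atoms(
    ["C", "C", "O", "O", "C", "C"],
    [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1), (4, 5, 1)],
)

# 4-Aminobutanal, NCCCC=O: the amine and the aldehyde are separate groups.
amino_aldehyde = find_group_atoms(
    ["N", "C", "C", "C", "C", "O"],
    [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 2)],
)
```

With RDKit, the same input could be assembled from a Mol object by reading `atom.GetSymbol()` over `mol.GetAtoms()` and `bond.GetBeginAtomIdx()`, `bond.GetEndAtomIdx()`, and `bond.GetBondTypeAsDouble()` over `mol.GetBonds()`. A complete implementation also has to handle the cases skipped here, which is the gap the described package is stated to address.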
eu_rights_str_mv openAccess
id Manara_2adb33cd58372feded6cb163ecc22d6d
identifier_str_mv 10.1021/acs.jcim.4c02268.s001
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/28301540
publishDate 2025
rights_invalid_str_mv CC BY-NC 4.0
status_str publishedVersion
title EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit