EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit
Functional groups are widely used in organic chemistry, because they provide a rationale to analyze physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand–biomacromolecule interactions. Ertl’s algorithm is an approach to extract functional groups i...
| Main author: | Gonzalo Colmenarejo |
|---|---|
| Published: | 2025 |

| _version_ | 1852023196870508544 |
|---|---|
| author | Gonzalo Colmenarejo (650249) |
| author_facet | Gonzalo Colmenarejo (650249) |
| author_role | author |
| dc.creator.none.fl_str_mv | Gonzalo Colmenarejo (650249) |
| dc.date.none.fl_str_mv | 2025-01-29T06:13:28Z |
| dc.identifier.none.fl_str_mv | 10.1021/acs.jcim.4c02268.s001 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/EFGs_A_Complete_and_Accurate_Implementation_of_Ertl_s_Functional_Group_Detection_Algorithm_in_RDKit/28301540 |
| dc.rights.none.fl_str_mv | CC BY-NC 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biophysics; Biochemistry; Molecular Biology; Physiology; Pharmacology; Biotechnology; Cancer; Biological Sciences not elsewhere classified; Chemical Sciences not elsewhere classified; smiles canonicalized strings; png binary string; highlighted functional groups; full functional groups; extract functional groups; arbitrary organic molecules; full functional group; rdkit functional groups; rdkit contrib directory; functional groups; functional group; organic chemistry; new rdkit; widely used; set corresponding; reactivity properties; python implementation; predefined libraries; medicinal chemistry; freely available; atom indices; analyze physicochemical |
| dc.title.none.fl_str_mv | EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit |
| dc.type.none.fl_str_mv | Dataset info:eu-repo/semantics/publishedVersion dataset |
| description | Functional groups are widely used in organic chemistry because they provide a rationale for analyzing physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand–biomacromolecule interactions. Ertl’s algorithm extracts functional groups from arbitrary organic molecules without depending on predefined libraries of functional groups. However, the widely used RDKit cheminformatics toolkit lacks a complete and accurate implementation of Ertl’s algorithm. This paper describes a new RDKit/Python implementation of the algorithm that is both accurate and complete. For an RDKit molecule, it provides (i) a PNG binary string with an image of the molecule with color-highlighted functional groups; (ii) a list of sets of atom indices (idx), each set corresponding to a functional group; (iii) a list of canonicalized pseudo-SMILES strings for the full functional groups; and (iv) a list of labeled RDKit mol objects, one for each full functional group. The code is freely available at https://github.com/bbu-imdea/efgs and is part of the RDKit Contrib directory (https://github.com/rdkit/rdkit/tree/master/Contrib/efgs). |
| eu_rights_str_mv | openAccess |
| id | Manara_2adb33cd58372feded6cb163ecc22d6d |
| identifier_str_mv | 10.1021/acs.jcim.4c02268.s001 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/28301540 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY-NC 4.0 |
| status_str | publishedVersion |
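
The description above mentions that one output is a list of sets of atom indices, one set per functional group. The following is a minimal sketch of how such sets could be derived with RDKit, using a simplified reading of the atom-marking and merging steps of Ertl's algorithm. It is not the EFGs code and does not use its API: the function names `mark_functional_group_atoms`, `merge_marked_atoms`, and `functional_group_atom_sets` are hypothetical, aromatic heteroatoms receive no special treatment, and the environment extraction, pseudo-SMILES generation, and image rendering described in the record are omitted.

```python
from rdkit import Chem

ACETAL_HETEROATOMS = {7, 8, 16}  # N, O, S


def mark_functional_group_atoms(mol):
    """Return indices of atoms marked as functional-group atoms (simplified)."""
    marked = set()

    # 1. Every atom that is not carbon or hydrogen.
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() not in (1, 6):
            marked.add(atom.GetIdx())

    # 2. Carbons in non-aromatic double or triple bonds (C=C, C#C, C=X, C#X).
    multiple = (Chem.BondType.DOUBLE, Chem.BondType.TRIPLE)
    for bond in mol.GetBonds():
        if bond.GetIsAromatic() or bond.GetBondType() not in multiple:
            continue
        for atom in (bond.GetBeginAtom(), bond.GetEndAtom()):
            if atom.GetAtomicNum() == 6:
                marked.add(atom.GetIdx())

    # 3. Acetal-like carbons: sp3 carbons with two or more N/O/S neighbours
    #    that themselves carry only single bonds.
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() != 6:
            continue
        if atom.GetHybridization() != Chem.HybridizationType.SP3:
            continue
        hetero = [
            nbr for nbr in atom.GetNeighbors()
            if nbr.GetAtomicNum() in ACETAL_HETEROATOMS
            and all(b.GetBondType() == Chem.BondType.SINGLE for b in nbr.GetBonds())
        ]
        if len(hetero) >= 2:
            marked.add(atom.GetIdx())

    # 4. All atoms of three-membered rings containing N, O, or S.
    for ring in mol.GetRingInfo().AtomRings():
        if len(ring) == 3 and any(
            mol.GetAtomWithIdx(i).GetAtomicNum() in ACETAL_HETEROATOMS for i in ring
        ):
            marked.update(ring)

    return marked


def merge_marked_atoms(mol, marked):
    """Group marked atoms into connected components (one set per group)."""
    groups, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            idx = stack.pop()
            component.add(idx)
            for nbr in mol.GetAtomWithIdx(idx).GetNeighbors():
                n = nbr.GetIdx()
                if n in marked and n not in seen:
                    seen.add(n)
                    stack.append(n)
        groups.append(component)
    return groups


def functional_group_atom_sets(smiles):
    """Parse a SMILES string and return a list of sets of atom indices."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return merge_marked_atoms(mol, mark_functional_group_atoms(mol))


if __name__ == "__main__":
    print(functional_group_atom_sets("CC(=O)OCC1CO1"))
```

Keeping the marking and merging steps in separate functions makes it easy to adjust a single rule, for example the acetal test, without touching the grouping logic. For production use, the implementation linked in the record should be preferred over this sketch.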