Supporting Information
Human volume of distribution and clearance predicted with DemQSAR
Ozgur Demir-Kavuk, Joerg Bentzien, Ingo Muegge and Ernst-Walter Knapp
Algebraic expressions for the loss functions implemented in DemQSAR
Classification
Regression
Overview of all CDK features and fingerprints
Molecular QSAR descriptors (see the CDK documentation for detailed information)
Note: some descriptors return more than one value, resulting in a total of 219 descriptor values.
ALOGPDescriptor
APolDescriptor
AminoAcidCountDescriptor
AromaticAtomsCountDescriptor
AromaticBondsCountDescriptor
AtomCountDescriptor
AutocorrelationDescriptorCharge
AutocorrelationDescriptorMass
AutocorrelationDescriptorPolarizability
BCUTDescriptor
BPolDescriptor
BondCountDescriptor
CPSADescriptor
CarbonTypesDescriptor
ChiChainDescriptor
ChiClusterDescriptor
ChiPathClusterDescriptor
ChiPathDescriptor
EccentricConnectivityIndexDescriptor
FragmentComplexityDescriptor
GravitationalIndexDescriptor
HBondAcceptorCountDescriptor
HBondDonorCountDescriptor
IPMolecularLearningDescriptor
KappaShapeIndicesDescriptor
KierHallSmartsDescriptor
LargestChainDescriptor
LargestPiSystemDescriptor
LengthOverBreadthDescriptor
LongestAliphaticChainDescriptor
MDEDescriptor
MomentOfInertiaDescriptor
PetitjeanNumberDescriptor
PetitjeanShapeIndexDescriptor
RotatableBondsCountDescriptor
RuleOfFiveDescriptor
TPSADescriptor
VAdjMaDescriptor
WHIMDescriptor
WeightDescriptor
WeightedPathDescriptor
WienerNumbersDescriptor
XLogPDescriptor
ZagrebIndexDescriptor
Standard fingerprint
Daylight-style hashed fingerprint. Bits are set according to the occurrence of particular structural features (see, for example, the Daylight Theory Manual by Daylight Chemical Information Systems, Inc. for more information).
Extended fingerprint
Generates an extended fingerprint for a given molecule by adding bits that describe ring features to the standard fingerprint.
Graph Only fingerprint
Specialized version of the standard fingerprint that does not take bond orders into account.
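To make the difference from the standard fingerprint concrete, the following minimal sketch builds a text label for an atom path with and without bond orders. The molecule representation, the function name `path_label` and the generic bond symbol `~` are illustrative assumptions, not the CDK implementation; the point is only that once bond orders are dropped, ethane and ethene produce the same path label and would therefore set the same bit.

```python
from typing import Dict, List, Tuple

# Hypothetical toy representation: atom symbols plus a bond -> bond-order map.
Bond = Tuple[int, int]
BOND_SYMBOLS = {1: "-", 2: "=", 3: "#"}

def path_label(atoms: List[str], bonds: Dict[Bond, int], path: List[int],
               use_bond_orders: bool = True) -> str:
    """Label an atom path; in graph-only mode every bond becomes a generic '~'."""
    parts = [atoms[path[0]]]
    for a, b in zip(path, path[1:]):
        order = bonds.get((a, b), bonds.get((b, a)))
        if order is None:
            raise ValueError(f"atoms {a} and {b} are not bonded")
        # Standard mode encodes the bond order; graph-only mode ignores it.
        parts.append(BOND_SYMBOLS.get(order, "?") if use_bond_orders else "~")
        parts.append(atoms[b])
    return "".join(parts)

ethane = (["C", "C"], {(0, 1): 1})
ethene = (["C", "C"], {(0, 1): 2})

for name, (atoms, bonds) in [("ethane", ethane), ("ethene", ethene)]:
    print(name,
          path_label(atoms, bonds, [0, 1]),
          path_label(atoms, bonds, [0, 1], use_bond_orders=False))
```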
Estate fingerprint
This generates 79-bit fingerprints using the E-State fragments. The E-State fragments are those described in [Hall, L. H.; Kier, L. B. Electrotopological State Indices for Atom Types: A Novel Combination of Electronic, Topological, and Valence State Information. J. Chem. Inf. Comput. Sci. 1995, 35, 1039-1045], and the SMARTS patterns were taken from RDKit. Note that this fingerprint simply indicates the presence or absence of the fragments.
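Because the E-State fingerprint records only presence or absence, the number of times a fragment occurs is discarded. The sketch below illustrates that reduction, assuming the per-fragment match counts have already been obtained (SMARTS matching is not shown); the function name and the fragment indices in the example are hypothetical.

```python
from typing import Dict, List

N_ESTATE_FRAGMENTS = 79  # number of E-State fragment types stated in the text

def estate_presence_bits(fragment_counts: Dict[int, int]) -> List[int]:
    """Turn per-fragment match counts (0-based index -> count) into presence bits."""
    bits = [0] * N_ESTATE_FRAGMENTS
    for index, count in fragment_counts.items():
        if not 0 <= index < N_ESTATE_FRAGMENTS:
            raise IndexError(f"fragment index {index} out of range")
        if count > 0:
            bits[index] = 1  # occurrence count is dropped, only presence is kept
    return bits

# Hypothetical counts: fragment 5 matched three times, 12 once, 40 not at all.
bits = estate_presence_bits({5: 3, 12: 1, 40: 0})
print(len(bits), [i for i, b in enumerate(bits) if b])
```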
MACCS fingerprint
This generates 166-bit MACCS keys. The SMARTS patterns for each of the features were taken from RDKit. However, given that there is no official and explicit listing of the original key definitions, the results of this implementation may differ from others. This class assumes that aromaticity perception and atom typing have been performed prior to generating the fingerprint. Note: currently, bits 1 and 44 are completely ignored, since the RDKit definitions do not provide them.
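The consequence of ignoring bits 1 and 44 is that those positions are always zero, whatever the molecule. The sketch below shows a 166-key fingerprint with 1-based key numbers in which undefined keys are skipped. The molecule representation (element counts) and the key tests are hypothetical placeholders chosen for illustration; they are not the real MACCS key definitions.

```python
from typing import Callable, Dict, List, Optional

N_KEYS = 166
UNDEFINED_KEYS = {1, 44}  # keys without an RDKit SMARTS definition, per the text

Molecule = Dict[str, int]  # hypothetical: element symbol -> atom count
KeyTest = Callable[[Molecule], bool]

def maccs_like_fingerprint(mol: Molecule,
                           key_tests: Dict[int, Optional[KeyTest]]) -> List[int]:
    """Return 166 bits; bit k-1 stores key k (keys are numbered 1..166)."""
    bits = [0] * N_KEYS
    for key in range(1, N_KEYS + 1):
        test = key_tests.get(key)
        # Undefined keys are skipped, so their bits always stay 0.
        if key in UNDEFINED_KEYS or test is None:
            continue
        if test(mol):
            bits[key - 1] = 1
    return bits

# Hypothetical key tests for illustration only (NOT the real MACCS definitions).
demo_tests: Dict[int, Optional[KeyTest]] = {
    1: lambda m: True,                 # would fire, but key 1 is ignored
    10: lambda m: m.get("N", 0) > 0,   # "contains nitrogen"
    20: lambda m: m.get("O", 0) >= 2,  # "at least two oxygens"
    44: lambda m: True,                # would fire, but key 44 is ignored
}

fp = maccs_like_fingerprint({"C": 2, "O": 1, "N": 1}, demo_tests)
print(len(fp), [k + 1 for k, b in enumerate(fp) if b])
```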
Substructure fingerprint
Gives a bit set whose size equals the number of substructures it was constructed from. A set bit indicates that the corresponding substructure was found at least once in the molecule for which the fingerprint was calculated. The default set of 307 CDK substructures was used.
Description / Pattern
Primary aliph amine / [NX3H2+0,NX4H3+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
Carboxylic ester / [CX3;$([R0][#6]),$([H1R0])](=[OX1])[OX2][#6;!$(C=[O,N,S])]
Quaternary aliph ammonium / [NX4H0+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
Overview of all commercial descriptors
Overview of the 404 QSAR descriptors computed using various software packages
Num_Atoms
Num_MetalAtoms
Num_Bonds
Num_Hydrogens
Num_ExplicitHydrogens
Num_ExplicitAtoms
Num_ExplicitBonds
Num_PositiveAtoms
Num_NegativeAtoms
Num_SpiroAtoms
Num_BridgeHeadAtoms
Num_RingBonds
Num_RotatableBonds
Num_AromaticBonds
Num_BridgeBonds
Num_Rings
Num_AromaticRings
Num_RingAssemblies
Num_Rings3
Num_Rings4
Num_Rings5
Num_Rings6
Num_Rings7
Num_Rings8
Num_Rings9Plus
Num_Chains
Num_ChainAssemblies
Num_Fragments
Num_SGroups
Num_CustomData
Num_PiBonds
Num_RepeatUnits
Num_StereoAtoms
Num_StereoBonds
Num_UnknownStereoAtoms
Num_UnknownStereoBonds
Num_AtomClasses
Num_Macro_Chains
Num_Macro_Residues
Num_TerminalRotomers
Num_TrueStereoAtoms
Num_UnknownTrueStereoAtoms
Num_PseudoStereoAtoms
Num_UnknownPseudoStereoAtoms
Num_MesoStereoAtoms
FormalCharge
Molecular_Weight
Molecular_SurfaceArea
Molecular_PolarSurfaceArea
Molecular_FractionalPolarSurfaceArea
Molecular_SASA
Molecular_PolarSASA
Molecular_FractionalPolarSASA
Molecular_SAVol
Num_H_Acceptors
Num_H_Donors
CLOGP
frac_anion7
frac_cation7
pKa_MA
pKa_MB
MOE_apol
MOE_a_acc
MOE_a_acid
MOE_a_aro
MOE_a_base
MOE_a_count
MOE_a_don
MOE_a_heavy
MOE_a_hyd
MOE_a_IC
MOE_a_ICM
MOE_a_nB
MOE_a_nBr
MOE_a_nC
MOE_a_nCl
MOE_a_nF
MOE_a_nH
MOE_a_nI
MOE_a_nN
MOE_a_nO
MOE_a_nP
MOE_a_nS
MOE_balabanJ
MOE_BCUT_PEOE_0
MOE_BCUT_PEOE_1
MOE_BCUT_PEOE_2
MOE_BCUT_PEOE_3
MOE_BCUT_SLOGP_0
MOE_BCUT_SLOGP_1
MOE_BCUT_SLOGP_2
MOE_BCUT_SLOGP_3
MOE_BCUT_SMR_0
MOE_BCUT_SMR_1
MOE_BCUT_SMR_2
MOE_BCUT_SMR_3
MOE_bpol
MOE_b_1rotN
MOE_b_1rotR
MOE_b_ar
MOE_b_count
MOE_b_double
MOE_b_heavy
MOE_b_rotN
MOE_b_rotR
MOE_b_single
MOE_b_triple
MOE_chi0
MOE_chi0v
MOE_chi0v_C
MOE_chi0_C
MOE_chi1
MOE_chi1v
MOE_chi1v_C
MOE_chi1_C
MOE_chiral
MOE_chiral_u
MOE_density
MOE_diameter
MOE_FCharge
MOE_GCUT_PEOE_0
MOE_GCUT_PEOE_1
MOE_GCUT_PEOE_2
MOE_GCUT_PEOE_3
MOE_GCUT_SLOGP_0
MOE_GCUT_SLOGP_1
MOE_GCUT_SLOGP_2
MOE_GCUT_SLOGP_3
MOE_GCUT_SMR_0
MOE_GCUT_SMR_1
MOE_GCUT_SMR_2
MOE_GCUT_SMR_3
MOE_Kier1
MOE_Kier2
MOE_Kier3
MOE_KierA1
MOE_KierA2
MOE_KierA3
MOE_KierFlex
MOE_lip_acc
MOE_lip_don
MOE_lip_druglike
MOE_lip_violation
MOE_logP(o/w)
MOE_logS
MOE_mr
MOE_nmol
MOE_opr_brigid
MOE_opr_leadlike
MOE_opr_nring
MOE_opr_nrot
MOE_opr_violation
MOE_PC+
MOE_PC-
MOE_PEOE_PC+
MOE_PEOE_PC-
MOE_PEOE_RPC+
MOE_PEOE_RPC-
MOE_PEOE_VSA+0
MOE_PEOE_VSA+1
MOE_PEOE_VSA+2
MOE_PEOE_VSA+3
MOE_PEOE_VSA+4
MOE_PEOE_VSA+5
MOE_PEOE_VSA+6
MOE_PEOE_VSA-0
MOE_PEOE_VSA-1
MOE_PEOE_VSA-2
MOE_PEOE_VSA-3
MOE_PEOE_VSA-4
MOE_PEOE_VSA-5
MOE_PEOE_VSA-6
MOE_PEOE_VSA_FHYD
MOE_PEOE_VSA_FNEG
MOE_PEOE_VSA_FPNEG
MOE_PEOE_VSA_FPOL
MOE_PEOE_VSA_FPOS
MOE_PEOE_VSA_FPPOS
MOE_PEOE_VSA_HYD
MOE_PEOE_VSA_NEG
MOE_PEOE_VSA_PNEG
MOE_PEOE_VSA_POL
MOE_PEOE_VSA_POS
MOE_PEOE_VSA_PPOS
MOE_petitjean
MOE_petitjeanSC
MOE_Q_PC+
MOE_Q_PC-
MOE_Q_RPC+
MOE_Q_RPC-
MOE_Q_VSA_FHYD
MOE_Q_VSA_FNEG
MOE_Q_VSA_FPNEG
MOE_Q_VSA_FPOL
MOE_Q_VSA_FPOS
MOE_Q_VSA_FPPOS
MOE_Q_VSA_HYD
MOE_Q_VSA_NEG
MOE_Q_VSA_PNEG
MOE_Q_VSA_POL
MOE_Q_VSA_POS
MOE_Q_VSA_PPOS
MOE_radius
MOE_reactive
MOE_rings
MOE_RPC+
MOE_RPC-
MOE_SlogP
MOE_SlogP_VSA0
MOE_SlogP_VSA1
MOE_SlogP_VSA2
MOE_SlogP_VSA3
MOE_SlogP_VSA4
MOE_SlogP_VSA5
MOE_SlogP_VSA6
MOE_SlogP_VSA7
MOE_SlogP_VSA8
MOE_SlogP_VSA9
MOE_SMR
MOE_SMR_VSA0
MOE_SMR_VSA1
MOE_SMR_VSA2
MOE_SMR_VSA3
MOE_SMR_VSA4
MOE_SMR_VSA5
MOE_SMR_VSA6
MOE_SMR_VSA7
MOE_TPSA
MOE_VAdjEq
MOE_VAdjMa
MOE_VDistEq
MOE_VDistMa
MOE_vdw_area
MOE_vdw_vol
MOE_vsa_acc
MOE_vsa_acid
MOE_vsa_base
MOE_vsa_don
MOE_vsa_hyd
MOE_vsa_other
MOE_vsa_pol
MOE_Weight
MOE_weinerPath
MOE_weinerPol
MOE_zagreb
VS-DESCRIPTOR-%FU10
VS-DESCRIPTOR-%FU4
VS-DESCRIPTOR-%FU5
VS-DESCRIPTOR-%FU6
VS-DESCRIPTOR-%FU7
VS-DESCRIPTOR-%FU8
VS-DESCRIPTOR-%FU9
VS-DESCRIPTOR-A
VS-DESCRIPTOR-ACACAC
VS-DESCRIPTOR-ACACDO
VS-DESCRIPTOR-ACDODO
VS-DESCRIPTOR-AUS7.4
VS-DESCRIPTOR-CD1
VS-DESCRIPTOR-CD2
VS-DESCRIPTOR-CD3
VS-DESCRIPTOR-CD4
VS-DESCRIPTOR-CD5
VS-DESCRIPTOR-CD6
VS-DESCRIPTOR-CD7
VS-DESCRIPTOR-CD8
VS-DESCRIPTOR-CP
VS-DESCRIPTOR-CW1
VS-DESCRIPTOR-CW2
VS-DESCRIPTOR-CW3
VS-DESCRIPTOR-CW4
VS-DESCRIPTOR-CW5
VS-DESCRIPTOR-CW6
VS-DESCRIPTOR-CW7
VS-DESCRIPTOR-CW8
VS-DESCRIPTOR-D1
VS-DESCRIPTOR-D2
VS-DESCRIPTOR-D3
VS-DESCRIPTOR-D4
VS-DESCRIPTOR-D5
VS-DESCRIPTOR-D6
VS-DESCRIPTOR-D7
VS-DESCRIPTOR-D8
VS-DESCRIPTOR-DD1
VS-DESCRIPTOR-DD2
VS-DESCRIPTOR-DD3
VS-DESCRIPTOR-DD4
VS-DESCRIPTOR-DD5
VS-DESCRIPTOR-DD6
VS-DESCRIPTOR-DD7
VS-DESCRIPTOR-DD8
VS-DESCRIPTOR-DIFF
VS-DESCRIPTOR-DODODO
VS-DESCRIPTOR-DRACAC
VS-DESCRIPTOR-DRACDO
VS-DESCRIPTOR-DRDODO
VS-DESCRIPTOR-DRDRAC
VS-DESCRIPTOR-DRDRDO
VS-DESCRIPTOR-DRDRDR
VS-DESCRIPTOR-FLEX
VS-DESCRIPTOR-FLEX_RB
VS-DESCRIPTOR-G
VS-DESCRIPTOR-HL1
VS-DESCRIPTOR-HL2
VS-DESCRIPTOR-HSA
VS-DESCRIPTOR-ID1
VS-DESCRIPTOR-ID2
VS-DESCRIPTOR-ID3
VS-DESCRIPTOR-ID4
VS-DESCRIPTOR-IW1
VS-DESCRIPTOR-IW2
VS-DESCRIPTOR-IW3
VS-DESCRIPTOR-IW4
VS-DESCRIPTOR-LOGP_c-Hex
VS-DESCRIPTOR-LOGP_n-Oct
VS-DESCRIPTOR-LgD10
VS-DESCRIPTOR-LgD5
VS-DESCRIPTOR-LgD6
VS-DESCRIPTOR-LgD7
VS-DESCRIPTOR-LgD7.5
VS-DESCRIPTOR-LgD8
VS-DESCRIPTOR-LgD9
VS-DESCRIPTOR-NCC
VS-DESCRIPTOR-PHSAR
VS-DESCRIPTOR-POL
VS-DESCRIPTOR-PSA
VS-DESCRIPTOR-PSAR
VS-DESCRIPTOR-R
VS-DESCRIPTOR-S
VS-DESCRIPTOR-V
VS-DESCRIPTOR-W1
VS-DESCRIPTOR-W2
VS-DESCRIPTOR-W3
VS-DESCRIPTOR-W4
VS-DESCRIPTOR-W5
VS-DESCRIPTOR-W6
VS-DESCRIPTOR-W7
VS-DESCRIPTOR-W8
VS-DESCRIPTOR-WN1
VS-DESCRIPTOR-WN2
VS-DESCRIPTOR-WN3
VS-DESCRIPTOR-WN4
VS-DESCRIPTOR-WN5
VS-DESCRIPTOR-WN6
VS-DESCRIPTOR-WO1
VS-DESCRIPTOR-WO2
VS-DESCRIPTOR-WO3
VS-DESCRIPTOR-WO4
VS-DESCRIPTOR-WO5
VS-DESCRIPTOR-WO6
narecs
nvx
nedges
nrings
ncircuits
nclass
nelem
ntpaths
molweight
dX0
dX1
dX2
dXp3
dXp4
dXp5
dXp6
dXp7
dXp8
dXp9
dXp10
nHBd
nwHBd
nHBa
nwHBa
SHBd
SwHBd
SHBa
SwHBa
Hmax
Gmax
Hmin
Gmin
Hmaxpos
Hminneg
n2Pag11
n2Pag12
n2Pag13
n2Pag14
n2Pag15
n2Pag16
n2Pag22
n2Pag23
n2Pag24
n2Pag25
n2Pag26
n2Pag33
n2Pag34
n2Pag35
n2Pag36
n2Pag44
n2Pag45
n2Pag46
n2Pag55
n2Pag56
n2Pag66
Daylight Fingerprints
Daylight fingerprints encode atom paths of various lengths, which are mapped onto a fixed number of bits using a CRC (cyclic redundancy check) algorithm. This form of structure encoding is considerably more complex. In addition, atom paths are likely to span both substituents and the core simultaneously.
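The path-to-bit mapping described above can be sketched as follows: enumerate simple atom paths up to a maximum length, turn each path into a direction-independent label, and map the label onto a bit with a CRC. This is a minimal sketch under stated assumptions. The labels use element symbols only, `zlib.crc32` stands in for whichever CRC variant Daylight uses, and `max_bonds=3` and `n_bits=64` are illustrative values, not Daylight defaults. The toy molecule (a three-membered carbon ring as "core" with an oxygen "substituent") shows how the longest paths run through both.

```python
import zlib
from typing import Dict, List, Set

def enumerate_paths(adjacency: Dict[int, List[int]], max_bonds: int) -> List[List[int]]:
    """Simple atom paths with 0..max_bonds bonds; multi-atom paths appear in both directions."""
    paths: List[List[int]] = []

    def extend(path: List[int]) -> None:
        paths.append(path)
        if len(path) - 1 == max_bonds:
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:  # simple paths only: no atom visited twice
                extend(path + [nbr])

    for start in adjacency:
        extend([start])
    return paths

def path_fingerprint(atoms: List[str], adjacency: Dict[int, List[int]],
                     max_bonds: int = 3, n_bits: int = 64) -> Set[int]:
    """Map every path label onto a bit via CRC32 modulo n_bits; return the on-bits."""
    on_bits: Set[int] = set()
    for path in enumerate_paths(adjacency, max_bonds):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        label = min(forward, backward)  # same label for both traversal directions
        on_bits.add(zlib.crc32(label.encode("ascii")) % n_bits)
    return on_bits

# Toy molecule: ring atoms 0-1-2 (core) and an O (atom 3) attached to atom 0.
atoms = ["C", "C", "C", "O"]
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

# Three-bond paths that contain the substituent atom also run through the ring.
spanning = [p for p in enumerate_paths(adjacency, 3) if 3 in p and len(p) == 4]
print(spanning)
print(sorted(path_fingerprint(atoms, adjacency)))
```

Because many different paths are folded into a small number of bits, unrelated paths can collide on the same bit; a larger `n_bits` reduces, but does not eliminate, such collisions.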
ISIS Keys
ISIS keys encode explicit fragments that provide an overall description of the molecular structure.
MDL/MACCS (PipelinePilot, MOE)
Implemented as 166 predefined keys in PipelinePilot and MOE (abbreviated MDL and MACCS, respectively), described in detail in Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42 (6), 1273-1280.
Flowchart of DemQSAR
Detailed prediction results for the DataExternal set
Best four and worst four predictions for the DataExternal set
Name / CAS Number / VDss / Predicted VDss
Cytarabine / 147-94-4 / 0.67 / 0.65
Candoxatrilat / 123122-54-3 / 0.25 / 0.30
Levocabastine / 79516-68-0 / 1.17 / 1.24
Clazosentan / 180384-56-9 / 0.23 / 0.32
852A / 532959-63-0 / 3.10 / 1.50
Lubeluzole / 144665-07-6 / 2.60 / 4.46
Fulvestrant / 129453-61-8 / 4.15 / 2.22
Fluphenazine / 69-23-8 / 2.90 / 7.22
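The table does not state the criterion used to select the best and worst predictions. As a hedged check, assuming a symmetric fold error max(predicted/observed, observed/predicted), a common measure for VDss predictions, the sketch below ranks the eight compounds using the values from the table. Under this assumed metric, the first four compounds have the smallest fold errors (all below 1.4), and 852A and Fluphenazine are the only ones outside twofold.

```python
from typing import List, Tuple

# (name, observed VDss, predicted VDss), values taken from the table above
rows: List[Tuple[str, float, float]] = [
    ("Cytarabine", 0.67, 0.65),
    ("Candoxatrilat", 0.25, 0.30),
    ("Levocabastine", 1.17, 1.24),
    ("Clazosentan", 0.23, 0.32),
    ("852A", 3.10, 1.50),
    ("Lubeluzole", 2.60, 4.46),
    ("Fulvestrant", 4.15, 2.22),
    ("Fluphenazine", 2.90, 7.22),
]

def fold_error(observed: float, predicted: float) -> float:
    """Symmetric fold error (assumed metric): 1.0 for a perfect prediction, >1 otherwise."""
    if observed <= 0 or predicted <= 0:
        raise ValueError("VDss values must be positive")
    return max(predicted / observed, observed / predicted)

ranked = sorted(rows, key=lambda r: fold_error(r[1], r[2]))
for name, obs, pred in ranked:
    print(f"{name:15s} {fold_error(obs, pred):.2f}")
```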