Supporting Information

Human volume of distribution and clearance predicted with DemQSAR

Ozgur Demir-Kavuk, Joerg Bentzien, Ingo Muegge and Ernst-Walter Knapp

Algebraic Expression for loss functions implemented in DemQSAR

Classification

Regression

Overview of all CDK features and fingerprints

Molecular QSAR descriptors (see the CDK documentation for detailed information)

Note: a descriptor may return more than one value, resulting in a total of 219 descriptor values.
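The note above implies that the feature matrix has more columns than there are descriptor classes. A minimal Python sketch of such a flattening step follows. The function name, the column-naming scheme, and all numbers are illustrative assumptions; they are not taken from CDK or DemQSAR.

```python
from typing import Dict, List, Sequence, Tuple


def flatten_descriptors(
    results: Dict[str, Sequence[float]]
) -> Tuple[List[str], List[float]]:
    """Expand descriptor outputs into one named column per returned value."""
    names: List[str] = []
    values: List[float] = []
    for descriptor, outputs in results.items():
        if len(outputs) == 1:
            # Single-valued descriptor: keep its name unchanged.
            names.append(descriptor)
            values.append(float(outputs[0]))
        else:
            # Multi-valued descriptor: one numbered column per value.
            for i, value in enumerate(outputs, start=1):
                names.append(f"{descriptor}_{i}")
                values.append(float(value))
    return names, values


# Hypothetical outputs for one molecule (values and value counts are made up).
example = {
    "TPSADescriptor": [63.6],
    "ALOGPDescriptor": [1.2, 0.9, 44.1],
    "KappaShapeIndicesDescriptor": [9.3, 4.1, 2.6],
}
columns, row = flatten_descriptors(example)
print(len(columns), "columns from", len(example), "descriptors")
```

In this toy case three descriptors expand to seven columns, in the same way that the listed descriptor classes expand to 219 values.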

ALOGPDescriptor

APolDescriptor

AminoAcidCountDescriptor

AromaticAtomsCountDescriptor

AromaticBondsCountDescriptor

AtomCountDescriptor

AutocorrelationDescriptorCharge

AutocorrelationDescriptorMass

AutocorrelationDescriptorPolarizability

BCUTDescriptor

BPolDescriptor

BondCountDescriptor

CPSADescriptor

CarbonTypesDescriptor

ChiChainDescriptor

ChiClusterDescriptor

ChiPathClusterDescriptor

ChiPathDescriptor

EccentricConnectivityIndexDescriptor

FragmentComplexityDescriptor

GravitationalIndexDescriptor

HBondAcceptorCountDescriptor

HBondDonorCountDescriptor

IPMolecularLearnedDescriptor

KappaShapeIndicesDescriptor

KierHallSmartsDescriptor

LargestChainDescriptor

LargestPiSystemDescriptor

LengthOverBreadthDescriptor

LongestAliphaticChainDescriptor

MDEDescriptor

MomentOfInertiaDescriptor

PetitjeanNumberDescriptor

PetitjeanShapeIndexDescriptor

RotatableBondsCountDescriptor

RuleOfFiveDescriptor

TPSADescriptor

VAdjMaDescriptor

WHIMDescriptor

WeightDescriptor

WeightedPathDescriptor

WienerNumbersDescriptor

XLogPDescriptor

ZagrebIndexDescriptor

Standard fingerprint

Daylight-style hashed fingerprints. Bits are set according to the occurrence of a particular structural feature (see, for example, the Daylight Inc. theory manual for more information).

Extended fingerprint

Generates an extended fingerprint for a given molecule, which extends the standard fingerprint with additional bits describing ring features.

Graph Only fingerprint

Specialized version of the standard fingerprint which does not take bond orders into account.

E-State fingerprint

This generates 79-bit fingerprints using the E-State fragments. The E-State fragments are those described in [Hall, L. H. and Kier, L. B., Electrotopological State Indices for Atom Types: A Novel Combination of Electronic, Topological, and Valence State Information, Journal of Chemical Information and Computer Sciences, 1995, 35:1039-1045], and the SMARTS patterns were taken from RDKit. Note that this fingerprint simply indicates whether each fragment occurs in the molecule, not how often.

MACCS fingerprint

This generates 166-bit MACCS keys. The SMARTS patterns for each of the features were taken from RDKit. However, given that there is no official and explicit listing of the original key definitions, the results of this implementation may differ from others. This class assumes that aromaticity perception and atom typing have been performed prior to generating the fingerprint. Note: bits 1 and 44 are currently ignored completely, since the RDKit definitions do not define them.

Substructure fingerprint

Gives a bit set which has a size equal to the number of substructures it was constructed from. A set bit indicates that that substructure was found at least once in the molecule for which the fingerprint was calculated. The default 307 CDK substructures were used.

Description / Pattern
Primary aliph amine / [NX3H2+0,NX4H3+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
Carboxylic ester / [CX3;$([R0][#6]),$([H1R0])](=[OX1])[OX2][#6;!$(C=[O,N,S])]
Quaternary aliph ammonium / [NX4H0+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
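The substructure fingerprint described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `substructure_fingerprint` and `has_match` are hypothetical names, and the toy matcher below looks up precomputed match counts instead of running real SMARTS matching as CDK does.

```python
from typing import Callable, Dict, List, Sequence


def substructure_fingerprint(
    molecule: Dict[str, int],
    patterns: Sequence[str],
    has_match: Callable[[Dict[str, int], str], bool],
) -> List[int]:
    """One bit per pattern; a bit is set if the pattern matches at least once."""
    return [1 if has_match(molecule, pattern) else 0 for pattern in patterns]


# Pattern descriptions taken from the table above (the SMARTS strings are omitted).
patterns = [
    "Primary aliph amine",
    "Carboxylic ester",
    "Quaternary aliph ammonium",
]

# Stand-in for a molecule: hypothetical match counts per pattern.
toy_molecule = {"Primary aliph amine": 1, "Carboxylic ester": 2}


def toy_matcher(molecule: Dict[str, int], pattern: str) -> bool:
    """Assumed stand-in for SMARTS matching: true if the count is positive."""
    return molecule.get(pattern, 0) > 0


fingerprint = substructure_fingerprint(toy_molecule, patterns, toy_matcher)
print(fingerprint)
```

The fingerprint length equals the number of patterns, and the ester bit is set once even though the toy molecule contains two esters.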

Overview of all commercial descriptors

Overview of 404 QSAR descriptors computed using various software packages

Num_Atoms

Num_MetalAtoms

Num_Bonds

Num_Hydrogens

Num_ExplicitHydrogens

Num_ExplicitAtoms

Num_ExplicitBonds

Num_PositiveAtoms

Num_NegativeAtoms

Num_SpiroAtoms

Num_BridgeHeadAtoms

Num_RingBonds

Num_RotatableBonds

Num_AromaticBonds

Num_BridgeBonds

Num_Rings

Num_AromaticRings

Num_RingAssemblies

Num_Rings3

Num_Rings4

Num_Rings5

Num_Rings6

Num_Rings7

Num_Rings8

Num_Rings9Plus

Num_Chains

Num_ChainAssemblies

Num_Fragments

Num_SGroups

Num_CustomData

Num_PiBonds

Num_RepeatUnits

Num_StereoAtoms

Num_StereoBonds

Num_UnknownStereoAtoms

Num_UnknownStereoBonds

Num_AtomClasses

Num_Macro_Chains

Num_Macro_Residues

Num_TerminalRotomers

Num_TrueStereoAtoms

Num_UnknownTrueStereoAtoms

Num_PseudoStereoAtoms

Num_UnknownPseudoStereoAtoms

Num_MesoStereoAtoms

FormalCharge

Molecular_Weight

Molecular_SurfaceArea

Molecular_PolarSurfaceArea

Molecular_FractionalPolarSurfaceArea

Molecular_SASA

Molecular_PolarSASA

Molecular_FractionalPolarSASA

Molecular_SAVol

Num_H_Acceptors

Num_H_Donors

CLOGP

frac_anion7

frac_cation7

pKa_MA

pKa_MB

MOE_apol

MOE_a_acc

MOE_a_acid

MOE_a_aro

MOE_a_base

MOE_a_count

MOE_a_don

MOE_a_heavy

MOE_a_hyd

MOE_a_IC

MOE_a_ICM

MOE_a_nB

MOE_a_nBr

MOE_a_nC

MOE_a_nCl

MOE_a_nF

MOE_a_nH

MOE_a_nI

MOE_a_nN

MOE_a_nO

MOE_a_nP

MOE_a_nS

MOE_balabanJ

MOE_BCUT_PEOE_0

MOE_BCUT_PEOE_1

MOE_BCUT_PEOE_2

MOE_BCUT_PEOE_3

MOE_BCUT_SLOGP_0

MOE_BCUT_SLOGP_1

MOE_BCUT_SLOGP_2

MOE_BCUT_SLOGP_3

MOE_BCUT_SMR_0

MOE_BCUT_SMR_1

MOE_BCUT_SMR_2

MOE_BCUT_SMR_3

MOE_bpol

MOE_b_1rotN

MOE_b_1rotR

MOE_b_ar

MOE_b_count

MOE_b_double

MOE_b_heavy

MOE_b_rotN

MOE_b_rotR

MOE_b_single

MOE_b_triple

MOE_chi0

MOE_chi0v

MOE_chi0v_C

MOE_chi0_C

MOE_chi1

MOE_chi1v

MOE_chi1v_C

MOE_chi1_C

MOE_chiral

MOE_chiral_u

MOE_density

MOE_diameter

MOE_FCharge

MOE_GCUT_PEOE_0

MOE_GCUT_PEOE_1

MOE_GCUT_PEOE_2

MOE_GCUT_PEOE_3

MOE_GCUT_SLOGP_0

MOE_GCUT_SLOGP_1

MOE_GCUT_SLOGP_2

MOE_GCUT_SLOGP_3

MOE_GCUT_SMR_0

MOE_GCUT_SMR_1

MOE_GCUT_SMR_2

MOE_GCUT_SMR_3

MOE_Kier1

MOE_Kier2

MOE_Kier3

MOE_KierA1

MOE_KierA2

MOE_KierA3

MOE_KierFlex

MOE_lip_acc

MOE_lip_don

MOE_lip_druglike

MOE_lip_violation

MOE_logP(o/w)

MOE_logS

MOE_mr

MOE_nmol

MOE_opr_brigid

MOE_opr_leadlike

MOE_opr_nring

MOE_opr_nrot

MOE_opr_violation

MOE_PC+

MOE_PC-

MOE_PEOE_PC+

MOE_PEOE_PC-

MOE_PEOE_RPC+

MOE_PEOE_RPC-

MOE_PEOE_VSA+0

MOE_PEOE_VSA+1

MOE_PEOE_VSA+2

MOE_PEOE_VSA+3

MOE_PEOE_VSA+4

MOE_PEOE_VSA+5

MOE_PEOE_VSA+6

MOE_PEOE_VSA-0

MOE_PEOE_VSA-1

MOE_PEOE_VSA-2

MOE_PEOE_VSA-3

MOE_PEOE_VSA-4

MOE_PEOE_VSA-5

MOE_PEOE_VSA-6

MOE_PEOE_VSA_FHYD

MOE_PEOE_VSA_FNEG

MOE_PEOE_VSA_FPNEG

MOE_PEOE_VSA_FPOL

MOE_PEOE_VSA_FPOS

MOE_PEOE_VSA_FPPOS

MOE_PEOE_VSA_HYD

MOE_PEOE_VSA_NEG

MOE_PEOE_VSA_PNEG

MOE_PEOE_VSA_POL

MOE_PEOE_VSA_POS

MOE_PEOE_VSA_PPOS

MOE_petitjean

MOE_petitjeanSC

MOE_Q_PC+

MOE_Q_PC-

MOE_Q_RPC+

MOE_Q_RPC-

MOE_Q_VSA_FHYD

MOE_Q_VSA_FNEG

MOE_Q_VSA_FPNEG

MOE_Q_VSA_FPOL

MOE_Q_VSA_FPOS

MOE_Q_VSA_FPPOS

MOE_Q_VSA_HYD

MOE_Q_VSA_NEG

MOE_Q_VSA_PNEG

MOE_Q_VSA_POL

MOE_Q_VSA_POS

MOE_Q_VSA_PPOS

MOE_radius

MOE_reactive

MOE_rings

MOE_RPC+

MOE_RPC-

MOE_SlogP

MOE_SlogP_VSA0

MOE_SlogP_VSA1

MOE_SlogP_VSA2

MOE_SlogP_VSA3

MOE_SlogP_VSA4

MOE_SlogP_VSA5

MOE_SlogP_VSA6

MOE_SlogP_VSA7

MOE_SlogP_VSA8

MOE_SlogP_VSA9

MOE_SMR

MOE_SMR_VSA0

MOE_SMR_VSA1

MOE_SMR_VSA2

MOE_SMR_VSA3

MOE_SMR_VSA4

MOE_SMR_VSA5

MOE_SMR_VSA6

MOE_SMR_VSA7

MOE_TPSA

MOE_VAdjEq

MOE_VAdjMa

MOE_VDistEq

MOE_VDistMa

MOE_vdw_area

MOE_vdw_vol

MOE_vsa_acc

MOE_vsa_acid

MOE_vsa_base

MOE_vsa_don

MOE_vsa_hyd

MOE_vsa_other

MOE_vsa_pol

MOE_Weight

MOE_weinerPath

MOE_weinerPol

MOE_zagreb

VS-DESCRIPTOR-%FU10

VS-DESCRIPTOR-%FU4

VS-DESCRIPTOR-%FU5

VS-DESCRIPTOR-%FU6

VS-DESCRIPTOR-%FU7

VS-DESCRIPTOR-%FU8

VS-DESCRIPTOR-%FU9

VS-DESCRIPTOR-A

VS-DESCRIPTOR-ACACAC

VS-DESCRIPTOR-ACACDO

VS-DESCRIPTOR-ACDODO

VS-DESCRIPTOR-AUS7.4

VS-DESCRIPTOR-CD1

VS-DESCRIPTOR-CD2

VS-DESCRIPTOR-CD3

VS-DESCRIPTOR-CD4

VS-DESCRIPTOR-CD5

VS-DESCRIPTOR-CD6

VS-DESCRIPTOR-CD7

VS-DESCRIPTOR-CD8

VS-DESCRIPTOR-CP

VS-DESCRIPTOR-CW1

VS-DESCRIPTOR-CW2

VS-DESCRIPTOR-CW3

VS-DESCRIPTOR-CW4

VS-DESCRIPTOR-CW5

VS-DESCRIPTOR-CW6

VS-DESCRIPTOR-CW7

VS-DESCRIPTOR-CW8

VS-DESCRIPTOR-D1

VS-DESCRIPTOR-D2

VS-DESCRIPTOR-D3

VS-DESCRIPTOR-D4

VS-DESCRIPTOR-D5

VS-DESCRIPTOR-D6

VS-DESCRIPTOR-D7

VS-DESCRIPTOR-D8

VS-DESCRIPTOR-DD1

VS-DESCRIPTOR-DD2

VS-DESCRIPTOR-DD3

VS-DESCRIPTOR-DD4

VS-DESCRIPTOR-DD5

VS-DESCRIPTOR-DD6

VS-DESCRIPTOR-DD7

VS-DESCRIPTOR-DD8

VS-DESCRIPTOR-DIFF

VS-DESCRIPTOR-DODODO

VS-DESCRIPTOR-DRACAC

VS-DESCRIPTOR-DRACDO

VS-DESCRIPTOR-DRDODO

VS-DESCRIPTOR-DRDRAC

VS-DESCRIPTOR-DRDRDO

VS-DESCRIPTOR-DRDRDR

VS-DESCRIPTOR-FLEX

VS-DESCRIPTOR-FLEX_RB

VS-DESCRIPTOR-G

VS-DESCRIPTOR-HL1

VS-DESCRIPTOR-HL2

VS-DESCRIPTOR-HSA

VS-DESCRIPTOR-ID1

VS-DESCRIPTOR-ID2

VS-DESCRIPTOR-ID3

VS-DESCRIPTOR-ID4

VS-DESCRIPTOR-IW1

VS-DESCRIPTOR-IW2

VS-DESCRIPTOR-IW3

VS-DESCRIPTOR-IW4

VS-DESCRIPTOR-LOGP_c-Hex

VS-DESCRIPTOR-LOGP_n-Oct

VS-DESCRIPTOR-LgD10

VS-DESCRIPTOR-LgD5

VS-DESCRIPTOR-LgD6

VS-DESCRIPTOR-LgD7

VS-DESCRIPTOR-LgD7.5

VS-DESCRIPTOR-LgD8

VS-DESCRIPTOR-LgD9

VS-DESCRIPTOR-NCC

VS-DESCRIPTOR-PHSAR

VS-DESCRIPTOR-POL

VS-DESCRIPTOR-PSA

VS-DESCRIPTOR-PSAR

VS-DESCRIPTOR-R

VS-DESCRIPTOR-S

VS-DESCRIPTOR-V

VS-DESCRIPTOR-W1

VS-DESCRIPTOR-W2

VS-DESCRIPTOR-W3

VS-DESCRIPTOR-W4

VS-DESCRIPTOR-W5

VS-DESCRIPTOR-W6

VS-DESCRIPTOR-W7

VS-DESCRIPTOR-W8

VS-DESCRIPTOR-WN1

VS-DESCRIPTOR-WN2

VS-DESCRIPTOR-WN3

VS-DESCRIPTOR-WN4

VS-DESCRIPTOR-WN5

VS-DESCRIPTOR-WN6

VS-DESCRIPTOR-WO1

VS-DESCRIPTOR-WO2

VS-DESCRIPTOR-WO3

VS-DESCRIPTOR-WO4

VS-DESCRIPTOR-WO5

VS-DESCRIPTOR-WO6

narecs

nvx

nedges

nrings

ncircuits

nclass

nelem

ntpaths

molweight

dX0

dX1

dX2

dXp3

dXp4

dXp5

dXp6

dXp7

dXp8

dXp9

dXp10

nHBd

nwHBd

nHBa

nwHBa

SHBd

SwHBd

SHBa

SwHBa

Hmax

Gmax

Hmin

Gmin

Hmaxpos

Hminneg

n2Pag11

n2Pag12

n2Pag13

n2Pag14

n2Pag15

n2Pag16

n2Pag22

n2Pag23

n2Pag24

n2Pag25

n2Pag26

n2Pag33

n2Pag34

n2Pag35

n2Pag36

n2Pag44

n2Pag45

n2Pag46

n2Pag55

n2Pag56

n2Pag66

Daylight Fingerprints

Daylight fingerprints encode atom paths of various lengths, which are mapped onto a fixed number of bits using a CRC algorithm. This form of structure encoding is considerably more complex than explicit fragment keys. In addition, atom paths are likely to span both substituents and core simultaneously.
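The path-hashing idea can be illustrated with a small Python sketch. It is not the Daylight algorithm, whose details are not given here: the path labelling, the parameters `n_bits` and `max_atoms`, and the use of `zlib.crc32` as the CRC are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int, str]  # (atom index, atom index, bond symbol)


def enumerate_paths(atoms: List[str], bonds: List[Bond], max_atoms: int) -> Set[str]:
    """Return labels of all simple atom paths containing up to max_atoms atoms."""
    neighbours: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(atoms))}
    for a, b, symbol in bonds:
        neighbours[a].append((b, symbol))
        neighbours[b].append((a, symbol))

    labels: Set[str] = set()

    def walk(path: List[int], tokens: List[str]) -> None:
        forward = "".join(tokens)
        backward = "".join(reversed(tokens))
        # A path and its reverse are the same feature; keep one spelling.
        labels.add(min(forward, backward))
        if len(path) == max_atoms:
            return
        for nxt, symbol in neighbours[path[-1]]:
            if nxt not in path:
                walk(path + [nxt], tokens + [symbol, atoms[nxt]])

    for start in range(len(atoms)):
        walk([start], [atoms[start]])
    return labels


def hashed_fingerprint(
    atoms: List[str], bonds: List[Bond], n_bits: int = 64, max_atoms: int = 4
) -> List[int]:
    """Map each path label onto one of n_bits positions with a CRC-32 hash."""
    bits = [0] * n_bits
    for label in enumerate_paths(atoms, bonds, max_atoms):
        bits[zlib.crc32(label.encode("ascii")) % n_bits] = 1
    return bits


# Heavy atoms of ethanol: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1, "-"), (1, 2, "-")]
paths = enumerate_paths(atoms, bonds, max_atoms=4)
fp = hashed_fingerprint(atoms, bonds)
print(sorted(paths), sum(fp))
```

Because different paths can hash to the same position, the number of set bits is at most the number of distinct paths, and a set bit cannot be traced back to a single structural feature.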

ISIS Keys

ISIS keys encode explicit fragments that provide an overall description of the molecular structure.

MDL/MACCS (PipelinePilot, MOE)

166 predefined keys in the PipelinePilot and MOE implementations (abbreviated MDL and MACCS, respectively). Described in detail in Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 2002, 42 (6), 1273-1280.

Flowchart of DemQSAR

Detailed prediction results for DataExternal

Best four and worst four predictions for DataExternal

Name / CAS Number / VDss / predicted VDss
Cytarabine / 147-94-4 / 0.67 / 0.65
Candoxatrilat / 123122-54-3 / 0.25 / 0.30
Levocabastine / 79516-68-0 / 1.17 / 1.24
Clazosentan / 180384-56-9 / 0.23 / 0.32
852A / 532959-63-0 / 3.10 / 1.50
Lubeluzole / 144665-07-6 / 2.60 / 4.46
Fulvestrant / 129453-61-8 / 4.15 / 2.22
Fluphenazine / 69-23-8 / 2.90 / 7.22
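One way to compare the eight predictions above is the fold error, the larger of the two ratios between observed and predicted value. The Python sketch below computes it from the tabulated numbers. The fold error is used here as an assumed illustration; it is not necessarily the criterion by which the compounds were selected. The sketch also assumes the tabulated values are on a linear scale.

```python
from typing import List, Tuple

# (name, observed VDss, predicted VDss), values from the table above.
predictions: List[Tuple[str, float, float]] = [
    ("Cytarabine", 0.67, 0.65),
    ("Candoxatrilat", 0.25, 0.30),
    ("Levocabastine", 1.17, 1.24),
    ("Clazosentan", 0.23, 0.32),
    ("852A", 3.10, 1.50),
    ("Lubeluzole", 2.60, 4.46),
    ("Fulvestrant", 4.15, 2.22),
    ("Fluphenazine", 2.90, 7.22),
]


def fold_error(observed: float, predicted: float) -> float:
    """Symmetric ratio of prediction and observation; 1.0 means a perfect match."""
    return max(observed / predicted, predicted / observed)


# Sort from smallest to largest fold error.
ranked = sorted(predictions, key=lambda row: fold_error(row[1], row[2]))
for name, observed, predicted in ranked:
    print(f"{name:15s} {fold_error(observed, predicted):.2f}")
```

Under this measure the first four compounds in the table all stay below a fold error of 1.5, while the last four all exceed 1.7.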