Supplemental Text

Quality control of WGS

1. Quality control of raw reads

Raw reads contaminated by adapter sequences, or raw reads with 50% bases whose base quality was 5 and with the proportion of N bases 10%, were filtered. Usually, the ratio of adapter contaminated reads was 2% of the total number, the proportion of low quality reads was <8%, and the proportion of N bases was 10%. If not, we considered discarding all the reads from these lanes.

2. Quality control of WGS

For 30X re-sequencing, whole genome sequences must meet the following criteria: the mapping rate should be ≥95%, the mismatch rate should be ≤1%, GC content should be within the normal range of 35-45%, and coverage at ≥4X depth should be >95%.

Basic statistics of whole genome sequencing for each individual

Sample / Mapping rate (%) / Mismatch rate (%) / GC content (%) / Coverage ≥4X (%) / Mean depth (X)
Unaffected member of HMO Family 1 / 97.14 / 0.53 / 41.09 / 99.13 / 31.01
Proband of HMO Family 1 / 96.92 / 0.58 / 41.30 / 99.19 / 33.52
Proband of HMO Family 3 / 97.87 / 0.40 / 39.75 / 99.60 / 28.39
Patient with Dent disease / 97.31 / 0.46 / 40.23 / 99.47 / 30.75
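
The criteria above can be checked mechanically against the table. The following is a minimal Python sketch, not the authors' pipeline. It assumes that the mapping-rate threshold is a lower bound (≥95%) and the mismatch-rate threshold an upper bound (≤1%); the function and variable names are illustrative only.

```python
# Per-sample statistics copied from the table above:
# (mapping rate %, mismatch rate %, GC content %, coverage >=4X %, mean depth)
SAMPLES = {
    "Unaffected member of HMO Family 1": (97.14, 0.53, 41.09, 99.13, 31.01),
    "Proband of HMO Family 1": (96.92, 0.58, 41.30, 99.19, 33.52),
    "Proband of HMO Family 3": (97.87, 0.40, 39.75, 99.60, 28.39),
    "Patient with Dent disease": (97.31, 0.46, 40.23, 99.47, 30.75),
}


def passes_wgs_qc(mapping, mismatch, gc, cov4x):
    """Return True if a sample meets all four criteria.

    Assumption: mapping rate >= 95 and mismatch rate <= 1 (the direction of
    these two bounds is inferred, not stated explicitly in the text).
    """
    return (
        mapping >= 95.0
        and mismatch <= 1.0
        and 35.0 <= gc <= 45.0
        and cov4x > 95.0
    )


# Mean depth is reported in the table but is not one of the four criteria.
results = {name: passes_wgs_qc(*stats[:4]) for name, stats in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```

Under these assumptions, all four samples in the table satisfy the criteria.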

Supplemental Fig. S1 Radiographs showing exostosis lesions in affected HMO individuals. a Right forearm of Family 1 member III-1, showing exostosis in the ulna and bowed forearm conferring restricted rotation (arrow). b Right leg of Family 1 member III-12, showing the tibial exostosis resulting in the destruction of the fibula (arrow). c Pelvic radiograph of Family 1 member I-2, displaying osteoarthritis and necrosis of the femoral head on the right side of the hip (arrow). d Radiograph of Family 2 member II-1, showing the exostosis at the epiphysis of the phalanx in the fourth digit of the right hand (arrow). e Radiograph of Family 2 member II-1, showing multiple exostoses around the left knee joint (arrows). f Radiograph of Family 3 member II-1, revealing the exostoses in the scapula (arrow).

Supplemental Fig. S2 Pipeline methods employed to accurately characterize the CNVs using whole genome sequencing data from Families 1 and 2 (red letters denote deleted sequences, orange letters denote micro-mutations, blue letters denote insertions and black letters denote unchanged sequence or reference sequences).

Step 1: Prediction of the CNVs using DELLY, BreakDancer and CNVnator software on WGS data, which indicated multiple distinct breakpoints.

Step 2: Determination of the breakpoints by alignment of the truncated reads around the breakpoints with reference sequences. The breakpoints were determined using the truncated reads from read-pairs (one read mapped, the other truncated) because these truncated reads had concordant ends near the predicted breakpoint (6 at the 5’ end and 9 at the 3’ end). Thus, the tentative breakpoints of this CNV exemplar were determined to be chr11:43,936,139 and chr11:44,438,037.

Step 3: Tracking sequences in breakpoint regions and fine-tuning of the breakpoints.

(1) Tracking sequences in breakpoint regions. We extracted the 1000-bp flanking sequence at each breakpoint from the human reference sequence and concatenated the two flanking sequences into a 2000-bp junction sequence, which served as the reference against which the patients’ WGS reads were aligned. We extracted all reads with abnormal insert sizes, unexpected strand orientations or truncations, reads whose ends mapped to two different chromosomes, and read-pairs in which one read mapped and the other was truncated, and aligned them to the junction sequences using BWA. From these alignments we obtained information on insertions, microhomologies and micro-mutations.

(2) Fine-tuning.

1) Construct sequences before fine-tuning. We searched for the inserted sequence (GAGAAAAGCATTTGCAAAAA) using BLAT and found a single match, located 157 bp downstream of the 5’ breakpoint. Analysis of the patient reads at the 3’ end of the deletion revealed a TGA microhomology (purple box) that could be assigned either to the deleted sequence or to the breakpoint-flanking sequence, owing to the presence of an insertion at the breakpoint junction.

2) Fine-tuning. By combining the physical positions and the flanking sequence at the breakpoint junction, the sequence 'GTATGA' could be placed in the 3’ flanking region of the 20-bp insertion because it mapped perfectly to the reference sequence.

3) Construct sequences after fine-tuning. Consequently, the 26-bp insertion matched perfectly at chr11:43,936,296-43,936,321, and the position of the 3’ breakpoint of the CNV was refined to chr11:44,438,043.

4) Patient sequences.
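
The coordinate arithmetic behind the fine-tuning can be verified directly. The short Python sketch below uses only the positions and sequences quoted in this legend; the variable names are illustrative. The 6-bp shift of the 3' breakpoint equals the length of GTATGA, which is consistent with these bases being reassigned from the flanking sequence to the insertion.

```python
# Tentative breakpoints from Step 2 (chr11 coordinates).
bp5 = 43_936_139
bp3_tentative = 44_438_037

inserted = "GAGAAAAGCATTTGCAAAAA"  # inserted sequence located by BLAT
reassigned = "GTATGA"              # bases placed 3' of the insertion in fine-tuning

# The BLAT match lies 157 bp downstream of the 5' breakpoint.
template_start = bp5 + 157
# Inclusive end coordinate of the extended (insertion + GTATGA) match.
template_end = template_start + len(inserted) + len(reassigned) - 1

# Moving the reassigned bases out of the flank shifts the 3' breakpoint.
bp3_refined = bp3_tentative + len(reassigned)

print(len(inserted), len(inserted) + len(reassigned))  # 20 26
print(template_start, template_end)                    # 43936296 43936321
print(bp3_refined)                                     # 44438043
```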

Supplemental Fig. S3 Pipeline methods employed to accurately characterize the CNVs using whole genome sequencing data from Family 3 (red letters denote deleted sequences, orange letters denote micro-mutations, blue letters denote insertions and black letters denote unchanged sequences or reference sequences).

Step 1: Prediction of the CNVs using DELLY, BreakDancer and CNVnator software on WGS data, which indicated multiple distinct breakpoints.

Step 2: Determination of the breakpoints by alignment of the truncated reads around the breakpoints with reference sequences. The breakpoints were determined using the truncated reads from read-pairs (one read mapped, the other truncated) because these truncated reads had concordant ends near the predicted breakpoint (12 at the 5’ end and 10 at the 3’ end). Thus, the breakpoints of this CNV exemplar were determined to be chr11:44,128,440 and chr11:44,198,500.

Step 3: Tracking sequences in breakpoint regions. We extracted the 1000-bp flanking sequence at each breakpoint from the human reference sequence and concatenated the two flanking sequences into a 2000-bp junction sequence, which served as the reference against which the patients’ WGS reads were aligned. We extracted all reads with abnormal insert sizes, unexpected strand orientations or truncations, reads whose ends mapped to two different chromosomes, and read-pairs in which one read mapped and the other was truncated, and aligned them to the junction sequences using BWA. Consequently, we found a 5-bp insertion (TCTTG) within the breakpoint junction and a CC insertion in the flanking region of the breakpoint.
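
The construction of the junction reference described in Step 3 can be sketched as follows. This is an illustrative Python example rather than the authors' code. It assumes 1-based coordinates, that the 5' breakpoint is the last retained base before the deletion, and that the 3' breakpoint is the first retained base after it; the function name `build_junction` and the toy sequence are hypothetical.

```python
def build_junction(reference, bp5, bp3, flank=1000):
    """Concatenate the flank upstream of bp5 with the flank downstream of bp3.

    Assumptions (not stated in the text): coordinates are 1-based, bp5 is the
    last base kept on the 5' side and bp3 the first base kept on the 3' side.
    """
    if bp5 < flank or bp3 - 1 + flank > len(reference):
        raise ValueError("flank extends beyond the reference sequence")
    left = reference[bp5 - flank:bp5]           # `flank` bases ending at bp5
    right = reference[bp3 - 1:bp3 - 1 + flank]  # `flank` bases starting at bp3
    return left + right


# Toy 20-bp "chromosome" with a 5-bp flank instead of 1000 bp.
toy_reference = "AAAAACCCCCGGGGGTTTTT"
junction = build_junction(toy_reference, bp5=5, bp3=16, flank=5)
print(junction)  # AAAAATTTTT
```

With the default flank of 1000 bp, the returned junction is 2000 bp long, matching the description in the legend. Reads spanning a deletion should align contiguously to such a junction, so any extra bases at its midpoint indicate an insertion.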

Supplemental Fig. S4 Pipeline methods employed to accurately characterize the CNVs using whole genome sequencing data from the Dent disease patient (red letters denote deleted sequences, orange letters denote micro-mutations, blue letters denote insertions and black letters denote unchanged sequences or reference sequences).

Step 1: Prediction of the CNVs using DELLY, BreakDancer and CNVnator software on WGS data, which indicated multiple distinct breakpoints.

Step 2: Determination of the breakpoints by alignment of the truncated reads around the breakpoints with reference sequences. The breakpoints were determined using the truncated reads from read-pairs (one read mapped, the other truncated) because these truncated reads had concordant ends near the predicted breakpoint (5 at the 5’ end and 5 at the 3’ end). Thus, the breakpoints of this CNV exemplar were determined to be chrX:49,780,222 and chrX:49,840,741.

Step 3: Tracking sequences in breakpoint regions. We extracted the 1000-bp flanking sequence at each breakpoint from the human reference sequence and concatenated the two flanking sequences into a 2000-bp junction sequence, which served as the reference against which the patients’ WGS reads were aligned. We extracted all reads with abnormal insert sizes, unexpected strand orientations or truncations, reads whose ends mapped to two different chromosomes, and read-pairs in which one read mapped and the other was truncated, and aligned them to the junction sequences using BWA. Consequently, we found a 22-bp insertion (TACATATAGTGACAGGGAATGG) at the breakpoint junction.
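
The read categories extracted in Step 3 can be recognised from standard SAM fields. The Python sketch below is one possible way to do this and is not the authors' implementation. The function name `sv_evidence`, the category labels and the 1000-bp insert-size cut-off are hypothetical, and only same-strand pairs are flagged as having an unexpected orientation.

```python
import re

# SAM FLAG bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20


def sv_evidence(sam_line, max_insert=1000):
    """Return the set of structural-variant signals shown by one SAM record.

    max_insert is a hypothetical cut-off; in practice it would be derived
    from the insert-size distribution of the library.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, cigar, rnext = fields[2], fields[5], fields[6]
    tlen = int(fields[8])
    evidence = set()
    if flag & UNMAPPED:
        return evidence
    # Soft- or hard-clipped alignments correspond to "truncated" reads.
    if re.search(r"\d+[SH]", cigar):
        evidence.add("truncated")
    if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
        if rnext in ("=", rname):
            if abs(tlen) > max_insert:
                evidence.add("abnormal_insert")
            # Both mates on the same strand is an unexpected orientation.
            if bool(flag & REVERSE) == bool(flag & MATE_REVERSE):
                evidence.add("unexpected_orientation")
        elif rnext != "*":
            evidence.add("interchromosomal")
    return evidence


# Invented example records (not real data).
normal = "r1\t99\tchr11\t100\t60\t100M\t=\t300\t300\t*\t*"
clipped = "r2\t65\tchr11\t100\t60\t60M40S\tchr4\t500\t0\t*\t*"
far_apart = "r3\t97\tchrX\t100\t60\t100M\t=\t70000\t70100\t*\t*"
for record in (normal, clipped, far_apart):
    print(sorted(sv_evidence(record)))
```

Records that return a non-empty set would be the ones passed on for re-alignment to the junction sequence.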

Supplemental Fig. S5 FISH analysis of the cultured blood cells from the proband of HMO Family 1 (III-1). The EXT2 gene signal is shown in red whilst the control signal from the centromeric sequences of chromosome 11 is shown in green. Note the absence of the EXT2 gene signal in one of the chromosome 11 homologues in both metaphase (a) and interphase (b) cells.

Supplemental Fig. S6 Identification of CNVs by MLPA (Multiplex Ligation-dependent Probe Amplification) and chromosome microarray analyses. a MLPA electropherogram of the HMO Family 1 proband showing the amplification ratio of all EXT2 probes relative to the reference probes (as well as the EXT1 probes). An identical MLPA electropherogram was observed in the Family 2 proband. b MLPA electropherogram of the HMO Family 3 proband showing a heterozygous deletion of exons 2-8 of EXT2 (defined by probes EXT2-04 to EXT2-10). The horizontal red line indicates the threshold ratio indicative of a heterozygous deletion. c Chromosome microarray analysis of the boy with Dent disease revealing a deletion involving part of the CLCN5 gene. By the weighted log2 ratio method (upper panel), the copy number of the X chromosome for a normal male corresponds to a baseline of -0.5 on the scatterplot. By the copy number state method (bottom panel), the copy number of the X chromosome for a normal male is 1.0. The probes revealing a zero copy number indicate a ~50 kb deletion at Xp11.23-p11.22 with a minimum range of 49,790,892-49,840,451 (hg19).
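
As a consistency check, the minimum deleted range from the microarray can be compared with the WGS breakpoints reported for the Dent disease patient in Supplemental Fig. S4. The Python sketch below assumes that both sets of coordinates refer to the same assembly (hg19 is stated only for the microarray); the variable names are illustrative.

```python
# WGS breakpoints on chrX (Supplemental Fig. S4, Step 2).
wgs_bp5, wgs_bp3 = 49_780_222, 49_840_741

# Minimum deleted range from the microarray probes (panel c).
array_start, array_end = 49_790_892, 49_840_451

# Distance between the two WGS breakpoints.
wgs_distance = wgs_bp3 - wgs_bp5
# Inclusive length of the probe-defined minimum range.
array_span = array_end - array_start + 1
# The probe-defined range should lie inside the breakpoint interval.
nested = wgs_bp5 <= array_start and array_end <= wgs_bp3

print(wgs_distance)  # 60519
print(array_span)    # 49560
print(nested)        # True
```

The probe-defined minimum range (about 50 kb) lies within the roughly 60.5-kb interval between the WGS breakpoints, as would be expected when the deletion edges fall between array probes.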

Supplemental Table S1 Clinical characteristics of the patients

HMO Patient / Sex / Age^a (years) / No. of exostoses / Location of exostoses / Other clinical phenotypes / Surgical therapy / Pain
Family 1-I-2 / Female / 70 / 2 / Femur / Hip osteoarthritis, necrosis of femoral head, and scoliosis / No / Yes
Family 1-II-3 / Male / 48 / 6 / Humerus, tibia, ulna and radius / No / No / No
Family 1-II-5 / Male / 45 / 5 / Femur, fibula and radius / Dislocation of radioulnar joint / No / No
Family 1-II-8 / Female / 40 / 4 / Femur and humerus / No / No / No
Family 1-III-1 / Male / 26 / 8 / Femur, tibia, fibula, humerus, ulna and radius / Forearm deformity and wrist joint dysfunction / No / No
Family 1-III-5 / Male / 13 / 7 / Femur, tibia, fibula, ulna and radius / No / No / No
Family 1-III-9 / Female / 7 / 5 / Femur and rib / No / No / No
Family 1-III-11 / Male / 14 / 13 / Femur, tibia, fibula, humerus, ulna and radius / No / Yes / No
Family 1-III-12 / Male / 12 / 15 / Femur, tibia, fibula, humerus, ulna and radius / No / Yes / No
Family 2-I-1 / Male / 45 / 6 / Femur and tibia / No / Yes / No
Family 2-II-1 / Male / 13 / 17 / Femur, tibia, fibula and phalanx / No / No / No
Family 3-II-1 / Male / 10 / 6 / Femur, ulna, radius and scapula / No / Yes / No
Dent Disease Patient / Sex / Age^a (years) / Renal damage / Histopathological changes / Other phenotypes
II-1 / Male / 12 / Positive urinary protein, low-molecular-weight proteinuria, hypercalciuria, microscopic hematuria and intermittent hematuria / Mild mesangial proliferative glomerulonephritis, focal glomerulosclerosis and crescent formation in glomeruli / Mild growth retardation

a Age at diagnosis.

Supplemental Table S2 PCR primers for the generation of EXT2 gene probes used for FISH analysis

Probe / Forward primer (5’-3’) / Reverse primer (5’-3’)
Fish-1 / CGTGGTGTCTCGTTTGGGTTTAAG / GATCTGGTTCCCACCGAATGTAAC
Fish-2 / GGCAATGCTCAAGGTATAGA / AGAAATCCAAGGTAGTAACGGT
Fish-3 / TTAGGCACTGCGAATACTTAGATA / GCCCACCACACTAAACCTC
Fish-4 / CTTTTCTTGAGACCACTTGAACCA / CTAGGGCTTGAACATTCCACG
Fish-5 / TTTCCCTTGTAGTCCACGGCAATAC / ACTCCCTCAAACCCCCTCAATGT
Fish-6 / GGGGAAAGCCTATTGTATCAGT / CTTTTTCCTAATCAGCCCACTAC
Fish-7 / CTCCTGGGGCAGCATTTAAGTA / GCCCATTGGATTTTGCTTATCAC
Fish-8 / AGTGATAGATGGTATTGGACCTAC / GGCCTAACTCTTCTGATAACTCT
Fish-9 / TCTCTTTGTCCCATGTTCTATT / GCCCCATTGTAATTCTACG
Fish-10 / GCAATAGACAAATACTGAAACCTAC / GATTCAAGAGATCCGAGCTAC
Fish-11 / CCTCTGGGCTGAAATGTTACTACTG / AATACTCTCATCTGGCTGATCCCTT
Fish-12 / AGAGGCTGGGTTCAGACTAAATC / CAGCATTAATGGGGAAATAGGA
Fish-13 / ATTTGTTGAACTCTGGTCCATT / TTTAGGAATTTCTGGGCTACAG
Fish-14 / GGCAACATGGACCACATTACTGAT / CTGGCTGACCAAGGAGAGTGTCTA
Fish-15 / CGCCATAGTCCTCACCTACGACC / TGAACAAACACCCCACAGAAGATTAAAC
Fish-16 / GAATCTCCCCTGACACAGTTCTACCT / GCAATGAAGAGAGAAATCACTCGC
Fish-17 / CCTTAAAGGCACACCATAGCAAGT / GGCCCCCTCATCACTAATTAAATC
Fish-18 / CCCTTTGAGTTCATCTTGGAC / TAAACCAGCCAACAGACAGTAGTA
Fish-19 / TTGCTAGGGAGATCGCTAGTTAAGGT / TCTCTTCCAAAGGAGCTACGACAGT
Fish-20 / AAGCAGCATCTCCTGTTCACGTT / GACCCTCTGTTTTTCTCTGACAATACC

Supplemental Table S3 Primers for long-range PCR and for subsequent Sanger sequencing, used to confirm the whole-genome sequencing findings with respect to the three pathogenic CNVs*

Family / Forward primer / Sequence (5’-3’) / Reverse primer / Sequence (5’-3’)
Families 1/2 / L1 / CGAGGCTTGCTCTCCAACTTCTTAAC / D1 / CCTGGGCTCTTCAACTAGGACAGTAAAC
Families 1/2 / F1 / AAGAAGTCTGGCAGGATG / R1 / GCTGGGATGAGTAGGTC
Family 3 / L2 / TCTTAAAATGTGGTCTACATGGGAACT / D2 / AGTCCAGGGAAGTATCTAATCCTCATC
Family 3 / F2 / TCACCGCAACCTCCAC / R2 / TCCCCTAATAAAGAAC
Dent disease patient / L3 / GGTGGGCTTGTCTGTGTATTAGAAT / D3 / GTTTCTGTTATTTTGACATGGAATGC
Dent disease patient / F3 / TGCCCTTTATCTTCCA / R3 / CTGCCTCTGACACTTCT
* L and D indicate primers for long-range PCR whilst F and R indicate primers for Sanger sequencing.

Supplemental Table S4 LOD scores for chromosome 11 markers in HMO Family 1

Microsatellite marker / LOD score at θ=0 / θ=0.01 / θ=0.05 / θ=0.1 / θ=0.2 / θ=0.3 / θ=0.4 / Zmax
D11S4102 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0
D11S905 / 4.21 / 4.15 / 3.88 / 3.53 / 2.76 / 1.91 / 0.96 / 4.21
D11S4191 / 3.01 / 2.96 / 2.77 / 2.51 / 1.95 / 1.32 / 0.65 / 3.01
D11S987 / -∞ / 0.97 / 1.49 / 1.55 / 1.34 / 0.95 / 0.47 / 1.55
D11S4162 / -∞ / 2.15 / 2.58 / 2.53 / 2.06 / 1.38 / 0.61 / 2.58
D11S1314 / -∞ / 2.45 / 2.88 / 2.83 / 2.36 / 1.68 / 0.87 / 2.88
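
The Zmax column is the maximum LOD score for each marker across the tested recombination fractions. The Python sketch below recomputes it from the table values and also reports the θ at which the maximum occurs. The names used are illustrative, and -∞ is represented by `float("-inf")`.

```python
THETAS = (0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4)
NEG_INF = float("-inf")

# LOD scores copied from the table, in the same order as THETAS.
LOD = {
    "D11S4102": (0, 0, 0, 0, 0, 0, 0),
    "D11S905": (4.21, 4.15, 3.88, 3.53, 2.76, 1.91, 0.96),
    "D11S4191": (3.01, 2.96, 2.77, 2.51, 1.95, 1.32, 0.65),
    "D11S987": (NEG_INF, 0.97, 1.49, 1.55, 1.34, 0.95, 0.47),
    "D11S4162": (NEG_INF, 2.15, 2.58, 2.53, 2.06, 1.38, 0.61),
    "D11S1314": (NEG_INF, 2.45, 2.88, 2.83, 2.36, 1.68, 0.87),
}


def zmax(scores):
    """Return (maximum LOD score, theta at which it occurs)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return scores[best], THETAS[best]


summary = {marker: zmax(scores) for marker, scores in LOD.items()}
for marker, (score, theta) in summary.items():
    print(f"{marker}: Zmax = {score} at theta = {theta}")
```

By the conventional threshold of LOD ≥ 3 for significant linkage, D11S905 and D11S4191 exceed the threshold, both with the maximum at θ = 0.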

Supplemental Table S5 The three templated inserts derived from distant regions of the human genome

Sample / CNV / Breakpoint insertion (5′ to 3′) / Origin of inserted sequences / Genic region
Dent disease patient / ChrX deletion / TACATATAGTGACAGGGAATGG / ChrX:49701701-49701722(+) / CLCN5 intron
114816 / Chr19 deletion / ATTTGGCAGAGGGGGATTTGGCAGGGTCATAGGACAACAGCGGAGGGAAGGTCAG / Chr17:15999543-15999592(-) / NCOR1 intron
120099/120098 / Chr6 deletion / GTCACCCAGTCTGGAGTGCTGT / Chr1:10452912-10452933(-), Chr1:187864763-187864784(+), Chr1:201144808-201144829(-), Chr4:53575026-53575047(+), Chr9:26186546-26186567(-) / Intergenic region (each of the five loci)