levels in cancer cell chromosomal DNA, we used fluorescence microscopy to stain common laboratory cell lines (293T, derived from human embryonic kidney; HAP1, derived from a male with chronic myelogenous leukemia; and HeLa, obtained from a cervical cancer) with a G4 DNA structure-specific antibody, along with staining of chromosomal DNA with DAPI and of cytoplasmic actin filaments with phalloidin. Merging of the fluorescence emission signals showed extensive G4 DNA staining throughout the nuclei, which overlapped distinctly with the nuclear space as defined by DAPI (Fig. 1). By contrast, minimal overlap with the cytoskeleton was detected. These results were confirmed in several other immortal cell lines (not shown). Hence, G4 DNA structures appear abundant in cancer cell nuclei and are readily detected by structure-specific antibodies.
The role of G4 DNA structures in eliciting chromosomal rearrangements has been reported (Bacolla et al., 2016; Georgakopoulos-Soares et al., 2018); however, it has remained unclear whether translocations are stimulated by specific G4 elements throughout the human genome. The COSMIC database contains the largest collection of genomic rearrangements in cancer genomes resolved at bp accuracy; from this collection we obtained a set of 124,918 translocation breakpoints mapped to hg38, mostly from complex rearrangements involving two or more chromosomes. We excluded identical breakpoint positions, which often mapped to the same patient sample, to yield a set of 97,691 unique genomic coordinates that may represent, or lie close to, the original genomic sites of the strand breaks that occurred during cancer development. Separately, we determined the location and number of G4 DNA-forming sequences in hg38, which amounted to a total of 358,605 (chromosome Y excluded) covering ~13.4 million bases.
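The deduplication step described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each breakpoint record is a (sample, chromosome, position) tuple; the actual COSMIC export format and the authors' own filtering code are not described in the text, and the records below are invented for illustration.

```python
def unique_breakpoints(records):
    """Collapse breakpoint records to unique (chromosome, position) pairs.

    `records` is an iterable of (sample_id, chrom, pos) tuples; this layout
    is an assumption made for illustration, not the COSMIC file format.
    First-seen order is preserved.
    """
    seen = set()
    unique = []
    for _sample_id, chrom, pos in records:
        key = (chrom, pos)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy records (hypothetical): the same position reported twice for one sample.
toy_records = [
    ("S1", "chr1", 1000),
    ("S1", "chr1", 1000),
    ("S2", "chr2", 500),
    ("S3", "chr1", 1000),
]
toy_unique = unique_breakpoints(toy_records)
```

Here identity is defined by chromosome and position only, so a position shared by different samples is also counted once; whether the original analysis treated such cases the same way is an assumption.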
To assess whether translocations occurred at G4 DNA-forming sequences more often than expected by chance, we counted the breakpoints located within G4 DNA-forming repeats extended by 5 bp on either side. This mapping constraint is stringent because it assumes that a G4 DNA structure directly caused a chromosomal break. There were 738/97,691 such breakpoint positions, or 7.55 × 10⁻³ (Fig. 2A). We compared this result with two types of controls. In the first, we selected genomic coordinates located 0.2, 0.4, 0.6, 0.8, 1.0 and 10 kb on either side of the 738 true sites, excluded the positions near unsequenced gaps (with N) in hg38, and assessed how many of the non-gap-containing positions were close to G4 DNA-forming sequences. This yielded an average of 5.68 ± 0.21 × 10⁻³ (n = 10), significantly lower than at the real translocation breakpoints (Fig. 2A). In the second, we created five sets of 124,918 non-gap-containing random positions; on average, 4.62 ± 0.25 × 10⁻³ (n = 5) of these were within G4 DNA-forming repeats (Fig. 2A), fewer than both at translocation breakpoints and in their neighboring genomic environment. In conclusion, our analysis shows that translocation breakpoints in cancer occur at G4 DNA-forming repeats more often than expected by chance alone.
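The overlap test and the shifted-position control can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes 1-based inclusive coordinates, represents G4 tracts as hypothetical (chromosome, start, end) tuples, and omits the exclusion of positions near unsequenced gaps. All function names and toy values are invented.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(g4_tracts, pad=5):
    """Pad each G4 tract by `pad` bp per side, merge overlaps per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in g4_tracts:
        by_chrom[chrom].append((start - pad, end + pad))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def at_g4(index, chrom, pos):
    """True if `pos` falls inside a padded G4 interval (inclusive bounds)."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos <= ends[i]


def fraction_at_g4(index, positions):
    """Fraction of (chrom, pos) pairs that fall inside padded G4 intervals."""
    positions = list(positions)
    if not positions:
        return 0.0
    return sum(at_g4(index, c, p) for c, p in positions) / len(positions)


def shift(positions, offset_bp):
    """Control positions displaced by `offset_bp` (negative = upstream)."""
    return [(chrom, pos + offset_bp) for chrom, pos in positions]


# Toy data (hypothetical coordinates).
toy_g4 = [("chr1", 100, 120), ("chr1", 118, 140), ("chr2", 50, 60)]
toy_breakpoints = [("chr1", 96), ("chr1", 146), ("chr2", 65), ("chr3", 10)]

toy_index = build_index(toy_g4, pad=5)
observed = fraction_at_g4(toy_index, toy_breakpoints)

# Flanking controls at the distances named in the text, on either side.
flank_fractions = {
    offset: fraction_at_g4(toy_index, shift(toy_breakpoints, offset))
    for kb in (0.2, 0.4, 0.6, 0.8, 1.0, 10)
    for offset in (int(kb * 1000), -int(kb * 1000))
}
```

Merging the padded intervals before lookup means each position needs a single binary search, which keeps the comparison tractable for tens of thousands of breakpoints against hundreds of thousands of tracts.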
3.1.2. Patients with translocations at G4 DNA carry frequent pathologic mutations in p53
Next, we asked whether patients with translocation breakpoints at G4 DNA-forming sequences could be distinguished from those without such a characteristic based on well-defined genetic alterations. First, we assessed the genome-wide load of translocations in each patient; we found that the group of patients with G4-associated breakpoints carried more translocations than the group of patients without G4-associated breakpoints (57.9 ± 59.7 vs. 17.3 ± 20.7; Fig. 2B), although the frequency of breakpoints at G4 DNA-forming repeats did not directly correlate with the total number of translocations. Second, even though tumor samples both with and without G4-associated breakpoints carried pathologic mutations in cancer-related genes, such as TP53, KRAS and PIK3CA (Fig. 2C and D), samples with G4-containing breakpoints displayed a greater frequency of mutations at TP53, PTPRD and GATA3 than the alternate group. By contrast, the likelihood of harboring pathologic mutations at KRAS and CTNNB1 was significantly reduced (Fig. 3E), in accordance with the expectation that mutations in the TP53, RTK/RAS and Wnt pathways are mutually exclusive (Sanchez-Vega et al., 2018). We conclude that strand breaks at or near G4 DNA-forming sequences generally occur in tumors with high genetic instability, which is promoted in part by mutations in tumor suppressor genes, such as TP53.
3.1.3. SVA transposable elements elicit translocations
Two classes of human non-LTR retrotransposons, LINE1 and Composite, contain family members harboring G4 DNA-forming sequences, including L1PAs (Sahakyan et al., 2017) and SVA elements (Lexa et al., 2014). Because these elements are abundant in the human genome, they provide the largest source of G4 DNA structures, along with telomeric sequences (Bhattacharjee et al., 2017; Kejnovsky et al., 2015; Lancrey et al., 2018). We found that the most abundant core G4 DNA-forming sequence mapping to SVA elements was “gggagggaggtggggggg”, present in a variable number of copies in the VNTR regions of these elements, for a total of 3,486 copies in hg38. Seventeen out of 738 translocation breakpoints coinciding with G4 DNA-forming sequences were located at this sequence in SVA_D, SVA_E and SVA_F elements, as classified by RepeatMasker (2.3%, Fig. 2F). In hg38 the combined length of the SVA sequence occupied 0.99% of the total bases covered by all G4 DNA-forming sequences (132,779 out of 13,442,760), a percentage matching that of SVA G4-associated tracts relative to the number of G4 tracts genome-wide (0.97%; 3,486 out of 358,605) (Fig. 2F). Because the core “gggagggaggtggggggg” sequence could occasionally be found outside of SVA regions, we repeated our analysis on breakpoints mapping within “cgtccgggagggaggtgggggggtcagc” (1,964 tracts, 56% of all core sequences), which, by including 5 additional bases on either side of the core motif, rendered the target sequence SVA-specific. There were 7/738 instances (0.95%) of translocation breakpoints occurring within the extended target sequence, whereas the genome-wide number of such targets was 0.4–0.5% of the total number of G4 DNA-forming sequences (Fig. 2F) (54,852/13,442,760 × 100 in terms of bases and 1,964/358,605 × 100 in terms of tracts). Thus, translocation breakpoints occur at SVA elements about twice as frequently as expected by chance.
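The percentages and the roughly two-fold enrichment quoted above follow directly from the counts given in the text. The short Python sketch below reproduces that arithmetic; the variable names are ours, and the fold values are simple ratios of observed to expected percentages, not a statistical test.

```python
def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts taken from the text.
G4_BREAKPOINTS = 738          # breakpoints within G4 DNA-forming sequences
G4_TRACTS = 358_605           # G4 DNA-forming tracts in hg38
G4_BASES = 13_442_760         # bases covered by all G4 tracts

# Core SVA motif "gggagggaggtggggggg".
core_observed = percent(17, G4_BREAKPOINTS)
core_expected_bases = percent(132_779, G4_BASES)
core_expected_tracts = percent(3_486, G4_TRACTS)

# Extended, SVA-specific motif (core plus 5 bases on either side).
ext_observed = percent(7, G4_BREAKPOINTS)
ext_expected_bases = percent(54_852, G4_BASES)
ext_expected_tracts = percent(1_964, G4_TRACTS)

# Observed-to-expected ratios under each normalization.
fold_core_bases = core_observed / core_expected_bases
fold_core_tracts = core_observed / core_expected_tracts
fold_ext_bases = ext_observed / ext_expected_bases
fold_ext_tracts = ext_observed / ext_expected_tracts

for label, value in [
    ("core motif, by bases", fold_core_bases),
    ("core motif, by tracts", fold_core_tracts),
    ("extended motif, by bases", fold_ext_bases),
    ("extended motif, by tracts", fold_ext_tracts),
]:
    print(f"{label}: {value:.2f}-fold")
```

Both normalizations (by bases and by tracts) give ratios in the vicinity of two, consistent with the conclusion stated in the text.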