Mobile insertion elements such as transposons and T-DNA generate useful genetic variation and are important tools for functional genomics studies in plants and animals. The spectrum of mutations obtained in different systems can be highly influenced by target site preferences inherent in the mechanism of DNA integration. We investigated the target site preferences of Agrobacterium T-DNA insertions in the chromosomes of the model plant Arabidopsis thaliana. The relative frequencies of insertions in genic and intergenic regions of the genome were calculated and DNA composition features associated with the insertion site flanking sequences were identified. Insertion frequencies across the genome indicate that T-strand integration is suppressed near centromeres and rDNA loci, progressively increases towards telomeres, and is highly correlated with gene density. At the gene level, T-DNA integration events show a statistically significant preference for insertion in the 5′ and 3′ flanking regions of protein coding sequences as well as the promoter region of RNA polymerase I transcribed rRNA gene repeats. The increased insertion frequencies in 5′ upstream regions compared to coding sequences are positively correlated with gene expression activity and DNA sequence composition. Analysis of the relationship between DNA sequence composition and gene activity further demonstrates that DNA sequences with high CG-skew ratios are consistently correlated with T-DNA insertion site preference and high gene expression. The results demonstrate genomic and gene-specific preferences for T-strand integration and suggest that DNA sequences with a pronounced transition in CG- and AT-skew ratios are preferred targets for T-DNA integration.
ASJC Scopus subject areas