Searching for gRNAs in minicircle sequences if you know the edited mRNA sequences
I use programs from the GCG package running on a Unix computer, but
any other local alignment program would work fine with the right comparison
matrix. All you have to do is take a minicircle sequence file, reverse the
sequence (but do not complement it), and do a local alignment against a file
with all the edited mRNAs from T. brucei. The alignment should look for
complementary base pairs (allowing G-U pairs as well as the normal A-U and G-C
pairs). Any continuous duplex 40 nt or longer is a putative gRNA. You can
strengthen the evidence by searching for the conserved CSB-3 sequence
(ggggttggtgta) on the minicircle; the gRNA genes are in the same polarity.
Then you can search for the conserved RUA-1 sequence (taatagata),
which is always just downstream of each gRNA gene. Most brucei minicircles
contain three gRNA genes, but some contain two. These are of course putative
gRNA genes until the gRNAs themselves are actually identified.
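The duplex search described above can be sketched in a few lines of Python for readers without GCG. This is only an illustration, not the Bestfit algorithm: it looks for ungapped runs of consecutive base pairs (Watson-Crick plus G-U) between the reversed minicircle and the edited mRNA, and reports runs at or above a length cutoff. The function names and the 0-based coordinate convention are assumptions made for this sketch, and only the strand as given is searched.

```python
# Watson-Crick pairs plus the G-U wobble pair, as (mRNA base, gRNA base).
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g"), ("g", "u"), ("u", "g")}


def to_rna(seq):
    """Lower-case the sequence and write t as u so DNA and RNA compare equally."""
    return seq.lower().replace("t", "u")


def find_duplexes(minicircle, mrna, min_len=40):
    """Return (mrna_start, reversed_minicircle_start, length) for every
    maximal ungapped run of consecutive base pairs of at least min_len.

    Both start positions are 0-based; the second refers to the REVERSED
    (not complemented) minicircle sequence.
    """
    rev = to_rna(minicircle)[::-1]  # reverse only, no complement
    m = to_rna(mrna)
    hits = []
    # Each diagonal (offset = mRNA index - reversed minicircle index)
    # corresponds to one ungapped register of the two sequences.
    for offset in range(-(len(rev) - 1), len(m)):
        i = max(0, offset)
        j = i - offset
        run = 0
        while i < len(m) and j < len(rev):
            if (m[i], rev[j]) in PAIRS:
                run += 1
            else:
                if run >= min_len:
                    hits.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:
            hits.append((i - run, j - run, run))
    return hits
```

Because no gaps or mismatches are allowed, this is stricter than a Bestfit alignment; it simply applies the "continuous duplex 40 nt or longer" rule directly.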
I connect to the Unix computer over SSH and also open an SFTP
client to copy files back and forth.
1. Set several aliases on the Unix computer (depends on your Unix shell).
In the Korn shell, the .profile file contains:
alias fs='fromstaden'
alias rev='reverse -NOCOMPLEMENT g'
alias fbs='bestfit -MATRIX=comp -PAIR=0.25'
2. Create a file with all known edited mRNA sequences in tandem. The T.
brucei file "tbseqs.txt" can be obtained at the following site:
http://dna.kdna.ucla.edu/trypanosome/tbedseqs.txt (the edited mRNAs
are in tandem, with the locations of each gene shown at the top; the inserted
U's are in lower case). Change the file name to "fb" and copy it to the Unix
computer.
3. Create several text files from the minicircle sequence. If you
identify the CSB-3 conserved sequence (ggggttggtgta) on the
minicircle, this identifies the polarity and relative location of
putative gRNA gene(s). In the case of T. brucei minicircles, you can also
locate the conserved RUA-1 inverted repeat sequence, which usually is located
in front of each gRNA gene. Cut out three sequences from the minicircle, each
encompassing a putative gRNA gene region, and save each as a text file.
Leishmania and Crithidia minicircles have a single gRNA gene. T.
cruzi minicircles have four gRNA genes located between the four conserved
regions.
You can convert any format to raw text format by using the READSEQ program
at http://www-bimas.cit.nih.gov/molbio/readseq/ . Copy the files to the Unix
computer.
4. Run "fs" to change the minicircle sequence format to GCG format. Save file as "g".
5. Run "rev" to reverse but not complement the "g" sequence.
6. Run "fbs" to do a local “Bestfit” alignment of the reversed minicircle
sequence against the edited mRNA sequences, using a matrix called
"comp" (http://dna.kdna.ucla.edu/trypanosome/comp.txt)
to look for G-U base pairs in addition to the normal base pairs. The first
file to compare is "fb" and the second is "g". The output file is "fb.pair".
7. Run "cat fb.pair" to see the alignments and to identify any potential gRNA
sequences. The specific gene can be located from the file header of
"fb" as shown below:
ND8 - 1-574
ND9 - 575-1221
ND7 - 1222-2467
CO3 - 2468-3437
CYb - 3438-4589
A6 - 4590-5409
CO2 - 5410-6041
MURF2 - 6042-7132
CR4 - 7133-7700
ND3 - 7701-8165
RPS12 - 8166-8490
8. Change the name of fb.pair to “sequence name”.pair and copy it to the local
computer.
Use this file to identify the edited gene involved and the location of the gRNA. Be sure to reverse (but not complement) the putative gRNA sequence before searching the minicircle for it. One easy way to do this is to use the freeware program “ApE”.
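If you prefer to script this last step, the following Python sketch shows one way to do it. It is an illustration only: the coordinate table is copied from the "fb" header listed in step 7, while the function names are hypothetical and the aligned gRNA strand is assumed to be pasted in without gap characters.

```python
# Gene coordinates (1-based, inclusive) copied from the header of "fb".
GENES = [
    ("ND8", 1, 574), ("ND9", 575, 1221), ("ND7", 1222, 2467),
    ("CO3", 2468, 3437), ("CYb", 3438, 4589), ("A6", 4590, 5409),
    ("CO2", 5410, 6041), ("MURF2", 6042, 7132), ("CR4", 7133, 7700),
    ("ND3", 7701, 8165), ("RPS12", 8166, 8490),
]


def gene_at(position):
    """Map a 1-based position in the tandem mRNA file to
    (gene name, 1-based position within that gene), or None if out of range."""
    for name, start, end in GENES:
        if start <= position <= end:
            return name, position - start + 1
    return None


def _as_dna(seq):
    """Lower-case the sequence and write u as t so RNA and DNA compare equally."""
    return seq.lower().replace("u", "t")


def find_in_minicircle(aligned_grna, minicircle):
    """Reverse (do not complement) the gRNA strand taken from the alignment
    and return its 0-based start in the original minicircle, or -1 if absent."""
    return _as_dna(minicircle).find(_as_dna(aligned_grna)[::-1])
```

For example, gene_at(600) falls in the ND9 range of the table, and find_in_minicircle takes the strand exactly as it appears in the .pair file, since that strand came from the reversed "g" sequence.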