Searching for gRNAs in minicircle sequences if you know the edited
mRNA sequences

I use programs from the GCG package running on a Unix computer, but any other alignment program would work fine with the right comparison matrix. All you need is a minicircle sequence file: reverse the sequence (but do not complement it) and do a local alignment against a file with all the edited mRNAs from T. brucei. The alignment should look for complementary base pairs, allowing G-U pairs in addition to the standard A-U and G-C pairs. Any continuous duplex 40 nt or longer is a putative gRNA. You can provide more evidence for this by searching for the conserved CSB-3 sequence (ggggttggtgta) on the minicircle. The gRNA genes are in the same polarity. Then you can search for the conserved RUA-1 sequence (taatagata), which is always just downstream of each gRNA gene. Most T. brucei minicircles contain three gRNA genes, but some contain two. These are of course putative gRNA genes until the gRNAs themselves are actually identified.
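The duplex criterion above can be illustrated with a minimal Python sketch. This is not the GCG Bestfit algorithm; it is a simplified, ungapped scan that reports the longest continuous run of allowed base pairs between the reversed minicircle and the mRNA. The function names and the PAIRS table are my own illustrative choices, and sequences are assumed to be written with DNA letters (T standing for U, inserted U's possibly in lower case).

```python
# Allowed base pairs written with DNA letters (T stands for U):
# Watson-Crick pairs plus the G-U wobble pair in both orientations.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def normalize(seq):
    """Upper-case the sequence and write U as T."""
    return seq.upper().replace("U", "T")


def longest_duplex(minicircle, mrna):
    """Return (length, mrna_start, rev_start) for the longest continuous duplex.

    The minicircle is reversed but NOT complemented, then compared base by
    base with the mRNA. Both start positions are 0-based; rev_start refers
    to the reversed minicircle sequence.
    """
    rev = normalize(minicircle)[::-1]
    m = normalize(mrna)
    best = (0, 0, 0)
    prev = [0] * (len(rev) + 1)
    for i in range(1, len(m) + 1):
        cur = [0] * (len(rev) + 1)
        for j in range(1, len(rev) + 1):
            if (m[i - 1], rev[j - 1]) in PAIRS:
                # Extend the run of pairs that ended at the previous bases.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def is_putative_grna(minicircle, mrna, min_length=40):
    """Apply the 40-nt continuous duplex threshold from the text."""
    return longest_duplex(minicircle, mrna)[0] >= min_length
```

A real search would still use a local alignment program, since this sketch ignores mismatches and gaps entirely.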



I attach to a Unix computer via an SSH link and also open an SSH-ftp client to copy files back and forth.

1. Set several aliases on the Unix computer (the details depend on your Unix shell).
In the Korn shell, the .profile file contains:
alias fs='fromstaden'
alias rev='reverse -NOCOMPLEMENT g'
alias fbs='bestfit -MATRIX=comp -PAIR=0.25'


2. Create a file with all known edited mRNA sequences in tandem. The T.
brucei file "tbedseqs.txt" can be obtained at the following site:
http://dna.kdna.ucla.edu/trypanosome/tbedseqs.txt (the edited mRNAs are in tandem, with the locations of each gene shown at the top; the inserted U's are in lower case). Change the file
name to "fb" and copy it to the Unix computer.

3. Create several text files with the minicircle sequence. If you identify the CSB-3 conserved sequence (ggggttggtgta) on the
minicircle, this identifies the polarity and relative location of the putative gRNA gene(s). In the case of T. brucei minicircles, you can also locate the conserved RUA-1 inverted repeat sequence, which is usually located in front of each gRNA gene. Cut out three sequences from the minicircle, each encompassing a putative gRNA gene region, and save each one as a text file. Leishmania and Crithidia minicircles have a single gRNA gene. T. cruzi minicircles have four gRNA genes located between the four conserved regions. You can convert any format to raw text format by using the READSEQ program at http://www-bimas.cit.nih.gov/molbio/readseq/ . Copy the files to the Unix
computer.
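As an illustration of the CSB-3 search in step 3, here is a minimal Python sketch that looks for the motif on both strands and treats the minicircle as circular, so a motif spanning the ends of the linear file is still found. The function names and the returned (strand, position) format are assumptions made for this example; positions are 0-based and, for the "-" strand, refer to the reverse-complemented sequence.

```python
CSB3 = "GGGGTTGGTGTA"  # conserved CSB-3 sequence given in the text


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (used only for the strand check)."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(table)[::-1]


def find_csb3(minicircle):
    """Return a list of (strand, start) hits for CSB-3 on a circular sequence."""
    seq = minicircle.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Append the first bases again so motifs crossing the junction match.
        circular = s + s[: len(CSB3) - 1]
        start = circular.find(CSB3)
        while start != -1:
            hits.append((strand, start))
            start = circular.find(CSB3, start + 1)
    return hits
```

A hit on the "-" strand suggests the file holds the opposite strand, so it should be reverse-complemented before cutting out the putative gRNA gene regions.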


4. Run "fs" to change the minicircle sequence format to GCG format. Save file as "g".

5. Run "rev" to reverse but not complement the "g" sequence.

6. Run "fbs" to do a local “Bestfit” alignment of the reversed minicircle sequence against the edited mRNA sequences, using a matrix called
"comp" (http://dna.kdna.ucla.edu/trypanosome/comp.txt) to look for G-U base pairs in addition to the normal base pairs. The first file to compare is "fb" and the second is "g". The output file is "fb.pair".

7. Run "cat fb.pair" to see the alignments and to identify any potential gRNA sequences. The specific gene can be located from the file header of
"fb" as shown below:


ND8 - 1-574
ND9 - 575-1221
ND7 - 1222-2467
CO3 - 2468-3437
CYb - 3438-4589
A6 - 4590-5409
CO2 - 5410-6041
MURF2 - 6042-7132
CR4 - 7133-7700
ND3 - 7701-8165
RPS12 - 8166-8490
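The lookup from an alignment coordinate to a gene can be sketched in a few lines of Python. The coordinates are copied from the header above; the function name and the returned format (gene name plus the 1-based position within that gene) are illustrative assumptions.

```python
# (gene, first position, last position) in the tandem "fb" file, 1-based.
GENES = [
    ("ND8", 1, 574),
    ("ND9", 575, 1221),
    ("ND7", 1222, 2467),
    ("CO3", 2468, 3437),
    ("CYb", 3438, 4589),
    ("A6", 4590, 5409),
    ("CO2", 5410, 6041),
    ("MURF2", 6042, 7132),
    ("CR4", 7133, 7700),
    ("ND3", 7701, 8165),
    ("RPS12", 8166, 8490),
]


def gene_at(position):
    """Return (gene, position within gene) for a 1-based coordinate in "fb"."""
    for name, start, end in GENES:
        if start <= position <= end:
            return name, position - start + 1
    return None  # coordinate lies outside the listed genes
```

For example, a duplex reported at position 4600 of "fb" falls in A6, at position 11 of that edited mRNA.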


8. Change the name of fb.pair to “sequence name”.pair and copy it to the local computer.

Use this file to identify the edited gene involved and the location of the gRNA. Be sure to reverse (and not complement) the putative gRNA sequence before searching the minicircle. One easy way to do this is to use the freeware program “ApE”.
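If you prefer a script to ApE for this last step, the following Python sketch reverses the aligned minicircle segment (without complementing it) and locates it in the original minicircle sequence. The function name is hypothetical, and the removal of "." and "-" characters assumes that gaps in the alignment output are written with one of those symbols.

```python
def locate_in_minicircle(aligned_segment, minicircle):
    """Find a reversed minicircle segment in the original minicircle.

    aligned_segment is the minicircle strand as it appears in the alignment,
    i.e. reversed but not complemented. Returns 1-based (start, end)
    coordinates in the original minicircle, or None if it is not found.
    """
    # Drop assumed gap characters, then undo the reversal.
    cleaned = aligned_segment.replace(".", "").replace("-", "")
    original = cleaned[::-1].upper()
    start = minicircle.upper().find(original)
    if start == -1:
        return None
    return start + 1, start + len(original)
```

The coordinates returned refer to the minicircle text file that was originally converted to GCG format, before the "rev" step.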