Project 1: Fun with DNA (REGEX Lookaround)!
In this project, we find open reading frames (ORFs) in DNA sequences with the help of Python regex.
DNA is a sequence of bases: A, C, G, or T. The bases are translated into proteins three at a time, and each three-base sequence is called a codon. There is a special start codon, ATG, and three stop codons: TGA, TAG, and TAA. Example:
cgcgcATGcATGcgTGAcTAAcgTAGcgcgcgcgc
An open reading frame (ORF) consists of a start codon, followed by some more codons, and ending with a stop codon. The above example contains two overlapping ORFs: ATGcATGcgTGA and ATGcgTGAcTAA.
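To make the codon structure of the two ORFs concrete, here is a minimal sketch that splits a candidate sequence into codons and checks it against the definition above. The helper names codons and is_orf are hypothetical, and the rule that no stop codon may appear before the final codon is an assumption that the text does not state explicitly.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}

def codons(seq):
    """Split a sequence into consecutive 3-base chunks (upper-cased)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def is_orf(seq):
    """Check: starts with ATG, ends with a stop codon, whole codons only.

    Assumption: an ORF ends at its first stop codon, so no stop codon
    may appear in the middle.
    """
    if len(seq) % 3 != 0:
        return False
    parts = codons(seq)
    if len(parts) < 2 or parts[0] != "ATG" or parts[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in parts[1:-1])

print(codons("ATGcATGcgTGA"))  # ['ATG', 'CAT', 'GCG', 'TGA']
print(codons("ATGcgTGAcTAA"))  # ['ATG', 'CGT', 'GAC', 'TAA']
print(is_orf("ATGcATGcgTGA"), is_orf("ATGcgTGAcTAA"))  # True True
```

Both candidates start with ATG and end with a stop codon, and the second one begins four bases into the first, which is why they overlap.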
The following pattern finds only the first ORF ('atgcatgcgtga'). Because the match consumes the first ORF, it also consumes the beginning of the second ORF, so the second ORF is never found.
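The behavior described above can be reproduced with a short sketch. The pattern below is an assumption, not necessarily the one used in the lesson: it matches a start codon, lazily consumes whole codons, and stops at the first stop codon. The DNA string is lower-cased here to match the output quoted in the text. The second search wraps the same pattern in a lookahead with a capture group, which is one possible way to report overlapping ORFs because a lookahead matches without consuming characters.

```python
import re

# Example sequence from the text, lower-cased to match the quoted output.
dna = "cgcgcATGcATGcgTGAcTAAcgTAGcgcgcgcgc".lower()

# Hypothetical ORF pattern: start codon, zero or more whole codons
# (lazy), then the first stop codon reached in that reading frame.
ORF = r"atg(?:[acgt]{3})*?(?:tga|tag|taa)"

# A plain search consumes each match, so the second ORF, which starts
# inside the first one, is skipped.
consuming = re.findall(ORF, dna)

# Wrapping the pattern in a lookahead with a capture group tests every
# position without consuming text, so overlapping ORFs are reported too.
overlapping = re.findall("(?=(" + ORF + "))", dna)

print(consuming)    # ['atgcatgcgtga']
print(overlapping)  # ['atgcatgcgtga', 'atgcgtgactaa']
```

With a lookahead, re.findall returns the contents of the capture group at each position where the assertion succeeds, because the overall match itself is empty.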