For these problems, you will process DNA data from the file dna.txt. The file stores its data in pairs of lines: the first line of each pair is the name of a DNA sequence, and the second line is the sequence itself. The following paragraph gives some context for the task, but no understanding of DNA is required for this assignment.
DNA consists of long chains of chemical compounds called nucleotides. Four nucleotides are present in DNA: adenine (A), cytosine (C), guanine (G), and thymine (T). Certain regions of DNA are called genes. Most genes encode instructions for building proteins and are therefore called "protein-coding" genes; these proteins carry out most of the organism's life processes. The nucleotides in a gene are organized into codons, which are groups of three nucleotides written as the first letters of their nucleotides (e.g., TAC or GGA). Each codon specifies either a single amino acid (a building block of proteins) or, in the case of a stop codon, the end of the protein.
In these problems you will identify protein-coding genes as well as other attributes of the genetic data in the file. All matches in these problems are case-insensitive.
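As a small illustration of why case-insensitivity matters for this file, the sketch below pipes the mixed-case sequence from the "Upper and Lowercase Protein" record into grep with and without the -i option. The pattern CCC is chosen only for illustration; it is not one of the assignment's problems.

```shell
#!/usr/bin/env bash
# Mixed-case sequence taken from the "Upper and Lowercase Protein" record.
seq='ATgCCAACATGgATGCCcGATAtGGATTgA'

# Case-sensitive: the sequence contains "CCc", which does not match CCC.
printf '%s\n' "$seq" | grep -c 'CCC'    # prints 0

# Case-insensitive (-i): "CCc" now matches CCC.
printf '%s\n' "$seq" | grep -ic 'CCC'   # prints 1
```

Without -i, grep would silently miss any sequence whose letters are not all in the same case as the pattern.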
For each problem, write a single bash statement using grep that performs the requested operation(s).
You may use redirection and pipe operators such as >, <, and |.
1) Print all of the DNA sequences in the file (that is, every line that is not a name).
2) Print every full DNA sequence that contains "CAT", each preceded by the name of its sequence. Hint: check man grep for options that change what gets printed for a match.
dna.txt content:
Simple Protein-Coding Gene
ATGCCACTATGGTAG
Upper and Lowercase Protein
ATgCCAACATGgATGCCcGATAtGGATTgA
Valid Gene
ATGCGACCCTAGTAG
Invalid Gene
ATGCGACCCTAGTAGGG
Another Protein-Coding Gene
ATGACCGACTCAGTATAA
Yet-Another Protein-Coding Gene
ATGATCGACTACGATTAG
Yet-Again-Another Protein-Coding Gene
ATGATTGGGCCCGCTTAGTAGTGA
Invalid Protein-Coding Gene
ATGACCGACTCAGTAAAT
Another Invalid Protein-Coding Gene
ATGACCGACTAG
Yet-Another Invalid Protein-Coding Gene
AGGATTGGGCCCGCTTAGTAGTGA
Feline-Encoding DNA
CATCATCATCATCATCATCATCATCATCAT
Palindrome DNA
ACTTCA
1) correct output:
ATGCCACTATGGTAG
ATgCCAACATGgATGCCcGATAtGGATTgA
ATGCGACCCTAGTAG
ATGCGACCCTAGTAGGG
ATGACCGACTCAGTATAA
ATGATCGACTACGATTAG
ATGATTGGGCCCGCTTAGTAGTGA
ATGACCGACTCAGTAAAT
ATGACCGACTAG
AGGATTGGGCCCGCTTAGTAGTGA
CATCATCATCATCATCATCATCATCATCAT
ACTTCA
TAGACGTACCTTAG
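One possible answer for problem 1, offered as a sketch rather than the official solution: every sequence line consists only of the letters A, C, G, and T, so grep -i -x -E '[acgt]+' keeps exactly the lines made entirely of those letters (-x anchors the pattern to the whole line, -i ignores case, including inside the bracket expression). So that it runs on its own, the script assumes a throwaway temporary directory and recreates dna.txt there from the listing above. That listing does not include the TAGACGTACCTTAG record shown in the expected output, so this sketch prints only the 12 listed sequences.

```shell
#!/usr/bin/env bash
# Assumption: work in a temporary directory so the sample file clobbers nothing.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Recreate dna.txt from the listing above (name line, then sequence line).
cat > dna.txt <<'EOF'
Simple Protein-Coding Gene
ATGCCACTATGGTAG
Upper and Lowercase Protein
ATgCCAACATGgATGCCcGATAtGGATTgA
Valid Gene
ATGCGACCCTAGTAG
Invalid Gene
ATGCGACCCTAGTAGGG
Another Protein-Coding Gene
ATGACCGACTCAGTATAA
Yet-Another Protein-Coding Gene
ATGATCGACTACGATTAG
Yet-Again-Another Protein-Coding Gene
ATGATTGGGCCCGCTTAGTAGTGA
Invalid Protein-Coding Gene
ATGACCGACTCAGTAAAT
Another Invalid Protein-Coding Gene
ATGACCGACTAG
Yet-Another Invalid Protein-Coding Gene
AGGATTGGGCCCGCTTAGTAGTGA
Feline-Encoding DNA
CATCATCATCATCATCATCATCATCATCAT
Palindrome DNA
ACTTCA
EOF

# Problem 1: print only lines made entirely of A, C, G, T (any case).
grep -i -x -E '[acgt]+' dna.txt
```

A shorter alternative, grep -v ' ' dna.txt, gives the same result here because every listed name contains a space and no sequence does; it relies on that property of the data rather than on the nucleotide alphabet.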
2) correct output:
Upper and Lowercase Protein
ATgCCAACATGgATGCCcGATAtGGATTgA
--
Feline-Encoding DNA
CATCATCATCATCATCATCATCATCATCAT
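A sketch of one possible answer for problem 2, not the official solution: -B 1 prints one line of leading context before each match, which in this file is the sequence's name, and grep prints a -- line between non-adjacent groups of output, which accounts for the separator in the expected output. -B is supported by GNU and BSD grep but is not part of POSIX. The pattern is combined with -x and restricted to nucleotide letters so that a name line that happened to contain "cat" would not match; for the listing above, plain grep -i -B 1 'cat' dna.txt produces the same output. As before, the script assumes a temporary directory and recreates dna.txt from the listing.

```shell
#!/usr/bin/env bash
# Assumption: work in a temporary directory so the sample file clobbers nothing.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Recreate dna.txt from the listing above (name line, then sequence line).
cat > dna.txt <<'EOF'
Simple Protein-Coding Gene
ATGCCACTATGGTAG
Upper and Lowercase Protein
ATgCCAACATGgATGCCcGATAtGGATTgA
Valid Gene
ATGCGACCCTAGTAG
Invalid Gene
ATGCGACCCTAGTAGGG
Another Protein-Coding Gene
ATGACCGACTCAGTATAA
Yet-Another Protein-Coding Gene
ATGATCGACTACGATTAG
Yet-Again-Another Protein-Coding Gene
ATGATTGGGCCCGCTTAGTAGTGA
Invalid Protein-Coding Gene
ATGACCGACTCAGTAAAT
Another Invalid Protein-Coding Gene
ATGACCGACTAG
Yet-Another Invalid Protein-Coding Gene
AGGATTGGGCCCGCTTAGTAGTGA
Feline-Encoding DNA
CATCATCATCATCATCATCATCATCATCAT
Palindrome DNA
ACTTCA
EOF

# Problem 2: match whole sequence lines containing "cat" (any case) and
# print the preceding line (the name) as context with -B 1.
grep -i -x -E -B 1 '[acgt]*cat[acgt]*' dna.txt
```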