The software can open .RTF, .TXT or .FASTA (raw text) files. You can associate a file type with FastPCR so that files of that type always open in FastPCR.
Open … (Ctrl-O) – opens a text or RTF file containing DNA or protein sequence(s) in FASTA, GenBank, MEGA, BLAST, etc. format in the currently open editor tab (general, additional or result).
Open Large FASTA File into Memory (Ctrl-G) – the quickest way to load a huge, chromosome-size single sequence (such as human chromosome 3; maximum 200 Mb) or multiple sequences into memory.
Load All Files from Folder (Ctrl-J) – a convenient way to open all text files in one folder at once. Click any file in the folder to load them all. This is especially useful for reading sequences stored in different files. Each file is converted to FASTA format, with the file name used as the sequence name.
The software can also open any text file from the command line: fastpcr.exe filename.fasta
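The "Load All Files from Folder" behaviour described above (each file becomes one FASTA record named after the file) can be sketched in Python as follows. This is only an illustration, not FastPCR's own code: the function name `folder_to_fasta`, the `*.txt` pattern, and the rule "keep letters only" are assumptions, and each file is assumed to hold one raw sequence.

```python
from pathlib import Path


def folder_to_fasta(folder: str, pattern: str = "*.txt") -> str:
    """Return one FASTA string with one record per matching file in `folder`."""
    records = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Assumption: keep letters only (digits, spaces, line breaks dropped).
        sequence = "".join(ch for ch in text if ch.isalpha())
        # The file name without its extension becomes the sequence name.
        records.append(f">{path.stem}\n{sequence}")
    return "\n".join(records)
```

Calling `folder_to_fasta("my_sequences")` on a hypothetical folder would return a single multi-record FASTA text that can be pasted into an editor.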
FastPCR normally expects to read sequence files in FASTA format.
Prepare your sequence data file using a text editor (Notepad, WordPad, Word), and save it in ASCII text format (plain text) or Rich Text Format (.RTF).
You can type directly in the FastPCR editors, or import nucleotide (or protein) sequence(s) from a file or from the clipboard into the general text editor. Accepted inputs are simple text, RTF text, FASTA, MEGA, BLAST alignments, EMBL search results, Dialign or MSF alignments, an Excel sheet or Word table (two columns), and tables with TAB or whitespace separators.
Importing sequences works the same way in both editors: both accept pasting from the keyboard (Shift-Insert, Ctrl-V) and from the contextual menu displayed by a right-click of the mouse. All imported sequences must be in the same format.
Degenerate DNA sequences are accepted. The IUPAC code is an extended vocabulary of 16 letters that allows the description of ambiguous DNA code.
Each letter represents a combination of one or several nucleotides: M=(A/C) R=(A/G) W=(A/T) S=(G/C) Y=(C/T) K=(G/T) V=(A/G/C) H=(A/C/T) D=(A/G/T) B=(C/G/T) N=(A/G/C/T), U=T and I (Inosine).
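To make the meaning of the degenerate letters concrete, the short Python sketch below expands a degenerate sequence into every plain A/C/G/T sequence it stands for, using the letter table listed above. The names `IUPAC_DNA` and `expand_degenerate` are illustrative only, and inosine (I) is deliberately left out because how it should be expanded is not stated here.

```python
from itertools import product

# Letter table as listed above; U is treated as T. Inosine (I) is omitted
# on purpose (assumption: its expansion is application-specific).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand_degenerate(seq: str) -> list[str]:
    """Return every non-degenerate sequence encoded by `seq`.

    Raises KeyError for letters that are not in IUPAC_DNA.
    """
    pools = [IUPAC_DNA[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*pools)]


print(expand_degenerate("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

The number of expanded sequences is the product of the pool sizes, so a primer with many N positions grows very quickly (three N positions already give 64 variants).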
Sequence formats are simply the way in which the amino acid or DNA sequence is recorded
in a computer file.
Once sequences are imported, you can edit them in the general or additional editor and immediately see the result of editing. Press F12 to switch on-typing sequence interpretation on or off. You can modify nucleotide sequences by inserting, deleting and replacing sequence fragments.
Raw format (ASCII)
Similar to the text/plain format, but without white space and TABs. The reader accepts only standard IUB/IUPAC amino acid or nucleic acid code characters and rejects anything else; it is case-insensitive. Digits and other characters are removed and ignored (although Tab and space characters combined with an end-of-line character (Enter) can be interpreted as column format).
Here is an example of a raw formatted sequence:
ataaattcttattttgacactcaccaaaatagtcacctggaaaacccgctttttgtgaca
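The filtering rule for raw format (keep IUB/IUPAC code letters in either case, drop digits and everything else) can be sketched in Python like this. The alphabet below covers nucleotide codes only, and the names `IUPAC_NUCLEOTIDE` and `clean_raw_sequence` are assumptions made for the example, not part of FastPCR.

```python
# Nucleotide IUPAC letters, including U and I (inosine) as listed earlier.
IUPAC_NUCLEOTIDE = frozenset("ACGTUMRWSYKVHDBNI")


def clean_raw_sequence(text: str,
                       alphabet: frozenset[str] = IUPAC_NUCLEOTIDE) -> str:
    """Keep only characters from `alphabet`, ignoring letter case."""
    return "".join(ch for ch in text if ch.upper() in alphabet)


print(clean_raw_sequence("1 ataaa ttctt\n61 ACGT-x"))  # ataaattcttACGT
```

Passing a different `alphabet` (for example the amino acid letters) would apply the same rule to protein sequences.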
FASTA format has the highest priority and is as simple as the raw sequence preceded by a definition line. The definition line begins with a “>” sign, optionally followed immediately by a name for the sequence, which may be of any length and contain any number of words. Many sequences can be listed in one file; each new “>” symbol indicates the start of a new sequence. It is important to press Enter at the end of each line that begins with “>”, so that FastPCR can recognize where the sequence name ends and the sequence begins. Make sure the first line starts with a “>”; a header description is optional.
The description must fit on one line and must not run onto two or more lines.
The sequence starts directly on the next line. As with the raw data format, sequences must use the standard IUB/IUPAC amino acid or nucleic acid codes; any other characters (digits, spaces, TAB characters and so on) are ignored, and case is ignored:
>
cggccgagatcaggcgatgcatg
>
acgacgacgcagctatattacag
Aligned sequences in FASTA format are read using only the standard IUB/IUPAC amino acid or nucleic acid code characters; anything else is rejected, and case is ignored. Sequence alignments (PHYLIP, NEXUS and others) do not need to be reformatted into FASTA format.
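The FASTA rules above (a “>” line starts a record, the rest of that line is the optional name, and the following lines hold the sequence) can be illustrated with a minimal Python reader. It is a sketch under simple assumptions, not FastPCR's parser: the name `parse_fasta` is made up, text before the first “>” is ignored, and only letters are kept in the sequence.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (name, sequence) pairs; the name may be an empty string."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new ">" closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Assumption: keep letters only; digits and spaces are dropped.
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

For example, `parse_fasta(">\ncggccgagatcaggcgatgcatg\n>\nacgacgacgcagctatattacag")` returns two records with empty names, which matches the point that the name after “>” is optional.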
Tables format description
You can import a table directly from a text file, or from the clipboard by copying and pasting from a Microsoft Word table or Excel sheet (or OpenOffice), from a primer list in FastPCR's "PCR design result", or from a table with TAB or whitespace separators. The software reads only the first two columns, which hold the names and the sequences:
1F1_234-253    agggagtagcttacctcgct
2F1_263-283    gcgaaaaccaagtgcttacct
3F1_290-313    tcctcaagcgaaaaccaatccaca
4F1_318-338    tgttcacatgtttggggacga
5F1_545-564    gcttgtaaggcaaacccaca
6F1_606-625    acgtggtactcatggtgtca
7F1_668-689    cccaacggtttacctcaagggt
8F1_1071-1093  tcgcgaccttatgagaacgctgc
9F1_1112-1131  aagcagcgaccgacgaaacc
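Reading a two-column table (name, then sequence, separated by TAB or other whitespace) and turning it into FASTA can be sketched in Python as below. The function name `table_to_fasta` is hypothetical, and skipping rows with fewer than two columns is an assumption of this example rather than documented FastPCR behaviour.

```python
def table_to_fasta(text: str) -> str:
    """Convert a whitespace- or TAB-separated table into FASTA text.

    Only the first two columns are used: name and sequence.
    """
    records = []
    for line in text.splitlines():
        columns = line.split()  # splits on TABs and runs of spaces
        if len(columns) < 2:
            continue  # assumption: skip blank or single-column rows
        name, sequence = columns[0], columns[1]
        records.append(f">{name}\n{sequence}")
    return "\n".join(records)


print(table_to_fasta("1F1_234-253\tagggagtagcttacctcgct"))
# >1F1_234-253
# agggagtagcttacctcgct
```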
To check that the correct format was read, look at the information shown in the status bar under the text editor.
As with the raw data format, sequences must use the standard IUB/IUPAC amino acid or nucleic acid codes; any other characters (digits and so on) are ignored, and case is ignored. Tab characters or spaces are used to recognize columns. Another simple table format lists primers (probes, etc.) with or without names; a missing name is replaced by a single space (spaces inside a sequence are not allowed), and each sequence must end with a line break (press Enter):
acgaatcgtattcaagcctgc
gcgtcatctggctgctacctcga
cgagcttagtcttcaacgccaa
agaggacgctcgtgtctttcggac
gctcacgtcaaagtcttgtccgag
When sequence names are used, no spaces are allowed inside names or sequences:
acgaatcgtattcaagcctgc
gcgtcatctggctgctacctcga
cgagcttagtcttcaacgccaa
agaggacgctcgtgtctttcggac
The software always indexes the sequences from 1 to N, so it does not matter if some sequence names are identical or absent:
1 acg aat cgt att caa gcc tgc
ccg tca tct ggc tgc tac ctc ga
cga gct agt ctt caa cgc caa
1 aga gga cgc tcg tgt ctt tcg gac
Press the corresponding button on the toolbar to convert sequences into FASTA format and to check that the format was read correctly.
Further toolbar buttons convert sequences to IUB/IUPAC FASTA format, or back to the original format with the FASTA “>” sign.
GCG/MSF Format
The file may begin with as many lines of comment or description as required. The
comments are terminated with a line containing only two slashes.
The first mandatory line that is recognised as part of the
MSF file contains the text "MSF:";
this line also includes the sequence length and type, the date and an internal check
sum value. There then follows one line per sequence describing the sequence name,
length, checksum and a weight value. Only one name per line is allowed; the qualifier
"Name: " is followed by the sequence name. Extra characters, between the sequence
names and "Len: " are acceptable if they contain no blank characters. Another blank
line is added followed by a line starting with two slashes "//",
this indicates the end of the name list. There then follows another blank line.
Sequences are interleaved on separate lines with gaps represented by periods. Each
sequence line starts with the sequence name which is separated from the aligned
sequence residues by white space.
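The MSF layout described above (a header line containing "MSF:", one "Name:" line per sequence, a "//" separator, then interleaved sequence lines) can be read with a short Python sketch. This is an illustration under assumptions, not a full GCG/MSF parser: `parse_msf` is a made-up name, checksums and lengths are not verified, and gap periods are kept unchanged.

```python
def parse_msf(text: str) -> dict[str, str]:
    """Return {sequence name: aligned sequence} from MSF-formatted text."""
    lines = text.splitlines()
    names, start, seen_header = [], None, False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if "MSF:" in line:
            seen_header = True
        elif seen_header and stripped.startswith("Name:"):
            fields = stripped.split()
            if len(fields) > 1:
                names.append(fields[1])  # the word right after "Name:"
        elif seen_header and stripped == "//":
            start = i + 1  # aligned blocks begin after the separator
            break
    if start is None:
        raise ValueError("no 'MSF:' header followed by a '//' line")
    chunks = {name: [] for name in names}
    for line in lines[start:]:
        parts = line.split()
        # Lines that do not start with a known name (blank lines,
        # position rulers) are skipped.
        if parts and parts[0] in chunks:
            chunks[parts[0]].append("".join(parts[1:]))
    return {name: "".join(pieces) for name, pieces in chunks.items()}
```

Because the blocks are interleaved, each sequence is rebuilt by concatenating its pieces in the order in which they appear in the file.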
Simple Alignment format description
The first mandatory line that is recognised as part of a
Simple Alignment file contains the text “Alignment”
(not case-sensitive).
GenBank format
GenBank is the NIH genetic sequence database, an annotated collection of all publicly
available DNA sequences. Although there are daily exchanges of information with
the EMBL Nucleotide Sequence Database, it has its own sequence format. Each GenBank
entry includes a concise description of the sequence, the scientific name and taxonomy
of the source organism, and a table of features identifying coding regions and other
sites of biological significance (such as transcription units, sites of mutations
or modifications, and repeats). Protein translations for coding regions are included
in the feature table. Bibliographic references are included along with a link to
the Medline unique identifier for all published sequences. Each sequence entry is
composed of lines. Different types of lines, each with their own format, are used
to record the various data that make up the entry.
MEGA format
All MEGA input data files are basic ASCII text files,
which may contain DNA sequence, protein sequence, evolutionary distance, or phylogenetic
tree data. Most text editors and word processing packages (such as Notepad) allow you to edit and save ASCII
text files. These files usually have a .TXT or .MEG extension.
A number of features are common to all MEGA data files.
The first line must contain the keyword #MEGA
to indicate that the data file is in the MEGA format.
DIALIGN format
The first mandatory line that is recognised as part of a
DIALIGN alignment file contains the text “DIALIGN” (not case-sensitive).
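Several of the formats above are identified by a keyword on their first line (“>” for FASTA, #MEGA, "MSF:", “Alignment”, “DIALIGN”). The Python sketch below shows how such a first-line check might look. It is not FastPCR's detection logic: the function name `guess_format`, the order of the checks, the case-insensitive match for #MEGA, and the fact that only the first non-empty line is examined are all assumptions of this example.

```python
def guess_format(text: str) -> str:
    """Guess a sequence format from the first non-empty line of `text`."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        lowered = stripped.lower()
        if stripped.startswith(">"):
            return "fasta"
        if lowered.startswith("#mega"):
            return "mega"
        if "msf:" in lowered:
            return "msf"
        if "dialign" in lowered:
            return "dialign"
        if "alignment" in lowered:
            return "simple alignment"
        return "raw"  # no keyword found on the first non-empty line
    return "empty"
```

One limitation worth noting: an MSF file that starts with comment lines before its "MSF:" line would be reported as "raw" by this sketch, so a fuller version would need to scan further into the file.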
BLAST Query Web alignment result format
This format allows FastPCR to read and join all “Sbjct”
sequences from a BLAST result displayed in an Internet browser.
First, select all (Ctrl-A) of the text and graphics on the page in the Internet browser, then copy and paste it into FastPCR. If FastPCR does not recognize the format of the browser page directly, paste the copied page into Notepad first, then select all and copy the text in Notepad, and paste that into FastPCR.
Press the corresponding button on the toolbar to convert sequences to “intelligent” FASTA format while keeping the “-” gap characters from the alignment. Sequences in BLAST Query Web format are NOT analysed, and all non-IUB/IUPAC codes are preserved. Sequences in Web or text files of this format are preceded by a line starting with a “>” symbol that contains the name and labels of the sequence.
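Joining the “Sbjct” lines of a pasted BLAST text result, as described above, can be sketched in Python as follows. The sketch assumes the usual plain-text layout "Sbjct  start  SEQUENCE  end" and a “>” line before each hit; the names `SBJCT_LINE` and `join_sbjct` are illustrative, and all “Sbjct” pieces under one “>” line are simply concatenated, with “-” gap characters kept.

```python
import re

# Matches e.g. "Sbjct  101  ACGTAACGT  109" (an optional ":" after "Sbjct"
# covers older BLAST output). Group 1 is the aligned subject sequence.
SBJCT_LINE = re.compile(r"^\s*Sbjct:?\s+\d+\s+(\S+)\s+\d+\s*$")


def join_sbjct(text: str) -> list[tuple[str, str]]:
    """Return (hit name, joined Sbjct sequence) for each '>' hit."""
    records = []  # each item is [name, list of sequence pieces]
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            records.append([line.lstrip()[1:].strip(), []])
            continue
        match = SBJCT_LINE.match(line)
        if match:
            if not records:
                records.append(["", []])  # Sbjct lines before any ">"
            records[-1][1].append(match.group(1))
    return [(name, "".join(pieces)) for name, pieces in records if pieces]
```

The result can be written out as FASTA by printing ">" plus the name, followed by the joined sequence, for each pair.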