How do I search for regulatory elements in the promoter region of a gene (or a group of genes)?
The Potential Binding Site Search (PBSS) serves this purpose. First, decide which species you would like to search. There are three options: Human, Mouse and Rat. By checking the 'Across Genomes' checkbox, you can restrict the search to regulatory elements that exist across all of the specified genomes. In this example, we search only for regulatory elements that exist across all available genomes.
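As a rough illustration of what an "across genomes" filter does, the sketch below keeps only the binding sites detected in the orthologous region of every selected species. It assumes conservation is judged simply by the binding site name being present in each species; GeneACT's actual criterion is not described here and may differ. The function name and the example hits are hypothetical.

```python
from typing import Dict, Set


def sites_across_genomes(hits: Dict[str, Set[str]]) -> Set[str]:
    """Return the binding site names found in every species' hits.

    `hits` maps a species (e.g. "human") to the set of binding site names
    detected in that species' ortholog. Name-based intersection is an
    illustrative assumption, not GeneACT's documented method.
    """
    if not hits:
        return set()
    per_species = iter(hits.values())
    common = set(next(per_species))  # start from the first species
    for names in per_species:
        common &= names  # keep only sites also present in this species
    return common


# Made-up hits for illustration only.
hits = {
    "human": {"Smad3/Smad4_RS", "Smad-c-Myc-1"},
    "mouse": {"Smad3/Smad4_RS", "made_up_site"},
    "rat": {"Smad3/Smad4_RS", "Smad-c-Myc-1"},
}
print(sorted(sites_across_genomes(hits)))  # ['Smad3/Smad4_RS']
```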
The ortholog information in GeneACT comes from NCBI HomoloGene; please visit NCBI HomoloGene for more information. Suppose you are interested in a gene called Smad4. PBSS accepts any NCBI-supported gene symbol, as well as Entrez Gene IDs (LocusLink IDs). According to the NCBI Handbook, Gene IDs are equivalent to LocusLink IDs in mammalian species. To look up gene symbols or synonyms for your gene of interest, visit Entrez Gene.
The gene symbol for Smad4 is SMAD4 or MADH4 (Gene ID 4089). Next, decide which regulatory region you want to search. There are three choices: upstream from the transcription start site (TSS), upstream from the start codon and downstream from the stop codon. You can choose just one region or all three at the same time. For each region you select, you also need to specify a range to search (the default is -500 to +100). Finally, you can either use the Transcription Factor Database (TFD) as the source of binding sites or enter your own binding site patterns to search for (please refer to the reference manual for the allowed patterns). For this example, we use the SVG viewer to visualize the output.
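To give a feel for what a pattern search over the selected region involves, here is a minimal sketch that scans a sequence on both strands for a fixed-length pattern and reports the location, strand and actual sequence found (roughly the Loc and Aseq fields shown in the viewer). It assumes patterns use only A, C, G, T and N (any single base); the full pattern syntax of the reference manual is not reproduced, and the function names are hypothetical rather than part of GeneACT.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn a fixed-length pattern (A/C/G/T plus N = any base) into a regex."""
    parts = []
    for base in pattern.upper():
        if base == "N":
            parts.append("[ACGT]")
        elif base in "ACGT":
            parts.append(base)
        else:
            raise ValueError(f"unsupported character {base!r} in this sketch")
    # The lookahead makes finditer report overlapping hits as well.
    return re.compile("(?=(" + "".join(parts) + "))")


def scan_both_strands(sequence: str, pattern: str) -> List[Tuple[int, str, str]]:
    """Return (location, strand, actual sequence) for every hit.

    Location is the 0-based offset of the hit's leftmost base on the
    forward (given) strand of `sequence`.
    """
    seq = sequence.upper()
    regex = compile_pattern(pattern)
    width = len(pattern)
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    rc = reverse_complement(seq)
    for m in regex.finditer(rc):
        # A hit starting at i in the reverse complement covers forward
        # positions len(seq) - i - width .. len(seq) - i - 1.
        hits.append((len(seq) - m.start() - width, "-", m.group(1)))
    return sorted(hits)
```

For example, `scan_both_strands("TTGTCTGGACTT", "GTCTNNAC")` finds one forward-strand hit, `GTCTGGAC` at offset 2. Real promoter scans would also translate these offsets into positions relative to the TSS.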
An example set of input options:
1. Check all three species checkboxes.
2. Check the 'Across Genomes' checkbox.
3. Enter MADH4 in the GeneID box.
4. Check the 'Use TFD' checkbox.
5. Check only the 'Upstream from TSS' checkbox for regions.
6. Enter the range -2000 to +100.
7. Check the visualization checkbox.
After you click the "Submit Search" button, a graphical view of your gene of interest (in this case, Smad4) is displayed, provided that an SVG viewer is already installed.
WARNING: At present, the SVG visualization feature is fully supported only in Windows environments. For Macintosh users, the dynamic parts of the SVG visualization might not work properly.
The SVG view includes a few features that make the visualization easier to use. For example, you can zoom in and out of the plot using the Start Location and End Location boxes. When you move the mouse over a binding site on the plot, or in the list of binding sites below it, information about that binding site is displayed (BS: binding site, Seq: sequence, Aseq: actual sequence found, Loc: location of the binding site). Directly below the binding site plot are color switches that turn a particular binding site on or off (both in the plot above and in the sequence view below). Below the switches, binding sites are shown down to the sequence level, highlighted in the color of the corresponding switch. Again, moving the mouse over a binding site shows its information in a yellow window. To turn off all binding sites on the screen, click the 'toggle all off' box. To redisplay all binding sites, you must reset the zoom by clicking the "zoom" button, even if you want to keep the same region.
Using the link at the top of this page, you can download the sequence in FASTA format or the results as a flat file. If you choose not to visualize the gene in the SVG viewer, this link is all you get on the result page; this option is highly recommended for large searches.
How do I retrieve a part of the genomic sequence of a gene?
To retrieve part of a gene's genomic sequence, use the Genomic Sequence Retrieval tool. First, decide which species you would like to search. There are three options: Human, Mouse and Rat; you can choose one of these genomes. Suppose you are interested in a gene called Smad4. To begin, look up the gene symbol or Locus ID of Smad4 (for more information, see the walkthrough "How do I search for regulatory elements in the promoter region of a gene (or a group of genes)?"). One of the gene symbols for Smad4 is MADH4 (Locus ID 4089). Next, decide which regulatory region you want the sequence for. There are three choices: upstream from the transcription start site, upstream from the start codon and downstream from the stop codon. Only one region can be chosen. For that region, you also need to specify a range (for example, from -500 to +100). Note that if the gene is annotated on the minus strand, the reverse complement is displayed in the output.
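The relationship between a relative range and chromosome coordinates can be sketched as follows. This assumes 1-based, inclusive chromosome coordinates and that the offsets are added directly to the anchor position (TSS, start codon or stop codon), which is not documented here; `region_coordinates` and `oriented` are hypothetical helpers. On the minus strand, "upstream" lies at higher coordinates, so the window is mirrored and the sequence is reverse-complemented.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def region_coordinates(anchor: int, strand: str,
                       rel_start: int, rel_end: int) -> Tuple[int, int]:
    """Map a relative range such as -500..+100 to an inclusive (low, high) pair.

    `anchor` is the chromosome position of the TSS, start codon or stop
    codon (a hypothetical input; the tool looks this up itself).
    """
    if rel_start > rel_end:
        raise ValueError("range start must not exceed range end")
    if strand == "+":
        return anchor + rel_start, anchor + rel_end
    if strand == "-":
        # Upstream on the minus strand means higher chromosome coordinates.
        return anchor - rel_end, anchor - rel_start
    raise ValueError("strand must be '+' or '-'")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def oriented(seq_plus: str, strand: str) -> str:
    """Return a plus-strand sequence as read on the gene's own strand."""
    return reverse_complement(seq_plus) if strand == "-" else seq_plus
```

Under these assumptions, `region_coordinates(1000, "+", -500, 100)` gives `(500, 1100)`, a 601-base window. The example header in this walkthrough (from 46810111 to 46810711) likewise spans 601 bases, which is what inclusive counting of a -500 to +100 window yields.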
The result is the requested sequence in FASTA format, with a header similar to the one shown below:
>gene name:|SMAD4|MADH4|DPC4|JIP| | gene id:4089 | taxon: 9606 | Upstream_from_TSS | Chromosome: 18 | from 46810111 to 46810711 | +
AGGTGCCGCCAGCGTCTGTTTCTTCCCGAAGTGAACTCCTACAACCTAGCCACCTTCTCCCCAGAGCTGT
CGACTGGCTGTTGAAGGCCAATTTTTGTGCCTACGCAGGTCCTCAACACAGAACAAAACAAAAAAACAAC
AAAGGCCGGGCTAATAGCTATTTATAAACACTTACTGGACGCCCACTCTACGCCGAGCTCTCCCGCGCTC
CTTGGATACTTTTTTGCAACGAGATGCCAATTTCCCCGGCGACCACTCCCTCAAACAGGCCTTCGCCTCC
GCCCGCGCTGAGGCCCAGGCCCAGGTCCAGATTCAGAGCCGCCCGCCGGCTGGCGCTGCCCTGTAGGCGC
CTGCGCAGAGCGACCCTCCCCGTCACTCGGAGCGGGAGGCGGGGGCAGCCGGGAGAAAGGAAAGCTGCGG
GGGAAAAGGGCCAAACCCTGAAATTACCCGGATGTGGTCCCCGCGCGCGCATGCTCAGTGGCTTCTCGAC
AAGTTGGCAGCAACAACACGGCCCTGGTCGTCGTCGCCGCTGCGGTAACGGAGCGGTTTGGGTGGCGGAG
CCTGCGTTCGCGCCTTCCCGCTCTCCTCGGGAGGCCCTTCC
If multiple genes are input, sequences will be output
in a concatenated form.
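If you want to process this output programmatically, a minimal sketch is shown below. It assumes the header fields always appear in the order of the example header (gene names, gene id, taxon, region, chromosome, range, strand) and that concatenated multi-gene output is simply one FASTA record after another. `read_fasta`, `parse_record` and `SequenceRecord` are hypothetical names, not part of GeneACT.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class SequenceRecord:
    names: List[str]
    gene_id: int
    taxon: int
    region: str
    chromosome: str
    start: int
    end: int
    strand: str
    sequence: str


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from one or more concatenated FASTA records."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        yield header, "".join(chunks)


def parse_record(header: str, sequence: str) -> SequenceRecord:
    """Split a header laid out like the example into typed fields (assumed order)."""
    names_part, _, rest = header.partition("gene id:")
    names = [n.strip() for n in names_part.replace("gene name:", "").split("|") if n.strip()]
    gene_id, taxon, region, chrom, span, strand = [f.strip() for f in rest.split("|")][:6]
    _, start, _, end = span.split()  # "from <start> to <end>"
    return SequenceRecord(
        names=names,
        gene_id=int(gene_id),
        taxon=int(taxon.split(":")[1]),
        region=region,
        chromosome=chrom.split(":")[1].strip(),
        start=int(start),
        end=int(end),
        strand=strand,
        sequence=sequence,
    )
```

Usage: `records = [parse_record(h, s) for h, s in read_fasta(text)]` yields one record per gene, with `strand` telling you whether the sequence is the reverse complement.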
How do I look for the transcription factor name or binding site sequence in the Transcription Factor Database?
To look up either a binding site sequence or a transcription factor name, use TFD Search, which is designed to query the database for this purpose. You can enter a binding site sequence or a transcription factor name, using the following wildcards:
N (any single nucleotide)
A, G, C, T (the nucleotide itself)
* (any number of characters, with no spacing limit)
For example, the sequence GTCTNNAC returns the following results:
| Sequence | Name | Transcription Factor Name | Journal |
|---|---|---|---|
| GTCTAGAC | Smad3/Smad4_RS | Smad3/Smad4 | Mol Cell 1: 611-7 (1998) |
| GTCTGGAC | Smad-c-Myc-1 | Smad factors | N/A |
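One way to picture how these wildcards behave is to translate a sequence query into a regular expression, as in the sketch below. It assumes N matches exactly one nucleotide, * matches any run of nucleotides (including none), and a stored sequence is returned only if the whole sequence fits the query; TFD Search's real matching rules may differ. The function names are hypothetical, and transcription factor name queries are not covered.

```python
import re
from typing import Iterable, List


def tfd_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a sequence query with N and * wildcards into a regex (assumed semantics)."""
    parts = []
    for ch in pattern.upper():
        if ch == "N":
            parts.append("[ACGT]")   # any single nucleotide
        elif ch == "*":
            parts.append("[ACGT]*")  # any number of nucleotides, no spacing limit
        elif ch in "ACGT":
            parts.append(ch)         # literal nucleotide
        else:
            raise ValueError(f"unsupported character in pattern: {ch!r}")
    return re.compile("".join(parts))


def search_sequences(pattern: str, sequences: Iterable[str]) -> List[str]:
    """Return the stored sequences that match the whole query."""
    regex = tfd_pattern_to_regex(pattern)
    return [s for s in sequences if regex.fullmatch(s.upper())]


stored = ["GTCTAGAC", "GTCTGGAC", "GTCTAGAAC", "TTTTTTTT"]
print(search_sequences("GTCTNNAC", stored))  # ['GTCTAGAC', 'GTCTGGAC']
```

With these semantics, `GTCTNNAC` matches only the two 8-base sites, while a query such as `GTCT*AC` would also match the longer `GTCTAGAAC`.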
How do I look for regulatory elements that are enriched in a group of genes?
To look for enrichment of binding sites in one set of genes compared to another, use the Differential Binding Site Search (DBSS). This tool is intended for people who have performed microarray experiments and want to know whether any binding sites are enriched in their differentially regulated genes.
The principle behind DBSS is simple. The basic assumption is that some of the genes differentially expressed in the microarray experiment are regulated by common transcription factors. The count of those transcription factors' binding sites should therefore be higher in the regulated gene set than in a control gene set (i.e. genes that are not differentially expressed, which serve as a background measurement).
In other words, we ask how often a particular binding site is found in a differentially regulated gene set compared to a control gene set. For example, if a binding site is 10-fold more frequent in the regulated set than in the control gene set, this suggests that the site may serve a particular purpose in your regulated gene set. Note that only Gene IDs are accepted as input for this search. At present, the TF binding site search supports the regions "Upstream from the start codon" and "Downstream from the stop codon", with a range of up to -10000 from the start codon and -2000 from the stop codon.

For these two regions, microRNA seed sequences are also included in the preprocessing (for more information about these seed sequences, please refer to Lewis BP et al., Cell, Vol. 115, 787-798, 2003). In addition, we provide a microRNA target site enrichment search on the 3'UTR regions via the option "MicroRNA target sites in 3'UTR". We use the recently developed miRanda algorithm to look for potential miRNA target sites; for more information about miRanda, please refer to http://www.microrna.org/miranda_new.html. For details about the DBSS preprocessing in general, please refer to our manuscript.
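For readers unfamiliar with seed matching, the sketch below shows the basic idea: take the miRNA seed (nucleotides 2-8, the definition used by Lewis et al., 2003) and look for its reverse complement in a target sequence. This is only an illustration of exact seed matching; it is not GeneACT's preprocessing and not the miRanda algorithm, which scores alignments and free energy. The function names are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mirna_seed(mirna: str, first: int = 2, last: int = 8) -> str:
    """Nucleotides first..last (1-based, inclusive) of a mature miRNA, as DNA."""
    return mirna.upper().replace("U", "T")[first - 1:last]


def seed_target_site(mirna: str) -> str:
    """Target (sense-strand) sequence that pairs with the seed: its reverse complement."""
    return mirna_seed(mirna).translate(_COMPLEMENT)[::-1]


def find_seed_sites(utr: str, mirna: str) -> List[int]:
    """0-based offsets in `utr` where an exact seed-complementary site occurs."""
    site = seed_target_site(mirna)
    utr = utr.upper().replace("U", "T")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

For a miRNA string such as `"UGAGGUAGUAGGUUGUAUAGUU"`, the seed is `GAGGTAG` and the searched site is `CTACCTC`.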
The interface of this tool is very similar to that of the Potential Binding Site Search. The main difference is that it contains two fields for the Binding Site Ratio. The ratio measures the fold enrichment of a particular binding site in the regulated set compared to the control set. For each binding site found, the ratio is calculated as follows:
Ratio = ( (Frequency of the binding site in the regulated gene set) / (total number of regulated genes input) ) / ( (Frequency of the binding site in the control gene set) / (total number of control genes input) )
where the frequency is the number of genes in that set containing the binding site.
If the binding site is found in only one of the two sets (regulated or control), a Ratio of N/A is reported (in the downloadable text file, N/A is replaced with -1).
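The ratio formula and its N/A rule can be written as a small function. This is a sketch of the arithmetic described above, not GeneACT's code; the function names are hypothetical, and `None` stands in for N/A.

```python
from typing import Optional


def binding_site_ratio(reg_freq: int, n_reg: int,
                       ctrl_freq: int, n_ctrl: int) -> Optional[float]:
    """Fold enrichment of a binding site in the regulated vs. control set.

    Frequencies are the numbers of genes in each set containing the site.
    Returns None (reported as N/A) when the site occurs in only one set.
    """
    if n_reg <= 0 or n_ctrl <= 0:
        raise ValueError("both gene sets must be non-empty")
    if reg_freq == 0 or ctrl_freq == 0:
        return None
    # (reg_freq / n_reg) / (ctrl_freq / n_ctrl), rearranged to a single division.
    return (reg_freq * n_ctrl) / (ctrl_freq * n_reg)


def ratio_for_text_file(ratio: Optional[float]) -> float:
    """The downloadable text file reports N/A as -1."""
    return -1.0 if ratio is None else ratio


print(binding_site_ratio(10, 50, 4, 100))  # 5.0
```

Here a site present in 10 of 50 regulated genes and 4 of 100 control genes gives (10/50) / (4/100) = 5.0, i.e. five-fold enrichment.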
The result report contains the following fields:
1. Name – The name of the binding site.
2. TF Name – The name of the transcription factor that binds to the binding site.
3. Sequence – The sequence of the binding site.
4. Ratio – Fold enrichment of the binding site.
5. Regulated Frequency – The number of genes in the regulated gene set containing the binding site.
6. Control Frequency – The number of genes in the control gene set containing the binding site.
7. Lookup Gene – A checkbox for submitting a query for the genes in the regulated/control gene set that contain the site.
To see which genes contain a binding site of interest, select the site's checkbox and click the "Lookup genes" button. The genes containing that site are reported on the next page.
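Conceptually, a lookup query filters each gene set down to the genes whose scanned region contains the selected site, and the sizes of those lists are the Regulated and Control Frequency columns. The sketch below assumes a precomputed mapping from Gene ID to the set of binding site names found for that gene; the function names and all gene IDs in the example are made up for illustration.

```python
from typing import Dict, List, Set


def lookup_genes(site: str, gene_sites: Dict[int, Set[str]]) -> List[int]:
    """Gene IDs (sorted) whose scanned region contains the binding site `site`."""
    return sorted(gene_id for gene_id, sites in gene_sites.items() if site in sites)


def report_row(site: str,
               regulated: Dict[int, Set[str]],
               control: Dict[int, Set[str]]) -> dict:
    """Frequencies plus the per-set gene lists a lookup query would show."""
    reg_genes = lookup_genes(site, regulated)
    ctrl_genes = lookup_genes(site, control)
    return {
        "name": site,
        "regulated_frequency": len(reg_genes),
        "control_frequency": len(ctrl_genes),
        "regulated_genes": reg_genes,
        "control_genes": ctrl_genes,
    }


# Made-up gene IDs and hits, for illustration only.
regulated = {101: {"Smad3/Smad4_RS", "Smad-c-Myc-1"}, 102: {"Smad3/Smad4_RS"}, 103: set()}
control = {201: {"Smad-c-Myc-1"}, 202: {"Smad3/Smad4_RS"}, 203: set(), 204: set()}
print(report_row("Smad3/Smad4_RS", regulated, control))
```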