GeneACT

Help Contents

1. Getting Started...GeneACT walk through


2. GeneACT Tutorial

  1. Potential Binding Site Search
  2. Genomic Sequence Retrieval
  3. Differential Binding Site Search
  4. TFD Search

3. GeneACT Reference Manual

  1. Potential Binding Site Search
  2. Genomic Sequence Retrieval
  3. Differential Binding Site Search
  4. TFD Search

Getting Started...

From the Home Page or any search page, click on one of the searches to begin.

GeneACT walk through

How do I search for regulatory elements in the promoter region of a gene (or a group of genes)?

The Potential Binding Site Search (PBSS) serves this purpose. First, decide which species to search. There are three options: Human, Mouse and Rat. If you select more than one species, you can restrict the results to regulatory elements that exist across the selected genomes by checking the 'across genomes' check box. In this example, we will search only for regulatory elements that exist across all available genomes. The ortholog information in GeneACT comes from NCBI HomoloGene; please visit NCBI HomoloGene for more information. Suppose you have a gene called Smad4. Any NCBI-supported gene symbol is allowed in PBSS. Alternatively, an Entrez Gene ID or LocusLink ID may be used. According to the NCBI Handbook, Gene IDs are equivalent to LocusLink IDs in the mammalian species. To look up gene symbols or synonyms for the gene of interest, visit Entrez Gene.

The gene symbol for Smad4 is SMAD4 or MADH4 (or Gene ID 4089). Next, decide which regulatory region you want to search. There are three choices: upstream from the transcription start site (TSS), upstream from the start codon and downstream from the stop codon. You can choose just one region or all three at the same time. For the region(s) that you specify, you also need to specify a range to search (for example, the default is –500 to +100). Last but not least, you can either use the Transcription Factor Database (TFD) as the source of the binding sites or enter your own binding site patterns (please refer to the reference manual for the patterns allowed). For this example, we will use the SVG viewer to visualize the output.

An example of input options is the following:

1. Check all three species checkboxes.
2. Check the ‘Across Genomes’ checkbox.
3. Enter MADH4 into the GeneID box.
4. Check the ‘Use TFD’ checkbox.
5. Check only the ‘Upstream from TSS’ checkbox for regions.
6. Enter the range –2000 to +100.
7. Check the visualization checkbox.

After you hit the "Submit Search" button, a graphical view of your gene of interest will show up (in this case, Smad4) as follows, provided the SVG viewer is already installed.

 

 

WARNING: At present, the SVG visualization feature is fully supported only in the Windows environment. For Macintosh users, the dynamic part of the SVG visualization might not work properly.

The SVG view has a few features that make the visualization easier. For example, you can zoom in and out of the plot using the Start Location box and End Location box. When you move the mouse over a binding site on the plot, or over the list of binding sites below it, information about that binding site is shown (BS: Binding Site, Seq: Sequence, Aseq: Actual Sequence found, Loc: Location of the binding site). Right below the binding site plot, there are color switches that turn a particular binding site on and off (both on the plot above and in the sequence view below). Below the switches, binding sites are shown at the sequence level, with each binding site highlighted in the color of its switch. Again, on mouse-over, the binding site information shows up in a yellow window. To turn off all the binding sites on the screen, click on the 'toggle all off' box. To redisplay all binding sites, you must reset the zoom by clicking the "zoom" button, even if you want to view the same region.

Using the link at the top of the result page, you can download the sequence in FASTA format or the result in a flat file format. If you do not choose to visualize the gene in the SVG viewer, this link is the only thing shown on the result page; skipping the visualization is highly recommended for large searches.



How do I retrieve a part of the genomic sequence of a gene?

To retrieve a part of the genomic sequence, use the Genomic Sequence Retrieval tool. First, decide which species to search; you can choose one of the three available genomes: Human, Mouse or Rat. Suppose you have a gene called Smad4. To begin, look up the gene symbol or Locus ID of Smad4 (for more information, see the walkthrough "How do I search for regulatory elements in the promoter region of a gene (or a group of genes)?"). One of the gene symbols for Smad4 is MADH4 (or Locus ID 4089). Next, decide which regulatory region you want the sequence for. There are three choices: upstream from the transcription start site, upstream from the start codon and downstream from the stop codon. You can choose only one region. For the region that you specify, you also need to specify a range (for example, from –500 to +100). Note that if the gene is annotated on the minus strand, the reverse complement will be displayed in the output.

The result is the requested sequence in FASTA format, with a header similar to the one shown below:

>gene name:|SMAD4|MADH4|DPC4|JIP| | gene id:4089 | taxon: 9606 | Upstream_from_TSS | Chromosome: 18 | from 46810111 to 46810711 | +
AGGTGCCGCCAGCGTCTGTTTCTTCCCGAAGTGAACTCCTACAACCTAGCCACCTTCTCCCCAGAGCTGT
CGACTGGCTGTTGAAGGCCAATTTTTGTGCCTACGCAGGTCCTCAACACAGAACAAAACAAAAAAACAAC
AAAGGCCGGGCTAATAGCTATTTATAAACACTTACTGGACGCCCACTCTACGCCGAGCTCTCCCGCGCTC
CTTGGATACTTTTTTGCAACGAGATGCCAATTTCCCCGGCGACCACTCCCTCAAACAGGCCTTCGCCTCC
GCCCGCGCTGAGGCCCAGGCCCAGGTCCAGATTCAGAGCCGCCCGCCGGCTGGCGCTGCCCTGTAGGCGC
CTGCGCAGAGCGACCCTCCCCGTCACTCGGAGCGGGAGGCGGGGGCAGCCGGGAGAAAGGAAAGCTGCGG
GGGAAAAGGGCCAAACCCTGAAATTACCCGGATGTGGTCCCCGCGCGCGCATGCTCAGTGGCTTCTCGAC
AAGTTGGCAGCAACAACACGGCCCTGGTCGTCGTCGCCGCTGCGGTAACGGAGCGGTTTGGGTGGCGGAG
CCTGCGTTCGCGCCTTCCCGCTCTCCTCGGGAGGCCCTTCC

If multiple genes are input, sequences will be output in a concatenated form.
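
Concatenated output of this kind can be split back into one record per gene with a few lines of Python. The following is only a minimal sketch and is not part of GeneACT: the function name parse_fasta is an assumption, the second record in the sample is made up for illustration, and the header is split on "|" purely on the basis of the example header shown above.

```python
def parse_fasta(text):
    """Split concatenated FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened sample in the layout of the header above; the second
# record (GENE2, gene id 0) is hypothetical.
sample = """>gene name:|SMAD4|MADH4|DPC4|JIP| | gene id:4089 | taxon: 9606 | Upstream_from_TSS | Chromosome: 18 | from 46810111 to 46810711 | +
AGGTGCCGCC
AGCGTCTGTT
>gene name:|GENE2| | gene id:0 | taxon: 9606 | Upstream_from_TSS | Chromosome: 1 | from 1 to 10 | -
ACGTACGTAC
"""

records = parse_fasta(sample)
for header, seq in records:
    # The first symbol after "gene name:" is taken as the gene name.
    print(header.split("|")[1], len(seq))
```

Each record keeps the full header line, so the strand sign and coordinates remain available for later processing.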



How do I look for the transcription factor name or binding site sequence in the Transcription Factor Database?

The TFD search queries the database for either a binding site sequence or a transcription factor name. Users can enter a binding site sequence or a transcription factor name using the wildcards below:

N    any single base (A, G, C or T)

*    any sequence, with no spacing limit

For example, the sequence GTCTNNAC returns the following result:

Sequence   Name             Transcription Factor Name   Journal
GTCTAGAC   Smad3/Smad4_RS   Smad3/Smad4                 Mol Cell 1: 611-7 (1998)
GTCTGGAC   Smad-c-Myc-1     Smad factors                N/A
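
The wildcard behaviour can be illustrated by translating a query into a regular expression. This is a sketch under stated assumptions, not the TFD implementation: the function name wildcard_to_regex is hypothetical, the whole stored sequence is assumed to have to match the query, and the small list stands in for the database (its third entry is made up).

```python
import re


def wildcard_to_regex(pattern):
    """Translate a TFD-style query into a compiled regular expression.

    N matches any single base; * matches any run of bases, including none.
    """
    parts = []
    for ch in pattern.upper():
        if ch == "N":
            parts.append("[ACGT]")
        elif ch == "*":
            parts.append("[ACGT]*")
        elif ch in "ACGT":
            parts.append(ch)
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return re.compile("".join(parts))


# Toy stand-in for the database: the first two entries come from the
# example result above, the third is hypothetical.
entries = [
    ("GTCTAGAC", "Smad3/Smad4_RS"),
    ("GTCTGGAC", "Smad-c-Myc-1"),
    ("TTTTAAAA", "hypothetical_site"),
]

query = wildcard_to_regex("GTCTNNAC")
hits = [name for seq, name in entries if query.fullmatch(seq)]
print(hits)
```

With these toy entries, the query selects the two Smad sites and skips the made-up one.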

 




How do I look for regulatory elements that are enriched in a group of genes?

To look for enrichment of binding sites in one set of genes compared to another, Differential Binding Site Search is the tool of choice. This tool is intended for people who have done some microarray experiments and wonder if there are any binding sites enriched in their differentially regulated genes.

The principle behind the DBSS is simple. The basic assumption is that some of the genes that are differentially expressed in the microarray experiment are regulated by some common transcription factors. So the count of the transcription factor binding site should be higher in the regulated gene set when compared to a control gene set (i.e. not differentially expressed, which serves as a background measurement).

To begin with, we ask how many occurrences of a particular binding site are found in a differentially regulated gene set compared to a control gene set. For example, if a binding site is 10-fold more frequent in the regulated gene set than in the control gene set, this suggests that the binding site serves a particular purpose in your regulated gene set. Note that only Gene ID input is allowed in this search. At present, we allow the regions "Upstream from the start codon" and "Downstream from the stop codon" for the TF binding sites; the range can be up to -10000 from the start codon and -2000 away from the stop codon. For these two regions, MicroRNA seed sequences are also included in the preprocessing (for more information about these MicroRNA seed sequences, please refer to Lewis BP et al., Cell, Vol. 115, 787-798, 2003). In addition, we provide a MicroRNA target site enrichment search on the 3'UTR regions via the option "MicroRNA target sites in 3'UTR". We make use of the recently developed miRanda algorithm to look for potential miRNA target sites. For more information about the miRanda algorithm, please refer to http://www.microrna.org/miranda_new.html . For details about the preprocessing for DBSS in general, please refer to our manuscript.

The interface of this tool is very similar to the Potential Binding Site Search. The only difference is that it contains two fields for Binding Site Ratio. The ratio serves as a measurement of the fold of enrichment of a particular binding site in the regulated set compared to that of the control set. The binding site ratio is calculated as follows:

For each of the binding sites found:

Ratio = ( (Frequency of a binding site found in regulated gene set) / (total number of regulated gene input) ) / ( (Frequency of a binding site found in control gene set) / (total number of control gene input) )

If the binding site can only be found in either the regulated gene set or the control gene set, the ‘Ratio’ with value of N/A is reported (in the case of the downloadable text file, N/A is replaced with -1).
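
The calculation can be written out as a short Python sketch. It is an illustration of the formula above rather than GeneACT's own code: the function names and the example counts are assumptions, and the ratio is computed as (regulated frequency x control genes) / (control frequency x regulated genes), which is algebraically the same as the formula but needs only one division.

```python
def binding_site_ratio(regulated_freq, n_regulated, control_freq, n_control):
    """Return the fold enrichment, or None where the report shows N/A."""
    if n_regulated <= 0 or n_control <= 0:
        raise ValueError("both gene sets must contain at least one gene")
    if regulated_freq == 0 or control_freq == 0:
        # Site found in only one of the two sets: reported as N/A.
        return None
    return (regulated_freq * n_control) / (control_freq * n_regulated)


def ratio_for_text_file(ratio):
    """In the downloadable text file, N/A is written as -1."""
    return -1 if ratio is None else ratio


# Hypothetical counts: the site occurs in 20 of 50 regulated genes
# and in 4 of 100 control genes.
print(binding_site_ratio(20, 50, 4, 100))
print(ratio_for_text_file(binding_site_ratio(5, 50, 0, 100)))
```

With these made-up counts the first call gives a 10-fold enrichment, and the second shows the -1 placeholder for a site absent from the control set.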

In the result report, the following fields are reported:

1. Name – The name of the binding site.
2. TF Name – The name of the transcription factor that binds to the binding site.
3. Sequence – The sequence of the binding site.
4. Ratio – Fold of binding site enrichment.
5. Regulated Frequency – The number of genes in the regulated gene set that contain the binding site.
6. Control Frequency – The number of genes in the control gene set that contain the binding site.
7. Lookup Gene – Checkbox to submit a query for the genes in the regulated/control gene set that contain the site.

For binding sites of interest, use the check box to select the binding site and submit a lookup query with the "Lookup genes" button. The genes that contain the site will be reported on the next page.



GeneACT Tutorial

  1. Potential Binding Site Search

    Potential Binding Site Search is a tool that allows the user to search for binding sites (user input or from the Transcription Factor Database, TFD) in many genes in a high-throughput way.


    Use case 1:

    Visualization of Single or Multiple promoter region(s) using SVG Viewer

    For known genename/LocusID:

    Potential Binding Sites Tutorial Screen Shot

    1. Select one or multiple species. If multiple species are checked, you can also choose to report only the binding sites that exist across genomes. Only genenames can be used for cross-species searches.
      e.g.: check the human checkbox.
    2. Enter the Genename or Locus ID into the GeneID dialog box in comma-, return- or tab-delimited format.
      e.g.: MADH4
    3. Check the Visualization box.
    4. Select 'Use TFD' if you want to search against TFD; alternatively, different sequences can be input into the Binding Site Sequence dialog box. For this tutorial, choose 'Use TFD'.
    5. Select one or more region(s) to work with.
      e.g.: Upstream from TSS
    6. Enter the range of the regions to search for.
      e.g.: from -20 to +100
    7. Press "Submit Search". If a valid email address was entered the server will send you an email when the search is done. Email is recommended for larger scale searches.
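
A gene list kept in a spreadsheet or text file can be normalized before it is pasted into the GeneID box. The sketch below assumes only what step 2 states, namely that entries are separated by commas, line breaks or tabs; the function name parse_gene_ids and the example identifiers other than MADH4 and 4089 are illustrative.

```python
import re


def parse_gene_ids(raw):
    """Split a GeneID box entry on commas, tabs, or line breaks."""
    tokens = re.split(r"[,\t\r\n]+", raw)
    # Drop surrounding spaces and any empty entries.
    return [t.strip() for t in tokens if t.strip()]


print(parse_gene_ids("MADH4, 4089\nSMAD3\tSMAD2"))
```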

    Sample output:

    Potential Binding Sites Tutorial Screen Shot - Sample Output

    When you point at a binding site in the SVG viewer (on mouse-over), detailed information about that site is reported.

    To disable a particular binding site, click on its square colored check box.

    The fasta file and text output file are available for download from the hyperlink at the top of the page.

    For input sequence:

    Potential Binding Sites Tutorial Screen Shot

    1. Check the input sequence box.
    2. Input the sequence into the dialog box (note that only one sequence at a time is allowed).
    3. Check the SVG Viewer box.
    4. Select 'Use TFD' if you want to search against TFD; alternatively, binding site sequences can be input into the Binding Site Sequence dialog box. Finally, you can use your defined sequences in conjunction with the sequences in TFD.
    5. Press "Submit Search". If a valid email address was entered the server will send you an email when the search is done. Email is recommended for larger scale searches.

    Use case 2:

    Search for potential binding sites in batch mode

    For known genename/LocusID:

    1. Select one or multiple species. If multiple species are checked, you can also choose to report only the binding sites that exist across genomes. Only genenames can be used for cross-species searches.
    2. Enter the Genename or Locus ID into the GeneID dialog box in comma-, return- or tab-delimited format.
    3. Select 'Use TFD' if you want to search against TFD; alternatively, different sequences can be input into the Binding Site Sequence dialog box.
    4. Select one or more region(s) to work with.
    5. Enter the range of the regions to search for (note that even if you select more than one region, only one range is allowed).
    6. Press "Submit Search". If a valid email address was entered the server will send you an email when the search is done. Email is recommended for larger scale searches.



  2. Genomic Sequence Retrieval

    Genomic Sequence Retrieval is a tool that allows the user to retrieve genomic sequences. At present, only Homo sapiens, Mus musculus and Rattus norvegicus genomes are supported. If the gene is annotated to be on the minus strand, the reverse complement will be displayed in the output.
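
For readers who want to reproduce the minus-strand behaviour on their own sequences, a reverse complement can be computed as in the sketch below. It is a generic illustration, not GeneACT's code; the function name and the short input string are assumptions.

```python
# Translation table for the four bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AGGTGCC"))
```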


    Use case:

    Extract genomic sequences into a file in FASTA format

    For known genename/LocusID:

    Genomic Seq. Ret. Tutorial Screen Shot

    1. Choose one species to start with.
      e.g.: Human
    2. Enter Genename or Locus ID into the GeneID dialog box in comma, return or tab delimited format.
      e.g.: MADH4
    3. Select one region to work with.
      e.g.: Upstream from TSS
    4. Enter the range of the region to search for.
      e.g.: from: -200 to: 100
    5. Press "Submit Search".
    6. Click on the Download FASTA file link and you will see the sequence in FASTA format:

      >gene name:|SMAD4|MADH4|DPC4|JIP| | gene id:4089 | taxon: 9606 | Upstream_from_TSS | Chromosome: 18 | from 46810411 to 46810711 | +
      CCAGGTCCAGATTCAGAGCCGCCCGCCGGCTGGCGCTGCCCTGTAGGCGCCTGCGCAGAGCGACCCTCCC
      CGTCACTCGGAGCGGGAGGCGGGGGCAGCCGGGAGAAAGGAAAGCTGCGGGGGAAAAGGGCCAAACCCTG
      AAATTACCCGGATGTGGTCCCCGCGCGCGCATGCTCAGTGGCTTCTCGACAAGTTGGCAGCAACAACACG
      GCCCTGGTCGTCGTCGCCGCTGCGGTAACGGAGCGGTTTGGGTGGCGGAGCCTGCGTTCGCGCCTTCCCG
      CTCTCCTCGGGAGGCCCTTCC
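
The coordinates in the header follow from the range by simple arithmetic. The sketch below is an assumption-laden illustration: the anchor position 46810611 is inferred from the example headers (it is not stated by GeneACT), the function name is hypothetical, and the minus-strand branch assumes that upstream lies at higher coordinates, which the help text does not spell out.

```python
def region_coordinates(anchor, start_offset, end_offset, strand="+"):
    """Map a range relative to an anchor (e.g. the TSS) to genomic coordinates."""
    if start_offset > end_offset:
        raise ValueError("start offset must not exceed end offset")
    if strand == "+":
        return anchor + start_offset, anchor + end_offset
    if strand == "-":
        # Assumption: offsets are mirrored around the anchor on the minus strand.
        return anchor - end_offset, anchor - start_offset
    raise ValueError("strand must be '+' or '-'")


TSS = 46810611  # inferred from the example headers, not given by GeneACT

print(region_coordinates(TSS, -200, 100))  # matches "from 46810411 to 46810711"
print(region_coordinates(TSS, -500, 100))  # matches "from 46810111 to 46810711"
```

Both results agree with the headers shown in this help page for the ranges -200 to +100 and –500 to +100.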


  3. Differential Binding Site Search

    Differential Binding Site Search is a tool that allows the user to compare the frequencies of the binding sites that are found in two sets of genes (named the regulated and control gene sets hereafter). This tool is intended for users who have done microarray experiments and would like to know which kinds of binding sites (TF binding sites or MicroRNA target sites) are enriched in their regulated gene set. (For more information, please read the walkthrough "How do I look for regulatory elements that are enriched in a group of genes?")


    Use case:

    Search for binding sites that are enriched in the regulated set or control set


    Differential Binding Sites Tutorial Screen Shot

    1. For this search, only binding sites that are found across all three species will be reported. Input Gene IDs into the "Regulated Gene Set" and the "Control Gene Set" dialog boxes.
    2. Binding Site Ratio: Enter one or both of the two threshold values for the binding site ratio, that is, the frequency of a binding site in the regulated set relative to its frequency in the control set (by default, "greater than" is set to 0 so that every binding site is returned).
    3. Select the range that you want to perform the search on.
    4. Press "Submit Search." If a valid email address was entered the server will send you an email when the search is done.

  4. TFD Search

    TFD Search allows the user to query our database for a binding site sequence or transcription factor name of interest. TFD, the Transcription Factor Database, consists of >7000 binding sites from the literature. In order to keep the database current, we also provide an interface for binding site submission. New binding sites will be curated and entered into the database for use by other tools.


    Use case:

    Search for binding site sequence or the name of the transcription factor that binds to the binding site sequence


    BAC Clone Tutorial Screen Shot

    1. Select the radio button for the search you want to perform. In this case, select the "sequence" radio button.
    2. Enter a transcription factor name or binding site sequence into the dialog box.
      e.g.: GTCTNNAC
    3. Press "Submit Search."
    4. The following is printed on the result page:

      Sequence   Name             Transcription Factor Name   Journal
      GTCTAGAC   Smad3/Smad4_RS   Smad3/Smad4                 Mol Cell 1: 611-7 (1998)
      GTCTGGAC   Smad-c-Myc-1     Smad factors                N/A


GeneACT Reference Manual
  1. Potential Binding Site Search

    Potential Binding Site Search is a tool that allows users to search for binding sites (user input or from the Transcription Factor Database, TFD) in many genes in a high-throughput way.

    Species:
    Species supported as indicated. If multiple species and 'across genomes' are chosen, only binding sites found across species will be reported.
    Input Sequence:
    If checked, a DNA sequence can be input into the Template Sequence dialog box.
    Gene ID:
    Either Locus ID or Genename can be used here.
    Visualization:
    If checked, SVG viewer will be used to illustrate the results.
    (NOTE: SVG plug-in is needed for this functionality. For Windows and Macintosh users, Adobe SVG viewer is recommended; http://www.adobe.com.)
     
    Use TFD:
    If checked, all binding sites in the TFD (Transcription Factor Database) will be used to perform the search.
    Binding Site Sequence:
    Different binding sites can be input into this dialog box. These binding sites will be used along with the TFD binding sites if the "Use TFD" option is chosen.

    Nucleic acid codes supported are:

    Code  Meaning              Code  Meaning
    A     adenosine            M     A C (amino)
    C     cytidine             S     G C (strong)
    G     guanine              W     A T (weak)
    T     thymidine            B     G T C
    R     G A (purine)         D     G A T
    Y     T C (pyrimidine)     H     A C T
    K     G T (keto)           V     G C A
    N     A G C T (any)        *     anything, no spacing limit


    Regions:
    If genename is input for search, choices of three regions are allowed:

      1. Upstream from TSS
      2. Upstream from the start codon
      3. Downstream from the stop codon

      Searches can be performed for all three regions simultaneously.

    Range:
    When there is a genename input, range of the region has to be specified.
    Email:
    For large searches, the email option is highly recommended. Our server will send an email to the input email address once the search is performed.
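
The nucleic acid codes listed under Binding Site Sequence map naturally onto regular-expression character classes. The sketch below shows one way to scan a sequence with such a pattern. It is not the PBSS implementation: the names IUPAC and find_sites are hypothetical, only the forward strand is scanned, positions are 0-based, and the test sequence is made up around the GTCTAGAC site used elsewhere in this help.

```python
import re

# Character classes taken from the code table in this section.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[GA]", "Y": "[TC]", "K": "[GT]", "M": "[AC]",
    "S": "[GC]", "W": "[AT]", "B": "[GTC]", "D": "[GAT]",
    "H": "[ACT]", "V": "[GCA]", "N": "[AGCT]",
    "*": "[AGCT]*",  # anything, no spacing limit
}


def find_sites(pattern, sequence):
    """Return (position, matched text) for every match, overlaps included."""
    body = "".join(IUPAC[c] for c in pattern.upper())
    # A lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + body + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


print(find_sites("GTCTNNAC", "AAGTCTAGACTT"))
```

A character outside the table raises a KeyError, which is a simple way to reject unsupported codes in this sketch.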

  2. Genomic Sequence Retrieval

    Genomic Sequence Retrieval is a tool that allows the user to retrieve genomic sequences. At present, only Homo sapiens, Mus musculus and Rattus norvegicus genomes are supported.

    Species:
    Species supported as indicated.
    Gene ID:
    Either Gene ID or Genename can be used here.
    Regions:
    If genename is input for search, one of four regions can be chosen:
    1. Upstream from TSS
    2. Upstream from the start codon
    3. Downstream from the stop codon
    4. 3' Untranslated Region

    Range:
    When there is a Gene ID input, the range of the region has to be specified. The range is offset from the start location of the gene.
    Email:
    For large searches, the email option is highly recommended. Our server will send an email to the input email address once the search is performed.

  3. Differential Binding Site Search

    Differential Binding Site Search is a tool that allows the user to compare the frequencies of the binding sites that are found in two sets of genes (named the regulated and control gene sets hereafter). This tool is intended for users who have done microarray experiments and would like to know which kinds of binding sites are enriched in their regulated gene set.

    Species:
    Note that only binding sites that go across all three species will be reported in this search.
    Control Gene Set / Regulated Gene Set:
    ONLY Gene ID can be used here.
     
    Binding Site Ratio:
    The ratio of the frequency of a binding site in the regulated gene set to its frequency in the control gene set, with each frequency divided by the number of genes input for that set.
    Regions/Range:
    Three options to choose from:
     
    Upstream from the start codon - Choosing this option will allow search to be performed in the defined region upstream of the start codon using TF binding sites and MicroRNA seed sequences that go across three mammalian species. For more information on the MicroRNA seed sequences, please refer to Lewis BP et al., Cell, Vol.115 787-798. The range for this option can be up to -10000bp away from the start codon. 
     
    Downstream from the stop codon - Choosing this option will allow search to be performed in the defined region downstream of the stop codon using TF binding sites and MicroRNA seed sequences that go across three mammalian species. For more information on the MicroRNA seed sequences, please refer to Lewis BP et al., Cell, Vol.115 787-798. The range for this option can be up to -2000bp away from the stop codon. 
     
    MicroRNA target sites in 3'UTR - Choosing this option will allow search to be performed in the 3'UTR regions using target sites discovered by miRanda algorithm that go across three mammalian species. For more information about miRanda algorithm, please refer to http://www.microrna.org/miranda_new.html . The range of this option is different from the previous two options. There are three options for the range (1 site or more, 2 sites or more, 3 sites or more), referring to the number of the same MicroRNA target sites needed in one 3'UTR of all three genomes to count as a hit. For example, choosing "2 sites or more" makes the search report the hits that contain 2 or more of the same MicroRNA target sites in each of the 3'UTRs for higher stringency.
     
    For details about the preprocessing for DBSS in general, please refer to our manuscript.
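
The stringency setting for the MicroRNA option can be read as a simple threshold test. The sketch below is an interpretation of the description above, not the DBSS preprocessing itself: the function name, the species keys and the counts are all hypothetical.

```python
def is_hit(site_counts_by_species, min_sites):
    """True if the 3'UTR in every species holds at least min_sites copies of the site."""
    # An empty mapping is never a hit.
    return bool(site_counts_by_species) and all(
        count >= min_sites for count in site_counts_by_species.values()
    )


# Hypothetical counts of one MicroRNA target site in the 3'UTR of a gene.
counts = {"human": 3, "mouse": 2, "rat": 1}

print(is_hit(counts, 1))  # "1 site or more"
print(is_hit(counts, 2))  # "2 sites or more"
```

With these counts the gene is a hit at "1 site or more" but not at "2 sites or more", because the rat 3'UTR carries only one copy.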
     

    When there is a Gene ID input, the range of the region has to be specified.
    Email:
    For large searches, the email option is highly recommended. Our server will send an email to the input email address once the search is performed.

  4. TFD Search

    TFD Search allows the user to query our database for a binding site sequence or transcription factor name of interest. TFD, the Transcription Factor Database, consists of >7000 binding sites from the literature. In order to keep the database current, we also provide an interface for binding site submission. New binding sites will be curated and entered into the database for use by other tools.

     
    Choose from either Sequence or Name and enter the search term:
    Sequence: enter the DNA sequence. The sequence has to be at least 4bp long. (* or N is allowed)
    Name: enter the name of the transcription factor.

 
 
Copyright © 2005 University of Colorado, Boulder. All Rights Reserved.