© 2002-2009, Dr. Michael Berry, Dr. Ramin Homayouni, Dr. Kevin Heinrich, Lai Wei, Elina Tjioe


Semantic Gene Organizer (SGO) is an automated method for clustering genes based on conceptual relationships derived from MEDLINE abstracts. It uses a variant of the vector-space model called Latent Semantic Indexing (LSI) to represent genes as vectors in a lower-dimensional (concept) space. The relationship between two genes is deduced from the cosine of the angle between their gene document vectors. A gene document is a concatenation of the MEDLINE titles and abstracts identified in the LocusLink entry for that gene.
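
As a rough illustration of the idea, the sketch below builds concept-space vectors for a toy collection by truncating a singular value decomposition of a term-by-document count matrix. It is only a minimal sketch of LSI in general, not the SGO implementation: the gene names, terms, counts, the choice of k = 2, and the function name lsi_gene_vectors are all made up for the example, and no term weighting is applied.

```python
import numpy as np

# Toy term-by-gene-document count matrix (rows: terms, columns: gene documents).
# Terms, gene names, and counts are hypothetical, chosen only for illustration.
terms = ["kinase", "phosphorylation", "apoptosis", "synapse", "neuron"]
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 4],
], dtype=float)


def lsi_gene_vectors(term_doc, k):
    """Return one k-dimensional concept-space vector per document (column)."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; scale document coordinates by them.
    return Vt[:k, :].T * s[:k]


gene_vecs = lsi_gene_vectors(counts, k=2)
for name, vec in zip(genes, gene_vecs):
    print(name, np.round(vec, 3))
```

In this toy collection GENE_A and GENE_B share vocabulary, as do GENE_C and GENE_D, so each pair ends up along its own concept dimension.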

IMPORTANT NOTE: This is a test version of the program, intended to demonstrate proof-of-concept for using LSI to functionally cluster genes. The current document collection contains only 50 hand-selected genes. In the near future, we will expand the collections to include all genes in the LocusLink and OMIM databases.


Quick Start

  1. Log in anonymously.
  2. Select the document collection (e.g. 50_Test_Genes) from the pulldown menu.
  3. Assign a session name (e.g. BW1) so that you can recall your searches at a later time.
  4. Select accession number or keyword option from the pulldown menu.
  5. Type in (or cut and paste) a list of accession numbers or keywords for your query. A keyword query can be a string of words separated by spaces. Separate each keyword query from the next with a blank line.
  6. Press Submit.
  7. In the next window, your queries will appear in the upper right panel. Click any query to view the relevant genes in ranked order in the bottom panel.
  8. Click a gene symbol or score to view, in the upper left panel, the titles and abstracts in the document associated with that gene. Click the Gene ID (LocusLink) beside the gene symbol to go directly to the LocusLink page in a new window.
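
The keyword input format from step 5 can be illustrated with a short sketch. This is not the code SGO runs; the function name parse_keyword_queries is hypothetical, and treating consecutive non-blank lines as part of the same query is an assumption made for the example.

```python
def parse_keyword_queries(text):
    """Split pasted text into keyword queries separated by blank lines.

    Assumption: consecutive non-blank lines belong to the same query.
    """
    queries = []
    current = []
    for line in text.splitlines():
        words = line.split()
        if words:
            # Non-blank line: its words extend the query being built.
            current.extend(words)
        elif current:
            # Blank line: close the query collected so far.
            queries.append(" ".join(current))
            current = []
    if current:
        queries.append(" ".join(current))
    return queries


pasted = "alzheimer disease\n\ncell cycle\ncheckpoint\n\n\napoptosis\n"
queries = parse_keyword_queries(pasted)
print(queries)
```

Here the pasted text yields three separate queries, each of which would be scored against the gene documents on its own.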



Queries: The "accession number" query uses the entire gene document to construct a vector, whereas the "keyword" query constructs a vector using only the terms provided in the query. An accession number query is therefore the more accurate way to identify gene-gene relationships, while a keyword query lets you identify genes related to a specific topic. The keyword option is useful for identifying genes associated with diseases, phenotypes, etc.

Scores: The score each gene receives is based on the cosine of the angle between the vectors constructed for the query and the corresponding gene document. Since this is a cosine measure, all scores are in the range [-1, +1] where a "perfect" score is +1.
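
The scoring and ranking described above can be sketched in a few lines. This is a minimal illustration rather than the SGO code: the vectors, gene names, and the function names cosine_score and rank_genes are hypothetical, and returning 0.0 for a zero-length vector is an assumption made to avoid division by zero.

```python
import math


def cosine_score(query_vec, gene_vec):
    """Cosine of the angle between two equal-length vectors, in [-1, +1]."""
    dot = sum(q * g for q, g in zip(query_vec, gene_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_g = math.sqrt(sum(g * g for g in gene_vec))
    if norm_q == 0 or norm_g == 0:
        return 0.0  # Assumption: a zero vector scores 0 against everything.
    return dot / (norm_q * norm_g)


def rank_genes(query_vec, gene_vecs):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scored = [(name, cosine_score(query_vec, vec)) for name, vec in gene_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dimensional concept-space vectors.
gene_vecs = {
    "GENE_A": [1.0, 0.0],   # same direction as the query
    "GENE_B": [0.0, 1.0],   # orthogonal to the query
    "GENE_C": [-1.0, 0.0],  # opposite direction
}
ranking = rank_genes([2.0, 0.0], gene_vecs)
for name, score in ranking:
    print(name, score)
```

Because the cosine depends only on direction, the query's length does not affect the score: GENE_A scores +1 even though the query vector is twice as long.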

Weighted Terms: The abstract window contains the top 100 terms that are extracted from the gene document. These terms may be useful in identifying synonyms and functional information for genes.

Note: If you're having trouble logging in, try closing your browser and then logging in again.