- How to get started?
Click File|Open to open either a tab-delimited or EXCEL file containing microarray data. A sample data file in EXCEL format is included with the program and should be found in the installation directory of AMADA. This is a subset of yeast data that I found at the Stanford web site. If you are going to use your own file, please note two things. First, the gene names should be unique. Second, the gene names should not be longer than 254 characters. The interface looks and feels like EXCEL, so you can immediately explore its functions. Functions particularly relevant to the analysis of microarray data are under the Data and Analysis menus.
- How to standardize data?
Data standardization means transforming the data so that the mean = 0 and the variance = 1. You may transform the data in the active worksheet in two ways. Clicking Data|Standardize values in rows transforms your data so that the values in each row have mean = 0 and variance = 1. Clicking Data|Standardize values in columns transforms your data so that the values in each column have mean = 0 and variance = 1. If you are very cautious, you may use the Average and Stdev functions to verify the transformation. These two functions are used in the same way as in EXCEL, e.g., =average(b2:b9), =Stdev(b2:b9).
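The transformation can be sketched outside AMADA as follows. This is a minimal illustration, not AMADA's own code: the function names are hypothetical, and it assumes the sample standard deviation (n - 1 in the denominator), which is what EXCEL's Stdev function computes.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1), as in EXCEL's STDEV
    if s == 0:
        raise ValueError("cannot standardize a constant series")
    return [(x - m) / s for x in values]

def standardize_rows(matrix):
    """Standardize each row (e.g., each gene) independently."""
    return [standardize(row) for row in matrix]

def standardize_columns(matrix):
    """Standardize each column (e.g., each array) independently."""
    columns = [standardize(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

# Made-up example values, three rows by four columns.
data = [
    [1.0, 2.0, 3.0, 6.0],
    [2.0, 4.0, 4.0, 8.0],
    [0.5, 1.5, 7.0, 3.0],
]
by_row = standardize_rows(data)
by_col = standardize_columns(data)
```

Applying mean and stdev to any row of by_row, or to any column of by_col, should give 0 and 1 up to rounding error, which is the same check as the Average and Stdev verification described above.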
- How to do an integrated analysis of microarray data?
Here is a brief tutorial. Open a file (an EXCEL, tab-delimited, or comma-delimited file) that contains your microarray data. It might be a good idea to start with the sample data file (yeast200.xls) included in the AMADA distribution. The "200" means that it includes only the first 200 loci. It should be found in the installation directory of AMADA. This is a subset of yeast data that I found at the Stanford web site. Click Analysis|Clustering. A dialog box will appear for you to specify options. The yeast200.xls file contains no missing values, so you should uncheck the Has missing value checkbox (it is checked by default). Choose one of the seven implemented distance/similarity measures and one of the three clustering algorithms, then click the OK button.
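To give a feel for what a distance/similarity measure does, here is a small sketch of two measures commonly used for expression profiles: Euclidean distance and a distance based on the Pearson correlation. Whether these are among the seven measures implemented in AMADA is an assumption. The function names and the gene names and values are made up for illustration.

```python
from math import sqrt

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for opposite ones."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / sqrt(va * vb)

def distance_matrix(profiles, measure):
    """All pairwise distances, keyed by (gene, gene)."""
    names = list(profiles)
    return {(p, q): measure(profiles[p], profiles[q]) for p in names for q in names}

# Hypothetical profiles: geneB has the same shape as geneA, geneC is reversed.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
d_pearson = distance_matrix(profiles, pearson_distance)
d_euclid = distance_matrix(profiles, euclidean)
```

The two measures can disagree: geneA and geneB have the same shape but different magnitudes, so the correlation-based distance treats them as identical while the Euclidean distance does not. This is why the choice of measure can change the resulting clusters.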
AMADA will now carry out the cluster analysis and expose additional functions once the analysis is finished. A progress bar indicates the progress of the execution. After a few seconds or a few minutes, depending on the speed of your computer and the combination of distance measure and clustering algorithm you have chosen, a graphic window appears displaying a huge bifurcating tree resulting from the cluster analysis. At the beginning, only the root node is displayed, and you need to left-click this node to expand the tree. The tree is scrollable and clickable. Left-clicking a node expands or collapses the tree below it, just as in Windows Explorer.
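The bifurcating tree arises because the clustering joins two groups at a time until a single root remains. The sketch below uses average linkage with Euclidean distance to show the idea. Both choices are assumptions, since the three algorithms offered by AMADA are not named here, and the gene names and values are made up.

```python
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leaves(tree):
    """List the gene names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def average_linkage(profiles):
    """Repeatedly merge the two closest clusters until one tree remains."""
    clusters = list(profiles)  # start with one cluster per gene

    def dist(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
        return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (clusters[i], clusters[j])  # a new internal node with two children
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

# Hypothetical profiles: two tight pairs that are far from each other.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 3.2],
    "geneC": [9.0, 8.0, 7.0],
    "geneD": [9.5, 8.5, 6.5],
}
tree = average_linkage(profiles)
```

Each internal node of the result is a pair of subtrees, which corresponds to a node you can expand or collapse in the tree window. Calling leaves on any node lists the loci grouped under it.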
When you click the right mouse button, however, a popup menu appears with three items, which can also be accessed from the menu bar under the Tree menu: Show subtree, Expression Plot, and Node properties. Choosing the Show subtree menu item displays the subtree rooted at the selected node in a versatile tree-viewing window, which can manipulate the subtree and print it in high quality. Choosing the Expression Plot menu item displays the expression profiles of the loci clustered under the selected node, i.e., loci with similar expression profiles. Choosing the Node properties menu item shows basic properties of the node, such as how far it is from the tip and how many nodes are grouped under it.
The clustered genes are co-expressed genes, but not necessarily co-regulated genes. In order to confirm whether the co-expressed genes are indeed co-regulated, it might help to obtain the DNA sequences upstream and downstream of the coding sequences. These 5'-flanking and 3'-flanking regions are likely regulatory sequences. If the co-expressed genes also share a high similarity in the 5'-flanking sequences, then very likely they are co-regulated genes.
AMADA can retrieve the flanking sequences of the clustered genes, and you might have already noted the Retrieve sequences button when you right-click a node on the tree. Click a node and retrieve the flanking sequences of the clustered genes. The retrieval might be slow. Once it is done, the sequences are presented in a window captioned Sequence Tool. You should align the sequences by clicking Sequence|Align sequences. This will produce statistics to show whether the alignment is statistically significant, i.e., whether the similarity is significantly higher than expected for random sequences. You can save the sequences in one of 18 commonly used sequence formats for further analysis. Note that my program DAMBE implements an extensive collection of tools for sequence analysis.
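The idea of testing similarity against random sequences can be illustrated with a simple shuffling test. This is only a sketch of the general principle, not the statistic AMADA reports: it assumes two sequences of equal length compared without gaps, and the function names and sequences are hypothetical.

```python
import random

def identity(a, b):
    """Fraction of positions with the same base (equal lengths, no gaps)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def permutation_p_value(a, b, n_shuffles=1000, seed=0):
    """Share of shuffled versions of b that match a at least as well as b does."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = identity(a, b)
    bases = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # keeps base composition, destroys the order
        if identity(a, bases) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Made-up flanking sequences that differ at two positions.
seq1 = "ACGTTGCAAGCTTAGGCTAACGTACGATCGATTGCA"
seq2 = "ACGTTGCAAGCTTAGGCTAACGTACGATCGTTTGCT"
p = permutation_p_value(seq1, seq2)
```

A small p suggests that the observed similarity is unlikely to arise from base composition alone.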
- How to do a principal component analysis (PCA)?
Principal component analysis is one of the multivariate techniques for summarizing a large data matrix in a few manageable dimensions. It is also useful for visualizing similarities in expression profiles among loci.
To perform a PCA, click Analysis|Principal component analysis. A dialog box appears for you to specify options. Again, if you are using the yeast200.xls file, uncheck the Has missing value checkbox because there are no missing values in the data set. Choose whether to run PCA on a correlation or a variance-covariance matrix, how many principal components to retain, whether to have graphic output (i.e., plots of principal component scores), and whether to standardize the principal component scores. Click the OK button and AMADA will perform PCA in a few seconds, outputting the correlation or variance-covariance matrix, eigenvalues, eigenvectors, and principal component scores.
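To show how these outputs relate to one another, here is a minimal sketch that computes the variance-covariance matrix, the largest eigenvalue with its eigenvector, and the scores on the first principal component. It uses power iteration for simplicity. How AMADA computes eigenvalues is not stated here, so this is an assumption, and the data values and function names are made up.

```python
from math import sqrt

def covariance_matrix(rows):
    """Return the variance-covariance matrix and the mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def first_component(cov, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    p = len(cov)
    # Assumes this start vector is not orthogonal to the leading eigenvector.
    v = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Hypothetical data: four loci (rows) measured in two conditions (columns).
rows = [[2.0, 1.9], [4.0, 4.2], [6.0, 5.8], [8.0, 8.1]]
cov, centered = covariance_matrix(rows)
eigenvalue, vector = first_component(cov)
# Score of each locus on the first principal component.
scores = [sum(c[j] * vector[j] for j in range(len(vector))) for c in centered]
```

The variance of the scores equals the eigenvalue, which is why eigenvalues are read as the amount of variation each component explains. Running the same steps on standardized columns corresponds to choosing the correlation matrix instead of the variance-covariance matrix.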
Once PCA is finished, you may choose to plot the first few principal component scores to have a visual inspection of the similarities among loci. AMADA provides a high-quality graphic window for this purpose with many functions for graphic manipulation.