2014-05-18

Page edited by Olivia W. Lang

Changes between revision 11 and revision 12:

...

h1. CS251: Project5

CREATIVE-(+C{+}olorful +R{+}otatable +E{+}igen +A{+}nalysis +T{+}hat +I{+}ntroduces +V{+}isual +E{+}vidence)

{color:#663333}{+}Write-up{+}{color}{color:#663333}: write a brief description of how you implemented the PCA algorithm and modified your Data and Application classes. Incorporate screen shots showing a visualization of the provided data set and another data set of your choice.{color}

This project uses the *analysis.pca* method written in lab to let the user generate and plot PCA data. Below is a summary of the additions to the *display.py* file. The project also uses *PCAData*, the new class created in lab in the *data.py* file.

{panel}

h5. New Class

PCADialog \--> a new class, called by the *DisplayApp*, that calculates, returns, and updates the PCAs chosen by the user and displays their eigenvectors and eigenvalues.

h5. Fields

savPCA \--> list that stores the *PCAData* objects created in the *PCADialog*; these are extracted and stored in the *DisplayApp*.

savPCAtit \--> dictionary that maps the title of a *PCAData* object to the index in the *savPCA* list where that *PCAData* object is saved.
{panel}

h4. Part1&Extension3: user-chosen data to execute and store PCA analysis (PCADialog)

To choose the data and execute the PCA analysis, I decided to control everything from a single dialog window. Similar to the *DimensionDialog*, I created a child of the parent class *Dialog* called *PCADialog*. When Cmd-P is pressed or the "PCAanalysis" option is chosen from the menu, the *PCADialog* is opened. Like the normal *Dialog* class, it is initialized with a parent, and it also takes the current data object (*dobj*), the list of saved PCAs (*savPCA*), and a dictionary linking the name each PCA analysis was saved under to its position in the list of saved PCAs (*savPCAtit*). The saved names are also loaded into the *Listbox* widget *listBoxSaved*.

An extension was to allow the user to pick the columns to use for the PCA. To support this, I put all of the columns of the data object into a *Listbox* widget called *listBoxCol*. I also created an *Entry* widget called *titleEntry* for naming each PCA calculated. If no title has been entered, or the title entered already exists in the list of saved PCAs, a message is printed in the terminal explaining the problem with the contents of *titleEntry*.

The "Add this PCA" button takes the information from the *listBoxCol* and *titleEntry* widgets to create a PCA object stored in *savPCA* and a title stored in both *savPCAtit* and *listBoxSaved*. I also included "Remove this PCA" and "Show Eigens" buttons. Below is the code for the handler method of "Add this PCA", *addPCA*.

{code}
def addPCA(self):
    # retrieve the title from the Entry widget
    tit = self.titleEntry.get()
    # check for an empty title
    if tit == '':
        print('gimme a name')
        return
    # check for a repeated title
    if tit in self.savPCAtit:
        print('name already used')
        return
    # add the title to listBoxSaved
    self.listBoxSaved.insert(tk.END, tit)
    # create the list of colids from the selections in self.listBoxCol
    # (self.hed is a field of all the headers of the current data object)
    colids = []
    for i in self.listBoxCol.curselection():
        colids.append(data.DataColId(self.data_obj, self.hed[int(i)]))
    # pca analysis and update fields
    pcadobj = analysis.pca(colids, False)
    self.savPCAtit[tit] = len(self.savPCA)
    self.savPCA.append(pcadobj)
    # clear selections and select the new entry (for formatting purposes)
    self.listBoxSaved.selection_clear(0, tk.END)
    self.listBoxSaved.selection_set(tk.END)
{code}
I first checked that the title extracted from *titleEntry* was appropriate before storing it in the *Listbox* of saved PCAs. Because the *analysis.pca* method takes in *DataColId* objects, I built a list of these objects from the user-selected columns in *listBoxCol*. I finished by calling the method, updating the appropriate fields, and executing the formatting commands.

The "Remove this PCA" button calls a simple method, *removePCA*, to remove the selected PCA object from *listBoxSaved* and the rest of the fields. Below is the code for this method.

{code}
def removePCA(self):
    # do nothing if no saved PCA is selected
    sel = self.listBoxSaved.curselection()
    if not sel:
        return
    # save the index of the PCA to be removed and use it to remove from the fields and Listbox
    rmv = int(sel[0])
    self.listBoxSaved.delete(rmv)
    del self.savPCA[rmv]
    # the dictionary savPCAtit must be rebuilt because the later indices shift down
    newdict = {}
    for g in range(self.listBoxSaved.size()):
        newdict[self.listBoxSaved.get(g)] = g
    self.savPCAtit = newdict
{code}
The "Show Eigens" button is explained in Part3 below.
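
The bookkeeping behind *addPCA* and *removePCA* can also be sketched without Tkinter. This is a minimal illustration, not the project's code: the class name PCARegistry and its methods are hypothetical, and plain strings stand in for the *PCAData* objects. It keeps a list and a title-to-index dictionary in step, and rebuilds the dictionary after each removal because every later index shifts down by one.

```python
class PCARegistry(object):
    """Hypothetical stand-in for the savPCA list and savPCAtit dictionary."""

    def __init__(self):
        self.titles = []     # titles in insertion order (mirrors listBoxSaved)
        self.objects = []    # saved analyses (mirrors savPCA)
        self.index_of = {}   # title -> position in self.objects (mirrors savPCAtit)

    def add(self, title, obj):
        """Store obj under title; return an error message, or None on success."""
        if title == '':
            return 'title is empty'
        if title in self.index_of:
            return 'title already used'
        self.index_of[title] = len(self.objects)
        self.objects.append(obj)
        self.titles.append(title)
        return None

    def remove(self, position):
        """Drop the entry at position and rebuild the title -> index map."""
        del self.objects[position]
        del self.titles[position]
        self.index_of = dict((t, i) for i, t in enumerate(self.titles))


registry = PCARegistry()
registry.add('first', 'pca-A')
registry.add('second', 'pca-B')
registry.add('third', 'pca-C')
registry.remove(0)
print(registry.index_of['third'])  # 'third' moved from index 2 to index 1
```

Returning a message instead of printing it is a choice made for this sketch only; it makes the two rejection cases easy to check.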

h4. Part2&Extension4&5: user-selected plots of original and PCA projected data

With the PCA analysis completed, the next step is to plot the data it generated. For this, I went back to the *DimensionDialog* from Project 4. In the *DimensionDialog* class, I added the *savPCA* and *savPCAtit* input variables to the constructor. They are saved into fields, and each of the PCA options is added to *list_comp*, a list of all the possible data columns for each dimension of data that can be plotted on the screen (x, y, z, size, and color). Each PCA entry is formatted as the string "PC" followed by the PCA object's title, a space, and then the header the PCA column is associated with. The code below shows the format.

{code}
# adding PCA data to list_comp (which already includes the original columns)
# invert the title dictionary so each savPCA index maps back to its title
# (dictionary key order is not guaranteed to match the order of savPCA)
pcatitles = dict((idx, tit) for tit, idx in self.savPCAtit.items())
# for every PCA object...
for p in range(len(self.savPCA)):
    # store the PCAData object and all its headers
    pdobj = self.savPCA[p]
    pcahed = pdobj.get_data_headers()
    # for each column of the current PCAData object...
    for i in range(len(pcahed)):
        # add a string in my PCA column format to the list of Listbox components
        entry = 'PC' + pcatitles[p] + ' ' + pcahed[i]
        self.list_comp.append(entry)
# eventually the self.list_comp components will be added to each of the 5 dimensions' Listboxes
{code}

When adding the contents of *list_comp* to each dimension's *Listbox*, I decided to omit the PCA options from the color and size dimensions. I figured it would be more valuable to see the PCA values on x, y, and z, since information shown through size and color could not be compared very well to a similar graph of the original data values. This only took a simple *if-else* statement.
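
That *if-else* can be sketched as a small function. This is an illustration under assumptions, not the project's code: the function name options_for_dimension, the dimension names, and the sample headers and labels are all hypothetical.

```python
PCA_DIMENSIONS = ('x', 'y', 'z')  # dimensions that may show PCA columns


def options_for_dimension(dimension, original_headers, pca_labels):
    """Return the Listbox entries offered for one plotting dimension."""
    if dimension in PCA_DIMENSIONS:
        return list(original_headers) + list(pca_labels)
    else:
        # size and color only offer the original data columns
        return list(original_headers)


headers = ['height', 'weight']                   # made-up original columns
labels = ['PCrun1 height', 'PCrun1 weight']      # made-up PCA entries
for dim in ('x', 'y', 'z', 'size', 'color'):
    print(dim, options_for_dimension(dim, headers, labels))
```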

Since the data extracted from *DimensionDialog* are just strings, we must go back to the *DisplayApp* class to check that they are interpreted properly. The 5 strings returned in the *DimensionDialog*'s field *hed* determine which data columns to store for plotting. To allow for the PCA column selection, I modified the portion of the method that takes *self.curHed* and creates a list of *DataColId* objects. Below is a snippet from the *handleChooseAxes* method showing how I did this:

{code}
# Modifying the code storing column ids from curHed. da is the DimensionDialog object.
self.curHed = da.hed

self.dataCols = []
for ch in self.curHed:
    # check if PC object
    if ch[0:2] == 'PC':
        savidx = self.savPCAtit[ch[2:ch.index(' ')]]
        # use the PCAData object instead of the regular Data object.
        pdo = self.savPCA[savidx]
        # because the PCA columns are headed with PC# before the original columns,
        # I subtract the number of PCA columns from the header index to get the
        # header's corresponding PCA index.
        hedidx = pdo.header2raw[ch[ch.index(' ') + 1:]] - len(pdo.get_pca_headers())
        self.dataCols.append(data.DataColId(pdo, pdo.raw_headers[hedidx]))
    # the else case handles the original data. (contents are copied from Proj4 code)
    else:
        self.dataCols.append(data.DataColId(self.data_obj, ch))
{code}

Once the *DataColId* objects were handled, they were put into a numpy matrix, so no further edits to the *display* code were needed. The code from Project4 handled the rest without any major modifications.
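
The label format and the way *handleChooseAxes* splits it can be illustrated in isolation. The helper names make_pca_label and parse_pca_label are hypothetical and do not appear in the project. As in the snippet above, the title is taken to end at the first space, so this sketch assumes that titles contain no spaces and that no original header begins with "PC".

```python
def make_pca_label(title, header):
    """Build a Listbox entry such as 'PCrun1 height'."""
    return 'PC' + title + ' ' + header


def parse_pca_label(label):
    """Split a label back into (title, header); return None for other entries."""
    if not label.startswith('PC') or ' ' not in label:
        return None
    space = label.index(' ')
    return label[2:space], label[space + 1:]


label = make_pca_label('run1', 'petal length')
print(parse_pca_label(label))     # ('run1', 'petal length')
print(parse_pca_label('height'))  # None
```

Only the first space separates the title from the header, so headers that themselves contain spaces survive the round trip.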

h4. Part3: "Show Eigens" button opens new window with saved Eigen values

As introduced above, the "Show Eigens" button in the *PCADialog* opens a new *Toplevel* widget which shows the eigen data for the PCA selected in the *listBoxSaved* widget. Below is the code for the handler method of the button.

{code}
def showEigens(self):
    # use the index of the savPCA list to get the information that will be printed
    cur = int(self.listBoxSaved.curselection()[0])
    pcadobj = self.savPCA[cur]
    eigenvalues = pcadobj.get_eigenvalues()
    eigenvectors = pcadobj.get_eigenvectors()
    pcaheaders = pcadobj.get_pca_headers()
    dataheaders = pcadobj.get_data_headers()
    # initialize and title the Toplevel widget. Add basic labels
    top = tk.Toplevel()
    top.title("Eigens:" + self.listBoxSaved.get(cur))
    tk.Label(top, text="eigen vals:").grid(row=1, column=0)
    tk.Label(top, text=" ").grid(row=2, column=0)
    tk.Label(top, text="eigen vectors:").grid(row=3, column=0)
    for i in range(len(pcaheaders)):
        # headers
        heading = pcaheaders[i] + '(' + dataheaders[i] + ')'
        tk.Label(top, text=heading).grid(row=0, column=i+1)
        # evals
        tk.Label(top, text=eigenvalues[i]).grid(row=1, column=i+1)
        # evecs
        for v in range(len(pcaheaders)):
            tk.Label(top, text=eigenvectors[v,i]).grid(row=3+v, column=i+1)
{code}
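
The grid layout built by *showEigens* can be checked without opening a window. The sketch below is hypothetical: eigen_grid_cells is a made-up name, the headers and numbers are invented sample values, and nested lists stand in for the numpy eigenvector matrix (indexed as eigenvectors[v][i] rather than eigenvectors[v,i]). It only computes the (row, column, text) cells that the handler would place with grid.

```python
def eigen_grid_cells(pca_headers, data_headers, eigenvalues, eigenvectors):
    """Return (row, column, text) cells in the layout used by the dialog."""
    cells = [(1, 0, 'eigen vals:'), (2, 0, ' '), (3, 0, 'eigen vectors:')]
    n = len(pca_headers)
    for i in range(n):
        # row 0: column heading, e.g. 'PC0(height)'
        cells.append((0, i + 1, pca_headers[i] + '(' + data_headers[i] + ')'))
        # row 1: eigenvalue for this column
        cells.append((1, i + 1, str(eigenvalues[i])))
        # rows 3 and below: entries eigenvectors[v][i]
        for v in range(n):
            cells.append((3 + v, i + 1, str(eigenvectors[v][i])))
    return cells


# invented sample values for two components
cells = eigen_grid_cells(['PC0', 'PC1'], ['height', 'weight'],
                         [2.5, 0.5], [[0.6, 0.8], [-0.8, 0.6]])
for cell in sorted(cells):
    print(cell)
```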

h4. Part4: cool name using acronym

+C{+}olorful

+R{+}otatable

+E{+}igen

+A{+}nalysis

+T{+}hat

+I{+}ntroduces

+V{+}isual

+E{+}vidence

