Thursday, March 31, 2016
This past week I worked on preparing submission material for ICPC 2016. We submitted results from the machine learning algorithms we ran on the ABB fixation data using Microsoft Azure. Our main research question was whether we could predict software developer expertise from fixation gaze data. We also wanted to determine which parameters and machine learning algorithms work best for predicting developer expertise. Five different two-class machine learning algorithms gave us a prediction accuracy above 70%, with Neural Networks and SVM performing best.
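For the curious, here is a rough sketch of what the two-class setup looks like. The real experiments ran in Microsoft Azure, so this MATLAB version is only illustrative, and the file name, feature names, and label column are made up for the example:

```matlab
% Hypothetical feature export; the real pipeline lived in Azure ML.
data = readtable('fixation_features.csv');
X = data{:, {'meanFixationDuration', 'fixationCount', 'regressionRate'}};
y = data.isExpert;    % 1 = expert, 0 = novice (assumed labels)

% Train a two-class SVM and estimate accuracy via 10-fold cross-validation.
model = fitcsvm(X, y, 'Standardize', true);
cv = crossval(model, 'KFold', 10);
fprintf('Cross-validated accuracy: %.1f%%\n', (1 - kfoldLoss(cv)) * 100);
```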
I wrote the abstract and the first draft of the short paper we submitted for ICPC. I then worked with Dr. Sharif to refine some of the sections.
Thursday, March 24, 2016
Week 29: 3/17/2016-3/24/2016
This past week I worked more on ScanMatch.
I discovered that the filename field in our data isn't always correct, so I used the fullyQualifiedName field to reconstruct the filename. After rerunning the ScanMatch analysis on the corrected filenames, the average pairwise score for each participant increased slightly. I also continued designing and implementing a method-level analysis with ScanMatch. First, I removed all gazes that did not fall inside a method declaration. Then I compared the sequence data for each pair of participants using a similarity score based on the two methods each gaze fell into (I worked with Dr. Sharif to devise the similarity scheme). I have the first results for the method-level analysis of task 2, and I am double-checking them before running the rest of the data for tasks 3 and 4.
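The filename fix is conceptually simple. Here is a minimal sketch of the idea, assuming Java-style fully qualified names where the class segment comes right before the member name; the exact parsing rule in our pipeline may differ:

```matlab
function filename = filenameFromFQN(fqn)
% Recover a source filename from a fullyQualifiedName such as
% 'org.example.Parser.parse'. Assumes the class segment directly
% precedes the member name and a .java extension (both assumptions).
    parts = strsplit(fqn, '.');
    if numel(parts) >= 2
        filename = [parts{end-1} '.java'];  % class segment + assumed extension
    else
        filename = [parts{1} '.java'];
    end
end
```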
We are also working on submitting a paper to ICPC 2016 this weekend.
Thursday, March 17, 2016
Weeks 27 and 28: 3/3/2016-3/17/2016
YSU was on spring break last week, so this entry covers two weeks. Over that time I made changes to the VISSOFT website and continued working on our ScanMatch analysis.
For ScanMatch, I finished coding the SubMatrix and generating the sequence data, so the analysis runs much faster; it now takes about 6 hours per task. It compares sequences across all participants for each task, giving us three matrices of results (one per task) of pairwise comparisons between participants (the matrices are symmetric, so the results on one side of the diagonal mirror those on the other). Just by visually inspecting the result matrices, we found that the normalized ScanMatch scores are never higher than 0.5, and many are below 0.1 or 0.2 (unless a sequence is compared with itself, in which case the normalized score is 1). We attribute the low scores to the line-level scheme we use when comparing sequences (a sketch follows the list):
-If the two gazes are in the same file and on the same line, the score is 10 (the highest)
-If they are in the same file and 1 line apart, the score is 9
-If they are in the same file and 2 lines apart, the score is 8
-If they are in the same file and 3 lines apart, the score is 7
-If they are in the same file and 4 or more lines apart, the score is 5
At line-level precision, then, there simply isn't much correlation between the sequences.
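In code, the scheme above boils down to something like this (the zero score for gazes in different files is my reading, since the list only covers the same-file cases):

```matlab
function s = lineLevelScore(file1, line1, file2, line2)
% Substitution score for two gazes under the line-level scheme above.
% Different files score 0 -- an assumption; the list only covers
% the same-file cases.
    if ~strcmp(file1, file2)
        s = 0;
        return;
    end
    d = abs(line1 - line2);
    if d == 0
        s = 10;         % same line (the highest score)
    elseif d <= 3
        s = 10 - d;     % 9, 8, 7 for distances of 1, 2, 3 lines
    else
        s = 5;          % 4 or more lines apart
    end
end
```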
I have been modifying the current code to perform a method-level analysis, and I am working on the SubMatrix for that now. I will run the modified algorithm on the data to see whether participants at least looked at or inside the same methods in the same sequence.
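At the method level, the substitution score would look roughly like the sketch below. The values here are placeholders of my own, not the scheme we actually settled on:

```matlab
function s = methodLevelScore(method1, method2)
% Hypothetical method-level substitution score; placeholder values,
% not the final scheme.
    if strcmp(method1, method2)
        s = 10;     % both gazes fell into the same method
    elseif strcmp(strtok(method1, '.'), strtok(method2, '.'))
        s = 5;      % different methods of the same class (assumed)
    else
        s = 0;      % unrelated methods
    end
end
```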
I made minor changes to the VISSOFT website: I fixed some information for a program committee member, added an announcement for the keynote speaker, and removed the word Preliminary from the Call for Papers section.
Thursday, March 3, 2016
Week 26: 2/25/2016-3/3/2016
This past week I worked more on the ScanMatch analysis.
I finished writing the code to generate the sequences from our eye-tracking data, to generate scores for each AOI, and to run all of the sequences through ScanMatch as pairwise comparisons, outputting the scores to a matrix for each task.
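The pairwise comparison itself is a simple double loop over participants. This is only a sketch: seqs and alignScore are stand-ins for our actual sequence data and the ScanMatch scoring call, and the output filename is hypothetical:

```matlab
% seqs{i} holds participant i's AOI sequence for one task;
% alignScore is a stand-in for the ScanMatch scoring routine.
n = numel(seqs);
scores = zeros(n);              % preallocate the n-by-n result matrix
for i = 1:n
    for j = i:n                 % scores are symmetric, so compute one half
        s = alignScore(seqs{i}, seqs{j});
        scores(i, j) = s;
        scores(j, i) = s;       % mirror across the diagonal
    end
end
csvwrite('task2_scores.csv', scores);   % hypothetical output file
```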
Unfortunately, I ran into issues with the SubMatrix being too large for Matlab, because we have a large number of AOIs. I ended up computing SubMatrix scores on the fly instead of precomputing and storing them for later look-up.
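To see why the precomputed SubMatrix blew up, a back-of-the-envelope estimate helps (the N = 50,000 here is hypothetical, just to show the scale):

```matlab
% With N unique AOIs, the full substitution matrix needs N^2 doubles.
% At a hypothetical N = 50,000 that is 50000^2 * 8 bytes = 20 GB,
% far more memory than our machine could allocate.
% SubMatrix = zeros(N);    % this is the kind of allocation that fails

% On-the-fly alternative: compute each score only when the alignment
% asks for it (lineLevelScore is the scheme sketched earlier).
subScore = @(a, b) lineLevelScore(a.file, a.line, b.file, b.line);
% ...then call subScore(aoiA, aoiB) inside the alignment's inner loop
% instead of indexing into a precomputed SubMatrix.
```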
Right now the code takes an enormous amount of time to run in Matlab; I estimated the analysis would take at least a couple of days, if not more, at its current speed. I am now optimizing my code and the supporting data to make the calculations faster. Worst case scenario, I will let the code run all Spring Break to get results, or rewrite it in R.