Sorry complete n00b here:
I am using ye olde LCQ DECA plus, a low-res ion trap from Thermo, with a conventional HPLC, doing untargeted metabolomics on GMO plant samples that have purposefully altered metabolisms (often randomly so, with heterologous secondary metabolic genes).
I was originally doing all the data analysis by hand, but as the sets got larger this became extremely impractical, time consuming, and generally painful, so I switched over to xcms using mostly some pre-made recipes. Things seemed to be working well for a while, but we recently got a small data set, so a coworker and I attacked it both by hand and with xcms so that we could compare results. We were happy that xcms found a few things that we didn’t find by hand, but were appalled that it was missing some very significant, critical features that we did discover by hand. Xcms is a powerful tool, so I have no doubt that if I set it up correctly it would find everything. I have flipped through the manual and played around with many settings (my findings are outlined below), but I still cannot get the behavior from xcms that I would like, which is why I am here imposing on the experts.
My general xcms workflow is very light. I pretty much only use the “xcmsSet”, “group”, and “retcor” functions before outputting a “diffreport” and going on from there by hand. So the first thing that I need to debug is my use of xcmsSet. Traditionally I just started with:
Code:
A<-xcmsSet(step=0.5)
which seemed to give fairly good results until we did that comparison with hand analysis (and yes, this ion trap’s resolution and consistency are so low that the “step=0.5” is unfortunately representative of the data and thus very necessary). Most of the significant features that we’re missing relative to the hand analysis had two things that I thought might be causing xcms to miss them: 1) they were very narrow despite their significant height and our old failing HPLC (their peak half-widths were under 3 seconds), and 2) they only appeared in one set of plant constructs, so they were only represented in about 10% of the traces within one group (which ironically means they are far more important than features common across more of the traces). So for issue 1) it seemed like the solution would be to play with the “fwhm” variable in “xcmsSet”, and for issue 2) it seemed like the solution would be to play with the “minfrac” variable in the “group” function.
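To make issue 2) concrete, here is a minimal base-R sketch of the arithmetic behind “minfrac”. It assumes the documented meaning of the parameter (the minimum fraction of samples in at least one sample class that must contain the peak) and the documented default of 0.5. The sample counts are made up for illustration and `passes_minfrac` is a hypothetical helper, not an xcms function.

```r
# Made-up numbers: one sample class with 20 traces, feature seen in only 2.
n_in_class  <- 20
n_with_peak <- 2
frac_present <- n_with_peak / n_in_class   # fraction of the class showing the peak

# Hypothetical helper: would a feature with this fraction survive grouping?
passes_minfrac <- function(frac, minfrac) frac >= minfrac

keep_default <- passes_minfrac(frac_present, 0.5)    # assumed default minfrac
keep_lowered <- passes_minfrac(frac_present, 0.05)   # a lowered, illustrative value
```

Under these assumptions a feature present in about 10% of one class is dropped at the default and only kept once “minfrac” is set below that fraction.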
So I tried
Code:
A<-xcmsSet(fwhm=3, step=0.5)
but while the eventual report now contained the missing critical narrow peaks, it lost even more of the critical thicker peaks, and also caused a large number of problems with grouping and quasi-redundant peaks which I will get to in a bit.
So I tried something in between like:
Code:
A<-xcmsSet(fwhm=10, step=0.5)
This gave a somewhat cleaner final output, but it can miss both some of the narrowest peaks and some of the thickest.
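For a sense of scale on these “fwhm” values, here is a small sketch of how a full width at half maximum maps to the standard deviation of a Gaussian peak model. It assumes the matched filter derives its default sigma as fwhm / 2.3548, as the xcms documentation describes. `sigma_from_fwhm` is a hypothetical helper name, and 30 is included because it is the documented default.

```r
# For a Gaussian, FWHM = 2 * sqrt(2 * log(2)) * sigma, roughly 2.3548 * sigma.
sigma_from_fwhm <- function(fwhm) fwhm / (2 * sqrt(2 * log(2)))

fwhm_tried <- c(3, 10, 30)            # seconds: the two values above plus the default
sigma_tried <- sigma_from_fwhm(fwhm_tried)

# Ratio between the widest and narrowest model peak in this list.
width_ratio <- max(sigma_tried) / min(sigma_tried)
```

The filter widths here differ by a factor of ten, which may be part of why a single “fwhm” struggles to match both the narrowest and the thickest peaks.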
I don’t really understand what it means or does but I noticed that the “fwhm” variable can take a range so I tried a few things like:
Code:
A<-xcmsSet(fwhm=c(8,25), step=0.5)
but in addition to producing over 50 errors while building the xcmsSet object for certain data sets (it works fine for others, though), it apparently completely wrecks grouping: while grouping itself seems to proceed more or less normally, subsequent calls to the “retcor” function produce errors like:
Code:
Error in match.arg(method, getOption("BioC")$xcms$retcor.methods) :
‘arg’ should be one of “obiwarp”, “peakgroups”
One thing that kind of worked … kind of … was to set the signal to noise threshold (“snthresh” variable) low and then compensate by setting the “max” variable high so that there would be enough recounts of ions to cover all the extra noise looked at by xcms. So trying something like
Code:
A<-xcmsSet(fwhm=3, step=0.5, max = 20, snthresh=2.5)
This does find all the critical peaks identified by hand analysis, but it makes a horrible, terrible mess too. For starters, the resulting xcmsSet object is such a disaster that the group function either chokes or otherwise doesn’t do much, and the retcor function basically doesn’t work at all, either having no grouped peaks to work with or dropping out with multiple errors about more esoteric faulty aspects of the data that I don’t understand. Further, in addition to the tremendous amount of plain garbage/noise that this puts into the eventual diffreport, most of the features are significantly fragmented and appear to be tracked over completely unreasonable amounts of time. That is to say, if one takes the resulting diffreport and groups the data by retention time (“rtmed”) and then m/z, one finds that most real features are now registered as six or more features, with the real data binned seemingly at random between these semi-redundant peak entries. Together they form a sort of Gaussian, with the most intense and most frequently integrated version of the feature at the center, and the redundant copies getting less intense and less frequently used for integration the further they sit from that center. Worse still was the range over which xcms was tracking these semi-redundant fragment features. Looking at the actual data, most of the real peaks only varied by about 4 seconds over the whole data set, but for every feature the difference between “rtmin” and “rtmax” for each fragment peak was 2 to 4 minutes, even though each fragment was only a fraction of a second from its nearest faster or slower fragment neighbor of the same feature. This is somewhere between ugly and disastrous.
I kind of figured part of the issue might be setting the “max” variable so high, but if I drop it, important peaks disappear. It is pretty easy to understand why: the injection peak at the start of all my traces and the equilibration peak at the end are complex “ion rainbows”, so as “fwhm” and/or “snthresh” get smaller, the number of “features” xcms identifies in these ion rainbows explodes. Indeed, with the same setup as before:
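The check described above can be sketched in base R roughly as follows. This is only an illustration on a made-up table. It assumes the report has columns named mzmed, rtmed, rtmin and rtmax with times in seconds, and the 60-second cutoff is an arbitrary value chosen for the example.

```r
# Made-up rows mimicking the fragmented output (retention times in seconds).
report <- data.frame(
  mzmed = c(301.5, 301.5, 301.6, 455.0),
  rtmed = c(420.4, 420.0, 420.9, 900.0),
  rtmin = c(330, 310, 350, 898),
  rtmax = c(520, 540, 500, 902),
  stringsAsFactors = FALSE
)

# Order by retention time, then m/z, so near-duplicate features sit together.
report <- report[order(report$rtmed, report$mzmed), ]

# Retention-time span over which each feature was tracked.
report$rtspan <- report$rtmax - report$rtmin

# Arbitrary cutoff for this example: flag spans far beyond a few seconds of drift.
suspicious <- report$rtspan > 60
n_suspicious <- sum(suspicious)
```

In this toy table the three fragments near m/z 301.5 are flagged and the clean feature at m/z 455 is not.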
Code:
A<-xcmsSet(fwhm=3, step=0.5, max = 20, snthresh=2.5)
well over 75% of my features end up being in the two “ion rainbows” in my diffreport, and I basically start all my analysis by deleting 75% or more of my “data” in these locations. The obvious thing to try to compensate for this was the “scanrange” variable. But any time I try something like:
Code:
C<-xcmsSet(step=0.5, scanrange = c(40,700))
I just get the following error:
Code:
Sample1: Error in .local(object, ...) :
unused argument(s) (scanrange = c(40, 700))
Perhaps this is for the best, as some of the only things that “retcor” seems to think are okay to lock onto for alignment are in these starting and finishing ion rainbows. This is a bit frustrating, though, as that is basically garbage (though perhaps it is reproducible garbage). There should be better peaks to track, as all the base plants are the same, so even before the addition of our internal standards to each sample, all the samples have a wide range of metabolites at many points along the chromatogram that are consistent across the whole data set, with a variability in retention time of only a couple of seconds, and yet the retcor function refuses to acknowledge any of them for grouping or alignment.
Well, this post is already getting really long, so I guess I’ll save discussion of my ham-handed abuse of the “group” function for another post and just focus on asking: any thoughts on how I could best be setting up xcmsSet to find all interesting and significant features without making a total hash of things?
Indeed, please don’t think I am asking people to read and address all the questions in here; any advice on any aspects that people can think to suggest would be very much appreciated.
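For reference, the manual cleanup of the two ion rainbows mentioned above amounts to something like the following base-R sketch. It runs on a toy table, not real output. The column names and the “M…T…” feature names are assumed to follow the diffreport layout, and the 120 to 1600 second window is a hypothetical choice.

```r
# Toy stand-in for a report table; all values are made up for illustration.
report <- data.frame(
  name  = c("M150T30", "M302T420", "M455T900", "M201T1750"),
  mzmed = c(150.0, 301.5, 455.0, 200.5),
  rtmed = c(30, 420, 900, 1750),   # seconds
  stringsAsFactors = FALSE
)

# Hypothetical window between the injection and equilibration peaks.
rt_keep <- c(120, 1600)

# Keep only features whose retention time falls inside the window.
in_window <- report$rtmed >= rt_keep[1] & report$rtmed <= rt_keep[2]
kept <- report[in_window, ]
```

This only trims the report after the fact, so it does nothing about which peaks “retcor” chooses for alignment.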
Thanks.
Statistics: Posted by Nat S — Mon Aug 26, 2013 5:36 am