David Wheeler: Okay. Thank you, Ilya [spelled phonetically].
So, multi-center mutation calling. What is that? A somewhat enigmatic topic, and possibly mundane were it not for the fact that the mutation calls we generate underpin so much of what we do in TCGA. So that makes this a necessary topic, and one that has itself undergone great evolution during the course of TCGA.
So what I'm going to do for the next 15 minutes is review the approaches to somatic mutation calling, consider what it is to call mutations with one caller, talk about the benchmarking of somatic mutation callers that we did early in the project, look at the early trials of three-center calling and the adoption of standards for three-center calling, and look at the current status of multi-center calling and new developments in mutation calling.
So the whole trick to this game is to be able to distinguish error from real variation, the real biological variation that we seek. There are two sources of that error. Some comes from the sequencing machines themselves and is inherent in the base callers. Fortunately, most of this error is randomly distributed, and the base calls come with calibrated Q values that enable us to distinguish truth from error, so we're able to filter most of it out. Yet some of it escapes, because there's a fair amount of it across the 50 billion reads that we take for an exome. And when these errors happen to be coincident, they're found at allele fractions that might be similar to what we're looking for in the very heterogeneous tumor environments that we have. That's been described in other talks; I'm not going to go into it here.
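For reference, the Q values mentioned here are Phred-scaled: a quality of Q corresponds to an error probability of 10^(-Q/10). A minimal sketch in Python (illustrative only, not any center's pipeline code):

    def phred_to_error_prob(q):
        """Probability that a base call with Phred quality q is wrong."""
        return 10 ** (-q / 10.0)

    # Q20 ~ 1 error in 100 base calls; Q30 ~ 1 in 1,000
    print(phred_to_error_prob(20))  # 0.01
    print(phred_to_error_prob(30))  # 0.001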
But then there is systematic error that comes from mapping and alignment ambiguities. There's a lot of difficulty with hundred-base reads, given the structure of the genome, that makes this a very tricky problem, and it leads to high-quality errors -- or rather, high-quality base differences that actually reflect true base differences, except that the read just happens to be in the wrong place -- so they can easily be mistaken for mutations.
So the way the current callers work is that they all have what I'll call, for lack of a better term, a truth engine that distinguishes real variation from sequencing error. These formulations, which are largely Bayesian or log-odds based, output tens of thousands to hundreds of thousands of events, which then have to be filtered, and it's at this point that heuristics come into play. Each calling center applies heuristics in a slightly different way, and most of the variation -- or a very large fraction of the variation -- that passes those so-called truth filters then gets filtered out. The best documentation of this is in the MuTect publication by my colleague Kristian Cibulskis at the Broad. What it shows is that at each sequencing depth you're filtering a significant fraction of the variants away based on these variant characteristics, which are being applied heuristically. This amounts to up to 90 percent of the variation that came through the first step now disappearing, and this is going to be important as we go along, because I believe this is where most of the variation between callers emerges.
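As a rough illustration of that two-stage structure -- a probabilistic truth engine followed by heuristic filters -- here is a toy sketch with made-up thresholds; it is not any center's actual pipeline, but it shows where the center-to-center differences can arise:

    from math import log

    def somatic_log_odds(alt, depth, error_rate=1e-3):
        # crude log-odds that the alternate reads reflect real variation
        # rather than random sequencing error (illustrative, not MuTect's model)
        p = max(alt / depth, 1e-6)
        return alt * log(p / error_rate)

    def passes_heuristics(site):
        return (site["mapq"] >= 20            # mapping/alignment ambiguity
                and site["alt"] >= 3          # minimum variant-read support
                and not site["strand_bias"])  # support not confined to one strand

    raw_sites = [
        {"alt": 8, "depth": 60, "mapq": 50, "strand_bias": False},  # likely real
        {"alt": 2, "depth": 80, "mapq": 10, "strand_bias": True},   # likely artifact
    ]
    candidates = [s for s in raw_sites if somatic_log_odds(s["alt"], s["depth"]) > 6.0]
    calls = [s for s in candidates if passes_heuristics(s)]
    print(len(candidates), len(calls))  # both sites pass the truth engine; one survives the filters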
So with the mutations that we collect from a single caller, we actually get very nice profiles of significantly mutated genes. This is one example from colorectal cancer, but we now have 10 or 12 tumor types in which we've collected similar profiles, and the profiles end up making a lot of sense. You can place these mutations into pathways that describe what's going on in a tumor -- actually in detail that we've never seen before.
But unfortunately, if you go back and compare the calls that different callers are making on one or two tumors, you see a picture that is at first somewhat disturbing. This is where we were with the first benchmark, proposed by David Haussler back in about February of 2011. Here we have on the left the somatic calls passing all the filters from each of three centers, and you can see that the number of variants in the center is fairly small compared to the total variation discovered by the callers in the union set; in particular, a lot more seemed to be going on in the variation that was unique to any given caller than there was in the overlap.
So then on to benchmark two. Things didn't look a whole lot better there; 310 was the intersect from four callers, and again you can see that, going around the outside, the number of events unique to each of these callers was much larger. This left the sinking feeling that a lot of true variation was possibly being left on the table and going unanalyzed. David took this another interesting step further: from the next benchmark he did an analysis where, for each center being considered in the Venn diagram, he looked at the top 100 calls from each patient being studied. Looking at the top 100 calls of just the UCSC caller, there were ten calls that the UCSC caller considered high quality that no other caller was seeing. Going around the horn to the Wash U caller, looking at their best calls, there were 143 that nobody else had seen. For Baylor, there were about 1,000. These are unvalidated calls, so even though the center is saying they're high quality, we don't really know. And here there were 55 from the Broad. So potentially there was a lot not being turned up in the analysis of any given cohort by a single caller.
Okay. Similar results were obtained in a set of colorectal samples where UCSC and the Broad made calls. You can see a lot more going on in the unique calls, and similarly the high-quality variants were numerous among the unique calls.
So the conclusion from this was obvious: the discordance between callers was high, and that was pretty dismaying, and the high-quality calls as defined by one caller were being missed by the others, so that particular discordance was distressing. The suggestion was that if we did multi-center calling, we would at least ameliorate some of the possibility of false negatives.
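The concordance analysis behind these Venn diagrams amounts to intersecting the call sets from the different centers; a minimal sketch with toy variants (real analyses work from each center's VCF or MAF files):

    calls = {
        "center_A": {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C")},
        "center_B": {("chr1", 1000, "A", "T"), ("chr3", 42, "C", "G")},
        "center_C": {("chr1", 1000, "A", "T")},
    }

    union = set().union(*calls.values())
    for variant in sorted(union):
        supporters = [c for c, s in calls.items() if variant in s]
        print(variant, len(supporters), supporters)
    # Variants supported by two or more centers form the high-confidence overlap;
    # singletons are the "unique" calls discussed above.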
Okay. By the way, it's also a fact that if you were to apply callers just to diploid analysis -- where you have the tremendous benefit of an expectation of about 50-50 between the reference and variant alleles, which is a very powerful constraint on the data -- and you look at the results from five different callers on diploid genomes, they only agree amongst themselves 57 percent of the time. So it's not just in somatic mutation calling that there can be difficulties, but in this business as a whole, and my sense is that a lot of this, as I said before, comes from the fact that we use heuristics to help filter the data, and because of those differences we're basically sampling a very large multivariate space and choosing different components of that space.
So nonetheless, going forward, we then began using three-center calling on each of the cancers, and the results coming out of that -- where we were now able to superimpose validation data -- showed that, at least in the overlap, what was being called was extremely accurate. In the three-center overlap shown here, 99 percent of the variants validated, and even when just two centers made the call, the percentage of validation was very high. Out in the uniques, the validation rates are much lower. Nonetheless, validation was occurring among the uniques, so three-center calling can at least pick up false negatives; at least they're on the table for scientists to consider and accept or reject.
Here's a similar analysis of lung adenocarcinoma data done by the Broad, and once again a very high, 100 percent validation rate in the three-center overlap, and lower validation rates around the outside. More recently, in the kidney clear cell project, where we had 500 patients -- the largest cohort at the time -- we looked at validation within 177 of the cases. This shows the number of mutations that were validated and the percentage that were valid in each segment of the overlap, and you can see very high validation when at least two centers made a call, and a much lower validation rate in the unique data.
So this led to two developments. The first was a meta-caller, developed by Terry Speed, which he demonstrated to be very highly accurate, and so it provides a recipe for taking the multi-center calling data and making it highly accurate. To achieve that high accuracy, though, it's necessary to have validation data to calibrate each of the callers involved; but once that's done, you can make very accurate calls from the data. Unfortunately, prior to the marker papers coming out, we don't have the validation from all the centers, so one thing that could be done retrospectively now is to go back, with the validation data available and the calls from each caller, apply his method, and generate even more comprehensive mutation data sets.
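This is not Terry Speed's actual meta-caller, but as a toy illustration of the calibration idea: estimate from validation data how often each caller's calls validate, then use those calibrated rates to decide which calls outside the full overlap to keep (the independence assumption here is a simplification):

    validation = {                         # (calls validated, calls tested), made-up numbers
        "center_A": (95, 100),
        "center_B": (80, 100),
        "center_C": (60, 100),
    }
    precision = {c: v / t for c, (v, t) in validation.items()}

    def keep(variant_supporters, threshold=0.9):
        # probability the call is real if at least one supporting caller is right,
        # assuming (unrealistically) independent errors between callers
        p_all_wrong = 1.0
        for c in variant_supporters:
            p_all_wrong *= (1.0 - precision[c])
        return 1.0 - p_all_wrong >= threshold

    print(keep(["center_A"]))               # True  (0.95)
    print(keep(["center_C"]))               # False (0.60)
    print(keep(["center_B", "center_C"]))   # True  (0.92)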
The other thing this did was lead to a formalization of multi-center calling. With the recognition that the mutation callers are improving overall and that different callers detect different events, and because the validation cycles were taking too long to allow expeditious publication of the marker papers, we began using multi-center calling to provide a significantly mutated gene list based on the calibrated accuracies of calls made by two or more centers. This gave us a path forward and broke the conundrum we ran into with reviewers, who never wanted to take at face value mutation calls provided without validation. So we can accelerate the submission of the marker papers now.
We don't abandon validation -- validation is still brought in, but that can go on while the paper is under review. Validation requires a second, independent sequencing event. Okay.
So where are we now? The other thing that multi-center calling does is enable other potentially interested researchers to add their callers in and experiment with the development of new methods. So now, for the adrenocortical carcinoma project, which is underway and nearing a paper -- I think earlier in yesterday's session we saw a review of the work on this tumor -- we've got five centers calling. You can see from this that there are still large numbers of unique calls, but now, I think through the advancement and improved sophistication of the calling, the center of the diagram is the most heavily weighted, so the overlap is looking much better.
We still have a number of calls on the outside, and preliminary data that I'm sorry I can't show you here indicate that, among these calls in the unique sections, when we look in the RNA-seq data there are hundreds of events that appear to be expressed and therefore are probably valid somatic mutations.
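A rough sketch of that RNA-seq check, assuming hypothetical file names and coordinates: for a DNA call unique to one caller, count the RNA-seq reads carrying the alternate allele at that position (using pysam here purely for illustration):

    import pysam

    def alt_reads_in_rna(bam_path, chrom, pos, alt_base):
        """Count RNA-seq reads carrying the alternate base at a 0-based position."""
        count = 0
        with pysam.AlignmentFile(bam_path, "rb") as bam:
            for col in bam.pileup(chrom, pos, pos + 1, truncate=True):
                for read in col.pileups:
                    if read.is_del or read.is_refskip:
                        continue
                    if read.alignment.query_sequence[read.query_position] == alt_base:
                        count += 1
        return count

    # e.g. for a unique DNA call (file name and coordinates are illustrative only):
    # print(alt_reads_in_rna("tumor_rna.bam", "chr17", 7577119, "T"))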
So again, we continue to sample from a large space, and every new logic that is applied picks up more of that sampling. Most of what's out here are events at very low allele fractions, so they're sub-clonal events in general, although not all of them are. Some of them have just escaped one or more of the heuristic parameters used in filtering.
So now we have a second generation of mutation callers coming on. One of them is being developed by the genome center in collaboration with MD Anderson; it measures a distance per position per sample to reflect mutation evolution, and its uncertainty estimates are based on a Bayesian Markov model, so the method will come with a calibrated certainty akin to a Q value. A method called Viper [spelled phonetically] is now being refined by Wash U, and there's a MuTect version two on the drawing board. So the next round of mutation callers is going to have even better accuracy and better sensitivity at low allele fraction. I don't think we'll see these outer unique sections decrease at all, and that's a good thing, because they are going to be pulling in the very low allele fraction, sub-clonal mutations that we want to have.
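A back-of-the-envelope way to see why these low-allele-fraction events are hard: at allele fraction f and depth d, the number of supporting reads is roughly binomial, and the chance of seeing even a few supporting reads drops quickly as f falls (illustrative numbers only):

    from math import comb

    def prob_at_least(k, depth, f):
        """P(at least k variant reads) for allele fraction f at the given depth."""
        return sum(comb(depth, i) * f**i * (1 - f)**(depth - i)
                   for i in range(k, depth + 1))

    for f in (0.25, 0.10, 0.05, 0.02):
        print(f, round(prob_at_least(3, 100, f), 3))
    # At 100x, a 2 percent sub-clone yields three or more variant reads only about
    # a third of the time, which is why more sensitive callers are needed.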
These new callers are being tested in the DREAM challenge as we go through DREAM 1, 2, and now 3. The callers from the sequencing centers, you can see, are at the very top of the list, in what is probably a statistical dead heat. I think that speaks very well of the TCGA sequencing centers, and so we'll be looking forward to the final rendition of the DREAM challenge, which will use real data instead of synthetic data.
In conclusion, the TCGA paradigm for mutation discovery is improved by multi-center calling. It enables us to decrease the false negative rate; it delivers a set of somatic SNVs of calibrated accuracy, accelerates submission of marker papers, and stimulates the development of new mutation callers by providing benchmarking on the fly. A formal meta-caller was developed, which may be useful in retrospectively refining mutation calls from TCGA tumor sets.
And finally, one parting thought that I didn't talk about: we're now starting to use mutation calling in a lot of different contexts. We have RNA fusions, we have structural variation, and all of those mutation modes are likely to experience a phenomenon similar to what we see for SNVs, so that needs to be checked, and multi-algorithm calling will be required there too.
So I'll just end with my acknowledgements of my colleagues who all contributed to this
talk. Thank you.
[applause]
Male Speaker: Are there -- okay, one question, one burning
question please, we're running behind.
Male Speaker: Just a quick question. For the multi-center callers, are you using the same underlying alignment, like with BWA or Bowtie?
David Wheeler: These are -- yes. They're using the same set of BAM files.
Male Speaker: Okay. There are also different software options for alignment, like BWA or Bowtie; if you use different alignment software, there may be some differences in the mutation calls or mismatches.
David Wheeler: Yes. Well, yeah. That's going to be important for comparing mutation profiles across tumors, and so it is necessary to go back, and that's being done in the ICGC-TCGA whole-genome analysis now. But that is a very important point.
Male Speaker: So does TCGA recommend BWA for overall alignment?
David Wheeler: I'm sorry, TCGA what?
Male Speaker: Basically I saw lots of data was using BWA,
so is this like --
David Wheeler: Well this -- I was talking about whole exome
here, so I'm sorry, yeah. So yeah, we were using BWA. Everybody's using BWA, but in different
ways, actually.
Male Speaker: Okay, thanks, David. Okay. I'd like to welcome our next speaker, Kjong-Van Lehmann, from Memorial Sloan Kettering Cancer Center, who will talk about extensive trans- and cis-QTLs revealed by large-scale cancer genome analysis. Please.
[end of transcript]