Melissa Troester: Okay, great. Thank you very much. It's great
to be here and talk about the work of our Double Normal committee. Our goal was to do
some genomic characterization of cancer-adjacent breast tissue, and what we're looking at is
both field effects and expression subtypes. So, to get started with this, I wanted to
give a bit of the translational relevance for looking at adjacent normal tissue.
This is a curve from a randomized trial of breast conserving therapy versus radical mastectomy.
And what you see on the y-axis is the probability of recurrence and on the x-axis is the number
of years. And even 15 to 20 years after surgery you see that individuals having breast conserving
therapy have a much higher probability of recurrence. Overall survival is the same for
the two groups because an additional resection can occur, but this difference in the recurrence
rates has been the subject of some interest, specifically with relation to this concept
of field effects. The other thing we know is that when local recurrence does occur it
commonly occurs in the lumpectomy bed and that the rates tend to be higher in some different
subtypes of breast cancer, specifically basal-like breast cancers.
So, to try to understand this better, my research group has been doing some work with looking
at field effects and stromal microenvironment surrounding tumor, and then the TCGA had done
this extensive characterization of many different data types in that adjacent normal tissue,
and so I am going to talk to you a little bit about that today.
But to start out with, I think it's important to have a clear definition of what do we mean
by a field effect or what is a field carcinogenic effect, and this visual really nicely shows
what we're thinking about when we're thinking about field effects. On the left you see a
patch which is defined as a small region that is showing high expression of a particular
mutant protein. In this case, it's mutant p53. Then, as you move to the right, you see
progression of this patch to form a field which ultimately then persists in the tumor
that forms. So, this is histologically normal tissue at the left, but it harbors this defect
that then ultimately manifests in the cancer, and this is what we refer to as a field effect.
Usually, we are referring to epithelial changes and we're referring to genetic changes that
create an area that looks normal but that, if left behind during resection, could result
in a second primary. And this was first described in 1953 in oral squamous cell carcinoma, but
has subsequently been shown to occur in a number of different cancer types.
The other thing that happens in the adjacent normal tissue is that we can get a stromal
response. So, it's well-known that there can be changes in the stromal tissue immediately
adjacent to a tumor and that these changes could be detected by gene expression profiling
or other genomic effects. So here I'm showing you a heat map from mRNA microarrays -- or
mRNA data from a DNA microarray, and you can see on the left that the cancer-adjacent tissue
has a very different gene expression profile than reduction mammoplasty tissue does, and
in this paper we showed that some of the gene expression changes that were most detectable
seemed to be consistent with evidence of a wound response or a strong stromal reaction
to the presence of the tumor.
So, with that background in mind, I am now going to turn to what we have been doing with
the TCGA Double Normal Project, and here I'm representing the data from a number of different
groups. And I'll start first talking about the DNA analysis, and this is data where we
had triplets. So we had 40 triplets where we had normal breast tissue adjacent to the
tumor. We also had the tumor and we had blood. And this tissue was -- exome sequencing was
performed by the WashU group with Dan Koboldt and Li Ding doing the analyses that I present
here today. And then we also had copy number alteration data which Andy Cherniack presented
-- or did for our group. And then we also have methylation data. Now, the difference
with the methylation data is that we don't have a blood normal standard. We can't look
at the blood level methylation as a standard for comparison. But our key question across
all these three DNA data types is: Are there detectable field effects and/or are there
detectable tumor cells? And the idea was that perhaps some of this adjacent normal tissue
could be used as control tissue for other types of TCGA analyses. And so what I'm going
to show you here is the limitations and the advantages of potentially using the data in
that way.
So this slide is courtesy of Andy Cherniack, and it just shows us how we are using the
SNP array data to find copy number alterations. So here, using the blood normal as a comparison,
Andy has identified the copy number alterations in tumor, and so you can see that there is
a MYC amplification present in this particular tumor. Also, lined up right above that, is
the normal adjacent, and you can see that there is also MYC -- focal areas of MYC amplification
in that normal tissue. So, this would be what we would consider a field effect. Or it could
also be potentially evidence of tumor cells, and it's difficult to distinguish that just
from looking at this data, but this is the kind of thing that Andy Cherniack did to assess
the copy number alterations in the adjacent normal tissue.
Here is a map showing you all 40 of the samples that we analyzed, and they're just color-coded
here -- each tumor subtype is color-coded. And the reason I did that is because I wanted
you to see that these field effects seem to be occurring in all different types of tumors.
So, we have a basal-like and two luminal As that showed some sort of field effect or tumor
contamination based on copy number data, and it was a total of about 7 percent of the samples
had evidence of some kind of copy number alteration in the adjacent normal.
We then turn to analyses by Li Ding and Dan Koboldt, where variant allele frequencies
were detected, again, using the blood normal as the standard. And what you can see is there
is quite a lot of variant allele frequency variation in the tumor tissue. And the scale
here goes from zero to 100 percent and there are quite a few samples that show high variant
allele frequencies in the tumors. Then, if you look along the x-axis, you see the variant
allele frequency in the adjacent, and you can see that the axis is much shorter, so
we see a lower prevalence of variant allele frequencies in the normal tissue, but many
of the samples that have high variant allele frequencies in tumors are also showing up
in the normals. There are also many variant allele frequencies that are present in tumor
that don't show up at all in the normal.
So, again, here we're trying to distinguish between what might be a field effect and what
might be tumor cell contamination, but the bottom line with this data is it tends to
be much more sensitive to being able to pick up these mutations. So, 10 cases out of the
40, or roughly 25 percent, had some evidence of a field effect by mutation analysis.
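As a rough illustration of the comparison described here, the sketch below computes a variant allele frequency from read counts and lists tumor variants that also appear in the adjacent normal. The function names, the variant IDs, the read counts, and the 2 percent cutoff are all assumptions made for illustration; this is not the WashU group's pipeline.

```python
def vaf(alt_reads, ref_reads):
    """Variant allele frequency: fraction of reads supporting the variant."""
    total = alt_reads + ref_reads
    return alt_reads / total if total else 0.0


def shared_variants(tumor_counts, adjacent_counts, min_vaf=0.02):
    """Return tumor variants that also reach min_vaf in the adjacent normal.

    Both inputs map a variant ID to (alt_reads, ref_reads). Variants are
    assumed to have been called against the blood normal already.
    The min_vaf cutoff is a hypothetical value, not one from the talk.
    """
    shared = []
    for variant, (t_alt, t_ref) in tumor_counts.items():
        if vaf(t_alt, t_ref) < min_vaf:
            continue  # not convincingly present in the tumor
        a_alt, a_ref = adjacent_counts.get(variant, (0, 0))
        if vaf(a_alt, a_ref) >= min_vaf:
            shared.append(variant)
    return sorted(shared)


# Hypothetical read counts for one tumor / adjacent-normal pair.
tumor = {"var1": (40, 60), "var2": (25, 75)}
adjacent = {"var1": (5, 95), "var2": (0, 100)}
print(shared_variants(tumor, adjacent))  # ['var1']
```

A shared variant found this way does not by itself separate a field effect from tumor cell contamination, which is the ambiguity raised in the talk.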
So, here we're adding an additional column to the data set, and you can see that the
exome-seq data demonstrated a much larger number of samples with some sort of alteration
in the adjacent normal tissue.
Okay, now, the methylation data is divided into two data sets. The first is from the Illumina
27K platform, and that's shown here. On the left are the tumors corresponding to
the adjacent normal on the right, and you can see that there are quite distinct patterns
between tumor and normal for methylation, but there was one adjacent normal sample that
shows a methylation pattern very similar to what's observed in tumor. So this one was
picked up as having a tumor-like methylation profile.
Here is the Illumina 450K data and, again, there were three samples that showed up as
having this tumor-like methylation profile. So all of these were flagged, and again, I'm
going to show you where they came out in terms of these 40 samples. And what you can see
is that there is one tumor here that kind of consistently was picked up by -- or one
normal sample, excuse me, that was picked up by all three platforms, but that some of
the platforms are picking up different samples. So there's some heterogeneity in terms of
which samples are being identified depending on which data type is being used. With the
methylation, again, it was a 7 to 10 percent rate, more similar to the copy number alterations
in terms of how many samples were showing these kinds of effects.
So, the gold standard is -- I keep referring to this idea of we can't distinguish between
a field effect and a tumor cell contamination, and so our next thought was how would we actually
go about doing that. And our strategy was to get a pathologist to do a very careful
review of these tissues to see if there was actually any evidence of tumor cells present
in the tissue. So, in order to distinguish these two questions, we then had a pathologist
review all of these tissues, and what we found was that we actually had, unbeknownst to all
of the analysts, there was a positive control in our data set. In retroactively going through
the data collection and the QC procedures, we noticed that one cancer adjacent sample
had gotten in and had later been detected to have been very adjacent to a tumor specimen,
and so this one sample that was detected by all three of the platforms, in fact, did have
clear evidence of tumor cell contamination. Unfortunately, as luck may have it, for two
of the others there were actually no good quality sections. So we concluded from this
that if you do have good evidence that there's no tumor in your normal, then that might be
a pretty good indicator that you won't pick up these field effects. But for cases where
that data was missing or where there was clear evidence of tumor we were finding that the
genomic data was also uncovering some evidence of field effects there.
Okay, so looking for malignant cells is not all that we were capable of doing with this,
and we are very aware, particularly with breast -- breast tissue is very stroma-rich, and
so we were aware of some challenges in analyzing this data, even things to the extent of what
kind of coverage do you get when you have a 40x read but only 5 percent of your tissue
is epithelium, and we're really looking for epithelial-based field effects. So we were
thinking about all of these issues, and we decided that it was really important to have
good characterization of the stromal and epithelial content for our tissues.
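The coverage concern can be made concrete with a small calculation. The sketch below assumes a diploid genome, a heterozygous variant, and (optimistically) that every epithelial cell carries the variant; the function and parameter names are illustrative, not from the talk.

```python
def expected_variant_reads(depth, epithelial_fraction,
                           clonal_fraction=1.0, allele_fraction=0.5):
    """Expected number of reads carrying an epithelial-only variant.

    depth               -- mean sequencing depth at the site (e.g. 40 for 40x)
    epithelial_fraction -- share of the tissue that is epithelium
    clonal_fraction     -- share of epithelial cells with the variant (assumed)
    allele_fraction     -- 0.5 for a heterozygous variant in a diploid cell
    """
    expected_vaf = epithelial_fraction * clonal_fraction * allele_fraction
    return depth * expected_vaf


# 40x depth with 5 percent epithelium, the scenario mentioned in the talk.
print(expected_variant_reads(40, 0.05))
```

Under these assumptions, 40x depth with 5 percent epithelium yields about one variant-supporting read on average, which is why knowing the tissue composition matters for interpreting the sequencing data.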
So a post-doc in my lab went through and manually annotated, using Aperio scanning, all of these
samples and very carefully denoted where the epithelial cells, where is the stroma, and
where is the fat, and we were able to come up with composition estimates for every sample.
Then, Andy Beck at Harvard trained an algorithm that can do this in an automated fashion,
and he can now do this very rapidly on all of our samples for the TCGA, and he is also
being able to collect other sorts of morphometric data from these tissues at the same time.
In using this data, we were then able to go back and look at, were there particular marks
-- methylation marks in this case I'm showing you -- that were correlated with the stromal
or epithelial content of the tissue? What we found was on the 450 platform there were
a huge number of probes that were positively correlated, about 1,300 probes, and about
1,250 probes negatively correlated with epithelial content given an FDR of about 5 percent. With
the stromal content, there was also a large number of genes that were detected. So, being
able to look at these gene sets might help us to interpret methylation profiles where
we don't have that data, and it might also give us some clues as to the sort of changes
that we're seeing. There's a lot more that can be done here to sort of mine the biological
significance of these gene sets.
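The talk does not say which method was used to control the false discovery rate. One common choice is the Benjamini-Hochberg procedure, sketched below under the assumption that a p-value has already been computed for each probe's correlation with epithelial content. The p-values shown are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test passes at the given FDR.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * fdr, and reject every hypothesis ranked <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-probe p-values (not real data).
probe_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(probe_p, fdr=0.05)
print(sum(significant), "of", len(probe_p), "probes pass at FDR 0.05")
```

In a real analysis the same step would be applied separately to the epithelial and stromal correlations, with hundreds of thousands of probes rather than ten.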
So I started out talking primarily about DNA, and the reason the DNA was so appealing for
us in trying to address this question is that we have this really great sequence standard
which is coming from that triplet, that blood normal. As I turn to the mRNA and the microRNA
data, you're going to see that we don't have that same facility to make comparisons because
we simply don't have the right tissue-based control. We don't have patient-matched truly
normal tissue. All the tissue that we're getting could be showing effects of response or stromal
reaction to the tumor.
However, we were interested to see, what is the variation in RNA. So, I had shown you
earlier that there is this widespread wound response. While doing unsupervised analysis
just on the cancer adjacent tissue only, we first discovered that there were two very
distinct groups in terms of expression profile. And here you're seeing that there are two main
clusters: one we dubbed active, and the other inactive based on the fact that the active
group showed increased levels of cellular movement, inflammation, fibrosis, and chemotaxis
genes. So this is truly just based on ontology. But we were interested to say, "Okay, if there
is this heterogeneity in the normal does it have something to do with the tumor types
that are formed or does it have something to do with risk of recurrence?" And we looked
to see whether these two subtypes were correlated with ER status, and there was no significant
association there. And they also were not correlated with tumor subtype, and the white
samples here are ductal carcinoma in situ samples.
So, this is in some previously published work, but what we had identified in that paper is
that there were two subtypes of cancer-adjacent normal tissue and they seemed to be independent
of tumor subtype, but, interestingly, they appear to predict prognosis in ER positive
tumors. It's very difficult to predict late survival outcomes for ER positive tumors,
and our thought was that perhaps something about the way that microenvironment responds
to the tumor might give us some clues to understand the progression of those ER positive tumors.
So, when we turn to looking at the RNA data in the TCGA samples, we had this unsupervised
clustering in the back of our mind and we wanted to see if we were going to recapitulate
these same subtypes. Now, the TCGA data does not have the maturity in terms of the survival
outcomes, it's only a few years in, so we couldn't evaluate the survival probabilities
for these two groups, but we were able to show that there were two distinct clusters
also present in the TCGA data. And, actually, what I'm showing you here is the microRNA
data because the exact same pattern of expression was observed in the microRNA as was present
in the mRNA. And I understand this is distinct from what was seen in the breast tumor tissue,
where microRNA and mRNA profiles were not strongly correlated. In the normal tissue,
they are almost identical. So you can see here there are two clusters, and this is data
from Gordon Robertson where he has many, many samples of normal tissue that were done by
consensus clustering here and microRNA arrays. And then down at the bottom you can see the
identity based on the mRNA cluster. So, the white is active and the black is inactive,
and you can see that they're almost perfectly concordant with the two microRNA clusters.
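One simple way to put a number on "almost perfectly concordant" is the fraction of samples whose two cluster assignments agree, allowing for the fact that cluster names are arbitrary. The sketch below handles only the two-cluster case; the function name and the sample labels are hypothetical, and the talk does not say how concordance was actually assessed.

```python
def two_cluster_concordance(labels_a, labels_b):
    """Fraction of samples on which two 2-cluster labelings agree.

    Cluster names are arbitrary, so agreement is taken under the better of
    the two possible ways to match the clusters of A to the clusters of B.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("labelings must be non-empty and the same length")
    names_a = sorted(set(labels_a))
    names_b = sorted(set(labels_b))
    if len(names_a) != 2 or len(names_b) != 2:
        raise ValueError("each labeling must contain exactly two clusters")
    index_a = {name: i for i, name in enumerate(names_a)}
    index_b = {name: i for i, name in enumerate(names_b)}
    matches = sum(index_a[a] == index_b[b] for a, b in zip(labels_a, labels_b))
    agreement = matches / len(labels_a)
    # With two clusters, the swapped matching gives exactly 1 - agreement.
    return max(agreement, 1.0 - agreement)


# Hypothetical labels for five samples (not real TCGA assignments).
mrna = ["active", "active", "inactive", "inactive", "active"]
mirna = ["c2", "c2", "c1", "c1", "c1"]
print(two_cluster_concordance(mrna, mirna))
```

For more than two clusters, a measure such as the adjusted Rand index would be the more usual choice.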
Interestingly, we tried to figure out what kinds of biological themes might be represented
here, and the strong -- one of the strong factors distinguishing these two were microRNA-200
family genes, which was sort of indicative of a sort of more mesenchymal character. So,
going back again to that pathologic data and that data about the composition of the tissue,
we then wanted to ask, "Well, is there something about the composition that we can learn from
these expression subtypes?" But the other thing, before I turn to that data, is I just
wanted to point out that what you don't see here are those few samples sticking out like
you did in the methylation data where we could say, "Okay, there's clear evidence of tumor
contamination here." We are not picking up tumor contamination readily in these mRNA
and microRNA data the way we were with the DNA data, but what we are seeing is that these
mRNA and microRNA subtypes seem to be pretty strongly correlated with the composition of
the tissue. There is quite a bit of variation here, but that active subtype seems to be
associated with high levels of adipose-rich stroma, whereas the inactive subtype is associated
with high levels of nonfatty stroma or fibroblastic tissue. And then the epithelial percentages
were not significantly associated with either subtype, either the active or the inactive.
So, in conclusion, the DNA results from our analysis of the TCGA Double Normal data show
that there is strong evidence for either field effects or tumor contamination, and
our team has some work to do in trying to figure out exactly how to distinguish those
two or whether there are other ways that we can take advantage of this very rich data
to try to tease those two things apart. The pathologic evaluation can get us partly there,
but I think there's a lot more we can do, particularly in those triplets, to try to
understand where some of this genetic heterogeneity is coming from. And in contrast, the RNA really
shows us the expression subtypes and is not strongly indicative of tumor contamination.
So, I'd like to thank the team that contributed all of this data. I'm presenting information
on behalf of many people. Specifically Chris Benz I wanted to note because while he didn't
generate one of the particular figures here, he's been a co-leader of this group and has
provided a lot of really great insights, as well as the hard work by Andy Cherniack, Dan
Koboldt, Li Ding, and Swapna Mahurkar from USC who did all the methylation analyses.
[applause]
Male Speaker: Beautiful work. Tom Gerdanaw [spelled phonetically],
Michigan. I spend part of my year signing out [spelled phonetically] breast cancers.
Have you tried to incorporate fibrocystic changes, hyperplasia, atypical hyperplasia,
[unintelligible] change. So, pathologists recognize a lot of changes that occur in benign
breast tissue. Maybe that's driving some of this. Have you guys tried to address that?
It's a hard thing to address.
Melissa Troester: It is. We have -- in addition to sort of just
looking at composition changes, we had the pathologist score for any type of benign conditions
that they saw present in that tissue so we can go back and analyze it, although those
events were somewhat rare in those adjacent normals.
Male Speaker: Because, you know, a lot of breast cancers
arise in fibrocystic changes --
Male Speaker: -- and you can divide them broadly into a
proliferative type, where they have more epithelial hyperplasia, and a nonproliferative type,
and maybe that'll be useful.
Melissa Troester: Yeah, and also Andy Beck, I don't know if
Andy's here, but he has a really nice DTF signature which represents some fibrocystic
changes, and that's another thing we're analyzing in this data, to see if we can capture some
of that on the RNA level.
Male Speaker: Quick question. Have you actually looked at
menopausal status with respect to your two different subtypes of normal tissue?
Melissa Troester: Which status?
Male Speaker: Menopausal status.
Melissa Troester: Menopausal status.
Male Speaker: Simply, are you looking at cycling versus
Melissa Troester: We have. Actually, we've been studying age
in relation to those two characteristics, and young age seems to be associated with
the more active phenotype, but menopausal status, the association's a bit weaker than
it is for age.
Raju Kucherlapati: Thank you, Melissa.