Tip:
Highlight text to annotate it
X
In this video we'll introduce querying
XML data using XSLT.
As a reminder, querying
XML data is not nearly
as mature as querying relational data
due to it being much newer and
not having a nice underlying algebra like the relational algebra.
We already talked about XPath
which was the first language developed for
querying XML data.
And we've also talked about
XQuery which was actually developed after
XSLT, but it's similar
to XPath in it's style, where
XSLT, which we're going
to cover in this video, is actually quite different.
XSL stands for the
Extensible Stylesheet Language, and
it was introduced originally but soon
extended to included transformations and
XSLT is currently much
more widely used than XSL.
Here's how we can think of XSLT as a query language.
We have an XSLT processor and
we feed to that processor
our XML data in the
form of a document or a stream.
And we also give the processor
a specification in XSLT, which
by the way is expressed, using the XML format.
The processor takes the data
and the specification, and it
transforms the data into
a result, which is also
expressed as an XML document or string.
Now if we think about traditional
database query processing, there's actually a natural mapping.
If we think even about relational
processing, we have a query processor and a database.
We feed the data to the
query processor, we feed
the query to the query processor
as well, and out comes the answer.
So XSLT processing, although it
really is through transformations, it can
be thought of very much like querying a database.
So even though XSLT be thought
of as a query language, the query
paradigm itself is quite different
from what we're used to with SQL
or even with XPath or XQuery.
It's based fundamentally on the
notion of transforming the data.
And that transformation occurs with rules.
To understand what the rules
do and how the transformations work, it's
again very instructive to think of the XML as a tree.
So let's take our bookstore
data and again make it
a tree as we did before
when we were first learning about XPath.
So we have some books sub-elements
and we have a
magazine sub-element and I
won't be elaborating all of these.
We'll just imagine sub-trees here with
our book we have a title
and we have some authors.
The title might be our
leaf so we'll have a first course in
database systems for example, whereas
our authors may have author
sub-elements, and within
those author sub-elements we might
have first name name and
last name, abbreviated here,
with string values for those,
and of course more authors sub-elements as well.
So that give the basic
idea of a tree structure
of XML, exactly as we've seen before.
So now let's see what happens
with XSLT in light of this tree structure.
So the first thing that
we have is the concept of
matching a template and replacing it.
So the idea in XSLT is that
we can write an expression that finds
a template that finds portion
of the XML tree based on template matching.
For example we might find
books that have certain authors
and once we find those will
actually replace the entire
subtree with the result of what we put in our template.
For example, we might decide
that want to pick the title
here and replace this entire subtree with the title.
Or we might match down
to our authors and we might
find our first name and last
name, and say replace this entire
author sub-element with the
concatenation of the first and last name.
Again the idea being that you
write templates that match within
the tree, using, in fact, XPath
as we'll see as one of
the portions of writing those
templates, and then replace that portion of the tree.
We can also do that recursively.
So we can, for example, decide
that we're going to replace this
book with a different
element and then recursively apply our templates to its children.
We'll see that in a demo.
It takes a little getting used to again.
The XSLT language has
the ability to extract values, and
again it often uses XPath
expressions in order to do that.
It also has some programming language-like constructs.
It has a For Each so we
can do iteration, and it
has conditionals so we can do if.
All of these will be much better seen in the demo.
Finally, I'll have to mention that
there's some somewhat strange behavior
having to do with white
space in XML data and
some default behavior, which we'll see in the demo.
And there's also an implicit priority
scheme when we have multiple
templates that can all match the same elements.
So let's move directly to the demo.
We're again going to be using
our same bookstore data and we'll see a number of XSLT examples.
Even more than XQuery
or XPath our examples
will not be exhaustive, but they
will give a flavor of
the language and you'll be able
to express some fairly powerful
queries using just what we show in the videos.
Now let's see XSLT in action.
Let me first explain what we have on the screen.
In the upper left window we
have the document that we'll be querying over.
It's the exact same bookstore
data that we've been using for all of our examples.
So I'm actually going to make
that a lot smaller, so that we can see our templates better.
In the upper right corner, XSLT templates.
And every example we're going
to do is going to have
us opening and closing a style
sheet with some parameters is
to tell us how we'd like to display our results.
And then I'll be putting
different templates between those opening and closing tags.
Notice again that XSLT
is expressed using
XML once we have
our data and our set
of template matching rules we'll
run our transformation and in the bottom we'll see our result.
So you can think of
it as a query in the upper
right, the data in the upper
left, and the result displayed in the bottom.
Now even more than XQuery,
it's not going to be possible
to explain every single intricacy
of the templates that we're going to write.
So I again encourage you
to pause the video to take
a look, as well as
download the data file and
the transformation file so that you can experiment with them yourself.
Our first example is going
to do some very simple template matching.
It's going to look for
book sub-elements and when it
finds them, it's going to
replace those book sub-elements
with a book title element,
the value of the
title component of the book and a closing tag book title.
And it's similarly going to
match magazines of elements and
replace those magazines of elements
with an element that's an
opening tag of magazine
title, the value of
the title, sub-element of the magazine, and the closing tag.
So again the template will look through the XML tree.
They will match the sub elements in the tree.
It'll match the book of elements and the magazine of elements.
And for each one it will
replace those subelements with
the expression, in this case
with our opening and closing tags
that have changed and the value of the title.
We run the transformation and we
see, indeed, that the results
are our four book titles,
now opening and closing tags
that are book titles, and our four magazine titles.
For our next example, we're
going to only match books that satisfy a condition.
We do that by in our matching expression using XPath.
Now there's one small strange thing
here, which is we can't write
the less than symbol, we actually
have to use the escape symbol for less than.
But otherwise, this template finds
books whose price attribute
is less than 90, just like
we do in XPath using the
square brackets for conditions, and
when it matches those books, what
it does here is it copies those books.
So this is an important construct
that says if I match
the book, I'll copy the
book, I'll select dot which
the current element, so in
effect it's saying find the
books and retain them exactly as they are.
Let's run the transformation and take a look at what we get.
We can see that we got
this book because it's price
is 85 and we have another book
whose price is 50 and
another book whose price is 25.
But we do see something a little bit strange here.
We got our books so we also have these strings here.
These long bits of text
that we well we don't really know where they come from.
Well this is one of
the peculiarities of XSLT
. When you have
elements in your database
that aren't matched by any
template, what XSLT will
do is actually return the
concatenation of the string
leaf or text leaf values of those elements.
I know it seems kind of strange.
There's actually a simple fix for that.
We're going to add a second
template that matches those
text elements and for those returns nothing.
So here we've added a template and let me explain.
What we're matching here is elements
that satisfy the text predicates
so that will match those leaf
text elements and when
we write a template that has
no body, so we open
the template and then we
close the template with no
body at all, that says
match the element and then
don't replace with anything at all.
So this is very useful construct,
the templates that don't have
a body, for getting rid
of portions of the data we're not interested in.
So let's run the transformation
now and take a look
at the result, and now when
we scroll down we see that
all of that extraneous text that
we saw in the previous example is now gone.
So as we've seen, XSLT works
by defining templates that are matched against the data.
When a portion of the data
is matched by a template the template says what to do.
We might place that portion
of data with something different and
we might just remove that portion
of the data from the answer or we might just copy it over into the answer.
Now let's explore what
happens when we have
portions of the data that are
matched by more than one template
in our XSLT specification.
So here we're going to have three templates.
The first two templates both match book elements.
The first template says when
we match a book element, just throw it away.
Again, this is an example of
the template when we don't have
a body that says eliminate
the matched elements from the answer.
The second template says to do exactly the opposite.
Says when we match a book
sub-element, keep that book
sub-element exactly as it is.
As a reminder, this body here
says copy the current element into the result.
Our third template matches magazines,
and this one we just have one
and it says copy the magazine into the result.
So let's go ahead and run
this transformation and see what happens.
Well, first of all, we got
an ambiguous rule match, so that's good.
The system recognized that we
have two different rules that are matching the same element.
But then it did decide to give us a result.
So let's take a look at what happened.
It did return, in fact,
all of the books in
the database as well as all the magazines.
So we can see that it
chose to use the second
template instead of the
first template when we had the ambiguity.
So let's try an experiment.
Let's take our two book
templates and let's just reverse their order.
So now we have the one
that copies first, and the one that eliminates second.
Let's run the transformation, and indeed, something changed.
We no longer got the books.
So what we can deduce from
that is that when we
have two templates that both
match and we get this
ambiguity warning, it still
does the transformation and it chooses
the second of the matching transformations.
Actually, it turns out not to be quite that simple.
It doesn't always choose the second one.
In this example, we're going
to change our first template
to match only books whose price is less than 90.
So we'll use the same syntax we used before that before.
We have to escape that less than character like this.
Less than 90.
Close our score bracket.
So now our first transformation says
when we find books that are
less than 90, let's
return them, and when
we find any book, let's not return it.
So again we're going to have some ambiguity; let's run the transformation.
Well, we actually didn't get
an ambiguity error this time,
or warning, and the reason
is that XSLT actually has
a built-in notion of some templates
being more specific than others, and
when a template is more specific,
it is considered the higher priority template.
So what happened when we ran
this particular transformation is the
books that, where the price
was less than 90, were matched
by the first template and because
that one's considered more specific they
were not matched by the second template.
So we can see below that
we did get back all
of the books that are less than
90, and none of the
other books, and again, we
got back all of our magazines.
So let's make one last change to experiment.
Let's take our second book,
and let's add to it a
simple condition that's satisfied by
every book, which is the condition
that the book has a title sub-element.
Again, this is XPath.
Now, perhaps our two
rules have equivalent specificity,
in which we case we would again have ambiguity.
Let's just delete our result here
and then let's run the transformation and see what happens.
Indeed, now we have an
ambiguous rule match because both
of these templates have
a condition, so they are
considered equivalent again, just when,
just like when neither of them had a condition.
And now that they're
considered equivalent, again the
second one is going to
take precedence, because as you
can see we didn't get any books in our result.
So even though we have some
books that are less than 90,
those books also have a
title, so those books
were matched by the second template and they were not returned.
So what you can see from these
examples is that you
do need to be very careful
when you write XSLT programs or
queries, where multiple templates will match the same data.
Now let's look at a couple
of different ways of copying our
entire input data as a result of our query.
Our first example is the simplest one.
We write a template that matches the root element of the document.
As you may remember from XPath, a single slash is the root element.
And then as the body we have
that copy of template that
copies the entire current element.
Let's run the transformation and we
will see the we get our
entire database as a result.
Incidentally we could change that slash to be bookstore.
It would do exactly the same
thing, since our bookstore is our root element.
Okay, delete this, run the
transform, and once again,
we get the entire database as our result.
Now I'm going to show
action with a much more complicated
way of copying the entire document,
but it uses an important
kind of template that we'll see in other contexts.
This template is our first
example of recursively applying
templates to our result.
What we have here is a
template that matches absolutely anything in XML data.
This is actually an ex-path-expression that says
math an element with star,
that means any element tag, any
attribute, at star, or
any text leaf of the XML data.
So again this or-construct here
is seen quite frequently in XSLT
specifications to match just
anything at all in the data.
When anything at all is matched
that element of the
data is copied, and then
the templates are applied recursively
to everything below that's of any type.
So it may be best just to
take my word for it, or
you can spend some time on
your own thinking about exactly why
this works but, again, the
idea that we match any
type of element in our
XML element, attribute or text,
and we copy that object, and
then we apply the templates
to all of its sub-elements recursively,
again copying them.
Now obviously this is not
the best, the easiest way to copy an entire document.
We saw the easiest way to
do it with our previous example, but
we'll soon see why this particular template is valuable.
When we run it, of course we get back the entire document.
Now the reason that this type
of template is valuable is that
we can use this as one
of our templates and then add
additional templates that give
us exceptions to copying the whole document.
And that will allow us to
copy the whole document except
with changes in certain parts,
and what I'm adding here actually is a whole bunch of additional templates.
So the first one says: apply
all templates recursively to link to the entire document.
The second says, "When you
find while you're applying them
recursively that you're at
an attribute called ISBN, we'll change that to a sub-element."
So we'll match the ISBN attribute.
We'll change it to a
sub-element, similarly to what we saw
before by giving an open
tag ISBN and the value of the current element.
We'll similarly take our attributes,
our price attributes, and change
them to sub-elements and our
editions, our months, and our years and our magazine.
And last of all, we'll also make a change to our authors.
When we match an author, instead
of having sub-elements, we'll convert
those sub-elements to be attributes,
the last name attribute and the first name attribute.
So let's run the transformation, and
we'll see our data is now significantly restructured.
We have our bookstore and we
have our books, but our ISBN
numbers are now sub-elements, and
in our authors the last names
and first names are attributes.
And all of the books
are restructured in that fashion
and our magazines again have attributes
restructured as sub-elements.
Now let's see what would
have happened if we ran this
XSLT specification but we
didn't have this mega
template at the beginning that
does the recursive application of templates to the entire database.
When we run the transformation now,
well, we get a kind of surprising result.
We won't try to analyze it in its entirety.
It's a combination of only
matching automatically of sub-elements and not attributes.
And furthermore, dumping out all
the text leaves like we saw in an earlier example.
So, again, presuming that we would
not want this to be our
result, that shows the
necessity of including the sort
of generic template that matches
every type of object in
the database, and recursively applies templates to its children.
Now let's switch gears entirely.
What we're going to do
in this transformation is effectively write a program.
We're going to use the "for
each" and "sorting" and
an "If" statement, and the
program is furthermore going to
take the XML data and it's
going to transform it into HTML,
which we can then render in a browser.
So it's just one template
that matches the root element of
our document, and once that
root element is matched, it spits
out the tag HTML, it sets
up the table, so again we're
actually writing the result here,
and put some headers for the table.
And then we see a
for each that says we're going to
run the body of the for each for each book in the database.
We're gonna sort the result by its price.
If the price is less than
90, then we're going to generate a row in the table.
And that row is going
to be set up with italics for
the title, and it's going
to give the value of the price,
it's going to close the row,
and we're going to close all the tags.
So again, this is quite different in a couple of ways.
First of all, that it's written
more in a programmatic style, and
second of all that the result actually going to be HTML.
Let's run the transformation, and we
can see the result here, which is indeed HTML.
In fact, we can
take this very HTML and we
can render it in a browser and see how nice it looks.
And here it is.
We can see very beautifully formatted
the three books that cost less
than 90, sorted by price,
with the title in italics, all
formatted in an HTML table.
And that was with not
a very complicated XSLT program.
So it's not surprising that XSLT
is used frequently for for
translating data expressed in
XML to HTML format
for rendering, as well as being used as a query language.
Our last two examples are back
to a more traditional template matching style.
Again we're going to start
with this recursive template match that matches everything in the database.
That means we're gonna copy everything
over except we're gonna make one type of change.
Specifically, we're going to
change...we're going to take
Jennifer out of the database
and then we're going to change Widom to Ms. Widom.
So every place where we have
Jennifer as the first name
and Widom as the last name, we'll
end up with just a name, Ms. Widom.
Specifically, we do it with two templates.
The first template says when we
find a first name where
the data in that first
name equals Jennifer...okay, so we're
again are using the dot to refer to the current element.
The data is a built-in function.
So a first name that's equal to Jennifer.
When we match that, we
want to...we'll actually return nothing.
There's no body in this template, so that will remove that element.
Now you might wonder why we
didn't just write a condition that said first name equals Jennifer.
The problem is, to write that
condition, the current element
would be the parent, and we
don't want to remove the parent,
we actually want to remove the first name itself.
In addition to removing
first names that are Jennifer, we'll
also match last name
templates where the value
is Widom and we will replace
those with an opening
tag name, the string is Widom, and a closing tag name.
So let's run the transformation and let's take a look.
And we will see in the
case where the author was Jennifer
Widom, it's now the single
element name Ms. Widom, and
we should see that occur a
few other times in the database as well.
As our very last example, let's
perform the same transformation, but
let's do it with just one template.
What we'll do is we'll look
for office of elements
where the first name equals Widom.
Now we don't need to use data.
So first name equals Widom.
And we'll take those entire
author sub-elements and we'll replace
them with an author
sub-element where the name is Widom.
So we need to put author here.
Let's get rid of this automatic
simply generated closing tab; we want it to be over here.
We'll get rid of this first template.
So again, we're going to make exactly
the same change, but we're gonna do it with a single template.
It's going to look for authors
where the first name is...whoops, better make that Jennifer.
And it's going to replace
them with the author
sub-element with just Ms. Widom.
We run the transformation and let's
take a quick look at what we got.
And we again see exactly
the same result, with a somewhat simpler program.
That concludes our demonstration of XSLT.
Again, we've shown only some of the constructs.
We haven't gone into great detail or walked through the syntax.
XSLT is very powerful.
We've seen quite a few different things.
We've also seen a little bit of non-intuitive behavior.
We have to be a little careful with white space.
We have to be a little careful
when we have multiple templates that match the same data.
But once we get
it all figured out, it can
be quite powerful for transforming data and for querying data.