JAREK WILKIEWICZ: Hello and welcome to "YouTube Developers
Live." We have an exciting show for you today.
We'll be talking about social media verification.
And I have excellent guests from Storyful today.
Let me introduce our guests.
Joining us live from New York City, I understand from a cafe
someplace, is David Clinch.
How are you doing, David?
DAVID CLINCH: I'm doing great, Jarek.
Thanks.
Yeah, sorry, in between meetings here in New York, but
thanks for having us.
JAREK WILKIEWICZ: Well, we really appreciate that.
And then, live from Dublin, we have Gavin Sheridan.
How are you doing, Gavin?
GAVIN SHERIDAN: Hello, how's it going?
JAREK WILKIEWICZ: Everything is going good, now.
And Paul Watson.
PAUL WATSON: How are you doing?
JAREK WILKIEWICZ: I'm doing good.
Well, thank you very much.
My name is Jarek Wilkiewicz, and I work for YouTube
developer relations.
So let's get into it.
I understand that Storyful is the new news agency in the
age of social media.
So, David, tell us a little bit more about what your goal
is, and how did you come up with the idea?
DAVID CLINCH: Well, yes, I've been a
journalist for many years.
Our founder and CEO, Mark Little, also a journalist for
many years.
And along with Gavin and Paul and our team, we created
Storyful three years ago to essentially be the news agency
of the future.
And the key to our goal at Storyful is the discovery and
verification of user-generated content.
And a significant amount of that content that we supply to
the news companies that we work with in the US and all
around the world is YouTube video.
But again, the key to us is not just discovering it, but
making sure that we have the original version, and that our
news partners have permission and access to use the content.
JAREK WILKIEWICZ: Great.
So if I showed a video of my backyard, which is actually
pretty arid since I'm not that good about upkeep, and claim
that this is from North Africa, you guys are going to
be able to actually identify it as a fake?
DAVID CLINCH: Yes, we are.
And it's a combination of what we call our human algorithm,
which is a combination of our proprietary discovery tools
that allow us to tap into every social media
platform in real time.
And then finally, old fashioned, traditional
journalism at the last level to fully verify the content.
So we combine real time technology with real time
journalism to be able to verify and debunk videos from
anywhere, any time.
JAREK WILKIEWICZ: Very interesting.
Sounds like the human and the machine
working in perfect concert.
All right, so I understand we have a quick video overview of
what Storyful does.
So I'm going to ask our producer, in the studio here,
to roll the tape for us.
And then after that, we'll talk about some of the
technical challenges involved.
[VIDEO PLAYBACK]
-I'm Malachy Browne, I'm news editor at Storyful.
Hurricane Sandy was one of the biggest stories of the year,
from its timing a week out from the US election, to being
the biggest storm to hit the US since Katrina.
Storyful has built its brand on establishing the veracity
of social media content so that news organizations can
use it straight away.
One of the biggest challenges for news rooms on a story like
Sandy is the sheer volume of user-generated content that's
been shared on social platforms.
You've got to know who to trust, you've got to know what
to trust, and how to verify content.
At Storyful, we're very mindful of official channels
like the National Guard.
We know how to doorstep their YouTube account so that we get
first dibs on the content that they're uploading, and we get
to pass that on as trustworthy content to news organizations.
At Storyful, our mantra is "news from noise." It's about
finding the authentic voices on the ground,
closest to the story.
It's about finding reputable sources, reputable uploaders
of content.
And when we find content, making sure that that content
is what it says it is.
-A building has collapsed here in New York.
-One of the consequences of the Occupy Movement has been
growth in this network of citizen journalists who are on
the ground in New York, and other cities in the US.
-Fantastic.
Yeah, I'd be psyched to give you guys the fridge.
-Power has gone down in this section.
See, there's some power still up.
Winds are getting rough as we get closer to high tide.
-We at Storyful have built relationships with guys like
Tim Pool, who during Sandy went out and filmed the
electricity shutting down-- floods on the streets.
He gets it to us and we get it out to our news clients.
Where Storyful really hits its stride is finding those gems
of videos, that video gold.
One example is a great video by a high school student in
New York called Matthew [? Weinschneider. ?]
A large tree at the back garden was uprooted, fell
down, crashed through the fence, and narrowly missed his
neighbor's house.
And for us, we were able to verify the location of the
video, find his house on satellite imagery.
We looked at his Google+ profile, his Facebook profile,
and we were able to identify the shed beside the tree in
his back garden, and the distance from his back porch
to the fence to absolutely verify that this was the
location where the tree had fallen over.
It's a challenge for the likes of YouTube, Google, Facebook,
and Twitter as well, to make it easier for us to separate
the news from the noise and find out what's really true.
-Oh my god.
-It hit my car.
-Oh, the car.
I got that all on film.
-Oh.
-I got that all on film.
-Oh my god.
[END VIDEO PLAYBACK]
JAREK WILKIEWICZ: Fascinating.
David, tell us more about this clip.
I understand you were involved in the
editorial process as well.
Can you tell us a little bit more about that?
DAVID CLINCH: Yeah, in fact, one of the key elements of
what we do at Storyful is that we have our journalists and
our technologists sitting in the same
room, at the same time.
Either in Dublin-- in our main newsroom-- or virtually--
through a 24/7 hangout that we use--
we are in constant communication, editorially,
around what new stories are happening.
But we also have our technologists, like Gavin and
Paul, sitting with us and monitoring our tools that help
us with discovery and verification.
And literally, on some occasions, tweaking in real
time our search and discovery tools so that we can not just
use the tools that we have, but improve them in real time
as we approach real stories.
So for instance, today we're covering the clashes in Egypt.
We have search terms that are going into our discovery
platform across every social platform searching for videos.
And at the same time, we have our editorial team-- our
journalists--
looking at what that system brings up, and using old
fashioned journalism to help verify and get access.
So that's where Paul and Gavin can explain to you a little
bit more the tools that we use to help discover--
when there are thousands of videos out there--
which are the real ones and which are the original ones.
JAREK WILKIEWICZ: Great.
This is a great segue into the technology.
I wanted to make a comment that it sounds like very much
that working for Storyful is very much an adrenaline sport
because you guys have to do this in real time, as the
story is developing, isn't it?
DAVID CLINCH: Yeah, it's true.
And we have three years of experience in doing this.
And also, as I said, we're constantly
improving our tools.
But it is really exciting.
In fact, I think it's the sharp end of the future of
journalism.
And we're not just talking about it, we're
actually doing it.
And in my old days as a journalist, I always wished I
had a team of technologists immediately available to help
build and improve tools.
That's what we do at Storyful.
There's no barriers.
JAREK WILKIEWICZ: Great, so hacks and hackers, right?
DAVID CLINCH: Yeah, that's us.
JAREK WILKIEWICZ: All right, well, let's talk about the
technology now.
Gavin and Paul, tell us a little more about how is your
system implemented.
How do you actually do this?
PAUL WATSON: Yeah, so I mean it's pretty--
there are quite a few different tools and systems
that we actually build.
It's not one monolithic application.
We have a couple of Ruby processes which do a lot of
HTTP calls, a lot of hitting the YouTube API and a lot of
other APIs.
We have a Postgres database that stores
the majority of it.
In that Postgres database, just as a note, we have 70,000
YouTube videos that have been looked at by individual
journalists, and verified, and annotated, and context added
to them, given titles, dates, and locations.
We've got 70,000 of these videos that
are really high quality.
And also cleared and worked with the
owners of those videos.
And then we have a Ruby on Rails front end for our own
clients, and for our journalists.
It's sort of a CMS system.
And then we have a--
it's a backbone JavaScript application, at the moment--
but we're actually moving over to Google's AngularJS as our
new front end application development framework.
And so all together, these various tools and systems that
we've built really help the journalists just be faster and
better and let the humans do what they're good at and the
computers do what they're good at.
JAREK WILKIEWICZ: Cool.
And since this is a "YouTube Developers Live" show, what do
you guys use the YouTube API for?
PAUL WATSON: Well, first of all, we use it to get the
video details that you don't always see in the UI.
Because obviously youtube.com is quite consumer focused.
So things like location and upload date are among the
most important pieces of information about a video, but
they're not surfaced in the youtube.com UI.
So Gavin over here, he'll use a tool, and he'll actually
look at the JSON output of the YouTube API to get at
the details of the video.
And then we'll parse it.
So that's the main way.
We also have a tool that goes through all those 70,000
videos and makes sure that they're still live.
So at the moment we have 6,000 videos out of those 70,000
that are no longer available on YouTube.
But we still have the metadata, because we don't
store the actual videos, obviously.
We leave those on YouTube.
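[EDITOR'S NOTE: The two API uses Paul describes, reading details that the site does not show and checking that stored videos are still live, could be sketched roughly as below. This is not Storyful's code. It assumes Data API v3 field names (`snippet.publishedAt`, `recordingDetails.location`) and assumes that a `videos.list` response simply leaves out IDs that are no longer available. The batch size of 50, the sample response, and all function names are illustrative assumptions.]

```python
from datetime import datetime


def chunked(ids, size=50):
    """Split IDs into batches; 50 per request is an assumed API limit."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def summarize_video(item):
    """Extract upload time and recording location from one video resource."""
    snippet = item.get("snippet", {})
    location = item.get("recordingDetails", {}).get("location") or {}
    published = snippet.get("publishedAt")
    uploaded_at = None
    if published:
        # Timestamps are ISO 8601 with a trailing "Z" for UTC.
        uploaded_at = datetime.fromisoformat(published.replace("Z", "+00:00"))
    return {
        "id": item["id"],
        "uploaded_at": uploaded_at,
        "latitude": location.get("latitude"),
        "longitude": location.get("longitude"),
    }


def find_unavailable(requested_ids, response):
    """Return the requested IDs that are absent from the response, in order."""
    returned = {item["id"] for item in response.get("items", [])}
    return [video_id for video_id in requested_ids if video_id not in returned]


# Made-up response for a request that asked about three stored IDs.
sample_response = {
    "items": [
        {
            "id": "vid_a",
            "snippet": {"publishedAt": "2012-10-29T23:41:07Z"},
            "recordingDetails": {
                "location": {"latitude": 40.71, "longitude": -74.0}
            },
        },
        {"id": "vid_c", "snippet": {"publishedAt": "2012-10-30T01:02:03Z"}},
    ]
}

requested = ["vid_a", "vid_b", "vid_c"]
missing = find_unavailable(requested, sample_response)
details = [summarize_video(item) for item in sample_response["items"]]
```

[Because only metadata is stored, an ID that turns up in `missing` would be flagged rather than deleted, which matches Paul's point that the metadata is kept after a video disappears.]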
JAREK WILKIEWICZ: Sure, sure.
So this is an important part of dealing with user-generated
content, right?
Because it ends up in the users' accounts, it can be
considered ephemeral.
Like users have full control over it, they can pull it,
they can do whatever.
PAUL WATSON: Yeah.
JAREK WILKIEWICZ: Great.
So let's talk a little bit more about is there anything
surprising about your product-- your system--
that you have learned while building it that you're
willing to share with our audience?
PAUL WATSON: Yeah, I mean, Gavin can actually talk quite
a bit about scrapes.
If he wants to--
GAVIN SHERIDAN: Yeah, I mean, I suppose one of our biggest
problems is that we'll see a piece of user-generated
content appear on YouTube, and within 15 minutes there will
be 10 copies of it.
And that is one of the biggest problems in journalism.
So we need to understand who owns the original video, and
then we have to kind of track back and find out, where did
this video originate?
Is this the original video or is it copied?
Dealing with copies is one of our biggest everyday tasks.
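[EDITOR'S NOTE: One small part of tracking back to an original, ordering suspected copies by upload time, could be sketched as below. This is an illustration, not Storyful's method. The record layout and the `published_at` key are assumptions, and the earliest upload is only a candidate: it may itself be a copy from another platform, so ownership still has to be confirmed by a journalist.]

```python
from datetime import datetime


def parse_time(timestamp):
    """Parse an ISO 8601 UTC timestamp such as '2013-07-05T10:15:00Z'."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def rank_by_upload_time(candidates):
    """Order suspected copies from earliest to latest upload."""
    return sorted(candidates, key=lambda c: parse_time(c["published_at"]))


# Hypothetical records for three uploads of the same footage.
copies = [
    {"id": "copy_2", "channel": "aggregator", "published_at": "2013-07-05T10:29:40Z"},
    {"id": "first", "channel": "eyewitness", "published_at": "2013-07-05T10:15:00Z"},
    {"id": "copy_1", "channel": "reposter", "published_at": "2013-07-05T10:22:10Z"},
]

ranked = rank_by_upload_time(copies)
earliest = ranked[0]  # a starting point for verification, not a verdict
```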
PAUL WATSON: And then I think the other thing is spam or
sort of very early uploads.
So we sort by the newest uploaded date, rather than,
say, the YouTube UI, which sorts by relevance.
So a lot of consumers of YouTube don't see the spam
videos that have been uploaded and which are taken down very
quickly by the YouTube automated systems.
But if you're using the API and you're hitting it quite
regularly, you're polling it--
those keywords and accounts--
you're actually going to see quite a bit of spam video that
you then have to filter out yourself.
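[EDITOR'S NOTE: The polling pattern Paul describes, repeatedly fetching the newest uploads for a keyword and filtering out spam yourself, might look something like the sketch below. Everything here is hypothetical: `fetch_newest` stands in for whatever call returns newest-first results (in Data API v3 that would presumably be a search ordered by date), and the spam rule is a toy placeholder.]

```python
class UploadPoller:
    """Track which videos have been seen across repeated polls."""

    def __init__(self, fetch_newest, is_spam=lambda video: False):
        self._fetch_newest = fetch_newest  # callable returning a list of dicts
        self._is_spam = is_spam            # caller-supplied filter
        self._seen = set()

    def poll(self):
        """Return videos not seen in earlier polls, with spam dropped."""
        fresh = []
        for video in self._fetch_newest():
            if video["id"] in self._seen:
                continue
            # Remember spam IDs too, so they are not re-examined next poll.
            self._seen.add(video["id"])
            if not self._is_spam(video):
                fresh.append(video)
        return fresh


# Fake, newest-first result pages standing in for two consecutive polls.
pages = iter([
    [
        {"id": "a", "title": "Flooding on 14th Street"},
        {"id": "b", "title": "FREE gift cards click here"},
    ],
    [
        {"id": "c", "title": "Tree down in Queens"},
        {"id": "a", "title": "Flooding on 14th Street"},
    ],
])


def toy_spam_rule(video):
    """Placeholder heuristic; a real filter would need far more signals."""
    return "free gift" in video["title"].lower()


poller = UploadPoller(lambda: next(pages), is_spam=toy_spam_rule)
first_poll = poller.poll()
second_poll = poller.poll()
```

[Keeping the seen set outside the fetch function means the same tracker works however the results are obtained, whether by polling as here or by a push feed.]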
JAREK WILKIEWICZ: I see.
Great.
Well, thanks for sharing that.
Have you guys tried the YouTube Data API v3 yet?
PAUL WATSON: Yep, we have, but it's not in production yet.
We're still exploring it, especially the
entity side of things.
A lot of videos don't have very good titles or
descriptions, so if we can get more entity abstracted
[INAUDIBLE] out of them, that's fantastic.
But it's still very early days for us.
JAREK WILKIEWICZ: Yeah, I think for those of you
watching, we actually have the Topics API in Data API v3, so
we actually integrated Freebase.
And then there are additional annotations available for
content that you can then query in the Freebase database
and try to reason about.
Because these are semantic annotations as opposed to just
regular plain text keywords.
Any functionality--
DAVID CLINCH: Just to add something there, Jarek--
JAREK WILKIEWICZ: Sorry, go ahead, David.
DAVID CLINCH: Sorry, just to add something there, which is
very important, when you were talking about metadata.
In many cases, the videos that we're discovering have little
or no metadata at all.
They might be uploaded as video nine with no location,
no other information.
And we're discovering them not by actually searching on
YouTube, but through a social search which taps into the
mentions and the references to those videos from the people
who are at the scene, or journalists who are
knowledgeable in a particular area like Egypt or Syria or
somewhere else.
And they're pointing us towards these videos, with
additional context.
And really, the only information that we have about
the videos exists on other social platforms.
So it's very important for us to understand, and for our
news clients to understand, that in some cases you will
not find the videos by searching on YouTube.
But we can help discover them because they're being
referenced on other platforms.
At some point later they may have metadata added, but at
the time that they're uploaded, they often don't.
JAREK WILKIEWICZ: Yeah.
Yeah, very interesting.
Sounds like when you think of metadata, it's in the broader
context of all the social media signals that you can
reason about.
And that kind of builds the metadata in your
system about the video.
YouTube is just one of the sources, and sometimes the
only thing that YouTube will provide is the actual video,
without any other signals, right?
But fortunately, you can actually look at a
lot more than this.
Thank you for sharing that, that's very interesting.
Is there anything in the API that you guys would like to
see that is not there yet?
Any things on the wish list?
PAUL WATSON: Yeah, streaming APIs.
[? Pushnote ?]
APIs.
That's really-- it will help us a lot.
We do quite a bit of polling, and the YouTube API is a lot
more generous than, say, the Vimeo API.
But again, if we can get more streams of data, it will fit
our use case a lot better.
JAREK WILKIEWICZ: Great.
Well, thank you for sharing that.
We do have quite a bit of work ongoing right now to integrate
asynchronous push notifications
into Data API v3.
In fact, for those of you that attended Google I/O or watched
the videos-- there was a presentation by my colleague
Jeff Posnick talking about exactly how this is going to
work when it's fully rolled out.
We will post a link to that video, and then we hope to
have it available soon.
Well, let's switch to the Moderator to see if we have
received any questions.
I see there's one question that is relevant, and it's
actually from me.
It's about whether YouTube Developer Relations is hiring.
And the answer is yes, we are hiring.
Google Developer Relations is hiring as well.
You can learn more at developers.google.com/jobs.
Well, thank you very much for your time.
I really appreciate you joining us late from Dublin.
And thank you, David, I understand he needed to run to
the meeting that he alluded to earlier during the show.
A real pleasure having you on the show.
You're doing some very interesting work.
And for those of you watching, thank you very
much for your time.
And we'll see you again live next week on "YouTube
Developers Live." Bye.
PAUL WATSON: Thank you, Jarek.
GAVIN SHERIDAN: Thanks.