Data Mining and E-Business: the Social Data Revolution

Transcript of Andreas Weigend

Stanford University, Dept. of Statistics

Andreas Weigend (www.weigend.com)

Data Mining and Electronic Business: The Social Data Revolution

STATS 252

May 11, 2009

Class 6 LinkedIn: (Part 2 of 2)

This transcript:

http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_6linkedin-2_2009.05.11.doc

Corresponding audio file:

http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_6linkedin-2_2009.05.11.mp3

Previous Transcript: (Part 1 of 2):

http://weigend.com/files/teaching/stanford/2009/recordings/audio/weigend_stanford2009_6linkedin-1_2009.05.11..doc

To see the whole series: Containing folder:

http://weigend.com/files/teaching/stanford/2009/recordings/audio/

D.J.: I think the best thing to do is we’ll try to bleed in your questions throughout here and if you see an area where you say, “That’s where I want clarity,” or “I want the answer to one of those questions,” you should wave your arm frantically, yelling at us to try to get an answer. We certainly believe that this is your time and we’re here to really answer your questions and help you to get the most out of this.

We talked about standardization. I know a lot of these questions touched on the areas of reputation. We talked about what you could do with data for a career path. Let’s get into a bit of how we also turn data into monetization, how we think about this entire process. One of our very popular products, on the right-hand side of the homepage, is called “Who’s Viewed My Profile?” There is an example of the box, at the top, of what you would see. For me, this is the one I took this morning, and it says, “Your profile has been viewed by 7 people in the last day. Yesterday you appeared in search results 19 times.”

This was one of those very fast prototypes: get it out there, try it out, see what people think, and iterate, iterate, iterate on it. As we’ve iterated, this is what happens if you click on that. It will say, “Your profile has been viewed by 7 people, including…,” and this is where it gets interesting. I’ll let Reid comment on this because he was more involved in how we made this happen.

I want to touch on the part about the standardization question. We don’t want to actually show you the person because of any concerns about showing their actual name. We want to hide that. How do you hide that? Standardization is your best bet. I know this is tough to read, but you could say someone in the project administration function, in the financial services industry, from India has looked at my profile – doing a whole bunch of standardization there.

A software engineer at eBay – we have business logic to say that if we showed your company name it would give you away. Suppose there are only two people in your company; there’s a pretty good chance that you know who it’s going to be. We want to obfuscate that by saying something like “financial services.”

Reid, do you want to talk a little bit about how we came to that decision?

Reid: We knew that one of the things people like to know is who is looking at them. I may want to optimize for that; that’s interesting. The problem is that it’s a two-sided equation. The privacy issue is that if you put “Reid Hoffman looked at you,” Reid Hoffman might be really irritated by that, because then D.J. writes me and says, “It says you looked at me,” and that sort of thing. I may not want that.

0:03:00.3 We actually had the idea for this feature about two years before we launched it. We didn’t know how to solve the privacy problem. Then, when we started doing some of the analytics work, the standardization work, we thought we could categorize. If you click on all these things, I think the minimum is that each of these would lead to ten candidates. If you did a search on this, it’s at least ten people and there is usually a minimum of twenty to fifty, depending on how the algorithm works. We could actually set a default standard to “anonymous,” which is this: some information. You get useful information, but it doesn’t lead to a name. You can set it as an individual, by the way. You’re all on the default; if everyone in class leaves here and looks at D.J.’s profile, by default, you’ll show up like this. You can set it either to show that it’s Reid Hoffman looking, or to not show up at all.
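The idea of falling back to a standardized description until at least ten candidates match can be sketched roughly as follows. This is only an illustration, not LinkedIn's implementation: the field names, the generalization ladder, the fallback string, and the function names are all assumptions made up for the example; only the threshold of ten comes from the discussion above.

```python
from typing import Dict, List, Sequence

Profile = Dict[str, str]

# Hypothetical generalization ladder, most specific description first.
LADDER: Sequence[Sequence[str]] = (
    ("title", "company"),
    ("function", "industry", "country"),
    ("industry",),
)


def count_matches(members: List[Profile], viewer: Profile,
                  fields: Sequence[str]) -> int:
    """Count members who share the viewer's value on every listed field."""
    return sum(
        all(m.get(f) == viewer.get(f) for f in fields) for m in members
    )


def anonymous_label(viewer: Profile, members: List[Profile],
                    k: int = 10) -> str:
    """Return the most specific description shared by at least k members."""
    for fields in LADDER:
        if count_matches(members, viewer, fields) >= k:
            return ", ".join(f"{f}={viewer[f]}" for f in fields)
    # Nothing on the ladder is common enough, so reveal nothing specific.
    return "someone on the network"
```

With this sketch, a viewer from a two-person company would not be described by title and company, because only two members match that description; the label falls back to function, industry, and country if at least ten members share those.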

D.J.: Exactly, if I look at your profile, it will say D.J. Patil from LinkedIn has looked at your profile, because I have explicitly set my privacy settings to show that. It’s the same with Reid. Are there any questions about that first part, about what we’re doing?

Student: … I can see…

Reid: This one isn’t targeted.

D.J.: This one isn’t a targeted one. This is just a brand ad.

Reid: As a guess, this is a broad enough site for professionals in the U.S. If they want to do that, fine.

D.J.: For example, SAS will target ads occasionally, to me, because it’s an analyst. Okay, that guy looks like the guy we want.

One of the things we also did with this, which was very clever –

Reid: FedEx actually tries to sell to every professional, so that’s not necessarily a bad decision on our part.

Andreas: At which level of anonymity should you present something? There were two elements to this. One is that you default to something that is sort of the middle, as opposed to defaulting to something that is one of the two extremes.

Your next homework assignment will be a recommender system for Delicious, which we also had the last couple of years. In the early days of Delicious, we had all these discussions with Joshua Schachter on what you should do as the default. There they felt that everything you save, every bookmark you save, should be public for the rest of the world. This was the right thing in the early days, but down the road, it might not have been as useful. You might change those defaults, and then you potentially see what happens on Facebook: if they get changed without being too obvious about it, people are not happy.

0:05:44.5 The second point, however, is that this one doesn’t give you the data to actually contact the person back. If you want people to say, “I don’t mind my name being revealed,” then they’re probably much more likely to do so if it gives them a free way of contacting people back or they could do it one [0:06:02.5 unclear] time like this.

I think it’s a really good example about the complexity of what, on the surface, really seems like a simple question. Deep down, it has huge business applications. Can anybody use this, or do we have to pay?

D.J.: This is where the interesting part is.

Andreas: Or do we have to pay extra to not show up.

Reid: No, to not show up is free.

D.J.: Remember, a big part of this is that we don’t want to use this as extortion. We don’t want to use this site as saying, “Look, now you have to pay here.” We’re going to protect you. I first want to comment on the beginning part about the privacy because I think this is a very rapidly changing space. One of the big things that I think is happening is this space is changing, and I suspect that within four to five years, at the longest, it will become a place where somebody will be really nervous about people looking at you. I actually don’t personally believe that this will be an issue.

The only place where I think there may be a caveat on that being wrong is a professional aspect. The way things are going with the information that you get out of things, everything from Google Analytics, to all other places on the web that tell you information, you will just have all that information readily at your fingertips. I’m happy to be challenged on that, by the way.

Reid: But we should get going.

D.J.: Another part of this I want to bring up is the monetization part. Quite easily you could ask how we monetize this. You have a list. Show a certain number of the elements and then at some reasonable cut off you ask, “Do you want to see more? You have to upgrade.” It’s part of the bundle deal that you get. That’s responsible for a very healthy chunk of revenue for such a lightweight product.

One of the things we can do is show you what a rapid prototype would look like for an iteration, just for fun. Here is one we did and this is a prototype that took us three weeks because we were using a new database system. In this case we actually put in the names, as an experiment, to see what would happen. You get to see your names, and we want to see how people did.

0:08:36.2 We asked, “What if we showed you over time how you’ve been viewed?” We could also overlay on top of this how you’ve been viewed relative to your industry, relative to your peers, whatever defines your peer group. We could also ask, “Of these views over this period of time, how many are within what degree of your network?” We could also ask about regions: what region feeds your network, where have you been.

The other thing is companies. You might want to know which companies are really interested in you, or titles; let me click on titles here and you get an idea of the views and when people are doing it. The interesting thing is that part of what we’re really trying to do with this is to let you take control of your brand, really being fundamental about which aspects of your brand you are in control of.

For example, suppose in this list you get up to manager and you get views. Nobody above manager looks at your profile. There, you know where your ceiling is. The next part of the product is actually talking about what are things you could do to improve your profile so you are in the search results or you are found by those other people. That’s the flip side.
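The breakdowns described here (views over time, by company, by title, and the "ceiling" on who looks at you) amount to simple counting over a log of profile views. A minimal sketch, assuming a hypothetical view log where each record carries a `day`, a `company`, and a standardized `seniority`; the record layout and the seniority ladder are invented for illustration and are not taken from the talk:

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional

# Hypothetical seniority ladder, lowest to highest.
SENIORITY = ["individual contributor", "manager", "director", "vp", "cxo"]


def views_by(views: List[Dict], key: str) -> Counter:
    """Count views grouped by one field, e.g. 'company' or 'seniority'."""
    return Counter(v[key] for v in views)


def views_per_week(views: List[Dict]) -> Counter:
    """Count views per ISO (year, week) to chart views over time."""
    return Counter(tuple(v["day"].isocalendar())[:2] for v in views)


def seniority_ceiling(views: List[Dict]) -> Optional[str]:
    """Return the highest seniority level seen among viewers, if any."""
    ranks = [
        SENIORITY.index(v["seniority"])
        for v in views
        if v["seniority"] in SENIORITY
    ]
    return SENIORITY[max(ranks)] if ranks else None
```

If `seniority_ceiling` returns "manager", nobody above manager has viewed the profile in the period covered by the log, which is the ceiling D.J. refers to.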

One of the cute things we were able to do with this was a fun little head-to-head competition where we said you could compare yourself to other people, so let’s pull up Reid here. Let’s see how I did against Reid there. I lost. [Laughs] The interesting thing is that we said who viewed us in common. Out of that, you can glean a lot of interesting information. How do we expose that in a way that’s really actionable for you? That’s one of the things we’re iterating on. It’s a very quick cycle of how do we take a simple project and show the realm of the things we can iterate on, and then figure out what works.

Student: …

D.J.: Say more how?

Student: …

D.J.: The reputation question, how do you make sure people are honest.

Reid: There are a couple of things. One is that, generally speaking, when we’ve done spot checks (and it’s not been a very deep analysis), anyone who has ten or more connections has put hardly any inaccuracies in their profile. The problem is that when I do that, I’m lying in public.

D.J.: It’s actually stronger. If you take someone with ten connections and compare their resume to their public profile, there are fewer factual inaccuracies on the public profile than there are on the resume.

0:11:59.2

Reid: I am penalized by the actual environment I live in, in terms of lying in public. Lying is much better done [whispering] this way. We also have an ability to flag a profile, and that leads to some internal processes of information checking and validation. For example, if you have a former employee who is now claiming your job, you hit a flag button on the site and that will go into a process in the backend, which includes some analytics of the information. This is also one of the things we found from the early days at PayPal: a really easy way of fraud checking is to send an email with a challenge to the account that registered it. If there’s a response, there is a good likelihood that the challenge was a mistake. If the person doesn’t respond, frequently you’ve caught them in something; even just that gives you information.

There is a wide variety of this. We also do analytics scores about validity. There are a lot of different scores on profiles on the backend that you don’t see because we’re gradually building towards projects. This all comes out of D.J.’s group, and it comes down to: do we believe this information? We’re trying to get to a point where we can say, “Oh, this is information you can believe because we have a lot of information on it.”

D.J.: It’s incredibly insightful when you look at the Monster resume of a person and then look at their LinkedIn resume and what they’ve put in there. On the resume, it’s “grew the numbers by X millions of dollars.” All that disappears when someone is actually writing their LinkedIn profile. It’s very honest.

Let’s go on and I’ll touch on this quickly because somebody asked about the Q&A matching and how do we recommend questions and answers, how do we recommend jobs for people, how do we get that whole matching function going.

This is actually one of the things that we work on, called Talent Match. We take the underpinnings of all of it: standardization, putting things in the right bucket so you have categories to match upon, and the things that go into the transition probabilities, the connectivity overlap, all those things. Then you do the very simple thing here: you say let’s create a vector of a person, let’s create a vector of another person, let’s create a vector for that job or that system, that Q&A; you match across that and you do iterative improvement. Nothing here is rocket science. It’s all about the edge case. The edge case, at the end of the day, is the biggest challenge of how to make that work. Then, we have information such as regions and other details in there that go into constantly refining it, but nothing that hard.
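A minimal sketch of "create a vector for a person, create a vector for a job, and match across that" could look like the following. The talk does not say which similarity measure is used; cosine similarity is assumed here purely for illustration, and the feature names (standardized function, region, skill) and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Sparse feature vector: standardized category -> weight.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(job: Vector,
                    people: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank people by similarity to the job vector, best match first."""
    scored = [(name, cosine(job, vec)) for name, vec in people.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, a job vector with weights on `function:engineering`, `region:bay_area`, and `skill:data_mining` would rank a person sharing all three features above one who shares only the region. The same function could score a question against a potential answerer, since both are just vectors over the same standardized categories.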