CFP 2014: Big Data, Privacy & Measurement

Hi. So I'm Karen Levy, this is Tracy Kosa, and hopefully you're here to talk about big data, privacy and measurement. If you're not but you're in the room, don't leave now, because you're in for a good surprise, I hope. I know there's been some confusion about who's where and why, but hopefully we'll have a good conversation today.

So just to give you a sense for what we're going to talk about today, and I think we both envision this as being a super informal, audience-participation kind of thing: we are both really interested, in different ways, in the relationship between measurement and privacy, how they feed into one another, how they can be friends, how they can sometimes operate as enemies, and how we can collectively think about those concepts in the same breath.

So just to give you the basic setup: we have some short comments, and then we'll open it up to broader discussion. First we'll review some of the ways in which new metrics are pervading our lives across tons of scales. My guess is that most people in this room are pretty familiar with all those ways, so we'll just survey the landscape. Then I'm going to talk about some of my research. I should say I'm finishing a PhD in sociology at Princeton University. I'm also a lawyer, so I look at surveillance and monitoring from a socio-legal perspective, and I'll let Tracy tell you what she does. I'll talk about some of my work, which looks at truck drivers and how they deal with surveillance and monitoring and privacy questions, and then Tracy is going to take over and talk about some of her research on how measurement, excuse me, can be used to enhance privacy in organizations. So yeah, we're different. Okay, so as kind of the top-line thing for today: we all know that big data, whatever it is, enables the creation of lots and lots of new metrics.

Right, we're measuring things that we never could measure before, and lots of people are involved in that measurement who weren't involved in those relationships before. This is everything from things we've long had, like your FICO score, to quantification across the board: quantification in insurance, this sort of actuarial frame on how we know what we know; quantification of very intimate relationships; lots of biological quantification that never used to happen before, stuff like the Fitbit, for example; very fine-grained quantification in education, this idea of the measured classroom.

You know, as we design new systems for education and new accountability metrics for education, the idea is that we should be measuring everything students and teachers are doing, even very intimate activities. I just recently wrote up a piece for the IAPP, the International Association of Privacy Professionals, about dating and how metrics are pervading it. There was a big hubbub a few months ago about this app called Lulu. I don't know if people are familiar with Lulu, but it was targeted at college women, and it was intended as a way for them to create scores for the men or boys they had had sexual relationships with, and to score them in a way that was actually pretty hard for those males to contest or to remove from public databases. Basically, it was Facebook-connected, so anyone you were connected to on Facebook could see how you were rated. There are a whole lot of other sexual apps; I won't talk just about the sex stuff, but something called Spreadsheets is one of my favorites: that's an app you use to track things like how long your sexual sessions last. In the workplace, there's a company called Sociometric Solutions that some of you may be familiar with; it's really taking off, and it's based in Boston. They give companies RFID tags to hand out to all of their employees, and those track what types of interactions they have with other employees. They can even detect things like ambient noise and the tone of voice people are using, to try to get a sense of how organizations are actually interacting and, ostensibly, to build better teams. And then, of course, social network analysis brings with it a whole set of new metrics for how we relate to one another. Sorry, good question: is the RFID thing voluntary? Well, the companies adopt it, so they may compel their employees to do this, presumably.
Yeah. Okay, so all of this is to say that it's not surprising we see metrics pervading all of these different contexts, at lots of different scales, from self-monitoring to employment to healthcare to education to government. And with that, I think measurement is really key, and it's something we don't talk about enough, because big data basically gets monetized and made reusable through measurement, and I think that butts up against privacy in a lot of interesting ways and creates this new paradigm for how we know things. You know, sometimes on the internet people say "pics or it didn't happen." I think in some ways it's now "metrics or it didn't happen." We want to put a number on everything. So now Tracy is going to talk with you a bit about surveillance and measurement. Do you mind doing that? Yeah, sure. So, I used to be a teacher, and so I pace when I talk, which makes PowerPoint difficult. Okay, what I'm going to talk about is surveillance and measurement and the connection between the two, and this is really new. Karen and I were both at PLSC, the Privacy Law Scholars Conference, last week, and one of the things a lot of the law scholars were talking about is surveillance as a tool for power.

So in privacy there's kind of a schism. My background is pretty interdisciplinary, so I see this a lot: you've got political scientists and lawyers on one side talking about surveillance issues, then you've got political scientists and lawyers talking about privacy issues, then you've got engineers and computer scientists talking about privacy, and then you have this whole other growing group of people in the millennial generation who are really just focused on privacy, period, and then they spread out. So I just wanted to spend a couple of minutes clarifying those distinctions. For example, there's sousveillance, the flip side of surveillance, where we're watching the watchers: the government is collecting data and looking at us, private corporations are collecting data and looking at us, but then what data are we collecting, and how are we looking at them?

And both of those concepts, sousveillance and surveillance, can be used in the metric space, and I'll delve into that a little bit. Karen's going to talk about the truckers, but I'm going to talk about what Microsoft is doing in terms of monitoring and measuring its own compliance in its own privacy space. Essentially, when you look at measurement (and this is where my PhD work focuses; my background is poli-sci, as I said, then public service, and now I'm in computer science doing my doctorate), it's really about taking these levels and dimensions and uncertainties that exist in the privacy space, whether in law or in computer science, and shoving them together to try to come up with some type of number that can represent them. A lot of lawyers have an innate hatred of this approach. But what I'm trying to argue is that computer science essentially created this privacy problem, so maybe we can use it to solve it: we can use some of the computational models that exist to represent privacy in a way that's much more meaningful for your average data subject, who isn't going to read a privacy policy.

Is he going to be familiar with the Supreme Court's decisions about the Fourth Amendment? Is he going to understand that, in a lot of cases, when you click "agree" to use a service, you're giving away your rights to everything? I'm not so sure we should be asking them to do that anyway. I'm not so sure the answer isn't to change the way we do things, as opposed to making all the users really comprehensive experts in privacy. So when we talk about this monitoring function (and we talk about monitoring a lot at the company, in the Microsoft space, but we also talk about it in my research), what does it mean? This whole notion of monitoring is so value-laden when you talk to different disciplines.

Breaking that out is, for me, a really important point. So, to circle back to what Karen said in her introduction: monitoring doesn't exist without measurement. It just doesn't. Those days of having the private investigator follow you around, collect data, and write it down on a scratch pad are long gone. When we monitor now, we're talking about monitoring in the sense of sitting back and watching all of these activities on a network, with one drive collecting terabytes of data. What does that mean, and how does it change the nature of surveillance? But also, how much of that is available to us? The real need for transparency in talking about these data points is so that we can watch them too, whoever that "them" is. Okay, so with that in mind, we're going to dive a little deeper into one particular context, one that I'm venturing to guess most people in this room don't have a lot of first-hand experience in; it's certainly not something I did before I started hanging out with truck drivers in the name of science a couple of years ago. I've been conducting a multi-year, multi-sited ethnography in which I look at how truckers interact with monitoring and with measurement, and how it affects their privacy.

So, altogether (I can tell you more about the project later if you really care), just to give you the headline: I've talked to truckers in 11 states over several years. I've talked to the companies they work for, the regulators who make the laws that affect them, and the tech companies that build technologies for the industry, to try to develop a picture of what life is like for truckers under a measurement model.

And I'm doing this not because truckers are inherently interesting (although I could certainly make an argument that they are), but in part because they're a really resistant population, one that highly values its autonomy and freedom, and so they become an interesting group to look at when we think about pervasive monitoring. If you talk to truckers about why they got into trucking... these are some pictures from trucking movies of the '60s and '70s, sort of the cultural icons of trucking. I put these up not because any of my research subjects look like Kris Kristofferson or Peter Fonda, but because these actually are still pretty relevant cultural constructs for trucking.

It's super male, about 95% male, and machismo is really heavily valued. When I talk to truckers about why they do what they do, they say: I don't want someone looking over my shoulder; I worked in a factory and I couldn't deal with the authority structure; I just want to do my work the way I know how, because I've been doing this for a long time; I'm a professional. So that's super-duper important to them. What I've focused on most specifically is the hours that truckers drive. Very generally (I won't bore you with the details), truckers can only drive so much: federal regulations limit the amount they can drive, for a safety reason. If truckers get really tired, they have expensive and deadly accidents and put us all at risk. And the measurement tool that has been in use for about 75 years to make truckers comply with the timekeeping rules about how much they can drive is the paper log book; you can pick up a pad of these at any truck stop for two or three dollars.

It looks like this. Basically, what the trucker does is draw a line, using a pencil and maybe a ruler. You can't really see it, but basically you indicate on this graph in the middle when you've been driving, when you've been sleeping, when you've been off duty, that sort of thing. Then at the end of the day you add up the numbers, and they have to be less than the federal cap. This can get inspected if you get pulled over or go through a weigh station. So this is the system; this is what we've been doing. The problem is that truckers do not take this seriously at all. It's very, very easy to lie using this measurement system, and it's just universally acknowledged that they do. They call the logs their coloring books, or their comic books, or their swindle sheets. There are other references too; these are some trucker anthems, and there are lots of references in pop culture to how nobody's really taking this that seriously: they're dodging the scales, their log books are way behind. And the reason is that they're economically motivated (sorry, can you guys see?) to drive as much as they can. They're paid by the mile; they have a saying, which is: if the wheels ain't turning, you ain't earning. So this measurement system is not connected to their economic outcomes, and their companies, and all of us actually, based on the way the market works, put a lot of pressure on them to violate the law in order to move goods at the speed of business.
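
The end-of-day arithmetic the paper log encodes is simple enough to sketch in a few lines. This is an illustrative simplification, not the full federal rule set (the real hours-of-service rules also involve on-duty windows and weekly caps), and the segment format is made up for the example; the 11-hour figure is the current daily driving limit:

```python
# Sketch of the daily check a paper log book encodes: sum the hours in
# "driving" status and compare against the federal cap. Only the 11-hour
# daily driving limit is real; everything else here is illustrative.

DRIVING_CAP_HOURS = 11

def daily_driving_hours(segments):
    """Total hours logged as driving; segments are (status, hours) pairs,
    mirroring the rows of the paper log grid."""
    return sum(hours for status, hours in segments if status == "driving")

def is_compliant(segments):
    """True if the day's driving total is within the cap."""
    return daily_driving_hours(segments) <= DRIVING_CAP_HOURS

# One day as a trucker might grid it out.
day = [("off_duty", 8), ("driving", 6), ("on_duty_not_driving", 2),
       ("driving", 4.5), ("sleeper_berth", 3.5)]
print(daily_driving_hours(day))  # 10.5
print(is_compliant(day))         # True
```

The point of the talk is precisely that, for 75 years, the inputs to this check were whatever the driver chose to draw.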

So because of that, truckers have really depended on what they see as the flexibility in this measurement scheme to do the work the way they have historically done it. And it's pretty easy; any one of us could do this. You can keep multiple sets of logs. You just draw the line a little bit longer or shorter than it needs to be. And frankly, if you have been on the road for a week and you're half an hour from your destination, there may not even be a safe place for you to pull over; it's in your interest to just get where you're going and then make the record look the way it needs to look. But the government doesn't see it this way. The government says: this is the law, and we ought to be making truckers follow it. And so the DOT is in the process of mandating that all drivers buy, install, and use electronic monitors, hardwired into the trucks, that automatically detect when the trucks are moving and where they are. These are called electronic onboard recorders, or EOBRs. You can see this one has a similar kind of display, and the idea is that it's supposed to be much harder for truckers to falsify.

It's a much more exact monitoring system: not foolproof, but more foolproof. So, as you can imagine, this has attracted a lot of controversy in the industry. I know this slide is texty, but I just wanted to give you a sense of what some of my research subjects have said about it. Not surprisingly at all, there are some big privacy concerns here, in part because, for drivers, the trucks are basically like their homes.

So this isn't even quite like monitoring in employment; it's closer to monitoring in one's home. They say that this is a federal babysitter, a black box; that it treats them like criminals; that it treats them like children; that they don't trust the government.

Now contrast that with the way tech companies and trucking companies talk about it, which is that this just takes non-compliance off the table. It perfects the regulation. It's an objective and unbiased type of measurement, so it basically closes the gap between the rule that we've enacted democratically (allegedly) and the way that people actually act in practice; so it's a good thing. I'm not going to make an argument one way or the other, but obviously the measurement issues here are really key and butt up against privacy in interesting ways. And I want to add one additional wrinkle, which is that it's actually really hard to buy an EOBR, a device that only monitors the hours a trucker drives. It's almost like the government saying everybody has to buy a cell phone that makes calls: as we all know, most cell phones make calls, but they also do a lot of other things; they come with a lot of bundled capabilities. And that's the case for these computers too. What's important for truckers is that this EOBR capability, this hour-tracking capability, often gets bundled into what's called a fleet management system, with lots of other capabilities that are of great use to trucking companies in surveilling other aspects of truckers' work.

I won't talk about all of the ways, because there are several, but just to give you a sense: one thing that's now very easy for trucking companies to monitor is a truck's real-time speed, whether it changes lanes without signaling, how hard it brakes, how hard it accelerates, whether it goes off a pre-specified route, and how much it idles, which is really key for them because fuel usage is a big cost driver. Some systems have cameras hooked up that watch what the trucker is doing, and also the space around the truck; that's supposed to be for critical incidents, so if there's an accident you can look at what the trucker was doing, whether he was closing his eyes or something. All kinds of stuff, just to give you a sense of this. This is the biggest vendor; this is a picture from the brochure on their website. Maybe you can't see it, but this is the hours-of-service module, the thing that's legally required, and these are all the other things you can do with the same piece of equipment. And then one of the things this enables, in terms of measurement, is the construction of driver scorecards. Basically, if you were a truck driver before, you did your work and that was that; you would report back about it later, after you'd done it in the way you saw fit. Now we can construct all kinds of very detailed scores about how safely you're driving and how efficiently you're driving (which is the big one), all kinds of things the companies can customize. And at the bottom here, you can see that you can also compare drivers within and across fleets, in order to foster some sort of workplace competition, mostly around things like fuel consumption.
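
To make the scorecard idea concrete, here is a toy sketch of how telematics events might roll up into the kind of driver score and ranking just described. The metric names, weights, and scoring scheme are all assumptions for illustration, not any vendor's actual formula:

```python
# Illustrative driver scorecard built from telematics event counts. The
# metrics, penalty weights, and 100-point scheme are assumptions, not a
# real fleet-management product's formula.

WEIGHTS = {"hard_brakes": 3, "hard_accels": 2, "idle_minutes": 1,
           "off_route_events": 5}

def score(events, base=100):
    """Start from a perfect score and subtract weighted penalties."""
    penalty = sum(WEIGHTS[k] * v for k, v in events.items())
    return max(base - penalty, 0)

fleet = {
    "driver_a": {"hard_brakes": 2, "hard_accels": 1, "idle_minutes": 10,
                 "off_route_events": 0},
    "driver_b": {"hard_brakes": 0, "hard_accels": 0, "idle_minutes": 4,
                 "off_route_events": 1},
}

# Rank drivers for the posted scorecard, best first.
ranking = sorted(fleet, key=lambda d: score(fleet[d]), reverse=True)
print([(d, score(fleet[d])) for d in ranking])  # [('driver_b', 91), ('driver_a', 82)]
```

The design point is the one in the talk: once events are counted, ranking drivers against each other, within or across fleets, is a one-liner.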

So what happens (and I won't go into this in a lot of detail) is that this has totally changed what it means to be a truck driver. It's totally changed trucking work. And I'm not making a normative... I mean, I am somewhat making a normative argument. I'm not arguing that this is a totally misplaced thing, or that trucking companies don't have an interest in this data, or that the government doesn't have an interest in making people follow the law, but it's worth considering how this measurement changes what trucking is and what truckers do. I won't go into all of these, but essentially it takes all this knowledge that truckers used to have because they were there: they knew how tired they were, they knew what road conditions were like where they were.

Nobody else knew that. Now it gives dispatchers the ability to challenge truckers' accounts. So, for example, a trucker used to be able to say: I need to take I-80 instead of I-90, because I-90 is too snowy. Now a trucking company can say: oh, I can see I have five other trucks on the road there, so I know that's not true. Or before, a trucker could say: I'm really tired, I really have to take a break. Now a trucking company can say: well, you shouldn't be that tired, because I see you've only been on duty for four hours today. So there's a big shift in what types of information are valued and used. And then, as I said, not only does this divorce useful knowledge from local conditions, but trucking companies also reinsert it back into the social world, and they do that through ranking. They'll post these lists of driver scores in public places in their home offices, and they'll sometimes attach small financial incentives, so if you're performing well, maybe you get a gift card. My favorite thing a trucking company does in my data is send small checks to drivers' wives when the drivers drive in a way that conserves fuel. The idea is that if the wives start getting the checks, they'll really enforce the organizational rules against their husbands, because they'll come to expect those checks every month. So it's kind of a sneaky way to bring metrics back into the discussion. Okay, that's enough about truckers. Now Tracy's going to talk with you about her own work. Okay, so, from the other side of the fence, let's go to the first slide: what Microsoft is interested in doing. Oh, I'm so glad the animation worked. That took a long time. I love PowerPoint.
So, what Microsoft's interested in doing. We talk a lot about metrics in the space of individuals, and about what data subjects actually expect in terms of privacy, and doing that work is of keen interest to me; that's my PhD focus. But what Microsoft hired me to do was ask: okay, what does our program look like? We spend all this money on compliance. We have all these people working in compliance: there are over 300 dedicated to privacy, and then a whole bunch of other people, like lawyers, who are functionally classified as lawyers but may specialize in privacy. So we've got tons of resources working on the problem, but what does it actually look like? About a year and a half ago I started working there, and one of the first things we looked at was: well, what does the actual program look like?

Because if you're going to set up metrics to surveil the organization, we need to talk about what the organization looks like. What Microsoft has is essentially a hub-and-spoke governance model. We have this sort of giant hub, with less staff there and more staff out in the spokes, and the spokes, the privacy people who actually work with the people who build products and services and devices, tend to be responsible for responding to audits, working directly with an engineer or a lawyer, and talking about a specific country's requirements.

The face-to-face folks are in the spokes. They also manage incidents, conduct privacy reviews, and do a host of other compliance-related activities. But back here at the hub (and this is where I am, actually; I'm behind all of these people), we talk about what the overall corporate compliance requirement is: what does the policy look like for the company? What is the floor, basically, for meeting privacy requirements? And then we also look at how we create tools for that, how we communicate to folks about it, and how we create awareness of it.

So, given that that's the background and the framework, we have to build measurement and metrics for privacy in and around that space. So we did. Sorry, that should say "great, on target"; not intended to be a pun, but it worked out well. The program measurement lifecycle I adopted is basically just project management, because I'm really not a big fan of reinventing the wheel, or drawing it, as you can see.

Essentially, it's five stages: What do we want to measure? How do we set targets for measuring that stuff? How do we collect that data? How do we analyze it? How do we report on findings? This was my first foray into the private sector, after being a die-hard public servant in Canada for the past 15 years or so, and I figured this would be a piece of cake, because it's an engineering company, right? They thrive on data. It'll be a piece of cake. That was a huge mistake. We spent a lot of time talking about what the requirements were, and a lot of time talking about analyzing data.

If you're going to create a measurement program for a corporation, don't do it that way, because what inevitably happened was that we spent a lot of time driving toward the data we already had. At the organization level, maybe that's okay, but when you start to think about it from Karen's side of the picture, you start to see how it drives action: the data drives action. What we're going to measure drives the action. It took a long time to convince my colleagues that it isn't about the data we collect; it's about what we want to collect. What do we want things to look like? What's the question we're trying to answer? Okay, so to that extent, there are actually three buckets of questions. There's the issue of compliance, which every giant company, especially in the IT space, is going to want to know about, and that evidence of compliance is for at least four different audiences. There are the regulators who show up at the door and say: okay, Microsoft, prove that you're doing privacy. There are the press articles; it's been an exciting year. There are the researchers who publish papers and talk about different services, or try to poke holes in different things, or raise vulnerabilities. And then there's, of course, audit, internal and external; we're heavily regulated, so we have a lot of audit going on. But there are two other areas of interest to me specifically; I haven't driven these yet with the company, but they're important ones. One of them is data-driven decision making and business planning.

In the compliance space it's really easy to say: we're driving toward compliance, that's the business plan, great, okay, we're done; business planning over, on to the next thing. But when it comes to privacy, it's not that simple. What does compliance mean? 100% compliance with every piece of legislation in every country, applied to every device and service and product that we sell? I don't think I'm talking out of band when I say that's probably not going to happen, but I don't know that it's the right thing to happen either. I'm not sure that 100% compliance actually meets the expectations of the data subjects. I think we've seen throughout the conference today, and certainly last week, that a lot of folks aren't familiar with what their privacy rights are, and clearly their expectations and notions of trust are coming from somewhere; if they're not coming from the Fourth Amendment or a specific piece of legislation, then where are they coming from? That's really of interest to Microsoft, because what we want to do is establish trust with our customers, so we need to know what that means. The other piece (and I think this is coming, but I'm reading tea leaves here) is demonstrating actual impact. Microsoft loves this word, impact. They have their own language, like Starbucks, and that's one of the key words. But the real interest here, frankly, sooner or later, as every privacy practitioner knows, is going to be the call to account for what you're doing. I'm sitting here working in the privacy space; some day someone's going to come up to me and say: can you justify your salary to me, and do you have any data to back that up? Now, I haven't been asked that question at Microsoft yet, but I expect that at some point in the future, as privacy practitioners, we all will be, and I'd like to have an answer before the question comes.
So that impact piece is important, and I think it'll also go toward unblocking sales. But anyway, okay, next slide. So what do we have for data in this space? Again, we're talking about kind of that sousveillance level: this isn't about individual data or individual data points, it's about the company and a program. So I thought the best thing to do would be to categorize it. I could present you with lists of metrics; we've got three or four hundred data points written in an Excel spreadsheet. But really, what's more interesting is the categories, because, as we know, we're going to drive toward the categories. So, first and foremost, we have the policy space. We can collect all sorts of metrics on policy: how many policy clauses are there? What is the change management associated with the policy clauses? Are the changes big or small? Where are they coming from? There are tons of data to be derived there, but the key ones are really how many reviews are happening. I.e., we want to develop something.

We want to roll out product X: has the privacy person in that group looked at it, reviewed it, and allowed any exceptions? And if so, what kind of exceptions? What's the risk associated with them, and what do they look like? So those are really two important data points. Then there's coverage, and "capability and capacity," which is Microsoft's nice way of saying:

Do we have people, and do they know what they're doing? In other words, are we training them? Are we training them on privacy? Are we training them on concepts related to privacy, like data access requests and inquiries, and how to respond to just your average question from your average member of the public about privacy?

Do we have enough coverage, and what does enough coverage mean? Lots of companies need to have chief privacy officers; that's great. In Canada it's actually a legal requirement that you have a CPO, so we have a CPO. Is that coverage for a company that has 153,000 employees at last count? No, one person isn't going to cut it. So we need to figure out what that means: how many bodies do we need, what percentage of their time do they need to spend on privacy, and what does that look like?

So there are lots of metrics there. Satisfaction is another important one: how satisfied is the privacy community with what's happening in the company? That's kind of a new metric, and one we're just playing around with now. So: are you satisfied with the way we're actually running things? It's an interesting metric, because privacy people, well, in my experience they tend to be quite unique, especially practitioners. They didn't go into it for the money and fame (although there's lots of that, I guess); they tend to be advocates, and so the data they self-report is a really neat little space to get into. Communications, obviously: how often do we talk about privacy? How do we talk about privacy? Who do we talk about privacy to? Then we've got incident data. Of all the incidents, what severity are they?

Where did they happen? How long did it take to resolve? What products and services did they impact? Similarly for audit, the same kind of data is available: what are we auditing, what do those findings look like, who's coming in, who's internal, what are the ratings on all of those? Maturity is a little bit more subjective. Is anybody familiar with the GAPP principles and the privacy maturity model? Yeah, Jason, okay. So they basically published this model that anybody can use, and I took a stab at applying it at Microsoft, on release four of it, and it's kind of interesting. It's basically a way of looking at how your privacy program is able to respond to the environment: not whether or not it is responding, but just the actual maturity of it.
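The incident metrics above (severity, location, time to resolve, products impacted) lend themselves to a simple rollup. This is a hypothetical sketch: the field names and the records are invented for illustration, not real incident data.

```python
# Hypothetical sketch of an incident-metrics rollup: severity, which
# product was impacted, and days to resolve. All records are invented.
from statistics import mean

incidents = [
    {"severity": "high", "product": "mail",   "days_to_resolve": 12},
    {"severity": "low",  "product": "mail",   "days_to_resolve": 2},
    {"severity": "high", "product": "search", "days_to_resolve": 20},
]

def mean_resolution_by_severity(records):
    """Average days-to-resolve, broken out by severity."""
    severities = {r["severity"] for r in records}
    return {
        sev: mean(r["days_to_resolve"] for r in records if r["severity"] == sev)
        for sev in severities
    }

stats = mean_resolution_by_severity(incidents)
assert stats == {"high": 16, "low": 2}
```

The same grouping pattern works for the audit data the talk mentions: swap the key for auditor or finding rating.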

So: do you have the required capabilities in place to respond to a privacy incident and, if so, how sophisticated are they at the operational level? That'll be more interesting, but we're not there yet. Okay, so click through, because there's a bunch of little ones. Oh, there we go, perfect. So this is some of the stuff we've managed to put together. This is all dummy data, sorry about that, but approvals for releasing real data would have been a nightmare. So essentially we look at maturity in this space. Microsoft's also a really big fan of red, yellow, green (you gotta love the predictability of engineers), so red tends to be down at the not-so-mature end. Now, that may or may not be associated with a higher risk. Just because you have a less mature process doesn't necessarily mean it's not good. In fact, at companies like Microsoft, where there's a lot of lawyers, sometimes that's fine, because we can just bring them in when we need them and then send them away when we don't, so you don't actually have to have a really robust, mature process in that case. Green tends to be at the other end of the maturity scale, where you represent a best practice, or it's something we're sharing with the industry, or other companies have adopted it, so there's a nice range in there. We have in-house dashboards, tons of them; Microsoft loves to build dashboards, so those will be more representative of specific data points. Then, of course, you have just super fancy Excel diagrams.
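A red/yellow/green rollup over maturity levels might look like this minimal sketch. The five level names follow the AICPA/CICA Privacy Maturity Model (ad hoc through optimized); the colour cut-offs are an assumption for illustration, not Microsoft's actual scheme.

```python
# Minimal sketch of a red/yellow/green rollup over maturity levels.
# Level names follow the AICPA/CICA Privacy Maturity Model; the colour
# cut-offs are invented for illustration.

LEVELS = ["ad hoc", "repeatable", "defined", "managed", "optimized"]

def traffic_light(level: str) -> str:
    idx = LEVELS.index(level)  # raises ValueError for unknown levels
    if idx <= 1:
        return "red"     # less mature; not necessarily higher risk
    if idx == 2:
        return "yellow"
    return "green"       # the best-practice end of the scale

assert traffic_light("ad hoc") == "red"
assert traffic_light("optimized") == "green"
```

As the talk notes, "red" flags low process maturity, not necessarily high risk; the colour is a prompt for a conversation, not a verdict.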

Spider graphs are a really good way of representing an entire picture of the company. Anything close to the middle is less mature, anything toward the outside is more, so you can see a nice little cluster in the middle, which is actually fairly accurate. Lots of pivot tables and reports, which took a long time to develop. And then something like this is, I think, what we're driving for eventually: your entire privacy program in one space, summarizing incidents, people, training, coverage, any other metrics you might want to put into one kind of snapshot. And the reason I like this is because, most of the time, if you think back to the hub-and-spoke model, you've got a privacy person working for an engineering leader in a product group. That leader doesn't know anything about privacy. What they care about is: how does this impact my bottom line, and what do I need to fix? So we need to present them something that says: here is everything that you've got now, here's why this might be more important, or here's what we want you to do here, and all of the reports really have to drive to that. And again, to circle back to the trucking industry, it's remarkable how much the measurement drives action, but then also how eventually you get to the point where the action is driven by the measurement. It's like we say: if we're metricizing categories A, B and C, then all of a sudden we're saying A, B and C are the priority. In the organizational space, maybe that's okay, maybe that works out okay. But from the other side of the fence, when you look at programs like No Child Left Behind, or the Canadian version, EQAO, it's a little bit more worrisome, because, as Karen mentioned, you're taking the person out of the middle. You're looking at an organization surveilling its activities, or having your activities surveilled by an organization, but there's no interpreter in the middle, there's nobody qualifying it in the middle.
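That one-page snapshot can be sketched, hypothetically, as a handful of metric categories normalized to a common 0-1 scale (centre of the spider graph = less mature, edge = more), from which the "here's what to fix" list falls out. Category names and scores below are invented for illustration.

```python
# Hypothetical sketch of the one-page program snapshot: metric
# categories normalized to 0-1 so they plot on one spider graph.
# All names and scores are invented for illustration.

snapshot = {
    "incidents": 0.4,  # e.g. fraction resolved within target time
    "training":  0.7,  # fraction of staff trained on privacy
    "coverage":  0.2,  # privacy FTEs relative to a target headcount
    "maturity":  0.5,  # mean maturity level / maximum level
}

def weakest_areas(scores: dict, threshold: float = 0.5) -> list:
    """The 'here's what we want you to fix' list for an engineering leader."""
    return sorted(k for k, v in scores.items() if v < threshold)

assert weakest_areas(snapshot) == ["coverage", "incidents"]
```

The threshold is exactly the kind of choice the talk warns about: whatever you metricize becomes the priority.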
So that's what we're trying to accomplish with these reports. So that's it. Do you have any questions? Oh, wait, wait, hold on, we have a couple of questions of our own, just to kind of lead the discussion, though we don't have to go this way, which are sort of the overarching themes from what we've been talking about. Basically, you know, I think it's obvious that measurement is a tool, right? It's not an unequivocally good or bad tool. It depends on what the tool is used for, right?

And so I think, with that in mind, it's worth asking how measurement can be empowering, both on an organizational level and then maybe on a personal level as well, and whether there are things that we ought to just not measure or quantify, for some ethical reason or for some political reason. And then I also just wanted to frame the discussion a bit with this quote that my friend Solon Barocas introduced me to recently, which I think is just a fantastic statement of what this problem is, and so I'm just going to read it to you, which I know is bad presentation style, but I'm just going to do it: "Perhaps the biggest threat that a data-driven world presents is an ethical one. Our social safety net is woven on uncertainty.

We have welfare, insurance and other institutions precisely because we can't tell what's going to happen. So we amortize that risk across shared resources. The better we are at predicting the future, the less we'll be willing to share our fates with others." So, with that in mind, that's all Tracy and I have to say, and I'm curious what everyone else thinks. Questions? Truckers aren't far from cab drivers, and 30 years ago in New York they implemented a system where the meter starts when the passenger sits down. That was because drivers were working off the meter in the middle of the night to make more money, and all the old cab drivers said, you know, that'll be okay, and the young guys said, oh, we'll beat it within a week. At every truck stop you could buy a magnet that held the contacts. So, yeah, what's the magnet for this system? Is there a way, as was raised in another session, to beat it by feeding it disinformation? Can they do that? How hard would that be? Yes, I have a 50-page paper you can read about this.

Yeah, I mean, basically I've catalogued a huge number of things, more or less 20 different things. Some of them are technical things, like trying to break the fuses or block the signal: you know, you go to Walmart and you get a big metal bowl and you put it over the transmitter. Well, I mean, sometimes, to some extent, the companies actually turn the other way, because they know this is the way business gets done. So there's also back-office data-editing stuff. There's ways to not trigger the threshold of the monitor, so if you drive 14 miles an hour, it won't know that you're driving. So there's all kinds of things. It sounds like truckers are gonna start becoming very good friends with hackers. Yeah, so, yeah, I mean, it is hacking, essentially. I think there's a big conception among most of us that truckers are not a particularly technically savvy population, but of course they are, right? Trucks are really complicated technology. So, yeah, I think that's totally foreseeable. Yeah, so I have a question about whether you measure the side effects of this environment on the truck driver's life. Like, if you find yourself so controlled, when you used to have so much freedom, all that stress, does it get out in another way?

Having my wife get in the chat about my good behavior might stress me a lot; that might damage my marriage, those kinds of things. Because you can improve one side, but then you're not measuring something else, and that could get really, really bad. So how does it work? Are you measuring these kinds of side effects on society, or do you stop at the company? And the corollary of that is: well, different kinds of people become truckers. Yeah, right. Yeah, I mean, it's a great point, and you're right, it's not something that we can really measure in an objective, quantitative fashion; that would be a lot of loops, like the rate of divorce going down. I haven't actually looked at that. I know that there are companies that track that sort of thing in order to predict whether a trucker's going to be in a crash, so it gets worked into their models too; it's an infinite loop. But I will say that it certainly has effects. A lot of guys, especially the old guys that have been doing this for 50 years and own their own trucks, this is just what they do, and there's a lot of exit, right, or a lot of threatened exit: the day they make me do this is the last day I drive a truck, or if my company starts doing this, I'm quitting, which they do. And then younger people, who I think are more accustomed to being electronically surveilled, maybe don't like it but are more used to it. Those are the guys you're going to have entering the industry, and those are also the guys who are the least safe, right, because they have the least expertise and the least experience. And that's also something that will be hard to measure: whether there's an actual safety impact. Yeah, you could design an experiment.
I haven't done that, but you could. Through your studies of the trucking industry and surveillance, have you come across any topics or concerns about discovery when there's an accident? For example, sure. And I say this from personal experience: I was driving in from Fairfax County today and I passed a crash site that I experienced personally two years ago, where my family and I were rear-ended by a truck, so I had to pass the accident site on the way here. And so that information, or this type of information, from the perspective of, we kind of think of surveillance as a bad thing, but I mean, there's some really unsafe drivers out there. Absolutely, yeah. Trucking, well, and drivers in general. Drivers in general, right. And all of that, I think, comes down to legal use.

I mean, for the tracking, those are business records, right, so they're usually admissible as evidence, which cuts both ways, right? So the companies often get insurance breaks if they install them; I mean, right now it's still voluntary. So some companies install them because they get insurance breaks, but some of them don't want to install them because of those legal liability questions too. Yeah. Where do we find the paper? Because I read about it about a year ago and wanted to read it then. So where is it, about truck drivers? Yeah, we can talk later, or you can go to my website; I'll talk to you later and I can give it to you. I actually really love that quote, and I think it basically represents something that I find really interesting about the big data discussion, which I only last week talked about at the FTC, and that is the unintended consequences and the invisible long-term harms of this data collection that we really do not yet know very much about. And I think one of the things that I really find interesting here is, I mean, with the truckers it's a very obvious thing. You know, I'm an academic. I have good days and bad days. If I have to write an article and I give myself three days, and I have one day where I've been really crappy, then I know that I'm capable of doing it in the remaining two days. If I had something monitoring me all the time, that means you have to do eight hours on day one, eight hours on day two and eight hours on day three; I'd lose that flexibility. And I would imagine that, with regard to something like trucking, there is exactly the same kind of thing. You know, you have the ability to have a longer coffee break and make up for it, that kind of thing. And I know this sounds really silly in a way, but I think this flexibility is part of being human, and I think we are losing this kind of being human.
We are losing this kind of self-determination about our day-to-day lives with a lot of these tracking technologies. So when we started to install these things, and okay, that was actually the first thing I thought, it was like, all of a sudden, your employer knows everything about you. They know where you are, at what time, how long your breaks are, and all that stuff. And what this really does is increase the power that the employer has over the employee in this context, and this power imbalance that we are looking at was always there; this is where the information, you know, or who has access to information, comes in really handy. The other thing is about that quote, and that is the effect on social solidarity. I mean, I've looked at this (I'm German, living in the UK) and I've looked a lot at the increasing individualization of risk that we're watching in the UK. One of the areas where this has happened recently was a European Court of Justice decision which was basically about gender discrimination with regard to insurance, where the UK was very upset, because what it meant was that UK women could no longer get cheaper car insurance. Okay, and I come from Germany, where medical insurance is partly privatized, and the one thing that that means is that if you're a woman, you pay higher insurance premiums because you can get pregnant. And never mind that you don't do that on your own, and never mind that children are, you know, kind of a societal joy and obligation; it's the women who pay for it in terms of insurance. And I think in some cases, when we're looking at this, the individualization of risk that big data allows completely destroys the kind of social solidarity we have accumulated over the last few years, and we have actually not yet figured out what this is going to do to our society.
You might like to take a look at Stephen Marsh's work on computational trust, because he started, about 10 years ago, to look at how we replicate social solidarity, in psychology and in social psychology, in computers, using agents to represent the same relationships, so it might be a little optimistic. Sorry, the name again? Stephen Marsh? Marsh, yeah. Regarding this: you make decisions based on the data. So if we measure, we shouldn't measure only how surveillance improves the company's prospects; we should measure the societal effects too, in order to make a good decision, because you make decisions based on the data. So at least get all the data: not only the economic impact, but also the social impact. So I would agree with you, but I think there's a tactical problem with that, which is that it means the project never ends and the scope never ends, and you can't actually get to the point where you're crunching and analyzing numbers. So one of the issues we had at Microsoft, and similarly in my industrial research, is that you have to eventually close the box and be able to say: this is the data set we're going to work with. We will analyze that, and then we will go back and look at whether there is some sort of bias in that data, whether there is a problem with that data, or whether, in fact, it just doesn't give us what we want. So I think that's a good point for the long term, but in the short term the projects need to be scoped much more narrowly. In the research phase it should be broader. As a company, as Microsoft, you are working for profit and you have some set goals, so it makes sense. But for research like the truck drivers, I think it should be a little bit broader. I'm not sure if this research comes from our money or if it's sponsored by a company, but it comes, to my mind, from my tax money. Why wouldn't they spend more money? Yeah, I mean, I understand.
I think it's a great point, and it's not one that I've heard before, which, I mean, I like a lot. But I think you do run into a catch-22, right? If our concern is over-monitoring and over-tracking and over-surveillance, and the solution to figuring out whether we're doing that is to surveil and track more, I mean, it feels a little bit circular. Yeah, like medicine: you need to test more and you'll have more casualties, but in the long term you'll know that this is the advised medicine that you need to take, and you can't say this is the advised medicine without doing real experiments. So there's somebody behind you as well. I'm just curious about this. It seems to me that part of what will happen here is that people will do as you say: they will get out from under the surveillance in a variety of informal ways, because our systems, these rules, were always built on the assumption that they couldn't be enforced. So now that they can actually be enforced, they don't work real well. People have to fight. Are you able to quantify in any way how much sidestepping there is? Is it 20% of truck drivers doing it routinely? Is it only at key points? Do they pick their spots? Are you talking about the existing log sheets? Yeah, well, no, I'm thinking about the new ones. You're talking about the electronics. I'm talking about, it sounds like there are ways to beat the system; do you have any sense of how often they do it? Yeah, that's a good question. It's a good question. The answer is: I don't have that metric. I wish I did. It was difficult methodologically, because often if I asked truckers, oh, what do you do to cheat, right, like, how do you break the law, they would tell me, well, I don't do anything. And so I actually stopped asking that question and I said, well, what do your friends do? Like, what have you heard?
Yeah, and then I got lots of stuff, but of course, you know, it's hard to know. Basically, I've talked to some companies and gotten a sense of how widespread certain practices are within that company, but I don't have a more global answer. Yeah, there's a fascinating study of the police academy where they give a scenario: you come upon an accident, a girl is dying, she's screaming in pain, she dies. The parents arrive and they say, did she suffer? Would you lie? Would you tell the truth? And 90% of police academy recruits say: no, I'm sworn, I have to tell the truth; I would say she was screaming in pain. With officers ten years in, it flips around: only ten percent would tell the truth. But it's not because their behavior changes, it's because of the attrition. The ones who couldn't bring themselves to deal with this leave the police force. So what they find is that only the ones who can adapt stay. Okay, so that might say something about who's going to stay a truck driver. That's great, that's, yeah. I was going to ask you to talk more about Match.com and the like, and all that data, because in the case of the truck drivers, they are not willing to give up the data, but in the case of dating websites and sex websites and things like that, people are very willingly giving up very personal data. Have there been any lawsuits, any legal problems? Have there been any accusations of manipulation, I mean, not user manipulation, but of people misusing that data? I mean, it seems like, right, I didn't even think about it until you mentioned it, but there's so much data held that it could really be used against somebody or, for that matter, exploited for some reason, very vicious marketing. So have there been any studies, lawsuits? You know, I don't know of any lawsuits. I know with Lulu in particular, I don't know if people know Lulu; it just made it big a few months ago.
Yeah, I think it's just a little trendy thing. But what's interesting is, in that context at least, people give up a lot of personal information about other people, right, not about themselves, and that is a design choice, right? It was up to the engineers at Lulu to say, you know, it's not an opt-in for the subject of the evaluation. So I think, maybe, to answer your question there: it was very hard for men on Lulu to make their scores go away, and I think, in response to a lot of outcry about the types of issues that you're talking about, they did recently make it much easier, so that now, I don't know if it's entirely an opt-in model, but it's much easier to have your score either divorced from your Facebook profile, which is what it was before, or just to make it disappear entirely. So I don't know if there have been legal challenges, but certainly there's been social pushback. Not to do with sex, but what about rate-my-teacher sites? Oh, yeah, yeah. Have there been problems around that? I don't know, that's my question. Yeah, and Rate My Doctor in particular; there have been lots of things. Yeah, same thing. What would be the basis of the lawsuit? I'll tell you later. Okay, there's a question in the back. Yeah, some very good recommendations for metrics there. With regard to your comment that Microsoft wants to earn the trust of the customer, one of the things that I thought was conspicuous about your metrics was: how do you connect them to gaining trust? And I think you may have been alluding to it in your comment about unlocking sales, but at the end of the day, it's all about creating wealth, making new revenue. How do you connect all of those functional metrics to the business of selling more software and services because customers trust the privacy structure?
So that's a terrific question, and thank you for asking it. It was the same question that I asked on my first day, because I assumed they'd done all this stuff already, but they hadn't. So until we got to the point where we knew, for example, how many privacy programs we had at the company, or what jurisdictions they were operating in, or what legislation they were dealing with, or what devices, products and services they were dealing with, we thought it would be unwise to start talking about how we quantify trust with the customer, since we can't quantify the business of privacy at the company yet. So we're starting there, and I'm hoping a revised timeline, a very well revised timeline, means that we'll be able to start that work in the latter half of FY15, which for us starts on July 1st. So I'm hoping next year I'll be back to talk about that piece, leveraging a lot of Steve Marsh's work on computational trust algorithms. But that's the plan right now. Yeah, so just to speak to that, that's kind of interesting. So basically you're saying that Microsoft is coming at it from what they think privacy looks like, rather than starting from the position of saying to the customer: what does privacy, or what does trust, look like to you? You tell us that, and then we will design our systems to measure whether we're delivering it to you or not. I think that's a better way to go about it. There are actually two things happening. So in my space, I'm coming at it from what the legislation is and what the policy requirements are in a given place, so I'm very much the top-down person. But there are five or six other people who run trust centers for products and services at Microsoft who take that approach, who talk about what the customers' needs are, what they want, what their expectations are, and how we surface that.
So that work is already up and running, and if you want links to it, I believe most of it is public, so you can shoot me a mail and I'll send it to you. But the two things happen at the same time. As is often the case with a company this large, there are multiple initiatives. Mine is specifically in the policy space, in the governance space, at the hub of the governance model, so my approach is dominated by the top-down policy structure. Yeah, my interest is really driven by, and this relates to the truckers' case study as well, this notion that measurement is objective. Yes, of course it's not. I totally, you know, if you're designing a measurement system or some form of metrics to glean some information, what you're going to get is entirely skewed by what you think you want to know. So you really can only do it effectively if the question you're asking is not self-referencing; it has to be external to you, otherwise it's a self-referencing exercise. I totally agree, and I think part of the reason that we structured things this way, at least from the corporate perspective, was: at the end of the day, as long as they let me run the program anyway, what I'm interested in is the delta. So if the customer's expectations are X, and we're doing A because that's what the legislation requires us to do, then how do we bridge the gap? And my assumption, erroneously, when I came to the company was that we already knew what A was, but we don't. So there are other people in the company who are looking very carefully at X, in terms of how we manifest the customers' expectations for sales, right, because it's all, as the gentleman in the back said, profit-driven. So they care very deeply, and they work on that all the time. But I can't do the delta until we've measured what we've got in place already, and so that's where we're starting. But I couldn't agree with you more. I think measurement is absolutely subjective.
So I think at this point we stand between you and lunch. Thank you very much; I think this was a lot of fun for us.