A federal judge today ruled that some of the NSA’s broad, warrantless collection of data from American citizens – particularly of so-called ‘metadata,’ which includes routing information for phone calls (what phone numbers have been in contact with each other, and so on – this can be very damaging!) – did not violate the Constitution.  This ruling contradicts an earlier federal court ruling that the data collection was unconstitutional, and the issue seems likely headed to the Supreme Court.

If we set aside the details of this case for the moment, it seems to me that an important set of issues around big data is emerging.  That is, to be succinct, the concept of ‘privacy’ is absolutely no good at all in slowing big data down.  I don’t, however, think that the problem is either the frequently announced ‘end of privacy’ or the so-called ‘privacy paradox’ (that people say they value privacy but then act as if they don’t).  Rather, I think the problem is more basic.

Today’s opinion correctly reports a fact about privacy law: if I voluntarily disclose information to a third party – any third party – I lose any Fourth Amendment claim to privacy over that information, no matter how many times it changes hands afterwards.  That’s a problem, one well captured by Helen Nissenbaum’s work on privacy as ‘contextual integrity’ (or see the original paper here), which argues that moving information out of one context and into another can very well change the appropriate norms for sharing it.  But the pairing of ‘voluntary’ and ‘information’ also suggests that I have some cognizance of the semantic content of what I am sharing.  I may not know why information is valuable to someone else, but I at least know what that information is.

Big data challenges that.  We have no idea of the meaning of this material we are providing as we go about our daily lives, or even that it is meaningful: we are providing data, not information.  The NSA case is exemplary: it’s true that we know what our phone number is, and presumably the number we are calling, but the significance of those two numbers emerges only when they’re part of a much larger analysis.  In that sense, the status of my phone number and the connections it makes as “information” is (for lack of a better term) an emergent property of the data analysis, in the sense that it has no significance and no relevant meaning until after it’s analyzed with other data.  Consider the case where I frequently dial Mr. X, and Mr. X is closely tied to a terrorist group.  That would get me on a watch list, and probably even the attention of some human agents – but only if the NSA’s computer also had knowledge of X’s ties to the terrorist group.  If the computer lacked that knowledge, my connection to X wouldn’t be represented to anyone, and the system would proceed without ever flagging me. The information that I have suspicious ties to a terrorist group simply would not exist.
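The point can be made concrete with a toy sketch (entirely hypothetical – the phone numbers and the flagging rule are mine, not any real system’s): the very same call record yields information about me under one analysis and nothing at all under another, depending entirely on what other data the analysis holds.

```python
# Toy illustration: a call record is inert 'data' or significant
# 'information' depending on what external data the analysis can join it to.

call_log = [("555-0100", "555-0199")]  # my (hypothetical) number frequently dials Mr. X

def flag_suspicious(calls, known_ties):
    """Flag a caller only if the callee appears in external watch data."""
    return {caller for caller, callee in calls if callee in known_ties}

# Run WITHOUT knowledge of X's ties: nothing emerges from my records.
print(flag_suspicious(call_log, known_ties=set()))         # set()

# Run WITH that knowledge: the identical record now constitutes
# information about me.
print(flag_suspicious(call_log, known_ties={"555-0199"}))  # {'555-0100'}
```

Nothing about the record itself changed between the two runs; only the surrounding data did.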

One could certainly object to this example on the grounds that the NSA’s lack of other information doesn’t demote my phone records to mere data.  After all, I still voluntarily release my metadata.  But there are many examples where it’s not clear that I voluntarily release information, since a good deal of what big data does is mining: it generates new, emergent information from large data streams.  If nothing else, there is a lot of material that is going to be 'data' when viewed at one moment or from one point of view, and 'information' when viewed from another.

In short, a meaningful notion of information needs to include some sort of semantic content, and probably some de minimis level of significance.  Most of what makes big data so powerful is its ability to generate new information from vast data flows.  Most of what we give to big data doesn’t meet the bar for information, at least not at the time we give it.  It only becomes information in conjunction with other data and after a complex analysis.  So privacy is late to the game. Every time.


6 responses to “Big Data: Why privacy is no help at all”

  1. Roberta L. Millstein

    Just to be clear, you mean the legal meaning of privacy, right? Perhaps we need a broader notion of privacy than the legal one.


  2. Gordon Hull

    I actually intended the argument more generally, although I didn’t make the case here. Let me announce a caveat, and then make the general argument. The caveat is that nobody quite knows what the concept of privacy means, and it’s pretty routine for articles to start with a lament to that effect. There’s even an established line of scholarship that says, more or less, that the concept is too hopelessly muddled to be of any use at all, etc. So with that in mind, most notions of privacy say either (a) that there is some class of information that is sensitive, in need of protection, intimate, and so on. This is what’s going on in the Court’s right to privacy cases in Griswold and its progeny. The problem here is that most of what big data hoovers up isn’t intimate in that way. Or (b) that privacy is about controlling access to information about ourselves. One of the things I think Nissenbaum gets right is that information can’t really be evaluated outside of its context, and so an information disclosure carries its context with it, as it were. So it’s appropriate for me to tell my doctor all about my health, but not appropriate for my doctor to then share that with golfing buddies.
    My thought is that both of these approaches assume that “information” is what we disclose, but that “data” comes much closer to getting at what big data collects. Sometimes, or even often, information emerges only in the context of data analyses, and because the information is emergent, it’s impossible to know that there is information in our data, much less what that information is. So there’s no way to attach a notion of voluntary/involuntary disclosure, since the concept of volition is going to have to say something about at least possible intention. Of course it would be possible to include in the concept of privacy everything that big data collects, but at that point I don’t see that privacy could do any useful conceptual work, because either everything about one would be private, or you’d have to develop a criterion for distinguishing what should be private and what shouldn’t. And that’s the original problem…


  3. John Smith

    It seems complicated. If I write on this blog, then it is public, not private.
    If I send you an e-mail, it is private.
    But I really do not understand these things very well.


  4. Gordon

    Morally, yes. Legally, correct about the blog. Not so much about the email. The Electronic Communications Privacy Act theoretically protects email from being intercepted in transit and imposes a warrant requirement on authorities who want to access it. But there are endless loopholes, even if the NSA isn’t just ignoring the law (and there’s decent evidence they are). Email more than 180 days old isn’t protected by a warrant requirement (it can simply be subpoenaed, per the Stored Communications Act). ISPs often cooperate with law enforcement in turning over emails on request, and typically there’s some sort of ‘terms of service’ or ‘license agreement’ to the effect that you authorize them to turn over your emails whenever government makes such a request. If you work for the government (at any level), your email from your work account is almost certainly part of the public record and can be FOIA’d. If you work in the private sector, your boss can pretty much do whatever she wants with email that goes through the company’s server. Oh, and the recipient of an email can share it freely.
    In any case, all of that can be conceptualized and discussed in terms of traditional theories of privacy. The questions the NSA cases pose aren’t about the email or the content of the phone calls – they’re about the so-called ‘metadata’ that come with them – the records of what number your phone number called, and when (etc.). It’s unclear (to me at least) whether the concept of privacy is going to be able to handle those sorts of cases.


  5. Robin James (@doctaj)

    From my perspective, Gordon, you’re hitting on a lot of important issues here. The data/information one is key (it also reminds me of Nate Silver’s “The Signal and the Noise,” how he frames the problem of big data as the problem of what you might call filtering – how do we know when something is data, or when it is information?).
    I also wonder how your discussion of information as an “emergent property of the data analysis, in the sense that it has no significance and no relevant meaning until after it’s analyzed with other data” relates to Foucault’s idea that biopolitics governs populations? I may be wrong, but it seems like the two are operating at the same “level,” so to speak. A datum cannot be information; only data can be – and only, it seems, vast amounts of it. So information emerges from analyzing a “population” of data. Information is what emerges from relationships among data sets/points. I’m not sure how this relates back to privacy, though. Is privacy an individual-level concept?
    There are also, I think, some interesting conceptual parallels between privacy as a concept and, well, what I guess you could call the operative hermeneutic the NSA is working with. Privacy, at least as I understand it as a philosophical concept (which, admittedly, comes mainly from feminism) operates with some foundational binaries, e.g., inside/outside, private/public, etc. That parallels an interpretive regime or hermeneutic that is something like what Ranciere calls the aesthetic regime of art–it’s also grounded in these foundational binaries (inside/outside, surface/depth, appearance/reality). But the NSA-metadata surveillance isn’t interpreting data for its true inner content; it’s algorithmically processing data to generate emergent properties (right, it’s not ‘finding’ properties; the algorithms manipulate the data to take specific/preferred shapes). I sorta talk about this in the post I mentioned to you earlier (which is, for everyone else’s benefit, here: http://its-her-factory.blogspot.com/2013/06/on-prism-or-listening-neoliberally.html)
    Finally, it’s really interesting that you say privacy is “late.” Can you say more about it being “late”–it seems to me like that term is doing/could do some conceptual work that’s not explicitly stated in the post.


  6. Gordon Hull

    Those are all interesting points… I certainly think you’re right to make the connection to biopolitics (which is where I go in the follow-up). My reading of Foucault on that point emphasizes what one might call the techniques of representation involved (e.g., statistics), which then simultaneously constitute and manage their object populations. I suspect that the data/information heuristic is just that – a heuristic, and that we’re basically running around with the idea that things we see with our naked eye are somehow per se represented, in contrast with things made visible only by machines. Of course, the naive realism is just that, and so where that thought is going is to use the information/data distinction partly to undermine the idea that there is anything that is information by nature.
    The parallel hunch is that privacy is an individual-level concept. I don’t know if that proves we need a better idea of privacy (as Roberta suggests; there are a couple of people who talk about privacy as a social value: if we all knew everything about everyone, society would grind to an ugly halt!), or if we should be looking for a different concept. Whatever the concept is, it will need to be able to navigate the population/individual division in terms of how we are represented. Privacy might create zones of un-representability, because one of the main issues, however one wants to name it, is figuring out how to carve out a space where we are not required to answer to a representation that involves information that is applied to us, but which we did not provide (I’m thinking mainly about insurance-style risk thinking).
    As far back as 1978, Richard Posner basically provided a plausible neoliberal reading of privacy, and the core of his argument was that privacy usually functions to hide relevant information in an exchange, and in that sense the state shouldn’t be protecting it very often. On the other hand, he also thought that the state shouldn’t do much to try to force disclosure of information, since that would probably be inefficient – the party for whom information is valuable should bear the cost of getting it. So privacy is basically the insertion of noise into the system (and for that reason is probably of vital importance for any theory of resistance). On the other hand, I don’t see how Posner’s concept survives in an age of big data, since one effect of big data is to make information a lot cheaper, and to make the sorting of signal and noise sufficiently robust that it’s very hard to make enough noise to slow it down. (Nothing I’m saying here, btw, is (I think) in any way incompatible with the way you talk about listening).
    That’s also probably where to start thinking about the temporality of the concept. I’d probably try to start with people like Melinda Cooper, who talk about the present as borrowing from the future as the signal characteristic of contemporary capital. I’m not entirely sure how that analysis would turn out, though.

