In their series that could be titled "Academic sexism is a myth", Wendy Williams and Stephen Ceci have a new installment: on the basis of fictional scenarios, faculty members in STEM disciplines had to make decisions about hiring particular male or female candidates. I'm not going to talk in detail about the methodology – which involved presenting faculty members with fictitious scenarios about the on-campus interviews of female and male candidates – but about the problem of inductive risk whenever we investigate biases against women and other underrepresented groups, such as African Americans and people with disabilities.

Inductive risk is the chance that one is wrong in accepting or rejecting a scientific hypothesis. For instance, a food additive that poses a serious health risk is wrongly concluded to be safe, or conversely, a food additive that poses no health risk is wrongly concluded to be carcinogenic. Both false negatives and false positives can pose inductive risks. Heather Douglas has argued that inductive risk is one way to let values play a role in science. Because scientists are in an epistemic position to assess the risks and benefits of their work, they should assess the non-epistemic consequences (in policy, public perception, health hazards, etc.) of publishing particular research findings. How does this concept apply to the research by Williams and Ceci?
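To make those two error types concrete, here is a minimal sketch in Python (my own illustration, not anything from Douglas or from Williams and Ceci) of the four possible outcomes when accepting or rejecting a hypothesis such as "this food additive is safe":

```python
# Illustrative sketch: the four outcomes of accepting or rejecting a
# hypothesis such as "this additive is safe". Labels are my own.
def verdict(actually_safe: bool, judged_safe: bool) -> str:
    outcomes = {
        (True, True): "correct acceptance",
        (False, False): "correct rejection",
        (False, True): "false positive: a harmful additive is deemed safe",
        (True, False): "false negative: a safe additive is deemed carcinogenic",
    }
    return outcomes[(actually_safe, judged_safe)]

print(verdict(actually_safe=False, judged_safe=True))
print(verdict(actually_safe=True, judged_safe=False))
```

Both of the "wrong" cells carry non-epistemic costs, and it is in weighing those costs that, on Douglas's view, values legitimately enter.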

When we investigate sexist biases against women in academia, there are two types of inductive risk: (1) there are in fact no biases against women (indeed, women are now being preferred as candidates for some positions in some fields), but we wrongly conclude that there are; (2) there are in fact biases against women, but Williams and Ceci failed to detect them.

Looking at their most recent paper, published in PNAS, it is clear that Williams and Ceci talk mostly about the inductive risk posed by (1). In their conclusion, they state: "We hope the discovery of an overall 2:1 preference for hiring women over otherwise identical men will help counter self-handicapping and opting-out by talented women at the point of entry to the STEM professoriate, and suggest that female underrepresentation can be addressed in part by increasing the number of women applying for tenure-track positions". I find the phrase "self-handicapping" quite revealing; it almost sounds like blaming the victim. On the Daily Nous, it is phrased in a friendlier way: "It may indeed be that if there is a concern with the underrepresentation of women in philosophy, one strategy that might help would be to broadcast findings like these, so as to provide encouragement to would-be woman philosophers." If underrepresentation itself, or the perception of a hostile environment, indeed deters women, it's important that we get all the facts right, so that this does not itself become a deterrent.

But what about (2)? I think there are risks associated with (2) too. If there is indeed still sexism toward women in academia (as most other studies suggest), then Williams and Ceci are lulling us into a false sense of security, and might even give people reason to stop serious attempts to redress the gender imbalance (something like "Well, women are already preferred 2:1 in hiring decisions, so why should we bother doing anything more to help them get into academia? If they don't make it, that's their personal decision, after all").

A problem with Ceci and Williams' research (not just this paper, but their project as a whole) is their overly narrow focus on what counts as a personal choice by women. To give an example: someone close to me, whom I've known since childhood, has a PhD in physics. She is a woman, and also a member of an ethnic minority. She did her PhD in four years and left academia immediately upon graduating, in spite of having been offered a prestigious postdoc. She's one of those people who left early, in a STEM field – so, on Williams and Ceci's view, one of those women in STEM who choose not to apply for permanent and other academic positions. One important reason she left is that many of her lab co-workers were saying, openly enough that word got back to her, that she had only gotten her position because she was a woman and her lab director wanted to improve the gender balance. When she was about to defend her PhD, many of her colleagues said she didn't have enough to show for it (although her track record was similar to that of the majority of her male colleagues), but that she would get the degree anyway, because, you know, affirmative action.

Now, this person is someone who works hard, is gifted (she was a mathlete, a straight-A student, etc.), and did all the right things. But the continued doubting of her abilities because she was a woman, and the continued suspicion of affirmative action (which is illegal in that country, by the way), made her decide to leave academia and decline the postdoc offer. She is now happy in a non-academic environment (which is also intellectually stimulating, and much more gender-balanced), where, she told me recently, "no-one has ever questioned my credentials because I'm a woman".

Is hers a personal choice? I am not sure, but I believe that the Williams and Ceci project does little to alter this sort of personal choice. In fact, I feel that their continued narrow focus on what counts as a personal choice, and their project of blaming those choices as the sole reason for women's underrepresentation in STEM (and, by extension, in other fields), poses a serious inductive risk. Few people are comfortable with being an affirmative-action hire, certainly if the odds are said to be 2:1 in their favor.


7 responses to “Assessing inductive risk in the Williams and Ceci studies”

  1. Dan Hicks

    Great post! Part of the reason I like Douglas’ work so much is that it’s very easy to apply to a wide variety of cases.


  2. Sylvia

    Hi Helen,
    Did you see this reaction to the Williams and Ceci study by Zuleyka Zevallos (PhD in sociology)? She comments on (i) the fact that it was quite transparent to the participants that gender was the parameter of interest, (ii) the choice of psychologists as a control group, and (iii) the design of the study with vignettes rather than CVs; she also points to studies that did present CVs.
    In an earlier post, Zevallos called out the narrative of individual choice: “the idea of individual choice is a fallacy”, which is in line with your current post.


  3. Helen De Cruz

    Hi Sylvia: I did not know her response – very relevant. Even if there were no methodological concerns, and even if there is no hiring bias, Williams and Ceci’s program has inductive risk if we grant that women encounter various problems other than at the hiring stage (i.e., it’s disingenuous of Ceci and Williams to imply that women are mainly deterred by hiring biases they might encounter).


  4. Wesley Buckwalter

    Hey Helen, thanks for your post; as always, it has expanded my thinking. I agree inductive risk can sometimes be a mechanism through which values can or should play a prominent role in science. I have a question about this particular instance, though. You note that WC allowed one such value (potential encouragement) to play a role, but that they did not, and should have, considered the opposite (potential discouragement) in carrying out the research. But I am wondering whether either value should feature in this research on biases. Isn’t it the job of scientists to report what they found, whether or not some could find it discouraging or encouraging? That’s not to say there aren’t very important worries about negative social perception, which you identified. I’m just wondering whether that risk is really an ‘inductive’ risk, given that the worries could persist even if WC are correct, and/or what role it should play in scientific practice when studying biases.


  5. Helen De Cruz

    Hi Wesley: I was struck by how value-laden the WC piece is (not just this one, but also their previous ones). They spell out some normative implications: “Unfortunately, despite their success once hired, women apply for tenure-track positions in far smaller percentages than their male graduate student counterparts (14, 16, 18). Why might this be? One reason may be omnipresent discouraging messages about sexism in hiring, but does current evidence support such messages?” Their conclusion extrapolates, in an unwarranted manner, from the available evidence, e.g., the claim that they’ve found “a surprisingly welcoming atmosphere today for female job candidates” – the study only discusses tenure-track hires, not the work environment (which qualitative work still suggests might be unwelcoming for women in STEM), senior appointments, mentoring, etc. So you are right, given this unwarranted extrapolation, that the risk exists even if WC are correct about their narrow claim. Whether that’s still an inductive risk – I think so. Suppose that WC’s stronger claims about welcoming climates, and about this being a “propitious time for women launching careers in academic science”, were true; then I think the inductive risk would disappear, since it would then seem that efforts to implement more fairness for women have succeeded across the board. Setting aside any backlash their work might cause, the inductive risk disappears only insofar as they have demonstrated that the climate and opportunities in STEM are truly the same for women across the board. This, however, they have not done (I think), given their narrow conceptualization of what counts as personal choices. So, given the emphasis they place on their broader claims, there is still inductive risk, I think.
    It’s possible for authors to write about matters of bias in a less value-laden way, just reporting, as you say, what they found. It would still be the case that people would read things into it, of course.


  6. Wesley Buckwalter

    Hey Helen, I was also struck by those passages and considered them a limitation of the piece, which is why I was wondering whether appeal to that kind of value should be avoided altogether in this case. Perhaps this raises an interesting distinction about value in philosophy of science re the notion of inductive risk. On the one hand, there is an IR involving a significant cost to society when a particular experimental result is wrong. On the other hand, there’s the way perception and general discussion points associated with the actual result might have a negative impact, which could vary independently of whether that finding is wrong, and hence of IR. It sounds like perhaps a mix in this case. More generally, I was thinking that making this distinction could provide an argument for how value can rightfully influence scientific practice. I wouldn’t want to censor an actual result because of the latter, but it does seem right that scientific discussion points and associated reporting should be more sensitive to the latter risks.


  7. David Miller

    [The comment below is a response to Dr. Zuleyka Zevallos’s critique of the PNAS study on STEM faculty hiring bias by Wendy Williams and Stephen Ceci: http://othersociologist.com/2015/04/16/myth-about-women-in-science/]
    Zuleyka, thank you for your engaging and well-researched perspective. On Twitter, you mentioned that you were interested in my take on the study’s methods. So here are my thoughts.
    I’ll respond to your methodological critiques point by point, in the same order as you raised them: (a) self-selection bias is a concern, (b) raters likely suspected the study’s purpose, and (c) the study did not simulate the real world. Have I missed anything? If so, let me know. Then I’ll also discuss the rigor of the peer review process.
    As a forewarning to readers, the first half of this comment may come across as a boring methods discussion. However, the second half talks a little bit about the relevant players in this story and how the story has unfolded over time; hence, it may interest a broader readership than the first half. Nevertheless, let’s dig into the methods.
    (a) WAS SELF-SELECTION A CONCERN?
    You note how emails were sent out to 2,090 professors in the first three of five experiments, of whom 711 provided data, yielding a response rate of 34%. You also note a control experiment involving psychology professors that aimed to assess self-selection bias.
    You critique this control experiment because, “including psychology as a control is not a true reflection of gender bias in broader STEM fields.” Would that experiment have been better if it incorporated other STEM fields? Sure.
    But there are other data that also speak to this issue. Analyses reported in the Supporting Information found that respondents and nonrespondents were similar “in terms of their gender, rank, and discipline.” And that finding held true across all four sampled STEM fields, not just psychology.
    The authors note this type of analysis “has often been the only validation check researchers have utilized in experimental email surveys.” In many studies, such analyses aren’t even done. Hence, the control experiment with psychology was their attempt to improve on prior methodological approaches, and it was only one part of their strategy for assessing self-selection bias.
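    (For readers unfamiliar with this kind of validation check, here is a minimal sketch in Python of how one might compare respondents and nonrespondents on a categorical attribute such as gender. The counts are invented for illustration; this is not Williams and Ceci’s code or data.)

    ```python
    # Hypothetical respondent/nonrespondent comparison; counts are made up.
    from scipy.stats import chi2_contingency

    # Rows: respondents (711), nonrespondents (1,379); columns: women, men.
    table = [[250, 461],
             [480, 899]]

    chi2, p, dof, _ = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
    # A non-significant p-value is consistent with the two groups being
    # similar on this attribute, as the Supporting Information reports.
    ```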
    (b) DID RATERS GUESS THE STUDY’S PURPOSE?
    You noted that, for faculty raters, “it is very easy to see from their study design that the researchers were examining gender bias in hiring.” I agree this might be a potential concern.
    But they did have data addressing that issue. As noted in the Supporting Information, “when a subset of 30 respondents was asked to guess the hypothesis of the study, none suspected it was related to applicant gender.” Many of those surveyed did think the study was about hiring biases for “analytic powerhouses” or “socially-skilled colleagues.” But not about gender biases, specifically. In fact, these descriptors were added to mask the true purpose of the study. And importantly, the gendered descriptors were counter-balanced.
    The fifth experiment also addresses this concern by presenting raters with only one applicant. This methodological feature meant that raters couldn’t compare different applicants and then infer that the study was about gender bias. A female preference was still found even in this setup that more closely matched the earlier 2012 PNAS study.
    (c) HOW WELL DID THE STUDY SIMULATE THE REAL WORLD?
    You note scientists hire based on CVs, not short narratives. Do the results extend to evaluation of CVs?
    There’s some evidence they do. From Experiment 4.
    In that experiment, 35 engineering professors favored women by 3-to-1.
    Could the evidence for CV evaluation be strengthened? Absolutely. With the right resources (time, money), any empirical evidence can be strengthened. That experiment with CVs could have sampled more faculty or other fields of study. But let’s also consider that this study had 5 experiments involving 873 participants, which took three years of data collection.
    Now let’s contrast this with the resources invested in the widely reported 2012 PNAS study. That study had 1 experiment involving 127 participants, which took two months of data collection. In other words, the current PNAS study invested more resources than the earlier one by almost 7:1 for number of participants and over 18:1 for time collecting data. The current PNAS study also replicated its findings across five experiments, whereas the earlier study had no replication experiment.
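    (If you want to check those ratios yourself, the arithmetic is simple – a quick sketch using the figures quoted above, approximating three years as 36 months:)

    ```python
    # Resource comparison between the 2015 and 2012 PNAS studies.
    participants_2015, participants_2012 = 873, 127
    months_2015, months_2012 = 36, 2  # ~three years vs. two months

    print(f"participants: {participants_2015 / participants_2012:.1f}:1")  # ~6.9:1
    print(f"collection time: {months_2015 / months_2012:.0f}:1")           # 18:1
    ```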
    My point is this: the available data show that the results for narrative summaries extend to CVs. Evidence for the CV results could be strengthened, but that involves substantial time and effort. Perhaps the results don’t extend to evaluation of CVs in, say, biology. But we have no particular reason to suspect that.
    You raise a valuable point, though, that we should be cautious about generalizing from studies of hypothetical scenarios to real-world outcomes. So what do the real-world data show?
    Scientists prefer actual female tenure-track applicants too. As I’ve noted elsewhere, “the proportion of women among tenure-track applicants increased substantially as jobseekers advanced through the process from applying to receiving job offers.”
    https://theconversation.com/some-good-news-about-hiring-women-in-stem-doesnt-erase-sex-bias-issue-40212
    This real-world preference for female applicants may come as a surprise to some. You wouldn’t learn about these real-world data by reading the introduction or discussion sections of the 2012 PNAS study, for instance.
    That paper’s introduction section does acknowledge a scholarly debate about gender bias. But it doesn’t discuss the data that surround the debate. The discussion section makes one very brief reference to correlational data, but is silent beyond that.
    Feeling somewhat unsatisfied with the lack of discussion, I was eager to hear what those authors had to say about those real-world data in more depth. So I talked with that study’s lead author, Corinne Moss-Racusin, in person after her talk at a social psychology conference in 2013.
    She acknowledged knowing about those real-world data, but quickly dismissed them as correlational. She had a fair point. Correlational data can be ambiguous. These ambiguous interpretations are discussed at length in the Supporting Information for the most recent PNAS paper.
    Unfortunately, however, I’ve found that dismissing evidence simply because it’s “correlational” can stunt productive discussion. In one instance, an academic journal declined to even send a manuscript of mine out for peer review “due to the strictly correlational nature of the data.” No specific concerns were mentioned, other than the study being merely “correlational.”
    Moss-Racusin’s most recent paper on gender bias pretends that a scholarly debate doesn’t even exist. Her most recent paper cites an earlier paper by Ceci and Williams, but only to say that “among other factors (Ceci & Williams, 2011), gender bias may play a role in constraining women’s STEM opportunities.”
    dx.doi.org/10.1177/0361684314565777
    Failing to acknowledge this debate prevents newcomers to this conversation from learning about the real-world, “correlational” data. All data points should be discussed, including both the earlier and new PNAS studies on gender bias. The real-world data, no doubt, have ambiguity attached to them. But they deserve discussion nevertheless.
    WAS THE PEER REVIEW PROCESS RIGOROUS?
    Peer review is a cornerstone of producing valid science. But was the peer review process rigorous in this case? I have some knowledge on that.
    I’ve talked at some length with two of the seven anonymous peer reviewers for this study. Both of them are extremely well respected scholars in my field (psychology), but had very different takes on the study and its methods.
    One reviewer embraced the study, while the other said to reject it. This is common in peer review. The reviewer recommending rejection echoed your concern that raters might guess the purpose of the study if they saw two men and one woman as applicants.
    You know what Williams and Ceci did to address that concern? They did another study.
    Enter data, stage Experiment 5.
    That experiment more closely resembled the earlier 2012 PNAS paper and still found similar results by presenting only one applicant to each rater. These new data definitely did help assuage the critical reviewer’s concerns.
    That reviewer still has a few other concerns. For instance, the reviewer noted the importance of “true” audit studies, like Shelley Correll’s excellent work on motherhood discrimination. However, a “true” audit study might be impossible for the tenure-track hiring context because of the small size of academia.
    The PNAS study was notable for having seven reviewers, because the norm is two. The earlier 2012 PNAS study had two reviewers. I’ve reviewed for PNAS myself (not on a gender bias study); the journal published that study with only me and one other scholar as peer reviewers. The journal’s website even notes that having two reviewers is common at PNAS.
    http://www.pnas.org/site/authors/guidelines.xhtml
    So having seven reviewers is extremely uncommon. My guess is that the journal’s editorial board knew that the results would be controversial and therefore took heroic measures to protect the reputation of the journal. PNAS has come under fire from multiple scientists who repeatedly criticize the journal for letting studies simply “slip by” and get published because of an old boys’ network.
    The editorial board probably knew that would be a concern for this current study, regardless of the study’s actual methodological strengths. This suspicion is further supported by some other facts about the study’s review process.
    External statisticians evaluated the data analyses, for instance. This is not common. Quoting from the Supporting Information, “an independent statistician requested these raw data through a third party associated with the peer review process in order to replicate the results. His analyses did in fact replicate these findings using R rather than the SAS we used.”
    Now, I embrace methodological scrutiny in the peer review process. Frankly, I’m disappointed when I get peer reviews back and all I get is “methods were great.” I want people to critique my work! Critique helps improve it. But the scrutiny given to this study seems extreme, especially considering all the authors did to address the concerns, such as collecting data for a fifth experiment.
    I plan on independently analyzing the data myself, but I trust the integrity of the analyses based on the information that I’ve read so far.
    SO WHAT’S MY OVERALL ASSESSMENT?
    Bloggers have brought up valid methodological concerns about the new PNAS paper. I am impressed with the time and effort put into producing detailed posts such as yours. However, my overall assessment is that these methodological concerns are not persuasive in the grand scheme. But other scholars may disagree.
    So that’s my take on the methods. I welcome your thoughts in response. I doubt this current study will end debate about sex bias in science. Nor should it. We still have a lot to learn about what contexts might undermine women.
    But the current study’s diverse methods and robust results indicate that hiring STEM faculty is likely not one of those contexts.
    Disclaimer: Ceci was the editor of a study I recently published in Frontiers in Psychology. I have been in email conversation with Williams and Ceci, but did not send them a draft of this comment before posting. I was not asked by them to write this comment.
    dx.doi.org/10.3389/fpsyg.2015.00037

