What does it take for a new advanced stat to go mainstream? (2024)

One month ago, Baseball Prospectus introduced a new total-offense statistic called Deserved Runs Created Plus (DRC+). Although the site’s related pitching statistic, Deserved Run Average (DRA), has appeared in my Awards Watch columns since 2016 — at times with behind-the-scenes assistance from the statistic’s co-creator, Jonathan Judge — the launch of DRC+ caught me by surprise.

The introduction of DRA in early 2015 addressed a need. Sabermetrics continued, and continues, to struggle with the erratic nature of pitching performances and the challenge of separating pitcher-influenced results from fielder-influenced results. Offensive contributions, by comparison, are more predictable, and our existing alphabet soup of total-offense statistics (OPS+, wRC+, wOBA and Baseball Prospectus’ own, now deposed, True Average, to name a few) seemed to be doing a sufficient job of capturing them.

That got me thinking about how new baseball statistics gain traction, and about the chances of DRC+ gaining the necessary currency to obtain equal footing with, if not supplant, those other metrics. To find out what can make or break a new statistic, I turned to some of the sabermetric community’s foremost gatekeepers, statisticians and thought leaders for their opinions.

“I don’t think there’s any one lead thing that determines what gets accepted and what doesn’t,” said Sean Forman, founder and president of Sports Reference, the parent company of Baseball-Reference.com. “It’s the quality of the stat. It’s the marketing of the stat. It’s the availability of the stat. You have to have all of those things present in order for something like that to take off.”

Launching a new statistic in today’s crowded stat-scape is certainly very different than it was 15 to 20 years ago, when sabermetrics was still the work of baseball outsiders, and the baseball blogosphere, as well as now-essential sites such as Baseball-Reference, Baseball Prospectus and FanGraphs, was still finding its feet.

“I think, initially, the decision-making process was just, whatever seems kind of cool,” said FanGraphs CEO David Appelman, who launched the site with a focus on fantasy baseball in the summer of 2005. “But that was in the days when there weren’t so many baseball stats out there. So, it was just kind of like, let’s just throw stuff at the wall and see what sticks. Now, I think it’s a considerably different process.”

“I think that most of the things that we’ve seen that have become popular have stayed popular because it was deemed that there was a need for them in the first place,” said Rob Neyer, whose column on ESPN.com in the late 1990s and early 2000s helped introduce a generation of baseball fans to sabermetric thought, concepts and statistics. “That’s what’s happened with WAR [Wins Above Replacement]. A lot of people wanted one number to look at.”

Meeting existing needs has been the primary motivating factor for the statisticians working with Statcast, the most recent fount of new baseball statistics, according to one of the men behind the numbers, MLB.com Statcast analyst Mike Petriello. “We start with the question we are trying to answer,” said Petriello. “Someone will say, ‘hey, we’re really interested in being able to measure this,’ and then we’ll go back and look at the data and say, ‘how can we best do that?’”

Forman takes a similar approach in deciding what new statistics to add to Baseball-Reference’s player pages. “Our whole goal is to answer user questions,” he said. “So, if a user wants to know who was the best player in the National League last year, that’s a question that we feel we have the data to answer, and so we want to help answer that question for them with WAR or whatever other advanced stats they would find useful.”

That general focus on need is why the launch of DRC+ caught me by surprise. It uses the same 100-as-average scale as wRC+ and OPS+ and attempts to answer the same question as those two stats. Judge and company argued convincingly that DRC+ is better than the similar statistics that were previously available, but the degree of improvement is a crucial factor.

“I think the jump in improvement of stats is something that’s really important when introducing a new one,” said Appelman. “Because if it’s very, very marginal, people are going to tune out or say ‘why do I need to learn this new thing?’ And then, from a media standpoint, let’s say you’re writing an article, do you want to take the time to introduce a new stat that you think really isn’t going to increase your readers’ understanding of something? So, that’s kind of the thought process now, at least when introducing a stat. How much better are they than the existing stats we have, and are they going to tell you something new about the game of baseball that we didn’t already know? If the answer’s no, there’s going to have to be some other reason why we’re using it.”

Baseball Prospectus Editor in Chief Aaron Gleeman doesn’t disagree. “For me, it’s mostly about whether a new stat tells a more accurate story and/or adds important context. With hitting stats, we all know the basics by now. There’s no need to introduce something new if it really only advances things a tiny bit or not at all. Our hope with DRC+ is that it’s not only more accurate than what’s previously been available, but that it’s more predictive and adds a level of context that advances things considerably.”

Adds Judge, “Once it became clear the metric would be a real upgrade, and not just another abbreviation for everyone to remember, we decided we wanted to release it.”

Still, unless and until that superiority becomes clear to its most influential users (teams, writers, broadcasters, fellow sabermetricians), DRC+ may have to fight for attention in what Appelman calls the “platform wars.”

“WAR, for instance, is where you have these platform wars, where you’re like, ‘okay, well, which one is easier to access?’ I don’t think, when it comes down to one or the other, you’re looking at, really, a superior stat. But that is an instance where it comes down to, which one can I get to easier, not so much which one do I trust more. Because you can have one that maybe you prefer to the other, but, at the end of the day, you could flip a coin.”

A site like Baseball Prospectus, which puts most of its content behind a paywall and has kept the same look and functionality of its stats pages for more than a decade, tends to come up short in those comparisons. (Full disclosure: I worked on Baseball Prospectus’ books from 2006 to 2012.) Jay Jaffe, who wrote for Baseball Prospectus from 2004 to 2012 and is now a senior writer for FanGraphs, elaborates.

“I think Baseball Prospectus has always struggled to have its metrics penetrate the mainstream. Because both Baseball-Reference and FanGraphs have invested so much more in presenting statistics to the public, which is really what drives their models, they have always had the upper hand when it comes to their particular flavors of metric getting widespread use. WARP [Baseball Prospectus’s Wins Above Replacement Player] beat the various conceptions of WAR to market, but it didn’t catch on because not many people were looking to Baseball Prospectus for the encyclopedia-type stats in the way they were using FanGraphs and Baseball-Reference.”

“It has a lot to do with just the general interface,” Jaffe continues. “You go to Baseball-Reference, the first thing you’re given is the option of looking up a million player cards. FanGraphs, you’ve got editorial content there, but it’s still riding atop this tremendous statistical resource. Baseball Prospectus was, I think, more content-forward and really suffered from the lack of investment in interface infrastructure. BP today, when you’re looking for statistics, still looks a lot like BP did in 2001. Whereas FanGraphs, while it is a content company, it is, I think, first and foremost a stats-delivery company, and likewise with Baseball-Reference.”

That may change. Gleeman told me that Baseball Prospectus is “planning a further rollout on a soon-to-be-improved BP stat section that will allow people to access the data in much cleaner, more fluid ways.”

If that comes to fruition, BP’s current leadership will have succeeded where previous iterations failed.

“Baseball Prospectus was sort of a loose confederacy during its heyday,” Colin Wyers, who was director of statistical operations for the site from late 2009 to late 2013 and is now the senior architect in the Astros’ research and development department, told The Athletic via email. “BP had a lot of very brilliant people sort of doing their own thing. The problem was that a lot of things didn’t mesh well together. You had WARP and then you had VORP [the offense-only Value Over Replacement Player] and then you had [pitching metric] SNLVAR [Support-Neutral Lineup-adjusted Value Added above Replacement], and, at one point, they all had different definitions of replacement. It all got confusing and overwhelming.”

“At some point, it’s too much for anyone to be expected to know what stats to use,” continued Wyers. “The other thing is that all of those people worked in their own fashion and with their own tools, and so you’d have some code in FORTRAN that nobody knows how to run anymore, and you have some stuff that’s running on Oracle databases, which is really expensive to run, and we wanted to move to cheaper database solutions. We just didn’t have enough staffing to port and maintain everything, so we had to be judicious in keeping what seemed to be the best of what we had.”

That confusion — compounded by the departures of nearly all of the Baseball Prospectus founders (only Dave Pease, who joined after the first annual book to help launch the website, remains), and the rapid turnover of staff, due in part to the poaching of the site’s top statisticians by major-league teams, including VORP creator and SNLVAR custodian Keith Woolner by the Indians in 2007 — caused many once-valuable statistics to wither on the vine. One such stat was Clay Davenport’s Equivalent Average, which was renamed True Average in 2010 in an attempt to boost its appeal. Davenport left BP in 2011. DRC+ has now arrived to replace True Average entirely.

That lack of lasting stewardship contrasts sharply with the consistency provided by Forman and Appelman at Baseball-Reference and FanGraphs, respectively. Both sites have also maintained ongoing relationships with sabermetricians such as Tom Tango, who helps both keep their statistics current.

“Anyone who is a friend of sabermetrics is a friend of mine,” Tango wrote via email. “Every now and then I’ll find something small that no one might notice, but it stands out to me, and I’d like to get that corrected on B-R and FanGraphs since I am a heavy user of both sites. In other words, it benefits me just as it benefits everyone else to make sure the data is as correct as possible.”

The DRA stat received criticism for growing up in public, with multiple significant alterations to its formula after its early 2015 release (no, Jason Schmidt’s 2004 is no longer among the greatest pitching seasons of all time according to the statistic), but such continued improvements are an important part of any statistic’s ongoing relevance.

“We have been very clear that our philosophy is of continuous improvement,” wrote Judge, who is now a co-owner of Baseball Prospectus, in an email. “We don’t think it is unreasonable to want something to be forever stable, but that is not a realistic goal unless you are willing to say, at some point, we are done learning and working. That’s a remarkable thing to declare, and we’re not interested in creating credibility by pretending our work is done when it isn’t. I also think that, if your position is that no statistic should ever be updated with new information or learning, then the most recent statistic you can use was probably created circa 1984. Virtually all statistics in wide use created since then have undergone major revisions at some point, including some ‘down-was-up’ moments as people have realized they were overlooking something important about this sport. Our objective is to get things right.”

A notable, non-BP example of a statistic falling out of favor because of an absent creator and a lack of public upgrades is Voros McCracken’s Defense Independent Pitching Stats (DIPS). In our email conversation, Tango called McCracken’s discovery of the lack of impact pitchers have on balls in play, first popularized via Baseball Prospectus and Neyer’s ESPN column in January 2001, “the most important sabermetric finding of the last 30 years.” Indeed, it was so monumental that McCracken went from relative obscurity to a job as a consultant with the Red Sox in less than two years.

“When the Red Sox hired me,” McCracken said, “basically all of my public work had to stop. And, at that time, by the fall of 2002, people were still very much in the thick of kicking around exactly what this all meant, and so forth and so on, and that discussion kind of had to go on without me contributing to it.”

With McCracken’s permission, Jaffe continued to post DIPS leaderboards on his personal site, FutilityInfielder.com, using the last publicly available version of the formula. However, with McCracken out of the public eye, DIPS was rather quickly replaced by Tango’s Fielding Independent Pitching (FIP).

Then again, McCracken, Jaffe and Tango all believe that FIP’s relative simplicity may have carried the day, even if McCracken had been around to keep DIPS current.

“[DIPS] was fairly arduous,” said Jaffe. “And then we learned, after a couple of years of doing that, a handy formula called FIP*, which you could do in seconds. I think it comes down to, you’ve got something that takes a whole spreadsheet to calculate and has a lot of complex variables in it, versus something that you can do with a pocket calculator, or even almost in your head, if you’re so inclined. Just the simplicity is what carried the day for FIP. You didn’t need an instruction manual to do it.”

*The formula for FIP is (13*HR + 3*(HBP+BB) - 2*K)/IP + a constant, usually around 3.2, to put it on the ERA scale.

“FIP is a shortcut to all that,” wrote Tango, “with a correlation of . . . I don’t remember what, but something like r=0.98 or r=0.99 [r=1 is total positive linear correlation]. While DIPS and FIP are both relevant, FIP makes the whole process completely clear and relatable. You can totally see how much of an effect one more walk or one more strikeout or home run will have on the final number. It’s the pure clarity of what it is doing.”
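Tango’s point about clarity can be made concrete with a few lines of Python. What follows is only a minimal sketch of the footnoted formula, not an official implementation: the function name `fip`, the default constant of 3.2 and the sample season totals are illustrative assumptions, and innings are assumed to be entered as a true decimal (200.0) rather than in box-score notation.

```python
def fip(hr, bb, hbp, k, ip, constant=3.2):
    """Fielding Independent Pitching, per the footnoted formula.

    The constant is an assumption here (the article says "usually
    around 3.2"); ip is decimal innings pitched.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (hbp + bb) - 2 * k) / ip + constant


# Hypothetical season line, invented purely for illustration.
season = dict(hr=20, bb=50, hbp=5, k=200, ip=200.0)
base = fip(**season)

# Effect of exactly one more walk, strikeout or home run on the final number.
one_more_walk = fip(**{**season, "bb": season["bb"] + 1}) - base
one_more_strikeout = fip(**{**season, "k": season["k"] + 1}) - base
one_more_homer = fip(**{**season, "hr": season["hr"] + 1}) - base

print(f"FIP: {base:.3f}")
print(f"one more walk: {one_more_walk:+.3f}")
print(f"one more strikeout: {one_more_strikeout:+.3f}")
print(f"one more home run: {one_more_homer:+.3f}")
```

Because every term is a fixed weight divided by innings, the marginal cost of each event can be read straight off the formula, which is the “pocket calculator” quality Jaffe describes.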

“I think that FIP, from Jump Street, was a better idea than DIPS,” said McCracken, who is now a consultant for another major-league team, “because it was much simpler to calculate. Looking back on it now, I probably should have done it differently. Just because complicated statistical things are not necessarily friendly for mass consumption.”

Simplicity and ease of understanding formed a theme that those with whom I spoke returned to repeatedly.

“I think of the Statcast stats that have come out,” said Appelman. “Exit velocity and launch angle are clearly the two which have caught on the most because they’re just really simple. It’s stuff that people always wanted to know and now there’s a hard measurement on it.”

“A big part of our job is to translate this stuff to the general public,” said Petriello. “How can we use this on MLB.com, or on social media, or on television? So it’s got to be, not simplistic, necessarily, but in a way that can be easily explainable and understandable to people. Like hit probability, right now, is exit velocity and launch angle. I think people understand that. We could get 10 levels deeper than that if we wanted to, and get into all sorts of modeling and complicated stuff. And we may yet do that, but for right now, when this stuff is still so new to so many people, we’re trying to understand that our audience isn’t just the one percent of the mass of nerds, who we love, but obviously casual baseball fans, as well.”

Enter Tom Tango, again. Tango joined the Statcast team as senior data architect in June 2016. Among his tasks since arriving has been improving the seemingly abandoned route-efficiency stat.

“I knew right away that the idea was right, but the implementation wasn’t going to work,” wrote Tango. “That metric was simply total distance needed divided by total distance traveled, and you’d get back a number like 97 percent or 88 percent. Other than ‘high good, low bad,’ it was meaningless. The simple change was to take the difference between the two. That simple change makes it relevant because we are keeping the unit in feet. You run 50 feet, you need to cover 45 feet, so that’s an extra five feet added to your route. And it’s relatable because we can see how much a fielder missed the ball by. If he missed it by one step, three feet, then we can clearly determine that the extra five feet on the route prevented him from catching the ball. We can actually have a conversation about it.”
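The contrast Tango draws between the old ratio and the new difference can be sketched in Python using his own 50-foot example. This is a rough illustration under stated assumptions, not Statcast’s actual code: the function names are hypothetical, and the final check simply compares extra route distance with the distance by which the ball was missed, ignoring speed, reaction time and everything else a real model would consider.

```python
def route_efficiency_ratio(needed_ft, traveled_ft):
    """Old-style metric: distance needed divided by distance traveled."""
    if traveled_ft <= 0:
        raise ValueError("distance traveled must be positive")
    return needed_ft / traveled_ft


def extra_route_feet(needed_ft, traveled_ft):
    """Revised metric: the difference, kept in feet."""
    return traveled_ft - needed_ft


def route_cost_the_catch(needed_ft, traveled_ft, missed_by_ft):
    """Hypothetical check: did the extra distance exceed the miss distance?"""
    return extra_route_feet(needed_ft, traveled_ft) >= missed_by_ft


# Tango's example: the fielder runs 50 feet when 45 were needed
# and misses the ball by one step, about three feet.
needed, traveled, missed_by = 45.0, 50.0, 3.0

print(f"ratio: {route_efficiency_ratio(needed, traveled):.0%}")
print(f"extra feet: {extra_route_feet(needed, traveled):.0f}")
print(f"route cost the catch: {route_cost_the_catch(needed, traveled, missed_by)}")
```

The ratio yields a percentage with no obvious real-world meaning, while the difference can be compared directly with the three feet by which the ball was missed.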

That new route-efficiency statistic is among those Tango, Petriello and the rest of the Statcast team hope to roll out before Opening Day this year.

Another crucial factor in ease of use and understanding is the name.

“I think the name is highly important,” said Neyer, “because you don’t have time to explain it. Nothing’s going to become truly popular unless it’s on TV or the radio, because that’s how 90-some percent of people still consume sports. They’re not going to read an article on Baseball Prospectus, a 1,000-word article on whatever their latest thing is. So, it’s got to have a name that people understand immediately.”

“We spend a lot of time thinking about that,” admitted Petriello. “Naming things is really, really difficult. I’ll be the first to say, we’re not always great at it. I think our most powerful stat is probably expected weighted on-base, xwOBA. It’s a tough thing to explain to anybody. It’s hard. We want to make sure that the name is something that people understand, but the deeper that you go with crazy acronyms, it might be more descriptive, but then it’s harder for people that actually want to know what it is. That is, for me, the most difficult thing we have, is trying to figure out how to name these things, and hopefully we’ve done a little bit better. I’m not sure we’re all the way there yet, but that’s an ongoing concern.”

Every now and then, a stat checks every box: a good name, simple to understand, responds to a question in need of an answer, moves our understanding forward significantly, benefits from an easily accessible presentation on a popular platform and is maintained and kept relevant by its creator. Jaffe’s Hall of Fame measuring stick JAWS (Jaffe WAR Score) is such a statistic.

Jaffe created the statistic in January 2004 to establish a total-value-based benchmark for Hall-worthiness, and later gave it its catchy name at the insistence of then-BP Editor in Chief Christina Kahrl. It still took years to catch on. Joe Posnanski was among the first mainstream writers to cite Jaffe’s Hall stats, in late 2010. The next fall, Brian Kenny brought Jaffe on the MLB Network to discuss JAWS and the Hall of Fame. Then, in 2012, Jaffe was hired by SI.com (full disclosure: I recommended him for the job) and Forman, at Jaffe’s suggestion, added sortable JAWS leaderboards to Baseball-Reference, which prompted a change in the base statistic for the calculation from BP’s WARP to B-Ref’s WAR. JAWS is now an essential part of the conversation about any player’s Hall of Fame candidacy, but not every statistic can satisfy every requirement so fully, and it still took the better part of a decade for JAWS to gain mainstream acceptance.

So, what are the most important attributes for a new statistic to have in order to gain traction in and beyond the sabermetric community? This is where opinions diverged most.

Tango listed the platform first among the factors that help a new statistic gain traction. “Leverage Index, WAR, wOBA and FIP are good examples of metrics I created or had a strong hand in shaping,” he wrote. “Without FanGraphs and B-Ref, they’d be relegated to the niche market, and so would take at least a decade to gain widespread use.”

To FanGraphs’ Appelman, however, the platform is a secondary concern. “I think with new stuff, it doesn’t really matter what the platform is so much,” he said. “I think it’s going to take a while sometimes for stats or unknown platforms to percolate into the public consciousness just because people are going to have to find the site, but if it’s really good, regardless of the platform, people will find it and use it.”

Tango and Appelman don’t so much disagree as hold different views of how long it can take a statistic to gain currency.

“The one factor people often forget about is time,” wrote Judge. “Metrics like OPS+ and UZR and wRC+ were once themselves weird and new, used only by nerds on the cutting edge of something and ignored by even many savvy writers. Now, all of these are in common usage despite having involved fairly substantial changes at times. Even the best metric is unlikely to simply shove existing ones aside after a few months. At least a season or two is usually needed to have people get comfortable seeing it in action and having it provide information that readers confirm is useful.”

Sabermetricians may have to learn patience, but the potential user base for a new statistic isn’t likely to show much. As a result, when I asked about the most important attributes for a new statistic hoping to gain traction, the word that came up most often was “accessible,” referring both to how easy it is to find and access the stat itself on whatever platform is offering it and to how easy it is to understand what the stat is and how it works.

“I think you want to have a good name,” said Neyer, “it’s got to be reasonably explainable — I can explain WAR in one sentence — and, at this point, it’s got to be accessible, which means you can find what you want in two or three clicks. I think those three ingredients. I wouldn’t have any way of saying which of those is the most important. I think you need all three.”

“It has to be accessible,” added Jaffe. “It certainly helps to have a memorable, catchy acronym, whether it actually tells you what it is or whether it’s reverse-engineered, like PECOTA. It’s those two things, and it’s also, ‘what is this for? Does it answer a question?’ I think there’s all of those things in there.”

“Is it unique is probably the first thing,” said Petriello, “and, yeah, is it accessible. For us, we do have an advantage where we have direct contact with a lot of the broadcasters, and we can get it on TV, and we can get it on the front page of MLB.com, and that certainly helps. I also think having the sabermetric community buy into it and pick it up and write about it on blogs and sites and articles, I think that helps a lot. You’ve certainly seen Sprint Speed on a lot of team blogs and all that kind of stuff. So, for me, those two things are maybe the most important. Is it unique? Is it available? And is the math behind it good?”

Though it may not always prove true, one still hopes that the deciding factor in a stat’s ultimate fate is the quality of the idea itself.

“I would like to think that it’s the importance of the concept behind it,” said McCracken. “In my experience, when you’re talking about statistics, you’re starting to talk about a subset of baseball fandom that is looking a little deeper than the talk radio crowd, and, therefore, I do think it is important that the concept has a good meaning and helps you understand baseball in a better way. I do think that that is, probably, in terms of long-term staying power, the most important. Now, other things matter. The platform certainly. The ease of understanding the statistic matters. But I really do think the No. 1 factor is, ‘how does this help us understand the game of baseball?’”

(Photo of Trout: Patrick McDermott/Getty Images)