How ‘Overwatch’ Is Tackling the Esports Stats Problem

Getty Images/Blizzard/Ringer illustration

By Ben Lindbergh
July 24, 2019, 6:00 pm UTC • 14 min

In the inaugural season of the Overwatch League, the Blizzard-operated esports venture that launched last year with a four-stage season that ran from January through July, the London Spitfire recovered from a second-half swoon to defeat the Philadelphia Fusion in the finals in front of a loud crowd at Brooklyn’s Barclays Center. The Spitfire followed a perplexing path to the playoffs. After going a combined 15-5 against their opponents in the then-12-team (now 20-team) league in the first two regular-season stages, they slumped to a combined 9-11 in stages 3 and 4 and entered the playoffs with a worse overall record than four of their competitors in the six-team playoff field. In the playoffs, though, they went 6-1, dispatching both Los Angeles clubs before sealing the Fusion’s fate.

London’s up-down-up trajectory mirrored that of one of the team’s DPS (damage per second, or high-damage-dealing) players, Ji-hyeok “Birdring” Kim. A midseason wrist injury—which, as Birdring later revealed, he incurred by slamming his hand against his desk while playing the platformer Getting Over It—compromised his performance (when he was able to play at all) in the middle of the regular season. But by the playoffs, he was back to full strength, forming a devastating tag team with another London DPS player, finals MVP Joon-yeong “Profit” Park.

A new Blizzard-produced player-performance metric, player impact rating (PIR), which was unveiled on Wednesday, aims to express stories like the Spitfire’s in statistical terms. According to PIR’s designer, Overwatch League stats producer Ben Trautman, Profit’s PIR—a single number that purports to capture player performance in team fights, expressed on a scale where 100 is average—held steady at roughly 120 until spiking to 140 in the playoffs. But Birdring, who had been at about 110 before his injury, dropped down to 70 while he was playing through pain before bouncing back up toward the end of the season.

“It really kind of tells the story of London Spitfire’s season just with a single metric, because as outstanding as Profit was in the finals, their ability to even get to the season playoffs was largely due to Birdring’s turnaround,” Trautman says.

Just as esports have long mirrored traditional sports—right down to the star players who sabotage themselves by punching, slamming, or throwing something in frustration—traditional sports are starting to resemble esports in some respects. For one thing, owners in more established sports leagues sometimes double dip in upstart esports leagues: The Overwatch League’s New York Excelsior, Los Angeles Gladiators, and Boston Uprising, along with the Fusion, are all parts of the portfolios of ownership groups that also hold franchises in one or more of the four largest leagues in the U.S. For another, though, ball sports are increasingly becoming digitized. MLB introduced an automated pitch-tracking system in all big-league ballparks in 2008 and upgraded to complete player-tracking in 2015. The NBA debuted its own ball-and-player-tracking system in the 2013-14 season, followed by the NFL last year and the NHL later this year.

While reporting The MVP Machine, my book about baseball’s player-development revolution, I visited Driveline Baseball, a hotbed of data-driven development, and had my pitching mechanics motion tracked by an array of 15 cameras. As Driveline’s lead engineer Joe Marsh placed sensors all over my body so that the cameras and accompanying software could map my movements onto a wireframe model’s, he told me, “We basically make you into a video game and then do analysis on that.”

Esports competitors don’t need to don mocap gear to track their in-game movements. Everything their avatars do is already digitized and recorded. In theory, that should make esports even better suited to statistical analysis. In practice, though, other aspects of esports have made player performance challenging to quantify.

“It’s very difficult to compare player performance in a game where every role is super-duper different,” Trautman says. Overwatch, which came out in 2016, is one such game. As a team-based online shooter in which squads of six players cooperate to achieve objectives like capturing control points and delivering payloads, Overwatch synthesizes first-person shooters and MOBAs. In a standard, deathmatch-based shooter in which the main way to be good at the game is to kill without being killed, a simple stat like KDA (kills/deaths/assists) ratio does a decent job of ranking players, but Overwatch complicates matters by incorporating objectives, separating players into three (originally four) roles (damage, tank, and support), and requiring them to pick from 31 player-controlled characters, or “heroes” (including new addition Sigma). Among major American sports, Overwatch is most often linked with basketball, but the parallels aren’t perfect. In basketball, a point guard and a center accumulate certain stats at different rates—a point guard typically has a higher ratio of assists to blocks—but both can score points.

“That’s not exactly true in a game like Overwatch, where over half the cast can’t even heal,” Trautman says. “So we always wanted to have a way to supplement those arguments of like, ‘Oh, who’s the best player on this team? And is it their Mercy, or is it their Winston?’ So this player impact rating is one of the first, if not the first, stat that allows us to compare apples to oranges—or in our case, Winstons to Tracers—in a fair way.”

Blizzard blended esports and traditional sports when it conceived the Overwatch League, a geolocated circuit in which each team is tied to a single city. That’s the dominant model for traditional sports leagues, but it was highly unusual for esports, which has tended to feature either free-floating teams with no geographic alignment or leagues organized by continent and country. (Earlier this year, Blizzard’s parent company Activision Blizzard announced that the Call of Duty World League would follow in the OWL’s footsteps by switching to a city-based structure.) By introducing PIR—only one word and letter removed from basketball’s popular player efficiency rating, or PER—Blizzard is establishing one more point of convergence between the two worlds. Traditional sports fans used to consulting single-number metrics as a guide to which players are good will now find an equivalent in the OWL.

NYXL’s steep decline late last season, as pictured through PIR.

Attempts to express holistic player performance with one handy number aren’t entirely new to esports, but partly due to the aforementioned complexities of characters and roles, competitive video games don’t lend themselves to stats as tidy and all-encompassing as baseball’s wins above replacement (not that WAR doesn’t have its limitations and detractors). Esports analyst Ben Steenhuisen notes that in the popular MOBA Dota 2, players and teams employ their own individual “hero pools,” or subsets of the pool of 115 possible heroes, which are dictated partly by what the game’s current conditions favor and partly by personal preference and experience. “This means that the team has their own different-shaped pieces of the puzzle for each character, and they hope their selected five heroes make a coherent jigsaw puzzle when put together,” Steenhuisen says. Contrast that with baseball, in which all pitchers try to get batters out and all batters try to avoid making outs, which makes comparisons between players relatively (if not totally) straightforward.

In esports, Steenhuisen says, “Even good metrics are often really bad.” Dying in games is generally bad, but there are exceptions even to that rule. “Sometimes your death in a game can start off a fight in a preferred way, or prevent an enemy from planting the bomb,” Steenhuisen says. “Sometimes making a kill means you have no way to win the round, and sometimes sitting idle in your base doing nothing is the optimal play.”

Steenhuisen notes that a statistic like average damage per round (ADR) in Valve’s shooter Counter-Strike: Global Offensive can be skewed by the configuration of the map or by whether a player is an entry fragger (the first fighter through the door) or a lurker (someone who tries to pick off enemies from behind as they respond to attacks). Prominent CS: GO site HLTV.org offers a more sophisticated player rating metric, but Steenhuisen says it’s been “criticized for being opaque and not statistically rigorous.” One study showed it was strongly correlated to simpler stats that include only kills and deaths, which could suggest that it’s not well-suited for comparing players across roles. And CS: GO’s roles are less distinct than those of Overwatch.

Blizzard says that PIR “allows direct player-to-player comparisons across all roles and heroes using a formula that measures player impact during team fights.” Breaking down Overwatch matches into a series of engagements in which virtual blood is drawn is an idea borrowed from Dennis “Barroi” Matz, the former proprietor of independent Overwatch stats site Winston’s Lab. Matz made his own attempt at a player rating stat, but he was hired as an analyst late last year by an OWL expansion club, the Toronto Defiant. Trautman says he isn’t familiar with Matz’s player-rating research, although he acknowledges that Matz helped inspire the new stat by pioneering the concept of team fights. (Matz didn’t respond to a request for comment.) Trautman also says that PIR isn’t related or correlated to Overwatch’s “Skill Rating,” which Blizzard uses to match up non-professional players in the game.

Trautman and Overwatch League spokesman Kevin Scarpati liken PIR to wRC+, the offensive stat in baseball that sums up all of a player’s contribution at the plate in one number on a similar scale, where 100 is average. One way in which PIR differs from wRC+ is that the latter is almost entirely independent of a player’s teammates, because baseball is structured as a series of batter-pitcher matchups that are largely unaffected by the rest of the lineup. Overwatch is a game of continuous team action, which means it’s tough to uncouple each player’s performance from the conditions created by the surrounding squad. “We’ve actually had a couple of trades happen midseason, and the way that that tends to shake out is if you go from a really bad team to a really good team, that’s sometimes worth up to a 10-rating bump and vice versa,” Trautman says. “But that in and of itself isn’t enough to explain players that are like 15 to 20 percent above [average].”

wRC+ makes adjustments for each hitter’s ballpark and league, slightly boosting or suppressing players’ value based on their offensive environment. For PIR, the closest equivalent is adjusting for frequent patches, or updates, to the game itself, which Steenhuisen calls “possibly the most annoying factor” for any analyst trying to make sense out of esports stats.

“The games are changing so fast,” Steenhuisen says. “In the same time it’s taken [soccer] to trial and implement VAR we’ve seen the game of Dota 2 change three times over. Our understanding of what is and isn’t important changes so rapidly, both through changes in the game itself as well as changes in the metagame surrounding the game. Game lengths change significantly in most patches of Dota 2, affecting statistics like kills and deaths to the point where you have to time-normalize them.”
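The time-normalizing Steenhuisen mentions can be as simple as converting raw totals into per-minute rates. What follows is a minimal sketch in Python, not any analyst's actual method; the function name and the numbers are invented for illustration.

```python
def per_minute(total: float, game_seconds: float) -> float:
    """Convert a raw total (kills, deaths, ...) into a per-minute rate."""
    if game_seconds <= 0:
        raise ValueError("game length must be positive")
    return total / (game_seconds / 60.0)


# Invented example: the same 12 kills in a 30-minute game and a 45-minute game.
short_game = per_minute(12, 30 * 60)
long_game = per_minute(12, 45 * 60)
```

The raw totals are identical, but the rate in the longer game is lower, which is why totals from patches with different typical game lengths can't be compared directly.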

Trautman admits that patches pose the biggest problem for PIR, too. “Any time that there is a patch, there’s a high chance that the very fundamentals of the game are gonna change a lot,” he says. Last year, Overwatch was dominated by the “dive” meta, which diminished the role of “tank” characters that can absorb a great deal of damage. “There was a lot less damage, but a lot more precise damage, going on,” Trautman says, adding that “the damage and healing totals weren’t super high, because generally the hero health pools … were lower.” Updates to the game made dive less beneficial and ushered in the era of triple-triple play, a style that favors alignments of three tanks and three support players. That means more healing done and more damage dealt, which in turn means that the expected values PIR placed on healing and damage-dealing during the dive days no longer applied.

Overwatch releases patches at the start of each stage. (Stage 4 of this season will introduce role-locking, ending triple-triple’s reign by forcing teams to field two damage-dealers, two tanks, and two supports.) PIR balances the long and short terms by incorporating data from the previous five stages to derive an initial expected value for each stat in the formula. “Once we have that established,” Trautman says, “we go into whatever current patch we’re trying to balance and do a second balancing pass to bring every hero into actual equilibrium with each other. … The actual amount of impact rating that they’re going to be producing in a given patch is still balanced against other players playing that hero in that patch. Say that we reduced the amount of healing that Mercy is able to do from one patch to another by 5 percent. It would still be compared to an average among all Mercy players on the new patch.”
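PIR's formula is proprietary, so the following is not Blizzard's method. It is only a rough Python sketch of the one idea Trautman describes here: scoring a player against the average for the same hero on the same patch, on a scale where 100 is average. The function name, the `raw_score` field, and every number are hypothetical, and raw scores are assumed to be positive.

```python
from collections import defaultdict
from statistics import mean


def index_to_hero_patch_average(rows):
    """Return {(player, hero, patch): index}, where 100 is the average
    raw score among everyone playing that hero on that patch."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["hero"], r["patch"])].append(r["raw_score"])
    averages = {key: mean(scores) for key, scores in groups.items()}
    return {
        (r["player"], r["hero"], r["patch"]):
            100.0 * r["raw_score"] / averages[(r["hero"], r["patch"])]
        for r in rows
    }


# Invented data: Mercy's healing output drops 5 percent from patch 1 to patch 2.
rows = [
    {"player": "A", "hero": "Mercy", "patch": 1, "raw_score": 1000.0},
    {"player": "B", "hero": "Mercy", "patch": 1, "raw_score": 800.0},
    {"player": "A", "hero": "Mercy", "patch": 2, "raw_score": 950.0},
    {"player": "B", "hero": "Mercy", "patch": 2, "raw_score": 760.0},
]
indexed = index_to_hero_patch_average(rows)
```

Because every Mercy player absorbs the same 5 percent reduction, each player's index is unchanged from one patch to the next in this toy example, which matches the behavior Trautman describes.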

Bill James once suggested that new stats should, to paraphrase college basketball writer John Gasaway, be “80 percent reassuring but 20 percent surprising.” In other words, a stat that surprises us none of the time isn’t teaching us anything, but a stat that surprises us all of the time is probably missing a decimal point somewhere. PIR’s stance on the Spitfire confirmed what the Overwatch world already thought about Birdring’s injury, but it also has the capacity to surprise. As an example, Trautman mentions the Shanghai Dragons, who almost impressively finished 0-40 last season but have improved to the point that they won Stage 3 of Season 2. Trautman says that Shanghai DPS players Min-seong “diem” Bae and Jin-hyeok “DDing” Yang have received much of the credit for the team’s turnaround, but PIR reveals that less-heralded DPS player Yong-Jin “Youngjin” Jin, who upped his usage of damage hero Doomfist, actually deserves the largest share of the spotlight for the Stage 3 success.

“It’s really cool to see that one player playing Doomfist may have been the little bump that they needed to actually make it all the way through the playoffs,” Trautman says. PIR can also potentially dispel unfounded beliefs about some of the game’s heroes, including the support character Brigitte. “From a public opinion, sentiment sort of thing, people don’t think that Brigitte is a big deal. They don’t think that that hero is important, or I’ve even had people on Twitter suggest that no Brigitte players should be in the discussion for MVP, which I think is frankly silly.” The best counterargument: 18-year-old Vancouver Titans DPS player Hyojong “Haksal” Kim is currently leading all qualified players in Season 2 PIR through Stage 3, and nearly 90 percent of his playtime has been with Brigitte. “It’s interesting to see a player on a role that the public isn’t super jazzed about,” Trautman says.

Diem, Gamsu, and Youngjin celebrate the Dragons’ victory in the Stage 3 playoffs.

Of course, to accept the surprises, one has to trust the stat. It’s tough to assess the accuracy of PIR from afar, both because the metric is based on a black-box, proprietary formula and because Blizzard isn’t offering much in the way of validation, at least prior to publication. “This is not a stat that is going to be 100 percent predictive of wins,” Trautman says. “But teams that win more tend to produce higher impact ratings. … The best way that we’ve been able to verify that it’s been successful is that looking historically it has actually predicted MVPs in the past.”

As Trautman notes, Excelsior support star Sung-hyeon “JJoNak” Bang, who won the regular-season MVP award last year, also finished first with a 128 PIR. The PIR values for both OWL seasons are fairly compressed, with the best and worst players coming in about 25 percent above or below the league average, respectively. “We like to see it as, there’s a great amount of teamwork required to succeed,” Trautman says. “The other aspect of it is because you are competing against everyone else in your role or on your hero, one player or a group of players doing very well will drag the average towards them.”

Season 1 PIR leader JJoNak hoists his MVP trophy.

Even if PIR proves reliable, it may not meet with an immediately warm welcome. “The players in general are very wary of statistics,” Trautman says. Some spectators may be skeptical, too. In the Dota 2 community, Steenhuisen says, “the fans have been slow to adapt to more complex metrics beyond what’s shown in the game. Perhaps the most promising work done was by Niels Lindgren in 2015 into researching an adjusted plus/minus in team fights and evaluating the impact of players of all roles within these fights. This was great in that it allowed even support players to be evaluated on an equal footing with the stars. Fans, however, have been more receptive to more basic stats such as average gold per minute (GPM) over a tournament, or average kills, or (kills + assists)/deaths.”
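For reference, the (kills + assists)/deaths ratio that Steenhuisen cites is easy to compute, which is part of its appeal. A minimal Python sketch follows; the function name and stat lines are invented, and treating zero deaths as one is an assumed convention for avoiding division by zero, not something the source specifies.

```python
def kda_ratio(kills: int, deaths: int, assists: int) -> float:
    """(kills + assists) / deaths, counting zero deaths as one (an assumption)."""
    return (kills + assists) / max(deaths, 1)


# Invented stat lines for two hypothetical players in very different roles.
carry = kda_ratio(kills=10, deaths=4, assists=2)     # 12 / 4
support = kda_ratio(kills=1, deaths=4, assists=11)   # 12 / 4
```

Both hypothetical players come out with the same ratio, so the number alone says nothing about how differently they contributed.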

Putting PIR out in the public may sway strategies or spur further research, although independent analysts’ hands are tied to some extent because Blizzard restricts access to raw Overwatch data much more than Valve does with CS: GO. In bat-and-ball sports, internal team metrics are often ahead of league-sanctioned stats, but that may not be the case in the OWL. “I don’t know of any teams that specifically have their own player rating metric,” Trautman says (although having hired Matz, the Defiant wouldn’t be a bad bet). “But I do know that at least a couple of teams have started to work on their own team fight models. So theoretically, if they know a stat like this exists, perhaps they think they can make a better one for their own internal use.” Trautman hopes to build on the PIR framework to develop a win-based metric of his own—essentially, Overwatch WAR.

Esports analyst, journalist, and broadcaster Rod “Slasher” Breslau believes that condensing the current array of bewildering and sometimes misleading stats into a smaller suite of more telling ones would be a boon to the industry. “While there are both advantages and disadvantages in the diversity of statistics in competitive gaming, I’m on the side that this is a bug that should be fixed in a future patch for the benefit of esports’ future,” Breslau says. “It’s not just that people are used to seeing and discussing individual player performance stats in traditional sports; the fans, talking heads in the media, and even the players themselves love to talk about them, debate them on talk shows and Twitter, and use them in contract negotiations. All of which has created additional conversation and content opportunities for younger sports fans and more interest in players over teams, and this model of thinking for me blends quite nicely with overarching themes in esports that should be embraced by the industry.”

Trautman, who studied biomedical engineering in college and came to Blizzard’s attention through his writing and podcasting about Overwatch, says that “the first stats article I ever wrote about Overwatch, I believe I maintained the statistics with a whiteboard and tally marks.” PIR is a large leap from that, but it’s also a brand-new stat about a fairly young game in a fledgling league, which nearly doubled in size in its second season and may eventually expand to 28 teams. The players aren’t yet unionized. The teams aren’t yet playing games in their home cities (although they will next year). No one knows what strategic adjustments the next patch will inspire. But whatever the meta looks like next year, Overwatch will have a stat for that. And it’s one that it won’t take an advanced degree in esports to understand.

Ben Lindbergh

Ben is a writer, podcaster, and editor who covers culture and sports. He hosts ‘Effectively Wild’ at FanGraphs and previously wrote for FiveThirtyEight and Grantland, served as editor-in-chief of Baseball Prospectus, and authored ‘The MVP Machine’ and ‘The Only Rule Is It Has to Work.’
