There’s an elegant Phish term for shows like 7/1/94: average-great. The band’s debut at the Mann contains three pretty-good jams in Stash, Possum, and Harry Hood, a mildly unorthodox setlist, a nice suite of Page-featuring songs in the first set, and a nine-segue run taking up most of the second set. It’s pleasant enough, but probably not anyone’s favorite show, and somewhat anonymous amid a respected series of shows between mid-June and mid-July.
Old-school Phish commentator Charlie Dirksen gave his fellow Phish writers a gift when he coined “average-great” way back in the rec.music.phish days. It does the heavy lifting of reassuring any overly sensitive fans that a particular date or song version is still really, really good to our ears in the grand scheme of modern music, while also admitting that sometimes a show is just…a show, relative to the entirety of Phish’s career. It’s just basic statistics — if a band plays more than one show, there’s going to be a median somewhere, a line with half the performances below it and half above. Only talking about Phish between the boundaries of “best show ever” and “bestest show ever” is incredibly limiting, and does a disservice to the truly great as much as it is overly forgiving to the merely routine.
But what exactly is average-great? On one hand, it’s as implicitly understood by a Phish obsessive as “Type II” and “bustout”; on the other, it’s a bit fuzzy and hard to pin down objectively. Average-great only gets fuzzier when you consider the term across time — surely, average-great in 1988 is different from 1993, or 1998, or 2003, or 2013?
So before I let my 7/1/94 judgment stand, I thought I’d try to quantify what “average-great” meant in the first half of 1994. I will immediately admit that this is a very silly pursuit—boiling the incredibly subjective experience of a Phish show down to simple numbers is blatantly absurd and imperfect. But as Zzyzx has shown for a couple decades, there are Phish numbers to play statistician on, and doing so can challenge or support some of the assumed, shared wisdom of the Phish community. So, with apologies to people who actually know how to do this stuff, let’s take a look.
The obvious first step is to try to measure what the true “average” show was in spring/summer 1994 (for all these analyses, I’m only using shows I’ve covered to date, so 4/4/94–7/1/94, with the pesky one-setter on 4/28 tossed out). The least objectionable measure of a show’s quality is probably the show ratings on phish.net, even though they present a whole bunch of statistical issues: low sample size, oversampling of popular shows, and the general inability of Phish fans to rate any show lower than 3 stars. With all that in mind, here’s how early 1994 looks, in boxplot form.
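For the R-curious, a plot like this is nearly a one-liner in ggplot2 (which, per the footnote at the end, is what I used). A minimal sketch, with a placeholder `shows` frame standing in for my hand-entered ratings:

```r
library(ggplot2)

# Placeholder data standing in for the hand-entered ratings: one row per
# show in the 4/4/94-7/1/94 window, with a made-up phish.net rating.
set.seed(1994)
dates <- seq(as.Date("1994-04-04"), as.Date("1994-07-01"), by = "2 days")
shows <- data.frame(
  date   = dates,
  rating = round(runif(length(dates), min = 3.2, max = 4.7), 3)
)

# One boxplot summarizing every rating in the window.
ggplot(shows, aes(x = "", y = rating)) +
  geom_boxplot() +
  labs(x = NULL, y = "phish.net rating")
```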
For those interested in exact numbers, the median lies at 4.083, the mean at 4.056. The range runs from 3.211 (6/23/94 in Pontiac, MI) to 4.697 (surprise, surprise, it’s the Bomb Factory)—statistical evidence that “average-great” is the default setting for phish.net raters.
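Those numbers fall straight out of the same placeholder `shows` frame, with the real values noted in the comments:

```r
median(shows$rating)  # 4.083 on the real data
mean(shows$rating)    # 4.056 on the real data
range(shows$rating)   # 3.211 (6/23/94) to 4.697 (the Bomb Factory)
```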
Eyeballing the points in that graph suggests that the average show rating has been gradually increasing over the course of 1994, which I argued for in the last essay. So let’s look at the overall trend.
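One way to draw that trend is a scatter of rating against date with a loess smoother layered on top; a sketch, again using the placeholder `shows` frame from above:

```r
library(ggplot2)

# Rating over time, with a loess curve to show the overall trajectory.
ggplot(shows, aes(x = date, y = rating)) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(x = "Show date", y = "phish.net rating")
```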
There’s a slight dip at the end of June, but clearly the performances have been improving over the course of the year so far. You can even see the scale of 5/7/94's anomaly, which is so highly rated compared to its peers that it pretty much singlehandedly creates an early bump in the data. I then looked at this spring/summer trajectory separated into months, to make sure I wasn’t completely talking out my ass in the 6/30/94 essay.
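The month-by-month cut is the same data with a month label bolted on; something like:

```r
library(ggplot2)

# Bucket each show by calendar month, then draw side-by-side boxplots.
shows$month <- factor(months(shows$date),
                      levels = c("April", "May", "June", "July"))

ggplot(shows, aes(x = month, y = rating)) +
  geom_boxplot() +
  labs(x = NULL, y = "phish.net rating")
```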
That’s a pretty nice staircase upward, month by month. We can also see an answer to my original question about whether 7/1/94 was really average-great: at 4.0, its rating sits between the April and May medians, and just below the median and mean of all shows so far.
So there’s a pretty good answer for what average-great means for 1994 as of the beginning of July. But I didn’t stop there. With some additional (equally flawed) data, maybe we could get a sense of what *makes* an average-great vs. just-plain-great show in early 1994. I decided to compare average show rating to a few quantifiable measures of an individual Phish show: number of songs, number of segues, and average song gap.
First, number of songs. To further illustrate how the nature of average-great changes over time, I would argue that a larger number of songs is indicative of higher show quality in 1994, whereas the opposite relationship would probably be true for the rest of the ’90s. That hypothesis is based on what made for an excellent show in June—typically a second set full of song sandwiches, weirdo interstitials such as Kung and Catapult, and named jams (e.g. “Digital Delay Loop Jam” or “Midnight Rider Jam”) that inflate song totals. Let’s test it.
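The test itself is a scatter plot with a straight-line fit; a sketch, with a hypothetical `songs` column standing in for the hand-tallied counts:

```r
library(ggplot2)

# Hypothetical song counts per show (the real ones were tallied by hand).
shows$songs <- sample(18:28, nrow(shows), replace = TRUE)

# Song count vs. rating, with a linear fit to test the hypothesis.
ggplot(shows, aes(x = songs, y = rating)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Songs per show", y = "phish.net rating")
```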
It works! Phish was remarkably good at keeping shows between 20 and 24 songs total in 1994, but the outliers on the upper end are nearly universally adored, twisting the trend upward. It’s also perhaps a bit of foreshadowing that the shows on the low end, still a robust 18 songs, are also highly regarded.
Next, I thought it would be interesting to look at segues. There are definitely some data quality issues here, since a segue is often a subjective call, and certain “assumed” segues such as Horse > Silent, TMWSIY > Avenu > TMWSIY, and HYHU > Whatever > HYHU can boost the numbers. But if we consider segues to be a loose measure of show cohesion or set flow, they may have something to say about the quality of a show. If the data can be believed, they certainly do.
Just for fun, here’s a graph of songs + segues against show rating — it’s sloppy math, because the two are hardly independent factors, but worth trying out.
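In code, that’s one mutate() away from the previous sketch, with a hypothetical `segues` column playing the same placeholder role as `songs`:

```r
library(dplyr)
library(ggplot2)

# Hypothetical segue counts, summed with the song counts from the previous
# sketch into one (admittedly non-independent) activity measure.
shows$segues <- sample(4:12, nrow(shows), replace = TRUE)

shows %>%
  mutate(busyness = songs + segues) %>%
  ggplot(aes(x = busyness, y = rating)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Songs + segues", y = "phish.net rating")
```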
One last factor I wanted to look at was Average Song Gap, which takes the mean of each individual song’s “Shows Since Last Seen” figure for a given show. This number could be considered a measure of the scarcity of each setlist, or perhaps just how far the band was willing to stretch outside the heavy-rotation material on a particular night. In 1994, when certain Hoist songs were in ultra-heavy rotation and a show’s average song gap could run as low as 2.84, one might hypothesize that shows with higher average gaps and more rarities would fare better on the rating scale.
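Computing the metric takes a second, long-format table with one row per song performance; a sketch, with placeholder gap values apart from the genuine 728-show Jump Monk gap mentioned below:

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format setlist data: one row per song performance,
# carrying that song's "Shows Since Last Seen" gap.
setlists <- data.frame(
  date = as.Date(c("1994-04-24", "1994-04-24", "1994-07-01", "1994-07-01")),
  song = c("Jump Monk", "Stash", "Possum", "Harry Hood"),
  gap  = c(728, 3, 2, 4)
)

# Average Song Gap: the mean "shows since last seen" across a setlist.
avg_gaps <- setlists %>%
  group_by(date) %>%
  summarise(avg_gap = mean(gap))

# Join onto the per-show frame (toy shows without setlist rows get NA)
# and plot against rating.
shows <- left_join(shows, avg_gaps, by = "date")
ggplot(shows, aes(x = avg_gap, y = rating)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Average song gap", y = "phish.net rating")
```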
Eh, a weaker result than for songs and segues. But that crazy outlier on the far right (4/24/94, which is largely distorted by the 728-show gap for “Jump Monk”) is really messing with the graph. Here’s what it looks like with that show deleted from the dataset.
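That deletion is a single filter() call on the joined frame from the sketch above:

```r
library(dplyr)
library(ggplot2)

# Same scatter, minus the 4/24/94 outlier (Jump Monk's 728-show gap).
shows %>%
  filter(date != as.Date("1994-04-24")) %>%
  ggplot(aes(x = avg_gap, y = rating)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Average song gap", y = "phish.net rating")
```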
A little better, though not as dramatic a relationship as for songs and segues. It looks like a higher average song gap is good, but the benefit tops out around 12—possibly because anything higher reflects a single really long-term bustout that is cool to hear, but not a show-changer.
Taken together, we have a recipe for a great show in the first few months of 1994: Play lots of songs, connected by a lot of segues, with a moderately deep draw from the repertoire. I’ll mention one more time that this conclusion is deeply silly — every Phish fan knows that the setlist on paper rarely tells the full story, and I’m missing several factors (song length? number of key changes within jams? “->” segues vs. “>” segues?) that could play a major role in a show’s subsequent reception. Consider it an imperfect perspective from the numbers side — an approach that can be applied to look for and help define the average-great and the great-great throughout Phish’s history.
[For anyone interested, I made these graphs in R with ggplot2, off of data that I gathered manually since I don’t know how to web scrape. If you can improve upon these figures/methods, feel free!]
BONUS ANALYSIS: June ’94 vs. August ’93
In the last essay, I argued that June ’94 was a return to the heights of the previous August, with a similar manic energy and unpredictable setlists. Now, that claim can be tested…with boxplots!
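The plot is the same geom_boxplot recipe as the monthly figure earlier; for the numbers behind it, here is a summarise() sketch with placeholder ratings for the two months:

```r
library(dplyr)

# Placeholder ratings for the two months under comparison.
both <- data.frame(
  month  = rep(c("August 1993", "June 1994"), each = 5),
  rating = round(runif(10, min = 3.5, max = 4.7), 3)
)

# Median, spread, and ceiling for each month.
both %>%
  group_by(month) %>%
  summarise(median = median(rating),
            iqr    = IQR(rating),
            best   = max(rating))
```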
The figure indicates that June ’94 definitely had a lower median rating and wider variation than August ’93. But the points at the top sit in similar territory, reflecting that the best shows from each month earn similar ratings on phish.net. So June ’94 was less consistent than August ’93, but reached similar heights. Let’s say I was half-right.