Now for Baseball Teams – A Third Data Model

Today we added insights on baseball teams to our baseball application. This also introduces a third distinct data model, in addition to the models underlying our colleges and players applications. A data model refers to a generic type of application, not merely to a different dataset.

Here, an entry is a team and season, e.g., the 2013 Boston Red Sox, who won the World Series that year. Here are several outputs for that team/year:

  1. The 2013 Boston Red Sox had the 6th-most doubles (363) of the 2,745 teams.
    beat out by the 2008 Texas Rangers (376), the 1930 St. Louis Cardinals (373), the 1997 team (373), and the 2004 team (373), and 1 other.
  2. The 2013 Boston Red Sox struck out the most (1,308) of the 114 teams who won the World Series.
    surpassed the 2004 team (1,189), the 2008 Philadelphia Phillies (1,117), the 2010 San Francisco Giants (1,099), and the 2012 San Francisco Giants (1,097), and others, ending with the 1887 Detroit Wolverines (258).
  3. The 2013 Boston Red Sox were the only team who had players born in Aruba, Canada, Cuba, the Dominican Republic, Japan, Mexico, Puerto Rico, Saudi Arabia, USA, as well as Venezuela.

Before, entering a team/year would return a list of that team’s players during that season. To get that list, the user now needs to add the word “roster” to the query, e.g., “Red Sox 2013 roster”.
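To make the query change concrete, here is a minimal sketch of how a team/year query could be split into its parts, with the word "roster" acting as a switch. This is only an illustration, not OnlyBoth's actual parser: the function name `parse_team_query` and its return shape are hypothetical, and it assumes the season is written as a four-digit year.

```python
import re

def parse_team_query(query: str):
    """Split a query into (team, year, wants_roster).

    Hypothetical helper for illustration only; assumes the season
    appears as a four-digit number and that the literal word
    "roster" requests the player list.
    """
    words = query.split()
    # The word "roster" (any capitalization) switches to the player list.
    wants_roster = any(w.lower() == "roster" for w in words)
    words = [w for w in words if w.lower() != "roster"]

    year = None
    team_words = []
    for w in words:
        # Treat the first four-digit token as the season.
        if year is None and re.fullmatch(r"\d{4}", w):
            year = int(w)
        else:
            team_words.append(w)
    return " ".join(team_words), year, wants_roster


team, year, wants_roster = parse_team_query("Red Sox 2013 roster")
print(team, year, wants_roster)
```

Under these assumptions, a query without the word "roster" would be routed to the team insights, and one with it to the list of players.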

Adding this hasn’t been a simple change from the data-analytics viewpoint. Here’s why: Our colleges application has no time element. There is no sense in which Harvard 2014 and Harvard 2004 need to be present as separate entries. Any time-dependent aspects are expressed via the data attributes, e.g., the tuition increase over the last three years.

Not so with baseball players. There, each entry is a player/team/year. A player can play for multiple teams during different seasons, and even within a single season, and it’s conventional and interesting to consider, say, Babe Ruth in 1929 as a distinct object of analysis.

Baseball teams represent a data model intermediate between the colleges model and the players model. It’s interesting to compare teams across seasons, e.g., the 2012 and 2013 Red Sox, but there is no interesting sense in which a team belongs to another entity. Sure, teams belong to owners, but owners don’t have a large stable of teams, and there aren’t a thousand teams playing every year. If both of those were true (but they aren’t), then team analytics would indeed resemble player analytics.
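One way to picture the three data models is by what identifies an entry in each. The sketch below is an assumption-laden illustration, not OnlyBoth's actual schema: the class and field names are hypothetical, and the attributes that carry the actual statistics are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CollegeEntry:
    # Colleges model: no time element; one entry per college.
    college: str

@dataclass(frozen=True)
class TeamEntry:
    # Teams model: an entry is a team and a season.
    team: str
    year: int

@dataclass(frozen=True)
class PlayerEntry:
    # Players model: an entry is a player/team/year, so the same
    # player can appear under several teams and seasons.
    player: str
    team: str
    year: int

def team_of(entry: PlayerEntry) -> TeamEntry:
    """A player entry belongs to exactly one team/season entry."""
    return TeamEntry(entry.team, entry.year)


# The 2012 and 2013 Red Sox are distinct objects of analysis.
print(TeamEntry("Boston Red Sox", 2012) == TeamEntry("Boston Red Sox", 2013))

# Babe Ruth in 1929 (with the New York Yankees) maps to one team/season.
print(team_of(PlayerEntry("Babe Ruth", "New York Yankees", 1929)))
```

In this picture, only the player entry points at a larger entity (its team and season), which is the sense in which the teams model sits between the other two.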

In summary, OnlyBoth has launched a new application that is interesting for two reasons: (1) what it has to say about historical baseball teams, and (2) the fact that it represents a third, distinct data model for OnlyBoth-style discovery and writing.

Raul Valdes-Perez
