Benchmarking 15,665 Nursing Homes

Today OnlyBoth launches as a public service what is likely the largest benchmarking analysis ever conducted, as measured in terms of readable language output.

The U.S. has about 1.4 million residents of nursing homes and 15,665 Medicare or Medicaid certified nursing homes. The federal government, through its regulatory powers and reimbursement function, collects performance data on all of these nursing homes, which cry out for comparison in order to understand how each is doing and where each falls short, compared to all peers or their subsets, without bias.

We downloaded data from the federal Nursing Home Compare website and spent a couple of days consolidating the information and configuring our Benchmarking Engine, then pushed a button (figuratively!) and waited just a day and a half. The output consists of 642,192 insights, totaling more than two Encyclopaedia Britannicas in terms of English words contained in grammatical, to-the-point sentences and paragraphs. Enter a nursing home and browse the insights at http://nursing.onlyboth.com.

What did the benchmarking engine find? At one extreme, the engine found the most things to say about Signature Healthcare at Saint Francis in Memphis, TN, although most of these were not complimentary. There is clearly room for improvement there.

We often say that a massive analysis, as only an automated benchmarking engine can do, can find specific areas where even the best can improve. We discussed such a case – Stanford University Hospital – while launching our earlier hospitals benchmarking engine.

Let’s revisit sunny California. The US News & World Report lists Edgemoor Hospital in Santee, CA as the top nursing home in California, due to its top five-star rating in all the major categories. Could even Edgemoor improve? The engine reveals several areas for improvement, this one for example:

Edgemoor Hospital in Santee, CA has the most facility-reported incidents (9) of all the 200 nursing homes that have the top rating in each of overall, health inspection, quality measures, staffing, and registered-nurse staffing (5 total). Those 9 represent 18.8% of the total across the 200 nursing homes, whose average is 0.2.

Now let’s consider Bridgepoint Sub-Acute and Rehab Capitol Hill in Washington, DC, which is the nursing home nearest Capitol Hill, where Congress meets. This facility does especially well in bladder and bowel control among low-risk, long-stay residents, as compared to other for-profit facilities located within a hospital. The engine found six specific areas for keen improvement, the first of which is this:

Bridgepoint Sub-Acute and Rehab Capitol Hill in Washington, DC has the 3rd-most high-risk long-stay residents with pressure ulcers (24.8%) among the 1,982 Mid-Atlantic nursing homes. That 24.8% compares to an average of 6.4% across the 1,982 nursing homes.

Another noteworthy insight is that the facility has “the most severe deficiencies on the health survey (5) of the 718 nursing homes that have the top rating in each of quality measures, staffing, and registered-nurse staffing.”

This application is launched as a public service as well as a technology showcase, benchmarking well over triple the number of entities in our previous largest application, which covered 4,803 hospitals. The relevance to business is this: the advent of cloud services, the internet of things, and other means for collecting customer performance data will enable the automated benchmarking of business processes, generating tremendous economic value by benchmarking 10,000 entities with the same amount of work as benchmarking 10 entities.

Our goal is universal betterment by providing persuasive, motivating insights that pinpoint what is going well and where improvement is sorely needed and is achievable. Benchmarking Engines will do for business benchmarking what Search Engines did for information seeking, assigning to computers what they do better – massive comparisons – and to people what they do better – evaluating and following up, as appropriate – on benchmarking insights.

Raul Valdes-Perez

Relaunch of Hospitals Benchmarking Engine

OnlyBoth benchmarks U.S. hospitals as both a public service and as a visible demonstration of the power of an automated Benchmarking Engine. This enables hospital stakeholders to instantly discover in perfect English how they’re doing, not compared to absolute standards or arbitrary peers, but to all peers and groups.

We launched our first version this summer. Today we relaunched our hospitals benchmarking engine based on fresh data and technical advances:

updated Hospital Compare dataset from Medicare.gov, now on 4,803 hospitals
new hospital attributes relating to hospital performance and geography
better expression of key types of insights
improved heuristics leading to more insights per hospital
addition of data on hospital networks, enabling intra-network comparisons

1. We have refreshed the data in the hospital application based on a late-September data release at the Hospital Compare data download page. This new release also contains new hospital attributes, as discussed below.

2. Since geography is an important determinant of peer groups, we’ve added attributes that enable grouping East Coast, Southern, and Western states. We’ve also added two new attributes from the updated Hospital Compare data that relate to deaths or unplanned readmission due to coronary artery bypass grafting (CABG) surgery, and five new attributes that express hospital-readmission ratios for various afflictions.

3. A key type of insight expresses how entities that are within an elite peer group fall short along some key dimension. For example, our recent Harvard Business Review article, which explains why benchmarking is done wrong and how to do it right, gives this example of Stanford Hospital:

None of the other 344 hospitals with as many patients who reported YES, they would definitely recommend the hospital (85%) as Stanford Hospital in Stanford, CA also has as few patients who reported that the area around their room was always quiet at night (41%). That is, among those 344 hospitals, it has the fewest patients who reported that the area around their room was always quiet at night.

As the saying goes, this was too clever by half. After considering feedback from surveying users, this insight now appears, with the refreshed data, like this:

Stanford Hospital in Stanford, CA has the fewest patients who reported that the area around their room was always quiet at night (40%) among the 811 hospitals with at least 80% of patients who reported YES, they would definitely recommend the hospital (Stanford Hospital is at 84%). That 40% compares to an average of 69.4% and standard deviation of 10.7% across the 811 hospitals.

Of course, this improvement affects thousands of insights, and millions in the future.

4. We’ve improved the heuristics that enable finding valuable needles within the huge haystack that results from taking multiple slices out of a dataset of half a million hospital attribute values. Our new Hospitals Benchmarking Engine contains 522,142 insights, or around 109 insights per hospital, compared to the previous 101 per hospital. The key benchmarking question – Where can this hospital improve? – has seen a 4% increase in answers per hospital.

5. For a hospital-network executive, it’s valuable to benchmark individual hospitals against others in the network, especially because knowledge transfer of good practices can happen more easily when two entities have the same owner. We’ve added a parent attribute that for now includes four networks: UPMC, Kaiser Foundation, Texas Health Resources, and NYC Health and Hospitals. We’ll add other hospital networks over time.

We expect that this hospitals application, and the diffusion of benchmarking engines in general, will further the goal of enabling universal betterment through data-driven comparison with peers, greatly simplified in terms of human work, but greatly expanded in terms of action-provoking insights.

Raul Valdes-Perez

Avoiding Tunnel Vision in Peer Comparisons

Comparing yourself to peers – also known as benchmarking – lets you understand how you’re doing, identify performance gaps and opportunities to improve, and highlight peer achievements that you could emulate, or your own achievements to be celebrated. As long as data is available, peer comparison can potentially accomplish all of these. The opportunities for peer comparison are greatly increasing due to cloud and other services that generate data as a by-product of serving customers.

The problem is that peer comparison as generally practiced suffers from Tunnel Vision and so misses a lot, to everyone’s detriment. To understand why, let’s first consider an analogy to search engines.

An information seeker, before there were search engines, might have gone to consult a librarian on, say, computers and heard “That’s technology, so look in the Technology books section, over in the back, by the right.” But there’s plenty of material on computers that’s catalogued elsewhere, e.g., automation’s impact on employment and job training, the philosophical question of whether computers in principle could do everything that people do, cognitive modeling of human reasoning using computers, computer history, and so on. The point is that looking only in the Technology section is an example of Tunnel Vision, or maybe bookshelf vision. Search engines changed that.

So where’s the Tunnel Vision in peer comparisons? It’s almost universal practice that the benchmarker chooses one or two organizational goals, then picks a few key metrics (key performance indicators) relevant to those goals, and finally selects several peer groups from a limited set. The outputs are then the mean, median, distribution, or high-percentile values for those peer groups on those metrics. The conclusion is that the organization may or may not have a problem, which may or may not be addressable. The flaw in all this is that organizations have many goals and subgoals, and many metrics that could reveal performance gaps, especially if a very large set of peer groups could also be explored. But our human inability to explore many paths in parallel imposes this Tunnel Vision, for the same reason that pre-search-engines information seekers went looking in one or two sections of the library.

As an example of peer-group selection, suppose you wanted to compare the U.S. against other nations. What would be the right peer group? Here are some that make sense: democracies; the Anglosphere; constitutional republics; large countries; developed countries; OECD or NATO members; the western hemisphere; non-tropical countries; largely monolingual countries; business-friendly economies; and even baseball-playing nations. Moreover, peer groups could be formed dynamically, e.g., countries at least as big as the U.S. in population or territory. And what would be the right metrics? The mind boggles at the number of interesting possibilities, all of which may have available data. As already pointed out, standard practice is to first specify an overarching goal, which then drives the choice of metrics and peer group. (Some web examples of standard benchmarking outputs are here, there, and elsewhere.) But what if the goal is to understand broadly how you’re doing and where you could improve? Tunnel Vision is caused by over-specific goals, limited metrics, and biased peer groups, all part of standard benchmarking practice which is made obsolete in the face of exploring all interesting metrics and potential peer groups that could lead to operational improvements.

Let’s run some numbers to show the scope of Tunnel Vision. Suppose there are 10 attributes with yes/no values and another 10 attributes that can take on any of five different values, plus one attribute that can take on 50 values, e.g., a U.S. state. There are theoretically 2¹⁰ x 5¹⁰ x 50 = 500 billion peer groups. Even if we include only peer groups whose attribute values match those of the specific individual to be benchmarked, the number would be 2²¹ = 2.1 million peer groups.
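The arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting argument; the variable names are ours and are not part of any benchmarking software.

```python
# Counting candidate peer groups for the attribute mix described above.
n_binary = 10    # attributes with yes/no values
n_five = 10      # attributes with five possible values
n_states = 50    # one attribute with 50 values, e.g., a U.S. state

# Every full combination of attribute values is one theoretical peer group.
all_value_combinations = (2 ** n_binary) * (5 ** n_five) * n_states

# For one specific entity, each of the 21 attributes is either used
# (fixed at that entity's own value) or left out of the group definition.
n_attributes = n_binary + n_five + 1
groups_matching_one_entity = 2 ** n_attributes

print(f"{all_value_combinations:,}")      # 500,000,000,000
print(f"{groups_matching_one_entity:,}")  # 2,097,152
```

The second count is smaller because the entity's own values are fixed, so the only choice left for each attribute is whether to use it.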

Let’s move from the abstract to the concrete. Here are two (accurate) peer comparisons that are arguably insightful:

1. St Anthony Community Hospital in Warwick, NY has the lowest average time patients spent in the emergency department before they were seen by a healthcare professional of all the church-owned hospitals in the mid-Atlantic.

2. Macalester College in Saint Paul, MN has the highest total student-service expenses of any big-city private college that doesn’t offer graduate degrees.

Note the peer groups: (1) church-owned; mid-Atlantic; and (2) big-city; private; doesn’t offer graduate degrees. Now consider an imaginary peer comparison that uses four attributes to form a noteworthy peer group:

3. Cumulus Inc. is the most profitable of all the B2B, cloud-based, venture-backed companies that have at least 200 customers.

We see that considering more peer groups leads to uncovering more valuable benchmarking insights. Since the number of possible peer groups is vast, and benchmarking has seen little automation, this means that Tunnel Vision is necessarily widespread.

But the Tunnel Vision gets much worse! Peer groups can be formed, not just by picking non-numeric (aka symbolic) attributes, but also by dynamically determining numeric thresholds. Here’s a revealing (and true) insight that contains a dynamically-formed peer group:

None of the other 344 hospitals with as many patients who reported YES, they would definitely recommend the hospital (85%) as Stanford Hospital in Stanford, CA also has as few patients who reported that the area around their room was always quiet at night (41%).

That is, among those 344 hospitals, it has the fewest patients who reported that the area around their room was always quiet at night.
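To make the idea of a dynamically-formed peer group concrete, here is a minimal sketch. It assumes the peer group is simply every hospital whose recommendation rate is at least that of the target. The records are made up for illustration, and the function is not the engine's actual method.

```python
from typing import NamedTuple

class Hospital(NamedTuple):
    name: str
    recommend_pct: float  # patients who would definitely recommend
    quiet_pct: float      # patients reporting always quiet at night

def is_lowest_in_dynamic_peer_group(target, hospitals):
    """Form the peer group of hospitals whose recommend_pct is at least
    the target's, then report whether the target has the lowest quiet_pct.
    Returns (is_lowest, peer_group_size)."""
    peers = [h for h in hospitals if h.recommend_pct >= target.recommend_pct]
    lowest = min(peers, key=lambda h: h.quiet_pct)
    return lowest.name == target.name, len(peers)

# Made-up illustrative records, not real Hospital Compare data.
hospitals = [
    Hospital("A", 85.0, 41.0),
    Hospital("B", 90.0, 62.0),
    Hospital("C", 86.0, 70.0),
    Hospital("D", 70.0, 35.0),  # excluded: below A's recommendation rate
]
is_lowest, group_size = is_lowest_in_dynamic_peer_group(hospitals[0], hospitals)
print(is_lowest, group_size)  # True 3
```

Hospital D is quieter-challenged than A overall, but it falls outside the threshold set by A's own value, so A ends up at the bottom of its own elite group.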

This is clearly a provocative insight. One can imagine a hospital CEO reacting in one of these ways:

We’re profitable, prestigious, and have great weather. What’s a little nocturnal noise?
There’s been night-time construction next door for the last year, and it’s almost done, so the problem will solve itself.
I can’t think of any reason why we should be at the bottom of this elite peer group. I’ll forward this paragraph to our chief of operations to investigate and report back what may be happening.

This peer-comparison insight wouldn’t be found by today’s conventional benchmarking methods. Instead, what may be found is along these lines: The average value for this quantity among 309 California hospitals with known values is 51.5% with a standard deviation of 9.5%, so Stanford Hospital is about 1 standard deviation below average. The reader can judge which of the two insights is the more action-provoking, not just for the single individual in charge, but for the entire team that needs to be roused to act on and address performance gaps.
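The "about 1 standard deviation" figure can be reproduced as follows, assuming the 41% from the insight above is the value being compared against the California average. The helper name is ours.

```python
def standard_deviations_from_mean(value, mean, std_dev):
    """Signed distance of a value from the mean, in standard deviations."""
    return (value - mean) / std_dev

# Figures quoted in the text: 41% for Stanford Hospital, versus a
# California average of 51.5% with a standard deviation of 9.5%.
z = standard_deviations_from_mean(41.0, 51.5, 9.5)
print(round(z, 2))  # -1.11
```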

So far, we’ve used some math to highlight the Tunnel Vision problem and shown specific examples, real or fictitious, of what is being missed. As our last step, let’s report the results of actual software experiments.

The website hospitals.onlyboth.com showcases the results of applying an automated benchmarking engine to data on 4,813 U.S. hospitals described by 94 attributes, mostly downloaded from the Hospital Compare website at Medicare.gov. A combinatorial exploration of peer comparisons among the 4,813 hospitals turns up 98,296 benchmarking insights that survive the software’s quality, noteworthiness, and anti-redundancy filters, or about 20 per hospital. In this hospitals experiment, insights were required to place a hospital in the top or bottom ten within a peer group of sufficient size.
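As a rough illustration of what a combinatorial exploration of peer comparisons can look like, the brute-force sketch below enumerates peer groups built from a target's own attribute values and keeps the cases where the target ranks at the top or bottom of a sufficiently large group. It is a toy under our own assumptions (function name, parameters, and data are all hypothetical) and leaves out the quality, noteworthiness, and anti-redundancy filters mentioned above.

```python
from itertools import combinations

def find_insights(target, entities, symbolic_attrs, metrics,
                  min_group_size=3, top_k=1, max_attrs=2):
    """Enumerate peer groups defined by subsets of the target's symbolic
    attribute values; report metrics on which the target ranks in the
    top or bottom `top_k` of a group with at least `min_group_size` members."""
    insights = []
    for r in range(1, max_attrs + 1):
        for attrs in combinations(symbolic_attrs, r):
            # Peers share the target's value on every chosen attribute.
            peers = [e for e in entities
                     if all(e[a] == target[a] for a in attrs)]
            if len(peers) < min_group_size:
                continue
            for metric in metrics:
                ranked = sorted(peers, key=lambda e: e[metric])
                if target in ranked[:top_k]:
                    insights.append((attrs, metric, "lowest", len(peers)))
                elif target in ranked[-top_k:]:
                    insights.append((attrs, metric, "highest", len(peers)))
    return insights

# Made-up records for illustration only.
entities = [
    {"name": "A", "owner": "church", "region": "mid-Atlantic", "wait_min": 12},
    {"name": "B", "owner": "church", "region": "mid-Atlantic", "wait_min": 30},
    {"name": "C", "owner": "church", "region": "mid-Atlantic", "wait_min": 25},
    {"name": "D", "owner": "private", "region": "mid-Atlantic", "wait_min": 8},
    {"name": "E", "owner": "church", "region": "South", "wait_min": 10},
]
insights = find_insights(entities[0], entities,
                         ["owner", "region"], ["wait_min"])
print(insights)
```

In this toy data, hospital A is not the best among all church-owned hospitals, nor among all mid-Atlantic hospitals, but it does have the lowest wait among church-owned mid-Atlantic hospitals. Only the two-attribute peer group surfaces the insight.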

The results contain 522 different peer groups, formed by combining the hospital dataset’s 24 non-numeric attributes in various ways. As noted above, the number of peer groups is much, much larger if one counts, not the attributes used, but the diverse ways to combine attribute values. For example, the attribute “state” can either be used or not, so there are two alternatives there, but the number of state values is 50 (or more, including non-state territories), implying many more alternatives. The number of peer groups becomes still larger when accounting for dynamically-formed peer groups based on numeric thresholds.

Of course, the engine explored more peer groups than appear in the end results, which are those found to be large and noteworthy enough to bring to human attention. Also, each peer group appears in many insights by combining them with the available metrics. On average, each of the 522 peer groups enables over 900 individual hospital insights, by further combining each peer group and metric with different hospitals.

Summarizing, Tunnel Vision in peer comparisons, or benchmarking for understanding and improvement, is widespread but misses a vast number of noteworthy and action-provoking insights that could help improve organizational performance. Without automation, there aren’t enough people and time in the world to explore what’s outside the Tunnel, select the best insights, and bring them to human attention. Software automation is the way forward.

Raul Valdes-Perez

Benchmarking Financials

Today we have launched a second public showcase application of our Benchmarking Engine, this time applied mostly to Department of Education IPEDS data on U.S. post-secondary educational institutions that follow the FASB accounting standard (private colleges for short), ranging from Harvard to the Belle Academy of Cosmetology. The financials data is from FY 2013, the latest available from IPEDS.

The 1,889 private colleges are described with 151 mostly-financial attributes, of which 101 are dollar amounts (investment, spending, debt, liability, etc. and their sub-categories) and 11 are financial ratios, augmented by some college rankings and profile and type attributes.

Given its emphasis on internal financial metrics, this benchmarking application addresses the core benchmarking questions from an institutional viewpoint, not from a student or faculty point of view. Some value judgments were made, for example that less debt is better than more debt, but of course in some circumstances more debt can be good, such as when the interest is low and the return on the debt is high.

Here is an example insight on how Columbia University can improve:

None of the other 1,631 private colleges with as few total liabilities ($3.028B) as Columbia also has as much debt related to property, plant, and equipment ($1.479B). That is, it has the most debt related to property, plant, and equipment among those 1,631 private colleges.

On a related note closer to home, here’s a rather favorable insight about Carnegie Mellon:

In the Mid Atlantic with its 434 private colleges, only Carnegie Mellon both spends as much on research ($284.3M) and has as few research expenses – operation and maintenance of plant ($9.958M).

Clearly, software, psychology, and decision science don’t cost much! Over on the west coast, Stanford is seen to have rich, forthcoming donors:

Stanford has the most private gifts ($694.5M) among all 1,889 private colleges. Those $694.5M represent 4.3% of the total among all 1,889 private colleges, whose average is $8.632M.
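The share-of-total figure in that insight follows from the other numbers it quotes. A quick check, assuming the total is simply the average times the number of colleges:

```python
# Figures quoted in the insight above, in millions of dollars.
stanford_gifts = 694.5
average_gifts = 8.632
n_colleges = 1889

# Assumption: total private gifts = average x number of colleges.
total_gifts = average_gifts * n_colleges
share_pct = 100 * stanford_gifts / total_gifts

print(round(total_gifts, 1))  # 16305.8
print(round(share_pct, 1))    # 4.3
```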

As a final example, let’s move southwest and to smaller colleges, for example Austin College in Texas:

In the Southwest with its 101 private colleges, only Austin College has both as much construction in progress ($32.95M) and as few total assets ($251.5M).

Build it and they will come!

Accounting statements and financials in general are an especially promising application of Benchmarking Engines, because financial metrics follow established standards – FASB in this case – and relate to critical organizational performance.

Enter your own private college here.

Raul Valdes-Perez

Ranking SaaS Vendors by their Benchmarking Activity

As I’ve argued elsewhere, business benchmarking has been held back by the problem of data availability, as well as by the lack of software automation, despite its worthy goal of enabling continuous organizational improvement.


Most SaaS vendors are uniquely placed to sidestep the availability problem, because SaaS generates rich data as a byproduct of serving its customers. This data can be captured by vendors and put to good use, for the benefit of those same customers, via benchmarking. The exceptions tend to be utility-like SaaS, whose customers only care whether the service is on or off, or vendors who have little visibility into how customers perform the business process that their services support.

So how well are SaaS vendors exploiting this emerging opportunity? To find out, we analyzed the benchmarking activity of the Montclare SaaS 250 – the 250 “most successful SaaS companies in the world” according to Montclare, self-described as the “Industry’s Premier Research and Consulting Firm Focused on SaaS.” For each vendor, we measured benchmarking focus by dividing the number of its website’s hits on the query benchmarking by its total number of webpages, both as reported by the Google API. Below are the ranked results, which range from 0% to 94%. We opted to leave untouched a few anomalous results due to hits from hosted content, e.g., at Google and at LinkedIn.
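The benchmarking-focus measure described above is a simple ratio. The sketch below shows the calculation and ranking on made-up vendor names and page counts; it does not call any search API, and the numbers are not the ones behind the published ranking.

```python
def benchmarking_focus(benchmarking_hits, total_pages):
    """Percentage of a site's pages that match the query 'benchmarking'."""
    if total_pages == 0:
        return 0.0
    return 100.0 * benchmarking_hits / total_pages

# Hypothetical (hits, total pages) counts for made-up vendors.
page_counts = {
    "VendorA": (470, 500),
    "VendorB": (12, 4000),
    "VendorC": (0, 250),
}

# Rank vendors from highest to lowest benchmarking focus.
ranking = sorted(
    ((name, round(benchmarking_focus(hits, pages), 2))
     for name, (hits, pages) in page_counts.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, focus in ranking:
    print(f"{name} {focus}%")
```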

Overall, there’s lots of activity. SaaS is a busy playing field for benchmarking. Unanswered here is whether that activity reflects actual vendor benchmarking services or something else. Also not addressed is whether vendor benchmarking is powered by automation.

Interestingly, SaaS pioneer Salesforce.com comes in low, at #221. Benchmarking on its website tends toward blog topics or partner activity, not Salesforce’s own offerings.

Raul Valdes-Perez

Veracode 94.08%
Tangoe 87.13%
IQNavigator 80.62%
Meltwater Group 70.14%
athenahealth 64.66%
SciQuest 53.85%
MediData Solutions 53.48%
ON24 26.37%
ComScore 25.95%
Intel 25.69%
Apptio 25.3%
ServiceSource 20.42%
Symantec 16.41%
Deltek 15.14%
GTNexus 14.95%
Xactly 14.01%
Blackbaud 13.9%
Jobvite 13.21%
Trend Micro 12.81%
Synygy 12.74%
Coupa Software 12.52%
AlphaBricks 12.5%
Domo 11.85%
ADP 11.54%
Beckon 10.78%
Marin Software 10.45%
Intacct 10.0%
Act-On Software 9.51%
Peoplefluent 9.03%
E2open 8.62%
CallidusCloud 7.86%
Amber Road 7.8%
Fleetmatics 7.64%
Demandware 7.3%
Instart Logic 7.22%
Reval 7.22%
Wolters Kluwer 7.01%
Globoforce 6.83%
3D Systems 6.82%
Marketo 6.72%
eGain 6.5%
RingLead 6.5%
Achievers 6.14%
FICO 6.13%
CRMnext 5.82%
Veeva Systems 5.79%
KnowledgeTree 5.73%
Basware 5.61%
Deem 5.52%
Cornerstone OnDemand 5.4%
Bullhorn 5.34%
LiveOps 5.19%
Tidemark 5.03%
Hubspot 4.91%
Lattice Engines 4.9%
MindTree 4.87%
Telogis 4.87%
Plex 4.69%
InsideView 4.48%
Cloudpay 4.46%
Monitise 4.44%
Nice Systems 4.27%
Birst 4.25%
Payscale 4.24%
inContact 4.23%
NewVoiceMedia 4.19%
Anaplan 4.16%
PROS Holdings 4.08%
Zuora 4.01%
New Relic 3.99%
Mimecast 3.97%
Qualys 3.88%
GoodData 3.86%
FinancialForce.com 3.8%
Insidesales.com 3.75%
Actian 3.73%
Cerner Corporation 3.66%
CSC 3.66%
Healthstream 3.66%
MYOB 3.64%
Adaptive Insights 3.6%
Gainsight 3.6%
ClearSlide 3.55%
Verint Systems 3.52%
Oracle 3.45%
Lumesse 3.38%
Ultimate Software 3.33%
AppDynamics 3.26%
Kronos 3.24%
Ramco Systems 3.2%
Halogen Software 3.18%
RightScale 3.13%
Descartes Systems 3.12%
Workday 3.09%
Fujitsu 2.98%
NetSuite 2.93%
Ceridian 2.89%
QuestBack 2.88%
Ericsson 2.84%
Dassault Systèmes 2.8%
Rocket Fuel 2.79%
Nuance Communications 2.7%
DealerTrack 2.66%
Selectica 2.6%
Survey Monkey 2.57%
AdRoll 2.54%
Opower 2.52%
Saba 2.52%
iCIMS 2.5%
Intuit 2.48%
Rally Software 2.44%
Blackline Systems 2.38%
Host Analytics 2.37%
eVariant 2.36%
Covisint 2.34%
Apttus 2.32%
Proofpoint 2.3%
VMware 2.3%
cVent 2.25%
EMC Corporation 2.24%
Epicor 2.24%
ServiceMax 2.23%
CashStar 2.09%
SAS Institute 2.08%
SugarCRM 2.08%
Infor 2.03%
OpenText 2.0%
SPS Commerce 1.95%
WebTrends 1.94%
Akamai Technologies 1.93%
DATEV eG 1.89%
FPX 1.82%
Hitachi 1.81%
Huddle 1.81%
Threatmetrix 1.8%
BroadVision 1.79%
Kyriba 1.79%
Support.com 1.71%
Castlight Health 1.68%
Atlassian 1.65%
Workforce Software 1.65%
Bottomline Technologies 1.6%
Brightcove 1.6%
Retail Solutions 1.57%
2U 1.51%
Five9 1.5%
LinkedIn 1.41%
Hyland Software 1.4%
Workfront 1.39%
Informatica 1.34%
Mulesoft 1.33%
SilkRoad 1.31%
IBM 1.22%
Mix Telematics 1.22%
BenefitFocus 1.18%
Blue Jeans Network 1.18%
MicroStrategy 1.11%
Trustwave 1.1%
Google 1.09%
TIBCO Software 1.09%
Xero 1.09%
Blackboard 1.08%
Silver Spring Networks 1.08%
Zendesk 1.04%
AeroHive Networks 1.01%
Alfresco 1.01%
Clarizen 1.01%
GitHub 0.98%
Jive Software 0.98%
Paychex 0.98%
ASG Software 0.97%
Cision 0.96%
Freshbooks 0.95%
Logik 0.94%
Practice Fusion 0.94%
Autodesk 0.92%
SolarWinds 0.89%
Pegasystems 0.88%
Digital River 0.87%
Siemens 0.86%
Constant Contact 0.84%
LivePerson 0.84%
Synchronoss 0.81%
Dell 0.78%
Citrix 0.77%
Opera Software 0.76%
Hewlett-Packard 0.75%
Tableau Software 0.75%
Avangate 0.66%
Paylocity 0.65%
Mindjet 0.64%
Cisco Systems 0.63%
Aria Systems 0.62%
Hightail 0.62%
Glassdoor 0.6%
Nakisa 0.6%
Okta 0.6%
Deluxe Corp 0.57%
ChannelAdvisor 0.56%
FrontRange 0.54%
CA Technologies 0.53%
Daptiv 0.51%
SAP 0.51%
ServiceNow 0.51%
BMC Software 0.5%
IntraLinks 0.5%
Splunk 0.49%
Finnet Limited 0.47%
Bill.com 0.46%
Limelight Networks 0.46%
Box 0.44%
Zoho 0.42%
Adobe Systems 0.41%
CollabNet 0.41%
SugarSync 0.41%
MobileIron 0.39%
Lithium Technologies 0.32%
RingCentral 0.32%
Twilio 0.32%
Elance/oDesk 0.3%
Salesforce.com 0.29%
Zscaler 0.28%
Magic Software Enterprises 0.27%
Microsoft 0.25%
Jitterbit 0.23%
Parallels 0.23%
Bazaarvoice 0.17%
Basecamp 0.16%
Active Network 0.15%
M-Files 0.15%
DocuSign 0.14%
LogMeIn 0.14%
DropBox 0.11%
Rocket Lawyer 0.11%
Doximity 0.08%
Ping Identity 0.06%
BorderFree 0.04%
Evernote 0.04%
TOTVS 0.04%
Exact Holding NV 0.03%
Arena Solutions 0.0%
Carbonite 0.0%
Cybozu 0.0%
Eventbrite 0.0%
j2 Global 0.0%
KDS 0.0%
META4 0.0%
Paycom 0.0%
Xtenza Solutions 0.0%
Vend 0.0%

How Organizations Can Improve

In my discussions and readings, I’ve come across several ways that leaders try to improve organizational performance. Let’s consider one approach that might be called “Just do it” and could be parodied as follows. The Maximum Leader decides that twenty different metrics express the performance of an organization, and then mandates a 50% improvement in each metric.

Pretty simple. What’s wrong with that approach? There are several drawbacks.

The first problem is that people can’t work on too many things at once. The same is true about organizations in the context of initiatives that involve multiple parts of the organization and so require coordination. People and organizations both need to focus.

A second problem is that it may not be practical, or sometimes even theoretically possible, to achieve large performance increases across the board. Each metric expresses a different operational aspect and may be subject to different, practical limits to achievable improvements.

Third, people and organizations need to be convinced and inspired to act. Unless you can mandate prison time – or worse – on top of mandating all the improvements, people need reasons, which are best articulated persuasively and linguistically, that is, as sentences. (Mere words aren’t reasons and so aren’t best at persuading, although they can summarize and inspire, e.g., Onward and Upward!)

A conventional alternative to massive mandates is comparing oneself to other organizations in order to identify several important areas in which an organization is falling short (problem #1 – focus), to find other organizations that achieve better results (problem #2 – practicality), and to express the problem and need succinctly in order to persuade others to get fully on board (problem #3 – articulation).

Benchmarking in theory can achieve all of these, since it starts with data that enable comparison with other organizations. As we have written elsewhere, benchmarking has been held back because it’s applied often in cases where data availability is a problem, and because the lack of automation leads to high costs, uncertain outcomes, and the biases that are necessarily introduced by the manual methods that are employed.

Benchmarking engines are the way forward, especially when they lead to concise, specific insights, expressed well, on a large variety of dimensions that can be addressed departmentally, not just organization-wide. These insights are a spur to action – the action of deciding whether something is a practically addressable issue that should be a near- or mid-term priority, and then doing something about it. The issue could be either a problem or a cause for praise, and the actions can improve on the problems or lead to copying the practices that resulted in the good outcomes revealed by benchmarking.

Organizations need to focus on practically soluble issues that can be set up for action and can be articulated persuasively to people.

Raul Valdes-Perez

Why I joined OnlyBoth

People who know me have been asking why I joined OnlyBoth – an early stage technology startup. That’s quickly followed by the question: what is OnlyBoth anyway? Fair questions.

First, the “what” question. Founded by entrepreneurs Raul Valdes-Perez and Andre Lessa, OnlyBoth is the pioneer of artificial intelligence-based benchmarking software. Its fusion of proprietary artificial intelligence and natural language generation technology enables companies in a variety of industries to automatically discover critical business insights from data, and to communicate these in plain English.

Now the “why” question. There are actually five reasons, ranging from the lofty to the practical. Let me explain.

Reason One: Impact. I see this as an opportunity to make a huge impact – on customers, industry, and society, as well as on OnlyBoth’s employees and partners. The bigger the impact, the more energized I get. We’re addressing a pervasive need – to know how someone or something is doing in comparison to others. Thanks to the combination of today’s Big Data and OnlyBoth’s software, uncovering comparative insights and triggering business improvements is cheaper, simpler and more convenient than ever. The software automates a process that has, until now, depended largely on expensive and scarce talent. And it has a potentially strong position in a large, attractive market – key prerequisites for the most successful products. OnlyBoth’s unique technology was years in the making, so it won’t be quick or easy for anyone else to create an alternative that delivers better results. So I see OnlyBoth’s technology as disruptive – in a good way.

Reason Two: Startup. The company entered the market this summer. I joined at the end of July, and prior to this, I was occasionally advising and helping the co-founders discover the market problem/opportunity. For me, joining a company when it first gets started is the ideal time to learn and grow together. I’m at my best building stuff from scratch into something of value. I’ve helped to build new businesses, new product lines, new product categories, new customers and new skills for four previous startups, as well as three multinational corporations and Carnegie Mellon University. OnlyBoth provides an opportunity to work on all of these. So the timing on this one was nearly perfect.

Reason Three: Values and Respect. I respect a lot of entrepreneurs and business people, but I haven’t always been at ease with their values. However, Raul, OnlyBoth’s CEO, is someone I believe I can work with comfortably for a long time. That’s important because it can take years to build something new that makes a real impact. A company’s CEO, more than anyone else, shapes the culture, standards and identity of an organization. Raul is very keen on doing what’s right, whether it’s for customers, employees or other stakeholders. I’ve found Raul to be straightforward and open. He has integrity, values loyalty, and has a strong customer-value orientation. He stays focused, seeks input from others, gives people freedom to use their talents, and delivers on commitments. I share those values; they give me confidence we can work effectively together. And besides, experience matters. Raul has made this journey before. IBM acquired his first software startup twelve years after it was founded.

Reason Four: Finances. As a result of IBM acquiring their previous business, the founders are in a position to finance and scale OnlyBoth by themselves. Although it’s an attractive opportunity for investment capital, and VCs are actively funding other projects in the artificial intelligence and natural language technology spaces, we can devote our energies to solving customer problems, creating value with our products, and building a solid business, rather than pitching it to potential investors. Even better, we’re not dealing with embryonic technology. The research and development of OnlyBoth’s technology started back in the late 1990s with the support of a National Science Foundation grant, and it’s now ready for market.

Reason Five: Responsibilities. I’m very excited about my new role as Chief Customer Officer. I’m leading the development of our customer base and making sure our customers get the maximum value from our products and from our relationship. But in the early stages of a startup, flexibility is key, so my role could evolve and take on new dimensions. In just my first month, I’ve had insightful discussions with over 30 people from organizations in our target market. I’m learning a lot about the customer problems, the options they have to solve them, their readiness for a new solution, and where our communications can be improved. The people I’ve met have been very generous and helpful, for which I’m extremely grateful. It’s a great way to help build a solid foundation for business success.

That said, you’d think this would have been an easy decision for me. But it wasn’t. As you may know, I’d been itching to create another startup and I spent a lot of effort exploring technology solutions to a few business problems and evaluating a number of opportunities to commercialize technology originating from Carnegie Mellon University and the University of Pittsburgh. I was actually getting very close to starting a tech business.

And then, to make my decision even more difficult, I was surprised and honored to be asked to teach at a prominent West Coast university and to lead a tech company’s product organization. I will continue teaching my strategic marketing and product management course at Carnegie Mellon, but I feel lucky that Raul asked me to join when he did. Opportunities like this don’t come along often; I’m glad I didn’t miss this one.

Jim Berardone

Enter the Benchmarking Engine!

OnlyBoth was founded in March 2014 based on technology that answered a new question about data, never before posed computationally: What’s unusual or exceptional about a given entity, compared to all its peers? The technology’s origins were in research carried out at Carnegie Mellon University in the late 90s, sponsored by the National Science Foundation under a research grant to one of OnlyBoth’s co-founders. The technology was set aside for 12+ years while the co-founders worked together at Vivisimo, which was also founded on technology first developed at Carnegie Mellon. After IBM’s acquisition of Vivisimo, Lessa and Valdes-Perez got together again to commercialize OnlyBoth’s founding technology.

But first there was a puzzle to solve. The original work was a classic example of curiosity-driven research, in which the researcher often asks “Can this be done?” after first getting an idea of a novel “this”. The story in this case is told here. If the answer is “Yes, it can be done and here’s how,” then the next puzzle is how to convert this into an innovation that serves a need or creates an opportunity.

For the last year, we at OnlyBoth have been trying to identify how this technology best meets a human need or enables new accomplishments. There was no single aha! moment, but instead a gradual realization that the underlying technology fit the goals of benchmarking in the business world.

To understand benchmarking’s goals, we had to understand the questions that benchmarking seeks to answer. After much reading and thinking, we settled on these core benchmarking questions, which we have rephrased for brevity:

How are we doing?
Where could we improve?
What’s best in class? (peers may remain anonymous)

It turns out that OnlyBoth’s core technology is uniquely suited to answering these questions. But that’s only half the battle. The other half is: “Does benchmarking need improvement?” Our research revealed to us that it clearly does. Although benchmarking has laudable goals, it has a spotty reputation (e.g., see this Harvard Business Review article) because of multiple flaws, partly due to a lack of automation, and partly due to other circumstances that could be cured by moving to a more-promising playing field.

Our next post will examine these flaws and how software automation, based on artificial intelligence and algorithm design, removes them. Read here for a preview.

In view of this breakthrough, which matches a unique technology with a business practice sorely in need of software automation, today we are introducing the concept of a Benchmarking Engine, backed by mature technology, and demonstrating it openly on public data for all 4,813 U.S. hospitals, as made available at the Hospital Compare website at Medicare.gov.

Going forward, our mission at OnlyBoth is now this: Universal betterment through automated benchmarking.

Raul Valdes-Perez

First comes Content, then come Sentences

Automated Insights has been acquired by Vista Equity Partners in conjunction with Stats LLC, a sports data and analytics vendor also bought by Vista (see TechCrunch, Xconomy, ZDNet). There has been media speculation about the implications of automated-writing (aka natural language generation) technology, sparked by this acquisition and by ongoing deployment successes by Automated Insights and also Narrative Science out of Chicago.

A frequent flaw in such analyses consists of treating artificial-intelligence software with a double standard that is not also applied to people. For example, way back in 1998 I wrote a letter to The Scientist dismantling criticisms of the Arrowsmith program, which discovered unnoticed connections between themes in the medical literature. Critics had said that “everyone has some skepticism that you can find something new from what’s already out there”. Gee, people do that all the time.

Human writers succeed less often by solely becoming skilled at writing than by also becoming knowledgeable in a subject or mastering an analytical skill, which then gives them something to write about. This is true of novelists, scientists, journalists, historians, and the rest: First comes the content, then come the sentences. Under this lens, one can argue that Automated Insights and Narrative Science technology is not so much automated writing as recurring event summarization, i.e., one of many analytical skills.

Thus, I’ll predict that the emerging technology for writing will pair up (1) specific subject-matter or analytical expertise with (2) writing as the more easily-mastered skill. Technical progress on (1) will be the bottleneck.

Raul Valdes-Perez

Now for Baseball Teams – A Third Data Model

Today we added insights on baseball teams to our baseball application. This also introduces a third distinct data model, in addition to the models underlying our colleges and players applications. A data model refers to a generic type of application, not merely to a different dataset.

Here, an entry is a team and season, e.g., the 2013 Boston Red Sox, who won the World Series that year. Here are several outputs for that team/year:

The 2013 Boston Red Sox had the 6th-most doubles (363) of the 2,745 teams.
beat out by the 2008 Texas Rangers (376), the 1930 St. Louis Cardinals (373), the 1997 team (373), and the 2004 team (373), and 1 other.
The 2013 Boston Red Sox struck out the most (1,308) of the 114 teams who won the World Series.
surpassed the 2004 team (1,189), the 2008 Philadelphia Phillies (1,117), the 2010 San Francisco Giants (1,099), and the 2012 San Francisco Giants (1,097), and others, ending with the 1887 Detroit Wolverines (258).
The 2013 Boston Red Sox were the only team who had players born in Aruba, Canada, Cuba, the Dominican Republic, Japan, Mexico, Puerto Rico, Saudi Arabia, USA, as well as Venezuela.
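
Outputs like the strikeout line above come down to ranking one entry against a peer group and phrasing the result. The Python below is only a rough sketch of that idea, not OnlyBoth's engine. The function names `rank_within_peers` and `describe` are hypothetical, the peer group is just the six totals quoted above rather than all 114 World Series winners, and "the 2004 team" is read as the 2004 Boston Red Sox.

```python
from typing import Dict, Tuple

TeamSeason = Tuple[int, str]  # (season, team name)

# Strikeout totals quoted in the output above: a small subset of the
# 114 World Series winners, used here purely for illustration.
# Assumption: "the 2004 team" means the 2004 Boston Red Sox.
strikeouts: Dict[TeamSeason, int] = {
    (2013, "Boston Red Sox"): 1308,
    (2004, "Boston Red Sox"): 1189,
    (2008, "Philadelphia Phillies"): 1117,
    (2010, "San Francisco Giants"): 1099,
    (2012, "San Francisco Giants"): 1097,
    (1887, "Detroit Wolverines"): 258,
}


def rank_within_peers(entry: TeamSeason, values: Dict[TeamSeason, int]) -> int:
    """Return the 1-based rank of `entry` (1 = highest value); ties share the better rank."""
    target = values[entry]
    return 1 + sum(1 for v in values.values() if v > target)


def describe(entry: TeamSeason, values: Dict[TeamSeason, int], what: str) -> str:
    """Phrase the rank of `entry` within the peer group as one sentence."""
    season, team = entry
    rank = rank_within_peers(entry, values)
    place = "the most" if rank == 1 else f"the #{rank} most"
    return (
        f"The {season} {team} {what} {place} "
        f"({values[entry]:,}) of the {len(values)} teams compared."
    )


print(describe((2013, "Boston Red Sox"), strikeouts, "struck out"))
```

Choosing a different peer group (all teams, World Series winners only, and so on) only changes which entries go into the dictionary; the ranking step stays the same.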

Before, entering a team/year would return a list of that team’s players during that season. To get that list, the user now needs to add the word “roster” to the query, e.g., “Red Sox 2013 roster”.

Adding this hasn’t been a simple change from the data-analytics viewpoint. Here’s why: Our colleges application does not have a time element. There is no sense in which Harvard 2014 and Harvard 2004 need to be present as separate entries. Any time-dependent aspects are expressed via the data attributes, e.g., the tuition increase over the last three years.

Not so with baseball players. There, the entries are a player/team/year. A player can play for multiple teams during different seasons, and even within a single season, and it’s conventional and interesting to consider, say, Babe Ruth in 1929 as a distinct object of analysis.

Baseball teams represent a data model intermediate between the colleges model and the players model. It’s interesting to compare teams across seasons, e.g., the 2012 and 2013 Red Sox, but there is no interesting sense in which a team belongs to another entity. Sure, teams belong to owners, but owners don’t have a large stable of teams, and there aren’t a thousand teams playing every year. If both the latter were true (but they aren’t), then team analytics would indeed resemble player analytics.
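
One way to picture the three data models is as three shapes of entry key. The Python below is a minimal sketch under assumed names: `CollegeEntry`, `TeamSeasonEntry`, `PlayerTeamSeasonEntry`, and `parent_entry` are hypothetical illustrations, not OnlyBoth's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollegeEntry:
    """Colleges model: one entry per entity, with no time element."""
    college: str


@dataclass(frozen=True)
class TeamSeasonEntry:
    """Teams model: one entry per team and season."""
    team: str
    season: int


@dataclass(frozen=True)
class PlayerTeamSeasonEntry:
    """Players model: one entry per player, team, and season."""
    player: str
    team: str
    season: int


def parent_entry(entry: PlayerTeamSeasonEntry) -> TeamSeasonEntry:
    """A player entry belongs to a team/season; team entries have no comparable parent."""
    return TeamSeasonEntry(team=entry.team, season=entry.season)


# Harvard is a single entry; any time dependence lives in its attributes.
harvard = CollegeEntry("Harvard")

# The same team in two seasons gives two distinct entries that can be compared.
red_sox_2012 = TeamSeasonEntry("Boston Red Sox", 2012)
red_sox_2013 = TeamSeasonEntry("Boston Red Sox", 2013)

# A player/team/year entry, which also points to a containing team/season.
ruth_1929 = PlayerTeamSeasonEntry("Babe Ruth", "New York Yankees", 1929)
```

In this sketch the teams model sits between the other two: it has the season in its key, as the players model does, but like the colleges model it has no containing entity to roll up into.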

In summary, OnlyBoth has launched a new application that is interesting because (1) it has something to say about historical baseball teams, and (2) it represents a third, distinct data model for OnlyBoth-style discovery and writing.

Raul Valdes-Perez

OnlyBoth Blog

A sentence is worth 1,000 data.®
