New Benchmine Measure to Compare Investment Returns of Employer 401(k) Plans

Benchmine, provided by OnlyBoth, launched in late November as a free, open-to-all website for comparing the performance of 55,000+ employer 401(k) plans, offering users many novel analytic capabilities.

The federal data source (Department of Labor EBSA) reports many core measures of 401(k) plans, leaving it to users of the data to introduce derived measures that better enable broad performance comparison across plans of different sizes. This led to our incorporating total administrative expense ratio, defined as total administrative expenses (Line 2i(5)) divided by total assets (Line 1f), times 100. Note: all data sources referenced here are from Schedule H.

The Benchmine 401(k) engines needed a measure of plan-year investment returns that likewise enables fair comparison. The federal data reports on total income and net income during a plan year, but these include employer and participant contributions and rollovers, which tend to skew the returns on investment. Also, plans sometimes transfer investment assets out of, or into, the plan, which also complicates fair comparison.

After consulting 401(k) industry experts, we decided on a new measure, yield on beginning-of-plan-year total assets (yield for short), defined as net earnings on investments (the sum of the 10 column (b) entries from section 2b minus investment advisory and management fees from Line 2i(3)) divided by total assets at the beginning of the plan year (Line 1f(a)), times 100. By itself, this doesn’t deal with the complications of mid-year asset transfers (section 2l), so we added a qualifying criterion that a plan’s asset transfers (incoming + outgoing) be less than 1% of its total assets at the beginning of the plan year. This prerequisite disqualifies about 5% of the 55,788 401(k) plans at Benchmine, which then get a value of N/A for their yield.
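For readers who want to reproduce the two derived measures from their own Schedule H figures, here is a minimal Python sketch of the definitions above. It is an illustration, not Benchmine's actual code: the PlanYear container, its field names, and the guard against non-positive beginning assets are assumptions made for this example, and since the text does not say which column of Line 1f feeds the expense ratio, that amount is passed in as a separate total_assets field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanYear:
    """Hypothetical container for the Schedule H amounts named in the text."""
    investment_earnings: float   # sum of the section 2b column (b) entries
    advisory_mgmt_fees: float    # Line 2i(3)
    admin_expenses: float        # Line 2i(5)
    total_assets: float          # Line 1f (column not specified in the text)
    assets_begin: float          # Line 1f(a), beginning of plan year
    transfers_in: float = 0.0    # section 2l, incoming
    transfers_out: float = 0.0   # section 2l, outgoing


def admin_expense_ratio(plan: PlanYear) -> float:
    """Total administrative expenses divided by total assets, times 100."""
    return plan.admin_expenses / plan.total_assets * 100


def plan_yield(plan: PlanYear) -> Optional[float]:
    """Yield on beginning-of-plan-year total assets, or None for N/A."""
    if plan.assets_begin <= 0:
        return None  # assumption: no meaningful yield without beginning assets
    transfers = abs(plan.transfers_in) + abs(plan.transfers_out)
    # Qualifying criterion: transfers must be less than 1% of beginning assets.
    if transfers >= 0.01 * plan.assets_begin:
        return None
    net_earnings = plan.investment_earnings - plan.advisory_mgmt_fees
    return net_earnings / plan.assets_begin * 100
```

Returning None rather than zero for disqualified plans keeps an N/A yield from being mistaken for a genuinely poor return when plans are later ranked.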

These three CY 2021 examples of employer 401(k) plans (names omitted here), from different total-assets brackets, stand out on their joint yield and administrative expenses:

  • Only PLAN (within the $10M-$50M bracket) has both such a high total administrative expense ratio (1.716%) and such a low yield (11.70%).
  • In California with its 209 ($250M-$1B) plans, only PLAN has both such a high yield (18.31%) and such a low total administrative expense ratio (0.002%).
  • PLAN has the highest total administrative expense ratio (0.360%) among the 196 ($100M-$250M) plans that have 1,000 to 4,999 total participants and have at least a 16.66% yield.

In conclusion, Benchmine is now equipped with good measures for both administrative expenses and investment returns, all in the service of enabling fair comparison, heightening performance transparency, helping to drive improvement, and empowering participant choices.

Raul Valdes-Perez

A Unique Way to Get Others to Improve

Joe’s haircut is darn ugly. What are effective ways to persuade Joe, or other people and organizations, such as healthcare providers, to make improvements? One way is to issue orders, but that only works if you’re the boss. Another way is to reliably predict a bad outcome unless improvements are made, as in the case of budgets, health, safety, etc. Yet another way is to teach how to improve on a specific key measure, which can work if people or organizations are self-driven.


It’s often pointed out (e.g., in this Harvard Business Review article) that people are motivated by peer comparisons, which are effective because it’s human nature to notice others and be influenced by them, and because the comparisons are easy to grasp: Your Peer does better at X, and X is important, so try to measure up! But a comparison to a single Peer is subject to the defensive reaction that the Peer has very different circumstances, so the two aren’t comparable!

I wish to put forth a way to make peer comparisons that are arguably persuasive, but rare. People seldom think of them and they are hard to come up with without the help of automated comparisons of available data. These peer comparisons are characterized by a second measure, Y.

Consider telling Joe this comparison:  You have the ugliest haircut of everybody as good-looking as you! Notice that you are implicitly using two measures: (1) haircut ugliness, and (2) good looks. The peer group is everybody who is at least as handsome as Joe. Within this elite group, unfortunately Joe does the worst. On the one hand, Joe feels good about his comparison group, and on the other hand, he has the worst outcome, assuming he cares at all. And the peer group is large, unless Joe is stunning!

Now, for you logician readers, let’s acknowledge that “Joe has the ugliest haircut of everybody who is as good looking as him.” is absolutely equivalent to “Joe is the best looking of everybody with such an ugly haircut.” But psychologist readers will agree that the first version does better at motivating performance improvement, since a likely human reaction to the second version is “Well, at least I’ve got something going for me!”

It turns out that automated experiments with healthcare or business data turn up a large number of such peer comparisons. Here are three actual, but anonymized, insights taken from various healthcare sectors:

1.    A California hospital has the lowest communication-about-medicines rating (2 stars) of the 358 hospitals with as high an overall patient rating (5 stars). Those 2 stars compare to an average of 4.3 stars across the 358 hospitals.

2.    In the Southwest, a Texas home health agency has the fewest patients who got better at getting in and out of bed (19.1%) among the 1,651 home health agencies with at least 49.1% of patients who got better at walking or moving around. That 19.1% compares to an average of 65.3% across those 1,651 home health agencies.

3.    A Pennsylvania nursing home has the most short-stay residents who had an outpatient emergency department visit (34.1%) among the 317 nursing homes with at most 10.1% of short-stay residents who were rehospitalized after a nursing home admission. That 34.1% compares to an average of 9.8% across those 317 nursing homes.

The basic “shaming” message is this: Why are you so bad at X if you’re so good at the related measure Y? Everybody else with such a good Y is doing better than you! Of course, the world is filled with such potential insights, although coming up with verifiable ones may best be done with rigor by software, as long as data can be collected and analyzed.
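To make the two-measure pattern concrete, here is a small Python sketch of how software might test whether someone is uniquely worst on X within the peer group of everybody at least as good on Y. It is only an illustration of the logic described in this post, not OnlyBoth's engine; the function names and the five sample entities are invented for the example, and higher is assumed to be better on both measures. The second function states the logically equivalent flipped version (best on Y among everybody at most as good on X).

```python
from typing import Dict, List, Tuple

# Each entity maps to (x, y); higher is better on both measures (assumption).
Scores = Dict[str, Tuple[float, float]]


def worst_x_among_y_peers(scores: Scores, name: str) -> bool:
    """True if every other entity with y at least as good has strictly better x."""
    x0, y0 = scores[name]
    return all(x > x0 for n, (x, y) in scores.items() if n != name and y >= y0)


def best_y_among_x_peers(scores: Scores, name: str) -> bool:
    """Flipped phrasing: every other entity with x at most as good has worse y."""
    x0, y0 = scores[name]
    return all(y < y0 for n, (x, y) in scores.items() if n != name and x <= x0)


def find_insights(scores: Scores) -> List[str]:
    """Names for which the 'worst on X among Y-peers' comparison holds."""
    return [n for n in scores if worst_x_among_y_peers(scores, n)]


# Invented sample data: (x rating, y rating) on a 1-5 star scale.
sample: Scores = {
    "A": (2, 5),
    "B": (4, 5),
    "C": (5, 5),
    "D": (1, 3),
    "E": (3, 4),
}
insights = find_insights(sample)
```

On this sample both functions agree for every entity, which mirrors the equivalence noted earlier for logician readers; which phrasing to show is then a matter of motivation, not logic.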

Instead of merely ordering Joe to get a new barber, presenting him with a book on hairstyling, or predicting that his love life is doomed unless he improves, let’s try pointing out how poorly he stands out as compared to his wonderful peer group! The same goes for Doris the hospital’s chief quality officer, Nancy the home health agency’s chief nurse, and Mary the nursing home’s director.

[First published on LinkedIn Pulse]

Raul Valdes-Perez


Why Comparing Healthcare Providers Needs Automation

I’ve lived for years in the same area of Pittsburgh, whose streets don’t follow a grid design since it’s hilly and pre-dates the automobile. Sometimes before driving to a familiar destination, I’ll check Google Maps, which alerts me to a favored route that I didn’t even know existed. I act on the suggestion, which usually turns out great. Is this unique to mapping, or can this happen in other domains of reasoning and discovery? How about healthcare?

Solution Spaces and Artificial Intelligence

Automated mapping helps me discover new routes not because I’m spatially challenged, but because the software explores side streets which motorists like me don’t consider. Instead, motorists tend to consider the larger, familiar streets that head toward their destination. Using Artificial Intelligence (AI) concepts, we say that mapping software searches for solutions within a larger space of possibilities than people do. In chess play, software considers piece sacrifices which none but top players will ever think of. It also occurs in scientific research. This should happen in healthcare, too, where there are huge potential gains for many stakeholders and rich data sets are publicly reported.

[Continue reading at LinkedIn Pulse …]



Benchmarking the CDC 500 Cities on 28 Health Measures

The CDC’s 500 Cities Project recently published 28 health measures on the 500 largest U.S. cities (see them listed or mapped). The measures cover various resident behaviors, afflictions, medication, and screening. We at OnlyBoth downloaded the data and set up a cities benchmarking engine to answer these standard comparative questions: How is this city doing?, Where could it improve?, and What’s best in class?

Just enter any of the 500 cities. Then click on a left-side question to discover noteworthy peer groups in which your selection is near the top or bottom. Or, set up a fencemarking query and click Go at the bottom to, for example, learn the top insights among all 121 California cities, or to uncover comparatively high binge drinking there (guess who?).

To appreciate this technology and its simplifying potential to motivate human and customer progress, compare to how standard dashboards have been applied to the 500 Cities data. Or, to understand why dashboards aren’t really up to the task of comparative performance evaluation, check out Why San Mateo Daily Journal Really Doesn’t Like California’s Education Dashboards.

Lastly, if you also wish to benchmark counties, read here.

A sentence is worth 1,000 data.®

Raul Valdes-Perez


Where Does Automated Customer Benchmarking Make Sense?

A customer benchmarking engine is an emerging technology which uses an artificial intelligence approach to automate the reasoning that underlies data-driven benchmarking. Its benefits are discussed here, there, and elsewhere. Briefly, it uncovers comparative insights on customers which empower customer-focused employees to be more proactive, or which are shown directly to those customers as a premium information service. The business benefits include churn reduction, market differentiation, extra revenue, and deeper customer relationships.

But, automated customer benchmarking doesn’t always make sense. So where does it? Here I’ll summarize the criteria that we’ve learned from clients, trials, conferences, discussions, and analysis.

Data. A single organization collects data on its business-customers’ traits, behaviors, business outcomes, and feedback, e.g., via surveys. Lack of data on customer outcomes narrows the scope of the insights, which may still have internal value for account management. Also, the organization should not be contractually prohibited from performing comparative analysis across customers, appropriately anonymized if the resulting insights are to be shown to customers. Evidently, the data shouldn’t be wrong or mostly missing.

Motivation. The organization should be B2B because consumers (B2C) are generally less motivated to improve, being less driven by external stakeholders. The same lack of strong motivation may be found if the B2B organization serves very small businesses, which are less prone to carry out performance analysis: if they are tiny but making money, then life is good, and if they’re losing money, there are more-urgent issues to address. Think of your small neighborhood restaurant, for example.

Also, the business process that the organization supports with its services should not be seen as a utility, meaning that customers only care that the service be available when they need it, and little or nothing more. Think of an internet connectivity service, for example.

A strong positive indicator of motivation is when customers themselves ask the vendor organization how they’re doing compared to other customers, where they could improve, etc.

Comparability. In principle, benchmarking only makes sense if the benchmarked entities are comparable. It makes little sense to benchmark an elephant against an armchair and an airplane. Comparable doesn’t mean identical or even similar, just productively worthy of comparison. For example, a business consultancy that brings the smartest people in the world to fix whatever problem you have, whether it’s a leaky roof, runny nose, or buggy software, won’t have comparable customers. An HR SaaS company does have comparable customers, even if its customers range from the Fortune 500 to startups and in between, because HR has common elements across companies of any size or industry: employee motivation, compensation, tenure, promotion, recruiting, dismissal, etc. Comparability is a judgment call, but most B2B vendor organizations do have comparable customers, otherwise it would be hard for them to scale their business.

Scale. A customer benchmarking engine is a powerful tool that scales beautifully with the number of customers. But, just as a search engine is probably overkill if you only possess 50 documents, or a receptionist is overkill if you have 5 employees, benchmarking 50 customers likely isn’t worth the trouble, even though the engine will do its job. Given the tradeoffs, we believe that about 150 is the right minimum number of customers for automated benchmarking to make sense.

It’s worth citing some false disqualifiers which are wrongly believed to invalidate customer benchmarking, automated or not. (1) Customers need not be concentrated by industry or segment, much less be competitors, since one is benchmarking the customer’s business process that is supported by the vendor organization’s service, not benchmarking the customer’s overall market performance. (2) The data suitable for benchmarking is rarely scarce. For example, if a given metric (employee satisfaction, say) is potentially insightful, then so is the quarterly change in that metric, since it expresses a trend. Ditto for the change when compared to the same quarter last year. Thus, the insightful metrics are easily tripled, based on changes over time, as we’ve discussed elsewhere. (3) Data need not be perfect; it never is. And the end-result of imperfect data is not a plane crash, but a misleading insight, which tends to be caught and discarded before significant action is undertaken.
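Point (2) above, that one metric readily yields three, can be sketched in a few lines of Python. This is a hedged illustration only: the function name, the dictionary keys, and the choice of simple differences (rather than, say, percentage changes) are assumptions for the example, and the input is assumed to be a list of quarterly readings ordered oldest first.

```python
from typing import Dict, List, Optional


def derive_change_metrics(quarterly: List[float]) -> Dict[str, Optional[float]]:
    """Turn one quarterly metric into three: level, quarterly change, yearly change.

    `quarterly` holds at least one reading, oldest first. A change is None
    when there is not enough history to compute it.
    """
    latest = quarterly[-1]
    # Change versus the previous quarter expresses the short-term trend.
    quarterly_change = latest - quarterly[-2] if len(quarterly) >= 2 else None
    # Change versus the same quarter last year (four quarters earlier).
    yearly_change = latest - quarterly[-5] if len(quarterly) >= 5 else None
    return {
        "value": latest,
        "quarterly_change": quarterly_change,
        "year_over_year_change": yearly_change,
    }


# Invented employee-satisfaction readings for six consecutive quarters.
metrics = derive_change_metrics([70, 72, 71, 75, 74, 78])
```

Each of the three resulting values can then be benchmarked across customers like any other metric.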

Now let’s summarize the four qualifiers (data, motivation, comparability, and scale) in a single brief sentence:  A customer benchmarking engine makes sense for B2B organizations that generate rich data on their 150+ non-tiny customers as a by-product of their non-utility-like, repeatable service.

Who are these organizations?  B2B SaaS (software as a service), Industrial Internet of Things, BPO (business process outsourcing), Managed Services Provider, and 3rd-Party Administrator organizations are generally good matches if they fit the other criteria.

Automation doesn’t always make business sense, especially when the enabling technology lies outside one’s own organization, which circumstance always involves a coordination cost. But automation scales well and can enable things or insights that don’t yet exist. Apart from the benefits discussed elsewhere, this article shares what we’ve learned about where the emerging technology of customer benchmarking engines makes sense.

Raul Valdes-Perez

Why San Mateo Daily Journal Really Doesn’t Like California’s Education Dashboards

In an editorial on March 22, the San Mateo Daily Journal called the new California School Dashboards “problematic”, “confusing”, and “useless”. It’s worth examining their perceptive reasoning, worthy of Silicon Valley, by reading the editorial, but the crux was this:

A dashboard in essence provides useful data, how fast you are going, what level your fuel is, if your engine is running hot, what your revolutions per minute are and if you need to check your engine. Based on your situation, one indicator may be more meaningful for you than others, but it is an apt description for a variety of indicators based on levels of data.

A car’s dashboard alerts to simple problems that call for immediate action:  slow down, get gas, or head to the mechanic. Dashboards were not designed to evaluate a driver’s or car’s ongoing performance, persuade that there’s a problem over a longer time scale, or motivate improvements, much less to compare performance to that of others. Making dashboards serve such goals leads to the editorial’s remark that “… how the information is presented is problematic and even the icons for performance levels are initially confusing.”

Overall, the editorial concludes that the California School dashboards don’t serve the goal of comparative evaluation, aka benchmarking:

The dashboard as it stands right now is fairly useless, and that could change once more information fills in. But the template also seems fairly poor as something parents may be able to use to see how their school is doing compared to other schools in the district or even other districts.

Except in narrow cases like a car running out of gas, we humans are best enlightened, persuaded, and motivated to act by language, which is why Daily Journals write editorials, and I write this post, rather than put up dashboards. Language deals quite well with quantitative information.

Raul Valdes-Perez


Customer Benchmarking Motivates Action

By Ed Powers, Principal Consultant at Service Excellence Partners, a Colorado consulting firm helping to improve customer loyalty and business performance at subscription-based technology companies.

Electric utility companies promoting power conservation programs discovered that simply informing consumers of their electricity usage relative to neighbors lowered overall consumption. This type of normative social comparison has produced the same effects in other domains. Why does this work? How can the idea be used in Customer Success?

Keeping up with the Joneses

Homeowners typically receive letters from the power company showing how many kilowatt hours they’ve used over the past month compared with other homes in the vicinity. In addition to bar charts, consumers see a “smiley face” if they’re doing well or a “frowny face” if they’re trailing the average. The mailers typically also include a list of recommendations for conserving power.

Surprisingly, this simple trick works. People change their behaviors when they see how they stack up against others. In multiple experiments run by the utility companies, providing benchmarks reduced overall power usage 2%.1 That may not sound like much, but across millions of homes the savings are substantial, helping power companies meet their government-mandated conservation goals. Similar outcomes from communicating descriptive norms have been shown in hospitality (reusing towels more often),2 voting (increasing turnout),3 and charitable giving (boosting the number of donors).4

What causes this behavior? Neuroscientist David Rock says subconscious social drivers are hardwired into our brains after eons of evolution.5 One primary driver is status, how we view ourselves in the “pecking order.” Social bearing means survival—whether it’s humans, birds, or wildebeest, members at the apex of the hierarchy are more likely to continue living and pass along genetic information. To aid our preservation, we reflexively perceive higher status as a reward and lower status as a threat. As a result, we savor outranking others and become anxious when we fall short.

Seeing rank expressed in numbers may increase the urgency to act. Neuroscientists have found that the brain uses the same circuitry to process number comparisons as it does to determine social status.6 The findings suggest we subconsciously use numbers to record social rank, and seeing status expressed numerically activates overlapping neural networks that may add fuel to our emotional response. Whenever emotions are strong, decisions and actions tend to follow.

Better business reviews

Many Customer Success Managers (CSMs) at Software-as-a-Service (SaaS) companies conduct Quarterly Business Reviews (QBRs) with senior executives at key accounts. Customers derive business value from deploying and using software, so a common objective is to ensure implementation milestones and adoption goals are met. Executive engagement and attendance at QBRs, however, is a chronic problem. Often CSMs do a poor job of describing how the software subscription impacts things like organizational productivity and decision making, but other times they simply fail to capture the attention of senior leaders.

Showing comparative data can help. Besides demonstrating progress vs. the customer’s goals, showing results relative to the customer’s own peer groups has greater impact. Executives are usually competitive people. When they see their organization is ahead of the pack, the fact suddenly becomes a talking point with their own bosses. When they see progress is behind the curve, they are more likely to push subordinates and make things happen. Sharing the tidbit garners increased attention and raises CSM status in the eyes of executives. “Information is power” is also true among top managers, and good intelligence is always appreciated.

The approach works at scale, too. Like the power company, automatically communicating descriptive social norms can move the needle in large populations of small customers. When SaaS companies show individual performance relative to benchmarks via tailored e-mails and in-product messages, they can influence behaviors in mass audiences without the need for personal contact. A simple change in how data are communicated can bump customer usage as well as CSM productivity.

New automation tools make the process much easier. OnlyBoth’s benchmarking engine uses artificial intelligence to automatically uncover readable, motivating, action-provoking insights and customer comparisons. CSMs can use this novel intelligence to nudge customers toward greater success during business reviews and for routine, personalized e-mail communications campaigns.

We’re naturally wired to compare ourselves with others. SaaS companies can easily capitalize on this basic human nature. And that would place them ahead of the pack.


1 Allcott, H. (2011). Social norms and energy conservation. Journal of Public Economics 95, pp. 1082-1095

2 Nolan, J., Schultz, W., Cialdini, R., Goldstein, N., Griskevicius, V. (2008). Normative influence is underdetected. Personality and Social Psychology Bulletin 34, pp. 913-923

3 Gerber, A. and Rogers, T. (2009). Descriptive social norms and motivation to vote: everybody’s voting and so should you. The Journal of Politics 71, January 2009, pp. 178-191

4 Frey, B., and Meier, S. (2004). Social comparisons and pro-social behavior: testing “conditional cooperation” in a field experiment. The American Economic Review, December 2004, pp. 1717-1722

5 Rock, D. (2008). SCARF: a brain-based model for collaborating with and influencing others. NeuroLeadership Journal

6 Chiao, J., Harada, T., Oby, E., Li, Z., Parrish, T., Bridge, D. (2008). Neural representations of social status hierarchy in human inferior parietal cortex. Frontiers in Neuroscience.


7 Strategies to Benchmark SaaS Customers to Success

Customer benchmarking, the practice of identifying where a customer can improve or is already doing well by comparing to other customers, helps Customer Success Managers to deliver unique value to their customers. The comparative insights from benchmarking motivate customers to make changes that produce better outcomes with their solutions. I’ve written more about this link in a recent post.

SaaS customer success leaders publicly encourage greater adoption of this practice. Peter Armaly, a customer success expert with Oracle Marketing Cloud, argued in his presentation at the 2016 Customer Success Summit that it should be a foundational activity of CSMs. In her featured post earlier this year, Kia Puhm, a customer experience consultant and a former executive with Adobe, Eloqua and Blueprint, advocated using customer benchmarking in every QBR, annual renewal discussion and proactive strategic meeting with customers. At the TSIA World conference in May, Rachel Barger presented how Lithium Technologies’ CSMs use their benchmarking program to make recommendations to customers.

I’ve found that SaaS vendors use seven distinct strategies to empower CSMs with customer benchmarking. The first two are longstanding strategies that rely on third-party data. The other five strategies leverage the data that is a byproduct of the vendor’s customer relationships and usage of their solutions. The last two of these five progressively use artificial intelligence to further automate the task.

Strategies Defined

Strategy 1: Customer Benchmarking using Industry Surveys

CSMs can leverage the benchmarking work of industry associations and research firms to help customers set performance targets, identify areas for improvement, and recommend best practices. These independent organizations sponsor benchmarking surveys that are completed by representatives of companies in the industry. The sponsor creates the survey, collects the data and does the analysis. The aggregated, anonymized findings are accessible in published documents or online tools for their members and customers.

With this strategy, the sponsor does all the work, but self-reported data can be unreliable and not specific to the SaaS vendor’s solutions or even customers.

Strategy 2: Customer Benchmarking using Best-Practices Studies

CSMs can help customers strive for superior outcomes by comparing their customer’s practices with the best practices. This approach studies a key business process of several companies that are perceived as the best in their industry and agree to participate. A third-party organization or the SaaS vendor sponsors an on-site study to collect mostly qualitative data on practices, key metrics and business context. The sponsor analyzes the data and reports what they’ve learned.

The sponsor does all the work for this strategy too, and superior practices can be found. But the SaaS vendor’s customers may see the practices of the best companies as unrealistic or irrelevant.

Strategy 3: Customer Benchmarking using Vendor Surveys

SaaS vendors can do their own benchmarking survey when their customer base is sufficiently large to obtain a representative sample. They analyze the aggregated data and share summary findings with their customers. A CSM collects data from an individual customer to compare how they’re doing versus the aggregated, anonymized results at a more granular level.

This strategy benefits from a survey that’s tailored to the vendor’s customer base, but it suffers from the same reliability drawbacks as Strategy 1.

Strategy 4: Customer Benchmarking using Data Scientists

When requested by CSMs, data scientists use their skills with statistics and modeling to mine the SaaS vendor’s data for deep, insightful correlations and comparisons. This is often a one-off solution, although procedures can be set up for recurring needs. The work involves advanced techniques such as regression analysis, stochastic frontier analysis and data envelopment analysis using sophisticated software tools.

Data scientists may find unexpected insights by analyzing their own solutions and customer relationship data. The lack of scalability handicaps this strategy.

Strategy 5: Customer Benchmarking using Business Software Reports

By running reports on their aggregated customer data, CSMs can easily see how a specific customer compares with other customers on a given metric. A company’s adopted CRM, BI or CSM software platforms usually provide this capability. For example, customer success software platforms such as Amity and Gainsight generate a scorecard summary which lists the SaaS vendor’s customers and their performance on key metrics. A CSM sorts the list by a selected metric, calculates averages and filters the customers into a relevant group using various attributes.

CSMs can generate reports as needed with this strategy, but the reports contain data, not insights, and support only basic, manual analysis to find insights.
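The manual report steps described for this strategy (filter to a relevant group, sort by a metric, compute the average) amount to something like the following Python sketch. It is not taken from any of the platforms named above; the function, the record layout, and the sample customers are hypothetical and serve only to show how little analysis such a report performs on its own.

```python
from statistics import mean
from typing import Any, Dict, List


def peer_report(customers: List[Dict[str, Any]], metric: str,
                **attributes: Any) -> Dict[str, Any]:
    """Filter customers by attribute values, rank them on a metric, and average it."""
    # Keep only customers matching every requested attribute (the peer group).
    group = [c for c in customers
             if all(c.get(key) == value for key, value in attributes.items())]
    # Rank the group from highest to lowest on the chosen metric.
    ranked = sorted(group, key=lambda c: c[metric], reverse=True)
    average = mean(c[metric] for c in group) if group else None
    return {"ranked": [c["name"] for c in ranked], "average": average}


# Invented scorecard rows for illustration.
scorecard = [
    {"name": "Acme", "industry": "retail", "logins": 80},
    {"name": "Bolt", "industry": "retail", "logins": 60},
    {"name": "Cyan", "industry": "tech", "logins": 90},
]
retail = peer_report(scorecard, "logins", industry="retail")
```

The output is still just a sorted list and an average; deciding which peer group and which metric yield a noteworthy comparison is left to the CSM.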

Strategy 6: Customer Benchmarking using your SaaS Product

SaaS vendors can enable their customers to benchmark themselves within their own software solution, which can be used by their CSMs, too. SAP, Apptio, ServiceNow, InsightSquared, Samanage, IQNavigator, ADP, Zendesk and other companies offer this capability. Using the vendor’s anonymized, aggregate customer data, a customer can compare itself against other customers of the same solution on various metrics, and CSMs can compare them also. SaaS companies develop basic benchmarking features in-house (such as scorecards, rankings, averages and several peer groups) or embed technology from specialized benchmarking software vendors like OnlyBoth that do this along with automated, in-depth analysis and narrative reporting.

Customers can get answers to some benchmarking questions on their own, but this strategy doesn’t make use of the internal vendor data that provides valuable benchmarking insights to CSMs but isn’t available for customer viewing.

Strategy 7: Customer Benchmarking using Specialized Benchmarking Software

With advanced automation available from several software firms, CSMs can get many more actionable and deep comparative insights for each customer quickly. Waypoint Group’s TopBox benchmarking module automatically analyzes for significant correlations between customer feedback responses and outcome metrics such as NPS and customer health for several customer peer groups. The Customer Benchmarking Engine available from OnlyBoth automates the data analysis, insight discovery, and narrative reporting by using artificial intelligence and the SaaS vendor’s data.

CSMs benefit from automated analysis which produces many more and deeper insights for each customer, overcoming the limitations of business software reports and data scientist scalability. The challenge is that CSMs are called on to make more judgments on which insights to use and share.

For a brief guide to the pros, cons and requirements of each strategy, see the paper “Pioneering SaaS Customer Success Leaders Seek New Customer Benchmarking Strategies to Deliver Value to Customers.”

Final thoughts

According to research by TSIA, CSMs at 25 percent of SaaS vendors are already using customer benchmarking. These pioneers have recognized the opportunity to enhance customer relationships and sustain their journey with the insights that only they can provide, because only they have the solution-specific data. With the data and strategies that are now accessible, more customer success organizations are poised to adopt this practice.

Jim Berardone