Infusing Performance Transparency into Employer 401(k) Plans

OnlyBoth is proud to infuse unprecedented performance transparency into the world of employer 401(k) plans, about 55,000 of them, in partnership with, by applying a unique AI-based technology for comparative analytics (e.g., benchmarking) to the latest, completed EBSA 5500 and Schedule H data. DCIIA is offering this service as a member benefit.

Visit this DCIIA page for an introductory video as well as a series of ten brief explainer videos that illustrate how you can discover answers to the following questions about 401(k) plans:

  1. Where does a plan stand out from related peer groups?
  2. How does a plan compare to a user-selected peer group?
  3. What’s best in class, i.e., the best achievement on a given measure by a similar plan?
  4. How does a plan’s score compare to all others in the same industry?
  5. How do up to 10 plans compare side-by-side?
  6. What are the overall top- or bottom-scoring plans?
  7. What are the ranked scores for a group of plans selected by industry or geography?
  8. What are the top benchmarking insights for a group of plans that I select?
  9. What are the top benchmarking insights that mention specific data attributes?
  10. What plans match the characteristics, measures, and/or geography that I select?

A Comparison of 23 Healthcare Comparison Websites

This study examines 23 prominent healthcare comparison websites that U.S. consumers, and other healthcare participants, can use to find and evaluate healthcare providers in terms of, for example, their quality, patient experiences, and cost. Our emphasis is on free website tools that offer side-by-side comparison of two or more providers, in a way that informs consumer choice but also provides value to other users within the large and complex landscape of healthcare.

Our goals in carrying out this empirical study are these:

  • Inform potential users about the landscape of website tools, analogous to how each tool informs users about the healthcare landscape.
  • Inform website-tool designers about the scope of existing features that could be offered by their tools, and perhaps prompt thoughts about non-existent features that should be invented and added.
  • Enhance the transparency of the comparison tools themselves: just as data on healthcare providers contributes to transparency and helps drive improvement via several alternative pathways, data on comparison websites can drive improvement among comparison tools.

In science it is understood that when there exist numerous similar entities, it helps to overlay a structure based on identifying aspects on which they differ; such is our intent here. Often such structure leads further to devising categories that help make sense of the landscape, much as how medical science forms categories out of individual diseases.  

Continue reading

Evaluation Engines Recognized as Finalist for Healthiest Communities Data Challenge

At the recent Health Datapalooza held March 27-28, 2019 in Washington DC, OnlyBoth’s submission to the Healthiest Communities Data Challenge was recognized as one of the three finalists, all equal co-winners of the Challenge.  We submitted our portfolio of four distinct comparative-analytics engines – benchmarking, comparison, discovery, and scoring – to extract maximum value from the rich county-level dataset on Social Determinants of Health, made available to the pool of 30 contestants, to which we added Census data on county populations.  The resulting engines are accessible via, or directly at

Our supporting partners in this submission were HealthBegins and the Allegheny County Health Department.  The benchmarking engine, used to deeply assess the comparative performance of a single county against others nearby and nationwide, was augmented by HealthBegins with recommended actions to take for certain performance deficiencies. For example, here is a comparative deficiency for Bronx County:

Bronx County, New York has the most adults who don’t eat enough daily fruits & vegetables (77.60%) among the 64 counties with at least 876,764 in population (Bronx County, New York is at 1,471,160). That 77.60% compares to an average of 73.58% and standard deviation of 2.90% across those 64 counties. […]  Among those 64 counties, it also has the most adult diabetes (12.3%).

which can be addressed by the recommendations seen by clicking on Taking Action.

Institutions and communities can take a specific, multi-pronged approach to increase daily consumption of fruits and vegetables among adults.  The CDC Guide to Strategies to Increase the Consumption of Fruits and Vegetables describes each of the following strategies in detail. […]

Want to evaluate your city or state?  Let’s say Boston.  Assess the social determinants of health in Suffolk County or Middlesex County.  Then do a side-by-side comparison of the 7 counties surrounding Boston.  Then examine the biggest achievements and improvement opportunities (i.e., deficiencies) across all Massachusetts counties.

Migrating to California, here are two interesting comparative insights for San Francisco County and San Mateo County:

San Francisco County, California has the most violent crimes per 100,000 population (702.66) of the 78 counties with at least $78,621 in median household income (San Francisco County, California is at $81,294). […]

Only San Mateo County, California has both such a high median household income ($93,623) and such a high natural amenities index (8.19). […]
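Insights like the two above pair a stand-out measure with a peer group defined by a numeric floor on a second measure. The following is a minimal sketch of how such a peer group could be searched for, not OnlyBoth's actual method. The `County` class, the `widest_peer_group` function, and all county names and values are hypothetical toy data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class County:
    name: str
    median_income: int      # dollars (toy values, not real data)
    violent_crime: float    # per 100,000 population (toy values)


def widest_peer_group(target, counties):
    """Find the loosest income floor for which `target` has the most
    violent crime among all counties at or above that floor.

    Returns (peer_count, income_floor), or None if no such group exists.
    """
    # Candidate floors: every observed income that still admits the target.
    floors = sorted({c.median_income for c in counties
                     if c.median_income <= target.median_income})
    for floor in floors:  # loosest floor first, so the first hit is the widest group
        peers = [c for c in counties if c.median_income >= floor]
        worst = max(peers, key=lambda c: c.violent_crime)
        if worst == target and len(peers) > 1:
            return len(peers), floor
    return None


# Hypothetical counties, invented purely for this sketch.
counties = [
    County("County A", 60000, 900.0),
    County("County B", 70000, 500.0),
    County("County C", 80000, 700.0),
    County("County D", 90000, 300.0),
    County("County E", 100000, 650.0),
]

target = counties[2]
result = widest_peer_group(target, counties)
sentence = None
if result:
    count, floor = result
    sentence = (f"{target.name} has the most violent crimes per 100,000 "
                f"population ({target.violent_crime:.2f}) of the {count} "
                f"counties with at least ${floor:,} in median household income.")
    print(sentence)
```

Scanning floors from loosest to tightest means the first match is the largest peer group in which the target stands out, which tends to make the resulting sentence more noteworthy.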

We at OnlyBoth are pleased to show that AI-style comparative analytics can enable unprecedented transparency in both healthcare and health, and thus help drive performance improvements and inform consumer choice, all for the public good.

Raul Valdes-Perez


Scope of Innovation of Healthcare Benchmarking Engines

The engines at, powered by Artificial Intelligence methods and principles of User Experience design, show these technology-enabled advances over the benchmarking status quo.

1. A user experience based on selecting questions to be answered and getting noteworthy insights as answers, rather than pushing lots of data and dashboards without a clear sense of what is being answered and what is noteworthy. The first-encounter UI poses these questions:
o How is this provider doing? (i.e., where does it stand out positively or neutrally?)
o Where could it improve? (where does it stand out negatively?)
o Where has it changed? (over the last year or two, what changes stand out?)
o What’s best in class? (what are top achievements on specific measures by similar providers?)
o Where does it stand in its county? (or other geography, based on scoring the insights found)

2. Insights are written as perfectly readable, and shareable, English sentences, rather than dashboards. This key novelty led to our trademarked “A sentence is worth 1,000 data.®” and addresses a problem identified in the National Academy of Medicine article Fostering Transparency in Outcomes, Quality, Safety, and Costs, that “Research has demonstrated that many of the current public reports make it cognitively burdensome for the audience to understand the data.” We believe that dashboards are fine to alert that a warehouse is on fire or your car is nearly out of gas, but not to motivate thoughtful deliberations on performance improvement.

3. Calculating provider latitude & longitude, which enables benchmarking each provider against others nearby, e.g., within 20 or 50 miles, or other distances selected by the user.

4. Insights are supplemented with highly-related facts which help the user understand the significance or scope of the stand-out behavior or outcome. These addenda are also written in precise English.

5. Peer groups are not limited to the usual state, national, and perhaps a pre-defined cohort. Instead, the engine does a massive search for peer groups, expressed as a simple combination of data attributes, in which the benchmarked provider stands out. Geographic proximity can be one of these attributes, alone or with others.

6. An especially novel type of benchmarking insight involves aligning two numeric measures. One measure expresses the stand-out behavior, while the second forms the peer group, possibly in combination with symbolic attributes. For example, “In Texas, Park Plaza Hospital in Houston, TX has the lowest nurse-communication rating (2 stars) of the 88 hospitals with as high a doctor-communication rating (4 stars).”

7. By specifying any known algebraic relationships among measures, the engine can insert action-oriented remarks such as the one italicized in this nursing-homes insight (see it online):  “Carroll Manor Nursing & Rehab in Washington, DC has the fewest total nurse staffing hours per resident per day (2.05) of all the 724 nursing homes that are located within a hospital. That 2.05 is 57% lower than the average of 4.8 across those 724 nursing homes. Reaching the average of 4.8 would imply an extra 80.9 nursing staff per day, assuming an 8-hour workday.”

8. Input data can be numeric, symbolic, yes/no, and even set-valued, which gives rise to innovative comparisons like this: “Of the 1,488 hospitals that have at least 4 stars as an overall hospital rating, Shasta Regional Medical Center in Redding, CA is one of just 2 that have a 1-star rating in each of cleanliness, communication about medicines, doctor communication, and quietness (4 total).”

9. As discussed in an AHRQ report on usage of hospital evaluation websites, consumers and healthcare professionals often need different content. So, we have introduced a “Switch Audience” toggle, visible to the user when an insight contains content that appeals to one but not the other, which lets users declare their roles. See the difference by switching the audience to “professional” at this insight on emergency-room wait times, and noticing the paragraph that begins with “Note that …”

10. The final novelty is automation, so that many provider measures can be assessed with the same (human) effort, addressing this point by Dr. Robert Brook: “… quality must be measured in a comprehensive way in order to motivate an institution or physician to provide high-quality care. […] if just a few measures are used to assess quality, the quality of care delivered across all patients in all diseases will be distorted, emphasizing those things that are being measured. Fortunately, we have many well-tested comprehensive quality of care measures that can help prevent this distortion.” Moreover, automation enables introducing measures that express a change over time, and not just the last measurement period, so providers can be compared on how they’ve improved or gotten worse. For example: “… has the biggest plunge in cleanliness rating over one year (-2 stars) of the 1,193 hospitals on the East Coast.”
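To illustrate item 3 above, here is a minimal sketch of how a "nearby" peer group might be formed once each provider has a latitude and longitude. It uses the standard haversine great-circle formula; whether the engines use this exact formula is an assumption. The function names and the provider records are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby_peers(target, providers, max_miles):
    """Return the names of providers within max_miles of target, excluding target."""
    return [
        p["name"] for p in providers
        if p["name"] != target["name"]
        and haversine_miles(target["lat"], target["lon"],
                            p["lat"], p["lon"]) <= max_miles
    ]


# Hypothetical providers with invented coordinates.
providers = [
    {"name": "Provider X", "lat": 40.0, "lon": -80.0},
    {"name": "Provider Y", "lat": 40.1, "lon": -80.0},  # about 7 miles north of X
    {"name": "Provider Z", "lat": 41.0, "lon": -80.0},  # about 69 miles north of X
]

target = providers[0]
within_20 = nearby_peers(target, providers, 20)
within_100 = nearby_peers(target, providers, 100)
print(within_20, within_100)
```

The radius is a parameter, so the same peer-group routine serves the 20-mile, 50-mile, or any user-selected distance mentioned above.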

Raul Valdes-Perez

Why Comparing Healthcare Providers Needs Automation

I’ve lived for years in the same area of Pittsburgh, whose streets don’t follow a grid design since it’s hilly and pre-dates the automobile. Sometimes before driving to a familiar destination, I’ll check Google Maps, which alerts me to a favored route that I didn’t even know existed. I act on the suggestion which usually turns out great. Is this unique to mapping, or can this happen in other domains of reasoning and discovery? How about healthcare?

Solution Spaces and Artificial Intelligence

Automated mapping helps me discover new routes not because I’m spatially challenged, but because the software explores side streets which motorists like me don’t consider. Instead, motorists tend to consider the larger, familiar streets that head toward their destination. Using Artificial Intelligence (AI) concepts, we say that mapping software searches for solutions within a larger space of possibilities than people do. In chess play, software considers piece sacrifices which none but top players will ever think of. It also occurs in scientific research. This should happen in healthcare, too, where there are huge potential gains for many stakeholders and rich data sets are publicly reported.
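As a toy illustration of searching a larger space of possibilities, the sketch below runs a standard shortest-path search (Dijkstra's algorithm) over two hypothetical road networks: one with only the familiar main streets, and one that also includes a side street. The street names and travel times are invented, and real mapping software is certainly more elaborate.

```python
import heapq


def shortest_time(graph, start, goal):
    """Dijkstra's algorithm: minimum total travel time from start to goal, or None."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if node == goal:
            return elapsed
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry; a faster way to this node was already found
        for neighbor, minutes in graph.get(node, {}).items():
            candidate = elapsed + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


# Hypothetical network of main streets only (edge weights are minutes).
main_streets = {
    "home": {"avenue": 5},
    "avenue": {"boulevard": 10},
    "boulevard": {"office": 5},
}

# The same network plus one side street that a motorist might not consider.
with_side_streets = {
    **main_streets,
    "home": {"avenue": 5, "alley": 3},
    "alley": {"office": 12},
}

familiar = shortest_time(main_streets, "home", "office")
wider = shortest_time(with_side_streets, "home", "office")
print(familiar, wider)
```

The algorithm is identical in both runs; only the space it is allowed to search differs, and the wider space yields the faster route.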

[Continue reading at LinkedIn Pulse …]



Unprecedented Data-Driven Performance Transparency in Healthcare, starting with Nursing Homes

I am proud to announce that, as part of OnlyBoth’s strong focus on healthcare during 2018, we just launched a web-based Nursing Homes benchmarking engine that deeply leverages the latest, rich data on 15,646 nursing homes published in January 2018 by Medicare’s Nursing Home Compare.  The press release is here, and the new “front door” to the engine is at, shown here:

[Image: benchmine]

One of our goals is to bring the ultimate performance transparency to healthcare sectors, leveraging initially the tremendous work done by Medicare’s contractors, nursing-home inspectors, and nursing homes themselves to contribute data for public access.

To further this goal, we have chosen to make the service simple, quick, and especially affordable. To evaluate a single home, users pay a one-time $9 with a credit card or PayPal. To evaluate any of the 15,646 nursing homes, pay $39. To perform queries that go across all nursing homes, pay $99. These payments give access to one quarterly edition of the engine. We expect to create new editions every quarter, using the latest published data. Read here about the features available at different price points, which support various roles within the nursing home industry.

Finally, I’ll emphasize that the benchmarking engine, which discovers comparative insights worth knowing and writes them up in perfect English, without injecting biased opinion anywhere, generates more words in its nursing-home application – around 80 million – within insightful sentences than are contained in the entire Oxford English Dictionary or the Encyclopedia Britannica.

But don’t let that volume scare you. Just as Google’s search engine stores nearly all the world’s web content, but brings you a manageable number of results, worth knowing, that are relevant to your query, so does a benchmarking engine!

Raul Valdes-Perez

Benchmarking the CDC 500 Cities on 28 Health Measures

The CDC’s 500 Cities Project recently published 28 health measures on the 500 largest U.S. cities (see them listed or mapped). The measures cover various resident behaviors, afflictions, medication, and screening. We at OnlyBoth downloaded the data and set up a cities benchmarking engine to answer these standard comparative questions: How is this city doing?, Where could it improve?, and What’s best in class?

Just enter any of the 500 cities at Then click on a left-side question to discover noteworthy peer groups in which your selection is near the top or bottom. Or, set up a fencemarking query and click Go at the bottom to, for example, learn the top insights among all 121 California cities, or to uncover comparatively-high binge drinking there (guess who?).

To appreciate this technology and its simplifying potential to motivate human and customer progress, compare to how standard dashboards have been applied to the 500 Cities data. Or, to understand why dashboards aren’t really up to the task of comparative performance evaluation, check out Why San Mateo Daily Journal Really Doesn’t Like California’s Education Dashboards.

Lastly, if you also wish to benchmark counties, read here.

A sentence is worth 1,000 data.®

Raul Valdes-Perez


Where Does Automated Customer Benchmarking Make Sense?

A customer benchmarking engine is an emerging technology which uses an artificial intelligence approach to automate the reasoning that underlies data-driven benchmarking. Its benefits are discussed here, there, and elsewhere. Briefly, it uncovers comparative insights on customers which empower customer-focused employees to be more proactive, or which are shown directly to those customers as a premium information service. The business benefits include churn reduction, market differentiation, extra revenue, and deeper customer relationships.

But, automated customer benchmarking doesn’t always make sense. So where does it? Here I’ll summarize the criteria that we’ve learned from clients, trials, conferences, discussions, and analysis.

Data. A single organization collects data on its business-customers’ traits, behaviors, business outcomes, and feedback, e.g., via surveys. Lack of data on customer outcomes narrows the scope of the insights, which may still have internal value for account management. Also, the organization should not be contractually prohibited from performing comparative analysis across customers, appropriately anonymized if the resulting insights are to be shown to customers. Evidently, the data shouldn’t be wrong or mostly missing.

Motivation. The organization should be B2B because consumers (B2C) are generally less motivated to improve, because they are less driven by external stakeholders. The same lack of strong motivation may be found if the B2B organization serves very small businesses, which are less prone to carry out performance analysis: if they are tiny but making money, then life is good, and if they’re losing money, there are more-urgent issues to address. Think of your small neighborhood restaurant, for example.

Also, the business process that the organization supports with its services should not be seen as a utility, meaning that customers only care that the service be available when they need it, and little or nothing more. Think of an internet connectivity service, for example.

A strong positive indicator of motivation is when customers themselves ask the vendor organization how they’re doing compared to other customers, where they could improve, etc.

Comparability. In principle, benchmarking only makes sense if the benchmarked entities are comparable. It makes little sense to benchmark an elephant against an armchair and an airplane. Comparable doesn’t mean identical or even similar, just productively worthy of comparison. For example, a business consultancy that brings the smartest people in the world to fix whatever problem you have, whether it’s a leaky roof, runny nose, or buggy software, won’t have comparable customers. An HR SaaS company does have comparable customers, even if its customers range from the Fortune 500 to startups and in between, because HR has common elements across companies of any size or industry: employee motivation, compensation, tenure, promotion, recruiting, dismissal, etc. Comparability is a judgment call, but most B2B vendor organizations do have comparable customers, otherwise it would be hard for them to scale their business.

Scale. A customer benchmarking engine is a powerful tool that scales beautifully with the number of customers. But, just as a search engine is probably overkill if you only possess 50 documents, or a receptionist is overkill if you have 5 employees, benchmarking 50 customers likely isn’t worth the trouble, even though the engine will do its job. Given the tradeoffs, we believe that about 150 is the right minimum number of customers for automated benchmarking to make sense.

It’s worth citing some false disqualifiers which are wrongly believed to invalidate customer benchmarking, automated or not. (1) Customers need not be concentrated by industry or segment, much less be competitors, since one is benchmarking the customer’s business process that is supported by the vendor organization’s service, not benchmarking the customer’s overall market performance. (2) The data suitable for benchmarking is rarely scarce. For example, if a given metric (employee satisfaction, say) is potentially insightful, then so is the quarterly change in that metric, since it expresses a trend. Ditto for the change when compared to the same quarter last year. Thus, the insightful metrics are easily tripled, based on changes over time, as we’ve discussed elsewhere. (3) Data need not be perfect; it never is. And the end-result of imperfect data is not a plane crash, but a misleading insight, which tends to be caught and discarded before significant action is undertaken.
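Point (2) above can be made concrete with a small sketch that derives the two change metrics from a single quarterly series, so one metric becomes three. The function name, the field names, and the sample satisfaction scores are hypothetical and serve only as illustration.

```python
def with_change_metrics(quarterly_values):
    """Expand one quarterly metric (oldest value first) into three:
    the value itself, its change versus the prior quarter, and its
    change versus the same quarter one year earlier.

    A change is None when there is not enough history to compute it.
    """
    rows = []
    for i, value in enumerate(quarterly_values):
        rows.append({
            "value": value,
            "quarterly_change": value - quarterly_values[i - 1] if i >= 1 else None,
            "yearly_change": value - quarterly_values[i - 4] if i >= 4 else None,
        })
    return rows


# Hypothetical employee-satisfaction scores for five consecutive quarters.
satisfaction = [70, 72, 71, 75, 78]
rows = with_change_metrics(satisfaction)
latest = rows[-1]
print(latest)
```

Each derived column can then be benchmarked across customers exactly like the original metric.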

Now let’s summarize the four qualifiers (data, motivation, comparability, and scale) in a single brief sentence:  A customer benchmarking engine makes sense for B2B organizations that generate rich data on their 150+ non-tiny customers as a by-product of their non-utility-like, repeatable service.

Who are these organizations?  B2B SaaS (software as a service), Industrial Internet of Things, BPO (business process outsourcing), Managed Services Providers, and 3rd-Party Administrators are generally good matches if they fit the other criteria.

Automation doesn’t always make business sense, especially when the enabling technology lies outside one’s own organization, which circumstance always involves a coordination cost. But automation scales well and can enable things or insights that don’t yet exist. Apart from the benefits discussed elsewhere, this article shares what we’ve learned about where the emerging technology of customer benchmarking engines makes sense.

Raul Valdes-Perez

Not Having Strategic Conversations with Customers? Here’s How to Fix That

I keep hearing that “our Customer Success Managers need to have more strategic conversations with our customers.” I’ve heard this in numerous discussions with many customer success executives over the last year. And more recently, I continued to hear the same thing while attending customer success conferences and meetups in San Francisco, Toronto, Denver and Pittsburgh. This is a big problem. If CSMs aren’t having strategic conversations with key customers, it will be very difficult to proactively move their accounts forward in any significant way. This keeps us all up at night.

I want to share a technique that customer success executives can use to help CSMs be equipped for strategic conversations with key accounts. For years, I worked with newer product managers to help them think more strategically. The technique I used was to ask three key questions that pushed them to seek the fundamental knowledge and insights they need for effective strategy discussions.  Below, I’ll share and unpack these questions that I’ve adapted for customer success managers.

Three questions CSMs need to answer to engage in strategic customer conversations

1.  Where is your customer going?

Strategy is what one intends to do to move from one’s current state to a desired future state.  Therefore, a CSM’s readiness for strategic conversations must begin with knowledge of a customer’s objectives and goals. If a CSM doesn’t know where a customer is going, how can they have any hope of engaging that customer strategically? When your CSMs know this answer – down to the level of the organizational unit that purchased your solutions – they can add value in several ways: (a) focusing attention on actions that will have the most impact on a customer’s goals; (b) providing meaningful guidance to a customer; (c) evaluating a customer’s progress; and (d) demonstrating the value they’ve realized.  When a customer is lacking effective goals, a CSM can help them. I discussed how a CSM can do this in a recent blog post.

2.  How will your customer get there?

I’d want the CSM to understand three items: (a) the primary approach a customer is taking to get to their desired future state; (b) the initiatives the customer will take to execute it and when; and (c) the role our solutions play in this. This knowledge of their customer’s strategy is important for several value-adding tasks including:

  • Identifying opportunities to add value with your other existing products, services and programs
  • Anticipating and prioritizing unmet customer needs
  • Aligning the roadmaps for products, services and programs with the needs of key accounts
  • Relating the actual use of your products, modules and features to the customer’s desired outcomes
  • Sharing best practices and lessons learned from other customers with similar strategies

3.  Where is your customer today?

The current state of the customer is often the focus of a CSM. But, for this question, I’d want to know if a CSM understands how a customer is doing versus their goals and versus benchmarks created from the results achieved by other similar customers. More specifically, do they know (a) Where is the customer doing well? (b) Where could they improve? (c) What has changed? (d) Where have we helped them effectively? (e) Where have we held them back? When a CSM has these comparative insights, they have the basis for adding value in several more ways including: prompting exploration of the reasons behind the results; sharing potential remedies known to work with other customers; and demonstrating the additional value a customer has realized with a solution.

Make the questions a regular activity

Strategic conversations with customers begin with understanding the strategic context of a customer. If your CSMs need to have more strategic customer conversations, try asking these questions on a regular basis in your internal customer review meetings. Because of your interest, your CSMs will put more emphasis on getting this information and understanding it. Plus, you’ll gain opportunities to coach and develop their strategic thinking skills.  Ultimately, when your customers and Customer Success Managers are talking strategically, customers will move closer to their goals and you’ll move closer to your goals.

Jim Berardone

Why San Mateo Daily Journal Really Doesn’t Like California’s Education Dashboards

In an editorial on March 22, the San Mateo Daily Journal called the new California School Dashboards “problematic”, “confusing”, and “useless”. It’s worth examining their perceptive reasoning, worthy of Silicon Valley, by reading the editorial, but the crux was this:

A dashboard in essence provides useful data, how fast you are going, what level your fuel is, if your engine is running hot, what your revolutions per minute are and if you need to check your engine. Based on your situation, one indicator may be more meaningful for you than others, but it is an apt description for a variety of indicators based on levels of data.

A car’s dashboard alerts to simple problems that call for immediate action:  slow down, get gas, or head to the mechanic. Dashboards were not designed to evaluate a driver’s or car’s ongoing performance, persuade that there’s a problem over a longer time scale, or motivate improvements, much less to compare performance to that of others. Making dashboards serve such goals leads to the editorial’s remark that “… how the information is presented is problematic and even the icons for performance levels are initially confusing.”

Overall, the editorial concludes that the California School dashboards don’t serve the goal of comparative evaluation, aka benchmarking:

The dashboard as it stands right now is fairly useless, and that could change once more information fills in. But the template also seems fairly poor as something parents may be able to use to see how their school is doing compared to other schools in the district or even other districts.

Except in narrow cases like a car running out of gas, we humans are best enlightened, persuaded, and motivated to act by language, which is why Daily Journals write editorials, and I write this post, rather than put up dashboards. Language deals quite well with quantitative information.

Raul Valdes-Perez