So much has been written recently on the scaling of myspace.com to 100 million users that I thought it would be interesting to summarize how MedCommons supports 100 million or more users. After all, we are hoping to have this very problem shortly! Not surprisingly, some of the techniques are quite similar, but MedCommons needs to support more complex interactions with users, devices, and other computer networks than myspace currently offers.
Don’t let the discussion of myspace fool you. I don’t have an account, and I don’t have any opinion on whether you should have one. But someday all those myspace users will want a MedCommons account, for sure. Here’s a goofy picture of a stack of servers, but that is just what we are talking about for each MedCommons instance.
Federation of Micro-Services
MedCommons encourages the federation of micro-services, allowing different specialist groups to run their own network of machines, patients, and providers regardless of physical hospital location. This is a great scalability boon, as it distributes the overall load onto a larger collection of sites. For example, HeartSurgeons, Inc. provides all the cardiology services for Hospitals A, B, and C. This includes surgeons, assistants, equipment, IT services, and possibly patients. DialysisPartners does the same at a different, possibly overlapping collection of hospitals. Each operates its own equipment, but a MedCommons patient will have an account that is universally accessible.
Federation of Identity
Through support for the SAML and WS-Trust protocols, separate and independent Identity Providers can connect to disparate MedCommons service providers. This is highly scalable in principle, and there exists a collection of very large Identity Providers who have already solved their own internal scaling problems. However, there is a speed bump: in practice, legal agreements need to be in place between two organizations before they can interoperate.
Markle/Connecting for Health Interoperability
The Markle Foundation has sponsored a series of healthcare policies, frameworks, and prototype implementations of an architecture for sharing information across disparate systems. The Wal-Mart-driven Dossia effort will support the PHR interconnection architecture specified therein, as will MedCommons and a number of other projects.
The Markle specifications state specifically that there is to be only a single connection point to the network, and all communications between separate systems must go through this point. The prospect of trying to arrange one single connection point for all MedCommons licensees is a bit silly. So each MedCommons instance will function as a separate SNO (Sub-Network Organization) when interoperating according to these guidelines.
Horizontal Across PHR Accounts
Far and away the most useful technique for the MedCommons PHR application is to partition "accounts" or "users" across a collection of separate systems, and to construct a redirection mechanism that gets any user connected to the correct system whenever service is requested. Partitioning might be done according to user account id (as myspace does), but for now the first partitioning for MedCommons will be by licensee. That is, each licensee runs its own MedCommons instance, on servers that are committed to just that task. The separate licensees interoperate like current ATM networks: every licensee will honor every other licensee's cards.
These separate MedCommons partitions can run on separate servers in the same data center, or in disparate, international centers, according to the wishes of the licensees, but all interoperate. Since we can handle any number of separate licensees independently running separate MedCommons instances, the next scaling issue is the maximum size of instance that we might support. A very large MedCommons licensee, like a bank, might operate several separate MedCommons instances as it expands the customer base to which it supplies PHRs.
How large an instance can we reasonably handle on completely stock hardware (e.g., Amazon EC2 servers)? How large an instance can we reasonably handle on a custom, self-hosted assemblage of off-the-shelf components? Each of these can be measured without too much effort. This sets the size of the sub-partition in terms of the number of users per instance, and also sets a price per user.
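The partition-and-redirect idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the MedCommons redirector itself: the class name `Redirector`, the account id format, and the `.example` URLs are all hypothetical. It shows accounts being partitioned by licensee, with a lookup that sends any account to the instance that holds it.

```python
from dataclasses import dataclass, field


@dataclass
class Redirector:
    """Hypothetical redirection table: account -> licensee -> instance URL."""

    # licensee name -> base URL of that licensee's instance (made-up URLs)
    instances: dict = field(default_factory=dict)
    # account id -> name of the licensee that issued the account
    accounts: dict = field(default_factory=dict)

    def register_instance(self, licensee, base_url):
        self.instances[licensee] = base_url

    def register_account(self, account_id, licensee):
        # An account can only live on an instance we already know about.
        if licensee not in self.instances:
            raise KeyError(f"unknown licensee: {licensee}")
        self.accounts[account_id] = licensee

    def locate(self, account_id):
        """Return the base URL of the instance that holds this account."""
        return self.instances[self.accounts[account_id]]


redirector = Redirector()
redirector.register_instance("HeartSurgeons", "https://heartsurgeons.example")
redirector.register_instance("DialysisPartners", "https://dialysispartners.example")
redirector.register_account("1000-0001", "HeartSurgeons")
redirector.register_account("2000-0001", "DialysisPartners")

# Any entry point can ask where an account lives and redirect the user there.
print(redirector.locate("2000-0001"))
```

Partitioning by account id range instead of by licensee would change only how `locate` picks the instance; the rest of the lookup stays the same.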
Vertical Within a MedCommons Instance
The next set of techniques works within a solitary MedCommons instance. The architecture of each instance surrounds a database with a local array of application servers, a local or remote array of gateway servers, and a very large collection of CXP and HTTP devices supplying small updates to users' personal health records. Servers can be added to these arrays as necessary to increase CPU capacity or to add more user storage caching:
• We can add more application servers, until either the round-robin capacity of our server farm is exceeded, or we exceed the number of permitted database connections into the solitary database.
• We can add more gateway servers to increase storage and handle more devices as we see fit. Since these connect dynamically, they will just see performance degradation when we overload the capacity of the central system's application servers or database.
• We can increase the transaction processing capacity of the database in a variety of ways, including adding more hardware (more CPUs and memory, more disks, better storage) and improving software (faster database software, smarter coding of MedCommons software).
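The first limit in the list above is simple arithmetic, sketched here. Every number is a made-up placeholder rather than a MedCommons measurement, and the function name and the assumption that each application server holds a fixed-size connection pool are ours.

```python
def max_app_servers(db_max_connections, pool_size_per_server, balancer_limit):
    """Upper bound on application servers for one instance.

    Assumes each application server holds a fixed pool of database
    connections and that the round-robin front end can address at most
    `balancer_limit` servers. Whichever limit is hit first wins.
    """
    by_database = db_max_connections // pool_size_per_server
    return min(by_database, balancer_limit)


# Hypothetical figures: 500 permitted connections, 20 per server,
# and a balancer that can address 32 servers.
limit = max_app_servers(db_max_connections=500,
                        pool_size_per_server=20,
                        balancer_limit=32)
print(limit)
```

With these placeholder figures the database is the binding constraint, so shrinking the per-server pool would buy more headroom than a bigger balancer.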
MedCommons is currently configuring a system to support up to 100K user accounts that functions as a real or virtual appliance - when plugged in, it brings up a useful MedCommons instance that can be customized or used as is.
Finally, we need to consider whether the coordination service run by MedCommons to handle the redirections between instances is a performance limitation. As currently specified, each MedCommons instance interacts with the coordinator once for every 10,000 tracking numbers and once for every 1,000 new accounts, so this load is tightly bounded. What is unpredictable is how often the redirector service will really need to be invoked, but the answer, for now, is probably rarely. And, if necessary, a custom server could be written to handle the redirections, relying on large in-memory tables and high-speed direct socket connections, rather than the layered, slower approach taken by the current software.
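The bounded coordinator load described above follows from handing out identifiers in blocks. The sketch below is an assumption about how such block allocation could work, not the actual MedCommons coordinator protocol: the `Coordinator` and `Instance` classes and their methods are hypothetical, and only the block sizes (10,000 tracking numbers, 1,000 accounts) come from the text.

```python
class Coordinator:
    """Hypothetical central service that hands out disjoint id ranges."""

    def __init__(self):
        self._next = {"tracking": 0, "account": 0}
        self.calls = 0  # how many times any instance contacted us

    def allocate(self, kind, size):
        self.calls += 1
        start = self._next[kind]
        self._next[kind] = start + size
        return range(start, start + size)


class Instance:
    """Hypothetical MedCommons instance that consumes ids locally."""

    # Block sizes taken from the text above.
    BLOCK_SIZES = {"tracking": 10_000, "account": 1_000}

    def __init__(self, coordinator):
        self.coordinator = coordinator
        # Start with empty blocks so the first request triggers an allocation.
        self._blocks = {kind: iter(()) for kind in self.BLOCK_SIZES}

    def next_id(self, kind):
        try:
            return next(self._blocks[kind])
        except StopIteration:
            # Local block exhausted: this is the only coordinator round trip.
            block = self.coordinator.allocate(kind, self.BLOCK_SIZES[kind])
            self._blocks[kind] = iter(block)
            return next(self._blocks[kind])


coordinator = Coordinator()
a, b = Instance(coordinator), Instance(coordinator)

issued = [a.next_id("tracking") for _ in range(25_000)]
first_b = b.next_id("tracking")
print(coordinator.calls, first_b)
```

In this sketch, issuing 25,000 tracking numbers on one instance costs three coordinator calls, and a second instance never receives an id that the first could also hand out, because each block is reserved for a single instance.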