Thursday, February 03, 2011

SP2010: Do we need to load balance the Topology Service Application?

This post is dedicated to the question of whether you need to load balance the Topology service app in a multi-farm scenario, where one farm consumes services published by another farm.
I will reference the various blog posts on this subject and show you a detailed diagram that outlines part of the architecture.
The discussion I would like to trigger here is twofold:
  • First, do we need to load balance the Topology service to achieve higher availability?
  • Second, I have seen contradictions between blog articles. Some people claim that the Topology service is responsible for load balancing across service app instances, whereas others claim that the endpoint URLs are stored in the cache of the service app proxies on the consuming farm.
Now let me stop throwing terms around and start at the beginning by describing a common real-world scenario.

Real-world scenario: cross-farm services

Assume you want to centralize your Search, User Profiles and corporate taxonomies. What would you do? Maybe you are planning to host several farms: a dedicated farm for your intranet, a specialized farm where your departments can launch applications, and a commodity farm for basic, standard team sites. To allow these farms to share services, you will need to introduce another farm, often referred to as the Enterprise Services farm.
Imagine you have created an Enterprise Services farm with two application servers, and take the Managed Metadata service app as an example. Typically you will have a service instance running on each application server. The Topology service app publishes both URIs to each consuming farm, and the consuming farms then push these URIs into the URI cache of each service app proxy. This may sound complicated, but the diagram below should make it easier to understand.

Example: detailed overview of Shared Services



A few steps are required before you can consume services across farms: first you set up a trust between the farms, then you publish the services.
After you have done that, you can consume the services on the consuming farm.

To illustrate: if a web part makes a call to the Managed Metadata service app, it uses the service app proxy to determine the endpoint of the service app. The first time, the service app proxy retrieves the URIs from the Topology service app on the publishing farm. After that, it caches the endpoints in its own URI cache. The proxy's own load balancing component distributes calls across both URIs; if one of them does not respond, that URI is taken offline.
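To make the caching and failover behaviour concrete, here is a minimal Python model of a service app proxy. It is a sketch under stated assumptions, not SharePoint code: the class name `ServiceAppProxySketch`, the `fetch_from_topology` and `send` callbacks, the round-robin policy and the URLs are all hypothetical, and the real proxy's load balancer may behave differently.

```python
from typing import Callable, List, Set


class ServiceAppProxySketch:
    """Hypothetical model of a service app proxy on the consuming farm."""

    def __init__(self, fetch_from_topology: Callable[[], List[str]]):
        # Callback standing in for the call to the publishing farm's Topology service.
        self._fetch_from_topology = fetch_from_topology
        self._cache: List[str] = []       # the proxy's own URI cache
        self._offline: Set[str] = set()   # URIs taken offline after a failure
        self._next = 0                    # round-robin position (assumed policy)
        self.topology_calls = 0

    def _online_endpoints(self) -> List[str]:
        if not self._cache:
            # Only while the cache is empty: ask the Topology service and cache the result.
            self.topology_calls += 1
            self._cache = list(self._fetch_from_topology())
        return [url for url in self._cache if url not in self._offline]

    def call(self, send: Callable[[str], str]) -> str:
        """Send one request, trying each online endpoint at most once."""
        candidates = self._online_endpoints()
        for _ in range(len(candidates)):
            url = candidates[self._next % len(candidates)]
            self._next += 1
            try:
                return send(url)
            except ConnectionError:
                # No response: take this URI offline and try the next one.
                self._offline.add(url)
        raise ConnectionError("no online endpoints left in the URI cache")


proxy = ServiceAppProxySketch(lambda: ["https://app1/mms", "https://app2/mms"])
print([proxy.call(lambda url: url) for _ in range(4)])
print(proxy.topology_calls)
```

With two endpoints, consecutive calls alternate between them, and the Topology service is contacted only once, to fill the empty cache; every later call goes straight to a cached URI.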

In addition, a timer job called Application Addresses Refresh calls the Topology service app on the publishing farm every 15 minutes. On the publishing farm, the Topology service app determines which endpoint URLs are available and returns them to the consuming farm. On the consuming farm, these URLs are pushed to the local service app proxies if they have been updated or deleted.
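The refresh job can be modelled as a function that takes a proxy's current URI cache and returns the cache after one run. The function names, the `ask_topology` callback and the choice to keep the old cache when the Topology service is unreachable are assumptions for illustration, not documented SharePoint internals.

```python
from typing import Callable, List


def refresh_uri_cache(cached: List[str], ask_topology: Callable[[], List[str]]) -> List[str]:
    """One hypothetical run of the Application Addresses Refresh job.

    Returns the URI list the proxy should hold after the run.
    """
    try:
        # Ask the publishing farm's Topology service which endpoints are available.
        published = list(ask_topology())
    except ConnectionError:
        # Topology service unreachable: nothing can be updated, keep the old cache.
        return cached
    if published != cached:
        # Endpoints were updated or deleted on the publishing farm: push the new list.
        return published
    return cached


def topology_down() -> List[str]:
    raise ConnectionError("Topology service not reachable")


cache = ["https://app1/mms", "https://app2/mms"]
cache = refresh_uri_cache(cache, lambda: ["https://app2/mms"])  # app1 was removed
cache = refresh_uri_cache(cache, topology_down)                 # outage: cache unchanged
print(cache)
```

In this model, a failed refresh only means the cache goes stale; it does not empty it.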

The Contradictions

If you have read other blog posts on this subject, you might already be wondering whether I am telling you the truth. The fact is, I have stumbled upon some contradictions.

The post written by Russ Maxwell (http://blogs.msdn.com/b/russmax/archive/2010/05/06/sharepoint-2010-shared-service-architecture-part-2.aspx) shows that after the proxies have cached the URIs locally, they connect directly to the service apps without first requesting the URIs through the Topology service app.

However, some posts, such as the one from Steve Peschka (http://blogs.technet.com/b/speschka/archive/2011/01/04/additional-info-on-load-balancing-the-sharepoint-2010-topology-service.aspx) and the one from Josh Gave (http://blogs.msdn.com/b/besidethepoint/archive/2010/12/08/load-balancing-the-sharepoint-2010-topology-service.aspx), make it appear as if the Topology service is called every time a service app proxy needs to connect to the publishing farm. If that were the case, we might have a single point of failure!

Single point of failure

The Topology service is often published using a server's NetBIOS name and a port number (32844, I believe). Because that URL points at a single server, this would indeed mean that we have a single point of failure.

Now, what happens if this service app goes down? In my opinion, it means that the Application Addresses Refresh job will no longer be able to update the URI endpoints. Second, you would no longer be able to publish new service apps to consuming farms. Finally, on the consuming farm the service apps would keep running fine, because the proxies fetch the service app endpoint URLs from their own URI cache. The proxy's internal load balancing component would detect when an endpoint is down and take it offline automatically.
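The reasoning above can be expressed as a small Python model in which only the refresh job and connecting to newly published service apps depend on the Topology service, while ordinary calls use the proxy's URI cache. This encodes the hypothesis of this post, not verified SharePoint behaviour; the class, method names and URLs are hypothetical, and for brevity the proxy simply picks the first healthy endpoint instead of load balancing.

```python
from typing import List, Set


class ConsumingFarmSketch:
    """Hypothetical model of a consuming farm during a Topology service outage."""

    def __init__(self, cache: List[str]):
        self.cache = list(cache)          # URIs from the last successful refresh
        self.offline: Set[str] = set()    # endpoints the proxy has taken offline

    def refresh(self, topology_up: bool, published: List[str]) -> bool:
        # Application Addresses Refresh job: needs the Topology service.
        if not topology_up:
            return False                  # cache stays exactly as it was
        self.cache = list(published)
        return True

    def connect_new_service_app(self, topology_up: bool) -> bool:
        # Consuming a newly published service app also needs the Topology service.
        return topology_up

    def call(self, healthy: Set[str]) -> str:
        # Normal requests never touch the Topology service.
        for url in self.cache:
            if url in self.offline:
                continue
            if url in healthy:
                return url
            self.offline.add(url)         # no response: take the endpoint offline
        raise ConnectionError("all cached endpoints are offline")


farm = ConsumingFarmSketch(["https://app1/mms", "https://app2/mms"])
print(farm.refresh(topology_up=False, published=["https://app3/mms"]))
print(farm.connect_new_service_app(topology_up=False))
print(farm.call(healthy={"https://app2/mms"}))  # app1 is down, app2 still answers
```

Under these assumptions, the outage only shows up as a stale cache and as the inability to connect new service apps; requests keep being served as long as at least one cached endpoint responds.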

Summary
What does this all mean? In my opinion, if the Topology service app on the publishing farm goes down, you would still have some time to fix the issue and bring it back online. I do not think the consumed service apps will be interrupted, although I must admit I haven't tried it out myself yet.

Your input here!

The question is: do we need to load balance the Topology service app on the publishing farm? Personally, I would say yes if your farm offers services globally 24x7; you probably do not want to get out of bed when things go wrong. On the other hand, it all depends on your service level, and the decision is up to you.
Also, the contradiction remains: is the Topology service called every time by the service app proxies on the consuming farm, or do the proxies rely on their URI cache?

Then again, I may have it all wrong! ;-) To be on the safe side, I will test bringing down the Topology service and take a close look at the ULS log, filtering on category = Topology and message = WcfSendrequest.
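For a test like this, a small script can pull the relevant entries out of a ULS log file. This sketch assumes the usual tab-separated ULS format whose first row names the columns (Timestamp, Process, TID, Area, Category, EventID, Level, Message, Correlation), possibly padded with spaces; the function name and the sample lines are hypothetical, and the message match is case-insensitive because the exact spelling of `WcfSendrequest` in the log is not confirmed here.

```python
import csv
import io
from typing import Dict, List, TextIO


def topology_wcf_entries(log: TextIO) -> List[Dict[str, str]]:
    """Return ULS entries with Category 'Topology' whose message mentions WcfSendrequest."""
    # QUOTE_NONE: ULS messages may contain quotes that are not CSV quoting.
    reader = csv.reader(log, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.strip() for name in next(reader)]  # column names are space-padded
    hits = []
    for row in reader:
        entry = dict(zip(header, (field.strip() for field in row)))
        category = entry.get("Category", "")
        message = entry.get("Message", "").lower()
        if category.lower() == "topology" and "wcfsendrequest" in message:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    sample = (
        "Timestamp    \tProcess\tTID\tArea\tCategory    \tEventID\tLevel\tMessage\tCorrelation\n"
        "02/03/2011 10:00:00.00\tw3wp.exe\t0x1\tSharePoint Foundation\tTopology\tabcd\tMedium\tWcfSendRequest: ...\t\n"
        "02/03/2011 10:00:01.00\tw3wp.exe\t0x1\tSharePoint Foundation\tGeneral\tefgh\tMedium\tsomething else\t\n"
    )
    for entry in topology_wcf_entries(io.StringIO(sample)):
        print(entry["Timestamp"], entry["Message"])
```

Counting such entries per minute before and during a Topology service outage would show whether the proxies keep calling it for every request.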

I have written this blog post to get your input on this! Did you try this out yourself? What are your experiences? Input is more than welcome. Thanks!

3 comments:

  1. This comment has been removed by the author.

  2. Hey Servé - thanks for your comments on my post and for alerting me to the addresses refresh job. I've done a ton of research and verified that in fact you're correct and the Topology Service doesn't need to be load-balanced. This has finally been confirmed internally at Microsoft as well.
    Please see my new post at http://blogs.msdn.com/b/besidethepoint/archive/2011/02/19/how-i-learned-to-stop-worrying-and-love-the-sharepoint-topology-service.aspx for a deep dive into how it all works.
    Thanks again!
    ~Josh

  3. Serve, thanks for the post. My client is in this exact situation. We are fleshing out a Business Continuity Plan and are trying to pull off an automatic failover with cross-farm services. They are trying to load balance the URL so we can just switch over when one service goes down. Great info; it helped me explain this process and get them to understand it. We are testing this now. Will let you know what we find.
    Jason
