Communication between appdomains

From Sense/Net Wiki

Overview

Any application (a web node in an NLB cluster, or an external tool) that uses the same Content Repository runs in a separate application domain with its own index and its own cache memory. When several of them are running simultaneously, they need to inform each other about the changes any of them makes to the Content Repository. This communication is done via MSMQ (Microsoft Message Queuing), a message queue service integrated into Windows operating systems.

Details

Why we need messaging

The different application instances (appdomains) that connect to a Content Repository each have their own index and cache memory, which they rely on when serving requests or executing queries. If more than one application instance connects to the same Content Repository (for example, two web nodes handling requests in an NLB configuration, or an Active Directory synchronization scheduled to run periodically while the portal is running), MSMQ has to be set up and configured for all instances so that each of them is notified about changes made to the Content Repository by any other instance. For example, if an Active Directory synchronization is running in the background and MSMQ is not configured correctly, the newly created users will not appear on the portal until you restart the system. If MSMQ is configured, you can see the changes immediately by refreshing the page that lists the portal users, since the AD sync tool notifies the web instance about its changes via MSMQ.

Basic concepts

Whenever an appdomain makes a change in the Content Repository, or needs to communicate with another appdomain for any other reason (for example, to ask a running web instance to back up its index; see Backup Tool), it sends a message to all of the configured outgoing queues. Every appdomain needs its own configured private MSMQ queue. A message carries all the information the receiver needs to execute the required task. Messages remain in the queues for a limited time (see MessageRetentionTime in the Configuration section below). All other instances periodically check their incoming queues and process every message they find there.

One local queue per appdomain

Every appdomain needs its own configured local private MSMQ queue for incoming messages, created on the same machine where the appdomain is running. This means that if, for example, an Active Directory synchronization tool runs on the same machine as a web node, at least two queues need to be set up: one for the web node and one for the AD sync tool. When an appdomain sends a message, it sends it to the incoming queues of all other appdomains, but not to its own incoming queue. To process incoming messages, every appdomain reads its own incoming queue, so an appdomain uses different queues for sending and receiving. This setup enables fast message processing and supports high message loads.
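The send/receive pattern described above can be modelled in a few lines. The following is a minimal in-memory sketch, assuming plain Python deques as stand-ins for the private MSMQ queues; the AppDomain class and its methods are hypothetical and are not part of the Sense/Net API.

```python
from __future__ import annotations

from collections import deque


class AppDomain:
    """Hypothetical stand-in for one appdomain and its local incoming queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.incoming: deque[str] = deque()  # models the local private MSMQ queue
        self.processed: list[str] = []

    def send(self, message: str, cluster: list[AppDomain]) -> None:
        # Deliver to every other appdomain's incoming queue, never to our own.
        for other in cluster:
            if other is not self:
                other.incoming.append(message)

    def receive_all(self) -> None:
        # Read only from our own (local) queue and process every message found.
        while self.incoming:
            self.processed.append(self.incoming.popleft())


web1, web2, adsync = AppDomain("web1"), AppDomain("web2"), AppDomain("adsync")
cluster = [web1, web2, adsync]

adsync.send("user created: jdoe", cluster)
for domain in cluster:
    domain.receive_all()

print(web1.processed)    # ['user created: jdoe']
print(web2.processed)    # ['user created: jdoe']
print(adsync.processed)  # []
```

The sender never sees its own message, and every reader touches only its own queue. This is the property that lets each queue stay local to its reader.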

Why not use a single global queue?

The essence of the multi-queue setup is that an appdomain always reads messages from a local queue, because reading from remote queues can be slow. Sending, in contrast, is fast regardless of whether the target queue is local or remote. In a high-availability setup, a single global queue would therefore force every appdomain not running on the queue's host machine to read from a remote queue. This would result in lower performance than a setup where every appdomain reads from a local queue.

Please note that the one-local-queue-per-appdomain concept was introduced in Sense/Net 6.1. Versions prior to 6.1 should use a single global queue for all appdomains.

Messages and stopped appdomains

Stopped appdomains do not process messages; processing begins when they are started. Messages that have already disappeared from the queue before the appdomain starts are never processed. This model works well in most cases, because a recently started appdomain always has a fresh cache loaded from the up-to-date Content Repository. Indexing messages are an exception, however, since the index of a stopped portal may be out of sync with the database when it starts again. For this reason indexing messages are persisted to the database and processed at startup, so by the time the started appdomain receives its first message, its index is already up to date.
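The startup behaviour described above can be sketched as follows. This is a simplified model. It assumes that each queued message carries its send time and that the persisted indexing activities are available as a plain list. The names Message, start_appdomain and persisted_indexing are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    payload: str
    sent_at: float  # send time in seconds (hypothetical field)


def start_appdomain(queue: list[Message], persisted_indexing: list[str],
                    now: float, retention: float = 10.0) -> list[str]:
    """Return the work a (re)started appdomain performs, in order."""
    handled: list[str] = []
    # Indexing activities are persisted to the database, so they are replayed
    # first, bringing the index up to date before any message is processed.
    handled.extend(f"index {activity}" for activity in persisted_indexing)
    # Messages older than the retention time would already have been removed
    # from the queue, so they are skipped; only the remaining ones are processed.
    for msg in queue:
        if now - msg.sent_at <= retention:
            handled.append(msg.payload)
    return handled


queue = [Message("invalidate node 42", sent_at=0.0),
         Message("invalidate node 43", sent_at=95.0)]
print(start_appdomain(queue, ["add document 7"], now=100.0))
# ['index add document 7', 'invalidate node 43']
```

In this sketch the queue only holds cache-invalidation messages. Indexing work comes solely from the persisted list, which mirrors the split described in the paragraph above.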

Installing MSMQ

MSMQ can be installed by turning it on in the Windows Features dialog, accessible from the Control Panel. Select the following features:

  • Microsoft Message Queue (MSMQ) Server
  • Microsoft Message Queue (MSMQ) Server Core
  • MSMQ Active Directory Domain Services Integration

Setting up queues

If MSMQ is installed, you can manage queues in the Computer Management console, which you can open by right-clicking Computer and selecting Manage. Queue containers are located under Computer Management / Services and Applications / Message Queuing. For Sense/Net you can set up either private or public queues (private queues are recommended). To create a queue, follow these steps:

  • Right-click the container (e.g. Private Queues) and select New / Private Queue.
  • Right-click the newly created queue and select the Properties menu item.
    • On the General tab, make sure that the "Enabled" checkbox in the Journal section is unchecked.
    • On the Security tab, grant Receive Message, Peek Message and Send Message permissions to every user that runs an application pool in the NLB environment or a tool in the system (e.g. Import, Backup).

You can refer to the created queues in the following manner:

  • private queues:
    • .\private$\{queuename} for local computer
    • FormatName:DIRECT=TCP:10.101.2.191\private$\{queuename} for remote computer
  • public queues: {computername}\{queuename}
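The path forms listed above follow a fixed pattern, so they can be generated from a few inputs. Here is a small sketch. The helper name queue_path and the computer name WEBSRV1 are hypothetical, and the function only builds the strings; it does not talk to MSMQ.

```python
from __future__ import annotations


def queue_path(queue_name: str, *, host: str | None = None,
               computer: str | None = None, public: bool = False) -> str:
    """Build a queue path in one of the forms listed above (hypothetical helper)."""
    if public:
        if computer is None:
            raise ValueError("public queues need a computer name")
        return f"{computer}\\{queue_name}"
    if host is None:
        # Private queue on the local computer.
        return f".\\private$\\{queue_name}"
    # Private queue on a remote computer, addressed directly over TCP.
    return f"FormatName:DIRECT=TCP:{host}\\private$\\{queue_name}"


print(queue_path("web1"))                                   # .\private$\web1
print(queue_path("web2", host="192.168.0.2"))               # FormatName:DIRECT=TCP:192.168.0.2\private$\web2
print(queue_path("web1", computer="WEBSRV1", public=True))  # WEBSRV1\web1
```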

Security queues - from version 6.5

If you are using Sense/Net ECM 6.5 or higher, you need to create additional queues for the security component. As with the portal queues, create a security queue for every appdomain and configure them as described in the Configuration section below.

Configuration

MSMQ is not configured in the default Sense/Net installation, and the installation does not require MSMQ to be installed on the server. This means that communication between appdomains is switched off by default, which is ideal for single-node setups where external tools are never executed. If you want to run external tools, or configure more than one web node for Network Load Balancing (NLB), you have to configure MSMQ in the application configuration of every web node and external tool that connects to the same Content Repository. To enable MSMQ, make sure that the web.config and app.config files contain the following lines:

<!--=========== MSMQ configuration ===========-->
<add key="ClusterChannelProvider" value="SenseNet.Communication.Messaging.MsmqChannelProvider, SenseNet.Storage" />
<!-- Msmq queue paths. Provide at least 2 queue paths: first one should be the local queue for receiving messages, the subsequent names should be the remote queues for sending messages. -->
<add key="MsmqChannelQueueName" value=".\private$\ryan;FormatName:DIRECT=TCP:192.168.0.1\private$\ryan" />

Additionally, you have to configure MSMQ for the security component and provide the dedicated security queues in the same way as the portal queues above.

<add key="SecurityMessageProvider" value="SenseNet.Security.Messaging.Msmq.MsmqMessageProvider, SenseNet.Security.Messaging.Msmq" />
<add key="SecurityMsmqChannelQueueName" value=".\private$\security1;.\private$\security2" />
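To check which channel settings an appdomain will pick up, the appSettings entries can be read with any XML parser. The following minimal sketch assumes the entries sit under configuration/appSettings, as in a standard .NET web.config or app.config. The inline XML string is a trimmed, hypothetical example built from the lines above.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Trimmed, hypothetical web.config holding the channel settings shown above.
CONFIG = r"""<configuration>
  <appSettings>
    <add key="ClusterChannelProvider" value="SenseNet.Communication.Messaging.MsmqChannelProvider, SenseNet.Storage" />
    <add key="MsmqChannelQueueName" value=".\private$\ryan;FormatName:DIRECT=TCP:192.168.0.1\private$\ryan" />
    <add key="SecurityMsmqChannelQueueName" value=".\private$\security1;.\private$\security2" />
  </appSettings>
</configuration>"""


def app_settings(xml_text: str) -> dict[str, str]:
    """Collect the key/value pairs of <add> elements under <appSettings>."""
    root = ET.fromstring(xml_text)
    return {add.get("key"): add.get("value")
            for add in root.iterfind("./appSettings/add")}


settings = app_settings(CONFIG)
for key in ("MsmqChannelQueueName", "SecurityMsmqChannelQueueName"):
    # Each value is a ';'-separated list with the local incoming queue first.
    paths = settings[key].split(";")
    print(f"{key}: local={paths[0]} remote={paths[1:]}")
```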

Additional/optional configuration:

    <!-- Message retention time in seconds. Default value: 10 sec. -->
    <!--<add key="MessageRetentionTime" value="10" />-->
    <!-- MsmqReconnectDelay defines the time interval between reconnect attempts (in seconds).  Default value: 30 sec. -->
    <!-- <add key="MsmqReconnectDelay" value="30" /> -->
    <!-- MSMQ test log path. Uncomment this to log indexing tasks sent over MSMQ to this file. The name of the log file should reflect the web node instance, e.g. 1 for the 1st node, 2 for the 2nd node, etc. -->
    <!-- <add key="MsmqLogPath" value="c:\\msmqlog-1-{0}.csv" /> -->
    <!-- Number of clusterchannel message processor threads. Default is 5. -->
    <!--<add key="MessageProcessorThreadCount" value="5" />-->
    <!-- Max number of messages processed by a single clusterchannel message processor thread. Default is 100. -->
    <!--<add key="MessageProcessorThreadMaxMessages" value="100" />-->
    <!-- Number of messages in process queue to trigger delaying of incoming requests. Default is 1000. -->
    <!--<add key="DelayRequestsOnHighMessageCountUpperLimit" value="1000" />-->
    <!-- Number of messages in process queue to switch off delaying of incoming requests. Default is 500. -->
    <!--<add key="DelayRequestsOnHighMessageCountLowerLimit" value="500" />-->
    <!-- Max size (in bytes) of indexdocument that can be sent over MSMQ. Default is 2000000. Larger indexdocuments will be retrieved from db. -->
    <!--<add key="MsmqIndexDocumentSizeLimit" value="2000000" />-->
    <!-- Cluster channel monitor heartbeat interval in seconds. Default: 60, minimum value: 10 -->
    <!--<add key="ClusterChannelMonitorInterval" value="60" />-->
    <!-- Timeout limit for receiving response for channel monitor test messages in seconds. Default: 10, minimum value: 1. -->
    <!--<add key="ClusterChannelMonitorTimeout" value="10" />-->
    <!-- MsmqReconnectDelay defines the time interval between closing and starting the channels (in seconds).  Default value: 1 sec. -->
    <!-- <add key="MsmqReconnectDelay" value="1" /> -->

The configuration values are the following:

  • ClusterChannelProvider: leave this unchanged as written above.
  • MsmqChannelQueueName: set the queue paths for every web node and tool that connects to the Content Repository. The first queue path is the local incoming queue; the subsequent paths are the outgoing queues. Queue paths are separated with ';'. See the Installing MSMQ and Setting up queues sections above on how to install and reference queues.
  • MessageRetentionTime: (optional) message retention time in seconds. Every message has a retention time: when it expires, the message is deleted from the queue, which ensures that the queue cannot fill up. The default value is 10, the minimum value is 2.
  • MsmqLogPath: (optional) path of a log file for investigating sent and received MSMQ messages.
  • MessageProcessorThreadCount: (optional) Number of threads processing/executing tasks of incoming messages. Default is 5.
  • MessageProcessorThreadMaxMessages: (optional) Number of messages one processor thread is allowed to process in a row. Default is 100.
  • DelayRequestsOnHighMessageCountUpperLimit: (optional) Number of unprocessed incoming messages that trigger a delay of incoming request handling. Default is 1000.
  • DelayRequestsOnHighMessageCountLowerLimit: (optional) Number of unprocessed incoming messages that re-enable incoming request handling if it was delayed due to high unprocessed message count. Default is 500.
  • MsmqIndexDocumentSizeLimit: (optional) maximum size (in bytes) of index documents that can be sent over MSMQ. Larger index documents are retrieved from the database. Default is 2000000.
  • ClusterChannelMonitorInterval: (optional) Cluster channel monitor heartbeat interval in seconds. Default: 60, minimum value: 10.
  • ClusterChannelMonitorTimeout: (optional) Timeout limit for receiving response for channel monitor test messages in seconds. Default: 10, minimum value: 1.
  • MsmqReconnectDelay: (optional) defines the time interval between closing and starting the channels (in seconds). Default value: 1 sec (available from version 6.0.7).
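The two DelayRequestsOnHighMessageCount limits form a hysteresis band. Delaying switches on when the backlog reaches the upper limit and switches off only once it falls to the lower limit, so the state does not flap around a single threshold. Below is a sketch of that logic. The class name RequestThrottle is hypothetical, and it assumes that both comparisons are inclusive, which is not specified in this article.

```python
class RequestThrottle:
    """Hypothetical model of the upper/lower limit pair for delaying requests."""

    def __init__(self, upper: int = 1000, lower: int = 500) -> None:
        if lower >= upper:
            raise ValueError("lower limit must be below upper limit")
        self.upper = upper
        self.lower = lower
        self.delaying = False

    def update(self, unprocessed: int) -> bool:
        """Feed the current backlog size; return whether requests are delayed."""
        if not self.delaying and unprocessed >= self.upper:
            self.delaying = True   # backlog too high: start delaying requests
        elif self.delaying and unprocessed <= self.lower:
            self.delaying = False  # backlog drained enough: stop delaying
        return self.delaying


throttle = RequestThrottle()
backlog = [200, 999, 1000, 800, 501, 500, 700]
print([throttle.update(n) for n in backlog])
# [False, False, True, True, True, False, False]
```

Note that a backlog of 800 or 700 yields different results depending on the previous state. That is the point of having two limits instead of one.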

If MSMQ is not configured correctly (the app pool user does not have permissions to the given queue, the queue does not exist, etc.), the portal will fail to start with one of the following error messages:

  • There was an error connecting to queue (queuepath).
  • No queues have been initialized. Please verify you have provided at least 2 queue paths: first for local, the rest for remote queues!
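The MsmqChannelQueueName rule (the first path is local, the rest are remote, and at least two paths are required) can be checked before deploying a configuration. Here is a minimal sketch using a hypothetical split_queue_paths helper. The error text mirrors the second message above, but this is a pre-deployment check, not Sense/Net's own validation code.

```python
from __future__ import annotations


def split_queue_paths(value: str) -> tuple[str, list[str]]:
    """Split an MsmqChannelQueueName value into (local queue, remote queues)."""
    paths = [p.strip() for p in value.split(";") if p.strip()]
    if len(paths) < 2:
        # Mirrors the documented startup error for a single (or empty) path list.
        raise ValueError("No queues have been initialized. Please verify you have "
                         "provided at least 2 queue paths: first for local, "
                         "the rest for remote queues!")
    return paths[0], paths[1:]


local, remote = split_queue_paths(
    r".\private$\web1;.\private$\adsync;FormatName:DIRECT=TCP:192.168.0.2\private$\web2")
print(local)        # .\private$\web1
print(len(remote))  # 2

try:
    split_queue_paths(r".\private$\web1")
except ValueError as err:
    print("rejected:", err)
```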

Connection problems

There are situations when the message queue is unavailable for a short period of time, for example when the host running the MSMQ service is restarted, or when network problems prevent communication between the message queue and the web servers. In these situations changes made on one server will not appear on the others, because the indexing tasks transported by MSMQ are not executed on every server. To handle these situations, Sense/Net ECMS offers a self-healing mechanism and logging that help administrators identify communication problems.

Cluster channel monitor

Cluster channel monitor is a feature that maintains the connection between active NLB nodes. When the monitor perceives a change in the environment (e.g. a channel disappears), it writes the following messages to the log, which administrators can monitor:

  • ChannelStarted: an information level message containing some information about the active channels.
  • ChannelStopped
    • in case of planned shutdown an information level message is logged.
    • in case of an unexpected shutdown a critical error message is logged stating that a channel has stopped.

The communication mechanism between NLB nodes has a two-level error protection described in this section.

First level protection

If an error occurs during a communication action in the cluster channel module, the system tries to rebuild the affected channel. This applies to both the local (receiver) and the remote (sender) endpoints, so errors on this protection level are repaired immediately. If reconnecting fails, further attempts are made until the queue becomes available again. The time interval (in seconds) between connection attempts can be set in the web.config using the MsmqReconnectDelay key (see the Configuration section above).
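The first-level behaviour (retrying until the queue is reachable again, with MsmqReconnectDelay seconds between attempts) corresponds to a simple retry loop. The sketch below assumes a caller-supplied open_channel callable that raises ConnectionError on failure. Both open_channel and the max_attempts cap are hypothetical; the cap only keeps the example finite, while the behaviour described above retries until the queue is back.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect(open_channel: Callable[[], T], reconnect_delay: float,
              max_attempts: int | None = None,
              sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry open_channel until it succeeds, pausing reconnect_delay seconds between tries.

    max_attempts=None means retry forever, matching the
    "until the queue becomes available again" behaviour.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return open_channel()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up and surface the last error
            sleep(reconnect_delay)  # wait MsmqReconnectDelay seconds before retrying


# Usage: a hypothetical channel that fails twice and then opens.
outcomes = iter([False, False, True])


def flaky_open() -> str:
    if not next(outcomes):
        raise ConnectionError("queue unavailable")
    return "channel open"


waits: list[float] = []
print(reconnect(flaky_open, reconnect_delay=1.0, sleep=waits.append))  # channel open
print(waits)  # [1.0, 1.0]
```

The sleep function is injectable so that the loop can be exercised without real waiting.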

Second level protection

There are cases, however, when the error occurs on the 'other side' of a channel. Every channel has two endpoints, but an appdomain can repair only its own endpoints; there is no way to alert the other side if the error happened there. This is where the second level of protection, which we call cluster channel monitoring, comes in. It polls the active channels using a special communication protocol of regularly sent and received messages. Through a series of requests and responses, it can discover communication failures and repair the damaged endpoints. The polling mechanism can be fine-tuned in the configuration file (web.config or app.config) with the following values (all in seconds; see their descriptions in the Configuration section above):

  • ClusterChannelMonitorInterval
  • ClusterChannelMonitorTimeout
  • MsmqReconnectDelay
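The polling idea can be pictured as a heartbeat: a test message is sent over each channel, and a channel whose response does not arrive within the timeout is treated as broken and becomes a candidate for repair. The sketch below models only that decision. The ping callable and the channel names are hypothetical, and Sense/Net's actual request/response exchange involves more than the single round trip shown here.

```python
from __future__ import annotations

import concurrent.futures as cf
import time
from typing import Callable


def check_channels(channels: list[str], ping: Callable[[str], None],
                   timeout: float) -> list[str]:
    """Send a test message over every channel concurrently and return the channels
    whose response did not arrive within `timeout` seconds or that failed outright."""
    broken: list[str] = []
    with cf.ThreadPoolExecutor(max_workers=max(1, len(channels))) as pool:
        futures = {name: pool.submit(ping, name) for name in channels}
        for name, future in futures.items():
            try:
                future.result(timeout=timeout)
            except (cf.TimeoutError, ConnectionError):
                broken.append(name)  # candidate for repairing its endpoint
    return broken


# Hypothetical ping: web2 answers too late, tool is unreachable.
def fake_ping(name: str) -> None:
    if name == "web2":
        time.sleep(0.6)
    elif name == "tool":
        raise ConnectionError(name)


print(check_channels(["web1", "web2", "tool"], fake_ping, timeout=0.2))
# ['web2', 'tool']
```

In a real monitor, a check like this would run every ClusterChannelMonitorInterval seconds, with ClusterChannelMonitorTimeout as the timeout.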

If you know that the message queue will be unavailable for a longer period of time, we recommend stopping all sites in your NLB environment except one until the problem is solved.

Investigating message traffic

You can investigate message traffic by clicking the Queue messages folder under the created message queue in Computer Management. The folder shows the messages that have arrived and have not yet been deleted. Note that an appdomain deletes a message as soon as it processes it. The folder does not refresh itself automatically, so you need to refresh it to see new messages.

Message types

Invalidating manager classes or in-memory structures:

  • ApplicationStorageInvalidateDistributedAction: sent when Application Storage is invalidated and reset. Happens every time something changes under any (apps) folder or with any Application.
  • ResourceManagerResetDistributedAction: sent when Resource Manager is reset. Happens every time a Resource is saved.
  • ContentTypeManagerResetDistributedAction: sent when ContentType Manager is reset. Happens when a new Content Type is installed in the system (or updated).
  • ReloadSiteListDistributedAction: sent every time a site is changed, added, or deleted.
  • ReloadSmartUrlListDistributedAction: sent every time the smart URLs of a page are changed and the page is saved.
  • NodeTypeManagerRestartDistributedAction: sent when NodeType Manager is reset. Happens when a new Content Type is installed in the system (or updated).
  • PermissionEvaluatorResetDistributedAction: sent when permission settings on any node change.
  • DeviceManagerResetDistributedAction: sent when DeviceManager is reset. Happens when any Device is created, changed or deleted.

Invalidating cache:

  • NodeIdDependency/FireChangedDistributedAction: sent every time a node is saved or permission inheritance on the node is broken or unbroken. For List changes, a PathDependency message is sent instead of this one.
  • NodeTypeDependency/FireChangedDistributedAction: sent every time a Node Type is installed (or updated).
  • PathDependency/FireChangedDistributedAction: sent every time the path of a node changes, or permission inheritance on the node is broken or unbroken. Also when Lists change to invalidate subtree.
  • CacheCleanAction: sent when CleanupCache is requested on a page via URL.
  • CleanupNodeCacheAction: not used.
  • PortletChangedAction: sent when properties of a portlet are changed and saved.
  • PortletChangedMessage: not used.
  • BundleCacheInvalidatorDistributedAction: sent when a js or css file changes. Invalidates bundle cache.

Indexing:

  • LuceneActivityDistributor: sent when changes occurred in the index.
  • IndexWriterClosedDistributedAction: sent whenever the IndexWriter has been closed.

Backing up and restoring the index:

  • RequestBackupIndexMessage: sent by Backup Tool to request an index backup operation by a web instance.
  • IndexBackupStartedMessage: sent by web instance to Backup Tool when it starts creating an index backup.
  • IndexBackupFinishedMessage: sent by web instance to Backup Tool when it finishes index backup.
  • RestoreIndexRequestMessage: sent by Backup Tool to request an index restoring operation by every web instance.
  • IndexRestoringStartedMessage: sent by web instance to Backup Tool when it starts restoring operation.
  • IndexRestoringFinishedMessage: sent by web instance to Backup Tool when it finishes the index restoring.
  • IndexRestoringErrorMessage: sent by web instance to Backup Tool when an error occurs during index restoring.

Example/Tutorials

Configuring MSMQ for multiple appdomains

Let's take the following infrastructure:

  • web server 1 on 192.168.0.1
  • web server 2 on 192.168.0.2
  • ad sync tool running on 192.168.0.1
  • third party tool running on 192.168.0.3

This is an NLB setup with 4 different appdomains on 3 different machines. For all appdomains to work correctly, they need to inform each other about changes. Since every appdomain needs its own local queue, the following queues need to be created on the different machines:

  • 192.168.0.1
    • a private queue named web1
    • a private queue named adsync
  • 192.168.0.2
    • a private queue named web2
  • 192.168.0.3
    • a private queue named tool

Once the queues have been created and their permissions set correctly, update the web.config and app.config files of the 4 appdomains. Remember: for every appdomain the first queue path refers to the local incoming queue, and the rest refer to outgoing queues:

  • web server 1
<add key="MsmqChannelQueueName" value=".\private$\web1;.\private$\adsync;FormatName:DIRECT=TCP:192.168.0.2\private$\web2;FormatName:DIRECT=TCP:192.168.0.3\private$\tool" />
  • web server 2
<add key="MsmqChannelQueueName" value=".\private$\web2;FormatName:DIRECT=TCP:192.168.0.1\private$\web1;FormatName:DIRECT=TCP:192.168.0.1\private$\adsync;FormatName:DIRECT=TCP:192.168.0.3\private$\tool" />
  • ad sync
<add key="MsmqChannelQueueName" value=".\private$\adsync;.\private$\web1;FormatName:DIRECT=TCP:192.168.0.2\private$\web2;FormatName:DIRECT=TCP:192.168.0.3\private$\tool" />
  • third party tool
<add key="MsmqChannelQueueName" value=".\private$\tool;FormatName:DIRECT=TCP:192.168.0.1\private$\web1;FormatName:DIRECT=TCP:192.168.0.1\private$\adsync;FormatName:DIRECT=TCP:192.168.0.2\private$\web2" />
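Writing these values by hand is error-prone. Every appdomain needs every other appdomain's queue, addressed with local paths for queues on the same machine and DIRECT=TCP paths for the rest. The following sketch derives the four values above from the infrastructure list. The TOPOLOGY dictionary and the queue_name_setting function are hypothetical.

```python
from __future__ import annotations

# appdomain -> (machine IP, private queue name), taken from the example above
TOPOLOGY = {
    "web server 1": ("192.168.0.1", "web1"),
    "ad sync": ("192.168.0.1", "adsync"),
    "web server 2": ("192.168.0.2", "web2"),
    "third party tool": ("192.168.0.3", "tool"),
}


def queue_name_setting(appdomain: str, topology: dict[str, tuple[str, str]]) -> str:
    """Build the MsmqChannelQueueName value for one appdomain."""
    own_host, own_queue = topology[appdomain]

    def path(host: str, queue: str) -> str:
        # Queues on our own machine are addressed locally, others over TCP.
        if host == own_host:
            return f".\\private$\\{queue}"
        return f"FormatName:DIRECT=TCP:{host}\\private$\\{queue}"

    # First entry: our own local incoming queue; then every other appdomain's queue.
    paths = [path(own_host, own_queue)]
    paths += [path(host, queue) for name, (host, queue) in topology.items()
              if name != appdomain]
    return ";".join(paths)


for name in TOPOLOGY:
    print(f'{name}: <add key="MsmqChannelQueueName" value="{queue_name_setting(name, TOPOLOGY)}" />')
```

Only the first path has a fixed role (the local incoming queue). The sketch lists the remaining paths in dictionary order, which here reproduces the four lines above.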
