MongoDB Replica Set Tag Sets: How to Use Them with Sharding and Change Streams

asartlis9620
Aug 19, 2023
6 min read

Tag sets let you customize write concern and read preferences for a replica set. MongoDB stores tag sets in the replica set configuration object, which is the document returned by rs.conf(), in the members[n].tags embedded document.
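Here is a minimal sketch of where tags live. The Python dictionary below mimics the shape of the document rs.conf() returns, with a tags embedded document on each member. The host names, the tag keys (dc, use), and the members_with_tag helper are hypothetical. In mongosh you would instead edit the object returned by rs.conf() and apply it with rs.reconfig().

```python
# Hypothetical replica set configuration, shaped like the document rs.conf() returns.
rs_config = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": 0, "host": "mongo-a.example.net:27017", "tags": {"dc": "east", "use": "production"}},
        {"_id": 1, "host": "mongo-b.example.net:27017", "tags": {"dc": "east", "use": "reporting"}},
        {"_id": 2, "host": "mongo-c.example.net:27017", "tags": {"dc": "west", "use": "production"}},
    ],
}


def members_with_tag(config, key, value):
    """Return the hosts whose members[n].tags contain key: value (hypothetical helper)."""
    return [
        member["host"]
        for member in config["members"]
        if member.get("tags", {}).get(key) == value
    ]


print(members_with_tag(rs_config, "use", "production"))
```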

Custom read preferences and write concerns evaluate tag sets in different ways: read preferences consider the value of a tag when selecting a member to read from, while write concerns ignore the value of a tag when selecting a member, except to consider whether or not the value is unique.
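The difference can be illustrated with a simplified model, assuming hypothetical members tagged with dc and rack. read_candidates follows the read preference behavior of trying tag sets in order and using the first one that matches any member (an empty tag set {} matches every member). write_mode_satisfied mimics a custom write concern mode such as {"dc": 2} defined under settings.getLastErrorModes, which only counts how many distinct dc values the acknowledging members have. Both helpers are hypothetical, and real server selection also filters on member state, staleness, and latency, which this sketch ignores.

```python
# Hypothetical members; only the tags matter for this illustration.
members = [
    {"host": "a:27017", "tags": {"dc": "east", "rack": "r1"}},
    {"host": "b:27017", "tags": {"dc": "east", "rack": "r2"}},
    {"host": "c:27017", "tags": {"dc": "west", "rack": "r1"}},
]


def matches(tags, tag_set):
    # A member matches a tag set only if it has every key with exactly the same value.
    return all(tags.get(key) == value for key, value in tag_set.items())


def read_candidates(members, tag_sets):
    """Read preference: try tag sets in order; the first that matches any member wins."""
    for tag_set in tag_sets:
        hits = [m["host"] for m in members if matches(m["tags"], tag_set)]
        if hits:
            return hits
    return []


def write_mode_satisfied(acked_members, tag_key, distinct_needed):
    """Write concern mode like {"dc": 2}: only the number of distinct tag values counts."""
    values = {m["tags"][tag_key] for m in acked_members if tag_key in m["tags"]}
    return len(values) >= distinct_needed


print(read_candidates(members, [{"dc": "north"}, {"dc": "west"}]))
print(write_mode_satisfied(members[:2], "dc", 2))
```

Two acknowledgments from the east data center satisfy a plain w: 2, but not the {"dc": 2} mode, because both share the same dc value.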

MongoDB Replica Set Tag Sets


In the MongoDB C driver (libmongoc), max staleness is the maximum replication lag, in seconds of wall-clock time, that a secondary can have and still be eligible for reads. The default is MONGOC_NO_MAX_STALENESS, which disables staleness checks. Any other value must be a positive integer of at least MONGOC_SMALLEST_MAX_STALENESS_SECONDS (90 seconds).
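A minimal Python sketch of that validation rule, mirroring the libmongoc constants named above (-1 for MONGOC_NO_MAX_STALENESS and 90 for MONGOC_SMALLEST_MAX_STALENESS_SECONDS). The validate_max_staleness helper is hypothetical. Drivers may apply further checks, for example against the heartbeat frequency, that this sketch omits. In a connection string the equivalent option is maxStalenessSeconds.

```python
# Constants mirroring the values libmongoc uses for these names.
MONGOC_NO_MAX_STALENESS = -1
MONGOC_SMALLEST_MAX_STALENESS_SECONDS = 90


def validate_max_staleness(seconds: int) -> int:
    """Return `seconds` if it is an acceptable max staleness, otherwise raise ValueError."""
    if seconds == MONGOC_NO_MAX_STALENESS:
        return seconds  # the default: staleness checks are disabled
    if seconds < MONGOC_SMALLEST_MAX_STALENESS_SECONDS:
        raise ValueError(
            f"max staleness must be {MONGOC_NO_MAX_STALENESS} or at least "
            f"{MONGOC_SMALLEST_MAX_STALENESS_SECONDS} seconds, got {seconds}"
        )
    return seconds


print(validate_max_staleness(120))
```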

When you're developing against Amazon DocumentDB (with MongoDB compatibility), we recommend that you connect to your cluster as a replica set and distribute reads to replica instances using the built-in read preference capabilities of your driver. This section goes deeper into what that means and describes how you can connect to your Amazon DocumentDB cluster as a replica set using the SDK for Python as an example.

When connecting through an SSH tunnel, we recommend that you connect using the cluster endpoint and that you do not connect in replica set mode (that is, do not specify replicaSet=rs0 in your connection string), because doing so results in an error.

Using the cluster endpoint, you can connect to your cluster in replica set mode and then use your driver's built-in read preference capabilities. Specifying /?replicaSet=rs0 in the connection string signals to the SDK that you want to connect as a replica set. If you omit /?replicaSet=rs0, the client routes all requests to the cluster endpoint, that is, your primary instance.
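A minimal sketch of the two connection-string forms, in Python to match the SDK this section refers to. The docdb_uri helper, the user name, the password, and the cluster endpoint are hypothetical, and a real Amazon DocumentDB connection string usually also carries TLS options that this sketch leaves out.

```python
from urllib.parse import quote_plus, urlencode


def docdb_uri(user, password, endpoint, replica_set=None, read_preference=None):
    """Build a hypothetical Amazon DocumentDB connection string.

    Without replica_set, every request goes to the cluster endpoint (the primary).
    With replica_set="rs0", the driver connects as a replica set and can honor
    read_preference.
    """
    params = {}
    if replica_set:
        params["replicaSet"] = replica_set
    if read_preference:
        params["readPreference"] = read_preference
    # Percent-encode credentials so characters such as "@" do not break the URI.
    base = f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{endpoint}/"
    return base + ("?" + urlencode(params) if params else "")


endpoint = "docdb-cluster.example.com:27017"  # hypothetical cluster endpoint
print(docdb_uri("app", "secret", endpoint))
print(docdb_uri("app", "secret", endpoint, "rs0", "secondaryPreferred"))
```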

The advantage of connecting as a replica set is that it enables your SDK to discover the cluster topology automatically, including when instances are added to or removed from the cluster. You can then use your cluster more efficiently by routing read requests to your replica instances.

When you connect as a replica set, you can specify the readPreference for the connection. If you specify a read preference of secondaryPreferred, the client routes read queries to your replicas and write queries to your primary instance. This makes better use of your cluster resources. For more information, see Read Preference Options.

Reads from Amazon DocumentDB replicas are eventually consistent. They return the data in the same order as it was written on the primary, and replication lag is often less than 50 ms. You can monitor the replica lag for your cluster using the Amazon CloudWatch metrics DBInstanceReplicaLag and DBClusterReplicaLagMaximum. For more information, see Monitoring Amazon DocumentDB with CloudWatch.

Unlike traditional monolithic database architectures, Amazon DocumentDB separates storage and compute. Given this architecture, we encourage you to scale reads on replica instances. Reads on replica instances don't block writes being replicated from the primary instance. You can add up to 15 read replica instances in a cluster and scale out to millions of reads per second.

The key benefit of connecting as a replica set and distributing reads to replicas is that it increases the overall resources in your cluster that are available to do work for your application. We recommend connecting as a replica set as a best practice. Further, we recommend it most commonly in the following scenarios:

Scaling up a cluster instance size is an option, and in some cases it can be the best way to scale the cluster. But you should also consider how to make better use of the replicas that you already have in your cluster, which lets you increase scale without the added cost of a larger instance type. We also recommend that you monitor and alert on these limits (that is, CPUUtilization, DatabaseConnections, and BufferCacheHitRatio) using CloudWatch alarms so that you know when a resource is being heavily used.

Instead, you could connect to the Amazon DocumentDB cluster as a replica set and distribute your reads to the replica instances. In a three-instance cluster, you could then effectively triple the number of connections and cursors available, to 13,500 and 1,350 respectively (three times a single instance's 4,500 connections and 450 cursors). Adding more instances to the cluster increases the number of connections and cursors only for read workloads. If you need more connections for writes to your cluster, we recommend increasing the instance size.
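The arithmetic behind those figures, as a small sketch. The per-instance limits of 4,500 connections and 450 cursors are inferred by dividing the totals above by three, not taken from an instance specification, and cluster_capacity is a hypothetical helper. It also encodes the point that extra instances add read capacity while writes stay limited to the primary.

```python
def cluster_capacity(instances, per_instance):
    """Read and write capacity for one resource (connections or cursors).

    Reads can be spread over every instance when connecting as a replica set;
    writes always go to the single primary.
    """
    return {"reads": instances * per_instance, "writes": per_instance}


CONNECTIONS_PER_INSTANCE = 4_500  # inferred: 13,500 / 3
CURSORS_PER_INSTANCE = 450        # inferred: 1,350 / 3

print(cluster_capacity(3, CONNECTIONS_PER_INSTANCE))  # {'reads': 13500, 'writes': 4500}
print(cluster_capacity(3, CURSORS_PER_INSTANCE))      # {'reads': 1350, 'writes': 450}
```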

Typically, we don't recommend connecting to your cluster with a read preference of secondary, because if no replica instance is available, the reads fail. For example, suppose that you have a two-instance Amazon DocumentDB cluster with one primary and one replica. If the replica has an issue, read requests from a connection pool set to secondary fail. The advantage of secondaryPreferred is that if the client can't find a suitable replica instance to connect to, it falls back to the primary for reads.
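A simplified model of the failure case described above, assuming a hypothetical select_for_read helper and host names. With secondary, an empty replica list leaves no candidates and the read fails, while secondaryPreferred falls back to the primary. Real drivers choose randomly among eligible members within a latency window rather than always taking the first one.

```python
class NoSuitableMemberError(Exception):
    """Raised when no member satisfies the read preference."""


def select_for_read(mode, primary, secondaries):
    """Pick a host for a read. `primary` is a host or None; `secondaries` lists healthy replicas."""
    if mode == "primary":
        candidates = [primary] if primary else []
    elif mode == "primaryPreferred":
        candidates = [primary] if primary else list(secondaries)
    elif mode == "secondary":
        candidates = list(secondaries)
    elif mode == "secondaryPreferred":
        # Use replicas when any are available, otherwise fall back to the primary.
        candidates = list(secondaries) or ([primary] if primary else [])
    else:
        raise ValueError(f"unsupported mode: {mode}")
    if not candidates:
        raise NoSuitableMemberError(f"no member available for read preference {mode}")
    return candidates[0]


# Two-instance cluster whose only replica is unhealthy.
print(select_for_read("secondaryPreferred", "primary:27017", []))
```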

To make better use of the resources in your cluster, we recommend that you connect to your cluster in replica set mode. If it suits your application, you can scale reads by distributing them to the replica instances.

I have a replica set of three members. Is it possible to read from only one of the two secondary nodes? I connect using the IP address of one of the secondaries, but I still see traffic going to the other nodes.
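One likely explanation is that, when a driver connects in replica set mode, it discovers every member from the host you give it and then routes reads according to the read preference, so the IP address you supply is only a seed. Below is a minimal sketch of a connection string that pins the client to a single member with the directConnection=true URI option. The host address and the single_member_uri helper are hypothetical. Whether a single host triggers discovery by default depends on the driver and version (PyMongo 4, for example, discovers by default). Pinning this way gives up automatic failover, and writes through such a client fail because the member is not primary. Tagging the chosen secondary and selecting it with read preference tags keeps failover.

```python
from urllib.parse import urlencode


def single_member_uri(host):
    """Connection string that talks only to `host`, with no replica set discovery."""
    # directConnection=true tells the driver not to discover other members,
    # so every operation on this client goes to `host`.
    return f"mongodb://{host}/?{urlencode({'directConnection': 'true'})}"


print(single_member_uri("10.0.0.12:27017"))  # hypothetical secondary address
```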

The Input Options tab enables you to specify which database and collection you want to retrieve information from. You can also indicate the read preferences and tag sets in this tab.

Tags allow you to customize write concerns and read preferences for a replica set. The Tag set specification table allows you to specify criteria for selecting replica set members. See Tag Sets for more information.

Click Join tags to append selected tag sets so that nodes matching the criteria are queried or written to simultaneously. If you select individual tag sets, then click Join tags, the tag sets are combined to create one tag set. Note that this change only occurs in the MongoDB Input window, not on the database.

Click Test tag set to display set members that match the tags indicated in the tag set specification. The ID, host name, priority, and tags for each replica set member that matches the tag set specification criteria are displayed.

Limiting reporting queries to dedicated nodes is a canonical example, used all over the MongoDB replication documentation. Reporting does not require writes, and permits eventually consistent data. Daily summaries do not suffer if they are derived from data which is seconds or minutes stale. It does not change the fundamental meaning of user behavior reports if your counts are missing a few actions, and some tallies are slightly misaligned.

You can build dedicated reporting nodes atop MongoDB replication by taking advantage of hidden replica set members, or tag sets in concert with read preferences. The first method is simpler, the second is more flexible.
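For the tag set approach, here is a sketch of the connection string a reporting client might use, assuming the dedicated member carries a hypothetical tag use: reporting in its members[n].tags. The readPreferenceTags URI option may be repeated, and drivers try each tag set in order. The trailing empty tag set lets reads fall back to any secondary if the reporting member is unavailable, so omit it if reports must never touch other members. The tagged_read_uri helper and the host names are hypothetical.

```python
from urllib.parse import urlencode


def tagged_read_uri(hosts, replica_set, mode, tag_sets):
    """Build a URI whose reads go to members matching `tag_sets`, tried in order."""
    params = [("replicaSet", replica_set), ("readPreference", mode)]
    for tag_set in tag_sets:
        # Each readPreferenceTags value is a comma-separated list of key:value pairs;
        # an empty value stands for the empty tag set, which matches any member.
        params.append(("readPreferenceTags", ",".join(f"{k}:{v}" for k, v in tag_set.items())))
    # Keep ":" and "," readable instead of percent-encoding them.
    return f"mongodb://{','.join(hosts)}/?{urlencode(params, safe=':,')}"


print(tagged_read_uri(
    ["a.example.net:27017", "b.example.net:27017"],
    "rs0",
    "secondary",
    [{"use": "reporting"}, {}],
))
```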

MongoDB replica sets provide uptime and durability by replicating data to all the nodes in a set and providing seamless failover to clients. A set contains one primary node that accepts writes, while the rest are read-only secondaries. The members decide among themselves which node is primary, holding elections whenever conditions require a new one. Replica sets should contain an odd number of voting members so that elections complete quickly without ties.

A node fundamentally cannot know whether unreachable machines are down or the network has been partitioned. So if a majority of the nodes in a replica set go offline (say, 2 out of a 3-member set), even a healthy remaining primary steps down to a read-only secondary. Otherwise, multiple machines could declare themselves primary during a network partition, leading to horrific data inconsistencies.
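The majority rule is simple arithmetic, sketched below with hypothetical helper names: a strict majority of n voting members is floor(n/2) + 1, and a primary that cannot see that many voters (counting itself) steps down.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return voting_members // 2 + 1


def primary_can_stay(reachable_including_self: int, voting_members: int) -> bool:
    """A primary keeps its role only while it can see a majority of voters."""
    return reachable_including_self >= majority(voting_members)


# 3-member set with the other two members unreachable: the primary steps down.
print(primary_can_stay(1, 3))
```

The same arithmetic shows why odd sizes are recommended: a 4-member set needs 3 votes and so tolerates only one failure, the same as a 3-member set.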

Hidden members of a replica set are configured with priority: 0, which prevents them from ever being elected primary, and with hidden: true, which prevents clients connected to the replica set from routing reads to them, even if the clients specify a read preference of secondary.
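In mongosh, hiding a member amounts to setting priority to 0 and hidden to true on its entry in the object returned by rs.conf() and applying the result with rs.reconfig(). The Python sketch below performs the same edit on a plain dictionary so the change is easy to see. The hide_member helper and the replica set name are hypothetical, and Kusanagi.local:29017 is an assumed third host alongside the two named in this post.

```python
import copy


def hide_member(config, host):
    """Return a copy of a replica set config in which `host` is a hidden member."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["priority"] = 0   # never eligible to become primary
            member["hidden"] = True  # clients will not route reads to it
            return new_config
    raise KeyError(f"{host} is not a member of replica set {config['_id']}")


rs_config = {
    "_id": "rs0",  # hypothetical replica set name
    "members": [
        {"_id": 0, "host": "Kusanagi.local:29017"},  # assumed host
        {"_id": 1, "host": "Kusanagi.local:29018"},
        {"_id": 2, "host": "Kusanagi.local:29019"},
    ],
}
hidden_config = hide_member(rs_config, "Kusanagi.local:29018")
print(hidden_config["members"][1])
```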

Kusanagi.local:29018 is now hidden. It will continue to replicate and vote in elections as usual, but clients connecting to the replica set will never read from it, even if Kusanagi.local:29019 is taken down.

With two ordinary members and one hidden member, a replica set has the same write fault tolerance as a regular 3-member set. However, if you lose both ordinary members, your production application cannot gracefully degrade to read-only mode, because the surviving hidden member does not serve reads to replica set clients. If you like the simplicity of a hidden member and cost is not an issue, use a 5-member set (with one member hidden) instead.

The Ruby driver (as of 1.9.2), for example, does not refresh its view of the replica set unless the client is initialized explicitly to do so with refresh_mode: :sync. Check your driver documentation.

Read preferences, introduced in MongoDB 2.2, let you finely control how queries are routed to replica set members. With fine control comes complexity, but fear not: I'll explain how to use read preferences to route your queries with PyMongo.

Which member of a replica set should PyMongo use for a find, or for a read-only command like count? Should it query the primary or a secondary? If it queries a secondary, which one should it use? How can you control this choice?

When your application queries a replica set, you have the opportunity to trade off consistency, availability, latency, and throughput for each kind of query. This is the problem that read preferences solve: how to specify your preferences among these four variables, so you read from the best member of the replica set for each query.
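Latency is where the nearest mode comes in. Drivers consider the members whose round-trip time falls within localThresholdMS (15 ms by default) of the fastest eligible member and choose among them at random. A minimal sketch of that latency window, with hypothetical host names, round-trip times, and helper names:

```python
import random

LOCAL_THRESHOLD_MS = 15  # default localThresholdMS in MongoDB drivers


def nearest_candidates(rtts_ms, threshold_ms=LOCAL_THRESHOLD_MS):
    """Hosts whose round-trip time is within threshold_ms of the fastest one."""
    fastest = min(rtts_ms.values())
    return sorted(host for host, rtt in rtts_ms.items() if rtt <= fastest + threshold_ms)


def pick_nearest(rtts_ms, threshold_ms=LOCAL_THRESHOLD_MS):
    """Choose one host at random from the latency window."""
    return random.choice(nearest_candidates(rtts_ms, threshold_ms))


rtts = {"east-1:27017": 5, "east-2:27017": 18, "west-1:27017": 70}
print(nearest_candidates(rtts))  # ['east-1:27017', 'east-2:27017']
print(pick_nearest(rtts))
```

Randomizing within the window spreads load across similarly close members instead of sending every read to the single fastest one.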
