
Evolution of Our MQTT Setup: From Single Instance to Logical Scaling

In the world of real-time device communication, MQTT (MQ Telemetry Transport) is a game-changer. It’s the backbone of our system, enabling smooth and reliable data exchange between different components. But as our system grew more complex and expansive, we started hitting some snags with our original MQTT setup. Let’s dive into what we faced and how we tackled these challenges.

 

Scaling Challenges


We started with a simple MQTT setup: a single server instance using the Mosca library on Node.js, running on an EC2 instance, and managed by PM2. This setup was perfect for our early days. However, as our data volume and system demands grew, this solution quickly showed its limitations. The single instance became a major bottleneck, struggling to manage the increasing number of connections and messages. It was clear we needed a more robust solution to scale effectively.
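For context, that early deployment was about as simple as it sounds. A minimal PM2 ecosystem file for it might look like the sketch below (the app name, script path, and memory limit are illustrative placeholders, not our exact production values):

// ecosystem.config.js -- a sketch of the original single-instance deployment
// Started with: pm2 start ecosystem.config.js
module.exports = {
  apps: [
    {
      name: 'mosca-broker',        // placeholder app name
      script: 'moscaServer.js',    // the standalone Mosca server
      instances: 1,                // a single process, no clustering
      autorestart: true,           // let PM2 bring it back up after a crash
      max_memory_restart: '512M'   // illustrative memory limit
    }
  ]
};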

 

 

Redis Sharding Issues

 

To tackle our scalability problems, we upgraded to a clustered setup using Redis as a backend for multiple Mosca nodes. Each node was configured to leverage Redis for seamless communication. Here’s a quick look at how we integrated Redis:

 

				
const mosca = require('mosca');
const redis = require('redis');

const ascoltatore = {
  // Configuration for the Redis backend
  type: 'redis',
  redis: redis,
  db: 12,
  port: 6379,
  return_buffers: true,
  host: "localhost"
};

const settings = {
  port: 1883,
  backend: ascoltatore
};

const server = new mosca.Server(settings);

server.on('ready', setup);

function setup() {
  console.log('Mosca server is up and running');
}

server.on('clientConnected', function(client) {
  console.log('Client Connected:', client.id);
});

server.on('published', function(packet, client) {
  console.log('Published', packet.payload);
});

 

With this configuration, Mosca utilized Redis’ pub/sub mechanism to synchronize messages across nodes, ensuring they reached the correct subscribers. We opted for a Sharded Redis cluster to enhance scalability. Initially, this approach worked wonders. It allowed us to scale horizontally with ease and brought greater stability to our platform.

 

However, as we scaled further, new issues emerged. Our MQTT servers began restarting sporadically, leading to dropped client connections and partial service disruptions. Even when the servers didn’t restart, we noticed delays in MQTT operations, particularly slow CONNACK and PUBACK responses. Suspecting issues with our MQTT servers, we ramped up logging and monitoring to diagnose the problem.

 

 

Unearthing the Redis Load Imbalance

 

Even with enhanced logging, we couldn’t pinpoint the exact issues with our MQTT servers, although the performance problems were undeniable. Then came a particularly troublesome day. As we faced persistent issues, we dived deep into debugging our servers, even inspecting the Mosca library code. That’s when we discovered a long-overlooked problem. Surprisingly, the issue wasn’t with our servers but with the Redis cluster managing our MQTT instances.

 

The load on the Redis shards was imbalanced. On closer inspection, we found that most MQTT instances were connecting to the same Redis shard because of a node-selection bias: Redis (we were using AWS ElastiCache) provided its node list in a fixed order, and the Redis client would consistently connect to the first node in that list, overloading one shard while the others remained underutilized. This unbalanced load was the root cause of our performance issues.
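One quick way to make this kind of imbalance visible is to ask each shard how many clients it is serving. The sketch below assumes a hypothetical SHARD_URLS environment variable containing a comma-separated list of per-shard endpoints:

const redis = require('redis');

// Hypothetical comma-separated list of individual shard endpoints
const shardUrls = (process.env.SHARD_URLS || '').split(',').filter(Boolean);

shardUrls.forEach(function(url) {
  const client = redis.createClient({ url: url });
  client.info('clients', function(err, info) {
    if (err) {
      console.error('Failed to query', url, err);
    } else {
      // The INFO reply contains a line like "connected_clients:42"
      const match = info.match(/connected_clients:(\d+)/);
      console.log(url, '->', match ? match[1] : 'unknown', 'clients');
    }
    client.quit();
  });
});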

 

Static Mapping Solution

 

To address the Redis sharding issue, we implemented a static mapping solution. Each MQTT server was assigned to a specific Redis node using a predefined hash function. This ensured a more balanced distribution of the workload across the Redis cluster. By manually managing the load distribution, we alleviated some of the performance problems. Additionally, we optimized our Redis configurations to ensure each node handled its share of the load efficiently and without interference. This proactive approach significantly improved our system’s stability and performance.
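The “hash function” itself was deliberately simple. A minimal sketch of the idea (the shard URLs and the INSTANCE_INDEX variable below are placeholders) is just a modulo over the shard list; the resulting URL is what each instance then receives as REDIS_URL, as shown next:

// Deterministically pin a broker instance to one Redis shard.
// Shard URLs and INSTANCE_INDEX are illustrative placeholders.
const shardUrls = [
  'redis://shard-1.example.internal:6379',
  'redis://shard-2.example.internal:6379',
  'redis://shard-3.example.internal:6379'
];

const instanceIndex = parseInt(process.env.INSTANCE_INDEX, 10) || 0;
const assignedShard = shardUrls[instanceIndex % shardUrls.length];

console.log(`Broker ${instanceIndex} -> ${assignedShard}`);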

 

To implement this static mapping, we used the Redis shard URL as an environment variable in the ascoltatore configuration. Here’s how the modified configuration looks:

 

				
const redisUrl = process.env.REDIS_URL;

const ascoltatore = {
  // Configuration for the Redis backend
  type: 'redis',
  redis: redis,
  url: redisUrl,
  db: 12,
  port: 6379,
  return_buffers: true, // If you need buffers, specify this
  host: "localhost"
};

 

In this snippet, process.env.REDIS_URL reads the shard URL from the environment, so each Mosca instance can be pointed at a specific Redis shard simply by setting REDIS_URL before it starts.

 

For instance, you can run your Mosca node with a specific Redis shard URL like this:

				
REDIS_URL=redis://<redis-shard-1-url> node moscaServer.js

 

This way, each Mosca node is configured to connect to a specific Redis shard, effectively achieving the static mapping solution and distributing the load more evenly across the cluster.

 

 

Scaling Limitations

 

After implementing the static mapping solution, our system initially performed better and operated more efficiently. However, this success was fleeting. As we continued to scale, performance degradation resurfaced and worsened over time. At this stage, each Redis shard was serving five MQTT instances, a 5:1 ratio of MQTT instances to Redis shards.

 

Despite vertically scaling our Redis cluster, we observed significant load issues. The static mapping meant high-traffic MQTT instances could end up on the same Redis shard, causing overloads. To manage this, we manually remapped high-traffic MQTT instances to different shards. However, this required restarting the instances, temporarily reducing their traffic load and increasing the load on other nodes, creating a vicious cycle.

 

Managing these complications became increasingly difficult. Eventually, we found ourselves with a 1:1 mapping of MQTT instances to Redis shards, hoping this would stabilize performance. This meant each MQTT instance had its dedicated Redis shard, drastically increasing our AWS costs. Despite the investment, performance did not improve; it worsened. Our MQTT cluster’s performance was at its lowest, and the costs were at their highest.

 

At this point, we had 18 MQTT instances supported by a Redis cluster with 18 shards, yet we still struggled to handle the load. The root cause was how our Redis version handled pub/sub in cluster mode: it didn’t shard it at all. Redis Cluster broadcasts every published message to every node (sharded pub/sub only arrived later, in Redis 7.0), so regardless of shard assignments, each shard propagated events to all the other shards, which then propagated them to their respective MQTT servers. This redundancy caused a massive increase in network traffic. Instead of resolving our issues, sharding introduced network bottlenecks. It was clear we needed a new solution.
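A small experiment makes this behaviour easy to see. In the sketch below (SHARD_A_URL and SHARD_B_URL are placeholders for the endpoints of two different shards), a subscriber attached to one shard still receives a message published on another shard, because the cluster broadcasts every pub/sub message to every node:

const redis = require('redis');

// Placeholders: direct endpoints of two different shards in the same cluster
const sub = redis.createClient({ url: process.env.SHARD_A_URL });
const pub = redis.createClient({ url: process.env.SHARD_B_URL });

sub.on('message', function(channel, message) {
  // Fires even though the message was published on a different shard
  console.log(`Received on ${channel}: ${message}`);
});

sub.on('subscribe', function() {
  pub.publish('broadcast-test', 'hello from another shard');
});

sub.subscribe('broadcast-test');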

 

Logical Scaling Solution

 

Realizing the need for a more sustainable approach, we pivoted towards a logical scaling solution. This involved assigning a unique broker ID to each broker and maintaining a map of client IDs to broker IDs using a key-value store in Redis. This map was updated and refreshed whenever a client connected to a broker.

 

When a client connected to a broker, the broker’s on-connect event was triggered, updating the client-to-broker mapping in Redis. This mapping allowed us to determine the appropriate broker for each client, ensuring balanced workload distribution and improved scalability.

 

Here’s how we implemented this functionality:

 

				
const mosca = require('mosca');
const redis = require('redis');

const redisClient = redis.createClient({ url: process.env.REDIS_URL });
const brokerId = process.env.BROKER_ID;

const server = new mosca.Server();

server.on('ready', setup);

function setup() {
  console.log(`Mosca broker ${brokerId} is up and running`);
}

server.on('clientConnected', function(client) {
  console.log(`Client Connected: ${client.id}`);
  // Update the client-to-broker mapping in Redis
  redisClient.hset('clientBrokerMap', client.id, brokerId, redis.print);
});

server.on('published', function(packet, client) {
  console.log('Published', packet.payload);
});

// Function to get the broker for a specific client
function getBrokerForClient(clientId, callback) {
  redisClient.hget('clientBrokerMap', clientId, function(err, brokerId) {
    if (err) {
      return callback(err);
    }
    callback(null, brokerId);
  });
}

 

With this solution, we eliminated the need for Redis shards, maintaining only a small Redis cluster to store key-value mappings. This change removed network bottlenecks and improved the overall system performance.
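As a usage sketch (the device ID and the forwarding step are illustrative; the routing service itself is outside the scope of this post), any backend component that needs to reach a specific client can resolve its broker first:

// Resolve which broker currently owns the connection for 'device-42'
getBrokerForClient('device-42', function(err, brokerId) {
  if (err) {
    return console.error('Broker lookup failed:', err);
  }
  if (!brokerId) {
    return console.log('device-42 is not currently mapped to any broker');
  }
  // Hand the message to that broker, for example via an internal API it exposes
  console.log(`device-42 is connected to broker ${brokerId}`);
});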

 

The logical scaling solution significantly enhanced our system’s performance and scalability. By decoupling the MQTT cluster and addressing scalability at a logical rather than purely physical level, we achieved greater resilience and efficiency.

 

 

Results and Future Plans

 

Implementing the logical scaling solution was a game-changer for our MQTT setup. We saw substantial improvements in performance, reliability, and scalability, allowing us to better meet our system’s evolving needs. By decoupling and optimizing each component, we significantly reduced bottlenecks and boosted overall system efficiency.

 

Looking ahead, we are committed to further optimizing and refining our MQTT setup. Our future plans include exploring advanced techniques such as dynamic load balancing and automated scaling based on real-time metrics. We also aim to enhance fault tolerance mechanisms and integrate machine learning algorithms to predict and prevent potential performance issues.

Nishant Srivastava

Engineering Team Lead
