Location Transparency With Vert.x

See how Vert.x, its Service Discovery component, and its eventbus get services talking to one another, whether they run in a single JVM or across multiple JVMs.


In my previous article, I explained Service Discovery in Vert.x and introduced an example of transparent remoting using Service Discovery. Transparent remoting is a remote method invocation that looks like a local method invocation: the client works against a plain Java interface backed by a proxy implementation, while a stub on the server side dispatches to the actual implementation. With Service Discovery in Vert.x, we can obtain a service reference by its service name, so we no longer need to care whether the service runs locally or remotely. Location transparency is an important topic in Vert.x, and I am going to explain it in detail in this article.

It all starts with the concept of “Single Responsibility,” which leads to having more granular components in the application. These components could be in the same JVM or different JVMs. In either case, they don’t need to be aware of the others’ location.

We also shouldn’t need to implement extra logic to fulfill this scenario. We can think of this concept as similar to how we use file paths without knowing which hard disk sector they are stored in.

The same convention carries over to today's distributed environments. For example, we access distributed files through HDFS or S3 paths without knowing where the partitions actually live in the network. In Vert.x, we use topics (addresses) both to send messages and to register handlers on the eventbus. When the eventbus consumer of a specific topic gets a service reference from Service Discovery, a "bind" usage event is published on the topic named DEFAULT_USAGE_ADDRESS. And when a service provider registers itself with Service Discovery, an "up" announcement is published through the eventbus on DEFAULT_ANNOUNCE_ADDRESS. We can take action in each case.
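To make this flow concrete, here is a minimal, self-contained sketch in plain Java. It is not the Vert.x API (the class names `ToyEventBus` and `AnnounceDemo` are invented for illustration); it only models the idea that registering a service publishes an "up" event on a well-known announce topic, analogous to Vert.x's DEFAULT_ANNOUNCE_ADDRESS:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory eventbus; illustrative only, not the Vert.x API. */
class ToyEventBus {
    // Analogous to Vert.x's DEFAULT_ANNOUNCE_ADDRESS (a hypothetical topic name).
    static final String ANNOUNCE = "toy.discovery.announce";

    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    /** Register a handler on a topic, like an eventbus consumer. */
    void consumer(String topic, Consumer<String> handler) {
        handlers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    /** Deliver a message to every handler registered on the topic. */
    void publish(String topic, String message) {
        handlers.getOrDefault(topic, List.of()).forEach(h -> h.accept(message));
    }

    /** Registering a service provider publishes an "up" status on the announce topic. */
    void register(String serviceName) {
        publish(ANNOUNCE, serviceName + ":up");
    }
}

public class AnnounceDemo {
    public static void main(String[] args) {
        ToyEventBus bus = new ToyEventBus();
        // Anyone interested in service lifecycle subscribes to the announce topic.
        bus.consumer(ToyEventBus.ANNOUNCE, msg -> System.out.println("announce -> " + msg));
        bus.register("log-collector"); // prints: announce -> log-collector:up
    }
}
```

The point of the sketch is that neither side addresses the other directly; both only know a topic name, which is what makes the participants' physical location irrelevant.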

Let’s put this into action with an example. I call the application in this example log-collector. Imagine a scenario where we have several daily rolling .log files for different resources, stored either on physical disks or in cloud storage such as AWS S3, and a scheduled job runs log-collector on each server to aggregate these log files into a central location such as a MongoDB cluster.

First, each log-collector exposes the files at its configured location (for example, another application’s log path) to the Service Registry. When the aggregator application comes up in the cluster, it can then request those files by name, read their contents, and aggregate them into a centralized location for further processing. The aggregator can be deployed before or after the collectors: because it simultaneously listens for and requests the services it needs, it uses whichever answer arrives first.
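The "listen and request at the same time" pattern can be sketched in plain Java as well. This is illustrative code under assumed names (`ToyRegistry`, `AggregatorDemo` are not from the log-collector project): the consumer subscribes to availability announcements first, then queries the registry for anything already present, so a provider deployed later is still picked up:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Consumer;

/** Toy service registry illustrating the "subscribe first, then query" pattern. */
class ToyRegistry {
    private final Map<String, String> services = new HashMap<>(); // name -> address
    private final List<Consumer<String>> watchers = new ArrayList<>();

    /** A provider publishes its service; all watchers are notified. */
    void publish(String name, String address) {
        services.put(name, address);
        watchers.forEach(w -> w.accept(name));
    }

    /** One-shot lookup for services that are already registered. */
    Optional<String> lookup(String name) {
        return Optional.ofNullable(services.get(name));
    }

    /** Subscribe to announcements of services that come up later. */
    void onAnnounce(Consumer<String> watcher) {
        watchers.add(watcher);
    }
}

public class AggregatorDemo {
    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        Set<String> wanted = new HashSet<>(Set.of("app1.log", "app2.log"));

        // 1) Subscribe first, so providers deployed after us are not missed.
        registry.onAnnounce(name -> {
            if (wanted.remove(name)) System.out.println("consume (announced): " + name);
        });
        // 2) Then query for anything already registered.
        for (String name : new ArrayList<>(wanted)) {
            registry.lookup(name).ifPresent(addr -> {
                wanted.remove(name);
                System.out.println("consume (lookup): " + name);
            });
        }
        // A provider deployed *after* the aggregator still gets consumed:
        registry.publish("app1.log", "file:///var/log/app1.log");
    }
}
```

Doing the subscription before the lookup closes the race window: a service that comes up between the two steps is caught by either the lookup or the announcement, never lost.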


The whole cluster can be launched with docker-compose; it runs one ZooKeeper node, two local file readers, one S3 file reader, and a Mongo writer, as shown below.

$ git clone https://github.com/SercanKaraoglu/log-collector.git
$ cd log-collector
$ gradle clean; gradle distDocker
$ cd app/clustered/
$ docker-compose up

All MongoWriterVerticle cares about is writing incoming events to MongoDB; it doesn’t know the files’ locations, only their names. It works the same whether everything runs in a single JVM or across multiple JVMs. You can inspect the data in MongoDB as follows:

docker exec -ti clustered_mongo_1 bash
root@<container-id>:/# mongo
MongoDB shell version: 3.2.8
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
> use logcollector
> show collections
> db["2016/08/31"].find().sort({_id:1})
{ "_id" : NumberLong(1), "i" : "1" }
{ "_id" : NumberLong(2), "i" : "2" }
{ "_id" : NumberLong(3), "i" : "3" }
{ "_id" : NumberLong(4), "i" : "e" }
{ "_id" : NumberLong(5), "i" : "f" }
{ "_id" : NumberLong(6), "i" : "6" }
{ "_id" : NumberLong(7), "i" : "h" }
{ "_id" : NumberLong(8), "i" : "d" }
{ "_id" : NumberLong(9), "i" : "7" }
{ "_id" : NumberLong(10), "i" : "5" }
{ "_id" : NumberLong(11), "i" : "8" }
{ "_id" : NumberLong(12), "i" : "b" }
{ "_id" : NumberLong(13), "i" : "4" }
{ "_id" : NumberLong(14), "i" : "c" }
{ "_id" : NumberLong(15), "i" : "a" }
{ "_id" : NumberLong(16), "i" : "g" }
{ "_id" : NumberLong(17), "i" : "x" }
{ "_id" : NumberLong(18), "i" : "y" }
{ "_id" : NumberLong(19), "i" : "z" }
{ "_id" : NumberLong(20), "i" : "q" }
Type "it" for more

Bottom Line

In Vert.x, everything is ready to scale. All information between verticles is exchanged asynchronously over the eventbus, Vert.x’s nervous system. The event-driven, reactive design is chosen deliberately so that the same code runs smoothly on a single JVM or on a cluster of JVMs. ZooKeeper helps Vert.x instances find each other so that they can communicate over the eventbus. With the example above, I also showed that the deployment order of verticles does not matter, because service consumer verticles can request and listen for services at the same time and consume them as they become available. This happens through the Service Discovery component: once a message source verticle is published to Service Discovery, it can be discovered by service consumers. In that sense, Service Discovery is the glue between service consumers and service providers.

I hope this article is helpful.

Sercan Karaoglu holds a BS in Mathematics Engineering from Istanbul Technical University. He develops high-throughput, low-latency reactive microservices and reactive streams applications, and is passionate about deep learning and machine learning. He is currently pursuing an MSc in Big Data Analytics and Management at the Department of Computer Engineering, Bahcesehir University. https://www.kaggle.com/sercankaraoglu https://github.com/SercanKaraoglu http://vertx.io/materials/

