How I solved the performance nightmare (RxJS, JSON-RPC and more)...
Prerequisites:
- JavaScript and asynchronous programming,
- a general idea of JSON-RPC,
- a little bit of Angular knowledge, mainly the fact that the http service uses RxJS observables,
- and well, some knowledge of RxJS, at least what it is and its general features
In this technical post, I will explain how I used JSON-RPC 2.0 batch requests with Angular2 and RxJS to solve a significant scalability problem (a single display that had to send a very large number of requests to the server), while keeping the code (almost) as elegant as it used to be.
First: why did I have to send so many requests? Well, this specific display that I was working on was incredibly complex. It displayed a lot of data elements fetched from different places in the server. It also required a lot of meta-information about these data elements, which also needed fetching from the server.
In this case, of course, the right solution would be to implement a single request on the server that would handle all of this for me, and send me the ready-made result. Unfortunately, I did not have this privilege. Let's just say I did not have full programmable access to the server. It provided me with a bunch of methods that I could call using the JSON-RPC 2.0 protocol, and that was about it.
So, let's look at the problem in more detail, shall we?
Problem: Many requests, each of them doing its own thing!
My code looks roughly like this:

```typescript
// Assumed RxJS 5 style imports at the top of the file:
import { Observable } from 'rxjs/Observable';
import 'rxjs/add/observable/forkJoin';
import 'rxjs/add/operator/map';
import 'rxjs/add/operator/do';
import 'rxjs/add/operator/mergeMap'; // also provides flatMap

// ...inside the component or service class:
loadData() {
  let requests: Observable<any>[] = [];
  //Do a bunch of things
  //.
  //..
  //...
  // Note: the operators are chained on the request observable itself,
  // and the resulting observable is what gets pushed into the array.
  requests.push(this.jsonRpcHelper.request({
    jsonrpc: '2.0',
    method: 'get_value',
    params: {
      path: 'some/path/'
    }
  }).map(result => {
    // transform the result
  }).do(result => {
    // generate some side effects
  }));
  //Do a bunch of things
  //.
  //..
  //...
  requests.push(this.jsonRpcHelper.request({
    jsonrpc: '2.0',
    method: 'get_metadata',
    params: {
      path: 'some/other/path/',
      otherParameter: 'true'
    }
  }).flatMap(result => {
    // do some more complex things
  }).do(result => {
    // generate some side effects
  }));
  //Do a bunch of things
  //.
  //..
  //...
  //Add many more requests to our requests array
  //.
  //..
  //...
  //Once we have collected everything we need in the requests array, we send all those requests in parallel using Observable.forkJoin
  return Observable.forkJoin(requests)
}

//Now, the caller of this method has to just do the following:
loadData().subscribe();
```
(Note: jsonRpcHelper is just some imaginary service that you have probably created at some point to deal with Http headers, session management, etc.)
What we are doing here is very beautiful. Throughout the code of loadData, we are generating many requests; they could be generated in any form possible, with any transformations, side-effects, etc. Then, when we are done, and we have all our requests ready, we collect all of them using Observable.forkJoin, which will send them all in parallel (or at least will deal with them in a non-deterministic order), and will then wait for all of them to finish, aggregate their results and return.
This way, we can fetch all of our data, do with it whatever we want (e.g. put it in a data structure of some kind), and then move on with our job.
This is a very elegant solution that shows the real power and flexibility of RxJS: all of these requests can be generated in different ways, can do as many transformations as needed, and can be written inside different functions that have access to different scopes. After all of them have been fully specified, they are all sent and synchronized using forkJoin.
But there is a problem here, a problem that becomes more and more apparent as the number of requests increases:
Every request is sent as a separate HTTP request. This means that every request carries its own HTTP request headers and response headers and, depending on how the browser reuses connections, may also need its own TCP connection or have to wait in line for a free one.
Believe it or not, this hurts performance badly. TCP is not the most time-efficient or resource-efficient protocol. For example, every new connection requires a three-way handshake before it can carry data, and requires storing connection state. In addition, the HTTP headers added to each request and response mean much more data being transmitted. Requests in the JSON-RPC protocol are usually small, so the HTTP headers make up a significant percentage of the size of each request.
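To get a rough feel for how much the headers weigh, the arithmetic can be sketched as follows. This is only an illustration: the 500-byte header size is an assumed figure, not a measurement from my application, the function names are made up for this sketch, and only the request direction is counted.

```typescript
// Rough, illustrative arithmetic only. ASSUMED_HEADER_BYTES is an assumption,
// not a measured value; real header sizes depend on cookies, user agent, etc.
const ASSUMED_HEADER_BYTES = 500;

// A small JSON-RPC call similar to the ones used in this post.
const sampleCall = {
  jsonrpc: "2.0",
  id: 0,
  method: "get_value",
  params: { path: "some/path/" },
};

// For ASCII-only payloads the character count equals the byte count.
function bodyBytes(payload: unknown): number {
  return JSON.stringify(payload).length;
}

// Bytes sent when each of `count` calls travels in its own HTTP request.
function separateRequestBytes(count: number, headerBytes: number): number {
  return count * (headerBytes + bodyBytes(sampleCall));
}

// Bytes sent when all calls share one HTTP request as a single JSON array.
function batchRequestBytes(count: number, headerBytes: number): number {
  const batch: unknown[] = [];
  for (let i = 0; i < count; i++) {
    batch.push({ ...sampleCall, id: i });
  }
  return headerBytes + bodyBytes(batch);
}

const separate = separateRequestBytes(50, ASSUMED_HEADER_BYTES);
const batched = batchRequestBytes(50, ASSUMED_HEADER_BYTES);
console.log("50 separate requests: ~" + separate + " bytes");
console.log("1 batch of 50 calls:  ~" + batched + " bytes");
```

With separate requests the header cost is paid once per call, while a batch pays it once in total, so the gap grows with the number of calls.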
So how can we minimize this TCP and HTTP overhead? We can use JSON-RPC batch requests.
What are JSON-RPC batch requests?
Simply put, batch requests allow you to send several JSON-RPC requests in a single HTTP request. This is perfect: only one TCP connection, and only one pair of HTTP request/response headers. A batch request will look like this:

```typescript
[{
  jsonrpc: "2.0",
  id: 0,
  method: "method1",
  params: {
    param1: "test",
    param2: "test2"
  }
}, {
  jsonrpc: "2.0",
  id: 1,
  method: "method2",
  params: {
    param1: true,
  }
}, {
  // ... .. .
}]
```
So, you simply put your requests one after the other in an array, and send them. Notice the "id" member: it needs to have a unique value for each request in the batch. The actual value doesn't matter, as the server will just put it in the response to each of those requests exactly as it received it. But why do we need it?
Well, the server might process your requests in parallel, and so it does not commit to sending back the results in the same order as the requests. The responses will also be collected in an array, just like the requests were, but the order might be different. Thus, the id will help us map each response to its corresponding request.
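The role of the id can be sketched in a few lines. This is a minimal illustration rather than code from my application: the interface and function names (`RpcCall`, `toBatch`, `matchResponses`) and the sample results are hypothetical, and only the member names `jsonrpc`, `id`, `method`, `params`, `result` and `error` come from the JSON-RPC 2.0 format.

```typescript
// Hypothetical shapes for illustration only.
interface RpcCall {
  method: string;
  params: unknown;
}

interface RpcRequest extends RpcCall {
  jsonrpc: "2.0";
  id: number;
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

// Turn a list of calls into a batch, using each call's index as its id.
function toBatch(calls: RpcCall[]): RpcRequest[] {
  return calls.map((call, index) => ({
    jsonrpc: "2.0" as const,
    id: index,
    method: call.method,
    params: call.params,
  }));
}

// Re-order the responses so that position i holds the response to request i,
// regardless of the order in which the server returned them.
function matchResponses(
  batch: RpcRequest[],
  responses: RpcResponse[]
): (RpcResponse | undefined)[] {
  const byId: { [id: number]: RpcResponse } = {};
  responses.forEach(response => {
    byId[response.id] = response;
  });
  return batch.map(request => byId[request.id]);
}

const batch = toBatch([
  { method: "get_value", params: { path: "some/path/" } },
  { method: "get_metadata", params: { path: "some/other/path/" } },
]);

// Simulated server reply (made-up values), deliberately out of order.
const replies: RpcResponse[] = [
  { jsonrpc: "2.0", id: 1, result: "meta" },
  { jsonrpc: "2.0", id: 0, result: 42 },
];

const ordered = matchResponses(batch, replies);
console.log(ordered);
```

One detail worth noting: the `jsonrpc` member is written as the string `"2.0"` here. A numeric literal `2.0` would be serialized by `JSON.stringify` as `2`, which is not the version string the protocol expects.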
So... are we happy? Have we solved all of our problems? Well... we can use the batch request to make the code more efficient, but that is, unfortunately, achieved over the tomb of our clean, well-divided code. Why? Because we won't receive the response of each request individually anymore! The responses of all those requests are now collected into one big response. I mean, here is what our old code would look like right now:
```typescript
loadData() {
  let requests: any[] = [];
  //Do a bunch of things
  //.
  //..
  //...
  requests.push({
    jsonrpc: '2.0',
    id: 0,
    method: 'get_value',
    params: {
      path: 'some/path/'
    }
  });
  //Do a bunch of things
  //.
  //..
  //...
  requests.push({
    jsonrpc: '2.0',
    id: 1,
    method: 'get_metadata',
    params: {
      path: 'some/other/path/',
      otherParameter: 'true'
    }
  })
  //Do a bunch of things
  //.
  //..
  //...
  //Add many more requests to our requests array
  //.
  //..
  //...
  //Once we have collected everything we need in the requests array, we send all those requests as a single batch request
  return this.jsonRpcHelper.request(requests).do((results) => {
    // Do everything you used to do in the "map", "do", "flatMap", etc.
  })
}

//Now, the caller of this method has to just do the following:
loadData().subscribe();
```
I'm not sure if you are seeing the horror that I am seeing! Just look at the comment on line 38, inside the final `do` callback. Everything that we have been doing in so many separate places, with so many different contexts, and so many different scopes, will have to be done inside here. We have avoided the performance nightmare, but have fallen into a much worse dirty-coding nightmare! Take a moment to grasp the implications of this, to know how bad it is...
So, what can we do to fix this??
We can make our own batch-aware fork join! A function that will allow us to keep (most of) the elegance of Observable.forkJoin, while keeping the performance gains of using batch requests.
The Batch-Aware ForkJoin
Our batchAwareForkJoin will look very similar to the static Observable.forkJoin function of RxJS on the outside, but will be able to handle batch requests efficiently and (mostly) elegantly. The only visible difference is that it does not accept an array of observables like Observable.forkJoin does. Instead, it will accept an array of:

```typescript
{
  method: string;
  params: any;
  subjectToNotify: Subject<any>;
  subjectObservable: Observable<any>;
}
```
This looks complicated.. why do we need all of this? Well, we need the method and params in order to add them to the batch of requests, simply. But what about the subjectToNotify and the subjectObservable? Well, let's look at how we will use this forkJoin, this way it will become clearer why we need each of these parameters:
```typescript
loadData() {
  let requests: {method: string, params: any, subjectToNotify: Subject<any>, subjectObservable: Observable<any>}[] = [];
  //Do a bunch of things
  //.
  //..
  //...
  let subject = new Subject<any>();
  let observable = subject.asObservable()
    .map(result => {
      // transform the result
    }).do(result => {
      // generate some side effects
    });
  requests.push({
    method: 'get_value',
    params: {
      path: 'some/path/'
    },
    subjectToNotify: subject,
    subjectObservable: observable
  });
  //Do a bunch of things
  //.
  //..
  //...
  let otherSubject = new Subject<any>();
  let otherObservable = otherSubject.asObservable()
    .flatMap(result => {
      // do some more complex things
    }).do(result => {
      // generate some side effects
    });
  requests.push({
    method: 'get_metadata',
    params: {
      path: 'some/other/path/',
      otherParameter: 'true'
    },
    subjectToNotify: otherSubject,
    subjectObservable: otherObservable
  })
  //Do a bunch of things
  //.
  //..
  //...
  //Add many more requests to our requests array
  //.
  //..
  //...
  //Once we have collected everything we need in the requests array, we send all those requests in parallel using our batchAwareForkJoin
  return batchAwareForkJoin(requests)
}

//Now, the caller of this method has to just do the following:
loadData().subscribe();
```
So what is happening here exactly? For each request that we add to the array, we are telling batchAwareForkJoin: this is the method that I want to call, and these are its parameters. Please add them to your batch request. Also, when you get the results back from the server, please send me the result that concerns me on subjectToNotify, so that I can use it to do my processing.
This is the meaning of the first three parameters, but what about the fourth parameter, "subjectObservable"? This is the observable in which we are doing all our processing, creating data structures, etc. We are telling batchAwareForkJoin: when you provide me with the data, I will execute this observable, so please don't do anything before I'm done. In other words, we are telling batchAwareForkJoin to subscribe to this observable, and only emit after this observable has completed. We need this because we need to make sure that all the processing has been finalized before loadData() emits its final result.
So, what kind of magic does batchAwareForkJoin perform? Please, help yourself:
```typescript
batchAwareForkJoin(requests: {method: string, params: any, subjectToNotify: Subject<any>, subjectObservable: Observable<any>}[]) {

  let batch = requests.map((request, index) => {
    return {
      jsonrpc: '2.0',
      id: index,
      method: request.method,
      params: request.params
    }
  });

  return Observable.create((observer) => {
    this.jsonRpcHelper.request(batch).subscribe((results) => {
      Observable.forkJoin(requests.map(request => request.subjectObservable)).subscribe((observablesResults) => {
        observer.next(observablesResults);
        observer.complete();
      });

      results.forEach(result => {
        requests[result.id].subjectToNotify.next(result);
        requests[result.id].subjectToNotify.complete();
      })
    })
  })
}
```
First, in lines 3 to 10, we are creating the batch of requests from the array of requests that we received, assigning each request an id equal to its index. Remember that the id will later help us map the responses to their corresponding requests.
Then, we create an observable, and inside this observable we send the batch request (line 13). When the response arrives, we do two things:
1- In lines 14 to 17, we use the regular Observable.forkJoin to subscribe to all the observables that we received in the array of requests. That way, we let them do all the processing they want to do, and when they are done, we emit their aggregated result, just like the regular forkJoin does.
2- However, these observables will never emit anything by themselves. Each of them is listening on a subject that it provided to us earlier in its "subjectToNotify" parameter, and it is waiting for us to provide it with the result. Thus, in lines 19 through 22, we take each result that we received, and send it to its corresponding subject. Note the use of the id field to map the response to the correct request.
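The function above only covers the happy path. One possible way to make the dispatch step more defensive is sketched below, under a few assumptions that are not part of my original code: `Sink` is a minimal stand-in for the `next`, `error` and `complete` methods of an RxJS Subject, `dispatchResponses` is a hypothetical helper name, and the sketch forwards the unwrapped `result` member rather than the whole response object.

```typescript
// Minimal stand-in for the part of an RxJS Subject used in this sketch.
interface Sink<T> {
  next(value: T): void;
  error(err: unknown): void;
  complete(): void;
}

// A JSON-RPC 2.0 response carries either a result or an error object.
interface RpcResponse {
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

// Deliver each response to the sink registered under its id (its index).
// Sinks whose id never appears in the response array are errored,
// so that nothing downstream waits forever for a missing response.
function dispatchResponses(sinks: Sink<unknown>[], responses: RpcResponse[]): void {
  const answered: boolean[] = sinks.map(() => false);

  responses.forEach(response => {
    const sink = sinks[response.id];
    if (!sink) {
      return; // unknown id: ignored in this sketch
    }
    answered[response.id] = true;
    if (response.error) {
      // The server reported a failure for this particular call.
      sink.error(response.error);
    } else {
      sink.next(response.result);
      sink.complete();
    }
  });

  sinks.forEach((sink, id) => {
    if (!answered[id]) {
      sink.error(new Error("No response received for request id " + id));
    }
  });
}
```

Since forkJoin fails as soon as one of its inputs fails, a caller using something like this would probably also want to forward that failure to the outer observer. Whether one failed call should fail the whole batch is a design choice that depends on the display.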
Thus we have created a forkJoin that allows us to leverage the performance of batch requests, while maintaining (most of) the elegance of the regular forkJoin solution.
I hope you have enjoyed this discussion, which tries to prove, to a degree, that elegance and performance don't always have to clash.