Scalability question


We are currently evaluating WebRTC frameworks for one of our customers.
Since one of the main requirements is scalability, we (naturally) landed on the Lynckia/Licode web page (“… and we make it 100% scalable”).
We have downloaded it and built some (basic) samples.
The (only) licode architecture diagram we found is:

We also found this presentation:

But we haven’t found any information/guideline (or benchmark based on different scenarios) regarding the scalability.
A priori, one ErizoController can manage several MCUs (Erizo Agents, via the ErizoJS API), with the communication going through the RabbitMQ broker. Is this correct?
Is this the only approach, considering that the main “bottleneck” is (or could be) the MCUs?
Should the ErizoController be located on the same host as MongoDB?
Have you run any experiments with thousands of (simulated) users attending a web conference, for instance? Do you have any figures or recommendations?

Thanks for your help,


I tested scalability for audio only at some point, and there are some scaling tips in that thread as well.

In my tests, the bottleneck was always the MCU. So my setup is: one server (16 CPUs) running standard Licode (MongoDB, RabbitMQ, ErizoController and Erizo clients), which can handle 60+ clients. Then I boot up additional servers running only Erizo clients when the CPU load gets high.

I only got to around 100 clients max, so I can’t tell whether this scales well to 1k or more…

@Cracker_F: That is interesting. I have been able to get relatively good results with a quite low-end machine. What type of configuration did you use? How many processes (the default is only one process)? Operating system level optimizations?

@jjahifi I’m doing audio only, with the Opus codec. My server’s CPUs are not that fast (2.5 GHz each). I’m using 16 processes, one per CPU, with no extra system-level optimizations.
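For reference, the number of ErizoJS processes an Erizo Agent spawns is controlled in Licode’s configuration file. The key names below are from memory and should be verified against your own `licode_config.js`:

```javascript
// licode_config.js (excerpt) -- key names from memory, verify against your copy.
config.erizoAgent = config.erizoAgent || {};

// Maximum number of ErizoJS processes this agent may run.
// Setting it to the CPU count gives one process per core.
config.erizoAgent.maxProcesses = 16;

// Number of ErizoJS processes to spawn up-front, so rooms
// don't pay the process start-up cost on first use.
config.erizoAgent.prerunProcesses = 16;
```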

@Cracker_F, OK, I apparently misunderstood you at first. Scalability testing has been a bit annoying (at least for me) in many cases, and some straightforward calculations have made me not even try certain tests. For example, 100 clients that are all both publishers and listeners at the same time may mean something like 350-400 MB/s of data transfer, assuming high-quality Opus audio only. That is much more than cheap network equipment can handle, even if the server could. Even at speech-only quality, the data transfer would be around 70 MB/s, which eats up a big part of a standard 1 Gbit/s LAN’s theoretical throughput. So even if the server could handle the load, the network may not be able to, and the real achievable throughput of many “so-called 1 Gbit/s routers” is less than that.
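Those figures can be reproduced with a back-of-the-envelope calculation. The per-stream bitrates below are my assumptions, chosen to roughly match the numbers in the post:

```javascript
// Estimate server egress for a full-mesh conference in which every
// client publishes one stream and subscribes to everyone else's.
function serverEgressBytesPerSec(clients, bitrateKbps) {
  const streams = clients * (clients - 1);      // each client receives N-1 streams
  const bytesPerSec = (bitrateKbps * 1000) / 8; // kbit/s -> bytes/s per stream
  return streams * bytesPerSec;
}

const MB = 1e6;
// ~320 kbit/s as a plausible "high quality" Opus rate (assumption):
console.log(serverEgressBytesPerSec(100, 320) / MB); // ≈ 396 MB/s
// ~56 kbit/s for speech-level quality incl. overhead (assumption):
console.log(serverEgressBytesPerSec(100, 56) / MB);  // ≈ 69 MB/s
```

The quadratic term (`clients * (clients - 1)`) is what makes full-mesh forwarding blow up so quickly as the room grows.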

But if you can somehow find a way to minimise the number of simultaneously published streams and the number of subscriptions per client, then the number of active users can be much higher.

Thanks for your comments/remarks.
We also came across the following post, which is quite interesting:

This discussion on scalability is, IMHO, somewhat misleading. Before asking about scalability you should define the intended use.

If you do not really require low latency, then there are many highly scalable ways to stream audio and video. Low latency is needed only in a few situations, such as conferencing; most use cases do not actually need the low latency that WebRTC provides.

If you just want to broadcast audio (and video) to thousands of remote listeners/viewers who do not publish themselves and do not need real-time interaction, then the system is relatively easy to set up. There are several easy-to-use streaming tools for that, both commercial and open source. In such cases the latency tends to be over 20 s (it can even be minutes), which means that meeting-style interactivity is not practical.

For interactive low-latency use cases like conferencing, you need to pay serious attention to the architecture of the solution. The amount of data can easily turn out to be impractical (e.g. 500 attendees who all publish and receive all streams at the same time), and your network simply cannot handle it (even if the server could).

However, you can achieve quite a high number of participants in a conference if only the relevant streams are active at any given time, and you can do that with Licode. It does, however, make the implementation of the actual system much more complex.
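To illustrate how much limiting subscriptions helps, here is a hypothetical comparison. The bitrate and the number of subscribed streams per client are assumptions, e.g. each client subscribing only to a handful of active speakers:

```javascript
// Server egress (in MB/s) when each of N clients subscribes to only k
// streams instead of all N-1 (e.g. only the current active speakers).
function egressLimitedMBps(clients, subscriptionsPerClient, bitrateKbps) {
  const streams = clients * subscriptionsPerClient;
  return (streams * bitrateKbps * 1000) / 8 / 1e6;
}

// 500 attendees at an assumed 320 kbit/s per stream:
// full mesh (499 subscriptions each) would need ~9980 MB/s, while
// subscribing to only 4 active speakers each brings it down to:
console.log(egressLimitedMBps(500, 4, 320)); // 80 MB/s
```

Because total egress now grows linearly with the participant count rather than quadratically, the achievable room size is limited mostly by how aggressively you can prune inactive streams.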

In summary, I would not consider a webrtc-based solution for broadcasting-type use.