Server CPU Usage

#1

A few months ago I started a discussion about “Scaling Node.Js On Multi-Core Systems”.

Yesterday I had the chance to measure server performance again under heavy load with the latest Licode release (v2). It was a webinar with one publisher (audio + video) and hundreds of participants, running on a dedicated server with 12 x 2 GHz Intel Xeon CPU (15 MB cache) and 12 GB of RAM. Everything was fine until the number of participants reached 150. Then the audio began to lag, and we had to stop the video and continue the webinar with audio only. :disappointed_relieved:

Here’s the process status of the server, captured with the htop utility at the peak of the webinar. As you can see, the overall server load is normal, but one of the CPUs is carrying most of the processing, which means we couldn’t use the whole server capacity.

Here are some questions in my mind:

  1. Is there any way to use all the CPU power? Honestly 150 is disappointing.
  2. Is this about erizoJS architecture or libnice as we discussed before?
  3. If it’s libnice then perhaps upgrading to the latest version can help, no? The current libnice version we are using (0.1.4) is very old. What is preventing us from upgrading?
  4. Xeon CPUs offer many cores but at lower clock rates than Core i7 CPUs, which have higher clock rates, fewer cores and, of course, a lower price. Maybe in this case, where most of the processing lands on one core, a Core i7 CPU would provide better performance. What do you think?

#2

config.erizoAgent.maxProcesses = 1; // default value: 1
// Number of processes that ErizoAgent runs when it starts. Always lower than or equal to maxProcesses.
config.erizoAgent.prerunProcesses = 1; // default value: 1

In a general many-to-many scenario, you can start more than one erizoJS process per erizoAgent, making better use of the available CPU. Publishers will be assigned to one of these processes by round-robin.
I’m afraid your scenario - one publisher with a high number of subscribers - can be problematic when the number of subscribers is large enough. Licode, today, is not a broadcasting solution and is more geared towards videoconferencing.
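
As a rough illustration of the round-robin assignment mentioned above, here is a minimal sketch. It is not Licode's actual scheduler: `createRoundRobin` and the `ej-N` process names are hypothetical and exist only to show the idea.

```javascript
// Hypothetical sketch: not Licode's real scheduler.
// Assigns each new publisher to the next erizoJS process in turn.
function createRoundRobin(processIds) {
  if (processIds.length === 0) {
    throw new Error('at least one process is required');
  }
  let next = 0;
  return function assign(publisherId) {
    const processId = processIds[next];
    next = (next + 1) % processIds.length;
    return { publisherId, processId };
  };
}

// Three processes, as if config.erizoAgent.maxProcesses were set to 3.
const assign = createRoundRobin(['ej-0', 'ej-1', 'ej-2']);
const placements = ['pubA', 'pubB', 'pubC', 'pubD'].map((id) => assign(id));
console.log(placements.map((p) => `${p.publisherId} -> ${p.processId}`).join(', '));
```

Under a scheme like this, a lone publisher is placed on a single process, which is consistent with the concern about the one-publisher case.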

In any case, that doesn’t mean it shouldn’t be able to make better use of resources. Are you sure CPU is the bottleneck here? Are you using simulcast? That number of subscribers can very well cause problems for the publisher because of the amount of feedback being forwarded, and simulcast should take care of that. That would be my first recommendation.

Yes, we suspect there is a per-process bottleneck caused by libnice.

0.1.13 did not show significant performance improvements and it caused connectivity problems in some cases for us so we rolled it back to 0.1.4, the version that has worked best for us. 0.1.14 (the latest) has dependencies that Ubuntu 14.04 does not meet. We’re considering different options to address this bottleneck and are working on it as we speak.

We’ve not run extensive profiling on different types of CPUs, but my guess is that it does not matter that much. ErizoJS is multithreaded, with each connection running on its own set of threads. It is by no means tied to a single core. That’s not the nature of the bottleneck we’re facing; it’s more related to I/O than actual CPU power.

#3

Thank you Pedro for the explanations.

Almost. There is lots of free memory, zero disk I/O, and the hosting provider asserts that there is enough Internet bandwidth behind the server. Is there anything else to check? As I mentioned, I’m using htop for monitoring. Does anyone know a better tool?
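
For logging per-core load over time instead of watching htop, a minimal sketch using Node's built-in `os.cpus()` could look like the following. The helper names `totalTime`, `perCoreUsage` and `sample` are made up for this example, and the one-second interval is an arbitrary choice.

```javascript
const os = require('os');

// Sum all time buckets (in ms) for one core entry from os.cpus().
function totalTime(times) {
  return times.user + times.nice + times.sys + times.idle + times.irq;
}

// Given two os.cpus() snapshots, return the busy percentage per core.
function perCoreUsage(before, after) {
  return after.map((cpu, i) => {
    const total = totalTime(cpu.times) - totalTime(before[i].times);
    const idle = cpu.times.idle - before[i].times.idle;
    return total > 0 ? Math.round(100 * (1 - idle / total)) : 0;
  });
}

// Take two snapshots intervalMs apart and print one figure per core.
function sample(intervalMs = 1000) {
  const before = os.cpus();
  setTimeout(() => {
    const usage = perCoreUsage(before, os.cpus());
    console.log(usage.map((p) => `${p}%`).join(' '));
  }, intervalMs);
}

sample();
```

One core sitting near 100% while the others stay low would match the pattern described in the first post.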

No! I thought it would increase the processing load on the server! So I will test it in our next webinar, probably tomorrow, and share the results here. One thing is unclear, though: there is no explanation of numSpatialLayers. Please let me know which value best fits my scenario.

BTW, is simulcast available in the v2 release? It’s not in the release notes.

#4

No… it’s not in v2, sorry about that. It will be in a release soon.
If you’re able, you should be safe running a seminar on the master branch right now. I’d probably use 3 spatial layers in your case. Please post the result here if you do try.
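
To make the suggestion concrete, here is a small sketch that builds publish options with 3 spatial layers. The `{ simulcast: { numSpatialLayers } }` shape is an assumption based on the `numSpatialLayers` option mentioned in this thread, so check it against the Licode client documentation for your version. `buildPublishOptions` is a hypothetical helper, and `room` and `localStream` are assumed to exist already in the browser client.

```javascript
// Sketch only: builds the options object for publishing with simulcast.
// The option shape is an assumption; verify it against the Licode client docs.
function buildPublishOptions(numSpatialLayers) {
  if (!Number.isInteger(numSpatialLayers) || numSpatialLayers < 1) {
    throw new RangeError('numSpatialLayers must be a positive integer');
  }
  return { simulcast: { numSpatialLayers } };
}

const publishOptions = buildPublishOptions(3);
console.log(JSON.stringify(publishOptions));

// In a browser client with an existing room and local stream (both assumed):
// room.publish(localStream, publishOptions);
```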

#5

Okay, so I’ll test it locally first before using it in a real webinar. Do you have an estimate for the next release?

I also figured out that I can benefit from config.erizoAgent.maxProcesses, as you mentioned, since I’m using individual streams for audio and video. Thanks for the tips.

#6

We’re running some more tests with a couple of tweaks that we introduced recently but my guess is that we will release it sometime next week.

Looking forward to your feedback if you decide to use it once it’s out.

#7

Great! Yeah it’s a little risky and I always try to update the production with care. I’m very curious to see how simulcast will improve the quality and performance.

BTW, I increased config.erizoAgent.maxProcesses and tested it in a small webinar with 2 publishers (2 audio + 1 video streams), and it seems we got better performance. As you can see in the screenshot, there are three EJ processes and a good balance across CPUs. Thanks again.

#8

Today I took another measurement with the latest release (v3) on the same server under similar conditions: a webinar with one publisher (audio + video) and 90+ participants, but with simulcast enabled this time!

It was our best experience with Licode ever!! Thanks to the team and their recent hard work :ok_hand::pray:. The video quality was great and all participants were happy except those with a poor Internet connection. As you can see in the following picture, there is a perfect balance across CPUs.

The only issue was the time the video takes to reach its best quality (several minutes). I discovered that during that time the video is out of sync with the speaker’s voice (behind by a couple of seconds).

#9

Hi, I have also been getting this out-of-sync issue between video and audio consistently when one of the two participants has a very slow connection. From checking the code, it seems that synchronisation during recording of the video/audio does not use the NTP timestamp from the RTCP Sender Report to do the lip-sync.
Could that be the issue, or does the sync issue come from higher up in the chain?

#10

The synchronization is done by Chrome on the receiver side using RTCP SR.
In the externaloutput (recorder) this is missing, but that should not cause any trouble with normal WebRTC streaming.

#11

How can I solve it if the recording is out of sync but playback in Chrome is not?
It seems to happen consistently when recording a video session with one peer being “far” from the server or going through a remote VPN.

#12

At the moment you can’t solve this issue.
You would have to implement a part of Licode that is “missing”.

If you know C++, you could take a look at externaloutput.cpp and listen for RTCP SR reports.
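
To show what "listening for SR reports" involves, here is a minimal sketch of reading an RTCP Sender Report and mapping RTP timestamps to wall-clock time. It follows the SR layout in RFC 3550 and is written in JavaScript for brevity; it is not code from externaloutput.cpp, and the function names are hypothetical.

```javascript
// Seconds between the NTP epoch (1900) and the Unix epoch (1970).
const NTP_UNIX_OFFSET_SECONDS = 2208988800;

// Parse the sender-info part of an RTCP Sender Report (RFC 3550, 6.4.1).
// Returns null if the buffer is not an SR packet.
function parseSenderReport(buf) {
  if (buf.length < 28) return null;
  const version = buf[0] >> 6;
  const packetType = buf[1];
  if (version !== 2 || packetType !== 200) return null;
  const ntpSeconds = buf.readUInt32BE(8);
  const ntpFraction = buf.readUInt32BE(12);
  return {
    ssrc: buf.readUInt32BE(4),
    // NTP time converted to Unix milliseconds.
    wallclockMs: (ntpSeconds - NTP_UNIX_OFFSET_SECONDS) * 1000 +
      (ntpFraction / 2 ** 32) * 1000,
    rtpTimestamp: buf.readUInt32BE(16),
  };
}

// Map an RTP timestamp to wall-clock ms using the latest SR for that stream.
// The delta is taken as a signed 32-bit value so wraparound is handled.
function rtpToWallclockMs(rtpTimestamp, sr, clockRate) {
  const diff = (rtpTimestamp - sr.rtpTimestamp) | 0;
  return sr.wallclockMs + (diff / clockRate) * 1000;
}

// Build a synthetic SR for demonstration: NTP time 1000.5 s after the
// Unix epoch, paired with RTP timestamp 90000.
const srPacket = Buffer.alloc(28);
srPacket[0] = 0x80; // version 2, no padding, zero report blocks
srPacket[1] = 200; // packet type: Sender Report
srPacket.writeUInt16BE(6, 2); // length in 32-bit words minus one
srPacket.writeUInt32BE(0x1234, 4); // SSRC
srPacket.writeUInt32BE(NTP_UNIX_OFFSET_SECONDS + 1000, 8);
srPacket.writeUInt32BE(0x80000000, 12); // fraction = 0.5 s
srPacket.writeUInt32BE(90000, 16);

const parsedSr = parseSenderReport(srPacket);
// One second later on a 90 kHz video clock.
console.log(rtpToWallclockMs(180000, parsedSr, 90000));
```

Doing this for both the audio and the video stream puts their packets on a common clock, and the difference between the two gives the offset a recorder would need to correct.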
