Bug #6274: yate is eating 260% CPU while doing nothing - OCTOI - Osmocom Community TDM over IP - Open Source Mobile Communications

Actions

Copy link

Bug #6274

open

yate is eating 260% CPU while doing nothing

Added by laforge 5 months ago. Updated 5 months ago.

Status:

New

Priority:

Low

Assignee:

laforge

Category:

tdm-core

Target version:

Start date:

11/25/2023

Due date:

% Done:

Description

Something seems wrong if an idle yate doing nothing consumes about 260% CPU.

Looking at the yate threads, we can see that each Zap Worker consumes 10..12% of CPU while doing nothing:

This is quite weird, especially considering that osmo-e1d with 23 trunkdevs in parallel only uses 20% CPU - while actually processing packets/frames for all of those trunks.

a strace of such a single Zap Worker thread looks like this:

select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0

So basically we're using select to check if data on a single FD has arrived, time-out after 100 microseconds and then call a zero-nanoseconds sleep and start all-over again.

Looking at the source code, this looks like extremely poor programming style:

// Process incoming data
bool ZapInterface::process()
{           
    if (!m_device.select(100))
        return false;
    if (!m_device.canRead()) {
        if (m_device.event())
            checkEvents();
        return false;
    }

    int r = m_device.recv(m_buffer,m_bufsize + ZAP_CRC_LEN);
    if (r == -1) {
        if (m_device.event())
            checkEvents();
        return false;
    }
    if (r &lt; ZAP_CRC_LEN + 1) {
        Debug(this,DebugMild,"Short read %u bytes (with CRC) [%p]",r,this);
        return false;
    }

    s_ifaceNotifyMutex.lock();
    m_notify = 0;
    s_ifaceNotifyMutex.unlock();
    DataBlock packet(m_buffer,r - ZAP_CRC_LEN,false);
#ifdef XDEBUG    
    String hex;  
    hex.hexify(packet.data(),packet.length(),' ');
    Debug(this,DebugAll,"Received data: %s [%p]",hex.safe(),this);
#endif
    receivedPacket(packet);
    packet.clear(false);
    return true; 
}   

void ZapWorkerThread::run()
{   
    if (!m_client)
        return;
    DDebug(&plugin,DebugAll,"%s is running for client (%p): %s",
        s_threadName,m_client,m_address.c_str());
    while (true) {
        if (m_client-&gt;process())
            Thread::check(true);
        else
            Thread::yield(true);
    }
}




	So basically the thread is continuously calling Thread:yield rather than waiting on some actual I/O to happen.

Files

yate-cpu.png

View yate-cpu.png

45.9 KB

laforge, 11/25/2023 02:23 PM

No data to display

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Retronetworking » OCTOI - Osmocom Community TDM over IP

Custom queries

Bug #6274

yate is eating 260% CPU while doing nothing