CPUs scheduling oddities with 4 cores...
...
Posted: Mon Sep 01, 2008 5:36 pm
Guest
I was running a program that is CPU intensive and often runs for a long time.
When I ran multiple processes I encountered an oddity. This is on a system
with 2 sockets each populated with a dual core AMD Opteron (so 4 cores total).

root@tesla:/root 422# uname -a
Linux tesla.ipal.net 2.6.26.2 #1 SMP PREEMPT Sat Aug 16 22:54:27 CDT 2008 i686 Dual-Core AMD Opteron(tm) Processor 2220 AuthenticAMD GNU/Linux
root@tesla:/root 423#


1. When I run 4 processes as a normal user, all 4 processes use 100%.

=============================================================================
top - 17:12:34 up 15 days, 16:48, 6 users, load average: 3.91, 3.60, 2.72
Tasks: 226 total, 5 running, 221 sleeping, 0 stopped, 0 zombie
Cpu(s):100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8310704k total, 1026128k used, 7284576k free, 429476k buffers
Swap: 0k total, 0k used, 0k free, 138508k cached

  PID USER   PR   NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
12431 phil   20    0  1892   528  444 R  100  0.0   7525:47 factorize
30204 phil   20    0  1892   528  444 R  100  0.0   0:54.62 factorize
30210 phil   20    0  1892   528  444 R  100  0.0   0:54.63 factorize
30196 phil   20    0  1892   528  444 R  100  0.0   0:54.39 factorize
30241 root    8  -12  2460  1288  888 R    0  0.0   0:00.15 top
=============================================================================

2. When I run 1 process as a normal user and 3 processes as root, then one
of the CPUs is not being used.

=============================================================================
top - 17:14:54 up 15 days, 16:50, 6 users, load average: 3.89, 3.67, 2.87
Tasks: 223 total, 5 running, 218 sleeping, 0 stopped, 0 zombie
Cpu(s): 75.0%us, 0.1%sy, 0.0%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8310704k total, 1024864k used, 7285840k free, 429796k buffers
Swap: 0k total, 0k used, 0k free, 138508k cached

  PID USER   PR   NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
30387 root   20    0  1892   528  444 R  100  0.0   1:32.78 factorize
30403 root   20    0  1892   532  444 R  100  0.0   1:32.69 factorize
30398 root   20    0  1892   532  444 R   67  0.0   1:01.87 factorize
12431 phil   20    0  1892   528  444 R   33  0.0   7527:05 factorize
30493 root    8  -12  2460  1252  888 R    0  0.0   0:00.07 top
=============================================================================

Why would the kernel choose to NOT schedule ONE CPU just because 3 processes
are running as root instead of a non-root user? I could understand root maybe
getting special priority (but the priorities here were set the same).

And setting the user process to negative niceness does not change it:

=============================================================================
top - 17:18:20 up 15 days, 16:54, 6 users, load average: 3.99, 3.83, 3.08
Tasks: 223 total, 5 running, 218 sleeping, 0 stopped, 0 zombie
Cpu(s): 75.0%us, 0.0%sy, 0.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8310704k total, 1024596k used, 7286108k free, 428948k buffers
Swap: 0k total, 0k used, 0k free, 138496k cached

  PID USER   PR   NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
30387 root   20    0  1892   528  444 R  100  0.0   4:58.55 factorize
30403 root   20    0  1892   532  444 R  100  0.0   4:58.33 factorize
30398 root   20    0  1892   532  444 R   66  0.0   3:19.08 factorize
12431 phil   10  -10  1892   528  444 R   34  0.0   7528:13 factorize
30771 root    8  -12  2460  1276  888 R    0  0.0   0:00.03 top
=============================================================================

I would have hoped that at least in this case it would have given 12431 more
CPU access. Setting the root processes to positive nice did not change this
either:

=============================================================================
top - 17:19:44 up 15 days, 16:55, 6 users, load average: 3.99, 3.86, 3.16
Tasks: 223 total, 6 running, 217 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.4%us, 0.0%sy, 66.6%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8310704k total, 1023604k used, 7287100k free, 429040k buffers
Swap: 0k total, 0k used, 0k free, 138496k cached

  PID USER   PR   NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
30387 root   30   10  1892   528  444 R  100  0.0   6:22.81 factorize
30403 root   30   10  1892   532  444 R  100  0.0   6:22.50 factorize
30398 root   30   10  1892   532  444 R   67  0.0   4:15.28 factorize
12431 phil   10  -10  1892   528  444 R   33  0.0   7528:41 factorize
30843 root    8  -12  2460  1256  888 R    0  0.0   0:00.01 top
=============================================================================

I fired up 3 additional non-root processes (now 4 non-root and 3 root) and
this does let all 4 CPUs run. The prioritizing is strange in this case
since one of the new (nice 0) processes gets 100% but the priority one
(nice -10) still only gets 33%.

=============================================================================
top - 17:27:21 up 15 days, 17:03, 6 users, load average: 6.38, 4.78, 3.76
Tasks: 229 total, 9 running, 220 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.9%us, 0.1%sy, 50.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8310704k total, 1026256k used, 7284448k free, 429824k buffers
Swap: 0k total, 0k used, 0k free, 138496k cached

  PID USER   PR   NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
31274 phil   20    0  1892   528  444 R  100  0.0   1:31.85 factorize
30403 root   30   10  1892   532  444 R   67  0.0  13:27.16 factorize
30387 root   30   10  1892   528  444 R   67  0.0  13:28.06 factorize
30398 root   30   10  1892   532  444 R   67  0.0   9:19.47 factorize
12431 phil   10  -10  1892   528  444 R   33  0.0   7531:14 factorize
31270 phil   20    0  1892   528  444 R   33  0.0   0:30.74 factorize
31264 phil   20    0  1892   524  444 R   33  0.0   0:30.74 factorize
31051 root    8  -12  2460  1288  888 R    0  0.0   0:00.82 top
=============================================================================

Here I let top run for a 120 second measurement interval to see if this is
a case of juggling which processes get more CPU time:

=============================================================================
top - 17:31:13 up 15 days, 17:07, 6 users, load average: 6.99, 5.97, 4.47
Tasks: 229 total, 9 running, 220 sleeping, 0 stopped, 0 zombie
Cpu(s): 50.0%us, 0.0%sy, 50.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8310704k total, 1026628k used, 7284076k free, 430192k buffers
Swap: 0k total, 0k used, 0k free, 138496k cached

  PID USER   PR   NI  VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
31274 phil   20    0  1892   528  444 R  100  0.0   5:23.80 factorize
30398 root   30   10  1892   532  444 R   67  0.0  11:54.46 factorize
30403 root   30   10  1892   532  444 R   67  0.0  16:02.01 factorize
30387 root   30   10  1892   528  444 R   67  0.0  16:02.91 factorize
12431 phil   10  -10  1892   528  444 R   33  0.0   7532:31 factorize
31264 phil   20    0  1892   524  444 R   33  0.0   1:48.21 factorize
31270 phil   20    0  1892   528  444 R   33  0.0   1:48.20 factorize
31582 root    8  -12  2460  1268  892 R    0  0.0   0:00.01 top
=============================================================================

We can see that 31274 is being inappropriately favored in this case as it
has accumulated over the 2 minute period more CPU time than the others.
Process 12431 should be getting the most, but isn't.
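
The same comparison can be made from the TIME+ columns of the last two
listings. This is only a small sketch: it assumes TIME+ reads as
minutes:seconds (with hundredths where top shows them) and that the two
listings are separated by the difference of their clock times. The helper
names are illustrative, not part of any tool.

```python
def parse_time_plus(text):
    """Convert a top TIME+ value ("M:SS.hh" or "M:SS") to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def wall_seconds(start, end):
    """Seconds between two HH:MM:SS clock times on the same day."""
    def to_seconds(clock):
        hours, minutes, seconds = (int(part) for part in clock.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return to_seconds(end) - to_seconds(start)


# TIME+ values copied from the 17:27:21 and 17:31:13 listings.
before = {31274: "1:31.85", 30403: "13:27.16", 30387: "13:28.06",
          30398: "9:19.47", 12431: "7531:14", 31270: "0:30.74",
          31264: "0:30.74"}
after = {31274: "5:23.80", 30403: "16:02.01", 30387: "16:02.91",
         30398: "11:54.46", 12431: "7532:31", 31270: "1:48.20",
         31264: "1:48.21"}

elapsed = wall_seconds("17:27:21", "17:31:13")


def cpu_share(pid):
    """Percentage of one CPU the process used between the two listings."""
    used = parse_time_plus(after[pid]) - parse_time_plus(before[pid])
    return 100 * used / elapsed


for pid in before:
    print(pid, round(cpu_share(pid), 1))
```

TIME+ for 12431 has no hundredths, so its figure is only good to about a
second either way.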

In all the above cases, process 12431 has been running for a few days.

--
|WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
| by the abuse department, bellsouth.net is blocked. If you post to |
| Usenet from these places, find another Usenet provider ASAP. |
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |
Bernhard Agthe...
Posted: Tue Sep 02, 2008 5:13 am
Guest
Hi,

Quote:
1. When I run 4 processes as a normal user, all 4 processes use 100%.

Not surprising.

Quote:
2. When I run 1 process as a normal user and 3 processes as root, then one
of the CPUs is not being used.
Why would the kernel choose to NOT schedule ONE CPU just because 3 processes
are running as root instead of a non-root user? I could understand root maybe
getting special priority (but the priorities here were set the same).

Yes, root does get very special priorities, because you want root able
to "rescue" a system which is on the brink of overload ;-)

Why would you want to run your processes as root, anyway?

Quote:
I fired up 3 additional non-root processes (now 4 non-root and 3 root) and
this does let all 4 CPUs run. The prioritizing is strange in this case
since one of the new (nice 0) processes gets 100% but the priority one
(nice -10) still only gets 33%.

Do you wonder? Three of your four CPUs get to handle two CPU-intensive
processes, so why should any of the respective two get 100%? Well, as
you found out, root gets some extra priority and you need super-user
rights to set the nice value for a process beyond a certain threshold,
so you could experiment with root-priority for a user process - but
again, why? ;-)

Have fun...
...
Posted: Thu Sep 04, 2008 12:38 am
Guest
On Tue, 02 Sep 2008 12:13:44 +0200 Bernhard Agthe <dark2star at (no spam) gmx.net> wrote:
| Hi,
|
|> 1. When I run 4 processes as a normal user, all 4 processes use 100%.
|
| Not surprising.
|
|> 2. When I run 1 process as a normal user and 3 processes as root, then one
|> of the CPUs is not being used.
|> Why would the kernel choose to NOT schedule ONE CPU just because 3 processes
|> are running as root instead of a non-root user? I could understand root maybe
|> getting special priority (but the priorities here were set the same).
|
| Yes, root does get very special priorities, because you want root able
| to "rescue" a system which is on the brink of overload ;-)

But that does not explain why a CPU is left idle. Letting root be the first
up to take a CPU over all others MIGHT be reasonably explained. However,
do niceness settings mean anything? Does root running at nice 19 still get
all control of the system in lieu of a user running at nice 0?


| Why would you want to run your processes as root, anyway?

Actually, it was a typo. They were run that way unintentionally. I would
not have discovered the funny behaviour otherwise.


|> I fired up 3 additional non-root processes (now 4 non-root and 3 root) and
|> this does let all 4 CPUs run. The prioritizing is strange in this case
|> since one of the new (nice 0) processes gets 100% but the priority one
|> (nice -10) still only gets 33%.
|
| Do you wonder? Three of your four CPUs get to handle two CPU-intensive
| processes, so why should any of the respective two get 100%? Well, as
| you found out, root gets some extra priority and you need super-user
| rights to set the nice value for a process beyond a certain threshold,
| so you could experiment with root-priority for a user process - but
| again, why? ;-)

With 4 processes ready to run and 4 CPUs available, all 4 should be running
at 100%. That was not the case. 2 CPUs each ran 1 of the processes, 1 CPU
ran the other 2 processes, and 1 CPU sat idle ... in the case where root was
owner of 3 of the processes. This does not explain why one of the CPUs is
left idle, doing nothing.
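
As a sanity check on that description, the %CPU column from the second top
listing can be folded into a per-CPU picture. This is a minimal sketch that
assumes the placement described above (two CPUs with one process each, one
CPU shared by two processes, one CPU idle). The placement is inferred from
the figures, not measured, and the names are illustrative only.

```python
# %CPU values copied from the top listing for case 2 (1 user + 3 root).
observed = {30387: 100, 30403: 100, 30398: 67, 12431: 33}

# Assumed placement (inferred, not measured): one list of PIDs per CPU.
placement = [[30387], [30403], [30398, 12431], []]


def per_cpu_load(placement, observed):
    """Sum the observed %CPU of the processes assigned to each CPU."""
    return [sum(observed[pid] for pid in cpu) for cpu in placement]


def overall_utilization(loads):
    """Average utilization across all CPUs, in percent."""
    return sum(loads) / len(loads)


loads = per_cpu_load(placement, observed)
print(loads)                       # [100, 100, 100, 0]
print(overall_utilization(loads))  # 75.0
```

The 75% result agrees with the "75.0%us ... 24.9%id" summary line that top
printed for that case.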

--
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |
Bernhard Agthe...
Posted: Thu Sep 04, 2008 4:39 am
Guest
Hi,

Quote:
But that does not explain why a CPU is left idle. Letting root be the first
up to take a CPU over all others MIGHT be reasonably explained. However,
do niceness settings mean anything? Does root running at nice 19 still get
all control of the system in lieu of a user running at nice 0?

Actually, I cannot tell you why one core is idle. As to the "nice"
setting, I think this is a relative measure, so if root starts out with
a higher priority and you lessen it, it might still be higher than a
user with (marginally) increased prio. There's actually a maximum increase
users are allowed; anything beyond that needs root permissions.
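
On Linux the limit on how far an unprivileged process may lower its nice
value is, as far as getrlimit(2) documents it, the RLIMIT_NICE resource
limit: the lowest reachable nice value is 20 minus the soft limit. Below is a
minimal sketch of that check. The function name is made up for illustration,
and a process with CAP_SYS_NICE (normally root) bypasses the check entirely.

```python
import resource


def can_lower_to(target_nice, rlimit_nice):
    """Return True if an unprivileged process may lower its nice value to
    target_nice under the given RLIMIT_NICE soft limit.

    Mirrors the rule documented in getrlimit(2): the lowest nice value
    allowed is 20 - rlim_cur. Privileged processes are not modelled here.
    """
    if rlimit_nice == resource.RLIM_INFINITY:
        return True
    return 20 - target_nice <= rlimit_nice


# Soft limit of the current process; 0 is a common default.
soft, _hard = resource.getrlimit(resource.RLIMIT_NICE)
print("soft RLIMIT_NICE:", soft)
print("may lower to nice -10:", can_lower_to(-10, soft))
```

With a soft limit of 0 no lowering is possible at all, which would fit the
usual experience that an ordinary user can only make a process nicer.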

Quote:
| Why would you want to run your processes as root, anyway?

Actually, it was a typo. They were run that way unintentionally. I would
not have discovered the funny behaviour otherwise.

OK ;-)

Quote:
There being 4 processes ready to run, and 4 CPUs available, then all 4 should
be running at 100%. That was not the case. 2 CPUs ran 2 of the processes,
1 CPU ran the next 2 processes, and 1 CPU sat idle ... in the case where root
was owner of 3 of the processes. This does not explain why one of the
CPUs is left idle, doing nothing.

Yup. I cannot give you an answer to that. Sorry.

Ciao...
...
Posted: Thu Sep 04, 2008 10:19 pm
Guest
On Thu, 04 Sep 2008 11:39:47 +0200 Bernhard Agthe <dark2star at (no spam) gmx.net> wrote:
| Hi,
|
|> But that does not explain why a CPU is left idle. Letting root be the first
|> up to take a CPU over all others MIGHT be reasonably explained. However,
|> do niceness settings mean anything? Does root running at nice 19 still get
|> all control of the system in lieu of a user running at nice 0?
|
| Actually, I cannot tell you why one core is idle. As to the "nice"
| setting, I think this is a relative measure, so if root starts out with
| a higher priority and you lessen it, it might still be higher than a
| user with (marginally) increased prio. There's actually a maximum increase
| users are allowed; anything beyond that needs root permissions.

I could understand the case where root would be a higher priority than a user,
at least for a default (0) nice value. If I set a root process to a niceness
greater than 0 *AND* set a user process to a niceness less than 0, then at
some point the user process should run in lieu of the root process. I want
a way to have processes NOT bog down the system even though they have to run
with root permission.
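
For reference on what niceness is expected to mean here: 2.6.26 uses the CFS
scheduler, which maps each nice level to a load weight (nice 0 is 1024, and
each step changes the weight by roughly 25%). Below is a rough sketch of the
share two CPU-bound processes would get if they competed on one CPU and were
treated purely by weight. It assumes no group scheduling or other adjustment
is in effect, which may not hold on this machine. Only a few entries of the
kernel's weight table are reproduced, and the function name is illustrative.

```python
# A few entries from the kernel's nice-to-weight table (prio_to_weight).
NICE_TO_WEIGHT = {-20: 88761, -10: 9548, 0: 1024, 10: 110, 19: 15}


def expected_share(nice_a, nice_b):
    """Percent of one CPU that process A would get against process B,
    assuming both are always runnable and scheduled purely by weight."""
    weight_a = NICE_TO_WEIGHT[nice_a]
    weight_b = NICE_TO_WEIGHT[nice_b]
    return 100 * weight_a / (weight_a + weight_b)


# The nice -10 user process against one nice 10 root process.
print(round(expected_share(-10, 10), 1))
# The same user process against a nice 0 process.
print(round(expected_share(-10, 0), 1))
```

By that model a nice -10 process sharing a CPU with a nice 10 process would
take nearly all of it, which is a long way from the 33% that top reports
for 12431.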

It's the fact that one CPU is idle that I take issue with. I suspect there
may be a bug in the kernel where some logic attempting to manipulate which
CPUs some processes can run on (affinity) has an error. I suppose I need to
do further testing with more combinations, and on other machines with other
numbers of CPUs. For example, if running 1 user and 1 root process on a dual
CPU machine leaves one CPU idle, that's telling me (bad) things.
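
One way to start that testing would be to look at the affinity mask of each
process and at the CPU it last ran on. This is a minimal sketch assuming a
Linux system with Python 3, where os.sched_getaffinity is available. The
helper names are illustrative, and pid 0 / "self" just mean the calling
process, so substitute the PIDs of the factorize processes.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted list of CPUs the process is allowed to run on."""
    return sorted(os.sched_getaffinity(pid))


def last_cpu(pid="self"):
    """Return the CPU the process last ran on (field 39 of /proc/<pid>/stat)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        data = stat_file.read()
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split after the last ')'; the remainder starts at field 3.
    fields = data.rsplit(")", 1)[1].split()
    return int(fields[39 - 3])


if __name__ == "__main__":
    print("allowed CPUs:", allowed_cpus())
    print("last ran on CPU:", last_cpu())
```

If every factorize process reports all four CPUs as allowed and one CPU
still stays idle, that would point away from the affinity masks themselves
and toward the load balancing.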

--
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |
All times are GMT - 5 Hours