"Tamás Laczkó (IJ/ETH)"
2007-09-06 07:27:39 UTC
Hi guys,
I sent this question to this list back in August, but it seems that the
list was not really alive during that period, so I thought I'd just try
again. I would really appreciate your comments and ideas about the
following problem:
The problem occurred when I was trying to start a kernel with kexec on a
system with a Dual Core AMD Opteron processor (see /proc/cpuinfo
attached at the end). The kernel I was using is the 2.6.16 kernel
supplied with SLES10, compiled for the 32-bit architecture and not
configured to use SMP. I used it as both the first and the second
kernel, so the two were practically the same. The new (second, or
kexec'd) kernel simply rebooted the processor during bootup, at the
point in time_init where the timer interrupt is unmasked in the PIC,
without any error message whatsoever. I also tested the scenario with a
newer kernel (2.6.21) compiled for 64 bits and configured to use SMP,
but the problem was still there...
With some help and testing (starting the new kernel from grub, which
worked fine, and comparing the interrupt environments under grub and
kexec) I was able to find out that the problem is related to the i8259
hardware and how it is connected to the APIC. After a hardware reset the
PIC is connected in Virtual Wire Mode directly to the local APIC, not
through the IO APIC.
Debug info from the IO APIC at startup using grub:
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00  1    0    0   0   0    0    0    00
01 001 01  1    0    0   0   0    0    0    00
02 001 01  1    0    0   0   0    0    0    00
03 001 01  1    0    0   0   0    0    0    00
04 001 01  1    0    0   0   0    0    0    00
05 000 00  1    0    0   0   0    0    0    00
06 000 00  1    0    0   0   0    0    0    00
07 001 01  1    0    0   0   0    0    0    00
08 001 01  1    0    0   0   0    0    0    00
09 000 00  1    0    0   0   0    0    0    00
0a 001 01  1    0    0   0   0    0    0    00
0b 001 01  1    0    0   0   0    0    0    00
0c 001 01  1    0    0   0   0    0    0    00
0d 001 01  1    0    0   0   0    0    0    00
0e 001 01  1    0    0   0   0    0    0    00
0f 001 01  1    0    0   0   0    0    0    00
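To make the reading of this table concrete, here is a minimal userspace
sketch of the kind of scan that would look for the PIC behind the IO APIC.
It assumes the check is "pin is unmasked and uses ExtINT delivery mode
(value 7)"; the struct, the function name and the table variable are
hypothetical and only model the Mask and Deli columns of the dump, they are
not the kernel's own definitions.

```c
#include <stddef.h>

/* Only the two columns of the dump that matter for this check. */
struct rte_fields {
    unsigned mask;          /* "Mask" column: 1 = pin is masked      */
    unsigned delivery_mode; /* "Deli" column: 0 = Fixed, 7 = ExtINT  */
};

enum { DELIVERY_EXTINT = 7 };

/* Return the first unmasked ExtINT pin, or -1 if there is none. */
static int find_extint_pin(const struct rte_fields *table, size_t npins)
{
    for (size_t pin = 0; pin < npins; pin++) {
        if (table[pin].mask == 0 &&
            table[pin].delivery_mode == DELIVERY_EXTINT)
            return (int)pin;
    }
    return -1;
}

/* Pins 00..0f from the dump: all masked, all with delivery mode 0. */
static const struct rte_fields grub_boot_table[16] = {
    {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0},
    {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0},
};
```

With the values from the dump, find_extint_pin(grub_boot_table, 16) returns
-1, i.e. nothing in the IO APIC registers says the PIC is routed through it.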
Debug info from the local APIC at start up using grub:
printing local APIC contents on CPU#0/0:
... APIC ID: 00000000 (0)
... APIC VERSION: 00040010
... APIC TASKPRI: 00000000 (00)
... APIC ARBPRI: 00000000 (00)
... APIC PROCPRI: 00000000
... APIC EOI: 00000000
... APIC RRR: 00000000
... APIC LDR: 00000000
... APIC DFR: ffffffff
... APIC SPIV: 0000010f
... APIC ISR field:
... APIC TMR field:
... APIC IRR field:
... APIC ESR: 00000004
... APIC ICR: 00004630
... APIC ICR2: 01000000
... APIC LVTT: 00010000
... APIC LVTPC: 00010000
... APIC LVT0: 00000700
... APIC LVT1: 00000400
... APIC LVTERR: 0001000f
... APIC TMICT: 00000000
... APIC TMCCT: 00000000
... APIC TDCR: 00000000
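The LVT0 and LVT1 values above are what show the PIC going directly to the
local APIC. A small sketch that decodes them, assuming the usual local APIC
LVT layout (vector in bits 0-7, delivery mode in bits 8-10, mask in bit 16);
the helper names are made up for illustration:

```c
#include <stdint.h>

/* Delivery mode encodings of a local APIC LVT entry (bits 8-10). */
enum lvt_delivery_mode {
    LVT_FIXED  = 0,
    LVT_SMI    = 2,
    LVT_NMI    = 4,
    LVT_INIT   = 5,
    LVT_EXTINT = 7
};

static unsigned lvt_vector(uint32_t lvt)        { return lvt & 0xff; }
static unsigned lvt_delivery_mode(uint32_t lvt) { return (lvt >> 8) & 0x7; }
static int      lvt_is_masked(uint32_t lvt)     { return (lvt >> 16) & 1; }

/* LINT0 acts as a "virtual wire" for the PIC when it is unmasked ExtINT. */
static int lint0_is_virtual_wire(uint32_t lvt0)
{
    return !lvt_is_masked(lvt0) && lvt_delivery_mode(lvt0) == LVT_EXTINT;
}
```

Applied to the dump: LVT0 = 00000700 decodes as unmasked ExtINT, LVT1 =
00000400 as unmasked NMI, and LVTT = 00010000 as masked. That is the
virtual wire setup through the local APIC described above.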
While booting the first kernel (from grub) I get the following kernel
warning message:
"ExtINT not setup in hardware but reported by MP table"
Looking at the code in the enable_IO_APIC() function
(arch/i386/kernel/io_apic.c), this message indicates that the kernel
could not find the PIC routed through the IO APIC when looking at the
IO APIC registers (so after this search 'ioapic_i8259.pin' was '-1'),
but it did find it in the MP tables when searching for legacy IRQs. It
then overrode the 'ioapic_i8259.pin' value ('-1') with the value found
in the MP tables, which in this case was '0'. That is why later, before
the shutdown of the first kernel, the new 'ioapic_i8259.pin' value '0'
made the disable_IO_APIC() function (also in io_apic.c) enable Virtual
Wire Mode with the PIC connected through the IO APIC (and not directly
to the local APIC, as it was at a normal boot from grub). In my case
this eventually meant that the new kernel rebooted the processor when
unmasking the timer interrupt in the PIC during time_init.
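The chain of decisions described above can be modelled in a few lines. This
is only a simplified userspace sketch of the behaviour as I described it,
not the code from io_apic.c: the function names, the enum and the 'warned'
flag are hypothetical, and the real functions program hardware registers
instead of returning values.

```c
/* Where the PIC ends up being routed when the first kernel shuts down. */
enum pic_route {
    PIC_VIA_LOCAL_APIC, /* virtual wire directly on the local APIC */
    PIC_VIA_IO_APIC     /* virtual wire through an IO APIC pin     */
};

/*
 * Model of the override at boot of the first kernel: if the hardware
 * scan found nothing (hw_pin == -1) but the MP table names a pin, the
 * MP table value wins and the warning is printed (*warned is set).
 */
static int choose_i8259_pin(int hw_pin, int mp_table_pin, int *warned)
{
    *warned = 0;
    if (hw_pin == -1 && mp_table_pin >= 0) {
        *warned = 1; /* "ExtINT not setup in hardware but reported by MP table" */
        return mp_table_pin;
    }
    return hw_pin;
}

/* Model of the shutdown path: any pin other than -1 selects the IO APIC. */
static enum pic_route route_at_shutdown(int i8259_pin)
{
    return (i8259_pin != -1) ? PIC_VIA_IO_APIC : PIC_VIA_LOCAL_APIC;
}
```

On my system the inputs are hw_pin = -1 and mp_table_pin = 0, so the model
yields pin 0 with the warning set, and route_at_shutdown(0) gives
PIC_VIA_IO_APIC. That differs from the direct local APIC routing the
machine has after a hardware reset.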
A quick way to correct my problem is not to trust the MP table, and to
check only the IO APIC IRQ redirection table in the enable_IO_APIC()
function (in the first kernel) to find out whether the PIC is connected
through the IO APIC.
But what is the reason for trusting the MP table and creating this
connection if it was not there from the beginning?
Obviously, the system is not able to start up if the PIC is connected
through the IO APIC by the APIC setup that is made in disable_IO_APIC()
before the shutdown of the first kernel.
What would be a good solution to this problem?
Thanks in advance,
Tamas
-------------------------------------------------------------------------------------
cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 165
stepping : 2
cpu MHz : 1800.374
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips : 3606.40
-------------------------------------------------------------------------------------