Broadcom BCM5701 / HP NC6770

Hello,

Post by Erich Jenkins, Fuujin Group Ltd
We were previously running 5.3 on this box (I know, VERY old), but never
had a problem. The link now fails to come up. I've tried forcing the
port out of auto (media 1000baseSX mediaopt full-duplex) and as long as
the port doesn't have an IP assigned via rc.conf on system boot, I can
get the switch to see it (Cisco 6505), but no traffic to flow. I've
checked all the obvious things (duplex setting on switch, cable failure,
etc.), all to no avail. I fiddled with the knobs in the driver (rxcsum,
vlan_mtu, etc) with no changes either.

Sorry for the silly answer, but is your interface UP? (ifconfig bge0 up)

--
Evgenii V Davidov

Erich Jenkins, Fuujin Group Ltd

2010-04-09 08:13:00 UTC

Post by Evgenii Davidov
Hello,

Sorry for the silly answer, but is your interface UP? (ifconfig bge0 up)

Actually, I was silly for not mentioning that in the original post, but
yes, it's up. I've even tried up/down/up on the card a few times after
fiddling with the driver knobs, but the same thing happened, no link on
the switch.

I've also tested this back-to-back with two machines and the same cards.
Same result: no transmission of data. Interestingly enough, the link
light is lit on the cards (in back-to-back) and on the card but not the
switch. Not sure if that's significant or not. In the interest of
completeness, here's ifconfig output for that card:

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:08:02:28:76:4d
        inet 10.222.222.144 netmask 0xffffff00 broadcast 10.222.222.255
        media: Ethernet 1000baseSX <full-duplex>
        status: active

I also verified the MTU size to ensure the switch ports weren't
configured for jumbo frames. They are correctly set on the switch and on
the FreeBSD box, but no traffic flows.
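
As a cross-check of the ifconfig output above, the following minimal Python sketch decodes the hexadecimal flags and options words and derives the prefix length and broadcast address from the netmask. The bit values are the ones defined in FreeBSD's <net/if.h>, limited to the bits that appear in this output; the helper names (decode, network_summary) are illustrative assumptions and not part of any FreeBSD tool.

```python
import ipaddress

# Interface flag bits from FreeBSD's <net/if.h> (only those seen above).
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0040: "RUNNING",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}

# Interface capability bits from <net/if.h> (only those seen above).
IFCAP_OPTIONS = {
    0x01: "RXCSUM",
    0x02: "TXCSUM",
    0x08: "VLAN_MTU",
    0x10: "VLAN_HWTAGGING",
    0x80: "VLAN_HWCSUM",
}


def decode(value, table):
    """Return (names of known bits set in value, leftover unknown bits)."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    unknown = value & ~sum(table)  # bits not covered by the table
    return names, unknown


def network_summary(addr, netmask_hex):
    """Return (prefix length, broadcast address) for addr and a hex netmask."""
    mask = ipaddress.IPv4Address(int(netmask_hex, 16))
    net = ipaddress.IPv4Network(f"{addr}/{mask}", strict=False)
    return net.prefixlen, str(net.broadcast_address)


if __name__ == "__main__":
    print(decode(0x8843, IFF_FLAGS))
    print(decode(0x9B, IFCAP_OPTIONS))
    print(network_summary("10.222.222.144", "0xffffff00"))
```

All three fields decode to the same names and addresses that ifconfig printed, which suggests the interface configuration itself is consistent and the problem lies elsewhere.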

--
Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Pyun YongHyeon

2010-04-09 17:38:21 UTC

Would you try booting to the latest 7.3-RELEASE and check whether you
still see the issue?
If you see the same issue, please show me the verbose boot messages of
bge(4) and its PHY driver.

Erich Jenkins, Fuujin Group Ltd

2010-04-10 06:13:07 UTC

Post by Pyun YongHyeon
Would you try booting to the latest 7.3-RELEASE and check whether you
still see the issue?
If you see the same issue, please show me the verbose boot messages of
bge(4) and its PHY driver.
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/freebsd-net

Just finished the install, and ended up with the same result. I
installed OpenBSD on this box just to be sure there wasn't something
unrelated to the driver causing the issue. OpenBSD works fine.

Where to from here?

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Pyun YongHyeon

2010-04-10 21:25:20 UTC

Hello,
On Fri, Apr 09, 2010 at 01:39:16AM -0600, Erich Jenkins, Fuujin Group

Post by Erich Jenkins, Fuujin Group Ltd
We were previously running 5.3 on this box (I know, VERY old), but
never had a problem. The link now fails to come up. I've tried forcing
the port out of auto (media 1000baseSX mediaopt full-duplex) and as
long as the port doesn't have an IP assigned via rc.conf on system
boot, I can get the switch to see it (Cisco 6505), but no traffic to
flow. I've checked all the obvious things (duplex setting on switch,
cable failure, etc.), all to no avail. I fiddled with the knobs in the
driver (rxcsum, vlan_mtu, etc) with no changes either.

Sorry for the silly answer, but is your interface UP? (ifconfig bge0 up)

Actually, I was silly for not mentioning that in the original post, but
yes, it's up. I've even tried up/down/up on the card a few times after
fiddling with the driver knobs, but the same thing happened, no link on
the switch.
I've also tested this back-to-back with two machines and the same cards.
Same result: no transmission of data. Interestingly enough, the link
light is lit on the cards (in back-to-back) and on the card but not the
switch. Not sure if that's significant or not. In the interest of
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
ether 00:08:02:28:76:4d
inet 10.222.222.144 netmask 0xffffff00 broadcast 10.222.222.255
media: Ethernet 1000baseSX <full-duplex>
status: active
I also verified the MTU size to ensure the switch ports weren't
configured for jumbo frames. They are correctly set on the switch and on
the FreeBSD box, but no traffic flows.

Erich Jenkins, Fuujin Group Ltd

2010-04-11 01:06:31 UTC

Post by Pyun YongHyeon
It seems there is an async link state change issue in the BCM5701 TBI
case. Link state handling is one of the most complex things in
bge(4), so I'm not sure whether the attached patch is the right thing.
The public data sheet seems to indicate that bit 0 of BGE_MI_STS should be
set to enable link to the MAC state machine if autopolling is
disabled, so resetting the bit to 0 does not look right to me.
Try the attached patch.

Thanks for the patch, but I'm sorry to report there is no change.

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Erich Jenkins, Fuujin Group Ltd

2010-04-11 09:15:16 UTC

I've been muddling around in src/sys/dev on the old system and the new
system and there appear to be rather major changes to MII and bge,
possibly the whole stack?

There are a number of things that seem to have been merged with other
parts of the network stack, or perhaps written into the individual
drivers (someone working on the net stack would have to verify that).

For instance, some headers included in 5.3-REL seem to have gone away
completely, and in the new (unpatched) version of if_bge.c under
7.3-REL, these includes are gone:

- #include <vm/vm.h> /* for vtophys */
- #include <vm/pmap.h> /* for vtophys */
- #include <machine/clock.h> /* for DELAY */
- #include <machine/bus_memio.h>

- #include <dev/pci/pcireg.h> (still included, but something changed in here)
- #include <dev/pci/pcivar.h> (ditto above)

It appears that the checksum features have been completely rewritten,
and some of the ring settings have changed. It's interesting that the
driver only fills 256 of the rx rings in the hopes that the cpu is "fast
enough to keep up with the NIC". Would a subroutine here to grab the cpu
clock and count (number of procs/pipelines) be more trouble than it's
worth to "automagically" increase the number of rx rings the driver
fills based on the system in which it's installed?

Something also changed in pci/pcireg.h and pci/pcivar.h, but I haven't
had the time to hunt down and expand the source tree from the 5.3-REL
branch yet.

I have other machines with copper nics utilizing the bge driver, and
there are no issues at all. Perhaps I'm getting ahead of things, but
since this seems to have been broken through several releases, would it
make any sense to split the support between the BCM5701KHB chipset and
the more recent BCM chipset to avoid causing issues with cards/systems
not currently experiencing troubles?

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Pyun YongHyeon

2010-04-12 17:57:01 UTC

Post by Erich Jenkins, Fuujin Group Ltd
I've been muddling around in src/sys/dev on the old system and the new
system and there appear to be rather major changes to MII and bge,
possibly the whole stack?

It was not completely rewritten but many improvements were made.

Post by Erich Jenkins, Fuujin Group Ltd
There are a number of things that seem to have been merged with other
parts of the network stack, or perhaps written into the individual
drivers (someone working on the net stack would have to verify that).
For instance, some files called in 5.3-REL seem to have gone away
completely, and in the new (unpatched) version of if_bge.c under
- #include <vm/vm.h> /* for vtophys */
- #include <vm/pmap.h> /* for vtophys */

One of the most significant changes would be the bus_dma(9) conversion,
which is required of all drivers to make them work correctly on a
variety of platforms. bus_dma(9) does not use vtophys directly
anymore, so these headers were nuked.

Post by Erich Jenkins, Fuujin Group Ltd
- #include <machine/clock.h> /* for DELAY */
- #include <machine/bus_memio.h>
- #include <dev/pci/pcireg.h> (called but something changed in here)
- #include <dev/pci/pcivar.h> (ditto above)

No, these headers are still present.

Post by Erich Jenkins, Fuujin Group Ltd
It appears that the checksum features have been completely rewritten,

Checksum offloading was not completely rewritten, but a workaround
for buggy controllers was added.

Post by Erich Jenkins, Fuujin Group Ltd
and some of the ring settings have changed. It's interesting that the
driver only fills 256 of the rx rings in the hopes that the cpu is "fast
enough to keep up with the NIC". Would a subroutine here to grab the cpu

That magic number 256 is adequate for most cases, but it may not be
enough to handle heavy loads. Internally the controller uses a fixed
512 RX buffers, but bge(4) uses only half of them to save
resources. I think you can increase SSLOTS to 512 to get the full 512
RX buffers.

Post by Erich Jenkins, Fuujin Group Ltd
clock and count (number of procs/pipelines) be more trouble than it's
worth to "automagically" increase the number of rx rings the driver
fills based on the system in which it's installed?

Dynamically increasing the number of RX buffers is doable, but it would
add much more code. If there is high demand for that, I would just
increase the number of RX buffers to 512. The controller can't be
configured to have more than 512 RX buffers.
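
To give a rough sense of the resources involved, the sketch below compares the memory tied up by 256 and by 512 standard RX buffers. It assumes each buffer is backed by one 2048-byte mbuf cluster (the usual MCLBYTES value); that per-buffer size is an assumption made for illustration and is not stated anywhere in this thread.

```python
# Assumption: one standard 2 KiB mbuf cluster per RX buffer.
CLUSTER_BYTES = 2048


def rx_buffer_bytes(slots, cluster_bytes=CLUSTER_BYTES):
    """Memory held by a standard RX ring with the given number of filled slots."""
    return slots * cluster_bytes


if __name__ == "__main__":
    for slots in (256, 512):
        kib = rx_buffer_bytes(slots) // 1024
        print(f"{slots} RX buffers -> {kib} KiB")
```

Under that assumption, going from 256 to 512 buffers doubles the pinned buffer memory per interface, which is the trade-off behind filling only half of the ring.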

Post by Erich Jenkins, Fuujin Group Ltd
Something also changed in pci/pcireg.h and pci/pcivar.h, but I haven't
had the time to hunt down and expand the source tree from the 5.3-REL
branch yet.
I have other machines with copper nics utilizing the bge driver, and
there are no issues at all. Perhaps I'm getting ahead of things, but

Yes, that is expected. :-)

Post by Erich Jenkins, Fuujin Group Ltd
since this seems to have been broken through several releases, would it
make any sense to split the support between the BCM5701KHB chipset and
the more recent BCM chipset to avoid causing issues with cards/systems
not currently experiencing troubles?

I'd like to if I can. Supporting a huge number of different
controllers in a single driver is a maintenance nightmare. However,
rewriting the parts that require special handling for a certain
controller/revision is too risky because I don't have access to
most controllers.

One theory I came up with while reading the code is link state
handling. As I said in my previous mail, link state handling for TBI
is somewhat tricky in bge(4), and the driver seems to rely on periodic
register access to keep track of link state. I guess polling(4) may
behave differently in link state handling, as it does not rely
on interrupts at all. So would you try polling(4) and see whether
that makes any difference on your box?

Pyun YongHyeon

2010-04-12 19:42:09 UTC

If polling(4) makes it work, try the attached patch.

Erich Jenkins, Fuujin Group Ltd

2010-04-13 00:10:30 UTC

I'll get this set up. I've got jail issues on 7.0-REL that I'm trying
to figure out too, so it might take a few hours before I get to this.

I just checked on a reported iSCSI error on a machine using a BCM5721
nic (copper gigE) and I'm seeing issues like this:

Apr 11 06:24:59 san0 iscsi-target: pid 863:iscsi.c:1149: ***ERROR*** Bad
"Opcode": Got 0 expected 5.
Apr 11 06:24:59 san0 iscsi-target: pid 863:target.c:1317: ***ERROR***
iscsi_write_data_decap() failed
Apr 11 16:51:52 san0 iscsi-target: pid 863:iscsi.c:1149: ***ERROR*** Bad
"Opcode": Got 0 expected 5.
Apr 11 16:51:52 san0 iscsi-target: pid 863:target.c:1317: ***ERROR***
iscsi_write_data_decap() failed
Apr 12 10:32:49 san0 iscsi-target: pid 863:iscsi.c:1149: ***ERROR*** Bad
"Opcode": Got 0 expected 5.
Apr 12 10:32:49 san0 iscsi-target: pid 863:target.c:1317: ***ERROR***
iscsi_write_data_decap() failed
Apr 12 11:55:42 san0 iscsi-target: pid 863:iscsi.c:1149: ***ERROR*** Bad
"Opcode": Got 0 expected 5.
Apr 12 11:55:42 san0 iscsi-target: pid 863:target.c:1317: ***ERROR***
iscsi_write_data_decap() failed
Apr 12 14:07:13 san0 iscsi-target: pid 863:iscsi.c:1149: ***ERROR*** Bad
"Opcode": Got 0 expected 5.
Apr 12 14:07:13 san0 iscsi-target: pid 863:target.c:1317: ***ERROR***
iscsi_write_data_decap() failed

Any chance this could be because of the NIC chipset? I don't see this on
any of the machines configured identically, using the em driver for
Intel GigE nics.

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Erich Jenkins, Fuujin Group Ltd

2010-04-13 01:05:11 UTC

Post by Pyun YongHyeon
I believe bge(4) in 7.0-RELEASE and 7.3-RELEASE is quite different.
So I'm not sure whether the patch works on 7.0-RELEASE.
<snip>
Have no idea what happens here. Does this also happen on
7.3-RELEASE?

Sorry, I meant to say I'd reinstall 7.3-REL to test this.

The iSCSI issues are happening on 8.0-REL. Are there any major
differences between 7.3 and 8.0 in the bge driver or network stack that
could be contributing to this?

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Pyun YongHyeon

2010-04-13 00:24:59 UTC

7.3-RELEASE has more recent bge(4) changes. Most changes are
related to RX buffer handling and bus_dma(9) fixes. See the CVS web
interface for more details. 8.0 has a new network stack that has some
nice/experimental features, but I guess it wouldn't affect iSCSI
behavior. That's just a wild guess; I have no experience with iSCSI, so
others may be able to point out iSCSI differences between 7.3-RELEASE and
8.0-RELEASE.

Erich Jenkins, Fuujin Group Ltd

2010-04-13 05:15:02 UTC

Well, after a reinstall of 7.3-REL and trying polling, there seems to be
no difference in network connectivity using the BCM5701 GigE nics. Is
this a problem worth fighting considering how many different cards with
Broadcom chipsets might be affected by changes without splitting the driver?

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Pyun YongHyeon

2010-04-13 00:02:55 UTC

Post by Erich Jenkins, Fuujin Group Ltd
I'll get this set up. I've got a jail issues on 7.0-REL that I'm trying
to figure out too, so it might take a few hours before I get to this.

I believe bge(4) in 7.0-RELEASE and 7.3-RELEASE is quite different.
So I'm not sure whether the patch works on 7.0-RELEASE.

Post by Erich Jenkins, Fuujin Group Ltd
Any chance this could be because of the NIC chipset? I don't see this on
any of the machines configured identically, using the em driver for
Intel GigE nics.

Have no idea what happens here. Does this also happen on
7.3-RELEASE?

Erich Jenkins, Fuujin Group Ltd

2010-04-10 06:22:18 UTC

Forgot the verbose boot messages:

Apr 9 22:59:37 test newsyslog[494]: logfile first created
Apr 9 22:59:38 test syslogd: kernel boot file is /boot/kernel/kernel
Apr 9 22:59:38 test kernel: Copyright (c) 1992-2010 The FreeBSD Project.
Apr 9 22:59:38 test kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988,
1989, 1991, 1992, 1993, 1994
Apr 9 22:59:38 test kernel: The Regents of the University of
California. All rights reserved.
Apr 9 22:59:38 test kernel: FreeBSD is a registered trademark of The
FreeBSD Foundation.
Apr 9 22:59:38 test kernel: FreeBSD 7.3-RELEASE #0: Sun Mar 21 06:15:01
UTC 2010
Apr 9 22:59:38 test kernel:
***@walker.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386
Apr 9 22:59:38 test kernel: Timecounter "i8254" frequency 1193182 Hz
quality 0
Apr 9 22:59:38 test kernel: CPU: Intel(R) Xeon(TM) MP CPU 2.80GHz
(2799.92-MHz 686-class CPU)
Apr 9 22:59:38 test kernel: Origin = "GenuineIntel" Id = 0xf25
Stepping = 5
Apr 9 22:59:38 test kernel:
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Apr 9 22:59:38 test kernel: Features2=0x4400<CNXT-ID,xTPR>
Apr 9 22:59:38 test kernel: Logical CPUs per core: 2
Apr 9 22:59:38 test kernel: real memory = 4093607936 (3903 MB)
Apr 9 22:59:38 test kernel: avail memory = 4003602432 (3818 MB)
Apr 9 22:59:38 test kernel: ACPI APIC Table: <COMPAQ 00000083>
Apr 9 22:59:38 test kernel: FreeBSD/SMP: Multiprocessor System
Detected: 8 CPUs
Apr 9 22:59:38 test kernel: cpu0 (BSP): APIC ID: 0
Apr 9 22:59:38 test kernel: cpu1 (AP/HT): APIC ID: 1
Apr 9 22:59:38 test kernel: cpu2 (AP): APIC ID: 2
Apr 9 22:59:38 test kernel: cpu3 (AP/HT): APIC ID: 3
Apr 9 22:59:38 test kernel: cpu4 (AP): APIC ID: 4
Apr 9 22:59:38 test kernel: cpu5 (AP/HT): APIC ID: 5
Apr 9 22:59:38 test kernel: cpu6 (AP): APIC ID: 6
Apr 9 22:59:38 test kernel: cpu7 (AP/HT): APIC ID: 7
Apr 9 22:59:38 test kernel: MADT: Forcing active-low polarity and level
trigger for SCI
Apr 9 22:59:38 test kernel: ioapic0 <Version 1.1> irqs 0-15 on motherboard
Apr 9 22:59:38 test kernel: ioapic1 <Version 1.1> irqs 16-31 on motherboard
Apr 9 22:59:38 test kernel: ioapic2 <Version 1.1> irqs 32-47 on motherboard
Apr 9 22:59:38 test kernel: ioapic3 <Version 1.1> irqs 48-63 on motherboard
Apr 9 22:59:38 test kernel: kbd1 at kbdmux0
Apr 9 22:59:38 test kernel: acpi0: <COMPAQ P27> on motherboard
Apr 9 22:59:38 test kernel: acpi0: [ITHREAD]
Apr 9 22:59:38 test kernel: acpi0: Power Button (fixed)
Apr 9 22:59:38 test kernel: Timecounter "ACPI-safe" frequency 3579545
Hz quality 850
Apr 9 22:59:38 test kernel: acpi_timer0: <32-bit timer at 3.579545MHz>
port 0x920-0x923 on acpi0
Apr 9 22:59:38 test kernel: pcib0: <ACPI Host-PCI bridge> on acpi0
Apr 9 22:59:38 test kernel: pci0: <ACPI PCI bus> on pcib0
Apr 9 22:59:38 test kernel: pci0: <base peripheral> at device 2.0 (no
driver attached)
Apr 9 22:59:38 test kernel: pci0: <base peripheral> at device 2.2 (no
driver attached)
Apr 9 22:59:38 test kernel: vgapci0: <VGA-compatible display> port
0x2800-0x28ff mem 0xf6000000-0xf6ffffff,0xf5ff0000-0xf5ff0fff at device
3.0 on pci0
Apr 9 22:59:38 test kernel: isab0: <PCI-ISA bridge> at device 15.0 on pci0
Apr 9 22:59:38 test kernel: isa0: <ISA bus> on isab0
Apr 9 22:59:38 test kernel: atapci0: <ServerWorks CSB5 UDMA100
controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2000-0x200f at
device 15.1 on pci0
Apr 9 22:59:38 test kernel: ata0: <ATA channel 0> on atapci0
Apr 9 22:59:38 test kernel: ata0: [ITHREAD]
Apr 9 22:59:38 test kernel: ata1: <ATA channel 1> on atapci0
Apr 9 22:59:38 test kernel: ata1: [ITHREAD]
Apr 9 22:59:38 test kernel: ohci0: <OHCI (generic) USB controller> mem
0xf5fe0000-0xf5fe0fff irq 7 at device 15.2 on pci0
Apr 9 22:59:38 test kernel: ohci0: [GIANT-LOCKED]
Apr 9 22:59:38 test kernel: ohci0: [ITHREAD]
Apr 9 22:59:38 test kernel: usb0: OHCI version 1.0, legacy support
Apr 9 22:59:38 test kernel: usb0: SMM does not respond, resetting
Apr 9 22:59:38 test kernel: usb0: <OHCI (generic) USB controller> on ohci0
Apr 9 22:59:38 test kernel: usb0: USB revision 1.0
Apr 9 22:59:38 test kernel: uhub0: <(0x1166) OHCI root hub, class 9/0,
rev 1.00/1.00, addr 1> on usb0
Apr 9 22:59:38 test kernel: uhub0: 4 ports with 4 removable, self powered
Apr 9 22:59:38 test kernel: pcib1: <ACPI Host-PCI bridge> on acpi0
Apr 9 22:59:38 test kernel: pci1: <ACPI PCI bus> on pcib1
Apr 9 22:59:38 test kernel: ciss0: <Compaq Smart Array 5i> port 0x3000-0x30ff mem 0xf7cc0000-0xf7cfffff,0xf7bf0000-0xf7bf3fff irq 31 at device 1.0 on pci1
Apr 9 22:59:38 test kernel: ciss0: [ITHREAD]
Apr 9 22:59:38 test kernel: pcib2: <ACPI Host-PCI bridge> on acpi0
Apr 9 22:59:38 test kernel: pci2: <ACPI PCI bus> on pcib2
Apr 9 22:59:38 test kernel: em0: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0x4000-0x403f mem 0xf7de0000-0xf7dfffff,0xf7dc0000-0xf7ddffff irq 16 at device 2.0 on pci2
Apr 9 22:59:38 test kernel: em0: [FILTER]
Apr 9 22:59:38 test kernel: em0: Ethernet address: 00:1b:21:20:1a:72
Apr 9 22:59:38 test kernel: pci2: <base peripheral, PCI hot-plug controller> at device 30.0 (no driver attached)
Apr 9 22:59:38 test kernel: pcib3: <ACPI Host-PCI bridge> on acpi0
Apr 9 22:59:38 test kernel: pci6: <ACPI PCI bus> on pcib3
Apr 9 22:59:38 test kernel: pci6: <base peripheral, PCI hot-plug controller> at device 30.0 (no driver attached)
Apr 9 22:59:38 test kernel: pcib4: <ACPI Host-PCI bridge> on acpi0
Apr 9 22:59:38 test kernel: pci10: <ACPI PCI bus> on pcib4
Apr 9 22:59:38 test kernel: bge0: <HP NC6770 Gigabit Ethernet Controller, ASIC rev. 0x000105> mem 0xf7ff0000-0xf7ffffff irq 26 at device 1.0 on pci10
Apr 9 22:59:38 test kernel: bge0: Ethernet address: 00:08:02:28:76:4d
Apr 9 22:59:38 test kernel: bge0: [ITHREAD]
Apr 9 22:59:38 test kernel: acpi_tz0: <Thermal Zone> on acpi0
Apr 9 22:59:38 test kernel: atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
Apr 9 22:59:38 test kernel: atkbd0: <AT Keyboard> irq 1 on atkbdc0
Apr 9 22:59:38 test kernel: kbd0 at atkbd0
Apr 9 22:59:38 test kernel: atkbd0: [GIANT-LOCKED]
Apr 9 22:59:38 test kernel: atkbd0: [ITHREAD]
Apr 9 22:59:38 test kernel: sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Apr 9 22:59:38 test kernel: sio0: type 16550A
Apr 9 22:59:38 test kernel: sio0: [FILTER]
Apr 9 22:59:38 test kernel: fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5 irq 6 drq 2 on acpi0
Apr 9 22:59:38 test kernel: fdc0: [FILTER]
Apr 9 22:59:38 test kernel: cpu0: <ACPI CPU> on acpi0
Apr 9 22:59:38 test kernel: p4tcc0: <CPU Frequency Thermal Control> on cpu0
Apr 9 22:59:38 test kernel: cpu1: <ACPI CPU> on acpi0
Apr 9 22:59:38 test kernel: p4tcc1: <CPU Frequency Thermal Control> on cpu1
Apr 9 22:59:38 test kernel: cpu2: <ACPI CPU> on acpi0
Apr 9 22:59:38 test kernel: p4tcc2: <CPU Frequency Thermal Control> on cpu2
Apr 9 22:59:38 test kernel: cpu3: <ACPI CPU> on acpi0
Apr 9 22:59:38 test kernel: p4tcc3: <CPU Frequency Thermal Control> on cpu3
Apr 9 22:59:38 test kernel: pmtimer0 on isa0
Apr 9 22:59:38 test kernel: orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xcc000-0xccfff,0xcd000-0xce7ff,0xee000-0xeffff pnpid ORM0000 on isa0
Apr 9 22:59:38 test kernel: ppc0: parallel port not found.
Apr 9 22:59:38 test kernel: sc0: <System console> at flags 0x100 on isa0
Apr 9 22:59:38 test kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Apr 9 22:59:38 test kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Apr 9 22:59:38 test kernel: sio1: port may not be enabled
Apr 9 22:59:38 test kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Apr 9 22:59:38 test kernel: Timecounters tick every 1.000 msec
Apr 9 22:59:38 test kernel: acd0: CDRW <DW-224E/A.1K> at ata0-master PIO4
Apr 9 22:59:38 test kernel: da0 at ciss0 bus 0 target 0 lun 0
Apr 9 22:59:38 test kernel: da0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
Apr 9 22:59:38 test kernel: da0: 135.168MB/s transfers
Apr 9 22:59:38 test kernel: da0: Command Queueing Enabled
Apr 9 22:59:38 test kernel: da0: 70128MB (143624160 512 byte sectors: 255H 32S/T 17601C)
Apr 9 22:59:38 test kernel: SMP: AP CPU #4 Launched!
Apr 9 22:59:38 test kernel: SMP: AP CPU #6 Launched!
Apr 9 22:59:38 test kernel: SMP: AP CPU #1 Launched!
Apr 9 22:59:38 test kernel: SMP: AP CPU #3 Launched!
Apr 9 22:59:38 test kernel: SMP: AP CPU #5 Launched!
Apr 9 22:59:38 test kernel: SMP: AP CPU #7 Launched!
Apr 9 22:59:38 test kernel: SMP: AP CPU #2 Launched!
Apr 9 22:59:38 test kernel: Trying to mount root from ufs:/dev/da0s1a
Apr 9 22:59:39 test kernel: em0: link state changed to UP

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder