AdrianJ
Joined: 16 Jan 2006 Posts: 2483 Location: Queensland
|
Posted: Mon Jan 05, 2009 7:48 am Post subject: SNAP protocol and message collisions |
|
|
Here is an 'interesting' problem.
I have some code (part of a much larger project); the relevant bits are like this:
Code: |
.............
Getinput:
   while ischarwaiting() > 0
      temp2 = inkey()
      gosub SetSnapState
   wend
return
'********************** Set SNAP state based on input ****************************************
'input byte is in temp2
'snapstate is set for each different state of the message
'bSnapFlag.7 is set when crc passes and full data packet is available
SetSnapState:
   gosub Calc_crc 'update crc
   select case bSnapState
      case 8:
         crc1 = temp2
         if Crc = 0 then 'valid CRC, so message ok
            if hdb2.0 > 0 then 'want an ack ?
               'waitms 20
               'gosub snapack 'just send an ack
               set bSnapflag.6
            end if
            set bSnapFlag.7 'set message flag, to tell us message was valid
            'end if
            ' reset porta.3
         else 'crc fails
            'waitms 20
            ' gosub snapnak 'send a nak
            set bSnapFlag.5
         end if
         bSnapState = 0 'setup ready for next message
      case 7:
         crc2 = temp2
         bSnapState = 8
      case 6:
         incr bCount
         datapack(bcount) = temp2
         if bcount => NrBytes then
            bSnapState = 7
         end if
      case 5:
         'sab1 = temp2
         HisAddress = temp2 'hold source address
         if NrBytes > 0 then
            bSnapState = 6
            bCount = 0 'start data packet
         else 'if no data bytes expected
            bSnapState = 7
         end if
      case 4:
         dab1 = temp2 'we need to hold the dest address for other routines
         if temp2 = MyAddress then
            bSnapState = 5
            ' set porta.3
         else 'message not for me, jump out
            bSnapState = 0
         end if
      case 3:
         hdb1 = temp2
         Temp1 = Hdb1 And &B01110000 ' mask out CRC type
         If Temp1 = &B01000000 Then ' recognise crc16
         Else
            bSnapState = 0 ' unrecognised crc
         End If
         gosub Set_Number 'get number of bytes to expect
         bSnapState = 4
      case 2:
         hdb2 = temp2
         ' Packet header check routine
         ' Check HDB2 to see if MCU are capable to use the packet
         ' structure, if not goto Start
         Temp1 = Chdb2 And &B11110000
         Temp2 = Hdb2 And &B11110000
         If Temp1 <> Temp2 Then
            bSnapState = 0
         End If
         bSnapState = 3
      case 0:
         if temp2 = Sync then 'got a sync byte
            bSnapState = 2
            crc = 0 'clear crc accumulator, sync not in CRC
         end if
   end select
return
'*******************************************************************************************
|
Serial input is set up as buffered, and the buffer is plenty long enough.
The GetInput routine is called regularly from a main loop, and when bSnapFlag.7 is set, we assume valid data is available in the datapack array. The 16-bit CRC is checked, so there should be no reason to assume the data is wrong. But we were getting invalid data even then.
Eventually I discovered that we were getting message collisions rather than just random noise on the line. The SNAP CRC and header checking should have rejected those as well. But I think what is happening is that the correct message gets checked, but before the code watching bSnapFlag.7 gets a chance to read the data out of the datapack array, the GetInput routine takes in more input and tromps on the data.
Any comments before I attempt to fix the collisions? (A difficult task - I don't have full control over some of the instruments putting data on the network.) _________________ Adrian Jansen
Computer language is a framework for creativity |
|
bzijlstra
Joined: 30 Dec 2004 Posts: 1179 Location: Tilburg - Netherlands
|
Posted: Fri Jan 09, 2009 8:46 pm Post subject: switching to quick? |
|
|
Adrian
I don't know your hardware, but I have used RS485 and was switching from sending to receiving too quickly. Couldn't that be your problem?
Have fun
Ben Zijlstra |
|
AdrianJ
Joined: 16 Jan 2006 Posts: 2483 Location: Queensland
|
Posted: Sat Jan 10, 2009 12:24 am Post subject: |
|
|
Thanks Ben.
I don't think 'too quick' applies here. I always wait something like 5 msec between messages on a half-duplex RS485 bus, and have checked with a scope that the lines are stable in that time. The issue seems to be that one device can start sending before it 'should' (timing jitter) and tromp on another message already in progress. I have to accept this; the jitter is not under my control (yet!). But the SNAP protocol should simply ignore the screwed up packet (we do 16-bit CRC error checking) and not even accept the data. But it does - I can see it cause invalid outputs. And as yet I can't find out how it gets past the error check.
The really annoying part is that the error occurs totally at random, around 1-30 errors in 24 hours, with 6 messages sent every 250 msec, so a very low error rate. But the error causes an effect visible to the end user, and so far I can't even find a way to mask it, never mind fix the source of the problem.
|
rileyesi
Joined: 19 Dec 2006 Posts: 398
|
Posted: Sat Jan 10, 2009 6:19 pm Post subject: |
|
|
Adrian,
I don't know if this helps, or even if this change is possible in your system, but I had a project that used RF modems to communicate. In my first design iteration, I had problems with the data packets colliding (for lack of a better term). I simply upped my baud rate and the problem went away.
In other words, can you shorten the time it takes to broadcast your data (as opposed to lengthening the time between broadcasts)?
I don't know anything about SNAP protocol, so sorry if this is a silly suggestion!
Hope this helps.
Pete |
|
bzijlstra
Joined: 30 Dec 2004 Posts: 1179 Location: Tilburg - Netherlands
|
Posted: Sat Jan 10, 2009 9:09 pm Post subject: this is my hardware... |
|
|
I have had luck with this hardware:
9600 baud, from a single master with several slaves connected. I have used it on an X/Y/Z drill machine. Communication between the master and the boards for X, Y and Z is going very well; it can communicate 10000 lines without any error. The master sends and waits for a reply from the slave before sending the next command. Some details are on this page:
http://members.home.nl/bzijlstra/software/communication/Benbus.htm
and the machine itself on this page
http://members.home.nl/bzijlstra/hardware/stepper/stepper.htm
(have to reload the you-tube movies)
Have fun
Ben Zijlstra |
|
AdrianJ
Joined: 16 Jan 2006 Posts: 2483 Location: Queensland
|
Posted: Sun Jan 11, 2009 12:40 am Post subject: |
|
|
Thanks for your suggestions.
@Pete,
The baud rate (38400) is plenty high enough that there is spare space in the 250 msec allowed for each set of data. In fact the space is only about 50% occupied now, but I will be adding more slaves (and hence messages) later, if I can get the existing bugs out! More than about 57600 baud will give us other problems with slew rate limiting and/or RF interference.
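As a rough check on those occupancy figures, the sketch below (Python on a PC, not BASCOM) estimates how many bytes fit into one 250 msec window. It assumes 10 bits on the wire per byte (8N1 framing), which the thread does not state, and the function name is made up for illustration.

```python
def bytes_per_window(baud, window_s, bits_per_byte=10):
    """Bytes that fit in one window, assuming 8N1 framing (10 bits per byte)."""
    return baud * window_s / bits_per_byte


capacity_38400 = bytes_per_window(38400, 0.25)   # 960 bytes per 250 msec window
half_used = capacity_38400 * 0.5                 # 480 bytes at 50% occupancy
capacity_57600 = bytes_per_window(57600, 0.25)   # 1440 bytes per 250 msec window
```

Under that assumption, the current traffic would be somewhere around 480 bytes per window, leaving about the same again before the bus is full at 38400 baud.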
@Ben,
Yes, that is the same topology, except I use MAX483 drivers.
I fixed most of the collision problems by fooling around with the timing, and got the error rate down to 1-5 in 24 hours. Some improvement, but not enough. And of course now it's harder to track - fewer errors to see.
I now have the problem narrowed down to the fact that the GPS in the system is sending ublox and NMEA protocol data on those same lines. This was a very early design decision, and seemed to work ok for the past 2 years. It's the GPS timing jitter which was causing most of the problem: its data frames are supposed to come out at 250 msec intervals, but a 200-350 msec range is about what we see. But now at least one of the SNAP devices on the net is interpreting the ublox message as a SNAP message (even though the headers and checksums are totally different) and responding to it. I think if I can fix this, I will have it solved, but it's still very tricky, mostly because it's so rare and random. Makes it exceedingly hard to trace.
We have in mind to put another processor inside the GPS unit, and fix the timing jitter by buffering in there, as well as encapsulating the GPS data in SNAP packets. But I still want a fix for existing units, preferably just by upgrading their software - we can do this in the field, no factory return needed.
|
AdrianJ
Joined: 16 Jan 2006 Posts: 2483 Location: Queensland
|
Posted: Mon Jan 12, 2009 1:17 am Post subject: |
|
|
I think I found at least another part of the problem. In the code I posted, states 2 and 3 do not return the state to 0 if the header is not valid: they set bSnapState = 0, but the next assignment overwrites it unconditionally. The code should be:
Code: |
case 3:
   hdb1 = temp2
   Temp1 = Hdb1 And &B01110000 ' mask out CRC type
   If Temp1 = &B01000000 Then ' recognise crc16
      bSnapState = 4
   Else
      bSnapState = 0 ' unrecognised crc
   End If
   gosub Set_Number 'get number of bytes to expect
case 2:
   hdb2 = temp2
   ' Packet header check routine
   ' Check HDB2 to see if MCU are capable to use the packet
   ' structure, if not goto Start
   Temp1 = Chdb2 And &B11110000
   Temp2 = Hdb2 And &B11110000
   If Temp1 = Temp2 Then
      bSnapState = 3
   else
      bSnapState = 0
   End If
|
That forces the input back to the beginning state, waiting for the next valid header. Without this, the state machine carries on as if the header had been valid, until finally enough junk comes in and eventually the CRC check passes. We send around 345000 messages in 24 hours, so statistically even a false CRC pass once every 65536 junk messages (16-bit CRC) can happen.
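The difference between the two versions of the header check can be modelled on a PC. This is a Python sketch, not the BASCOM code: SYNC = 0x54 and CHDB2 = 0x50 are assumed values (the thread does not give them), the function names are invented, and only states 0, 2 and 3 are modelled. With reset_on_bad_header=False it behaves like the first listing, where the state advances even after a failed check.

```python
SYNC = 0x54    # assumed sync byte; the thread only calls it "Sync"
CHDB2 = 0x50   # hypothetical expected HDB2; only the upper nibble is compared


def next_state(state, byte, reset_on_bad_header=True):
    """Advance the header part of the receive state machine by one byte."""
    if state == 0:                       # hunting for a sync byte
        return 2 if byte == SYNC else 0
    if state == 2:                       # HDB2: compare upper nibbles
        ok = (byte & 0xF0) == (CHDB2 & 0xF0)
        return 3 if (ok or not reset_on_bad_header) else 0
    if state == 3:                       # HDB1: CRC type bits must say CRC16
        ok = (byte & 0x70) == 0x40
        return 4 if (ok or not reset_on_bad_header) else 0
    return state                         # later states are not modelled here


def run(stream, reset_on_bad_header=True):
    """Feed a byte stream through the header states and return the final state."""
    state = 0
    for byte in stream:
        state = next_state(state, byte, reset_on_bad_header)
    return state


junk = [0x54, 0x00, 0x00]    # a stray sync byte followed by an invalid header
valid = [0x54, 0x51, 0x40]   # sync, acceptable HDB2, HDB1 requesting CRC16

state_fixed = run(junk)                                # 0: back to hunting for sync
state_original = run(junk, reset_on_bad_header=False)  # 4: junk treated as a header
state_valid = run(valid)                               # 4: a good header still passes
```

In the unfixed model a single stray sync byte is enough to carry the receiver through to the address state, which matches the behaviour described above.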
So far I have only tested this for a few hours, but no false alarms (yet!).
|
bzijlstra
Joined: 30 Dec 2004 Posts: 1179 Location: Tilburg - Netherlands
|
Posted: Mon Jan 12, 2009 9:22 am Post subject: Sounds good... |
|
|
That sounds good. Hope you get it working without any errors.
Have fun
Ben Zijlstra |
|
AdrianJ
Joined: 16 Jan 2006 Posts: 2483 Location: Queensland
|
Posted: Mon Jan 12, 2009 11:17 pm Post subject: |
|
|
Yes, it helped a bit. But I found that I could still get corruption in the data, even though the CRC check passed. I worked out that there is negligible chance that the 3 header bytes and the 2 CRC bytes could all come out correct by accident (5 bytes = 40 bits, so a 1 in 2^40 chance). But I was STILL seeing errors.
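To put rough numbers on this, here is a small Python calculation (not taken from the project code) of how often a random frame would slip through. It treats every check bit as independent and uniformly random, and it assumes every frame is junk, so it is a worst-case estimate. The header checks in the posted code compare only a few masked bits rather than whole bytes, so 40 bits is on the optimistic side. The 345000 messages per day is the figure quoted earlier in the thread.

```python
def false_pass_probability(check_bits):
    """Chance that purely random bits satisfy a check of the given width."""
    return 1.0 / (2 ** check_bits)


def expected_false_passes(frames, check_bits):
    """Expected number of random frames that pass, out of 'frames' junk frames."""
    return frames * false_pass_probability(check_bits)


MESSAGES_PER_DAY = 345_000   # figure quoted in the thread

# CRC alone (16 bits): about 5 false passes per day if every frame were junk.
per_day_crc16 = expected_false_passes(MESSAGES_PER_DAY, 16)

# Header plus CRC (40 bits): a tiny fraction of one false pass per day.
per_day_40bit = expected_false_passes(MESSAGES_PER_DAY, 40)

# Average time between false passes with 40 check bits: thousands of years.
years_between_40bit = 1.0 / per_day_40bit / 365.25
```

So with the header checks actually enforced, accidental acceptance of junk is not a plausible explanation for errors seen every day.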
What was happening was that I could get a full valid message, with CRC passed, and start processing it by taking data out of the datapack array. But then another device would start sending, pass the header checks, and start filling the datapack array with more data, even while I had not finished reading the last good data. Of course this leads to bad data being processed.
The lesson is that you MUST process the data packet quickly, before any more data can come in. Normally this is not a problem, because messages are sent at orderly intervals, and even the 'normal' corruption due to spikes and noise on the line will never get into a data packet. But it's a whole different kettle of fish when a device sends a series of messages overwriting an existing data stream!
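The overwrite can be reproduced in a few lines. This is a toy Python model, not the BASCOM code: the class and method names are invented, datapack and msg_valid stand in for the real datapack array and bSnapFlag.7, and the header and CRC checks are assumed to have passed.

```python
class Receiver:
    """Toy model: one shared datapack buffer plus a 'message valid' flag."""

    def __init__(self, size=8):
        self.datapack = [0] * size
        self.count = 0
        self.msg_valid = False   # plays the role of bSnapFlag.7

    def start_payload(self):
        """Header accepted: payload bytes will be stored from the start again."""
        self.count = 0

    def store_byte(self, b):
        """One payload byte lands directly in the shared buffer."""
        if self.count < len(self.datapack):
            self.datapack[self.count] = b
            self.count += 1

    def crc_passed(self):
        """Only raise the flag; the data is read later by the main loop."""
        self.msg_valid = True


rx = Receiver()

# First frame arrives completely and passes its CRC.
rx.start_payload()
for b in [1, 2, 3, 4]:
    rx.store_byte(b)
rx.crc_passed()

# Before the main loop reads it, another sender's frame passes the header checks.
rx.start_payload()
rx.store_byte(9)
rx.store_byte(9)

# The main loop now sees the flag and reads four bytes: a mix of both frames.
deferred_read = rx.datapack[:4]   # [9, 9, 3, 4]
```

The flag still says "valid", yet the first two bytes already belong to the second frame.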
|
bzijlstra
Joined: 30 Dec 2004 Posts: 1179 Location: Tilburg - Netherlands
|
Posted: Tue Jan 13, 2009 10:21 pm Post subject: Ringbuffer |
|
|
On the Wiz810MJ a ring buffer is used for reading and writing. When a packet arrives from the network, the data is stored from the location of the read pointer upwards, and when it is complete, the read pointer is moved on by packetsize + 1. This way the ring buffer can hold more than one packet. There is a mechanism to wrap around at the end of the ring buffer and start again at the beginning.
Have fun
Ben Zijlstra |
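A minimal sketch of the ring buffer idea, in Python for illustration. It is not the Wiz810MJ's actual buffer scheme: the one-byte length prefix, the class name and the drop-when-full policy are all assumptions made for the example.

```python
class PacketRing:
    """Byte ring buffer holding length-prefixed packets (illustrative only)."""

    def __init__(self, size=32):
        self.buf = [0] * size
        self.size = size
        self.rd = 0      # index of the next packet's length byte
        self.wr = 0      # index of the next free byte
        self.used = 0    # bytes currently occupied

    def put_packet(self, payload):
        """Store one complete packet; refuse it if it does not fit."""
        need = len(payload) + 1                  # length byte plus payload
        if need > self.size - self.used:
            return False                         # drop rather than overwrite
        for b in [len(payload)] + list(payload):
            self.buf[self.wr] = b
            self.wr = (self.wr + 1) % self.size  # wrap at the end of the buffer
        self.used += need
        return True

    def get_packet(self):
        """Return the oldest packet as a list, or None if the ring is empty."""
        if self.used == 0:
            return None
        n = self.buf[self.rd]
        out = [self.buf[(self.rd + i) % self.size] for i in range(1, n + 1)]
        self.rd = (self.rd + n + 1) % self.size  # move on by packet size + 1
        self.used -= n + 1
        return out


ring = PacketRing(size=8)
ok_first = ring.put_packet([1, 2, 3])   # True: uses 4 of the 8 bytes
ok_second = ring.put_packet([4, 5])     # True: uses 3 more
ok_third = ring.put_packet([6])         # False: only 1 byte left, packet refused
oldest = ring.get_packet()              # [1, 2, 3]
```

Because a packet is either stored whole or refused, a slow reader loses new packets instead of seeing old ones overwritten. Whether that trade-off suits this network is a separate question.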
|
AdrianJ
Joined: 16 Jan 2006 Posts: 2483 Location: Queensland
|
Posted: Tue Jan 13, 2009 11:03 pm Post subject: |
|
|
Yes, good point. Like a second serial buffer, but controlled from the serial read end. I will think about that, although I can see that it's very easy to end up with the same overwrite problem, just in a different buffer. I think at this stage I will just use fast data moves, probably with the MEMCOPY command or just in ASM, and make sure I do it immediately after the CRC verify, rather than setting a flag there and doing the actual data moves later in the main loop. Thanks for your comments.
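A sketch of that plan, as a Python model rather than BASCOM (where the copy might be a MEMCOPY or a short loop). All names are invented for the example. The point is only that the copy happens at the moment the CRC passes, so the main loop never reads the buffer the state machine is writing into.

```python
class SnapshotReceiver:
    """Toy receiver that copies the payload out as soon as the CRC passes."""

    def __init__(self, size=8):
        self.datapack = [0] * size   # working buffer, written byte by byte
        self.count = 0
        self.ready = None            # private copy for the main loop, or None

    def start_payload(self):
        self.count = 0

    def store_byte(self, b):
        if self.count < len(self.datapack):   # ignore bytes that would overflow
            self.datapack[self.count] = b
            self.count += 1

    def crc_passed(self):
        # Copy immediately, instead of only setting a flag for later.
        self.ready = self.datapack[:self.count]

    def take_message(self):
        """Called from the main loop; returns the copy once, then None."""
        msg, self.ready = self.ready, None
        return msg


rx = SnapshotReceiver()
rx.start_payload()
for b in [1, 2, 3, 4]:
    rx.store_byte(b)
rx.crc_passed()

# A second frame starts arriving before the main loop gets round to reading.
rx.start_payload()
rx.store_byte(9)
rx.store_byte(9)

first = rx.take_message()    # [1, 2, 3, 4]: the copy is unaffected
second = rx.take_message()   # None: nothing new has passed its CRC yet
```

If a second frame completes before the first copy is taken, the copy is replaced by another whole frame, so a message can be lost but not mixed. In the posted code datapack appears to be written only from SetSnapState, so a copy made inside that routine should not be interleaved with further payload bytes.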