I have a strange problem at the moment.
I'm running an MQTT server on a Raspberry Pi connected to my home network.
This works really well: I can test with any MQTT publisher and messages are received very quickly and consistently.
I am using an NB-IoT device to send MQTT packets to the Raspberry Pi broker, as shown in this user manual https://www.dragino.com/downloads/downloads/NB-IoT/NBSN95/NBSN95_NBSN95A_NB-IoT_Sensor_Node_UserManual_v2.1.pdf
Everything works, but the time it takes to connect to the broker is very inconsistent.
Here is a screenshot of the data usage on one of my NB-IoT SIM cards.
The transmission only takes about 5 seconds, so a normal session should take around 10-15 seconds to connect to the broker, transmit the data and receive the acknowledgement.
Sometimes it takes 30-40 minutes to connect to the broker.
I've checked signal strength on the device and it seems to be fine.
Does anyone know if I can check anything else?
Are you connecting from your IoT device to your Raspberry Pi broker, or to another broker in the cloud?
Let's step through the possible points of failure:
1. DNS - are you using the hostname of the MQTT server or its IP address?
2. SSL - are you using SSL or plain MQTT?
3. TCP - can you use tcpdump or Wireshark to view the packets arriving at the machine running your MQTT server?
4. MQTT - are you running your MQTT server (I presume it's Mosquitto if on a Raspberry Pi) in debug mode with extra logging?
5. Sessions - are you using a client ID that is always the same, or random? Are you using the clean session option on or off?
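To separate points 1 and 3 above, something like the following rough sketch could be run from any machine outside the home network. It times the DNS lookup and the TCP connect separately. The function name time_dns_and_tcp and the hostname in the usage comment are placeholders, not anything from your setup, and it only tests the path from an ordinary internet connection, not from the NB-IoT network itself.

```python
import socket
import time


def time_dns_and_tcp(host, port=1883, timeout=10.0):
    """Return (dns_seconds, tcp_seconds) for one connection attempt."""
    t0 = time.monotonic()
    # Resolve the name; for a literal IP address this returns almost instantly.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    t1 = time.monotonic()
    # Open and immediately close a plain TCP connection to the broker port.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
    t2 = time.monotonic()
    return t1 - t0, t2 - t1


# Example (hostname is a placeholder for your No-IP name or public IP):
# dns_s, tcp_s = time_dns_and_tcp("your-hostname.example", 1883)
# print(f"DNS {dns_s:.3f}s, TCP connect {tcp_s:.3f}s")
```

If both numbers are consistently small when run repeatedly, DNS and the port forward are probably not where the delay comes from.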
So the Raspberry Pi is the MQTT broker and is running the latest version of Mosquitto.
1. I am currently using a No-IP hostname to route through to my public IP in case we lose power or the router is turned off/on. I have also tested using just the public IP of my router and port number 1883, and the problem still happens. Note that I am port forwarding 1883 to my Pi using an assigned static IP.
2. I believe I'm using plain MQTT.
3. I have used tcpdump to check the arriving and outgoing packets, and they show the behaviour I have explained, where the device sometimes takes a long time to connect to the MQTT broker. Note that I am connected to the Dragino device over serial and can see when it begins trying to connect. After it connects, the packet is sent and the connection closed in about 5-10 seconds.
4. Yes, Mosquitto, and I am running it in verbose mode. I don't have any issue connecting from an external IP using a websocket MQTT tester.
5. My Dragino device is set to use the same client ID, username and password, and uses clean sessions.
I can see you say the messages are "received very quickly" but later I can see "it takes 30-40 minutes to connect". Is it possible you are confusing the time the device is attached to the network and the time it takes to actually send? To me, 30-40 minutes is not very quick and well beyond the time it takes for NB-IoT to send messages.
It sounds like the Dragino is sending very quickly but for some reason is staying attached for an extended and somewhat random period of time (up to 30-40 minutes). Staying attached for longer than is necessary will reduce battery life if the Dragino is battery powered. Is it possible you are not explicitly disconnecting from the network after you have sent your data? The standardised command to detach from a network is AT+CGATT=0. Presumably Dragino supports this command. Maybe it's worth checking your code to see if you are detaching from the network properly after you have finished sending your data.
Sorry for the confusion, and the long-winded reply :O
I am connected to the Dragino device over serial and can press a reset button to make it go through a full MQTT connection and packet send, then disconnect. I am also connected to the Raspberry Pi via SSH and am viewing the output of the Mosquitto broker on the command line to see incoming connections/publishes/packets/subscriptions.
The behaviour I see is as follows:
The big issue is why the Dragino device is having such intermittent issues connecting to the broker.
The other issue is data usage. Using tcpdump I can see how many bytes are transmitted to and from the device (approx. 350 bytes for a full connection, packet send, subscribe, packet receive and disconnect with all the QoS 1 confirmations). But when I download the data usage, the sessions show 1 kB for a session of less than 1 minute, 1-2 kB for a 5 minute session and 2-5 kB for a 20-50 minute session. It would be nice if the downloaded data usage showed how many bytes are actually being used rather than just rounding to kB. I also added up all the data usage (in rounded kB) and it exactly matched the percentage usage shown on the card. Does this mean the % is not being calculated from the actual bytes?
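To make the rounding question concrete, here is a small arithmetic sketch. It assumes (and this is only an assumption, not something confirmed by the carrier) that each session is rounded up to the next whole kB before being counted against the plan. The 24 sessions per day figure is made up for illustration; the 350 bytes comes from the tcpdump measurement above.

```python
import math


def billed_kb(session_bytes, unit=1024):
    """Round one session up to whole billing units (assumed behaviour)."""
    return max(1, math.ceil(session_bytes / unit))


# Hypothetical day: one ~350-byte exchange per hour.
sessions = [350] * 24

actual_bytes = sum(sessions)
billed_bytes = sum(billed_kb(b) for b in sessions) * 1024

print(actual_bytes, billed_bytes, round(billed_bytes / actual_bytes, 2))
# prints: 8400 24576 2.93
```

Under that assumption, short sessions would be counted at nearly three times their real size. It would not by itself explain why the longer sessions show 2-5 kB.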
You've done a thorough job of troubleshooting, and all the parts of your setup that you've described make good sense.
So we're left with blaming either the device or the network. I've previously had folks at Telstra go above and beyond to diagnose such problems, so perhaps the telstradev folks can put you in touch with someone in the m2m technical area. Another option (which I'd probably try first) is to see if you can customise the Dragino firmware to add more logging of what it's doing.
For example, it might be getting an initial DNS failure and then backing off for a retry interval, or experiencing some other transient error. I've often thought "gee, it'd be nice to have Wireshark for cellular"... actually, if you have access to an SDR (software defined radio) you would at least be able to see whether the device is transmitting packets and trying to get online, or whether it is sitting around twiddling its thumbs.
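As a rough illustration of how a retry backoff could turn a few transient failures into a very long delay, here is a small sketch. The retry policy in it (10 second initial delay, doubling each time) is entirely hypothetical; I don't know what the Dragino firmware or the radio module actually does.

```python
def cumulative_backoff(initial_s, factor, failures, cap_s=None):
    """Total seconds spent waiting after `failures` consecutive failed attempts."""
    total = 0.0
    delay = initial_s
    for _ in range(failures):
        # Optionally clamp each individual wait to a maximum.
        total += delay if cap_s is None else min(delay, cap_s)
        delay *= factor
    return total


# Hypothetical policy: start at 10 s and double after every failure.
for n in range(1, 9):
    print(n, "failures ->", cumulative_backoff(10, 2, n) / 60, "minutes")
```

With those made-up numbers, eight consecutive failures add up to 42.5 minutes of waiting, which is in the same range as the delays reported. Extra logging on the device would show whether anything like this is really happening.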
One of the joys of lockdown is you find yourself with time on your hands looking for something to do. So I used that to think a bit further about your device problem.
Just building on @Chistoper Biggs' comment, if you haven't already, it would be worth getting hold of a copy of the BC95-G command manual and specifically taking a look at the AT+NLOGLEVEL command. It could be that changing the log level will provide you with more information on what is going on. My only concern with this approach is that, based on your earlier comments about the Dragino firmware struggling with command echo turned on, it might also struggle with the radio module's log level changing. One way to find out.
Another thing to consider is the band configuration. Most radio modules when first powered up won't have a clue where in the world they are and will try to scan all the bands they are allowed to access. This can take considerable time, although having done it once, they normally remember the last band that worked and subsequently try that first. It would be worth checking the AT+NBAND command to see what has been allowed. Telstra only uses band 28 for NB-IoT, so limiting the band choice might be a worthwhile consideration.
Finally I recall you had issues with the choice of APN and not being able to change the APN for cid 0, the cid used for autoconnect. I suspect this is the root cause of all your problems. You may find the reason you couldn't change the APN for cid 0 is because the radio was active. Some radio modules don't like you changing key configuration parameters when the radio is active. Can't guarantee this will work because I don't have a BC95-G to play with but you could try the following commands:
AT+CFUN=0 \\ turns off the radio
AT+CGDCONT=0,"IP","telstra.internet" \\ sets the APN
AT+CFUN=1 \\ turns the radio back on
Having done this, you have two possibilities. You could reboot the Dragino to force it to send a transmission because, from what you have said previously, it always retransmits after a reset. Alternatively, you could just leave the device and wait for it to transmit of its own accord following the time interval you defined using the AT+TDC command. The latter means that any initialisation sequence in the Dragino won't undo your changes. Note that I am not sure if the radio module will allow you to change the APN associated with cid 0 or not. Like I said, I don't have a Dragino or a BC95-G sitting here to play with, so it may not work.
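If you would rather script the three-command sequence above than type it, a minimal sketch might look like this. It assumes your serial connection passes AT commands straight through to the radio module and that replies end with OK or a line containing ERROR; neither is verified for the Dragino. The names APN_SEQUENCE, send_at and apply_apn are just for illustration.

```python
APN_SEQUENCE = [
    "AT+CFUN=0",                             # turn the radio off
    'AT+CGDCONT=0,"IP","telstra.internet"',  # set the APN for cid 0
    "AT+CFUN=1",                             # turn the radio back on
]


def send_at(port, command, max_lines=20):
    """Send one AT command and collect reply lines until OK or ERROR."""
    port.write((command + "\r\n").encode("ascii"))
    lines = []
    for _ in range(max_lines):
        line = port.readline().decode("ascii", errors="replace").strip()
        if not line:
            continue  # blank line or read timeout, keep waiting
        lines.append(line)
        if line == "OK" or "ERROR" in line:
            break
    return lines


def apply_apn(port):
    """Run the sequence in order, stopping at the first command that fails."""
    results = {}
    for cmd in APN_SEQUENCE:
        reply = send_at(port, cmd)
        results[cmd] = reply
        if not reply or reply[-1] != "OK":
            break
    return results
```

Here port is any object with write() and readline(), for example a pyserial port opened with serial.Serial("/dev/ttyUSB0", 9600, timeout=5), where the device path and baud rate are guesses you would need to adjust. Stopping at the first failure avoids turning the radio back on with a half-applied configuration without you noticing.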
It would be interesting to hear feedback on this because I have friends who are well versed in the BC95-G and I could run your device issues past them.