nRF52832 + RFM95W - Tx sometimes gets stuck

Hi,
I’m using a custom board based on nRF52, with Adafruit Arduino Core, running LMIC LoRaWAN library v4.1.1. I have successfully connected it to the Helium network (EU 868), and most of the time I can send data without any problems (simple 0/1 status, sent when input changes).

However, every now and then (let’s say after a day or multiple days, not consistent), it doesn’t get to EV_TXCOMPLETE within 60 seconds. The function LMIC_setTxData2 runs without error, but then it can take minutes (up to 20 minutes in one occasion) for the transmit to actually happen.

This is part of the log when the issue happens (before that if was working 2+ days, as seen in millis()). Here it times out and needs 3.5 minutes to send data:

07:56:13.154 -> Send. os:6C248EB2 milis:153583415 op=900 txrx=20 txchn=0 duty=65756954
07:56:13.201 -> Packet queued
07:56:15.465 -> EV_TXCOMPLETE (includes waiting for RX windows)
07:56:15.465 -> os:6C26BEBE milis:153585708 op=900 txrx=20 txchn=1 duty=6C27379F
07:56:16.491 -> Send. os:6C27B450 milis:153586714 op=900 txrx=20 txchn=1 duty=6C27379F
07:56:16.491 -> Packet queued
07:56:20.599 -> EV_TXCOMPLETE (includes waiting for RX windows)
07:56:20.599 -> os:6C2BA459 milis:153590843 op=900 txrx=20 txchn=2 duty=6C2C1D3A
07:56:21.623 -> os:6C2C9A28 milis:153591850 op=900 txrx=20 txchn=2 duty=6C2C1D3A
17:25:29.948 -> Send. os:6B62060B milis:187739169 op=900 txrx=20 txchn=2 duty=6C2C1D3A
17:25:29.948 -> Packet queued
17:26:29.916 -> LMIC timeout!
17:26:29.916 -> os:6B9B4063 milis:187799177 op=108 txrx=20 txchn=0 duty=6C2C1D3A
17:26:30.947 -> Send. os:6B9C35F5 milis:187800183 op=108 txrx=20 txchn=0 duty=6C2C1D3A
17:26:30.947 -> LMIC opmode err: 108
17:27:30.958 -> LMIC timeout!
17:27:30.958 -> os:6BD5704D milis:187860191 op=108 txrx=20 txchn=0 duty=6C2C1D3A
17:28:30.958 -> LMIC timeout!
17:28:30.958 -> os:6C0EA937 milis:187920193 op=108 txrx=20 txchn=0 duty=6C2C1D3A
17:29:06.452 -> EV_TXCOMPLETE (includes waiting for RX windows)

Here is a code snippet for sending and debug log:

void do_send(osjob_t* j){
    // Check if there is not a current TX/RX job running
    if (LMIC.opmode & (OP_TXRXPEND | OP_POLL | OP_TXDATA)) 
    {
      Serial.print("LMIC opmode err: ");
      Serial.println(LMIC.opmode, HEX);     
    }
    else 
    {
      // prepare data
      mr_status = digitalRead(GPIO_IN_MR);
      last_status = mr_status; // potentially override status in main loop

      // save LED to show status during sending
      if (mr_status == 0) used_led = LED_GREEN;
      else used_led = LED_RED;
      
      // Clear Payload
      lpp.reset();
      lpp.addDigitalInput(1, mr_status);
      
      // reset tx_complete flag, to continuously poll lmic until tx is finished
      tx_complete = 0;               

      // Prepare upstream data transmission at the next possible time.
      // transmit on port 1 (the first parameter); you can use any value from 1 to 223 (others are reserved).
      // don't request an ack (the last parameter, if not zero, requests an ack from the network).
      // Remember, acks consume a lot of network resources; don't ask for an ack unless you really need it.
      lmic_tx_error_t ret = LMIC_setTxData2(1, lpp.getBuffer(), lpp.getSize(), 0);

      Serial.println("Packet queued");
    }
}

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);

  // Uncomment to blocking wait for Serial connection
  // while ( !Serial ) delay(10);

  Serial.println("GarageNotifier Init");
  Serial.println("--------------------------");

  Bluefruit.begin();
  Bluefruit.autoConnLed(false);
  //Bluefruit.setTxPower(4);    // Check bluefruit.h for supported values

  pinMode(GPIO_IN_MR, INPUT);

  ledOff(LED_GREEN);
  ledOff(LED_RED);
  
  // LMIC init
  os_init();
  // Reset the MAC state. Session and pending data transfers will be discarded.
  LMIC_reset();

  // ZB: latest LMIC limits max error to 0.4%, 
  // define LMIC_ENABLE_arbitrary_clock_error in lmic_project_config.h if more is needed
  // disabled here, no benefit was visible (defaults to 0.1%)
  // LMIC_setClockError(1 * MAX_CLOCK_ERROR / 100);
  
  LMIC_setLinkCheckMode(0); // Disable link-check mode and ADR, because ADR tends to complicate testing.
  LMIC_setDrTxpow(DR_SF7, 14); // SF7 = shortest time on air. 14 dBm output (max for EU)
  // LMIC_selectSubBand(1); // ZB: Sub band 2 only for US, not relevant in EU (and will result in errors)

  // Start job (sending automatically starts OTAA too)
  do_send(&sendjob); 
}

void loop() {
  // run LMIC loop continuously until tx is finished
  do 
  {
    os_runloop_once();
    delay(1);
    
    if (tx_complete == 0 && ++led_prescale > 50)
    {
      // current state (green/red LED) as tx status info, until sent
      digitalToggle(used_led);
      led_prescale = 0;
    }
    
    if ((millis() - start_time > 60000) && tx_complete == 0)
    {
      Serial.println("LMIC timeout!");
      start_time = millis();

      if (LMIC.opmode & OP_JOINING)
      {
        // if joining, don't reset LMIC
        Serial.println("Joining, LMIC reset skipped!");
        break;
      }

      // reset/rejoin disabled for debug
      /*LMIC_reset();
      LMIC_setLinkCheckMode(0);
      do_send(&sendjob); */
      break;
    }
  }
  while (tx_complete == 0);

  if (cnt < 2)
  {
    Serial.print("os:");
    Serial.print(os_getTime(), HEX);
    Serial.print(" milis:");
    Serial.print(millis());

    Serial.print(" op=");  Serial.print(LMIC.opmode, HEX);
    Serial.print(" txrx="); Serial.print(LMIC.txrxFlags, HEX);  
    Serial.print(" txchn=");  Serial.print(LMIC.txChnl, HEX);
    Serial.print(" duty=");  Serial.println(LMIC.globalDutyAvail, HEX);
    cnt++;    
  }

  ledOff(LED_GREEN);
  ledOff(LED_RED);
  // loop delay when no tx in progress, to save power (goes automatically to sleep in this period)
  delay(1000);
  
  mr_status = digitalRead(GPIO_IN_MR);
  if (last_status != mr_status)
  {   
    last_status = mr_status;

    Serial.print("Send. os:");
    Serial.print(os_getTime(), HEX);
    Serial.print(" milis:");
    Serial.print(millis());

    Serial.print(" op=");  Serial.print(LMIC.opmode, HEX);
    Serial.print(" txrx="); Serial.print(LMIC.txrxFlags, HEX);  
    Serial.print(" txchn=");  Serial.print(LMIC.txChnl, HEX);
    Serial.print(" duty=");  Serial.println(LMIC.globalDutyAvail, HEX);
   
    os_runloop_once();
    tx_complete = 0; 
    start_time = millis();
    // request new transmission
    os_setCallback(&sendjob, do_send);    
    
    cnt=0;   
  }  
}

Event handler code is taken from standard example, all Serial prints disabled except for EV_TXCOMPLETE, where flag tx_complete is set to 1 so the MCU doesn’t need to loop continuously.

Any ideas where to start to debug this?

print out the value of ret after LMIC_setTxData2() and see if it’s failing. If so, the LMIC is busy handling previous commands.

I suspect (since you’re in EU868) that you’re running out of available “time on air” and the LMIC is blocked on a previous transmit, waiting for available time.

I will do that, but note that there was a 10 hour gap between transmits (problematic tx starts at 17:25:29.948). How is it possible to run out of time on air?

Also, you can ignore the Send at 17:26:30.947, that one exits for the reason you mentioned, LMIC is busy with previous transmit that still didn’t finish (“timeout” won’t reset or retry anything in this debug code, it just prints to log every minute until message is sent).

I have added printing of LMIC_setTxData2() return value, but it’s 0, so no problem visible there:

16:59:45.014 -> Send. os:3F5DD98 milis:35422854 op=900 txrx=20 txchn=0 duty=667CD8D
16:59:45.014 -> LMIC setTxData ret: 0
16:59:45.014 -> Packet queued
17:00:44.997 -> LMIC timeout!
17:00:44.997 -> os:42F17F1 milis:35482862 op=108 txrx=20 txchn=1 duty=667CD8D
17:00:45.991 -> Send. os:4300D83 milis:35483868 op=108 txrx=20 txchn=1 duty=667CD8D
17:00:46.039 -> LMIC opmode err: 108
17:01:46.037 -> LMIC timeout!
17:01:46.037 -> os:46947DB milis:35543875 op=108 txrx=20 txchn=1 duty=667CD8D
17:02:46.005 -> LMIC timeout!
17:02:46.005 -> os:4A28102 milis:35603878 op=108 txrx=20 txchn=1 duty=667CD8D
17:03:46.011 -> LMIC timeout!