Box Offline, Service running but can't stop/restart

Hello,

I have Netgate 6100s running pfSense 22.05 and the Standard release of adam:ONE.

I am facing an issue where the box shows as offline in the dashboard, and the primary symptom is that users at the site can no longer resolve internal DNS.

Digging into it, this is what I am seeing:
sockstat | grep anmuscle returns no results

service anmuscle.sh status reports that the service is running
service anmuscle.sh stop runs without error, but the service does not stop
service anmuscle.sh restart returns “96919 another instance already running”

Running the command “top”, I do not see anmuscle running as I do at sites that are functional.

So far the only fix has been to reboot the firewall, which is disruptive to the site.

Is there a better way to force the service to stop?

I’ve been able to fix this in the past by running ps aux | grep anmuscle to find the PID, then running kill <pid>, and after that running service anmuscle.sh start.
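
For anyone scripting that kill-and-restart sequence, here is a minimal sketch. The function name terminate_pid and the 5-second grace period are illustrative choices, not part of adam:ONE, and the commented usage assumes pgrep is available and that the process name contains "anmuscle".

```sh
#!/bin/sh
# Sketch: ask a process to exit with SIGTERM, then escalate to SIGKILL
# if it is still alive after a grace period.
# terminate_pid and the default grace period are hypothetical choices.

terminate_pid() {
    pid="$1"
    grace="${2:-5}"                           # seconds to wait before SIGKILL
    kill "$pid" 2>/dev/null || return 0       # cannot signal it: treat as already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # process has exited
        sleep 1
        grace=$((grace - 1))
    done
    kill -9 "$pid" 2>/dev/null                # still alive: force it
    return 0
}

# Usage (run as root on the firewall):
#   for pid in $(pgrep anmuscle); do terminate_pid "$pid"; done
#   service anmuscle.sh start
```

If no anmuscle process is found at all, the problem is more likely a leftover PID file than a stuck process.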

@fernando Thank you for that piece of information. “ps aux” shows no running instance of anmuscle on the Netgate 6100s currently showing this issue. Trying to start the service yields:

E 15/2 19:57:43.044493 96858 another instance already running
anmuscle is running on 12463

It sounds like the muscle didn’t delete its PID file when it crashed.
You can run the command rm /var/run/anmuscle.* to remove the lock and PID files, after which it should be OK to start again.
Also, what version are you running there, @Chris.Kraydich.Indie? You may want to try upgrading to Rapid Release to see if it solves the crashing issue.
Run adamone-upgrade test to do so.
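
A minimal sketch of checking whether a PID file is stale before removing it, in case it helps. The function name pidfile_is_stale is hypothetical, the file name /var/run/anmuscle.pid in the usage comment is an assumption (only the /var/run/anmuscle.* pattern is confirmed above), and the check assumes the file holds a single numeric PID on its first line.

```sh
#!/bin/sh
# Sketch: decide whether a PID file points at a process that no longer exists.
# Assumption: the first line of the PID file is a numeric PID.
# Caveat: kill -0 only proves that *some* process has that PID; it cannot tell
# whether the PID was reused by an unrelated process. Run as root so that a
# permission error is not mistaken for a dead process.

# Returns 0 (true) if the file exists but its PID is not alive.
pidfile_is_stale() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1             # no file, so nothing is stale
    pid=$(head -n 1 "$pidfile" | tr -cd '0-9')
    [ -n "$pid" ] || return 0                 # empty or non-numeric file counts as stale
    if kill -0 "$pid" 2>/dev/null; then
        return 1                              # a process with that PID is alive
    fi
    return 0
}

# Usage (file name is an assumption):
#   if pidfile_is_stale /var/run/anmuscle.pid; then
#       rm /var/run/anmuscle.*
#       service anmuscle.sh start
#   fi
```

Because of the PID-reuse caveat, it is still worth confirming with ps aux | grep anmuscle that no real instance is running before deleting anything.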

@atw Thank you, that was the last breadcrumb I needed. I was able to remove the PID file and start the service.

I do have one firewall running Rapid Release and just had the same issue. I upgraded it to the RR version 4 hours ago, so I will continue to monitor. I just got a notification that 23.01 is now available, so I will review the release notes and do some testing on our development firewall.

OK, if there is a crash in the current Rapid Release, we certainly want to know about that. Please work with @tom on any details there.

For pfSense 23.01 you’ll just need to re-install the adam:ONE package afterwards so that it gets the correct package for that kernel version.

@tom I just had the same issue on a box running RR about 30 minutes ago. I will zip up the info and email it to the support team shortly.

An update on this:
pfSense 22.05
adamone-core 4.8.4_12
adamone 4.6.7_1

No more crashes, but the service has not been starting on power-up/reboot. It currently requires manually removing the PID file and starting the service.

On the dev firewall we have upgraded to 23.01, and the service is currently working after a reboot. We have some more testing to do before deploying 23.01 to production.