Froxlor Forum
  • 0

chown sporadically / very randomly failing on tasks cron job


ripieces

Question

We have a server with many "customers" (90+, because I tend to put unrelated websites under separate customers), and we get these errors by email at random intervals, on average about 1-2 times per week (not every day, at least):

Subject: Cron <root@h2> /usr/bin/nice -n 5 /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --tasks 1> /dev/null
Time:3/7/21, 6:40 PM

Content:

chown: invalid user: ‘froxlorlocal:froxlorlocal’

Edit: It has also failed once with "invalid group" instead.


It's driving me a bit crazy, since I haven't found the cause yet. It has been happening for quite a while, probably since we updated to Debian 10.x (which was quite a while ago), but it also correlates with the number of "customers".
We are using libnss-extrausers on the server, so the bug might be related to that.
We use PHP-FPM.

If anyone has ideas on how to debug this problem, help would be appreciated 😃
I have already tried and looked into several things. I suspect it might be a problem with libnss-extrausers, with chown itself, or with both.

On the server that has the problem:

root@h2 /etc # grep -r froxlorlocal /etc
/etc/subgid-:froxlorlocal:1279648:65536
/etc/subgid:froxlorlocal:1279648:65536
/etc/gshadow-:froxlorlocal:!::www-data
/etc/passwd-:froxlorlocal:x:9999:9999:,,,:/home/froxlorlocal:/bin/false
/etc/shadow:froxlorlocal:*:17141:0:99999:7:::
/etc/subuid-:froxlorlocal:1279648:65536
/etc/gshadow:froxlorlocal:!::www-data
/etc/shadow-:froxlorlocal:*:17141:0:99999:7:::
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:listen.owner = froxlorlocal
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:listen.group = froxlorlocal
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:user = froxlorlocal
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:group = froxlorlocal
/etc/group-:froxlorlocal:x:9999:www-data
/etc/subuid:froxlorlocal:1279648:65536
/etc/passwd:froxlorlocal:x:9999:9999:,,,:/home/froxlorlocal:/bin/false
/etc/group:froxlorlocal:x:9999:www-data

On the second server, which has a very similar setup but far fewer customers and never shows that error:

root@h3:~# grep -r froxlorlocal /etc
/etc/shadow-:froxlorlocal:*:17990:0:99999:7:::
/etc/subuid:froxlorlocal:100000:65536
/etc/php/7.3/fpm/pool.d/h3.vagas.co.il.conf:listen.owner = froxlorlocal
/etc/php/7.3/fpm/pool.d/h3.vagas.co.il.conf:listen.group = froxlorlocal
/etc/php/7.3/fpm/pool.d/h3.vagas.co.il.conf:user = froxlorlocal
/etc/php/7.3/fpm/pool.d/h3.vagas.co.il.conf:group = froxlorlocal
/etc/group-:froxlorlocal:x:9999:www-data
/etc/subgid:froxlorlocal:100000:65536
/etc/gshadow:froxlorlocal:!::www-data
/etc/shadow:froxlorlocal:*:17990:0:99999:7:::
/etc/gshadow-:froxlorlocal:!::www-data
/etc/subuid-:froxlorlocal:100000:65536
/etc/subgid-:froxlorlocal:100000:65536
/etc/passwd-:froxlorlocal:x:9999:9999:,,,:/home/froxlorlocal:/bin/false
/etc/passwd:froxlorlocal:x:9999:9999:,,,:/home/froxlorlocal:/bin/false
/etc/group:froxlorlocal:x:9999:www-data

 


10 answers to this question


  • 1

Same here, using libnss-extrausers. So this should only happen at moments when cron is running WHILE froxlor is writing the new passwd/group files to /var/lib/extrausers.

But why do froxlor jobs run in parallel (one writing the extrausers files while another does something in the scope of these users)?

Maybe it should not be:
- deleting the files
- generating new files from SQL using a query (which takes some time)

but instead:
- generating new files from SQL using a query as FILE.NEW
- deleting the old ones and renaming/moving FILE.NEW into place

or
- nscd may help (I removed it when migrating from libnss-mysql to libnss-extrausers because of occasional problems where nscd used 100% CPU)
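
The "write FILE.NEW, then rename" idea could look roughly like the sketch below. Whether Froxlor actually writes the extrausers files with a delete-then-generate step is not confirmed in this thread; the helper name write_atomic and the 644 mode are assumptions made for illustration only (a shadow file would need a stricter mode).

```shell
#!/bin/sh
# Replace TARGET with the data read from stdin, so that concurrent readers
# see either the complete old file or the complete new one.
# Usage: generate_data | write_atomic TARGET
write_atomic() {
    target=$1
    # Create the temp file next to the target: a rename within one
    # filesystem replaces the old file in a single step.
    tmp=$(mktemp "${target}.NEW.XXXXXX") || return 1
    if cat > "$tmp" && chmod 644 "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    # Something failed: clean up the temp file and leave the old target as is.
    rm -f "$tmp"
    return 1
}
```

With this pattern there is no window in which the target file is missing or only half written, which is the situation the list above tries to avoid.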


  • 0
sudo adduser froxlorlocal --disabled-password --no-create-home && sudo usermod -a -G www-data froxlorlocal

  • 0
46 minutes ago, irisdina said:

sudo adduser froxlorlocal --disabled-password --no-create-home && sudo usermod -a -G www-data froxlorlocal

Thank you very much for your reply. I will report back in a few days on whether it solved the problem (I have doubts and need to test it).


  • 0

Sadly it didn't help on h2; the error just happened again :(

Edit: Just so you can see I really entered the commands:

root@h2 ~ # grep -r froxlorlocal /etc
/etc/subgid-:froxlorlocal:1279648:65536
/etc/subgid:froxlorlocal:1279648:65536
/etc/gshadow-:froxlorlocal:!::www-data
/etc/passwd-:froxlorlocal:x:9999:9999:,,,:/home/froxlorlocal:/bin/false
/etc/shadow:froxlorlocal:*:17141:0:99999:7:::
/etc/subuid-:froxlorlocal:1279648:65536
/etc/gshadow:www-data:*::froxlorlocal
/etc/gshadow:froxlorlocal:!::www-data
/etc/shadow-:froxlorlocal:*:17141:0:99999:7:::
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:listen.owner = froxlorlocal
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:listen.group = froxlorlocal
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:user = froxlorlocal
/etc/php/7.3/fpm/pool.d/h2.vgstudios.co.il.conf:group = froxlorlocal
/etc/group-:froxlorlocal:x:9999:www-data
/etc/subuid:froxlorlocal:1279648:65536
/etc/passwd:froxlorlocal:x:9999:9999:,,,:/home/froxlorlocal:/bin/false
/etc/group:www-data:x:33:froxlorlocal
/etc/group:froxlorlocal:x:9999:www-data

 


  • 0
21 hours ago, rseffner said:

Same here, using libnss-extrausers. So this should only happen at moments when cron is running WHILE froxlor is writing the new passwd/group files to /var/lib/extrausers.

But why do froxlor jobs run in parallel (one writing the extrausers files while another does something in the scope of these users)?

Maybe it should not be:
- deleting the files
- generating new files from SQL using a query (which takes some time)

but instead:
- generating new files from SQL using a query as FILE.NEW
- deleting the old ones and renaming/moving FILE.NEW into place

or
- nscd may help (I removed it when migrating from libnss-mysql to libnss-extrausers because of occasional problems where nscd used 100% CPU)

I switched both servers to libnss-extrausers back then for the same reason: libnss-mysql made things really slow and sometimes even stalled for a bit.


This could indeed be the reason, since up to 4 tasks can run at the same time with the default /etc/cron.d/froxlor:
 

# automatically generated cron-configuration by froxlor
# do not manually edit this file as it will be re-generated periodically.
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
#
*/5 * * * * root /usr/bin/nice -n 5 /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --tasks 1> /dev/null
0 0 * * * root /usr/bin/nice -n 5 /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --traffic 1> /dev/null
5 0 * * * root /usr/bin/nice -n 5 /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --usage_report 1> /dev/null
0 */6 * * * root /usr/bin/nice -n 5 /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --mailboxsize 1> /dev/null
*/5 * * * * root /usr/bin/nice -n 5 /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --letsencrypt 1> /dev/null
10 0 * * * root /usr/bin/nice -n 5 /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --backup 1> /dev/null

And each of these tasks can cause a refresh of the extrausers if more than 1 job was run:
https://github.com/Froxlor/Froxlor/blob/8f850ee7f3c9339db0c09793496474fe6ab1f41c/lib/Froxlor/Cron/MasterCron.php#L114
https://github.com/Froxlor/Froxlor/blob/8f850ee7f3c9339db0c09793496474fe6ab1f41c/lib/Froxlor/Cron/MasterCron.php#L131
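
One way to test the overlap theory would be to serialize the runs with a lock and see whether the error mails stop. Below is a minimal sketch using flock(1); the lock file path and the run_locked helper are made up for illustration and are not part of Froxlor.

```shell
#!/bin/sh
# Run a command while holding an exclusive lock, so that two invocations
# never overlap. LOCKFILE is a hypothetical path chosen for this sketch.
LOCKFILE=${LOCKFILE:-/tmp/froxlor-cron-example.lock}

run_locked() {
    # flock(1): block until the exclusive lock on $LOCKFILE is free,
    # then run the given command and release the lock when it exits.
    flock "$LOCKFILE" "$@"
}
```

A call such as run_locked /usr/bin/php -q /var/www/froxlor/scripts/froxlor_master_cronjob.php --tasks would then wait for any other locked run to finish first. Since /etc/cron.d/froxlor is regenerated by froxlor, this is only suitable for a temporary experiment; with flock -n a run would be skipped instead of waiting.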


  • 0
Quote

Just to be clear, your cronmail states
chown: invalid user: ‘froxlorlocal:froxlorlocal’

You know that this is the local user for froxlor, which has nothing to do with libnss-extrausers....

d00p pointed this out on GitHub, and it is actually a very good point :S
I still think it's related to libnss-extrausers, but that's really weird.


  • 0

Looking at the MasterCron.php code again, more carefully: Froxlor already uses a locking mechanism based on the PID. However, I think this is doomed to fail, since cron jobs can run in parallel with different PIDs. (Edit: this was wrong information, sorry.)

I tried running two scripts in parallel, one that regenerates the nss extrausers files and one that runs the chown, many times each. Everything went through without any error, so it must be something else.
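
For anyone who wants to repeat that kind of test, the lookup side can be reduced to a small loop that counts failed name lookups. This is only a sketch: the helper name probe_user is made up, and it uses id(1) rather than chown, on the assumption that both resolve the name through the same NSS lookup.

```shell
#!/bin/sh
# Count how many of COUNT consecutive lookups of a user name fail.
# Usage: probe_user NAME COUNT  (prints the number of failed lookups)
probe_user() {
    name=$1
    count=$2
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # id -u exits non-zero when the name cannot be resolved
        if ! id -u "$name" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

Running something like probe_user froxlorlocal 100000 in one shell while the extrausers files are being regenerated in another should print 0 if lookups never fail during the rewrite.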
