Auditd

auditd is the userspace component of the Linux Auditing System. It is responsible for writing audit records to disk. Logs are viewed with the ausearch or aureport utilities. The audit system is configured, and rules are loaded, with the auditctl utility. During startup, auditctl reads the rules in /etc/audit/audit.rules and loads them into the kernel. Alternatively, the augenrules program reads the rules located in /etc/audit/rules.d/ and compiles them into an audit.rules file. The audit daemon itself has some configuration options that the admin may wish to customize; they are found in the auditd.conf file.

EC2 instances with audit daemon running will stop automatically if auditd is unable to write the log files

Why would the audit daemon stop my instance if it cannot write logs?

This is mainly a security response. If the system cannot log actions or movements, there is no way to account for the actions of nefarious actors after a compromise. Simply put, if auditd can't log anything to disk, no one should be on the system.

To facilitate these actions, there are configurable parameters. In /etc/audit/auditd.conf there are a few options that control how the system reacts, some of which can cause a shutdown. The parameters are:

space_left

space_left_action

admin_space_left

admin_space_left_action

disk_full_action

disk_error_action

Below are the definitions for each of the above items according to man 5 auditd.conf:

space_left

   This is a numeric value in megabytes that tells the audit daemon when to perform a configurable action because the system is starting to run low on disk space.

space_left_action

   This parameter tells the system what action to take when the system has detected that it is starting to get low on disk space. Valid values are ignore, syslog, rotate, email, exec, suspend, single, and halt. If set to ignore, the audit daemon does nothing. syslog means that it will issue a warning to syslog. rotate will rotate logs, losing the oldest to free up space. email means that it will send a warning to the email account specified in action_mail_acct as well as sending the message to syslog. exec /path-to-script will execute the script. You cannot pass parameters to the script. The script is also responsible for telling the auditd daemon to resume logging once it has completed its action. This can be done by adding service auditd resume to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.

admin_space_left

   This is a numeric value in megabytes that tells the audit daemon when to perform a configurable action because the system is running low on disk space. This should be considered the last chance to do something before running out of disk space. The numeric value for this parameter should be lower than the number for space_left.

admin_space_left_action

   This parameter tells the system what action to take when the system has detected that it is low on disk space. Valid values are ignore, syslog, rotate, email, exec, suspend, single, and halt. If set to ignore, the audit daemon does nothing. syslog means that it will issue a warning to syslog. rotate will rotate logs, losing the oldest to free up space. email means that it will send a warning to the email account specified in action_mail_acct as well as sending the message to syslog. exec /path-to-script will execute the script. You cannot pass parameters to the script. The script is also responsible for telling the auditd daemon to resume logging once it has completed its action. This can be done by adding service auditd resume to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.

disk_full_action

   This parameter tells the system what action to take when the system has detected that the partition to which log files are written has become full. Valid values are ignore, syslog, rotate, exec, suspend, single, and halt. If set to ignore, the audit daemon will issue a syslog message but no other action is taken. syslog means that it will issue a warning to syslog. rotate will rotate logs, losing the oldest to free up space. exec /path-to-script will execute the script. You cannot pass parameters to the script. The script is also responsible for telling the auditd daemon to resume logging once it has completed its action. This can be done by adding service auditd resume to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.

disk_error_action

   This parameter tells the system what action to take whenever there is an error detected when writing audit events to disk or rotating logs. Valid values are ignore, syslog, exec, suspend, single, and halt. If set to ignore, the audit daemon will not take any action. syslog means that it will issue no more than 5 consecutive warnings to syslog. exec /path-to-script will execute the script. You cannot pass parameters to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.

   By default on Amazon Linux, if the disk has an error or is full, the configured action is SUSPEND. Below are the unmodified parameters from an ALAMI 2017.09 instance:

      disk_error_action = SUSPEND
      disk_full_action = SUSPEND
      admin_space_left_action = SUSPEND
      admin_space_left = 50
      space_left_action = SYSLOG
      space_left = 75

As you can see, when space_left is reached auditd is configured only to warn via syslog (space_left_action = SYSLOG). You can use "email" as the value instead; however, that value depends on "action_mail_acct", which is described below:

action_mail_acct

   This option should contain a valid email address or alias. The default address is root. If the email address is not local to the machine, you must make sure you have email properly configured on your machine and network. Also, this option requires that /usr/lib/sendmail exists on the machine.
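The parameters above can be read straight out of the configuration file. The following is a minimal sketch, not something shipped with the audit package: the function names show_disk_actions, get_setting, and check_space_thresholds are made up for illustration, the default path assumes the usual /etc/audit/auditd.conf location, and the threshold check assumes both values are plain megabyte numbers. The last function checks the man page's recommendation that admin_space_left be lower than space_left.

```shell
# Hypothetical helpers (not part of auditd).

# Print the disk-related settings from an auditd.conf file.
show_disk_actions() {
    conf="${1:-/etc/audit/auditd.conf}"
    grep -Ei '^[[:space:]]*(admin_)?space_left(_action)?[[:space:]]*=|^[[:space:]]*disk_(full|error)_action[[:space:]]*=' "$conf"
}

# Print the value of one setting. Usage: get_setting FILE KEY
get_setting() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[ \t]/, "", k) }
        tolower(k) == key { v = $2; gsub(/[ \t]/, "", v); print v; exit }
    ' "$1"
}

# Succeed only if admin_space_left is lower than space_left.
check_space_thresholds() {
    conf="${1:-/etc/audit/auditd.conf}"
    space=$(get_setting "$conf" space_left)
    admin=$(get_setting "$conf" admin_space_left)
    if [ "$admin" -lt "$space" ]; then
        echo "OK: admin_space_left ($admin) < space_left ($space)"
    else
        echo "WARNING: admin_space_left ($admin) should be lower than space_left ($space)"
        return 1
    fi
}
```

Running show_disk_actions with no argument prints the six settings from the live configuration; check_space_thresholds returns a non-zero status when the two thresholds are in the wrong order.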

   Additional Information regarding disk actions:

Although the man page says that setting disk_full_action and disk_error_action to SUSPEND in auditd.conf keeps the daemon alive and only stops it writing to disk, all indications are that the suspend action does more than that and puts the computer into a sleep state, as seen in this message:

[ 16.872478] ACPI: Preparing to enter system sleep state S5

   Further messages may also be visible in /var/log/messages regarding the action auditd has taken:

grep auditd /var/log/messages | grep -i "space"

While you can change this behavior in /etc/audit/auditd.conf and ignore the disk full or disk error condition, it's not the best practice. Best practice is to set up log rotation and log file size limits to help manage the space in /var/log/audit. On RHEL machines that use LVM, /var/log/audit is usually only given 5GB. If you are using SELinux, this can fill up rather quickly due to the constant AVC denial messages if SELinux is not properly configured/used.
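To see how close the audit log partition is to the configured thresholds, the free space can be compared with space_left or admin_space_left. This is only a sketch: audit_free_mb and below_threshold are hypothetical names, and the default directory assumes logs are written under /var/log/audit.

```shell
# Hypothetical helper: print free space, in whole megabytes, on the
# filesystem that holds the given directory (default: /var/log/audit).
audit_free_mb() {
    dir="${1:-/var/log/audit}"
    # -P forces one output line per filesystem; -k reports 1024-byte blocks.
    df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed if free space under DIR is below THRESHOLD megabytes.
# Usage: below_threshold DIR THRESHOLD
below_threshold() {
    free=$(audit_free_mb "$1")
    [ "$free" -lt "$2" ]
}
```

For example, `below_threshold /var/log/audit 75 && echo "space_left reached"` would mirror the space_left = 75 setting shown earlier.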

Possible Resolutions:

There are a few things you can do: Rotate logs and limit log size:

auditd can rotate its own logs, but not compress them. Red Hat offers the following information regarding the rotation and compression of such log files. The same may be applied to CentOS and ALAMI.

   By default, auditd in all versions of Red Hat Enterprise Linux rotates its own log files automatically when they reach a certain size, as determined by the max_log_file setting in auditd.conf (which defaults to 6 megabytes)

   Replacing auto-rotation based on size with auto-rotation based on time
   1. Disable rotation in /etc/audit/auditd.conf so that: max_log_file_action = ignore

   2. Tell auditd to reconfigure itself (applying your changes) by doing one of the following:
         kill -HUP $(pidof auditd)   (Any version)
         systemctl reload auditd   (RHEL7)
         service auditd reload   (RHEL6 and earlier)
   3. To manually trigger auditd to rotate, it needs to receive a USR1 signal.

   Simple solution for daily rotation: copy auditd.cron to cron.daily

            ~]# cp /usr/share/doc/audit-*/auditd.cron /etc/cron.daily

            ~]# chmod +x /etc/cron.daily/auditd.cron

            ~]# cat /etc/cron.daily/auditd.cron

       #!/bin/sh
       ##########
       # This script can be installed to get a daily log rotation
       # based on a cron job.
       ##########
       /sbin/service auditd rotate
       EXITVALUE=$?
       if [ $EXITVALUE != 0 ]; then
           /usr/bin/logger -t auditd "ALERT exited abnormally with [$EXITVALUE]"
       fi
       exit 0

   Implementing log compression
   auditd does not support log compression; however, it's trivial to update the above script to rename old audit.log.n files and compress them. A working example is provided for demonstration purposes.

   1. Follow the steps above to disable auto-rotation based on size
   2. Replace the previously-created script with the following code:

       #!/bin/bash
       export PATH=/sbin:/bin:/usr/sbin:/usr/bin
       FORMAT="%F_%T"  # Customize timestamp format as desired, per `man date`
                       # %F_%T will lead to files like: audit.log.2015-02-26_15:43:46
       COMPRESS=gzip   # Change to bzip2 or xz as desired
       KEEP=5          # Number of compressed log files to keep

       # Rename each rotated audit.log.N after its modification time, then compress it
       rename_and_compress_old_logs() {
           for file in $(find /var/log/audit/ -name 'audit.log.[0-9]'); do
               timestamp=$(ls -l --time-style="+${FORMAT}" "${file}" | awk '{print $6}')
               newfile=${file%.[0-9]}.${timestamp}
               # Optional: remove "-v" verbose flag from next 2 lines to hide output
               mv -v "${file}" "${newfile}"
               ${COMPRESS} -v "${newfile}"
           done
       }

       # Delete all but the newest ${KEEP} compressed logs
       delete_old_compressed_logs() {
           # Optional: remove "-v" verbose flag to hide output
           rm -v $(find /var/log/audit/ -regextype posix-extended -regex '.*audit\.log\..*(xz|gz|bz2)$' | sort -n | head -n -${KEEP})
       }

       rename_and_compress_old_logs
       service auditd rotate
       rename_and_compress_old_logs
       delete_old_compressed_logs

   3. Modify the declarations of FORMAT, COMPRESS, and KEEP as desired
   4. Ensure the script is marked executable and set it to be called by cron at desired times (either via a normal cron job or by putting it in cron.daily as demonstrated above)
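Before letting a script like the one above delete anything, it can help to preview which files a given KEEP value would remove. The following is a sketch only: list_expired_logs is a made-up name, and it assumes the compressed logs carry timestamped names as produced by the default FORMAT, so that sorting by name also sorts by age.

```shell
# Hypothetical dry-run helper: list the compressed audit logs that would be
# deleted if only the newest KEEP files were retained.
# Usage: list_expired_logs DIR KEEP
list_expired_logs() {
    dir="$1"
    keep="$2"
    find "$dir" -type f \( -name 'audit.log.*.gz' \
                        -o -name 'audit.log.*.bz2' \
                        -o -name 'audit.log.*.xz' \) |
        sort |
        # Print everything except the last $keep (newest) entries.
        awk -v keep="$keep" '
            { line[NR] = $0 }
            END { for (i = 1; i <= NR - keep; i++) print line[i] }
        '
}
```

Running `list_expired_logs /var/log/audit 5` prints nothing when five or fewer compressed logs exist, which makes it a safe check to run before enabling deletion.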

audit: backlog limit exceeded

AWS method

https://repost.aws/knowledge-center/troubleshoot-audit-backlog-errors-ec2

Short description

The audit backlog buffer in a Linux system is a kernel level socket buffer queue that the operating system uses to maintain or log audit events. When a new audit event triggers, the system logs the event and adds it to the audit backlog buffer queue.

The backlog_limit parameter value is the number of audit backlog buffers. The parameter is set to 320 by default, as shown in the following example:

# auditctl -s
enabled 1
failure 1
pid 2264
rate_limit 0
backlog_limit 320
lost 0
backlog 0

Audit events logged beyond the default number of 320 cause the following errors on the instance:

audit: audit_backlog=321 > audit_backlog_limit=320 

audit: audit_lost=44393 audit_rate_limit=0 audit_backlog_limit=320 

audit: backlog limit exceeded
-or-

audit_printk_skb: 153 callbacks suppressed 

audit_printk_skb: 114 callbacks suppressed

An audit buffer queue at or exceeding capacity might also cause the instance to freeze or remain in an unresponsive state.
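A quick way to gauge how often the backlog has overflowed is to count these messages in a saved log. This is a sketch under assumptions: the function names are hypothetical, and the default path assumes kernel messages are written to /var/log/messages on the instance.

```shell
# Hypothetical helper: count "backlog limit exceeded" lines in a log file.
count_backlog_errors() {
    grep -c 'audit: backlog limit exceeded' "${1:-/var/log/messages}"
}

# Hypothetical helper: print the most recent audit_lost counter in the log.
last_audit_lost() {
    sed -n 's/.*audit_lost=\([0-9][0-9]*\).*/\1/p' "${1:-/var/log/messages}" |
        tail -n 1
}
```

A rising audit_lost value between two checks suggests events are still being dropped.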

To avoid backlog limit exceeded errors, increase the backlog_limit parameter value. Large servers have a larger number of audit logs triggered, so increasing buffer space helps avoid error messages.

Note: Increasing the audit buffer consumes more of the instance's memory. How large you make the backlog_limit parameter depends on the total memory of the instance. If the system has enough memory, you can try doubling the existing backlog_limit parameter value.

The following is a calculation of the memory required for the auditd backlog. Use this calculation to determine how large you can make the backlog queue without causing memory stress on your instance.

One audit buffer = 8970 Bytes
Default number of audit buffers (backlog_limit parameter) = 320
320 * 8970 = 2870400 Bytes, or 2.7 MiB

The size of the audit buffer is defined by the MAX_AUDIT_MESSAGE_LENGTH parameter. For more information, see MAX_AUDIT_MESSAGE_LENGTH in the Linux audit library on github.com.
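The same arithmetic can be scripted to try out candidate backlog_limit values. This is a small sketch: the function and variable names are hypothetical, and the 8970-byte buffer size is taken from the calculation above rather than read from the system.

```shell
# Per-buffer size in bytes, as used in the calculation above.
AUDIT_BUFFER_BYTES=8970

# Print the memory, in bytes, that a given backlog_limit could consume.
backlog_bytes() {
    echo $(( $1 * AUDIT_BUFFER_BYTES ))
}

# Same figure in MiB, rounded to one decimal place.
backlog_mib() {
    awk -v n="$1" -v size="$AUDIT_BUFFER_BYTES" \
        'BEGIN { printf "%.1f\n", n * size / 1048576 }'
}
```

For example, `backlog_bytes 320` prints 2870400 and `backlog_mib 8192` prints 70.1.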

Note: If your instance is inaccessible and you see backlog limit exceeded messages in the system log, stop and start the instance. Then, perform the following steps to change the audit buffer value.

Resolution

Note: In this example, we're changing the backlog_limit parameter value to 8192 buffers. 8192 buffers equals 70 MiB of memory based on the preceding calculation. You can use any value based on your memory calculation.

Access the instance using SSH.

Verify the current audit buffer size.

Note: The backlog_limit parameter is listed as -b. For more information, see the auditctl(8) man page.

Amazon Linux 1 and other operating systems that don't have systemd:

$ sudo cat /etc/audit/audit.rules
# This file contains the auditctl rules that are loaded
# whenever the audit daemon is started via the initscripts.
# The rules are simply the parameters that would be passed
# to auditctl.

# First rule - delete all
-D

# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 320 

# Disable system call auditing.
# Remove the following line if you need the auditing.
-a never,task

# Feel free to add below this line. See auditctl man page

Amazon Linux 2 and other operating systems that use systemd:

$ sudo cat /etc/audit/audit.rules
# This file is automatically generated from /etc/audit/rules.d
-D
-b 320
-f 1

Access the audit.rules file using an editor, such as the vi editor:

Amazon Linux 1 and other operating systems that don't use systemd:

$ sudo vi /etc/audit/audit.rules

Amazon Linux 2 and other operating systems that use systemd:

$ sudo vi /etc/audit/rules.d/audit.rules

Edit the -b parameter to a larger value. The following example changes the -b value to 8192.

$ sudo cat /etc/audit/audit.rules
# This file contains the auditctl rules that are loaded
# whenever the audit daemon is started via the initscripts.
# The rules are simply the parameters that would be passed
# to auditctl.

# First rule - delete all
-D

# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 8192 

# Disable system call auditing.
# Remove the following line if you need the auditing.
-a never,task

# Feel free to add below this line. See auditctl man page

$ sudo auditctl -s
enabled 1
failure 1
pid 2264
rate_limit 0
backlog_limit 320
lost 0
backlog 0
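As an alternative to editing the file by hand, the change to the -b line could be scripted. The sketch below uses a hypothetical function name, set_backlog_limit, and must be pointed at the file that applies to your system (/etc/audit/audit.rules, or /etc/audit/rules.d/audit.rules on systemd-based systems, as described above). It needs write access to that file, so run it as root.

```shell
# Hypothetical helper: set the -b value in an audit rules file.
# Usage: set_backlog_limit FILE VALUE
set_backlog_limit() {
    file="$1"
    value="$2"
    tmp=$(mktemp) || return 1
    # Rewrite any line that starts with "-b"; leave all other lines alone.
    sed "s/^-b[[:space:]].*/-b $value/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # overwrite in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

If the file has no -b line, it is left unchanged; the new value is only loaded after the rules are reloaded or the service is restarted.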

Restart the auditd service. The new backlog_limit value takes effect. The value also updates in auditctl -s, as shown in the following example:

# sudo service auditd stop
Stopping auditd:                                           [  OK  ]
# sudo service auditd start
Starting auditd:                                           [  OK  ]
# auditctl -s
enabled 1
failure 1
pid 26823
rate_limit 0
backlog_limit 8192
lost 0
backlog 0
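To confirm the new value from a script rather than by eye, the status output can be filtered. A minimal sketch, with a hypothetical function name, that assumes the "backlog_limit N" line format shown in the examples above:

```shell
# Hypothetical helper: read "auditctl -s" output on stdin and print the
# backlog_limit value.
get_backlog_limit() {
    awk '$1 == "backlog_limit" { print $2 }'
}
```

Typical use would be `sudo auditctl -s | get_backlog_limit`, which should print 8192 after the restart shown above.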

The other method if auditd is enabled via the GRUB kernel parameter

In my experience imaging a RHEL 8.6 system, the image had the following kernel parameters set in GRUB:

audit=1 audit_backlog_limit=8192

You can disable auditing by pressing 'e' in the GRUB menu to edit the kernel parameter line. A change made this way applies only to that boot. Modify the line like so:

audit=0

Or you can modify the audit_backlog_limit like so (however, keep in mind how much memory your system has):

audit=1 audit_backlog_limit=16384
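To check which audit parameters the running kernel was actually booted with, the kernel command line can be inspected. A sketch with a hypothetical function name; it reads /proc/cmdline by default and accepts another file for testing:

```shell
# Hypothetical helper: list the audit-related kernel parameters from a
# kernel command line (default: the running kernel's /proc/cmdline).
audit_boot_params() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -E '^audit(_backlog_limit)?='
}
```

No output means neither audit= nor audit_backlog_limit= was passed at boot.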

References: https://access.redhat.com/solutions/4353521

Misc.

Additional information:

There are known issues in which the kernel can panic because of the audit option "-f" in /etc/audit/audit.rules. This would not cause the system to stop automatically and may be a different issue:

Example:

cat /etc/audit/audit.rules

# This file is automatically generated from /etc/audit/rules.d
-D
-b 8192
-f 1

The -f flag sets the action that is performed when a critical error is detected:

  0 -- Silent
  1 -- Means that error will be handled by kernel log subsystem (printk, print a failure message)
  2 -- Kernel panic in case of critical error

Example conditions where this flag is consulted include: transmission errors to the user-space audit daemon, backlog limit exceeded, and rate limit exceeded.
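To see which failure mode a rules file requests, the -f line can be read and mapped to the meanings listed above. A sketch only: describe_failure_mode is a hypothetical name, the default path is assumed, and when several -f lines are present it reports the last one.

```shell
# Hypothetical helper: report the -f (failure mode) setting in a rules file.
describe_failure_mode() {
    mode=$(awk '$1 == "-f" { m = $2 } END { print m }' \
        "${1:-/etc/audit/audit.rules}")
    case "$mode" in
        0) echo "0: silent" ;;
        1) echo "1: printk (log a failure message)" ;;
        2) echo "2: panic (kernel panics on critical error)" ;;
        *) echo "no -f setting found"; return 1 ;;
    esac
}
```

A result of 2 means a backlog overflow or similar critical error would panic the kernel, which is worth knowing before tuning backlog_limit.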

Notes: For my customer's issue, we fixed it by removing auditd.service from /usr/lib/systemd/system so that the service cannot be started at boot.

To be able to start a service again after its unit file has been removed, you have to put the <>.service file back in that directory and then run "systemctl daemon-reload". For example, with firewalld.service:

$ sudo systemctl start firewalld.service 
Failed to start firewalld.service: Unit not found.

$ sudo systemctl daemon-reload
$ sudo systemctl start firewalld.service

References: https://superuser.com/questions/513159/how-to-remove-systemd-services
