Refactor rffmpeg to support bad host checking

Allow rffmpeg to determine whether a host is "bad", i.e. whether the
SSH connection to it times out or fails, for instance because the host
is down for maintenance. In that case, instead of exiting abruptly,
rffmpeg marks the host as "bad" in its own state file and retries the
whole process, skipping the bad hosts. Any other rffmpeg processes that
start while the current one is still running also treat the host as
"bad". Once the original process ends, its state file (and with it the
"bad" marker) is removed, and future processes retry the host. This
makes redundancy across transcode hosts actually achievable:
previously, if the first host in the list was down, the process would
fail, be retried, and fail again on the same host.

This required some major refactoring of the code, including moving
various elements of the process into functions and adding an infinite
loop to the main execution, so that rffmpeg keeps looping through
hosts after marking one as bad.
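
The retry behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the code from this commit: `run_with_failover` and `fake_ssh` are hypothetical names, the bad-host set is kept in memory rather than in a state file, and hosts are simply tried in list order. The only fact taken from the commit is that a return code of 255 is treated as an SSH failure.

```python
SSH_FAILURE = 255  # return code the commit treats as "SSH itself failed"


def run_with_failover(hosts, run_on_host):
    """Try hosts until one returns something other than SSH_FAILURE.

    `run_on_host` is a hypothetical callable (host -> return code) that
    stands in for the real SSH invocation.
    Returns (chosen_host, return_code, bad_hosts).
    """
    bad_hosts = set()
    while True:
        # Skip every host already marked bad during this run
        candidates = [h for h in hosts if h not in bad_hosts]
        if not candidates:
            # No reachable host is left; give up with a generic error code
            return None, 1, bad_hosts
        target = candidates[0]
        returncode = run_on_host(target)
        if returncode == SSH_FAILURE:
            # Connection failed: mark the host bad and go around again
            bad_hosts.add(target)
            continue
        # The command actually ran, whatever its result was
        return target, returncode, bad_hosts


def fake_ssh(host):
    """Pretend that gpu1 is unreachable and every other host works."""
    return SSH_FAILURE if host == "gpu1" else 0


chosen, code, bad = run_with_failover(["gpu1", "gpu2"], fake_ssh)
print(chosen, code, sorted(bad))
```

With the stub above, the first attempt against `gpu1` fails, `gpu1` is marked bad, and the second pass runs the command on `gpu2`.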
Joshua M. Boniface 2020-06-30 14:31:00 -04:00
parent ded1d6304a
commit 1e51fccba4
2 changed files with 171 additions and 98 deletions


@@ -18,11 +18,15 @@ rffmpeg is a remote FFmpeg wrapper used to execute FFmpeg commands on a remote s
 1. Profit!
-## rffmpeg options
+## rffmpeg options and caveats
 ### Remote hosts
-rffmpeg supports setting multiple hosts. It keeps state in `/run/shm/rffmpeg`, of all running processes. These state files are used during rffmpeg's initialization in order to determine the optimal target host. rffmpeg will run through these hosts sequentially, choosing the one with the fewest running rffmpeg jobs. This helps distribute the transcoding load across multiple servers, and can also provide redundancy if one of the servers is offline (connect timeout is 1 second so the impact should be minimal).
+rffmpeg supports setting multiple hosts. It keeps state in `/run/shm/rffmpeg`, of all running processes. These state files are used during rffmpeg's initialization in order to determine the optimal target host. rffmpeg will run through these hosts sequentially, choosing the one with the fewest running rffmpeg jobs. This helps distribute the transcoding load across multiple servers, and can also provide redundancy if one of the servers is offline - rffmpeg will detect if a host is unreachable and set it "bad" for the remainder of the run, thus skipping it until the process completes.
+### Terminating rffmpeg
+When running rffmpeg manually, *do not* exit it with `Ctrl+C`. Doing so will likely leave the `ffmpeg` process running on the remote machine. Instead, enter `q` and a newline ("Enter") into the rffmpeg process, and this will terminate the entire command cleanly. This is the method that Jellyfin uses to communicate the termination of an `ffmpeg` process.
 ## Full setup guide


@@ -91,124 +91,193 @@ cli_ffmpeg_args = all_args[1:]
 # Get PID
 our_pid = os.getpid()
+current_statefile = config['state_tempdir'] + '/' + config['state_filename'].format(pid=our_pid)
 logger("Starting rffmpeg {}: {}".format(our_pid, ' '.join(all_args)))
-###############################################################################
-# State parsing and target determination
-###############################################################################
+def get_target_host():
+    """
+    Determine the optimal target host
+    """
+    logger("Determining target host")
-# Ensure the state directory exists or create it
-if not os.path.exists(config['state_tempdir']):
-    os.makedirs(config['state_tempdir'])
+    # Ensure the state directory exists or create it
+    if not os.path.exists(config['state_tempdir']):
+        os.makedirs(config['state_tempdir'])
-# Check for existing state files
-state_files = os.listdir(config['state_tempdir'])
+    # Check for existing state files
+    state_files = os.listdir(config['state_tempdir'])
-# Read each statefile to determine which hosts are in use
-active_hosts = list()
-for state_file in state_files:
-    with open(config['state_tempdir'] + '/' + state_file, 'r') as statefile:
-        contents = statefile.readlines()
-        active_hosts.append(contents[0])
+    # Read each statefile to determine which hosts are bad or in use
+    bad_hosts = list()
+    active_hosts = list()
+    for state_file in state_files:
+        with open(config['state_tempdir'] + '/' + state_file, 'r') as statefile:
+            contents = statefile.readlines()
+            for line in contents:
+                if re.match('^badhost', line):
+                    bad_hosts.append(line.split()[1])
+                else:
+                    active_hosts.append(line.split()[0])
-# Find out which active hosts are in use
-host_counts = dict()
-for host in config['remote_hosts']:
-    count = 0
-    for ahost in active_hosts:
-        if host == ahost:
-            count += 1
-    host_counts[host] = count
+    # Get the remote hosts list from the config
+    remote_hosts = config['remote_hosts']
-# Select the host with the lowest count (first host is parsed last)
-lowest_count = 999
-target_host = None
-for host in config['remote_hosts']:
-    if host_counts[host] < lowest_count:
-        lowest_count = host_counts[host]
-        target_host = host
+    # Remove any bad hosts from the remote_hosts list
+    for host in bad_hosts:
+        if host in remote_hosts:
+            remote_hosts.remove(host)
-if not target_host:
-    logger('ERROR: Failed to find a valid target host')
-    exit(1)
+    # Find out which active hosts are in use
+    host_counts = dict()
+    for host in remote_hosts:
+        # Determine process counts in active_hosts
+        count = 0
+        for ahost in active_hosts:
+            if host == ahost:
+                count += 1
+        host_counts[host] = count
-# Set up our state file
-our_statefile = config['state_tempdir'] + '/' + config['state_filename'].format(pid=our_pid)
-with open(our_statefile, 'w') as statefile:
-    statefile.write(config['state_contents'].format(host=target_host))
+    # Select the host with the lowest count (first host is parsed last)
+    lowest_count = 999
+    target_host = None
+    for host in remote_hosts:
+        if host_counts[host] < lowest_count:
+            lowest_count = host_counts[host]
+            target_host = host
-###############################################################################
-# Set up our remote command
-###############################################################################
+    if not target_host:
+        logger('ERROR: Failed to find a valid target host')
+        exit(1)
-rffmpeg_command = list()
+    # Write to our state file
+    with open(current_statefile, 'a') as statefile:
+        statefile.write(config['state_contents'].format(host=target_host) + '\n')
-# Add SSH component
-rffmpeg_command.append('ssh')
-rffmpeg_command.append('-q')
+    return target_host
-# Set our connection timeouts, in case one of several remote machines is offline
-rffmpeg_command.append('-o')
-rffmpeg_command.append('ConnectTimeout=1')
-rffmpeg_command.append('-o')
-rffmpeg_command.append('ConnectionAttempts=1')
+def bad_host(target_host):
+    logger("Setting bad host {}".format(target_host))
-for arg in config['remote_args']:
-    if arg:
-        rffmpeg_command.append(arg)
+    # Rewrite the statefile, removing all instances of the target_host that were added before
+    with open(current_statefile, 'r+') as statefile:
+        new_statefile = statefile.readlines()
+        statefile.seek(0)
+        for line in new_statefile:
+            if target_host not in line:
+                statefile.write(line)
+        statefile.truncate()
-# Add user+host string
-rffmpeg_command.append('{}@{}'.format(config['remote_user'], target_host))
-logger("Running rffmpeg {} on {}@{}".format(our_pid, config['remote_user'], target_host))
+    # Add the bad host to the statefile
+    # This will affect this run, as well as any runs that start while this one is active; once
+    # this run is finished and its statefile removed, however, the host will be retried again
+    with open(current_statefile, 'a') as statefile:
+        statefile.write("badhost " + config['state_contents'].format(host=target_host) + '\n')
-# Add any pre command
-for cmd in config['pre_commands']:
-    if cmd:
-        rffmpeg_command.append(cmd)
+def setup_command(target_host):
+    """
+    Craft the target command
+    """
+    logger("Crafting remote command string")
-# Prepare our default stdin/stdout/stderr (normally, stdout to stderr)
-stdin = sys.stdin
-stdout = sys.stderr
-stderr = sys.stderr
+    rffmpeg_command = list()
-# Verify if we're in ffmpeg or ffprobe mode
-if 'ffprobe' in all_args[0]:
-    rffmpeg_command.append(config['ffprobe_command'])
-    stdout = sys.stdout
-else:
-    rffmpeg_command.append(config['ffmpeg_command'])
+    # Add SSH component
+    rffmpeg_command.append('ssh')
+    rffmpeg_command.append('-q')
-# Determine if version, encorders, or decoders is an argument; if so, we output stdout to stdout
-# Weird workaround for something Jellyfin requires...
-if '-version' in cli_ffmpeg_args or '-encoders' in cli_ffmpeg_args or '-decoders' in cli_ffmpeg_args:
-    stdout = sys.stdout
+    # Set our connection timeouts, in case one of several remote machines is offline
+    rffmpeg_command.append('-o')
+    rffmpeg_command.append('ConnectTimeout=1')
+    rffmpeg_command.append('-o')
+    rffmpeg_command.append('ConnectionAttempts=1')
-# Parse and re-quote any problematic arguments
-for arg in cli_ffmpeg_args:
-    # Match bad shell characters: * ( ) whitespace
-    if re.search('[*()\s]', arg):
-        rffmpeg_command.append('"{}"'.format(arg))
+    for arg in config['remote_args']:
+        if arg:
+            rffmpeg_command.append(arg)
+    # Add user+host string
+    rffmpeg_command.append('{}@{}'.format(config['remote_user'], target_host))
+    logger("Running rffmpeg {} on {}@{}".format(our_pid, config['remote_user'], target_host))
+    # Add any pre command
+    for cmd in config['pre_commands']:
+        if cmd:
+            rffmpeg_command.append(cmd)
+    # Prepare our default stdin/stdout/stderr (normally, stdout to stderr)
+    stdin = sys.stdin
+    stdout = sys.stderr
+    stderr = sys.stderr
+    # Verify if we're in ffmpeg or ffprobe mode
+    if 'ffprobe' in all_args[0]:
+        rffmpeg_command.append(config['ffprobe_command'])
+        stdout = sys.stdout
     else:
-        rffmpeg_command.append('{}'.format(arg))
+        rffmpeg_command.append(config['ffmpeg_command'])
-rffmpeg_cli = ' '.join(rffmpeg_command)
-logger("Remote command for rffmpeg {}: {}".format(our_pid, rffmpeg_cli))
+    # Determine if version, encorders, or decoders is an argument; if so, we output stdout to stdout
+    # Weird workaround for something Jellyfin requires...
+    if '-version' in cli_ffmpeg_args or '-encoders' in cli_ffmpeg_args or '-decoders' in cli_ffmpeg_args:
+        stdout = sys.stdout
-###############################################################################
-# Execute the remote command
-###############################################################################
-p = subprocess.run(rffmpeg_command,
-                   shell=False,
-                   bufsize=0,
-                   universal_newlines=True,
-                   stdin=stdin,
-                   stderr=stderr,
-                   stdout=stdout)
+    # Parse and re-quote any problematic arguments
+    for arg in cli_ffmpeg_args:
+        # Match bad shell characters: * ( ) whitespace
+        if re.search('[*()\s]', arg):
+            rffmpeg_command.append('"{}"'.format(arg))
+        else:
+            rffmpeg_command.append('{}'.format(arg))
-###############################################################################
-# Cleanup
-###############################################################################
-os.remove(our_statefile)
-logger("Finished rffmpeg {} with code {}".format(our_pid, p.returncode))
-exit(p.returncode)
+    return rffmpeg_command, stdin, stdout, stderr
+def prepare_command():
+    logger("Preparing remote command")
+    target_host = get_target_host()
+    rffmpeg_command, stdin, stdout, stderr = setup_command(target_host)
+    rffmpeg_cli = ' '.join(rffmpeg_command)
+    logger("Remote command for rffmpeg {}: {}".format(our_pid, rffmpeg_cli))
+    return rffmpeg_command, target_host, stdin, stdout, stderr
+def run_command(rffmpeg_command, stdin, stdout, stderr):
+    """
+    Execute the remote command using subprocess
+    """
+    logger("Running remote command")
+    p = subprocess.run(rffmpeg_command,
+                       shell=False,
+                       bufsize=0,
+                       universal_newlines=True,
+                       stdin=stdin,
+                       stderr=stderr,
+                       stdout=stdout)
+    returncode = p.returncode
+    return returncode
+# Main process loop; executes until the ffmpeg command actually runs on a reachable host
+while True:
+    logger("Starting process loop")
+    # Set up and execute our command
+    rffmpeg_command, target_host, stdin, stdout, stderr = prepare_command()
+    returncode = run_command(rffmpeg_command, stdin, stdout, stderr)
+    # A returncode of 255 means that the SSH process failed; ffmpeg does not throw this return code (https://ffmpeg.org/pipermail/ffmpeg-user/2013-July/016245.html)
+    if returncode == 255:
+        logger("SSH failed to host {}: marking this host as bad and retrying".format(target_host))
+        bad_host(target_host)
+    else:
+        # The SSH succeeded, so we can abort the loop
+        break
+# Remove the current statefile
+os.remove(current_statefile)
+logger("Finished rffmpeg {} with return code {}".format(our_pid, returncode))
+exit(returncode)
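
The host selection performed by `get_target_host()` above can be tried in isolation with a sketch like the one below. It is an illustration only: `parse_statefiles` and `pick_host` are hypothetical helpers, state files are passed in as strings instead of being read from `config['state_tempdir']`, and each active line is assumed to begin with the host name (the diff relies on the same assumption when it calls `line.split()[0]`).

```python
from collections import Counter


def parse_statefiles(statefile_texts):
    """Split state file contents into active host entries and bad hosts."""
    active, bad = [], set()
    for text in statefile_texts:
        for line in text.splitlines():
            fields = line.split()
            if not fields:
                continue  # ignore empty lines
            if fields[0] == "badhost":
                # "badhost <host>" marks a host as unreachable
                if len(fields) > 1:
                    bad.add(fields[1])
            else:
                # Any other line records one running job on that host
                active.append(fields[0])
    return active, bad


def pick_host(remote_hosts, active, bad):
    """Return the non-bad host with the fewest running jobs, or None."""
    counts = Counter(active)
    candidates = [h for h in remote_hosts if h not in bad]
    if not candidates:
        return None
    # min() keeps the first host on ties, matching the strict "<" in the diff
    return min(candidates, key=lambda h: counts[h])


statefiles = ["badhost host1\nhost2\n", "host3\n", "host2\n"]
active, bad = parse_statefiles(statefiles)
print(pick_host(["host1", "host2", "host3"], active, bad))
```

In this example `host1` is skipped because one state file marks it bad, and `host3` is chosen because it has one running job against two on `host2`.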