Lifecycle
The lifecycle of a Ponos agent has two main stages:
- Setup, which is done once at the beginning.
- Loop, which is repeated indefinitely until the Ponos agent is stopped.
graph TB
start[Start Ponos agent] --> setup
setup --> loop
subgraph setup[Setup]
direction LR
setup_tools[Setup tools] --> check_pid{Other Ponos agent running?}
check_pid --> |Yes| exit[Exit]
check_pid --> |No| write_pid[Mark Ponos agent as running]
write_pid --> register[Register Ponos agent on Arkindex]
register --> list_existing_tasks[List running tasks]
end
subgraph loop[Loop]
direction LR
ready{Ready?} --> |No| ready
ready --> |Yes| check_tasks
subgraph check_tasks[Check running tasks]
direction TB
upload_logs[Upload task's logs] --> task_state{Task's state?}
task_state --> |Finished| update_state_finished[Update task's state to `Completed` or `Failed`]
end
check_tasks --> get_actions[Retrieve actions from Arkindex]
get_actions --> action{Action?}
action --> action_start[Start Task]
action --> action_stop[Stop Task]
subgraph action_start[Start task]
direction TB
download_files[Download files] --> start_task[Start task]
start_task --> update_state_running[Update task's state to `Running`]
end
subgraph action_stop[Stop task]
direction TB
stop_task[Stop task] --> update_state_stopped[Update task's state to `Stopped`]
end
end
Setup
When a Ponos agent starts, it sets up its environment once. The Ponos agent will:
- Set up Sentry (using the `sentry` parameter of its configuration).
- Set up the Arkindex client (using the `url` parameter of its configuration).
- Set up logging (using the `logging` parameter of its configuration).
- Check that there is no other Ponos agent on the same host. It uses a unique file (from the `pid_file` parameter of its configuration) containing the PID of the Ponos agent currently running. If this file contains a PID that differs from the current Ponos agent’s and corresponds to a running program, the Ponos agent stops.
- Mark its presence on the host. It writes its PID to the unique file (from the `pid_file` parameter of its configuration).
- Create a folder (using the `data_dir` parameter of its configuration) to store the files needed later to process Arkindex tasks.
- Register on Arkindex (using the `CreateAgent` endpoint and the `farm_id` and `seed` parameters of its configuration).
- List the tasks running on the host.
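The PID file check and write described above can be sketched as follows. This is a minimal illustration and not the actual Ponos implementation: the function names `pid_is_running` and `claim_pid_file` are hypothetical, and the liveness check relies on the POSIX behavior of `os.kill` with signal `0`, so it assumes a Unix-like host.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pid_file(pid_file: Path) -> bool:
    """Hypothetical helper: return False if another live agent owns the file,
    otherwise write the current PID to it and return True."""
    current = os.getpid()
    if pid_file.exists():
        content = pid_file.read_text().strip()
        if content.isdigit():
            other = int(content)
            if other != current and pid_is_running(other):
                return False  # another agent is running: the caller should exit
    pid_file.write_text(str(current))
    return True
```

A stale file (one whose PID no longer matches a running program) is simply overwritten, which matches the rule that the agent only stops when the recorded PID belongs to a running program.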
Loop
Once the Ponos agent’s setup is complete, it loops indefinitely to track the processing of its tasks and synchronize their state with Arkindex. At each iteration, it will:
- Check whether the Ponos agent is ready. By default, the Ponos agent is always ready, but this condition depends on the type of Ponos agent used.
- Check running tasks on the host. For each task, the Ponos agent will:
    - Check that the task is still running. If the Ponos agent cannot find the task, it will update its state to `Error` (using the `PartialUpdateTask` endpoint). The task will no longer be listed as a running task.
    - Upload the task’s logs using the associated S3 URL.
    - Check whether the task is finished. If the task is finished, it will update its state to `Completed` or `Failed` according to its exit code (using the `PartialUpdateTask` endpoint) and upload its artifacts (using the `CreateArtifact` endpoint). The task will no longer be listed as a running task.
- Retrieve the list of actions (using the `RetrieveAgentActions` endpoint) and process each action.
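The "check running tasks" step can be sketched as below. This is only an illustration under stated assumptions: `TaskStatus`, `final_state` and `check_running_tasks` are hypothetical names, the `update_state` callback stands in for a call to the `PartialUpdateTask` endpoint, and the mapping of exit code `0` to `Completed` (anything else to `Failed`) is an assumption, since the text only says the state depends on the exit code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TaskStatus:
    found: bool               # False if the agent can no longer find the task
    exit_code: Optional[int]  # None while the task is still running


def final_state(exit_code: int) -> str:
    # Assumption: exit code 0 means success, anything else means failure.
    return "Completed" if exit_code == 0 else "Failed"


def check_running_tasks(
    running: Dict[str, TaskStatus],
    update_state: Callable[[str, str], None],
) -> None:
    """One pass over the running tasks; finished or lost tasks are removed."""
    for task_id in list(running):  # copy the keys: the dict is modified below
        status = running[task_id]
        if not status.found:
            update_state(task_id, "Error")
            del running[task_id]
            continue
        # The task's logs would be uploaded here.
        if status.exit_code is not None:
            update_state(task_id, final_state(status.exit_code))
            # The task's artifacts would be uploaded here.
            del running[task_id]
```

Passing the state update as a callback keeps the sketch independent of any particular Arkindex client.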
Action
A task can be either started or stopped.
Start task
To start a task, the Ponos agent will:
- Check that the task is not already running. If the task is already running, the Ponos agent will ignore the action.
- Check that the task is correctly assigned. If the task is assigned to another Ponos agent, the Ponos agent will ignore the action.
- Download the task’s artifacts (artifacts of parent tasks, using the `RetrieveTaskDefinition` and `ListArtifacts` endpoints) and the task’s extra files (like models), and store them in a specific folder (using the `data_dir` parameter of its configuration).
- Start the task according to the type of Ponos agent used.
- Update the task’s state to `Running` (using the `PartialUpdateTask` endpoint).
- Add the task to its list of running tasks.
If an error occurs during any of the above steps, the Ponos agent will update the task’s state to `Error` (using the `PartialUpdateTask` endpoint).
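The start sequence and its error handling can be sketched as follows. All names here are hypothetical: `download`, `launch` and `update_state` are placeholder callbacks for the file download, the backend-specific start, and the `PartialUpdateTask` call. The sketch assumes that an ignored action leaves the task’s state untouched.

```python
from typing import Callable, Set


def start_task(
    task_id: str,
    assigned_agent: str,
    agent_id: str,
    running: Set[str],
    download: Callable[[str], None],
    launch: Callable[[str], None],
    update_state: Callable[[str, str], None],
) -> bool:
    """Return True if the task was started, False if ignored or failed."""
    if task_id in running:
        return False  # already running: ignore the action
    if assigned_agent != agent_id:
        return False  # assigned to another agent: ignore the action
    try:
        download(task_id)  # parent artifacts and extra files
        launch(task_id)    # depends on the type of agent
        update_state(task_id, "Running")
        running.add(task_id)
    except Exception:
        # Any failure in the steps above marks the task as errored.
        update_state(task_id, "Error")
        return False
    return True
```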
Stop task
To stop a task, the Ponos agent will:
- Check whether the task is still running. If it is, the Ponos agent will stop it. The task will no longer be listed as a running task.
- Update the task’s state to `Stopped` (using the `PartialUpdateTask` endpoint).
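As a rough sketch of the two steps above, with hypothetical names (`kill` stands in for the backend-specific way of stopping a task, `update_state` for the `PartialUpdateTask` call):

```python
from typing import Callable, Set


def stop_task(
    task_id: str,
    running: Set[str],
    kill: Callable[[str], None],
    update_state: Callable[[str, str], None],
) -> None:
    if task_id in running:
        kill(task_id)             # stop the task only if it is still running
        running.discard(task_id)  # it is no longer listed as running
    # The state is updated whether or not the task had to be stopped.
    update_state(task_id, "Stopped")
```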
Generic Ponos agent
The Ponos agent lifecycle described above is as generic as possible. We have currently implemented two agents for that lifecycle: one using Docker Engine for standard hardware, and one using Slurm for supercomputers.
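One way to picture a generic lifecycle with interchangeable backends is an abstract base class whose backend-specific steps are left to subclasses. The sketch below is purely illustrative: `BaseAgent` and `RecordingAgent` are hypothetical names, not the actual Ponos classes, and the recording subclass only logs calls instead of talking to Docker or Slurm.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class BaseAgent(ABC):
    """Hypothetical generic agent: shared lifecycle, backend-specific steps."""

    def is_ready(self) -> bool:
        return True  # by default the agent is always ready

    @abstractmethod
    def start(self, task_id: str) -> None:
        """Start a task with the backend (e.g. a container or a batch job)."""

    @abstractmethod
    def stop(self, task_id: str) -> None:
        """Stop a task with the backend."""


class RecordingAgent(BaseAgent):
    """Stand-in backend that records calls, used here for illustration only."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def start(self, task_id: str) -> None:
        self.calls.append(("start", task_id))

    def stop(self, task_id: str) -> None:
        self.calls.append(("stop", task_id))
```

A backend can also override `is_ready`, which reflects the earlier remark that readiness depends on the type of Ponos agent used.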