`{server_url}/metrics/service`. These metrics are valuable for monitoring system health and performance.
Metric | Description | Use Case |
---|---|---|
active_invocations_gauge | Count of uncompleted invocations | Monitors system backlog |
active_tasks | Count of uncompleted tasks | Tracks overall system load |
unallocated_tasks | Count of tasks not allocated to executors | Identifies resource constraints |
max_invocation_age_seconds | Age of oldest running invocation | Detects stuck invocations |
max_task_age_seconds | Age of oldest running task | Identifies abnormally long-running tasks |
task_completion_latency_seconds_bucket_count{outcome="Success"} | Count of successfully completed tasks | Tracks successful throughput |
task_completion_latency_seconds_bucket_count{outcome="Failure"} | Count of failed tasks | Monitors system errors |
task_completion_latency_seconds_bucket | Distribution of task completion times | Analyzes performance trends |
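
The metrics in the table can be read programmatically. The following is a minimal sketch, assuming the endpoint returns Prometheus-style text (one `name{labels} value` sample per line); the `_bucket` naming suggests this, but the page does not state it. The helper names `parse_metrics` and `fetch_metrics` are hypothetical and not part of Indexify.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
# Assumption: label values do not contain a "}" character.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Parse Prometheus-style text into {(name, label_string): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        try:
            samples[(name, labels or "")] = float(value)
        except ValueError:
            # Ignore samples whose value is not numeric.
            continue
    return samples


def fetch_metrics(server_url, timeout=5.0):
    """Fetch {server_url}/metrics/service and parse the response body."""
    with urlopen(f"{server_url}/metrics/service", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With this shape, `fetch_metrics(server_url)[("unallocated_tasks", "")]` would give the current count of unallocated tasks, provided the assumed text format holds.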
The following debugging endpoints are available in addition to the `/metrics/service` endpoint.
Endpoint | Description | Use Case |
---|---|---|
{server_url}/internal/allocations | Lists current allocations per executor | Debugging executor load balance |
{server_url}/internal/unallocated_tasks | Lists all tasks not being allocated | Identifying resource bottlenecks |
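
A small sketch for querying the two endpoints in the table from a script. It returns the raw response body because the payload format is not documented here, and no schema is assumed. The helper names and the `http://localhost:8900` address in the usage note are placeholders.

```python
from urllib.request import urlopen

# Paths taken from the table above.
DEBUG_PATHS = ("/internal/allocations", "/internal/unallocated_tasks")


def debug_url(server_url, path):
    """Join the server URL and a debug path without doubling the slash."""
    return server_url.rstrip("/") + path


def fetch_debug(server_url, path, timeout=5.0):
    """Return the raw response body of a debug endpoint as text."""
    with urlopen(debug_url(server_url, path), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch_debug("http://localhost:8900", "/internal/allocations")` would return the current allocations as text, which can be printed or saved for comparison over time.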
If the `unallocated_tasks` metric is high:
- Check that an executor is running with a `--function` argument matching the unallocated task.
- Check the `/internal/allocations` endpoint to see if executors are at capacity.

If `max_task_age_seconds` is unusually high:
- Check `/internal/allocations` to identify the specific long-running tasks.
- Check the `stdout` of long-running tasks using the Indexify UI at `{server_url}/ui`.

If `task_completion_latency_seconds_bucket_count{outcome="Failure"}` is increasing:
- Check the `stdout` or `stderr` of failed tasks using the Indexify UI at `{server_url}/ui`.
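
The troubleshooting steps above could be turned into an automated check along these lines. This is a sketch only: the thresholds are hypothetical defaults, and the input is assumed to be a dict mapping `(metric name, label string)` to a numeric value, a shape chosen for this example rather than anything Indexify provides.

```python
# Key of the failure counter in the assumed (name, label string) mapping.
FAILURES = (
    "task_completion_latency_seconds_bucket_count",
    '{outcome="Failure"}',
)


def triage(current, previous, max_unallocated=0, max_task_age=600.0):
    """Return the troubleshooting checks that apply to the given samples.

    current, previous: dicts keyed by (metric name, label string), taken
    at two points in time. Thresholds are hypothetical; tune them for
    your workload.
    """
    actions = []
    if current.get(("unallocated_tasks", ""), 0) > max_unallocated:
        actions.append(
            "check executor --function arguments and /internal/allocations"
        )
    if current.get(("max_task_age_seconds", ""), 0) > max_task_age:
        actions.append(
            "inspect long-running tasks via /internal/allocations and the UI"
        )
    # "Increasing" is approximated by comparing two consecutive samples.
    if current.get(FAILURES, 0) > previous.get(FAILURES, 0):
        actions.append("review stdout/stderr of failed tasks in the UI")
    return actions
```

Comparing two snapshots is the simplest way to detect a rising failure count; a production alert would more likely use a rate over a time window.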