Intro
statsd is an industry-standard metrics protocol and aggregation service. A statsd client sends metrics, typically over UDP, to a statsd daemon running as part of a Saga deployment. The daemon aggregates the metrics over time intervals and sends them to one or more metrics storage services. Rather than storing each metric the moment it occurs, the daemon aggregates metrics in memory per interval, which reduces the number of storage write operations. Because statsd is an industry standard, various compatible storage solutions work with statsd metrics data. Those storage solutions can enrich and combine Saga data with other data sets.
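To make the aggregation step concrete, here is a minimal sketch of what a statsd daemon does within one flush interval. It is not Telegraf's implementation: the function name `aggregateInterval` and the sample metric names are made up for illustration, only counters (`c`) and gauges (`g`) in the plain `name:value|type` line format are handled, and sample rates and signed gauge deltas are ignored.

```javascript
// Illustrative only: aggregates plain statsd lines ("name:value|type")
// received during one flush interval. Not Telegraf's actual implementation.
function aggregateInterval(lines) {
  const counters = {};
  const gauges = {};
  for (const line of lines) {
    const match = /^([^:|]+):(-?\d+(?:\.\d+)?)\|(c|g)$/.exec(line);
    if (!match) continue; // skip malformed or unsupported lines
    const [, name, rawValue, type] = match;
    const value = Number(rawValue);
    if (type === 'c') {
      counters[name] = (counters[name] || 0) + value; // counters are summed
    } else {
      gauges[name] = value; // gauges keep the last value seen
    }
  }
  return { counters, gauges };
}

// Five datagrams arrive during the interval, but only one aggregated
// record per metric needs to be written when the interval ends.
const flushed = aggregateInterval([
  'my_counter:1|c',
  'my_counter:1|c',
  'my_counter:3|c',
  'my_gauge:10|g',
  'my_gauge:123.45|g',
]);
console.log(flushed);
```

In this sketch five incoming datagrams collapse into two stored values, which is the reduction in write operations described above.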
Metric types
Each metric has a measurement label, a value and tags.
Increment: Increments a stat by a value (default is 1)
statsd_client.increment('my_counter');
Decrement: Decrements a stat by a value (default is -1)
statsd_client.decrement('my_counter', -2, ['tag1', 'tag2']);
Histogram: Sends data for a histogram stat
statsd_client.histogram('my_histogram', 42);
Gauge: Gauges a stat by a specified amount
statsd_client.gauge('my_gauge', 123.45);
Set: Counts unique occurrences of a stat (alias of unique)
statsd_client.set('my_unique', 'foobar');
statsd_client.unique('my_unique', 'foobarbaz');
Saga uses the hot-shots statsd client and the versatile Telegraf statsd daemon. The daemon can be configured to write to a wide range of storage systems (called outputs), such as InfluxDB, Elasticsearch, Datadog, BigQuery or Timescale.
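For orientation, the sketch below shows roughly what each metric type looks like as a statsd line on the wire. It is an illustrative formatter, not hot-shots internals: `formatMetric` is a hypothetical helper, and the DogStatsD-style tag suffix (`|#tag1,tag2`) is an assumption, since the exact tag encoding depends on how the client is configured.

```javascript
// Illustrative formatter, not hot-shots internals. Type codes follow the
// common statsd conventions: c = counter, g = gauge, h = histogram, s = set.
const TYPE_CODES = {
  increment: 'c',
  decrement: 'c',
  gauge: 'g',
  histogram: 'h',
  set: 's',
};

function formatMetric(method, name, value, tags = []) {
  const code = TYPE_CODES[method];
  if (!code) throw new Error(`unknown metric method: ${method}`);
  // Assumed DogStatsD-style tag suffix; other tag encodings exist.
  const tagSuffix = tags.length > 0 ? `|#${tags.join(',')}` : '';
  return `${name}:${value}|${code}${tagSuffix}`;
}

console.log(formatMetric('increment', 'my_counter', 1));                    // my_counter:1|c
console.log(formatMetric('decrement', 'my_counter', -2, ['tag1', 'tag2'])); // my_counter:-2|c|#tag1,tag2
console.log(formatMetric('histogram', 'my_histogram', 42));                 // my_histogram:42|h
console.log(formatMetric('gauge', 'my_gauge', 123.45));                     // my_gauge:123.45|g
console.log(formatMetric('set', 'my_unique', 'foobar'));                    // my_unique:foobar|s
```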
Default Metrics
Saga tracks certain application events by default. Metric labels follow the HTTP and Socket naming convention. Every metric carries a tag with the unique Saga application name, so metrics from several Saga applications can be told apart when they share a single storage system. Tags can also be used to drill down into and differentiate metrics.
/users/register/count - counts user registrations
/users/login/count - counts user logins; triggered when an existing user logs in
/messages/create/count - counts message creation. tags: user_id, bot_id and message from
/bots/properties/count - counts bot property creation. tags: bot_id and property name
/users/properties/count - counts user property creation. tags: user_id and property name
/globals/properties/count - counts global property creation. tags: property name
/bots/signals/count - counts bot signal creation. tags: bot_id
/users/signals/count - counts user signal creation. tags: user_id
/globals/signals/count - counts global signal creation
/jobs/runnable_calls/count - counts job invocations. tags: job_id
/jobs/runnable_calls/duration - job duration histogram. tags: job_id
/jobs/runnable_calls/queue_duration - job queue duration histogram. tags: job_id
/scripts/runnable_calls/count - counts script invocations. tags: script_id
/scripts/runnable_calls/duration - script duration histogram. tags: script_id
/scripts/runnable_calls/queue_duration - script queue duration histogram. tags: script_id
/jobs/parallel_processing/ - histogram of the number of jobs worked on in parallel per instance
/scripts/parallel_processing/ - histogram of the number of scripts worked on in parallel per instance
/lifecycle/count - keeps track of workers starting and stopping. tags: name=[start,exit] source=[scripts_worker,jobs_worker,http_worker]
/scripts/queue/messages - total number of script messages in the queue. tags: runnable_id (ID of the script)
/scripts/queue/messages_ready - number of script messages in the queue ready for consumption. tags: runnable_id (ID of the script)
/scripts/queue/messages_unacknowledged - number of script messages in the queue currently being worked on. tags: runnable_id (ID of the script)
/jobs/queue/messages - total number of job messages in the queue. tags: runnable_id (ID of the job)
/jobs/queue/messages_ready - number of job messages in the queue ready for consumption. tags: runnable_id (ID of the job)
/jobs/queue/messages_unacknowledged - number of job messages in the queue currently being worked on. tags: runnable_id (ID of the job)
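As an illustration of how the three job call metrics relate to one another, the following sketch wraps a job invocation and emits the count and both histograms. This is not Saga's internal code: `instrumentJob`, `createRecordingClient`, the injectable `now` clock, the millisecond unit and the `job_id:<id>` tag string format are all assumptions made for the example.

```javascript
// Stand-in for the statsd client that only records calls (hypothetical helper).
function createRecordingClient() {
  const calls = [];
  return {
    calls,
    increment: (name, value, tags) => calls.push({ type: 'increment', name, value, tags }),
    histogram: (name, value, tags) => calls.push({ type: 'histogram', name, value, tags }),
  };
}

// Hypothetical wrapper: enqueuedAt is when the job entered the queue,
// now() returns the current time in milliseconds (assumed unit).
function instrumentJob(client, jobId, enqueuedAt, run, now = Date.now) {
  const tags = [`job_id:${jobId}`];
  const startedAt = now();
  client.increment('/jobs/runnable_calls/count', 1, tags);
  client.histogram('/jobs/runnable_calls/queue_duration', startedAt - enqueuedAt, tags);
  try {
    return run();
  } finally {
    // Recorded even if the job throws, so failed runs still show up.
    client.histogram('/jobs/runnable_calls/duration', now() - startedAt, tags);
  }
}

// Usage with a fake clock: the job waits 500 ms in the queue and runs 250 ms.
const fakeTicks = [1500, 1750];
const jobClient = createRecordingClient();
const jobResult = instrumentJob(jobClient, 'job-1', 1000, () => 'done', () => fakeTicks.shift());
console.log(jobResult, jobClient.calls);
```

Passing the clock in as a parameter keeps the sketch deterministic; a real worker would simply rely on the default `Date.now`.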
Custom Metrics
Application logic is coded in scripts and jobs. Every script and job has access to an object called 'statsd_client', through which it can create custom metrics. Each custom metric carries default tags: the application name and the title and ID of the script or job.
(message, user, bot) => {
  statsd_client.increment("happy_message_counter");
}
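A slightly fuller sketch of a script that emits custom metrics is shown here. The stand-in `statsd_client` exists only so the example runs outside Saga, where the real object is provided by the platform. The `message.text` field and the `message_length` metric name are assumptions for illustration, not part of the documented script interface.

```javascript
// Stand-in so the sketch runs outside Saga; inside a script or job,
// statsd_client is provided by the platform.
const recorded = [];
const statsd_client = {
  increment: (name, value, tags) => recorded.push(['increment', name, value, tags]),
  histogram: (name, value, tags) => recorded.push(['histogram', name, value, tags]),
};

// Assumption: the message exposes its content as a string in message.text.
const script = (message, user, bot) => {
  const text = String(message.text || '');
  // Track the distribution of message lengths.
  statsd_client.histogram('message_length', text.length);
  // Count only messages that contain the word "happy".
  if (/\bhappy\b/i.test(text)) {
    statsd_client.increment('happy_message_counter');
  }
};

script({ text: 'I am so happy today' }, {}, {});
script({ text: 'Just checking in' }, {}, {});
console.log(recorded);
```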
API Calls
All metrics can be retrieved via a single analytics API call.
GET /analytics
Required Parameters.
- measurement: name of the measurement
- field: name of the field
- aggregateFunction: aggregate function for the field, such as mean, count or sum (see InfluxDB Aggregate Functions)
- start: date in JavaScript format
- end: date in JavaScript format
- group_by: time interval grouping in the form X[shmd], for example 1h or -15m
Optional Parameters.
- timezoneOffset: translates times to the timezone with the given offset, in the form +/-dh, for example -5h
- query: JSON-formatted tag filter, for example {runnable_id:'ab479665ee22'}
Example response:
{"group_by":"15m",
"values":[
{"value":1,"date":"2023-05-29T16:15:00Z"},
{"value":1,"date":"2023-05-29T16:30:00Z"},
{"value":1,"date":"2023-05-29T16:45:00Z"},
{"value":1,"date":"2023-05-29T17:00:00Z"},
{"value":1,"date":"2023-05-29T17:15:00Z"},
{"value":1,"date":"2023-05-29T17:30:00Z"},
{"value":1,"date":"2023-05-29T17:45:00Z"},
{"value":1,"date":"2023-05-29T18:00:00Z"},
{"value":1,"date":"2023-05-29T18:15:00Z"},
{"value":1,"date":"2023-05-29T18:30:00Z"},
{"value":1,"date":"2023-05-29T18:45:00Z"},
{"value":1,"date":"2023-05-29T19:00:00Z"},
{"value":1,"date":"2023-05-29T19:15:00Z"}
]
}
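The following sketch shows one way a caller might assemble the request URL and post-process a response of the shape shown above. It rests on several assumptions: `buildAnalyticsUrl` and `totalOf` are hypothetical helpers, the host `https://saga.example.com` is a placeholder, the field name `value` is a guess, dates are sent as ISO 8601 strings, and the `query` filter is sent as JSON text. Authentication is not covered.

```javascript
// Hypothetical helper: builds the query string for GET /analytics.
// Assumption: dates are sent as ISO 8601 strings and `query` as JSON text.
function buildAnalyticsUrl(baseUrl, params) {
  const search = new URLSearchParams({
    measurement: params.measurement,
    field: params.field,
    aggregateFunction: params.aggregateFunction,
    start: params.start.toISOString(),
    end: params.end.toISOString(),
    group_by: params.groupBy,
  });
  // Optional parameters are only added when provided.
  if (params.timezoneOffset) search.set('timezoneOffset', params.timezoneOffset);
  if (params.query) search.set('query', JSON.stringify(params.query));
  return `${baseUrl}/analytics?${search.toString()}`;
}

// Sums the per-interval values of a response with a `values` array.
function totalOf(response) {
  return response.values.reduce((sum, point) => sum + point.value, 0);
}

const analyticsUrl = buildAnalyticsUrl('https://saga.example.com', {
  measurement: '/scripts/runnable_calls/count',
  field: 'value', // assumed field name
  aggregateFunction: 'sum',
  start: new Date('2023-05-29T16:00:00Z'),
  end: new Date('2023-05-29T19:30:00Z'),
  groupBy: '15m',
  query: { runnable_id: 'ab479665ee22' },
});
console.log(analyticsUrl);

const sampleResponse = {
  group_by: '15m',
  values: [
    { value: 1, date: '2023-05-29T16:15:00Z' },
    { value: 1, date: '2023-05-29T16:30:00Z' },
  ],
};
console.log(totalOf(sampleResponse));
```

`URLSearchParams` takes care of percent-encoding the slashes in the measurement name and the braces and quotes in the JSON filter.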