Statsd

Intro

statsd is an industry-standard metrics protocol and aggregation service. A statsd client sends metrics, typically over UDP, to a statsd daemon running as part of a Saga deployment. The daemon aggregates the metrics in memory over time intervals and sends the aggregated values to one or more metrics storage services, rather than storing each metric the moment it occurs. This reduces the number of storage write operations. Because statsd is an industry standard, various compatible storage solutions work with statsd metrics data. Those storage solutions can enrich and combine Saga data with other data sets.
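
To make the aggregation step concrete, here is a minimal sketch of how a daemon might fold counter increments into per-interval totals before writing them out. The class name `IntervalAggregator` and its methods are hypothetical and are not part of statsd, Telegraf or Saga; a real daemon also handles gauges, sets and histograms.

```javascript
// Hypothetical sketch: in-memory counter aggregation per flush interval.
// This is not Telegraf's implementation; all names are illustrative only.
class IntervalAggregator {
  constructor() {
    // metric name -> running total for the current interval
    this.counters = new Map();
  }

  // Record one counter sample; no storage write happens here.
  increment(name, value = 1) {
    this.counters.set(name, (this.counters.get(name) || 0) + value);
  }

  // Called once per interval: return the totals and start a fresh interval.
  flush() {
    const snapshot = Object.fromEntries(this.counters);
    this.counters.clear();
    return snapshot;
  }
}

const aggregator = new IntervalAggregator();
aggregator.increment('/users/login/count');
aggregator.increment('/users/login/count');
aggregator.increment('/messages/create/count', 3);

// Three samples collapse into one write containing two totals.
const flushed = aggregator.flush();
console.log(flushed);
```

In a running daemon, `flush` would be driven by a timer, and its result would be handed to the configured storage outputs.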

Metric types

Each metric has a measurement label, a value and tags.

Increment: Increments a stat by a value (default is 1)

statsd_client.increment('my_counter');

Decrement: Decrements a stat by a value (default is -1)

statsd_client.decrement('my_counter', -2, ['tag1', 'tag2']);

Histogram: send data for histogram stat

statsd_client.histogram('my_histogram', 42);

Gauge: Gauge a stat by a specified amount

statsd_client.gauge('my_gauge', 123.45);

Set: Counts unique occurrences of a stat (alias of unique)

statsd_client.set('my_unique', 'foobar');
statsd_client.unique('my_unique', 'foobarbaz');
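
On the wire, each of the calls above becomes one short text line. The sketch below formats such lines in the common `name:value|type` layout. The helper `formatStatsdLine` is hypothetical, and the `|#tag` suffix follows the Datadog-style tag extension, which is an assumption here: tag encoding differs between statsd dialects and this document does not specify which one Saga uses.

```javascript
// Sketch of the plain-text statsd line format: <name>:<value>|<type>
// The "|#tag,tag" suffix is the Datadog-style tag extension (an assumption;
// other statsd dialects encode tags differently).
const TYPE_CODES = { counter: 'c', gauge: 'g', histogram: 'h', set: 's' };

function formatStatsdLine(name, value, type, tags = []) {
  const code = TYPE_CODES[type];
  if (!code) throw new Error(`unknown metric type: ${type}`);
  const tagSuffix = tags.length > 0 ? `|#${tags.join(',')}` : '';
  return `${name}:${value}|${code}${tagSuffix}`;
}

console.log(formatStatsdLine('my_counter', 1, 'counter'));                    // my_counter:1|c
console.log(formatStatsdLine('my_counter', -2, 'counter', ['tag1', 'tag2'])); // my_counter:-2|c|#tag1,tag2
console.log(formatStatsdLine('my_histogram', 42, 'histogram'));               // my_histogram:42|h
console.log(formatStatsdLine('my_gauge', 123.45, 'gauge'));                   // my_gauge:123.45|g
console.log(formatStatsdLine('my_unique', 'foobar', 'set'));                  // my_unique:foobar|s
```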

Saga uses the hot-shots statsd client and the Telegraf statsd daemon. The daemon can be configured to write to a wide range of storage systems (called outputs), such as InfluxDB, Elasticsearch, Datadog, BigQuery or Timescale.

Default Metrics

Saga tracks certain application events by default. Metric labels follow the HTTP and Socket naming convention. Every metric carries a tag with the unique Saga application name, so metrics from several Saga applications can be told apart when they share a single storage system. The other tags can be used to drill down into and differentiate metrics.

/users/register/count - counts user registrations
/users/login/count - counts logins by existing users
/messages/create/count - counts message creation. tags: user_id, bot_id and message from
/bots/properties/count - counts bot property creation. tags: bot_id and property name
/users/properties/count - counts user property creation. tags: user_id and property name
/globals/properties/count - counts global property creation. tags: property name
/bots/signals/count - counts bot signal creation. tags: bot_id
/users/signals/count - counts user signal creation. tags: user_id
/globals/signals/count - counts global signal creation
/jobs/runnable_calls/count - counts job invocations. tags: job_id
/jobs/runnable_calls/duration - job duration histogram. tags: job_id
/jobs/runnable_calls/queue_duration - job queue_duration histogram. tags: job_id
/scripts/runnable_calls/count - counts script invocations. tags: script_id
/scripts/runnable_calls/duration - script duration histogram. tags: script_id
/scripts/runnable_calls/queue_duration - script queue_duration histogram. tags: script_id
/jobs/parallel_processing/ - histogram of the number of jobs worked on in parallel per instance
/scripts/parallel_processing/ - histogram of the number of scripts worked on in parallel per instance
/lifecycle/count - keeps track of workers starting and stopping. tags: name=[start,exit], source=[scripts_worker,jobs_worker,http_worker]
/scripts/queue/messages - total number of script messages in queue. tags: runnable_id (ID of script)
/scripts/queue/messages_ready - number of script messages in queue ready for consumption. tags: runnable_id (ID of script)
/scripts/queue/messages_unacknowledged - number of script messages in queue currently being worked on. tags: runnable_id (ID of script)
/jobs/queue/messages - total number of job messages in queue. tags: runnable_id (ID of job)
/jobs/queue/messages_ready - number of job messages in queue ready for consumption. tags: runnable_id (ID of job)
/jobs/queue/messages_unacknowledged - number of job messages in queue currently being worked on. tags: runnable_id (ID of job)
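
As an illustration of how tags separate metrics that share one storage system, the sketch below sums a measurement per tag value. The sample rows, the helper `sumByTag` and the tag key `application` are all hypothetical; this document only states that an application-name tag exists, not what it is called or how a storage backend returns rows.

```javascript
// Hypothetical rows as a storage backend might return them.
// The tag key "application" is an assumption, not a documented name.
const samples = [
  { measurement: '/users/login/count', value: 1, tags: { application: 'shop_bot' } },
  { measurement: '/users/login/count', value: 1, tags: { application: 'support_bot' } },
  { measurement: '/users/login/count', value: 1, tags: { application: 'shop_bot' } },
  { measurement: '/jobs/runnable_calls/count', value: 1, tags: { application: 'shop_bot', job_id: 'j1' } },
];

// Sum the values of one measurement, grouped by the value of one tag.
function sumByTag(rows, measurement, tagKey) {
  const totals = {};
  for (const row of rows) {
    if (row.measurement !== measurement) continue;
    const key = row.tags[tagKey];
    if (key === undefined) continue; // skip rows that lack the tag
    totals[key] = (totals[key] || 0) + row.value;
  }
  return totals;
}

const loginsPerApp = sumByTag(samples, '/users/login/count', 'application');
console.log(loginsPerApp); // { shop_bot: 2, support_bot: 1 }
```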

Custom Metrics

Application logic is coded in scripts and jobs. Every script and job has access to an object called 'statsd_client' through which it can create custom metrics. Each custom metric carries default tags: the application name and the title and ID of the script or job.

(message, user, bot) => {
  statsd_client.increment("happy_message_counter");
}
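
To try a script like this outside Saga, one option is a recording stand-in for `statsd_client` that only captures calls. The sketch below is an assumption-laden illustration: `createRecordingClient` is a hypothetical helper, `message.text` is a hypothetical field, and the real object Saga provides may expose more methods and behave differently.

```javascript
// Recording stand-in for the statsd_client that Saga provides to scripts and jobs.
// Purely illustrative: it captures calls so a script can be exercised locally.
function createRecordingClient() {
  const calls = [];
  const record = (method) => (stat, value, tags) => {
    calls.push({ method, stat, value, tags });
  };
  return {
    calls,
    increment: record('increment'),
    histogram: record('histogram'),
    gauge: record('gauge'),
  };
}

const statsd_client = createRecordingClient();

// A script in the same shape as the example above; message.text is a hypothetical field.
const script = (message, user, bot) => {
  statsd_client.increment('happy_message_counter');
  statsd_client.histogram('message_length', message.text.length);
};

script({ text: 'hello' }, { id: 'u1' }, { id: 'b1' });
console.log(statsd_client.calls);
```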

API Calls

All metrics can be retrieved via a single analytics API call.

GET /analytics

Required Parameters.

measurement: name of the measurement
field: name of the field
aggregateFunction: aggregate function applied to the field, such as mean, count or sum (see InfluxDB Aggregate Functions)
start: date in JavaScript format
end: date in JavaScript format
group_by: time interval for grouping, in the form X[shmd], for example 1h or 15m

Optional Parameters.

timezoneOffset: translate times to the timezone with the given offset, +/-dh, example -5h
query: JSON-formatted tag filter, example: {"runnable_id":"ab479665ee22"}

{
  "group_by": "15m",
  "values": [
    {"value":1,"date":"2023-05-29T16:15:00Z"},
    {"value":1,"date":"2023-05-29T16:30:00Z"},
    {"value":1,"date":"2023-05-29T16:45:00Z"},
    {"value":1,"date":"2023-05-29T17:00:00Z"},
    {"value":1,"date":"2023-05-29T17:15:00Z"},
    {"value":1,"date":"2023-05-29T17:30:00Z"},
    {"value":1,"date":"2023-05-29T17:45:00Z"},
    {"value":1,"date":"2023-05-29T18:00:00Z"},
    {"value":1,"date":"2023-05-29T18:15:00Z"},
    {"value":1,"date":"2023-05-29T18:30:00Z"},
    {"value":1,"date":"2023-05-29T18:45:00Z"},
    {"value":1,"date":"2023-05-29T19:00:00Z"},
    {"value":1,"date":"2023-05-29T19:15:00Z"}
  ]
}
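
The parameters and response shape above can be exercised with a short sketch that builds a request URL and totals the returned values. Several details are assumptions: the base URL is a placeholder, the field name `value` is a guess, ISO 8601 strings are used as one reading of "JavaScript format", and the helpers `buildAnalyticsUrl` and `totalValue` are hypothetical. No request is actually sent.

```javascript
// Placeholder host; this document only specifies the path GET /analytics.
const BASE_URL = 'https://saga.example.com';

const REQUIRED = ['measurement', 'field', 'aggregateFunction', 'start', 'end', 'group_by'];

// Build the query string for GET /analytics from a plain parameter object.
function buildAnalyticsUrl(params) {
  for (const key of REQUIRED) {
    if (params[key] === undefined) throw new Error(`missing required parameter: ${key}`);
  }
  const url = new URL('/analytics', BASE_URL);
  for (const [key, value] of Object.entries(params)) {
    // The tag filter is serialized as JSON; every other parameter as plain text.
    url.searchParams.set(key, key === 'query' ? JSON.stringify(value) : String(value));
  }
  return url.toString();
}

const requestUrl = buildAnalyticsUrl({
  measurement: '/scripts/runnable_calls/count',
  field: 'value', // assumed field name
  aggregateFunction: 'count',
  start: new Date('2023-05-29T16:00:00Z').toISOString(), // ISO 8601 is an assumption
  end: new Date('2023-05-29T19:30:00Z').toISOString(),
  group_by: '15m',
  query: { runnable_id: 'ab479665ee22' },
});
console.log(requestUrl);

// Add up the per-interval values of a response shaped like the example above.
function totalValue(response) {
  return response.values.reduce((sum, point) => sum + point.value, 0);
}

const sampleResponse = {
  group_by: '15m',
  values: [
    { value: 1, date: '2023-05-29T16:15:00Z' },
    { value: 1, date: '2023-05-29T16:30:00Z' },
  ],
};
console.log(totalValue(sampleResponse)); // 2
```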