Managing Actions consumption and cost

TL;DR Got a bill shock? Here’s a value stream approach to analysing Actions minutes consumption. First, know your CI/CD value stream; if you don’t have one, it’s time to define it. Next, identify where along the flow Actions are involved and gather the data. Finally, do the math to forecast and optimise your next month’s consumption.

GitHub is licensed per seat, so that cost is fairly predictable from headcount. However, as we adopt DevSecOps practices with GitHub Actions, one of the top challenges most organisations will encounter is metered billing. From CI/CD to security features, they all leverage Actions. Some GitHub platform owners may get a ‘bill shock’ for this line item, see their included Actions minutes vaporise within the first few days of the billing month, or play it safe and run CI/CD on their own servers (or under their desks).

These are signs of not managing Actions consumption early and pragmatically. Fear, uncertainty and doubt (FUD) may cause undesired behaviours, such as cutting back on validation checks, which can cripple product quality and reliability.

FinOps should be considered when adopting metered services like Actions. In a nutshell, FinOps is about

  • Moving away from siloed roles and responsibilities to shared accountability and responsibility
  • Collaborating between business and IT to optimise
  • Making decisions based on business value rather than COGS or ROI

All three adoption phases of FinOps start with visibility into usage, so understanding Actions usage is the fundamental step.

Understand usage with value stream

Having a structured approach to gaining usage visibility removes ambiguity, enables proper accountability, and allows better budgeting and forecasting for practising DevSecOps.

1. Define your CI/CD value stream and practices

The first step is to know your value stream. Quality should be well-planned and intentional.

For example, a team practising trunk-based development might have a CI/CD value stream that looks like this:

%%{init: { 'logLevel': 'error', 'theme': 'forest' , 'themeVariables': {
              'cScale0': '#CCC',
              'cScale1': '#CCC',
              'cScale2': '#CCC',
              'cScale3': '#CCC',
              'cScale4': '#CCC',
              'cScale5': '#CCC'
       } } }%%
timeline
    A feature prioritised: auto-backlog management : agile planning : create feature branch
    Development: commits against branch : unit test: lint : static application security testing : software composition analysis
    Continuous integration : create pull request : build : integration test : static application security testing : software composition analysis : peer review : merge into default branch
    Continuous delivery : build : architecture fitness : package : sign & publish : deploy to non-prod : manual & exploratory test : dynamic application security testing
    Deploy to production : deployment gates : canary deployment : blue / green deployment
    Continuous monitoring : detect : response : post-mortem : live-site review

The diagram above is written in Mermaid, which makes it easy to document in Markdown. A conventional visualisation like this value stream also helps to lay out the end-to-end DevSecOps workflow and identify the activities.

2. Gather Actions usage data

The next step is to trace Actions usage for the identified activities. The activities may use first-party GitHub Actions, third-party actions from the Marketplace, or actions from the wider open-source community. The data we want to trace are:

  • average duration - from past successful workflow runs

  • success rate - the share of runs that are complete and accurate

  • frequency - recorded occurrences

For example,

| Stage | Trigger | Actions activities | Duration to complete stage | Success rate | Frequency |
|---|---|---|---|---|---|
| A feature prioritised | Scheduled | auto-backlog management [1][2] | 1m | 80% | weekly |
| Development | Commits to feature branch | commit, build, unit test, lint, dependency scanning [3], secret scanning [4] | 2m | 50% | 15 times / day |
| Continuous integration | Merge to default branch | build, integration test, static application security testing scanning [5], dependency review [3] | 20m | 70% | 5 times / day |
| Continuous delivery | Generate a release | build, architecture fitness test, package, sign [6] & publish [7][8], deploy to environments, exploratory test, dynamic application security testing | 20m | 80% | weekly |
| Deploy to production | Request to go-live | deployment gates [9], deploy to production | 20m | 90% | ad-hoc |
| Continuous monitoring | Scheduled | dependency scanning [10] | 1m | 100% | daily |
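The three measures (average duration, success rate and frequency) could be derived from exported workflow run records. The following is a minimal sketch, not a definitive implementation: the sample records are made up, and the field names `conclusion`, `run_started_at` and `updated_at` are assumed to mirror the workflow run objects returned by the GitHub REST API. Note that wall-clock duration only approximates billed minutes.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sample records; in practice these would come from your own
# export of workflow runs for one stage of the value stream.
runs = [
    {"conclusion": "success", "run_started_at": "2023-08-01T09:00:00Z", "updated_at": "2023-08-01T09:02:00Z"},
    {"conclusion": "failure", "run_started_at": "2023-08-01T11:00:00Z", "updated_at": "2023-08-01T11:01:00Z"},
    {"conclusion": "success", "run_started_at": "2023-08-02T09:00:00Z", "updated_at": "2023-08-02T09:04:00Z"},
    {"conclusion": "failure", "run_started_at": "2023-08-02T15:00:00Z", "updated_at": "2023-08-02T15:03:00Z"},
]


def parse(timestamp):
    """Parse an ISO 8601 UTC timestamp such as 2023-08-01T09:00:00Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def summarise(runs):
    """Return average duration, success rate and frequency for a list of runs."""
    if not runs:
        raise ValueError("at least one run is required")
    successes = [r for r in runs if r["conclusion"] == "success"]
    # Average duration is taken from successful runs only, in minutes.
    durations = [
        (parse(r["updated_at"]) - parse(r["run_started_at"])).total_seconds() / 60
        for r in successes
    ]
    # Frequency is expressed as runs per day on which at least one run started.
    active_days = {parse(r["run_started_at"]).date() for r in runs}
    return {
        "avg_duration_min": mean(durations) if durations else None,
        "success_rate": len(successes) / len(runs),
        "runs_per_day": len(runs) / len(active_days),
    }


summary = summarise(runs)
print(summary)
```

With the sample data, the two successful runs average 3 minutes, half of the runs succeed, and there are 2 runs per active day.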

3. Evaluate Actions minutes usage

The third step is to evaluate the Actions durations using the data gathered.

The calculation is:

\[\text{ Total activity duration } = \text{ duration } \div \text{ Success Rate } \times \text{ Frequency }\]
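As a worked example, take the Development stage: 2 minutes per run, a 50% success rate and 15 runs per day. Dividing by the success rate inflates the duration to cover failed runs that presumably have to be repeated. Assuming a five-day working week (an assumption made here to normalise the frequency per week), the result is:

\[\text{ Development } = 2\text{m} \div 0.5 \times (15 \times 5) = 300\text{m per week}\]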

Evaluate each stage in the same way, and the resulting visual shows more clearly where and how Actions are used in your CI/CD:

%%{init: { 'logLevel': 'error', 'theme': 'forest' , 'themeVariables': {
              'cScale0': '#888',
              'cScale1': '#CCC',
              'cScale2': '#CCC',
              'cScale3': '#CCC',
              'cScale4': '#CCC',
              'cScale5': '#CCC',
              'cScale6': '#CCC'
       } } }%%
timeline
    Stage: Average run duration : Success rate : Frequency : Normalised usage per week
    A feature prioritised: 1m : 80% : weekly : 1m
    Development: 2m : 50% : 15 times per day : 300m
    Continuous integration : 20m : 70% : 5 times per day : 714m
    Continuous delivery : 20m : 80% : weekly : 25m
    Deploy to production : 20m : 90% : ad-hoc : 22m
    Continuous monitoring : 1m : 100% : daily : 7m
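The normalised weekly figures in the diagram can be reproduced with a few lines of Python. This is a sketch under stated assumptions: "times per day" is multiplied by 5 working days, "daily" by 7 calendar days, and "ad-hoc" is treated as once per week. These conversions are inferred from the figures, so adjust them to your own cadence.

```python
# Stage data: (duration in minutes, success rate, runs per week).
# Runs per week are assumptions: 5 working days for "times per day",
# 7 calendar days for "daily", and once per week for "ad-hoc".
stages = {
    "A feature prioritised": (1, 0.80, 1),
    "Development": (2, 0.50, 15 * 5),
    "Continuous integration": (20, 0.70, 5 * 5),
    "Continuous delivery": (20, 0.80, 1),
    "Deploy to production": (20, 0.90, 1),
    "Continuous monitoring": (1, 1.00, 7),
}


def weekly_minutes(duration, success_rate, runs_per_week):
    """Total activity duration = duration / success rate * frequency."""
    return duration / success_rate * runs_per_week


# Round to whole minutes, as in the diagram.
usage = {name: round(weekly_minutes(*values)) for name, values in stages.items()}
total = sum(usage.values())

for name, minutes in usage.items():
    print(f"{name}: {minutes}m")
print(f"Total: {total}m per week")  # Total: 1069m per week
```

Continuous integration dominates at 714 of the 1,069 minutes per week, which makes it the first place to look for optimisation.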

Updated 15th Aug 2023 - I have published a GitHub action to help implement what’s been discussed here. There are example use cases showing how to apply it in workflows (yml) - do check it out!

4. Adjustment for forecasting - growth rate and confidence level

Finally, the calculation can be extended to forecast future consumption. To do so, adjust for these variables:

  • Growth rate is the expected growth, including increases in the number of developers, teams and/or repositories
  • Confidence level is the success rate used to estimate an interval. The confidence interval is then an estimated range based on historical data, incorporating the risk of uncertainty and other unplanned factors

A forecast may not be 100% accurate at the first calculation. It is not set-and-forget; it requires continuous refinement.
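One way to fold these adjustments into the calculation is sketched below. It is illustrative only: the function name, the `margin` parameter (a simple plus-or-minus percentage standing in for a confidence interval) and the example values are all assumptions, and the 1,069 weekly minutes is the sum of the normalised weekly figures from step 3.

```python
def forecast_monthly_minutes(weekly_minutes, growth_rate, margin, weeks_per_month=52 / 12):
    """Return a (low, expected, high) forecast of Actions minutes for a month.

    growth_rate: expected growth as a fraction, e.g. 0.10 for 10%.
    margin: hypothetical plus-or-minus range as a fraction, e.g. 0.20 for 20%.
    """
    expected = weekly_minutes * weeks_per_month * (1 + growth_rate)
    return expected * (1 - margin), expected, expected * (1 + margin)


# Example values are hypothetical: 10% growth and a 20% margin.
low, expected, high = forecast_monthly_minutes(1069, growth_rate=0.10, margin=0.20)
print(f"Forecast: {expected:.0f}m (range {low:.0f}m to {high:.0f}m)")
```

Re-running this each month with the actual figures, and tightening the margin as the history grows, is one way to practise the continuous refinement described above.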

Inspect and adapt are the flywheel

Visibility into Actions usage across the value stream is the first step towards gaining assurance over Actions consumption. It needs to be followed by an iterative process of inspecting and adapting by both business and engineering. In the FinOps framework, these are the Inform, Optimize, and Operate activities, which may look like:

  • Continuously monitor with live dashboards accessible to all responsible individuals (including engineers). Use budget alerts at the 50%, 75% and 90% thresholds, and delegate responsibility for responding to unplanned spikes.
  • Weekly, have the product owner and engineering inspect the level of Actions consumption, and show back accountability to the business/service owners.
  • Monthly, review a service view of where the Actions minutes are being used to help optimise your CI/CD and remediate technical debt. Pay attention to data such as workflow runs that keep failing or are redundant.
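The 50%, 75% and 90% alert thresholds can be checked with a small helper. This is a minimal sketch: the function name and the 3,000-minute budget are hypothetical, and a real setup would more likely rely on the budget alerts of your billing tooling.

```python
THRESHOLDS = (0.50, 0.75, 0.90)


def crossed_thresholds(used_minutes, budget_minutes, thresholds=THRESHOLDS):
    """Return the alert thresholds that current usage has reached or passed."""
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    ratio = used_minutes / budget_minutes
    return [t for t in thresholds if ratio >= t]


# Hypothetical example: 2,400 minutes used against a 3,000-minute budget (80%).
alerts = crossed_thresholds(2400, 3000)
print(alerts)  # [0.5, 0.75]
```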

Our ultimate goal is to minimise surprises and manage FUD for all stakeholders, making the practice of DevSecOps sustainable and value-adding for the business.


References of applying DevSecOps practices on GitHub

This post is licensed under CC BY 4.0 by the author.