Managing Actions consumption and cost

Posted May 11, 2023

By Kitty Chiu

TL;DR Got a bill shock? Here’s a value stream approach to analysing Actions minutes consumption. First, know your CI/CD value stream; if you don’t have one, it’s time to define it. Next, identify where along the flow Actions are involved and gather the usage data. Finally, do the math to forecast and optimise your monthly consumption.

GitHub is licensed per seat, so that cost is fairly predictable from headcount. However, as we adopt DevSecOps practices with GitHub Actions, one of the top challenges most organisations encounter is metered billing. From CI/CD to security features, they all leverage Actions. Some GitHub platform owners get a ‘bill shock’ for this line item, some see their included Actions minutes vaporise within the first few days of the billing month, and some play it safe by running CI/CD on their own servers (or under their desk).

These are signs that Actions consumption was not managed early and pragmatically. Fear, uncertainty and doubt (FUD) can lead to undesired behaviours, such as cutting back on validation checks, which in turn cripples product quality and reliability.

FinOps should be considered when adopting metered services like Actions. In a nutshell, FinOps is about:

Moving away from siloed roles and responsibilities to shared accountability and responsibility
Collaborating between business and IT to optimise
Making decisions based on business value rather than COGS or ROI

All three adoption phases of FinOps start with visibility into usage, so understanding Actions usage is the fundamental first step.

Understand usage with value stream

A structured approach to gaining usage visibility removes ambiguity, enables proper accountability, and allows better budgeting and forecasting for practising DevSecOps.

1. Define your CI/CD value stream and practices

The first step is to know your value stream. Quality should be well-planned and intentional.

For example, trunk-based development may typically have a CI/CD value stream that looks like this:

%%{init: { 'logLevel': 'error', 'theme': 'forest' , 'themeVariables': {
              'cScale0': '#CCC',
              'cScale1': '#CCC',
              'cScale2': '#CCC',
              'cScale3': '#CCC',
              'cScale4': '#CCC',
              'cScale5': '#CCC'
       } } }%%
timeline
    A feature prioritised: auto-backlog management : agile planning : create feature branch
    Development: commits against branch : unit test: lint : static application security testing : software composition analysis
    Continuous integration : create pull request : build : integration test : static application security testing : software composition analysis : peer review : merge into default branch
    Continuous delivery : build : architecture fitness : package : sign & publish : deploy to non-prod : manual & exploratory test : dynamic application security testing
    Deploy to production : deployment gates : canary deployment : blue / green deployment
    Continuous monitoring : detect : response : post-mortem : live-site review

The diagram above is written in Mermaid so it can be documented easily in markdown. Visualising the value stream this way helps to show the end-to-end DevSecOps workflow and identify the activities in it.

2. Gather Actions usage data

The next step is to trace Actions usage for the identified activities. The activities may use first-party GitHub Actions, third-party actions from the Marketplace, or actions from the wider open-source community. The data we want to trace are:

average duration - from past successful workflow runs
success rate - the percentage of runs that are complete and accurate
frequency - how often the activity is recorded to occur

For example:

Stage	Trigger	Actions activities	Duration to complete stage	Success rate	Frequency
A feature prioritised	Scheduled	auto-backlog management¹²	1m	80%	weekly
Development	Commits to feature branch	commit, build, unit test, lint, dependency scanning³, secret scanning⁴	2m	50%	15 times / day
Continuous integration	Merge to default branch	build, integration test, static application security testing scanning⁵, dependency review³	20m	70%	5 times / day
Continuous delivery	Generate a release	build, architecture fitness test, package, sign⁶ & publish⁷⁸, deploy to environments, exploratory test, dynamic application security testing	20m	80%	weekly
Deploy to production	Request to go-live	deployment gates⁹, deploy to production	20m	90%	ad-hoc
Continuous monitoring	Scheduled	dependency scanning¹⁰	1m	100%	daily
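
The three data points could be derived from raw run records along these lines. This is a minimal sketch: the `Run` record and its field names are hypothetical (not a GitHub API schema), the sample data is illustrative only, and `observed_days` is assumed to be positive.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One workflow run; field names are hypothetical, not a GitHub API schema."""
    duration_min: float  # minutes the run took
    succeeded: bool      # True if the run concluded successfully


def summarise(runs: list[Run], observed_days: int) -> dict[str, float]:
    """Return average duration of successful runs, success rate, and runs per day."""
    successful = [r.duration_min for r in runs if r.succeeded]
    return {
        "avg_duration_min": mean(successful) if successful else 0.0,
        "success_rate": len(successful) / len(runs) if runs else 0.0,
        "runs_per_day": len(runs) / observed_days,
    }


# Illustrative sample: four runs observed over two days.
sample = [Run(2.0, True), Run(3.0, False), Run(2.0, True), Run(1.0, False)]
print(summarise(sample, observed_days=2))
```

Averaging only the successful runs follows the "from past successful workflow runs" definition above; failed runs still count towards the success rate and the frequency.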

3. Evaluate Actions minutes usage

The third step is to evaluate the Actions durations with the data gathered.

Calculation:

\[\text{Total activity duration} = \text{Duration} \div \text{Success rate} \times \text{Frequency}\]
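
As a worked example, the formula can be applied to the Development stage from the table. This is a minimal sketch: the function name is hypothetical, and it assumes a 5-day working week to turn "15 times / day" into a weekly frequency.

```python
def normalised_minutes(duration_min: float, success_rate: float, runs_per_week: float) -> float:
    """Total activity duration = duration / success rate * frequency."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return duration_min / success_rate * runs_per_week


# Development stage: 2m per run, 50% success rate, 15 runs a day.
# Assumption: a 5-day working week, so 75 runs per week.
print(normalised_minutes(2, 0.5, 15 * 5))  # 300.0
```

Dividing by the success rate inflates the duration to cover reruns: at a 50% success rate, each successful outcome takes two attempts on average. This treats a failed run as costing about as much as a successful one, which is a simplification.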

Evaluate each stage with this formula, and the visual shows more clearly where and how Actions are used in your CI/CD:

%%{init: { 'logLevel': 'error', 'theme': 'forest' , 'themeVariables': {
              'cScale0': '#888',
              'cScale1': '#CCC',
              'cScale2': '#CCC',
              'cScale3': '#CCC',
              'cScale4': '#CCC',
              'cScale5': '#CCC',
              'cScale6': '#CCC'
       } } }%%
timeline
    Stage: Average run duration : Success rate : Frequency : Normalised usage per week
    A feature prioritised: 1m : 80% : weekly : 1m
    Development: 2m : 50% : 15 times per day : 300m
    Continuous integration : 20m : 70% : 5 times per day : 714m
    Continuous delivery : 20m : 80% : weekly : 25m
    Deploy to production : 20m : 90% : ad-hoc : 22m
    Continuous monitoring : 1m : 100% : daily : 7m
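
The per-stage figures can be reproduced and summed to see which stage dominates. This is a sketch under stated assumptions: the per-day frequencies use a 5-day week, "daily" uses 7 days, and "ad-hoc" is counted as once a week, which is what the normalised figures in the diagram imply.

```python
# (duration in minutes, success rate, runs per week) per stage, from the table.
# Assumptions: per-day frequencies use a 5-day week, "daily" uses 7 days,
# and "ad-hoc" is counted as once a week.
STAGES = {
    "A feature prioritised": (1, 0.80, 1),
    "Development": (2, 0.50, 15 * 5),
    "Continuous integration": (20, 0.70, 5 * 5),
    "Continuous delivery": (20, 0.80, 1),
    "Deploy to production": (20, 0.90, 1),
    "Continuous monitoring": (1, 1.00, 7),
}

# Normalised usage per week = duration / success rate * frequency.
weekly = {name: d / s * f for name, (d, s, f) in STAGES.items()}
total_weekly = sum(weekly.values())

for name, minutes in weekly.items():
    print(f"{name}: {minutes:.0f}m ({minutes / total_weekly:.0%} of total)")
print(f"Total per week: {total_weekly:.0f}m")
```

With these inputs the total comes to roughly 1,070 minutes per week, and continuous integration accounts for about two thirds of it, which makes it the first candidate for optimisation.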

Updated 15th Aug 2023 - I have published a GitHub Action to help implement what’s been discussed here. It includes example use cases showing how to apply it in workflows (yml) - do check it out!

4. Adjustment for forecasting - growth rate and confidence level

Finally, the calculation can be extended to forecast future consumption. To do so, factor in these variables:

Growth rate is the expected growth, including increases in the number of developers, teams and/or repositories
Confidence level is the success rate used to estimate an interval. The confidence interval is then an estimated range based on historical data, which incorporates the risk of uncertainty and other unplanned factors
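
One possible way to apply both adjustments is sketched below. It is an assumption-heavy illustration, not the author's method: growth is compounded monthly, and the interval is a simple symmetric margin around the expected value rather than a statistical confidence interval. The function name, parameters and figures are all hypothetical.

```python
def forecast(baseline_min: float, monthly_growth: float,
             months_ahead: int, margin: float) -> tuple[float, float, float]:
    """Return (low, expected, high) Actions minutes for a future month.

    monthly_growth: expected growth per month, e.g. 0.10 for 10%.
    margin: symmetric uncertainty band, e.g. 0.20 for +/- 20%.
    """
    expected = baseline_min * (1 + monthly_growth) ** months_ahead
    return expected * (1 - margin), expected, expected * (1 + margin)


# Hypothetical inputs: 4,000 minutes last month, 10% growth, +/- 20% band.
low, expected, high = forecast(4000, 0.10, months_ahead=1, margin=0.20)
print(f"Next month: {expected:.0f}m (range {low:.0f}m to {high:.0f}m)")
```

As real usage data accumulates, the growth rate and margin can be re-estimated each month so that the range narrows over time.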

A forecast may not be 100% accurate at the first calculation. It is not set-and-forget; it requires continuous refinement.

Inspect and adapt are the flywheel

Visibility into Actions usage across the value stream is the first step to gaining assurance over Actions consumption. It needs to be followed by an iterative process of inspecting and adapting by both business and engineering. In the FinOps framework, these are the Inform, Optimize, and Operate activities, which may look like:

Continuously monitor with live dashboards accessible to all responsible individuals (including engineers). Use budget alerts at the 50%, 75% and 90% thresholds, and delegate responsibility for responding to unplanned spikes.
Weekly, have the product owner and engineering inspect the level of Actions consumption, and show back accountability to the business/service owners.
Monthly, take a service view of where the Actions minutes are being used to help optimise your CI/CD and remediate technical debt. Pay attention to data such as workflow runs that keep failing or are redundant.
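
The budget alert thresholds mentioned above could be checked with logic like the following. This is a minimal sketch with hypothetical names and figures; it does not call any GitHub API, and in practice the usage number would come from your billing or usage reports.

```python
THRESHOLDS = (0.50, 0.75, 0.90)  # alert levels named in the text


def crossed_thresholds(used_min: float, budget_min: float) -> list[float]:
    """Return every alert threshold that current usage has reached."""
    if budget_min <= 0:
        raise ValueError("budget_min must be positive")
    ratio = used_min / budget_min
    return [t for t in THRESHOLDS if ratio >= t]


# Hypothetical figures: 2,400 minutes used against a 3,000-minute budget.
print(crossed_thresholds(used_min=2400, budget_min=3000))  # [0.5, 0.75]
```

Each crossed threshold can then be routed to whoever has been delegated to respond, so that a spike is noticed well before the budget is exhausted.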

Our ultimate goal is to minimise surprises and manage FUD for all stakeholders, and to make practising DevSecOps sustainable and value-adding for the business.

References of applying DevSecOps practices on GitHub

L200

This post is licensed under CC BY 4.0 by the author.