Managing Actions consumption and cost
TL;DR Got a bill shock? Here’s a value stream approach to analysing Actions minutes consumption. First, know your CI/CD value stream; if you don’t have one, it’s time to define it. Next, identify where along the flow Actions are involved and gather the data. Finally, do the math to forecast and optimise your next month’s consumption.
GitHub has per-seat licensing, where the cost is fairly predictable from headcount. However, as we adopt DevSecOps practices with GitHub Actions, one of the top challenges most organisations will encounter is metered billing. From CI/CD to security features, they all leverage Actions. Some GitHub platform owners may get a ‘bill shock’ for this line item, see their included Actions minutes vaporise within the first few days of the billing month, or play it safe and run CI/CD on their own servers (or under their desks).
These are signs of not managing Actions consumption early and pragmatically. Fear, uncertainty and doubt (FUD) may cause undesired behaviours, such as cutting back on validation checks, which in turn may cripple product quality and reliability.
Applying FinOps should be considered when adopting metered services like Actions. In a nutshell, FinOps is about:
- Moving away from siloed roles and responsibilities to shared accountability and responsibility
- Collaborating between business and IT to optimise
- Making decisions based on business value rather than COGS or ROI
All three adoption phases of FinOps start with visibility into usage, so understanding Actions usage is the fundamental step.
Understand usage with value stream
Having a structured approach to gaining usage visibility will remove ambiguity, enable proper accountability, and allow better budgeting and forecasting for practising DevSecOps.
1. Define your CI/CD value stream and practices
The first step is to know your value stream. Quality should be well-planned and intentional.
For example, trunk-based development may typically have a CI/CD value stream that looks like this:
%%{init: { 'logLevel': 'error', 'theme': 'forest' , 'themeVariables': {
'cScale0': '#CCC',
'cScale1': '#CCC',
'cScale2': '#CCC',
'cScale3': '#CCC',
'cScale4': '#CCC',
'cScale5': '#CCC'
} } }%%
timeline
A feature prioritised: auto-backlog management : agile planning : create feature branch
Development: commits against branch : unit test: lint : static application security testing : software composition analysis
Continuous integration : create pull request : build : integration test : static application security testing : software composition analysis : peer review : merge into default branch
Continuous delivery : build : architecture fitness : package : sign & publish : deploy to non-prod : manual & exploratory test : dynamic application security testing
Deploy to production : deployment gates : canary deployment : blue / green deployment
Continuous monitoring : detect : response : post-mortem : live-site review
The diagram above is written in Mermaid, which makes it easy to document in Markdown. Visualising the value stream this way also helps to see the end-to-end DevSecOps workflow and identify its activities.
2. Gather Actions usage data
The next step is to trace Actions usage for the identified activities. The activities may use first-party GitHub Actions, third-party actions from the Marketplace, or actions from the wider open-source community. The data we want to trace are:
- average duration: from past successful workflow runs
- success rate: complete and accurate rate
- frequency: recorded occurrence
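As a minimal sketch of how these three figures could be derived, the snippet below summarises a list of workflow run records. It assumes each record carries the `conclusion`, `run_started_at` and `updated_at` fields found on workflow runs returned by the GitHub REST API, and it approximates a run's duration as `updated_at` minus `run_started_at`. The function name `summarise_runs`, the observation window and the sample data are hypothetical.

```python
from datetime import datetime

# Timestamp layout assumed for the run records, e.g. "2023-08-01T10:00:00Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def summarise_runs(runs, days_observed):
    """Return (average successful duration in minutes, success rate, runs per day)."""
    if not runs or days_observed <= 0:
        raise ValueError("need at least one run and a positive observation window")

    # Average duration only considers runs that concluded successfully.
    durations = []
    for run in runs:
        if run["conclusion"] == "success":
            started = datetime.strptime(run["run_started_at"], TS_FORMAT)
            finished = datetime.strptime(run["updated_at"], TS_FORMAT)
            durations.append((finished - started).total_seconds() / 60)

    success_rate = len(durations) / len(runs)
    avg_duration = sum(durations) / len(durations) if durations else 0.0
    frequency = len(runs) / days_observed
    return avg_duration, success_rate, frequency


# Illustrative sample data only, not real measurements.
sample_runs = [
    {"conclusion": "success", "run_started_at": "2023-08-01T10:00:00Z", "updated_at": "2023-08-01T10:02:00Z"},
    {"conclusion": "failure", "run_started_at": "2023-08-01T11:00:00Z", "updated_at": "2023-08-01T11:01:00Z"},
    {"conclusion": "success", "run_started_at": "2023-08-02T09:00:00Z", "updated_at": "2023-08-02T09:04:00Z"},
    {"conclusion": "failure", "run_started_at": "2023-08-02T15:00:00Z", "updated_at": "2023-08-02T15:03:00Z"},
]

avg_duration, success_rate, frequency = summarise_runs(sample_runs, days_observed=2)
print(f"avg {avg_duration:.1f}m, success {success_rate:.0%}, {frequency:.1f} runs/day")
```

Wall-clock duration is only a proxy for billed minutes, so treat the result as a starting point for the table that follows rather than a billing figure.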
For example,
Stage | Trigger | Actions activities | Duration to complete stage | Success rate | Frequency |
---|---|---|---|---|---|
A feature prioritised | Scheduled | auto-backlog management [1][2] | 1m | 80% | weekly |
Development | Commits to feature branch | commit, build, unit test, lint, dependency scanning [3], secret scanning [4] | 2m | 50% | 15 times / day |
Continuous integration | Merge to default branch | build, integration test, static application security testing scanning [5], dependency review [3] | 20m | 70% | 5 times / day |
Continuous delivery | Generate a release | build, architecture fitness test, package, sign [6] & publish [7][8], deploy to environments, exploratory test, dynamic application security testing | 20m | 80% | weekly |
Deploy to production | Request to go-live | deployment gates [9], deploy to production | 20m | 90% | ad-hoc |
Continuous monitoring | Scheduled | dependency scanning [10] | 1m | 100% | daily |
3. Evaluate Actions minutes usage
The third step is to evaluate Actions minutes usage from the data gathered.
Calculation:
\[\text{ Total activity duration } = \text{ duration } \div \text{ Success Rate } \times \text{ Frequency }\]
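As a worked example, take the Development stage from the table: 2 minutes per run, a 50% success rate, and 15 runs per day. Assuming a five-day working week (an assumption, though it matches the normalised weekly figure quoted for this stage), the weekly usage works out as:

\[\text{Development} = 2\,\text{m} \div 0.5 \times (15 \times 5) = 300\,\text{m per week}\]

Dividing by the success rate scales the duration up to account for failed runs that have to be repeated.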
Evaluate every stage this way, and a visual can show more clearly where and how Actions are used in your CI/CD:
%%{init: { 'logLevel': 'error', 'theme': 'forest' , 'themeVariables': {
'cScale0': '#888',
'cScale1': '#CCC',
'cScale2': '#CCC',
'cScale3': '#CCC',
'cScale4': '#CCC',
'cScale5': '#CCC',
'cScale6': '#CCC'
} } }%%
timeline
Stage: Average run duration : Success rate : Frequency : Normalised usage per week
A feature prioritised: 1m : 80% : weekly : 1m
Development: 2m : 50% : 15 times per day : 300m
Continuous integration : 20m : 70% : 5 times per day : 714m
Continuous delivery : 20m : 80% : weekly : 25m
Deploy to production : 20m : 90% : ad-hoc : 22m
Continuous monitoring : 1m : 100% : daily : 7m
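The figures above can be reproduced with a short script. This is a sketch under stated assumptions rather than a definitive tool: per-day frequencies for Development and Continuous integration are taken over a five-day working week, the ad-hoc production deployment is counted as once per week, and daily monitoring runs seven days a week. These assumptions are inferred from the normalised values shown, and the names `weekly_minutes` and `stages` are hypothetical.

```python
WORKING_DAYS_PER_WEEK = 5  # assumption: per-day triggers fire on working days only


def weekly_minutes(duration_min, success_rate, runs_per_week):
    """Total activity duration = duration / success rate * frequency."""
    return duration_min / success_rate * runs_per_week


# (duration in minutes, success rate, runs per week) taken from the table above.
stages = {
    "A feature prioritised": (1, 0.80, 1),
    "Development": (2, 0.50, 15 * WORKING_DAYS_PER_WEEK),
    "Continuous integration": (20, 0.70, 5 * WORKING_DAYS_PER_WEEK),
    "Continuous delivery": (20, 0.80, 1),
    "Deploy to production": (20, 0.90, 1),  # assumption: ad-hoc treated as weekly
    "Continuous monitoring": (1, 1.00, 7),  # daily, including weekends
}

usage = {name: round(weekly_minutes(*values)) for name, values in stages.items()}
total = sum(usage.values())

for name, minutes in usage.items():
    print(f"{name}: {minutes}m per week")
print(f"Total: {total}m per week")
```

With these inputs, Development and Continuous integration account for most of the weekly minutes, which is where optimisation effort is likely to pay off first.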
Updated 15th Aug 2023 - I have published a GitHub action to help implement what’s been discussed here. There are example use cases showing how to apply it in workflows (yml) - do check it out!
4. Adjustment for forecasting - growth rate and confidence level
Finally, the calculation can be extended to forecast future consumption. To do so, adjust for these variables:
- Growth rate is the expected growth, including increases in the number of developers, teams and/or repositories
- Confidence level is the success rate used to estimate an interval. The confidence interval is then an estimated range based on historical data, incorporating the risk of uncertainty and other unplanned factors
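One possible way to combine the two adjustments is sketched below. It is not the method behind any particular tool: it assumes monthly usage is roughly normally distributed, scales the historical mean by the growth rate, and widens the estimate by a z-score derived from the chosen confidence level. The function name `forecast_minutes` and the sample history are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def forecast_minutes(history, growth_rate, confidence=0.90):
    """Return (low, point, high) forecast of next month's Actions minutes.

    history     -- past monthly minutes (at least two values)
    growth_rate -- expected growth, e.g. 0.10 for 10% more developers or repos
    confidence  -- confidence level for the interval, strictly between 0 and 1
    """
    if len(history) < 2:
        raise ValueError("need at least two months of history")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")

    # Point estimate: historical average scaled by expected growth.
    point = mean(history) * (1 + growth_rate)

    # Two-sided z-score for the confidence level (normality is an assumption).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(history) * (1 + growth_rate)
    return point - margin, point, point + margin


# Illustrative history only, not real measurements.
low, point, high = forecast_minutes([4000, 4200, 4400], growth_rate=0.10)
print(f"forecast {point:.0f}m (range {low:.0f}m to {high:.0f}m)")
```

A wider interval signals noisier history, which is itself a prompt to revisit the inputs as more months of data arrive.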
The forecast may not be 100% accurate at the first calculation. It is not set-and-forget; it requires continuous refinement.
Inspect and adapt are the flywheel
Visibility into Actions usage across the value stream is the first step to gaining assurance over Actions consumption. It needs to be followed by an iterative process of inspecting and adapting by both business and engineering. In the FinOps framework, these are the Inform, Optimize, and Operate activities, which may look like:
- Continuously monitor with live dashboards accessible to all responsible individuals (including engineers). Use budget alerts at the 50%, 75% and 90% thresholds, and delegate responsibility for responding to unplanned spikes.
- Weekly, have the product owner and engineering inspect the level of Actions consumption, and show back accountability to the business/service owners.
- Monthly, review by service where the Actions minutes are being used, to help optimise your CI/CD and remediate technical debt. Pay attention to data such as workflow runs that keep failing or are redundant.
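The budget alert idea in the first activity can be sketched as a simple threshold check. This is illustrative only: the function name `crossed_thresholds` and the budget figure are hypothetical, and in practice the minutes used would come from your billing or usage data.

```python
def crossed_thresholds(minutes_used, minutes_budget, thresholds=(0.50, 0.75, 0.90)):
    """Return the alert thresholds that current consumption has reached."""
    if minutes_budget <= 0:
        raise ValueError("budget must be positive")
    ratio = minutes_used / minutes_budget
    return [t for t in thresholds if ratio >= t]


# Hypothetical example: 1,600 minutes used against a 2,000-minute budget.
alerts = crossed_thresholds(1600, 2000)
for threshold in alerts:
    print(f"Alert: {threshold:.0%} of the Actions budget has been consumed")
```

Running a check like this on a schedule, and routing each alert to a named owner, is one way to make the response to unplanned spikes somebody's explicit responsibility.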
Our ultimate goal is to minimise surprises and manage FUD for all stakeholders, making the practice of DevSecOps sustainable and value-adding to the business.
References of applying DevSecOps practices on GitHub