Custom Budget Alerting in Google Cloud Platform

One of the most prominent reasons for moving to the cloud is to reduce costs, and every organization should want to keep tabs on spending. Unfortunately, managing costs can be tricky in any cloud provider’s environment. It may be the case that an organization and its teams are relatively small and it’s easy to hold people accountable if costs go a little off the rails, but what if you have dozens of teams? Or hundreds? Cost management can become quite cumbersome at scale, and Google Cloud Platform (GCP) poses its own intricacies to cost management. This article will focus on using a combination of GCP-native products to assist with that effort, allowing you to send customized notifications to your project owners by enforcing good standards with automation.

Natively in GCP, any user assigned the Billing Account Administrator or Billing Account User role automatically receives notifications for budget alerts, provided they are set up.  However, in most enterprises, only a select few people truly need either role. Assigning the Billing Account User role, for example, would grant users the ability to associate a GCP Billing Account with a project, but that role would typically be associated with only Service Accounts, and projects would be deployed via automation to control sprawl.

How, then, do folks keep track of their spending in GCP without having to navigate the console and run reports, which also require their own set of IAM policies? Using a mix of Pub/Sub, Cloud Functions, and the Billing Budgets API (in beta as of October 2020), organizations can provide their teams with a slick budget notification system.

The general idea is to use the Billing Budgets API to construct budgets and budget alert levels for your projects. When these alert levels are reached, a notification can be automatically sent out to “project owners”.

There are a few requirements to get this done. The first step is to define the target (person or distribution list) for the budget alerts. This is typically done at the project level with a Label set for the purpose. In this fashion, there can be as many different people (or groups) getting notifications as you may have projects.

The next step is to construct a Pub/Sub topic for these alerts. This will support scalability and provide a key abstraction to allow the supporting code to be written once and yet be applicable to all projects within the organization.

Then, create a small Cloud Function to invoke when budget alerts are generated. This function can be written once for the organization and used no matter how many projects are created, and no matter how many people or groups need to receive Billing Budget Alerts.

Finally, establish the budgets and alert thresholds for each project. Some organizations will do this automatically, as they create projects through their infrastructure-as-code deployment scripts.  

While all of these steps could be performed manually, savvy organizations will develop and enforce the following standards to facilitate a consistent experience:

  • Projects should be deployed via automation
  • Automation should enforce a standard set of labels
  • Label inputs should conform to the standard

Not following these standards will leave gaps in your environment and, ultimately, a failing process. For this example, assume all projects are required to have several project-level labels, one of which is “contact_email.” Requestors enter this value as an ordinary email address to make things easier for them. For those familiar with GCP, you know that label values do not support periods or @ symbols. That means the value received needs to be reformatted at deploy time, such that each period is converted to a single underscore (“_”) and the @ symbol to a double underscore (“__”). This convention is what later lets the Cloud Function route notifications to the correct party.
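The deploy-time reformatting described above can be sketched in a few lines of Python. This is only an illustration: the function name "email_to_label" is hypothetical, and the lowercasing and character check reflect my understanding that GCP label values accept only lowercase letters, digits, underscores, and hyphens, up to 63 characters.

```python
import re

# Assumption: label values allow lowercase letters, digits, "_" and "-",
# with a maximum length of 63 characters.
_LABEL_VALUE = re.compile(r"[a-z0-9_-]{0,63}")


def email_to_label(email: str) -> str:
    """Encode an email address as a label value: '@' -> '__', '.' -> '_'."""
    value = email.strip().lower().replace("@", "__").replace(".", "_")
    if not _LABEL_VALUE.fullmatch(value):
        raise ValueError(f"cannot encode {email!r} as a label value")
    return value


print(email_to_label("Bob.Smith@sampledomain.com"))  # bob_smith__sampledomain_com
```

One thing to keep in mind with this convention is that an address that already contains an underscore cannot be recovered unambiguously later, so it is worth rejecting or special-casing those during the project request process.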

I used Terraform to deploy the GCP resources, and Python 3.7 in the Cloud Function. You may be able to use other Cloud Function supported languages to implement the same solution.

Setting Up the Topic and Function

First, there needs to be at least one project to host the PubSub topic, Cloud Function, and in this example, a GCS bucket to store the Cloud Function code.  For ease of management, I’m using a single project for all three, but your organizational needs or policies may differ. In that case, you may need to implement additional IAM policies to provide permissions to the different components.

The following example code can be used to deploy the resources.

The PubSub Topic

The Cloud Function

The values for “source_archive_bucket” and “object” provide the location where the code for the cloud function resides.  In this example, the main.py and requirements.txt files are stored in a file named “budget_alerts.zip” that is located in the newly created bucket (hence the Terraform resource reference).

The Budget Alert

The last item to deploy is the billing budget, which in this example uses the beta (as of October, 2020)  Billing Budgets API.  This API can only be called via a Service Account, so please review the documentation if you’re not already familiar.  Budget alerts are configured in the billing section of GCP, and not within a specific project.  All budgets can be viewed from the GCP console under Billing -> Budgets & Alerts (assuming you have the required permissions).


You’ll need to use the google-beta provider to deploy this resource. The “display_name” is also critical. Notice it uses the project ID as the first portion. Since our Cloud Function extracts the project ID from the budget display name, it is critical that the budget is named using the project ID.

The budget filter is applied only to the project being deployed (with a project module), and the “all_updates_rule” block directs the budget to ship any alerts to the Pub/Sub topic that was created above. The alerts will be triggered based on the budget amount and the thresholds used. Use whatever values are relevant for your particular organization. It is highly recommended to include thresholds above 100%, so that alerts keep arriving after the budget has been exceeded.
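To illustrate why thresholds above 100% are worth having, here is a small Python sketch that lists which thresholds a given spend has crossed. The function name and the threshold values are made up for this example and are not taken from the deployment code.

```python
def crossed_thresholds(cost: float, budget: float, thresholds: list) -> list:
    """Return the thresholds (as fractions of the budget) that spend has reached."""
    ratio = cost / budget
    return [t for t in sorted(thresholds) if ratio >= t]


# Hypothetical thresholds: 50%, 90%, 100%, and 120% of the budget.
example_thresholds = [0.5, 0.9, 1.0, 1.2]

print(crossed_thresholds(95, 100, example_thresholds))
print(crossed_thresholds(130, 100, example_thresholds))
```

With only thresholds up to 1.0, the jump from 95 to 130 in spend would cross a single remaining level; the 1.2 entry is what signals that spending kept climbing past the budget.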

Note that if you hit 100% of a budget, there is no impact to running resources — all resources remain active, and usage charges continue to accrue. Organizations with more sophisticated Site Reliability Engineering practices may, of course, extend their Cloud Functions to take limiting action when a budget alert is produced. The extent to which you choose to respond to the budget alert is entirely your decision, but, by default, GCP does not automatically suspend activity simply because a budget threshold has been reached.

The Python Code

Let’s look at the Python code to see what happens when a Budget Alert is triggered.  I’ll paste snippets of code and explain what’s going on along the way.  All comments and error checking have been removed to shorten the length of this article, but indentations and spacing are represented.  Additionally, the label values for this example were validated during the project request process, and therefore sophisticated validation logic for these label values is outside the scope of this article.

The first item is the “requirements.txt” file, which includes all the Python dependencies needed for this function to run:

MSAL and Requests are needed if you plan to use the Microsoft Graph API to send email. This example does, but there are other options like SendGrid or using Cloud Operations Monitoring to trigger alerts via email. The other two entries help Python work with the GCP APIs and OAuth authentication.

Next, the main.py file.  The first section contains the imported packages needed:


Some of these imports are modules used for the Microsoft Graph API. These may differ depending on the solution you choose.

Now we get to the function definition and first steps:

The function initially assigns the complete Pub/Sub message to the “pubsub_message” variable. It then parses the message to extract the information we want, which also shows how the Pub/Sub message is constructed. The attributes and the data portions are separated into different variables, and the data is loaded as JSON. Attributes are sent in clear text, while the data portion is base64-encoded rather than encrypted (hence the “base64.b64decode” call).
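As a rough sketch of the parsing step just described, the following Python separates the attributes from the data and decodes the data into a dictionary. The helper name "parse_pubsub_event" and the sample event are illustrative only, and I am assuming the event arrives as a dictionary with "attributes" and base64 "data" keys, as it does for Pub/Sub-triggered Python functions.

```python
import base64
import json


def parse_pubsub_event(event: dict):
    """Split a Pub/Sub event into its attributes and its decoded JSON payload."""
    attributes = event.get("attributes") or {}
    # The data field is base64-encoded text containing a JSON document.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return attributes, payload


# Hypothetical event, built locally so the sketch can run on its own.
sample_payload = {
    "budgetDisplayName": "my-project Budget Alert",
    "costAmount": 120.0,
    "budgetAmount": 100.0,
}
sample_event = {
    "attributes": {"billingAccountId": "000000-AAAAAA-000000"},
    "data": base64.b64encode(json.dumps(sample_payload).encode("utf-8")).decode("ascii"),
}

attributes, payload = parse_pubsub_event(sample_event)
print(payload["budgetDisplayName"])  # my-project Budget Alert
```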

Next:

The budget name, cost amount (which is the total accrued cost at the time of sending), the budget amount (set in our Budget Alert), and the percent of the budget used (dividing cost by the total budget amount) are extracted from the attributes and data.
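A minimal sketch of that extraction might look like the following. It assumes the decoded payload uses the "budgetDisplayName", "costAmount", and "budgetAmount" keys from the budget notification format; the function name and the returned dictionary keys are my own choices for illustration.

```python
def summarize_budget(payload: dict) -> dict:
    """Pull the values used in the notification out of the decoded payload."""
    cost = float(payload["costAmount"])
    budget = float(payload["budgetAmount"])
    # Percent of the budget consumed so far; guard against a zero budget.
    percent_used = round(cost * 100 / budget, 1) if budget else 0.0
    return {
        "budget_name": payload["budgetDisplayName"],
        "cost": cost,
        "budget": budget,
        "percent_used": percent_used,
    }


summary = summarize_budget(
    {"budgetDisplayName": "my-project Budget Alert", "costAmount": 120.0, "budgetAmount": 100.0}
)
print(summary["percent_used"])  # 120.0
```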

Next:

This part takes the whole budget name, “project.id Budget Alert,” and separates the values based on spaces, creating an array with three items:  the project ID, the word “Budget,” and the word “Alert.”  Since our automation places the project ID in the name, we can pull that value out and assign it using the [0] place in the array, since it’s the first item.
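The split described above is short enough to show as a sketch. The function name is hypothetical, and the extra check on the trailing words is something I added so that a budget that does not follow the naming standard fails loudly instead of producing a wrong project ID.

```python
def project_id_from_budget_name(budget_name: str) -> str:
    """Extract the project ID from a name like '<project-id> Budget Alert'."""
    parts = budget_name.split()
    # Expect exactly three items: the project ID, "Budget", and "Alert".
    if len(parts) != 3 or parts[1:] != ["Budget", "Alert"]:
        raise ValueError(f"unexpected budget name: {budget_name!r}")
    return parts[0]


print(project_id_from_budget_name("my-project Budget Alert"))  # my-project
```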

Next:    

The function now calls the GCP Resource Manager API, and we can leverage the project ID to query and pull in the project-level labels. It builds a request to Resource Manager, then pulls all project metadata into “response.” From there, only the labels are extracted into a single “labels” collection of key/value pairs.
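Here is one way the Resource Manager lookup could be sketched, assuming the google-api-python-client library and the v1 "projects.get" method. The function name and the optional "service" parameter are my own additions; the parameter simply lets you pass in a prebuilt (or fake) client, and the library is only imported when no client is supplied.

```python
def get_project_labels(project_id: str, service=None) -> dict:
    """Return the labels on a project as a dict (empty if none are set)."""
    if service is None:
        # Imported lazily so the sketch can be exercised without the library.
        from googleapiclient import discovery

        # Uses Application Default Credentials, e.g. the function's service account.
        service = discovery.build("cloudresourcemanager", "v1", cache_discovery=False)
    response = service.projects().get(projectId=project_id).execute()
    return response.get("labels", {})
```

The identity running the function needs permission to read project metadata for every project it may be asked about, so that is worth granting at the folder or organization level.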

Next:

The function iterates through the labels to find “contact_email.” Because GCP does not allow label values with @ symbols or periods, we need to convert the value we find into a usable email address. The example format used was firstname_lastname__domain_com, so we first change the double underscore to an @ symbol, then replace the remaining underscores with periods, producing a valid SMTP address!
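The conversion back to an email address can be sketched as follows. The function name is hypothetical, and the sanity check at the end is an addition of mine, since this article otherwise relies on the labels having been validated when the project was requested.

```python
def label_to_email(label_value: str) -> str:
    """Decode a label like 'firstname_lastname__domain_com' into an email address."""
    # Order matters: turn the double underscore into "@" before touching single ones.
    email = label_value.replace("__", "@").replace("_", ".")
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        raise ValueError(f"label does not decode to an email address: {label_value!r}")
    return email


labels = {"contact_email": "firstname_lastname__domain_com", "env": "prod"}
print(label_to_email(labels["contact_email"]))  # firstname.lastname@domain.com
```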

If your organization will never have alert targets outside of your own domain, it may be possible to simply assume the domain information and just include the userID in the label.  In this fashion, the label might read “bob_smith”. The function would know to convert the underscore to a dot, and to tack on @sampledomain.com to complete the email address: bob.smith@sampledomain.com.
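For the single-domain variation, a sketch could be as small as this. The function name and the "DEFAULT_DOMAIN" constant are illustrative; the domain value is just the sample one used in this article.

```python
DEFAULT_DOMAIN = "sampledomain.com"  # sample value; replace with your own domain


def user_label_to_email(user_label: str, domain: str = DEFAULT_DOMAIN) -> str:
    """Turn a label such as 'bob_smith' into 'bob.smith@<domain>'."""
    return f"{user_label.replace('_', '.')}@{domain}"


print(user_label_to_email("bob_smith"))  # bob.smith@sampledomain.com
```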

The Notification

The last steps are to construct and ship the notification. Again, the example here uses the Microsoft Graph API. Your organization may not be using Microsoft Graph, but if you are interested in this solution, please continue reading through the rest of the snippets and explanations.

First:

Here, we are setting up the connection settings for Microsoft Graph. Please ensure you replace the highlighted placeholders with valid values for your Microsoft Graph instance.

Next:

This section checks to see if there is an existing auth token available. If not, it requests a new one for Microsoft Graph, then assigns it to the a_token variable.
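The cache-then-request pattern described above could be sketched like this with MSAL for Python. I am assuming "app" is an "msal.ConfidentialClientApplication" created elsewhere with your own client ID, authority, and secret; the function name and the "GRAPH_SCOPES" constant are mine.

```python
# The ".default" scope requests the application permissions already granted.
GRAPH_SCOPES = ["https://graph.microsoft.com/.default"]


def get_access_token(app, scopes=GRAPH_SCOPES) -> str:
    """Return a cached token if one exists, otherwise request a new one."""
    # First look for a usable token in the application's cache.
    result = app.acquire_token_silent(scopes, account=None)
    if not result:
        # Nothing cached, so use the client-credentials flow.
        result = app.acquire_token_for_client(scopes=scopes)
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "token request failed"))
    return result["access_token"]


# Hypothetical construction of the app (placeholders are not real values):
# app = msal.ConfidentialClientApplication(
#     "<client-id>",
#     authority="https://login.microsoftonline.com/<tenant-id>",
#     client_credential="<client-secret>",
# )
```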

This portion drafts the email contents.  I am using the variables from the start of the function for the project ID, budget percent, cost, and budget amount.
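As an illustration of drafting the message, here is a small sketch that builds a subject and body from those four values. The wording of the email and the function name are invented for this example, so adjust them to suit your organization.

```python
def build_email(project_id: str, percent_used: float, cost: float, budget: float):
    """Return a (subject, body) pair for a budget notification."""
    subject = f"Budget alert for {project_id}: {percent_used:.1f}% of budget used"
    body = (
        f"Project {project_id} has used {percent_used:.1f}% of its budget.\n"
        f"Cost to date: {cost:,.2f}\n"
        f"Budget amount: {budget:,.2f}\n"
    )
    return subject, body


subject, body = build_email("my-project", 120.0, 1200.0, 1000.0)
print(subject)
```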

This line calls another function, “send_mail,” which I found here and modified.  Your organizational needs and policies may differ, and if so, may require some additional adjustment to this portion.

The access token we retrieved, the email portions we constructed, and our recipient are passed in. The recipient list, passed in as an array, is then parsed to look for multiple recipients. The email message is then built using the email portions. Finally, the email call is sent via a POST. The highlighted sender value will be the SMTP address seen on the “FROM:” line in the notifications.
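For readers who want to see the shape of such a call, below is a sketch of a "send_mail" function against the Microsoft Graph "sendMail" endpoint. It is not the function referenced above; the "build_message" helper, the parameter names, and the plain-text body are my own assumptions, and "sender" stands for whichever mailbox your application is permitted to send from.

```python
GRAPH_SENDMAIL_URL = "https://graph.microsoft.com/v1.0/users/{sender}/sendMail"


def build_message(subject: str, body: str, recipients: list) -> dict:
    """Build the JSON payload for sendMail, one entry per recipient."""
    return {
        "message": {
            "subject": subject,
            "body": {"contentType": "Text", "content": body},
            "toRecipients": [{"emailAddress": {"address": r}} for r in recipients],
        },
        "saveToSentItems": False,
    }


def send_mail(access_token: str, sender: str, subject: str, body: str, recipients: list) -> int:
    """POST the message to Microsoft Graph and return the HTTP status code."""
    import requests  # imported here so build_message can be used without it

    response = requests.post(
        GRAPH_SENDMAIL_URL.format(sender=sender),
        headers={"Authorization": f"Bearer {access_token}"},
        json=build_message(subject, body, recipients),
        timeout=30,
    )
    response.raise_for_status()
    return response.status_code
```

Keeping the payload construction separate from the network call makes it easy to check the message format without sending anything.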

And there you have it! Using GCP-native products and Python 3.7, we have built a custom budget notification system.

About the Author

Brian Kudzia
Brian Kudzia is a Cloud Engineer and Architect who's been in IT since 2006 and with Maven Wave since 2018. He holds Associate and Bachelor's degrees from Purdue University, and has obtained the GCP Cloud Architect, GCP Network Engineer, and GCP Security Engineer certifications. He is an avid IT enthusiast, a big White Sox and Blackhawks fan, and lives on the northwest side of Chicago.
November 5th, 2020
