By: David “Mac” McDaniel
The datasphere is expanding exponentially with more sources, applications, and methods than ever before. As it grows, data pros increasingly seek more innovative and cost-effective strategies to ingest, store, and analyze data in a meaningful way.
Google BigQuery offers a solution to this challenge. BigQuery makes large-scale data analysis accessible and affordable to businesses of all sizes. It is a cloud-first, serverless, fully managed data warehouse that works with data of virtually any size and can execute even complex queries in seconds.
There are many benefits to using BigQuery, as well as a number of tools and strategies to optimize BigQuery costs. When exploring BigQuery-enabled data strategies, it’s critical that you understand the pricing options available to you and which strategies will work best for your use case.
BigQuery has two main pricing models for analysis, or running queries:
- Flat-rate pricing: You purchase slots, which are virtual CPUs.
- On-demand pricing: You’re charged for the number of bytes processed by each query.
You can combine both models based on your needs. Whichever you choose, BigQuery also includes built-in machine learning capabilities, so you can build and run models without moving data between separate systems.
BigQuery Flat-Rate Pricing
In the flat-rate pricing model, you pay a flat fee, regardless of the number of bytes your queries scan. This pricing option is perfect for customers who need a fixed monthly cost within a specified budget. To use flat-rate BigQuery pricing, users must buy BigQuery slots.
BigQuery slots are units of the computational capacity necessary to execute SQL queries. Before executing a query, Google BigQuery calculates the number of slots required. The calculations are based on the complexity of the query, as well as its size.
At the time of writing, $10,000 per month secures 500 slots on a monthly commitment. That means your queries can use at most 500 slots (virtual CPUs) at any one time.
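The arithmetic behind the $10,000-for-500-slots figure is simple: it works out to $20 per slot. The minimal sketch below assumes that rate applies per month and that slots are purchased in 100-slot blocks; both are assumptions to check against Google’s current price list, and the function names are hypothetical.

```python
# Minimal sketch: how many flat-rate slots a monthly budget buys.
# ASSUMPTIONS: $20 per slot per month (derived from "$10,000 buys 500 slots")
# and purchases in 100-slot blocks. Verify against current Google Cloud pricing.

PRICE_PER_SLOT_MONTH = 10_000 / 500  # 20.0 USD, derived from the example above
SLOT_INCREMENT = 100                 # assumed purchase granularity


def slots_for_budget(monthly_budget: float,
                     price_per_slot_month: float = PRICE_PER_SLOT_MONTH,
                     increment: int = SLOT_INCREMENT) -> int:
    """Largest whole number of slots, in `increment` blocks, the budget covers."""
    affordable = int(monthly_budget // price_per_slot_month)
    return (affordable // increment) * increment


def monthly_cost(slots: int, price_per_slot_month: float = PRICE_PER_SLOT_MONTH) -> float:
    """Fixed monthly cost of a slot commitment, independent of bytes scanned."""
    return slots * price_per_slot_month


print(slots_for_budget(10_000))  # 500
print(slots_for_budget(7_500))   # 300 (375 affordable, rounded down to a 100-slot block)
print(monthly_cost(500))         # 10000.0
```

The cost does not change with query volume, which is exactly what makes the flat-rate model predictable.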
BigQuery Flex Slots
In 2020, Google Cloud introduced BigQuery Flex Slots. This pricing option lets users buy BigQuery slots for short periods, with a minimum commitment of just 60 seconds. Flex Slots are an excellent addition for users who want to scale up or down quickly while keeping costs controlled and predictable.
Flex Slots are ideal for organizations with business models that are subject to massive shifts in data capacity demands. Events like a major app launch or a Black Friday shopping surge make perfect use cases.
Flex Slots currently cost $0.04 per slot, per hour with the option to cancel any time after 60 seconds. You will only be billed for the duration of the Flex Slots deployment.
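To see what a short burst costs, the sketch below applies the $0.04 per slot-hour rate and the 60-second minimum described above. It assumes billing is prorated per second after the minimum, which is an assumption rather than something this article states; `flex_slot_cost` is a hypothetical helper.

```python
# Minimal sketch of Flex Slots billing, assuming the $0.04 per slot-hour rate
# quoted above, a 60-second minimum, and per-second proration after that
# (the proration granularity is an assumption; check Google's pricing docs).

FLEX_PRICE_PER_SLOT_HOUR = 0.04
MINIMUM_SECONDS = 60


def flex_slot_cost(slots: int, duration_seconds: float,
                   price_per_slot_hour: float = FLEX_PRICE_PER_SLOT_HOUR) -> float:
    """Cost of holding `slots` Flex Slots for `duration_seconds`."""
    billed_seconds = max(duration_seconds, MINIMUM_SECONDS)
    return slots * price_per_slot_hour * billed_seconds / 3600


# Scaling up by 500 slots for a three-hour Black Friday peak:
print(round(flex_slot_cost(500, 3 * 3600), 2))  # 60.0
# Cancelling after 10 seconds is still billed as 60 seconds:
print(round(flex_slot_cost(500, 10), 4))        # 0.3333
```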
In BigQuery’s on-demand pricing model, you pay per query, based on the number of bytes each query processes. The more bytes your queries scan, the higher your costs will be.
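On-demand cost is a linear function of bytes processed, which also makes it easy to compare against a flat-rate commitment. This article doesn’t quote an on-demand rate, so the sketch below takes the price per TiB as a parameter; the $5 figure in the usage lines is purely hypothetical, and the sketch ignores any per-query minimums or rounding rules.

```python
# Minimal sketch of on-demand query pricing: cost scales with bytes processed.
# ASSUMPTIONS: the per-TiB rate and any monthly free allowance are parameters,
# not figures from this article; per-query minimums/rounding are ignored.

TIB = 2 ** 40  # bytes in one tebibyte


def on_demand_cost(bytes_processed: int, price_per_tib: float,
                   free_tib: float = 0.0) -> float:
    """Monthly cost of processing `bytes_processed` bytes at `price_per_tib`."""
    billable_tib = max(bytes_processed / TIB - free_tib, 0.0)
    return billable_tib * price_per_tib


def breakeven_tib(flat_monthly_fee: float, price_per_tib: float) -> float:
    """Monthly TiB scanned at which on-demand spend equals a flat-rate fee."""
    return flat_monthly_fee / price_per_tib


# Hypothetical rate of $5 per TiB, used only for illustration:
print(on_demand_cost(20 * TIB, price_per_tib=5.0))  # 100.0
print(breakeven_tib(10_000, price_per_tib=5.0))     # 2000.0
```

If your teams regularly scan more than the break-even volume, a flat-rate commitment is likely cheaper; below it, on-demand usually wins.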
There are two types of tasks that incur costs on BigQuery:
- Storage: As your data accumulates, storage costs can grow significantly. Make it a habit to delete tables you no longer use, and clean and normalize the raw data you keep.
- Query-related costs: The more queries you run, the more data is processed and the higher your costs. Cost management becomes especially difficult when analysts and engineers routinely write inefficient queries. To manage this, create an admin rules engine that can limit such queries or block them outright.
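An admin rules engine can start very small. The sketch below is a hypothetical example, not a BigQuery feature: it assumes each query’s estimated byte count has already been obtained (for example, from a dry run) and applies two illustrative rules, a byte ceiling and a ban on `SELECT *`. The class, function, and threshold names are all made up for illustration.

```python
# Minimal sketch of a hypothetical admin "rules engine" that vets a query before
# it runs. The estimated byte count is assumed to come from a dry run; the
# rules and thresholds here are illustrative, not a BigQuery feature.
import re
from dataclasses import dataclass


@dataclass
class QueryCheck:
    sql: str
    estimated_bytes: int  # e.g. from a dry run of the query


def check_query(q: QueryCheck, max_bytes: int) -> list[str]:
    """Return rule violations; an empty list means the query may run."""
    violations = []
    if q.estimated_bytes > max_bytes:
        violations.append(
            f"scans {q.estimated_bytes:,} bytes, over the {max_bytes:,}-byte limit")
    # Flag SELECT *, a common source of unnecessary scanning.
    if re.search(r"\bSELECT\s+\*", q.sql, flags=re.IGNORECASE):
        violations.append("uses SELECT *; list only the columns you need")
    return violations


GIB = 2 ** 30
bad = QueryCheck("SELECT * FROM sales.orders", estimated_bytes=500 * GIB)
good = QueryCheck("SELECT order_id, total FROM sales.orders", estimated_bytes=2 * GIB)
print(check_query(bad, max_bytes=100 * GIB))   # two violations
print(check_query(good, max_bytes=100 * GIB))  # []
```

BigQuery’s own maximum-bytes-billed query setting and custom quotas can enforce a hard cap; a layer like this one adds softer, organization-specific checks on top.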
In addition to analysis pricing, there are also the costs of storing the data you load into BigQuery.
BigQuery has two storage types:
- Active storage: Any table or table partition that has been modified in the last 90 days. It costs $0.020 per GB per month.
- Long-term storage: Any table or table partition that hasn’t been modified in the last 90 days. It costs $0.010 per GB per month.
For both storage types, the first 10 GB is free each month.
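The two storage tiers can be combined into a simple monthly estimate. This sketch uses the rates and the 90-day rule listed above; how the 10 GB free allowance is split between tiers is an assumption (here it is applied once, to active storage first), and the table names are hypothetical.

```python
# Minimal sketch of monthly storage cost using the rates above.
# ASSUMPTIONS: rates are per GB per month; an entry is "long-term" once it has
# gone 90 days without modification; the 10 GB free allowance is applied once,
# to active storage first and then to long-term storage.
from dataclasses import dataclass

ACTIVE_PRICE_PER_GB = 0.020
LONG_TERM_PRICE_PER_GB = 0.010
LONG_TERM_AFTER_DAYS = 90
FREE_GB = 10


@dataclass
class Table:  # a table or a single table partition
    name: str
    size_gb: float
    days_since_modified: int


def monthly_storage_cost(tables: list[Table]) -> float:
    active = sum(t.size_gb for t in tables if t.days_since_modified < LONG_TERM_AFTER_DAYS)
    long_term = sum(t.size_gb for t in tables if t.days_since_modified >= LONG_TERM_AFTER_DAYS)
    # Spend the free allowance on active storage first, then long-term.
    active_billable = max(active - FREE_GB, 0)
    free_left = max(FREE_GB - active, 0)
    long_term_billable = max(long_term - free_left, 0)
    return (active_billable * ACTIVE_PRICE_PER_GB
            + long_term_billable * LONG_TERM_PRICE_PER_GB)


tables = [Table("events", 510, 3), Table("archive_2019", 1000, 400)]
print(round(monthly_storage_cost(tables), 2))  # 20.0
```

Leaving old data untouched lets it age into the cheaper long-term tier automatically, which is another reason to avoid needless rewrites of historical tables.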
Pricing is based on the uncompressed (logical) size of the data stored in your tables, which BigQuery calculates from the data types of the individual columns. Google’s BigQuery pricing documentation lists the size of each data type.
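Because billing uses logical size, you can estimate a row’s contribution from its schema. The per-type sizes below reflect Google’s published data-size table as we understand it at the time of writing (verify before relying on them); treating NULLs as 0 bytes is likewise our reading of that table. The schema and column names are hypothetical.

```python
# Minimal sketch of BigQuery's logical (uncompressed) size estimate per row.
# ASSUMPTION: per-type sizes follow Google's data-size table at the time of
# writing; STRING is 2 bytes plus its UTF-8 length; NULLs count as 0 bytes.

FIXED_SIZES = {
    "INT64": 8,
    "FLOAT64": 8,
    "NUMERIC": 16,
    "BOOL": 1,
    "DATE": 8,
    "DATETIME": 8,
    "TIMESTAMP": 8,
}


def value_size(bq_type: str, value) -> int:
    """Logical size in bytes of one non-NULL value of the given type."""
    if bq_type == "STRING":
        return 2 + len(value.encode("utf-8"))
    return FIXED_SIZES[bq_type]


def row_size(schema: dict[str, str], row: dict) -> int:
    """Sum of value sizes for the row's non-NULL columns."""
    return sum(value_size(schema[col], val)
               for col, val in row.items() if val is not None)


schema = {"order_id": "INT64", "total": "NUMERIC", "city": "STRING", "paid": "BOOL"}
row = {"order_id": 42, "total": "19.99", "city": "Zürich", "paid": True}
print(row_size(schema, row))  # 8 + 16 + (2 + 7) + 1 = 34
```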
Controlling Costs in BigQuery
Employing BigQuery pricing best practices can help you control costs more effectively. Over the years, Google has added many features to BigQuery to reduce query costs.
The following are some of the most effective ways you can reduce BigQuery costs:
- Aggregate your data only once. Consider storing aggregation results in a staging table and updating it incrementally from one day to the next.
- Partition your data by date so queries scan only the subset of data they need.
- Use clustering, which sorts your data by the values of a few columns you specify (up to four) and colocates related rows in optimally sized blocks.
- Use nested and repeated fields to align your schema with BigQuery’s internal data representation.
- Avoid SELECT *. Because BigQuery stores data by column, selecting only the columns you need means you query (and pay for) only the data you need.
- Before executing a query, check how many bytes it will process using the query validator in the Google Cloud console or a dry run, then use the Google Cloud Pricing Calculator to estimate its cost.
- If you only want to sample your data, use BigQuery’s table preview instead of running a query; previewing table data is free.
- Consider breaking large, multi-stage queries into smaller steps. When you pare a query down, you reduce the amount of data BigQuery needs to read and, ultimately, the costs you incur.
- Load data in batches where you can, since batch loading is free. Use streaming only when you need data to be available in near real time; streaming inserts cost $0.010 per 200 MB.
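Two of the tips above, avoiding `SELECT *` and partitioning by date, compound each other. The sketch below estimates bytes scanned for a hypothetical date-partitioned table; it assumes evenly sized daily partitions and fixed per-column sizes, and every table and column name in it is made up for illustration.

```python
# Minimal sketch of why column selection and date partitioning cut on-demand
# costs: BigQuery stores data by column, so a query reads only the columns it
# references, and a partition filter limits which days it reads.
# ASSUMPTIONS: hypothetical date-partitioned table, evenly sized daily
# partitions, fixed per-column sizes.

GIB = 2 ** 30

# Hypothetical table: logical bytes per daily partition for each column.
column_bytes_per_day = {
    "order_id": 1 * GIB,
    "customer_id": 1 * GIB,
    "total": 2 * GIB,
    "payload_json": 40 * GIB,  # wide column that rarely needs scanning
}
DAYS_STORED = 365


def bytes_scanned(columns: list[str], days: int) -> int:
    """Bytes read when selecting `columns` over `days` daily partitions."""
    return sum(column_bytes_per_day[c] for c in columns) * days


full_scan = bytes_scanned(list(column_bytes_per_day), DAYS_STORED)  # SELECT *, no date filter
targeted = bytes_scanned(["order_id", "total"], 7)                  # two columns, last 7 days
print(full_scan // GIB, targeted // GIB)                 # 16060 21
print(f"{full_scan / targeted:.0f}x less data scanned")  # 765x less data scanned
```

Under on-demand pricing, the bill falls in proportion to the bytes scanned, so the same ratio applies to cost.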
BigQuery is powerful, straightforward to use, and an excellent tool for analyzing high volumes of data quickly. Because it scales horizontally, users can build data-driven architectures that would otherwise be practically unattainable. That said, users still need to do their due diligence to understand how to optimize performance without running up unsustainable BigQuery costs.