top of page
Search

BigQuery vs. Snowflake showdown

  • timandrews1
  • Jul 23, 2022
  • 7 min read



As someone who regularly uses both BigQuery and Snowflake, there are aspects in each tool that behave differently, or that provide either a better or worse experience than the other. The point of today's entry is not to summarize and select the best overall cloud data warehouse by objective measures, but more of a comparison based on the parts of the systems that I use. There are specific areas that I don't have enough experience in to comment on. For example, while I have set up and managed Google BigQuery projects, I've not done the same on the Snowflake side, which means I'm not able to compare the administration experience for both products. Instead, this is an article just for fun, which highlights things I really like or dislike with each product.


Compute utilization

BigQuery's default setup charges customers based on the amount of data processed, plus storage, and not the amount of compute reserved or used. This makes it extremely easy to use just the right amount of horsepower needed for each job, without any additional selection or configuration. Additionally, BigQuery also allows customers to choose flat-rate pricing and reserve compute power for those that would prefer an alternate approach.


Snowflake uses the more common approach in tying cost to compute resources allocated. It's up to the developer or cloud admin to decide which size of warehouse to assign to each job, which can affect cost. This results in more decision points, and in my opinion, wastes more time. For example, if a thirty-minute job should really take fifteen minutes or less, then developers and admins can be left deciding between query optimization or compute warehouse sizing. With BigQuery, the option would always be query optimization as compute sizing will scale automatically.

Winner: BigQuery

Schema organization

Most database systems utilize a three-tier organization system for objects. This is often expressed as the server level, the database level, and the schema level. Object names may be repeated as long as they are not in the same hierarchical path along the three levels. Although the concept of "server" means much less, or doesn't mean anything at all in cloud systems, it's still a logical organizational unit upon which policies and access may be applied.


From a data warehouse perspective, it's not uncommon separate objects into different databases based on the style of usage. For example, all raw data might be stored in a single database, and all transformed data might be stored in a separate database. Subject areas within raw and transformed might be separated into schemas. This allows quicker navigation and is more intuitive.


Snowflake supports the traditional server(or instance really)/database/schema model. BigQuery only has two levels: the project (database) and dataset (schema). Alternatively, BigQuery might be thought of as a server/database setup with no schemas. BigQuery does allow joining across different projects - which can work as more of a three-tier level. However, this isn't the same as what Snowflake offers, which is creating databases with simple SQL commands. Setting up new projects with BigQuery takes more cloud admin resources to implement. BigQuery requires more thoughtful naming conventions to delineate types of tables.

Winner: Snowflake

Web interface

As data engineers, the web interface is where we spend most of our time when we work with cloud data warehouses. That said, it's important for a web interface to promote efficiency and productivity, being intuitive to use. The comparison here is multi-dimensional, therefore we need to separate the comparison into subsections:


Number of web interfaces required. This is a strange category of comparison. The only reason that it exists is because Snowflake offers two web interfaces, both of which are less than ideal. What's worse is that the syntax across both of Snowflake's UIs is not consistent. Help documents often have two examples of code: one which works in the classic UI and one which works in SnowSight. True, the classic Snowflake interface has a button to navigate to the newer SnowSight interface, but it's still a disjointed process compared to the single, unified interface for BigQuery.


Overall look and feel. BigQuery uses a more compact interface with smaller default fonts, which results in more information displayed on the screen at a time. Snowflake uses larger fonts and icons, requiring more scrolling. Personally, I prefer BigQuery's more compact approach.


Integrated data visualization. SnowSight has a built-in, crude dashboard tool, making it easy to write queries, plop the results in graphs and send the insights out to those of interest. BigQuery has a button to pass a resultset over to a new Data Studio dashboard for visualization. While I appreciate the integration of both approaches, Snowflake does it better by containing the data viz within the same tool.


Worksheets and saved queries. Both platforms allow engineers to save queries to come back to later for further or repeated analysis. Snowflake's setup is more robust, allowing the creation of folders for organization and more granular sharing.


Object browsing. Snowflake's UIs are terrible at object browsing. The classic interface does not allow browsing anything other than databases and tables. To view stored procedures, we need to go into SnowSight. The problem with SnowSight is that objects with long names, combined with the large fonts used, tend to crop out important text while browsing. The left-side navigation pane is not expandable in SnowSight, which results in relying on mouseover tooltips to find the procedure of interest. BigQuery allows easy tree-like navigation for all objects within the same UI.


File uploading for ad-hoc analysis. It's quite common for an analyst or engineer to find themselves in a position in need of integrating an external file as a one-off exercise. BigQuery allows tables to be created, within the web UI, based on file uploads (with size limits), or by importing cloud storage files (Google and other cloud providers). Snowflake's web UI does not support importing files from the local PC. SnowSQL (a command-line tool) must be utilized to first upload the file into a stage. After that, the file may be copied from the stage into a table. Google's solution is more user friendly.


Overall, the most important aspect for me is the single, unified interface of BigQuery. Sadly, this seems like something that could easily be brought up to par on the Snowflake side. I do appreciate the better organization for saved queries in Snowflake, plus the dashboarding, but I just can't get past the dual-UI issue.

Winner: BigQuery

SQL language features

It's important for a database to provide a robust set of SQL commands which result in expressive and easy to understand queries. This is a large topic, which again yields itself to subcategories.


Window/analytic functions. Both platforms support the full range of window functions that we have all grown to love and rely heavily upon. I'm talking about things like RANK(), DENSE_RANK(), ROW_NUMBER(), LAG(), LEAD(), LAST_VALUE(), etc. Additionally, both tools support the QUALIFY statement, which makes it easy to filter on window functions.


Data types. Both databases allow sizing of string or varchar types. However, the use of non-specified length is promoted in both solutions. Google's string type is called STRING, and Snowflake uses the more traditional VARCHAR. One one hand, it's easier to type the word STRING than VARCHAR for some reason, but makes porting legacy code more involved. As far as type-casting is concerned, Snowflake has a very useful double-colon shorthand syntax for type-casting. BigQuery requires the more traditional, long-handed version of type-casting.


Semi-structured data types. Semi-structured data types are very important because they allow for easier integration with source systems that use JSON, and allow for more compact and ragged table designs. That said, both solutions are equally great in this category. Both support Array types. While BigQuery supports a Struct type, serving as kind of a record-within-a-record concept, Snowflake uses an Object data type which is fairly similar. Snowflake's Object type is really more similar to a JSON type, which itself is a new type coming to BigQuery. Overall, embedded objects in BigQuery is more robust and offers more options than Snowflake. However, Snowflake's Variant data type comes in handy, without an equivalent in BigQuery.


Procedural / control flow. Both BigQuery and Snowflake support procedural syntax and elements such as loops, branching logic and variables. The main difference, in my mind, is that Snowflake can be more verbose at times. For example, using the classic UI requires the use of $$ and Execute Immediate. It's more difficult to debug a dynamic string than it is to debug pure SQL complete with syntax highlighting. Another example is in iterating over rows. Google's For....In syntax makes this very easy, whereas Snowflake requires a cursor declaration. Another item of note is that Snowflake supports Javascript directly in the UI. Javascript is sometimes needed for full functionality when writing stored procedures. Overall, I prefer BigQuery's more SQL-oriented and concise approach to procedural language elements.


Predictive modeling. BigQuery supports the creation and execution of predictive models, directly as SQL language extensions. This is a killer feature and there is no equivalent in Snowflake.


Overall, the SQL support on both platforms is pretty close. However, it's a bit more streamlined in BigQuery without the need to use alternate languages, and predictive modeling in BigQuery really makes it a one-stop shop for an analyst.

Winner: BigQuery.

Killer cloud features

To me, cloning and time travel are both two crucial cloud data warehouse features. Cloning is an easy way to create dev copies of very large tables. Likewise, time travel allows quick recovery from mistakes. While both solutions support both features, Snowflake gets a slight nod as it allows configuration for the days saved for time travel per table.

Winner: Snowflake

Accessibility

Accessibility may not be a feature important to everyone, but to those of us who enjoy learning and building things on our own time (such as the research that goes into many of the posts on this site), it's a crucial feature.


In the old days, it was easy enough to spend $40 for SQL Server Developer Edition. Later on, the software turned into a free download. Cloud data warehouse systems introduce a barrier for learning outside of work. BigQuery is much more accessible than Snowflake - most everyone has a gmail account, making it easy to use to create a GCP project. Given BigQuery's pricing structure based on data processed, this means that I spend on average, $4 per month for Google BigQuery. This covers more than just experiments - I have near-realtime feeds for IoT data loading in BigQuery. It's just that as an individual household, I have no where near the data volume that an organization may have - pricing scales well down to the smallest use-case in BigQuery.


Snowflake requires more of an investment to use. At the lowest tier, doing an hour of querying on an XS virtual warehouse will cost $2. Snowflake also requires a company name to use to sign up for an account. While I could use BigQuery.com as my company, I would still need to commit to about $100 per month just to use Snowflake in a rudimentary form.

Winner: BigQuery.

Overall

Both BigQuery and Snowflake are awesome cloud data warehouses, with roughly equivalent functionality. I prefer both over Azure Synapse for various reasons. However, I do prefer BigQuery over Snowflake because I find it slightly easier to use, the built-in predictive modeling component, the billing structure, and the accessibility. What are your thoughts?

Comments


Post: Blog2_Post

Follow

  • Facebook
  • Twitter
  • LinkedIn

©2022 by Tim's BigQuery blog. Proudly created with Wix.com

bottom of page