Snowflake Computing, a startup technology vendor which claims to have built the first relational database specifically for the cloud, is expanding to the UK following a $100 million (£78 million) series D funding round in April.
Snowflake has opened its European headquarters in London with ten staff and signed up its first named UK customer, travel distributor GTA.
At its first London event yesterday, the Cloud Analytics Conference, Snowflake Computing’s CEO Bob Muglia spoke about how the company can solve a whole host of enterprise IT problems with its “purpose-built data warehouse for the cloud”.
Cloud data warehouse
Muglia said that old data warehouse solutions being ported to the cloud, or even modern open source solutions like Hadoop, aren’t fit for purpose because they weren’t built specifically for the cloud.
“It has been a slog to get the skills that make it work and to implement and maintain these data solutions,” he said. “They are like wild animals that need to be tamed on a daily basis and you never know when that lion will bite your hand off.”
Instead, Snowflake -- established before the word had connotations relating to a certain liberal-leaning generation -- has been built on the cloud, for the cloud. It is “built from scratch. A solution designed to work in the cloud and take advantage of that effectively infinite resource”, Muglia said.
In the world of NoSQL and Apache data stores like Hadoop or Hive, relational databases -- popularised by Oracle and Microsoft -- can look “stodgy” in Muglia’s parlance. And he certainly knows the business, having joined Microsoft in 1988 as part of its SQL Server division before moving to Juniper Networks in 2011 and then Snowflake in 2014 to lead it to market.
The advantage of a relational database in the cloud is that it offers a familiar and popular format while using the cloud to bring down cost and increase speed and scalability.
As Muglia sees it: “It’s a SQL database and relational technology is the superior, correct way of doing data analytics. We can talk about online transaction processing and NoSQL, but in data analytics, relational rules.”
Google has its own range of cloud-based database solutions (Bigtable, Cloud Spanner, Cloud Datastore), including Cloud SQL, where companies can run an Oracle MySQL relational database on the Google Cloud Platform.
With the data stored in a relational way, analytics teams can use the tools they like (Qlik, Tableau, even Microsoft Excel), instead of having to bring in new skills like Python.
Then there is the cost saving of not having to employ data engineers and database ‘gurus’ to tune the underlying infrastructure. Muglia wants Snowflake to allow users to simply “load the data and run queries, and everything else happens under the covers”.
Another advantage of Snowflake over traditional data warehouses is that it allows for lots of concurrent users without sacrificing query speed. In the old world “things work great with one query, but what happens when you have five or ten or 1,000 users on it? Can you make it work?” Muglia asks. “Not with existing systems, but the architecture of Snowflake allows us to scale to effectively limitless concurrency.”
Snowflake's cloud architecture
This all sounds great, but how has Snowflake magically solved all of these age-old relational database problems? The answer, according to Muglia, is in the architecture.
Architecturally, Snowflake is underpinned by centralised AWS S3 storage, with a compute layer made up of AWS EC2 nodes. The key is how Snowflake “micro-partitions” the underlying database so that each query runs only on the data it needs, which keeps latency low.
So where Hadoop is limited by the memory of a cluster (queries will naturally slow down as more concurrent users are added), Snowflake claims to be able to maintain speed regardless of the number of users because of this “multi cluster shared data architecture”.
“We pull the data we need to run that query as we need and it is stored locally in cache, so it doesn’t need to go back to S3, which is relatively slow compared to local flash memory,” Muglia said.
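The caching behaviour Muglia describes can be illustrated with a minimal sketch. It assumes a simple least-recently-used policy, which the article does not confirm; the `LocalCache` class, the `fetch_remote` callback and the `part-1` style keys are hypothetical names used only for illustration, not Snowflake's implementation.

```python
from collections import OrderedDict


class LocalCache:
    """Tiny LRU read-through cache standing in for a compute node's local flash."""

    def __init__(self, fetch_remote, capacity=2):
        self.fetch_remote = fetch_remote  # hypothetical slow read, e.g. from object storage
        self.capacity = capacity
        self.entries = OrderedDict()
        self.remote_reads = 0  # counts trips to the slow store

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        # Cache miss: go to the slow store once, then keep the result locally.
        self.remote_reads += 1
        value = self.fetch_remote(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


# A dict stands in for the remote object store in this sketch.
remote_store = {"part-1": [1, 2, 3], "part-2": [4, 5, 6]}
cache = LocalCache(remote_store.__getitem__)
cache.get("part-1")  # miss: read from the remote store
cache.get("part-1")  # hit: served from the local cache
cache.get("part-2")  # miss: read from the remote store
print(cache.remote_reads)  # 2
```

The second read of `part-1` never touches the slow store, which is the effect the quote attributes to keeping data in local flash.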
Muglia claims to have a customer running queries in under a second on a 400 terabyte data table, because it doesn’t “scan a lot of data and we do that because we understand the contents of the data,” he explained.
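One common way a system can avoid scanning data it does not need is to keep the minimum and maximum value for each partition and skip any partition whose range cannot match the query. The sketch below assumes that approach purely for illustration; the article does not describe Snowflake's actual metadata, and `Partition`, `make_partitions` and `query_range` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    min_value: int
    max_value: int
    rows: list


def make_partitions(values, size):
    """Split sorted values into fixed-size partitions, recording min/max for each."""
    ordered = sorted(values)
    parts = []
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        parts.append(Partition(chunk[0], chunk[-1], chunk))
    return parts


def query_range(partitions, low, high):
    """Return matching values and how many partitions were actually scanned."""
    scanned = 0
    hits = []
    for p in partitions:
        # Skip partitions whose value range cannot overlap the query range.
        if p.max_value < low or p.min_value > high:
            continue
        scanned += 1
        hits.extend(v for v in p.rows if low <= v <= high)
    return hits, scanned


partitions = make_partitions(range(1000), size=100)
hits, scanned = query_range(partitions, 250, 260)
print(len(partitions), scanned, len(hits))  # 10 1 11
```

In this toy example only one of the ten partitions is read, so the work done depends on how much data the query touches rather than on the size of the table.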
Snowflake charges customers along a typical software-as-a-service (SaaS) model, pricing storage in line with AWS for S3 and then charging as and when they use the compute capabilities, which it calls ‘virtual warehouses’. Because it offers the data warehouse as a service, the price scales up or down as with any SaaS vendor, according to storage, compute or concurrent user requirements.
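As a rough illustration of how a bill under this kind of model might be put together, the sketch below adds a storage charge to a usage-based compute charge. The rates are hypothetical placeholders, not Snowflake's or AWS's prices, and `monthly_cost` is an illustrative name.

```python
def monthly_cost(storage_tb, compute_hours, storage_rate_per_tb, compute_rate_per_hour):
    """Storage is billed on volume held; compute is billed only for hours used."""
    storage_charge = storage_tb * storage_rate_per_tb
    compute_charge = compute_hours * compute_rate_per_hour
    return storage_charge + compute_charge


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
STORAGE_RATE_PER_TB = 25.0
COMPUTE_RATE_PER_HOUR = 2.0

quiet_month = monthly_cost(10, 100, STORAGE_RATE_PER_TB, COMPUTE_RATE_PER_HOUR)
busy_month = monthly_cost(10, 400, STORAGE_RATE_PER_TB, COMPUTE_RATE_PER_HOUR)
print(quiet_month, busy_month)  # 450.0 1050.0
```

With storage held constant, the bill moves only with the compute hours consumed, which is the pay-as-you-use behaviour described above.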
Snowflake claims to have more than 500 customers using its data warehouse solution (around half of which are in live production), including Sony, Nielsen and Hotel Tonight.
Speaking on stage yesterday, Erika Bakse, head of business intelligence at IAC Publishing Labs (owner of popular websites like Ask.com, Dictionary.com and Investopedia), explained how the company shifted entirely from an on-premises data warehouse and Hadoop instance to Snowflake.
The legacy data warehouse was sitting on old hardware, which was creating “performance issues” and wasn’t allowing for concurrent users to run analytics, according to Bakse. For example, with 20 analysts trying to run queries, a query would take half an hour, and the cluster often fell over under pressure.
The organisation migrated to Snowflake at the beginning of 2016 in only three months, as its on-prem licence was wrapping up. It used AWS Snowball to upload its data warehouse into the cloud. Bakse liked that Snowflake came with minimal operational overhead, as IAC didn’t have a data engineer on its books.
Bakse says the majority of the analytics done at IAC concerns its various websites, so most of the data is in the machine-readable JSON format, “which Snowflake handles beautifully”, processing around 1.5 terabytes per day.
“We now get the data at an hour latency for users, so we end up with cleaner data, faster, cheaper,” she said.