British retail site Not On The High Street has shifted away from a brittle, homegrown data warehouse to Snowflake’s wholly managed, cloud-built solution in a bid to simplify management and save on some hefty compute bills.
A great British small business story in its own right, the site was founded in 2006 by Sophie Cornish and Holly Tucker and curates the best of small British creative gift makers that tend to lean to the quirky and customisable side.
On the technology side, a small team was dealing with a data architecture that had quickly become a burdensome, legacy model.
Essentially, the company has spent the past couple of years shifting from a data warehouse made up of MySQL databases and a bunch of Python scripts to a fully managed cloud data warehouse from the upstart vendor Snowflake.
Here's how it got there.
The old model
Speaking at a Snowflake morning event in London last month, Ben Davison, technical lead of data at Not On The High Street, explained the transition from the old architecture, essentially a monolithic Ruby on Rails web app running on a MySQL database, to a system underpinned by Amazon's Redshift data warehousing solution.
In 2010 the company's engineers decided to build a completely custom Ruby on Rails MySQL data warehouse, "which would allow you to pull out facts and dimensions from the production database, put it into another MySQL database and use a custom BI element to write all of the custom BI you needed," Davison said.
"This architecture was built for one specific thing: to get data out of the production database, grab facts and dimensions and allow finance to run their reports," he said.
Then in 2015, as the company's data started to grow significantly, it shifted again, this time to Amazon Redshift, a "game changer" at the time, according to Davison.
That architecture consisted of a production MySQL database that the site runs off, feeding a homegrown MySQL data warehouse, plus a bunch of scripts to replicate data to Redshift for better parallel querying.
However, as the company added more processes and API integrations, using more and more scripts, "it was starting to get brittle," Davison said. "The engineers had coded themselves into a corner, they had to maintain so much stuff: all of these Python scripts which all relied on each other. They were spending so much time keeping this stuff running and you also have Redshift running on top of this, which comes with its own set of problems: it wasn't built for the cloud."
So in 2018 the company looked for a new solution that required far less maintenance.
The retailer outlined six success criteria for the right data warehouse solution, starting with decoupled storage and compute. "It must be able to put an unlimited amount of data into this platform without having to scale up compute, I would say that is Redshift's biggest problem," Davison said.
Not On The High Street has already shifted 9 terabytes of Google Analytics data to Snowflake but doesn't run hugely compute-intensive queries. "We do run big queries but most of the time our data warehouse doesn't use a lot of CPU but we need to store a lot of data," he added. "We would not have been able to have had that in Redshift without spending a lot of money, more than we spend on Snowflake by quite a bit."
Next it had to be easy to maintain. "We just want to turn something on and whip out our credit card and say: 'you do it'," he said.
The next factor was cost, with the new data platform having to come in cheaper than the existing Redshift-based system. "Being able to scale up and scale down according to when our data analysts work and turn it off on the weekend," allowed the company to achieve that.
Speaking to Computerworld UK later that afternoon, Andrew Thomas, data director at Not On The High Street, said: "We haven't been able to yet sit down and do a proper rationale of the costs [but] back of the envelope, it is cheaper."
Earlier in the day, publisher TI Media said its move to Snowflake had delivered a 50 percent saving. "We are really keen to get to that point, right now that isn't the priority, but we are pretty confident of making cost savings," Thomas said.
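The scale-down argument can be made concrete with some rough arithmetic. The sketch below only compares how many hours per week a warehouse is running under a working-hours schedule against one that is always on. The schedule (10 hours a day, 5 days a week) and the function names are illustrative assumptions, not figures from Not On The High Street, and it deliberately ignores actual pricing.

```python
HOURS_PER_WEEK = 24 * 7  # an always-on cluster runs for all 168 hours


def weekly_compute_hours(hours_per_day: float, days_per_week: int) -> float:
    """Hours per week a warehouse is running under a fixed schedule."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    return hours_per_day * days_per_week


def saving_vs_always_on(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of running hours avoided compared with running 24/7."""
    return 1 - weekly_compute_hours(hours_per_day, days_per_week) / HOURS_PER_WEEK


# Hypothetical schedule: analysts work 10 hours a day, Monday to Friday,
# and the warehouse is switched off overnight and at the weekend.
weekday_hours = weekly_compute_hours(10, 5)
weekday_saving = saving_vs_always_on(10, 5)
```

Under that assumed schedule the warehouse runs for 50 of the week's 168 hours. Whether that translates into a lower bill depends on the per-hour price of each platform, which the article does not give.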
Lastly, the team wanted the system to be familiar to its existing end users, and to work with the wider ecosystem.
"I have to be able to give a connection string, user name and login to our analysts and data scientists and it just works. I can't be telling them to install new things, we need it to be plug and play for them and Snowflake has that," Davison said. "We were looking at Snowflake as the bedrock, the centre of our data universe, we need to be able to plug in all of the tools we were thinking of using."
Now, Not On The High Street is "at the point where Snowflake is up and running and supports a big chunk of our workload and we are really starting to move the core transactional models out of Redshift and into Snowflake," Thomas said.
That migration was a traditional lift and shift and took two engineers around six months, while also doing their day jobs. Now, "we don't run any of it, we don't host it, we just went and got others to do it for us," Davison said. "I don't have to wake up at 2 a.m. to fix something or build out a new API integration, all my time is spent on what is most valuable to the business."
Thomas admits that Snowflake isn't "a complete game changer for us" but what it does allow the business to do, aside from reducing costs and freeing up engineering time, is "move much faster and get answers much quicker, but we are still running the same sort of queries," he said.
Due to the way it is built, Snowflake promises almost limitless scale and concurrency by spinning up separate cloud compute instances (EC2 on AWS, for example) for each workload, so that each one effectively runs as a standalone data warehouse but all under the same roof, and data science queries never tread on the toes of BI, or vice versa.
For Not On The High Street this has an immediate impact. For example, there is a dashboard running off that Redshift data for partners to view their performance metrics. "Then alongside that we have the data science workloads at massive scale up, scale down SQL engine queries and then the BI tools running all the time, so just enabling us to have a separate warehouse per workload," Thomas said.
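A warehouse-per-workload setup of the kind Thomas describes could be sketched as follows. This is a minimal illustration, not the company's actual configuration: the warehouse names, sizes and suspend timeouts are hypothetical. The script only builds Snowflake `CREATE WAREHOUSE` statements as strings, using the `WAREHOUSE_SIZE`, `AUTO_SUSPEND`, `AUTO_RESUME` and `INITIALLY_SUSPENDED` options, and does not connect to anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str                  # warehouse name (hypothetical)
    size: str                  # Snowflake size keyword, e.g. XSMALL, SMALL, LARGE
    auto_suspend_seconds: int  # idle time before the warehouse suspends itself


def create_warehouse_sql(workload: Workload) -> str:
    """Build the DDL for one isolated virtual warehouse."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {workload.name} "
        f"WITH WAREHOUSE_SIZE = '{workload.size}' "
        f"AUTO_SUSPEND = {workload.auto_suspend_seconds} "
        "AUTO_RESUME = TRUE "
        "INITIALLY_SUSPENDED = TRUE;"
    )


# Assumed workloads, loosely mirroring the three mentioned in the article.
WORKLOADS = [
    Workload("PARTNER_DASHBOARD_WH", "XSMALL", 60),
    Workload("DATA_SCIENCE_WH", "LARGE", 300),
    Workload("BI_WH", "SMALL", 600),
]

statements = [create_warehouse_sql(w) for w in WORKLOADS]

for statement in statements:
    print(statement)
```

Because each workload gets its own warehouse, a heavy data science query only consumes the compute behind `DATA_SCIENCE_WH`, and the auto-suspend setting means an idle warehouse stops running after the configured number of seconds.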