This is a guest post by Steven Collings, Senior Data Consultant at Zoopla
Zoopla is a property website that enables users to find residential or commercial property to buy or rent in the UK and overseas. Since acquiring Property Software Group and Expert Agent, we also offer a backend software that agents can use to build their businesses. Amidst the growth and acquisitions, we needed a way to bring together data from disparate systems to drive key performance indicators (KPIs) for all the Salesforce and NetSuite data we store in Amazon Redshift.
Building a flexible and scalable data warehouse with Amazon Redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse solution that makes it simple and cost-effective to efficiently analyze all of your data using your existing business intelligence tools. We have used Amazon Redshift as our data warehouse for more than 5 years and have developed deep knowledge of the AWS analytics stack. Amazon Redshift has always performed well for us and integrates with other services we rely on, such as Amazon Simple Storage Service (Amazon S3), AWS Glue, and Microsoft Power BI, among others. Importantly, Amazon Redshift has evolved along with our needs. For example, we adopted Amazon Redshift Spectrum to query data directly in our Amazon S3 data lake so that we can scale efficiently from a cost and performance perspective, and easily combine data in the warehouse and the lake. In general, we’re pleased that AWS has continuously allowed us to scale and move forward.
Complicated custom scripts and disparate data
We had custom-built scripts pulling the data into Amazon Redshift from different places, including NetSuite and Salesforce. These were built by different people, often in different languages, and not documented. Each script required maintenance to keep up with changes to source systems and APIs. We wanted a solution to help us integrate data more quickly and efficiently, using less developer time.
In addition to custom scripts, we were using native connectors from Power BI to shortcut data straight into reports. We were integrating data too high up in the stack to be able to reuse it in the ways that we wanted to. A proliferation of Power BI models was causing data to become siloed, and we ended up with a series of point solutions. We wanted our data centralized in our Amazon Redshift data warehouse so we could ensure its quality, join it together, and create enterprise data models.
We recognized that feeding the data directly into Power BI wasn’t scalable. Power BI has a key role in our data stack for dashboarding and self-service analytics, and we wanted to keep our use of the tool squarely in its sweet spot. We didn’t want to push every piece of fine-grained data into Power BI just so we could use it for a deep-dive analysis. Not only would this approach be expensive, it also had performance implications and reduced the freedom of our analyst community to use the best tool for the job. It made more sense to have that data in Amazon Redshift (as our existing data warehouse solution), a platform that is well suited for running fine-grained, large-scale analyses using whichever tool best suits the use case.
Fivetran for automated data pipelines
We selected Fivetran to ingest the data. Fivetran replicates data from applications, databases, events, and files into Amazon Redshift. Fivetran connectors deploy in minutes, require zero maintenance, and automatically adjust to source changes so our data team can stop worrying about engineering and focus on driving insights. With Fivetran bringing data into Amazon Redshift, we have increased data quality and can easily integrate new datasets.
Freeing up engineering resources
Due to competing priorities for data engineering resources, my team faced a reduced level of support. With Fivetran, we could push ahead and make progress while working with fewer resources. We enabled existing members of the BI team to perform data integration tasks that previously required engineering effort (such as importing new sources, modifying existing sources, implementing data cleansing, and shaping logic) and freed up our data engineers to apply their skill set to value add activities beyond maintaining data pipelines.
We estimate that Fivetran currently does the work of up to one full-time engineer, and we expect that number to increase. We’re interested in adding more sources that aren’t being integrated at the moment (such as campaign performance or customer helpdesk), which will increase the number of engineering hours that Fivetran saves us.
Building out comprehensive KPIs
One of the biggest drivers for bringing on Fivetran was a project that required centralizing NetSuite and Salesforce data for a large KPI project. We had a custom-built Salesforce connector but we didn’t have the skill set on the team to maintain it, and we didn’t want to spend development resources when we could buy it off the shelf.
The project entailed building a KPI overview for the senior leadership team. The weekly dashboard monitors about 40 different KPIs and metrics across Sales, Product, Marketing, Financials, HR, and other departments. It’s constantly available to the senior leadership team and allows them to understand overall business performance and also drill down into areas of concern that require further investigation and analysis. A streamlined version of the dashboard is displayed on screens around the office so that everyone feels informed and connected to our mission.
While some of these KPIs were already available, they were spread around different systems, lived in different reports, or were never even surfaced. If they were attainable, the process was often manual and prone to errors. This has been the key deliverable. It was always in our mind that we didn’t want to build a point solution. We wanted to ensure that all the data we were landing could be leveraged for other purposes, and we wanted to make this data available in a self-service capacity. By providing faster, simpler access to data, we enable quicker, more informed decision-making and open up the next wave of questions as people understand what is possible.
By centralizing data into the existing Amazon Redshift data warehouse, using Fivetran to automate data ingestion, and building dashboards with Power BI we’ve created a consistent and efficient analytics process. It’s saved our team time, and made sure we’re able to continue to deliver valuable insight to our stakeholders.
About the Author
Steven Collings is a Data Consultant (formerly Head of Data) at Zoopla, with 15 years experience of data storage, ETL, data modelling, and reporting & visualisation techniques and technologies.