This article was written by Kisi's data analyst Roman Vakhrenev.
In July 2022, upon joining Kisi as a data analyst, our data infrastructure consisted of BigQuery as the data warehouse and LookerStudio (formerly Data Studio) as our reliable BI tool. Initially, our dashboard setup was fairly simple – fewer than a dozen dashboards, lacking clear guidelines for development, and utilizing a single dataset for various dashboards' views and tables. It was manageable at first, but complexities arose as the number of dashboards increased.
Growing out of our first pair of BI shoes
- Chores: Making changes to data replication settings necessitated manual updates to all queries. For instance, replacing instances of "properties." with "property_" in different views and scheduled queries required tedious manual effort, proving highly impractical for scalability.
- Dashboard isolation: Reusing views across multiple dashboards became problematic when modifications were needed, leading to discrepancies in other dashboards.
- Maintenance barriers: Absence of a system to track views and tables used in specific dashboards made cleanup daunting, resulting in unnecessary accumulation that was difficult to delete.
- Unclear cost attribution: Our use of hardware logs as a data source led to unexpectedly high costs, which went unnoticed until later. Implementing cost tracking per dashboard would have helped identify and rectify the issue earlier.
- Query performance: Dashboards using BigQuery views experienced frustratingly slow performance due to the need for query execution each time they were accessed.
- Service credentials: Upon a data analyst's departure from Kisi, several dashboards lost functionality because multiple data sources in LookerStudio were linked to the departing analyst's credentials. Manual reassignment of permissions became necessary after her account was deleted.
Bringing a DevOps approach to BI
Following team brainstorming sessions, we implemented the following strategies:
- Isolated datasets: We created distinct datasets for each dashboard, preventing the use of tables from one dataset in others. This approach effectively resolved issues 2 and 3.
- Scheduled queries: Instead of views, we now rely on scheduled queries, running at regular intervals and storing results in physical tables. This shift tackled problems 4 and 5 and enabled better cost monitoring by easily identifying these tables in billing logs.
- Terraform: Terraform played a crucial role in addressing problem 1. We utilized Terraform to create scheduled queries and datasets, streamlining our processes. Terraform, an open-source infrastructure as code tool, facilitated easy query updates with a single click. Additionally, it served as a robust disaster recovery plan, storing queries and datasets externally on GitHub for swift restoration in case of data warehouse issues.
- Service accounts: Implementing service account credentials for dashboards and scheduled queries resolved problem 6, ensuring uninterrupted data source access.
The results
These strategic measures transformed our data warehouse into a manageable structure, allowing meticulous cost tracking at the data source level. We can now independently monitor the cost and usage of each dashboard, and access to dashboards remains unaffected by changes in our BI team.
Despite the substantial size of our Terraform file (approximately 15,000 lines, mainly SQL queries), it remains easily manageable. Queries can be located using specific names (such as "Sales Dashboard: Main Query" in the BigQuery UI or the internal name "sales_dashboard_main_query" in Terraform) and corrected as needed.
Stored in our GitHub repository, Terraform files enable tracking changes in queries through commit history and facilitate rollbacks if necessary. This adaptability will persist even with hundreds of dashboards; the Terraform file may grow larger, but its complexity will remain manageable. We also have the option to divide the file into smaller ones in the future if required.
Furthermore, precomputation significantly reduced our monthly expenses for scheduled queries and dashboard data usage, often lowering costs to as little as $10 per month while enhancing dashboard loading speeds.
Interested in what we do at Kisi?
Check out our open positions.
We don't just operationalize BI - we also transform physical security into an IT topic. Learn more about Kisi here.
About the author
Roman excels as a Data Analyst, transforming complex datasets into clear, actionable insights. With a talent for demystifying intricate information, he empowers decision-makers with data-driven clarity.