As we reimagined data management across all of Tableau, we identified five pillars for the migration process framework that dovetailed well with our Snowflake selection criteria, and would thereby increase trust and confidence in the data that everyone uses. These pillars are: staffing, governance, authentication, communication, and documentation.
We’ll first discuss staffing, governance, and authentication in this post, highlighting some key lessons learned, unexpected issues with responses, and recommendations to consider when migrating and tackling data sets—large or small, simple or complex.
Staffing
We don’t want to sugar-coat the complex undertaking of any migration at enterprise scale. We started by forming a small, core migration team and quickly realized more assistance was needed to update approximately 9,500 workbooks and 1,900 data sources, and address any downstream content effects caused by differences at the database level.
The core team possessed some essential skills we suggest that organizations who make the same journey have: project management; development expertise with Python or a similar scripting language for modifying semistructured data like XML; and Custom SQL savvy. Recruiting talent with the right mix of data and programming skills that we needed was time consuming; we ended up reviewing upwards of 300 resumes and placing dozens of calls. Our central migration team counted at seven people—1.5 full-time program managers for six months, 0.25 server admins, approximately three full-time engineers, and two contractors—supporting upwards of 15-20 domain experts across sales, finance, and marketing.
The extended team—data scientists, engineers, analysts, and subject matter experts who work in business teams and helped move or transform data in Snowflake—were the first line of defense when questions or concerns surfaced from business users. These “stewards” of our data were able to answer questions ranging from data access and permissions, to process and timeline questions.
“We were the bridge between IT and finance business users since many data sources were managed by our team,” explained Dan Liang, formerly Manager of Finance Analytics at Tableau (now Manager, Finance Data Office, Salesforce). IT provided the centralized platform and standardization across the enterprise, but Finance Analytics tailored communication for their end users. “It was all hands on deck for a month as we handled content conversion, testing, and validation of data sources for our team’s migration. Tableau Prep was an integral part of our validation strategy to automate reconcile key measures between Snowflake and SQL Server.”
Recommendations:
- Identify and define roles and responsibilities: Without clear roles, there will be confusion about who is responsible for what specific aspects of the process. In our case, we had data stewards and consumers test the data with specific experts designated to sign off on the data.
- Automate (where possible): We could have allocated more time to better automate this process, especially around workbook and data source XML conversion, as well as data testing.
- Know that you’re comparing apples and oranges: We provided summary statistics like row and column counts and data types to our testers to help them compare the two data sets. But because of many factors (differing ETL and refresh times, plus potential latency), it was very difficult to pin down significant differences versus noise.
Governance
Our cloud migration was a golden opportunity to strengthen competencies around governance. Everything from Tableau Server content to naming conventions in Snowflake received fresh scrutiny in an effort to improve user experience and ensure scale. Those teams that invested in governance by establishing single sources of truth (through well-curated and certified, published data sources) had a more straightforward content migration experience. Those that didn’t invest as much in governance struggled with unclear ownership and expectations around data, and their users encountered surprise effects downstream during the migration like broken data pipelines and dashboards.
Because we had many different languages used by data engineers over time, we also conducted thoughtful upfront discussion about standardizing code patterns, including outlining which characters were and weren't allowed. Acting on these discussions, “We implemented Continuous Integration and Continuous Deployment (CI/CD) on our source control tool (GIT), so we could more efficiently peer-review code and transfer work between members of the team as needed,” said Isaac Obezo, a software engineer. “This was much easier than having domain experts do everything in a pipe.”
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at ₹800/month. Cancel anytime
Further strengthening governance, built-in Snowflake features enable transparency into database metadata, including the ability to see and save all queries processed. Since that history is typically only stored for a week, we built a pipeline to store all of the historical data so we could provide more targeted support to end users, create new data curations, and promote our single sources of truth. In finance, we used this data to proactively reach out to users who experienced query timeouts and other errors. It also helped us maintain user access controls around Sarbanes-Oxley (SOX) compliance.
Recommendations:
- Use data quality warnings: These can communicate the status of a data source to users quickly and easily so they know when migration will happen and what will change.
- Recognize data management is a marathon—not a sprint: Progress and value deliverables are iterative. We concentrated on delivering smaller, but valuable changes as we migrated to the best model or data. We also benefited from using data to monitor performance of our cloud solution. Below is a sample visualization we built to monitor usage and performance of Snowflake.
- Minimize tech debt: Tableau Catalog gave us visibility into our data, including lineage and impact analysis to identify content owners. We were able to easily communicate to people what data and content could be deprecated and what was critical to move to Snowflake because of its usage or downstream impact. Consider leveraging an enterprise data catalog to help your end-users build knowledge and trust in data assets.
- Establish a clear cutoff: Appropriately budgeting time to complete a cloud migration is key; we took an informal survey and estimated an average of five hours per data source and one to two hours per workbook migration. Eventually, a final cutoff must be established where employees no longer have support from the legacy database or from the central migration team. If someone didn’t migrate their data when given ample time and assistance, they likely no longer needed it.
Authentication
Changing to a cloud based database required changing the database authentication method employed by users, apps, and connected systems. We went from an all On-Premise world of Active Directory (AD) identity management and automatic Windows authentication through AD for users, apps, and systems to the reality of the cloud where identity management across different apps or systems is not seamless or integrated out of the box. The best option is a federated Identity Provider (IdP) with Single-Sign On (SSO) capabilities across different cloud vendors and apps. If you are planning on having multiple cloud based apps, or want users to have a SSO experience, selecting the IdP that works best for you should be done before or in conjunction with your Snowflake adoption.
Initially, we connected directly to Snowflake with SAML via our IdP. This works fine, but has pain points, especially coming from the automated world of Active Directory, namely: SAML IdP password changes will require manual embedded password changes in all Tableau content using embedded credentials. In the time between a user changing their password and updating connections in Tableau, any extract refreshes using them would fail and workbooks using their embedded password would not render. The only way to have a seamless password change experience with nothing breaking in Tableau was to switch to OAuth use. Be sure to check if your IdP can be used with OAuth for Snowflake!
One important lesson learned here was the power of a Tableau email alert. We worked with IT to automate an email that assists users in the password rotation. One month out from a required password rotation, users receive an email from their Tableau Server Admins reminding them that their password needed to be updated, as well as a link to just that content on Tableau Server.
Recommendations:
- Be prepared to communicate and document changes: When changing authentication types you can expect to receive and have to answer many questions about how it works and how it differs, keeping in mind users’ different degrees of technical understanding.
- Strategically manage your storage driver: When you’re conducting an enterprise deployment to a platform like Snowflake, it’s important to push out the driver for everyone’s machines to maintain version control and updates.