You know you need to evolve your data & analytics experience – perhaps you are moving your whole warehouse to the cloud for easier access and more robust schemas, or perhaps you are looking to drive more intelligent insights with visualizations to drive faster business decisions. You may have a massive amount of scope to tackle across data architecture, engineering, and visualizations tasks – and you have a small team with limited capacity. How do you estimate your effort so you can get capital funding approval? How do you prioritize your features so you can deliver value iteratively, through phased releases?
Imagine you are a movie theater hurting for revenue as rivals like Netflix and Hulu surge in viewership. You want to drive an increase in customer movie visits, and because other brands have shown success with this, you’re considering enabling a movie theater subscription offering. BUT, is this even a good idea? Will subscriptions cannibalize ticket revenue? Will an increase in visits and food purchase through subscriptions offset the loss in ticket revenue? To understand the answers to this, we need DATA. But you don’t really have a data warehouse – you have siloed data sets from each source system like the point of sale, food and beverage sales, film schedules, and more. Let’s break down all of the activities and roles that may be involved in delivering an analytics solution set that can answer these questions.
What roles and tasks are involved in deriving an insight from raw source to visual dashboard? As our team has worked on data transformations, we’ve developed a process that curates a healthy data analytics flow, which includes:
- Business Architect that models business objectives, tracks success measures, and oversees data schema design. This role doesn’t necessarily need to be full time, and if your Product Manager is senior enough, they could cover these responsibilities.
- Product Manager or Data Analyst that’s intimately involved with stakeholders and end users; this role models business process flows to uncover key business decisions, elaborates analytics needs in visualization white papers, and models data details and dashboard wireframes. If you’re lucky, this person can ALSO develop their own visualizations (but not necessarily engineer a raw data source or write code). The dream data product manager is able to rapidly prototype visualizations with stakeholders using their tool of choice, craft elegant calculations to support needed measures, and collaborates daily with data engineers to ensure data sources support prioritized analytics use cases. If you have a less technical analyst or Product Manager, you may have a fourth role on the team: a data visualization developer. If you have a large amount of stakeholders or developers to support, you may have a senior Product Manager working alongside a second visualization developer or analyst.
- Data Engineer that does the heavy lifting with data collection, wrangling, transformations, and ultimately owns the development of the schema designed by your architect and analysts.
- Note – this does NOT include the efforts required of a technical architect to configure your tech stack infrastructure or security model, if you are building a net new environment. Because this is so variable depending on the technology, scale, and architect involved, we decided to exclude those elements from this estimation tool. Work with your solution architect to estimate these one-time up front activities to tack on to your derived totals.
So how do we estimate all of this? Pop in to the data estimation template to follow along.
Sign up below to receive your Data and Analytics Estimation Template!
Let’s define the team members you think will be involved from the “Resources” sheet
- Each resource type should have one row (in column A). This is pre-loaded with the roles listed above. Don’t worry about quantities of resource types yet – once we determine how many estimated hours to allocate to each resource, you can determine based on your delivery speed requirements how many people of each type you will need (or if you should combine roles to make efficient use of cross-functional resources).
- If you need to estimate cost down to the dollar, you can assign an hourly rate to each resource type in column B. This is great if you are outsourcing this work to a data & analytics firm (like Blue Fish | Seilevel ), but may not necessarily be as important if you are leveraging full-time employee resources for the work.
- Leave columns D through L alone. These are calculated to sum the hours required by each resource type by phase, based on your inputs from the task sheet.
Now let’s determine our task types, and which resource type is assigned to those tasks, in the “Tasks” sheet
- Column A is pre-loaded with the series of activities we’ve found to be beneficial in a healthy analytics flow. Not all of these steps may be relevant to your organization. Delete any rows that aren’t relevant to you, or insert additional rows if there are other activities that are important to your team.
- Column B is pre-loaded with our sample resource types from the Resources tab. You’ll need to make sure any values in this column match EXACTLY to the values you have in column A of your resources tab so all formulas function correctly. Go through each row and paste the appropriate resource type in for that task type.
- Nearly every task involves modifying an object. The object types are noted like <object type> in the task description, and listed on their own in column C. This means the task is repeated for every instance of that object type in your inventory – more to come on this in step 3.
- Now for the hard part….actually estimating! Column D is populated with a sample hourly estimate based on our experience. This is how many hours it will take you to complete the task for EACH object type that needs the task completed. If you have three process flows to model, column D should be how long it would take you to complete one process flow on average. Fill in column E with what you think YOUR estimate would be – preserving the original for reference later. Obviously this can vary widely depending on:
- Size and complexity of the modified object
- Experience of the resource
- How far along you are in the project – the 10th time you do a task might be 10 times faster than the first time. You can factor in efficiency levers in outer phases if you find this to be the case.
Once you have your resources and tasks broken down, we have to figure out the actual scope by analyzing the inventory of object types that need to modified through your task list. Let’s work through your scope by updating the “Objects” sheet.
- First take a look at your task sheet to understand all of the unique object types listed in column C. Each of these object types should have one or more rows in the Objects tab. As we ALWAYS like to do in clean data sets, uniquely identify each object instance with a unique identifier in column A (you can actually do this last once your object description rows are populated)
- Starting in the first row, take your first unique object type and populate that in column B. Start listing all of the instances you think are in scope for this object type in column D. Don’t worry about priorities yet, just try to capture everything for that object type. Optionally, once you are done, you can categorize each row in groupings with column C.
- Business Outcome: what are you hoping to achieve? Ideally these are really specific with a baseline, target, and timeframe defined, like: Increase same store sales from $1M/month to $2M/month by Q2. If you already have a direction for how this will be accomplished, you can tack that on to the end (but it’s not required) like “by increasing customer cinema visits through a subscription offering”. Each outcome should be modeled in a business objectives model that helps you understand the flow from your highest level business problems, objectives, down to features. Stay tuned for a follow up post on how to analyze business outcomes for data investments.
- Business Process: what is the business doing, and what decision points do they incur throughout that flow? Each process should have a corresponding set of process flow models (that may have multiple levels), ideally with identified pain points or desired improvements listed as callouts. The output of this modeling effort should be a set of clearly defined business decisions that you hope to better answer with data.
- Data Object: these are the nouns in your universe. Each object should be modeled in a Business Data Diagram so you can understand the relationships, business rules, and cardinality between objects. This will eventually inform the more technical data schema design.
- Data Source: the various source systems where data is coming from. This could even include an Excel spreadsheet that the business maintains manually.
- Dimension Slice: these are core sets of object combinations that will be important in your data aggregations to support dashboards. If you are selling products online, you will almost always want to see product sales by day (so product + day would be the dimension slice). These slices will result in either aggregated views in your database or joined tables in a published BI tool data source.
You obviously can’t build everything at once, and some activities will be dependencies for others (you need to model and engineer the data before you can visualize it). Staying on the “Objects” tab, now we are going to estimate prioritization.
- You can approach this a few different ways. In this example, we wanted to split the work in to three core phases, with each phase having a different business objective focus. So our priorities are just listed as 1, 2 or 3. You may decide to actually stack rank the entire list so it’s a sequence from 1-n. You may decide to assign Fibonacci points for the elements in Weighted Shortest Job First to derive a prioritization score. However you decide to approach this, ensure “Phase” is populated in column E, and any adjacent details are added in other columns. If you decide to go with 1 phase where everything is a straight stack rank, you can just populate everything as 1 in column E, add stack rank to column F, and then hide or delete phases 2 or 3 from the “Tasks” tab.
- If you navigate back to your “Tasks” tab, by now you should see all of the data automatically populated in phase 1, 2, 3. If you see errors, make sure the names of resource types and object types match exactly. Note, this template is configured for three phases and many formulas are dependent on the phase ID. You may need to refactor the Tasks tab a bit if you decide to go a different route with your prioritization.
- Now, each phase should have an estimated dollar amount (if you’re tracking money) and a total hours estimate. From here it’s working to calibrate your scope and priorities to ensure you can deliver your objectives based on your desired time goals. You’ll notice several of the example object types don’t have anything populated for phase – these are items that you can cut from scope to save cost, OR simply wait on prioritizing until you know you can get your highest priority features delivered.
As with all estimates, this likely won’t spit out AN ABSOLUTELY FLAWLESS PERFECT ESTIMATE THAT WILL BE 100% CORRECT. And in fact, you most likely severely underestimated your tasks on your first attempt. Continue to use this tool as a planning asset as your team matures and grows; it would be fun to start tracking how your estimates change over time, as the team gets more efficient and comfortable with this flow. Note that these tasks are all directly variable to your scope – but there is other overhead you should factor in, like any agile ceremonies or meetings the team will need to attend. I typically track a 9-hour day per resource and count up to 6 hours of capacity per day to complete deliverables; the other 3 hours cover meeting overhead, housekeeping, planning, and the inevitable buffer in how long each task will take in case you underestimated on the first go. But this should set you up with at least a really decent first stab.
HAPPY DATA TRAILS!