Season 1 Highlights: 7 Data Strategies That Work - What the Best Data Teams Do Differently
Welcome to season one highlights of Data Matters. I started Data Matters at the end of twenty twenty four with a simple idea. Talk to data leaders from all over the world and hear about their problems. I'm delighted to say we've now spoken to seven guests, and they've done just that. They've shared their problems.
Aaron Phethean:They've shared their insights. They've shared their challenges, and we've ended up with an amazing array of recordings. I really hope you enjoy it. Let's dive into season one. Starting with Jessica from not on the high street.
Aaron Phethean:Now my recollection of the recording was that the time absolutely flew by. She described her journey from being dumped into the deep end where there was no architecture, there was no plan, there was no documentation, and trying to navigate that and communicate what was important. Now she came up with some quick wins. She came up with a a couple of great techniques. The Wardley map stood out for me as just an amazing communication device.
Aaron Phethean:So good that we've started to use it, and I've recommended that our clients start to use it. It. So that little insight, I think, was more than worth recording the entire podcast. To see how people deal with chaos is really helpful to our community. I hope you enjoy it as much as I did.
Bethany Lyons:Yeah. So wordly mapping, I kind of came across by accident. Before that, so I was tasked with, like, I need to put together a data strategy. So I'd, like, been looking into a number of things. I did
Aaron Phethean:a lot
Bethany Lyons:of research. Like, there's a lot of, like, from different consulting firms. Like, this is what your data strategy should have, and it's, like, really long and intense, and no one's gonna read 50 pages of data strategy. And that's so it's a lot of, like, buzzwords and keywords that are meaningless to anybody who who doesn't understand them. So it needs to be like, for me, I want things to be simple.
Bethany Lyons:I want them to I want it to be easy for me to understand because I don't have a typical data background. I've only been working in data for the past five years, so it's was important to me that it was something that I could understand. I have a mentor, and she pointed me towards, like, data maturity analysis. So I looked into that, and it started to make a little bit more sense. But, all I'm a part of, somebody posted a link to a video with Simon Wardley, who's the inventor of Wardley maps.
Bethany Lyons:He was doing a talk. I think it was at CECOM, explaining the whole concept of a audio map. And it was at that point, I was like, oh my gosh. That makes so much sense. So he was talking about audio maps and how it helped him drive the strategy for his technology company at the time.
Bethany Lyons:He goes through the whole background of how he read a lot of books about war and the strategy within war and how maps are used more in how you kind of develop your strategy and like which approach you're gonna
Aaron Phethean:take. Yeah. Very
Bethany Lyons:very Exactly.
Aaron Phethean:Comes after the lies, actually.
Bethany Lyons:So, yeah,
Bethany Lyons:like I was just like, oh my gosh. This makes so much sense. And he had a very good, like, example of how he applied it to his photo upload business and, like, where they could kind of see things. So the idea behind a Wardley map is you basically focus on a specific user and their needs within the organization. And all of those needs have their own needs, And it kind of starts to cascade down, and you eventually get to the point where you have a complete picture of particular process or particular service.
Bethany Lyons:And the way a Wardley map works is you you kind of put your user at the top of it, and then you've got all their needs, and then you can start to match their needs on the the Wardley map. So top to bottom is how able close it is to the user. It's more visible to them. They probably care a little bit more about it. The further away from the user, the less they would care about it or the less visible it is to them.
Bethany Lyons:So data, for example, users use Looker reports and Tableau reports on a daily basis. So that's very visible to them. So Tableau and Looker would be quite near to the top. Whereas your data warehouse is Snowflake, BigQuery, whatever you're using. They don't care.
Bethany Lyons:They'd, like, maybe hear you say Snowflake once in a while, and they don't, like as long as I where's my graph? That's what I care about. I don't really care where it's coming from. Just make the graph visible. So the less kind of visible, the less things people care about, that's kind of at the bottom.
Bethany Lyons:And then when you're looking from left to right on a walking map, that's kind of how evolved particular components in your map would be. So on the left, be things that are very new, like you're still exploring them, they're not on for developed ideas. And then as you move more to the right, it's kind of things that are well developed commoditized solutions, things you can just buy off the shelf. You're never gonna create your own data warehouse nowadays. It's done.
Bethany Lyons:You're not gonna create your own BI tool necessarily. You just buy one off the shelf. So those kind of things sit on the very far right. And then in between is kind of, you know, things that you're developing in house. There's still ideas.
Bethany Lyons:You you're still working on them going towards this is quite established. We don't do much maintenance on it, but it's still something we're responsible for in house.
Aaron Phethean:And show that to people? Do you then take that to meetings and, you know, explain to leadership, this is what this is what our architecture looks like, and then and do they understand it when you show it if you if you do?
Bethany Lyons:So it was this year that I created it because I wanted to create it as part of our data strategy and was a great motivator for me to be able to show why I wanted to do some more, like, tech oriented projects rather than business impacting projects. So they do impact the business, and I was able to show hows.
Aaron Phethean:In episode two, Joe and I had the opportunity to discuss the Humatica implementation at CityStream and why it was necessary. CityStream started out with this problem where there was an organization losing trust in their data, and that had occurred over time. You know, almost this change is too small to see. They had introduced new technologies. They had some technology debt.
Aaron Phethean:They had this, chaos of where all the data was coming from and, you know, no single source of truth. So the plan was to fix that and get back to a gold standard of data. And, well, Joe and I discussed some of the challenges, the organizational challenges, the technology challenges, and, you know, these are this is a project. All the while, you have to run the business and keep things going. So I really hope you enjoy these, highlights where we discuss some of those real problems that people face.
Aaron Phethean:And if you're in the middle of a a big migration, perhaps these can help you.
Joe Wright:But I call it a house of cards of of of of a of a data platform, because we've had, as you can imagine, and I'm sure this is is the same for a lot of a lot of people, people come and go. They all have their own way of doing things. So what we've ended up with is a sort of sedimentary BI stack where there's some logic and then someone's added a bit of logic on top and so on and so forth. And over time, that becomes quite complicated to manage. So the the opportunity that was spotted with working with you guys was to take all of that history, all that legacy, actually work out what's really relevant for the business today because a lot of it is just noise that's accumulated over time.
Joe Wright:And to create a much more streamlined BI stack with just the logic that's relevant in as few steps as possible to deliver what the business needs today. But on top of that, to build in the maturity around that stack that that prevents us sort of having the same problem build up again in the future.
Aaron Phethean:Yeah. Yeah. And and I think probably, you know, another common problem that I guess people are dealing with in a in a certainly in a similar situation to ours. This legacy infrastructure, this legacy, stack is working to some extent from the outside world. So the management team, the operations teams, they they ask you for data and they ask you to do some analysis, and they then they get an answer.
Aaron Phethean:So from their point of view, it's kind of working. Whereas, you know, the reality on the other side, you know, for you and your team, it's it's hard. Like, it's really difficult, and it's becoming increasingly difficult was was part of what we observed when we started the project. So I think, you know, one of the things I saw us doing for the business was bringing a kind of, obviously, you know, infrastructure cleanup, a, you know, kind of a a better system to work on, and therefore, you know, people free up their time, they then become more effective. From your point of view, how do you go about communicating the benefits of that to the business?
Aaron Phethean:Because from their point of view, it was working, and then there's this project to make it better. So you know, how do they see the benefits? How do you think they might see the benefits in a in a project of this scale?
Joe Wright:You have to find something, I mean, a tangible benefit that they can latch onto is my experience. So just rebuilding the platform, with the same outputs is a tough sell because, well, because it might make my life easier, but they're you know, that's not that's not the aim of my job to make my life easier. It is it is so long as it supports the business, but just for the sake of it, it's not. Yeah. So to get buy in from stakeholders in the business, they have to have a tangible benefit.
Joe Wright:And for us, one of the tangible benefits was, moving towards one version of the truth. So one of the challenges that we've always struggled with in BI is being able to deliver a set of numbers that is always the same no matter which report you look at it in. And because there's all these legacy logic, in the stack, it tends to generate forks in that data where there's one view written for one set of people and a slightly different view with slightly different requirements written for another set. And then all of a sudden, they're looking at reports, and when they compare, they don't match. You know?
Joe Wright:So it's a story as old as date of time, isn't it?
Aaron Phethean:In episode three, Stéphane Burwash and I discussed some of the technologies that go into the heart of Matatika. This story that Stephen tells is a great example of technology collaboration and the open source community coming together to discuss challenges and problems and and come up with solutions. So absolutely awesome experience to be able to discuss these types of issues on our podcast.
Stéphane Burwash:So when you get started well, as I said, I got started alone in my own team with a manager who was incredibly supportive, but he had no experience in the modern data stack in the modern data space. So I was on my own. And so where do you find that support? Where do you find help? Like, my answer is everywhere.
Stéphane Burwash:Like, there is no single solution. There is no that is the Reddit thread you have to be following, and you'll get all the answers from there. I I was scrounging for information. So we met on, the Meltano Slack, which was just the Slack of an open source tool that, I still use to this day, which
Aaron Phethean:That's great. The Slack group's great. The tool is great. The support there is great. Like, you know, if you need a need someone else to speak to, like, they probably know how to solve
Stéphane Burwash:You get emotional and technological support. But, the Meltano Slack actually bred well, we should be on other Slacks. We should be on the dbt Slack, the airflow Slack, but not only Slacks. Reddit. I went on the data engineering Reddit a lot, asking questions, read a lot of books, went a lot to meetup events, in person.
Stéphane Burwash:We actually were so inspired by meet up events that we actually started our own meet up events. So we're now Okay. In charge of the PyData Montreal and the MLOps Montreal meet ups. The one piece of advice I would give to anybody that is just lost as lost as I was and still am to this day, in the data space is ask questions. A
Stéphane Burwash:people Yeah. Are scared of asking questions on the DBT Slack or anywhere they feel like they'll be judged. They feel like their question is stupid. And I feel, very strongly about this because everybody else within that space was exactly where
Aaron Phethean:Exactly.
Stéphane Burwash:You are now, which is you have no idea what is going on. And the beauty about is there's so many tools out there, and there's so many methodologies that you are constantly put back in that position of having
Aaron Phethean:no clue.
Stéphane Burwash:And everybody has been in that position, which makes, in my mind, everybody incredibly generous with their own
Aaron Phethean:I do find that. I think the community culture is very strong. Maybe I was gonna say at the moment, but I just I do feel that that, you know, you can ask questions. You don't get, you know, called out for being, you know, stupid or not knowing. Because everyone they said everyone was there.
Aaron Phethean:Everyone didn't know one time. And, maybe maybe a little bit more so in the open source world, in the kinda new tools world because well, basically, everyone remembers that because it was just the other day. You know? It's it's it's all new. That's that's cool.
Aaron Phethean:Huge thank you to Bethany Lyons who joined us on episode four. Now Bethany is a long time product manager and, you know, a large visualization company and turned to data consultancy and working at the coalface with with real data. The idea in the beginning was that we would almost have a technology roast of, you know, Manateeke and is it any good? And, you know, she's she's knowledgeable in the industry. I I, you know, look forward to that, but also is a little bit apprehensive.
Aaron Phethean:The end result was was much better than I could have imagined. One of my key takeaways from the discussion with Bethany is is her focus on pragmatic solutions and getting the job done. And a lot of times, that is the most important thing for a business, especially in analytics. You don't need perfect. What you need is enough to make a good decision.
Aaron Phethean:I hope you enjoy these highlights as much as I enjoyed recording
Bethany Lyons:them. There's a pragmatic side of things as well. Like, it's nice to be like, oh, in the ideal world, you would have this as an input field in your system. And it's like
Aaron Phethean:yeah,
Bethany Lyons:that's a it's an expensive IT project and a huge change management, like transformation, like,
Aaron Phethean:yeah,
Bethany Lyons:and that's kind of the way I always see data is like shadow IT in a way.
Aaron Phethean:Yeah, because
Bethany Lyons:there's so much, there's
Bethany Lyons:so much you can fix with broken IT systems in a kind of, you know, not an ideal way, but like, in a good enough way, by just hacking together data behind the scenes from multiple systems. So I actually was like speaking to the head of Treasury yesterday. And I was like, today, your shadow IT building all these access databases, like, let us, the central data team be your shadow IT department. So you you be the business we'll be the shadow IT team, and then the IT team can be the IT team.
Aaron Phethean:Yeah, that that resonates with me so much. I think the one place my mind goes to when you work on transaction, your record keeping systems, you often completely forget about the downstream reporting. And you're even a product manager of a of a a data processing system. You you think your job is to move a item through its its life cycle. But your job doesn't end there.
Aaron Phethean:It has to be the reporting output as well. So Yeah. The kind of shadow IT nature is that the poor old data team is coming along afterwards and trying to collect together what happened and the information they need. And Yeah. You know, that that's just repeated over and over because people don't think about the end.
Bethany Lyons:They don't
Bethany Lyons:think about it at all. Yeah. Yeah. And and this happens, like, when I was in hotel tech, we built a property management system and I worked on the reports. And it was always, like, just fighting fires because feature teams would wanna do stuff like enable you to uncancel a reservation.
Bethany Lyons:And I'm like, oh my god. Don't implement that. You will break my pickup report. Like, it will be nonsensical.
Bethany Lyons:And you you have to, like, anticipate every possible eventuality. It's, yeah, it's bonkers.
Aaron Phethean:In episode five, Adam reframed data quality as a company wide challenge. And he took what was fragmented systems and data silos and, you know, applied his experience in getting teams to work together and teams to care about the data. His work broke down data silos, got teams working together, and created a trusted source of data through that collaboration. Adam reminded us that data quality isn't just a data team problem. It's a shared responsibility.
Aaron Phethean:Let's dive in and hear what Adam has to say about data quality. A way of talking about it because a lot of people talk about it as a data quality issue, but actually, it's more like a data team to rest of company interface issue. Like, you know, if people don't even know that downstream there's a data team to worry about, Of course, the quality's gonna be, you know, and, you know, it's kinda how it surfaced, but that's not exactly where it starts.
Adam Dathi:Yeah. Yeah. I I think that this is a lot to do with the company's internal ways of workings rather than necessarily, like, a data quality problem. Data quality now makes it sound like it's the data team's responsibility, and there is a responsibility to make sure that the data's good quality. But it's it's not specifically a data problem.
Adam Dathi:It's a company problem that surfaces via the data, and it's just usually lack of alignment. And I do think, generally, in lots of places I've worked in, data quality is an issue, and I've seen different levels of investment in trying to support it and trying to correct it. Quality is something that I think in most teams that I've seen, they could do they could be doing more to monitor the quality of what they're producing. Also think that SQL itself doesn't lend itself well naturally to things like unit tests. There's certain best practices that would be there if you were from more of a development background, a software engineering background.
Adam Dathi:And it's taken a little while for it to start to migrate down into more of the the analytics world.
Aaron Phethean:Yeah. Exactly. It tends to be, you know, characterizing a little bit, it tends to be that the data team is expected to follow fast and just sort of put up with what they get. And, you know, that is obviously really hard. That's a really hard place to be because either you're pressed for time or you've got no ability to change your inputs.
Aaron Phethean:So, yeah, that definitely hampers what you can, you know, produce. You know, that that's a that seems to be a, you know, a data team slot in the world. And like you said, actually calling it a data quality issue almost lands it in the data team, and that's that's kind of unhelpful in its own way.
Adam Dathi:Yeah. Yeah.
Aaron Phethean:In episode six, Nick Bromley and I talked about transport data and some of the things that we'd explored over the last few years. Nick aims to tackle an issue that is quite surprising, transport data that's nearly a century out of date, point in time data that just had there's no resemblance to what transport requires today. And yet, we're still trying to make long term and short term plan decisions decisions off that date.
Nick Bromley:Well, yes. I mean, but you see, we've we've had decades of of this very problem and how we solved it because, you know, modeling and collection of transport data has has been more of a science than guesswork probably since the mid to late nineteen seventies. You know, a lot of
Aaron Phethean:lot
Nick Bromley:of modeling, work in transport sector so that you can trace it back to sort of seventies and certainly the eighties. So it's not like we haven't had forty years at, you know, at at this. It's just that we've been using lots of samples of data, and we've been trying to sort of work out, you know, what we're looking at, a bit like sort of feeling feeling an elephant in a dark room. You know, we sort of feel bits of it, and then draw conclusions as to what sort of beast it is. And, and that's become a, you know, a whole industry in itself is, you know, what what data we're gonna use here and how reliable is it, and what's the modeling tool we're gonna use here, and how reliable is is it.
Nick Bromley:And and and these have become accepted norms, you know, both within the public sector, in transport planners, highway engineers, etcetera, but also the consultants who work for them. And, yeah, that's not a bad thing. Yeah. We had we had
Aaron Phethean:to do
Nick Bromley:something, as I said, because of the guesswork we were doing almost pre pre since pre war.
Aaron Phethean:Yeah.
Nick Bromley:But the world's, again, moved on. You know, there's yeah. We've almost fifty years since a lot of this started to change, and now the big change coming again.
Aaron Phethean:And I think, you know, talking about what's coming. You know? I think it's now so at at one time, it was impossible to have the full picture or even process the full picture. Yeah. Even if you had collected it, it would have been impossible.
Aaron Phethean:With AI and the investment in big data processing technologies, cloud infrastructure, it is possible to process it. It is possible to have the complete picture. It might not be available yet, so we might not be collecting, you know, sensing, like, we actually have gathered the the full picture, but it is possible. Does that mean it's desirable? Like, do you see do you think AI and this kind of mass processing, do we need the full picture, or is it kinda good enough to just stick with the and we're making a an analysis of making an informed decision.
Aaron Phethean:We're making a good enough decision. Do we need to have the full picture?
Nick Bromley:Well, of course, that's a sort of catch 22. I I mean, basically, you you need to re benchmark the data we already have out there by building a new model with a new set of data, and then you can see how wide of the mark you are currently. If it I mean, it might turn out that certainly some parts of the network are pretty much optimized, but you might find there are some real disparities. But but I'd also say you don't need to keep doing this. This isn't sort of a a dashboard that you sort of sit there like some godlike being and on a daily basis, re reconfigure all the connections nodes.
Nick Bromley:You know, you, I I think it's a sort of a onetime, and then you let it settle down for another five, ten years. Well, frankly, you know, the public purse can't afford to keep reengineering, certainly, the railways. Mhmm. But the bus networks, you know, there is quite a lot of flexibility and capacity that you could be retweaking fairly frequently.
Aaron Phethean:In episode seven, Muteza and I discussed AI in-depth. Muteza is building an AI platform in a regulated banking environment. Now there are unique challenges to that environment. There's regulators and there's, you know, a need for, guardrails and and data quality systems that, you know, aren't perhaps necessary in, you know, simple product recommendations that that aren't regulated. I really enjoyed this discussion.
Aaron Phethean:And it has a had the opportunity opportunity to share his experience from, previous roles, but also their experience of of what they're building now. Brilliant episode. Really great to get to know Mataza and and talk about the challenges that he faces.
Murtaza Kanchwala:Yes. Initially, there were internal use cases where we created, multiple, AI assistant or AI bots, with the vertical knowledge stores and then some, ability to generate some content if then if they wish to. But then it did go to the customer. So the initial use cases were indirect to the customer. So the internal employees would deal we were engaging with these, bots and then having a human touch with the customer.
Murtaza Kanchwala:And then we started looking at exposing these interactions directly with the customer. That took a while because we wanted to make sure, it's has the right guardrails. It's matured enough. The conversations are right. There's lots and lots of testing you need because every time a response would be different potentially.
Murtaza Kanchwala:We also had to look at, you know, how do we actually store the conversation history because the conversation could get quite longer. Right? And and and those conversation history, then could cost you a lot of money just to have one one one conversation due to the token limits. So, those did happen, but we had to go through a mature cycle to understand this this, use case will work. It doesn't hallucinate.
Murtaza Kanchwala:It has the right guardrails before we actually went into production. And we did go into production, through, for some of the social media channels like WhatsApp, Facebook, and Instagram.
Aaron Phethean:Yeah. Yeah. Yeah. Yeah. I I also found that testing and the guardrails a challenging aspect because it's actually quite counterintuitive to any kind of normal testing.
Aaron Phethean:You know, you very much expect an input to generate an output and and and to test that. Whereas, you know, with Gen AI, you can have an input and an output and then do you test again, same input, different output. And that's just, you know, a concept that's quite hard initially to, like, get your head around, well, a, how to deal with it, but also why that might be the case. And, you know, same with the guardrails, you know, you kind of don't really know initially how to test it and go how to go about making sure that it's doing the right thing. And and we what what did you do in that regard?
Aaron Phethean:Did you do it all internally through your own research team? Did you bring in externals? Like, you know, these are the early days where no one really knew. So, yeah, how did you overcome those challenges?
Murtaza Kanchwala:It was mainly through internal research teams. We did engage, you know, one external research organization, but most of the engineering actually, pretty much % of the engineering we we looked at, internally. And you're you're absolutely right. It's it's it's fairly new. Right?
Aaron Phethean:It's it's fascinating to see how GenAI is growing and and, you know, the hype around it. And one of the things I I kind of wonder often wonder, you know, have we reached the peak already? You know, are we are we, you know, starting to see all the use cases we will see? What's your opinion? Is it is it are there more things it can do in financial services or or in the wider world that that we haven't even dreamt of yet?
Murtaza Kanchwala:I I wouldn't say we've reached the peak as of yet.
Aaron Phethean:Well, that's a wrap. Season one exceeded my expectations. Many more visitors, many more views than I had imagined. So thank you everyone for watching and and being part of our journey. Discussing these different challenges with data leaders was the point, and I'm delighted to say that season two is now underway.
Aaron Phethean:We're recording right now, and there'll be more of that. There'll be more conversations with data leaders just like you who are experiencing challenges, solving problems with people, technology, processes. Right? You know, this is this is all part of what goes into making a data team work or or or or not and deliver that value to companies. Season two guests, we have leaders from AWS.
Aaron Phethean:We have leaders talking about quantum computing. You know, this is one of the special things about data matters, I think, is that we're exploring topics around data and, you know, for me personally, it's a fabulous chance to get to discuss and and hear about topics that I wouldn't otherwise, you know, have an opportunity to. So looking forward to season two. Very much, appreciate your support. Click subscribe so you can see each new episode of series two, when it drops.
Aaron Phethean:Yeah. Thanks very much.