4 - How to Transfer Projects
Michael Berk (00:00.994)
Welcome back to another episode of Freeform AI. My name is Michael Berk and I do data engineering and machine learning at Databricks and I'm joined by my co-host.
Ben (00:09.116)
Ben Wilson, I modify builds at Databricks.
Michael Berk (00:15.502)
Cool. So today we're talking about transferring code responsibilities. This is something that I'm currently working on, and we do it all the time in our day-to-day jobs when we transfer ownership of something that was built. You might be receiving this code, you might be giving away this code, and it also doesn't have to be code. It can be project ownership, can be team responsibilities, you name it. So let's say we've built a solution.
Let's call it a machine learning model, very on topic for this podcast. And we're ready to hand it off. So if you were the team that is handing off this AI model to another organization, what is the checklist that you would go through to ensure that it's in a receivable state?
Ben (01:09.788)
What a great question. Now that we're talking about models, I think a lot of organizations do this, right? They have a data science team that's building a model artifact and they need to transfer that to a machine learning team of either MLEs or software engineers. And they have to do production hardening of this thing. And there's a difference between: are you just handing off the artifact by itself, along with some spec about what
the inference data is that we need in order to get a prediction out of this? Or is it that they built something in a notebook and they want to transfer it to a team to make it into production grade modeling code that can be autonomously retrained? I think there's like
Michael Berk (01:58.542)
Let's limit it to: there's already a production solution, because doing the MLOps, ML engineering productionization is sort of a different topic.
Ben (02:10.172)
So our use case is hand a notebook off to a team that's going to production harden that.
Michael Berk (02:17.464)
No, I think our use case should be, it's already in production.
Ben (02:22.116)
And you want to transfer ownership to be like, hey, another team is taking responsibility of this. And we're assuming that an MLE team built this solution. Okay. Yeah. Perfect. Hand off checklist.
Michael Berk (02:33.058)
Yeah, exactly.
Ben (02:41.73)
It really depends on what the nature of the project is. So do you have your initial project designs that explain what the heck did we build here? Why did we build it? What problem is it solving? That's super important to hand off to another team so they can understand it before diving into the code to see how it was implemented. So that's one important checklist item. Another item I would say:
What is the state of CI? Like, how much test coverage do you have? And this isn't going through and getting an automated test coverage report and saying, oh, we have 70% test coverage in all modules. Sure, you can use that as a starting point. But those automated test coverage numbers aren't a panacea for every
scenario, and there's plenty of examples where, depending on how you architected your code, it's irrelevant to test certain utilities if they're not meant to be public facing. They're just internally used. Just test the thing that's using that, to save time. So you need to know: do we have good test coverage here? And also a history associated with that, of how stable it's been over time.
Is this something that needs patching every week, or is this something that we've just improved over a period of six months, with 30 features that have been added and maybe 10 bug fixes that we've had to do? The assuming team would want to know: what maintenance burden am I bringing onto my team by taking over this code base?
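For the coverage starting point Ben mentions, here is a rough sketch of pulling that number programmatically with coverage.py and pytest, assuming a Python project; the package and test paths are hypothetical placeholders.

```python
# Hedged sketch: measure coverage of a hypothetical package while running its tests.
import coverage
import pytest

cov = coverage.Coverage(source=["my_pipeline"])  # "my_pipeline" is a made-up package name
cov.start()
exit_code = pytest.main(["tests/"])  # run the existing test suite in-process
cov.stop()
cov.save()

total_percent = cov.report()  # prints a per-module table and returns the total percentage
print(f"pytest exit code: {exit_code}, total coverage: {total_percent:.1f}%")
```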
Michael Berk (04:37.272)
Yeah, that makes sense. So project design and then sort of stability metrics slash infrastructure, which would be number of recent issues, and then test coverage.
Ben (04:50.714)
Yeah, and that project design should cover who's using this, why are they using it, how are they using it.
Michael Berk (04:58.592)
Okay, cool. So that makes a lot of sense if your teams have one-to-one skills. My question is, how do you handle it when, and I'm specifically coming from a consulting angle, the person or organization that I'm working with may have a different skill set than the team that I'm from? So how do you think about knowledge transfer? And I'm sure it also happens within the same organization between two engineering teams.
As a very extreme example, one could be a front end team, one could be a back end team. And if you hand off a code component, maybe there aren't the same skills between these two teams. So how do you think about bridging that gap?
Ben (05:44.656)
For the latter suggestion, for software engineering teams, that would probably never happen unless you're all within the same domain. Like, if I work on a team that predominantly does backend engineering, we do have front-end components. We have full stack engineers. Some of us have learned enough front-end stuff to be a little dangerous. We can implement some small features, but,
with the exception of the two people on our team that are really good with front end, the rest of us are like, we can fix bugs and build some UI components. That's about it. We're not going to design a whole new page from the ground up and get that built in a reasonable amount of time. We just don't have the background for that. So if you're talking about
Back end engineers.
Michael Berk (06:36.524)
Yeah, actually, let me make it really specific. I see this a lot. There's a GenAI team that's more research focused, and then they hand it off to an ML, like a more traditional ML team.
Ben (06:48.762)
That would probably not work too terribly well. It's all about context. So my team predominantly focuses on backend engineering for an open source project, or a number of open source projects. We have a certain way of doing development, a way of doing maintenance. We have things that we have to think about that are not the same as a team that does similar backend engineering work at our company; they
have a different process associated with it. You could get somebody up to speed on that by sitting down with the team, handing over a bunch of documents, answering questions. That does happen, but it's usually a long process that might embed people from the old team into the new team for a quarter, just to make sure they're ramping everybody up on, how do we do maintenance on this thing?
How do we check for failures, and where in the code would this bug potentially be happening? So that's one way to do that. But you wouldn't do something like that across department boundaries. Like, if we assumed code from, say, Spark Core and they're just like, hey, you guys own this entire implementation for how the driver communicates to the workers in Spark.
Could we figure that out? Yeah. Could we maintain that? Sure. How long would that ramp up process take? We're talking, that other team has probably spent years just working on this whole concept and they have so much ingrained knowledge about how does this, like how are features applied to this? What are all the things that we have to think about when we build something? How does this interact with the broader ecosystem of our platform?
It would take...
Ben (08:54.094)
six months to a year to get up to speed on something like that. It's much more effective to split that team out and say, okay, we're taking three engineers, we're gonna hire two new people, those three are gonna cross train and they're gonna own this. And then the other team is gonna hire the replacement people and they'll get kind of brought on board to the other components that they're owning.
So you can't underestimate the power of having embedded people to get you through that painful period.
Michael Berk (09:20.632)
Got it.
Michael Berk (09:29.294)
What's a model that you've seen work for that? Like, start off, like, take two people, put them half time in this new team for two months and then a quarter time for another month and then leave? Or fully switch teams for a year, hire someone? Like, what do you recommend in these scenarios?
Ben (09:50.268)
I can tell you how we do it. So we do it when we're building something new, like a new product. We create virtual teams, which is like, hey, we have expertise in these fields that are going to be part of this feature that we're building. We need team members from three different teams in order to make this thing work. But
before you start that project, you've already identified what team is gonna be owning this code going forward. And people from that team have to be a part of that virtual team. So the V team concept works really well, because once that code base comes back over to the team that's gonna be owning it, the two or three people that were on that V team, they know that code base.
And they're able to cross train other people for maintenance on the team that's inheriting that. That's like the clean way to do it. Kicking something over the wall. I've actually never seen that work out very well. It does happen every once in a while. We're like, Hey, this team built this thing. It's in your domain. Uh, best of luck. And it's kind of painful when you have to spend a little bit more time troubleshooting, diagnosing, debugging.
Michael Berk (10:47.662)
That's super smart.
Got it.
Ben (11:12.11)
understanding how this thing was built. But with professional software engineers, you're going to figure that out in a short enough period of time, so long as it's within the domain that your team typically works in. If it's something completely foreign, that would never work out. It's just a recipe for terrible customer service and support. And you might add features to that code base without fully understanding how everything works. You could introduce all sorts of regressions.
It's a bad, bad time.
Michael Berk (11:44.942)
So, a follow-up question to that. How do you think about splitting knowledge transfer between offline documentation versus institutional slash tribal knowledge?
Ben (11:56.164)
I think that if you're transferring something to somebody else and you're not going to be involved in maintaining it, everything has to be documented with context about why the decisions were made to implement things in certain ways. That doesn't mean put comments in code; that's super annoying to read through. It's more like, here's a document that I've crafted that explains the decisions about why this was made and how it works,
to give people a primer before they start reading the code, to understand why things were built the way they were.
Michael Berk (12:35.426)
Yeah. Okay. That's super interesting. I've noticed at Databricks specifically, but also in a lot of customers, there's a big difference in what people think about when they think documentation. So there's a pipeline that I'm currently working on. We're doing a production rollout on Monday. The dev rollout went great, but everybody's like fingers crossed for production. And the reason everybody's fingers crossed is six different people have worked on the same pipeline.
There's legacy code in there that cannot be called, and there's no documentation of why stuff was done. There's a lot of what, like explaining a potentially complex implementation. There's docstrings, but there's no clear understanding of like why a cluster was selected or something like that. And so when we make changes, we're not quite sure if stuff is going to break. We now have to understand the full problem end to end. The pipeline isn't crazy. Like we can do that in an hour or whatever.
But for bigger systems, that's really, really dangerous. So I really strongly echo that decision context should be really well documented. I would argue the why is more important than the what.
Ben (13:46.812)
Exactly. We don't do the what internally, ever. Sometimes there's just a high-level thing like the project plan: what is this thing, why does it exist? That's important. But implementation details about what was built? That's just the code. You should be able to read the code and understand what the heck is going on.
Hopefully it's written in a way that's intelligible and you don't need a whole bunch of tribal knowledge to understand, oh, that's why this system was used, or this library was used for this because of X, Y, Z. If you need to explain something like that, where a reasonable person coming in and looking at the code would have a question about it, like, why did they not use this library, which makes this a super simple thing?
Why do they have to implement their own version of this? That should be notes in code. A warning to future developers. Hey, we tried using this. This doesn't work because of this. So we had to do this.
So then people reading that will be like, okay, thanks for the context. No problems here. Hope I don't have to maintain that or hope I don't have to change that.
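As a purely hypothetical illustration of the kind of warning-to-future-developers note Ben is describing, it might look something like this; the library name and the reason given are made up.

```python
# NOTE for future maintainers (hypothetical example):
# We tried using `some_simple_library.summarize()` here, which would make this a
# one-liner, but it could not handle our multi-gigabyte inputs without running out
# of memory, so we aggregate the data in chunks below. If that library ever gains
# streaming support, this whole block can likely be replaced.
def summarize_in_chunks(records, chunk_size=10_000):
    totals = {}
    for start in range(0, len(records), chunk_size):
        for key, value in records[start:start + chunk_size]:
            totals[key] = totals.get(key, 0) + value
    return totals
```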
Michael Berk (15:08.142)
Wait, this is clicking. Yeah. Fingers crossed. But yeah, no, this is clicking super hard for me. So the what is the code. You should just write good code. Make it readable, self-documenting, et cetera. The thing that is not always evident from good code is the why. That needs an external document. Are there other things, I mean, if we go through when, what, who, how, where, like those question words or whatever
part of speech that is. Are there other things that you think need to be documented? Because we've covered the why, we've covered the what. Who is kind of irrelevant?
Ben (15:47.184)
The who is in your Git commit history.
Michael Berk (15:48.748)
Yeah. And when? Git commit history. The how is... what? Yeah.
Ben (15:52.006)
Git commits.
Ben (15:56.88)
The how is the implementation itself. Like, how did we build this?
Michael Berk (16:02.51)
That's actually an interesting point. Like theoretically, dev practices would be shared within the organization. But for me as a consultant, that is not always the case. So often we do have to educate on the how, like what is a sprint? What is a PR? How do you review? How do you make comments nice and not hurt people's feelings? How do you be very direct with your feedback? What do you nitpick? What do you not? So I think that's actually something that in the field we struggle with a ton.
which is how to communicate the how. How did you do it when you were in the field?
Ben (16:38.536)
That was always through discussion sessions with people. If I need to impart things that I've learned, it's not in a dictatorial fashion. It's more of, hey everybody, I'm not saying this is the best way to do something, I'm just saying this has worked for me in the past. And then we can talk about things that haven't worked, and impart that knowledge
gained through experience.
That's always resonated very well. And maybe it's after we've done the implementation and we have an extra day. How do you spend that day? Do you just sit around and talk about stories, or are you like, hey, I'm done a day early, I'm peacing out, I'm going to go for a six hour lunch? I would never do that. I've always been like, hey, let's have a recap of this project, and
Sometimes questions like that come up from that dev team. They're like, well, why did you split your PRs this way? You couldn't test this end to end. Why did you do this one thing and then write some unit tests on that, and then your next PR comes after that, using that functionality in the next phase of something? I'm like, well, otherwise I'm just doing waterfall development. And here's
a 50,000 line PR. Are you going to read that? Are you going to understand all the nuance in there? It's just too much information for the human mind to comprehend. So giving that level of feedback to them, saying, here's how I do it, because my brain doesn't work like a computer, so this is how I limit the amount of information overload I'm exposing myself to.
Michael Berk (18:35.83)
And also that virtual team concept lends itself to this really, really well, where you show them by taking some tickets, and then delegate other work and then review. So that's a great way to get the how adopted as well.
Ben (18:48.602)
Mm-hmm. Yeah, you've got to chunk up work for anything that you're doing. Make sure that the right people are working on the right things. If you understand this part of the project, you should be the one building that. You don't take somebody who has no context and ask them to build something that they don't really understand. It's just going to be super frustrating for them. They're going to learn stuff, but it's just going to waste a lot of time
and burn out developers.
Michael Berk (19:23.928)
A problem that I'm currently facing is I've been working with very small organizations.
Michael Berk (19:30.926)
Our services group is bigger than their entire dev team. Like, we have 10 engineers working on this at once. So how the hell do you transfer to them?
Ben (19:42.946)
You need a liaison with that team and you should be embedded with them. So it's not, hey, we're coming in and we're just going to build this for you, and we're going to hand it over at the end and it'll work, we have all the testing set up and we know this is going to work. They're not going to be able to maintain that. They're going to look at it and be like, yeah, thanks, I guess. But how do we change stuff in this without breaking everything? That, unfortunately, is how that handoff is
typically handled through consulting. Instead, you should be embedded and working with them every day, having them do some work, and they should be reviewing every PR. Not because they're necessarily going to be providing expert guidance on implementation problems. It's more that they're just becoming familiar with that code. They're understanding what is going on in that whole chain of PR reviews.
They might be asking questions that are irrelevant. It's fine. Don't make them feel dumb. Make them feel included. And in the process of...
them looking at that back and forth that happens between the tech lead or more senior engineers versus the more junior engineers, they'll start to understand and just absorb like, these are the things that I should be thinking about. I saw the first commit that they did. And then the tech lead said, no, we shouldn't do this. We should do this instead. You get context for like why things are built the way they are. And you can kind of learn from that.
Michael Berk (21:27.086)
Yeah, I'm worried that the team might even be so small that they can't. Like even if we did perfect knowledge transfer, one person couldn't extend and maintain, which is not true, but yeah. Okay, food for thought. Other question. Let's say I, the team that has built the solution, has built an abomination. What do I do?
Ben (21:49.628)
Do not transfer that until you fix it. This happens all the time.
I wouldn't say it's particularly common at Databricks that this happens, because there is, you know, a tech lead review from the assuming team that will look through that and be like, yeah, we're not taking ownership of this until it's fixed in these ways; we don't have any context about why this was built this way. So you effectively do your own PR review on a code base, and it might take several days to go through it all,
and you'd have this huge list of questions that you'd ask. And we wouldn't do it in an antagonistic way. We would just say, hey, I have questions about this just so that our team understands how this works. Can we have a meeting where we discuss why this works the way that it does? And in that meeting, you're just taking notes for the rest of the team to read later on so that everybody has a point of reference.
And then you discuss it as a team together and be like, does this make sense? Oh yeah, now that makes sense. Okay. Yeah, we're good. You know, it's important to have that. But if you're doing consulting work and you've built something, and you're like, hey, everything works, we're all green on CI, we ran it in staging, we're getting everything flowing through the way that we want, this application works,
and you don't have any documentation about why you did it the way you did, nobody on that team was involved in that build process at all, and you just hand over a solution like, hey, your job's configured, good to go, we're done, consulting job checklist checked off, and you walk away. You could be setting that team up for not being able to maintain that.
Michael Berk (23:49.438)
Oh my god, yeah. I've done that so many times, unfortunately. You try not to, but there's a lot of reasons why that can happen. But question to you: how do you know that there has been a successful handoff? And the proxy that I've been using... I went and checked back with one of my old projects, and I had spent so much fricking energy and time doing knowledge transfer,
like really gave it my all. And throughout the entire project, the team submitted zero pull requests to the repository. I built 100% of it. And there were two resources that were supposed to be working alongside me. And I just checked again. We literally did an on-site where we had four PRs partially ready, and they just had to review and merge them. They're still not merged. There's not been a single commit since I left.
So that's a pretty clear indication that knowledge transfer was not successful. Or, well, yeah, potentially. That's a clear indication that they have not taken ownership, regardless of... Like, the point is not to point fingers or anything like that. It's just that the result was not achieved. Like, I wanted them to be able to build and maintain the solution, and they're not building and maintaining. So...
Yeah, how do you think about evaluating whether a team is able to do that prior to seeing them do it?
Ben (25:26.428)
So, my question. And I've done these when I was doing consulting at Databricks; I would do this sometimes without charging the account. I know that's a bad thing, but it'd be like, hey, we're six weeks out from when I left, let's have an hour long meeting, just a virtual call, I want to see how you guys are doing. And they always sign up for that. They're always like, yeah, let's do this, this is awesome. And my first question would be like, hey, let's...
Michael Berk (25:35.886)
Mm-hmm.
Ben (25:56.742)
Let's look at the last two weeks of runs on this thing that was supposed to be running every day.
And luckily, on our platform, you can just go into the jobs dashboard and see a great visual representation of success or failure, of what was going on. And if I see just a sea of green, then I go and say, what other features have you built into it? And sometimes they'll be like, yeah, we've merged like 30 changes, we built this new functionality that, when you were here,
remember we were talking like, maybe we'll do this in the future? Yeah, we did that. And do you want to see the code? I'm like, no, I'm good. It's passing, everything's good. Do you have any questions for me? Sort of thing. But I've also done the six-week check-in where you show up and there's, you know, four days of just red and then it's just green again. And I instantly know they fixed a problem. They figured it out
and they recovered the job. Sweet. They own this code now and they fully understand what's going on. I don't need to press them for anything. But I've also seen it where it was green for five weeks and then it's just been red ever since. And they're kind of panicked and they're like, do you have some time to go through this with me? And those are the ones where I knew, okay,
they were faking that they understood what was going on. And those are like my early-day ones where I was like, I need to change my approach here. This isn't working. I can't just deliver code. I need to make sure they understand what's going on.
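For that kind of check, something roughly like the following could pull recent run results from the Databricks Jobs API instead of eyeballing the dashboard. This is a hedged sketch: the workspace URL, job ID, and token are placeholders, and the exact endpoint and response fields may differ by API version.

```python
# Rough sketch: list recent runs for one job and print a success/failure line per run.
import os
import requests

HOST = "https://example-workspace.cloud.databricks.com"  # placeholder workspace URL
JOB_ID = 12345  # placeholder job ID
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers=headers,
    params={"job_id": JOB_ID, "limit": 14},  # roughly the last two weeks of daily runs
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    state = run.get("state", {})
    print(run.get("run_id"), state.get("life_cycle_state"), state.get("result_state"))
```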
Michael Berk (27:44.79)
What are the predictive signals for whether they will be able to build and maintain prior to that six week check-in?
Ben (27:52.24)
I mean, you can test them. And I did, a number of times with some teams: intentionally introduce a bug into the code, like a week before I was going to leave the project. And it would break CI, things would be all messed up. And do they recover that before the next run? Something that's in staging, you know, you don't do that in prod, but something that's like, hey, we're in development mode right now, we're doing some testing.
Michael Berk (27:56.206)
Hmm.
Michael Berk (28:00.014)
Ha.
Michael Berk (28:16.334)
Mm-hmm.
Ben (28:22.208)
And just check in something that I know is going to blow it up in kind of an obvious way. And then see if they can figure that out and revert that commit. Do they know how to properly do that with Git, or do they just go in and file another PR that deletes that one line that was creating an issue and leave some comment or something? Or are they hostile with me, like, why did you commit that? And I'm like, I was just testing you guys.
They're like, really? Yeah, look at the PR, it's obviously intentional. And they're like, can you not do that anymore? We understand how to fix things. So sometimes you get that response and I'm like, sorry, sorry, I just needed to make sure that you understood how to troubleshoot this. And other times it was more that they could not figure it out and they just started panicking, and, you know, I'd log into my computer Monday morning and I'd just have tons of Slack messages and emails and
Michael Berk (29:02.53)
Hmm.
Ben (29:21.658)
requests to jump on a video call immediately. And then I'd know, okay, they didn't understand this. And I'd use that as an opportunity to teach them how to troubleshoot something.
Michael Berk (29:35.256)
Got it. That's really smart, yeah. I can see how people would be pissed as hell though. That's funny.
Ben (29:40.186)
Yeah. I mean, if you're going to do that, by the way, you always let the manager or the director know you're going to do that beforehand. You're like, hey, I'm going to test your team. Are you cool with this? And every single time I've asked that, they're like, hell yeah, do it. Put a bug in the code. I want to make sure that they can figure this out.
Michael Berk (29:49.282)
Mm-hmm.
Michael Berk (30:01.72)
That's great. Yeah.
Ben (30:03.622)
You don't just do that on your own though, because that's how you get a very painful conversation with someone who's an executive at the client, who's like, why are you building bad code for me?
Michael Berk (30:15.01)
Yeah.
Ben (30:15.91)
Do you think this is a game sort of thing? But there were a handful of times where I had like managers or directors just try to say like, don't do that to my people. I'm like, okay, I won't.
Michael Berk (30:18.424)
Kinda.
Michael Berk (30:29.858)
That's weird though, but...
Michael Berk (30:35.394)
Every company has a different culture. And I guess if the team is already really, really busy, that would probably be the excuse from a manager, right? They would say, my team doesn't have time to fix an intentional bug.
Ben (30:47.802)
Yeah, the people that would refuse something like that would usually never give a reason. They would just refuse, and I was like, okay, this person is just hoping for magic, that this thing never fails. So they probably never built anything themselves and never had to maintain anything.
Michael Berk (30:52.664)
Got it.
Michael Berk (31:00.29)
Yeah.
Michael Berk (31:05.45)
Yeah, that makes sense. Is there a difference in how you approach knowledge transfer as it relates to maintaining versus extending a feature?
Ben (31:16.642)
For sure. When you build something, you should know: is this something that is going to be under active development? Did we just build an MVP here that we know needs these other 30, 40, 50 features to make it actually good? If you're going to transfer something like that, you need to
approach the knowledge transfer from more of a design perspective so that they have context. It's not giving implementation details; that's kind of insulting to most software engineers. It's more like, hey, we've had these feature requests and here's kind of how I would approach it, just in very abstract terms. Like, I'd probably have this module here and it would have
these API contracts associated with them. But if you're handing off something that is like, Hey, we built a CLI tool that does these things and it's within your wheelhouse for you to maintain stuff like this.
It was designed to do this one thing. It does this one thing. Are you going to have to maintain and change it? Probably, because underlying systems change all the time. But you might look at that code base and see, oh, it's had like 50 lines of code changed in the last year, because it just works, and it's doing this one isolated thing. So in that case, you wouldn't do this full handoff process.
You would just give them the document: here is what we designed and what this thing is supposed to do, here's how people would use it, and here's the code. Have any questions? Let me know.
Michael Berk (33:09.272)
Got it. OK, cool. That makes sense. And then we're going to keep this episode short and sweet. But I wanted to conclude with a very common topic, which is upskilling for generative AI. It's just a fast moving industry. A lot of people have schooling that does not relate to generative AI. And typically, ML teams inherit these responsibilities. So you really know XGBoost, but do you know what an LLM is and how it works?
Let's just take turns maybe shouting out three tips each about upskilling for GenAI, specifically as it relates to maintaining and extending code, and specifically as it relates to having a traditional ML background or more of a software engineering background.
Michael Berk (33:57.56)
Sound good?
Ben (33:58.234)
Mm-hmm. You go first.
Michael Berk (33:59.95)
You want to kick it off? All right. My first tip is, GenAI is a lot more like software engineering than traditional ML. The way that I've seen traditional ML skill sets leveraged really effectively is on the eval side. So really thinking critically about how to, A, create an eval set, B, slice and dice it, and C, understand whether it's going to be representative and generalize to other use cases.
That's where traditional ML training comes into play. But beyond that, it's sort of like software. You just stitch together a bunch of things, try it, hopefully it works.
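As a loose illustration of that eval-side work, here's a minimal sketch of scoring a model over a small eval set, where `generate` and the grading rule are stand-ins for whatever model call and metric a real project would use.

```python
# Hedged sketch: run a tiny eval set through a model function and report a pass rate.
from typing import Callable

def run_eval(generate: Callable[[str], str], eval_set: list[dict]) -> float:
    """Return the fraction of examples whose output contains the expected keyword."""
    passed = 0
    for example in eval_set:
        output = generate(example["prompt"])
        # Naive grading rule, purely illustrative; real evals would use richer checks.
        if example["expected_keyword"].lower() in output.lower():
            passed += 1
    return passed / len(eval_set)

if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        return "Paris is the capital of France."  # stand-in for an LLM call

    eval_set = [{"prompt": "What is the capital of France?", "expected_keyword": "Paris"}]
    print(f"pass rate: {run_eval(fake_model, eval_set):.0%}")
```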
Ben (34:40.828)
Yeah, and mine would be the counterpoint to what you said. GenAI is nothing like software engineering, for one point and one point only. The code interfaces are very similar; that's why most GenAI development done today, GenAI application development, is done by software engineers at most companies, because it's interfacing with REST APIs. It's building,
Michael Berk (34:50.538)
Love it.
Ben (35:10.488)
you know, client interfaces to an application. It's deploying an application so that it can be hosted by a web service. These are all traditional software engineering things, and most software engineers are pretty darn good at all of that stuff because we do it so often. But the one thing that's different is testing. So when you're...
It's not different in the architecture of testing. Whether you're interfacing with an RDBMS that you're getting data from, in order to verify that the computation within the functions you're creating is doing the right thing and returning the expected data structure, that's identical. That's software engineering unit testing. The difference is integration testing. So whenever you put ML into an application, whether it's traditional ML,
deep learning, or GenAI, you're now dealing with non-deterministic behavior. You're not guaranteed to get the same output every time that you call this thing. And that's the thing that the software engineering folks kind of struggle with a little bit if they don't have that ML background of historically dealing with it. So they end up writing an integration test.
I've seen a few that people have written where they're actually calling out to a service and they're like, well, how do I verify that this thing is actually doing what it's supposed to do? Because every time I call this thing, I get different results. And actually, fun fact, if you look back in our early end-to-end tests for MLflow, when we built our first GenAI integration, which was the Transformers library,
Michael Berk (37:04.046)
I'm fat.
Ben (37:04.494)
our first iterations of tests had stuff like, hey, I have this BERT model that I need to test whether it's able to do text summarization properly. So we had an integration test where we were loading BERT up, logging it to MLflow, loading it back, and then inferring based on data. And we sent a blob of text at this thing. And because of the simplistic nature of BERT, it was,
I think it was like six weeks before we got a test failure, because it was always summarizing it the way that we expected it to. And then somebody bumped the version of the artifact, like created a new version of that BERT model on Hugging Face, and all of a sudden that test failed. We're like, huh, maybe we should not be doing this. Let's just make sure that it's returning any text and that it's of a certain length.
Michael Berk (37:41.432)
Wow.
Ben (38:04.06)
So we had to rewrite that whole test, because we're now dealing with this non-deterministic system and we can't just say, does it contain the word, whatever, apples? Because when it's generating text, it can rephrase things to such a degree that it uses a synonym for that or something.
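A rough sketch of the kind of loosened assertion Ben is describing, checking shape and length rather than exact wording; the `summarize` function here is a stand-in for whatever model call the real test would exercise.

```python
# Hedged sketch: test a non-deterministic summarizer by asserting on structure, not content.
def summarize(text: str) -> str:
    """Stand-in for a real model call (e.g. a logged-and-reloaded summarization pipeline)."""
    return "A short summary of the input text."

def test_summary_has_reasonable_shape():
    source = "A long blob of text " * 50
    summary = summarize(source)

    # Don't assert exact wording or specific keywords; the model may rephrase or use synonyms.
    assert isinstance(summary, str)
    assert 0 < len(summary) < len(source)  # non-empty and meaningfully shorter than the input
```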
Michael Berk (38:17.271)
Am I not?
Michael Berk (38:26.016)
Damn, that's a really good call out. Yeah. I'll piggyback on that. The lack of determinism means that you should try to add guardrails to your implementations. So,
I think the best example is failsafes. Let's say we have a chain, for instance a LangChain chain or a LangGraph graph. There are individual steps that may be a bit brittle. And so if you can have a failsafe where, if the output is null or the output just isn't meeting a quality standard, you fall back to something more deterministic so the entire chain can proceed, that's really, really valuable. Because
with these complex systems, if a component fails 1% of the time and you have six of those components, well, that's actually a lot of failures. So ensuring that there's an alternative route that's more reliable, even if it's less rich, like less semantically rich or whatever, that's a really great way to create stable software.
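A minimal sketch of that failsafe idea, assuming a hypothetical flaky generation step and a simple quality check; the function names and the quality gate are made up for illustration and not tied to any specific LangChain or LangGraph API.

```python
# Hedged sketch: wrap a brittle step with a deterministic fallback so the chain can proceed.
from typing import Callable, Optional

def step_with_fallback(
    generate: Callable[[str], Optional[str]],
    fallback: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
    user_input: str,
) -> str:
    try:
        output = generate(user_input)
    except Exception:
        output = None  # treat an exception from the flaky step the same as a null output
    if output is None or not is_good_enough(output):
        return fallback(user_input)  # deterministic, less rich, but reliable
    return output

# Illustrative usage with made-up components.
result = step_with_fallback(
    generate=lambda text: None,                       # pretend the LLM step failed
    fallback=lambda text: f"We received: {text}",     # templated deterministic response
    is_good_enough=lambda out: len(out.strip()) > 0,  # trivial quality gate
    user_input="reset my password",
)
print(result)
```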
Ben (39:32.016)
Yes, 100%. And that, guardrails,
that's like the ultimate rabbit hole when you really start getting into it. I know a bunch of systems I've seen where people are like, I'm fine-tuning my own model and I'm just going to have a system prompt that instructs this LLM to not respond about these particular topics. And then you get some a-hole who's got a little too much time and creativity on their hands, who's like, I wonder if I could prompt-hijack this thing
and get it to bypass that safeguard. And yeah, if you're good at doing that type of stuff, you can get it to respond and bypass that security feature. So it's all about adding in post-processing. Did it return a result that contains this sort of topic? Do you need to have another, you know, LLM that's checking the state of this to determine, is this safe or not? And then thinking about, what is the latency of that?
How much of an impact is that going to be to my end product?
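To make that post-processing idea concrete, here's a hedged sketch of an output-side check; the blocked-topic list and the second-pass checker are placeholders, not a real moderation integration.

```python
# Hedged sketch: post-process a model response before returning it to the user.
BLOCKED_TOPICS = ["weapons", "self-harm"]  # placeholder policy list

def cheap_topic_check(response: str) -> bool:
    """Fast keyword screen; returns True if the response looks safe."""
    lowered = response.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def llm_safety_check(response: str) -> bool:
    """Stand-in for a second model call that classifies the response as safe or unsafe."""
    return True  # a real system would call a moderation model here, which adds latency

def postprocess(response: str) -> str:
    if cheap_topic_check(response) and llm_safety_check(response):
        return response
    return "Sorry, I can't help with that."  # refusal fallback
```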
Michael Berk (40:41.634)
Yeah, that makes sense.
Ben (40:42.652)
There are a lot of things to think about when pushing something out there that the general public can interface with, versus something that's internal at your own company.
Michael Berk (40:54.542)
And yeah, let's end on one more from you.
Ben (41:02.48)
The big differences in GenAI...
Ben (41:08.156)
I'd say the biggest difference for people to wrap their heads around is how fast the space is moving. I was on a call yesterday where somebody was making that exact comment. Like, hey, you guys are changing these APIs to this new thing, OpenAI just released the Responses API, when are you going to support that? And then what are you going to do with this other thing that's kind of getting replaced by that? And is this ever going to become stable? I'm like, well, uh,
yeah, we'll be supporting that soon. And I don't know what to tell you about the speed, the velocity of this entire ecosystem. Everybody's trying to move fast, faster than anybody's tried to move before with anything in software. Things are going to become deprecated and become irrelevant very quickly. So it's all about shifting your mindset. If you're going to be shipping something to production,
do not expect that it's going to be relevant a year from now. Approach this entire development life cycle in a new way. People need to make that paradigm shift of, well, I shouldn't approach this the way I approached something I built two years ago, which was, this is going to work for years and I need to harden this and spend all this extra time making sure the solution is perfect. It's more like,
let's stick with the MVP philosophy here, because we might have to pop smoke and rewrite this with this new paradigm that's showing up. So get something that's good enough out there, iterate on it, maybe completely redo it from scratch, and keep it as simple as possible so that you can rapidly do that.
Michael Berk (42:58.798)
Heard. Yeah, well said. That made me start to ask, well, what are the consistent skills that will remain for the next five years of GenAI? But that's a whole other episode. Yeah. True. Well, anyway, we're at time. I will summarize. Today we talked about transferring code bases, or AI models, or projects, effectively.
Ben (43:10.82)
Learn how to write and read code really quickly.
Michael Berk (43:28.75)
The transfer checklist that we typically look to do for a software project is first a project design and then second ensure that your test coverage and stability is at least documented and ideally pretty good. When you're transferring skills and the skills gap is too big, you can create a virtual team comprised of subject matter experts and the team that will soon be owning this project. Don't just kick it over the wall. Have collaboration.
And then finally, in terms of knowledge transfer, we broke it down into some hopefully digestible sub bullets. The first is thinking about why. Why is often not evident from the implementation. And so you need a design slash decision document. Next is what, and that is the implementation itself. If you just write readable, well-documented code, hopefully that'll be sufficient for the new team to maintain and build. And then if you have specific development practices that work really well for your organization, the how.
Try to collaborate. Try to have online brainstorms, or even have an on-site where you get together in person and triage issues and then show how it should be done. So that's it. Anything else? All right. Well, until next time, it's been Michael and my co-host. And have a good day, everyone.
Ben (44:40.134)
Yep. Nope. Good summary.
Ben Wilson. We'll catch you next time.