Knowledge Sharing, Culture and the Future of Cloud + AI (E.21)

Michael Berk (00:00)
Welcome back to another episode of Freeform AI. My name is Michael, and I do data engineering and machine learning at Databricks. I'm joined by my co-host. His name is Ben Wilson. He no longer writes docs, and now he codes again.

Ben (00:14)
Yeah, finally getting back into doing some backend engineering, which is nice.

Michael Berk (00:19)
Hell yeah, as you should be. For context, we just had our annual Data + AI Summit at Databricks, and that sprint is always hell. Now that it's over, we can go back to doing real work that's fun and paced normally, myself included. So today we are joined by Alexander. He studied computer science in school, and then after working a couple of product roles, he joined IBM doing sales, then worked at VMware doing systems engineering.

Ben (00:35)
Mm-hmm.

Michael Berk (00:47)
And most recently he worked at AWS as a solution architect and specialist solution architect. Currently, he is head of DevRel at Nebius. So Alexander, to kick it off, I'm curious. I've heard a lot of DevRel journeys. Typically you start in engineering and then you're like, I don't want to do that anymore, and move over to whatever DevRel is. So, A, what even is DevRel, and B, why'd you make that switch?

Aleksandr (01:11)
That's a great question, and a tricky one, because I'm not only the head of DevRel. My main role is actually head of product, and my second role is head of DevRel, just because I like to do that. And about DevRel, you said it correctly. Sometimes you feel that, okay, I know a lot,

Michael Berk (01:27)
man.

Aleksandr (01:38)
and I want to share it with people. And honestly, it's really difficult to be an engineer, doing daily engineering, writing code, and also teaching people. So at some moment people say, okay, now I want to focus on sharing. I actually want to learn, to look at many, many different things, then summarize them, combine them, and help people understand them really quickly. That's

at least what I think usually drives people to switch to DevRel. And again, switching to DevRel is not about not doing engineering anymore. It's more that you do engineering for a really big community. You want to reach many people. It actually brings much more pressure on you, because from that moment, your code is not

hidden somewhere inside the company's private repository. Now everyone can see it. And if you don't write good code, everyone can see that as well.

Michael Berk (02:35)
Yeah.

Aleksandr (02:35)
Yeah, but

I think mainly people just want to share knowledge and this is cool.

Michael Berk (02:40)
Yeah, so DevRel typically stands for Developer Relations, and it can involve writing blogs, giving talks, making tutorials, that type of thing. As head of DevRel, are you just delegating those tasks to other people and steering strategy? Or what does your day-to-day actually look like from a DevRel perspective?

Aleksandr (02:57)
So from a DevRel perspective, it's...

Let's say, as head of DevRel, we actually only recently started to find the people we were looking for; before that, there were no people. So it was not really a "head" role, more like a second job as a DevRel. The daily activity is just looking at what's going on that's interesting. For example, you can find...

Because at Nebius, we do a lot of different things. We're building data centers. We're building hardware, so we actually design our own servers, we produce them, we verify them. Then we're writing our own software. And we're doing a lot of AI research. That's why, just by talking to the people around you, you can find so many interesting things that no one even knows exist. Like, for example,

something from the hardware side, which I learned a couple of months ago. Back in the day, when I was building computer servers... Usually when you get a server, it's just a box of parts where you start to assemble everything. And you have a lot of cords to connect disks, power, buses inside. And those actually...

when you have cords connecting something internally, they can lose a connection just because of the vibration from the fans. Before GenAI, before AI, before GPUs, when it was all about the CPU, those fans were not really fast, not really powerful. But now, when you actually need to move so much air to remove so much heat from the servers, those fans become really powerful,

Ben (04:25)
Hmm.

Aleksandr (04:45)
and they create new vibration. And because of that vibration, you can actually lose a cable connection. Then imagine you have, say, an 8,000-GPU cluster training a new state-of-the-art model, and you lose the connection of one cable between two servers. From that moment, the entire cluster stops, because you lost a server. You cannot continue the training, and you lose progress; you need to go back to a checkpoint, you need to find what's going on and

repair it. So that's, for example, what we changed at Nebius: we don't have cords at all. We replaced all of them with board-to-baseboard connections. And that's the kind of interesting knowledge that, as a DevRel, you can share with people, because it's just a fun fact. But sometimes it's not about fun facts; sometimes it's about how to get better performance. For example, at Nebius we've been doing a lot of testing

to find the best way to prepare servers, the operating system, drivers, and many other things. For example, we found recently that CPU pinning, where you put a specific process on a specific core, can really help you get maximum utilization from your cluster. So as a DevRel, you learn all of this just by talking to people internally;

you find a lot of interesting information, and you can build content to help people achieve the same results without going through the same learning curve.
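
[Aside for readers: a minimal sketch of what CPU pinning can look like on Linux, using only Python's standard library. The core IDs, and the idea of choosing cores on the GPU's NUMA node, are illustrative assumptions, not a description of Nebius's actual tuning.]

```python
import os

# Pin this process to four specific cores (IDs are hypothetical).
# On GPU servers you would typically pick cores on the same NUMA
# node as the GPU; `nvidia-smi topo -m` prints the affinity matrix.
os.sched_setaffinity(0, {0, 1, 2, 3})

# Verify: the scheduler will now only run this process on those cores.
print("allowed cores:", sorted(os.sched_getaffinity(0)))

# Shell equivalents: `taskset -c 0-3 python train.py`
# or `numactl --cpunodebind=0 --membind=0 python train.py`.
```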

Michael Berk (06:16)
Super cool.

Aleksandr (06:16)
And

as head of DevRel, it's more about building the process. Doing one blog is nice, but you need to repeat it. To repeat it, you need to find patterns, you need to understand how to tell the story better. Because sometimes people have great content, they just

don't know how to do that; they don't know storytelling frameworks. So as head of DevRel, you need to build a process, you need to help your team find the resources to edit video, to launch a big new training, or just help them find the best way to reach the audience.

Michael Berk (06:59)
And I bet as a product leader as well, that curiosity-driven exploration is really helpful to get the lay of the land and explore new things. So do you find there's actually overlap in your roles, and do they reinforce each other?

Aleksandr (07:10)
I think so. I think so. In my case, it just happened naturally. I had never been a product manager before. At AWS, I was a solutions architect, a specialist solutions architect dedicated to AI, so I was helping big clients do something with AI from the AWS perspective. And I was thinking, okay, I want to start building products. I think I have an idea of what I want to build.

So I asked my friend, who at the time was a product director: what does it mean to be a product manager? How do you become a product manager? And he told me a really simple thing: the only characteristic you need to have is to love your product. That's the only thing you need to become a product manager. And when you love a product that you're building,

you want to tell people about it. You want to help people understand how to use it, to see it from the angles you see it from. So when I joined Nebius, I really fell in love with Nebius, and I just wanted to talk about it. I started speaking at different conferences, sharing how we're building things, how people could achieve the same things, sharing knowledge, and it became DevRel as well.

So that's why I think product and DevRel are really close: who can tell people about the product, if not the product manager who actually put his soul into it?

Michael Berk (08:41)
And as TL of open source at Databricks, do you love MLflow?

Ben (08:45)
You have to. You have to care about it a lot. If you had a group of people maintaining a successful open source package who didn't believe in it and didn't want to see it get better, it would just eventually become shelfware. Nobody would care. You could almost see the features that would be released and the level of maintenance that would

be part of that just decline to the point where people would be like, yeah, this sucks, I'm not using it anymore. Somebody else would come in and build something better, because they actually love it and care about it. I'm really interested, Alexander, in the transition. It kind of gave me a flashback to early in my career, when I did the inverse of what you've done. I used to be a hardware person who moved into software,

and you started out in software, moving into a hybrid of software and hardware. Your company manages ludicrously sized data centers that you build yourselves. How has that journey been, and have you found it exciting learning about all the data center stuff and how these things are actually built?

Aleksandr (09:52)
I couldn't say that I actually moved from software to hardware. My first big job was actually at IBM, as a Power Systems engineer. That was the time of full racks of big servers running big Oracle databases,

Ben (10:06)
Nice.

Mm-hmm. Yeah.

Aleksandr (10:15)
business critical for banks and other areas. So actually, I really loved hardware. After IBM, I jumped to VMware, where it became only software, no hardware. But I really loved coming back to hardware. At Nebius, I'm not fully attached to hardware. While we do a lot of things with the data centers, with the servers,

Ben (10:22)
Mm.

Aleksandr (10:43)
my area of responsibility is more on the software side. For the data centers and the servers, we have different teams. And those are amazing people. The amount of knowledge they have, the tasks they can achieve, it's incredible, because people don't really think about the hardware.

Ben (10:48)
Done.

Aleksandr (11:06)
When I was at AWS, I was also learning a lot about hardware. I came from the cloud side, but it's amazing how and why different companies decided not to buy servers on the market but to build them themselves. And usually people ask: why do you want to do that? That's not your business. But in reality, it's the base underneath your business. If you don't have

really reliable and performant hardware, you cannot really build a cloud. And one of the ways to control that is to go down to the hardware level. That's a world where you can learn about electricity, about circuits, about thermal packaging. We even have a thermal booth where we test our servers. When you design a server,

you need to understand whether it will be able to operate, and at what temperature. In our case, for example, our servers can operate at 40 degrees and they're fine, even with the GPUs. So a lot of research was done there. But I really enjoy talking to the data center and hardware teams, because it's an absolutely new world, especially when you see how they're doing it.

Our office in Amsterdam is the place where we have hardware R&D. They have all those baseboards, all those things just on the table. You can just enter the room and see everything. It's like a flashback to back in the day, when I was assembling servers, touching all those things. And it's the same. You can see how people are actually designing it, how they calculate,

Ben (12:50)
Mm-hmm.

Aleksandr (12:56)
how they design baseboards and chips, place all the connectors, the resistors. It's incredible. From that moment you understand: when you're writing software, you're just writing software, and nowadays a lot of code assistants can help you do that. But here, you really make something with your hands. So I think it's amazing to

come back partially to hardware and see how it's actually created, how it's born, how it's built.

Ben (13:29)
Yeah, your mention of vibration issues brings me back to when I was at Samsung and we had our first Hadoop cluster that we were building. This is before you could just buy it off the shelf, before Cloudera and Hortonworks were a thing. So IT just gave us, like, hey, here's this full rack of servers, do whatever you want. Here's the hardware room.

And we're looking and we're like, hang on, you just gave us the oldest full rack that you were going to throw in the dumpster. And when we go to the hardware room, half of it we're not allowed to go into, which is all the production stuff used for the main data center. So we had all these components just in bins everywhere, things that they had pulled out, retired older stuff. But it was so fun for us to dig around and try to find stuff that would work,

and then try it out. And we actually had the same problem, but ours was one of the automated robots two floors up. There was a cycle that happened every day where it docked into a bay to get recharged and go through a maintenance routine. Every time that thing went into that equipment, it created a vibration that traveled down, and we would have random

fiber cables just fall out of the rack. We're like, oh, the Hadoop cluster crashed again.

Michael Berk (14:57)
man.

Ben (14:57)
So there's

like interesting stuff that you find out about hardware while you're sitting there just trying to write code. You're like, I've got to go into the room and connect stuff back up again.

Michael Berk (15:05)
plug it back

in. You'd think a semiconductor manufacturing lab would have vibrations under control, but I guess they're different buildings.

Ben (15:12)
No.

It was one really big building, but for all of the stuff in the production setup, they had all those things solved. And we ended up actually talking to them, like, how do you guys solve this problem? They're like, you didn't use these connectors? Let me show you. And they hooked it up for us. We're like, thank you.

Michael Berk (15:30)
As someone who knows literally nothing about this, how often do you have to modernize the hardware and swap out like a new physical system?

Aleksandr (15:37)
That's an interesting question. There are multiple aspects you need to think about. First of all, the market is continuously moving with new GPUs. In our case, Nebius is a cloud focused on AI, only on AI. So we're building a lot of things to help people build

whatever they need, starting from the hardware and virtualization (we have our own hypervisor), through Kubernetes and Slurm, up to the APIs where you can just get models and pay per token. So it's a full-stack cloud for AI. As you can imagine, to move fast, the frontier labs, the clients, everyone wants the latest, fastest hardware.

New hardware is continuously released. Last year, people were talking about Grace Hopper, H100, H200 GPUs from NVIDIA. This year, everyone is already talking about Blackwell, B200, GB200. And while not everyone has released those in production yet, we already see the next generation, Vera Rubin, which is supposed to come next year.

So it's a continuous process. But it's not that you always need to immediately throw away the previous hardware. When you do planning, you think about how many years you actually want to use the hardware, how many years it will sit in the data center. At some moment, you just stop buying the same GPUs you already have and start buying

the new GPUs, and you look at how to use the previous hardware. And the truth is that there are still a lot of...

Michael Berk (17:29)
But do you recycle

it, like put it in a bin and extract the metals, or do you just use it for a less latency-sensitive use case? What's the...

Aleksandr (17:37)
Sometimes,

let's say it depends on how old the hardware is. For example, when I joined Nebius, we had V100s, but now we don't have them; we had to recycle them because they became too old, and people were not looking for them. But take the H100, which we have a lot of in our data centers. We have almost

everything: we have the L40S, that's the small one, and we have H100, H200, B200, GB200. So we have all of them. But the H100, for example, which is quite old nowadays, let's say, is still good enough. It's not that old; you can still get a lot out of it. Even now, there are a lot of use cases where it's too big. For example, if you don't need a Llama 70B

but an 8-billion-parameter model, or something smaller like a 2-billion one, it's already too big a card. You don't need that amount of memory; you're probably looking for something smaller, like an L40S. And if we take a step back from GenAI to the classical machine learning that has actually helped a lot of people, like in forecasting,

in prediction, in classification, those don't need GPUs at all, I think. I mean, maybe the latest forecasting models, like Amazon Chronos, need GPUs just because they're big and smart, because they're based on transformers. But earlier things, like random cut forest algorithms, run fine on CPUs as well.

So even if you have some older cards, you can still find use cases for them. You can still use them, and there are people who are still looking for them.
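
[Aside for readers: a back-of-envelope version of the sizing argument above. The weights-only formula is the standard one (parameter count times bytes per parameter); real deployments also need headroom for the KV cache and activations, so treat these numbers as rough lower bounds.]

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """VRAM needed just to hold the model weights, in GB."""
    # params_billion * 1e9 params * bytes/param / 1e9 bytes-per-GB
    return params_billion * bytes_per_param

# FP16/BF16 stores 2 bytes per parameter; 4-bit quantization ~0.5.
for name, params in [("8B model", 8), ("70B model", 70)]:
    print(f"{name}: ~{weight_memory_gb(params, 2.0):.0f} GB at fp16, "
          f"~{weight_memory_gb(params, 0.5):.0f} GB at 4-bit")

# 8B  -> ~16 GB at fp16: fits a 48 GB L40S with room for KV cache.
# 70B -> ~140 GB at fp16: needs several 80 GB H100s (tensor parallelism).
```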

Michael Berk (19:31)
Ben, why has Databricks not built data centers?

Ben (19:31)

It's funny that you bring that up. In the early days of Databricks, that was actually on the table: hey, let's build our own hardware and set all this up. And a very heated discussion among the founders resulted in them saying, no, we're just going to leverage the cloud. This was the early days of cloud; AWS was the only one on the market to speak of. And

it was a very different world than the one we live in now; this was 12, 13 years ago. And yeah, they went all in on saying, we'll let other people manage the hardware for us, and we'll just build a software stack on top. I think it's worked out pretty well for Databricks doing that. And I think a company like Nebius

would be a parallel to that, because you guys are going into what you're really good at, which is building that hardware stack and offering the services for people to build whatever they want on top of it. I think it's all about sticking with what you're good at and what you want to be good at.

Aleksandr (20:38)
Yeah, yeah, great. Well, we're also running the software, but yes, in reality, the hardware is super difficult, and it just becomes more and more difficult nowadays. Because now, with cooling, it's not enough to just have air cooling anymore. You need liquid cooling, which means you need to rebuild the data center completely.

Ben (21:00)
Mm-hmm.

Aleksandr (21:05)
There are ways you can still put liquid cooling inside existing data centers, called liquid-to-air: you have a rack with the servers, and next to it you put a rack with a heat exchanger. You carry the heat to the second rack with liquid, and you blow the heat off it with fans. So it's still possible, it's just less effective, and you're actually doubling the space that you need.

So it's better to build data centers from scratch, designed for liquid cooling from the beginning. And that's a real challenge, because if you're a company and you already have a data center, do you really want to start rebuilding it just to put GPUs in, or not? In that case, it might actually be more efficient just to go and find some

cloud, a neocloud, because nowadays we also have this "neocloud" name, and just get those GPUs and use them. And this is also what we're trying to help people with. The thing is, we're not actually in the hardware business, because no one has access to our hardware at this moment. We actually virtualize everything except the GPUs.

That's quite an interesting topic on its own, if you want to talk about it: how to virtualize things without actually virtualizing the GPUs, without losing the performance of the most expensive parts in the server. But yeah, we don't provide raw hardware access at this moment; we virtualize everything in terms of CPU and RAM. So the software is quite an important and big part as well.

Michael Berk (22:26)
Hell yeah.

Aleksandr (22:50)
And to make it all available to people, there's the question of where you want to stop. Because you can stop at just launching the hardware and providing SSH access. That's also a model: people can just log in with SSH, install Slurm, install their own Kubernetes or whatever they want. Or you can provide managed Kubernetes, with auto-healing, auto-recovery, auto-fixing, or managed Slurm. And then it turns out that

Ben (23:03)
Hmm.

Aleksandr (23:19)
while people use Slurm, people use Kubernetes, they still want management around it. So they're looking for IAM, access management and control. They want observability. They even want billing, you know, nice cost exploration. Those are the kinds of tools you need to build if you want to call yourself a cloud, I think. Because historically, the cloud started, as you said,

Ben (23:40)
Hmm.

Aleksandr (23:45)
with AWS and GCP, the major big companies. And they actually built what I think of as the standard of the cloud, what you typically expect when you say "cloud." And when you come to neoclouds, you might not see a lot of the things you're used to. So at Nebius we're trying to build those as well, to provide the same experience. And we've also started to integrate with things outside:

SkyPilot, to let people manage multi-cloud with Nebius as one of the clouds behind it, or Outerbounds, dstack. So there's a lot of software that has to be created to make the user experience really nice. It's the same as what you're doing at Databricks. You're doing amazing software work. You started with Spark, and now you have serverless GPUs, you have notebooks, you have MLflow.

You're doing some great work with the software stack that you're building.

Ben (24:41)
It just keeps on getting bigger over time, and more people need to be hired, and more exciting use cases get running, which I'm sure is the same for you guys. You started off with those older instances, mostly CPUs, and then expanded out. But it's interesting to note how that evolution happens. I think if you guys hadn't started back then,

you wouldn't be prepared to do stuff like offer GB200s right now and have them be like, hey, these are ready for use, click this button on your account page and reserve them today. There's a lot of learning that probably went into building that experience over time.

Aleksandr (25:25)
That's true. That's true. It's quite hard to just start from nothing, from zero. You need a lot of knowledge from before to find a way to make it work: how to do auto-healing, how to do virtualization, how to do liquid cooling. You need a lot of knowledge. I remember that a couple of months ago, I was at a

conference about AI, and there were a lot of booths about AI agents. But then I found that one corner of the hall was full of pipes, connectors, tubes. I felt like I was in a build-it-yourself type of store, where you buy pipes, screws, and so on.

There was nothing about AI software. It was all about fittings, connectors, tubes, heat exchangers. You could actually take them and start to screw them together, to build some pipes and connect them. It was quite an interesting experience. But this is a part of the AI world right now. You cannot run a big model if you don't have the proper infrastructure behind it.

Ben (26:17)
Mm-hmm.

Michael Berk (26:41)
Yeah, actually to that end, I've been working on a really, really cool project for roughly the past year, where we're basically doing distributed offline batch inference with some LLMs. And I was curious about your take on self-hosted GPU instances versus doing something like pay-per-token through a service.

How do you guide customers through that decision? Because with pay-per-token via an external service like OpenAI, you typically get good performance, but they don't let you have their model weights. Whereas with something like Llama and fine-tuning, you can actually put that on a GPU. And given you do hardware, that seems to be sort of your bread and butter. So do you have any advice for that journey, or tips and tricks?

Aleksandr (27:31)
That's a great question. It's a little bit different from the way you framed it, though. Selecting between Llama and OpenAI or Anthropic is not selecting between self-hosting and a managed per-token service, because you're actually selecting between models. They have different quality, different

quality in different use cases. So that's about selecting models. But for Llama or DeepSeek, for example, you can get them from a per-token inference service, or you can get a GPU and run them yourself. And there are multiple aspects you can look at when you make this kind of selection. One is: what's your workload pattern?

If you have spontaneous, unpredictable workloads, like zero, many, zero, many, then it's probably better to go with a per-token service, just because it might be cheaper. Especially if you're just doing an MVP, just running experiments, and you don't have millions of users continuously utilizing it. In this case, purely from an economics perspective, paying per token will be cheaper.

But if, for example, you start to have a lot of requests, or you have continuous batch jobs, for example you're continuously evaluating code that you created with some code assistant, that means you can continuously utilize a GPU. In this case, it becomes cheaper to have GPUs than to pay per token. It also depends on skills.

You said that OpenAI might be more performant. From that perspective, I'm honestly not sure, because running a model well is not one line of code. Well, you can actually run a model with one line of code: you can take vLLM, you can take SGLang, and you can launch a model with just one line. That's true. But if you dig a little deeper, you start to learn about things like

batching and continuous batching, and

different techniques you might use, like quantization. There are a lot of techniques you can use to almost double or triple the performance from the same GPU. So it's about skills. Do you have the skills, does your team have the skills, to get the maximum from the GPU? This is, for example, what our AI Studio team is doing. They're learning a lot of things. They're evaluating

every model with every inference engine, with different techniques, with different features, to understand which combination performs best. That's why you can find that some models run on one technology stack and other models run on another. And if you want to do that yourself, you need to know how to do it; you need to put in the time. And again, if it's just an MVP experiment, it doesn't make sense.
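
[Aside for readers: to make the "one line, then tuning" point concrete, here is a minimal sketch using the open-source vLLM engine, which does continuous batching out of the box. The model name and the tuning values are illustrative assumptions, not a description of Nebius's AI Studio stack.]

```python
from vllm import LLM, SamplingParams

# The "one line" launch: vLLM batches concurrent requests continuously,
# so many prompts share the GPU instead of running one at a time.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical model choice

# The tuning that separates a quick start from a well-utilized GPU,
# e.g. quantized weights and scheduler limits (values illustrative):
# llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",
#           quantization="awq",           # 4-bit AWQ checkpoint, if one exists
#           gpu_memory_utilization=0.90,  # fraction of VRAM vLLM may claim
#           max_num_seqs=256)             # cap on concurrently batched sequences

outputs = llm.generate(
    ["Summarize continuous batching in one sentence."],
    SamplingParams(max_tokens=64, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```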

So there are technical aspects and economic aspects, but there are also aspects that can block you completely. That could be data regulation, data governance, security things. For example, you have a use case with super secret data. In that case, you probably cannot send it to any per-token service.

Or it's medical data, and you want to build a system to recommend treatments for patients; you probably won't be able to send it outside your country, or to a non-certified platform. In that case, using your own GPUs might be the only way, because you can say: it's running right on my table, it's running next to me,

in the server room next to the building, in my data center; I control who accesses it and where the data is. So those kinds of aspects, especially regulations, can immediately rule out entire options if you think about them. And the last thing I want to say is that it's an important question to ask yourself at the beginning. Unfortunately, I've seen startups that didn't.

I saw startups from the healthcare industry who started to build their products on a per-token inference service, with prompt tuning, integrations, and everything. And then they realized they could not certify it. It's impossible; you could never get certification for that. And they had to throw everything away and restart. They had to take an open source model,

Ben (32:09)
Yeah.

Aleksandr (32:17)
host it themselves, rewrite the prompts, rewrite the optimizations, the integrations, and so on. So you're literally starting from scratch, just because you didn't evaluate this at the beginning.

Michael Berk (32:27)
Yeah, to that end, I'm curious about cost. Typically a lot of these big providers give you the pay-per-token price, and then they have a batch inference price, which is 50% of the pay-per-token price. We've looked into that pretty robustly. I'm curious, for a self-hosted GPU, how does it compare? Is it more expensive than pay-per-token? Less? What's the percent difference?

Aleksandr (32:52)
So I would say that there is no golden rule. It's super easy to calculate, for example, for an AWS instance, because the pricing is per hour. You pay for an hour or two hours, and you can just find where the...

Michael Berk (33:05)
Yeah, holding throughput constant. Have you guys ever done those calculations?

Or Ben, if you know.

Ben (33:09)
Ha ha.

Aleksandr (33:11)
The reason it's honestly hard to answer is that it really depends on the workload. One of the rules that exists when you do the math for language models: because of how they work, you get your input and you start to process that input, and every next token has to attend to every token before it, so the difficulty of the calculation grows quadratically with input length.

Ben (33:15)
Mm-hmm.

Aleksandr (33:35)
So it rises not linearly, it actually rises quadratically. Which means... For example, back in the day, a lot of people were playing with language models like: write me an article, write me a document. That means a really short input and a really long output. That's one kind of economy, one type of calculation you need to perform. And then nowadays,

a lot of people do the opposite. They do summarization; they ask a question and search thousands of websites just looking for a simple answer; or they use a code assistant, where you throw in your code base and ask it to write the next printf. You still need to process all of that, and because the cost increases quadratically, you need much more compute.

And that's why it's really hard to say exactly where the break-even is. We see that as well: when you have huge inputs, it just becomes more expensive to process them than when you have a big output. That's, by the way, one of the reasons why, if you look at the pricing of per-token inference services, you can find that some of them

have different pricing for input and for output.
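
[Aside for readers: a toy version of the break-even math Michael is asking about. All numbers are placeholders, not Nebius or provider prices; the structure is the point: compare per-token spend against GPU-hours at the throughput you can actually sustain.]

```python
# Hypothetical inputs: substitute real quotes and measured benchmarks.
price_in_per_mtok = 0.50       # $ per 1M input tokens (per-token service)
price_out_per_mtok = 1.50      # $ per 1M output tokens (often priced higher)
gpu_hour_cost = 2.00           # $ per GPU-hour, self-hosted or rented
sustained_tok_per_s = 3000     # measured tokens/s for your model + engine

monthly_in = 2_000_000_000     # 2B input tokens/month (summarization-heavy)
monthly_out = 100_000_000      # 100M output tokens/month

per_token_bill = (monthly_in / 1e6) * price_in_per_mtok \
               + (monthly_out / 1e6) * price_out_per_mtok

gpu_hours = (monthly_in + monthly_out) / sustained_tok_per_s / 3600
self_hosted_bill = gpu_hours * gpu_hour_cost

print(f"per-token:   ${per_token_bill:,.0f}/month")
print(f"self-hosted: ${self_hosted_bill:,.0f}/month ({gpu_hours:,.0f} GPU-hours)")
# Self-hosting wins only if you keep utilization high; idle GPU-hours
# still cost money, which is the spiky-workload point above.
```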

Ben (34:59)
Hmm.

Michael Berk (34:59)
Yeah, that actually makes sense. Okay, I always wondered why the pricing was different. That makes a ton of sense.

Aleksandr (35:00)
That's why it's really hard to say.

Ben (35:07)
I have a question about your technical staff. Because you guys are at the forefront, the company is focused around getting the most performance out of hosting stuff like GenAI, you're building an immeasurable depth of knowledge in pretty much everybody that's working on this stuff,

people who understand all the nuances of, like you mentioned, the encoding precision on something like a torch model, whether I can get away with eight-bit or do four-bit, or varying levels of that. The staff at the company, particularly on the software side, is probably among the forefront of people who understand that bridge between the user interface and the hardware.

Is there an issue with poaching?

Like other companies are like, hey, we'll offer you a ridiculous amount of money if you just leave your company and come work for us.

Aleksandr (36:04)
So I haven't experienced that. No, I couldn't say so. I mean, it's normal when people want to find another job. But I couldn't say I see a lot of people leaving because they get more money somewhere. It's probably also part of the culture. Nebius...

Ben (36:08)
Not yet.

Aleksandr (36:33)
It's, I would say, a feeling: you really feel that you're doing something great and that the team around you is doing something great. And people actually work. We do have defined work hours, and no one polices them, but I can say that people sometimes sit in the office in the evening just because they want to see something, they want to test something. So I see a lot of passionate people who want to

create, who want to build amazing products. And I don't think that in this case they would leave even for money. For example, I can say for myself: if someone told me, you know, we'll double your salary right now, I wouldn't leave. Just because I see the things that I see. You know, it's...

Michael Berk (37:17)
triple.

Quadruple.

Aleksandr (37:20)
Still no. It's not about the money. It's more about... For me personally, it comes from why I left AWS, for example. I really like AWS. I think it's one of the really great companies where you can learn so much if you want, really. If you want to learn, you can learn a lot about hardware, software, how to build things, how to build tools.

But in my case, when I met Nebius, I was told: we're building a new cloud from the ground up, especially for AI. Do you want to come and build whatever you want? You have ideas, you've probably seen what's missing in the market, so just join us and you can build it. And I still have ideas that I want to build. And it's a place where I can build literally whatever I want, whatever I see. And I see people who are inspired and really want to help.

And that's what keeps me. It's not about the salary at all. It's probably also because the salary is nice, so you're not trying to figure out whether you can buy one more cup of coffee. I mean, if you were in that position, it would be about survival. But when you don't have to worry about survival... there's Maslow's pyramid,

Ben (38:17)
No, that's great to hear.

Aleksandr (38:40)
which describes how people choose what they need. When you're not at the base survival level, you're probably already more interested in finding something to put your soul into, and not just struggling.

Ben (38:54)
It's good to hear that your company parallels the same sort of ethos that we have on the R&D side of Databricks. I mean, weekly, most of us get contacted by sometimes very big tech companies offering stuff that's five, six, ten X our total compensation. And you just don't see people leaving for that.

The retention is so strong, because nobody cares about that. You summed it up perfectly: at a certain point, you don't need more money, and it's not the driving force in your life. It's more like, I love my team, I love my product, I want to see what we can build next and what other problems we can help solve in the world by building better stuff. So it's super cool

that you guys have the same sort of ethos.

Michael Berk (39:49)
Yeah.

Ben (39:49)
You don't hear

that from everybody. Just letting you know. Some people are like, yeah, we're hemorrhaging like 15 engineers a month. It's like, yeah, that sucks.

Aleksandr (39:51)
Yeah.

Michael Berk (39:58)
Yeah. I think that's really indicative of an innovation-based culture as well, because it leads to better performance. If everybody's a gun for hire and doesn't actually have an emotional connection to the product, you typically don't see great work put out. But if people like each other and care about the thing they're building, it's markedly different.

Aleksandr (40:18)
Yeah, I fully agree with you.

Michael Berk (40:20)
Yeah, well, on that note, I think we're coming up on time. I'll quickly summarize and then kick it over to Alexander if you have any next steps. Today we talked about a bunch of different things, per usual. Some stuff that stood out to me: the product manager one-liner is love the product, and then everything else should fall from that first principle. Hardware is cool. And then, for LLM inference,

Aleksandr (40:20)
It's a

Michael Berk (40:48)
when deciding between pay-per-token and a self-hosted implementation: if you're building an MVP, pay-per-token is a lot easier; it's a much faster quick start. Also, if you have volatile, spiky usage, pay-per-token is great because it'll conceptually auto-scale; someone else is hosting all those VMs for you, and your request load doesn't depend on having a physical system. But if your team is technical, you might want to approach a self-hosted solution.

You can typically eke out a lot more performance with quantization and things like that, so if you have a team that can leverage those advanced features, it's worth exploring. And if you're in a highly regulated industry, it's also worth exploring a setup that locks your data down in a single data store and doesn't send it over the network to a third party like OpenAI or Anthropic or whoever.

So Alexander, if people want to learn more about you, more about Nebius, or more about your background, where should they go?

Aleksandr (41:41)
I think there might be two good sources. First of all, the Nebius website, where we try to tell people our story: a platform where anyone can achieve and build something with AI, regardless of their level of expertise and regardless of the use case. If you're looking for 10, 80, or a hundred GPUs as a cluster,

feel free to come, you can get it, and it's pay-as-you-go, self-service. Or if you're looking for just a model per token, you can also get that at Nebius. That's one. And the second, which might be really interesting for engineers, is actually our YouTube channel. At the end of last year, we did a series of events called roadshows. We did them in London, in Paris, and in San Francisco.

Why specifically for engineers? Because those roadshows were delivered by heads of product and heads of development, and by developers and product managers as well. So it was not typical marketing, where you see some graphs, some promises. It was about, okay, why do we not have availability zones,

why we decided to build just one availability zone per region, and here is the technical description of why we did it. Why we decided to go and build our own virtualization, and here are the details of what kind of virtualization we're using, exactly which open source projects we use and how we combine them. It's about everything: the hardware, the software, how we cook Kubernetes, how we created

Soperator, which is a Slurm operator for Kubernetes. As far as I know, there are just three such solutions in the world, only two of them are open source, and one of those is from us, so anyone can use it in any cloud. And likewise, there is a talk about how we push the GPUs to their limits inside AI Studio, where you get models per token, and how we actually deliver that amount of tokens per GPU.

We talk about things like speculative decoding, quantization, different inference engines. So there are about six hours of really technical videos about how Nebius is actually built. I think those are two really good sources: the website, and the YouTube channel where you can find the recordings of the roadshows. And any time you want to get in touch with me, you can find me on LinkedIn or on Twitter.

So if you want to talk, just send me a message. We can find a time to chat about technologies. I really love talking about technologies.

Michael Berk (44:26)
Beautiful. All right, well, this has been a lot of fun. Until next time, it's been Michael Berk and my co-host...

Ben (44:32)
Ben Wilson.

We'll catch you next time.
