Databricks Lakebase (OLTP) Technical Deep Dive Chat + Demo w/ @Databricks Cofounder, Reynold Xin
By Josue Bogran Channel
Summary
## Key takeaways

- **OLTP Databases Stagnant for 20-30 Years**: OLTP databases like MySQL, Postgres, or Oracle look more or less the same as in the '90s, while analytical databases have evolved two or three orders of magnitude faster thanks to cloud and database engineering advances. [00:43], [01:29]
- **AI Agents Create 80% of Neon Databases**: 80% of databases on Neon were created and managed by AI agents post-acquisition, up from 30% a year ago, shifting the persona from humans to agents, with humans supervising them. [01:49], [02:06]
- **Neon Separates Storage from Compute for Postgres**: Neon built a safekeeper for replicated write-ahead logs and a page server for caching Postgres pages on S3, enabling minimal changes to Postgres for low-latency OLTP access. [09:46], [10:54]
- **Lakebase Scales Compute to Zero in 500ms**: Autoscaling provisions Postgres nodes in under 500 milliseconds, allowing databases to scale down to zero during idle periods like nights or lunch, changing the OLTP paradigm. [12:02], [13:05]
- **Lakebase vs Aurora: Open S3 Beats Proprietary**: Unlike Aurora's proprietary storage engine, Lakebase uses open S3 or blob stores for lower cost and enables direct analytics queries with Spark or DuckDB on Postgres pages. [15:07], [16:13]
- **Demo: Instant Free Trial Branches**: Lakebase creates trial branches from production instantly with copy-on-write, costing nothing extra in storage until writes occur, and autoscales compute to zero after inactivity. [21:10], [23:14]
Topics Covered
- OLTP Databases Stagnant for 30 Years
- AI Agents Now Create 80% of Databases
- Neon Separates Postgres Storage from Compute
- Scale OLTP to Zero in Milliseconds
- Lakebase Unifies OLTP and Analytics
Full Transcript
I'm here with one of the co-founders of Databricks, Reynold Xin. Did I say that correctly? Awesome.
>> Close enough.
>> Close. Look, we were just talking about my name and how difficult it can be, so I think we're even. Reynold, you want to go ahead and tell us a little bit as to why Lakebase matters so much, not just to current Databricks customers, but to the ecosystem as a whole?
>> Yeah, I hope I pronounced it correctly.
>> Couldn't have said it better myself.
>> Well, thanks for having me here. From a broader industry point of view, and I was talking about this at Data and AI Summit, we felt like OLTP databases really haven't changed that much in the last 20 to 30 years. If you look at analytical databases, they have evolved a lot: we are now probably two or three orders of magnitude faster than we were in the '90s and maybe even the early 2000s, and a lot of that is because we built on foundational technologies that are both cloud-based and also just hardcore database engineering. But with OLTP databases, if you crack open a MySQL, Postgres, or Oracle today, there are a lot of changes, but they look more or less the same as they were back in the '90s. So we felt it's the right time to think about how we can disrupt and build next-generation databases that are cloud native and also agent native.

One thing that was pretty interesting, which we actually didn't know until the Neon acquisition, is that 80% of the databases on Neon were created and managed by AI agents. That's an incredible stat, because a year ago that number was 30%. So it went from 30% to 80%, and if you just extrapolate that line, at some point 99% of databases will be created by AI agents. Historically, databases were really provisioned and operated by humans. With the LLM trend and agentic coding, we'll see the persona change: there might still be a human somewhere supervising all those agents, but the persona is shifting from humans to agents. And that shift in persona comes with a lot of new requirements that historically simply weren't on the requirements list for databases, or weren't even in the top 100. An example: you want to parallelize a lot of different agents to work on a task, maybe have each agent run some version of an experiment, and compare which one does best. At that point, you want to be able to snapshot your data and provide a different instance of the database that's fully isolated and sandboxed just for that one agent, because agents can make mistakes; you don't want an agent impacting your actual data. And every one of these operations has to be pretty fast, because agents operate at pretty high speed. So there are a lot of new requirements coming in that historically were simply not even a concern for database system developers, and we love that. When new requirements come in, it gives us an opportunity to rethink the fundamentals and how we should be architecting databases, so we can build the databases of the future. The databases of the past were built for human operators; the databases of the future will be built for humans supervising a lot of different agents. It's also intellectually challenging, but ultimately it comes down to this: we think we can build a new era of databases that are just much better than what was done in the past, especially given the new persona.
>> And I also think, the agent side aside, apps have obviously been a big investment for Databricks too, and I think a very good growth opportunity. So you have that going for you as well. I know everything is agents right now, but even agents aside, I think having everything together in one single place matters.
>> No, absolutely. Databricks Apps might have been the fastest-growing product in the history of Databricks. It used to be Databricks SQL, but I think Apps actually took that title. It's still early, so it's a little hard to predict exactly how it goes in the future, but it's been growing super fast, and the number one feedback we've been getting from customers is: can I just get a provisioned database together with every app I provision? So we're definitely seeing that trend.
>> Yeah. No, and I just wanted to add that, because again, there's so much on agents, and I know there's a lot of value there. But for me personally, as someone that has dabbled back and forth in application development in the past, having so many of the pieces together helps me actually get stuff done. So I appreciate that part.
>> The truth is, what's really good is that a lot of the requirements for agents, once we can actually support them, end up helping humans a lot too. I would say for humans they're more of a nice-to-have, whereas for a lot of agents they become a necessity. Take what we talked about: creating snapshots and branches of your databases quickly. That's an incredibly useful feature even just for human coding. One thing we see a lot of customers doing now is that for every git branch, they automatically create a branch of the database, and developers really love it because it makes it much easier to test against high-fidelity data.
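As a rough illustration of that per-git-branch pattern, here is a minimal Python sketch of a CI hook that creates a database branch named after the current git branch. The REST endpoint, payload shape, and environment variables are hypothetical placeholders, not the actual Lakebase or Neon API.

```python
import os
import subprocess

import requests  # third-party HTTP client: pip install requests

# Hypothetical endpoint and credentials -- placeholders for illustration.
API_URL = os.environ["DB_API_URL"]      # e.g. https://api.example.com/v1
API_TOKEN = os.environ["DB_API_TOKEN"]
PROJECT_ID = os.environ["DB_PROJECT_ID"]


def current_git_branch() -> str:
    """Return the git branch this CI job is building."""
    return subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
    ).strip()


def create_db_branch(name: str) -> dict:
    """Create a copy-on-write database branch off production.

    Because branches are copy-on-write, this should return quickly and
    add no storage cost until the branch diverges from its parent.
    """
    resp = requests.post(
        f"{API_URL}/projects/{PROJECT_ID}/branches",  # hypothetical route
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"name": name, "parent": "production"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    branch = create_db_branch(f"git-{current_git_branch()}")
    # Hand the isolated, high-fidelity database to the test suite.
    print(branch.get("connection_uri", "<connection uri>"))
```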
>> Yeah, I know some founders tend to be more on the business side, some more on the technical side, and I know you are definitely on the heavy technical side. So give us a little more detail as to how the architecture really works underneath the hood for Lakebase.
>> Yeah. So Lakebase is built on technology we acquired from Neon, and the interesting thing is we had been working together long before the acquisition. That actually led to the acquisition itself, because we realized how great the tech was.

Earlier we talked about how analytical databases look very different today from 20 years ago, and one of those changes was the lakehouse. The lakehouse has one very important property, which is separating storage from compute: you can store your data in massive volume in cloud object stores, like S3 or a blob store, and then launch very ephemeral compute instances to process the data in the object stores. There are many, many advantages here. One of them is that it enables elastic scaling: you can dump in as much data as you want, launch more compute resources when you have a big job to run, and shut them down when you don't. It's much more cost-effective, and cloud object stores also tend to be probably the cheapest durable storage medium for data.

What the Neon team designed and built is a similar architecture, but it's harder to accomplish in OLTP. In analytics, a 100-millisecond query is considered pretty fast, so the tail-latency spikes of object stores are fine. But in OLTP you want to be able to process queries in a millisecond or less: a second is way too slow, and even 100 milliseconds is often considered too slow.
The other thing is that Postgres has kind of won and become the lingua franca of OLTP databases. If you look at all the database usage trends, the Stack Overflow surveys, the DBMS rankings, Postgres is going up super fast.
>> A large community too, right?
>> Yeah, a large open source community, and I think thanks to that large open source community its usage and adoption is growing like crazy. But Postgres fundamentally does not separate storage from compute. If you go to the vast majority of Postgres providers, they give you a box. The box has storage in it and compute in it; you cannot independently scale them. If you need more storage, you now need more compute, and even that move itself is very difficult, because it's fixed-capacity provisioning. So the Neon team figured out a really interesting way to implement separation of storage from compute for Postgres specifically.
They built this storage service; there are actually two key services that sit on top of cloud object stores like S3: a safekeeper and a page server. The safekeeper, think of it as a replicated write-ahead log. The way databases work, instead of modifying the data in place, they write to a write-ahead log first, and the safekeeper is basically a distributed, replicated write-ahead log service. The page server is the service that handles the actual Postgres pages: Postgres splits all its storage into small, granular pages, and the page server is a distributed storage system for all those pages. It ultimately stores everything in S3, but because we don't want to pay the latency hit of going to S3, it's a distributed farm that caches as much of the data as we want, to provide much lower-latency access. Then they made extremely minimal changes to Postgres to swap out the underlying storage layer. They found a very narrow waist in Postgres, so instead of writing to local disk, both the write-ahead log and the data pages just use these two services on top of S3.

There are a lot of benefits made possible by this interesting architecture. One is that the storage cost is super low, and storage is automatically durable: it's as durable as the cloud object storage, which is often multi-AZ or multi-region. Whereas in the past, when you got a fixed-capacity Postgres instance, that storage was only as durable as the storage on that box. If that box goes away, you lose all your data. So people came up with backup solutions, or used something like EBS, which is super expensive compared with S3.
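To make the division of labor concrete, here is a toy Python sketch of the two roles just described: a replicated write-ahead log service on the write path, and a caching page service that fronts the object store on the read path. The class names, the in-memory S3 stand-in, and the replication scheme are all illustrative, not Neon's actual implementation.

```python
from collections import OrderedDict


class ObjectStore:
    """Stand-in for S3/blob storage: durable but comparatively slow."""

    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blobs[key] = value

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class Safekeeper:
    """Toy replicated write-ahead log: records are only appended, never
    modified in place, and each append goes to every replica."""

    def __init__(self, replicas: int = 3):
        self.logs: list[list[bytes]] = [[] for _ in range(replicas)]

    def append(self, record: bytes) -> int:
        for log in self.logs:            # replicate the WAL record
            log.append(record)
        return len(self.logs[0]) - 1     # toy log sequence number (LSN)


class PageServer:
    """Toy page service: keeps the durable copy of every page in the
    object store and serves reads through an LRU cache to hide its
    latency."""

    def __init__(self, store: ObjectStore, cache_size: int = 1024):
        self.store = store
        self.cache: OrderedDict[int, bytes] = OrderedDict()
        self.cache_size = cache_size

    def write_page(self, page_id: int, data: bytes) -> None:
        self.store.put(f"page/{page_id}", data)    # durable in "S3"
        self._cache_put(page_id, data)

    def read_page(self, page_id: int) -> bytes:
        if page_id in self.cache:                  # fast path: cache hit
            self.cache.move_to_end(page_id)
            return self.cache[page_id]
        data = self.store.get(f"page/{page_id}")   # slow path: object store
        self._cache_put(page_id, data)
        return data

    def _cache_put(self, page_id: int, data: bytes) -> None:
        self.cache[page_id] = data
        self.cache.move_to_end(page_id)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)         # evict least recent
```

The real system is far more involved (the page server, for instance, replays WAL to materialize pages as of a given LSN), but the shape of the write and read paths is the point.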
The other benefit is that it makes super fast autoscaling possible on top of this architecture. The Neon team built an autoscaling service that provisions compute nodes, Postgres nodes, in less than 500 milliseconds. And because you can do this now, I think it changes the paradigm both for agents and for humans; this isn't just a marketing term, it changes the paradigm of databases. While there is one class of databases that is extremely latency sensitive, where you simply cannot afford a tail-latency spike at all, I think for most applications, especially the internal applications a lot of enterprises build, it's okay to have a latency spike of a few hundred milliseconds every once in a while. You don't want seconds, but 100 milliseconds is fine, because human perception time is roughly 300 milliseconds. If we can provision and acquire compute resources in hundreds of milliseconds, we can scale the database itself entirely down to zero, and not pay for it when there's no traffic to your service, which happens all the time. You might have a lot of traffic for an internal app in a specific time zone during nine-to-five, but past those hours it drops to approximately zero for many companies. For many companies the lunch hour drops too, or in some cases you have a spike at 9 a.m. because everybody gets into work and starts looking up some app, and then at 10 a.m. it goes down. Not to zero, but it goes down. So they built this autoscaling architecture that lets the database adjust its resources dynamically based on load, and go all the way down to zero. This is super cool, because I think it's probably the first time you can have an OLTP database be so responsive and so elastic.
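A toy sketch of the scale-to-zero control loop this implies: suspend compute after an idle window and cold-start it on the next request. Only the sub-500-millisecond provisioning figure comes from the conversation; the timeout, class names, and mechanics here are illustrative.

```python
import time

IDLE_TIMEOUT_S = 300.0   # e.g. suspend after 5 minutes with no traffic
COLD_START_S = 0.5       # provisioning budget cited above: <500 ms


class ScaleToZeroEndpoint:
    """Toy connection front end that wakes compute on demand and lets a
    background controller suspend it when idle. A real endpoint would
    do this behind the Postgres wire protocol, so clients just see a
    slightly slower first query after an idle period."""

    def __init__(self):
        self.compute_running = False
        self.last_request = time.monotonic()

    def handle_query(self, sql: str) -> str:
        if not self.compute_running:
            time.sleep(COLD_START_S)      # simulate the cold start
            self.compute_running = True
        self.last_request = time.monotonic()
        return f"executed: {sql}"         # stand-in for real execution

    def reap_if_idle(self) -> None:
        """Called periodically by a background controller."""
        idle_for = time.monotonic() - self.last_request
        if self.compute_running and idle_for > IDLE_TIMEOUT_S:
            # Compute cost drops to zero; the data stays durable in the
            # object store, so nothing is lost by suspending.
            self.compute_running = False
```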
>> One of the comparisons that I've heard and read some people make is: isn't this similar to how Aurora works on AWS? So I'm curious to hear your thoughts on that.
>> Yeah, it's a great question. I think at a very high level there are some similarities, especially since Aurora might have been the most popular separated-storage-from-compute OLTP offering out there. But if you look deep enough and think about the technical details, it's actually very, very different, and that leads to very different benefits for the end user. One of them is that Aurora started more than a decade ago, and at the time the object stores were much worse, so they had to build a completely proprietary storage engine that is not open at all and that only Aurora services can access. Whereas on the Lakebase side, Neon is built on the cloud data lake: it's built on S3, it's built on Azure Blob Storage. This might not seem like a big deal, a proprietary storage system versus a more open storage system, but there's actually a massive difference here. Engineering-wise it's more complicated to make this work on object stores, so what's the benefit? Why should the user care? One reason is that cloud object stores are far cheaper than EBS: EBS is considered more of a high-end network storage, whereas cloud object stores are more of a data lake, and data lakes are fundamentally cheap. You just have much better unit economics.
The second thing is that because it's built on cloud object stores, the data stored in it is also open-spec and open source: it's literally just Postgres pages. So we're kind of recreating the interesting lakehouse paradigm here, except for OLTP: you have the actual source of data stored in a cloud object store, which can be very high throughput, in an open format. So on top of it, we're not just building an OLTP database with Postgres able to access that data. We haven't done all of it yet, and this is still in the works, but it enables the possibility of having, for example, Spark directly querying all the data, or DuckDB querying the data, provided there's the right adapter for it. Because it's an open format, and honestly it's just Postgres pages, either we or the community could be creating those adapters.
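Because a Postgres heap page has a documented on-disk layout, an adapter can start by decoding the fixed 24-byte page header. Below is a simplified Python sketch of that first step, following the standard 8 KB page layout; a real adapter would go on to walk the line-pointer array and decode tuples, and the input file name here is hypothetical.

```python
import struct

PAGE_SIZE = 8192  # Postgres default block size

# PageHeaderData, 24 bytes (assuming little-endian, as on x86):
# pd_lsn (xlogid, xrecoff), pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version, pd_prune_xid
HEADER_FMT = "<IIHHHHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # == 24


def parse_page_header(page: bytes) -> dict:
    """Decode the fixed header of a Postgres heap page.

    pd_lower/pd_upper bound the hole between the line-pointer array and
    the tuple data, so (pd_upper - pd_lower) is the page's free space.
    """
    (xlogid, xrecoff, checksum, flags, lower, upper,
     special, size_version, prune_xid) = struct.unpack(
        HEADER_FMT, page[:HEADER_SIZE]
    )
    return {
        "lsn": (xlogid << 32) | xrecoff,  # last WAL record touching page
        "checksum": checksum,
        "flags": flags,
        "free_space": upper - lower,
        "page_size": size_version & 0xFF00,       # size is packed with
        "layout_version": size_version & 0x00FF,  # the layout version
        "prune_xid": prune_xid,
    }


if __name__ == "__main__":
    # Hypothetical input: the first 8 KB block of a heap relation file.
    with open("relation_block0.bin", "rb") as f:
        print(parse_page_header(f.read(PAGE_SIZE)))
```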
So this opens up a lot of use cases, and I can give you an example of what that might be. One very simple one: there used to be this giant divide between analytics, meaning the lakehouse, data warehouses, data lakes, and OLTP.
>> Yeah.
>> When your OLTP application becomes successful enough, you want to start applying analytics over the data. How do you do that? You start trying to build pipelines to get the data out, and you learn complicated concepts like CDC and all that. But if the actual OLTP data is in an open format on top of a data lake, which is basically the lakehouse now, it's no different from lakehouse data: just go read it directly. You can read it using Spark, in a massively parallel fashion; if you have a petabyte of data, go read it, it's not an issue. I think that changes the scaling story and also makes a lot of operations simpler.

Every large enterprise operates tens of thousands, if not millions, of databases, and they are disjoint database instances, because the storage is not shared. By having all of that storage in one data lake, basically the lakehouse here, you can now enable workloads that sweep across all of them. For example, say you want to do a global scan. Hypothetically, suppose you found one of your databases got compromised, and a table got created that says, hey, wire bitcoin to this address and I'll give you your data back. In the case of Lakebase you hopefully don't have to worry about getting your data back, because we have time travel and snapshots of the data, but at the very least you want to know: what's going on with my other databases? Did anything else get compromised? How do you do that across maybe a million different instances? If they are all disjoint, it's an incredibly difficult, time-consuming, laborious job. But if all of them are actually in the lakehouse, you could write a single Spark query to scan all of them, or use DBSQL if you prefer SQL. That, I think, is going to be revolutionary to how we operate and manage databases as well.
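As a sketch of what such a fleet-wide scan could look like once those adapters exist, here is hypothetical PySpark. The `postgres_pages` data source, the bucket path, and the column names are invented for illustration; as noted above, these readers are still in the works.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fleet-scan").getOrCreate()

# Hypothetical reader that decodes Postgres pages in the object store
# into rows. No such adapter ships today; the format name and the path
# layout are placeholders.
catalogs = (
    spark.read.format("postgres_pages")
         .load("s3://lakebase-storage/*/pg_catalog/pg_class")
)

# One parallel scan over every instance's table catalog, flagging any
# database where a suspicious ransom-style table appeared.
compromised = (
    catalogs
    .where(F.lower(F.col("relname")).rlike("ransom|bitcoin"))
    .select("database_id", "relname")   # database_id: invented column
)
compromised.show(truncate=False)
```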
>> Okay. This is not a super complicated demo; I just want to quickly show the developer experience and how fast it is with Lakebase. This is just your standard Databricks workspace UI, which any Databricks user should be pretty familiar with. But there's this new thing here, a little dropdown that goes to this thing called Lakebase. In Lakebase you can create a new project, which is effectively a database with many different potential instances. Let's name it, say, "Josue test", and it's going to create a new one here. So now we've created a new database project, and everything is super snappy, super fast. In this case, creating it automatically creates a development branch of the database as well. If we just come in here, the primary replica of the production branch is now available, so let's run some sample queries. It's running, and all the SQL queries have completed; it's fairly fast.
fairly fast. But what is uh interesting is if you want for example to create a trial branch of this production
like my trial um and you could actually uh have it automatically disappear. Um so for
automatically disappear. Um so for example if you know that you are uh only using this for less than a day you could do it uh but if you don't it's even okay and then you can uh create a branch
based on whatever data that's in your production branch you can actually based on past data I mean we don't really have anything in the past we just create it here you just create and now that branch is created and this branch is actually
it's effectively free uh because he has this copy on write uh architecture that unless you start adding more and more data to it doesn't really cost you
anything. Um, it only cost you when you
anything. Um, it only cost you when you need to use some compute um against it, which hopefully is pretty short because you're using it for development. Um, and
it only cost you if you start writing additional data to it. But otherwise,
you can have like say a terabyte of data in your production and then you create a branch. Um, the branch is copyright. So,
branch. Um, the branch is copyright. So,
it doesn't actually cost anything additional. But then you actually get
additional. But then you actually get all the branches um for basically as separate um independent branches that
each of them is kind of its own mini Postgress but just as a storage layer um they were pointing back to the same uh storage. Um so this trip quite literally
storage. Um so this trip quite literally it's it's not that you are duplicating the data in itself but it's your bas
>> yeah you are you're it's it's a logic it's a logic level basically that's ultimately in in a very simple way it's a logic that's telling you hey this are the records that should belong here
these are the records that should belong in this other branch.
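Here's a toy Python sketch of that copy-on-write idea: a branch starts out as a pointer back to its parent's pages and only materializes private copies of the pages it writes. The names are illustrative, not Lakebase internals.

```python
class Branch:
    """Toy copy-on-write branch: shares parent pages until written."""

    def __init__(self, parent: "Branch | None" = None):
        self.parent = parent
        self.pages: dict[int, bytes] = {}  # only pages this branch wrote

    def read(self, page_id: int) -> bytes:
        # Walk up the branch chain until some ancestor owns the page.
        node: "Branch | None" = self
        while node is not None:
            if page_id in node.pages:
                return node.pages[page_id]
            node = node.parent
        raise KeyError(page_id)

    def write(self, page_id: int, data: bytes) -> None:
        # The first write stores a private copy; the parent's copy is
        # untouched, so a branch costs storage only as it diverges.
        self.pages[page_id] = data


# A fresh branch of a large production database stores zero new pages.
prod = Branch()
prod.write(0, b"customer rows ...")
trial = Branch(parent=prod)
assert trial.read(0) == b"customer rows ..."  # shared, not copied
trial.write(0, b"scratch edits")              # diverges: one private page
assert prod.read(0) == b"customer rows ..."   # production is unaffected
```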
>> Yeah, exactly. And in this case we're doing autoscaling from 0 to 4. Those units approximate the memory, the hardware, but you can make it super small, and you can have it scale to zero. So if there's no usage after about five minutes, this whole thing just disappears, but if it needs to come back, it comes back in hundreds of milliseconds.
>> So the compute is disappearing, but you're still paying for the storage.
>> You're paying for whatever you have in storage, but you're not really paying anything more for compute in that moment.
>> And the storage is super cheap.
>> Yeah, exactly.
>> So this is basically the entire experience. It's super simple, and it's a full-blown Postgres; we didn't really show that here, but it is. You can connect to it, and you could use the built-in SQL editor, but the vast majority of people will actually be connecting to it from their applications.
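Since it's full-blown Postgres speaking the standard wire protocol, any ordinary client library should work. A minimal Python example with psycopg2 is below; the host, database, and credentials are placeholders for your instance's connection details.

```python
import psycopg2  # standard Postgres driver: pip install psycopg2-binary

# Placeholder connection details -- substitute your instance's endpoint.
conn = psycopg2.connect(
    host="your-instance.example.com",
    port=5432,
    dbname="postgres",
    user="app_user",
    password="...",
    sslmode="require",  # hosted endpoints generally require TLS
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # e.g. "PostgreSQL 16.x ..."
conn.close()
```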
>> Right. And this is the new and improved Lakebase?
>> Yeah, so it's a little bit confusing. We announced the Lakebase public preview at Data and AI Summit, and that is more of a fixed-provision Lakebase: you provision whatever resources you want, and that's it. You can reprovision up or down, but it takes a while. It's still separated storage from compute, the same Neon storage technology under the hood. What I have shown here is an improved version of that, in beta right now, which has basically all of neon.com's technology. So it's not just the storage part, but everything: the autoscaling, the developer experience, all of that. That's where we now have snapshotting, branching, backup, restore, and everything is super simple; it's just in the UI, or you can use the command line, and all of that works. It's also much snappier. We actually see this as the future, eventually replacing the current provisioned Lakebase.
>> It's kind of like a branch of Lakebase right now.
>> Yeah, exactly.
>> That's awesome. Thank you for that.