AWS re:Invent 2025 - Amazon S3 performance: Architecture, design, and optimization (STG335)
By AWS Events
Summary
## Key takeaways
- **Leverage S3's Massive Scale**: Amazon S3 stores over 500 trillion objects, equating to hundreds of exabytes, and serves over 200 million requests per second. Use this vast infrastructure by going broad and wide across fleets to unlock high throughput. [02:13], [02:21]
- **Parallelize with Multipart Uploads**: For a 500MB object, use multipart uploads across 5 connections at 100MB/s each to cut upload time from 5 seconds to 1 second. Only failed parts need re-uploading, enabling pause-resume and streaming starts. [05:50], [06:12]
- **Prefix Strategy Drives TPS**: New buckets support 3500 puts/5500 gets per second; 10 prefixes enable 35K puts/55K gets per second, with automatic partitioning under load. Place dates to the right to reuse partitions across days. [16:42], [17:01]
- **Avoid Day-Prefixed Hotspots**: A day at the start of the prefix wastes partitioning work daily as prior splits go unused, causing HTTP 503s during repartitioning. Push dates later so prefix splits carry over across days. [18:40], [19:44]
- **S3 Express: Single-Digit ms Latency**: S3 Express One Zone delivers single-digit millisecond access and up to 2M requests/second per directory bucket, with 200K get TPS out of the box. Append to objects and rename in O(1) time. [22:23], [23:05]
- **Use CRT for Auto-Parallelization**: The AWS Common Runtime automatically handles multipart uploads, range gets, DNS multi-value answers, and retries, with a target throughput config in gigabits per second to maximize performance. [11:22], [12:25]
Topics Covered
- Parallelize connections to exploit S3 scale
- Prefix partitioning auto-scales throughput
- Push dates right in prefixes for reuse
- S3 Express delivers single-digit latencies
- Directory buckets scale TPS instantly
Full Transcript
Um, we're gonna talk a little bit about S3's architecture and design and how you can leverage that to drive massive throughput and get high performance out of S3.
My name is Ian McGarry. I'm a director of software development for Amazon S3, and I'm joined by my colleague Dev Kumar, who is a principal product manager for S3.
So let's go through our agenda quickly.
In this talk we'll dive deep into how S3 works, its scale, and how you can leverage that to drive massive throughput and get low latency out of S3. We'll also cover Amazon S3 Express One Zone, which is our high performance storage class. My colleague Dev will be covering that: when to use it and what its key benefits are.
OK, so the question is, why would you want to drive high throughput and get low latency out of S3?
It's an object store.
Well, many of our customers look at S3, they see the durability, they see the availability, they see the cost, but most importantly for these types of workloads, they see the elasticity. It can scale up and down to high throughput as needed.
A lot of customers are running data lake analytics workloads or machine learning training on S3 and are scaling to hundreds of gigabytes per second of sustained throughput.
Similarly, they're running interactive querying on logs and machine learning model loading on Express.
So we'll talk a little bit about those use cases and how to get the most out of S3 for them.
OK, so let's dive right in.
Optimizing for high throughput.
If you take anything away from this talk at all, it's that you should use Amazon S3 scale to your advantage.
There is a lot of infrastructure, disks, servers, and networking, that goes into making Amazon S3 work for everybody, and that's actually the key to unlocking your throughput as well. Go broad and go wide across the various fleets we have across the world.
And speaking of scale, I just wanted to take you through some cool numbers that I thought would be helpful to give you a sense of the, the scale we operate at.
Um, Amazon S3 currently stores over 500 trillion objects, which equates to hundreds of exabytes of storage.
We also serve over 200 million requests per second, and customers are running over a million data lakes on AWS.
So scale is definitely the goal. Because we support scale such as this, we have, as I mentioned, a lot of infrastructure, and that'll be the key to unlocking the high throughput today.
OK, so: S3's architecture.
At a very high level, we think of S3 in three simple components, and this is in the context of serving get object and put object. So
retrieving data from S3 or persisting data into S3.
There's our front end, which is the set of services that route your request to S3.
And also includes the services that orchestrate the processing of your requests. What do I mean by orchestration and processing? I mean by running authorization, running authentication, generating events, generating logs for your requests, all the things that go into actually serving the request.
It's also responsible for generating metadata about your object and your request, looking that up in our index, which I'll talk about in a second, and then also persisting and retrieving that data from our disks.
The next is our index component.
And that is basically a very, very large distributed key value store, and it's very simply there to map object metadata to the object bytes. A simple example: take the key name of the object and store the location of the bytes on the disks so that we know where to go to get the data. It also stores things like creation date and so on.
And then finally, our storage subsystem.
This is responsible for managing all of those disks at scale and then also figuring out where to place data.
So these three components work together to fulfill a get object and a put object request, and that's how we think of them. We organize around those internally, and so, for example, I lead the front end teams who work on all the APIs for S3.
OK. So that's the architecture.
How does that map to what you do?
Let's take a very, very simple example.
I have a 500 megabyte object. It could be a video, could be a document, could be anything, and I wanna upload that to S3.
I establish a connection between my client and S3, and I start uploading via the request.
One key piece of information: when you establish a connection to S3, you can only have one active request on that connection at a time. That'll become important in a little bit.
So, 500 megabyte object, establish my connection and begin my request.
There's no actual limit on the amount of data you can upload on a single connection from the S3 perspective, but generally, given network and client configuration and different constraints, we find most customers can achieve about 100 megabytes per second on any single connection. So if you take my 500 megabyte object and my 100 megabytes per second connection, that means I can upload that object in about 5 seconds.
How do I get that down to 1 second? I wanna go faster.
The way I do that is parallelize, and that's what I mean about using S3's breadth.
Parallelize many connections across S3 so you can achieve higher throughput.
For the persisting of data, we'll talk about multipart uploads, and for parallelizing the retrieval of data, we'll talk about range gets.
So we're gonna talk about spreading data across, or sorry, connections across, uh, many, many different IP addresses, and we'll talk a little bit about um the tools for doing this in a second.
So let's go back to our example.
Now I have my single object which again can be a video, but I carved that up into 5 parts.
And I have a separate connection to upload each of those parts in parallel.
And now you can see that with each connection getting 100 megabytes per second, and the object carved up into 100 megabyte chunks or parts, I can upload those all within a second. But there's more benefits to this.
We're talking about multipart upload, which is the API to achieve the put object in parallel.
There are a few key benefits to this. The first one I mentioned: it improves throughput. But it also improves recovery time. If any individual connection fails, which could be due to your client failing, could be due to network trouble, or could be due to the server on the S3 side, which is physical infrastructure, failing, then I only need to re-upload that individual part. So you can imagine, in my initial single connection, I get 250 or 400 megabytes of the way through.
My connection times out for whatever reason, because I go through a bad router or a bad switch.
I now have to go back and re-upload all that data again.
Not with multi-part upload.
Now, if I upload 4 out of those 5, I only need to re-upload the one that failed.
These can be uploaded in any order as well. And so the beauty of this multipart upload is that you can even start an upload to S3 before you have all the data in memory. You can imagine, for a streaming use case where you have data coming in and you don't know what its size is yet, whether it's streaming video or it's coming from a stream of data, you can start uploading in parts, and only when you have all those parts uploaded do you complete that multipart upload and persist your data.
So, a lot of benefits. You can also pause and resume object uploads as well. You can upload individual parts, pause for a period, resume, and then complete at the end.
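As a rough sketch of what that flow looks like with Boto3 (the bucket name, key, and part size here are hypothetical; the SDK transfer helpers and the CRT, covered later, do this parallelization for you automatically):

```python
import concurrent.futures

import boto3

s3 = boto3.client("s3")

# Hypothetical names for illustration.
BUCKET = "ians-bucket"
KEY = "videos/big-video.mp4"
PART_SIZE = 100 * 1024 * 1024  # 100 MB parts -> 5 parts for a 500 MB object


def upload_part(upload_id, part_number, data):
    # Each part is an independent request on its own connection;
    # if it fails, only this part needs to be retried.
    resp = s3.upload_part(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id,
        PartNumber=part_number, Body=data,
    )
    return {"PartNumber": part_number, "ETag": resp["ETag"]}


def multipart_upload(path):
    upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]
    futures = []
    with open(path, "rb") as f, \
            concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
        part_number = 1
        while chunk := f.read(PART_SIZE):  # parts may finish in any order
            futures.append(pool.submit(upload_part, upload_id, part_number, chunk))
            part_number += 1
    parts = sorted((fut.result() for fut in futures), key=lambda p: p["PartNumber"])
    # Only after every part has succeeded do we complete the upload.
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )


multipart_upload("big-video.mp4")
```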
It's the same on reads, except we're not using multi-part upload this time, we're using ranged gets.
So again, if I uploaded my 500 megabyte object, now I want to download it again because I plan to use it, I can now do ranged gets, which are 5 individual get requests off across 5 individual connections to pull all that data down at the same time and then reconstitute it on the client.
So again, here, instead of taking 5 seconds to download the whole 500 megabyte object, I can now download across 5 individual connections and get it down to 1 second.
How do I do range gets? Range gets are a little bit different because now I've actually stored all my parts up there, so I need to use the list parts API, which lists out all the individual parts of your object and maps them to ranges, and then I can use range gets to download those in parallel.
Again, the same benefits apply here. If any individual part fails to download due to network trouble or trouble with my client, I only need to re-download that individual part.
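A sketch of the read side, again with Boto3 and hypothetical names; this version splits on byte ranges discovered via HeadObject, which is one common way to do it:

```python
import concurrent.futures

import boto3

s3 = boto3.client("s3")

BUCKET = "ians-bucket"           # hypothetical names for illustration
KEY = "videos/big-video.mp4"
RANGE_SIZE = 100 * 1024 * 1024   # 100 MB per ranged get


def get_range(start, end):
    # Each ranged get is an independent request; a failed range can be
    # retried on its own without re-downloading the rest of the object.
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
    return start, resp["Body"].read()


size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
ranges = [(start, min(start + RANGE_SIZE, size) - 1)
          for start in range(0, size, RANGE_SIZE)]

with concurrent.futures.ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    pieces = list(pool.map(lambda r: get_range(*r), ranges))

# Reconstitute the object on the client in offset order.
data = b"".join(body for _, body in sorted(pieces))
```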
So this is kind of the key recipe, or the ingredients, for going massively parallel. I've talked about one object and 5 connections.
This can scale very, very broad, and we have uh customers who are doing hundreds of thousands of connections today, uh, in this way.
OK, let me take you a little bit under the hood. What I mentioned earlier was I have 5 individual connections.
The ideal state is that it's 5 individual connections to 5 different IP addresses of S3.
S3 has many, many IP addresses that all get stored in our DNS zones and they get returned to customers.
And ideally you want to parallelize across many, because if we have a problem with one server, we want you to quickly be able to recover and try different servers, and similarly, if you're doing a multipart upload or a ranged get and one of those servers fails, we want you to be able to quickly retry. So rather than having all your connections to one single IP address, because if that fails then all your connections fail, we want you to spread across. And how do we do that?
A couple of years ago, the teams um that I lead launched multi-value answers DNS.
So you can see on the right hand side I've got a dig command. I've just run dig for Ian's bucket against our S3 endpoint in us-east-1.
And if you look at the answer section, which is the response to the DNS query that I made, we've got 8 IP addresses there.
So on any DNS lookup we will return up to 8 IP addresses, um, to the client.
Many, many clients can take advantage of this and actually take all those 8 answers rather than just taking one off the top, cache them, and use them to parallelize, but also use them to retry against. For example, if they fail a connection to one IP, they can just pick another one without having to re-resolve DNS.
So this actually takes the latency down on connections and also allows you to go in parallel, and many SDKs and many clients actually resolve DNS and then build a bigger and bigger cache list of all the IPs that they can use.
The AWS SDK for Java 2.x, the C++ SDK, and Python Boto3 all take advantage of this, and have for the last couple of years. So a lot of this is actually baked in under the hood. You're getting it, but if you want, you can go dig the endpoint today and see for yourself.
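If you want to see those multi-value answers programmatically rather than with dig, here is a minimal sketch using only the Python standard library (the bucket name is hypothetical):

```python
import socket

# Resolve the S3 endpoint the same way the dig example does. S3's
# multi-value answer DNS returns up to 8 IP addresses per lookup.
host = "ians-bucket.s3.us-east-1.amazonaws.com"  # hypothetical bucket
addresses = {info[4][0] for info in
             socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)}

# Cache these and spread connections (and retries) across them.
print(addresses)
```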
OK. So I talked a lot about connection management in theory. A couple of years ago, a few of us got together in S3 and realized, hey, we have all these best practices, and we're talking with customers on a case by case basis about actually using them, baking them into their clients, and how they architect their HTTP clients. And we thought, how can we just give this to everybody for free and keep them up to date so they don't have to think about it very often? It's good to know about, but you don't want to be tinkering around with clients every single day.
And with that we launched the AWS Common Runtime, or CRT as we call it for short. It has all of our best practices baked into code, and it has a ton of value.
It has things like asynchronous IO handling.
It has an optimized HTTP client baked in, and it does authentication and authorization for you.
And it does things like automatically parallelizing uploads and downloads, so all those range gets and multipart uploads I spoke about, it handles that for you. It will also try to scale to your performance needs, and I'll talk about that in a second.
And the other thing: it has built-in retry logic. So when I spoke about failures, it already has optimal retry logic baked in. If you fail a request, it will retry using a cached IP as opposed to re-resolving DNS, and you get a performance benefit from that.
There's a lot within the CRT, but I just wanted to talk about one single configuration that I really like because of its simplicity, and that is target throughput.
I've put up the SDK config and the CLI config example, um, but the target throughput is basically you telling the CRT, hey, I wanna get this much throughput out of my client.
Now, just one call out: this throughput is in gigabits per second.
Not gigabytes, and we were talking about megabytes earlier. This is bits, so it's a little bit of a different scaling factor. The reason it's in bits is that most of our EC2 instances' network interface cards are rated in bits. So it's nice for you to be able to think, hey, I wanna get 20 gigabits per second out of this S3 client on this instance, which has a 50 gigabit NIC, for example. The default is usually set at 10 gigabits per second, which is pretty high, but you can go and configure this, and here is what it tells the CRT to do.
Looking at the objects you're wanting to upload or download, and taking your target throughput, it will automatically figure out how many connections to open to try to maximize to that throughput. And, I've tested it myself, it gets pretty close to around 20 if you set it at 20 and you've got the right workload.
So something to look at; it's a really nice configuration to set. I wanna achieve this throughput, get this performance: instantiate the client, configure it, and away you go.
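As an illustrative sketch of that configuration using the CRT's Python bindings (class and parameter names here are taken from the awscrt package as I understand it; check the documentation for your SDK of choice, since each one exposes this setting slightly differently):

```python
from awscrt.auth import AwsCredentialsProvider
from awscrt.io import ClientBootstrap, DefaultHostResolver, EventLoopGroup
from awscrt.s3 import S3Client

# Standard CRT plumbing: event loop, DNS resolver, socket bootstrap.
event_loop_group = EventLoopGroup()
resolver = DefaultHostResolver(event_loop_group)
bootstrap = ClientBootstrap(event_loop_group, resolver)
credentials = AwsCredentialsProvider.new_default_chain(bootstrap)

# Ask the CRT to aim for 20 gigabits per second; it decides how many
# parallel connections to open against S3 to try to hit that target.
s3_crt_client = S3Client(
    bootstrap=bootstrap,
    region="us-east-1",
    credential_provider=credentials,
    throughput_target_gbps=20,  # gigabits, matching how NICs are rated
)
```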
OK, CRT: how do you actually use it? Well, the CRT is embedded in multiple clients, as I mentioned.
Um, it's also the foundation of our open source file connector client which is Mountpoint for Amazon S3.
And it is also available by default in Boto3 and the AWS CLI on Trn1, P4d, and P5 EC2 instances.
And the reason is those have very, uh, large CPUs and large network interfaces, so they benefit the most from these performance design patterns.
For other instance types you can go and enable it and configure it yourself. OK, I'm putting a QR code here for getting started with the AWS CRT.
Now you know about it.
It's definitely the first place to go when you're trying to think about how to drive higher and higher throughput to S3.
OK. So we talked a lot about parallelization of connections and going broad.
And a reasonable question you might ask yourself is, well, can I just take that and continue to scale indefinitely? And the answer is generally yes, but it also depends on your prefix structure.
I'll talk about what that means in a second.
So we talked about those three components. We had our front end, we had our index, and we had our storage. Our index is a subsystem that's responsible for mapping metadata to storage locations. So you can imagine when a request comes in for a get object, we are looking up that index saying, hey, where are those bytes, so I can get them back to the customer, or to the client? And that's what we're looking up in our index. Our index works around key, or sorry, our index works around prefix structure, and we'll talk about what that means in a second, but just to take you back to the architecture.
OK, so what is an S3 prefix? It is
any string of characters after the bucket name.
So you create your bucket name, Ian's bucket, for example, and then any of the keys after that, or any of the text after that, is your prefix structure, and that's used to map into the object data itself.
Let's make this concrete.
Here, for example, I've got my organization, and I have 2 divisions or 2 departments within that organization. I've got engineering and I've got finance. My engineering organization likes to store their data between prod and test.
In prod, they have their production software artifacts that they use to deploy, and in test, they have their test software artifacts that they use to deploy.
My finance team likes to think about their fiscal artifacts in years because they have deadlines towards the end of the year that they need to prepare for, so they go based on year.
So that's what your prefixed structure might look like in a bucket.
Prefixes in S3, because it's an object store, are not directories, but it's a very easy and natural, organic way to think about them. It's like directories going down to my data. But it's very important you don't just think about them as directories, because that can lead to some suboptimal prefix structures.
So let's see how we put this in practice.
Or first, sorry, why does this matter?
As I mentioned, it matters because there's limitations per prefix.
When you create your bucket for the very first time, when I create Ian's bucket, I can achieve 3500 puts or 5500 gets per second against that bucket. I can do more if I want to, but I need to think about my prefix strategy underneath. If I create 10 prefixes in my bucket, I can now 10x those puts per second and gets per second.
So now I can achieve 35,000 puts per second or 55,000 gets per second.
We automatically scale um based on your prefix structure as you are driving more and more requests.
So as you drive requests that exceed the 3500 puts or the 5500 gets, we'll automatically partition out your prefix structure so you can drive more and more TPS.
So let's walk through an example.
I've got my re:Invent bucket. Today I can do 5500 gets per second and 3500 puts per second.
So let's talk about what the, the first split might look like for my prefixes. I've got
prefix one and I've got prefix two.
You might think of this as sales and engineering from before or finance and engineering from before.
On my first prefix, I can now drive 5500 gets, and on my second prefix, I can drive 5500 gets as well. When I'm driving a combined 11,000 requests per second, S3 will automatically split out these prefixes, seeing that there is heat, uh, what we call prefix heat on both, and then splitting out those, uh, prefixes.
But we can take this a step further.
Very simply, to keep things simple on the screen, I've now got A and B under prefix 1 and prefix 2, and now I can drive a combined 22,000 TPS. Again, as I drive more requests to it, S3 automatically partitions out these prefixes. So it's really important to think about how you're prefixing your data beforehand.
You want natural splits in your prefixes between different workloads.
This can again lead to sharp edges if we're not careful, and we'll talk a little bit about that next.
So here I've now switched over to a different format. I've
got day within my prefix, and this is a very common pitfall for users of S3.
Having time-based data or day-based data, it's very alluring, or very easy, to stick the day up front, because you're changing your data over time and you might have analytics workloads that want to run on a day by day basis. So it's easy to stick the day at the start of your prefix to help you think about your data, but that does lead to suboptimal outcomes.
So let's think, let's talk about it.
We'll take our exact same example from before, where we have our prefix 1 and prefix 2 and we have A and B, but day 1 is to the left, and day 2 and day 3 and day 4 will all be to the left.
The problem with this is that once I've generated enough requests to my prefixes, which has resulted in the partitioning to achieve my 22,000, when I move on to the next day, that's all wasted, it's all gone, because I have the day at the start of my prefix instead of later on.
So you wanna make sure that day two, or any dates, are further down or to the right in your prefix structure, so that when we are partitioning your prefixes on the left, you're able to reuse those partitions later.
So I, I've called it out here, but the partitions from day one are now unused, and we need to do the same splitting on day two while we see sustained load.
Now, while we're actually partitioning or breaking out these prefixes to drive more requests, you might see HTTP 503s from us. We're telling you to slow down while we're doing the work to break these out. But once we've done the work, you will sustain those requests per second without seeing any HTTP 503s or slowdowns.
And so it's really important that rather than putting the date to the left, you push it to the right, so that the work we do to partition out your prefixes gets reused as the days go on.
And so in this example, I've now pushed day 1 and day 2 to the right, so any work we've done on prefix 1 and prefix 2 to split them, and then on A and B to split those, gets carried over as the days move on.
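A tiny illustration of the two key layouts being contrasted here (the names are hypothetical):

```python
# Hypothetical key layouts for illustration.
object_name = "part-0001.parquet"

# Date at the start: every new day lands under a brand-new prefix, so the
# partition splits S3 made for yesterday's traffic go unused.
bad_key = f"2025/12/01/prefix1/a/{object_name}"

# Date pushed to the right: prefix1/a keeps absorbing traffic day after
# day, so earlier partition splits carry over.
good_key = f"prefix1/a/2025/12/01/{object_name}"
```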
So it's a really, really common pitfall, and something to definitely take away from this.
But whenever you're starting with an S3 bucket just to test or mess around, it's very easy to just create any prefix structure that maps to what you have.
But when you know you're going to do high throughput and high scale, take a second, step back, write it down, and make sure you're thinking about what your prefix structure will look like and that you have natural divides, to ensure that you can drive the TPS you need to.
Here I've got another QR code.
It's a best practices reference for optimizing high request rate workloads; it talks a lot about prefixes and gives you some other examples.
I think it's a great secondary dive into this. I've given you the high level view of how this works in a very simple case, but if you wanna go deeper, use the QR code.
OK, now I will hand over to my colleague Dev to talk about S3 Express One Zone.
Thank you. Hello, everyone. My name is Dev Prat Kumar, and I'm a product manager on the Amazon S3 team.
I lead S3 Express One Zone, and I'm gonna talk about its performance characteristics, some of its unique capabilities, top use cases customers are using it for, and some key architectural considerations when building with S3 Express One Zone.
So, let's dive right in.
S3 Express One Zone is the fastest object storage, delivering single digit millisecond access times. That's really fast.
It also offers up to 2 million requests per second per directory bucket.
I'll talk about it in more detail a bit later, but one of the key things to note here, and kind of an extension of what Ian was talking about, is that you get the TPS capacity right when you create the bucket.
You can scale it up to 2 million, but you get 200K right out of the box, and we'll look into why that's important and why it's super relevant for some of the bursty access use cases.
Then you also have some pretty cool differentiated capabilities, specifically in this storage class.
So for example, you can add data to an existing object while it is in the object storage. Basically, I mean to say you can append data to an object while it is in S3 Express One Zone.
That's something pretty cool for object storage, which before this capability had been immutable; you could not actually mutate an object in object storage.
Also, you now have an order-of-one rename operation available, which basically means that regardless of the size of the object, you can rename it in constant time. It's a brand new API we launched a few months back this year. With these capabilities, customers achieve up to 10x faster access times on S3 Express One Zone compared to S3 Standard.
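A sketch of the append capability with Boto3 against a hypothetical directory bucket; WriteOffsetBytes is the parameter I believe the SDK exposes for appends, so verify against the current API reference:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-logs--use1-az4--x-s3"    # hypothetical directory bucket name
KEY = "app/2025/12/01/service.log"

# Create the object with the first chunk of log data.
s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"first batch of log lines\n")

# Append the next chunk at the current end of the object.
size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
s3.put_object(
    Bucket=BUCKET,
    Key=KEY,
    Body=b"next batch of log lines\n",
    WriteOffsetBytes=size,  # the write starts at the current object size
)
```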
So, why does all of this matter?
It matters because there are a whole bunch of use cases that need high performance, kind of intuitive, I guess.
At this point, actually, let me do a quick show of hands.
How many of you in this room have at least one of these problems?
Like at least one use case here that you work with. OK, that's a good number.
Hopefully all of you use S3 Express already and benefit from this talk.
Let's start with the first one, ML training.
Over the past few years, a lot of customers have been using S3 Express One Zone to drive very high levels of throughput, very high levels of data transfer speed.
As we know, for ML training, customers deploy a very large number of high performance GPUs, and they want to keep these GPUs busy, right? So your data is in your object storage, and you want to get it as soon as possible. So you want to have the fastest data transfer speed that you can attain.
When you try to do that, you run very high levels of TPS.
And your transfer speeds could be up to 1 terabyte per second.
And because S3 Express One Zone is built for high performance, you benefit from its capability to scale to these levels, and you can keep your GPUs busy and continue doing meaningful work.
So, it's super popular for ML training these days, and we see a lot of customers adopting it, some even training foundational models with S3 Express One Zone.
The next one is interactive query.
And you know this is a use case that has existed for as long as I can think of.
Customers have data, they want to make use of this data. They want to run high performance queries.
A lot of these queries are also interactive where you have an end user waiting for the answer to come back.
So for example, think of an observability analytics use case. You have a whole bunch of log files sitting in S3 and you want to get insights out of those log files.
When an end user is running a query, they might burst to tens or hundreds of thousands of TPS, because they may need to scan a lot of that data.
And obviously they require low latency access, because they don't want to wait too long.
So this is exactly when you would want to use S3 Express One Zone, to benefit from its high TPS, low latency access for these interactive querying scenarios.
The next one is, again, logs come up again here, but this is a different use case: logging and media streaming.
So what happens with logging and media streaming is that you constantly get new data. Let's say you have applications and internal services creating logs, so new data keeps coming in, or let's say you are a broadcast company and you have a continuous feed coming in; you typically write to an existing file.
And before we had this append capability, customers used to provision and manage their own storage on top of S3, and that's where they created these files.
And then when you have these files, you have consumer applications. If you're a broadcast company, then you probably want to send it to your end users.
If you are an observability company, you want your end users to be able to run analytics on these files. So you want to perform reads as well as writes.
So, with S3 Express One Zone, because it now has the append capability, you can essentially create these files in the object storage itself, so you don't need to maintain an additional layer on top of it.
You can build your files, you can use the append capability, and you can run high TPS, low latency queries on top, and have the full workflow running right from object storage itself. That simplifies architectures, lowers cost, and obviously improves performance because of the unique performance characteristics that S3 Express One Zone offers.
Last but not least on this slide: AI model loading.
So, over the last, I would say 1 or 2 years, a lot of enterprise companies are now building inference pipelines.
In an inference pipeline, let's say you are an e-commerce company and you have a recommendation engine.
You may need to update your models because, let's say your inventory has changed or um your customer usage pattern has changed.
And whenever a model gets updated, customers want their inference notes to pick up the updates as soon as possible.
And in a typical inference pipeline, you would have tens of thousands of nodes. We have also seen 100,000-plus nodes trying to read a model and weights, which is just a few files, pretty much at the same time. So it's a very bursty access problem.
With S3 general purpose buckets, as Ian was explaining, the TPS capacity scales gradually. You
can scale to very high levels, but the scale up is gradual.
If you suddenly have a burst of 100K gets coming in, let's say on a new bucket, then you may get slowdown error messages.
You can retry, but that may slow down the workflow, and that is something customers are very sensitive to, specifically for these real world inference pipeline use cases. So this is again where S3 Express One Zone's high TPS out of the box really comes in handy, and we have customers building a bunch of model loading and inference pipelines on S3 Express today.
Before we go on to the next slide, I think by this point we understand that S3 Express One Zone is the fastest object storage and offers 10x faster access times compared to S3 Standard.
So, it's actually a really good option to build your cache on, right? You can keep your data in a general purpose bucket in your regional storage classes, where you get multi-AZ resilience, and have a caching layer that runs out of S3 Express One Zone.
And that's exactly why we have a number of customers today using S3 Express One Zone effectively as a cache on top of general purpose buckets and regional storage classes.
One of these customers we are super excited about is Tavily.
Tavily is an AI platform company.
They basically serve as the web access layer for agentic workloads and large language models, and previously they managed and built their own cache.
They provisioned storage and use that to deliver low latency access to these agentic workflows.
But that's harder to scale, as you would imagine, and it carries management overhead.
So they started looking into S3 Express One Zone, and today they're running in production and their caching layer is basically S3 Express One Zone.
It scales elastically, delivers single digit millisecond access times, and Tavily's total cost of ownership has gone down by up to 6x.
Just an example of how elastic scaling and high performance out of the box can help customers save costs in addition to improving performance.
All right. So, at this point, we appreciate, I hope, the differentiated performance of S3 Express One Zone and the popular use cases customers are using it for.
Now, let's go into some of the architectural considerations that you may want to keep in mind when building with S3 Express One Zone, for one of these use cases or any other use case you may find it useful for.
There are basically three considerations.
The first is its single availability zone nature; it's a zonal storage class.
The second one is directory buckets, which are a new construct we announced alongside S3 Express One Zone.
And the third one is its unique auth mechanism, which we will talk about, which is optimized for low latency access. So, let's review each of these in greater detail.
S3 Express One Zone is a single availability zone storage class, which basically means that when you create objects, they are stored in a directory bucket in a single availability zone.
This allows faster access, but that doesn't mean that you cannot do a cross AZ access.
Once you create your directory bucket, you add your object there.
You can access it from a different availability zone.
And one of the remarkable characteristics of S3 Express One Zone is that the access cost remains the same.
So regardless of whether you're accessing your directory bucket from the same availability zone where it is located, or you are accessing it from compute from a different availability zone, there is no additional network cost.
Something to be aware of in case your fleet is really large and distributed across different availability zones, which is pretty common for use cases like machine learning training and large analytics workloads.
Now, let's talk about the second architectural consideration, which I would say is more of a design concept.
So, the S3 directory bucket is a relatively new type of bucket that we launched two years ago.
It exists alongside our general purpose buckets and has a pretty different scaling model as far as TPS goes.
So with general purpose buckets, as Ian was explaining, S3 scales TPS capacity based on load; if your TPS load increases on the bucket, then it automatically scales to higher TPS capacity.
On the other hand, with directory buckets, as soon as you create a directory bucket, it is already scaled up to 200,000 get requests per second and 100,000 put requests per second.
So if you have bursty access use cases where it's hard to predict (let's say you have a whole bunch of end customers sending observability queries and you don't know which query is going to be super bursty, or you have the model loading pipeline I was talking about earlier), then you want your bucket to be already scaled up, and that's exactly what directory buckets are.
They're already scaled up to 200K gets per second by default and 100K puts per second by default, and you can also request further scaling, up to 2 million get requests per second.
So different scaling models of TPS relevant for different use cases, um, depending on the nature of your application.
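For reference, a sketch of creating a directory bucket with Boto3 (the bucket name and Availability Zone ID are hypothetical; note that the name embeds the zone ID and ends in --x-s3):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Directory bucket names embed the Availability Zone ID and end in --x-s3.
bucket = "my-express-data--use1-az4--x-s3"   # hypothetical name and AZ ID

s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={
        "Location": {"Type": "AvailabilityZone", "Name": "use1-az4"},
        "Bucket": {"Type": "Directory", "DataRedundancy": "SingleAvailabilityZone"},
    },
)
# The new directory bucket is already scaled out, so bursty reads (for
# example, thousands of nodes pulling model weights) can start immediately.
```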
Moving on. As the name suggests, directory buckets store their data in directories, no surprises there.
And the namespace is basically hierarchical.
The implication of this is that if you try to run a list operation against your directory bucket, and you're trying to list the entire bucket, then your list results are not going to be lexicographically sorted.
This is in contrast to general purpose buckets, where your list results are lexicographically sorted.
Why does it matter?
We always want to make it easy for our customers to reuse existing applications and existing code. So if you are reusing code that works against a general purpose bucket, this is a consideration you want to be aware of. If your application, for example, doesn't make any assumptions about list order sorting, then it's totally fine, but if it does, then this is something you want to be aware of.
Next, let's talk about auth.
When we launched S3 Express One Zone, we also launched a new auth mechanism: session-based auth.
What happens with session-based auth is that you have an API, the create session API, that you use to create a session.
Again, no surprises there, and get temporary credentials that you can then use in subsequent requests for authentication.
What this does is amortize, or distribute, the cost of your auth across multiple requests.
Which basically means that each of your requests actually finishes faster.
Again, the reason we did this is to improve latency performance for every single request that you make. If you architect against the REST APIs directly, you would need to use this API to manage sessions and tokens.
However, if you use one of our SDKs, then this is done for you.
So basically, the session management and token management is taken care of by SDK and you don't have to worry about it.
Again, something to be aware of, and we strongly recommend using our SDKs, the CRT, and the other tooling that I'll touch on again and that Ian already talked about.
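A small sketch of what that looks like in practice with Boto3 against a hypothetical directory bucket; the SDK calls CreateSession for you, and the explicit call below is shown only to make the mechanism visible:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-express-data--use1-az4--x-s3"   # hypothetical directory bucket

# With the SDK, session management is automatic: the client calls
# CreateSession under the hood, caches the temporary credentials, and
# reuses them across requests, so each request skips full auth overhead.
s3.put_object(Bucket=bucket, Key="models/weights.bin", Body=b"...")
obj = s3.get_object(Bucket=bucket, Key="models/weights.bin")

# Against the REST API directly, you would call CreateSession yourself
# and sign subsequent requests with the temporary credentials it returns.
session = s3.create_session(Bucket=bucket)
print(session["Credentials"]["Expiration"])
```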
OK. So, to summarize what we have talked about so far: if you want to use S3 Express One Zone, and you have one of the use cases I talked about earlier,
You create a directory bucket first, scaled out to 200K get TPS by default.
You want to have your compute in the same availability zone to optimize for latency.
You can keep it in different availability zones as well, um, if your fleet is spread out, but to get the best latency performance, you want to co-locate your compute with storage.
You want to use the session-based auth to access your objects, and if you're using our SDK, then session management and token management are taken care of for you.
With this architecture, you achieve high TPS, low latency access right out of the box.
You increase the speed of data access by 10x and benefit from S3 Express One Zone's performance for request-intensive applications like machine learning training and latency-sensitive applications like interactive query.
To put it all together, what Ian talked about and what I talked about, and taking a step back: when you are thinking about optimizing performance for your use case, you want to think about the requirements. You want to think about the latency requirement of your end user or your application.
You want to think about the access pattern, whether the requests are bursty or whether they increase gradually over time, and you want to think about the kind of throughput that your application and your workload will drive.
If your access pattern is bursty, if you have end users waiting, and your application is latency sensitive, then we recommend using S3 Express One Zone.
On the other hand, if your TPS load increases gradually over time, or you have predictable access patterns so that you can partition your prefixes as Ian was describing, then you want to use general purpose buckets, and you can use any of our regional storage classes like S3 Standard or S3 Intelligent-Tiering.
Regardless, we strongly recommend using a CRT-based client, that is, an AWS Common Runtime library-based client, because it implements a lot of performance best practices, like the use of multipart uploads and range gets, for you. And you can benefit from the CRT if you are using our SDKs and opt into it, or if you're using Mountpoint for Amazon S3, which is our file client for object storage.
Here is a best practices reference, essentially a blog, which contextualizes everything we talked about in the context of a really popular use case these days, which is checkpoint storage.
If you are in the ML space, I strongly recommend reading it, and even if you are not, I recommend reading it so that you get a sense of everything we talked about in the context of a real world problem.
On that note, um, we really appreciate you joining us today and hopefully you found the content useful.
We request you to share feedback on the app, and enjoy the rest of re:Invent. Thank you.