
Co-Packaged Optics for our Connected Future

By Microelectronics

Summary

## Key takeaways

- **Data Centers Trap 76% of Traffic Internally**: Up to 76% of all data center traffic traverses internally within the data centers, bouncing dozens of times between storage and processing. This explosion is the leading market force driving interconnect R&D. [03:20], [03:31]
- **Disaggregated Compute Recovers 30-35% Wasted Memory**: Traditional siloed compute over-provisions memory, with 30-35% typically sitting unused because workloads are diverse. Disaggregated architectures pool resources over high-bandwidth, low-latency interconnect so they can be reallocated efficiently. [05:28], [05:51]
- **Chiplets Improve Die Yield; UCIe Standardizes the Interface**: Chiplets (dies co-packaged and sold as one chip) improve yield through smaller dies, mix technologies, and speed time-to-market using pre-validated designs. The Universal Chiplet Interconnect Express (UCIe) standard now governs die-to-die interfaces. [07:00], [07:54]
- **DSP Shift at 28nm Sacrificed Efficiency**: At the 28nm node, early DSP-based transceivers traded power efficiency for continued scaling, preserving Moore's Law benefits. From there, efficiency improved exponentially with process scaling. [14:52], [15:34]
- **CPO Die-to-Die Hits 5 Tbps/mm of Beachfront**: Chiplet die-to-die links achieve 5 Tbps per mm of die edge using UCIe, outpacing compute demands. Optics must match with roughly 1 Tbps per fiber or denser fiber packing to avoid becoming the bottleneck. [37:41], [38:45]
- **Co-Optimization Yields 160 Gbps on 32 GHz**: Co-designing optics, electronics, packaging, and DSP enables 160 Gbps transmission over analog circuits with only 32 GHz of bandwidth, unlocking major performance gains via integrated optimization. [47:42], [45:31]

Topics Covered

  • Data Centers Trap 76% Traffic Internally
  • Disaggregation Frees 35% Unused Memory
  • Chiplets Democratize Advanced Packaging
  • Co-Packaged Optics Risks Overheating ASIC
  • Co-Optimization Doubles Bandwidth Limits

Full Transcript

Thanks, everybody, for attending. I'm going to speak today about co-packaged optics, and talk about it as part of our connected future. I'll begin with some motivation and try to put co-packaged optics in the context of the overall evolution of I/O, then look at some of the specific applications where co-packaged optics are likely to start having an impact, some sooner and some later. We'll talk about the challenges and opportunities for co-packaged optics in those different applications.

It's a hot area, and you'll see my perspective come out through the talk. There's a lot of research and development going into it, a lot of emphasis, lots of excitement around it, justifiably: it's an exciting technology. But it's important to recognize where there are roadblocks, and thereby identify the appropriate use cases for co-packaged optics. Hopefully that's something you'll get out of the talk. In the last part of the talk, I'll cover some work on co-optimization of optics, electronics, and packaging, which I think is really important for enabling this technology to have an impact going forward.

We'll start with the motivation. I always tell my students that I hate these slides about the never-ending growth of data traffic, but needless to say, we won't be satisfied until every flat surface in every room and every building is papered with 3D high-definition video and cameras. That's driving a tremendous amount of data traffic. Data transmission networks today account for an estimated one to one and a half percent of global electricity use, and they represent about one percent of energy-related greenhouse gas emissions. Actually, those numbers are remarkably low. I remember the numbers were similar five to ten years ago when I first started looking at them, and they were forecast to increase dramatically. It's only through the really hard work of smart people like you that those numbers have held constant, which is great, and it's going to continue to be a challenge for all of us to keep them constant into the future.

I think an important point to remember is this: whether the endpoints of all this data traffic are separated by kilometers of optical fiber, meters of copper cable, or in some cases just a fraction of a millimeter of printed wiring, the start and end points are, and will be, silicon chips. That's where we've got memory, tremendous computational capability, and a tremendous infrastructure for CMOS fabrication. If you were to trace a bit on its long journey between endpoints: we usually think of communication in terms of our phones, that last hop, the wireless link to my laptop or whatever it is, but really that's just the last hop of a long journey. The data traverses a network, typically over optical fiber, into some compute infrastructure: data centers, which is really shorthand for a lot of different types of computing infrastructure. And once it's there, it bounces around back and forth dozens of times: into storage, out of storage, do some processing on it, put it back in storage, then produce a result and send it back out again over the network.

In fact, up to 76% of all data center traffic traverses internally within the data centers. There's a source for that figure, and I believe the fraction is actually higher; I'm sure it depends on what kind of hops of data you actually count in that number, and I think it's a much higher fraction than that. So that explosion of interconnect within the data center is a tremendous driver, and it's now the leading market force driving R&D in connectivity in general.

Another driver is the continued rollout of wireless networks. The rollout of 5G has meant a continued increase in the number of users and in the bandwidth delivered to those users, and it has motivated the placement of application-specific compute close to these endpoints of data. So you'll see more remote compute sites sprinkled about geographically, not just a few massive data centers but compute all over the place, to provide, in the end, a better user experience for all of us. That also creates new applications for high-speed connectivity: you've got compute out at the edge, and it's got to talk back to the larger infrastructure.

So all these new applications are emerging, and they're all driving and fueling R&D in optics.

There are some other megatrends I want to highlight that really set the stage for co-packaged optics. One important trend to be aware of is the move toward disaggregated computing. In traditional siloed compute architectures, you've got these cards, these boards, and each one is a computer on its own: it's got compute, it's got storage and memory, and it needs some connectivity to talk into a network. This makes a lot of sense, because processors can fetch and store data in and out of storage and memory quickly and with low latency; it's all co-located right there. On the other hand, you can imagine that when you roll this out over a massive data center with very diverse compute workloads, it's very hard to make it efficient. Some compute tasks are memory-intensive; others require much less memory and storage, or much lower latency. So you're going to be over-provisioned somewhere.

Estimates are that with this current architecture, 30 to 35 percent of memory just sits there unused in a typical data center. That's actually not bad when you think about the diversity of tasks going on, but it's still a lot of money spent on memory that isn't being used. The trend now is to look instead at disaggregated architectures, where you've got clusters of storage, compute, accelerators, and memory, interconnected by a network, an interconnect fabric, whose connectivity is so high-performance, so high-bandwidth and low-latency, that the system performs seamlessly, just like the siloed one, but lets you redistribute and reallocate resources according to compute workloads. That's obviously really attractive, and it's great if you're someone like me who works on connectivity, because it's basically saying: let's take the money we used to spend on unused memory and storage and instead invest it in a better interconnect infrastructure that enables this disaggregated architecture. So again, this shift creates a step-change increase in the emphasis on, and the importance of, connectivity in the performance of these systems.
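
To make the over-provisioning argument concrete, here is a minimal sketch of the sum-of-peaks versus peak-of-sum effect. This is my own illustration with synthetic numbers, so the percentage it prints demonstrates the mechanism rather than reproducing the 30-35% figure from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cluster: 100 servers, memory demand sampled hourly for a
# week. The demand distribution is invented purely for illustration.
n_servers, n_hours = 100, 24 * 7
demand_gb = rng.gamma(shape=2.0, scale=32.0, size=(n_servers, n_hours))

# Siloed: every server must be provisioned for its own peak demand.
siloed = demand_gb.max(axis=1).sum()

# Disaggregated: a shared pool only has to cover the peak of the
# *aggregate* demand, because uncorrelated peaks rarely coincide.
pooled = demand_gb.sum(axis=0).max()

print(f"siloed provisioning : {siloed:9,.0f} GB")
print(f"pooled provisioning : {pooled:9,.0f} GB")
print(f"memory stranded by siloing: {1 - pooled / siloed:.0%}")
```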

Low-latency connectivity is a key enabler here, and we'll see that co-packaged optics is a key part of that.

The other megatrend is the emergence of a new design paradigm for complex electronic systems: chiplets. If you haven't heard the term, chiplets are individual dies that can be co-packaged side by side and sold in combination, in a single package, as a single chip, and the combination behaves like a single chip. Obviously, to behave like a single chip, that relies on a fabric of dense, high-speed interconnect between the chiplets. This is something that's been happening inside a few large companies for quite some time already; if you follow this area, you'll know about AMD's efforts, Intel's, and Apple's. What's exciting is the growing momentum, and the growing ecosystem, to support the democratization of this technology across the whole industry. There's a massive push for this as costs in the most advanced CMOS technologies continue to increase.

For example, in the past year we saw, for the first time, wide acceptance of a standard that can govern these die-to-die interfaces: the Universal Chiplet Interconnect Express (UCIe) standard. If that sounds reminiscent of PCIe, it's not an accident; there's some shared history there. The potential benefits of this new chiplet design paradigm are many for the whole industry. It offers the potential for lower cost, because with many smaller dies instead of one giant die you get better die yield. It allows you to use older technologies for some parts of the system, which are cheaper to design and fabricate. It lets you implement new functionality inside a package by combining different technologies, using each for what it's best suited to. And by allowing companies to use pre-validated chiplet designs, it accelerates time-to-market for some really complicated systems: you'll be able to grab the pieces you need off the shelf. People do that today with IP blocks, but in principle it will be even quicker if you have a completely pre-validated chiplet with a standardized interface to mix and match and go to market with. So it's really going to open up innovation to more players. Very exciting, with a lot of momentum behind it.
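
The die-yield argument is easy to quantify with the standard Poisson defect-density model, yield = exp(-area x defect density). This sketch is my addition for illustration; the defect density and die sizes are assumed values, not numbers from the talk.

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Classic Poisson yield model: probability that a die of the
    given area contains zero killer defects."""
    return math.exp(-area_mm2 * defects_per_mm2)

D = 0.001        # assumed defect density, defects/mm^2 (illustrative)
big_die = 800.0  # one monolithic 800 mm^2 die

# Monolithic: the entire die must be defect-free to be sellable.
y_mono = poisson_yield(big_die, D)

# Chipletized: four 200 mm^2 dies yield independently, and bad dies
# are discarded *before* packaging, so good silicon isn't wasted.
y_chiplet = poisson_yield(big_die / 4, D)

print(f"monolithic 800 mm^2 die yield : {y_mono:.1%}")    # ~44.9%
print(f"per-chiplet 200 mm^2 yield    : {y_chiplet:.1%}") # ~81.9%
```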

It's a big trend that, as we'll see, factors into co-packaged optics, which really relies on multiple chiplets in a package.

At the same time, you're seeing increased use of optical cables, and those optics getting closer and closer to the endpoints of the data. Historically, we might interconnect computers in racks to each other with copper cables, for example connecting them to a switch that routes data traffic to other racks, usually over Ethernet links. The data then gets aggregated to higher data rates and flows over longer distances, up to a few kilometers. So traditionally, optical links are reserved for the longer reaches where higher bandwidths are required. But as overall network traffic and data rates increase, the use of optics is percolating into shorter reaches, into the rack. The natural extension of that trend is bringing the fiber even closer to the endpoints of the data, right next to the CPUs, the GPUs, and the switches themselves.

And that's where co-packaged optics comes in. Even in a picture like this, where optical cabling connects servers in a rack, you've already got a combination of optical and electrical links: you've still got these little pale blue boxes doing the optoelectronic conversion for you, and that still results in an electrical link at the end. The intention is that if we bring the optoelectronic conversion right next to the endpoint, we can eliminate the need for an intervening receive-and-retransmit chip, which is not insignificant in terms of cost and power consumption. That's the high-level goal and motivation, and that's how co-packaged optics supports a lot of these megatrends.

But there are a lot of challenges along the way. In summary of this section, the big trends: first, the trend toward disaggregated compute, which relies even more heavily on high-performance interconnect; second, the rise of the chiplet design paradigm and a whole ecosystem around it that allows mixing and matching within a package; and third, the desire to bring optics closer to the endpoints of data. And all of this rides on the train of CMOS scaling. I almost didn't bother spending a slide on that: CMOS is down to three nanometers now, going to nanosheet transistors this year, and that underlying force has been fueling all of this. It's an important piece of the evolution of I/O. All that innovation in CMOS technology, along with the demand for our ubiquitous video and cameras and so on, has allowed us to implement tremendous information processing on a single die, to the point where moving the information between chips is now becoming the bottleneck. It's the bottleneck in terms of energy consumption: it consumes a large and growing fraction of overall energy consumption in these systems.

It's the bottleneck in terms of die area: you can't do much compute if you're spending the whole die just moving bits on and off of it. It's a bottleneck in terms of design effort, which is a big deal; it impacts time-to-market and the cost of R&D, which is a large fraction of the cost of these systems. And it's one of the riskiest parts of the design: for memory and digital logic, we have good flows and solid procedures for ensuring they'll work, but high-speed I/O remains a significant risk factor in a lot of these chips. It can cause yield fallout that impacts cost significantly, and test time too: when these things are fabricated, even in volume production, some kind of testing has to be performed on every single part, and it takes a significant amount of time just to test the data going in and out of these chips. That slows the flow of parts, and it costs money.

Fueled by increasing data traffic, here you see a scatter plot surveying publications from the leading conferences in this area, the International Solid-State Circuits Conference (ISSCC) and the VLSI Circuits Symposium, over the last couple of decades. You see this trend toward increasing data rate; note the log scale on the y-axis versus the linear scale for time. You see, roughly, a doubling of data rates every five years in support of all the trends we've been talking about, coupled, of course, with Moore's Law. If we change the x-axis from year to process node, you still see something similar: the same kind of scaling of data rate with technology node. But there's a different way I like to parse the data that's very instructive for understanding how this has evolved over time: here I've kept process node as the x-axis, but changed the y-axis to energy per bit, the power efficiency, again on a log scale.
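
As a quick aside, my arithmetic rather than the talk's: a doubling every five years corresponds to a compound growth rate of about 15% per year.

```python
# Doubling every five years implies a compound annual growth rate of
# 2**(1/5) - 1, about 14.9% per year (illustrative arithmetic).
cagr = 2 ** (1 / 5) - 1
rate = 100.0  # Gb/s per lane, hypothetical starting point
for year in range(0, 11, 5):
    print(f"year {year:2d}: {rate * (1 + cagr) ** year:6.1f} Gb/s/lane")
```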

What you generally see is exponential improvement with process technology scaling, and with time. But the reason I put process node on the x-axis is to highlight that something funny happened at 28 nanometers. In this same survey of published papers, a combination of industry and academic work, you see a big spread of results at the 28 nanometer node. If you dig into the data, what happened is that this was the first node where some of these works were making use of DSP-based transceivers. By that I mean essentially implementing a complete modem for every single connection in and out of a chip: a modulator, a demodulator, and digital logic doing equalization and so on. The people who first transitioned to that style of architecture at the 28 nanometer node paid a price in power efficiency; they were less power-efficient. But that transition in architecture was necessary to enable continued scaling, to allow us to keep benefiting from Moore's Law into the future. So you see a funny node where some people moved over to the DSP architecture and some didn't, and from there the scaling has continued to the present day; power efficiency continues to improve. Now we're in an era where DSP-based transceivers are ubiquitous for these very high-end connectivity applications.
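
To make "a complete modem for every connection" concrete, here is a minimal sketch of what such a DSP-based transceiver does, reduced to its bones: a PAM4 modulator, a dispersive channel, and a least-squares feed-forward equalizer (FFE) ahead of the slicer. This is a toy illustration of the concept, not any vendor's architecture; the channel taps and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
levels = np.array([-3.0, -1.0, 1.0, 3.0])    # PAM4: 2 bits per symbol

# "Modulator": map random 2-bit values onto the four PAM4 levels.
tx = levels[rng.integers(0, 4, size=5000)]

# Band-limited channel: each pulse smears into its neighbors
# (inter-symbol interference), plus a little additive noise.
h = np.array([0.1, 0.7, 0.25, 0.1])
rx = np.convolve(tx, h)[: len(tx)] + 0.05 * rng.normal(size=len(tx))

# Receive DSP: fit a 9-tap feed-forward equalizer by least squares
# on known symbols (real transceivers adapt continuously, e.g. LMS).
taps, delay = 9, 2
X = np.array([rx[n : n + taps] for n in range(len(tx) - taps)])
y = tx[delay : delay + len(X)]
w = np.linalg.lstsq(X, y, rcond=None)[0]

def slice_pam4(samples):
    """'Demodulator': decide the nearest PAM4 level for each sample."""
    return levels[np.abs(samples[:, None] - levels).argmin(axis=1)]

raw = rx[delay + 1 : delay + 1 + len(y)] / h[1]   # best guess, no EQ
print(f"symbol error rate, raw sliced : {np.mean(slice_pam4(raw) != y):.3f}")
print(f"symbol error rate, with FFE   : {np.mean(slice_pam4(X @ w) != y):.4f}")
```

The raw slicer fails because the ISI exceeds half the level spacing; the FFE flattens the channel and recovers the data, which is exactly the capability that cost the early 28 nm adopters power.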

Now, the other trend we talked a little about is the gradual transition from electrical cabling to optical cabling, and then gradually moving the optical cabling closer and closer to the package. I'll step through these slides quickly to make sure we have time for everything I want to cover. Initially, you just had chips on separate boards talking to each other over a completely passive combination of printed wiring traces and copper cables. As data rates increased, the loss over that link became untenable, so repeaters, essentially receivers and retransmitters, were introduced. These add cost and power, but they allowed data rates to continue to increase. Then people started getting clever to push data rates further: maybe we can use cables instead of the printed wiring traces; maybe they can be engineered to have less crosstalk. Progress continues, and at some point there's a motivation, depending on the combination of reach and data rate, to use optics. At that point you need to do optoelectronic conversion somewhere in the link.

I would argue there is, quite fundamentally, some higher cost and power associated with optoelectronic conversion. If you're at a reach and data rate combination where you can reasonably do it with a copper link, you've still got to pay a price to convert back and forth between optics and electronics. But at some point the combination of reach and data rate becomes impractical for copper links altogether, and then you bite the bullet and start reaping the benefits of optics. Now we're in the situation where we've got these annoying repeaters along the way; can't we start moving them closer and closer to the endpoints of the data?

The first step is to move them onto the board, close to the chip. I would say this has been modest at best in its impact, and the reason is pretty simple just by looking at the picture: you've still got an electrical link there. You save something, but you're basically just making the electrical link a little shorter, so the benefits have not been compelling enough to cause a sea change in the way systems are built. Some people are really looking forward to the next step: put the optics right next to the endpoints, right in the package. And even once you're there, there are lots of outstanding questions. How do we do this? Do we just put multiple chiplets side by side, one doing the optoelectronic conversion, another maybe filled with amplifiers, and then the main compute or networking chip? Or can we stack some of these chips on top of each other in a 3D chiplet kind of system? What do we do with the lasers; do we bring those into the package? I'll talk about some of these as we look at the different applications for co-packaged optics, which is what I want to do next.

Okay, so one main application for co-packaged optics is where you've got a really large ASIC, an application-specific IC, doing a lot of work. It could be a networking chip, a GPU, an AI accelerator, or a general processor; whatever it is, it's a big digital chip crunching a lot of data one way or another, and it therefore requires tremendous aggregate I/O bandwidth, a growing amount going forward. In fact, if we start talking about feeding this ASIC with 100 or 200 terabits per second of data in and out, you've got a limited amount of wiring you can provide in and out of that ASIC. Progress in the number of wires we can connect to it is modest at best; we can't rely on that. So the data rates are increasing, 100 to 200 gigabits per second per lane of traffic.

At those high data rates, just traversing the package, going down through its thickness and onto a board, is becoming a big hassle. So the idea is to integrate miniature optoelectronic converters, optical engines, right on top of the package, and obviate some of those losses. That's the general picture of co-packaged optics, but there are a lot of different ways to implement it, a lot of permutations.

One that looks good at first, and has been of interest, is to put all the transceiver circuits on the ASIC and then perform the optoelectronic conversion, presumably with silicon photonics, on separate co-packaged chiplets. The problem is that this puts a heavy burden on the ASIC. Because it's a big data-crunching chip, it has to be implemented in a very advanced nanoscale CMOS technology, and you're now asking that chip to also house a bunch of amplifiers with tens of gigahertz of bandwidth, which those technologies are not really designed for. You're asking it to do a lot of extra work, and you're back in the situation I mentioned before, spending half the chip just taking care of the I/O. So there have not been a lot of systems that look like this.

Instead, you can again make use of the chiplet paradigm: let the ASIC focus on what it's good at, compute, memory, and so on, and move the RF circuits onto these black-bar chiplets, which act as glue between the ASIC and the silicon photonics doing the optoelectronic conversion. This is another example of the benefits of the chiplet design paradigm: doing this not only frees up space on the ASIC for more memory and compute, but also allows us to use a dedicated process for those amplifiers, a SiGe BiCMOS process or something else well tailored to them.

Another approach is to go a step further: put some of the other transceiver circuits on those chiplets as well, and reserve the middle ASIC (this is a cartoon I drew; it's not quite to scale) entirely for memory and compute, with as little else on it as possible. Everything that has to do with shuttling bits in and out of the package goes on the chiplets. That's another important approach: the whole DSP-based transceiver resides on the chiplets. You still need some kind of die-to-die interface, which could rely on, for example, the new UCIe standard or something like it, to communicate the data to the chiplets. The chiplets then carry all the equalization, all the data converters, all the functionality required to ensure robust optical links, as well as those amplifiers. That creates some trade-offs, because now this chiplet is more complicated, with a combination of analog and digital circuitry on it, but the benefit is that it frees up as much area as possible on the main ASIC for digital compute and storage.

Then you've got the final picture, where you use a silicon photonics technology that allows you to integrate some CMOS transistors on it; an example is the 45 nanometer CMOS-plus-silicon-photonics technology from GlobalFoundries. This opens up another knob in optimizing the system: maybe some of those circuits should go onto the silicon photonics chip itself. For example, there's a bunch of control circuitry needed to regulate the optics and keep it operating robustly across temperature and so on; maybe some of that can be integrated on the silicon photonics chip. So you've got all these different trade-offs. This is really an illustration of the chiplet design paradigm and all the freedom for innovation it opens up. It's quite likely that the optimal solution will depend on the end application; I've just tried to highlight some of the trade-offs. But I will say that most real systems being worked on and researched most heavily today are in categories B and C here.

The simplest solution for co-packaged optics is basically to take what people do today, with the optoelectronic conversion at the edge of the board, take a very similar-looking function, and miniaturize it. Maybe silicon photonics helps you do that; maybe a chiplet co-packaged design paradigm helps you do that. Just miniaturize it and stuff it in the package. The problem is that this doesn't really do anything other than make things smaller. If you literally take all the same functionality, all the same DSP that was on the faceplate, and just stuff it in the package, you haven't saved any power; in fact, you've concentrated the power into a smaller volume, so you may have made overheating a bigger problem. This approach is now largely being abandoned in favor of a couple of other approaches, which I've already touched on; I want to provide a little more detail on them here.

One is so-called direct-drive co-packaged optics. Here you've got a combination of chiplets around the main ASIC, and the intention is that the large ASIC houses all the circuitry needed for the transmit and receive functions, except maybe that last layer of amplifiers interfacing directly to the optoelectronics. It's called direct drive because, other than those amplifiers, which are very light analog circuits, the main ASIC is more or less directly driving the optics. This relies heavily on circuits integrated onto the ASIC, but the good thing is that it allows for flexible packaging options. That's a big deal: once you've put everything you need into the ASIC, you can again use the chiplet design paradigm to, say, swap in different optics, and the same ASIC will work with them. Maybe in one application I'll forgo the optics entirely and go out electrically, because I'm only going a short distance anyway, an electrical link can handle it, and then I don't need to spend the money on the extra optoelectronic chiplets. It makes the ASIC more flexible; one ASIC can address a wider variety of situations. But there's a challenge: you need more circuitry on the ASIC, and it limits the density with which you can get data on and off the chip, because you've got relatively large, complex transceivers on the ASIC.

The alternative is being referred to as digital-drive co-packaged optics. This refers to using a really dense, low-power, lightweight die-to-die interface, usually wider (meaning more wires in parallel) and operating at a lower data rate per wire, which allows simpler circuits that consume less power. It communicates more aggregate bandwidth between two chips while occupying less area on the main ASIC, freeing up more area for compute and memory there, and taking up less of its power envelope as well. So it looks more optimal overall. But with this type of die-to-die transceiver on the ASIC, you're only going a millimeter or two, so this is more of a tailored solution: you couple the ASIC with a chiplet that has a mating die-to-die interface on the other side, and all the DSP-based transceiver circuitry that was integrated on the ASIC in the direct-drive picture is now pushed onto this chiplet. So that's another paradigm: ultra-low power and small area on the ASIC, and very high bandwidth density in and out of the ASIC die, but it creates other challenges as well. Again, which is preferable will depend on the application.
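
For a rough feel for the trade-off, here is a back-of-envelope comparison of moving 2 Tb/s off the main ASIC with a few fast serial lanes versus a wide, slower die-to-die interface. The lane rates and energy-per-bit values are my own ballpark placeholders, not figures from the talk.

```python
# Hypothetical comparison of getting 2 Tb/s off the main ASIC.
# All numbers below are illustrative assumptions, not measured values.
aggregate_gbps = 2000

configs = {
    # name: (per-lane rate in Gb/s, assumed energy in pJ/bit)
    "direct drive, long-reach SerDes": (112, 5.0),
    "digital drive, die-to-die lanes": (32, 0.5),
}

for name, (lane_gbps, pj_per_bit) in configs.items():
    lanes = -(-aggregate_gbps // lane_gbps)   # ceiling division
    watts = aggregate_gbps * 1e9 * pj_per_bit * 1e-12
    print(f"{name}: {lanes:3d} lanes, ~{watts:.1f} W for I/O")
```

The die-to-die option needs several times more wires, which is fine over a millimeter or two of advanced packaging, in exchange for a much smaller power and area footprint on the ASIC.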

So where do we see co-packaged optics actually being used today? Some of the highest-volume applications right now are actually not in these very big chips yet; they're inside the pluggable modules themselves, the modules at the edge of the card that perform the optoelectronic conversion. There we're already seeing what is essentially co-packaged optics being used just to miniaturize what's inside the plugs. A paper from Intel just a couple of weeks ago provides details about what they call an integrated photonics engine ("optical engine" is the usual terminology for this sort of thing): essentially a co-packaged combination of electronic and silicon photonic chips, and maybe some other components, that enables a very small form factor that can be crammed inside a pluggable module. It's being used inside 800-gig modules for this purpose. It's also been used in coherent optical data transmission, where again there's a benefit, though for different motivations than the large chips with high aggregate traffic I've been talking about: the flexibility for co-design and co-optimization is beneficial to the application. This is good because it's germinating co-packaged optics in general, creating the packaging technology that's needed and letting the industry figure out how to do fiber attach inside these packages. But really, as I've been saying, where this is going is those large chips.

That's what's generating a lot of excitement, a lot of chatter, in the research community. One type of large ASIC is a networking ASIC, a switch. Broadcom generated a lot of interest when they announced their co-packaged-optics switch: essentially optoelectronic conversion arranged around the perimeter of the package, the main die in the middle, and hundreds of fibers coming out of the thing. They call the technology for doing that "silicon photonics chiplets in package", or "SCIP" for short; a cool name. Intel even earlier announced a demonstration of something very similar, a switch with optical I/O, and Cisco just a couple of weeks ago made news at OFC by demonstrating something similar with their Silicon One. So that's one main application, a classic example of the sort of thing I've been talking about: a big ASIC in the middle, where you'd love to cram as much memory and processing onto that main chip as possible, but it requires super-high aggregate bandwidth in and out.

A more emerging, interesting opportunity is in high-performance computing and artificial intelligence / machine learning. Here, too, there are continuously increasing interconnect requirements, driven, as I mentioned, in a growing way by the disaggregated compute model, which requires low-latency, high-bandwidth connectivity. This figure is from a recent paper from Nvidia, published just a few weeks ago, highlighting some of these trends; they distinguish between their Ethernet switches, their GPUs, and some purpose-built switches in between. You see a trend toward increasing aggregate bandwidth for all of these, and interestingly, you see the GPUs catching up to the other applications. This is a trend everyone has been foreseeing and anticipating and talking about for some time: at some point these applications may outstrip the traditional networking applications in their bandwidth requirements, and they may start driving R&D in this area, possibly in somewhat different directions, because their requirements are somewhat unique. Reach is not as important: in a switch application you may need to go hundreds of meters, or even kilometers, across a massive data center, whereas in an HPC or AI/ML application the reach required may be much less, tens of meters, while low latency is super important, because again it's this disaggregated model for getting data from storage and memory. So that's the application landscape.

In the next part of the talk, I want to discuss some of the opportunities and challenges. Let's talk first about CPO for these really large ASICs. We already talked about the high-level, obvious benefit it promises: eliminating the need for the retimers, that extra electrical hop in the data's long journey. That promises to lower overall system power; it can lower latency, because every time you receive and retransmit, you introduce a little latency; and there's potential for lower cost, since if there's one less chip in the mix, one less retransmitter, that should mean one less widget to buy. It also offers the potential to improve aggregate I/O bandwidth: optics have tremendous I/O bandwidth potential; they're amazing waveguides for tons of data, and it's just a matter of harnessing them. That's the challenge, and there are a lot of challenges. First of all, moving all these functions into the same package concentrates a lot of power dissipation, and therefore a lot of heat, in a small volume.

And by the way, it puts the largest single heat generator anywhere in the system, the main processing engine, the ASIC, right next to some of the components likely to be the most temperature-sensitive: the optics. So you may create thermal challenges that have to be dealt with. There's also the assembly of the system; this is a snapshot I grabbed from the Cisco Silicon One announcement, just to give credit. You can imagine that if you've got a switch with hundreds of fiber pigtails coming out of it, someone now has to go in there and connect every one of those, and do it in a way that those links work, with no dust. Compare that with a pluggable environment, where someone basically sticks their fingers into one of those pull loops, pops the module out, and pops a new one in; if one breaks, you just send somebody in to replace it. It's a really big deal. Imagine you have one failed link, and the diagnostic report is sent to the data center operator: what does somebody have to do now? You need someone who can actually pull this switch out, and by the way, when you do that, you're bringing down tons of compute; anything connected to that switch is impacted. Then crack open the switch, go in there, and start figuring out what's wrong. That's a big difference compared to a pluggable environment, where you know which link is down, you replace the plug, and it's all good. This is a big deal for field service.

Another important issue is that it actually restricts competition and concentrates optical link R&D into the companies that make the switches, because now the whole product is essentially one chip. It's a chip made of multiple chiplets, but the whole system is integrated in there. In the current paradigm, where most networking switches rely on pluggable optics, you get a lot of different companies innovating on different pluggables and different technologies, incorporating silicon photonics into them, trying different laser technologies, lowering power, and competing. There are very few companies that can build one of these 50-terabit or 100-terabit switches; you can count them on one hand. In this paradigm, those same three or four companies would also be the only ones responsible for any R&D in the optics. So it increases the barrier to innovation and probably slows progress in some areas. You can see the market dynamics playing out here, and how this might be challenging or beneficial for different companies.

Okay, another thing I want to talk about as both a challenge and an opportunity for co-packaged optics is the idea of bandwidth density.

The fundamental challenge, at some level, because the endpoints of data are always these silicon chips, is how you get data in and out of the edges of those chips. The chip size is limited by the manufacturing equipment we use, in combination with the yield that's achievable given defects on a silicon wafer; the chip size is what it is, even for the largest chips. So we now often quantify the connectivity performance in and out of a chip by what we call bandwidth density, or beachfront density (sometimes it's called shoreline density): how many gigabits or terabits per second can you communicate per millimeter of die edge? Using a chiplet paradigm, with two co-packaged dies separated by a millimeter or even less and very dense wiring implemented in an advanced packaging technology, you can get five terabits per second per millimeter of die edge, or even more. Here's a paper from February, just a month ago, from an author at Samsung, that provides some numbers behind what's achievable. The data points highlighted in blue use the UCIe standard; the others use a different standard, a whole bunch of wires doing the same kind of thing. The bottom line is that you've got multiple terabits per second per millimeter of beachfront density, and that's enough to meet compute and processing demands for some time, provided the packaging technology evolves a bit to keep up.

So that's good: that bottleneck is alleviated by the die-to-die link, and other bottlenecks end up limiting us instead. The problem for co-packaged optics is that fibers are typically arranged at about four fibers per millimeter. If you want to keep up with the five terabits per second per millimeter of beachfront density you can get in and out of the ASIC, you're going to have to send on the order of a terabit per second down each fiber. There are solutions, but ironically, while one of the promises of CPO is alleviating the challenge of aggregate bandwidth in and out of a chip, the optics can actually become the limit in this paradigm. That's not to say there aren't solutions being heavily researched, and some of them are obvious: maybe we can make the fiber array denser somehow; maybe we can use multiple wavelengths per fiber. Or, as in a paper that just came out from an author at Nvidia, you can do some fan-out, use a bigger package, and route the links a little further; the die-to-die links then have to get longer, and there's some R&D needed there, perhaps. So it's not insurmountable; it's just a challenge that's still being grappled with.
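
The arithmetic behind that per-fiber target is simple enough to spell out. The 5 Tb/s/mm and 4 fibers/mm figures are from the talk; the WDM configuration at the end is my own illustrative assumption.

```python
# Matching optical beachfront density to the die-to-die link (figures
# from the talk: 5 Tb/s per mm of die edge, ~4 fibers per mm of edge).
die_to_die_tbps_per_mm = 5.0
fibers_per_mm = 4.0

per_fiber_tbps = die_to_die_tbps_per_mm / fibers_per_mm
print(f"required per-fiber rate: {per_fiber_tbps:.2f} Tb/s")  # 1.25 Tb/s

# One illustrative way to get there: wavelength-division multiplexing,
# e.g. 8 assumed wavelengths at 160 Gb/s each per fiber.
wavelengths, gbps_per_lambda = 8, 160
print(f"8-lambda WDM fiber: {wavelengths * gbps_per_lambda / 1000:.2f} Tb/s")
```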

Another interesting area where investment and innovation are going to continue to reap benefits is the question of where the laser should go. Currently, the more prevalent philosophy is to use an external laser source, sometimes called an ELS for short, meaning a laser that's not inside the package. The good things: it keeps the laser away from the heat-producing ASIC, and a laser is one of those components whose lifetime can be degraded by heat and high temperature. There are even standards, or at least multi-source agreements, emerging for external laser sources that allow them to be made pluggable: basically, they put the laser in the same form factor as the pluggable modules, so that if a laser fails, someone can again stick their finger in, pop out the plug, and replace the laser that way.

That makes it field-replaceable and multi-sourced; you can get lasers from different providers, which helps drive down cost. On the other hand, there are technologies that have been demonstrated, and that are growing in maturity, for integrating the laser inside the package, or even directly on top of the silicon photonics. That results in a very slick, highly integrated solution, and it also offers the potential for lower coupling loss between the laser and the silicon photonic IC. In fact, if the laser attachment is done in a robust way, some argue it's actually more robust and has better long-term reliability, in spite of the extra heating. So it will be interesting to see how that plays out; those are some of the trade-offs. I think it's going to be an evolution, where initially we see solutions focused on an external laser source until integrated laser technology demonstrates its maturity.

demonstrates its maturity okay so finally I'll just quickly want to talk a bit about optoelectronics and packaging co-optimization because I

think that's a really important opportunity for co-packaged Optics once you've got all these things co-packaged side by side you've got the opportunity to co-design them and I said some of the existing

applications of co-packaged Optics where they're being used inside plugable modules for coherence and direct detection based Optical links this was really the main

Motivation by more tightly integrating these things and having them co-designed you can get overall better performance or more functionality somehow in the system

now in order to co-design this the the appropriate sort of knobs you have to play with and the co-design depend on the packaging technology you use how are you going to put these interconnect

these different chiplets let's call them on the same package you can connect them with wire bonds and some existing CPO for pluggables are implemented this

way you can flip chip these onto a common substrate now you're relying on wiring embedded in the packaging substrate to do the job you can start stacking these things and there's different ways to stack them you can

stack one on the top of the other or um you can have the interconnect going a few different ways it depends how you build these things up over time um and each alternative is going to offer different trade-offs between

signal integrity and cost and with any of these there's the not to be underestimated challenge of accurately coupling the fiber to the

So let me jump around a little and highlight one project I've been working on here at the University of Toronto that looks at this. It was first-authored by one of my graduate students; it's great work. The objective was to co-design the package and the front-end amplifier in an optical receiver to provide the best overall performance given a commercial photodiode. There are interesting opportunities to do that: it's not obvious what type of interconnect should be used in the package substrate. Should it be long and skinny? Short and fat? There are pros and cons to each, and in fact there's a co-optimization that can be done between the shape of that interconnect and the analog front-end circuits on chip: some of the on-chip inductors, coupled inductors forming a T-coil, and the front end of the transimpedance amplifier, the first-stage amplifier. By co-designing the whole thing, you can achieve significant overall performance improvements. This is leading us to look at an optimization framework that uses machine learning to help us do this complicated optimization efficiently.

There's lots of work still to be done there, but you can see the kind of benefits to be had. These are some of the sweeps, sweeping different types of interconnect with a given analog front-end design, and you can see you can achieve tremendous bandwidth improvement, a factor of two or more, by co-designing these things properly.
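
To give a flavor of why the interconnect's shape and the front end interact, here is a toy model, much simpler than the T-coil network in the actual work: a photodiode capacitance and front-end resistance, with the package trace modeled as a deliberate series inductance. Sweeping the inductance extends the bandwidth well beyond the plain RC roll-off; the R and C values are assumed.

```python
import numpy as np

# Toy second-order model: photodiode capacitance C with front-end
# resistance R, and the package trace as a series inductance L:
#   H(s) = 1 / (L*C*s^2 + R*C*s + 1);  L = 0 is the plain RC roll-off.
R, C = 50.0, 200e-15                # 50 ohm, 200 fF (assumed values)
f = np.linspace(1e8, 100e9, 200_000)
s = 2j * np.pi * f

def f3db(L):
    """Frequency where |H| first falls below -3 dB."""
    mag = np.abs(1.0 / (L * C * s**2 + R * C * s + 1.0))
    return f[np.argmax(mag < 1 / np.sqrt(2))]

base = f3db(0.0)
print(f"plain RC bandwidth : {base / 1e9:5.1f} GHz")
for L in (100e-12, 250e-12, 400e-12):   # sweep the trace inductance
    bw = f3db(L)
    print(f"L = {L * 1e12:3.0f} pH -> {bw / 1e9:5.1f} GHz ({bw / base:.2f}x)")
```

The maximally flat choice here is L = R^2*C/2 (250 pH), good for roughly a 1.4x extension; genuine T-coil networks and co-designed package traces are what push toward the factor of two or more reported above.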

And it was fun to look at the finished thing; this is what it looks like all put together. There's a CMOS chip, in this case flipped onto a package substrate, and an array of photodiodes flipped alongside it, separated by a small fraction of a millimeter, with interconnect between them. The transimpedance amplifiers live on the CMOS chip, and the photodiodes are back-illuminated. And here's a picture of the testing, showing the prototype: the photodiodes and the CMOS amplifier chip, the fiber coming in, and the electrical output being measured at the top of the package.

With that, I'll jump to the concluding slide and the takeaway points. Optical communication is seeing increasing use over shorter and shorter distances within data centers, so optics is moving closer and closer to the endpoints of computation, and the natural evolution of that is to bring the optics inside the package. But although there are opportunities there, there are also significant challenges. The exact timing of when co-packaged optics takes hold is going to be different for different applications, and in many cases uncertain; I'm not a betting person on that front. But as it happens, I think that co-design and co-optimization of the optics, the packaging, the analog circuits, and the DSP as well is going to offer tremendous performance improvements. As this happens, either the people doing these different parts of the system have to work closely together, or there are going to have to be some advanced tools involved to help them get the job done properly.

them get the job done properly and so these are just some pictures showing that by co-optimizing the DSP alongside the analog circuits alongside

the packaging we're able to demonstrate 160 gigabit per second transmission uh over an analog circuit that only had 32 gigahertz of bandwidth which is a little bit surprising but it shows you the
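
For a sense of why that's surprising, here is the back-of-envelope arithmetic; the PAM4 assumption is mine, since the talk doesn't specify the modulation format.

```python
# Back-of-envelope: 160 Gb/s through a 32 GHz analog front end.
# Assume PAM4 (2 bits/symbol); the talk does not state the format.
bit_rate_gbps = 160
bits_per_symbol = 2                # PAM4
analog_bw_ghz = 32

symbol_rate_gbd = bit_rate_gbps / bits_per_symbol   # 80 GBd
nyquist_ghz = symbol_rate_gbd / 2                   # 40 GHz minimum
print(f"symbol rate     : {symbol_rate_gbd:.0f} GBd")
print(f"Nyquist minimum : {nyquist_ghz:.0f} GHz vs {analog_bw_ghz} GHz available")
# The ~8 GHz shortfall is what co-optimized packaging plus DSP
# equalization has to recover.
```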

Let me make sure to finish off by acknowledging the students who contributed to the work you saw here, including Chris Lee, whom I neglected to include on the slide, as well as a colleague from Alphawave who contributed some of the figures.
