
Building a Vector Database by Hand ✍️ in Excel with Prof Tom Yeh

By AI by Hand

Summary

## Key takeaways

- **Train word embeddings manually**: Use a Christmas text corpus, tokenize it into words, clean punctuation and lowercase with SUBSTITUTE and LOWER, get a unique sorted vocabulary with UNIQUE and SORT, then assign random 5-D vectors pasted as values. [03:02], [07:46]
- **Chunk text with overlap**: Chunk the cleaned tokens into groups of 3 with an overlap of 1 using a custom CHUNK function, matching the sentence transformer's 3-token context limit, e.g. 'christmas brings joy' then 'joy love and'. [15:28], [17:04]
- **Build a sentence transformer in Excel**: Look up word vectors with XLOOKUP and TRANSPOSE into 3 column vectors, apply linear layers with MMULT and biases, simulate attention by transposing and multiplying, apply ReLU activation, then combine and L2-normalize into an 8-D sentence embedding. [17:27], [22:10]
- **Index chunks into the vector DB**: Copy-paste the sentence transformer formulas across chunks, fixing the weight matrix references, and store chunk ID, text, and the normalized 8-D embedding in a database table. [34:26], [38:50]
- **Query with cosine similarity**: Embed a query like 'joy love and' using the same transformer, compute cosine similarities via MMULT with the database embeddings (valid since they are L2-normalized), and sort descending by score for top-k retrieval. [40:54], [42:12]

Topics Covered

  • Excel Builds Word Embeddings
  • Embeddings Learn Semantic Similarity
  • Chunking Matches Transformer Limits
  • Matrix Math Powers Linear Layers
  • Cosine Similarity Ranks Retrieval

Full Transcript

This is the whole database in Excel that I built, and I shared it out right before this webinar to give you a preview. Today what I'd like to attempt is to go from this blank spreadsheet and reconstruct that big database, taking you through it step by step, providing my commentary, explaining things, and talking like a professor. I hope you can learn something from me today.

Now, I realized that to make my life easier, and also to stay true to the philosophy, I should sketch out by hand the whole thing I'm going to code today, and I did. I'm going to take you through the first step of training the embedding for each word. Then, if we have a text, we'll chunk it into individual pieces, and I'll take each chunk and feed it into a sentence transformer. In this case there are three words in each chunk; those are marked in green. If you scroll down, each chunk is going to be embedded into this embedding vector, which is a single vertical column.

When we have enough of these, our destination today is to pull them all into a vector database, which down here looks like a table with the different chunks and their embeddings. Eventually we'll be able to support an application: take a query, which is also a chunk of three words in this case, get a query embedding, compute similarity using the cosine metric, and eventually retrieve a top-k, in this case top two. So that's the roadmap of what I'd like to accomplish, and hopefully I'll be successful. But since it's live, you'll see me making mistakes throughout, so you can all learn from my mistakes as well. All right.

So the first step I'd like to do is tell you what embedding means. I'd like to build this thing called the word embedding; that's all in this orange stripe over here. If you can see my mouse over here: I turned on the cursor location, so you can see where my cursor is. Before I go further, I've shared exactly this blank Excel sheet on the website, which is by-hand.ai. You can download this version if you'd like to follow along, or if you'd like to skip ahead to see what's to come. I encourage you to download this Excel sheet and open it up yourself. On the site I also share the code that I used to build this.

Anyway, going back to training the embedding: my goal for this step is to have a list of words, sorted, and for each word I'd like to have five numbers (one, two, three, four, five) to represent that word; each word gets its five numbers over here. Okay, so that's my goal. This step is a bit of a warm-up. You're probably very familiar with Excel already, but this is the warm-up before we get to the harder parts.

So first, I'm going to copy this long text into this box. To train an embedding, we typically have a corpus of a lot of text; today, for this example, we're just going to copy and paste this. Because the Christmas season is coming, I wrote up some random text about Christmas: 'Christmas brings joy, love, and warmth. Families gather, sharing gifts, laughter, and stories by cozy fires,' and so on and so forth. We'll use this text to train the embedding. This is very simple natural language processing, NLP 101, and again it's a warm-up, just to get familiar with Excel and practice coding a little bit.

For the first step, I'd like to break the text down into individual tokens, instead of one really long string. To break down the tokens in Excel, I can use TEXTSPLIT, which is the function I will use. I go here and select the text. I don't want to break it down into columns; I'd like to break it down into rows, so I skip the column delimiter and use a space as the row delimiter. This gives me the individual words of my training text spilled down a column.
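Here is a minimal sketch of that formula, assuming the raw corpus text sits in cell B2 and the tokens spill down from D2 (the actual cell layout in the workbook may differ):

```
D2:  =TEXTSPLIT(B2, , " ")
```

Skipping the second argument (the column delimiter) and passing a space as the third (the row delimiter) makes the words spill into a single column rather than across a row.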

Let me clean this up a little bit. If you look here at 'laughter' (can you see this?), 'laughter' has a comma attached, and some words have a period. I want to remove those so that I only get the word itself. For instance, for 'twinkle,' I want to get just 'twinkle', not 'twinkle' with a comma. All right, so Excel doesn't have one dedicated function to do that, but the way I like to do it is with SUBSTITUTE: I select the cell and substitute all periods with nothing. Let me test this. I copy it over to where I have a period... here, 'warmth.', and you can see that 'warmth.' becomes 'warmth'. So that cleans it up a little. Then, if I copy it over to 'twinkle,' it doesn't quite work, because SUBSTITUTE doesn't take more than one target at a time. So I can nest the SUBSTITUTE: this time I substitute all commas with nothing, so now it removes commas as well as periods. You can imagine that somebody has to write lots of these text-processing routines in order to clean up a corpus. Another thing I can do: the first token, 'Christmas', I'd like to be all lowercase, so I add a LOWER around the whole thing. This is still very much Excel 101; you've probably done this before, nothing too new, it's more of a review. Okay, now I get 'christmas' in lowercase, the periods are removed, and the commas are removed. You could do more, like stripping plural endings and stems, but today is not about NLP preprocessing steps, so I think this is good enough for my cleaning step.
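A sketch of that cleaning formula, assuming the raw token is in D2 and the cleaned token goes in E2:

```
E2:  =LOWER(SUBSTITUTE(SUBSTITUTE(D2, ".", ""), ",", ""))
```

Each SUBSTITUTE removes one punctuation character, which is why they are nested, and LOWER wraps the result to lowercase it.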

So I copy the formula and paste it, and okay, it works. Then I can paste it to the whole column, going all the way down, all the way down... oops, too much, let's go back, that's not what I wanted; it just copied over here. Okay, I copy more, and then I copy more, all the way to the bottom, down to this one, which is where I want to stop.

So now I have all my tokens cleaned up. There are some words that repeat themselves multiple times, but I want to get a unique set of words. This turns out to be easy to do in Excel, because we can use UNIQUE, a built-in function: I just select the array, and that's the result. It turns out that in this text only a few words repeat themselves, so now I've got a shorter list of tokens. Then, eventually, I want to sort them, so it's easier for me to see what words are available: I SORT this, selecting the list again down to the bottom, and these are my sorted tokens. This is the vocabulary I'm learning from my training text. In practice, this training text would be gazillions of documents, gigabytes of text, but it would follow a similar process: tokenize it into tokens, clean them up or normalize them, remove stems, and sometimes chop really long words into even smaller sub-word tokens. All of that can be done beforehand.
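A sketch of the vocabulary step, assuming the cleaned tokens live in E2:E200 and the vocabulary spills down from K2:

```
K2:  =SORT(UNIQUE(E2:E200))
```

UNIQUE deduplicates the token column and SORT orders it alphabetically.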

All right, so these are the tokens, and the embedding idea is this: maybe there's a way we can represent each word as a vector. The first decision people make is the dimensionality of this vector. Say five; in this case I'll just use five, so I have one, two, three, four, five spaces reserved, and I can type something like 0.1, 0.2, 0.1, some random numbers to begin with, before I start some learning process. For the next one I can do 1, 2, 1, and so on; to start, we can just use some random numbers, like in machine learning. One way I can do this is with the random-number function, and then I copy my formula to the entire block that I need. So there will be one, two, three, four, five columns, and these are all the different numbers I have here. It's kind of hard to see; maybe it's not that hard to see. So now I have a word embedding table of random numbers; I haven't really done any learning yet.

In Excel, if you use random numbers, one thing I find annoying is that if I add something somewhere else, all these numbers change. I'd like that behavior to stop so I can comfortably work on it. The trick I like to use is this: I copy the whole block of numbers, go back up, and paste them over themselves, but using Paste Special, pasting only the values, not the formulas anymore. I press V for values, and okay: now these numbers are fixed to their values instead of the formula, so when I change stuff elsewhere, they don't change anymore. Now we've initialized our embedding with some random numbers.
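A sketch of the random initialization, assuming the vocabulary sits in K2:K101 and the 5-D embeddings spill from L2; the talk uses a plain random function copied across the block, and RANDARRAY is one dynamic-array way to do the same thing (remember to paste the result back as values so it stops re-randomizing):

```
L2:  =RANDARRAY(ROWS(K2:K101), 5)
```

This spills one row of five uniform random values in [0, 1) for each vocabulary word.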

The learning process works kind of like this: you're trying to find words that are similar in meaning. For instance, 'smiles' and 'laughter': there's 'smiles' here, and we have 'laughter' up above, and in your training examples you'd probably see that 'laughter' and 'smiles' tend to appear in the same sentence, over and over again. Every time you see them appear in the same sentence, you nudge their numbers together: say I set this one to 0.5, then 0.9, and then I move 'smiles' to 0.9 as well, so they become a bit more similar. Then I can find another two words; maybe 'love' and 'memory' should be similar, so I change their numbers to something closer to each other. And is anything the opposite? Say 'santa'... maybe that's not a good idea; say 'shine'... I don't have a good opposite here, so let's just pick any two words, say 'love' and 'kept'. They are not supposed to be similar; they're two different words. So I could purposely make one of them 1 (oh sorry, not here; where is my 'love'... here), make it 1, and make the other one the opposite, minus 1, to push them further apart. You keep repeating this process, and that's the intuition behind training word embeddings. It's usually done by large companies like Google, and they share the results on Hugging Face so you can just use the embeddings, but I wanted to give you some intuition for how embeddings are trained.

It's a very simple process: training embeddings takes a large corpus of training text; you tokenize it and clean it up, and you get a whole bunch of words that become your vocabulary. Here I have maybe 100 to 200 words, but in practice you could have 10,000 words or 60,000 words, and if you have a multilingual vocabulary, 200,000 words, and so on. You also choose the embedding size. We have five here, but in practice it could be 512; it tends not to be super big, and 512 is pretty common. So that's what we have here.

Now we have this embedding, and we can talk about the application: I want to use this embedding to index a text. I actually do have a new text coming in. So now I have a vocabulary that was learned from the Christmas corpus, and I have some kind of shorter Christmas text as the text to index: 'christmas brings joy love and warmth...'. Actually, it turns out to be a shorter version of the same text; let's just pretend this is the text I want to index into my own database. We'll follow a similar process of tokenizing it and cleaning it up. I'm going to copy my tokenizing function over here... it doesn't work like that, because it was selecting the wrong cell, so I move the reference over to my new text block. Now it tokenizes into this; it's a shorter text, so there are fewer tokens. I can also reuse the cleanup: I copy my cleaning formula over all my token locations, and now all the periods and commas are removed and everything is lowercase.

You might notice that I skipped SORT and UNIQUE. That's because in the indexing application, I actually do care about where the words repeat themselves. I'm not trying to learn a unique vocabulary here; I'm using these tokens because I want to index them. I want to put them through a sentence transformer, which we're going to talk about: it will transform a bunch of words, a bunch of word vectors, into a single vector that I can put into the vector database. So that's what I want to do next.

Now, usually when you take a transformer-style encoder, there's some limit on the context length, meaning how many tokens you can feed in; for instance, 5,000 tokens or 10,000 tokens. My very simple sentence transformer has a limit of three tokens. Very small, but three tokens is what it can handle. Because of that, I have to chunk my input text into units of three, or fewer. So I have parameters for my chunking: my chunk size is three, which matches the context window of my sentence transformer, and I also have an overlap, meaning my chunks should share some tokens. For instance, if 'christmas brings joy' is one chunk, that means 'love and warmth' is not the next chunk; you can look at it here: this is my first chunk, and my second chunk will be 'joy love and', starting from the last token of the first chunk. That's what I'd like to accomplish, and I have a function called CHUNK: I select this cleaned-up set of tokens, my first parameter is the chunk size, three, and the overlap is one, and I get this arrangement. Let's play with this function a little bit. If I set the overlap to zero, you can see the difference: with overlap one we had 'christmas brings joy' then 'joy love and', but with overlap zero it's 'christmas brings joy' then 'love and warmth', with no overlap. Let's go back to overlap one, chunk size three. So that's my chunking: I break a list of tokens, a sequence of tokens, into groups of three. Now I have six of these, and each one has an ID, a numeric ID going from one all the way to six. Six chunks in this short text, six chunks over here. Okay, so this is the chunking.
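CHUNK is not a built-in Excel function. Here is a minimal sketch of how such a custom function could be defined with LAMBDA in the Name Manager; this is an assumption on my part, since the workbook's actual definition isn't shown:

```
CHUNK = LAMBDA(tokens, size, overlap,
    LET(step, size - overlap,
        n, ROWS(tokens),
        k, ROUNDUP((n - size) / step, 0) + 1,
        MAKEARRAY(k, 1, LAMBDA(r, c,
            TEXTJOIN(" ", TRUE,
                INDEX(tokens, SEQUENCE(size, 1,
                    MIN((r - 1) * step + 1, n - size + 1))))))))
```

Called as `=CHUNK(E2:E13, 3, 1)`, it steps through the token column in strides of `size - overlap` and joins each window of three tokens into one chunk string; the MIN clamps the final window so it never runs past the last token.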

Now we've finished the warm-up, and I'm going to get into the hardest part of today's lecture, which is to tell you a little bit about the sentence transformer. Hopefully I'm able to explain it simply enough for you to see.

My overall goal: I now have a chunk of text; for instance, my first chunk is 'christmas brings joy'. I could retype it, but I'm lazy, so I just use the equals sign to say these are the same as those three cells. Can you see this? All right, so I have 'christmas brings joy' here, and what I'd like to do is look up the embedding of each word from my vocabulary. Where can I find this embedding in the vocabulary? One thing we can do: I go to the left, where my vocabulary is. Which word is 'christmas'? I go down my vocabulary and find the word 'christmas' here, and this is its embedding. I could copy this... or maybe not copy it; I actually want to select it. So I select this embedding for 'christmas'. I select it here, but it doesn't work; it goes in the wrong direction. How can we fix this? We can use the function called TRANSPOSE: I basically want to transpose the row into a column vector, and that's what happens here. Okay, so I just did it for one word, 'christmas'. I can repeat the same process for 'brings' and 'joy', so I'll have three of these. Let me repeat the process one more time as a review, in case you didn't follow me. I'm going to find 'brings'; that's why I sorted the vocabulary, so it's easier to find. It's right here, and I select it. Now I have it, but again in the wrong orientation, so what I do is apply a TRANSPOSE to it. Transpose, transpose; okay, now I have two of these. I could do one more, but I'm going to write a formula now instead of doing this by hand.

The overall picture of what the sentence transformer is going to go through: the goal is to take three vertical orange stripes, which represent three column vectors, each with five dimensions, and through a series of transformations they somehow become one vector in a higher-dimensional space: one, two, three, four, five, six, seven, eight. I decided I'd like to use eight dimensions for my embedding vector. So next we're going to go through the transformations. It's a big review, or a very high-level explanation, of how a neural network works; it's the way I explain it in my class as well, so hopefully you can learn something from it.

All right, so now, instead of doing this by hand, I can potentially use a formula to do it: this very useful function called XLOOKUP. First I give a lookup value: the value I'm looking up is this token, 'christmas'. And where do I want to look? The lookup array, again shifting to the left, is my entire vocabulary here, going all the way down. And what's the return array? The return array will be my word embeddings; they're here, going all the way down. If I do this, it's going to give me the embedding, and I know it's in the wrong orientation, so I can fix that: TRANSPOSE. Okay, so I've transposed it to the right orientation. All right, so now, instead of picking it up manually, I can use a formula to do this: look up 'christmas', find the associated embedding, and put it right there. Then if I copy it over for 'brings', it doesn't quite work, because the references get shifted in Excel. The way to prevent that is to fix the references to their columns: I pin down the column holding my vocabulary, and similarly my embeddings, which are in the L through P columns. I fix them so they won't change when I copy over to the right. Let's remove this; now I'm ready to copy over.
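A sketch of that lookup, assuming the vocabulary is in column K, the 5-D embeddings in columns L through P, and the chunk's first token in B30:

```
=TRANSPOSE(XLOOKUP(B30, $K$2:$K$101, $L$2:$P$101))
```

XLOOKUP returns the matching 1×5 embedding row, TRANSPOSE turns it into a 5×1 column vector, and the absolute (`$`) references keep the vocabulary and embedding ranges pinned when the formula is copied to the right for the other two tokens.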

Okay, now I'm getting three column vectors, and the first transformation is to go from five dimensions to six. How do we change five into six? That's what a neural network's linear layer does: it basically combines the inputs in a way that can produce any arbitrary dimension you want next. So if you want to go from five to six, that means at this level you have five inputs, and at the next level you have six neurons; you need six neurons to produce a six-dimensional vector. The way the math works is a very simple matrix multiplication. If you write out the math, you might have seen it in your deep learning class: Wx + b = y. And Excel allows you to visualize what exactly is happening. Here is the weight matrix I map out: to transform from five to six, I have five (one, two, three, four, five) for my five inputs, and six (one, two, three, four, five, six) for my outputs; it's arranged this way. So now I prepare this transformation matrix; I copy zeros all over to initialize it to zero. Then, as a proof of concept, I'm going to put the weight at the first location to one, and this weight is going to multiply the first dimension of my input; that's the correspondence. So I would expect that when I do the multiplication, I get this: 0.828. This should be 0.828, this should be 0.036, and this should be 0.842. Okay, that's what I expect my math to be.

Let's try to implement it. Matrix multiplication is quite easy: you use MMULT. Now I can go select my first matrix over here, and then I select the other matrix up here: the first matrix is my weights, the second matrix is my input. Voilà! And it actually helps to check: is this 0.828? Let me just highlight it for a moment and see... 0.828. Correct. So this proof of concept is correct. If I move the one over here, you can see it's copying the second row; if I do that, it's actually getting the value of the third row; and if I do this, 1 1 1, it's basically summing the first three rows to obtain a new feature. Okay, so this is the intuition of a linear layer. Let me back this up, back up my ones. In practice, now that I've got the intuition right, you usually just start by randomly assigning some weights: a one here, maybe a two there, just some random weights for my simple network. And every time I do that, you can see those numbers change; the multiplication gets re-evaluated.
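A sketch of that layer as an Excel formula, assuming the 6×5 weight matrix sits in R30:V35 and the 5×1 input vector in B32:B36:

```
=MMULT(R30:V35, B32:B36)
```

MMULT of a 6×5 matrix with a 5×1 column spills a 6×1 output vector, one value per output neuron.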

Next I'm going to look at how to add the biases. So here, say this is 0.461 and 0.799, and I'd like to add one so that everything goes up by one. How can we add that? A good place to put my bias terms is here: 0, 0, 0, 0, 1, 0. I only add a bias to my fifth location, my new row. The way to do it: I go back and update my equation, my matrix multiplication, with a plus, and then I go to the left and select... no, not with the mouse like that... select this column, my bias column. And you can see these numbers got moved by one. If I put two, everything moves by two; by three; zero, back where we were; minus one goes in the opposite direction. There's no multiplication involved here; we're merely adding the biases. Now these numbers look scary; my hand sketch looks prettier, a bit less scary.
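Putting the two pieces together, a sketch of the full linear layer Wx + b, with the same assumed ranges as above plus a 6×1 bias column in X30:X35:

```
=MMULT(R30:V35, B32:B36) + X30:X35
```

The bias column is added element-wise to the 6×1 product, matching the Wx + b form from the deep learning class.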

Okay, now next is the activation function. What does an activation function do? Let me bring this back up again. I'd like to activate each neuron only if its value is positive. In this case, these three values are negative, so I'd like them to be zero: 0, 0, 0. Everyone else stays exactly the same number. That's what ReLU does. So, to implement ReLU, one way is this... wait, no, I'll implement it here. I can select the whole block and say: if this is greater than zero, then I just select this block one more time, which returns the same thing; but if it's not greater than zero, meaning negative, I set it to zero. So this is the equation, and voilà: you can see that all the numbers are the same, okay, except for the ones that were negative, so I'm getting what I want. There's a logic and philosophy behind ReLU: basically, it passes through any positive number and zeroes out the rest. I have to go a bit quicker in my commentary now.
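A sketch of that ReLU step, assuming the pre-activation 6×1 vector sits in M40:M45:

```
=IF(M40:M45 > 0, M40:M45, 0)
```

In modern Excel the IF evaluates element-wise over the range and spills a 6×1 result, keeping positive values and replacing negative ones with zero.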

The next step: as you might know, the core of a transformer is attention. So how do we add attention to this? One way to approximate attention is to transpose the tokens this way, so that when we do the combination, we're combining them across another dimension, across the positions. Basically, I can combine positions one, two, and three to get a new vector that is some combination of the three vectors here, weighted by some kind of attention weight matrix. So this is my attention weight matrix, which is usually a square matrix. I'm going to do this very quickly now, without a lot of commentary. So, transpose: I'm going to transpose this at speed; transpose this, okay. And then over here I'm going to have my attention matrix: I'm creating it now, with no bias yet, and then I'm creating the bias terms. Now I do an MMULT (I'm talking too much), okay, and I'm going to multiply this, so I get this, and I can add my bias term: plus, then my bias here. So this is my attention, simulated approximately by a double transpose. There's a lot of literature on how you can do this properly, the full NLP way, so there's a real theory behind this. And I can test it: if I add one to the bias, everything goes up by one; I can start to add more and more. Let's just reset this, because I want to show you that if we just have the identity matrix and no biases, these numbers look the same, just transposed. But as I add more weight, it allows me to combine the positions in different ways. So there's a nice idea behind it.
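A sketch of this simulated attention step, assuming the three 6-D column vectors sit side by side in R40:T45, a 3×3 attention weight matrix in V40:X42, and a 3×1 bias column in Z40:Z42:

```
=MMULT(V40:X42, TRANSPOSE(R40:T45)) + Z40:Z42
```

TRANSPOSE turns the 6×3 block into 3×6 (one row per position), the 3×3 matrix mixes the rows, which is what combining across positions means here, and the bias column broadcasts across the result. With the identity matrix and zero bias, this just reproduces the transposed input, as in the demo.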

But once we have that, I can transpose them back into three column vectors for my three words, and then I can run another linear layer to eventually get to my desired dimension; I'd like to move from six to eight dimensions. So I prepare my weight matrix here, going from six to eight, and also prepare my bias terms here. Once I've done that, I can start to set some random ones as weight values, sort of like this example. When you train this network, that's what you usually do too: you start with some random weights and then run backpropagation to update them. Now I want to do my matrix multiplication: select this, okay, and then also select this, and then add my bias. So this is my linear layer; I've implemented it. This is my final linear layer.

But we still have to find a way to combine the three vectors into one before I can output. To do that, I use the transpose trick again: transpose them into the other orientation, and then I can combine them. But instead of 3-to-3, I want 3-to-1, so there's only one row of weights. So if I do 1 1 1, what does it do? It means I'm combining them all with equal weight, so it will give me exactly the sum of them, adding them up, and then I add another bias term here. Okay, so... oh, I forgot to select; I want to select this, it's my second MMULT. Okay, so now if I do this, you can test that it does move my values, and then eventually I transpose back. So I get the eight-dimensional vector I want to output. This is my sentence embedding.
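A sketch of the combine step, assuming the three 8-D column vectors (one per position, after the final linear layer) sit in R50:T57:

```
=TRANSPOSE(MMULT({1,1,1}, TRANSPOSE(R50:T57)))
```

The 1×3 weight row {1,1,1} multiplied by the transposed 3×8 block sums the three position vectors with equal weight into a 1×8 row, and the outer TRANSPOSE turns it back into an 8×1 column.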

But it's not normalized; it could have any arbitrary magnitude. So typically I normalize it, using the L2 norm. The way to get the L2 norm: we can add the numbers up, but I want to add up the square of each of these numbers, so I do a sum of squares. Say I have 1, 2, 1: the normal sum is just 4, but the sum of squares is going to be 1 + 4 + 1, which is 6. So that's the idea of the sum of squares, and then I take the square root of that, and that gives me the L2 norm. Mathematically, that's basically it: you take the square root of the sum of squares. And once I have that, I can take the whole vector and normalize it: each component gets divided by this number, this value, and so we have this new, rescaled vector. The nice property about this is that if you calculate the sum of squares of these numbers again, you get one; it adds out to one. And if you do that for the column vectors of the other chunks, they'll all have the same size and also the same length, the same magnitude, because of the normalization, and that makes comparison a lot easier, as you will see in a bit. So now we finally have this vector for the first chunk.
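A sketch of the normalization, assuming the unnormalized 8-D sentence embedding sits in V50:V57:

```
=V50:V57 / SQRT(SUMSQ(V50:V57))
```

SUMSQ adds up the squares of the components, SQRT of that is the L2 norm, and dividing the range by this scalar spills a unit-length 8×1 vector.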

What I can do now... yeah, let's just do a split screen here, so I can skip ahead to my right-hand side. Oops, that's not what I meant to do... okay, here we go. I'm kind of moving around; this is what I'm trying to find. Okay, let me just redo this, it's kind of confusing; one more time. What I want to do is put this into my vector database: I finally have something to put into my vector database. My first data, my first chunk, and my embedding. What's my embedding? Okay, this is where I want to be: I want to save my embedding as the vector I just calculated. So first, I can use a split screen so I can go find that vector. I split this... why did it split there... okay, I split from here. So now I scroll to the back, here, okay, this is where we were, and scroll down: this is what we just computed, right? So what I can do is go back, and I will set this cell to that vector. Now I've put in the embedding, but I realize it's not in the right orientation, so I go back and add the TRANSPOSE. Okay, ten or fifteen minutes left; that's fast enough. Then I would also like to bring over my chunk information, so I go backwards to my chunks; the chunk is here. Okay, so now I go back here, and I set this cell to the chunk. Okay, so what I'm doing now is storing this into my database table: the embedding, the chunk, and an ID of one. This is my first chunk; I just finished indexing the first one.

And I can do that for the rest of them, which I'll do very quickly. The reason I can do this is that I can copy and paste all these calculations over and get exactly the same structure, because everything shifts the right way. But before I can show you, I have to do a little trick: I have to modify my formulas to fix the weight matrices to their columns, so that I can copy them without worry, so that I can reuse the same weights and biases. So I can do it very quickly without talking too much; let me just do it quick, quick, quick. Okay, done. Okay, two more I have to do: fix this, fix my column, fix the columns for my weight matrices and biases. And then just one more: fix the column for this one; these are my biases, and this is my weight matrix. Okay, now it's all fixed. This is the most fun part.
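For concreteness, a sketch of what that fixing looks like, reusing the assumed ranges from the linear-layer example: before copying, the weight and bias references get dollar signs so that only the input reference shifts:

```
=MMULT($R$30:$V$35, B32:B36) + $X$30:$X$35
```

When this is pasted one chunk over to the right, B32:B36 slides to the next chunk's input column while the shared weights and biases stay put.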

Now what I can do for my second position, my second chunk: I make sure I select enough columns, and then all I have to do is copy. Come on, that's not what I wanted... okay, go here, copy; I just want to make sure I don't copy this part. And then I come back here and paste. Now almost all of the calculation is copied, except that I have to point it at the right chunk. So what I'm going to do is come back here and select my second chunk, which is here. Okay, now you can see that all the calculations are inherited; we just copied, and we get a new set of vectors here. What did I just do? Oh, I mistyped something here; okay, all right. And then I do the same thing for my third one: yes, select this, yes, right now, yes, and then copy, and then copy again. Okay, now I can do the third chunk, but I have to reset the chunk references over here: select my third chunk, and then come back here and select my fourth chunk, and reuse all the weights. And then you can check the calculation: you can see the equations, you see the weight matrices and also the inputs being selected correctly, and then we get these new column vectors here.

So now, with that, I can go back to my database; I want to index all of them now. So, right here, I'll split the screen here, so that I'll be able to go back. First I want to make sure I copy all the chunks here... or maybe I should do it from here. All right, so my chunk two is here, so now I select my chunk two, these tokens... is it right? No, it didn't work; one more time... okay, good. And then the embedding; I hope there's a faster way... the embedding, I scroll down: this is the embedding for the second one. Oh, I forgot, I forgot to transpose: TRANSPOSE. Okay, and now copy here, and... okay, it doesn't work; let's do it one more time. Since I'm here, I might as well just type equals TRANSPOSE and select this; then down, equals TRANSPOSE and select this. So what I'm doing is indexing: I'm putting in the embeddings for these four chunks. But let me also go back here and select the chunk text, so I can put the chunk text back in the database as well, so that when I retrieve, I have access to the chunk text. If you're familiar with RAG, that's what you do: you then put the retrieved chunk text into your prompt to provide some extra context. Okay, now I'm done; I've finished indexing these four chunks into my database.

So I still have time, great. It'll be rough, but now I can talk about queries. Suppose we have a new query; what do we do with it? So say I have a query: 'joy love and'. If you look at this, it's kind of similar to this one, 'joy love and', or to 'christmas brings joy'. So how do we look it up, so that we compute a similarity value with respect to each of the chunks already in the database? What we can do, hopefully, is similar copying and pasting for my query: I copy this and copy it over here, but now, instead of pointing at a chunk from earlier, I can just set the chunk to be this query. Okay, now I'm reusing the sentence transformer I just built, and then I have this new embedding vector. Now I go back to the database, and I can put it in this strategic location; I'm going to tell you why I do that. So I select this, and this query is over here. Okay, now I have this query embedding.

I arranged it this way because, if I arrange it this way, all I have to do is one matrix multiplication: all of these database embeddings multiplied with this query embedding. And because of the fact that we already normalized them, these numbers are actually the cosine similarities. You can see they're all quite similar; that's because we have random embeddings, and in practice they shouldn't be this close, but this is just a proof of concept, right? If I make the query 'joy love and', which is exactly the same as the second chunk, you can see that value becomes one. So with cosine similarity, the more similar, the higher the number you get; identical vectors give you one, the highest.
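A sketch of that similarity computation, assuming the four database embeddings are stacked as rows in B70:I73 (a 4×8 block) and the query embedding is the 8×1 column K70:K77:

```
=MMULT(B70:I73, K70:K77)
```

This spills a 4×1 column of dot products, and because every embedding was L2-normalized, each dot product is exactly the cosine similarity between that chunk and the query.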

So, to get a top-k, say top two: how do we get a top-k? We'd like to go in a particular direction, descending order, which means putting the highest number at the top and going down from there. So what we can basically do now is sort, using the SORTBY function. I want to sort my array with the ID and chunk, which is my database information, sorted by my cosine similarity values, the scores calculated with the cosine similarity against the query embedding. And then this is the most important decision you have to make: is it descending or ascending? As I said earlier, this one is descending, so I've got to do descending, which is minus one. And then, voilà, it's sorted: 'joy love and', which is exactly the same as my query right now, is ranked top one, and 'christmas brings joy', because of 'joy', is top two. It makes sense. And then at the end, I can add a TAKE, because I only want to take two of them: the top two. In practice you would sort millions of records and take the top two, top five, etc.
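A sketch of the retrieval step, assuming the chunk IDs and text sit in A70:B73 and the cosine scores in J70:J73:

```
=TAKE(SORTBY(A70:B73, J70:J73, -1), 2)
```

SORTBY reorders the database rows by score (the -1 means descending), and TAKE keeps only the first two rows, the top-2 result.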

But right now I just take the top two out of four. Okay, so this completes the whole thing. As a little review of what I've managed to do: we went from training embeddings to get word embeddings; we chunked the text into individual chunks and ran them through the sentence embedding, and I somehow took you through the math of a very simple transformer, with both the linear layers and a very basic attention mechanism; and we summarized each chunk into a word embedding... I mean, a sentence embedding, one vector, and put it into the database. And now we're able to compare them by taking the dot product, the matrix multiplication, between each database embedding over here and the query embedding, and rank them over here. And you can play with it: for instance, if I change the query to 'joy joy joy', it all gets updated, it's all updated live, and we get different chunks back. Okay, so I managed to finish. I'm quite happy about this. Thank you.
