
Master Dify AI Chatbot in 1 Hour: Complete Beginner's Guide for 2025

By Eddie Chen | AI Automation

Summary

## Key takeaways

- **Dify: Open-Source Low-Code AI Builder**: Dify.AI is a low-code platform for building production-ready AI solutions quickly. It has 112K GitHub stars and is open source, so you can self-host it like n8n, using a drag-and-drop interface. [00:46], [01:06]
- **Chatbot vs Agent: No Tools vs Tool-Calling**: A chatbot is conversational but can't connect to tools or take real-world actions, whereas an agent can connect to tools, do tool calling, and make decisions. [05:18], [05:34]
- **Context Window Limits Force RAG Chunking**: LLMs have context limits, like Claude's 200K tokens, so a 500K document won't fit entirely and information is lost; research also shows long prompts confuse models, with content in the middle forgotten. [15:16], [16:50]
- **Firecrawl Cleans Noisy Website Data**: Don't paste entire websites into a knowledge base; HTML noise like images, links, and maps gets in the way. Use the Firecrawl web scraping tool to clean up messy data first. [18:28], [19:12]
- **Parent-Child Chunking Boosts Relevance**: Parent-child chunking uses parent paragraphs for context enrichment and child sentences for pinpoint accuracy; it beats general chunking by retrieving precise chunks and then enriching them. [28:15], [29:24]
- **Rerankers Optimize Vector Retrieval**: Vector DBs prioritize speed and recall, but rerankers like Jina's reranker-m0 judge true relevance by processing the user query and the chunks simultaneously, returning only the top relevant ones. [45:17], [47:02]

Topics Covered

  • Context Windows Kill Long Prompts
  • Firecrawl Cleans Website Noise
  • Parent-Child Chunking Beats General
  • Rerankers Fix Vector Recall Flaws

Full Transcript

What's up, guys? My name is Edwin, and this is absolutely everything you need to know about setting up advanced RAG chatbots in Dify.AI.

By the end of this video, you'll know how to set up advanced chat agents for your business, beyond just making one basic chatbot or blindly uploading PDFs to a knowledge base and hoping for the best.

Now, why should you listen to me to teach you how to set one up? My name is Edwin, and I'm the co-founder of Legacy AI, where I've been serving more than 20 retail and B2C clients in the past 6 months with conversational agents like the one you see in this video. So you're learning from someone who's actually done it before in the real world, not just theory.

Now, let's dive straight into Dify.AI. If you didn't know, Dify.AI is basically a really low-code platform that allows you to build production-ready AI solutions really quickly.

And it has got 112K GitHub stars, which really speaks for itself. The good thing about this is, one, it's open source, so you can actually access the code base and configure it on your own if you want. You can actually self-host it, similar to n8n if you're familiar with that: you can self-host it in your own organization if you want to. So it's a really good platform for building these sophisticated, well-engineered AI solutions using a drag-and-drop interface. And it's not just cool demos: you can actually build production-ready LLM applications using this platform.

And the way you get started is to just create your account — I've already got one. Once you're logged in, you'll find yourself in this workspace canvas. The thing about Dify is there's something called the marketplace. If you want to build something — for example, if you want different AI providers — you need to get them from the marketplace, kind of like an app store, if you will. But before you even click on anything, you should get your settings and configuration done first. The way you do that is you click on Settings and go to Model Provider.

From here, this is where you choose which LLM provider you want. You'll need to install these providers from the Dify marketplace. If you want Anthropic Claude, Amazon, Hugging Face, DeepSeek, Gemini, or some of the open-source ones like Llama — there's Mistral, etc., and there's the Llama API — you can choose whatever you want.

I'm just going to keep it easy and stick with OpenAI. The way you set it up is to just click on Set Up and enter your OpenAI API key here. You save it, and from here you click on the three dots to configure the system model settings. The first one is the system reasoning model, which is the model I'm going to use to build my applications — I'm just going to choose GPT-4.1 in this case.

From here you can also choose your embedding model, which I'll talk a little bit about. If you're familiar with RAG, you're probably familiar with this already; if not, don't worry — I'll go through a deep dive into how you build good RAG applications within Dify. There's also the rerank model, which I'll explain as well, so don't worry too much if you're not familiar with it. Essentially, we'll be using Jina. You can go to jina.ai — this is something we'll need — so just sign up for an account and get your API key: click on Manage API keys, create your own API key, and import it here. Super simple; I'll explain later what it's for.

Text to speech and speech to text are for building voice applications, where you can plug in your own models. I just use the GPT-4o mini text-to-speech and transcribe models, or you can use Whisper if you want. It doesn't really matter, because we're not going to be using them today, but they're there. So this is where you configure all your models and settings.

Now, when you're done with that, you can go back to the main page. If you're brand new to this, don't worry. Studio is where you actually build different AI applications, and you have different choices. You can choose Workflow, which is more step-by-step and logical — a data- and LLM-oriented workflow. It's similar to n8n, but n8n is more for the business application side: if you want to connect to a CRM or to Outlook, business applications are easier to set up in n8n, whereas Workflow is where you want precise, high-level engineering control over an AI-driven application. So it's really, really focused on LLMs and AI.

You've got Chatflow, which is slightly different from Workflow because Chatflow, as you can tell from the name, is more for the conversational use case: chatbots, chat agents, voice bots, voice agents, and so on. Chatbot is confusing, I know, but Chatbot is basically an easier version of Chatflow — a simpler kind of application, which is what we'll be doing today. Then there's Agent. The difference between a chatbot and an agent is that a chatbot can't connect to any tools, so it can't actually do anything in the real world — it doesn't take any actions; its main application is conversational. An agent, on the other hand, actually does things in the real world: it can connect to tools, do tool calling, make decisions over the tools, and so on. So today, let's keep it simple.

The only thing we're going to build is a chatbot for a gym. I'm going to use the PureGym Canary Wharf, because it's the gym closest to me. So I'll be building a very simple membership FAQ chatbot for this website — that's the use case, just to get you familiar with Dify. So what I'm going to do is create a blank new chatbot. I'll click Create Blank, and you can see there's Workflow and Chatflow — in a real business setting, you'd probably want to stick with one of those. But because we're starting from scratch, we're going simple: I actually want to give you a good, fundamental overview of what you can do, beyond just "let me upload a PDF into the knowledge base and I'm done." That's what most no-code tools — n8n, Botpress, or whatever other no-code platform — do. The reason we're using Dify is that we can do much more than that: we can get much better results, at least in terms of accuracy and relevance, which are the metrics that matter when building RAG pipelines. That's why we're using it here. For now, I'm just going to click on Chatbot with the basic setup, give it a name — I'll just call it Gym — and a description, then save it. Click Create, and ta-da: this is your first chatbot workspace.

So here are the Instructions. If you're familiar with these low-code tools, this is where you put your system prompt — the instructions you give to the chatbot. I already have a prompt, so I'm just going to paste it in, but I'll go through it with you so you understand. So you have an instruction here: you're an FAQ system. Your task is to answer user questions about gym facilities, membership options, class schedules, opening hours, etc. And then it follows the steps: read the user's question, and so on. It's always good to have a role, it's always good to introduce the task, and it's always good to use a chain-of-thought prompting technique, which means doing this step by step. Step one, read the user's question. Step two, identify whether the question is related to the gym. Step three, if the question is about gym facilities, provide clear information about equipment or opening hours. You can see the flow, step by step, of how the chatbot should think. Now, if you're thinking, "man, how do I create this prompt?"

Wait — don't worry too much. You can click on this Generate button if you want, and it's actually pretty good: you just use the prompt generator and describe what you want to build. If I say I want to build a gym chatbot — obviously, be more descriptive if you want — or you use one of their default options, let's say a travel planning assistant, you can generate the instructions this way. So if you're struggling with prompts, try that; it should help you at least with a foundational layer for how to create and structure the prompt.

Now, in your prompt you should also give it examples. You should always give examples: "What time do you open on the weekends?" — "It's open 24/7." This is what we call few-shot prompting: giving the model examples to follow, so that, one, it's more likely to get the answer correct, and two, it teaches the model how it should think about answering such questions. I'm not going to go through all of these; they're just example questions that you make up, or you can use the prompt generator to help you generate them.
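
To make few-shot prompting concrete outside of Dify, here's a minimal sketch using the OpenAI Python SDK. The gym Q&A pairs are made-up examples in the spirit of the video's prompt, and gpt-4.1 matches the model chosen earlier:

```python
# Few-shot prompting sketch: the example Q&A pairs in the system
# prompt show the model the style and reasoning to imitate.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = """You are a gym FAQ assistant.

Answer in the style of these examples:
Q: What time do you open on the weekends?
A: We're open 24/7, including weekends.
Q: Do you have showers?
A: Yes, showers and lockers are available at no extra cost.
"""

resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Is there parking at the gym?"},
    ],
)
print(resp.choices[0].message.content)
```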

From here, you can also see there's a variable: user question. Now, a variable within Dify allows us to introduce inputs into the prompt. What that means — I know that's a mouthful — is that if we set a variable (it can be a paragraph, a select, a number, or short text; I'm just going to name it user_question) and save it, you can see the user question variable appear here. That means every time someone chats with the chatbot, the user has to provide a value for this variable. That's obviously really annoying in this kind of chatbot interface, but it comes in handy in Workflow and Chatflow, because when you have more structured logic in how you build your AI agents or assistants, you're definitely going to need variables. That's not surprising if you're familiar with tools like n8n or other no-code tools — it's pretty standard — but this is just how you define it within Dify. For now, let's forget about it: I'll delete the variable, and delete it from the prompt too, so it just says "user question" rather than using a fixed variable. And the way you save in Dify is you click on Publish, then Publish again. That's how you save the version you're working with, if you will.

Now, you've given it a system prompt and instructions, but you want it to open with a conversation starter first, right? Because right now it's kind of off: you have to ask the questions. The way to have the AI speak first is to click on Manage and turn on the Conversation Opener. What this does is let the AI say something first, so it will say, "Hey there, thanks for dropping by — how can I help you today?" I'm just going to click Save. You can also turn on Follow-up, which sets up next-question suggestions for a better chat experience; it's usually recommended to turn that on. Text to Speech converts replies to voice, which we don't need at the moment.

Citations — this one is important. Remember, for an AI chatbot to be useful — I'll explain this if you've never built one before — you need something called knowledge, and I'll explain why. Basically, if you turn on Citations, the bot will give you the precise source of the information it pulls each answer from. It'll make more sense once I explain knowledge. Content moderation, using the moderation API, is basically a guardrail — built-in guardrails, if you will. You can just use the OpenAI moderation API and turn it on to prevent people from entering inappropriate content or trying to abuse the bot. That's how you use moderation. And then you just save. Oh, sorry — you need to type something in first; I forgot. For now, I'm just going to turn this off.

All right, I'm just going to click Restart, and it says: "Hey there, thanks for dropping by — how can I help you today?" Perfect. Now, Vision is where you can import images: if I turn this on, you can see users can now attach an image in the chat. See how simple it is to enable that? You also have metadata filtering, which I won't explain in this video because it's a bit more advanced — but I will explain Knowledge.

Why do we need knowledge? Before we jump into how to do it, let me explain, on a conceptual level, everything you need to know about setting up a basic RAG chatbot.

Any RAG pipeline — at least in Dify; I'll be using Dify-specific keywords, but this extends to any application — has four components. The first is importing the data, which is very common nowadays with these no-code tools: importing unstructured data — PDFs, text, Word documents, whatever you want to call it — into a knowledge base. That's the first step. It's unstructured because it's text data, not structured data like an Excel sheet: a spreadsheet is structured, in a table format, while text documents are unstructured.

Now you may ask: hang on, why do we need to import this? Why are we doing this in the first place? Why can't I just plug the entire document into the prompt, so the chatbot has access to the specific context? Well, this is why: every LLM, as you're probably familiar, has a very special property called the context window. Think about this document: say you have a PDF of 500,000 tokens. Most of the time — not just OpenAI models, any LLM — there's a specific limit called the context window, meaning there's a limit to how much information, how many characters, how many words you can actually stuff into the model. As an example, Claude has a 200K-token context window. So say I'm using Claude, and the limit is up to here — this is 200K. That means if I stuff in the entire 500K document, it's not going to fit inside Claude. All the remaining information will be lost: it will not fit in the model. That is why you need a way for the model to retrieve the information it needs from this 500K document and somehow fit it inside the small context window that LLMs usually have.
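
To put numbers on this, here's a minimal sketch using OpenAI's tiktoken tokenizer. The file name is a placeholder, and cl100k_base is a stand-in tokenizer (Claude's actual tokenizer differs), but the arithmetic is the point:

```python
# Context-window arithmetic: a 500K-token document cannot fit into
# a ~200K-token window, so naive prompt-stuffing loses the rest.
import tiktoken

CONTEXT_LIMIT = 200_000  # stand-in for Claude's ~200K-token window

enc = tiktoken.get_encoding("cl100k_base")
document = open("big_manual.txt").read()  # placeholder document

tokens = enc.encode(document)
print(f"Document: {len(tokens):,} tokens; window: {CONTEXT_LIMIT:,}")

if len(tokens) > CONTEXT_LIMIT:
    lost = len(tokens) - CONTEXT_LIMIT
    # Everything past the limit is simply dropped -- the information
    # loss that retrieval (RAG) is designed to avoid.
    print(f"Stuffing the prompt would silently lose {lost:,} tokens")
```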

Now you might argue: hey, GPT has a 1-million-token context window, and Gemini has one or two million — I think it's 1 million. So why bother? Why can't we just stuff it all in? Well, you're correct, but just because a model has, say, a 1-million-token context window doesn't mean you should actually fit 1 million tokens into it. The reason is that LLMs usually get confused by long prompts. If you have a really, really long document — 1 million tokens — research has shown that anything in the middle tends to get lost by the LLM. That is why it's much better to feed precise, relevant information into the context window, as opposed to brute-force plugging everything into the LLM. This is also why you might have heard people talk about "context engineering": we're trying to fill the context window as concisely and efficiently as possible, to increase the performance of these AI applications. That is why we need RAG.

And yeah, we've talked about importing data, so let's do it step by step within Dify. You can add knowledge — I'm just going to publish first, just in case. I'm going to add a knowledge base. We have some existing knowledge, but I'm going to build one from scratch so you guys understand what's happening. So I'm going to click on Create Knowledge. Now, there are three options. You can import from file, which accepts pretty much any text format; you can sync from Notion; or you can sync from a website. For the purposes of our build, I'm actually going to scrape the website. The way I do that is by using something called Firecrawl. You configure Firecrawl, and you'll need to go to Firecrawl's site — I've already got it configured, so let me just show you what I mean. Firecrawl.

So, Firecrawl is a web scraping tool you can just use — I use it a lot. The reason you don't just paste an entire website into a knowledge base, by the way, is that there's so much noise in a website document. If you didn't know, a website is built using HTML code, so there are lots of things that just aren't relevant: the images (how is that useful?), lots of useless links, useless info like the map — you're not going to feed a map into the knowledge base, because it supports text-based data only. So there's a lot of information on a website that's useless, and it's really messy. That's why something like Firecrawl lets you clean up the data from a website much more easily. Once you go to Firecrawl, you sign up for an account, and all you have to do is copy the API key, go back to Dify, and paste your API key here. Then you just press Escape. Now all you have to do is copy the website URL, paste it here, and click Run. I'm going to limit it to one page for the moment, because I want to keep it simple and I don't want it to run forever — but please feel free to go ahead and scrape whatever websites you want.
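
If you're curious what Dify is doing under the hood, here's a minimal sketch of calling Firecrawl's scrape endpoint directly with requests. The API key and URL are placeholders, and the endpoint and response shape reflect Firecrawl's v1 REST API at the time of writing, so double-check their docs:

```python
# Scrape one page via Firecrawl and keep only clean markdown --
# the HTML noise (nav links, images, scripts) is stripped away.
import requests

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": "Bearer fc-YOUR-KEY"},  # placeholder key
    json={
        "url": "https://example-gym.com/membership",  # placeholder URL
        "formats": ["markdown"],  # text-only output, not raw HTML
    },
    timeout=60,
)
resp.raise_for_status()

markdown = resp.json()["data"]["markdown"]
print(markdown[:500])  # clean text, ready to chunk into a knowledge base
```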

So it's going to take a little bit of time.

As you can see, we've finished the scrape in the Dify workspace, so I'm just going to click Next. That's the import done. Now it says document processing, and you have chunk settings and all that. Let me explain in detail.

The next thing is chunking. Chunking means breaking it down. All that means, as you can probably tell from the name, is that you have one big document and you're trying to divide it — chop it up, if you like — into multiple blocks. Think of it like Lego: you have a big Lego project, say a Lego building, and you can take it apart into different Lego blocks. In the same way, you can chunk a document into different pieces. Again, the reason why goes back to the context window. Remember I said we want to be efficient: we want to retrieve only the most relevant information to feed into the context window of the LLM, so that it generates the most relevant output back to the user. That's the whole idea. That is why you need to chunk the document into different pieces: each chunk has a very small size that is precisely relevant to the user's question. That is why we need chunking, which is breaking the document down. Within Dify, there are two different chunking methods: general chunking and parent-child chunking. Let me explain general chunking first, which is the way most people are used to doing it.

General chunking within Dify basically means you, the user, have to manually define the text chunking: what's the size of the chunks, are there any overlaps, how do we actually chunk them? Going back to the Lego analogy: how big is each of these pieces we've defined in the document, how many pieces should we divide the document into, and should there be any overlap between the chunks? These are all questions we need to answer ourselves with the general chunking method. If you go back to Dify, you can see there's a delimiter, a maximum chunk length, and a chunk overlap. So let's go through them one by one.

So you can see this horrible \n\n. If you're not a developer, you might be like: what is that? A delimiter is the character used as a separator — you can just read through the docs on it. What it really means is that it tells the system how to chunk the document: which specific method to use when cutting it up, because there are different ways you can define it. If I use \n, what that actually means is I'm going to divide the document sentence by sentence — by complete sentence. Using this document as an example, you can see this is a long sentence; it runs on, and the only time you see a full stop is here. So this is classified as one complete sentence, and if I use \n as the delimiter, I'm saying I'm going to chunk this document with the first sentence as one chunk, the second sentence as another chunk, and so on. That's how to read it. But if I use \n\n, that means I'm going to use two sentences. So if you look here — I know it's a long run-on sentence — the first full stop is here, so this is the first sentence; the second sentence runs on and on until here, the second full stop. So \n\n is two sentences as one chunk: every two sentences, I make one chunk. That's the most common way to define how you want to chunk — the delimiter is how you want to cut it. Personally, I like to use \n+. All that means is I can have more than two sentences in a chunk, which suits websites like this, where the text is usually very, very sparse.

Let me show you what I mean. If I just use \n and click on Preview Chunks — by the way, clicking Preview Chunks lets you visualize what your chunks look like and how the document is currently being divided — you can see there's this link, you've got this "50%" as the second chunk... you can see how fragmented this looks. That's because website data is often very fragmented by definition. You can see it says 286 characters, 82 estimated chunks. That's way too many chunks for a simple website. So what I like to do is group things together more coarsely: I'll just use \n+, and what this does is group more content into one chunk. Now you can see there are only 11 chunks, which is much better for something simple and sparse like website data. But if your source is a big, long, well-structured PDF, or a text document like a report, it could be completely different. So how you define your chunking really depends on your source of information. And by the way, I usually turn on "delete all URLs and email addresses." Let me quickly explain: these are text pre-processing rules. One automatically removes consecutive spaces, newlines, and tabs, which just waste space; the other deletes all URLs and email addresses, which are very common on websites and usually irrelevant. If I click Preview Chunks again, it's cleaned things up a bit better. This is basically just cleaning — less noise, less rubbish, less garbage within the chunks.

Now we've defined how to chunk it, but we also need to define the size of each chunk. What's the size of a chunk? You define that using the maximum chunk length: each chunk will have a maximum limit of 1,024 characters. Obviously you can change this however you want, but I'll just keep it at 1,024 for now. Then there's chunk overlap.

You may ask: wait, what is chunk overlap? Why do we need it? What do you even mean by that? Chunk overlap, as you might know, is the shared content — the common words — between chunk one and chunk two; there's an overlap between them. Why do we need that? Well, if you think about it, the way we normally chunk is by one or two sentences. However, neighbouring sentences might be very related conceptually — or semantically, meaning in terms of meaning. The second sentence might be describing the same thing as the first. For example, say the first sentence is "I like apples," and the second is "Apples are great, they're made of such-and-such, and they taste like this." Obviously, the second sentence builds on the first. But the way it's being chunked right now, sentence by sentence, means chunk one is completely independent of and unrelated to chunk two. We've lost the connection, the relationship, between the two sentences. That is why we need some chunk overlap: when the AI retrieves the chunks, the overlap preserves that connection between the first chunk and the second. Chunk overlap is usually around 10 to 20% of your maximum chunk length — research suggests that's the optimal range — so I'm just going to stick with 150 for now.
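
Here's a toy sketch of what those three settings do — delimiter, maximum chunk length, and overlap. It's an illustration of the idea, not Dify's actual splitter; the numbers mirror the settings above (1,024 max, 150 overlap):

```python
# General chunking: split on a delimiter, cap chunks at max_len, and
# carry a tail of each chunk into the next so neighbouring, related
# sentences keep their connection.

def general_chunk(text: str, delimiter: str = "\n\n",
                  max_len: int = 1024, overlap: int = 150) -> list[str]:
    chunks: list[str] = []
    current = ""
    for piece in text.split(delimiter):
        if current and len(current) + len(piece) > max_len:
            chunks.append(current)
            current = current[-overlap:]  # the chunk overlap
        current = (current + " " + piece).strip()
    if current:
        chunks.append(current)
    return chunks

doc = open("scraped_page.md").read()  # placeholder: the Firecrawl output
for i, chunk in enumerate(general_chunk(doc)):
    print(f"chunk {i}: {len(chunk)} chars")
```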

This is general chunking — the naive way of doing it, if you like. Now let me talk a little bit about the more advanced, more controlled way, which is parent-child chunking.

Click on parent-child chunking. So what is parent-child chunking? As you can probably guess from the name, it uses parent chunks — a.k.a. the paragraphs, which serve as the larger text units that provide context — while the child chunks, the smaller chunks, the sentences, focus on pinpoint accuracy for relevance-based retrieval. So you have two data structures: one small chunk for accuracy, and one big chunk — a paragraph — for context, or enrichment. Let's actually look at how it works. If you visualize this diagram: you have a document, and if I use parent-child chunking, that means I have the child chunks — the precise ones, the sentences we talked about, whose main use case is accuracy — and I have the parent paragraphs, or even multiple paragraphs, to enrich that data. All of that gets plugged into the LLM to generate an answer, essentially. And you can probably see how this is a better method. Basically, the first step is that the child chunks match the user's query for precision — they don't have to be one sentence each; you can control that. And the second step is the context enrichment, with the parent chunks I just talked about: we use the larger sections, the paragraphs, to enrich the child's context.

So how it really works in Dify: you select your parent chunk — the paragraph — and, within each parent chunk, decide again how you want to chunk it. Here it's saying every two sentences is one parent chunk, because remember, \n\n is two sentences. I'm just going to put in \n+ for now so it reads more like one paragraph, and you can select the maximum chunk length, which is how big each paragraph can be — so each paragraph can have 1,024 characters. Or, if you want, you can use the entire document as the parent chunk and retrieve it directly. That means, in our example, rather than paragraphs, the entire document will be used as the parent chunk to provide context. The advantage is that it carries a lot of information. The disadvantage is that if paragraph one is talking about, say, apples, and paragraph two is talking about birds, you can see how paragraph two is completely irrelevant to paragraph one — so the context you're feeding in with the full-document mode can be completely irrelevant, complete rubbish. So it depends on your document; it really depends on your use case. That's why you'd switch between the paragraph mode and the full-document mode of the parent-child method.

Then you have the child chunk, used for retrieval, with its own delimiter — usually we just keep it at one sentence, \n — and the maximum chunk length for the child chunk, for the precise retrieval. If you click on Preview Chunks, you can see it looks slightly different — a very different chunking result from what we're used to. There's chunk one, which is the parent chunk, and then C1, C2, C3: these are the child chunks, the sub-chunks for precise retrieval (C1-1, C1-2), whereas chunk one, the big chunk, is the parent chunk — the paragraph we talked about, there to enrich the context. Hopefully I'm not rambling on too much and you're following. The reason I've spent so much time here is that this is really important to get right — it's the bread and butter of everything we're doing.
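
Here's a toy sketch of the parent-child idea: match the query against small child chunks, then return the whole parent paragraph for context. A real system matches children with embeddings; the naive keyword check here is just to show the two-level structure:

```python
# Parent-child chunking: paragraphs are parents (context), sentences
# inside each paragraph are children (precision).
import re

def parent_child_chunk(text: str) -> list[dict]:
    parents = []
    for p_id, para in enumerate(text.split("\n\n")):  # parent delimiter
        para = para.strip()
        if not para:
            continue
        # child delimiter: a naive sentence split
        children = [s for s in re.split(r"(?<=[.!?])\s+", para) if s]
        parents.append({"id": p_id, "parent": para, "children": children})
    return parents

def retrieve(parents: list[dict], query: str) -> str:
    for p in parents:
        for child in p["children"]:       # precise match on the child...
            if query.lower() in child.lower():
                return p["parent"]        # ...enriched by the whole parent
    return ""

doc = ("The gym opens at 6am. Towels are provided.\n\n"
       "Membership costs 25 pounds. Students get a discount.")
print(retrieve(parent_child_chunk(doc), "membership"))
```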

Step three is indexing. Let's recap what we've done: we imported the data using Firecrawl, and we broke it down into multiple pieces. Now we need to organize it, because with the document divided into chunks, we need to, one, store them somewhere — where is this information being saved? — and two, organize them — how is this information organized? Those are the problems solved by something called indexing. And if you're familiar with RAG, you'll know we'll be using a vector database to store these chunks. That's exactly how we're going to do it. If you scroll down in the Dify workspace, you'll see there's an index method: High Quality or Economical. I'm going to break down exactly what each one means.

So, a vector store. First of all, a vector — if you don't know, and you don't need to get mathematical or geeky about it — is just something that has a direction in, say, a 3D space. Literally think of a data point in a three-dimensional space; you can visualize it. Each data point can represent a chunk, or a concept. For example, this data point here can represent "chicken," this one can represent "cat," this data point here can represent "banana." In the real world it might have 300 dimensions — we use three here because humans can only visualize in 3D — and in a real production vector database it can even have thousands of dimensions. So this is a vector database, and the High Quality index method really is just a vector store. Essentially, how it works is that you turn chunks — which are text; chunks are just text — into vectors. You take a bunch of text and turn it into vectors like this. And the advantage of a vector store is that you can store them according to semantics — semantic meaning. What that means is that concepts related to each other get grouped together. As an example, look at this vector database: if we're talking about fruits, apples and bananas are semantically related — they're both fruits — which is why they're grouped in the same region of the space, close to each other. But Apple the tech company and apple the fruit, even though they share the same name, are conceptually slightly different things, so they might still be grouped nearby, but not in exactly the same place, because they're slightly different concepts. Whereas something like chicken and cat — they're animals, completely unrelated to computers or fruits — so they're stored somewhere far away from the bananas and the apples. That's the advantage of vector databases: you can store and organize chunks this way. Remember, indexing is just organizing information — here by semantic meaning, by how the concepts relate to each other. And that's the vector store.
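
A minimal sketch of that idea with the OpenAI embeddings API (assuming text-embedding-3-small, in line with the embedding model configured earlier): related concepts come out with a higher cosine similarity than unrelated ones.

```python
# Turn text into vectors, then compare them: semantically related
# words land closer together in the embedding space.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=texts)
    return np.array([d.embedding for d in resp.data])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

apple, banana, cat = embed(["apple", "banana", "cat"])
print("apple vs banana:", cosine(apple, banana))  # fruits: higher
print("apple vs cat:   ", cosine(apple, cat))     # unrelated: lower
```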

Now, we also have something called Economical, which is a keyword index. If you select Economical — there you go — you can see it uses 10 keywords per chunk for retrieval, and no tokens are consumed, at the expense of reduced accuracy. All that really means is that it uses a keyword search method instead of an LLM. By the way, the way you turn text into vectors is through an embedding model — remember how I told you to set one up, like text-embedding-3-small, at the beginning, when you were configuring the settings and the model providers? That's the embedding model, and using the OpenAI embedding model, which turns text into values, consumes tokens — you're calling a model, so it costs tokens and costs money. Economical doesn't cost tokens, because it's using keyword search. It uses what we call an inverted index method, which basically means each chunk stores 10 keywords, and you store and organize the information according to the keywords and their frequencies.

Let me give you an example of what I mean by an inverted index. The easiest way to explain it is to first explain what a forward index means — there's a forward index and an inverted index. A forward index, the usual way of doing it, is that you assign IDs to documents. Say I have a document — or a term, if you think of keywords as documents — "cat": I assign ID 1 to "cat," ID 2 to "dog," ID 3 to the next one, and so on. An inverted index is the other way around: we map the term to the documents. If "cat" appears three times, I map the single term "cat" to multiple IDs — 1, 3, and 6, say. An easier way to think about how it really works: we have three documents, and each document is one sentence. "Winter is coming." "Ours is the fury." "The choice is yours." Three documents. The way it works is that you divide each word out as a term: "choice" is one term, "coming" is one term, "fury" is one term, "is" is one term, "winter" is one term, and so on. Then you record frequencies — this one appears one time, this one appears one time, this one appears three times — you count how many times each term appears and map it to the documents it appears in. So this is basically a way of organizing keywords by how often they appear, so you can quickly find which document — which source — each one comes from.
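
Here's a toy inverted index over those three documents — each term maps to the IDs of the documents containing it — plus the lookup that keyword retrieval performs:

```python
# Inverted index: term -> list of document IDs it appears in.
from collections import defaultdict

docs = {
    1: "Winter is coming",
    2: "Ours is the fury",
    3: "The choice is yours",
}

inverted: dict[str, list[int]] = defaultdict(list)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].append(doc_id)

print(inverted["winter"])  # [1]       -- only in document 1
print(inverted["is"])      # [1, 2, 3] -- appears in all three

def keyword_search(query: str) -> list[str]:
    # Retrieval is a plain lookup: no embeddings, no semantics.
    ids = {i for term in query.lower().split() for i in inverted.get(term, [])}
    return [docs[i] for i in sorted(ids)]

print(keyword_search("winter"))  # ['Winter is coming']
```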

So why am I explaining all this? Basically, with keyword indexing, the way the system retrieves information for the user is based on keywords. If the user asks the chatbot about winter, it searches for the term "winter," and based on the frequencies and the documents we mapped it to, it can quickly retrieve document one and pass it into the LLM, and the LLM can then generate a response back to the user. So that's how keyword search works. But as you can see, the disadvantage is that you've lost all semantic meaning: this inverted keyword index doesn't tell you how concepts are organized together conceptually. It loses all the conceptual relationships and meanings. That's the disadvantage. So for now, I'm just going to go with High Quality — you should pretty much always go with High Quality. And by the way, if you go with parent-child chunking, you can only go with High Quality. So that's that.

So, the final one is the retrieval settings, and this is the final step of the RAG pipeline: search. We've imported the document, we've broken it down into multiple pieces, and we've organized and stored them somewhere. Now we just need to search for them, under the High Quality index method we've chosen.

There are three different ways. The first, the standard way you're probably most familiar with, is vector search. How it works is: the user asks a question — let's say blah blah blah — and the text gets turned into a vector using the same embedding model, then projected into the vector store we already have. Say the user's question translates to this data point here. The system then looks at the surroundings: which chunks already in the vector store are closest in distance to this query vector, the user-question vector? Because remember, the closer two vector points are, the more semantically similar they are. So if I search for the top three results, that means I retrieve the three chunks closest to my query vector — the ones most relevant to my question — and those chunks get sent back to the LLM. That's what top K means, by the way: if you go to vector search, you can see there's a Top K setting. All it means is retrieving the top K closest chunks. So I'll set it back to three.

right? So go back to three and you have a threshold, right? A

percentage threshold that you can see, right? A score threshold. It's basically

right? A score threshold. It's basically

0 to 100% or if you will, sometimes it's just 0 to one. It's basically a score.

So what score it? What score is that? So

usually it's a cosine similarity score.

Again, don't worry too much about the fantasy term like you don't need to know about that. Just conceptually how it

about that. Just conceptually how it works really is basically telling kind of in the ve space how close different concepts are because basically the

closer the vectors are like how to define closeness right one of them is just by distance but in a 3D space usually by an angle so like we have this

small angle here if let's say two vectors right have a smaller angle that means they're more related to each other right So that is why we use some sort of

like similarity score to see how semantically how close they are. So

we're basically saying that if the score is above let's say 50% let's say 0.5 only then will I retrieve those shots right? Anything that's less than 0.5

right? Anything that's less than 0.5 that means that semantically they shouldn't be too close to what the user is asking and therefore I'm going to

discard those questions right sorry discard those charts right that is our vector search
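
A toy sketch of top-K retrieval with a score threshold. The 3-D vectors are made up for illustration — real embeddings have hundreds of dimensions — but the filtering logic is the same:

```python
# Vector search: score every chunk against the query by cosine
# similarity, drop anything under the threshold, keep the top K.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

chunks = ["opening hours", "membership prices", "privacy policy"]
chunk_vecs = [np.array(v) for v in
              ([0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9])]
query_vec = np.array([0.85, 0.2, 0.05])  # e.g. "when do you open?"

scored = sorted(((cosine(query_vec, v), c)
                 for v, c in zip(chunk_vecs, chunks)), reverse=True)
top_k = [(round(s, 2), c) for s, c in scored if s >= 0.5][:3]
print(top_k)  # only "opening hours" clears the 0.5 threshold here
```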

The second way is full-text search. A full-text search basically indexes — organizes — all the terms in the documents using a more advanced keyword search. It goes through all the documents, running a bunch of algorithms you don't need to know about, and uses them to judge relevance to the user's question. A full-text search algorithm is able to pick up words like "run" and "ran" — closely related but slightly different — and it judges relevance based on different parameters: things like word count (how many times terms appear) and proximity (how close the keywords sit together in the document). These all influence the full-text search results. But usually what we do is use hybrid search. Research basically shows that using either one alone is not as powerful as combining both full-text search and vector search. That's what we call hybrid search, and it's the recommended way of doing retrieval.

And the way you tune that is by weights or by reranking. Weights, right here, are basically how you adjust the strategy between semantic and keyword search. What it means is: OK, I have this vector search and this full-text search — how much should I weigh each one? Which is more important? Should I do a bit more vector search than full-text search? That's where the weights come in. If you set semantic to 0.7, it means roughly that 70% of the importance goes to semantic and only 30% to keyword — it doesn't work exactly like that, but as an intuition for what's going on, it places more importance on semantic matching than on keyword matching.
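
That intuition boils down to a weighted sum of the two scores. This little sketch is just the intuition — not Dify's exact fusion formula:

```python
# Hybrid search weighting: blend the semantic (vector) score and the
# keyword (full-text) score, here 0.7 / 0.3 as in the video.
def hybrid_score(semantic: float, keyword: float,
                 w_semantic: float = 0.7) -> float:
    return w_semantic * semantic + (1.0 - w_semantic) * keyword

# Strong semantic match, weak keyword overlap:
print(hybrid_score(semantic=0.82, keyword=0.30))  # 0.664
# Exact keyword hits, weaker semantic match:
print(hybrid_score(semantic=0.55, keyword=0.90))  # 0.655
```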

But the more optimized way to do it is to use something called rerankers. Now, remember when I asked you to set up Jina AI? This is what that's for — the reranker models. Jina is a really good AI framework: they have different embedding models and also different reranking models, which are really useful for RAG applications. So what are rerankers? Well, think about it this way. When the user asks a question, it gets turned into a vector — the vector indexing we've discussed, using the embedding model — and compared against the vector database. Imagine all the documents have been chunked up and stored in the vector database as vectors; during a vector search, the closest ones get retrieved. But the problem with a pure top-K vector search is that a vector database is optimized for speed and recall. What does that mean? Recall is basically: have I missed anything? I want to make sure I haven't missed anything. That's the advantage of a vector database: you usually won't miss much, at least with a high top K. But just because the vector database's retrieval is that thorough doesn't mean everything it returns is useful — only some of the retrieved chunks are truly relevant to the user's query. And reranking models are models optimized for relevance: how relevant, truly, is the user's question to each of these top chunks retrieved from the vector database? That's where the reranking step is really useful.

By the way, the reason it's more accurate is that with a vector database, you never process the user's query and the documents simultaneously: every time you run vector RAG, you index and chunk all the documents in your knowledge base first, and only later do you embed the user query. A reranking model, on the other hand, processes both simultaneously. That is why it's able to truly optimize for relevance. What it does is go through all the retrieved chunks, re-rank them — it's in the name — by relevance, and return only the truly top three or four most relevant chunks. That is why you should always — well, most of the time — be using a reranker. And the way you do that is to connect the Jina reranker-m0 by configuring the API key within Dify. That's it. That's the entire RAG pipeline.
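
For reference, here's a minimal sketch of calling Jina's rerank endpoint directly with the same jina-reranker-m0 model. The API key and chunks are placeholders, and the request and response shape follow Jina's documented rerank API at the time of writing, so double-check their docs:

```python
# Rerank candidate chunks against the query: the model scores the
# query and each chunk together, then we keep only the top ones.
import requests

chunks = [
    "Off-peak membership is 19.99 per month.",
    "The gym has 200 pieces of equipment.",
    "Core membership is 24.99 per month with multi-gym access.",
]

resp = requests.post(
    "https://api.jina.ai/v1/rerank",
    headers={"Authorization": "Bearer jina_YOUR_KEY"},  # placeholder key
    json={
        "model": "jina-reranker-m0",
        "query": "How much is a membership?",
        "documents": chunks,
        "top_n": 2,  # return only the truly most relevant chunks
    },
    timeout=30,
)
for r in resp.json()["results"]:
    print(round(r["relevance_score"], 3), chunks[r["index"]])
```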

I know it's a lot of information to take in, but to summarize: you import the data; you chunk it down, usually with parent-child chunking to enrich your data; you organize and store it in a vector database; you search it using a hybrid search of both keyword (full-text) search and vector search; and finally, you use a reranking model to retrieve only the truly most relevant chunks. That's the most basic RAG pipeline you can think of.

So yeah, that's all you need to know about RAG. I hope this has been helpful and relevant to how you think about setting up RAG, especially in something like Dify. So I'm just going to go ahead and actually process it: I'll click Save and Process, and it starts processing. And now, as you can see, it's finished chunking, so I'm just going to go to the document.

You can see the status is Available: it's now officially in the knowledge base. So, now that it's saved in a vector store, in the knowledge base within the Dify workspace, I'm going to go back to Studio and back to our chatbot. All you have to do is click on the knowledge we just created and add it to the chatbot's knowledge, and that's it. You can then set the retrieval settings to use the reranker model we just talked about — set it to reranker-m0. I usually just go with the top three chunks, but you can set four as well if you want; it doesn't really make a difference. And you can just click Save. And that's it: you've built your first RAG chatbot within Dify.AI.

And now let's talk to it. "Hey there, how can I help you today?" So I'm just going to say, "Hey, how much is a membership?" That's usually a good question, because users usually ask it. And as you can see — remember, it can pull this information from the website — it says the joining fee is this much, Core is that much, classes are that much; you can see exactly what you get in each package, and so on. And with the follow-up suggestions, you can ask things like "Can I pause my membership?", "Are classes included?", "Is there a student discount?". So you can just ask questions like that. "Is there a student discount?" — let's see if it finds specific information about student discounts, because it's not on this page at the moment. And let's ask, "When do you guys open?" It should be able to answer that. Exactly. So this is the use case: you can have this chatbot on the website to answer the questions that get asked a million times to the staff — I'm pretty sure so many people ask the same questions, and they can be answered in the chat.

So the way you do that: you click Publish, then Publish Update, to save it. Now you may ask, wait, I built this cool chatbot — how do I actually deploy it? There are four different options. You can either use Run App, which is hosted within Dify — that's this website, powered by Dify itself — or you can embed it into a website. All you have to do is click Copy and paste this code into your website. For example, if you were doing it for PureGym, you would talk to the web developers who build the PureGym website, paste in the code copied from here, and you'd get a chat widget in the bottom right that people can just talk to. That's the standard chatbot setup, and that's exactly how you would deploy it. Really simple.

Hopefully this has given you a good idea of how to build this chatbot. I know it's been a long one, but hopefully you got some value out of it. If you're a business owner and you've seen value in this — you think this is something useful — please feel free to book a discovery call with me and my team, so we can take this a step further and build something even more advanced and more valuable for your business, whether that's chat agents, AI automations, or voice agents. But yeah, if you enjoy this kind of content, please consider liking and subscribing, and I'll speak to you in the next one.
