Master Dify AI Chatbot in 1 Hour: Complete Beginner's Guide for 2025
By Eddie Chen | AI Automation
Summary
## Key takeaways
- **Dify: Open-Source Low-Code AI Builder**: Dify.AI is a low-code platform for building production-ready AI solutions quickly, with 112K GitHub stars. It's open source, so you can self-host it like n8n, and it uses a drag-and-drop interface. [00:46], [01:06]
- **Chatbot vs Agent: No Tools vs Tool-Calling**: A chatbot is conversational but can't connect to tools or take real-world actions, whereas an agent can connect to tools, do tool calling, and make decisions. [05:18], [05:34]
- **Context Window Limits Force RAG Chunking**: LLMs have context limits, like Claude's 200K tokens, so a 500K document won't fit entirely and information is lost; research also shows long prompts confuse models, with content in the middle forgotten. [15:16], [16:50]
- **Firecrawl Cleans Noisy Website Data**: Don't paste entire websites into a knowledge base; HTML noise like images, links, and maps gets in the way. Use the Firecrawl web scraping tool to clean up messy data first. [18:28], [19:12]
- **Parent-Child Chunking Boosts Relevance**: Parent-child chunking uses parent paragraphs for context enrichment and child sentences for pinpoint accuracy, beating general chunking by retrieving precisely and then enriching the chunks. [28:15], [29:24]
- **Rerankers Optimize Vector Retrieval**: Vector DBs prioritize speed and recall, but rerankers like Jina's jina-reranker-m0 judge true relevance by processing the user query and the chunks simultaneously, returning only the most relevant ones. [45:17], [47:02]
Topics Covered
- Context Windows Kill Long Prompts
- Firecrawl Cleans Website Noise
- Parent-Child Chunking Beats General
- Rerankers Fix Vector Recall Flaws
Full Transcript
What's up, guys? My name is Edwin, and this is absolutely everything you need to know about setting up advanced RAG chatbots in Dify.AI.
By the end of this video, you'll know how to set up advanced chat agents for your business, beyond just making a basic chatbot or blindly uploading PDFs to a knowledge base and hoping for the best.
Now, why should you listen to me to teach you about how to set one up? My
name is Edwin and I'm the co-founder of Legacy AI where I've been serving more than 20 retail and B2C clients in the past 6 months with conversational agents like the one you see in this video. So,
you're learning from someone who's actually done it before in the real world, not just theory. Now, let's dive straight into it. Let's jump straight into Dify.AI. If you didn't know, Dify.AI is basically a really low-code platform that allows you to build production-ready AI solutions really quickly. And it has got 112K GitHub stars, which is really good; it speaks for itself. And the good thing about this is, one, it's open source, so you can actually access the code base and configure it on your own if you want. You can actually self-host it, so it's something similar to n8n, if you're familiar with that. You can self-host it in your own organization if you want to. So it's a really good platform for building these kinds of sophisticated, well-engineered AI solutions using a drag-and-drop interface. And it's not just cool demos: you can actually build production-ready LLM applications using this platform.
And the way that you can get started is to just get your account. I've already got an account. But once you're in, past the login, you will find yourself in this kind of workspace canvas. And the thing about Dify is there's something called the marketplace. So if you want to build something, for example if you want different AI providers, you would need to get them from the marketplace, kind of like an app store, if you will. And the first thing to do, before you even click on anything, is to get your settings and configurations sorted. The way you do that is you click on Settings and go to Model Provider. From here, this is where you can choose which LLM provider you want. You would need to install these providers from the Dify marketplace. So if I want Anthropic Claude, Amazon, if you want Hugging Face, DeepSeek, Gemini, or Llama, some of the open source ones as well with Llama, there's Mistral, etc. Right? There's the Llama API. So you can kind of choose whatever you want.
I'm just going to keep it easy and just stick with OpenAI. And the way you set it up is just click on Set Up. You enter your OpenAI API key here, right? You save it, and from here click on the three dots and you can configure the system model settings. The first one is the system reasoning model, which is the model I'm going to use to build my applications. I'm just going to choose GPT-4.1 in this case.
And from here you can also choose your embedding model, which I will talk a little bit about. If you're familiar with RAG, you're probably familiar with this already. If not, I'll go through a deep dive into how you build good RAG applications within Dify. There's also the rerank model; I will explain that as well, so don't worry too much if you're not familiar with it. But essentially we'll be using Jina. The way you get it is you go to jina.ai, and this is something we will need to use. So just sign up for an account and access your API key: click on Manage API keys, create your own API key, and import it in here. Super simple. I will explain what it's for later. Text-to-speech and speech-to-text are for building voice applications, where you can put your own model in; I just use the GPT-4o mini text-to-speech and transcribe models, or you can use Whisper if you want, but it doesn't really matter because we're not going to be using it today. It's just there. This is where you configure all your models and settings.
Now, when you're done with that, you can go back to the main page. And if you're brand new to this, don't worry. Studio is where you can actually build different AI applications. You have different choices. You can choose Workflow, which is a more step-by-step, more logical LLM workflow, similar to n8n, but n8n is more for business application settings. So if you want to connect to a CRM, or connect to Outlook, business applications are easier to set up in n8n, but Workflow is where you want precise, high-level engineering control over an AI-driven application. So it's really, really focused on LLMs and AI.
You've got Chatflow, which is slightly different to Workflow, because Chatflow, as you can tell from the name, is more for the conversational use case: chatbots, chat agents, voice bots, voice agents, etc. Chatbot is confusing, I know, but Chatbot is basically an easier version of Chatflow; it's a more simple application, which is what we'll be doing today. Then there's Agent. The difference between a chatbot and an agent is that a chatbot can't connect to any tools, so it can't actually take any actions in the real world. It can just talk; its main application is conversational. Whereas an agent can actually do something in the real world: it can connect to tools, do tool calling, make decisions over the tools, etc. Right? So today let's keep it simple.
The only thing we're going to build is a chatbot for a gym. So I'm just going to use the PureGym Canary Wharf, because it's the gym that's close to me. I'm just going to be building a very simple membership FAQ chatbot for this website, right? That's the use case, to showcase things and get familiar with Dify. So what I'm going to do is create a blank new chatbot: I'm just going to click Create Blank. And you can see there's also Workflow and Chatflow; in general, at the real business stage, you probably want to stick with one of those. But for now, because we're starting from scratch, we're going simple. I actually want to give you a good fundamental overview of what you can do, beyond just "oh, let me upload the PDF into the knowledge base and then I'm done." Right? That's what most no-code tools like n8n or other no-code platforms do, and the reason we're using Dify is because we can do much more than that. We can get much better results, at least in terms of accuracy and relevance, which are the metrics for building RAG pipelines; that's why we're using it here. And for now, I'm just going to click on Chatbot with a simple setup. I'm just going to give it a name, call it Gym FAQ, add a description, and save it. Just going to click Create, and ta-da, this is your first chatbot workspace.
So here, the Instructions, if you're familiar with these kinds of no-code tools, is where you put your system prompt, the instructions you give to the chatbot. I already have a prompt, so I'm just going to paste it in here, but I'll go through it with you just so you understand. Right, so you will have an instruction here: you're an FAQ system. Your task is to answer user questions about gym facilities, membership options, class schedules, opening hours, etc. And then you're just going to follow the steps: read the user's question, and so on. It's always good to have a role, it's always good to introduce the tasks, and it's always good to use a chain-of-thought prompting technique, which means doing this step by step. Step one, read the user's question. Step two, identify if the question is related to the gym. Step three, if the question is about gym facilities, provide clear information about equipment or opening hours. You can kind of see the flow, step by step, of how the chatbot should think. Now, if you're thinking, "man, how do I create this prompt?"
Wait, don't worry too much. You can click on this Generate prompt if you want, and it's actually pretty good. You can just use the prompt generator and describe what you want to build. If I say I want to build a gym chatbot, right? Obviously be more descriptive if you want. You can also use one of their default options, let's say a travel planning assistant. This way you can actually generate the instructions. So if you're struggling with prompts, try that; it should help you at least with a foundational layer of how to create and structure the text.
Now, in your prompt you should also give it examples. You should always give examples: "What time do you open on the weekends?" "It's open 24/7." This is what we call few-shot prompting, which is giving the model examples to follow, so it's more likely to get the answer correct, and it also teaches the model how it should think about answering such questions. I'm not going to go through all of these; again, these are just example questions that you make up, or that you generate with the prompt generator.
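To make the few-shot idea concrete, here's a tiny sketch (my illustration, not from the video) of what few-shot examples look like in a standard chat-completion message list; the gym answers here are made up:

```python
# A toy few-shot setup: seed the conversation with example Q&A pairs so the
# model imitates their style and logic. All names/answers are illustrative.
messages = [
    {"role": "system", "content": "You are a gym FAQ assistant. Answer "
                                  "questions about memberships and hours."},
    # Few-shot example: show the model how a good answer looks.
    {"role": "user", "content": "What time do you open on the weekends?"},
    {"role": "assistant", "content": "We're open 24/7, including weekends."},
    # The real user question comes last; the model follows the pattern above.
    {"role": "user", "content": "Is there a joining fee?"},
]
```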
And from here, you can see there is a variable, user question. Now, variables within Dify allow us to introduce values into the prompt. What that means is, if we set a variable (you can make it a paragraph, a select, a number, or short text), let's say I name it user question, and I save it, then once that variable is set up, you can see I have the user question variable here. That means every time someone chats with the chatbot, the user will have to provide a value for this variable, which obviously is really annoying in this kind of chatbot interface. But it's going to come in handy, especially in Workflow and Chatflow, because when you have more structured logic in how you build an AI agent or assistant, you're definitely going to need variables. Which is not surprising: if you're familiar with tools like n8n or other no-code tools, it's kind of standard; this is just how you define it within Dify. For now, let's forget about this; I'll delete the variable and just write "user question" in the prompt rather than having a fixed variable. And the way you save in Dify is you click on Publish, and then Publish again; this is how you save the version you're working on, if you will.
Now, you've given it a system prompt and instructions on how to behave, but say you want a conversation starter first, because right now it's kind of off: you have to ask the questions. The way you can have the AI ask the first question is to click on Manage and turn on the Conversation Opener. What this does is allow the AI to say something first. So it will say, "Hey there, thanks for dropping by. How can I help you today?" Right. So I'm just going to click on Save. And you can click on Follow-up, which sets up next-question suggestions to give a better chat experience; it's usually recommended to turn that on. Text to Speech converts responses to voice, which we don't necessarily need at the moment.
Citations: this is where it gets important. Remember, I'm going to explain chatbots as if you've never built one before. For an AI chatbot to be useful, you need something called knowledge, and I'll explain why. But basically, if you turn on Citations, it will give you the precise source of information, where the chatbot got its answer from. It makes more sense once I explain knowledge. Then Content Moderation, which works via a moderation API: it's basically a guardrail, built-in guardrails if you want them. You can just use the OpenAI moderation and turn it on to prevent people from entering inappropriate content or trying to abuse the bot. That's how you use moderation. All right, and then you're just going to save. Oh, sorry, you need to type something in first; for now I'm just going to turn this off.
All right. I'm just going to click on Restart, and it says, "Hey there, thanks for dropping by. How can I help you today?" Right. Perfect. Now, Vision is where you can import images. If I turn this on, you can see that I can now attach an image in the chat, so users can actually do that. See how simple it is to allow this? And you also have metadata filtering, which I won't be explaining in this video because it's a little more advanced, but I will be explaining Knowledge.
Why do we need knowledge? Before we jump into how you do it, let me explain on a conceptual level everything you need to know about setting up a basic RAG chatbot.
Now, any RAG pipeline, at least in Dify (I'm going to be using Dify-specific keywords, but this extends to any application), has four components. The first is importing the data, which is very common nowadays with these no-code tools: importing unstructured data, PDFs, text, Word documents, whatever you want to call it, into a knowledge base. That's the first step. It's unstructured because it's text data; it's not structured data like an Excel sheet. A spreadsheet is structured data, organized in a table format; text documents are unstructured.
Now you may ask, hang on, why do we need to import this? Why are we doing this in the first place? Why can't I just plug the entire document into the prompt, so the chatbot has access to the full context? Well, this is why: every LLM, as you're probably aware, has a very special property called the context window. Think about a document: let's say you have a PDF of 500,000 characters. Most of the time, OpenAI models (not just OpenAI models, any LLMs) have a specific limit called the context window. That means there's a limit to how much information, how many characters, how many words you can actually stuff inside the model. As an example, take Claude, which has a 200K-token context window. Let's say I'm using Claude and its limit is here, at 200K. That means if I stuff in the entire 500K document, it's not going to fit inside Claude. All the information past the limit will be lost; it will not fit in the model. So that is why you need a way for models to retrieve just the information they need from this 500K document and somehow fit it inside the small context window that LLMs usually have. Now you might argue: hey, GPT has a 1 million context window, right? Gemini has 1 or 2 million (I think it's 1 million). So why bother? Why can't we just stuff it all in? Well, you're correct, but when a model has, say, 1 million tokens of context, that doesn't mean you should fit 1 million tokens into it. The reason is that long prompts usually confuse LLMs. If you have a really, really long prompt of 1 million tokens, research has shown that anything in the middle is going to get lost by the LLM. So it's much better to feed precise, relevant information into this context window, as opposed to brute-force plugging everything into the LLM provider. That's also why you might have heard people talk about "context engineering": we're trying to fill the context window as concisely and as efficiently as possible to increase the performance of these AI applications.
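If you want to see the context-window problem concretely, here's a minimal Python sketch (my illustration, not part of the video) that counts tokens with the tiktoken library; the 200K limit and the file name are assumptions:

```python
# Check whether a document fits a model's context window.
# Requires: pip install tiktoken
import tiktoken

CONTEXT_LIMIT = 200_000  # assumed: roughly Claude's 200K-token window

enc = tiktoken.get_encoding("cl100k_base")  # a common tokenizer
with open("gym_docs.txt") as f:             # hypothetical large document
    document = f.read()

n_tokens = len(enc.encode(document))
print(f"{n_tokens:,} tokens vs a {CONTEXT_LIMIT:,}-token limit")
if n_tokens > CONTEXT_LIMIT:
    # Stuffing everything in would truncate the document (and long prompts
    # suffer from lost-in-the-middle), so retrieve only relevant chunks: RAG.
    print("Too big to stuff into the prompt; use RAG instead.")
```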
That is why we need RAG. And yeah, we talked about importing data, so let's do it step by step within Dify. You can add knowledge; I'm just going to publish first, just in case. I'm going to add a knowledge base. We have some knowledge already, but I'm going to build it from scratch so you guys understand what's happening. So I'm just going to click on Create Knowledge. Now, there are three options. You can import from file, which lets you import any text format; you can sync from Notion; or you can sync from a website. For the purposes of our build, I'm actually going to scrape from the website. The way I do that is by using something called Firecrawl. You configure Firecrawl, and you'll need to go to Firecrawl's site; I already have it configured, so I'm just going to show you what I meant.
So Firecrawl is a web scraping tool that you can just use; I use it a lot. And the reason you don't just paste the entire website into a knowledge base, by the way, is because there's so much noise within a website document. If you didn't know, a website is built using HTML code, so there are a lot of things that are just not relevant, such as the images (how is that useful?). There are many links, links that are kind of useless. There's lots of useless info, like the map; you're not going to be feeding the map into a knowledge base, because it supports text-based data only. So you can see there's a lot of information on a website that's useless, and it's really messy. That's why using something like Firecrawl lets you clean up the data from the website much more easily. Once you go to Firecrawl, you sign up for an account, and all you have to do is copy the API key, go back to Dify, and paste your API key here. Right. And then you can just scrape: copy the website URL, paste it here, and click Run. I'm just going to limit it to one page at the moment, because I want to keep it simple and I don't want it to run forever. But please feel free to go ahead and scrape whatever websites you want.
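For reference, this is roughly what that scrape looks like if you call Firecrawl yourself from Python instead of through Dify. Treat it as a sketch only: exact method names vary between SDK versions, and the URL is just the PureGym page used as an example.

```python
# Scrape one page into clean markdown with the Firecrawl SDK.
# Requires: pip install firecrawl-py
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")  # API key from the Firecrawl dashboard

# Markdown output strips the HTML noise: nav bars, scripts, tracking, etc.
result = app.scrape_url(
    "https://www.puregym.com/gyms/london-canary-wharf/",  # example URL
    formats=["markdown"],
)
print(result.markdown[:500])  # clean text, ready for a knowledge base
```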
So it's going to take a little bit of time. And as you can see, the scrape into the Dify workspace has finished. I'm just going to click on Next. That's the import done. Now it shows document processing, and you have chunk settings and all that. Let me explain in detail.
The next thing is chunking. Chunking means breaking it down. All that means, as you can probably tell from the name, is you have one big document and you're trying to divide it, or chop it if you want, into multiple blocks. Think of it like Lego: you have a big Lego project, say a Lego building, and you can take it apart into different Lego blocks. The same way, you can chunk a document into different pieces. Again, the reason why goes back to the context window. Remember why I said we want to be efficient: we want to retrieve only the most relevant information to feed into the context window of this AI model, this LLM, so that it generates the most relevant output back to the user. That's the whole idea. That is why you need to chunk the document into pieces: each chunk has a very small size that is precisely relevant to the user's question. That is why we need chunking, which is breaking it down. Within Dify, there are two different chunking methods: general chunking and parent-child chunking. Let me explain general chunking first, which is the way most people are used to doing it. General chunking within Dify means that you, the user, manually define the text chunking: what is the size of the chunks? Are there any overlaps? How do we actually split them? Going back to the Lego analogy, it's like asking: how big is each of the pieces we've defined in the document? How many pieces should we divide this document into? And should there be any overlap between these chunks? These are all questions we need to answer ourselves in the general chunking method. If you go back to Dify, you can see there's a delimiter, the maximum chunk length, and chunk overlap. So let's go through them one by one.
So you can see this horrible \n\n. If you're not a developer, you might be like, what is that? The delimiter is the character used as a separator. You can just read through the docs, but what this really means is you're telling Dify: how should I chunk the document? What specific rule should I use to split it? Because there are different ways you can define it. If I use \n, that means I'm going to divide the document at every line break; in this scraped document, each line is one complete sentence, so effectively I'm chunking by sentence. If you use this text as an example, you can see it's a long run-on sentence, and the only time you see a full stop is here, so this counts as one complete sentence. With \n as the delimiter, the first sentence becomes one chunk, the second sentence another chunk, and so on. But if you use \n\n (strictly a blank line, a paragraph break), in this document it works out to two sentences per chunk: the first full stop is here, so this is the first sentence; the second sentence runs on and on to here, the second full stop. So \n\n gives two sentences as one chunk: every two sentences, I make one chunk. That's the most common way to define how you want to split, which is the delimiter. Personally, I like to use \n\n+, which just means I can have more than two sentences in a chunk, because websites like this are usually very sparse.
Let me show you what I mean. If I just use \n and click on Preview Chunks (by the way, clicking Preview Chunks lets you visualize what your chunks look like and how the document is currently being divided), you can see there's this link, there's the 50% off in the second chunk; you can see how fragmented it looks. That's because website data is often very fragmented by definition. You can see it estimates 82 chunks, which is way too many chunks for a simple website. So what I like to do is group them together more coarsely: I set the delimiter to \n\n+, and what this does is group more content into each chunk. Now there are only 11 chunks, which is much better for something simple and sparse like website data. But if your source is a big, long, well-structured PDF or a text document like a report, it could be completely different. So the way you define your chunking really depends on the source of information you have. And by the way, I usually tick the options to remove consecutive spaces and delete all URLs and
email addresses. Let me quickly explain what these are: they're text pre-processing rules. One automatically replaces consecutive spaces, newlines, and tabs, which just stops wasting space; the other deletes all irrelevant URLs and email addresses, which are very common on websites. If I click Preview Chunks again, it's gone ahead and cleaned things up a little better. This is just cleaning, basically, to make the chunks a little less noisy, less rubbish, less garbage essentially.
Now we've defined how we split it. But we also need to define the size of each chunk. What's the size of a chunk? You define it with the maximum chunk length: each chunk will have a maximum limit of 1,024 characters. Obviously you can change this however you want, but I usually just keep it at 1,024 for now. Then there's chunk overlap. You may ask, wait, what is chunk overlap? Why do we need chunk overlap? What do you mean by that? Chunk overlap is the common content shared between chunk one and chunk two, let's say.
There's an overlap here, right? Why do we need that? Well, if you think about it, the way we normally chunk is by one sentence or by two sentences. However, the sentences might be very related conceptually, or semantically, meaning in terms of meaning. The second sentence might be describing the same thing as the first. For example, say the first sentence is "I like apples," and the second sentence is "Apples are great, they are made of such-and-such, and they taste like this." Obviously, the second sentence builds on the first. But the way it's being chunked right now, by sentence, means chunk one is completely independent of and unrelated to chunk two. We've lost the connection, the relationship, between the two sentences. That is why we need some chunk overlap: when the AI retrieves the chunks, the overlap preserves that connection between the first chunk and the second chunk. Chunk overlap is usually around 10 to 20% of your maximum chunk length; research suggests that's the optimal amount. So I'm just going to stick with about 150 for now. This is general chunking.
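Here's a toy Python sketch (my own illustration of the idea, not Dify's actual implementation) of what general chunking with a delimiter, a maximum length, and an overlap boils down to:

```python
# General chunking: split on a delimiter, cap chunk size, keep an overlap.
def chunk(text: str, delimiter: str = "\n\n",
          max_len: int = 1024, overlap: int = 150) -> list[str]:
    pieces = [p.strip() for p in text.split(delimiter) if p.strip()]
    chunks: list[str] = []
    for piece in pieces:
        start = 0
        while start < len(piece):
            # Slice long pieces, carrying `overlap` characters forward so
            # consecutive chunks preserve their connection.
            chunks.append(piece[start:start + max_len])
            if start + max_len >= len(piece):
                break
            start += max_len - overlap
    return chunks

doc = "The gym is open 24/7.\n\nCore membership includes full gym access."
for i, c in enumerate(chunk(doc)):
    print(i, repr(c))
```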
Now let me talk a little bit about the more advanced way of doing this. General chunking is the naive way; the more controlled way is parent-child chunking.
Click on parent-child chunking. So what is parent-child chunking? As you can probably guess from the name, it uses parent chunks, aka the paragraphs, which serve as larger text units to provide context, while the child chunks, the smaller chunks, the sentences, focus on pinpoint accuracy for relevance-based retrieval. So it's two data structures: one small chunk for accuracy and one big chunk, like a paragraph, for context, for enrichment. Let's look at how it works. If you visualize the diagram: you have a document; with parent-child chunking, I have the child chunking, which is precise, the sentences we talked about, whose main use is accuracy; and I have the parent paragraphs, or even multiple paragraphs, to enrich that data. And all of that gets plugged into the LLM to generate an answer, essentially. You can probably see why this is a better method. The first step is that the child chunks match the user's query for precision (they don't have to be one sentence; you can control that).
And the second step is context enrichment with the parent chunks I just talked about: using the larger sections, the paragraphs, to enrich the child's context. So how it really works in Dify is you select your parent chunk, which is the paragraph; within each parent chunk, you again decide how you want to split it. Here I'm saying every two sentences is one parent chunk, because remember, \n\n is two sentences in this document. I'm just going to put in \n\n+ for now so it feels more like one paragraph. And you can select the maximum chunk length, which is how big each paragraph can be: each paragraph here can have 1,024 characters. Or, if you want, you can use the entire document as the parent chunk, the Full Doc option. That means that in our example, the parent chunk, rather than being paragraphs, is the entire document, used to provide context.
Now, the advantage of that is it carries a lot of information. The disadvantage is that if paragraph one is talking about, say, apples, and paragraph two is talking about birds, you can see how paragraph two is completely irrelevant to paragraph one. So the context you're feeding in using the full-doc parent-child method can be completely irrelevant, complete rubbish. So it depends on your document; it really depends on the use case. That's why you'd want to switch between the paragraph and full-doc parent-child methods.
And you have the child-chunk-for-retrieval delimiter. This is where we define it; usually we just keep it as one sentence, \n. And there's the maximum chunk length for the child chunk, for the precise retrieval. So if you click on Preview Chunks, you can see it looks slightly different, a very different chunking from what we're used to. You can see there's chunk 1, which is the parent chunk, and you can see C1, C2, C3: these are the child chunks, the sub-chunks for precise retrieval. Whereas chunk 1, the big chunk, is the parent chunk, the paragraph we talked about, there to enrich the context. Hopefully I'm not rambling on too much and you're following. The reason I've spent so much time here is that this is really important to get right; it's really the bread and butter of everything we're building.
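Conceptually, here's a toy sketch of parent-child retrieval (again my illustration, with a naive keyword match standing in for the real vector search): you match on small child chunks for precision, then hand the LLM the whole parent paragraph for context.

```python
# Parent-child chunking: index small child chunks, return big parent chunks.
doc = ("The gym is open 24 hours a day. Staffed hours are 9am to 5pm.\n\n"
       "Core membership includes gym access. Classes cost extra.")

parents: dict[int, str] = {}           # parent_id -> paragraph (context)
children: list[tuple[int, str]] = []   # (parent_id, sentence) (precision)

for p_id, paragraph in enumerate(doc.split("\n\n")):
    parents[p_id] = paragraph
    for sentence in paragraph.split(". "):
        children.append((p_id, sentence))

def retrieve(query: str) -> str:
    # Stand-in for vector search: match the query against child chunks,
    # then enrich the hit by returning its entire parent paragraph.
    for p_id, sentence in children:
        if any(w.lower() in sentence.lower() for w in query.split()):
            return parents[p_id]
    return ""

print(retrieve("staffed hours"))  # a child matches; the paragraph comes back
```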
Step three is indexing. Now that we've broken it down, let's quickly recap what we've done: we imported the data using Firecrawl, and we broke it down into multiple pieces. Now we need to organize, because the document is divided into chunks, and we need to, one, store them somewhere (where is this information being saved?), and two, decide how it's organized (how can you organize this information?). These are the problems solved by something called indexing. And if you're familiar with RAG, you'd know we'll be using a vector database to store these chunks. That's exactly how we're going to do it: if you scroll down in the Dify workspace, you'll see the index method, either High Quality or Economical. I'm going to break down exactly what each one means.
So, a vector store. First of all, a vector, if you don't know (and you don't need to get mathematical or geeky about it), is just something that has a direction in, let's say, a 3D space. So literally think of a data point in a three-dimensional space; you can visualize it. Each data point can represent a chunk or a concept. For example, this data point here can represent chicken. This one can represent cat. This data point here can represent banana. In the real world it might have 300 dimensions; I'm drawing three because humans can only visualize in 3D. In a real production vector database it can even have thousands of dimensions.
So this is a vector database, and the High Quality index method really is just a vector store. Essentially, how it works is you turn chunks, which are just text, into vectors: you take a bunch of text and you turn it into vectors like this. And the advantage of a vector store is you can store them according to semantic meaning. What that means is that concepts related to each other get grouped together. As an example, in this vector database, if we're talking about fruits, apples and bananas are semantically related, right? They're both fruits. That's why they're grouped together in a similar space, close to each other. But Apple the tech company and apple the fruit, even though they share the same name, are conceptually slightly different things, so they'll still be grouped nearby, but not exactly together, because they're slightly different concepts. Whereas something like chicken and cat, they're animals, completely unrelated to computers or fruits. That's why they're stored somewhere really far away from the bananas and the apples.
So that is the advantage of vector databases: you can store and organize the chunks. Remember, indexing is just organizing information by semantic meaning, by how the concepts relate to each other. And that is the vector store.
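To make that concrete, here's a toy sketch with made-up 3-D vectors (real embeddings have hundreds or thousands of dimensions) showing how cosine similarity captures the "fruits near fruits" grouping:

```python
# Semantic storage in miniature: concepts as vectors, closeness as cosine.
import numpy as np

vectors = {
    "apple (fruit)": np.array([0.9, 0.1, 0.0]),
    "banana":        np.array([0.8, 0.2, 0.1]),  # near apple: both fruits
    "cat":           np.array([0.1, 0.9, 0.2]),  # animals live elsewhere
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["apple (fruit)"], vectors["banana"]))  # ~0.98, related
print(cosine(vectors["apple (fruit)"], vectors["cat"]))     # ~0.21, unrelated
```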
Now, we also have something called Economical, which is a keyword index. If you select it, there you go, it's the Economical option. Economical, as you can see, uses 10 keywords per chunk for retrieval, so no tokens are consumed, at the expense of reduced accuracy. All that really means is it's using a keyword search method; it's not calling a model. The High Quality method does, by the way, because the way you turn text into vectors is through an embedding model. Remember how I told you to set one up, like text-embedding-3-small, at the beginning, when you were configuring the settings and model providers? This is where the embedding model comes in, and the OpenAI embedding model, which turns text into values, consumes tokens, because you're calling a model. So that costs tokens and costs money. The Economical method doesn't cost tokens, because it uses keyword search. It uses what we call an inverted index method, which basically means each chunk stores 10 keywords; you store the information, and you organize it according to the keywords and their frequencies.
Let me give you an example of what I mean by inverted index. The easiest way to explain an inverted index is to first explain what a forward index means; there's a forward index and an inverted index. A forward index, the usual way of doing it, is you assign IDs to documents. So if you think of the terms or keywords as documents: I assign ID 1 to a document containing cat, ID 2 to dog, ID 3 to cat again, etc. Whereas the inverted index is the other way around: we map the term to document IDs. If cat appears three times across the collection, I map the single term cat to multiple IDs: 1, 3, and 6, etc. An easier way to see how it really works is with
three documents, where each document is one sentence: "Winter is coming," "Ours is the fury," "The choice is yours." Three documents. The way it works is you divide each word out as a term: "choice" is one term, "coming" is one term, "fury" is one term, "is" is one term, "winter" is one term, and so on. Then you record a frequency: this term appears one time, this one appears one time, this one appears three times. You count how many times each appears and map it to the documents it appears in. So this is basically a way of organizing keywords by how many times they appear and in which documents, so you can quickly find which document, which source, a term comes from. The reason I'm explaining all this is that with keyword indexing, the way the system retrieves information for the user is based on keywords. If the user asks the chatbot about winter, it searches for the term "winter," and based on the frequencies and the documents we mapped it to, it can quickly retrieve document one and pass it to the LLM, and the LLM can then generate a response back to the user.
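Here's that inverted index as a toy Python sketch, using the same three example documents:

```python
# A toy inverted index: each term maps to the documents (and counts) it
# appears in, so a keyword lookup like "winter" is instant.
from collections import defaultdict

docs = {1: "winter is coming", 2: "ours is the fury", 3: "the choice is yours"}

index: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] += 1  # term -> {doc_id: frequency}

print(dict(index["is"]))      # {1: 1, 2: 1, 3: 1}: appears in all three
print(dict(index["winter"]))  # {1: 1}: a query for "winter" retrieves doc 1
```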
So this is how keyword search works. But as you can see, the disadvantage is that you lose all semantic meaning. This inverted keyword indexing method doesn't tell you how concepts are organized together conceptually, the way the vector store does. It loses all conceptual relationships and meanings. That's the disadvantage.
So for now, I'm just going to go with High Quality; you should pretty much always go with High Quality. And by the way, if you use parent-child chunking, you can only use High Quality. So that's that. The final piece is the retrieval settings, and this is the final step of the RAG pipeline: search. We've imported the document, we've broken it down into multiple pieces, we've organized them and stored them somewhere. Now we just need to search them. Under the High Quality index method, there are three different ways to do that.
The first way, the standard one you're probably most familiar with, is vector search. How it works: the user asks a question, say blah blah blah, and the text gets turned into a vector, using the same embedding model that indexed the chunks, and projected into the vector store we already have. So the user's question got translated to a vector, let's say this data point here. Then we look at the surroundings to see which chunks already in the vector store are closest in distance to this query vector, the user-question vector, because remember, the closer the vector points are, the closer they are semantically. So if I search for the top three results, that means I retrieve the three chunks closest to my query vector, the ones most relevant to my question, and those chunks get passed back to the LLM.
That's what Top K means, by the way: if you go to vector search, you can see there's a Top K setting. All it means is retrieving the top K closest chunks; I'll set it back to three. And you have a threshold, a score threshold that you can see. It's basically 0 to 100%, or sometimes just 0 to 1. It's basically a score.
So what score is that? Usually it's a cosine similarity score. Again, don't worry too much about the fancy term; you don't need to know the math. Conceptually, it tells you how close different concepts are in the vector space. How do you define closeness? One way is just by distance, but it's usually by angle: if two vectors have a smaller angle between them, they're more related to each other. That's why we use a similarity score to measure how semantically close they are. So we're basically saying: if the score is above, let's say, 50%, say 0.5, only then will I retrieve those chunks. Anything less than 0.5 means that semantically it isn't close enough to what the user is asking, and therefore I discard those chunks. That is our vector search.
The second way is full-text search. A full-text search basically indexes, or in other words organizes, all the terms in a document using a more advanced keyword search. It uses keyword search to go through all the documents, running a bunch of algorithms you don't need to know about, to judge relevance to the user's question. A full-text search algorithm can pick up things like the words "run" and "ran," which are closely related but slightly different, and it judges relevance based on different parameters: things like word count (how many times terms appear) and proximity (how close the keywords sit together in the document). These all influence the full-text search results.
But usually what we do is use hybrid search. Research basically shows that using either of those alone is not as powerful as combining full-text search and vector search. That combination is what we call hybrid search, and it's the recommended way of doing retrieval. You can tune it in two ways: by weights and by reranking.
Weights, here, are how you adjust the strategy between semantic and keyword. What it means is: OK, I have this vector search and this full-text search; how much should I weigh each? Which one is more important? Should I do a little more vector search than full-text search? This is where the weights come in. So if you set semantic to 0.7, that means, roughly, 70% of the importance goes to semantic and only 30% to keyword. It's not exactly like that, but as a good intuition for what's going on: it places more importance on semantic than on keyword matching.
But the more optimized way to do it is to use something called rerankers. Now, remember when I asked you to set up Jina AI? This is what the reranker models are for. Jina makes really good AI tooling, and they have different models: different embedding models and also different reranking models. It's really useful for RAG applications. And what are rerankers?
Well, think about it this way. When the user asks a question, it gets turned into a vector (the vector indexing we've discussed) using the embedding model, and goes to the vector database; imagine all the documents chunked up and stored in the vector database as vectors, and during a vector search we retrieve from it. But the problem with a pure top-K vector search is that a vector database is optimized for speed and recall. You know what that means? Recall is basically "have I missed anything?" I want to make sure I haven't missed anything. That's the advantage of a vector database: you usually won't miss much, at least with a high Top K. But precisely because vector retrieval casts such a wide net, only some of the retrieved chunks are truly relevant to the user's query. And reranking models are models optimized for relevance: how relevant the user's question truly is to each of these top chunks retrieved from the vector database. That's where this reranking step is really useful.
The reason it's more accurate, by the way, is that with a vector database, you never process the user's query and the documents simultaneously: every time you run vector RAG, you index and chunk all the documents in your knowledge base first, and only later do you embed the user query. Whereas a reranking model processes both simultaneously. That's why it can truly optimize for relevance. What it does is: it's a model that goes through all the candidate chunks, reranks them (it's in the name) by true relevance, and returns only the top three or four most relevant ones. That is why you should almost always be using a reranker, and the way you do that is to just connect the Jina jina-reranker-m0 by configuring the API key within Dify.
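Inside Dify you just pick the model, but for the curious, calling Jina's rerank endpoint directly looks roughly like this. It's a sketch based on Jina's public API docs; double-check the current payload format, especially for the multimodal m0 model.

```python
# Rerank candidate chunks against the user query with Jina's rerank API.
import requests

resp = requests.post(
    "https://api.jina.ai/v1/rerank",
    headers={"Authorization": "Bearer <JINA_API_KEY>"},
    json={
        "model": "jina-reranker-m0",
        "query": "How much is a Core membership?",
        "documents": [            # e.g. the chunks a vector search returned
            "We're open 24/7.",
            "Core membership pricing and what's included.",
            "Class timetable for this week.",
        ],
        "top_n": 3,               # return only the most relevant chunks
    },
)
for hit in resp.json()["results"]:
    print(hit["relevance_score"], hit["index"])
```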
Right, that's it; that's the entire RAG pipeline. I know it's a lot of information to take in, but to summarize: you import the data; you chunk it down, usually using parent-child chunking to enrich your data; you organize and store it in a vector database; you search it using a hybrid search of both keyword (full-text) search and vector search; and finally, you use a reranking model to retrieve only the truly most relevant chunks. That's the most basic RAG pipeline you can think of.
So yeah, this is all you need to know about RAG. I hope this has been helpful and really relevant to how you think about setting up RAG, especially in something like Dify. So I'm just going to go ahead and actually process it: I'm just going to click Save and Process. And it will start processing.
And now, as you can see, it's finished chunking. So I'm just going to go to Documents, and you can see the status is Available. It's now officially in the knowledge base we just built. So, now that it's saved in a vector store, in the knowledge base within the Dify workspace, I'm going to go back to the Studio and back to our chatbot. All you have to do is click on the knowledge retrieval, add the knowledge base we just created, and that's it. And you can set the retrieval settings to use the reranker model we just talked about: set it to jina-reranker-m0. I usually just go with the top three chunks, but you can set four as well if you want; it doesn't make much of a difference. And you can just click Save. And that's it: you've built your first RAG chatbot within Dify.AI.
And now let's talk to it. "Hey there, how can I help you today?" So I'm just going to ask something like, "Hey, how much is the membership?" That's usually a good question, because users usually ask it. And as you can see, remember, it can pull the information from the website, saying the off-peak rate is this much, the joining fee is that much, the Core membership is that much, what exactly you get in each package, etc. And with the follow-up suggestions on, it offers questions like "Are the classes included?" "Is there a student discount?" So you can just ask questions like that.
"Is there a student discount?" It says there's no specific information about student discounts, because it's not on this website at the moment. So we ask something like, "When do you guys open?" And it should be able to answer that. Exactly. So this is the use case: you can have this chatbot on the website to answer the questions that get asked a million times of the staff members; I'm pretty sure so many people ask the same questions, and they can be answered in the chat.
So the way you do that: you click Publish, and you click Publish Update to save it. Now you may ask, wait, I built this cool chatbot; how do I actually deploy it? There are a few different options. You can either use Run App, which is hosted within Dify (a website powered by Dify itself), or you can embed it into your own website. All you have to do is click Copy and paste this code into your website's page code. So if, for example, you're doing it for PureGym, you'd talk to the web developers who built the PureGym site, paste the code I just copied from here into the site, and you'll have a chat widget in the bottom right, and people can just talk to it that way. This is the standard chatbot setup, and this is exactly how you would deploy it. Really simple.
Hopefully this has given you a good idea of how to build this kind of chatbot. I know it's been a long one, but hopefully you got some value out of it. And if you're a business owner and you've seen value in this, if you think this is something useful, please feel free to book a discovery call with me and my team so we can take this a step further and build something even more advanced and more valuable for your business, whether that's chat agents, AI automations, or voice agents. But yeah, if you enjoy this kind of content, please consider liking and subscribing, and I'll speak to you in the next one.