Apple Surprises with CLaRa-7B, a Useful RAG Model: Install It Locally
By Fahd Mirza
Topics Covered
- RAG's Broken Gradient Fixed
- Retrieve Memory Tokens, Not Text
- SCP Captures Semantic Core
- Compact Tokens Slash Context Costs
- Apple Bridges AI Gap Steadily
Full Transcript
Apple has just released CLaRa. CLaRa, which stands for Continuous Latent Reasoning, is a unified RAG framework designed to bridge the gap between document retrieval and answer generation. We are going to install it locally and test it out on a few examples. This is Fahd Mirza, and I welcome you to the channel. Please like the video and subscribe, and consider becoming a member, as that helps a lot. If you are looking for AI updates without hype and fluff, please follow me on X. Before I start the installation, let's talk very
quickly about this model. Traditional RAG systems optimize retrieval and generation separately: the retriever selects documents based on surface-level similarity while the generator processes raw text, creating a broken gradient that prevents end-to-end learning. This is where Apple's new model is trying to help. CLaRa addresses this by mapping both documents and queries into a shared continuous representation space. I will be talking more about its architecture and training, but for now, let's get started with the installation. This is my Ubuntu system, and this is my GPU card: an NVIDIA RTX 6000 with 48 GB of VRAM. If you're looking to rent a GPU or VM at a very affordable price, you can find the link to their website in the video's description, with a 50% discount coupon code for a range of GPUs. So, please do check them out. Okay, next up, I'm going to install transformers, and do make sure that you install it from source so that you get the latest version. It is going to take a couple of minutes.
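For readers following along, the install step boils down to one command; here it is as a minimal Python sketch (in practice you would just run the pip command directly in a shell or notebook cell):

```python
import subprocess
import sys

# Install transformers from source so we pick up the newest model classes.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "git+https://github.com/huggingface/transformers.git",
])
```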
While that happens, let's talk a bit more about this model. As I said, it uses a shared continuous representation space. What happens inside is that, instead of retrieving raw text, the model retrieves compact memory tokens that represent the documents, and this allows the system to pass gradients from the final answer generation back to the retriever. That ensures the retriever learns to select documents that actually help answer the specific query. If you have been using RAG in your organization, which means you are providing your own data to the models, then you know this broken gradient can be a huge issue with RAG-oriented models. So this is where Apple is trying to help out, so that everything remains in context.
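To make the broken-gradient point concrete, here is a toy PyTorch sketch (my own illustration with made-up dimensions, not Apple's code): because documents live as dense memory vectors rather than raw text, retrieval can be a differentiable soft selection, so the generation loss flows back into both the query encoder and the document representations.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim = 64
doc_memories = torch.randn(100, dim, requires_grad=True)  # compressed documents
query = torch.randn(1, dim, requires_grad=True)           # encoded query

scores = query @ doc_memories.T        # relevance scores for every document
weights = F.softmax(scores, dim=-1)    # soft retrieval: fully differentiable
context = weights @ doc_memories       # blended memory handed to the generator

target = torch.randn(1, dim)           # stand-in for the generator's training signal
loss = F.mse_loss(context, target)
loss.backward()

# Unlike text-based RAG, gradients reach the retrieval side:
print(query.grad.norm(), doc_memories.grad.norm())
```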
Also, I will be talking about a very interesting new concept called SCP, which is salient compressor pre-training. But for now, let's carry on with our installation, which should be done any minute.
And now let's download the model. For that, let's first log into Hugging Face. I just need to put in my free token, which I have already grabbed from my profile, and I am now logged in. They have released various checkpoints on Hugging Face, and they also have a GitHub repo; I will drop the link in the video's description. I'm just going to download everything in one go, just to avoid any confusion. You can see that it is downloading all the files, and it's not a huge model: it's 7 billion parameters, but it is mainly geared towards RAG. So let's wait for it.
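Here is a minimal sketch of the login-and-download step using the standard huggingface_hub API. Note that "apple/clara-7b" is a placeholder repo id: check Apple's Hugging Face page for the exact checkpoint names.

```python
from huggingface_hub import login, snapshot_download

# login() prompts for the free access token from your Hugging Face profile.
login()

# Pull every file in the repo in one go (placeholder repo id, see above).
local_dir = snapshot_download(repo_id="apple/clara-7b")
print("Checkpoint downloaded to:", local_dir)
```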
Once it is done, I am going to launch my Jupyter notebook, and then we will play with the model there.
Okay, now let's specify our model. I'm importing transformers, and I will also go with torch. And now let's grab our model.
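A minimal loading sketch, with the same placeholder repo id as before; trust_remote_code is my assumption, since custom RAG architectures often ship their modeling code alongside the checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/clara-7b"  # placeholder: use the actual checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Half precision keeps a 7B model comfortably inside 48 GB of VRAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)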
And the model is now loaded. Let me also show you the VRAM consumption: it is consuming under 15 GB of VRAM.
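If you prefer checking this from Python instead of nvidia-smi, a one-liner like this works on a CUDA build of PyTorch:

```python
import torch

# GPU memory currently allocated by tensors (the loaded weights dominate).
print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GiB allocated")
```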
Now let's see it in action.
So if you look at this simple example, the code provides raw text descriptions of plants and a specific comparison question to the CLaRa model, which compresses the raw text into internal embeddings (numerical vector representations) to generate an answer. And you can see that if I run this, it is fairly quick: it provides a search-generated answer, and the output confirms that the model has successfully identified WINA as the correct genus native to Mexico and Guatemala by reasoning over the provided document context. So all in all, I think this example clearly shows the model's ability to perform instruction-tuned QA, where it digests external documents on the fly to answer questions accurately without hallucinating outside information.
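For readers who want to try something similar, here is a hypothetical sketch of the workflow: compress_documents and answer are stand-in names for whatever the official repo's README actually exposes, and the plant descriptions are invented placeholders.

```python
# Hypothetical usage sketch: CLaRa's real API may differ, so treat the two
# helpers below as stand-ins for the repo's documented calls. The shape of
# the workflow is the point: compress text into memory tokens once, then
# answer questions against those tokens instead of the raw text.
documents = [
    "Genus A: shrubs native to Mexico and Guatemala, with serrated leaves.",
    "Genus B: vines found across Southeast Asia, flowering in early spring.",
]
question = "Which genus is native to Mexico and Guatemala?"

memory_tokens = model.compress_documents(documents)  # hypothetical helper
print(model.answer(question, memory_tokens))         # hypothetical helper
```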
Now this tells us a lot about the benefits of this model, in my opinion. For instance, unlike standard RAG, where the retriever and generator are trained separately, which causes a broken gradient, CLaRa operates in a shared continuous space. This allows the retriever to receive feedback directly from the generation loss, learning to select documents that actually help answer the question rather than just those with surface-level similarity. And after working on various production-grade software, I think this was badly needed, and Apple has done quite well here.
Now let me quickly try to explain this SCP concept in as simple words as possible, because I think this is where a lot of the goodies are as far as this model is concerned. SCP stands for salient compressor pre-training, and it is primarily about ensuring that the compressed embeddings capture essential meaning rather than just superficial patterns. What happens is that standard compression often wastes capacity on reconstructing trivial tokens, whereas SCP forces the model to focus on the semantic core of the text. It does this by using an LLM to synthesize a training dataset comprising simple QA, complex QA, and paraphrased documents. The compressor is then trained on this data to generate compressed vectors that can answer those questions or reconstruct the paraphrased meaning. This whole process ensures the resulting embeddings are semantically rich and digest the document's salient information before end-to-end training begins.
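As a rough illustration of that recipe (my paraphrase in code, not Apple's implementation), the data-synthesis step might look like this:

```python
# Conceptual sketch of SCP as described above. `llm` is a stand-in callable
# for whichever model synthesizes the training data.
def synthesize_scp_targets(document: str, llm) -> dict:
    """Turn one document into the three kinds of supervision SCP uses."""
    return {
        "simple_qa": llm(f"Write a simple question-answer pair about: {document}"),
        "complex_qa": llm(f"Write a multi-hop question-answer pair about: {document}"),
        "paraphrase": llm(f"Paraphrase, keeping every salient fact: {document}"),
    }

# The compressor is then trained so its vectors alone suffice to answer the
# QA pairs and reconstruct the paraphrase, which pushes the embeddings
# toward the semantic core instead of trivial surface tokens:
#   loss = qa_loss(decode(compress(doc)), qa_targets)
#        + reconstruction_loss(decode(compress(doc)), paraphrase)
```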
So I think if you are trying to analyze logs and that sort of streaming data, this could be a real game-changer. Also, CLaRa compresses documents into compact memory tokens, also called dense vectors, instead of feeding raw text into the LLM, and this drastically reduces the required context length. That allows the model to process more documents, faster, and with less computational cost than standard models.
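To see why that matters, here is some back-of-the-envelope arithmetic with illustrative numbers (not figures from the paper):

```python
# Compressing each retrieved document to a fixed budget of memory tokens
# shrinks the prompt the LLM must attend over. Illustrative numbers only.
docs, tokens_per_doc, memory_tokens_per_doc = 10, 512, 16

raw_context = docs * tokens_per_doc                 # 5120 tokens of raw text
compressed_context = docs * memory_tokens_per_doc   # 160 memory tokens
print(f"{raw_context / compressed_context:.0f}x fewer context tokens")  # 32x
```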
standard models. So Apple is bit late to the AI game but I think they are really bridging the gap fairly quickly and steadily. So keep an eye on Apple. I
steadily. So keep an eye on Apple. I
think they might surprise us all just like they have surprised us with this Claraara 7 billion model and still I don't think so this model is that well known. It is still flying under the
known. It is still flying under the radar. Maybe it is too new. So let's see
radar. Maybe it is too new. So let's see how it goes. Uh I think it is as underrated as my channel. Uh one thing I'm still bit confused is that I'm not
sure what the license is here. Um but we will see. Anyway, other than that, good
will see. Anyway, other than that, good stuff from Apple. Again, please like the video and subscribe and please follow me on X as um it helps a lot.