Apple Surprises with CLaRa-7B, a Useful RAG Model: Install It Locally
By Fahd Mirza
Topics Covered
- RAG's Broken Gradient Fixed
- Retrieve Memory Tokens, Not Text
- SCP Captures Semantic Core
- Compact Tokens Slash Context Costs
- Apple Bridges AI Gap Steadily
Full Transcript
Apple has just released CLaRa. CLaRa, which stands for Continuous Latent Reasoning, is a unified RAG framework designed to bridge the gap between document retrieval and answer generation. We are going to install it locally and test it out on a few examples. This is Fahd Mirza, and I welcome you to the channel. Please like the video and subscribe, and consider becoming a member, as that helps a lot. If you are looking for AI updates without hype and fluff, please follow me on X. Before I start the installation, let's talk very
quickly about this model. Traditional RAG systems optimize retrieval and generation separately: the retriever selects documents based on surface-level similarity while the generator processes raw text, creating a broken gradient that prevents end-to-end learning. This is where Apple's new model is trying to help. CLaRa addresses this by mapping both documents and queries into a shared continuous representation space. I will be talking more about its architecture and training, but for now, let's get started with the installation. This is my Ubuntu system, and this is my GPU card: an NVIDIA RTX 6000 with 48 GB of VRAM. If you're looking to rent a GPU or VM at a very affordable price, you can find the link to their website in the video's description, with a 50% discount coupon code for a range of GPUs. So, please do check them out. Okay, next up, I'm going to install transformers, and do make sure that you install it from source so that you get the latest version. It is going to take a couple of minutes.
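For readers following along, the install step boils down to one command; here it is as a minimal Python sketch (in practice you would just run the pip command directly in a shell or notebook cell):

```python
import subprocess
import sys

# Install transformers from source so we pick up the newest model classes.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "git+https://github.com/huggingface/transformers.git",
])
```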
While that happens, let's talk a bit more about this model. As I said, it uses a shared continuous representation space. What happens inside is that, instead of retrieving raw text, the model retrieves compact memory tokens that represent the documents, and this allows the system to pass gradients from the final answer generation back to the retriever. That ensures the retriever learns to select documents that actually help answer the specific query. If you have been using RAG in your organization, which means you are providing your own data to the models, then you know this broken gradient can be a huge issue with RAG-oriented models. So this is where Apple is trying to help out, so that everything remains in context.
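To make the broken-gradient point concrete, here is a toy PyTorch sketch (my own illustration with made-up dimensions, not Apple's code): because documents live as dense memory vectors rather than raw text, retrieval can be a differentiable soft selection, so the generation loss flows back into both the query encoder and the document representations.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim = 64
doc_memories = torch.randn(100, dim, requires_grad=True)  # compressed documents
query = torch.randn(1, dim, requires_grad=True)           # encoded query

scores = query @ doc_memories.T        # relevance scores for every document
weights = F.softmax(scores, dim=-1)    # soft retrieval: fully differentiable
context = weights @ doc_memories       # blended memory handed to the generator

target = torch.randn(1, dim)           # stand-in for the generator's training signal
loss = F.mse_loss(context, target)
loss.backward()

# Unlike text-based RAG, gradients reach the retrieval side:
print(query.grad.norm(), doc_memories.grad.norm())
```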
Also, I will be talking about a very interesting new concept called SCP, which is salient compressor pre-training. But for now, let's carry on with our installation, which should be done any minute.
And now let's download the model. For that, let's first log into Hugging Face. I just need to put in my free token, which I have already grabbed from my profile, and I am now logged in. They have released various checkpoints on Hugging Face, and they also have a GitHub repo; I will drop the link in the video's description. I'm just going to download everything in one go, just to avoid any confusion. You can see that it is downloading all the files, and it's not a huge model: it's 7 billion parameters, but it is mainly geared towards RAG. So let's wait for it.
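Here is a minimal sketch of the login-and-download step using the standard huggingface_hub API. Note that "apple/clara-7b" is a placeholder repo id: check Apple's Hugging Face page for the exact checkpoint names.

```python
from huggingface_hub import login, snapshot_download

# login() prompts for the free access token from your Hugging Face profile.
login()

# Pull every file in the repo in one go (placeholder repo id, see above).
local_dir = snapshot_download(repo_id="apple/clara-7b")
print("Checkpoint downloaded to:", local_dir)
```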
Once it is done, I am going to launch my Jupyter notebook, and then we will play with the model there.
Okay, now let's specify our model. I'm importing transformers, and I will also go with torch. And now let's grab our model.
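A minimal loading sketch, with the same placeholder repo id as before; trust_remote_code is my assumption, since custom RAG architectures often ship their modeling code alongside the checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/clara-7b"  # placeholder: use the actual checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Half precision keeps a 7B model comfortably inside 48 GB of VRAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)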
And the model is now loaded. Let me also show you the VRAM consumption: it is consuming under 15 GB of VRAM.
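If you prefer checking this from Python instead of nvidia-smi, a one-liner like this works on a CUDA build of PyTorch:

```python
import torch

# GPU memory currently allocated by tensors (the loaded weights dominate).
print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GiB allocated")
```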
Now let's see it in action.
So if you look at this simple example, the code provides raw text descriptions of plants and a specific comparison question to the CLaRa model, which compresses the raw text into internal embeddings (numerical vector representations) to generate an answer. And you can see that if I run this, it is fairly quick: it provides a search-generated answer, and the output confirms that the model has successfully identified WINA as the correct genus native to Mexico and Guatemala by reasoning over the provided document context. So all in all, I think this example clearly shows the model's ability to perform instruction-tuned QA, where it digests external documents on the fly to answer questions accurately without hallucinating outside information.
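For readers who want to try something similar, here is a hypothetical sketch of the workflow: compress_documents and answer are stand-in names for whatever the official repo's README actually exposes, and the plant descriptions are invented placeholders.

```python
# Hypothetical usage sketch: CLaRa's real API may differ, so treat the two
# helpers below as stand-ins for the repo's documented calls. The shape of
# the workflow is the point: compress text into memory tokens once, then
# answer questions against those tokens instead of the raw text.
documents = [
    "Genus A: shrubs native to Mexico and Guatemala, with serrated leaves.",
    "Genus B: vines found across Southeast Asia, flowering in early spring.",
]
question = "Which genus is native to Mexico and Guatemala?"

memory_tokens = model.compress_documents(documents)  # hypothetical helper
print(model.answer(question, memory_tokens))         # hypothetical helper
```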
Now this tells us a lot about the benefits of this model, in my opinion. For instance, unlike standard RAG, where the retriever and generator are trained separately, which causes a broken gradient, CLaRa operates in a shared continuous space. This allows the retriever to receive feedback directly from the generation loss, learning to select documents that actually help answer the question rather than just those with surface-level similarity. And after working on various production-grade software, I think this was badly needed, and Apple has done quite well here.
Now let me quickly try to explain this SCP concept in as simple words as possible, because I think this is where a lot of the goodies are as far as this model is concerned. SCP stands for salient compressor pre-training, and it is primarily about ensuring that the compressed embeddings capture essential meaning rather than just superficial patterns. What happens is that standard compression often wastes capacity on reconstructing trivial tokens, whereas SCP forces the model to focus on the semantic core of the text. It does this by using an LLM to synthesize a training dataset comprising simple QA, complex QA, and paraphrased documents. The compressor is then trained on this data to generate compressed vectors that can answer those questions or reconstruct the paraphrased meaning. This whole process ensures the resulting embeddings are semantically rich and digest the document's salient information before end-to-end training begins.
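As a rough illustration of that recipe (my paraphrase in code, not Apple's implementation), the data-synthesis step might look like this:

```python
# Conceptual sketch of SCP as described above. `llm` is a stand-in callable
# for whichever model synthesizes the training data.
def synthesize_scp_targets(document: str, llm) -> dict:
    """Turn one document into the three kinds of supervision SCP uses."""
    return {
        "simple_qa": llm(f"Write a simple question-answer pair about: {document}"),
        "complex_qa": llm(f"Write a multi-hop question-answer pair about: {document}"),
        "paraphrase": llm(f"Paraphrase, keeping every salient fact: {document}"),
    }

# The compressor is then trained so its vectors alone suffice to answer the
# QA pairs and reconstruct the paraphrase, which pushes the embeddings
# toward the semantic core instead of trivial surface tokens:
#   loss = qa_loss(decode(compress(doc)), qa_targets)
#        + reconstruction_loss(decode(compress(doc)), paraphrase)
```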
So I think if you are trying to analyze logs and that sort of streaming data, this could be a real game-changer. Also, CLaRa compresses documents into compact memory tokens, also called dense vectors, instead of feeding raw text into the LLM, and this drastically reduces the required context length. That allows the model to process more documents, faster, and with less computational cost than standard models.
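To see why that matters, here is some back-of-the-envelope arithmetic with illustrative numbers (not figures from the paper):

```python
# Compressing each retrieved document to a fixed budget of memory tokens
# shrinks the prompt the LLM must attend over. Illustrative numbers only.
docs, tokens_per_doc, memory_tokens_per_doc = 10, 512, 16

raw_context = docs * tokens_per_doc                 # 5120 tokens of raw text
compressed_context = docs * memory_tokens_per_doc   # 160 memory tokens
print(f"{raw_context / compressed_context:.0f}x fewer context tokens")  # 32x
```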
standard models. So Apple is bit late to the AI game but I think they are really bridging the gap fairly quickly and steadily. So keep an eye on Apple. I
steadily. So keep an eye on Apple. I
think they might surprise us all just like they have surprised us with this Claraara 7 billion model and still I don't think so this model is that well known. It is still flying under the
known. It is still flying under the radar. Maybe it is too new. So let's see
radar. Maybe it is too new. So let's see how it goes. Uh I think it is as underrated as my channel. Uh one thing I'm still bit confused is that I'm not
sure what the license is here. Um but we will see. Anyway, other than that, good
will see. Anyway, other than that, good stuff from Apple. Again, please like the video and subscribe and please follow me on X as um it helps a lot.