
notes

This page is automatically generated from my day-to-day notes. Idea stolen from Yacine’s website

programming lan

purifai

todo 1

[ ] get the channels dynamically: show the list of enabled yt channels and add a way to enable/disable them, or even add more from the UI

todo 2

[ ] when clicking a video, be able to play it directly in the website to take notes

todo 3

[ ] still needed?? a simple dense retriever can initially retrieve around 100-500 candidates, which can then be reranked with ColBERT to bring the most relevant results to the top.
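a minimal sketch of that retrieve-then-rerank pipeline, with toy 3-d vectors standing in for real embeddings (the corpus, ids, and vectors here are made up):

```python
# stage 1: cheap single-vector dense retrieval narrows the corpus to top-k.
# stage 2: ColBERT-style MaxSim late interaction reranks those candidates.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dense_retrieve(query_vec, corpus, k):
    """Score every doc with one dot product, keep the k best candidates."""
    scored = sorted(corpus, key=lambda d: dot(query_vec, d["vec"]), reverse=True)
    return scored[:k]

def maxsim_rerank(query_toks, candidates):
    """Each query token takes its max similarity over the doc's token
    vectors; summing those maxima gives the ColBERT-style score."""
    def score(doc):
        return sum(max(dot(q, t) for t in doc["tok_vecs"]) for q in query_toks)
    return sorted(candidates, key=score, reverse=True)

corpus = [
    {"id": "a", "vec": [1, 0, 0], "tok_vecs": [[1, 0, 0], [0, 1, 0]]},
    {"id": "b", "vec": [0.9, 0.1, 0], "tok_vecs": [[0, 0, 1]]},
    {"id": "c", "vec": [0, 0, 1], "tok_vecs": [[0, 1, 1]]},
]
cands = dense_retrieve([1, 0, 0], corpus, k=2)     # narrows to a, b
best = maxsim_rerank([[0, 1, 0]], cands)[0]["id"]  # rerank flips the order if needed
print(best)
```

in a real setup stage 1 would be a vector index (the vectordb already in the stack) and stage 2 would use per-token ColBERT embeddings.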

todo 4

[ ] enhance retrieval: lowercase all the text, including the user's query

why pytorch tensors are faster than list in python

write an article, blog post, or tweet about it. highlight the efficiency even on cpu alone and the way the data is laid out to achieve it.
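a quick sketch of the core point for the article, using the stdlib `array` as a stand-in for a tensor's contiguous storage (sizes are approximate and interpreter-dependent):

```python
import sys
from array import array

# a python list stores pointers to boxed float objects scattered on the heap;
# a torch tensor wraps one flat block of raw 8-byte doubles. that contiguous
# layout means less memory and better cache locality, even on cpu.

n = 100_000
boxed = [float(i) for i in range(n)]
flat = array("d", boxed)          # contiguous doubles, tensor-like layout

# list cost = the pointer array plus every boxed float object it points to
list_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
flat_bytes = sys.getsizeof(flat)
print(list_bytes > 3 * flat_bytes)   # the boxed version costs several times more
```

the same layout argument explains the speed side: iterating boxed floats chases a pointer per element, while vectorized ops stream over one buffer.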

learning corne layout

i gotta do a whole video or post about this, it's so hard. the layout is qwerty but it forces you to use all your fingers to type, and to use only the finger each key calls for

micrograd in c

keep going

ai-generated world

research

auto vid

create automated videos to explain topics w the 3b1b lib (manim). NEEDS HEAVY FINE-TUNING

youtube sanitizer: follow the last post -> DEMO

run: vectordb, back, front, embeddings:

agent

add:

movie recommendation system

so fucking annoying to open 20 apps to search for a movie and still not know exactly what to watch. figured out how i can solve this

train 1b model from scratch

just that. if it's multimodal later, even better. make it a tiny agent at the OS level. must have:

learn from myself

reddit browser

query reddit like i do on google with “site:reddit.com” and give me a good answer. it should be based on a few posts and varied answers from people, including upvotes, usernames, and dates.
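a sketch of just the answer-composition step, assuming the posts are already fetched; the field names and sample posts below are made up, not the real reddit API:

```python
from datetime import date

def compose_answer(posts, top_n=3):
    """Rank answers by upvotes and keep the attribution the note asks for:
    upvotes, username, and date next to each excerpt."""
    ranked = sorted(posts, key=lambda p: p["upvotes"], reverse=True)[:top_n]
    lines = [f'{p["text"]} | u/{p["user"]}, {p["upvotes"]} upvotes, {p["date"]}'
             for p in ranked]
    return "\n".join(lines)

posts = [
    {"text": "use X", "user": "alice", "upvotes": 120, "date": date(2024, 5, 1)},
    {"text": "Y is fine", "user": "bob", "upvotes": 40, "date": date(2023, 9, 9)},
    {"text": "avoid Z", "user": "eve", "upvotes": 300, "date": date(2024, 1, 2)},
]
print(compose_answer(posts, top_n=2))
```

the interesting part left open is blending the varied opinions into one answer instead of just listing them; an llm summarizer over these ranked excerpts would slot in after `compose_answer`.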

saturday tasks

take the time to learn all the vim shortcuts and try the zed ide

---

infinite canvas of concepts

you search for a concept and it breaks it down into a graph of concepts; if you click to generate more, it keeps granulating the concepts until the leaves
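a tiny sketch of that expand-on-click behavior; `breakdown` is a stub standing in for whatever model call would actually generate sub-concepts (the example concepts are placeholders):

```python
def breakdown(concept):
    # placeholder generator: the real version would ask an llm for children
    table = {"backprop": ["chain rule", "gradients"],
             "gradients": ["partial derivatives"]}
    return table.get(concept, [])   # empty list => leaf node

def expand(graph, concept):
    """Granulate one node in place, mimicking a click on the canvas."""
    children = breakdown(concept)
    graph[concept] = children
    return children

graph = {}
expand(graph, "backprop")    # first click
expand(graph, "gradients")   # clicking a child keeps granulating
print(graph)
```

storing the canvas as a plain adjacency dict keeps the "generate more on click" step a single key update, which maps nicely onto incremental rendering.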

llms interview

continue the training and find some useful coding tests. is leetcode ok?

fine-tuning

llama3 8b with a function- and api-calling dataset. use qlora
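the core of the qlora setup on one linear layer, as a hedged pure-python sketch: the base weight stays frozen (and 4-bit quantized in real qlora), only the low-rank A and B matrices train, and the effective weight is W + (alpha/r)·B·A. the 2x2 numbers are toys, not model weights:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lora_weight(W, A, B, alpha, r):
    """Effective weight of a LoRA-adapted layer: W + (alpha/r) * B @ A."""
    delta = matmul(B, A)                       # rank-r update
    s = alpha / r                              # lora scaling factor
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen (quantized) base weight
B = [[1.0], [0.0]]             # 2 x r trainable matrix (r = 1)
A = [[0.0, 2.0]]               # r x 2 trainable matrix
print(lora_weight(W, A, B, alpha=2, r=1))
```

with peft this corresponds to `r` and `lora_alpha` in the adapter config; the point of the sketch is just that only r·(d_in + d_out) parameters train per layer.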

automate-

start work with bookmarks and classy?? more ideas will appear while i do my stuff

qlora paper

keep going with “lora-and-qlora.txt” notes, double quantization.
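a pure-python sketch of the double quantization idea from the paper: block-wise absmax quantization leaves one fp scale per block, and those scales are themselves quantized to shave memory further. block size and bit levels here are toys:

```python
def absmax_quant(xs, levels=7):
    """Quantize a block to signed ints in [-levels, levels]; return ints + scale."""
    scale = max(abs(x) for x in xs) or 1.0
    return [round(x / scale * levels) for x in xs], scale

def double_quantize(weights, block=4):
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]
    q_blocks, scales = zip(*(absmax_quant(b) for b in blocks))
    # second quantization pass: quantize the per-block scales themselves
    q_scales, scale_of_scales = absmax_quant(list(scales), levels=127)
    return list(q_blocks), q_scales, scale_of_scales

def dequant(q_blocks, q_scales, sos):
    out = []
    for qb, qs in zip(q_blocks, q_scales):
        s = qs / 127 * sos                  # recover the block scale
        out.extend(q / 7 * s for q in qb)
    return out

w = [0.1, -0.4, 0.2, 0.05, 1.0, -0.5, 0.25, 0.0]
qb, qs, sos = double_quantize(w)
approx = dequant(qb, qs, sos)
print(max(abs(a - b) for a, b in zip(w, approx)) < 0.1)
```

only one full-precision scalar (the scale of scales) survives per tensor; everything else is small ints, which is where the extra savings come from.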

read list

automate

do some updates and run the script from the start as a background process. look for more use cases w the gpt api. add a dictionary.
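a minimal sketch of the background-process part; the script, log, and pid filenames are placeholders:

```shell
# launch the script detached so it keeps running after the terminal closes
nohup python3 my_script.py >> automate.log 2>&1 &
echo $! > automate.pid          # keep the pid so it can be stopped later
# stop it later with: kill "$(cat automate.pid)"
```

for anything long-lived, a systemd user service or a cron `@reboot` entry would be sturdier than nohup, but this is the zero-setup version.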

add gpt

add system prompt

see how i can add a system prompt to my 4-bit quantized gemma2, which is not fine-tuned for that. try to add a prompt that makes it act as natural as possible, like chatting with a friend
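gemma's chat template has no system role, so the usual workaround is to fold the system text into the first user turn. a sketch of that, using gemma's documented turn markers; the persona text is just an example:

```python
SYSTEM = ("you are a close friend chatting casually. keep replies short, "
          "warm, and informal.")

def build_prompt(history):
    """history: list of (role, text) pairs with roles 'user' / 'model'."""
    parts, injected = [], False
    for role, text in history:
        if role == "user" and not injected:
            text = f"{SYSTEM}\n\n{text}"     # system prompt rides the first user turn
            injected = True
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")   # cue the model to answer
    return "".join(parts)

prompt = build_prompt([("user", "hey, how's it going?")])
print(prompt.startswith("<start_of_turn>user"))
```

since the base model was never tuned to obey a system role, keeping the persona short and concrete (tone, length, register) tends to matter more than a long instruction block.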

fix multi-download and long videos

output:
Chunk 0: 199.95 MB
Chunk 1: 199.95 MB
Chunk 2: 199.95 MB
Chunk 3: 199.95 MB
Chunk 4: 199.95 MB
Chunk 5: 199.95 MB
Chunk 6: 74.19 MB

figure out what's the max size that can be handled
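a sketch of the chunking side that would produce output like the log above; the ~200 MB cap and filenames are assumptions, and the real ceiling is exactly the open question:

```python
CHUNK_BYTES = 200 * 1024 * 1024   # assumed per-chunk ceiling, to be confirmed

def split_file(path, chunk_bytes=CHUNK_BYTES):
    """Split `path` into numbered chunk files; returns the chunk sizes in bytes."""
    sizes = []
    with open(path, "rb") as f:
        i = 0
        while chunk := f.read(chunk_bytes):
            with open(f"{path}.chunk{i}", "wb") as out:
                out.write(chunk)
            sizes.append(len(chunk))
            i += 1
    return sizes

# tiny demo with a small cap so it runs anywhere: 2500 bytes in 1000-byte chunks
with open("demo.bin", "wb") as f:
    f.write(b"x" * 2500)
print(split_file("demo.bin", chunk_bytes=1000))
```

note the last chunk is simply whatever remains, matching the short 74.19 MB tail in the log.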

quantize a model

i'll quantize a 1b model to see how it behaves, probably 1-bit quantized and running locally
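the simplest form of 1-bit quantization, sketched in pure python: keep only each weight's sign plus one scale, commonly mean(|w|), per tensor (bitnet-style binarization; the toy weights are made up):

```python
def binarize(ws):
    """1-bit quantization: signs plus a single mean-absolute-value scale."""
    scale = sum(abs(w) for w in ws) / len(ws)
    signs = [1 if w >= 0 else -1 for w in ws]
    return signs, scale

def dequant(signs, scale):
    """Every reconstructed weight is +/- the shared scale."""
    return [s * scale for s in signs]

w = [0.3, -0.2, 0.5, -0.4]
signs, scale = binarize(w)
print(signs, round(scale, 2))
```

the per-weight error is large, which is exactly what makes the "see how it acts" experiment interesting on a small 1b model.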

what’s next?

continue w the ‘my-own’ series; tokenizer done. what's next to do?

tokenizer

finish ‘my-own-tokenizer’. implement one like gpt's with pre-written code
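the gpt family uses byte-pair encoding; a sketch of just the merge loop (count adjacent id pairs, merge the most frequent into a new id, repeat) on a toy byte string:

```python
from collections import Counter

def most_common_pair(ids):
    """Most frequent adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id); i += 2
        else:
            out.append(ids[i]); i += 1
    return out

ids = list(b"aaabdaaabac")       # raw bytes as the initial token ids
pair = most_common_pair(ids)     # (97, 97), i.e. "aa"
ids = merge(ids, pair, 256)      # first learned merge gets the next free id
print(len(ids))
```

training just repeats this loop for the desired vocab size, recording each merge; encoding replays the recorded merges in order.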

to-do idea

i gotta build everything i use on a daily basis from scratch and document it in my ml resources as well:

FIX CUDA OR QWEN

I DUNNO IF CUDA IS FAILING OR THE QWEN MODEL I'M USING; TRY WITH OTHERS
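a quick triage sketch to separate the two suspects before swapping models: check the driver first, then torch's view of cuda:

```shell
# if nvidia-smi fails, it's the driver/runtime, not the qwen weights
nvidia-smi || echo "gpu driver not visible"
# if the driver is fine but torch says False, it's the cuda/torch install
python3 -c 'try:
    import torch; print("torch sees cuda:", torch.cuda.is_available())
except ImportError:
    print("torch not installed here")'
```

only if both checks pass is the model itself the remaining suspect, and trying another model becomes a meaningful test.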

dataset

i gotta do the dataset, i hate doing datasets

llama.cpp

run qwen cli

keep on it

i gotta keep going with what's written in my notebook

record demo

how this section of the website works, showing logs, and how fast it is

uber location

research how uber map/path/location works

i think it's not gonna work but i should try more examples

bert

i think i never finished the bert paper, i'll check it later

reading roberta paper

idk why i haven't done it yet

books section

add a books section to my website w reviews