notes
This page is automatically generated from my day-to-day notes. Idea stolen from Yacine’s website
purifai
todo 1
[ ] get the channels dynamically — show the list of enabled yt channels and add a way to enable/disable them or even add more from the UI
todo 2
[ ] when clicking the video, be able to play it directly in the website to take notes;
- should the notes be stored per user-video?
- should i create a knowledge graph?
- a chart of consumed content? history of watched videos?
todo 3
[ ] still needed?? — a simple dense retriever can initially retrieve around 100-500 candidates, which can then be reranked with ColBERT to bring the most relevant results to the top.
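a toy sketch of the retrieve-then-rerank idea above (all vectors and doc ids here are made up; the dense score and the ColBERT-style maxsim are simplified stand-ins for real encoders):

```python
# stage 1: a cheap single-vector dense score shortlists candidates,
# stage 2: a finer ColBERT-style maxsim over token vectors reranks them.
# everything here is a toy stand-in for real embedding models.

def dense_score(q, d):
    # single-vector dot product (stand-in for a dense retriever)
    return sum(a * b for a, b in zip(q, d))

def maxsim_score(q_tokens, d_tokens):
    # ColBERT-style late interaction: each query token vector takes its
    # best match among the document's token vectors, then sum over query tokens
    return sum(
        max(sum(a * b for a, b in zip(qt, dt)) for dt in d_tokens)
        for qt in q_tokens
    )

def retrieve_then_rerank(query_vec, query_tokens, docs, k=100):
    # docs: list of (doc_id, doc_vec, doc_token_vecs)
    # stage 1: shortlist the top-k by the cheap dense score
    shortlist = sorted(docs, key=lambda d: dense_score(query_vec, d[1]), reverse=True)[:k]
    # stage 2: rerank only the shortlist with the finer maxsim score
    reranked = sorted(shortlist, key=lambda d: maxsim_score(query_tokens, d[2]), reverse=True)
    return [doc_id for doc_id, _, _ in reranked]
```

the point is that maxsim is only paid on the 100-500 shortlisted docs, not the whole corpus.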
todo 4
[ ] enhance retrieval — make all the text in lowercase, even the user’s query
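a minimal sketch of that normalization, applied to both the indexed text and the user's query (the `normalize` helper is hypothetical, not from the codebase):

```python
# hypothetical normalize step; run it on documents at index time AND on
# every query, so "PyTorch" and "pytorch" land on the same entries
def normalize(text: str) -> str:
    # casefold() is a more aggressive lower() (handles e.g. German ß);
    # collapsing whitespace is a bonus cleanup
    return " ".join(text.casefold().split())
```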
why pytorch tensors are faster than list in python
write an article, blog or tw about it. highlight the efficiency using just the cpu and the way the data is handled to achieve it.
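a rough benchmark sketch for that article, using numpy as a cpu stand-in for torch tensors (same underlying idea: contiguous memory and vectorized C loops vs boxed python objects and interpreter overhead; sizes and repeat counts are arbitrary):

```python
import timeit
import numpy as np

n = 100_000
xs = list(range(n))     # python list of boxed int objects
arr = np.arange(n)      # contiguous typed buffer, like a cpu tensor

# elementwise doubling: interpreted loop vs one vectorized C call
t_list = timeit.timeit(lambda: [x * 2 for x in xs], number=50)
t_arr = timeit.timeit(lambda: arr * 2, number=50)

print(f"list: {t_list:.4f}s  array: {t_arr:.4f}s  speedup ~{t_list / t_arr:.0f}x")
```

swapping `np.arange`/`arr * 2` for `torch.arange`/`tensor * 2` shows the same effect on cpu.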
learning corne layout
i gotta do a whole video or post about this, it’s so hard. the layout is qwerty but forces you to use all your fingers to type, and each finger only covers the keys it’s meant to
micrograd in c
keep going
ai world generated
research
auto vid
create automated videos to explain topics w the 3b1b lib. NEEDS HEAVY FINE-TUNING
youtube sanitizer: follow last post —> DEMO
run: vectordb, back, front
embeddings:
- should i use one vectordb per channel?
- is the way the text is chunked right now enough to just create the vector db?
retrieve:
- hybrid? qdrant? what?
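on the chunking question: a minimal sketch of fixed-size chunking with overlap (sizes are arbitrary guesses, not tuned values), so a sentence cut at a boundary still appears whole in at least one chunk:

```python
# naive character-window chunker; real pipelines often split on sentence
# or token boundaries instead, this just shows the overlap idea
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    # max(..., 1) guarantees at least one chunk for short texts
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```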
agent
add:
- play albums/playlist
- add stt -> learn about stt first, look for a lightweight model to run locally
- calendar access
- email crud
- add tts -> learn about tts
- improvements for research?
- note-taking
- recommendations based on note-taking??? vectorstore and then what?
movie recommendation system
so fucking annoying to open 20 apps to search for a movie and not know exactly what to watch. figured out how i can solve this
train 1b model from scratch
just that. if multimodal later better. make it a tiny agent in OS level. must have:
- stt recognition
- function calling
- passive listening/active/waiting (something like the ‘hey alexa’ behaviour)
learn from myself
- take a screenshot of my screen every 15’ and categorize what is on it.
- build a graph of concepts.
- explain each concept so you can navigate through them.
- search for a way to connect new concepts with old concepts that do not appear in the text used to classify
- recommendation system
- analytics.
- add a heatmap like the github commit one, but fill it with color on the days i learned something (basically content related to programming, ml, papers, ide, etc).
- add the url or metadata to the prompt in order to understand which page i’m on.
- amount of hours on my pc and which pages/apps i use the most.
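the heatmap data model above can be sketched as a per-day counter, like github’s contribution cells (the dates here are made-up examples):

```python
# one cell per day; the count drives the cell's color intensity
from collections import Counter
from datetime import date

def heatmap(events: list[date]) -> Counter:
    # events: the days a "learning" screenshot/page was detected
    return Counter(events)

cells = heatmap([date(2024, 5, 1), date(2024, 5, 1), date(2024, 5, 2)])
```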
reddit browser
query reddit like i do google with “site:reddit.com” and gimme a good answer. should be based on a few posts and varied answers from people; include upvotes, usernames and dates.
saturday tasks
take the time to learn all the vim shortcuts and try zed ide
---
infinit canvas of concepts
you search for a concept or something like that and it breaks it down into a graph of concepts; if you click to generate more, it keeps granulating the concepts until the leaves
llms interview
continue the training and find some useful coding tests. is leetcode ok?
fine-tuning
llama3 8b with function and api calling dataset. use qlora
automate-
start working with bookmarks and classy?? more ideas will appear while i do my stuff
qlora paper
keep going with “lora-and-qlora.txt” notes, double quantization.
read list
- Deep Learning by ian goodfellow
- Hands-On Generative AI with Transformers and Diffusion Models???
automate
do some updates and run the script from the start as a background process. look for more use cases w the gpt api. add dictionary.
add gpt
- should i add it to my automate tool
- check and change shortcuts. cannot use ctrl+s or ctrl+d for example
add system prompt
see how i can add a system prompt to my 4-bit quantized gemma2, which is not fine-tuned for that. try to add a prompt that makes it act as natural as possible, like chatting with a friend
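one workaround worth trying: since gemma’s chat format has no system role, fold the “system” text into the first user turn using gemma’s `<start_of_turn>`/`<end_of_turn>` markers. the persona text below is just an example, and whether the base/non-tuned model follows it needs testing:

```python
# hypothetical persona; tweak until it chats naturally
SYSTEM = "you are my friend. answer casually, short sentences, no formality."

def build_prompt(user_msg: str) -> str:
    # gemma-style turn markers; the "system" text rides inside the user turn
    return (
        "<start_of_turn>user\n"
        f"{SYSTEM}\n\n{user_msg}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```

feed the returned string straight to the quantized model as the raw prompt.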
fix multi-download and long videos
- check yt download bc i’m downloading the whole playlist from a link and i just want the mp3 from the vid url
- check how to handle long videos so i don’t make a lot of requests to the asr api.
output:
- Chunk 0: 199.95 MB
- Chunk 1: 199.95 MB
- Chunk 2: 199.95 MB
- Chunk 3: 199.95 MB
- Chunk 4: 199.95 MB
- Chunk 5: 199.95 MB
- Chunk 6: 74.19 MB
figure out what’s the max size that can be handled
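a sketch of the splitting side (the 200 MB cap below just matches the log above; the real api limit still needs checking):

```python
MAX_CHUNK_BYTES = 200 * 1024 * 1024  # hypothetical cap, verify against the asr api

def split_bytes(data: bytes, max_bytes: int = MAX_CHUNK_BYTES) -> list[bytes]:
    # naive byte split; a real audio splitter should cut on silence or
    # frame boundaries instead, or the chunks won't decode cleanly
    return [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]
```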
quantize a model
i’ll quant a 1b model to see how it acts, probably 1-bit quantized running locally
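before going to 1-bit, the scale/round/clamp idea is easiest to see at int8. a naive symmetric quantization sketch (weights are made up, and real quantizers work per-channel/per-group, not over a flat list):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # symmetric: map [-max_abs, +max_abs] onto [-127, 127] with one scale
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    # reconstruction is lossy; the error is what quality evals measure
    return [v * scale for v in q]
```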
what’s next?
continue w the ‘my-own’ series, tokenizer done. what’s next to do?
tokenizer
finish ‘my-own-tokenizer’. apply one like gpt’s with pre-written code
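the core bpe step in gpt-style tokenizers, as a minimal sketch (toy ids, no vocab handling): count adjacent pairs, merge the most frequent one into a new id, repeat.

```python
from collections import Counter

def most_common_pair(ids: list[int]) -> tuple[int, int]:
    # count adjacent id pairs across the sequence
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    # replace every occurrence of `pair` with `new_id`, left to right
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out
```

training loops this until the target vocab size; encoding replays the learned merges in order.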
to-do idea
i gotta build everything i use on a daily basis from scratch and document it in my ml resources as well:
- tokenizer
- transformers core parts (embedding layer, positional encoding, attention mechanism, feed-forward)
- lm head?
- training loop?
- sampling methods?
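the attention mechanism from the list above, as a pure-python sketch (toy shapes, no batching, no masking, single head):

```python
import math

def softmax(xs: list[float]) -> list[float]:
    # subtract the max for numerical stability
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    # Q, K, V: lists of vectors (seq_len x d); returns seq_len x d_v
    d = len(K[0])
    out = []
    for q in Q:
        # scaled dot-product scores of this query against every key
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        # weighted sum of value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out
```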
FIX CUDA OR QWEN
I DUNNO IF IT’S CUDA FAILING OR THE QWEN MODEL THAT I’M USING, TRY WITH OTHERS
dataset
i gotta do the dataset, i hate doing datasets
llama.cpp
run qwen cli
keep on it
i gotta keep on with what’s written in my notebook
record demo
how this section of the web works, showing logs and how fast it is
uber location
research how uber map/path/location works
mmr search?
i think it’s not gonna work but i should try more examples
bert
i think i never finished the bert paper, i’ll check it later
reading roberta paper
idk why i’ve not done it yet
books section
add a books section in my website w reviews