LLM News and Articles Weekly Digest — April 15, 2024
Latest News
x.AI Unveils Its First Multimodal Model, Grok-1.5 Vision
x.AI has announced that its latest flagship model, Grok-1.5 Vision, has vision capabilities on par with (and in some cases exceeding those of) state-of-the-art models.
OpenAI Fires Researchers For Leaking Information
OpenAI has reportedly fired two researchers allegedly linked to the leaking of company secrets, following months of leaks and internal efforts to crack down on such incidents.
Cohere Launches New Rerank 3 Model
This adaptable model works with a wide range of databases and search indexes and drops into existing applications that already have native search. With a single line of code, Rerank 3 can improve search quality or cut the cost of running RAG applications, all while keeping latency low.
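The "single line of code" claim is about where a reranker slots into a retrieval stack: cheap first-stage search fetches candidates, then one rerank call reorders them before they reach the LLM. A minimal sketch of that flow, where `rerank_fn` is a hypothetical stand-in for the actual Rerank 3 API call (not Cohere's real SDK signature):

```python
def rerank_rag_pipeline(query, documents, rerank_fn, top_n=3):
    """Sketch of where a reranker like Rerank 3 slots into a RAG pipeline.

    `rerank_fn` is any function that takes (query, documents) and returns
    one relevance score per document. First-stage retrieval supplies the
    candidate `documents`; the reranker reorders them so only the most
    relevant `top_n` are put in the LLM prompt, cutting prompt size and cost.
    """
    scores = rerank_fn(query, documents)
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```

In a real deployment, `rerank_fn` would wrap the hosted Rerank 3 endpoint; everything upstream and downstream of that one call stays unchanged, which is why integration is cheap.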
Mistral Releases Mixtral 8x22B, an Apache 2.0-Licensed MoE Model
As usual, Mistral released its new 8x22B model via a magnet link. Initial community benchmarks suggest that this first version performs impressively as a base model, scoring 77 on MMLU (a benchmark typically associated with reasoning tasks).
Google’s Gemini Pro 1.5 Enters Public Preview
Google has made its most advanced generative AI model, Gemini 1.5 Pro, available in public preview on its Vertex AI platform. It has a context window of 1 million tokens, can understand audio, offers a JSON mode for developers, and can act on your instructions.
Meta Confirms That Llama 3 Is Coming Next Month — A GPT-4 Competitor?
Meta has confirmed plans to release Llama 3, the next generation of its large language model for generative AI assistants, within the next month.
Articles
Lessons after a half-billion GPT tokens
This article outlines seven key insights gained by Truss, a startup that has rolled out several features heavily reliant on LLMs over the past six months. The insights cover improving prompts, refining tooling, maximizing usage efficiency, understanding GPT's limitations, and more.
The State of Generative AI, 2024
A nuanced analysis and a glimpse of the future. If you only watch the news, it's easy to miss the forest for the trees; this article gathers it all in one place.
The folks at LlamaIndex have launched the LlamaIndex + MistralAI Cookbook Series, a collection of cookbooks for building a range of RAG applications.
The Fears and Opportunities of AI Written Content
AI writing is a balancing act: is it a looming threat to professional writers?
Papers and Repositories
Evaluating Large Language Models on Long Texts
Ada-LEval is a new benchmark for assessing long-context capabilities with length-adaptable questions. It includes two challenging tasks: TSort, arranging text segments in order, and BestAnswer, selecting the best answer from multiple candidates.
karpathy/llm.c: LLM training in simple, raw C/CUDA.
Karpathy's project is a minimalist GPT-2 training framework in C/CUDA, aiming to replicate the PyTorch reference implementation in around 1,000 lines of code while improving performance through direct CUDA integration and tailored CPU optimizations.
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs.
Apple researchers have created Ferret-UI, an advanced multimodal large language model (MLLM) tailored for enhanced interpretation and interaction with mobile user interface (UI) screens.
Rho-1: Not All Tokens Are What You Need.
The authors analyze token importance in language model training, revealing varied loss patterns. This leads to RHO-1, a new model that uses Selective Language Modeling (SLM) to focus training on beneficial tokens rather than treating all tokens equally.
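The core of SLM is that the loss is averaged only over a selected subset of tokens instead of all of them. A minimal numpy sketch of that selection step, with names and the keep ratio chosen for illustration (the paper scores tokens using a reference model trained on high-quality data):

```python
import numpy as np

def selective_lm_loss(train_losses, ref_losses, keep_ratio=0.6):
    """Selective Language Modeling (SLM), sketched.

    Instead of averaging the loss over every token, score each token by its
    excess loss (training-model loss minus reference-model loss) and keep
    only the top `keep_ratio` fraction of tokens in the objective.
    """
    train_losses = np.asarray(train_losses, dtype=float)
    ref_losses = np.asarray(ref_losses, dtype=float)
    excess = train_losses - ref_losses           # high excess = useful, still-unlearned token
    k = max(1, int(len(excess) * keep_ratio))    # number of tokens to keep
    keep_idx = np.argsort(excess)[-k:]           # indices of the top-k excess-loss tokens
    mask = np.zeros_like(excess, dtype=bool)
    mask[keep_idx] = True
    # The SLM objective is the mean loss over the selected tokens only.
    return train_losses[mask].mean(), mask
```

Tokens whose loss the reference model also finds high (e.g., inherently noisy text) get a small excess score and are filtered out, which is the intuition behind "not all tokens are what you need."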
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention.
The work introduces Infini-attention, an attention mechanism within a Transformer block, enabling LLMs to handle infinitely long inputs while maintaining bounded memory and computational requirements.
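The bounded-memory claim comes from a compressive memory whose size is fixed no matter how many segments stream through it. A toy numpy sketch of the memory update and retrieval, assuming the ELU+1 feature map used in linear attention; the paper's delta-rule update variant, learned gating against local attention, and per-head details are omitted:

```python
import numpy as np

def elu_plus_one(x):
    # Non-negative feature map used for linear attention: ELU(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_stream(segments_qkv, d_head):
    """Process a stream of segments with a fixed-size compressive memory.

    Each segment contributes (Q, K, V) arrays of shape (seg_len, d_head).
    The memory M and normalizer z stay (d_head x d_head) and (d_head,)
    regardless of how many segments arrive, which is what bounds memory
    for arbitrarily long inputs.
    """
    M = np.zeros((d_head, d_head))  # compressive memory
    z = np.zeros(d_head)            # normalization term
    outputs = []
    for Q, K, V in segments_qkv:
        sQ, sK = elu_plus_one(Q), elu_plus_one(K)
        # Retrieve context accumulated from all previous segments.
        A_mem = (sQ @ M) / (sQ @ z + 1e-6)[:, None]
        # Fold the current segment into the memory and normalizer.
        M = M + sK.T @ V
        z = z + sK.sum(axis=0)
        outputs.append(A_mem)  # the paper gates this with local attention
    return outputs, M, z
```

Note that the first segment retrieves zeros (the memory is empty), and each later segment reads a summary of everything before it at constant cost per step.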
Thank you for reading!
This blog was originally published at https://shresthakamal.com.np/blog/2024/newsletter-edition-2/.
If you have any feedback or suggestions, please leave a comment.