Introduction

The rapidly evolving AI era presents new discoveries daily, particularly in software engineering, where AI assistants are now integral to our workflows. While these innovations have boosted productivity, they also bring challenges like:

  • Reliance on cloud-based services and closed models 🛅
  • Privacy and security concerns 🔒
  • Other barriers to entry 🚧

The Challenge of Cloud-Based and Proprietary Models

According to the Chatbot Arena leaderboard, the top entries with the highest Elo ratings are all proprietary models.


This is not surprising, as the best models are trained on massive datasets and require significant compute resources. However, it also means the best models are not freely available to experiment and iterate with: you either need to pay for access, rely on an online workflow, or both.

Privacy and Security Concerns

Using cloud-based models or services often involves sending data to third parties, raising significant privacy and security concerns.

Overcoming Barriers to Entry

The cost of model access and the need for GPUs present significant barriers, especially for beginners or those wanting to experiment without substantial investment.

The Advantages of Local Large Language Models (LLMs)

Local LLMs emerge as an ideal solution, offering privacy, security, and a lower barrier to entry. I explored several options for running models locally.

All of them are great projects that can run the top open-source models locally. I experimented on my MacBook Pros (with and without the M1 chip), and in both cases the performance was great, as was the ease of setup. I found Ollama to be the easiest to set up and use. It can also multiplex between multiple models, it has a very Docker-like feel, and it exposes a REST API that makes it easy to integrate with other tools.
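
To make that concrete, here is a minimal sketch of calling Ollama's REST API from TypeScript. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the llama2 tag is just an example.

```ts
// Minimal sketch: calling a local Ollama instance over its REST API.
// Assumes Ollama is listening on its default port (11434) and that
// the "llama2" model has already been pulled with `ollama pull llama2`.
async function generate(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2",
      prompt,
      stream: false, // return a single JSON object instead of a token stream
    }),
  });
  const data = await res.json();
  return data.response; // the model's completion text
}

generate("Why run LLMs locally? Answer in one sentence.").then(console.log);
```

Swapping the model field for another tag is all it takes to switch models, which is part of what gives Ollama its Docker-like feel.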

Building a Browser Sidekick

Now that the barrier to entry is removed, I got to thinking about how to leverage these models to assist with day-to-day tasks. After mulling it over for a while, I landed on the idea of building a browser sidekick. Since the browser has access to both private and public content, it is a great place to put a local LLM. The idea is to make a local LLM available in the browser via an extension that can assist with tasks such as the following (a sketch of the summarization flow appears after the list):

  • Summarization of web pages
  • Answering questions about the web page
  • Answering questions about local documents
  • Chatting with the LLM from the browser
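
Here is a rough sketch of that summarization flow, assuming a content script that reads the page text and hands it to the local model. The model tag, prompt wording, and truncation limit are illustrative assumptions, not DistiLlama's exact implementation.

```ts
// Rough sketch of page summarization from an extension's content script.
// The model tag, prompt, and truncation below are illustrative only.
async function summarizeCurrentPage(): Promise<string> {
  // document.body.innerText is a crude but serviceable way to grab
  // the readable text of the current page from a content script.
  const pageText = document.body.innerText.slice(0, 8000); // stay within the context window

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2",
      prompt: `Summarize this web page in a few bullet points:\n\n${pageText}`,
      stream: false,
    }),
  });
  return (await res.json()).response;
}
```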

One of the inspirations for this was Brave Leo AI.

DistiLlama

After a lot of prototyping and experimentation, I had a working extension. Now came the hard part: what to name it? After a bit of wordplay, and as a homage to the Llama family of LLMs, I landed on the name DistiLlama.

DistiLlama is primarily built with Ollama, LangChain, Transformers.js, and Voy.
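
To show how those pieces might fit together, here is a hedged sketch of retrieval-augmented Q&A over a page: Transformers.js produces embeddings in the browser, Voy indexes and searches them, and the best-matching chunk is handed to Ollama as context. The model names, chunking, and prompt are assumptions for illustration, not DistiLlama's exact code.

```ts
// Hedged sketch: in-browser retrieval-augmented Q&A.
// Embeddings come from Transformers.js, the vector index is Voy,
// and the answer is generated by a local Ollama model.
import { pipeline } from "@xenova/transformers";
import { Voy } from "voy-search";

// Small sentence-embedding model that runs entirely in the browser.
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embed(text: string): Promise<number[]> {
  const tensor = await embedder(text, { pooling: "mean", normalize: true });
  return Array.from(tensor.data as Float32Array);
}

export async function answerAboutPage(chunks: string[], question: string) {
  // Index every chunk of the page in Voy's in-memory store.
  const resource = {
    embeddings: await Promise.all(
      chunks.map(async (text, i) => ({
        id: String(i),
        title: text,
        url: `#chunk-${i}`, // placeholder; Voy expects a url field
        embeddings: await embed(text),
      }))
    ),
  };
  const index = new Voy(resource);

  // Retrieve the chunk closest to the question...
  const query = new Float32Array(await embed(question));
  const { neighbors } = index.search(query, 1);
  const context = neighbors[0]?.title ?? "";

  // ...and let the local model answer using that context.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2",
      prompt: `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`,
      stream: false,
    }),
  });
  return (await res.json()).response;
}
```

Because the embedding model and the vector index both live in the browser, the only network hop is to localhost, so page content never leaves the machine.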

Here is the main menu of the DistiLlama extension side panel:

Here is a demo of DistiLlama in action, summarizing a web page:

Conclusion

Check out DistiLlama on GitHub and let me know what you think. It has certainly helped me with my daily productivity and I hope it helps you too.


More experiments and more tools are coming soon. Stay tuned 😄