This thread is dedicated to the possibility of integrating BERT models into Orion and implementing an LLM inference engine.
BERT models are advancing rapidly, and I would like them to serve Orion in whatever way the user chooses.
With BERT models it is possible to personalize the user's experience without sacrificing privacy.
Implementation
- A dropdown menu in Settings where the user can choose to download a BERT model (a rough sketch of such a settings model follows this list)
- An option to use multiple BERT models
- Letting the user specify when each BERT model is used, based on given criteria
- Kagi Search working together with Orion for a better UX
- BERT models adjusting site content based on collected data that is kept private in the browser
- A BERT benchmark for Orion would be required
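To make the dropdown idea concrete, here is a minimal TypeScript sketch of what a per-task model configuration could look like. Every type and field name is hypothetical, not Orion's actual API:

```typescript
// Hypothetical settings model for per-task BERT selection.
// None of these names are Orion's actual API; this is a sketch only.

type BertTask = "classification" | "semantic-search" | "ner" | "personalization";

interface BertModelChoice {
  modelId: string;   // e.g. a Hugging Face checkpoint id
  task: BertTask;    // which browser feature this model serves
  enabled: boolean;  // per-task user opt-in
  maxRamMb: number;  // skip loading if the memory budget would be exceeded
}

interface BertSettings {
  downloadOnWifiOnly: boolean;
  models: BertModelChoice[];
}

// Example: the user enables one small model for bookmark classification only.
const settings: BertSettings = {
  downloadOnWifiOnly: true,
  models: [
    { modelId: "Xenova/distilbert-base-uncased", task: "classification", enabled: true, maxRamMb: 400 },
  ],
};
```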
Possibilities
| Category | Use Case / Capability | Description / Examples |
|---|---|---|
| 1. Text Analysis & Classification | General Text Classification | Automatically categorizing texts by topic, type, or relevance (e.g., news articles, support tickets, legal documents). This may be used for bookmarks. |
| | Specific Classifications | Identifying sentiment (product reviews, social media), spam (emails, comments), or user intent (chatbots). |
| | Named Entity Recognition (NER) | Extracting and classifying specific information such as names, locations, and dates from texts. |
| 2. Information Retrieval & Recommendation | Semantic Search & Question Answering (Q&A) | Understanding search queries or questions to extract relevant information from documents, web pages, or browser history (contextual understanding); see the sketch after this table. |
| | Recommendations & Ranking | Suggesting similar websites, products, or content; ranking results or visited sites based on relevance, preference, or user behavior. |
| | Contextual Autocompletion | Intelligent suggestions for completing inputs (e.g., search bar, forms) based on context and past interactions, beyond simple keyword matching. |
| 3. User Experience Enhancement | Pattern & Anomaly Detection (e.g., "annoyances") | Identifying unusual or potentially disruptive website behavior patterns (e.g., frequent title changes, pop-ups) by analyzing sequences/logs. |
| | On-Device Personalization | Adjusting site content, layout, or recommendations based on locally stored user data (history, preferences) processed privately within the browser. |
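To illustrate the semantic search use case, here is a minimal sketch of on-device ranking of history titles, assuming Transformers.js (`@xenova/transformers`) and its published MiniLM checkpoint; the helper names are illustrative:

```typescript
// Sketch: semantic search over history titles with MiniLM embeddings,
// using the documented Transformers.js feature-extraction pipeline.
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embed(text: string): Promise<Float32Array> {
  // Mean-pooled, L2-normalized sentence embedding.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return output.data as Float32Array;
}

function cosine(a: Float32Array, b: Float32Array): number {
  // Vectors are already normalized, so the dot product is the cosine similarity.
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

// Rank history entries against a query, entirely on-device.
async function rankHistory(query: string, titles: string[]) {
  const q = await embed(query);
  const scored = await Promise.all(
    titles.map(async (t) => ({ title: t, score: cosine(q, await embed(t)) }))
  );
  return scored.sort((a, b) => b.score - a.score);
}
```

In practice the title embeddings would be computed once and cached rather than recomputed per query; nothing here leaves the browser.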
BERT and SLM Models for Browser Use
| Model Name | Params / Model Size | Typical RAM (Inference) | CPU Usage (Browser) | Max Seq. Length | Browser Suitability | Use Cases | Link(s) |
|---|---|---|---|---|---|---|---|
| BERT (base) | 110M / 420 MB | 400–800 MB | High for long seq., moderate | 512 | Prototyping, not optimal | Text classification, NER, feature extraction | Hugging Face, GitHub Demo |
| BERT (large) | 340M / 1.6 GB | 1.2–2.5 GB | Very high | 512 | Not practical for browser | Advanced NLU, classification, QA | Restackio |
| DistilBERT | 66M / 250 MB | 200–400 MB | Lower than BERT-base | 512 | Excellent, browser-optimized | Fast text classification, Q&A, sentiment analysis | Hugging Face |
| MobileBERT | 25M / 100 MB | 100–200 MB | Low, efficient | 512 | Best for browser/mobile | Lightweight Q&A, classification | Hugging Face, Demo |
| ALBERT (base) | 12M / 45–60 MB | 80–160 MB | Very efficient | 512 | Very good, small footprint | Efficient text classification, intent detection | Hugging Face |
| MiniLM | 33M / 120 MB | 150–250 MB | Low | 512 | Excellent for search/ranking | Semantic search, ranking, recommendations | Hugging Face |
| TinyBERT | 14M / 55 MB | 60–120 MB | Very low | 512 | Excellent | Fast, efficient, competitive on small tasks | Hugging Face |
| ModernBERT | 110M / varies | 400–700 MB | Moderate (browser-optimized) | 4096+ | State of the art, large context | Retrieval, classification, code/doc search | Hugging Face, Blog |
| Longformer | 149M / 575 MB | 600–1000 MB | High for long documents | 4096 | Research, long docs | Long-doc Q&A, summarization | GitHub |
| BioBERT | 110M / 420 MB | 400–800 MB | As BERT-base | 512 | Research, domain-specific | Biomedical QA, extraction | Hugging Face |
| Entity-BERT | 110M / 420 MB | 400–800 MB | As BERT-base | 512 | Research, specialized | Entity extraction in domain tasks | Frontiers Paper |
| NeoBERT | 80–120M / TBD | Expected lower | Expected lower | 512+ | Research, next-gen efficient | Classification, retrieval, efficient inference | arXiv |
| Qwen2 (1B) | 1B / 4 GB | 2–4 GB | Moderate | 2048+ | Good (quantized) | Multitask, edge, fast | Hugging Face |
| Phi-3 (mini) | 1.8B / 7 GB | 4–8 GB | Moderate | 128k | Good (quantized) | Code, reasoning, dialogue | Hugging Face |
| TinyLlama | 1.1B / 4 GB | 2–4 GB | Moderate | 2048+ | Good (quantized) | General SLM, tiny, fast | Hugging Face |
| Llama 2 (7B) | 7B / 28 GB | 8–16 GB | High, edge device | 4096 | Feasible (quantized/edge) | General, strong SLM | Meta |
| Mistral (7B) | 7B / 28 GB | 8–16 GB | High, edge device | 8192 | Feasible (quantized/edge) | Fast, efficient, browser/edge | Hugging Face |
| Gemma 2 (2B) | 2B / 8 GB | 4–8 GB | Moderate | 8192 | Good (quantized) | Efficient, edge, general SLM | |
| Gemma 3 1B | 1B / 529 MB | 1–2 GB | Excellent | 32k | Excellent | Text gen, QA, summarization, chat | Google, HF |
| Gemma 3 4B | 4B / 2 GB | 3–5 GB | Edge/browser (quantized) | 128k | Excellent (quantized) | Text/image gen, multilingual, summarization | Google, HF |
| StableLM 2 1.6B | 1.6B / 6 GB | 2–4 GB | Excellent | 8192 | Excellent | NLU, chat, multilingual tasks | HF |
| MiniCPM | 1.2B / 5 GB | 2–4 GB | Excellent | 2048 | Excellent | General SLM, chat, summarization | HF |
| BLOOMZ 560M | 560M / 2 GB | 1–2 GB | Excellent | 2048 | Excellent | Translation, chat, summarization, instruct | HF |
| Unreleased: Qwen3 SLM | 1–4B (expected) | ≤4 GB (quantized, est.) | Excellent (design goal) | 32–128k (rumored) | Excellent | Multilingual, multimodal, chat, QA | GitHub |
| Unreleased: Llama 4 Mini | 1–4B (expected) | ≤4 GB (quantized, est.) | Excellent (design goal) | 32–128k (rumored) | Excellent | NLU, chat, summarization | Meta |
| Unreleased: DeepSeek SLM v2 | 1–4B (expected) | ≤4 GB (est.) | Excellent | 32–128k (rumored) | Excellent | General SLM, chat | GitHub |
| Unreleased: Falcon Lite 1B | 1B (expected) | ≤2 GB (est.) | Excellent | 8k+ | Excellent | Multilingual NLU, chat | Falcon HF |
Modern SLMs (Qwen2, Phi-3, TinyLlama, Mistral) are designed to be both more efficient and more capable than classic BERT, especially for generation and multitask use.
For pure browser deployment, models of roughly 100–250 MB or smaller are strongly preferred. Quantization (4–8 bit) can reduce the RAM and storage needs of larger LLMs.
RAM and CPU usage estimates are for inference (not training) at float32 precision; quantization to float16 or int8 reduces requirements further (see https://huggingface.co/docs/accelerate/en/usage_guides/model_size_estimator and https://krishansubudhi.github.io/deeplearning/2019/09/20/BertMemoryAnalysis.html).
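The table's RAM figures follow a simple rule of thumb: weight memory is roughly parameter count times bytes per parameter, with activations and runtime overhead on top. A minimal sketch of that arithmetic:

```typescript
// Back-of-the-envelope weight memory: parameter count × bytes per parameter.
// Activations and runtime overhead come on top, so real usage is higher,
// which is roughly what the table's RAM ranges reflect.

const BYTES_PER_PARAM = { float32: 4, float16: 2, int8: 1, int4: 0.5 } as const;

function weightMemoryMb(params: number, dtype: keyof typeof BYTES_PER_PARAM): number {
  return (params * BYTES_PER_PARAM[dtype]) / (1024 * 1024);
}

// BERT-base (110M params): ~420 MB at float32, ~105 MB at int8.
console.log(weightMemoryMb(110e6, "float32").toFixed(0)); // ≈ 420
console.log(weightMemoryMb(110e6, "int8").toFixed(0));    // ≈ 105
```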
Frameworks for Browser BERT Inference
| Framework / Tool | Stage | Supported Models / Formats | Browser Suitability | Key Features / Notes | Link(s) |
|---|---|---|---|---|---|
| TensorFlow.js | Mature | TF.js, BERT, MobileBERT, etc. | Excellent (all major browsers) | Open source, run/convert models, large JS community | TensorFlow.js |
| Transformers.js | Mature | Hugging Face Transformers | Excellent | Open source, easy API for Hugging Face models in browser (usage sketch below) | Transformers.js |
| ONNX.js | Mature | ONNX (e.g., MiniLM, DistilBERT) | Very good | Open source, runs ONNX models, WASM/WebGL acceleration | ONNX.js GitHub |
| ONNX Runtime Web | Mature | ONNX | Excellent | High performance, WASM, WebGL, quantization support | ONNX Runtime Web |
| WebLLM | Research / active | LLMs, Transformer variants | Excellent (WebGPU/WebGL browsers) | In-browser LLM inference engine, supports quantized models | WebLLM |
| Browser-BERT | Demo / research | BERT (converted), TF.js | Excellent for demos | Example project, educational, shows browser deployment | Browser-BERT |
| browser-use | Mature | Integrates with various LLMs / APIs | Excellent (controls any browser) | Open source automation/orchestration tool, lets AI control browser actions | GitHub |
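As a concrete example of the first group, here is a minimal in-browser sentiment classification sketch using the documented Transformers.js `pipeline` API; the checkpoint is one of the Xenova ONNX conversions on Hugging Face:

```typescript
// Sketch: in-browser sentiment classification with Transformers.js.
// Swap in any compatible DistilBERT checkpoint.
import { pipeline } from "@xenova/transformers";

const classifier = await pipeline(
  "sentiment-analysis",
  "Xenova/distilbert-base-uncased-finetuned-sst-2-english"
);

const result = await classifier("Orion with on-device BERT would be great.");
console.log(result); // e.g. [{ label: "POSITIVE", score: 0.99 }]
```

On first use the model files are downloaded and cached; every run after that is fully local.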
Key Distinction:
- Frameworks like TensorFlow.js, Transformers.js, ONNX.js, ONNX Runtime Web, and WebLLM run machine learning models (such as BERT) directly within the browser's JavaScript environment, i.e., they perform the inference itself.
- browser-use is an automation and orchestration tool that lets an AI (often an LLM) control the browser itself (navigate, click, type, scrape). It can use the other frameworks to analyze content gathered during automation, but it does not run the models itself.
Notes:
- Model sizes and RAM/CPU usage can often be reduced by quantization (e.g., float16, int8, int4); expected memory needs can be estimated ahead of time with tools such as `accelerate estimate-memory`.
- Research frameworks like WebLLM are rapidly evolving and are likely to support more BERT/Transformer variants in the near future.
- For production browser projects, MobileBERT, DistilBERT, ALBERT, and MiniLM are currently the most practical choices (see the ONNX Runtime Web sketch after these notes).
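Since those practical choices ship as quantized ONNX files, here is a skeleton of loading and running one with ONNX Runtime Web; the model path is hypothetical and the token ids are hard-coded stand-ins for real tokenizer output:

```typescript
// Sketch: running a quantized DistilBERT with ONNX Runtime Web.
import * as ort from "onnxruntime-web";

// Load the model (path is hypothetical) using the WASM backend.
const session = await ort.InferenceSession.create("/models/distilbert-int8.onnx", {
  executionProviders: ["wasm"],
});

// Token ids would come from a real tokenizer; fixed values here for illustration.
const inputIds = new ort.Tensor("int64", BigInt64Array.from([101n, 2023n, 102n]), [1, 3]);
const attentionMask = new ort.Tensor("int64", BigInt64Array.from([1n, 1n, 1n]), [1, 3]);

// Input names follow the Hugging Face ONNX export convention.
const outputs = await session.run({ input_ids: inputIds, attention_mask: attentionMask });
console.log(Object.keys(outputs)); // e.g. ["logits"]
```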