
This thread is dedicated to the possibility of integrating BERT models into Orion and implementing an LLM inference engine.

BERT models are advancing rapidly, and I would like them to serve Orion in whatever way the user chooses.

With BERT models it is possible to personalize the user's experience without sacrificing privacy.

Implementation

  • A dropdown menu in Settings where the user can choose to download a BERT model
  • Option to use multiple BERT models
  • Ability to specify when a given BERT model is used, based on user-defined criteria
  • Kagi Search collaborating with Orion for a better UX
  • BERT models adjusting site content based on collected data kept privately in the browser
  • A BERT benchmark for Orion
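As a sketch of how the criteria-based model selection from the list above could look: a small registry maps a task chosen in Settings to a downloaded model. Everything here (`modelRegistry`, `selectModel`, the criteria keys) is hypothetical and not part of any Orion API; only the model IDs follow real Hugging Face naming.

```javascript
// Hypothetical mapping from a task (chosen in Settings) to a downloaded model.
// The registry itself is invented; the model IDs follow Hugging Face naming.
const modelRegistry = {
  classification: "distilbert-base-uncased",
  semanticSearch: "sentence-transformers/all-MiniLM-L6-v2",
  qa: "google/mobilebert-uncased",
};

// Pick the model for a given task, falling back to a sensible default.
function selectModel(task, registry = modelRegistry) {
  return registry[task] ?? registry.classification;
}

console.log(selectModel("semanticSearch")); // sentence-transformers/all-MiniLM-L6-v2
console.log(selectModel("unknownTask"));    // falls back to distilbert-base-uncased
```

A real implementation would persist this mapping in the browser's settings store and let each entry point at a locally downloaded model file.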

Possibilities

| Category | Use Case / Capability | Description / Bundled Examples |
| --- | --- | --- |
| 1. Text Analysis & Classification | General Text Classification | Automatically categorizing texts by topic, type, or relevance (e.g., news articles, support tickets, legal documents). This may be used for bookmarks. |
| | Specific Classifications | Identifying sentiment (product reviews, social media), spam (emails, comments), or user intent (chatbots). |
| | Named Entity Recognition (NER) | Extracting and classifying specific information like names, locations, and dates from texts. |
| 2. Information Retrieval & Recommendation | Semantic Search & Question Answering (Q&A) | Understanding search queries or questions to extract relevant information from documents, web pages, or browser history (contextual understanding). |
| | Recommendations & Ranking | Suggesting similar websites, products, or content; ranking results or visited sites based on relevance, preference, or user behavior. |
| | Contextual Autocompletion | Intelligent suggestions for completing inputs (e.g., search bar, forms) based on context and past interactions (beyond simple keyword matching). |
| 3. User Experience Enhancement | Pattern & Anomaly Detection (e.g., "Annoyances") | Identifying unusual or potentially disruptive website behavior patterns (e.g., frequent title changes, pop-ups) by analyzing sequences/logs. |
| | On-Device Personalization | Adjusting site content, layout, or recommendations based on locally stored user data (history, preferences) processed privately within the browser. |
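The semantic-search, ranking, and recommendation rows above all reduce to comparing embedding vectors. A minimal sketch of that comparison (in practice the vectors would come from a model such as MiniLM; the three-dimensional ones below are toy values for illustration):

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank candidate pages by similarity to a query embedding, most similar first.
function rank(queryVec, pages) {
  return [...pages].sort(
    (p, q) => cosineSimilarity(queryVec, q.vec) - cosineSimilarity(queryVec, p.vec)
  );
}

// Toy 3-dimensional "embeddings" (real sentence embeddings are 384+ dimensions).
const query = [1, 0, 0];
const pages = [
  { url: "a.example", vec: [0, 1, 0] },
  { url: "b.example", vec: [0.9, 0.1, 0] },
];
console.log(rank(query, pages)[0].url); // "b.example"
```

For history-sized collections a linear scan like this is fine; larger indexes would want an approximate-nearest-neighbor structure.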

BERT and SLM Models for Browser Use

| Model Name | Params / Model Size | Typical RAM (Inference) | CPU Usage (Browser) | Max Seq. Length | Browser Suitability | Use Cases | Link(s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERT (base) | 110M / 420MB | 400–800 MB | High for long seq., moderate | 512 | Prototyping, not optimal | Text classification, NER, feature extraction | Hugging Face, GitHub Demo |
| BERT (large) | 340M / 1.6GB | 1.2–2.5 GB | Very high | 512 | Not practical for browser | Advanced NLU, classification, QA | Restackio |
| DistilBERT | 66M / 250MB | 200–400 MB | Lower than BERT-base | 512 | Excellent, browser-optimized | Fast text classification, Q&A, sentiment analysis | Hugging Face |
| MobileBERT | 25M / 100MB | 100–200 MB | Low, efficient | 512 | Best for browser/mobile | Lightweight Q&A, classification | Hugging Face, Demo |
| ALBERT (base) | 12M / 45–60MB | 80–160 MB | Very efficient | 512 | Very good, small footprint | Efficient text classification, intent detection | Hugging Face |
| MiniLM | 33M / 120MB | 150–250 MB | Low | 512 | Excellent for search/ranking | Semantic search, ranking, recommendations | Hugging Face |
| TinyBERT | 14M / 55MB | 60–120 MB | Very low | 512 | Excellent | Fast, efficient, competitive on small tasks | Hugging Face |
| ModernBERT | 110M / varies | 400–700 MB | Moderate (browser-optimized) | 4096+ | SoA, large context | Retrieval, classification, code/doc search | Hugging Face, Blog |
| LongFormer | 149M / 575MB | 600–1000 MB | High for long documents | 4096 | Research, long docs | Long doc Q&A, summarization | GitHub |
| BioBERT | 110M / 420MB | 400–800 MB | As BERT-base | 512 | Research, domain-specific | Biomedical QA, extraction | Hugging Face |
| Entity-BERT | 110M / 420MB | 400–800 MB | As BERT-base | 512 | Research, specialized | Entity extraction in domain tasks | Frontiers Paper |
| NeoBERT | 80–120M / TBD | Expected lower | Expected lower | 512+ | Research, next-gen efficient | Classification, retrieval, efficient inference | arXiv |
| Qwen2 (1B) | 1B / 4GB | 2–4 GB | Moderate | 2048+ | Good (quantized) | Multitask, edge, fast | Hugging Face |
| Phi-3 (mini) | 1.8B / 7GB | 4–8 GB | Moderate | 128k | Good (quantized) | Code, reasoning, dialogue | Hugging Face |
| TinyLlama | 1.1B / 4GB | 2–4 GB | Moderate | 2048+ | Good (quantized) | General SLM, tiny, fast | Hugging Face |
| Llama 2 (7B) | 7B / 28GB | 8–16 GB | High, edge device | 4096 | Feasible (quantized/edge) | General, strong SLM | Meta |
| Mistral (7B) | 7B / 28GB | 8–16 GB | High, edge device | 8192 | Feasible (quantized/edge) | Fast, efficient, browser/edge | Hugging Face |
| Gemma 2 (2B) | 2B / 8GB | 4–8 GB | Moderate | 8192 | Good (quantized) | Efficient, edge, general SLM | Google |
| Gemma 3 1B | 1B / 529MB | 1–2 GB | Excellent | 32,000 | Excellent | Text gen, QA, summarization, chat | Google, HF |
| Gemma 3 4B | 4B / 2GB | 3–5 GB | Edge / browser (quantized) | 128,000 | Excellent (quantized) | Text/image gen, multilingual, summarization | Google, HF |
| StableLM 2 1.6B | 1.6B / 6GB | 2–4 GB | Excellent | 8,192 | Excellent | NLU, chat, multilingual tasks | HF |
| MiniCPM | 1.2B / 5GB | 2–4 GB | Excellent | 2,048 | Excellent | General SLM, chat, summarization | HF |
| BLOOMZ 560M | 560M / 2GB | 1–2 GB | Excellent | 2,048 | Excellent | Translation, chat, summarization, instruct | HF |
| Unreleased: Qwen3 SLM | 1–4B (expected) | ≤4 GB (quantized, est.) | Excellent (design goal) | 32–128k (rumor) | Excellent | Multilingual, multimodal, chat, QA | GitHub |
| Unreleased: Llama 4 Mini | 1–4B (expected) | ≤4 GB (quantized, est.) | Excellent (design goal) | 32–128k (rumor) | Excellent | NLU, chat, summarization | Meta |
| Unreleased: DeepSeek SLM v2 | 1–4B (expected) | ≤4 GB (est.) | Excellent | 32–128k (rumor) | Excellent | General SLM, chat | GitHub |
| Unreleased: Falcon Lite 1B | 1B (expected) | ≤2 GB (est.) | Excellent | 8k+ | Excellent | Multilingual NLU, chat | Falcon HF |

Frameworks for Browser BERT Inference

| Framework / Tool | Stage | Supported Models / Formats | Browser Suitability | Key Features / Notes | Link(s) |
| --- | --- | --- | --- | --- | --- |
| TensorFlow.js | Mature | TF.js, BERT, MobileBERT, etc. | Excellent (all major browsers) | Open source, run/convert models, large JS community | TensorFlow.js |
| Transformers.js | Mature | Hugging Face Transformers | Excellent | Open source, easy API for Hugging Face models in browser | Transformers.js |
| ONNX.js | Mature | ONNX (e.g., MiniLM, DistilBERT) | Very good | Open source, runs ONNX models, WASM/WebGL acceleration | ONNX.js GitHub |
| ONNX Runtime Web | Mature | ONNX | Excellent | High performance, WASM, WebGL, quantization support | ONNX Runtime Web |
| WebLLM | Research / Active | LLMs, Transformer variants | Excellent (WebGPU / WebGL browsers) | In-browser LLM inference engine, supports quantized models | WebLLM |
| Browser-BERT | Demo / Research | BERT (converted), TF.js | Excellent for demos | Example project, educational, shows browser deployment | Browser-BERT |
| browser-use | Mature | Integrates with various LLMs / APIs | Excellent (controls any browser) | Open source automation / orchestration tool, lets AI control browser actions | GitHub |

Key Distinction:

  • Frameworks like TensorFlow.js, Transformers.js, ONNX.js, ONNX Runtime Web, and WebLLM are primarily for running the machine learning models (like BERT) directly within the browser's JavaScript environment for tasks like inference.
  • browser-use is an automation and orchestration tool that allows an AI (often an LLM) to control the browser itself (navigate, click, type, scrape). It can utilize the other frameworks to analyze content gathered during automation but doesn't run the models itself in the same way.
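To make the first bullet concrete, here is a minimal sketch of in-browser classification with Transformers.js. It assumes the `@huggingface/transformers` npm package and the `Xenova/distilbert-base-uncased-finetuned-sst-2-english` model; the library is loaded lazily inside the function so the pure helper above it stands on its own. The `pipeline` API is real Transformers.js; the surrounding structure is illustrative.

```javascript
// Softmax over raw logits — a pure helper for turning model outputs into scores.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// In-browser sentiment classification via Transformers.js. The library is
// imported lazily so the rest of this file runs without it being installed.
async function classify(text) {
  const { pipeline } = await import("@huggingface/transformers");
  const classifier = await pipeline(
    "sentiment-analysis",
    "Xenova/distilbert-base-uncased-finetuned-sst-2-english"
  );
  return classifier(text); // e.g. [{ label: "POSITIVE", score: ... }]
}
```

On first call the model weights are downloaded and cached by the browser; subsequent calls run entirely on-device, which is what makes the privacy claims in this thread plausible.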

Notes:

  • Model sizes and RAM / CPU usage can often be reduced by quantization (e.g., float16, int8, int4) using tools such as accelerate estimate-memory.
  • Research frameworks like WebLLM are rapidly evolving and are likely to support more BERT/Transformer variants in the near future.
  • For production browser projects, MobileBERT, DistilBERT, ALBERT, and MiniLM are currently the most practical choices.
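The quantization note above can be made concrete with back-of-the-envelope arithmetic: weight memory is roughly parameter count × bytes per parameter, so BERT-base's 110M fp32 parameters need about 420 MB, while int8 cuts that to roughly a quarter. A tiny sketch (the function name is mine; activations and runtime overhead come on top of these figures):

```javascript
// Rough weight-memory estimate (in MiB) for a model under a given numeric format.
// Real usage is higher: activations, caches, and runtime overhead add to this.
function estimateWeightMB(params, bitsPerParam) {
  const bytes = params * (bitsPerParam / 8);
  return bytes / (1024 * 1024);
}

const bertBaseParams = 110e6;
console.log(estimateWeightMB(bertBaseParams, 32).toFixed(0)); // fp32: "420"
console.log(estimateWeightMB(bertBaseParams, 8).toFixed(0));  // int8: "105"
console.log(estimateWeightMB(bertBaseParams, 4).toFixed(0));  // int4: "52"
```

This matches the 420MB figure for BERT-base in the table above and shows why int8/int4 variants are the ones worth shipping to a browser.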
Xytronix changed the title to Integration of BERT models into Orion.

8 days later

We are actively monitoring LLM models that could be integrated into Orion, but we don't plan to implement this at the moment. Thanks for the post!
