Zernel

The first operating system where the kernel itself understands machine learning.

Faster training. Lower energy costs. Quantum-secure models. Zero code changes.

19.0 TFLOPS on A100 (97.4% of theoretical FP32 peak)
1.8x FP16 speedup with mixed precision
10-20% energy savings via phase-aware power management
50+ built-in ML tools, zero external dependencies

The kernel knows your workload

Zernel's sched_ext scheduler detects five ML workload phases in real time and applies different kernel policies to each. No code changes needed.

# Training step timeline — what the kernel sees:
#
# Phase:         [Data Loading] [Forward Pass] [Backward Pass] [All-Reduce] [Optimizer]
# CPU priority:   HIGH           LOW (yield)    LOW (yield)     HIGH (tc)    HIGH
# GPU clocks:     33%            100%           100%            50%          100%
# Power limit:    60%            100%           100%            70%          100%
#
# Result: GPU never starves for data. NCCL gets network priority.
# Energy reduced 10-20% with <1% throughput impact.
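A minimal sketch of how phases like these could be inferred from coarse telemetry — the thresholds, phase names, and the merged forward/backward case are illustrative assumptions, not Zernel's actual sched_ext internals:

```python
# Hypothetical phase classifier in the spirit of Zernel's phase-aware
# policies. Inputs are coarse utilization signals; thresholds are
# illustrative assumptions, and forward/backward are merged because they
# are hard to distinguish from utilization alone.

def classify_phase(gpu_util, pcie_rx_gbs, nic_gbs, cpu_util):
    """Map utilization signals to a likely training-step phase."""
    if nic_gbs > 5.0:                        # heavy inter-node traffic
        return "all_reduce"
    if gpu_util < 20 and pcie_rx_gbs > 1.0:  # GPU idle, host feeding data
        return "data_loading"
    if gpu_util > 80 and cpu_util < 30:      # compute-bound kernels
        return "forward_backward"
    if gpu_util > 40:
        return "optimizer"
    return "idle"

print(classify_phase(gpu_util=5, pcie_rx_gbs=3.2, nic_gbs=0.1, cpu_util=90))
# -> data_loading
```

A scheduler acting on these labels would then apply the CPU-priority and GPU-clock policies from the timeline above to each detected phase.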

What you can do right now

GPU Management

Clean nvidia-smi replacement. Real-time process viewer, memory tracking, GPU locking, health checks.

zernel gpu top
zernel gpu kill 0

Experiment Tracking

Automatic metric extraction from training output. SQLite-backed, zero config, ZQL queries.

zernel run train.py
zernel exp compare a b
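The mechanism behind this kind of tracking can be sketched in a few lines: scan training output for metric patterns and append rows to SQLite. The regex, schema, and function names here are illustrative assumptions, not Zernel's implementation:

```python
# Hypothetical sketch of automatic metric extraction: pull key=value
# metrics out of a training log line and store them in SQLite. The
# pattern and schema are assumptions for illustration.
import re
import sqlite3

METRIC_RE = re.compile(r"(loss|acc|lr)[=:]\s*([0-9.eE+-]+)")

def log_metrics(db, run_id, line, step):
    """Extract recognized metrics from one stdout line into the DB."""
    for name, value in METRIC_RE.findall(line):
        db.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)",
                   (run_id, step, name, float(value)))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metrics (run TEXT, step INT, name TEXT, value REAL)")
log_metrics(db, "run-a", "step 10: loss=0.532 acc=0.81", step=10)
rows = db.execute("SELECT name, value FROM metrics").fetchall()
print(rows)  # -> [('loss', 0.532), ('acc', 0.81)]
```

Because the store is just SQLite, comparing runs reduces to SQL over the `metrics` table, which is what a query layer like ZQL can sit on top of.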

ML Benchmarks

Prove your hardware performance. GPU TFLOPS, memory bandwidth, DataLoader throughput, ResNet-50.

zernel bench all
zernel bench e2e
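The TFLOPS figure quoted above follows from standard matmul accounting: a square N x N GEMM costs 2N³ floating-point operations, so achieved TFLOPS is that count divided by measured kernel time. A pure-arithmetic sketch (the 7.23 ms timing is back-derived for illustration, not a measurement):

```python
# How a matmul TFLOPS figure is derived: FLOP count of an N x N GEMM
# divided by measured time. No GPU needed for the arithmetic itself.

def matmul_tflops(n, seconds):
    flops = 2 * n ** 3        # multiply-adds in an N x N x N matmul
    return flops / seconds / 1e12

# An A100 finishing a 4096x4096 FP32 matmul in ~7.23 ms:
print(round(matmul_tflops(4096, 7.23e-3), 1))  # -> 19.0
```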

Training Debugger

One command to diagnose why training is slow. GPU, CPU, memory, I/O analysis with fix suggestions.

zernel debug why-slow
zernel debug oom
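A rule-based diagnosis of this kind can be sketched as a small decision function — the signals, thresholds, and suggested fixes below are assumptions for illustration, not the actual `zernel debug` rules:

```python
# Hypothetical sketch of automated slow-training diagnosis: map a few
# utilization signals to a likely bottleneck plus a suggested fix.
# Thresholds are illustrative assumptions.

def diagnose(gpu_util, dataloader_wait_frac, host_to_dev_gbs):
    """Return a one-line bottleneck hypothesis with a fix suggestion."""
    if gpu_util < 50 and dataloader_wait_frac > 0.3:
        return "input-bound: raise DataLoader num_workers / enable pin_memory"
    if gpu_util < 50 and host_to_dev_gbs > 3.0:
        return "transfer-bound: overlap H2D copies with compute"
    if gpu_util > 90:
        return "compute-bound: try FP16/AMP or a larger batch size"
    return "no dominant bottleneck detected"

print(diagnose(gpu_util=35, dataloader_wait_frac=0.45, host_to_dev_gbs=0.5))
# -> input-bound: raise DataLoader num_workers / enable pin_memory
```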

Model Deployment

Deploy to local vLLM, Docker container, or AWS SageMaker. Multi-GPU inference with quantization.

zernel serve start ./model

PQC Security

Quantum-resistant encryption and signatures for model weights. AES-256-GCM + ML-KEM compatible.

zernel pqc sign ./model
zernel pqc encrypt
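The integrity half of this scheme follows a familiar envelope pattern: hash the checkpoint, sign the digest, verify before loading. A toy stdlib sketch — HMAC-SHA256 stands in for a real PQC signature here, and nothing below implements AES-256-GCM or ML-KEM:

```python
# Toy sketch of the sign/verify pattern behind model-weight protection.
# HMAC-SHA256 is a stand-in for a quantum-resistant signature scheme;
# the real commands above use AES-256-GCM and ML-KEM-compatible crypto.
import hashlib
import hmac
import os

def sign(key, weight_bytes):
    """Tag the checkpoint bytes; verification recomputes and compares."""
    return hmac.new(key, weight_bytes, hashlib.sha256).hexdigest()

key = os.urandom(32)
weights = b"\x00" * 1024                  # pretend checkpoint
tag = sign(key, weights)

assert hmac.compare_digest(tag, sign(key, weights))   # verifies
assert tag != sign(key, weights + b"\x01")            # tamper detected
```

The point of signing weights is exactly the tamper case: a single flipped byte in the checkpoint fails verification before the model is ever loaded.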

Fleet Management

GPU fleet dashboard, cost attribution per team, idle detection, capacity planning.

zernel fleet status
zernel fleet costs

Distributed Training

One command for local, SSH multi-node, or Kubernetes PyTorchJob. Automatic NCCL configuration.

zernel job submit --target ssh --nodes 4
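"Automatic NCCL configuration" boils down to setting the standard torch.distributed rendezvous and NCCL environment variables on every node. A sketch of what a launcher must compute per node — the helper itself and the `eth0` interface name are assumptions, but the variable names are standard PyTorch/NCCL settings:

```python
# Hypothetical sketch of the per-node environment a multi-node launcher
# sets up for PyTorch + NCCL. Variable names are standard
# torch.distributed / NCCL settings; values here are illustrative.

def rendezvous_env(master_addr, node_rank, nnodes, gpus_per_node):
    """Environment for one node of an nnodes-wide training job."""
    return {
        "MASTER_ADDR": master_addr,            # rank-0 host
        "MASTER_PORT": "29500",                # torch.distributed default
        "WORLD_SIZE": str(nnodes * gpus_per_node),
        "NODE_RANK": str(node_rank),
        "NCCL_SOCKET_IFNAME": "eth0",          # assumption: NIC name
        "NCCL_DEBUG": "WARN",
    }

env = rendezvous_env("10.0.0.1", node_rank=1, nnodes=4, gpus_per_node=8)
print(env["WORLD_SIZE"])  # -> 32
```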

Compliance

SOC 2 and HIPAA compliance reports. Immutable audit trail, data lineage, model provenance.

zernel audit report --standard soc2

Verified A100 benchmarks

NVIDIA A100-SXM4-80GB | PyTorch 2.10 | CUDA 12.8
GPU Compute (4096x4096)     19.0 TFLOPS
Memory Bandwidth            690 GB/s
Host→Device Transfer        4.5 GB/s
DataLoader (4 workers)      3,413 samples/s
Training Step (FP32)        4.48 ms/step
Training Step (FP16 AMP)    2.49 ms/step (1.8x)
ResNet-50 Training          942 images/s
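The 1.8x AMP figure is just the ratio of the two step times above:

```python
# Sanity check on the quoted AMP speedup: FP32 step time over FP16
# step time, from the benchmark table.
fp32_ms, fp16_ms = 4.48, 2.49
speedup = fp32_ms / fp16_ms
print(f"{speedup:.1f}x")  # -> 1.8x
```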

Why not just use Ubuntu?

                     Stock Ubuntu                        Zernel
CPU Scheduler        CFS (generic)                       sched_ext with 5 ML phase types
GPU Observability    nvidia-smi (poll-based)             5 eBPF probes (real-time, near-zero overhead)
Power Management     Static limits                       Phase-aware dynamic clocks (10-20% savings)
Model Security       Plaintext, RSA keys                 PQC encryption + quantum-resistant signatures
Setup Time           Days (drivers, CUDA, frameworks)    Minutes (pre-installed, pre-validated)
Experiment Tracking  Install MLflow separately           Built-in, zero config
Debugging            nvidia-smi + guessing               zernel debug why-slow (automated)
Cost Tracking        Custom scripts                      zernel fleet costs (built-in)
Compliance           Manual audit                        zernel audit report (one command)

Get started in 5 minutes

$ git clone https://github.com/dyber-pqc/Zernel.git && cd Zernel
$ cargo build --workspace --release
$ export PATH=$PWD/target/release:$PATH

$ zernel doctor               # Check your environment
$ zernel gpu status           # See your GPUs
$ zernel bench quick          # 5-minute benchmark
$ zernel init my-project      # Create a project
$ zernel run train.py         # Train with auto-tracking
$ zernel watch                # Live GPU dashboard

Download Zernel

Bootable ISO — install on bare metal or VM

Server: headless, max GPU memory | Desktop: GNOME + GPU dashboard
Both include: CUDA, PyTorch, JAX, vLLM, Ollama, 50+ Zernel tools

Pre-installed ML stack

Frameworks

PyTorch + CUDA
JAX + CUDA
TensorFlow

LLM

vLLM
Transformers
PEFT, TRL, bitsandbytes

Distributed

DeepSpeed
FairScale
ColossalAI

Developer

JupyterLab
TensorBoard
W&B, MLflow, Gradio

RAG

LangChain
ChromaDB
FAISS-GPU

Local LLM

Ollama
Llama 3.1 8B
(works offline)