Synoros Foundry

Run AI Models Locally

Name: Synoros Foundry GPU Servers
Brand: Synoros Foundry
Availability: In stock

Pre-built computers that run large language models on your desk or network. From compact boxes for personal use to rack servers for production workloads. Every system is tested for 48+ hours, warrantied, and ships with a guide showing exactly which models it can run.


Configurations

VRAM range: 16-768GB

Starting price: $499

Hardware warranty: 90 days to 1 year

Why Buy From Us

Every build is assembled, tested, and documented by the same engineers who run these systems for production LLM inference.

Tested and Validated

Every system runs real AI workloads for 48+ hours before shipping. We verify thermals, memory, and model performance so you know it works out of the box.

Fraction of the Price

We build from quality refurbished enterprise components instead of charging new-retail markups. Same hardware, 40-60% less than comparable pre-built systems.

Hardware Warranty

Non-GPU components covered for 1 year. Used GPUs covered for 90 days. New GPUs carry a full 1-year warranty. DOA replacement within 14 days.

Model Guidance Included

Each configuration ships with tested model recommendations and quantization guidance so you know exactly what runs and at what quality level.

Transparent Pricing

We publish our exact component costs, sources, and markup formula. See how every dollar is spent.


No Vendor Lock-in

Standard server hardware, standard GPUs, standard Linux. No proprietary firmware, no licensing fees, no support contracts required. You own it outright.

Three Series

Foundry Lite

CB-16 / CB-24 / CB-32 / CB-32D / CB-48 / CB-64

Local AI boxes for your desk or home network. Quiet, compact, plugs into a standard outlet. Run models like Llama, Mistral, and Qwen privately on your own hardware. From $499.

16-64GB GPU memory | Desk/shelf size | 1-2 GPUs | Standard outlet | $499-$2,499

Foundry Workstations

WS-48 / WS-72 / WS-96 / WS-128 / WS-192T / WS-320 / WS-384

Multi-GPU towers for teams and heavier workloads. Run 70B+ parameter models or serve multiple users on your local network. 3-4 GPUs in a tower chassis.

48-384GB GPU memory | Tower | 3-4 GPUs | 120-240V | $1,299-$82,999

Foundry Rack & Servers

R4-128 / R8-192 / R8-256 / R4-192A / R4-192W / R4-320 / R4-384 / S8-768

Rackmount servers for production deployment, model training, and infrastructure. 4-8 GPUs, designed for datacenter or server closet installation.

128-768GB GPU memory | 4U/5U rack | 4-8 GPUs | 208/240V | $5,499-$75,999

Quality-Certified Refurbished Parts

We source quality-certified refurbished and lightly used enterprise components from verified suppliers. Every GPU, CPU, and memory module is individually tested and validated during our 48-hour burn-in process. This is how we keep prices 40-60% below comparable new-build configurations without compromising reliability. Used GPUs carry a 90-day warranty and new GPUs a 1-year warranty; all other components are covered for 1 year. DOA replacement within 14 days.

Choose Your Configuration

Ultra-budget inference boxes with datacenter GPUs

CB-16 | Ships in 1-2 weeks

16GB

Synoros CostaBox 16

Cheapest local LLM box that actually works

Compact Desktop / First-time local LLM user, student, or hobbyist

GPU: 1x Tesla P100 16GB

CPU: Xeon E3-1245 v3 (4C/8T)

RAM: 8GB DDR4

DISK: 256GB SSD

BUS: PCIe 3.0 / Compact Desktop

Starting at

$499

Ships in 1-2 weeks

Sweet spot

7B-13B models at Q8 (near-full quality)

Stretch

24B-27B at q4 quantization

Configurable

+CPU: Xeon E3-1245 v3 4-core (standard, has iGPU) / Xeon E3-1275 v3 4-core (faster, has iGPU) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
+RAM: 8GB DDR4 (standard) / 16GB DDR4 / 32GB DDR4
+Storage: 256GB SSD (standard) / 512GB NVMe / 1TB NVMe
+Display: Headless (no display adapter, SSH only, standard) / iGPU display (no extra card) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•Compact chassis limits cooling. GPU is power-limited for thermals

•Single PCIe slot. Not expandable without changing chassis

•16GB VRAM ceiling means larger models require aggressive quantization
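The 16GB ceiling can be checked with back-of-envelope arithmetic: weight size is roughly parameters × bits per weight ÷ 8. The sketch below is an illustration, not our sizing tool. The bits-per-weight figures are approximate averages for common GGUF quantization types (an assumption), and the fixed `overhead_gb` allowance for KV cache and runtime buffers is a hypothetical placeholder.

```python
# Approximate effective bits per weight for common GGUF quantization types.
# Rough averages (assumption): K-quants mix block formats, so the effective
# size sits a little above the nominal bit width.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,
    "q6_k": 6.56,
    "q5_k_m": 5.69,
    "q4_k_m": 4.85,
}


def weight_gb(params_billion: float, quant: str) -> float:
    """Size of the model weights alone, in GB (10**9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # billions of parameters * bytes per parameter = GB
    return params_billion * bits / 8


def fits_in_vram(params_billion: float, quant: str, vram_gb: float,
                 overhead_gb: float = 1.5) -> bool:
    """True if the weights plus a fixed allowance fit in VRAM.

    overhead_gb is a hypothetical placeholder for KV cache and runtime
    buffers; real usage grows with context length.
    """
    return weight_gb(params_billion, quant) + overhead_gb <= vram_gb


for params, quant in [(7, "f16"), (13, "q8_0"), (24, "q4_k_m"), (27, "q4_k_m")]:
    size = weight_gb(params, quant)
    print(f"{params}B {quant}: ~{size:.1f} GB weights, "
          f"fits 16GB: {fits_in_vram(params, quant, 16)}")
```

With these assumptions a 70B model at q4_k_m comes out near 42GB of weights, in line with the ~43GB figure quoted for the CostaBox Duo 48.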

⚡Standard 120V outlet. Total system draw under 300W.

↑Upgrade path: CostaBox 24 or CostaBox 32

Configure CB-16

CB-24 | Ships in 1-2 weeks

24GB

Synoros CostaBox 24

24GB VRAM for 27B model inference

Compact Desktop / Developer, researcher, or small team wanting real 27B inference

GPU: 1x Tesla P40 24GB

CPU: Xeon E3-1245 v3 (4C/8T)

RAM: 8GB DDR4

DISK: 256GB SSD

BUS: PCIe 3.0 / Compact Desktop

Starting at

$799

Ships in 1-2 weeks

Sweet spot

13B-27B models (Gemma 2 27B, Qwen 2.5 14B, Mistral Small)

Stretch

35B at q4 quantization

Configurable

+CPU: Xeon E3-1245 v3 4-core (standard, has iGPU) / Xeon E3-1275 v3 4-core (faster, has iGPU) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
+RAM: 8GB DDR4 (standard) / 16GB DDR4 / 32GB DDR4
+Storage: 256GB SSD (standard) / 512GB NVMe / 1TB NVMe
+Display: Headless (no display adapter, SSH only, standard) / iGPU display (no extra card) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•P40 has no tensor cores. Slower FP16 than V100, but 24GB is the draw

•Compact chassis limits cooling. GPU power-limited for thermals

•Single PCIe slot. Not expandable

⚡Standard 120V outlet. Total system draw under 350W.

↑Upgrade path: CostaBox 32 or CostaBox Duo 48

Configure CB-24

CB-32D | Ships in 1-2 weeks

32GB

Synoros CostaBox Duo 32

Two P100s for the price of one V100

Mini Tower / Budget buyer who wants 2-way parallelism

GPU: 2x Tesla P100 16GB

CPU: Xeon E5-1620 v3 (4C/8T)

RAM: 16GB DDR4 ECC

DISK: 256GB SSD

BUS: PCIe 3.0 / Mini Tower

Starting at

$749

Ships in 1-2 weeks

Sweet spot

24B-35B with 2-way tensor parallelism

Stretch

70B at q4 with partial CPU offload (q4 weights alone are ~42GB)

Configurable

+RAM: 16GB DDR4 ECC / 8GB DDR4 ECC (budget) / 32GB DDR4 ECC / 64GB DDR4 ECC
+CPU: Xeon E5-1620 v3 4-core (standard) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
+Storage: 256GB SSD / 512GB SSD / 1TB NVMe
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•2x P100 is slower per-token than 1x V100 (no tensor cores, PCIe communication overhead)

•Models larger than 16GB must be split across both cards. Any single card holds at most 16GB

•Tower chassis is larger than SFF CostaBox
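Splitting a model across the two 16GB cards divides the weights roughly evenly, which is why a model that overflows one P100 can still fit on two. A minimal sketch of that check, assuming an even split and a hypothetical per-card reserve for KV cache and runtime buffers (replicated tensors and uneven layer splits are ignored):

```python
def per_gpu_weight_gb(total_weight_gb: float, num_gpus: int) -> float:
    """Share of the weights each GPU holds under an even split."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return total_weight_gb / num_gpus


def fits_split(total_weight_gb: float, num_gpus: int, vram_per_gpu_gb: float,
               reserve_gb_per_gpu: float = 1.0) -> bool:
    """True if each GPU's share plus a reserve fits in that GPU's VRAM.

    reserve_gb_per_gpu is a hypothetical allowance, not a measured value.
    """
    share = per_gpu_weight_gb(total_weight_gb, num_gpus)
    return share + reserve_gb_per_gpu <= vram_per_gpu_gb


# ~19.4 GB is roughly a 32B model at q4_k_m (32 * 4.85 / 8).
print(fits_split(19.4, 1, 16))  # one P100: share is the whole 19.4 GB
print(fits_split(19.4, 2, 16))  # two P100s: ~9.7 GB per card
print(fits_split(42.4, 2, 16))  # 70B q4 on two P100s: ~21.2 GB per card
```

The three calls print False, True, False: the 32B model needs both cards, and 70B at q4 still overflows them without CPU offload.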

⚡Standard 120V outlet. 2x250W GPUs + host = ~600W peak.

↑Upgrade path: CostaBox Duo 48 or Foundry WS-48

Configure CB-32D

CB-48 | Ships in 1-2 weeks

48GB

Synoros CostaBox Duo 48

48GB VRAM in a desktop tower: Dual P40

Mini Tower / Serious hobbyist or small team wanting 70B at home

GPU: 2x Tesla P40 24GB

CPU: Xeon E5-1620 v3 (4C/8T)

RAM: 16GB DDR4 ECC

DISK: 256GB SSD

BUS: PCIe 3.0 / Mini Tower

Starting at

$1,299

Ships in 1-2 weeks

Sweet spot

35B-70B at q4 quantization

Stretch

70B at q5/q6 with partial CPU offload (q5 weights alone are ~50GB)

Configurable

+RAM: 16GB DDR4 ECC / 8GB DDR4 ECC (budget) / 32GB DDR4 ECC / 64GB DDR4 ECC
+CPU: Xeon E5-1620 v3 (budget) / E5-2680 v3 (performance)
+Storage: 256GB SSD / 512GB SSD / 1TB NVMe
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•P40 has no tensor cores. Capacity-first, not speed-first

•70B at q4 is a tight fit (~43GB). Limited context window

•2 slots used in Z440/Z620. No further GPU expansion
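The "limited context window" follows from KV-cache arithmetic: every token of context stores one key and one value vector per layer per KV head. The sketch below uses the published Llama 3 70B shape (80 layers, 8 KV heads under grouped-query attention, head dimension 128) with an FP16 cache. The ~43GB weight figure comes from the note above; the 1GB runtime reserve is a hypothetical placeholder.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2,
                   batch: int = 1) -> int:
    """Bytes needed for the KV cache (factor 2 = one key + one value)."""
    return (2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
            * context_tokens * batch)


def max_context_tokens(free_vram_gb: float, n_layers: int, n_kv_heads: int,
                       head_dim: int, bytes_per_elem: int = 2) -> int:
    """Tokens of KV cache that fit in the VRAM left over after the weights."""
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return int(free_vram_gb * 10**9) // per_token


# Llama 3 70B: 80 layers, 8 KV heads, head_dim 128.
print(kv_cache_bytes(80, 8, 128, context_tokens=1))   # 327680 bytes per token
print(kv_cache_bytes(80, 8, 128, 8192) / 2**30)       # 2.5 GiB for 8K context

# 48 GB total, ~43 GB of q4 weights, 1 GB hypothetical runtime reserve.
print(max_context_tokens(48 - 43 - 1, 80, 8, 128))
```

Under these assumptions roughly 12K tokens of FP16 context fit, and concurrent users share that budget.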

⚡Standard 120V outlet. 2x250W GPUs + host = ~600W peak.

↑Upgrade path: CostaBox Duo 64 or Foundry WS-72

Configure CB-48

CB-32 | Ships in 1-2 weeks

32GB

Synoros CostaBox 32

V100 tensor cores in a desktop form factor

Compact Desktop / Developer who wants speed and capacity in the smallest box

GPU: 1x Tesla V100 32GB

CPU: Xeon E3-1245 v3 (4C/8T)

RAM: 16GB DDR4

DISK: 256GB SSD

BUS: PCIe 3.0 / Compact Desktop

Starting at

$1,399

Ships in 1-2 weeks

Sweet spot

27B-35B models with room for context

Stretch

70B at aggressive q3/q4 with CPU offload

Configurable

+CPU: Xeon E3-1245 v3 4-core (standard, has iGPU) / Xeon E3-1275 v3 4-core (faster, has iGPU) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
+RAM: 16GB DDR4 / 8GB DDR4 (budget) / 32GB DDR4 / 64GB DDR4
+Storage: 256GB SSD / 512GB SSD / 1TB NVMe
+Display: Headless (no display adapter, SSH only, standard) / iGPU display (no extra card) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•Compact chassis limits cooling. May need power limiting

•Single PCIe slot. Not expandable without a chassis swap

•V100 PCIe (not SXM). Still excellent for single-card inference

⚡Standard 120V outlet. Total system draw under 400W.

↑Upgrade path: CostaBox Duo 64 or Foundry WS-96

Configure CB-32

CB-64 | Ships in 1-2 weeks

64GB

Synoros CostaBox Duo 64

Dual V100s with tensor cores in a tower

Mini Tower / Developer or researcher wanting 70B with real speed

GPU: 2x Tesla V100 32GB

CPU: Xeon E5-1650 v3 (6C/12T)

RAM: 16GB DDR4 ECC

DISK: 512GB NVMe

BUS: PCIe 3.0 / Mini Tower

Starting at

$2,499

Ships in 1-2 weeks

Sweet spot

70B q4 with comfortable headroom

Stretch

70B q6 for higher-quality output (q8 weights are ~75GB and need CPU offload)

Configurable

+RAM: 16GB DDR4 ECC (standard) / 32GB DDR4 ECC / 64GB DDR4 ECC / 128GB DDR4 ECC
+CPU: Xeon E5-1650 v3 6-core (standard) / Xeon E5-2680 v3 12-core / Xeon E5-2690 v3 12-core (higher clocks)
+Storage: 512GB SSD / 1TB NVMe
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•V100 PCIe cards have no NVLink connector (NVLink is available only on SXM versions). Tensor parallelism runs over PCIe

•Z440/Z620 has 2 PCIe x16 slots. Not expandable to 3+ GPUs

•Higher cost than P40 Duo but meaningfully faster per token
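"Meaningfully faster per token" can be bounded with a common rule of thumb: single-stream decode reads every weight once per generated token, so memory bandwidth divided by model size gives a ceiling on tokens per second. The sketch assumes datasheet bandwidth figures and ideal tensor-parallel scaling; real throughput is lower, and with a layer-by-layer split the GPUs take turns, so `tp_degree=1` applies in that case.

```python
# Datasheet peak memory bandwidth in GB/s.
MEM_BANDWIDTH_GBS = {
    "Tesla P40 24GB": 346,
    "Tesla P100 16GB": 732,
    "Tesla V100 32GB PCIe": 900,
}


def decode_ceiling_tps(model_gb: float, bandwidth_gbs: float,
                       tp_degree: int = 1) -> float:
    """Upper bound on single-stream tokens/s for a memory-bound decode.

    With tensor parallelism each GPU streams only its shard, so aggregate
    bandwidth scales with tp_degree (ideal case, communication ignored).
    """
    return bandwidth_gbs * tp_degree / model_gb


model_gb = 42.4  # ~70B at q4_k_m
for name in ("Tesla P40 24GB", "Tesla V100 32GB PCIe"):
    ceiling = decode_ceiling_tps(model_gb, MEM_BANDWIDTH_GBS[name], tp_degree=2)
    print(f"2x {name}: <= {ceiling:.0f} tokens/s")
```

The ratio between the two ceilings (roughly 2.6x) reflects the bandwidth gap between the cards.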

⚡Standard 120V outlet. 2x250W GPUs + host = ~650W peak.

↑Upgrade path: Foundry WS-96 or Foundry Rack 128

Configure CB-64

Multi-GPU tower workstations

WS-48 | Ships in 1-2 weeks

48GB

Synoros Foundry WS-48

3x P100 in a dual-Xeon tower: Entry multi-GPU

Tower / Hobbyist or first-time local LLM buyer

GPU: 3x Tesla P100 16GB

CPU: Dual Xeon E5-2620 v3 (12C/24T total)

RAM: 32GB DDR4 ECC

DISK: 256GB SSD

BUS: PCIe 3.0 / Tower

Starting at

$1,299

Ships in 1-2 weeks

Sweet spot

24B-35B parameter models

Stretch

70B q4 experiments

Configurable

+RAM: 16GB DDR4 ECC (headless) / 32GB DDR4 ECC (standard) / 64GB DDR4 ECC / 128GB DDR4 ECC
+CPU: Xeon E5-2620 v3 (budget) / E5-2680 v4 (performance)
+Storage: 256GB SSD boot (standard) / 512GB NVMe boot / 1TB NVMe boot / + additional 2.5" or 3.5" SATA drives (up to 4x 3.5" + 4x 2.5" bays available)
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•3x250W GPUs exceed HP's official 3x225W graphics envelope. Power-limited and thermally validated during burn-in

•PCIe 3.0 bandwidth ceiling visible during model load/offload, not during steady-state decode

•Pascal-generation FP16 with no tensor cores. Prompt processing is compute-bound; small-batch token generation is mostly memory-bandwidth-bound

⚡Standard 120V outlet (1275W PSU). Power-limited GPUs draw ~225W each.

↑Upgrade path: Synoros Foundry WS-96 or Rack 128

Configure WS-48

WS-72 | Ships in 1-2 weeks

72GB

Synoros Foundry WS-72

Cheapest path to 70B-capable workstation VRAM

Tower / Capacity-first tinkerer or budget model collector

GPU: 3x Tesla P40 24GB

CPU: Dual Xeon E5-2620 v3 (12C/24T total)

RAM: 32GB DDR4 ECC

DISK: 256GB SSD

BUS: PCIe 3.0 / Tower

Starting at

$2,099

Ships in 1-2 weeks

Sweet spot

70B q4 models

Stretch

122B low-quant experiments

Configurable

+RAM: 16GB DDR4 ECC (headless) / 32GB DDR4 ECC (standard) / 64GB DDR4 ECC / 128GB DDR4 ECC
+CPU: Xeon E5-2620 v3 (budget) / E5-2680 v4 (performance)
+Storage: 256GB SSD boot (standard) / 512GB NVMe boot / 1TB NVMe boot / + additional SATA drives (up to 4x 3.5" + 4x 2.5" bays available)
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•P40 has weaker FP16 performance than P100/V100. Capacity-first, not speed-first

•PCIe 3.0 bandwidth ceiling on model load/offload

•No tensor cores. Pure CUDA compute

⚡Standard 120V outlet (1275W PSU). 3x250W GPU draw.

↑Upgrade path: Synoros Foundry WS-96 or R4-192

Configure WS-72

WS-96 | Ships in 1-2 weeks

96GB

Synoros Foundry WS-96

Best-value used tower for serious 70B work

Tower / Power user who wants 70B to feel real on a tower

GPU: 3x Tesla V100 32GB

CPU: Dual Xeon E5-2680 v4 (28C/56T total)

RAM: 64GB DDR4 ECC

DISK: 512GB NVMe

BUS: PCIe 3.0 / Tower

Starting at

$3,999

Ships in 1-2 weeks

Sweet spot

70B q5/q6 models

Stretch

122B q3-ish with CPU offload

Configurable

+RAM: 16GB DDR4 ECC (headless) / 32GB DDR4 ECC / 64GB DDR4 ECC (standard) / 128GB DDR4 ECC
+CPU: Xeon E5-2620 v3 (budget) / E5-2680 v4 (performance)
+Storage: 256GB SSD boot / 512GB NVMe boot (standard) / 1TB NVMe boot / + additional SATA drives (up to 4x 3.5" + 4x 2.5" bays available)
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
+Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•Still a PCIe 3.0 tower. Best value, not a modern platform

•V100 PCIe cards have no NVLink connector (NVLink is available only on SXM versions). Inter-GPU traffic runs over PCIe 3.0

•Model load times limited by PCIe 3.0 host bandwidth
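The PCIe 3.0 limit on loading can be sized directly. PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, and each later generation doubles the rate, so an x16 link peaks near 15.75 GB/s per direction on Gen3, 31.5 GB/s on Gen4, and 63 GB/s on Gen5. A sketch, assuming the transfer runs at whichever is slower, the source (disk or RAM) or the link; the 3.5 GB/s NVMe figure is a hypothetical example and protocol overhead is ignored:

```python
# Signalling rate per lane in GT/s for PCIe generations 3-5.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_gbs(gen: int, lanes: int = 16) -> float:
    """Theoretical one-direction throughput in GB/s (128b/130b encoding)."""
    return GT_PER_S[gen] * lanes * (128 / 130) / 8


def transfer_seconds(size_gb: float, link_gbs: float, source_gbs: float) -> float:
    """Seconds to move size_gb when the slower of source and link sets the pace."""
    return size_gb / min(link_gbs, source_gbs)


card_gb = 32.0   # filling one V100 32GB
nvme_gbs = 3.5   # hypothetical NVMe read speed
for gen in (3, 4, 5):
    link = pcie_gbs(gen)
    print(f"Gen{gen} x16 ~{link:.2f} GB/s: link-bound {card_gb / link:.1f}s, "
          f"from NVMe {transfer_seconds(card_gb, link, nvme_gbs):.1f}s")
```

Under these assumptions the link becomes the bottleneck only when data comes from a source faster than the link, such as weights already cached in system RAM or layers offloaded to the CPU.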

⚡Standard 120V outlet (1275W PSU). 3x250W GPU draw.

↑Upgrade path: Synoros Foundry R4-192 or R4-320

Configure WS-96

WS-128 | Built to Order

128GB

Synoros Foundry WS-128

128GB VRAM in a tower: Rack performance without the rack

Full Tower Workstation / Lab, startup, or power user who wants 4x V100 on a desk

GPU: 4x Tesla V100 32GB

CPU: Dual Xeon E5-2680 v4 (28C/56T total)

RAM: 64GB DDR4 ECC

DISK: 512GB NVMe

BUS: PCIe 3.0 / Full Tower Workstation

Starting at

$5,499

Built to order (1-2 weeks)

Sweet spot

70B at comfortable quantization with headroom

Stretch

122B at low quant with 4-way parallelism

Configurable

+RAM: 64GB DDR4 ECC (standard) / 128GB DDR4 ECC / 256GB DDR4 ECC
+CPU: Dual Xeon E5-2620 v4 (budget) / E5-2680 v4 (performance) / E5-2697 v4 (18-core)
+Storage: 512GB NVMe boot / 1TB NVMe / 2TB NVMe + HDD data drive
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-5 25GbE SmartNIC
+Cooling: Air cooled (standard) / Enhanced fan configuration for 4-GPU thermals
+Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•Full tower is large. This is a floor-standing workstation, not a desk box

•4x250W passive GPUs need good case airflow. Validated during burn-in

•PCIe 3.0 host limits load/offload but steady-state inference is fine

•V100 PCIe cards have no NVLink connector (NVLink is available only on SXM versions). Inter-GPU traffic runs over PCIe 3.0

⚡120V with high-wattage PSU (1400W+). 4x250W GPUs + dual Xeon = ~1.5kW peak.
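Whether ~1.5kW is workable on 120V depends on the branch circuit as much as the PSU. A quick check, assuming the common US practice of keeping continuous loads at or below 80% of the breaker rating (long inference runs count as continuous) and treating the ~1.5kW figure as wall draw; the helper names are hypothetical:

```python
def continuous_limit_w(volts: float, breaker_amps: float,
                       derate: float = 0.8) -> float:
    """Usable continuous wattage on one branch circuit."""
    return volts * breaker_amps * derate


def circuit_ok(load_w: float, volts: float, breaker_amps: float) -> bool:
    """True if the load stays within the derated continuous limit."""
    return load_w <= continuous_limit_w(volts, breaker_amps)


peak_w = 1500  # ~1.5kW peak quoted for 4x V100 + dual Xeon
for volts, amps in [(120, 15), (120, 20), (240, 15)]:
    limit = continuous_limit_w(volts, amps)
    print(f"{volts}V/{amps}A: {limit:.0f} W continuous, "
          f"ok={circuit_ok(peak_w, volts, amps)}")
```

A standard 120V/15A circuit tops out near 1440W continuous, so a dedicated 20A circuit or 240V gives this build real headroom.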

↑Upgrade path: Synoros Foundry Rack 128 or R8-256

Configure WS-128

WS-192T | Built to Order

192GB

Synoros Foundry WS-192T

4x Ampere A6000 in a Threadripper Pro tower: 192GB VRAM on PCIe 4.0

Full Tower Workstation / Research team, AI startup, or production-grade desktop inference lab

GPU: 4x RTX A6000 48GB

CPU: Threadripper PRO 5955WX (16C/32T)

RAM: 128GB DDR4 ECC (4x32GB RDIMM)

DISK: 1TB NVMe Gen4

BUS: PCIe 4.0 / Full Tower Workstation

Starting at

$36,999

Built to order (2-3 weeks)

Sweet spot

122B 4-bit with Ampere tensor cores

Stretch

400B at aggressive quantization with CPU offload

Configurable

+RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
+CPU: Threadripper PRO 5955WX 16-core (standard) / 5975WX 32-core / 5995WX 64-core
+Storage: 1TB NVMe Gen4 (standard) / 2TB NVMe / 4TB NVMe + SATA data array
+NVLink: None (standard) / A6000 NVLink bridges for 2 GPU pairs
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
+Cooling: Air cooled (standard) / Liquid cooling for quieter operation
+Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•A6000 is active-cooled (fan noise). Louder than passive server cards

•Threadripper Pro platform is more expensive than used Xeon towers

•4 slots used. Not expandable to 8 GPUs without switching to rack

⚡120V with 1600W+ PSU. 4x300W GPUs + Threadripper = ~1.6kW peak.

↑Upgrade path: Synoros Foundry R4-192A or R4-320

Configure WS-192T

WS-384 | Built to Order

384GB

Synoros Foundry WS-384

4x Blackwell 96GB in a modern tower: The premium desktop flagship

Full Tower Workstation / Research group, inference provider, or premium buyer who wants 384GB on a desk

GPU: 4x RTX PRO 6000 Blackwell 96GB

CPU: Xeon W9-3475X (36C/72T)

RAM: 256GB DDR5 ECC

DISK: 2TB NVMe

BUS: PCIe 5.0 / Full Tower Workstation

Starting at

$39,999

Built to order (2-4 weeks)

Sweet spot

400B+ models with room to spare

Stretch

Multi-model serving or large-scale training

Configurable

+RAM: 256GB DDR5 ECC (standard) / 512GB DDR5 ECC / 1TB DDR5 ECC
+CPU: Xeon W9-3495X (56-core) / W9-3475X (36-core)
+Storage: 2TB NVMe boot + NVMe data array
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE/400GbE SmartNIC
+Cooling: Air cooled (standard) / Liquid cooling (recommended for 4x passive GPUs)
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•RTX PRO 6000 is passive-cooled. Requires validated tower airflow (burn-in verified)

•600W TDP per card at full power. Power-limiting available (400-600W configurable)

•Premium pricing reflects new Blackwell cards + modern Xeon W9 platform

⚡240V recommended. 4x400-600W GPUs + Xeon W9 = 2.5-4kW depending on power config.
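On NVIDIA cards the per-card power limit is typically set with `nvidia-smi -i <index> -pl <watts>` (root required). `nvidia-smi -q -d POWER` shows the allowed range for a given card, and the setting generally has to be reapplied after a reboot. A sketch that builds those commands for all four cards, assuming the 400-600W window quoted above; the helper names are hypothetical, and the commands only run when `dry_run=False`:

```python
import subprocess
from collections.abc import Iterable


def power_limit_cmd(gpu_index: int, watts: int,
                    min_w: int = 400, max_w: int = 600) -> list[str]:
    """Build an nvidia-smi command that caps one GPU's board power."""
    if not min_w <= watts <= max_w:
        raise ValueError(f"{watts} W is outside {min_w}-{max_w} W")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limits(watts: int, gpu_indices: Iterable[int],
                       dry_run: bool = True) -> list[list[str]]:
    """Return the commands; execute them only when dry_run is False."""
    cmds = [power_limit_cmd(i, watts) for i in gpu_indices]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # needs root and an NVIDIA driver
    return cmds


for cmd in apply_power_limits(450, range(4)):
    print(" ".join(cmd))
```

Lower limits trade some throughput for less heat and a smaller circuit requirement, which matters most with passive cards in a tower.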

↑Upgrade path: Synoros Foundry S8-768

Configure WS-384

WS-320 | Built to Order

320GB

Synoros Foundry WS-320

4x A100 80GB in a modern tower: Datacenter in a box

Full Tower Workstation / Enterprise team, research lab, or AI startup wanting 320GB without a rack

GPU: 4x A100 80GB PCIe

CPU: Xeon W9-3475X (36C/72T)

RAM: 256GB DDR5 ECC

DISK: 2TB NVMe

BUS: PCIe 4.0 GPUs / 5.0 host / Full Tower Workstation

Starting at

$82,999

Built to order (2-3 weeks)

Sweet spot

Serious 122B/397B production serving

Stretch

400B low-quant with 4-way tensor parallelism

Configurable

+RAM: 256GB DDR5 ECC (standard) / 512GB DDR5 ECC / 1TB DDR5 ECC
+CPU: Xeon W9-3495X (56-core) / W9-3475X (36-core)
+Storage: 2TB NVMe boot + NVMe data array
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE/400GbE SmartNIC
+Cooling: Air cooled (standard) / Liquid cooling for sustained compute
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•A100 PCIe variant. NVLink is limited to bridges between pairs of cards; all-GPU NVLink requires the SXM version

•Premium pricing reflects A100 market rates + modern workstation platform

•Large tower footprint. This is a full-size workstation, not a compact build

⚡120V with 1600W+ PSU or 240V recommended. 4x300W GPUs + Xeon W9 = ~1.8kW peak.

↑Upgrade path: Synoros Foundry R4-384 or S8-768

Configure WS-320

Production rackmount servers

R8-192 | Built to Order

192GB

Synoros Foundry R8-192

Maximum VRAM per dollar in an 8-GPU chassis

4U Rackmount / Budget-conscious buyer who needs raw VRAM capacity over speed

GPU: 8x Tesla P40 24GB

CPU: Dual Xeon / Dual EPYC (varies)

RAM: 128GB DDR4 ECC

DISK: 1TB NVMe

BUS: PCIe 3.0 / 4U Rackmount

Starting at

$6,499

Built to order (1-2 weeks)

Sweet spot

122B low-quant with 8-way parallelism

Stretch

400B experiments with aggressive quantization + CPU offload

Configurable

+RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
+CPU: Dual Xeon / Dual EPYC 7002 (flexible based on chassis)
+Storage: 1TB NVMe boot + SATA/NVMe array (hot-swap bays available)
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-5 25GbE SmartNIC / ConnectX-5 100GbE SmartNIC
+GPU swap: Replace P40s with V100 32GB (better compute, same PCIe slots)
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•P40 has no tensor cores and weak FP16. This is a capacity play, not a speed play

•8x250W passive cards = 2kW GPU draw alone. 208/240V mandatory

•PCIe 3.0 host limits load/offload speed across all 8 cards

•Older compute architecture. Per-token inference is slower than a Volta or Ampere system with the same total VRAM

⚡240V required. 8x250W GPUs + dual Xeon/EPYC host = ~2.5-3kW steady state.

↑Upgrade path: Synoros Foundry R4-192A or R4-320

Configure R8-192

R4-128 | Ships in 1-2 weeks

128GB

Synoros Foundry Rack 128

Entry rack for clean 70B deployment

4U Rackmount / Small lab or serious home rack builder

GPU: 4x Tesla V100 32GB

CPU: Dual Xeon E5-2680 v3 (24C/48T total)

RAM: 64GB DDR4 ECC

DISK: 512GB NVMe

BUS: PCIe 3.0 / 4U Rackmount

Starting at

$5,499

Ships in 1-2 weeks

Sweet spot

70B at comfortable quantization

Stretch

122B low-quant with CPU offload margin

Configurable

+RAM: 64GB DDR4 ECC (standard) / 128GB DDR4 ECC / 256GB DDR4 ECC
+CPU: Xeon E5-2680 v3 (budget) / E5-2697 v4 (18-core)
+Storage: 512GB NVMe boot (standard) / 1TB NVMe boot / + up to 7x additional 3.5" SATA hot-swap drives (8 bays total)
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-5 25GbE SmartNIC / ConnectX-5 100GbE SmartNIC
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•Running 4x250W passive GPUs on 120V leaves little circuit headroom. 208/240V strongly recommended

•PCIe 3.0 host limits model load throughput

•V100 PCIe cards have no NVLink connector (NVLink is available only on SXM versions). Inter-GPU traffic runs over PCIe 3.0

⚡208/240V strongly recommended. 4x250W GPUs + host = ~1.5kW total draw.

↑Upgrade path: Synoros Foundry R4-192

Configure R4-128

R8-256 | Built to Order

256GB

Synoros Foundry R8-256

8x V100 performance rack with 256GB VRAM

4U Rackmount / Serious lab, training workloads, or multi-model serving

GPU: 8x Tesla V100 32GB

CPU: Dual Xeon / Dual EPYC (varies)

RAM: 128GB DDR4 ECC

DISK: 1TB NVMe

BUS: PCIe 3.0 / 4U Rackmount

Starting at

$10,999

Built to order (1-2 weeks)

Sweet spot

70B-122B with 8-way tensor parallelism

Stretch

400B at aggressive quantization with CPU offload

Configurable

+RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC / 1TB DDR4 ECC
+CPU: Dual Xeon / Dual EPYC 7002 (flexible based on chassis)
+Storage: 1TB NVMe boot + SATA/NVMe array (hot-swap bays)
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•8x250W passive cards = 2kW GPU draw. 208/240V mandatory

•PCIe 3.0 host limits model load throughput but steady-state decode is fine

•V100 PCIe cards have no NVLink connector (NVLink is available only on SXM versions). Inter-GPU traffic runs over PCIe 3.0

•Used enterprise chassis. Cosmetic wear does not affect functionality

⚡240V required. 8x250W GPUs + dual CPU host = ~2.5-3kW steady state.

↑Upgrade path: Synoros Foundry R4-320 or R4-384

Configure R8-256

R4-192A | Built to Order

192GB

Synoros Foundry R4-192A

Modern CUDA 48GB-per-card rack with tensor cores

4U Rackmount / Startup or lab wanting speed and capacity on NVIDIA CUDA

GPU: 4x RTX A6000 48GB

CPU: EPYC 7313 (16C/32T)

RAM: 128GB DDR4 ECC

DISK: 1TB NVMe

BUS: PCIe 4.0 / 4U Rackmount

Starting at

$35,999

Built to order (2-3 weeks)

Sweet spot

122B 4-bit inference with Ampere tensor cores

Stretch

400B with aggressive quantization + CPU offload

Configurable

+RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
+CPU: EPYC 7313 (budget) / EPYC 7443 (24-core) / EPYC 7543 (32-core)
+Storage: 1TB NVMe boot + NVMe/SATA data array
+NVLink: None (standard) / A6000 NVLink bridge for GPU pairing
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
+Cooling: Air cooled (standard) / Liquid cooling for quieter deployment
+GPU upgrade: RTX 6000 Ada Generation (Lovelace) when used prices drop
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•A6000 is a workstation card (active cooling, 300W). Louder than passive server cards

•Used A6000 pricing has come down but is still higher than P40/V100 per-GB

•4 slots used. Not expandable beyond 192GB without a chassis swap

⚡208/240V recommended. 4x300W GPUs + EPYC host = ~1.8-2kW peak.

↑Upgrade path: Synoros Foundry R4-320 or R4-384

Configure R4-192A

R4-192W | Built to Order

192GB

Synoros Foundry R4-192W

Modern 48GB-per-card 4-GPU rack

4U Rackmount / Startup, boutique datacenter, or quiet-ish rack buyer

GPU: 4x Radeon Pro W7900 48GB

CPU: EPYC 7313 (16C/32T)

RAM: 128GB DDR4 ECC

DISK: 1TB NVMe

BUS: PCIe 4.0 / 4U Rackmount

Starting at

$14,999

Built to order (2-3 weeks)

Sweet spot

122B 4-bit inference

Stretch

400B-class models, only with CPU offload

Configurable

+RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
+CPU: EPYC 7313 (budget) / EPYC 7443 (24-core) / EPYC 7543 (32-core)
+Storage: 1TB NVMe boot + NVMe/SATA data array
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
+Cooling: Air cooled (standard) / Liquid cooling
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•Active workstation GPUs. Easier acoustically, higher per-card cost

•AMD ROCm ecosystem. Verify framework compatibility for your stack

•Premium pricing reflects current W7900 market rates
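One quick compatibility check for a PyTorch-based stack: ROCm builds of PyTorch report a HIP version in `torch.version.hip` and still expose AMD GPUs through the `torch.cuda` API, while CUDA builds report `torch.version.cuda`. A small sketch (the function names are hypothetical) that tells them apart and degrades gracefully when PyTorch is not installed:

```python
def detect_torch_backend() -> str:
    """Return 'rocm', 'cuda', 'cpu-only', or 'torch-not-installed'."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"
    if getattr(torch.version, "hip", None):
        return "rocm"  # ROCm build; GPUs still appear via torch.cuda.*
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu-only"


def gpu_count() -> int:
    """Visible GPUs (AMD or NVIDIA) through the torch.cuda API, 0 if none."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


print(detect_torch_backend(), gpu_count())
```

On a W7900 system, a "cuda" or "cpu-only" result would indicate that the wrong PyTorch wheel is installed.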

⚡208/240V recommended. 4x active-cooled GPUs + EPYC host = ~1.8kW peak.

↑Upgrade path: Synoros Foundry R4-320 or R4-384

Configure R4-192W

R4-384 | Built to Order

384GB

Synoros Foundry R4-384

Premium 4-card single-node 400B platform

4U Rackmount / Provider or research team needing one serious 4-card node

GPU: 4x RTX PRO 6000 Blackwell 96GB

CPU: Dual EPYC 7443 (48C/96T total)

RAM: 256GB DDR4 ECC

DISK: 2TB NVMe

BUS: PCIe 5.0 GPUs / 4.0 host / 4U Rackmount

Starting at

$31,999

Built to order (2-3 weeks)

Sweet spot

400B-class models with room to spare

Stretch

Serious multi-model serving infrastructure

Configurable

+RAM: 256GB DDR4 ECC (standard) / 512GB DDR4 ECC / 1TB DDR4 ECC
+CPU: Dual EPYC 7443 (standard) / 7713 (64-core)
+Storage: 2TB NVMe boot + NVMe/SATA data array
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE SmartNIC
+Cooling: Air cooled (standard) / Liquid cooling
+PCIe 5.0 host platform upgrade (eliminates bandwidth compromise)
+GPU expansion: 8x RTX PRO 6000 (768GB) in 8-GPU chassis
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•PCIe 4.0 host halves host-link bandwidth vs Gen5. Acceptable for inference, not ideal for high-throughput batching

•600W TDP per card at full power. Power-limiting available (400-600W configurable)

•New-generation cards. Pricing reflects current market, not used/refurbished

⚡240V required. 4x400-600W GPUs + host = 3-4kW depending on power config.

↑Upgrade path: Synoros Foundry S8-768

Configure R4-384

R4-320 | Built to Order

320GB

Synoros Foundry R4-320

Used-enterprise 80GB/card performance rack

4U Rackmount / Enterprise pilot or premium used-rack buyer

GPU: 4x A100 80GB PCIe

CPU: Dual EPYC 7443 (48C/96T total)

RAM: 256GB DDR4 ECC

DISK: 2TB NVMe

BUS: PCIe 4.0 / 4U Rackmount

Starting at

$75,999

Built to order (2-3 weeks)

Sweet spot

122B/397B serving with tensor parallelism

Stretch

400B low-quant single-node inference

Configurable

+RAM: 256GB DDR4 ECC (standard) / 512GB DDR4 ECC / 1TB DDR4 ECC
+CPU: Dual EPYC 7443 (standard) / 7543 (32-core) / 7713 (64-core)
+Storage: 2TB NVMe boot + up to 8x NVMe/SATA hot-swap data drives
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE SmartNIC / ConnectX-7 400GbE SmartNIC
+Cooling: Liquid cooling option for datacenter deployment
+GPU expansion: Add 4 more A100s (640GB total) in same chassis
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•Premium used-enterprise gear. Not mainstream budget hardware

•A100 PCIe variant (not SXM). NVLink is limited to bridges between pairs of cards

•High power draw requires dedicated 208/240V circuit

⚡240V required. 4x300W GPUs + dual EPYC host = ~2.5kW steady state.

↑Upgrade path: Synoros Foundry R4-384 or S8-768

Configure R4-320

S8-768 | Built to Order

768GB

Synoros Foundry S8-768

Budget-datacenter flagship for large-model serving

4U/5U Rackmount / Budget datacenter, inference provider, or halo SKU buyer

GPU: 8x RTX PRO 6000 Blackwell 96GB

CPU: Dual EPYC 7443 (48C/96T total)

RAM: 512GB DDR4 ECC

DISK: 2TB NVMe

BUS: PCIe 4.0 host / 5.0 GPUs / 4U/5U Rackmount

Starting at

$59,999

Built to order (3-4 weeks)

Sweet spot

400B+ models with full tensor parallelism

Stretch

Multi-model serving infrastructure or training workloads

Configurable

+RAM: 512GB DDR4 ECC (standard) / 1TB DDR4 ECC / 2TB DDR4 ECC
+CPU: Dual EPYC 7443 (standard) / 7713 (64-core)
+Storage: 2TB NVMe boot + up to 24x 2.5" hot-swap SATA/NVMe bays
+Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE SmartNIC / ConnectX-7 400GbE SmartNIC
+Cooling: Air cooled (standard) / Liquid cooling for sustained full-power operation
+Platform: PCIe 5.0 host upgrade (GPUs run at full Gen5 link speed instead of Gen4)
+Power: Dual 240V 30A circuits for redundancy
+Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
+OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

•The PCIe 4.0 host is the cost-down move: GPUs run at Gen4 link speed. A Gen5 host would add significant cost

•5kW+ power draw at full load. 240V 30A minimum, 40A recommended

•Passive GPUs require validated server chassis airflow. Do not attempt in consumer cases

⚡240V 30A minimum (40A recommended). 8x400-600W GPUs + dual EPYC + 512GB RAM = 4-6kW.
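The 30A-minimum / 40A-recommended guidance follows from circuit arithmetic. A minimal sketch, assuming the common North American practice of loading a breaker to at most 80% of its rating for continuous loads (check your local electrical code), and using the 4-6kW load range stated above:

```python
# Sketch: does a continuous load fit a branch circuit?
# Assumption: continuous loads are limited to 80% of the breaker rating
# (common North American practice; verify against your local code).
CONTINUOUS_DERATE = 0.80


def usable_watts(volts: float, breaker_amps: float) -> float:
    """Maximum continuous power the circuit should carry, in watts."""
    return volts * breaker_amps * CONTINUOUS_DERATE


def fits(load_watts: float, volts: float, breaker_amps: float) -> bool:
    """True if the load stays within the derated circuit capacity."""
    return load_watts <= usable_watts(volts, breaker_amps)


if __name__ == "__main__":
    for amps in (30, 40):
        cap = usable_watts(240, amps)
        for load in (4000, 5000, 6000):  # S8-768 load range from the spec above
            verdict = "OK" if fits(load, 240, amps) else "over limit"
            print(f"240V {amps}A (usable {cap:.0f}W): {load}W -> {verdict}")
```

Under this assumption a 240V 30A circuit carries about 5.76kW continuously, so the top of the 4-6kW range exceeds it, while a 40A circuit (about 7.68kW) covers the whole range.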

↑Upgrade path: 2-node cluster or premium Gen5 host refresh
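The Gen4 versus Gen5 host trade-off comes down to per-direction link bandwidth. A rough sketch using the PCIe signalling rates (16 GT/s per lane for 4.0, 32 GT/s for 5.0) and 128b/130b line encoding; real throughput is somewhat lower because packet and protocol overhead are ignored here:

```python
# Approximate per-direction PCIe bandwidth for a link of a given width.
# Uses raw transfer rate and 128b/130b encoding (PCIe 3.0 through 5.0);
# packet/protocol overhead is ignored, so real throughput is lower.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}  # gigatransfers/s per lane


def pcie_gb_per_s(gen: int, lanes: int = 16) -> float:
    """Per-direction bandwidth in GB/s for a PCIe link."""
    raw_gbit = GT_PER_LANE[gen] * lanes  # one bit per transfer per lane
    return raw_gbit * (128 / 130) / 8  # remove encoding overhead, bits -> bytes


if __name__ == "__main__":
    g4, g5 = pcie_gb_per_s(4), pcie_gb_per_s(5)
    print(f"Gen4 x16: {g4:.1f} GB/s, Gen5 x16: {g5:.1f} GB/s")
```

A Gen4 host roughly halves host-to-GPU bandwidth (about 31.5 vs 63 GB/s per x16 slot). That mainly affects model loading and tensor-parallel traffic crossing the PCIe bus, rather than token generation within a single GPU's own memory.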


What Can You Run

✓ = comfortable, ~ = possible with quantization, - = not viable.

Model Size	CB-16 16GB	CB-24 24GB	CB-32 32GB	CB-32D 32GB	CB-48 48GB	CB-64 64GB	CO-12 12GB	CO-16 16GB	WS-128 128GB	WS-192T 192GB	WS-320 320GB	WS-384 384GB	WS-48 48GB	WS-72 72GB	WS-96 96GB	R4-128 128GB	R8-192 192GB	R8-256 256GB	R4-192A 192GB	R4-192W 192GB	R4-320 320GB	R4-384 384GB	S8-768 768GB
7B	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
13B	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
24B-35B	~	✓	✓	✓	✓	✓	~	~	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
70B	-	-	~	~	~	✓	-	-	✓	✓	✓	✓	~	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
122B	-	-	-	-	-	-	-	-	~	✓	✓	✓	-	~	~	~	✓	✓	✓	✓	✓	✓	✓
400B+	-	-	-	-	-	-	-	-	-	~	✓	✓	-	-	-	-	~	~	~	~	✓	✓	✓

Model VRAM Guide

Approximate VRAM at Q4_K_M and Q8 quantization. Use this to match models to configurations above.

Family	Model	Params	Q4 VRAM	Q8 VRAM	Notes
Qwen 3.5	qwen3.5:0.8b	0.8B	1.5 GB	2 GB
Qwen 3.5	qwen3.5:2b	2B	2.5 GB	3.5 GB
Qwen 3.5	qwen3.5:4b	4.7B	4 GB	6.5 GB
Qwen 3.5	qwen3.5:9b	9.7B	7 GB	12 GB
Qwen 3	qwen3:14b	14B	9 GB	16 GB
Apriel	apriel-15b-thinker	15B	10 GB	17 GB
GPT-OSS	gpt-oss:20b	21B (3.6B active)	n/a	n/a	MoE; MXFP4: 16 GB
Llama 3	llama3.1:8b	8B	5.5 GB	9.5 GB
Llama 3	llama3.3:70b	70B	40 GB	74 GB
Mistral	mistral-small3.1:24b	24B	14 GB	26 GB
Mistral	mistral-nemo:12b	12B	8 GB	14 GB
Gemma 3	gemma3:4b	4B	3.5 GB	5.5 GB
Gemma 3	gemma3:12b	12B	8 GB	14 GB
Gemma 3	gemma3:27b	27B	17 GB	30 GB
Phi	phi4:14b	14B	9 GB	16 GB
DeepSeek R1	deepseek-r1:7b	7B	5 GB	8.5 GB
DeepSeek R1	deepseek-r1:14b	14B	9 GB	16 GB
DeepSeek R1	deepseek-r1:32b	32B	19 GB	35 GB
Devstral	devstral	24B	14 GB	26 GB
Qwen Coder	qwen3-coder	30B (3.3B active)	19 GB	32 GB	MoE
Qwen Coder	qwen2.5-coder:14b	14B	10 GB	16 GB

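Measured against a 16 GB card: figures comfortably below 16 GB fit, figures close to 16 GB are a tight fit, and anything above 16 GB needs more VRAM. MoE models must load all weights but activate only a fraction per token, so memory use tracks total parameters while speed roughly tracks active parameters. See our quantization guide for details.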
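The figures above roughly follow weights ≈ parameters × bits per weight ÷ 8. A minimal sketch, assuming average bit widths of about 4.85 for Q4_K_M (it varies slightly by model) and 8.5 for Q8_0 (8-bit weights plus one FP16 scale per 32-weight block). It covers weights only; KV cache and runtime buffers come on top, so treat results as a floor:

```python
# Rough weight-memory estimate for a quantized model.
# Assumed average bits per weight for common llama.cpp GGUF formats:
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # mixed 4/6-bit k-quant; approximate, varies by model
    "Q8_0": 8.5,     # 8-bit weights + one fp16 scale per 32-weight block
    "F16": 16.0,     # unquantized half precision
}


def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate GB needed to hold the weights alone.

    For MoE models pass the TOTAL parameter count: every expert must be
    resident in memory even though only some are active per token.
    KV cache and runtime buffers are not included.
    """
    return params_billion * BITS_PER_WEIGHT[quant] / 8


if __name__ == "__main__":
    for name, params in (("llama3.3:70b", 70), ("gemma3:27b", 27)):
        row = ", ".join(f"{q} {weight_gb(params, q):.1f} GB" for q in BITS_PER_WEIGHT)
        print(f"{name}: {row}")
```

For a 70B model this gives about 42 GB at Q4_K_M, 74 GB at Q8_0, and 140 GB at FP16, within a few GB of the table's figures.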

Performance Benchmarks

Published inference speeds from third-party benchmarks. Single GPU, single user, llama.cpp / Ollama.

7B Q4 Generation (tokens/sec, single GPU)

GPU	Tokens/sec
P100 16GB	~35-50
P40 24GB	~41
V100 32GB	~85-107
A6000 48GB	~102
W7900 48GB	~121
A100 80GB	~138
RTX PRO 6000	~185

Sources: GPU-Benchmarks-on-LLM-Inference (GitHub), DatabaseMart, Hardware Corner, LocalScore, GamersNexus.
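A useful sanity check on these numbers: single-user token generation is largely memory-bandwidth bound, because each new token reads roughly the full set of weights once. Dividing memory bandwidth by model size therefore gives a theoretical ceiling. A rough sketch, assuming about 4 GB of weights for a 7B Q4 model and spec-sheet memory bandwidth figures (verify them against vendor data for your exact card variant):

```python
# Theoretical decode ceiling: tokens/s <= memory bandwidth / bytes read per token.
# Assumptions: ~4 GB of weights for a 7B Q4 model; spec-sheet memory
# bandwidth in GB/s (check vendor data for the exact card variant).
MODEL_GB = 4.0

BANDWIDTH_GB_S = {
    "P40 24GB": 346,
    "P100 16GB": 732,
    "V100 32GB": 900,
    "A6000 48GB": 768,
    "W7900 48GB": 864,
    "A100 80GB PCIe": 1935,
    "RTX PRO 6000": 1792,
}


def decode_ceiling(bandwidth_gb_s: float, model_gb: float = MODEL_GB) -> float:
    """Upper bound on single-user tokens/s if each token streams all weights once."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    for gpu, bw in BANDWIDTH_GB_S.items():
        print(f"{gpu:16s} ceiling ~{decode_ceiling(bw):.0f} tok/s")
```

Every measured figure above sits below its ceiling; the gap typically reflects compute throughput, kernel efficiency, and software overhead rather than memory alone.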

Frequently Asked Questions

Common questions about our GPU server configurations.

What GPU do I need to run a 70B parameter model locally?

Running a 70B model (like Llama 3 70B) at Q4 quantization requires roughly 40GB of VRAM for the weights alone, plus room for the KV cache. Our CostaBox Duo 48 (2x P40, 48GB) fits this with limited context headroom, while the CostaBox Duo 64 (2x V100, 64GB) leaves room for higher-precision quantization and longer context. For near-lossless Q8 70B (about 74GB), look at our Workstation or Rackmount configurations with 96GB+ VRAM; unquantized FP16 weights alone need about 140GB.
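Context length is what consumes the extra headroom: the KV cache grows linearly with the number of tokens. A minimal sketch, assuming Llama 3 70B's published shape (80 layers, 8 key/value heads of dimension 128 under grouped-query attention) and an FP16 cache; runtimes that quantize the KV cache need less:

```python
# Sketch: KV-cache size for a transformer with grouped-query attention.
# Assumed Llama 3 70B shape: 80 layers, 8 KV heads, head dimension 128.


def kv_cache_bytes(tokens: int, layers: int = 80, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Bytes for keys + values across all layers (FP16 by default)."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return tokens * per_token


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens: {gib:.1f} GiB")
```

Each token costs about 320 KiB here, so 8K tokens need about 2.5 GiB and a full 128K context about 40 GiB. With roughly 40GB of Q4 weights on a 48GB build, only about 8GB remains for the cache, on the order of 25K tokens under these assumptions.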

Are these servers new or refurbished?

Foundry Lite and Workstation builds use quality-certified refurbished enterprise components sourced from verified datacenters. Every GPU, CPU, and memory module is individually tested during a 48-hour burn-in with real LLM workloads before shipping. Used GPUs carry a 90-day warranty and non-GPU components 1 year, with DOA replacement within 14 days. CostaBox PCs (CO-12, CO-16) use all-new retail components.

Can I use these servers for training or just inference?

Our servers are primarily optimized for inference, which is what most local LLM users need. Training is possible on higher-end configurations with A100 or RTX PRO 6000 GPUs, which have sufficient memory bandwidth and tensor core performance. For inference workloads, even our budget P100 and P40 builds deliver roughly 35-50 tokens/sec on 7B Q4 models at a fraction of cloud API costs.
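A rough sketch of why training is steered toward the larger cards: full fine-tuning with mixed-precision Adam is commonly estimated at about 16 bytes per parameter (FP16 weights and gradients, plus FP32 master weights and two FP32 optimizer moments), before counting activations. The 16-byte figure is a rule of thumb, not a measurement, and parameter-efficient methods such as LoRA need far less:

```python
# Rule-of-thumb memory for full fine-tuning with mixed-precision Adam.
# Activations and framework overhead are NOT included.
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_grads": 2,
    "fp32_master": 4,
    "adam_m": 4,  # first moment
    "adam_v": 4,  # second moment
}


def full_finetune_gb(params_billion: float) -> float:
    """Approximate GB of model + optimizer state for full fine-tuning."""
    return params_billion * sum(BYTES_PER_PARAM.values())


if __name__ == "__main__":
    for size in (7, 13, 70):
        print(f"{size}B: ~{full_finetune_gb(size):.0f} GB before activations")
```

By this estimate even a 7B full fine-tune (about 112 GB) exceeds any single card in the lineup, so it has to be sharded across several A100 or RTX PRO 6000 GPUs.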

What is the difference between P100, P40, V100, A100, and RTX PRO 6000?

P100 (16GB, no tensor cores) is our budget entry point for 7B-13B models. P40 (24GB, no tensor cores) fits larger models at the best price per GB of VRAM. V100 (32GB, tensor cores) is the sweet spot for 24B-35B models with faster inference. A100 (80GB, 3rd-gen tensor cores) handles 70B+ models with high throughput. RTX PRO 6000 (96GB, Blackwell architecture) is the fastest single-GPU option in our lineup for maximum model size and speed.

Do these servers come with an operating system?

Every build ships with Costa OS (our Arch-based Linux with Ollama pre-configured) by default. You can choose Ubuntu 24.04 LTS, Windows 11 Pro, or no OS during configuration. Costa OS includes local model routing, voice control, and agent navigation, but any Linux distro or Windows will work with the hardware.
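Because Ollama comes pre-configured, scripts and other machines can talk to it over its HTTP API. A minimal sketch using only the Python standard library, assuming Ollama's default port 11434 and its `/api/generate` endpoint; the host name `foundry.local` and the model tag are placeholders, and by default Ollama listens only on localhost unless `OLLAMA_HOST` is changed. It also derives tokens/sec from the `eval_count` and `eval_duration` (nanoseconds) fields in Ollama's reply, one way to compare your system with the benchmarks above:

```python
import json
import urllib.request

# Hypothetical host name; 11434 is Ollama's default port.
OLLAMA_URL = "http://foundry.local:11434/api/generate"


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 300) -> dict:
    """Send the request and return Ollama's parsed JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt, url), timeout=timeout) as resp:
        return json.load(resp)


def tokens_per_second(reply: dict) -> float:
    """Generation speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)
```

For example, `reply = generate("llama3.1:8b", "Why is the sky blue?")` returns the parsed JSON; `reply["response"]` holds the text and `tokens_per_second(reply)` the generation speed.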

What warranty coverage do GPU servers have?

Non-GPU components (CPU, RAM, motherboard, PSU, storage, chassis) are covered for 1 year. Used datacenter GPUs are covered for 90 days. New GPUs (in CostaBox PCs and select configurations) carry a full 1-year warranty. Dead-on-arrival replacement is handled within 14 days. Full warranty terms are available on our warranty page.

Can I customize the configuration before ordering?

Yes. Every SKU has a configurator page where you can upgrade RAM, storage, CPU, networking, and GPU options with live pricing. For builds outside our standard options, contact sales for a fully custom configuration quote. We respond within 24 hours.

Warranty & Support

1 Year

Non-GPU warranty

90 Days

Used GPU warranty

10 Days

RMA turnaround target

Direct

Builder email support

Non-GPU components covered for 1 year. Used GPUs (which may have prior mining or datacenter runtime) covered for 90 days. New GPUs carry a full 1-year warranty. DOA replacement within 14 days.

Read full warranty terms →

Ready to Build?

Tell us what you need. We respond within 24 hours. Custom configurations available on request.
