Run AI Models Locally

Pre-built computers that run large language models on your desk or network. From compact boxes for personal use to rack servers for production workloads. Every system is tested for 48+ hours, covered by a hardware warranty, and shipped with a guide showing exactly which models it can run.

21

Configurations

16-768GB

VRAM Range

$499

Starting Price

90d-1yr

Hardware Warranty

Why Buy From Us

Every build is assembled, tested, and documented by the same engineers who run these systems for production LLM inference.

Tested and Validated

Every system runs real AI workloads for 48+ hours before shipping. We verify thermals, memory, and model performance so you know it works out of the box.

Fraction of the Price

We build from quality refurbished enterprise components instead of charging new-retail markups. Same hardware, 40-60% less than comparable pre-built systems.

Hardware Warranty

Non-GPU components covered for 1 year. Used GPUs covered for 90 days. New GPUs carry a full 1-year warranty. DOA replacement within 14 days.

Model Guidance Included

Each configuration ships with tested model recommendations and quantization guidance so you know exactly what runs and at what quality level.

Transparent Pricing

We publish our exact component costs, sources, and markup formula. See how every dollar is spent.

Read the full breakdown →

No Vendor Lock-in

Standard server hardware, standard GPUs, standard Linux. No proprietary firmware, no licensing fees, no support contracts required. You own it outright.

Three Series

Foundry Lite

CB-16 / CB-24 / CB-32 / CB-32D / CB-48 / CB-64

Local AI boxes for your desk or home network. Quiet, compact, plugs into a standard outlet. Run models like Llama, Mistral, and Qwen privately on your own hardware. From $499.

16-64GB GPU memory / Desk/shelf size / 1-2 GPUs / Standard outlet / $499-$2,499

Foundry Workstations

WS-48 / WS-72 / WS-96 / WS-128 / WS-192T / WS-320 / WS-384

Multi-GPU towers for teams and heavier workloads. Run 70B+ parameter models or serve multiple users on your local network. 3-4 GPUs in a tower chassis.

48-384GB GPU memory / Tower / 3-4 GPUs / 120-240V / $1,299-$82,999

Foundry Rack & Servers

R4-128 / R8-192 / R8-256 / R4-192A / R4-192W / R4-320 / R4-384 / S8-768

Rackmount servers for production deployment, model training, and infrastructure. 4-8 GPUs, designed for datacenter or server closet installation.

128-768GB GPU memory / 4U Rack / 4-8 GPUs / 208/240V / $5,499-$75,999

Quality Certified Refurbished Parts. We source quality-certified refurbished and lightly used enterprise components from verified suppliers. Every GPU, CPU, and memory module is individually tested and validated during our 48-hour burn-in process. This is how we keep prices 40-60% below comparable new-build configurations without compromising reliability. Used GPUs carry a 90-day warranty; new GPUs and all other components carry 1 year. DOA replacement within 14 days.

Choose Your Configuration

Ultra-budget inference boxes with datacenter GPUs

CB-16 / Ships in 1-2 weeks
Synoros CostaBox 16 - 1x Tesla P100 16GB

16GB

Synoros CostaBox 16

Cheapest local LLM box that actually works

Compact Desktop / First-time local LLM user, student, or hobbyist

GPU: 1x Tesla P100 16GB
CPU: Xeon E3-1245 v3 (4C/8T)
RAM: 8GB DDR4
DISK: 256GB SSD
BUS: PCIe 3.0 / Compact Desktop

Starting at

$499

Ships in 1-2 weeks

Sweet spot

7B-13B models at full quality

Stretch

24B-27B at q4 quantization

Configurable

  • +CPU: Xeon E3-1245 v3 4-core (standard, has iGPU) / Xeon E3-1275 v3 4-core (faster, has iGPU) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
  • +RAM: 8GB DDR4 (standard) / 16GB DDR4 / 32GB DDR4
  • +Storage: 256GB SSD (standard) / 512GB NVMe / 1TB NVMe
  • +Display: Headless (no display adapter, SSH only, standard) / iGPU display (no extra card) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

Compact chassis limits cooling. GPU is power-limited for thermals

Single PCIe slot. Not expandable without changing chassis

16GB VRAM ceiling means larger models require aggressive quantization

Standard 120V outlet. Total system draw under 300W.

Upgrade path: CostaBox 24 or CostaBox 32

CB-24 / Ships in 1-2 weeks
Synoros CostaBox 24 - 1x Tesla P40 24GB

24GB

Synoros CostaBox 24

24GB VRAM for 27B model inference

Compact Desktop / Developer, researcher, or small team wanting real 27B inference

GPU: 1x Tesla P40 24GB
CPU: Xeon E3-1245 v3 (4C/8T)
RAM: 8GB DDR4
DISK: 256GB SSD
BUS: PCIe 3.0 / Compact Desktop

Starting at

$799

Ships in 1-2 weeks

Sweet spot

13B-27B models (Gemma 2 27B, Qwen 2.5 27B, Mistral Small)

Stretch

35B at q4 quantization

Configurable

  • +CPU: Xeon E3-1245 v3 4-core (standard, has iGPU) / Xeon E3-1275 v3 4-core (faster, has iGPU) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
  • +RAM: 8GB DDR4 (standard) / 16GB DDR4 / 32GB DDR4
  • +Storage: 256GB SSD (standard) / 512GB NVMe / 1TB NVMe
  • +Display: Headless (no display adapter, SSH only, standard) / iGPU display (no extra card) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

P40 has no tensor cores. Slower FP16 than V100, but 24GB is the draw

Compact chassis limits cooling. GPU power-limited for thermals

Single slot. Not expandable

Standard 120V outlet. Total system draw under 350W.

Upgrade path: CostaBox 32 or CostaBox Duo 48

CB-32D / Ships in 1-2 weeks
Synoros CostaBox Duo 32 - 2x Tesla P100 16GB

32GB

Synoros CostaBox Duo 32

Two P100s for the price of one V100

Mini Tower / Budget buyer who wants 2-way parallelism

GPU: 2x Tesla P100 16GB
CPU: Xeon E5-1620 v3 (4C/8T)
RAM: 16GB DDR4 ECC
DISK: 256GB SSD
BUS: PCIe 3.0 / Mini Tower

Starting at

$749

Ships in 1-2 weeks

Sweet spot

24B-35B with 2-way tensor parallelism

Stretch

70B at q4 with both cards

Configurable

  • +RAM: 16GB DDR4 ECC / 8GB DDR4 ECC (budget) / 32GB DDR4 ECC / 64GB DDR4 ECC
  • +CPU: Xeon E5-1620 v3 4-core (standard) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
  • +Storage: 256GB SSD / 512GB SSD / 1TB NVMe
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

2x P100 is slower per-token than 1x V100 (no tensor cores, PCIe communication overhead)

Models must be split across cards. Single-card models limited to 16GB

Tower chassis is larger than SFF CostaBox

Standard 120V outlet. 2x250W GPUs + host = ~600W peak.

Upgrade path: CostaBox Duo 48 or Foundry WS-48
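The CostaBox Duo 32 notes say models must be split across cards. One common way to do that is a column-wise (tensor-parallel) split of each weight matrix, sketched below as a toy example. Plain Python lists stand in for GPU tensors, and the function names are illustrative only; this is not the code any particular inference runtime uses.

```python
# Toy column-parallel split of one linear layer across two "devices".
# Pure Python lists stand in for GPU tensors; no real GPUs are involved.

def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, parts):
    """Split weight matrix w column-wise into `parts` equal shards."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly"
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]

def column_parallel_matmul(x, w, parts=2):
    """Each shard computes its slice of the output; slices are concatenated."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]                  # activations (2 x 2)
w = [[1, 0, 2, 1], [0, 1, 1, 3]]      # weights (2 x 4), split into 2 + 2 columns

full = matmul(x, w)
sharded = column_parallel_matmul(x, w, parts=2)
print(full == sharded)
```

Each shard only needs its own half of the weights in memory, which is why two 16GB cards can hold a layer that one cannot. The concatenation step is where inter-GPU traffic would occur on real hardware.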

CB-48 / Ships in 1-2 weeks
Synoros CostaBox Duo 48 - 2x Tesla P40 24GB

48GB

Synoros CostaBox Duo 48

48GB VRAM in a desktop tower: Dual P40

Mini Tower / Serious hobbyist or small team wanting 70B at home

GPU: 2x Tesla P40 24GB
CPU: Xeon E5-1620 v3 (4C/8T)
RAM: 16GB DDR4 ECC
DISK: 256GB SSD
BUS: PCIe 3.0 / Mini Tower

Starting at

$1,299

Ships in 1-2 weeks

Sweet spot

35B-70B at q4 quantization

Stretch

70B at q5/q6 with tight VRAM budget

Configurable

  • +RAM: 16GB DDR4 ECC / 8GB DDR4 ECC (budget) / 32GB DDR4 ECC / 64GB DDR4 ECC
  • +CPU: Xeon E5-1620 v3 (budget) / E5-2680 v3 (performance)
  • +Storage: 256GB SSD / 512GB SSD / 1TB NVMe
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

P40 has no tensor cores. Capacity-first, not speed-first

70B at q4 is a tight fit (~43GB). Limited context window

2 slots used in Z440/Z620. No further GPU expansion

Standard 120V outlet. 2x250W GPUs + host = ~600W peak.

Upgrade path: CostaBox Duo 64 or Foundry WS-72
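The "~43GB" figure for 70B at q4 on the CostaBox Duo 48 can be approximated with simple arithmetic: parameter count times bits per weight, divided by 8. The sketch below assumes an average of 4.9 bits per weight, a value chosen for illustration because 4-bit block formats store scales alongside the weights; the real figure depends on the quantization format and the model.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

def headroom_gb(vram_gb, params_billion, bits_per_weight):
    """VRAM left for KV cache and runtime buffers after loading weights."""
    return vram_gb - weight_gb(params_billion, bits_per_weight)

ASSUMED_BITS_Q4 = 4.9   # assumption: average bits per weight for a 4-bit format

print(f"weights:  {weight_gb(70, ASSUMED_BITS_Q4):.1f} GB")
print(f"headroom: {headroom_gb(48, 70, ASSUMED_BITS_Q4):.1f} GB of 48 GB")
```

Under that assumption roughly 5GB remains for context and buffers, which is consistent with the "tight fit" and "limited context window" notes above.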

CB-32 / Ships in 1-2 weeks
Synoros CostaBox 32 - 1x Tesla V100 32GB

32GB

Synoros CostaBox 32

V100 tensor cores in a desktop form factor

Compact Desktop / Developer who wants speed and capacity in the smallest box

GPU: 1x Tesla V100 32GB
CPU: Xeon E3-1245 v3 (4C/8T)
RAM: 16GB DDR4
DISK: 256GB SSD
BUS: PCIe 3.0 / Compact Desktop

Starting at

$1,399

Ships in 1-2 weeks

Sweet spot

27B-35B models with room for context

Stretch

70B at aggressive q3/q4 with CPU offload

Configurable

  • +CPU: Xeon E3-1245 v3 4-core (standard, has iGPU) / Xeon E3-1275 v3 4-core (faster, has iGPU) / Xeon E5-1650 v3 6-core (no iGPU) / Xeon E5-2680 v3 12-core (no iGPU)
  • +RAM: 16GB DDR4 / 8GB DDR4 (budget) / 32GB DDR4 / 64GB DDR4
  • +Storage: 256GB SSD / 512GB SSD / 1TB NVMe
  • +Display: Headless (no display adapter, SSH only, standard) / iGPU display (no extra card) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

Compact chassis limits cooling. May need power limiting

Single slot. Not expandable without chassis swap

V100 PCIe (not SXM). Still excellent for single-card inference

Standard 120V outlet. Total system draw under 400W.

Upgrade path: CostaBox Duo 64 or Foundry WS-96

CB-64 / Ships in 1-2 weeks
Synoros CostaBox Duo 64 - 2x Tesla V100 32GB

64GB

Synoros CostaBox Duo 64

Dual V100 tensor cores in a tower

Mini Tower / Developer or researcher wanting 70B with real speed

GPU: 2x Tesla V100 32GB
CPU: Xeon E5-1650 v3 (6C/12T)
RAM: 16GB DDR4 ECC
DISK: 512GB NVMe
BUS: PCIe 3.0 / Mini Tower

Starting at

$2,499

Ships in 1-2 weeks

Sweet spot

70B q4 with comfortable headroom

Stretch

70B q6/q8 for higher quality output

Configurable

  • +RAM: 16GB DDR4 ECC (standard) / 32GB DDR4 ECC / 64GB DDR4 ECC / 128GB DDR4 ECC
  • +CPU: Xeon E5-1650 v3 6-core (standard) / Xeon E5-2680 v3 12-core / Xeon E5-2690 v3 12-core (higher clocks)
  • +Storage: 512GB SSD / 1TB NVMe
  • +NVLink: None (standard) / V100 NVLink bridge (doubles inter-GPU bandwidth)
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

V100 PCIe supports 2-way NVLink bridge (available as upgrade). Without bridge, tensor parallelism runs over PCIe

Z440/Z620 has 2 PCIe x16 slots. Not expandable to 3+ GPUs

Higher cost than P40 Duo but meaningfully faster per token

Standard 120V outlet. 2x250W GPUs + host = ~650W peak.

Upgrade path: Foundry WS-96 or Foundry Rack 128
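"Comfortable headroom" on the CostaBox Duo 64 mostly means room for the KV cache, which grows with context length. The sketch below estimates that cache for a single sequence. The layer count, KV-head count, and head dimension are assumptions matching a typical 70B-class model with grouped-query attention; check the model card of the model you actually run.

```python
def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV cache size in GB: keys and values for every layer and token."""
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9

# Assumed shape for a 70B-class model with grouped-query attention (FP16 cache).
ASSUMED_70B = dict(layers=80, kv_heads=8, head_dim=128)

for ctx in (4096, 8192, 32768):
    print(ctx, "tokens:", round(kv_cache_gb(context_tokens=ctx, **ASSUMED_70B), 2), "GB")
```

The estimate covers one sequence only. Serving several users at once multiplies the cache by the number of concurrent sequences.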

Multi-GPU tower workstations

WS-48 / Ships in 1-2 weeks
Synoros Foundry WS-48 - 3x Tesla P100 16GB

48GB

Synoros Foundry WS-48

3x P100 in a dual-Xeon tower: Entry multi-GPU

Tower / Hobbyist or first-time local LLM buyer

GPU: 3x Tesla P100 16GB
CPU: Dual Xeon E5-2620 v3 (12C/24T total)
RAM: 32GB DDR4 ECC
DISK: 256GB SSD
BUS: PCIe 3.0 / Tower

Starting at

$1,299

Ships in 1-2 weeks

Sweet spot

24B-35B parameter models

Stretch

70B q4 experiments

Configurable

  • +RAM: 16GB DDR4 ECC (headless) / 32GB DDR4 ECC (standard) / 64GB DDR4 ECC / 128GB DDR4 ECC
  • +CPU: Xeon E5-2620 v3 (budget) / E5-2680 v4 (performance)
  • +Storage: 256GB SSD boot (standard) / 512GB NVMe boot / 1TB NVMe boot / + additional 2.5" or 3.5" SATA drives (up to 4x 3.5" + 4x 2.5" bays available)
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

3x250W GPUs exceed HP's official 3x225W graphics envelope. Power-limited and thermally validated during burn-in

PCIe 3.0 bandwidth ceiling visible during model load/offload, not during steady-state decode

Pascal-generation FP16 performance. Compute-bound on small batch sizes

Standard 120V outlet (1275W PSU). Power-limited GPUs draw ~225W each.

Upgrade path: Synoros Foundry WS-96 or Rack 128
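The WS-48 notes mention GPUs power-limited to about 225W each. How the limit is applied at the factory is not stated here; one way to set such a cap yourself is the `nvidia-smi -pl` option, sketched below. The helper names are hypothetical, and the script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

def power_limit_cmd(gpu_index, watts):
    """Build the nvidia-smi argv that caps one GPU's board power."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]

def apply_power_limits(gpu_count, watts, dry_run=True):
    """Print (dry run) or execute the command for each GPU; returns the commands."""
    cmds = [power_limit_cmd(i, watts) for i in range(gpu_count)]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # typically needs root privileges
    return cmds

# Dry run for a 3-GPU tower capped at 225W per card.
apply_power_limits(gpu_count=3, watts=225)
```

Passing `dry_run=False` runs the commands for real. The driver rejects values outside the range a given card supports, and the setting usually does not persist across reboots.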

WS-72 / Ships in 1-2 weeks
Synoros Foundry WS-72 - 3x Tesla P40 24GB

72GB

Synoros Foundry WS-72

Cheapest path to 70B-capable workstation VRAM

Tower / Capacity-first tinkerer or budget model collector

GPU: 3x Tesla P40 24GB
CPU: Dual Xeon E5-2620 v3 (12C/24T total)
RAM: 32GB DDR4 ECC
DISK: 256GB SSD
BUS: PCIe 3.0 / Tower

Starting at

$2,099

Ships in 1-2 weeks

Sweet spot

70B q4 models

Stretch

122B low-quant experiments

Configurable

  • +RAM: 16GB DDR4 ECC (headless) / 32GB DDR4 ECC (standard) / 64GB DDR4 ECC / 128GB DDR4 ECC
  • +CPU: Xeon E5-2620 v3 (budget) / E5-2680 v4 (performance)
  • +Storage: 256GB SSD boot (standard) / 512GB NVMe boot / 1TB NVMe boot / + additional SATA drives (up to 4x 3.5" + 4x 2.5" bays available)
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

P40 has weaker FP16 performance than P100/V100. Capacity-first, not speed-first

PCIe 3.0 bandwidth ceiling on model load/offload

No tensor cores. Pure CUDA compute

Standard 120V outlet (1275W PSU). 3x250W GPU draw.

Upgrade path: Synoros Foundry WS-96 or R4-192

WS-96 / Ships in 1-2 weeks
Synoros Foundry WS-96 - 3x Tesla V100 32GB

96GB

Synoros Foundry WS-96

Best-value used tower for serious 70B work

Tower / Power user who wants 70B to feel real on a tower

GPU: 3x Tesla V100 32GB
CPU: Dual Xeon E5-2680 v4 (28C/56T total)
RAM: 64GB DDR4 ECC
DISK: 512GB NVMe
BUS: PCIe 3.0 / Tower

Starting at

$3,999

Ships in 1-2 weeks

Sweet spot

70B q5/q6 models

Stretch

122B q3-ish with CPU offload

Configurable

  • +RAM: 16GB DDR4 ECC (headless) / 32GB DDR4 ECC / 64GB DDR4 ECC (standard) / 128GB DDR4 ECC
  • +CPU: Xeon E5-2620 v3 (budget) / E5-2680 v4 (performance)
  • +Storage: 256GB SSD boot / 512GB NVMe boot (standard) / 1TB NVMe boot / + additional SATA drives (up to 4x 3.5" + 4x 2.5" bays available)
  • +NVLink: V100 NVLink bridge for 2-way GPU pairing (doubles inter-GPU bandwidth)
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3)
  • +Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

Still a PCIe 3.0 tower. Best value, not a modern platform

V100 PCIe supports 2-way NVLink bridge (available as upgrade)

Model load times limited by PCIe 3.0 host bandwidth

Standard 120V outlet (1275W PSU). 3x250W GPU draw.

Upgrade path: Synoros Foundry R4-192 or R4-320

WS-128 / Built to Order
Synoros Foundry WS-128 - 4x Tesla V100 32GB

128GB

Synoros Foundry WS-128

128GB VRAM in a tower: Rack performance without the rack

Full Tower Workstation / Lab, startup, or power user who wants 4x V100 on a desk

GPU: 4x Tesla V100 32GB
CPU: Dual Xeon E5-2680 v4 (28C/56T total)
RAM: 64GB DDR4 ECC
DISK: 512GB NVMe
BUS: PCIe 3.0 / Full Tower Workstation

Starting at

$5,499

Built to order (1-2 weeks)

Sweet spot

70B at comfortable quantization with headroom

Stretch

122B at low quant with 4-way parallelism

Configurable

  • +RAM: 64GB DDR4 ECC (standard) / 128GB DDR4 ECC / 256GB DDR4 ECC
  • +CPU: Dual Xeon E5-2620 v4 (budget) / E5-2680 v4 (performance) / E5-2697 v4 (18-core)
  • +Storage: 512GB NVMe boot / 1TB NVMe / 2TB NVMe + HDD data drive
  • +NVLink: None (standard) / V100 NVLink bridges for 2 GPU pairs
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-5 25GbE SmartNIC
  • +Cooling: Air cooled (standard) / Enhanced fan configuration for 4-GPU thermals
  • +Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

Full tower is large. This is a floor-standing workstation, not a desk box

4x250W passive GPUs need good case airflow. Validated during burn-in

PCIe 3.0 host limits load/offload but steady-state inference is fine

V100 PCIe supports 2-way NVLink bridges (pairs of cards). Available as upgrade

120V with high-wattage PSU (1400W+). 4x250W GPUs + dual Xeon = ~1.5kW peak.

Upgrade path: Synoros Foundry Rack 128 or R8-256
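Whether a ~1.5kW system such as the WS-128 is comfortable on a 120V outlet depends on the breaker behind it. The sketch below checks a load against a circuit, assuming the common North American practice of keeping continuous loads at or below 80% of the breaker rating. Treat the derating factor as an assumption and confirm with an electrician for your site.

```python
def continuous_capacity_w(volts, breaker_amps, derate=0.8):
    """Watts a circuit can carry continuously under an assumed 80% derating."""
    return volts * breaker_amps * derate

def fits(load_w, volts, breaker_amps):
    """True if the load stays within the derated circuit capacity."""
    return load_w <= continuous_capacity_w(volts, breaker_amps)

# WS-128 peak figure from the listing (~1.5kW) on two common 120V circuits.
for amps in (15, 20):
    print(f"120V/{amps}A fits 1500W:", fits(1500, 120, amps))
```

Under this assumption a 15A circuit falls slightly short of 1.5kW sustained, while a dedicated 20A circuit has margin.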

WS-192T / Built to Order
Synoros Foundry WS-192T - 4x RTX A6000 48GB

192GB

Synoros Foundry WS-192T

4x Ampere A6000 in a Threadripper Pro tower: 192GB VRAM on PCIe 4.0

Full Tower Workstation / Research team, AI startup, or production-grade desktop inference lab

GPU: 4x RTX A6000 48GB
CPU: Threadripper PRO 5955WX (16C/32T)
RAM: 128GB DDR4 ECC (4x32GB RDIMM)
DISK: 1TB NVMe Gen4
BUS: PCIe 4.0 / Full Tower Workstation

Starting at

$36,999

Built to order (2-3 weeks)

Sweet spot

122B 4-bit with Ampere tensor cores

Stretch

400B at aggressive quantization with CPU offload

Configurable

  • +RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
  • +CPU: Threadripper PRO 5955WX 16-core (standard) / 5975WX 32-core / 5995WX 64-core
  • +Storage: 1TB NVMe Gen4 (standard) / 2TB NVMe / 4TB NVMe + SATA data array
  • +NVLink: None (standard) / A6000 NVLink bridges for 2 GPU pairs
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
  • +Cooling: Air cooled (standard) / Liquid cooling for quieter operation
  • +Display: Headless (no display adapter, SSH only) / GT 710 1GB basic display (standard) / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

A6000 is active-cooled (fan noise). Louder than passive server cards

Threadripper Pro platform is more expensive than used Xeon towers

4 slots used. Not expandable to 8 GPUs without switching to rack

120V with 1600W+ PSU. 4x300W GPUs + Threadripper = ~1.6kW peak.

Upgrade path: Synoros Foundry R4-192A or R4-320

WS-384 / Built to Order
Synoros Foundry WS-384 - 4x RTX PRO 6000 Blackwell 96GB

384GB

Synoros Foundry WS-384

4x Blackwell 96GB in a modern tower: The premium desktop flagship

Full Tower Workstation / Research group, inference provider, or premium buyer who wants 384GB on a desk

GPU: 4x RTX PRO 6000 Blackwell 96GB
CPU: Xeon W9-3475X (36C/72T)
RAM: 256GB DDR5 ECC
DISK: 2TB NVMe
BUS: PCIe 5.0 / Full Tower Workstation

Starting at

$39,999

Built to order (2-4 weeks)

Sweet spot

400B+ models with room to spare

Stretch

Multi-model serving or large-scale training

Configurable

  • +RAM: 256GB DDR5 ECC (standard) / 512GB DDR5 ECC / 1TB DDR5 ECC
  • +CPU: Xeon W9-3495X (56-core) / W9-3475X (36-core)
  • +Storage: 2TB NVMe boot + NVMe data array
  • +NVLink: None (standard) / RTX PRO 6000 NVLink bridge (confirm compatibility at order)
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE/400GbE SmartNIC
  • +Cooling: Air cooled (standard) / Liquid cooling (recommended for 4x passive GPUs)
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

RTX PRO 6000 is passive-cooled. Requires validated tower airflow (burn-in verified)

600W TDP per card at full power. Power-limiting available (400-600W configurable)

Premium pricing reflects new Blackwell cards + modern Xeon W9 platform

240V recommended. 4x400-600W GPUs + Xeon W9 = 2.5-4kW depending on power config.

Upgrade path: Synoros Foundry S8-768

WS-320 / Built to Order
Synoros Foundry WS-320 - 4x A100 80GB PCIe

320GB

Synoros Foundry WS-320

4x A100 80GB in a modern tower: Datacenter in a box

Full Tower Workstation / Enterprise team, research lab, or AI startup wanting 320GB without a rack

GPU: 4x A100 80GB PCIe
CPU: Xeon W9-3475X (36C/72T)
RAM: 256GB DDR5 ECC
DISK: 2TB NVMe
BUS: PCIe 4.0 GPUs / 5.0 host / Full Tower Workstation

Starting at

$82,999

Built to order (2-3 weeks)

Sweet spot

400B low-quant with 4-way tensor parallelism

Stretch

Serious 122B/397B production serving

Configurable

  • +RAM: 256GB DDR5 ECC (standard) / 512GB DDR5 ECC / 1TB DDR5 ECC
  • +CPU: Xeon W9-3495X (56-core) / W9-3475X (36-core)
  • +Storage: 2TB NVMe boot + NVMe data array
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE/400GbE SmartNIC
  • +Cooling: Air cooled (standard) / Liquid cooling for sustained compute
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

A100 PCIe variant. NVLink not available on PCIe A100 (only SXM version has NVLink)

Premium pricing reflects A100 market rates + modern workstation platform

Large tower footprint. This is a full-size workstation, not a compact build

120V with a 1600W+ PSU is possible; 240V is recommended. 4x300W GPUs + Xeon W9 = ~1.8kW peak.

Upgrade path: Synoros Foundry R4-384 or S8-768

Production rackmount servers

R8-192 / Built to Order
Synoros Foundry R8-192 - 8x Tesla P40 24GB

192GB

Synoros Foundry R8-192

Maximum VRAM per dollar in an 8-GPU chassis

4U Rackmount / Budget-conscious buyer who needs raw VRAM capacity over speed

GPU: 8x Tesla P40 24GB
CPU: Dual Xeon / Dual EPYC (varies)
RAM: 128GB DDR4 ECC
DISK: 1TB NVMe
BUS: PCIe 3.0 / 4U Rackmount

Starting at

$6,499

Built to order (1-2 weeks)

Sweet spot

122B low-quant with 8-way parallelism

Stretch

400B experiments with aggressive quantization + CPU offload

Configurable

  • +RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
  • +CPU: Dual Xeon / Dual EPYC 7002 (flexible based on chassis)
  • +Storage: 1TB NVMe boot + SATA/NVMe array (hot-swap bays available)
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-5 25GbE SmartNIC / ConnectX-5 100GbE SmartNIC
  • +GPU swap: Replace P40s with V100 32GB (better compute, same socket)
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

P40 has no tensor cores and weak FP16. This is a capacity play, not a speed play

8x250W passive cards = 2kW GPU draw alone. 208/240V mandatory

PCIe 3.0 host limits load/offload speed across all 8 cards

Older compute architecture. Per-token inference is slower than on Volta or Ampere cards with equivalent VRAM

240V required. 8x250W GPUs + dual Xeon/EPYC host = ~2.5-3kW steady state.

Upgrade path: Synoros Foundry R4-192A or R4-320
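The "maximum VRAM per dollar in an 8-GPU chassis" claim for the R8-192 can be checked from the starting prices listed on this page. The short sketch below does that arithmetic for the three 8-GPU systems. It uses base prices only, so configured options would change the result.

```python
# (SKU, starting price in USD, total VRAM in GB), copied from the 8-GPU listings.
EIGHT_GPU = [("R8-192", 6499, 192), ("R8-256", 10999, 256), ("S8-768", 59999, 768)]

def usd_per_gb(price, vram_gb):
    """Starting price divided by total GPU memory."""
    return price / vram_gb

# Cheapest per GB first.
ranked = sorted(EIGHT_GPU, key=lambda s: usd_per_gb(s[1], s[2]))
for sku, price, vram in ranked:
    print(f"{sku}: ${usd_per_gb(price, vram):.2f} per GB")
```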

R4-128 / Ships in 1-2 weeks
Synoros Foundry Rack 128 - 4x Tesla V100 32GB

128GB

Synoros Foundry Rack 128

Entry rack for clean 70B deployment

4U Rackmount / Small lab or serious home rack builder

GPU: 4x Tesla V100 32GB
CPU: Dual Xeon E5-2680 v3 (24C/48T total)
RAM: 64GB DDR4 ECC
DISK: 512GB NVMe
BUS: PCIe 3.0 / 4U Rackmount

Starting at

$5,499

Ships in 1-2 weeks

Sweet spot

70B at comfortable quantization

Stretch

122B low-quant with CPU offload margin

Configurable

  • +RAM: 64GB DDR4 ECC (standard) / 128GB DDR4 ECC / 256GB DDR4 ECC
  • +CPU: Xeon E5-2680 v3 (budget) / E5-2697 v4 (18-core)
  • +Storage: 512GB NVMe boot (standard) / 1TB NVMe boot / + up to 7x additional 3.5" SATA hot-swap drives (8 bays total)
  • +NVLink: None (standard) / V100 NVLink bridge for paired cards
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-5 25GbE SmartNIC / ConnectX-5 100GbE SmartNIC
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

4x250W passive GPUs on 120V is not comfortable. 208/240V strongly recommended

PCIe 3.0 host limits model load throughput

V100 PCIe supports 2-way NVLink bridge. Available as upgrade for paired cards

208/240V strongly recommended. 4x250W GPUs + host = ~1.5kW total draw.

Upgrade path: Synoros Foundry R4-192

R8-256 / Built to Order
Synoros Foundry R8-256 - 8x Tesla V100 32GB

256GB

Synoros Foundry R8-256

8x V100 performance rack with 256GB VRAM

4U Rackmount / Serious lab, training workloads, or multi-model serving

GPU: 8x Tesla V100 32GB
CPU: Dual Xeon / Dual EPYC (varies)
RAM: 128GB DDR4 ECC
DISK: 1TB NVMe
BUS: PCIe 3.0 / 4U Rackmount

Starting at

$10,999

Built to order (1-2 weeks)

Sweet spot

70B-122B with 8-way tensor parallelism

Stretch

400B at aggressive quantization with CPU offload

Configurable

  • +RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC / 1TB DDR4 ECC
  • +CPU: Dual Xeon / Dual EPYC 7002 (flexible based on chassis)
  • +Storage: 1TB NVMe boot + SATA/NVMe array (hot-swap bays)
  • +NVLink: None (standard) / V100 NVLink bridges for GPU pairing
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

8x250W passive cards = 2kW GPU draw. 208/240V mandatory

PCIe 3.0 host limits model load throughput but steady-state decode is fine

V100 PCIe supports 2-way NVLink bridge (available as upgrade)

Used enterprise chassis. Cosmetic wear does not affect functionality

240V required. 8x250W GPUs + dual CPU host = ~2.5-3kW steady state.

Upgrade path: Synoros Foundry R4-320 or R4-384

R4-192A / Built to Order
Synoros Foundry R4-192A - 4x RTX A6000 48GB

192GB

Synoros Foundry R4-192A

Modern CUDA 48GB-per-card rack with tensor cores

4U Rackmount / Startup or lab wanting speed and capacity on NVIDIA CUDA

GPU: 4x RTX A6000 48GB
CPU: EPYC 7313 (16C/32T)
RAM: 128GB DDR4 ECC
DISK: 1TB NVMe
BUS: PCIe 4.0 / 4U Rackmount

Starting at

$35,999

Built to order (2-3 weeks)

Sweet spot

122B 4-bit inference with Ampere tensor cores

Stretch

400B with aggressive quantization + CPU offload

Configurable

  • +RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
  • +CPU: EPYC 7313 (budget) / EPYC 7443 (24-core) / EPYC 7543 (32-core)
  • +Storage: 1TB NVMe boot + NVMe/SATA data array
  • +NVLink: None (standard) / A6000 NVLink bridge for GPU pairing
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
  • +Cooling: Air cooled (standard) / Liquid cooling for quieter deployment
  • +GPU upgrade: RTX 6000 Ada (Lovelace) when used prices drop
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

A6000 is a workstation card (active cooling, 300W). Louder than passive server cards

Used A6000 pricing has come down but is still higher than P40/V100 per-GB

4 slots used. Not expandable beyond 192GB without a chassis swap

208/240V recommended. 4x300W GPUs + EPYC host = ~1.8-2kW peak.

Upgrade path: Synoros Foundry R4-320 or R4-384

R4-192W / Built to Order
Synoros Foundry R4-192W - 4x Radeon Pro W7900 48GB

192GB

Synoros Foundry R4-192W

Modern 48GB-per-card 4-GPU rack

4U Rackmount / Startup, boutique datacenter, or quiet-ish rack buyer

GPU: 4x Radeon Pro W7900 48GB
CPU: EPYC 7313 (16C/32T)
RAM: 128GB DDR4 ECC
DISK: 1TB NVMe
BUS: PCIe 4.0 / 4U Rackmount

Starting at

$14,999

Built to order (2-3 weeks)

Sweet spot

122B 4-bit inference

Stretch

400B-class models remain out of reach without CPU offload

Configurable

  • +RAM: 128GB DDR4 ECC (standard) / 256GB DDR4 ECC / 512GB DDR4 ECC
  • +CPU: EPYC 7313 (budget) / EPYC 7443 (24-core) / EPYC 7543 (32-core)
  • +Storage: 1TB NVMe boot + NVMe/SATA data array
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-6 100GbE SmartNIC
  • +Cooling: Air cooled (standard) / Liquid cooling
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

Active workstation GPUs. Easier acoustically, higher per-card cost

AMD ROCm ecosystem. Verify framework compatibility for your stack

Premium pricing reflects current W7900 market rates

208/240V recommended. 4x active-cooled GPUs + EPYC host = ~1.8kW peak.

Upgrade path: Synoros Foundry R4-320 or R4-384

R4-384 / Built to Order
Synoros Foundry R4-384 - 4x RTX PRO 6000 Blackwell 96GB

384GB

Synoros Foundry R4-384

Premium 4-card single-node 400B platform

4U Rackmount / Provider or research team needing one serious 4-card node

GPU: 4x RTX PRO 6000 Blackwell 96GB
CPU: Dual EPYC 7443 (48C/96T total)
RAM: 256GB DDR4 ECC
DISK: 2TB NVMe
BUS: PCIe 5.0 GPUs / 4.0 host / 4U Rackmount

Starting at

$31,999

Built to order (2-3 weeks)

Sweet spot

400B-class models with room to spare

Stretch

Serious multi-model serving infrastructure

Configurable

  • +RAM: 256GB DDR4 ECC (standard) / 512GB DDR4 ECC / 1TB DDR4 ECC
  • +CPU: Dual EPYC 7443 (standard) / 7713 (64-core)
  • +Storage: 2TB NVMe boot + NVMe/SATA data array
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE SmartNIC
  • +Cooling: Air cooled (standard) / Liquid cooling
  • +PCIe 5.0 host platform upgrade (eliminates bandwidth compromise)
  • +GPU expansion: 8x RTX PRO 6000 (768GB) in 8-GPU chassis
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

PCIe 4.0 host halves host-link bandwidth vs Gen5. Acceptable for inference, not ideal for high-throughput batching

600W TDP per card at full power. Power-limiting available (400-600W configurable)

New-generation cards. Pricing reflects current market, not used/refurbished

240V required. 4x400-600W GPUs + host = 3-4kW depending on power config.

Upgrade path: Synoros Foundry S8-768
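The R4-384 note that a PCIe 4.0 host halves host-link bandwidth versus Gen5 follows from the per-lane transfer rates. The sketch below computes theoretical one-direction x16 bandwidth and a best-case time to fill one 96GB card. These are upper bounds only: real transfers are slower, and storage speed is often the actual limit during model load.

```python
GT_PER_S = {3: 8, 4: 16, 5: 32}   # per-lane transfer rate by PCIe generation

def x16_gb_per_s(gen, lanes=16):
    """Theoretical one-direction bandwidth in GB/s after 128b/130b encoding."""
    return GT_PER_S[gen] * lanes * (128 / 130) / 8

def load_seconds(size_gb, gen):
    """Best-case time to move size_gb over one x16 link."""
    return size_gb / x16_gb_per_s(gen)

for gen in (3, 4, 5):
    print(f"Gen{gen} x16: {x16_gb_per_s(gen):.1f} GB/s, "
          f"96 GB in {load_seconds(96, gen):.1f} s")
```

Steady-state decode moves far less data over the host link than a model load does, which is why the listings treat the Gen4 host as acceptable for inference.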

R4-320 / Built to Order
Synoros Foundry R4-320 - 4x A100 80GB PCIe

320GB

Synoros Foundry R4-320

Used-enterprise 80GB/card performance rack

4U Rackmount / Enterprise pilot or premium used-rack buyer

GPU: 4x A100 80GB PCIe
CPU: Dual EPYC 7443 (48C/96T total)
RAM: 256GB DDR4 ECC
DISK: 2TB NVMe
BUS: PCIe 4.0 / 4U Rackmount

Starting at

$75,999

Built to order (2-3 weeks)

Sweet spot

122B/397B serving with tensor parallelism

Stretch

400B low-quant single-node inference

Configurable

  • +RAM: 256GB DDR4 ECC (standard) / 512GB DDR4 ECC / 1TB DDR4 ECC
  • +CPU: Dual EPYC 7443 (standard) / 7543 (32-core) / 7713 (64-core)
  • +Storage: 2TB NVMe boot + up to 8x NVMe/SATA hot-swap data drives
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE SmartNIC / ConnectX-7 400GbE SmartNIC
  • +Cooling: Liquid cooling option for datacenter deployment
  • +GPU expansion: Add 4 more A100s (640GB total) in same chassis
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

Premium used-enterprise gear. Not mainstream budget hardware

A100 PCIe variant (not SXM). NVLink not available

High power draw requires dedicated 208/240V circuit

240V required. 4x300W GPUs + dual EPYC host = ~2.5kW steady state.

Upgrade path: Synoros Foundry R4-384 or S8-768
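
A minimal sketch of the memory arithmetic behind the R4-320 sweet-spot and stretch lines, assuming tensor parallelism shards the weights evenly across the four 80GB cards. The 10% per-GPU reserve for KV cache and runtime buffers is an assumption chosen for illustration, not a measured value, and the function names are hypothetical.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def per_gpu_fit(params_billion: float, bits_per_weight: float,
                num_gpus: int = 4, gpu_gb: float = 80.0,
                reserve: float = 0.10) -> tuple[float, bool]:
    """Return (GB of weights per GPU, whether that fits under the reserve)."""
    shard = weights_gb(params_billion, bits_per_weight) / num_gpus
    usable = gpu_gb * (1 - reserve)  # assumed headroom for KV cache, buffers
    return shard, shard <= usable


if __name__ == "__main__":
    for params, bits in [(122, 8), (400, 4), (400, 8)]:
        shard, ok = per_gpu_fit(params, bits)
        print(f"{params}B @ {bits}-bit: {shard:.1f} GB per GPU, fits={ok}")
```

Under these assumptions a 400B model at 4 bits needs about 50GB per card, while the same model at 8 bits would need 100GB per card and does not fit, which is why 400B-class models are listed only as a low-quant stretch.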

S8-768 - Built to Order
Synoros Foundry S8-768 - 8x RTX PRO 6000 Blackwell 96GB

768GB

Synoros Foundry S8-768

Budget-datacenter flagship for large-model serving

4U/5U Rackmount / Budget datacenter, inference provider, or halo SKU buyer

GPU: 8x RTX PRO 6000 Blackwell 96GB
CPU: Dual EPYC 7443 (48C/96T total)
RAM: 512GB DDR4 ECC
DISK: 2TB NVMe
BUS: PCIe 4.0 host / 5.0 GPUs / 4U/5U Rackmount

Starting at

$59,999

Built to order (3-4 weeks)

Sweet spot

400B+ models with full tensor parallelism

Stretch

Multi-model serving infrastructure or training workloads

Configurable

  • +RAM: 512GB DDR4 ECC (standard) / 1TB DDR4 ECC / 2TB DDR4 ECC
  • +CPU: Dual EPYC 7443 (standard) / 7713 (64-core)
  • +Storage: 2TB NVMe boot + up to 24x 2.5" hot-swap SATA/NVMe bays
  • +Network: 1GbE onboard (standard) / 10GbE SFP+ (Mellanox ConnectX-3) / ConnectX-6 25GbE SmartNIC / ConnectX-7 200GbE SmartNIC / ConnectX-7 400GbE SmartNIC
  • +Cooling: Air cooled (standard) / Liquid cooling for sustained full-power operation
  • +PCIe 5.0 host platform upgrade (eliminates all bandwidth compromise)
  • +Power: Dual 240V 30A circuits for redundancy
  • +Display: Headless (no display adapter, SSH only, standard) / GT 710 1GB basic display / Quadro P400 3x 4K display / Quadro P620 4x 4K + streaming
  • +OS: Costa OS (standard) / Ubuntu 24.04 LTS / Windows 11 Pro / No OS

PCIe 4.0 host is the cost-down move. Gen5 host would add significant cost

5kW+ power draw at full load. 240V 30A minimum, 40A recommended

Passive GPUs require validated server chassis airflow. Do not attempt in consumer cases

240V 30A minimum (40A recommended). 8x400-600W GPUs + dual EPYC + 512GB RAM = 4-6kW.
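
To see why 40A is recommended over the 30A minimum, here is a small sketch of the circuit arithmetic. It assumes the common rule of thumb that a continuous load should stay at or below 80% of the breaker rating; that derating factor is an illustrative assumption, so confirm the actual requirement with a licensed electrician and your local electrical code.

```python
def continuous_capacity_w(volts: float, breaker_amps: float,
                          derate: float = 0.8) -> float:
    """Watts a circuit can carry continuously under the assumed derating."""
    return volts * breaker_amps * derate


def fits_circuit(load_w: float, volts: float, breaker_amps: float,
                 derate: float = 0.8) -> bool:
    """True if the load stays within the derated circuit capacity."""
    return load_w <= continuous_capacity_w(volts, breaker_amps, derate)


if __name__ == "__main__":
    # 4-6kW is the system range quoted above.
    for load in (4000, 5000, 6000):
        for amps in (30, 40):
            verdict = "ok" if fits_circuit(load, 240, amps) else "over"
            print(f"{load} W on 240V/{amps}A: {verdict}")
```

With the 80% assumption a 240V 30A circuit carries about 5.76kW continuously and a 40A circuit about 7.68kW, so the low end of the 4-6kW range fits on 30A while the high end does not.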

Upgrade path: 2-node cluster or premium Gen5 host refresh
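
The Gen4-versus-Gen5 host trade-off mentioned in these notes can be put in rough numbers. The sketch below computes theoretical one-direction link bandwidth from the per-lane signalling rate and the 128b/130b line encoding, assuming a full x16 link per GPU (the slot width is an assumption; this page does not state it). Real transfers land below these figures because of protocol overhead.

```python
# Per-lane signalling rate in GT/s; Gen4 and Gen5 both use 128b/130b encoding.
GT_PER_S = {4: 16.0, 5: 32.0}


def pcie_gb_per_s(gen: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * (128 / 130) / 8 * lanes


def transfer_seconds(size_gb: float, gen: int, lanes: int = 16) -> float:
    """Best-case time to move size_gb across the link."""
    return size_gb / pcie_gb_per_s(gen, lanes)


if __name__ == "__main__":
    for gen in (4, 5):
        print(f"Gen{gen} x16: {pcie_gb_per_s(gen):.1f} GB/s, "
              f"96 GB copy in {transfer_seconds(96, gen):.1f} s")
```

Under these assumptions a Gen4 x16 link tops out near 31.5 GB/s and Gen5 near 63 GB/s, so filling one 96GB card takes about 3 seconds instead of about 1.5 in the best case. Once weights are resident in VRAM, single-stream generation rarely touches the host link, which is consistent with the "acceptable for inference" note.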

What Can You Run

Green = comfortable, amber = possible with quantization, dash = not viable.

Model Size
CB-16 (16GB)
CB-24 (24GB)
CB-32 (32GB)
CB-32D (32GB)
CB-48 (48GB)
CB-64 (64GB)
CO-12 (12GB)
CO-16 (16GB)
WS-128 (128GB)
WS-192T (192GB)
WS-320 (320GB)
WS-384 (384GB)
WS-48 (48GB)
WS-72 (72GB)
WS-96 (96GB)
R4-128 (128GB)
R8-192 (192GB)
R8-256 (256GB)
R4-192A (192GB)
R4-192W (192GB)
R4-320 (320GB)
R4-384 (384GB)
S8-768 (768GB)
7B
13B
24B-35B~~~
70B--~~~--~
122B--------~-~~~
400B+---------~----~~~~

Model VRAM Guide

Approximate VRAM at Q4_K_M and Q8 quantization. Use this to match models to configurations above.

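Family | Model | Params | Q4 VRAM | Q8 VRAM | Notes
Qwen 3.5 | qwen3.5:0.8b | 0.8B | 1.5 GB | 2 GB |
Qwen 3.5 | qwen3.5:2b | 2B | 2.5 GB | 3.5 GB |
Qwen 3.5 | qwen3.5:4b | 4.7B | 4 GB | 6.5 GB |
Qwen 3.5 | qwen3.5:9b | 9.7B | 7 GB | 12 GB |
Qwen 3 | qwen3:14b | 14B | 9 GB | 16 GB |
Apriel | apriel-15b-thinker | 15B | 10 GB | 17 GB |
GPT-OSS | gpt-oss:20b | 21B (3.6B active) | MXFP4: 16 GB | | MoE
Llama 3 | llama3.1:8b | 8B | 5.5 GB | 9.5 GB |
Llama 3 | llama3.3:70b | 70B | 40 GB | 74 GB |
Mistral | mistral-small3.1:24b | 24B | 14 GB | 26 GB |
Mistral | mistral-nemo:12b | 12B | 8 GB | 14 GB |
Gemma 3 | gemma3:4b | 4B | 3.5 GB | 5.5 GB |
Gemma 3 | gemma3:12b | 12B | 8 GB | 14 GB |
Gemma 3 | gemma3:27b | 27B | 17 GB | 30 GB |
Phi | phi4:14b | 14B | 9 GB | 16 GB |
DeepSeek R1 | deepseek-r1:7b | 7B | 5 GB | 8.5 GB |
DeepSeek R1 | deepseek-r1:14b | 14B | 9 GB | 16 GB |
DeepSeek R1 | deepseek-r1:32b | 32B | 19 GB | 35 GB |
Devstral | devstral | 24B | 14 GB | 26 GB | MoE
Qwen Coder | qwen3-coder | 30B (3.3B active) | 8.5 GB | 18 GB | MoE
Qwen Coder | qwen2.5-coder:14b | 14B | 10 GB | 16 GB |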

Green = fits 16 GB, amber = tight fit, red = needs more VRAM. MoE models load all weights but activate a fraction per token. See our quantization guide for details.
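
For models not listed, a back-of-the-envelope estimate is parameters times bits per weight divided by eight. The sketch below applies that to three rows copied from the guide. The bits-per-weight values are assumptions for llama.cpp-style formats (Q8_0 works out to 8.5 bits because each block of 32 int8 weights carries one fp16 scale; the Q4_K_M figure is an approximate average), and the result covers weights only, not KV cache or runtime buffers.

```python
# Approximate effective bits per weight (assumed, see lead-in).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "FP16": 16.0}

# (params in billions, Q4 GB, Q8 GB) copied from the guide on this page.
TABLE_ROWS = {
    "llama3.3:70b": (70, 40, 74),
    "deepseek-r1:32b": (32, 19, 35),
    "gemma3:27b": (27, 17, 30),
}


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Weight footprint in GB (1e9 bytes); excludes KV cache and buffers."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


if __name__ == "__main__":
    for tag, (params, q4_gb, q8_gb) in TABLE_ROWS.items():
        q4 = estimate_weights_gb(params, "Q4_K_M")
        q8 = estimate_weights_gb(params, "Q8_0")
        print(f"{tag}: Q4 ~{q4:.1f} GB (guide {q4_gb}), "
              f"Q8 ~{q8:.1f} GB (guide {q8_gb})")
```

For these three rows the estimate lands within about 3 GB of the published figures, which is close enough for matching a model to a configuration but not a substitute for the tested recommendations.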

Performance Benchmarks

Published inference speeds from third-party benchmarks. Single GPU, single user, llama.cpp / Ollama.

7B Q4 Generation (tokens/sec, single GPU)

P100 16GB: ~35-50
P40 24GB: ~41
V100 32GB: ~85-107
A6000 48GB: ~102
W7900 48GB: ~121
A100 80GB: ~138
RTX PRO 6000: ~185

Sources: GPU-Benchmarks-on-LLM-Inference (GitHub), DatabaseMart, Hardware Corner, LocalScore, GamersNexus.
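
To translate these rates into something you can feel, the sketch below converts tokens per second into wait time for a reply of a given length. It uses single-value figures from the list above and ignores prompt processing and model load time, so treat the results as a lower bound; the 500-token reply length is an arbitrary example.

```python
# Generation rates (tokens/sec, 7B Q4) taken from the figures above.
RATES = {
    "P40 24GB": 41,
    "A6000 48GB": 102,
    "W7900 48GB": 121,
    "A100 80GB": 138,
    "RTX PRO 6000": 185,
}


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate num_tokens at a steady tokens_per_sec."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    reply_tokens = 500  # arbitrary example length
    for gpu, rate in RATES.items():
        print(f"{gpu}: {generation_seconds(reply_tokens, rate):.1f} s")
```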

Frequently Asked Questions

Common questions about our GPU server configurations.

What GPU do I need to run a 70B parameter model locally?
Running a 70B model (like Llama 3 70B) at Q4 quantization requires roughly 40GB of VRAM minimum. Our CostaBox Duo 48 (2x P40, 48GB) handles this comfortably, while the CostaBox Duo 64 (2x V100, 64GB) gives headroom for higher-bit quantization and longer context. For 8-bit 70B (about 74GB), look at our Workstation or Rackmount configurations with 96GB+ VRAM. Unquantized FP16 weights alone take roughly 140GB, so full precision calls for 192GB or more.
Are these servers new or refurbished?
Foundry Lite and Workstation builds use quality-certified refurbished enterprise components sourced from verified datacenters. Every GPU, CPU, and memory module is individually tested during a 48-hour burn-in with real LLM workloads before shipping. Used GPUs carry a 90-day warranty and non-GPU components 1 year, with DOA replacement within 14 days. CostaBox PCs (CO-12, CO-16) use all-new retail components.
Can I use these servers for training or just inference?
Our servers are primarily optimized for inference, which is what most local LLM users need. Training is possible on higher-end configurations with A100 or RTX PRO 6000 GPUs that have sufficient memory bandwidth and tensor core performance. For inference workloads, even our budget P100 and P40 builds reach roughly 35-50 tokens per second on 7B models at Q4 in published third-party benchmarks, at a fraction of cloud API costs.
What is the difference between P100, P40, V100, A100, and RTX PRO 6000?
P100 (16GB, no tensor cores) is our budget entry point for 7B-13B models. P40 (24GB, no tensor cores) fits larger models at the best price per GB of VRAM. V100 (32GB, tensor cores) is the sweet spot for 24B-35B models with faster inference. A100 (80GB, 3rd-gen tensor cores) handles 70B+ models with high throughput. RTX PRO 6000 (96GB, latest architecture) is the fastest single-GPU option for maximum model size and speed.
Do these servers come with an operating system?
Every build ships with Costa OS (our Arch-based Linux with Ollama pre-configured) by default. You can choose Ubuntu 24.04 LTS, Windows 11 Pro, or no OS during configuration. Costa OS includes local model routing, voice control, and agent navigation, but any Linux distro or Windows will work with the hardware.
What warranty coverage do GPU servers have?
Non-GPU components (CPU, RAM, motherboard, PSU, storage, chassis) are covered for 1 year. Used datacenter GPUs are covered for 90 days. New GPUs (in CostaBox PCs and select configurations) carry a full 1-year warranty. Dead-on-arrival replacement is handled within 14 days. Full warranty terms are available on our warranty page.
Can I customize the configuration before ordering?
Yes. Every SKU has a configurator page where you can upgrade RAM, storage, CPU, networking, and GPU options with live pricing. For builds outside our standard options, contact sales for a fully custom configuration quote. We respond within 24 hours.
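
As a sketch of what "Ollama pre-configured" in the operating system answer can look like in practice, the snippet below sends one prompt to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes Ollama is listening on its default local address and that the `llama3.1:8b` model from the VRAM guide has already been pulled; neither is confirmed by this page for Costa OS.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumed unchanged on the shipped system).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str,
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for the given model tag."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Example, requires a running Ollama server with the model pulled:
#   print(generate("llama3.1:8b", "Say hello in five words."))
```

Because the request goes to localhost, prompts and responses stay on your own hardware.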

Warranty & Support

1 Year

Non-GPU warranty

90 Days

Used GPU warranty

10 Days

RMA turnaround target

Direct

Builder email support

Non-GPU components covered for 1 year. Used GPUs (which may have prior mining or datacenter runtime) covered for 90 days. New GPUs carry a full 1-year warranty. DOA replacement within 14 days.

Read full warranty terms →

Ready to Build?

Tell us what you need. We respond within 24 hours. Custom configurations available on request.
