Premium Yebisu Beer, Born in 1890, Tokyo

PRAXIS - A Maximally GPU-Optimized Digital Organoid

With the recent uptick in neuromorphic computing and increasing demand for recurrence and statefulness in LLMs, brain-inspired computing architectures are gaining renewed interest. Yet the field faces a fundamental dilemma: how do we reconcile biological realism with the constraints of modern computing hardware?

The neuromorphic computing community has largely bet on custom silicon - spending billions to design and manufacture specialized chips that can efficiently handle spike-based, asynchronous neural networks. But this approach overlooks a critical opportunity: what if algorithmic modifications could achieve brain-like computing capabilities using the massive GPU infrastructure already deployed for deep learning?

 

The GPU-first philosophy

Deep learning succeeded by abandoning biological fidelity. Neurons became matrix multiplications, spikes became floating-point activations, and complex dendritic trees simplified into linear transformations. This pragmatism unlocked unprecedented scale and capability.

We can propose taking the same pragmatic approach to neuromorphic computing. Rather than forcing biology’s asynchronous, spike-based computation onto synchronous hardware, we can ask: which biological principles truly matter for intelligent learning, and how can we preserve them while maximizing GPU efficiency?

This writing introduces PRAXIS (Practical Recurrent Additive eXpander Intelligence System) — a new framework for building “digital organoids” that captures the essential learning dynamics of biological neural networks while running efficiently on standard GPU hardware. By “digital organoid,” we mean a computational system that preserves the self-organizing, adaptive learning properties of biological neural tissue while being optimized for digital substrates.

 

Core principles

PRAXIS makes deliberate tradeoffs, prioritizing:

 

The problems with traditional neuromorphic computing

Traditional approaches to digital brain simulation suffer from a fundamental mismatch between biological operating principles and modern computing architectures. We can first examine why naive brain emulations fail on today’s hardware.

 

Asynchronous spikes vs. synchronous hardware

Biological neurons communicate through discrete, asynchronous spikes — each neuron fires independently based on its internal dynamics. This event-driven computation is elegant in wetware but catastrophic for GPUs, which achieve their tremendous throughput through lockstep parallel execution. Every spike requires branching logic, breaking the SIMD (Single Instruction, Multiple Data) paradigm that makes GPUs fast.

 

Information bottleneck

A single spike carries exactly one bit of information: fired or not fired. To convey analog values, biological neurons must integrate spike trains over time, requiring hundreds of computational steps to transmit what a floating point number represents instantly. This temporal coding is massively inefficient on hardware designed for dense matrix operations.

 

Sparse, irregular connectivity

Real neural networks follow log-normal distributions — most neurons have few connections while a small subset are highly connected hubs. This irregular topology requires sparse matrix operations and indirect memory access patterns that devastate GPU performance. Worse, the number of synapses grows quadratically with neuron count, creating a memory wall that limits practical network sizes.

 

Biological baggage

Traditional neuromorphic systems faithfully reproduce biological constraints that exist for metabolic, not computational reasons:

 

Added together, these features consume enormous computational resources while contributing little to learning capability.

 

The scaling catastrophe

The combination of these factors creates a perfect bottleneck: as networks grow, they become exponentially slower and more memory intensive. A biologically realistic simulation of even a small cortical column can bring a GPU cluster to its knees, while a transformer model with billions of parameters runs smoothly on a single accelerator.

This is why PRAXIS takes a radically different approach — preserving the computational principles that make brains powerful learners while reimagining their implementation for modern hardware.

 

Core adaptations for GPU efficiency

To bridge the gap between brain-like computation and GPU architecture, PRAXIS implements several key adaptations. Each represents a deliberate tradeoff — sacrificing biological fidelity for massive gains in computational efficiency while preserving the essential dynamics that enable learning.

 

1. Synchronous tick-based execution

Instead of asynchronous, event-driven spikes, PRAXIS operates on synchronized global time steps where all neurons update simultaneously. This aligns perfectly with GPUs, where thousands of cores execute the same instruction in parallel. The entire network state can be updated with a single matrix multiplication rather than complex event queues and conditional logic.

While some learning mechanisms that depend on exact spike sequences (fine grained spike timing and precise temporal coding) are lost, the essential learning dynamics remain intact. After all, artificial neural networks have demonstrated powerful learning with synchronous updates. The brain’s asynchrony may be more about biological constraints than computational necessity.

 

2. Rate-coded analog values

Rather than binary spikes, each neuron outputs a continuous value representing its expected spike rate over the time window of a single computational tick — essentially converting discrete events into analog streams. A single analog value (e.g. signed 8 bit integers) can encode what would require dozens of binary spikes, dramatically increasing information density per compute cycle. Matrix operations on continuous values map directly to tensor cores and benefit from decades of GPU optimization.

Losses on learning mechanisms that rely on individual spike timing precision do occur, but the core principle of importance — neurons with higher activation influencing their targets more strongly - is preserved. Research shows that most neural computation relies on firing rates rather than precise spike timing, especially for learning over longer timescales.

 

The key insight

These adaptations share a common philosophy: the medium is not the message. Whether information flows through discrete spikes or continuous values, whether updates happen asynchronously or in lockstep — these are implementation details. What matters is preserving the computational principles that enable flexible, powerful learning.

By making these pragmatic choices, PRAXIS can leverage the full power of modern GPU hardware while maintaining the essential dynamics that make biological neural networks such capable learners. The next adaptations take this philosophy even further, reimagining fundamental aspects of neural computation for the realities of silicon.

 

The Gaussian distribution revolution

While synchronous execution and rate coding have been explored in prior literature, PRAXIS introduces a genuinely novel insight that unlocks dramatic efficiency gains: enforcing Gaussian distributions throughout the network.

 

The problem with log-normal networks

Biological neural networks — and most artificial networks that use multiplicative operations — naturally develop log-normal distributions. This happens because neural activations typically result from multiplying many factors together (synaptic weights, gating mechanisms, normalization terms), and the product of many positive random variables tends toward a log-normal distribution.

This creates two critical problems:

  1. Memory inefficiency: log-normal distributions have long tails, requiring wide numerical ranges to capture rare but important values - most bits are wasted representing extremely large or small values that rarely occur.

  2. Connectivity explosion: to reliably sum to a target value from log-normally distributed inputs, you need many more connections to average out the high variance - this drives the quadratic growth in synapses that plagues biological simulations.

 

The Gaussian solution

PRAXIS actively maintains all neuron activations and synaptic weights in a Gaussian (normal) distribution centered at zero with unit variance. This seemingly simple constraint gifts profound implications:

 

1. Extreme compression without information loss

With values guaranteed to follow predictable ranges, we can confidently use signed 8 bit integers instead of 32 bit floats. Since 99.7% of values fall within ±3 standard deviations, we lose virtually no information while achieving 4 times the memory compression. This isn’t just quantization — it’s quantization with mathematical guarantees.

 

2. Natural sparsity through concentration of measure

An even more promising gain is that: when inputs follow a Gaussian distribution, you need only k√N connections per neuron (where k ≈ 2-4) rather than N to capture most relevant information. This is due to concentration of measure phenomena in high dimensions. For example, with 1000 neurons, each needs only ~32 connections instead of 1000 — a 30 times reduction in memory and computation. This further reduces computing and (more importantly) memory consumption.

 

3. Inherent stability

Gaussian distributions are closed under addition — adding Gaussian variables yields another Gaussian. This makes the network naturally stable, preventing the gradient explosions and vanishing that plague deep networks. No careful initialization schemes or gradient clippings are needed.

 

Implementation: the additive constraint

Maintaining Gaussian distributions requires a fundamental shift: all operations must be additive, not multiplicative. Traditional weight updates such as w = w * gradient are forbidden. Instead, PRAXIS uses pure addition: w = w + delta.

There however exists a fundamental limitation of such systems alone: additive-only networks are not Turing complete. A feedforward network using only addition and linear operations cannot compute arbitrary functions.

In order to achieve Turing-completeness, we must add: 1) recurrence (feedback of the previous hidden state into the next step), and 2) at least one non-polynomial activation function. Such recurrent nets with smooth activations were proven to be Turing complete by Siegelmann & Sontag.

 

Adding in recurrence

PRAXIS solves this by introducing recurrence in the network. Each neuron maintains a hidden state that feeds back into its next computation. Combined with at least one non-polynomial activation function (specifically, a custom saturating tanh function defined later), this restores Turing completeness while preserving the Gaussian property.

Remarkably, this recurrent state doesn’t just restore computational power - it enables further sophistication of learning through eligibility traces, allowing the network to bridge temporal gaps between causes and effects. The hidden values necessary for recurrence naturally provide the signals needed to compute eligibility traces.

Eligibility traces are essentially a “short-term memory” of recent neural coincidences which can wait for a global signal (e.g. a reward or disincentive) before deciding how the synaptic weight is updated. A biological example of such global signals is dopamine, acting as a signaler to tell neurons which recent activities were worthwhile. This allows an easy expansion to make use of three-factor learning rules - where synaptic updates consider presynaptic activity, postsynaptic activity, and a global modulation signal. This closely mirrors how real brains are thought to make learning progress.

This marriage of Gaussian distributions and recurrent dynamics isn’t just a technical hack. It’s a fundamentally different way of thinking about neural computation — one optimized for the realities of digital hardware rather than the constraints of biology.

Combined together, PRAXIS implements learning through a biologically plausible mechanism combining three factors:

This approach, coined as Additive STDP with Eligibility Traces (A-STDP-ET), allows PRAXIS to learn from various signals — whether from supervised learning, reinforcement learning, or self-supervised objectives—while keeping all computations local and biologically realistic.

 

Breaking free from physical topology: 0-dimensionalism

The tyranny of space in biological brains

Biological neural networks are prisoners of physics. Neurons occupy physical space, axons must traverse real distances, and connections require metabolic energy to maintain. These constraints fundamentally shape brain architecture:

 

Digital spatial freedom

In silicon, these constraints vanish. There’s no metabolic cost to a long distance connection, no signal degradation over distance, no physical volume to optimize. A synapse between adjacent neurons costs exactly the same as one spanning the entire network.

Yet most neural network architectures slavishly copy biological structures — layers, local connectivity, hierarchical organization. PRAXIS asks: what’s the optimal connectivity pattern when physics doesn’t matter?.

 

Enter expander graphs

The answer comes from pure mathematics: expander graphs. These structures achieve seemingly impossible efficiency:

Think of it as the “small world” phenomenon perfected. For example, in a network of a million neurons, each connects to just 1,000 others, yet information travels between any pair in under 20 hops.

 

Ramanujan graphs: optimal expanders

PRAXIS specifically uses Ramanujan graphs, which are mathematically proven to be optimal expanders. The construction is elegant:

  1. Choose a prime number p ≈ √N for the degree
  2. Use modular arithmetic with primitive roots to deterministically generate connections
  3. Each neuron’s connections can be computed on the fly from its index

No connection matrices to store. No irregular memory access patterns. Just pure mathematical relationships that GPUs can compute in parallel.

 

The advantages of topology-free design

This radical connectivity pattern delivers multiple benefits:

By abandoning the physical constraints that shape biological brains, PRAXIS achieves a connectivity pattern that’s not just different — it’s mathematically optimal for information processing in digital substrates.

 

Implementation architecture

With the core principles established, we can now examine how PRAXIS actually computes. The implementation elegantly combines all our design choices into a unified system that runs efficiently on GPUs while maintaining brain-like learning dynamics.

 

1. Neuron dynamics

Each neuron in PRAXIS maintains a simple state that evolves over discrete time steps. Unlike traditional artificial neurons that instantly compute outputs, PRAXIS neurons have temporal dynamics inspired by biological membrane potentials:

membrane_potential[j] = leak_rate * previous_activation[j]                                       
                      + sum(weight[i,j] * previous_activation[i] for i in pre_synaptic_neurons)  
                      + sum(input_weight[k,j] * external_input[k])                               
                      + bias[j]                                                                  
                                                                                                 
activation[j] = sat_tanh(membrane_potential[j])                                                  

Where the key design choices are:

And the key parameters being:

The saturating activation function is crucial:

sat_tanh(x) = 2·tanh(x/2)

This maintains near-linear dynamics in the normal operating range while preventing extreme values.

 

2. Efficient connectivity via Ramanujan graphs

Rather than storing massive connection matrices, PRAXIS generates connectivity on the fly using mathematical properties:

For a network with N neurons:

  1. Each neuron has degree d = k√N (where k ≈ 2-4)
  2. Choose a prime p that is close to d
  3. For each neuron v, connect to neighbors using standard Ramanujan graph construction algorithms
  4. Double the pattern with a second generator set for redundancy

Benefits:

 

3. Learning via Additive STDP with Eligibility Traces (A-STDP-ET)

PRAXIS implements a biologically-inspired learning rule that combines local correlation detection with global modulation signals — similar to how dopamine modulates learning in the brain.

 

Phase 1: local correlation tracking (per every tick)

Each synapse maintains an eligibility trace that tracks correlation between presynaptic and postsynaptic activity, with the trace decaying over time:

eligibility[i,j] = decay_rate * eligibility[i,j] + pre_neuron[i] * post_neuron[j]

This creates a “memory” of which connections were recently active together. The decay rate should be high enough (e.g., 0.9+) to maintain temporal information across multiple ticks.

 

Phase 2: weight updates with global modulation

Actual weight changes combine local correlation with global teaching signals:

weight[i,j] = weight[i,j] + learning_term - decay_term

Where:

learning_term = learning_rate * modulator * eligibility[i,j]
decay_term = weight_decay * weight[i,j]

From where:

The decay_term prevents weights from growing indefinitely, where:

The global modulator depends on the learning scenario:

 

Phase 3: maintain Gaussian distribution

After updates, weights are clipped to maintain our crucial Gaussian property:

weight[i,j] = clip(weight[i,j], -3 * weight_stddev, +3 * weight_stddev)

This hard boundary prevents outlier weights, maintains int8 quantization quality, and keeps the network stable, all while preserving 99.7% of the distribution.

Note: specific parameter values require empirical tuning. This writing focuses on key insights of algorithmic structure that maintains Gaussian distributions while enabling brain-like learning, not the exact constants.

 

Why this design?

 

4. Homeostatic regulation

To maintain stability without explicit normalization layers, PRAXIS implements two biological-inspired homeostatic mechanisms:

 

Variance control

Monitor and adaptively control activation variance to maintain stable dynamics:

Track the running variance of neuron activations and scale weights if needed:

activation_variance = momentum * activation_variance + (1 - momentum) * variance(current_activations)  
                                                                                                       
if activation_variance > 1.2:                                                                          
    all_weights_to_neuron[j] = all_weights_to_neuron[j] / sqrt(activation_variance)                    

This uses a very slow momentum (close to 1) to avoid disrupting learned patterns. The example variance threshold of 1.2 is projected to allow slight fluctuation above unit variance while preventing runaway dynamics, but further calibrations would reveal more fitting values.

This scales entire columns of the weight matrix uniformly, preserving relative strengths while controlling magnitude.

 

Firing rate homeostasis

Each neuron adjusts its bias to maintain balanced activation:

mean_rate[j] = exponential_moving_average(activation[j])    
bias[j] -= homeostasis_rate * (mean_rate[j] - target_rate)  

The exponential moving average tracks the neuron’s average activation over recent history. The homeostasis mechanism gently corrects biases to keep neurons centered around zero activation, preventing them from becoming permanently silent or saturated.

 

5. Quantization strategy

PRAXIS’s Gaussian constraints enable aggressive quantization without information loss:

During computation, dot products accumulate in int32 to prevent overflow, then dequantize once per matrix multiplication. This approach maximizes compute density—leveraging int8 tensor cores—while maintaining numerical stability throughout the network.

 

The unified system

These components all work together in synergy:

The result is a system that learns like a brain but computes like a modern neural network — the best of both worlds.

 

Scaling to extreme networks

PRAXIS’s design truly shines at massive scale. While traditional approaches hit memory walls with N^2 connectivity, PRAXIS needs only N^1.5 connections - the difference between impossible and practical for billion-neuron networks.

The numbers are compelling: A billion-neuron PRAXIS network requires approximately 32TB with int8 weights (10^9 neurons × √(10^9) connections × 1 byte ≈ 32TB), fitting on a single modern GPU cluster. A traditional fully connected approach would need 1000x more memory.

Three key scaling advantages exist for PRAXIS:

The main scaling challenge: global synchrony. Updating billions of neurons per tick pushes memory bandwidth limits. Future work might explore hierarchical time scales or sparse temporal updates.

If the theoretical promise holds, PRAXIS could be the first practical path to human-brain-scale networks on existing hardware - no exotic neuromorphic chips required.

 

Neuromorphic computing, but NOW

PRAXIS represents a fundamental rethinking of neuromorphic computing by bridging the gap between digital organoids and practical deployment. Where the organoid paradigm promises brain-like learning, PRAXIS delivers it on modern GPUs at scale. As a digital organoid, PRAXIS captures the self-organizing, adaptive properties of biological neural networks while being engineered for silicon substrates.

PRAXIS dissolves the false choice between biological realism and computational efficiency: we don’t need biological fidelity to capture biological learning dynamics. It achieves what traditional approaches cannot:

This isn’t just another neural network architecture — it’s proof that brain-inspired computing can be both biologically meaningful and computationally practical. As we push toward more capable AI, PRAXIS shows that the path forward isn’t choosing between brains or machines, but building systems that learn like one while computing like the other.

With PRAXIS, new possibilities are opened in the field of AI:

The digital organoid revolution doesn’t require waiting for neuromorphic hardware. It’s here, it’s practical, and it runs on the GPUs you already have. The future of neuromorphic computing isn’t in perfect neural simulation. It’s in pragmatic systems that capture what matters while exploiting every advantage our digital substrate provides. PRAXIS lights the way forward.