maxtext.layers.engram module
- DeepSeek-AI, "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" <https://arxiv.org/pdf/2601.07372>, 2026
Reference implementation: deepseek-ai/Engram
- class maxtext.layers.engram.CompressedTokenizer(tokenizer)
Bases: object
A canonicalizing wrapper that reduces vocabulary sparsity for n-gram lookup.
This class maps semantically equivalent tokens (e.g., “Apple”, “ apple”, “APPLE”) to a single unified ID. This many-to-one mapping significantly reduces the combinatorial size of the n-gram space.
- Parameters:
tokenizer (HFTokenizer)
- lookup_table
Array mapping original_id -> compressed_id.
- num_new_token
Size of the compressed vocabulary.
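The many-to-one mapping described above can be sketched in a few lines. This is only an illustration on a toy vocabulary: the normalization rule (Unicode NFKC, strip surrounding whitespace, lowercase) and the helper names `canonicalize` and `build_lookup_table` are assumptions, not the rules or API of `CompressedTokenizer`.

```python
import unicodedata

def canonicalize(token: str) -> str:
    """Hypothetical normalization: NFKC, strip surrounding whitespace, lowercase."""
    return unicodedata.normalize("NFKC", token).strip().lower()

def build_lookup_table(vocab: list[str]) -> tuple[list[int], int]:
    """Return (lookup_table, num_new_token) for a toy vocabulary.

    lookup_table[original_id] is the compressed ID shared by all tokens
    that normalize to the same canonical string.
    """
    canonical_to_id: dict[str, int] = {}
    lookup_table: list[int] = []
    for token in vocab:
        key = canonicalize(token)
        # Assign the next free compressed ID the first time a canonical form appears.
        compressed_id = canonical_to_id.setdefault(key, len(canonical_to_id))
        lookup_table.append(compressed_id)
    return lookup_table, len(canonical_to_id)

vocab = ["Apple", " apple", "APPLE", "pear", " Pear"]
lookup_table, num_new_token = build_lookup_table(vocab)
print(lookup_table, num_new_token)  # [0, 0, 0, 1, 1] 2
```

Five original IDs collapse to two compressed IDs here, which is the effect that shrinks the space of possible n-grams.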
- class maxtext.layers.engram.NgramHashMapping(engram_vocab_bases, max_ngram_size, engram_num_heads, layer_ids, tokenizer, pad_id, seed)
Bases: object
Deterministically maps token indices to n-gram hash indices for embedding lookups.
This class implements Multi-Head Hashing to bypass the combinatorial memory requirements of explicit n-gram vocabularies. Specifically, it applies multiplicative-XOR hashing to each n-gram window.
Key Mechanisms for Collision Mitigation:
- Multi-Head Factorization: Uses K distinct hash heads per n-gram order to increase effective capacity within fixed memory constraints.
- Unique Prime Moduli: Assigns a unique prime vocabulary size to each head to minimize simultaneous collisions.
- Parameters:
engram_vocab_bases (List[int])
max_ngram_size (int)
engram_num_heads (int)
layer_ids (List[int])
tokenizer (HFTokenizer)
pad_id (int)
seed (int)
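A rough sketch of multiplicative-XOR hashing with per-head prime moduli follows. It is not the module's implementation: the function name `ngram_hash_indices`, the choice of random odd multipliers, the left-padding with `pad_id`, and the two example primes are all assumptions made for illustration. Python integers are unbounded, so this sketch also ignores the fixed-width overflow behavior a real array implementation would have.

```python
import random

def ngram_hash_indices(tokens, n, multipliers, head_primes, pad_id=0):
    """Hash each length-n window ending at position t into one index per head."""
    indices = []
    for t in range(len(tokens)):
        # Left-pad with pad_id so every position has a full window of past tokens.
        window = [tokens[t - k] if t - k >= 0 else pad_id for k in range(n)]
        mix = 0
        for tok, mult in zip(window, multipliers):
            mix ^= tok * mult  # multiply each token, then XOR-combine
        # Each head reduces the same mixed value by its own prime modulus.
        indices.append([mix % p for p in head_primes])
    return indices

rng = random.Random(0)  # fixed seed, so the mapping is deterministic
multipliers = [2 * rng.randrange(1, 1 << 20) + 1 for _ in range(2)]  # odd (assumed)
head_primes = [10007, 10009]  # distinct primes, one per head (illustrative sizes)
tokens = [5, 9, 5, 9]
idx = ngram_hash_indices(tokens, n=2, multipliers=multipliers, head_primes=head_primes)
```

Positions 1 and 3 see the same bigram (5, 9), so they receive identical indices in every head. Two different n-grams that collide under one prime are unlikely to collide under the other as well, which is the intuition behind using distinct moduli.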
- class maxtext.layers.engram.StaticWrapper(val)
Bases: object
Wrapper to prevent nnx from treating the value as a variable.
- class maxtext.layers.engram.MultiHeadEmbedding(*args, **kwargs)
Bases: Module
A flattened table representation for multi-head embedding spaces across n-gram orders.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
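One way to picture a flattened multi-head table is a single row-major table in which each head owns a contiguous block, addressed by a per-head offset. The sketch below assumes that layout; `head_offsets`, `flat_lookup`, and the toy sizes are hypothetical names and values, not taken from `MultiHeadEmbedding`.

```python
from itertools import accumulate

def head_offsets(head_vocab_sizes):
    """Starting row of each head's block inside the single flattened table."""
    return [0] + list(accumulate(head_vocab_sizes))[:-1]

def flat_lookup(table, offsets, head_indices):
    """Fetch one embedding row per head from the flattened table."""
    return [table[off + i] for off, i in zip(offsets, head_indices)]

head_vocab_sizes = [5, 7, 11]  # illustrative per-head vocabulary sizes
offsets = head_offsets(head_vocab_sizes)
dim = 2
# Toy table: row r holds [r, r] so the fetched rows are easy to read.
table = [[float(r)] * dim for r in range(sum(head_vocab_sizes))]
rows = flat_lookup(table, offsets, head_indices=[4, 0, 10])
print(offsets)  # [0, 5, 12]
print(rows)     # [[4.0, 4.0], [5.0, 5.0], [22.0, 22.0]]
```

Keeping all heads in one table means a single gather serves every head, rather than one lookup per separate table.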
- class maxtext.layers.engram.ShortConv(*args, **kwargs)
Bases: Module
Depthwise causal 1D convolution, with multi-branch integration.
Applies local temporal smoothing:
- Independent RMSNorms to each branch
- Convolution to mix time steps [t-k, t]
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
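The two steps listed for `ShortConv` can be illustrated for a single branch in plain Python. This is a sketch under assumptions: the RMSNorm has no learned scale, the input is zero-padded before the first step, and the function names are hypothetical. The actual module operates on batched arrays and multiple branches.

```python
import math

def rms_norm(x, eps=1e-6):
    """Normalize one feature vector by its root mean square (no learned scale)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def causal_depthwise_conv(x, weights):
    """x: [T][C] sequence, weights: [K][C] per-channel taps.

    y[t][c] = sum_k weights[k][c] * x[t - k][c], with zeros before t = 0,
    so the output at step t never sees steps after t, and channels never mix.
    """
    T, C, K = len(x), len(x[0]), len(weights)
    y = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for k in range(K):
            if t - k >= 0:
                for c in range(C):
                    y[t][c] += weights[k][c] * x[t - k][c]
    return y

x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
weights = [[1.0, 0.5], [1.0, 0.5]]  # K = 2 taps per channel
x_normed = [rms_norm(step) for step in x]  # normalize each time step's features
y = causal_depthwise_conv(x_normed, weights)
```

Because each output channel depends only on the same input channel, the cost grows with the kernel length and channel count rather than with the square of the channel count.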
- class maxtext.layers.engram.Engram(*args, **kwargs)
Bases: Module
Engram memory layer with n-gram embeddings and multi-branch integration.
Main components:
- Context-independent Retrieval: Fetch static n-gram embeddings via Multi-Head Hashing.
- Context-aware Gating: Compute similarity between memory (Key) and context (Query) to determine relevance.
- Mix: Apply local temporal smoothing via convolution.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
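The context-aware gating step can be pictured as a scalar gate computed from the query and key and applied to the retrieved memory. The sketch below assumes a sigmoid over a scaled dot product; that particular similarity function, the projection-free inputs, and the name `gate_memory` are assumptions for illustration, since the description above only states that similarity between Key and Query determines relevance.

```python
import math

def gate_memory(query, key, value):
    """Scale the retrieved memory `value` by a scalar relevance gate in (0, 1).

    Assumed form: gate = sigmoid(query . key / sqrt(d)).
    """
    d = len(query)
    score = sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
    gate = 1.0 / (1.0 + math.exp(-score))
    return gate, [gate * v for v in value]

context = [1.0, 0.0]       # query derived from the hidden state (toy values)
memory_value = [2.0, 4.0]  # retrieved n-gram embedding (toy values)

g_aligned, out_aligned = gate_memory(context, [1.0, 0.0], memory_value)
g_orthogonal, _ = gate_memory(context, [0.0, 1.0], memory_value)
g_opposed, _ = gate_memory(context, [-1.0, 0.0], memory_value)
```

With these toy inputs the gate is above 0.5 when the key aligns with the context, exactly 0.5 when they are orthogonal, and below 0.5 when they point in opposite directions, so a retrieved embedding that does not match the context is damped.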