AI
NN Memory
- Memory: roughly 3.6 bits of memorized content per parameter. For example, a 7B-parameter model tops out around 7e9 × 3.6 ≈ 25 Gbit ≈ 3 GB of memorized data.
- Double Descent: At first, the model improves mostly by memorizing the training data. Once its capacity fills up (near the interpolation threshold, where capacity roughly matches the dataset), test quality suddenly drops; past that point the model gradually trades memorization for the underlying structure of the task, and quality improves again. The test-error curve therefore descends, spikes, then descends a second time (see the toy demo below).
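A minimal sketch of the effect using random ReLU features and a min-norm least-squares fit; all names and constants here are illustrative assumptions, not from these notes. Test error typically peaks where the feature count crosses the number of training samples, then falls again:

import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_input = 100, 1000, 20

# Ground-truth linear teacher with label noise
w_true = rng.normal(size=d_input)
X_train = rng.normal(size=(n_train, d_input))
X_test = rng.normal(size=(n_test, d_input))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

for n_features in [10, 50, 90, 100, 110, 200, 1000]:
    # Random ReLU features; model capacity grows with n_features
    W = rng.normal(size=(d_input, n_features))
    F_train = np.maximum(X_train @ W, 0)
    F_test = np.maximum(X_test @ W, 0)
    # pinv gives the min-norm solution, which covers the overparameterized case
    beta = np.linalg.pinv(F_train) @ y_train
    mse = np.mean((F_test @ beta - y_test) ** 2)
    print(f"{n_features:5d} features -> test MSE {mse:.3f}")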
Transformer Layers
Different layers in a transformer specialize in different linguistic properties:
- Lower layers: Capture syntax and token identity.
- Middle layers: Handle grammar and sentence structure.
- Deeper layers: Focus on semantics, reasoning, and factual recall.
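One way to inspect this specialization empirically is layer-wise probing: extract each layer's hidden states and train simple classifiers on them. A minimal sketch of the extraction step, assuming the Hugging Face transformers package and the bert-base-uncased checkpoint (both illustrative choices):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (num_layers + 1) tensors, one per layer
# (index 0 is the embedding layer). Probes trained on each tensor test
# what that depth encodes (syntax low, semantics high).
for i, h in enumerate(outputs.hidden_states):
    print(f"layer {i:2d}: shape {tuple(h.shape)}")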
H100 Varieties
| Feature | H100 PCIe | H100 SXM | H100 NVL |
|---|---|---|---|
| Connection | Standard PCIe slot | Custom socket on DGX/HGX boards | PCIe slot (two cards bridged) |
| RAM to VRAM Speed | Slower (limited by PCIe bus) | Fastest | Slower (limited by PCIe bus) |
| GPU to GPU Speed | Slower | Fastest (via NVLink on the board) | Very Fast (via dedicated NVLink Bridge) |
| VRAM Type | HBM2e | HBM3 | HBM3 |
| VRAM Size | 80 GB | 80 GB | 188 GB total (2x 94 GB) |
| Best For | General compatibility, single-card setups where PCIe bandwidth isn't a bottleneck. | Demanding single-GPU or multi-GPU tasks needing the highest RAM-to-GPU bandwidth. | Dual-GPU tasks with massive data transfer between the two GPUs. |
| Key Downside | Slower data transfer to and from system RAM. | Requires specific, expensive motherboards (DGX/HGX). | Acts as a pair; not ideal for scaling beyond two bridged GPUs. |
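A quick sketch to check which variety and interconnect a machine actually has, assuming PyTorch with CUDA is installed (for the physical link layout, `nvidia-smi topo -m` is the authoritative view):

import torch

# List every visible GPU with its name and VRAM size
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB VRAM")

# Peer access means the GPUs can DMA each other's memory directly;
# on SXM/NVL parts this traffic rides NVLink, on plain PCIe cards the bus.
if torch.cuda.device_count() >= 2:
    print("peer access 0<->1:", torch.cuda.can_device_access_peer(0, 1))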
Benchmarks
| Benchmark | Focus Area | Description | Example Metrics |
|---|---|---|---|
| MMLU (Massive Multitask Language Understanding) | General knowledge & reasoning | 57 subjects covering STEM, humanities, social sciences, etc. | Accuracy (%) |
| HellaSwag | Commonsense reasoning | Tests everyday scenario understanding | Accuracy (%) |
| ARC (AI2 Reasoning Challenge) | Logical reasoning | Grade-school-level science and reasoning questions | Accuracy (%) |
| GSM8K (Grade School Math 8K) | Math reasoning | Solving elementary school-level math problems | Accuracy (%) |
| MATH | Advanced math | Tests high-school & olympiad-level math ability | Accuracy (%) |
| BBH (BigBench Hard) | Complex reasoning | Harder subset of BigBench tasks, including ethics, social dynamics | Accuracy (%) |
| TruthfulQA | Truthfulness | Measures resistance to misinformation and factual consistency | Truthfulness Score (%) |
| MT-Bench | Multi-turn chat | Evaluates LLMs in a conversational multi-turn dialogue setting | Score (1-10) |
| HumanEval | Code generation | Tests LLMs' ability to write functional code | Pass@1 (%) |
| MBPP (Mostly Basic Python Problems) | Python programming | Evaluates Python code generation on basic, entry-level programming tasks | Pass@1 (%) |
| Chatbot Arena (LMSYS) | Overall LLM ranking | Human preference ranking of chatbot responses in blind head-to-head battles | Elo Score |
| AGIEval | Human-like intelligence | Measures model performance on human exams (SAT, GRE, LSAT, etc.) | Score (%) |
| SuperGLUE | NLP general tasks | Evaluates performance across a variety of NLP tasks | Accuracy (%) |
| TyDiQA | Multilingual QA | Tests question-answering ability in multiple languages | F1 Score |
| GSM-Plus | Math reasoning robustness | An adversarial extension of GSM8K with perturbed variants of each question | Accuracy (%) |
| TMGBench | Strategic reasoning | A game-theory benchmark built on 2x2 game types to test strategic reasoning | Accuracy (%) |
| VL-RewardBench | Vision-language alignment | Evaluates how well vision-language models align with human preferences | Accuracy (%) |
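For the code benchmarks above (HumanEval, MBPP), Pass@k is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021) rather than by literally drawing k completions; here n is the number of samples generated per problem and c how many of them pass the unit tests:

import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k samples passes."""
    if n - c < k:
        return 1.0  # Too few failures to fill k draws; a pass is certain
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(200, 40, 1))   # 0.2  (the plain per-sample pass rate)
print(pass_at_k(200, 40, 10))  # ~0.90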
Singularity
- Researcher: Ray Kurzweil (predicts the singularity around 2045)
- Reference: Extrapolation of Moore's law, generalized as Kurzweil's "law of accelerating returns"
Godfathers
- The "Godfathers of AI": Geoffrey Hinton, Yoshua Bengio, and Yann LeCun, co-recipients of the 2018 Turing Award for deep learning.
Scikit-Learn
import numpy as np
from sklearn.compose import ColumnTransformer, make_column_transformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_pipeline = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
cat_pipeline = make_pipeline(SimpleImputer(strategy="most_frequent"), OneHotEncoder(handle_unknown="ignore"))

# Automatic approach
preprocessing = make_column_transformer(
    (num_pipeline, make_column_selector(dtype_include=np.number)),  # Detects numerical columns
    (cat_pipeline, make_column_selector(dtype_include=object)),     # Detects categorical columns
)

# Manual approach (to have more control)
num_attribs = ["longitude", "latitude", ...]
cat_attribs = ["ocean_proximity", ...]
preprocessing = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", cat_pipeline, cat_attribs),
])
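Usage sketch: `housing` is an assumed pandas DataFrame with the columns named above (the California housing example this snippet follows), and the `...` placeholders must be replaced with real column names first:

housing_prepared = preprocessing.fit_transform(housing)
print(preprocessing.get_feature_names_out()[:5])  # Names of the transformed columns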

