Chapter 2 – Machine Learning Basics for Security

2.1 Core Machine‑Learning Concepts

Supervised vs. Unsupervised: Labelled data for classification (e.g., malicious vs. benign) vs. clustering and anomaly detection on unlabelled data.
Feature Engineering: Transform raw logs, network flows, or threat‑intel into numeric vectors.
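Model Evaluation: Precision/recall and ROC‑AUC rather than accuracy alone (which is misleading when attacks are rare), measured on a validation set that reflects production traffic.
Overfitting & Regularization: Techniques such as dropout and L1/L2 penalties to limit overfitting, with cross‑validation to detect it.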
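
To make the evaluation point concrete, here is a minimal sketch in plain Python. The validation set and both sets of predictions are hypothetical, invented only for illustration; in practice a library such as scikit‑learn would compute these metrics.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = malicious."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Accuracy, precision, and recall, guarding against empty denominators."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical validation set: 98 benign events followed by 2 malicious ones.
y_true = [0] * 98 + [1] * 2

# Model A never raises an alert.
always_benign = [0] * 100

# Model B raises two alerts: one false positive and one true positive,
# and misses the second malicious event.
one_hit = [0] * 97 + [1] + [1, 0]

print("always benign:", evaluate(y_true, always_benign))
print("one hit:      ", evaluate(y_true, one_hit))
```

Both models reach the same accuracy (98 of 100 events classified correctly), yet the first one detects nothing, which only the recall figure reveals.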

2.2 Data Pipelines for Security

Ingestion: Beats, Logstash, or custom collectors feeding into a central store (Elasticsearch, PostgreSQL, or a vector database).
Pre‑processing: Normalization, tokenization, and embedding generation (e.g., Sentence‑Transformers for log text).
Storage: Vector stores (the FAISS library, the Milvus database) for similarity search; relational DBs for structured telemetry.
Serving: REST/GraphQL endpoints or batch jobs that score new data against the trained model.
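
The pre‑processing, storage, and scoring stages can be sketched end to end in a few lines of plain Python. This is a toy under stated assumptions: a bag‑of‑words vector stands in for a learned embedding, the `InMemoryIndex` class is a hypothetical brute‑force substitute for a real vector store, and the log lines are made up.

```python
import math
import re
from collections import Counter


def tokenize(line):
    """Lowercase a log line and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", line.lower())


def embed(line):
    """Sparse bag-of-words vector scaled to unit length.

    A stand-in for a learned embedding model; only the interface is similar.
    """
    counts = Counter(tokenize(line))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


class InMemoryIndex:
    """Hypothetical brute-force store; a vector database replaces this at scale."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        """Return the k most similar stored lines as (score, text) pairs."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.items]
        return sorted(scored, reverse=True)[:k]


index = InMemoryIndex()
index.add("Failed password for root from 10.0.0.5 port 22 ssh2")
index.add("Accepted publickey for deploy from 10.0.0.9 port 22 ssh2")
index.add("GET /index.html 200 from 192.168.1.4")

# A serving endpoint or batch job would make a call like this for each new event.
best_score, best_text = index.search(
    "Failed password for admin from 10.0.0.7 port 22 ssh2"
)[0]
print(round(best_score, 3), best_text)
```

The query matches the stored failed‑login line most closely because the two share nearly all of their tokens. Swapping `embed` for a sentence‑embedding model and `InMemoryIndex` for a vector store changes the quality and scale of the search, not the shape of the pipeline.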

2.3 Model Types & Use‑Cases

Model	Typical Security Use‑Case	Example Tool
Logistic Regression	Binary threat vs. benign classification	scikit‑learn
Random Forest	Supervised classification over many heterogeneous features	scikit‑learn
Autoencoder	Unsupervised anomaly detection	PyTorch / TensorFlow
LLM (e.g., Llama‑2)	Log summarization, IOC extraction	Hugging Face Transformers
Graph Neural Network	Threat‑intel graph analysis	PyTorch Geometric
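
As an illustration of the first row, the sketch below trains a logistic‑regression classifier with an L2 penalty using plain gradient descent. It is a teaching stand‑in for a library implementation, not a replacement for one; the two features (scaled failed‑login count, off‑hours flag), the toy data, and the hyperparameters are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the malicious class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Batch gradient descent on mean log-loss with an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 * wj term shrinks weights toward zero; the bias is not penalized.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical login sessions: [failed_logins / 10, is_off_hours]
X = [[0.0, 0], [0.1, 0], [0.2, 1], [0.1, 1],
     [0.8, 1], [0.9, 0], [1.0, 1], [0.7, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = threat, 0 = benign

w, b = train_logistic(X, y)
print("P(threat | 10 failures, off hours):", predict_proba(w, b, [1.0, 1]))
print("P(threat | 0 failures, work hours):", predict_proba(w, b, [0.0, 0]))
```

In this toy data the failed‑login count separates the classes while the off‑hours flag is evenly split between them, so the learned weight on the first feature dominates.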

2.4 Open‑Source AI Tools for Security

Llama‑2: Large language model for natural‑language log analysis and policy generation.
Mistral: Lightweight LLM for on‑prem inference.
Sentence‑Transformers: Generate dense embeddings for semantic similarity.
FAISS: Efficient similarity search for large embedding collections.
Stable‑Baselines3: Reinforcement‑learning framework for training agents, e.g., for automated playbook generation.

This chapter equips readers with the foundational ML knowledge needed to build and evaluate security‑specific models, and introduces the open‑source tools that will be explored in later chapters.
