Loading and Inspecting Data

First steps with any dataset

import pandas as pd
 
df = pd.read_csv("dataset.csv")
 
# Always run these first
print(df.shape)           # how big is it?
print(df.head())          # what does it look like?
df.info()                 # column types, non-null counts (prints directly, returns None)
print(df.describe())      # statistics for numeric columns
print(df.isnull().sum())  # missing values per column

Common formats

Format	Load with	Notes
CSV	`pd.read_csv()`	Universal, but slow for large files
Parquet	`pd.read_parquet()`	Fast, compressed, preserves column types; use for large data (needs `pyarrow` or `fastparquet`)
JSON	`pd.read_json()`	Nested data, API responses
Excel	`pd.read_excel()`	Needs `openpyxl` for `.xlsx` files
SQL	`pd.read_sql()`	Reads directly from a database; takes a query or table name plus a connection
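
The point about type preservation can be seen with a small round trip. The sketch below is illustrative only: the frame, its column names (`signup`, `plan`, `visits`), and its values are made up. It writes the frame to CSV text in memory and reads it back twice, once naively and once with the types declared, to show that CSV drops datetime and categorical types unless you ask for them.

```python
import io

import pandas as pd

# Hypothetical example data; the column names are illustrative only.
original = pd.DataFrame(
    {
        "signup": pd.to_datetime(["2024-01-05", "2024-02-10"]),
        "plan": pd.Categorical(["free", "pro"]),
        "visits": [3, 12],
    }
)

# Round-trip through CSV text held in memory instead of a file on disk.
csv_text = original.to_csv(index=False)
naive = pd.read_csv(io.StringIO(csv_text))

# CSV stores plain text, so datetime and categorical dtypes are not restored.
print(naive.dtypes)

# Declaring the types at load time recovers them.
typed = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["signup"],
    dtype={"plan": "category"},
)
print(typed.dtypes)
```

A format such as Parquet stores the column types alongside the data, so this extra declaration step is normally unnecessary there.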

Questions to ask about any dataset

How many samples? How many features?
What are the column types? (numeric, categorical, text, datetime)
How much is missing? Is it random or systematic?
What is the target variable? Is it balanced?
Are there duplicates?
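
Most of these questions map onto one-line pandas checks. The following is a minimal sketch on a made-up toy frame; the column names and the choice of `label` as the target variable are assumptions for illustration, not part of any real dataset.

```python
import pandas as pd

# Hypothetical toy data; "label" stands in for the target variable.
df = pd.DataFrame(
    {
        "age": [34, 51, None, 34, 29],
        "city": ["Oslo", "Lima", "Oslo", "Oslo", None],
        "label": [0, 1, 0, 0, 0],
    }
)

n_samples, n_columns = df.shape             # rows, and columns (features plus target)
column_types = df.dtypes                    # type of each column
missing_share = df.isnull().mean()          # fraction of missing values per column
class_balance = df["label"].value_counts(normalize=True)  # share of each target class
n_duplicates = int(df.duplicated().sum())   # rows identical to an earlier row

print(n_samples, n_columns)
print(column_types)
print(missing_share)
print(class_balance)
print(n_duplicates)
```

Whether missingness is random or systematic cannot be read off a single number; a reasonable first step is to compare `missing_share` across groups, for example per target class.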
