import pandas as pd

df = pd.read_csv("dataset.csv")

# Always run these first
print(df.shape)           # how big is it? (rows, columns)
print(df.head())          # what does it look like?
df.info()                 # column types, non-null counts (prints directly and returns None)
print(df.describe())      # statistics for numeric columns
print(df.isnull().sum())  # missing values per column
Common formats
Format    Load with           Notes
CSV       pd.read_csv()       Universal, but slow for large files
Parquet   pd.read_parquet()   Fast, compressed, preserves types; use for large data
JSON      pd.read_json()      Nested data, API responses
Excel     pd.read_excel()     Needs openpyxl
SQL       pd.read_sql()       Direct from database
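As a rough illustration of the loaders listed above, the sketch below reads the same small table through three of them. The sample data, the `scores` table name, and the in-memory SQLite database are assumptions made up for this example; a real project would point at its own files or database.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data, used only for illustration.
csv_text = "id,score\n1,0.5\n2,0.8\n"
json_text = '[{"id": 1, "score": 0.5}, {"id": 2, "score": 0.8}]'

# read_csv and read_json accept file paths or file-like objects.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

# read_sql needs a query and an open database connection.
conn = sqlite3.connect(":memory:")
df_csv.to_sql("scores", conn, index=False)  # create a table to query
df_sql = pd.read_sql("SELECT id, score FROM scores", conn)
conn.close()

for name, frame in [("csv", df_csv), ("json", df_json), ("sql", df_sql)]:
    print(name, frame.shape, list(frame.columns))
```

Whichever loader is used, the result is an ordinary DataFrame, so the same first-look checks apply afterwards.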
Questions to ask about any dataset
How many samples? How many features?
What are the column types? (numeric, categorical, text, datetime)
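The first two questions can be answered in a few lines. The helper below is a minimal sketch, not a standard pandas function: `summarize` and the sample columns are hypothetical names chosen for this example, and any column that is not numeric, datetime, or categorical is simply grouped as "text_or_other".

```python
import pandas as pd


def summarize(df):
    """Return sample/feature counts and column names grouped by broad type."""
    n_samples, n_features = df.shape
    numeric = df.select_dtypes(include="number").columns.tolist()
    datetime_cols = df.select_dtypes(include="datetime").columns.tolist()
    categorical = df.select_dtypes(include="category").columns.tolist()
    known = set(numeric) | set(datetime_cols) | set(categorical)
    # Everything else (strings, booleans, mixed objects) lands here.
    other = [col for col in df.columns if col not in known]
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "numeric": numeric,
        "datetime": datetime_cols,
        "categorical": categorical,
        "text_or_other": other,
    }


# Hypothetical example data.
example = pd.DataFrame({
    "age": [31, 45, 27],
    "city": pd.Categorical(["Oslo", "Lima", "Oslo"]),
    "note": ["ok", "late", "ok"],
    "joined": pd.to_datetime(["2021-01-05", "2022-03-10", "2023-07-01"]),
})

summary = summarize(example)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Columns loaded from CSV often arrive as plain text even when they hold dates or categories, so the grouping reflects the dtypes pandas inferred, not necessarily what the data means.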